Pass on the JSON, and choose binary encoding formats instead
Find out how developers can achieve significant performance boosts using new binary encoding formats as alternatives to JSON and XML.
Binary encoding formats promise significant performance improvements for communications-intensive apps. TheServerSide caught up with Martin Thompson, founder of Real Logic, who implemented a new binary protocol format for financial trading that is over 1,000 times faster than JSON.
What is your take on the status of new techniques for replacing JSON with more efficient binary encoding formats and protocols like Google Protocol Buffers?
Martin Thompson: First up, none of these formats are protocols. They are codecs. A protocol is a means of describing a communication interaction. Each of the individual messages used in a protocol can be encoded or decoded in a particular format by a codec.
What's your take on the relative performance of JSON/REST, compared with binary encoding formats like Google Protocol Buffers, Avro, Thrift, Bond and Simple Binary?
Thompson: Text-based encodings are typically 10x slower than the less efficient binary codecs, such as GPB. There are binary encodings that are 10x to 100x more efficient, such as FlatBuffers, Cap'n Proto and SBE (Simple Binary Encoding).
Does this kind of efficiency just reduce latency, or do you see a role for more efficient cloud usage by moving message parsing from JSON to these more efficient binary formats?
Thompson: This increase in efficiency results in direct reductions in latency, increases in throughput, and efficiency gains. We can also see bandwidth reduction due to more compact encodings. One of the biggest wins can be on mobile devices where the battery usage is significantly reduced.
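To illustrate the point about compact encodings, here is a minimal sketch that encodes the same message once as JSON and once as a fixed-layout binary record using Python's standard `struct` module. The trade message, its field names and the chosen field widths are assumptions made for this example; they are not from the interview and this is not SBE or any other codec named above.

```python
import json
import struct

# Hypothetical trade message; field names and values are illustrative only.
trade = {"instrument_id": 1234, "price": 101.25, "quantity": 500, "side": 1}

# Text encoding: every field name and every digit is spelled out as characters.
json_bytes = json.dumps(trade).encode("utf-8")

# Binary encoding: little-endian uint32, float64, uint32, uint8, no padding.
# The layout is agreed up front, so no field names travel with the data.
TRADE_LAYOUT = struct.Struct("<IdIB")
binary_bytes = TRADE_LAYOUT.pack(
    trade["instrument_id"], trade["price"], trade["quantity"], trade["side"]
)

# Decoding is a single fixed-offset read rather than a character-by-character parse.
decoded = TRADE_LAYOUT.unpack(binary_bytes)

print(len(json_bytes), "bytes as JSON")
print(len(binary_bytes), "bytes as fixed-layout binary")
```

The binary record is 17 bytes regardless of the values it carries, while the JSON form grows with the length of the field names and the number of digits.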
If you profile the typical business application you will likely be shocked how much CPU time and memory is dedicated to protocols and codecs relative to the business logic. It seems our applications are mostly doing protocol handling and encoding and as a side effect do a little business logic.
What are the types of applications where binary encoding format efficiency might translate into the most significant gains or reduce the cloud instance size required for a particular type of application?
Thompson: Any application that does a significant amount of communication or encoding, such as microservices or monitoring data. Text-based logging is an abomination.
What are the limitations of binary encoding formats, and particularly SBE? Are there places where it is not as good a fit?
Thompson: The main limitation is lack of understanding and experience in the development community. We spend so much of our time debugging all types of applications. Text encodings are easier for those inexperienced with binary encodings. However, with experience, binary encodings become easy to work with and in many cases are even simpler to debug because there are fewer edge cases.
Where would you see these formats being used in the communications stack compared with lower-level protocols like UDP/TCP, and higher-level protocols like WebSockets, XMPP, CoAP and MQTT?
Thompson: In the OSI layer model these encodings are Layer 6, i.e., the presentation layer. UDP is Layer 4, TCP is a mix of Layers 4 and 5. WebSockets, XMPP, HTTP, etc. are Layer 7 application protocols.
What are the development challenges around using SBE in terms of debugging compared with GPB and REST?
Thompson: SBE compared to GPB is very similar in usage. SBE has the restriction that messages with repeating groups must be accessed in order versus arbitrary access. Some find this restricting. I find this is just a matter of development discipline. Arbitrary memory access does not play well with the prefetchers in a CPU's cache subsystem. CPUs love predictable patterns. REST is a Layer 7 protocol and does not compare.
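The in-order restriction on repeating groups can be sketched as a decoder that only ever moves forward through the buffer. The example below is an illustration built on Python's `struct` module, not the SBE API; the wire layout (a uint16 entry count followed by fixed-size price and quantity entries) and the helper names `encode_group` and `iter_group` are assumptions made for this sketch.

```python
import struct

# Hypothetical wire layout, assumed for this sketch only.
HEADER = struct.Struct("<H")   # number of entries in the group (uint16)
ENTRY = struct.Struct("<dI")   # price (float64), quantity (uint32)

def encode_group(entries):
    """Write the entry count, then each entry back to back."""
    buf = bytearray(HEADER.pack(len(entries)))
    for price, quantity in entries:
        buf += ENTRY.pack(price, quantity)
    return bytes(buf)

def iter_group(buf):
    """Yield entries strictly in wire order, advancing one offset forward."""
    (count,) = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    for _ in range(count):
        yield ENTRY.unpack_from(buf, offset)
        offset += ENTRY.size

message = encode_group([(101.25, 500), (101.5, 200), (102.0, 50)])

# The consumer walks the group once, front to back.
total_quantity = sum(quantity for _, quantity in iter_group(message))
print(total_quantity)
```

Because the reader touches memory in one forward pass, the access pattern is the predictable kind that the answer above says CPUs favor.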
What do you see as some of the factors holding back the wider adoption of binary encoding formats like GPB and SBE?
Thompson: Lack of experience and awareness. The cool kids are mostly using JSON these days. This is such a shame because it is such a poor encoding. It has no types and is very inefficient.
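A small sketch of the typing complaint, using Python's standard `json` and `struct` modules: JSON offers only a handful of value kinds, so distinctions a program relies on can be lost in a round trip, whereas a binary layout states each field's type and width explicitly. The sample record and the layout are assumptions made for this example.

```python
import json
import struct

# Hypothetical record with an integer key and a tuple value.
record = {7: (1, 2)}
round_tripped = json.loads(json.dumps(record))
# JSON object keys are always strings and JSON has no tuple,
# so the key comes back as "7" and the value as a list.
print(round_tripped)

# Raw bytes have no JSON representation at all in the standard encoder.
try:
    json.dumps({"payload": b"\x00\x01"})
    bytes_supported = True
except TypeError:
    bytes_supported = False

# A binary layout declares the types up front: uint16, uint8, uint8.
LAYOUT = struct.Struct("<HBB")
fields = LAYOUT.unpack(LAYOUT.pack(7, 1, 2))
print(fields)
```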
What would you consider the best practices for organizations to replace the use of REST and JSON with these more efficient formats?
Thompson: Try them on a small project and build experience. Then in time write tools to help with debugging such as Wireshark dissectors and viewing tools. The viewing tools do not need to be complex; simple command line tools can be enough.
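As a rough idea of how small such a viewing tool can be, the sketch below decodes a file of fixed-size binary records into readable lines. The record layout, the field names and the function names `format_records` and `main` are hypothetical; a real tool would take its layout from the message schema of whichever codec is in use.

```python
import struct

# Hypothetical record layout: instrument id (uint32), price (float64), quantity (uint32).
RECORD = struct.Struct("<IdI")
FIELDS = ("instrument_id", "price", "quantity")

def format_records(data):
    """Return one readable line per complete record, plus any leftover bytes."""
    lines = []
    usable = len(data) - len(data) % RECORD.size
    for index, values in enumerate(RECORD.iter_unpack(data[:usable])):
        pairs = " ".join(f"{name}={value}" for name, value in zip(FIELDS, values))
        lines.append(f"[{index}] {pairs}")
    if usable != len(data):
        # Show a truncated trailing record as hex instead of failing.
        lines.append(f"trailing bytes: {data[usable:].hex()}")
    return lines

def main(argv):
    """Print the decoded contents of the file named in argv[1]."""
    with open(argv[1], "rb") as source:
        for line in format_records(source.read()):
            print(line)
```

Calling `main(sys.argv)` from a script entry point turns this into a command-line viewer that takes the file path as its only argument.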