Protobuf vs JSON Serialization

Andrew Dawson
3 min read · May 3, 2023


  • Protobuf requires a schema to encode and decode, while JSON does not.
  • Protobuf is encoded as binary while JSON is encoded as text. Additionally, protobuf encoding does not include key names: the proto schema definition file assigns each field a number, and those numbers are what get written during serialization and used during deserialization to identify each field. This means that encoded messages are usually smaller and therefore faster to transmit.
  • Protobuf schemas allow for structural validation. When a client gets JSON it is just a string, so it could contain anything. On the flip side, as long as the client has the protobuf schema and can successfully deserialize into it, the client can make assumptions about the structure.
  • Backwards compatibility means that a client is able to understand a previous version of a message. Forward compatibility means that a client is able to understand a future version of a message.
  • Protobuf enables forwards and backwards compatibility without requiring the client or server to do explicit version checks.
  • The field numbers in the protobuf schema are important because they tell clients how to parse a proto message. Suppose a client gets a protobuf message which contains a new field. The client parses this message with its old schema, and the parsing code skips over any field numbers which that schema does not know about. In this way forward compatibility is achieved. This is also why you cannot reuse field numbers: an old client would read the new data as if it were the old field.
  • All additive changes (adding a service, adding a method, adding a field and adding an enum) are backwards compatible in proto. This is because when these new messages are transmitted, clients which don’t know about the additions will simply ignore them.
  • Renaming fields within a message is a backwards compatible change because field names are not sent over the wire. Fields are identified by their number and their data type, so renaming a field is not a problem. However, for human readability you can encode proto as JSON or text, and if you do this then renaming a field is a breaking change because those formats do include field names.
  • Renaming or removing a package, service, or method is a breaking change: a client that still calls the old name will get a not implemented error (in gRPC, the UNIMPLEMENTED status).
  • Removing a field is also fine. When an old message (with the field) is read by new code (without the field), the field will simply be ignored.
  • Proto2 had required fields but Proto3 does not have any required fields. This improves both forwards and backwards compatibility because if fields are unknown or unset they can just be skipped and set to default values.
  • If you remove a field you should reserve the number associated with that field so that future authors do not accidentally reuse that field number.
  • Generally, since protobuf can be used in forwards and backwards compatible ways, you do not need to use packages / namespaces to separate different versions of the same proto. However, doing this is still good practice because IF you do end up having to make a breaking change then it's useful to have a window where the old version and the new version are both supported, until all clients have been moved off the old version and the old version can be removed. For example, if you have a message called Greeting you might want to namespace it like v1.Greeting so that IF you ever need to have a breaking version of Greeting you can introduce a v2.Greeting without having a naming conflict.
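
To make the point about field numbers replacing key names concrete, here is a minimal sketch that hand-encodes a tiny message in the protobuf wire format and compares it with the equivalent JSON. It deliberately does not use the protobuf library. The message shape (an integer user_id as field 1 and a string text as field 2) and all function names are assumptions made up for this illustration.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is the field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_greeting(user_id: int, text: str) -> bytes:
    # Hypothetical schema: int32 user_id = 1; string text = 2;
    payload = text.encode("utf-8")
    return (
        encode_tag(1, 0) + encode_varint(user_id)  # wire type 0: varint
        + encode_tag(2, 2) + encode_varint(len(payload)) + payload  # wire type 2: length-delimited
    )


proto_bytes = encode_greeting(150, "hello")
json_bytes = json.dumps({"user_id": 150, "text": "hello"}).encode("utf-8")

print(proto_bytes.hex())  # the field names never appear in these bytes
print(len(proto_bytes), "bytes as protobuf vs", len(json_bytes), "bytes as JSON")
```

The binary form only carries the numbers 1 and 2 to identify the fields, which is where most of the size difference comes from in a message this small.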
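
The forward compatibility behaviour described above can be sketched in the same spirit. The following is a simplified, hand-written parser, not the real protobuf library: it reads a message produced by a newer schema using an older schema that only knows field numbers 1 and 2. The schema, the field names and the sample bytes are all assumptions for the sake of the example.

```python
from typing import Dict, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Hypothetical old schema: the client only knows about fields 1 and 2.
OLD_SCHEMA: Dict[int, str] = {1: "user_id", 2: "text"}


def parse_with_old_schema(buf: bytes) -> dict:
    message = {"user_id": 0, "text": ""}  # unset fields keep their default values
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        name = OLD_SCHEMA.get(field_number)
        if name is None:
            continue  # unknown field number: already consumed, so just move on
        message[name] = value.decode("utf-8") if wire_type == 2 else value
    return message


# Bytes from a hypothetical newer sender: fields 1 and 2 as before,
# plus a new string field 3 ("en") and a new varint field 4 (1).
new_message = bytes.fromhex("089601120568656c6c6f1a02656e2001")
print(parse_with_old_schema(new_message))
```

Because the wire type in each tag says how long the value is, the old client can step over fields 3 and 4 without knowing what they mean. Real implementations also handle the remaining wire types and typically keep unknown fields around rather than dropping them; this sketch does neither.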

Written by Andrew Dawson

Senior software engineer with an interest in building large scale infrastructure systems.
