Thursday, July 10, 2008

First look at Google's Protocol Buffers.

First the hype: Google wants you to think of Protocol Buffers as the new XML.
As such, they list five reasons on the overview page why Protocol Buffers have advantages over XML.
http://code.google.com/apis/protocolbuffers/docs/overview.html

1. Simpler
2. 3 to 10 times smaller
3. 20 to 100 times faster
4. less ambiguous
5. easier to program and generate data access classes

Two comments come to mind right off the bat.

Some of the performance gains come from poorly formed XML design; as such, there are cases where the Protocol Buffer will be more than 10 times smaller and more than 100 times faster.
So it might condense things to say 3 or more times smaller and 20 or more times faster.
Let's look at the examples Google provides. Here is an XML snippet:

<person>
<name>John Doe</name>
<email>jdoe@example.com</email>
</person>

And here is the text version, which is NOT the Protocol Buffer as it is actually stored:

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
name: "John Doe"
email: "jdoe@example.com"
}

The 'NOT' is important here because a Protocol Buffer is not stored in a human-readable text format.
It is stored as binary, which means human readability is gone.
No matter, but looking at the example it is hard to see how, if I use an absurdly long name, the XML would be three times larger than the Protocol Buffer.
In fact, as I push the name out to large lengths, the ratio of the two sizes should approach 1 to 1, because the fixed tag overhead is swamped by the data itself.
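To put rough numbers on that, here is a minimal Python sketch that hand-encodes the two strings the way the binary wire format lays out length-delimited fields (a tag byte, a varint length, then the raw bytes) and compares the result with whitespace-free XML. The field numbers 1 and 2, the helper names, and the XML element names (chosen to mirror the field names in the text version) are my assumptions for illustration; this is not Google's library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: tag, length, then the UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_person(name: str, email: str) -> bytes:
    # Field numbers 1 (name) and 2 (email) are assumed for this sketch.
    return encode_string_field(1, name) + encode_string_field(2, email)


def xml_person(name: str, email: str) -> bytes:
    # Whitespace-free XML, the most compact form of the snippet.
    doc = f"<person><name>{name}</name><email>{email}</email></person>"
    return doc.encode("utf-8")


def size_ratio(name: str, email: str) -> float:
    """XML size divided by hand-encoded binary size."""
    return len(xml_person(name, email)) / len(encode_person(name, email))


if __name__ == "__main__":
    email = "jdoe@example.com"
    for name in ("John Doe", "x" * 100, "x" * 10_000):
        print(len(name), round(size_ratio(name, email), 3))
```

For the John Doe example this comes out to 69 bytes of XML against 28 bytes of binary, a ratio of about 2.5 to 1. With a 10,000 character name the ratio drops to almost exactly 1 to 1.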

The "faster" claim is easy to see.
We will not be opening and closing a bunch of tags, and the tags themselves have opening and closing brackets that aid human comprehension but become repetitive dead weight for a parser.
But for the counterexample used for size, it is hard to see how the same example would show a speed ratio between the formats of much more than 2 to 1.

Now for the upside.
Clearly it is less ambiguous. If you have tried to make XML live up to its hype, you know the pitfalls of its ambiguity.
The "easier to program and generate data access classes" claim is a function of Protocol Buffers being more constrained in what they support and what is legal.
History repeats itself here: XML was the same "simplify to gain power" condensing of SGML.
Likewise, HTML was a simplified application of SGML.

There is another fundamental divergence from XML. You could use XML without documenting the schema.
In classic software fashion, this was its greatest strength: XML spread rapidly because there were no rules.
It was also its greatest weakness: because there were no rules, we were never 100% sure what documents meant or whether they were valid or complete.
Protocol Buffers are different in that they are defined in "Protocol Buffer Message Definitions" stored in a ".proto" file.
These are similar to XML's DTDs or Schemas.
The difference is that they are now required.
Again there is the history flashback: HTML did not require closing tags, XML did, and that made all the difference.
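The practical effect of a required definition can be sketched in a few lines of Python. XML parses happily with no schema at all, so a misspelled tag goes unnoticed until a lookup comes back empty. The binary format, by contrast, carries only field numbers, so the bytes mean nothing until a definition says which number is which. The PERSON_SCHEMA table, its field numbers, and the decode_strings helper below are hypothetical stand-ins for what a ".proto" file and its generated classes would provide; the sketch handles string fields only.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a .proto definition: field number -> field name.
PERSON_SCHEMA = {1: "name", 2: "email"}

# Hand-written wire bytes for name="John Doe", email="jdoe@example.com".
WIRE = b"\x0a\x08John Doe\x12\x10jdoe@example.com"


def decode_varint(data: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_strings(data: bytes, schema: dict) -> dict:
    """Decode length-delimited string fields, naming them via the schema."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(data, pos)
        payload = data[pos:pos + length]
        pos += length
        name = schema.get(field_number)
        if name is not None:  # numbers the schema does not know are dropped
            fields[name] = payload.decode("utf-8")
    return fields


# XML needs no schema: the misspelled <emial> tag parses without complaint.
root = ET.fromstring(
    "<person><name>John Doe</name><emial>jdoe@example.com</emial></person>"
)

if __name__ == "__main__":
    print(root.findtext("email"))               # the typo surfaces only here
    print(decode_strings(WIRE, PERSON_SCHEMA))  # readable with the schema
    print(decode_strings(WIRE, {}))             # nothing usable without it
```

Without the table, the decoder can still split the bytes into numbered chunks, but it has no way to say that field 2 is an email address. That is the sense in which the definition is no longer optional.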
