Monday, October 4, 2010

Data Serialization standards.... beyond XML

I am reading a series of EXCELLENT pages on Wikipedia:




I am not saying one is better than the other. Each has pros and cons.

XML disturbs me for its prolixity, the entities like &lt; and &gt; and &amp;, and the closing tags. I like it because it's fairly robust and unambiguous, Namespaces aside.

JSON disturbs me for its quotes. I like its simplicity.

YAML disturbs me for its reliance on indentation and special notations. I like its rigorous representation of complex data structures.
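To make the comparison concrete, here is a minimal sketch that writes the same small record as XML and as JSON using only the Python standard library. The record and the helper names (to_xml, to_json) are made up for illustration. YAML is left out only because it would need a third-party library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, chosen so that the values contain "&" and "<".
record = {"name": "Tom & Jerry", "expr": "a < b"}

def to_xml(rec):
    """Serialize a flat dict as one XML element per key."""
    root = ET.Element("record")
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")

def to_json(rec):
    """Serialize the same dict as JSON."""
    return json.dumps(rec)

xml_text = to_xml(record)
json_text = to_json(record)

print(xml_text)
print(json_text)
print(len(xml_text), "characters of XML vs", len(json_text), "of JSON")
```

The XML output carries the entities and the closing tags, the JSON output carries the quotes: exactly the trade-offs listed above.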

But the huge question mark rising in my head is: why on Earth do we need a format that is directly readable/editable by humans?
So that one can use Notepad or vi to edit them?

Ok, then I will give you a YAMLNotepad and a YAMLvi, and we can move away from a (necessarily inefficient) human-readable format and use a more rigorous binary representation, borrowed for instance from the Java .ser format, the Coherence POF format, TIBCO, or whatever other high-speed serialization standard.
We will gain in performance, validation, and readability (you can customize the program to display the data in whatever format you want, attaching the stylesheets of your choice)...
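A rough sketch of that idea, assuming a made-up fixed schema (id, price, name): the bytes on the wire are a compact binary layout, and a separate viewer function renders them in whatever format you like for display (JSON here). This is NOT the Java .ser or the Coherence POF format, just Python's struct module used as an illustration; encode, decode and render are hypothetical names.

```python
import json
import struct

# Hypothetical wire layout, network byte order, no padding:
# id (uint32), price (float64), name length (uint16), then the UTF-8 name.
HEADER = struct.Struct("!IdH")

def encode(rec):
    """Pack a record into the binary wire format."""
    name_bytes = rec["name"].encode("utf-8")
    return HEADER.pack(rec["id"], rec["price"], len(name_bytes)) + name_bytes

def decode(data):
    """Unpack the binary wire format back into a dict."""
    rec_id, price, name_len = HEADER.unpack_from(data, 0)
    name = data[HEADER.size:HEADER.size + name_len].decode("utf-8")
    return {"id": rec_id, "price": price, "name": name}

def render(data):
    """The 'viewer': show the binary record to a human as indented JSON."""
    return json.dumps(decode(data), indent=2)

record = {"id": 42, "price": 9.5, "name": "widget"}
wire = encode(record)

print(len(wire), "bytes on the wire vs", len(json.dumps(record)), "as JSON text")
print(render(wire))
```

The point is only the separation: render could just as well emit XML or YAML without touching what travels on the wire.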

Why on Earth must the actual data on the wire be in the same format that we display to the user? To me it's only because it's difficult to get a standard supported across multiple products and platforms, and an ASCII editor is the most universal tool available. Too bad though... so much is lost.
