Saturday, February 26, 2011

XML Streaming Parsing: a desert populated by skinny animals


Having to implement an XML streaming parser is no fun these days.

On one hand, you have JAXB, a solid technology. Wonderful. But it doesn't handle streaming, so if you have a 1 GB document to parse, you are screwed.

On the other hand, you have SAX and StAX. Well, I have played with StAX for a few hours, and I can't really see the improvement over SAX, apart from the silly "pull" approach instead of the "push" approach... so what? Basically you still have to handle individual atomic events (startElement, characters, endElement) and manually reconstruct your Java XmlEntity from them, which is painful if you have complex entities.

StAX is useful only if you need to parse bits and pieces of information here and there, without having to parse the whole thing. Otherwise, FIASCO.
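To give an idea of what "manually reconstruct" means in practice, here is a minimal sketch of a StAX pull loop. The Book entity and the book/title/author element names are made up for this illustration; only the javax.xml.stream calls are the real API. Even for an entity with two fields you already need state variables and a text buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxBookReader {

    // Hypothetical entity, used only for this illustration.
    static class Book {
        String title;
        String author;
    }

    // Pulls events one by one and rebuilds Book objects by hand.
    static List<Book> readBooks(String xml) {
        List<Book> books = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Book current = null;                      // entity being assembled, if any
            StringBuilder text = new StringBuilder(); // text of the element being read
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0);
                        if ("book".equals(reader.getLocalName())) {
                            current = new Book();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Text may arrive in several chunks, so accumulate it.
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String name = reader.getLocalName();
                        if (current != null) {
                            if ("title".equals(name)) {
                                current.title = text.toString().trim();
                            } else if ("author".equals(name)) {
                                current.author = text.toString().trim();
                            } else if ("book".equals(name)) {
                                books.add(current);
                                current = null;
                            }
                        }
                        break;
                    default:
                        break; // comments, whitespace, etc. are ignored here
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return books;
    }
}
```

Every new field or nested element means another branch in that switch, which is exactly the pain described above.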


Finally I resorted to using the excellent java.util.Scanner class to locate the start of an element and read until the end of that element into a String, then using JAXB to parse the String into an XmlEntity. It works like a charm.

You just need to annotate your entity as @XmlRootElement. No worries: you can have multiple @XmlRootElement entities in your file, and you can even annotate a static nested class.
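The Scanner part of this approach could look roughly like the sketch below. It is a reconstruction under assumptions, not my exact code: the ElementChunker class and forEachElement method are names invented for the example, and the regular expression assumes the element is never nested inside itself, is never written in self-closing form, and that its tags never appear inside comments or CDATA. Each matched chunk is handed to a callback, which is where the JAXB unmarshalling would go.

```java
import java.io.Reader;
import java.util.Scanner;
import java.util.function.Consumer;
import java.util.regex.Pattern;

class ElementChunker {

    // Finds each <elementName ...>...</elementName> block in the input and
    // passes it, as a String, to the handler. Only one element at a time
    // needs to be held in memory, as long as matches keep being found.
    static void forEachElement(Reader input, String elementName, Consumer<String> handler) {
        String tag = Pattern.quote(elementName);
        // Assumption: no nested or self-closing occurrences of the element.
        Pattern element = Pattern.compile(
                "<" + tag + "(\\s[^>]*)?>.*?</" + tag + "\\s*>", Pattern.DOTALL);
        try (Scanner scanner = new Scanner(input)) {
            String chunk;
            // Horizon 0 means "search as far as needed"; null means no more matches.
            while ((chunk = scanner.findWithinHorizon(element, 0)) != null) {
                // With JAXB, the handler would typically call something like
                // unmarshaller.unmarshal(new StringReader(chunk)) on a class
                // annotated with @XmlRootElement.
                handler.accept(chunk);
            }
        }
    }
}
```

A caller would pass a FileReader (or any other Reader) over the big document, the element name, and a lambda that unmarshals each chunk. One caveat: if the element never appears, the Scanner may end up buffering the whole input while looking for it.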


I see here a big need for a streaming technology that can still make use of JAXB... it seems not too difficult to implement... or some adjustments on StAX so that it can call JAXB...
