Over the past few months, I’ve become quite interested in parsing XML and, more specifically, in parsing XML in parallel.
I won’t go into detail here on what I’ve been up to, but I do want to point out that this topic is attracting interest from significant industry players. For example, here are two data points:
Parsing of XML documents has been recognized as a performance bottleneck when processing XML. One cost-effective way to improve parsing performance is to use parallel algorithms and leverage the use of multi-core processors. Parallel parsing for XML Document Object Model (DOM) has been proposed, but the existing schemes do not scale up well with the number of processors. Further, there is little discussion of parallel parsing methods for other parsing models. The question is: how can we improve parallel parsing for DOM and other XML parsing models, when multi-core processors are available?
Intel Corp. released a new software product suite that is designed to enhance the performance of XML in service-oriented architecture (SOA) environments, or other environments where XML handling needs optimization. Intel XML Software Suite 1.0, which was announced earlier this month, provides libraries to help accelerate XSLT, XPath, XML schemas and XML parsing. XML performance was found to be twice that of open source solutions when Intel tested its product …
As someone with a vested interest in XML, I regard data points such as these as very positive overall.