FTPOnline - Parse XML Docs With the XMLReader

Parse XML Docs With the XMLReader
Use a stateless parser to process XML documents quickly.
by Jon Rauschenberger

VSLive! SF, Day 3, February 14, 2002: Parsing XML documents is inherently inefficient. XML is a text-based format that can contain extremely complex data. If you need to write server-side code to process large volumes of XML documents, it's important to understand how your XML parser processes the documents. There are two basic types of XML parsers: stateful and stateless. Stateful parsers process an entire document in one pass and construct a data structure based on the contents of the document. You can then access the contents of the document by programming against the structure the parser returns. Stateless parsers process an XML document one element at a time and expose the contents of each element to the caller as they traverse the document. You are responsible for capturing the data you're looking for as the parser passes over it; the parser does not populate a structure for you.

Most developers are familiar with stateful parsers based on the Document Object Model (DOM). These parsers are easy to work with and process XML documents quickly. DOM parsers are, however, inefficient when it comes to processing XML in a multiuser server environment. Because the parser populates a structure representing the contents of the entire XML document, its memory use grows with the size of the document, and on a server that cost is paid for every request being processed at the same time. In addition, if you need to extract only a few values from the document, a stateful parser still has to process the entire document before it can expose them. In most instances, you are better off using a stateless parser in server-side code. It's more work to write the code, but the reduced resource utilization easily outweighs the additional development work.
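For contrast, here is a minimal sketch of the stateful approach, assuming the .NET Framework's XmlDocument class as the DOM parser. The helper name CountCustomersWithDom and the sample document (Customer elements with Gender and Age attributes) are hypothetical and chosen only for illustration. Note that the whole document is parsed into a tree before the first value can be read:

```vb
Imports System
Imports System.Xml

Module DomSketch

    'Hypothetical helper: counts Customer elements by first loading
    'the entire document into an in-memory DOM tree
    Public Function CountCustomersWithDom _
        (ByVal CustomersXml As String) As Integer

        Dim CustomersDom As New XmlDocument()
        'LoadXml parses the whole string and builds the tree
        CustomersDom.LoadXml(CustomersXml)

        'Every node in the document is already in memory here
        Dim CustomerNodes As XmlNodeList = _
            CustomersDom.GetElementsByTagName("Customer")

        Return CustomerNodes.Count
    End Function

    Sub Main()
        'Hypothetical sample document
        Dim SampleXml As String = _
            "<Customers>" & _
            "<Customer Gender=""F"" Age=""34"">Ann</Customer>" & _
            "<Customer Gender=""M"" Age=""51"">Bob</Customer>" & _
            "</Customers>"
        Console.WriteLine(CountCustomersWithDom(SampleXml))
    End Sub

End Module
```

Even if the caller wants only the first customer, the tree for every customer has been built by the time LoadXml returns, which is the cost the paragraph above describes.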

The .NET Framework provides both stateful and stateless XML parsers. This code shows how to parse an XML document containing customer information using the XmlTextReader class, a stateless, forward-only parser derived from XmlReader. Note that you have to call a method on the parser to extract the value of the element or attribute you're looking for. The parser extracts only the values you ask for:

Imports System.Xml

Public Function ProcessCustomersXml _
    (ByVal CustomersDocument As String, _
    ByVal NumberToProcess As Integer) As Integer

    'Create a new instance of an XmlTextReader, with 
    'the CustomersDocument as the FileName
    Dim CustomersReader As New _
        XmlTextReader(CustomersDocument)
    Dim RecordsProcessed As Integer
    Dim Gender As String
    Dim Age As String
    Dim CustomerName As String

    Try
        'Read each Customer node and increase the 
        'RecordsProcessed by 1
        While CustomersReader.Read()
            'Match only the start tag of a Customer element
            If CustomersReader.NodeType = XmlNodeType.Element _
                AndAlso CustomersReader.Name = "Customer" Then
                'Get attribute values
                Gender = CustomersReader.GetAttribute("Gender")
                Age = CustomersReader.GetAttribute("Age")
                'Get the element value. ReadString leaves the
                'reader on the end tag, so the next Read() does
                'not skip an adjacent Customer element
                CustomerName = CustomersReader.ReadString()

                'Increment the counter
                RecordsProcessed += 1

                'Check to see if we have processed the requested 
                'number of elements (0 means no limit)
                If NumberToProcess <> 0 AndAlso _
                    NumberToProcess <= RecordsProcessed Then
                    Exit While
                End If
            End If
        End While
    Finally
        'Release the underlying file even if parsing fails
        CustomersReader.Close()
    End Try

    Return RecordsProcessed

End Function
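
As a usage sketch, the same kind of read loop can be exercised against a small in-memory document. Everything specific here is an assumption: the article does not show the customer document, so the sample XML, the helper name ReadCustomerNames, and the MaxNames parameter are hypothetical. A StringReader stands in for the file so the example runs without one. The sketch uses ReadString, which leaves the reader on the element's end tag, so the following Read call moves straight to the next element:

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ReaderSketch

    'Hypothetical helper: returns the text of the first MaxNames
    'Customer elements as a comma-separated string, reading no
    'further into the document than it has to
    Public Function ReadCustomerNames _
        (ByVal CustomersXml As String, _
        ByVal MaxNames As Integer) As String

        Dim Names As String = ""
        Dim Found As Integer = 0
        Dim Reader As New XmlTextReader(New StringReader(CustomersXml))

        Try
            'Stop pulling nodes as soon as enough names are found
            While Found < MaxNames AndAlso Reader.Read()
                If Reader.NodeType = XmlNodeType.Element _
                    AndAlso Reader.Name = "Customer" Then
                    If Found > 0 Then Names &= ","
                    'The caller, not the parser, keeps the value
                    Names &= Reader.ReadString()
                    Found += 1
                End If
            End While
        Finally
            Reader.Close()
        End Try

        Return Names
    End Function

    Sub Main()
        'Hypothetical sample document
        Dim SampleXml As String = _
            "<Customers>" & _
            "<Customer Gender=""F"" Age=""34"">Ann</Customer>" & _
            "<Customer Gender=""M"" Age=""51"">Bob</Customer>" & _
            "<Customer Gender=""F"" Age=""27"">Cat</Customer>" & _
            "</Customers>"
        'Prints Ann,Bob
        Console.WriteLine(ReadCustomerNames(SampleXml, 2))
    End Sub

End Module
```

Because the loop stops after the second match, the third Customer element is never visited, which is where the resource savings of a stateless parser come from.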

About the Author
Jon Rauschenberger is a partner and the director of technology at Clarity Consulting Inc., a Chicago-based information technology consulting firm and Microsoft Gold Certified Partner. In addition to architecting and building scalable Web-based solutions, Jon is a frequent speaker at conferences such as Microsoft TechEd, VBITS, Comdex, and DevDays. Jon is also the MSDN regional director for Chicago. Contact Jon at jrausch@claritycon.com.