Exploit XML-Word Interoperability
Discover new ways you can use XML to make Word more interoperable.
by Mitch Gitman

Tech•Ed, May 27, 2004

Are you frustrated by how difficult it is to produce structured data with Microsoft Word for use outside the Office world? Hopefully, you haven't resorted to writing your own word processor. The answer might lie instead with the new XML features of Microsoft Office Word 2003.

In this article, I'll examine some of the new ways you can use XML to make Word more interoperable. I'll look at these from the standpoint of someone who has bought into Microsoft's broader vision of interoperability, which revolves around the .NET Framework and the XML Schema language for defining the structure and data types of XML documents.

WordprocessingML
With Word 2003, you can now save a Word document as a text .xml file and not lose any data that would otherwise be present in a binary .doc file. The two file formats are equivalent, and Word can go back and forth seamlessly, opening a file in one and saving in the other.

The foundation for this XML support lies in Word's XML representation of its document model, called WordprocessingML, or just WordML. Strictly speaking, WordML is an XML schema that defines what a valid Word XML document looks like. Other products in Microsoft Office 2003 have their own schemas; for example, Excel 2003 uses SpreadsheetML.

More precisely, WordML comprises eight schemas, each assigned its own namespace. In any Word-generated WordML document, you'll find these schemas declared using these xmlns:prefix="namespace" combinations:

  • w="http://schemas.microsoft.com/office/word/2003/wordml"
  • v="urn:schemas-microsoft-com:vml"
  • w10="urn:schemas-microsoft-com:office:word"
  • sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
  • aml="http://schemas.microsoft.com/aml/2001/core"
  • wx="http://schemas.microsoft.com/office/word/2003/auxHint"
  • o="urn:schemas-microsoft-com:office:office"
  • dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
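
To check which of these namespaces a given file actually declares, you can scan its xmlns declarations. Here is a minimal Python sketch that uses only the standard library. The function names and the tiny sample document are hypothetical, made up for illustration; a real Word-generated file would be much larger and would normally declare all eight.

```python
import io
import xml.etree.ElementTree as ET

# The eight prefix/namespace pairs listed above.
WORDML_NAMESPACES = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "v": "urn:schemas-microsoft-com:vml",
    "w10": "urn:schemas-microsoft-com:office:word",
    "sl": "http://schemas.microsoft.com/schemaLibrary/2003/core",
    "aml": "http://schemas.microsoft.com/aml/2001/core",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
    "o": "urn:schemas-microsoft-com:office:office",
    "dt": "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882",
}


def declared_namespaces(xml_text):
    """Return {prefix: uri} for every xmlns declaration in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    found = {}
    # "start-ns" events yield a (prefix, uri) tuple per declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        found[prefix] = uri
    return found


def missing_prefixes(xml_text):
    """List the conventional prefixes whose namespace URI is not declared."""
    declared_uris = set(declared_namespaces(xml_text).values())
    return sorted(
        prefix
        for prefix, uri in WORDML_NAMESPACES.items()
        if uri not in declared_uris
    )


# Hypothetical hand-written sample that declares only two namespaces.
SAMPLE = (
    "<w:wordDocument"
    ' xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"'
    ' xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">'
    "<w:body/></w:wordDocument>"
)

if __name__ == "__main__":
    print(declared_namespaces(SAMPLE))
    print(missing_prefixes(SAMPLE))
```

The comparison is made on the namespace URI rather than the prefix, because the prefix is only a local shorthand and a document is free to bind the same namespace to a different one.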

Some of these schemas merit an explanation:

  • w: This is the core WordML schema that defines key elements and data types such as the root wordDocument element. You could get away with manually defining an entire WordML document without straying from this namespace.
  • aml: The Annotation Markup Language schema defines annotations such as comments, revision history, and bookmarks. Often, this is the kind of extra data that delivers enormous value during the authoring process but can be stripped from the published document.
  • wx: Auxiliary hints that Word ignores but that can be helpful to an outside XML processor.
  • o: Properties that are common across the Office application suite.
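
To illustrate the point about the w namespace, here is a rough Python sketch that assembles a bare-bones WordML document using nothing but w-prefixed elements. Only the wordDocument root is named in this article; the body, p, r, and t elements and the mso-application processing instruction come from my reading of the WordML schema, so treat them as assumptions to check against the downloadable .xsd files. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
# Ask ElementTree to serialize this namespace with the conventional "w" prefix.
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a local element name with the core WordML namespace."""
    return "{%s}%s" % (W_NS, tag)


def build_minimal_wordml(paragraphs):
    """Return a WordML string with one paragraph per input string.

    Assumed structure: wordDocument > body > p (paragraph) > r (run) > t (text).
    """
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    for text in paragraphs:
        para = ET.SubElement(body, w("p"))
        run = ET.SubElement(para, w("r"))
        ET.SubElement(run, w("t")).text = text
    xml_body = ET.tostring(root, encoding="unicode")
    # Assumption: this processing instruction marks the file as a Word document.
    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return header + xml_body


if __name__ == "__main__":
    print(build_minimal_wordml(["Hello from WordML.", "A second paragraph."]))
```

A file produced this way carries none of the formatting, annotations, or hints that Word itself would write, which is precisely why it stays within a single namespace.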

You can download the .xsd files for each of these namespaces. Start at the Office 2003 XML Reference Schemas Licensing page (see Resources).

Well, this is all very well and good, but how does WordML relate to your own XML data structures?
