Microsoft Word, XML and i4i

Okay, has anyone got a clue about the real issue with the i4i vs Microsoft case? Last August there was a ruling that Microsoft would be banned from selling copies of Word that opened XML files (DOCX etc), because they had infringed upon a patent held by the Canadian company i4i. Microsoft recently lost the appeal and are now having to remove the technology covered by the patent. But…

What technology?

I’ve been going through the various articles available about this, and sadly it is abundantly clear that not a single journalist who has covered this story in the main press has a clue about what the patent actually covers. Half of them seem to be claiming that it is Microsoft’s use of XML itself, which is clearly errant nonsense as i4i don’t own a patent on XML. Others refer to Microsoft’s use of “custom XML”, which is a bit like saying that an author has infringed copyright for using “some text”. Most seem to imply that Microsoft won’t be allowed to have versions of Word that will open .docx, .docm etc, but to me that sounds unlikely. Meanwhile, Microsoft are saying that they are patching Word to remove this “little-used business feature”, which they describe as “obscure”.

Reading between the lines, it seems as though Microsoft used something very specific in their XML readers and writers and that infringed upon i4i’s patent, and that they will remove this chunk of code without it having much impact on the end-user. As far as I can tell, this won’t entail them dropping formats such as .docx, which has been accepted as an ISO standard and is just an XML spec at the end of the day.

But does anyone have a clearer understanding as to what it is exactly that has been infringed, and exactly what it is that MS are having to drop from future versions of Word? The reporting on this has been particularly poor and frustrating.

Thanks and all the best,
Keith

I’m afraid I know nothing about this, but I couldn’t resist saying how much I like the idea of “errant nonsense” – roaming around the world and getting into all sorts of adventures. Nonsense does seem to have a propensity for errantry. (Sorry, too many y’s there.)

Best wishes,

Martin BB.

I just like the adjective “errant” because of Churchill’s use of “errant pedantry”. Still, I defend my use because the nonsense that XML could be owned by i4i is entirely off track. :slight_smile:

Anyway, anyone with any insights into the truth or facts behind this story, please spill!

All the best,
Keith

From the legal “crap” I had to answer on our side, my understanding is that the patent is for an explicit mapping of element to display methodology. As explained to me by the legal eagle that keeps me in line, as long as you are using the standard XML dtd method (no dynamic data typing) you should be fine. Not having actually cared enough to read the patent, I may have something “not quite right” in that statement.

If you want I can go sit on someone’s head and find you a pointer to that actual patent and technical evaluation.

I have an imperfect understanding of this, but it’s the most thorough explanation I’ve yet seen.

HTML is “HyperText Markup Language.” You can write in HTML and a browser will display what you’ve written because everyone has agreed on exactly what commands are recognized in HTML.

and so on. (Anyone who has tried to code webpages that will work both in IE and any other browser will laugh at the "Everyone agrees" bit.)

XML is “eXtensible Markup Language.” It has a lot of similarities with HTML, but it allows you to define a language with your own commands.

With your own set of commands, you can design XML languages for very specific purposes. For example: banking software. Remember how hard it used to be to get a bank program to understand specific banks? This is why it’s so much easier now. The banks have come together and agreed on an XML language that allows banking programs to connect to all of their interfaces without needing translation.

So as I understand it, i4i is claiming that MS stole the XML code that they developed for the displaying of documents and integrated it into MS Word. Presumably not .

Ah – a little slip, I believe. What Churchill is supposed to have written was “arrant pedantry”, not errant. He may not even have written it, according to some. But I’ll leave others to argue about that! It is still a fine sentence, whoever wrote it.

All the best,

Martin.

PS: My OED tells me that “arrant” is a Middle English variant on “errant”. But in modern English the two words do have different meanings.

PPS: for more about the “Churchill” quote, see

groups.google.com/group/alt.engl … 9923cd662c

Hmm, so it’s just that Microsoft are using a specific set of XML names or suchlike? I would have thought it must be more than that. XML is nothing more than a format, really (I’ve recently converted all of Scrivener’s internal files to XML; many file formats are XML-based). I did see a reference to the patent somewhere, but I have to admit I didn’t follow the link so I’ll have to look it up again.

Thanks and all the best,
Keith

I think the whole confusion stems from the fact that the patent issue is about something that MS calls “Custom XML”, which isn’t really XML. It’s just a technique to store raw file data in an XML-based MS Office document. You can read about it here:

milan.kupcevic.net/custom-xml-mi … 7449-msdn/

Again, imperfect understanding here, but the situation is more related to how elements are interpreted for document display. Not sure if this is rendering (low level) or dtd level (there is a mention of “dynamic dtd” in my notes). All our apps checked out OK and we are using XML the same way that you are, as storage for data.

Precisely: “Custom XML.” XML has its own built-in set of commands, then you customize from there.
(Excuse all the nautical examples this morning, I’m preparing to take my almost-toddler to the aquarium this morning)

So they patented their customization. MS used it. Now they have to stop.

Kendric - no, that’s not quite right. XML has no set of commands, it just defines a syntax. The creator of an XML file then defines the names of elements, which elements have subelements and the names of those subelements, any attributes elements may have, and the meaning of these elements and attributes within the context of the application that will use them. Technically, the term “custom XML file” could be used to describe any XML file, as there is no one set of elements that any XML file should have in order not to be a custom XML file. The term “custom XML” is pretty meaningless, in fact. And I doubt you could patent any XML file format. Consider the OPML file format. This is rather simple:

<opml>
<head>
</head>
<body>
<outline text="Title of this OPML element"></outline>
</body>
</opml>

(OPML is actually quite bad - the “outline” elements should really allow for a string value rather than placing the text in an attribute, given that strict XML doesn’t allow proper whitespace in attributes, but that’s another topic entirely.)
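The point about whitespace in attributes can be checked with a few lines of Python. This is only a sketch using the standard library's ElementTree parser; the two-line string and the snippet are made up for illustration and are not taken from any real OPML file. It stores the same text once in an attribute and once as element content, then prints what the parser hands back for each.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML-like snippet: the same two-line string is stored once
# in an attribute and once as element text.
SNIPPET = (
    '<outline text="line one\nline two">'
    'line one\nline two'
    '</outline>'
)


def read_both(xml_source):
    """Return (attribute value, element text) as the parser reports them."""
    element = ET.fromstring(xml_source)
    return element.get("text"), element.text


attribute_value, element_text = read_both(SNIPPET)
print(repr(attribute_value))  # the line break has been normalized to a space
print(repr(element_text))     # the line break survives in element content
```

A literal line break inside an attribute comes back as a plain space, while the same break in element content is kept, which is why putting body text in an attribute is awkward.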

You couldn’t patent the use of the word “outline” in XML elements; that would be ludicrous.

Many thanks to signinstranger for the link - I hadn’t seen that, and finally there is an article that clarifies things. It turns out that the term “custom XML” is indeed entirely meaningless - they are not talking about XML at all:

Again from the article signinstranger linked to:

In fact, although that article gets rather technical, it seems that the patent doesn’t refer to XML, but to a way of storing any document inside the file, linked within the XML or main file. That is, an MS Word document can have any other file added to it (e.g. a Photoshop file, a sound file, anything) and that file will be stored inside the .docx file (because .docx is just a renamed .zip file) and linked inside the main XML file using some kind of “metacode map”. And it is this - the way the raw data of another file can be stored inside the .docx file and linked within it - that seems to be infringing on i4i’s patent. (At this stage, I wonder how it is that Apple’s .pages format doesn’t infringe on the patent, too, as that does much the same thing - presumably it is down to the particular way of mapping the files between the raw data and the XML file.)
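To make the "zip file with an XML index" idea concrete, here is a rough Python sketch. Everything named in it (main.xml, the attachments/ folder, the attachment element and its target attribute) is hypothetical and chosen only for illustration; it is not the real .docx layout and says nothing about how the patented mapping works. It just shows raw data sitting next to an XML part that points at it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# All part names and the <attachment> element are hypothetical; this is not
# the real .docx layout, only the general "zip plus XML index" idea.
MAIN_PART = "main.xml"
RAW_PART = "attachments/sound.bin"


def build_package(raw_bytes):
    """Write a zip holding one XML part that points at one raw-data part."""
    root = ET.Element("document")
    ET.SubElement(root, "attachment", {"target": RAW_PART})
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(MAIN_PART, ET.tostring(root, encoding="unicode"))
        package.writestr(RAW_PART, raw_bytes)
    return buffer.getvalue()


def read_attachment(package_bytes):
    """Follow the link in the XML part and return the raw bytes it names."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(MAIN_PART))
        target = root.find("attachment").get("target")
        return package.read(target)


original = bytes(range(256))          # stand-in for any binary file
package_bytes = build_package(original)
recovered = read_attachment(package_bytes)
print(recovered == original)
```

The raw file never has to be valid XML here: the XML part only carries a name, and the zip container carries the bytes.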

So, in other words, Microsoft misleadingly used the term “custom XML” in its own documentation of this feature, which has led a lot of journalists to declare wrongly that i4i have somehow patented XML itself, and a lot more to state wrongly that Microsoft is being forced to withdraw its .docx format. Both these assertions are unfounded and untrue, and I’m truly surprised to see them in so much of the coverage of the case. Really, the entire result of the case - other than i4i receiving millions of pounds - is that Microsoft have to remove a minor feature of their XML-reading code that will have no impact on the vast majority of Word users. DOCX will continue to be the default Word format, and has not been nixed by this case at all.

Many thanks again to signinstranger for the link which explains the issues and clarifies the misleading terminology.

All the best,
Keith

In that case I’m glad I started with “I have an imperfect understanding of this.” :smiley:

I think just about everyone who wrote about it on most sites and most magazines had an imperfect understanding. :slight_smile: From the way it was described on other sites, your description was on the ball, which is what was confusing me; it’s a shame not many of the journalists who have written about this have tried to understand the technical jargon to make it more comprehensible.

All the best,
Keith

Hmm… I wonder what folks were talking to me about then. Thanks for the info (and the analysis KB). I will need to have folks make sure they actually read this thing over again.

Well you seem to have analysed the situation much better than most of the so-called ‘IT press’.

The technology in dispute is a method for attaching arbitrary file types to documents. MS just calls it Custom XML because they’re using it to bundle data with their XML-based office storage format. It is probably similar to the way you use the MacOSX file packages to create Scrivener project files. It doesn’t really have anything to do with XML at all, and will have zero impact on Office users because MS will simply remove the tech from Office 2007 (the only version of Office that uses it).

I think MS thought it would be cheaper to fight it out in court rather than change the code – seems they may have been wrong.

I’m not versed on the internal workings of .docx or .pages files, but I’m assuming that .docx is a single file, and not a packaged directory. If that’s true, then I’m betting the patented tech is a fancy implementation of the old Unix “uuencode/uudecode” program, which can turn a binary file (images, executable files, etc…) into ascii text and back again. I’m thinking that you can’t normally splice non-text data into xml, so they had to invent or borrow a way to do it.
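As a rough illustration of the binary-to-text idea, here is a Python sketch. It uses base64 rather than uuencode (the same general trick, and still in the standard library), and the blob element with its encoding attribute is a made-up name for illustration. Nothing here is claimed to be what Word or the patent actually does.

```python
import base64
import xml.etree.ElementTree as ET


def embed(raw_bytes):
    """Wrap binary data as ASCII text inside a hypothetical <blob> element."""
    blob = ET.Element("blob", {"encoding": "base64"})
    blob.text = base64.b64encode(raw_bytes).decode("ascii")
    return ET.tostring(blob, encoding="unicode")


def extract(xml_source):
    """Parse the element and decode its text back to the original bytes."""
    blob = ET.fromstring(xml_source)
    return base64.b64decode(blob.text)


# Sample bytes, including values that are not legal as raw XML text
# (0, 1, 2) and the byte values of '<', '>' and '&'.
binary_data = bytes([0, 1, 2, 255, 254, 60, 62, 38])
xml_text = embed(binary_data)
round_tripped = extract(xml_text)
print(xml_text)
print(round_tripped == binary_data)
```

The encoded text uses only letters, digits and a few punctuation characters, so it can sit inside an XML element without escaping, at the cost of growing by roughly a third.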

For .pages, I’d assume that the “file” is actually a package/directory much like what Scrivener does (only probably zipped so it appears as one file when copied to a windows machine).

While you can’t patent a tag that says “look in the current directory for a file named awesome.jpeg”, you can certainly patent an algorithm & data format that works like a sophisticated uuencode.

Am I totally off base here?

Actually .docx is a zipped package just like a .pages file, but I bet the technology they are using is something similar to what you say. It may be that rather than storing the other file in the package and linking to it, they are encoding it inside one of the XML files in the package somehow.

All the best,
Keith

That’s exactly what it does, and is why OOXML is a bit of a hack, IMO.