Export outlines just as OPML

shoebuddy · March 24, 2010, 6:36am

I read an earlier post that the reason outlines weren’t really exportable was because it was hard to do that across several office programs. But what about just exporting as OPML? We could then load into several available outline programs (some free) and use them to convert to whatever writing program we wanted to use.

KB · March 24, 2010, 8:24am

This is on the to-do list for a 2.x update. OPML is fairly simple to export to. However, the problem is that several programs have taken to the informal convention of inserting note text (such as synopses) in a “_note” attribute of the “outline” element. This is not great, as attribute values in XML aren’t meant to carry whitespace other than plain spaces, so the Cocoa XML classes strip this whitespace when generating an XML document. This means that exporting to OPML in such a way that the synopses would appear in the notes area in OmniOutliner or SuperNoteCard, for instance, would result in any tabs or returns in the synopses being replaced by spaces. Not a deal-breaker, but annoying, and something I want to explore more before implementing an exporter.

All the best,
Keith

Inspired · March 3, 2011, 10:50pm

Thanks Keith,
Nice to see OPML export is being considered for a future update.
Just a note to second this request.
My wish is to move data seamlessly back and forth between a mind mapping app (currently iMindMap and MindManager) and Scrivener.

My initial tests importing OPML exports from iMindMap and MindManager 8 show that notes are not imported into Scrivener.

iMindMap is using the following OPML for an outline node with notes:

The “outline text” goes in, but the “notes” are missed.

I see that MindManager uses what looks like a far more complex OPML export format, and that this does not include any notes. Which is a pity.

I suspect each mind mapping app might use its own implementation of OPML and may or may not include notes.

Anyway… those are some thoughts I wanted to add to this topic.

UPDATE:
I’ve just discovered that Scrivener will import OPML from NovaMind excellently. All node notes come through. I am mentioning this here, Keith, as it would be great if an OPML export feature in a future version of Scrivener supported importing back into NovaMind equally well.

Inspired · April 20, 2011, 7:10pm

Hi Keith,
I am in the process of transferring another one of my books from its initial plan in a mind-mapping tool, over to Scrivener for the serious writing work.

I was wondering if there has been any progress with Scriv having full OPML Export functionality? I’d love to be able to keep my mindmaps up-to-date (basically in the game) once I have started working on a book in Scrivener.

Cheers…

KB · April 20, 2011, 10:18pm

Hi,

No, I’m afraid not. The problems remain - OPML export is still off in the future because most programs use the non-standard “_note” attribute in a way that the Cocoa NSXML classes don’t work well with.

All the best,
Keith

Inspired · April 21, 2011, 1:41am

I’ll have to take your word on that one Keith… says he who is ignorant of what the _note attribute and Cocoa NSXML are…

I just imported another book from NovaMind to Scriv this morning. Works fantastically. Everything passes across without a hitch… each top branch comes out as a folder, sub branches as sub folders, all notes in their rightful place, etc. Just as if I’d created it in Scriv to begin with. Real nice. So, thanks for doing something right standard-wise (whatever that is) on your implementation of importing OPML (from NovaMind, at least).

KB · April 21, 2011, 9:07am

Sorry, that was a poor late-at-night explanation. To elaborate:

OPML is a very sparse format - the standard specs only define a single “text” attribute for each item in the outline. This is used for the title of each item. So, say you had an outline that looked like this:

Foo
Bar
  Blah

The OPML would look like this:

<outline text="Foo"></outline>
<outline text="Bar">
    <outline text="Blah"></outline>
</outline>

Note how the title of each item is stored after “text=” - this is an XML attribute. The trouble is that only this “text” attribute is part of the official OPML spec (although even the opml.org site seems to be down now). So what if you want to add other text too? For instance, for Scrivener, you would want to take the title and the synopsis across, at the very least. Other programs have had the same problem, and they have solved it by adding their own fields - but because these other fields (attributes) aren’t part of the official spec, then there’s no way of knowing if the program you take the file to will support it or not.
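The mapping from an outline to OPML described above can be sketched in a few lines. This is only an illustration in Python using the standard xml.etree.ElementTree module, not anything Scrivener does; the OUTLINE data, the function names, and the choice of version="2.0" with a head/title are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory outline: a list of (title, children) pairs.
OUTLINE = [
    ("Foo", []),
    ("Bar", [("Blah", [])]),
]


def add_items(parent, items):
    """Recursively add one <outline> element per item, title in 'text'."""
    for title, children in items:
        node = ET.SubElement(parent, "outline", {"text": title})
        add_items(node, children)


def build_opml(items, doc_title="Outline"):
    """Wrap the outline items in the usual opml/head/body skeleton."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = doc_title
    body = ET.SubElement(opml, "body")
    add_items(body, items)
    return opml


opml_root = build_opml(OUTLINE)
opml_text = ET.tostring(opml_root, encoding="unicode")
print(opml_text)
```

Only the "text" attribute is used here, which is why anything beyond titles (synopses, notes) needs an extra, non-standard attribute.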

One extra attribute has become fairly common - it’s used by OmniOutliner and SuperNoteCard, for instance - “_note”. This is the field Scrivener looks for to import synopses when importing OPML. So, suppose you have a document called “Bartleby” with the synopsis “I’d prefer not to.” In Omni and SuperNoteCard, this would be created in OPML as:

<outline text="Bartleby" _note="I'd prefer not to."></outline>

This is fine, except that text that appears as an attribute in an XML element - that is, that appears in quotation marks after an equals sign - generally shouldn’t contain whitespace other than space characters - it should not contain tabs or return characters. For text that does, to be better XML, it would need to appear like this:

<outline text="Bartleby">I'd prefer not to.</outline>

But because that’s not part of the official spec, nobody does that; they add extra data as extra attributes. Other programs - OmniOutliner and SuperNoteCard etc - presumably use their own XML readers and writers to create these files, readers and writers that don’t worry too much about the tabs and return characters in places you wouldn’t normally expect them. However, for me to write an exporter, I would use the Cocoa NSXML classes - this is Apple’s standard XML-generating code. The problem is that these classes strip extra tabs and return characters from places they don’t belong.

In short, this means that an exporter created using these classes would cause you to lose tab and return characters from your synopses in the exported document. This might not be a big deal to everyone, but it would become a support issue. The current importer uses some old legacy sample code I managed to find, which ignores these whitespace issues, but it’s too low-level for me to be able to write an exporter out of it without a lot of unnecessary extra coding and learning the more low-level classes.
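The difference between the two forms is easy to see on the reading side with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration (it is not the NSXML code discussed here, and the variable names are made up): the same synopsis, containing a literal line break and tab, is parsed once from a “_note” attribute and once from element content.

```python
import xml.etree.ElementTree as ET

# A synopsis containing a line break followed by a tab.
synopsis = "I'd prefer\n\tnot to."

# The same text placed literally in an attribute and in element content.
attribute_form = f'<outline text="Bartleby" _note="{synopsis}"></outline>'
content_form = f'<outline text="Bartleby">{synopsis}</outline>'

note_from_attribute = ET.fromstring(attribute_form).get("_note")
note_from_content = ET.fromstring(content_form).text

# The parser replaces the line break and tab in the attribute with spaces,
# while the element content keeps them as written.
print(repr(note_from_attribute))
print(repr(note_from_content))
```

So even a file that contains literal tabs and returns inside “_note” only keeps them for readers that skip this normalisation step.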

So - those are the issues involved.

I hope that makes more sense.

All the best,
Keith

Inspired · April 26, 2011, 11:44pm

Thanks Keith,

I really appreciate you taking the time to explain that. Very helpful.
I can much better understand the challenge involved when you look at coding for Scriv to export OPML. It sounds to me like OPML seriously needs more thought and development by those who oversee it as a standard. For now, it looks to me like it is crippled before it even crosses the start line.

Well, I am still very happy Scriv imports data from NovaMind. Based on what you have explained to me, I gather that NovaMind also uses the _note field because all my notes come across to Scriv without a hitch.

Regards,

Jonathan

KB · April 27, 2011, 8:12am

Hi Jonathan,

Yes, I believe NovaMind uses the “_note” field as well - it’s become a sort of de facto unofficial standard - even though technically it’s a bit of an abuse of the way XML attributes should be used, programs tend to use it because others do.

Thanks and all the best,
Keith

pete340 · April 27, 2011, 1:25pm

In fact, that’s what the XML specification requires: all adjacent whitespace gets replaced by a single space character. But that’s sometimes wrong. For example in code listings and poetry, whitespace counts. So XML provides the attribute xml:space, which can be set to “default” or “preserve”; the latter, as the name suggests, preserves whitespace.

Which is mostly irrelevant, except that in order to support this, XML parsers have to provide a way for the application to see the original whitespace. I haven’t used it, but from the documentation, it sounds like

- (void)parser:(NSXMLParser *)parser foundIgnorableWhitespace:(NSString *)whitespaceString

should do that.

pete340 · April 27, 2011, 1:32pm

pete340:

Which is mostly irrelevant, except that in order to support this, XML parsers have to provide a way for the application to see the original whitespace. I haven’t used it, but from the documentation, it sounds like

- (void)parser:(NSXMLParser *)parser foundIgnorableWhitespace:(NSString *)whitespaceString

should do that.

Which is also irrelevant, since it only applies to the content of an element, not to its attributes. Yes, stuffing general text into an attribute is a really bad idea. Sorry about the noise.

KB · April 27, 2011, 2:35pm

The trouble is more the other way - importing is fine, but the NSXML generation classes enforce adjacent whitespace to be replaced with single spaces in attributes - as soon as you write an NSXMLDocument to disk the extra whitespace is lost. There is a “preserve whitespace” option (NSXMLNodePreserveWhitespace) but it only works for the string values, not for attribute values. I don’t think this is wrong - I think Apple’s XML classes are enforcing “good” XML - but it does make writing an OPML exporter that supports the commonly-used “_note” attribute difficult unless you use the lower-level Core Foundation stuff (which for me would be a much bigger job and way out of my comfort zone). If it weren’t for that, writing an OPML exporter would be fairly trivial, because the Cocoa NSXML classes are a breeze to use.
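One general XML technique for the writing side, sketched here in Python rather than Cocoa: tabs and line breaks in an attribute can be written as numeric character references (&#9;, &#10;, &#13;), which a conforming parser turns back into the original characters instead of normalising them to spaces. The helper names escape_attr and outline_element are made up for illustration, and whether the NSXML classes can be made to emit such references is not something this thread establishes.

```python
import xml.etree.ElementTree as ET


def escape_attr(value):
    """Escape text for a double-quoted XML attribute, keeping whitespace.

    Tabs and line breaks become character references so that a parser's
    attribute-value normalisation does not turn them into spaces.
    """
    replacements = [
        ("&", "&amp;"),  # must run first so later references stay intact
        ("<", "&lt;"),
        ('"', "&quot;"),
        ("\t", "&#9;"),
        ("\n", "&#10;"),
        ("\r", "&#13;"),
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return value


def outline_element(title, note):
    """Serialise one OPML item with its synopsis in the '_note' attribute."""
    return f'<outline text="{escape_attr(title)}" _note="{escape_attr(note)}"/>'


synopsis = "I'd prefer\n\tnot to."
xml_text = outline_element("Bartleby", synopsis)
round_tripped = ET.fromstring(xml_text).get("_note")

print(xml_text)
print(round_tripped == synopsis)
```

This only helps if the program reading the file uses a conforming parser; a hand-rolled reader that does not decode character references would show the raw &#10; text instead.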

All the best,
Keith

pete340 · April 27, 2011, 7:36pm

KB:

The trouble is more the other way - importing is fine, but the NSXML generation classes enforce adjacent whitespace to be replaced with single spaces in attributes - as soon as you write an NSXMLDocument to disk the extra whitespace is lost. There is a “preserve whitespace” option (NSXMLNodePreserveWhitespace) but it only works for the string values, not for attribute values. I don’t think this is wrong - I think Apple’s XML classes are enforcing “good” XML - but it does make writing an OPML exporter that supports the commonly-used “_note” attribute difficult unless you use the lower-level Core Foundation stuff (which for me would be a much bigger job and way out of my comfort zone). If it weren’t for that, writing an OPML exporter would be fairly trivial, because the Cocoa NSXML classes are a breeze to use.

I have to read more carefully before I reply. You did say “for me to write an exporter”. But I don’t agree that rewriting my document when storing it is enforcing “good” XML. The XML specifications allow arbitrary whitespace wherever spaces are allowed. Section 2.10 of the XML 1.1 specification says:

It’s when the document is delivered (i.e. displayed, printed, etc.) that the processor is supposed to collapse whitespace. Granted, the XML specification doesn’t describe writing XML documents, so doesn’t constrain what happens when an application writes XML data, but that kind of fiddling is just plain wrong.

KB · April 27, 2011, 8:29pm

That’s what Apple’s NSXML classes are doing, though - my point is that I am limited by what they do. I spent a lot of time trying to get the NSXML classes not to collapse whitespace in attributes, including a lot of hackery, but nothing worked. It’s not me collapsing the whitespace when the document is written, but Apple. Try it for yourself - create an NSXMLElement with an attribute with newlines and tabs in it and write it to file using NSXMLDocument, setting it to preserve whitespace. Then open the resulting file in a text editor - any whitespace in attributes is collapsed. I can provide a code sample if you’re really interested.

pete340 · April 28, 2011, 10:14am

Understood. I wasn’t disagreeing with that, just saying that what the NSXML classes are doing is not good design.

KB · April 28, 2011, 10:25am

Ah, right. And annoying too as an OPML exporter would be useful. Although as someone pointed out to me when I asked about this on the Cocoa lists a couple of years ago, the XML specs do determine that XML processors should normalise attribute values and remove all extra whitespace:

xml.com/axml/target.html#AVNormalize

I guess your point is that this shouldn’t be done in the writing-to-file code, though, but instead in the reading-from-file part. It’s a shame that whoever started the trend of putting text in the “_note” attribute didn’t use a CDATA block at least…
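For reference, the attribute-value normalisation rule behind that link can be sketched in a few lines of Python. This is a simplified reading of the rule that ignores character and entity references, and the function name normalize_attribute is made up for illustration. For attributes with no DTD declaration (treated as CDATA, which would cover a typical OPML file), each tab or line break becomes one space; only for non-CDATA types are runs of spaces collapsed and the ends trimmed.

```python
import re


def normalize_attribute(value, cdata=True):
    """Simplified sketch of XML attribute-value normalisation.

    Character and entity references are not handled here.
    """
    # Line ends are normalised first: CR LF and lone CR both become LF.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    # Every remaining tab or line feed is replaced by a single space.
    value = re.sub(r"[\t\n]", " ", value)
    if not cdata:
        # Non-CDATA attribute types also collapse runs of spaces
        # and drop leading and trailing spaces.
        value = re.sub(r" +", " ", value).strip(" ")
    return value


as_cdata = normalize_attribute("I'd prefer\n\tnot to.")
as_token = normalize_attribute("  I'd prefer\n\tnot to. ", cdata=False)

print(repr(as_cdata))
print(repr(as_token))
```

Either way the tabs and returns are gone by the time the application sees the value, which is the loss being discussed for synopses.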

pete340 · April 28, 2011, 3:31pm

Right. The XML specification tells you what an XML processor should do. Here’s how it defines an XML processor [emphasis added]:

[Definition: A software module called an XML processor is used to read XML documents and provide access to their content and structure.] [Definition: It is assumed that an XML processor is doing its work on behalf of another module, called the application.] This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

KB · April 28, 2011, 3:43pm

Although of course given that the NSXML classes also provide the reading part, I guess they’d strip it out there anyway.