Change default storage to HTML, XML or DocBook

Since Not Kevin has mentioned the 2.5 release in the Technical Support area, I’d like to make a bold suggestion. The current Scrivener uses RTF as its internal text storage format. I would like to propose a radical departure to a non-proprietary, plain-text (UTF-8) format: HTML.
If not HTML, then XML or DocBook. This would give more flexibility in handling the data without significantly affecting the writer.

Why do we use Scrivener, probably mostly for self-publishing? And why change? Look at the Scrivener customer base: all of us are writers. Most of us prefer to focus on the text over the formatting, while still enjoying the WYSIWYG. (Some of us don’t care about WYSIWYG.) What we all need is an internal storage format that gives us flexibility.

There are other features that attract us to Scrivener. It’s a hell of a document project management tool. You can compile a project into a dozen different formats. It has the toolchain most self-published authors want to publish a finished novel, or the bones of a screenplay that can be fed into a professional screenplay tool, or (for you attorneys) another work of fiction called the brief. It’s as close to a self-contained publishing tool as you can find, especially in its price range. But did you buy Scrivener because it uses RTF internally?

Using HTML, XML or DocBook does not prevent exports to formats suitable for working with other downstream software or publishers. It does allow creation of stylesheets that make submission to publishers easier. For example, Tor has specific formatting rules, which can be defined more readily in a stylesheet.

Ebook formats are variants of HTML: the big three (EPUB, MOBI, iBook) are all built on it. Using HTML internally would therefore line up the formatting perfectly. Not that “Not Kevin” and crew haven’t already done a fine job of making that happen. HTML also converts rather nicely to PDF, and for the typesetting fanatics, you can convert HTML into LaTeX.

Other advantages of XML-based formatting: being an open standard, HTML converts to and from Markdown rather nicely. Not as nicely with LaTeX (trust me, I’ve tried). That lets you do some work outside Scrivener fairly seamlessly when you need to.

HTML also helps ensure consistent formatting, as opposed to RTF. “RTF parsing is non-trivial. The inner workings . . . are somewhat scary.” IIRC, an RTF reader is a stream-based parser: it changes the look of the document whenever it hits a token telling it to switch to italics, or color, or whathaveyou. HTML markup is based on paired opening and closing tags, which makes it easier to convert between formats. You can define an italics style (for example) and know it is applied consistently throughout the final document.
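To make the stream-versus-tag-pair point concrete, here is a minimal Python sketch. It assumes a tiny, made-up subset of RTF in which only the `\i` (italic on) and `\i0` (italic off) control words appear; real RTF also has groups, font tables, escapes and much more. The hypothetical function `rtf_italics_to_html` walks the stream, flips an italic flag whenever it meets one of those tokens, and emits a paired `<em>...</em>` element for each italic run.

```python
import re

# Hypothetical, tiny subset of RTF: only the \i (italic on) and \i0 (italic off)
# control words are recognised. Any other control word is NOT handled here.
# \i0 is tried before \i so that "\i0" is not misread as "\i" followed by "0".
# The optional trailing space is RTF's control-word delimiter and is consumed.
TOKEN = re.compile(r"\\i0 ?|\\i ?|[^\\]+")


def rtf_italics_to_html(rtf_body: str) -> str:
    """Walk the stream token by token, toggling italic state like an RTF reader."""
    out = []
    italic = False
    for match in TOKEN.finditer(rtf_body):
        tok = match.group(0)
        if tok.startswith("\\i0"):
            if italic:
                out.append("</em>")  # close the tag pair HTML requires
            italic = False
        elif tok.startswith("\\i"):
            if not italic:
                out.append("<em>")  # open a tag pair for the italic run
            italic = True
        else:
            out.append(tok)  # ordinary text passes through unchanged
    if italic:  # close any italic run left open at the end of the stream
        out.append("</em>")
    return "".join(out)


print(rtf_italics_to_html(r"Plain \i emphasised\i0  text"))
```

Run as-is, it prints `Plain <em>emphasised</em> text`. The RTF side only says "italics on from here" and "italics off from here", while the HTML side has to wrap each run in an explicit open/close pair, which is the property the paragraph above is pointing at.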

Yes, Scrivener does this now. Without looking under the hood, I’ll bet it’s not easy or elegant. It might be a little kludgy (due to RTF, not the development team).

Writer impact: focusing on writing. What impact would it have on the writer? Define your work’s stylesheet, setting the font and so on. Scrivener then uses that stylesheet to present the WYSIWYG view. There could be separate stylesheets for writing, revision management (colors), and final publication. So what you see while writing could be Courier, and the final output could be Comic Sans MS. Otherwise, the impact should be fairly minor.

Development impact: platform independence. What impact would it have on development? My understanding is that Apple’s RTF differs from Microsoft’s (Microsoft owns the format). Using HTML, XML or DocBook makes for a platform-agnostic format. It’s a text format, as opposed to binary, which makes it easier to export, import and use with version control (such as Git).

Scrivener will continue to use RTF internally for the foreseeable future. HTML doesn’t even have all the formatting that rich text supports. Besides, Scrivener uses the OS X text engine, which is based on rich text, so it would need to translate any other file format into a rich text variant before loading it anyway; using anything else therefore doesn’t make much sense. An advantage of RTF is that it can be opened in almost any word processor, too. Really, Scrivener’s internal format isn’t something that you should worry about, given that Scrivener can export and compile to so many formats. Once the text is loaded into Scrivener, the underlying format is of no relevance. RTF was chosen because of its ubiquity and support of everything we need. So, this won’t be changing, certainly not in version 2.0.

I think many of us these days work on different devices, be they laptops, PDAs, iPhones or tablets. Any format that is common to all of these (and Windows, Android, iOS) would do the job. Of course, different platforms support different versions, so finding a common denominator is something for the developers to sort out.

As someone who also uses version control (because I also develop software), I find RTF works fine: it looks like text, it can be compared by diff tools, and in the case of a catastrophic failure, you can retrieve your precious text by opening the file in an ordinary text editor. I use Git as my version control system, mostly because it works well, you can set it up locally or via GitHub, and it’s what we use where I work.
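As a rough illustration of why plain-text RTF plays well with diff tools, Python’s standard `difflib` module can compare two revisions line by line, much as `git diff` would. The two RTF snippets and the helper `rtf_diff` are made up for this sketch; they are not real Scrivener output.

```python
import difflib

# Two made-up revisions of a tiny RTF document (not real Scrivener output).
OLD = r"""{\rtf1\ansi
{\fonttbl\f0 Courier;}
\f0 It was a dark and stormy night.\par
}"""
NEW = r"""{\rtf1\ansi
{\fonttbl\f0 Courier;}
\f0 It was a \i dark\i0  and stormy night.\par
}"""


def rtf_diff(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff of two RTF strings, compared line by line as plain text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="draft-1.rtf",
            tofile="draft-2.rtf",
            lineterm="",  # lines were split without newlines, so add none back
        )
    )


for line in rtf_diff(OLD, NEW):
    print(line)
```

The only changed line is the one where the word “dark” gained the `\i ... \i0` italic markers, so the diff shows exactly that edit. The caveat is that RTF control words show up in the diff alongside the prose.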

DocBook and a few others are based on XML (or its predecessor, SGML), meaning they are essentially marked-up text, but the rules for marking up differ in each implementation.

RTF seems to be well supported on all the platforms I’ve looked at, including:
Word and OpenOffice (now LibreOffice) on Windows,
iOS, Android with SoftMaker Office, and Windows-based PDAs/phones with Word or SoftMaker.

KB, I know you said you won’t be changing the internal format any time soon, but if it ever comes up as a possibility, I’d vote for a marked-up text format that can be opened in any text editor and diffed in a version control system.


Marked-up text has a couple of disadvantages as an internal Scriv file format:

  1. Something like Markdown (for instance) doesn’t support all of Scrivener’s rich text features, so I’d have to come up with a lot of my own mark-up.

  2. Many users write in Scrivener using Markdown format for MultiMarkdown export - mixing Markdown in the editor and in the underlying format would cause conflicts.

That’s not to say that the internal file format will never change, by the way - just that we’ll always try to keep it something standard and that supports all of Scrivener’s features with the minimum of fuss or overhead.

All the best,