MultiMarkdown implementation - How?

KB · September 25, 2006, 5:16pm

Okay, the largest thing I have remaining for beta 3 is implementing MultiMarkdown. (There will be no intermediary betas simply because each beta brings with it numerous small fixes and a flurry of activity here, which would only push back the larger stuff; thus I want to get MMD out of the way before dealing with the little stuff that will arise from recent changes.)

So, to those of you who want MultiMarkdown support, I ask: how would you like it done? Specifically, I am thinking of the “Export Draft” sheet. Nothing else will change as far as the user is concerned, apart from a per-document option somewhere to “Use MultiMarkdown”. But the Export Draft thing is a bit different. For a start, the export formats will be different depending on what you select. If you use MMD, you can use LaTeX - but only if every document you export uses MMD. Unless, that is, other documents are converted to plain text for the sake of the export.

And what about setting up styles for the titles/subtitles? Clearly, these should still use the styles set in Export Settings and Project Settings, so I suppose I just have to add the MMD code for that (help?).

Thanks in advance for ideas,
Keith

Joakim_Hertze · September 26, 2006, 6:39am

The way I see it, MultiMarkdown filtering is only applicable when the user wants to export as plain text. Why not let a second drop-down list appear (or be enabled) when the user selects “Plain Text”

KB · September 26, 2006, 6:53am

Thanks for your input, Joakim, very useful.

[quote]
You mentioned a document specific “Use MultiMarkdown”

Joakim_Hertze · September 26, 2006, 7:10am

I agree – the checkboxes mentioned are totally unnecessary. I really can’t see a situation where one wants to export parts of one’s draft as LaTeX, and other parts as plain text only. And even if there is such a situation I suggest exporting your draft as two different versions and then cut and paste. After all – you can’t do everything for everybody.

I forgot one thing in my previous post. You would probably want a third drop down menu to appear, where the user can select the language specific version of SmartyPants they want to use.

As for the group and text titles – do you really need to supply formatting for those? I don’t think it would be a bad trade-off if I had to apply those later. I suspect it would be fairly simple for you to add MultiMarkdown formatting during the export process (# Heading one, ## Heading two), but that would be hardwired into Scrivener, right? What if you want to supply different filters in the future?

I would propose that you append all metadata to the draft files before running them through the filters. Perhaps it could look like this:

[code]Title
25 september 2006, 19:18
Status: First Draft
Label: No Label
Keywords: duck

Synopsis
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vestibulum in odio eu urna tempus ultrices. Duis tincidunt. Pellentesque pretium justo sit amet orci.

Text
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vestibulum in odio eu urna tempus ultrices. Duis tincidunt. Pellentesque pretium justo sit amet orci. Fusce turpis nibh, porta sed, luctus auctor, luctus vitae, enim. Curabitur urna libero, gravida id, fringilla nec, porta ut, urna. Maecenas risus erat, sollicitudin sed, ultrices vel, elementum nec, nisi […]

Notes
Fusce turpis nibh, porta sed, luctus auctor, luctus vitae, enim. Curabitur urna libero, gravida id, fringilla nec, porta ut, urna.[/code]

If you insist on adding Markdown syntax I think this would be one way to do it:

[code]# Title
25 september 2006, 19:18
Status: First Draft
Label: No Label
Keywords: duck

Synopsis

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vestibulum in odio eu urna tempus ultrices. Duis tincidunt. Pellentesque pretium justo sit amet orci.

Text

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vestibulum in odio eu urna tempus ultrices. Duis tincidunt. Pellentesque pretium justo sit amet orci. Fusce turpis nibh, porta sed, luctus auctor, luctus vitae, enim. Curabitur urna libero, gravida id, fringilla nec, porta ut, urna. Maecenas risus erat, sollicitudin sed, ultrices vel, elementum nec, nisi […]

Notes

Fusce turpis nibh, porta sed, luctus auctor, luctus vitae, enim. Curabitur urna libero, gravida id, fringilla nec, porta ut, urna.
[/code]
If you want to do it the advanced way you would probably want to let the user define a “pre pass filter” (or a template), where he or she specifies how titles and metadata should be marked up before running any export filters. This would open up the possibility of using something else instead of MMD, and it would move the decisions about markup outside the Scrivener source code. But this is perhaps something for Scrivener 2.0?
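To make the “pre pass filter” idea a little more concrete, here is a minimal sketch in Python. It is not how Scrivener works; the template text, the field names and the `apply_pre_pass` helper are all hypothetical, chosen only to mirror the sample layout above.

```python
from string import Template

# Hypothetical user-editable template. The field names are illustrative
# and simply mirror the sample layout shown earlier in this post.
PRE_PASS_TEMPLATE = Template(
    "$title\n"
    "$date\n"
    "Status: $status\n"
    "Label: $label\n"
    "Keywords: $keywords\n"
    "\n"
    "Synopsis\n"
    "$synopsis\n"
    "\n"
    "Text\n"
    "$text\n"
    "\n"
    "Notes\n"
    "$notes\n"
)


def apply_pre_pass(document: dict, template: Template = PRE_PASS_TEMPLATE) -> str:
    """Render one document's metadata and text through the template."""
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return template.safe_substitute(document)


sample = {
    "title": "Title",
    "date": "25 september 2006, 19:18",
    "status": "First Draft",
    "label": "No Label",
    "keywords": "duck",
    "synopsis": "Lorem ipsum dolor sit amet.",
    "text": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
    "notes": "Fusce turpis nibh, porta sed.",
}
rendered = apply_pre_pass(sample)
```

Swapping the template for one that writes `# $title` would give the Markdown variant without touching the rest of the export code, which is the point of keeping the markup decisions outside the application.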

AmberV · September 26, 2006, 9:49am

I would recommend not using the header tags, actually. These are more than just font shortcuts, they define actual structure, much like the <h1> to <h6> tags in HTML. In other words they would be listed in outliners and tables of contents. At first blush that might be all right. If a user wants to have notes and synopsis, they probably are not exporting a “final” copy, and having these things in the ToC might not be the end of the world – but where it gets sticky is in deciding what level they should be.

For example, I might have a document which is a chapter, within a part, within a book. I might have already used the single ‘#’ header for the name of the book, and the ‘##’ for the part, and named the document chapter name with ‘###’. That means, if you were to use the above syntax, synopsis would be promoted to a Part of the book, when in actuality it needs ‘####’ to fall logically within the structure.

It would probably be next to impossible to figure out what level they should be, because all of the above is arbitrary. The next user may not have parts, or they may be writing all of their documents as sub-chapters and the base is ‘####’.

So, if you want things to look precisely as they look in the RTF output, I would simply suggest using the format syntax to make it look that way, rather than applying structure to the document.

[code]Title of Document
25 september 2006, 19:18
Status: First Draft
Label: No Label
Keywords: duck

Synopsis
Unfortunately, there is no underline syntax in MMD, but you can easily use HTML wherever you want, and since the user will never see this – we needn’t worry about ugliness.

Text
And so on… you get the idea.[/code]

Joakim_Hertze · September 26, 2006, 10:23am

Yeah, you’re right about that. But since no two users are alike I would really advocate no formatting at all (coming from the web world I really favour structural mark up), or to let the user specify what he wants with an export template (much like the ones above).
But hey, this is no big deal. I could very well live with style formatting – and I think most people out there could too.

AmberV · September 26, 2006, 11:07am

My opinion: I think, in this case the added formatting does not bother me because if you have these options turned on, you probably are not exporting a final draft anyway, and the alternative is to just have all of this meta-data displayed in the bare default (I’d be really surprised if Keith goes for the template thing right now – that would be a lot of extra work), which would be inconsistent with standard export.

AmberV · September 26, 2006, 11:24am

Okay, I know this is probably ridiculously over-simplistic, but from a conceptual stand-point, how about something like this?

From the user side, you would simply select LaTeX as your export method. I do not see why you would need any other checkboxes or drop downs (except maybe the localisation bit), or the confusion of selecting Plain Text first. While technically everything will be handled as plain text, the user only cares about what comes out at the end, be that a web page, a LaTeX book, or an RTF.

A warning would come up (with a NeverShowAgainFlag) informing the user that all RTF formatting will be lost, and that documents without MMD syntax will be converted to plain text. Presumably, if a person were formatting their Draft with MMD for the purpose of eventual LaTeX export, they probably will not have very many RTFs hanging around, anyway.

With the LaTeX exporter, having an MMD flag at all seems redundant, since everything is getting converted to plain text. But of course, it is necessary for the other exports, where the MMD document would get converted into an RTF first, and then added to the working stream.

I think, this would mean there would be two different pre-processors. The LaTeX pre-processor would convert Scrivener annotations and footnotes into MMD analogues, so that they are properly converted into LaTeX. But the RTF/XHTML/PlainText exporters would want to retain the Scrivener mark-up, so that they get properly handled by Scrivener, not MMD. MMD flattens footnotes.

Actually, does that mean the MMD=>XHTML script would never be needed? If everything gets converted to RTF internally, and then Scrivener does what it does from that point on – I think not? Although, on the other hand, I would miss the ultra-clean XHTML that it generates, as opposed to Apple’s nightmare version of it.

Annotation and Footnotes Syntax Example

This would only need to be done for the LaTeX exporter. Footnotes are the easiest, just convert Scrivener footnotes into this:

[code]Here is the text stream[^1], and so forth.

[^1]: And here is the referenced text.[/code]

The next would be [^2] and so on.
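As a rough illustration of that pre-processor step, the sketch below numbers footnotes in order and appends their definitions after the text. It assumes the footnotes have already been reduced to an inline marker of the form `{{fn:...}}`; that marker is invented for this example and is not Scrivener's internal representation.

```python
import re

# Hypothetical inline marker standing in for a Scrivener footnote.
FOOTNOTE_MARKER = re.compile(r"\{\{fn:(.*?)\}\}", re.DOTALL)


def to_mmd_footnotes(text: str) -> str:
    """Replace inline markers with [^n] references and append definitions."""
    notes = []

    def replace(match):
        notes.append(match.group(1).strip())
        return f"[^{len(notes)}]"

    body = FOOTNOTE_MARKER.sub(replace, text)
    if not notes:
        return body
    definitions = "\n\n".join(
        f"[^{number}]: {note}" for number, note in enumerate(notes, start=1)
    )
    return f"{body}\n\n{definitions}"


converted = to_mmd_footnotes(
    "Here is the text stream{{fn:And here is the referenced text.}}, and so forth."
)
```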

Annotations are something to think about. It might be nice to actually convert them into LaTeX’s version of a comment. But this would require an extra few steps, because you cannot insert raw LaTeX into an MMD file; it will try to be smart and escape all of the characters that make it work. So, you would want to create some sort of unambiguous start and stop in the pre-processor step, say:

[A[Here is the text of the annotation.]A]

And then, after the LaTeX file has been completely generated, go through and replace all of those so that the final result looks like:

\marginpar{Here is the text of the annotation.}

This will set the annotation into the margin area of the page, and of course if the user is versed in LaTeX and wished it to look some other way, it would be a simple matter of overriding the appearance of the marginpar environment, or changing it entirely with search and replace. Whether or not this method is used could even be tied to the current “Export annotations as RTF comments” option. Just change the label to “…as LaTeX comments…” when relevant.
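The post-processing pass described above could look something like this sketch. It assumes the `[A[` and `]A]` markers come through the MMD to LaTeX conversion unchanged, which would need checking against the real scripts; the function name is made up for the example.

```python
import re

# Matches the placeholder markers suggested above: [A[ ... ]A]
ANNOTATION = re.compile(r"\[A\[(.*?)\]A\]", re.DOTALL)


def annotations_to_marginpar(latex: str) -> str:
    """Turn [A[...]A] placeholders into \\marginpar{...} commands."""
    # A function is used as the replacement so that backslashes in the
    # output are not interpreted as regex escape sequences.
    return ANNOTATION.sub(lambda m: "\\marginpar{" + m.group(1) + "}", latex)


result = annotations_to_marginpar(
    "Some text. [A[Here is the text of the annotation.]A] More text."
)
```

Because the replacement happens where the marker sits, the `\marginpar` command stays attached to the surrounding sentence in the LaTeX source.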

The only major problem I see is that doing it this way would lose the positional context of the annotation. I think in Word, doesn’t it place a little icon where the comment is in the text, or does it remove it from the context as well? Some consideration could be given to this, if you wanted context retained – I know I would.

The other option is to simply format them into the text the way RTFD does and whatnot. Either way, adding colour might be nice. That would look like:

\marginpar{\color[rgb]{1.0, 0.0, 0.0}{Here is the text of the annotation.}}

If you wanted to do that, you would need to add this line manually to the final LaTeX output:

\usepackage{color}

If you open up a generated LaTeX file, you’ll see a section near the top that has a block of these \usepackage declarations. It can be added anywhere in this top part of the document. To make it easier for you, you could just have it insert on line 2 or something.

So, if the user has the annotations as comments thing un-checked (or if you decide against using marginpar), the code would be:

\color[rgb]{1.0, 0.0, 0.0}{[Here is the text of the annotation.]}
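Pulling the colour pieces together, here is a hedged sketch of both steps: building the coloured annotation and making sure `\usepackage{color}` is present. Placing the package line directly after `\documentclass` is an assumption made for the example; the helper names are hypothetical.

```python
def colored_annotation(text: str, rgb=(1.0, 0.0, 0.0), margin: bool = True) -> str:
    """Build the LaTeX for one annotation, in the margin or inline."""
    r, g, b = rgb
    if margin:
        colored = f"\\color[rgb]{{{r}, {g}, {b}}}{{{text}}}"
        return f"\\marginpar{{{colored}}}"
    # Inline form: the annotation keeps its square brackets.
    return f"\\color[rgb]{{{r}, {g}, {b}}}{{[{text}]}}"


def ensure_color_package(latex: str) -> str:
    """Insert \\usepackage{color} once, right after \\documentclass."""
    if "\\usepackage{color}" in latex:
        return latex
    lines = latex.split("\n")
    for index, line in enumerate(lines):
        if line.lstrip().startswith("\\documentclass"):
            lines.insert(index + 1, "\\usepackage{color}")
            return "\n".join(lines)
    # No \documentclass found: fall back to inserting on line 2.
    lines.insert(1, "\\usepackage{color}")
    return "\n".join(lines)


margin_form = colored_annotation("Here is the text of the annotation.")
inline_form = colored_annotation("Here is the text of the annotation.", margin=False)
patched = ensure_color_package(
    "\\documentclass{book}\n\\usepackage{fancyvrb}\n\\begin{document}"
)
```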

AmberV · September 26, 2006, 12:18pm

<style type="text/css" media="all">
body { font-family: Optima; font-size: 10pt; line-height: 1.5em; }
h1 { font-size: 1.5em; text-align: center; }
h2 { font-size: 1.1em; text-align: center; }
h3 { font-size: 1.1em; }
</style>

While sticking this in one line below the MMD header section (see readme.markdown, which I sent you a while back) will produce sloppy XHTML, it is the only way that I can think of to easily insert style information. Since it is just a temporary file, I don’t think it really matters. You could specify a link to an external CSS stylesheet – but where would this stylesheet go, and would it work properly with textutil? It is more complicated.

So, if the user selects Optima, 24pt for the title font:

h1, h2, h3, h4, h5, h6 { font-family: Optima; font-weight: bold }
h1 { font-size: 24pt }
h2 { font-size: 18pt }
h3 { font-size: 14pt }
h4 { font-size: 13pt }
h5 { font-size: 13pt }
h6 { font-size: 13pt }

I suppose you could extrapolate all six header formats based on their choice for the text titles font. Decrease each one by a certain amount until it reaches the preference for body text size. In this example, the floor was 13pt.
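One way to do that extrapolation is sketched below. The fixed step of 5pt is purely an assumption; it will not reproduce the hand-picked 24/18/14 sizes above exactly, but it shows the idea of stepping down until the body text size acts as a floor.

```python
def heading_sizes(title_pt: int, body_pt: int, step: int = 5) -> list:
    """Sizes for h1..h6: step down from the title size, never below body size."""
    return [max(body_pt, title_pt - step * level) for level in range(6)]


def heading_css(font_family: str, title_pt: int, body_pt: int, step: int = 5) -> str:
    """Build the CSS rules for all six heading levels."""
    rules = [
        f"h1, h2, h3, h4, h5, h6 {{ font-family: {font_family}; font-weight: bold }}"
    ]
    for level, size in enumerate(heading_sizes(title_pt, body_pt, step), start=1):
        rules.append(f"h{level} {{ font-size: {size}pt }}")
    return "\n".join(rules)


css = heading_css("Optima", 24, 13)
```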

That should be all you need. The user’s use of MMD’s header syntax ‘#’ to mark out titles, will get automatically formatted using these rules. So all you have to do is make the CSS.

KB · September 26, 2006, 6:07pm

Thanks for these responses, they are really helpful - I still need to go through the last couple of posts in detail to see what I need to use.

Before that, though, I need to make something a little clearer: I don’t think there is any need for an MMD checkbox on a per-document basis at all, even for RTF export. Why not? Well, because files using standard RTF and files using RTF generated by MMD would not be able to be combined. There is no way, technically, that you could combine RTF generated from within Scrivener with RTF generated by Markdown. Let me explain why:

When Scrivener creates an RTF file, it hacks the Apple RTF, which doesn’t support footnotes, annotations or images, by grabbing the RTF file as plain text from the file and inserting all of the necessary RTF tags, information and hex data directly into the RTF stream so that the generated RTF file now holds all necessary information to store proper RTF footnotes, comments and images.

What Scrivener cannot do, however, is read in an RTF file that has images, annotations or footnotes in it and retain them. Scrivener just uses the standard Apple RTF reading methods, which cause these things to get stripped. Not ideal, but the alternative would be spending a year (and I’m not exaggerating) writing my own RTF parser that converts RTF to a standard Apple attributed string instead of using the default Apple one. This is because importing is not as straightforward as exporting. I can choose how to format the footnote, image and annotation tags when I insert them into the RTF stream for export. But the various combinations that can be chosen mean that trying to find these tags inside an existing RTF file to read them in is much, much, much harder (for a start, the tags don’t even have a set order; and they can all have other random info attached to them, too). Well, I won’t bore you with the details: suffice to say that this is something I am NOT going to do. Instead, I hold out hope that Apple will one day get their act together and improve their RTF support. For now it is enough that Scrivener can get this stuff out at all (most apps cannot).

Now consider how MMD export would work if it were to try to combine documents not marked for MMD formatting with those that were. At this level, it is impossible for Scrivener to work with the RTF stream itself - that is way too complicated. So instead, Scrivener would have to use MMD to convert the MMD-formatted files to RTF and then read them back into a standard format to append them to the other files before exporting them back out again as RTF - and in the process, any footnotes and annotations generated by the MMD conversion would be completely lost while the MMD RTF files are read in for concatenation using the Apple standard methods.

So, it’s an all or nothing affair. If you want your draft exported as MultiMarkdown - to whatever format - you have to accept that all Scrivener formatting will be lost and formatting will only be applied by MMD. Anything that doesn’t have MMD tags will become plain text.

With this in mind, it probably just makes sense for these exports just to be under the normal export popup menu as extra options such as “MultiMarkdown -> LaTeX”, “MultiMarkdown -> RTF” and “MultiMarkdown -> XHTML”.

Hope that makes sense… Probably not.

All the best,
Keith

Joakim_Hertze · September 26, 2006, 7:07pm

And by God, don’t forget “MultiMarkdown -> Plain text”.

KB · September 26, 2006, 7:15pm

Actually, I haven’t got a script that converts MultiMarkdown to plain text, only to LaTeX, RTF and HTML. Any takers?

Oh, and as for the SmartyPants language options - I think I will plonk this in the preferences, under Typography.

Thanks,
Keith

Joakim_Hertze · September 26, 2006, 7:26pm

If nothing else comes up – why not let MultiMarkdown create an RTF file, which then gets converted to plain UTF-8 text? Not an elegant solution, but I believe it would work.

Putting the SmartyPants option in preferences is probably a wise idea. No need to clutter the export sheet when you don’t have to.

Joakim_Hertze · September 26, 2006, 7:51pm

The MultiMarkdown (http://fletcher.freeshell.org/wiki/MultiMarkdown) home page hints at the possibility of creating an XHTML page and then using XSLT to create any output you want, including plain text.

I’m not sure exactly how this is done, but it appears you’d want to put the following under the stylesheet declaration (http://www.xslt.com/html/xsl-list/2005-04/msg01266.html):

<xsl:output method="text" />

Hopefully someone with more knowhow could explain this better.

AmberV · September 26, 2006, 8:08pm

There are a plethora of HTML to text conversion scripts out there. It would be a lot easier to use one of those than create an XSLT from scratch. But yes, that route probably would be much more efficient than RTF to text, as generating an RTF requires an extra step, and the RTF gets created from the XHTML – so you might as well just not take the extra step.

All of that said, isn’t the whole point of Markdown to have a plain text document that is easy on the eyes and functional?

janra · September 27, 2006, 3:47am

I’m not sure how useful this will be, but since my usual writing tools consist of plain text files with simple markup and conversion scripts to get LaTeX and HTML out of them, I thought I’d offer a few thoughts.

Transitioning from that to Scrivener, I liked being able to break up my files even further than one file per chapter, while having even easier access and management. I wasn’t sure about the file format being RTF at first, since I find formatting options to be distracting when I’m writing. My markup is literally nothing more than:

=title=
emphasized (italics)
[blockquote]

That third one was added after I’d been using only the first two for several years.

So anyway, I’ve been reading this Markdown/conversion script discussion with interest, because it brings Scrivener closer to the way I prefer to work.

I’ve gotten to like writing my unformatted text somewhere that has an attractive default font, I admit.

Anyway, from playing with LaTeX, there are a few things that I want to mention. One is that LaTeX takes ASCII - I think it has a language mode that will allow it to take, for example, accented characters natively, but I had to convert my typed accented characters to the LaTeX markup for that accented character. Another is quotes. LaTeX takes `` and '' (double backticks and double apostrophes), which it converts into opening and closing typographic quotes. I waffled a few times between typing them that way, and typing a straight double-quote and having my script figure out whether they were opening or closing quotes, and I’m still not sure which way’s best. One thing I think would be useful for a Scrivener → Markdown → LaTeX conversion tool would be a way to either figure out straight quotes as my script did, and/or to convert Scrivener’s “educated” typographic quotes to the appropriate LaTeX commands (or HTML entities, or whatever is appropriate for the desired export format). I’m not sure how this would interact with different quotes for different languages, however.
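For illustration, the quote handling described here can be sketched in a few lines of Python. This is not my original script; the rule used (a straight quote opens when it starts the text or follows whitespace or an opening bracket, and closes otherwise) is a simple heuristic, and single quotes and non-English conventions are left alone.

```python
import re

# A straight double quote at the start of the text, or preceded by
# whitespace or an opening bracket, is treated as an opening quote.
OPENING_QUOTE = re.compile(r'(?:^|(?<=[\s(\[{]))"')


def quotes_to_latex(text: str) -> str:
    """Convert typographic and straight double quotes to LaTeX `` and ''."""
    # Typographic ("educated") quotes map directly.
    text = text.replace("\u201c", "``").replace("\u201d", "''")
    # Straight quotes: guess opening quotes first, the rest are closing.
    text = OPENING_QUOTE.sub("``", text)
    return text.replace('"', "''")


straight = quotes_to_latex('"Hello," she said. "Go."')
curly = quotes_to_latex("\u201cHi\u201d")
```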

I know you don’t want to get rid of RTF and go to plain text, but I really appreciate the effort you’re putting in to let us plain text types work the way we want to.

AmberV · September 27, 2006, 3:54am

Part of the MultiMarkdown chain is a Perl script called SmartyPants, which basically does everything you are worried about. It parses quotes and turns them into `` and '', and it even handles a certain number of special characters like em dashes and such. It will make XHTML and RTF files look good, too.

AmberV · September 27, 2006, 11:00am

Oh, and in response to the wishes for templates and being able to customise things: while it isn’t all fancy and user friendly, if the test application is any indication, all of the files responsible for actually creating the LaTeX documents will be easily accessible in the Scrivener package. As long as the names of the three shell scripts starting with “md2” stay the same, you should be able to add your own XSLT declarations and such.

Joakim_Hertze · September 27, 2006, 12:42pm

I thought the point of Markdown was to create valid XHTML, without having to do all the markup yourself.

I honestly think it would be a mistake not to include the possibility of plain text output via MMD. There are a lot of plain text fiends out there who I think would agree with me. It’s not a dealbreaker of course, but why waste an opportunity to steal market share from Ulysses?

For me it would be quite enough to run the plain text export through SmartyPants and then through a script that changes the HTML entities to their UTF-8 counterparts. Taking the route through MMD is not really necessary.
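The second script in that chain is small. As a sketch, assuming the SmartyPants output is already available as a string, Python's standard `html.unescape` turns the entities back into characters, and the result can then be written out as UTF-8. The function name and the sample string are only for illustration.

```python
import html


def entities_to_unicode(text: str) -> str:
    """Replace HTML entities (named or numeric) with the characters they stand for."""
    return html.unescape(text)


# Numeric entities of the kind SmartyPants produces for curly quotes and dashes.
plain = entities_to_unicode("&#8220;Hi&#8221; &#8212; there")
utf8_bytes = plain.encode("utf-8")  # what would be written to the exported file
```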

AmberV · September 27, 2006, 1:33pm

I guess I don’t understand what that would accomplish, other than a few em dashes and typographic quotes? You would still have all of the Markdown syntax in the file. Why not just use Scrivener’s regular plain text exporter, at that point?