Apologies: To everyone who loves the English language. I love English too but that does not stop me from doing things to it. If the English language is Lolita, you may call me Humbert.
Bonus apologies: Quite often I will be using English sentences as examples, but I use the dashes and hyphens in them as if they were German. This is not revenge for the English language somewhat invading the German language; my examples are mostly about the German usage of dashes and hyphens, and they wouldn’t make any sense to you if they were actually in German. Nonetheless, these bastards, or better: orthographically challenged (challenging?) sentences, will make you go “Yuck!” more than once. I’m truly sorry about that.
Replacing double hyphens with an em-dash is very handy and common in writing programs, and I can’t see any reason for changing this technique.
But what you seem to have forgotten is that in other languages, German for example, the em-dash is not in use (any more) but the en-dash is.
One option could be to let the user choose whether double hyphens are replaced by em-dashes or by en-dashes.
But this wouldn’t be a very comfortable solution.
Because en-dashes are significantly shorter than em-dashes, they never stand directly before or after other characters, to prevent confusion with hyphens; there is always a space or a punctuation mark in between.
This makes it possible to replace hyphens with en-dashes automatically, and programs like MS Word or OpenOffice Writer (and its offspring) do this. But they work less than perfectly.
And that’s because they didn’t think far enough: The en-dash is not only used between spaces but also like this (in none of these cases do Word or Writer replace the hyphen with an en-dash):
– at the end of paragraphs
– at the beginning of paragraphs
– between sentences
– a parenthesis might end with a punctuation mark – just like this! – and its surrounding hyphens have to be replaced by en-dashes
– in some languages (like, again, German) a parenthesis will be followed by a comma or a semicolon if there would be a comma or a semicolon without the parenthesis: “Pamela Andersons(*) – whoa! –, who once became famous for wearing a super-tight red bathing suit, has married for the fourth time recently.” (*) Read this aloud in your best Borat voice.
Solving this looks a little complicated but it is not at all, and it’s definitely less complicated than the clumsy solution of Word and Writer: The non-replacement of the hyphen in Word and Writer when the parenthesis ends with a punctuation mark – like this! – indicates that their en-dash replacing routine does not only look for spaces before and after the hyphen but also wants to find out if the hyphen-with-spaces-before-and-after is located inside a sentence or between two of them. In the latter case it should become an en-dash too, so it’s hard to understand why both big programs make the same mistake. Maybe one of them should be less clone-y in some cases. Or in general …
But back to how it should work. The easy solution works a little paradoxically: Although you will find a lot of people who never use dashes at all (not in our circles of writing people, mind you!), the simple solution, programming-wise, would be to replace every hyphen with an en-dash!
Every – except for the ones that have direct contact with a letter. Direct contact means: The hyphen is located right after or right before a letter. And this is a logical “or”, not either-or. To stay a hyphen it must have direct contact with at least one letter.
Letter means: not only the ones from a to z and A to Z but also language specific characters like German Umlauts and ß, letters with cedillas and such and of course all non-latin characters. But not figures!
This rule correctly handles the following cases:
a) Hyphens:
b) En-dashes:
And all of these different cases fall under the one simple rule:
IF ((character before hyphen = letter) OR (character after hyphen = letter)) hyphen = hyphen ELSE hyphen = en-dash.
I have used this rule in self-made macros and it worked pretty well, WAY better than the Word/Writer stuff.
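For anyone who wants to play with the rule, here is a minimal Python sketch of it. This is not the macro mentioned above; the function name `replace_hyphens` is made up for illustration, and "letter" is assumed to mean whatever Python's `str.isalpha()` accepts (umlauts, ß, letters with cedillas and non-Latin letters, but not figures). The sketch looks at each hyphen on its own and does not deal with the separate double-hyphen replacement.

```python
EN_DASH = "\u2013"


def replace_hyphens(text: str) -> str:
    """Keep a hyphen only if it touches a letter; otherwise make it an en-dash."""
    out = []
    for i, ch in enumerate(text):
        if ch != "-":
            out.append(ch)
            continue
        # Neighbours of the hyphen; an empty string stands for "start/end of text".
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        # Logical OR: one letter on either side is enough to stay a hyphen.
        # "".isalpha() is False, so hyphens at the very start or end become en-dashes.
        if before.isalpha() or after.isalpha():
            out.append("-")
        else:
            out.append(EN_DASH)
    return "".join(out)
```

With this sketch, the hyphen in "well-known" stays a hyphen, while the hyphens in "a parenthesis - like this! -, and so on" and in "1990-2000" become en-dashes, because neither of them touches a letter.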
But does it work 100% correctly? No, it doesn’t, because that’s not possible without semantics; a routine that works 100% correctly would have to understand the meaning of a sentence.
Two examples, again with German use of hyphens and en-dashes:
A district of a town gets connected to the name of the town by a hyphen (actually it is not a hyphen but a ‘Bindestrich’ but that is exactly the same character so we won’t get picky here), and the latter one is a self-explaining ‘distance dash’.
No macro could separate those two unless it looks them up in Wikipedia or so to check if they are two different places or if one is part of the other. Maybe, certainly, one day every writing program will have a universal knowledge plug-in, but until then we have to manually correct some hyphens.
And by the way, not many German people would use the ‘distance dash’ like that but use it with surrounding spaces too.
The automatic replacement of en-dashes would make the writing process MUCH more comfortable. And Scrivener could possibly be the first program to use this optimized replacement.
Of course the en-dash replacement should be tick-able like the replacement of the em-dash is. So if someone does not want it for whatever reason (like typing program code) it can be switched off.
And of course it needs a counterpart on export that converts the en-dashes back to hyphens, just like the one for the em-dash.
a) General
In general, the automatic replacement of the inch symbols, those relics from the ancient typewriter age, by typographically correct quotation marks works very well.
More specifically, Scrivener is the first writing program I know (and I know quite a lot) that even gets the quotation marks right when an inner quotation starts right at the beginning of the enclosing one. Example:
All other writing programs except for Scrivener would turn the single quotation mark at the beginning of ‘Nirvana’ in the second sentence into a closing quotation mark and not into an opening one. Some might argue that this clash of quotation marks doesn’t happen very often but I can assure you that it is not as rare as you might think. At least not here on my computer …
My guess why this is wrong almost everywhere is that most of the common programs come from English speaking countries. And in English – I don’t know if it is a rule, if this is maybe American only, but I know I have seen this more than once in English texts – there would be a space between the two opening quotation marks:
In this case the replacement would work correctly. If the programmers expect a space between the enclosing and the enclosed quotation mark, that explains why this does not work when there is no such space. And the explanation why it does work in Scrivener although Keith is English too? Keithness, I’d say, which soon will become a synonym for brilliance in the rest of the world too.
But is Scrivener doing this perfectly? Well, almost. I found a minor flaw, a really, really minor one that could be easily fixed:
After a slash Scrivener sets a closing quotation mark.
Spontaneously I can’t come up with any quotation that ends with a slash. Though there might be some – by automatic replacement you could never find a rule for all possibilities. Language is alive and can be used in so many ways no algorithm can describe in full.
But the automatic replacement should go for the more common options. Something like
does not look too made-up to me. So I would vote for ‘a quotation mark right behind a slash is an opening one’.
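To make the proposed rule concrete, here is a rough Python sketch of a direction-guessing routine. It is only an illustration under my own assumptions, not how Scrivener actually decides: the function name `smarten_quotes` and the set `OPEN_AFTER` are made up, and the heuristic is simply "opening after the start of the text, whitespace, an opening bracket, another opening quotation mark, or a slash; closing otherwise".

```python
# Characters after which a straight quote is assumed to be an opening one.
# "" stands for the start of the text; "/" is the rule voted for above.
OPEN_AFTER = {"", "(", "[", "{", "/"}


def smarten_quotes(text, double=("\u201c", "\u201d"), single=("\u2018", "\u2019")):
    """Replace straight quotes by the given (opening, closing) pairs."""
    out = []
    for ch in text:
        if ch not in ('"', "'"):
            out.append(ch)
            continue
        pair = double if ch == '"' else single
        prev = out[-1] if out else ""
        opening = (
            prev in OPEN_AFTER
            or prev.isspace()
            # An inner quotation starting right after an outer opening mark.
            or prev in (double[0], single[0])
        )
        out.append(pair[0] if opening else pair[1])
    return "".join(out)
```

The pairs are parameters because the routine does not care which characters it uses. Note that this sketch turns every straight single quote that is not in an opening position into the closing mark, so it does nothing about telling apostrophes apart from inverted commas.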
b) Always the trouble with the apostrophe
Different languages use different sets of quotation marks/inverted commas.
Cleverer programs (not Microsoft Word) offer not just one set per language but maybe more and, most importantly, they allow them to be changed manually. The replacement routine does not care what characters it has to use as quotation marks, and with the option of manual changes you get exactly what you want – if you prefer Apple symbols as quotation marks, well, why not?
I will again use German as an example. I do this not to annoy you but because it’s the only language I have intimate knowledge of, plus in this case it illustrates the problems very well.
In German we have two, or maybe three sets of inverted commas/quotation marks:
„text“ and ‚text‘ – »text« and ›text‹ – and their less common French variation: «text» and ‹text›.
(It is common to use the double ones as the outer and the single ones as inner marks in German but again: That doesn’t matter, the replacement routine takes whatever characters you have chosen.)
But now comes the trouble, trouble that most program(mer)s are not aware of because of, again, their English origin:
The apostrophe is NOT necessarily identical with the closing single inverted comma!
In two of the above mentioned German usages of quotation marks (these can’t be called inverted commas anymore, can they?), the ones with » and «, this is obvious (‘it‹s’???). But also in the first variant – we call these ‘goose feet’ – they are different as you can see when you give them a closer look:
(There is a crib: ‘Goose feet’ quotation marks are like a subscript 9 and a superscript 6: ₉Hello!⁶ And an apostrophe is like a superscript 9: It⁹s Friday!)
A lot of people confuse this, and the fact that a lot of writing programs use closing inverted commas for apostrophes doesn’t help. On an Apple keyboard you don’t even get an apostrophe by just typing alt+’ but by alt+shift+’, which indicates that Apple doesn’t know anything about this either.
What can be done about this? ‘Will he come up with a so-easy-but-no-one-before-him-has-ever-seen-this-solution like with the dashes?’ the breathless Scrivener forum readers ask themselves.
Well, I’d love to, but I’m afraid there is no solution, or at least no manageable one.
How could you tell which amongst the following is an apostrophe or a single inverted comma (you not being a human with a properly functioning brain but a mere IF-THEN-ELSE structure):
The apostrophe can stand right in front of a word or right behind a word to show missed out letters or to show possession. I don’t see how you could tell them apart from inverted commas.
If the ’ stands behind a word, a replacement routine could look for a not yet closed opening inverted comma, and if it finds one it would go for: ‘This is a closing inverted comma’. But a word with an apostrophe can be included in a phrase in inverted commas, which means there is an opening inverted comma but it doesn’t have anything to do with the apostrophe.
On the other hand, when a ’ stands right before a word the routine would have to wait until a closing single inverted comma is typed to decide if that first one is an apostrophe or not. But how long would it have to wait (i.e. keep in mind that the status of this particular ’ is not yet clear) – until the end of the sentence? Or the paragraph? The text? It is possible that a longer quote starts not at the beginning of a sentence (and because of that not with a capital letter, which would have been an indicator for an inverted comma), so the end of the sentence would not be enough.
And what happens when the ’ typed some time after an uncertain ’ is also uncertain? The replacement algorithm would go like: ‘Status of first ’ depends on the next one, status of the next ’ depends on the one before.’ In human language: ‘Have a break, go to the beach, here’s the spinning beach-ball already!’ This would so easily lead to a closed loop, at least when this routine works while typing (and that is what it should do).
So: forget it.
But can’t anything be done?
Well, maybe a little bit. What about a ’ inside of a word?
In these cases of alphanumerical-'-alphanumerical the ’ is an apostrophe.
This goes for all cases of this type.
All? Almost. Sigh.
Example:
A very simple sentence, with neither an apostrophe nor an inverted comma. But, sadly, he is not much of a business man. And so you have to add some sarcasm by using inverted commas:
Problem? Not yet. Not in English. But in German. German works like Lego, you make new words by gluing others together. That means: We don’t call it ‘business man’ but ‘businessman’. Which changes the example to:
So what do we have here? A ’ in a word that’s not an apostrophe but a closing inverted comma (could be an opening one on other occasions). A word with one part in inverted commas. A half-sarcastic half-non-sarcastic half-parrot word.
What now? Time to send troops to Germany again? Well, maybe, yeah.
But this is up to other people to decide. Here the question is: Should you alter the routine to turn in-word inch symbols into apostrophes or not? Is altering the algorithm worth the effort?
I’m undecided myself. On the one hand it’s: yes. You probably use way more apostrophes in phrases like ‘it’s’, ‘don’t’ etc. (even in German) than part-sarcastic or part-quoted words.
Then again, if you need an apostrophe you will easily get it by typing alt+shift+'. But › and ‹ are hidden behind alt+shift+n and alt+shift+b (on German keyboards) which is not very obvious. Maybe this would make people using sarcastic inverted commas become even more sarcastic!
Then again – yes, the pendulum swings back one more time – you usually use double inverted commas in German and single ones only inside of direct speech (marked by double inverted commas). This means: Only Germans (and people using languages that have the same or a similar way of inverted comma usage) being sarcastic just in parts of words that are included in direct speech have to search for the key combination of › and ‹.
Everyone else would profit from the slightly changed replacement routine for the '. (For English speaking/writing people it would be plain neutral: a character would be replaced by the same character.)
Anyway, if you decide to alter the replacement routine to give at least the in-word apostrophe its due, each language’s typography settings should be extended to contain the (manually changeable) apostrophe character.
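As an illustration of what such an extended setting could look like, here is a small Python sketch. Everything in it is hypothetical: the `Typography` class, its field names and the function `replace_inword_apostrophes` are made up for this example and do not describe Scrivener's settings. The rule is the alphanumerical-'-alphanumerical one from above.

```python
from dataclasses import dataclass


@dataclass
class Typography:
    """Hypothetical per-language settings, all manually changeable."""
    single_open: str
    single_close: str
    apostrophe: str = "\u2019"  # kept separate from single_close on purpose


# German with guillemets: the closing single mark is clearly not an apostrophe.
GERMAN_GUILLEMETS = Typography(single_open="\u203a", single_close="\u2039")


def replace_inword_apostrophes(text: str, typo: Typography) -> str:
    """Turn a straight ' between two alphanumeric characters into an apostrophe."""
    chars = list(text)
    # The first and last character can never sit between two others.
    for i in range(1, len(chars) - 1):
        if chars[i] == "'" and chars[i - 1].isalnum() and chars[i + 1].isalnum():
            chars[i] = typo.apostrophe
    return "".join(chars)
```

A ' at the beginning or end of a word is deliberately left alone, since that is the undecidable case. The sketch also gets the part-sarcastic glued word wrong, exactly as discussed: the ' in the middle of such a word would become an apostrophe and has to be corrected by hand.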
It is important that a text has a consistent look.
Quite often parts of a text are of heterogeneous origins. You put stuff into your project’s research folder, you copy and paste portions from it into your text.
It would be great if there was an ‘apply typography’ menu item (and button!) that would change all inverted commas, dashes etc. to the text’s default.
Maybe this could additionally be included in ‘convert to document style’. Maybe the latter should have a box with options for what the conversion affects.
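One way such an ‘apply typography’ command could work, sketched in Python under my own assumptions: first strip all the typographic variants back to their typewriter forms, then run the normal replacement routines with the text's defaults. Only the first step is shown here. The name `straighten` and the mapping table are made up for illustration, and the table only covers the characters mentioned in this post.

```python
# Map typographic characters back to their typewriter equivalents.
STRAIGHT = {
    # double quotation marks: goose feet, English, guillemets
    "\u201e": '"', "\u201c": '"', "\u201d": '"', "\u00bb": '"', "\u00ab": '"',
    # single quotation marks and the apostrophe
    "\u201a": "'", "\u2018": "'", "\u2019": "'", "\u203a": "'", "\u2039": "'",
    # dashes: en-dash to hyphen, em-dash to double hyphen
    "\u2013": "-", "\u2014": "--",
}
TABLE = str.maketrans(STRAIGHT)


def straighten(text: str) -> str:
    """Replace all known quotation mark and dash variants by plain ones."""
    return text.translate(TABLE)
```

The catch is that this throws information away. A hand-corrected ‘distance dash’ between two town names, for example, comes back as a hyphen after the round trip, because it touches letters on both sides, so a real implementation would probably need to ask before touching dashes.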
Christ. I hope you don’t take offence if I don’t read this properly until next week. I have 1.1 to get out, the web page to update, the release notes to write… But I will read it. Promise.
Best,
Keith
I glanced through most of this, but can’t claim to have read every word.
But it does sound like you should take a look at SmartyPants, John Gruber’s program to apply smart typography to text. It is used to provide smart typography with MultiMarkdown, but is entirely independent of his Markdown project.
You might be interested in seeing what he did. Additionally, I included alternate versions of SmartyPants in MMD for various languages. You may or may not have use for the software, but may find additional support for the practical algorithms.
I have not compared SmartyPants and its options with the built-in options in Scrivener, since all of my Scrivener documents are also MMD documents. (sort of the exact opposite of Keith in that regard, I suppose)
A fascinating post. I learned a lot about German punctuation, which is often quite different from British, as British is from American. For example, in American usage at least, band names don’t appear inside quotation marks, whether single or double. Only a song title would appear as “Lucy in the Sky with Diamonds,” and an album title would be The White Album.
As I see it, you are asking for localization/localisation of Scrivener into a writing program that honors German punctuation. Usually that happens because a highly skilled volunteer offers to do the coding. You seem to have the requisite knowledge.
Anyway, thanks for the post. I read it all. But then I once wrote 8 pages of instructions to editors on the proper transcription of hyphens in a 19th-century manuscript.