Improved Spanish dialogue handling

Hi Scrivener team,

You may already know this, but to clarify my suggestion I’ll briefly explain the context, just in case.

In Spanish (and, I guess, in other languages too), dialogue in narrative is represented differently than in English. In English you’d say:

“Hi,” he said. “Long time no see.”

In Spanish, however, dialogue is usually represented by a new paragraph starting with an em dash (which I replace in the following example with a single hyphen):

-Hola -dijo él-. Hace tiempo que no nos veíamos.

Narrator comments are placed between em dashes as well, which act as a sort of opening and closing parenthesis. In the example above, -dijo él- is a narrator comment (the “he said” part).

The problem arises when Scrivener compiles the dialogue. It’s not uncommon for either the opening or the closing em dash of a narrator comment to be left alone at the end or beginning of a line, as it is considered, I presume, a symbol and not a punctuation character like a true parenthesis. So you may end up with several paragraphs that look like this:

-Estoy seguro de que este problema tiene solución -
dijo-. ¿No crees?

-Estoy seguro de que este problema tiene solución -dijo
-. ¿No crees?

Instead of what you find in actually published material, like this:

-Estoy seguro de que este problema tiene solución
-dijo-. ¿No crees?

I suppose that if Scrivener treated the em dash as a true punctuation character, just like the comma, the period, the colon, etc., it would never be left alone and would always be treated as an “in-word” character that can’t be split off.

Note that this is different from the true hyphen, which actually is allowed to split words in Spanish and, as such, is welcome at the end of a line.

Is it possible that you consider supporting this behaviour?

Many thanks in advance.

Whether or not a character is acceptable for breaking is up to the character itself. That is a feature of typography, and there is special meta-data built into the character for controlling that. We have for example the non-breaking hyphen (U+2011) which is designed for cases where hyphenation should not break lines. There are also codes to control breaking, such as the no-break space (U+00A0), and what will probably work best for you, the narrow no-break space (U+202F) or the word joiner character (U+2060).
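For reference, the characters mentioned above can be produced from their code points in most programming languages. The snippet below is only an illustrative sketch in Python, not anything Scrivener itself runs; the names `BREAK_CONTROLS` and `describe` are made up for this example.

```python
import unicodedata

# Code points named in the post above. The dict name is hypothetical.
BREAK_CONTROLS = {
    "non-breaking hyphen": "\u2011",
    "no-break space": "\u00a0",
    "narrow no-break space": "\u202f",
    "word joiner": "\u2060",
}


def describe(char):
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


if __name__ == "__main__":
    for label, char in BREAK_CONTROLS.items():
        print(f"{label}: {describe(char)}")
```

Looking a character up by its official Unicode name this way is a quick check that the code point typed is the one intended.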

In my testing, placing a narrow no-break space around an em dash will cause it to “stick” to the text it is adjacent to, forming the wrapping model you are looking for. The way I would go about doing this is with regex Replacements. I have attached a couple of compile presets that contain a pair of Replacements (one with the narrow no-break space, the other with the zero-width word joiner) that you can copy and paste into your real compile settings. These will look for an em dash with any character other than a space on one side, and insert the joining character between that character and the em dash, “gluing” them together.
spanish-em-dashes.tar.gz (8.12 KB)
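For anyone who cannot open the presets, the idea behind the Replacements can be sketched outside Scrivener. The Python below is a rough approximation of the approach described above, not the actual contents of the attached .plist files; the function name `glue_em_dashes` and its `glue` parameter are assumptions made for this example.

```python
import re

EM_DASH = "\u2014"
NARROW_NO_BREAK_SPACE = "\u202f"  # visible as a thin space
WORD_JOINER = "\u2060"            # zero width, invisible


def glue_em_dashes(text, glue=NARROW_NO_BREAK_SPACE):
    """Insert `glue` between each em dash and any adjacent non-space character."""
    # Em dash directly preceded by a non-space character: glue goes before the dash.
    text = re.sub(rf"(?<=\S){EM_DASH}", glue + EM_DASH, text)
    # Em dash directly followed by a non-space character: glue goes after the dash.
    text = re.sub(rf"{EM_DASH}(?=\S)", EM_DASH + glue, text)
    return text


if __name__ == "__main__":
    sample = "\u2014Hola \u2014dijo \u00e9l\u2014. Hace tiempo que no nos ve\u00edamos."
    # ascii() makes the otherwise invisible inserted characters show up as escapes.
    print(ascii(glue_em_dashes(sample)))
    print(ascii(glue_em_dashes(sample, glue=WORD_JOINER)))
```

Note that the narrow no-break space adds a small visible gap next to each dash, while the word joiner adds none, so the second variant is probably closer to how published Spanish dialogue looks. Running the function twice on the same text would insert the glue twice, so in this sketch it should be applied only once, at output time.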

Thanks, AmberV. I get the general idea, but I don’t know how to use those .plist files. Since your solution requires RegExp support I suppose it’s well suited for the Mac version since there is no RegExp support yet in the Windows version. Am I right?

Anyway, I’m trying to type the Unicode characters you suggested directly into the text beside the em dash, but I get some symbols instead. I’m using Windows XP and I also tried the character map, but it does not contain those Unicode characters, so it does not help either.

I’ll keep trying.

Blast, for some reason I had in my mind you were a Mac user and didn’t double-check. Those plists won’t work, and the regex trick in general won’t work until we get a good library built in (probably for the big update, as it is rather niche). It also seems that, as you report, this particular Unicode control character isn’t broadly supported on Windows. I tried on Win 7 as well with an RTF file I’d compiled from the Mac and got generic blocks. However, if I insert a Narrow No-Break Space manually it appears to work fine. That may not work in XP, however.

I think I found something that will work for you, though: the horizontal bar instead of the em dash. They look virtually identical, but the bar has the characteristic of being “sticky” on both ends, meaning it will not become a preferred break point. The code for it is ALT-2015 instead of ALT-2014 (which is the em dash). I couldn’t test inputting it that way as my numpad doesn’t work with the VM, but when inserting it with character map it worked, so creating a custom substitution in Options should do the trick.

To be a purist about it, if you can do the narrow no-break space + em dash trick with custom substitutions, that would be best. Horizontal Bar isn’t technically meant to be used as an em dash even though it looks like one, but on the other hand it is only one character instead of two or three.
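The horizontal bar substitution amounts to a one-for-one character swap. As a minimal sketch of that swap in Python (the function name `use_horizontal_bar` is hypothetical, and this only illustrates the substitution, not how Scrivener implements it):

```python
EM_DASH = "\u2014"         # U+2014
HORIZONTAL_BAR = "\u2015"  # U+2015


def use_horizontal_bar(text):
    """Replace every em dash with a horizontal bar; ordinary hyphens are left alone."""
    return text.replace(EM_DASH, HORIZONTAL_BAR)


if __name__ == "__main__":
    sample = "\u2014Estoy seguro \u2014dijo\u2014. \u00bfNo crees?"
    print(ascii(use_horizontal_bar(sample)))
```

Because the true hyphen (U+002D) is untouched, words can still be hyphenated at line ends as usual.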

It works like a charm! I created a single compile time substitution to replace the em-dash with the horizontal bar, as you suggested.

Thank you very much!