When I run my Ruby splitter script from the command line, the input file is read as UTF-8. When I run it from the Processing tab in Scrivener, the file is read as US-ASCII, and the latter does not work.
Adding the line
[code]ARGF.set_encoding(Encoding::UTF_8)[/code]
before reading the file solves the problem.
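For anyone who wants to see it in context, here is a minimal self-contained sketch of that fix; the temp file below merely stands in for the compile output Scrivener passes to the script:

```ruby
require "tempfile"

# Stand-in for the compiled file Scrivener hands to the script.
tmp = Tempfile.new(["compile", ".md"])
tmp.write("A line with a typographic quote: \u201Chello\u201D\n")
tmp.close

ARGV.replace([tmp.path])            # simulate the file argument on the command line
ARGF.set_encoding(Encoding::UTF_8)  # force UTF-8 before any read happens
text = ARGF.read

puts text.encoding                  # prints UTF-8 even when the environment says US-ASCII
```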
I don’t see anything in Ioa’s write-up about US-ASCII vs UTF-8. Did I miss it, is there a setting I missed, or something else?
That’s a good point. I’m old-school enough that I rarely pass through anything but ASCII, so it’s something that slipped past my basic testing. With the user manual I use the same directive you do in the post-processing script.
It’s a quirk of the NSTask environment, best I know, and from what I’ve gathered asking around, not easily solvable from the Cocoa side. I’ve run into the problem in several different programs that offer command-line integration, such as Keyboard Maestro and Hazel. It’s a bit strange considering that the whole Mac is UTF-8-based, and its default interactive shell environments are as well, but yes, definitely something to be aware of.
Well, I’m not sure what’s in there but the fact that it didn’t work in US-ASCII makes me think the Scrivener output file actually contains UTF-8 characters. I suppose I may have typed something in.
I also noticed that when the file was read as ASCII, its reported size was much larger than when read as UTF-8; I suppose each byte of a multi-byte character gets counted separately.
US-ASCII is kind of deprecated anyway, isn’t it?
It was a miracle that I figured it out. I’m not sure how it came to me to check.
Most likely typographic punctuation. It’s not strictly a necessity, since MultiMarkdown and Pandoc both convert the ASCII equivalents automatically (I tend to turn all of these features off in the Corrections pane myself), but it’s also possible to set up the compiler to convert these back to ASCII characters in the Transformations compile format pane.
Further confusing the matter is that UTF-8 is backwards-compatible with ASCII. So long as you don’t use any characters outside the first 128 code points, a UTF-8 file will process through ASCII-only tools as if it were plain ASCII.
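You can see both halves of that from Ruby itself: a pure-ASCII string is byte-for-byte identical in either encoding, while anything outside that range refuses to convert, raising the same UndefinedConversionError that shows up later in this thread:

```ruby
# A pure-ASCII string has identical bytes whether tagged US-ASCII or UTF-8.
ascii = "plain text".encode(Encoding::US_ASCII)
utf8  = "plain text".encode(Encoding::UTF_8)
puts ascii.bytes == utf8.bytes   # prints true: byte-for-byte identical

# A character outside the ASCII range has no US-ASCII representation.
begin
  "caf\u00E9".encode(Encoding::US_ASCII)
rescue Encoding::UndefinedConversionError => e
  puts e.class                   # the conversion fails on the é
end
```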
I don’t know about that, for maximum compatibility it is still a safe target, but like I say it is a bit odd that NSTask operates from that assumption in this day and age. Short of Apple changing it, it’s something we have to work around.
Good point on the punctuation, I think I had smart quotes still turned on for a while, I remember turning them off while trying to figure out why I couldn’t type [code]
[/code] and have it not turn into em-dashes.
I do wonder why (and how) it gets set in Processing. Should I file a bug report?
Thanks!! My conversion and little booklet are coming along nicely and your help has been invaluable!
I had a similar problem using a Ruby gem in the Processing pane, but my Scrivener output was plain text. It seemed to be caused by the gem reading some third-party files that work fine in my normal environment but caused errors when run from Scrivener. My solution was to switch from invoking the gem directly to invoking it through Ruby with the -K flag, with the argument u to enforce UTF-8:
I also tried editing the source code and rebuilding the gem (something I’ve done with it before for other reasons), but that didn’t seem like a good idea: other people are more likely to use the real gem than my version, and I didn’t really know what the implications were of forcing UTF-8 in other circumstances. Anyway, it did work, though the change I made wasn’t the same as yours; I added an encoding option as a second argument to File::read:
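Roughly like this; a minimal self-contained sketch, since the real call site is inside the gem’s source:

```ruby
require "tempfile"

# Stand-in for one of the gem's third-party data files.
tmp = Tempfile.new("data")
tmp.write("na\u00EFve caf\u00E9 in UTF-8")
tmp.close

# Without options, File.read tags the string with the environment's default
# external encoding (US-ASCII under NSTask); an explicit option overrides it:
text = File.read(tmp.path, encoding: Encoding::UTF_8)
puts text.encoding   # prints UTF-8
```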
and it’d be really nice to have this as an option. I see Scrivener mainly as a writing tool, helping organize ideas, and large writing or research projects. This is where it’s infinitely better than anything else out there, and incredibly versatile in how the individual parts of a project can be organized/structured, allowing for a large number of use cases. Now for those of us who like to use other programs for further processing the text or sharing it, or posting it somewhere, etc., it’d be really neat if we were able to keep some of that versatility after compiling.
A few years ago I asked on this forum about exporting to multiple .tex files (viewtopic.php?f=21&t=35898) that I could then \include{} in my LaTeX project
where Scrivener’s output is the individual chapter .tex files (or even sections, or whatever the user specifies).
My dream scenario would be an enhanced sync function in Scrivener that would let me keep synchronized .tex, .docx, or other files of the components of my project in designated folders. Currently I have set up different compile settings for Markdown conversion using Pandoc. If I could execute them simultaneously with a single keyboard shortcut (sync) and then magically export separate .tex files to folder A, .docx files to folder B, and so on, that would be ideal.
Maybe that’s something to consider for Scrivener 4.
I am using AmberV’s great little splitter script included in the reply to the original question above.
I’m just stumbling into the ASCII/UTF-8 encoding problem discussed in several posts here. I am using IPA characters, such as this one: ʧ, which have apparently been the culprit. I tried what RonJeffries suggested.
However, this did not help in my case.
I am getting the following:
[code]
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/delegate.rb:341:in `write': U+00E9 from UTF-8 to US-ASCII (Encoding::UndefinedConversionError)
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/delegate.rb:341:in `print'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/delegate.rb:341:in `block in delegating_block'
	from /var/folders/11/yyzl685d0h11s7ctdldn02pc0000gn/T/my-script:26:in `block in <main>'
	from /var/folders/11/yyzl685d0h11s7ctdldn02pc0000gn/T/my-script:13:in `each'
	from /var/folders/11/yyzl685d0h11s7ctdldn02pc0000gn/T/my-script:13:in `<main>'
[/code]
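Worth noting: the traceback fails inside write, i.e. on the output side, so forcing the input encoding alone isn’t enough; whatever stream the script writes to has to accept UTF-8 as well. A minimal sketch of that output-side fix (the file handling here is only illustrative):

```ruby
require "tempfile"

tmp = Tempfile.new(["chunk", ".md"])
tmp.close

# Opening the output with an explicit UTF-8 mode keeps print/write from
# transcoding into a US-ASCII default and raising UndefinedConversionError:
File.open(tmp.path, "w:UTF-8") do |f|
  f.print "caf\u00E9\n"   # U+00E9, the exact code point from the traceback
end

puts File.read(tmp.path, encoding: Encoding::UTF_8)
```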
Before I try to parse all this for myself, can I ask: is this the general method I should think about using if my use case is the following?
I have a Scrivener project with several rich-text documents that I want to become separate Markdown documents on my machine, which I can later compile with Pandoc into separate PDFs. Each document in the Scrivener project has its own metadata section at the top, delimited by two instances of ---.
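For reference, a metadata section of that shape is the standard Pandoc YAML block; the field names here are only examples:

```yaml
---
title: First Document
author: Example Author
---
```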
What is my best route to ending up with a markdown file for every document in the Scrivener project?
Well, one thing to consider right off the top, if you’re heading toward Pandoc with the intention of generating individual HTML files, is installing Pandoc 3 and taking a look at its new ‘chunkedhtml’ output format. In conjunction with the --split-level=n flag, which lets you choose which heading level to split into documents, that may be all you need. HTML is all it supports at the moment, though; I hope to see it broaden into a more general-purpose capability in the future.
Otherwise, let me know what further questions you have after going through the thread, because that workflow is covered and demonstrated with samples, along with quite a few follow-up discussions on customisation and usage.
All right; yeah, you’ll find a line in the sample project’s script where you can slot that in, if you want to essentially go straight to a series of PDFs out of the compiler. If not, it can be commented out.
The one other tweak I can think of that you might need to do is how it handles metadata. It presumes each file will be using the same metadata block, as I recall, and so it captures that initially and then prepends it to each output file. It should be a simple matter to remove that insertion and let the binder items themselves drive their own metadata.
Thanks so much. I downloaded the sample project, and it split the documents under ‘Red book’ and ‘Black book’ just as you intended, as separate tex files. But I couldn’t seem to make the changes needed to compile and split several project documents into markdown files. Would I need to edit the script to accomplish this?
Yes, this was a very single-purpose demonstration. I think a better script-plus-project example would push more of the configuration into the project and make it easier to switch post-processing on or off, perhaps by using the extension more intelligently, allowing for mixed content out of one compile command. That is, the extension printed by the Section Layout prefix tab could inform the script whether post-processing should be used for that specific chunk, and if so, to what file type.
I’ve never had time to come back to this and make it a more universally useful tool though.
But for your specific purpose, try the attached script instead:
The metadata block capture and prepending into each sub-file is removed, since you stipulate putting metadata into each section that is meant to become a file.
The post-processing command is removed from the final loop (where footnotes and image references are added to each file), and replaced with a simple command to move the temp file into the working folder.
As for what to change in the project:
1. Edit the “MultiFile Output” format.
2. Change the Section Layout: New File: Prefix tab to end in .md instead of .tex.
3. Likewise fix the “references.tex” entry in the Text Layout pane.
4. In Processing, tick Use Pandoc syntax and update the embedded script with the above content, or switch to external execution.
That should work as-is, right? Just drop it in as the last line of the script and put backticks around it to make it a system call.
How I would do it, though, is more like the first script, where the Pandoc call is placed into the final loop, replacing the original sample MultiMarkdown call, rather than (or maybe in addition to) moving the .md file from the tmp folder to the output folder. You’d have to rework things a bit, though, since the filename variable includes the extension. Something like this:
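A hedged sketch of that loop-body change; the variable names here (filename, tmpfile, outdir) are stand-ins for whatever the sample script actually uses, with illustrative values so the sketch runs on its own:

```ruby
# In the real script these come from the final loop; the values are stand-ins.
filename = "chapter-one.md"            # per-chunk name, extension included
tmpfile  = "/tmp/scriv/chapter-one"    # temp file the splitter wrote
outdir   = "output"                    # compile destination folder

# Strip the extension so the PDF can be named after the chunk, and hint the
# input format with -f markdown since the temp file's name confuses Pandoc:
base = File.basename(filename, File.extname(filename))
cmd  = ["pandoc", "-f", "markdown", tmpfile,
        "-o", File.join(outdir, "#{base}.pdf")]
puts cmd.join(" ")
# system(*cmd)   # uncomment to actually run Pandoc on each pass of the loop
```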
Of note, I’m hinting markdown input since the tmpfile naming scheme confuses Pandoc. It does the right thing, falling back to Markdown, but better to let it know that’s the source format.
But like I say, I don’t see why the brute force method wouldn’t work either.