Request JSON export capability

DeeLew · January 20, 2023, 5:55pm

JSON is possibly the most ubiquitous text file format used today.
It would be fantastic to be able to export the text and formatting contents of manuscripts, etc. directly to JSON format, for further programmatic processing.

This would only make your fine package that much better.

By the way, currently we export to XML and then do an XML-to-JSON conversion.
The styling information is presented in a “grouped” manner that makes reconciliation/association with the related text cumbersome.

Thanks, in advance.
And keep up the great work!

Cheers!

nontroppo · January 22, 2023, 1:44am

I didn’t even know Scrivener could export as XML… I think some sort of structured plain text would be good for export.

For compile (more tweakability than export) you could build some sort of intermediate structured format (styles + section layouts both allow prefix / suffix injection, regex replacements could transform in-text markers etc.). Probably you could get the compiler to output 99% JSON, with some issues regarding commas for list elements etc. that may require a bit of post-processor cleanup.

DeeLew · January 22, 2023, 1:53pm

Hi nontroppo,

Thanks for your remarks.
OPML exports are in the XML format.
(I have already developed infrastructure to process the XML contents, including producing JSON. A direct method would be preferable.)

Cheers!

nontroppo · January 23, 2023, 12:27am

Ah, yes OPML. JSON has a lot of cool tooling available so I think it would make a great structured output for further processing, so +1.

AmberV · January 23, 2023, 5:14pm

Better XML output

As far as that goes, as noted above Scrivener can execute a post-processing chain for you, essentially resulting in a direct method, as you put it. It’s one where you have ultimate control over the conversion rather than hoping that whatever assumptions we make are what you require. Refer to §24.21, Processing, in the user manual PDF for further information. So at the very least you may be able to automate what you’ve been doing by hand.
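
As a minimal sketch of one link in such a chain, here is a small Python script that checks whether a compiled file parses as JSON. It assumes the path to the compiled file is handed to the script as its first argument; how that is wired up in the Processing pane is not shown, and `check_json` is just a name made up for illustration.

```python
import json
import sys


def check_json(text):
    """Return (True, parsed_data) for valid JSON, else (False, error_message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the spot where the parser gave up.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the compiled file's path is the first command-line argument.
    with open(sys.argv[1], encoding="utf-8") as handle:
        ok, result = check_json(handle.read())
    print("valid JSON" if ok else f"invalid JSON at {result}")
    sys.exit(0 if ok else 1)
```

A non-zero exit code on failure makes it easy to spot a broken compile before the file goes anywhere downstream.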

The trick though is that the compiler itself does not generate OPML natively, so you’d be needing to start from a different point to get to where you want to go with that method. It’s worth considering MMD → HTML. It’s a different DTD, but you’d at least be starting with the same tech, one with unfathomably more support than OPML—and more importantly the formatting would be a lot more expressive. OPML is essentially just a plain-text dump into attributes.

Using Markdown as a compile file type doesn’t necessarily mean having to start with Markdown either, in case you’re worried about that. The compiler can convert a fair amount of formatting itself, and with a little prompting on our end via styles and section layouts—well, we can do quite a lot. But if you don’t mind writing in Markdown, I think you’ll be pleased with what Scrivener offers along those lines—even to the point of maybe skipping some of the below and just building a Lua output for Pandoc that does all of this from an abstract source.

Native output

As noted by nontroppo above, Scrivener itself is quite capable of generating output as syntax. We put a lot of effort and design into that side of Scrivener so that people could come along and build custom file type converters, and so that we don’t have to spend vast amounts of time doing so ourselves and end up cluttering the Compile for menu beyond reason. Our design target for this was valid XML, which is going to be enough for most other syntax rules as well.

Between having Pandoc around and its dozens of supported file type outputs, and Plain-Text with its DIY syntax generation capabilities, there is little you can’t do yourself, or orchestrate the doing of with utilities.

To circle back to the original request for one moment though: while you are absolutely right that JSON is like supporting XML export these days, and that it would benefit developers and the like who use Scrivener, the main problem I see with that approach is what that even means beyond the basic structure. Would an outline chunk be one long string with \n-separated paragraphs, or would it be an array of paragraphs containing metadata about para style and such for each line? What should be used to mark up inline formatting within paragraphs? Or should it be more hardcore like Pandoc’s JSON AST, which is essentially unusable for a human as each word is considered a discrete structural component? I don’t think there are going to be universal answers to that question. JSON is a way of structuring data, not defining the application of that data, or how it should be implemented, and I’m not aware of there being any kind of common “Book JSON” specification.
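
To make the first two alternatives concrete, here is a rough Python sketch that builds the same two paragraphs both ways. The key names (`title`, `type`, `content`) and the sample text are placeholders for illustration, not a proposed specification.

```python
import json

# Two sample paragraphs, each paired with a paragraph style name.
paragraphs = [
    ("normal", "It was a dark and stormy night."),
    ("blockquote", "Quoted material goes here."),
]

# Shape 1: the whole chunk as one string, paragraphs separated by \n.
as_string = {
    "title": "Scene a",
    "content": "\n".join(text for _, text in paragraphs),
}

# Shape 2: an array of paragraph objects, each carrying its own style.
as_array = {
    "title": "Scene a",
    "content": [{"type": style, "content": text} for style, text in paragraphs],
}

print(json.dumps(as_string, indent=2))
print(json.dumps(as_array, indent=2))
```

Both are perfectly valid JSON, yet a consumer written for one cannot read the other, which is the crux of the question.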

Ultimately I fear it would be a “good for one person” type result, either that or so laden with compile pane options that we might as well just stick with what we’ve got and leave all of the answers to these questions up to you. With the exception of a little },{" type glue in between things—what more could we be doing that doesn’t cross over into presumption?

And the glue is the easy part—so in a way we’re already fulfilling the feature request to the best level we can.

Practical example of a JSON compile format

json_output.tar.gz (68.7 KB)

To that end, here is a POC that in my testing gets us to 100% valid JSON output, and that contains useful markup information at a granularity of paragraphs. Of note:

I’ve employed a few workarounds to get around some Windows bugs that involve how Section Layout prefix/suffix strings are merged with paragraph style prefix/suffix strings, when adjacent.
Info for Windows users...
- I’ve had to add a newline after the last paragraph of each section. Without it, the final paragraph fails to acquire the styled prefix/suffix it needs to wrap the paragraph in JSON structure. With it, the final paragraph becomes the last empty one, which is ignored anyway by the compiler, as it should be.
  
  The implication to be aware of though is that one would need to be more careful with how they format content in the text editor—to be mindful of how each line is going to be transformed.
- One that I couldn’t fix is that Windows doesn’t handle custom date-time formats very well (for the created/modified time stamps). The output is kind of garbage, so if one were to be using Windows I’d suggest just using one of the stock placeholders instead.
Paragraph styles that aren’t body text would need to be added to the format’s Styles pane, and given a similar prefix/suffix treatment to regular paragraphs. I’ve provided one example of that using Block Quote. You will note that in the output of the first subsection, “Scene a”, we get two paragraphs in the array with their type set to “blockquote”.
Inline styles are a conundrum. One could argue that paragraphs containing inline styles should be broken down into another array, with each particle of that paragraph denoted for its semantic intent. That may be a bit much for the compiler to handle though as it would require a logical revision to the container from a simple hash key associated with a string for content to an array of typed strings, meant to be glued together by the post-processor:
Sample code...
```
"content":
[
	{
		"chartype": "normal",
		"content": "Beginning of the paragraph "
	},
	{
		"chartype": "emphasis",
		"content": "some text stated emphatically"
	},
	{
		"chartype": "normal",
		"content": ", the rest of the paragraph..."
	}
]
```
We could maybe do that with Scrivener all by itself, but I bet it would really be easier to just mark up the paragraph string and post-process it somehow.
On that, one approach I would favour even though it adds dependencies to the post-processing chain, is using Markdown to mark up inline formatting in the strings. The advantage is that with Pandoc you can take a common source string and go pretty much anywhere you want with it. But if you have a certain single-format processing requirement, like HTML, then I suppose it would be just as easy to bake that into the compile settings. You wouldn’t be stuck at least, since all of this markup is coming out of the compile settings one way or another.
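
For what it is worth, the gluing step for the array-of-typed-strings idea could be sketched in Python as follows, here producing a Markdown string so that the two approaches meet in the middle. The `MARKDOWN_WRAPPERS` table and the `runs_to_markdown` name are hypothetical, and the `chartype` values are simply the ones from the sample above.

```python
# Hypothetical mapping from "chartype" values to Markdown prefix/suffix pairs.
MARKDOWN_WRAPPERS = {
    "normal": ("", ""),
    "emphasis": ("*", "*"),
}


def runs_to_markdown(runs):
    """Glue a list of typed text runs back into one Markdown string."""
    parts = []
    for run in runs:
        # Unknown chartypes fall back to plain text rather than failing.
        prefix, suffix = MARKDOWN_WRAPPERS.get(run["chartype"], ("", ""))
        parts.append(f"{prefix}{run['content']}{suffix}")
    return "".join(parts)


runs = [
    {"chartype": "normal", "content": "Beginning of the paragraph "},
    {"chartype": "emphasis", "content": "some text stated emphatically"},
    {"chartype": "normal", "content": ", the rest of the paragraph..."},
]

print(runs_to_markdown(runs))
# Beginning of the paragraph *some text stated emphatically*, the rest of the paragraph...
```

Swapping the table for HTML tags (or anything else) changes the target without touching the source data.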

Either way, this can largely be done with Styles, using the prefix and suffix values (see “Emphasis” in the Styles pane). But another compile format pane to be aware of is the Markup pane, available to TXT output alone. I’ve only provided one simple example of adding Markdown links to internal cross-references (that’s the trickiest one as we are using the <$linkID> placeholder in conjunction with its usage in the Section Layouts to establish the value of the id key for each section chunk).
This all is a pretty simple example, but you can see that by using Section Types, and purpose-built Layouts, you could create additional structural designs beyond the array-of-paragraphs example. Since the paragraph style is being applied via the Section Layout itself, each layout can use whatever paragraph style it needs to format paragraphs to its intent.

So with all of that demonstrated, if what I was saying earlier didn’t make sense, hopefully this better illustrates what I mean about the implementation being too wide-open once we get past the bit about satisfying JSON syntax validity. But let me know if I’m missing something.

DeeLew · January 23, 2023, 5:46pm

AmberV,
Wow!
Thank you for the extensive and impressive explanation and suggestions.
Will need time to review it carefully.
Will have to explore/learn quite a bit more of Scrivener to do so.
Once time permits.
By the way, one of the target outputs is pure JSON, to be used for delivery on multiple platforms.
Thanks again!
Cheers!

DeeLew · January 24, 2023, 2:12pm

AmberV

Yes, in the following, I think you are correct -

“To circle back to the original request for one moment though: while you are absolutely right that JSON is like supporting XML export these days, and that it would benefit developers and the like who use Scrivener,…”

That is exactly the main point.

I convert the XML produced using your OPML export facility to JSON, using a variety of XSLT templates.
The resulting JSON is then used very productively downstream.

OPML + XSLT Transformation → JSON

A very helpful first-step (and extension to Scrivener) would be to produce the same data produced by the OPML, only present it in JSON. Just that would be helpful.
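
For anyone following along without an XSLT toolchain, a bare-bones version of that same conversion can be sketched in Python. This is not my actual pipeline, only an illustration: it copies every attribute of each `<outline>` element as-is and nests child outlines under a made-up `children` key. The sample OPML, including the `_note` attribute, is an assumed example rather than real Scrivener output.

```python
import json
import xml.etree.ElementTree as ET


def outline_to_dict(element):
    """Turn one <outline> element into a dict of its attributes plus children."""
    node = dict(element.attrib)
    children = [outline_to_dict(child) for child in element.findall("outline")]
    if children:
        # "children" is a hypothetical key name chosen for this sketch.
        node["children"] = children
    return node


def opml_to_json(opml_text):
    """Convert the <body> of an OPML document into a JSON array of outlines."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    top_level = [] if body is None else body.findall("outline")
    outlines = [outline_to_dict(item) for item in top_level]
    return json.dumps(outlines, indent=2, ensure_ascii=False)


# Assumed sample input; attribute names in a real export may differ.
SAMPLE = """<opml version="1.0">
  <head><title>Draft</title></head>
  <body>
    <outline text="Chapter 1">
      <outline text="Scene a" _note="First scene text."/>
    </outline>
  </body>
</opml>"""

print(opml_to_json(SAMPLE))
```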

(I suspect if XML never existed, and the JSON format existed, Scrivener might have created the JSON to be helpful. I think doing so, given that the OPML feature already exists, in no way lessens the attractiveness of having such a JSON export option.)

Yes, there are many things that COULD be done, and it could look daunting. But it might look daunting BECAUSE of the multitude of options and possibilities that could exist.

In other words, do not fail to create an automobile predicated on the fact that there are so many ways automobiles can be designed, that conceiving the one automobile for every need is too daunting. Like Ford - Dare to just make a black one, with basic features, and see where the market takes you.

Cheers!

kewms · January 24, 2023, 6:29pm

Isn’t that what Ioa’s sample Compile format just did? The whole point of Scrivener’s Markdown engine is that we don’t have to create custom converters for every format that users might want.

michaelhendrsn · January 24, 2023, 7:02pm

A fascinating discussion. I would have thought that Scrivener is more akin to a universal scalable engine that can fit in many cars by applying the right kit. And that the kits come from many others who work on body mechanics.
So long as Scrivener does not produce the final product (as a publisher might) they can be a partner for many writers, without being seen to favour one preference or another. The kit producers, on the other hand, get to favour particular approaches, styles and purposes, something that empowers the discussions that go on here and elsewhere. That said, I do envy the skills that @AmberV shows in being able to demonstrate “kit” options for so many “body” types.

DeeLew · February 11, 2023, 5:33pm

Just had a moment…will share these remarks.

What I envisioned was a simple EXPORT menu option where, as in the case of OPML, one merely selects “JSON”.

As was put above, “OPML is essentially just a plain-text dump into attributes.”

The Wish List request was/is for a similar “text dump” into attributes into a JSON formatted file.

Just to be clear regarding my initial post.

AmberV · February 11, 2023, 6:18pm

Ah okay. Initially the request was for all formatting data to be included as well, which is why I came up with a way of exporting that in my solution above, so long as styles are used (which I think is a sensible requirement for this kind of structured output).

It would be fairly simple to make that less complex if you wanted; just an array of plain-text paragraphs with no markup inserted into the strings.

As for an export though, off the menu, I’m perhaps missing why that would need to be a requirement in order to satisfy this. Bear in mind that with compile you can set it to only compile your selection as a default, so the result is really close to what you would get with a hypothetical exporter (save the latter would necessarily be dramatically less flexible).

I see I failed to respond to this before. I think this is not quite the point that I was trying to make. Perhaps it is not known that OPML support in Scrivener is in fact full integration of a somewhat standard file format that is used by various mind-mapping and other similar outlining tools. Adding support for that means giving people a way to, for example, export their Draft folder to FreeMind, tinker with it, and then bring the adjustments back into their binder.

You don’t have to be a programmer to benefit from that, and supporting this file format is more akin to supporting FDX (another XML format) for scriptwriters to integrate Scrivener with other tools.

So that’s where I was coming from with these questions. You’re using it as a generic output-only XML dump—and that’s perfectly fine (honestly I’d use something more expressive though, like Markdown-HTML or even FDX, or hey even DOCX or ODT if you want to get ambitious and have loads of formatting data to work with), but that’s not why the feature is there.

When I was asking what a “Book JSON” export looks like, I was asking if there is any kind of formal, or even conventional specification that would make sense to work toward. Otherwise we’re just making up stuff and hoping it’s useful—while ignoring the fact that anyone can build a customisable JSON exporter using the compiler and get precisely what they need from it.

Does that make more sense?

We have a system that can build complex file format generators already—demonstrated above. Why would we make a super basic export-only format with few options, when we have that? That’s the main question I’m still stuck on, and so maybe I’m missing something.

DeeLew · March 26, 2023, 7:48pm

Hi Amber,

It has been a while since my last response. Apologies.
With all that has been happening work-wise, and globally, time has been in incredibly short supply (including a few 28 hour days, smile).

So, I have stolen a few minutes to circle back to my request, and your impressive response, to which I must, again respond…“WOW! Well done!”

What you provided is very impressive, and I will now very much consider/explore this approach.
Which might result in the elevation of Scrivener’s place/importance in the workflow.

So, thank you, Amber, for your guidance.
It is appreciated.

Cheers,
D

DeeLew · March 27, 2023, 9:56am

Hi Amber,
I have a couple of questions regarding your suggested approach.
Can I communicate with you directly via email, or must I pose them here?
Thanks,
D

AmberV · March 27, 2023, 10:27am

If you feel everyone who might want such a thing would benefit, here is best, but if it involves specific implementation details you don’t want shared, feel free to PM me. Just click my avatar and then the message button.

DeeLew · March 31, 2023, 3:33pm

UPDATE:
Due to Amber’s excellent guidance, I am now able to produce acceptable JSON output directly from Scrivener. Thank you, Amber!

This will definitely streamline and improve my overall authoring/publishing workflow.
So, life is good.

Cheers!

bernardo_vasconcelos · April 10, 2023, 3:51pm

That is great to hear. If you can, share a template or example so others can give it a try as well.

DeeLew · April 10, 2023, 4:23pm

Hi Bernardo,

Thanks for the note, encouragement, and interest.
The example Amber provided above is an excellent simple introduction.

I am still learning and exploring possibilities, to be sure.
Based on Amber’s example, I created several styles to be applied to different “data types”, and edited one of her example styles (modifying its prefixes and suffixes) to create the JSON for each new “data type”/style.

The resulting compilation produces the corresponding JSON, with some minor corrections needed (e.g. removing extra unnecessary ending commas). As I use Visual Studio Code for many other things, there are JSON-related plugins that will make the corrections with a simple click.
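
For those who would rather script that correction than click, a small Python sketch along these lines should do it. It is not what I use myself (I use the editor plugins), and `strip_trailing_commas` is a name invented here. It only removes a single comma that sits directly before a closing `]` or `}` (or at the very end of the text), and it leaves commas inside strings alone.

```python
import json


def strip_trailing_commas(text):
    """Drop commas that directly precede ']' or '}' (or end of text), outside strings."""
    out = []
    in_string = False
    escaped = False
    n = len(text)
    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped; move on
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # Look past whitespace to see what follows the comma.
            j = i + 1
            while j < n and text[j] in " \t\r\n":
                j += 1
            if j < n and text[j] not in "]}":
                out.append(ch)           # a real separator: keep it
        else:
            out.append(ch)
    return "".join(out)


messy = '{"title": "Scene a", "content": ["one, two,]", "three",],}'
print(json.loads(strip_trailing_commas(messy)))
```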

Below is a zip showing a couple of the “data types” created, along with the resulting output:

book_snippet.zip (210.7 KB)

That is really it so far.

Once I have time to learn more, and experiment more, I can provide something different from what Amber provided. In particular, I am just starting to look into MultiMarkdown, both within and outside of Scrivener.

So much to ponder and learn.

To be continued…

D

bernardo_vasconcelos · April 11, 2023, 3:31pm

That sounds great. Just to be clear, I meant the Scrivener project you are using to generate the JSON, but if you prefer not to share it, that is perfectly fine too.

DeeLew · April 11, 2023, 3:56pm

Hi Bernardo,

Not to worry, you were very clear in your earlier message.

Again, see Amber’s message above.
I used what Amber provided. It generates JSON.
Cheers!