Importing — CJK text

Hi,
This is probably somewhat esoteric in Keith’s terms, and the only other user I know from the forum who might have comparable experience is Maria, but …

I am about to embark on co-ordinating and editing the translation of a 25,000-character Simplified Chinese guide book into English, which should come out at over 30,000 words as a rough estimate. So I have started by bringing the Chinese text into Scrivener, so that (i) I can use Scriv. to split the text up and export the sections to send to the team of people who will undertake the basic translation; (ii) when it comes to editing, I will be able to have the original text section up in a split pane next to the basic translation, easily accessible for checking; and (iii) I’ll be able to use Edit Scrivenings to bring it all together to check for consistency and style across all the sections … oh, and (iv) I know it’s going to require a fair bit of background research on my part too, so Scriv. seems perfect for the job.

The problem? The original text is a Chinese Wierd for Windoze doc, which I opened in Nisus and saved out as RTF. I have also extracted the very first bit of it for demo purposes for this forum:

If I try to use <file: import> to get the text into Scriv … whether I drag it into the binder, or use the menu/keyboard shortcut (seen as “Import” in the binder of the Scriv. project attached), the Chinese characters are all invisible … all you can see is strings of numbers, which are actually concatenations of dates in the text. What is really strange is that if you hover the mouse over the appropriate icon in the binder, a kind of note window comes up which does show the Chinese text!

If I open the file in Nisus, copy the text and then paste it into an empty binder document in Scriv. (“Pasted” in the binder) it then appears as normal.

Looking at the binder itself, the synopsis column is empty for the imported version, but the text is there for the pasted version.

I don’t know how many Scriv. users other than Maria are actively involved with CJK languages … and Japanese and Korean have important differences from Chinese, and Japanese seems to be more closely embedded in the system than Chinese, so there may not be the same problem … but there seems to be a bug here that needs addressing. It seems to me you should be able to use Import to get a Chinese document into Scriv. just like a document in any other language, and not have to open it in another app and do a cut and paste.

Incidentally, I have a friend who is going to try to translate Alain de Botton’s “Art of Travel” into Chinese, and I think I have persuaded her that Scrivener is the tool she needs (She is now the proud possessor of a MacBook).

Mark
Chinese.scriv.zip (16.2 KB)

I don’t think there is a Scrivener bug here - at least, not if you are just importing as RTF. What happens when you import the RTF into TextEdit? Does it have the same issues? Scrivener just uses Apple’s own RTF importer methods, so that is where the problem would lie. It may be a wider issue with the RTF format - dunno. What about exporting as UTF8 plain text and importing as plain text? Does that work? (The synopses are plain text.)

Best,
Keith

Hi,

I think it is a problem with the auto-detection of the encoding (the “Script”). Chinese comes in several encodings; Big5 was mostly used on Windows, if I remember correctly. Apple’s text engine uses UTF8.

I converted all my files from old encodings such as EUC to UTF8 with Cyclone. You can download this helpful app for free.
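For anyone without Cyclone to hand, the same kind of conversion can be sketched in a few lines of Python. This is only an illustration and not what Cyclone does internally: the function name `transcode_to_utf8` is made up for this example, it handles plain text only (not RTF), and the source encoding has to be supplied by hand rather than auto-detected.

```python
from pathlib import Path


def transcode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode a plain-text file from source_encoding to UTF-8.

    source_encoding is any codec name Python knows, for example
    "big5", "gb2312", "gb18030" or "euc_jp". A wrong guess usually
    raises UnicodeDecodeError instead of silently producing garbage.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

A call such as `transcode_to_utf8(Path("old.txt"), Path("new.txt"), "big5")` (file names hypothetical) writes a UTF-8 copy and leaves the original untouched.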

No problems since then. Whenever I read messages like yours I realise again how easy life has become with Mac OS X.

Best,
Maria

Sorry guys, but (i) Nisus’s default plain text encoding is UTF-8 on my machine; (ii) Nisus uses a version of the Apple text engine heavily edited to give all the other functionality; (iii) the RTF saved out from Nisus opens without problem in TextEdit and Pages; with NeoOffice 2 Alpha (beta 13 patch) it opens, but with some aberrant spacing. None of these has an “Import File” command, but they open it either by using “Open” from the File menu, or by dragging it to the Dock icon.

TextWrangler opens the RTF as a plain text file, with all the style code in the header, but the actual Chinese text just comes out as the string of UTF-8 character codes.
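A note on those character codes: RTF is a 7-bit format, so non-ASCII text is normally written as escapes. One common form is `\uN`, where N is the UTF-16 code unit as a signed 16-bit decimal number, followed by a fallback character such as `?`. The Python sketch below decodes that form only. It assumes the file uses `\uN` escapes with at most a single `?` fallback, which may not match what Nisus actually writes (RTF can also use `\'xx` hex bytes in a code page), and the function name is invented for this example.

```python
import re

# \u followed by a signed decimal number, an optional delimiter space,
# and an optional "?" fallback character. Control words such as \ul or
# \uc1 do not match because a digit or minus sign must follow "\u".
_U_ESCAPE = re.compile(r"\\u(-?\d+) ?\??")


def decode_rtf_unicode_escapes(fragment: str) -> str:
    """Replace RTF \\uN escapes in fragment with the characters they name.

    Only characters in the Basic Multilingual Plane are handled;
    surrogate pairs are not recombined in this sketch.
    """
    def _replace(match: re.Match) -> str:
        code = int(match.group(1))
        if code < 0:
            # Values above 32767 are stored as negative numbers.
            code += 65536
        return chr(code)

    return _U_ESCAPE.sub(_replace, fragment)
```

For instance, `decode_rtf_unicode_escapes(r"\u20013?\u22269?")` gives back the two characters U+4E2D and U+56FD.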

I copied it onto a flash disk and tried it on my former Release 1 400 MHz PowerBook, running 10.4.8, importing it into Mellel 2.1.1 and opening it in Papyrus Write. Both displayed it without problem, though I had to set the font manually in Papyrus. (The Chinese font sets are not identical on that TiBook and my MBP, as the “Song Ti” in particular is incompatible with the PostScript in OS 10.4, so other Chinese fonts have been substituted. It’s an infernal nuisance not having Song Ti, as that is the most common Simplified Chinese font under Windows.) Mellel, though, swapped fonts without problem … I wish I liked Mellel!

For good measure, I realised that I had actually opened it and saved it out in Nisus Writer Pro private beta 6 — I am a tester — so just in case anything had changed there, I opened it in Nisus Writer Express 2.7 and saved it out under a different file name as an RTF. I also saved it out as a plain text file and as an RTFD.

In each case, with Scrivener I got the same message that the file was being converted to RTFD — including the RTFD one! — but in each case they come up blank except for the numbers and some quote marks. The only version that is readable is the pasted version, though they all show the “tool-tip” type window with the text if you hover the mouse over the icon in the binder.

Anyway, here’s a zip archive of the new version of the project with the extra imports for you to see. For good measure, I also attach the original RTF document so that you can try for yourselves.

This doesn’t make using Scrivener too burdensome for these purposes, but having to go through copy and paste is somewhat disappointing.

Cheers,

Mark

PS Maria, FYI, Big5 was the standard coding for Traditional Chinese as used in Hong Kong, Taiwan and Singapore etc; GB2312 (two bytes per Chinese character) was the coding for Simplified Chinese as used on the mainland. Latterly GB18030 was used … I think that was devised on the mainland, but it uses one, two or four bytes per character and provides for both Simplified and Traditional characters. It was soon superseded by UTF-8, which is now the standard, though I think some Chinese font sets use UTF-16. There are EUC codings for both Simplified and Traditional Chinese, but as people are upgrading their hardware, I find I have to switch to those less and less. OS X comes with Apple’s own coding, which Windows machines can’t read, and the original version of Mail didn’t allow you to switch coding on sent messages, making it useless for people such as me … which is why I found GyazMail and continue to use that. Most of the time when I send an email with Chinese, I get an alert saying my mail contains text in UTF-8 and do I wish to continue sending.
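To compare the codings mentioned in this PS, here is a small Python sketch that reports how many bytes a piece of text takes in each of them. It relies only on the codecs that ship with Python; the helper name `encoded_lengths` and the default list of codings are choices made for this illustration.

```python
from typing import Dict, Iterable, Optional

DEFAULT_ENCODINGS = ("gb2312", "gb18030", "big5", "utf-8", "utf-16-le")


def encoded_lengths(
    text: str, encodings: Iterable[str] = DEFAULT_ENCODINGS
) -> Dict[str, Optional[int]]:
    """Return the byte length of text in each encoding.

    The value is None where the encoding cannot represent the text,
    for example a rare character that GB2312 does not include.
    """
    result: Dict[str, Optional[int]] = {}
    for name in encodings:
        try:
            result[name] = len(text.encode(name))
        except UnicodeEncodeError:
            result[name] = None
    return result
```

Calling `encoded_lengths("中")` shows two bytes for GB2312, GB18030, Big5 and UTF-16, and three bytes for UTF-8; a character outside the Basic Multilingual Plane takes four bytes in GB18030 and cannot be encoded in GB2312 at all.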
Chinese [9_Mar_2007_09_37].zip (30.1 KB)

Couldn’t add the second attachment … here’s the RTF. Didn’t realise I’d have to zip it to be able to post it.

By the way, this machine is a revision 1 MacBook Pro 17", Core Duo 1, 2.16 GHz, 1 GB RAM, running fully updated 10.4.8, no haxies.

Mark
intro.rtf.zip (6.15 KB)

Mark,

I mostly work with Traditional Chinese when I work with Chinese, which is why I mentioned it, but I am not deep into the Chinese scripts and will inform myself in more detail, if ever necessary. Converting solved all my problems in any application with Traditional and Modern Chinese as well as with Japanese.

You may try to convert your files or not; I gave you some kind advice and did not ask for public lectures. I am not a guy.

Best,
Maria

By the way Keith, I tried importing the plain text version using the menu entry but with the same result. And you will also notice that on all the apparently blank versions, the status bar at the bottom of the window reports the existence of 1369 characters! I truly fear it is a Scrivener display bug. The characters or their codes are there in the file, they are simply not being displayed.

Furthermore, showing the contents of the bundle lists the RTFD files and choosing them with the contextual menu to “open with” opens them and the text is there.

Mark

Maria, I am sorry you took it like that … it was not intended as a lecture at all. I only know of you as using Japanese, so just as I am not acquainted with the intricacies of different Japanese entry systems although I am aware of their existence, I hoped it would be of interest to you. I also firmly believe that there is nothing that is not worth knowing about and it was in that spirit that I included that note. So if you feel offended, I most sincerely apologise and assure you that I am not a guy of that sort! Furthermore, and with due apologies, I did not take your informing me that:

as lecturing me. Please let us bury these hatchets.

That said, apart from the font issue, the only other text conversion problems I have ever encountered are: with Mellel, which wouldn’t open .docs in Chinese directly … I found I had to open them in TextEdit or Nisus and save them out as an RTF and import that into Mellel — one of the reasons I gave up on Mellel, quite apart from much preferring the Nisus interface; the other is that some documents produced by one flavour of Office for Windows … a combination of Office 97 running under a specific version of Windows, don’t open at all … whether Word or Excel documents and irrespective of the language or coding … I run them through MacLink Plus.

But I do thank you for your advice and will see if this application helps with this problem.

Yours

Mark

Mark,
sorry from my side as well.
I hope you get your problem solved,
Maria :)

I had the same problems with Mellel, chose the same temporary solution with TextEdit or Nisus and eventually chose to abandon Mellel once and for all. In my case the problems were not only with .doc files but also with certain RTF files. Sometimes I had to copy and paste. I am glad I do not need to recall all this mess any more…

Maria

Final word, unless Keith needs any further information from me. I have downloaded and installed Cyclone as recommended by Maria, and run intro.rtf through it.

Cyclone registered it as being in “Mac Western”, so I thought “aha … maybe we have a problem here”, so I had Cyclone export it as Mac Chinese. Imported into Scrivener, it still came up blank …
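One possible reason for the “Mac Western” label, offered as a guess rather than a diagnosis: if the RTF stores its Chinese as escapes, every byte on disk is plain ASCII, so a detector that looks only at bytes has nothing Chinese to see. The Python sketch below makes that distinction. The function name and the three labels are invented for this example and have nothing to do with how Cyclone itself works.

```python
def describe_bytes(data: bytes) -> str:
    """Roughly classify raw file bytes.

    Returns "ascii-only" when every byte is below 0x80 (typical of RTF
    that escapes its non-ASCII text), "valid-utf8" when the bytes decode
    cleanly as UTF-8, and "other-8bit" for anything else, such as text
    in GB2312 or Big5.
    """
    if data.isascii():
        return "ascii-only"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other-8bit"
    return "valid-utf8"
```

Reading a file with `Path("intro.rtf").read_bytes()` and passing the result to `describe_bytes` would show which of the three cases applies; an "ascii-only" answer would suggest that re-exporting in a different text encoding cannot change what the importer sees.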

I have also re-opened the file in Nisus, checked that Nisus is marking the text as being in Simplified Chinese and re-exported. Problem is still there.

So I will continue to copy and paste until Keith tracks down what is going on and sorts it out, if he feels it’s important enough to take up his time. I actually think he has much more important things to develop in Scrivener, so I’m not going to hold my breath and this won’t diminish my enjoyment of Scrivener to any great extent.

Cheers
Mark

Bit later.

Sorry … checked again … the last version, the one in which I ensured Nisus was marking it as Simplified Chinese, does seem to have opened properly in Scriv, with the text showing … so that does seem to be the root of the problem. I suspect this is actually an issue with the way Nisus Writer Pro private betas are handling language and font encoding. I’ll take it up with them.

Ahh …

Wa . . . neh . . . ?

Yoi!

:)

Dave