Bug in project statistics (inaccurate word count)

I’m running Scrivener v1.03 on a PPC PowerBook G4.

When comparing the Scrivener word count feature with the system statistics service (Scrivener/Services/Statistics), I’ve found that the Scrivener word count often overcounts, sometimes quite spectacularly; I’ve just looked at a document that Scrivener says has 507 words, while Statistics says it is 484.

I’ve verified this by checking the statistics of the same text pasted into TextMate and MS Word for Mac documents.

This is not a bug. What is counted as a “word” will vary between different counters - you will find slight variations, for instance, between Word, Nisus, Mellel and so on and so forth. Scrivener’s word count feature is fairly robust but no word count algorithm is perfect - take them all with a pinch of salt, including whichever one you have installed in your Services menu.
All the best,
Keith

Hi Keith,

Thanks for the response, but I must say I find it a bit unsettling that three different applications disagreed with Scrivener’s word count while all agreeing with each other; I actually printed out that (thankfully very small) project and counted each word by hand.

I take your meaning re the flexibility of word count algorithms, but surely if there is this much evidence stacked up against the Scrivener word count, it merits some investigation and fine-tuning?

Most users, myself included, are writing to specific word counts, and I think this ‘feature’ (I still think of it as a bug) has the potential to damage relationships with commissioning editors; e.g. if you believe you’re handing in an article or project that is just right in terms of commissioned length, but it is in reality 50 or more words off target. That’s another thing: the disparity between the comparative word counts actually increases as the Scrivener project increases in length.

Looking forward to your response,
Bash

Bash,

I would be grateful if you would back up and substantiate your claims. For instance:

Exactly what evidence? Stacking up? It would really be quite helpful were you to name the three applications (which hardly constitute a “stack” of evidence).

At any rate, and I will state this again: there is nothing wrong with Scrivener’s word count mechanism. I investigated it thoroughly when I implemented it (I am not, as you imply, careless about such things) and I have just taken an hour or two out of my development to revisit it once again. Suffice it to say, I am very happy with the results. Before comparing some popular applications (I cannot refer to those you mention as you do not name them), let us just consider some important factors.

First - and it sounds a silly question, but really it isn’t - what is a word? I mean, in terms of word counts? Do you count spaces, or punctuation? Different word processors do things differently. Take, for instance, the following phrase:

Hey - tick-tock.

How many words are there? I would say two, given that tick-tock is made into a compound word by the hyphen. Scrivener counts three, as it finds words based upon Apple’s word-finding code but refuses to count punctuation. Thus “tick” and “tock” are treated as separate words, giving 3 words. Mellel and Nisus Writer Pro do exactly the same and count three words.

Devon Technologies’ WordServices “Statistics” feature (available from the Services menu, which is, I suspect, what you were referring to in your first post, though you did not clarify this) also counts three words. It counts “tick-tock” as one word, but unfortunately counts the hyphen between words as a word in itself. This will add a lot of words in texts that use a lot of hyphens surrounded by spaces.

Word is best here and only counts two words.
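To make the point concrete, here is a minimal Python sketch of three hypothetical counting rules applied to the phrase above. The function names are made up for illustration, and the rules only approximate the behaviour described in this thread; they are not the actual algorithms used by Scrivener, Apple’s word-finding code, WordServices or Word.

```python
import re

# Hypothetical rules only: rough approximations of the behaviour described
# above, NOT the real algorithms of any of the applications mentioned.


def count_whitespace_tokens(text: str) -> int:
    """Split on whitespace only: a lone '-' counts as a word, 'tick-tock' as one."""
    return len(text.split())


def count_letter_runs(text: str) -> int:
    """Count runs of letters/digits: punctuation is ignored, hyphens split words."""
    return len(re.findall(r"[^\W_]+", text))


def count_tokens_with_letters(text: str) -> int:
    """Split on whitespace, then drop tokens that contain no letters or digits."""
    return sum(1 for token in text.split() if re.search(r"[^\W_]", token))


sample = "Hey - tick-tock."
print(count_whitespace_tokens(sample))    # 3: "Hey", "-", "tick-tock."
print(count_letter_runs(sample))          # 3: "Hey", "tick", "tock"
print(count_tokens_with_letters(sample))  # 2: "Hey", "tick-tock."
```

Each rule is defensible on its own terms, which is exactly why different applications disagree on the same text.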

Okay, so what about the following:

1.09 is coming.

Clearly, three words. All sample apps agree except Mellel, which counts four, as it splits “1” and “09” into two words because of the period.
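A similar sketch shows how the treatment of a period between digits changes the count. Both rules and all names here are hypothetical illustrations, not any application’s real implementation.

```python
import re

# Hypothetical rule A: every run of letters or digits is a word, so the
# period in "1.09" splits it into "1" and "09" (the Mellel-like result).
LETTER_OR_DIGIT_RUN = re.compile(r"[^\W_]+")

# Hypothetical rule B: a run of digits may contain "." or "," followed by
# more digits, so "1.09" stays a single word; otherwise the same as rule A.
NUMBER_OR_WORD = re.compile(r"\d+(?:[.,]\d+)*|[^\W_]+")


def count_split_on_period(text: str) -> int:
    return len(LETTER_OR_DIGIT_RUN.findall(text))


def count_keep_decimals(text: str) -> int:
    return len(NUMBER_OR_WORD.findall(text))


print(count_split_on_period("1.09 is coming."))  # 4: "1", "09", "is", "coming"
print(count_keep_decimals("1.09 is coming."))    # 3: "1.09", "is", "coming"
print(count_keep_decimals("Hey - tick-tock."))   # 3: "Hey", "tick", "tock"
```

Rule B happens to reproduce the counts reported for Scrivener on both sample phrases, but that does not mean Scrivener works this way.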

Okay, so onto a couple of texts with long word counts. I grabbed a document from Scrivener - just a regular prose document with paragraphs, some dialogue, scene breaks, the sort of thing you would get in an average novel or book. The different word counts were as follows:

Scrivener: 33,184
Devon’s WordServices Statistics: 33,214
MS Word: 33,218
Nisus Pro: 33,210
Mellel: 33,147

So, Mellel gave the lowest count, Word the highest, with the difference between the lowest and highest being 71 words - not bad for a document of 33,000 words (and thus an error margin between apps of around 0.2%)!

Next up, a similar document with over 50,000 words:

Scrivener: 50,740
Devon’s Statistics: 50,874
MS Word: 50,890
Nisus Pro: 50,781
Mellel: 50,700

Again, Mellel gave the lowest word count and Word the highest, this time with a difference of 190 words. Still not bad, with around a 0.4% error margin between apps.
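The spread and percentage figures for these two documents can be checked with a few lines of Python, using the counts exactly as reported. Taking the percentage relative to the lowest count is an assumption about how the margin was calculated; using the highest count instead gives the same rounded figures here.

```python
# Word counts as reported above for the two test documents.
COUNTS = {
    "33,000-word document": {
        "Scrivener": 33184,
        "WordServices Statistics": 33214,
        "MS Word": 33218,
        "Nisus Pro": 33210,
        "Mellel": 33147,
    },
    "50,000-word document": {
        "Scrivener": 50740,
        "WordServices Statistics": 50874,
        "MS Word": 50890,
        "Nisus Pro": 50781,
        "Mellel": 50700,
    },
}


def spread(report: dict[str, int]) -> tuple[int, float]:
    """Return (highest - lowest, that difference as a % of the lowest count)."""
    lowest, highest = min(report.values()), max(report.values())
    diff = highest - lowest
    return diff, 100 * diff / lowest


for name, report in COUNTS.items():
    diff, pct = spread(report)
    print(f"{name}: {diff} words apart ({pct:.1f}%)")
# 33,000-word document: 71 words apart (0.2%)
# 50,000-word document: 190 words apart (0.4%)
```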

Next up, an e-text of The Brothers Karamazov, taken from fyodordostoevsky.com/etexts/the_ … amazov.txt:

Scrivener: 350,565
Devon’s Statistics: 349,362
MS Word (which has given up all attempts at offering a live word count now): 349,588
Nisus: 350,565 (identical to Scrivener)
Mellel: 328,577

This time Mellel seems to be way off, giving a count roughly 21,000 to 22,000 words lower than any of the other apps. Scrivener’s count is identical to Nisus’s, and this time Word’s count is lower than either, but there are fewer than 1,000 words in it - not bad for a 350,000-word text.

So, in a comparison with the major word processors on the Mac platform, Scrivener does pretty bloody well - its count is as reliable as those of Nisus and Word, and possibly a little more reliable than that of Mellel. Certainly, to say that Scrivener overcounts, and “sometimes quite spectacularly”, is clearly wrong. It would, however, have been (and still could be) helpful had you posted the 484/507-word document you mentioned.

Finally, let’s sober up a little. Just how obsessed with word count do we need to be? If you have to write a 1,000 word article, then it is fairly important. But the word count is only a rough estimate for column space, and with such a short article the discrepancy between word counts in different applications is likely to be minimal, as can be seen from the above tests.

It is perhaps fitting to give the last word to Miss Snark, the (in)famous blogging literary editor:

Whew! Now that was a missive. In the words of Trinity: “Dodge this!”

I accept your unreserved apology graciously.

All the best,
Keith

Just a comment on your point that word count is a shortcut for column space. When people are being overly obsessive about word count, I sometimes find it helpful to remind them that

see spot run

and

semiconductor manufacturing technology

both contain three words, even though the latter takes triple the column space.
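Character count is a rough stand-in for column space. It ignores fonts, kerning and line breaks, so treating it as a proxy is an assumption rather than a layout measurement. A short Python sketch comparing the two phrases:

```python
def words_and_characters(phrase: str) -> tuple[int, int]:
    """Return (word count, character count including spaces)."""
    return len(phrase.split()), len(phrase)


for phrase in ("see spot run", "semiconductor manufacturing technology"):
    words, chars = words_and_characters(phrase)
    print(f"{phrase!r}: {words} words, {chars} characters")
# 'see spot run': 3 words, 12 characters
# 'semiconductor manufacturing technology': 3 words, 38 characters
```

Same word count, but a little over three times as many characters.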

As an editor, the only time I ever counted words was when I wrote checks based on word count. (In which case you can bet I took the lowest estimate I honestly could.) The rest of the time, all that mattered was whether the words would fit in the amount of space I had available.

Katherine

I should have thought twice before providing an application for the most pedantic population in existence: writers. :)
Best,
Keith