NOW they tell us...

Forget any “Rules for good writing” you’ve heard (or read). A computer analysis of successful novels shows what grabs the reader, what really sells.

Successful works include “heavy use of conjunctions …and large numbers of nouns and adjectives,” and rely on “verbs that describe thought processes, such as ‘recognised’ or ‘remembered.’”

And the less-successful works? They include “more verbs and adverbs,” and rely on “words that explicitly describe actions and emotions, such as ‘wanted,’ ‘took’ or ‘promised.’”


Wait, how does the romance genre even exist? Granted, I’ve only read a few pages of it (sorry, I just couldn’t stop laughing), but it seemed to be all verbs, adverbs and “want”, “took”, and … “@#$%ed”.

I know I’m slow, but something seems stupid there.

Lots of ‘thought-process’ words, maybe? I bet Nicholas Sparks scores highly. And from Hemingway onwards, many modern writers, romantic and otherwise, have been prolific users of ‘and’ and ‘but’ - especially to start sentences (despite what my old English teacher would say), and also to chain one to another.

It can be a very calming, soporific and almost dream-inducing style.

I’ve skimmed the scientific paper, and I like how they’ve managed to squeeze in an insult to Dan Brown. :smiley:

There is a very strong chance that I’ll be compelled to write a website post on this over the weekend. Too many opinions… must inflict them on the world…

Anybody who’s anybody manages to squeeze in an insult to Dan Brown. Personally I’d be very happy to see my bank account insulted to the extent his must have been.

Right now having my bank account insulted the way the average-kid-with-a-paper-round’s account is would be an improvement, so yeah… I imagine he’s okay with the occasional slur.

I feel grateful and fortunate that I was a theoretical linguist but not a computational linguist. Computational linguists should think of the shame and embarrassment of this inane drivel!


Mr X.

Mr X, bear in mind that when this paper talks about ‘success’, it may not be defining it in the same way that you would.

This is what it says:

‘In order to quantify the success of literary works, and to obtain corresponding gold standard labels, one needs to first define “success”. For practical convenience, we largely rely on the download counts available at Project Gutenberg as a surrogate to quantify the success of novels. For a small number of novels however, we also consider award recipients (e.g., Pulitzer, Nobel), and Amazon’s sales records to define a novel’s success. We also extend our empirical study to movie scripts, where we quantify the success of movies based on the average review scores at We leave analysis based on other measures of literary success as future research.’

For what it’s worth, I can foresee further steps in this direction in the future, however flawed or challengeable.

The database is skewed sharply to Gutenberg and IMDb. In other words, books out of copyright, and a mixed bag of critical opinion. Further steps in this direction, unless they incorporate world-wide sales and box-office figures, likely will be flawed and challengeable.

Follow the money? Yes. Isn’t that the gold standard of success?

“For a small number” they also include award recipients. Trying to strike a balance between marketplace and medals?

The general idea is interesting, but as with a lot of research projects (I’ve been doing correlation work on medical studies), data and methodology can seem flawless yet still produce misleading results.


The bottom line is that they appear to have effectively restricted their analysis to published works, and through Project Gutenberg, to works good enough to be preserved for posterity despite being written a fair while before the invention of the computer.

To put it another way, threshold requirements for publishing like plot, characterisation, intrigue, originality and luck are a given in their sample set. They also point out, using Dan Brown as an example, that writing can be bad and still be successful by the measure they used.

A second, different attempt to forecast ‘success’: