A formula for the perfect Goodreads novel

By Will Jarrett

Once upon a time, crafting the perfect novel required creativity, dedication, and a whole lot of luck. Those dark times are – thankfully – behind us.

An ocean of book data is now available online for the discerning would-be novelist to dissect. With the right analysis, this data can reveal trends in bestselling books, mistakes to avoid, and just maybe a formula for the next literary sensation.

The book-rating platform Goodreads is one of the best sources of book data available today. The website claims to have more than 90 million members and lists hundreds of millions of books. Although ratings on Goodreads are not necessarily representative of the whole reading public, they do give us a powerful insight into the thoughts of “the world’s largest community of book lovers.” I took a look at patterns in the 10,000 books with the most ratings on Goodreads to guide us on our quest to create a new bestseller.

Let’s start with one of our most important decisions: genre. By plotting how many books of each major genre made it into the top 10,000 alongside their average user ratings, we can get a sense of their popularity.

[Chart: Children’s fiction, fantasy, and erotica get the highest ratings on Goodreads. Average 5-star rating plotted against the number of books in the top 10,000, by genre. Literary fiction sits far below the rest, at 3.74.]

Children’s fiction, fantasy, and erotica have the highest 5-star ratings. But fantasy is way ahead in terms of reach – over a quarter of all books in the top 10,000 include “fantasy” as one of their main tags, so it looks like a safe bet for securing a wide and appreciative readership.
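For readers who want to try a tally like this themselves, here is a minimal pandas sketch. It is an illustration rather than the code behind the chart: the column names (`title`, `avg_rating`, `genres`) and the four sample rows are invented, and each book is assumed to carry a list of its main genre tags.

```python
import pandas as pd

# Hypothetical sample data: column names and values are illustrative only.
books = pd.DataFrame(
    {
        "title": ["Book A", "Book B", "Book C", "Book D"],
        "avg_rating": [4.2, 3.8, 4.0, 3.6],
        "genres": [
            ["fantasy", "young adult"],
            ["literary fiction"],
            ["fantasy"],
            ["horror", "thriller"],
        ],
    }
)

# One row per (book, genre) pair, so a book with two tags counts in both genres.
per_genre = books.explode("genres").rename(columns={"genres": "genre"})

genre_stats = (
    per_genre.groupby("genre")
    .agg(n_books=("title", "count"), mean_rating=("avg_rating", "mean"))
    .sort_values("n_books", ascending=False)
)

# Share of all books carrying each tag (the kind of figure behind "over a quarter").
genre_stats["share_of_books"] = genre_stats["n_books"] / len(books)

print(genre_stats)
```

Because a book can carry several tags, the shares across genres can add up to more than 100%.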

This graph also shows us what to avoid. Humor and horror stories have poor readership and ratings, so we shouldn’t try to be too funny or scary. Literary fiction seems deeply unpopular, with an average 3.74 rating, so we should take pains to avoid a highbrow style.

But what if we want accolades from critics as well as adoration from the public?

[Chart: Literary fiction wins the most awards. Erotica wins the fewest. Average number of awards per book, by genre.]

It turns out that literary fiction scoops awards like nothing else. If we were just after critical acclaim, that would be the route to go, but given its unpopularity with the public we should still give it a miss.

Despite its enthusiastic audience, erotica wins the fewest awards, so steamy prose is also ill advised.

Young adult fiction is the second-biggest award winner. This bodes well for us, as it pairs naturally with fantasy – over 10% of the top 10,000 books have both “fantasy” and “young adult” in their top tags. This mix of genres has an average 4.07 rating and 1.6 awards per book, so it is popular with both the public and critics.
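Figures for a genre combination like this one can be computed by filtering on both tags at once. The sketch below assumes a hypothetical table in which each book has a set of tags, an average rating, and an award count; the names and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; `awards` is the number of awards listed for each book.
books = pd.DataFrame(
    {
        "avg_rating": [4.3, 3.9, 4.1, 3.7],
        "awards": [2, 1, 0, 4],
        "genres": [
            {"fantasy", "young adult"},
            {"fantasy"},
            {"fantasy", "young adult", "romance"},
            {"literary fiction"},
        ],
    }
)

wanted = {"fantasy", "young adult"}

# True where a book's tag set contains every tag in `wanted`.
has_both = books["genres"].apply(lambda tags: wanted <= tags)
combo = books[has_both]

share = has_both.mean()  # fraction of all books carrying both tags
mean_rating = combo["avg_rating"].mean()
mean_awards = combo["awards"].mean()

print(f"{share:.0%} of books, rating {mean_rating:.2f}, {mean_awards:.1f} awards per book")
```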

By analyzing the descriptions of books scoring above and below 4 stars, we can take an even deeper look at the kinds of topics that are popular with readers. Words related to family and social relationships appear to correlate with high scores: “mother,” “husband,” “man,” “girl,” “family,” “marriage,” and “love” all appear in the top twenty words indicative of a high user rating. The words “battle,” “fight,” and “secret” feature highly as well. “Detective” and “celebrity” both correlate with low scores, so we should give them a wide berth.
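One way to find words that are "indicative" of a high rating is to compare how often each word shows up in the descriptions of high-rated and low-rated books. The sketch below uses a smoothed log-odds score for that comparison. This scoring method is an assumption chosen for illustration, not necessarily the one used for this article, and the sample descriptions are made up. Books rated exactly 4.0 are placed in the high group here, which is also an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical (description, average rating) pairs for illustration.
books = [
    ("A mother and her family fight to keep a secret", 4.3),
    ("A battle for love and family", 4.2),
    ("A detective follows a celebrity", 3.6),
    ("The detective and the celebrity scandal", 3.7),
]


def words(text):
    """Lowercase a description and return its distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


high = [words(d) for d, r in books if r >= 4.0]
low = [words(d) for d, r in books if r < 4.0]

# Document frequency: in how many descriptions does each word appear?
high_df = Counter(w for doc in high for w in doc)
low_df = Counter(w for doc in low for w in doc)


def log_odds(word):
    """Smoothed log-odds of a word appearing in high- vs low-rated descriptions."""
    p_high = (high_df[word] + 1) / (len(high) + 2)
    p_low = (low_df[word] + 1) / (len(low) + 2)
    return math.log(p_high / (1 - p_high)) - math.log(p_low / (1 - p_low))


vocab = set(high_df) | set(low_df)
ranked = sorted(vocab, key=log_odds, reverse=True)

print("Most indicative of a high rating:", ranked[:5])
print("Most indicative of a low rating:", ranked[-5:])
```

On real data, common words such as "a" and "the" would need to be filtered out, and rare words would need a minimum count before their scores mean much.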

A similar analysis of the books’ titles shows that one of the best indicators of high ratings is the presence of words such as “volume,” “saga,” “trilogy,” or “chronicles.” This suggests that our book should be part of a series rather than a standalone novel.
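A quick check of this kind can be done with a regular expression that flags titles containing one of the series words named above, then compares average ratings between the two groups. The titles and ratings below are invented for illustration.

```python
import re
from statistics import mean

# Hypothetical (title, average rating) pairs for illustration.
books = [
    ("The Ember Saga, Volume 1", 4.3),
    ("Chronicles of the North", 4.1),
    ("A Quiet Year", 3.8),
    ("The Glass Trilogy", 4.2),
    ("Standalone Story", 3.9),
]

# Whole-word, case-insensitive match on the series words.
SERIES_WORDS = re.compile(r"\b(volume|saga|trilogy|chronicles)\b", re.IGNORECASE)

series = [r for t, r in books if SERIES_WORDS.search(t)]
standalone = [r for t, r in books if not SERIES_WORDS.search(t)]

print(f"series-style titles: {mean(series):.2f} average rating ({len(series)} books)")
print(f"other titles:        {mean(standalone):.2f} average rating ({len(standalone)} books)")
```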

So, we know the genre of our novel, we know (roughly) what it is about, and we know that it should be part of a series. But how long should it be? There appears to be a sweet spot for book length among Goodreads hits, with a third of the site’s top books running between 301 and 400 pages.

[Chart: A third of Goodreads’ top books are 301–400 pages long. Number of books in the top 10,000 by page count, from 0 to 1,000 pages, with annotations marking the page ranges that get the highest 5-star user ratings and win awards most often.]

By a small margin, awards and high ratings tend to go to books with more pages. Nonetheless, as comparatively few people read those books, the 301–400 range still looks like our best bet to secure a large audience.
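The trade-off between audience size and acclaim can be laid out by grouping books into 100-page bins and computing the count, average rating, and average awards for each bin. The following is a sketch with invented data and hypothetical column names, using `pandas.cut` for the binning.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative.
books = pd.DataFrame(
    {
        "pages": [120, 310, 350, 399, 420, 650, 880],
        "avg_rating": [3.8, 4.0, 3.9, 4.1, 4.0, 4.2, 4.3],
        "awards": [0, 1, 2, 1, 1, 3, 2],
    }
)

# Bins of 100 pages: (0, 100], (100, 200], ..., (900, 1000].
# right=True puts a 400-page book in the 301-400 bin, matching the ranges in the text.
# Books longer than 1,000 pages fall outside the bins and are dropped by the groupby.
edges = list(range(0, 1001, 100))
labels = [f"{lo + 1}-{lo + 100}" for lo in edges[:-1]]
books["page_bin"] = pd.cut(books["pages"], bins=edges, labels=labels, right=True)

by_length = books.groupby("page_bin", observed=True).agg(
    n_books=("pages", "count"),
    mean_rating=("avg_rating", "mean"),
    mean_awards=("awards", "mean"),
)
by_length["share_of_books"] = by_length["n_books"] / len(books)

print(by_length)
```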

Once our first book has been written, there are still things we can do to give ourselves an edge. For instance, the perceived gender of the author may have a minor impact on popularity. In the top 10,000 books, those published under ambiguously gendered names (such as initials, as with J.R.R. Tolkien) tend to have slightly higher user ratings than those by authors whose names are typically perceived as either male or female. This could be worth trying for a little boost.

So, our ultimate book is a fantasy story aimed at young adults, with some focus on familial or social relationships. It should be part of a series. It should be 301–400 pages long and published under a gender-neutral name.

Dozens of the top 10,000 books fit this mold exactly – including the early Harry Potter books, the Series of Unfortunate Events saga, and The Vampire Diaries. It seems that this formula has already produced a good number of bestsellers.

With that in mind, best of luck with your magnum opus, and keep an eye out for my upcoming title, The Blood Secret Chronicles, Vol #1 by W.D. Jarrett.


Methodology

To find the URLs for the top 10,000 books, I used an existing dataset posted on GitHub. Then I scraped each book’s webpage using the Python library Beautiful Soup to find details such as the number of awards and descriptions. I used a VPN and an error log to get around Goodreads’ block on scraping (which kicks in after a few hundred pages).
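The general shape of a scrape-and-log loop like this is sketched below. It is not the script used for this article: the CSS selectors (`div.description`, `a.award`) are placeholders rather than Goodreads’ real markup, the function names are hypothetical, and the VPN step is not shown. Pages are fetched with the standard library’s `urllib` and parsed with Beautiful Soup, and failed URLs are logged and collected so they can be retried later.

```python
import logging
import time
import urllib.request

from bs4 import BeautifulSoup

# Attach a logging.FileHandler to this logger to keep the error log in a file.
logger = logging.getLogger("scraper")


def parse_book(html):
    """Pull a few fields out of one book page.

    The CSS selectors are placeholders, not Goodreads' real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    description = soup.select_one("div.description")
    awards = soup.select("a.award")
    return {
        "description": description.get_text(strip=True) if description else None,
        "n_awards": len(awards),
    }


def scrape(urls, delay_seconds=2.0):
    """Fetch each URL, logging failures instead of stopping the run."""
    results, failed = {}, []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                html = response.read().decode("utf-8", errors="replace")
            results[url] = parse_book(html)
        except Exception as exc:  # network errors, blocks, unexpected markup
            logger.warning("failed %s: %s", url, exc)
            failed.append(url)
        time.sleep(delay_seconds)  # pause between requests
    return results, failed


# Parsing a hand-written page, so the example runs without network access.
sample = """
<html><body>
  <div class="description">A girl uncovers a family secret.</div>
  <a class="award">Award One</a>
  <a class="award">Award Two</a>
</body></html>
"""
print(parse_book(sample))
```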

I used various Python libraries including pandas and regex to conduct my analysis. Graphs were made using Altair and cleaned up in Illustrator. I used the New York Times’ ai2html tool to convert the Illustrator files into HTML files before embedding them in the page.

Header photo credit Ed Robertson