I should really have done this yesterday when I put the list up in the first place, but the results were so much more fun than the method. Anyway, here we are.

The table shows the 12 lists that were included, their country of origin, publication date, coverage and brief methodology. You can also click on any of the lists to see it at the source!

| Name | Country | Date | Coverage | Methodology |
| --- | --- | --- | --- | --- |
| BBC Big Read | UK | 2003 | All time | Poll |
| TIME 100 | USA | 2005 | 1923–2005 | Critics’ selection |
| World Book Day poll | UK | 2007 | All time | Poll |
| Norwegian Book Club | Norway/International | 2002 | All time | Survey of 100 authors |
| Observer critics | UK | 2003 | 1600 onward | Critics’ selection |
| Modern Library | USA | 1998 | 20th century | Critics’ selection |
| Telegraph | UK | 2009 | All time | Critics’ selection |
| Le Monde | France | 1999 | 20th century | Poll based on critics’ selections |
| Librarians | USA | 1998 | All time | Survey of librarians |
| Radcliffe Library | USA | 1998 | 20th century | Critics’ selection |
| German Big Read | Germany | 2004 | All time | Poll based on critics’ selections |
| New York Public Library | USA | 1995 | 20th century | Critics’ selection |


For more lists, I can strongly recommend this page by Robert Teeter, which compiles a great many lists of both Western and Eastern classics.

Now, as I mentioned the other day, some of these lists gave a ranking from 1 to 100. In those cases, the top book got 100 points, the second got 99, and so on, all the way down to 1 point for book number 100. For unranked lists, every book appearing was awarded 50 points. On this basis, every book included in any of the lists was given a total score from across all the lists. I then checked, for each book, how many of the lists it was eligible for. For example, a book published in 1950 was eligible for all 12, whereas a book published in 1850 was only eligible for the 6 “all time” lists plus the Observer’s 1600-onward one. Books published after 2000 also had limited opportunity to be included, as many of the lists were compiled in the latter half of the 90s. Each book’s total score was therefore divided by the number of lists it could have featured on: for example, Pride and Prejudice’s total score of 523 was divided by 7 to give it 75 overall, while The Great Gatsby’s apparently superior score of 698 ended up at 58, as it was divided across all 12 lists.
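The scoring rule above can be sketched in a few lines of Python. This is my own reconstruction, not the original code; the function names are made up, and only the two worked totals (523 over 7 lists, 698 over 12) come from the post itself.

```python
def appearance_points(rank=None):
    """Points for one appearance on one list: 100 for 1st place down to
    1 for 100th on a ranked list, or a flat 50 on an unranked list."""
    return 50 if rank is None else 101 - rank

def overall_score(ranks, eligible_lists):
    """Sum the points from every appearance, then divide by the number
    of lists the book was eligible for given its publication date."""
    total = sum(appearance_points(r) for r in ranks)
    return round(total / eligible_lists)

# Worked totals from the post: Pride and Prejudice scored 523 points
# across its 7 eligible lists; The Great Gatsby scored 698 across all 12.
print(round(523 / 7))   # 75
print(round(698 / 12))  # 58
```

Note that dividing by eligible lists rather than appearances is what lets an older book with fewer chances compete with a modern one that qualified for everything.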

I think the two main issues with the method are the selection of lists and the way the points were awarded. If anything, I should have collected a larger number of lists from a greater variety of sources. If you follow the Robert Teeter link above you’ll see that there are many, many lists to choose from, and you could argue that my selection is somewhat arbitrary. Roughly speaking, I wanted a mixture of academic and popular selections, so I left out lists like the St John’s College one, but you could argue it both ways. As for the points, I think I was too hard on the books at the bottom of the ranked lists. Under this methodology, being considered the hundredth best book of all time (1 point) is almost equivalent to being out of the running completely (0 points). To illustrate why this is a problem, take Midnight’s Children versus Slaughterhouse-Five. As late-20th-century books, both were eligible for all 12 lists; Midnight’s Children appeared on 8 and Slaughterhouse-Five on 4. You’d have thought Midnight’s Children would win by knockout, but two very high placings on American lists gave Slaughterhouse-Five a decent score and 71st place overall, while Midnight’s Children came 100th on three separate lists and faded to 128th place. If list rankings had been ignored and points awarded purely for presence on lists, Midnight’s Children would have come 21st overall. This is probably an injustice; a future edition of the list may give greater weight to list presence.
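The presence-only alternative mentioned above is even simpler to sketch: every appearance is worth the same flat amount, so a book on 8 lists always beats a book on 4. The appearance counts below are from the post; the per-appearance value of 50 is arbitrary, since only the relative order matters.

```python
def presence_score(appearances, eligible_lists):
    """Ignore rankings entirely: flat 50 points per list appearance,
    normalised by the number of lists the book was eligible for."""
    return 50 * appearances / eligible_lists

# Both books were eligible for all 12 lists.
midnights_children = presence_score(8, 12)   # appeared on 8 lists
slaughterhouse_five = presence_score(4, 12)  # appeared on 4 lists
assert midnights_children > slaughterhouse_five
```

Under this variant a clutch of 100th places counts for exactly as much as a clutch of 1st places, which is arguably overcorrecting in the other direction; a blend of the two schemes would sit somewhere in between.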

All this suggests plenty of scope for refinement – in fact I’ve already started fiddling with the system and found one plausible way to give Midnight’s Children a higher rank than Slaughterhouse-Five – but for the moment I’m going to concentrate on the data mining, which is more interesting than all the compiling and classifying work that went into building the dataset. Coming next: international face-off.