Statistics of Mystery Novels

In August 2020, I asked for readers’ help in collecting data on a stratified random sample* of 60 mystery novels, selected from the list of 655 books that were nominated for an Edgar® award in the Best Novel or Best First Novel category between 1946 (the year the award started) and 2020. I wanted to see whether mysteries written by men tend to have a higher or lower body count than those written by women, and whether there were other differences in the books depending on when, and by whom, they were written.

The data have now been collected. The file mysteries.csv of the data sets accompanying my book Sampling: Design and Analysis, Third Edition (now in print!) contains selected variables (no plot-spoiling variables, though) from the sample. Chapters 3, 10, 11, and 12 feature exercises using this sample. In this post, I’ll compare statistics about circumstances of homicide from the mystery novel sample with statistics from the FBI Supplemental Homicide Reports from 1985 to 2019.**

General Book Information

Information about gender of author(s), book genre (private eye, procedural, or suspense), gender of detective(s), setting (urban, small town, or rural), and historical setting was available from online bookseller information or book reviews for all 60 books in the sample. Here are some summary statistics (calculated using the stratified sampling design):

32 percent had a female author (95% confidence interval [20, 44]).
26 percent had at least one female detective (95% confidence interval [14, 38]).
Not surprisingly, female authors were more likely to write about female detectives than were male authors. 59 percent of the books by female authors had at least one female detective, compared with 11 percent of the books by male authors (this difference was statistically significant with p-value = 0.001).
64 percent took place in an urban setting (95% confidence interval [50, 78]).
22 percent were “private eye” novels, 16 percent were “procedurals,” and the remaining 62 percent were of the “suspense” type (this category included the one “cozy” novel in the sample).
26 percent were historical novels, with the main action occurring at least 20 years before the publication date. There were no significant differences between male and female authors with respect to urban/non-urban setting, genre, or historical setting.

How do these statistics compare with those from the FBI? The online FBI data do not include information about location or detective gender, but some of that information is available elsewhere. Table 2 of Crime in the United States, 2019 reports that of the estimated 16,245 homicides in 2019, an estimated 14,539 (89.5%) were in metropolitan statistical areas and another 839 (5.2%) were in cities outside of metropolitan statistical areas. Counting both categories as urban, that is nearly 95 percent of homicides, well above the 64 percent of novels with an urban setting. This discrepancy might be expected, however, because most of the novels take place before 2019 (some in medieval times) in a less-urbanized world. Earlier volumes of Crime in the United States show a lower percentage of homicides occurring in cities; the 1975 volume, for example, indicates that 14,764 (78%) of the estimated 18,830 homicides from that year took place in cities.
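The urban percentages above can be recomputed directly from the quoted counts. This is a small arithmetic sketch; it assumes that metropolitan statistical areas plus cities outside them can be lumped together as "urban," which is a simplification, since the FBI categories are not the same as the urban/small town/rural coding used for the novels.

```python
# Counts quoted from Crime in the United States, 2019 (Table 2) and the 1975 volume.
homicides_2019 = 16_245
metro_2019 = 14_539            # homicides in metropolitan statistical areas
nonmetro_cities_2019 = 839     # homicides in cities outside metropolitan statistical areas

# Assumption: treat both categories as "urban" for comparison with the novels.
urban_share_2019 = (metro_2019 + nonmetro_cities_2019) / homicides_2019

homicides_1975 = 18_830
cities_1975 = 14_764
city_share_1975 = cities_1975 / homicides_1975

novel_urban_share = 0.64       # stratified sample estimate for the 655 novels

print(f"2019 homicides in urban areas: {urban_share_2019:.1%}")
print(f"1975 homicides in cities:      {city_share_1975:.1%}")
print(f"novels with an urban setting:  {novel_urban_share:.0%}")
```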

What about detective gender? I’m not aware of any statistics kept on the type of amateur sleuths featured in many suspense novels, but there is demographic information on law enforcement officers in the United States: in 2019, about 13% were female.

Statistics about Murderers and Victims

The information available online about book location and genre didn’t tell who in the book was murdered or who did it. Variables such as number of victims or gender of the criminal(s) could only be determined by reading the books. Twenty-seven of the books in the sample were easy to obtain—they were in the public library, or I or one of the friendly mystery-loving volunteers owned them. One of us read each of these books and recorded the information about number of victims and number and gender of the criminals.

The remaining 33 books, however, were not on our bookshelves or in local libraries; many, especially those published more than 20 years ago, were out of print. In other words, the response rate for the initial attempt to obtain the data about the particulars of the fictional crimes was 27/60, or 45 percent. Moreover, the nonresponse was not evenly distributed across the strata. All books in the strata of recent “Best Novel” winners were found, while none of the non-winning books nominated for “Best First Novel” before 1980 were readily available.

Statistics calculated from the 27 readily available books from the sample might be misleading if the readily available books differ from those that are harder to obtain. Suppose that the readily available books tend to have more victims than those that are harder to find. Then estimates calculated from the 27 books would overestimate the average number of victims per book.
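To make the overestimation concrete, here is a toy illustration with a hypothetical population of eight books. The victim counts and availability flags are invented, chosen so that the easy-to-find books happen to have more victims; they are not data from the sample.

```python
from statistics import mean

# HYPOTHETICAL population: (number of victims, readily available?) for eight books.
books = [(6, True), (5, True), (4, True), (2, True),
         (3, False), (2, False), (1, False), (1, False)]

# True average over every book in the (toy) population.
population_mean = mean(victims for victims, _ in books)

# Average computed only from the books that were easy to obtain.
available_mean = mean(victims for victims, available in books if available)

print(f"true mean victims per book:  {population_mean:.2f}")
print(f"mean among available books:  {available_mean:.2f}")
```

Because availability is related to the number of victims in this toy population, the available books give 4.25 victims per book instead of the true 3.00; collecting more available books would not remove that gap.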

To avoid this potential bias, I selected a random subsample of the nonrespondents in each stratum and purchased the 13 books in that subsample from used book vendors. This gave me a sample of 40 books altogether*** that can be used to estimate characteristics of the victims and murder weapons in the population of all 655 books.

I estimated that on average, the 655 books contain 3.55 murder victims per book, for an estimated total of 2328 victims, of whom 1867 (80%) are male and 461 are female. According to FBI Supplemental Homicide Reports from 1985 to 2019 (excluding the pandemic years), 433,631 (77%) of the victims whose sex was reported were male and 128,228 were female.
The average book by a male author has more than twice as many victims as the average book by a female author (4.1 and 1.9 victims per book, respectively, with p-value = 0.01).
An estimated 80 percent of the books have at least one male murderer, compared with an estimated 40 percent of the books that have at least one female murderer (some books have both). But the number, and gender, of murderers did not exhibit significant differences by author gender.
Altogether, the 655 books are estimated to have 960 male murderers and 297 female murderers, with about 76% of the murderers being male (95% confidence interval [65%, 88%]). The FBI does not know the gender of all murderers (not all are solved); of offenders between 1985 and 2019 with known gender, 89% were male.
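The point estimates in the list above are simple ratios of the estimated totals, so they are easy to check. This sketch only reproduces those ratios from the numbers quoted in the post; the confidence intervals depend on the full two-phase sampling design and cannot be recovered from the totals alone.

```python
def share(part, other):
    """Fraction that `part` makes up of part + other."""
    return part / (part + other)

# Estimated totals for the 655 novels (from the 40-book two-phase sample).
novel_victims_male, novel_victims_female = 1867, 461
novel_murderers_male, novel_murderers_female = 960, 297

# FBI Supplemental Homicide Reports, 1985-2019: victims whose sex was reported.
fbi_victims_male, fbi_victims_female = 433_631, 128_228

print(f"novels, male victims:   {share(novel_victims_male, novel_victims_female):.1%}")
print(f"FBI, male victims:      {share(fbi_victims_male, fbi_victims_female):.1%}")
print(f"novels, male murderers: {share(novel_murderers_male, novel_murderers_female):.1%}")
print(f"victims per book:       {(novel_victims_male + novel_victims_female) / 655:.2f}")
```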

Finally, a few random observations. These exploratory statistics are not based on pre-planned hypotheses, so take them for what they are worth.

None of the murderers in the books I read “got away with it.” This is not true in real life. The percentage of homicides cleared by arrest declined from 79% in 1976 to 62% in 2005. In 2019, 61% of homicides were cleared by arrest; in 2020, that percentage dropped to 54%. The clearance statistics include only homicides that are known to law enforcement.
An estimated 44 percent of the books have at least one character (usually a murderer) who could be characterized as a sociopath.
Sometimes, instead of going to the police with information about a murder, a minor character attempted to blackmail the murderer. This did not end well for any of the blackmailers.
The protagonists in the books were nearly uniformly portrayed as being intelligent and perceptive people. But in 42 percent of the novels, the protagonist did something unbelievably stupid near the end of the book, such as agreeing to meet the suspected murderer alone in an abandoned warehouse. And really, how intelligent is a detective who cannot figure out which of six people in an isolated country house committed a murder until after two more people are killed?

Footnotes

*The 655 books in the population were stratified by the cross-classification of three variables:

Nomination year (1946-1980, 1981-2000, or 2001-2020)
Type of nomination (best novel or best first novel)
Did the nominated book win the award? (yes or no)

Four books were randomly selected from each stratum containing award winners, and six books were randomly selected from each stratum consisting of books that were nominated for an award but did not win. The sampling fractions were higher for the six award-winner strata than for the other six strata, but we can still estimate characteristics of the population of all 655 books by accounting for the oversampling of award-winners when calculating statistics.
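The adjustment for oversampling can be sketched with the standard stratified estimator: each stratum's sample proportion is weighted by that stratum's share of the population, which is the same as giving every sampled book a weight of N_h / n_h. In the sketch below, the stratum sizes N_h and the counts y_h are HYPOTHETICAL (invented so that the sizes sum to 655 and the sample sizes match the 4-or-6 design); only the design structure comes from this footnote. The interval uses a normal approximation, an assumption here; a t critical value with n − H degrees of freedom would give a slightly wider interval.

```python
from statistics import NormalDist

# HYPOTHETICAL stratum data: (N_h, n_h, y_h) = books in stratum, books sampled,
# and sampled books with the characteristic of interest (e.g., a female author).
strata = [
    # award winners, 4 sampled per stratum (1946-80, 1981-2000, 2001-20)
    (27, 4, 1), (20, 4, 1), (20, 4, 2),      # Best Novel
    (35, 4, 1), (20, 4, 1), (20, 4, 2),      # Best First Novel
    # nominated but did not win, 6 sampled per stratum
    (110, 6, 1), (90, 6, 2), (95, 6, 3),     # Best Novel
    (80, 6, 1), (70, 6, 2), (68, 6, 3),      # Best First Novel
]

def stratified_proportion(strata, confidence=0.95):
    """Stratified estimate of a population proportion with a normal-approximation CI."""
    N = sum(N_h for N_h, _, _ in strata)
    p_hat = 0.0
    variance = 0.0
    for N_h, n_h, y_h in strata:
        p_h = y_h / n_h
        share_h = N_h / N                           # stratum's share of the population
        p_hat += share_h * p_h                      # same as weighting each book by N_h / n_h
        s2_h = n_h / (n_h - 1) * p_h * (1 - p_h)    # within-stratum sample variance
        fpc = 1 - n_h / N_h                         # finite population correction
        variance += share_h**2 * fpc * s2_h / n_h
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * variance**0.5
    return p_hat, (p_hat - half_width, p_hat + half_width)

p_hat, (lower, upper) = stratified_proportion(strata)
print(f"estimate {p_hat:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An unweighted proportion of the 60 sampled books would count each award winner as if it represented as many books as a non-winner, even though winners were sampled at a higher rate; weighting by N_h / N removes that distortion.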

An alternative stratification might have used author gender, since I was interested in calculating separate statistics for male and female authors. But that stratification would have required manually going through the list to assign a gender based on the name, whereas the stratification variables used could be calculated easily in a spreadsheet. In addition, some mystery authors go by initials or use a pseudonym, so a stratification by guessed gender would likely have some misclassifications.

**These data are from the Supplemental Homicide Reports, collected from approximately 85% of law enforcement agencies in the United States each year during the time period studied. The statistics in this post were accessed from the Crime Data Explorer in November 2021 and may differ from currently posted statistics because the website statistics are revised as new data come in. See my book Measuring Crime: Behind the Statistics for a guide to evaluating and interpreting crime statistics from the FBI.

***The 27 books initially obtained represent their share of the population from the original sample. The 13 additional books in the sample represent their share of the population from the original sample plus part of the nonrespondents’ share; they thus have higher sampling weights than the other 27 books. This sample (called a two-phase sample) still represents the population, though, because both initial respondents and initial nonrespondents are included with known probabilities of selection.
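One way to see how the two-phase weights work is the sketch below. Within a stratum, each initial respondent keeps its phase-1 weight N_h / n_h, while each subsampled nonrespondent also carries the share of the nonrespondents that were not subsampled, so its weight is multiplied by (number of nonrespondents) / (number subsampled). The `Stratum` class, the helper names, and the two example strata are HYPOTHETICAL illustrations, not the actual sample.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """One stratum of a two-phase sample (example values below are hypothetical)."""
    N: int                   # books in the population stratum
    n: int                   # books selected in the phase-1 sample
    respondents: list[float] # values (e.g., victim counts) for books obtained right away
    subsample: list[float]   # values for the randomly subsampled nonrespondents
    nonrespondents: int      # phase-1 books that were not readily available

def two_phase_weights(s):
    """Return (weight per initial respondent, weight per subsampled nonrespondent)."""
    if len(s.respondents) + s.nonrespondents != s.n:
        raise ValueError("respondents + nonrespondents must equal the phase-1 sample size")
    w1 = s.N / s.n                       # phase-1 weight
    if s.nonrespondents == 0:
        return w1, 0.0
    if not s.subsample:
        raise ValueError("nonrespondents exist but none were subsampled")
    w2 = w1 * s.nonrespondents / len(s.subsample)
    return w1, w2

def estimate_total(strata):
    """Weighted estimate of a population total from a two-phase sample."""
    total = 0.0
    for s in strata:
        w1, w2 = two_phase_weights(s)
        total += w1 * sum(s.respondents) + w2 * sum(s.subsample)
    return total

example = [
    Stratum(N=20, n=4, respondents=[2, 1, 3, 5], subsample=[], nonrespondents=0),
    Stratum(N=90, n=6, respondents=[1, 4], subsample=[2, 6], nonrespondents=4),
]
print(estimate_total(example))
```

In the second example stratum, the two respondents each get weight 15 and the two subsampled books each get weight 30, so the weights still add up to the 90 books in that stratum.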

Sharon Lohr
December 10, 2021