Sam Flatery is a scholar working in what is becoming one of the hottest fields in the humanities today, text mining. “Text mining,” Flatery explains, “is basically working with a textual corpus to highlight particular recurring verbal patterns. The patterns that we find, once we’ve removed ‘stop words,’ which are the relatively meaningless grammatical terms like auxiliary verbs, prepositions, and articles, can tell us a great deal about a particular body of writing. Analysis of those patterns reveals aspects of the texts that may not be apparent through a traditional close reading.”
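The stop-word removal and pattern counting that Flatery describes can be sketched in a few lines of Python. This is only a minimal illustration, not Flatery's actual pipeline: the stop-word list is a tiny placeholder, the function names `tokenize` and `recurring_terms` are hypothetical, and the sample sentence is made up for the example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; real projects use much longer ones.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and",
              "is", "are", "was", "be", "have", "has", "will", "for"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def recurring_terms(text, min_count=2):
    """Count non-stop-word tokens and keep those seen at least min_count times."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return {word: n for word, n in counts.items() if n >= min_count}

# Made-up sample text, not drawn from any real corpus.
sample = "The course is online. The course is open and the platform is online."
print(recurring_terms(sample))
```

On a real corpus the same counting would typically run over whole documents, and over word sequences as well as single words.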
Literary scholars are applying text mining and text analysis to a wide range of texts, from traditional literary works such as the novel to the output of the communities that form within social media such as Twitter. Flatery has chosen as his subject the online writings of essayist Thomas Friedman, whose commentaries on changing trends in higher education, published primarily in the New York Times, have become seminal explorations of the application of large-scale technologies such as MOOCs (“Massive Open Online Courses”) to university courses.
“We are applying a form of what is sometimes called ‘Distant Reading’ to Friedman’s works. The point is to analyze his writing and ideas without directly accessing or reading the text itself: we approach it from a distance, with the computer as an intermediary,” says Flatery. “This has all sorts of advantages. For instance, it reduces the impact of particular rhetorical structures on the reader and removes our own biases from the analysis, permitting a far more objective overview of the text.”
“In this case, however, the particular advantage is that I and my graduate assistants are not directly exposed to Friedman’s flatulent prose style or stupefied by the sheer vacuity of his putative insights. Distant Reading insulates us from those effects. And, because we are making this study public, we hope to be able to spare more general readers from having to actually read Friedman too. This is really pretty ground-breaking: our study is the first to explore the potential of Distant Reading as a prophylactic.”
Flatery’s project also applied new principles to its modelling of the data produced by the text analysis of Friedman’s columns. “Initially, we tried some fairly standard approaches to modelling, looking for patterns and connections in substantive ideas, for instance. Basically, we came up empty: there really wasn’t a whole lot left after removing stop words and blankly banal restatements of other people’s ideas.”
Undeterred, however, Flatery and his team worked out a new and somewhat more nuanced approach to their data. “What we noticed as we sifted through the indexed words in context was the particular tone that Friedman took with regard to the subject of higher education in the US. So we borrowed a few principles from Sentiment Analysis, and produced a list of key terms and phrases that could better describe what Friedman was actually communicating. We call it Smugness Analysis.”
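A lexicon-based score of the kind borrowed from Sentiment Analysis might look roughly like the sketch below. The article does not disclose the team's actual list of key terms and phrases, so the `LEXICON` entries, their weights, and the `lexicon_score` function are purely hypothetical placeholders; normalizing by word count is likewise an assumption.

```python
import re

# Hypothetical lexicon: the article does not publish the team's actual term list.
LEXICON = {"obviously": 1.0, "of course": 1.5, "as i predicted": 3.0}

def lexicon_score(text, lexicon=LEXICON):
    """Return the weighted count of lexicon phrases per word of text."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return 0.0
    total = 0.0
    for phrase, weight in lexicon.items():
        # Match whole phrases only, so "course" alone does not count as "of course".
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        total += weight * hits
    return total / len(words)
```

Scoring each paragraph or sentence separately would yield the kind of per-passage index that could then be plotted across an article.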
Flatery calls up some visualizations to illustrate his findings. “In this particular graph, we’ve measured the Smugness index in one of his articles on MOOCs across time. The colour red indicates complete smugness, while green is merely a moderate degree of complacency. Along the vertical axis, we’ve measured the degree to which he demonstrates his ignorance of his subject. As you can see, there is a clear correlation between the two factors, although it’s not quite 1:1.”
Flatery, clearly excited by his findings, points to several brightly-coloured peaks on his graph. “Here, where he talks about the benefits of MOOCs in relation to blended learning, you can see a peak in both his degree of ignorance and his smugness. But by the end of the piece, where he is addressing the economic benefits of online learning through MOOCs for financially disadvantaged students, the Smugness index nearly goes off the chart, and there is a kind of wild frenzy of ignorance. Really, Friedman is a master of his art.”
Flatery is understandably excited about the potential for future applications of his methodology. “We’re hoping that our textual strategy of ‘not-reading’ eventually bears fruit in the ‘not-publishing’ of further essays by Friedman. Our next project is to apply these same principles and approaches to the Collected Educational Wisdom of Larry Summers. Not Big Data, of course: actually, the methodological problems derive from the fact that the content of that particular corpus is pretty darn minuscule. But once we’ve worked out the theoretical and technical details, we’ll be publishing our results, and updating them as need be.”
“We’re fairly confident that no one will ever have to read anything actually written by Summers again. We see it as a public service.”


