My old friend Stephen Marche, the renowned Shakespearean, is at it again, this time with an impassioned piece preaching the massively controversial credo that “Literature is not Data.” It’s an attack on authors and academics. Or on digital humanists. Or on algorithms (which are, saith Marche, fascist). Or something. It’s a very strange, very ill-informed, very incoherent essay, and demands a more in-depth response from someone who is more immersed in current Digital Humanities practices than a mere dabbler such as myself.
But there are a couple of characteristic blunders in Marche’s article that I feel compelled to write about.
First of all, there is the weird narrative of Google Books he spins. In that story, “the openness and honest labor of engineers” comes face to face with the “closed ranks” of the “priestly class”: poor old Google just wants to make all the books it’s digitized freely available, or at least searchable, while “literary people” selfishly reject “the gift of digitization.” If Marche is to be believed, the conflict over Google Books was fought between a benign team of practically-minded innovators and a coterie of “writers and professors,” who, far from being “liberals, hedonists, bohemians,” are “in fact, profoundly, deeply, organically conservative.” He mentions, but then quickly ignores, that the legal case against Google was brought not just by the Authors Guild, but also the Association of American Publishers. Corporations, in Marche’s story, are good: they solve problems. Writers and thinkers, on the other hand, are bad: they squabble, and they “create problems rather than solving them.” Publishers, somehow, don’t appear to have a dog in this fight.
This is, of course, nonsense. It’s similarly nonsense to claim that “professors” were especially active in fighting Google’s noble mission to democratize knowledge. Very few professors make any money at all from their publications, as Marche must know. I recall some colleagues reacting with trepidation to the prospect of their books becoming available in full on Google, but that, of course, never happened; I would think that the vast majority of us are quite happy to have our work more widely accessible than it is when contained in the pages of $100-plus volumes and locked away in university libraries. The authors who objected most strenuously to Google’s project were those who stood to lose royalties — and, of course, their publishers.
Most revealingly, however, Marche claims that a digitized library of the world’s books was Google’s idea. It wasn’t. I wouldn’t presume to offer an authoritative alternative history, but I will point out that the Internet Archive, now containing almost 3 million public-domain texts, started in 1996, well before Google got on the bandwagon. Nor is it true that “the world’s five largest libraries signed on as partners” in the Google Books project. They didn’t, and they haven’t. Some very large libraries were among the initial partners (Harvard and the NYPL, currently ranked 3rd and 4th in the US). But none of the major national libraries have joined in, and the project remains extremely Anglocentric in focus.
Marche thinks Google should never have engaged with authors, because that way, well, lies madness. Instead, he proposes, “In hindsight, perhaps, Google should have followed the law for ‘fair use’ of copyright, come to agreements with the world’s major libraries to provide the Book Search to public institutions in perpetuity, and stepped aside.” Sounds good. Except that what “fair use” means in this context is far from a settled legal issue. It was the question at the heart of the lawsuit, and a question left open when Google and the parties in the lawsuit came to a settlement in 2009. That settlement was rejected by a judge in 2011, and the case is currently pending.
However, none of this has stopped Google digitizing books: the collection is steadily growing. Nor has it made conducting full-text searches harder. Google won’t display copyrighted material from books whose publishers have not signed an agreement, but the text is still being searched. And in any case, this only concerns material still in copyright. All older texts are fully and freely available.
All of which is to say, I have no idea exactly why Marche thinks Google Books has been a “failure” — or why he claims that scholars simply refused to engage with the kind of work Google is doing:
Academia could have done what humanists have done throughout history and tried to add to Google’s mandate: make the texts legible and available. They could have tried to bring out the contemporary relevance that only historical context, knowledge of literary tradition, and scholarly standards can provide. But this ancient task was anathema, for the simple reason that it would have involved honest work. Much easier to remain in the safe irrelevance of mass publication in the old mode, what Kingsley Amis called “the pseudo-light it threw on non-problems.”
The central sneer here appears to be that academics don’t like “honest work” and prefer “mass publication in the old mode” — a mode that apparently does not involve making legible texts available. I honestly have no idea what Marche is talking about. The past twenty years have seen an astonishing wealth of academic, not-for-profit undertakings that make texts available in reliable versions all over the place all the time — independently of or in cooperation with commercial enterprises such as Google’s. That Marche would locate the true scholarly spirit so emphatically inside the hallowed halls of Googleplex speaks volumes.
Secondly, Marche talks a bit about the supposed impact of the digital revolution on academic research. His prime example is EEBO (Early English Books Online). He does not seem to be aware that EEBO is an expensive subscription service, nor does he seem to realize that the vast majority of the books it contains are simply digitized from microfilms that were available long before the World Wide Web changed everything. Here’s how he imagines Renaissance scholars worked in the bad old days:
Before EEBO arrived, every English scholar of the Renaissance had to spend time at the Bodleian library in Oxford; that’s where one found one’s material. But actually finding the material was only a part of the process of attending the Bodleian, where connections were made at the mother university in the land of the mother tongue. Professors were relics; they had snuffboxes and passed them to the right after dinner, because port is passed left. EEBO ended all that, because the merely practical reason for attending the Bodleian was no longer justifiable when the texts were all available online.
No British Library in Stephen Marche’s world; no Huntington, no Houghton, no Beinecke, no Folger, no Newberry, no Library of Congress; no Cambridge University Library, no National Library of Scotland. Renaissance scholars all flocked to the Bod — and now, one supposes, the Bod stands empty, while we all stare at our screens. I’m glad Stephen Marche was treated to snuff in hall at whatever college he was staying at in Oxford — I never have been, though I can report that professors still eat dinner there, and still pass the port. Some of them may fairly be considered relics, though, I expect, no more or fewer of them than in the pre-EEBO days. And the Bodleian remains busy, as do all the other excellent and well-stocked research libraries I mentioned.
It is certainly true that things have changed. Scholars fortunate enough to work at institutions with an EEBO subscription can read far more materials at home, just as those whose libraries owned full runs of the old STC microfilms could. But that hasn’t spelled the end of research trips to archives. What is true is that there is greater interest in manuscript work now than for a long time, and there is doubtless a connection between that shift in focus and the wider availability of digital texts. Cynically, one might suggest that scholars need to justify research trips somehow, and looking at manuscripts, or at individual copies of works, is a great way of doing that. More idealistically, one might argue that services such as EEBO have freed up more time for archival exploits that were simply not manageable for most scholars before. Either way, the scene Marche describes still plays out, all around the world (not just in Oxford). Though without the snuff. Same as it ever was.
And then there’s this bit of weirdness: “Stylometry, the analysis of definable patterns in literary styles, has also been a mode of desacralization.” Sure, I suppose. But of course stylometry has nothing to do with Google Books. Or, for that matter, inherently with the internet (as I imagine Lorenzo Valla would point out if he still could). Marche’s single example of the triumph of stylometry, the addition of Middleton’s name to the title page of Timon of Athens, has its basis in R. V. Holdsworth’s unpublished 1982 PhD thesis. In published form, the most prominent summary of the arguments can be found in Brian Vickers’ Shakespeare, Co-Author, which appeared in 2002: two years before Google Books put a single digitized volume online.
All of this forms the long opening salvo to Marche’s essay, which, given its ostensible purpose of arguing against the Digital Humanities, may seem a little odd. So far, he’s singing the praises of Google Books, highlighting the virtues of EEBO and of the new internetified science of Stylometry, and castigating crusty old lazy scholars for refusing to do their bit to make the media revolution happen. Sounds like a grand defence of DH to me, or at least a heavily corporatized version of DH.
But then Marche switches from one imaginary target to another — if academics aren’t loathsome in their retrograde attachment to paper, they are vile in their refusal to acknowledge the special status of the literary: “Literature cannot meaningfully be treated as data. The problem is essential rather than superficial: literature is not data. Literature is the opposite of data.”
To which one obvious answer is: well, duh. And another obvious response may be “Well, only if you don’t understand what ‘data’ means.”
On one level, Marche is naturally right, though it’s a little absurd that he thinks this is a great insight: “The experience of the mystery of language is the original literary sensation. The exuberance of ancient literature — whether it is in the simple, inscrutable lyrics of Sappho or Oedipus’s tragic misunderstanding of the oracles — contains a furiously distressed joy that words mean so much more than they mean.” As so often in Marche, it’s all expressed in too absolute terms, too, if you will, exuberantly, but the ideas are anything but new. Or controversial.
I don’t know, to be honest, what text mining DHers would have to say in response. I doubt anyone has come up with software that can explain how great literature works. I hope no one has. And if anyone ever were to develop a program that can deliver the ultimate analysis of any text we feed it, our jobs as teachers of literature would probably be over. But so would the jobs of literary authors. And as far as I know, no one is actively trying to destroy literature through electronic demystification.
Marche writes as if all scholarship were engaged in acts of literary interpretation — more specifically, in acts of close reading. As he must know well enough, given his academic background, nothing could be further from the truth. Criticism is one kind of literary scholarship, but it’s only part of the larger enterprise; and I suspect it’s the part DH is least good at. Literary history, on the other hand, is far more likely to benefit from the broad-based, distant view data-rich approaches can offer — although Marche, bizarrely, thinks that “the process of turning literature into data removes … the history of the reception of works.” He’s right that a data-centric approach is less likely to be influenced by “taste” or “refinement,” but for my money, that’s a good thing. History dictated by taste is history written by the winners. And that’s bad history. “Meaning is mushy,” Marche writes, not inaptly. But whereas the meaning of a line of poetry may emerge more clearly, or more richly, simply through contemplation, through critical engagement, the meaning of a historical development is just as likely to become more apparent through a process of accumulating more data — of stepping back and seeing the development in the broadest possible context, the kind of context data analysis can provide with a clarity and a neutrality likely lost in a critical endeavour propelled by questions of taste and a desire for refinement. (I can’t be the only one who’s finding it difficult to reconcile Marche-in-Matthew-Arnold-mode with Marche-in-Google-Books-acolyte-mode.)
Finally, Marche seems to think, rather puzzlingly, that “data” implies “completeness.” “Literature is terminally incomplete,” he notes. He means that not every text ever written has survived, as far as I can tell, though he quickly moves from this discussion of literature’s archival fragmentation to the (unrelated) challenge of the fragmentation of meaning in the literary text. He appears to concede that this problem of partial transmission does not afflict literature alone (“The information we have about the past is, in almost every case, fragmentary”), so that it is presumably not literature alone, but all human existence that is “haunted by such oblivion, by incipient decay.” But it’s unclear why any of this should matter in any case. No data set is ever complete. Marche’s counterexamples are baseball stats and case law. He doesn’t seem to be aware that in both those cases, we’re dealing with flawed and incomplete sets of data. Baseball stats have become ever more detailed and fine-grained in recent years, and many of the analyses now possible (of pitching data in particular) cannot be undertaken for historical figures, as the numbers aren’t available. And the idea that it’s possible to “establish a complete database for all of the legislation and case law in the world” is just preposterous. Like any other human activity, law cases are subject to transcription and transmission, to conventional editing and pruning and to archival loss. There is nothing special about literature’s transmission challenges. Working with incomplete and unreliable data sets is an entirely familiar and common experience for analysts of all kinds.
It’s thus not news for anyone that “there are always masses of data which are simply missing or which cannot be untangled,” though some of us may be surprised to learn that “the most obvious and relevant example is Shakespeare.” Why would he be? Obvious, perhaps, but relevant? How? To whom? And in what sense? What’s clear is that Marche himself finds the Shakespearean data set confusing, so let me clarify: “There are nine different versions of Richard III; there are three versions of Hamlet, each with missing sections or added sections,” writes Marche. Well, no, there aren’t. There are eight quartos of Richard III, though they don’t differ much (if at all) from edition to edition after the third quarto. And then there are four reprints in folio form, but the second through fourth folio aren’t usually considered to have independent authority. So that’s either four different versions (three quartos and one folio) or twelve (eight quartos and four folios). Hamlet exists in three different texts, two in quarto, one in folio; the second quarto was reprinted three times, but there are later quartos from the second half of the seventeenth century, five in total. So Hamlet, counting by the method Marche seems to use, half-heartedly, for Richard III, exists in somewhere between nine and fourteen “versions.”
If anything, we have too much literary data in these cases. What we don’t have, and what’s challenging, is non-literary data, supplementary information that would elucidate the status and the genesis of all these texts. The problem is not the indeterminacy of the literary work, or its incomplete transmission as such — it’s the absence of metadata. The challenge, that is to say, is not literary: it’s historical. And the mystery is not the mystery of language, but that of commercial publishing practices, playhouse conventions, censorship decisions, archival and collecting decisions, and so on. Algorithms won’t be the ultimate solution to those challenges and mysteries. But who ever said they would be?
Holger Syme's work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Images may be reused as long as their source is properly attributed in accordance with the Creative Commons License detailed above. Many of the photos here were taken at the Folger Shakespeare Library; please consult their policy on digital images as well.