One major buzzword in technological circles these days is “big data,” a term for datasets so large that they can’t be easily manipulated, even by computer database systems. Humanities scholars have appropriated the term to mean something closer to datasets larger than could reasonably be used or mined without computational technology. Data-driven history was part of economic, political, and even social history from the 1960s through the 1980s, and now digital humanities is bringing it back into vogue.
The Trans-Atlantic Slave Trade Database, a collection of data about slave voyages culled from sources all across the Atlantic, is one of the most remarkable collections of what humanists might call “big data.” The project began in the 1960s, and even then the data was too big to be dealt with effectively in book form. In 1999, the data was released on CD-ROM. Now it has been transformed into an interactive online project with not only data but also visualizations, maps, and analytical essays. The database has been the work of many scholars over a long period of time, and hundreds of scholars have used it. Anyone writing on the slave trade has to use this database in some way; its data is too valuable to ignore. The project is particularly remarkable because it has existed from the days of quantitative history through to its present digital form. It shows how powerful data can be, and how much more powerful it becomes when it is accessible and understandable to a wide range of people online.
The Methodology of Digital “Big” Data
Using data to make a historical argument requires its own methodology. Quantitative methods range from simple frequency counts to complex sets of statistical analyses. As I noted in my last post, digital humanities requires its own sort of methodology, but one that often subsumes the methodologies of quantitative history.
One of the major tenets of digital history is an emphasis on methodological transparency. This emphasis was also present in quantitative history, perhaps because its methods were so foreign to traditional historical analysis that they had to be justified more fully. In digital history, transparency means that a good project is very clear about the methods, algorithms, and statistical assumptions behind its findings.
Collaboration is a prominent aspect of many digital history projects involving “big” data. Historians are, after all, historians. To do complicated quantitative history, a historian has two options: either become a mathematician or statistician as well as a historian, or partner with another scholar who can do the computational work. Some historians have the skills to be both, but most don’t. Digital quantitative history can give historians the chance to cross very wide disciplinary boundaries (to computer science or mathematics!) rather than staying within the humanities for collaborations. Such collaboration often means breaking out of traditional historical methodologies in order to use the data most effectively, as well as giving full credit to all collaborators for their involvement in the project.
Using “Big” Data in Military History
Military history is in many respects a data-driven discipline. Ship manifests, army records, casualty rates, force ratios, logistics, and even demographic data about soldiers and sailors all make up a vital part of the history of the military. So military history is ripe for quantitative methods, and digital methods can make those quantitative methods easier to apply.
There are many statistical models, and many different tools for using them. But all data-driven history takes unique instances, unique objects, or unique people and puts them into categories. Deciding which categories to put things in can be very perilous. (This isn’t a problem only with quantitative history; it arises whenever a historian makes a generalization.) Casualties, for example, are relatively easy to categorize: a person is dead or alive, without much wiggle room. But other things aren’t so easy. Take statistics about officers in the U.S. Navy. It’s one thing to count deaths. It’s another to determine how many officers belonged to a particular social class or came from a particular background. Thinking through how to categorize data points is a critical part of the historian’s process when using quantitative methods.
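To make the categorization problem concrete, here is a minimal Python sketch of one way a historian might code free-text background information into categories. The records, field names, and the keyword-based coding scheme are all invented for illustration; they are not drawn from any archive or from any published study. The key design choice is that records matching no category (missing data) or more than one category (ambiguous data) are flagged for human review rather than silently forced into a bucket.

```python
from collections import Counter

# Hypothetical, invented records for illustration only; not drawn from any archive.
officers = [
    {"name": "Officer A", "father_occupation": "merchant"},
    {"name": "Officer B", "father_occupation": "planter"},
    {"name": "Officer C", "father_occupation": "ship captain and merchant"},
    {"name": "Officer D", "father_occupation": ""},
]

# A hypothetical coding scheme: each category lists keywords that place a record in it.
CODING_SCHEME = {
    "commercial": ["merchant", "shopkeeper"],
    "landed": ["planter", "farmer"],
    "maritime": ["ship captain", "mariner"],
}


def categorize(occupation: str) -> str:
    """Return the single matching category, or flag the record for manual review."""
    text = occupation.lower()
    matches = [
        category
        for category, keywords in CODING_SCHEME.items()
        if any(keyword in text for keyword in keywords)
    ]
    if len(matches) == 1:
        return matches[0]
    # No match (missing data) or several matches (ambiguous): a human decides.
    return "needs review"


counts = Counter(categorize(o["father_occupation"]) for o in officers)
print(counts)
```

Even in this toy example, half the records cannot be categorized automatically, which is exactly the kind of decision the historian has to make, and document, by hand.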
One frequent criticism of quantitative history is that it does not allow the narrative of history to shine. But well-done quantitative history can actually create new narratives rather than obscuring them. And digital history can help quantitative history tell the story better than a monograph or article can.
One excellent monograph that relies on data for its argument is Christopher McKee’s A Gentlemanly and Honorable Profession: The Creation of the U.S. Naval Officer Corps, 1794-1815. A series of appendices and a bibliographic essay show all of McKee’s data and the sources from which he culled it. The problem: those appendices span a hefty 41 pages of charts and bibliographic information, which only the most dedicated historian will slog through. Similarly, Larry M. Logue and Peter Blanck’s Race, Ethnicity, and Disability: Veterans and Benefits in Post-Civil War America draws on records from 40,000 Union Army veterans to explore patterns and disparities in the granting of disability pensions among African American and immigrant communities. These two social histories of the U.S. military are highly valuable, but because most readers will not work through the data, some of the power of the analysis is lost.
Digital History and Military Data
Digital history can help us negotiate these complex (but necessary) charts and tables and make them more accessible to both professional and non-academic audiences. With a single mouse click, a visualization can make “big data” easy to interpret while remaining analytically rigorous. A great example of this mixture of readability and rigor is the Visualizing Emancipation project from the University of Richmond. The huge number of data points in this visualization would be essentially meaningless in a chart, but overlaid on a map, trends and patterns become visible. We’ll discuss various styles of visualizations, as well as their benefits and drawbacks, in a later blog post.
Digital history can also allow historians to expand their definition of data. Texts can become data points in ways impossible to imagine before advanced computer technology. Jonathan Stray and Julian Burgess have used reports from the Iraq War to draw conclusions about military operations and about the social and military trends within the war’s official documents. (Notice, too, the clear and transparent explanation of their methodology.) The data in this instance includes both information about the reports and information contained within the reports.
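The distinction between information *about* texts and information *within* texts can be shown with a minimal Python sketch. The report records below, including their dates, types, and wording, are invented for illustration, and the simple counting shown here is not Stray and Burgess’s method; it only demonstrates the two kinds of data. The first count treats each report as a data point described by its metadata; the second treats the words inside the reports as data.

```python
import re
from collections import Counter

# Hypothetical report records; the field names and texts are invented for illustration.
reports = [
    {"date": "2006-03", "type": "patrol", "text": "Patrol found IED near checkpoint."},
    {"date": "2006-03", "type": "attack", "text": "Small arms fire at checkpoint. No casualties."},
    {"date": "2006-04", "type": "patrol", "text": "Patrol cleared IED on route."},
]

# Information ABOUT the reports: how many reports of each type appear per month.
by_month_and_type = Counter((r["date"], r["type"]) for r in reports)

# Information WITHIN the reports: how often each word appears across all texts.
STOPWORDS = {"at", "on", "near", "no", "the"}


def words(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic words, and drop common stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


term_counts = Counter(w for r in reports for w in words(r["text"]))

print(by_month_and_type)
print(term_counts.most_common(3))
```

Real projects typically add weighting schemes, clustering, and visualization on top of counts like these, but the underlying move of turning prose into countable data is the same.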
The point of these examples is this: though the methods of quantitative history may be resurgent in digital history, the biggest changes lie in how data is presented. Digital history allows data to be less intimidating and more usable than it was in the quantitative history of the 1970s and 1980s.
How have you used quantitative methods in military history? What benefits did you see? What pitfalls?
Abby Mullen is a graduate student at Northeastern University in Boston, Massachusetts. She studies the navy of the United States in the early republic. She is also a fellow at Northeastern’s NULab for Texts, Maps, and Networks, researching for the Viral Texts project.