“Where the author was once presumed to be the originating transmitter of a discourse next sent for management to the editor, publisher, and so on through all the other positions in the discursive circuit, now the author is in a mediating position as just one among all those other managers looking upstream to previous originating transmitters—database or XML schema designers, software designers, and even clerical information workers (who input data into the database or XML source document).” (Liu 81)
A (very brief) history of the term “crowdsourcing”
In his 2004 article in Critical Inquiry titled “Transcendental Data: Toward a Cultural History and Aesthetics of the New Encoded Discourse,” Alan Liu describes today’s digital information culture through the concept of “discourse network 2000,” a way of organizing information production and dissemination which has the potential to disempower readers and writers by prescripting their roles within an overly articulated, management-focused framework. Fast-forward one year and Wired writer Jeff Howe coins the term “crowdsourcing” to refer to:
the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.
Within universities, crowdsourcing has become a much-lauded and much-fretted-over method of completing a large body of simple but tedious research tasks. It first made the leap from industry to the academy via citizen science campaigns such as the Zooniverse, but gained attention in the digital humanities through the oft-emulated Transcribe Bentham project. In this post, I will look at two main threads of crowdsourcing in DH as identified in Carletti et al.’s taxonomy, and attempt to situate them within Liu’s theory of our postindustrial information economy.
Version One: The Crowd as Algorithm
According to Carletti et al., one kind of DH crowdsourcing project asks volunteers to “integrate/enrich/reconfigure existing institutional resources,” with specific tasks of curation (“social tagging, image selection, exhibition curation, classification”), revision (“transcription, correction”), and location (“artworks mapping, map matching, location storytelling”). These tasks are time-consuming and difficult to automate, but require no special knowledge on the part of the volunteer. The purpose of these tasks is to make the information housed at institutions more easily discoverable; it is not to tap into some special ability of the crowd.
It follows that once the technology catches up, computers will be able to replace and outperform these human volunteers. Automated systems already exist for tagging, querying, and presenting information on web pages; humans are simply bridging the gap for those processes too complex for current algorithms. In the long term, then, this use of crowdsourcing adds little human value to a project and is an unsustainable method of maintaining community interaction.
One oft-cited example of a pilot DH crowdsourcing project is Transcribe Bentham. Those involved were careful to interview their volunteers and chart their progress for the benefit of future projects. Despite initial concern, Causer et al. write that “no volunteer expressed any feeling of being exploited” (127). However, when the project had to scale back the feedback and moderation provided by paid staff, it lost many of its regular contributors. Causer et al. recommend building enough expertise amongst the volunteers that they may self-moderate, “thereby negating the need to pay academics to run the site” (131).
This focus on costs reveals an underlying thread: much of the language of the article is couched in the business-ese of “discourse network 2000.” For example, Causer et al. provide a workflow chart, a cost-benefit analysis, and a push-pull understanding of the services exchanged between volunteers and project staff.
In Liu’s description of discourse network 2000, authorship (and readership) has been undermined by the “data pour,” in which a webpage is populated with information drawn from databases following instructions from an algorithm. In this first form of crowdsourcing, the volunteers act as the algorithm, “automatically” tagging data and helping site users locate it. This is not unique to Transcribe Bentham. For example, Carletti et al. refer to a project at the Victoria and Albert Museum through which “the public is invited to select the best images to be used” when others search the collections. The ideal “crowd” for such a task wouldn’t be the public, but a consistent, reliable computer program—and once such a thing exists, the humans will be replaced.
Transcribe Bentham has recognized the need to build a community “based on mutual respect and trust” between staff and volunteers, while still maintaining the staff as arbiters of what does and doesn’t count as a completed transcription, perpetuating an unequal power dynamic. Luckily, “Crowd as Algorithm” is not the only kind of crowdsourcing available.
Version Two: The Crowd as Creator
In their taxonomy, Carletti et al. refer to a second kind of crowdsourcing which includes “projects that ask the ‘crowd’ to create/contribute novel resources,” and allows the public “to share physical or digital objects, such as document private life (e.g., audio/video of intimate conversations); document historical events (e.g., family memorabilia); and enrich known locations (e.g., location-related storytelling).” “Crowd-as-Creator” crowdsourcing gives the public more autonomy, elevating the volunteer base from content-management algorithm to content-generator.
Although not as common as the first kind of crowdsourcing, examples of Crowd-as-Creator include major projects such as StoryCorps and Europeana 1914-1918. One characteristic which distinguishes these projects is their combination of “‘face-to-face’ events with ‘standard’ computer-mediated actions” (Carletti et al.). The public physically engages with the project, creating the information which goes into the “data pour,” rather than simply sorting it for display. Of course, these projects still exist within discourse network 2000—the crowd’s contribution has simply shifted from content-manager to database-filler. However, my inclination is to view this kind of work as part of the grassroots community-building movements which have traditionally preserved folk life and local history.
In an article titled “Digital Curiosities: Resource Creation via Amateur Digitization,” Melissa Terras points to another frequently neglected paradigm for preserving our cultural memory: the pro-amateur digital museum. These digital museums cut out the middleman (universities) in favor of curating and, significantly, promoting information access themselves, often targeted at niche interests neglected by the academy. Unlike their institutional counterparts, the long-term prognosis for pro-amateur websites is extremely good—they maintain connections with avid users and attract thousands of new visitors every year, whereas:
“once an institutional website is created, it is often left to its own devices, with little sustainability funding made available to allow the regular upkeep and maintenance, and lack of the type of interaction with user communities necessary to attract and keep visitors which were described by these passionate amateurs.” (Terras 432)
There is an ethos to these projects which differs significantly from their institutional counterparts. The pro-amateur museum’s slogan could be (to riff on JFK’s famous line), “Ask not what your userbase can do for you; ask what you can do for your userbase.” Those contributing free labor to a project should also be served by that project in a meaningful way.
There is a branch of DH which shares this service-based ethos—the alt-ac movement, particularly as represented by the library, but including many of the alternative innovators in the humanities. Hearkening back to my previous post, Bethany Nowviskie advises project managers to: “Seek partners, not services. Seek collaborators, not staff.” In this context, she is referring to how DHers approach potential partners in industry or the computer sciences, but it applies equally well to forming a partnership with the “crowd.” While the logic of the web may largely decontextualize creative work and take the power to control content out of our hands, crowdsourcing can be used to re-empower content creators if it is seen as a partnership between different kinds of experts, not as a temporary stopgap until we develop better technology.
According to Carletti et al., “rethinking the relationship between official and unofficial knowledge is probably the main challenge that cultural institutions have to face when undertaking a crowdsourcing process.” The purpose of the university is shifting—skyrocketing costs, the limited practical value of many degrees, MOOCs, the student debt bubble, anti-intellectualism, alt-ac—all of these contribute in different ways to the need to re-envision the purpose of the academy’s “hallowed” halls. Large-scale research projects are no exception. Who is being served by our work, and what service is being provided? The way an institution engages in crowdsourcing can give insight into how its academics would answer these questions.
Carletti, Laura, et al. “Digital Humanities and Crowdsourcing: An Exploration.” Museums and the Web. N.p., n.d. Web. 13 Apr. 2015.
Causer, T., J. Tonra, and V. Wallace. “Transcription Maximized; Expense Minimized? Crowdsourcing and Editing The Collected Works of Jeremy Bentham.” Literary and Linguistic Computing 27.2 (2012): 119–137. CrossRef. Web. 13 Apr. 2015.
Liu, Alan. “Transcendental Data: Toward a Cultural History and Aesthetics of the New Encoded Discourse.” Critical Inquiry 31.1 (2004): 49–84. CrossRef. Web. 9 Apr. 2015.
Terras, Melissa. “Digital Curiosities: Resource Creation via Amateur Digitization.” Literary and Linguistic Computing 25.4 (2010): 425–438. llc.oxfordjournals.org. Web. 13 Apr. 2015.
 One doesn’t normally label an epigraph, does one? I’ve been encoding too much TEI recently.
 The word “invitation” is repeated throughout the literature, as if to evoke the idea that the public will miss out on a great opportunity if they don’t “accept” the invitation to perform tedious labor for free.
 From Causer et al.
 I’m being a little unfair here. Causer et al. also describe an ideal situation in which the userbase is given at least one year of training by paid project staff, organically evolving its own volunteer moderators. This would be a brand-new kind of crowdsourcing in DH (institution-born, public-maintained), and possibly the most democratic option of all, as only projects of real value to the public would survive.
 Seen in this light, it’s almost Matrix-esque—humans harnessed, not as batteries filled with energy, but as batteries filled with information, powering the Internet. (If you want to be cynical about it.)
 A term Terras coins to refer to an amateur (as in, non-academic) who is, in fact, the expert on his or her chosen topic.
 Nowviskie, “Ten rules for humanities scholars new to project management.”