During the second day of the conference, participants continued to explore computer-driven and crowdsourced approaches to tagging audiovisual assets. Although a tension exists between these two methods, many participants argued for combining them. Despite the challenges, solutions are already underway and were presented as case studies at the conference.
In the day’s first keynote, Cees Snoek (Intelligent Systems Lab, University of Amsterdam) described machine learning techniques whereby a computer is trained to automatically generate accurate sentence-level descriptions of video content, derived from shapes detected in the video frame. The keynote was followed by a case study from Sam Davies (BBC R&D) in which emotional terms are mapped to video content, effectively increasing user accessibility when combined with conventional discovery tools. The second case study, from Radiotelevisione Svizzera – RSI Archive, documented a successful collaboration between archivists and IT staff to create software that automatically tags radio content by means of computer audio analysis.
In the first panel discussion, the participants acknowledged the challenges disruptive technologies pose to archivists. While Brecht DeClercq (VRT) suggested addressing archivists’ reluctance to adopt new tools through better interface design, Alberto Messina (RAI) argued that cultural differences between IT staff and archivists hamper project development. Hypothetically, the IT team might scope out software applications without involving the archivists, and the archivists may not be willing to compromise on software functionality. The panelists agreed that automated tools, by freeing up archivists’ time, would increase the quality and quantity of metadata. However, DeClercq argued that careful choices must be made to match the best tool to the collection. Google was identified as one source of rapidly developing technology that could disrupt current audiovisual workflows (N.B. during the conference, Google announced its automatic tagging service, which links identifying terms to its Knowledge Graph).
The second half of the day focused on user-generated metadata, often described as crowdsourcing. In a keynote address, Lora Aroyo (Web and Media group, Department of Computer Science, Free University Amsterdam) posited that, due to the symbiotic relationship between humans and machines, a new infrastructure must be built around the ways users consume audiovisual content. Aroyo suggested a holistic approach incorporating user experience design, quality metrics, user motivation studies, and flexible tagging schemes. Yves Raimond (BBC) presented a project, now out of beta, that automatically tags radio content and then exposes the tags to the public, who are invited to evaluate their accuracy. Liam Wylie (RTÉ) presented the final case study, a grass-roots initiative that engages the public through Twitter. Wylie argued that only by gaining the attention of the public can collection managers use crowdsourcing as a means to realize economies of scale.
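The workflow Raimond described, in which a machine proposes tags and the public then judges them, can be sketched as a simple voting loop. This is a purely illustrative sketch; the function names, vote format, and thresholds are my assumptions, not details of the BBC project.

```python
from collections import Counter

def review_tags(proposed_tags, votes, min_votes=3, approval_ratio=0.6):
    """Keep a machine-proposed tag only if enough human reviewers agree.

    votes: list of (tag, is_correct) pairs collected from the public.
    A tag is accepted once it has at least `min_votes` judgements and
    the share of positive judgements meets `approval_ratio`.
    """
    yes = Counter(tag for tag, ok in votes if ok)
    total = Counter(tag for tag, _ in votes)
    accepted = []
    for tag in proposed_tags:
        n = total[tag]
        if n >= min_votes and yes[tag] / n >= approval_ratio:
            accepted.append(tag)
    return accepted

# "jazz" gets 2 of 3 positive votes and passes; "news" gets 1 of 3 and fails.
votes = [("jazz", True), ("jazz", True), ("jazz", False),
         ("news", False), ("news", False), ("news", True)]
print(review_tags(["jazz", "news"], votes))  # ['jazz']
```

The point of the threshold is the economy of scale Wylie alluded to: no single archivist reviews every tag, yet no single member of the public can corrupt the record.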
The final panel discussion focused on the value of user-generated tags. Julia Vytopil (Beeld en Geluid) argued that, because the general public often develops non-specific, overlapping, and redundant tags, a controlled vocabulary is needed to productively constrain user choice. Aroyo claimed that attention should be paid to niche-sourcing, arguing for tagging initiatives geared towards particular user groups, who often require different incentives than general or casual users. The risk of user mis-tagging can be mitigated through an array of approaches, including filtering (Aroyo), controlled vocabularies (Vytopil), and expert review (Svein Prestvik, NRK). In response to another question, Aroyo suggested capturing the provenance of user tags in much the same way that Wikipedia documents user edits to its entries. Prestvik conceded that integrating broadcast metadata from many disparate sources has been very challenging, though progress is being made. Both Camilla Roesen (Danish Radio) and Prestvik agreed that users should generate rights metadata, but acknowledged the difficulties this posed. Elena Brodie Kusa questioned whether relying on social media excluded an important part of the public that had yet to cross the digital divide. Though the panelists agreed this was an important consideration, some believed that people who have lacked digital literacy are rapidly gaining computer skills.
Computer-driven innovations seem ubiquitous, from social media tools that allow us to connect online to IBM’s Watson, an artificially intelligent computer that won a quiz show. Although I was unsurprised to hear so many presentations about automatic video and audio analysis, the number of computer-generated tagging projects out of beta impressed me. According to one presenter (McKernan, 2013), machines will do 90% of routine audiovisual tagging, while humans will do the remaining 10%. While Thomas Prehn (Asunto), who co-moderated the event with Elena Brodie Kusa, remarked that humans should be at the center of any tagging initiative, Cees Snoek’s presentation supported the trend toward a mostly automated cataloguing workflow. I remain unconvinced of one method’s superiority over the other. As Aroyo stated in her keynote, humans and machines have a symbiotic relationship. Perhaps computers will grow to understand us, while we learn to embrace the advantages they bring. One thing is certain: as our society’s appetite for media continues to grow, so does our need to archive this audiovisual content and open it up to the public.
McKernan, L. (2013, May 19). Metadata matters [Blog post]. Retrieved from http://lukemckernan.com/2013/05/19/metadata-matters/