AAPB NDSR Final Station Reports

After a great presentation from our AAPB NDSR residents at the Society of American Archivists meeting in Portland last month, the 2016-2017 AAPB NDSR residencies have officially drawn to a close.

Each host site has written up a report on the progress that was made on their digital archival practices over the course of the NDSR project and the impact that they expect this work to have on their institution going forward. Please click on the links to read each host institution’s report:


Howard University Television (WHUT)

KBOO Community Radio

Louisiana Public Broadcasting

Minnesota Public Radio

Wisconsin Public Television

Congratulations again to all of the residents on completing their projects, and thank you again to our hosts and all the members of the AAPB NDSR community!

Posted in admin, project updates | Leave a comment

Launching the American Archive of Public Broadcasting Wiki

The AAPB NDSR residencies have now ended, but we’re very proud to launch the final project created by our AAPB NDSR residents: The American Archive of Public Broadcasting Wiki, a technical preservation resource guide for public media organizations.

Selena Chau, Eddy Colloton, Adam Lott, Kate McManus, Lorena Ramírez-López, and Andrew Weaver have highlighted their collaboration and shared their resources, workflows, and documents used for managing audiovisual assets in all their possible formats and environments.  The resulting Wiki encompasses everything from the first stages of the planning process to exit strategies from a storage or database solution.

AAPB staff and the residents hope that this Wiki will be an evolving resource. Editing capabilities will be locked on the Wiki for one week following launch, to allow time for the creation of a web archive of the resource in its original form that the residents may use in their portfolios; after this period, we will open up account creation to the audiovisual archiving and public broadcasting communities. We welcome your participation and contributions!

Posted in admin, project updates | Tagged | Leave a comment

Trying New Things: Meditations on NDSR from the Symposium in DC

Way back in August, when I was already feeling overwhelmed by this residency, I decided that yes, I was going to join the planning committee for the NDSR Symposium. I don’t know why I thought taking on more obligations was a good idea, when I was already feeling like I might collapse under the weight of the residency. But I always have a high opinion of what Future Kate is capable of.

In terms of planning a Symposium, this one was a good one to start with. Everyone on the planning committee, staff, and advisory council was dedicated to making this a space where stakeholders across residencies could talk about the benefits of the program, the successes, and of course, the missteps. We all really believe in these residencies, and the conversations we had over the two days of the Symposium were excellent.

In terms of planning, I advocated for a group Slack for the planning committee. I tracked the very generous travel grant applications. We set up Gmail accounts for the applications (both travel and submissions for panels and presentations), as well as Google Sheets to track how much money we had in the pool. I also came up with some templated emails for the planning committee to use. If you got one, hey! I hope it was exactly like every other confirmation email you’ve ever gotten. Like I said, I have never done anything like this before, so everything was new to me. I also helped with registration and greeting, so that I could meet as many people as possible. Hi!

There were two upsets that we didn’t anticipate. After initially agreeing to make opening remarks, Dr. Carla Hayden was unable to make it. A replacement speaker was quickly arranged, but I know I was really disappointed. She is the Librarian of Congress, however, so I understand there are demands on her time.

There was also a small issue with the chair delivery being sent to the wrong building. The staff at the Library of Congress soon got it sorted out, and skipping a break allowed us to get back on schedule. Of all the things that could go wrong, I’m glad it was something that wasn’t a disaster. Many of the early speakers said they enjoyed the extra mingling and networking.

Despite these minor hitches, the conversations during the Symposium were brilliant across the board. I enjoyed hearing about the diversity of projects. The challenges some speakers raised, about sustainability, communication, and effective mentorship, are challenges that my cohort has faced. And even defining what exactly qualifies a project as an NDSR program challenged my thinking.

Of course, the AAPB crew represented our work over the past few months, which is always interesting. A goal of mine has been to become an improved public speaker. The jury’s probably still out on that one, but it’s become less terrifying. The smaller group probably had something to do with that. We’ll see if I feel the same way when the Society of American Archivists’ National Conference rolls around in July.

There was another opportunity this trip to Washington offered me, something I had never done before: I went to meet my congressperson, Jason Lewis. I had met two of my hometown’s mayors, but this was the first time I had made an appointment with an elected leader, and certainly the first time I had approached anyone at the national level. But with all the uncertainty around continued IMLS funding, I felt it would be worth a short but passionate conversation. During the last congressional recess, he had visited one of the local libraries here, so I thought it would be good timing.

It turned out to be not so great timing, as he wasn’t able to meet with me while I was there. I’m trying not to think it was intentional, and the Legislative Assistant who met with me took a polite interest.

Honestly, my anxiety didn’t even kick in; I wasn’t nervous at all. It helped that I was coming in with the full weight of my master’s degree, my ten years in library service, my position on my local library’s board, and of course, my experience as a direct recipient of an IMLS grant focusing on our digital heritage. I talked to the Assistant for 20 minutes about the benefits of libraries and how IMLS supports them. I gave him articles and a printout of the American Library Association’s Code of Ethics. I got his card and he’s already heard from me. I hope he’s being honest when he says he’s passing my concerns on to the Congressperson.

I feel this trip to DC really benefited my understanding of the NDSR program as a whole. I tried a lot of new things, I got a lot out of the conversations I had. I hope that I gave as much as I took with me.

As a resident who is wrapping up her project in the next two months, I only hope that I can continue to be connected to this wonderful community, and to advocate for our shared digital heritage.

This post was written by Kate McManus, resident at Minnesota Public Radio.

Posted in resident post | Tagged , | Leave a comment

Resident Webinar Recording Roundup

With Lorena Ramírez-López’s presentation last month on “Whats, Whys, and How Tos of Web Archiving,” our AAPB NDSR webinar series has now concluded.  However, if you want to catch up on what you missed, I’m happy to share that recordings of all our resident-hosted webinars, along with slides and other resources, are now available for open viewing!

Challenges of Removable Media in Digital Preservation – Eddy Colloton
Removable storage media could be considered the most ubiquitous of digital formats. From floppy disks to USB flash drives, these portable, inexpensive, and practical devices have been relied upon by all manner of content producers. Unfortunately, removable media is rarely designed with long-term storage in mind. Optical media is easy to scratch, flash drives can “leak” electrons, and floppy disks degrade over time. Each of these formats is unique, and carries with it its own risks. This webinar, open to the public, will focus on floppy disks, optical media, and flash drives from a preservation perspective. The discussion will include a brief description of the way information is written and stored on such formats, before detailing solutions and technology for retrieving data from these unreliable sources.
(Originally aired January 12, 2017)
Recorded Webinar

Demystifying FFmpeg/FFprobe – Andrew Weaver
The FFmpeg/FFprobe combination is a surprisingly multifaceted tool that can be used in myriad ways within A/V workflows. This webinar will present an introduction to basic FFmpeg syntax and applications (such as basic file transcoding) before moving into examples of alternate uses. These include perceptual hashing, OCR, visual/numerical signal analysis, and filter pads.
(Originally aired January 26, 2017)
Recorded Webinar

Intro to Data Manipulation with Python CSV – Adam Lott
Archivists frequently inherit huge amounts of messy and disorganized data. More often than not, this data does not fit into established organizational systems, and manually reconstructing data can be a monumental task. Luckily, there are tools for automating these processes. This webinar, open to the public, will focus on one such tool, the Python CSV module, which can handle the large-scale manipulation of CSV data. Using this module, we will extract content from a large CSV, isolating specific assets that have yet to be digitized. Basic concepts of Python programming will also be covered, so those with little experience can follow along and create code of their own.
(Originally aired February 23, 2017)
Recorded Webinar
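As a taste of the kind of task the Python CSV webinar covers, here is a minimal sketch of isolating not-yet-digitized assets from an inventory CSV (the column names `id`, `title`, and `digitized` are hypothetical, not from the webinar itself):

```python
import csv

def undigitized_assets(path):
    """Return rows from a CSV inventory whose 'digitized' column is empty or 'no'."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [row for row in reader
                if row.get("digitized", "").strip().lower() in ("", "no")]

def write_rows(rows, path, fieldnames):
    """Write the filtered rows out to a new CSV with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

From there, the filtered list can be handed off to a digitization queue or further processed with the same module.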

Through the Trapdoor: Metadata and Disambiguation in Fanfiction – Kate McManus
Within the last twenty years, fanfiction writing has transfigured from disparate communities, like webrings within GeoCities, into websites like AO3, providing a more intimate and interactive space for the wide world of web communities to gather and create. Recently, professionals who work to preserve and archive pop culture have begun to open dialogues with fan writers in the hope that these underground communities will share their labors of love, but doing so creates its own problems for both creators and archivists.
(Originally aired March 9, 2017)
Rated: T
Characters: Gen
Tags: meta, archives, angst, rarepair
Disclaimer: Just playing in the sandbox. DO NOT SUE, I OWN NOTHING Please read and review!
Recorded Webinar

ResourceSpace for Audiovisual Archiving – Selena Chau
Selena Chau (AAPB NDSR Resident, KBOO Community Radio) and Nicole Martin (Senior Manager of Archives and Digital Systems, Human Rights Watch) share their assessment and uses of ResourceSpace as an open-source tool for metadata and asset management within digital preservation workflows.
(Originally aired March 23, 2017)
Recorded Webinar
Demo videos: 1, 2, 3, 4

Whats, Whys, and How Tos of Web Archiving – Lorena Ramírez-López
Basic introduction of:
– what web archiving is 🤔
– why we should web archive ¯\_(ツ)_/¯
– how to web archive 🤓
in a very DIY/PDA style that’ll include practical methods of making your website more “archive-able” – or not! – and a live demo of webrecorder.io using our very own NDSR website as an example!
(Originally aired April 6, 2017)
Recorded Webinar


Posted in admin | Tagged , , , , , , | Leave a comment

Adventures in Perceptual Hashing


One of the primary goals of my project at CUNY TV is to prototype a system of perceptual hashing that can be integrated into our archival workflows. Perceptual hashing is a method of identifying similar content using automated analysis; the goal is to eliminate the (often impossible) necessity of having a person look at every item one by one to make comparisons. Perceptual hashes function similarly to standard checksums, except that instead of comparing hashes to establish exact matches between files at the bit level, they establish similarity of content as it would be perceived by a viewer or listener.

There are many examples of perceptual hashing in use that you might already have encountered. If you have ever used Shazam to identify a song, then you have used perceptual hashing! Likewise, if you have done a reverse Google image search, that was also facilitated through perceptual hashing. When you encounter a video on YouTube that has had its audio track muted due to copyright violation, it was probably detected via perceptual hashing.

One of the key differences between normal checksum comparisons and perceptual hashing is that perceptual hashes attempt to enable linking of original items and derivative or modified versions. For example, if you have a high quality FLAC file of a song and make a copy transcoded to MP3, you will not be able to generate matching checksums from the two files. Likewise, if you added a watermark to a digital video it would be impossible to compare the files using checksums. In both of these examples, even though the actual content is almost identical in a perceptive sense, as data it is completely different.
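To make the contrast concrete, here is a toy sketch in Python. This is an illustrative average-style hash over made-up sample values, not the MPEG-7 or Chromaprint algorithm: the point is only that a small "lossy" change breaks a bit-level checksum while leaving the perceptual fingerprint intact.

```python
import hashlib

def average_hash(samples):
    """Toy perceptual hash: one bit per sample, set if the sample is above the mean."""
    mean = sum(samples) / len(samples)
    return "".join("1" if s > mean else "0" for s in samples)

original = [10, 200, 30, 180, 20, 190, 40, 170]
# A 'lossy copy': every value shifted slightly, as compression might do.
lossy = [s + 3 for s in original]

# Bit-level checksums no longer match...
assert hashlib.md5(bytes(original)).hexdigest() != hashlib.md5(bytes(lossy)).hexdigest()
# ...but the toy perceptual hashes still do, because the *shape* of the content survived.
assert average_hash(original) == average_hash(lossy)
```

Real fingerprinting systems apply much more sophisticated transformations, but the underlying idea is the same: hash a representation of the content, not the raw bits.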

Perceptual hashing methods seek to accommodate these types of changes by applying various transformations to the content before generating the actual hash, also known as a ‘fingerprint’. For example, the audio fingerprinting library Chromaprint converts all inputs to a sample rate of 11025 Hz before generating representations of the content in musical notes via frequency analysis, which are used to create the final fingerprint.[1] The MPEG-7 fingerprint standard that I have been experimenting with for my project does something analogous, generating global fingerprints per frame with both original and square aspect ratios, as well as several sub-fingerprints.[2] This allows comparisons to be made that are resistant to differences caused by factors such as lossy codecs and cropping.

Hands on Example

As of the version 3.3 release, the powerful open-source tool FFmpeg has the ability to generate and compare MPEG-7 video fingerprints. What this means is that if you have the most recent version of FFmpeg, you are already capable of conducting perceptual hash comparisons! If you don’t have FFmpeg and are interested in installing it, there are excellent instructions for Apple, Linux, and Windows users available at Reto Kromer’s webpage.

For this example I used the following video from the University of Washington Libraries’ Internet Archive page as a source.


From this source, I created a GIF out of a small excerpt and uploaded it to Github.



Since FFmpeg supports URLs as inputs, it is possible to test out the filter without downloading the sample files! Just copy the following command into your terminal and try running it! (Sometimes this might time out; in that case just wait a bit and try again).

Sample Fingerprint Command

ffmpeg -i https://ia601302.us.archive.org/25/items/NarrowsclipH.264ForVideoPodcasting/Narrowsclip-h.264.ogv -i https://raw.githubusercontent.com/privatezero/Blog-Materials/master/Tacoma.gif -filter_complex signature=detectmode=full:nb_inputs=2 -f null -

Generated Output

[Parsed_signature_0 @ 0x7feef8c34bc0] matching of video 0 at 80.747414 and 1 at 0.000000, 48 frames matching
[Parsed_signature_0 @ 0x7feef8c34bc0] whole video matching

What these results show is that even though the GIF was of lower quality and different dimensions, the FFmpeg signature filter was able to detect that it matched content in the original video starting around 80 seconds in!

For some further breakdown of how this command works (and lots of other FFmpeg commands), see the example at ffmprovisr.

Implementing Perceptual Hashing at CUNY TV

At CUNY we are interested in perceptual hashing to help identify redundant material, as well as establish connections between shows utilizing identical source content. For example, by implementing systemwide perceptual hashing, it would be possible to take footage from a particular shoot and programmatically search for every production that it had been used in. This would obviously be MUCH faster than viewing videos one by one looking for similar scenes.

As our goal is collection-wide searches, three elements had to be added to existing preservation structures: a method for generating fingerprints on file ingest, a database for storing them, and a way to compare files against that database. Fortunately, one of these was already essentially in place. As other parts of my residency called for building a database to store other forms of preservation metadata, I had an existing database that I could modify with an additional table for perceptual hashes. The MPEG-7 system of hash comparisons uses a three-tiered approach to establish accurate links, starting with a ‘rough fingerprint’ and then moving on to more granular levels. For simplicity and speed (and because we do not need accuracy down to fractions of seconds), I decided to store only the components of the ‘rough fingerprint’ in the database. Each ‘rough fingerprint’ represents the information contained in a forty-five-frame segment. Full fingerprints are stored as sidecar files in our archival packages.
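As a rough illustration of the storage side, a minimal table for rough fingerprints might look like the sketch below. The schema, table, and column names here are hypothetical; the actual implementation lives in the mediamicroservices repository.

```python
import sqlite3

# In-memory database for illustration; a production system would use a persistent one.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rough_fingerprints (
        object_id   TEXT NOT NULL,  -- identifier of the archival object
        segment     INTEGER,        -- index of the forty-five-frame segment
        fingerprint BLOB            -- rough fingerprint bytes for that segment
    )
""")

# Each file contributes one row per forty-five-frame segment.
rows = [("video001", i, bytes([i] * 8)) for i in range(3)]
conn.executemany("INSERT INTO rough_fingerprints VALUES (?, ?, ?)", rows)
conn.commit()
```

Keeping one row per segment makes it cheap to ask "which objects contain a segment like this one?" with an ordinary indexed query before any finer-grained comparison happens.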

As digital preservation at CUNY TV revolves around a set of microservices, I was able to write a script for fingerprint generation and database insertion that can be run on individual files as well as inserted as necessary into preservation actions (such as AIP generation). This script is available here at the mediamicroservices repository on GitHub. Likewise, my script for generating a hash from an input and comparing it against values stored in the database can be found here.

I am now engaged with creating and ingesting fingerprints into the database for as many files as possible. As of writing, there are around 1700 videos represented in the database by over 1.7 million individual fingerprints. The image below is an example of current search output.



During search, fingerprints are generated and then compared against the database, with matches being reported in 500-frame chunks. Output is displayed both in the terminal and superimposed on a preview window. In this example, a 100-second segment of the input returned matches from three files, with search time (including fingerprint generation) taking about 25 seconds.
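A simplified sketch of the comparison step, reduced to a plain Hamming-distance search over stored fingerprints (the real MPEG-7 matching logic is considerably more involved, and the names here are illustrative):

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def find_matches(query: bytes, stored, threshold: int = 4):
    """Return ids of stored fingerprints within `threshold` bits of the query."""
    return [obj_id for obj_id, fp in stored if hamming(query, fp) <= threshold]

stored = [("clip_a", b"\x0f\x0f"), ("clip_b", b"\xff\x00"), ("clip_c", b"\x0f\x0e")]
print(find_matches(b"\x0f\x0f", stored))  # clip_a matches exactly; clip_c differs by 1 bit
```

The threshold is what makes the search "perceptual": near misses caused by transcoding or cropping still fall inside it, while unrelated content does not.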

The most accurate results so far have been derived from footage of a more distinct nature (such as the example above). False positives occur more frequently with less visually unique sources, such as interviews conducted in front of a uniform black background. While there are still a few quirks in the system, testing so far has been successful at identifying footage reused across multiple broadcasts with a manageably low number of false results. Overall, I am very optimistic about the potential enhancements that perceptual hashing can add to CUNY TV archival workflows!

This post was written by Andrew Weaver, resident at CUNY TV.

Posted in resident post | Tagged , | Leave a comment

Reporting from the PNW: Online Northwest Conference

Hello, it’s your Pacific Standard Time/Pacific Northwest resident here, reporting on happenings in the Cascadia region. I was fortunate to learn about the Online Northwest conference (someone who attended my KBOO Edit-a-Thon event told me about it!), although at the last minute. This small, one-day conference was well organized, with the majority of the presenters and participants from Oregon and Washington, but also some from northern California, New York, and Virginia. It was held on Cesar Chavez’s birthday, which is a recognized holiday for KBOO. And P.S.: some of our recently digitized open reel audio made it into KBOO’s Cesar Chavez Day Special Programming, so check it out!

I learned that Online Northwest (#ONW17) had previously been held in Corvallis, took a hiatus last year, and was held at PSU for the first time this year. At Online Northwest, there were four tracks: User Experience/Understanding Users, Design, Engagement/Impact, and Working with Data. Presenter slides are here: http://pdxscholar.library.pdx.edu/onlinenorthwest/2017/

First, I was fully engaged with Safiya Umoja Noble’s keynote “Social Justice in LIS: Finding the Imperative to Act” which highlighted the ongoing need for reevaluation and action in our work as information professionals to ensure that information is equitable in online catalogs, cataloging procedures, search algorithms, interface design, and personal interactions.

I attended the Kelley McGrath (University of Oregon) session on metadata, where she talked about the potential of linked data and the semantic web, or the brave new metadata world. She talked about algorithms and machines that attempt to self-learn and present users with new information drawn from available linked data sources. Of course, the information can be fickle at times, since computers are largely unknowing of the data they are fed. The end-of-session questions came close to putting two and two together: one can say that algorithms are neutral, but we can’t ignore that humans choose and develop algorithms. Rarely do individuals critique implemented algorithms for human bias. This reminded me of Andreas Orphanides’s keynote at Code4Lib: a person initially frames the question that is being asked, and that influences the answer, or what is being looked at. Framing to focus on something specific requires excluding many other things. My takeaway from this session, on the heels of Dr. Noble’s keynote, was to be cautious and consider potential bias when using algorithms.

In The Design of Library Things: Creating a Cohesive Brand Identity for Your Library, Stacy Taylor and Maureen Rust (Central Washington University) talked about their brand identity process from start to finish, including their challenges, and shared their awesome resources. Important takeaways included considering the tone of communication and signage (what is the emotional state of readers when they see your signs?), consistent styling of materials, defining the scope of the project, clearly separating standards (requirements) from best practices, and continuously making a case for, and being firm about, the needs and benefits of branding.

Open Oregon Resources: From Google Form to Interactive Web Apps by Amy Hofer (Linn-Benton Community College) and Tamara Marnell (Central Oregon Community College) was fantastic! They walked through their use of Google Forms/Sheets and WordPress to share open educational resources in use by professors at Oregon community colleges, along with ideas for how others could implement something similar, but better, keeping in mind that technology advances and old versions age, and considering the time and resource needs of an implemented tool.

Chris Petersen from Oregon State University (OSU) presented on Creating a System for the Online Delivery of Oral History Content. He described ways in which he reused existing OSU resources (Kaltura, XML, TEI, a colleague knowledgeable in XSL/XSLT), common in the “poor archivist’s” toolbox. I also thought his reference to the Cascadia Subduction Zone’s impending massive destruction when arguing for the need to ensure multiple collocations of digital materials was uniquely PNW. Like the Open Oregon Resources talk, this session was modeled as: here’s what I did, but it clearly can be done better; here’s what we’ll be doing or moving towards in the future. Chris mentioned they’ll be using OHMS in the future. Although it is less flexible than their current set-up, it was more viable considering his workload. He also mentioned Express Scribe (inexpensive software) as an in-house transcription tool and XSL-FO for formatting XML to PDF.

My favorite lightning talk was Electronic Marginalia (Lorena O’English, Washington State University): a case for web annotation. Tools I want to check out are hypothes.is (open source) and Genius Web Annotator.

This post was written by Selena Chau, resident at KBOO Community Radio.

Posted in resident post | Tagged , | Leave a comment

AAPB NDSR April Webinar

Our last AAPB NDSR webinar is coming up this week!

Whats, Whys, and How Tos of Web Archiving
(Lorena Ramírez-López)
Thursday, April 6
3:00 PM EST

Basic introduction of:
– what web archiving is 🤔
– why we should web archive ¯\_(ツ)_/¯
– how to web archive 🤓

in a very DIY/PDA style that’ll include practical methods of making your website more “archive-able” – or not! – and a live demo of webrecorder.io using our very own NDSR website as an example: https://ndsr.americanarchive.org/!

Register for “Whats, Whys, and How Tos of Web Archiving”

And if you missed our March webinars from Kate McManus and Selena Chau, you can catch up now:

“Through the Trapdoor: Metadata & Disambiguation in Fanfiction, a case study,” presented by Kate McManus, AAPB NDSR resident at Minnesota Public Radio
–          view the slides
–          watch the recording

“ResourceSpace for Audiovisual Archiving,” presented by Selena Chau, AAPB NDSR resident at KBOO Community Radio, and Nicole Martin, Senior Manager of Archives and Digital Systems, Human Rights Watch
–          view the slides
–          watch the recording
–          watch the demo videos: 1, 2, 3, 4

Posted in admin | Tagged | Leave a comment