I do not know everything. Nor does anyone else.
Huge surprise, I’m sure. Lemme give you a second to pick your jaw up off the floor and recompose yourself.
Despite the fact that this foregone conclusion is painfully obvious, I certainly struggle with admitting it to people sometimes. Especially if the particular thingy that I do not totally understand is something that I wish I did know.
When this happens in social situations, I am guilty of trying to “pass” as someone who knows this information. I’m terrified that acknowledging that I don’t know how to write a script for that particular problem, or that I have not seen that really cool foreign film, or that I have never heard of this historical event, will be met with “WHAT??? You haven’t heard of XYZ?”
Instead, I will just nod my head and softly say “yeah, yeah,” while hoping no one presses me for details (they usually don’t).
Hey, by the way, when someone says they don’t know something PLEASE DON’T belittle them for it. I don’t care how much you loved that punk band in high school, not everyone has heard of them. Don’t act like someone just told you they didn’t know the sky was blue.
But I’m not talking about social situations in this post.
I want to talk about when I don’t know stuff that I wish I knew, so that I could do my job better (and faster).
Often in this situation I insist that I can just “boot-strap” this problem. I can teach myself, I can solve this without help, I am unstoppable.
Then, when I ultimately cannot pile-drive my ignorance out of existence through sheer force of will, I feel defeated. I put myself down. This often takes the form of comparing myself to someone else, who has, likely through the use of the dark arts (but certainly not through asking for help), managed to acquire the skills/knowledge I desire. I am clearly less than this person, as they have learned this thing that I cannot learn.
The author David Foster Wallace lived and worked in my hometown of Bloomington/Normal, Illinois, and I have always felt like that somehow explained how relatable I find his writing (totally plausible that he is just a good writer, and good writing just feels relatable – like a little peek inside the author’s head – but don’t ruin this for me). In an interview that eventually became a book (Although of Course You End Up Becoming Yourself: A Road Trip With David Foster Wallace) and then a movie, DFW describes his struggle with unhappiness, and his misguided approach to addressing this unhappiness, as “very American.”
When talking with David Lipsky:
“DFW: …Or, then, for two weeks I wouldn’t drink, and I’d run ten miles every morning. You know that kind of desperate, like very American, ‘I will fix this somehow, by taking radical action.’
And uh, you know, that lasted for a, that lasted for a couple of years.
DL: Like Jennifer Beals, more or less. In Flashdance, solving Pittsburgh.
DFW: And it’s weird. I think a lot of it comes out of sports training. You know? (In Schwarzenegger voice) ‘If there is a problem, I vill train myself out of it. I vill get up early. I vill vork harder.’ And that shit worked on me when I was a kid, but you know…”
That’s what I mean by “boot-strapping” a problem. Somehow the solution is just going to present itself to me because I did a bunch of pushups or something. Or the digital preservation equivalent of pushups: Google searches.
I’ve found over the course of my residency at LPB that I get better answers, faster, when I just admit I don’t know how to do something and would like to know how to do that thing. One of my first experiences with this came up when I was trying to figure out how to run MediaInfo on all of the web-encoded access files we keep on our server. I had worked out a process with LPB’s Web IT Manager, and was proudly tweeting about it. Turns out, there was a faster and easier way to do the same thing, and Kieran O’Leary from the Irish Film Archive was nice enough to clue me in.
Thanks to Kieran we ended up modifying this script to run recursively through our LTO-6 tapes as well.
for /r %%A in ("*.mp4", "*.mxf") do mediainfo --output=PBCore2 "%%A" > "%%~nxA.xml"
This command is written for the Windows command line, and is meant to be stored in a batch file (“.bat”). The doubled %% is how a batch file writes a loop variable; typed directly at the prompt, it would be a single %. Batch files are new to me, so this might be a little rudimentary to a seasoned Windows user. I found this resource helpful for figuring out all the different arguments and parameters (what all those goofy % and ~ signs mean): http://ss64.com/nt/syntax-args.html
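For readers who don’t live in batch files, here is a rough Python sketch of what that loop does. It is an illustration only, not the script in use at LPB: the function names `plan_mediainfo_jobs` and `run_jobs` are made up for this example, and it assumes `mediainfo` is installed and on the PATH if you actually run the jobs.

```python
import subprocess
from pathlib import Path

# Extensions the batch loop matches with "*.mp4" and "*.mxf".
VIDEO_EXTENSIONS = {".mp4", ".mxf"}


def plan_mediainfo_jobs(root, out_dir):
    """Return (command, output_path) pairs, one per video file under root.

    Hypothetical helper: it only plans the work, it does not run anything.
    """
    jobs = []
    # rglob("*") walks every subfolder, much like `for /r` does.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS:
            # Same arguments the batch command passes to mediainfo.
            command = ["mediainfo", "--output=PBCore2", str(path)]
            # %%~nxA is the file name plus extension: clip.mp4 -> clip.mp4.xml
            output_path = Path(out_dir) / (path.name + ".xml")
            jobs.append((command, output_path))
    return jobs


def run_jobs(jobs):
    """Run each planned command, saving what it prints to its XML file."""
    for command, output_path in jobs:
        with open(output_path, "w", encoding="utf-8") as handle:
            subprocess.run(command, stdout=handle, check=True)
```

Splitting the planning from the running means you can print the list of jobs and eyeball it before pointing anything at a tape or a server.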
I’m also looking to implement the MediaConch application into the archival workflows here at LPB. MediaConch does many things, but at LPB, I would like to use it as a form of automated quality control, through the policy checker feature. The policy checker essentially checks an input file against a set of rules, a policy, and tells the user whether the file conforms to that policy, or not. You can set up a policy manually, for instance, “all files must be in a .mov container, and have a bit rate of 5 Mbps or higher,” or you can use a file as a template to create a policy. Here at LPB, we make many files that will all be encoded the same way, so creating a policy from an existing file seemed like the way to go. If I knew how to do that.
I had read that it was possible, but couldn’t figure out how to do it. After scrolling around fruitlessly through the MediaConch GUI (there’s a CLI version as well), I decided to publicly declare my ignorance, albeit timidly.
Dave Rice, audiovisual archivist at CUNY TV, was quick to jump in with the answer to my question. Not only did I get the answer I sought pretty much instantly, but former NDSR resident, current mass digitization coordinator at NYPL, and distributor of sage advice Dinah Handel was there to set me straight on my needless sheepishness. Finally, Jérôme Martinez is the lead developer on MediaInfo and MediaConch, so now my feedback can be used to help develop later versions of the software. (In retrospect, I could have done this more directly on GitHub using the “Issues” tab on the software’s GitHub repo: https://github.com/MediaArea/MediaConch_SourceCode)
Once a MediaConch policy has been created and a file is run against it, a report is created detailing whether the file “passed” or “failed” the policy, and why. MediaConch exports reports in a variety of formats: you can create an HTML file that presents the report in an easy-to-read format (for a human), or an XML file that can be interpreted more easily by a machine. I wanted to see if we could parse the XML reports automatically to create some sort of “red flag” for our transfer engineer when a file failed our policy. I emailed Ashley Blewer, a developer for NYPL, who is on the MediaConch team, and a friend of mine, and asked her if she had any ideas on how I might accomplish my “red flag” idea. See?? Asking for help! I’m learning (I hope).
Ashley clued me in to a project another friend of ours was working on (isn’t it nice to have friends?) at her internship at CUNY TV. Savannah Campbell, current NYU MIAP student and CUNY TV intern, has been using XMLStarlet to parse the XML output of MediaConch reports at CUNY TV. You can see some of her code here: https://github.com/mediamicroservices/mm/blob/master/verifypackage
If you’re looking at that code and you’re like, “um, what?” So was I! I just emailed Savannah and said that basically.
Savannah clued me in to XMLStarlet, the different flags you can add to the XMLStarlet command, and how they work (I also needed some help recognizing the MediaInfo XPath).
It was easy enough to repurpose Savannah’s code into our own processes here at LPB. It’s still a little up in the air whether we will employ this method for reviewing files or not. We may just create HTML files and have our engineer quickly open them to review the report, but XMLStarlet is clearly a really powerful tool that I’m glad I learned about either way.
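To make the “red flag” idea a little more concrete, here is a minimal sketch in Python rather than XMLStarlet. It is not Savannah’s code or the process at LPB. It assumes that a MediaConch XML report marks each policy and rule with an `outcome` attribute set to `pass` or `fail`; the sample report below is a made-up stand-in, so check the element and attribute names against a real report before trusting it. The function name `find_failures` is also just for this example.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a MediaConch XML report. The structure and the
# "outcome" attribute are assumptions for illustration only.
SAMPLE_REPORT = """
<MediaConch xmlns="https://mediaarea.net/mediaconch">
  <media ref="example.mov">
    <policy name="Hypothetical LPB policy" outcome="fail">
      <rule name="Container is MOV" outcome="pass"/>
      <rule name="Bit rate is high enough" outcome="fail"/>
    </policy>
  </media>
</MediaConch>
"""


def find_failures(report_xml):
    """Return (tag, name) for every element whose outcome attribute is 'fail'."""
    root = ET.fromstring(report_xml)
    failures = []
    for element in root.iter():  # visits every element in document order
        if element.get("outcome") == "fail":
            # Drop the "{namespace}" prefix ElementTree puts on tag names.
            tag = element.tag.rsplit("}", 1)[-1]
            failures.append((tag, element.get("name", "")))
    return failures


def red_flag(report_xml):
    """True if anything in the report failed."""
    return bool(find_failures(report_xml))
```

Checking every element for a failing outcome, instead of hard-coding one path into the report, keeps the sketch from depending too heavily on a report layout I have only assumed.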
A post from one of my cohort members on this blog was super helpful to me, too. I had used md5deep before, but only on a really small scale. It was great to be able to use Lorena’s post as a reference before I dove into unfamiliar waters. We’re using md5deep slightly differently at LPB, so when I was struggling to build off of what I had learned from Lorena, emboldened by my success with asking for help earlier in the residency, I emailed Dinah Handel about it, and asked the rest of the cohort through our Slack channel. Dinah worked on migrating files from one generation of LTO tape to another during her residency, and used checksums to verify that the transfer was complete and successful. I’m going to be trying something similar soon, as we’re moving files off of a RAID and onto an LTO tape. Dinah suggested creating manifests of checksums from all the files before and after the transfer, and then comparing the two lists.
Thanks Dinah! That’s the plan.
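Here is a rough Python sketch of that before-and-after comparison. It is an illustration, not a tool we use at LPB. It assumes each manifest line looks like what `md5deep -b` prints (a checksum, some whitespace, then a bare file name), and the function names are made up for this example. Because bare names are used as keys, two files with the same name in different folders would collide, so treat it as a starting point.

```python
def parse_manifest(text):
    """Turn 'checksum  filename' lines into a {filename: checksum} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once, so file names containing spaces stay in one piece.
        checksum, name = line.split(None, 1)
        entries[name] = checksum.lower()
    return entries


def compare_manifests(before, after):
    """Report files that are missing, unexpected, or changed after a transfer."""
    old = parse_manifest(before)
    new = parse_manifest(after)
    return {
        "missing": sorted(old.keys() - new.keys()),
        "unexpected": sorted(new.keys() - old.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


def transfer_ok(before, after):
    """True when both manifests list the same files with the same checksums."""
    return not any(compare_manifests(before, after).values())
```

Matching entries by file name, rather than comparing the two text files line by line, means the result does not depend on the order the lines were written in.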
But, while I was testing this process, I was running into some trouble. When you run md5deep recursively through a directory, it doesn’t always process the files in the same order. This sucks, ’cause when you try to compare the two manifests programmatically on Windows (using the “FC” command), it sees them as being totally different, because the lines are in a different order (FC compares files line by line). Thankfully, I’m not in this alone.
I was talking to my fellow AAPB NDSR residents about this, and Andrew Weaver suggested piping the results through the “sort” command. The “pipe” (the | character) takes the results from one command and pushes them through a second one. Andrew’s suggestion totally paid off. The command I’m using to create a manifest looks like this:
md5deep64 -r -b -e "INPUT DIRECTORY" | sort > "OUTPUT FILE PATH #1.txt"
Soooo I tested this script yesterday and it turns out it doesn’t work for LTO tapes. I think the “*” wildcard sends md5deep on a rampage trying to create checksums for everything it can get its grubby little hands on.
LTO stands for Linear Tape Open, and the “Linear” part means that files are written linearly. When a computer is reading an LTO tape, it can only read one file at a time. So when md5deep tries to create checksums for three files at once, it basically just spins the tape around in the drive, getting nothing done.
Instead, I’m using a modified version of the script we’re using for creating MediaInfo files, which processes one file at a time, inserting each new checksum into the same text file, and then sorting the results after the fact:
for /r %%A in ("*") do md5deep64 -b -e "%%A" >> "checksum_manifest_for_RAID.txt"
sort "checksum_manifest_for_RAID.txt" /O "checksum_manifest_for_RAID.txt"
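The same one-file-at-a-time idea can be sketched in Python, for anyone who wants to see it without the % signs. This is only an illustration: it uses Python’s own `hashlib` instead of md5deep, the function names are made up for this example, and I have not tested how it behaves against an actual LTO tape.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Read one file from start to finish, in chunks, and return its MD5."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() keeps calling read() until it returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Hash files strictly one after another, then sort the finished lines."""
    lines = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Same shape as md5deep -b output: checksum, two spaces, bare name.
            lines.append(f"{md5_of_file(path)}  {path.name}")
    # Sorting after the fact, like the `sort` step in the batch version.
    return sorted(lines)


def write_manifest(root, manifest_path):
    """Save the manifest. Keep manifest_path outside root so it is not hashed."""
    Path(manifest_path).write_text("\n".join(build_manifest(root)) + "\n")
```

Each file is opened, read, and closed before the next one is touched, so there is never more than one file being read at a time.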