Need a reality check? On peer-review and domain knowledge

Entering the Subway, a photo by rosemarie_mckeon on Flickr.

Last week I attended IDCC14 in San Francisco, where I was immersed in digital curation. Naturally, peer-review and domain knowledge/expertise were on my mind as things to consider with research (data) publishing. Then I saw this story about a Twitter data scientist “hacking” BART. I immediately retweeted it with a remark about not understanding how BART works (which is true). Right now, particularly in the Bay Area, there are a lot of “hacks” to solve problems that aren’t actually problems. It’s just that the people who perceive the problems don’t have the full picture, so they “hack” a solution for people like themselves, who in the case of public transit are only a small segment of the users. Joe Eskenazi wrote a good column about this in SF Weekly that looks at both sides. (Disclosure: We play futsal together, or did until I broke my finger in a game. Miss you Kamikaze!)

Reading Haque’s paper on arXiv, it’s clear to me he’s got the math and science stuff down – it’s the transportation that he’s lacking. A common issue with data scientists is that they often have the analytic and technological skills, but lack the domain expertise. So they have to work with experts or learn enough to become an expert (which takes time). If he had even just run some of these ideas past a transportation engineering or planning master’s student, they could have helped him refine the “problem”. Haque seemingly wrote this in a vacuum, so when it saw the light of the internet the transportation folk just picked it apart based on its faulty assumptions about how commuter rail fares work. (Note: Everybody thinks they’re an expert on transportation because they use it. Sorry, you’re most likely not.)

This is where peer-review could have been a good thing. I looked at the paper on arXiv to see if it was published elsewhere, such as a journal or conference. arXiv is often used as an open access repository for pre-publication manuscripts. Haque’s paper was not (as of yet) published or submitted elsewhere, which means there’s been no obvious peer-review, which explains a lot.

Peer-review would have pointed out the flaws in Haque’s methodology (assuming the reviewers had the expertise). Instead, he got the open peer-review of Twitter and lots of transportation professionals and advocates, many of whom are tired of tech workers “hacking” transportation in a way that doesn’t really help. (Seriously, fare evasion with the help of an app and surge pricing on transit? BART isn’t Uber!)

There is a lot broken with peer review, but this is one case where it could have helped. I really hope Haque can hook into the very passionate and knowledgeable transportation community here in the Bay Area and start “hacking” some real problems. Let’s do it!

Records, mp3s, and preservation (and copyright)

records, a photo by Jack Emerson Garland on Flickr.

This weekend, while organizing my records and putting them into Discogs, I had an epiphany — “Digital piracy can be a form of preservation. The ultimate LOCKSS.” The distribution isn’t organized, but it definitely helps preserve the “long tail”. Once something is on the internet, it’s very difficult to lose it.

This morning I stumbled across this piece about vinyl records, mp3s, and preservation questions. Basically, from an archivist’s perspective, vinyl and other analog musical formats aren’t great for long term preservation, especially compared to digital formats. Again, piracy saves the day!

Then something else happened this weekend… I watched this Beatles live performance from 1966 and remembered that they’re alright. No, I still don’t think they’re the best thing ever or deserve all of the praise, adulation, and obsession they’ve received, but they’re good and significant. Then I remembered the bootlegs they released last year on iTunes (of course it’s iTunes!) as a response to changes in EU copyright. (One cynical Guardian reader summed it up: “Another income source for McCartney, lovely.”) So the Beatles, such as they are now, get to retain copyright on those recordings for another 20 years but had to publish/sunlight them in the process. Well, they’re out now. And thanks to certain torrenting sites, they’re not going away anytime soon. Preservation to the masses!

When a college takes the music library of a college radio station.

Music Library., a photo by Pitseleh Pitseleh on Flickr.

Earlier this week @LibrarySherpa sent me this article about UT Austin accepting and processing the KUT music library.

Thanks to its purchase of the entire physical library of the university’s public-radio station, KUT, the university’s Fine Arts Library has 60,000 CDs and 4,000 LPs to process and store—400 boxes’ worth. The archive comprises music of all genres, including albums by little-known bands that were at one time or another part of Austin’s long-thriving music scene.

My initial thoughts were:
1 – That’s a small collection.
2 – How will the DJs use it if it’s circulating with the whole campus?

Then I finished reading the whole article. KUT is now a news and talk station, so the collection is fairly useless to them. As for the logistics, in a normal year the UT library processes donations of 800 CDs and 4,000 LPs. This makes the KUT collection a bit of a stretch. Will be interesting to see how it pans out despite my concerns about the future of college radio.

This isn’t the first time something like this happened in Texas. In 2010 Rice University sold the license for KTRU to University of Houston, which turned it into a classical radio station, effectively killing KTRU. Thankfully (?) their music library is now part of the Rice archives.

Closer to home, the University of San Francisco sold its FCC license for KUSF to USC so that now in the Bay Area 90.3 FM is also a classical music station. KUSF lives on via online streaming and the library is intact.

This all interests and concerns me as I’m co-director of the KALX music library. We have a collection of about 100,000 pieces of music — 45% LPs, 45% CDs, 10% 7″s. We’ve been collecting records since we started in 1962 and throw nothing out. If a KALX DJ says we should keep it, we do. The value of the collection is not only the size and the breadth, but also the reviews and comments scrawled on almost every record and CD. This is the history of KALX. Our copy of Nirvana’s Nevermind has a dialog about “selling out”, the grunge explosion, and the role of college radio. For some reason the original Star Wars soundtrack was also contentious. While it would be interesting to open this up to the public, it’s a working collection for the DJs, and the primary value is it being at the DJs’ ready at all times. KALX has a culture that really appreciates this, almost revering the library as a sacred collection, which is why theft is so low. KALX is an atypical college radio station in many respects, but the library is one of the better ones. If by a cruel twist of fate we become a classical station, I would hope the main library would take the collection, but I really hope that day never comes.

DIY Cohort: Professional networking online to keep you sane.

First, watch the full 58 minutes of this Stax Volt Revue from 1967. Then read this post.

Today I stumbled across this blog post from Inky Reviews about building her own community on Twitter while getting her MLIS online. She used it to build her community and engage with the profession in lieu of face to face relationships over sugar cookies in a physical class.

I covered this topic 5 years ago, but I think it’s a good time to revisit since things are always changing and I’m definitely not the librarian I was back then. Well… not entirely.

Knowing people online and “meating” them is pretty standard now. It’s not like it was back in 2006 when I flew to the UK to stay with a friend I met on a weird message board. When I met his friends and sister, we lied and said we had a mutual friend who I met on study abroad. The stigma’s largely gone. My conference roommates are my colleagues from Twitter. In fact, I’ve used Twitter a lot to work on stuff for SLA. It’s really helped me keep on top of what’s going on across the association (and with other library associations), and helped me forge partnerships (and friendships) that probably wouldn’t have happened any other way.

It’s also really helped me find a cohort within transportation. It’s still weird to think about how when I joined Twitter, it was nothing but a bunch of librarians. Now there’s a huge and vibrant transportation community on there, and it’s made going to transportation conferences way more fun and engaging. For one, it’s helped me meet people beyond the libraries and beyond my local group, so I get different perspectives. For another, more people to get coffee with. Oh how much that matters.

So yeah, I built a cohort for myself on the internet that serves me well off it. I encourage everybody to try it, or at least be open to it. The thing I sometimes worry about is that it’s not for everybody. I can extol the virtues of live-tweeting conferences, using it for random polls, and helping people out, but I also know a lot of people just won’t take to it. That’s fine. Yeah, they’re missing out, but I also know there are lots of other venues for networking out there, and I’m definitely missing out on those – fancy dinners, WhatsApp, Tumblr, whatever the kids do these days.

But to you, my Twitter friends, thanks for keeping me sane and connected. I’ll pay you back with puns and music videos.



How do you define access in scholarly publishing?

academic journals, a photo by davidsilver on Flickr.

I’m taking this time before the semester and the TRB Annual Meeting and SLA Leadership to whip some data into shape.

I’m analyzing the citations from our PhD students’ dissertations from the past 5 years. I hope to learn something about our collection development (is it on target?), how many citations each paper has, the age of the citations, and how the students use Open Access material.
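
For the curious, here is a rough sketch of what that kind of tally could look like in Python. It is not my actual workflow, just an illustration under some assumptions: the `Citation` record, its field names, the access labels, and the sample rows are all made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Citation:
    dissertation_id: str    # which dissertation cited the work
    dissertation_year: int  # year the dissertation was filed
    cited_year: int         # publication year of the cited work
    access: str             # hypothetical label, e.g. "subscription" or "open_access"


def citations_per_dissertation(citations):
    """Count how many citations each dissertation contains."""
    return Counter(c.dissertation_id for c in citations)


def median_citation_age(citations):
    """Median gap, in years, between a dissertation and the works it cites."""
    return median(c.dissertation_year - c.cited_year for c in citations)


def access_shares(citations):
    """Fraction of citations falling under each access label."""
    counts = Counter(c.access for c in citations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented sample rows, only here to show the shape of the data.
sample = [
    Citation("d1", 2013, 2010, "subscription"),
    Citation("d1", 2013, 2003, "open_access"),
    Citation("d1", 2013, 2012, "free_unofficial"),
    Citation("d2", 2012, 1998, "subscription"),
]

if __name__ == "__main__":
    print(citations_per_dissertation(sample))
    print(median_citation_age(sample))
    print(access_shares(sample))
```

The counting is the easy part. Everything interesting hides in how each citation gets its `access` label, and that is a judgment call the code cannot make for you.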

I’m stalled with defining “access”.

If somebody cites an article in a legit journal (say, Australasian Transport Theories) from a well regarded, big subscription publisher (say, Springer) which is freely available on the web (not through the publisher), how do you define the access? Say this example isn’t Open Access. It’s just a good old well intentioned but not quite legal PDF on the web. For my research, how do I define these citations?

It’s a tricky thing because I feel like I know too much. I know the grad students just look to see if they can find the text and don’t much worry about whether or not the paper should be available on that site. Of course this behavior makes them think that sort of thing is OK and then they do it. Not to say I don’t agree with them, but as I said, I know too much. I know that they probably didn’t clear it with the publisher. That’s assuming the PDF came from an author. Often, it’s just out there.

That said though… I’d rather see these sorts of things out in the open. But how do I define access? What’s better? Something of dubious origin crawled by Google bots or a box of old journals in the corner? I think we know the answer.