Adventures in text mining part 1: Getting started

Fishing flickr photo by Homini:) shared under a Creative Commons (BY) license

Someday I may sit down and write this all out into a good research paper I can shop around to open access academic venues, but I thought it would be cool and possibly helpful to somebody out there to talk about my research as I’m doing it. I decided it’s time to knuckle down and try different types of citation analysis. I will be testing out different tools and methods to come up with a framework for data collection and analysis. I’m starting in the classic tradition of the data rich by going on a fishing expedition.

I’ve been collecting publication data for my institute since 2015 when we had an external academic review. They needed a bibliography of all academic publications from the institute in the last decade, so I brute forced it in Zotero, coming up with a library of a couple thousand citations. Since then I’ve used Zotero to keep track of our publications. I like using Zotero because it generates a fairly rich data set which has been useful in a number of ways. I know which paper is the most cited of the last decade. I also know about 30% of our publications are through Elsevier, so I need to pay attention to changes in their OA policies. But is there anything else this data can tell me?

This is where I went fishing with the free text analysis platform Voyant. I put in a CSV of all of our publications from FY2015-2016 just to see what I’d get. It was this:

Yeah, I hate word clouds. This is garbage because I didn’t clean the data at all, instead just uploading the raw CSV file. Most of the top words are related to how Zotero organizes things, not the publications themselves. So I went about adding some stop words to mute these results. The default list in Voyant is quite good for text analysis of a literary corpus, which is not how I’m using it. It stands to reason that the stop words would need to be refined to make analysis of this data meaningful. So I nixed words like “storage”, “zotero”, “05”, “http”, “users”, and many other terms that seemed to be more about Zotero and the file systems. Here are the updated results:

This could use some more work, but now research topics are actually visible. Of course transportation is present, and I’m not surprised by “data”, “model”, “traffic”, “systems”, or “time.” Those are all common themes/terms used in our research. The surprise to me was “control” because I didn’t think control theory or control systems accounted for that much of our research, though they are fundamental to some areas of autonomous and connected vehicles. I just never realized we published so much about it. Of course this is probably a reflection of the publication rates of different disciplines, but that’s a different fishing trip.
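
For anybody who wants to try the same stop word cleanup outside of Voyant, here is a minimal sketch in Python. It is not what Voyant does internally, just a rough equivalent: count every word in a raw CSV, then count again with a custom stop word list. The sample rows, the column names, and the `word_counts` helper are all made up for illustration, and the stop word list is only a starting point that mixes the terms I mentioned above with a few extras needed for this toy sample.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical two-row sample standing in for a raw Zotero CSV export.
# The column names and file paths are assumptions for illustration only.
SAMPLE_CSV = '''Title,Url,File Attachments
"Traffic control model for connected vehicles",http://example.org/a,/Users/me/Zotero/storage/AB12/a.pdf
"Time series data for traffic systems",http://example.org/b,/Users/me/Zotero/storage/CD34/b.pdf
'''

# Custom stop words: Zotero and file-system noise plus a few filler words.
STOP_WORDS = {
    "storage", "zotero", "05", "http", "users",
    "pdf", "org", "example", "me", "for",
}


def word_counts(csv_text, stop_words=frozenset()):
    """Count lowercase word tokens in every cell, skipping stop words."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            for token in re.findall(r"[a-z0-9]+", cell.lower()):
                if token not in stop_words:
                    counts[token] += 1
    return counts


raw = word_counts(SAMPLE_CSV)                   # noise like "storage" dominates
cleaned = word_counts(SAMPLE_CSV, STOP_WORDS)   # research terms surface
print(cleaned.most_common(5))
```

Because the header row is counted too, words like “title” still show up in the cleaned counts, which is exactly the kind of leftover noise that means the stop word list needs another pass.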

Something…libraries…something: Neutrality is dead.

Sophie Scholl flickr photo by jimforest shared under a Creative Commons (BY-NC-ND) license

I often lament that I never really use my undergraduate degree in History and German. Only now it seems like my decision to get a degree in what I joked was “Nazi Studies” (modern German history, literature, and culture could be reduced in such a manner) has prepared me for these really chaotic times we are currently living in. It’s why I know the story of Sophie Scholl and the White Rose anti-Nazi group. Scholl and many others were executed for their dissent. They are now regarded as heroes, but what about back then?

It’s hard not to think about these things when you have white supremacists openly marching and inciting terror on the streets. I’m talking about Charlottesville, but it could easily be any other city with these “Free Speech” protests or demonstrations around Confederate statues (symbols of this country’s love of racist things). They’re coming here next weekend and I will be there to counter protest. Maybe you’re going to stay home and sheetcake it, whatever you’re comfortable with.

Which gets me to the crux of this post. This week different library groups like ALA, ACRL, and APALA issued statements denouncing the racist violence and domestic terrorism in Charlottesville. I realized yesterday that SLA never made a statement. Hell, there hasn’t even been a discussion about such a statement. Given how much getting statements about the Muslim Ban and against HB2 was like pulling teeth, I’m not surprised by any of this. It’s not to say that individually SLA members don’t care – I’ve noticed on Twitter how many of my fellow SLAers are outraged and mobilized to speak out and stop hate and racism in many different ways. I’ve also noticed that nobody has called for SLA to do anything. Maybe we’ve learned that SLA isn’t that type of organization. That we aren’t there yet, and we don’t feel like opening up any uneasy truce we have with ourselves. That is a weak excuse.

I’m not making a big deal about SLA this time around because I’d rather spend my energy working on other things. Combating white supremacy through SLA is not the most effective use of my bandwidth. Combating Nazis when they come to town is. Getting people to actually think about Antifas beyond mainstream media soundbites is. Am I disappointed with SLA’s inaction? Yes. Though if they did anything at this point it would mostly come off as too little too late. I wish SLA had more passion and energy for justice, but it still regards itself as an organization for corporate libraries. And while some corporations are taking stands (finally), many others are still silent.

Would I like SLA to change? Yes. Do I expect it to this week? No. Will I speak up and continue to push them to change? Yes, but only after the Nazis come to town.

If Charlottesville has taught us anything, it’s that the time for neutrality is over. The “both sides” argument is beyond dumb at this point, and making it is clearly picking a side. You can’t compare white supremacists and the anarchists. Libraries are figuring this out. I hope SLA speaks up before it’s too late.

A library? What’s that? Nobody knows.

Dennis Schuck flickr photo by Snap Man shared under a Creative Commons (BY-NC) license

I don’t know this fellow but I like his glasses and appreciate his style. I also identify with his expression in this photo: cheerful frustration.

This week there’s been a lot to make me question what’s a library these days and why does it matter. Perhaps it’s the inevitable existential dread of a steady stream of (totally valid and inevitable) fear and anger about global politics. It makes sense then, that librarians are trying to figure out where they fit in during this tumultuous time, and question the world around us.

The main thing that has raised my ire is the response to a recent article by Jane Schmidt and Jordan Hale about Little Free Libraries (LFL®). First of all, it’s depressing how many librarians didn’t actually read the article. They just skimmed posts about it, like this one, and got upset at a perceived attack on literacy. (Reading comprehension, what?) Secondly, in this time of heightened anxiety, fear, and fake facts, people are very quick to react emotionally. I think we’re so used to feeling deeply betrayed when we find out people close to us have different, seemingly fundamental views that criticising something as seemingly benign as your LFL® seems on par with saying you hate books or kicking puppies. Never mind that the research was a critique of the LFL® model and its stated goals. I found Schmidt and Hale’s research to be refreshing in that it articulated a lot of misgivings I have about LFL®s in my community, and gave me some ideas for some research of my own.

There was also a post going around this week by Stacie Williams about how it’s impossible for libraries to remain neutral today. Longtime readers of this blog know I have strong feelings on this subject. Go ahead and read some of my old posts about how everything’s political. It’s 2017, we can’t pretend we live in a world where this sort of thing doesn’t matter. Professionally speaking, maybe it doesn’t matter to you but it probably does to your colleague or user, so is it OK to keep on acting like the status quo is OK? Libraries and librarians do need to look at our roles in systems of oppression and ways we can be forces for good, but what that entails will be very different depending on your library.

I’m not going to bury you in links. I’m not going to cite myself or others. I’m not going to prove to you I’ve read a lot and have deep thoughts backed up with critical theory and academic statements on it, because that’s not going to change much. (Seriously, if you need me to prove my credibility on this issue, then you probably wouldn’t care about my opinion… that’s a whole other rant though.)

I have been struggling since I got back from my parental leave with figuring out what my library is supposed to be because I have no idea what libraries are anymore. One thing that stood out from the LFL® controversy is that a lot of people assume a box with free books is a library, which rankles many librarians. They might argue that a box with books and a librarian makes a library. Do you need books or librarians (read: staff) to be a library? I don’t feel comfortable answering that because the question seems to ignore context and, on the face of it, to be a knee-jerk preservation of our profession.

Much of the discussion about libraries I see in the literature (read: Twitter) is focused on public and academic libraries with large, diverse user populations. These libraries try to be all things to all people because they have to be through necessity of function and funding. As our local municipalities cut funding for social services, it often falls to libraries to fill the gap. Is that a good thing? Is it sustainable? Should it be lauded when a library becomes the community’s de facto drop-in shelter for homeless folk because the citizens and government won’t actually fund a proper one? What about academic libraries filling the needs of students because there isn’t enough student support on campus? Is that how to maintain funding long term? (Funny how this all seems to be tied to funding.) So from these definitions, it seems a library is a place that may or may not have books, does have access to material, staff to help navigate that stuff, and space for people to use that stuff or not. It’s not really a satisfying definition.

Of course missing from these discussions are the roles of all the other libraries. We talk about librarians fighting to preserve government data, but what about the librarians in those government agencies? What about librarians working for private corporations? Law librarians at legal firms? Most of these conversations ignore them because they don’t fit into the convenient narrative, but also because most people don’t have a direct experience with them so it’s harder to articulate what they are and why they matter. This is also where a lot of the assumed ethical imperatives of the profession get a bit more complicated. If Open Access is an assumed good (read: it should be the default, duh!), where do librarians working for publishing companies opposed to true OA fit? What about librarians for defense contractors? Digital Asset Managers for companies where everything is locked down? Are they not part of the profession? Do they not work in libraries? I’m getting sidetracked, but it’s easy when I think about all the constituents I have to work with. If your organization’s function is part of systematic oppression, does that make you less of a librarian? In this capitalist society people need jobs to get by, and unfortunately sometimes you might have to work for the oppressor. That’s just reality.

So what makes MPOW a library? We have books, so I guess there’s that, but I give that another 10 years. We want to provide resources and data, which requires working with government agencies and private companies. Given funding models for transportation, it’s likely it will all be licensed from private sources in the next decade as well. Also given the competition for research funding, I’m not sure how realistic it is to be completely open about things because it’s either giving up a competitive advantage for grants or we’re getting funding from a private company with a requisite NDA. As public funding of public research universities dries up, this is inevitable. I guess it’s that we’re a space of collaboration and exploration, which we have been for decades. They can’t take that away from us, but do we need librarians for that space? I think we’re trained to help mediate that discovery, so yes! Articulating that value is extremely difficult, especially without relying on outdated memories of going to the library as a child.

And those realities are what make it hard for me to deal with a lot of the discussions about what libraries are or should be doing, because they frankly have pretty narrow views of the profession. I fully believe libraries need to be as inclusive as possible for all of their populations, but I can also understand why some libraries don’t put up signs showing how inclusive they are – because it would be out of place or perhaps contrary to their organization’s policies and missions.

So basically, I don’t know what it means to be a librarian or work in a library anymore, or even what a library is, because they are so varied to so many people it’s becoming meaningless. I would say it’s a philosophy or spirit, but I’m not sure that’s true anymore. Is it about the democratization of information? Access? Preservation? Navigation? I’d like to know.

What’s A Library Without Women? Closed.

Miss Shirley Robbins works at a library reference desk, January 8, 1952 flickr photo by North Carolina Digital Heritage Center shared under a Creative Commons (BY-NC-ND) license

Today is International Women’s Day, and there’s the Day Without Women Strike. General strikes aren’t that common now, but it’s important to remember that the origin of IWD is in the socialist workers’ movement – go read about it.

I came to work today after lots of deliberation. Had I not just gotten back from parental leave, and would it not mean closing the library, it would have been a simple “I’m staying home”. But as it stands, I am still catching up on work and have some deadlines. Of course, I’ll probably still have to close the library early due to limited staffing today and childcare. Which pretty much illustrates the role of women in libraries.

Librarians are a classic pink collar profession that still is predominantly female. In my library, every person working here is a woman (all 4 of us!). I’m the only one working today and I will leave early to make sure I can pick my kid up from daycare, a very gendered act (or a reflection of crazy Bay Area commutes – my partner’s is a 2 hour commute). If lots of librarians were participating in the Day Without Women, several libraries would be closed or have severely reduced services. Some colleagues are talking about it, though not as much publicly as I had hoped. Some of this is uneasiness with being overtly political at work, especially with the current administration. I share those fears.

I think a bigger, and much easier to overcome, problem though is moderate apathy and a need for comfort and convenience. You might argue that you can’t strike because you don’t want to limit service or close the library, but what’s a better demonstration of a day without women? I mean, that’s basically the point of the strike. A real day without women would pretty much mean a day without most libraries. Striking or not is a personal choice, but should be part of a conversation that goes beyond performance. Public apologias with excuses for not striking are kind of meaningless if they aren’t accompanied with smaller, actionable changes that can be done. Revolution isn’t for everybody, but if we actually want to change the status quo, we need to overcome inertia and actually do something. Thinking about change is a great first step, but going beyond normal comfort levels has to be the next. If that’s striking today, having a public dialog about the role of women in the profession, ways our services can be more accessible to the whole community, whatever else you got, there has to be some action. I recognize politics are very personal for a lot of people, there’s privilege in making political acts, and that some people really just don’t care. I don’t have time for the apathy, just like I don’t have much time for excuses.

So my actions today, other than angst: Joining the campus walkout in a few minutes. Wearing the most red shirt that fits today. Working the circ desk. Closing early so I can pick up my kid from daycare. Discussing the role of women in libraries and society with anybody in earshot. Urging people to think about small ways they can do something to bring about the change they want, beyond public apologies for why they can’t.


Librarians, libraries, and politics: When governments get irrational

Travel Ban Protest Rally Boston flickr photo by Kristin “Shoe” Shoemaker shared under a Creative Commons (BY-ND) license

I’ve been trying not to think about work much while I was out on parental leave, but it’s hard given how much I love my job and how the new administration has really made for interesting times. This blog has been quiet for a while because my job was really consuming in the run up to my leave, and then I had this kid, #babymoonbeam, and he’s kind of time consuming.

But given the near insanity of President Trump’s first month in office, I don’t really want to see what the next 40 will bring. This has been enough chaos. I remember the transition from Bush to Obama, and while there was some uncertainty it was all pretty measured. Nobody seems to know what’s going on right now and it seems like we’re all resigned to being on edge and a life of chaos. This story about the state of the State Department is a bleak example. You also have the Internet Archive preserving websites and moving servers to Canada just in case. Then there are the efforts to preserve government data from being wiped. For many of us, our jobs are caught in a political quagmire made worse by the uncertainty. (We’ll not even deal with the headache of balancing state and federal mandates when your state is openly defying the federal government, basically saying “COME AT US.”)*

This situation sucks.

I do take some heart that many of my colleagues are engaged in helping. Of course the cynical radical in me has some thoughts along the lines of “oh, now you’re paying attention…”, but overall it’s been pretty positive. I think that action is a kind of self-care. Doing something gives us agency as we wait for things to shake out. Instead of rolling my eyes at my colleagues explaining how the government works and how to research government info to me, I’ll give them a thumbs up. (Though OK, come on – please have some self awareness that maybe you’re not the first librarian to dip their toes into legislative waters and researching this sort of thing. I can see how many ALA/ACRL members might not realize there are people who have been doing this their whole careers, but we’re supposed to be good at research!)

The thing that spurred me to start this post was SLA’s lack of response to the travel ban executive order. It was too political. You know I firmly believe we’re political by the very nature of our jobs, and this year has not only affirmed that to me but really hardened that belief. I know I need to figure out my personal boundaries between personal and professional politics, but I’m not going to pretend that the very act of preserving publicly funded research isn’t considered political in this climate. Anybody (and this is pretty much all librarians) who works with government information now has to be vigilant and cynical. It’s a sorry state of affairs and might take time to adjust, especially if this isn’t your natural outlook. (Growing up in Sacramento and being obsessed with then-dysfunctional state politics as a kid has served me well!)

Let’s continue the fight.

*Background on the history of Governor Moonbeam.