
A Summer of Growth: Python, Text Mining, and the Summer Fellows Program

I owe a summer (and beyond) of growth to the Daily Digest. Yes, the Daily Digest, Bryn Mawr’s e-newsletter that arrives in our Outlook inboxes shortly after midnight and seems too long to read through. As an anxious first-year in the fall of 2020, I found the amount of information in each newsletter comforting. So began a habit I still have: wake up, read the Daily Digest, note any interesting events or opportunities, and go about my day. I was reading the Daily Digest in mid-March when I noticed a blurb titled “Digital Scholarship Summer Fellows Program: Apply by April 1.” Students of any major were encouraged to apply for a paid, ten-week fellowship in which they would learn to code, analyze archival materials, and work with data. I was intrigued, and my interest grew when I learned that the fellowship would focus on The College News, Bryn Mawr’s student newspaper, active from 1914 to 1968. Such an eventful period would surely be engaging to work with, so I submitted my application.

The first four issues of The College News (9/30/1914, 10/8/1914, 10/15/1914, 10/22/1914) and the last four issues (9/27/1968, 10/4/1968, 10/11/1968, 10/18/1968).

An interview and an offer later, it was June 1st, and I was sitting in Bryn Mawr’s Digital Media Lab with four other Digital Scholarship Summer Fellows. I was nervous because I had little programming experience. As it turned out, most of the other fellows were beginners too, and since we all spent the first two weeks working through programming tutorials, our inexperience didn’t matter for long. By the end of the third week, we had worked together to write pseudocode and a Python program to scrape the TriCollege Libraries Digital Collections website for issues of The College News. The result was a corpus of 1,340 text files, each containing the transcript of one issue. After exploring the corpus, we brainstormed visualizations and projects, split into smaller groups, divided up the work, and got started.
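Our scraper isn’t reproduced here, but a minimal sketch of the same idea fits in Python’s standard library. It assumes, hypothetically, that each issue page keeps its transcript inside a `<div class="transcript">` and that a mapping of issue IDs to page URLs has already been collected; the real TriCollege Libraries Digital Collections markup may differ, and `TranscriptParser`, `extract_transcript`, and `scrape_issues` are illustrative names.

```python
import pathlib
import time
import urllib.request
from html.parser import HTMLParser


class TranscriptParser(HTMLParser):
    """Collect the text inside <div class="transcript"> (a hypothetical class name)."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside the transcript div; counts nested divs
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside the transcript
        elif ("class", "transcript") in attrs:
            self.depth = 1  # entering the transcript div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_transcript(html):
    """Return the transcript text from one issue page, with whitespace collapsed."""
    parser = TranscriptParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def scrape_issues(issue_urls, out_dir, delay=1.0):
    """Save one .txt file per issue.

    issue_urls: mapping of issue ID (e.g. "1914-09-30") -> page URL,
    assumed to have been gathered beforehand.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for issue_id, url in issue_urls.items():
        with urllib.request.urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        (out / f"{issue_id}.txt").write_text(extract_transcript(html), encoding="utf-8")
        time.sleep(delay)  # pause between requests to go easy on the server
```

Separating parsing from fetching makes the fragile part (the page structure) easy to test on saved HTML before touching the live site.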

Each of us spent the rest of the summer multitasking. Most of my tasks required collecting and cleaning data. This meant learning more Python so I could write programs that would search, analyze, and visualize the corpus. Within weeks, I went from not knowing what a module was to using several at once in my programs. Some programs identified locations for the online map, which visualizes all locations mentioned in The College News, and the wooden map, which visualizes where some Bryn Mawr graduates are from. The locations were then cleaned and normalized in OpenRefine. I used the desktop application to remove Bryn Mawr campus locations, unidentifiable locations, and words that had mistakenly been identified as locations. I also clustered together locations with different spellings. In the end, the online map’s 7,000+ locations were reduced to 4,229 and the wooden map’s 300+ locations were reduced to 85. 
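OpenRefine does its clustering interactively, but its key-collision method with the “fingerprint” keying function is simple enough to approximate in a few lines of Python. The sketch below is a rough approximation, not the workflow used for the maps: it assumes the extracted locations arrive as (issue ID, location string) pairs, drops anything on a hypothetical exclusion list (standing in for the campus locations removed by hand), groups spellings that share a fingerprint, and reports each cluster’s count and number of distinct issues, the two figures shown on the online map.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Approximate OpenRefine's fingerprint key: trim, lowercase, strip accents
    and punctuation, then sort and de-duplicate the whitespace-separated tokens."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def tally_locations(mentions, exclude=()):
    """Cluster raw location strings and count them.

    mentions: iterable of (issue_id, raw_location) pairs (a hypothetical input format).
    exclude: place names to drop, e.g. campus buildings or false positives.
    Returns {label: (count, distinct_issues, sorted_spellings)}.
    """
    exclude_keys = {fingerprint(name) for name in exclude}
    clusters = defaultdict(list)
    for issue_id, raw in mentions:
        key = fingerprint(raw)
        if key and key not in exclude_keys:
            clusters[key].append((issue_id, raw))

    result = {}
    for items in clusters.values():
        spellings = Counter(raw for _, raw in items)
        # Label each cluster with its most common spelling (first seen wins ties).
        label = spellings.most_common(1)[0][0]
        issues = {issue_id for issue_id, _ in items}
        result[label] = (len(items), len(issues), sorted(spellings))
    return result
```

Because the key sorts its tokens, it ignores word order; because it only strips case, accents, and punctuation, it won’t merge OCR slips such as a dropped letter. OpenRefine’s nearest-neighbor methods (for example, Levenshtein distance) are meant for those cases.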

A screenshot detail of our online map, showing Chicago (Count: 655, Issues: 420) and its several spelling variations.

Other programs identified keywords and their surrounding contexts for my individual project, which visualized ‘negro’ and related words in The College News. I ran into several problems along the way. For one, ‘black’ was not used as a racial label in the newspaper until the late 1960s, and the last issue was published in 1968, so there were few instances of the word as a racial term. In comparison, ‘negro’ and its variations appeared often enough to visualize. But how? I wanted to show the Bryn Mawr community’s perceptions and treatment of black people. After unsuccessful attempts to extract more information from the instances through sentiment analysis and topic modeling, I decided to visualize them as they were. I created a table bubble plot of the frequency and context of ‘negro’ and related words, so viewers can see which issues used these words, when, and how.
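A keyword-in-context (KWIC) concordance is one common way to pull out each instance of a term along with the words around it. The sketch below is illustrative rather than the project’s actual code: the function names, the “YYYY-MM-DD” issue keys, and the prefix-matching rule are assumptions. Per-year counts like these could size the bubbles and the context rows could fill in the table; the plotting step itself is not shown.

```python
import re
from collections import Counter


def keyword_in_context(text, prefixes, window=5):
    """Return (word, left_context, right_context) for every token that starts
    with one of `prefixes` (case-insensitive). Prefix matching is a crude way
    to catch plurals and other variants, and it can over-match."""
    tokens = re.findall(r"[A-Za-z']+", text)  # ASCII letters and apostrophes only
    wanted = tuple(p.lower() for p in prefixes)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower().startswith(wanted):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((token, left, right))
    return hits


def hits_by_year(issues, prefixes, window=5):
    """issues: mapping of issue date "YYYY-MM-DD" (a hypothetical naming scheme) -> text.
    Returns a Counter of hits per year and a list of (date, word, left, right) rows."""
    per_year = Counter()
    rows = []
    for date, text in sorted(issues.items()):
        hits = keyword_in_context(text, prefixes, window)
        per_year[date[:4]] += len(hits)
        rows.extend((date, *hit) for hit in hits)
    return per_year, rows
```

Keeping every context row, not just the counts, matters here: the counts show when the words appear, but only the surrounding text shows how they were used.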

The end of the summer arrived before the other fellows and I knew it. As we wrapped up our work, I was filled with pride and sadness. We had accomplished so much in only ten weeks! Still, I would miss the time we spent together. As cheesy as it sounds, the Digital Scholarship Summer Fellowship changed my life. Through it, I discovered that I could code and was inspired to take a computer science course in the fall. Now I’m a computer science major. The fellowship also gave me the opportunity to make pieces of college history more accessible. I look forward to seeing which other pieces future fellows make more accessible, and how.