Cheers to the end of Fall term! It was a strange one for everyone, but DHA work went along smoothly nevertheless. We actually had a fair number of tasks on our hands this term, some (like updates to the Public Memory of Myanmar collection) carrying over from last year, others from fresh projects or classes.
One rather short but nonetheless important project I want to highlight was a bit of work we did for AMST 256 (Walt Whitman’s New York). The assignment involved analysis of individual sections of Whitman’s Song of Myself. The professor asked if we could provide a resource that would help students understand the usage of language throughout the text, adding more nuance to their arguments about their respective sections.
Our solution was to partition digitized versions of the text into sections (using a Python script! ah, my CS major is coming in handy), then upload those files to create Voyant corpora. We passed the corpus links to the professor, along with some instruction documents from a previous term. This made for text analysis tools that were accessible to the students and required very little input from the professor.
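For the curious, here is a minimal sketch of how that partitioning script might look. It assumes each section of the digitized text begins with its number alone on a line (as Song of Myself’s numbered sections do in many transcriptions); the function name and the sample text are illustrative, not the script we actually ran.

```python
import re

def split_into_sections(text):
    """Split a digitized poem into numbered sections, assuming each
    section begins with its number alone on a line (e.g. "1", "2")."""
    # Capture the section number so we can pair it with its body text.
    parts = re.split(r"(?m)^\s*(\d+)\s*$", text)
    # re.split yields [preamble, number, body, number, body, ...]
    return {int(num): body.strip() for num, body in zip(parts[1::2], parts[2::2])}

poem = """1
I celebrate myself, and sing myself.

2
Houses and rooms are full of perfumes."""

sections = split_into_sections(poem)
```

Each entry in `sections` can then be written out as its own file and uploaded to build a per-section Voyant corpus.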
(Looking at Voyant statistics like…)
I wanted to highlight this project because we want to offer more support to English and related courses, and I think this one provides a very nice model. While much of DHA work is centered around long-term projects that heavily involve professors, this assignment was a good example of how it’s easiest to introduce DH tools to courses—and professors—when they come tailored for the students. Once a tool has been introduced, the hope is that the professor—or future professors for the course—will better understand how it can be used, know when it might be useful to come to us in the future, or even incorporate it on their own.
Stay tuned for more DHA involvement with English courses! Our goal for next term is to make a “what we can do for you” sheet for English professors, offering ideas for places where DH tools could be useful to their courses, and how we could assist them with the implementation.
Hi! My name is Luna Yee and I’m now a senior Computer Science and Linguistics major planning to graduate after Winter term. This is my third year as a Digital Humanities Associate (DHA). I’ve been involved with a fair few of our group’s projects at this point, as well as support for a number of DH-involved classes. I’ve also been involved with the DH-adjacent Dakota Language Project through my linguistics work—check it out!
One reason I find DHA work rewarding is that it lets me engage with digital tools as a user. Academically, I tend to work on the development side—that is, with programming and design. By engaging with these technologies as a user, a support resource—and sometimes even a correspondent with the developers—I feel that I’ve gained a good deal of practical perspective on the struggles of user experience and the design lifecycle. I expect this experience to translate positively into my professional work—and more immediately, into my CS comps.
Outside of academic and professional pursuits, I enjoy reading fiction and fiddling around with creative writing. In the past few years, I’ve expanded to other fiction media and developed a taste for those as well, but a nice novel still wins in my book (ha). I’m also a member of the Aikido Club on campus, which serves as a nice enforced structured break from work—especially in the age of Zoom fatigue.
We have a new cast and crew, but we’re ready to get #backtoit. With five new DHAs joining us this year, you can expect we’ll be involved with quite a few projects this year; please look forward to it! And since we’re ready, a Spongebob meme seems appropriate:
“Oh?… So you don’t need to hear about the… unstandardized name entry fields?”
(After all, in the end, a good portion of digital humanities is figuring out how to present data in a way that hides just how messy it actually is—or was.)
Hi! My name is Luna Yee and I’m now a junior Computer Science major (I plan to double major with Linguistics and minor in Cognitive Science). This is my second consecutive year as a Digital Humanities Associate; last year, I worked on typesetting in LaTeX for the Undergraduate Journal of Humanistic Studies (UJHS), tutoring students in Omeka (in-person and via online tools) for the Global Religions in Minnesota class, and managing the back-end of our other student-content online publications—namely the Prairie Creek Wildlife Management (PCWM) Area Digital Collection and the Carleton Guide to Medieval Rome (CGMR; updated site coming soon!)—among other projects.
While in academics I mostly focus on the development side of the digital sphere—that is, computer programming and algorithmic studies and research—I find that working in the Digital Humanities offers a diverse and fulfilling scope in which to apply such tools. DH allows us to use digital technologies not only to study human histories, cultures, and conditions but also to spread such information to a wider audience.
Outside of the realm of digital work and studies, I fancy myself an avid reader and discerning fan of fiction in all media—even though I find myself rarely with enough time to consume the material—and have developed a penchant for writing over the course of the years. I also love music (that is, listening to music; it’s been quite some time since I actively played an instrument) and the study of music perception and cognition.
As is the case in many parts of digital work, simplicity for the user results in complexity for the back-end in Google Forms.
For several years now, the Carleton Digital Humanities Associates have been in charge of handling projects that require the use of Google Forms to collect relevant data. To name a couple, we handle data submissions from ENTS 110 students to Carleton’s Prairie Creek Wildlife Management Area Digital Collection, and we manage the Carleton Guide to Medieval Rome website, which is composed of submissions from students on Off-Campus Study in Italy. (The current version of the website is a work in progress, and I plan to come back and add a link once we make it live!)
One of the primary benefits of Google Forms is the ability to upload files with a relatively lenient storage capacity. Google Forms recently added a “File Upload” question type that automatically copies a file submitted by a user to your Google Drive, so the limit on uploaded files is simply the amount of storage in your Drive. This far outstrips most other form submission software, which often has very strict limits on uploading files, if it even allows such a response type.
However, as it turns out, Google Forms is not well equipped for our needs. Both the Prairie Creek and Carleton Guide to Medieval Rome projects require us to upload the data we receive as a CSV (comma-separated values) or TSV (tab-separated values) file to a separate framework: ContentDM and Omeka, respectively. Converting a Google Spreadsheet to a CSV or TSV is a simple press of a Save As button; the problem lies in the nature of the data in that spreadsheet.
The first problem is the headers. The headers of a spreadsheet exported from Forms are the titles of the questions. Omeka, as an example, is built on the Dublin Core metadata fields, and its CSV import add-on anticipates CSV headers that line up with that vocabulary. Additionally, we might receive data that we need for administrative purposes (e.g., the timestamp) but that has no place in the upload process.
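The header cleanup amounts to renaming some columns and dropping others. A small sketch of the idea in Python follows; the alias map and question titles are made-up examples, not our actual configuration, and an empty alias marks a column to drop.

```python
# Hypothetical mapping from Forms question titles to Dublin Core headers;
# an empty alias marks a column (like the timestamp) to drop before upload.
ALIASES = {
    "Timestamp": "",
    "What is the title of your item?": "Title",
    "Who created it?": "Creator",
}

def realign_headers(rows, aliases):
    """Rename columns per the alias map; drop columns aliased to ""."""
    header, *data = rows
    keep = [i for i, name in enumerate(header) if aliases.get(name, name)]
    renamed = [aliases.get(name, name) for name in header]
    return [[renamed[i] for i in keep]] + [[row[i] for i in keep] for row in data]

rows = [
    ["Timestamp", "What is the title of your item?", "Who created it?"],
    ["4/7/2019 11:17:48", "Prairie sunrise", "Luna Yee"],
]
cleaned = realign_headers(rows, ALIASES)
```

The cleaned rows can then be written back out as a CSV whose headers match the Dublin Core vocabulary the import tool expects.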
A sample view of the response validation option in Google Forms. Only one such validation field may be attached to each question.
The second problem is response validation. Google Forms has implemented some basic response validation, but its capabilities are limited: you can specify that an answer must be a number, or must not contain “hello world” (though admittedly, a recently added regular-expression option enables some added complexity), and you can only set one validation rule per question. This becomes a problem when asking more specific questions. Consider a filename: it must end in a file extension, usually three characters after a period (.pdf, .png, .mp4) but sometimes four or more (.jpeg, .docx), and it must contain no characters besides alphanumerics, hyphens, and underscores.
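With the regular-expression option, both filename constraints can at least be folded into a single pattern. A sketch of such a pattern in Python (the three-to-five-letter extension range is an assumption for illustration):

```python
import re

# Base name of alphanumerics, hyphens, and underscores, then a period
# and an extension of three to five letters (range is an assumption).
FILENAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z]{3,5}$")

valid = bool(FILENAME.match("prairie-creek_042.jpeg"))  # well-formed name
has_space = bool(FILENAME.match("my photo.png"))        # space is rejected
```

The same pattern could be pasted into a Form question’s regular-expression validation, sparing us from cleaning malformed filenames after the fact.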
A sample of the spreadsheet output of a “file upload”-type form question.
The third problem has to do with file uploads. As it turns out, Google Forms identifies uploaded files not by their names but by Drive URLs. While this is very convenient for accessing files within Drive’s online interface, it becomes a problem when trying to retrieve them in a local file structure. Omeka, again the example, requires that items with relevant media be listed in the CSV with the name of the media file in one of the data fields. There are two issues with the Forms output: first, each file name would need to be input manually by opening the file in Drive; and second, Drive’s naming system allows spaces in file names, which are illegal in ContentDM, if not also in Omeka.
My work recently has revolved around finding workarounds to these problems, particularly the third, which has consumed a large number of DHA work-hours on the back-end in recent terms.
I have been uploading my working modules to the Digital Carleton GitHub, at https://github.com/DigitalCarleton/form-output-editing. While the project (and its documentation) is far from complete, the task currently has two parts.
First, Filename Append is a prototype Google Apps Script (.gs) that retrieves the filenames from a Google Form’s output spreadsheet. As it turns out, links to Drive objects are rather useless to software outside Drive’s own API, so if we wish to avoid manually inputting file names, we must use Google Apps Script. Apps Script is a relatively obscure platform that allows additional scripts to be attached to Google products, and it is the language in which add-ons for Drive software are written. Filename Append in particular, when attached to a form-response spreadsheet, retrieves the names of uploaded files from the output Drive links.
Output sample from the Filename Append script. To the left is the standard upload output; to the right is the filename in Drive.
And second, Form Output Editor is a Python module for large-scale rewriting of the exported CSV. Currently, it draws from a configuration file (.json) to determine how to apply three functions. First, it can rename columns to new “aliases” from the configuration. Second, it can delete all the data from any column we give a deletion flag, which we can do while aliasing. Lastly, it can rename files according to a template and other data in the CSV. For instance, suppose we want the template “DHA_{email prefix}_{date}”; if a row contains the timestamp “4/7/2019 11:17:48” and the email address “carletondha@gmail.com”, the module can retrieve those and the file extension, renaming the file to “DHA_carletondha_4-7-2019.png”.
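The renaming step above can be sketched in a few lines of Python. This is a simplified illustration, not the module’s actual interface: the function name and row fields are assumptions, and the real module reads its template from the .json configuration.

```python
from datetime import datetime

def rename_from_template(template, row, uploaded_name):
    """Build a new file name from a template and one CSV row,
    keeping the uploaded file's original extension."""
    email_prefix = row["email"].split("@")[0]
    stamp = datetime.strptime(row["timestamp"], "%m/%d/%Y %H:%M:%S")
    date = f"{stamp.month}-{stamp.day}-{stamp.year}"  # unpadded, e.g. 4-7-2019
    extension = uploaded_name.rsplit(".", 1)[-1]      # keep original extension
    return template.format(email_prefix=email_prefix, date=date) + "." + extension

row = {"timestamp": "4/7/2019 11:17:48", "email": "carletondha@gmail.com"}
new_name = rename_from_template("DHA_{email_prefix}_{date}", row, "upload.png")
```

(Python’s `str.format` can’t accept a field name with a space, so the sketch uses `{email_prefix}` where the post’s template reads “{email prefix}”.)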
Though still a work in progress, we hope this project will save us time in the long run. The GitHub repository will continue to be updated with code changes and more specific README information and instructions.
Happily, if you’re on this site, you probably know this already.
A meme for digital humanities as a whole…
Alternatively: Bringing obscure literary references to the wider internet.
And a hashtag for the training experience…
#preparingforanything
Long story short, it’s been a good week of training and I’m looking forward to getting into the work. One of the takeaways from this week has been that what we do is entirely dependent on what people (you people reading this blog!) need done, and I enjoy the challenge of the unknown like that. So, with that said, here we go!
Hello! My name is Luna Yee and I’m currently a sophomore at Carleton College, hoping to double major in Computer Science and Cognitive Science. If I had to put my academic pursuits into a single question, it would be this: how can we better understand computers, and how can computers better understand us? A true form of artificial intelligence might still be a pipe dream due to practical limitations (the human brain holds an astounding amount of data), but we have the tools and methodologies to at least build intelligent user interfaces and even user-tailored experiences. Computational linguistics, for example, is a field I have hopes of working in: the intricacies of teaching a computer to understand the nuances of human speech fascinate me.
Digital humanities is the exact linguistic match to this: combining computer platforms with the literal study of humans. I have a fondness for working on elegant user interfaces, and on designing with effective user input in mind. The way I see it, the more ease of access and effective response available in our computers, the better we can preserve and pass on the wisdoms we’ve learned as a society. And that might seem like a bit of a weighty description to give to the humanities, but if you ask me, that’s exactly what we’re working on here: efficiently preserving and accurately representing histories (of places, objects, people, societies, and so on) to make them more accessible to the world at large, and generations to come.