Two areas have been absolutely integral to my time at Carleton: history and digital humanities. I was lucky to discover both relatively early in my Carleton career – in fact, I practically discovered the two simultaneously. For three years, I worked as a Digital Humanities Associate, and have been a dedicated history major for about as long.
My participation in Carleton’s Digital Humanities program shaped my college experience in a multitude of ways. The most obvious was my work as a DHA, which gave structure to my time outside of class and allowed me to explore a variety of subjects, projects, software, and methodologies. There are few other places where I could have been exposed to eighteenth-century English workhouses, the religious diversity of Minnesota, the medieval sites of Rome, the work of the eighth-century monk Bede, Greek archaeological objects, undergraduate scholarship, and the debates surrounding the Boston Massacre (among others). The accompanying projects were just as diverse as the content: online exhibits, 3D models of objects and buildings, historical video games, digital maps, and more.
My time as a DHA exposed me to more than just these areas of content. The work challenged me to sharpen my digital skills and problem-solving abilities, and to think about our work in new ways. I learned about issues of accessibility and open access, and about the ways in which digital humanities practitioners consciously strive to make their work accessible, ethical, and socially responsible, considerations I carried into my own academic work.
I also had the opportunity to practice skills that aren’t strictly digital but are crucial to working in a team: reporting on my own work, communicating with team members, and identifying and contacting specialists on campus when an issue went beyond our own expertise.
As my time at Carleton comes to an end, I have become more cognizant of what made my experience what it was, and my work as a DHA is one critical part of it. I am grateful to have had the opportunity to do meaningful work with the wonderful students, staff, and faculty involved with Carleton’s DH program. And a very special thank you to Sarah Calhoun and Austin Mason for all their help and support over the past couple of years!
As in many areas of digital work, simplicity for the user results in complexity on the back end, and Google Forms is no exception.
For several years now, the Carleton Digital Humanities Associates have been in charge of projects that use Google Forms to collect data. To name a couple, we handle data submissions from ENTS 110 students to Carleton’s Prairie Creek Wildlife Management Area Digital Collection, and we manage the Carleton Guide to Medieval Rome website, which is composed of submissions from students on Off-Campus Study in Italy. (The current version of the website is a work in progress, and I plan to come back and add a link once we make it live!)
One of the primary benefits of Google Forms is the ability to accept uploaded files with a relatively lenient storage cap. Google Forms recently added a “File Upload” question type that automatically copies a file submitted by a user to your Google Drive, so the limit on uploaded files is simply the amount of storage in your Drive. This far outstrips most other form software, which often imposes strict limits on file uploads, if it allows them at all.
However, as it turns out, Google Forms is not well equipped for our needs. Both the Prairie Creek and Carleton Guide to Medieval Rome projects require us to upload the data we receive as a CSV (comma-separated values) or TSV (tab-separated values) file to a separate framework: ContentDM and Omeka, respectively. Converting a Google Sheet to a CSV or TSV is a simple matter of downloading it in that format, but the problem lies in the nature of the data in that spreadsheet.
The first problem is the headers. The headers of a spreadsheet output by Forms are the titles of the questions. Omeka, as an example, is built on the Dublin Core metadata fields, and its CSV import add-on expects CSV headers that line up with that vocabulary. Additionally, we might receive data we need for administrative purposes (e.g., the timestamp) but that has no place in the upload itself.
A sample view of the response validation option in Google Forms. Only one such validation field may be attached to each question.
The second problem is response validation. While Google Forms has implemented some basic response validation, its capabilities are limited (you can specify that an answer must be a number, or must not contain “hello world”; admittedly, a regular expression option was recently added that enables some added complexity), and you can only set one dimension of validation per question. This becomes a problem when asking more specific questions. A filename, for example, must end in a file extension, usually three characters after a period (.pdf, .png, .mp4) but sometimes four or more (.jpeg, .docx), and must contain no characters besides alphanumerics, hyphens, and underscores.
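As a rough illustration, the filename constraints above can be folded into a single pattern, which is the kind of expression the regular-expression validation option could accept. The sketch below checks the pattern in Python for demonstration purposes; the exact rules encoded here are my reading of our naming requirements, not something taken from the Forms documentation.

```python
import re

# One pattern covering the filename rules described above: one or more
# alphanumerics, hyphens, or underscores, then a period, then an
# extension of three or more alphanumeric characters.
FILENAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]{3,}$")

def is_valid_filename(name: str) -> bool:
    """Return True if the name satisfies all the constraints at once."""
    return bool(FILENAME_PATTERN.fullmatch(name))

print(is_valid_filename("site-plan_01.jpeg"))  # True: 4-character extension is fine
print(is_valid_filename("my photo.png"))       # False: contains a space
print(is_valid_filename("notes.db"))           # False: extension too short
```

Because the whole check lives in one expression, it fits within the one-validation-per-question limit rather than needing several stacked rules.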
A sample of the spreadsheet output of a “file upload”-type form question.
The third problem has to do with file upload. As it turns out, Google Forms refers to uploaded files not by their names but by Drive URLs. While this is very convenient for accessing files within Drive’s online interface, it becomes a problem when trying to retrieve them in a local file structure. Omeka, again the example, requires that items with associated media list the name of the media file in one of the CSV’s data fields. There are two issues with the Forms output: first, each file name would need to be entered manually by opening the file in Drive; and second, Drive’s naming scheme allows spaces in file names, which is illegal in ContentDM, if not in Omeka as well.
My recent work has revolved around finding a workaround to these problems, particularly the third, which has consumed many DHA work-hours on the back end in recent terms.
I have been uploading my working modules to the Digital Carleton GitHub, at https://github.com/DigitalCarleton/form-output-editing. While the project (and its documentation) is far from complete, the work currently has two parts.
First, Filename Append is a prototype Google Apps Script (.gs) that retrieves filenames from a Google Form’s output spreadsheet. The links to Drive objects are rather useless to software outside Drive’s own API, so to avoid entering file names manually we turn to Google Apps Script, a relatively obscure platform that allows scripts to be attached to Google products (it is also the language in which add-ons for Drive software are written). When attached to a form-response spreadsheet, Filename Append retrieves the names of uploaded files from the output Drive links.
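The core of that lookup is pulling the file ID out of each Drive link. The sketch below shows that step in Python for illustration; the URL shape (an `id=` query parameter, as in the links our output spreadsheets contain) and the sample ID are assumptions, and the actual name retrieval happens in Apps Script, as noted in the comment.

```python
import re
from typing import Optional

# Form-upload cells contain Drive links carrying the file ID in an
# "id=" query parameter (the shape we see in our output spreadsheets;
# treat this as an assumption about the format, not a guarantee).
DRIVE_ID_PATTERN = re.compile(r"[?&]id=([A-Za-z0-9_-]+)")

def extract_drive_id(url: str) -> Optional[str]:
    """Pull the file ID out of a Drive link, or return None if absent."""
    match = DRIVE_ID_PATTERN.search(url)
    return match.group(1) if match else None

# In Apps Script, the ID is then resolved to a filename with
# DriveApp.getFileById(id).getName() and written back into the sheet.
print(extract_drive_id("https://drive.google.com/open?id=1aB_c-D2eF"))
```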
Output sample from the Filename Append script. To the left is the standard upload output; to the right is the filename in Drive.
Second, Form Output Editor is a Python module for large-scale rewriting of the exported CSV. Currently, it draws on a configuration file (.json) to determine how to apply three different functions. First, it can rename columns to new “aliases” from the configuration. Second, it can delete all the data from any column marked with a deletion flag, which we can set while aliasing. Lastly, it can rename files according to a template and other data in the CSV. For instance, suppose we want the template “DHA_{email prefix}_{date}”; if a row contains the timestamp “4/7/2019 11:17:48” and the email address “carletondha@gmail.com”, the module can retrieve those and the file extension, renaming the file to “DHA_carletondha_4-7-2019.png”.
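A simplified sketch of those three operations is below. This is not the module’s actual code: the config keys, the “DELETE” flag value, the column names, and the sample original filename (“IMG_2041.png”) are all illustrative, chosen to reproduce the example above.

```python
import json

# Illustrative configuration: aliases map form-question headers to
# upload-ready headers, and the value "DELETE" flags a column whose
# data should be dropped.
CONFIG = json.loads("""
{
  "aliases": {"What is the title of your object?": "dcterms:title",
              "Timestamp": "DELETE"},
  "filename_template": "DHA_{email_prefix}_{date}"
}
""")

def alias_headers(headers, aliases):
    """Rename columns according to the configuration's aliases."""
    return [aliases.get(col, col) for col in headers]

def clear_flagged_columns(headers, row):
    """Blank out the data in any column aliased to the deletion flag."""
    return ["" if col == "DELETE" else value for col, value in zip(headers, row)]

def rename_file(row, original_name, template):
    """Fill the filename template from the row's email prefix and the
    date portion of its timestamp, keeping the original extension."""
    email_prefix = row["Email Address"].split("@")[0]
    date = row["Timestamp"].split(" ")[0].replace("/", "-")
    extension = original_name.rsplit(".", 1)[1]
    return template.format(email_prefix=email_prefix, date=date) + "." + extension

row = {"Timestamp": "4/7/2019 11:17:48",
       "Email Address": "carletondha@gmail.com"}
print(rename_file(row, "IMG_2041.png", CONFIG["filename_template"]))
# DHA_carletondha_4-7-2019.png
```

Keeping the aliases and template in an external JSON file means the same module can serve both the Prairie Creek and Medieval Rome pipelines, with each project carrying its own configuration rather than its own code.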
Though the project is still a work in progress, we hope it will save us time in the long run. The GitHub repository will continue to be updated with code changes and more complete README information and instructions.