Author: Luna Yee
As is the case in many parts of digital work, simplicity for the user results in complexity for the back-end in Google Forms.
For several years now, the Carleton Digital Humanities Associates have been in charge of handling projects that require the use of Google Forms to collect relevant data. To name a couple, we handle data submissions from ENTS 110 students to Carleton’s Prairie Creek Wildlife Management Area Digital Collection, as well as manage the Carleton Guide to Medeival Rome website, which is composed of submissions from students on Off-Campus Study in Italy. (The current version of the website is a work in progress, and I plan to come back and add a link once we make it live!)
One of the primary benefits of Google Forms is the ability to upload files with a relatively lenient storage capacity. Google Forms recently added a “File Upload” question type that automatically copies a file submitted by a user to your Google Drive, so the limit on files you can have uploaded is simply the amount of storage in your Drive. This far outstrips most other form submission software, which will often have very strict limits on uploading files, if they even allow such a response type.
However, as it would turn out, Google Forms is not well equipped for our needs. Both the Prairie Creek and Carleton Guide to Medeival Rome projects require us to upload the data we receive as a CSV (comma-separated value) or TSV (tab-separated value) file to a separate framework; ContentDM and Omeka, respectively. Converting a Google Spreadsheet to a CSV or TSV is a simple press of a Save As button, but the problem lies in the nature of the data in that spreadsheet.
The first problem is the headers. The headers of an output spreadsheet from Forms are the titles of the questions. Omeka, as an example, is built off of the Dublin Core metadata fields, and its CSV import feature (add-on) anticipates CSV headers that line up with this vocabulary. Additionally, we might receive data we need for administrative purposes (e.g., the timestamp), but has no purpose in the upload process.

The second problem is response validation. While Google Forms has implemented some basic response validation, its capabilities are limited (you can specify that some answer must be a number, or not contain “hello world”; though admittedly, a regular expression option was recently added that enables some added complexity), and you can only set one dimension of validation per question. This becomes a problem when asking more specific questions; for example, a filename, which must have a suffixed file type, which is usually three characters after a period (.pdf, .png, .mp4), but sometimes four or more (.jpeg, .docx), and also not contain any characters besides alphanumerics, hyphens and underscores.

The third problem has to do with file upload. As it turns out, Google Forms relates uploaded files not by their names, but rather by Drive URLs. While this is very convenient for accessing files within Drive’s online user interface, it becomes a problem when trying to retrieve them in a local file structure. Omeka, again the example, requires that items with relevant media be listed in the CSV with the name of the media file in one of the data fields. There are two issues with the Forms output: for one, each file name would need to be input manually by accessing the file through Drive; and second, Drive’s naming system uses spaces in the file name, which is illegal if not in Omeka at least in ContentDM.
My work recently has revolved around finding a workaround to these problems; particularly the third, which has resulted in a large number of DHA work-hours on the back-end in recent terms.
I have been uploading my working modules to the Digital Carleton Github, at https://github.com/DigitalCarleton/form-output-editing. While the project (and its documentation) are far from complete, the task currently has two parts.
First, Filename Append is a prototype Google Apps Script (.gs) to retrieve the filenames from a Google Form’s output spreadsheet. As it turns out, the links to Drive objects are rather useless to software outside of Drive’s own API, so if we wish to avoid manually inputting file names, we must use Google Apps Script, which is a relatively obscure bit of software that allows for additional scripts to be added to Google products, and the language in which Add-ons for Drive software is written. Filename Append in particular is a script that, when attached to a form-response Spreadsheet, will retrieve the names of uploaded files from the output Drive links.

And second, Form Output Editor is a Python module for large-scale rewriting of the exported CSV. Currently, it draws from a configuration file (.json) to determine how it will apply three different functions. First, it can rename columns to new “aliases” from the configuration. Second, it can delete all the data from any column that we give some deletion flag, which we can do when aliasing. Lastly, it can rename files according to some template and other data in the CSV. For instance, suppose we want the template “DHA_{email prefix}_{date}”; if a row contains the timestamp “4/7/2019 11:17:48” and the email address “carletondha@gmail.com”, it could retrieve those and the file extension, renaming the file to “DHA_carletondha_4-7-2019.png”.
Though still a work in progress, and we are hoping this project will save us time in the long run. The Github repository will continue to update with code changes and more specific README information and instructions.