Updates for 2019 fall term: Creating a WordPress site and Updating the Carleton DHA page

During the fall term of 2019, I worked on a new WordPress site and updated the Carleton DHA page.

In the former project, collaborating with professors from the Classics Department, I created the CHIANTI site, a WordPress site. To add and organize its content, I used several plugins: Elementor to lay out the content pages, Shortcodes and List Category Posts to display posts sorted by category on a page, Document Embedder to make language-learning resources downloadable, Smart Slider to add a video carousel to the student portal page, and Pods Admin to create a submission form for faculty.

The CHIANTI site
French page for the instructors (The bar on the left shows the code for displaying posts sorted by tags)

In the course of arranging and refining the site, I picked up some tips that should be helpful the next time I create a website. I’ll write them down for future use.

  • Clarify the audience and objectives of the website.
  • When you get stuck, google the problem first. Somebody else has probably been in the same situation and already asked a similar question.
  • Be careful about consistency – theme colors, fonts, font sizes, and so on.
  • When you are not sure which plugin to use, check its reviews, download count, and the date of its latest update.
  • If you create a website and then hand control of it over to a third party, make sure to create a concise, easy-to-follow instructional document (preferably with screenshots as needed). This is also a great way to keep information, such as the theme colors and fonts, in one place.
  • Finally, although there is a lot more to mention, communicating with partners/clients is crucial for bringing the website closer to what they expect.

Regarding the Carleton DHA page, with permission to access and edit the page, I mainly updated the list of DH members for this year and the past projects. Updating the past projects in particular required keeping a few things in mind: 1) use visually eye-catching screenshots of each project, 2) check the copyright status of any images within the screenshots, 3) avoid publishing controversial content or images on the web, and 4) make sure that private information is hidden.

As you’ve seen, I spent most of my time working with WordPress. Next term, I hope to work with other types of digital tools.

Mapping the Fifteenth-Century London Chronicles: Experimentation and Collaboration

One of the projects I’ve been working on this year has been a textual analysis of the fifteenth-century London Chronicles for an English professor’s research. The professor hoped to identify and isolate place names in the text (such as London Bridge, Sussex, etc.) and make a map of all the data. This is where the Digital Humanities team came in: what software and digital tools could we use to extract this data and display it in an insightful way?

The first tool we examined was Voyant, an online textual analysis tool that creates data visualizations. We uploaded a PDF of the London Chronicles to Voyant and played around with the website to see how it worked and determine whether it was effective.

A screenshot of the London Chronicles data visualization in Voyant

While Voyant was great for analyzing macro-level data sets and getting a holistic view of the text, it was rather ineffective for gathering specific iterations of place names and appeared no better than manual close reading for this purpose. One of the other problems we encountered was the variation in medieval spelling; for example, Voyant created separate categories for “London” and “Londan” even though they referred to the same place.

We then turned to a different tool to help map our place names: the Edinburgh Geoparser. The Geoparser created a wonderful map of the place names. However, it was unable to quantify the number of times a place name appeared or arrange the place names in order of frequency. Thus, it was great for visualizing the places but not ideal for textual analysis.

The map of the London Chronicles created by Edinburgh Geoparser

Finally, after testing these different tools, we stumbled upon a Gazetteer of Early Modern Europe that contained a list of place names, their spelling variants, and their locations. We collaborated with a member of the Data Squad, a Carleton organization dedicated to organizing data, to produce a program that would cross-reference the London Chronicles PDF with an XML file of this data. In this manner, we would be able to get a reliable count of place names in the text that included their spelling variants.
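The core idea of that cross-referencing program can be sketched in a few lines of Python. To be clear, this is not the Data Squad’s actual code, and the variant table below is a made-up stand-in for the gazetteer’s spelling-variant data:

```python
import re
from collections import Counter

# Hypothetical stand-in for the gazetteer: each canonical place name
# maps to its known spelling variants (including the modern form).
GAZETTEER = {
    "London": ["London", "Londan", "Londoun"],
    "Sussex": ["Sussex", "Southsex"],
}

def count_place_names(text, gazetteer):
    """Count occurrences of each place, folding all spelling
    variants into a single canonical name."""
    counts = Counter()
    for canonical, variants in gazetteer.items():
        for variant in variants:
            # \b word boundaries keep "London" from matching inside
            # longer words such as "Londoner".
            pattern = re.compile(r"\b%s\b" % re.escape(variant), re.IGNORECASE)
            counts[canonical] += len(pattern.findall(text))
    return counts

sample = "Men of Sussex crossed Londan bridge and entered London."
print(count_place_names(sample, GAZETTEER))
# Counter({'London': 2, 'Sussex': 1})
```

The point is the folding step: “Londan” and “London” both increment the same canonical counter, which is exactly what Voyant could not do for us.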

The Early Modern London Gazetteer

This process has taught me that Digital Humanities is a lot of trial and error. In doing this research, I’ve learned there might not be one perfect tool for a project, but combining different resources and collaborating with others allowed me to find an innovative solution. This experimentation and sharing of ideas and research is vital to the work we do as Digital Humanities Associates.

Exploring topics from different perspectives: #multifacetedapproach

I enjoyed my training as a DHA and am excited for the upcoming year. The interdisciplinary nature of the field makes it extremely broad, requiring us to take a #multifacetedapproach. I really enjoy this dynamic aspect of digital humanities. As a result, I made the following meme.

I am excited to learn more and expand my horizons further in my work as a DHA!

The End of the Beginning and #PreparingforInsightfulChallenges

I finished my first week of Digital Humanities training and it was really fascinating – the word that comes to mind is “insightful.” It was definitely not what I was expecting: I thought the focus would be more on concrete skills than on discussion and mental exercises, but that had no impact on my enjoyment of the training. Based on what I heard about how DH works, I decided upon this meme:

I am really excited and intrigued by what’s to come next, and I am sure I will need to take on new skills and difficulties. These difficulties are bound to be fascinating and something I can learn from, which is why my hashtag is #preparingforinsightfulchallenges.

Ready for #RoundTwo!

Training week has ended which means it’s time to really get started! I decided to summarize digital scholarship in the following meme, showcasing what I have learned this week:

After being abroad last term, I’m excited to see how our digital projects have evolved and what new projects may be coming our way!

First Dive into Digital Humanities #packingforanewadventure

I finished a week of Digital Humanities training and here is what I think of Digital Humanities now……

Digital Humanities Fry - NOT SURE IF DIGITAL OR HUMANITIES Futurama Fry

Digital Humanities is interdisciplinary and involves a wide variety of information and tools. Through the training, I found Digital Humanities more interesting, yet harder to grasp as a whole, and it keeps me thinking about its unlimited possibilities.

Finally, my hashtag for the training is……#packingforanewadventure. I’m excited about exploring the world of Digital Humanities!

Ready for another year of Digital Humanities! #backtoit

We have a new cast and crew, but we’re ready to get #backtoit. With five new DHAs joining us, you can expect we’ll be involved with quite a few projects this year; please look forward to it! And since we’re ready, a Spongebob meme seems appropriate:

Meme from a Spongebob Squarepants scene. Bouncer: Welcome to the Salty Spitoon how tough are ya? Bodybuilder: How tough am I? I once uploaded a hundred items' worth of media files to an Omeka site. Bouncer: Yeah so? Bodybuilder: One of the files was named "Pride&Prejudice #1.mp4 (2).png" Bouncer: Right this way
“Oh?… So you don’t need to hear about the… unstandardized name entry fields?”

(After all, in the end, a good portion of digital humanities is figuring out how to present data in a way that hides just how messy it actually is—or was.)

Managing Google Forms Output

As is the case in many parts of digital work, simplicity for the user results in complexity for the back-end in Google Forms.

For several years now, the Carleton Digital Humanities Associates have been in charge of handling projects that use Google Forms to collect relevant data. To name a couple, we handle data submissions from ENTS 110 students to Carleton’s Prairie Creek Wildlife Management Area Digital Collection, and we manage the Carleton Guide to Medieval Rome website, which is composed of submissions from students on Off-Campus Study in Italy. (The current version of the website is a work in progress, and I plan to come back and add a link once we make it live!)

One of the primary benefits of Google Forms is the ability to upload files with a relatively lenient storage capacity. Google Forms recently added a “File Upload” question type that automatically copies a file submitted by a user to your Google Drive, so the limit on uploaded files is simply the amount of storage in your Drive. This far outstrips most other form-submission software, which often has very strict limits on file uploads, if it allows such a response type at all.

However, as it turns out, Google Forms is not well equipped for our needs. Both the Prairie Creek and Carleton Guide to Medieval Rome projects require us to upload the data we receive as a CSV (comma-separated values) or TSV (tab-separated values) file to a separate framework: ContentDM and Omeka, respectively. Converting a Google Spreadsheet to a CSV or TSV is as simple as pressing Save As, but the problem lies in the nature of the data in that spreadsheet.

The first problem is the headers. The headers of an output spreadsheet from Forms are the titles of the questions. Omeka, as an example, is built on the Dublin Core metadata fields, and its CSV import feature (an add-on) expects CSV headers that line up with that vocabulary. Additionally, we might receive data that we need for administrative purposes (e.g., the timestamp) but that has no purpose in the upload process.

A sample view of the response validation option in Google Forms. Only one such validation field may be attached to each question.

The second problem is response validation. While Google Forms has implemented some basic response validation, its capabilities are limited (you can specify that an answer must be a number or must not contain “hello world”; admittedly, a recently added regular-expression option enables some added complexity), and you can only set one validation rule per question. This becomes a problem when asking more specific questions. Take a filename, for example: it must have a file-type suffix, which is usually three characters after a period (.pdf, .png, .mp4) but sometimes four or more (.jpeg, .docx), and it must not contain any characters besides alphanumerics, hyphens, and underscores.
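For the curious, that whole filename rule collapses into a single regular expression. This is just my sketch of the rule, not a pattern we have actually deployed in our forms:

```python
import re

# Alphanumerics, hyphens, and underscores only, then a period and an
# extension of three or more characters (.png, .mp4, .jpeg, .docx).
FILENAME_RE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]{3,}$")

def is_valid_filename(name):
    return FILENAME_RE.fullmatch(name) is not None

print(is_valid_filename("lecture-3_video.mp4"))            # True
print(is_valid_filename("Pride&Prejudice #1.mp4 (2).png")) # False
```

A pattern like this could, in principle, be pasted into the Forms regex validation option, which is one way around the one-rule-per-question limit.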

A sample of the spreadsheet output of a “file upload”-type form question.

The third problem has to do with file uploads. As it turns out, Google Forms refers to uploaded files not by their names but by Drive URLs. While this is very convenient for accessing files within Drive’s online user interface, it becomes a problem when trying to retrieve them in a local file structure. Omeka, again the example, requires that items with relevant media be listed in the CSV with the name of the media file in one of the data fields. There are two issues with the Forms output: first, each file name would need to be input manually by accessing the file through Drive; and second, Drive’s naming system allows spaces in file names, which is illegal in ContentDM, if not also in Omeka.

My recent work has revolved around finding workarounds for these problems, particularly the third, which has consumed a large number of DHA work-hours on the back-end in recent terms.

I have been uploading my working modules to the Digital Carleton GitHub, at https://github.com/DigitalCarleton/form-output-editing. While the project and its documentation are far from complete, the task currently has two parts.

First, Filename Append is a prototype Google Apps Script (.gs) that retrieves the filenames from a Google Form’s output spreadsheet. As it turns out, links to Drive objects are rather useless to software outside Drive’s own API, so if we wish to avoid manually inputting file names, we must use Google Apps Script: a relatively obscure platform that allows additional scripts to be added to Google products, and the language in which add-ons for Drive software are written. Filename Append in particular is a script that, when attached to a form-response spreadsheet, retrieves the names of uploaded files from the output Drive links.

Output sample from the Filename Append script. To the left is the standard upload output; to the right is the filename in Drive.

And second, Form Output Editor is a Python module for large-scale rewriting of the exported CSV. Currently, it draws from a configuration file (.json) to determine how to apply three different functions. First, it can rename columns to new “aliases” from the configuration. Second, it can delete all the data from any column we mark with a deletion flag, which we can do when aliasing. Lastly, it can rename files according to a template and other data in the CSV. For instance, suppose we want the template “DHA_{email prefix}_{date}”; if a row contains the timestamp “4/7/2019 11:17:48” and the email address “carletondha@gmail.com”, the module can retrieve those and the file extension, renaming the file to “DHA_carletondha_4-7-2019.png”.
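The template step, at least, is simple enough to sketch. The column names and field derivations below are my illustration of the idea, not the module’s exact configuration format:

```python
def render_filename(template, row, extension):
    """Fill a filename template from one row of the exported CSV.
    Fields supported in this sketch: {email prefix} and {date}."""
    # Everything before the "@" in the submitter's email address.
    email_prefix = row["Email Address"].split("@")[0]
    # Keep only the date part of the timestamp, swapping "/" for "-"
    # since slashes are illegal in file names.
    date = row["Timestamp"].split()[0].replace("/", "-")
    return (template
            .replace("{email prefix}", email_prefix)
            .replace("{date}", date)) + extension

row = {"Timestamp": "4/7/2019 11:17:48",
       "Email Address": "carletondha@gmail.com"}
print(render_filename("DHA_{email prefix}_{date}", row, ".png"))
# DHA_carletondha_4-7-2019.png
```

Driving the template from a configuration file rather than hard-coding it means the same module can serve both the Prairie Creek and Medieval Rome projects, which want different naming schemes.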

It is still a work in progress, but we are hoping this project will save us time in the long run. The GitHub repository will continue to be updated with code changes and more specific README information and instructions.

About Antirubbersheeter

As a follow-up to Elizabeth’s post last year, I tried to “play with” the old map Elizabeth was using and see if I could create any meaningful mapping out of it. First, as Elizabeth previously pointed out, this old map is quite difficult to use, since many places on it do not have a similar spacing or arrangement on a modern map. As a result, instead of trying to superimpose this old map onto a modern map (e.g., Google Maps) and geocode it (i.e., “georectifying” the map in software like ArcGIS), I used a tool called Antirubbersheeter, created by Moacir P. de Sá Pereira, to annotate the map.

Here is what the old map looks like again:

Woodcut image of Rome

To annotate the map, I went to the website and followed the instructions. First, I uploaded the image that I wanted to use (i.e., the old map of Rome above); the image is then also uploaded to Imgur. Then, I listed a few places that I wanted to mark on the map, separated by commas, in the geocode box. Due to my minimal knowledge of Roman geography, I only picked four places: Papal Palace, Castel St Angelo, Pantheon, Colosseum.

Uploading the image and listing the places

After clicking Geocode, I could start geocoding the places I had specified on the homepage. For each place, I saved the geocode and proceeded to the next one.

Geocoding the map

Again, my knowledge of Rome is extremely limited, so I might have made some errors in geocoding my places. After saving everything, Antirubbersheeter gave me a .json file that refers to my now-annotated map. Here is the final result:

Annotated map

Ideally, we would want to annotate more places to give a detailed tour of Rome. I was initially thinking of somehow putting this map as a layer on a modern map, but this would go against Moacir’s purpose in creating Antirubbersheeter, which is to make the old map the center of the study rather than the pre-existing ground, such as Google Maps. However, Antirubbersheeter also gives us a .json file that can be further modified using a Leaflet template; for instance, we can add lines, pop-ups, or circles to the map using JavaScript-based functions. This requires some JavaScript and CSS skills.

My .json file, with minimal JavaScript skills

Although Antirubbersheeter seems really simple, its interactive interface makes it easy to produce the .json file. It also automatically uploads the image you use to Imgur, so you do not have to deal with local-directory problems when presenting your annotated map. However, to make full use of it, you need ample skill with JavaScript and the Leaflet platform in general. If you are comfortable with both, Antirubbersheeter may seem unnecessary, and you may prefer the offline version. The only difference between the online and local platforms is that your image is not stored on Imgur if you use the local platform, which matters if the image is under copyright. Thus, although Antirubbersheeter is interactive and intuitive on its own, its use is limited without the Leaflet platform. Overall, this tool could be made more useful if it also allowed for other kinds of annotations that are currently available only through Leaflet.

A Week of DHA Training

A meme for the first few days of training…

Meme of Patrick Star: "No Patrick...Digital Humanities is not an instrument."
Happily, if you’re on this site, you probably know this already.

A meme for digital humanities as a whole…

Meme of the Terrible Trivium (from the Phantom Tollbooth): "Digital Humanities: Converting data to an accessible digital format one grain of sand at a time."
Alternatively: Bringing obscure literary references to the wider internet.

And a hashtag for the training experience…


Long story short, it’s been a good week of training and I’m looking forward to getting into the work. One of the takeaways from this week has been that what we do is entirely dependent on what people (you people reading this blog!) need done, and I enjoy the challenge of the unknown like that. So, with that said, here we go!