Leading the CARCAS project’s transition to a new back end

Erin Watson’s graduation reflection

I started this year excited to continue working on CARCAS, an archaeology project about displaying 3D scans of animal bones. I am now graduating, and looking back on the work I have done, I am incredibly proud of the technologies I have learned, the technical documentation I have written, and my growth as a team leader.

This winter, with the addition of the Alpaca skeleton, it became clear that the current systems in place for storing the CARCAS models on the web server were no longer sustainable. There were too many big files, and the method for storing them took up too much additional space.

The transition took a lot of time and effort. I started by researching different tools, eventually settling on Datalad since it has clear, beginner-friendly documentation and it works well with the web server that CARCAS already used. Figuring out the correct configuration tested my patience. My early attempts at making the models downloadable caused a nasty little bug that prevented the correct method from working. I tried troubleshooting on the Datalad forum, but even experienced users couldn’t figure out how to fix it. I finally started over from scratch, and everything worked like a charm.

Dealing with this bug, and solving it by starting from scratch, reinforced that failing is part of the learning process. The second time around, everything went much more smoothly because I understood how all the pieces fit together. And I was able to write down what I was doing because I wasn’t overwhelmed with learning new things.

Once the new system was in place, my supervisors emphasized that I needed to write instructions so that future DHAs could learn how to use it. After all, a system for collaboration and backups is useless if no one knows how to use it. I thought this would be relatively straightforward: I knew how to use the system, even if I didn’t have comprehensive notes of my own.

I discovered that I had learned a lot through experimenting with Datalad and the new CARCAS system, and that it was not immediately obvious to others what to do. I collaborated with Noah Zameer Lee, another DHA. He had completed the Datalad tutorial like I had, but he had been working on other projects while I set up the new system. When I was sitting right next to him, I could guide him through using the system, but there was a lot that was not clear, even to someone familiar with the software.

I wrote a few different sets of instructions and documentation for different use cases. I focused especially on first-time setup, routine tasks, and where to learn more to deeply understand the tools. These were the areas that overwhelmed me when I first started working on CARCAS. First-time setup and routine tasks look easy when you have done them before, but when you are just getting started, there’s nothing for your brain to latch on to.

Also, to make it easy for a future DHA to get started, I hosted a recorded Zoom meeting with my supervisors and coworkers where I demonstrated what it looks like to follow my instructions and start making changes. I deliberately recorded this video from a new account on my computer so that I would have to show the setup.

Recording this walkthrough was also incredibly helpful for me. I discovered tasks that I had forgotten to write about because they had become second nature, and I discovered sections of my documentation that were too cluttered and difficult to reference. Just as revising is helpful when writing an essay or a blog post, I learned that it is also an essential step of writing technical documentation.

Working on CARCAS’s back-end transition has taught me a lot about working as a team leader. I made impactful decisions, like when I decided that Datalad was the best tool for CARCAS. I had to reassess and choose whether or not to stand by my decisions, like when I spent weeks looking for the bug in my first attempt at using Datalad. I needed strong communication throughout the whole process. I explained to my supervisors what I was doing and why, without getting into the technical weeds. I taught Noah how to use the system I set up, and I helped him figure out how it fit in with his piece of the project. I created documentation and a video for future DHAs, in hopes that I could pass on my knowledge.

I have had a wonderful time working on CARCAS this year. As I go off into the world after Carleton, I can’t wait to look back and see how CARCAS keeps on growing!

Highlight of My Year: The Mapping Japan Project

As a Digital Humanities Associate (DHA) this year, my work on the Mapping Japan project has been incredibly rewarding. This exciting initiative is a collaborative effort between multiple departments and the Gould Library at Carleton College, led by Professor Asuka Sango from the Religion and Asian Studies departments. The project was initially developed during the Institute for Liberal Arts Digital Scholarship (iLiADS) last summer at Davidson College. Our goal is to digitize Carleton’s rich Japanese map collections and build an Omeka S site to host them. This site and the sample items will serve as valuable resources for a course (ASST 285: Mapping Japan, the Real and the Imagined) next spring.

Our Collections

Our collections include Gaihōzu maps, produced by Imperial Japan in the late 19th and early 20th centuries, which were captured by Allied forces at the end of World War II and distributed to various libraries in the U.S. Carleton has approximately 1,280 sheets of Gaihōzu. There are also Naihōzu maps, which are maps of Japan and its overseas territories, also produced by Imperial Japan and captured by Allied forces. We are currently organizing this collection, which likely includes several hundred sheets. Additionally, our collection holds 70 sheets of maps of Japan created by the Office of Strategic Services (the predecessor of the CIA) during and shortly after World War II, as well as 8 sheets of premodern Japanese maps, with plans to acquire more.

My Role in the Project

My contributions to the Mapping Japan project have been multifaceted.

Metadata Template Creation

We decided to create two separate Omeka sites: one for general purposes and another for student exhibits next year. For our metadata template, we chose Dublin Core due to its widespread acceptance and standardization in metadata practices. Key fields in our template include titles (in English and Japanese), descriptions, call number, creators and contributors, genres, dates, and places shown.

Creating the metadata template was a thoughtful process that involved iterative discussions to identify the most critical properties of the maps. We examined how institutions like the Stanford University Libraries present their Gaihōzu maps and incorporated feedback from Professor Sango on how the template could serve as a gradable assignment in her upcoming class. We debated details such as whether to record B&W/color distinctions and which field to use for physical dimensions. Additionally, we considered the best media for showcasing the digitized maps, evaluating options such as IIIF, the default Omeka settings, or the published Google Drive images.
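
To make the template concrete, here is a minimal Python sketch of how a filled-in record could be checked against it and turned into Dublin Core statements. The Dublin Core term chosen for each field (for example, dcterms:spatial for places shown and dcterms:type for genre), the set of required fields, the to_dublin_core helper, and the sample values are all assumptions for illustration, not the project’s actual configuration. Each value carries an optional language tag, which is one way to keep the English and Japanese titles in the same dcterms:title property.

```python
from typing import Optional

# Hypothetical encoding of the template: each field maps to a Dublin Core
# term plus an optional language tag. These term choices are assumptions.
TEMPLATE: dict[str, tuple[str, Optional[str]]] = {
    "title_en": ("dcterms:title", "en"),
    "title_ja": ("dcterms:title", "ja"),
    "description": ("dcterms:description", None),
    "call_number": ("dcterms:identifier", None),
    "creator": ("dcterms:creator", None),
    "contributor": ("dcterms:contributor", None),
    "genre": ("dcterms:type", None),
    "date": ("dcterms:date", None),
    "place_shown": ("dcterms:spatial", None),
}

# Which fields are mandatory is also an assumption for illustration.
REQUIRED = {"title_en", "title_ja", "call_number"}


def to_dublin_core(record: dict[str, str]) -> list[tuple[str, str, Optional[str]]]:
    """Turn one filled-in template into (term, value, language) triples."""
    unknown = set(record) - set(TEMPLATE)
    if unknown:
        raise ValueError(f"fields not in the template: {sorted(unknown)}")
    filled = {field for field, value in record.items() if value.strip()}
    missing = REQUIRED - filled
    if missing:
        raise ValueError(f"required fields left blank: {sorted(missing)}")
    triples = []
    for field, value in record.items():
        if field in filled:  # optional fields left blank are skipped
            term, lang = TEMPLATE[field]
            triples.append((term, value.strip(), lang))
    return triples


# Invented sample record.
example = {
    "title_en": "Example map title",
    "title_ja": "例の地図",
    "call_number": "EXAMPLE-0001",
    "genre": "Topographic map",
    "description": "",
}
for triple in to_dublin_core(example):
    print(triple)
```

Rejecting unknown field names catches typos early, which matters if students will be filling in the template as a graded assignment.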

Creating Sample Site Structures and Wireframes

Designing the website structure was another fascinating part of my work. We analyzed exemplary Omeka S sites, especially those showcasing maps, to learn from their navigation bars, search functions, and more. As a statistics major, I was particularly interested in the data visualization aspect: how to best present all types of data, whether textual, visual, or numerical, in the most accessible way. Collaborating with my colleague, DHA Tonushree, we created engaging slides and pitches for the entire team.

Current Landing Page

Developing the Timeline Showcase

Another significant task was creating a sample timeline object to embed on the site. This timeline showcases maps from different periods, from the Meiji era to post-World War II. After exploring various options, we decided to use TimelineJS, which allows us to customize and embed images and descriptions seamlessly. The timeline offers a compelling historical perspective, and we aim to include a wide range of maps beyond just military and topographical ones, such as those depicting spring areas in Japan.

Sample Timeline
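
TimelineJS can load its events from a Google Sheet or from a JSON file. The sketch below assumes the JSON route and builds a timeline from a list of map records, sorted by year. The records, image URLs, and the build_timeline helper are hypothetical; the title, events, start_date, text, and media keys follow TimelineJS’s JSON format as I understand it, so check them against the TimelineJS documentation before relying on them.

```python
import json

# Hypothetical map records; real entries would come from the digitized collection.
maps = [
    {"year": 1950, "era": "Post-war", "title": "Example post-war map",
     "image": "https://example.org/postwar.jpg"},
    {"year": 1868, "era": "Meiji", "title": "Example Meiji-era map",
     "image": "https://example.org/meiji.jpg"},
]


def build_timeline(records):
    """Build a TimelineJS-style JSON object with one event per map, oldest first."""
    events = []
    for rec in sorted(records, key=lambda r: r["year"]):
        events.append({
            "start_date": {"year": str(rec["year"])},
            "text": {"headline": rec["title"], "text": f"{rec['era']} era"},
            "media": {"url": rec["image"], "caption": rec["title"]},
        })
    return {
        "title": {"text": {"headline": "Mapping Japan", "text": "Sample timeline"}},
        "events": events,
    }


# ensure_ascii=False keeps any Japanese text readable in the output file.
print(json.dumps(build_timeline(maps), ensure_ascii=False, indent=2))
```

Generating the file from records, rather than editing it by hand, would let the timeline grow as more maps (such as the spring-area maps) are catalogued.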

Writing Instructions for Students

I also authored instructions for students who will create items and exhibits for the site next year. This involved finding examples of how to upload images, appropriate citation formats, and what to include in descriptions. This experience required continuous learning, particularly about different types of rights statements and the reuse of historical archives and images.

Instructions for the “Creator/Agency” Metadata Field

Why This Project Matters

The Mapping Japan project is more than just an academic exercise; it is an effort to diversify Carleton’s curriculum and highlight non-Western items in our library’s Special Collections. By digitizing, annotating, and publicizing these Japanese maps, we aim to create a rich, accessible online exhibit that will serve both current and future students. This project not only preserves historical artifacts but also provides valuable educational resources and opportunities for hands-on digital humanities work.

Conclusion

The Mapping Japan project has been a highlight of my year as a DHA. It has offered me the chance to collaborate with talented individuals, learn new skills, and contribute to a meaningful initiative. As we move forward, I am excited to see how this project will continue to grow and impact both the Carleton community and the broader field of digital humanities.

Gale Digital Scholar Lab and Constellate: A Comparison

In previous blog posts, I have discussed how Gale Digital Scholar Lab (GDSL) can be used to create datasets and then run various analyses on them. Recently, a comparable online ‘data lab’ has emerged as a contender: JSTOR’s Constellate. Much like GDSL, Constellate is an online platform designed to support digital scholarship by providing tools for text and data mining across a broad range of academic content. Researchers, data scientists, and advanced students can use Constellate to explore diverse datasets, conduct advanced text analysis, and gain insights from academic texts. The platform offers a suite of tools for tasks like clustering, n-grams, and topic modeling, and integrates with Jupyter Notebooks for users who prefer coding in Python or R.

In this post, I intend to explore the distinctions between Constellate and GDSL, highlighting how each platform may be better suited for different purposes. I will assess them based on four criteria: the quality of the database, access to full-text content, user-friendliness, and flexibility.

1. Quality of Database:

Both tools give researchers access to a database containing a wealth of materials. In the case of GDSL, this database comprises a diverse array of content, much of it in the public domain, ranging from historical newspapers and magazines to flyers and other ephemera. The Gale database also includes academic sources alongside these ‘non-academic materials’, creating a blend of both. This rich variety makes GDSL’s database an extensive resource for researchers seeking a broad spectrum of information for their analyses and studies.

Constellate employs the formidable database of JSTOR, esteemed for its comprehensive coverage of academic journals and papers across numerous disciplines, with particular strength in the humanities. This expansive repository offers researchers access to a wealth of scholarly literature, providing authoritative sources and profound insights for academic inquiries in fields ranging from history and literature to sociology and anthropology. While Constellate’s focus on academic content may mean it features fewer non-academic sources compared to GDSL, its emphasis on scholarly rigor and depth of coverage makes it an indispensable tool for researchers seeking to explore and analyze academic research, especially in the humanities.

Researchers seeking a combination of academic and non-academic content can benefit from using Gale Digital Scholar Lab (GDSL), which provides access to a diverse range of materials including historical newspapers, magazines, and other non-academic sources alongside academic works. On the other hand, for those focused solely on scholarly content, Constellate, with its extensive collection of academic journals and papers sourced from JSTOR, serves as an excellent resource. By understanding their specific research needs and preferences, researchers can choose the platform that best aligns with their objectives and maximizes the efficiency and effectiveness of their research endeavors.

2. Full-text Access:

While both Gale Digital Scholar Lab (GDSL) and Constellate offer general access to their respective databases, a notable distinction lies in their policies regarding full-text access. GDSL grants users unrestricted access to the full text of the content within its database, enabling researchers to delve deeply into the materials and conduct thorough analyses without constraints. This unrestricted access is particularly advantageous for users who require comprehensive access to the entirety of the available dataset for their research endeavors.

In contrast, Constellate adopts a different approach regarding full-text access. While users have general access to the datasets generated by Constellate, including metadata and select text snippets, full access to the complete text may not be readily available. Instead, researchers interested in accessing the full text of the datasets need to submit a special request. This additional step is likely implemented to adhere to copyright regulations and licensing agreements, especially concerning the academic content sourced from JSTOR. Consequently, Constellate’s approach to full-text access may involve a more structured process, potentially requiring users to navigate copyright considerations before gaining complete access to the textual content.
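
Even without the full text, word counts can support frequency-based analysis. The sketch below assumes a dataset file in JSON Lines format where each document carries a unigramCount dictionary of word counts; that field name and layout are assumptions about Constellate’s downloads rather than something this post confirms, and the two sample documents are invented. The corpus_frequencies helper is likewise hypothetical: it simply sums the per-document counts into corpus-wide totals.

```python
import json
from collections import Counter

# Invented sample documents. The JSON Lines layout and the "unigramCount"
# field name are assumptions about the dataset format, not confirmed here.
sample_lines = [
    '{"title": "Article A", "unigramCount": {"map": 3, "empire": 1}}',
    '{"title": "Article B", "unigramCount": {"map": 2, "survey": 4}}',
]


def corpus_frequencies(jsonl_lines):
    """Sum per-document word counts into corpus-wide totals."""
    totals = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        # Counter.update with a mapping adds counts instead of replacing them.
        totals.update(doc.get("unigramCount", {}))
    return totals


print(corpus_frequencies(sample_lines).most_common(3))
# [('map', 5), ('survey', 4), ('empire', 1)]
```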

This disparity in full-text access reflects the differing compositions of the databases behind GDSL and Constellate. GDSL benefits from a substantial amount of public domain content, which allows it to provide unrestricted access to the full text of its materials. Constellate’s database, on the other hand, consists primarily of academic content sourced from JSTOR, which requires careful attention to copyright and licensing restrictions. Researchers should take this key difference into account when deciding which tool to use.

3. User-friendliness:

Gale Digital Scholar Lab (GDSL) distinguishes itself with its abundance of automatic features and user-friendly interface, catering to researchers who prioritize ease of use and efficiency in their digital scholarship endeavors. GDSL’s suite of automatic features streamlines various aspects of text analysis, from data preprocessing to visualization, minimizing the need for manual intervention and technical expertise. This automated approach empowers researchers to focus on their analyses and interpretations without being bogged down by the intricacies of the tool itself. Additionally, GDSL’s intuitive interface further enhances user experience, making it accessible even to those with limited technical background or experience in digital scholarship.

In contrast, Constellate, with its reliance on programming and integration with tools like Jupyter Notebooks, presents a more complex environment suited for users comfortable with coding and advanced analytical techniques. While Constellate offers unparalleled flexibility and customization options through its programming capabilities, including the ability to write and execute code in Python and R, it may pose a steeper learning curve for researchers less familiar with programming languages or text analysis methodologies. However, for users proficient in coding and seeking sophisticated analytical capabilities, Constellate’s complexity provides a powerful platform for conducting advanced research and exploring complex datasets in depth.

Ultimately, the choice between GDSL and Constellate depends on the specific needs and preferences of researchers, as well as their level of technical expertise and familiarity with digital scholarship tools. GDSL’s automatic features and user-friendly interface make it an excellent choice for researchers prioritizing ease of use and efficiency, while Constellate’s advanced capabilities cater to users seeking greater flexibility and customization in their text analysis workflows, albeit with a higher degree of complexity.

4. Flexibility:

Constellate offers researchers significantly higher flexibility through its integration with programming environments like Jupyter Notebooks, empowering users to customize their analyses to suit their specific research needs. The ability to write and execute code in languages such as Python and R provides researchers with unparalleled control over their analytical processes, enabling them to implement advanced algorithms, develop bespoke visualizations, and explore complex datasets with precision and depth.
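
As a small example of the kind of custom analysis this makes possible, the following plain-Python sketch counts word n-grams in a text. It does not use any Constellate-specific library; the ngram_counts helper is hypothetical, the tokenizer (lowercase, letters and apostrophes only) is a simplifying assumption, and the sample sentence is made up.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams after lowercasing and keeping only letters and apostrophes."""
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the text; each window is one n-gram.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


sample = "The map of the city and the map of the river"
bigrams = ngram_counts(sample, 2)
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```

Because the tokenizer is an ordinary function, a researcher can change it (for example, to keep digits or handle historical spellings) and share the exact code alongside the results, which is the reproducibility benefit described below.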

Moreover, Constellate facilitates transparency and reproducibility in research by allowing users to document and share the exact data or textual analyses performed within the platform. Researchers can provide detailed explanations of their methodologies, including the specific code used for data manipulation, analysis, and visualization, thereby enhancing the integrity and reliability of their findings. Additionally, Constellate enables users to share datasets fully, promoting collaboration and facilitating the replication of analyses by other researchers.

In contrast, while Gale Digital Scholar Lab (GDSL) offers a user-friendly environment for text analysis, its capabilities for customization and sharing are more limited compared to Constellate. GDSL’s focus on providing pre-built tools and workflows may constrain researchers who require greater flexibility or wish to document and share their analyses comprehensively. As a result, researchers seeking maximum control over their analytical processes, along with transparency and reproducibility in their research, may find Constellate to be the preferred platform.

Conclusion:

In conclusion, Gale Digital Scholar Lab (GDSL) and Constellate each offer unique strengths and cater to distinct user needs within the realm of digital scholarship. GDSL stands out as an excellent tool for beginners and researchers seeking to explore historical newspapers and other non-academic sources with ease. Its user-friendly interface and pre-built tools make it accessible to those new to digital scholarship, while also providing valuable resources for uncovering insights from diverse materials. On the other hand, Constellate emerges as a powerful platform tailored for users interested in humanities research and academic scholarship. With its integration of JSTOR’s extensive academic database and support for programming, Constellate provides unparalleled flexibility and depth for conducting advanced textual analyses and exploring scholarly literature. Researchers seeking to delve deeply into academic research and enhance transparency and reproducibility in their work will find Constellate to be an invaluable resource. Ultimately, the choice between GDSL and Constellate depends on the specific objectives and preferences of the researcher, with both platforms offering valuable tools and resources to support digital scholarship in their respective domains.

Updates for 2019 Fall Term: Creating a WordPress Site and Updating the Carleton DHA Page

During fall term 2019, I’ve been working on a WordPress site and updating the Carleton DHA page.

For the former project, I collaborated with professors from the Classics Department to create the CHIANTI site in WordPress. To add and organize its content, I used several plugins: Elementor to lay out the content pages, Shortcodes and List category posts to list posts sorted by category on a page, Document Embedder to make language-learning resources downloadable, Smart Slider to add a video carousel to the student portal page, and Pods Admin to create a submission form for faculty.

CHIANTI site
French page for the instructors (the bar on the left shows the code for showing posts sorted by tags)

While arranging and refining the site, I picked up some tips that would be helpful when building other websites. I’ll write them down here for future reference.

  • Clarify the audience and objectives of the website.
  • When you get stuck, search online for troubleshooting help first. Someone has probably been in the same situation and already asked a similar question.
  • Be careful about consistency: theme colors, fonts, font sizes, and so on.
  • When you are not sure which plugin to use, check its reviews, download numbers, and latest update date.
  • If you create a website and then hand control over it to a third party, make sure to create a concise, easy-to-follow instructional document (preferably with screenshots as needed). This is also a great way to keep information such as the theme colors and fonts in one place.
  • Finally, although there is a lot more to mention, communicating with partners and clients is crucial to bringing the website closer to what they expect.

Regarding the Carleton DHA page, once I had permission to edit it, I mainly updated this year’s DH members and the past projects. Updating past projects in particular required keeping a few things in mind: 1) use visually eye-catching screenshots of the project, 2) check the copyright of any images within the screenshots, 3) avoid publishing controversial content or images on the web, and 4) make sure that private information is hidden.

As you’ve seen, I spent most of the time working with WordPress. For the next term, I hope I’ll be working with other types of digital tools.

Mapping the Fifteenth-Century London Chronicles: Experimentation and Collaboration

One of the projects I’ve been working on this year has been a textual analysis of the fifteenth-century London Chronicles for an English professor’s research. The professor hoped to identify and isolate place names in the text (such as London Bridge, Sussex, etc.) and make a map of all the data. This is where the Digital Humanities team came in: what software and digital tools could we use to extract this data and display it in an insightful way?

The first tool we examined was Voyant, an online textual analysis tool that creates data visualizations. We uploaded a PDF of the London Chronicles to Voyant and played around with the website to see how it worked and determine whether it was effective.

A screenshot of the London Chronicles data visualization in Voyant

While Voyant was great for analyzing macro data sets and getting a holistic view of the text, it was rather ineffective for gathering specific instances of place names and seemed no better than manual close reading for this purpose. Another problem we encountered was variation in medieval spelling; for example, Voyant created separate categories for “London” and “Londan” even though they referred to the same place.

We then turned to a different tool to help map our place names: Edinburgh Geoparser. Geoparser created a wonderful map of the place names. However, it was unable to quantify the number of times a place name appears or arrange the place names in order of frequency. Thus, it was great for visualizing the places but not ideal for textual analysis.

The map of the London Chronicles created by Edinburgh Geoparser

Finally, after testing these different tools, we stumbled upon a Gazetteer of Early Modern Europe, which contained a list of place names, their spelling variants, and their locations. We collaborated with a member of the Data Squad, a local Carleton organization dedicated to organizing data, to produce a program that would cross-reference the London Chronicles PDF with an XML file of this data. This way, we could get a reliable count of place names in the text that included their spelling variants.

The Early Modern London Gazetteer
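
The cross-referencing step can be sketched in Python. This is not the actual program built with the Data Squad: it assumes the PDF has already been converted to plain text, and that the gazetteer XML lists each place with its variant spellings in a hypothetical place/variant layout that the real file may not follow. The load_variants and count_places helpers are illustrative, and apart from “Londan”, the variant spellings and the sample sentence are invented. Every spelling is matched case-insensitively and counted under its canonical name, so “London” and “Londan” end up in the same bucket.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical gazetteer layout; the real gazetteer's XML schema may differ.
GAZETTEER_XML = """
<gazetteer>
  <place name="London">
    <variant>Londan</variant>
    <variant>Lundon</variant>
  </place>
  <place name="London Bridge">
    <variant>London Brigge</variant>
  </place>
  <place name="Sussex">
    <variant>Southsex</variant>
  </place>
</gazetteer>
"""


def load_variants(xml_text: str) -> dict[str, str]:
    """Map every spelling (canonical name included), lowercased, to its canonical name."""
    root = ET.fromstring(xml_text.strip())
    variants = {}
    for place in root.iter("place"):
        canonical = place.get("name")
        variants[canonical.lower()] = canonical
        for variant in place.iter("variant"):
            variants[variant.text.strip().lower()] = canonical
    return variants


def count_places(text: str, variants: dict[str, str]) -> Counter:
    """Count place-name mentions, folding spelling variants together."""
    # Longest spellings first, so "London Bridge" is preferred over "London".
    spellings = sorted(variants, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in spellings) + r")\b", re.IGNORECASE
    )
    return Counter(variants[m.group(1).lower()] for m in pattern.finditer(text))


text = "They rode from Londan to London Brigge, then on to Southsex, and back to London."
print(count_places(text, load_variants(GAZETTEER_XML)))
```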

This process has taught me that Digital Humanities is a lot of trial and error. In doing this research, I’ve learned there might not be one perfect tool for a project, but combining different resources and collaborating with others allowed me to find an innovative solution. This experimentation and sharing of ideas and research is vital to the work we do as Digital Humanities Associates.

Learning How to TA

This term, I’m making my first foray into being an in-class Teaching Assistant (TA). In past terms I’ve worked as an out-of-class TA, holding office hours and offering outside support, but this is my first time actually attending class. This means there are some new things for me to figure out, but there are also some things I learned from being a TA last term that still apply.

Ana and I were out-of-class TAs for a Classics course last term, and I learned some important things from that experience. One thing I always try to do now when I’m working with a student is check what they already know. Immersed as we DHAs are in the world of metadata, I didn’t think to explain what metadata itself was. But pretty early on we got that question: what exactly is metadata? And once we got that question, it made sense. Metadata was not something they were studying in class, so there was no expectation that they would know what it was. After that, I made sure to check what students knew about Omeka and metadata first, so I would know where it would be most helpful to start. Of course, there is also a flip side to this problem: if a student is already familiar and comfortable with metadata, there’s no need to explain it. So I always found it most helpful to check before beginning any explanation, so I could meet the student where they were.

An aspect of being a TA that is absolutely new to me is being in class with the students. On Wednesday there was time in class for students to work on an assignment in pairs. I was a bit shy about going up to the students while they were working, and at first I just wandered around and waited for someone to ask a question. After a little while, I realized that actually approaching the students was more helpful. When I just wandered past, the students wouldn’t ask any questions, but if I prompted them with a simple “how’s it going for you?” they frequently would. So although I was shy about asking them directly, it was more productive for both of us if I did. I’m still trying to get more comfortable in my new role, but I’m learning some good approaches along the way to provide the most helpful assistance to both the professors and the students.

See some of the work the class has been doing on the blog!

Course Blog for Bringing the English Past to Virtual Life

Neat things you can do with Neatline

I have spent a large portion of my work time this term looking at mapping tools and ways of visually representing different kinds of geographic data. We used ArcGIS to create a map of Bede’s England (you can read more about it here), and that worked really well for the purposes of the project: it allowed us to denote different types of objects (e.g., a town, a monastery) with different symbols, link pages that alluded to those places, and give a clear visual representation of the objects. However, as much as I love ArcGIS (I think it’s really cool!), one can imagine wanting to do certain things with maps that it’s not perfect for.

This term I started working on the Carleton Guide to Medieval Rome, to which students on the Carleton Rome OCS program contribute pictures and stories about different monuments and places in the Medieval city. The goal is for site visitors to be able to go on a particular “walk” and, along the way, learn about the pieces of Medieval Rome they encounter. So I started looking for a platform that might be better suited to this goal than ArcGIS and found a really nifty one: Neatline, an exhibit builder that uses Omeka items to create interactive stories including timelines, pictures, and georeferenced historical maps (I’ll explain what that means below). I have only started playing around with it but have already discovered a wide array of really great tools.

First, you can use actual historical maps (scanned images of old maps, that is) to overlay the basemap. That is a fantastic visual way of integrating the ancient or Medieval city into a modern one, and of bringing a historic map back to life! I used an 1830 map of Rome from the David Rumsey Map Collection above (“1830 is not Medieval!” you might astutely observe. You are totally right: I picked this one as an example mostly because I liked how it looked). To do that, I had to georeference the map (a fancy term for identifying several common points on both maps) using a very straightforward online tool called MapWarper, and then export it to Neatline.
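
Georeferencing software uses those shared points to work out how to stretch, rotate, and shift the scan onto real-world coordinates. One common approach is a first-order (affine) transformation, where longitude and latitude are each a linear function of the image’s pixel position; this is not a claim about exactly what MapWarper does by default. With exactly three non-collinear control points, the six coefficients can be solved directly, as in this minimal Python sketch. The helpers det3, solve3, and fit_affine are hypothetical, and the control points are synthetic, not taken from the 1830 map. With more than three points, a tool would typically fit the transformation by least squares instead.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ [a, b, c] = rhs using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; pick points that form a triangle")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]  # copy, then swap in the right-hand side
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Fit lon = a*px + b*py + c and lat = d*px + e*py + f from three
    ((pixel_x, pixel_y), (lon, lat)) control points."""
    m = [[px, py, 1.0] for (px, py), _ in control_points]
    a, b, c = solve3(m, [lon for _, (lon, _lat) in control_points])
    d, e, f = solve3(m, [lat for _, (_lon, lat) in control_points])
    return lambda px, py: (a * px + b * py + c, d * px + e * py + f)


# Synthetic control points: pixel position on the scan -> (lon, lat).
points = [
    ((0, 0), (12.40, 41.95)),
    ((1000, 0), (12.45, 41.95)),
    ((0, 1000), (12.40, 41.90)),
]
to_geo = fit_affine(points)
print(to_geo(500, 500))  # approximately (12.425, 41.925)
```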

Then I added a couple of objects onto the map; in the screenshot above you can see the Temple of Hadrian, or Hadrianeum. I first created a new item in Omeka and then imported it into the Neatline exhibit. You can import all items with a certain tag, which will be handy if a lot of images need to be added. Neatline places the point on the map for you if you include the coordinates in the metadata, though that part turned out to be much more confusing than it sounds. It turns out that the coordinates need to be in a very specific format, WKT (Well-Known Text). Wikipedia told me that the format for a point is POINT(# #). Unexpectedly, when I entered the lat and long numbers, Neatline placed the Temple of Hadrian…in the ocean off the west coast of Africa. After some frustratingly futile googling, I found out that the WKT coordinates need to be in a coordinate system called Pseudo-Mercator (also known as Web Mercator), and that the lat and long values need to be flipped (I’m still puzzled by all of that).
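
Here is a minimal Python sketch of that conversion, assuming (as described above) that Neatline wants points in Pseudo-Mercator (EPSG:3857) metres written as WKT. It uses the standard spherical Mercator formulas with the WGS 84 equatorial radius of 6,378,137 m; the to_pseudo_mercator and wkt_point helpers are hypothetical names. WKT lists x before y, so longitude comes first, which is the “flip” relative to the usual lat/long order. This also likely explains the ocean mishap: raw degree values read as metres land only a few dozen metres from (0, 0), where the equator meets the prime meridian in the Gulf of Guinea. The coordinates in the example are approximate values for central Rome, for illustration only.

```python
import math

# WGS 84 equatorial radius in metres, used by spherical (Pseudo-)Mercator.
EARTH_RADIUS_M = 6378137.0


def to_pseudo_mercator(lat: float, lon: float) -> tuple[float, float]:
    """Project WGS 84 degrees to Pseudo-Mercator (EPSG:3857) x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def wkt_point(x: float, y: float) -> str:
    """Format a WKT point; WKT always lists x (easting) before y (northing)."""
    return f"POINT({x:.2f} {y:.2f})"


# Approximate coordinates in central Rome, for illustration only.
x, y = to_pseudo_mercator(lat=41.9, lon=12.48)
print(wkt_point(x, y))  # the longitude-derived value comes first
```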

In addition to adding points on the map, you can also add lines and geometric shapes. Lines of different colors and points could be used to represent different walks across the Medieval city of Rome – I’m excited to try that and see how it turns out!

 

Getting the ball rolling on Unity

While my colleagues have been migrating content from our old website to our new, not-yet-live one, I’ve been learning my way around the Unity game engine, a major part of Project Workhouse. I started from scratch with only a vague knowledge of C#, the language used to write custom scripts in Unity. In the past week I’ve worked through the “Roll a Ball” tutorial and started importing assets like textures and materials and using them to good effect in a 3D game environment. – Bard

Here are some snapshots from my version of the “Roll a Ball” tutorial: