Will’s Introduction

Hi all, I’m Will Shrestha! I’m from Saint Paul, Minnesota. I’m a junior Computer Science major and Digital Arts and Humanities minor. I worked as a TA for Hacking the Humanities the past two terms and learned about Digital Humanities Associates from Austin Mason. This will be my first term working as a Digital Humanities Associate. I hope that this job will help me experience working with a team in a professional setting and also teach me more about intersections between computer science and the humanities.

My headshot

Some other academic interests I have are music and psychology. I take saxophone lessons here at Carleton and am currently enrolled in Cross-Cultural Psychology, which I think is really cool because it discusses how psychology differs across cultures around the world.

As for extracurriculars, I am a member of the Carleton Karls (ultimate frisbee team), MOSAIC (Mosaic of South Asian Interests at Carleton), Mixed Club, and Club Tennis.

Some non-Carleton related interests of mine are my pets and playing video games with my friends. I have two dogs and two cats (one cat, Scoop, is living on campus with me as an ESA). Two of my favorite video games at the moment are Bloons Tower Defense 6 and Stardew Valley.

Here’s a picture of Scoop in my dorm!

Picture of my cat, Scoop

Using Generative AI for DH

Digital Humanities (DH) is brimming with passionate individuals eager to explore the depths of human culture and history through the lens of technology. Now, a revolutionary tool is transforming the way these enthusiasts approach their work: artificial intelligence (AI). This article delves into the exciting applications of AI in the DH landscape, exploring how it can empower people to work smarter, delve deeper, and unlock new avenues of understanding. I will show various ways in which AI can be used for DH work.


While AI cannot replicate the human touch and creativity fundamental to writing, it offers a diverse toolbox that can significantly enhance the writing process. From overcoming writer’s block and generating initial ideas to conducting research, checking grammar and style, and even exploring different writing styles, AI provides writers with a range of valuable tools to streamline their work and fuel their creative exploration. However, it’s crucial to remember that AI serves as a collaborator, not a replacement, in the writing journey. It is the human writer who ultimately wields the power of the pen, harnessing the capabilities of AI to refine their craft and unleash their unique voice.

Generative AI such as ChatGPT and Google Gemini can help in multiple ways with writing. They can make points for your blog, essay, or post. They can correct any grammatical mistakes. They can rewrite certain sentences. They can get you started with writing by writing the introduction to your piece. The figure below shows part of a response by Google Gemini AI when I asked it if AIs can write.

While one cannot fully rely on AIs to write, they are certainly very useful as writing tools, especially when you provide them with your own ideas. There are copyright implications when AIs are used to generate images, but these concerns are much smaller when AIs are used for writing, especially when the human user supplies a unique idea for the AI to write from.


AI has revolutionized the field of translation by offering a suite of powerful tools and techniques that enhance the efficiency and accuracy of the translation process. Machine translation systems, such as Google Translate and DeepL, employ advanced algorithms like neural machine translation (NMT) to translate text between languages. These systems continuously improve through machine learning, analyzing vast amounts of translated data to refine their translations and capture nuances more effectively. Furthermore, generative AIs such as Gemini and ChatGPT have their own approach to translation that is distinct from tools like Google Translate. AI-driven translation memory tools, like SDL Trados and MemoQ, store previously translated segments and suggest them to translators when they encounter similar content. This not only accelerates translation but also ensures consistency across documents and projects. Natural Language Processing (NLP) techniques further enhance translation quality by enabling AI systems to understand and generate human language more accurately. NLP algorithms analyze sentence structures, grammar rules, and contextual clues to produce translations that are contextually relevant and linguistically precise.

In addition, AI assists in managing glossaries and terminology databases, ensuring consistency of terminology throughout translations. These tools automatically identify and suggest appropriate translations for specific terms, reducing errors and maintaining coherence. AI can also aid in post-editing machine-translated content by providing suggestions for improving fluency, readability, and accuracy. Post-editing tools analyze translated text and offer alternative phrasing, correct grammatical errors, and highlight potential mistranslations for human editors to review and refine. Moreover, AI-driven content generation platforms assist in creating multilingual content by automatically translating existing texts into multiple languages. While these systems may not match the quality of human translation entirely, they serve as a valuable starting point for further refinement by professional translators. Overall, while AI has significantly streamlined and enhanced the translation process, human translators remain essential for tasks requiring cultural understanding, creative adaptation, and linguistic nuance, ensuring the highest quality of translation output.

Many such services are still under development, and free access is limited to ChatGPT and Gemini, but in the future we can expect wider access to tools that will significantly increase the speed and accuracy of translation. This can have major implications for DH work in various languages and for creating multilingual DH projects.

Image Generation

The realm of visual creation is undergoing a dramatic shift with the emergence of AI-powered image generation. This innovative technology empowers users to translate their written descriptions into stunning visuals, spanning the spectrum from photorealistic landscapes to abstract artistic expressions. Tools like DALL-E and Midjourney allow users to describe their desired image using specific keywords and phrases, prompting the AI to generate visuals in various styles, color palettes, and compositions. These tools unlock a universe of possibilities for artists, designers, and even casual users, enabling them to bring their creative visions to life in an entirely novel way. However, it’s crucial to acknowledge that AI image generation is still in its infancy. While tools like Stable Diffusion offer advanced customization options like image size and specific details, ethical considerations remain paramount. Concerns regarding potential biases within the training data and the ownership of AI-generated artwork are crucial aspects of this rapidly evolving technology. As this technology continues to develop, addressing these concerns will be essential to ensure its responsible and ethical application in the realm of visual creation.

If these ethical concerns are settled, something which seems unlikely, then these image generation AIs can prove to be very helpful for DH work, helping us create pictures and illustrations. OpenAI is now even testing video generation which can prove to be even more useful and help with a variety of DH projects.


Another field in which AI can be very helpful is generating code. AI is revolutionizing code generation, aiding developers in various tasks. Through neural networks, it offers auto-completion tools, speeding up coding with intelligent suggestions. It also assists in code synthesis from high-level specifications, enabling faster development. AI aids in refactoring and optimization by identifying inefficiencies and suggesting improvements. Additionally, it facilitates rapid prototyping by generating and refining code iteratively. Despite challenges, AI promises to reshape software development, making it smarter and more efficient. The figure below shows what Gemini AI gave as output when I asked it for some code.


In conclusion, I think it is important to acknowledge the various ways in which AI can help us make our work better and more efficient. At the same time, there are technical and ethical concerns attached to it. Technical concerns include that the writing style of AI differs from that of humans, the code it generates might be wrong, the images might have problems, and the translations it outputs might contain errors. In the end, we need to find the errors and correct them. That is where the human factor remains very important.

Noah’s Introduction

Hello! My name is Noah and I am from Kuala Lumpur, Malaysia. I am a first-year student interested in Computer Science and Statistics as a major – pursuing a career in software engineering or data science. Aside from computers, I am also interested in History – European history in particular – and I’m also on the swim team here. This will be my first year working in Digital Humanities and I am really excited to work on projects and get to know everyone here. I like making things with computers and solving problems so I hope this position will expand on what I’ve learned in the classroom and improve my skills in the workplace.

I’m interested in data analysis and modeling. In my limited work in stats so far, I have really enjoyed presenting data in engaging ways and finding relationships that people would otherwise overlook. Aside from that, I hope to understand Omeka and WordPress better and to make, and help other people make, awesome things.

Aside from that, I like pets and beaches, and I can’t wait to experience my first Minnesota winter. I am also a lifeguard who works at West Gym on Mondays and Tuesdays at 11am, so come to West Gym at those hours! Malaysia’s really awesome, so if you have any questions about it, or really anything else about Asia or cats, feel free to ask me! I have 3 cats (Hiroshi, Leo, and Diana) and my girlfriend has like 9 cats, so I know quite a lot about them.

Cat named Putih

Using Gale Digital Scholar Lab: Utilizing n-grams

An introduction to GDSL and its tools has already been given in a previous blog post. In this post, I will attempt to explain the utility of another GDSL tool, namely n-grams. An n-gram is a contiguous sequence of n items from a given sample of text or speech. These items can be characters, words, or even other units like phonemes or syllables, depending on the context. N-grams are widely used in natural language processing (NLP) and computational linguistics for various tasks, including language modeling, text analysis, and machine learning.

The “n” in n-gram represents the number of items in the sequence. Commonly used n-grams include:

  1. Unigrams (1-grams): These are single items, which are typically individual words. For example, in the sentence “The quick brown fox,” the unigrams are “The,” “quick,” “brown,” and “fox.”
  2. Bigrams (2-grams): These consist of pairs of adjacent items. In the same sentence, the bigrams would be “The quick,” “quick brown,” and “brown fox.”
  3. Trigrams (3-grams): These consist of sequences of three adjacent items. For the same sentence, the trigrams would be “The quick brown” and “quick brown fox.”

N-grams are often used in language modeling to estimate the probability of a specific word or sequence of words occurring in a given context. They are also used in various NLP tasks, such as text generation, machine translation, and sentiment analysis. N-grams provide a way to capture some of the context and relationships between words in a text, which can be useful for many language-related applications.
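The unigram, bigram, and trigram examples above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the items are words split on whitespace, and the function name ngrams is my own choice for illustration, not something GDSL provides.

```python
def ngrams(text, n):
    """Return the word-level n-grams of text, in order of appearance."""
    words = text.split()  # assumption: items are whitespace-separated words
    # Slide a window of n words across the sentence.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The quick brown fox"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```

A sentence with fewer than n words simply yields an empty list, since no window of that size fits.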

In GDSL, the n-gram analysis can be used in two ways:

  1. Word Cloud: Word Cloud is a visual representation of a collection of words, where the size of each word is proportional to its frequency or importance in the text. Typically, word clouds are used to quickly and visually convey the most prominent words in a piece of text, making it easy to identify the most common or significant terms at a glance.
  2. Term Frequency: Term Frequency (TF) is a fundamental concept in natural language processing, information retrieval, and computational linguistics. It is a quantitative measure of how often a specific term or word occurs within a document or text corpus, which helps in assessing the term’s significance and relevance in a particular textual context. In essence, TF offers a means to quantify the emphasis placed on individual terms within documents.
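To make the idea of term frequency concrete, here is a small Python sketch that counts words in a short text. The sample sentence is invented for illustration, and the tokenization (lowercasing and keeping only letters and apostrophes) is an assumption of mine; GDSL's own cleaning and counting steps may differ. A word cloud would then draw each word at a size proportional to these counts.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase word occurs in text."""
    # Assumption: a "term" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Invented sample text, not taken from any GDSL content set.
sample = "War in the north. War in the south. Peace talks resume."
tf = term_frequencies(sample)
total_terms = sum(tf.values())

# Relative frequency of each term, which a word cloud could map to font size.
relative = {word: count / total_terms for word, count in tf.items()}

print(tf.most_common(3))
print(round(relative["war"], 3))
```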

Both these tools can provide a useful way to understand the main concepts, ideas, and words in a textual corpus. Here is an example of a word cloud made from our test content set.

To make n-gram analysis more precise, search qualifiers can be used. First, create a content set CS with parameters X and Y. Then generate a hypothesis Z about CS. Z could be about the influence of another factor, an explanation behind certain events, or a correlation with other factors. Once the hypothesis has been generated, incorporate it into your search by adding another parameter that corresponds to Z. The new content set created by parameters X, Y, and Z is a subset of the prior content set. Analyzing the documents that are in XY but not in XYZ (the complement of XYZ within XY) gives insight into what data was left out when parameter Z was introduced. This can usually aid in identifying different clusters of data within the same corpus. Here the word clouds also help with visual identification, since they will look different for the two content sets.

For example, take the first word cloud, from the content set with parameters X and Y, where X = Pakistan, Y = War, and the operator is AND. The hypothesis was that this content set contains two clusters: one that reports the war between India and Pakistan and another that reports the war between Pakistan and Afghanistan (and the Soviet Union). To check this, parameter Z was added (Z = India), and the difference between XY and XYZ was analyzed. As expected, “Soviet” is not found in XYZ but does appear in XY. This supports our hypothesis.
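The comparison between the XY and XYZ content sets is essentially a set difference, which can be sketched in Python. The three toy documents below are invented purely for illustration (they are not real GDSL results), and the helper name vocabulary is hypothetical.

```python
def vocabulary(docs):
    """Return the set of lowercase words used across a mapping of id -> text."""
    return {word for text in docs.values() for word in text.lower().split()}


# Invented toy content set XY (X = Pakistan AND Y = War).
xy_docs = {
    "d1": "india pakistan war border",
    "d2": "pakistan india war ceasefire",
    "d3": "soviet afghanistan pakistan war",
}

# Content set XYZ: the same search with the extra parameter Z = India.
xyz_docs = {doc_id: text for doc_id, text in xy_docs.items()
            if "india" in text.split()}

# Documents and terms that are in XY but dropped once Z is added.
excluded_ids = set(xy_docs) - set(xyz_docs)
missing_terms = vocabulary(xy_docs) - vocabulary(xyz_docs)

print(sorted(excluded_ids))
print(sorted(missing_terms))
```

In this toy data, the terms that disappear when Z is added point to the second cluster, which is the same reasoning applied to “Soviet” above.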

Although this might be a little complex, it can help greatly in understanding and qualifying data.

Where the missing term appears in the XY results (for example, how large it is in the word cloud) can also indicate how frequent it is in XY as a whole.

Using Gale Digital Scholar Lab: Achieving Precision In Document Clustering

One tool that can be used for Digital Humanities is the Gale Digital Scholar Lab (henceforth: GDSL). GDSL is a database of various texts that can be used for analyzing, finding, cleaning, and organizing data using natural language processing (NLP). The toolset for textual analysis provided by GDSL includes document clustering, named entity recognition, n-grams, parts of speech, sentiment analysis, and topic modeling. All these analyses can be used to understand and categorize data in different ways. Such analyses are useful for scholars who aim to study trends and correlations in texts of a particular type. Currently, Carleton has access to 21 textual databases including American Fiction 1774-1920, American Historical Periodicals from the American Antiquarian Society, Archives of Sexuality and Gender, Archives Unbound, British Library Newspapers, Decolonization: Politics and Independence in Former Colonial and Commonwealth Territories and more.

In this post, I aim to study one of the tools provided by GDSL and present ways to make it more precise and use more of its capabilities. This tool is Document Clustering. To begin document clustering, we first need to search for appropriate data that can be used to create a Content Set. The Advanced Search feature can be used to generate Content Sets with specific characteristics. Search operators and special characters can further help in creating precise content sets.

Combining different search terms, operators, and special characters produces an appropriate dataset. One important operator is “word1 nx word2”, where x is the maximum number of words allowed between word1 and word2. For example, if you want to see all the sources in which “Ireland” is mentioned within 10 words of “Finland”, you can search “Finland n10 Ireland”. After searching, you will see all your results, and they can be added to the content set by selecting the “Select All” and “Add To Content Set” options.
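To illustrate what a proximity search like “Finland n10 Ireland” is checking, here is a rough Python sketch. It assumes the two words may appear in either order and that x counts the words between them; exactly how GDSL counts distance is an assumption on my part, and the function within_n_words and the sample sentence are made up for this example.

```python
import re


def within_n_words(text, word1, word2, n):
    """True if word1 and word2 occur with at most n words between them."""
    tokens = re.findall(r"\w+", text.lower())
    positions1 = [i for i, tok in enumerate(tokens) if tok == word1.lower()]
    positions2 = [i for i, tok in enumerate(tokens) if tok == word2.lower()]
    # abs(i - j) - 1 is the number of words strictly between the two hits.
    return any(abs(i - j) - 1 <= n for i in positions1 for j in positions2)


# Invented sample sentence with five words between the two country names.
text = "Finland signed a trade agreement with Ireland last year."

match_n10 = within_n_words(text, "Finland", "Ireland", 10)
match_n3 = within_n_words(text, "Finland", "Ireland", 3)

print(match_n10, match_n3)
```

A smaller x makes the search stricter, so fewer documents qualify for the content set.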

Once you have created the Content Set, it can be used for further analysis. As you can see, I got 53 results and added all of them to a test content set. Now, I will use the Document Clustering tool on this content set. The tool can be accessed via My Content Sets > Analyze > Document Clustering.

By clicking the “Run” option, you will be able to run the analysis on the given dataset. I have run a basic analysis on my dataset. Now, I will show you how the output of the analysis can be better understood and utilised to the best extent. This is the initial output of my first run with two clusters.

Please note that GDSL does not tell you what the y-axis or x-axis is but there are ways you can understand the output in a more comprehensive manner. The very first thing to do is to just manually compare and contrast the data points available in the two clusters. I attempted to do this with the clusters I generated. I saw that cluster 2 (the orange cluster) contained more philosophical works whereas cluster 1 (blue cluster) contained more general works such as history, literature and news. This gives me a general idea of what the x-axis (or perhaps the y-axis) might mean for this graph. The higher the x value, the more philosophical the work might be.

Another good way to understand the output is to increase the number of clusters. You can change the cleaning configuration and the number of clusters by going to Document Clustering > Tool Setup (grey toolbar on left) > Cleaning Configuration/Number of Clusters. Below is the setup I used for my second test run. Rather than using only 2 clusters, I used 3.
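To show how changing the number of clusters regroups the same data, here is a small Python sketch of k-means clustering on made-up 2-D points. This is a stand-in for illustration only: I am assuming a k-means style approach, the points are invented rather than taken from my content set, and the evenly spaced starting centroids are a simplification (real tools typically use randomized starts).

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster 2-D points into k groups; returns (labels, centroids)."""
    # Deterministic start: pick k evenly spaced points from the input.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


# Invented points standing in for documents plotted on the cluster graph.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

labels2, _ = kmeans(points, 2)
labels3, _ = kmeans(points, 3)

print(labels2)
print(labels3)
```

With 2 clusters, two of the three visible groups are forced to share a label; with 3 clusters, each group gets its own, which is why raising the cluster count can reveal finer distinctions.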

The graph generated for 3 clusters looked like this:

Given this graph, I aimed to find the main difference between the three clusters. I found that the third cluster only included magazines. The second cluster also included magazines, but of a more academic than literary nature.

In addition to this, you can revisit your dataset and search for terms in it. This can also help you find out what classifications are being made in the clusters. It will not always be obvious what a cluster contains, but a close look and analysis can provide more information.

Tonushree’s Introduction

Hello! My name is Tonushree and I’m a senior from Mumbai, India. I am majoring in Cognitive Science with a Digital Arts and Humanities minor. When I’m not studying/working, I love creating art, exploring various nooks and crannies of campus and Northfield with my friends, and getting coffee with someone different every week to keep me on my toes. I also enjoy acting in Carleton productions. Apart from being a Digital Humanities Associate, I am also a Costume Shop Assistant at the Theater and Dance department and an Office Assistant for the Psych and Cog-Sci department!

Here is a picture of me soaking up the sun during my study abroad, Studio Art in the South Pacific 🙂

I thrive in interdisciplinary settings, where a project, theory, or concept needs to be understood through several different lenses, be it philosophical, sociological, economic, linguistic, or digital, to name a few. I’m excited by the fact that being a DHA gives me the opportunity to simultaneously dabble in multiple humanities classes and work on or observe the digitization of their projects. I would love to further my interests in art, storytelling and creative visualization by learning how to turn them into digital projects during my time as a DHA!

Henry’s Introduction

Hello! I’m Henry, a sophomore from Minneapolis, MN. Here at Carleton, I’m interested in computer science, cognitive science and economics/public policy. This is my first year as part of the DHA team, and I’m excited to start getting involved in some Digital Humanities projects. While I love the theoretical/technical side of computer science, using these techniques as tools to build interesting projects has always been more rewarding for me than just theory alone. I’m hoping in my role as a DHA, I will have the opportunity to apply my formal knowledge and experience in web design and app building to projects across Carleton’s humanities disciplines. 

headshot of Henry

I’m interested in the process of distilling large amounts of information into digestible analysis. This year, I’m hoping to learn more about how to do this well with digital tools. I’m also interested in learning more about visual storytelling with maps and other interactive web-based experiences.

Outside of school, I love playing music and spending time outside. On campus, I’m a board member for KRLX (our student-run radio station), a software developer on DataSquad, I live in Carleton’s Outdoors House (Wade house), and I can often be found in the climbing gym, exploring the Arb (by ski or foot), or plotting overly complex pranks on my friends. 

Cynthia’s Introduction


My name is Cynthia Leng and I am a junior Statistics major from Beijing, China. This is my first year working as a Digital Humanities Associate (DHA), and I am really excited about the interesting projects I will be working on.

I got interested in digital humanities when I started working with the Carleton Archives. My role in the Carleton in China project introduced me to digital tools and content management systems, and I am eager to learn more about them as a DHA.

In my free time, I sing with the Carleton Choir and a Chinese a cappella group. I also enjoy traveling, watching movies, and playing badminton with my friends.

I love watching sunsets.

Thanks for reading about me!

Erin’s Introduction

My favorite part of Carleton is that there’s plenty of space to follow curiosity – no matter what discipline the topic falls into. I’m excited to work as a Digital Humanities Associate this year because I love helping other people bring their curiosity to life. Selfishly, I also get to see more cool projects.

As an example, last Spring I began working as a DHA, and I worked on the 3D model viewer for CARCAS, which is an archaeology department project to display high quality scans of bones. The best moment of that project was showing Sarah, the faculty member leading the project, that the mobile version of the site lets you virtually put the bones on real world surfaces around you using your phone camera. She was delighted! I’m so proud that I was part of what made that moment happen, and I’m looking forward to more moments like that this year.

The author appears to be holding a goat skull, which is actually the AR component of the CARCAS model viewer.
Here I am, holding a goat skull! In reality, nothing was in my hands; this is a feature of the mobile version of the CARCAS model viewer that I worked on last spring. Photograph by Cynthia Leng.

A little bit more about me – I’m a senior, and I’m a math major and digital arts and humanities minor. I love all angles of math, from the beauty and symmetry of abstract math to the nitty gritty computational considerations of working with real data. I also keep coming back to maps across a variety of academic fields, from medieval maps to storytelling maps made in the present day.

I’m looking forward to a great year working as a DHA!

Reflections on Liberal Arts and Sample Site

“You can know something about everything.” These were my thoughts when I was applying to Carleton. The idea of Liberal Arts really fascinated me. I am the kind of person who tries to know at least something about everything. I think Liberal Arts are really important and I am very happy that I was able to play a small part in promoting them.

This term I worked on making an Omeka Sample Site that serves two functions: 1) provide a sample site to Carleton students and the community; and 2) provide some basic information about liberal arts. The site can be accessed here. This site was also added to Reclaim’s EdTech resource list:

The website can be accessed by students who aim to work on their own Omeka projects in the future. The main page of the website looks like this:

I think Liberal Arts are important and help students become critical thinkers and solve novel problems. With the increasing influence of technology, it is important that we incorporate these technological advancements into our humanistic, literary and scientific studies. I really enjoyed working on this project and I hope it is helpful for students as well.

