What Exactly Does the Internet Know About You?

I get it – Facebook, Twitter, Google – they own me. They have all my data: the ads I click, the things I search, the pages I visit. The implications of this lack of privacy had been unfolding slowly, and with a dampened sense of urgency, until the recent Cambridge Analytica revelations; now people are realizing, too late, how valuable their data is.

But here’s the thing – although I understand that the broad implications of this privacy breach are very serious, on a personal level I just find it, quite frankly, a little difficult to care. I’m right on the edge of the generation that was thoroughly indoctrinated by the internet from Day 1. I’m too young to remember dial-up, but old enough to remember when the iPhone came out; one of my earliest memories is my parents getting their first cell phones (my mom had the iconic and beloved Motorola Razr), but I never hung out in AIM chat rooms or had a MySpace. So yes, I vaguely remember a world without internet surveillance, but I came of age in the midst of this new era, so for me it’s just reality; it’s the price you pay for (monetarily) free social media and access to unlimited amounts of information. Anyone younger than me won’t even know life without this surveillance. If nothing else, it’s mildly comforting to know that Google’s got everyone’s data, not just mine.

But not caring is a dangerous pattern to fall into, because it’s fine until it’s not fine. It’s fine when Facebook just knows that I like watching videos about artisanal chocolate making, but it’s not fine when widespread demographic targeting influences a presidential election, which is to say, it’s fine until we notice it. And at that point it’s too late.

The fact is, unless you’re willing to become a recluse or forgo many of the incredible advantages of the internet and mobile technology, there isn’t an enormous amount any of us can do except be careful about what we post, click on, and search for (which you should always be doing anyway). But the one thing you can do is stay educated. In the wake of the Cambridge Analytica scandal, I believe it’s important for everyone to know exactly what information various sources have on them. It won’t stop them from using it, but it may make you aware of how targeted advertising is affecting your online experience.

Most social media outlets allow you to download an archive of the data they have on you; they just often make it very difficult to find. Here’s a guide to how to get some of that data:

To get all the data Facebook has on you…

Go to Settings > Click “Download a copy of your Facebook data” (in tiny print at the bottom of your settings)

To find out how Facebook categorizes you…

Go to Settings > Ads (on the left sidebar navigation) > Your Information > Your Categories

To get all the data Google has on you…

Go to myaccount.google.com > Control Your Content > “Create Archive” > Pick what you want in the archive and click “Next” > Choose file settings and click “Create Archive”

*A note about Google’s data archive: For all the work Google puts into making sure your Google calendar syncs seamlessly with your Google Gmail and your Google Docs are all stored in one big happy Google Drive, Google clearly isn’t invested in making sure your Google archives experience is just as convenient. A lot of important and interesting Google data, like your entire search history, is just tossed into a JSON and handed over to you. What about the vast part of the population that doesn’t know what a JSON is? Doesn’t know how to read a JSON? Doesn’t know how, or doesn’t have the tools, to open a JSON on their machine? Google, you could do better.
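If you do end up staring at one of those JSON files, a few lines of Python can turn it into something readable. Here is a minimal sketch of the idea. The sample data and the field names (`query`, `timestamp`) are made up for illustration; the layout of a real archive may well differ, so check your own file before adapting this.

```python
import json
from collections import Counter

# Hypothetical sample standing in for an exported search-history file.
# The structure and field names of a real archive may differ.
sample = """
[
  {"query": "artisanal chocolate making", "timestamp": "2018-03-01T09:15:00"},
  {"query": "what is a json file", "timestamp": "2018-03-02T20:41:00"},
  {"query": "artisanal chocolate making", "timestamp": "2018-03-05T13:02:00"}
]
"""

def top_searches(raw_json, n=10):
    """Return the n most frequent search queries as (query, count) pairs."""
    records = json.loads(raw_json)
    counts = Counter(record["query"] for record in records)
    return counts.most_common(n)

# Print each query with the number of times it was searched.
for query, count in top_searches(sample):
    print(f"{count:>3}  {query}")
```

To try this on a real file, you would read the file's contents into a string (for example with `open(path, encoding="utf-8").read()`) and pass that string in place of `sample`, adjusting the field name to whatever your file actually uses.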

To find out how Google categorizes you and what ads they think you like…

Go to adssettings.google.com.

To get all the data Twitter has on you…

Go to Settings and Privacy > Your Twitter data (left sidebar navigation) > Scroll all the way to the bottom and click the small print that says “Request Your Data”

To get all the data Snapchat has on you…

(Don’t panic – this doesn’t include every snap you’ve ever sent. It’s mostly account info and statistics, ads you’ve interacted with, and timestamps of every snap you’ve sent, with the actual photos redacted. Oh right, it does include Snaps you’ve recently submitted to Our Story, though. Every. Single. One.)

Go to accounts.snapchat.com > Click “My Data” > Scroll to the bottom and click “Submit Request”

Presenting Interdisciplinary Research

This winter term, I double compsed (for any non-Carleton readers: “comps” is the equivalent of a senior thesis or capstone project – it stands for “comprehensive exercise”). For both of my comps, one in computer science and one in English, I was lucky enough to have the opportunity to do digital humanities projects, but this posed a problem when I was required to give a presentation for each project at the end of the term.

For both presentations, my audience was a mix of humanities people, computer science people, and people who lie somewhere in between. How do I give a presentation that accommodates my entire audience? How do I explain the tech to the humanities folks, and contextualize the humanities for the tech folks?

Here are some rules for interdisciplinary presentations that I created for myself while planning my comps presentations:

Either explain jargon or put it in a black box. Combining tools from multiple disciplines is going to cause a vocabulary problem. You can’t say, “I ran text files of each novel through a Python script that used the NLTK’s POS-tagger to tag each word, then iterated over the tagged tuples to count occurrences of different parts of speech,” and expect anyone who’s never coded before to follow. Either take the time to explain what the NLTK’s POS-tagger is, or just say “I used a tool to get the part of speech of every word in the text.” The same goes for humanities lingo – make sure your entire audience clearly understands what close reading or deconstruction is before using those terms to contextualize your results.

Signpost. In an interdisciplinary presentation, it’s reasonable to expect that at least part of your audience is going to get lost at some point, unless you’re going out of your way to explain every STEM concept and humanities context (which would make for a very long, very boring presentation). But that’s ok! Divide your presentation into clearly defined sections, and at the beginning and end of each section, talk about what you’re going to explain or have just explained, so that everyone can grasp the broader concepts. Even if someone gets lost within a section, with signposting they’ll hopefully be able to jump back in in the next section.

Include something for everyone. If you’re giving an interdisciplinary presentation, it should be truly interdisciplinary! Acknowledge the different subgroups of your audience and make them feel like they are a part of the conversation by including details from each discipline of your project, and not over-explaining as if they weren’t there. This rule almost contradicts my first rule, and the two can be hard to balance. The goal is to find a happy medium for each discipline between including enough interesting detail for the experts and enough explanation for those unfamiliar with the discipline.

To Map or Not to Map?

During this year’s fall DH training, the DHAs got some practice using ArcGIS, an online mapping tool that makes it relatively easy to create your own customized maps in one sitting. This post discusses some of the pros and cons, advantages and pitfalls of mapping data. (Note that by mapping I am referring strictly to the use of geospatial maps, not to the more general application of the term that includes graphing.)

Why use a map? Mapping is fun and exciting, and it’s a relatively easy way to build a data visualization that’s interactive and easily facilitates instantaneous spatial comprehension of the data. For these reasons, people are often quick to jump on the “let’s map it!” train whenever there is spatially relevant data. But it’s important to stop and ask this question first: what will a map add to this project that other data visualizations will not? Sometimes, sparsity or lack of variation in your data should disqualify the map idea.

Take this example from Stanford’s Professor Martin Evans, which maps specific locations in and around London that are referenced in works written by authors from London. There’s an abundance of data in this data set, and the locations are spread all over London – mapping helps us understand the data, so mapping was a good choice. If, however, you were mapping only locations in London referenced by Sylvia Plath, you might think twice about whether the <10 data points clustered in one small location are worth putting on an interactive map.

Once you’ve determined that a map is worth your time, you might next consider what kind of spatial information you want to convey. Is the data represented well by points on a map? Or is there a path or order to these points? How can you visually differentiate between different paths or groups of points (hint: colors)? Try to create a map that accurately visualizes the story you’re trying to tell with your data. In this example, students at the Georgia Institute of Technology recreated the paths taken throughout the day by characters in Mrs. Dalloway. The smooth, continuous paths tell a better story than a series of sequential points would, and the colors make each path stand out from the others. Above all else, mapping should make it easier for your audience to understand your data, so think hard about how you’re transferring your data to your map. And use colors!

Don’t forget that an important part of mapping is the base map itself, not just the points you put on it. Much of the time, simpler will be better – if the story you’re trying to tell has nothing to do with the terrain of the area, don’t clutter your visual with a terrain base map. Humanities scholars are often excited about using historical base maps, which are historical maps that can be georeferenced onto a modern, digital map of the same location by matching specific points between the two maps. One common problem with historical base maps is that many historical maps are not geographically accurate, so georeferencing them can stretch and distort them to an unusable extent. For example, this 1853 map of Maine from the David Rumsey Map Collection is quite geographically accurate, and would work well as a georeferenced historical base map, but this 1935 world map of post office and radio/telephone services from the same collection is highly geographically inaccurate and would have to be significantly distorted to be georeferenced onto a modern 2-dimensional map of the world.

Finally, consider how you will communicate the data for each point or path on your map. Points and paths don’t always speak for themselves, and there will often be metadata or a paragraph of information that necessarily accompanies each data point. How will your user access this information? Is there a key that goes with the map? Do you click on a point to reveal the associated text? Does each point link to more information?

There are many ways to address the above issues and questions, facilitating lots of creativity and flexibility within each project. Above all else, no matter how you approach a mapping project, your map should always give a clear and intuitive answer to the question: what story is this map trying to tell?

Welcome Back!

Last week I arrived early on campus to participate in the fall term DHA training. I didn’t get to take part last year because I was abroad in the fall, so it was a new experience for me. There’s one difference between this year and last year that immediately stands out – since I was able to attend the training this year, I had an opportunity to work with and get to know the other DHAs and new DH interns before the term officially started. This was my favorite part of training, and I’m hoping it’ll get us off to a great start this year. I find it so much easier – and more fun! – to work with others when we’ve already eaten deep-fried food from Jesse James Days by the Cannon River together.

At the end of spring term last year I attended the digital humanities conference that I had spent all of spring term helping to organize. Although I was one of only a few students who attended and felt initially intimidated by the sea of “real adults,” I became increasingly aware throughout the course of the conference that I knew what I was doing. I understood a lot of the jargon, I was able to intelligently contribute to conversations, and, most importantly, I felt like I deserved my place in those conversations. In short, it was really cool. A year ago, I wouldn’t have been able to do that. This year, I’m excited to build on that confidence as I expand my DH toolbox. Not too long from now I’ll have to leave Carleton to join the ranks of “real adults,” and I think some confidence will come in handy.

Learning about NLTK

In the past couple of weeks I’ve been helping to update the curriculum for a fantastic project called DH Bridge. This curriculum includes a one-day programming bootcamp for people with no computer science experience (and particularly those who are also involved in the humanities) to learn some basic Python skills. I’ve had so much fun doing the tutorial along the way because it focuses on text analysis using the Natural Language Toolkit (NLTK), which I wasn’t previously familiar with, but which includes some really cool tools for natural language processing. You can download NLTK for free and use the many modules it has available to do text analysis day and night! Here are a few of the things I learned:

  • NLTK has a built-in method for getting word frequencies, and it’ll spit out the n most common words in a text (you decide what n is) along with the number of times that each word appears, in order from most to least frequent. Nothing too complicated – but it’s a great (and very useful) starting place.
  • Want to see the context in which a certain word appears throughout a text? This method takes a single word as a parameter and prints out each instance of that word within its surrounding text. For example, here’s every instance of the word “trial” in Harper Lee’s To Kill a Mockingbird.

This is a great way to get a sense of how a word is being used throughout a text without having to Control+F your way through the whole thing.

  • This one is my favorite because I think it’s so cool. You give it a word and it returns the twenty words that are “most similar” to that word in the text. I haven’t looked too far into how it works, but the method somehow determines which words are most often used in a similar context to the given word. For example, here are the results for the word “trial” in To Kill a Mockingbird.

Some words, like “court” and “newspaper,” are pretty self-explanatory, but we may question why a word like “family” is so closely associated with the word “trial” in this novel.
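To demystify the first two methods a bit, here is a rough plain-Python sketch of the same ideas: counting word frequencies, and pulling out each occurrence of a word along with its neighbors. This is not NLTK's code or its API. The function names (`tokenize`, `most_common_words`, `concordance`) and the sample sentence are invented for illustration, and the tokenizer is much cruder than anything NLTK offers.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def most_common_words(tokens, n):
    """Return the n most frequent words with their counts, most frequent first."""
    return Counter(tokens).most_common(n)

def concordance(tokens, word, window=4):
    """Return each occurrence of word with up to `window` words on either side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == word:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [token] + right))
    return lines

# Made-up sample text, not a quotation from any novel.
text = "The trial began on Monday. Everyone in town came to watch the trial."
tokens = tokenize(text)

print(most_common_words(tokens, 2))
for line in concordance(tokens, "trial", window=2):
    print(line)
```

Seeing the logic spelled out like this makes it clearer why the toolkit versions are so handy: they do the same kind of bookkeeping, with far better tokenizing and formatting, in a single call.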

Even with these very simple searches, it’s already easy to see the kind of information you can get out of a text that the human eye wouldn’t necessarily be able to see. Yay digital text analysis!

How To Do Your Job When You Don’t Know How To Do Your Job

The cool thing about this job is that I get to constantly be doing new things and jumping into new projects. The flip side to this, however, is that each project is unique and requires very different skills – skills that I (very often? most of the time?) don’t yet have. So this term, I’ve been getting used to the fact that not having a skill to do a certain job doesn’t mean I don’t do the job, it means I get to learn how to do it. The question then often becomes, “Where do I even start to learn how to do X?” The following are some tips and tactics I’ve been working on using when I’m faced with a daunting task that I’ve never done before:

  • Just ask. This seems obvious, but it’s often much easier said than done. People don’t want to risk sounding dumb by asking questions, but 1) people probably won’t actually think you’re dumb, and 2) isn’t it better to ask and learn how to do something correctly than spend all your time doing it wrong?
  • Google it, but be smart about it. Again, this seems obvious, but Google is a gift and a curse. Be wary of bad advice (you wouldn’t cite a Buzzfeed article for an academic paper, so why should you take serious advice from it?), and think hard about the search terms you use (be precise, try a variety of related terms, etc…).
  • Pretend that you know what you’re doing. I love this tactic. Sometimes I know that I don’t know what I’m doing, but I don’t know what I don’t know, so I just start working until I get stuck in order to figure out where the problem is. It’s a really great way to pinpoint exactly what you don’t know.
  • Use sites that were created for these situations, like Lynda.com. If you’re a Carleton student, you already have a subscription! Even if you can’t find a video to explain exactly what you’re supposed to be doing, it can help you to get the hang of the general terminology relating to the task at hand or the basic functionality of a tool you’re learning to use.
  • Look for existing examples. Chances are you’re not the first person to do anything, so it’s a great idea to find examples of best practices and conventions. This is true for pretty much anything, but particularly when you’re doing something totally new.

Of course, the best part about not knowing how to do something is that you get to learn how to do it and then a week later when one of your colleagues doesn’t know how to do the same thing you get to pretend that you’ve known it all along and teach them how to do it! Such is the cycle of life. Remember, everyone’s just trying to fake it ‘til they make it.

Organization: Yes, it really works.

Seventh week is beginning (did anyone else just go into fight or flight mode after reading those words?), which means that I’ve now had well over a month to settle in to my first term as a DHA. Initially, I was going to write that I had spent the first several weeks of this job learning the ropes and getting the hang of how it goes (because I have indeed learned a great many things about a great deal of stuff), but then I realized that that’s not really true. More accurately, I’d say that I’ve jumped in headfirst, taking a “sink or swim” approach to this new job, so now seems like a great time to come up for air and do a bit of reflecting.

In short, I think I can confidently say that I have not utterly failed (I joke, I’ve actually done quite well). I owe this success in large part to the people I work with, who are intelligent and always helpful. But there’s another tool that has been key in learning quickly how to tackle a new project: the documentation.

The nature of student jobs and participation in organizations is that the turnover is fast – jobs and organizations are looking for new employees and members every year, so making training efficient can be essential. I’ve participated in some student organizations where it seems like every week we’re saying, “I’m pretty sure so-and-so did a project like this a couple years ago, but then they graduated…do you have any idea how we could get their contact information to see if they’ve still got that information? Or maybe I still have an email about it from freshman year…” Yes, sifting through emails from 2014 is one of the warning signs that something went wrong…

Student turnover can be a logistical nightmare, but this job has been proof that it doesn’t need to be. It’s been so easy for me to access project history, familiarize myself with all the relevant tools and information, and then quickly jump into new projects. Not all of it is perfect, but the effort was made and I am reaping the benefits. If better (or any) documentation is something you think your job or organization could benefit from, here are some tips I’ve gleaned from my experience:

  1. Be specific and consistent when naming documents. “Meeting 3/5/15” is not helpful. “Initial Meeting with Web Developer” is helpful.
  2. Note specifically who has done what. If you need to contact someone about work they’ve done, you don’t want to be guessing between 10 different people.
  3. Take the time to organize. The point of documentation is that it makes everyone’s lives easier down the road. Make everything easy to find using section headers and bulleted lists.
  4. Note things that did and didn’t work. For example, if you’re reflecting on how an annual event went, a note that says “Next year get catering order in 1 week before event” can save planning time and prevent disasters in the future.
  5. Keep everything in one place. This seems obvious, but it can be very easy for things to go missing, especially if there are a lot of people working on one project. For example, having one big Google Drive folder ensures that everyone knows where and how to access everything.

Now go forth and document!

Martha Says Hello

Hi! My name is Martha Durrett, and I’m a junior Computer Science and English major at Carleton College. Computer science and English! What? “Well those don’t overlap at all,” you might say. In some way you’re right…I certainly won’t be counting any of my CS classes for English credits. But in many ways that’s wrong – for example, digital humanities! Could I have found a more fun and engaging way to integrate my two majors?

As I learn more about digital humanities, I’m hoping to continue to break down that distinction between “English major” and “computer science major.” I want to discover fun and engaging new ways to integrate not just English and computer science, but any subject that piques my interest. Who says subjects need to be separate? (Finland doesn’t – check this out if you haven’t heard about Finland’s radical educational reform.) Throughout the rest of the year, I’ll be thinking about how the digital world is changing expectations about how we’re supposed to learn about and interact with the humanities. If you ask me, the humanities (whatever that hefty term entails) have spent far too long hiding inside of textbooks, and it’s about time we did something new with them!