When You Need to do More than Just Tag: Controlled Vocabulary and Metadata

As I get close to finishing the form for my Clio3 project, I’m starting to think ahead to what I need to do when I actually begin using it to input my metadata.

I have four “categories” of meta information beyond the basic cartoon date, caption and artist information. These categories are: characters, events, keywords and themes. The first two are relatively easy and straightforward. Was Uncle Sam depicted in a particular cartoon or not? Was this cartoon referencing the 1960 Election, the Korean War, the Kennedy Assassination, or not? It’s the last two that are giving me anxiety-filled dreams and theoretical nightmares. This is because the issues of controlled vocabulary and information architecture have sprung up remarkably quickly. Even with the first 6 cartoons I’ve been using as testers, I’ve started to have terminological panic attacks.

Take these two cartoons for instance.

Herblock cartoon, January 8, 1960 Washington Post

Herblock cartoon, January 10, 1960 Washington Post

My characters for the first cartoon are obviously: Administration and John Q. Public (Herblock’s “everyman” stock character). For the second cartoon my characters are: (again) Administration and Child. As for events, there is not a specific historical “event” as I am defining it so that field would be empty.

But what about the keywords and themes? My first dilemma is whether I want to have only one theme per cartoon or multiple themes. If I went with only one, my thought is that the first cartoon’s theme would be “government spending.” The second cartoon’s theme would be “education.” If I had multiple themes, the first cartoon could be: government spending, housing, and defense budget. For the second: education and budget. This seems like a fairly straightforward “pick one and go with it” kind of decision until I factor in my other subjective category: keywords. If I have multiple themes, do I also repeat the “budget” theme as a keyword for the second cartoon, since it is a label used in the cartoon? If I go by that logic though, do I also include “school” as a keyword for the second cartoon? If I am limiting my keywords to only words printed on the cartoon, what do I do for the first cartoon where my “labels” are full sentences?

My impulse is to stick with one theme per cartoon and determine that based on my interpretation of what the “point” of the cartoon is. What is the artist really commenting on in this image? That is why my theme for the second cartoon would be “education” rather than “budget.” But Herblock’s mention of the budget is important and relevant, so it would obviously be a keyword. If I am limiting myself to only using keywords that are printed on the cartoon, then I would also include “school.” But if I am not limiting myself, should I still include it, since it would be slightly redundant with the theme of “education”?
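To make the “one theme, many keywords” idea concrete, here is a minimal sketch of what a single record might look like. It is only an illustration: the `Cartoon` class, the `THEMES` set, and the rule that drops a keyword when it exactly repeats the theme are all assumptions for the example, not decisions the project has actually made. It also does not settle the “school” question, since “school” is not an exact repeat of “education.”

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled list of themes; the real list would grow with the corpus.
THEMES = {"government spending", "education"}


@dataclass
class Cartoon:
    date: str                       # ISO date, e.g. "1960-01-10"
    characters: List[str]
    theme: str                      # exactly one interpretive theme per cartoon
    keywords: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)  # empty when no specific event applies

    def __post_init__(self):
        # Reject themes that are not in the controlled list.
        if self.theme not in THEMES:
            raise ValueError(f"unknown theme: {self.theme!r}")
        # Assumed rule: drop keywords that merely repeat the theme word for word.
        self.keywords = [k for k in self.keywords if k.lower() != self.theme.lower()]


# The second cartoon from the post, with keywords taken from its printed labels.
second = Cartoon(
    date="1960-01-10",
    characters=["Administration", "Child"],
    theme="education",
    keywords=["budget", "school", "Education"],
)
print(second.keywords)
```

Keeping the theme as a single required field and the keywords as a free list mirrors the split between interpretation (what the cartoon is about) and description (what is printed on it).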

The first cartoon is a little trickier, and is the kind of situation that induces controlled vocabulary panic. I interpret this cartoon as being about government spending. This is related to, but different from, the government’s budget. Because Herblock is not actually referencing the budget, I am hesitant to include that term as either a theme or a keyword. But “housing” could easily be either a theme or a keyword. Similarly, should another keyword be “missiles” or “defense spending” or “defense budget”?
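One common way to tame this kind of “missiles” versus “defense spending” versus “defense budget” question is a preferred-term table, where every variant points to a single agreed term. The sketch below assumes that approach. The `PREFERRED` table and the choice of “defense spending” as the preferred term are placeholders for illustration only; the actual choice is exactly the open question above.

```python
# Hypothetical synonym table: every variant maps to one preferred term.
# Picking "defense spending" as the preferred form is an assumption for this example.
PREFERRED = {
    "missiles": "defense spending",
    "defense budget": "defense spending",
    "defense spending": "defense spending",
    "housing": "housing",
}


def normalize(term):
    """Return the preferred form of a term, or fail loudly if it is new."""
    key = " ".join(term.lower().split())  # ignore case and stray whitespace
    if key not in PREFERRED:
        raise KeyError(f"{term!r} is not in the vocabulary yet; decide on it and add it")
    return PREFERRED[key]


def normalize_all(terms):
    """Normalize a list of terms, keeping first-seen order and dropping duplicates."""
    seen = []
    for term in terms:
        preferred = normalize(term)
        if preferred not in seen:
            seen.append(preferred)
    return seen


print(normalize_all(["Missiles", "defense budget", "Housing"]))
```

Failing on an unknown term, rather than silently accepting it, forces each new term to be a deliberate decision, which is the whole point of a controlled vocabulary.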

And this is just with two cartoons. This will get infinitely more complex and convoluted once I get knee-deep in the almost 3,000 cartoons I have for just this one artist. (I eventually will be adding two other cartoonists to this database.)

I have never dealt with such a large body of images before. I have worked with cartoons and wrote my Master’s Thesis on Herblock’s portrayal of the Korean War. I used a very limited controlled vocabulary for that project and had a small set of main keywords and a few sub-keywords for each. That was probably not the best way to handle my primary sources, but it worked at the time and taught me that if I ever enlarged my cartoon set I would need to vastly rethink my process and choices. (I had also tracked all those cartoons in a spreadsheet, which was painful and tedious and stretched the limits of what I could actually “do” with my cartoons, hence the database this time round.) Tackling this large a corpus is daunting to say the least. The fact that it is the foundational research for my dissertation makes it even more so.

I don’t have the answers to these questions yet. I think it might be as my colleague Sharon Leon suggested: that it is just going to be a process that I will have to figure out as I go along…and to make sure I write a very robust “Edit” option for my database.
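As a rough idea of what a robust “Edit” option might need to handle, here is a small sketch of renaming a term across every record at once, for the day a keyword decision gets reversed. The record layout (plain dictionaries with a `keywords` list) and the `rename_term` function are hypothetical stand-ins, not the actual database design.

```python
def rename_term(records, field_name, old, new):
    """Replace `old` with `new` in one field of every record.

    Returns the number of records changed. If a record already contains
    `new`, the two are merged so no duplicate is left behind.
    """
    changed = 0
    for record in records:
        values = record[field_name]
        if old not in values:
            continue
        merged = []
        for value in values:
            value = new if value == old else value
            if value not in merged:
                merged.append(value)
        record[field_name] = merged
        changed += 1
    return changed


# Hypothetical records; the keyword values are placeholders.
records = [
    {"date": "1960-01-08", "keywords": ["housing", "defense budget"]},
    {"date": "1960-01-10", "keywords": ["budget", "school"]},
    {"date": "1960-02-01", "keywords": ["defense budget", "defense spending"]},
]

count = rename_term(records, "keywords", "defense budget", "defense spending")
print(count, records)
```

In a real database this would be a single update query, but the merge step matters either way: renaming one term into another that already exists should not leave the same keyword on a cartoon twice.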


Abby Schreiber says:

I can really relate to your volume problem – I’m dealing with a huge collection of manuscript sources that I will have to process manually. I keep asking myself if I can ever hope to get through it all and still finish my degree in a reasonable time. I’m considering a selective process, where (for now at least) I only transcribe into my database entries from even years or every third year. Have you considered making a selection of your cartoons? If so, how would you go about doing that?

Sasha Hoffman says:

I have considered sampling but ultimately decided against it. I’m only examining three cartoonists total, so they are themselves a sample of all political cartoonists during the time period I’m examining. In addition, since I’m trying to really dig in and trace the occurrence of discussions around nuclear fear and anxiety, I may miss a blip or the absence of a conversation if I’m only looking at, say, 2 cartoons a month.

John says:

You might want to re-think the relationship between controlled vocabulary and keywords, with an emphasis on some kind of structure that is related to your thesis. The controlled vocabulary might be best seen as the code for your research project as a whole (e.g., Nuclear and Non-Nuclear) and then the keywords are as you suggest: ways to tag imagery in the cartoon. Though you might want to think in advance about what kinds of imagery you want to tag. For instance, your (semi-controlled) keywords might be: Budget (covers general govt spending), Defense Spending, Education, etc., but you specifically decide to omit anything outside of that imagery (buildings, clothing, gender). This way, you can set up a code book that makes it clear how you were interpreting the images (because the database is a kind of argument already, so you want to make transparent what kinds of structures are contained within).

(BTW, came to your blog via American History Now)

