Teaching Digital Archival and Publishing Skills
- June 12th, 2009
- Erin Bell
I’ve been putting this off for a while now, especially after seeing some of the really impressive projects other campers are working on. My job is not research-oriented; much of what I do revolves around operationalizing and supporting faculty projects in the History Department where I work. What follows is a rather long description of one such project in which students, in the context of a local history research seminar, are tasked with digitizing archival items, cataloging them using Dublin Core, and creating Omeka exhibits that reflect the findings from their traditional research papers. Despite the fact that the students are typically Education or Public History majors, they are expected to carry out these tasks to standards which can be challenging even to professional historians and librarians.
I’ve written about some of the practical challenges in projects like this here. For a full description of the project at hand, click through the page break below. What is intriguing me right now are the questions such projects raise, particularly those relating to content quality and presentation.
What are realistic expectations for metadata implementation? Is enforcing metadata standards even appropriate in the context of humanities education? Many trained librarians aren’t consistent or competent catalogers; how can we expect more from undergraduate History students? It’s not that students don’t gain from the exercise (whether they like or know it or not); it’s that poor metadata might be worse than none. Information architecture is another challenge, even when students have no role in the initial site design: they can still muddle the navigation scheme and decrease usability through poorly organized contributions. Likewise, the content students create is not always something we want to keep online, for any number of reasons. Where do you draw the line between a teaching site (that is, a site designed and used for training projects) and one intended for the broader public? The line is blurry to me, but I think how you answer that question dictates what you are willing to do and what you end up with. We really want to create something that is generated entirely by students but has a life outside the classroom; ultimately, though, we will make the decisions that best serve our instructional goals. I think the value is in the process, not the result (though it would be nice for them to match up). We have done some very ambitious, high-quality projects working with small, dedicated teams, but working with large class groups has led to some interesting and unforeseen problems. I wonder if anyone has ideas about how we might replicate that small-team experience and quality on this significantly larger scale.
Has anyone out there done a similar project? I’d love to hear experiences and/or suggestions on pedagogy, standards, or documentation.
I think this fits in to some degree with Jim Calder’s post and Amanda French’s post, among others (sadly, I have yet to read all the posts here, but I will get to it soon and maybe hit some people up in the comments).
OVERVIEW
This past semester, the Center for Public History and Digital Humanities at CSU has been training teachers, interns and undergraduate students in the History Department to use Omeka as a tool for exploring archives, sharing research, and curating personal exhibits. Students in our Local History Seminar are trained in archival research, image handling and digitization, and archival description and subject cataloging, including the use of Dublin Core metadata. In the interest of harnessing student labor for the benefit of the library, and protecting heavily used artifacts from further deterioration, we have tightened the process so that each participant’s labor may yield results that can be directly transferred to the library’s digital archive, Cleveland Memory, which runs on the ContentDM platform. Through trial and error, we have devised a barebones metadata plan, set digital image processing standards, and crafted a workflow that optimizes time and labor investments by students, faculty, and department and library staff. We hit a few bumps along the way, but have plans to revise our process next semester.
EDUCATIONAL RATIONALE
Holistic experience in history-making, from archival process to research to public exhibition
- Creation and collection of student-generated content (images, maps, charts, exhibits, etc.)
- Hands-on research in physical and digital archival collections
- Image processing (digitizing physical artifacts according to locally-defined best practices)
- Archival description using common metadata standards (Dublin Core)
- Increased awareness of how metadata is organized and used in libraries and archives, which may in turn lead to more effective use of those resources in students’ own research
- Experience using online archival software / publishing platform (Omeka)
- Curating thematic local history exhibits based on area of research
- We believe this increases readiness for employment, teaching, and continued education.
PROCESS
Students choose a research topic in local history, most often a neighborhood, park, district, or institution/building of historical interest. Students are required to write a 15-page analytical research paper grounded in primary-source research. They collect documents and images from available archival resources, including both digital and physical artifacts. Items are uploaded to an Omeka installation (csudigitalhumanities.org/exhibits) and described using Dublin Core and local metadata standards. Non-digital items are digitized according to processing guidelines set by CSU Special Collections. Using the items they collect and the content from their research papers, students curate an interpretive Omeka exhibit around their topic, which they present to the class at the end of the semester. Professors spend a limited amount of class time providing ongoing instruction and guidance in technical matters, but generally focus on content.
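To give a sense of what an individual record looks like, here is a minimal sketch of the kind of Dublin Core description a student might produce, written out as a Python dictionary. The item, values, and list of “required” fields are purely illustrative; they are not our actual cataloging template, which follows the CSU Special Collections guidelines and the course guide.

```python
# A hypothetical Dublin Core record for a single digitized item.
# All values are illustrative only, not drawn from our actual collection.
dublin_core_record = {
    "Title": "West Side Market, interior view of produce arcade",
    "Creator": "Unknown photographer",
    "Subject": "West Side Market (Cleveland, Ohio)",  # a controlled vocabulary term
    "Description": "Black-and-white photograph showing vendor stalls "
                   "in the produce arcade.",          # neutral, non-editorial voice
    "Date": "circa 1935",
    "Type": "Still Image",
    "Format": "image/jpeg",
    "Source": "Cleveland State University, Special Collections",
    "Rights": "Copyright status undetermined; contact repository.",
    "Contributor": "Student cataloger (Local History Seminar)",
}

# A quick sanity check that the fields treated as required here are present and non-empty.
required = ["Title", "Subject", "Description", "Date", "Rights"]
missing = [f for f in required if not dublin_core_record.get(f, "").strip()]
if missing:
    print("Incomplete record, missing:", ", ".join(missing))
```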
As Center staff, I met with the class for hands-on sessions in Omeka use and image digitization, and created handouts and an online student guide (csudigitalhumanities.org/exhibits/guide) containing instructions for using Omeka, digitizing items, and employing metadata standards. The guide contains general rules for Dublin Core and, as the first semester progressed, evolved to address common mistakes and questions. I track and enforce quality control on new items, using the MyOmeka plug-in to leave administrative notes on each record with instructions for correcting errors and other suggestions for improvement. These notes can be seen only by students and administrators who are logged in with the single shared username. At the end of the semester, items and exhibits are graded and vetted to determine which will remain online. Items with complete metadata records that meet copyright and quality standards are exported into the Cleveland Memory collection; the rest are deleted. High-quality exhibits remain public; others are deleted or made private.
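For the export step, a rough sketch of what the handoff could look like is below: vetted records written out as a tab-delimited text file, since CONTENTdm can, as far as I know, ingest tab-delimited metadata through its project client. The field list, its order, and the idea of filtering on a “keep” tag are assumptions for this illustration, not our actual export mapping.

```python
import csv

# Hypothetical vetted records; in practice these would be pulled from Omeka.
records = [
    {"Title": "West Side Market, interior view", "Subject": "West Side Market (Cleveland, Ohio)",
     "Description": "Photograph of the produce arcade.", "Date": "circa 1935",
     "Rights": "Copyright status undetermined.", "tag": "keep"},
    {"Title": "Terminal Tower under construction", "Subject": "Union Terminal Complex (Cleveland, Ohio)",
     "Description": "Construction photograph.", "Date": "1928",
     "Rights": "Copyright status undetermined.", "tag": "rights"},
]

fields = ["Title", "Subject", "Description", "Date", "Rights"]  # assumed field mapping

with open("cleveland_memory_import.txt", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(fields)
    for rec in records:
        if rec.get("tag") == "keep":  # export only items vetted as keepers
            writer.writerow([rec.get(f, "") for f in fields])
```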
RESULTS
Despite the extensive documentation, administrative notes, classroom instruction, and my availability for one-on-one consultation, the results of our first run were decidedly mixed. About one-third of students met the expectations for overall quality; another third came very close but made a few significant mistakes. Common mistakes included use of copyright-protected items, grammar and syntax errors in metadata, improper use of controlled vocabulary terms, use of an editorial voice in item descriptions, and image-processing errors (low resolution, poorly cropped or misaligned images, etc.). Others failed to translate their research into well-crafted exhibits, despite the fact that their in-class presentations were almost unanimously excellent.
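In hindsight, many of the image-processing errors could probably have been caught before upload with a small check script. The sketch below uses the Python Imaging Library; the minimum pixel width and DPI thresholds are placeholders for illustration, not the actual digitization guidelines set by CSU Special Collections.

```python
from PIL import Image  # Python Imaging Library (later maintained as Pillow)

# Placeholder thresholds; substitute the repository's actual digitization guidelines.
MIN_WIDTH_PX = 3000
MIN_DPI = 300

def check_scan(path):
    """Flag common digitization problems for a single scanned image."""
    problems = []
    img = Image.open(path)
    width, height = img.size
    if width < MIN_WIDTH_PX and height < MIN_WIDTH_PX:
        problems.append("resolution too low: %dx%d px" % (width, height))
    dpi = img.info.get("dpi")  # not all formats record DPI
    if dpi and min(dpi) < MIN_DPI:
        problems.append("DPI below %d: %s" % (MIN_DPI, dpi))
    if width > height * 3 or height > width * 3:
        problems.append("unusual aspect ratio; check cropping and alignment")
    return problems

if __name__ == "__main__":
    for issue in check_scan("scans/west_side_market_001.tif"):  # hypothetical file
        print("WARNING:", issue)
```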
From an administrative perspective, we also have some work to do to streamline the process. Some of our challenges involved limitations with the Omeka software, which was not necessarily designed for such projects.
We gave comments via the MyOmeka plug-in, which requires students to log in and find their items via the public view. Once they find an item in need of correction, they must return to the admin view to make corrections, and cannot see the comments without again returning to the public view. At least one student complained about this cumbersome process, and it was equally difficult for administrators. While printing out item records and adding handwritten notes would have been ideal for students and instructors, our workflow and other commitments made this impossible.
At the end of the semester, we began the vetting process. I went through and reviewed each item, tagging it “keep,” “revise,” “remove,” “rights,” or “cmp.” “Rights” was assigned to items whose copyright status was uncertain. “CMP” was assigned to items already available via the Cleveland Memory project. The tags were useful for quickly identifying the status of each item in the collection, but moving beyond that point has proven problematic. For one, the University requires that we retain student work for up to six weeks after the end of the semester. Were the items and exhibits graded as a final exam, we would need to keep them for a full semester (thankfully, the physical research paper was “the final” for this course). Additionally, there is no easy way to batch delete or batch edit items in Omeka. Again, this is not necessarily a shortcoming in Omeka’s architecture, just a limitation of our project design. Because of these issues, we are simply making items and exhibits public or private according to our vetting criteria; deletions and revisions will have to wait at least six weeks.
We have decided to postpone plans for migration to Cleveland Memory until we can address some of the problems encountered in our trial run. We are optimistic that we can improve our instructional and administrative processes next semester, but that will require some new approaches and answers to some of the questions that emerged the first time around.
NEW APPROACHES
Next semester we will use the Contribution plug-in to collect items. This will allow us to limit confusion about which fields to fill and will also allow us to track submissions more effectively. Because we still want students to have some experience with metadata standards, and need to collect some additional information for later migration to the Cleveland Memory repository, we have customized the plug-in to include some additional fields.
To solve the issues of grading and revision, as well as required retention, we will use the ScreenGrab plug-in for Firefox, which allows for the capture of complete web pages. Students will save each item record and exhibit page in JPEG or PNG format, adding them to a printable document that they will submit for review as items and exhibits are added.
We are still trying to figure out a way to modify and delete items in batches. Since most mistakes involved improper use of controlled subject terms, it would be nice if we could identify a recurring term and edit it in a way that would cascade across the entire installation (e.g., locate all instances of the incorrect subject “Terminal Tower” and replace each with “Union Terminal Complex (Cleveland, Ohio)”). This would likely involve a major change in Omeka, which, to my knowledge, does not collate Subject fields in this way. Batch deletion for superusers, on the other hand, might be easier to accomplish. Any thoughts?
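In the meantime, the kind of cascade edit I have in mind might look something like the sketch below, done directly against Omeka’s underlying MySQL database rather than through the interface. The connection details and the table and column names (omeka_elements, omeka_element_texts) are assumptions based on my reading of the schema and should be verified against your own installation, and you would of course back up the database first; this is an assumption-laden workaround, not a supported Omeka feature.

```python
import MySQLdb  # MySQL-python; any DB-API driver for MySQL would work

OLD_SUBJECT = "Terminal Tower"
NEW_SUBJECT = "Union Terminal Complex (Cleveland, Ohio)"

# Hypothetical connection details; table/column names are assumptions to verify.
conn = MySQLdb.connect(host="localhost", user="omeka", passwd="secret", db="omeka")
cur = conn.cursor()

# Find the element id(s) for the "Subject" field, then rewrite every
# element text that exactly matches the incorrect term.
cur.execute("SELECT id FROM omeka_elements WHERE name = %s", ("Subject",))
updated = 0
for (element_id,) in cur.fetchall():
    cur.execute(
        "UPDATE omeka_element_texts SET text = %s "
        "WHERE element_id = %s AND text = %s",
        (NEW_SUBJECT, element_id, OLD_SUBJECT),
    )
    updated += cur.rowcount

conn.commit()
print("Rows updated:", updated)
cur.close()
conn.close()
```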
Students will receive more comprehensive training. Based on common mistakes and frustrations, we will adjust instruction and documentation accordingly.
June 13th, 2009 at 3:41 pm
This is exactly the kind of project I’ve been easing into with my students, so I’m all over this! I really appreciate your thoughtful commentary about what you’ve done so far (and I’d love to hear more). Among many other aspects, I think we can discuss how models / templates / rubrics all fit into this; anything that helps the students focus on content. Your emphasis on process is right on – but we can definitely brainstorm ways to encourage high quality results. So far, how do the students feel about what they’ve learned?
June 13th, 2009 at 6:14 pm
Arden, thanks for the comment. I look forward to working through some of these questions, and can see from your own post that you are looking at this from an even broader perspective, which I think will be very helpful to me and others.
As for how students are reacting: generally, it seems that they enjoy the work, though they do get frustrated with the heavy emphasis on details like metadata and exhibit organization (not to mention some of the common challenges of using unfamiliar technology). It’s hard to gauge their overall learning from where I stand (as support staff for the course instructor), but from watching some of their in-class presentations, I gather that this really pushed them to find and analyze more archival items (mostly images and clippings) than they would have for a conventional paper. I think having an outlet for their findings actually encouraged them to be more thorough in their research. On the other hand, I’m not sure they entirely appreciate the educational value of “taking it online.”
I agree that setting some additional standards via rubrics/models/templates would go a long way to improve both their experience and the final results.
June 13th, 2009 at 8:55 pm
Erin and Arden,
I’m certainly interested in the issues both of you have raised here. I’ve grappled with them for my own senior seminar on Digital History and the tools students in that class use to complete their semester-long group projects. I’m particularly concerned with the question of how students’ experiences change when they’re not creating new projects but rather building on (contributing to) the work of others, especially when the questions of tools, approach, and form of metadata are largely decided for them.
Certainly there’s some overlap between these topics and the thread started by Amanda French, but given the robust response to that post, I suspect there’s room for at least two sessions on what we have students working on and on strategies for undergraduate/graduate involvement in digital humanities projects.
June 15th, 2009 at 2:23 pm
Jeff,
That is also a good point. We’ve also thought about having students improve upon the exhibits created by previous classes, but have yet to really come to terms with what that means. Obviously, one of the benefits of digital publishing is the ability to revise and improve upon resources as needed, but a number of questions are raised when asking students to build on the work of others. How do you evenly and fairly divide that work (some existing projects require more or less modification than others)? How do you measure the amount of work completed, or the amount of actual learning, when much of the content is preexisting and many of the most important decisions have already been made? What does it mean to go back and edit or improve student work (does this constitute a misrepresentation of what students actually produce on their own? We wouldn’t be comfortable editing and re-publishing their research papers)?
Unfortunately, instructors have typically chosen the tools to use, but I am very interested in how you have fared in giving students that choice. I think that’s an important competency, but one that we have not really addressed in our Public History courses (an explicitly digital course might have an easier time fitting this in).
Also, I realized another issue that I meant to raise in my initial post: the difference between collecting and curating. Librarians and archivists already seem to do a pretty good job of collecting, but they are less involved in providing interpretation and context for those items/collections, and tend to take a “more is more” approach to digital projects (i.e., “we need to make everything available”). One thing that we have emphasized in our course work is that it is preferable to present a small number of exceptional (or representative) items and provide a more in-depth study. I think this is a big tension, especially between librarians and historians, who have different views about what constitutes a teaching/learning resource, what qualifies as scholarship, etc. Curating, I think, is a core skill that could be explored more thoroughly and, again, clearly distinguished from what librarians do. Library standards and practices definitely have a role in digital humanities projects, but there needs to be a balance and an understanding about what that means.
June 16th, 2009 at 9:18 am
I wonder if scale and motivation are related as well. Do students have a greater desire to excel in these kinds of projects when they realize that they are part of a larger movement? Or will students respond enthusiastically to state and local digital history projects if they have a personal or family connection to the area?
I suspect that agency, expertise, and publishing are also connected. Wikipedia is bustling with unpaid contributors. Presumably one of the reasons they write is the satisfaction of seeing their work appear instantaneously in digital print, available to billions. Another likely reason is that they get to display their knowledge and feel like an expert. And because it has reached critical mass, this massive online publishing venture has become tremendously useful for people across the globe. If students have to wait months for their digital history contributions to appear online, will they feel cheated? If students are not allowed to select their own topics for digital history projects, will they feel less invested in the process and outcome? If students feel like they are being asked to research, write, and publish on what they perceive to be obscure topics, will they put less effort into the digital history project because they assume almost no one will read their work?
I am curious to know how people would compare and contrast their own digital history projects for students with the History Engine. Has the History Engine taken the “more is more” approach? Do the students who write for the History Engine feel like they have made a solid contribution, but have little hope that readers who need or want the information in their contribution will ever find it? Do the students who write for the History Engine respond positively when given the latitude to select their own topics, but wonder afterwards how all of the various student contributions will fit together into some meaningful whole or larger pattern? Do the students who contribute to the History Engine find some satisfaction in becoming experts, but are left wondering whether their contribution counts as a publication?
June 16th, 2009 at 2:29 pm
Great post, Erin. While I’m not currently working with these technologies in the classroom, I think we face similar, or at least related, problems when thinking about user-generated content in general. What I mean is: how do we get good metadata for user-generated content, whether it comes from the classroom or from the general public? As you correctly point out, sometimes less metadata is better than bad metadata. This is interesting to me because it in some ways goes against my belief that it’s always a good idea to simply “open things up” and let people catalog, archive, etc. for themselves (there are a number of caveats, of course, especially my preference for having user-generated metadata exist separately but alongside librarian/archivist metadata). However, I think you’re right that this extra data, if done poorly, can really detract from the archive or digital history project as a whole. I think I’m going off on a tangent here, so I’ll stop, at least for now, but let me know what you think.
June 16th, 2009 at 9:29 pm
I agree with Jeff on the direction of this, with a suggestion: the editing of prior student work doesn’t have to be “for credit” to serve the function of seeing in-process work and improving on it. OR… they can look at prior student work but don’t have the opportunity to edit it unless they can show proficiency with the metadata projects an instructor sets, and then the editing work is extra credit. I guess the idea is that you can be flexible with the multi-generational concept Jeff suggests.
June 18th, 2009 at 8:37 am
On the question of whether producing bad/wrong/misleading metadata is worse than no metadata at all: my past life teaching first-year composition leads me to the following analogy. Producing bad/wrong/misleading writing is certainly better than students not writing at all, and I take the same view of metadata. To address the risk, I lean toward thinking that the important thing is providing more metadata about the asserted status of the online project itself.
I s’pose that puts me in the “more is more” camp. I’m comfortable with that, since when it comes to the curation end I’m all over the idea of the same digital object having many representations — curation might come in the form of creating and providing filters (some the user sees, some perhaps not necessarily?) on those objects. Anyone done two completely different Omeka exhibits, drawing on exactly the same set of items in a collection?
Looking forward to this one!
June 19th, 2009 at 5:37 pm
I think Sterling makes a relevant point about the student satisfaction that comes from both immediacy and potential impact of digital publishing and from being part of a large/ambitious movement or project. A project like the History Engine is probably even more satisfying than what I have been working on because it is nationally-oriented (rather than locally) and wiki-based (meaning, I presume, immediately visible results). It seems to have struck a nice balance in content and metadata as well, at least for the time being.
I wonder, though, how well it will scale with the metadata (specifically the controlled tags) being used. Will they provide enough granularity to actually sort through a resource which could grow exponentially in a short period of time? I’m not suggesting students should exhaustively catalog using Library of Congress Subject Headings or anything, I just think there needs to be a balance between what will increase findability and use and what will be manageable in the context of (dispersed) classroom teaching and learning projects.
And to address Patrick’s comment, I agree entirely with the idea that items/objects can be filtered/represented through multiple curated “events” (whether that is through an “exhibit” or some other mechanism) and don’t think that puts you in the “more is more” category at all (by “more is more” I am referring to the idea that quantity = quality in digitization/digital projects). Since we just started this, we haven’t seen any overlapping exhibits on the same topic, but that’s something to think about. My personal (OCD Librarian) inclination would be to either merge or clearly differentiate them for the sake of maintaining some sense that the site is a controlled resource. On the other hand, there is no reason intellectually why we shouldn’t present multiple accounts of the same topic. That might actually provide a different kind of value, but it would also change the nature of the site (not necessarily a bad thing).
I’m very grateful for all the feedback and comments here. This is going to be a really interesting discussion.
June 19th, 2009 at 5:48 pm
And to RE-clarify my poor explanation of the “more is more” approach, I meant to say “quantity (of items/item metadata) = quality (of resource),” which I think is problematic if not just wrong for most projects. That said, the idea that more filters, sitewide metadata, curation, etc. add value is right on. I hope I haven’t further confused what I thought was a really innocuous phrase. Have a great weekend everyone! 🙂