Uncategorized – THATCamp CHNM 2009 (The Humanities And Technology Camp)
http://chnm2009.thatcamp.org

David Staley discussing his digital installation, Syncretism
http://chnm2009.thatcamp.org/07/03/david-staley-discussing-his-digital-installation-syncretism/ (Sat, 04 Jul 2009)

As mentioned in his previous blog post, David Staley displayed a digital installation in the Showcase center during THATCamp. Here’s a video of David discussing his work in greater detail:

Session Notes: Libraries and Web 2.0
http://chnm2009.thatcamp.org/07/01/session-notes-libraries-and-web-2-0/ (Wed, 01 Jul 2009)

These are the notes from the first breakout session I attended, Libraries and Web 2.0. Attendees included “straight-up” librarians, digital humanists, and even a programmer at NCSA. Let’s see if I can capture what we talked about.

The European Navigator was originally intended to show people what the EU is, in general. But then teachers started using it in classrooms, with great success, and later began asking for specific documents to be added. The site talks about historical events, has interviews and “special files”, has a section devoted to education, and one for different European organizations. The interface is intricate yet easy to use, and uploaded documents (some of them scanned) are well captioned.

Teachers are asking for more on pedagogy/education, but the site’s maintainers feel they don’t have the skills to oblige. [vz: So are teachers willing to contribute content?] The site has some technical problems: the back end was an Access database exported into SQL (exporting is painful, and quality control of the exports takes a lot of time), and the front end is Flash (slow); they’ll be changing both. It’s built as a browser, which means a navigator within a navigator; Frederic Clavert says that’s bad because it doesn’t lend itself to adding Web 2.0 tools (vz: plus accessibility is pretty much shot, and they haven’t created special accessibility tools). And they have to ask users to contribute content, which ends up being too biased.

They do know who their audience is: they did a study of their users in 2008. That’s a great and important thing for libraries to do.

They’re migrating to the Alfresco repository, which seems to be popular around the room. They want annotation tools, comment tools, a comment-rating engine, maybe a wiki, but ultimately aren’t sure which Web 2.0 tools they’ll want. They’re obliged to have their own moderators to handle illegal material (racist comments, for example), but for the most part it seems the community will be able to self-regulate. Researchers who can prove they’re researchers will automatically have a higher ranking, and they’re considering a reputation-economy classification of users, in which users who aren’t Researchers From Institutions but contribute good material can advance in ranking. But this latter feature is a bit on the back burner, and (vz) I don’t actually think that’s a good thing. Starting from a default hierarchy that privileges academe is actively bad for a site that purports to be for Europe as a whole, and will deter participation by people who aren’t already in some kind of sanctioned system. On the other hand, part of ENA’s mission is specifically to be open to researchers. They’re aware of the potential loss of users and have thought about running two different websites, but that’s also segregation, and they don’t think it’s a good solution. It’s a hard one.
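A toy sketch of that reputation-economy ranking, to make the idea concrete. The scoring rules, the +1/-1 ratings, and the researcher boost are all my assumptions for illustration, not ENA’s actual design:

```python
# Toy reputation model: contributions earn points from community ratings,
# verified researchers start with the (contested) default boost, and any
# user can climb by contributing well-rated material.
from collections import defaultdict

RESEARCHER_BOOST = 10  # assumed starting score for verified researchers

def rank_users(contributions, verified_researchers):
    """contributions: list of (user, rating) pairs; rating is e.g. +1 or -1."""
    scores = defaultdict(int)
    for user in verified_researchers:
        scores[user] += RESEARCHER_BOOST
    for user, rating in contributions:
        scores[user] += rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    ratings = [("amateur_historian", 1)] * 15 + [("prof_x", 1)] * 3
    # the amateur out-ranks the credentialed researcher by contributing well
    print(rank_users(ratings, verified_researchers={"prof_x"}))
```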

On to the Library of Congress, Dan Chudnov speaking. They have two social-media projects: a Flickr project that inaugurated Flickr Commons, and a YouTube channel of LC’s own. YouTube users tend to be less serious/substantial in their responses to videos than Flickr users are, so while LC’s Flickr account allows (and gets great) comments, their YouTube channel doesn’t allow comments at all.

They’ve also launched the World Digital Library, alongside which Dan presented the Europeana site, available in seven and six languages, respectively (impressive!). WDL has multilingual query faceting; almost all functionality is JavaScript-based and static, and is served via Akamai, with whom LC has partnered, so the site is really, really stable: on the day they launched, they had 35 million requests per hour and didn’t go down. Take-away: static HTML works really well for servability, reliability, and distributability. Following straightforward web standards also helps.

Good suggestion for Flickr Commons (and perhaps Flickr itself?): comment rating. There seems to be pushback on that; I wonder why? It would be a very useful feature, and people would be free to ignore it.

Dan Chudnov: the web is made of links, but of course libraries have more: authority records, different viewers onto the big interconnected web, MARC/item records behind those. But nobody knows that, and more importantly, Google won’t find it without screen-scraping. What do you do about it, especially when LC and other libraries hold information on the same subject that isn’t interconnected at all?

Linked data, and its four tenets: use URIs as names for things; use HTTP URIs so people can look them up; when someone looks up a URI, provide useful information; and include links to other URIs so people can discover more. This is a great set of principles to follow; then maybe we can interoperate. Break your concepts down into pages. Use the rel attribute; embed information in what HTML already offers. So: to do Web 2.0 better, maybe we should do Web 1.0 more completely.
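A minimal sketch of those four tenets in practice, using the rdflib Python library; the chnm2009.thatcamp.org URI minted here is illustrative, not an identifier the site actually exposes:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF, RDFS

g = Graph()
g.bind("foaf", FOAF)

# Tenets 1-2: use an HTTP URI as the name for the thing.
session = URIRef("http://chnm2009.thatcamp.org/id/libraries-web20-session")

# Tenet 3: provide useful information when the URI is dereferenced.
g.add((session, RDF.type, FOAF.Document))
g.add((session, RDFS.label, Literal("Libraries and Web 2.0 session notes")))

# Tenet 4: link out to other URIs so clients can discover more.
g.add((session, RDFS.seeAlso, URIRef("http://www.loc.gov/")))

print(g.serialize(format="turtle"))
```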

One site that enacts this is Chronicling America: hundreds of newspapers from all over the country, with really great HTML usage under the hood. So now we have a model, and no more “we don’t know how to do basic HTML metadata” excuse.

Raymond Yee raises a basic point: what is Web 2.0? The basic principles: collective intelligence; the web improves as more users provide input. Raymond is particularly interested in its remixability and decomposability, and in making things linkable.

So, again, takeaways: follow Web 1.0 standards; link to other objects and make sure your own objects can be linked to; don’t make people create a thousand accounts, so maybe interoperate with OpenID or something else that’s likely to stick around; use encodings that are machine-friendly and machine-readable (RDF, JSON, XML, METS, OpenSearch, etc.). Also, view other people’s source! And maybe annotate your own source, and make sure it’s clearly formatted.

There’s got to be a more or less central place to share success stories and best practices. Maybe Library Success? Let’s try that and see what happens.

(Edited to add: please comment to supplement this post with more information, whether we talked about it in the session or not; I’ll make a more comprehensive document out of it and post it to Library Success.)

Digital training session at 9am
http://chnm2009.thatcamp.org/06/28/digital-training-session-at-9am/ (Sun, 28 Jun 2009)

So @GeorgeOnline (whose last name I simply MUST discover) has set up several platforms at teachinghumanities.org, and we semi-agreed over Twitter that it’d be fun to use the 9am “Digital Training” session to build it out a bit. Gee, anyone have a laptop they can bring?

Do please let us know via Twitter or comments on this post whether you’d like to use the session for that purpose; far be it from me to curtail conversation, especially the extraordinarily stimulating sort of conversation that has so far been the hallmark of THATcamp.

Six degrees of Thomas Kuhn
http://chnm2009.thatcamp.org/06/27/six-degrees-of-thomas-kuhn/ (Sun, 28 Jun 2009)

The recent PLoS ONE article on interdisciplinary connections in science made me wish instantly for a way to map citation links between individuals at my institution.

From Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. (2009) Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803. doi:10.1371/journal.pone.0004803

So the authors of the article looked for connections among huge fields and journals. In practice, interdisciplinary collaboration is helped tremendously by individualized matchmaking. The clickstream data of Bollen et al. is one example of “linkage,” but there are others: Google Scholar could probably help connect scholars at individual institutions through the sources they use in common. My title is a misnomer: trying to follow sequential citations to find the grand-grand-grand-grand-grandciters of Thomas Kuhn would be overkill and impractical. First-level citation overlaps would identify individuals who share either substantive or methodological understandings.
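Here’s a rough sketch of what I mean, assuming you can get each scholar’s set of cited works from somewhere (Google Scholar or otherwise); the toy data and the Jaccard scoring are just illustrative choices:

```python
from itertools import combinations

def overlap_pairs(citations):
    """citations: dict mapping scholar -> set of cited-work identifiers.
    Returns pairs scored by Jaccard overlap, best matches first."""
    pairs = []
    for a, b in combinations(citations, 2):
        union = citations[a] | citations[b]
        if union:
            jaccard = len(citations[a] & citations[b]) / len(union)
            pairs.append((jaccard, a, b))
    return sorted(pairs, reverse=True)

if __name__ == "__main__":
    demo = {
        "historian": {"kuhn1962", "latour1987", "shapin1985"},
        "philosopher": {"kuhn1962", "popper1959", "latour1987"},
        "linguist": {"chomsky1957", "saussure1916"},
    }
    for score, a, b in overlap_pairs(demo):
        print(f"{a} <-> {b}: {score:.2f}")  # potential collaborators, ranked
```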

I thought this was impossible until one fellow camper told me at the end of the day that there is a Google Scholar API available to academics. Woohoo! Is any enterprising programmer interested? Or someone who works at a DH center interested in getting this started? Or someone….

Incidentally, I suspect that there are many possible data sources (six degrees of Twitter @ refs?) and ways of working out the practical uses of this (seeing detailed overlaps for two specified individuals, or identifying summary overlaps for groups of individuals at a university, in an organization, attending a conference, etc.).

And, finally, … yes, to answer the logical question from those at the last session today in the GMU Research I auditorium: the Bollen piece is catalogued at visualcomplexity.com.

Context & Connections notes
http://chnm2009.thatcamp.org/06/27/context-connections-notes/ (Sun, 28 Jun 2009)

The following are my raw notes on the Saturday morning “Context and Connections” session. Assume that unless otherwise noted, they are paraphrases of comments made by participants (either labeled or not). It began with a note that this was about making connections with and adding context to historical document collections (e.g., The Papers of Thomas Jefferson with Monticello/Jefferson Foundation, on the UVA Press Rotunda site), but this is about both research and teaching. The problem in the classroom: students often USE digital archives but do not interact with them in terms of mashups (nor do scholars, in terms of contribution).

Someone suggested this is sort of like Thomas Jefferson’s FB page: who were his friends, etc.

Montpelier/Madison foundation has a hierarchical set of keywords and two separate databases for names that may not interact.

Problem of places/data sets that do not talk to each other: e.g., LoC has the largest set of Jefferson papers, but limited (and difficult-to-read) image sets.

So if there’s a suite of tools, is there one appropriate for both archivist/research community and for students?

MIT Media Lab’s Hugo Liu has an older project that simulated “what would they think?” AI.

The web forces textual connections (links), e.g., Wikipedia keyword linkages. We’re not required to rely on a folksonomy; we could have a multi-level tagging system (by persona).

How much of this is text mining (by computer), and how much is intensive, interpretation-focused analysis? The LoC project on Civil War letters is at the latter end of the spectrum.

From the library/archive world: WordPress has hierarchical categories AND (nonhierarchical) tags.

Someone asked about a tag-suggestion system; someone else noted that Delicious had one.

Another person: Try Open Vocab. That does move it into the semantic web.

What to do with “rough piles” of tags, etc.? If the tags accrete, we will want to analyze who tags how, and how that changes depending on context (and time).

Conversation. “That sounds like scholarship.”

Tags aren’t enough. Conversation isn’t enough. I want both.
We want a person behind that tag.

The Old Bailey project is working on this problem: it’s the largest holding of information on dead proletarians in the world. How do we make connections among sparse information (e.g., Mary, arrested as a prostitute, with place of work, date, and pimp)?

We need a Friendster of the Dead.

Maybe a way of figuring out from context who wrote a document (or the context of its writing).

[Sherman]: Like quantitative ways of guessing the authors of individual Federalist Papers, except less well defined.
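For what that could look like, here is a toy version of the function-word approach used in the Federalist Papers attribution studies; the word list and crude distance measure are simplifications of my own, not what the actual studies used:

```python
FUNCTION_WORDS = ["the", "of", "to", "and", "in", "that", "by", "upon", "on"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    total = len(words) or 1
    return [words.count(w) / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def attribute(disputed, candidates):
    """candidates: dict mapping author -> sample of known text.
    Returns the closest author and all distance scores."""
    scores = {author: distance(profile(disputed), profile(text))
              for author, text in candidates.items()}
    return min(scores, key=scores.get), scores
```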

Archivists have to do that all the time: “what did this word mean?” Time and place contexts.

A question of how much preprocessing is required…

We need a way of mapping concepts across time. There’s only so much you can do computationally; imagine a social-networking peer-review structure in which experts winnow the connections a program suggests.

That’s a good task for socializing students: give them a range of potential connections, and make them winnow the set and justify their judgments.

As a scholar, I need computers to suggest connections that I will judge by reading the sources.

Library (archival collection) no longer provides X that scholars use. There needs to be a conversation/collaboration.

Philologists on disambiguation: that’s a tool I can use.

Toolbuilding is where these connections will be made: with Zotero and Omeka, I spend as much time talking with archivists/librarians as with scholars.

Does anyone know about the Virtual International Authority File?

There are standards for marking up documents in print formats; will that standardization translate to what we do online, which is much looser and freer among digital humanists?

Back channel link to Historical Event Markup and Linking (HEML) project.

The “related pages” links for Google sometimes work for documents.

You don’t know why something is coming up as similar, and that’s a personal disambiguation process (reflection).

Discussion about extending core function of Text Encoding Initiative.

Discussion around www.kulttuurisampo.fi/ about intensity of work, selection of projects, etc.

DBpedia: controlled-vocabulary connection analysis for Wikipedia, extracted from the infoboxes on articles; the software is open source and could be applied to any MediaWiki site.
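As a hedged illustration, DBpedia exposes a public SPARQL endpoint that can be queried from Python with the SPARQLWrapper library; property names vary across DBpedia releases, so treat the query itself as a sketch:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?person WHERE {
        ?person dbo:birthPlace <http://dbpedia.org/resource/Chicago> .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print the URIs of (up to ten) people whose infoboxes say "born in Chicago".
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"])
```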

Keep an eye on the IMLS website! There is a project proposal to use TEI for other projects.

More on Libraries of Early America
http://chnm2009.thatcamp.org/06/27/more-on-libraries-of-early-america/ (Sat, 27 Jun 2009)

I didn’t have time to ask the two questions I had for everyone during my Dork Shorts session on the Libraries of Early America, so here they are …

1. I’m very keen to hear what folks think: how might this sort of data be used by scholars and for teaching?

2. What kinds of visualizations would folks be interested in experimenting with using such bibliographical data (e.g., date/place of publication, publishers, content, etc.)?

The links again:

Thomas Jefferson’s library on LibraryThing

Subject tag cloud | Author tag cloud

Examples of interesting overlaps:

Books shared with John Adams

Books shared with George Washington

The list of Collections in the pipeline is here. This is a subset of the larger Legacy Libraries or “I See Dead People’s Books” project.

Crowdsourcing – The Session
http://chnm2009.thatcamp.org/06/27/crowdsourcing-the-session/ (Sat, 27 Jun 2009)

Being a semi-liveblog of our first session of the day – please annotate as you see fit (and apologies if I left anything or anyone out).

Attendees: Andy Ashton, Laurie Kahn-Leavitt, Tim Brixius, Tad Suiter, Susan Chun, Josh Greenberg, Lisa Grimm, Jim Smith, Dan Cohen

Lisa: Kickoff with brief explanation of upcoming project needing crowdsourcing.

Susan: Interested in access points to large-scale collections – machine-generated keywords from transcriptions/translations, etc. Finding the content the user is most likely to engage with.

Josh: Landing page of the site to steer the crowd to certain areas – flickr commons?

Susan: Discovery skillset? Asking users: ‘what are you interested in?’ Discipline-specific, multilingual vocabularies could be generated?

Josh: Getting more general: moving beyond the monoculture – what is the crowd? Layers of interest; Figuring out role – lightweight applications tailored to particular communities.  NYPL historical maps project example – can we crowdsource the rectification of maps? Fits well w/community dynamics, but the information is useful elsewhere. Who are the user communities?

Laurie: Relation between face-to-face contact and building a crowdsourced community? Susan & Josh’s projects have a large in-person component.

Defining the need for crowdsourcing – what is the goal? Josh likes notion of hitting multiple birds with one stone. What is the crowd’s motivation? How can we squeeze as many different goals as possible out of one project?

Tad: issue of credentialing – power of big numbers.

Jim: Expert vs. non-expert – research suggests amateurs are very capable in certain circumstances.

Susan: Dating street scenes using car enthusiasts – effective, but the key is in credentialing.

Andy: The problem of the 3% of information that isn’t good – the 97% that’s correct goes by the wayside. Cultural skepticism over crowdsourcing, but acceptance of getting obscure information wherever possible (e.g. ancient texts). Looking into crowdsourcing for text encoding. Data curation and quality control issues to be determined. Interested to see Drexel project results next year?
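One common tactic for that 3% problem is redundancy plus majority vote; a minimal sketch, with arbitrary thresholds of my own (none of this is from the session):

```python
from collections import Counter

def reconcile(answers, min_votes=3, min_share=0.66):
    """answers: dict mapping item_id -> list of crowd-submitted values.
    Accept a value only when enough contributors agree; everything
    else goes to an expert review queue."""
    accepted, needs_review = {}, []
    for item, values in answers.items():
        if not values:
            needs_review.append(item)
            continue
        value, count = Counter(values).most_common(1)[0]
        if len(values) >= min_votes and count / len(values) >= min_share:
            accepted[item] = value
        else:
            needs_review.append(item)
    return accepted, needs_review
```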

Susan: Human evaluation of results of crowdsourcing – tools available from the project there. (Yay!)

Jim: Transitive trust model – if I trust Alice, can I trust Bob?

Josh: Citizen journalism, e.g. Daily Kos – self-policing crowd. Relies on issues of scale, but not just ‘work being done that’s not by you.’ Cultural argument about expertise/authority – ‘crowd’ meaning the unwashed vs. the experts.

Susan: Long tail is critical – large numbers of new access points.  How to encourage and make valuable?

Tad: Translations: ‘they’re all wrong!’ (Great point).

Andy: Depth, precision & granularity over breadth

Jim: Unpacking the digital humanities piece – leveling effect. Providing an environment for the community, not just a presentation.

Josh: Using metrics to ‘score’ the crowd.

Tad: Wikipedia example – some interested in only one thing, some all over.

Josh: Difference between crowdsource activity as work vs. play. Treating it as a game – how to cultivate that behavior?

Susan: Fun model relies on scale.

Josh: MIT PuzzleHunt example; how to create a game where the rules generate that depth?

Susan: Validation models problematic – still requires experts to authorize.

Tad: Is PuzzleHunt work, rather than play?

Andy: NITLE Predictions Market – great example of crowdsourcing as play.

Dan: Still hasn’t gotten the scale of InTrade, etc. – how to recruit the crowd remains the problem. Flickr participation seems wide, but not deep.

Josh: Compel people to do the job because they have to, adopt the Amazon Mechanical Turk model and pay, or get deeper into unpacking the different motivations of amateur and expert communities.

Susan: Work on motivation in their project determined that invited users tagged at a much higher rate than those who had just jumped in.

Susan: Paying on Mechanical Turk not as painful as it might be – many doing tons of work for about $25.

Josh: So many ways to configure crowdsourcing model – pay per action? Per piece? Standards & practices don’t exist yet.

Susan: We’ve talked a lot about them, but there are still relatively few public crowdsourcing projects.

Dan: Google averse to crowdsourcing (GoogleBooks example) – they would rather wait for a better algorithm (via DH09).

Susan: But they have scale!

Dan: Data trumps people for them.

Andy: Image recognition – it’s data, but beyond the capabilities now.

Dan: Third option: wait five years – example of Google’s OCR. Google has the $$ to re-key all Google Books, but they are not doing it.

Josh: Google believes that hyperlinks are votes –

Dan: Latent crowdsourcing, not outright

Susan: Translation tools largely based on the average – our spaces don’t fit that model

Tad: The algorithm model gives a strong incentive to keep information proprietary – you have everything invested in protecting your information, not openness.

Dan: OpenLibrary wiki-izing their catalog, vs. Google approach. Seems purely an engineering decision.

Andy: Approach informed by a larger corporate strategy – keeping information in the Google wrapper. Institutional OPACs almost always averse to crowdsourcing as well. What is the motivating factor there?

Josh: Boundary drawing to reinforce professional expertise and presumption that the public doesn’t know what it’s doing.

Andy: Retrieval interfaces are horrible in library software – why keep the best metadata locked away?

Sending around link to Women Physicians…

Susan: different views for different communities – work with dotSub for translation.

Dan: Other examples of good crowdsourced projects?

Susan: Examples of a service model?

Josh: Terms of service? Making sure that the data is usable long-term to avoid the mistakes of the past. Intellectual property remains owned by the person doing the work, with a license granted to NYPL that allows NYPL to pass the license along to others. You can’t go back to the crowd to ask for permission later, so getting users to agree at signup is key. The rights-and-policies side of things should appear on the blog in future.

Jim: Group coding from Texas A&M moved into a crowdsourcing model – a future “model” for trust models.

Please continue to add examples of projects (and of course correct any ways I’ve wildly misquoted you).

It would be great to have some crowdsourcing case studies – e.g., use flickr for project x, a different approach is better for project y…

Museum Content – A Couple of Ideas
http://chnm2009.thatcamp.org/06/27/museum-content-a-couple-of-ideas/ (Sat, 27 Jun 2009)

Posting to the THATCamp blog *so* late has allowed me to change the focus of my proposed session and to consider my very most recent project. For reference (and perhaps post-conference follow-up), I’m posting a description of my original THATCamp proposal, in addition to some thoughts about a possible session on searching museum records:

My original proposal involved a project called “The Qualities of Enduring Publications” that I developed at The Metropolitan Museum of Art during the financial crisis that followed the 9/11 attacks. Faced with a deficit budget resulting from severely diminished attendance, the museum planned to implement radical budget cuts, including significant cutbacks in publishing. In light of these cutbacks, I was interested in examining the essential nature of the publications (for 2002, read: books and print journals) that the discipline was producing and reading, and in thinking about what gives an art history publication enduring value. The question was examined through a personal prism, in a series of small workshops (ca. 10 participants each) at the Met and at museums around the country. Participants came to the workshop having selected one or two publications that had had enduring value for them in their professional lives: books that they had consulted regularly, had cited frequently, or had used as models for their own publications. A few minutes at the start of the workshop were spent sharing the books, after which I (as workshop chair) began the discussion, which centered on a series of simple scripted questions whose answers were recorded for later analysis. The questions asked whether titles had been selected for (for example) the fidelity of the reproductions, the lucidity of the prose, the multiplicity of voices, the well-researched bibliography, and so on. The workshops were fascinating, not just for the results they produced (the publications most valued by art historians had relatively little in common with the gigantic multi-authored exhibition catalogues produced by museums during that time frame), but also for the lively conversation and debate that they engendered amongst museum authors and future authors.

I have recently been encouraged to expand the workshop scope to include participants and titles from all humanities disciplines, as well as to consider the impact of electronic publishing and distribution on an individual’s choices. Staging the new version of the workshop will require the recruitment of workshop chairs from across the country and throughout the humanities, and the drafting of a series of additional questions about the ways in which electronic publishing might affect a participant’s thinking about his or her enduring publications. I had hoped to use THATCamp as an opportunity to identify potential workshop chairs in humanities disciplines other than art history, to examine the existing workshop discussion template and work on the questions to be added on e-publishing, and to think about ways to analyze a (much larger) body of responses, perhaps considering some bibliometric analysis techniques.

Though I’m still interested in speaking informally with ANY THATCamp participant who might be interested in participating in the expanded “Qualities of Enduring Publications” workshops, I’m actually focused right now on a newer project for which some preliminary discussion is needed to seed the project wiki. Along with colleagues at ARTstor and the Museum Computer Network, I’ll be organizing a team that will examine user behaviors (particularly search) in repositories that aggregate museum records. The project, which will take place during the six weeks before the Museum Computer Network conference in November 2009, will involve analysis of the data logs of ARTstor, the museum community’s key scholarly resource for aggregated museum records, as well as logs from other libraries of museum collection information, including (we hope) CAMIO and AMICA. A group of recruited participants will consider the logs, which will be released about six weeks before the November conference, articulate questions that might be answered by interrogating the data, and write and run queries. We’ll also think about how to establish and express some useful ways to query and analyze an individual museum’s search logs, and will use these methods to look at the logs of participants’ museums as a baseline for comparison with the ARTstor, CAMIO, and AMICA records. At an all-day meeting during MCN, we’ll gather to examine the preliminary results; discuss, modify, and re-run the queries; and work together to formulate some conclusions. In the eight weeks after the meeting, ARTstor staff and/or graduate student volunteers will produce a draft white paper, which will circulate to the meeting participants before being released to the community at large. Although the project is limited in scope (we have not yet figured out how to get any useful information about how users of Google look for museum content), we hope that it will help museums begin to think, using real evidence, about how their content is accessed by users in the networked environment; at present, very little quantitative information about user behaviors (including which terms/types of terms are used to search, whether searches are successful, and which objects are sought) is available. Results could have lasting impact on museum practice, as organizations prioritize digitization and cataloguing activities and consider what content to contribute to networked information resources. I hope that a discussion at THATCamp might provide some seed content for the project wiki, which will serve as the nexus of discussion about what questions we will ask and about what methods will be used to answer them.
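As a sketch of the simplest possible log interrogation, here is a term-frequency pass in Python, assuming an invented tab-separated log format (timestamp, session, query) that the real ARTstor logs will certainly not match:

```python
from collections import Counter

def top_terms(log_path, n=20):
    """Count the most common search queries in a tab-separated log file."""
    queries = Counter()
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 3:                 # timestamp, session, query
                queries[parts[2].strip().lower()] += 1
    return queries.most_common(n)

if __name__ == "__main__":
    for query, count in top_terms("search.log"):  # hypothetical file name
        print(f"{count:6d}  {query}")
```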

ICONCLASS demo/discussion
http://chnm2009.thatcamp.org/06/26/iconclass-demodiscussion/ (Fri, 26 Jun 2009)

Vying for a spot on the ‘absolutely last-minute proposal postings’ roster, here’s mine:

A demo and discussion of the ICONCLASS multilingual subject classification system (www.iconclass.nl)

This system might be known to students of art history and hard-core classification/library science geeks, but it has applicability to other fields in cultural heritage. Originally conceived by the art history professor Henri van de Waal in the Netherlands, it has matured over the past 40 years and is in use internationally. Over the past few years we have made several new digital editions and software tools and have applied it to diverse other fields, including textual content. In the near future we will make a brand-new ‘illustrated’ version public, and we hope to also make it a Linked Data node.

A session showing what it is and how to use it, or a more advanced discussion on thematic classification is possible, depending on feedback.

Building a better web by linking better
http://chnm2009.thatcamp.org/06/26/building-a-better-web-by-linking-better/ (Fri, 26 Jun 2009)

Here’s my original proposal:

Been thinking a lot about what it might mean to make Linked Data reliable and resilient. We can do better than just “the LOD cloud” – we can make a web of data that can survive the temporary or permanent loss of a node in the big graph or a set of data sources. Since Linked Data is a natural extension of the web, we have all the knowledge and experience of 20+ years of web and networking developments to apply to building Linked Data systems. We’ve learned a few things about proxying and caching, in particular, and those concepts should apply equally well to linked data. If you’re interested in the “web of data”, whether as a consumer of it in the course of your research or as a producer of digital humanities resources or both, I’d like to highlight some of these issues for you by demoing some work we’re doing in the realm of digital collections in libraries, and to leave you with a few ideas for making your own stuff more resilient.
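A minimal sketch of the caching idea: dereference a linked-data URI through a local file cache so an application keeps working when a node in the graph disappears. A real deployment would honor HTTP cache-control headers, add expiry, and probably sit behind a shared proxy; this just shows the shape of it:

```python
import hashlib
import os
import urllib.request

CACHE_DIR = "ld_cache"

def fetch(uri):
    """Return the RDF representation of a URI, from cache if we have it."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = os.path.join(CACHE_DIR, hashlib.sha1(uri.encode()).hexdigest())
    if os.path.exists(key):                 # cache hit: survive node outages
        with open(key, "rb") as fh:
            return fh.read()
    req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    data = urllib.request.urlopen(req).read()   # content-negotiate for RDF
    with open(key, "wb") as fh:                 # cache miss: store a copy
        fh.write(data)
    return data
```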

But then the King of Pop died.

So instead, I would like to demonstrate the shot-for-shot recreation of the famous Thriller video I made last night with an Arduino, Omeka, Processing, crowdsourcing, rectified old maps from NYPL reprinted using e-ink, and a native RDF triple store.

more on digital archives, libraries, and social networks
http://chnm2009.thatcamp.org/06/26/more-on-digital-archives-libraries-and-social-networks/ (Fri, 26 Jun 2009)

I’ve lost my thatcamp proposal (go figure), but since I’ve been concerned about the same issue for some time, I think I can piece it together again briefly here. I’m very interested in what another camper has posted here as making static archives more social by using something like Omeka. My particular focus is a digital edition I’ve created of poetry by a Dadaist poet, called In Transition: Selected Poems by the Baroness Elsa von Freytag-Loringhoven (see it here: www.lib.umd.edu/dcr/projects/baroness/ user: dcr; password: dcrstaff). The thing about the baroness is that she was super popular during the 1920s New York bohemian art scene. She published poetry in The Little Review, but she also performed on the street in outlandish dress and pretty much provoked the world at large by flaunting her sexuality, chiding men like Marcel Duchamp and William Carlos Williams for “selling out” and becoming popular, and otherwise behaving in hilariously obnoxious ways. Point is, what made her poetry the talk of the town at the time was partly due to her social network and the collaborative audience that both responded to and provided fodder for her art.

Now, someone anonymous has created a MySpace page for the baroness with over 700 friends (see www.myspace.com/dadaqueen). The interesting thing is the response her persona attracts. People upload videos and poems, and some just comment on their adoration. Very few, however, mention her poetry. So, what happens if we bring her poetry into this scene? How will this popular response change? Would it? What would people find in her poetry that may have been missed in an anthologized, normalized rendition of “A Dozen Cocktails, Please”? How might people respond to each other in this space, a space imbued with her poetry?

This brings me to my third and final point. These questions are what has provoked my interest in Omeka. But why Omeka? Why not try to start up an edition in Facebook or MySpace? What would that look like? Well, . . . good question. I have found, in my humble experience, that digital projects are in part restricted by the digital means to which one has access. That is, currently the edition I have created is on a server waiting to be incorporated into the official University of Maryland digital repository, which is supported by Fedora. Currently, the library doesn’t have an exhibit application that they use for projects like mine. (The whole library world is trying to figure this stuff out, after all.) I think incorporating Omeka (as opposed to trying to figure something out in FB or MS) would provide for the social network I’m trying to tap into, as well as a very real structure that the library community could embrace and incorporate into the existing infrastructure. Thoughts . . . ?

Museum & University: Creating Content Together
http://chnm2009.thatcamp.org/06/25/museum-university-creating-content-together/ (Fri, 26 Jun 2009)

Are others interested in discussing strategies for bringing museums and colleges/universities together to create content? In my field, art history, graduate students choose either a curatorial or a teaching path and rarely look back. Of course a professor and curator may share a particular interest and collaborate, but these are isolated instances. Wouldn’t our students and museum visitors be better served if collaborations were ongoing? Can the ease with which we now publish high-quality images, audio, video, and text be used to coax institutions beyond their cloistered walls?

Scholarly arguments in non-text-based media
http://chnm2009.thatcamp.org/06/25/scholarly-arguments-in-non-text-based-media/ (Fri, 26 Jun 2009)

I’d like to meet with others who want to discuss the publication end of digital humanities. I’m particularly interested in how scholarly argumentation can be represented in or strengthened by the use of non-text-based  media. What are the possible bearers of argumentation? How exactly does this work outside the traditional essay format? I’m an analytic philosopher who has done some work on the representation of (philosophical) arguments in film and I’m thinking that some of the analysis done in this context might also apply to questions such as:  Do articles in Vectors Journal offer arguments? Can a map mash-up offer an argument? Can a series of images offer an argument? Are there limitations to the sorts of arguments that non-text-based media can offer? Are non-text-based media better than the traditional essay at presenting certain types of arguments?

While a starting assumption of mine is that scholarly communication in the humanities involves at a minimum the presentation of arguments, perhaps this is also something that could be opened for discussion.

I have some ideas on reasonable answers to these questions based on the analogy with argumentation in film and on recent discussion at UCLA’s Mellon Seminar and DH09, but my thoughts haven’t gelled to the point that I feel comfortable saying “I want to present on this topic.” — So, anyone want to join me for a discussion?

modelling subjectivity in space & time
http://chnm2009.thatcamp.org/06/25/modelling-subjectivity-in-space-time/ (Thu, 25 Jun 2009)

This is just a tardy post to say that I’d love to see this year’s THATcampers engage seriously with the notion of subjectivity in spatial and temporal visualization.  I’m picking up here on ideas by Amanda and Brian, and also on a series of conversations I’ve been having this week at the annual Digital Humanities conference (hashtag #dh09, for the few THATcamp Twitterati who haven’t already experienced the deluge!).

At DH09, I presented one particular cultural artifact that has become a touchstone for me in thinking about the geospatial tools and services we’re building at the UVA Scholars’ Lab.  This is a little journal from 1823, in the private (open-access!) map collection of David Rumsey.  I hope to publish something on it in the coming year (so be a sport and let me share my find with you without worrying about getting scooped!).

It’s Frances Henshaw’s book of penmanship, a wildly imaginative collection of spatialized textual representations of states in 1820s America, together with hand-drawn, -lettered, and -colored maps. If you check it out, you’ll see what I mean and why the subjective and aesthetic qualities of the document are so interesting.  I’d be happy to give a brief guided tour at THATcamp as well.

I want our analytical tools for spatial information to become attuned enough to the interpretive aims of humanities scholars to help us say something about the Henshaw document.  What do we need to articulate and know in order to get there?  The Scholars’ Lab will be hosting some conversations through SCI (the Scholarly Communication Institute) and our NEH-funded Institute for Enabling Geospatial Scholarship, but — as I found last year — there’s no place like THATcamp!

That’s space.  Then there’s the subjective dimension of time.  I never go to a conference without having at least one person ask me about the Temporal Modelling Project, which was a prototyping project I undertook when I was a grad student, in collaboration with Johanna Drucker.  Temp Mod aimed to create a fluid kind of sketching environment in which humanists could model time and temporal relations as they interpreted them in their objects of study.  So you could map time in, say, a Faulkner novel, and concentrate on those subjective qualities of temporality that particularly interest humanists: moments of disruption, anticipation, regret, catastrophe, joy — and create graphical expressions of moments that seem to speed by or drag on.  Out of that iterative sketching, you’d get a formal data model you could use to encode (primarily, we imagined) texts in XML.

Temporal Modelling lost its (bizarre) corporate sponsorship unexpectedly after 9/11 and never really recovered, but the intellectual work was good and I think the time is ripe to consider these ideas again — especially in the broader context of geo-temporal visualization for the fuzzy, inflected, madcap, subjective humanities. Could we look at projects like Temp Mod and artifacts like the Henshaw journal to open a discussion at THATcamp?

Visual Art and DH
http://chnm2009.thatcamp.org/06/25/visual-art-and-dh/ (Thu, 25 Jun 2009)

I expressed two ideas in my proposal, both of which have been expressed in some form or another by others.

One, I am interested in the tools people use for digital projects and why they use them. The reason for this is that both I and the programmer where I work are fairly new to our positions, and sometimes I feel like we are grasping at straws, recreating what others may have already figured out. I suspect that this may not take a session of its own, but will come out of talking to people and hearing about others’ projects.

The other thing I suggested was this:

I am really interested in visual (fine) art and the digital humanities. There was a session last year on fine art and the DH, which was only attended by myself and two others, but I had a great time. Since then, I’ve thought more about how fine art and art history might be supported by DH. I also blogged about the possibility of an artist in residence at a DH center, perhaps supported by the Fellowship at Digital Humanities Centers grant. I would love to hear what others think on this topic and would be very willing to do a little overview of what’s out there right now.

I’m not so sure about the overview part, partly because I have not had much time to research this in depth, and partly because my cursory look hasn’t turned up much. There seems to be a split between fine art and digital humanities centers. David Staley’s post, for instance, talks about a visually oriented humanities project, but the work (and the title of the post, even!) makes me think “artwork” and “artist.” I find it really interesting that just about the same exact work could be “digital humanities” or “fine arts” depending on who is doing it. The point was driven home during Lev Manovich’s plenary speech at DH ’09. Manovich is a professor in the Visual Arts Department, and the kind of things his lab does could be considered both fine art and digital humanities. I’m interested in talking about the overlap, as well as how to involve artists in DH, not only in the areas where they have been involved so far (mainly web design) but also in more theoretical and conceptual roles such as visualization.

I’m not sure if this could stand on its own, or if it should be combined with a more general session on visualizations (which also seemed to be a hot topic at DH ’09).

mining for big history: uncouth things i want to do with archives
http://chnm2009.thatcamp.org/06/24/mining-for-big-history-uncouth-things-i-want-to-do-with-archives/ (Wed, 24 Jun 2009)

woohoo THATcampers!  i’m so psyched to hang out with you.  actually, i need to learn from your enormous brains…

a major theme of my graduate course in digital history at the u of c was the opportunities lying in the unprecedented scale of access and manipulability.

historians, for instance, typically train to write 20- to 40-year studies, at most 100-year histories; they frequently teach by the century, at most the five-hundred-year time period.  proposal: digital archives, as a revolution in access, radically open the horizons for legitimate big history of long-term trends.

ideas for sessions:

* how would you text mine a 500-yr history?  how bout a 5000-yr history?  many of the tools for text-mining (cf philologic) look narrower and narrower within a particular text; how could these tools be used to crunch many texts across large time periods (off the top of my head: graph for me, computer, the top verbs used around the word “eye” in medical texts since Greece …  )?  how can timelines more usefully render the results visual (and interactive!)?  (see the sketch after this list)

* how bout images.  here we’re talking about 200 years.  what can you do with 1 billion photographs?  what happens when you automagically photosynth (livelabs.com/photosynth/) the entire nineteenth- to twentieth-century city of London?  what about “averaging” photos: www.faceresearch.org/demos/average ?  what does the average house look like, decade by decade?  what does an average coal miner look like?

* how bout maps.  doug knox (hi doug!) and i have been talking with the newberry map librarians about how you’d collate atlases of place names, travelers’ diaries, and maps to annotate an interactive atlas of chicago where any given block could be peeled back, year by year.  how would you make a 300-year thick map of the american west?
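a small sketch of the “top verbs around a keyword” query from the first bullet, for a single text, using NLTK (requires its “punkt” and “averaged_perceptron_tagger” data packages); scaling it across centuries of texts is the open problem:

```python
from collections import Counter
import nltk

def verbs_near(text, keyword="eye", window=5):
    """Count verbs within `window` tokens of each occurrence of `keyword`."""
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)            # (word, part-of-speech) pairs
    hits = Counter()
    for i, (word, _) in enumerate(tagged):
        if word == keyword:
            for w, tag in tagged[max(0, i - window): i + window + 1]:
                if tag.startswith("VB") and w != keyword:  # VB* = verb tags
                    hits[w] += 1
    return hits.most_common(10)

if __name__ == "__main__":
    sample = "The physician examined the eye and observed the swelling."
    print(verbs_near(sample))
```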

Digital Archive
http://chnm2009.thatcamp.org/06/23/digital-archive/ (Tue, 23 Jun 2009)

Just wanted to post briefly to update my application essay. I’m a faculty member at Brown and have just returned from a semester at the University of Melbourne, Australia, where a colleague and I co-taught an American history honors seminar called “American Publics.” Next year at this time, we will teach the course at Brown AND at Melbourne and link the students digitally.

That’s what I wrote in the application; now I have to figure out what I meant when I said “link the students digitally.” I explored and rejected wikis (the two campuses assign different kinds of writing, and the students have different stakes in the writing) and existing social networking tools. I think what we want to do is design a lightweight, experimental “archive” to which students can upload texts (scanned documents, websites, images, sound files) to share across campuses. The new Center for Digital Scholarship at the Brown University Library will build a password-protected web environment (using PHP and Solr) within which students may upload, describe, and annotate digital resources. Students will be able to search and browse their resources and arrange them into sets based on catalog records and/or student-designed taxonomic tags. The interface would create XML records for submitted assets and then post that data to the index. We have done this for other student research projects at Brown and plan something easily portable that could also be used or hosted at the University of Melbourne. We want to make this project experimental and quickly set up, so that we can change and modify it as we go.
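The project will use PHP, but as a language-neutral sketch of the indexing step, here is the same idea in Python: build an XML record for an uploaded asset and post it to a Solr core’s update handler. The field names and core URL are placeholders, not the project’s actual schema:

```python
import urllib.request
from xml.sax.saxutils import escape

SOLR_UPDATE = "http://localhost:8983/solr/archive/update?commit=true"  # placeholder

def index_asset(asset_id, title, tags):
    """Build a Solr <add><doc> record for one asset and post it to the index."""
    fields = [("id", asset_id), ("title", title)] + [("tag", t) for t in tags]
    doc = "".join(f'<field name="{n}">{escape(v)}</field>' for n, v in fields)
    xml = f"<add><doc>{doc}</doc></add>".encode("utf-8")
    req = urllib.request.Request(SOLR_UPDATE, data=xml,
                                 headers={"Content-Type": "text/xml"})
    return urllib.request.urlopen(req).read()

if __name__ == "__main__":
    index_asset("asset-001", "Scanned broadside, 1848",
                ["american publics", "print culture"])
```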

We would hope to be able to share such a tool once it’s designed and tested, and would love to hear thoughts about what we should and shouldn’t include, and any possible challenges you could foresee.

All that said, I also teach a graduate course called “Digital Scholarship” for humanities and social science students and look forward to a discussion of what kind of tools, competencies and knowledge graduate students need.

Using web tools to let students reach the public
http://chnm2009.thatcamp.org/06/23/using-web-tools-to-let-students-reach-the-public/ (Tue, 23 Jun 2009)

I want to learn about how to use new web-interaction tools for teaching classes that have a public product. Students in the Brown public humanities program do exhibitions and programs for the public, and it would be good to add web outreach projects to those. A few of the tools that I’ve played with, but want to know more about:
•    Crossroads (shared markup of documents)
•    Voicethread (commenting on images, words, video)
•    Dipity  (creation of timelines)
•    Omeka  (collections)
•    Flickr (images)
And I’m sure there are others… I’d like to know more about them, especially tools that can be combined with oral history projects.
Several challenges here…
One is doing these as group projects – how to get a class, or several small groups from a class, to work together on these.
Another is how to automate the process of moving between these tools and more traditional databases. Can we, for example, pull pictures from a historical society’s PastPerfect system, put the pictures onto Flickr and the objects in Omeka, and display a timeline on Dipity, without doing it all by hand? Can we take a community-curated collection from Flickr and move it into Omeka, or into a library system with better long-term storage, metadata, control, etc., without having to re-enter the data that’s there, and continue to collect data from the public and capture it long-term?
Lots of questions!
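To make the automation question concrete, here is a pipeline skeleton with purely hypothetical stub functions; none of them correspond to real PastPerfect, Flickr, or Omeka APIs. It only shows where the hand work could become a scripted, repeatable batch job that carries the metadata along:

```python
def export_from_pastperfect(db_path):
    """Hypothetical: yield (image_path, metadata) records from an export."""
    raise NotImplementedError

def upload_to_flickr(image_path, metadata):
    """Hypothetical: push the image to Flickr; return its new Flickr URL."""
    raise NotImplementedError

def add_to_omeka(metadata, flickr_url):
    """Hypothetical: create the Omeka item, pointing at the Flickr copy."""
    raise NotImplementedError

def run_pipeline(db_path):
    # One pass moves every record through all three systems consistently,
    # instead of re-entering the same data by hand at each step.
    for image_path, metadata in export_from_pastperfect(db_path):
        url = upload_to_flickr(image_path, metadata)
        add_to_omeka(metadata, url)
```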

Steve

An actual digital revolution?
http://chnm2009.thatcamp.org/06/23/an-actual-digital-revolution/ (Tue, 23 Jun 2009)

I’m very new to this kind of community, but I’ve been struck by how often a rhetoric of “digital revolution” versus a “conservative” establishment has been used in these posts. I wonder if there should not be time to discuss what appears to be a set of digital revolutions actually taking place, such as the current crisis in Iran, the censorship programs in China and Burma, etc. It’s striking to me how a technology like Twitter, widely derided in the US as self-indulgent narcissism, has come to play a central role in disseminating ideas and information in situations such as the Bombay bombings and the current Iranian crisis. For me, the humanities must pay attention to developments such as these in making claims for the significance of networked critical practice. Or is this so obvious a thought that it’s taken for granted in digital circles, in which case I apologize!?

Travel practicalities?
http://chnm2009.thatcamp.org/06/23/travel-practicalities/ (Tue, 23 Jun 2009)

I know there’s been a bit of discussion back and forth about the best ways to get to and from GMU, but I thought I’d try to get it all together in a central location. I’m told by the folks at the Hampton Inn (where I’ll be staying, and I’m sure there are others as well) that it’s best to take the Orange Line (presuming everything is more or less normal on it after yesterday’s news) to Vienna/GMU and take a cab.

I’m sure there will be a few people gathering in the lobbies of both hotels Saturday and Sunday mornings – will people be sharing taxis to the campus, or is it walkable?  Google Maps offers a bit of a zigzag walking path and I wondered if there was a short cut.

I saw that the shuttle to GMU from the Metro is normally reserved for students – do they let conference attendees aboard?

Anyway, I’m just looking to get some advice from locals – I’m sure others have similar questions.

Thanks!

Crowdsourcing & outreach
http://chnm2009.thatcamp.org/06/22/crowdsourcing-outreach/ (Mon, 22 Jun 2009)

I mentioned briefly in my original post that we have a good deal of 19th-century textual material in what are now fairly obscure German dialects; we have scanned the originals and had parts translated, but we have two further goals with this project: 1) encourage more translations of other related material and 2) create a resource/destination for researchers to debate and discuss the materials.  (A further goal is to start digitizing our snazzy Paracelsus collection, once we have this in place as a test case – but that’s down the road).

We have the scanning and digital object handling well underway (or would, if our server upgrade were finally finished, but that’s another story – I may not be able to show off much from this collection, but can demonstrate how we are set up with some other materials), but we are looking for inspirations and ideas for the other two goals.  Since we’re looking for low IT involvement, creating a blog highlighting some of the materials and encouraging discussion in the comments is one idea, but we’d like to avoid creating an additional digital ‘space’ that we’d require users to navigate to (especially since we already have a blog for our collections in general).

Is anyone using a more interactive plugin (or similar more modular feature) to create spaces for discussion in a way that’s still tied to the digital object?  One of our concerns is that there may be a steep IT learning curve for a high percentage of scholars in this particular subfield and we’d like to make sure they all feel welcomed, so ease of use is key.  We are also looking to use the project to reach out to other scholars who might not currently be aware of the materials (likely language scholars and historians in related fields) and feel pretty confident about putting that plan in place once we know what sort of sandbox we can offer them.

Anyway, I would love to hear what suggestions everyone has and am definitely looking forward to seeing some examples of what everyone else has done.

Digital Publishing – Getting Beyond the Manuscript
http://chnm2009.thatcamp.org/06/22/digital-publishing-getting-beyond-the-manuscript/ (Mon, 22 Jun 2009)

Here is the original submission I made to THATCamp followed by some additional background ideas and thoughts:

Forget the philosophic arguments; I think most people at THATCamp are probably convinced that in the future scholarly manuscripts will appear first in the realm of the digital. I am interested in the practical questions here: What are born-digital manuscripts going to look like, and what do we need to start writing them? There are already several examples, Fitzpatrick’s Planned Obsolescence and Wark’s Gamer Theory among them, but I want to think about what the next step is. What kind of publishing platform should be used (is it simply a matter of modifying a content management system like WordPress)? Currently the options are not very inviting to academics without a high degree of digital literacy. What will it take to make this kind of publishing platform an option for a wider range of scholars? What tools and features are needed (beyond, say, CommentPress): something like a shared reference manager, or at least an open API to connect these digital manuscripts (Zotero)? Maybe born-digital manuscripts will just be the beta version of some books that are later published (again, e.g., Gamer Theory)? But I am also interested in thinking about what a born-digital manuscript can do that an analog one cannot.

Additional Thoughts:

So I should start by saying that this proposal is a bit self-serving. I am working on “a book” (the proverbial tenure book), but writing it first for the web. That is, rather than releasing the manuscript as a beta version of the book online for free, or writing a book and digitally distributing it, I want to leverage the web to do things that cannot be accomplished in manuscript form. It is pretty clear that the current academic publishing model will not hold. As I indicated in the original proposal above, I think that most participants at THATCamp are probably convinced that the future of academic publishing is in some ways digital (although the degree to which it will be digital is probably a point of difference). But in working on this project I have come to realize that the tools for digital self-publishing are really in the early stages, a pre-alpha release almost. Yes, there are options, primarily blogs, but for the most part these tend to mimic “book-centered” ways of distributing information. To be sure, there are examples of web tools that break from this model, namely CommentPress, but I am interested in thinking about what other tools might be developed and how we can integrate them. And at this point I think you have to be fairly tech savvy, or have a “technical support team,” to be able to do anything beyond a simple blog or digital distribution of a manuscript (say, as a downloadable .pdf). For me, one of the early models we can look to is McKenzie Wark’s Gamer Theory, but he had several people handling the “tech side.” I can get the tech help to do the things I cannot on my own, but it seems pretty clear that until the tools are simple and widely available, digital publishing will either remain obscure or overly simple/conservative (just a version of the manuscript).

So, what tools do we need to be developing here? Should we be thinking about tools, or about data structures first and then developing tools around them? (I realize this is not an either/or proposition.) I am imagining something like WordPress with a series of easy-to-install plugins that would open up web publishing to a much wider range of scholars. Perhaps a "publisher" could host these installs and provide technical support, making it even easier for academics. I have a fairly good idea of what I personally want for my project, but I am interested in thinking about, and hearing about, what other scholars, particularly those in other disciplines, would need and want.

How to make Freebase useful in the digital humanities? http://chnm2009.thatcamp.org/06/19/how-to-make-freebase-useful-in-the-digital-humanities/ Sat, 20 Jun 2009 00:07:07 +0000

I would like to lead a session on the application of Freebase.com to the humanities. Freebase is an "open database of the world's information," with an API that allows for integration with other applications (such as Zotero). I've been experimenting with Freebase in the realm of government data, specifically to create PolDB, an "IMDB for politicians" (though my progress has been meager so far). I would like to share my experiences on that front, speculate on the usefulness of Freebase for applications in the humanities (particularly art history), and foster a discussion about the application of other "semantically oriented" techniques beyond Freebase.
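
For those who haven't tried it, the mqlread service speaks JSON over plain HTTP. The Java sketch below shows roughly what a query for a handful of politicians looks like; the endpoint and query envelope are written from memory of the 2009-era service, so treat this as illustrative rather than authoritative:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    // Rough sketch of an MQL read against Freebase's 2009-era API.
    public class FreebaseQuery {
        public static void main(String[] args) throws Exception {
            // MQL is query-by-example JSON: null marks the fields to fill in.
            String mql = "{\"query\":[{\"type\":\"/government/politician\","
                       + "\"name\":null,\"limit\":5}]}";
            URL url = new URL("http://api.freebase.com/api/service/mqlread?query="
                              + URLEncoder.encode(mql, "UTF-8"));
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);  // raw JSON envelope with the results
            }
            in.close();
        }
    }

The appeal for something like PolDB is that the query is also the shape of the answer: ask for five politicians with null names, get five politicians with their names filled in.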

Teaching Digital Archival and Publishing Skills http://chnm2009.thatcamp.org/06/12/149/ Sat, 13 Jun 2009 02:57:53 +0000

I’ve been putting this off for a while now, especially after seeing some of the really impressive projects other campers are working on.  My job is not research-oriented; much of what I do revolves around operationalizing and supporting faculty projects in the History Department where I work.  What follows is a rather long description of one such project, in which students, in the context of a local history research seminar, are tasked with digitizing archival items, cataloging them using Dublin Core, and creating Omeka exhibits that reflect the findings of their traditional research papers.  Although the students are typically Education or Public History majors, they are expected to carry out these tasks to standards that can be challenging even for professional historians and librarians.

I’ve written about some of the practical challenges in projects like this here.  For a full description of the project at hand, click through the page break below.  What intrigues me right now are the questions such projects raise, particularly those relating to content quality and presentation.

What are realistic expectations for metadata implementation?  Is enforcing metadata standards even appropriate in the context of humanities education?  Many trained librarians aren’t even competent or consistent at cataloging, so how can we expect more from undergraduate History students?  It’s not that they don’t gain from the exercise (whether they like or know it or not); it’s just that poor metadata might be worse than none.  Information architecture is another challenge, even when students have no role in the initial site design.  They can still confuse the navigation scheme and decrease usability through poorly organized contributions.  Likewise, the content students create is not always something we want to keep online, for any number of reasons.

Where do you draw the line between a teaching site (that is, a site designed and used for training projects) and one which is distinctly for use by the broader public?  The line is very blurry to me, but I think how you answer that question dictates what you are willing to do and what you end up with.  We really want to create something that is generated entirely by students but has a life outside the classroom.  Ultimately, though, we will make decisions that best serve our instructional goals.  I think the value is in the process, not the result (though it would be nice for them to match up).  We have done some very ambitious and high-quality projects working with small, dedicated teams, but working with large class groups has led to some interesting and unforeseen problems.  I wonder if anyone has any ideas about how we might replicate that small-team experience and quality on this significantly larger scale.

Has anyone out there done a similar project?  I’d love to hear your experiences and suggestions on pedagogy, standards, or documentation.

I think this fits in to some degree with Jim Calder’s post and Amanda French’s post, among others (sadly, I have yet to read all the posts here, but I will get to it soon and maybe hit some people up in the comments).

OVERVIEW
This past semester, the Center for Public History and Digital Humanities at CSU has been training teachers, interns, and undergraduate students in the History Department to use Omeka as a tool for exploring archives, sharing research, and curating personal exhibits.  Students in our Local History Seminar are trained in archival research, image handling and digitization, and archival description and subject cataloging, including the use of Dublin Core metadata.  In the interest of harnessing student labor for the benefit of the library, and protecting heavily used artifacts from further deterioration, we have tightened the process so that each participant’s labor may yield results that can be directly transferred to the library’s digital archive, Cleveland Memory, which runs on the CONTENTdm platform.  Through trial and error, we have devised a barebones metadata plan, set digital image processing standards, and crafted a workflow that optimizes the time and labor invested by students, faculty, and department and library staff.  We hit a few bumps along the way, but have plans to revise our process next semester.

EDUCATIONAL RATIONALE
Holistic experience in history-making, from archival process to research to public exhibition

  • Creation and collection of student-generated content (images, maps, charts, exhibits, etc.)
  • Hands-on research in physical and digital archival collections
  • Image processing (digitizing physical artifacts according to locally-defined best practices)
  • Archival description using common metadata standards (Dublin Core)
  • Increased awareness of the organization and use of metadata in libraries and archives, which may lead to greater use of those collections and more effective research
  • Experience using online archival software / publishing platform (Omeka)
  • Curating thematic local history exhibits based on area of research
  • We believe this increases readiness for employment, teaching, and continued education.

PROCESS
Students choose a research topic in local history, most often a neighborhood, park, district, or institution/building of historical interest.  Students are required to write a 15-page analytical research paper based on primary-source research.  They collect documents and images from available archival resources, including both digital and physical artifacts.  Items are uploaded to an Omeka installation (csudigitalhumanities.org/exhibits) and described using Dublin Core and local metadata standards.  Non-digital items are digitized according to processing guidelines set by CSU Special Collections.  Using the items they collect, and the content of their research papers, students use Omeka to curate an interpretive exhibit around their topic, which they present to the class at the end of the semester.  Professors spend a limited amount of class time providing ongoing instruction and guidance in technical matters, but generally focus on content.

As Center staff, I met with the class for hands-on sessions in Omeka use and image digitization, and created handouts and an online student guide (csudigitalhumanities.org/exhibits/guide) containing instructions for using Omeka, digitizing items, and employing metadata standards.  The guide contains general rules for Dublin Core and, as the first semester progressed, evolved to also address common mistakes and questions.  I track and enforce quality control on new items, and use the MyOmeka plug-in to leave administrative notes on each record containing instructions for correcting errors, as well as other suggestions for improvement.  These notes can be seen only by students and administrators who are logged in with the single shared username.  At the end of the semester, items and exhibits are graded and vetted to determine which will remain online.  Items which contain complete metadata records and meet copyright and quality standards are exported into the Cleveland Memory collection.  The rest are deleted.  High-quality exhibits remain public; others are deleted or made private.

RESULTS
Despite the extensive documentation, administrative notes, classroom instruction, and my availability for one-on-one consultation, the results of our first run were decidedly mixed. About one-third of students met the expectations for overall quality; another third came very close but made a few significant mistakes.  Common mistakes included use of copyright-protected items, grammar and syntax errors in metadata, improper use of controlled vocabulary terms, use of an editorial voice in item descriptions, and image processing errors (low resolution, poorly cropped or misaligned images, etc.).  Others failed to translate their research into well-crafted exhibits, even though their in-class presentations were almost unanimously excellent.

From an administrative perspective, we also have some work to do to streamline the process.  Some of our challenges involved limitations with the Omeka software, which was not necessarily designed for such projects.

We gave comments via the MyOmeka plug-in, which requires students to log in and find their items via the public view.  Once they find an item in need of correction, they must return to the admin view to make corrections, and cannot see the comments without again returning to the public view.  At least one student complained about this cumbersome process.  It was equally difficult for administrators.  While printing out item records and adding handwritten notes would have been ideal for students and instructors, our workflow and other commitments dictated that this was not possible.

At the end of the semester, we began the vetting process.  I went through and reviewed each item, tagging it with “keep,” “revise,” “remove,” “rights,” or “cmp.”  “Rights” was assigned to items whose copyright status was uncertain.  “CMP” was assigned to items already available via the Cleveland Memory project.  The tags were useful in quickly identifying the status of each item in the collection, but moving beyond that point has proven problematic.  For one, the University dictates that we keep student work for up to six weeks after the end of the semester.  Were the items and exhibits graded as a final exam, we would need to keep them for a full semester (thankfully, the physical research paper was “the final” for this course).  Additionally, there is no easy way to batch delete or batch edit items in Omeka.  Again, this is not necessarily a shortcoming in Omeka’s architecture, just a limitation of our project design.  Due to each of these issues, we are making items and exhibits public or not public according to our vetting criteria.  Deletions and revisions will have to wait at least six weeks.

We have decided to postpone plans for migration to Cleveland Memory until we can address some of the problems encountered in our trial run.  We are optimistic that we can improve our instructional and administrative processes next semester, but that will require some new approaches and answers to some of the questions that emerged the first time around.

NEW APPROACHES

Next semester we will use the Contribution plug-in to collect items.  This will limit confusion about which fields to fill in, and will also allow us to track submissions more effectively.  Because we still want students to have some experience with metadata standards, and need to collect some additional information for later migration to the Cleveland Memory repository, we have customized the plug-in to include some additional fields.

To solve the issues of grading and revision, as well as required retention, we will use the ScreenGrab plug-in for Firefox, which allows for the capture of complete web pages.  Students will save each item record and exhibit page in JPEG or PNG format, adding them to a printable document that they will submit for review as items and exhibits are added.

We are still trying to figure out a way to modify and delete items in batches.  Since most mistakes involved improper use of controlled subject terms, it would be nice if we could identify a recurring term and edit it in a way that would cascade across the entire installation (e.g., locate all instances of the incorrect subject “Terminal Tower” and replace each with “Union Terminal Complex (Cleveland, Ohio)”).  This would likely involve a major change in Omeka, which, to my knowledge, does not collate Subject fields in this way.  Batch deletion for superusers, on the other hand, might be easier to accomplish.  Any thoughts?  One possibility is sketched below.
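
As a starting point for discussion, here is a rough Java/JDBC sketch of the cascading subject fix, applied directly to the underlying MySQL database rather than through Omeka itself.  The table and column names (omeka_element_texts, text) are assumptions, not documented Omeka API, so verify them against an actual install, and back up the database before trying anything like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Hypothetical database-level fix for a recurring bad subject term.
    public class FixSubjectTerms {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");  // Connector/J on the classpath
            Connection db = DriverManager.getConnection(
                    "jdbc:mysql://localhost/omeka", "user", "password");
            // In practice you would also restrict the UPDATE by the Subject
            // field's element_id, so that matching Title or Description text
            // is left alone.
            PreparedStatement update = db.prepareStatement(
                    "UPDATE omeka_element_texts SET text = ? WHERE text = ?");
            update.setString(1, "Union Terminal Complex (Cleveland, Ohio)");
            update.setString(2, "Terminal Tower");
            System.out.println(update.executeUpdate() + " metadata entries updated");
            db.close();
        }
    }

Even if the schema details differ, the shape of the solution (one parameterized UPDATE per bad term) is what makes the fix cascade across the whole installation at once.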

Students will receive more comprehensive training.  Based on common mistakes and frustrations, we will adjust instruction and documentation accordingly.

Easy readers http://chnm2009.thatcamp.org/06/10/easy-readers/ Thu, 11 Jun 2009 04:05:55 +0000

At THATCamp ’08 I learned how to draw a smiley face with a few geometric programming commands.

Dan Chudnov demonstrated how to download Processing, a Java-based environment intended for designers, visual artists, students, and others who want to create something without being full-time professional programmers. Dan’s purpose was to show librarians, scholars, artists, and free-range humanists that getting started with simple programming isn’t as hard as people sometimes think. You don’t have to be a computer scientist or statistician to develop skills that can be directly useful to you. Dan posted a version of what he was demonstrating with the tag “learn2code.”
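
That smiley face really did take only a handful of commands.  A Processing sketch along these lines (my own reconstruction, not Dan’s actual code) is the whole program:

    void setup() {
      size(200, 200);                  // open a 200x200 pixel window
      background(255);                 // white canvas
      fill(255, 220, 0);               // yellow
      ellipse(100, 100, 160, 160);     // the face
      fill(0);                         // black
      ellipse(70, 80, 15, 15);         // left eye
      ellipse(130, 80, 15, 15);        // right eye
      noFill();
      strokeWeight(3);
      arc(100, 110, 80, 60, 0, PI);    // the smile: lower half of an ellipse
    }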

I’m not a trained programmer, was not new to programming altogether, but was new to Processing, and for a while I didn’t have much reason or time to do more with it. But last winter I found myself highly motivated to spend some of my spare time making sense of tens of thousands of pages of text images from the Internet Archive that were, for my purposes, undifferentiated. The raw, uncorrected OCR was not much help. I wanted to be able to visually scan all of them, start reading some of them, and begin to make some quick, non-exhaustive indexes in preparation for what is now a more intensive full-text grant-funded digitization effort (which I will also be glad to talk about, but that’s another story). I wanted to find out things that just weren’t practical to learn at the scale of dozens of reels of microfilm.

Processing has turned out to be perfect for this. It’s not just good for cartoon faces and artistic and complex data visualizations (though it is excellent for those). It is well suited to bootstrapping little scraps of knowledge into quick cycles of gratifying incremental improvements. I ended up cobbling together a half-dozen relatively simple throwaway tools highly customized to the particular reading and indexing I wanted to do, minimizing keystrokes, maximizing what I could get from the imperfect information available to me, and efficiently recording what I wanted to record while scanning through the material.

Having spent plenty of hours with the clicks, screeches, and blurs of microfilm readers, I can say that being able to fix up your own glorified (silent) virtual microfilm reader with random access is a wonderful thing. (It’s also nice that the images are never reversed because the person before you didn’t rewind to the proper spool.) And immensely better than PDF, too.
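
To make that concrete, the core of such a viewer fits on a page.  The Processing sketch below is a toy, not one of my actual tools, and the pages/page-N.jpg file layout is invented for illustration, but it shows how little code “random access microfilm” requires:

    // Page through a folder of scanned images with the arrow keys.
    int page = 0;
    int lastPage = 9999;   // set to the real page count
    PImage img;

    void setup() {
      size(800, 1000);
      loadPage();
    }

    void loadPage() {
      img = loadImage("pages/page-" + page + ".jpg");
    }

    void draw() {
      background(0);
      if (img != null) image(img, 0, 0, width, height);
    }

    void keyPressed() {
      if (keyCode == RIGHT && page < lastPage) { page++; loadPage(); }
      if (keyCode == LEFT  && page > 0)        { page--; loadPage(); }
    }

From a skeleton like this, each of my throwaway tools was mostly a matter of adding a few more keystrokes: one to record a page number in an index file, one to jump ahead fifty pages, and so on.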

At THATCamp I would be glad to demonstrate, and would be interested in talking shop more generally about small quasi-artisanal skills, tools, and tips that help get stuff done — the kind of thing that Bill Turkel and his colleagues have written up in The Programming Historian, but perhaps even more preliminary. How do you get structured information out of a PDF or word processing document, say, and into a database or spreadsheet? Lots of “traditional humanists,” scholars and librarians, face this kind of problem. Maybe sometimes student labor can be applied, or professional programmers can help, if the task warrants and resources permit. But there is a lot of work that is big enough to be discouragingly inefficient with what may pass for standard methods (whether note cards or word processing tools), and small enough not to be worth the effort of seeking funding or navigating bureaucracy. There are many people in the humanities who would benefit from understanding the possibilities of computationally-assisted grunt work. Like artificial lighting, some tools just make it easier to read what in principle you could have found some other way to read anyway. But the conditions of work can have a considerable influence on what actually gets done.
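
A first pass at the PDF-to-spreadsheet problem, for instance, can be surprisingly small.  The Java sketch below uses Apache PDFBox (a real library) to pull the text out, then applies a made-up column rule, treating runs of two or more spaces as column breaks; every real document needs its own rule, and proper CSV quoting is omitted, so this is a template rather than a solution:

    import java.io.File;
    import java.io.PrintWriter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    // Extract PDF text and split each line into rough spreadsheet columns.
    public class PdfToCsv {
        public static void main(String[] args) throws Exception {
            PDDocument doc = PDDocument.load(new File("source.pdf"));
            String text = new PDFTextStripper().getText(doc);
            doc.close();
            PrintWriter csv = new PrintWriter("out.csv", "UTF-8");
            for (String line : text.split("\r?\n")) {
                // invented rule: two or more spaces mark a column boundary
                csv.println(String.join(",", line.trim().split(" {2,}")));
            }
            csv.close();
        }
    }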

More abstractly and speculatively, it would be interesting to talk about efficiencies of reading and scale. Digital tools are far from the first to address and exacerbate the problem that there is far more to be read and mapped out than any single person can cope with in a lifetime. Economies of effort and attention in relation to intellectual and social benefit have long shaped what questions can be asked and do get asked, and to some extent what questions can even be imagined. Digital tools can change these economies, although not in deterministically progressive ways. Particular digital applications and practices have all too often introduced obfuscations and inefficiencies that limit what questions can plausibly be asked at least as much as microfilm does. Which is why discussions even of low-level operational methods, and their consequences, can be of value. And where better than THATCamp?

From History Student to Webmaster? http://chnm2009.thatcamp.org/06/10/from-history-student-to-webmaster/ Wed, 10 Jun 2009 20:25:54 +0000

Here’s my original proposal (or part of it at least):

“I would like to discuss the jarring, often difficult and certainly rewarding experiences of those, like myself, who have somehow managed to make the leap from humanities student to digital historian/webmaster/default IT guy without any formal training in computer skills.  While I am hoping that such a discussion will be helpful in generating solutions to various technical and institutional barriers that those in this situation face, I am also confident that meeting together will allow us to better explain the benefits that our unique combination of training and experience bring to our places of employment.  I would also be very interested to see if we could produce some ideas about how this group could be better supported in our duties both by our own institutions and through outside workshops or seminars.”

I’m not sure if this is the right place for this discussion, as I’m guessing that many campers may not share these difficulties.  However, if enough people are interested, I think I’ll go with it.  Related to this discussion, I would also like to talk about people’s experiences or recommendations for resources that could be useful to digital historians in training, as well as better ways to get our message about web 2.0, open source technologies, freedom of information, etc. to our colleagues.

Anyway, let me know what you all think.

Omeka playdate open to THATCampers http://chnm2009.thatcamp.org/06/10/omeka-playdate-open-to-thatcampers/ Wed, 10 Jun 2009 17:57:30 +0000

The Friday before THATCamp (June 26th) we’ll be holding an Omeka “playdate” that’s open to campers and anyone else who would like to attend. Interested in learning more about Omeka? Already using Omeka and want to learn how to edit a theme? Want to build a plugin, or have advanced uses for the software? This workshop is a hands-on opportunity to break into groups of similar users, meet the development and outreach teams, and spend part of the day hanging around CHNM.

We’ve increased the number of open spots, and would love to see some THATCampers sign up as well. If you plan on attending, please add your name to the wiki sign-up.  Remember to bring your laptop!

Standards http://chnm2009.thatcamp.org/06/08/standards/ Mon, 08 Jun 2009 21:22:08 +0000

Here’s my original proposal for THATCamp. The question and issues I’m interested in entertaining dovetail nicely, I think, with those that have been raised by Sterling Fluharty in his two posts.


The panel at last year’s THATCamp that I found the most interesting was the one on “Time.” We had a great discussion about treating historical events as data, and a number of us expressed interest in what an events microformat/standard might look like. I’d be interested in continuing that conversation at this year’s THATCamp. I know Jeremy Boggs has done some work on this, and I’m interested in developing such a microformat so that we can expose more of the data in our History Engine for others to use and mashup.

While I’d like to talk about that particular task, I’d also be interested in discussing a related but more abstract question that might be of interest to more THATCampers. Standards make sense when dealing with discrete, structured, and relatively simple kinds of data (e.g., bibliographic citations, locations), but I wonder whether much of the evidence we deal with as humanists requires so much individual interpretation to turn into structured data that developing interoperability standards might not make much sense. I’m intrigued by the possibility of producing data models that represent complex historical and cultural processes (e.g., representing location and time in a way that respects and reflects a Native American tribe’s sense of time and space). An historical event doesn’t seem nearly that complicated, but even there I wonder whether we as humanists might not want a single standard, but instead want researchers to develop their own idiosyncratic data models that reflect their own interpretations of how historical and cultural processes work. I’m obviously torn between the possibilities afforded by interoperability standards and a desire for interpretive variety that defies standardization.


In his first post, Sterling thoughtfully championed the potential offered by “controlled vocabularies” and “the semantic web.” I too am intrigued by the possibilities that ontologies, both modest and ambitious, offer: to find similar texts (or other kinds of evidence), to make predictions, to uncover patterns. (As an aside, but on a related subject, I’d be in favor of having another session on text mining at this year’s THATCamp if anyone else is interested.) Sterling posed a question in his proposal: “Can digital historians create algorithmic definitions for historical context that formally describe the concepts, terms, and the relationships that prevailed in particular times and places?” I’m intrigued by that ambitious enterprise, but as my proposal suggests, I’m cautious and skeptical for a couple of reasons. First, I’m dubious that most of what we study and analyze as humanists can be fit into anything resembling an adequate ontology. The things we study (religious belief, cultural expression, personal identity, social conflict, historical causation, and so on) are so complex, so heterogeneous, so plastic and contingent that I have a hard time envisioning how they could be translated into and treated as structured data. As I suggested in my proposal, even something as modest as an “historical event” may be too complex and subjective to be the object of a microformat.

Having said that, I’m intrigued by the potential that data models offer to consider quantities of evidence that defy conventional methods, evidence so voluminous it can only be treated computationally. I’m sure that the development of ambitious data models will lead to interesting insights and help produce novel and valuable arguments. But, and this brings me to my second reservation, those models or ontologies are of course themselves products of interpretation. In fact they are interpretations: informed, thoughtful (hopefully) definitions of historical and cultural relationships. There’s nothing wrong with that. But adherence to “controlled” vocabularies or established “semantic” rules or any standard, while unquestionably valuable in terms of promoting interoperability and collaboration, defines and delimits interpretation and interpretive possibility. I’m anti-standards in that respect. When we start talking about anything remotely complex, which includes almost everything substantive we study as humanists, I hope we see different digital humanists develop their own idiosyncratic, creative data models that lead to idiosyncratic, creative, original, thoughtful, and challenging arguments.

All of which is to say that I second Sterling in suggesting a session on the opportunities and drawbacks of standards, data models, and ontologies in historical and humanistic research.

Digital Humanities Manifesto Comments Blitz http://chnm2009.thatcamp.org/06/08/digital-humanities-manifesto-comments-blitz/ Mon, 08 Jun 2009 19:39:25 +0000

I just managed to read UCLA’s Digital Humanities Manifesto 2.0 that made the rounds a week or so ago, and I noticed its CommentPress installation hadn’t attracted many comments yet. Anyone interested in a session at THATCamp where we discuss the document paragraph by paragraph (more or less) and help supply some comments for the authors?

Campfire Plans http://chnm2009.thatcamp.org/06/03/campfire-plans/ Wed, 03 Jun 2009 22:29:55 +0000

Maybe this isn’t the right venue, but it’s never too early to start talking about extracurricular activities.  What happens Saturday/Sunday night?  Will Amanda French be leading us in a round of digital humanities songs around the campfire?

An installation http://chnm2009.thatcamp.org/06/03/an-installation/ Wed, 03 Jun 2009 20:13:05 +0000

Colleagues,

I, too, am eager for the camp to begin, and I am seeking your insights on the project I will be presenting.

I will be using the video wall in the Showcase center to display a digital installation titled “Syncretism,” which will run for both days of the camp. The piece is an associative assemblage of still images that each depict instances of cultural syncretism; juxtaposed, the images suggest associations and analogies, and thus a larger theme, between differing instances of cultural syncretism (for example, images of “English-style Indian food” next to skyscrapers in Shanghai next to a rickshaw driver in Copenhagen).

I am seeking feedback both on the visual message of the installation itself and on the idea of an installation as an example of scholarly performance in the humanities. Is there space in the humanities for a “humanities-based imagist”?

I don’t know if I should propose a separate session to discuss these themes, or whether I should informally speak with you all during the conference while the installation runs.

In any event, I am eager to hear your thoughts about the installation.

Granular annotation frameworks http://chnm2009.thatcamp.org/05/29/granular-annotation-frameworks/ Fri, 29 May 2009 14:35:02 +0000

A lot of great tools exist to annotate collections and bibliographies, Zotero being one of the best lightweight examples for end-users.  At the same time, some large-scale projects are exploring annotations as low-level data objects.  I want to discuss the middle: potential annotation frameworks that could slip easily into the services layer of web applications for manipulating textual collections, particularly TEI.  One idea is to use AtomPub to post, retrieve, and edit annotations tied to texts and text collections. There are several benefits to this approach.  One is the ease with which one could embed metadata that could be used to ingest annotations into a digital repository as independent objects, to be recombined with texts at the application level.  Another is that it would establish an annotation framework that could apply to diverse types of collections, and would make it possible to annotate data using rich media.

While AtomPub is easy to implement, building connections between Atom documents and very granular segments of text or multimedia is more difficult.  For TEI, there are some native tools (XPointer), but they are fairly clunky.  There are also abstraction tools that could be used to tokenize a text for annotation purposes, but the complexity involved in building that abstraction layer may negate the benefits of a simple,  RESTful annotation framework that uses AtomPub.
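
To make the idea concrete, here is roughly what posting one annotation might look like: a Java sketch that POSTs an Atom entry whose link points into a TEI text via an XPointer expression.  The collection URI, the pointer expression, and the entry content are all invented for illustration, and a complete Atom entry would also need id, updated, and author elements:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical AtomPub POST of an annotation targeting a TEI fragment.
    public class PostAnnotation {
        public static void main(String[] args) throws Exception {
            String entry =
                "<entry xmlns='http://www.w3.org/2005/Atom'>"
              + "<title>Marginal note on chapter 3</title>"
              + "<link rel='related' href='http://example.org/texts/tei01.xml"
              + "#xpointer(//div[@n=&quot;3&quot;]/p[2])'/>"
              + "<content type='text'>Compare the reading in the 1852 edition.</content>"
              + "</entry>";
            URL collection = new URL("http://example.org/annotations");  // invented
            HttpURLConnection conn = (HttpURLConnection) collection.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/atom+xml;type=entry");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            out.write(entry.getBytes("UTF-8"));
            out.close();
            System.out.println("Server responded: " + conn.getResponseCode());  // 201 on success
        }
    }

The attraction is that the annotation travels as an ordinary Atom entry, so any repository that can ingest Atom can treat it as an independent object; the hard part, as noted above, remains the granularity of the pointer itself.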

I would like to work with other folks at THATCamp to brainstorm and hopefully test some ideas for using AtomPub for granular annotation.

Digital History Across the Curriculum http://chnm2009.thatcamp.org/05/27/digital-history-across-the-curriculum/ Wed, 27 May 2009 20:18:31 +0000

How can digital skills and issues be thoroughly incorporated into a humanities curriculum, especially a graduate curriculum? It’s basically a “lazyweb” question, because that’s exactly the question I’m grappling with now in my current position, so if the minds at THATCamp would help me, I’d be extremely grateful indeed. It’s easy enough to design and teach a digital humanities course or two, but there’s something about that approach that just seems wrong. It keeps digital humanities in its own little pen, which is odd considering that those of us yelling into that echo chamber simply *know* that the whole practice of the humanities is going to have to come to terms with new technologies sooner or later. It’s also odd considering how many more careers are opened up to digitally literate people. I do think that digital humanities has been very much a research-oriented field, and I’d really like to concentrate on teaching for a bit. It may be that current course-centric educational structures are simply inimical to the digital humanities; I wager that most of us learned to be digital humanists through collaborative project work and self-directed study, which aren’t well supported by a 3-credit, single-teacher, single-department course structure.

[Several months later . . . ]

I’m in the thick now of writing a curriculum, and I can tell you a few things:

There are guidelines for M.A. programs set by the National Council on Public History and the Society of American Archivists, and I’m drawing heavily on those. There’s also the AHA’s book, The Education of Historians for the Twenty-First Century, published in 2004, but I haven’t had a chance to look at it yet (I’m pretty sure there’s nothing about social networking in it, though!). There’s also Dan Cohen’s recent narrative of the GMU PhD in Digital History in the May 2009 issue of the AHA’s Perspectives.

What there isn’t is a set of guidelines for the baseline digital skills that humanists should have. Perhaps not all humanists need digital skills. Nevertheless, it’s something I’m hacking away at.

(Let me just work out a Zotero issue & I’ll link to my bibliography with the above-named resources in it.)

Cebula Proposal for THATCamp http://chnm2009.thatcamp.org/05/26/cebula-proposal-for-thatcamp/ Wed, 27 May 2009 04:17:35 +0000

Here is what I proposed for THATCamp:

I have two major interests that I would bring to ThatCamp. The first is how to make my institution, the Washington State Digital Archives, more interactive, useful, and Web 2.0ish. We have 80 million documents online but a quirky interface that does not allow much interaction. I need not only ideas on how to change, but success stories and precedents and contacts to convince my very wary state bureaucracy that we can and have to change.

Second, I am interested in all manner of digital history training. I just began directing a Public History graduate program at Eastern Washington University. How can I prepare my history MA students for the jobs that are, instead of the jobs that were? How do I work with the computer science and geography departments? How do I, a traditionally trained scholar, model the new realities for my grad students? There just is not space in an already-crowded 60-credit program for a bunch of courses on web design and such. I need to integrate digital training into an existing curriculum.

Co-housing at the hotel, anyone? http://chnm2009.thatcamp.org/05/18/co-housing-at-the-hotel-anyone/ Mon, 18 May 2009 13:22:58 +0000

Anyone looking to co-house for the conference? Let me know. I don’t care about your gender, but I’d frown upon late-night booze-fueled ruckus.
