The following are my raw notes on the Saturday morning “Context and Connections” session. Assume that, unless otherwise noted, they are paraphrases of comments made by participants (whether labeled or not). The session began with a note that it was about making connections with, and adding context to, historical document collections (e.g., The Papers of Thomas Jefferson with Monticello/Jefferson Foundation, on the UVA Press Rotunda site), and that this concerns both research and teaching. The problem in the classroom: students often USE digital archives but do not interact with them through mashups (nor do scholars, through contributions).
Someone suggested this is sort of like Thomas Jefferson’s FB page: who were his friends, etc.
Montpelier/Madison foundation has a hierarchical set of keywords and two separate databases for names that may not interact.
Problem of places/data sets that do not talk to each other (e.g., LoC has the largest set of Jefferson papers, but limited (and difficult-to-read) image sets).
So if there’s a suite of tools, is there one appropriate for both archivist/research community and for students?
MIT Media Lab’s Hugo Liu has an older project that simulated “what would they think?” AI.
The Web forces textual connections (links), e.g., Wikipedia keyword linkages. We are not required to rely on a folksonomy; we could have a multi-level tagging system (by persona).
How much is text-mining (by computer), and how much is intensive, interpretation-focused analysis? The LoC project on Civil War letters is at the second end of the spectrum.
From library/archive world: WordPress has hierarchical categories AND (nonhierarchical) tags
Someone asked about a tag suggestion system. Someone noted that one existed with delicious.
Another person: Try Open Vocab. That does move it into the semantic
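One simple way a tag suggestion system like the one asked about could work is by counting which tags tend to appear together. The following is only a minimal sketch under that assumption; it is not how delicious or Open Vocab worked, and the function names and the sample items are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(tagged_items):
    """Count how often each ordered pair of tags appears on the same item."""
    counts = {}
    for tags in tagged_items:
        for a, b in permutations(set(tags), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest_tags(counts, current_tags, limit=3):
    """Rank tags that co-occur with the tags already applied to an item."""
    scores = Counter()
    for tag in current_tags:
        scores.update(counts.get(tag, {}))
    # Do not suggest tags the item already carries.
    for tag in current_tags:
        scores.pop(tag, None)
    # Highest count first; ties broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]


# Invented sample data: each inner list is the tag set of one item.
items = [
    ["jefferson", "letter", "monticello"],
    ["jefferson", "letter", "madison"],
    ["madison", "montpelier"],
    ["jefferson", "monticello"],
]
counts = build_cooccurrence(items)
suggestions = suggest_tags(counts, ["jefferson"])
```

A suggestion here is just a prompt; the person tagging still decides whether it fits.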
What to do with “rough piles” of tags, etc.? If the tags accrete, we will want to analyze who tags how, and how that changes depending on context (and time).
“That sounds like scholarship.”
Conversation. “That sounds like scholarship.”
Tags aren’t enough. Conversation isn’t enough. I want both.
We want a person behind that tag.
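Keeping a person behind each tag makes the "who tags how, and when" question something a program can tabulate. This is a rough sketch, assuming each tagging act is stored as a record; the `TagEvent` fields, the role labels, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class TagEvent(NamedTuple):
    tagger: str  # who applied the tag
    role: str    # hypothetical persona label, e.g. "archivist" or "student"
    tag: str
    year: int


def tags_by(events, key):
    """Group tag counts by any attribute of the tagging event."""
    grouped = defaultdict(Counter)
    for event in events:
        grouped[key(event)][event.tag] += 1
    return dict(grouped)


# Invented sample records, for illustration only.
events = [
    TagEvent("a", "archivist", "correspondence", 2008),
    TagEvent("b", "student", "letter", 2008),
    TagEvent("b", "student", "letter", 2009),
    TagEvent("c", "archivist", "correspondence", 2009),
    TagEvent("c", "archivist", "letter", 2009),
]

by_role = tags_by(events, lambda e: e.role)
by_year = tags_by(events, lambda e: e.year)
```

The same `tags_by` call can group by tagger, by collection, or by any other context that is recorded with the tag.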
The Old Bailey is working on this problem: it is the largest holding of information on dead proletarians in the world, so how do we make connections among sparse information (e.g., Mary arrested as prostitute, with place of work, date, and pimp)?
We need a Friendster of the Dead.
Maybe a way of figuring out by context who wrote something (or the context of the writing).
[Sherman]: Like quantitative ways of guessing authors of individual Federalist Papers, except less well defined
Archivists have to do that all the time — “what did this word mean”? Time and place contexts
A question of how much preprocessing is required…
We need a way of mapping concepts across time. There’s only so much that you can do computationally. We need a social-networking peer-review structure in which experts winnow out the connections that a program suggests.
That’s a task good for socializing students — give them a range of potential connections, make them winnow the set and justify the judgments.
As a scholar, I need computers to suggest connections that I will judge by reading the sources.
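The division of labor described here (a program proposes connections, a person judges them) can be sketched as follows. This is one possible approach, assuming documents have already been reduced to the set of names they mention and using Jaccard overlap as the scoring rule; the document IDs, names, and threshold are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of names two documents have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_connections(docs, threshold=0.25):
    """Propose document pairs whose name overlap meets the threshold."""
    candidates = []
    for id1, id2 in combinations(sorted(docs), 2):
        score = jaccard(docs[id1], docs[id2])
        if score >= threshold:
            candidates.append((id1, id2, score))
    # Strongest overlap first; ties ordered by document ID.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


def winnow(candidates, accept):
    """Keep only the candidates a human reviewer accepts."""
    return [c for c in candidates if accept(c)]


# Invented sample data: document ID -> names mentioned in it.
docs = {
    "letter-1": {"Jefferson", "Madison", "Monroe"},
    "letter-2": {"Jefferson", "Madison"},
    "ledger-1": {"Hemings", "Jefferson"},
    "trial-1": {"Mary"},
}
candidates = suggest_connections(docs)

# Stand-in for a reviewer's judgment after reading the sources.
approved_pairs = {("letter-1", "letter-2")}
kept = winnow(candidates, lambda c: (c[0], c[1]) in approved_pairs)
```

The `accept` callback is where the scholar's or student's judgment goes; the program only narrows the set of things worth reading.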
Library (archival collection) no longer provides X that scholars use. There needs to be a conversation/collaboration.
Philologists on disambiguation: that’s a tool I can use.
Toolbuilding is where these connections will be made: with Zotero and Omeka, I spend as much time talking with archivists/librarians as with scholars.
Does anyone know about the Virtual International Authority File?
There are standards for marking up documents in public formats. Will that standardization translate to what we do online, which is much looser and freer among digital humanists?
Back channel link to Historical Event Markup and Linking (HEML) project.
The “related pages” links for Google sometimes work for documents.
You don’t know why something is coming up as similar, and that’s a personal disambiguation process (reflection).
Discussion about extending core function of Text Encoding Initiative.
Discussion around www.kulttuurisampo.fi/ about intensity of work, selection of projects, etc.
DBpedia: controlled-vocabulary connection analysis for Wikipedia, drawn from the infoboxes on articles; the software is open source (and could be applied to any MediaWiki site).
Keep an eye on the IMLS website! There is a project proposal to use TEI for other projects.