THATCamp CHNM 2009

Posts Tagged ‘ontologies’

Standards

Monday, June 8th, 2009

Here’s my original proposal for THATCamp. The questions and issues I’m interested in entertaining dovetail nicely, I think, with those that have been raised by Sterling Fluharty in his two posts.

The panel at last year’s THATCamp that I found the most interesting was the one on “Time.” We had a great discussion about treating historical events as data, and a number of us expressed interest in what an events microformat/standard might look like. I’d be interested in continuing that conversation at this year’s THATCamp. I know Jeremy Boggs has done some work on this, and I’m interested in developing such a microformat so that we can expose more of the data in our History Engine for others to use and mash up.

While I’d like to talk about that particular task, I’d also be interested in discussing a related but more abstract question that might be of interest to more THATCampers. Standards make sense when dealing with discrete, structured, and relatively simple kinds of data (e.g. bibliographic citations, locations), but I wonder whether much of the evidence we deal with as humanists requires so much individual interpretation to turn it into structured data that developing interoperability standards might not make much sense. I’m intrigued by the possibility of producing data models that represent complex historical and cultural processes (e.g. representing locations and time in a way that respects and reflects a Native American tribe’s sense of time and space). An historical event doesn’t seem nearly that complicated, but even there I wonder whether, as humanists, we might want not a single standard but researchers who develop their own idiosyncratic data models reflecting their own interpretations of how historical and cultural processes work. I’m obviously torn between the possibilities afforded by interoperability standards and a desire for interpretive variety that defies standardization.
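To make that tension a little more concrete, here is a minimal Python sketch. It is not a proposed microformat and not anything the History Engine actually uses; the class names, field names, and the `dropped_fields` helper are all hypothetical, chosen only for illustration. It shows a small shared event record of the kind a standard might fix, a richer researcher-specific model, and which interpretive fields are lost when the second is exported to the first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SharedEvent:
    """Hypothetical interchange record: the few fields a common standard might fix."""
    title: str
    start: str                   # ISO 8601 date string
    end: Optional[str] = None
    place: Optional[str] = None


@dataclass
class InterpretedEvent:
    """Hypothetical researcher-specific model that also carries interpretation."""
    title: str
    start: str
    end: Optional[str] = None
    place: Optional[str] = None
    causes: List[str] = field(default_factory=list)   # the researcher's own reading
    significance: str = ""                            # free-text interpretation

    def to_shared(self) -> SharedEvent:
        # Export keeps only what the shared record can hold.
        return SharedEvent(self.title, self.start, self.end, self.place)


def dropped_fields(event: InterpretedEvent) -> List[str]:
    """Names of populated fields that do not survive export to SharedEvent."""
    shared_names = set(vars(event.to_shared()))
    return sorted(
        name for name, value in vars(event).items()
        if name not in shared_names and value
    )


# Illustrative values only; the interpretive fields are placeholders, not claims.
example = InterpretedEvent(
    title="Battle of Gettysburg",
    start="1863-07-01",
    end="1863-07-03",
    place="Gettysburg, Pennsylvania",
    causes=["(one researcher's account of causation)"],
    significance="(one researcher's account of why it matters)",
)
```

Calling `dropped_fields(example)` lists the fields that the shared record cannot represent, which is one small way of seeing what interoperability costs in interpretive terms.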

In his first post, Sterling thoughtfully championed the potential offered by “controlled vocabularies” and “the semantic web.” I too am intrigued by the possibilities that ontologies, both modest and ambitious, offer, say, to find similar texts (or other kinds of evidence), to make predictions, to uncover patterns. (As an aside, but on a related subject, I’d be in favor of having another session on text mining at this year’s THATCamp if anyone else is interested.) Sterling posed a question in his proposal: “Can digital historians create algorithmic definitions for historical context that formally describe the concepts, terms, and the relationships that prevailed in particular times and places?” I’m intrigued by that ambitious enterprise, but as my proposal suggests I’m cautious and skeptical for a couple of reasons. First, I’m dubious that most of what we study and analyze as humanists can be fit into anything resembling an adequate ontology. The things we study–e.g. religious belief, cultural expression, personal identity, social conflict, historical causation, etc.–are so complex, so heterogeneous, so plastic and contingent that I have a hard time envisioning how they can be translated into and treated as structured data. As I suggested in my proposal, even something as modest as an “historical event” may be too complex and subjective to be the object of a microformat. Having said that, I’m intrigued by the potential that data models offer to consider quantities of evidence that defy conventional methods, that are so large that they can only be treated computationally. I’m sure that the development of ambitious data models will lead to interesting insights and help produce novel and valuable arguments. But–and this brings me to my second reservation–those models or ontologies are, of course, themselves products of interpretation. In fact they are interpretations–informed, thoughtful (hopefully) definitions of historical and cultural relationships.
There’s nothing wrong with that. But adherence to “controlled” vocabularies or established “semantic” rules or any standard, while unquestionably valuable in terms of promoting interoperability and collaboration, defines and delimits interpretation and interpretative possibility. I’m anti-standards in that respect. When we start talking about anything remotely complex–which includes almost everything substantive we study as humanists–I hope we see different digital humanists develop their own idiosyncratic, creative data models that lead to idiosyncratic, creative, original, thoughtful, and challenging arguments.

All of which is to say that I second Sterling in suggesting a session on the opportunities and drawbacks of standards, data models, and ontologies in historical and humanistic research.

Tags: data models, microformats, ontologies, standards
Posted in Uncategorized | 7 Comments »

Zotero and Semantic Search

Friday, May 29th, 2009

Here is my original proposal for THATCamp, which I hoped would fit in with session ideas from the rest of you:

I would like to discuss theoretical issues in digital history in a way that is accessible and understandable to beginning digital humanists. This is probably the common thread running through my interests and research. I really wonder, for instance, whether digital history has its own research agenda or whether it simply facilitates the research agenda of traditional academic history. I believe that Zotero will need a good theory for its subject indexing before it can launch a recommendation service. Are any digital historians planning on producing any non-proprietary controlled vocabularies? We need to have a good discussion of what the semantic web means for digital history. Are we going to sit on our hands while information scientists hardwire the Internet with presentist ontologies? Can digital historians create algorithmic definitions for historical context that formally describe the concepts, terms, and the relationships that prevailed in particular times and places? What do digital historians hope to accomplish with text mining? Are we going to pursue automatic summarization, categorization, clustering, concept extraction, entity relation, and sentiment analysis? What methods from other disciplines should we consider when pursuing text mining? What should be our stance on the attempt to reduce the “reading” of texts to computational algorithms and mathematical operations? Will the programmers among us be switching over to parallel programming as chip manufacturers begin producing massively multi-core processors? How prepared will we be to exploit the full capabilities of high-performance computing once it arrives on personal computers in the next few years?

Here is a post that just went up at my blog that addresses some of these issues and questions:

Zotero and Semantic Search

The good news is that Zotero 2.0 has arrived. This long-awaited version allows a user to share her or his database/library of notes and citations with others and to collaborate on research in groups. This will be a tremendous help to scholars who are coauthoring papers. It also has a lot of potential for teaching research methods to students and facilitating their group projects.

(more…)

Tags: ontologies, recommendation engines, semantic search, zotero
Posted in Session Ideas | 14 Comments »