Standards

June 8th, 2009
rob nelson
Tags: data models, microformats, ontologies, standards

Here’s my original proposal for THATCamp. The questions and issues I’m interested in entertaining dovetail nicely, I think, with those that have been raised by Sterling Fluharty in his two posts.

The panel at last year’s THATCamp that I found the most interesting was the one on “Time.” We had a great discussion about treating historical events as data, and a number of us expressed interest in what an events microformat/standard might look like. I’d be interested in continuing that conversation at this year’s THATCamp. I know Jeremy Boggs has done some work on this, and I’m interested in developing such a microformat so that we can expose more of the data in our History Engine for others to use and mash up.

While I’d like to talk about that particular task, I’d also be interested in discussing a related but more abstract question that might be of interest to more THATCampers. Standards make sense when dealing with discrete, structured, and relatively simple kinds of data (e.g. bibliographic citations, locations), but I’m wondering if much of the evidence we deal with as humanists requires so much individual interpretation to turn it into structured data that the development of interoperability standards might not make that much sense. I’m intrigued by the possibility of producing data models that represent complex historical and cultural processes (e.g. representing locations and time in a way that respects and reflects a Native American tribe’s sense of time and space, etc.). An historical event doesn’t seem nearly that complicated, but even with it I wonder if as humanists we might not want a single standard but instead want researchers to develop their own idiosyncratic data models that reflect their own interpretation of how historical and cultural processes work. I’m obviously torn between the possibilities afforded by interoperability standards and a desire for interpretive variety that defies standardization.

In his first post, Sterling thoughtfully championed the potential offered by “controlled vocabularies” and “the semantic web.” I too am intrigued by the possibilities that ontologies, both modest and ambitious, offer, say, to find similar texts (or other kinds of evidence), to make predictions, to uncover patterns. (As an aside, but on a related subject, I’d be in favor of having another session on text mining at this year’s THATCamp if anyone else is interested.) Sterling posed a question in his proposal: “Can digital historians create algorithmic definitions for historical context that formally describe the concepts, terms, and the relationships that prevailed in particular times and places?” I’m intrigued by that ambitious enterprise, but as my proposal suggests I’m cautious and skeptical for a couple of reasons. First, I’m dubious that most of what we study and analyze as humanists can be fit into anything resembling an adequate ontology. The things we study–e.g. religious belief, cultural expression, personal identity, social conflict, historical causation, etc., etc.–are so complex, so heterogeneous, so plastic and contingent that I have a hard time envisioning how they can be translated into and treated as structured data. As I suggested in my proposal, even something as modest as an “historical event” may be too complex and subjective to be the object of a microformat. Having said that, I’m intrigued by the potential that data models offer to consider quantities of evidence that defy conventional methods, that are so large that they can only be treated computationally. I’m sure that the development of ambitious data models will lead to interesting insights and help produce novel and valuable arguments. But–and this brings me to my second reservation–those models or ontologies are, of course, themselves products of interpretation. In fact they are interpretations–informed, thoughtful (hopefully) definitions of historical, cultural relationships.
There’s nothing wrong with that. But adherence to “controlled” vocabularies or established “semantic” rules or any standard, while unquestionably valuable in terms of promoting interoperability and collaboration, defines and delimits interpretation and interpretative possibility. I’m anti-standards in that respect. When we start talking about anything remotely complex–which includes almost everything substantive we study as humanists–I hope we see different digital humanists develop their own idiosyncratic, creative data models that lead to idiosyncratic, creative, original, thoughtful, and challenging arguments.

All of which is to say that I second Sterling in suggesting a session on the opportunities and drawbacks of standards, data models, and ontologies in historical and humanistic research.

7 Responses to “Standards”

Ryan Shaw Says:
June 8th, 2009 at 11:09 pm
Robert, you raise some interesting questions regarding the formal modeling of historical events. These are precisely the kinds of questions I’m attempting to address in my PhD dissertation. I’m looking at the notion of “historical event directories” that would provide a service for time analogous to the service place name gazetteers provide for space. I’ve found that a naive conception of historical events as objectively existing phenomena localized in time and space provides a poor grounding for such a service. Fortunately, work in critical philosophy of historiography provides some alternative conceptions that seem more promising. I really wish I could be at THATCamp to discuss this with you and others; unfortunately my dissertation plus a new baby made it impossible this year.

Incidentally, is there any record of the panel you mentioned on “Time” at last year’s THATCamp?
Douglas Knox Says:
June 8th, 2009 at 11:55 pm
Definitely worth discussing. I’ve been thinking for a while that there is an opportunity for history, especially, to be in productive tension with the vision of the semantic web. Though I am quite interested to see semantic web ideas explored and developed (and I have a lot to learn about them), some of the more giddy early promises of what might be done with semantic web technology strike me as naive about history and its intellectual problems. “Linked data” is a useful compromise that puts the focus on good, web-savvy information engineering to support the activity of finding relevant information, in the way that library standards do; standards remain very useful in that way. But it seems like the semantic web wants to offer to outsource a layer or two of inferencing, and to do that well will require much more than standardized event models, controlled vocabularies, and taxonomies. It will require formalizing a “logic of history,” if not in the sense of a grand metanarrative, then as a formalization of the implicit modes of thought practiced by historians and others in historical disciplines. I think we can learn a lot from John Unsworth’s argument that markup is valuable because it “externalizes interpretation.” Even when the attempt at formalization fails, we can learn something. (www3.isrl.illinois.edu/~unsworth/newberry.04.html)

What would it mean to have a discussion of something like the [Temporal Modeling] of the English Working Class, for example? Date values and event models alone won’t get us very far in framing a good question. The semantic web offers to reify anything as a subject for assertions, but the interesting assertions in history are about change and becoming. Date values are relatively easy, and that data certainly can be useful, but it can be more interesting to historical humanities to see how time wears away at the predicates and values that purport to be fixed. Taxonomic categories themselves have histories, and will generate histories. The semantic web will sooner or later have to confront this problem in its own way. History could have something to contribute as a source of prior thinking and a site of interestingly hard challenges.
Musebrarian Says:
June 9th, 2009 at 8:41 pm
I’m familiar with the CIDOC Conceptual Reference Model (cidoc.ics.forth.gr/) designed for cultural heritage objects that includes a model of historical events. The full model may be more than is needed, but the CIDOC CRM has also generated a fair number of papers about its modeling choices (and potential pitfalls for other modelers of historic events).
THATCamp » Blog Archive Says:
June 10th, 2009 at 5:48 pm
[…] tools than I have. In some ways, my interests resonate with Robert Nelson’s post on standards, since I’m also thinking about what to do when the objects of humanistic study (in this case, […]
Arden Kirkland Says:
June 11th, 2009 at 1:01 pm
I definitely share your interest and concern with standards and controlled vocabularies and their potential to exclude alternative interpretations (I just commented on this in reply to one of Sterling Fluharty’s posts). I work with historic clothing, which is indeed idiosyncratic and therefore challenging, and has not been well served by existing structural models.

Douglas Knox – I appreciate your thoughts on this tension. I think we need to carefully consider our intended outcome, and whether some aspects of the semantic web will help or hinder. I’d love to learn more about “linked data.”

Musebrarian – the CIDOC CRM seems interesting – do you know of any good examples of it in action? I don’t quite understand how it works.
Musebrarian Says:
June 17th, 2009 at 12:08 pm
Arden,

It has mostly been used for European projects – probably the best place to find some examples is via their References page: cidoc.ics.forth.gr/references.html

The problem is that CIDOC CRM is a higher level model that you would use as a guide for developing a local information model, not necessarily by directly applying it the way you would the Dublin Core. By basing your local model on CRM you can enable your data to be exchanged with others who have developed their own local information structure.

There are also some broader examples of how the CIDOC CRM is being used on the Semantic Museum (SeMuse) wiki www.semuse.org

While the CRM can be somewhat impenetrable to the uninitiated, it may be useful for emerging humanities semantic web approaches to use as a guide to modeling certain kinds of problems (e.g. historical events).

CRM also has its share of critics who think it is too complex (one of the reasons it hasn’t been as widely adopted as other approaches). It does fall on the ‘neat’ side of the neat vs. scruffy divide (is.gd/14Aeg) and may not suit everyone’s taste.
Liste non exhaustive des thématiques abordées lors des THATCamp | ThatCamp Paris 2010 Says:
May 4th, 2010 at 2:52 am
[…] thatcamp.org/2009/standards/ On standards […]