Posts Tagged ‘data models’

Archiving Social Media Conversations of Significant Events

Tuesday, June 23rd, 2009

I’ve already proposed one session, but recent events in Iran and the various discussions of the role of social media tools in those events prompted this post.

I propose that we have a session where THATCampers discuss the issues related to preserving (and/or analyzing) the blogs, tweets, images, Facebook postings, and SMS messages(?) of the events in Iran, with an eye toward a process by which similar future events might be archived and analyzed as well. How will future historians/political scientists/geographers/humanists write the history of these events without some kind of system for preserving these digital materials? What should be kept? How realistic is it to collect and preserve such items from so many different sources? Who should preserve these digital artifacts (Twitter/Google/Flickr/Facebook; LOC; Internet Archive; professional disciplinary organizations like the AHA)?

On the analysis side, how might we depict the events (or at least the social media response to them) through a variety of timelines/charts/graphs/word-clouds/maps? What value might we get from following/charting the spread of particular pieces of information? Of false information? How might we determine reliable and unreliable sources amid such a massive volume of contributions?

[I know there are many potential issues here, including language differences, privacy of individual communications, protection of individual identities, various technical limitations, and many others.]

Maybe I’m overestimating (or underthinking) here, but I’d hope that a particularly productive session might even come up with the foundations of a plan, a grant proposal, a set of archival standards, a wish-list of tools, or even an appeal to larger companies/organizations/governmental bodies to preserve the materials for this particular set of events, along with a process for archiving future ones.

What do people think?  Is this idea worth a session this weekend?

UPDATE: OK, if I’d read the most recent THATCamp proposals, I’d have seen that Nicholas already proposed a similar session and I could have just added my comment to his… So, we have two people interested in the topic. Who else?

Standards

Monday, June 8th, 2009

Here’s my original proposal for THATCamp. The question and issues I’m interested in entertaining dovetail nicely, I think, with those that have been raised by Sterling Fluharty in his two posts.


The panel at last year’s THATCamp that I found the most interesting was the one on “Time.” We had a great discussion about treating historical events as data, and a number of us expressed interest in what an events microformat/standard might look like. I’d be interested in continuing that conversation at this year’s THATCamp. I know Jeremy Boggs has done some work on this, and I’m interested in developing such a microformat so that we can expose more of the data in our History Engine for others to use and mash up.
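
For concreteness, here is a rough sketch (in Python rather than markup, and with field names I’ve invented for illustration rather than anything drawn from the History Engine or from an existing microformat) of the kind of structured record such an events standard might expose:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class HistoricalEvent:
    """A hypothetical structured record for a single historical event."""
    title: str
    start: date                                        # earliest date associated with the event
    end: Optional[date] = None                         # left empty for a single-day event
    location: Optional[str] = None                     # a place name; could as easily be coordinates
    actors: List[str] = field(default_factory=list)    # people or groups involved
    sources: List[str] = field(default_factory=list)   # citations or URLs pointing to the evidence

# A toy record with placeholder values, just to show the shape of the data.
example = HistoricalEvent(
    title="(short description of the event)",
    start=date(1861, 4, 12),
    location="(place name)",
    actors=["(participant)"],
    sources=["(citation or URL)"],
)
print(example)
```

Even a toy model like this forces choices (a single start date, a flat list of actors, one location) that are already interpretive, which leads me to the more abstract question below.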

While I’d like to talk about that particular task, I’d also be interested in discussing a related but more abstract question that might be of interest to more THATCampers. Standards make sense when dealing with discrete, structured, and relatively simple kinds of data (e.g. bibliographic citations, locations), but I wonder whether much of the evidence we deal with as humanists requires so much individual interpretation to turn into structured data that developing interoperability standards might not make much sense. I’m intrigued by the possibility of producing data models that represent complex historical and cultural processes (e.g. representing locations and time in a way that respects and reflects a Native American tribe’s sense of time and space). An historical event doesn’t seem nearly that complicated, but even there I wonder whether we as humanists should want a single standard at all, or whether researchers should instead develop their own idiosyncratic data models that reflect their own interpretations of how historical and cultural processes work. I’m obviously torn between the possibilities afforded by interoperability standards and a desire for interpretive variety that defies standardization.
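
To make that tension concrete, here is a rough sketch of what one idiosyncratic model might look like: time recorded as a named season within a cycle anchored by a remembered event rather than a calendar date, and place recorded as a relation to landmarks rather than coordinates. Every category in it is invented purely for illustration; it does not describe any particular community’s reckoning of time or space, only the general idea of a schema that encodes a researcher’s interpretation rather than a shared standard.

```python
from dataclasses import dataclass
from typing import Optional

# Time as a named season within a cycle, anchored by a remembered event rather
# than an ISO-8601 date. All category names here are invented for illustration.
@dataclass
class CyclicalTime:
    season: str                             # a named season or ceremonial period from the sources
    cycle_anchor: Optional[str] = None      # e.g. "the year the river flooded"
    approximate_year: Optional[int] = None  # a Gregorian year, if one can be inferred at all

# Place as a relation to landmarks and routes rather than coordinates.
@dataclass
class RelationalPlace:
    described_as: str                  # how the place is named or described in the source
    relative_to: Optional[str] = None  # a landmark, river, or route it is reckoned against

@dataclass
class InterpretedEvent:
    summary: str
    when: CyclicalTime
    where: RelationalPlace
    interpretation_note: str  # the researcher's reading, made explicit rather than hidden in the schema
```

The point is that the schema itself carries the interpretation; two researchers working from the same sources might reasonably build incompatible models, which is exactly what a standard would discourage.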


In his first post, Sterling thoughtfully championed the potential offered by “controlled vocabularies” and “the semantic web.” I too am intrigued by the possibilities that ontologies, both modest and ambitious, offer, say, to find similar texts (or other kinds of evidence), to make predictions, to uncover patterns. (As an aside, but on a related subject, I’d be in favor of having another session on text mining at this year’s THATCamp if anyone else is interested.) Sterling posed a question in his proposal: “Can digital historians create algorithmic definitions for historical context that formally describe the concepts, terms, and the relationships that prevailed in particular times and places?” I’m intrigued by that ambitious enterprise, but as my proposal suggests I’m cautious and skeptical for a couple of reasons. First, I’m dubious that most of what we study and analyze as humanists can be fitted into anything resembling an adequate ontology. The things we study–religious belief, cultural expression, personal identity, social conflict, historical causation, and so on–are so complex, so heterogeneous, so plastic and contingent that I have a hard time envisioning how they can be translated into and treated as structured data. As I suggested in my proposal, even something as modest as an “historical event” may be too complex and subjective to be the object of a microformat. Having said that, I’m intrigued by the potential that data models offer to consider quantities of evidence that defy conventional methods, evidence so large in scope that it can only be treated computationally. I’m sure that the development of ambitious data models will lead to interesting insights and help produce novel and valuable arguments.

But–and this brings me to my second reservation–those models or ontologies are, of course, themselves products of interpretation. In fact they are interpretations: informed, thoughtful (hopefully) definitions of historical and cultural relationships. There’s nothing wrong with that. But adherence to “controlled” vocabularies or established “semantic” rules or any standard, while unquestionably valuable for promoting interoperability and collaboration, defines and delimits interpretation and interpretive possibility. I’m anti-standards in that respect. When we start talking about anything remotely complex–which includes almost everything substantive we study as humanists–I hope we see different digital humanists develop their own idiosyncratic, creative data models that lead to idiosyncratic, creative, original, thoughtful, and challenging arguments.

All of which is to say that I second Sterling in suggesting a session on the opportunities and drawbacks of standards, data models, and ontologies in historical and humanistic research.
