Archive for the ‘Uncategorized’ Category

Crowdsourcing & outreach

Monday, June 22nd, 2009

I mentioned briefly in my original post that we have a good deal of 19th-century textual material in what are now fairly obscure German dialects; we have scanned the originals and had parts translated, but we have two further goals with this project: 1) encourage more translations of other related material and 2) create a resource/destination for researchers to debate and discuss the materials.  (A further goal is to start digitizing our snazzy Paracelsus collection, once we have this in place as a test case – but that’s down the road).

We have the scanning and digital object handling well underway (or we would, if our server upgrade were finally finished, but that’s another story – I may not be able to show off much from this collection yet, but I can demonstrate how we are set up with some other materials). For the other two goals, though, we are looking for inspiration and ideas. Since we’re looking for low IT involvement, one idea is a blog highlighting some of the materials and encouraging discussion in the comments, but we’d like to avoid creating an additional digital ‘space’ that users would have to navigate to (especially since we already have a blog for our collections in general).

Is anyone using a more interactive plugin (or a similar modular feature) to create spaces for discussion in a way that’s still tied to the digital object? One of our concerns is that many scholars in this particular subfield face a steep IT learning curve, and we’d like to make sure they all feel welcome, so ease of use is key. We are also looking to use the project to reach out to other scholars who might not currently be aware of the materials (likely language scholars and historians in related fields), and we feel pretty confident about putting that plan in place once we know what sort of sandbox we can offer them.

Anyway, I would love to hear what suggestions everyone has and am definitely looking forward to seeing some examples of what everyone else has done.

Digital Publishing: Getting Beyond the Manuscript

Monday, June 22nd, 2009

Here is the original submission I made to THATCamp followed by some additional background ideas and thoughts:

Forget the philosophical arguments: I think most people at THATCamp are probably convinced that in the future scholarly manuscripts will appear first in the realm of the digital. I am interested in the practical questions here: What are born-digital manuscripts going to look like, and what do we need to start writing them? There are already several examples (Fitzpatrick’s Planned Obsolescence, Wark’s Gamer Theory), but I want to think about what the next step is. What kind of publishing platform should be used (is it simply a matter of modifying a content management system like WordPress)? Currently the options are not very inviting to academics without a high degree of digital literacy. What will it take to make such a platform an option for a wider range of scholars? What tools and features are needed beyond, say, CommentPress: something like a shared reference manager, or at least an open API to connect these digital manuscripts (Zotero)? Maybe born-digital manuscripts will just be the beta versions of books which are later published (again, e.g. Gamer Theory)? But I am also interested in thinking about what a born-digital manuscript can do that an analog one cannot.

Additional Thoughts:

So I should start by saying that this proposal is a bit self-serving. I am working on “a book” (the proverbial tenure book), but writing it first for the web. That is, rather than releasing the manuscript online for free as a beta version of the book, or writing a book and digitally distributing it, I want to leverage the web to do things that cannot be accomplished in manuscript form. It is pretty clear that the current academic publishing model will not hold. As I indicated in the original proposal above, I think that most participants at THATCamp are probably convinced that the future of academic publishing is in some ways digital (although the degree to which it will be digital is probably a point of difference). But in working on this project I have come to realize that the tools for digital self-publishing are really in the early stages, almost a pre-alpha release. Yes, there are options, primarily blogs, but for the most part these tend to mimic “book-centered” ways of distributing information. To be sure, there are web tools which break from this model, namely CommentPress, but I am interested in thinking about what other tools might be developed and how we can integrate them. And at this point I think you have to be fairly tech savvy, or have a “technical support team,” to be able to do anything beyond a simple blog or digital distribution of a manuscript (say, as a downloadable PDF). For me, one of the early models we can look to is McKenzie Wark’s Gamer Theory, but he had several people handling the “tech side.” I can get the tech help to do the things I cannot do on my own, but it seems pretty clear that until the tools are simple and widely available, digital publishing will remain either obscure or overly simple and conservative (just a version of the manuscript).

So, what tools do we need to be developing here? Should we be thinking about tools, or about data structures first and then developing tools around them? (I realize this is not an either/or proposition.) I am imagining something like WordPress with a series of easy-to-install plugins that would open up web publishing to a much wider range of scholars. Perhaps a “publisher” could host these installs and provide technical support, making it even easier for academics. I have a fairly good idea of what I personally want for my project, but I am interested in thinking about, and hearing about, what other scholars, particularly those from other disciplines, would need and want.

How to make Freebase useful in the digital humanities?

Friday, June 19th, 2009

I would like to lead a session on the application of Freebase to the humanities. Freebase is an “open database of the world’s information,” with an API that allows for integration with other applications (such as Zotero). I’ve been experimenting with it in the realm of government data, specifically to create PolDB, an “IMDB for politicians” (though my progress has been meagre so far). I would like to share my experiences on that front, speculate on the usefulness of Freebase for applications in the humanities (particularly art history), and foster a discussion about the application of other “semantically oriented” techniques beyond Freebase.
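For those who haven’t used it, Freebase’s MQL query language is query-by-example JSON: you send a skeleton of the record you want, with null in the slots Freebase should fill in. Below is a minimal sketch of what a PolDB-style lookup might look like; the type and property ids are illustrative assumptions, not checked against the live schema.

```python
import json

# Hypothetical MQL query for a PolDB-style lookup: names and parties of
# politicians. In MQL, None/null marks the fields you want Freebase to fill
# in. The ids "/government/politician" and "party" are illustrative; consult
# the Freebase schema browser for the real ones.
query = [{
    "type": "/government/politician",   # assumed type id
    "name": None,                       # ask the service to fill this in
    "party": [{"name": None}],          # parties held, returned as a list
    "limit": 10,                        # cap the result set
}]

# The query is wrapped in a JSON envelope and sent to the mqlread service.
envelope = json.dumps({"query": query})
print(envelope)
```

The same query-by-example pattern scales up: adding another null-valued property to the skeleton asks for another field in each result.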

Teaching Digital Archival and Publishing Skills

Friday, June 12th, 2009

I’ve been putting this off for a while now, especially after seeing some of the really impressive projects other campers are working on.  My job is not research-oriented; much of what I do revolves around operationalizing and supporting faculty projects in the History Department where I work.  What follows is a rather long description of one such project in which students, in the context of a local history research seminar, are tasked with digitizing archival items, cataloging them using Dublin Core, and creating Omeka exhibits that reflect the findings from their traditional research papers.  Despite the fact that the students are typically Education or Public History majors, they are expected to carry out these tasks to standards which can be challenging even to professional historians and librarians.

I’ve written about some of the practical challenges in projects like this here.  For a full description of the project at hand, click through the page break below.  What is intriguing me right now are the questions such projects raise, particularly those relating to content quality and presentation.

What are realistic expectations for metadata implementation? Is enforcing metadata standards even appropriate in the context of humanities education? Many trained librarians aren’t competent or consistent at cataloging; how can we expect more from undergraduate History students? It’s not that students don’t gain from the exercise (whether they like or know it or not); it’s that poor metadata might be worse than none.

Information architecture is another challenge, even when students have no role in the initial site design: they can still confuse the navigation scheme and decrease usability through poorly organized contributions. Likewise, the content students create is not always something we want to keep online, for any number of reasons. Where do you draw the line between a teaching site (that is, a site designed and used for training projects) and one which is distinctly for use by the broader public? It’s very blurry to me, but I think how you answer that question dictates what you are willing to do and what you end up with.

We really want to create something that is generated entirely by students but has a life outside the classroom. Ultimately, though, we will make the decisions that best serve our instructional goals. I think the value is in the process, not the result (though it would be nice for them to match up). We have done some very ambitious, high-quality projects working with small, dedicated teams, but working with large class groups has led to some interesting and unforeseen problems. I wonder if anyone has ideas about how we might replicate that small-team experience and quality on this significantly larger scale.
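For concreteness, here is a minimal sketch of the kind of automated sanity check a teaching site could run over student records against the 15-element Dublin Core set. The record, the field requirements, and the validate_record helper are invented for illustration; they are not part of Omeka.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_FIELDS = [
    "Title", "Creator", "Subject", "Description", "Publisher", "Contributor",
    "Date", "Type", "Format", "Identifier", "Source", "Language",
    "Relation", "Coverage", "Rights",
]

def validate_record(record):
    """Flag field names outside the Dublin Core set, and note missing fields
    that this (hypothetical) project treats as required."""
    unknown = [k for k in record if k not in DC_FIELDS]
    missing = [k for k in ("Title", "Date", "Rights") if k not in record]
    return unknown, missing

# An invented student record for a digitized archival item.
record = {
    "Title": "Letter from the Smith family papers",
    "Creator": "Smith, Jane",
    "Date": "1887-03-14",
    "Formats": "image/tiff",   # typo: the element is "Format"
}
unknown, missing = validate_record(record)
print(unknown)   # ['Formats'] -- exactly the kind of inconsistency at issue
print(missing)   # ['Rights']
```

A check like this catches mechanical slips, but of course says nothing about whether a subject heading is apt, which is where the real cataloging judgment lies.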

Has anyone out there done a similar project? I’d love to hear experiences and/or suggestions on pedagogy, standards, or documentation.

I think this fits in to some degree with Jim Calder’s post and Amanda French’s post, among others (sadly, I have yet to read all the posts here, but I will get to it soon and maybe hit some people up in the comments).


Easy readers

Wednesday, June 10th, 2009

At THATCamp ’08 I learned how to draw a smiley face with a few geometric programming commands.

Dan Chudnov demonstrated how to download Processing, a Java-based environment intended for designers, visual artists, students, and others who want to create something without being full-time professional programmers. Dan’s purpose was to show librarians, scholars, artists, and free-range humanists that getting started with simple programming isn’t as hard as people sometimes think. You don’t have to be a computer scientist or statistician to develop skills that can be directly useful to you. Dan posted a version of what he was demonstrating with the tag “learn2code.”

I’m not a trained programmer; I was not new to programming altogether, but I was new to Processing, and for a while I didn’t have much reason or time to do more with it. But last winter I found myself highly motivated to spend some of my spare time making sense of tens of thousands of pages of text images from the Internet Archive that were, for my purposes, undifferentiated. The raw, uncorrected OCR was not much help. I wanted to be able to visually scan all of them, start reading some of them, and begin to make some quick, non-exhaustive indexes in preparation for what is now a more intensive full-text grant-funded digitization effort (which I will also be glad to talk about, but that’s another story). I wanted to find out things that just weren’t practical to learn at the scale of dozens of reels of microfilm.

Processing has turned out to be perfect for this. It’s not just good for cartoon faces and artistic and complex data visualizations (though it is excellent for those). It is well suited to bootstrapping little scraps of knowledge into quick cycles of gratifying incremental improvements. I ended up cobbling together a half-dozen relatively simple throwaway tools highly customized to the particular reading and indexing I wanted to do, minimizing keystrokes, maximizing what I could get from the imperfect information available to me, and efficiently recording what I wanted to record while scanning through the material.
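The author’s tools were written in Processing, but the flavor of such a throwaway aid is easy to sketch in a few lines of Python: scan a directory of raw OCR text (one file per page image) and record which pages mention a term. The quick_index helper and the file layout are invented for illustration, not the author’s actual code.

```python
import re
from pathlib import Path

def quick_index(pages_dir, pattern):
    """Build a rough, non-exhaustive index over messy OCR text files:
    map page name -> number of matches for a regex pattern."""
    hits = {}
    rx = re.compile(pattern, re.IGNORECASE)
    for page in sorted(Path(pages_dir).glob("*.txt")):
        # OCR output is noisy; ignore undecodable bytes rather than stop.
        text = page.read_text(errors="ignore")
        matches = rx.findall(text)
        if matches:
            hits[page.stem] = len(matches)
    return hits
```

The point is not the code itself but the cycle it enables: a few minutes of tinkering yields a crude index, reading the flagged pages suggests a better pattern, and the tool improves incrementally alongside the research.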

Having spent plenty of hours with the clicks, screeches, and blurs of microfilm readers, I can say that being able to fix up your own glorified (silent) virtual microfilm reader with random access is a wonderful thing. (It’s also nice that the images are never reversed because the person before you didn’t rewind to the proper spool.) And immensely better than PDF, too.

At THATCamp I would be glad to demonstrate, and would be interested in talking shop more generally about small quasi-artisanal skills, tools, and tips that help get stuff done: the kind of thing that Bill Turkel and his colleagues have written up in The Programming Historian, but perhaps even more preliminary. How do you get structured information out of a PDF or word processing document, say, and into a database or spreadsheet? Lots of “traditional humanists,” scholars and librarians, face this kind of problem. Maybe sometimes student labor can be applied, or professional programmers can help, if the task warrants and resources permit. But there is a lot of work that is big enough to be discouragingly inefficient with what may pass for standard methods (whether note cards or word processing tools), and small enough not to be worth the effort of seeking funding or navigating bureaucracy. There are many people in the humanities who would benefit from understanding the possibilities of computationally assisted grunt work. Like artificial lighting, some tools just make it easier to read what in principle you could have found some other way to read anyway. But the conditions of work can have a considerable influence on what actually gets done.
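As one illustration of that PDF-to-spreadsheet problem: once the text is extracted (say with a tool like pdftotext), a few lines of regular expressions can often turn semi-regular lines into rows. The container-list layout below is invented; the point is the pattern, not the specifics.

```python
import csv
import io
import re

# An invented finding-aid layout: each line reads
#   "Box 3, Folder 12: Correspondence, 1891-1895"
LINE = re.compile(r"Box (\d+), Folder (\d+): (.+), (\d{4})-(\d{4})")

def lines_to_csv(text):
    """Turn matching lines into CSV rows; silently skip lines that
    don't fit the pattern (headers, OCR noise, and so on)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["box", "folder", "title", "start", "end"])
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            writer.writerow(m.groups())
    return out.getvalue()

sample = """Box 3, Folder 12: Correspondence, 1891-1895
Box 3, Folder 13: Sermon drafts, 1890-1902"""
print(lines_to_csv(sample))
```

The resulting CSV opens directly in a spreadsheet or imports into a database, which is usually all the structure such a task needs.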

More abstractly and speculatively, it would be interesting to talk about efficiencies of reading and scale. Digital tools are far from the first to address and exacerbate the problem that there is far more to be read and mapped out than any single person can cope with in a lifetime. Economies of effort and attention in relation to intellectual and social benefit have long shaped what questions can be asked and do get asked, and to some extent what questions can even be imagined. Digital tools can change these economies, although not in deterministically progressive ways. Particular digital applications and practices have all too often introduced obfuscations and inefficiencies that limit what questions can plausibly be asked at least as much as microfilm does. Which is why discussions even of low-level operational methods, and their consequences, can be of value. And where better than THATCamp?

From History Student to Webmaster?

Wednesday, June 10th, 2009

Here’s my original proposal (or part of it at least):

“I would like to discuss the jarring, often difficult and certainly rewarding experiences of those, like myself, who have somehow managed to make the leap from humanities student to digital historian/webmaster/default IT guy without any formal training in computer skills.  While I am hoping that such a discussion will be helpful in generating solutions to various technical and institutional barriers that those in this situation face, I am also confident that meeting together will allow us to better explain the benefits that our unique combination of training and experience bring to our places of employment.  I would also be very interested to see if we could produce some ideas about how this group could be better supported in our duties both by our own institutions and through outside workshops or seminars.”

I’m not sure if this is the right place for this discussion, as I’m guessing that many campers may not share these difficulties.  However, if enough people are interested, I think I’ll go with it.  Related to this discussion, I would also like to talk about people’s experiences or recommendations for resources that could be useful to digital historians in training, as well as better ways to get our message about web 2.0, open source technologies, freedom of information, etc. to our colleagues.

Anyways, let me know what you all think.

Omeka playdate open to THATCampers

Wednesday, June 10th, 2009

The Friday before THATCamp (June 26th) we’ll be holding an Omeka “playdate” that’s open to campers and anyone else who would like to attend. Interested in learning more about Omeka? Already using Omeka and want to learn how to edit a theme? Want to build a plugin or have advanced uses for the software? This workshop is a hands-on opportunity to break into groups of similar users, meet the development and outreach teams, and spend part of the day hanging around CHNM.

We’ve increased the number of open spots and would love to see some THATCampers sign up as well. If you plan on attending, please add your name to the wiki sign-up. Remember to bring your laptop!


Monday, June 8th, 2009

Here’s my original proposal for THATCamp. The question and issues I’m interested in entertaining dovetail nicely, I think, with those that have been raised by Sterling Fluharty in his two posts.

The panel at last year’s THATCamp that I found the most interesting was the one on “Time.” We had a great discussion about treating historical events as data, and a number of us expressed interest in what an events microformat/standard might look like. I’d be interested in continuing that conversation at this year’s THATCamp. I know Jeremy Boggs has done some work on this, and I’m interested in developing such a microformat so that we can expose more of the data in our History Engine for others to use and mashup.
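To make the discussion concrete, here is one guess at the minimal fields an event record for something like the History Engine might carry. Every field name below is an assumption, not an existing microformat or standard; as the rest of this post argues, even choosing these fields is an act of interpretation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HistoricalEvent:
    """A hypothetical sketch of an 'event' record, not a proposed standard."""
    title: str
    start: str                     # ISO 8601 where possible; fuzzy dates
                                   # ("spring 1863") are the hard part
    end: Optional[str] = None      # many events have no clean endpoint
    location: Optional[str] = None # whose place names? another judgment call
    sources: List[str] = field(default_factory=list)  # citations backing the claim
    tags: List[str] = field(default_factory=list)

e = HistoricalEvent(
    title="Gettysburg Address delivered",
    start="1863-11-19",
    location="Gettysburg, Pennsylvania",
    sources=["Nicolay copy, Library of Congress"],
)
print(e.title, e.start)
```

Even this toy model smuggles in decisions: that an event has a single start, a single place, a flat list of sources. A microformat would freeze those decisions for everyone who adopts it, which is precisely the tension discussed below.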

While I’d like to talk about that particular task, I’d also be interested in discussing a related but more abstract question that might be of interest to more THATCampers. Standards make sense when dealing with discrete, structured, and relatively simple kinds of data (e.g. bibliographic citations, locations), but I’m wondering whether much of the evidence we deal with as humanists requires enough individual interpretation, in the act of turning it into structured data, that developing interoperability standards might not make much sense. I’m intrigued by the possibility of producing data models that represent complex historical and cultural processes (e.g. representing locations and time in a way that respects and reflects a Native American tribe’s sense of time and space). An historical event doesn’t seem nearly that complicated, but even there I wonder whether as humanists we might not want a single standard, but instead want researchers to develop their own idiosyncratic data models that reflect their own interpretations of how historical and cultural processes work. I’m obviously torn between the possibilities afforded by interoperability standards and a desire for interpretive variety that defies standardization.

In his first post, Sterling thoughtfully championed the potential offered by “controlled vocabularies” and “the semantic web.” I too am intrigued by the possibilities that ontologies, both modest and ambitious, offer: to find similar texts (or other kinds of evidence), to make predictions, to uncover patterns. (As an aside, but on a related subject, I’d be in favor of another session on text mining at this year’s THATCamp if anyone else is interested.) Sterling posed a question in his proposal: “Can digital historians create algorithmic definitions for historical context that formally describe the concepts, terms, and the relationships that prevailed in particular times and places?” I’m intrigued by that ambitious enterprise, but as my proposal suggests, I’m cautious and skeptical for a couple of reasons. First, I’m dubious that most of what we study and analyze as humanists can fit into anything resembling an adequate ontology. The things we study (religious belief, cultural expression, personal identity, social conflict, historical causation, and so on) are so complex, so heterogeneous, so plastic and contingent that I have a hard time envisioning how they can be translated into and treated as structured data. As I suggested in my proposal, even something as modest as an “historical event” may be too complex and subjective to be the object of a microformat. Having said that, I’m intrigued by the potential that data models offer to consider quantities of evidence that defy conventional methods, evidence so voluminous that it can only be treated computationally. I’m sure that the development of ambitious data models will lead to interesting insights and help produce novel and valuable arguments.

But, and this brings me to my second reservation, those models or ontologies are of course themselves products of interpretation. In fact they are interpretations: informed, thoughtful (hopefully) definitions of historical and cultural relationships. There’s nothing wrong with that. But adherence to “controlled” vocabularies or established “semantic” rules or any standard, while unquestionably valuable in terms of promoting interoperability and collaboration, defines and delimits interpretation and interpretive possibility. I’m anti-standards in that respect. When we start talking about anything remotely complex, which includes almost everything substantive we study as humanists, I hope we see different digital humanists develop their own idiosyncratic, creative data models that lead to idiosyncratic, creative, original, thoughtful, and challenging arguments.

All of which is to say that I second Sterling in suggesting a session on the opportunities and drawbacks of standards, data models, and ontologies in historical and humanistic research.

Digital Humanities Manifesto Comments Blitz

Monday, June 8th, 2009

I just managed to read UCLA’s Digital Humanities Manifesto 2.0 that made the rounds a week or so ago, and I noticed its CommentPress installation hadn’t attracted many comments yet. Anyone interested in a session at THATCamp where we discuss the document paragraph by paragraph (more or less) and help supply some comments for the authors?

Campfire Plans

Wednesday, June 3rd, 2009

Maybe this isn’t the right venue, but it’s never too early to start talking about extracurricular activities. What happens Saturday/Sunday night? Will Amanda French be leading us in a round of digital humanities songs around the campfire?
