At ALA Annual in Chicago, among other sessions, I attended the Metadata Interest Group session, which featured Erik Mitchell from UC Berkeley on library linked data. Erik has a website, http://www.erikmitchell.info, which is outdated but still gives you an idea of his work. There were several points in Erik’s presentation that were certainly fun, and one that was intended to shock. The shock component was his statement that “XML is the MARC of the 21st century”. If XML is anything like MARC, I imagine that XML will be dying for the next 20+ years. But let’s put the demise of XML to the side for the moment. What I found interesting was Erik’s approach to explaining metadata.
In short, Erik explained that metadata is a bunch of discrete parts, such as conceptual, structural, or digital. There is a metadata model that he outlined with the following graph:
I felt that this graph wasn’t clearly communicated in terms of the importance of the placement of the boxes and what they represented. In a very abstract sense, I saw that these boxes were trying to visualize Erik’s idea that metadata is made up of discrete parts and that this is a data model. But I wondered: are controlled vocabularies a data model? Is offering facets a data model or mere presentation? One of the problems I saw with this model is that it represented a whole mix of approaches to metadata without really saying what metadata is, or are. I also suspected that anyone who isn’t a librarian would be confused. This leads me to the second problem. What happens when people take this graph, and this idea that metadata is discrete parts, back to their jobs and hold a workshop? There’s going to be a mixed reaction, but my bet is that the majority of people will not have understood metadata.
Interestingly, both where I work and at UMass Amherst, explaining metadata as if you were talking to librarians or future librarians doesn’t yield positive results. What does it mean to explain metadata to librarians? First, you assume a common knowledge of how to organize information, and that most of the people in the audience are aware of cataloging, organizing digital collections, or something to this effect. In a sense, it means skipping a beat. Instead of starting out with some very general statement about how metadata is information that documents a resource (a resource that can be someone’s research, a data set, a book, a computer, etc.) in such a way as to be able to search, discover, and access that resource, you assume this understanding. Instead of providing examples that such information can contain the author, publisher, DOI, date the resource was published, or a summary, again you assume this from your librarian colleagues. Instead of asking people to think of the information found on Amazon – all of this information together is metadata, and like data, metadata can be seen as data, or discrete parts of the information needed to correctly identify a resource – you’ve assumed librarians know this. All of this background information is put to the side in many cases for librarians, and the explanation goes quickly into how to categorize the many different incarnations of metadata – in Erik’s case, conceptual, structural, digital, etc. This is really another version of defining metadata by types, such as administrative, structural, rights, technical, preservation, descriptive, etc. The problem here is that metadata is still seen as a thing to be fit in a box, and these boxes can fit together to form data models, just like Legos.
What’s going on here? Metadata certainly can be linked to data models, because metadata is simply data. Metadata are data that document a resource in such a way as to be able to uniquely identify that resource. These data can be embedded in the resource or kept in a separate file, such as a text file, an XML record, or even a JSON-LD file. What is important is that these data are meant to properly document a resource for use now and, hopefully, in the future. These data can conform to a number of data models depending on the community and the data needs of that community. These data can also be expressed and shared in a number of different ways. However, any type of data has a life cycle. Seeing metadata as discrete parts ignores its life cycle and becomes inflexible to the necessary changes that will take place in how we understand metadata, organize information, and even how we search for and discover resources.
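To make the “metadata are simply data” point concrete, here is a minimal sketch in Python. The record is entirely made up (the field names loosely follow Dublin Core, and the title, author, and DOI are invented for illustration); the point is only that the serialization – JSON-LD here, but it could just as well be XML or plain text – is incidental to the data themselves.

```python
import json

# A hypothetical descriptive-metadata record for a single resource.
# Field names loosely follow Dublin Core; the values are invented.
record = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "dc:title": "An Imaginary Report",
    "dc:creator": "A. Author",
    "dc:publisher": "Example Press",
    "dc:date": "2013-07-01",
    "dc:identifier": "doi:10.0000/example",
}

# The same data could be embedded in the resource, kept in a text
# file, or encoded as an XML record; printing it as JSON-LD is just
# one possible expression.
print(json.dumps(record, indent=2))
```

Nothing about the record above is specific to JSON; the same key-value pairs could be re-expressed in any encoding a community settles on.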
This leads me back to the demise of XML. XML is simply a tool that is convenient at this point in time, not only for the library community but for the digital community at large. The next step isn’t JSON-LD, which might be dead as well in a couple of years. If you look at the history of markup and programming languages, there are a number of them that no longer exist. Can anyone remember SGML? There is a larger issue, namely whether XML is backward compatible between its various versions, and whether the data encoded in XML can be migrated to the new and best encoding standard that comes along. Again, I mention SGML. There will also be a language that will supplant JSON-LD. It could be a new and improved version of that language or something different.
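The migration worry can be sketched with a toy example, assuming a flat XML record (the element names and values are invented for illustration, not drawn from any real schema). The data survive as long as something can still parse the old encoding and re-express the same fields in the newer one:

```python
import json
import xml.etree.ElementTree as ET

# A toy flat XML record; element names and values are made up.
xml_record = """
<record>
  <title>An Imaginary Report</title>
  <creator>A. Author</creator>
  <date>2013-07-01</date>
</record>
"""

# Migrate the encoding, not the data: parse the old serialization
# and write the same fields out in a newer one (JSON here).
root = ET.fromstring(xml_record)
migrated = {child.tag: child.text for child in root}
print(json.dumps(migrated, indent=2))
```

The fragility is in the parser, not the data: once nothing can read the old encoding, the migration path closes, which is exactly the SGML lesson.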
Don’t mistake me. I enjoyed hearing Erik speak about data models, linked data, and his research. What he had to say was important, especially in regard to linked data and implementing linked data in our work. I believe, however, that we need to go one step further, beyond the boxes. We need to see how our systems can be flexible enough to withstand more than a couple of years. I think we also need to teach our future librarians this flexibility, in addition to an understanding of the assumptions they make, because they will need to continue to learn new things, breaking the models and boxes they learnt at school to bring in the next wave of how we treat data in libraries.