I was listening to the radio the other day on the way to work. The commentator was talking about the government's recent faux pas in gathering metadata, or data, on people. This has been in the news for some time, and discussions have appeared on AUTOCAT. What was interesting about this commentary was its point that metadata is essentially data (data about data, anyone?). Like all data, it is not information in and of itself. To become useful and understandable, data need context. What struck me is that providing context is one of the goals of metadata. Going further, context is perhaps a key to quality metadata.
What does it mean that metadata should provide context? Generating metadata to describe a resource involves several steps. This is something we are familiar with, and these steps have fancy names like subject analysis. It's funny, though: in the realm of digital initiatives, when I talk to colleagues who are programmers, liaisons, or faculty, they don't really know what these steps are, much less the terms catalogers and metadata librarians use for them. What I have noticed is that metadata seems to be viewed as an activity done once, namely describing your research at the end of a project. We know that isn't true. There are many steps to describing a resource, such as deciding how to document your data gathering while you are in the midst of a research project. Let's put that aside, though, and focus on the end of a project, on resource description, and on context.
Description: It is important to describe the resource so that people can find it and determine whether it is the right resource. This concerns metadata that support searching, discovery, access, and so on. We could use the FRBR or RDA terminology: Find, Identify, Select, Obtain. The metadata one could create for this step include the name of the author or creator, various dates associated with the resource, a DOI or other persistent identifier (ARK, Handle, etc.), keywords, a summary, rights, availability, and so on. These types of information are common. We see them every day on labels, on websites, and in catalogs. Typically, the minimal level of metadata is a title and perhaps keywords; think of Flickr. These metadata help create citations and records for various purposes, such as display or searching. They can be created by computers, by people, or by both. However, these data also need to provide context.
Context: Metadata, or data documenting a resource, are still just data. If the data are presented inconsistently or inaccurately, the resource will be hard to find. If there are insufficient metadata, the resource will also be hard to identify. Also, people who browse, and who might not have a DOI or the author's name, need a different entry point into the record. Furthermore, it is crucial to provide a minimal set of relationships for the resource. Essentially, metadata do more than supply the data needed to construct a citation for a bibliography. Metadata have several roles. One is description. Another is to provide enough metadata to uniquely identify not just the resource but also its role, so that you can compare and contrast it with other resources. This comparing and contrasting can be done in a number of ways. One is to provide an accurate description of the resource. Another is to provide some sort of identifier (DOI, ARK, call number, etc.). Another is to provide information about the nature of the resource: is it a book, an e-book, the film of the book, a sound recording, or an image that might have appeared in an article? Yet another is to provide the technical details and tease out the embedded metadata, which adds another level of data documenting the resource. This is where we begin to think about the context of the description. Context is really the activity of gathering metadata together to form a consistent and meaningful picture of the resource.
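To make the point about comparing and contrasting a little more concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified record structure: the field names (`identifier`, `title`, `resource_type`, `related_to`) and the identifier values are invented for illustration and are not drawn from Dublin Core, MARC, or any real system. A title alone matches both the book and the film of the book; the resource type and a relationship are what let a user tell them apart, loosely echoing the Find and Select tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical, simplified metadata record (not a real schema)."""
    identifier: str     # placeholder identifier, not a real DOI or ARK
    title: str
    resource_type: str  # e.g. "book", "film", "sound recording"
    related_to: list[str] = field(default_factory=list)  # identifiers of related records


# Two records that share a title: a book and the film of the book.
catalog = [
    Record("hypothetical-id-001", "An Example Novel", "book"),
    Record("hypothetical-id-002", "An Example Novel", "film",
           related_to=["hypothetical-id-001"]),
]


def find_by_title(records, title):
    """Title alone returns every record with a matching title."""
    return [r for r in records if r.title == title]


def select_by_type(records, resource_type):
    """The resource type narrows the results to what the user wants."""
    return [r for r in records if r.resource_type == resource_type]


hits = find_by_title(catalog, "An Example Novel")
print(len(hits))             # 2: the title alone is ambiguous
films = select_by_type(hits, "film")
print(films[0].identifier)   # hypothetical-id-002
print(films[0].related_to)   # ['hypothetical-id-001']: the film points back to the book
```

The relationship field is the part a template or an automated process is most likely to leave empty, yet it is what tells a user that the film and the book belong together.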
The level of context needed depends on the resource itself and on its complexity. It might be necessary to include several pieces of data about how a resource is related to others. It might also be necessary to provide a large amount of automatically generated technical data.
All of this is to say that each resource is unique. The context needed for one resource is not exactly the same as for another. This is the nice feature of cataloging in Connexion, where each record can be unique because each context is slightly different. It is also where many digital library software packages seem to miss the mark. Has anyone else noticed the use of "template"-driven metadata creation and editing? In general, metadata submission and editing templates meet the needs of many resources. However, it's the exceptions that always put these templates to shame. To provide the necessary context for an exceptional resource, a person stuffs metadata into some element because the template lacks the flexibility to add other elements. I've seen the Dublin Core element description used for general notes on the resource, for cataloger notes, and for archival notes. This is a real mix of disparate information that will be difficult to migrate, tease out, or share later on, especially if the data are entered inconsistently. Templates are not a bad thing, and they make it easy for people to enter data. Yet context shouldn't be sacrificed because of an inflexible template or web form. In many cases, it seems the pendulum has swung from the overwhelming variety of ways to mark up metadata in MARC to the lack thereof in templates and web forms. To preserve context, we need to find a happy medium between overwhelming encoding practices and limited or inflexible ones, because it is really the art of providing context to metadata that enables people to find, identify, select, and obtain resources. Each resource has a unique context. If we don't respect that, we lose sight of how best to describe that resource. Furthermore, this context is something a computer cannot always provide. And when it is missing, no matter how much metadata is presented, it will be difficult to sort out the context and correctly select a resource.
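Here is a minimal Python sketch of what happens with an inflexible template. It assumes a hypothetical web form that offers only a handful of fixed fields; the field names, the two helper functions, and the note texts are all invented for illustration. Notes that have no field of their own get concatenated into `description`, and once they are joined, nothing in the stored value says which part was a general note, which a cataloger note, and which an archival note.

```python
# Hypothetical fixed template: the only fields the web form offers.
TEMPLATE_FIELDS = ("title", "creator", "date", "description")


def fill_template(title, creator, date, notes):
    """Squeeze typed notes into a fixed template (hypothetical helper).

    `notes` is a list of (note_type, text) pairs. The template has no
    element for note types, so every note is stuffed into `description`
    and the type information is discarded.
    """
    record = dict.fromkeys(TEMPLATE_FIELDS, "")
    record.update(title=title, creator=creator, date=date)
    record["description"] = " ".join(text for _note_type, text in notes)
    return record


def keep_context(title, creator, date, notes):
    """A more flexible record (hypothetical helper): each note keeps its
    type, so it can be migrated or shared later without guessing."""
    return {"title": title, "creator": creator, "date": date, "notes": list(notes)}


notes = [
    ("general", "Includes a fold-out map."),
    ("cataloger", "Date inferred from the publisher's catalog."),
    ("archival", "Received with the donor's papers."),
]

flat = fill_template("Example Report", "Example Author", "1950", notes)
rich = keep_context("Example Report", "Example Author", "1950", notes)

print(flat["description"])
# Includes a fold-out map. Date inferred from the publisher's catalog. Received with the donor's papers.

# With the typed record, pulling out just the archival notes is trivial;
# with the flat record, it would mean guessing where each note begins and ends.
archival = [text for note_type, text in rich["notes"] if note_type == "archival"]
print(archival)  # ["Received with the donor's papers."]
```

The sketch does not argue that the flat record is wrong for every resource; for many, a template is fine. It only shows why the exceptional cases are hard to untangle once their context has been flattened.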
Better quality does not mean more metadata. Better quality is linked to providing the necessary context of a resource, a context the user can understand in order to find, identify, select, and obtain that resource. This is the added value provided by catalogers and metadata librarians: they can bring together disparately generated data and enhance it by providing relationships to other resources, terms from controlled vocabularies, or other specific information that a computer might not know. I say let's advocate for context!