METS Is As Easy As Sliced Apple Pie

I recently heard from a colleague that the Metadata Encoding and Transmission Standard (http://www.loc.gov/mets) is easy. Let me back up and provide some context for this assertion. My colleague works primarily with EAD and Dublin Core records created in software applications. In other words, she doesn’t normally sit down and write XML documents by hand in which content is encoded according to the EAD and/or DC standards. However, she has written EAD files by hand in the past and is knowledgeable about XML. What she wanted to do was write by hand separate METS documents for a collection of PDFs according to our METS Profile. And my colleague definitely had more sense than those who told her that METS was easy. Thanks to her experience, she figured that it wasn’t as straightforward as these happy people were leading her to believe.

There are several reasons for this. One of them is that the METS profile lays out a number of requirements, illustrated in its Appendix by an example METS file. In terms of the general sections, the requirements ask for two dmdSecs (one for OAI and one for MODS), a digiprovMD section for PREMIS, the structMap, a sourceMD, and, if necessary, a header. Then there is required vocabulary for some elements and attributes, along with other requirements such as data types. My colleague and I knew that she could figure out how to write METS files by hand for each one of the PDFs. But we were both skeptical as to why someone wanted her to do this by hand, especially for a large number of METS files. It’s not that this is an impossible task. But suffice it to say that METS is not as easy as sliced apple pie.

First, there’s the issue of XML. METS is of course a standard written in XML according to a schema that is itself written in XML. XML is the Extensible Markup Language. If you are totally unfamiliar with XML, w3schools has some great tutorials on it, along with tutorials on other members of the XML family such as XSD (schemas), XLink, XPath, and XQuery. Let’s just stick with XML, whose main goal is to store and organize information. Information is organized in what are called elements, which are written with tags. Here’s an example, which is used in the w3schools tutorial.

<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

What this XML file does is store and organize information about books. There is a “root” element, bookstore, which has one kind of direct child element, book. Each book has an attribute called category and its own child elements: title, author, year, and price. What this XML file does not do is perform some sort of action. It won’t open a new window or carry out any other operation. If you view it in the browser, you’ll just see this file. This in and of itself is more or less straightforward. One can simply follow the w3schools tutorial to learn more about writing basic XML files like this one. However, this is only the tip of the XML iceberg.
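To make the root, child, and attribute relationships concrete, here is a minimal sketch that reads the same bookstore example with Python's standard xml.etree.ElementTree module. The function name summarize_books is my own choice for illustration; nothing here is specific to METS.

```python
import xml.etree.ElementTree as ET

# The w3schools bookstore example from above, with straight quotes.
BOOKSTORE_XML = """<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def summarize_books(xml_text):
    """Return the root tag plus (category, title, child names) for each book."""
    root = ET.fromstring(xml_text)
    summary = []
    for book in root.findall("book"):
        # Iterating over an element yields its direct children in order.
        child_names = [child.tag for child in book]
        summary.append((book.get("category"), book.findtext("title"), child_names))
    return root.tag, summary


root_tag, books = summarize_books(BOOKSTORE_XML)
print(root_tag)
for category, title, children in books:
    print(category, title, children)
```

The parser only hands the stored information back. Any "action" has to come from a program like this one, not from the XML file itself.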

Second, we typically don’t want to put information into various tags that only make sense to us. We want not only to organize information but also to describe and define the elements used in an XML file, and then to transform them further for use in HTML or to create another XML file. To define XML elements, there is the step of learning schemas (or going old school with DTDs, or perhaps learning both DTDs and schemas). Again, many of us in metadata work with schemas that have already been created for us, and this is the case with METS. The METS schema defines all the elements and attributes that can occur in a METS file. In this sense, you cannot simply go and write your own METS file. You have to write a METS file that conforms to the definitions outlined in the METS schema. If you don’t, then you simply have a random XML file that might look like METS but isn’t. Even if you rely on a schema that was created for you, you have to know just enough about schemas to read the schema, which adds another challenge to the great XML adventure.
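The difference between "looks like METS" and "is METS" can be hinted at with a rough sanity check. The sketch below is not schema validation, which needs a real XSD validator. It only tests two things the METS schema does require: that the root element is mets in the METS namespace (http://www.loc.gov/METS/) and that at least one structMap is present. The function name looks_like_mets and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def looks_like_mets(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    # ElementTree spells namespaced names as {namespace-uri}localname.
    if root.tag != f"{{{METS_NS}}}mets":
        problems.append(f"root element is {root.tag}, not mets in the METS namespace")
    if root.find(f"{{{METS_NS}}}structMap") is None:
        problems.append("no structMap child found")
    return problems


# Same tag names, but no namespace: a random XML file that only looks like METS.
lookalike = "<mets><structMap><div/></structMap></mets>"
# Declares the METS namespace, so the basic checks pass.
namespaced = '<mets xmlns="http://www.loc.gov/METS/"><structMap><div/></structMap></mets>'

print(looks_like_mets(lookalike))
print(looks_like_mets(namespaced))
```

Passing these checks says nothing about required attributes, element order, or the extra rules of a local METS profile. For that you still have to read the schema and the profile.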

Third, one just doesn’t want to write hundreds of METS documents by hand, one by one. Unless perhaps you have some beer on hand. Typically, as in my colleague’s case, you need to take hundreds of records written in XML according to some other metadata schema and transform them into separate METS files. There are tools that allow you to automate this process, namely XSLT if you are working with data in an XML format and need to transform it into METS. This adds multiple layers to the adventure. You need to learn XSLT and more than likely XPath. Of course, getting a handle on regular expressions would be nice. On top of that, you need to consider the accuracy and consistency of both the encoding of your XML files and their content. For example, if your data includes a URL for each PDF file, is this URL encoded using the same element in every single record, or do you use one element in some records and a different one in others?
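A consistency question like the one about URLs can be answered before any transformation is written. The sketch below is one possible check in Python rather than XSLT. The record layout and the element names record, identifier, and relation are hypothetical stand-ins, not taken from any real collection.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical source records: two keep the URL in <identifier>, one in <relation>.
RECORDS_XML = """<records>
  <record><title>Report A</title><identifier>http://example.org/a.pdf</identifier></record>
  <record><title>Report B</title><identifier>http://example.org/b.pdf</identifier></record>
  <record><title>Report C</title><relation>http://example.org/c.pdf</relation></record>
</records>"""


def url_elements(xml_text):
    """Count which element names hold text that looks like a URL."""
    counts = Counter()
    for record in ET.fromstring(xml_text).iter("record"):
        for child in record:
            text = (child.text or "").strip()
            if text.startswith(("http://", "https://")):
                counts[child.tag] += 1
    return counts


counts = url_elements(RECORDS_XML)
print(counts)
```

If more than one element name shows up in the counts, a transformation that reads the URL from a single element will silently miss some records.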

What exactly is the point here? XML is easy to become familiar with in the beginning, but that early ease is deceptive. There is a learning curve. This doesn’t mean that XML and its fellow family members are out of reach. It means that time needs to be taken to learn them. The time spent learning will allow you to automate processes such as creating METS files using XSLT.
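As a rough idea of what such automation looks like, here is a minimal sketch that builds a bare-bones METS document for one PDF. It uses Python's standard library as a stand-in for XSLT, and it is not our METS Profile: it leaves out the PREMIS, sourceMD, and header pieces a real profile would require. The function name mets_for_pdf, the ID patterns, and the sample values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def mets_for_pdf(record_id, title, pdf_url):
    """Build a skeleton METS tree for one PDF (hypothetical inputs and IDs)."""
    def q(name):
        return f"{{{METS_NS}}}{name}"

    dmd_id = f"dmd-{record_id}"
    file_id = f"file-{record_id}"
    mets = ET.Element(q("mets"), {"OBJID": record_id})

    # Descriptive metadata: a single Dublin Core title wrapped in a dmdSec.
    dmd = ET.SubElement(mets, q("dmdSec"), {"ID": dmd_id})
    wrap = ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, q("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File inventory: one PDF located by URL.
    file_sec = ET.SubElement(mets, q("fileSec"))
    file_grp = ET.SubElement(file_sec, q("fileGrp"))
    file_el = ET.SubElement(
        file_grp, q("file"), {"ID": file_id, "MIMETYPE": "application/pdf"}
    )
    ET.SubElement(
        file_el, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": pdf_url}
    )

    # Structural map: one div pointing at the file and its description.
    struct_map = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), {"DMDID": dmd_id})
    ET.SubElement(div, q("fptr"), {"FILEID": file_id})
    return mets


doc = mets_for_pdf("001", "Report A", "http://example.org/a.pdf")
print(ET.tostring(doc, encoding="unicode"))
```

Calling the function once per source record is the whole trick. The hard part is everything the sketch skips: mapping real source elements into the right sections and meeting every requirement of the profile.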

So why bring this up? The statement that METS is easy reminded me of what someone told me some years back about catalogers and metadata librarians. In a nutshell, this person asserted that catalogers and metadata librarians are data entry secretaries who put in and edit information in forms. Of course, this is a simplistic view. Obviously the person didn’t understand the work of catalogers or metadata librarians. But the statement that METS is easy can be seen as an expression of the same view. Isn’t METS just creating tags in some XML file? Of course, a METS document is an XML file that consists of tags. But it is also so much more, and it involves more than just understanding XML in and of itself. This is even more true when you want to transform METS into another XML format or turn some XML file into a METS formatted document.

We shouldn’t take metadata standards for granted. They are complex. Even Dublin Core, which is considered to be one of the easier metadata formats, comes in at least 31 flavors just in how it is implemented by various people. Furthermore, we shouldn’t take writing XML files that conform to metadata standards for granted. This task is much more than just creating and editing information in a form or in a text editor. It involves a number of different tools and levels of understanding about these tools and how they interact. Metadata is not rocket science. It is, however, a science and an art that takes time, practice, and more practice to make it seem as easy as sliced apple pie.
