The Dog Ate My Metadata

Have you heard the story of “Legacy Metadata” or “Legacy Data”? I heard it again the other day, and it went like this.

Now we’re moving to a new and improved digital platform. After we test the system and make sure it works to spec, then we’ll begin migrating the digital objects and their associated metadata. Of course, we’ll have to clean up the metadata. It’s legacy data. It’s inconsistent, inaccurate and in some cases either missing or just plain incorrect. This is a big job but we know what we’re doing now. We can’t blame our predecessors because they just didn’t know what they were doing way back then way back in time with their digital collections. This whole business of digital was new back then and people were figuring out what to do with digital collections. Ahhh… once we get this legacy data cleaned up and migrated, all our metadata will look nice.

Let’s put sarcasm aside. The story that I heard, and have heard before, is that legacy data causes trouble because it is inconsistent and inaccurate. The reasons given are many, but one that comes up often is that people were still learning how to create digital collections. Hence, mistakes were made, and inconsistent and inaccurate metadata followed. Certainly metadata that was created some time ago in an old platform will have issues. But is it right to blame our predecessors wholesale for bad metadata and say they didn’t know what they were doing? The story is much more complex. Could it be that their approach simply reflects a different perspective from the one many of us share today on how to organize information? What’s really going on here, and have we learned our lesson with regard to legacy metadata?

As to what is really going on, there are certainly a number of hypotheses. I would like to look at just a few possible ones: (1) “time and effort” being dedicated to metadata, either legacy or current; (2) legacy data is inconsistent and inaccurate; (3) our predecessors were learning as they went and we know what we’re doing now.

(1) Time and effort: those of us who work with metadata understand that not everything is automatic. This is detail work that takes time and effort. Where I have worked, bibliographic maintenance has for the most part become a thing of the past. The result is an inconsistent and inaccurate catalog, because no one really has the time to fix mistakes. When I talk to digital pioneers, especially those who work in Archives or Special Collections, they tell me that the goal was to digitize. Digitize it and people will come. Where I used to work, digitization came first and metadata last. Where I am now, the priority is still on digitizing, but also on full text searching. Full text searching can be helpful, unless your digital object doesn’t contain text or the text needs more than keywords to be browsed or searched. In both instances, the push is to provide content to users first and information about that content second.

Contrary to this idea is the push to provide documentation on your data. In the new wave of metadata consultations for eScience and Data Management Plans, we ask researchers to take the time and make the effort to provide a minimum of good metadata. Good metadata is information that uniquely describes a researcher’s data. If this consists of 4 metadata tags, so be it. Perhaps it is more, as is the case with most information marked up in FGDC. Here, the push is to provide both content and information about that content.

(2) This leads me to the idea that legacy metadata is inaccurate and inconsistent. It is wrong to think that, for some reason, librarians forgot how to organize information during the early digital wave. Again, the focus was not so much on organizing the information that uniquely described these digital resources as on organizing the digital objects themselves. Should we put them by collection and then series? Should we display the most frequently downloaded? What about the A-Z list for people to browse? I don’t mean to imply that no time or effort was given to metadata. However, I don’t think it was a priority, because it was simply more important to get the object out there on Flickr or some other platform. This certainly wasn’t a bad approach. It shed light on many collections that had until then remained unknown. As we move to a more linked data universe, however, it is becoming apparent that the linking happens with data. If the data (or metadata) isn’t there or is inaccurate, then linkages don’t happen. By linkages, I’m thinking of timelines, mapping, visualizations, etc. It’s re-using and re-imagining the data. This is one of the reasons why pushing content and pushing information about that content are two steps that need to be done very closely together or, even better, at the same time. That means supplying as much information about the content as possible automatically, and having someone provide the rest if the automated metadata isn’t enough to uniquely describe the digital content. This leads me to my third point: the claim that people back then didn’t know what they were doing.
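To make the “automate what you can, then have someone provide the rest” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the four required fields, the function names and the sample file name are hypothetical, not part of any platform or standard mentioned above.

```python
# Hypothetical minimum field set; a real project would choose its own.
REQUIRED_FIELDS = ("title", "date", "creator", "format")


def auto_describe(filename, size_bytes):
    """Fill in only what a machine can know without human judgment."""
    # Take the file extension as a rough stand-in for the format.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return {"identifier": filename, "format": ext, "size_bytes": size_bytes}


def missing_fields(record):
    """List required fields that are absent or blank, for a person to supply."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


# The automated pass runs when the object is loaded...
record = auto_describe("letter_0042.tif", 1048576)
# ...and whatever it could not supply becomes a to-do list for a person.
todo = missing_fields(record)
print(todo)
```

The point of the sketch is the hand-off: the object and its description travel together from the start, and the gaps are made visible rather than left to be discovered during a later migration.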

(3) Technology is moving fast. Versioning is not just a problem with your laptop. It is also an issue with metadata standards, digital platforms and really anything that relies on a computer. Just like our predecessors, we are learning better ways to get our digital resources out to the public so that they can discover, use, share, and re-use the data. The learning curve didn’t stop in the 1990s or in 2013. If anything, we’ve learned that we always need to be developing our skills and learning new ones.

But another key lesson out there is the importance of metadata. Metadata has been a trendy word for some time. Many like to think of it as automation. Finally, we no longer have to be tied to a cataloger creating records for us. Perhaps some money can even be saved when there is no longer a need to employ an expensive professional. We can have a computer do it all. That would be nice, but it is not reality. We definitely need to automate as much as possible, because the sheer amount of data that we work with, and will be working with, is overwhelming. But not everything can be automated. Above all, this applies to organizing the information that uniquely describes the digital resources. Thinking through how to organize this information in a consistent and accurate manner takes time and effort. It requires skills, and learning new skills along the way. If we don’t allow adequate time and effort to think through how to organize metadata, then we will indeed create inconsistent and inaccurate metadata, just as some of our predecessors did. So let’s not use the excuse that the dog ate my (homework) metadata. Instead, let’s dedicate the time, effort and support needed to create a legacy.

