Going from a Traditional to a Hybrid Cataloging Department

I recently attended the ACRL New England Conference. It was a very good conference, especially since I was able to connect with a couple of my colleagues in cataloging. We got to talking about how to make the jump from a traditional cataloging unit to a hybrid one. Let’s back up a bit to explain more about traditional and hybrid. A traditional cataloging team has nothing to do with heirlooms. In fact, it is a busy department working with a plethora of materials in a variety of formats. These departments have seen numerous reorganizations. Many are no longer even referred to as “cataloging”. Perhaps resource management or resource access team is the name of the year under their current administration. But what’s missing in these departments is work on digital collections. When I was talking to my colleagues, I was interested to hear that in their case the “Digital” (add your name of the year here) team administered the metadata for digital projects. In this case, there was a definite split between cataloging and metadata. A hybrid cataloging or cataloging/metadata or resource access or something access and management team is one that takes care of all the MARC cataloging projects and the non-MARC cataloging projects in some capacity. Typically on this hybrid team, there are people who know everything about the MARC world and also some xml, MODS, METS, DC, and other varieties of metadata schemas, and perhaps how to navigate some digital collection management software. Now of course, there are large variations in this model and my summary is only a very general picture of a stock hybrid situation.

To get back to my conversation with my colleagues, they asked how to make the jump from being solely a MARC shop to one that is, so to speak, to be and not to be MARC! I like to take the example of Tufts University and their Miscellany Collection. Alex May, who is the metadata librarian on the project, wrote about it in this PDF. Staff took a small collection from the Archives and Special Collections and digitized it. Using xml, they created a small database with a PHP front end to present the lost Miscellany Collection. Alex May has given several presentations throughout the New England area on how he created this online collection and the metadata for it. With a small collection and a talented metadata specialist with the skills to pull it off, Tufts Library has added a super digital collection.

But what happens when your staff lack skills such as xml or PHP? Find someone who is interested in learning new skills. Then, thanks to free courses online, let them explore the xml universe. W3Schools has some great free training guides online on xml. If you have access to a training budget, there are a number of other alternatives, from lynda.com to bringing in experts in the area. And New England has a whole bunch of experts in xml, programming languages, or metadata. In my experience, many of these fantastic individuals would be more than happy to come help out for a modest fee. If you don’t know who these people are, then talk to a few of your colleagues and they can certainly hook you up with the right people. Increasingly, there are a number of “unconferences” where you can network and learn to your heart’s content.

With training and a small project (such as 31 images in one collection or a diary), make sure you have a plan. Boise State has a good guide to help get you started on this route. It’s not just a question of: Now we can just scan images! Yea! You need to know how you’re going to scan them, what type of images you need, how they will be stored, whether the collection will be hosted or not, what digital collection management software you will use (vendor or open source), etc. The more you work out the details of how to make your small print collection become digital, the easier it will be. Also, you’ll be more prepared to deal with the changes and challenges that will inevitably happen along the way. Taking time to read up on how other libraries started their digital collections is also an excellent way to get information on the dos and don’ts along the way; for example, check out “Using Omeka to Build Digital Collections”.

Don’t forget to give yourselves time and a timeline that includes training and time to make mistakes along the way.

The first time your traditional cataloging department gets involved in a digital collection will most likely bring lots of opportunities as well as plenty of lessons on how to do a better job the next time. With a decent plan, time and support, your traditional catalogers can confidently make the switch from traditional and rocking to hybrid and still rocking. This initiative will illustrate just how cataloging skills can be transferred to digital initiatives, bringing new attention to your catalogers and perhaps more respect. Also, this could open up new collaborations with departments that might not have worked with cataloging before.

In short, plan, train, plan and collaborate. If one or more members of your traditional team are willing, you can help them by giving them the time, tools, and support they need to make the switch from MARC to xml and to participate in the creation of a digital collection.

Filed under cataloging

METS Is As Easy As Sliced Apple Pie

I recently heard from a colleague that the Metadata Encoding and Transmission Standard (http://www.loc.gov/mets) is easy. Let me back up and provide some context for this assertion. My colleague works primarily with EAD and Dublin Core records created in software applications. In this sense, she doesn’t sit down and write documents in xml where content is correctly encoded according to EAD and/or DC standards. However, my colleague has in the past written EAD files by hand and is knowledgeable about xml. What she wanted to do was write by hand separate METS documents for a collection of PDFs according to our METS Profile. And my colleague definitely had more sense than those who told her that METS was easy. Thanks to her experience, she figured that it wasn’t as straightforward as these happy people were leading her to believe. There are several reasons for this. One of them is that the METS profile lays out a number of requirements, as illustrated in the Appendix by an example METS file. In terms of the general sections, the requirements ask for two dmdSecs (one for OAI and one for MODS), a digiprovMD section for PREMIS, the structMap, a sourceMD and, if necessary, a header. Then there is required vocabulary for some elements and attributes, along with other requirements such as data types. My colleague and I knew that she could figure out how to write METS files by hand for each one of the PDFs. But we were both skeptical as to why someone wanted my colleague to do this by hand, especially for a large number of METS files. It’s not that this is an impossible task. But suffice it to say that METS is not as easy as sliced apple pie.

First, there’s the issue of xml. METS is of course a standard written in xml according to a schema that is itself written in xml. XML is the extensible markup language. If you are totally unfamiliar with xml, w3schools has some great tutorials on xml along with other languages in the xml family such as XSDs (schemas), xlink, xpath, or xquery. Let’s just stick with xml, whose main goal is to store and organize information. Information is organized in what are sometimes called tags or elements. Here’s an example, which is used in the w3schools tutorial.

<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

What this xml file does is store and organize information about books. There is a “root” element, bookstore, whose direct children are book elements. Each book has an attribute called category and its own child elements: title, author, year, and price. What this xml file does not do is perform some sort of action. It won’t open a new window or do any other operation. If you view it in the browser, you’ll just see this file. This in and of itself is more or less straightforward. One can simply follow the w3schools tutorial to learn more about writing basic xml files like this one. However, this is only the tip of the xml iceberg.
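To make the point that the file only stores information and that something else has to read it, here is a minimal sketch in Python using the standard library's ElementTree parser. The function name summarize and the variable names are made up for illustration; nothing here comes from the w3schools tutorial beyond the sample data itself.

```python
import xml.etree.ElementTree as ET

# The sample data from the post, with straight quotes around attribute values.
BOOKSTORE_XML = """\
<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
"""


def summarize(xml_text):
    """Return the root tag and a list of (category, title) pairs."""
    root = ET.fromstring(xml_text)
    books = [
        (book.get("category"), book.findtext("title"))
        for book in root.findall("book")  # direct children named "book"
    ]
    return root.tag, books


root_tag, books = summarize(BOOKSTORE_XML)
print(root_tag)
for category, title in books:
    print(category, "-", title)
```

The xml does nothing on its own; it is the few lines of Python that walk from the root to each book and pull out the attribute and the child element.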

Second, we typically don’t want to put information into various tags that just make sense to us. We want not only to organize information but also to describe and define the elements used in an xml file and then further transform them for use in html or to create another xml file, etc. To define xml elements, there is the step of learning schemas (or going old school with DTDs, or perhaps you want to learn both DTDs and schemas). Again, many of us in metadata work with schemas that have already been created for us, and this is the case with METS. The METS schema defines all the elements and attributes that can occur in a METS file. In this sense, you cannot simply go and write your own METS file. You have to write a METS file that conforms to the definitions outlined in the METS schema. If you don’t, then you simply have a random xml file that might look like METS but isn’t. Even if you rely on a schema that was created for you, you have to know just enough about schemas in order to read the schema – adding another challenge to the great xml adventure.

Thirdly, one just doesn’t want to write hundreds of METS documents by hand, one by one. Unless perhaps you have some beer on hand. Typically, as in my colleague’s case, you need to take hundreds of records in some other metadata schema written in xml and transform them into separate METS files. There are tools that allow you to automate this process, namely xslt if you are working with data in an xml format and need to transform it into a METS file format. This adds multiple adventure layers. You need to learn xslt and more than likely xpath. Of course, getting a handle on regular expressions would be nice. On top of that, you need to consider the accuracy and consistency of both the encoding of your xml files and that of the content. For example, if your data includes a url for each PDF file, is this url encoded using the same element in every single record, or do you sometimes use one element and sometimes another?
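The consistency question at the end of that paragraph can be checked before any transformation is written. The sketch below is plain Python rather than xslt, and the record structure and the element names identifier and relation are invented for illustration; your own export will look different.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical export: three records, one of which puts the url in a
# different element than the other two.
RECORDS_XML = """\
<records>
  <record><title>Report A</title><identifier>http://example.org/a.pdf</identifier></record>
  <record><title>Report B</title><relation>http://example.org/b.pdf</relation></record>
  <record><title>Report C</title><identifier>http://example.org/c.pdf</identifier></record>
</records>
"""


def url_elements(xml_text):
    """Count which element names hold a value that looks like a url."""
    counts = Counter()
    for record in ET.fromstring(xml_text).iter("record"):
        for child in record:
            value = (child.text or "").strip()
            if value.startswith(("http://", "https://")):
                counts[child.tag] += 1
    return counts


counts = url_elements(RECORDS_XML)
for tag, number in counts.most_common():
    print(tag, number)
```

If more than one element name shows up in the counts, the transformation has to handle each of them, or the data has to be cleaned up first.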

What exactly is the point here? XML is easy to become familiar with in the beginning. But it is deceptive to think of it as easy. There is a learning curve. This doesn’t mean that xml and its fellow family members are out of reach. It means that time needs to be taken to learn xml and its family members. The time spent on learning this will allow you to automate processes such as creating METS files using xslt.

So why bring this up? This statement reminded me of what someone told me some years back about catalogers and metadata librarians. In a nutshell, this person asserted that catalogers and metadata librarians are data entry secretaries working to put in and edit information in forms. Of course, this is a simplistic view. Obviously the person didn’t understand the work of catalogers or metadata librarians. But the statement that METS is easy can be seen as an expression of that view. Isn’t METS just creating tags in some xml file? Of course, METS is an xml file that consists of tags. But it is also so much more and involves more than just understanding xml in and of itself. This is even more true when you want to transform METS into another xml format or turn some xml file into a METS formatted document.

We shouldn’t take metadata standards for granted. They are complex. Even Dublin Core, which is considered to be one of the easier metadata formats, comes in at least 31 flavors just in how it is implemented differently by various people. Nor should we take writing xml files that conform to metadata standards for granted. This task is much more than just creating and editing information in a form or in a text editor. It involves a number of different tools and levels of understanding about these tools and how they interact. Metadata is not rocket science. It is, however, a science and an art that takes time, practice, and more practice to make it seem as easy as sliced apple pie.

The Dog Ate My Metadata

Have you heard the story of “Legacy Metadata” or “Legacy Data”? I heard it again the other day and it went like this….

Now we’re moving to a new and improved digital platform. After we test the system and make sure it works to spec, then we’ll begin migrating the digital objects and their associated metadata. Of course, we’ll have to clean up the metadata. It’s legacy data. It’s inconsistent, inaccurate and in some cases either missing or just plain incorrect. This is a big job but we know what we’re doing now. We can’t blame our predecessors because they just didn’t know what they were doing way back then way back in time with their digital collections. This whole business of digital was new back then and people were figuring out what to do with digital collections. Ahhh… once we get this legacy data cleaned up and migrated, all our metadata will look nice.

Let’s put sarcasm aside. When we do, the story that I heard and have heard before is that legacy data can cause trouble because it is inconsistent and inaccurate. The reasons are many, but one that comes up often is that people were learning how to create digital collections. Hence, mistakes were made and inconsistent and inaccurate metadata followed. Certainly metadata that was created some time ago in an old platform will have issues. But is it right to wholesale blame our predecessors for bad metadata and say they didn’t know what they were doing? The story is much more complex. Could it be that their approach does not reflect the perspective that many of us share today on how to organize information? What’s really going on here, and have we learned our lesson in regard to legacy metadata?

As to what is really going on, there are certainly a number of hypotheses. I would like to look at just a few possible ones: (1) “time and effort” being dedicated to metadata, either legacy or current; (2) legacy data is inconsistent and inaccurate; (3) our predecessors were learning as they went and we know what we’re doing now.

(1) Time and effort: for those of us who work with metadata, we understand that not everything is automatic. This is detail work that takes time and effort. Where I have worked, bibliographic maintenance has become a thing of the past for the most part. The result is an inconsistent and inaccurate catalog because no one really has the time to fix mistakes. When I talk to digital pioneers, especially those who work in Archives or Special Collections, I hear that the goal was to digitize. Digitize it and people will come. Where I used to work, the priority was on digitization, with metadata last. Where I am now, the priority is still on digitizing but also on full text searching. Full text searching can be helpful unless your digital object doesn’t contain text or the text needs more than keywords to be browsed or searched. In both instances, the push is to provide content to users first and information about that content second.

Contrary to this idea is the push to provide documentation on your data. In the new wave of metadata consultations for eScience and Data Management Plans, we ask researchers to take the time and make the effort to provide a minimum of good metadata. Good metadata is information that uniquely describes a researcher’s data. Now if this consists of 4 metadata tags so be it. Perhaps it is more as is the case with most FGDC marked up information. Here, the push is to provide content and information about that content.

(2) This leads me to the idea that legacy metadata is inaccurate and inconsistent. It is wrong to think that for some reason during the early digital wave librarians forgot how to organize information. Again, the focus was not so much on how to organize the information that uniquely described these digital resources as on the organization of those digital objects themselves. Should we put them by collection and then series? Should we display the most frequently downloaded? What about the A-Z list for people to browse? I don’t mean to imply that no time or effort was given to metadata. However, I don’t think this was a priority because it was simply more important to just get the object out there on Flickr or some other platform. This certainly wasn’t a bad approach. It shed light on many collections that had until then remained unknown. As we move to a more linked data verse, it is, however, becoming apparent that the linking happens with data. If the data (or metadata) isn’t there or is inaccurate, then linkages don’t happen. By linkages, I’m thinking of time lines, mapping, visualizations, etc. It’s re-using and re-imagining the data. This is one of the reasons why pushing content and pushing information about that content are two steps that need to be done very closely together or, even better, at the same time. That can mean supplying the information about the content automatically and having someone provide the rest if the automated metadata isn’t enough to uniquely describe that digital content. This leads me to my third point, the claim that people back then didn’t know what they were doing.

(3) Technology is moving fast. Versioning is not just a problem with your laptop. It is also an issue with metadata standards, digital platforms and anything really that relies on a computer. Just like our predecessors, we are learning better ways to get our digital resources out to the public so that they can discover, use, share, and re-use the data. The learning curve didn’t stop in the 90s or 2013. If anything we’ve learned that we always need to be developing our skills and learning new ones.

But another key lesson out there is the importance of metadata. Metadata has been a trendy word for some time. Many like to think of this as automation. Finally we no longer have to be tied to a cataloger creating records for us. Perhaps even some money can be saved when there is no longer a need to employ an expensive professional. We can have a computer do it all. That would be nice but it is not a reality. We definitely need to automate as much as possible because the sheer amount of data that we work with and that we will be working with will be overwhelming. But not everything can be automated. Most of all this concerns the organization of the information that uniquely describes the digital resources. Thinking through how to organize this information in a consistent and accurate manner takes time and effort. It requires skills and learning new skills along the way. If we don’t allow adequate time and effort to think through how to organize metadata, then indeed we will create inconsistent and inaccurate metadata just as some of our predecessors did. So let’s not use the excuse of the dog ate my (homework) metadata, and instead dedicate the time, effort and support needed to create a legacy.

A Translation of the Big Heads at ALA MidWinter

I enjoy attending brown bags to hear my colleagues summarize their trips to various conferences. The other week, I attended a colleague’s brown bag about the Big Heads at ALA Midwinter 2013.

This year, the discussion was based on the following that I was able to find on ALA Connect:

What are the opportunities for and challenges facing technical services staff in large research libraries in 2013?  How do we transform existing positions as incumbents retire or move on?  What kind of background, traits and skills are we looking for in department heads and other supervisors within technical services?  How can we nurture these managers to lead from the middle and to best serve the needs of our organizations and the wider communities of which we are a part?  Finally, how do we maintain morale during these times of financial stress and breakneck transition?

The ALA Connect page has the schedule as a .docx file.

This discussion of new skills, transformation of technical services, etc. is an old one now. What was interesting was how my colleague transformed this dialog into the following: catalogers are for the most part unable to make the switch to metadata librarians. Cataloging and metadata librarianship are truly different and bridging the gap is perhaps more difficult than one would think.

I didn’t attend ALA Midwinter and therefore am unable to verify if this was what really was communicated. However, I’ve heard this argument before, namely that catalogers (especially those in the business for some time) are unable to adapt to the new world of metadata. Part of the discussion and what my colleague related was based on skills and flexibility. Let’s take a look at those two ideas and I would add perception as well.

Skills: I work both sides of the fence, so to speak. I might start out my day in MARC land and end up working in METS. The tools used are different. For my MARC work, I rely heavily on Connexion, MarcEdit, and my ILS. For my metadata work, I rely on Oxygen, Google Refine, CONTENTdm, Digital Commons, and right now Fedora and Islandora. That the tools are different doesn’t mean that the work is strikingly different. For those not used to working with other tools, it is good to have training sessions. What I have found is that many have given up on the “old” catalogers by not providing training or support. Instead they downsize the department so extensively that the “old” cataloger doesn’t have time to learn new skills because they are now trying to do the job of at least 3-5 people. It is not an inability to learn but a lack of time and support for the most part. Also, these are skills. Catalogers have been learning new skills for a very long time. Learning new tools doesn’t mean that accurately and consistently describing a resource is totally different in MARC and metadata land.

Flexibility: My colleague brought up this notion that old catalogers are simply not flexible. Certainly there are librarians who aren’t flexible. That is independent of whether they are catalogers or not. I ask again: how is it possible to be flexible when you are already bending over backwards to get work done with a small staff? Of course, not all cataloging/metadata departments are small. However, the tendency across the board has been to reduce cataloging/metadata staff. About a year ago, I heard Christopher Cronin talk about how his staff had been reduced by almost 60% over the last decade. Downsizing has affected even the big heads of TS. Downsizing has affected the amount of time staff can dedicate to learning new skills, volunteering for new tasks, or taking on new work. Sometimes, this is made harder by an organization’s structure, workflows, competing departments or even work philosophy. For example, one common response is to create a digital initiatives team that often competes with cataloging and metadata. There is much more in play than simply flexibility, skills, downsizing, training or internal politics. This leads me to perception.

Perception: It is not so much an issue of skills or flexibility. I have found that the majority of catalogers (old and new alike) have some fantastic skills and are very flexible. What we do have in common is a bad rap. There is a long-standing perception that catalogers are inflexible, unable to learn new skills, and do not understand the new world of metadata. Because of this supposed general inability, catalogers need to step aside for a brand new generation. A generation of metadata librarians not hampered by old cataloging rules or trained exclusively in cataloging. A generation who have new skills, such as programming, for this brave new world of metadata.

This is where a large part of the problem resides, with this negative perception of catalogers. This negative perception promotes the idea that catalogers cannot work with metadata because they don’t understand metadata, do not have the tools and skills necessary to work with metadata and are in general inflexible. This perception masks the fact that metadata and cataloging share a common goal, organizing information accurately and consistently. This is one of the reasons why many in the cataloging world have said that they have been working with metadata for years, and that is true. In any profession, tools and skills change. Some are able to adapt and some are not. This is determined not by your profession but on an individual basis. I think there are examples of inflexibility in any profession. Perhaps it is time to stop judging the profession of cataloging as a whole and see what support individuals need to transition into coding in xml or working in Fedora.

Google Refine

In my last post, I talked about using mail merge and Excel to do data cleanup. Mail merge and Excel are good tools for data cleanup. But like many tools they have their limitations. Another tool that is more powerful is called Google Refine. Google Refine is an open source tool that allows you to clean up messy data.

Essentially Google Refine looks like a spreadsheet. For basic operations, it almost acts like a spreadsheet. However, Google Refine goes beyond these to incorporate the use of regular expressions to help clean up thousands of records at a time. A nice analogy is that Google Refine is to metadata in xml or csv as MarcEdit is to MARC data.

It is necessary to download Google Refine and install it on your computer. This is a relatively simple process. The website has several videos that introduce you to Refine, as well as user guides. You’ll see that a diamond icon appears among the unzipped files. Click on this and Google Refine will open a command window and then the application will appear in your default browser.

You can create a new project or open one you’ve already been working on. You can upload xml, csv or xlsx files. I’ve opened xml and xlsx files without any problem. If you upload an xml file, Google Refine will ask you to delimit a metadata record and then place that xml into a “spreadsheet” presentation for you to work on.

What I’ve found the most powerful in Google Refine are the expressions written in GREL, the Google Refine Expression Language, which supports regular expressions. With GREL, you can replace, add or remove text. There are math or date functions and much more. Depending on the extensions you download, Google Refine will also support Jython or other languages. Google Refine also has some built-in transformations that help you reconcile data, remove white space, or change case from upper to lower, etc.
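For readers who have not yet seen this kind of cleanup, here is a rough illustration of the same ideas (trimming, collapsing white space, rewriting a pattern with a regular expression). It is written in plain Python, not GREL, and the function names clean_value and normalize_date, along with the date format handled, are assumptions made up for this sketch.

```python
import re


def clean_value(value):
    """Trim the ends, collapse runs of white space, drop a trailing period."""
    value = value.strip()
    value = re.sub(r"\s+", " ", value)    # many spaces or tabs become one space
    value = re.sub(r"\s*\.$", "", value)  # remove a final period and any space before it
    return value


def normalize_date(value):
    """Rewrite MM/DD/YYYY as YYYY-MM-DD; leave anything else untouched."""
    value = value.strip()
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
    if not match:
        return value
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


titles = [clean_value(t) for t in ["  Letters   from  Paris . ", "Diary"]]
dates = [normalize_date(d) for d in ["3/7/1921", "circa 1920"]]
print(titles)
print(dates)
```

In Refine the equivalent transformation is applied to a whole column at once, which is where the time savings over thousands of records come from.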

This is a very powerful tool to help you clean up thousands of records at a time. At the moment, the only limitations that I have run across are as follows:

Google Refine has trouble viewing and manipulating more than 2400 records. I found it works best with around 2100.

It was difficult to find sufficient documentation on GREL. I continue to experiment with GREL expressions and see how they behave. However, there’s no “GREL for Dummies” out there.

In general, I found it better to jump in and start using Google Refine. There is a learning curve, so if you are just beginning you’ll need to be comfortable jumping in. Though the introductory videos are helpful, they are really too vague in my opinion.

Have fun and definitely give this a try….

Underthinking for a change

There are times when you have a project and it seems like the fancy tools are just out of reach. If only your department had a software engineer or if you had time to learn Python and Perl! With the movement of let’s do more with less, many of us in metadata and cataloging find ourselves strapped for time. Though we could certainly take a moment to learn how to program, sometimes we need to get the project completed in very little time. This happened to me just the other week. I was working with very messy data from a digital collection that was started some 8 years ago. To date, a dozen or so people have worked to create or edit the data. Each person had their own perspective on how that data should be entered. And yes, it was fairly easy to see when a cataloger entered the data versus a volunteer. Add to this a funky export function offered by the digital library software application and Voilà – very inconsistent, inaccurate and messy data. My job was to quickly (as in you only have 3-4 days) see how to clean this up and create METS documents based on our METS Profile, ready to ingest into a digital repository. The good news was that the data didn’t have to be entirely clean since the end goal was to test if we could get METS files into and out of the digital repository. But the METS files had to be valid and conform to our METS profile. Any HTML had to be removed, as well as other characters that have special meaning, like brackets or the question marks that are used to make queries.

As I digested the news, the person making the request said: “Let’s not overthink this. Let’s use mail merge.” I had never heard of using Word’s (or other word processing software’s) mail merge function to create METS files. But we needed this information quickly for all 4000+ records. The data were in a csv file. I brought this into Excel and did the minimal cleanup. Then I used a Word mail merge template that I created with the wizard, linked to this Excel file, to create a giant merged document of the first 2000+ METS records. Once you have this document, you can use an open source application called MS Word Split (Break, Create) Mail Merge from CNET to create separate METS files for each record. It’s not perfect but you can do this in a minimal amount of time. Thankfully in my case, our programmer wrote a Perl script to separate each of the METS files.
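For anyone who does have a little scripting time, the same mail merge idea (one template, one row of data per record, one output per record) can be sketched in a few lines of Python. This is not what we did, and the template below is a bare skeleton that makes no claim to conform to our METS Profile or to any other; the column names id and title, the sample rows, and the function names are all invented for illustration.

```python
import csv
import io
import re
from string import Template
from xml.sax.saxutils import escape

# A deliberately tiny METS-like skeleton standing in for the merge template.
METS_TEMPLATE = Template("""\
<mets xmlns="http://www.loc.gov/METS/" OBJID="$objid" LABEL="$label">
  <structMap>
    <div LABEL="$label"/>
  </structMap>
</mets>
""")

# Hypothetical export with stray HTML and an ampersand in the titles.
CSV_TEXT = """\
id,title
rec001,<b>Annual report</b> 1998
rec002,Minutes & notes <i>draft</i>
"""


def strip_html(text):
    """Remove anything that looks like an HTML tag (a crude regex, not a parser)."""
    return re.sub(r"<[^>]+>", "", text).strip()


def xml_attr(text):
    """Escape &, <, > and double quotes so the value is safe inside an attribute."""
    return escape(text, {'"': "&quot;"})


def merge_records(csv_text):
    """Return {record id: METS-like xml string}, one entry per csv row."""
    documents = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record_id = row["id"].strip()
        documents[record_id] = METS_TEMPLATE.substitute(
            objid=xml_attr(record_id),
            label=xml_attr(strip_html(row["title"])),
        )
    return documents


documents = merge_records(CSV_TEXT)
print(documents["rec002"])
```

Because each record ends up as its own string, writing one file per record is a short loop, which removes the separate splitting step. The output would still need to be validated against the schema and the profile.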

I was very happy to get this advice. I was also happy to “under”think about this project. I would never have thought to use Excel to clean up data and then Word’s mail merge to create a new METS document (saved as .txt). If you need to do data cleanup that is not particularly challenging, this is an excellent approach.

In my next post, I want to talk about Google Refine, which is a much better tool than Excel for data cleanup.

Metadata Consultation Services

I just finished reading Laura Smart’s new post, “What the heck is a metadata service?” This is something that I’ve also had to come to terms with. I was hired to be primarily the audiovisual cataloger with experience in electronic resources and metadata. Like many, my job has evolved and includes more metadata now than ever. The key was, and still is, to highlight the roles that metadata librarians can perform for other staff and members of my academic community. Like Laura suggests, it wasn’t the case that metadata (or cataloging) wasn’t a public service. The services provided by my library unit are mostly done in the background and very well. However, it was necessary to make these services “more visible” to use Laura’s expression. It was also necessary to add some new services.

How does one go about making metadata services more visible? Certainly MIT, Cornell, and Stanford have done an excellent job. I have reviewed their web sites on metadata several times. Here, I decided to create a LibGuide, from Springshare. LibGuides aren’t fancy but they are an easy way to get content out there on the web. Another way of becoming more visible is to promote your services with people and at meetings. You can also volunteer for projects that involve metadata. Of course, the downside to many of these solutions is that some might accuse you of being everywhere and in everyone’s business. Another downside is that if you volunteer, it normally involves work on your part … on top of what you’re already doing. Another negative is that sometimes this simply isn’t enough. I have found it difficult to persuade people to think of metadata librarians and catalogers as more than just the police of the standards world.

Another important part of making services more visible is making sure that these services fit the needs of users. It is better for you as a metadata librarian to be able to connect to these users, understand their needs and also be able to talk to them about your services… in a way that they understand. It is this last point that is crucial. Over the year and a half since I started to promote metadata services, I have found that talking about metadata and metadata services with users and even other staff members is sometimes very challenging. I’ve learned that not many staff and even fewer people in the community understand what metadata means. Because of this lack of understanding, they don’t recognize metadata when they see it and certainly don’t see the importance of metadata. I’ve also realized how many specific word-isms metadata librarians and catalogers use on a daily basis. Making our services more visible also consists in stepping out of this world and not using the word-isms. The narrative surrounding metadata will change depending on your users. Interestingly, I have found that in the science crowd, the notion of “documenting your data sets and/or research” is a way of explaining metadata. This might not work for the humanities crowd or the general public. Actually for the general public, I provide the following example. I first ask if they have ever been in a library. If they have, I ask if they’ve used a catalog. If so, I say that the information they see is metadata. If not, then I ask if they’ve used Amazon or something like it to buy stuff on the web. I then say that the information they see about the product is metadata. This explanation works fairly well, even though it doesn’t fully explain metadata or metadata services.

No matter the steps taken, this process is ongoing. Just like Laura mentions in her post, metadata services and technical services will evolve. There are certainly those who will resist. But the changes are already here. I see these changes as an opportunity to promote the hard work of metadata librarians and the services that are important for users and fellow staff.
