Ever since I found out about Laura Smart’s blog, Managing Metadata, I’ve seen some really exciting and useful posts. Her latest one, on adding Caltech faculty to the national name authority file and, by extension, to VIAF, is a great example. Laura first got hold of an HTML page listing all current Caltech faculty. Thanks to a trick in Internet Explorer, namely converting an HTML table into an Excel spreadsheet, she had a workable spreadsheet that she could convert into MARC authority records using MarcEdit. I didn’t know about this little feature of IE. There are some resources on the web that explain how to do this:
I wasn’t able to find dates on either of these webpages, but the information seemed relatively reliable. See also Laura’s own description. As for turning the spreadsheet into MARC records, YouTube has some very helpful videos on using MarcEdit to convert a delimited text file into MARC records. Laura goes on to explain some of the difficulties. One of them is that the names are in first-name, last-name order, exactly what you don’t want for MARC authority headings, which enter personal names surname first (e.g., “Doe, Jane”). So some data cleanup is still involved. Even so, this is a handy process to know if you need it.
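If Internet Explorer isn’t at hand, the HTML-table step can also be scripted. The sketch below is one possible approach, not the method Laura or the pages above use. It assumes the faculty list is an ordinary `<table>` with explicitly closed `<tr>` and `<td>`/`<th>` tags, and `html_table_to_tsv` is a hypothetical helper name. It relies only on Python’s standard library (`html.parser` and `csv`) and writes tab-delimited text, the kind of file MarcEdit’s delimited text translator can take as input.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of spaces and newlines inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still collected.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_tsv(html):
    """Return the table rows found in `html` as tab-delimited text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample; a real directory page would be read from a saved file.
    sample = (
        "<table><tr><th>Name</th><th>Title</th></tr>"
        "<tr><td>Jane Q. Doe</td><td>Professor of Chemistry</td></tr></table>"
    )
    print(html_table_to_tsv(sample), end="")
```

The output still has names in display order, so the cleanup Laura describes remains; the script only replaces the browser conversion step.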
In my case, because my institution is not a NACO member, we created a local name authority file. Even though it is local, it’s good to keep in mind how the data is stored, on what platform, how it can be exported, and how it can be used to link various silos. On that last point, the local authority file I was working with covered the authors of articles submitted to our institutional repository (IR). Many faculty have articles both in our IR and in our catalog. We haven’t reached this point yet, but it would be great to use this local name authority file to link our catalog to our digital collections.
One last point: I really liked this post because it shows that metadata work is not only about describing digital objects but also about authority control. If we are to help our users find information in our digital repositories, we also need to give them consistent data for names, subjects, dates, and whatever other elements our repositories use.
Copy of Laura’s post:
I’ve mentioned that we want to get authority records for all current Caltech faculty into the National Authority File and by extension into the VIAF. The 1st step is to ensure that we have a current and comprehensive list of all faculty working here. I’m happy to learn that I can easily obtain the information in a manipulate-able form. I was expecting that I would need to ask somebody in academic records and plead our case. Lists can be tightly guarded by powers-that-be. I just figured out that you can convert HTML tables to Excel via Internet Explorer. That’s probably old news to most of you. I’ve done .xls to HTML conversion; I’ve just never had the need to go in the opposite direction. Plus I don’t use Internet Explorer.
I was able to create a spreadsheet of the necessary data by doing a directory search limited to faculty and running the conversion. Sweet! Now we can divvy up the work and get cracking. Getting the info is a small thing. But it’s these little victories which make my days brighter. I played around with the delimited text to MARC translator in MarcEdit to auto-generate records from the spreadsheet. It worked like a charm. Unfortunately the name info in the spreadsheet is collated within a single cell. Also it’s in first name surname order without any normalization of middle initials, middle names, or nicknames in parens. A text-to-MARC transform can only work with the data it is given. A bunch of records with 100 fields in the wrong order isn’t so helpful. I messed about with the text-to-columns tool in Excel in order to parse the name data more finely, to no avail. It worked, but it would require much post-split intervention to ensure the data is correct. Might as well do that work within Connexion.
In fact, I’m ok with creating the authority records from scratch since we’re training to be NACO contributors. People need the practice. In my experience, it’s easier to do original cataloging vs. using derived records. Editing requires a finer eye and original work can be helped along with constant data and/or macros. Regardless, it was fun to play with the transform and teach myself something new. And it’s very exciting to take a step towards meeting our goal of authority/identity information/identifiers for our constituents.
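Laura found that Excel’s text-to-columns tool split the names but left a lot of manual checking. For anyone who wants to try that parsing step in code before loading a file into MarcEdit, here is a minimal Python sketch. It is not Laura’s workflow. It assumes each cell holds a display name like “Jane Q. (Janie) Doe”, and `invert_name` plus the suffix and particle lists are hypothetical and deliberately incomplete. It only proposes a “Surname, Forenames” string and flags names a cataloger should look at, which fits her point that the careful work still belongs in Connexion.

```python
import re

# Illustrative, deliberately incomplete lists (assumptions, not a standard).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
PARTICLES = {"da", "de", "del", "der", "di", "du", "la", "le", "van", "von"}


def invert_name(display_name):
    """Convert 'Forename Middle (Nickname) Surname' to 'Surname, Forename Middle'.

    Returns (heading, needs_review). needs_review is True when the
    'last word is the surname' rule may be wrong and a cataloger should check.
    """
    # Remove nicknames in parentheses, e.g. "(Bob)".
    name = re.sub(r"\([^)]*\)", " ", display_name)
    tokens = name.replace(",", " ").split()
    needs_review = False

    # Pull off a trailing generational suffix such as "Jr." or "III".
    suffix = None
    if tokens and tokens[-1].rstrip(".").lower() in SUFFIXES:
        suffix = tokens.pop()
        needs_review = True

    # A single word (or nothing) cannot be inverted; always flag it.
    if len(tokens) < 2:
        return " ".join(tokens), True

    # Add a period to bare single-letter initials: "Q" -> "Q.".
    forenames = [t + "." if len(t) == 1 and t.isalpha() else t for t in tokens[:-1]]
    surname = tokens[-1]

    # A particle just before the surname ("van", "de") or more than two
    # forename tokens suggests a compound surname, so flag it.
    if forenames[-1].lower() in PARTICLES or len(forenames) > 2:
        needs_review = True

    heading = f"{surname}, {' '.join(forenames)}"
    if suffix:
        heading += f", {suffix}"
    return heading, needs_review


if __name__ == "__main__":
    for raw in ["Jane Q Doe", "Robert (Bob) Smith", "John Doe, Jr.", "Ludwig van Beethoven"]:
        heading, review = invert_name(raw)
        print(f"{heading}\t{'CHECK' if review else 'ok'}")
```

Flagging instead of guessing is the design choice here: the script handles the easy majority and routes suffixes, particles, and long names to a person, since a wrong heading in an authority record is worse than an unfinished one.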