Learning About Metadata

In my last post, I talked about the rise of metadata experts and how it is more important to keep growing and learning about metadata, as it is an ever-evolving discipline. Serving on my institution’s eScience committee has been instrumental in shaping this belief. ARL defines eScience as follows: “E-Science is computationally intensive science carried out in highly distributed network environments, such as science that uses immense data sets requiring grid computing or High Performance Computing to process. The term sometimes includes technologies that enable distributed collaboration, such as the Access Grid, and is sometimes used as an alternative term for Cyberinfrastructure (e.g. e-Science is the preferred term in the UK). Examples of e-Science research include data mining, and statistical exploration of genome and other –omic structures.” I like to describe eScience as the disciplines that require intense computation and produce a heck of a lot of data. Some of the concerns in eScience are storage, security, file management, and meeting funding agencies’ requirements, such as the two-page data management plan. My institution, like others, is responding to the needs of researchers in eScience. We wrote a strategic agenda thanks to the ARL E-Science Institute. We are working on an environmental scan, surveys, outreach, collaboration with our Office of Sponsored Research and Univ. IT, and, among other initiatives, workshops for graduate students. All of these initiatives have given better visibility to metadata and to how metadata librarians can contribute positively in eScience. At the same time, they have shown me how much jargon we use in metadata departments that others don’t have a clue about. This became crystal clear with our workshops, which are run by staff from the library, OPS, and Univ. IT. Thus far, I’ve given only two workshops. For the first one, I took a more detailed approach.
The main criticism from attendees was that it was too detailed and that metadata was not really their main concern; storage for large data sets was. In light of this, I changed my approach for the second workshop and presented metadata generally, from a bird’s-eye view. Attendees’ reactions were mixed. A minority liked this approach, while the majority still had no idea what metadata is. This was extremely frustrating. So I went back to the MS Office PowerPoint board to try to find a happy medium. While doing this, I realized, thanks to feedback from attendees and colleagues and, well, let’s face it, some good times with Buffy season 9, that there were several issues at play.

  1. Main concern of researchers: This might not be the case at every institution. Here, researchers in eScience are primarily concerned about storage, file management software or approaches, backup solutions, security, and the data management plans required by granting agencies. Only on rare occasions does a researcher want to know more about metadata. Interestingly, at the last workshop, a graduate student wanted to know more about the FGDC CSDGM metadata standard, which is not surprising given the geospatial community’s longstanding commitment to metadata. What I’ve seen over the year and a half that I’ve been working on the eScience committee is that researchers don’t list metadata as a main concern. This could be read as a lack of concern for metadata in general. As many catalogers and metadata librarians know, it is a constant struggle to justify the importance of our operations even to our colleagues, much less to the university community at large. Many catalog/metadata units have been decimated, with much of the work being outsourced. Perhaps you remember the UMass Boston example, where the catalog department was closed, or the one at the Stamford Public Library, which was completely outsourced. These are certainly extreme examples. And they didn’t work. UMass Boston started re-hiring catalog and/or metadata librarians. I don’t think it is a question of not caring (though, again, in extreme examples this might be the case). One of the issues is that metadata is a very hard term to define. We use a lot of jargon, which I’ll hit on next. But a real complexity of metadata is that it can be seen, so to speak, everywhere and nowhere. Consider the main concerns of researchers I listed above: in order to find, discover, and access their data, in particular with a file management solution, you need metadata. In fact, researchers use metadata every day just by saving a file.
Their word processing software will save the file name, or perhaps generate one automatically, and add the date when the file was last modified. Some systems will provide a history of changes with the dates of those modifications. All of this information can be broadly defined as administrative metadata. The issue, then, is not a failure to care about metadata. Perhaps the issue is just not seeing something that permeates our digital world. Catalog/metadata librarians typically use a lot of jargon to describe this permeation.
  2. Jargon: Interoperability, interchangeability, shareability, discovery and access, standards, schemas, migration, data life cycle, etc. And then we have our acronyms. We really love those in library land. How many metadata acronyms do you know? Is DC your first, or DDI, or MODS, or VRA? I always like FGDC CSDGM. It’s like the German of metadata acronyms. It seems that even what might not seem like jargon actually is jargon for those who don’t work in this area. A good example is what I learned from the workshop about terms such as “function”, “type”, “accuracy”, “consistency”, and “normalization”. These seem fairly straightforward… well, for me. But I also work with them pretty much 5 days a week. It’s part of my job to know what functions data need to fulfill and, from there, to see what standards, policies, and implementations can respond to those functions, taking into account a number of other factors such as the platform on which these data need to be stored and managed. This is a particular approach, and a high-level one, that slowly works down to nitty-gritty questions such as which attributes to use in MODS to best fit these functions. Let’s face it, these terms are helpful for those working in cataloging and metadata. It’s something we enjoy working on, yet even our colleagues (for the most part) don’t understand this jargon or our geeky concern for these ideas. So, how is it possible to express metadata without falling back on jargon that you’re already comfortable with and without going into the tech services black hole of “we’re all going to be outsourced because they still don’t know what we do”?
  3. Metadata without metadata? How do you explain metadata without jargon and in the face of what seems to be a lack of interest in metadata? I don’t have the answer to that question. If I did, I’d probably be a consultant visiting many a catalog/metadata department. But I am willing to try a different approach in my workshop presentation. I still refer to metadata. I still refer to data about data. However, I put a question mark after “data about data”. Again, this is jargon and perhaps directed at us librarians. A colleague mentioned that “information about data” was easier for her to understand. This makes sense in the realm of eScience, where researchers are already working with data; they then use information to act as a surrogate for their data. Another good suggestion was that researchers really need to get their hands on something to understand it. In other words, they need a context they are familiar with if metadata is to hit home. What’s really important for researchers and even graduate students working in eScience? Tenure. The tenure track starts early. If people don’t keep a record of their accomplishments, then they risk missing out on tenure at a later date. How are these accomplishments recorded? Citations. Where do these citations come from? Published or unpublished works, presentations, committee work (if we enlarge the definition of citation), work on grants, etc. But where does the information come from to create these citations? Basically, metadata, or information about the various activities done in a researcher’s career. Yes, I realize this is a simplification of the entire process and even of metadata. However, it is an interesting approach to help illustrate why metadata is important. First, metadata is information about a research endeavor, whether that be a data set, dissertation, publication, or presentation. That information helps researchers get credit for their work by providing a marker.
Of course, some markers come with authenticity labels, such as works published in peer-reviewed journals or those that have been confirmed by a doctoral committee. The one advantage of providing informational markers for individual research works is that the research community has a way to verify the authenticity and accuracy of a researcher’s body of work.
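
For readers who like to get their hands on something, the point in item 1 about saving a file can be made concrete. What follows is only a minimal Python sketch, not a description of any particular file management system: the function name `describe_file` and the sample file name are made up for illustration. It simply asks the operating system for the information it already keeps about a file, the kind of thing described above as administrative metadata.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Collect information the file system already records about a file.

    describe_file is a hypothetical helper written for this illustration.
    """
    info = os.stat(path)  # ask the operating system what it knows about the file
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        "last_modified": modified.isoformat(),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "field_notes.txt")  # made-up file name
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("sample observations")
        for label, value in describe_file(path).items():
            print(f"{label}: {value}")
```

Nobody typed the size or the modification date by hand. The researcher only saved a file, and information about the data came along with it.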

It is difficult to leave the jargon aside. In many instances the jargon can provide a convenient shield against further staff and budget cuts. However, this same jargon runs the risk of alienating other colleagues and those in the community. Of course, what happens to metadata if it is explained indirectly within other contexts? Do catalog/metadata librarians run the risk of being seen as secondary since metadata does not take center stage? Perhaps this indirect approach isn’t indirect at all but a way to illustrate the pervasiveness of metadata and its importance in providing discovery and access, along with many of the cool features that users want, such as timelines, maps, faceted searching, or genius results. It’s worth a try at least. What do you think?