I have to give a shout-out to Laura Smart at Caltech libraries and her recent post on her blog Managing Metadata. She writes about her current experience of moving beyond AACR2/RDA/MARC, DC and EAD. What is strange is that her experience is very similar to what is happening where I work. My library decided to implement a Fedora digital repository, which will eventually replace our vendor-bought systems such as the institutional repository (IR) and our digital image collection. This repository will go beyond being a mere IR. It is intended primarily for long-term preservation according to TRAC (Trustworthy Repositories Audit & Certification). It will house digital resources from various library units and from our university community (such as datasets, marketing images, etc.), and then go further through partnerships with other institutions in the area. In other words, it is an important project both in size and in its impact on our staff. For metadata work in the library, it is one of the first times that we are involved in developing the architecture of a repository system, creating guidelines, building crosswalks, and moving beyond AACR2/RDA/MARC, DC and pre-packaged metadata templates created by a vendor. As Laura found, this is extremely exciting work, full of brain teasers and opportunities to learn. The steep learning curve, both with new standards and with Fedora and its content model architecture (cma), has been a boon for thinking about (meta)data differently: how metadata is used, and by whom, at the various levels of a digital repository such as the system or presentation layers.
The steep learning curve came first with Fedora, which is not a relational database. Just as Laura explains, it’s necessary to understand the “object” architecture of Fedora. The power of Fedora is that it works well as a gatekeeper for any digital resource in any format. For that reason, much groundwork needs to be done when implementing a Fedora digital repository. The first step is to understand a little of how Fedora operates, and then to develop a content model architecture (cma) while staying informed about the variety of cmas already implemented by others.
The other learning curve relates to the complexity of metadata in relation to the cma. In our first phase, we decided to implement Fedora with an Islandora presentation and administrative layer and Solr indexing. In this first phase, the aim was to ensure that materials could be entered into the system, indexed, and retrieved. At the same time, we discussed in particular which cma would best suit our needs. It turns out that Islandora is more of a lumper than a splitter in terms of its cma. This means that media content is not an object but a datastream within an object. For our purposes, we are going with a more atomistic model where media content is an object. How does this affect the metadata?
It affects it in several ways…
-We are implementing various metadata standards for descriptive, preservation, technical, rights, and administrative (structural) metadata. These standards are informed by work done by others, such as the DLF/Aquifer guidelines, the OAI-PMH metadata conformance guidelines, various ISO standards (especially for date/time), FGDC metadata, MIME media type standards, etc. It is an impressive list to consult, touching on data structure, data content/value, and data format standards and guidelines. Having experience with some but unfortunately not all of them means learning a lot in a short period of time. I would love to have the equivalent of a list such as AUTOCAT for these types of metadata projects.
-We accept any descriptive metadata standard and normalize the metadata to MODS. This means that several transformations take place at the time of ingest. Thankfully, most transformations can be found through a simple Google search or by going to my favorite website for metadata, the Library of Congress Standards website. The only snag that I have found is with geographic metadata. There is a transformation from FGDC to DC and then from DC to MODS that can be found on LC’s website or in MarcEdit. However, in my tests, a lot of data was lost. Also, our geographic metadata was not in FGDC, per se. It turns out that the system used for cataloging geographic information and managing files is ArcGIS, which has its own proprietary metadata format. The ArcGIS metadata format is a combination of three standards plus ArcGIS’s own metadata elements. Thankfully, ArcGIS can export data as FGDC or ISO 19115. It turns out that our geo folks prefer and use the export to ISO 19115 and not to FGDC. That meant getting down and dirty with what is for me an entirely new data format and value/content standard in order to create a crosswalk from scratch.
-Another consequence is the question of how to update information in the various datastreams in various objects. We are developing a profile for a METS document that we call the superset. The superset will contain descriptive, technical, rights, and preservation metadata that will create and/or update the various metadata datastreams in all of our Fedora objects according to our atomistic content model. Unlike Islandora, we have three kinds of object: a grouping object for “collection” level resources, a primary container object for the primary content level, and a media object that contains the media content. Thanks to the structure of METS, we will easily know not only what type of metadata we are dealing with and which datastream needs to be created with that metadata, but also in which object this metadata is supposed to live (the grouping, primary container or media object). This means thinking about METS more as a systems tool for transmitting new data to datastreams at various cma levels, and less as a structural tool for organizing the logical or physical sections of a digital object.
-To help us think about these different objects, we have begun to think of our grouping, primary container and media objects more in terms of FRBR. The grouping object represents the grouping of related container and media objects and acts like the intellectual idea, or work. The primary container object acts like the expression and manifestation, while the media object can represent an item. The atomistic content model allows for a more FRBR-like approach than Islandora, which groups media content in with the container and so yields a single object that represents expression, manifestation and item all at once. Though the analogy has some limitations, it is helpful for seeing how FRBR user tasks can apply to (meta)data in our Fedora cma.
-Looking into user tasks led us to consider the function of metadata as data put to use by users, where users include the Fedora system, staff, and a number of other potential users of our new digital repository. It is necessary to look into the different standards for each type of metadata. However, how will these (meta)data be used, and by whom? Thinking about the function of metadata, especially in terms of FRBR’s user tasks (find, identify, select, obtain) in relation to our cma and the datastreams in those Fedora objects, has prompted a discussion of just what the functions of these data are in our repository. Instead of metadata, we are starting to refer simply to data, in the plural. This focus on data allowed us to stress that data, including metadata, can be structured in different ways for different purposes. In this sense, we are beginning to see potential uses of these data, especially by the system, to generate different presentations such as timelines, calendars, interactive widgets, or the like. The data will not just be text strings but will come in a variety of structures that the system can be programmed to manipulate and transform into unique presentations for users.
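To make the superset idea from the list above a little more concrete, here is a minimal sketch in Python of how sections of a METS-like document might be routed to datastreams by metadata type and by object level. This is only an illustration under stated assumptions, not our actual implementation and not Fedora's API: the class names, the function apply_superset, and the datastream IDs other than MODS are all hypothetical, and it assumes just one object per level to keep things short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Level(Enum):
    """The three object levels of the atomistic content model."""
    GROUPING = "grouping"    # "collection" level; roughly the FRBR work
    CONTAINER = "container"  # primary content; roughly expression/manifestation
    MEDIA = "media"          # media content; roughly the FRBR item


@dataclass
class RepoObject:
    """Hypothetical stand-in for a Fedora object and its datastreams."""
    pid: str
    level: Level
    datastreams: Dict[str, str] = field(default_factory=dict)


@dataclass
class SupersetSection:
    """One metadata section of the superset (hypothetical shape)."""
    kind: str     # "descriptive", "technical", "rights" or "preservation"
    level: Level  # which object level the metadata belongs to
    payload: str  # the serialized metadata, e.g. an XML string


# Assumed datastream IDs. Only MODS comes from the post; the rest are made up.
DATASTREAM_FOR_KIND = {
    "descriptive": "MODS",
    "technical": "TECH",
    "rights": "RIGHTS",
    "preservation": "PRESERVATION",
}


def apply_superset(sections: List[SupersetSection],
                   objects: List[RepoObject]) -> List[RepoObject]:
    """Create or update one datastream per section on the object at that level.

    Simplification: assumes exactly one object per level.
    """
    by_level = {obj.level: obj for obj in objects}
    for section in sections:
        target = by_level[section.level]
        dsid = DATASTREAM_FOR_KIND[section.kind]
        target.datastreams[dsid] = section.payload  # create or overwrite
    return objects


if __name__ == "__main__":
    objs = [
        RepoObject("demo:grouping", Level.GROUPING),
        RepoObject("demo:container", Level.CONTAINER),
        RepoObject("demo:media", Level.MEDIA),
    ]
    superset = [
        SupersetSection("descriptive", Level.GROUPING, "<mods/>"),
        SupersetSection("technical", Level.MEDIA, "<tech/>"),
    ]
    for obj in apply_superset(superset, objs):
        print(obj.pid, sorted(obj.datastreams))
```

The point of the sketch is the two lookups: the type of metadata picks the datastream, and the level picks the object. A real profile would need to address individual objects (there can be many media objects per container) rather than one object per level.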
There are other consequences as well. But this new work is definitely stimulating. It has prompted us to rethink how we look at data and metadata, users, architecture and even the difference between data in theory and data in practice. Laura is spot on: this is fun.