In preparation for ALA, I’ve been thinking about creating ideas. What makes an idea creative? When is it a good time to say goodbye to a creative idea? How do you evaluate creative ideas? These are great questions to ask, and keep asking, as you take on new projects or look to take an old project in a novel direction. I have to admit that I like coming up with ideas. Granted, not all of them are creative, but when I have time I like to try them out. Typically I like to think of ideas that make doing work easier.
Recently, I had what I thought was a good idea that turned out to be unsuccessful. My idea was to create a MODS spreadsheet: an Excel spreadsheet with columns for MODS elements that our partners could use to enter descriptive information for the objects they needed to ingest into our digital repository. The primary reason I thought of a spreadsheet was that the majority of our partners don’t know XML, so working directly in MODS XML was not possible without intensive training. Most partners don’t have the staff or time for this type of training. Their first priority is to get their content out to their users in a way that is efficient for them.
I wanted the spreadsheet to be flexible. Partners could add as many name subjects, name creators or contributors, genres, etc. as they wanted. The initial reaction to the spreadsheet was positive. All of the partners understood Excel and were used to working in this format to create descriptive metadata. However, a problem arose with how to process these spreadsheets to create individual MODS XML records. There are a number of scripts that can be written to convert a CSV file to XML. The challenge was that this file would never be “fixed”: it would never have the same number of columns, because the parameters of the spreadsheet included the ability to add columns for those who wanted multiple creators, contributors, subjects, etc.
Writing one script to handle every spreadsheet wasn’t really feasible, since the number of columns would always be unknown. Writing a schema to save the spreadsheet as an XML file would be just as difficult: it would have to be a custom schema every time, given that the columns would never be the same from one file to the next.
Now you’re probably asking yourself (and I also asked myself) what I was thinking. Looking back on it, my thought was that partners’ spreadsheets would be sent to me and I would process them using a little technique I’ve developed (a future post :)). However, I didn’t properly consider the scale and the impact of this type of workflow. I think that this could work for a limited number of partners, perhaps 2-5. Even with a limited number of partners, though, this solution is not feasible for the long term, since it creates a bottleneck in MODS XML record creation. In my case, the number of partners went far beyond 5 and is perhaps getting close to 50 or so institutions. I guess my enthusiasm for MODS records just made me forget about efficiency :). With any number of partners wanting to create MODS records to be ingested along with their digital content, it is essential to avoid bottlenecks. Having all spreadsheets go to one person was not only inefficient but also didn’t help any of our partners understand their metadata better. I have to admit that I still like the idea of MODS spreadsheet templates. But let me get to that later.
Looking back on this idea, I created the spreadsheet so that it would conform to our implementation of MODS but also to the script that I used to transform it. So subjects, or any other terms with multiple entries, were separated by a semicolon. Here are some reactions to the spreadsheet:
- Some partners took the spreadsheet to be the authoritative interpretation of our MODS requirements. As a result, they tried to do some Excel *magic* on the data exported from their content management system to make it conform to the spreadsheet I created.
- Other partners were using both the spreadsheet and an online submission form. The spreadsheet was used for batch uploads. The online submission form was used, once logged in to our digital repository, to upload one digital object at a time. Some of these partners entered multiple entries separated by a semicolon in the form. However, the online submission form was designed to accept only one entry per field, unlike the spreadsheet.
In both cases, I saw immediately that the spreadsheet was causing confusion. We made the decision to scrap the spreadsheet and put our focus into providing a how-to for the online submission form. Since then there have been no multiple entries separated by semicolons.
What’s the lesson? Looking back on this idea, I created the spreadsheet with the hope of helping our partners create descriptive metadata for their digital objects. However, the biggest flaw that I can see now is that I created the spreadsheet based on my workflow and my script. In a sense, it was a tool that made my work easier when I received these spreadsheets. The big lesson is to not assume workflows. Another way of putting this is to conceptualize creative ideas that can be integrated into various workflows, not just one. The other lesson is to realize what made this idea flop and build better ideas from there.
One reason why this flopped is that I was working on the assumption of partners adding or deleting columns. Actually, the spreadsheet can work if you fix (or lock down) the number of columns. For multiple entries, you can separate them with a semicolon. In your CSV-to-XML script, you would just need to test for null and multiple entries, which can be done. I haven’t given up on the spreadsheet, just rethought it. My next step is to look into a way that this spreadsheet can integrate into anyone’s workflow. An example could be a web page where people upload their CSV file and press “create MODS records”, and the MODS XML records are saved to their local drive. Of course the goal would be to do this while avoiding the confusion with the online submission form. This is a little more programming, but I think it is possible with something like Python or Perl! So more creative ideas on the way…
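To make the locked-down spreadsheet idea a little more concrete, here is a minimal Python sketch of what such a CSV-to-MODS script might look like. It is not the script I actually used. The column names (`title`, `creator`, `subject`, `genre`) and the very simplified mapping to MODS elements are assumptions made up for illustration; a real template would follow your own MODS implementation. The point is only to show the two checks mentioned above: skipping empty (null) cells and splitting multiple entries on semicolons.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical fixed column set; the header row must match it exactly.
COLUMNS = ["title", "creator", "subject", "genre"]


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def split_values(cell):
    """Return the trimmed, non-empty values of a semicolon-separated cell."""
    if cell is None:
        return []
    return [part.strip() for part in cell.split(";") if part.strip()]


def row_to_mods(row):
    """Build one simplified MODS record from one spreadsheet row."""
    mods = ET.Element(q("mods"))
    # The title is treated as a single value, since titles may contain semicolons.
    title = (row.get("title") or "").strip()
    if title:
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = title
    for creator in split_values(row.get("creator")):
        name = ET.SubElement(mods, q("name"))
        ET.SubElement(name, q("namePart")).text = creator
    for subject in split_values(row.get("subject")):
        subj = ET.SubElement(mods, q("subject"))
        ET.SubElement(subj, q("topic")).text = subject
    for genre in split_values(row.get("genre")):
        ET.SubElement(mods, q("genre")).text = genre
    return mods


def csv_to_mods(text):
    """Convert CSV text with the fixed columns into a list of MODS elements."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"expected columns {COLUMNS}, got {reader.fieldnames}")
    return [row_to_mods(row) for row in reader]


# Made-up sample data: two creators, two subjects, and an empty genre cell.
sample = (
    "title,creator,subject,genre\n"
    "Harbor at dawn,\"Smith, Jane; Lee, Sam\",Harbors; Ships,\n"
)
records = csv_to_mods(sample)
print(ET.tostring(records[0], encoding="unicode"))
```

Because the header row is checked against a fixed list, a partner who adds or deletes a column gets an error right away instead of a silently wrong record, which is the trade-off of giving up the flexible columns.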