At The New School Archives and Special Collections, we initially wanted to set up an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) data provider so that we could serve the digital collections records on our CollectiveAccess site to the OCLC Digital Collections Gateway, making them searchable on WorldCat. Later, we wanted to configure it to provide records to the Digital Public Library of America (DPLA) as well. Maybe you want to do this, too? Here’s what I learned.
On the official CollectiveAccess wiki, there is a page about setting up an OAI-PMH Provider. There I learned that Providence version 1.4 and later can serve database records compliant with OAI-PMH, and that this is controlled by the configuration file oai_provider.conf.
First, I needed to set up a Data Exporter with an XML mapping. Since our profile is based on VRA Core 4.0, I needed to create a mapping to simple Dublin Core that would work for both OCLC and DPLA. OAI-PMH requires repositories to return records in at least simple Dublin Core, and it can accommodate additional metadata schemas as well. The export mapping needs to be set up before the OAI provider configuration file can be completed, because the configuration file refers to the mapping by its code.
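For a sense of what "simple Dublin Core" means on the wire, here is a minimal Python sketch (not CollectiveAccess code) that builds the kind of oai_dc record an OAI-PMH provider returns. The two namespaces are the ones defined for oai_dc; the field values are invented for illustration, and a real record would also carry an xsi:schemaLocation attribute, which I've left out.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH's simple Dublin Core format (oai_dc).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialized XML uses oai_dc: and dc:.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(fields):
    """Build an <oai_dc:dc> element from (element, value) pairs.

    A list of pairs (rather than a dict) lets repeatable elements
    such as dc:subject appear more than once.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Illustrative values only, not a real record from our collections.
record = build_oai_dc([
    ("title", "Example photograph"),
    ("subject", "Universities and colleges"),
    ("date", "1950"),
])
print(ET.tostring(record, encoding="unicode"))
```

Every Dublin Core element in the output is a child of the single oai_dc:dc wrapper, which is the shape the export mapping has to produce.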
Instructions on creating a mapping for data export are available on the CollectiveAccess wiki. At The New School we are on Providence version 1.5, and according to the wiki the data exporter is available in version 1.4 and later. The mapping is an Excel spreadsheet, similar to an import mapping. I found looking at sample export mappings more helpful than reading the instructions themselves. At the time, there was one very minimal sample mapping on the wiki page. When I posted on the user forum about my difficulties setting up the OAI-PMH Provider, Stefan shared an export sample from the Brooklyn Academy of Music, which I used as the basis for our initial export mapping for OCLC. I later altered our export mapping to meet DPLA standards, using a sample mapping from the New York Archives shared with me by Chris Stanton, the metadata coordinator for our local DPLA hub, the Empire State Digital Network. (For more on metadata used by DPLA, check out this helpful post by Chris!)
On a side note, I would love to make it easier for users to share export mappings, whether on this site, the official CollectiveAccess site, or somewhere else. Get in touch with me if you would, too!
The export mapping settings require you to specify a name and a system code that identify the mapping document; the “table,” which indicates whether the mapping applies to objects, occurrences, entities, etc.; and the exporter format, which can be XML, MARC, or CSV. Then there are a number of columns that specify the mapping itself. The first, “Rule type,” determines whether an element is mapped from a CollectiveAccess system metadata element (Mapping) or set to the same value for every record (Constant). At The New School, we have two “Constant” values: one for our generic rights statement, which is applied to all objects, and one that identifies us (The New School Archives and Special Collections) as the source of all records. We don’t generally use the second column, “ID”; it can be used to specify more precisely the (in our case Dublin Core) element being mapped to. In the third column, “Parent ID,” we put “dc” to indicate Dublin Core, and the fourth column, “Element,” specifies the particular Dublin Core element being mapped to from our system (dc:title, dc:subject, dc:date, etc.).
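To make the difference between the two rule types concrete, here is a toy Python model of how mapping rows turn one record into Dublin Core element/value pairs. It is only a sketch of the idea, not how CollectiveAccess implements its exporter. Assumptions: `MappingRow` and `apply_mapping` are hypothetical names, a record is modeled as a dict keyed by source code, the constant text is assumed to sit in the Source column, and skipping empty fields is my simplification.

```python
from typing import NamedTuple, Optional


class MappingRow(NamedTuple):
    """One row of the export mapping sheet (hypothetical model)."""
    rule_type: str          # "Mapping" or "Constant"
    row_id: Optional[str]   # the "ID" column, which we rarely use
    parent_id: str          # "dc" for Dublin Core
    element: str            # e.g. "dc:title"
    source: str             # CA source code, or the constant text (assumption)


def apply_mapping(rows, record):
    """Turn one record into (element, value) pairs using the rows.

    Constant rows emit the same value for every record; Mapping rows
    look the value up in `record`, a dict keyed by source code.
    """
    out = []
    for row in rows:
        if row.rule_type == "Constant":
            out.append((row.element, row.source))
        elif row.rule_type == "Mapping":
            value = record.get(row.source)
            if value:  # simplification: skip fields empty for this record
                out.append((row.element, value))
        else:
            raise ValueError(f"unknown rule type: {row.rule_type!r}")
    return out


rows = [
    MappingRow("Mapping", None, "dc", "dc:title", "ca_objects.preferred_labels"),
    MappingRow("Mapping", None, "dc", "dc:subject", "ca_objects.lcshTopical"),
    MappingRow("Constant", None, "dc", "dc:rights", "Hypothetical rights statement"),
]
```

For example, `apply_mapping(rows, {"ca_objects.preferred_labels": "Example title"})` returns the title pair followed by the rights pair: the constant appears on every record, while the subject row contributes nothing because this record has no subject value.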
The fifth column, “Source,” holds the system code for the metadata element being mapped from our system to Dublin Core (for example: ca_objects.preferred_labels, ca_objects.lcshTopical, ca_objects.dateSet.setDisplayValue). To find the element code for a metadata field, in the CollectiveAccess back end select “Manage” from the top menu, then “Administration” from the dropdown, then “Metadata Elements” from the left navigation bar.
The “Element code” listed there makes up the core of the string you will use in the “Source” column. It also needs a prefix indicating the type of record the metadata is associated with (object, entity, collection, etc.), and a multi-part metadata element (called a Container in CA) sometimes needs a suffix naming the sub-element as well. In most cases you will probably be using elements associated with object records, so the prefix would be “ca_objects”. (A few system metadata elements aren’t listed here; title, for example, is always “ca_objects.preferred_labels” in our system.) If you are looking for the code for a multi-part (Container) element, click the black page icon in the rightmost column of the element’s row, then scroll down to the “Sub-elements” section.
Click the black page icon next to the line for the part of your multi-part element that you want to map. The next page will look similar, but if you scroll to the bottom you should find the sub-element code you need in parentheses.
The first screenshot above shows the Description element, with the code “DescriptionSet”. This is a container element with two parts: the description text and the description source. Since I only want to map the description text, I open the editor for that sub-element to find its system code in parentheses (“descriptionText”). To get the string for the Source column of the exporter document, I put these together, along with the prefix indicating that these metadata elements come from object records: ca_objects.DescriptionSet.descriptionText.
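The assembly rule from the last few paragraphs (record-type prefix, element code, and for Containers the sub-element code, joined by dots) is small enough to write down. This Python helper is a hypothetical convenience of my own, not part of CollectiveAccess; it only reproduces the pattern described above.

```python
def source_code(element_code, sub_element_code=None, table="ca_objects"):
    """Assemble the string for the exporter's "Source" column.

    table            -- record-type prefix; "ca_objects" for object records
    element_code     -- the "Element code" from Metadata Elements
    sub_element_code -- for Containers, the code shown in parentheses
                        on the sub-element's editor page
    """
    parts = [table, element_code]
    if sub_element_code:
        parts.append(sub_element_code)
    return ".".join(parts)


print(source_code("DescriptionSet", "descriptionText"))
# ca_objects.DescriptionSet.descriptionText
```

A plain element such as the LCSH topical subject needs only the prefix and its code: `source_code("lcshTopical")` gives `ca_objects.lcshTopical`.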
The sixth column, “Options,” serves several purposes and isn’t required for every exported metadata element.
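Because a single stray character can break a mapping, it may help to sanity-check Options cells before uploading the spreadsheet. This Python sketch assumes the Options cell holds a JSON object (check the exporter documentation for your version); the function name is hypothetical, and the option name used in the example is only for illustration.

```python
import json


def check_options(cell):
    """Parse one Options cell, raising ValueError with a hint if it is malformed.

    Assumes the cell holds a JSON object; an empty cell means
    the row has no options.
    """
    if cell is None or not cell.strip():
        return {}
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError as err:
        raise ValueError(f"Options cell is not valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError('Options cell should be a JSON object, e.g. {"key": "value"}')
    return parsed


# "delimiter" is an illustrative option name, not a confirmed setting.
print(check_options('{"delimiter": "; "}'))
```

Python's json module rejects trailing commas and unquoted keys, so a cell like `{"delimiter": "; ",}` is flagged before CollectiveAccess ever sees it.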
Once your exporter spreadsheet is set up, log in to your CollectiveAccess back end and select “Manage,” then “Export,” from the top navigation bar. On the screen that opens, click “Add exporters” next to the big plus-sign icon in the top right corner:
Then open the folder containing your exporter spreadsheet and drag the file onto the screen. The exporter’s code (which, as you’ll remember, was set in the exporter document) appears on this screen; you can select and copy it to paste into the configuration file later.
Congratulations, you’ve set up your data exporter! Now it’s time to go back to the OAI configuration file, which can be found in the location indicated in the first screenshot in this post. (The file path on my system is “/collectiveaccess/app/public_html/admin/app/conf”.) Open oai_provider.conf in your favorite text editor. (I use Notepad++ for Windows, which can be downloaded here.) It should look something like this:
The next steps are relatively simple. Change “enabled = 0” (line 6 in the screenshot) to “enabled = 1”. For the name and email used for the identifier, and for the identifier namespace, I put the name of my organization (line 14) and our general archives email (line 15), and for identifierNamespace I put our digital collections URL. I’m not entirely sure I understand these settings correctly; my use of them is based on the description of Namespace here and the Identify verb response formats here. If you have other ideas, please let me know! Then I pasted the code of my exporter over the placeholder “code_of_your_oai_dc_mapping_here” (line 25 in the screenshot above). My edited OAI-PMH conf file looks like this:
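If you prefer to script the two mechanical edits (switching the provider on and replacing the placeholder with your exporter code), here is a minimal Python sketch. It assumes the stock file contains a line reading “enabled = 0” and the placeholder string, as described above; `enable_oai_provider` is a hypothetical helper, and the name, email, and identifierNamespace settings are still left for you to fill in by hand.

```python
import re

PLACEHOLDER = "code_of_your_oai_dc_mapping_here"


def enable_oai_provider(conf_text, exporter_code):
    """Return conf text with the provider enabled and the exporter code filled in.

    Assumes the stock file has a line "enabled = 0" and contains the
    placeholder string; raises ValueError if either is missing.
    """
    if PLACEHOLDER not in conf_text:
        raise ValueError("placeholder not found; has this file already been edited?")
    # Flip only the first "enabled = 0" line, tolerating extra spaces or tabs.
    text, count = re.subn(r"^([ \t]*enabled[ \t]*=[ \t]*)0\b", r"\g<1>1",
                          conf_text, count=1, flags=re.MULTILINE)
    if count == 0:
        raise ValueError('no "enabled = 0" line found')
    return text.replace(PLACEHOLDER, exporter_code)
```

Working on a string rather than the file keeps the original oai_provider.conf untouched; you would write the result to a separate copy.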
I saved the edited file in the local directory within the conf directory and left the original file as is. This is best practice for edited configuration files in general, since it prevents problems when pulling updates. If you run into problems, check your configuration file very carefully for typos. As you may have seen if you clicked the link to my user forum post, I kept hitting a wall with my conf file for no apparent reason, and then discovered I had accidentally deleted a comma at the end of a line!
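Since the original file stays in place, a line-by-line diff between it and your local copy is a quick way to catch an accidental deletion like that missing comma. This Python sketch uses the standard difflib module; the directory is the path from my system mentioned above, and the assumption that the local copy is also named oai_provider.conf is mine.

```python
import difflib
from pathlib import Path


def conf_diff(original_path, edited_path):
    """Return a unified diff between the stock conf file and the edited copy.

    Every line you meant to change should show up; anything else
    (such as a comma missing from the end of a line) is a likely typo.
    """
    original = Path(original_path).read_text().splitlines(keepends=True)
    edited = Path(edited_path).read_text().splitlines(keepends=True)
    diff = difflib.unified_diff(original, edited,
                                fromfile=str(original_path),
                                tofile=str(edited_path))
    return "".join(diff)


if __name__ == "__main__":
    # Path from my system; the local file name is an assumption.
    conf_dir = Path("/collectiveaccess/app/public_html/admin/app/conf")
    stock = conf_dir / "oai_provider.conf"
    local = conf_dir / "local" / "oai_provider.conf"
    if stock.exists() and local.exists():
        print(conf_diff(stock, local))
```

In the output, lines starting with “-” come from the original and lines starting with “+” from your copy, so a line that differs only by its trailing comma stands out immediately.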
Some resources for testing your OAI-PMH Provider:
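You can also query the provider directly with a few standard OAI-PMH requests. This Python sketch builds request URLs and pulls the repository name, base URL, and admin email out of an Identify response, using the OAI-PMH 2.0 namespace. The helper names are hypothetical, and I haven’t shown the endpoint URL of a CollectiveAccess provider, so substitute your own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_url(base_url, verb, **params):
    """Build a request URL, e.g. verb="ListRecords", metadataPrefix="oai_dc"."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


def fetch(url):
    """Download a response body as bytes."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_identify(xml_text):
    """Pull a few fields out of an Identify response (str or bytes)."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("response has no <Identify> element")
    return {name: ident.findtext(f"oai:{name}", namespaces=OAI_NS)
            for name in ("repositoryName", "baseURL", "adminEmail")}
```

With BASE set to your provider’s endpoint, `parse_identify(fetch(oai_url(BASE, "Identify")))` should, if my reading of those settings is right, echo the name and email from oai_provider.conf, and `oai_url(BASE, "ListRecords", metadataPrefix="oai_dc")` gives a URL you can open in a browser to see the exported Dublin Core records.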