Re-mapping DC Extended for OAI-PMH Export

tdahn · May 23, 2017, 2:38pm

Hello,

I’m trying to map the DC Extended fields in the OAI-PMH Repository plugin, but am unsure how to add them to the array the OAI plugin uses to populate its output. I’ve looked up the names of the extended fields I’m interested in (Spatial Coverage and Temporal Coverage). An example:

    array(
        'label'       => 'Spatial Coverage',
        'name'        => 'spatial',
        'description' => 'Spatial characteristics of the resource.',
        '_refines'    => 'Coverage'
    ),

But when I add ‘spatial’ to the $dcElementNames array in OaiDc.php, it breaks the output.

    $dcElementNames = array( 'title', 'creator', 'subject', 'description', 
                             'contributor', 'date', 'type',
                             'format', 'identifier', 'source', 'language',
                             'relation', 'coverage' );

I think this may be because it actually refines an existing metadata field (Coverage), but I don’t understand how to get the extended metadata to appear in the OAI-PMH Repository plugin’s output, or how to remap it once it does.

Any help would be appreciated, thanks!

Tristan

jflatnes · May 23, 2017, 4:17pm

The oai_dc format operates under an assumption that only holds true for the basic DC set of elements: that an element’s name in the namespace is just the lowercased version of its label. For Spatial Coverage, this doesn’t hold true (its name is “spatial”, not “spatial coverage”). Besides, there are various other problems with just adding it to the oai_dc output: “spatial” isn’t in the legacy Dublin Core namespace that oai_dc works with, and in general OAI harvesters aren’t going to look for other elements there anyway.
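To make the two failure modes concrete, here is a minimal plain-PHP sketch. It does not call Omeka or the plugin: `guessLabel()` and `worksInOaiDc()` are hypothetical helpers, and capitalizing the name with `ucfirst()` is only an assumed stand-in for however the plugin actually turns a name into a label. The list is the 15-element Dublin Core 1.1 set that oai_dc is based on.

```php
<?php
// Sketch only: plain PHP, no Omeka or plugin classes. The helper names are
// hypothetical and do not exist in the OAI-PMH Repository plugin.

// The 15 elements of the legacy Dublin Core 1.1 namespace
// (http://purl.org/dc/elements/1.1/) that oai_dc is built on.
const DC11_ELEMENTS = [
    'title', 'creator', 'subject', 'description', 'publisher',
    'contributor', 'date', 'type', 'format', 'identifier',
    'source', 'language', 'relation', 'coverage', 'rights',
];

// Assumed stand-in for the plugin's name -> label lookup.
function guessLabel(string $name): string
{
    return ucfirst($name);
}

// True only if the name is a DC 1.1 element AND the guessed label
// matches the element's actual Omeka label.
function worksInOaiDc(string $name, string $actualLabel): bool
{
    return in_array($name, DC11_ELEMENTS, true)
        && guessLabel($name) === $actualLabel;
}

var_dump(worksInOaiDc('coverage', 'Coverage'));        // bool(true)
var_dump(worksInOaiDc('spatial', 'Spatial Coverage')); // bool(false)
```

The namespace check alone rules out “spatial”, even if the label lookup were fixed, which is why adding it to the array cannot work on its own.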

The RDF format does already include all the Dublin Core Extended elements, so if you can use that format instead, that may be a simpler option.

tdahn · May 23, 2017, 5:45pm

Thanks for your reply! We’re using the OAI-PMH harvester for ingest into a DPLA hub, so I will contact them to see if the RDF format is suitable for their use. The issue of the harvester not looking for extended elements is exactly what I’m encountering, but I was hoping to map these elements to existing core DC elements for the purpose of the harvest, without having to change the metadata on our end (Temporal Coverage -> dc:coverage, Spatial Coverage -> dc:subject).

Best,

Tristan

jflatnes · May 23, 2017, 6:36pm

Mapping the elements into the core set is possible, but it wouldn’t be as simple as just adding them to that array. You’d also lose the distinction between different types of Coverage. It’s definitely preferable if the RDF harvest can simply be used.

tdahn · May 23, 2017, 9:11pm

Unfortunately, because of the way the aggregator for the DPLA in our region is built, they cannot use the RDF output from the OAI-PMH Repository plugin. Any help on pulling the DC Extended elements into the oai_dc output would be greatly appreciated. Losing the distinction is not an issue, since they are not using those elements anyway. They would like us to map those elements as follows: Temporal Coverage -> dc:coverage, Spatial Coverage -> dc:subject.

Thanks as always for your help!

Tristan

jflatnes · May 24, 2017, 12:22am

Writing a version of oai_dc that automatically mapped refinement elements into the elements they refine would be one approach, and one that would be good to have in the upstream plugin itself. However, you’re looking to kind of “cheat” Spatial Coverage into Subject (rather than Coverage, which it refines), so it’ll have to be something a little custom for your case.

What you probably want to do is add this at the end of the body of the $dcElementNames loop:

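    // Inside the $dcElementNames loop, after the core element texts are
    // appended: also emit Temporal Coverage values as dc:coverage.
    if ($elementName === 'coverage') {
        $refinedTexts = $item->getElementTexts('Dublin Core', 'Temporal Coverage');
        foreach ($refinedTexts as $elementText) {
            $oai_dc->appendNewElement('dc:'.$elementName, $elementText->text);
        }
    }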

Add another similar block right after it that refers to 'subject' and 'Spatial Coverage' instead, to handle that mapping too.
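The combined effect of the two blocks can be modeled without Omeka. In this minimal sketch (assuming PHP 7.1 or later), `mapToOaiDc()` is a hypothetical helper, `$itemTexts` is made-up sample data keyed by element label, and capitalizing the name stands in for the plugin’s own name-to-label lookup; none of it is plugin API.

```php
<?php
// Standalone sketch (plain PHP, no Omeka or plugin classes). Returns the
// [tag, text] pairs the modified loop would emit, in loop order.
// $extra maps a core element name => extended label folded into it.
function mapToOaiDc(array $itemTexts, array $elementNames, array $extra): array
{
    $out = [];
    foreach ($elementNames as $name) {
        // Core element first; its label is assumed to be the capitalized name.
        $texts = $itemTexts[ucfirst($name)] ?? [];
        // Then the extended element mapped onto it, if one is configured.
        if (array_key_exists($name, $extra)) {
            $texts = array_merge($texts, $itemTexts[$extra[$name]] ?? []);
        }
        foreach ($texts as $text) {
            $out[] = ['dc:' . $name, $text];
        }
    }
    return $out;
}

// The hub-specific mapping requested in this thread.
$extra = [
    'coverage' => 'Temporal Coverage',
    'subject'  => 'Spatial Coverage',
];

// Hypothetical sample item, keyed by Omeka element label.
$itemTexts = [
    'Title'             => ['Main Street, looking north'],
    'Subject'           => ['Streets'],
    'Temporal Coverage' => ['1900-1910'],
    'Spatial Coverage'  => ['Springfield'],
];

foreach (mapToOaiDc($itemTexts, ['title', 'subject', 'coverage'], $extra) as [$tag, $text]) {
    // Escape the value so the printed fragment is well-formed XML.
    echo "<$tag>" . htmlspecialchars($text, ENT_XML1) . "</$tag>", PHP_EOL;
}
```

Because the extended texts are merged after the core ones, Spatial Coverage values follow the item’s own Subject values, which matches placing the blocks at the end of the loop body.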

mjlassila · May 25, 2017, 10:44am

You might find it helpful to check out the customizations (an additional output format) I have developed to enable harvesting Omeka-powered sites with item type metadata and DC Extended fields. See: https://github.com/mjlassila/plugin-DublinCoreExtended/blob/master/libraries/DublinCoreExtended/Metadata/Finna.php

jflatnes · May 25, 2017, 4:48pm

Yes, if your hub (or whatever the term is) can accept one of the various “qualified” DC formats, that would also be an option. The trouble is that there’s no single standard qualified DC format like there is for unqualified DC, so it’s much more hit or miss (which is partly why there isn’t a qualified DC format in the repository plugin by default).

tdahn · May 26, 2017, 2:42pm

Thanks, Matti! Though I’ve got everything in order for the current harvest, this is very helpful for my understanding of how Omeka is handling the different element sets more generally.

And thanks as always, John. That was exactly the issue: the hub does not ingest qualified DC for that reason, so mapping those fields to appropriate core elements was necessary.