OAI-PMH harvesting times out after 10 seconds

While testing the OAI-PMH plugin I received the following message:

“Read timed out after 10 seconds Please check to be certain the URL is correctly formatted for OAI-PMH harvesting.”

The URL I used should be correct, as I received it directly from the administrator of the data:

gerard.limburg.be/repox/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=KULeuven

You can test it yourself and you will actually see the data.

However, the OAI-PMH validator (http://validator.oaipmh.com/) reports some messages that I find hard to parse:

HTTP status 200
Content type text/xml; charset=utf-8
Content XML checked.
Request time is 0.129 sec
OAI-PMH protocol version is 2.0.
Invalid adminEmail mailto:jmalliet@limburg.be.

Can anybody help?
Kind regards,
Cor

Basically the problem is that the repository you’re trying to harvest from is too slow. The harvester wants the server to send back a response within 10 seconds, and it seems like that server takes closer to a minute to respond.

You could alter the code of the harvester to tell it to wait longer for a response, by adding a timeout line here. Alternatively you might ask the administrator of the repository to investigate why it’s so slow to respond.
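To confirm how slow the repository really is before touching the harvester, you can time a request yourself with a more generous timeout. This is only a rough sketch in Python, not the harvester's own code: the function name `timed_fetch` and the 60-second default are my own assumptions.

```python
import time
from urllib.request import urlopen


def timed_fetch(url, timeout=60.0):
    """Fetch a URL and return (body, elapsed_seconds).

    `timeout` is the number of seconds to wait on the connection before
    giving up. The 60-second default is an assumption, chosen only
    because the server seemed to need about a minute.
    """
    start = time.monotonic()
    with urlopen(url, timeout=timeout) as response:
        body = response.read()
    elapsed = time.monotonic() - start
    return body, elapsed
```

Calling `timed_fetch("http://gerard.limburg.be/repox/OAIHandler?verb=Identify")` should show whether the response arrives within the harvester's 10 seconds or takes much longer.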

You may also have another problem: the URL you give the harvester shouldn’t have all those parameters on it. It should just be http://gerard.limburg.be/repox/OAIHandler. Fixing that URL will probably get you further along in the process, but you’d probably just hit the timeout problem once you tried to start the harvest.

We changed it to respond more quickly and then tried again:

http://gerard.limburg.be/repox/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=KULeuven

This time the error message is different, but not very instructive all the same:

All items created for this harvest were deleted on 2016-06-13 09:43:20

Notice: badArgument: The request includes illegal arguments, is missing
required arguments, includes a repeated argument, or values for
arguments have an illegal syntax. (2016-06-13 10:31:08)

Notice: No records were found. (2016-06-13 10:31:08)

Notice: Did not receive a resumption token. (2016-06-13 10:31:08)

OK, we understand that there is something wrong with the arguments, but what exactly?

I mentioned it in my last message, but sort of buried at the end.

Try using this URL instead. It’s the same one, just without the query string: http://gerard.limburg.be/repox/OAIHandler

The harvester handles the query string for you, so you don’t need to (and shouldn’t) have it on there.
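To illustrate the difference between the base URL and a full request, here is a small Python sketch. It is not how the harvester is implemented; the helper names `to_base_url` and `build_request` are made up for the example. The idea is that an OAI-PMH client starts from the bare base URL and adds `verb` and the other arguments itself, so a base URL that already carries a `verb` would presumably end up with repeated arguments, which is one of the things badArgument complains about.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def to_base_url(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_request(base_url, verb, **arguments):
    """Build a full OAI-PMH request URL from a bare base URL."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


full = ("http://gerard.limburg.be/repox/OAIHandler"
        "?verb=ListRecords&metadataPrefix=oai_dc&set=KULeuven")
base = to_base_url(full)  # this is what goes into the harvester form
request = build_request(base, "ListRecords",
                        metadataPrefix="oai_dc", set="KULeuven")
```

Here `base` is the URL to enter in the harvester, and `request` is the kind of URL a client would then generate on its own.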

We tried that one too. The result is that we receive all the sets, but we cannot harvest any of them, because they remain empty. I can’t select anything under the “Go” dropdown menus. If I push the Go button, Omeka gives me a generic error message:

Omeka has encountered an error
To learn how to see more detailed information about this error, see the Omeka Codex page on retrieving error messages.

But apparently I can’t retrieve any error message.

So, you’ve tried to follow the directions on that codex page, but they didn’t work?

As for being unable to select any of the metadata formats, it looks like there’s some problem with that repository’s ListMetadataFormats response: none of the formats declare their schema URL, and the only declared namespace URL is for oai_dc and it’s not the one mandated by the OAI standard.

The output from this repository looks like

<metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
    <schema/>
    <metadataNamespace>http://krait.kb.nl/coop/tel/handbook/telterms.html</metadataNamespace>
</metadataFormat>

while the standard mandates this

<metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>

Basically, the harvester doesn’t see any formats that it knows how to harvest from that repository. oai_dc is there and the harvester understands that, but ListMetadataFormats needs to correctly list the schema and namespace for the harvester to “see” it. This is something that needs to be resolved on the repository’s side.
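If the repository administrator wants a quick way to check a fix, something like the following Python sketch compares the oai_dc entry in a ListMetadataFormats response against the two values shown above. It is an illustration only, not the harvester's actual check; `check_oai_dc` and `local_name` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Values required for oai_dc, as quoted above.
OAI_DC_SCHEMA = "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
OAI_DC_NAMESPACE = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def check_oai_dc(xml_text):
    """Return a list of problems with the oai_dc metadataFormat entry."""
    root = ET.fromstring(xml_text)
    for fmt in root.iter():
        if local_name(fmt.tag) != "metadataFormat":
            continue
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in fmt}
        if fields.get("metadataPrefix") != "oai_dc":
            continue
        problems = []
        if fields.get("schema") != OAI_DC_SCHEMA:
            problems.append("schema is %r" % fields.get("schema"))
        if fields.get("metadataNamespace") != OAI_DC_NAMESPACE:
            problems.append("metadataNamespace is %r"
                            % fields.get("metadataNamespace"))
        return problems
    return ["no oai_dc metadataFormat found"]
```

Run against the repository's current output, this should report two problems (the empty schema and the wrong namespace); against a corrected response it should return an empty list.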