OAI-PMH reharvesting doesn't overwrite as expected

We would like to ‘force’ a reharvest any time that the reharvest button is clicked (rather than only when an item record has been updated). We thought we had achieved this by commenting out the code that sets the startFrom date in the Harvest.listRecords method, as well as the date check in the Abstract._harvestLoop method. This appeared to trigger a forced reharvest (resumption tokens were being found), but the data in the items was not changing. Specifically, I manually changed a title to see if it would be updated, and the modified title remained after the harvest.

We also tried ‘forcing’ the harvest by actually modifying the datestamp in the item record, but this still did not update the data in Omeka. It seems that a reharvest only brings in new items from the collection and does not modify the existing items. Can someone please verify this?

We need this functionality to work because we are using some logic that updates a custom database based on what is contained in the dc:identifier field. In many cases, the item record may not change but the transformation that configures the dc:identifier field does. In that case the datestamp would not be updated.

Thanks!

The harvester should update records that have changed, but it will only do so if the OAI record’s datestamp is more recent than it was when it was first harvested. You said you removed that logic, though, so it should be updating. Since you’re not getting duplicates, it’s correctly detecting the old records, so that’s not the problem.
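To make the rule described above concrete, here is a minimal Python sketch of the decision for a single incoming record. It is not the plugin's actual code (the plugin is PHP); the function names and the force flag are hypothetical and only illustrate the "update only if the datestamp is newer" behaviour and what bypassing it would look like.

```python
from datetime import datetime, timezone

def parse_oai_datestamp(value):
    """Parse an OAI-PMH datestamp in day (YYYY-MM-DD) or
    seconds (YYYY-MM-DDThh:mm:ssZ) granularity as a UTC datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")

def harvest_action(stored_datestamp, incoming_datestamp, force=False):
    """Return 'insert', 'update' or 'skip' for one incoming record.

    stored_datestamp is None when the record's identifier has not been
    harvested before. force is a hypothetical switch that skips the
    datestamp comparison, i.e. the 'forced reharvest' discussed here.
    """
    if stored_datestamp is None:
        return "insert"   # new record: always bring it in
    if force:
        return "update"   # known record, comparison bypassed
    incoming = parse_oai_datestamp(incoming_datestamp)
    stored = parse_oai_datestamp(stored_datestamp)
    return "update" if incoming > stored else "skip"
```

With this sketch, a known record whose datestamp is unchanged comes back as "skip" unless force is set, which matches the symptom of a reharvest that only brings in new items.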

Is it possible that there’s some mistake in the changes you made? I tried out a “fake” reharvest myself by editing the stored datestamps, just to confirm, and it correctly overwrote my changed text.

Looks like it was a problem of some dirty data. I was able to successfully force an overwrite of data.

Hello, we have a similar problem. We are using Omeka as a metadata aggregator and have harvested multiple collections. Now a problem has come up: we tried to reharvest a collection from a DSpace repository that has about 10 new records, and it doesn’t work. Could you help us identify the problem? Thanks!

What version of the harvester are you using? Is it possibly a fork or modified version?

The log there indicates an issue with the until parameter, but the normal harvester never actually uses until.

This is the information you see in the Omeka plugins section:

OAI-PMH Harvester

Version 2.0.2 by Roy Rosenzweig Center for History and New Media
Harvests metadata from OAI-PMH data providers.

Is it possible that someone could have made local modifications?

I just can’t come up with another reason that an “until” parameter would appear.

No, I did the Omeka installation, but I don’t know how to program these plugins.

OK.

I tested this on my own install of the harvester and received the same error. I believe this is an issue with the server you are trying to harvest from and not the Omeka plugin.

An example query of the kind the harvester will try to run is: http://repositorio.filo.uba.ar/oai/snrd?verb=ListRecords&metadataPrefix=oai_dc&from=2020-09-24

You’ll see that it returns the same error you’re seeing logged. There are two problems here. First, the error message incorrectly names until when we actually passed from as a parameter. Second, and more importantly, the date we passed, 2020-09-24, is a valid value for that parameter. The OAI-PMH spec requires all repositories to support values of the from parameter expressed as a date, as in this example.
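If you want to reproduce this kind of query against other repositories, here is a small Python sketch that builds a ListRecords URL and checks the from value first. The function name is made up for illustration and this is not how the plugin itself builds requests. It accepts the day format every repository must support, plus the seconds format that repositories may optionally support.

```python
import re
from urllib.parse import urlencode

# Day granularity (YYYY-MM-DD) is mandatory for every OAI-PMH repository.
DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Seconds granularity (YYYY-MM-DDThh:mm:ssZ) is optional per repository.
SECONDS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")

def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build a ListRecords request URL, optionally limited by 'from'.

    Raises ValueError if from_date is in neither OAI-PMH date format.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        if not (DAY.match(from_date) or SECONDS.match(from_date)):
            raise ValueError(f"not an OAI-PMH date: {from_date!r}")
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

# Rebuilds the example query from this thread.
print(list_records_url("http://repositorio.filo.uba.ar/oai/snrd",
                       "oai_dc", "2020-09-24"))
```

Opening the printed URL in a browser shows the repository's raw response, which is a quick way to tell whether an error comes from the server rather than from Omeka.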

This looks like an issue that may have been previously fixed by DSpace.

Thank you very much. We are going to do other harvest tests in other repositories and contact this one where we had the specific problem.