Gateway Timeout and other errors with OAI-PMH Harvester

Hello,

I am running into the following errors when trying to use the OAI-PMH harvester on a digital commons site:

Unable to Connect to ssl://digitalcommons.wou.edu:443. Error #0: Please check to be certain the URL is correctly formatted for OAI-PMH harvesting.

The other URLs I’ve seen in examples are all non-secure, so I tried http instead:

Read timed out after 20 seconds Please check to be certain the URL is correctly formatted for OAI-PMH harvesting.

I looked around on here a bit more and saw the advice to increase the timeout in Request.php, so I pushed that up to 120 seconds and instead got:

Gateway Timeout

The gateway did not receive a timely response from the upstream server or application.

According to validator.oaipmh.com, the server's response time is just over a second:

  1. HTTP status 200
  2. Content type text/xml
  3. Content XML checked.
  4. Request time is 1.005 sec
  5. XML complies with OAI-PMH XML Schema http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
  6. OAI-PMH protocol version is 2.0.
  7. Valid adminEmail dc-support@bepress.com
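To repeat a few of those validator checks from the machine that runs Omeka, rather than from the validator's own server, a rough Python sketch could look like the one below. It sends an OAI-PMH Identify request, times it, and reads protocolVersion and adminEmail from the response. It does not validate against the XSD. The 20-second default timeout was picked only to mirror the error above, and the function names are made up for illustration.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def identify_url(base_url: str) -> str:
    """Append the OAI-PMH Identify verb to a repository base URL."""
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urllib.parse.urlencode({"verb": "Identify"})

def parse_identify(xml_bytes: bytes) -> dict:
    """Pull a couple of fields out of an Identify response (raises on bad XML)."""
    root = ET.fromstring(xml_bytes)
    ident = root.find(OAI_NS + "Identify")
    if ident is None:
        raise ValueError("response has no <Identify> element")
    return {
        "protocolVersion": ident.findtext(OAI_NS + "protocolVersion"),
        "adminEmail": ident.findtext(OAI_NS + "adminEmail"),  # first one only
    }

def check_repository(base_url: str, timeout: float = 20.0) -> dict:
    """Fetch Identify, time the request, and report status, content type and fields."""
    start = time.monotonic()
    with urllib.request.urlopen(identify_url(base_url), timeout=timeout) as resp:
        body = resp.read()
        status = resp.status
        content_type = resp.headers.get("Content-Type", "")
    elapsed = time.monotonic() - start
    info = parse_identify(body)
    info.update(status=status, content_type=content_type, seconds=round(elapsed, 3))
    return info
```

For example, check_repository("http://pdxscholar.library.pdx.edu/do/oai") run on the Omeka server itself. If that times out there while the validator reports about a second, the problem probably lies between your server and the repository (a firewall, for instance) rather than with the repository.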

Finally, I moved on to looking at other repositories from https://www.openarchives.org/Register/BrowseSites to see if those were working any better, but hit the same snags with http://pdxscholar.library.pdx.edu/do/oai, http://agritrop.cirad.fr/cgi/oai2, and http://citeseerx.ist.psu.edu/oai2

Any idea why the harvester plugin is having trouble? From the fact that it’s not working with any of the repositories I try, I’m wondering if there’s a port or something on my campus firewall that needs to be opened for the harvester to get to the repositories.

Thanks for any help you can offer!

Our campus IT department made Port 443 available for us, and now the harvester is getting responses from servers. Hooray!
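For anyone hitting the same wall: one quick way to tell whether outbound ports are blocked is to try a plain TCP connection from the Omeka server itself. Below is a minimal Python sketch, assuming Python 3 is available on that machine; port_open is a hypothetical helper name, not part of Omeka or the harvester.

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, refusals and timeouts all raise OSError subclasses,
    so any of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

You could loop this over the hosts from this thread (digitalcommons.wou.edu, pdxscholar.library.pdx.edu, agritrop.cirad.fr, citeseerx.ist.psu.edu) with ports 80 and 443. If every host fails on the same port while an outside validator succeeds, a firewall rule is the likely cause, as it turned out here. A successful connect only shows the port is reachable. It doesn't prove TLS or HTTP will work.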

Unfortunately I’ve now hit a PHP error:
RuntimeException: The configured PHP path () is invalid. in /webvol/application/libraries/Omeka/Job/Process/Dispatcher.php:114
Stack trace:
#0 /webvol/application/libraries/Omeka/Job/Process/Dispatcher.php(93): Omeka_Job_Process_Dispatcher::_checkCliPath('')
#1 /webvol/application/libraries/Omeka/Job/Process/Dispatcher.php(28): Omeka_Job_Process_Dispatcher::getPHPCliPath()
#2 /webvol/application/libraries/Omeka/Job/Dispatcher/Adapter/BackgroundProcess.php(31): Omeka_Job_Process_Dispatcher::startProcess('Omeka_Job_Proce...', Object(User), Array)
#3 /webvol/application/libraries/Omeka/Job/Dispatcher/Default.php(151): Omeka_Job_Dispatcher_Adapter_BackgroundProcess->send('{"className":"O...', Array)
#4 /webvol/plugins/OaipmhHarvester/controllers/IndexController.php(156): Omeka_Job_Dispatcher_Default->sendLongRunning('OaipmhHarvester...', Array)
#5 /webvol/application/libraries/Zend/Controller/Action.php(516): OaipmhHarvester_IndexController->harvestAction()
#6 /webvol/application/libraries/Zend/Controller/Dispatcher/Standard.php(308): Zend_Controller_Action->dispatch('harvestAction')
#7 /webvol/application/libraries/Zend/Controller/Front.php(954): Zend_Controller_Dispatcher_Standard->dispatch(Object(Zend_Controller_Request_Http), Object(Zend_Controller_Response_Http))
#8 /webvol/application/libraries/Zend/Application/Bootstrap/Bootstrap.php(105): Zend_Controller_Front->dispatch()
#9 /webvol/application/libraries/Zend/Application.php(384): Zend_Application_Bootstrap_Bootstrap->run()
#10 /webvol/application/libraries/Omeka/Application.php(73): Zend_Application->run()
#11 /webvol/admin/index.php(28): Omeka_Application->run()
#12 {main}

This shows up when I try to harvest. When I just retrieve the sets, everything works.

This error means that Omeka can’t detect the path to PHP-CLI on your server. That’s needed to run background jobs like imports or harvests that could run a long time.

Your IT department or hosting provider should be able to tell you the correct path. You can then configure it in the application/config/config.ini file, which has a line for background.php.path where you'd set it.

This is a topic that’s discussed in the manual, if you’d like to look there also.
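If you have shell access, one rough way to find a candidate path before asking IT is to look up php on the PATH and check that it's an executable file. This Python sketch assumes a Unix-like server. The helper names are hypothetical, and the check is only loosely similar in spirit to the validation that fails in the stack trace above. It is not Omeka's actual code. The binary it finds may not be the one the web server's user would see, and it could be a CGI build rather than the CLI, so confirming with IT is still the safer route.

```python
import os
import shutil

def find_php_cli(search_path=None):
    """Locate a 'php' executable on PATH (or on search_path, if given)."""
    return shutil.which("php", path=search_path)

def looks_like_cli(path) -> bool:
    """Rough check: a non-empty path to an executable regular file."""
    return bool(path) and os.path.isfile(path) and os.access(path, os.X_OK)

def config_line(php_path) -> str:
    """Format the config.ini setting for the PHP-CLI path, refusing empty/bad paths."""
    if not looks_like_cli(php_path):
        raise ValueError(f"not an executable file: {php_path!r}")
    return f'background.php.path = "{php_path}"'
```

If find_php_cli() returns a path, config_line() turns it into a line you could paste into application/config/config.ini. An empty value, like the () in the error message, is rejected.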


Thanks! This can now be marked resolved.

OAI-PMH is very old, but still efficient for harvesting. According to the OAI-PMH standard, an OAI-PMH repository must be served over http, not https. Some libraries are strict about standards (like the French national library) and don't offer https for their OAI-PMH server, but most libraries are now available over https.
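Since some repositories only answer over http while many now also answer over https, a harvesting script could try https first and fall back to http. Below is a small Python sketch of that idea. The probe argument is any function you supply that raises OSError when a URL can't be reached, for example one that sends an Identify request. Both helper names are made up for illustration, and this isn't taken from the OaipmhHarvester plugin. In the plugin you'd simply enter whichever URL works.

```python
from urllib.parse import urlsplit, urlunsplit

def scheme_variants(base_url: str) -> list:
    """Return the base URL with https first and http second (an ordering choice, not a rule)."""
    parts = urlsplit(base_url)
    variants = []
    for scheme in ("https", "http"):
        url = urlunsplit(parts._replace(scheme=scheme))
        if url not in variants:
            variants.append(url)
    return variants

def first_working(base_url: str, probe, errors=(OSError,)) -> str:
    """Return the first scheme variant for which probe(url) doesn't raise."""
    last_error = None
    for url in scheme_variants(base_url):
        try:
            probe(url)
            return url
        except errors as exc:  # urllib's URLError and ssl.SSLError are OSError subclasses
            last_error = exc
    raise last_error
```

The input URL needs an explicit scheme. Without one, urlsplit can't find the host.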


Thanks, Daniel. That makes sense!
