Hello,
I am running into the following errors when trying to use the OAI-PMH harvester on a Digital Commons site:
Unable to Connect to ssl://digitalcommons.wou.edu:443. Error #0: Please check to be certain the URL is correctly formatted for OAI-PMH harvesting.
The other URLs I’ve seen in examples are all non-secure, so I tried http instead:
Read timed out after 20 seconds Please check to be certain the URL is correctly formatted for OAI-PMH harvesting.
I looked around on here a bit more and saw the advice to increase the timeout in Request.php, so I pushed that up to 120 seconds and instead got:
Gateway Timeout
The gateway did not receive a timely response from the upstream server or application.
According to validator.oaipmh.com, the request response time from the server is just barely over a second:
- HTTP status 200
- Content type text/xml
- Content XML checked.
- Request time is 1.005 sec
- XML complies with OAI-PMH XML Schema http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
- OAI-PMH protocol version is 2.0.
- Valid adminEmail dc-support@bepress.com
Finally, I moved on to other repositories listed at https://www.openarchives.org/Register/BrowseSites to see if those worked any better, but hit the same snags with http://pdxscholar.library.pdx.edu/do/oai, http://agritrop.cirad.fr/cgi/oai2, and http://citeseerx.ist.psu.edu/oai2
Any idea why the harvester plugin is having trouble? From the fact that it’s not working with any of the repositories I try, I’m wondering if there’s a port or something on my campus firewall that needs to be opened for the harvester to get to the repositories.
Thanks for any help you can offer! 
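One way to test the firewall theory independently of Omeka is to try a plain TCP connection from the Omeka server to each repository. The following is only a rough sketch: the helper names (`endpoint_host_port`, `identify_url`, `can_connect`) are made up for illustration, and it assumes Python 3 is available on the server.

```python
import socket
from urllib.parse import urlencode, urlsplit


def endpoint_host_port(base_url):
    """Return (host, port) for a base URL, defaulting the port by scheme."""
    parts = urlsplit(base_url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def identify_url(base_url):
    """Build the OAI-PMH Identify request URL for a base URL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If `can_connect("digitalcommons.wou.edu", 443)` returns False on the Omeka server but True on another machine, that would point to outbound traffic being blocked rather than to the harvester or the repository.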
Our campus IT department made Port 443 available for us, and now the harvester is getting responses from servers. Hooray!
Unfortunately I’ve now hit a PHP error:
RuntimeException: The configured PHP path () is invalid. in /webvol/application/libraries/Omeka/Job/Process/Dispatcher.php:114
Stack trace:
#0 /webvol/application/libraries/Omeka/Job/Process/Dispatcher.php(93): Omeka_Job_Process_Dispatcher::_checkCliPath('')
#1 /webvol/application/libraries/Omeka/Job/Process/Dispatcher.php(28): Omeka_Job_Process_Dispatcher::getPHPCliPath()
#2 /webvol/application/libraries/Omeka/Job/Dispatcher/Adapter/BackgroundProcess.php(31): Omeka_Job_Process_Dispatcher::startProcess('Omeka_Job_Proce...', Object(User), Array)
#3 /webvol/application/libraries/Omeka/Job/Dispatcher/Default.php(151): Omeka_Job_Dispatcher_Adapter_BackgroundProcess->send('{"className":"O...', Array)
#4 /webvol/plugins/OaipmhHarvester/controllers/IndexController.php(156): Omeka_Job_Dispatcher_Default->sendLongRunning('OaipmhHarvester...', Array)
#5 /webvol/application/libraries/Zend/Controller/Action.php(516): OaipmhHarvester_IndexController->harvestAction()
#6 /webvol/application/libraries/Zend/Controller/Dispatcher/Standard.php(308): Zend_Controller_Action->dispatch('harvestAction')
#7 /webvol/application/libraries/Zend/Controller/Front.php(954): Zend_Controller_Dispatcher_Standard->dispatch(Object(Zend_Controller_Request_Http), Object(Zend_Controller_Response_Http))
#8 /webvol/application/libraries/Zend/Application/Bootstrap/Bootstrap.php(105): Zend_Controller_Front->dispatch()
#9 /webvol/application/libraries/Zend/Application.php(384): Zend_Application_Bootstrap_Bootstrap->run()
#10 /webvol/application/libraries/Omeka/Application.php(73): Zend_Application->run()
#11 /webvol/admin/index.php(28): Omeka_Application->run()
#12 {main}
This shows up when I try to harvest. When I just retrieve the sets, everything works.
This error means that Omeka can’t detect the path to PHP-CLI on your server. That’s needed to run background jobs like imports or harvests that could run a long time.
IT or hosting should be able to tell you what the proper path is, and then you can configure it in the application/config/config.ini file. There’s a line in there for background.php.path, which is where you’d set it.
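If you have shell access, a quick way to get a candidate path is to look for a PHP binary on the PATH and print the line to put in config.ini. This is just a sketch: the function names are made up for illustration, the binary names tried are guesses, and the PATH of your shell user may differ from what the web server sees, so confirm the result with IT or hosting.

```python
import shutil


def find_php_cli(candidates=("php", "php-cli")):
    """Return the first PHP binary found on PATH, or None if there is none."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def config_line(php_path):
    """Format the background.php.path line for application/config/config.ini."""
    return f'background.php.path = "{php_path}"'


if __name__ == "__main__":
    php_path = find_php_cli()
    if php_path:
        print(config_line(php_path))
    else:
        print("No PHP binary found on PATH; ask IT or hosting for the path.")
```

For example, if the binary turned out to be at /usr/bin/php (an assumed location, yours may differ), the line in config.ini would read `background.php.path = "/usr/bin/php"`.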
This is a topic that’s discussed in the manual, if you’d like to look there also.
Thanks! This can now be marked resolved.
OAI-PMH is very old, but still efficient for harvesting. According to the OAI-PMH standard, the repository must be served over http, not https. Some libraries are strict about the standard (like the French national library) and don’t offer https for their OAI-PMH server, but most libraries are now available over https.
Thanks, Daniel. That makes sense!