I’m experiencing a really slow CSV import (running version 2.6.1 of the CSV Import module on Omeka S 4.4.1). The slowness seems to affect the “creates”, not the “updates”.
This particular CSV import does not have any links to media or IIIF, but it does have references (by sdo:identifier) to other Omeka resources. One module in use that might have an impact (and that I do not want to disable, for obvious reasons) is Ark (version 3.5.15). Other modules such as Solr Search, History and Statistics are already disabled to help hunt down the bottleneck.
Currently this Omeka S instance has about 1.7M items, each with an ARK identifier.
I have now used xhprof to measure the import action, specifically line 357 of the create function in the Job/Import.php file.
Profiling a single execution of this function gives the following output:
This shows that _dba_fetch_range (which can be found in Ark/vendor/daniel-km/noid4php/lib/Noid.php) and dba_nextkey (the PHP dba extension function used there) take nearly 400 seconds to mint 50 new ARKs for 50 new resources, i.e. close to 8 seconds per ARK!?
Is this expected behaviour? Well, looking at the code of Noid.php in daniel-km/noid4php (version 1.1.2, as specified in the composer.json of the Ark module; 1.2.1 is the current version of noid4php, released in April this year), I see the following comment on the _dba_fetch_range function:
/**
 * Workaround to get an array of all keys matching a simple pattern.
 *
 * @internal The default extension "dba" doesn't allow to get range of keys.
 * This workaround may be slow on big bases and may need a lot of memory.
 * @todo Build a partial temporary base to avoid memory out for big bases.
 *
 * @param string $pattern The pattern of the keys to retrieve (no regex).
 * @param resource $db
 * @return array Ordered associative array of matching keys and values.
 */
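To illustrate why I suspect this scales badly, here is a minimal sketch in Python (not the actual PHP code of noid4php). It assumes the workaround walks every key in the database to find the few that match a pattern, which is what the comment suggests but which I have not verified line by line. The key names, the store contents and both helper functions are hypothetical.

```python
import bisect

# Hypothetical key/value store: many item keys plus a handful of "queue/" keys.
store = {f"item/{i:07d}": str(i) for i in range(100_000)}
store.update({f"queue/{i:03d}": "x" for i in range(5)})


def fetch_range_full_scan(db, prefix):
    """Visit every key, as a firstkey/nextkey style loop would have to."""
    matches = {}
    visited = 0
    for key in db:  # stand-in for iterating with dba_firstkey()/dba_nextkey()
        visited += 1
        if key.startswith(prefix):
            matches[key] = db[key]
    return dict(sorted(matches.items())), visited


def fetch_range_sorted(sorted_keys, db, prefix):
    """Same result, but jump to the prefix with a binary search on sorted keys."""
    matches = {}
    visited = 0
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        visited += 1
        matches[sorted_keys[i]] = db[sorted_keys[i]]
        i += 1
    return matches, visited


sorted_keys = sorted(store)  # built once, then reused for every lookup
full, full_visited = fetch_range_full_scan(store, "queue/")
fast, fast_visited = fetch_range_sorted(sorted_keys, store, "queue/")

print("full scan visited:", full_visited)
print("sorted lookup visited:", fast_visited)
```

In this sketch the full scan touches every key in the store for each lookup, while the sorted lookup only touches the matching keys. If something similar happens for every minted ARK on a base with 1.7M identifiers, that would be consistent with the timings I see.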
@Daniel_KM, can I use noid4php version 1.2.1 as a drop-in replacement for version 1.1.2? Or will this be part of the next release of Omeka-S-module-Ark?
Would disabling the Ark module for the import and re-enabling it afterwards work, or would the “Create ARKs” admin action then take a very long time (just shifting the problem)?
I really want ARKs for all of my items, so I hope a solution can be found to make creating ARKs scalable.