Bulk "Items - Edit all - Add to site" very slow

Hi all,

I’m currently in the middle of a bulk operation that seems to take ages.
There are approx. 1900 items that I want to make part of a certain site. I’m using the Bulk operation (job) that can be started with “Items - Edit all - Add to site”.

The job seems to process items in batches of 100. Currently, it has done 900 items in 67 minutes (approx. 4.5 seconds per item). This means it would take around 2 hours and 11 minutes for the entire batch.
I find this an unusual long period of time for what is - in my opinion - a fairly simple database operation (i.e. add 1900 rows to the item_site table in MySQL).

When I look at CPU usage I see that the mysql process is usually using around 15% of CPU, sometimes spiking to 160% to 220%. (it is a multicore server running Ubuntu, therefore percentages above 100%)

What I find particularly worrisome is that the Omeka S instance (and all other Omeka S instances on this server) hang during the bulk operation and cannot be used for most of the time. I found that it hangs during the 15% phase of the CPU-pattern and is usable during the short 160% to 220% spikes.

Any leads on where I could start troubleshooting to optimize my setup?

Best regards,
Maarten Coonen

Please find the contents of the System Information panel in Omeka S below

Omeka S
Version 	4.0.4

PHP
Version 	8.2.27
SAPI 	fpm-fcgi
Memory Limit 	512M
POST Size Limit 	1024M
File Upload Limit 	1024M
Garbage Collection 	Yes
Extensions 	bcmath, bz2, calendar, cgi-fcgi, Core, ctype, curl, date, dba, dom, exif, FFI, fileinfo, filter, ftp, gettext, hash, iconv, imagick, intl, json, ldap, libxml, mbstring, msgpack, mysqli, mysqlnd, openssl, pcre, PDO, pdo_mysql, Phar, posix, random, readline, Reflection, session, shmop, SimpleXML, soap, sockets, sodium, SPL, standard, sysvmsg, sysvsem, sysvshm, tokenizer, xml, xmlreader, xmlrpc, xmlwriter, xsl, Zend OPcache, zip, zlib

MySQL
Server Version 	8.0.42-0ubuntu0.22.04.1
Client Version 	mysqlnd 8.2.27
Mode 	ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_ENGINE_SUBSTITUTION

OS
Version 	Linux 5.15.0-142-generic x86_64
Modules
active 	Archive Repertory (3.15.16), Ark (3.5.13.5), Block Plus (3.4.21), Blocks Disposition (3.4.2.3-beta), Bulk Edit (3.4.27), Bulk Export (3.4.30), CSV Import (2.6.1), Common (3.4.55), Create Missing Thumbnails (0.3.0), Custom Vocab (2.0.2), EU Cookie Bar (3.4.4), Extract Text (2.0.0), File Sideload (1.7.1), Generic module (3.4.46), Hide Properties (1.3.1), IIIF Search (3.4.7), IIIF Server (3.6.16), Image Server (3.6.17), Log (3.4.22), Mirador Viewer (3.4.7.16), Model viewer (3.3.0.7-132), NDE Termennetwerk (1.2.0), Numeric Data Types (1.11.3), Pdf Viewer (3.4.4), Resource Meta (1.1.0), Sitemaps (1.1), Universal Viewer (3.6.9), Value Suggest (1.17.2)
not_active 	Advanced Search (3.4.21), Advanced Search adapter for Solr (3.5.45), Statistics (3.4.7)
1 Like

While I was typing the post above the running job caused an error related to the extracttext metadata field. See below for the exact error.

Why is this bulk operation touching the metadata and (re-)inserting the values in the database? This seems unnecessary as it only needs to update the item_site table in my opinion.

The error (I’ve truncated the last part, as it spits out the entire full text stored in the extracted-text metadata field):

PDOException: SQLSTATE[08S01]: Communication link failure: 1153 Got a packet bigger than 'max_allowed_packet' bytes in /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php:117
Stack trace:
#0 /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php(117): PDOStatement->execute()
#1 /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Statement.php(179): Doctrine\DBAL\Driver\PDOStatement->execute()
#2 /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Statement.php(226): Doctrine\DBAL\Statement->execute()
#3 /var/www/html/application/src/Stdlib/FulltextSearch.php(48): Doctrine\DBAL\Statement->executeStatement()
#4 /var/www/html/application/Module.php(558): Omeka\Stdlib\FulltextSearch->save()
#5 /var/www/html/vendor/laminas/laminas-eventmanager/src/EventManager.php(319): Omeka\Module->saveFulltext()
#6 /var/www/html/vendor/laminas/laminas-eventmanager/src/EventManager.php(171): Laminas\EventManager\EventManager->triggerListeners()
#7 /var/www/html/application/src/Api/Manager.php(323): Laminas\EventManager\EventManager->triggerEvent()
#8 /var/www/html/application/src/Api/Adapter/AbstractEntityAdapter.php(499): Omeka\Api\Manager->finalize()
#9 /var/www/html/application/src/Api/Manager.php(236): Omeka\Api\Adapter\AbstractEntityAdapter->batchUpdate()
#10 /var/www/html/application/src/Api/Manager.php(146): Omeka\Api\Manager->execute()
#11 /var/www/html/application/src/Job/BatchUpdate.php(35): Omeka\Api\Manager->batchUpdate()
#12 /var/www/html/application/src/Job/DispatchStrategy/Synchronous.php(34): Omeka\Job\BatchUpdate->perform()
#13 /var/www/html/modules/Log/src/Job/Dispatcher.php(80): Omeka\Job\DispatchStrategy\Synchronous->send()
#14 /var/www/html/application/data/scripts/perform-job.php(66): Log\Job\Dispatcher->send()
#15 {main}

Next Doctrine\DBAL\Driver\PDO\Exception: SQLSTATE[08S01]: Communication link failure: 1153 Got a packet bigger than 'max_allowed_packet' bytes in /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDO/Exception.php:18
Stack trace:
#0 /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php(119): Doctrine\DBAL\Driver\PDO\Exception::new()
#1 /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Statement.php(179): Doctrine\DBAL\Driver\PDOStatement->execute()
#2 /var/www/html/vendor/doctrine/dbal/lib/Doctrine/DBAL/Statement.php(226): Doctrine\DBAL\Statement->execute()
#3 /var/www/html/application/src/Stdlib/FulltextSearch.php(48): Doctrine\DBAL\Statement->executeStatement()
#4 /var/www/html/application/Module.php(558): Omeka\Stdlib\FulltextSearch->save()
#5 /var/www/html/vendor/laminas/laminas-eventmanager/src/EventManager.php(319): Omeka\Module->saveFulltext()
#6 /var/www/html/vendor/laminas/laminas-eventmanager/src/EventManager.php(171): Laminas\EventManager\EventManager->triggerListeners()
#7 /var/www/html/application/src/Api/Manager.php(323): Laminas\EventManager\EventManager->triggerEvent()
#8 /var/www/html/application/src/Api/Adapter/AbstractEntityAdapter.php(499): Omeka\Api\Manager->finalize()
#9 /var/www/html/application/src/Api/Manager.php(236): Omeka\Api\Adapter\AbstractEntityAdapter->batchUpdate()
#10 /var/www/html/application/src/Api/Manager.php(146): Omeka\Api\Manager->execute()
#11 /var/www/html/application/src/Job/BatchUpdate.php(35): Omeka\Api\Manager->batchUpdate()
#12 /var/www/html/application/src/Job/DispatchStrategy/Synchronous.php(34): Omeka\Job\BatchUpdate->perform()
#13 /var/www/html/modules/Log/src/Job/Dispatcher.php(80): Omeka\Job\DispatchStrategy\Synchronous->send()
#14 /var/www/html/application/data/scripts/perform-job.php(66): Log\Job\Dispatcher->send()
#15 {main}

Next Doctrine\DBAL\Exception\DriverException: An exception occurred while executing 'INSERT INTO `fulltext_search` (
`id`, `resource`, `owner_id`, `is_public`, `title`, `text`
) VALUES (
:id, :resource, :owner_id, :is_public, :title, :text
) ON DUPLICATE KEY UPDATE
`owner_id` = :owner_id, `is_public` = :is_public, `title` = :title, `text` = :text' with params [2850, "items", 4, true, "Rechtsherstel jrg 7 1952-1953", "Rechtsherstel jrg 7  1952-1953\n\u201eRechtsherstel\u (...truncated...)

While I was typing the post above the running job caused an error related to the extracttext metadata field. See below for the exact error.

The “Got a packet bigger than ‘max_allowed_packet’ bytes” error could be related to the slowdown you’re experiencing. You could increase your database’s maximum allowed packet size and see if the error goes away.

Why is this bulk operation touching the metadata and (re-)inserting the values in the database? This seems unnecessary as it only needs to update the item_site table in my opinion.

During update operations, Omeka refreshes the resource’s fulltext. This is necessary to account for changes to the searchable text that may have occurred during the update. Maybe it’s this step – refreshing the fulltext – that’s slowing down your server. Do your resources have a substantial amount of metadata, from ExtractText or otherwise?

Try two things:

  • Deactivate the ExtractText module and run the bulk operation. Does it run faster?
  • Edit the Omeka’s application/Module.php file. In Module::saveFulltext() add a return to the beginning of the method. Run the bulk operation. Does it run faster?
public function saveFulltext(ZendEvent $event)
{
    return;
    ...
}
2 Likes

Thank you so much Jim! The ExtractText module was indeed causing this slowness.

I’ve temporarily disabled the module and ran a test on 243 resources that have their fulltexts stored in the “extracted text” metadata field.

Now, the “Items - Edit all - Add to site” operation completed in 58 seconds, averaging 0.24 seconds per item.
Previously, this would take 4.5 seconds per item; a huge difference!

Recommended approach for readers who find this topic in the future

  1. Temporarily disable modules “Advanced Search”, " Advanced Search adapter for Solr" and “Extract Text”
  2. Run your bulk operation on many items
  3. Re-enable the modules “Advanced Search”, " Advanced Search adapter for Solr" and “Extract Text”
  4. Go to “Search manager” in the admin panel and perform a reindex on Solr.

Thank you very much @jimsafley. This saves us so much time and frustration :smiley:

1 Like