High MySQL load Omeka API

I hope this is the right location in the forums. I am using the most current version of Omeka, 2.7.1, on a shared hosting solution with a LAMP stack. Omeka installed just fine and is running well! The chief problem is trying to get a few hundred collections, tens of thousands of items, and a few hundred thousand files into Omeka. I had written a Python script to digest our current IIIF content, create collections, items, and files (rescaled for the project), and then push them to the Omeka API.
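For context, the push side of the script is roughly this shape (a sketch, not my exact code; the element id for Title, the endpoint path, and the function names are placeholders -- check `/api/elements` on your own install for the real Dublin Core element ids):

```python
import json
import urllib.request

# Assumed element id for Dublin Core "Title" -- verify against /api/elements.
ELEMENT_ID_TITLE = 50

def build_item_payload(title, collection_id, public=True):
    """Build the JSON body for POST /api/items."""
    return {
        "collection": {"id": collection_id},
        "public": public,
        "element_texts": [
            {"element": {"id": ELEMENT_ID_TITLE}, "text": title, "html": False}
        ],
    }

def post_item(base_url, api_key, payload):
    """POST one item to the Omeka Classic REST API; returns decoded JSON."""
    req = urllib.request.Request(
        f"{base_url}/api/items?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

One of these POSTs happens per item, with separate requests for each attached file.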

This was working splendidly until we got a nastygram about exceeding our server usage allotment by 300% to 2,300%. I can't figure out what I am doing that would cause so much server usage. I have now implemented some caching so that I don't have to query whether an item already exists every time. On my local server that check wasn't a big deal, but to squeeze out every cycle on our paid hosting, I re-thought that approach.
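The caching is nothing fancy; roughly this shape (a sketch using stdlib sqlite3; the table, column, and class names here are just for illustration):

```python
import sqlite3

# Remember which source identifiers have already been pushed, so re-runs
# skip the "does this item exist?" API round-trip entirely.
class SeenCache:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen (identifier TEXT PRIMARY KEY)"
        )

    def mark(self, identifier):
        """Record an identifier as already imported."""
        self.db.execute(
            "INSERT OR IGNORE INTO seen (identifier) VALUES (?)", (identifier,)
        )
        self.db.commit()

    def contains(self, identifier):
        """True if this identifier was imported on a previous run."""
        row = self.db.execute(
            "SELECT 1 FROM seen WHERE identifier = ?", (identifier,)
        ).fetchone()
        return row is not None
```

Persisting the cache to a file (instead of `:memory:`) is what makes restarts cheap: the script checks locally before ever touching the remote API.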

Historically I have gone with the bigger-hammer approach and just pumped more data as fast as I could, but here I can't transfer fast enough to make the hit a one-time thing (a permission-vs-forgiveness thing).

Feedback from the provider says the long-running SQL statements, in order, are:

  • INSERT INTO omeka_element_texts
  • INSERT INTO omeka_files
  • INSERT INTO omeka_keys
  • INSERT INTO omeka_items
  • INSERT INTO omeka_search_texts

The SQL after those accounts for only a minor portion of the processing time.

Any thoughts on how I can optimize these? Or should I scrap the idea of the API and go directly to the database? I can't tell whether SQL transactions are being used, or to what extent, so it is possible the keys are being rebuilt on every single insert instead of once per batch. I have had that problem with other systems.
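To illustrate what I mean by grouping inserts (a sketch using stdlib sqlite3 as a stand-in for MySQL; the table here is a made-up miniature of `omeka_element_texts`, and the same pattern applies with a MySQL driver):

```python
import sqlite3

def insert_batch(db, rows):
    """Insert many rows inside one transaction instead of committing per row.

    Per-row commits force a flush (and index/key maintenance) on every
    insert; one commit per batch amortizes that cost across the batch.
    """
    with db:  # connection-as-context-manager = one transaction
        db.executemany(
            "INSERT INTO element_texts (record_id, text) VALUES (?, ?)", rows
        )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE element_texts (record_id INTEGER, text TEXT)")
insert_batch(db, [(1, "Title A"), (1, "Subject B"), (2, "Title C")])
```

If the API commits after each individual INSERT, that would be consistent with what the provider is seeing.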

I am stumped here. As it is shared hosting, I am having to go back and forth with their techs to get server stats, so it has been slow going to get this far. Thanks for any assistance you can offer.

Classic doesn’t use transactions, so that could be an issue you’re seeing.

I’m a little surprised that even at that scale the load on the SQL server would be a significant factor… typically the bulk of the time spent importing an item, no matter from what interface, is on generating thumbnails for the files, which doesn’t touch the database. I wonder how restrictive these server limits are…

If the issue is just INSERT time maybe you could/should just run the imports locally and move the install as a whole over after the fact?

Thank you for the suggestion. I had been considering duplicating the remote instance locally on a VM so that I could do what you suggest. Then just transfer it in one lump transfer to the provider again. I think I will gain some significant efficiencies in data transfer that way also.

As for server load at one point the hosting service had seen queries taking on the order of 2 seconds. A lot of 2 second queries add up fast.

I appreciate your time. Thanks!

I thought perhaps I should close this out. I did go with the solution suggested above.

  1. Did a database dump of the production database and backed up Server files
  2. Imported database locally and restored Server files locally
  3. Ran script to scrape my IIIF data source (the best option we had for source at the time)
  4. INSERT MANY-DAY PROCESS HERE (the slow IIIF source was likely rate-limited by the provider)
  5. Backed up images from the local server
  6. Transferred images to the production environment
  7. Dumped local database
  8. Restored only the tables containing the new item and file data, using transactions.
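For steps 7 and 8, the dump side was roughly this (a sketch showing only command construction; the table list assumes Omeka Classic's default `omeka_` prefix, so adjust for yours, and the resulting file is loaded on production with the standard `mysql` client):

```python
# Tables that gained new data during the local import run.
TABLES = [
    "omeka_items",
    "omeka_files",
    "omeka_element_texts",
    "omeka_search_texts",
]

def dump_command(database, out_file, tables=TABLES):
    """Build a mysqldump command restricted to the listed tables.

    --single-transaction takes a consistent snapshot without locking
    the tables for the duration of the dump.
    """
    return [
        "mysqldump",
        "--single-transaction",
        f"--result-file={out_file}",
        database,
        *tables,
    ]
```

Running the returned command (e.g. via `subprocess.run`) produces a file containing only the changed tables, which keeps the transfer and restore small.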

I doubt this will be of much assistance to others, but in the event it helps anyone, that’s what I did.

Thanks for pointing me in this direction.