Maintaining a Healthy Omeka S Database

Dear Forum,

As we continue adding collections and sites to our institution's Omeka S installation, we wanted to ask whether we need to do anything to ensure that our MySQL database stays healthy.

In part, we ask because we noticed 10 GB of growth in the last two weeks without any major imports or site additions. We have over 50,000 items, roughly the same number of media, 60 sites, and two search engines running. We also have a number of vocabularies on top of the defaults, including schema and darwin:core, plus a couple of custom ones. The total size of the Omeka database is around 50 GB at the moment.

The IT team that helps us run the platform wanted to know, from a long-term MySQL maintenance angle, whether there are recommended cleanup actions on the Omeka S side, or scripts that should be run periodically on the database.

We would also be curious to hear what the community's practices are, and how big other installations' databases get.

Thanks for your time,

Sanjin


The database seems very big, so you probably have a lot of OCR transcriptions and full text.

So to understand the issue, check the database for its biggest tables. Generally, the big ones are:

- fulltext_search: can be disabled if you use an external search engine like Solr (there is a PR for this point on GitHub too);
- log: can be truncated or cleaned;
- session: can be truncated too; it grows if you have many users and visitors, and the module EasyAdmin allows you to clear it manually;
- value: must not be truncated, as these rows are your records;
- reference: check its config to avoid reindexing big data; the next version will be a lot lighter.

Maybe some other tables as well.
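As a starting point, a query along these lines will list the biggest tables (a minimal sketch, assuming the Omeka S database is named omeka; adjust the schema name to your install):

SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'omeka'
ORDER BY (data_length + index_length) DESC
LIMIT 10;

And for the tables that are safe to empty, after taking a backup:

-- Do NOT do this for value: those rows are your records.
TRUNCATE TABLE log;
TRUNCATE TABLE session;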

Dear @Daniel_KM

Thanks so much for that detailed breakdown of the things we could do and for the quick assessment of our database. We do have two follow-up questions, if you or anybody in the community has some time.

I am working with the IT team to get an overview of the sizes of our tables in MySQL; we just need to set up an interface for that.

What we did do was install the EasyAdmin module to try to manage the log and session tables on our DEV server (running Omeka S 3.2.3), which has our Omeka S database from about a year ago (3.7 GB at that time).

+----------+--------------------+------------+
| Database | Table              | Size in MB |
+----------+--------------------+------------+
| IBALI    | session            |    1582.00 |
| IBALI    | reference_metadata |     742.06 |
| IBALI    | value              |     673.89 |
| IBALI    | fulltext_search    |     524.59 |
| IBALI    | log                |     140.70 |
+----------+--------------------+------------+

Question 1:
We saw that our session table was about 1.5 GB, and we used the options provided by that module to clear the records older than one day. We could see that the records were deleted, but the size of the table is still 1.5 GB. This might be a very beginner MySQL question, but are we missing another step to reclaim that space? Do we need to do something like this: https://garrickvanburen.com/how-to-reclaim-disk-space-after-deleting-lots-of-mysql-records/

Question 2:
While you are reading this, I wanted to pick up on your point that reference is one table that could get big. I see here that reference_metadata is quite large, and I am wondering if that is the one you meant. We use the Reference module a lot to generate index pages on different sites, and we have some of its options disabled in our site settings.

Any particular advice on this table or on managing the Reference module would be helpful; otherwise, we will wait for the next update when we move our system to Omeka S 4.

Thanks for your time,
Sanjin

For question 1, I'm not a MySQL/MariaDB expert, but the solution you link to is the simplest one (back up and reimport the database), so the internal index files will be reset too.
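For reference, an in-place alternative that usually reclaims the space without a full dump and reimport (a sketch, assuming InnoDB tables with innodb_file_per_table enabled, the default in recent MySQL/MariaDB; back up first):

-- Rebuilds the table and returns freed pages to the operating system.
-- On InnoDB, MySQL runs this as a table recreate (ALTER TABLE ... ENGINE=InnoDB) internally.
OPTIMIZE TABLE session;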
For question 2, yes: the current version of the Reference module copies all the values to make requests quicker, but the next version will proceed differently, via indexes and maybe views. It will be published in April.

Thanks so much @Daniel_KM, appreciate you taking the time to respond.