Sudden and massive increase of the database, session table

Hi,

On an Omeka S installation (4.0.1), I noticed a sudden and massive increase in the size of the MySQL database. The database was ~150 MB and in one day jumped to 3.2 GB, with no activity on the admin side.

The ‘session’ and ‘log’ tables are the ones affected.

I understand the “log” table, which contains errors I can fix.

I am more confused about the “session” table. It contains 1948708 rows of this kind.

The ‘BLOB’ data is always a session-data.bin file with contents such as:

__Laminas|a:2:{s:20:"_REQUEST_ACCESS_TIME";d:1691479286.297107;s:6:"_VALID";a:1:{s:28:"Laminas\Session\Validator\Id";s:40:"sib5mt6q6i58cq07c7evvp6aijlh07lg70vjfj5p";}}

or

__Laminas|a:2:{s:20:"_REQUEST_ACCESS_TIME";d:1701257204.72632;s:6:"_VALID";a:1:{s:28:"Laminas\Session\Validator\Id";s:26:"5c28sgjih6jacrnd8lu7c4glfr";}}EasyAdmin|O:26:"Laminas\Stdlib\ArrayObject":4:{s:7:"storage";a:2:{s:14:"lastBrowsePage";a:1:{s:5:"admin";a:1:{s:5:"items";s:342:"/s/correspondance/rech_correspondance?q=&facet%5Bahpo:sentBy%5D%5B0%5D=Bertrand,%20Marcel&facet%5Bahpo:sentBy%5D%5B1%5D=Mittag-Leffler,%20G%C3%B6sta%20(1846-1927)&facet%5Bahpo:sentTo%5D%5B0%5D=Faidherbe,%20Louis&facet%5Bahpo:sentTo%5D%5B1%5D=Hertz,%20Heinrich%20Rudolph%20(1857-1894)&facet%5Bahpo:sentTo%5D%5B2%5D=Comon,%20Louis%20(1854-1918)";}}s:9:"lastQuery";a:1:{s:5:"admin";a:1:{s:5:"items";a:2:{s:1:"q";s:0:"";s:5:"facet";a:2:{s:11:"ahpo:sentBy";a:2:{i:0;s:16:"Bertrand, Marcel";i:1;s:34:"Mittag-Leffler, Gösta (1846-1927)";}s:11:"ahpo:sentTo";a:3:{i:0;s:16:"Faidherbe, Louis";i:1;s:35:"Hertz, Heinrich Rudolph (1857-1894)";i:2;s:24:"Comon, Louis (1854-1918)";}}}}}}s:4:"flag";i:2;s:13:"iteratorClass";s:13:"ArrayIterator";s:19:"protectedProperties";a:4:{i:0;s:7:"storage";i:1;s:4:"flag";i:2;s:13:"iteratorClass";i:3;s:19:"protectedProperties";}}
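For anyone trying to read blobs like these: the `_REQUEST_ACCESS_TIME` value looks like a Unix timestamp (seconds since 1970, with a fractional part), so it can be used to see when each session was last touched. Below is a minimal Python sketch under that assumption; the helper name `access_time` is made up for illustration, and the sample is the first blob quoted above.

```python
import re
from datetime import datetime, timezone

def access_time(blob: str):
    """Return the _REQUEST_ACCESS_TIME of a session blob as a UTC datetime, or None."""
    # PHP serializes floats as d:<number>; so we look for the value after the key.
    match = re.search(r's:20:"_REQUEST_ACCESS_TIME";d:([0-9.]+);', blob)
    if match is None:
        return None
    return datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)

# First blob from the post, as a raw string so the backslashes stay literal.
sample = (
    r'__Laminas|a:2:{s:20:"_REQUEST_ACCESS_TIME";d:1691479286.297107;'
    r's:6:"_VALID";a:1:{s:28:"Laminas\Session\Validator\Id";'
    r's:40:"sib5mt6q6i58cq07c7evvp6aijlh07lg70vjfj5p";}}'
)

print(access_time(sample).isoformat())
```

Running this over an export of the table would show whether the rows were all created within a short window, which would point to a burst of automated traffic.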

I wonder what exactly this table is, and how it could grow to almost 2 GB.

It is probably a robot accessing your site.

By the way, this is a common issue, and you can truncate the table without any problem (except that it logs out users).

I see you use the Easy Admin module, so you can remove these logs and sessions with the matching tasks in the “database” section.

Thanks Daniel.

I purged the session and log tables.

The “session” one is still filling up. I guess it is probably a bot scraping the website (could it be someone using the Osii module from another Omeka S?).

If it is a good idea (?), is there a way to disable or prevent writes to the session table, to avoid any saturation?

The session table is needed for any user sessions; it’s what underlies logins, etc.

When it grows very very large as you’re seeing, that’s often a sign that the server has session garbage collection disabled (i.e., the php.ini setting session.gc_probability set to 0). Some servers or distributions do this with the intention of cleaning out sessions using a cron job instead, but those cron jobs only work against PHP’s default file-based sessions.
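A quick way to check this on a given server is to read the `session.gc_*` directives from php.ini. PHP starts garbage collection on a session start with probability `session.gc_probability / session.gc_divisor`, so a probability of 0 means it never runs. The following Python sketch is only an illustration: the function names are hypothetical, the fallback values are PHP's built-in defaults (1 and 100), and it reads the file text only, so it cannot see values overridden at runtime.

```python
import re

def session_gc_settings(ini_text: str) -> dict:
    """Collect session.gc_* directives from php.ini text (last assignment wins)."""
    settings = {}
    for line in ini_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ';' comments
        match = re.match(r"(session\.gc_\w+)\s*=\s*(\d+)$", line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings

def gc_chance_per_request(settings: dict) -> float:
    """Chance that one session start triggers garbage collection."""
    probability = settings.get("session.gc_probability", 1)  # PHP default
    divisor = settings.get("session.gc_divisor", 100)        # PHP default
    return probability / divisor

# Illustrative php.ini fragment with GC disabled, as described above.
gc_disabled = """
session.gc_probability = 0   ; cleanup left to a cron job
session.gc_divisor = 1000
session.gc_maxlifetime = 1440
"""

chance = gc_chance_per_request(session_gc_settings(gc_disabled))
print("GC disabled" if chance == 0 else f"GC runs on about {chance:.2%} of session starts")
```

On a real server the text would come from the php.ini used by the web server (for example, the Apache one rather than the CLI one), since the two can differ.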

Thanks John. Good to know!

I did indeed have this option set to 0 on my server (running Debian). I changed it.

Sorry to reopen this issue.

I still have this problem from time to time. When a bot scrapes this Omeka S installation, the session table grows very fast until there is no space left. We have several Omeka S installations on the same infrastructure and only this website is affected (I do not know if it is because this one is more popular with bots or for some other reason).

I can manually purge the session table, but that is not an automatic solution.

This Omeka S installation is spread over two virtual machines (both Debian). The Omeka S files are on VM1 and the MySQL database is on VM2. Both VMs have session.gc_probability set to 1 in /etc/php/8.2/apache2/php.ini.

What can I do?

Just blocking the bots may be one answer. Any visitor gets a session generated. One issue with bots is that they generally don’t send the cookie back, so they get a new session started, and so on.
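The difference between a browser and a cookie-less bot can be illustrated with a toy model. This is not Omeka S or Laminas code: `SessionStore` is a hypothetical stand-in for the session table, and the request counts are arbitrary.

```python
import uuid

class SessionStore:
    """Toy stand-in for the session table: one row per session id."""

    def __init__(self):
        self.rows = {}

    def handle_request(self, cookie=None):
        """Return the session id the server would send back to the client."""
        if cookie in self.rows:
            return cookie  # known visitor: the existing row is reused
        session_id = uuid.uuid4().hex  # no (valid) cookie: a new row is created
        self.rows[session_id] = {}
        return session_id

store = SessionStore()

# A browser sends the cookie back, so 1000 requests share one session.
cookie = None
for _ in range(1000):
    cookie = store.handle_request(cookie)
browser_rows = len(store.rows)

# A bot that never returns the cookie creates a new session on every request.
for _ in range(1000):
    store.handle_request(None)
bot_rows = len(store.rows) - browser_rows

print(browser_rows, bot_rows)  # prints "1 1000"
```

In this model the number of rows tracks the number of requests from the bot, not the number of distinct visitors, which matches the growth pattern described in the thread.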

The enabled gc setting should be making the table get cleaned up from time to time (and probably pretty often if you have lots and lots of traffic) but if it’s filling up so fast as to be problematic before the lifetimes of the earliest sessions have expired, it won’t help (as nothing will be old enough to get garbage collected yet).
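A rough back-of-the-envelope calculation shows why. Until the oldest sessions pass their lifetime, nothing can be collected, so the table holds roughly (new sessions per second) × (session lifetime) rows. In the Python sketch below, the rate of 50 new sessions per second is invented for illustration and 1440 seconds is PHP's default `session.gc_maxlifetime`; the lifetime actually in effect on a given Omeka S site may be different and much longer.

```python
def rows_before_first_expiry(new_sessions_per_second: float, lifetime_seconds: int) -> int:
    """Rows that pile up before the oldest session is old enough to be collected."""
    return int(new_sessions_per_second * lifetime_seconds)

def hours_to_reach(rows: int, new_sessions_per_second: float) -> float:
    """Hours a steady stream of new sessions needs to create the given number of rows."""
    return rows / new_sessions_per_second / 3600

rate = 50        # hypothetical: new sessions per second created by a bot
lifetime = 1440  # seconds; PHP's default session.gc_maxlifetime

print(rows_before_first_expiry(rate, lifetime))   # 72000
print(round(hours_to_reach(1948708, rate), 1))    # hours to reach the row count from the first post
```

The longer the effective session lifetime, the larger the table gets before garbage collection can remove anything, so reducing the bot traffic is what limits the growth.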