Big sessions table

On my servers, the “session” table (in both Omeka Classic and Omeka S) keeps growing and is now several gigabytes, far larger than all of the other data combined. Is it safe to truncate it regularly?

Yes, it’s safe to truncate the session table. Obviously, if you truncate it entirely, you’ll lose any active logins; you could instead delete only the rows older than a certain date to avoid that being an issue.

When and whether old entries get cleared out in the normal course of operation is controlled by PHP settings for session garbage collection (by default there’s some randomness involved). On some distributions (Ubuntu at least comes to mind) that garbage collection is disabled by default in favor of a cron job that just regularly clears out sessions, but that cron job only handles file-based sessions in the default location, and so doesn’t do anything for database-stored sessions like Omeka’s.
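If you go the date-based route, a delete along these lines is the general idea. The names here are assumptions to check against your own schema: a default Omeka Classic install uses an omeka_sessions table (Omeka S calls its table session), and I’m assuming the modified column holds a Unix timestamp.

    -- Keep only rows touched within the last 7 days; adjust the table
    -- name for your install and confirm `modified` is a Unix timestamp.
    DELETE FROM omeka_sessions
    WHERE modified < UNIX_TIMESTAMP(NOW() - INTERVAL 7 DAY);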

OK, thanks. I added such a task to the Bulk Check module.


I am also having this problem in Omeka Classic. According to Google Analytics, my site gets only 200-300 page views per day; however, my sessions table is growing by 5,000 or more rows every day. It had actually hit 2.9 million rows when I discovered the problem, found this post, and truncated the table.

I will set up a cron job to clean up the table since garbage collection is clearly not working, but I’m puzzled as to how such a low-traffic site can possibly trigger creation of so many sessions. I don’t know if the Analytics numbers are meaningful. I do think the site is heavily crawled based on how many organic search hits we get (a good thing).

Can anyone shed light on this?

The “extra” sessions are almost certainly being created for bots and crawlers, yes.

As for garbage collection not working… it could be a configuration issue on your server. PHP’s standard behavior is to use a randomized system to decide whether to run garbage collection “every once in a while” (with the chance that GC will occur on any given request configurable in php.ini).

Some distributions (Ubuntu comes to mind) turn off the PHP garbage collection for sessions completely and do it themselves with a “cron” job by default, but this only accounts for PHP’s default file-based session storage, not the database-backed setup Omeka uses.
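For reference, these are the php.ini directives involved. The values below are PHP’s stock defaults, not a recommendation; Debian/Ubuntu packages ship session.gc_probability = 0 instead, which is what disables the built-in GC.

    ; Roughly a 1-in-100 chance that any request triggers session GC
    session.gc_probability = 1
    session.gc_divisor = 100
    ; Sessions idle longer than this many seconds become eligible for GC
    session.gc_maxlifetime = 1440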

I did check with my host (Reclaim; it’s a dedicated server that I manage for 16 Omeka sites), and they were not aware of any configuration issue that would affect GC.

Today I noticed this message showing up on several of my sites in the root level error_log file:

[02-Jul-2020 19:44:50 UTC] PHP Warning: session_destroy(): Session object destruction failed in /home/coaedu/public_html/application/libraries/Zend/Session.php on line 772

Presumably session_destroy is called by GC?

I’ve created a cron job to do the cleanup, but this is disconcerting since it might be an indication of some other problem.

Thanks in advance for any additional feedback.

You could look at what phpinfo returns for your session settings: GC frequency is controlled by session.gc_probability and session.gc_divisor.
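If it’s easier than loading a phpinfo page, you can also check from a shell, with the caveat that command-line PHP can read a different php.ini than the web server does, so the values shown in phpinfo from the browser are the ones that count:

    php -i | grep -i 'session.gc'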

They are both set to the default of zero. I have opted not to experiment with those just yet, because I’m trying to assess the problem as-is without introducing any new variables. Or do those default settings prevent GC altogether?

Also, there’s the fact that session_destroy() is failing sometimes.

Ultimately the real problem is that this particular site, a small university, is getting pounded by crawlers. It had over 5 million records in the sessions table, and when I deleted all but the ones from the last 7 days, there were still 68,500!

Do you know why session records are needed for visitors who do not log in, especially since cookies are also used? Is it for things like remembering which page a visitor is on when paging through lengthy search results?

Do you know if there’s a way to tell from the session record whether it’s for an Omeka user vs. a visitor? If I could tell, then I could delete non-user sessions every night.

Any info you can provide is appreciated. Thanks.

Zero is not the default for those settings. If that’s what they’re set to, the GC situation on your server is probably the setup I described: PHP’s built-in system is disabled, and a cron job or something else is cleaning up file-based sessions, which won’t clear out Omeka’s database-stored session data.

The PHP session system uses a cookie that just stores the session ID for the user; the actual data is stored on the server. As for why sessions are created for users who don’t log in, that’s just the way the authentication system we get from Zend works: we have to turn on the session to see whether the user is logged in, and for users who aren’t, that sets a cookie and creates and saves a stored session. The Omeka core really doesn’t use sessions for anything for anonymous users, though plugins are free to do so (sessions work just fine for anonymous users… I don’t know off the top of my head if any plugins really do this). The crawlers are probably just throwing away the cookies, so they’re getting a new session on every request.

I think it would be possible to alter the way the authentication and/or session systems work to not create rows for “empty” sessions with no information associated… but there could/would be side effects, like actual anonymous users possibly getting a new session cookie on every request, so it’s not necessarily a trivial change to make.
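As for telling user sessions apart from anonymous ones on the database side: one possible heuristic (my assumption, not anything Omeka documents or guarantees) is that a logged-in Classic session’s serialized data should contain the Zend_Auth storage namespace. Verify it against a known logged-in row before trusting it for cleanup; something like this would then prune everything else:

    -- Hypothetical heuristic: treat rows whose data mentions 'Zend_Auth'
    -- as logged-in sessions and keep them, plus anything recent.
    DELETE FROM omeka_sessions
    WHERE data NOT LIKE '%Zend_Auth%'
      AND modified < UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY);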

Thank you for your detailed and informative reply. I mistakenly assumed that zero was the default because neither session.gc_probability nor session.gc_divisor is set in the copy of application.ini that ships with Omeka 2.7.1. I’ll pass this information along to Reclaim to see what I can learn from them.

“The crawlers are probably just throwing away the cookies, so they’re getting a new session on every request.”

That would make sense and explain the crazy high number of session records. If a crawler hit just one page of search results and then chased down each of those links and the subsequent links on the item pages, that would be hundreds or thousands of requests.

For now my own nightly cron job is keeping things under control, but I’ll be giving some thought to how to improve the situation and let you know if I come up with anything.
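For anyone who lands here later, the nightly job is nothing fancy: a crontab entry wrapping a delete like the one earlier in the thread. The credentials file path, database name, and 7-day window below are placeholders for illustration.

    # Hypothetical crontab entry: run nightly at 03:30, keep the last
    # 7 days of sessions (adjust names and paths for your install).
    30 3 * * * mysql --defaults-extra-file=/path/to/.my.cnf omeka_db -e "DELETE FROM omeka_sessions WHERE modified < UNIX_TIMESTAMP(NOW() - INTERVAL 7 DAY)"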
