Inquiry about Bot Traffic Control in Omeka Classic

Hi there,

I’m looking for information on how Omeka Classic handles bot traffic control. Our site is currently getting hit hard by aggressive AI bots, and robots.txt isn’t helping.

The current solution of manually blocking subnets is time-consuming. Are there any application-level features within Omeka Classic designed to mitigate this kind of modern bot traffic?

I feel your pain. It’s been a huge problem for me. The solution I used was to funnel all traffic through Cloudflare and to block nearly every country except the US, Canada, and European nations. To my knowledge there is nothing in Omeka that will help.

Please see @jflatnes’s response to the same inquiry on GitHub: “Inquiry about Bot Traffic Control in Omeka Classic”, Issue #1092 in omeka/Omeka.

I’m experiencing the same problem: I’m monitoring the Sessions table (through the plugin Admin Tools), and it’s getting really inflated by bots visiting our site.

We run other, similar applications that offer a challenge page when it is enabled. Maybe the Omeka developers could consider it too.

It seems like an interesting idea.

The trap code should run at the very beginning, when Omeka is generating the new session. Any idea how to do that? Or should it be added to the core somehow?

I don’t think we’re likely to put something like this in the core, if for no other reason than that the necessary or useful techniques shift over time, which probably makes the core not the ideal place for them.

If anyone is thinking about building something like this as a plugin, I’d probably look at doing it with a controller plugin: you can have code that runs very early in the process of going to any URL.
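
For anyone exploring that route, here is a rough sketch of the idea: run a cheap check before any page work, and only render if it passes. It is written in Python purely for illustration (a real Omeka plugin would be PHP), and every name in it (`BLOCKED_AGENT_TOKENS`, `should_block`, `handle_request`) is hypothetical rather than part of Omeka. The crawler tokens are examples only, and rejecting requests with no User-Agent is an assumption you may not want.

```python
import re

# Illustrative crawler tokens only (an assumption); maintain your own list.
BLOCKED_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

# One case-insensitive pattern that matches any of the tokens above.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_AGENT_TOKENS),
    re.IGNORECASE,
)


def should_block(user_agent):
    """Return True if the request should be rejected before any page work."""
    if not user_agent:
        # Assumption: a missing User-Agent header is treated as suspicious.
        return True
    return _BLOCKED_RE.search(user_agent) is not None


def handle_request(user_agent, render_page):
    """Run the cheap check first; call the expensive renderer only if it passes."""
    if should_block(user_agent):
        # Rejected early: render_page (and its database queries) never runs.
        return 403, "Forbidden"
    return 200, render_page()
```

The point of the ordering is that a blocked request never reaches `render_page`, which stands in for the queries used to build the page. User-Agent matching only catches bots that identify themselves, so treat it as one layer rather than a full defense.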

Also I’ll note here that Classic 3.2 should eliminate constant growth of the sessions table for anyone whose servers have/had that problem before: bots can still create lots of empty sessions but they should be culled periodically.

But wouldn’t it be too late, John, by the time the plugin kicks in? I would assume the session had already been created by then…

The session will probably be started by then, yeah. But the sessions table entries are more of a minor annoyance, really: for most, stopping these requests from getting to the actual queries used to generate the page is what matters.