I’m looking for information on how Omeka Classic handles bot traffic control. Our site is currently getting hit hard by aggressive AI bots, and robots.txt isn’t helping.
The current solution of manually blocking subnets is time-consuming. Are there any application-level features within Omeka Classic designed to mitigate this kind of modern bot traffic?
I feel your pain. It’s been a huge problem for me. The solution I used was to funnel all traffic through Cloudflare and to block nearly every country except the US, Canada, and European nations. To my knowledge there is nothing in Omeka itself that will help.
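If you’d rather not put everything behind Cloudflare, another common approach is to block by user agent at the web-server layer, since robots.txt is purely advisory and aggressive bots ignore it. A rough Apache sketch (the user-agent strings below are just examples — grep your own access logs for the actual offenders):

```apache
# .htaccess sketch: return 403 to selected crawlers by user agent.
# The patterns here are illustrative, not a complete or current list.
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Bytespider|ClaudeBot) [NC]
    RewriteRule .* - [F,L]
</IfModule>
```

This stops the requests before they ever reach PHP, so Omeka never starts a session or runs a query for them. The obvious caveat is that many bots spoof generic browser user agents, so this only catches the ones that identify themselves.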
I’m experiencing the same problem: I’m monitoring the Sessions table (through the Admin Tools plugin), and it’s getting really inflated by bots visiting our site.
The trap code would need to run at the very beginning, when Omeka generates the new session. Any idea how to do that? Or should it be added to the core somehow?
I don’t think we’re likely to put something like this in the core, if for no other reason than that the necessary/useful techniques shift over time in a way that probably makes the core not the ideal place for them.
If anyone is thinking about building something like this as a plugin, I’d probably look at doing it with a controller plugin: that lets you run code very early in the process of handling any URL.
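Since Omeka Classic sits on Zend Framework 1, a controller plugin here would be a `Zend_Controller_Plugin_Abstract` subclass whose `routeStartup()` fires before routing. A minimal sketch, with the caveat that the class name, the user-agent list, and the registration snippet are all hypothetical illustrations, not anything shipped with Omeka:

```php
<?php
// Hypothetical sketch: a ZF1-style controller plugin that rejects requests
// from selected user agents before Omeka does any real work. All names
// here (BotBlock_Plugin, the agent list) are made up for illustration.
class BotBlock_Plugin extends Zend_Controller_Plugin_Abstract
{
    // Substrings to reject; tune this to the bots in your own logs.
    private $badAgents = array('GPTBot', 'CCBot', 'Bytespider');

    // routeStartup() runs before routing, i.e. very early in dispatch.
    public function routeStartup(Zend_Controller_Request_Abstract $request)
    {
        $ua = isset($_SERVER['HTTP_USER_AGENT'])
            ? $_SERVER['HTTP_USER_AGENT'] : '';
        foreach ($this->badAgents as $agent) {
            if (stripos($ua, $agent) !== false) {
                header('HTTP/1.1 403 Forbidden');
                exit; // bail out before page queries run
            }
        }
    }
}

// A plugin would then register this with the front controller, e.g. from
// an initialization hook, so it runs on every request:
// Zend_Controller_Front::getInstance()->registerPlugin(new BotBlock_Plugin());
```

Exiting in `routeStartup()` keeps the request from reaching any controller or view logic, which is the cheap place to reject it; whether the session has already been opened by that point depends on how early session startup happens in the bootstrap.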
Also, I’ll note here that Classic 3.2 should eliminate the constant growth of the sessions table for anyone whose servers have or had that problem: bots can still create lots of empty sessions, but they should be culled periodically.
The session will probably be started by then, yeah. But the sessions table entries are really a minor annoyance: for most people, what matters is stopping these requests before they reach the actual queries used to generate the page.