Inquiry about Bot Traffic Control in Omeka Classic

Hi there,

I’m looking for information on how Omeka Classic handles bot traffic control. Our site is currently getting hit hard by aggressive AI bots, and robots.txt isn’t helping.

Our current solution, manually blocking subnets, is time-consuming. Are there any application-level features within Omeka Classic designed to mitigate this kind of modern bot traffic?
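
Before blocking by hand, it can help to see which subnets a crawler is actually coming from. Below is a minimal Python sketch, assuming you have already pulled client IPs out of your access log into a list. `busy_subnets` is a hypothetical helper (not part of Omeka), and the /24 prefix and the threshold of three distinct addresses are arbitrary choices; IPv6 entries are simply skipped in this IPv4-only sketch.

```python
import ipaddress
from collections import Counter


def busy_subnets(ips, prefix=24, threshold=3):
    """Return collapsed IPv4 networks holding >= threshold distinct offending IPs."""
    counts = Counter()
    for raw in set(ips):  # count each distinct address once
        try:
            # strict=False turns a host address into its enclosing /prefix network.
            net = ipaddress.IPv4Network(f"{raw}/{prefix}", strict=False)
        except ValueError:
            continue  # skip IPv6 and malformed entries in this IPv4-only sketch
        counts[net] += 1
    busy = [net for net, n in counts.items() if n >= threshold]
    # Merge adjacent or overlapping networks into the smallest covering set.
    return list(ipaddress.collapse_addresses(busy))


if __name__ == "__main__":
    sample = [
        "203.0.113.5", "203.0.113.9", "203.0.113.77",
        "203.0.112.4", "203.0.112.8", "203.0.112.200",
        "198.51.100.1", "2001:db8::1",
    ]
    for net in busy_subnets(sample):
        print(net)  # 203.0.112.0/23
```

The two adjacent /24 networks collapse into a single /23, which keeps the resulting block list short.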

I feel your pain. It’s been a huge problem for me. The solution I used was to funnel all traffic through Cloudflare and to block nearly every country except the US, Canada, and European countries. To my knowledge, there is nothing in Omeka that will help.

Please see @jflatnes’s response to the same inquiry on GitHub: Inquiry about Bot Traffic Control in Omeka Classic · Issue #1092 · omeka/Omeka · GitHub

I’m experiencing the same problem: I’m monitoring the sessions table (through the Admin Tools plugin), and bots visiting our site are inflating it considerably.

We have other, similar applications that offer a challenge page when it is enabled. Maybe the Omeka developers could consider this too.

It seems like an interesting idea.

The trap code should run at the very beginning, when Omeka generates the new session. Any idea how to do that? Or should it be added to the core somehow?

I don’t think we’re likely to put something like this in the core, if for no other reason than that the kinds of necessary or useful techniques shift over time, which probably makes the core not the ideal place for them.

If anyone is thinking about building something like this as a plugin, I’d probably look at doing it with a controller plugin: that lets you run code very early in the handling of a request for any URL.
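
A real Omeka Classic controller plugin would be written in PHP. To show the idea in a language-neutral way, here is a minimal Python WSGI middleware sketch of a JavaScript cookie challenge that runs before the wrapped application handles any URL: clients without a valid signed cookie get a tiny page whose script sets the cookie and reloads, so clients that never execute JavaScript never reach the page-generating queries. The secret, cookie name, and lifetime are hypothetical, and this is not the actual logic of AtoM or of the BotChallenge module.

```python
import hashlib
import hmac
import time
from http.cookies import CookieError, SimpleCookie

SECRET = b"change-me"          # hypothetical server-side secret
COOKIE_NAME = "bot_challenge"  # hypothetical cookie name
MAX_AGE = 86400                # hypothetical token lifetime, in seconds


def _sign(ts):
    return hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()


def make_token(now=None):
    """Return a 'timestamp.signature' token signed with SECRET."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}.{_sign(ts)}"


def token_is_valid(token, now=None):
    """Accept only tokens with a correct signature that have not expired."""
    try:
        ts, sig = token.split(".", 1)
        issued = int(ts)
    except ValueError:
        return False
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued <= MAX_AGE


CHALLENGE_PAGE = """<!doctype html>
<html><body><p>Checking your browser...</p>
<script>
document.cookie = "{name}={token}; path=/; max-age={max_age}; SameSite=Lax";
location.reload();
</script></body></html>"""


class ChallengeMiddleware:
    """Runs before the wrapped WSGI app for every request, whatever the URL."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        except CookieError:
            cookies = SimpleCookie()  # treat a malformed header as no cookie
        morsel = cookies.get(COOKIE_NAME)
        if morsel is not None and token_is_valid(morsel.value):
            return self.app(environ, start_response)  # challenge passed
        body = CHALLENGE_PAGE.format(
            name=COOKIE_NAME, token=make_token(), max_age=MAX_AGE
        ).encode()
        start_response("403 Forbidden", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
            ("Cache-Control", "no-store"),
        ])
        return [body]
```

You would wrap an application with `application = ChallengeMiddleware(application)`. A bot that does execute JavaScript will still pass, so a check like this only filters out the simplest crawlers.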

Also, I’ll note that Classic 3.2 should eliminate the constant growth of the sessions table for anyone whose servers have (or had) that problem: bots can still create lots of empty sessions, but they should be culled periodically.
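
To illustrate what "culled periodically" means, here is a minimal Python sketch of expiry-based cleanup. It uses an in-memory SQLite table whose columns (`id`, `modified`, `lifetime`, `data`) are an assumption modelled on a typical database session handler, not a verified copy of Omeka’s schema or of what 3.2 actually does; `cull_expired_sessions` is a hypothetical helper.

```python
import sqlite3
import time


def cull_expired_sessions(conn, now=None):
    """Delete rows whose modified + lifetime lies in the past; return the count."""
    current = int(time.time() if now is None else now)
    cur = conn.execute(
        "DELETE FROM sessions WHERE modified + lifetime < ?", (current,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (id TEXT PRIMARY KEY, modified INTEGER,"
        " lifetime INTEGER, data TEXT)"
    )
    now = 1_000_000
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?, ?)",
        [
            ("old-bot", now - 10_000, 1440, ""),  # expired long ago
            ("recent", now - 100, 1440, "x"),     # still within its lifetime
        ],
    )
    print(cull_expired_sessions(conn, now=now))  # 1
```

Run on a schedule, a delete like this keeps the table bounded even when bots open many short-lived, empty sessions.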

But wouldn’t it be too late by the time the plugin kicks in, John? I would assume the session has already been created by then…

The session will probably be started by then, yeah. But the sessions table entries are really more of a minor annoyance: for most sites, what matters is stopping these requests before they reach the actual queries used to generate the page.

As others have said, I’ve started using Cloudflare and the problems have dropped off significantly.

That said, the files served by Omeka are all in an S3 bucket, which is also getting hammered, so now I’m trying to figure out how to put the S3 bucket behind Cloudflare!

Thanks for the idea. I just created a new module, Bot Challenge, that integrates the logic of AtoM’s JavaScript challenge, so it fixes all the session issues.

Combined with the Cron or EasyAdmin modules, which include a regular task to remove old sessions, there should normally be no further issue with this.

But it is for Omeka S, not Omeka Classic.

Since it hasn’t been mentioned yet: if you’re getting bot registrations, as we did a number of times last year, you can add a CAPTCHA challenge to the registration form if you haven’t already.

Hi Daniel, thanks for the great work. Would you consider developing a plugin for Omeka Classic? Thank you.

Right now, good and bad bots can access any page of the website; the problem is not just preventing form submissions.

Hi, the “aggressive bots” issue was discussed in an Omeka S thread, and you can find a solution in Bots and Find/Search Pages - #2 by boregar

Hope this helps until Daniel writes a module for Omeka Classic too 😉

Yes, BotChallenge for Omeka Classic is now published.

Great work! We will definitely implement it. Thank you so much!

Thank you, Daniel. Testing it right now, will report.

Do you think it would be useful to have a logging option, to see the “blacklisted” failing visitors? One might then use that information to block them at the .htaccess level, for instance…
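
If such a log existed, turning it into web-server rules would be straightforward. Below is a minimal Python sketch, assuming a hypothetical log with one failing IP address per line; it emits an Apache 2.4 `<RequireAll>` block using `Require not ip`. `htaccess_block` is a hypothetical helper, the five-failure threshold is arbitrary, and the caveat about IP addresses raised in the reply below still applies.

```python
import ipaddress
from collections import Counter


def htaccess_block(failed_ips, min_failures=5):
    """Build an Apache 2.4 <RequireAll> block denying frequently failing IPs."""
    counts = Counter()
    for raw in failed_ips:
        try:
            counts[ipaddress.ip_address(raw.strip())] += 1
        except ValueError:
            continue  # skip malformed log lines
    offenders = sorted(
        (ip for ip, n in counts.items() if n >= min_failures),
        key=lambda ip: (ip.version, int(ip)),  # IPv4 first, then IPv6
    )
    if not offenders:
        return ""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {ip}" for ip in offenders]
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    log_lines = ["203.0.113.5"] * 5 + ["198.51.100.7"] * 2 + ["not-an-ip"]
    print(htaccess_block(log_lines))
```

Using `Require` inside `.htaccess` only works if the server configuration allows it for that directory (`AllowOverride AuthConfig`).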

Omeka is only an application; normally, the IT department or the hosting provider should manage all the infrastructure that protects it (firewall, proxy, anti-bot, anti-DoS, etc.). BotChallenge and similar tools are only useful when Omeka is served directly on the internet.

So the first step is to ask the hosting provider or the IT department what measures they have in place to protect Omeka. If they have none, or don’t care, the module is useful. You can extend it to store IPs and so on if you want, but that is probably useless: IPs are easy to spoof. For my part, I don’t have a static IP, and I click on the Cloudflare checkbox ten times a day.