Omeka's robots.txt file, problem with Google

Hi everyone.

We’ve been notified by Google that there are some issues with our Omeka installation: 52 files have been “indexed although blocked by the robots.txt file”.

I’ve checked, and found that all those files are in the files/original directory of our repository; the robots.txt file that ships with Omeka is very simple, and it does disallow crawling of that directory:

User-agent: *
Disallow: /files/

I suppose those 52 files are linked to from pages outside the repository, so Google discovered and indexed their URLs without being able to crawl them.

Is there anything that could be done to solve the Google issues, without causing problems to the Omeka installation?

Thanks.

Yeah, robots.txt won’t be effective against external links.

You have limited options here. One is a noindex directive, which should work; but since robots.txt can’t deliver noindex, to apply it to uploaded files you’d need to set it in the Apache configuration or an .htaccess file. You could also block Google’s crawler from accessing those files directly in the Apache settings, though that wouldn’t affect other search engines.
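As a rough sketch of the noindex approach: dropping an .htaccess file into the files/ directory can attach an X-Robots-Tag header to everything served from it. This assumes Apache with mod_headers enabled and that AllowOverride permits .htaccess files; adapt as needed for your setup.

```apache
# files/.htaccess — hypothetical example, assuming mod_headers is available.
# Sends an X-Robots-Tag: noindex header with every file served from this
# directory, telling search engines to drop these URLs from their index.
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noindex"
</IfModule>
```

One caveat: for the noindex header to take effect, crawlers have to be able to fetch the files, so the Disallow: /files/ rule in robots.txt would need to be removed; otherwise Google never sees the header and the “indexed, though blocked by robots.txt” warnings remain.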

Thanks, John. That was what I thought too.

I’m wondering whether anybody else got the same notification by Google, and what they have done (if anything).

This topic was automatically closed after 250 days. New replies are no longer allowed.