Hi @Dbrett.
So, here’s my thinking about this so far.
Option 1. Everything remains as it is in the database and in Omeka, but we add a nice PHP line in the theme that outputs something like this:
<link rel="canonical" href="https://example.com/s/website_where_item_is_actually_included/item/1234" />
Since we currently have no better information, website_where_item_is_actually_included would probably be the site with the lowest id if the item is included in more than one site, or the first public website if the item is not explicitly included in any website.
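To make the Option 1 rule concrete, here is a minimal sketch of the selection logic in Python (not the PHP a theme would actually need). It assumes each item knows the ids of the sites it was added to, and each site has an id, a slug and a public flag. `Site`, `canonical_site`, `canonical_link` and `BASE_URL` are hypothetical names for illustration, not part of Omeka S, and "first public website" is read as "the public site with the lowest id".

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder domain taken from the example link; replace with the real installation URL.
BASE_URL = "https://example.com"


@dataclass
class Site:
    id: int
    slug: str
    is_public: bool


def canonical_site(item_site_ids: List[int], sites: List[Site]) -> Optional[Site]:
    """Lowest-id site that includes the item; otherwise the lowest-id public site."""
    included = [s for s in sites if s.id in item_site_ids]
    if included:
        return min(included, key=lambda s: s.id)
    # Assumption: "first public website" means the public site with the lowest id.
    public = [s for s in sites if s.is_public]
    return min(public, key=lambda s: s.id) if public else None


def canonical_link(item_id: int, item_site_ids: List[int], sites: List[Site]) -> Optional[str]:
    """Return the <link rel="canonical"> tag for an item page, or None if no site qualifies."""
    site = canonical_site(item_site_ids, sites)
    if site is None:
        return None
    href = f"{BASE_URL}/s/{site.slug}/item/{item_id}"
    return f'<link rel="canonical" href="{href}" />'
```

Under Option 1 the same tag would be printed in the page head of every item page, on every site, whatever site the visitor is on.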
Option 2. A slightly better version would wrap this line in an if statement such as “if this item is not part of the current website, then output this nice canonical URL”. This would probably be better because, if an item is included in more than one site, it would be up to Google to decide which one is the most important, and long term we'd expect Google to get this right (it couldn't get it really wrong anyway, as all sites where the item is not included would still link to the canonical URL). It would also avoid annoying people who, for example, have all items added to an old, ugly site with a low id number and would much prefer Google to send visitors anywhere else: with the if statement, Google decides, but only among the sites where the item was added.
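The Option 2 check can be sketched the same way, again in Python as a stand-in for the theme's PHP and under the same assumptions (site ids, slugs and a public flag are known; `Site`, `target_site`, `canonical_link` and `BASE_URL` are hypothetical names). The only change from Option 1 is the early return when the current site already includes the item; the target is still chosen with the lowest-id rule.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE_URL = "https://example.com"  # placeholder domain from the example link


@dataclass
class Site:
    id: int
    slug: str
    is_public: bool


def target_site(item_site_ids: List[int], sites: List[Site]) -> Optional[Site]:
    # Lowest-id site that includes the item, else the lowest-id public site.
    included = [s for s in sites if s.id in item_site_ids]
    candidates = included or [s for s in sites if s.is_public]
    return min(candidates, key=lambda s: s.id) if candidates else None


def canonical_link(item_id: int, item_site_ids: List[int],
                   current_site_id: int, sites: List[Site]) -> Optional[str]:
    """Option 2: only pages on sites that do NOT include the item get a canonical tag."""
    if current_site_id in item_site_ids:
        # The item was added to this site: print nothing and let Google choose
        # among the sites where the item was actually added.
        return None
    site = target_site(item_site_ids, sites)
    if site is None:
        return None
    return f'<link rel="canonical" href="{BASE_URL}/s/{site.slug}/item/{item_id}" />'
```

Pages on the sites that include the item stay silent, so none of them is declared a duplicate of another; only the "stray" copies on other sites point elsewhere.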
Option 3. Omeka would default to the behaviour of Option 2, but would let the admin explicitly choose which site is the main site for an item if they so wish. This could happen when items are added to a site, through some other bulk form (e.g. all items in item set x get site y as their main site), or manually. If the admin says nothing explicitly, things would work as in Option 2, so only the item pages of sites where the item has not been added would tell search engines the preferred address for that item. This feature could conceivably be introduced either in core Omeka S or as a module.
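A rough Python sketch of how Option 3 could resolve the main site, assuming a hypothetical per-item `main_site_id` setting and a hypothetical bulk rule table mapping item set ids to site ids (neither exists in Omeka S today; all names here are illustrative). An explicit choice makes every item page point to the chosen site; without one, it falls back to the Option 2 behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BASE_URL = "https://example.com"  # placeholder domain from the example link


@dataclass
class Site:
    id: int
    slug: str
    is_public: bool


@dataclass
class Item:
    id: int
    site_ids: List[int]
    item_set_ids: List[int] = field(default_factory=list)
    main_site_id: Optional[int] = None  # hypothetical admin-chosen main site


def explicit_main_site(item: Item, item_set_rules: Dict[int, int]) -> Optional[int]:
    """Manual per-item choice first, then bulk rules mapping item set id -> site id."""
    if item.main_site_id is not None:
        return item.main_site_id
    for set_id in item.item_set_ids:  # first matching item set wins
        if set_id in item_set_rules:
            return item_set_rules[set_id]
    return None


def canonical_link(item: Item, current_site_id: int, sites: List[Site],
                   item_set_rules: Dict[int, int]) -> Optional[str]:
    by_id = {s.id: s for s in sites}
    main_id = explicit_main_site(item, item_set_rules)
    if main_id in by_id:
        # Explicit main site: every item page, on every site, points there.
        target = by_id[main_id]
    elif current_site_id in item.site_ids:
        # No explicit choice: behave like Option 2 and stay silent here.
        return None
    else:
        included = [s for s in sites if s.id in item.site_ids]
        candidates = included or [s for s in sites if s.is_public]
        if not candidates:
            return None
        target = min(candidates, key=lambda s: s.id)
    return f'<link rel="canonical" href="{BASE_URL}/s/{target.slug}/item/{item.id}" />'
```

Giving the manual setting priority over the item set rules is one possible design choice; a module could just as well do it the other way round.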
Option 4 is actually a workaround. If there is really no way to write a nice PHP line such as the one I propose above, or if that is undesirable for some other reason I'm not currently considering, it would be possible to achieve something similar by querying Omeka's REST API and systematically creating a sitemap.xml file that implements the same logic. As the page I linked above suggests, sitemap.xml files are less effective than the other solutions, but this should probably still work. It could also be more of a pain to maintain: if implemented outside of Omeka, you would probably need a script that runs once a day or so and regenerates an updated sitemap.xml (the same could be done by a module inside Omeka, even if this seems more complicated than Options 1 and 2).
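A rough Python sketch of the Option 4 script, assuming the JSON returned by Omeka S's `/api/items` and `/api/sites` exposes `o:id`, `o:slug`, `o:is_public` and an `o:site` list of site references on each item, and that listings accept `page` and `per_page` and return an empty list past the last page; these field names and parameters are assumptions to double-check against the API documentation. Without API keys only public resources are returned. `fetch_all`, `canonical_url` and `build_sitemap` are hypothetical names, and the sitemap lists one URL per item, chosen with the Option 1 rule.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.com"  # placeholder domain from the example link
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch_all(resource, per_page=100):
    """Page through an Omeka S API listing (assumed: /api/<resource>?page=N&per_page=M)."""
    results, page = [], 1
    while True:
        url = f"{BASE_URL}/api/{resource}?page={page}&per_page={per_page}"
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:  # assumed: an empty list means we are past the last page
            return results
        results.extend(batch)
        page += 1


def canonical_url(item, sites_by_id, public_ids):
    # Lowest-id site that includes the item, else the lowest-id public site.
    site_ids = sorted(ref["o:id"] for ref in item.get("o:site", []) if ref["o:id"] in sites_by_id)
    target = site_ids[0] if site_ids else (public_ids[0] if public_ids else None)
    if target is None:
        return None
    return f"{BASE_URL}/s/{sites_by_id[target]['o:slug']}/item/{item['o:id']}"


def build_sitemap(items, sites):
    """Return sitemap XML listing only the preferred URL of each item."""
    sites_by_id = {s["o:id"]: s for s in sites}
    public_ids = sorted(s["o:id"] for s in sites if s.get("o:is_public"))
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for item in items:
        loc = canonical_url(item, sites_by_id, public_ids)
        if loc:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

A daily job could then write the result of `build_sitemap(fetch_all("items"), fetch_all("sites"))` to `sitemap.xml` at the web root.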
All of the above would still be better than manually adding links to the removal tool, which is fine if there's just the odd link out of place, but soon becomes a huge pain if you need to do this for each new item on each of the dozen websites where that item has not been added.
I’d be curious to hear how common this experience is with Omeka S 3, but I would be surprised if this were bugging only @Dbrett and me.
Considering that Google is really the main source of traffic for our Omeka S sites, as I suppose is the case for many others, I feel it’s not a minor nuisance.
I am not very familiar with either PHP or the internals of Omeka, so any help in writing that nice PHP line would be very welcome.
Long term, if not Option 3, I feel at least something like Option 2 should be part of Omeka’s core (as a helper function) or be added to the default themes. Ultimately, the nice thing about canonical URLs is that they bring no user-visible change: they would be a huge improvement for many, and would make no difference for those who didn’t have a problem in the first place.
Looking forward to hearing thoughts about this! Thanks a lot, as usual!