Developing Persistent Identifier module for Omeka-S

Matthew · April 22, 2021, 6:06pm

The Omeka team is currently planning the development of a Persistent Identifier (PID) module for Omeka-S, which will give users the option of assigning PIDs to Omeka items for the purpose of stable ongoing URL linking and reference. The module will be able both to mint/assign new PIDs for items and to retrieve and update existing PIDs for imported items. It will also be designed flexibly in order to accommodate different PID services such as DOI, ARK, Handles, etc.

One central issue to implementation is how Omeka-S item URLs are currently assigned. There is no internal persistent URL/landing page for an item: the URL depends on the Omeka-S Site where the item is published, and an item may have multiple URLs if it is published on more than one Site. Thus there is currently no obvious single URL to point a PID at for a given Omeka-S object.

We have a few ideas for addressing this issue, but would like to hear more from our user community. Which of the following options makes the most sense to you? Do you use or plan to use PIDs in your collections, and how would you like to see them integrated with Omeka-S?

Option 1: On the Site Resources page, include an option to ‘Update Persistent Identifiers’, which will go through all items assigned to the site and either mint/assign a PID, or retrieve a PID from the item’s identifier field and use the PID API service to update that item’s PID target to the current item URL.

Pros: item page will retain theme/branding
Cons: updating/assigning PIDs will require user intervention whenever items are added/removed, items on multiple Sites can only have one PID targeted URL

Option 2: Ability to create a separate, site-agnostic landing page for an item, and assign the PID to point to this page.

Pros: a single, stable URL for a PID to point to no matter if the item is published to a Site or multiple Sites
Cons: a ‘generic’ landing page with no Site-specific theme or branding

Option 3: similar to option 1, but PIDs are assigned independently within the context of each Site. In other words, an item on multiple Sites would have multiple PIDs, each pointing to the item on each separate Site.

Pros: retain theme/branding, allow for an item to have several stable PIDs across several sites
Cons: a single intellectual item with multiple PIDs assigned kind of goes against the whole point/spirit of assigning PIDs
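
The batch update described in Option 1 could look roughly like the sketch below. It is only an illustration: `InMemoryPidService`, its `mint` and `update_target` methods, and the dictionary shape of an item are hypothetical stand-ins, not part of Omeka S or of any PID provider's API.

```python
from typing import Dict, List, Optional


class InMemoryPidService:
    """Hypothetical stand-in for a PID provider: maps PID -> target URL."""

    def __init__(self) -> None:
        self.targets: Dict[str, str] = {}

    def mint(self, target_url: str) -> str:
        # A real service would return an ARK, DOI, Handle, etc.
        pid = f"pid:{len(self.targets) + 1}"
        self.targets[pid] = target_url
        return pid

    def update_target(self, pid: str, target_url: str) -> None:
        self.targets[pid] = target_url


def update_persistent_identifiers(
    items: List[Dict[str, Optional[object]]],
    site_url: str,
    service: InMemoryPidService,
) -> None:
    """For every item on a site, mint a PID or re-point the existing one."""
    for item in items:
        item_url = f"{site_url}/item/{item['id']}"
        existing = item.get("identifier")
        if existing:
            # Imported item that already has a PID: update where it points.
            service.update_target(str(existing), item_url)
        else:
            # No PID yet: mint one and store it in the identifier field.
            item["identifier"] = service.mint(item_url)
```

The sketch also makes the stated drawback visible: the function has to be run again whenever items are added to or removed from the site.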

Are we missing anything? Feel free to share any other ideas/thoughts/concerns surrounding PIDs and Omeka-S. Thank you!

jimsafley · April 22, 2021, 7:21pm

The API already provides an internal persistent URL for an item at the api/default route. For example, http://example.com/api/items/1 points to the JSON-LD serialization of item #1. I suppose you could add content negotiation to redirect the client to an HTML representation of the item, possibly landing on Option 2 above. I don’t see a compelling reason to specify an item’s site given the nature of Omeka S as having a pool of resources. The site is incidental, and any representation of an item will already have a list of sites to which it belongs.
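
As a minimal sketch of that idea (not Omeka S code), the function below decides from the Accept header whether a request to the API URL should be redirected to an HTML page or answered with JSON-LD. The landing page path `/item/{id}` and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

HTML_TYPES = {"text/html", "application/xhtml+xml"}
JSONLD_TYPES = {"application/ld+json", "application/json", "*/*"}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so types with equal q keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(item_id: int, accept_header: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to the item's API URL."""
    # A missing or empty Accept header is treated as "anything is fine".
    for media_type, q in parse_accept(accept_header.strip() or "*/*"):
        if q <= 0:
            continue
        if media_type in HTML_TYPES:
            # Hypothetical site-agnostic landing page path.
            return 303, {"Location": f"/item/{item_id}"}
        if media_type in JSONLD_TYPES:
            return 200, {"Content-Type": "application/ld+json"}
    return 406, {}
```

With this logic a browser, which lists `text/html` first, would be redirected to the HTML page, while a client that sends `Accept: application/ld+json` or no Accept header at all would keep getting the JSON-LD document.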

Matthew · April 22, 2021, 8:08pm

Very helpful Jim, thanks! I think everything depends on what exactly the collection holder and/or end user expects to see after clicking on the PID. The JSON-LD result is great for pure data but would likely not be acceptable for an author citing a resource in a publication, or an online exhibit linking out to an Omeka item–they would want something understandable to their intended audience.

So content negotiation to some sort of human-readable landing page would need to be at the very least an option, but the JSON-LD persistent URL could be a useful building block in that process.

Janneman · April 24, 2021, 11:10am

Of course Jim’s proposal is the way to go. It has been standard practice in the Semantic Web for decades for handling URIs: content negotiation determines the response to a request for the resource URI.

Daniel_KM · April 30, 2021, 8:10am

Note that there is already a persistent identifier module for Omeka S that creates opaque identifiers for ARKs: the Ark module.

giocomai · May 1, 2021, 1:15pm

I also feel that option 2 is the most sustainable and coherent.

The resulting page could still prominently show the sites where the item has been added, encouraging the viewer to see the item in context.

raboof · August 28, 2021, 12:56pm

I’m just getting started with collections management, Omeka S and the semantic web, so take my input with a grain of salt, but I’d like to add a few thoughts. Some of this is probably beyond what a PID module would offer, but perhaps might influence its design.

First and foremost, I agree with Jim and Janneman above that content negotiation, i.e. serving either HTML or JSON-LD (or perhaps in the future even other formats like Turtle, RDF/XML or N-Triples) depending on the Accept header of the request, sounds very valuable.

In some communities, like the Dutch Platform Linked Data which is referred to by our ‘Network for Digital Heritage’ (Netwerk Digitaal Erfgoed, NDE), the Cool URIs for the Semantic Web approach appears to be popular. In short: this scheme recommends PIDs of the form https://data.eicas.nl/id/work/42, which when requested would produce an HTTP 303 See Other redirect to https://data.eicas.nl/doc/work/42 (id → doc), which would then respond with RDF or HTML. It’s not obvious to me that this is a good idea (I think I understand the distinction between ‘id’ and ‘doc’, but it still seems somewhat artificial, and breaks the ‘just copy the URL from the address bar and you have the ID’ convenience), so it’s definitely not universal, but it might make sense to support it. This means the PID itself would not always correspond directly to the HTTP endpoint serving the actual content. Some webserver redirect/rewrite configuration probably goes a long way, but having it ‘natively’ in Omeka has its appeal as well.
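
To make the id → doc step concrete, here is a small sketch of the redirect logic, using the example URLs from this post. The function name is made up for illustration, and a real deployment might do the same thing in webserver configuration instead.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def see_other_for(id_uri: str) -> Optional[Tuple[int, str]]:
    """Map an /id/... URI to its /doc/... counterpart as a 303 redirect.

    Returns (303, location), or None if the URI is not an /id/ URI.
    """
    parts = urlsplit(id_uri)
    if not parts.path.startswith("/id/"):
        return None
    doc_path = "/doc/" + parts.path[len("/id/"):]
    # Keep scheme, host and query; drop any fragment.
    location = urlunsplit((parts.scheme, parts.netloc, doc_path, parts.query, ""))
    return 303, location
```

For example, `see_other_for("https://data.eicas.nl/id/work/42")` yields a 303 pointing at `https://data.eicas.nl/doc/work/42`, where content negotiation would then pick RDF or HTML.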

When unauthenticated, the document returned should probably be filtered so only the ‘public’ items and fields are shown, both for the HTML and ‘Semantic’ formats.
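
A sketch of what such filtering might look like on a JSON-LD-like dictionary. The keys `o:is_public` (on the item) and `is_public` (on each value) are modeled on the shape of Omeka S API output, but treat them, and the function itself, as assumptions for illustration.

```python
from typing import Optional


def public_view(item: dict, authenticated: bool) -> Optional[dict]:
    """Return the item as an anonymous visitor should see it.

    Authenticated users get the item unchanged. Anonymous users get None
    for a private item, and otherwise a copy without private values.
    """
    if authenticated:
        return item
    if not item.get("o:is_public", False):
        return None
    filtered = {}
    for key, value in item.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Property with a list of value objects: keep only public ones.
            visible = [v for v in value if v.get("is_public", True)]
            if visible:
                filtered[key] = visible
        else:
            filtered[key] = value
    return filtered
```

The same filtered dictionary could then feed both the HTML template and the ‘Semantic’ serializations, so the two cannot drift apart.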

As an ‘advanced feature’, I could imagine it might be useful to be able to (perhaps by plugging in some PHP code) add further processing, such as presenting some information that is recorded in Schema.org also under its Dublin Core equivalent or vice versa. This is getting into ‘export module’ territory though, I suppose.

On the relationship between PIDs and sites: I don’t really have the background to judge how this would work out, but for ‘smaller’ organizations like ours (we’re a museum for modern art in its early days) perhaps there doesn’t have to be a distinction between the PID of the item and its URL on the ‘main’ site. I can imagine that produces technical challenges, though - in which case the ‘next best thing’ for the user might perhaps be to have the PID redirect to the corresponding URL on the ‘main’ site when HTML is requested (though that again breaks the ‘just copy the URL from the address bar and you have the ID’ convenience)?

Matthew · August 30, 2021, 5:47pm

Thanks for these additional thoughts! I think the idea for the content-negotiated, site-agnostic, human-readable HTML endpoint is for it to be a non-changing stable URI so it will be ‘native’ in a sense once it’s implemented.

Since we’ll be working with multiple PID services, each with their own PID URI resolver syntax (ex: https://n2t.net/ark:/12345/6789 or http://dx.doi.org/10.1093/ajae/aaq063), we won’t have much control over how ‘Cool’/simple the initial PID URI is, short of setting up an Omeka-specific resolver service for each, which would be far outside the scope of this work. That said, while it doesn’t use the exact terminology, the content negotiation discussed here provides a kind of id → doc progression, in that a call is made to the internal persistent URL at api/default which could be considered the “id”, then redirected to the aforementioned HTML endpoint “doc”.

Good point: authentication can and will be considered when implementing this. Presumably a user would only want PIDs assigned to publicly accessible items, but that should not be taken as a given. The further-processing idea is another good one for an advanced feature, which might be implemented down the line once basic functionality is in place.

The reason the PID endpoint is site-agnostic is that the ‘main’ site for each object is mutable depending on the workflow/access model of the organization–there’s no programmatic way to have Omeka S know what the ‘main’ site for an object might be across all the different organizations using Omeka differently, and even if so, there is nothing stopping the object from being removed/moved in the future or displayed on multiple sites. Creating a persistent object landing page away from these temporal and content management decisions allows us to always know which PID endpoint to point to, ensuring that these persistent identifiers really do remain persistent. That said, since we will be using content negotiation, there is nothing stopping an Omeka S user from adjusting the code for their specific instance so that it does redirect to the object’s Site page or another desired page.

raboof · August 31, 2021, 7:48am

I’m not sure I understand - isn’t the main point of content-negotiation that it will produce either an HTML or a machine-readable representation of the resource depending on the Accept header of the request?

Ah, so you consider the ‘actual’ PID URI resolving as external to Omeka S itself, so you can use those external PID services. Makes sense, that hadn’t clicked for me before yet. I’m not planning to use an external PID service yet, but I would like to refer to our resources with persistent identifiers under our own domain (e.g. https://data.eicas.nl/id/work/42), which would remain stable even if we make changes internally. It sounds like the module described here could help cover that use case as well, and I could set up my own “poor man’s PID service” in the form of some Apache redirect configuration returning a 303 redirect from https://data.eicas.nl/id/work/42 to the content-negotiated Omeka endpoint.

Wouldn’t it be the other way around? As I understand it the “id” should be the “pretty”, long-term-stable name, and the “doc” could be an Omeka-specific URI with content negotiation.

I completely agree this has to be an organization-specific decision. I think it would be nice if an organization could choose to have a ‘main’ site and weren’t forced to have a ‘site-agnostic’ representation next to its one and only ‘main’ site. This would mean it has to make sure all objects are part of that site. Sounds like that’s possible, so

Matthew · August 31, 2021, 4:34pm

Yes–I was referring to the end URI eventually ‘served up’ via content negotiation. Although not always via the Accept header, the major PID services have ways of supporting content negotiation, and if any future services do not it should be possible to mimic content negotiation behavior by using regex or similar to recognize the particular syntax of the PID.

Correct, the module will be geared toward established PID services–but if you set up your own service, and you point https://data.eicas.nl/id/work/42 toward the correct Omeka URI for the object (to be finalized but will likely be the API URL for an object mentioned by Jim above) along with an acceptable form of content negotiation (again, to be detailed with the forthcoming module), it should resolve to the correct URI just the same.

This might just come down to semantics, but yes, I can see it your way as well. In which case the API URL that is used as a PID target would be a sort of ‘middle-man’ between the id and doc.

Good food for thought–the intent was always to make the module flexible and clear enough that an Omeka S user could easily change where an incoming PID request could redirect to, from a generic page to a specific site page as you’re describing, via the code or configuration files. But now I’m thinking that if possible, it may be useful to integrate this functionality into the settings for the module–have a sort of drop-down where, instead of a site-agnostic landing page, you could choose which Site all incoming PID requests should redirect to. Will look into this!
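
A rough sketch of how such a setting might be applied when resolving a PID request. Everything here is hypothetical: the function, the `preferred_site` setting, and the generic `/item/{id}` landing page path. The `/s/{slug}/item/{id}` pattern is assumed to match the usual form of Omeka S site item URLs.

```python
from typing import Optional, Set


def pid_redirect_target(
    base_url: str,
    item_id: int,
    item_sites: Set[str],
    preferred_site: Optional[str] = None,
) -> str:
    """Pick the HTML page an incoming PID request should land on.

    If a preferred site is configured and the item is published there,
    use that site's item page; otherwise fall back to the generic,
    site-agnostic landing page so the PID never dead-ends.
    """
    if preferred_site and preferred_site in item_sites:
        return f"{base_url}/s/{preferred_site}/item/{item_id}"
    return f"{base_url}/item/{item_id}"
```

Falling back to the generic page covers the case where an item is later removed from the chosen site.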

raboof · September 1, 2021, 6:48am

Ah, so you’re primarily considering content negotiation a responsibility of the PID service, not of Omeka-S itself - I didn’t realize that. For example, from the Code4Lib Journal article ‘Persistent identifiers for heritage objects’ I got the impression that the content negotiation features of the major PID services were rather rudimentary, but I don’t really have first-hand experience here. Which PID service(s) are you initially targeting?

I think having content negotiation (particularly via the Accept header) on the ‘Omeka side’ would be valuable, not just in PID context but also when Omeka is used ‘directly’ in a Linked Data context.

Matthew · September 6, 2021, 7:02pm

Raboof–perhaps I could have phrased that better: I am definitely planning on having the content negotiation implemented on the Omeka side. I was referring more to the data or ‘hook’ provided by the PID/resolving service’s URL, headers, etc. that the Omeka S content negotiation would need in order to know where to redirect.

I’m initially targeting ARKs (to be generated within Omeka S via the EZID service) and DOIs (via the DataCite service), and with those two I’m fairly certain I’ll be able to use Accept headers for content negotiation, so that will definitely be incorporated into the CN process. Other methods such as URL regex etc. might eventually be incorporated as fallbacks in case Accept headers are incorrect or missing, but Accept headers will be the initial and main method of negotiation.
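
As an illustration of the kind of regex recognition mentioned as a possible fallback, the sketch below classifies an incoming URI as an ARK or a DOI. The patterns are deliberately simplified assumptions (for instance, a purely numeric ARK NAAN) and are not a full implementation of either scheme's syntax.

```python
import re
from typing import Optional, Tuple

# Simplified patterns, for illustration only.
ARK_RE = re.compile(r"ark:/?(\d{5,})/(\S+)", re.IGNORECASE)
DOI_RE = re.compile(r"\b(10\.\d{4,9}/\S+)")


def classify_pid(uri: str) -> Optional[Tuple[str, str]]:
    """Return (scheme, identifier) for an ARK or DOI URI, else None."""
    match = ARK_RE.search(uri)
    if match:
        return "ark", f"ark:/{match.group(1)}/{match.group(2)}"
    match = DOI_RE.search(uri)
    if match:
        return "doi", match.group(1)
    return None
```

Applied to the two example URIs from earlier in the thread, this returns `("ark", "ark:/12345/6789")` and `("doi", "10.1093/ajae/aaq063")`.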

raboof · September 8, 2021, 6:01am

This sounds great, looking forward to it!

system · May 16, 2022, 6:01am

This topic was automatically closed 250 days after the last reply. New replies are no longer allowed.