Accessing images stored in Amazon S3 via OAI-PMH

Hello all,

I work for the California Digital Library, harvesting metadata into Calisphere. For each record in Calisphere, we grab all relevant descriptive metadata from the source repository, a link back to the source record, and an image URL from which to generate thumbnail/preview files.

For many sources, OAI-PMH is the easiest harvest option. We’re still relatively new to Omeka as a data source but have had success harvesting via OAI-PMH in the past. For a current source, we can easily grab the record page link from a dc:identifier field, but I can’t figure out how to harvest or generate an image file link, since the source’s image files are all stored in Amazon S3. The Amazon filenames look like generated strings and do not correspond in any way to the TIF filename listed in another dc:identifier field.

Example OAI feed:

Example single record:

Example Amazon S3 URL:

My question: is there any way to access and/or generate the Amazon S3 URL for image files via OAI-PMH? Or is this something I would need to fiddle with an API for?


Strictly speaking this is an question, not Omeka itself, but it’d come up the same way for any site using private S3 storage, so I think it makes sense here.

What exactly are you looking to get when you say you want the S3 URL?

The URL you see there is it: a pre-signed URL to the file. The details of that URL will change as it generates different expiration dates and so on, but there’s no other valid URL to the image.
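For anyone harvesting these, a minimal sketch of how a script might read the expiry out of a pre-signed URL, so it knows how long the link can be trusted. It assumes the standard S3 query parameters (`X-Amz-Date` plus `X-Amz-Expires` for Signature Version 4, or `Expires` for the older Version 2); the function name `presigned_expiry` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs


def presigned_expiry(url):
    """Return the expiry time (UTC) encoded in a pre-signed S3 URL, or None."""
    params = parse_qs(urlsplit(url).query)
    # Signature Version 4: signing time plus a lifetime in seconds.
    if "X-Amz-Date" in params and "X-Amz-Expires" in params:
        start = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        start = start.replace(tzinfo=timezone.utc)
        return start + timedelta(seconds=int(params["X-Amz-Expires"][0]))
    # Signature Version 2: Expires is an absolute Unix timestamp.
    if "Expires" in params:
        return datetime.fromtimestamp(int(params["Expires"][0]), tz=timezone.utc)
    # No recognizable signing parameters: probably not a pre-signed URL.
    return None
```

A harvester would want to download the image before that time rather than store the URL itself, since a stored copy stops working once it expires.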

Is there some reason why that URL’s not sufficient for your needs? If you’re just pulling down the image to make a thumbnail, I can’t think of what the problem would be with that URL.

Hi jflatnes,

Thanks for your quick response and sorry for the incorrect posting location.

You are right, the URL I posted as an example is exactly what I need. My problem is that I obtained that URL by clicking on the image in the Omeka interface and copy/pasting. What I need is some way to automatically obtain this URL from a data feed source, preferably from the OAI-PMH feed (example given) or possibly an API.

Our harvester software stack is built to interact with OAI-PMH feeds or APIs to batch harvest metadata and image URL information. So I’m asking whether there is some way for the institution to expose the Amazon S3 file URL, or its unique ID values, through a programmatic method such as OAI-PMH or an API. It’s obviously not feasible for me to manually copy and paste these URLs into our Solr index.

Let me know if you have further questions, thanks again!

Ah, now I understand your issue.

There’s an “expose files” setting for the OAI-PMH repository plugin. If the owner/admin of the site turns that setting on for the plugin, it will add an additional identifier with the URL for each file.
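On the harvesting side, a rough sketch of separating the record page link from the extra file identifier once the setting is on. It assumes the file URL shows up as another `dc:identifier` holding an http(s) URL, and that record pages share a known prefix; `split_identifiers` and `item_url_prefix` are hypothetical names, not part of the plugin.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"


def split_identifiers(record_xml, item_url_prefix):
    """Split dc:identifier URLs into (record page URLs, other URLs)."""
    root = ET.fromstring(record_xml)
    page_urls, other_urls = [], []
    for el in root.iter(DC + "identifier"):
        value = (el.text or "").strip()
        if not value.startswith(("http://", "https://")):
            continue  # skip non-URL identifiers such as the TIF filename
        if value.startswith(item_url_prefix):
            page_urls.append(value)
        else:
            other_urls.append(value)  # candidate file URLs
    return page_urls, other_urls
```

Whatever lands in the second list would be the candidates for generating thumbnails.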

Additionally, the “mets” metadata prefix includes the same Dublin Core metadata as oai_dc, but because METS has an explicit place for them it also includes file URLs (e.g., )
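A sketch of what reading those file URLs could look like. It assumes the feed follows the usual METS convention of putting file locations in `FLocat` elements with an `xlink:href` attribute, so it is worth checking against the actual output. `list_records_url`, `mets_file_urls`, and the base URL are placeholders for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def list_records_url(base_url, metadata_prefix="mets"):
    """Build a standard OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def mets_file_urls(xml_text):
    """Collect every xlink:href found on a METS FLocat element."""
    root = ET.fromstring(xml_text)
    urls = []
    for flocat in root.iter("{%s}FLocat" % METS_NS):
        href = flocat.get("{%s}href" % XLINK_NS)
        if href:
            urls.append(href)
    return urls
```

For example, `list_records_url("https://omeka.example.org/oai-pmh-repository/request")` gives the request URL to fetch, and the response body can then be passed to `mets_file_urls`.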

This is just what I needed. I’ll reach out to the Omeka repository owners to expose the files. Thanks!