Accessing images stored in Amazon S3 via OAI-PMH

matthewmckinley · November 10, 2016, 7:37pm

Hello all,

I’m under the employ of California Digital Library, harvesting metadata into Calisphere.org. For each record in Calisphere, we’re grabbing from the source repository all relevant descriptive metadata, a link back to the source record, and an image URL to generate thumbnail/preview files from.

For many sources, OAI-PMH is the easiest harvest option. We’re still relatively new to Omeka as a data source but have had success harvesting via OAI-PMH in the past. For a current source, we’re able to easily grab the record page link from a dc:identifier field, but I can’t figure out how to harvest or generate an image file link, since the source’s image files are all stored in Amazon S3. The Amazon filenames look like generated strings and do not correspond in any way to the TIF filename listed in another dc:identifier field.

Example OAI feed: http://christensenfamilycollection.omeka.net/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc&set=7

Example single record: http://christensenfamilycollection.omeka.net/items/show/282

Example Amazon S3 URL: https://s3.amazonaws.com/omeka-net/12774/archive/files/983b80ab8079155dc054fada86645a90.jpg?AWSAccessKeyId=AKIAI3ATG3OSQLO5HGKA&Expires=1478849228&Signature=95PFWVaM02Tev3V1bQZHHqlr0O4%3D

My question: is there any way to access and/or generate the Amazon S3 URL for image files via OAI-PMH? Or is this something I would need to fiddle with an API for?

Thanks!
Matthew

jflatnes · November 10, 2016, 7:42pm

Strictly speaking this is an Omeka.net question, not Omeka itself, but it’d come up the same way for any site using private S3 storage, so I think it makes sense here.

What exactly are you looking to get when you say you want the S3 URL?

The URL you see there is it: a pre-signed URL to the file. The details of that URL will change as it generates different expiration dates and so on, but there’s no other valid URL to the image.

Is there some reason why that URL’s not sufficient for your needs? If you’re just pulling down the image to make a thumbnail, I can’t think of what the problem would be with that URL.
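
Since the expiration is carried in the URL itself, a harvester can check it before trying to download. A minimal sketch, assuming the URL uses the query-string style shown in the example above, with an `Expires` parameter holding a Unix timestamp; the function names are made up for illustration:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url):
    """Return the expiry time encoded in the URL's Expires parameter.

    Gives back a timezone-aware UTC datetime, or None when the URL
    has no Expires parameter (i.e. it is not signed in this style).
    """
    params = parse_qs(urlsplit(url).query)
    values = params.get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """True if the URL carries an Expires timestamp that has passed."""
    expiry = presigned_expiry(url)
    if expiry is None:
        return False  # nothing to check; treat as not expired
    now = now or datetime.now(timezone.utc)
    return now >= expiry
```

The practical upshot is that these URLs should be fetched soon after harvesting rather than stored for later, since a stored copy stops working once the timestamp passes.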

matthewmckinley · November 10, 2016, 9:51pm

Hi jflatnes,

Thanks for your quick response and sorry for the incorrect posting location.

You are right, the URL I posted as an example is exactly what I need. My problem is that I obtained that URL by clicking on the image in the Omeka interface and copy/pasting. What I need is some way to automatically obtain this URL from a data feed source, preferably from the OAI-PMH feed (example given) or possibly an API.

Our harvester software stack is built to interact with OAI-PMH or APIs to batch harvest metadata and image URL information. So I’m asking whether there is some way for the institution to expose the Amazon S3 file URL, or its unique ID values, via a programmatic method such as OAI-PMH or an API. It’s obviously not feasible for me to manually copy/paste these URLs into our Solr index.

Let me know if you have further questions, thanks again!
Matthew

jflatnes · November 10, 2016, 10:09pm

Ah, now I understand your issue.

There’s an “expose files” setting for the OAI-PMH repository plugin. If the owner/admin of the site turns that setting on for the plugin, it will add an additional identifier with the URL for each file.
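
On the harvesting side, that means sorting through several dc:identifier values per record. A rough sketch, assuming the file URLs arrive as extra dc:identifier elements holding absolute URLs and that the record page is the one containing `/items/show/`; that path test is a heuristic based on the example record above, not something the plugin guarantees, and `split_identifiers` is a made-up name:

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def split_identifiers(xml_text):
    """Group each record's dc:identifier URLs into page links and file links.

    Returns {oai_identifier: {"pages": [...], "files": [...]}}.
    Non-URL identifiers (such as a bare TIF filename) are skipped.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iter("{%s}record" % NS["oai"]):
        oai_id = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        pages, files = [], []
        for el in record.iterfind(".//dc:identifier", NS):
            value = (el.text or "").strip()
            if not value.startswith(("http://", "https://")):
                continue
            # Heuristic: Omeka item pages live under /items/show/.
            if "/items/show/" in value:
                pages.append(value)
            else:
                files.append(value)
        result[oai_id] = {"pages": pages, "files": files}
    return result
```

The function takes the response body as a string, so it can be fed whatever the harvester already downloaded from the ListRecords request.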

Additionally, the “mets” metadata prefix includes the same Dublin Core metadata as oai_dc, but because METS has an explicit place for them it also includes file URLs (e.g., http://christensenfamilycollection.omeka.net/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=mets&set=7 )
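
For the METS route, the file URLs can be pulled out per record along these lines. This is only a sketch: it assumes the feed follows the usual METS convention of putting each file location in an FLocat element with an xlink:href attribute, which is worth confirming against the actual response, and `mets_file_urls` is a made-up name:

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def mets_file_urls(xml_text):
    """Map each OAI record identifier to the file URLs in its METS payload."""
    root = ET.fromstring(xml_text)
    urls = {}
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        oai_id = header.findtext(OAI + "identifier", default="") if header is not None else ""
        # Collect every FLocat href found anywhere inside this record.
        hrefs = [
            flocat.get(XLINK_HREF)
            for flocat in record.iter(METS + "FLocat")
            if flocat.get(XLINK_HREF)
        ]
        urls[oai_id] = hrefs
    return urls
```

For a site using private S3 storage, the URLs returned this way would presumably be pre-signed ones like the example earlier in the thread, so they would need to be fetched before they expire.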

matthewmckinley · November 10, 2016, 10:17pm

This is just what I needed. I’ll reach out to the Omeka repository owners to expose the files. Thanks!