IIIF collections & items & item-sets & extract text

jpeeraer · May 22, 2023, 11:25am

A researcher wants to create a collection of program books from the opera (Koninklijke Vlaamse Opera), each consisting of scanned images of its pages. IIIF manifests are mandatory, so I decided that each program book should be an item, with each page as a media belonging to that item. This way the program book can easily be viewed in a IIIF viewer such as Mirador, and the next page can be reached by clicking ‘next item’.
Every page has an OCR text, so it would be nice if this text could be shown alongside the image in Mirador, as in this demo (Mirador).
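In IIIF terms, one item per program book maps onto one manifest with one canvas per page, and OCR text stays with its page if every canvas points to its own annotation list (`otherContent` in the Presentation API 2.x). The sketch below only illustrates that structure under stated assumptions; it is not what the IIIF Server module generates, and the base URL, the `build_manifest` helper and the page dict keys are hypothetical.

```python
import json

# Hypothetical base URL for manifests, canvases and annotation lists.
BASE = "https://example.org/iiif"


def build_manifest(book_id, pages):
    """Return a minimal IIIF Presentation 2.1 manifest (as a dict) for one book.

    Each entry of `pages` is a dict with hypothetical keys:
    "label", "image_service" (IIIF Image API base URI), "width", "height".
    """
    canvases = []
    for n, page in enumerate(pages, start=1):
        canvas_id = f"{BASE}/{book_id}/canvas/p{n}"
        service = page["image_service"]
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page["label"],
            "width": page["width"],
            "height": page["height"],
            # The scan is painted onto this canvas only.
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": f"{service}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": page["width"],
                    "height": page["height"],
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
            # One annotation list per page, so a viewer can load this page's
            # OCR text here instead of piling all text onto page 1.
            "otherContent": [{
                "@id": f"{BASE}/{book_id}/list/p{n}",
                "@type": "sc:AnnotationList",
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{BASE}/{book_id}/manifest",
        "@type": "sc:Manifest",
        "label": book_id,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


if __name__ == "__main__":
    demo_pages = [
        {"label": "p. 1", "image_service": "https://images.example.org/iiif/book1-p1",
         "width": 1000, "height": 1400},
        {"label": "p. 2", "image_service": "https://images.example.org/iiif/book1-p2",
         "width": 1000, "height": 1400},
    ]
    print(json.dumps(build_manifest("book1", demo_pages), indent=2))
```

Keeping one manifest per book and one annotation list per canvas is what lets the book be paged through as a whole while each page shows only its own text.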
I thought I could solve this with the Extract Text module. I installed it (along with the corresponding extractors), and indeed, if I upload a text file as media, its text is added to the extracted_text field of both the item and the media. If I upload a second text file, its text is appended to the item’s extracted text.
So when I finally view it in Mirador, the text of all pages appears on the first page of the program book.
I tried to solve this by adding each page as a separate item and grouping them in an item set that acts as a collection. But then the presentation in a viewer such as Mirador is not so nice: each item (page) is now its own manifest, and the program book is presented as a collection in Mirador.
Does anyone have an idea?

thanks

coret · May 22, 2023, 2:49pm

I added the Gouda Adresboek 1885 as an item and uploaded each page as a media item. The ALTO XML files were also uploaded as media items (see How to upload Alto xml files?), where the filename (without extension) matches that of the page scan. I use the IIIF Server, IIIF Search and Universal Viewer modules. I’m still looking for a way to make the transcriptions more visible (not just searchable), e.g. via an HTR panel.
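The basename convention described here (scan and ALTO file share a name and differ only in extension) can be checked before uploading. A minimal sketch, assuming scans are JPEG, TIFF or PNG and ALTO files end in `.xml`; the function names are hypothetical and not part of any Omeka S module.

```python
from pathlib import Path

# Assumed scan extensions; ALTO files are assumed to end in .xml.
SCAN_EXTS = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def pair_scans_with_alto(filenames):
    """Group filenames by basename: {stem: {"scan": name|None, "alto": name|None}}."""
    pairs = {}
    for name in filenames:
        p = Path(name)
        ext = p.suffix.lower()
        if ext in SCAN_EXTS:
            kind = "scan"
        elif ext == ".xml":
            kind = "alto"
        else:
            continue  # ignore anything that is neither a scan nor ALTO
        pairs.setdefault(p.stem, {"scan": None, "alto": None})[kind] = name
    return pairs


def missing_alto(pairs):
    """Sorted basenames that have a scan but no matching ALTO file."""
    return sorted(stem for stem, e in pairs.items() if e["scan"] and not e["alto"])
```

For example, `pair_scans_with_alto(["0001.jpg", "0001.xml", "0002.jpg"])` pairs `0001.jpg` with `0001.xml` and leaves `0002` without ALTO, which `missing_alto` then reports as `["0002"]`.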

Are Gouda Adresboek 1885 and https://www.goudatijdmachine.nl/data/iiif/2/99958/manifest the results you are looking for?

jpeeraer · May 23, 2023, 8:28am

The Gouda Adresboek looks good. If I open it in Mirador, I can see the annotations, which is something we also want. The only problem is that the Extract OCR module has not been updated for Omeka S version 4, which is what our server currently runs. Are you running an older version of Omeka S, then?

thanks

Jef

coret · May 23, 2023, 3:31pm

I’m not using the extract module. The scans and ALTO XML files were delivered by the digitization process, so there was no need for extraction.
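Since ALTO XML already carries the recognized text, plain page text can be read straight from it without an extraction step. A minimal sketch using Python’s standard `xml.etree.ElementTree`, assuming the common ALTO layout where words are `String` elements with a `CONTENT` attribute inside `TextLine` elements; namespaces are ignored so that ALTO v2, v3 and v4 files are read the same way. Hyphenated words split across lines are not rejoined, and `alto_to_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_data):
    """Return the plain text of an ALTO document, one line per TextLine.

    `xml_data` may be str or bytes. Words come from the CONTENT attribute
    of String elements and are joined with single spaces (SP elements are
    not read; the space is implied by the join).
    """
    root = ET.fromstring(xml_data)
    lines = []
    for el in root.iter():
        if _local(el.tag) != "TextLine":
            continue
        words = [s.get("CONTENT", "") for s in el.iter() if _local(s.tag) == "String"]
        lines.append(" ".join(words))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout><Page>'
        '<PrintSpace><TextBlock>'
        '<TextLine><String CONTENT="Koninklijke"/><SP/><String CONTENT="Vlaamse"/></TextLine>'
        '<TextLine><String CONTENT="Opera"/></TextLine>'
        '</TextBlock></PrintSpace></Page></Layout></alto>'
    )
    print(alto_to_text(sample))
```

Running one such conversion per ALTO file keeps the text attached to its own page, rather than concatenated at item level.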

jpeeraer · May 25, 2023, 9:29am

My images are not stored locally in Omeka S (they are on an external image server), and that doesn’t seem to work in that case. I read in the docs that support for providing a URI for that is planned for the future.
The search in the Gouda Tijdmachine is very nice, though. Is that website built with Omeka S?

coret · May 25, 2023, 10:02am

Yes, everything under Data · Gouda Tijdmachine is Omeka S. All data from Omeka S is also available as Linked Data via GraphDB Workbench.

system · May 19, 2024, 10:02am

This topic was automatically closed 360 days after the last reply. New replies are no longer allowed.