Set up Universal Viewer with IIIF Server (beginner)

Hi, I am new to Omeka and to everything related to it. My final goal is a site that shows some historical documents that I have annotated and OCRed (ALTO XML).

From what I understood, I need the modules Common, IIIF Server, Image Server, and Universal Viewer. I’ve installed them all already.

I am first trying to test things and get familiar with everything, so I am running Omeka S on a local PHP server.

Right now, Universal Viewer apparently can’t show any image I try. I upload them this way: New Item → Media → Add media → Upload

In the logs, I have the following message:
[404]: GET /iiif/2/11/info.json - No such file or directory

Indeed, I don’t seem to have an ‘iiif’ dir. I think I need to do some more configuration for the modules to work, but I am not sure how. The instructions on the official pages of the modules are a bit confusing to me, as someone who has never worked with these things.

I’ve read about manifests, but I am not sure how they come into play here. Do I need to write one manually for every image? Should one of the modules do it automatically (if configured right)? Where should they go? Is this even the issue?

Thank you for the help

Universal Viewer is not able to display OCR data, unless there is a plugin I don’t know of. But it can use the OCR data for search (via the module IiifSearch).

Normally, no configuration is needed: the default one works fine on install. Manifests are created automatically by the module IIIF Server, and the info.json files are created automatically by the module Image Server. There is no iiif directory; it is a route path handled by the module. The same goes for the main manifests, although they may be cached for performance in the directory files/iiif.
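Since the iiif path is a route rather than a directory, one way to see what the server actually answers is to request the URL directly. Below is a minimal Python sketch, not part of any module: the helper names `info_json_url` and `probe` are hypothetical, the URL shape is copied from the log line in the question, and the base URL in the usage note is an assumption about the local setup.

```python
import json
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def info_json_url(base_url: str, media_id: int, version: int = 2) -> str:
    """Build an info.json URL shaped like the one in the log line
    (/iiif/2/11/info.json). base_url is the root of the Omeka S site."""
    return urljoin(base_url.rstrip("/") + "/", f"iiif/{version}/{media_id}/info.json")


def probe(url: str, timeout: float = 10.0):
    """Request a URL and return (HTTP status or None, parsed JSON or None)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code, None
    except (URLError, OSError):
        # The server could not be reached at all.
        return None, None
    try:
        return status, json.loads(body)
    except json.JSONDecodeError:
        # Reachable, but the answer is not JSON (for example an HTML error page).
        return status, None
```

For example, `probe(info_json_url("http://localhost:8000", 11))` would return the status and the parsed info.json for media 11, assuming the site is served at that address. A 404 here points at the web server or routing setup rather than at a missing folder.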

So is your media an image?

I want to do something similar to what was discussed here:

In that topic, you also link to La semaine agricole, revue agricole et politique de la France et de l'étranger · Collections de la Fondation Maison de Salins · Fondation Maison de Salins at one point. Something like this would already be great for us (the only difference is that we would also like the full OCR text shown on the side or something, but maybe this is not possible?). But from what I understand, you were using PDFs and extracting the OCR with another module at that point, so things might be different.

Yes, my media is an image

Thank you for your help!

As far as I know, there is no plugin in Universal Viewer to display the full OCR text on the side, but there is one in Mirador. Both are IIIF viewers anyway. To display the OCR text, you need an ALTO file attached to the item; if you don’t have one, you can create it with the module Extract Ocr (from a PDF). From an image, the process is more complex and not fully implemented in the module yet, so use another tool. The OCR is then automatically included in the IIIF manifest by the module IiifServer.

@Daniel_KM there is code for Universal Viewer that displays OCR/HTR, see GitHub - 4Science/universalviewer at ocr

You can see this working (as a PoC on a non-Omeka S site) via Open Archieven » Universal Viewer (with HTR). Click on the lower-right vertical bar “Resultaten HTR” (Dutch for “HTR results”) to open the annotations; you can also click on a line to highlight it in the scan. The “OCR module” reads an AnnotationCollection (like https://www.openarchieven.nl/not/annotations/NL-HaHGA_archief_0372-01_inventaris_8_deel_4_van_9-089.json), which I converted for this PoC from PageXML files (like https://www.openarchieven.nl/not/page/NL-HaHGA_archief_0372-01_inventaris_8_deel_4_van_9-089.xml).

I really think a lot of heritage organisations want this kind of functionality!

OK, so it is possible. The code is a fork of Universal Viewer v2, not a plugin for Universal Viewer, so it would take hard work to extract it and reimplement it in Universal Viewer v4. It may be simpler to implement the feature directly in v4. You could try copying the JS into the module.

As for the format, it should be possible to integrate PageXML into the module IIIF Server and manage it alongside ALTO and pdf2xml, but I always prefer to work with the standard (ALTO). So the better way is to create an XSLT stylesheet that converts PageXML to ALTO, if one doesn’t exist yet.
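To give an idea of what such a conversion involves, here is a rough sketch written in Python rather than XSLT (a swapped-in technique, only for illustration). It reads the TextLine elements of a PageXML file, turns each Coords polygon into a bounding box, and writes one ALTO TextLine per line. The function names are hypothetical, the whole line text is put into a single String element (ALTO normally has one String per word), and the output has not been validated against the ALTO schema.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def bounding_box(points: str):
    """Turn a PAGE 'points' string ("x1,y1 x2,y2 ...") into (x, y, width, height)."""
    pairs = [pair.split(",") for pair in points.split()]
    xs = [int(float(x)) for x, _ in pairs]
    ys = [int(float(y)) for _, y in pairs]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def _alto(tag: str) -> str:
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def page_to_alto(page_xml: str) -> str:
    """Convert the text lines of one PageXML document into a minimal ALTO string.

    Simplification (assumption): each line becomes one TextLine holding a
    single String with the full line text, inside one TextBlock.
    """
    ET.register_namespace("", ALTO_NS)
    root = ET.fromstring(page_xml)
    page = root.find("{*}Page")  # {*} ignores the PAGE namespace version
    if page is None:
        raise ValueError("No Page element found; is this a PageXML file?")

    alto = ET.Element(_alto("alto"))
    layout = ET.SubElement(alto, _alto("Layout"))
    alto_page = ET.SubElement(layout, _alto("Page"), {
        "WIDTH": page.get("imageWidth", "0"),
        "HEIGHT": page.get("imageHeight", "0"),
    })
    print_space = ET.SubElement(alto_page, _alto("PrintSpace"))
    block = ET.SubElement(print_space, _alto("TextBlock"))

    for line in page.findall(".//{*}TextLine"):
        coords = line.find("{*}Coords")
        text = line.find("{*}TextEquiv/{*}Unicode")
        if coords is None or text is None or not (text.text or "").strip():
            continue  # skip lines without geometry or without text
        x, y, width, height = bounding_box(coords.get("points", "0,0"))
        box = {"HPOS": str(x), "VPOS": str(y), "WIDTH": str(width), "HEIGHT": str(height)}
        alto_line = ET.SubElement(block, _alto("TextLine"), box)
        ET.SubElement(alto_line, _alto("String"), {"CONTENT": text.text.strip(), **box})

    return ET.tostring(alto, encoding="unicode")
```

A real stylesheet or script would also need word-level boxes, reading order, and the ALTO Description header, so treat this as a starting point only.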

Thank you for your answer! I’ve been trying Mirador since then. I did indeed have an alto file attached to the item.

Reading up on Mirador, I found the plugin mirador-ocr-helper, and it looks like it was used in: Protokolle des Bundesrates (1848-1972)

This was great news for me, since I want to do something exactly like that (I realize ALTO is maybe not the best choice, but this is flexible). I “activated” it in Settings->Players->Mirador plugins for v3->OCR helper

However, it is still not working for some reason. I tried creating an item with an image and an hOCR file attached (which should be supported, from what I understood), but the viewer looks like the standard Mirador.

I think it probably has something to do with the Mirador config as object string for v3 (item) field, so I tried reading about it and messing with it. Still, nothing seems to change

Examples of JSON I’ve tried:

{
  "window": {
    "allowClose": true,
    "textOverlay": {
      "enabled": true,
      "visible": true,
      "skipEmptyLines": true
    },
    "sideBarOpenByDefault": true,
    "panels": {
      "info": true
    }
  }
}

As I was writing this, I stumbled upon https://www.npmjs.com/package/@4eyes/mirador-ocr-helper?activeTab=code. Should my json look something like this? How do I even do it with the imports?

Once again, thank you for your help

You don’t need to modify the JSON config to display the OCR. You just need to select the OCR plugin, and the module takes it into account automatically when an OCR file is attached to the item. In fact, the OCR is attached to the IIIF manifest, so check the IIIF manifest for it.
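Checking the manifest by hand can be tedious, so here is a small Python sketch that lists OCR-looking links per canvas in an already loaded manifest (for example via `json.load`). It is only a sketch: the function names are hypothetical, and it assumes the OCR file is referenced from a canvas through `seeAlso` or `rendering`, which is the usual IIIF practice but has not been verified against the module’s output.

```python
def _as_list(value):
    """IIIF properties may hold one entry or a list; always return a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def canvases(manifest: dict):
    """Yield canvases from a Presentation v2 (sequences) or v3 (items) manifest."""
    if "sequences" in manifest:
        for sequence in _as_list(manifest["sequences"]):
            yield from _as_list(sequence.get("canvases"))
    else:
        for item in _as_list(manifest.get("items")):
            if isinstance(item, dict):
                yield item


def find_ocr_links(manifest: dict):
    """Return (canvas id, property, url) for every link that looks like OCR.

    "Looks like OCR" is a heuristic (assumption): format, profile, or label
    mentions alto or xml, or the URL ends with .xml.
    """
    found = []
    for canvas in canvases(manifest):
        canvas_id = canvas.get("@id") or canvas.get("id")
        for prop in ("seeAlso", "rendering"):
            for entry in _as_list(canvas.get(prop)):
                if isinstance(entry, str):
                    entry = {"id": entry}  # a bare URL instead of an object
                url = entry.get("@id") or entry.get("id") or ""
                hint = " ".join(
                    str(entry.get(key, "")) for key in ("format", "profile", "label")
                ).lower()
                if "alto" in hint or "xml" in hint or url.lower().endswith(".xml"):
                    found.append((canvas_id, prop, url))
    return found
```

An empty result for an item that has an ALTO file among its media suggests the problem is on the server side (the OCR is not being picked up), not in the viewer or its plugin settings.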

Yeah, maybe that’s it. There is no mention of the OCR in the manifest. Any idea how to fix this?

Also, maybe I am misunderstanding the “attaching” part. What I am doing is uploading both files as media to the same item, but maybe this is not the right way?

Still no success, any idea?

Dear Thomas,
Provide (valid) alto.xml files and use Mirador plugin version 3.4.7.16 and you should be good to go. Not all versions work well together.
Please let me know if this is of any help.

Hi Peter, thank you for your reply!

I just tried this here, and it didn’t work either.

I read somewhere about an issue where IIIF Search did not work with some XML files. The person was told to either use ExtractOCR or format the files the same way as the output of ExtractOCR (he did the latter and it apparently worked). So I tried ExtractOCR just to see whether it would work this way (only as a test: I need to use my own OCR for the project, but I could “translate” it to a format that works). It still doesn’t work.
(with ExtractOCR, I tried .tsv, .xml and .alto.xml, all generated by the module. Text was in fact extracted in all of my attempts, so it isn’t the case that the OCR is empty or something)
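A quick way to confirm that an XML file is well-formed ALTO and actually contains text is to count its String elements. The sketch below uses only the Python standard library; `summarize_alto` is a hypothetical helper name, and it checks well-formedness and basic content only, not validity against the ALTO schema.

```python
import xml.etree.ElementTree as ET


def summarize_alto(xml_text: str) -> dict:
    """Summarize an ALTO document: root element, line and word counts, sample text.

    ET.fromstring raises xml.etree.ElementTree.ParseError when the file is
    not well-formed XML, which is already a useful signal.
    """
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the {namespace} prefix
    strings = root.findall(".//{*}String")    # {*} matches any ALTO version
    words = [s.get("CONTENT", "") for s in strings]
    return {
        "root": local_name,
        "is_alto": local_name == "alto",
        "lines": len(root.findall(".//{*}TextLine")),
        "words": len([w for w in words if w.strip()]),
        # Viewers need coordinates to place the text on the image.
        "has_positions": bool(strings) and all(
            s.get("HPOS") is not None and s.get("VPOS") is not None for s in strings
        ),
        "sample": " ".join(words[:10]),
    }
```

For a file on disk, something like `summarize_alto(open("page1.alto.xml", encoding="utf-8").read())` would do, where the file name is only an example. Zero words, or missing positions, would explain a viewer that loads but shows no text overlay.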

The modules I have currently are:

(omeka-s 4.1.0)

  • Common 3.4.55
  • Extract Ocr 3.4.7
  • IIIF Search 3.4.8
  • IIIF Server 3.6.20
  • Image Server 3.6.17
  • Mirador Viewer 3.4.7.16 (was 3.4.9 before)

The viewer does appear, but not the things that would be expected with OCR Helper

(For the test with Extract OCR, I’m also adding an image to the media of the item after extraction, since Mirador doesn’t show PDFs.)

All help is welcome!

Dear Thomas,
We (Maastricht University) will go to production with IIIF in the upcoming month and have succeeded in a setup that works with these modules:

  • Advanced Search version 3.4.21
  • Advanced Search adapter for Solr version 3.5.45
  • Common version 3.4.55
  • Extract Text version 2.0.0 (for use with Advanced Search)
  • IIIF Search version 3.4.7
  • IIIF Server version 3.6.16
  • Image Server version 3.6.17

You can download this setup for local testing purposes at GitHub - MaastrichtU-Library/omekas-docker: Dockerized development environment for Omeka S