Set up Universal Viewer with IIIF Server (beginner)

Hi, I am new to Omeka and to everything related to it. My final goal is to have a site that shows some historical documents that I have annotated and OCRed (ALTO XML).

From what I understood, I need the modules Common, IIIF Server, Image Server, and Universal Viewer. I’ve installed them all already.

I am first trying to test things and get familiar with everything, so I am running Omeka S on a local PHP server (localhost).

What I have now is that Universal Viewer apparently can’t show any image I try. I am uploading them in this way: New Item → Media → Add media → Upload

In the logs, I have the following message:
[404]: GET /iiif/2/11/info.json - No such file or directory

Indeed, I don’t seem to have an ‘iiif’ dir. I think I need to do some more configuration for the modules to work, but I am not sure how. The instructions on the official pages of the modules are a bit confusing to me, as someone who has never worked with these things.

I’ve read about manifests, but I am not sure how they come into play here. Do I need to manually write one for every image? Should one of the modules do it automatically (if configured right)? Where should they go? Is this even the issue?

Thank you for the help

Universal Viewer is not able to display OCR data, unless there is a plugin I don’t know of. But it can use the OCR data for search (via the module IiifSearch).

Normally, no configuration is needed: the default one works fine after install. Manifests are created automatically by the module IIIF Server, and the info.json files are created automatically by the module Image Server. There is no iiif directory: it is a route path handled by the module. The same goes for the main manifests, but they may be cached for performance in the directory files/iiif.
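To check whether these routes answer at all, a small Python sketch can build the two URLs and request them. The info.json path is the one from the log above (assuming 11 is the media ID). The manifest path /iiif/2/{item id}/manifest, the base URL http://localhost:8000 and the item ID 5 are assumptions for illustration, so adjust them to your install.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def iiif_urls(base_url: str, media_id: int, item_id: int, version: int = 2) -> dict:
    """Build the two route URLs to probe.

    The info.json pattern comes from the log line in the question;
    the manifest pattern is an assumption and may differ on your install.
    """
    base = base_url.rstrip("/") + "/"
    return {
        "info_json": urljoin(base, f"iiif/{version}/{media_id}/info.json"),
        "manifest": urljoin(base, f"iiif/{version}/{item_id}/manifest"),
    }


def probe(url: str, timeout: float = 5.0) -> str:
    """Return the HTTP status (or the error) for one URL, without raising."""
    try:
        with urlopen(url, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return f"{response.status} {content_type}".strip()
    except HTTPError as error:  # the server answered with 4xx/5xx
        return f"{error.code} {error.reason}"
    except URLError as error:  # the server could not be reached
        return f"unreachable: {error.reason}"
    except OSError as error:  # e.g. a read timeout
        return f"error: {error}"
```

Calling probe(url) on each value of iiif_urls("http://localhost:8000", media_id=11, item_id=5) gives one status string per route instead of an exception, which makes it easy to compare the two.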

So is your media an image?

I want to do something similar to what was discussed here:

In that topic, you also link at one point to "La semaine agricole, revue agricole et politique de la France et de l'étranger" (Collections de la Fondation Maison de Salins). Something like this would already be great for us (the only difference is that we would also like to have the full OCR text shown on the side or something, but maybe this is not possible?). But from what I understand, you were using PDFs and extracting the OCR with another module at that point, so things might be different.

Yes, my media is an image

Thank you for your help!

As far as I know, there is no plugin in Universal Viewer to display the full OCR text on the side, but there is one in Mirador. Anyway, they are both IIIF viewers. To display the OCR text, you should have an ALTO file attached to the item, or, if you don’t, you can create it with the module Extract Ocr (from a PDF). From an image, the process is more complex and not fully implemented in the module yet, so use another tool. Then the OCR is automatically included in the IIIF manifest by the module IiifServer.

@Daniel_KM there is code for Universal Viewer to display OCR/HTR: see the "ocr" branch of 4Science/universalviewer on GitHub.

You can see this working (as a PoC, on a non-Omeka S site) at Open Archieven » Universal Viewer (with HTR). Click on the vertical bar “Resultaten HTR” (Dutch for “HTR results”) at the lower right to open the annotations; you can also click on a line to highlight it in the scan. The “OCR module” reads an AnnotationCollection (like https://www.openarchieven.nl/not/annotations/NL-HaHGA_archief_0372-01_inventaris_8_deel_4_van_9-089.json), which I converted for this PoC from PageXML files (like https://www.openarchieven.nl/not/page/NL-HaHGA_archief_0372-01_inventaris_8_deel_4_van_9-089.xml).

I really think a lot of heritage organisations want this kind of functionality!

Ok, it is possible. The code is a fork of Universal Viewer v2, not a plugin for Universal Viewer, so there is hard work to do to extract it and reimplement it in Universal Viewer v4. It may be simpler to implement it directly in v4. You may try to copy the JS into the module.

As for the format, it should be possible to integrate PageXML into the module IIIF Server in order to manage it alongside ALTO and pdf2xml, but I always prefer to work with a standard (ALTO), so the better way is to create an XSLT sheet to convert PageXML to ALTO, if one doesn’t exist yet.
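To give an idea of what such a conversion involves, here is a minimal sketch in Python (not the XSLT sheet suggested above). It assumes PageXML files with TextLine elements carrying a Coords polygon and a line-level TextEquiv/Unicode, and it writes one ALTO String per line, which is a simplification since ALTO normally has one String per word. The function names and the IDs page_1 and block_1 are made up for the example, and whether the output satisfies a given module has not been checked.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def local(tag: str) -> str:
    """Strip the namespace so that any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def child(element, name):
    """Return the first direct child with the given local name, or None."""
    return next((c for c in element if local(c.tag) == name), None)


def bbox(points: str):
    """Turn a PAGE 'x1,y1 x2,y2 ...' polygon into (hpos, vpos, width, height)."""
    pairs = [tuple(int(float(v)) for v in p.split(",")) for p in points.split()]
    xs, ys = zip(*pairs)
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def page_to_alto(page_xml: str) -> str:
    """Convert the text lines of one PageXML document to a bare ALTO document."""
    root = ET.fromstring(page_xml)
    page = next(el for el in root.iter() if local(el.tag) == "Page")
    ET.register_namespace("", ALTO_NS)  # serialize ALTO as the default namespace
    alto = ET.Element(f"{{{ALTO_NS}}}alto")
    layout = ET.SubElement(alto, f"{{{ALTO_NS}}}Layout")
    alto_page = ET.SubElement(layout, f"{{{ALTO_NS}}}Page", {
        "ID": "page_1",
        "WIDTH": page.get("imageWidth", "0"),
        "HEIGHT": page.get("imageHeight", "0"),
    })
    space = ET.SubElement(alto_page, f"{{{ALTO_NS}}}PrintSpace")
    block = ET.SubElement(space, f"{{{ALTO_NS}}}TextBlock", {"ID": "block_1"})
    lines = [el for el in page.iter() if local(el.tag) == "TextLine"]
    for number, line in enumerate(lines, start=1):
        coords = child(line, "Coords")
        text_equiv = child(line, "TextEquiv")
        unicode_el = child(text_equiv, "Unicode") if text_equiv is not None else None
        if coords is None or not coords.get("points"):
            continue  # a line without geometry cannot be positioned
        hpos, vpos, width, height = (str(v) for v in bbox(coords.get("points")))
        box = {"HPOS": hpos, "VPOS": vpos, "WIDTH": width, "HEIGHT": height}
        text_line = ET.SubElement(
            block, f"{{{ALTO_NS}}}TextLine", {"ID": f"line_{number}", **box}
        )
        content = (unicode_el.text or "") if unicode_el is not None else ""
        ET.SubElement(text_line, f"{{{ALTO_NS}}}String", {"CONTENT": content, **box})
    return ET.tostring(alto, encoding="unicode")
```

The bounding box is simply the smallest rectangle around the PAGE polygon, so skewed or curved lines lose some precision in the ALTO output.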

Thank you for your answer! I’ve been trying Mirador since then. I did indeed have an alto file attached to the item.

Reading up on Mirador, I’ve found the plugin mirador-ocr-helper, and it looks like it was used in: Protokolle des Bundesrates (1848-1972)

This was great news for me, since I want to do something exactly like that (I realize ALTO is maybe not the best choice, but this is flexible). I “activated” it in Settings → Players → Mirador plugins for v3 → OCR helper

However, it is still not working for some reason. I tried creating an item with an image and an hOCR file attached (which should be supported, from what I understood), but the viewer looks like the standard Mirador.

I think it probably has something to do with the “Mirador config as object string for v3 (item)” field, so I tried reading about it and messing with it. Still, nothing seems to change.

Examples of JSON I’ve tried:

{
  "window": {
    "allowClose": true,
    "textOverlay": {
      "enabled": true,
      "visible": true,
      "skipEmptyLines": true
    },
    "sideBarOpenByDefault": true,
    "panels": {
      "info": true
    }
  }
}

As I was writing this, I stumbled upon https://www.npmjs.com/package/@4eyes/mirador-ocr-helper?activeTab=code. Should my json look something like this? How do I even do it with the imports?

Once again, thank you for your help

You don’t need to modify the JSON config to display the OCR: you just need to select the OCR plugin, and the module takes it into account automatically when an OCR file is attached to the item. In fact, the OCR is attached to the IIIF manifest, so check the IIIF manifest for it.
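A rough way to do that check is to download the manifest JSON and search it for entries that look like OCR. The sketch below is only a heuristic: it assumes the OCR would show up as an entry (for example under seeAlso) whose format, profile or id mentions "alto", "hocr" or "ocr". The function name and the sample URLs are made up for the example.

```python
import json

# Substrings treated as a hint that an entry points at OCR (heuristic).
OCR_HINTS = ("alto", "hocr", "ocr")


def find_ocr_links(node, path="manifest"):
    """Recursively collect (path, id) for entries that look like OCR references."""
    found = []
    if isinstance(node, dict):
        fields = [node.get(key) for key in ("format", "profile", "@id", "id")]
        text = " ".join(value.lower() for value in fields if isinstance(value, str))
        if any(hint in text for hint in OCR_HINTS):
            found.append((path, node.get("@id") or node.get("id")))
        for key, value in node.items():
            found.extend(find_ocr_links(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_ocr_links(value, f"{path}[{index}]"))
    return found


# A tiny hand-written manifest fragment, only to show the shape of the result.
sample = json.loads("""
{
  "type": "Manifest",
  "items": [{
    "type": "Canvas",
    "id": "https://example.org/iiif/3/5/canvas/p1",
    "seeAlso": [{
      "id": "https://example.org/files/original/page1.alto.xml",
      "type": "Dataset",
      "format": "application/alto+xml"
    }]
  }]
}
""")

for where, target in find_ocr_links(sample):
    print(where, "->", target)
```

An empty result on your own manifest would match what the viewer shows: no OCR reference reached the manifest, so the plugin has nothing to load.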

Yeah, maybe that’s it. There is no mention of the OCR in the manifest. Any idea how to fix this?

Also, maybe I am misunderstanding the “attaching” part. What I am doing is uploading both files as media to the same item, but maybe this is not the right way?
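One way to see what Omeka S recorded for each upload is to look at the media of the item through the REST API and read the o:media_type of each one. The sketch below only sorts already-downloaded media JSON into groups. The idea that an ALTO file should carry the precise type application/alto+xml rather than a generic XML type is an assumption to verify against the module documentation, and the function name and sample data are made up for the example.

```python
# Media types treated as "precise" OCR types (assumption, see the text above).
PRECISE_OCR_TYPES = {"application/alto+xml"}
# Generic XML types that a plain upload may end up with.
GENERIC_XML_TYPES = {"text/xml", "application/xml"}


def summarize_media(media_list):
    """Group the media of one item by kind, based on the 'o:media_type' field."""
    summary = {"image": [], "precise_ocr": [], "generic_xml": [], "other": []}
    for media in media_list:
        media_type = (media.get("o:media_type") or "").lower()
        label = media.get("o:source") or media.get("o:filename") or str(media.get("o:id"))
        if media_type.startswith("image/"):
            summary["image"].append(label)
        elif media_type in PRECISE_OCR_TYPES:
            summary["precise_ocr"].append(label)
        elif media_type in GENERIC_XML_TYPES:
            summary["generic_xml"].append(label)
        else:
            summary["other"].append(label)
    return summary


# Hand-written sample in the shape of the API's media representations.
example_media = [
    {"o:id": 11, "o:media_type": "image/jpeg", "o:source": "page1.jpg"},
    {"o:id": 12, "o:media_type": "text/xml", "o:source": "page1.xml"},
]
print(summarize_media(example_media))
```

If the XML file lands in the generic_xml group, the recorded media type is a plausible place to look next, but that is a guess and not something confirmed in this thread.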