Set up Universal Viewer with iiif server (beginner)

Hi, I am new to Omeka and to everything related to it. My final goal is to have a site that shows some historical documents that I have annotated and OCRed (ALTO XML).

From what I understood, I need the modules Common, IIIF Server, Image Server, and Universal Viewer. I’ve installed them all already.

I am first trying to test and get familiar with everything, so I am running omeka-s on a local PHP server (localhost).

The problem I have now is that Universal Viewer apparently can’t show any image I try. I am uploading them this way: New Item → Media → Add media → Upload

In the logs, I have the following message:
[404]: GET /iiif/2/11/info.json - No such file or directory

Indeed, I don’t seem to have an ‘iiif’ dir. I think I need to make some more configuration for the modules to work, but I am not sure how to. The instructions in the official pages of the modules are a bit confusing to me as someone who never worked with those things.

I’ve read about manifests, but I am not sure how they come into play here. Do I need to manually write one for every image? Should one of the modules do it automatically (if configured right)? Where should they go? Is this even the issue?

Thank you for the help

Universal Viewer is not able to display OCRed data, unless there is a plugin I don’t know of. But it can use the OCRed data for search (via the module IiifSearch).

Normally, no configuration is needed: the default one works fine after install. Manifests are created automatically by the module Iiif Server, and info.json files are created automatically by the module Image Server. There is no iiif directory: it is a route handled by the module. The same goes for the main manifests, although they may be cached for performance in the directory files/iiif.
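To make the "route, not directory" point concrete, here is a minimal sketch that builds the two URLs you would expect the modules to answer and lets you check their HTTP status. The info.json pattern is taken from the log line above; the manifest pattern, the idea that the number in the info.json URL is the media id, and the helper names (`iiif_urls`, `fetch_status`) are my assumptions, so check them against your own install.

```python
import urllib.error
import urllib.request


def iiif_urls(base_url, item_id, media_id, version=2):
    """Build the URLs that the IIIF routes are expected to answer.

    Assumptions (not confirmed in this thread): the manifest lives at
    /iiif/<version>/<item id>/manifest and the number in the info.json
    URL is the media id.
    """
    root = base_url.rstrip("/") + f"/iiif/{version}/"
    return {
        "manifest": f"{root}{item_id}/manifest",
        "info": f"{root}{media_id}/info.json",
    }


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url (for example 200 or 404).

    Connection errors (server not running) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Example usage against a local install (hypothetical ids):
# for name, url in iiif_urls("http://localhost:8000", 5, 11).items():
#     print(name, url, fetch_status(url))
```

If both URLs return 404, the requests are probably not reaching Omeka at all, which points to the web server setup rather than to a missing file.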

So is your media an image?

I want to do something similar to what was discussed here:

In this topic, you also link La semaine agricole, revue agricole et politique de la France et de l'étranger · Collections de la Fondation Maison de Salins · Fondation Maison de Salins at one point. Something like this would already be great for us (the only difference is we would also like to have the full ocrized text also shown on the side or something, but maybe this is not possible?). But from what I understand you were using pdf and extracting ocr with another module at this point, so things might be different.

Yes, my media is an image

Thank you for your help!

As far as I know, there is no plugin in Universal Viewer to display the full OCRed text on the side, but there is one in Mirador. Anyway, both are IIIF viewers. To display the OCR text, you need an ALTO file attached to the item, or if you don’t have one, you can create it with the module Extract Ocr (from PDF). From an image, the process is more complex and not fully implemented in the module yet, so use another tool. Then, the OCR is automatically included in the IIIF manifest by the module IiifServer.

@Daniel_KM there is code for Universal Viewer to display OCR/HTR, see GitHub - 4Science/universalviewer at ocr

You can see this working (in a PoC style, non-Omeka S site) via Open Archieven » Universal Viewer (with HTR). Click on the lower right vertical bar “Resultaten HTR” (Dutch for “Results HTR”) to open the annotations. You can also click on lines to highlight them in the scan. The “OCR module” reads an AnnotationCollection (like https://www.openarchieven.nl/not/annotations/NL-HaHGA_archief_0372-01_inventaris_8_deel_4_van_9-089.json), which I converted for this PoC from PageXML files (like https://www.openarchieven.nl/not/page/NL-HaHGA_archief_0372-01_inventaris_8_deel_4_van_9-089.xml).

I really think a lot of heritage organisations want this kind of functionality!

Ok, it is possible. The code is a fork of Universal Viewer v2, not a plugin for Universal Viewer, so there is hard work to do to extract it and reimplement it in Universal Viewer v4. It may be simpler to implement it directly in v4. You may try to copy the js into the module.

For the format, it should be possible to integrate PageXML in the module Iiif Server in order to manage it alongside ALTO and pdf2xml, but I always prefer to work with a standard (ALTO), so the better way is to create an XSLT stylesheet to convert PageXML to ALTO, if one doesn’t exist yet.
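To give an idea of what such a conversion has to do, here is a rough sketch in Python (not the XSLT mentioned above) that reads the text lines of a PageXML file and computes, for each line, the bounding box that ALTO stores as HPOS, VPOS, WIDTH and HEIGHT. It assumes the PageXML variant where `Coords` carries a `points="x,y x,y ..."` attribute; it does not write an ALTO file, and the function names are mine.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so any PageXML schema version matches."""
    return tag.rsplit("}", 1)[-1]


def child(element, name):
    """Return the first direct child with the given local name, or None."""
    return next((c for c in element if local(c.tag) == name), None)


def page_lines(xml_text):
    """Extract text and bounding box for every TextLine in a PageXML string."""
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        if local(el.tag) != "TextLine":
            continue
        coords = child(el, "Coords")
        if coords is None or not coords.get("points"):
            continue  # older PageXML uses <Point> children; not handled here
        points = [
            tuple(int(float(v)) for v in pair.split(","))
            for pair in coords.get("points").split()
        ]
        xs, ys = zip(*points)
        # Only the line-level TextEquiv is read, not the word-level ones.
        equiv = child(el, "TextEquiv")
        unicode_el = child(equiv, "Unicode") if equiv is not None else None
        text = unicode_el.text if unicode_el is not None and unicode_el.text else ""
        lines.append({
            "id": el.get("id"),
            "text": text,
            "hpos": min(xs),
            "vpos": min(ys),
            "width": max(xs) - min(xs),
            "height": max(ys) - min(ys),
        })
    return lines
```

A real converter would also need the page size, the region structure and word-level boxes, which is why a maintained XSLT or an existing conversion tool is likely the safer route.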

Thank you for your answer! I’ve been trying Mirador since then. I did indeed have an alto file attached to the item.

Reading up on Mirador, I’ve found the plugin mirador-ocr-helper, and it looks like it was used in: Protokolle des Bundesrates (1848-1972)

This was great news for me, since I want to do something exactly like that (I realize ALTO is maybe not the best choice, but this is flexible). I “activated” it in Settings->Players->Mirador plugins for v3->OCR helper

However, it is still not working for some reason. I tried creating an item with an image and an hOCR file attached (which should be supported from what I understood), but the viewer looks like the standard Mirador.

I think it probably has something to do with the “Mirador config as object string for v3 (item)” field, so I tried reading about it and messing with it. Still, nothing seems to change.

examples of json I’ve tried:

{
  "window": {
    "allowClose": true,
    "textOverlay": {
      "enabled": true,
      "visible": true,
      "skipEmptyLines": true
    },
    "sideBarOpenByDefault": true,
    "panels": {
      "info": true
    }
  }
}

As I was writing this, I stumbled upon https://www.npmjs.com/package/@4eyes/mirador-ocr-helper?activeTab=code. Should my json look something like this? How do I even do it with the imports?

Once again, thank you for your help

You don’t need to modify the json config to display the OCR: you just need to select the OCR plugin, and the module takes it into account automatically when an OCR file is attached to the item. In fact, the OCR is attached to the IIIF manifest, so check the IIIF manifest for it.
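If reading the manifest by eye is tedious, something like the following sketch can list the OCR-looking links per canvas. It handles both Presentation API 2 (`sequences`/`canvases`) and 3 (`items`) layouts. Which property the module actually uses for the OCR file is an assumption on my side (I look at `seeAlso` and `rendering`), and `ocr_links` is just an illustrative name.

```python
def as_list(value):
    """Normalise a manifest property that may be missing, a single value or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def canvases(manifest):
    """Yield canvases from an IIIF Presentation 2 or 3 manifest."""
    for sequence in manifest.get("sequences", []):  # Presentation 2
        yield from sequence.get("canvases", [])
    for item in manifest.get("items", []):  # Presentation 3
        if item.get("type") == "Canvas":
            yield item


def ocr_links(manifest):
    """Return (canvas id, property, url) for links that look like XML or hOCR files."""
    found = []
    for canvas in canvases(manifest):
        canvas_id = canvas.get("@id") or canvas.get("id")
        for key in ("seeAlso", "rendering"):  # assumed locations of the OCR link
            for entry in as_list(canvas.get(key)):
                if isinstance(entry, str):
                    url, fmt = entry, ""
                else:
                    url = entry.get("@id") or entry.get("id") or ""
                    fmt = entry.get("format", "")
                if "xml" in fmt.lower() or url.lower().endswith((".xml", ".hocr")):
                    found.append((canvas_id, key, url))
    return found


# Example usage with a manifest saved from the browser (hypothetical file name):
# import json
# with open("manifest.json", encoding="utf-8") as handle:
#     print(ocr_links(json.load(handle)))
```

An empty result means the manifest carries no such link on any canvas, so the viewer plugin has nothing to load.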

yeah maybe that’s it. There is no mention of the ocr in the manifest. Any idea how to fix this?

Also, maybe I am misunderstanding the “attaching” part. What I am doing is uploading both files as media to the same item, but maybe this is not the right way?

Still no success, any idea?

Dear Thomas,
Provide (valid) alto.xml files and use Mirador plugin version 3.4.7.16, and you should be good to go. Not all versions work well together.
Please let me know if this is of any help.

Hi Peter, thank you for your reply!

I just tried this here, and it didn’t work either.

I read somewhere about an issue with IIIF Search not working with some XML files someone was using. He was instructed to use ExtractOCR or to format his files in the same way as the output of ExtractOCR (he did the latter and it apparently worked). So I tried using ExtractOCR just to see if it would work this way (just for testing, because I need to use my own OCR for the project, but I could “translate” it to a format that would work), but it still doesn’t work.
(With ExtractOCR, I tried .tsv, .xml and .alto.xml, all generated by the module. Text was in fact extracted in all of my attempts, so it isn’t the case that the OCR is empty or something.)

The modules I have currently are:

(omeka-s 4.1.0)

  • Common 3.4.55
  • Extract Ocr 3.4.7
  • IIIF Search 3.4.8
  • IIIF Server 3.6.20
  • Image Server 3.6.17
  • Mirador Viewer 3.4.7.16 (was 3.4.9 before)

The viewer does appear, but not the things that would be expected with OCR Helper

(For the test with Extract OCR, I’m also adding an image to the media of the item after extraction, since Mirador doesn’t show PDF.)

All help is welcome! :)

Dear Thomas,
We (Maastricht University) will go to production with IIIF in the upcoming month and have succeeded in a setup that works with these modules:

  • Advanced Search 3.4.21
  • Advanced Search adapter for Solr 3.5.45
  • Common 3.4.55
  • Extract Text 2.0.0 (for use with Advanced Search)
  • IIIF Search 3.4.7
  • IIIF Server 3.6.16
  • Image Server 3.6.17

You can download this setup for local testing purposes at GitHub - MaastrichtU-Library/omekas-docker: Dockerized development environment for Omeka S

Dear Peter,
does your setup support in-book search and eventually also copying of OCRed text? If yes, could you please give me a link to such a book on the Maastricht University Library page?
Thank you in advance
Milos

Dear Milos,
Sorry to disappoint you, but currently we only have IIIF books without OCR in our public environment.
However, we made a proof of concept in our test and acceptance environments, so it is possible to have OCRed books in IIIF.
To get that working, you need both an image file and an .xml file for each page. In that case, search highlighting becomes available as well.
This works best (only?) with the Mirador viewer.
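Before uploading, it can help to verify that every page really has both files. Here is a small sketch that pairs images and XML files by base name and reports the ones left over. That the files are matched by base name (for example p001.jpg with p001.alto.xml or p001.xml) is an assumption for this check, not something stated about the modules, and the function name is illustrative.

```python
from pathlib import PurePath

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".jp2"}


def xml_stem(name):
    """Base name of an OCR file, stripping '.alto.xml' or plain '.xml'."""
    filename = PurePath(name).name
    for suffix in (".alto.xml", ".xml"):
        if filename.lower().endswith(suffix):
            return filename[: -len(suffix)]
    return filename


def pair_pages(filenames):
    """Group file names into image/XML pairs and report unmatched files."""
    images, ocr = {}, {}
    for name in filenames:
        path = PurePath(name)
        if path.name.lower().endswith(".xml"):
            ocr[xml_stem(name)] = name
        elif path.suffix.lower() in IMAGE_EXTENSIONS:
            images[path.stem] = name
    return {
        "paired": [(images[s], ocr[s]) for s in sorted(images.keys() & ocr.keys())],
        "images_without_ocr": sorted(images[s] for s in images.keys() - ocr.keys()),
        "ocr_without_image": sorted(ocr[s] for s in ocr.keys() - images.keys()),
    }
```

For example, `pair_pages(["p001.jpg", "p001.alto.xml", "p002.jpg"])` reports one pair and lists p002.jpg as an image without OCR.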

Dear Peter,
thank you for your fast response. Yes, we have image+xml pairs. Is your test environment also available on Github?

Milos

There was a French meeting last week (journées annuelles Biblissima) where we saw that the configuration of IIIF is complex, so there will be a new single module “IIIF” that will merge all features, with all settings in one place. See Merge with ExtractOcr · Issue #5 · smachefert/Omeka-S-module-IiifSearch · GitHub for more information.

Unfortunately, I also have problems displaying my full text correctly in Mirador. I have both alto/xml and hOCR available. The display of the images works so far, the search also, but I do not get the full text displayed as in the link mentioned. What do I have to set to display the green marked area?

Omeka 4.1.1
Common 3.4.32
IIIF Search 3.4.8
IIIF Server 3.6.22
Image Server 3.6.19
Mirador Viewer 3.4.8

In the meantime I have been able to find out that the mentioned text overlay is an NPM package as a plugin for Mirador.

I managed to start the extension for Mirador as a standalone application. But unfortunately I am not able to integrate it into the Omeka-S-module-Mirador. I have also tried it with @Daniel_KM’s Mirador-integration-Omeka-Module, but I had no success that way either.

Another Mirador plugin that interests me would be this one for 3D objects. The same here too: no problem as a stand-alone application, but integration into the Omeka module fails.

I would be very happy to receive help
