I’ve been trying to get this working and obviously I’m doing something wrong because it’s just not working!
I have documents with multiple pages. To reduce file sizes, I split the document’s .pdf into multiple files: 1 pdf file for each page of the document.
I’m using Universal Viewer to view the files. I’d like to be able to page through the files like I can through a series of .jpg files, but the paginator icons in UV are greyed out and the pagination bar at the top of the viewer is not visible. Can I page through multiple .pdf files in the Universal Viewer?
I want to be able to search the text of the .pdf files like in this example: Universal Viewer Content Search :: IIIF Technical Workshop
I have fiddled with various configurations and I can’t get the search bar to show up. Please help.
Here’s what I’m using right now:
Omeka S 3.2.0
Universal Viewer 22.214.171.124
Image Server 126.96.36.199
IIIF Server 188.8.131.52
Extract OCR 184.108.40.206
IIIF Search 220.127.116.11
I’m attaching screenshots of my Image Server and IIIF Server config pages, in case they’re useful.
(edited to add IIIF search module version)
@Daniel_KM do you have any recommendations?
The IIIF specifications don’t cover pdf; they handle only three things: image, audio and video (and 3D in the next version). Universal Viewer goes beyond the specifications so that it can display other formats, including pdf. So the pdf is not displayed according to the specifications but with a pdf viewer, and a search inside a pdf is not done the IIIF way but as in any pdf.
To be able to search with IIIF, there should be an image for each page. Then the module IIIF Search will let you search inside the viewer. With this module, the text must be in a specific and simple format, pdf2xml, which the module ExtractOcr extracts from a pdf attached to the item.
So to have search in Universal Viewer, Mirador or any other IIIF viewer, you have to:
- install the modules IiifSearch and ExtractOcr
- create an item with all images and the pdf attached
I have improved the module so it can use ALTO xml as the source of OCR for IIIF search, which avoids having to extract it from a pdf. It will be released soon.
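For anyone who wants to check the search service outside the viewer, here is a minimal sketch in Python. It assumes the service answers in the IIIF Content Search API 1.0 style (a `q` parameter in, an annotation list with `resources` out, each target in `on` ending in `#xywh=x,y,w,h`). The service URL, the canvas URL and the helper names `build_search_url` and `extract_hits` are hypothetical, for illustration only; they are not taken from the IiifSearch module.

```python
from urllib.parse import urlencode


def build_search_url(service_url: str, query: str) -> str:
    """Append the Content Search `q` parameter to a search service URL."""
    return f"{service_url}?{urlencode({'q': query})}"


def extract_hits(annotation_list: dict) -> list:
    """Return (canvas_uri, (x, y, w, h)) for each annotation with an xywh target."""
    hits = []
    for annotation in annotation_list.get("resources", []):
        target = annotation.get("on", "")
        # Assumption: `on` is a plain string such as "<canvas>#xywh=10,20,30,40".
        if not isinstance(target, str):
            continue
        canvas, _, fragment = target.partition("#xywh=")
        if not fragment:
            continue
        x, y, w, h = (int(value) for value in fragment.split(","))
        hits.append((canvas, (x, y, w, h)))
    return hits


# Hand-written sample response (not real server output), shaped like a
# Content Search API 1.0 annotation list.
sample_response = {
    "@context": "http://iiif.io/api/search/1/context.json",
    "@type": "sc:AnnotationList",
    "resources": [
        {
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {"@type": "cnt:ContentAsText", "chars": "bird"},
            "on": "https://example.org/iiif/item/canvas/p1#xywh=100,200,50,20",
        }
    ],
}
```

Each hit maps a matched word to a rectangle on one canvas, which is why the viewer needs one image per page to highlight results.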
@Daniel_KM Thank you.
I’m now pursuing a jpg + alto xml solution.
I have ALTO xml transcriptions that were generated by ABBYY FineReader. I’d love to use them with your new release of the IIIF Search module. When are you planning to release it?
In testing search with a pdf file and the generated pdf2xml, I’m able to search for occurrences of a single word. But if I search for a phrase, then I get all occurrences of each word, not matches on the phrase. It’s the same when I put my phrase in quotes.
It would be awesome if this plugin combination could search phrases instead.
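To make the difference concrete, here is a small Python sketch of the two behaviours over a page’s list of words. It is only an illustration of any-word matching versus phrase matching, not how the IIIF Search module is implemented; the function names `tokenize`, `any_word_matches` and `phrase_matches` are made up for this example.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and split it into words, dropping punctuation and quotes."""
    return re.findall(r"\w+", text.lower())


def any_word_matches(words: list, query: str) -> list:
    """Indexes of every word that equals any query word (the behaviour I'm seeing)."""
    wanted = set(tokenize(query))
    return [i for i, word in enumerate(words) if word in wanted]


def phrase_matches(words: list, query: str) -> list:
    """Start indexes where the query words appear consecutively and in order."""
    phrase = tokenize(query)
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]


page_words = tokenize("The red bird saw a red fox and a bird")
```

With `page_words` as above, `any_word_matches(page_words, "red bird")` reports every "red" and every "bird", while `phrase_matches(page_words, "red bird")` reports only the one place where the two words are adjacent.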
Still interested in and excited about @Daniel_KM’s ALTO solution, and I’m trying to be patient.