OCR capability in Omeka

deepthimurali · February 28, 2023, 4:04pm

Hello,
Other than the Extract Text module, is there any way to make images of text documents (PDF and .jpeg) OCR-capable in Omeka S? The Extract Text module works on PDFs, but it “extracts” the text into a separate field called extract text. Thank you.

AllanaMayer · February 28, 2023, 4:26pm

Yes, Extract Text is designed to make a separate metadata field that just contains the text-only content for searchability. It can be a bit aesthetically unsatisfying, but you can use your resource template to hide that metadata field’s display while still including it in searching.

All of the “Extract Text” functions just copy text that is already present in the file: the text layer of a PDF, or the content of an HTML, DOCX, etc. file. They do not actually run optical character recognition on the files you upload.
https://omeka.org/s/docs/user-manual/modules/extracttext/

So, if you want to recognize text in images, you will need to do that separately using other software, then ingest both the image and text into Omeka (or combine the two into a PDF and upload that).
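A minimal sketch of the “combine the two into a PDF” route, assuming the open-source Tesseract OCR engine is installed and on your PATH. Tesseract is just one possible choice of “other software” and is not part of Omeka. Tesseract reads images (JPEG, PNG, TIFF), not PDFs; for already-scanned PDFs, a separate tool such as OCRmyPDF is an option. Passing Tesseract's `pdf` output config writes a PDF that contains the page image plus an invisible text layer, which is the kind of text layer Extract Text copies when the file is uploaded. The helper names `build_tesseract_cmd` and `ocr_to_searchable_pdf` are hypothetical, and `eng` is just an example language code.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image_path: str, out_base: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract CLI call that writes <out_base>.pdf with a hidden text layer."""
    # Tesseract appends ".pdf" itself, so out_base is given without an extension.
    # "-l" selects the OCR language; the trailing "pdf" selects PDF output.
    return ["tesseract", image_path, out_base, "-l", lang, "pdf"]


def ocr_to_searchable_pdf(image_path: str, lang: str = "eng") -> Path:
    """Run Tesseract on one image and return the path of the resulting PDF."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    # e.g. "scans/page1.jpeg" -> output base "scans/page1" -> "scans/page1.pdf"
    out_base = str(Path(image_path).with_suffix(""))
    subprocess.run(build_tesseract_cmd(image_path, out_base, lang), check=True)
    return Path(out_base + ".pdf")


if __name__ == "__main__":
    import sys

    # Usage: python ocr_to_pdf.py page1.jpeg page2.jpeg ...
    for path in sys.argv[1:]:
        print(ocr_to_searchable_pdf(path))
```

You would then upload the generated PDFs to Omeka S as media; Extract Text should pick up the OCR'd text from the text layer, just as it does for any other PDF that already has one.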

If I’m missing the point of your question, please let me know what you have in mind. Thanks!

deepthimurali · February 28, 2023, 4:48pm

Hello Allana,

Thank you so much for your reply. I am wondering whether you also know how to hide it, because on our end we could not get the extract text field to be hidden. Is there a setting that enables this? Thanks again. I greatly appreciate the help.

AllanaMayer · February 28, 2023, 6:11pm

Sorry, I forgot how I did it at first. Install the Hide Properties module and add the field to its list of hidden properties. It should automatically apply to both items and media.

deepthimurali · March 1, 2023, 12:17pm

Thanks so much! I think this resolves some of our issues.

system · February 24, 2024, 12:17pm

This topic was automatically closed 360 days after the last reply. New replies are no longer allowed.