Other than the Extract Text module, is there any way to make images (PDF and .jpeg) of text documents OCR-capable on Omeka S? The Extract Text module works on PDFs, but it “extracts” the text into a separate field called extract text. Thank you.
Yes, Extract Text is designed to make a separate metadata field that just contains the text-only content for searchability. It can be a bit aesthetically unsatisfying, but you can use your resource template to hide that metadata field’s display while still including it in searching.
All of the “Extract Text” functions just replicate text that is already embedded in the file, such as the text layer of a PDF or the content of an HTML or DOCX file. They are not actually running optical character recognition on the files you upload.
So, if you want to recognize text in images, you will need to do that separately using other software, then ingest both the image and text into Omeka (or combine the two into a PDF and upload that).
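As a rough illustration of the "other software" step, here is a minimal Python sketch that calls the Tesseract command-line tool on an image. Tesseract is only one possible OCR tool and is my assumption here, not something Omeka S ships with or requires; the function names and the `ocr_output` folder are made up for the example, and it assumes `tesseract` is already installed and on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image_path, output_base, lang="eng"):
    """Build a Tesseract command that writes <output_base>.pdf and <output_base>.txt.

    The trailing "pdf" and "txt" arguments are Tesseract output configs:
    "pdf" produces a searchable PDF (image plus text layer) and
    "txt" produces a plain-text file.
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-l", lang,
        "pdf", "txt",
    ]


def ocr_image(image_path, output_dir="ocr_output", lang="eng"):
    """Run OCR on one image and return the paths of the PDF and text outputs.

    Hypothetical helper for illustration; it needs the tesseract binary.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")

    image_path = Path(image_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Tesseract appends the extensions itself, so pass the base name only.
    output_base = output_dir / image_path.stem
    subprocess.run(build_ocr_command(image_path, output_base, lang), check=True)

    return Path(str(output_base) + ".pdf"), Path(str(output_base) + ".txt")
```

Calling something like `ocr_image("letter.jpg")` should leave a searchable PDF and a text file in `ocr_output`. You could then upload the PDF so that Extract Text can read its text layer, or attach the image and paste the text into a metadata field yourself.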
If I’m missing the point of your question, please let me know what you have in mind. Thanks!
Thank you so much for your reply. Do you also know how to “hide” the field? On our end we could not get the extract text field to be hidden. Is there some setting that enables this? Thanks again. Greatly appreciate the help.
Sorry, I forgot how I did it at first. You install the Hide Properties module and add the field there. It should automatically work for items and media.
Thanks so much! I think this resolves some of our issues.