I have a collection of around 10k historical documents – originals are PDF files with embedded OCR text, and I’ve used the CsvImport plugin to import them into Omeka.
In this case, each item includes exactly one file, so each file effectively becomes a single item.
Because of this one-to-one relationship, I think it would be more straightforward from an end user's perspective to have the OCR text placed in the Item object rather than in the File object. It seems a bit unintuitive when a search returns File objects rather than the Items themselves. I also think transcription would work better if the text were in the Item.
I know that I can adjust which kinds of objects are searchable under Settings -> Search -> Search Record Types, but I am not yet sure whether, or how, I could change things so that the OCR text of the PDF files is stored with the Item. Ideally this would be done in a plugin.
If anyone can give me a pointer where to start, it would save me some time and I’d be grateful. Thanks!