Store File text in Item object?

shodges · December 22, 2017, 7:01pm

Hi…

I have a collection of around 10k historical documents – originals are PDF files with embedded OCR text, and I’ve used the CsvImport plugin to import them into Omeka.

In this case, each item includes exactly one file, so each file effectively becomes a single item.

Because of this one-to-one relationship, I think it would be more straightforward from an end user's perspective to have the OCR text placed into the Item object rather than into the File object. It seems a bit unintuitive when a search returns File objects rather than the Items themselves. Also, I think transcription would work better if the text is in the Item.

I know that I can adjust which kinds of objects are searchable under Settings -> Search -> Search Record Types. But I am not yet sure whether, or how, I could arrange for the OCR text of the PDF files to be stored with the Item instead. It would of course be preferable to do this in a plugin.

If anyone can give me a pointer where to start, it would save me some time and I’d be grateful. Thanks!

shodges · December 22, 2017, 11:45pm

A later thought: I suppose I could directly manipulate the omeka_element_texts table. The rows containing the PDF text have record_type of ‘File’ and element_id of 52; changing these to record_type ‘Item’ and element_id 1 seems to accomplish what I want – the text then belongs to the ‘Text’ field of the ‘Text’ item type. I’m not crazy about doing it this way, but it may be the simplest approach.
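
As a rough sketch of that table edit (not a tested Omeka procedure), the Python below runs the update against an in-memory SQLite stand-in for the database. The element ids 52 and 1 and the table name omeka_element_texts come from the post; the column layout, the omeka_files table with its item_id column, and the helper names are assumptions made for illustration. One further assumption: record_id on a File row holds the file's id rather than the item's, so the sketch also remaps record_id to the parent item. If the file and item ids happen to coincide after the import, that remapping changes nothing.

```python
import sqlite3

FILE_TEXT_ELEMENT_ID = 52  # from the post: element holding the PDF text on File records
ITEM_TEXT_ELEMENT_ID = 1   # from the post: 'Text' field of the 'Text' item type


def build_demo_db():
    """Create a tiny in-memory stand-in for the two tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE omeka_files (
            id INTEGER PRIMARY KEY,
            item_id INTEGER NOT NULL
        );
        CREATE TABLE omeka_element_texts (
            id INTEGER PRIMARY KEY,
            record_id INTEGER NOT NULL,
            record_type TEXT NOT NULL,
            element_id INTEGER NOT NULL,
            text TEXT NOT NULL
        );
        INSERT INTO omeka_files (id, item_id) VALUES (10, 1), (11, 2);
        INSERT INTO omeka_element_texts (record_id, record_type, element_id, text)
        VALUES
            (10, 'File', 52, 'OCR text of first PDF'),
            (11, 'File', 52, 'OCR text of second PDF'),
            (1,  'Item', 50, 'Title of first item');
        """
    )
    return conn


def move_file_text_to_items(conn):
    """Reassign PDF text rows from File records to their parent Items.

    Returns the number of rows changed. Rows whose file has no entry in
    omeka_files are left untouched.
    """
    cur = conn.execute(
        """
        UPDATE omeka_element_texts
           SET record_type = 'Item',
               element_id  = ?,
               record_id   = (SELECT f.item_id
                                FROM omeka_files AS f
                               WHERE f.id = omeka_element_texts.record_id)
         WHERE record_type = 'File'
           AND element_id  = ?
           AND EXISTS (SELECT 1
                         FROM omeka_files AS f
                        WHERE f.id = omeka_element_texts.record_id)
        """,
        (ITEM_TEXT_ELEMENT_ID, FILE_TEXT_ELEMENT_ID),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = build_demo_db()
    changed = move_file_text_to_items(demo)
    print("rows moved:", changed)
    for row in demo.execute(
        "SELECT record_id, record_type, element_id, text "
        "FROM omeka_element_texts ORDER BY id"
    ):
        print(row)
```

On a real installation the same UPDATE would go through the MySQL client instead, ideally on a backup copy first, since a direct table edit bypasses whatever Omeka does when records are saved through the application.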