We are currently porting our digital library to Omeka S, and some modules need to be upgraded to the new system. One of them is ExtractOcr, one of whose goals is to extract the OCR from PDF files when they are added to the library.
To achieve this, it uses the hookBeforeSaveFile function. So far I have not been able to find a way to reproduce that using the ingester system of Omeka S.
Am I missing something, or is the only way to add a custom ingester?
Depending on your process, you can use
post if you launch a job). See https://omeka.org/s/docs/developer/reference/events:
Note that there is also PdfText to extract text (see https://daniel-km.github.io/UpgradeToOmekaS for the list of modules that are upgraded or in progress).
Thanks @Daniel_KM, I should have thought of looking at the source of PdfText; that's exactly what we need. Thanks for the pointer: I had not yet encountered event listeners when working with Omeka S, and they will be extremely useful.
Is there an event triggered after a media is added to an item from the edit page?
In this example:
‘createMedia’ is only triggered when I use the Omeka API directly; it is not triggered after a classic import of media.
Is this a bug, or is it the expected behavior?
When you add a media from the edit page, the itemAdapter is triggered and it manages the media directly (so the API events are not triggered for media: see item adapter).
So I need to loop over all media in $item->getMedia(), and when I find a PDF media, I launch my job?
That is how ArchiveRepertory works. Depending on your job, you could instead use other events related to the entity (media in your case), such as “entity.persist.pre”, but that is more complex to manage.
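
For anyone landing here later, the loop discussed above could look roughly like the sketch below. It is only an illustration under assumptions, not the code of ArchiveRepertory or PdfText: StubItem, StubMedia, dispatchPdfMedia and the $dispatch callback are hypothetical names made up so the example runs outside Omeka S (PHP 8.0 or later). In a real module the item would come from whatever item-level listener you attach, and the callback would launch your job.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a media entity, so the sketch runs without Omeka S.
 * Only getId() and getMediaType() are used below.
 */
final class StubMedia
{
    public function __construct(private int $id, private ?string $mediaType)
    {
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getMediaType(): ?string
    {
        return $this->mediaType;
    }
}

/** Hypothetical stand-in for an item entity exposing getMedia(). */
final class StubItem
{
    /** @param StubMedia[] $media */
    public function __construct(private array $media)
    {
    }

    /** @return StubMedia[] */
    public function getMedia(): array
    {
        return $this->media;
    }
}

/**
 * Walk the media of a saved item and hand each PDF to a dispatcher.
 *
 * @param callable $dispatch Hypothetical callback receiving one media id;
 *                           in a module it would launch the OCR job.
 * @return int[] Ids of the media that were dispatched.
 */
function dispatchPdfMedia(StubItem $item, callable $dispatch): array
{
    $dispatched = [];
    foreach ($item->getMedia() as $media) {
        // Skip anything that is not a PDF (the media type may be null).
        if ($media->getMediaType() !== 'application/pdf') {
            continue;
        }
        $dispatch($media->getId());
        $dispatched[] = $media->getId();
    }
    return $dispatched;
}

// Usage with stub data: one image, one PDF, one media without a type.
$item = new StubItem([
    new StubMedia(1, 'image/jpeg'),
    new StubMedia(2, 'application/pdf'),
    new StubMedia(3, null),
]);

$launched = [];
$ids = dispatchPdfMedia($item, function (int $id) use (&$launched): void {
    // Assumption: a real module would dispatch its job here.
    $launched[] = $id;
});
```

Passing the dispatcher in as a callback keeps the filtering logic separate from the job system, so the same function can be tried out with stub data as above.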