Adding element text in the after_save_item hook deletes existing element texts

I created a plugin that copies the extracted PDF text from the field created by the PDF Text plugin into the Item Type Metadata Text field. The problem is that when I save the new element text, the item’s existing metadata gets deleted.

I think this is a problem with how the ElementText mixin works. After you call saveElementTexts, it doesn’t clear out the _elementsOnForm field, so I’m guessing the mixin still holds the list of elements from the Item edit form when my hook is called. Adding a line at the end of the saveElementTexts function to clear out the array ($this->_elementsOnForm = array();) seems to fix the problem, but maybe I’m using the mixin wrong.

Here is the code from my plugin:

    private function transferFileText($item){
        // Target element on the item; clear any texts it already holds.
        $item_text_element = $item->getElement('Item Type Metadata', 'Text');
        $item->deleteElementTextsByElementId(array($item_text_element->id));

        // Queue every PDF Text value found on the item's files.
        $files = $item->getFiles();
        foreach($files as $file){
            $file_texts = $file->getElementTexts('PDF Text', 'Text');
            foreach($file_texts as $text){
                $item->addTextForElement($item_text_element, $text);
            }
        }
        $item_text_element->save();
        // Intent: save the queued texts without replacing the item's other texts.
        $item->setReplaceElementTexts(false);
        $item->saveElementTexts();
    }

    public function hookAfterSaveItem($args)
    {
        $item = $args['record'];
        $this->transferFileText($item);
    }

I think your diagnosis is accurate. The ElementText mixin isn’t the most defensively programmed component in the world, and it can get a little brittle when it’s used differently from how the core itself uses it. The pattern that’s led to problems is exactly this one: saving the texts multiple times within a single request (though the past problem was actually that it would duplicate the existing texts, so this is an interesting variation).
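To make the failure mode concrete, here is a toy model of saving twice in one request. It is not Omeka’s ElementText mixin: the class, its methods, and the `$clearFormState` switch are all hypothetical, and it assumes the pending texts are cleared after each save while the list of form elements is not, as described above.

```php
<?php
// Toy model only: NOT Omeka's ElementText mixin. All names are hypothetical.
class ToyTextStore
{
    public $saved = array();            // element name => list of saved texts
    private $elementsOnForm = array();  // elements submitted by the edit form
    private $textsToSave = array();     // pending rows: array(element, text)

    // Simulates the edit form posting a value for an element.
    public function addFromForm($element, $text)
    {
        $this->elementsOnForm[$element] = true;
        $this->textsToSave[] = array($element, $text);
    }

    // Simulates a plugin queueing a text outside the form.
    public function addText($element, $text)
    {
        $this->textsToSave[] = array($element, $text);
    }

    public function save($clearFormState)
    {
        // Texts for elements that were on the form are deleted first...
        foreach (array_keys($this->elementsOnForm) as $element) {
            unset($this->saved[$element]);
        }
        // ...then every pending text is written.
        foreach ($this->textsToSave as $row) {
            $this->saved[$row[0]][] = $row[1];
        }
        $this->textsToSave = array();
        if ($clearFormState) {
            // The proposed fix: forget the form elements once they are saved.
            $this->elementsOnForm = array();
        }
    }
}

// One request: the normal save, then a second save from an after-save hook.
function runTwoSaves($clearFormState)
{
    $store = new ToyTextStore();
    $store->addFromForm('Title', 'My item');
    $store->save($clearFormState);
    $store->addText('Text', 'PDF text');
    $store->save($clearFormState);
    return $store->saved;
}
```

With `runTwoSaves(false)` the second save deletes the Title text again but has nothing queued to put back, so only the Text entry survives. With `runTwoSaves(true)` both entries are kept.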

We can fix the bug easily, but you’d have to wait for the release to take advantage of the fix. Usually, I’d say the best option is to just switch things around so they’re happening in the before_save hook instead, as that’s when the mixin’s expecting new values to get added and you can just piggyback on the regular saveElementTexts that happens in afterSave. However, your situation’s tricky because you’re depending on the PDF Text plugin being done with its work so you’re sort of stuck being on the after side for the Item.

It might be easier here to just create and save ElementText records directly and thereby side-step the problem with the mixin entirely. (As a side note, it doesn’t look like there’s any point in saving $item_text_element here, as you’re not making any changes to it.)
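A minimal sketch of that direct approach follows. It assumes the Omeka 2.x ElementText record exposes the fields record_id, record_type, element_id, html, and text, which should be checked against the installed version. The two helper functions are hypothetical names, not part of Omeka.

```php
<?php
// Sketch only. Field names assume Omeka 2.x's ElementText record;
// both helper functions are hypothetical, not Omeka API.

// Build one row of field values per non-empty text. Pure PHP, no Omeka needed.
function buildElementTextRows($itemId, $elementId, array $texts)
{
    $rows = array();
    foreach ($texts as $text) {
        $text = trim((string) $text);
        if ($text === '') {
            continue; // skip files that produced no text
        }
        $rows[] = array(
            'record_id'   => (int) $itemId,
            'record_type' => 'Item',
            'element_id'  => (int) $elementId,
            'html'        => 0,
            'text'        => $text,
        );
    }
    return $rows;
}

// Persist the rows. Only callable inside Omeka, where ElementText is defined.
function saveElementTextRows(array $rows)
{
    foreach ($rows as $row) {
        $record = new ElementText();
        foreach ($row as $field => $value) {
            $record->$field = $value;
        }
        $record->save();
    }
}
```

In transferFileText, this would mean collecting the plain text strings from each file’s PDF Text element texts, passing them in along with $item->id and $item_text_element->id, and dropping the addTextForElement and saveElementTexts calls. Because nothing goes through the mixin’s save path, the stale form state never comes into play.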

Interesting. I had a plugin a while ago that was doing almost exactly the same thing, adding element texts to an item after save. I couldn’t figure out what was different until @jflatnes’s post. The difference was that my plugin was for a site that was only creating items via a CSV import – hence, I guess, no elements on the form?

One twist I discovered that might be relevant if you create the ElementTexts directly is whether they need to be added to the search system. The Mixin_Search will catch the data from the PDF, but search results on that text will point to the File, not the Item. If that’s a concern, you might also need to add the ElementTexts to the search index yourself.

Thanks. I’ll look into saving the ElementText records directly. (The mixin seemed convenient.) I did try using before_save, but the problem was exactly what you mentioned: the PDF Text isn’t there yet.

Our site is also using CSV import to create items, so we didn’t notice this for a while. We decided to change our workflow to allow editing items and files in Omeka once content is imported (because editing in CSV can be annoying) and only then noticed this problem.

Thanks for the heads up about search. The main reason we are putting the text on the Items is so that they can be found via extracted text using advanced search, so I need to make sure that they are indexed.