Batch update job strategy


Question about the batch update job: what is the strategy behind breaking the updates down into three parts (data, data_append, and data_remove)? I ask because we use the Solr/Search modules, and in the course of making batch updates I noticed that records in our Solr index are updated multiple times in the same batch update: the save event fires for each part, and that event is what the module uses to trigger updates to records in the index.

The potential issue is that the module makes hard commits on the Solr index more often than it needs to, which can cause performance problems in Solr. I understand that’s also an issue with the module itself that could be addressed (and separately I’ve forked the modules to use auto-commits and soft commits to mitigate this), but I’m still wondering about the necessity of breaking the batch update job into multiple parts. It seems to only do this when you batch edit all items; if you run a batch edit on selected items, it isn’t broken down the same way. Does it need to be that way?
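For anyone reading along, the auto-commit/soft-commit approach mentioned above is configured on the Solr side in solrconfig.xml, roughly along these lines (the interval values here are illustrative, not a recommendation):

    <!-- solrconfig.xml: let Solr manage hard commits itself, so the
         client never needs to send commit=true on every update. -->
    <autoCommit>
      <maxTime>60000</maxTime>           <!-- hard commit at most once a minute -->
      <openSearcher>false</openSearcher> <!-- don't reopen searchers on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>5000</maxTime>            <!-- new docs become visible within ~5s -->
    </autoSoftCommit>

With this in place, repeated saves during a batch are absorbed into a handful of commits instead of one hard commit per update.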


Joe Anderson

What version of Omeka S are you using? We fixed a related bug in v4.0.2, so perhaps an upgrade will resolve this.

If you’re on a more recent release, then I suspect a module (maybe one of the Solr/Search modules) is adding form elements to the batch form using different collection actions, causing the multiple requests to the API that you’re seeing (more about “collection actions” below).

what is the strategy in breaking it down the updates into the 3 parts (data, data_append, and data_remove)?

I’ll try to explain, but it’s a little in the weeds. Requests to the API may declare one of three actions to handle Doctrine collections. This will tell the API which action to take on certain collections during partial UPDATE requests. (You can see the actions here).

Depending on the modules you use, the batch update form may contain form elements that require different actions. This is why it’s broken down into three parts, one for each action.
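A simplified sketch of what that looks like inside the job (variable names are mine, and the mapping of form parts to the 'collectionAction' request option is my reading of this thread; see Omeka\Job\BatchUpdate for the real implementation):

    // One batchUpdate() call per collection action. Each form part
    // maps to one action; empty parts are skipped.
    $dataByAction = [
        'replace' => $data,        // the "data" part of the form
        'append'  => $dataAppend,  // the "data_append" part
        'remove'  => $dataRemove,  // the "data_remove" part
    ];

    foreach ($dataByAction as $collectionAction => $actionData) {
        if (!$actionData) {
            continue; // nothing to do for this action
        }
        // Every resource in $ids is saved again for each action that
        // runs, which is why Solr sees multiple updates per batch.
        $api->batchUpdate('items', $ids, $actionData, [
            'collectionAction' => $collectionAction,
        ]);
    }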

Hi Joe,

We’re currently on 4.0.2, but based on what you’re saying, either way, if I run a batch update like so:

    {
        "resource": "items",
        "query": {
            "item_set_id": "219579"
        },
        "data": [],
        "data_remove": {
            "o:is_public": "1",
            "o:resource_template": {
                "o:id": "15"
            },
            "o:item_set": []
        },
        "data_append": {
            "o:site": []
        }
    }

This includes both an ‘append’ and a ‘remove’, so $api->batchUpdate effectively gets called twice, and each record is essentially updated/saved twice?

Yes, you are correct. All resources in the batch are saved for every collection action. There’s no way to tell beforehand which collection action applies to which resource.
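If it helps anyone else: since the same resource can be saved once per collection action, one way to make the double-save harmless on the search side is to buffer the dirty IDs during the batch and push them to Solr once at the end. A rough sketch, where SolrClient and its methods are hypothetical stand-ins for whatever client your indexing module uses:

    // Buffer resource IDs during a batch and flush once, instead of
    // hard-committing on every save event.
    class DeferredIndexer
    {
        private array $dirtyIds = [];

        public function onResourceSaved(int $resourceId): void
        {
            // With both append and remove parts, this fires twice per
            // item, but keying by ID keeps each resource only once.
            $this->dirtyIds[$resourceId] = true;
        }

        public function flush(SolrClient $solr): void
        {
            $solr->updateDocuments(array_keys($this->dirtyIds));
            $solr->softCommit(); // leave hard commits to Solr's autoCommit
            $this->dirtyIds = [];
        }
    }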

Ok, thanks. Again, it’s much less of an issue for us anyway since I changed our commit approach in Solr, but just wanted to make sure I was understanding everything right.