Bulk Import Module - language and data type conflict

Hello, I’m configuring the Bulk Import module and have run into an issue. We’re using Bulk Import to import CSV files whose column headers specify both a language and a data type. When a header combines a language with certain data types, the language is not imported.

It seems that the language fails to import when combined with data types provided by the Numeric Data Types and Data Type RDF modules.

The language is not set for these headers:

dcterms:description ^^html @es
dcterms:description ^^xml @es
dcterms:description ^^numeric:integer @es
dcterms:description ^^numeric:timestamp @es

The language is set just fine for these:

dcterms:description ^^uri @es
dcterms:description @es
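Each of these headers carries three pieces of information: the property term, an optional data type after `^^`, and an optional language after `@`. The following is a hypothetical PHP function, `parseHeader()`, that splits a header along those lines. It only illustrates that assumption; the Bulk Import module's actual header parser may follow different rules.

```php
<?php
// Hypothetical sketch: split a column header such as
// "dcterms:description ^^html @es" into a term, a data type and a language.
// The Bulk Import module's real parser may follow different rules.
function parseHeader(string $header): array
{
    // Split on runs of whitespace: the first token is the property term.
    $parts = preg_split('/\s+/', trim($header));
    $result = ['term' => array_shift($parts), 'datatype' => null, 'lang' => null];
    foreach ($parts as $part) {
        if (substr($part, 0, 2) === '^^') {
            // Data type, e.g. "html", "uri" or "numeric:integer".
            $result['datatype'] = substr($part, 2);
        } elseif (substr($part, 0, 1) === '@') {
            // Language tag, e.g. "es".
            $result['lang'] = substr($part, 1);
        }
    }
    return $result;
}
```

For example, `parseHeader('dcterms:description ^^numeric:integer @es')` gives the term `dcterms:description`, the data type `numeric:integer` and the language `es`: every header in both lists carries the same language tag, and only the data type differs.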

I submitted an issue in the Bulk Import module repository. I’m also posting it here in case anyone can help or has noticed the same issue, and in case the thread is useful to someone else.

Thanks!

I’ve been working on this. I’ve made some progress, but now I’m stuck.

What I’ve done so far:
The language wasn’t being included when property values were filled in during resource processing. In /BulkImport/src/Processor/ResourceProcessor.php, I made two changes: I added a condition for the ‘html’ data type in fillProperty() and in fillPropertyForValue().
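A change of that kind could look roughly like the following sketch. It is hypothetical: `buildValue()` and the list of types that keep a language are assumptions for illustration, not the code of fillProperty() or fillPropertyForValue(). Only the array keys (`type`, `property_id`, `is_public`, `@id`, `o:label`, `@value`, `o:lang`) are taken from the logged `$resource`.

```php
<?php
// Hypothetical sketch, not the Bulk Import code: build one value array in the
// shape seen in the logged $resource, keeping "o:lang" for the data types
// assumed to accept a language and setting it to null for the others.
function buildValue(string $type, int $propertyId, string $value, ?string $lang): array
{
    // Assumption: these are the types that keep a language.
    $typesWithLanguage = ['literal', 'uri', 'html', 'xml'];

    $data = ['type' => $type, 'property_id' => $propertyId, 'is_public' => true];
    if ($type === 'uri') {
        // URI values store the link in "@id" plus an optional label.
        $data['@id'] = $value;
        $data['o:label'] = null;
    } else {
        $data['@value'] = $value;
    }
    $data['o:lang'] = in_array($type, $typesWithLanguage, true) ? $lang : null;
    return $data;
}
```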

This is where I’m stuck:
Somehow the language is dropped during entity creation.

Here’s the function where I notice the language disappear, createEntity() in BulkImport/src/Processor/AbstractResourceProcessor.php:

    protected function createEntity(array $resource): ?AbstractEntityRepresentation
    {
        // Linked ids from identifiers may be missing in data. So two solutions
        // to add missing ids: create resources one by one and add ids here, or
        // batch create and use an event to add the ids.
        // In all cases, the ids should be stored for next resources.
        // The batch create in api adapter is more a loop than a bulk process.
        // The main difference is the automatic detachment of new entities,
        // instead of a clear. In doctrine 3, detachment will be removed.

        // So act as a loop.
        // Anyway, in most of the cases, the loop contains only one resource.

        // And in fact, there is no more loop, but a single resource.

        $defaultResourceName = $this->getResourceName();

        // Manage mixed resources.
        $resourceName = $resource['resource_name'] ?? $defaultResourceName;
        $resource = $this->bulkIdentifiers->completeResourceIdentifierIds($resource);

        // Remove uploaded files for items.
        foreach ($resource['o:media'] ?? [] as $key => $media) {
            if (($media['o:ingester'] ?? null) === 'bulk' && ($media['ingest_ingester'] ?? null) === 'upload') {
                $resource['o:media'][$key]['ingest_delete_file'] = true;
            }
        }

        try {
            $response = $this->bulk->api(null, true)
                ->create($resourceName, $resource);
        } catch (ValidationException $e) {
            $resource['messageStore']->addError('resource', new PsrMessage(
                'Error during validation of the data before creation.' // @translate
            ));
            $messages = $this->listValidationMessages($e);
            $resource['messageStore']->addError('resource', $messages);
            $this->bulkCheckLog->logCheckedResource($this->indexResource, $resource);
            ++$this->totalErrors;
            return null;
        } catch (\Exception $e) {
            $resource['messageStore']->addError('resource', new PsrMessage(
                'Core error during creation: {exception}', // @translate
                ['exception' => $e]
            ));
            $this->bulkCheckLog->logCheckedResource($this->indexResource, $resource);
            ++$this->totalErrors;
            return null;
        }

        $representation = $response->getContent();
        $this->bulkIdentifiers->storeSourceIdentifiersIds($resource, $representation);
        if ($representation->resourceName() === 'media') {
            $this->logger->notice(
                'Index #{index}: Created media #{media_id} (item #{item_id})', // @translate
                ['index' => $this->indexResource, 'media_id' => $representation->id(), 'item_id' => $representation->item()->id()]
            );
        } else {
            $this->logger->notice(
                'Index #{index}: Created {resource_name} #{resource_id}', // @translate
                ['index' => $this->indexResource, 'resource_name' => $this->easyMeta->resourceLabel($resourceName), 'resource_id' => $representation->id()]
            );
        }

        return $representation;
    }

At line 1218, the language is present in $resource:

$response = $this->bulk->api(null, true)
    ->create($resourceName, $resource);

When I log $resource, its contents look like this:

{"resource":"{\"resource_name\":\"items\",\"o:id\":null,\"source_index\":1,\"checked_id\":true,\"o:owner\":{\"o:id\":1},\"o:is_public\":true,\"o:item_set\":[],\"o:media\":[],\"dcterms:title\":[{\"type\":\"literal\",\"property_id\":1,\"is_public\":true,\"@value\":\"Jesús Sánchez Erazo [Chuíto el de Bayamón]\",\"o:lang\":null}],\"dcterms:description\":[{\"type\":\"uri\",\"property_id\":4,\"is_public\":true,\"@id\":\"https://www.geonames.org\",\"o:label\":null,\"o:lang\":\"es\"},{\"type\":\"html\",\"property_id\":4,\"is_public\":true,\"@value\":\"<p>Jes&uacute;s S&aacute;nchez Erazo, known as Chu&iacute;to el de Bayam&oacute;n, was born in 1900 in Bayam&oacute;n, Puerto Rico. His upbringing stressed the importance of music and working the land. S&aacute;nchez Erazo was a <em>trovador</em> and based his <em>d&eacute;cimas</em> (ten octosyllabic poetic stanzas) on everyday life and scenes from rural Puerto Rico. He began his career in a bolero duo with Beno, a guitarist and popular music singer. At the same time, he started performing <em>d&eacute;cimas</em> as a soloist and competing with troubadours at patron saint festivals in various municipalities.&nbsp;</p>\",\"o:lang\":\"en\"}],\"has_error\":false,\"messageStore\":{}}"

Then, at line 1239, the language is missing from $representation:

$representation = $response->getContent();

When I log $representation, its contents look like this:

"representation":"object(Omeka\\Api\\Representation\\ItemRepresentation) {\"@context\":\"https:\\/\\/domain.com\\/api-context\",\"@id\":\"https:\\/\\/domain.com\\/api\\/items\\/215\",\"@type\":\"o:Item\",\"o:id\":215,\"o:is_public\":true,\"o:owner\":{\"@id\":\"https:\\/\\/domain.com\\/api\\/users\\/1\",\"o:id\":1},\"o:resource_class\":null,\"o:resource_template\":null,\"o:thumbnail\":null,\"o:title\":\"Jes\\u00fas S\\u00e1nchez Erazo [Chu\\u00edto el de Bayam\\u00f3n]\",\"thumbnail_display_urls\":{\"large\":null,\"medium\":null,\"square\":null},\"o:created\":{\"@value\":\"2025-02-27T18:42:19+00:00\",\"@type\":\"http:\\/\\/www.w3.org\\/2001\\/XMLSchema#dateTime\"},\"o:modified\":{\"@value\":\"2025-02-27T18:42:19+00:00\",\"@type\":\"http:\\/\\/www.w3.org\\/2001\\/XMLSchema#dateTime\"},\"o:primary_media\":null,\"o:media\":[],\"o:item_set\":[],\"o:site\":[{\"@id\":\"https:\\/\\/domain.com\\/api\\/sites\\/1\",\"o:id\":1},{\"@id\":\"https:\\/\\/domain.com\\/api\\/sites\\/2\",\"o:id\":2}],\"dcterms:title\":[{\"type\":\"literal\",\"property_id\":1,\"property_label\":\"Title\",\"is_public\":true,\"@value\":\"Jes\\u00fas S\\u00e1nchez Erazo [Chu\\u00edto el de Bayam\\u00f3n]\"}],\"dcterms:description\":[{\"type\":\"uri\",\"property_id\":4,\"property_label\":\"Description\",\"is_public\":true,\"@id\":\"https:\\/\\/www.geonames.org\",\"o:lang\":\"es\"},{\"type\":\"html\",\"property_id\":4,\"property_label\":\"Description\",\"is_public\":true,\"@value\":\"<p>Jes&uacute;s S&aacute;nchez Erazo, known as Chu&iacute;to el de Bayam&oacute;n, was born in 1900 in Bayam&oacute;n, Puerto Rico. His upbringing stressed the importance of music and working the land. S&aacute;nchez Erazo was a <em>trovador<\\/em> and based his <em>d&eacute;cimas<\\/em> (ten octosyllabic poetic stanzas) on everyday life and scenes from rural Puerto Rico. He began his career in a bolero duo with Beno, a guitarist and popular music singer. 
At the same time, he started performing <em>d&eacute;cimas<\\/em> as a soloist and competing with troubadours at patron saint festivals in various municipalities.&nbsp;<\\/p>\",\"@type\":\"http:\\/\\/www.w3.org\\/1999\\/02\\/22-rdf-syntax-ns#HTML\"}]}

I can’t figure out where the representation is created. If I knew this, then maybe I could figure out what goes wrong in the process.
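One way to narrow this down is to compare the two logs value by value and list which values lost their language. The following sketch assumes both sides are plain arrays (for the representation, `json_decode(json_encode($representation), true)` should work, since Omeka S representations are JSON-serializable) and that values come back in the same order as they were sent. The function name `findDroppedLanguages()` is hypothetical. It checks both `o:lang` and the JSON-LD `@language` key on the returned side.

```php
<?php
// Sketch for debugging: list the values whose language was present in the
// data sent to the API but is missing from the returned representation.
// Assumptions: both arguments are decoded arrays, and values are returned in
// the same order in which they were sent.
function findDroppedLanguages(array $sent, array $returned, string $term): array
{
    $dropped = [];
    foreach ($sent[$term] ?? [] as $index => $value) {
        $sentLang = $value['o:lang'] ?? null;
        $returnedValue = $returned[$term][$index] ?? [];
        // The language may be exposed as "o:lang" or as JSON-LD "@language".
        $returnedLang = $returnedValue['o:lang'] ?? $returnedValue['@language'] ?? null;
        if ($sentLang !== null && $sentLang !== '' && $returnedLang === null) {
            $dropped[] = ['index' => $index, 'type' => $value['type'] ?? null, 'lang' => $sentLang];
        }
    }
    return $dropped;
}
```

Applied to the data in the two logs above, it would report only the `html` value of `dcterms:description` (language `en`), while the `uri` value keeps `es`.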

Thanks for reading this post and thanks in advance for any suggestions.

Hello again,

I deactivated all modules except Data Type RDF, Common, Log, and Bulk Import. I’m using the module’s default CSV - Items importer. The problem persists.

I’m curious whether anyone can replicate the issue. I’ve attached my test CSV file in case it helps: test_language_datatype.csv

Perhaps @Daniel_KM can help with this?

Thanks in advance for any assistance.

In the Numeric Data Types module, a numeric data type cannot have a language.

Can RDF data types have a language?
html is the data type I am trying to use.
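For reference on that question: in RDF 1.1 a literal carries either a datatype or a language tag, never both (a language-tagged string always has the datatype rdf:langString), and JSON-LD likewise forbids a value object from having both `@type` and `@language`. The logged representation shows the html value serialized with an `@type` of rdf:HTML, which would leave no place for the language; that is an inference from the log, not something confirmed in the module code. The following sketch, using a hypothetical function `toJsonLdValue()`, shows how a serializer that follows these rules ends up dropping the language of a typed value. It is not Omeka’s implementation.

```php
<?php
// Sketch of the JSON-LD value-object rule: a value may carry a datatype
// ("@type") or a language ("@language"), but not both.
function toJsonLdValue(string $value, ?string $datatype, ?string $lang): array
{
    $object = ['@value' => $value];
    if ($datatype !== null) {
        // Typed literal: "@language" is not allowed next to "@type",
        // so the language cannot be expressed here.
        $object['@type'] = $datatype;
    } elseif ($lang !== null && $lang !== '') {
        // Plain string: the language tag can be kept.
        $object['@language'] = $lang;
    }
    return $object;
}
```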