CSV import "foreign keys"

Suppose I have 2 TSV files:

Streets.tsv

schema:identifier    schema:name
street/1             Churchstreet
street/2             Longroad
....

Houses.tsv

schema:identifier    schema:name       gtm:street
house/1              Serenity House    street/1
house/2              Oakhurst          street/1
....

So, a line in Houses.tsv has a gtm:street column which links to an Item from (the previously imported) Streets.tsv, like a foreign key.

As far as I know, the only way to link lines to existing Items when importing Houses.tsv is to pre-process this file. In the derivative file HousesID.tsv, all values of gtm:street are looked up via the schema:identifier property and replaced with the Omeka ID of the Item. Then, when I import HousesID.tsv, I can specify Omeka resource (by ID) as the Data Type for the gtm:street column.
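A minimal sketch of that pre-processing step in Python, assuming the mapping from schema:identifier values to Omeka IDs has already been collected some other way (how you obtain it is outside this sketch). The function name `replace_foreign_keys` and its parameters are hypothetical, not part of Omeka or CSV Import.

```python
import csv


def replace_foreign_keys(src, dst, column, id_by_identifier):
    """Copy a TSV from src to dst, replacing the values in `column`.

    src and dst are text file objects (open real files with newline="").
    id_by_identifier maps identifier strings such as "street/1" to the
    Omeka ID of the matching Item; this mapping is assumed to exist already.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        identifier = row[column]
        if identifier not in id_by_identifier:
            # Fail loudly rather than import a dangling link.
            raise KeyError(f"no Omeka ID known for {identifier!r}")
        row[column] = str(id_by_identifier[identifier])
        writer.writerow(row)
```

Called with Houses.tsv opened as `src`, HousesID.tsv opened as `dst`, `column="gtm:street"` and a dictionary built from the imported streets, this would write each street's Omeka ID in place of its identifier and leave the other columns untouched.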

Are there better / more efficient ways to accomplish this, without pre-processing? If not, could this be a feature of the CSV module? Maybe the Data Type Omeka resource (by schema:identifier).

You have the situation correct: for making internal resource values, linking one resource to another, CSV Import currently only offers the option of doing that linkage via the ID, so there's no way to do them without going back and looking up the IDs to match.

The module contains all the functionality it needs to make these connections by searching for metadata properties, and it's something that's allowed when doing the similar task of looking up resources to update. What's needed is the interface to let the user choose from among the properties when selecting the Omeka resource data type.

My current thinking for something that would accomplish this without much upheaval is adding a new option, accessible under the "wrench" sidebar. The Omeka resource (by ID) data type would just be renamed "Omeka resource," and then the new option would let you choose what property to use when looking up a matching resource: the ID as now, or any property.


I have a work-in-progress branch up for the solution I mentioned in my previous post: it adds a new option to the options sidebar allowing you to choose a property that's used to look up the resource to link to when you're using the "Omeka resource" data type. So you can use IDs (still the default) or a match against any property you choose.

Ready for some testing?

If so, I'll install the resource-identifier-property branch in a dev environment and put it to the test.

Sure, have at it.

I don't know if you've installed CSV Import from a Git checkout before; if you haven't, just a note that unlike most modules it has a Composer dependency, so you need to run composer install in it after checking it out.

John, I've tested the resource-identifier-property branch, but ran into issues.
The first one (a missing 'use') is easy, but the always-empty $identifier I don't quite understand. From the log: "" (dcterms:identifier) is not a valid resource.

See my inline comments in Add resource identifier property option · omeka-s-modules/CSVImport@fb1bf4f · GitHub

The fixes for these problems are now incorporated in the branch.


This feature is officially out, in version 2.6.0 of CSV Import.
