CSV import "foreign keys"

Suppose I have 2 TSV files:

Streets.tsv

schema:identifier    schema:name
street/1             Churchstreet
street/2             Longroad
....

Houses.tsv

schema:identifier    schema:name       gtm:street
house/1              Serenity House    street/1
house/2              Oakhurst          street/1
....

So, a line in Houses.tsv has a gtm:street column which links to an Item from the previously imported Streets.tsv, like a foreign key.

As far as I know, the only way to link lines to existing Items when importing Houses.tsv is to pre-process the file. In the derivative file HousesID.tsv, every gtm:street value is looked up via the schema:identifier property and replaced with the Omeka ID of the matching Item. Then, when I import HousesID.tsv, I can specify "Omeka resource (by ID)" as the Data Type for the gtm:street column.
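To make the pre-processing step concrete, here is a minimal Python sketch. It assumes you already have a mapping from schema:identifier values to Omeka IDs for the imported streets; how you obtain that mapping is left open, and the function name replace_foreign_keys and the IDs in the usage note are made up for illustration.

```python
import csv
import io


def replace_foreign_keys(houses_tsv, id_by_identifier, column="gtm:street"):
    """Return TSV text in which every value of `column` is replaced by an Omeka ID.

    `id_by_identifier` is an assumed, pre-built mapping such as
    {"street/1": 101}; it is not something CSV Import provides.
    """
    reader = csv.DictReader(io.StringIO(houses_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        key = row[column]
        # Fail loudly rather than import a house with a dangling link.
        if key not in id_by_identifier:
            raise KeyError(f"no imported Item found for {key!r}")
        row[column] = str(id_by_identifier[key])
        writer.writerow(row)
    return out.getvalue()
```

Calling it on the contents of Houses.tsv with a mapping like {"street/1": 101, "street/2": 102} yields the text for HousesID.tsv, with gtm:street holding 101 instead of street/1.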

Are there better / more efficient ways to accomplish this, without pre-processing? If not, could this be a feature of the CSV module? Maybe as a Data Type "Omeka resource (by schema:identifier)".

You have the situation correct: for making internal resource values, linking one resource to another, CSV Import currently only offers the option of doing that linkage via the ID, so there’s no way to do them without going back and looking up the IDs to match.

The module already contains all the functionality it needs to make these connections by searching on metadata properties, and that’s already allowed for the similar task of looking up resources to update. What’s needed is the interface to let the user choose from among the properties when selecting the Omeka resource data type.

My current thinking for something that would accomplish this without much upheaval is adding a new option, accessible under the “wrench” sidebar. The Omeka resource type would just be renamed “Omeka resource,” and then the new option would let you choose what property to use when looking up a matching resource: the ID as now, or any property.


I have a work-in-progress branch up for the solution I mentioned in my previous post: it adds a new option to the options sidebar allowing you to choose a property that’s used to look up the resource to link to when you’re using the “Omeka resource” data type. So you can use IDs (still the default) or a match against any property you choose.
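As a rough illustration of the lookup behavior described above (not the module's actual code, which is PHP), here is a small Python sketch. It works on a hypothetical in-memory list of items; the function name find_resource_id and the item layout are assumptions made for the example.

```python
def find_resource_id(value, items, identifier_property="o:id"):
    """Return the ID of the single item matching `value`, or None.

    `items` is a hypothetical list of dicts, each with an "o:id" key and
    property terms mapped to lists of values, for example
    {"o:id": 101, "schema:identifier": ["street/1"]}.
    """
    if identifier_property == "o:id":
        # Default behavior: the cell value is the resource ID itself.
        matches = [i["o:id"] for i in items if str(i["o:id"]) == str(value)]
    else:
        # Otherwise match the cell value against the chosen property.
        matches = [
            i["o:id"] for i in items if value in i.get(identifier_property, [])
        ]
    # Treat both "no match" and "ambiguous match" as not found.
    return matches[0] if len(matches) == 1 else None
```

With this shape, a gtm:street cell containing street/1 resolves to the matching street's ID when schema:identifier is chosen, while an empty cell value resolves to nothing.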

Ready for some testing?

If so, I’ll install the resource-identifier-property branch in a dev environment and put it to the test.

Sure, have at it.

I don’t know if you’ve installed CSV Import from a Git checkout before; if you haven’t, just a note that unlike most modules it has a Composer dependency so you need to run composer install in it when checking it out.

John, I’ve tested the resource-identifier-property branch, but ran into issues.
The first one (a missing ‘use’) is easy to fix, but I don’t quite understand why $identifier is always empty. From the log: "" (dcterms:identifier) is not a valid resource.

See my inline comments in Add resource identifier property option · omeka-s-modules/CSVImport@fb1bf4f · GitHub