Search for recently created/updated items

How can I search for items which have been created (o:created) or updated (o:modified) recently or since a specific date?


You can sort by created/updated pretty easily but we don’t have much in the way of allowing you to search it.

My use case is a crawler that gets all data from Omeka S and converts it to N-Triples for import into a triplestore (in order to have a SPARQL endpoint on my data). Currently the crawler just fetches all data, which of course isn’t efficient; that’s why I want the crawler to focus on new and updated items (to update the file cache).

I guess the sort option will suffice. My crawler just needs to call api/items?per_page=100&sort_by=created&sort_order=desc&page={page} and then api/items?per_page=100&sort_by=modified&sort_order=desc&page={page}, and I just have to let the crawler decide which items are of interest (only include items with a created/updated date after the last crawl; data for other items can be fetched from the file cache) and whether the next page should be requested.
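For what it’s worth, the “decide which items are fresh, stop paging when a whole page is stale” logic could be sketched roughly like this. The o:created/o:modified field names are the ones from the question; the assumption that their "@value" is an ISO-8601 string with a numeric UTC offset (parseable by Python’s `datetime.fromisoformat`) is mine, so check it against your actual API output:

```python
from datetime import datetime, timezone

def newer_items(page_items, last_crawl):
    """Keep only items created or modified after the last crawl.

    `page_items` is one page of JSON-LD items from /api/items,
    sorted newest-first; `last_crawl` is a timezone-aware datetime.
    Returns (fresh_items, keep_paging): once a descending-sorted
    page yields nothing fresh, all later pages are stale too.
    """
    fresh = []
    for item in page_items:
        # Fall back to o:created for items that were never modified.
        stamp = item.get("o:modified") or item.get("o:created")
        when = datetime.fromisoformat(stamp["@value"])
        if when > last_crawl:
            fresh.append(item)
    return fresh, bool(fresh)
```

The crawler would call this per page and stop requesting further pages as soon as the second return value is False.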

You may have already solved your problem, but for you or anyone trying to serve Omeka S’ data on a SPARQL endpoint (or just dump it in some RDF format other than JSON-LD), I would like to point out that this can be achieved using Ontop or similar software (like D2R Server), if you have access to Omeka’s database. Depending on the triplestore you have been using, it might allow you to virtualize Omeka’s database without requiring any external software.

In my case, for example, I’ve been using GraphDB to get Omeka S’ data into a virtual repository (GraphDB has Ontop embedded in it for that purpose) and then import the virtualized data into a native GraphDB repository, in order to avoid the limitations of a virtual one. Virtuoso also has a similar feature, but only in its paid version.

Since I mentioned GraphDB, it is also worth noting that you can use it to directly import JSON-LD data from a URL and, consequently, from Omeka’s API. However, it will only import data from that single URL, meaning that, as far as I know, it will not follow any kind of pagination. To circumvent that, you can use the API’s “per_page” parameter to get all the data at once or, if that’s not possible for some reason, use an external script to collect all the necessary page URLs from Omeka’s API and then call GraphDB’s API to import the JSON-LD from those URLs.
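Generating that list of page URLs is a one-liner once you know the total item count (Omeka S reports it in the Omeka-S-Total-Results response header, if I remember correctly). A small hypothetical helper, with the base URL and per-page size as placeholders:

```python
def page_urls(base, total, per_page=100):
    """Build the page URLs needed to walk a paginated
    Omeka S items collection containing `total` items."""
    pages = -(-total // per_page)  # ceiling division
    return [
        f"{base}/api/items?per_page={per_page}&page={n}"
        for n in range(1, pages + 1)
    ]
```

Each of those URLs could then be fed to GraphDB’s import API one by one.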

Hi @Caeiro,

As I use GraphDB myself, the thought of making a virtual repository has crossed my mind, but I never got around to exploring this route, because making the mapping from the relational database to RDF seemed somewhat daunting.

I really like the idea of having the Omeka S data in a triplestore with a SPARQL endpoint. I have the crawler route working, but the route via a virtual repository feels “cleaner”.

Can you share more info, or even the configuration/mappings/CONSTRUCTs, on how you achieved the virtual repository from Omeka S data?

Unfortunately, I can’t share specific code or configurations because I don’t know if I would be allowed to (they belong to a project I’ve been collaborating on). That said, setting up a virtual repository isn’t really difficult. The main problem I had, back when I was still testing D2R Server (that was my first attempt at virtualization, because it was mentioned in several old books and articles), was dealing with incomplete or nonexistent documentation about some things.

If you use Ontop (standalone or embedded in GraphDB), things should be more straightforward. First, to facilitate the mapping, you can create a view in Omeka’s database where each row represents a triple (along with some other potentially relevant data). I don’t know how familiar you are with the data model used by Omeka S, but the tables that may be relevant are “item”, “value”, “resource”, “resource_class”, “vocabulary”, “property” and “media”. Additionally, given that Ontop might be accessing the database from a remote server, you may want to create a MySQL user with permissions solely for that view, and allow connections from that server’s IP.
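To make that concrete, a view along those lines could look roughly like the sketch below. The table and column names follow the Omeka S schema as I remember it, but treat them as illustrative and verify them against your own installation; the view name, user name and IP are placeholders:

```sql
-- Sketch only: one row per (item, property, value) combination.
CREATE VIEW my_view AS
SELECT
    i.id                                    AS item_id,
    CONCAT(vc.namespace_uri, rc.local_name) AS class_uri,
    CONCAT(vp.namespace_uri, p.local_name)  AS property_uri,
    v.value                                 AS literal_value,
    v.uri                                   AS uri_value,
    v.value_resource_id                     AS linked_item_id
FROM item i
JOIN resource r             ON r.id = i.id
LEFT JOIN resource_class rc ON rc.id = r.resource_class_id
LEFT JOIN vocabulary vc     ON vc.id = rc.vocabulary_id
LEFT JOIN value v           ON v.resource_id = i.id
LEFT JOIN property p        ON p.id = v.property_id
LEFT JOIN vocabulary vp     ON vp.id = p.vocabulary_id;

-- Restricted user for Ontop, allowed only from its server's IP.
CREATE USER 'ontop_user'@'203.0.113.10' IDENTIFIED BY 'change_me';
GRANT SELECT ON omeka.my_view TO 'ontop_user'@'203.0.113.10';
```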

After you do that, Ontop is going to need a “.properties” file, where you put the credentials required to access the MySQL database; a mappings file (which can be OBDA or R2RML, the former having an easier syntax), where you effectively map the data to triples; and a file containing the ontology being used. Example properties and mappings files can be obtained from Ontop’s tutorial.

The “.properties” file is very straightforward, but it is worth pointing out that, depending on how your database is set up, you might have to provide additional parameters in “jdbc.url”. For example, if your MySQL installation can’t serve connections over SSL, the URL might become something like:
jdbc\:mysql\://[YOUR IP]\:[DATABASE PORT]/[DATABASE NAME]?characterEncoding\=utf8&verifyServerCertificate\=false&useSSL\=false&requireSSL\=false
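Putting it together, the whole file only needs a handful of entries. A minimal sketch, where the user, password and driver class are assumptions on my part (the driver class shown is the one for MySQL Connector/J 8):

```properties
jdbc.url = jdbc\:mysql\://[YOUR IP]\:[DATABASE PORT]/[DATABASE NAME]
jdbc.user = ontop_user
jdbc.password = change_me
jdbc.driver = com.mysql.cj.jdbc.Driver
```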

As for the mappings, if you create a view like I suggested, they also shouldn’t be too difficult. The example available in Ontop’s tutorial should be a good starting point. However, I must point out that, in my case, I used the item URIs defined by Omeka S, and, considering that much of the work was done during the view creation, I had to define only a handful of mappings. For example, if your view is called “my_view”, a mapping declaration to define the item class could be done like this, where item_id and class_uri can be obtained from the tables I mentioned earlier:

mappingId       items_class
target          <https://myomekaserver.com/api/items/{item_id}> a <{class_uri}> .
source          select item_id, class_uri from my_view

Mapping declarations for properties would be done in a similar way, but you might need more than one declaration for different types of properties or values, depending on how you build your view, or whether you create more than one view to handle the different types of property values (text, URI or Omeka resource) in a more elegant/efficient manner.
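In the same OBDA style as the example above, a pair of declarations for literal and resource-valued properties might look like this. I’m assuming the view exposes columns named property_uri, literal_value and linked_item_id; rename them to match whatever you actually built:

```
mappingId       items_literal_values
target          <https://myomekaserver.com/api/items/{item_id}> <{property_uri}> {literal_value} .
source          select item_id, property_uri, literal_value from my_view
                where literal_value is not null

mappingId       items_resource_values
target          <https://myomekaserver.com/api/items/{item_id}> <{property_uri}> <https://myomekaserver.com/api/items/{linked_item_id}> .
source          select item_id, property_uri, linked_item_id from my_view
                where linked_item_id is not null
```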

Also note that having only one view is very inefficient (unless it is a materialized view), because it performs many unnecessary joins on every single query made against the virtual repository. In my case, that’s tolerable because the virtual repository is only used to import data into another, native GraphDB repository, a process that is only executed when the Omeka S administrator signals that there is new data to be imported. In your case, things might be different and you might need to find a better approach.

Once you have those three files (properties, mappings and ontology), you can run Ontop’s SPARQL endpoint using the following command, as stated in Ontop’s guide. It will also require you to add MySQL’s JDBC driver to Ontop’s “jdbc” folder (and, of course, to have a JRE set up).

./ontop endpoint -m your_mappings.obda \
                 -t ontology_file.ttl \
                 -p your_db_credentials.properties \
                 --cors-allowed-origins=*

I highly recommend testing things with the Ontop CLI before jumping to GraphDB, because, if you do something wrong, you might not be able to see the full errors through GraphDB. Once everything is OK, you can set up an Ontop virtual repository in GraphDB by following its documentation.

If you choose to take an approach like mine and import the data into a native repository, you can do so through query federation, as presented in GraphDB’s documentation. In my case, I just run a federated query to get all the data from the virtual repository and insert it into a named graph in the repository from which I run the query.
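That federated copy can be a single SPARQL update. A sketch, assuming GraphDB’s internal federation syntax (SERVICE over a repository: URI) and with the repository and graph names as placeholders:

```sparql
# Copy everything from the virtual repository into a named graph.
INSERT {
  GRAPH <https://myomekaserver.com/graph/omeka> { ?s ?p ?o }
}
WHERE {
  # "repository:" targets another repository on the same
  # GraphDB instance; here, the Ontop virtual one.
  SERVICE <repository:omeka-virtual> { ?s ?p ?o }
}
```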


Back on the original question here: we’ll have before/after queries for the created and modified dates in the next version of Omeka S (probably version 4.0.0).


This topic was automatically closed 250 days after the last reply. New replies are no longer allowed.