Large Omeka sites and searching

We recently switched from our old, highly customized version of Omeka 1.2 to the latest version, and the move went without too much trouble. We currently have 323,604 items in 43 collections, with about 7M rows (1.5 GB) in omeka_element_texts (InnoDB). In a month, we serve about 13M requests to 300K visitors.

Our original install used a modified Solr plugin and worked okay. Now we have tried Simple search and Avant search and are getting very slow, incomplete searches that seem to be causing timeouts on the server.

I am interested in methods that might improve our situation and thoughts anyone may have about an Omeka site this size. I am also considering developing a new search.

What kind of behavior are you getting as an “incomplete” search? Actual incomplete results, or just a timeout? In general I would think if you’re getting any results, you’re probably not having a timeout, though maybe right on the margins it’s possible that there’s enough time to execute the query but not quite enough to then fully render the page in time.

There are a few different “simple” searches in Omeka Classic. Which one are you using? The one available from the item-specific advanced search, the one that can search over more than just items (“sitewide” search), or something else? The item-specific one really isn’t suitable for very large numbers of items: to be maximally flexible and enable some searches that the other method doesn’t, it uses a very simple MySQL LIKE search that basically doesn’t use indexing and is going to be very slow with millions of element texts.
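To illustrate why a LIKE search gets slow, here is a rough Python sketch, not Omeka's code. It assumes the LIKE pattern has a leading wildcard (something like '%river%'), which is the case where a sorted index cannot help. The sample texts and function names are made up for illustration.

```python
import bisect

# Hypothetical element texts, kept sorted the way an ordinary index would be.
texts = sorted([
    "annual report",
    "river map",
    "river trade letter",
    "steamboat on the river",
])


def prefix_search(sorted_texts, prefix):
    """Index-friendly lookup: binary search jumps straight to the first candidate."""
    start = bisect.bisect_left(sorted_texts, prefix)
    hits = []
    for text in sorted_texts[start:]:
        if not text.startswith(prefix):
            break  # sorted order means no later row can match
        hits.append(text)
    return hits


def substring_search(all_texts, needle):
    """Comparable to LIKE '%needle%': every row has to be examined."""
    return [text for text in all_texts if needle in text]


print(prefix_search(texts, "river"))
print(substring_search(texts, "river"))
```

The prefix lookup only touches the rows near the match, while the substring lookup walks the whole list. With millions of element texts, that full walk happens on every search.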

The “sitewide” search uses MySQL’s fulltext indexing, which comes with some drawbacks in terms of limiting queries (only words over a certain length are included, among other things) but has vastly better performance with large amounts of data. It also aggregates things like the element texts into a single row per item/file/etc., so in terms of rows you’re dealing with something more like the hundreds of thousands of items rather than millions of texts.
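As a rough picture of those two points (one aggregated row per record, and short words being left out), here is a toy Python sketch. It is not how MySQL or Omeka implement anything. The sample data, the function names, and the minimum length of 3 are assumptions for illustration; MySQL's real threshold depends on the storage engine and server configuration.

```python
from collections import defaultdict

# Hypothetical element texts as (item_id, text) pairs: several rows per item.
ELEMENT_TEXTS = [
    (1, "Map of the Ohio River"),
    (1, "Engraved in 1850"),
    (2, "Letter to the editor"),
    (2, "On river trade"),
]

MIN_WORD_LEN = 3  # assumed threshold; configurable on a real MySQL server


def aggregate(rows):
    """Collapse many element texts into one text blob per item."""
    by_item = defaultdict(list)
    for item_id, text in rows:
        by_item[item_id].append(text)
    return {item_id: " ".join(parts) for item_id, parts in by_item.items()}


def build_index(docs, min_len=MIN_WORD_LEN):
    """Toy inverted index: word -> set of item ids, skipping short words."""
    index = defaultdict(set)
    for item_id, text in docs.items():
        for word in text.lower().split():
            if len(word) >= min_len:
                index[word].add(item_id)
    return index


docs = aggregate(ELEMENT_TEXTS)
index = build_index(docs)

print(len(ELEMENT_TEXTS), "element texts ->", len(docs), "search rows")
print(sorted(index.get("river", set())))  # items containing "river"
print(sorted(index.get("of", set())))     # too short, never indexed
```

The lookup for "river" goes straight to the matching items without scanning any text, while "of" returns nothing because it was never indexed. That is the same trade-off as above: much faster lookups, but some queries can no longer be expressed.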

At the size you’re talking about, something like the Solr search probably makes the most sense, just because it’s a purpose-built search engine. Was there a particular reason that you moved away from it during the upgrade?

The incomplete search is from the Avant search. In that instance, items that I search for directly are not returned. Because we are having two other major issues with Avant (no ranking in search results and no collection support) along with the timeouts, I need to switch to something else. These issues have caused great frustration for my users.

The old version ran on a patchwork of mods to the Solr plugin for Omeka 1.x, done by someone else ca. 2013. Solr may be the solution I ultimately have to go with.

I have been testing with the advanced sitewide search. What we are working towards is a search refinement sidebar, and some modifications to the sitewide search may be enough for that.
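If Solr does end up being the route, a refinement sidebar maps fairly naturally onto Solr's faceting parameters. Below is a rough Python sketch of building such a request URL. The base URL, the core name "omeka", and the field name "collection" are all hypothetical and would depend on the actual schema; the q, fq, rows, facet and facet.field parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, text, collection=None, rows=20):
    """Build a Solr select URL with facet counts for a refinement sidebar."""
    params = [
        ("q", text),                    # main query, ranked by score by default
        ("rows", rows),
        ("facet", "true"),
        ("facet.field", "collection"),  # hypothetical field name
    ]
    if collection is not None:
        # Filter query: narrows results without affecting the ranking.
        # A real version would need to escape quotes in the collection name.
        params.append(("fq", f'collection:"{collection}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local Solr core named "omeka".
url = solr_select_url("http://localhost:8983/solr/omeka", "river map", collection="Maps")
print(url)
```

The facet counts that come back with each response would cover both of the Avant problems mentioned above: results are ranked, and collections become a filter the user can click on.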

Does anyone know where the SolrSearch plugin for Omeka Classic has gone?

The GitHub repository is still up, but the plugin seems to have been removed from the plugins page: https://github.com/scholarslab/SolrSearch

Is it being repackaged or edited? Is there an ETA on it being restored?
