Media api search is much slower than item search

Hello,

I’m working on a module and I ran into some funny behavior. I want to be able to locate a specific media that has a particular dc:identifier so I’m using

$media = $api->searchOne('media', [
                    'property' => [
                        [
                            'property' => $identifier->id(),
                            'type' => 'eq',
                            'text' => $page_id,
                        ],
                    ],
                ])->getContent();

And it works but I noticed it goes kinda slow. So I tried out another search using the ‘items’ api also by dc:identifier. It also worked but the response time is about 4/5x quicker, which seems substantial. We have about 10,000 items and about 90,000 media in our instance. Would that size discrepancy explain that? It doesn’t seem like a terrible amount for a mysql database to handle.

Thanks,

Joseph

Hmm, how slow are we talking, roughly?

I can’t think of any reasons immediately that this should be happening… the values and all that is pretty much all shared between items, sets and media. There being substantially more media would account for some difference but those aren’t particularly large sizes you’re talking about.

For the item, it takes about 1/1.5 seconds response and for media it’s 5 seconds. Granted, I’m doing this on a docker container so that might be slowing down everything, but I’ve also noticed on our live site in the admin browse, just loading the media page is slower than items. It’s peculiar.

I tried deactivating all the modules to see if any of them we doing anything, but it didn’t seem to make a difference.

Hey @jflatnes, one thing I was thinking about trying running the query directly in mysql and seeing if there was any difference. Of course, if you just search the values table for a particular value it doesn’t make a difference, but I imagine there might be some sort of join going on with the item or media tables. Do you know what the specific query that happens if you do a property search on a media or item?

Thanks,

Joseph

One option is to enable SQL logging and EXPLAIN the long-running media query. To enable logging, open /config/database.ini and set the log_path. For example:

log_path = /path/to/omeka-s/logs/sql.log

Remember to un-comment the line and make sure that the log file is writable by your web server.

After loading a media browse page, open the log and find the query that includes FROM media. Do you see any joins beyond the expected INNER JOIN resource?

In addition, you could prepend EXPLAIN to the query and run it directly in your database. This will get you the statement execution plan, which could provide some insights into why your query is running slowly.

Hey @jimsafley

So I ran them both directly in mysql and the discrepancy is there as well, with item taking about 0.3 second and media taking 3.6 seconds.

Just so I’m doing everything right. Here are the 2 queries:
Item

SELECT
    r0_.id AS id_0,
    r0_.title AS title_1,
    r0_.is_public AS is_public_2,
    r0_.created AS created_3,
    r0_.modified AS modified_4,
    r0_.resource_type AS resource_type_5,
    r0_.owner_id AS owner_id_6,
    r0_.resource_class_id AS resource_class_id_7,
    r0_.resource_template_id AS resource_template_id_8,
    r0_.thumbnail_id AS thumbnail_id_9
FROM
    item i1_
INNER JOIN resource r0_ ON
    i1_.id = r0_.id
LEFT JOIN `value` v2_ ON
    r0_.id = v2_.resource_id AND(v2_.property_id = 10)
WHERE
    (
        v2_.value_resource_id IN(
        SELECT
            r3_.id
        FROM
            resource r3_
        LEFT JOIN item i4_ ON
            r3_.id = i4_.id
        LEFT JOIN item_set i5_ ON
            r3_.id = i5_.id
        LEFT JOIN value_annotation v6_ ON
            r3_.id = v6_.id
        LEFT JOIN media m7_ ON
            r3_.id = m7_.id
        WHERE
            r3_.title = "US.NN.FIT.SC.362.1.1.262.19410303"
    ) OR v2_.`value` = "US.NN.FIT.SC.362.1.1.262.19410303" OR v2_.uri = "US.NN.FIT.SC.362.1.1.262.19410303"
    )
GROUP BY
    r0_.id
ORDER BY
    r0_.id ASC
LIMIT 1

Media

SELECT
    r0_.id AS id_0,
    r0_.title AS title_1,
    r0_.is_public AS is_public_2,
    r0_.created AS created_3,
    r0_.modified AS modified_4,
    m1_.ingester AS ingester_5,
    m1_.renderer AS renderer_6,
    m1_.data AS data_7,
    m1_.source AS source_8,
    m1_.media_type AS media_type_9,
    m1_.storage_id AS storage_id_10,
    m1_.extension AS extension_11,
    m1_.sha256 AS sha256_12,
    m1_.size AS size_13,
    m1_.has_original AS has_original_14,
    m1_.has_thumbnails AS has_thumbnails_15,
    m1_.position AS position_16,
    m1_.lang AS lang_17,
    m1_.alt_text AS alt_text_18,
    r0_.resource_type AS resource_type_19,
    r0_.owner_id AS owner_id_20,
    r0_.resource_class_id AS resource_class_id_21,
    r0_.resource_template_id AS resource_template_id_22,
    r0_.thumbnail_id AS thumbnail_id_23,
    m1_.item_id AS item_id_24
FROM
    media m1_
INNER JOIN resource r0_ ON
    m1_.id = r0_.id
LEFT JOIN `value` v2_ ON
    r0_.id = v2_.resource_id AND(v2_.property_id = 10)
WHERE
    (
        v2_.value_resource_id IN(
        SELECT
            r3_.id
        FROM
            resource r3_
        LEFT JOIN item i4_ ON
            r3_.id = i4_.id
        LEFT JOIN item_set i5_ ON
            r3_.id = i5_.id
        LEFT JOIN value_annotation v6_ ON
            r3_.id = v6_.id
        LEFT JOIN media m7_ ON
            r3_.id = m7_.id
        WHERE
            r3_.title = "US.NN.FIT.SC.362.1.1.262.19410303_008"
    ) OR v2_.`value` = "US.NN.FIT.SC.362.1.1.262.19410303_008" OR v2_.uri = "US.NN.FIT.SC.362.1.1.262.19410303_008"
    )
GROUP BY
    r0_.id
ORDER BY
    r0_.id ASC
LIMIT 1

I ran the explain on both, but I don’t really know if I understand how to decipher it. One thing that jumped out at me is that in the first row for the Item explain it says that it uses the index but doesn’t say that for Media.

Interested to hear what you think—thanks!

Hey guys, I was playing around with this further, and it looks like the element of the query that tremendously slows it down is searching for the value_resource_id. If you take that out of the search, it goes from 3.6 seconds to 0.0007 seconds. Would there be a way to make that optional using the php api?

SELECT
    r0_.id AS id_0,
    r0_.title AS title_1,
    r0_.is_public AS is_public_2,
    r0_.created AS created_3,
    r0_.modified AS modified_4,
    m1_.ingester AS ingester_5,
    m1_.renderer AS renderer_6,
    m1_.data AS data_7,
    m1_.source AS source_8,
    m1_.media_type AS media_type_9,
    m1_.storage_id AS storage_id_10,
    m1_.extension AS extension_11,
    m1_.sha256 AS sha256_12,
    m1_.size AS size_13,
    m1_.has_original AS has_original_14,
    m1_.has_thumbnails AS has_thumbnails_15,
    m1_.position AS position_16,
    m1_.lang AS lang_17,
    m1_.alt_text AS alt_text_18,
    r0_.resource_type AS resource_type_19,
    r0_.owner_id AS owner_id_20,
    r0_.resource_class_id AS resource_class_id_21,
    r0_.resource_template_id AS resource_template_id_22,
    r0_.thumbnail_id AS thumbnail_id_23,
    m1_.item_id AS item_id_24
FROM
    media m1_
INNER JOIN resource r0_ ON
    m1_.id = r0_.id
LEFT JOIN `value` v2_ ON
    r0_.id = v2_.resource_id AND(v2_.property_id = 10)
WHERE
    (
        v2_.`value` = "US.NN.FIT.SC.362.1.1.262.19410303_008" OR v2_.uri = "US.NN.FIT.SC.362.1.1.262.19410303_008"
    )
GROUP BY
    r0_.id
ORDER BY
    r0_.id ASC
LIMIT 1

That subquery is necessary to match on the titles of linked resources. Thanks for posting the EXPLAIN. We’ll look into how we can optimize the query. In the meantime you could comment out this line, which should make your query faster.

Ok, thanks, that’s helpful!

Just a note: we only have about a couple hundred values that are linked to resources, none of which are in the identifier field (id: 10), so is it searching all the titles of all resources even if they’re not linked?

What version of MySQL are you running?

Looks like on the docker container I’m using 5.7 :expressionless:
but we’re more up-to-date on our production site, 10.5.16-MariaDB.

I guess I could try to update the docker container and see if that makes a difference.

If it’s actually 5.7.something in the container, you’re likely using MySQL in one place and MariaDB in the other, also.

I also just ran the queries on our production server, which uses MariaDB 10.5 and got pretty similar results. And also removing the linked resources subquery really sped it up too. I’ll comment that out for now—I don’t imagine it will cause any issue for our site.

Thanks for the help!

One more thing!

We haven’t had a chance to upgrade yet to v 4 and are still on 3.2.3. Is there anything that’s changed there that might affect this?
(We plan on upgrading soonish either way…)

There isn’t anything in 4, I don’t think.

However, we’re going to be including a change in what will be 4.0.2 that changes the indexing for the resource table in a way that should help here.

4.0.2 is out and includes the indexing change I mentioned, so you can try that out.

Hey guys,

I made the upgrade. It seemed to only make a small difference. I also played around with increasing the innodb_buffer_pool_size memory which also helped a little, but really commenting out the valueresource part of the query makes a significant difference. Taking it from 6 seconds to .2 seconds where the other things like the upgrade and increasing the memory shave just like a second off.

Thanks for your help,

Joseph

If you have time at some point, it would be helpful if you could share your EXPLAIN results as before, but against 4.0.2.

Here’s the explain for the full media query:

Running that query in mysql directly it takes 1.35 seconds now with the upgrade and more memory as opposed to 3.6 seconds when I tested last week. So that seems pretty good. Running it through the application was really sluggish earlier, but maybe something else was going on too.

Thanks.

One thing that stands out here is that the query as you’re getting it executed stars with a scan of all the media, then filters from there. This is true for your item query also but you just have fewer items so the effect is not as strong.

There’s a couple other things I’d like you to poke at if you would: first, if you do ANALYZE TABLE `value` and try again, any change there?