What are the best practices for full text search with Omeka S?

Not much in the indexing job log:

2018-06-14T01:55:10+00:00 INFO (6): Start
2018-06-14T01:55:10+00:00 INFO (6): Index id: 1
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #2 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #3 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #4 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #5 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #6 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #7 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #8 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #9 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #10 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #11 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #12 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #13 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #14 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #15 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #16 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #17 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #18 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #19 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #20 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #48 (items)
2018-06-14T01:55:10+00:00 INFO (6): Commit
2018-06-14T01:55:10+00:00 INFO (6): Commit
2018-06-14T01:55:10+00:00 INFO (6): End

Nothing in the logs found on the Solr admin page either.

Hmmm.

I set up a new index just to confirm what I was seeing, leaving the mappings and configuration in place as you see above. After running the indexing job, I looked back at my Solr config to see which dynamic fields had been added. Everything appeared as before (dc_terms_s, is_public_b, etc.), except that media_content_txt_en_split does not appear.

I added more files through both the Sideload ingester and the HTML ingester, but nothing seems to make it into the Solr index. If there's anything else I can check that might help debug this, let me know.
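To check whether any indexed document actually received media_content_txt_en_split (rather than only inspecting the schema), one option is to ask Solr's standard /select handler for documents where that field has any value, with rows=0 so only the count comes back. This is a sketch under assumptions: the core URL http://localhost:8983/solr/omeka is a hypothetical placeholder, and fieldCountUrl and numFound are illustrative helpers, not part of the Solr module.

```php
<?php
// Hypothetical Solr location; replace with your own host and core name.
const SOLR_CORE_URL = 'http://localhost:8983/solr/omeka';

// Build a /select URL that counts documents having any value in $field.
// rows=0 asks Solr to return only the hit count, not the documents.
function fieldCountUrl(string $coreUrl, string $field): string
{
    $params = ['q' => $field . ':*', 'rows' => 0, 'wt' => 'json'];
    return $coreUrl . '/select?' . http_build_query($params);
}

// Read response.numFound from a Solr JSON response body (0 if absent).
function numFound(string $json): int
{
    $data = json_decode($json, true);
    return (int) ($data['response']['numFound'] ?? 0);
}

echo fieldCountUrl(SOLR_CORE_URL, 'media_content_txt_en_split'), PHP_EOL;

// Against a live Solr instance (not run here):
// $count = numFound(file_get_contents(fieldCountUrl(SOLR_CORE_URL, 'media_content_txt_en_split')));
```

A numFound of 0 after reindexing would confirm that no media content reached Solr at all, as opposed to the field merely not being listed in the admin UI.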

I believe I have found why text media are not indexed, but I still don't understand why media added with the HTML ingester are not indexed.

Please try editing Solr/src/Service/ValueExtractor/ItemValueExtractorFactory.php at line 42:

Replace /files with /files/original like this:

$baseFilepath = $config['file_store']['local']['base_path'] ?: (OMEKA_PATH . '/files/original');
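A minimal PHP sketch of what the patched line does, assuming Omeka S's default layout in which original uploads are stored under files/original as <storage_id>.<extension>. OMEKA_PATH is given a made-up value here, and resolveBaseFilepath and originalFilePath are hypothetical helpers for illustration, not functions from the Solr module.

```php
<?php
// Made-up value for the constant Omeka S defines at bootstrap.
const OMEKA_PATH = '/var/www/omeka-s';

// Mirrors the patched line: if file_store.local.base_path is empty
// (null or ''), fall back to OMEKA_PATH/files/original.
function resolveBaseFilepath(array $config): string
{
    return $config['file_store']['local']['base_path'] ?: (OMEKA_PATH . '/files/original');
}

// Hypothetical helper: path of an original file from its storage id and
// extension, assuming originals are named <storage_id>.<extension>.
function originalFilePath(array $config, string $storageId, string $extension): string
{
    $path = resolveBaseFilepath($config) . '/' . $storageId;
    return $extension === '' ? $path : $path . '.' . $extension;
}

$defaultConfig = ['file_store' => ['local' => ['base_path' => null]]];
echo originalFilePath($defaultConfig, 'abc123', 'txt'), PHP_EOL;
// prints /var/www/omeka-s/files/original/abc123.txt
```

Because ?: only falls back when base_path is empty, an install that sets file_store.local.base_path explicitly would not pick up the /original suffix from this change.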

Also, try adding a property to your media and mapping it, to see whether it is indexed correctly.

Thank you, @pols12. I made the change to the item extractor and was able to index the content of the sideloaded media file. I also followed your suggestion of adding metadata to this media file and mapping this property; this, too, worked like a charm.

I haven't looked further into the HTML ingester, but I don't really need it. If I discover anything in this area, I will let you know.