dfox
June 14, 2018, 2:02am
41
Not much in the indexing job log:
2018-06-14T01:55:10+00:00 INFO (6): Start
2018-06-14T01:55:10+00:00 INFO (6): Index id: 1
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #2 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #3 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #4 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #5 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #6 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #7 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #8 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #9 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #10 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #11 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #12 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #13 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #14 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #15 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #16 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #17 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #18 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #19 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #20 (items)
2018-06-14T01:55:10+00:00 INFO (6): Indexing resource #48 (items)
2018-06-14T01:55:10+00:00 INFO (6): Commit
2018-06-14T01:55:10+00:00 INFO (6): Commit
2018-06-14T01:55:10+00:00 INFO (6): End
I found nothing in the logs on the Solr admin page either.
Hmmm.
dfox
June 14, 2018, 2:45am
42
I set up a new index just to confirm what I was seeing. I left the mappings and configuration in place as you see above. After running the indexing job I looked back at my Solr config to see which dynamic fields had been added. Everything appeared as before (dc_terms_s, is_public_b, etc.), except that media_content_txt_en_split does not appear there.
I added additional files via both the Sideload and the HTML ingesters, but nothing seems to be making it into the Solr index. If there's anything else I can check that might help debug this, let me know.
pols12
June 14, 2018, 1:49pm
43
I believe I have found why text media are not indexed, but I still don't understand why media ingested with the HTML ingester are not indexed.
Please try to edit Solr/src/Service/ValueExtractor/ItemValueExtractorFactory.php at line 42:
    use Interop\Container\ContainerInterface;
    use Zend\ServiceManager\Factory\FactoryInterface;
    use Solr\ValueExtractor\ItemValueExtractor;

    class ItemValueExtractorFactory implements FactoryInterface
    {
        public function __invoke(ContainerInterface $services, $requestedName, array $options = null)
        {
            $api = $services->get('Omeka\ApiManager');
            $config = $services->get('Config');
            $baseFilepath = $config['file_store']['local']['base_path'] ?: (OMEKA_PATH . '/files');
            $itemValueExtractor = new ItemValueExtractor;
            $itemValueExtractor->setApiManager($api);
            $itemValueExtractor->setBaseFilepath($baseFilepath);
            return $itemValueExtractor;
        }
    }
Replace /files with /files/original, like this:
    $baseFilepath = $config['file_store']['local']['base_path'] ?: (OMEKA_PATH . '/files/original');
Also, try adding a property to your media and mapping it, to see whether it is indexed correctly.
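To make the effect of the path change concrete, here is a minimal standalone sketch of how the directory of original files could be resolved. The helper names resolveOriginalDir and originalFilePath, the install path and the filename are hypothetical and not part of the Solr module. It also assumes that a configured base_path points at the files directory itself (without /original), so the suffix is appended after the fallback rather than only inside it.

```php
<?php
// Hypothetical helper, not part of the Solr module: resolves the directory
// that holds original media files from an Omeka S-style config array.
function resolveOriginalDir(array $config, string $omekaPath): string
{
    // Null-safe lookup, so a missing key does not raise a notice.
    $basePath = $config['file_store']['local']['base_path'] ?? null;

    // Same fallback as in the factory: use <omeka>/files when nothing is configured.
    $filesDir = $basePath ?: ($omekaPath . '/files');

    // Assumption: a configured base_path is the files directory itself,
    // so "/original" is appended in both cases.
    return rtrim($filesDir, '/') . '/original';
}

// Hypothetical helper: full path of a media file given its stored filename.
function originalFilePath(string $originalDir, string $filename): string
{
    return $originalDir . '/' . $filename;
}

// Example with an unset base_path and a made-up install path and filename.
$config = ['file_store' => ['local' => ['base_path' => null]]];
$dir = resolveOriginalDir($config, '/var/www/omeka-s');
echo originalFilePath($dir, 'abc123.txt'), PHP_EOL;
// prints "/var/www/omeka-s/files/original/abc123.txt"
```

With the one-line change above, a configured base_path would be used as-is and would not get the /original suffix. Whether that matters depends on how your installation sets base_path.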
dfox
June 17, 2018, 8:52pm
44
Thank you, @pols12. I made the change to the item value extractor factory and was able to index the content of the sideloaded media file. I also followed your suggestion of adding metadata to this media file and mapping the property; this, too, worked like a charm.
I haven't looked further at the HTML ingester, but I don't really need it. If I discover anything in this area, I will let you know.