Solr text field length?

I am trying to connect my install to Solr and am currently running some tests with the ExtractText module. I have created a Solr mapping for my extracted text value, but it seems that only about 4,000 characters are stored in Solr.

I have tried mapping it to various field types (*_t, *_txt, *_txt_en_split_tight), but I have never been able to index the whole content of my extracted text (about 80,000 characters in my first example).

Any idea where this limit is set, so that I can try to increase it?

Ok, I finally found the issue with my text: my OCR layer contains some characters that have been translated to <. With the default text formatter, the strip_tags function removes everything that comes after the <.

What I have done so far is to create a custom value formatter in /SearchSolr/src/ValueFormatter/EscapeSpecialCharsText.php:

<?php

namespace SearchSolr\ValueFormatter;

/**
 * ValueFormatter to escape all special characters from text.
 */
class EscapeSpecialCharsText implements ValueFormatterInterface
{
    public function getLabel()
    {
        return 'Escape special chars text'; // @translate
    }

    public function format($value)
    {
        return htmlspecialchars($value);
    }
}

The formatter is then registered by adding the following entry to the searchsolr_value_formatters section of config/module.config.php:

      'escape_special_text' => ValueFormatter\EscapeSpecialCharsText::class,

With that done, I applied this formatter to my field in /admin/search-manager/solr/core/1/map/items, and all my content is now indexed.
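
For anyone who wants to check this outside Omeka, here is a minimal standalone sketch of the behavior. It assumes the default formatter passes the value through PHP's strip_tags, as described above; the sample string is made up for illustration and is not taken from my real OCR data.

```php
<?php
// Hypothetical OCR output: a stray "<" that is not followed by whitespace.
$ocr = 'Page one text <garbled glyph, then the rest of the page';

// strip_tags treats the "<" as the start of a tag and, since no ">" follows,
// drops everything after it.
$stripped = strip_tags($ocr);

// Escaping first turns "<" into "&lt;", so there is nothing left to strip.
$escaped = htmlspecialchars($ocr);

var_dump(strlen($stripped) < strlen($ocr));   // was text lost by strip_tags?
var_dump(strip_tags($escaped) === $escaped);  // is the escaped text left intact?
```

Note that with this approach the indexed text contains the entity &lt; instead of the raw character.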

That won’t fix everything, because some text that is not relevant may end up in the indexed content, but as long as it is only used for searching, I assume it’s fine.

Ok, it’s integrated in the latest version of SearchSolr.

Thanks @Daniel_KM, looking forward to moving to 3.0 to use this new version.
