I am trying to connect my install to Solr and am currently running some tests with the ExtractText module. I have created a Solr mapping for my extracted text value, but it seems that only ~4,000 characters are stored in Solr.
I have tried mapping it to various field types (*_t, *_txt, *_txt_en_split_tight) but have never been able to index the whole content of my extracted text (~80,000 characters in my first example).
Any idea where this limit is set so that I can try to increase the value?
OK, I finally found the issue with my text: my OCR layer contains some characters that have been translated to <. If I use the default text formatter, the strip_tags function removes everything that comes after the <.
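To illustrate the truncation, here is a minimal sketch using a made-up sample string (an assumption for illustration, not my actual OCR text). It assumes the stray < is directly followed by a non-space character and is never closed by a >, which is the case where strip_tags treats the rest of the text as an unfinished tag:

```php
<?php
// Hypothetical sample: an OCR artifact "<" directly followed by a letter,
// with no closing ">" anywhere after it.
$ocr = 'Page 1 text <broken OCR and the rest of the document';

// strip_tags() takes "<broken ..." for the start of a tag and, since the tag
// never closes, discards everything from the "<" onwards.
$stripped = strip_tags($ocr);

// htmlspecialchars() turns "<" into "&lt;", so no text is lost.
$escaped = htmlspecialchars($ocr);

var_dump(strlen($stripped) < strlen($ocr));                    // text was cut
var_dump(strpos($escaped, 'rest of the document') !== false);  // text was kept
```

The PHP manual notes that strip_tags does not validate HTML, so partial or broken tags can remove more text than expected, which matches what I saw.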
What I have done so far is to create a custom value formatter in /SearchSolr/src/ValueFormatter/EscapeSpecialCharsText.php:
<?php
namespace SearchSolr\ValueFormatter;

/**
 * ValueFormatter to escape all special characters from text.
 */
class EscapeSpecialCharsText implements ValueFormatterInterface
{
    public function getLabel()
    {
        return 'Escape special chars text'; // @translate
    }

    public function format($value)
    {
        return htmlspecialchars($value);
    }
}
The formatter is then registered by adding the following entry to the searchsolr_value_formatters section of config/module.config.php: