Ensure Text content paragraphs remain separate after search indexing #294

jmurty · 2017-09-04T03:12:16Z

I have seen a situation (with AGSA/Tarnanthi) where some text entered on a page as a Text content item (HTML behind the scenes) becomes unsearchable because separate paragraphs of text are concatenated in the text document created during search indexing.

For example, a Text content item with HTML content such as `<p>This is a tricky</p><p>test</p>` can get converted to `This is a trickytest`, with no whitespace between "tricky" and "test", by the default ICEkit search document template `icekit/templates/search/indexes/icekit/default.txt`. Subsequent searches for the words "tricky" or "test" may then fail to find the page containing this Text content, depending on the word-stemming rules used on a site.

I think this is caused by the striptags filter used in that template, combined with HTML content generated by the Text widget without any newlines between HTML markup tags.

It can probably best be fixed by ensuring that the `</p>` paragraph end tags generated by the Text component include a trailing newline character.
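To illustrate the suspected cause and the proposed fix, here is a minimal sketch in plain Python. It does not use ICEkit or Django code: `strip_tags_standin` is a hypothetical stand-in for the `striptags` filter (it simply discards markup and joins the remaining text), `add_newline_after_paragraphs` is a hypothetical helper, and the sample HTML is assumed for illustration.

```python
import re
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text nodes only, discarding all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_standin(html):
    """Hypothetical stand-in for a striptags-style filter (not Django's code)."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.chunks)


# Matches a closing </p> tag that is not already followed by a newline.
P_END = re.compile(r"</p\s*>(?!\n)", re.IGNORECASE)


def add_newline_after_paragraphs(html):
    """Hypothetical helper: append a newline after each closing </p> tag."""
    return P_END.sub(lambda match: match.group(0) + "\n", html)


# Assumed sample content: two paragraphs with no whitespace between tags.
html = "<p>This is a tricky</p><p>test</p>"

# Without the fix, the two paragraphs run together into one token.
broken_words = strip_tags_standin(html).split()

# With the fix, the newline survives tag stripping and separates the words.
fixed_html = add_newline_after_paragraphs(html)
fixed_words = strip_tags_standin(fixed_html).split()

print(broken_words)
print(fixed_words)
```

In this sketch the unfixed text yields the single token `trickytest`, while the fixed text yields `tricky` and `test` as separate tokens. The helper leaves `</p>` tags that are already followed by a newline unchanged, so it can be applied repeatedly to the same content.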
