This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

Ensure Text content paragraphs remain separate after search indexing #294

Open
jmurty opened this issue Sep 4, 2017 · 0 comments
Contributor

jmurty commented Sep 4, 2017

I have seen a situation (with AGSA/Tarnanthi) where some text entered on a page as a Text content item (HTML behind the scenes) becomes unsearchable because separate paragraphs of text are concatenated in the text document created during search indexing.

For example, a Text content item with the HTML content `<p>This is a tricky</p><p>test</p>` can be converted to `This is a trickytest`, with no whitespace between `tricky` and `test`, by the default ICEkit search document template `icekit/templates/search/indexes/icekit/default.txt`. Subsequent searches for the words `tricky` and `test` may then fail to find the page containing this Text content, depending on the word-stemming rules used on a site.

I think this is caused by the `striptags` filter used in that template, combined with the Text widget generating HTML content without any newlines between markup tags.
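A minimal sketch of the suspected cause, using only the Python standard library. `strip_tags_approx` is a hypothetical stand-in that approximates what a tag-stripping filter does (keep text nodes, drop markup); it is not the actual filter used by the template, so treat the result as illustrative only.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text nodes and discard markup (hypothetical stand-in for striptags)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Only text between tags reaches this method; the tags themselves are dropped.
        self.chunks.append(data)


def strip_tags_approx(html):
    """Return the text content of `html` with all tags removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.chunks)


# Markup as described in this issue: no whitespace between the paragraphs.
without_newlines = "<p>This is a tricky</p><p>test</p>"
# The same markup with a newline after the first closing tag.
with_newlines = "<p>This is a tricky</p>\n<p>test</p>"

indexed_bad = strip_tags_approx(without_newlines)
indexed_good = strip_tags_approx(with_newlines)

print(repr(indexed_bad))
print(indexed_bad.split())
print(indexed_good.split())
```

In the first case the last word of one paragraph and the first word of the next end up as a single token, which is why a word-based search backend may not match either word.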

The best fix is probably to ensure that the `</p>` paragraph end tags generated by the Text component include a trailing newline character.
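One way this fix might look, as a rough sketch only. `add_paragraph_newlines` is a hypothetical helper, not an existing ICEkit function, and where it would be called (for example when the Text content is saved or rendered) is an assumption. The regex-based `strip_tags_approx` is likewise a simplified stand-in for the tag-stripping step.

```python
import re

# Match a closing paragraph tag that is not already followed by a newline.
_P_CLOSE = re.compile(r"</p\s*>(?!\n)", re.IGNORECASE)
# Crude tag matcher, used only to imitate tag stripping in this sketch.
_ANY_TAG = re.compile(r"<[^>]+>")


def add_paragraph_newlines(html):
    """Append a newline after each </p> tag that lacks one (hypothetical helper)."""
    return _P_CLOSE.sub(lambda match: match.group(0) + "\n", html)


def strip_tags_approx(html):
    """Remove anything that looks like a tag (simplified stand-in for striptags)."""
    return _ANY_TAG.sub("", html)


original = "<p>This is a tricky</p><p>test</p>"
fixed = add_paragraph_newlines(original)

tokens_before = strip_tags_approx(original).split()
tokens_after = strip_tags_approx(fixed).split()

print(tokens_before)
print(tokens_after)
```

The negative lookahead keeps the helper idempotent, so applying it to content that already has newlines after its paragraphs does not add more.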
