Document rescore window_size anomalies #369
Comments
There is another bad anomaly related to this, which mostly becomes obvious when a smaller window_size is used and the documents beyond the window_size are fetched. Example:
Not sure if this is a bug inside Elasticsearch, the LTR plugin, or just a badly designed model. Some documentation for a workaround would be a good first step, but I guess this should also be fixed somewhere. Any advice?
Thanks for describing another case, Rudi. This sounds like a potential bug, so I'd like to write a specific test case around it to get a better understanding of where the blame lies. In the meantime, you could try shifting all of your model's outputs by some value so that there are never any negative values. I'm also curious whether adding a constant boost to all of the reranked docs helps in your particular case. In theory, the boost + [model output (even if negative)] is always above the score of the document at position window_size + 1, so things don't get sunk.
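To make the constant-boost idea concrete, here is a minimal sketch in plain Python. It assumes a simplified scoring model (not the plugin's actual code): hits inside the window end up with boost + model output, and hits outside keep their first-pass score. The function name `boost_for_window` and the `margin` parameter are hypothetical, chosen only for illustration.

```python
def boost_for_window(first_pass, model_scores, window_size, margin=1.0):
    """Smallest constant boost that keeps every rescored hit above the
    first hit outside the rescore window.

    first_pass:   list of (doc_id, score), sorted by score descending.
    model_scores: dict doc_id -> model output (may be negative).
    Assumption: final score inside the window = boost + model output;
    hits outside the window keep their first-pass score.
    """
    if len(first_pass) <= window_size:
        return 0.0  # nothing outside the window to compete with
    next_score = first_pass[window_size][1]  # hit at position window_size + 1
    lowest = min(model_scores[doc] for doc, _ in first_pass[:window_size])
    return max(0.0, next_score - lowest + margin)


first_pass = [("a", 10.0), ("b", 9.0), ("c", 8.0), ("d", 7.0)]
model_scores = {"a": -2.0, "b": 0.5, "c": -0.5}
boost = boost_for_window(first_pass, model_scores, window_size=3)
boosted = {doc: boost + score for doc, score in model_scores.items()}
```

With these made-up numbers the lowest boosted score still sits above the first-pass score of "d", so nothing in the window gets sunk. In a real query the first-pass scores are not known up front, so a fixed constant well above any plausible first-pass score would presumably be used instead.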
I could reproduce this bug with standard Elasticsearch rescorers. It always happens when a rescorer tanks the scores, e.g. by multiplying the original score by a factor < 1:
I am about to open a ticket for that at Elasticsearch...
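The effect described in this comment can be illustrated with a toy model in plain Python. This is not the commenter's original example and not Elasticsearch's implementation; it simply assumes that only the top window_size hits are multiplied by a factor, that the remaining hits keep their first-pass score, and that the combined list is then ordered by score. The function name `rescore_multiply` is hypothetical.

```python
def rescore_multiply(first_pass, window_size, factor):
    """Toy model of a windowed rescore that multiplies scores by `factor`.

    first_pass: list of (doc_id, score), sorted by score descending.
    Only the top `window_size` hits are rescored; the rest are untouched.
    Assumption: the final ranking is a plain sort over the combined scores.
    """
    rescored = [(doc, score * factor) for doc, score in first_pass[:window_size]]
    untouched = first_pass[window_size:]
    return sorted(rescored + untouched, key=lambda hit: hit[1], reverse=True)


first_pass = [("a", 10.0), ("b", 9.0), ("c", 8.0), ("d", 7.0), ("e", 6.0)]
result = rescore_multiply(first_pass, window_size=3, factor=0.5)
print([doc for doc, _ in result])  # ['d', 'e', 'a', 'b', 'c']
```

Under these assumptions the three rescored hits drop to 5.0, 4.5 and 4.0, so the two hits that were never rescored end up on top.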
These anomalies can be avoided by boosting the documents within the desired window_size into a higher value range. This is the solution that works for my use case:
The way I deal with this is using the parameter
This way the final score is:
@rlps as you mention yourself, the scores of all other documents outside of the "rescore window" will be 0 with this approach. At least in my case this is not desired, as the order of those documents will be random (or maybe even inconsistent?).
Some users have encountered the scenario where their model tanks document scores. When this happens, a rescore window can actually sink the rescored documents below those outside of the original window and promote lower-scoring documents to the top! This is dangerous as it's not always obvious when it's occurring.
Document the behavior and some possible workarounds.
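Since the problem is not always obvious, one possible way to spot it is to compare the final ranking against the set of documents that were inside the rescore window. The sketch below is plain Python under stated assumptions: `find_sunk_hits` is a hypothetical helper, and `rescored_ids` would have to be obtained separately, for example from the top window_size hits of the same query run without a rescorer.

```python
def find_sunk_hits(final_hits, rescored_ids):
    """Return rescored doc ids that rank below at least one hit that was
    never rescored, in the order they appear in the final ranking.

    final_hits:   doc ids in final ranked order.
    rescored_ids: set of doc ids that were inside the rescore window.
    """
    sunk = []
    seen_unrescored = False
    for doc_id in final_hits:
        if doc_id in rescored_ids:
            if seen_unrescored:
                sunk.append(doc_id)  # a non-rescored hit already outranks it
        else:
            seen_unrescored = True
    return sunk


healthy = find_sunk_hits(["b", "a", "c", "d", "e"], {"a", "b", "c"})
broken = find_sunk_hits(["d", "e", "a", "b", "c"], {"a", "b", "c"})
```

An empty result means every rescored document still ranks above the documents outside the window; a non-empty result lists the documents that were sunk.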