Merging Strategy for Multiple Similar Items #31

Open
ayush-vibrant opened this issue Oct 31, 2023 · 0 comments

I am not fully clear on the current merging strategy for multiple similar items. I've read the code but still have a few questions.

In my opinion, the current system merges similar items efficiently, but the merged output could preserve more detail, especially when several existing items closely match a new addition.

  • Observation: In an example where the list contains "buy apples", "purchase apples", and "get some apples", adding "acquire apples" produced a single merged item: "Purchase apples". This merge is accurate, but the strategy could be refined further. (The refinement suggested here would not change the result for this particular example, but it could help in many other cases.)
  • Suggested Approach:
    1. Weighted Average Merging: Instead of merging strictly with the single item that has the highest similarity, we could weight every matching item by its similarity score and merge using that weighted average. This might produce a more detailed merged representation.
    2. Illustrative Example for Weighted Average Merging:
      Suppose the similarity scores between "acquire apples" and the existing list items are:
      • "buy apples": 94%
      • "purchase apples": 92%
      • "get some apples": 90%
        A simple merging approach might default to "buy apples" because it has the top similarity score. With a weighted average, the merged representation could instead draw on all three items in proportion to their scores.
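
To make the idea concrete, here is a minimal sketch of what I mean by similarity-weighted merging. It is not taken from the project's code: the function names (`normalized_weights`, `weighted_centroid`) and the 2-d vectors are hypothetical, and I am assuming each item can be represented by a numeric vector such as an embedding. Only the three similarity scores come from the example above.

```python
def normalized_weights(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one similarity score must be non-zero")
    return {item: score / total for item, score in scores.items()}


def weighted_centroid(vectors, weights):
    """Weighted average of equal-length vectors, one vector per item."""
    dim = len(next(iter(vectors.values())))
    centroid = [0.0] * dim
    for item, vec in vectors.items():
        for i, x in enumerate(vec):
            centroid[i] += weights[item] * x
    return centroid


# Similarity of "acquire apples" to each existing item (from the example).
scores = {
    "buy apples": 0.94,
    "purchase apples": 0.92,
    "get some apples": 0.90,
}
weights = normalized_weights(scores)

# Toy 2-d vectors standing in for real embeddings (made up for illustration).
vectors = {
    "buy apples": [1.0, 0.0],
    "purchase apples": [0.8, 0.2],
    "get some apples": [0.6, 0.4],
}
merged = weighted_centroid(vectors, weights)

for item, w in weights.items():
    print(f"{item}: weight {w:.3f}")
print("merged vector:", [round(x, 3) for x in merged])
```

With scores this close together the weights are nearly uniform (roughly a third each), so the weighting only matters noticeably when the similarity scores differ more. How the merged vector would be turned back into item text is a separate question that this sketch does not address.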

PS: I think this is currently handled, in a proxy manner, by tuning the threshold values, right?
