Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix poll notify #14670

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Fix poll notify #14670

wants to merge 3 commits into from

Conversation

jlaitine
Copy link
Contributor

@jlaitine jlaitine commented Nov 6, 2024

Summary

Fix a race condition in poll_notify, revert seemingly unnecessary change in serial driver.

Originally, the race condition was found with an older nuttx version inside the poll_notify, and the same race condition exists also in the current upstream version.

This PR has the first patch, which protects from the RMW issue. Second patch is an optimization to the poll_notify, but also fixes another race which may cause callback being called twice.

For the serial driver, the issue has been circumvented in #13914, but the race condition should be fixed in poll_notify, as it exists for all the poll interfaces, and not just serial driver. I believe that the serial driver fix is unnecessary, so third patch is reverting that - it is not nice to add sched_locks around the drivers unnecessarily.

Impact

Impacts the implementation of poll / poll_notify, fixing a read-modify-write race condition.

Testing

The code has been inspected by 2 engineers, and desk-tested with real application using the poll interface. However, I don't have a possibility to run extensive testing which would show the exact issue which was fixed in #13914, so I'd appreciate if the solution is reviewed & tested by e.g. @CV-Bowen who fixed the race condition for serial driver.

@github-actions github-actions bot added Area: Drivers Drivers issues Area: File System File System issues Size: S The size of the change in this PR is small labels Nov 6, 2024
@nuttxpr
Copy link

nuttxpr commented Nov 6, 2024

[Experimental Bot, please feedback here]

Yes, this PR mostly meets the NuttX requirements, although some clarifications and additions are needed.

Missing/Needs Improvement:

  • Summary: While the summary explains the what, it lacks detail on how the changes fix the race condition. Describe the specific mechanism used (e.g., mutex, atomic operation, etc.) in each patch. Briefly mention why the serial driver change in serial:update some bug fix,and functional improvements #13914 is believed to be unnecessary.
  • Impact:
    • User Impact: While it fixes a bug, does this introduce any behavioral changes the user might observe? If the answer is truly NO, explicitly state "NO". If there are subtle timing changes that could affect user code (even if unlikely), mention that.
    • Build Impact: NO (Explicitly state)
    • Hardware Impact: NO (Explicitly state)
    • Documentation Impact: NO (Explicitly state, or YES if documentation needs updating).
    • Security Impact: This fixes a race condition. This likely has positive security implications. Describe this briefly. E.g., "YES - Improves robustness and potentially mitigates denial-of-service possibilities due to the race condition."
    • Compatibility Impact: NO (Explicitly state, unless there's a subtle edge case)
  • Testing: The testing section is weak. Desk testing and code inspection are good, but insufficient for a race condition fix. While you may not be able to reproduce the original issue, you should attempt to create a test case that stresses the poll_notify mechanism and demonstrates that the race condition is no longer present. Even a simple test case is better than none. Explain why you can't fully reproduce the original issue if possible. Reaching out to @CV-Bowen for review and testing is good practice.

Recommendations:

  1. Elaborate on the "how" in the Summary. Provide a concise, technical description of the fix for each patch.
  2. Complete all Impact sections explicitly. Even if the answer is "NO", state it clearly.
  3. Strengthen the Testing section. Create at least one test case, even if it's a simple one, to demonstrate the absence of the race condition after your changes. Document the limitations of your testing environment and explain why reproducing the original issue is difficult.

By addressing these points, you will significantly improve the quality of your PR and make it easier for reviewers to understand and accept your changes.

t-salminen and others added 3 commits November 8, 2024 09:09
pollfd revents value is occasionally zeroed by race condition when
poll_notify() is called. This results to false timeout return value
from poll().

This behavior was found out when application called tcdrain() before
calling uart poll().

Signed-off-by: Tero Salminen <[email protected]>
…events

This is an optimization.

Signed-off-by: Jukka Laitinen <[email protected]>
@jlaitine
Copy link
Contributor Author

jlaitine commented Nov 8, 2024

re-based to current master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Drivers Drivers issues Area: File System File System issues Size: S The size of the change in this PR is small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants