Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in MessageFilter #538

Closed
wants to merge 9 commits into from

Commits on Jul 1, 2022

  1. Configuration menu
    Copy the full SHA
    ac281ea View commit details
    Browse the repository at this point in the history
  2. Add stress test for MessageFilter

    ... revealing that propagating messages via a ROS callback queue is inherently unsafe.
    If the filter is going to be destroyed, a CBQueueCallback::call() might still be in action,
    locking the underlying Signal1's mutex_.
    This will cause as assertion failure during destruction of the mutex:
    boost::mutex::~mutex(): Assertion `!posix::pthread_mutex_destroy(&m)' failed.
    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    aa5d97d View commit details
    Browse the repository at this point in the history
  3. Fix race condition

    When MessageFilter::signalMessage() is called asynchronouyls via a ROS callback queue,
    this call might still be in progress while another (holding Signal1::mutex_), while
    another thread might attempt to remove the MessageFilter.
    This results in a failed assertion as the MessageFilter destructor attempts to destroy a locked mutex.
    
    Here, by introducing another mutex in MessageFilter, we ensure that CBQueueCallback::call() is finished
    before destroying the MessageFilter.
    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    b5691c2 View commit details
    Browse the repository at this point in the history
  4. Remove debug output

    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    4d42700 View commit details
    Browse the repository at this point in the history
  5. Revert ros#144: Solve a bug that causes a deadlock in MessageFilter

    PR ros#144 attempted to resolve a deadlock by postponing calls to callbacks.
    However, this has the drawback of using references to TransformableRequests -
    outside the protection by transformable_requests_mutex_.
    Thus, these requests might get deleted by another thread, resulting in segfaults:
    
    terminate called after throwing an instance of 'boost::wrapexcept<boost::lock_error>'
      what():  boost: mutex lock failed in pthread_mutex_lock: Invalid argument
    
    This reverts commit bfb8038.
    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    55596cc View commit details
    Browse the repository at this point in the history
  6. Simplify ros#101

    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    ea54eac View commit details
    Browse the repository at this point in the history
  7. Alternative implementation to resolve deadlock ros#91/ros#144

    As pointed out in ros#144 the deadlock arises due to an inversion of locking order:
    - MessageFilter::add(), when removing the oldest message, locks:
      - MessageFilter::messages_mutex_
      - BufferCore::transformable_requests_mutex_ (via cancelTransformableRequest())
    - BufferCore::testTransformableRequests() locks:
      - transformable_requests_mutex_
      - MessageFilter::message_mutex_ (via MessageFilter::transformable() callback)
    
    PR ros#144 attempted to resolve the issue by postponing callback calls.
    However, this has the drawback of using references to TransformableRequests -
    outside the protection by transformable_requests_mutex_.
    These requests might got deleted by another thread meanwhile!
    
    Here the deadlock is resolved in MessageFilter::add, cancelling the requests
    outside the messages_mutex_ protection.
    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    5f86b1d View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    f4e05fd View commit details
    Browse the repository at this point in the history
  9. Reduce probability of race condition with CBQueueCallback

    While MessageFilter::clear() removes pending messages from the callback queue,
    there might be still callbacks active when attempting to remove the MessageFilter.
    Depending on what goes first, either the already destroyed mutex is tried to lock
    or a locked mutex is tried to destroy, both resulting in an assertion failure.
    
    This race condition cannot completely avoided.
    Using a shared mutex for all callers and moving the unique lock to the end
    of the destructor hopefully reduces the probability of such a race condition.
    rhaschke committed Jul 1, 2022
    Configuration menu
    Copy the full SHA
    12fe50c View commit details
    Browse the repository at this point in the history