Added functions Nodelet::ok() and Nodelet::requestStop() #116
base: noetic-devel
This is how I tested the sequence of destructor calls (together with the basic functioning of this PR):

```cpp
#include <unistd.h>  // gettid() (glibc >= 2.30)

#include <boost/function.hpp>
#include <boost/make_shared.hpp>
#include <gtest/gtest.h>
#include <nodelet/loader.h>
#include <nodelet/nodelet.h>
#include <ros/ros.h>
#include <std_msgs/Header.h>

class MyNodelet : public nodelet::Nodelet
{
public:
  ~MyNodelet()
  {
    // Slow destructor, so that overlapping callbacks would be visible in the log.
    ROS_INFO("~MyNodelet start %d", gettid());
    ros::WallDuration(3).sleep();
    ROS_INFO("~MyNodelet end %d", gettid());
  }

  void onInit() override
  {
    ROS_INFO("onInit start");
    this->sub = this->getNodeHandle().subscribe("/a", 1, &MyNodelet::cb, this);
    ros::WallDuration(1).sleep();
    ROS_INFO("onInit end");
  }

  void cb(const std_msgs::Header& m)
  {
    // Long-running callback; ok() is printed before and after the unload.
    ROS_INFO("cb start %d %s", gettid(), this->ok() ? "1" : "0");
    ros::WallDuration(10).sleep();
    ROS_INFO("cb end %d %s", gettid(), this->ok() ? "1" : "0");
  }

  ros::Subscriber sub;
};

int main(int argc, char **argv)
{
  ros::init(argc, argv, "test");
  ros::NodeHandle nh;
  {
    // The loader creates MyNodelet directly instead of going through pluginlib.
    nodelet::Loader l(boost::function<boost::shared_ptr<nodelet::Nodelet> (const std::string&)>([](const std::string& lookup_name){
      return boost::make_shared<MyNodelet>();
    }));
    EXPECT_TRUE(l.load("my_nodelet", "MyNodelet", {}, {}));
    auto pub = nh.advertise<std_msgs::Header>("/a", 1);
    ros::WallDuration(1).sleep();
    ROS_INFO("%i", pub.getNumSubscribers());
    ROS_INFO("pub %d", gettid());
    pub.publish(std_msgs::Header());
    ros::WallDuration(2).sleep();
    // Unload while cb() is still sleeping.
    ROS_INFO("before unload %d", gettid());
    l.unload("my_nodelet");
    ROS_INFO("after unload %d", gettid());
    ros::WallDuration(10).sleep();
    ROS_INFO("~l start %d %lu", gettid(), l.listLoadedNodelets().size());
  }
  ROS_INFO("~l end %d", gettid());
}
```

This is what is printed to console, showing that
When I comment out the
|
Does anybody know about some other publicized ways how to run nodelets than the standard |
A workaround until this PR is merged:

```cpp
#include <sstream>

// Fragile hack: expose the private members of the nodelet headers.
#define private public
#include <nodelet/nodelet.h>
#include <nodelet/detail/callback_queue.h>
#include <nodelet/detail/callback_queue_manager.h>
#undef private

namespace cras
{
namespace impl
{

bool isCallbackQueueValid(ros::CallbackQueueInterface* queue)
{
  auto nodeletQueue = dynamic_cast<nodelet::detail::CallbackQueue*>(queue);
  // if not a nodelet callback queue, we don't know what to do, so we rather report the queue as valid
  if (nodeletQueue == nullptr)
    return true;
  // the queue is valid as long as its manager still knows about it
  const auto& queues = nodeletQueue->parent_->queues_;
  return queues.find(nodeletQueue) != queues.end();
}

}
}

class MyNodelet : public nodelet::Nodelet
{
  bool ok() const
  {
    return inited_ &&
      ::cras::impl::isCallbackQueueValid(this->getNodeHandle().getCallbackQueue()) &&
      ::cras::impl::isCallbackQueueValid(this->getMTNodeHandle().getCallbackQueue());
  }
};
```

This works because the callback queue of the nodelet gets removed from the CallbackQueueManager as soon as it is unloaded, even when callbacks are still running. However, this approach is probably pretty fragile, and it needs to access private and detail-namespaced parts of the nodelet code. |
@peci1 Is it possible to create an automated test to verify the correct functionality? |
@gbiggs I'm sorry it took so long, but the tests are here! I hope they test the behavior thoroughly. There is a requirement that was not obvious to me at first - the destructor cannot finish as long as any callbacks querying Alternatively, there could be a shared pointer to some independent bool object that would be set to false by |
Have you solved that problem? |
You can see one of the possible solutions in the added unit test - manually adding locking into downstream classes. The alternative I described above would be better in that it wouldn't require explicit support from downstream code, but it would require ABI-difficult changes. But I'm still not sure this is actually needed. The thing that was not obvious (and I'll maybe add it to nodelet docs on wiki) is that any callback accessing I guess strictly checking the instance validity would resolve some of the mysterious crashes on Ctrl-C with (not only?) nodelets. The bad thing is that the best generally usable primitive for this kind of synchronization is a reverse semaphore, which has no easily available implementation in any library rosdep knows about (I'll publish my implementation soon as a part of our utility library). Based on the above-mentioned thoughts, I think the implementation of |
I wrote the reverse semaphore implementation - here and here. It could be easily copy-pasted into this library. In a class where an async destructor call is expected, I add these as the last statements in the destructor:

```cpp
this->callbackSemaphore.disable();
this->callbackSemaphore.waitZero();
```

This tells the semaphore to stop giving out "leases" and wait until all "leases" are returned before destroying the instance. In ALL callbacks accessing

```cpp
SemaphoreGuard<ReverseSemaphore> guard(this->callbackSemaphore);
if (!guard.acquired())
  return false;
```

Everything after the guard is guaranteed to have a valid

I still see two problems:
I'm still thinking if this does not have a nice solution that would not require downstream classes to do anything special. Maybe some wrapped |
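For readers unfamiliar with the primitive, here is a minimal sketch of what a reverse semaphore could look like using only the C++ standard library. It is not the implementation linked in the comment above. The names `disable()`, `waitZero()`, `SemaphoreGuard` and `acquired()` mirror the usage shown there, while `acquire()`, `release()` and `count()` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical sketch of a reverse semaphore: callers take "leases",
// and the owner can wait until the number of leases drops to zero.
class ReverseSemaphore
{
public:
  // Take a lease. Returns false once disable() has been called.
  bool acquire()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (disabled_)
      return false;
    ++count_;
    return true;
  }

  // Return a lease obtained by a successful acquire().
  void release()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(count_ > 0);
    if (--count_ == 0)
      cv_.notify_all();
  }

  // Stop giving out new leases.
  void disable()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    disabled_ = true;
  }

  // Block until all leases have been returned.
  void waitZero()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return count_ == 0; });
  }

  // Number of leases currently held (for inspection only).
  std::size_t count() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return count_;
  }

private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t count_ = 0;
  bool disabled_ = false;
};

// RAII lease: acquires in the constructor, releases in the destructor.
template <typename T>
class SemaphoreGuard
{
public:
  explicit SemaphoreGuard(T& semaphore)
    : semaphore_(semaphore), acquired_(semaphore.acquire()) {}
  ~SemaphoreGuard()
  {
    if (acquired_)
      semaphore_.release();
  }
  SemaphoreGuard(const SemaphoreGuard&) = delete;
  SemaphoreGuard& operator=(const SemaphoreGuard&) = delete;

  // True if the lease was granted, i.e. the semaphore was not disabled.
  bool acquired() const { return acquired_; }

private:
  T& semaphore_;
  bool acquired_;
};
```

With this sketch, a destructor that calls `disable()` followed by `waitZero()` returns only after every guard created earlier has gone out of scope, and any guard created after `disable()` reports `acquired() == false`.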
Oh man, please forget everything I wrote about The callback queue makes sure the nodelet is not destroyed while a callback is running: nodelet_core/nodelet/src/callback_queue.cpp Lines 66 to 79 in 6265cac
The tracked object in that case is weak pointer from In case the user would spin some custom threads, he should also take a weak pointer of I've faced the problem with destroyed |
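The weak-pointer idea for custom threads can be illustrated with a ROS-free sketch. It uses `std::shared_ptr` and `std::weak_ptr` for self-containment, whereas nodelets are held in `boost::shared_ptr`. `FakeNodelet`, `runOnce()` and `workerLoop()` are hypothetical names invented for this example and are not part of the nodelet API.

```cpp
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for a nodelet instance.
struct FakeNodelet
{
  int iterations = 0;
  void doWork() { ++iterations; }
};

// Runs one unit of work if the instance is still alive.
// Returns false once the last owning shared_ptr has been released.
bool runOnce(const std::weak_ptr<FakeNodelet>& weak)
{
  // Promote the weak pointer; the instance cannot be destroyed
  // while `strong` is in scope.
  std::shared_ptr<FakeNodelet> strong = weak.lock();
  if (!strong)
    return false;
  strong->doWork();
  return true;
}

// Body of a custom worker thread: it stops by itself when the instance is gone.
void workerLoop(std::weak_ptr<FakeNodelet> weak)
{
  while (runOnce(weak))
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
```

One consequence of this pattern is that if the worker happens to hold the last strong reference, the destructor runs on the worker thread when that reference goes out of scope.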
…age resembles the real one
Now I'm satisfied with the state of this PR and pretty confident it is usable and useful. It is ready to be reviewed. I added other integration tests that verify the standard functionality inside a nodelet manager and how it interacts with explicit unloading of the nodelet, killing the nodelet loader node, and killing the nodelet manager. Along the way, I discovered a bug in bond package which could get the test nodes into infinite wait - ros/bond_core#93 . This is worked around by making sure at least two bond heartbeat messages have been sent by each bond end. There was also a bug in Last, a change of order of member variables was needed in |
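As background for why the order of member variables can matter: C++ destroys non-static data members in reverse order of their declaration. The comment above does not say which members were reordered, so the sketch below uses the hypothetical types `Tracer` and `Holder` purely to show the rule.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper that records its name in a log when destroyed.
struct Tracer
{
  Tracer(std::string name, std::vector<std::string>& log)
    : name_(std::move(name)), log_(log) {}
  ~Tracer() { log_.push_back(name_); }

  std::string name_;
  std::vector<std::string>& log_;
};

// Members are constructed in declaration order and destroyed in reverse.
struct Holder
{
  explicit Holder(std::vector<std::string>& log)
    : first("first", log), second("second", log) {}

  Tracer first;   // declared first, destroyed last
  Tracer second;  // declared last, destroyed first
};
```

A member that other members depend on during destruction therefore has to be declared before them.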
A year has passed with this PR being ready and without a review. Could I please ask a few eyes on this? Our downstream shim that basically reimplements this PR as a custom Nodelet child class has been used for a few years already and it seems to be working well. |
Could help resolving ros/geometry2#381.
This function tells whether it is okay to use the nodelet. The states go as follows:
- `onInit()` is called -> true
- `requestStop()` is called or destructor started to be called -> false

This helps client code recognize that the nodelet was asked to be unloaded and the code should stop everything. It is basically a nodelet variant of `ros::ok()`.

The `ManagedNodelet` class from `loader.cpp` calls `requestStop()` in its destructor to tell the nodelet it is being stopped. I verified the sequence of destructor calls when using the standard nodelet loader, and `ManagedNodelet` is destroyed immediately when `unload()` is called, while the `NodeletPtr` it contains has an additional copy created here until all running callbacks have finished. So the callbacks running while the nodelet was unloaded will see the change from `ok() == true` to `ok() == false`. The nodelet destructor will not be called until these callbacks finish, so it is safe to access the nodelet's methods and state in these callbacks.

The call to `requestStop()` in `~Nodelet()` is maybe superfluous as it might be too late to set `ok() = false` when all parent destructors have already been run. The call is definitely harmless, but it might give the impression that it is okay to not call `requestStop()` and expect that it is called from the nodelet's destructor. It was mainly meant as a safeguard against forgetting to call `requestStop()`, but maybe it's a wrong safeguard. I can remove the call from destructor if more people agree on that.

One of the cases where an `ok()` function is painfully missing is the above linked issue. If the TF buffer knew when the nodelet is asked to be unloaded, it could stop the lookup while loop even when ROS time is paused. Currently, when the time is paused, the loop keeps running indefinitely without a way of knowing that the nodelet was asked to unload...

This PR doesn't break ABI and API is only added, not changed or removed.
This PR comes with no tests yet, but I will add them once there is a 👍 from the maintainers that this functionality is desired and could be merged.
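The state sequence of `ok()` and the kind of wait loop that benefits from it can be sketched without ROS. This is an illustration under stated assumptions, not the code added by this PR: `StoppableSketch`, `init()` and `waitWhileOk()` are hypothetical names, and only `ok()` and `requestStop()` mirror the names from the description.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the ok()/requestStop() part of a nodelet.
class StoppableSketch
{
public:
  // Plays the role of onInit() having been called.
  void init() { inited_ = true; }

  // Ask the owner to stop; ok() returns false from now on.
  void requestStop() { stop_requested_ = true; }

  // True only between init() and requestStop().
  bool ok() const { return inited_ && !stop_requested_; }

private:
  std::atomic<bool> inited_{false};
  std::atomic<bool> stop_requested_{false};
};

// Polls `ready` until it returns true or the owner stops being ok().
// Returns true if `ready` succeeded, false if the wait was interrupted.
template <typename Predicate>
bool waitWhileOk(const StoppableSketch& owner, Predicate ready,
                 std::chrono::milliseconds poll = std::chrono::milliseconds(1))
{
  while (owner.ok())
  {
    if (ready())
      return true;
    std::this_thread::sleep_for(poll);
  }
  return false;
}
```

A loop written this way terminates after `requestStop()` even if the awaited condition never becomes true, which is the behavior the TF lookup case described above lacks when ROS time is paused.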