Nodelets are not torn down when the nodelet manager receives SIGINT #55

Open
mikaelarguedas opened this issue Mar 9, 2017 · 6 comments

@mikaelarguedas
Member

Reposting this as a standalone issue here. Thanks to @dseifert for reporting this.

To test this, prepare the following:

  1. Add a destructor to nodelet_core/test_nodelet/src/plus.cpp that logs a message at ROS_INFO level (or higher)
  2. Create this launch file:
<launch>
 <node pkg="nodelet" type="nodelet" name="nodelet_manager"  args="manager" output="screen"/>

 <node pkg="nodelet" type="nodelet" name="n1" output="screen" args="load test_nodelet/Plus nodelet_manager" />
</launch>

Now, perform these tests:

  • Test 1: run the launch file, use ps xa | grep nodelet to find the process ID of the nodelet load command, and send it SIGINT with kill -SIGINT. You will see that the destructor is called.
  • Test 2: run the launch file, use ps xa | grep nodelet to find the process ID of the nodelet manager command, and send it SIGINT with kill -SIGINT. You will see that the destructor is NOT called.
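
The difference between the two tests can be illustrated without ROS. The following is a minimal sketch, not the nodelet manager's code: FakeNodelet, simulateManagerSigint and the "n1" entry are hypothetical names, and std::raise stands in for kill -SIGINT. It only shows that a destructor log line appears before exit if the SIGINT path explicitly unloads what the manager owns.

```cpp
#include <cassert>
#include <csignal>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nodelet: its destructor records that it ran.
struct FakeNodelet {
    FakeNodelet(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {}
    ~FakeNodelet() { log_.push_back("destroyed " + name_); }

    std::vector<std::string>& log_;
    std::string name_;
};

// Written by the signal handler, read by the manager loop.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void onSigint(int) { g_stop_requested = 1; }

// Hypothetical manager: loads one nodelet, receives SIGINT, then either
// unloads its nodelets or leaves the loop without touching them.
// Returns the destructor messages seen up to the point of exit.
std::vector<std::string> simulateManagerSigint(bool unload_before_exit) {
    std::vector<std::string> log;  // declared first so it outlives `loaded`
    std::map<std::string, std::unique_ptr<FakeNodelet>> loaded;

    g_stop_requested = 0;
    auto previous = std::signal(SIGINT, onSigint);

    loaded["n1"] = std::make_unique<FakeNodelet>(log, "n1");

    std::raise(SIGINT);  // stands in for `kill -SIGINT <manager pid>`
    while (!g_stop_requested) {
        // a real manager would be spinning here
    }

    if (unload_before_exit) {
        loaded.clear();  // destructors run here, before the process exits
    }

    // Snapshot of what was logged at the point where the process would exit.
    // Anything destroyed after this line would not be observed in practice.
    std::vector<std::string> seen_before_exit = log;

    std::signal(SIGINT, previous);
    return seen_before_exit;
}
```

With unload_before_exit set to true the returned log holds one entry for n1; with false it is empty, which mirrors the outcome of Test 2.
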
@andviane

andviane commented Feb 4, 2019

Confirmed. We have a server thread that must be shut down properly in the destructor (which seems to be the recommended practice). By placing a log statement there, we discovered that the destructor does not run when the application is killed with Ctrl-C.
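
For readers unfamiliar with the pattern, here is a minimal sketch of a class that stops and joins its worker thread in the destructor. It uses only the C++ standard library; the Server class and its exited flag are hypothetical and are not taken from any nodelet. The thread is stopped cleanly only if the destructor actually runs, which is why a skipped destructor matters here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical server that owns a worker thread for its whole lifetime.
class Server {
public:
    // `exited` is set by the worker just before it returns, so callers can
    // observe whether the thread was stopped cleanly.
    explicit Server(std::atomic<bool>& exited)
        : exited_(exited), running_(true), worker_([this] { serve(); }) {}

    ~Server() {
        running_ = false;      // ask the worker loop to stop
        if (worker_.joinable()) {
            worker_.join();    // wait until it has really finished
        }
    }

    Server(const Server&) = delete;
    Server& operator=(const Server&) = delete;

private:
    void serve() {
        while (running_) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        exited_ = true;
    }

    std::atomic<bool>& exited_;
    std::atomic<bool> running_;
    std::thread worker_;  // declared last so the flags exist before it starts
};
```

Creating a Server in a scope and leaving that scope sets the exited flag. If the destructor is never invoked, the stop request and the join are skipped as well.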

@jpapon

jpapon commented Sep 21, 2019

I'm experiencing what I believe is a related issue in ros-melodic.
With various camera drivers (realsense-ros, libuvc_ros, avt_camera, etc.), when using nodelet managers with many nodelets in them, I am unable to relaunch pipelines on an already started roscore.
So this does not work:

  1. roscore
  2. In a second terminal session, roslaunch my pipeline, which has 10-20 nodelets in the nodelet manager, including a camera nodelet.
  3. Ctrl-C
  4. roslaunch the pipeline a second time.
    The second roslaunch always results in a hang or a crash (depending on the camera/pipeline). This seems to be because things aren't unloading properly.
    Killing and restarting the roscore allows me to run the pipeline again (once).

On the other hand, if I don't have a separate roscore running, I can kill/relaunch the pipeline without issue.

Is anyone aware of a workaround other than not having a separate roscore? Having to restart roscore can be problematic in larger distributed systems.

@tompe17

tompe17 commented Dec 2, 2019

I see a similar issue to what jpapon describes.

I have a launch file that starts 350 nodes or nodelets. If I do not restart the roscore when re-running the launch file, I get problems with nodes (or nodelets) being killed because of name duplication. But the names should be in different namespaces, so that should not be an issue.

If I let the launch file start the roscore then it works without any problems.

rosnode list does not show any nodes left after I have killed the launch file, and ps does not show any hanging processes. rosnode cleanup did not help.

@doronhi

doronhi commented Jan 1, 2020

I have the same issue with realsense2_camera (realsense-ros).
After running roslaunch realsense2_camera rs_camera.launch, the command rosnode kill /camera/realsense2_camera triggers the nodelet's destructor, while rosnode kill /camera/realsense2_camera_manager and Ctrl-C do not.

@YoshuaNava

YoshuaNava commented Jul 29, 2020

Hi all,
In my view, this could be happening because, when trying to shut down the nodelet:

  1. We might be breaking the bond with ROS before the components are unloaded
  2. There is no ros::Time available to exit waits like the one for service calls during shutdown: https://github.com/ros/nodelet_core/blob/indigo-devel/nodelet/src/nodelet.cpp#L192

I think the second point would be easy to verify by adding a ros::Time::shutdown(); call to ensure that ROS time users stop waiting.

Further explanation

Using this as reference: http://docs.ros.org/diamondback/api/roscpp/html/init_8cpp_source.html

Request shutdown sets a global variable inside the node class that should smoothly shut down a node in the next iteration of spin (Line 140). That request is serviced later, asynchronously, by the Poll Manager object of the node.

Shutdown actually shuts down the queues, time loggers and connections of a node (Line 519).

However, shutdown might not get as far as shutting down time, which is required to stop nodes that use ros::Rate together with custom signal handlers and threads. In this case you have to shut down the time tracking instance manually.

In both cases there are recursive mutexes to prevent multiple shutdown calls from crashing the node.
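
To make the second hypothesis concrete, here is a minimal sketch of a wait that is released when its time source is shut down. It is not roscpp's implementation: FakeClock, sleepFor, shutdown and waiterReleasedByShutdown are hypothetical names built on the C++ standard library. It only illustrates the idea that a thread blocked on a clock keeps waiting until someone explicitly shuts that clock down.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical time source whose sleepers can be released early.
class FakeClock {
public:
    // Returns true if the full duration elapsed, false if shutdown()
    // was called before or during the wait.
    bool sleepFor(std::chrono::milliseconds duration) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool stopped =
            cv_.wait_for(lock, duration, [this] { return stopped_; });
        return !stopped;
    }

    // Wakes every sleeper and makes later sleeps return immediately.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopped_ = false;
};

// Starts a waiter with a long timeout, then shuts the clock down.
// Returns true if the waiter was released early instead of timing out.
bool waiterReleasedByShutdown() {
    FakeClock clock;
    bool slept_full_duration = true;

    std::thread waiter([&] {
        slept_full_duration = clock.sleepFor(std::chrono::seconds(10));
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    clock.shutdown();  // without this call the waiter blocks for 10 seconds
    waiter.join();

    return !slept_full_duration;
}
```

Whether the nodelet code at the linked line behaves this way would still need to be checked against the actual roscpp sources.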

@LucasWaelti

A possible workaround is to kill the nodelet manager. For instance, assume the following nodes are running but need to be restarted:

/stereo/stereo_cam_nodelet
/stereo/stereo_nodelet_manager

By running the command:

rosnode kill /stereo/stereo_nodelet_manager

all associated nodelets are killed and can then be restarted. However, killing /stereo/stereo_cam_nodelet first prevents the nodelets from being restarted.
