Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations #119

Schwo0ps · 2019-07-31T23:35:05Z

The issue is a complicated one, but here goes.

I first noticed this issue when I saw that diagnostics messages from arm64 machines sometimes arrive, but only infrequently (between 0% and 10% of the time), and eventually the following message comes from the Diagnostic Aggregator running on amd64 and it appears all messages are dropped.

[ERROR] [1564529964.093868659 /diag_agg] [/tmp/binarydeb/ros-kinetic-roscpp-1.12.14/src/libros/transport_publisher_link.cpp:TransportPublisherLink::onMessageLength:175]: a message of over a gigabyte was predicted in tcpros. that seems highly unlikely, so I'll assume protocol synchronization is lost.

At first, I though it was Endianness, but all machines are Little Endian. There are also C++ nodes which are able to communicate with each other properly on all machines. It also appears as though this only happens with the Diagnostic Updater, not any other topic.

After this, I started running a test: just running roscore and a single test node with the following code.

#include <ros/ros.h>
#include <diagnostic_updater/diagnostic_updater.h>
#include <diagnostic_updater/update_functions.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "updater_node");
  double rate = 10.;
  diagnostic_updater::Updater updater;
  diagnostic_updater::FrequencyStatus frequency_status(
    diagnostic_updater::FrequencyStatusParam(&rate, &rate)
  );
  updater.setHardwareID("none");
  updater.add(frequency_status);

  ros::Rate r(rate);
  while (ros::ok())
  {
    frequency_status.tick();
    updater.update();
    ros::spinOnce();
    r.sleep();
  }
}

Everything works fine with amd64, but on arm64, the above issues happen. Additionally, the node just eventually crashes with std::bad_alloc. Here is the backtrace and relevant message that was published (as it looks like the error was with serialization)

#13 0x0000000000419084 in diagnostic_updater::Updater::publish (this=this@entry=0x7fffffecc0, status_vec=std::vector of length 1, capacity 1 = {...})
    at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:547
547	        publisher_.publish(msg);
(gdb) list
542	            node_name_.substr(1) + std::string(": ") + iter->name;
543	        }
544	        diagnostic_msgs::DiagnosticArray msg;
545	        msg.status = status_vec;
546	        msg.header.stamp = ros::Time::now(); // Add timestamp for ROS 0.10
547	        publisher_.publish(msg);
548	      }
549	
550	      /**
551	       * Publishes on /diagnostics and reads the diagnostic_period parameter.
(gdb) p msg
$7 = {header = {seq = 0, stamp = {<ros::TimeBase<ros::Time, ros::Duration>> = {sec = 1564612119, nsec = 323701618}, <No data fields>}, frame_id = ""}, status = std::vector of length 1, capacity 1 = {{level = 0 '\000', name = "updater_node_1564612117153179612: Frequency Status", message = "Desired frequency met", hardware_id = "none", 
      values = std::vector of length 7, capacity 7 = {{key = "Events in window", value = "22"}, {key = "Events since startup", value = "22"}, {key = "Duration of window (s)", value = "2.100290"}, {key = "Actual frequency (Hz)", value = "10.474742"}, {key = "Target frequency (Hz)", value = "10.000000"}, {
          key = "Minimum acceptable frequency (Hz)", value = "9.000000"}, {key = "Maximum acceptable frequency (Hz)", value = "11.000000"}}}}}
(gdb) bt
#0  0x0000007fb7a04528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x0000007fb7a059e0 in __GI_abort () at abort.c:89
#2  0x0000007fb7bde254 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3  0x0000007fb7bdbdc4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4  0x0000007fb7bdbe10 in std::terminate() () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#5  0x0000007fb7bdc0d4 in __cxa_throw () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#6  0x0000007fb7bdc6d8 in operator new(unsigned long) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#7  0x000000000040f840 in ros::serialization::serializeMessage<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > > (message=...) at /opt/ros/kinetic/include/ros/serialization.h:795
#8  0x000000000040d60c in boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> >::operator()<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list0> (f=<optimized out>, a=<synthetic pointer>..., 
    this=<optimized out>) at /usr/include/boost/function/function_template.hpp:129
#9  boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> > >::operator() (this=<optimized out>) at /usr/include/boost/bind/bind.hpp:893
#10 boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > const> > >, ros::SerializedMessage>::invoke (
    function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:138
#11 0x0000007fb7efc8b0 in ros::TopicManager::publish(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<ros::SerializedMessage ()> const&, ros::SerializedMessage&) () from /opt/ros/kinetic/lib/libroscpp.so
#12 0x000000000040fe14 in ros::Publisher::publish<diagnostic_msgs::DiagnosticArray_<std::allocator<void> > > (this=this@entry=0x7fffffee88, message=...) at /usr/include/c++/8/new:169
#13 0x0000000000419084 in diagnostic_updater::Updater::publish (this=this@entry=0x7fffffecc0, status_vec=std::vector of length 1, capacity 1 = {...}) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:547
#14 0x000000000040c5ac in diagnostic_updater::Updater::force_update (this=0x7fffffecc0) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:440
#15 diagnostic_updater::Updater::update (this=0x7fffffecc0) at /opt/ros/kinetic/include/diagnostic_updater/diagnostic_updater.h:390
#16 main (argc=<optimized out>, argv=<optimized out>) at /ws/src/test/src/updater_node.cpp:20

It crashes after ~30 seconds but seems to do so more quickly if multiple of the nodes are running.

As specified in the title, this only happens with GCC optimization level 3 (compiling with -O3 or CMAKE_BUILD_TYPE=Release). I tried with -O2 and it appears to work fine.

I doubt this can be easily fixed and I'm not 100% sure if the issue is within this repo or ros_comm, but any ideas would be greatly appreciated. We really would like to use -O3 throughout our code for improved performance.

The text was updated successfully, but these errors were encountered:

Ecophagy · 2022-02-01T18:44:17Z

Bit of a necro, but this looks like a gcc bug, the same as ros/ros_comm#2197 & ros/roscpp_core#130

If you need to use -O3 then I think your only option is to update your version of gcc.

peci1 · 2023-06-06T18:56:15Z

PR ros/roscpp_core#136 should fix this. I'm looking for someone who could verify. Just please notice that Focal now has GCC 9.4 by default where I could not reproduce the issue. So the test would need to be done with GCC 9.3 installed explicitly.

eborghi10 mentioned this issue Jul 4, 2022

Livox nodelet not publishing topics oxin-ros/livox_ros_driver#10

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations #119

Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations #119

Schwo0ps commented Jul 31, 2019

Ecophagy commented Feb 1, 2022

peci1 commented Jun 6, 2023

Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations #119

Updater raises std::bad_alloc sporadically on arm64 with O3 optimizations #119

Comments

Schwo0ps commented Jul 31, 2019

Ecophagy commented Feb 1, 2022

peci1 commented Jun 6, 2023