
REP:	2014
Title:	Benchmarking performance in ROS 2
Author:	Víctor Mayoral-Vilches <victor at accelerationrobotics.com>, Ingo Lütkebohle <Ingo.Luetkebohle at de.bosch.com>, Christophe Bédard <bedard.christophe at gmail.com>, Rayza Araújo <araujorayza at gmail.com>
Status:	Rejected
Type:	Informational
Content-Type:	text/x-rst
Created:	29-Sept-2022
Post-History:	15-June-2023

Contents

Abstract
Motivation
Performance metrics in robotics
Methodology for benchmarking performance in ROS 2
Reference implementation and recommendations
References and Footnotes
Copyright

Abstract

This REP describes some principles and guidelines for benchmarking performance in ROS 2.

Motivation

Benchmarking is the act of running a computer program with a known workload to assess the program's relative performance. In the context of ROS 2, performance information can help roboticists design more efficient robotic systems and select the right hardware for their robotic application. It can also help them understand the trade-offs between different algorithms that implement the same capability and choose the best approach for their use case. Performance data can also be used to compare different versions of ROS 2 and to identify regressions. Finally, performance information can be used to help prioritize future development efforts.

The myriad combinations of robot hardware and robotics software make assessing robotic-system performance in an architecture-neutral, representative, and reproducible manner challenging. This REP attempts to provide some guidelines to help roboticists benchmark their systems in a consistent and reproducible manner by following a quantitative approach. This REP also provides a set of tools and examples to help guide roboticists while collecting and reporting performance data.

Value for stakeholders:

Package maintainers can use these guidelines to integrate performance benchmarking tools (e.g. instrumentation) and data (e.g. results, plots and datasets) in their packages.
Consumers can use the guidelines in this REP to benchmark ROS Nodes and Graphs in an architecture-neutral, representative, and reproducible manner, and can use the corresponding performance data offered in ROS packages to set expectations on the capabilities of each.
Hardware vendors and robot manufacturers can use these guidelines to show evidence of the performance of their system solutions with ROS in an architecture-neutral, representative, and reproducible manner.

The guidelines here are intended to be a living document that can be updated as new information becomes available.

A quantitative approach

Performance must be studied with real examples and measurements on real robotic computations, rather than simply as a collection of definitions, designs and/or marketing actions. When creating benchmarks, prefer to use realistic data and situations rather than data designed to test one edge case. Follow the quantitative approach [1] when designing your architecture.

Approaches to benchmark performance

There are different types of benchmarking approaches, especially when it comes to performance. The following definitions, inspired by [2], clarify the most popular terms:

Functional performance testing: Functional performance is the measurable performance of the system’s functions which the user can experience. For example, in a motion planning algorithm, measures could include items like the ratio of cases in which the algorithm gets the motion plan right.
Non-functional performance testing: The measurable performance of those aspects that don't belong to the system's functions. In the motion planning example, these include the latency of the planner, the memory consumption, the CPU usage, etc.
Black-box performance testing: Measures performance by eliminating the layers above the layer-of-interest and replacing those with a specific test application that stimulates the layer-of-interest in the way you need it. This allows gathering the measurement data right inside your specific test application. The acquisition of the data (the probe) resides inside the test application. A major design constraint in such a test is that you have to eliminate the “application” in order to test the system. Otherwise, the real application and the test application would likely interfere.
Grey-box performance testing: A more application-specific measure that can watch internal states of the system and measure (probe) certain points in the system, thus generating the measurement data with minimal interference. Requires instrumenting the complete application.

Graphically depicted:

         Probe      Probe
         +            +
         |            |
+--------|------------|-------+     +-----------------------------+
|        |            |       |     |                             |
|     +--|------------|-+     |     |                             |
|     |  v            v |     |     |        - latency   <--------------+ Probe
|     |                 |     |     |        - throughput<--------------+ Probe
|     |     Function    |     |     |        - memory    <--------------+ Probe
|     |                 |     |     |        - power     <--------------+ Probe
|     +-----------------+     |     |                             |
|      System under test      |     |       System under test     |
+-----------------------------+     +-----------------------------+


          Functional                            Non-functional


+-------------+                     +----------------------------+
| Test App.   |                     |  +-----------------------+ |
|  + +  +  +  |                     |  |    Application        | |
+--|-|--|--|--+---------------+     |  |                   <------------+ Probe
|  | |  |  |                  |     |  +-----------------------+ |
|  v v  v  v                  |     |                            |
|     Probes                  |     |                      <------------+ Probe
|                             |     |                            |
|       System under test     |     |   System under test        |
|                             |     |                      <------------+ Probe
|                             |     |                            |
|                             |     |                            |
+-----------------------------+     +----------------------------+


         Black-Box                            Grey-box



    Probe      Probe     Probe             Probe                     Probe
    +          +          +       +-------+                          |
    |          |          |       |                                  |
+-----------------------------+   | +-----------------------------+  |
|   |          |          |   |   | |                             |  |
|   | +-----------------+ |   |   | |                             |  |
|   | |        v        | |   |   | |                             |  |
|   | |                 | |   |   | |                             |  |
|   +->     Function    +<+   |   +>+                             +<-+
|     |                 |     |     |                             |
|     +-----------------+     |     |                             |
|      System under test      |     |       System under test     |
+-----------------------------+     +-----------------------------+


            Transparent                           Opaque
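
As a rough illustration of the black-box and grey-box definitions above, the following Python sketch times the same toy function both ways. The names `plan`, `plan_instrumented`, `Probe` and `black_box_measure` are hypothetical and exist only for this example. The in-memory probe is a stand-in for a real tracer, not a description of how any ROS 2 tool is implemented.

```python
import time


def plan(samples):
    """Hypothetical stand-in for the function under test."""
    filtered = [s for s in samples if s >= 0]
    return sorted(filtered)


def black_box_measure(samples):
    """Black-box: the probe (timing code) lives in the test application."""
    start = time.perf_counter_ns()
    plan(samples)
    return time.perf_counter_ns() - start


class Probe:
    """Grey-box: a minimal in-memory probe that records named events."""

    def __init__(self):
        self.events = []  # list of (event name, timestamp in ns)

    def fire(self, name):
        self.events.append((name, time.perf_counter_ns()))


def plan_instrumented(samples, probe):
    """Same logic as plan(), with probes at internal points of interest."""
    probe.fire("filter_begin")
    filtered = [s for s in samples if s >= 0]
    probe.fire("filter_end")
    result = sorted(filtered)
    probe.fire("sort_end")
    return result


if __name__ == "__main__":
    data = [3, -1, 2, 5, -4, 1]
    print("black-box total (ns):", black_box_measure(data))
    probe = Probe()
    plan_instrumented(data, probe)
    for name, timestamp in probe.events:
        print(name, timestamp)
```

The black-box variant only sees the total time from outside, while the grey-box variant exposes the time spent in each internal stage at the cost of instrumenting the code.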

Tracing and benchmarking

Tracing and benchmarking can be defined as follows:

tracing: logging (partial) execution information while the system is running. Tracing is used to understand what goes on in a running software system.
benchmarking: a method of comparing the performance of various systems by running a common test.

From these definitions, it follows that benchmarking and tracing are connected: the test/benchmark uses a series of measurements for comparison, and these measurements come from tracing probes or other logging mechanisms. In other words, tracing collects data that is then fed into a benchmark program for comparison.
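
The relationship between the two can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the traces are hand-written lists of (event name, timestamp) pairs, the event names are illustrative, and the helpers `durations` and `benchmark` are hypothetical rather than part of any tracing tool.

```python
from statistics import mean


def durations(trace, begin, end):
    """Pair each `begin` event with the next `end` event; return durations."""
    out = []
    started = None
    for name, timestamp_ns in trace:
        if name == begin:
            started = timestamp_ns
        elif name == end and started is not None:
            out.append(timestamp_ns - started)
            started = None
    return out


def benchmark(traces, begin="callback_start", end="callback_end"):
    """Compare systems by the mean duration between two traced events."""
    return {label: mean(durations(t, begin, end)) for label, t in traces.items()}


# Hypothetical traces collected while both systems ran the same test.
trace_a = [("callback_start", 0), ("callback_end", 400),
           ("callback_start", 1000), ("callback_end", 1600)]
trace_b = [("callback_start", 0), ("callback_end", 300),
           ("callback_start", 1000), ("callback_end", 1300)]

results = benchmark({"system_a": trace_a, "system_b": trace_b})

if __name__ == "__main__":
    for label, mean_ns in results.items():
        print(f"{label}: mean callback duration {mean_ns} ns")
```

Tracing produces the event lists, and the benchmark is the comparison computed on top of them.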

Prior work

There are various past efforts in the robotics community to benchmark ROS robotic systems. The following are some of the most representative ones:

ros2_benchmarking: First implementation available for ROS 2, aimed to provide a framework to compare ROS and ROS 2 communications.
performance_test: Tool designed to measure inter and intra-process communications. Runs at least one publisher and at least one subscriber, each one in one independent thread or process and records different performance metrics. It also provides a way to generate a report with the results through a different package.
reference_system: Tool designed to provide a framework for creating reference systems that can represent real-world distributed systems in order to more fairly compare various configurations of each system (e.g. measuring performance of different ROS 2 executors). It also provides a way to generate reports.
ros2-performance: Another framework to evaluate ROS communications, inspired by performance_test. There's a decent rationale in the form of a proposal, a good evaluation of prior work and a well documented set of experiments.
system_metrics_collector: A lightweight and real-time metrics collector for ROS 2. Automatically collects and aggregates CPU % used and memory % performance metrics used by both system and ROS 2 processes. Data is aggregated in order to provide constant time average, min, max, sample count, and standard deviation values for each collected metric. Deprecated.
ros2_latency_evaluation: A tool for benchmarking the performance of a ROS 2 Node system in separate processes (initially focused on both inter-process and intra-process interactions, later focused). Forked from ros2-performance.
ros2_timer_latency_measurement: A minimal real-time safe testing utility for measuring jitter and latency. Measures nanosleep latency between ROS child threads and latency of timer callbacks (also within ROS) across two different Linux kernel setups (vanilla and an RT_PREEMPT patched kernel).
buildfarm_perf_tests: Tests which run regularly on the official ROS 2 buildfarm. Formally, extends performance_test with additional tests that measure additional metrics including CPU usage, memory, resident anonymous memory or virtual memory.
ros2_tracing: Tracing tools for ROS 2 built upon LTTng which allow collecting runtime execution information on real-time distributed systems, using the low-overhead LTTng tracer. Performance evaluation can be scripted out of the data collected from all these trace points. The ROS 2 core layers (rmw, rcl, rclcpp) have been instrumented with LTTng probes which allow collecting information of ROS 2 targets without the need to modify the ROS 2 core code (system under test). There are various publications available about ros2_tracing [3] [4] and it is used actively to benchmark ROS 2 in real scenarios, including perception and mapping [5], hardware acceleration [6] [7] or self-driving mobility [8].

Industry standards

There are no globally accepted industry standards for benchmarking robotic systems. The closest initiative to a standardization effort in robotics is the European H2020 Project EUROBENCH, which aimed at creating the first benchmarking framework for robotic systems in Europe, focusing on bipedal locomotion. The project was completed in 2022 and the results are available in [9]. The project has been a great success and has been used to benchmark a wide range of bipedal robotic systems through experiments. However, there are no public plans to extend the project to other types of robots, nor have the tools been used elsewhere.

When looking at areas related to robotics, we find the MLPerf Inference and MLCommons initiatives, which are the closest to what we are trying to achieve in ROS 2. MLPerf Inference is an open source project that aims to define a common set of benchmarks for evaluating the performance of machine learning inference engines. MLCommons is an open source project that aims to define a common set of benchmarks for evaluating the performance of machine learning models. Both projects have been very successful and are widely used in the industry. The MLPerf Inference project was completed in 2021 and the inference benchmarks are available in [10]. The MLCommons project has become an industry standard in Machine Learning and its results are publicly disclosed in [11].

Performance metrics in robotics

Robots are deterministic machines and their performance should be understood by considering metrics such as the following:

latency: time between the start and the completion of a task.
system reaction time: time between receipt of an external stimulus and the beginning of the system's actions (for example, time between an obstacle sensor firing and the first velocity command taking this into account)
software system reaction time: time between when an external stimulus is received by the robot's software and when the corresponding action has been executed by the software. This is usually the more directly measurable version of system reaction time.
message latency: Time between publishing a message and invocation of the corresponding callback on the receiver side
execution latency: Time between when an event leading to an execution (such as a timer firing, or a message being received) occurs, and when the corresponding callback is called
bandwidth or throughput: the total amount of work done in a given time for a task. When measuring bandwidth or throughput in a ROS 2 system, both the number of messages per second and the total number of bytes per second are of interest.
power: the electrical energy per unit of time consumed while executing a given task.
performance-per-watt: total amount of work (generally bandwidth or throughput) that can be delivered for every watt of power consumed.
memory: the amount of short-term data (not to be confused with storage) required while executing a given task.
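
As a hedged illustration of two of the metrics above, the following Python sketch derives message latency and throughput from recorded timestamps. The message identifiers, timestamps and sizes are made up for the example, and the helpers `message_latencies` and `throughput` are hypothetical names, not an existing API.

```python
def message_latencies(published_ns, received_ns):
    """Message latency: callback invocation time minus publish time.

    Both arguments map a message id to a timestamp in nanoseconds.
    Messages that were never received are skipped.
    """
    return {
        msg_id: received_ns[msg_id] - t_pub
        for msg_id, t_pub in published_ns.items()
        if msg_id in received_ns
    }


def throughput(message_sizes_bytes, window_s):
    """Return (messages per second, bytes per second) over a time window."""
    return (len(message_sizes_bytes) / window_s,
            sum(message_sizes_bytes) / window_s)


# Hypothetical measurements: message 3 was published but never received.
published = {1: 0, 2: 100_000_000, 3: 200_000_000}
received = {1: 2_000_000, 2: 103_000_000}

latencies = message_latencies(published, received)
rate = throughput([1024, 1024, 2048], window_s=2.0)

if __name__ == "__main__":
    print("latencies (ns):", latencies)
    print("messages/s, bytes/s:", rate)
```

Using integer nanoseconds for timestamps avoids floating-point rounding when subtracting values that are close together.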

These metrics can help determine performance characteristics of a robotic system. Two characteristics of particular relevance for robotic systems are real-time and determinism, defined as follows:

real-time: ability of completing a task's computations while meeting time deadlines
determinism: that the same external or internal event leads to the same system behavior, with executions in the same order, each time.

For example, a robotic system may be able to perform a task in a short amount of time (low latency), but it may not be able to do it in real-time. In this case, the system would be considered to be non-real-time given the time deadlines imposed. On the other hand, a robotic system may be able to perform a task in real-time, but it may not be able to do it in a short amount of time. In this case, the system would be considered to be non-interactive. Finally, a robotic system may be able to perform a task in real-time and in a short amount of time, but it may consume a lot of power. In this case, the system would be considered to be non-energy-efficient.
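
The distinction between low latency and real-time drawn above can be made concrete with a small Python sketch. The latency samples and the 10 ms deadline are invented for illustration, and `deadline_misses` and `meets_deadline` are hypothetical helper names.

```python
from statistics import mean


def deadline_misses(latencies_ms, deadline_ms):
    """Count the executions that finished after the deadline."""
    return sum(1 for latency in latencies_ms if latency > deadline_ms)


def meets_deadline(latencies_ms, deadline_ms):
    """Real-time in the sense used above: no execution misses the deadline."""
    return deadline_misses(latencies_ms, deadline_ms) == 0


# Hypothetical latency samples in milliseconds for the same task.
fast_but_spiky = [1, 1, 1, 1, 12]    # low typical latency, one outlier
slow_but_bounded = [8, 8, 9, 8, 9]   # higher latency, tightly bounded
DEADLINE_MS = 10

if __name__ == "__main__":
    for label, samples in (("fast_but_spiky", fast_but_spiky),
                           ("slow_but_bounded", slow_but_bounded)):
        print(label,
              "mean:", mean(samples),
              "misses:", deadline_misses(samples, DEADLINE_MS),
              "real-time:", meets_deadline(samples, DEADLINE_MS))
```

In this sketch the first system has the lower mean latency yet misses the deadline once, while the second is slower on average but never misses it.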

In another example, a robotic system that can perform a task in 1 second with a power consumption of 2W is twice as fast (latency) as another robotic system that can perform the same task in 2 seconds with a power consumption of 0.5W. However, the second robotic system is twice as efficient as the first one. In this case, the solution that requires less power would be the best option from an energy efficiency perspective (with a higher performance-per-watt). Similarly, a robotic system that has a high bandwidth but consumes a lot of energy might not be the best option for a mobile robot that must operate for a long time on a battery.
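
The arithmetic behind this example can be checked with a short Python sketch. It assumes each system processes one task at a time, so throughput is the inverse of latency. The helper names are hypothetical.

```python
def energy_per_task_j(latency_s, power_w):
    """Energy in joules consumed to complete one task."""
    return latency_s * power_w


def performance_per_watt(latency_s, power_w):
    """Tasks per second delivered per watt (one task at a time assumed)."""
    return (1.0 / latency_s) / power_w


# Values taken from the example above: (latency in seconds, power in watts).
first = (1.0, 2.0)
second = (2.0, 0.5)

if __name__ == "__main__":
    for label, (latency_s, power_w) in (("first", first), ("second", second)):
        print(label,
              "energy per task (J):", energy_per_task_j(latency_s, power_w),
              "tasks/s/W:", performance_per_watt(latency_s, power_w))
```

The first system spends 2 J per task and delivers 0.5 tasks per second per watt. The second spends 1 J per task and delivers 1 task per second per watt, which is where the factor of two in efficiency comes from.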

Therefore, it is important to consider several of these metrics when benchmarking a robotic system. The metrics presented in this REP are intended to be used as a guideline, and should be adapted to the specific needs of a robot.

Methodology for benchmarking performance in ROS 2

In this REP, we recommend adopting a grey-box and non-functional benchmarking approach to measure performance, which allows evaluating individual ROS 2 nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (LTTng) through the ros2_tracing project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.

The following diagram shows the proposed methodology for benchmarking performance in ROS 2 which consists of 3 steps:

                                              +--------------+
                  +----------------+  rebuild |              |
                  |                +---------->              |
start  +----------> 1. trace graph |          | 2. benchmark +----------> 3. report
                  |                |          |              |
                  +----+------^--^-+          |              |
                       |      |  |            +-------+------+
                       |      |  |                    |
                       +------+  |                    |
                         LTTng   +--------------------+
                                     re-instrument

instrument the target ROS 2 abstraction/application using LTTng. Refer to ros2_tracing for tools, documentation and ROS 2 core layers tracepoints;
trace and benchmark the ROS 2 application;
create performance reports with the results of the benchmarking.
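
As one possible shape for the reporting step, the following Python sketch turns per-target duration samples into a plain-text summary table. It assumes the durations have already been extracted from a trace. The target names, sample values, and the helpers `summarize` and `report` are hypothetical and do not reflect the output format of ros2_tracing or any other tool.

```python
from statistics import mean, pstdev


def summarize(samples_ms):
    """Basic summary statistics for a list of duration samples."""
    ordered = sorted(samples_ms)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "mean": mean(ordered),
        "max": ordered[-1],
        "stdev": pstdev(ordered),  # population standard deviation
    }


def report(results):
    """Render one summary row per benchmarked target as plain text."""
    lines = [f"{'target':<16}{'count':>6}{'min':>8}{'mean':>8}"
             f"{'max':>8}{'stdev':>8}"]
    for target, samples in results.items():
        s = summarize(samples)
        lines.append(f"{target:<16}{s['count']:>6}{s['min']:>8.2f}"
                     f"{s['mean']:>8.2f}{s['max']:>8.2f}{s['stdev']:>8.2f}")
    return "\n".join(lines)


# Hypothetical durations in milliseconds for two benchmarked targets.
results = {"target_a": [2.0, 4.0, 6.0], "target_b": [1.0, 1.0, 1.0]}

if __name__ == "__main__":
    print(report(results))
```

Reporting the spread (min, max, standard deviation) next to the mean keeps outliers visible, which matters when the results are used to judge real-time behavior.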

Reference implementation and recommendations

The reader is referred to ros2_tracing and LTTng for the tools that enable a grey-box performance benchmarking approach in line with what the ROS 2 stack has been using (ROS 2 common packages come instrumented with LTTng probes). In addition, [3] and [4] present comprehensive descriptions of the ros2_tracing tools and the LTTng infrastructure.

Reference implementations complying with the recommendations of this REP can be found in literature for applications like perception and mapping [5], hardware acceleration [6] [7] or self-driving mobility [8]. A particular example of interest for the reader is the instrumentation of the image_pipeline ROS 2 package [12], which is a set of nodes for processing image data in ROS 2. The image_pipeline package has been instrumented with LTTng probes available in the ROS 2 Humble release, which results in various perception Components (e.g. RectifyNode Component) leveraging instrumentation which, if enabled, can help trace the computational graph information flow of a ROS 2 application using such Components. The results of benchmarking the performance of image_pipeline are available in [13], and launch scripts to both trace and analyze perception graphs are available in [14].

References and Footnotes

[1]	Hennessy, J. L., & Patterson, D. A. (2011). Computer architecture: a quantitative approach. Elsevier.

[2]	Pemmaiah, A., Pangercic, D., Aggarwal, D., Neumann, K., & Marcey, K. (2019) "Performance Testing in ROS 2". https://drive.google.com/file/d/15nX80RK6aS8abZvQAOnMNUEgh7px9V5S/view

[3]	Bédard, C., Lütkebohle, I., & Dagenais, M. (2022). ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2. IEEE Robotics and Automation Letters, 7(3), 6511-6518.

[4]	Bédard, C., Lajoie, P. Y., Beltrame, G., & Dagenais, M. (2022). Message Flow Analysis with Complex Causal Links for Distributed ROS 2 Systems. arXiv preprint arXiv:2204.10208.

[5]	Lajoie, P. Y., Bédard, C., & Beltrame, G. (2022). Analyze, Debug, Optimize: Real-Time Tracing for Perception and Mapping Systems in ROS 2. arXiv preprint arXiv:2204.11778.

[6]	Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). "RobotCore: An Open Architecture for Hardware Acceleration in ROS 2". https://arxiv.org/pdf/2205.03929.pdf

[7]	Mayoral-Vilches, V. (2021). "Kria Robotics Stack". https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf

[8]	Li, Z., Hasegawa, A., & Azumi, T. (2022). Autoware_Perf: A tracing and performance analysis framework for ROS 2 applications. Journal of Systems Architecture, 123, 102341.

[9]	European robotic framework for bipedal locomotion benchmarking https://eurobench2020.eu/

[10]	MLPerf™ inference benchmarks https://github.com/mlcommons/inference

[11]	MLCommons https://mlcommons.org/en/

[12]	image_pipeline ROS 2 package. An image processing pipeline for ROS. Humble branch. https://github.com/ros-perception/image_pipeline/tree/humble

[13]	Case study: accelerating ROS 2 perception https://github.com/ros-acceleration/community/issues/20#issuecomment-1047570391

[14]	acceleration_examples. ROS 2 package examples demonstrating the use of hardware acceleration. https://github.com/ros-acceleration/acceleration_examples

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.