REP: 2014
Title: Benchmarking performance in ROS 2
Author: Víctor Mayoral-Vilches <victor at accelerationrobotics.com>, Ingo Lütkebohle <Ingo.Luetkebohle at de.bosch.com>, Christophe Bédard <bedard.christophe at gmail.com>, Rayza Araújo <araujorayza at gmail.com>
Status: Rejected
Type: Informational
Content-Type: text/x-rst
Created: 29-Sept-2022
Post-History: 15-June-2023
This REP describes some principles and guidelines for benchmarking performance in ROS 2.
Benchmarking is the act of running a computer program with a known workload to assess the program's relative performance. In the context of ROS 2, performance information can help roboticists design more efficient robotic systems and select the right hardware for their robotic application. It can also help them understand the trade-offs between different algorithms that implement the same capability and choose the best approach for their use case. Performance data can also be used to compare different versions of ROS 2 and to identify regressions. Finally, performance information can help prioritize future development efforts.
The myriad combinations of robot hardware and robotics software make assessing robotic-system performance in an architecture-neutral, representative, and reproducible manner challenging. This REP attempts to provide some guidelines to help roboticists benchmark their systems in a consistent and reproducible manner by following a quantitative approach. This REP also provides a set of tools and examples to help guide roboticists while collecting and reporting performance data.
Value for stakeholders:
These guidelines are intended to be a living document, to be updated as new information becomes available.
Performance must be studied with real examples and measurements on real robotic computations, rather than simply as a collection of definitions, designs and/or marketing actions. When creating benchmarks, prefer to use realistic data and situations rather than data designed to test one edge case. Follow the quantitative approach [1] when designing your architecture.
There are different types of benchmarking approaches, especially when related to performance. The following definitions, inspired by [2], clarify the most popular terms:
Graphically depicted:
[Figure: six ASCII panels illustrating the approaches defined above. Functional vs. Non-functional: probes attached to a specific Function inside the system under test vs. probes measuring latency, throughput, memory, and power of the system under test as a whole. Black-Box vs. Grey-box: a test application exercising the system under test purely from the outside vs. probes placed alongside the application and inside the system under test. Transparent vs. Opaque: probes reaching an internal Function directly vs. probes attached only at the boundary of the system under test.]
Tracing and benchmarking can be defined as follows: tracing is the act of collecting runtime measurements from a running system, typically through probes inserted in its code, while benchmarking is the act of running a known workload and comparing the resulting measurements to assess relative performance.
From these definitions, it follows that benchmarking and tracing are connected: a test or benchmark relies on a series of measurements for its comparison, and those measurements come from tracing probes or other logging mechanisms. In other words, tracing collects the data that is then fed into a benchmark program for comparison.
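As an illustration of this relationship, the following minimal Python sketch uses hypothetical latency samples standing in for data collected by tracing probes and shows how the benchmarking step reduces them to comparable figures::

    import statistics

    # Hypothetical per-message latency samples (in milliseconds) that tracing
    # probes or another logging mechanism would have collected from two systems
    # running the same known workload.
    traced_latencies_ms = {
        "baseline": [2.1, 2.3, 2.2, 2.8, 2.4],
        "candidate": [1.7, 1.9, 1.8, 2.6, 1.8],
    }

    # The benchmark consumes the traced data and reduces it to figures that
    # can be compared across systems.
    for system, samples in traced_latencies_ms.items():
        print(
            f"{system}: mean={statistics.mean(samples):.2f} ms, "
            f"max={max(samples):.2f} ms, stdev={statistics.stdev(samples):.2f} ms"
        )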
There are various past efforts in the robotics community to benchmark ROS robotic systems. The following are some of the most representative ones:
There are no globally accepted industry standards for benchmarking robotic systems. The closest initiative to a standardization effort in robotics is the European H2020 project EUROBENCH, which aimed at creating the first benchmarking framework for robotic systems in Europe, focusing on bipedal locomotion. The project was completed in 2022 and its results are available in [9]. It has been a great success and has been used to benchmark a wide range of bipedal robotic systems across experiments; however, there are no public plans to extend the project to other types of robots, nor have its tools been used elsewhere.
When looking at other areas related to robotics, we find the MLPerf Inference and MLCommons initiatives, which are the closest to what we are trying to achieve in ROS 2. MLPerf Inference is an open source project that defines a common set of benchmarks for evaluating the performance of machine learning inference engines, while MLCommons is an open source project that defines a common set of benchmarks for evaluating the performance of machine learning models. Both projects have been very successful and are widely used in industry. The MLPerf Inference project was completed in 2021 and its inference benchmarks are available in [10]. MLCommons has become an industry standard in machine learning and its results are publicly disclosed in [11].
Robots are deterministic machines and their performance should be understood by considering metrics such as the following:
These metrics can help determine the performance characteristics of a robotic system. Of most relevance for robotic systems, we often encounter the real-time and determinism characteristics, defined as follows:
For example, a robotic system may be able to perform a task in a short amount of time (low latency), but it may not be able to do it in real-time. In this case, the system would be considered to be non-real-time given the time deadlines imposed. On the other hand, a robotic system may be able to perform a task in real-time, but it may not be able to do it in a short amount of time. In this case, the system would be considered to be non-interactive. Finally, a robotic system may be able to perform a task in real-time and in a short amount of time, but it may consume a lot of power. In this case, the system would be considered to be non-energy-efficient.
In another example, a robotic system that can perform a task in 1 second with a power consumption of 2W is twice as fast (latency) as another robotic system that can perform the same task in 2 seconds with a power consumption of 0.5W. However, the second robotic system is twice as efficient as the first one. In this case, the solution that requires less power would be the best option from an energy efficiency perspective (with a higher performance-per-watt). Similarly, a robotic system that has a high bandwidth but consumes a lot of energy might not be the best option for a mobile robot that must operate for a long time on a battery.
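The arithmetic behind this comparison can be made explicit with a minimal Python sketch using the hypothetical figures from the example above::

    # Hypothetical figures from the example above: system A completes the task
    # in 1 s drawing 2 W; system B completes the same task in 2 s drawing 0.5 W.
    systems = {
        "A": {"latency_s": 1.0, "power_w": 2.0},
        "B": {"latency_s": 2.0, "power_w": 0.5},
    }

    for name, s in systems.items():
        throughput = 1.0 / s["latency_s"]                # tasks per second
        perf_per_watt = throughput / s["power_w"]        # tasks per second per watt
        energy_per_task = s["power_w"] * s["latency_s"]  # joules per task
        print(f"{name}: {throughput:.2f} task/s, {perf_per_watt:.2f} task/s/W, "
              f"{energy_per_task:.2f} J/task")

    # A: 1.00 task/s, 0.50 task/s/W, 2.00 J/task
    # B: 0.50 task/s, 1.00 task/s/W, 1.00 J/task
    # A is twice as fast; B delivers twice the performance-per-watt.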
Therefore, it is important to consider several of these metrics together when benchmarking a robotic system. The metrics presented in this REP are intended to be used as a guideline and should be adapted to the specific needs of each robot.
In this REP, we recommend adopting a grey-box and non-functional benchmarking approach to measure performance, which allows evaluating both individual ROS 2 nodes and complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Trace Toolkit: next generation (LTTng) through the ros2_tracing project, which leverages probes already inserted in the ROS 2 core layers and provides tools that facilitate benchmarking ROS 2 abstractions.
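As an illustration, tracing can be enabled directly from a launch file through the Trace action provided by ros2_tracing. The sketch below is a minimal example, not a definitive recipe: the demo_nodes_cpp talker/listener nodes and the event names are placeholders, and the exact Trace arguments may vary across ros2_tracing releases::

    # trace_example.launch.py -- minimal sketch, assuming the tracetools_launch
    # package from ros2_tracing is installed.
    from launch import LaunchDescription
    from launch_ros.actions import Node
    from tracetools_launch.action import Trace


    def generate_launch_description():
        return LaunchDescription([
            # Start an LTTng tracing session recording the ROS 2 userspace
            # probes before the nodes under test are launched.
            Trace(
                session_name='benchmark-session',
                events_ust=['ros2:*'],
            ),
            # Placeholder nodes standing in for the computational graph under test.
            Node(package='demo_nodes_cpp', executable='talker'),
            Node(package='demo_nodes_cpp', executable='listener'),
        ])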
The following diagram shows the proposed methodology for benchmarking performance in ROS 2 which consists of 3 steps:
[Diagram: the three steps form a loop. A start arrow leads into 1. trace graph, which feeds 2. benchmark, which feeds 3. report, with feedback arrows labelled "rebuild" and "re-instrument" (via LTTng) leading back to the tracing step.]
The reader is referred to ros2_tracing and LTTng for the tools that enable a grey-box performance benchmarking approach in line with what the ROS 2 stack has been using (ROS 2 common packages come instrumented with LTTng probes). In addition, [3] and [4] present comprehensive descriptions of the ros2_tracing tools and the LTTng infrastructure.
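Once a trace has been recorded, it can be analyzed to extract measurements such as callback durations. The sketch below assumes the tracetools_analysis package (distributed alongside ros2_tracing) and follows the pattern of its sample analyses; the exact module and method names may differ between releases::

    # analyze_callbacks.py -- minimal sketch, assuming tracetools_analysis is
    # installed and a trace was recorded under ~/.ros/tracing/benchmark-session.
    from tracetools_analysis.loading import load_file
    from tracetools_analysis.processor.ros2 import Ros2Handler
    from tracetools_analysis.utils.ros2 import Ros2DataModelUtil

    # Load the converted trace events and process them with the ROS 2 handler.
    events = load_file('~/.ros/tracing/benchmark-session')
    handler = Ros2Handler.process(events)
    data_util = Ros2DataModelUtil(handler.data)

    # Report the mean duration of every traced callback (subscriptions, timers, ...).
    for callback_object, symbol in data_util.get_callback_symbols().items():
        durations = data_util.get_callback_durations(callback_object)['duration']
        print(f'{symbol}: mean duration = {durations.mean()}')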
Reference implementations complying with the recommendations of this REP can be found in the literature for applications such as perception and mapping [5], hardware acceleration [6] [7], or self-driving mobility [8]. A particular example of interest for the reader is the instrumentation of the image_pipeline ROS 2 package [12], a set of nodes for processing image data in ROS 2. The image_pipeline package has been instrumented with LTTng probes available in the ROS 2 Humble release; as a result, various perception Components (e.g. the RectifyNode Component) ship with instrumentation that, when enabled, helps trace the information flow through the computational graph of a ROS 2 application using such a Component. The results of benchmarking the performance of image_pipeline are available in [13], and launch scripts to both trace and analyze perception graphs are available in [14].
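To give a concrete flavor of how such an instrumented Component can be traced, the following launch sketch composes the RectifyNode Component and records its probes. It is only a sketch: the camera topic names in the remappings are hypothetical and would need to match the actual image source::

    # trace_rectify.launch.py -- minimal sketch, assuming image_proc (from
    # image_pipeline) and tracetools_launch are installed; topic names are
    # hypothetical placeholders.
    from launch import LaunchDescription
    from launch_ros.actions import ComposableNodeContainer
    from launch_ros.descriptions import ComposableNode
    from tracetools_launch.action import Trace


    def generate_launch_description():
        return LaunchDescription([
            # Record the ROS 2 userspace probes, including those emitted by the
            # instrumented perception Components.
            Trace(
                session_name='rectify-benchmark',
                events_ust=['ros2:*'],
            ),
            # Compose the RectifyNode Component inside a container.
            ComposableNodeContainer(
                name='perception_container',
                namespace='',
                package='rclcpp_components',
                executable='component_container',
                composable_node_descriptions=[
                    ComposableNode(
                        package='image_proc',
                        plugin='image_proc::RectifyNode',
                        name='rectify_node',
                        remappings=[
                            ('image', '/camera/image_raw'),          # hypothetical input topic
                            ('camera_info', '/camera/camera_info'),  # hypothetical camera info topic
                        ],
                    ),
                ],
            ),
        ])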
[1] Hennessy, J. L., & Patterson, D. A. (2011). Computer Architecture: A Quantitative Approach. Elsevier.
[2] Pemmaiah, A., Pangercic, D., Aggarwal, D., Neumann, K., & Marcey, K. (2019). "Performance Testing in ROS 2". https://drive.google.com/file/d/15nX80RK6aS8abZvQAOnMNUEgh7px9V5S/view
[3] Bédard, C., Lütkebohle, I., & Dagenais, M. (2022). ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2. IEEE Robotics and Automation Letters, 7(3), 6511-6518.
[4] Bédard, C., Lajoie, P. Y., Beltrame, G., & Dagenais, M. (2022). Message Flow Analysis with Complex Causal Links for Distributed ROS 2 Systems. arXiv preprint arXiv:2204.10208.
[5] Lajoie, P. Y., Bédard, C., & Beltrame, G. (2022). Analyze, Debug, Optimize: Real-Time Tracing for Perception and Mapping Systems in ROS 2. arXiv preprint arXiv:2204.11778.
[6] Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). "RobotCore: An Open Architecture for Hardware Acceleration in ROS 2". https://arxiv.org/pdf/2205.03929.pdf
[7] Mayoral-Vilches, V. (2021). "Kria Robotics Stack". https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf
[8] Li, Z., Hasegawa, A., & Azumi, T. (2022). Autoware_Perf: A tracing and performance analysis framework for ROS 2 applications. Journal of Systems Architecture, 123, 102341.
[9] EUROBENCH: European robotic framework for bipedal locomotion benchmarking. https://eurobench2020.eu/
[10] MLPerf™ inference benchmarks. https://github.com/mlcommons/inference
[11] MLCommons. https://mlcommons.org/en/
[12] image_pipeline ROS 2 package: an image processing pipeline for ROS, Humble branch. https://github.com/ros-perception/image_pipeline/tree/humble
[13] Case study: accelerating ROS 2 perception. https://github.com/ros-acceleration/community/issues/20#issuecomment-1047570391
[14] acceleration_examples: ROS 2 package examples demonstrating the use of hardware acceleration. https://github.com/ros-acceleration/acceleration_examples
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.