Profiling

OSQP includes two mechanisms for measuring the amount of time spent in the solver:

  • Basic profiling - provides simple measurements of start/stop times for the solver

  • Advanced profiling - Annotates regions of the code so external profilers (e.g. NVidia Nsight, Intel VTune, etc.) can measure the time of each part of the solver.

Basic Profiling

Basic solver profiling is enabled by the OSQP_ENABLE_PROFILING CMake build option, and uses the system’s clock to time the setup and solve phases. The runtime for the solver is displayed alongside the solver status when the solver finishes, and is also contained inside fields of the OSQPInfo structure.

Advanced Profiling

Advanced solver profiling is available with the use of the following external tools:

While these profilers can be used with normal builds of OSQP, using the OSQP CMake build option OSQP_PROFILER_ANNOTATIONS will enable section annotation, telling the profiler when sections of the code start/end to allow more fine-grained timing of individual sections of the various solver phases.

At runtime, the OSQP profiler annotations are controlled using the OSQPSettings::profiler_level settings variable. This setting can take three different values:

  • 0 - Disable all annotation information

  • 1 - Annotate only OSQP ADMM sections

  • 2 - Annotate both OSQP ADMM sections and underlying linear algebra sections

NVidia Nsight

When using the NVidia CUDA algebra backend with the profiler annotations enabled, only the NVidia Nsight profiler should be used. OSQP uses the NVidia NVTX library library to report solver sections, which is selected using the OSQP_PROFILER_ANNOTATIONS=nvtx CMake option during configuration.

Once compiled with the NVTX library support, OSQP will report profiling information when run using the NVidia Nsight profiler tool. The timing information will be namespaced into the osqp domain, and is reported in the NVTX ranges section of any reports produced.

Command line profiling

To use the command line to profile the demo executable with the CUDA algebra and report the OSQP timings, the following should be run on Linux:

nsys profile -o <report_name> osqp_demo
nsys analyze <report_name>.nsys-rep
nsys stats --report nvtx_sum <report_name>.sqlite

This will then report all the timing information for the OSQP sections of the code in a table on the command line, similar to the one shown below.

** NVTX Range Summary (nvtx_sum):

Time (%)  Total Time (ns)  Instances    Avg (ns)       Med (ns)      Min (ns)     Max (ns)    StdDev (ns)    Style                    Range
--------  ---------------  ---------  -------------  -------------  -----------  -----------  ------------  -------  ---------------------------------------
   47.8      337,159,170          1  337,159,170.0  337,159,170.0  337,159,170  337,159,170           0.0  PushPop  osqp:Problem setup
   13.1       92,137,833          1   92,137,833.0   92,137,833.0   92,137,833   92,137,833           0.0  PushPop  osqp:Solving optimization problem
    9.4       66,362,729         94      705,986.5      466,710.5      278,144   13,770,989   1,435,295.6  PushPop  osqp:Solve the linear system
    9.2       64,993,272         90      722,147.5      540,812.5      352,410   14,033,099   1,448,686.7  PushPop  osqp:ADMM iteration
    8.4       59,295,017         90      658,833.5      482,775.0      295,428   13,872,670   1,437,104.2  PushPop  osqp:KKT system solve in ADMM iteration
    5.4       37,863,537          1   37,863,537.0   37,863,537.0   37,863,537   37,863,537           0.0  PushPop  osqp:Problem data scaling
    2.4       17,114,751          2    8,557,375.5    8,557,375.5      877,863   16,236,888  10,860,470.7  PushPop  osqp:Initialize linear system solver
    2.0       13,913,196          1   13,913,196.0   13,913,196.0   13,913,196   13,913,196           0.0  PushPop  osqp:Solution polishing
    1.3        9,058,039        301       30,093.2       27,691.0       23,970       88,999       8,018.7  PushPop  osqp:Matrix-vector multiplication
    0.8        5,615,895         90       62,398.8       55,515.5       47,176      158,284      19,590.5  PushPop  osqp:Vector updates in ADMM iteration
    0.2        1,712,233         90       19,024.8       16,265.5       14,119       72,884       7,576.9  PushPop  osqp:Projection in ADMM iteration

GUI Profiling

To use the NVidia Nsight GUI to profile OSQP, ensure that the Collect NVTX trace project option is selected, and the osqp domain is included in any NVTX domain filters that are setup. Then, the OSQP sections will be shown on the timeline, and all the OSQP sections and events can be seen in the Events view. A sample of the Nsight GUI with OSQP section information included is shown below.

Sample Nsight timeline and event panel

Intel VTune

When using CPU-based backends, OSQP can be profiled using the Intel VTune profiler. OSQP integration with VTune is enabled using the OSQP_PROFILER_ANNOTATIONS=itt CMake build option, which enables the ITT (Instrumentation and Tracing Technology APIs) library for reporting section information. OSQP reports the various parts of the solver as a Task under the :osqp domain.

GUI Profiling

To use Intel VTune to profile OSQP with the profiler annotations, run a Hotspot analysis, and ensure the Analyze user tasks, events and counters option is selected. Once run, the OSQP annotations can be seen in the application’s trace in the Platform timeline view, and also in the tooltip when the mouse hovers over an item in the timeline, as shown below.

Sample VTune timeline panel

More exact information about the various timings and all the function calls in each part of the solver can be viewed on the Bottom-up tab of the analysis window, with the Task Type / Function / Call Stack grouping, as shown below.

Sample VTune GUI task list

AMD OmniTrace

OSQP can be profiled using the AMD OmniTrace profiler, where OSQP will use the OmniTrace user API to report sections of the code for timing. To enable the section reporting, use the OSQP_PROFILER_ANNOTATIONS=omnitrace CMake option during configuration.

OmniTrace uses the command line to profile and report the OSQP timings. To build and run an instrumented OSQP executable, the following should be run on Linux (replacing osqp_demo with the desired program):

omnitrace-instrument -l --min-instructions=8 -o -- ./osqp_demo
omnitrace-run --profile -- ./osqp_demo.inst 20 4 100

After running, this creates a new directory containing the output, with a directory for each run inside. The timings for each section of the code can be viewed in the wall_clock.txt file, with a machine-readable timing report in the wall_clock.json file. A sample wall_clock.txt output is shown below.

|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                 REAL-CLOCK TIMER (I.E. WALL-CLOCK TIMER)                                                                 |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                        LABEL                          | COUNT  | DEPTH  |   METRIC   | UNITS  |   SUM    |   MEAN   |   MIN    |   MAX    |   VAR    | STDDEV   | % SELF |
|-------------------------------------------------------|--------|--------|------------|--------|----------|----------|----------|----------|----------|----------|--------|
| |0>>> osqp_demo.inst                                  |      1 |      0 | wall_clock | sec    | 0.001535 | 0.001535 | 0.001535 | 0.001535 | 0.000000 | 0.000000 |   65.2 |
| |0>>> |_Problem setup                                 |      1 |      1 | wall_clock | sec    | 0.000074 | 0.000074 | 0.000074 | 0.000074 | 0.000000 | 0.000000 |   69.2 |
| |0>>>   |_Problem data scaling                        |      1 |      2 | wall_clock | sec    | 0.000008 | 0.000008 | 0.000008 | 0.000008 | 0.000000 | 0.000000 |  100.0 |
| |0>>>   |_Initialize linear system solver             |      1 |      2 | wall_clock | sec    | 0.000015 | 0.000015 | 0.000015 | 0.000015 | 0.000000 | 0.000000 |   85.0 |
| |0>>>     |_Symbolic factorization in direct solver   |      1 |      3 | wall_clock | sec    | 0.000001 | 0.000001 | 0.000001 | 0.000001 | 0.000000 | 0.000000 |  100.0 |
| |0>>>     |_Numeric factorization in direct solver    |      1 |      3 | wall_clock | sec    | 0.000001 | 0.000001 | 0.000001 | 0.000001 | 0.000000 | 0.000000 |  100.0 |
| |0>>> |_Solving optimization problem                  |      1 |      1 | wall_clock | sec    | 0.000460 | 0.000460 | 0.000460 | 0.000460 | 0.000000 | 0.000000 |   16.9 |
| |0>>>   |_ADMM iteration                              |     25 |      2 | wall_clock | sec    | 0.000331 | 0.000013 | 0.000012 | 0.000020 | 0.000000 | 0.000002 |   33.2 |
| |0>>>     |_KKT system solve in ADMM iteration        |     25 |      3 | wall_clock | sec    | 0.000136 | 0.000005 | 0.000005 | 0.000009 | 0.000000 | 0.000001 |   46.3 |
| |0>>>       |_Solve the linear system                 |     25 |      4 | wall_clock | sec    | 0.000073 | 0.000003 | 0.000003 | 0.000004 | 0.000000 | 0.000000 |   80.5 |
| |0>>>         |_Backsolve in direct solver            |     25 |      5 | wall_clock | sec    | 0.000014 | 0.000001 | 0.000001 | 0.000001 | 0.000000 | 0.000000 |  100.0 |
| |0>>>     |_Vector updates in ADMM iteration          |     25 |      3 | wall_clock | sec    | 0.000086 | 0.000003 | 0.000003 | 0.000010 | 0.000000 | 0.000002 |   76.4 |
| |0>>>       |_Projection in ADMM iteration            |     25 |      4 | wall_clock | sec    | 0.000020 | 0.000001 | 0.000000 | 0.000008 | 0.000000 | 0.000001 |  100.0 |
| |0>>>   |_Solution polishing                          |      1 |      2 | wall_clock | sec    | 0.000051 | 0.000051 | 0.000051 | 0.000051 | 0.000000 | 0.000000 |   32.8 |
| |0>>>     |_Initialize linear system solver           |      1 |      3 | wall_clock | sec    | 0.000009 | 0.000009 | 0.000009 | 0.000009 | 0.000000 | 0.000000 |   83.8 |
| |0>>>       |_Symbolic factorization in direct solver |      1 |      4 | wall_clock | sec    | 0.000001 | 0.000001 | 0.000001 | 0.000001 | 0.000000 | 0.000000 |  100.0 |
| |0>>>       |_Numeric factorization in direct solver  |      1 |      4 | wall_clock | sec    | 0.000001 | 0.000001 | 0.000001 | 0.000001 | 0.000000 | 0.000000 |  100.0 |
| |0>>>     |_Solve the linear system                   |      4 |      3 | wall_clock | sec    | 0.000025 | 0.000006 | 0.000003 | 0.000016 | 0.000000 | 0.000006 |   91.1 |

Additionally, OmniTrace outputs a trace file suitable for importing into the Perfetto timeline viewer. Once imported, the section labels appear in the timeline view, as shown below.

Timeline view from Perfetto