System Workbench for STM32


Hi Rosh,

The performance difference comes from the way semihosting works. Printing on ITM is done just by writing to a few registers, much like writing to a UART (which is what happens when you print through the serial-over-USB link provided by the STLink-V2-1). Semihosting, in contrast, executes a trap instruction with a predefined code (BKPT 0xAB on Cortex-M cores, SWI on older ARM cores). OpenOCD intercepts it, which stops CPU execution, then has to read the CPU registers through the JTAG/SWD interface before restarting CPU execution.

Because of this stop/read/restart sequence on every call, semihosting is inherently slower than writing to a few registers with no debugger interaction.
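
To make the contrast concrete, here is a minimal sketch of the two output paths for a single character. It is an illustration rather than the code any particular library uses: the function names itm_putc and semihost_putc are made up for this example, the ITM and its stimulus port 0 are assumed to be already enabled (by the debugger or by startup code), and the semihosting part is only compiled for a Thumb target built with GCC.

```c
#include <stdint.h>

/* Address of ITM stimulus port 0 in the ARMv7-M memory map (Cortex-M3/M4). */
#define ITM_STIM0_ADDR 0xE0000000u

/* ITM path: poll the port, then do one register write.
 * Assumption: tracing and stimulus port 0 are already enabled. */
void itm_putc(volatile uint32_t *stim_port, char c)
{
    /* Reading the port returns non-zero when its FIFO can accept data. */
    while (*stim_port == 0u) {
        /* Busy-wait; the core keeps running, the debugger is not involved. */
    }
    /* An 8-bit write emits a single byte on the trace output. */
    *(volatile uint8_t *)stim_port = (uint8_t)c;
}

#if defined(__arm__) && defined(__thumb__)
/* Semihosting path: SYS_WRITEC (operation 0x03), r1 points to the character.
 * The BKPT halts the core until the debug host has serviced the request. */
void semihost_putc(char c)
{
    register uint32_t op __asm__("r0") = 0x03u;
    register const char *arg __asm__("r1") = &c;
    __asm__ volatile ("bkpt 0xAB" : "+r"(op) : "r"(arg) : "memory");
}
#endif

/* On target, the ITM variant would be called as:
 *   itm_putc((volatile uint32_t *)ITM_STIM0_ADDR, 'A');
 */
```

The ITM function completes with a poll and one store, while each semihosting call halts the core and waits for a round trip through the debug probe, which is where the time goes.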

Bernard (Ac6)