SW4STM32 and SW4Linux fully support the STM32MP1 asymmetric multicore Cortex-A7+M4 MPUs

With System Workbench for Linux, building and maintaining Embedded Linux on ST's STM32MP1 family of MPUs has never been simpler, even for newcomers to the Linux world.
And if you install System Workbench for Linux in System Workbench for STM32, you can seamlessly develop and debug asymmetric applications that run partly on Linux and partly on the Cortex-M4.
You can get more information from the ac6-tools website and download two short videos (registration required) highlighting:

System Workbench for STM32

"printf("line: %d \n", __LINE__);" does not work on System Workbench

I’ve tried using semihosting in the past and ran into similar problems: it crashes the target if no debugger is connected, and its I/O blocks and is SLOW.

My personal preference is to use a free USART channel for debug I/O. The trick is that you need a decent USART I/O library (preferably one that supports nonblocking, buffered operation), and you need to know how to create and set up (at a minimum) the _read() and _write() functions that redirect stdin/stdout/stderr I/O to the USART.
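
To make that concrete, here is a rough sketch of what the _read()/_write() retargeting could look like. It assumes a newlib-based GNU ARM toolchain, where printf() and friends end up calling _write() and _read(). The ring buffers and the two usart_isr_* hooks are hypothetical names made up for this example; they stand in for whatever your USART library provides, and the actual register access in the interrupt handler is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u  /* must be a power of two so the index mask works */

/* Simple single-producer/single-consumer byte ring (hypothetical helper). */
typedef struct {
    volatile uint8_t  data[RING_SIZE];
    volatile uint32_t head;  /* count of bytes ever written */
    volatile uint32_t tail;  /* count of bytes ever read */
} ring_t;

static ring_t tx_ring;  /* filled by _write(), drained by the USART ISR */
static ring_t rx_ring;  /* filled by the USART ISR, drained by _read() */

/* Store one byte; returns 1 on success, 0 if the ring is full. */
static int ring_put(ring_t *r, uint8_t byte)
{
    if (r->head - r->tail == RING_SIZE) {
        return 0;
    }
    r->data[r->head & (RING_SIZE - 1u)] = byte;
    r->head++;
    return 1;
}

/* Fetch one byte; returns 1 on success, 0 if the ring is empty. */
static int ring_get(ring_t *r, uint8_t *byte)
{
    if (r->head == r->tail) {
        return 0;
    }
    *byte = r->data[r->tail & (RING_SIZE - 1u)];
    r->tail++;
    return 1;
}

/* Hypothetical ISR-side hooks: call these from your USART interrupt
 * handler. The first hands the ISR the next byte to transmit, the
 * second stores a byte the ISR has just received. */
int usart_isr_next_tx_byte(uint8_t *byte)
{
    return ring_get(&tx_ring, byte);
}

void usart_isr_rx_byte(uint8_t byte)
{
    (void)ring_put(&rx_ring, byte);  /* byte is dropped if the ring is full */
}

/* Called by newlib for stdout/stderr. Never blocks: bytes that do not
 * fit in the TX ring are silently dropped (a design choice for debug
 * output, not a requirement). */
int _write(int fd, char *ptr, int len)
{
    (void)fd;
    for (int i = 0; i < len; i++) {
        (void)ring_put(&tx_ring, (uint8_t)ptr[i]);
    }
    /* Real firmware would enable the USART "TX empty" interrupt here. */
    return len;
}

/* Called by newlib for stdin. Never blocks: returns the bytes that are
 * available, or -1 with errno set to EAGAIN if there are none. */
int _read(int fd, char *ptr, int len)
{
    (void)fd;
    int n = 0;
    uint8_t byte;
    while (n < len && ring_get(&rx_ring, &byte)) {
        ptr[n++] = (char)byte;
    }
    if (n == 0) {
        errno = EAGAIN;
        return -1;
    }
    return n;
}
```

With something like this linked in, printf() output lands in the TX ring and goes out as the ISR drains it, so it works with or without a debugger attached. Depending on how stdout is buffered in your build, you may also need fflush(stdout) or setvbuf(stdout, NULL, _IONBF, 0) before output shows up. If you would rather have a blocking stdin, _read() could spin until at least one byte arrives instead of returning EAGAIN.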