System Workbench for STM32

Inconsistent FPU usage of soft and hard floating point

I have some floating point operations in my code. The problem is, I think the FPU is not being utilized... The generated assembly mostly calls soft floating point routines, but for some operations I get hard FPU instructions. I am not sure what is going on!

- The MCU is STM32L476 which I believe has a hard FPU unit
- I am on Windows 10 and have the latest updates to SW4STM32 to date.
- I am sure that the compiler and linker settings are set correctly (I can see -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 in compiler and linker flags).
- I am using the STM32CubeMX HAL drivers to set up my project
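The flags in the list above can be cross-checked from inside the code. The sketch below relies on the ACLE macros `__ARM_FP` and `__ARM_PCS_VFP`, which GCC predefines for ARM targets depending on `-mfpu` and `-mfloat-abi`. The function name `fp_config` is made up for this illustration and is not part of the HAL or of SW4STM32.

```c
/* Describes the floating-point configuration this file was compiled with.
 * __ARM_FP is a bit mask of the precisions supported in hardware
 * (0x04 = single, 0x08 = double); __ARM_PCS_VFP is defined only when
 * the hard-float calling convention (-mfloat-abi=hard) is in use.
 * fp_config is a hypothetical helper name. */
const char *fp_config(void)
{
#if defined(__ARM_FP) && defined(__ARM_PCS_VFP)
#  if (__ARM_FP & 0x08)
    return "hard-float ABI, FPU handles double precision";
#  else
    return "hard-float ABI, FPU handles single precision only";
#  endif
#elif defined(__ARM_FP)
    return "FPU instructions, soft-float calling convention (softfp)";
#elif defined(__arm__)
    return "no FPU instructions (soft float)";
#else
    return "not an ARM target";
#endif
}
```

With `-mfloat-abi=hard -mfpu=fpv4-sp-d16` this should report the single-precision-only case, which is a hint that any double arithmetic in the code will be done by library calls.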

Consider the example below:

float g = 0.0;
int data = 0.0;
for(float f = 0.0; f < 100.0; f += 0.1) {
    g = f * 10.01;
    data = g;
}
g = 0;

Which translates to the assembly code below:

08005772: vmov r0, s15
08005776: bl 0x8000498
195 for(float f = 0.0; f < 100.0; f += 0.1) {
0800577a: add r3, pc, #36  ; (adr r3, 0x80057a0 )
0800577c: ldrd r2, r3, [r3]
08005780: bl 0x80001dc
08005784: bl 0x80009ac
08005788: vmov s15, r0
0800578c: vldr s14, [pc, #48]  ; 0x80057c0
08005790: vcmpe.f32 s15, s14
08005794: vmrs APSR_nzcv, fpscr
08005798: bmi.n 0x8005772
0800579a: vldr s15, [pc, #40]  ; 0x80057c4
0800579e: b.n 0x800578c
080057a0: ldr r1, [sp, #616]  ; 0x268
080057a2: ldr r1, [sp, #612]  ; 0x264
080057a4: ldr r1, [sp, #612]  ; 0x264
080057a6: subs r7, #185  ; 0xb9
080057a8: movs r4, #0

As you can see, the add and the multiply go to software implementations (e.g. __aeabi_dadd where I expected vadd), but the comparison for the loop condition is done by the FPU (vcmpe.f32)...
The worst thing here is the cast from float to int by (_truncdfsf2), which is more than 30 instructions! If it were done by VCVT it would be a single instruction.

So what is the cause of this inconsistency?

I also have a startup file (generated by SW4STM32, I think). It starts like this:

.syntax unified
.cpu cortex-m4
.fpu softvfp

.global g_pfnVectors
.global Default_Handler

Why is .fpu set to softvfp?

I think I found the problem... I had to cast everything to float explicitly: (float) blah...
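One way to check which type an expression is actually computed in, without reading the disassembly, is C11 `_Generic`. This is only a sketch and assumes the toolchain accepts C11; the macro and function names are invented for the example.

```c
/* _Generic picks a branch from the static type of the expression and
 * does not evaluate it. IS_DOUBLE and IS_FLOAT are hypothetical names. */
#define IS_DOUBLE(expr) _Generic((expr), double: 1, default: 0)
#define IS_FLOAT(expr)  _Generic((expr), float: 1, default: 0)

/* 10.01 has no suffix, so it is a double and f is promoted to double. */
int product_is_double(float f)
{
    return IS_DOUBLE(f * 10.01);
}

/* With a cast (or an F suffix) both operands are float, so the
 * product stays float. */
int product_is_float(float f)
{
    return IS_FLOAT(f * (float)10.01) && IS_FLOAT(f * 10.01F);
}
```

Both functions return 1, which matches the mix seen in the disassembly: the multiply and the add in the loop involve double constants, while the comparison ended up as a float compare.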

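Try g = f * 10.01F; an unsuffixed constant like 10.01 is a double, and the FPU selected by -mfpu=fpv4-sp-d16 only handles single precision, so double arithmetic ends up in library calls.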
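Applied to the whole loop from the question, the same change would look roughly like this. It is a sketch, not the original project code; the wrapper function `run_loop_float` is a made-up name so the snippet can be compiled on its own.

```c
/* The loop from the question with every constant given an F suffix,
 * so all arithmetic stays in single precision.
 * run_loop_float is a hypothetical wrapper; it returns the last value
 * stored in data so the result can be inspected. */
int run_loop_float(void)
{
    float g = 0.0F;
    int data = 0;

    for (float f = 0.0F; f < 100.0F; f += 0.1F) {
        g = f * 10.01F;  /* float * float, no promotion to double */
        data = (int)g;   /* float to int conversion */
    }

    g = 0.0F;
    (void)g;             /* silence "set but not used" warnings */
    return data;
}
```

GCC also has a -Wdouble-promotion warning that flags places where a float is implicitly promoted to double, which may help to find the remaining spots in a larger project.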