Inconsistent FPU usage of soft and hard floating point

I have some floating-point operations in my code. The problem is that I think the FPU is not being utilized: the generated assembly mostly calls software floating-point routines, and only some operations use hardware FPU instructions. I am not sure what is going on!

- The MCU is an STM32L476, which I believe has a hardware FPU
- I am on Windows 10 and have the latest updates to SW4STM32 to date.
- I am sure that the compiler and linker settings are set correctly (I can see -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 in compiler and linker flags).
- I am using the STM32CubeMX HAL drivers to set up my project

Consider the example below:

float g = 0.0;
int data = 0;
for (float f = 0.0; f < 100.0; f += 0.1) {
    g = f * 10.01;
    data = g;
}

g = 0;

Which translates to the assembly code below:

08005772: vmov r0, s15
08005776: bl 0x8000498
195 for(float f = 0.0; f < 100.0; f += 0.1) {
0800577a: add r3, pc, #36  ; (adr r3, 0x80057a0)
0800577c: ldrd r2, r3, [r3]
08005780: bl 0x80001dc
08005784: bl 0x80009ac
08005788: vmov s15, r0
0800578c: vldr s14, [pc, #48]  ; 0x80057c0
08005790: vcmpe.f32 s15, s14
08005794: vmrs APSR_nzcv, fpscr
08005798: bmi.n 0x8005772
0800579a: vldr s15, [pc, #40]  ; 0x80057c4
0800579e: b.n 0x800578c
080057a0: ldr r1, [sp, #616]  ; 0x268
080057a2: ldr r1, [sp, #612]  ; 0x264
080057a4: ldr r1, [sp, #612]  ; 0x264
080057a6: subs r7, #185  ; 0xb9
080057a8: movs r4, #0

As you can see, the add and multiply go to software implementations (e.g. __aeabi_dadd, where I expected a vadd), but the comparison for the loop condition is done by the FPU (vcmpe.f32).
The worst part is the conversion on the way to the int cast, which goes through __truncdfsf2 and takes more than 30 instructions, whereas with VCVT it would be a single instruction.

So what is the cause of this inconsistency?

I also have a startup file (generated by SW4STM32, I think) which starts like this:

.syntax unified
.cpu cortex-m4
.fpu softvfp

.global g_pfnVectors
.global Default_Handler

Why is .fpu set to softvfp?

I think I found the problem: I had to cast everything to float explicitly, e.g. (float) blah...

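Try g = f * 10.01F; an unsuffixed literal such as 10.01 has type double, so f * 10.01 is computed in double precision, which a single-precision FPU (fpv4-sp-d16) cannot do in hardware.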
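
To make the suggestion concrete, here is a minimal sketch of the same arithmetic with float-suffixed constants. The function names (size_with_double_literal, size_with_float_literal, scale_to_int) are made up for illustration and are not from the original project. The two sizeof helpers only show which type the compiler picks for the product; the instructions actually emitted still depend on the toolchain and flags.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of the original project. */

/* float * double literal: f is promoted, so the product has type double. */
size_t size_with_double_literal(float f)
{
    return sizeof(f * 10.01);
}

/* float * float literal: the product stays a float. */
size_t size_with_float_literal(float f)
{
    return sizeof(f * 10.01f);
}

/* Same arithmetic as the loop body in the question, kept in single
   precision so a single-precision FPU can handle it without library calls. */
int scale_to_int(float f)
{
    float g = f * 10.01f;   /* float multiply */
    return (int)g;          /* float -> int conversion, truncates toward zero */
}
```

The loop header needs the same treatment: for (float f = 0.0f; f < 100.0f; f += 0.1f). If you build with GCC, -Wdouble-promotion should warn about each implicit float-to-double promotion, and -fsingle-precision-constant treats unsuffixed constants as float, although the latter changes the meaning of every literal in the translation unit, so use it with care.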