I have a (possible) answer to my own question, although I’d still welcome some discussion and confirmation of my findings.
At the start of section 4.6 of ST document PM0214, “STM32F3 and STM32F4 Series Cortex-M4 Programming Manual”, it states:
“The Cortex-M4F FPU implements the FPv4-SP floating-point extension”
The closest entry to this in the floating-point hardware list is “fpv4-sp-d16”, so I selected it, together with “hard” for the floating-point ABI.
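One way to confirm that these settings actually reached the compiler is to check its predefined macros. This is a minimal sketch. It assumes a GCC- or Clang-based toolchain, which the post does not name. `__arm__`, `__ARM_FP` and `__ARM_PCS_VFP` are predefined by those compilers, and `fp_config` is a hypothetical helper name. With GCC, the IDE choices above would normally map to `-mfpu=fpv4-sp-d16 -mfloat-abi=hard`.

```c
/*
 * Hypothetical helper: reports which floating-point configuration the
 * compiler was invoked with, based on GCC/Clang predefined macros.
 * Assumption: a GCC- or Clang-based ARM toolchain.
 */
const char *fp_config(void)
{
#if defined(__arm__)
#  if defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FPU instructions, FP arguments passed in FPU registers */
    return "hard: FPU instructions, FP arguments in FPU registers";
#  elif defined(__ARM_FP)
    /* -mfloat-abi=softfp: FPU instructions, but soft-float calling convention */
    return "softfp: FPU instructions, soft-float calling convention";
#  else
    /* -mfloat-abi=soft: all FP arithmetic done by library routines */
    return "soft: FP emulated in software";
#  endif
#else
    /* Not compiled for 32-bit ARM (e.g. a desktop test build) */
    return "not a 32-bit ARM target";
#endif
}
```

Printing `fp_config()` once at startup makes it easy to spot a build that quietly fell back to soft float.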
After setting this up, I forced a full rebuild of my application. It built and ran successfully.
I confirmed that hardware FP gives a significant performance improvement over software FP. I wrote a simple test application that performed one FP addition and one FP multiplication inside a loop set to run for exactly 1 second. The hardware-FP build completed a little over 3x as many iterations per second as the software-FP build.
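The benchmark described above can be approximated as follows. This is a host-portable sketch and not the original test. It assumes standard C `clock()` as the time source, whereas on the STM32 itself a hardware timer such as SysTick would more likely be used. `fp_iterations_per_window` is a hypothetical name. The clock is checked only once per batch of 1000 iterations so that timing overhead does not swamp the FP work being measured.

```c
#include <time.h>

/*
 * Hypothetical sketch: counts loop iterations (one float add plus one
 * float multiply each) completed in roughly `seconds` of CPU time.
 * Assumption: clock() stands in for an on-chip timer.
 */
unsigned long fp_iterations_per_window(double seconds)
{
    /* volatile stops the compiler from folding or deleting the FP ops */
    volatile float a = 1.0001f;
    volatile float b = 0.9999f;
    volatile float acc = 0.0f;
    unsigned long count = 0;

    const clock_t start = clock();
    if (start == (clock_t)-1) {
        return 0; /* no usable clock on this platform */
    }
    const clock_t limit = (clock_t)(seconds * CLOCKS_PER_SEC);

    /* Check the clock once per batch to keep timing overhead small */
    while (clock() - start < limit) {
        for (int i = 0; i < 1000; i++) {
            acc = acc + a; /* one FP addition */
            acc = acc * b; /* one FP multiplication */
        }
        count += 1000;
    }
    return count;
}
```

Building the same file once with soft float and once with hard float, then comparing the returned counts, gives the kind of ratio reported above.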