floating point - Why are denormalized floats so much slower than other floats, from a hardware architecture viewpoint?


Denormals are known to underperform severely, by 100x or so, compared to normals. This causes unexpected software problems.

I'm curious, from a CPU architecture viewpoint, why denormals have to be that much slower. Is the lack of performance intrinsic to their unfortunate representation? Or maybe CPU architects neglect them to reduce hardware cost, under the (mistaken) assumption that denormals don't matter?

In the former case, if denormals are intrinsically hardware-unfriendly, are there known non-IEEE-754 floating-point representations that are also gapless near zero but more convenient for hardware implementation?
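To put a number on the slowdown, here is a minimal benchmark sketch (my own illustration, not code from the question): the same multiply loop is timed once with a normal starting value and once with a subnormal one. On many x86 CPUs the subnormal run is dramatically slower, because every multiply that touches a subnormal takes the slow microcoded path.

    /* Sketch: compare a multiply loop on normal vs. subnormal values.
     * Build without fast-math, e.g.: cc -O2 denormal_bench.c */
    #include <stdio.h>
    #include <time.h>

    static double time_loop(float start, long iters) {
        volatile float x = start;   /* volatile keeps the loop from being optimized away */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i += 2) {
            x *= 0.5f;              /* if x is subnormal, the result stays subnormal */
            x *= 2.0f;              /* restore x so the loop never under/overflows */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void) {
        const long iters = 100 * 1000 * 1000L;
        /* 1.0f is a normal float; 1e-39f is below FLT_MIN (~1.18e-38f), so it is subnormal */
        printf("normal   : %.3f s\n", time_loop(1.0f,   iters));
        printf("subnormal: %.3f s\n", time_loop(1e-39f, iters));
        return 0;
    }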

On x86 systems, the cause of the slowness is that denormal values trigger an FP_ASSIST, which is costly because it switches into a micro-code flow (very much like a fault).

See this example: https://software.intel.com/en-us/forums/intel-performance-bottleneck-analyzer/topic/487262
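As a practical aside (this workaround is not part of the original answer): on x86 the assist is usually avoided by setting the FTZ (flush-to-zero) and DAZ (denormals-are-zero) bits in MXCSR, so SSE/AVX code neither produces nor consumes subnormals, at the cost of IEEE-754 gradual underflow. A minimal sketch:

    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE / _MM_FLUSH_ZERO_ON */
    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE / _MM_DENORMALS_ZERO_ON */

    static void enable_ftz_daz(void) {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         /* subnormal results become +/-0 */
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); /* subnormal inputs are treated as +/-0 */
    }

GCC and Clang set these bits for you at program startup when you build with -ffast-math.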

The reason denormals take this slow path is that the architects decided to optimize the hardware for normal values, speculating that each value is normalized (which is far more common), and did not want to risk the performance of the frequent use case for the sake of rare corner cases. As with any speculation, it holds most of the time, and you pay the penalty when you're wrong. These trade-offs are common in CPU design, since any investment in one case usually adds overhead to the entire system.

In this case, if you were to design a system that tries to optimize for this type of irregular FP value, you would have to either add hardware to detect and record the state of each value after each operation (which would be multiplied by the number of physical FP registers, execution units, RS entries and so on, totaling a significant number of transistors and wires), or alternatively add some mechanism to check the value on read, which would slow down every FP read (even on normal values).

Furthermore, based on the type, you would need to perform a correction or not; on x86 that is the purpose of the assist code. If you did not make the speculation, you would have to perform this flow conditionally on each value, which would add a large chunk of overhead on the common path.
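To make the "detect and correct" point concrete, here is a small illustration (mine, not the answerer's) of what distinguishes the two encodings: a normal float has a nonzero biased exponent and an implicit leading 1, while a subnormal has an all-zero exponent field and no implicit 1, so its significand must effectively be renormalized (leading zeros counted and shifted out) before the ordinary multiply/add datapath can operate on it.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static void dump(const char *label, float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);            /* reinterpret the float's bit pattern */
        unsigned sign = bits >> 31;
        unsigned exp  = (bits >> 23) & 0xFF;       /* biased exponent */
        unsigned frac = bits & 0x7FFFFF;           /* 23-bit fraction */
        printf("%-8s sign=%u exp=%3u frac=0x%06X  (%s)\n",
               label, sign, exp, frac,
               exp == 0 ? "subnormal/zero: no hidden 1" : "normal: hidden 1");
    }

    int main(void) {
        dump("1.0f",   1.0f);    /* exp = 127, hidden leading 1 */
        dump("1e-39f", 1e-39f);  /* exp = 0: value is frac * 2^-149, no hidden 1 */
        return 0;
    }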

