NOW LET US – AI RAG SaaS Studio TP.HCM

Floating point from scratch: Hard Mode


A deep dive into the complexities of floating-point arithmetic, documenting a developer's journey to master the IEEE 754 standard from scratch.

I have a confession to make: floating point scares me.

Half a decade ago I decided that I was going to implement some floating point arithmetic. Back then it seemed approachable enough; after all, floating point numbers are ubiquitous. How hard can it really be? My experience until that point had been that, given enough time and effort spent bashing my brain against a problem, I can generally figure things out.

This is how I faced the most complete technical defeat of my existence. Through this utter annihilation emerged my present fear of floating point.

After half a decade I decided it was time for a rematch, time to face my dragons!

But this time, I would not simply aim for a surface level understanding; this time I would aim to deeply grasp the floating point representation.

When setting out on this crusade, I believed that there were only 3 types of people who truly understood floating point:

  • The people writing the spec
  • The math PhDs working on the floating point representation
  • The people building the floating point hardware

Welcome to round 2!

Chapter 1: Descent into madness#

Looking back on it, one of the main reasons behind my past defeat was that I mistook my ability to use floating point for a marker of understanding, and assumed that this freed me from the need to invest time in studying it, as if I was going to pick it up along the way.

So it’s now time to put the computer aside, and spend 10 days in the company of paper (you remember, the white stuff).

How floating point works#

I am assuming that readers already have some surface level knowledge of what floating point is, so I will spare you the basic intro.

Let me just set a few definitions. In the context of this discussion, normal floating point numbers will be defined as:

$$ (-1)^{S} \times 2^{E-b} \times (1 + T \cdot 2^{1-p}) $$

With the values of (S), (E) and (T) being the values stored in the floating point fields:

  • (S) sign bit
  • (E) biased exponent
  • (T) trailing significand field

The size of these fields, as well as the values of (b) (exponent bias) and (p) (precision) depend on the floating point format.

E.g., for the IEEE 754 single precision format (float32_t) we have:

  • (b = 127)
  • (p = 24)

Resulting in:

$$ (-1)^{S} \times 2^{E-127} \times (1 + T \cdot 2^{-23}) $$

In this discussion we will be calling:

  • sign, the (S) sign bit
  • exponent, the value stored in the biased exponent field (E)
  • significand/mantissa, the value stored in the (T) field

The term mantissa isn’t pedantically correct: since this isn’t a logarithmic representation, it should really be called a significand. But my fellow programmers in the audience will appreciate that, since the sign has already taken the (s) name in our single letter naming of our structure elements, we have no choice but to yield and call this (m) for mantissa. I will be using the terms mantissa and significand interchangeably in this article.
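To make the definitions above concrete, here is a minimal Python sketch that pulls the (S), (E) and (T) fields out of a single precision value and plugs them back into the formula. The helper names `decode_float32` and `normal_value` are mine, chosen for illustration, and the sketch only handles normal numbers.

```python
import struct

BIAS = 127       # b for single precision
PRECISION = 24   # p for single precision

def decode_float32(x: float) -> tuple[int, int, int]:
    """Round x to float32 and split its 32 bits into the (S, E, T) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased
    trailing = bits & 0x7FFFFF       # 23 bits, i.e. p - 1
    return sign, exponent, trailing

def normal_value(sign: int, exponent: int, trailing: int) -> float:
    """Evaluate (-1)^S * 2^(E - b) * (1 + T * 2^(1 - p)) for a normal number."""
    significand = 1 + trailing * 2.0 ** (1 - PRECISION)
    return (-1.0) ** sign * 2.0 ** (exponent - BIAS) * significand

fields = decode_float32(-6.25)
print(fields)                 # (1, 129, 4718592)
print(normal_value(*fields))  # -6.25
```

Here (-6.25 = -1.5625 \times 2^{2}), so the biased exponent is (2 + 127 = 129) and the (T) field holds (0.5625 \times 2^{23}).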

What you never wanted to know#

We are not actually interested in floating point in the abstract, but rather what we commonly refer to as “float” in our programs.

In the world of all the possible floating point types, these are the vanillas, except in this world, everyone also wants vanilla all the time!

This float format is canonized by the IEEE in the IEEE 754 specification. Inside this holy grail is where the expected behavior is outlined in excruciating detail making it possible for users to expect the same behavior for the same floating point operations on different platforms. A cornerstone of making float ~~portable~~.

Also, this is where hell starts!

+0/-0#

Let us commence our descent slowly.

As the most astute readers might have already noticed (kudos) when looking at the representation format, we have a real sign bit. This implies that we actually have 2 representations for zero: (+0.0) and (-0.0).

Now where things get fun is that we have rules around which zero to use. For example, let us consider how we would determine the equality between two floating point numbers, say X == Y?

To do this comparison we would generally re-use the adder, compute X - Y, then check that all of the result’s bits are 0. The problem is that (-0.0) is written with a 1 in its sign bit.

So we have rules around when the result should use (+0.0) or (-0.0), and the subtraction of two equal floating point numbers is one example of such a rule. Under the default round-to-nearest mode:

$$ X - X = +0.0 $$
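Both zeros can be observed from software. The following is a small Python sketch, not a model of the hardware comparator; `float32_bits` is a helper name I made up, and Python floats are double precision, so the values are only packed to single precision for printing.

```python
import math
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit single precision encoding of x as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

pos_zero = 0.0
neg_zero = -0.0

# The two zeros compare equal even though their encodings differ.
print(pos_zero == neg_zero)                                      # True
print(hex(float32_bits(pos_zero)), hex(float32_bits(neg_zero)))  # 0x0 0x80000000

# Subtracting two equal numbers gives +0.0 in the default rounding mode.
x = 1.5
diff = x - x
print(math.copysign(1.0, diff))                                  # 1.0
```

`math.copysign` is used to read the sign because `diff == 0.0` is true for either zero.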

NaN#

NaN stands for Not A Number.

For all of you that thought we were talking about numbers, this is the point at which you start understanding the difference between a number and a representation format.

So let’s start with the fun bit: there are actually different types of NaNs:

  • quiet NaNs (qNaNs), which you would typically encounter from your bad math.
  • signaling NaNs (sNaNs), the ones bad math doesn’t produce, and also the ones that scream at you by signaling an invalid operation exception whenever they appear as operands. Most people won’t encounter these.

So, what do I mean by “bad math”? qNaNs are used to indicate that the result of an arithmetic operation cannot be represented.

Here are a few examples for clarification:

  • (\sqrt{-1.0}) results in a qNaN, as (\sqrt{-1.0} = i), and (i) is an imaginary number that cannot be represented without the use of complex notation.
  • (\frac{0.0}{0.0}) would also result in a qNaN because: what are you doing?
  • (+\infty - \infty) would also result in a qNaN because (\pm\infty) are actually limits, not numbers. And subtracting a limit from another limit (+\infty - \infty) just doesn’t make sense.

Want to know another fun fact about qNaNs?

They are contagious.

Arithmetic operations with a qNaN as an operand will result in a qNaN.

Think about it: what result should you give for an operation whose result can’t be represented?
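The contagion is easy to watch in practice. A short Python sketch follows, with one caveat: Python raises an exception for `math.sqrt(-1.0)` and for `0.0 / 0.0` instead of returning a NaN, so the NaN here is produced with (+\infty - \infty). Python floats are double precision, but the propagation behaves the same way.

```python
import math

inf = float("inf")
nan = inf - inf   # subtracting infinity from infinity produces a NaN

print(math.isnan(nan))                    # True
print(math.isnan(nan + 1.0))              # True, the NaN propagates
print(math.isnan(nan * 0.0))              # True, even multiplying by zero
print(math.isnan(math.sqrt(4.0) * nan))   # True
print(nan == nan)                         # False
```

The last line shows a related quirk that the text above does not cover: a NaN compares unequal to everything, itself included, which is why `math.isnan` is used rather than `==`.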

In memory, NaNs are represented with all the exponent bits set to (1) and with at least one of the significand bits set. You can then differentiate different NaNs based on which significand bit(s) are set, the encoding of which is left to the discretion of the implementer.
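The encoding rule above translates into a two-line check on the raw bits. This is a sketch for single precision only; `is_nan_bits` and `float32_bits` are illustrative names, and the exact bit pattern your platform produces for a NaN may vary, so the snippet checks the rule without assuming a particular payload.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit single precision encoding of x as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def is_nan_bits(bits: int) -> bool:
    """NaN: all 8 exponent bits set and at least one significand bit set."""
    exponent = (bits >> 23) & 0xFF
    trailing = bits & 0x7FFFFF
    return exponent == 0xFF and trailing != 0

print(is_nan_bits(float32_bits(float("nan"))))  # True
print(is_nan_bits(0x7F800001))                  # True, a single significand bit is enough
print(is_nan_bits(0x7F800000))                  # False, significand is all zeros
print(is_nan_bits(float32_bits(1.0)))           # False, exponent is not all ones
```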

Infinities#

So we have already started introducing these with the NaNs, but the floating point representation has room for two infinities: one for (+\infty) and its mirror (-\infty). These are not numbers: infinity is not a number, it’s a limit!

Under IEEE 754, infinities can be used as operands in arithmetic operations, be used as inputs for boolean operations (such as comparisons) and be produced as the result of a calculation.

In memory, infinities have their exponent bits set to all (1)s, and to differentiate them from NaNs their significand bits are all (0)s.
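Here is a small Python sketch of both points: the bit patterns of the two infinities in single precision, and infinities acting as operands, comparison inputs and results. `float32_bits` is again an illustrative helper, and the arithmetic runs on Python's double precision floats.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit single precision encoding of x as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

inf = float("inf")

# Exponent bits all ones, significand bits all zeros; only the sign differs.
print(hex(float32_bits(inf)))    # 0x7f800000
print(hex(float32_bits(-inf)))   # 0xff800000

# Infinities as operands of arithmetic and comparisons.
print(inf + 1.0 == inf)          # True
print(1.0 / inf)                 # 0.0
print(-inf < 0.0 < inf)          # True

# Infinity as the result of a calculation that overflows.
print(1e308 * 10.0)              # inf
```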

Denormal#

Let’s put infinities and NaNs on the side for a minute and get back to talking just about numbers.

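In the introduction I defined a normal floating point number as:

$$ (-1)^{S} \times 2^{E-b} \times (1 + T \cdot 2^{1-p}) $$

A more common way of writing this is:

$$ (-1)^{s} \times 2^{e} \times m $$

Where (s) is the sign, (e = E - b) is the unbiased exponent, and (m) is a number represented by a string of the form (d_0 . d_1 d_2 ... d_{p-1}) that is (p) digits long (with (p) the precision, i.e. the number of bits in the significand field + 1).

For example (1.5) would be written as:

$$ (-1)^{0} \times 2^{0} \times 1.1_2 $$

and (3) as (2 × 1.5):

$$ (-1)^{0} \times 2^{1} \times 1.1_2 $$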

In our normal floating point representation, the (1) in ((1 + T · 2^{1−p})) is our (d_0) and is always set to (d_0 = 1).

Now, the funny thing is that our significand field actually only has (p-1) bits, and (d_0) is actually an inferred bit: we call it the hidden bit.

Seems simple enough? Could something finally be simple about floating point?!

Don’t worry: floating point isn’t going to let you down like this, because we have another category of numbers!

They have an implicit hidden bit set to (d_0 = 0) and are called subnormal numbers (or denormal numbers). Yay 🥳

These are used to encode the smallest representable floating point numbers, and were the most controversial part of the IEEE 754 spec during its elaboration.
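To see how the hidden bit changes the decoding, here is a Python sketch of a single precision decoder that handles both categories. It relies on two details of the IEEE 754 binary formats that the text above does not spell out: subnormals are stored with a biased exponent field of (0), and they are scaled by (2^{1-b}) rather than (2^{0-b}). The function names are mine.

```python
import struct

BIAS = 127       # b for single precision
PRECISION = 24   # p for single precision

def decode_value(bits: int) -> float:
    """Decode a finite float32 bit pattern, normal or subnormal."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    trailing = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity or NaN, not a finite number")
    if exponent == 0:
        hidden, unbiased = 0, 1 - BIAS         # subnormal (or zero): d_0 = 0
    else:
        hidden, unbiased = 1, exponent - BIAS  # normal: d_0 = 1
    significand = hidden + trailing * 2.0 ** (1 - PRECISION)
    return (-1.0) ** sign * 2.0 ** unbiased * significand

def from_bits(bits: int) -> float:
    """Let the platform interpret the same 32 bits, for comparison."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_subnormal = 0x00000001
largest_subnormal = 0x007FFFFF
smallest_normal = 0x00800000

print(decode_value(smallest_subnormal) == 2.0 ** -149)               # True
print(decode_value(smallest_normal) == 2.0 ** -126)                  # True
print(decode_value(largest_subnormal) == from_bits(largest_subnormal))  # True
```

The largest subnormal sits just below the smallest normal number, so the two categories tile the range around zero without a gap.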

They are also a giant pain in the ass to implement, so

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
