What the Hex?

An explanation of Hex notation and its origins

The general opinion about Hex notation seems to be that it's black magic and incomprehensible at best. This is unfortunate from my perspective. I find Hex notation theoretically elegant. Furthermore, I've played quite a lot of AD with Hex notation and I find that it's in fact mostly usable (though there are some minor issues, such as it being hard to tell exactly how far away the next galaxy is, or hard to add numbers between roughly 20 and 1000 precisely). So I'd like Hex notation to be better understood by others. Since just presenting the algorithm directly seems likely to lead to people's eyes glazing over, I'll try to also give some motivation.

The central principle of this explanation is that Hex notation is what you get if you take Logarithm notation to its extreme and try to express small integers well. What does this mean? Well, consider Logarithm notation. If we want to express 42 in usual Logarithm notation (ignoring that sufficiently small numbers, 42 being sufficiently small, aren't formatted by Logarithm notation), it would be e1.62 (to 2 places of precision). The meaning of the e is "10 to the power of". It's called Logarithm notation because 1.62 is the logarithm base 10 of 42. However, for sufficiently big numbers, the full logarithm might be too long. For example, for 10^10^10, if we wrote the full logarithm we'd get e10000000000, but that's a lot of zeros. In fact, Logarithm notation (by default) gives us ee10.000 (the three places of precision are due to weird implementation details that you shouldn't worry about but that will keep me up at night).
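To make this concrete, here's a tiny Python sketch of the two computations above. It's purely illustrative, not AD's actual code:

```python
import math

x = 42
print(f"e{math.log10(x):.2f}")            # e1.62, since 10**1.62 is about 42

# For 10^10^10, the logarithm is 10^10, which is long, so Logarithm
# notation takes the logarithm a second time:
print(f"ee{math.log10(10.0 ** 10):.3f}")  # ee10.000
```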

In the above, we took the logarithm twice. What if we did it more? Why stop at twice? Well, let's go back to 42. Taking the logarithm once (and using 2 places of precision), we get e1.62. Twice, we get ee0.21. Thrice, we get eee-0.68. Four times, we get - oops, we're taking the logarithm of a negative number. Fortunately, Logarithm notation can handle negative numbers; it just expresses them as -e[something]. For example, -42 is expressed as -e1.62. Doing this, we get eee-e-0.17.

We can repeat this as many times as we want. Ten times (still starting with 42) gives us eee-e-e-e-e-e-e-e0.21. Twenty gives us eee-e-e-e-e-e-e-ee-e-e-e-e-e-e-ee-e-0.15. Thirty gives us eee-e-e-e-e-e-e-ee-e-e-e-e-e-e-ee-e-e-e-e-ee-ee-e-e-e0.14. If we started with 42.1 instead, we'd get (doing the logarithm thirty times) eee-e-e-e-e-e-e-ee-e-e-e-e-e-e-ee-e-e-e-e-e-e-e-e-e-e-e0.03, which, unlike what we got for 42, has a minus sign before the twenty-fifth e. Similarly, starting with 41.9, we get eee-e-e-e-e-e-e-ee-e-e-e-e-e-e-ee-e-e-ee-ee-ee-e-e-e0.08, which, unlike what we got for 42, does not have a minus sign before the twenty-third e.
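Here's a short Python sketch of this repeated-logarithm procedure (again my own illustration): write e if the current value is non-negative, write -e and negate if it's negative, then take the logarithm and repeat.

```python
import math

def iterated_log_string(x, times, precision=2):
    """Take log10 repeatedly, writing 'e' when the current value is
    non-negative and '-e' (negate, then take the log) when it's negative."""
    parts = []
    for _ in range(times):
        if x >= 0:
            parts.append("e")
        else:
            parts.append("-e")
            x = -x
        x = math.log10(x)
    return "".join(parts) + f"{x:.{precision}f}"

print(iterated_log_string(42, 4))   # eee-e-0.17
print(iterated_log_string(42, 10))  # eee-e-e-e-e-e-e-e0.21
```

(With many more iterations, floating-point rounding starts to affect the later symbols, which is why I don't print the thirty-iteration strings here.)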

One might wonder whether the minus signs and e's are really all we need if we have enough of them. After all, they allowed us to distinguish 41.9, 42, and 42.1. It turns out that the answer is yes. (Sidenote: This is because 10 is less than e^e, with e here being Euler's number, roughly 2.718. e^e is roughly 15.15. If we used logarithm base 16 instead, certain different numbers would be represented in the same way however many times we took the logarithm.) So we can drop the final number entirely, and just keep the e's and minus signs.

There are now a lot of small issues which can be addressed in various orders. One is what happens when we start with the number 1. The logarithm of 1 is 0, the logarithm of 0 is negative infinity, and negative infinity is negative so (as above with -42) we take its absolute value before computing the logarithm to get infinity. However, we still need to compute the logarithm of infinity. We slightly (but not very) arbitrarily say it's infinity. So 1 is represented as ee-eeeeeeee..., with endless (or however many we use) e's signified by the .... (Sidenote: note that we also didn't negate 0 before taking the logarithm, which means we're considering 0 not to be negative. One could try to think throughout the rest of this explanation about what happens if we consider zero to be negative instead.)
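In code, these conventions amount to two special cases for the logarithm, sketched here (still just my illustration):

```python
import math

def log10_extended(x):
    """log10 with the conventions above: the logarithm of 0 is negative
    infinity, and the logarithm of infinity is (slightly arbitrarily)
    infinity. 0 itself counts as non-negative, so it isn't negated."""
    if x == 0:
        return -math.inf
    if math.isinf(x):
        return math.inf
    return math.log10(x)

# Starting from 1: 1 -> 0 -> -inf -> (negative, so write '-e' and negate)
# inf -> inf -> ..., giving ee-eeeeeeee... as the representation of 1.
```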

Another issue is that though 0 and 1 now have nice representations ending in endless e's, 2, for example, doesn't. The first thirty e's of the representation of 2, with minus signs interspersed, are ee-e-e-e-e-e-e-e-e-e-e-e-e-ee-ee-e-e-e-e-e-e-e-e-e-e-ee. This, at least, doesn't seem to have any pattern. The very naive fix is to use base 2 for the logarithms rather than base 10. Then 2 would be represented as eee-eeeeeee.... However, this fix is very naive, since it fails on 3, which is still (taking the first thirty e's) eee-e-e-e-ee-ee-ee-e-eee-ee-eee-ee-ee-ee-e-ee. One way to get around this is to define the logarithm base 2 slightly differently, namely to make it linear between powers of 2: the logarithm of 2^n for integer n is still n, but the logarithm of a·2^n for a between 1 and 2 is not log2(a) + n, but rather a - 1 + n. This makes the first thirty e's of 3 be eee-ee-eeeeeeeeeeeeeeeeeeeeeeeee instead, and as one might hope the e's at the end continue to infinity.

Note that we could define a locally linearized version of the logarithm base 10, by defining the logarithm of a·10^n to be (a - 1) / 9 + n, but then 2 and 2.1, for example, are represented in exactly the same way. By contrast, one can show that using the locally linearized version of the logarithm base 2, as above, keeps the property that different numbers are represented differently. For reference, using the locally linear logarithm base 2, 42 is eeeee-eee-ee-e-e-ee-eeeeeeeeeeeeeeee, where the e's at the end go on forever without a minus sign.
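The locally linearized base-2 logarithm is easy to compute exactly with Python's math.frexp, which decomposes a float as m * 2**n with 0.5 <= m < 1. A sketch:

```python
import math

def linlog2(x):
    """Locally linearized base-2 logarithm: for x = a * 2**n with
    1 <= a < 2, return (a - 1) + n instead of log2(a) + n. It agrees
    with log2 at powers of 2 and is linear in between."""
    if x == 0:
        return -math.inf
    if math.isinf(x):
        return math.inf
    m, n = math.frexp(x)   # x = m * 2**n with 0.5 <= m < 1, exactly
    return (2.0 * m - 1.0) + (n - 1)

print(linlog2(8))      # 3.0, same as log2(8)
print(linlog2(3))      # 1.5: 3 = 1.5 * 2**1, so (1.5 - 1) + 1
print(math.log2(3))    # 1.584..., the ordinary logarithm for comparison
```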

The remaining issues are all vaguely of the form that the notation UI sucks. Firstly, we could represent -e by a single character to slightly clean things up. Let's arbitrarily pick d. Now 42 is eeeeedeededddedeeeeeeeeeeeeeee. Secondly, one might think that since d signifies negation, a number with d in a position would be less than a number with e in the same position. However, 42.1 is eeeeedeededddddeeddddedededede, with the first difference being a d in position 14 of the representation of 42.1, where the representation of 42 has an e. So the rule of d being less than e doesn't always work. This is ultimately because d signifies -e, and -e is decreasing. That is, for example (with e now meaning "2 to the power of", since we've switched to base 2), -e1 is -2 but -e2 is -4. The greater the number after -e is, the smaller the number represented is (recall that -4 < -2). We can fix this by letting d mean -e- instead of -e. -e- is increasing; for example, -e-1 is -0.5 and -e-2 is -0.25, and just as 1 < 2, -0.5 < -0.25. Now 42 is eeeeedddeedededddddddddddddddd and 42.1 is eeeeedddeedededddededdeeddeedd. The first difference is now in position 18, and it's a d in 42 and an e in 42.1, as we'd hope. In fact, when comparing two numbers, looking for the first different character and saying that the number with a d is lower now always works, as one might hope. In other words, we can sort numbers the same way we sort words in a dictionary.
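Here's a sketch of the d/e string generator under the -e- convention, reusing linlog2 from above. The expected outputs are the strings from the text, though the later characters of 42.1's string are sensitive to floating-point rounding (42's are exact, since every intermediate value is a dyadic rational):

```python
import math

def linlog2(x):
    # Locally linearized base-2 logarithm, as in the previous sketch.
    if x == 0:
        return -math.inf
    if math.isinf(x):
        return math.inf
    m, n = math.frexp(x)
    return (2.0 * m - 1.0) + (n - 1)

def de_string(x, length=30):
    """Write 'e' when the current value is non-negative and 'd' when it's
    negative. d means -e-: x = -e^(-y), so the next value is -linlog2(-x)."""
    out = []
    for _ in range(length):
        if x >= 0:
            out.append("e")
            x = linlog2(x)
        else:
            out.append("d")
            x = -linlog2(-x)
    return "".join(out)

print(de_string(42.0))  # eeeeedddeedededddddddddddddddd
print(de_string(42.1))  # expected: eeeeedddeedededddededdeeddeedd
```

Since d sorts before e as a character, Python's ordinary string comparison now orders these strings the same way as the underlying numbers, which is exactly the dictionary-sorting property described above.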

The notation UI still sucks. One small improvement is to replace d and e by 0 and 1, respectively. Now 42 is 111110001101010000000000000000 and 42.1 is 111110001101010001010011001100. This is slightly better but not by much. However, we can fit four bits (0 or 1) into one hexadecimal digit. We're currently representing numbers by 30 bits, but we can add two more to make it 32, and then use 8 hexadecimal digits to represent those 32 bits. 42 is now F8D40000 and 42.1 is now F8D45333. And that's exactly what they are in Hex notation. Obviously this isn't a coincidence. In fact, the above defines Hex notation.
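Putting it all together, here's a sketch of the whole construction: run the d/e procedure for 32 steps, treat each e as a 1 bit and each d as a 0 bit, and print the 32 bits as 8 hex digits. (This is my reconstruction of the definition above, not AD's actual source; as noted below, edge cases may differ.)

```python
import math

def linlog2(x):
    # Locally linearized base-2 logarithm, as in the previous sketches.
    if x == 0:
        return -math.inf
    if math.isinf(x):
        return math.inf
    m, n = math.frexp(x)
    return (2.0 * m - 1.0) + (n - 1)

def hex_notation(x, bits=32):
    """The first `bits` characters of the d/e string, packed into hex:
    e becomes a 1 bit, d becomes a 0 bit."""
    out = 0
    for _ in range(bits):
        out <<= 1
        if x >= 0:              # 'e': 1 bit
            out |= 1
            x = linlog2(x)
        else:                   # 'd': 0 bit, with d meaning -e-
            x = -linlog2(-x)
    return format(out, f"0{bits // 4}X")

print(hex_notation(42))      # F8D40000
print(hex_notation(42.1))    # F8D45333 (last digits sensitive to rounding)
print(hex_notation(123456))  # FC331900
```

The two extra iterations beyond 30 supply the final two bits of the last hex digit.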

Here's a tool that will hopefully help in understanding the above. Try to get it to output Hex notation (or cheat and press the button that does that). With the proper settings it mostly agrees with AD's Hex notation, though there are some edge-case differences. If you can make 123456 show up as FC331900, you've (almost certainly) gotten it to work as Hex notation.

[Interactive tool: input number; base; linearize logarithm?; negate result if value is negative (that is, use -e- rather than -e)?; output form; output iterations of logarithm; show final number?; precision for final number; output area.]