The Problem
I was just starting
to set up the procedures needed for rows of buildings when I hit a problem.
These are supposed
to be random heights! I've had a nagging
feeling that my random number generation operators weren't working properly and
had noticed a few artefacts before, but dismissed them as me being overly sensitive
to patterns. This horrible patterning however tipped me over
the edge and I had to investigate.
Visualisation
A great way to
sanity check random numbers is to visualise a sequence as a grid of intensity
values (see Runes blog post again). So I put together a couple of
nested procedures to build a 2D grid of tiles, each coloured black-to-white
according to the random function under scrutiny.
This instantly
revealed unpleasant patterns in the number sequences. Trying different seeds showed that most of them seemed to have regular repeating
pattern or clearly be restricted to lower or upper value ranges.
Bad patterning in the random number generation |
Investigation
My first avenue of
investigation was to reproduce the algorithm explicitly in the test procedure
using the same sequence of operations as the random generation library code.
I hoped that this
would allow me to play with the algorithm to find the bug. However, this version produced very nice
random values, with no discernible patterns.
Carefully picking through the implementation I couldn't see any
functional difference.
Next I added
numerical display of the seed and value for each tile and zooming in on the
start of the first row I noticed that they did agree for a few values, before
diverging. This seemed to coincide with
the seed (handled as a signed integer) going negative. I wasn't sure how this would affect the
algorithm though so I switched to debug build and stepped through the for this
case.
After some debugging
runs and breaking open the algorithms a bit I discovered what was going on.
Cause
My entropy
extraction macro (see Hash part of Random Value Generation Process in the previous article) uses right shift operations >> to re-align the
overlapping bits, but I had forgotten that this will perform a sign extended
shift if the operand is a signed value. That is, if the most significant bit is 1, indicating a negative number, then 1's are shifted in instead of 0's. This situation caused the extraction algorithm to XOR chunks of 1's into
the seed values. XOR has the interesting
(and normally useful) property that successive applications of 1's causes
toggling. This fact combined with shifts
used set up a repeating oscillation in some chunks of bits. It was completely believable that this could produce the
heavily striped and range limited results I was seeing.
Fix
The fix here was
relatively straight forward as the behaviour of >> on unsigned integers always shifts in zeros into
the upper bits, as was actually required.
Interestingly, the reason we didn't see this problem on the explicit
implementation was that I had made sure the shift operation always used the
unsigned shift.
Casting the operands
to unsigned integer prior to each shift fixed this and the two implementations
now agreed, AND also produced the randomness they were designed to generate.
Slight Red Herring
Going back to the
building height problem I discovered that this hadn't fixed it, the result was exactly the same! So, I had managed to find and fix a real
problem, just not the one that was causing my issue. Oops!
The Real Culprit
Checking the
building height calculation I realised that I was relying on the variant
parameterisation of the Random Twist operator to generate random values from
the building index along the row. The
reason it was designed to work like this is that I needed a way to generate
unique seeds for each building from the single
seed provided to the row as a whole. The
only distinguishing information the buildings have at this point where they are
in the row, i.e. their index. The twist
operation seemed to be perfect here as the variant input was intended for
exactly this sort of operation.
Unfortunately this
didn't seem to produce very good results for a sequence of successive variant
values. Adding a visualisation of the
twist operation with the tile index driving the variation input again showed up
this lack of randomness.
In hindsight this is
one of the issues Rune talks about in his article; close or
sequential seeds don't necessarily produce sequences that are very random relative to each other. He suggests a decent
hash of the input to better seed the sequences.
A Better Twist
This time the
patterning wasn't as strong, but it was certainly there. Looking into the Twist implementation I
realised the mixing method of the variant value was far too simplistic and I
was relying on small changes in variant input to produce significant changes to
the output seed and hence number sequence.
This was obviously not the case and I needed a better mixer.
// jump to another point in the
// random stream (parameterised)
int Random::Twist( int input, int variant )
{
return Random::Algorithm::Next_LCG_ANSIC( input^variant );
}
After some research,
I realised that the property I was after is referred to as 'avalanche', i.e.
where small input changes cause large output changes, and was a common
desirable feature of hashing algorithms.
Again, rather than investigate and implement my own solution I chose a
publicly available integer hashing function that claimed very good avalanche characteristics.
Replacing the mixer
generation with this in the Twist operator proved very successful and finally
achieved a good random variants again.
Fixed Buildings
Going back to my row
of buildings again, with the above changes produced much, much more
satisfactory results.
I expect to see
improvements throughout my projects as a result of these fixes, some will be
more subtle than others. I'm glad I took
the time to investigate this as it's resolved a niggle that I couldn't quite
put my finger on.
Epilogue
Procedural
generation is typically required to be completely deterministic so that, for
example, every time you re-visit a game the world is familiar to you, or that
every player of a game sees the same world.
This is an important feature of a game such as No Man's Sky, a vast,
hugely varied procedural universe that is the same for every player.
Fixing the bugs in
my random number generation operators means I have changed the behaviour of
something fundamental to all the generation processes across my projects. This in turn means that all the content they generate will now be
different, the content they generated before has been lost. Forever.
The deterministic
nature of the processes involved in procedural generation, combined with their
complexity means they are hugely susceptible to the butterfly effect, that
small changes to any part (all parts are effectively inputs) can have huge
effects on the output.
I have often
wondered how many times Hello Games has made changes to No Man's Sky that have
completely invalidated the current universe it is set in. How many versions of universe have they gone
through to get to the one we play? I
dread to think how much of a problem this is for ongoing work where they can't
change or even fix bugs in so much of the game for fear of breaking the
universe.
“There is a theory which states that if ever anyone discovers exactly what the universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable."
"There is another theory which states that this has already happened.”
- Douglas Adams, The Restaurant at the End of the Universe
No comments:
Post a Comment