Geometry clipmaps: simple terrain rendering with level of detail

26 Dec 2017

Geometry clipmaps are a neat rendering technique for drawing huge terrains in real-time. They were first published in 2004, and a practical GPU implementation was described shortly after.

Those articles don't really explain everything you need to know. In particular, there are some unexplained details that seem to add unnecessary complexity but are actually crucial. The main motivation for writing this blog post is that I tried to simplify those details away and ran directly into the problems they solve, which was annoying and took a lot of time.

This blog post by itself is not sufficient for you to write a complete implementation of geometry clipmaps. You should probably take a look at the paper and the GPU Gems article, and spend some time drawing to convince yourself that what I've written is correct.

Overview

The idea behind geometry clipmaps is that you upload a mesh that's more finely tessellated in the middle than around the edges, draw the mesh centred on the camera, and then move the vertices of the mesh to the right height on the terrain. So you end up with more detail next to the camera, and less detail where it's so far away you couldn't see it anyway.

The mesh we submit to the GPU might look like this:

And then we set the Z coordinates in the vertex shader so it looks like this:

There are tons of other LOD techniques that achieve the same thing, but clipmaps stand out for their simplicity. You don't need to run complex decimation algorithms, you don't need to worry about stitching arbitrary meshes together, you don't need to select from discrete LOD levels at runtime, it's easy to tune the quality for different quality settings, and you don't need to send tons of data to the GPU every frame.

High level implementation details

In the original paper, they divide the terrain into a few different kinds of meshes and reuse those to draw the complete terrain. You can do it with a single mesh, but I'm not going to cover that. (see this post)

Let's start by looking at a top down view of the flat terrain geometry, with each mesh coloured according to its type:

You can see that there are several rings, or levels. Each ring has the same number of vertices as the ring inside it but is twice as large, so effectively the resolution halves with each level.

You can also see that each level mainly consists of a 4x4 grid of square tiles (the blue ones). Obviously you don't draw the inner 2x2 squares except for the innermost level.

Each level also has filler meshes that effectively split the level into a 2x2 grid of 2x2 squares (the red cross that gets fatter as you move outwards), and an L shaped trim mesh that separates each level (the green meshes). I will explain why these are required in a minute!

To draw the terrain, you pretty much centre the rings on the camera and put all the pieces in the right place and that's it. I will go into more detail on this later, but for now there is one important thing to note.

The position of each mesh needs to be snapped to be a multiple of its resolution. So if a mesh has a vertex every two units, it needs to be snapped to positions that are a multiple of two. If you don't do this, vertices move up and down as they swim over the terrain and the terrain looks like it's shimmering or waving which looks terrible.

With snapping, vertices can be added and removed as the level of detail changes, but they never move around, which is a lot less noticeable.
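The snapping rule can be sketched in a few lines. This is a minimal one-dimensional illustration rather than production code: snap_to_level is a hypothetical helper, and it assumes level 0 has a vertex every unit and each level outwards doubles the vertex spacing.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: snap a camera coordinate to the vertex grid of a
// clipmap level. Assumes level 0 has one vertex per unit and every level
// outwards doubles the vertex spacing.
float snap_to_level( float camera, unsigned level ) {
	assert( level < 32 ); // shifting a 32-bit value by 32 or more is undefined
	float scale = float( 1u << level );
	// floor rather than truncation, so negative coordinates snap the same way
	return std::floor( camera / scale ) * scale;
}
```

With the camera at x = 13.7, levels 0 to 3 snap to 13, 12, 12 and 8. Moving the camera to 14.2 moves levels 0 and 1 to 14 but leaves levels 2 and 3 where they were, so vertices at the coarser levels don't move at all.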

One consequence of this is that clipmap levels move half as fast as their inside neighbour. To prevent tiles from overlapping, you need to add a gap so each ring can encircle not only the 4x4 grid of square meshes of the level inside, but also some extra padding for the inner level to move around while the outer level remains stationary. This is why you need the filler and trim meshes! This video should show you what I mean:

I'm not drawing the trim so you can more clearly see what's going on. This video should demonstrate why we need the red filler meshes. Each side of the ring needs to be padded by one unit so there's enough room for the inner level to move around without any overlaps. (try drawing it if you don't believe me)

In the original paper the filler meshes are two squares wide and the trim meshes twice as big but still only one square wide. I have one square wide fillers and narrow trim, which is the one difference between my implementation and the paper that actually ended up working. It makes positioning trim meshes ever so slightly more difficult but that's the only difference I can think of.

That's pretty much everything. In pseudocode, rendering looks something like this:

for( int l = 0; l < levels; l++ ) {
	v2 snapped = compute snapped camera position;

	for( int x = 0; x < 4; x++ ) {
		for( int y = 0; y < 4; y++ ) {
			if( innermost level or not in the middle 2x2 ) {
				draw a square tile;
			}
		}
	}

	draw this level's fillers;
	draw this level's trim;
}

and below I will fill in the blanks.

Low level implementation details

Generating the meshes

Generating meshes for each piece is pretty dry and way too much code to dump on my blog (it's like 350 lines), so for now I'm just going to point you at the repository. (let me know if that link breaks)

There are a few non-obvious things to do at this point that make rendering simpler later on.

You should generate a single mesh that contains all four filler pieces for a level. If you generate one filler piece and rotate it into the four positions, you get triangulation flips at the 90/270 degree rotations, and that causes triangles to flip as you move around which is pretty noticeable.

On the other hand, you can generate a single trim mesh and rotate that. You still get triangulation flips (in fact, you can see the flipped triangles in the top down image. Look at the vertical green strip on the left), but they don't seem to produce any noticeable artifacts.

You need to generate a cross shaped filler mesh that gets centred on the camera. It needs to be its own mesh because the arms are not separated from each other and you need the extra quad in the middle so there's no hole.

Actually rendering the meshes can be made quite simple depending on how you place them in object space. If we say the square tiles have a side length of TILE_RESOLUTION, they should be placed so their bottom left vertex is at (0, 0) and their top right vertex is at (TILE_RESOLUTION, TILE_RESOLUTION), which gives TILE_RESOLUTION + 1 vertices along each side.
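As a rough illustration of that layout, here is a minimal sketch of a square tile generator. It is not the code from the repository: Vec2, TileMesh and generate_tile are hypothetical names, and it assumes n counts the quads along each side so that vertices sit on integer coordinates starting at (0, 0).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

struct TileMesh {
	std::vector< Vec2 > vertices;
	std::vector< uint32_t > indices;
};

// Hypothetical generator for one square tile. n is the number of quads along
// each side, so there are n + 1 vertices along each side and the tile covers
// (0, 0) to (n, n) in object space.
TileMesh generate_tile( uint32_t n ) {
	assert( n > 0 );
	TileMesh mesh;
	uint32_t verts_per_side = n + 1;

	// vertices in rows, starting from the bottom left
	for( uint32_t y = 0; y < verts_per_side; y++ ) {
		for( uint32_t x = 0; x < verts_per_side; x++ ) {
			mesh.vertices.push_back( { float( x ), float( y ) } );
		}
	}

	for( uint32_t y = 0; y < n; y++ ) {
		for( uint32_t x = 0; x < n; x++ ) {
			uint32_t bl = y * verts_per_side + x;
			uint32_t br = bl + 1;
			uint32_t tl = bl + verts_per_side;
			uint32_t tr = tl + 1;
			// two counter-clockwise triangles per quad
			mesh.indices.insert( mesh.indices.end(), { bl, br, tr, bl, tr, tl } );
		}
	}

	return mesh;
}
```

Every quad uses the same diagonal here, which is the simplest choice and not necessarily the best looking one.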

Assuming you made them one unit wide too, the filler mesh and filler cross should have the bottom left of the centre quad (the normal filler mesh doesn't actually meet in the middle but imagine it does) at (0, 0) so you only need to snap and scale them into place. If you made them two units wide they should be centred on (0, 0).

The trim mesh should be positioned so that all you need to do is rotate it to put it in the right place for rendering. I start with an L shape with the bottom left vertex at the origin, then translate it down and left by TILE_RESOLUTION / 2 + 0.5 units. You'll probably want to draw this one to convince yourself that's correct, and I expect the +0.5 is only correct if your fillers are one unit wide.

Seams

If you look again at the top down image you will notice there are T-junctions at the boundaries between clipmap levels, and T-junctions mean cracks in the terrain. I've set the background colour to red in the above image to make them stand out.

We aren't totally spared from having to deal with seams, but fortunately they are pretty simple. If we draw the clipmap levels slightly pulled away from each other we can draw the triangles that we need for the seam geometry. The black lines are tile borders, the grey lines are triangle borders, and the red lines are the seam triangles.

But drawing them separately like this is actually a bit misleading. There is no gap, and the vertices at the coarser clipmap level exactly match some of the vertices at the finer level. I've drawn green lines between vertices that share the same position, and I've drawn the triangles we actually need in red. It's around one third as many, and we don't need to do anything special at the corners.

I've not drawn a complete level, but this works so long as the length of a full clipmap side is even. And they will be even, because we have four square tiles, a one wide filler tile, and a one wide trim tile. So 4x + 2, which is even. (try drawing all of it if you don't believe me)

When you generate the seam mesh you should put the bottom left corner at (0, 0) in object space.
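Here is a minimal sketch of one way to build such a seam. It is an illustration and not the code from the repository: SeamVertex, SeamMesh and generate_seam are hypothetical names, and it assumes each seam triangle joins two neighbouring coarse vertices with the fine vertex that sits between them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SeamVertex { float x, y; };

struct SeamMesh {
	std::vector< SeamVertex > vertices;
	std::vector< uint32_t > indices;
};

// Hypothetical seam generator. side is the length of one full side of the
// finer level measured in that level's units, and must be even so that the
// corners land on vertices shared with the coarser level.
SeamMesh generate_seam( uint32_t side ) {
	assert( side > 0 && side % 2 == 0 );
	SeamMesh mesh;
	float s = float( side );

	// walk the perimeter starting at the bottom left corner (0, 0):
	// along the bottom, up the right, back along the top, down the left
	for( uint32_t i = 0; i < side; i++ ) mesh.vertices.push_back( { float( i ), 0.0f } );
	for( uint32_t i = 0; i < side; i++ ) mesh.vertices.push_back( { s, float( i ) } );
	for( uint32_t i = 0; i < side; i++ ) mesh.vertices.push_back( { s - float( i ), s } );
	for( uint32_t i = 0; i < side; i++ ) mesh.vertices.push_back( { 0.0f, s - float( i ) } );

	// even vertices exist at both levels, odd vertices only at the finer one.
	// each triangle spans one coarse edge with its apex on the fine vertex in
	// between, and the last triangle wraps around to vertex 0
	uint32_t count = uint32_t( mesh.vertices.size() );
	for( uint32_t i = 0; i < count; i += 2 ) {
		mesh.indices.push_back( i );
		mesh.indices.push_back( i + 1 );
		mesh.indices.push_back( ( i + 2 ) % count );
	}

	return mesh;
}
```

With the 4x + 2 side length from above you would call it as generate_seam( 4 * TILE_RESOLUTION + 2 ). The triangles have zero area when viewed from directly above, and only open up once the vertex shader moves the vertices to their heights.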

Rendering the mesh pieces

Since each level as you move outwards is twice the size of the previous level, we can compute the scale of each level as float scale = 1 << level;. Then we want to snap the camera position to be some multiple of scale, which can be done like v2 snapped_pos = floor( camera_pos / scale ) * scale;.

Placing the tiles is nice and easy. You find the bottom left corner of the bottom left tile and place each tile relative to that. Don't forget about the fillers though!

There's nothing specific to D3D/GL here, but you do need to understand my rendering API. renderer_uniforms appends all of its arguments to a big uniform buffer, and returns an offset into the buffer and the size of the data. renderer_draw_mesh enqueues a draw call, and draw calls include the offsets and sizes of the uniform data they need. At the end of a frame the entire constant buffer gets copied to the GPU then the draw calls get submitted. I've written more about it in this old post if that doesn't make sense.

(checked_cast is a cast that asserts it didn't trash anything)

// this should already be filled in
struct {
	Mesh tile;
	Mesh filler;
	Mesh trim;
	Mesh cross;
	Mesh seam;
	Texture heightmap;
} clipmap;

// RenderState represents the GPU state for a draw call. this gets
// reused for brevity since some of the parameters don't change
RenderState render_state;
render_state.textures[ 0 ] = clipmap.heightmap;
render_state.uniforms[ UNIFORMS_VIEW ] = renderer_uniforms( V, P, camera_pos );

for( u32 l = 0; l < NUM_CLIPMAP_LEVELS; l++ ) {
	// scale is the unit size for this clipmap level
	// tile_size is the size of a full tile mesh
	// snapped_pos is the camera position snapped to this level's resolution
	// base is the bottom left corner of the bottom left tile
	float scale = checked_cast< float >( u32( 1 ) << l );
	v2 snapped_pos = floor( camera_pos / scale ) * scale;

	// draw tiles
	v2 tile_size = v2( checked_cast< float >( TILE_RESOLUTION << l ) );
	v2 base = snapped_pos - tile_size * 2;

	for( int x = 0; x < 4; x++ ) {
		for( int y = 0; y < 4; y++ ) {
			// draw a 4x4 set of tiles. cut out the middle 2x2 unless we're at the finest level
			if( l != 0 && ( x == 1 || x == 2 ) && ( y == 1 || y == 2 ) )
				continue;

			// add space for the filler meshes
			v2 fill = v2( x >= 2 ? 1 : 0, y >= 2 ? 1 : 0 ) * scale;
			v2 tile_bl = base + v2( x, y ) * tile_size + fill;

			render_state.uniforms[ UNIFORMS_MODEL ] = renderer_uniforms( m4_identity() );
			render_state.uniforms[ UNIFORMS_CLIPMAP ] = renderer_uniforms( tile_bl, scale );
			renderer_draw_mesh( clipmap.tile, render_state );
		}
	}
}

Next up are the filler meshes, which are also nice and easy:

// draw filler cross
{
	v2 snapped_pos = floor( camera_pos.xy() );
	render_state.uniforms[ UNIFORMS_MODEL ] = renderer_uniforms( m4_identity() );
	render_state.uniforms[ UNIFORMS_CLIPMAP ] = renderer_uniforms( snapped_pos, 1.0f );
	renderer_draw_mesh( clipmap.cross, render_state );
}

for( u32 l = 0; l < NUM_CLIPMAP_LEVELS; l++ ) {
	float scale = checked_cast< float >( u32( 1 ) << l );
	v2 snapped_pos = floor( camera_pos / scale ) * scale;

	[draw tiles]

	// draw filler
	{
		render_state.uniforms[ UNIFORMS_MODEL ] = renderer_uniforms( m4_identity() );
		render_state.uniforms[ UNIFORMS_CLIPMAP ] = renderer_uniforms( snapped_pos, scale );
		renderer_draw_mesh( clipmap.filler, render_state );
	}
}

Seams are tougher. If you remember we pad each level with a trim mesh so it fits inside the outer level, and the seam has to go around the trim too, so we need to snap the seam mesh to the outer level's resolution. The code for that looks like this:

[draw filler cross]
for( u32 l = 0; l < NUM_CLIPMAP_LEVELS; l++ ) {
	[draw tiles]
	[draw filler]

	// no need to draw a seam around the outermost clipmap level
	if( l != NUM_CLIPMAP_LEVELS - 1 ) {
		float next_scale = scale * 2.0f;
		v2 next_snapped_pos = floor( camera_pos / next_scale ) * next_scale;

		// draw seam
		{
			v2 next_base = next_snapped_pos - v2( checked_cast< float >( TILE_RESOLUTION << ( l + 1 ) ) );

			render_state.uniforms[ UNIFORMS_MODEL ] = renderer_uniforms( m4_identity() );
			render_state.uniforms[ UNIFORMS_CLIPMAP ] = renderer_uniforms( next_base, scale );
			renderer_draw_mesh( clipmap.seam, render_state );
		}
	}
}

Finally we have the trim meshes. We need to rotate them into place which is a little more complicated than what we've seen so far.

There's a neat little trick though. Let's start with the L in the bottom left (like a normal L). Then take two bits, flipping the first bit flips the mesh horizontally, and flipping the other bit flips the mesh vertically. If we draw it that looks like this:

From that we can see that they are all equivalent to rotations about the Z axis:

And the two bits can be interpreted as decimal 0 to 3. So when rendering we can figure out which flips we need, and use those bits to index into an array of rotations.

To decide which bits to set, we need to figure out where the current clipmap level is placed relative to the outer level. If the current level is in the bottom left of the hole, the trim needs to go in the top right and we set both bits. If the current level is in the top right, the trim needs to go in the bottom left and we use 00. And so on.

The logic to figure out which bits to set is a bit tricky. I do it by looking at the difference between the current level's snapped camera position and the next outer level's snapped camera position, measured in the current level's units. If there's less than one unit difference between the two in both the x and y axes, the tile will be placed in the bottom left, the trim should be placed in the top right, and both bits get set. If there's a difference of one unit or more in an axis, we leave the bit for that axis clear.

All of that looks like this:

StaticArray< UniformBinding, 4 > rotation_uniforms;
rotation_uniforms[ 0 ] = renderer_uniforms( m4_identity() );
rotation_uniforms[ 1 ] = renderer_uniforms( m4_rotz270() );
rotation_uniforms[ 2 ] = renderer_uniforms( m4_rotz90() );
rotation_uniforms[ 3 ] = renderer_uniforms( m4_rotz180() );

[draw filler cross]
for( u32 l = 0; l < NUM_CLIPMAP_LEVELS; l++ ) {
	[draw tiles]
	[draw filler]

	if( l != NUM_CLIPMAP_LEVELS - 1 ) {
		float next_scale = scale * 2.0f;
		v2 next_snapped_pos = floor( camera_pos / next_scale ) * next_scale;

		[draw seam]

		// draw trim
		{
			// +0.5 because the mesh is offset by half a unit to make rotations simpler
			// and we want it to lie on the grid when we draw it
			v2 tile_centre = snapped_pos + v2( scale * 0.5f );

			v2 d = camera_pos - next_snapped_pos;
			u32 r = 0;
			r |= d.x >= scale ? 0 : 2;
			r |= d.y >= scale ? 0 : 1;

			render_state.uniforms[ UNIFORMS_MODEL ] = rotation_uniforms[ r ];
			render_state.uniforms[ UNIFORMS_CLIPMAP ] = renderer_uniforms( tile_centre, scale );
			renderer_draw_mesh( clipmap.trim, render_state );
		}
	}
}

The rotations should be exact, like m4_rotz270 should return a hardcoded matrix of zeroes and ones rather than calling some generic rotation function. I suspect it may be possible to end up with cracks in the terrain if you go through a rotation function, and it's easy to hardcode it and be sure so why not.
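To show what hardcoded means here, this is a minimal sketch of the three rotations. Mat4 and Vec4 are hypothetical stand-in types, and the sketch assumes row-major storage, column vectors and counter-clockwise rotations about the Z axis. The function names mirror the ones used above, but these are illustrative definitions and not necessarily how the real ones are written.

```cpp
// Hypothetical 4x4 matrix type: row-major, multiplies column vectors.
struct Mat4 { float m[ 4 ][ 4 ]; };
struct Vec4 { float x, y, z, w; };

Vec4 operator*( const Mat4 & a, const Vec4 & v ) {
	return {
		a.m[ 0 ][ 0 ] * v.x + a.m[ 0 ][ 1 ] * v.y + a.m[ 0 ][ 2 ] * v.z + a.m[ 0 ][ 3 ] * v.w,
		a.m[ 1 ][ 0 ] * v.x + a.m[ 1 ][ 1 ] * v.y + a.m[ 1 ][ 2 ] * v.z + a.m[ 1 ][ 3 ] * v.w,
		a.m[ 2 ][ 0 ] * v.x + a.m[ 2 ][ 1 ] * v.y + a.m[ 2 ][ 2 ] * v.z + a.m[ 2 ][ 3 ] * v.w,
		a.m[ 3 ][ 0 ] * v.x + a.m[ 3 ][ 1 ] * v.y + a.m[ 3 ][ 2 ] * v.z + a.m[ 3 ][ 3 ] * v.w,
	};
}

// counter-clockwise rotations about the Z axis, written out by hand so every
// entry is exactly 0, 1 or -1 and no sin/cos rounding can creep in

// ( x, y ) -> ( -y, x )
Mat4 m4_rotz90() {
	return { {
		{ 0.0f, -1.0f, 0.0f, 0.0f },
		{ 1.0f, 0.0f, 0.0f, 0.0f },
		{ 0.0f, 0.0f, 1.0f, 0.0f },
		{ 0.0f, 0.0f, 0.0f, 1.0f },
	} };
}

// ( x, y ) -> ( -x, -y )
Mat4 m4_rotz180() {
	return { {
		{ -1.0f, 0.0f, 0.0f, 0.0f },
		{ 0.0f, -1.0f, 0.0f, 0.0f },
		{ 0.0f, 0.0f, 1.0f, 0.0f },
		{ 0.0f, 0.0f, 0.0f, 1.0f },
	} };
}

// ( x, y ) -> ( y, -x )
Mat4 m4_rotz270() {
	return { {
		{ 0.0f, 1.0f, 0.0f, 0.0f },
		{ -1.0f, 0.0f, 0.0f, 0.0f },
		{ 0.0f, 0.0f, 1.0f, 0.0f },
		{ 0.0f, 0.0f, 0.0f, 1.0f },
	} };
}
```

Under those assumptions a trim corner at (-1, -1) ends up at (1, -1), (1, 1) and (-1, 1) for the 90, 180 and 270 degree rotations, which are the bottom right, top right and top left.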

Shading

The vertex shader is pretty simple. It transforms the mesh position from object space to world space (it's a bit convoluted, it could all be done with a single matrix multiply), samples the heightmap using the world space xy coordinates, then finishes transforming the mesh into clip space.

struct VSOut {
	vec4 view_position;
	vec3 world_position;
	vec2 uv;
};

uniform sampler2D heightmap;

in vec3 position;
out VSOut v2f;

void main() {
	vec2 xy = offset + ( M * vec4( position, 1.0 ) ).xy * scale;

	// +0.5 so we sample from the centre of the texel
	// it's not relevant in the vertex shader but it does affect the fragment shader
	vec2 uv = ( xy + 0.5 ) / textureSize( heightmap, 0 );

	// heightmap is BC5, see next section
	vec2 height_sample = texelFetch( heightmap, ivec2( xy ), 0 ).rg;
	float z = 256.0 * height_sample.r + height_sample.g;

	v2f.view_position = V * vec4( xy, z, 1.0 );
	v2f.world_position = vec3( xy, z );
	v2f.uv = uv;
	gl_Position = P * v2f.view_position;
}

Then the fragment shader looks something like this:

uniform sampler2D normalmap;

in VSOut v2f;
out vec4 screen_colour;

void main() {
	// decode BC5 normal
	vec2 normal_xy = texture( normalmap, v2f.uv ).xy * 2.0 - 1.0;
	float normal_z = sqrt( 1.0 - normal_xy.x * normal_xy.x - normal_xy.y * normal_xy.y );
	vec3 normal = vec3( normal_xy.x, normal_xy.y, normal_z );

	// do all your normal shading

	screen_colour = whatever;
}

The only subtlety here is that if you have a normalmap etc you should sample it in the fragment shader and not in the vertex shader. If you do it in the vertex shader you lose lots of detail and it looks horrible.

The only difference between those two islands is the left island samples the normalmap in the vertex shader, and the right island does it in the fragment shader. The geometry is exactly the same!

Storing the heightmap efficiently

Obviously the answer is "as an image" but it's a bit more subtle than that.

We need 16 bits of precision for the heightmap because 8 bits looks blocky and bad, and we would really like it to be in a GPU compressed texture format because it helps with performance and VRAM usage.

BC5 has a pair of (roughly) 8-bit channels, so we can store h / 256 in one and h % 256 in the other to get 16 bits of precision. I didn't do any scientific tests but I did play around with swapping between lossless and BC5 terrains and it seemed fine so I stuck with it.
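The split itself is simple enough to sketch on the CPU side. HeightTexel, split_height and join_height are hypothetical names, and this only shows the packing before compression, so it says nothing about the error BC5 introduces afterwards.

```cpp
#include <cstdint>

// the two 8-bit channels that get fed to the BC5 encoder
struct HeightTexel { uint8_t hi, lo; };

// hi holds h / 256 and lo holds h % 256
HeightTexel split_height( uint16_t h ) {
	return { uint8_t( h / 256 ), uint8_t( h % 256 ) };
}

// inverse of split_height
uint16_t join_height( HeightTexel t ) {
	return uint16_t( t.hi * 256 + t.lo );
}
```

Without compression the round trip is lossless for every 16-bit height. With BC5 in between, an error in the hi channel is worth 256 times as much as the same error in the lo channel.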

For a 4k terrain it ends up using 4k x 4k x 1 byte per pixel memory, so 16MB of VRAM for the heightmap. You'll probably want a few more channels than that, probably a normal map, maybe a horizonmap and AO map for the lighting. The normalmap and horizonmap are BC5 too, the AO map can be BC4 which makes it 3.5 bytes per pixel, so 59MB total. If we expand the terrain to 8k then that's 235MB, which is probably still ok. Going beyond that is probably too much though without more cleverness.
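The arithmetic behind those numbers can be written out as a quick check. The helper names are hypothetical, and the sketch assumes square textures with a side that's a multiple of 4 and no mipmaps.

```cpp
#include <cstdint>

// BC5 stores a 4x4 block of texels in 16 bytes (1 byte per texel) and BC4
// stores one in 8 bytes (half a byte per texel)
constexpr uint64_t BC5_BLOCK_BYTES = 16;
constexpr uint64_t BC4_BLOCK_BYTES = 8;

// size in bytes of a single mip level of a square block compressed texture,
// assuming side is a multiple of 4
constexpr uint64_t bc_texture_bytes( uint64_t side, uint64_t block_bytes ) {
	return ( side / 4 ) * ( side / 4 ) * block_bytes;
}

// heightmap, normalmap and horizonmap in BC5 plus an AO map in BC4
constexpr uint64_t terrain_bytes( uint64_t side ) {
	return 3 * bc_texture_bytes( side, BC5_BLOCK_BYTES ) + bc_texture_bytes( side, BC4_BLOCK_BYTES );
}
```

terrain_bytes( 4096 ) comes to 58,720,256 bytes, which is the 59MB above, and terrain_bytes( 8192 ) comes to 234,881,024 bytes, or 235MB.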

We need to be able to decode the terrain image on the CPU too so we can use it for things like collision detection. BC5 should be simple to decode, and indeed the code for it is simple, but my decoder doesn't exactly match the GPU's! If anyone can see the problem please email me!

Extras

Skirts

It's pretty normal for games to be set on an island in the middle of an infinite ocean, because it neatly sidesteps the "invisible walls are immersion breaking" problem.

To help with the illusion, we want to really draw ocean all the way to the horizon. So we need some extra skirt geometry around the coarsest level clipmap. We could add more clipmap levels but that's wasting triangles since all of them will sample Z = 0.

Generating the mesh is pretty simple. You know how large the coarsest level clipmap is and how many vertices go along each edge, so you make a square that fits around the entire terrain and add triangles fanning out from it to some vertices arbitrarily far away.

There are tricks you can do to project vertices to the far plane, but I couldn't figure out how to make fog work with that so I just put a lot of zeroes. If anyone knows how to do this properly please get in touch!

Empty tiles

Continuing with the island idea, you probably want to be able to stand on one side of the island and have the clipmaps reach all the way to the other side. Which implies that they extend that far in the other direction too, meaning you have a lot of vertices over the ocean and outside the terrain.

It's easy enough to detect when a tile lies fully outside the world and swap in a simpler mesh. The only requirement is that the simpler mesh has full resolution at the sides so we don't get T-junctions. A triangle fan works, and the code for swapping meshes at render time looks like this:

// draw a low poly tile if the tile is entirely outside the world
// tl = top left, br = bottom right
v2 tile_br = tile_tl + tile_size;
bool inside = true;
if( !intervals_overlap( tile_tl.x, tile_br.x, 0, clipmap.heightmap.w ) )
	inside = false;
if( !intervals_overlap( tile_tl.y, tile_br.y, 0, clipmap.heightmap.h ) )
	inside = false;
if( !inside )
	use the simpler mesh;

Make sure to copy intervals_overlap from the ryg blog!
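In case that post is hard to find, here is a minimal sketch of the usual closed interval test. It isn't copied from the ryg blog, and the signature is only an assumption based on how the function is called above.

```cpp
#include <cassert>

// Hypothetical stand-in: true if the closed intervals [a0, a1] and [b0, b1]
// share at least one point. Intervals that only touch at an endpoint count
// as overlapping.
bool intervals_overlap( float a0, float a1, float b0, float b1 ) {
	assert( a0 <= a1 && b0 <= b1 );
	return a0 <= b1 && b0 <= a1;
}
```

Because touching counts as overlapping, a tile that shares an edge with the heightmap still gets the full resolution mesh, which is the conservative choice.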

Geomorphing

Even though the differences between adjacent clipmap levels are pretty small, the LOD transitions can be noticeable in some cases. The idea here is that you blend between clipmap levels as you get close to the clipmap boundary.

I haven't implemented it so I have nothing to say here, but if I ever get round to it I shall update this section.

Conclusion

You should now have the knowledge to get started on an implementation of geometry clipmaps without running into the problems that I did.

It's not a huge amount of code (less than 1 KLOC), but the implementation is tricky in some places, and mesh generation in particular is an absolute slog to get right.

Here are some other links I looked at while writing this:

The original paper
The GPU Gems article on GPU clipmaps
This terrain rendering project uses a single mesh instead of dividing it into parts which is probably simpler
Crest ocean renderer. Uses clipmaps to render an ocean
The Witcher 3's landscape presentation. Clipmaps in a real game!
This gamedev.net post. You might find this on Google and it's pretty crappy. It draws tiles that overlap and nothing else