OpenGL uniforms and renderer design rambling

3 Sep 2017

I recently did a bit of a renderer overhaul in my engine and I'm very pleased with how it turned out, so now seems like a good time to blog about it. I don't think there's anything left in my renderer that's blatantly bad or unportable, yet there are still obvious improvements that can be made whenever I feel like working on them. (I like leaving things that are easy, fun and non-critical, because if I ever get bored or stuck on something else I can go and work on them.)

This post is going to roughly outline the evolution of setting OpenGL uniforms in my game engine. It's a simple-sounding problem ("put some data on the GPU every frame"), but OpenGL gives you several different ways to do it and it's not obvious which is best. I assume it's common knowledge in the industry, but it took me a long time to figure out and I don't recall ever seeing it written down anywhere.

glGetUniformLocation and glUniform

This is what everyone starts off with: it's what all the OpenGL tutorials describe, and it's what most free GitHub engines use. I'm not going to go over it in great detail because everyone else already has.

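I will say, though, that the biggest problem by far with this method is that the bookkeeping becomes a pain in the ass once you move beyond anything trivial. Every shader program has its own uniform locations to look up and its own copy of every uniform's value, so something like the view matrix has to be set again on every program that uses it.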
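To make that bookkeeping concrete, here is a sketch of the per-program location cache this approach tends to grow. It assumes a lookup callback standing in for glGetUniformLocation( program, name ) so it runs without a GL context; UniformLocationCache is a hypothetical name. Missing names are cached as -1 too, since glGetUniformLocation returns -1 for uniforms that don't exist or were optimised out, and glUniform* silently ignores location -1.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-program cache of uniform locations. In a real engine the
// lookup callback would wrap glGetUniformLocation( program, name ).
class UniformLocationCache {
public:
	using Lookup = std::function< int( const std::string & ) >;

	explicit UniformLocationCache( Lookup lookup ) : lookup_( std::move( lookup ) ) { }

	// Returns the cached location, querying the driver only on first use.
	int get( const std::string & name ) {
		auto it = cache_.find( name );
		if( it != cache_.end() ) {
			return it->second;
		}
		int location = lookup_( name );
		cache_.emplace( name, location ); // -1 (not found) is cached as well
		return location;
	}

private:
	Lookup lookup_;
	std::unordered_map< std::string, int > cache_;
};
```

You need one of these per program, plus something that remembers which programs still need the new view matrix this frame.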

Uniform buffer objects

A step up from loose uniforms is uniform buffer objects (UBOs). Basically you can stuff your uniforms in a buffer like you do with everything else, and use that like a struct from GLSL. The Learn OpenGL guy has a full explanation of how it works.

It's best to group uniforms by how frequently you update them. So like you have a view UBO with the view/projection matrices and camera position, a window UBO with window dimensions, a light view UBO with the light's VP matrix for shadow maps, a sky UBO with skybox parameters, etc. There aren't actually very many different UBOs you need, so you can hardcode an enum of all the ones you use, and then use glUniformBlockBinding to bind each block name to its enum value, which doubles as the binding point.

Of course you can have a "whatever" UBO that you just stuff everything in while prototyping too.

As a more concrete example you can do this:

// in a header somewhere
const u32 UNIFORMS_VIEW = 0;
const u32 UNIFORMS_LIGHT_VIEW = 1;
const u32 UNIFORMS_WINDOW = 2;
const u32 UNIFORMS_SKY = 3;

// when creating a new shader
const char * ubo_names[] = { "view", "light_view", "window", "sky" };
for( GLuint i = 0; i < ARRAY_COUNT( ubo_names ); i++ ) {
	GLuint idx = glGetUniformBlockIndex( program, ubo_names[ i ] );
	if( idx != GL_INVALID_INDEX ) {
		glUniformBlockBinding( program, idx, i );
	}
}

// at startup
GLuint ub_view;
glGenBuffers( 1, &ub_view );

// rendering
glBindBuffer( GL_UNIFORM_BUFFER, ub_view );
glBufferData( GL_UNIFORM_BUFFER, ... );
// ...
glBindBufferBase( GL_UNIFORM_BUFFER, UNIFORMS_VIEW, ub_view );
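The shader setup loop above only works because ubo_names lists the blocks in the same order as the UNIFORMS_* constants, since the loop index is used as the binding point. A sketch of one way to make that harder to break, assuming an enum in place of the separate constants; UniformBlock, uniform_block_names and uniform_block_binding are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical enum whose values double as UBO binding points.
enum UniformBlock : uint32_t {
	UNIFORMS_VIEW,
	UNIFORMS_LIGHT_VIEW,
	UNIFORMS_WINDOW,
	UNIFORMS_SKY,
	UNIFORMS_COUNT,
};

// GLSL block names, indexed by UniformBlock.
constexpr const char * uniform_block_names[] = { "view", "light_view", "window", "sky" };

// Adding an enum value without a name (or vice versa) is now a compile error.
static_assert( sizeof( uniform_block_names ) / sizeof( uniform_block_names[ 0 ] ) == UNIFORMS_COUNT,
	"every uniform block needs a name" );

// Maps a GLSL block name to its binding point, or UNIFORMS_COUNT if unknown.
// At shader creation you would pass the result to glUniformBlockBinding for
// each block that glGetUniformBlockIndex finds.
inline uint32_t uniform_block_binding( const char * name ) {
	for( uint32_t i = 0; i < UNIFORMS_COUNT; i++ ) {
		if( std::strcmp( uniform_block_names[ i ], name ) == 0 ) {
			return i;
		}
	}
	return UNIFORMS_COUNT;
}
```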

I found it helpful to write a wrapper around glBufferData. UBOs have funny alignment requirements (the std140 layout is stricter than C!) and it was annoying having to mirror structs between C and GLSL. So instead I wrote a variadic template that lets me write renderer_ub_easy( ub_view, V, P, camera_pos );, which copies its arguments to a buffer with the right alignment and uploads it. The implementation is kind of hairy, but here you go:

// per-member alignment: size rounded up to 4 bytes, capped at 16 (a vec4)
template< typename T >
constexpr size_t renderer_ubo_alignment() {
	return min( align4( sizeof( T ) ), 4 * sizeof( float ) );
}

// vec3 is aligned like a vec4, even though it's only 12 bytes long
template<>
constexpr size_t renderer_ubo_alignment< v3 >() {
	return sizeof( float ) * 4;
}

// total buffer size: align the running size for each type, then add its size
template< typename T >
constexpr size_t renderer_ub_size( size_t size ) {
	return sizeof( T ) + align_power_of_2( size, renderer_ubo_alignment< T >() );
}

template< typename S, typename T, typename... Rest >
constexpr size_t renderer_ub_size( size_t size ) {
	return renderer_ub_size< T, Rest... >( sizeof( S ) + align_power_of_2( size, renderer_ubo_alignment< S >() ) );
}

// base case: nothing left to copy
inline void renderer_ub_easy_helper( char * buf, size_t len ) { }

// copy the first argument to its aligned offset, then recurse on the rest
template< typename T, typename... Rest >
inline void renderer_ub_easy_helper( char * buf, size_t len, const T & first, Rest... rest ) {
	len = align_power_of_2( len, renderer_ubo_alignment< T >() );
	memcpy( buf + len, &first, sizeof( first ) );
	renderer_ub_easy_helper( buf, len + sizeof( first ), rest... );
}

// pack the arguments into a zeroed stack buffer and upload it in one go
template< typename... Rest >
inline void renderer_ub_easy( GLuint ub, Rest... rest ) {
	constexpr size_t buf_size = renderer_ub_size< Rest... >( 0 );
	char buf[ buf_size ];
	memset( buf, 0, sizeof( buf ) );
	renderer_ub_easy_helper( buf, 0, rest... );
	glBindBuffer( GL_UNIFORM_BUFFER, ub );
	glBufferData( GL_UNIFORM_BUFFER, sizeof( buf ), buf, GL_STREAM_DRAW );
}

I'm not 100% sure I got the alignment stuff right but it works for everything I've thrown at it so far.
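One way to sanity-check the alignment code is to compute std140 offsets directly from the layout rules in the OpenGL spec and compare. The sketch below only covers scalars, vectors and mat4; Std140Type and std140_offsets are hypothetical names. For these types the alignments match what renderer_ubo_alignment returns, assuming v2/v3/v4/m4 are tightly packed float structs. The simple rule stops matching for arrays, where std140 rounds each element's stride up to 16 bytes, and for mat3, which is stored as three 16-byte-aligned columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The few GLSL types a UBO helper usually needs.
enum class Std140Type { Float, Vec2, Vec3, Vec4, Mat4 };

struct Std140Info {
	size_t alignment;
	size_t size;
};

// Base alignment and size in bytes under the std140 rules.
constexpr Std140Info std140_info( Std140Type t ) {
	switch( t ) {
		case Std140Type::Float: return { 4, 4 };
		case Std140Type::Vec2: return { 8, 8 };
		case Std140Type::Vec3: return { 16, 12 }; // aligned like a vec4 but only 12 bytes
		case Std140Type::Vec4: return { 16, 16 };
		case Std140Type::Mat4: return { 16, 64 }; // four vec4 columns
	}
	return { 0, 0 };
}

// Rounds x up to a multiple of align (align must be a power of two).
constexpr size_t align_up( size_t x, size_t align ) {
	return ( x + align - 1 ) & ~( align - 1 );
}

// Returns each member's byte offset in declaration order and writes the
// unpadded end of the last member to *end.
inline std::vector< size_t > std140_offsets( const std::vector< Std140Type > & members, size_t * end ) {
	std::vector< size_t > offsets;
	size_t cursor = 0;
	for( Std140Type t : members ) {
		Std140Info info = std140_info( t );
		cursor = align_up( cursor, info.alignment );
		offsets.push_back( cursor );
		cursor += info.size;
	}
	*end = cursor;
	return offsets;
}
```

For the view block (two mat4s and a vec3) the offsets come out as 0, 64 and 128, and a float placed straight after a vec3 packs into offset 12 rather than 16.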

In terms of bookkeeping it is better than loose uniforms, but you still need to allocate, deallocate and keep track of all your uniform buffers. It's less, but still non-zero.

glMapBuffer and glBindBufferRange

For this next one you actually need to reorganise your renderer a little. Instead of submitting draw calls to the GPU immediately, you build a list of draw calls and submit them all at once at the end of the frame. More specifically you should build a list of render passes, each of which has a target framebuffer, some flags saying whether you should clear depth/colour at the start of the pass, and a list of draw calls.
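A minimal sketch of that data structure, assuming plain integer handles in place of GL object names (0 being the default framebuffer). RenderPass, DrawCall and FrameRecorder are hypothetical names, and a real draw call would also carry the full pipeline state (shader, textures, uniform bindings and so on):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical recorded draw: just enough to issue glDrawArrays later.
struct DrawCall {
	uint32_t shader;
	uint32_t vao;
	uint32_t num_vertices;
};

// One pass: where to draw, what to clear first, and what to draw.
struct RenderPass {
	uint32_t target_fb;
	bool clear_colour;
	bool clear_depth;
	std::vector< DrawCall > draw_calls;
};

// Records passes during the frame; nothing touches the GPU until the end of
// the frame, when the passes would be walked in order and submitted.
struct FrameRecorder {
	std::vector< RenderPass > passes;
	bool in_pass = false;

	void begin_frame() {
		passes.clear();
		in_pass = false;
	}

	void begin_pass( uint32_t fb, bool clear_colour, bool clear_depth ) {
		assert( !in_pass );
		passes.push_back( RenderPass{ fb, clear_colour, clear_depth, {} } );
		in_pass = true;
	}

	void draw( const DrawCall & dc ) {
		assert( in_pass );
		passes.back().draw_calls.push_back( dc );
	}

	void end_pass() {
		assert( in_pass );
		in_pass = false;
	}
};
```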

People do talk about this on the internet but they focus on the performance benefits:

You can sort your draw calls by pipeline state to minimise the number of costly state changes
You can submit all your draw calls on a background thread
I guess this is how D3D12/Vulkan work so it makes porting easier too
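For the first of those points, here is a sketch of sorting one pass's draw calls by a packed state key, assuming shader switches cost more than texture switches (the usual rule of thumb). DrawCall, sort_key and sort_draw_calls are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw call with only the fields that matter for sorting.
struct DrawCall {
	uint32_t shader;
	uint32_t texture;
	uint32_t vao;
};

// The shader goes in the high bits, so sorting by key groups draws by
// shader first and by texture second.
inline uint64_t sort_key( const DrawCall & dc ) {
	return ( uint64_t( dc.shader ) << 32 ) | dc.texture;
}

// Sorts one pass's draw calls in place; stable_sort keeps the original
// submission order among draws with identical state.
inline void sort_draw_calls( std::vector< DrawCall > & draws ) {
	std::stable_sort( draws.begin(), draws.end(), []( const DrawCall & a, const DrawCall & b ) {
		return sort_key( a ) < sort_key( b );
	} );
}
```

Sort within a pass only: the passes themselves have to stay in order, since the shadow map has to be written before the world pass samples it. Order-dependent draws such as alpha-blended geometry shouldn't be reordered this freely either.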

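Neither loose uniforms nor UBOs really work with this model anymore, though: glUniform and glBufferData take effect as soon as you call them, but the draws they were meant for don't run until the end of the frame, so every draw would see whatever values were set last. Maybe you can pack uniform uploads into the draw call list, but that's a pain and ugly.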

The pro secret is quite simple: map a huge UBO at the start of the frame, copy the entire frame's uniforms into it, then make the offsets/lengths part of your pipeline state and bind them with glBindBufferRange.

There's no bookkeeping beyond telling your renderer when to start/end the frame/passes. You can use the variadic template from above with a few modifications, so setting uniforms is still a one-liner. It's like going from a retained mode API to an immediate mode API. If you don't upload a set of uniforms for a given frame, they just don't exist.
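A sketch of what the "few modifications" might look like, assuming a CPU-side byte vector stands in for the mapped UBO so it runs without a GL context. Each call starts its block at a multiple of the offset alignment, because glBindBufferRange requires offsets that are multiples of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (query it with glGetIntegerv; the 256 in the test is only a placeholder). Vec3, Mat4, FrameUniformBuffer, the fields of UniformBinding and this renderer_uniforms signature are hypothetical; in particular, unlike the game code in this post, it takes the buffer as an explicit parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the engine's maths types.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[ 16 ]; };

// Where a block lives in the frame's big UBO; at submit time this becomes
// glBindBufferRange( GL_UNIFORM_BUFFER, binding, ubo, offset, size ).
struct UniformBinding { uint32_t offset; uint32_t size; };

// CPU-side stand-in for the mapped frame UBO.
struct FrameUniformBuffer {
	std::vector< uint8_t > bytes;
	size_t offset_alignment; // GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT
};

// Same rule as renderer_ubo_alignment: size rounded up to 4, capped at 16...
template< typename T >
constexpr size_t member_alignment() {
	return std::min< size_t >( ( sizeof( T ) + 3 ) / 4 * 4, 16 );
}

// ...except vec3, which is aligned like a vec4.
template<>
constexpr size_t member_alignment< Vec3 >() {
	return 16;
}

inline size_t round_up( size_t x, size_t alignment ) {
	return ( x + alignment - 1 ) / alignment * alignment;
}

inline void pack_members( FrameUniformBuffer &, size_t ) { }

// Appends each value at its aligned offset relative to the block start `base`.
template< typename T, typename... Rest >
void pack_members( FrameUniformBuffer & fub, size_t base, const T & first, const Rest &... rest ) {
	size_t at = base + round_up( fub.bytes.size() - base, member_alignment< T >() );
	fub.bytes.resize( at + sizeof( T ), 0 ); // new bytes, padding included, are zeroed
	std::memcpy( fub.bytes.data() + at, &first, sizeof( T ) );
	pack_members( fub, base, rest... );
}

// Packs the values into a new block that starts at a legal
// glBindBufferRange offset and returns where it ended up.
template< typename... Ts >
UniformBinding renderer_uniforms( FrameUniformBuffer & fub, const Ts &... values ) {
	size_t base = round_up( fub.bytes.size(), fub.offset_alignment );
	fub.bytes.resize( base, 0 );
	pack_members( fub, base, values... );
	return { uint32_t( base ), uint32_t( fub.bytes.size() - base ) };
}
```

With two mat4s and a vec3 the first block lands at offset 0 with size 140, and the next call starts at the next multiple of the offset alignment.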

To make it totally clear what I mean, the game code looks like this:

renderer_begin_frame();

UniformBinding light_view_uniforms = renderer_uniforms( lightP * lightV, light_pos );

// fill shadow map
{
	renderer_begin_pass( shadow_fb, RENDERER_CLEAR_COLOUR_DONT, RENDERER_CLEAR_DEPTH_DO );

	RenderState render_state;
	render_state.shader = get_shader( SHADER_WRITE_SHADOW_MAP );
	render_state.uniforms[ UNIFORM_LIGHT_VIEW ] = light_view_uniforms;

	draw_scene( render_state );

	renderer_end_pass();
}

// draw world
{
	renderer_begin_pass( RENDERER_CLEAR_COLOUR_DO, RENDERER_CLEAR_DEPTH_DO );

	RenderState render_state;
	render_state.shader = get_shader( SHADER_SHADOWED_VERTEX_COLOURS );
	render_state.uniforms[ UNIFORM_VIEW ] = renderer_uniforms( V, P, game->pos );
	render_state.uniforms[ UNIFORM_LIGHT_VIEW ] = light_view_uniforms;
	render_state.textures[ 0 ] = shadow_fb.texture;

	draw_scene( render_state );

	renderer_end_pass();
}

renderer_end_frame();

renderer_begin_frame clears the list of render passes and maps the big UBO, renderer_begin_pass records the target framebuffer and what needs clearing, draw_scene contains a bunch of draw calls which basically copy the RenderState and Meshes (VAOs) into the render pass's list of draw calls, and finally renderer_end_frame unmaps the big UBO and submits everything.

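One pitfall is that glMapBuffer is probably going to return a pointer to write-combining memory, so you should make sure to write the entire buffer, including all the padding you use to align things (just write zeroes), and never read back from it. Write-combining memory is fastest when whole cache lines get written, and reads from it are very slow. Filling in the padding is probably not required on modern CPUs, but it's good for peace of mind.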
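A sketch of writing into the mapped pointer with that in mind: every byte from the start of the buffer up to the last write gets written exactly once, front to back, padding included, and nothing is ever read back. SequentialWriter is a hypothetical name, and a plain array stands in for the glMapBuffer pointer so it runs without a context; for the start of each uniform block the alignment argument would be GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical writer for a (possibly write-combined) mapped pointer. It never
// reads from dst and fills alignment gaps with zeroes instead of skipping them.
class SequentialWriter {
public:
	SequentialWriter( uint8_t * dst, size_t capacity ) : dst_( dst ), capacity_( capacity ), cursor_( 0 ) { }

	// Zero-pads up to a multiple of alignment, copies size bytes there and
	// returns the offset they were written at. Division is used so alignment
	// doesn't have to be a power of two.
	size_t write( const void * data, size_t size, size_t alignment ) {
		size_t offset = ( cursor_ + alignment - 1 ) / alignment * alignment;
		assert( offset + size <= capacity_ );
		std::memset( dst_ + cursor_, 0, offset - cursor_ );
		std::memcpy( dst_ + offset, data, size );
		cursor_ = offset + size;
		return offset;
	}

	size_t bytes_written() const { return cursor_; }

private:
	uint8_t * dst_;
	size_t capacity_;
	size_t cursor_;
};
```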

In case I haven't explained this well, you should probably just look at my implementation in renderer.cc and renderer.h, or at Dolphin, which does something similar.

GL_MAP_PERSISTENT_BIT

For the sake of completeness, if you're using GL 4.4 (or have ARB_buffer_storage) you get to use a persistent map, which should be a tiny bit faster. But it's the same idea.
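The main difference with a persistent map is that you map once and never unmap, so nothing stops you from overwriting uniforms the GPU hasn't read yet. One common way to handle that is to split the buffer into one region per frame in flight and fence each region. A sketch of the offset bookkeeping, assuming three frames in flight and a coherent mapping; PersistentRing is a hypothetical name, and the GL calls only appear in comments so it runs without a context:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for a UBO created once with
//   glBufferStorage( GL_UNIFORM_BUFFER, total_size(), NULL, flags )
// and mapped once with
//   glMapBufferRange( GL_UNIFORM_BUFFER, 0, total_size(), flags )
// where flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT.
struct PersistentRing {
	static constexpr size_t FRAMES_IN_FLIGHT = 3;

	// region_size should be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT so
	// block offsets inside each region stay legal for glBindBufferRange.
	explicit PersistentRing( size_t region_size ) : region_size_( region_size ), frame_( 0 ) { }

	size_t total_size() const { return region_size_ * FRAMES_IN_FLIGHT; }

	// Start of the region this frame writes into. Before writing, you'd wait on
	// the fence recorded FRAMES_IN_FLIGHT frames ago (glClientWaitSync); after
	// submitting, record a new one with glFenceSync( GL_SYNC_GPU_COMMANDS_COMPLETE, 0 ).
	size_t region_offset() const { return size_t( frame_ % FRAMES_IN_FLIGHT ) * region_size_; }

	void next_frame() { frame_++; }

private:
	size_t region_size_;
	uint64_t frame_;
};
```

Each frame's UniformBinding offsets are then relative to region_offset(), which gets added before calling glBindBufferRange.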