Breaking down the splaTV WebGL vertex shader of Spacetime Gaussians

Recently, I started a project to implement a Spacetime Gaussians renderer using Apple’s Metal graphics API, targeting AR/VR applications on iOS and visionOS.

Spacetime Gaussians is an evolution of Gaussian Splatting that adds a fourth, temporal dimension. Instead of storing frame-by-frame geometry (like a video mesh or a point cloud per frame), Spacetime Gaussians interpolate motion continuously. This yields smooth, flicker-free animation that can be sampled at any time and from any angle.

My starting point was analysing Kevin Kwok’s (antimatter15) fantastic implementation of Spacetime Gaussians in a WebGL app called splaTV. See below for two live demos if you haven’t seen it in action yet.

SplaTV’s documentation is sparse (or rather, non-existent), so I broke down the shader logic to see how it all fits together.

Below I’m sharing my notes, should they be helpful to anyone else.

The WebGL viewer’s source code is available on GitHub (antimatter15/splaTV/hybrid.js#L271) as a single self-contained file, hybrid.js, which uses a hybrid approach of CPU and GPU (WebGL/GLSL) processing. The heart of the rendering happens inside vertexShaderSource, and this post dives into how that shader works under the hood.

High-Level Overview

hybrid.js is structured into several logical parts: a web worker for background processing, GLSL shader code (which this blog post looks into), and a main execution block that orchestrates everything.

On a high level, the web app contained in hybrid.js does the following:

  1. Loads and parses data from either the “standard” .ply file format or .splatv, a custom, optimised binary format. See the following (very useful) discussion on GitHub for details on the different representations and compression schemes: github.com/mkkellogg/GaussianSplats3D/issues/47
  2. Sets up WebGL: initialises the WebGL2 context, compiles and links the GLSL shaders, and creates the necessary buffers and textures
  3. Implements the core GLSL rendering pipeline that projects and renders millions of semi-transparent, moving, and rotating Gaussian splats via vertexShaderSource and fragmentShaderSource – discussed in detail below
  4. Manages a suite of camera controls, allowing users to navigate the 3D scene with a mouse, keyboard, touch gestures or even a gamepad
  5. Offloads heavy computations such as depth-sorting to a Web Worker, keeping the main UI thread responsive

The GLSL Shader Code

vertexShaderSource

The primary goal of the vertex shader, defined in const vertexShaderSource, is to render a single splat. Unlike traditional rendering, where a vertex shader processes one corner of a triangle, this shader reconstructs an entire object (a splat) in one go. For each splat, the shader does the following:

  1. Receives a unique index for the splat
  2. Uses the index to fetch the splat’s properties (position, rotation, scale, color, motion) from a large data texture u_texture
  3. Calculates the splat’s position and appearance at a specific time
  4. Projects the splat’s 3D Gaussian shape into a 2D ellipse on the screen
  5. Takes a simple input square (a “billboard”) and transforms its four corners to perfectly match the size, shape, and orientation of the projected 2D ellipse
  6. Calculates the splat’s color and opacity and passes them to the fragment shader

Vertex Shader Breakdown

Here is a detailed walkthrough of the main function in vertexShaderSource.

1. Initialization and Temporal Filtering

GLSL
gl_Position = vec4(0.0, 0.0, 2.0, 1.0);
    
uvec4 motion1 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2) | 3u, uint(index) >> 10), 0);
vec2 trbf = unpackHalf2x16(motion1.w);
float dt = time - trbf.x;
    
float topacity = exp(-1.0 * pow(dt / trbf.y, 2.0));
if(topacity < 0.02) return;
  • gl_Position sets a default position in clip space, outside the view volume. If any of the early-exit return statements below is hit, the splat is effectively discarded
  • The data for each splat is packed across 4 texels. texelFetch uses the splat’s index to fetch the motion properties of the current splat from the 4th texel (... | 3u) of u_texture, a large data texture that holds the raw splat data (the addressing scheme is decoded in the sketch after this list)
  • unpackHalf2x16(motion1.w) uses a built-in GLSL function to unpack a 32-bit unsigned integer into two 16-bit half-precision floats. Here it extracts the Temporal Radial Basis Function (TRBF) parameters:
    • trbf.x: The center time at which the splat is most “active”
    • trbf.y: The scale or duration of the splat’s activity
  • dt = time - trbf.x calculates the time difference between the global animation time and the splat’s center time
  • topacity = exp(...) is a Gaussian function applied over time. It calculates the splat’s opacity based on delta time dt. The opacity is 1.0 when dt is 0 and falls off as time moves away from the splat’s center time
  • if(topacity < 0.02) return; is an early-exit optimisation: if a splat is effectively invisible at the current time, processing stops
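As an aside, the bit manipulation inside the texel coordinates is worth decoding. My reading (not spelled out in the source) is that the texture holds four texels per splat and 1024 splats per row. The hypothetical helper below makes the mapping explicit:

GLSL
// Hypothetical helper (not in the source) illustrating the addressing
// used by every texelFetch in this shader. Assumes four texels per splat
// and 1024 splats per texture row.
ivec2 splatTexel(uint index, uint block) {     // block selects texel 0..3
    uint x = ((index & 0x3ffu) << 2) | block;  // (index % 1024) * 4 + block
    uint y = index >> 10;                      // index / 1024
    return ivec2(x, y);
}
// The fetch above would then read:
// uvec4 motion1 = texelFetch(u_texture, splatTexel(uint(index), 3u), 0);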

2. Data Fetching and Dynamic Position Calculation

GLSL
uvec4 motion0 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2) | 2u, uint(index) >> 10), 0);
uvec4 static0 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2), uint(index) >> 10), 0);

vec2 m0 = unpackHalf2x16(motion0.x), 
     m1 = unpackHalf2x16(motion0.y), 
     m2 = unpackHalf2x16(motion0.z), 
     m3 = unpackHalf2x16(motion0.w), 
     m4 = unpackHalf2x16(motion1.x); 
vec4 trot = vec4(unpackHalf2x16(motion1.y).xy, unpackHalf2x16(motion1.z).xy) * dt;
vec3 tpos = (vec3(m0.xy, m1.x) * dt + vec3(m1.y, m2.xy) * dt*dt + vec3(m3.xy, m4.x) * dt*dt*dt);
  • texelFetch is used again to fetch more of the splat’s properties: the static data static0 and the motion data motion0
  • The motion coefficients for position and rotation are unpacked from the fetched data into two-component floating-point vectors (m0 through m4)
  • trot calculates the rotational change as a linear function of the time delta dt
  • tpos calculates the positional change as a cubic polynomial in dt. This is what allows for complex motion paths, with terms for velocity, acceleration, and jerk (written out below)
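Written out in math form (the symbol names are mine, not the source’s), with $dt = t - \mu$ where $\mu$ is the splat’s centre time:

$$\Delta p(dt) = b_1\,dt + b_2\,dt^2 + b_3\,dt^3, \qquad \Delta q(dt) = \omega\,dt$$

Here the position coefficients $b_1, b_2, b_3 \in \mathbb{R}^3$ are assembled from m0 through m4, and the rotation rate $\omega \in \mathbb{R}^4$ is unpacked from motion1.yz, so each splat carries its own velocity, acceleration, and jerk terms.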

3. 3D to 2D Projection and Culling

GLSL
vec4 cam = view * vec4(uintBitsToFloat(static0.xyz) + tpos, 1);
vec4 pos = projection * cam;

float clip = 1.2 * pos.w;
if (pos.z < -clip || pos.x < -clip || pos.x > clip || pos.y < -clip || pos.y > clip) return;
  • uintBitsToFloat(static0.xyz) converts the splat’s base position, stored as raw 32-bit integer bits, back into three 32-bit floating-point numbers.
  • The calculated motion offset tpos is added to the base position to get the final world position at the current time
  • cam = view * ...: The world position is transformed into camera-space (view-space). cam.z now represents the depth from the camera.
  • pos = projection * ...: The camera-space position is transformed into clip-space
  • if(...) return performs frustum culling: a splat is discarded if its center falls outside the camera’s view. The 1.2 factor adds a 20% guard band, presumably so that splats whose centers sit just off-screen, but whose ellipses still reach into the viewport, are not culled

4. Covariance Calculation (The Core Logic)

This is the most complex part. It projects the 3D Gaussian into a 2D ellipse on the screen.

  1. Construct the 3D covariance: The shader calculates the final rotation quaternion rot and scale vector scale. From these, it constructs the 3×3 rotation matrix R and scaling matrix S, which are combined into M = S * R and used to compute the 3D covariance matrix Vrk = 4.0 * transpose(M) * M, representing the 3D Gaussian ellipsoid.
  2. Project to a 2D covariance: This is the core mathematical step. It uses the Jacobian J of the perspective projection to project the 3D covariance matrix into a 2D covariance matrix cov2d on the screen. This cov2d mathematically describes the shape and orientation of the final 2D ellipse.
GLSL
uvec4 static1 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2) | 1u, uint(index) >> 10), 0);

vec4 rot = vec4(unpackHalf2x16(static0.w).xy, unpackHalf2x16(static1.x).xy) + trot;
vec3 scale = vec3(unpackHalf2x16(static1.y).xy, unpackHalf2x16(static1.z).x);
rot /= sqrt(dot(rot, rot));

mat3 S = ...; // Scaling matrix from scale
mat3 R = ...; // Rotation matrix from quaternion 'rot'
mat3 M = S * R;
mat3 Vrk = 4.0 * transpose(M) * M; // 3D Covariance Matrix
mat3 J = ...; // Jacobian of the perspective projection
mat3 T = transpose(mat3(view)) * J;
mat3 cov2d = transpose(T) * Vrk * T; // Final 2D Covariance Matrix
  • static1 is fetched, containing the rest of the rotation, scale, and color data
  • The final rotation quaternion rot and scale vector scale are computed by combining static and dynamic parts. The quaternion is normalized
  • A 3D covariance matrix Vrk is constructed from the scale and rotation. This matrix represents the 3D Gaussian ellipsoid in world space
  • The Jacobian matrix J of the perspective projection is computed. It describes how small changes in 3D camera-space coordinates map to changes in 2D screen coordinates at the splat’s depth cam.z
  • The 3D covariance matrix Vrk is projected into a 2D covariance matrix cov2d using the view matrix and the Jacobian. This cov2d matrix mathematically describes the shape and orientation of the final 2D ellipse on the screen (a sketch of the elided matrices follows this list)
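The listing above elides the construction of S, R, and J. Below is a sketch of the standard constructions used by 3D Gaussian Splatting renderers, which is how I read splaTV’s code as well; the focal uniform (focal lengths in pixels) and the exact sign conventions are my assumptions, so treat the original source as authoritative.

GLSL
// Diagonal scaling matrix built from the per-splat scale vector.
mat3 S = mat3(scale.x, 0.0, 0.0,
              0.0, scale.y, 0.0,
              0.0, 0.0, scale.z);

// Standard quaternion-to-rotation-matrix conversion, assuming the
// normalized quaternion is stored as rot = (w, x, y, z).
mat3 R = mat3(
    1.0 - 2.0 * (rot.z * rot.z + rot.w * rot.w),
          2.0 * (rot.y * rot.z - rot.x * rot.w),
          2.0 * (rot.y * rot.w + rot.x * rot.z),
          2.0 * (rot.y * rot.z + rot.x * rot.w),
    1.0 - 2.0 * (rot.y * rot.y + rot.w * rot.w),
          2.0 * (rot.z * rot.w - rot.x * rot.y),
          2.0 * (rot.y * rot.w - rot.x * rot.z),
          2.0 * (rot.z * rot.w + rot.x * rot.y),
    1.0 - 2.0 * (rot.y * rot.y + rot.z * rot.z));

// Jacobian of the perspective projection, linearised at the splat's
// camera-space position cam. focal is an assumed uniform holding the
// focal lengths in pixels; signs can differ with the Y-axis convention.
mat3 J = mat3(
    focal.x / cam.z, 0.0, -(focal.x * cam.x) / (cam.z * cam.z),
    0.0, focal.y / cam.z, -(focal.y * cam.y) / (cam.z * cam.z),
    0.0, 0.0, 0.0);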

5. Finding the Ellipse Axes

The shader performs an analytical eigen-decomposition of the 2×2 cov2d matrix to find its eigenvalues lambda1 and lambda2 and the eigenvector diagonalVector. From these it derives the majorAxis and minorAxis of the projected ellipse in pixels.

GLSL
float mid = (cov2d[0][0] + cov2d[1][1]) / 2.0;
float radius = length(vec2((cov2d[0][0] - cov2d[1][1]) / 2.0, cov2d[0][1]));
float lambda1 = mid + radius, lambda2 = mid - radius;

if(lambda2 < 0.0) return;
vec2 diagonalVector = normalize(vec2(cov2d[0][1], lambda1 - cov2d[0][0]));
vec2 majorAxis = min(sqrt(2.0 * lambda1), 1024.0) * diagonalVector;
vec2 minorAxis = min(sqrt(2.0 * lambda2), 1024.0) * vec2(diagonalVector.y, -diagonalVector.x);
  • lambda1 and lambda2 are the eigenvalues of cov2d – the variances along the ellipse’s principal axes. Taking sqrt(2.0 * lambda) turns each variance into an axis half-length in pixels (see the closed form below)
  • diagonalVector is the eigenvector for lambda1, giving the direction of the ellipse’s longest axis
  • majorAxis and minorAxis are the final vectors representing the two axes of the ellipse, scaled to the correct size in pixels and clamped to 1024 pixels so extreme splats stay bounded
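This is the textbook closed-form eigen-decomposition of a symmetric 2×2 matrix. Writing cov2d as

$$\Sigma' = \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \qquad \lambda_{1,2} = \underbrace{\tfrac{a+d}{2}}_{\texttt{mid}} \pm \underbrace{\sqrt{\big(\tfrac{a-d}{2}\big)^2 + b^2}}_{\texttt{radius}}$$

The lambda2 < 0.0 early exit catches numerically degenerate cases where the projected covariance fails to be positive semi-definite.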

6. Final Color and Position

The shader calculates the final color vColor by combining the splat’s base color with the temporal opacity and a depth-based fade. The final vertex position gl_Position is calculated by taking the splat’s 2D center and displacing it by the majorAxis and minorAxis, effectively stretching a simple input quad to match the projected ellipse.

GLSL
uint rgba = static1.w;
vColor = ...; // Unpack and combine color, depth-fade, and temporal opacity

vec2 vCenter = vec2(pos) / pos.w;
gl_Position = vec4(
    vCenter
    + position.x * majorAxis / viewport
    + position.y * minorAxis / viewport, 0.0, 1.0);

vPosition = position;
  • The final color vColor is calculated by unpacking the 8-bit RGBA values and multiplying by the temporal opacity topacity and a depth-based fading factor
  • The splat’s center vCenter in clip space (pos) is converted to Normalized Device Coordinates (NDC) by dividing by pos.w
  • gl_Position: This is the final step. It takes the splat’s center vCenter and displaces it using the input quad corner (position) scaled by the ellipse’s majorAxis and minorAxis. The axes are converted from pixel units back to NDC by dividing by the viewport dimensions. This transforms the original square into the correctly shaped and oriented ellipse on the screen.
  • The original, untransformed quad corner vPosition is passed to the fragment shader, which uses it to calculate the Gaussian falloff.
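One subtlety: NDC spans two units across the viewport, so a k-pixel offset corresponds to 2k/viewport in NDC, yet the code divides by viewport only once. The missing factor of two appears to be supplied by the quad corners themselves, which – judging by the fragment shader’s discard at radius 2 – sit at ±2 rather than ±1:

$$\text{offset}_{NDC} = \underbrace{\texttt{position.x}}_{\pm 2} \cdot \frac{\texttt{majorAxis}}{\texttt{viewport}}$$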

fragmentShaderSource

This shader is much simpler. Its job is to color the pixels of the quad that the vertex shader has shaped into an ellipse.

GLSL
  #version 300 es
  precision highp float;
  
  in vec4 vColor;
  in vec2 vPosition;
  
  out vec4 fragColor;
  
  void main () {
      float A = -dot(vPosition, vPosition);
      if (A < -4.0) discard;
      float B = exp(A) * vColor.a;
      fragColor = vec4(B * vColor.rgb, B);
  }
  1. It receives the interpolated color vColor and the original quad coordinate vPosition
  2. It calculates A = -dot(vPosition, vPosition), which is the negative squared distance from the center of the splat
  3. It calculates the final opacity for the pixel using a Gaussian function: B = exp(A) * vColor.a. This creates the characteristic soft, fuzzy falloff from the center of the splat.
  4. It outputs the final pixel color, modulated by this falloff opacity
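Putting it together, for a pixel at distance $r$ from the splat centre (in quad units):

$$\alpha(r) = e^{-r^2} \cdot \alpha_{\text{splat}}, \qquad \text{discard if } r^2 > 4$$

The discard trims the quad to a disk of radius 2, beyond which $e^{-4} \approx 0.018$ is visually negligible. Note also that the output vec4(B * vColor.rgb, B) is premultiplied alpha, which the blend state configured on the JavaScript side has to match.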

Camera Controls

The viewer updates the camera’s view matrix in response to user input (mouse, keyboard, touch, etc.) using a standard 3D graphics technique. Instead of recalculating the matrix from scratch, it incrementally modifies it.

Here is the core logic, which is repeated for all user interactions:

  1. Invert the View Matrix: The current viewMatrix (which transforms world coordinates to the camera’s view) is inverted to get the camera’s world matrix. This matrix represents the camera’s actual position and orientation in the 3D scene
  2. Apply Transformation: A translation or rotation is applied to this world matrix based on the user’s input (e.g., a mouse drag or key press).
  3. Invert Back: The modified world matrix is inverted back into a new viewMatrix
  4. Render: This new viewMatrix is sent to the GPU shader for the next frame, and the user sees the updated perspective
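In matrix terms (my notation, not the source’s): with view matrix $V$ and camera world matrix $W = V^{-1}$, applying an incremental camera-local transform $T$ gives

$$W' = W \cdot T, \qquad V' = (W \cdot T)^{-1} = T^{-1} \cdot V$$

Whether $T$ multiplies on the left or on the right depends on whether the motion is expressed in camera-local or world coordinates.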
