Recently, I started a project to implement a Spacetime Gaussians renderer on Apple’s Metal graphics platform for AR/VR iOS and visionOS applications.
Spacetime Gaussians is an evolution of Gaussian Splatting that adds a 4th, temporal dimension. Instead of storing frame-by-frame geometry (like a video mesh or a point cloud per frame), Spacetime Gaussians interpolate motion continuously. This gives smooth animation with no flickering, and you can sample the scene at any time and from any angle.
My starting point was analysing Kevin Kwok’s (antimatter15) fantastic implementation of Spacetime Gaussians in a WebGL app called splaTV. See below for two live demos if you haven’t seen it in action yet.
- Scene 1: Garage Flames antimatter15.com/splaTV
- Scene 2: Steak Sear antimatter15.com/splaTV/?url=sear.splatv
SplaTV’s documentation is sparse (or rather, non-existent) so I broke down the shader logic to see how it all fits together.
Below I’m sharing my notes, should they be helpful to anyone else.
The WebGL viewer’s source code is available on GitHub (antimatter15/splaTV/hybrid.js#L271) as a single self-contained file, hybrid.js, which uses a hybrid approach of CPU and GPU (WebGL/GLSL) processing. The heart of the rendering happens inside vertexShaderSource, and this post dives into how that shader works under the hood.
High-Level Overview
hybrid.js is structured into several logical parts: a web worker for background processing, GLSL shader code (which this blog post looks into), and a main execution block that orchestrates everything.
On a high level, the web app contained in hybrid.js does the following:
- Loads and parses data from either a “standard” .ply file format or .splatv, a custom optimised binary format. See the following (very useful) discussion on GitHub for details on different representations and compression schemes: github.com/mkkellogg/GaussianSplats3D/issues/47
- Sets up WebGL: initializes the WebGL2 context, compiles and links the GLSL shaders, and sets up the necessary buffers and textures
- Implements the GLSL rendering pipeline: the core logic to project and render millions of semi-transparent, moving, and rotating Gaussian splats via vertexShaderSource and fragmentShaderSource – discussed in detail below
- Manages a suite of camera controls, allowing users to navigate the 3D scene using a mouse, keyboard, touch gestures or even a gamepad
- Applies performance optimizations, using a Web Worker to offload heavy computations like depth-sorting from the main UI thread, ensuring a smooth user experience (see the sketch below)
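To make that last point concrete, here is a minimal sketch of what a depth-sorting worker can look like. This is not splaTV’s actual worker: hybrid.js packs its data differently, and the message shape and the uploadIndexBuffer callback below are purely illustrative.

// sort-worker.js – hypothetical worker that depth-sorts splat indices off the main thread
self.onmessage = ({ data: { positions, view } }) => {
  const count = positions.length / 3;
  const depths = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    // Camera-space depth = third row of the (column-major) view matrix dotted with the position.
    // The constant translation term is omitted because it does not change the ordering.
    depths[i] =
      view[2] * positions[3 * i] +
      view[6] * positions[3 * i + 1] +
      view[10] * positions[3 * i + 2];
  }
  // Simple comparison sort, shown for clarity; a production sorter would use something faster (e.g. a counting sort).
  const order = Uint32Array.from({ length: count }, (_, i) => i)
    .sort((a, b) => depths[a] - depths[b]);
  self.postMessage({ order }, [order.buffer]);
};

// main thread (sketch)
// const worker = new Worker('sort-worker.js');
// worker.onmessage = ({ data }) => uploadIndexBuffer(data.order); // hypothetical upload function
// worker.postMessage({ positions, view: viewMatrix });

The key point is that only a compact index order crosses the worker boundary (as a transferable buffer), so the main thread never blocks on the sorting work itself.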
The GLSL Shader Code
vertexShaderSource
The primary goal of the vertex shader, defined in const vertexShaderSource, is to render a single splat. Unlike traditional rendering where a vertex shader processes one corner of a triangle, this shader processes an entire object (a splat) in one go. For each splat, the shader does the following:
- Receives a unique index for the splat
- Uses the index to fetch the splat’s properties (position, rotation, scale, color, motion) from a large data texture u_texture
- Calculates the splat’s position and appearance at a specific time
- Projects the splat’s 3D Gaussian shape into a 2D ellipse on the screen
- Takes a simple input square (a “billboard”) and transforms its four corners to perfectly match the size, shape, and orientation of the projected 2D ellipse
- Calculates the splat’s color and opacity and passes it to the fragment shader
Vertex Shader Breakdown
Here is a detailed walkthrough of the main function in vertexShaderSource.
1. Initialization and Temporal Filtering
gl_Position = vec4(0.0, 0.0, 2.0, 1.0);
uvec4 motion1 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2) | 3u, uint(index) >> 10), 0);
vec2 trbf = unpackHalf2x16(motion1.w);
float dt = time - trbf.x;
float topacity = exp(-1.0 * pow(dt / trbf.y, 2.0));
if(topacity < 0.02) return;
- gl_Position sets a default position in clip space far outside the view. If any of the early-exit return statements are hit, the splat is effectively discarded
- The data for each splat is packed across 4 texels. texelFetch uses the splat’s index to fetch the motion properties of the current splat from the 4th block (... | 3u) of u_texture, a large data structure which holds the raw data
- unpackHalf2x16(motion1.w) uses a built-in GLSL function to unpack a 32-bit unsigned integer into two 16-bit half-precision floats. Here it extracts the Temporal Radial Basis Function (TRBF) parameters:
  - trbf.x: the center time at which the splat is most “active”
  - trbf.y: the scale or duration of the splat’s activity
- dt = time - trbf.x calculates the time difference between the global animation time and the splat’s center time
- topacity = exp(...) is a Gaussian function applied over time. It calculates the splat’s opacity based on the delta time dt. The opacity is 1.0 when dt is 0 and falls off as time moves away from the splat’s center time (written out as a formula below)
- if(topacity < 0.02) return; is an optimisation: if a splat is temporally invisible, we stop processing it
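Written as a formula (the symbols are mine; this just restates the shader lines above), the temporal opacity is a Gaussian over time with centre \(\mu = \texttt{trbf.x}\) and duration \(s = \texttt{trbf.y}\):

\[
\sigma_{\text{time}}(t) = \exp\!\left(-\left(\frac{t - \mu}{s}\right)^{2}\right)
\]

and the splat is skipped whenever \(\sigma_{\text{time}}(t) < 0.02\).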
2. Data Fetching and Dynamic Position Calculation
uvec4 motion0 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2) | 2u, uint(index) >> 10), 0);
uvec4 static0 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2), uint(index) >> 10), 0);
vec2 m0 = unpackHalf2x16(motion0.x),
m1 = unpackHalf2x16(motion0.y),
m2 = unpackHalf2x16(motion0.z),
m3 = unpackHalf2x16(motion0.w),
m4 = unpackHalf2x16(motion1.x);
vec4 trot = vec4(unpackHalf2x16(motion1.y).xy, unpackHalf2x16(motion1.z).xy) * dt;
vec3 tpos = (vec3(m0.xy, m1.x) * dt + vec3(m1.y, m2.xy) * dt*dt + vec3(m3.xy, m4.x) * dt*dt*dt);
- We use texelFetch again to fetch more of the splat’s properties, specifically static0 and the motion data motion0
- The motion coefficients for position and rotation are unpacked from the fetched data and stored into two-component floating-point vectors
- trot calculates the rotational change as a linear function of time dt
- tpos calculates the positional change as a cubic polynomial of time dt, which is what allows for complex motion paths (position, velocity, acceleration) – see the formula below
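In formula form (again, symbol names are mine), the per-splat motion over \(\Delta t = \texttt{dt}\) is a cubic polynomial for position and a linear term for rotation, with the coefficients coming straight from the unpacked values:

\[
\Delta p(\Delta t) = b_1\,\Delta t + b_2\,\Delta t^{2} + b_3\,\Delta t^{3},
\qquad
\Delta q(\Delta t) = \omega\,\Delta t
\]

where \(b_1 = (\texttt{m0.x}, \texttt{m0.y}, \texttt{m1.x})\), \(b_2 = (\texttt{m1.y}, \texttt{m2.x}, \texttt{m2.y})\), \(b_3 = (\texttt{m3.x}, \texttt{m3.y}, \texttt{m4.x})\), and \(\omega\) is the rotation rate unpacked from motion1.y and motion1.z.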
3. 3D to 2D Projection and Culling
vec4 cam = view * vec4(uintBitsToFloat(static0.xyz) + tpos, 1);
vec4 pos = projection * cam;
float clip = 1.2 * pos.w;
if (pos.z < -clip || pos.x < -clip || pos.x > clip || pos.y < -clip || pos.y > clip) return;
- uintBitsToFloat(static0.xyz) converts the splat’s base position, stored as raw 32-bit integer bits, back into three 32-bit floating-point numbers
- The calculated motion offset tpos is added to the base position to get the final world position at the current time
- cam = view * ...: the world position is transformed into camera space (view space). cam.z now represents the depth from the camera
- pos = projection * ...: the camera-space position is transformed into clip space
- if(...) return is frustum culling – a splat is discarded if its center is outside the camera’s view (with a 20% margin, since clip = 1.2 * pos.w)
4. Covariance Calculation (The Core Logic)
This is the most complex part. It projects the 3D Gaussian into a 2D ellipse on the screen.
- Construct the 3D Covariance: The shader calculates the final rotation quaternion rot and scale vector. From these, it constructs the 3×3 rotation matrix R and scaling matrix S. These are used to compute the 3D covariance matrix Vrk = 4.0 * transpose(M) * M, where M = S * R, which represents the 3D Gaussian ellipsoid
- Project to a 2D Covariance: This is the core mathematical step. It uses the Jacobian J of the perspective projection to project the 3D covariance matrix into a 2D covariance matrix cov2d on the screen. This cov2d mathematically describes the shape and orientation of the final 2D ellipse (written out below)
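For reference, this is the standard EWA-splatting projection used by 3D Gaussian Splatting. With the scale matrix \(S\), rotation matrix \(R\), the rotational part \(W\) of the view matrix and the perspective Jacobian \(J\), the shader’s Vrk and cov2d correspond (up to the 4.0 scale factor and GLSL’s column-major transpose conventions) to:

\[
\Sigma = M^{\top} M, \quad M = S\,R,
\qquad
\Sigma' = J\,W\,\Sigma\,W^{\top} J^{\top}
\]

In its standard form the Jacobian (whose construction the snippet below elides) is the local affine approximation of the perspective projection, evaluated at the splat’s camera-space centre \(t = (t_x, t_y, t_z)\) (the shader’s cam), with \(f_x, f_y\) the focal lengths in pixels:

\[
J = \begin{pmatrix}
f_x / t_z & 0 & -f_x\,t_x / t_z^{2} \\
0 & f_y / t_z & -f_y\,t_y / t_z^{2} \\
0 & 0 & 0
\end{pmatrix}
\]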
uvec4 static1 = texelFetch(u_texture, ivec2(((uint(index) & 0x3ffu) << 2) | 1u, uint(index) >> 10), 0);
vec4 rot = vec4(unpackHalf2x16(static0.w).xy, unpackHalf2x16(static1.x).xy) + trot;
vec3 scale = vec3(unpackHalf2x16(static1.y).xy, unpackHalf2x16(static1.z).x);
rot /= sqrt(dot(rot, rot));
mat3 S = ...; // Scaling matrix from scale
mat3 R = ...; // Rotation matrix from quaternion 'rot'
mat3 M = S * R;
mat3 Vrk = 4.0 * transpose(M) * M; // 3D Covariance Matrix
mat3 J = ...; // Jacobian of the perspective projection
mat3 T = transpose(mat3(view)) * J;
mat3 cov2d = transpose(T) * Vrk * T; // Final 2D Covariance Matrix
- static1 is fetched, containing the rest of the rotation, scale, and color data
- The final rotation quaternion rot and scale vector scale are computed by combining static and dynamic parts. The quaternion is normalized
- A 3D covariance matrix Vrk is constructed from the scale and rotation. This matrix represents the 3D Gaussian ellipsoid in world space
- The Jacobian matrix J of the perspective projection is computed. It describes how 3D camera-space coordinates change with respect to 2D screen coordinates at a given depth cam.z
- The 3D covariance matrix Vrk is projected into a 2D covariance matrix cov2d using the view matrix and the Jacobian. This cov2d matrix mathematically describes the shape and orientation of the final 2D ellipse on the screen
5. Finding the Ellipse Axes
The shader performs an analytical eigen-decomposition of the 2×2 cov2d matrix to find its eigenvalues lambda1 and lambda2 and the eigenvector diagonalVector. This gives the majorAxis and minorAxis of the projected ellipse in pixels.
float mid = (cov2d[0][0] + cov2d[1][1]) / 2.0;
float radius = length(vec2((cov2d[0][0] - cov2d[1][1]) / 2.0, cov2d[0][1]));
float lambda1 = mid + radius, lambda2 = mid - radius;
if(lambda2 < 0.0) return;
vec2 diagonalVector = normalize(vec2(cov2d[0][1], lambda1 - cov2d[0][0]));
vec2 majorAxis = min(sqrt(2.0 * lambda1), 1024.0) * diagonalVector;
vec2 minorAxis = min(sqrt(2.0 * lambda2), 1024.0) * vec2(diagonalVector.y, -diagonalVector.x);
- lambda1 and lambda2 are the eigenvalues, representing the squared lengths of the ellipse’s axes (closed form below)
- diagonalVector is the eigenvector for lambda1, giving the direction of the ellipse’s longest axis
- majorAxis and minorAxis are the final vectors representing the two axes of the ellipse, scaled to the correct size in pixels
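The mid / radius code is just the closed-form eigen-decomposition of a symmetric 2×2 matrix. Writing cov2d as \(\begin{pmatrix} a & b \\ b & c \end{pmatrix}\):

\[
\lambda_{1,2} = \frac{a + c}{2} \pm \sqrt{\left(\frac{a - c}{2}\right)^{2} + b^{2}}
\]

which is exactly mid ± radius in the shader, and the (unnormalised) eigenvector for \(\lambda_1\) is \((b,\; \lambda_1 - a)\), matching diagonalVector.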
6. Final Color and Position
The shader calculates the final color vColor by combining the splat’s base color with the temporal opacity and a depth-based fade. The final vertex position gl_Position is calculated by taking the splat’s 2D center and displacing it by the majorAxis and minorAxis, effectively stretching a simple input quad to match the projected ellipse.
uint rgba = static1.w;
vColor = ...; // Unpack and combine color, depth-fade, and temporal opacity
vec2 vCenter = vec2(pos) / pos.w;
gl_Position = vec4(
vCenter
+ position.x * majorAxis / viewport
+ position.y * minorAxis / viewport, 0.0, 1.0);
vPosition = position;
- The final color vColor is calculated by unpacking the 8-bit RGBA values and multiplying by the temporal opacity topacity and a depth-based fading factor
- The splat’s center vCenter in clip space (pos) is converted to Normalized Device Coordinates (NDC) by dividing by pos.w
- gl_Position: This is the final step. It takes the splat’s center vCenter and displaces it using the input quad corner (position) scaled by the ellipse’s majorAxis and minorAxis. The axes are converted from pixel units back to NDC by dividing by the viewport dimensions. This transforms the original square into the correctly shaped and oriented ellipse on the screen
- The original, untransformed quad corner vPosition is passed to the fragment shader, which uses it to calculate the Gaussian falloff
fragmentShaderSource
This shader is much simpler. Its job is to color the pixels of the quad that the vertex shader has shaped into an ellipse.
#version 300 es
precision highp float;
in vec4 vColor;
in vec2 vPosition;
out vec4 fragColor;
void main () {
float A = -dot(vPosition, vPosition);
if (A < -4.0) discard;
float B = exp(A) * vColor.a;
fragColor = vec4(B * vColor.rgb, B);
}
- It receives the interpolated color vColor and the original quad coordinate vPosition
- It calculates A = -dot(vPosition, vPosition), which is the negative squared distance from the center of the splat
- It calculates the final opacity for the pixel using a Gaussian function: B = exp(A) * vColor.a. This creates the characteristic soft, fuzzy falloff from the center of the splat
- It outputs the final pixel color, modulated by this falloff opacity (summarised as a formula below)
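In effect, each fragment evaluates a 2D Gaussian centred on the splat and writes premultiplied-alpha output; any fragment whose squared distance from the centre exceeds 4 (where exp(A) would fall below roughly 0.018) is discarded early:

\[
B(p) = e^{-\lVert p \rVert^{2}}\,\alpha,
\qquad
\text{fragColor} = \left(B\,r,\; B\,g,\; B\,b,\; B\right)
\]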
Camera Controls
The viewer updates the camera’s view matrix in response to user input (mouse, keyboard, touch, etc.) using a standard 3D graphics technique. Instead of recalculating the matrix from scratch, it incrementally modifies it.
Here is the core logic, which is repeated for all user interactions (a minimal sketch follows the list):
- Invert the View Matrix: The current viewMatrix (which transforms world coordinates to the camera’s view) is inverted to get the camera’s world matrix. This matrix represents the camera’s actual position and orientation in the 3D scene
- Apply Transformation: A translation or rotation is applied to this world matrix based on the user’s input (e.g., a mouse drag or key press)
- Invert Back: The modified world matrix is inverted back into a new viewMatrix
- Render: This new viewMatrix is sent to the GPU shader for the next frame, and the user sees the updated perspective
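As a concrete sketch of that loop: hybrid.js uses its own small inline 4×4 matrix helpers, but the same idea written with the gl-matrix library looks roughly like this (the forward/strafe/yaw inputs and the library choice are illustrative, not what hybrid.js literally does):

import { mat4 } from 'gl-matrix';

// viewMatrix: current world-to-camera matrix (column-major Float32Array(16))
function updateCamera(viewMatrix, { forward = 0, strafe = 0, yaw = 0 }) {
  // 1. Invert the view matrix to get the camera's world (camera-to-world) matrix
  const world = mat4.create();
  mat4.invert(world, viewMatrix);

  // 2. Apply the user-driven transform in the camera's local frame
  mat4.rotateY(world, world, yaw);                     // look left/right
  mat4.translate(world, world, [strafe, 0, -forward]); // move along local axes

  // 3. Invert back to obtain the new view matrix used by the shader next frame
  mat4.invert(viewMatrix, world);
  return viewMatrix;
}

Because the transform is applied to the camera’s world matrix, moving “forward” is along the camera’s own -Z axis rather than a world axis, which is what makes the controls feel natural.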