Up to date

This page is up to date for Godot 4.2. If you still find outdated information, please open an issue.

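Internal rendering architecture

This page is a high-level overview of Godot 4's internal renderer design. It does not apply to previous Godot versions.

The goal of this page is to document the design decisions that best fit Godot's design philosophy, and to give new rendering contributors a starting point.

If you have questions about rendering internals that are not answered here, feel free to ask in the #rendering channel of the Godot Contributors Chat.

Note

If you have difficulty understanding the concepts on this page, it is recommended to go through an OpenGL tutorial such as LearnOpenGL first.

Using modern low-level APIs (Vulkan/Direct3D 12) effectively requires intermediate knowledge of higher-level APIs (OpenGL/Direct3D 11). Thankfully, contributors rarely need to work with low-level APIs directly. Godot's renderers are built entirely on OpenGL and RenderingDevice, the latter being our abstraction over Vulkan/Direct3D 12.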

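Rendering methods

Forward+

This is a forward renderer that uses a clustered approach to lighting.

Clustered lighting uses a compute shader to group lights into a 3D frustum-aligned grid. Then, at render time, each pixel can look up which lights affect the grid cell it is in and only run lighting calculations for the lights that might affect that pixel.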
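As a rough illustration of the grouping and lookup steps described above, the following Python sketch bins lights into a small 3D grid and then queries the cell that contains a pixel. It runs on the CPU, uses evenly spaced depth slices and treats each light as a box in screen space and depth. The function names, the light tuple layout and the grid size are assumptions made for this example and do not reflect Godot's compute shader implementation.

```python
from collections import defaultdict


def cell_range(lo, hi, count):
    """Inclusive range of cell indices overlapped by [lo, hi], with cells spanning [0, 1)."""
    first = max(0, min(count - 1, int(lo * count)))
    last = max(0, min(count - 1, int(hi * count)))
    return range(first, last + 1)


def build_clusters(lights, dims, near, far):
    """Map each grid cell to the indices of the lights whose bounds overlap it.

    Each light is (u, v, depth, radius_uv, radius_depth): a hypothetical layout
    with u and v in [0, 1] screen space and depth in view-space units.
    """
    gx, gy, gz = dims
    clusters = defaultdict(list)
    for index, (u, v, depth, radius_uv, radius_depth) in enumerate(lights):
        z_lo = (depth - radius_depth - near) / (far - near)
        z_hi = (depth + radius_depth - near) / (far - near)
        for x in cell_range(u - radius_uv, u + radius_uv, gx):
            for y in cell_range(v - radius_uv, v + radius_uv, gy):
                for z in cell_range(z_lo, z_hi, gz):
                    clusters[(x, y, z)].append(index)
    return clusters


def lights_for_pixel(clusters, dims, near, far, u, v, depth):
    """Return the light indices stored in the cell that contains the pixel."""
    gx, gy, gz = dims
    cell = (
        min(gx - 1, int(u * gx)),
        min(gy - 1, int(v * gy)),
        min(gz - 1, int((depth - near) / (far - near) * gz)),
    )
    return clusters.get(cell, [])


DIMS = (4, 4, 4)
LIGHTS = [(0.1, 0.1, 5.0, 0.05, 1.0), (0.9, 0.9, 50.0, 0.05, 1.0)]
CLUSTERS = build_clusters(LIGHTS, DIMS, near=0.0, far=100.0)
```

A pixel near the top-left corner at depth 5 only has to evaluate the first light, and a pixel in the middle of the screen evaluates none. Clamping the cell range is conservative: a light may be listed in a cell it barely touches, which costs some time but never drops a light that matters.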

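This approach can greatly improve rendering performance on desktop hardware, but it is somewhat less efficient on mobile.

Forward Mobile

This is a forward renderer that uses a traditional single-pass approach to lighting.

It is designed for mobile platforms, but it can also run on desktop platforms. This rendering method is optimized for mobile GPUs. Mobile GPUs have a very different architecture from desktop GPUs because of constraints around battery usage, heat, and the overall bandwidth available for reading and writing data. Compute shaders also have very limited support, or are not supported at all. As a result, the mobile renderer purely uses raster-based shaders (fragment/vertex).

Unlike desktop GPUs, mobile GPUs perform tile-based rendering. Instead of rendering the whole image as a single unit, the image is divided into smaller tiles that fit into the faster internal memory of the mobile GPU. Each tile is rendered on its own and then written out to the destination texture. The graphics driver does this automatically.

The problem is that this introduces bottlenecks in our traditional approach. For desktop rendering, we render all opaque geometry first, then handle the background, then transparent geometry, and finally post-processing. Each pass needs to read the current result into tile memory, perform its operations and then write it out again. We then have to wait for all tiles to be completed before moving on to the next pass.

The first important change in the mobile renderer is that it does not use the RGBA16F texture format that the desktop renderer uses. Instead, it uses an R10G10B10A2 UNORM texture format. This halves the required bandwidth (32 bits per pixel instead of 64), and brings further gains because mobile hardware is often optimized for 32-bit formats. The tradeoff is that the mobile renderer has limited HDR capabilities, due to the reduced precision and maximum values in the color data.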
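The halving follows directly from the bit counts of the two formats. The short Python sketch below works through the arithmetic for a single color buffer; the 1920×1080 resolution and the helper name are assumptions chosen for illustration only.

```python
# Bits per pixel, derived from the channel layout in each format name.
BITS_PER_PIXEL = {
    "RGBA16F": 4 * 16,                       # four 16-bit float channels
    "R10G10B10A2_UNORM": 10 + 10 + 10 + 2,   # three 10-bit channels plus 2-bit alpha
}


def framebuffer_bytes(width, height, texture_format):
    """Size in bytes of one color buffer with the given format."""
    return width * height * BITS_PER_PIXEL[texture_format] // 8


desktop = framebuffer_bytes(1920, 1080, "RGBA16F")
mobile = framebuffer_bytes(1920, 1080, "R10G10B10A2_UNORM")

# A 10-bit UNORM channel stores one of 1024 evenly spaced values between 0.0 and 1.0.
levels_per_channel = 2 ** 10
```

Every read or write of the full buffer moves half as many bytes in the 32-bit format. The fixed 0.0 to 1.0 range of a UNORM channel is also why values brighter than 1.0 cannot be stored directly.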

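The second important change is the use of sub-passes whenever possible. Sub-passes allow us to perform the rendering steps per tile, which saves the overhead of reading from and writing to the tiles between each rendering pass. The limitation of sub-passes is that neighboring pixels cannot be read, as we are constrained to working within a single tile.

This limitation of sub-passes means that features such as glow and depth of field cannot be implemented efficiently. Similarly, if the screen texture or depth texture needs to be read, we must fully write out the rendering result, which limits our ability to use sub-passes. When such features are enabled, a mix of sub-passes and normal passes is used, which results in a notable performance penalty.

On desktop platforms, the use of sub-passes has no impact on performance. However, this rendering method can still perform better than Clustered Forward in simple scenes, thanks to its lower complexity and lower bandwidth usage. This is especially noticeable on low-end GPUs, integrated graphics and in VR applications.

Given its low-end focus, this rendering method does not provide high-end rendering features such as SDFGI, volumetric fog and fog volumes. Several post-processing effects are also not available.

Compatibility

Note

This is the only rendering method available when using the OpenGL driver. This rendering method is not available when using Vulkan or Direct3D 12.

This is a traditional (non-clustered) forward renderer. It is intended for old GPUs that don't support Vulkan, but it still works very efficiently on newer hardware. Specifically, it is optimized for older and lower-end mobile devices. However, many of its optimizations carry over to older and lower-end desktop devices, so it is a good choice there as well.

Like the Mobile renderer, the Compatibility renderer uses an R10G10B10A2 UNORM texture for 3D rendering. Unlike the Mobile renderer, colors are tonemapped and stored in sRGB format, so there is no HDR support. This removes the need for a separate tonemapping pass and allows a lower bit depth texture to be used without noticeable banding.

The Compatibility renderer uses a traditional single-pass forward approach to draw objects with lighting, but it uses a multi-pass approach for lights with shadows. Specifically, the first pass can draw multiple lights without shadows and up to one DirectionalLight3D with shadows. Each subsequent pass can draw up to one OmniLight3D, one SpotLight3D and one DirectionalLight3D with shadows. Lights with shadows affect the scene differently from lights without shadows, as the lighting is blended in sRGB space instead of linear space. This difference affects how the scene looks and needs to be kept in mind when designing scenes for the Compatibility renderer.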
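To see why blending in sRGB space gives a different result from blending in linear space, the Python sketch below adds two equal light contributions both ways. It uses the standard sRGB transfer functions; the two blend helpers are hypothetical names for this example and are not taken from Godot's shaders.

```python
def srgb_to_linear(value):
    """Standard sRGB decoding for a single channel in [0, 1]."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def linear_to_srgb(value):
    """Standard sRGB encoding for a single channel in [0, 1]."""
    if value <= 0.0031308:
        return value * 12.92
    return 1.055 * value ** (1.0 / 2.4) - 0.055


def blend_in_linear(light_a, light_b):
    """Add two linear light contributions, then encode once for display."""
    return linear_to_srgb(min(1.0, light_a + light_b))


def blend_in_srgb(light_a, light_b):
    """Encode each contribution first, then add the encoded values."""
    return min(1.0, linear_to_srgb(light_a) + linear_to_srgb(light_b))


linear_result = blend_in_linear(0.2, 0.2)
srgb_result = blend_in_srgb(0.2, 0.2)
```

Because the sRGB curve is not linear, adding already encoded values produces a brighter result than adding the light first and encoding afterwards. In this sketch, that is the kind of difference a light accumulated in a later pass would show compared with one accumulated in the first pass.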

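Given its low-end focus, this rendering method does not provide high-end rendering features (even fewer than Forward Mobile). Most post-processing effects are not available.

Why not deferred rendering?

Forward rendering generally provides a better tradeoff between performance and flexibility, especially when a clustered approach to lighting is used. While deferred rendering can be faster in some cases, it is also less flexible and requires special handling to use MSAA. Since games with a non-photorealistic art style can benefit a lot from MSAA, we chose to go with forward rendering for Godot 4 (like Godot 3).

That said, parts of the forward renderer are performed with a deferred approach to allow for some optimizations when possible. This applies to VoxelGI and SDFGI in particular.

A clustered deferred renderer may be developed in the future. This renderer could be used in situations where performance is favored over flexibility.

Rendering drivers

Godot 4 supports the following graphics APIs:

Vulkan

This is the main driver in Godot 4, with most of the development focus going towards this driver.

Vulkan 1.0 is required as a baseline, and optional Vulkan 1.1 and 1.2 features are used when available. volk is used as the Vulkan loader, and Vulkan Memory Allocator is used for memory management.

Both the Forward+ and Mobile rendering methods are supported when using the Vulkan driver.

Vulkan context creation: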

Direct3D 12 context creation:

Direct3D 12

Like Vulkan, the Direct3D 12 driver targets modern platforms only. It is designed for Windows and Xbox (given that Vulkan can't be used directly on Xbox).

Both the Forward+ and Mobile rendering methods are supported when using Direct3D 12.

Core shaders are shared with the Vulkan renderer. Shaders are transpiled from GLSL to HLSL using Mesa NIR (more information). This means you don't need to know HLSL to work on the Direct3D 12 renderer, although knowing the language's basics is recommended to ease debugging.

This driver is still experimental and only available in Godot 4.3 and later. While Direct3D 12 allows supporting Direct3D-exclusive features on Windows 11 such as windowed optimizations and Auto HDR, Vulkan is still recommended for most projects. See the pull request that introduced Direct3D 12 support for more information.

Metal

Godot supports Metal rendering via MoltenVK, as macOS and iOS do not support Vulkan natively. This is done automatically when specifying the Vulkan driver in the Project Settings.

MoltenVK makes driver maintenance easy at the cost of some performance overhead. Also, MoltenVK has several limitations that a native Metal driver implementation wouldn't have. Both the clustered and mobile rendering methods can be used with a Metal backend via MoltenVK.

A native Metal driver is planned in the future for better performance and compatibility.

OpenGL

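This driver uses OpenGL ES 3.0 and targets legacy and low-end devices that don't support Vulkan. On desktop platforms, OpenGL 3.3 Core Profile is used to run this driver, as most desktop graphics drivers don't support OpenGL ES. WebGL 2.0 is used for web exports.

Only the Compatibility rendering method can be used with the OpenGL driver.

Core shaders are entirely different from those of the Vulkan renderer.

As of May 2023, this driver is still in development. Many features are still not implemented, especially in 3D rendering.

Summary of rendering drivers/methods

The following rendering API + rendering method combinations are currently possible: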

  • Vulkan + Forward+

  • Vulkan + Forward Mobile

  • Direct3D 12 + Forward+

  • Direct3D 12 + Forward Mobile

  • Metal + Forward+ (via MoltenVK)

  • Metal + Forward Mobile (via MoltenVK)

  • OpenGL + Compatibility

Each combination has its own limitations and performance characteristics. Make sure to test your changes on all rendering methods if possible before opening a pull request.

RenderingDevice abstraction

Note

The OpenGL driver does not use the RenderingDevice abstraction.

To make the complexity of modern low-level graphics APIs more manageable, Godot uses its own abstraction called RenderingDevice.

This means that when writing code for modern rendering methods, you don't actually use the Vulkan or Direct3D 12 APIs directly. While this is still lower-level than an API like OpenGL, this makes working on the renderer easier, as RenderingDevice will abstract many API-specific quirks for you. The RenderingDevice presents a similar level of abstraction as Metal or WebGPU.

Vulkan RenderingDevice implementation:

Direct3D 12 RenderingDevice implementation:

Core rendering classes architecture

This diagram represents the structure of rendering classes in Godot, including the RenderingDevice abstraction:

[Figure: rendering architecture diagram (rendering_architecture_diagram.webp)]

Core shaders

While shaders in Godot projects are written using a custom language inspired by GLSL, core shaders are written directly in GLSL.

These core shaders are embedded in the editor and export template binaries at compile-time. To see any changes you've made to those GLSL shaders, you need to recompile the editor or export template binary.

Some material features such as height mapping, refraction and proximity fade are not part of core shaders, and are performed in the default BaseMaterial3D using the Godot shader language instead (not GLSL). This is done by procedurally generating the required shader code depending on the features enabled in the material.

By convention, shader files with _inc in their name are included in other GLSL files for better code reuse. Standard GLSL preprocessing is used to achieve this.

Warning

Core material shaders will be used by every material in the scene – both with the default BaseMaterial3D and custom shaders. As a result, these shaders must be kept as simple as possible to avoid performance issues and ensure shader compilation doesn't become too slow.

If you use if branching in a shader, performance may decrease as VGPR usage will increase in the shader. This happens even if all pixels evaluate to true or false in a given frame.

If you use #if preprocessor branching, the number of required shader versions will increase in the scene. In a worst-case scenario, adding a single boolean #define can double the number of shader versions that may need to be compiled in a given scene. In some cases, Vulkan specialization constants can be used as a faster (but more limited) alternative.

This means there is a high barrier to adding new built-in material features in Godot, both in the core shaders and BaseMaterial3D. While BaseMaterial3D can make use of dynamic code generation to only include the shader code if the feature is enabled, it'll still require generating more shader versions when these features are used in a project. This can make shader compilation stutter more noticeable in complex 3D scenes.

See The Shader Permutation Problem and Branching on a GPU blog posts for more information.
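To make the permutation growth described in the warning concrete, the Python sketch below enumerates every on/off combination of a set of boolean defines. The define names are hypothetical and only serve as placeholders. This is worst-case counting, not a description of how Godot decides which versions to compile.

```python
from itertools import product


def shader_variants(defines):
    """Return every on/off combination of the given boolean #define names."""
    return [
        dict(zip(defines, values))
        for values in product((False, True), repeat=len(defines))
    ]


# Hypothetical define names, used only for counting.
base = shader_variants(["USE_FEATURE_A", "USE_FEATURE_B"])
extended = shader_variants(["USE_FEATURE_A", "USE_FEATURE_B", "USE_FEATURE_C"])
```

With two defines there are 4 possible versions, and adding a third raises this to 8. Each additional boolean define doubles the worst-case count, which is why new built-in features are added sparingly.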

Core GLSL material shaders:

Material shader generation:

Other GLSL shaders for the Forward+ and Forward Mobile rendering methods:

Other GLSL shaders for the Compatibility rendering method:

2D and 3D rendering separation

Note

The following is only applicable in the Forward+ and Forward Mobile rendering methods, not in Compatibility. Multiple Viewports can be used to emulate this when using the Compatibility backend, or to perform 2D resolution scaling.

2D and 3D are rendered to separate buffers, as 2D rendering in Godot is performed in LDR sRGB-space while 3D rendering uses HDR linear space.

The color format used for 2D rendering is RGB8 (RGBA8 if the Transparent property on the Viewport is enabled). 3D rendering uses a 24-bit unsigned normalized integer depth buffer, or 32-bit signed floating-point if a 24-bit depth buffer is not supported by the hardware. 2D rendering does not use a depth buffer.

3D resolution scaling is performed differently depending on whether bilinear or FSR 1.0 scaling is used. When bilinear scaling is used, no special upscaling shader is run. Instead, the viewport's texture is stretched and displayed with a linear sampler (which makes the filtering happen directly on the hardware). This allows maximizing the performance of bilinear 3D scaling.

The configure() function in RenderSceneBuffersRD reallocates the 2D/3D buffers when the resolution or scaling changes.

Dynamic resolution scaling isn't supported yet, but is planned in a future Godot release.

2D and 3D rendering buffer configuration C++ code:

FSR 1.0:

2D rendering techniques

2D light rendering is performed in a single pass to allow for better performance with large amounts of lights.

The Forward+ and Mobile rendering methods don't feature 2D batching yet, but it's planned for a future release.

The Compatibility backend features 2D batching to improve performance, which is especially noticeable with lots of text on screen.

MSAA can be enabled in 2D to provide "automatic" line and polygon antialiasing, but FXAA does not affect 2D rendering as it's calculated before 2D rendering begins. Godot's 2D drawing methods such as the Line2D node or some CanvasItem draw_*() methods provide their own way of antialiasing based on triangle strips and vertex colors, which don't require MSAA to work.

A 2D signed distance field representing LightOccluder2D nodes in the viewport is automatically generated if a user shader requests it. This can be used for various effects in custom shaders, such as 2D global illumination. It is also used to calculate particle collisions in 2D.

2D SDF generation GLSL shader:

3D rendering techniques

Batching and instancing

In the Forward+ backend, Vulkan instancing is used to group rendering of identical objects for performance. This is not as fast as static mesh merging, but it still allows instances to be culled individually.

Light, decal and reflection probe rendering

Note

Reflection probe and decal rendering are currently not available in the Compatibility backend.

As its name implies, the Forward+ backend uses clustered lighting. This allows using as many lights as you want; performance largely depends on screen coverage. Shadow-less lights can be almost free if they don't occupy much space on screen.

All rendering methods also support rendering up to 8 directional lights at the same time (albeit with lower shadow quality when more than one light has shadows enabled).

The Forward Mobile backend uses a single-pass lighting approach, with a limitation of 8 OmniLights + 8 SpotLights affecting each Mesh resource (plus a limitation of 256 OmniLights + 256 SpotLights in the camera view). These limits are hardcoded and can't be adjusted in the project settings.

The Compatibility backend uses a hybrid single-pass + multi-pass lighting approach. Lights without shadows are rendered in a single pass. Lights with shadows are rendered in multiple passes. This is required for performance reasons on mobile devices. As a result, performance does not scale well with many shadow-casting lights. It is recommended to only have a handful of lights with shadows in the camera frustum at a time and for those lights to be spread apart so that each object is only touched by 1 or 2 shadowed lights at a time. The maximum number of lights visible at once can be adjusted in the project settings.

In all 3 methods, lights without shadows are much cheaper than lights with shadows. To improve performance, lights are only updated when the light is modified or when objects in its radius are modified. Godot currently doesn't separate static shadow rendering from dynamic shadow rendering, but this is planned in a future release.

Clustering is also used for reflection probes and decal rendering in the Forward+ backend.

Shadow mapping

Both Forward+ and Forward Mobile methods use PCF to filter shadow maps and create a soft penumbra. Instead of using a fixed PCF pattern, these methods use a Vogel disk pattern, which allows for changing the number of samples and smoothly changing the quality.

Godot also supports percentage-closer soft shadows (PCSS) for more realistic shadow penumbra rendering. PCSS shadows are limited to the Forward+ backend as they're too demanding to be usable in the Forward Mobile backend. PCSS also uses a Vogel disk shaped kernel.

Additionally, both shadow-mapping techniques rotate the kernel on a per-pixel basis to help soften under-sampling artifacts.
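For readers unfamiliar with the pattern, the Python sketch below generates Vogel disk sample offsets using the usual golden-angle construction. The `rotation` parameter stands in for the per-pixel kernel rotation mentioned above. The function name and parameters are assumptions for illustration, not the code used in Godot's shaders.

```python
import math

# The golden angle in radians, which spreads successive samples evenly around the disk.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def vogel_disk(sample_count, rotation=0.0):
    """Return sample_count (x, y) offsets distributed inside the unit disk.

    rotation is an angle in radians added to every sample. A renderer could
    derive it from per-pixel noise to trade banding for less visible noise.
    """
    samples = []
    for i in range(sample_count):
        radius = math.sqrt((i + 0.5) / sample_count)
        theta = i * GOLDEN_ANGLE + rotation
        samples.append((radius * math.cos(theta), radius * math.sin(theta)))
    return samples


kernel = vogel_disk(16)
rotated_kernel = vogel_disk(16, rotation=1.0)
```

Because the construction works for any sample count, quality can be changed by picking a different count rather than switching between hand-authored kernels. Rotating the kernel changes each sample's direction but not its distance from the center.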

The Compatibility backend doesn't support shadow mapping for any light types yet.

Temporal antialiasing

Note

Only available in the Forward+ backend, not the Forward Mobile or Compatibility methods.

Godot uses a custom TAA implementation based on the old TAA implementation from Spartan Engine.

Temporal antialiasing requires motion vectors to work. If motion vectors are not correctly generated, ghosting will occur when the camera or objects move.

Motion vectors are generated on the GPU in the main material shader. This is done by running the vertex shader corresponding to the previous rendered frame (with the previous camera transform) in addition to the vertex shader for the current rendered frame, then storing the difference between them in a color buffer.
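The idea of projecting the same point with the current and the previous transform, then storing the difference, can be sketched in Python as follows. This is a heavily simplified CPU version with a camera that only translates and looks down the positive Z axis. The `project` helper and its parameters are hypothetical and are not part of Godot.

```python
def project(point, camera_position, focal_length=1.0):
    """Perspective-project a world-space point for a camera looking down +Z."""
    x = point[0] - camera_position[0]
    y = point[1] - camera_position[1]
    z = point[2] - camera_position[2]
    return (focal_length * x / z, focal_length * y / z)


def motion_vector(point_now, point_before, camera_now, camera_before):
    """Screen-space difference between the current and previous projections."""
    current = project(point_now, camera_now)
    previous = project(point_before, camera_before)
    return (current[0] - previous[0], current[1] - previous[1])


# Static object and static camera: no motion.
still = motion_vector((0, 0, 10), (0, 0, 10), (0, 0, 0), (0, 0, 0))

# Camera moved one unit to the right: the object appears to move left on screen.
panned = motion_vector((0, 0, 10), (0, 0, 10), (1, 0, 0), (0, 0, 0))
```

If either the previous object position or the previous camera transform is missing or wrong, the resulting vector no longer matches the on-screen movement, which is what leads to ghosting.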

Alternatively, FSR 2.2 can be used as an upscaling solution that also provides its own temporal antialiasing algorithm. FSR 2.2 is implemented on top of the RenderingDevice abstraction as opposed to using AMD's reference code directly.

TAA resolve:

FSR 2.2:

Global illumination

Note

VoxelGI and SDFGI are only available in the Forward+ backend, not the Forward Mobile or Compatibility methods.

LightmapGI baking is only available in the Forward+ and Forward Mobile methods, and can only be performed within the editor (not in an exported project). LightmapGI rendering will eventually be supported by the Compatibility backend.

Godot supports voxel-based GI (VoxelGI), signed distance field GI (SDFGI) and lightmap baking and rendering (LightmapGI). These techniques can be used simultaneously if desired.

Lightmap baking happens on the GPU using Vulkan compute shaders. The GPU-based lightmapper is implemented in the LightmapperRD class, which inherits from the Lightmapper class. This allows for implementing additional lightmappers, paving the way for a future port of the CPU-based lightmapper present in Godot 3.x. This would allow baking lightmaps while using the Compatibility backend.

Core GI C++ code:

Core GI GLSL shaders:

Lightmapper C++ code:

Lightmapper GLSL shaders:

Depth of field

Note

Only available in the Forward+ and Forward Mobile methods, not the Compatibility backend.

The Forward+ and Forward Mobile methods use different approaches to DOF rendering, with different visual results. This is done to best match the performance characteristics of the target hardware. In Clustered Forward, DOF is performed using a compute shader. In Forward Mobile, DOF is performed using a fragment shader (raster).

Box, hexagon and circle bokeh shapes are available (from fastest to slowest). Depth of field can optionally be jittered every frame to improve its appearance when temporal antialiasing is enabled.

Depth of field C++ code:

Depth of field GLSL shader (compute - used for Forward+):

Depth of field GLSL shader (raster - used for Forward Mobile):

Screen-space effects (SSAO, SSIL, SSR, SSS)

Note

Only available in the Forward+ backend, not the Forward Mobile or Compatibility methods.

The Forward+ backend supports screen-space ambient occlusion, screen-space indirect lighting, screen-space reflections and subsurface scattering.

SSAO uses an implementation derived from Intel's ASSAO (converted to Vulkan). SSIL is derived from SSAO to provide high-performance indirect lighting.

When both SSAO and SSIL are enabled, parts of SSAO and SSIL are shared to reduce the performance impact.

SSAO and SSIL are performed at half resolution by default to improve performance. SSR is always performed at half resolution to improve performance.

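Screen-space effects C++ code:

Screen-space ambient occlusion GLSL shader:

Screen-space indirect lighting GLSL shader:

Screen-space reflections GLSL shader:

Subsurface scattering GLSL:

Sky rendering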

Godot supports using shaders to render the sky background. The radiance map (which is used to provide ambient light and reflections for PBR materials) is automatically updated based on the sky shader.

The SkyMaterial resources such as ProceduralSkyMaterial, PhysicalSkyMaterial and PanoramaSkyMaterial generate a built-in shader for sky rendering. This is similar to what BaseMaterial3D provides for 3D scene materials.

A detailed technical implementation can be found in the Custom sky shaders in Godot 4.0 article.

Sky rendering C++ code:

Sky rendering GLSL shader:

Volumetric fog

Note

Only available in the Forward+ backend, not the Forward Mobile or Compatibility methods.

See also

Fog shaders

Godot supports a frustum-aligned voxel (froxel) approach to volumetric fog rendering. As opposed to a post-processing filter, this approach is more general-purpose as it can work with any light type. Fog can also use shaders for custom behavior, which allows animating the fog or using a 3D texture to represent density.

The FogMaterial resource generates a built-in shader for FogVolume nodes. This is similar to what BaseMaterial3D provides for 3D scene materials.

A detailed technical explanation can be found in the Fog Volumes arrive in Godot 4.0 article.

Volumetric fog C++ code:

Volumetric fog GLSL shaders:

Occlusion culling

While modern GPUs can handle drawing a lot of triangles, the number of draw calls in complex scenes can still be a bottleneck (even with Vulkan and Direct3D 12).

Godot 4 supports occlusion culling to reduce overdraw (when the depth prepass is disabled) and reduce vertex throughput. This is done by rasterizing a low-resolution buffer on the CPU using Embree. The buffer's resolution depends on the number of CPU threads on the system, as this is done in parallel. This buffer includes occluder shapes that were baked in the editor or created at run-time.

As complex occluders can introduce a lot of strain on the CPU, baked occluders can be simplified automatically when generated in the editor.

Godot's occlusion culling doesn't support dynamic occluders yet, but OccluderInstance3D nodes can still have their visibility toggled or be moved. However, this will be slow when updating complex occluders this way. Therefore, updating occluders at run-time is best done only on simple occluder shapes such as quads or cuboids.

This CPU-based approach has a few advantages over other solutions, such as portals and rooms or a GPU-based culling solution:

  • No manual setup required (but can be tweaked manually for best performance).

  • No frame delay. Such a delay would be problematic in cutscenes during camera cuts or when the camera moves fast behind a wall.

  • Works the same on all rendering drivers and methods, with no unpredictable behavior depending on the driver or GPU hardware.

Occlusion culling is performed by registering occluder meshes, which is done using OccluderInstance3D nodes (which themselves use Occluder3D resources). RenderingServer then performs occlusion culling by calling Embree in RendererSceneOcclusionCull.
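The general idea of testing objects against a low-resolution depth buffer built on the CPU can be sketched in Python as below. This toy version fills axis-aligned screen rectangles instead of rasterizing occluder meshes, and it does not use Embree. All names, the rectangle layout and the buffer size are assumptions made for this example.

```python
import math


def make_buffer(width, height):
    """Create a depth buffer where every pixel starts infinitely far away."""
    return [[math.inf] * width for _ in range(height)]


def rasterize_occluder(buffer, rect, depth):
    """Write an occluder covering rect = (x0, y0, x1, y1) at a constant depth."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            buffer[y][x] = min(buffer[y][x], depth)


def is_visible(buffer, rect, nearest_depth):
    """An object is kept if any covered pixel is not hidden by a closer occluder."""
    x0, y0, x1, y1 = rect
    return any(
        nearest_depth <= buffer[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
    )


depth_buffer = make_buffer(8, 8)
rasterize_occluder(depth_buffer, (0, 0, 4, 8), depth=5.0)  # a wall on the left half
```

An object fully behind the wall is culled, while an object in front of it, or one that sticks out past its edge, is kept. The test is conservative: using the object's nearest depth means an object is only culled when every pixel it covers is hidden.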

Occlusion culling C++ code:

Visibility range (LOD)

Godot supports manually authored hierarchical level of detail (HLOD), with distances specified by the user in the inspector.

In RenderingSceneCull, the _scene_cull() and _render_scene() functions are where most of the LOD determination happens. Each viewport can render the same mesh with different LODs (to allow for split screen rendering to look correct).

Visibility range C++ code:

Automatic mesh LOD

The ImporterMesh class is used for the 3D mesh import workflow in the editor. Its generate_lods() function handles generating LODs using the meshoptimizer library.

LOD mesh generation also generates shadow meshes at the same time. These are meshes that have their vertices welded regardless of smoothing and materials. This is used to improve shadow rendering performance by lowering the vertex throughput required to render shadows.
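As a minimal sketch of what "welding vertices regardless of smoothing" means, the Python function below merges vertices that share a position even when their normals differ. The vertex layout and function name are assumptions for this example. Godot's actual shadow mesh generation is implemented in C++ and may differ in detail.

```python
def weld_by_position(vertices, indices):
    """Merge vertices that share a position, ignoring normals.

    vertices is a list of (position, normal) tuples and indices is the
    triangle index list. Returns the welded positions and remapped indices.
    """
    remap = {}
    welded_positions = []
    new_indices = []
    for index in indices:
        position = vertices[index][0]
        if position not in remap:
            remap[position] = len(welded_positions)
            welded_positions.append(position)
        new_indices.append(remap[position])
    return welded_positions, new_indices


# Two triangles sharing an edge, with split vertices because the normals differ.
up, tilted = (0.0, 1.0, 0.0), (0.0, 0.7, 0.7)
quad_vertices = [
    ((0, 0, 0), up), ((1, 0, 0), up), ((1, 0, 1), up),
    ((0, 0, 0), tilted), ((1, 0, 1), tilted), ((0, 0, 1), tilted),
]
quad_indices = [0, 1, 2, 3, 4, 5]
shadow_positions, shadow_indices = weld_by_position(quad_vertices, quad_indices)
```

The six split vertices collapse to four, so fewer vertices have to be processed when rendering into the shadow map. This works because shadow rendering only needs positions, not normals or materials.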

The RenderingSceneCull class's _render_scene() function determines which mesh LOD should be used when rendering. Each viewport can render the same mesh with different LODs (to allow for split screen rendering to look correct).

The mesh LOD is automatically chosen based on a screen coverage metric. This takes resolution and camera FOV changes into account without requiring user intervention. The threshold multiplier can be adjusted in the project settings.

To improve performance, shadow rendering and reflection probe rendering also choose their own mesh LOD thresholds (which can be different from the main scene rendering).
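One way to picture a screen-coverage metric is to estimate how many pixels an object's bounding sphere covers and compare that against per-LOD thresholds. The Python sketch below does this with a hypothetical formula and hypothetical threshold values. It is not Godot's actual metric, only an illustration of why resolution and FOV changes are handled without user intervention.

```python
import math


def projected_radius_px(radius, distance, fov_y_degrees, viewport_height):
    """Approximate on-screen radius in pixels of a bounding sphere."""
    half_fov = math.radians(fov_y_degrees) / 2.0
    return radius / (distance * math.tan(half_fov)) * (viewport_height / 2.0)


def choose_lod(radius_px, thresholds_px, multiplier=1.0):
    """Pick the first LOD whose pixel threshold is met (LOD 0 is the most detailed).

    thresholds_px must be sorted from largest to smallest. multiplier plays the
    role of a user-adjustable bias: values above 1.0 switch to coarser LODs sooner.
    """
    for lod, threshold in enumerate(thresholds_px):
        if radius_px >= threshold * multiplier:
            return lod
    return len(thresholds_px)


THRESHOLDS = [100.0, 50.0, 10.0]  # hypothetical values
near_1080p = projected_radius_px(1.0, 10.0, 90.0, 1080)
near_2160p = projected_radius_px(1.0, 10.0, 90.0, 2160)
far_1080p = projected_radius_px(1.0, 100.0, 90.0, 1080)
```

The same object at the same distance covers twice as many pixels at 2160 pixels of height as at 1080, so it selects a more detailed LOD. Narrowing the FOV has a similar effect, since the object then fills more of the screen.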

Mesh LOD generation on import C++ code:

Mesh LOD determination C++ code: