Utiliser les shaders de calcul

Ce tutoriel vous accompagnera dans le processus de création d'un shader de calcul basique. Mais d'abord, un peu de contexte sur les shaders de calcul et comment ils marchent avec Godot.

Note

This tutorial assumes you are familiar with shaders generally. If you are new to shaders please read Introduction aux shaders and your first shader before proceeding with this tutorial.

Un shader de calcul est un type spécial de programme de shader qui est orienté vers la programmation générale. En d'autres termes, ils sont plus flexibles que les shaders de sommet et de fragment car ils n'ont pas un but fixe (c.-à-d. transformer des sommets ou écrire des couleurs dans une image). Contrairement aux shaders de fragments et de sommets, les shaders de calcul font très peu de choses en arrière-plan. Le code que vous écrivez est ce que le GPU exécute et c'est à peu près tout. Cela peut en faire un outil très utile pour décharger les calculs lourds au GPU.

Now let's get started by creating a short compute shader.

D'abord, dans l'éditeur de texte externe de votre choix, créez un nouveau fichier appelé compute_example.glsl dans votre dossier de projet. Lorsque vous écrivez des shaders de calcul dans Godot, vous les écrivez directement en GLSL. La langue des shaders de Godot est basée sur le GLSL. Si vous êtes familier avec les shaders normaux de Godot, la syntaxe ci-dessous sera quelque peu familière.

Note

Les shaders de calcul ne peuvent être utilisés que dans les moteurs de rendus basés sur RenderingDevice (Forward+ ou Mobile). Pour suivre ce tutoriel, assurez-vous que vous utilisez le moteur de rendu Forward+ ou Mobile. Le réglage pour le choisir est situé dans le coin supérieur droit de l'éditeur.

Note that compute shader support is generally poor on mobile devices (due to driver bugs), even if they are technically supported.

Jetons un coup d'œil à ce code de calcul de shader  :

#[compute]
#version 450

// Invocations in the (x, y, z) dimension
layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;

// A binding to the buffer we create in our script
layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
    float data[];
}
my_data_buffer;

// The code we want to execute in each invocation
void main() {
    // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
    my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
}

This code takes an array of floats, multiplies each element by 2 and store the results back in the buffer array. Now let's look at it line-by-line.

#[compute]
#version 450

Ces deux lignes communiquent deux choses :

The following code is a compute shader. This is a Godot-specific hint that is needed for the editor to properly import the shader file.

The code is using GLSL version 450.

Vous ne devriez jamais avoir à changer ces deux lignes pour vos shaders de calcul personnalisées.

// Invocations in the (x, y, z) dimension
layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;

Next, we communicate the number of invocations to be used in each workgroup. Invocations are instances of the shader that are running within the same workgroup. When we launch a compute shader from the CPU, we tell it how many workgroups to run. Workgroups run in parallel to each other. While running one workgroup, you cannot access information in another workgroup. However, invocations in the same workgroup can have some limited access to other invocations.

Think about workgroups and invocations as a giant nested for loop.

for (int x = 0; x < workgroup_size_x; x++) {
  for (int y = 0; y < workgroup_size_y; y++) {
     for (int z = 0; z < workgroup_size_z; z++) {
        // Each workgroup runs independently and in parallel.
        for (int local_x = 0; local_x < invocation_size_x; local_x++) {
           for (int local_y = 0; local_y < invocation_size_y; local_y++) {
              for (int local_z = 0; local_z < invocation_size_z; local_z++) {
                 // Compute shader runs here.
              }
           }
        }
     }
  }
}

Workgroups and invocations are an advanced topic. For now, remember that we will be running two invocations per workgroup.

// A binding to the buffer we create in our script
layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
    float data[];
}
my_data_buffer;

Here we provide information about the memory that the compute shader will have access to. The layout property allows us to tell the shader where to look for the buffer, we will need to match these set and binding positions from the CPU side later.

The restrict keyword tells the shader that this buffer is only going to be accessed from one place in this shader. In other words, we won't bind this buffer in another set or binding index. This is important as it allows the shader compiler to optimize the shader code. Always use restrict when you can.

This is an unsized buffer, which means it can be any size. So we need to be careful not to read from an index larger than the size of the buffer.

// The code we want to execute in each invocation
void main() {
    // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
    my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
}

Finally, we write the main function which is where all the logic happens. We access a position in the storage buffer using the gl_GlobalInvocationID built-in variables. gl_GlobalInvocationID gives you the global unique ID for the current invocation.

To continue, write the code above into your newly created compute_example.glsl file.

Create a local RenderingDevice

To interact with and execute a compute shader, we need a script. Create a new script in the language of your choice and attach it to any Node in your scene.

Now to execute our shader we need a local RenderingDevice which can be created using the RenderingServer:

# Create a local rendering device.
var rd := RenderingServer.create_local_rendering_device()

// Create a local rendering device.
var rd = RenderingServer.CreateLocalRenderingDevice();

After that, we can load the newly created shader file compute_example.glsl and create a precompiled version of it using this:

# Load GLSL shader
var shader_file := load("res://compute_example.glsl")
var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()
var shader := rd.shader_create_from_spirv(shader_spirv)

// Load GLSL shader
var shaderFile = GD.Load<RDShaderFile>("res://compute_example.glsl");
var shaderBytecode = shaderFile.GetSpirV();
var shader = rd.ShaderCreateFromSpirV(shaderBytecode);

Avertissement

Local RenderingDevices cannot be debugged using tools such as RenderDoc.

Procure des entrées de données

As you might remember, we want to pass an input array to our shader, multiply each element by 2 and get the results.

We need to create a buffer to pass values to a compute shader. We are dealing with an array of floats, so we will use a storage buffer for this example. A storage buffer takes an array of bytes and allows the CPU to transfer data to and from the GPU.

So let's initialize an array of floats and create a storage buffer:

# Prepare our data. We use floats in the shader, so we need 32 bit.
var input := PackedFloat32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
var input_bytes := input.to_byte_array()

# Create a storage buffer that can hold our float values.
# Each float has 4 bytes (32 bit) so 10 x 4 = 40 bytes
var buffer := rd.storage_buffer_create(input_bytes.size(), input_bytes)

// Prepare our data. We use floats in the shader, so we need 32 bit.
float[] input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var inputBytes = new byte[input.Length * sizeof(float)];
Buffer.BlockCopy(input, 0, inputBytes, 0, inputBytes.Length);

// Create a storage buffer that can hold our float values.
// Each float has 4 bytes (32 bit) so 10 x 4 = 40 bytes
var buffer = rd.StorageBufferCreate((uint)inputBytes.Length, inputBytes);

With the buffer in place we need to tell the rendering device to use this buffer. To do that we will need to create a uniform (like in normal shaders) and assign it to a uniform set which we can pass to our shader later.

# Create a uniform to assign the buffer to the rendering device
var uniform := RDUniform.new()
uniform.uniform_type = RenderingDevice.UNIFORM_TYPE_STORAGE_BUFFER
uniform.binding = 0 # this needs to match the "binding" in our shader file
uniform.add_id(buffer)
var uniform_set := rd.uniform_set_create([uniform], shader, 0) # the last parameter (the 0) needs to match the "set" in our shader file

// Create a uniform to assign the buffer to the rendering device
var uniform = new RDUniform
{
    UniformType = RenderingDevice.UniformType.StorageBuffer,
    Binding = 0
};
uniform.AddId(buffer);
var uniformSet = rd.UniformSetCreate([uniform], shader, 0);

Définir une pipeline de calcul

The next step is to create a set of instructions our GPU can execute. We need a pipeline and a compute list for that.

The steps we need to do to compute our result are:

Create a new pipeline.
Begin a list of instructions for our GPU to execute.
Bind our compute list to our pipeline
Bind our buffer uniform to our pipeline
Specify how many workgroups to use
End the list of instructions

# Create a compute pipeline
var pipeline := rd.compute_pipeline_create(shader)
var compute_list := rd.compute_list_begin()
rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
rd.compute_list_dispatch(compute_list, 5, 1, 1)
rd.compute_list_end()

// Create a compute pipeline
var pipeline = rd.ComputePipelineCreate(shader);
var computeList = rd.ComputeListBegin();
rd.ComputeListBindComputePipeline(computeList, pipeline);
rd.ComputeListBindUniformSet(computeList, uniformSet, 0);
rd.ComputeListDispatch(computeList, xGroups: 5, yGroups: 1, zGroups: 1);
rd.ComputeListEnd();

Notez que nous expédions le shader de calcul avec 5 groupes de travail dans l'axe X, et un dans les autres. Comme nous avons 2 invocations locales dans l'axe X (précisé dans notre shader), 10 invocations de shader de calcul seront lancées en tout. Si vous lisez ou écrivez à des indices en dehors de la plage de votre tampon, vous pouvez accéder à la mémoire en dehors du contrôle de votre shader ou des parties d'autres variables, ce qui peut causer des problèmes sur certains matériels.

Exécuter un shader de calcul

After all of this we are almost done, but we still need to execute our pipeline. So far we have only recorded what we would like the GPU to do; we have not actually run the shader program.

To execute our compute shader we need to submit the pipeline to the GPU and wait for the execution to finish:

# Submit to GPU and wait for sync
rd.submit()
rd.sync()

// Submit to GPU and wait for sync
rd.Submit();
rd.Sync();

Ideally, you would not call sync() to synchronize the RenderingDevice right away as it will cause the CPU to wait for the GPU to finish working. In our example, we synchronize right away because we want our data available for reading right away. In general, you will want to wait at least 2 or 3 frames before synchronizing so that the GPU is able to run in parallel with the CPU.

Avertissement

Long computations can cause Windows graphics drivers to "crash" due to TDR being triggered by Windows. This is a mechanism that reinitializes the graphics driver after a certain amount of time has passed without any activity from the graphics driver (usually 5 to 10 seconds).

Depending on the duration your compute shader takes to execute, you may need to split it into multiple dispatches to reduce the time each dispatch takes and reduce the chances of triggering a TDR. Given TDR is time-dependent, slower GPUs may be more prone to TDRs when running a given compute shader compared to a faster GPU.

Récupérer les résultats

You may have noticed that, in the example shader, we modified the contents of the storage buffer. In other words, the shader read from our array and stored the data in the same array again so our results are already there. Let's retrieve the data and print the results to our console.

# Read back the data from the buffer
var output_bytes := rd.buffer_get_data(buffer)
var output := output_bytes.to_float32_array()
print("Input: ", input)
print("Output: ", output)

// Read back the data from the buffers
var outputBytes = rd.BufferGetData(buffer);
var output = new float[input.Length];
Buffer.BlockCopy(outputBytes, 0, output, 0, outputBytes.Length);
GD.Print("Input: ", string.Join(", ", input));
GD.Print("Output: ", string.Join(", ", output));

Freeing memory

The buffer, pipeline, and uniform_set variables we've been using are each an RID. Because RenderingDevice is meant to be a lower-level API, RIDs aren't freed automatically. This means that once you're done using buffer or any other RID, you are responsible for freeing its memory manually using the RenderingDevice's free_rid() method.

Avec ça, vous avez tout ce dont vous avez besoin pour commencer à travailler avec des shaders de calcul.

Voir aussi

The demo projects repository contains a Compute Shader Heightmap demo This project performs heightmap image generation on the CPU and GPU separately, which lets you compare how a similar algorithm can be implemented in two different ways (with the GPU implementation being faster in most cases).