CPU optimization

Measuring performance

We have to know where the "bottlenecks" are to know how to speed up our program. Bottlenecks are the slowest parts of the program, which limit the rate at which everything else can progress. Focusing on bottlenecks lets us concentrate our efforts on the areas that will give the greatest speed improvement, instead of spending a lot of time optimizing functions that yield only small gains.

For the CPU, the easiest way to identify bottlenecks is to use a profiler.

CPU profilers

Profilers run alongside your program and take timing measurements to work out what proportion of time is spent in each function.

The Godot IDE conveniently has a built-in profiler. It does not run every time you start your project: it must be manually started and stopped. This is because, like most profilers, recording these timing measurements can slow down your project significantly.

After profiling, you can look back at the results for a frame.

Screenshot of the Godot profiler

Results of a profile of one of the demo projects.

Note

We can see the cost of built-in processes such as physics and audio, as well as the cost of our own scripting functions at the bottom.

Time spent waiting for various built-in servers may not be counted in the profilers. This is a known bug.

When a project is running slowly, you will often see an obvious function or process taking a lot more time than the others. This is your primary bottleneck, and you can usually increase speed by optimizing this area.

For more info about using Godot's built-in profiler, see Debugger panel.

External profilers

Although the Godot IDE profiler is very convenient and useful, sometimes you need more power, and the ability to profile the Godot engine source code itself.

You can use a number of third-party profilers to do this, including Valgrind, VerySleepy, HotSpot, Visual Studio and Intel VTune.

Note

You will need to compile Godot from source to use a third-party profiler. This is required to obtain debugging symbols. You can also use a debug build. However, note that the results of profiling a debug build will differ from those of a release build, because debug builds are less optimized. Bottlenecks are often in a different place in debug builds, so you should profile release builds whenever possible.

Screenshot of Callgrind

Example results from Callgrind, which is part of Valgrind.

From left to right, Callgrind lists the percentage of time spent within a function and its children (Inclusive), the percentage of time spent within the function itself, excluding child functions (Self), the number of times the function is called, the function name, and the file or module.

In this example, we can see nearly all time is spent under the Main::iteration() function. This is the master function in the Godot source code that is called repeatedly. It causes frames to be drawn, physics ticks to be simulated, and nodes and scripts to be updated. A large proportion of the time is spent in the functions to render a canvas (66%), because this example uses a 2D benchmark. Below this, we see that almost 50% of the time is spent outside Godot code in libglapi and i965_dri (the graphics driver). This tells us that a large proportion of CPU time is being spent in the graphics driver.

This is actually an excellent example because, in an ideal world, only a very small proportion of time would be spent in the graphics driver. This is an indication that there is a problem with too much communication and work being done in the graphics API. This specific profiling led to the development of 2D batching, which greatly speeds up 2D rendering by reducing bottlenecks in this area.

Manually timing functions

Another handy technique, especially once you have identified the bottleneck using a profiler, is to manually time the function or area under test. The specifics vary depending on the language, but in GDScript, you would do the following:

var time_start = OS.get_ticks_usec()

# Your function you want to time
update_enemies()

var time_end = OS.get_ticks_usec()
# The parentheses are required: "%" binds more tightly than "-".
print("update_enemies() took %d microseconds" % (time_end - time_start))

When manually timing functions, it is usually a good idea to run the function many times (1,000 or more times), instead of just once (unless it is a very slow function). The reason for doing this is that timers often have limited accuracy. Moreover, CPUs will schedule processes in a haphazard manner. Therefore, an average over a series of runs is more accurate than a single measurement.
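As a minimal sketch of this averaging approach (assuming Godot 3.x, where OS.get_ticks_usec() is available as in the snippet above), the function can be called in a loop and the total divided by the number of runs. The ITERATIONS constant and the empty update_enemies() body are placeholders for illustration only:

```gdscript
extends Node

# Hypothetical run count; lower it if the function under test is very slow.
const ITERATIONS = 1000

func _ready():
    var time_start = OS.get_ticks_usec()

    for i in range(ITERATIONS):
        update_enemies()

    var time_total = OS.get_ticks_usec() - time_start
    # Divide as a float so the average is not truncated to whole microseconds.
    var time_average = float(time_total) / ITERATIONS
    print("update_enemies() took %.2f microseconds on average over %d runs" % [time_average, ITERATIONS])

# Placeholder for the function you want to time.
func update_enemies():
    pass
```

Note that the measured time also includes the small overhead of the loop itself, which matters mostly for very cheap functions.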

When attempting to optimize functions, be sure to either repeatedly profile or time them as you go. This will give you crucial feedback as to whether the optimization is working (or not).

Caches

CPU caches are something else to be particularly aware of, especially when comparing timing results of two different versions of a function. The results can be highly dependent on whether the data is in the CPU cache or not. CPUs don't load data directly from the system RAM, even though it's huge in comparison to the CPU cache (several gigabytes instead of a few megabytes). This is because system RAM is very slow to access. Instead, CPUs load data from a smaller, faster bank of memory called cache. Loading data from cache is very fast, but every time you try to load a memory address that is not stored in cache, the cache must make a trip to main memory and slowly load in some data. This delay, referred to as a "cache miss", can leave the CPU sitting idle for a long time.

This means that the first time you run a function, it may run slowly because the data is not in the CPU cache. The second and later times, it may run much faster because the data is in the cache. Due to this, always use averages when timing, and be aware of the effects of cache.

Understanding caching is also crucial to CPU optimization. If you have an algorithm (routine) that loads small bits of data from randomly spread out areas of main memory, this can result in a lot of cache misses, and much of the time the CPU will be waiting around for data instead of doing any work. Instead, if you can make your data accesses localized, or even better, access memory in a linear fashion (like a continuous list), then the cache will work optimally and the CPU will be able to work as fast as possible.
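The following is a rough sketch of how one might compare linear and scattered access patterns, assuming Godot 3.x (PoolIntArray and Array.shuffle()). The COUNT constant and the _time_sum() helper are hypothetical names introduced for this example. In GDScript, interpreter overhead may hide much of the difference, so treat the numbers as indicative only; the effect is usually far more visible in compiled code.

```gdscript
extends Node

# Hypothetical element count, chosen to be larger than typical CPU caches.
const COUNT = 1000000

func _ready():
    var data = PoolIntArray()
    data.resize(COUNT)
    var order = []
    for i in range(COUNT):
        data[i] = i
        order.append(i)

    # First pass: indices are visited in increasing order (linear access).
    print("linear:   %d usec" % _time_sum(data, order))

    # Second pass: the same indices in random order (scattered access).
    order.shuffle()
    print("shuffled: %d usec" % _time_sum(data, order))

# Sums data[] in the order given by the index list and returns the elapsed microseconds.
func _time_sum(data: PoolIntArray, order: Array) -> int:
    var start = OS.get_ticks_usec()
    var total = 0
    for index in order:
        total += data[index]
    return OS.get_ticks_usec() - start
```

Both passes read the index list itself linearly, so any difference comes from how the data array is accessed.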

Godot usually takes care of such low-level details for you. For example, the Server APIs make sure data is optimized for caching already for things like rendering and physics. Still, you should be especially aware of caching when using GDNative.

Languages

Godot supports a number of different languages, and it is worth bearing in mind that there are trade-offs involved. Some languages are designed for ease of use at the cost of speed, and others are faster but more difficult to work with.

Built-in engine functions run at the same speed regardless of the scripting language you choose. If your project is making a lot of calculations in its own code, consider moving those calculations to a faster language.

GDScript

GDScript is designed to be easy to use and iterate, and is ideal for making many types of games. However, in this language, ease of use is considered more important than performance. If you need to make heavy calculations, consider moving some of your project to one of the other languages.

C#

C# is popular and has first-class support in Godot. It offers a good compromise between speed and ease of use. Beware of possible garbage collection pauses and leaks that can occur during gameplay, though. A common approach to work around issues with garbage collection is to use object pooling, which is outside the scope of this guide.

Other languages

Third parties provide support for several other languages, including Rust and JavaScript.

C++

Godot is written in C++. Using C++ will usually result in the fastest code. However, on a practical level, it is the most difficult to deploy to end users' machines on different platforms. Options for using C++ include GDNative and custom modules.

Threads

Consider using threads when making a lot of calculations that can run in parallel to each other. Modern CPUs have multiple cores, each one capable of doing a limited amount of work. By spreading work over multiple threads, you can move further towards peak CPU efficiency.

The disadvantage of threads is that you have to be incredibly careful. As each CPU core operates independently, they can end up trying to access the same memory at the same time. One thread can be reading from a variable while another is writing to it: this is called a race condition. Before you use threads, make sure you understand the dangers and how to prevent these race conditions.
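One common way to avoid a race condition is to guard shared data with a mutex. The sketch below assumes the Godot 3.x Thread API, where Thread.start() takes an instance, a method name and a userdata argument. The counter variable and the _add_to_counter() method are hypothetical names used only for illustration:

```gdscript
extends Node

var counter = 0
var mutex = Mutex.new()
var threads = []

func _ready():
    # Start four threads that all modify the same variable.
    for i in range(4):
        var thread = Thread.new()
        thread.start(self, "_add_to_counter", 10000)
        threads.append(thread)

    # Wait for every thread to finish before reading the result.
    for thread in threads:
        thread.wait_to_finish()

    print(counter)

# Runs on a worker thread. "amount" is the userdata passed to Thread.start().
func _add_to_counter(amount):
    for i in range(amount):
        # Only one thread at a time can hold the lock, so increments are never lost.
        mutex.lock()
        counter += 1
        mutex.unlock()
```

Without the lock, two threads could read the same old value of counter and both write back the same result, losing an increment. Locking on every iteration is simple but slow; accumulating into a local variable and locking once per thread is a typical refinement.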

Threads can also make debugging considerably more difficult. The GDScript debugger doesn't support setting up breakpoints in threads yet.

For more information on threads, see Using multiple threads.

SceneTree

Although Nodes are an incredibly powerful and versatile concept, be aware that every node has a cost. Built-in functions such as _process() and _physics_process() propagate through the tree. This housekeeping can reduce performance when you have very large numbers of nodes (usually in the thousands).

Each node is handled individually in the Godot renderer. Therefore, a smaller number of nodes, each containing more content, can lead to better performance.

One quirk of the SceneTree is that you can sometimes get much better performance by removing nodes from the SceneTree, rather than by pausing or hiding them. You don't have to delete a detached node. You can, for example, keep a reference to a node, detach it from the scene tree using Node.remove_child(node), then reattach it later using Node.add_child(node). This can be very useful for adding and removing areas from a game.
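A minimal sketch of this detach and reattach pattern follows. The child node name DungeonArea and the function names are hypothetical, chosen for illustration:

```gdscript
extends Node

# Keep a reference so the node stays reachable while it is detached.
# "DungeonArea" is a hypothetical child of this node.
onready var area = $DungeonArea

func unload_area():
    # Detach only if the node is currently our child.
    if area.get_parent() == self:
        remove_child(area)

func load_area():
    # Reattach only if the node currently has no parent.
    if area.get_parent() == null:
        add_child(area)

func discard_area():
    # A detached node is not freed along with the scene tree,
    # so free it explicitly once it is no longer needed.
    area.queue_free()
```

While detached, the node and its children no longer receive _process() or _physics_process() callbacks and are not rendered, but they keep their state.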

You can avoid the SceneTree altogether by using Server APIs. For more information, see Optimization using Servers.

Physics

In some situations, physics can end up becoming a bottleneck. This is particularly the case with complex worlds and large numbers of physics objects.

Here are some techniques to speed up physics:

  • Try using simplified versions of your rendered geometry for collision shapes. Often, this won't be noticeable for end users, but can greatly increase performance.
  • Try removing objects from physics when they are out of view or outside the current area, or reusing physics objects (for example, you might allow 8 monsters per area and reuse them).

Another crucial aspect of physics is the physics tick rate. In some games, you can greatly reduce the tick rate: instead of updating physics 60 times per second, for example, you may update it only 30 or even 20 times per second. This can greatly reduce the CPU load.
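As a short sketch, assuming Godot 3.x, the tick rate can be changed at runtime through the Engine.iterations_per_second property. The same value can also be set in the Project Settings under Physics > Common > Physics Fps. The value 30 is only an example:

```gdscript
extends Node

func _ready():
    # The default is 60 physics ticks per second.
    # With 30, _physics_process() runs half as often.
    Engine.iterations_per_second = 30
```

Code in _physics_process() that multiplies by delta keeps moving objects at the same speed, since delta grows as the tick rate drops.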

The downside of changing the physics tick rate is that you can get jerky movement or jitter when the physics update rate does not match the rendered frames per second. Also, decreasing the physics tick rate will increase input lag. It's recommended to stick to the default physics tick rate (60 Hz) in most games that feature real-time player movement.

The solution to jitter is to use fixed timestep interpolation, which involves smoothing the rendered positions and rotations over multiple frames to match the physics. You can either implement this yourself or use a third-party addon. Performance-wise, interpolation is a very cheap operation compared to running a physics tick. It's orders of magnitude faster, so this can be a significant performance win while also reducing jitter.
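One possible way to implement this yourself is sketched below, assuming Godot 3.2 or later, where Engine.get_physics_interpolation_fraction() is available. The body is moved only in _physics_process(), while a visual child node is placed between the previous and current physics positions on every rendered frame. The Sprite child, the speed property and the movement logic are hypothetical, and only position (not rotation) is interpolated here:

```gdscript
extends KinematicBody2D

# Hypothetical movement speed in pixels per second.
export var speed := 200.0

# Hypothetical visual child that is smoothed separately from the body.
onready var visual: Node2D = $Sprite

var _previous_position := Vector2.ZERO
var _current_position := Vector2.ZERO

func _ready():
    _previous_position = global_position
    _current_position = global_position
    # Stop the visual from inheriting the body's transform,
    # so that it can be positioned independently.
    visual.set_as_toplevel(true)

func _physics_process(_delta):
    # Remember where the body was on the previous tick, then move it.
    _previous_position = _current_position
    var _velocity = move_and_slide(Vector2.RIGHT * speed)
    _current_position = global_position

func _process(_delta):
    # Fraction (0 to 1) of the way between the last physics tick and the next one.
    var fraction = Engine.get_physics_interpolation_fraction()
    visual.global_position = _previous_position.linear_interpolate(_current_position, fraction)
```

The rendered position lags the physics position by up to one tick, which is the usual trade-off of this technique. If the body is teleported, both stored positions should be reset to the new location to avoid a visible slide.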