Generelle Optimierungs-Tipps

Einführung

In an ideal world, computers would run at infinite speed. The only limit to what we could achieve would be our imagination. However, in the real world, it's all too easy to produce software that will bring even the fastest computer to its knees.

Thus, designing games and other software is a compromise between what we would like to be possible, and what we can realistically achieve while maintaining good performance.

To achieve the best results, we have two approaches:

  • Work faster.
  • Work smarter.

Und vorzugsweise verwenden wir eine Mischung aus beiden.

Smoke and mirrors

Part of working smarter is recognizing that, in games, we can often get the player to believe they're in a world that is far more complex, interactive, and graphically exciting than it really is. A good programmer is a magician, and should strive to learn the tricks of the trade while trying to invent new ones.

Die Natur der Langsamkeit

To the outside observer, performance problems are often lumped together. But in reality, there are several different kinds of performance problems:

  • A slow process that occurs every frame, leading to a continuously low frame rate.
  • An intermittent process that causes "spikes" of slowness, leading to stalls.
  • A slow process that occurs outside of normal gameplay, for instance, when loading a level.

Jedes davon ist für den Benutzer ärgerlich, aber auf unterschiedliche Weise.

Leistung messen

Das wahrscheinlich wichtigste Instrument zur Optimierung ist die Fähigkeit, die Leistung zu messen - um festzustellen, wo Engpässe liegen, und um den Erfolg unserer Versuche zu messen, diese zu beschleunigen.

Es gibt mehrere Arten Leistung zu messen, wie:

  • Putting a start/stop timer around code of interest.
  • Using the Godot profiler.
  • Using external third-party CPU profilers.
  • Using GPU profilers/debuggers such as NVIDIA Nsight Graphics or apitrace.
  • Checking the frame rate (with V-Sync disabled).

Be very aware that the relative performance of different areas can vary on different hardware. It's often a good idea to measure timings on more than one device. This is especially the case if you're targeting mobile devices.

Limitierungen

CPU profilers are often the go-to method for measuring performance. However, they don't always tell the whole story.

  • Bottlenecks are often on the GPU, "as a result" of instructions given by the CPU.
  • Spikes can occur in the operating system processes (outside of Godot) "as a result" of instructions used in Godot (for example, dynamic memory allocation).
  • You may not always be able to profile specific devices like a mobile phone due to the initial setup required.
  • You may have to solve performance problems that occur on hardware you don't have access to.

Aufgrund dieser Einschränkungen müssen Sie häufig Detektivarbeit leisten um herauszufinden, wo Engpässe liegen.

Detektivarbeit

Detektivarbeit ist eine entscheidende Fähigkeit für Entwickler (sowohl in Bezug auf die Leistung als auch in Bezug auf die Fehlerbehebung). Dies kann Hypothesentests und binäre Suche umfassen.

Hypothesentest

Say, for example, that you believe sprites are slowing down your game. You can test this hypothesis by:

  • Die Leistung messen, wenn Sie weitere Sprites hinzufügen oder einige entfernen.

This may lead to a further hypothesis: does the size of the sprite determine the performance drop?

  • You can test this by keeping everything the same, but changing the sprite size, and measuring performance.

Profiler

Mit Profilern können Sie Ihr Programm zeitlich messen, während es ausgeführt wird. Profiler liefern dann Ergebnisse, aus denen hervorgeht, wie viel Prozent der Zeit in verschiedenen Funktionen und Bereichen verbracht wurden und wie oft Funktionen aufgerufen wurden.

This can be very useful both to identify bottlenecks and to measure the results of your improvements. Sometimes, attempts to improve performance can backfire and lead to slower performance. Always use profiling and timing to guide your efforts.

For more info about using Godot's built-in profiler, see Debugger Panel.

Prinzipien

Donald Knuth said:

Programmierer verschwenden enorm viel Zeit damit, über die Geschwindigkeit unkritischer Teile ihrer Programme nachzudenken bzw. sich darüber Gedanken zu machen und diese Effizienzversuche wirken sich tatsächlich stark negativ aus, wenn Debugging und Wartung in Betracht gezogen werden. Wir sollten kleine Verbesserungen vergessen, etwa in 97% der Fälle: Vorzeitige Optimierung ist die Wurzel allen Übels. Dennoch sollten wir unsere Chancen bei diesen kritischen 3% nicht verpassen.

Die Nachrichten sind sehr wichtig:

  • Developer time is limited. Instead of blindly trying to speed up all aspects of a program, we should concentrate our efforts on the aspects that really matter.
  • Optimierungsversuche führen häufig zu Code, der schwerer zu lesen und zu debuggen ist als nicht optimierter Code. Es liegt in unserem Interesse, dies auf Bereiche zu beschränken, die wirklich davon profitieren.

Just because we can optimize a particular bit of code, it doesn't necessarily mean that we should. Knowing when and when not to optimize is a great skill to develop.

One misleading aspect of the quote is that people tend to focus on the subquote "premature optimization is the root of all evil". While premature optimization is (by definition) undesirable, performant software is the result of performant design.

Performantes Design

The danger with encouraging people to ignore optimization until necessary, is that it conveniently ignores that the most important time to consider performance is at the design stage, before a key has even hit a keyboard. If the design or algorithms of a program are inefficient, then no amount of polishing the details later will make it run fast. It may run faster, but it will never run as fast as a program designed for performance.

This tends to be far more important in game or graphics programming than in general programming. A performant design, even without low-level optimization, will often run many times faster than a mediocre design with low-level optimization.

Inkrementelles Design

Of course, in practice, unless you have prior knowledge, you are unlikely to come up with the best design the first time. Instead, you'll often make a series of versions of a particular area of code, each taking a different approach to the problem, until you come to a satisfactory solution. It's important not to spend too much time on the details at this stage until you have finalized the overall design. Otherwise, much of your work will be thrown out.

It's difficult to give general guidelines for performant design because this is so dependent on the problem. One point worth mentioning though, on the CPU side, is that modern CPUs are nearly always limited by memory bandwidth. This has led to a resurgence in data-oriented design, which involves designing data structures and algorithms for cache locality of data and linear access, rather than jumping around in memory.

Der Optimierungprozess

Assuming we have a reasonable design, and taking our lessons from Knuth, our first step in optimization should be to identify the biggest bottlenecks - the slowest functions, the low-hanging fruit.

Once we've successfully improved the speed of the slowest area, it may no longer be the bottleneck. So we should test/profile again and find the next bottleneck on which to focus.

Der Prozess ist also:

  1. Profile / Identify bottleneck.
  2. Optimize bottleneck.
  3. Return to step 1.

Engpässe optimieren

Einige Profiler sagen Ihnen sogar welcher Teil einer Funktion (welche Datenzugriffe, Berechnungen) die Dinge verlangsamt.

As with design, you should concentrate your efforts first on making sure the algorithms and data structures are the best they can be. Data access should be local (to make best use of CPU cache), and it can often be better to use compact storage of data (again, always profile to test results). Often, you precalculate heavy computations ahead of time. This can be done by performing the computation when loading a level, by loading a file containing precalculated data or simply by storing the results of complex calculations into a script constant and reading its value.

Once algorithms and data are good, you can often make small changes in routines which improve performance. For instance, you can move some calculations outside of loops or transform nested for loops into non-nested loops. (This should be feasible if you know a 2D array's width or height in advance.)

Always retest your timing/bottlenecks after making each change. Some changes will increase speed, others may have a negative effect. Sometimes, a small positive effect will be outweighed by the negatives of more complex code, and you may choose to leave out that optimization.

Anhang

Engpass Mathematik

The proverb "a chain is only as strong as its weakest link" applies directly to performance optimization. If your project is spending 90% of the time in function A, then optimizing A can have a massive effect on performance.

A: 9 ms
Everything else: 1 ms
Total frame time: 10 ms
A: 1 ms
Everything else: 1ms
Total frame time: 2 ms

In this example, improving this bottleneck A by a factor of 9× decreases overall frame time by 5× while increasing frames per second by 5×.

However, if something else is running slowly and also bottlenecking your project, then the same improvement can lead to less dramatic gains:

A: 9 ms
Everything else: 50 ms
Total frame time: 59 ms
A: 1 ms
Everything else: 50 ms
Total frame time: 51 ms

In this example, even though we have hugely optimized function A, the actual gain in terms of frame rate is quite small.

In Spielen werden die Dinge noch komplizierter, weil CPU und GPU unabhängig voneinander laufen. Ihre gesamte Frame-Zeit wird durch die langsamere der beiden bestimmt.

CPU: 9 ms
GPU: 50 ms
Total frame time: 50 ms
CPU: 1 ms
GPU: 50 ms
Total frame time: 50 ms

In this example, we optimized the CPU hugely again, but the frame time didn't improve because we are GPU-bottlenecked.