In any application or game, sound and music playback has a slight delay. For most games, this delay is so small that it is negligible: sound effects come out a few milliseconds after play() is called. For music, this does not matter, as in most games it does not interact with the gameplay.
Still, for some games (mainly, rhythm games), it may be required to synchronize player actions with something happening in a song (usually in sync with the BPM). For this, having more precise timing information for an exact playback position is useful.
Achieving very precise playback timing is difficult, because many factors are at play during audio playback:
- Mixed chunks of audio are not played immediately.
The most common way to reduce latency is to shrink the audio buffers (by editing the latency setting in the project settings). The problem is that when the latency is too small, sound mixing requires considerably more CPU. This increases the risk of skipping (an audible crack in the sound because a mix callback was missed).
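To make the tradeoff concrete, the time covered by one buffer is its length in frames divided by the mix rate, and the mixer has to run once per buffer. The sketch below is plain Python used as a calculator, not engine code. The function names, the buffer sizes, and the 44100 Hz mix rate are illustrative assumptions, not Godot defaults.

```python
def buffer_latency_ms(buffer_frames: int, mix_rate_hz: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    if buffer_frames <= 0 or mix_rate_hz <= 0:
        raise ValueError("buffer_frames and mix_rate_hz must be positive")
    return buffer_frames / mix_rate_hz * 1000.0


def mix_callbacks_per_second(buffer_frames: int, mix_rate_hz: int) -> float:
    """How many times per second the mixer must run to keep the output fed."""
    if buffer_frames <= 0 or mix_rate_hz <= 0:
        raise ValueError("buffer_frames and mix_rate_hz must be positive")
    return mix_rate_hz / buffer_frames


if __name__ == "__main__":
    # Assumed example values: halving the buffer halves the latency
    # but doubles the number of mix callbacks that must arrive on time.
    for frames in (2048, 1024, 512, 256):
        latency = buffer_latency_ms(frames, 44100)
        callbacks = mix_callbacks_per_second(frames, 44100)
        print(f"{frames} frames: {latency:.2f} ms, {callbacks:.1f} callbacks/s")
```

Each halving of the buffer doubles how often the mix callback must complete on time, which is why very small buffers make skipping more likely.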
This is a common tradeoff, so Godot ships with sensible defaults that should not need to be altered.
The problem, in the end, is not this slight delay but synchronizing graphics and audio for games that require it. Beginning with Godot 3.2, some helpers were added to obtain more precise playback timing.
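As mentioned before, if you call AudioStreamPlayer.play(), sound will not begin immediately, but only when the audio thread processes the next chunk. This delay cannot be avoided, but it can be estimated by calling AudioServer.get_time_to_next_mix().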
The output latency (what happens after the mix) can also be estimated by calling AudioServer.get_output_latency().
Add these two and it's possible to guess almost exactly when sound or music will begin playing in the speakers during _process():
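A minimal sketch of that estimate follows, written as plain Python so the arithmetic can be run outside the engine. The `PlaybackClock` class and its parameter names are hypothetical. In a Godot script, the two delay values would presumably come from AudioServer.get_time_to_next_mix() and AudioServer.get_output_latency(), sampled right before calling play(), and the clock would be the engine's tick counter read in _process().

```python
import time


class PlaybackClock:
    """Estimates how long a stream has been audible, using the system clock.

    Hypothetical helper: the two delays are passed in as plain numbers
    (in seconds) instead of being queried from an audio server.
    """

    def __init__(self, time_to_next_mix, output_latency, clock=time.monotonic):
        self._clock = clock
        # Moment at which play() is assumed to have been called.
        self._time_begin = clock()
        # Delay until the next mix plus delay from mix to speakers.
        self._time_delay = time_to_next_mix + output_latency

    def playback_time(self):
        """Seconds of audio heard so far; 0.0 while playback has not started."""
        elapsed = self._clock() - self._time_begin
        # Compensate for latency; the result would be negative before
        # the first sample is audible, so clamp it to zero.
        return max(0.0, elapsed - self._time_delay)
```

Passing the clock in as a parameter is a design choice made for this sketch so that the calculation can be checked with a fake clock.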
In the long run, though, as the sound hardware clock is never exactly in sync with the system clock, the timing information will slowly drift away.
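To get a feel for the size of this drift, the sketch below multiplies the elapsed time by a relative mismatch between the two clocks. The function name and the 100 parts-per-million figure are purely illustrative assumptions, not measured properties of any hardware.

```python
def drift_seconds(elapsed_seconds: float, mismatch_ppm: float) -> float:
    """Accumulated offset between two clocks that differ by mismatch_ppm."""
    return elapsed_seconds * mismatch_ppm / 1_000_000.0


if __name__ == "__main__":
    # Assumed example: a three-minute song and a 100 ppm clock mismatch.
    print(f"{drift_seconds(180.0, 100.0) * 1000.0:.1f} ms of drift")
```

Under that assumption the offset stays small over a few minutes and grows linearly with playing time.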
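AudioStreamPlayer.get_playback_position() only advances in steps, because audio is mixed in chunks. To compensate for this "chunked" output, there is a function that can help: AudioServer.get_time_since_last_mix(). Adding its return value to the playback position increases precision.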
To increase precision, subtract the latency information (how much it takes for the audio to be heard after it was mixed):
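The calculation can be sketched as follows, again in plain Python so it can be run outside the engine. The function names are hypothetical, and the three inputs stand in for the values that AudioStreamPlayer.get_playback_position(), AudioServer.get_time_since_last_mix(), and AudioServer.get_output_latency() would return in a Godot script. The `beat_index` helper is an added illustration of how the result might be mapped onto a song's BPM.

```python
import math


def precise_playback_position(playback_position, time_since_last_mix, output_latency):
    """Estimate, in seconds, the song position currently being heard.

    playback_position: chunk-granular position reported by the player.
    time_since_last_mix: time elapsed since that chunk was mixed.
    output_latency: time it takes mixed audio to reach the speakers.
    """
    # Fill in the gap between mix steps, then subtract the output latency.
    # The result can be slightly negative right after playback starts.
    return playback_position + time_since_last_mix - output_latency


def beat_index(position_seconds, bpm):
    """Zero-based index of the beat that contains position_seconds."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return math.floor(position_seconds * bpm / 60.0)
```

For example, with a reported position of 2.0 s, 5 ms since the last mix, and 15 ms of output latency, the estimated audible position is about 1.99 s, which at 120 BPM is still inside the beat with index 3.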