Work in progress

The content of this page was not yet updated for Godot `4.1` and may be outdated. If you know how to improve this page or you can confirm that it's up to date, feel free to open a pull request.

# Data preferences¶

Ever wondered whether one should approach problem X with data structure Y or Z? This article covers a variety of topics related to these dilemmas.

Note

This article makes references to "[something]-time" operations. This terminology comes from algorithm analysis' Big O Notation.

Long-story short, it describes the worst-case scenario of runtime length. In laymen's terms:

"As the size of a problem domain increases, the runtime length of the algorithm..."

• Constant-time, `O(1)`: "...does not increase."

• Logarithmic-time, `O(log n)`: "...increases at a slow rate."

• Linear-time, `O(n)`: "...increases at the same rate."

• Etc.

Imagine if one had to process 3 million data points within a single frame. It would be impossible to craft the feature with a linear-time algorithm since the sheer size of the data would increase the runtime far beyond the time allotted. In comparison, using a constant-time algorithm could handle the operation without issue.

By and large, developers want to avoid engaging in linear-time operations as much as possible. But, if one keeps the scale of a linear-time operation small, and if one does not need to perform the operation often, then it may be acceptable. Balancing these requirements and choosing the right algorithm / data structure for the job is part of what makes programmers' skills valuable.

## Array vs. Dictionary vs. Object¶

Godot stores all variables in the scripting API in the Variant class. Variants can store Variant-compatible data structures such as Array and Dictionary as well as Objects.

Godot implements Array as a `Vector<Variant>`. The engine stores the Array contents in a contiguous section of memory, i.e. they are in a row adjacent to each other.

Note

For those unfamiliar with C++, a Vector is the name of the array object in traditional C++ libraries. It is a "templated" type, meaning that its records can only contain a particular type (denoted by angled brackets). So, for example, a PackedStringArray would be something like a `Vector<String>`.

Contiguous memory stores imply the following operation performance:

• Iterate: Fastest. Great for loops.

• Op: All it does is increment a counter to get to the next record.

• Insert, Erase, Move: Position-dependent. Generally slow.

• Op: Adding/removing/moving content involves moving the adjacent records over (to make room / fill space).

• Fast add/remove from the end.

• Slow add/remove from an arbitrary position.

• Slowest add/remove from the front.

• If doing many inserts/removals from the front, then...

1. invert the array.

2. do a loop which executes the Array changes at the end.

3. re-invert the array.

This makes only 2 copies of the array (still constant time, but slow) versus copying roughly 1/2 of the array, on average, N times (linear time).

• Get, Set: Fastest by position. E.g. can request 0th, 2nd, 10th record, etc. but cannot specify which record you want.

• Op: 1 addition operation from array start position up to desired index.

• Find: Slowest. Identifies the index/position of a value.

• Op: Must iterate through array and compare values until one finds a match.

• Performance is also dependent on whether one needs an exhaustive search.

• If kept ordered, custom search operations can bring it to logarithmic time (relatively fast). Laymen users won't be comfortable with this though. Done by re-sorting the Array after every edit and writing an ordered-aware search algorithm.

Godot implements Dictionary as an `OrderedHashMap<Variant, Variant>`. The engine stores a small array (initialized to 2^3 or 8 records) of key-value pairs. When one attempts to access a value, they provide it a key. It then hashes the key, i.e. converts it into a number. The "hash" is used to calculate the index into the array. As an array, the OHM then has a quick lookup within the "table" of keys mapped to values. When the HashMap becomes too full, it increases to the next power of 2 (so, 16 records, then 32, etc.) and rebuilds the structure.

Hashes are to reduce the chance of a key collision. If one occurs, the table must recalculate another index for the value that takes the previous position into account. In all, this results in constant-time access to all records at the expense of memory and some minor operational efficiency.

1. Hashing every key an arbitrary number of times.

• Hash operations are constant-t