UUID stands for Universally Unique Identifier.
0x01 Version
UUID has several versions, each suited to different use cases. Version 4, for example, generates all variable factors randomly, which is a very convenient approach in many scenarios. Version 1 combines a timestamp, a clock sequence, and node information (machine information) to ensure global uniqueness in distributed systems. Twitter's Snowflake can be considered a simplified variant of UUID Version 1. RFC 4122 describes a total of 5 versions of UUID:
- Version 1: Implements each field exactly as the UUID structure defines it, using the variable factors of timestamp, clock sequence, and node information (MAC address).
- Version 2: Basically the same as Version 1, but mainly used with DCE (the Distributed Computing Environment from the Open Software Foundation). RFC 4122 does not describe this version in detail; it is covered in the document "DCE 1.1: Authentication and Security Services". As a result, this version is rarely used now, and many implementations ignore it.
- Version 3: Derives the variable factors from a name and a namespace using the MD5 hash algorithm.
- Version 4: Generates the variable factors using random or pseudo-random numbers.
- Version 5: Derives the variable factors from a name and a namespace using the SHA-1 hash algorithm.
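As a rough illustration of the list above, Python's standard `uuid` module offers generators for Versions 1, 3, 4, and 5 (it has no Version 2 generator). The namespace `uuid.NAMESPACE_DNS` and the name `"example.com"` are sample inputs picked for this sketch, not values from the text.

```python
import uuid

# Version 1: timestamp + clock sequence + node
u1 = uuid.uuid1()
# Version 3: MD5 hash of namespace + name
u3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
# Version 4: random or pseudo-random
u4 = uuid.uuid4()
# Version 5: SHA-1 hash of namespace + name
u5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

generated = [u1, u3, u4, u5]
for u in generated:
    # The version attribute reads the version number back out of the UUID.
    print(u.version, u)
```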
All versions of UUID share the same structure. The structure is defined in terms of Version 1; the other versions keep the same fields but fill the variable factors differently.
0x02 Basic Structure
A UUID is 128 bits (16 bytes) long and can be written as 32 hexadecimal digits (each digit represents 4 bits). The digits are split into five groups of 8-4-4-4-12, separated by 4 hyphens. Including the hyphens, a UUID has 36 characters. For example: 3e350a5c-222a-11eb-abef-0242ac110002.
The format of UUID is as follows: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
The position of N holds the reserved (variant) bits. For the UUIDs described here, its two highest bits are 10, so N can only be 8, 9, a, or b.
The position of M holds the version number. Since the standard describes 5 versions of UUID, M can only be 1, 2, 3, 4, or 5.
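A small sketch of reading M and N from the textual form, using the example UUID given above. The helper name `version_and_variant_digits` is made up for this illustration; `uuid.UUID` from Python's standard library is used only to cross-check the result.

```python
import uuid

def version_and_variant_digits(text):
    """Return the M and N hex digits of a UUID written as 8-4-4-4-12."""
    groups = text.split("-")
    m = groups[2][0]  # first digit of the third group
    n = groups[3][0]  # first digit of the fourth group
    return m, n

example = "3e350a5c-222a-11eb-abef-0242ac110002"
m, n = version_and_variant_digits(example)

# The two highest bits of N are the reserved bits.
reserved_bits = int(n, 16) >> 2

parsed = uuid.UUID(example)
print(m, n, bin(reserved_bits), parsed.version)
```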
One Timestamp
Timestamp is a 60-bit unsigned number. For UUID Version 1, it counts the 100-nanosecond intervals from 1582-10-15 00:00:00.00 UTC to the current UTC time. Systems that cannot obtain UTC may use local time instead, as long as they do so consistently (in practice, it is enough for the systems involved to share the same time zone).
Once the timestamp is known, the values of time_low, time_mid, and time_hi in the structure follow directly.
time_low
represents bits 0 to 31 of the 60-bit timestamp, a total of 32 bits.
time_mid
represents bits 32 to 47 of the 60-bit timestamp, a total of 16 bits.
time_hi_and_version
has two parts, version and time_hi, a total of 16 bits. The version occupies the 4 highest bits, which can hold at most 16 values (0 to 15). time_hi represents the remaining 12 bits of the timestamp, bits 48 to 59.
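The field layout above can be sketched as plain bit operations. The helper names `split_timestamp` and `join_timestamp` are hypothetical; the attributes `time_low`, `time_mid`, `time_hi_version`, and `time` come from Python's standard `uuid.UUID` class and serve as a cross-check on the example UUID from earlier.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID timestamp: 1582-10-15 00:00:00 UTC
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def split_timestamp(ts):
    """Split a 60-bit timestamp into (time_low, time_mid, time_hi)."""
    time_low = ts & 0xFFFFFFFF        # bits 0 to 31
    time_mid = (ts >> 32) & 0xFFFF    # bits 32 to 47
    time_hi = (ts >> 48) & 0x0FFF     # bits 48 to 59
    return time_low, time_mid, time_hi

def join_timestamp(time_low, time_mid, time_hi_and_version):
    """Rebuild the 60-bit timestamp, dropping the 4 version bits."""
    time_hi = time_hi_and_version & 0x0FFF
    return (time_hi << 48) | (time_mid << 32) | time_low

u = uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002")
ts = join_timestamp(u.time_low, u.time_mid, u.time_hi_version)

# One tick is 100 ns, so 10 ticks make one microsecond.
created = GREGORIAN_EPOCH + timedelta(microseconds=ts // 10)
print(hex(ts), created.isoformat())
```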
Two Clock Sequence
If the clock on the machine generating the UUID has been adjusted (for example, set backward), or the node ID has changed (for example, the host's network card was replaced), newly generated UUIDs could collide with ones generated earlier or on other machines. A variable factor must then change to keep the generated UUIDs unique, and that factor is the Clock Sequence.
The algorithm for changing the Clock Sequence is very simple. When the time is adjusted or the node ID changes, either pick a new random number or increment the previous Clock Sequence value by one.
The Clock Sequence is 14 bits in total.
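A minimal sketch of that update rule, assuming the caller decides when the clock or node ID has changed. The function name `next_clock_seq` is hypothetical, and `secrets.randbits` is simply one convenient source of random bits.

```python
import secrets

CLOCK_SEQ_MASK = 0x3FFF  # the Clock Sequence is 14 bits wide

def next_clock_seq(previous=None):
    """Return a new Clock Sequence value.

    With no previous value, pick a random 14-bit number;
    otherwise increment by one and wrap around at 14 bits.
    """
    if previous is None:
        return secrets.randbits(14)
    return (previous + 1) & CLOCK_SEQ_MASK

fresh = next_clock_seq()
bumped = next_clock_seq(0x2BEF)
wrapped = next_clock_seq(0x3FFF)
print(fresh, hex(bumped), wrapped)
```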
clock_seq_low
represents bits 0 to 7 of the Clock Sequence, a total of 8 bits.
clock_seq_hi_and_reserved
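contains two parts, reserved and clock_seq_hi, a total of 8 bits. clock_seq_hi represents bits 8 to 13 of the Clock Sequence, a total of 6 bits. reserved occupies the 2 highest bits and is generally set to binary 10, which is why the N position in the string format can only be 8, 9, a, or b.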
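The packing of the 14-bit Clock Sequence into these two bytes can be sketched as follows. `pack_clock_seq` and `unpack_clock_seq` are names invented for this example; the `clock_seq` attribute of Python's `uuid.UUID` is used to check the result against the example UUID from earlier.

```python
import uuid

def pack_clock_seq(clock_seq):
    """Return (clock_seq_hi_and_reserved, clock_seq_low) for a 14-bit value."""
    clock_seq &= 0x3FFF
    clock_seq_low = clock_seq & 0xFF                     # bits 0 to 7
    clock_seq_hi_and_reserved = 0x80 | (clock_seq >> 8)  # reserved = 10, then bits 8 to 13
    return clock_seq_hi_and_reserved, clock_seq_low

def unpack_clock_seq(hi_and_reserved, low):
    """Recover the 14-bit Clock Sequence, dropping the 2 reserved bits."""
    return ((hi_and_reserved & 0x3F) << 8) | low

u = uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002")
hi, low = pack_clock_seq(u.clock_seq)
print(hex(u.clock_seq), hex(hi), hex(low))
```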
Three Node
Node is a 48-bit unsigned number. For UUID Version 1, it is the IEEE 802 MAC address, that is, the MAC address of the network card. When the system has multiple network cards, the address of any valid one can be used as the Node. Systems without a network card use a random value instead.
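A short sketch of working with the Node field. `format_node` and `random_node` are hypothetical helpers. Setting the multicast bit on a random node follows the recommendation in RFC 4122, so that a random value cannot be mistaken for a real network card address; that detail is not stated in the text above.

```python
import secrets
import uuid

def format_node(node):
    """Render a 48-bit node value in the usual MAC address notation."""
    raw = node.to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in raw)

def random_node():
    """Random 48-bit node with the multicast bit set (lowest bit of the first octet)."""
    return secrets.randbits(48) | (1 << 40)

u = uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002")
node_text = format_node(u.node)
fallback = random_node()
print(node_text, format_node(fallback))
```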
0x03 Differences between Different Versions
The sections above describe the structure of UUID, which is essentially the definition of UUID Version 1. Its variable factors are the timestamp, the clock sequence, and the node. In other versions, however, these variable factors carry different meanings.
In Version 4, the timestamp, clock sequence, and node are all random or pseudo-random; only the version and reserved bits keep fixed values.
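To make this concrete, here is a minimal sketch that builds a Version 4 UUID from 16 random bytes and then overwrites the version and reserved bits. The function name `make_v4` and its optional `random_bytes` parameter are assumptions made for this example; in practice `uuid.uuid4()` from Python's standard library does the same job.

```python
import os
import uuid

def make_v4(random_bytes=None):
    """Build a Version 4 UUID from 16 random bytes."""
    raw = bytearray(random_bytes if random_bytes is not None else os.urandom(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # high 4 bits of time_hi_and_version = version 4
    raw[8] = (raw[8] & 0x3F) | 0x80  # high 2 bits of clock_seq_hi_and_reserved = 10
    return uuid.UUID(bytes=bytes(raw))

random_uuid = make_v4()
all_zero = make_v4(bytes(16))         # shows which bits are forced
all_ones = make_v4(b"\xff" * 16)
print(random_uuid, all_zero, all_ones)
```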
In Versions 3 and 5, they are instead derived from a hash of the name and namespace.
The name and namespace are similar to namespaces and class names in many programming languages. The basic requirement is that the combination of namespace + name determines the hash result. In other words, the same namespace + name must always produce the same result under the same hash algorithm (such as MD5 for Version 3), while the same name in different namespaces produces different results.
All three variable factors in Version 3 and Version 5 are filled from the hash output, with MD5 for Version 3 and SHA-1 for Version 5.
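A sketch of how such a name-based UUID can be assembled: hash the namespace bytes followed by the name, keep the first 16 bytes, then set the version and reserved bits. The function `name_based_uuid` is written for this illustration, and the UTF-8 encoding of the name is an assumption that matches what Python's own `uuid.uuid3` and `uuid.uuid5` do, which are used here as a cross-check.

```python
import hashlib
import uuid

def name_based_uuid(namespace, name, version):
    """Build a Version 3 (MD5) or Version 5 (SHA-1) UUID by hand."""
    if version == 3:
        hasher = hashlib.md5
    elif version == 5:
        hasher = hashlib.sha1
    else:
        raise ValueError("version must be 3 or 5")
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])                # SHA-1 yields 20 bytes, keep 16
    raw[6] = (raw[6] & 0x0F) | (version << 4)   # version bits
    raw[8] = (raw[8] & 0x3F) | 0x80             # reserved bits = 10
    return uuid.UUID(bytes=bytes(raw))

v3 = name_based_uuid(uuid.NAMESPACE_DNS, "example.com", 3)
v5 = name_based_uuid(uuid.NAMESPACE_DNS, "example.com", 5)
# Same name, different namespace: the result differs.
v5_other = name_based_uuid(uuid.NAMESPACE_URL, "example.com", 5)
print(v3, v5, v5_other)
```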