In past projects, Redis has mostly been my go-to for caching. But over time I've just kept using it on autopilot: I know it's a key-value store, no SQL here, and I don't know much about its other capabilities. So now seems like a good time to learn Redis seriously.
How are the keys and values organized?
In a hash table. Because this hash table stores all the key-value pairs, I'll call it the global hash table.
The lookup process mainly relies on hash calculation (O(1) complexity), and is not directly related to the amount of data. However, after you write a large amount of data to Redis, you may find that the operations sometimes suddenly become slower, because of hash table collisions and the potential blocking caused by rehashing.
To make the rehash operation more efficient, Redis uses two global hash tables by default: hash table 1 and hash table 2. Initially, when you just start inserting data, the default hash table used is hash table 1, and hash table 2 is not yet allocated. As the data gradually increases, Redis begins to perform rehashing, which is a three-step process:
- Allocate a larger space for hash table 2, for example, twice the current size of hash table 1.
- Remap and copy the data from hash table 1 to hash table 2. To avoid blocking user requests during the copy, Redis rehashes gradually: each time it processes a request, it migrates all the entries at one bucket index of hash table 1 to hash table 2, starting from the first index and moving on to the next index on subsequent requests (see the sketch after this list).
- Release the space occupied by hash table 1.
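To make the gradual rehash concrete, here is a minimal Python sketch of the idea. It is purely illustrative: the class, the growth trigger, and migrating exactly one bucket per request are simplifying assumptions, not Redis's actual C implementation, and lookups during a rehash (which must check both tables) are omitted.

```python
class IncrementalDict:
    """Toy hash map that rehashes a little at a time, Redis-dict style."""

    def __init__(self, size=4):
        self.table1 = [[] for _ in range(size)]  # hash table 1 (currently in use)
        self.table2 = None                        # hash table 2 (allocated when rehashing)
        self.rehash_index = -1                    # next bucket of table1 to migrate; -1 = idle
        self.count = 0

    def _start_rehash(self):
        # Step 1: allocate a larger table, e.g. twice the current size.
        self.table2 = [[] for _ in range(len(self.table1) * 2)]
        self.rehash_index = 0

    def _rehash_step(self):
        # Step 2 (gradual): migrate the entries of one bucket per request.
        if self.rehash_index < 0:
            return
        for key, value in self.table1[self.rehash_index]:
            self.table2[hash(key) % len(self.table2)].append((key, value))
        self.table1[self.rehash_index] = []
        self.rehash_index += 1
        if self.rehash_index == len(self.table1):
            # Step 3: table1 has been fully migrated; table2 takes its place.
            self.table1, self.table2 = self.table2, None
            self.rehash_index = -1

    def set(self, key, value):
        if self.rehash_index < 0 and self.count > len(self.table1):
            self._start_rehash()                  # load factor too high, start rehashing
        self._rehash_step()                       # do a small slice of rehash work per request
        table = self.table2 if self.rehash_index >= 0 else self.table1
        table[hash(key) % len(table)].append((key, value))
        self.count += 1
```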
Since Redis has many data types, and each data type has some common metadata that needs to be recorded (such as the last access time, reference count, etc.), Redis uses a RedisObject struct to record this metadata uniformly, along with a pointer to the actual data. A RedisObject contains 8 bytes of metadata and an 8-byte pointer, which points to the underlying data structure. However, there are exceptions designed to save memory. For Long integers and short SDS (Simple Dynamic String) values, Redis uses specialized memory layouts:
- When storing a Long integer, the pointer field in the RedisObject holds the integer value directly (the int encoding), so no additional pointer to the integer is needed.
- When storing a string of 44 bytes or less, the RedisObject's metadata, pointer, and SDS are placed in one contiguous memory area (the embstr encoding), which avoids memory fragmentation; longer strings use a separately allocated SDS (the raw encoding). Both can be observed with OBJECT ENCODING, as shown below.
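These encodings are easy to observe from a client. A quick sketch with redis-py, assuming a local Redis instance on a recent version (where the embstr threshold is 44 bytes):

```python
import redis

r = redis.Redis()  # assumes a local instance on the default port

r.set("n", 12345)             # numeric value, fits in a long
r.set("short", "a" * 44)      # 44 bytes or less: one contiguous allocation
r.set("long", "a" * 45)       # more than 44 bytes: SDS allocated separately

print(r.object("encoding", "n"))      # b'int'
print(r.object("encoding", "short"))  # b'embstr'
print(r.object("encoding", "long"))   # b'raw'
```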
Persistence/Single-machine High Availability
AOF (Append-Only File)
- Records the operations
- Has a small impact on command execution
- But the recovery time is relatively long
We are familiar with the Write Ahead Log (WAL) in databases, where the modified data is first recorded in the log file before actually writing the data. However, the AOF (Append-Only File) in Redis works the other way around - it is a "write-after" log, meaning Redis first executes the command and writes the data into memory, and then records the log.
The AOF log records every command received by Redis, and these commands are saved in text form. When Redis is recording the log to the AOF, it does not perform any syntax check on the commands. So, if the log is recorded before executing the commands, the log may contain incorrect commands. But with the "write-after" log approach, the system first executes the command, and only records the log if the command execution is successful, otherwise it directly reports an error to the client.
By recording the log after command execution, the current write operation is not blocked.
However, the AOF mechanism also has two potential risks related to when the AOF is written back to the disk:
- If the system crashes immediately after executing a command, but before the log is recorded, there is a risk of losing that command and the corresponding data.
- Although the AOF avoids blocking the current command, it may introduce a blocking risk for the next operation. This is because the AOF log is also written in the main thread, and if the disk write pressure is high, the write may become very slow, causing subsequent operations to stall as well.
To address this, the AOF mechanism provides three choices to control the timing of writing to the disk, corresponding to the appendfsync configuration (similar to configurations in MySQL's redo log/binary log):
- Always: Synchronously write the log to disk after each write command.
- Everysec: After each write command, just write the log to the AOF file's memory buffer, and flush the buffer to disk every second.
- No: After each write command, just write the log to the AOF file's memory buffer, and let the operating system decide when to write the buffer to disk.
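For example, the policy can be inspected and changed at runtime with CONFIG; a small redis-py sketch (the instance address is assumed):

```python
import redis

r = redis.Redis()

# Inspect the current write-back policy.
print(r.config_get("appendfsync"))   # e.g. {'appendfsync': 'everysec'}

# Switch policies at runtime: 'always' (safest, slowest), 'everysec' (common trade-off), or 'no'.
r.config_set("appendfsync", "everysec")
```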
What if the AOF log file becomes too large? The AOF rewrite mechanism comes into play.
The AOF rewrite mechanism rewrites the AOF file based on the current state of the database. It reads all the key-value pairs in the database and records them with a single command in the new AOF file, even if a key has been modified multiple times.
During the rewrite, the main process forks a background bgrewriteaof child process. The fork gives the child process a (copy-on-write) view of the main process's memory, which contains the latest database data, so the bgrewriteaof child process can rewrite the log without affecting the main thread's processing of new operations. If there are any new write operations during the rewrite, they are written to both the existing AOF log and the rewrite log. Once the rewrite is complete, the new AOF file replaces the old one.
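A rewrite can also be triggered by hand, and the thresholds for automatic rewrites can be inspected; for example, with redis-py (local instance assumed):

```python
import redis

r = redis.Redis()

# Ask Redis to fork a child process and rewrite the AOF in the background.
r.bgrewriteaof()

# Automatic rewrites are governed by these two thresholds.
print(r.config_get("auto-aof-rewrite-percentage"))  # e.g. {'auto-aof-rewrite-percentage': '100'}
print(r.config_get("auto-aof-rewrite-min-size"))    # e.g. {'auto-aof-rewrite-min-size': '67108864'}
```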
RDB (Redis Database Backup):
- Records the data at a certain point in time
- The recovery time is relatively short
The RDB (Redis Database Backup) file is a way to write the state of the database at a certain point in time to disk as a file. During data restoration, you can directly read the RDB file into memory to quickly complete the recovery.
Redis provides two commands to generate the RDB file: save and bgsave.
- save: Executes in the main thread, which will cause blocking.
- bgsave: Creates a child process dedicated to writing the RDB file, avoiding blocking the main thread. This is the default way Redis generates RDB files.
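For example, with redis-py (a sketch assuming a local instance):

```python
import redis

r = redis.Redis()

r.bgsave()           # fork a child process and write the RDB snapshot in the background
print(r.lastsave())  # timestamp of the most recent successful snapshot
```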
In the bgsave case, the main thread is not blocked and can still receive requests normally. But to ensure the integrity of the snapshot, does that mean it can only handle read operations and cannot modify the data being snapshotted?
Copy-On-Write (COW): Suspending write operations for the duration of the snapshot is definitely unacceptable. The bgsave child process is forked from the main process, so it shares all of the main process's memory data. After the bgsave child process starts running, it reads the main thread's memory data and writes it to the RDB file. If the main thread needs to modify a piece of data, that data is first copied to create a duplicate; the bgsave child process writes the duplicate to the RDB file, while the main thread is free to modify the original. (This is similar to the main thread writing to both the AOF log and the AOF rewrite log during a rewrite.)
The snapshot frequency is hard to get right. If it is too low, a crash between two snapshots can lose a lot of data. If full snapshots are taken too frequently, that brings two kinds of overhead. One is that frequent full writes to disk put a lot of pressure on the disk, and multiple snapshots competing for limited disk bandwidth can cause a vicious cycle in which the next snapshot starts before the previous one finishes. The other is that the fork operation that creates the bgsave child process blocks the main thread, and the larger the main thread's memory, the longer that blocking lasts (so when relying on RDB, the memory of a single Redis instance should not be too large).
Is there a win-win solution? Redis 4.0 introduced a hybrid approach using both AOF logs and in-memory snapshots. In simple terms, in-memory snapshots are executed at a certain frequency, and during the interval between two snapshots, the AOF log records all the command operations.
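This hybrid mode is controlled by a single configuration option; with redis-py it could be enabled like this (a sketch, assuming Redis 4.0 or later):

```python
import redis

r = redis.Redis()

# With this enabled, a rewritten AOF starts with an RDB-format snapshot (the preamble),
# followed by the incremental write commands recorded after that snapshot.
r.config_set("aof-use-rdb-preamble", "yes")
print(r.config_get("aof-use-rdb-preamble"))
```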
Although fsync is executed by a background thread, the main thread monitors its progress: when the main thread hands newly received operations to the background thread for another fsync and finds that the previous fsync has not yet completed, it blocks. So if the background thread's fsync calls are frequently slow (for example, because an AOF rewrite is consuming a lot of disk I/O bandwidth), the main thread will also be blocked, and Redis performance will drop.
Redis is an in-memory database, and if the memory usage is not controlled well, or if it is running together with other memory-intensive applications, it may be affected by swapping, causing performance to slow down. Once swapping is triggered, Redis requests have to wait for the disk data read/write to complete.
The Linux kernel has supported the huge page mechanism since version 2.6.38, which provides 2MB memory pages, whereas regular memory pages are allocated at 4KB granularity. If huge pages are used, then during copy-on-write, even if a client request modifies only 100B of data, Redis still has to copy a whole 2MB page. What to do? Disable transparent huge pages (THP) on the machine running Redis.
Sharded Cluster
The Redis Cluster solution uses hash slots (Hash Slot, I'll just call it Slot) to handle the mapping between data and instances. A sharded cluster has a total of 16384 hash slots. The key of a key-value pair is hashed using the CRC16 algorithm to produce a 16-bit value, which is then used to calculate the slot by taking the modulus with 16384.
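As a rough illustration, the slot computation can be sketched in Python. The CRC16 variant below (XMODEM: polynomial 0x1021, initial value 0) is the one Redis Cluster uses, but this is a simplified bitwise version, and hash tags ({...}) are ignored here:

```python
def crc16(data: bytes) -> int:
    # Bitwise CRC16-XMODEM, equivalent to Redis's table-driven implementation.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Slot = CRC16(key) mod 16384.
    return crc16(key.encode()) % 16384

print(key_slot("user:1000"))  # a value in the range [0, 16383]
```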
When deploying the Redis Cluster solution, you can create the cluster with the redis-cli --cluster create command, and Redis will automatically distribute the slots evenly across the cluster instances (you can also assign them manually). Through the hash slots, the sharded cluster implements the mapping from data to hash slots, and then from hash slots to instances.
How do you know which instance a hash slot is on? Redis instances will send their hash slot information to the other connected instances, to propagate the slot allocation information. After the instances are interconnected, each instance will have the mapping information for all the hash slots.
Clients will cache the hash slot information locally after receiving it. When a client requests a key-value pair, it will first calculate the hash slot of the key, and then send the request to the corresponding instance.
In the cluster, the mapping between instances and hash slots is not fixed, since nodes can be added or removed and slots can be rebalanced for load balancing. Instances keep exchanging messages to obtain the latest slot allocation, but clients cannot actively perceive these changes; in practice they learn about them through the MOVED and ASK redirection errors returned by the instances and refresh their local slot cache accordingly.
Buffer in Redis
Buffers are a key mechanism in Redis for communication between the client and the server: they temporarily hold the command data sent by the client and the result data returned by the server.
Another major use of buffers is during data synchronization between master and slave nodes, where they temporarily hold the write commands received by the master that still need to be propagated to the replicas.
Buffer overflow can lead to network connection closure and data loss. There are three main reasons for buffer overflow:
- Commands are sent too quickly or are too large.
- Command data processing is too slow.
- The buffer space is too small.
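The output-buffer limits can be inspected and tuned per client class through CONFIG; a sketch with redis-py (the concrete limit values below are illustrative, not recommendations):

```python
import redis

r = redis.Redis()

# Output-buffer limits for normal clients, replicas, and pub/sub clients;
# when a class's limit is exceeded, Redis closes that client's connection.
print(r.config_get("client-output-buffer-limit"))

# Example: cap pub/sub clients at a 32MB hard limit, or 8MB sustained for 60 seconds.
r.config_set("client-output-buffer-limit", "pubsub 33554432 8388608 60")
```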
Use case for transactions in Redis:
The data structures in Redis each have their own characteristics. In certain scenarios, a piece of data may need to be stored in multiple data structures to simultaneously satisfy the needs for read/write performance and special operation commands (such as range queries, counting, etc.).
In this case, transactions can be used to ensure the atomicity of these multiple operations on different data structures. The basic workflow is:
- Start the transaction with the MULTI command.
- Execute the desired commands (e.g. DECR a:stock, DECR b:stock) - these will be queued without executing.
- Commit the transaction with the EXEC command - this will execute all the queued commands atomically (see the sketch after this list).
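With redis-py, the workflow above might look like the following sketch (the key names a:stock and b:stock come from the example above; the initial stock values are made up for illustration):

```python
import redis

r = redis.Redis()  # assumes a local Redis instance on the default port
r.mset({"a:stock": 10, "b:stock": 10})  # illustrative initial values

# transaction=True wraps the queued commands in MULTI ... EXEC.
pipe = r.pipeline(transaction=True)
pipe.decr("a:stock")      # queued, not executed yet
pipe.decr("b:stock")      # queued, not executed yet
results = pipe.execute()  # EXEC: both DECRs run back to back
print(results)            # e.g. [9, 9]
```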
Redis supports transactions through the MULTI, EXEC, DISCARD, and WATCH commands. The level of ACID guarantee is as follows:
Atomicity:
- If there are errors when queuing the commands, the entire transaction is abandoned, ensuring atomicity.
- If there are no errors when queuing, but a command fails during actual execution, the remaining commands still run, so atomicity cannot be guaranteed.
- If the instance crashes while executing the EXEC command and AOF logging is enabled, the partially written transaction can be removed from the AOF (for example with redis-check-aof), so atomicity can still be ensured.
Consistency:
Redis transactions do provide consistency guarantees:
- If there are errors when queuing the commands, the transaction is not executed, so the database remains consistent.
- If there are no errors when queuing, but errors occur during actual execution, the failed commands are simply not applied, so the database remains consistent.
- If the server fails after the EXEC command and the changes have not yet been persisted (no RDB/AOF), the database comes back empty, or at its last persisted state, after a restart, which is still consistent.
Isolation:
- If concurrent operations (from client Y) happen before client X's EXEC command, isolation can be achieved with the WATCH mechanism: WATCH monitors the given keys and aborts the transaction at EXEC time if any watched key has been modified (see the sketch below).
- If the concurrent operations happen after the EXEC command, isolation is guaranteed anyway, because Redis executes commands sequentially in its single-threaded model.
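A typical WATCH-based check-and-set with redis-py might look like this (a sketch; the key name, initial check, and retry policy are illustrative):

```python
import redis

r = redis.Redis()

def safe_decr_stock(key: str) -> bool:
    """Decrement a stock counter only while it is positive, retrying on conflicts."""
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)                  # WATCH: EXEC will abort if the key changes
                stock = int(pipe.get(key) or 0)  # immediate read while watching
                if stock <= 0:
                    pipe.unwatch()
                    return False
                pipe.multi()                     # start queuing commands
                pipe.decr(key)
                pipe.execute()                   # EXEC: raises WatchError on conflict
                return True
            except redis.WatchError:
                continue                         # another client touched the key; retry
```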