Redis in HA with Sentinel

Redis (REmote DIctionary Server) is a high-performance, open-source, non-relational in-memory database used as a key-value data structure store. It can run as a single instance on a single server, or it can be configured for high availability (HA). There are two ways to build an HA Redis architecture: run a Redis Cluster, which needs at least 6 nodes (3 masters + 3 slaves), or use Sentinel, which needs at least 3 Sentinel instances alongside the Redis master and slave nodes.

The replication mechanism works so that data written to the master is asynchronously replicated to the slave servers. In case of connectivity issues between the master and the slave, the synchronization stops, but as soon as the communication channel becomes available again, the slave reconnects to the master and replication resumes. Read/write operations are always possible on the master node, while only read operations are allowed on a slave node, at least until the slave is elected as master. This way, clients can read data even if the master server is unavailable, assuming that at least one of the slave servers is available and contains the requested data.

Sentinel is responsible for monitoring the Redis servers in the cluster (master and slaves) and coordinating the automatic failover of the master to one of its available slaves, if a master failure occurs. When a master is no longer available, Sentinel chooses one of its available slaves and promotes it to master, so the cluster continues to run uninterrupted.

In this post we will see how to create a simple Redis HA architecture with 3 nodes, i.e. 2 nodes hosting the Redis engine and Sentinel, and one node running Sentinel only. The example schema is shown below.

[Figure: redis-ha-sentinel architecture diagram]

Redis setup

I used CentOS 7 virtual machines, but only a few changes are required if you use another operating system.

Since Redis is not available as an RPM in the CentOS repositories, we enable the Remi repository and proceed with the installation on all 3 nodes. Sentinel is installed together with Redis, so we perform the same basic installation everywhere; then on one of the nodes we will start Sentinel only, while on the other two we will start both Sentinel and Redis.

	yum install epel-release yum-utils
	yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
	yum-config-manager --enable remi
	yum install redis

Replica configuration

Next, configure the two Redis servers for replication, so that we have one master node and one slave replicating from it.
The configuration file to edit is /etc/redis/redis.conf.
On the master node (IP 172.31.6.218):

	bind 127.0.0.1 172.31.6.218
	protected-mode no
	supervised systemd
	masterauth P@ssw0rd
	masteruser mymasteruser
	user default off
	user mymasteruser +@all on >P@ssw0rd
	user superuser +@all ~* on >P@ssw0rd
	user sentineluser on >P@ssw0rd allchannels +multi +slaveof +ping +exec +subscribe +config|rewrite +role +publish +info +client|setname +client|kill +script|kill

On the slave node (IP 172.31.94.134):

	bind 127.0.0.1 172.31.94.134
	protected-mode no
	supervised systemd
	replicaof 172.31.6.218 6379
	masterauth P@ssw0rd
	masteruser mymasteruser
	user default off
	user mymasteruser +@all on >P@ssw0rd
	user superuser +@all ~* on >P@ssw0rd
	user sentineluser on >P@ssw0rd allchannels +multi +slaveof +ping +exec +subscribe +config|rewrite +role +publish +info +client|setname +client|kill +script|kill

The official Redis documentation covers the configuration in detail, and every parameter is also well described in the configuration file itself. Here is a brief summary of the parameters used.

  • bind: the addresses Redis listens on; add the machine’s IP in addition to localhost so that slave nodes and Sentinel nodes can connect.
  • protected-mode: set to no to allow Redis to accept remote connections.
  • supervised: how the process is supervised; in our case systemd.
  • replicaof: tells the slave node which Redis server to replicate from.
  • masterauth: the password of the user used for replication.
  • masteruser: the username of the user used for replication.
  • user: defines users with different permissions through Redis ACLs; here mymasteruser handles replication, superuser can execute every command on the Redis server, and sentineluser has the permissions that Sentinel needs.
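
As a quick sanity check before starting the service, the directives described above can be verified with a few lines of Python. This is only a sketch: the function names are hypothetical, the parser handles simple "name value" lines only, and the list of required directives is my own choice based on the slave configuration shown earlier.

```python
def parse_directives(conf_text):
    """Map each directive name to the list of values it was given."""
    directives = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        name = parts[0].lower()
        value = parts[1] if len(parts) > 1 else ""
        # Directives such as "user" can appear several times, so keep a list.
        directives.setdefault(name, []).append(value)
    return directives


def missing_replica_settings(conf_text):
    """Return the replication-related directives that a slave config lacks."""
    required = ("bind", "replicaof", "masteruser", "masterauth")
    present = parse_directives(conf_text)
    return [name for name in required if name not in present]


# Excerpt of the slave configuration from this post.
SLAVE_CONF = """
bind 127.0.0.1 172.31.94.134
protected-mode no
supervised systemd
replicaof 172.31.6.218 6379
masterauth P@ssw0rd
masteruser mymasteruser
"""

print(missing_replica_settings(SLAVE_CONF))  # an empty list means nothing is missing
```

The same check run against the master configuration would report replicaof as missing, which is expected there.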

Start Redis on both servers.

	systemctl start redis
	systemctl enable redis

Once Redis is running, we can check the status of the servers with the redis-cli tool. Since we have configured ACLs for users, we must authenticate before executing commands: after starting redis-cli, run the AUTH command with a username and password, in our case AUTH superuser P@ssw0rd.
On the master node we have:

	127.0.0.1:6379> info replication
	# Replication
	role:master
	connected_slaves:1
	slave0:ip=172.31.94.134,port=6379,state=wait_bgsave,offset=0,lag=0
	master_failover_state:no-failover
	master_replid:7483bc9c74537c06dad78ec10bae247ac26bea52
	master_replid2:a9dbab0000bbf9b7306e617a3bb07b7070721a03
	master_repl_offset:2098991
	second_repl_offset:2086135
	repl_backlog_active:1
	repl_backlog_size:1048576
	repl_backlog_first_byte_offset:1166345
	repl_backlog_histlen:932647

While on the slave node we have:

	127.0.0.1:6379> info replication
	# Replication
	role:slave
	master_host:172.31.6.218
	master_port:6379
	master_link_status:up
	master_last_io_seconds_ago:0
	master_sync_in_progress:0
	slave_read_repl_offset:2122994
	slave_repl_offset:2122994
	slave_priority:100
	slave_read_only:1
	replica_announced:1
	connected_slaves:0
	master_failover_state:no-failover
	master_replid:7483bc9c74537c06dad78ec10bae247ac26bea52
	master_replid2:0000000000000000000000000000000000000000
	master_repl_offset:2122994
	second_repl_offset:-1
	repl_backlog_active:1
	repl_backlog_size:1048576
	repl_backlog_first_byte_offset:2098992
	repl_backlog_histlen:24003

In particular, role indicates the role of the server; slave0:ip=172.31.94.134,port=6379,state=wait_bgsave,offset=0,lag=0 on the master describes the connected slave; and master_host:172.31.6.218 on the slave shows the master to which the slave is connected.
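
If you want to check these fields from a script rather than by eye, the output of info replication is easy to parse, since every line is a key:value pair. The sketch below works on the text as printed by redis-cli; the function names and the health criteria (role, link status, no sync in progress) are assumptions of mine, not something Redis defines.

```python
def parse_info_replication(text):
    """Turn the 'key:value' lines of INFO replication into a dict of strings."""
    info = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Replication" header
        key, separator, value = line.partition(":")
        if separator:
            info[key] = value
    return info


def replica_is_healthy(info):
    """A slave is considered healthy here if it is linked to its master and not resyncing."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress") == "0"
    )


# Excerpt of the slave output shown above.
REPLICA_INFO = """
# Replication
role:slave
master_host:172.31.6.218
master_port:6379
master_link_status:up
master_sync_in_progress:0
slave_repl_offset:2122994
"""

info = parse_info_replication(REPLICA_INFO)
print(info["master_host"], replica_is_healthy(info))
```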

Sentinel configuration

At this point we have 2 Redis nodes in replication: one is the master and the other is the slave. In case of problems on the master, we could manually promote the slave to master and still perform all read/write operations on our data.
The role of Sentinel is to constantly monitor the status of the nodes and automatically switch the roles of the servers in case of problems.

Proceed with Sentinel configuration on our 3 nodes. The Sentinel configuration file is /etc/redis/sentinel.conf.

Node redis 1 (IP 172.31.6.218)

	bind 172.31.6.218
	port 26379
	sentinel announce-ip 172.31.6.218
	sentinel monitor mymaster 172.31.6.218 6379 2
	sentinel auth-pass mymaster P@ssw0rd
	sentinel auth-user mymaster sentineluser
	sentinel down-after-milliseconds mymaster 3000
	sentinel failover-timeout mymaster 120000

Node redis 2 (IP 172.31.94.134)

	bind 172.31.94.134
	port 26379
	sentinel announce-ip 172.31.94.134
	sentinel monitor mymaster 172.31.6.218 6379 2
	sentinel auth-pass mymaster P@ssw0rd
	sentinel auth-user mymaster sentineluser
	sentinel down-after-milliseconds mymaster 3000
	sentinel failover-timeout mymaster 120000

Node sentinel (IP 172.31.13.69)

	bind 172.31.13.69
	port 26379
	sentinel announce-ip 172.31.13.69
	sentinel monitor mymaster 172.31.6.218 6379 2
	sentinel auth-pass mymaster P@ssw0rd
	sentinel auth-user mymaster sentineluser
	sentinel down-after-milliseconds mymaster 3000
	sentinel failover-timeout mymaster 120000
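
The last argument of sentinel monitor (2 in the files above) is the quorum, i.e. how many Sentinels must agree that the master is unreachable before it is marked as objectively down. Starting the failover additionally requires authorization from a majority of all Sentinels. The small sketch below, with a hypothetical function name, shows the arithmetic for our 3-Sentinel setup.

```python
def can_fail_over(total_sentinels, quorum, reachable_sentinels):
    """Return True if the reachable Sentinels can both reach the quorum
    (mark the master objectively down) and form a majority (authorize
    the failover)."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= quorum and reachable_sentinels >= majority


# Our setup: 3 Sentinels, quorum 2.
for reachable in (3, 2, 1):
    print(reachable, can_fail_over(total_sentinels=3, quorum=2, reachable_sentinels=reachable))
```

With 3 Sentinels and a quorum of 2, the setup tolerates the loss of one Sentinel; with only one Sentinel left, no automatic failover can take place.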

Start Sentinel on all the servers:

	systemctl start redis-sentinel
	systemctl enable redis-sentinel

Now we use redis-cli again, but this time to connect to Sentinel and not the Redis server.

	[root@ip-172-31-13-69 redis]# redis-cli -h 172.31.13.69 -p 26379
	172.31.13.69:26379> sentinel masters
	1)  1) "name"
	    2) "mymaster"
	    3) "ip"
	    4) "172.31.6.218"
	    5) "port"
	    6) "6379"
	    7) "runid"
	    8) "b9b82fc75dd0bab18bb26ac8d38c6867a04626ca"
	    9) "flags"
	   10) "master"
	   11) "link-pending-commands"
	   12) "0"
	   13) "link-refcount"
	   14) "1"
	   15) "last-ping-sent"
	   16) "0"
	   17) "last-ok-ping-reply"
	   18) "321"
	   19) "last-ping-reply"
	   20) "321"
	   21) "down-after-milliseconds"
	   22) "3000"
	   23) "info-refresh"
	   24) "7559"
	   25) "role-reported"
	   26) "master"
	   27) "role-reported-time"
	   28) "3295851"
	   29) "config-epoch"
	   30) "5"
	   31) "num-slaves"
	   32) "1"
	   33) "num-other-sentinels"
	   34) "2"
	   35) "quorum"
	   36) "2"
	   37) "failover-timeout"
	   38) "120000"
	   39) "parallel-syncs"
	   40) "1"

Among the parameters in the output, the most important are the IP of the master node, which in this case is 172.31.6.218; num-slaves, which in our case is 1 (we have only 1 slave node); and num-other-sentinels, which indicates the number of other Sentinel nodes known to this one. If this last value were 0, it would mean that this Sentinel is not communicating correctly with the other Sentinel nodes.
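
A client library typically receives this reply as a flat list of alternating field names and values, which is what the numbered lines above represent. As a rough sketch (the function names and the expected counts are assumptions for this 3-node setup), the list can be turned into a dictionary and the two counters checked like this:

```python
def pairs_to_dict(flat_reply):
    """Convert [field1, value1, field2, value2, ...] into a dict."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of items")
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


def sentinel_view_problems(master, expected_slaves=1, expected_other_sentinels=2):
    """Return a list of human-readable problems; empty means the view looks fine."""
    problems = []
    if int(master["num-slaves"]) < expected_slaves:
        problems.append("fewer slaves than expected")
    if int(master["num-other-sentinels"]) < expected_other_sentinels:
        problems.append("fewer other Sentinels than expected")
    return problems


# Shortened version of the reply shown above.
MASTER_REPLY = [
    "name", "mymaster",
    "ip", "172.31.6.218",
    "port", "6379",
    "num-slaves", "1",
    "num-other-sentinels", "2",
    "quorum", "2",
]

master = pairs_to_dict(MASTER_REPLY)
print(master["ip"], sentinel_view_problems(master))
```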

Failover test

To test whether the failover mechanism works, we shut down Redis on the first node, currently the master, with systemctl stop redis.
After the 3 seconds that we set in the down-after-milliseconds parameter, the log /var/log/redis/sentinel.log shows this:

	16268:X 22 Mar 2023 08:25:02.864 # +sdown master mymaster 172.31.6.218 6379
	16268:X 22 Mar 2023 08:25:02.908 * Sentinel new configuration saved on disk
	16268:X 22 Mar 2023 08:25:02.908 # +new-epoch 6
	16268:X 22 Mar 2023 08:25:02.912 * Sentinel new configuration saved on disk
	16268:X 22 Mar 2023 08:25:02.912 # +vote-for-leader adecd88892552108121902d37e2d279ab4a1dc1c 6
	16268:X 22 Mar 2023 08:25:02.936 # +odown master mymaster 172.31.6.218 6379 #quorum 3/2
	16268:X 22 Mar 2023 08:25:02.936 # Next failover delay: I will not start a failover before Thu Mar 22 08:29:03 2023
	16268:X 22 Mar 2023 08:25:04.061 # +config-update-from sentinel adecd88892552108121902d37e2d279ab4a1dc1c 172.31.94.134 26379 @ mymaster 172.31.6.218 6379
	16268:X 22 Mar 2023 08:25:04.061 # +switch-master mymaster 172.31.6.218 6379 172.31.94.134 6379
	16268:X 22 Mar 2023 08:25:04.061 * +slave slave 172.31.6.218:6379 172.31.6.218 6379 @ mymaster 172.31.94.134 6379
	16268:X 22 Mar 2023 08:25:04.064 * Sentinel new configuration saved on disk
	16268:X 22 Mar 2023 08:25:07.107 # +sdown slave 172.31.6.218:6379 172.31.6.218 6379 @ mymaster 172.31.94.134 6379
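
The line to look for is +switch-master, which lists the master name followed by the old and the new master address. If you want to detect failovers from the log automatically, a regular expression is enough; the sketch below assumes the log format shown above, and the function name is hypothetical.

```python
import re

# +switch-master <name> <old ip> <old port> <new ip> <new port>
SWITCH_MASTER = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_failovers(log_text):
    """Return one dict per +switch-master event found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = SWITCH_MASTER.search(line)
        if match:
            events.append(match.groupdict())
    return events


# Two lines taken from the log above.
SENTINEL_LOG = """
16268:X 22 Mar 2023 08:25:02.864 # +sdown master mymaster 172.31.6.218 6379
16268:X 22 Mar 2023 08:25:04.061 # +switch-master mymaster 172.31.6.218 6379 172.31.94.134 6379
"""

for event in find_failovers(SENTINEL_LOG):
    print(event["name"], event["old_ip"], "->", event["new_ip"])
```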

If we check the status of the master again on Sentinel, we see that the master node is now 172.31.94.134, the one that was previously the slave.

	[root@ip-172-31-13-69 redis]# redis-cli -h 172.31.13.69 -p 26379
	172.31.13.69:26379> sentinel masters
	1)  1) "name"
	    2) "mymaster"
	    3) "ip"
	    4) "172.31.94.134"
	    5) "port"
	    6) "6379"
	    7) "runid"
	    8) "6a6c7352bd37ad247e40c6b498ceb688a8e2ae7e"
	    9) "flags"
	   10) "master"
	   11) "link-pending-commands"
	   12) "0"
	   13) "link-refcount"
	   14) "1"
	   15) "last-ping-sent"
	   16) "0"
	   17) "last-ok-ping-reply"
	   18) "390"
	   19) "last-ping-reply"
	   20) "390"
	   21) "down-after-milliseconds"
	   22) "3000"
	   23) "info-refresh"
	   24) "7036"
	   25) "role-reported"
	   26) "master"
	   27) "role-reported-time"
	   28) "359407"
	   29) "config-epoch"
	   30) "6"
	   31) "num-slaves"
	   32) "1"
	   33) "num-other-sentinels"
	   34) "2"
	   35) "quorum"
	   36) "2"
	   37) "failover-timeout"
	   38) "120000"
	   39) "parallel-syncs"
	   40) "1"

Note that master/slave role management is now Sentinel's responsibility: even after restarting Redis on the node where we stopped it, the master role will not move back to it. That node rejoins as a slave, and the master stays on the second node until a problem arises (Redis stopped or unreachable) or the role is moved manually.
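
To move the master manually, Sentinel offers the SENTINEL FAILOVER command, which forces a failover of the named master without waiting for it to become unreachable. The sketch below only builds the redis-cli command line for it; the helper name is hypothetical, and the host and master name are the ones used in this post.

```python
def manual_failover_argv(sentinel_host, master_name, sentinel_port=26379):
    """Build the redis-cli argument list that asks a Sentinel to force a failover."""
    return [
        "redis-cli",
        "-h", sentinel_host,
        "-p", str(sentinel_port),
        "SENTINEL", "FAILOVER", master_name,
    ]


argv = manual_failover_argv("172.31.13.69", "mymaster")
print(" ".join(argv))
```

On a machine with redis-cli installed, the list can be passed to subprocess.run, or the printed command can simply be typed in a shell. Sentinel chooses which slave to promote, so with a single slave the master role goes back to the other node only once that node has rejoined as a slave.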