MySQL Master/Slave Replication on AWS EC2/EBS

This very brief guide assumes you have already setup a Master/Slave relationship between two MySQL servers hosted on EC2 and are looking for an efficient way to clone large Master databases to the Slave server.

I use this method to clone a 150GB EBS volume containing Master databases to a Slave in about an hour. Most of the hour is waiting for EBS to complete the snapshot of the Master volume.

Requirements and Assumptions

  • Servers must both be hosted on EC2
  • Ideally, servers are running same OS and same OS version
  • Servers MUST be running identical version of MySQL
  • Servers MUST both be MySQL 32 bit or both be MySQL 64 bit. You cannot use this method to move data from MySQL 32 bit to MySQL 64 bit.
  • MySQL data MUST be hosted on a dedicated EBS volume (e.g. /data)
  • MySQL master must be able to shutdown for several seconds/minutes during this process
  • No MySQL slaves currently exist, or you are reloading ALL of your slaves. You MUST skip the RESET MASTER step if need to maintain existing slave relationships.

Let’s begin!

MASTER SERVER

We will open a mysql console on the master server, grant replication rights to the new slave server IP, perform a global lock, reset our master binary logs (optional), make note of our master binary log file position, and stop our MySQL server daemon.

mysql
GRANT REPLICATION SLAVE ON *.* to 'repl'@'172.x.x.x' IDENTIFIED BY 'repl-password-here';

# Close all open tables and locks all tables for all databases
flush tables with read lock;

# Delete all binary log files, reset binary log index, create new binary log file.
# WARNING: Skip this command if you need to maintain existing slave relationships!
# WARNING: Only reset master if you are reloading ALL slaves.
reset master;

# Make note of the current master file name and file position.
show master status;
 +------------------+----------+--------------+------------------+
 | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
 +------------------+----------+--------------+------------------+
 | mysql-bin.000001 | 107      |              |                  |
 +------------------+----------+--------------+------------------+

# Stop your MySQL server
/etc/init.d/mysql stop

At this point, you need to freeze your volume (example for XFS volume below) or temporarily shutdown your EC2 instance.

xfs_freeze -f /myxfs
# Take snapshot of file system 
xfs_freeze -u /myxfs

Once your volume has been frozen (or your EC2 instance has been shutdown), login to AWS console and take a snapshot of the dedicated volume that contains your MySQL data. My understanding is that it is safe to unfreeze the volume (or start your EC2 instance) as soon as AWS begins performing the EBS snapshot. You should NOT need to wait for the EBS snapshot to complete (e.g. an hour for a 150GB volume).

Once your EBS snapshot begins, unfreeze your volume and restart MySQL (or start your EC2 instance).

# Start MySQL server daemon
/etc/init.d/mysql start

# Open MySQL console
mysql

# Release global lock on all of your tables, 
# so writes can be performed again.
unlock tables;

PRO TIP: Since we store our MySQL data on a dedicated XFS volume, we use “ec2-consistent-snapshot” to freeze the XFS filesystem, begin an EBS snapshot, and unfreeze the XFS filesystem in just a few seconds. We use this to snapshot our servers (and have successfully tested/restored the snapshots). We have a custom script that runs along side this and automatically purges old snapshots. These scripts require creating AWS credentials and require configuring IAM permissions for the EC2 instance to request the EBS snapshot and view/delete old EBS snapshots. It is easier to manually create the EBS snapshot from the AWS console if you are only needing to do this once in a while.

Your MASTER SERVER should be fully operational at this point. You must wait for AWS to finish writing the EBS snapshot before proceeding.

SLAVE SERVER

Monitor EBS snapshot progress in your AWS console. Once the EBS snapshot is complete, create a new EBS volume and select the snapshot as the source for that new volume.

Make note of the old volume device name in AWS console AND in your EC2 server console. In our case, AWS shows our “/data” volume as /dev/sdb but EC2 shows the device as /dev/xvdb1.

Unmount the existing “/data” volume in the slave EC2 server console.

unmount /data

Force detach the existing “/data” volume for the slave EC2 server in your AWS console.

WARNING: Make sure you are detaching the correct volume for the correct server! A forced detach is no different than pulling a disk out of a server and will likely lead to data loss/corruption if you detach a mounted disk!

Attach the new “/data” volume to your slave EC2 server in your AWS console. Specify the volume device name shown for the old/existing volume (e.g. “/dev/sdb”).

Mount the new “/data” volume in your slave EC2 server console. If the server cannot find the volume, the volume device name may have changed. Determine new volume device name, update /etc/fstab, and try to mount again.

mount /data

At this point, an exact copy of the “/data” volume from our master MySQL server is now mounted on our slave MySQL server.

PRO TIP: AWS will restore the entire volume from a snapshot in just a few seconds, thanks to their lazy loading technique. MySQL performance will be slow until the data has been loaded. You can pre-load all EBS volumes if you are concerned about performance. Example:

# Determine device name (e.g. /dev/xvdX) for your "/data" volume
lsblk
# Read entire data volume forces EBS to write entire volume from snapshot
sudo dd if=/dev/xvdX of=/dev/null bs=1M

If you choose to pre-load your data, you can proceed with the steps below while the “dd” command above is running. We do not pre-load data for slaves that are only used to run mysqldump backups.

A few more commands, and our slave MySQL server should begin syncing to the master!

# Start the MySQL server daemon
/etc/init.d/mysql start

# Open the MySQL console
mysql

# Configure your Master settings
# NOTES:
# * Must input correct IP, username, password
# * Must input LOG FILE and LOG POS shown before starting an EBS snapshot
# * If using SSL, generate SSL certificates in advance and input paths.

# Example WITHOUT SSL
CHANGE MASTER TO
 MASTER_HOST='172.x.x.x',
 MASTER_USER='repl',
 MASTER_PASSWORD='repl-password-here',
 MASTER_LOG_FILE = 'mysql-bin.000001',
 MASTER_LOG_POS = 107;

# Example WITH SSL
CHANGE MASTER TO
    MASTER_HOST='172.x.x.x',
    MASTER_USER='repl',
    MASTER_PASSWORD='repl-password-here',
    MASTER_SSL=1,
    MASTER_SSL_CA = '/etc/ssl/certs/ssl-cert-FQDN-bundle.crt',
    MASTER_SSL_CAPATH = '/etc/ssl/certs/',
    MASTER_SSL_CERT = '/etc/ssl/certs/ssl-cert-FQDN.crt',
    MASTER_SSL_KEY = '/etc/ssl/private/ssl-cert-FQDN.key',
    MASTER_SSL_CIPHER = 'DHE-RSA-AES256-SHA',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 107;

# Start slave replication
START SLAVE;

If all goes well, your slave should begin replicating data from the master! Check slave replication status.

show slave status \G

Slave status should show “X seconds behind master”. If “X” is 0, replication is current. If “X” > 0, replication is working and X should slowly continue to decrease each time you show slave status.

If “X” is “N/A”, replication is not working and you likely need to resolve errors. Here are two common troubleshooting tips I follow.

TROUBLESHOOTING

I do not provide detailed master/slave troubleshooting. Here are a few basic

Confirm slave can connect to master. The following command is also helpful for troubleshooting SSL, but should connect to the server and output your SSL Cipher (e.g. NONE) even if you are not using SSL. Useful for identifying and resolving basic network level or permission issues, including AWS Security Group problems or MySQL replication username/password problems.

mysql -h 172.x.x.x -u repl -p -e "SHOW STATUS LIKE 'Ssl_cipher';"

Resolve errors regarding duplicate records. With the method above, I occasionally encounter an issue where the slave will try to run several queries that were already run on the master prior to the snapshot. The “show slave status” error will tell you that a duplicate key exists, or a duplicate record cannot be inserted.

To resolve this issue, I run the following commands to stop the slave, skip ONE replication query, and start the slave.

STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
SHOW SLAVE STATUS \G;

This will normally resolve the duplicate record error. This may lead to a NEW duplicate record error. Our database receives several writes per second when idle. I usually have to repeat this 2-5 times. I would expect performing the global write lock and then showing the master status would give me an exact binary log position and avoid this duplicate record issue? If you know why this is happening, please leave a comment below!

I can not assist you with replication errors! MySQL replication is usually straightforward, but many different issues can lead to replication errors. Please perform your own research if you have replication errors. If you would like to share your replication error and your solution in a comment below, I will approve your comment for others to see.