Articles

Troubleshoot ISP Performance (Internet Latency and Packet Loss) with Ping, Traceroute, and MTR

I have managed networks for the past 20+ years. I am able to use my experience to very quickly diagnose ISP problems that arise with my own home Internet connection. I am documenting my basic troubleshooting steps for the benefit of anyone else who is having trouble with their Internet connection.

IMPORTANT: If you are troubleshooting a problem with your Internet connection, be sure to disable any VPN connection before you run any of these tests!

Terms

The most important terms to understand when troubleshooting problems with your Internet connection are “Latency”, “Packet Loss”, and “Bandwidth”. I also define related terms below, including the tools we will use.

Latency: This is the round-trip time it takes for a “packet” of information to travel from your device, to the remote host, and back to your device. Usually measured in milliseconds (ms). 1000 ms = 1 second.

Ping: This tool sends one or more “packets” of information to a remote host (aka “Echo Request”) and expects to receive the same number of packets back from the remote host (aka “Echo Reply”). Ping will display the Latency for each packet, or will tell you that a reply was not received.

Packet Loss: This indicates the number of “packets” that were lost when pinging a remote host. Packet Loss is measured in %. If 10 packets were sent and only 9 responses (90%) were received, Packet Loss would be 10%.
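The Packet Loss arithmetic above can be written as a tiny helper. This is only a sketch of the calculation; the function name is my own and is not part of any ping tool.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Return packet loss as a percentage of the packets that were sent."""
    if sent <= 0:
        raise ValueError("sent must be a positive number of packets")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    # Lost packets divided by sent packets, expressed as a percentage.
    return 100 * (sent - received) / sent


# The example from the definition above: 10 packets sent, 9 replies received.
print(f"{packet_loss_percent(10, 9):.0f}% packet loss")
```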

Traceroute: This tool maps each “hop” between your computer and a remote computer (and collects Latency for each “hop”) by taking advantage of an IP feature called “Time to Live” (aka “TTL”), which limits the lifespan of a packet on the Internet. Every router that forwards a packet lowers its TTL by 1, and the router that lowers it to 0 discards the packet and sends an error reply back to the sender. Traceroute sends 3 probe packets with a TTL value of 1 toward the remote computer, which causes the 1st “hop” between your computer and the remote computer to reply 3 times. The process is repeated with a TTL of 2 to collect information from the 2nd hop, and so on until the entire path is mapped, showing Latency (and any missing replies) for each “hop”.

MTR: This tool is very similar to Traceroute, but provides a continuous real-time view of Packet Loss and Latency at each “hop” between your device and the remote host. Instead of sending 3 probes to each TTL value and expecting 3 replies from each “hop” like Traceroute, MTR continuously sends 1 probe to each TTL value and updates the Packet Loss and Latency stats for each hop.

Bandwidth: This is the maximum amount of data that can be transferred per second between your computer and a remote host. Bandwidth is usually measured in “bits per second” or “bytes per second”. 1 byte = 8 bits. “Download” speed (from the Internet to your device) and “Upload” speed (from your device to the Internet) are usually measured separately. For example, a Gigabit Cable connection may have a “1 Gig” download speed (aka 1000 Mbps) and a “40 Mbps” upload speed, while a Gigabit Fiber connection may have a “1 Gig” download and “1 Gig” upload speed.
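Because speeds are quoted in bits and file sizes in bytes, a quick conversion helps when sanity-checking a plan. The sketch below uses the example speeds above; the 1000 MB file size is a hypothetical value I picked, and the result is a best case that ignores protocol overhead.

```python
BITS_PER_BYTE = 8


def mbps_to_megabytes_per_second(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE


def transfer_seconds(size_megabytes: float, mbps: float) -> float:
    """Best-case time to move a file of the given size at the given speed."""
    return size_megabytes * BITS_PER_BYTE / mbps


# "1 Gig" download and "40 Mbps" upload from the Gigabit Cable example.
print(mbps_to_megabytes_per_second(1000), "MB/s download")
print(mbps_to_megabytes_per_second(40), "MB/s upload")

# Hypothetical 1000 MB file: upload time on cable vs. fiber.
print(transfer_seconds(1000, 40), "seconds at 40 Mbps")
print(transfer_seconds(1000, 1000), "seconds at 1000 Mbps")
```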

Ping

This is the first tool I reach for when troubleshooting problems with my Internet connection. This tool can tell me the following:

  1. Do I have any Internet connectivity?
  2. How frequently am I losing Internet connectivity?
  3. Do I have any Internet latency issues?
  4. How often am I having Internet latency issues?

Begin by pinging a reliable remote host such as “google.com” or “aws.amazon.com”. To run a continuous ping, type “ping aws.amazon.com” in a Mac/Linux Terminal or type “ping -t aws.amazon.com” in a Windows Command Prompt.

In the example above, I ran “ping aws.amazon.com” then pressed CTRL+C after 10 pings to stop the ping and show a summary. My latency ranged from 34ms to 44ms. My packet loss was 0%.
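If you save ping output to a file, a short script can produce the same kind of summary. This is a rough sketch that assumes reply lines contain “time=… ms” and lost packets show up as “Request timeout” (Mac) or “Request timed out” (Windows). The sample lines are made up for illustration and are not my actual results.

```python
import re

# Matches "time=34.1 ms" (Mac/Linux) as well as "time=34ms" or "time<1ms" (Windows).
TIME_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def summarize_ping(lines):
    """Count replies and timeouts and report the latency range in ms."""
    times = []
    lost = 0
    for line in lines:
        match = TIME_PATTERN.search(line)
        if match:
            times.append(float(match.group(1)))
        elif "timeout" in line.lower() or "timed out" in line.lower():
            lost += 1
    sent = len(times) + lost
    return {
        "sent": sent,
        "received": len(times),
        "loss_percent": 100 * lost / sent if sent else 0.0,
        "min_ms": min(times) if times else None,
        "max_ms": max(times) if times else None,
    }


# Hypothetical sample output (203.0.113.10 is a documentation-only address).
sample = [
    "64 bytes from 203.0.113.10: icmp_seq=0 ttl=57 time=34.1 ms",
    "Request timeout for icmp_seq 1",
    "64 bytes from 203.0.113.10: icmp_seq=2 ttl=57 time=44.0 ms",
]
print(summarize_ping(sample))
```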

Your latency will vary depending on the type of Internet connection you have, your location, and the location of the remote host. The further a remote host is from your location, the higher the latency. If you are in the United States, latency to another host in the U.S. (e.g. “amazon.com”) will usually be much lower than latency to a host in Europe (e.g. “amazon.de”) or Asia (e.g. “amazon.cn”).

If you only get a “Request timeout” instead of the normal replies shown above, try to ping a different hostname or IP address. Here are some personal favorite hostnames and IP addresses:

  • google.com
  • aws.amazon.com
  • 208.67.220.220 (DNS resolver operated by OpenDNS)
  • 208.67.222.222 (DNS resolver operated by OpenDNS)
  • 1.1.1.1 (DNS resolver operated by CloudFlare)

TIP: If I am having intermittent problems with my Internet connection, I will leave a Terminal window running a continuous audible ping in the background (e.g. “ping -i2 -A google.com” on Mac). This command will run a ping every 2 seconds and I will hear a bell sound anytime a packet is dropped. Anytime I feel like a website or app is not responding normally, I can switch to the Terminal window and look at the ping output to see if any recent pings are showing high latency or dropped packets.

TIP: If you can ping an IP address but you cannot ping a hostname, review your DNS settings. Incorrect DNS settings can prevent you from accessing ANY websites by hostname, but would still allow you to browse to a website by IP address. For example, here is a website you can browse to by IP address even if you are having DNS problems: https://1.1.1.1/.
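The same “IP works but hostname does not” check can be scripted. Here is a minimal sketch using Python's standard socket module. The helper name and the swappable resolver argument are my own additions so the function can be tried without a working connection.

```python
import socket


def can_resolve(hostname, resolver=socket.gethostbyname):
    """Return True if the hostname resolves to an IP address via DNS."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False


def broken_dns(hostname):
    """Stand-in resolver that behaves like a machine with bad DNS settings."""
    raise socket.gaierror("simulated DNS failure")


# With the simulated failure, the check reports that DNS is the problem.
print(can_resolve("google.com", resolver=broken_dns))
```

On a real machine, calling can_resolve("google.com") with the default resolver performs an actual DNS lookup. If that returns False while a ping to 1.1.1.1 succeeds, DNS settings are the likely culprit.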

Traceroute

When ping is reporting high latency on my Internet connection, I can use traceroute to determine where the high latency is occurring. The issue is almost always the first “hop” outside of my home network, indicating a problem with my home Internet connection. However, the latency issue could be further upstream between my Internet provider and another network.

Here is an example of output from a traceroute. In this example, I ran a traceroute to IP address “1.1.1.1”, but you can run a traceroute to any hostname or IP address. In this example, I used the “-n” option to skip name lookups, which helps the traceroute run a little faster. However, showing names can help you determine which network each hop belongs to (e.g. your ISP, AT&T, Amazon, Google, Microsoft, etc).

Note: Mac/Linux users type “traceroute 1.1.1.1”, Windows users type “tracert 1.1.1.1”

Observations from the traceroute output above:

  1. Hop 1 is an IP address inside my home network. The best ping time is 2.655 ms, which seems reasonable for a ping to a device on my home network.

    Wireless: If I was seeing very high latency (e.g. >50 ms) to this very first hop on my network and I was using WiFi, I would switch from wireless to a hard-wired network connection by plugging directly into my home router and I would run the test again to see if the latency issue disappears. High latency that only happens on WiFi could indicate a wireless interference problem, an older WiFi access point or router that cannot handle the number of devices or your bandwidth needs, or a failing WiFi access point.

    Wired: If I was seeing very high latency (e.g. >50 ms) to this very first hop on my network and I was using a hardwired network connection, I would suspect a problem with my network wiring, a network switch, or my network router. The performance problem is likely inside my network and no fault of the ISP.
  2. Hop 2 is the first internet device outside of my home network. It is not unusual for one or more hops in a traceroute to not respond. There are a number of reasons why this hop might not respond, including IP addressing or IP filtering.
  3. Hop 3 is the first internet device outside my home network that I can ping. This is a router at my ISP. The best ping time is 8.521 ms, which seems reasonable for a ping to a local/regional ISP router outside my home network. If I see a significant increase in latency from Hop 1 (inside my network) to this next hop (at my ISP), I would suspect an issue with my router, my cable modem, or my ISP connection.

    Tip: Plug directly into your cable modem with a network cable and repeat the test to eliminate the possibility that the latency issue is being caused by your router.

    Tip: Perform a cable modem swap and repeat the test to reduce the possibility that the latency issue is being caused by your cable modem. I say “reduce” instead of “eliminate” because I’ve swapped cable modems, only to find that my brand new cable modem was also faulty.
  4. Hops 4, 6, 7: Notice that these hops show latency for 2 or 3 different IP addresses. This is very common and happens when traffic can travel multiple redundant paths to the same destination.
  5. Hop 8 is the remote host. It is NOT unusual for the remote host to block ping replies, so do not be surprised if the last hop(s) show no response (* * *).
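The “where does latency jump?” question from the observations above can be answered mechanically. In this sketch, the values for Hops 1 and 3 come from my traceroute and the values for Hops 4 and 5 are hypothetical placeholders. Hops that did not respond are recorded as None and skipped.

```python
def largest_latency_jump(hops):
    """Return (from_hop, to_hop, increase_ms) for the biggest latency increase.

    hops is a list of (hop_number, best_ms), where best_ms is None for a hop
    that did not respond (shown as * * * in traceroute output).
    """
    responding = [(number, ms) for number, ms in hops if ms is not None]
    best = None
    for (prev_number, prev_ms), (number, ms) in zip(responding, responding[1:]):
        increase = ms - prev_ms
        if best is None or increase > best[2]:
            best = (prev_number, number, increase)
    return best


hops = [
    (1, 2.655),  # home router (from the traceroute above)
    (2, None),   # no response
    (3, 8.521),  # ISP router (from the traceroute above)
    (4, 9.0),    # hypothetical
    (5, 9.4),    # hypothetical
]
from_hop, to_hop, increase = largest_latency_jump(hops)
print(f"Largest increase: hop {from_hop} -> hop {to_hop} (+{increase:.3f} ms)")
```

A small increase like this one is normal. The result is only interesting when the jump is large, in which case the segment between those two hops is where I would start looking.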

TIP: If I want to determine who an IP address belongs to, I perform an IP WHOIS lookup on the ARIN website (https://www.arin.net/). The results will tell me who is responsible for that IP address (e.g. my ISP, “Amazon”, “Google”, etc.). If the result mentions another IP registry, repeat the lookup on their website, e.g. RIPE (https://www.ripe.net/), APNIC (https://www.apnic.net/), etc.

MTR

This free tool performs a continuous traceroute. I usually reach for this tool instead of traceroute because of its speed and because of its continuous nature.

Here is an example of output from MTR after 30 seconds. Because MTR requires special permissions, I had to type “sudo mtr 1.1.1.1” in my Mac Terminal. I stopped MTR by pressing CTRL+C.

Observations from MTR output:

  1. Overall, this output is nearly identical to the traceroute output. See traceroute observations above.
  2. Because I did not skip name resolution, I can see that Hop 6 belongs to Equinix, likely in Chicago (chi). I can also see that my final hop (1.1.1.1) has a vanity reverse name of “one.one.one.one”.

Bandwidth

I can identify the cause of most Internet performance problems using some combination of Ping and Traceroute (or MTR), but it can be helpful to perform bandwidth tests. For example, I used a bandwidth speed test to confirm that my service was only upgraded to 500 Mbps when I upgraded to a Gigabit plan. The ISP was able to find/resolve the issue right away.

Here are a few of my favorite tests:

  • Ookla Speed Test (https://www.speedtest.net/) — I like that I can easily save an image of the results
  • Google Speed Test (https://www.google.com/search?q=Internet+Speed+Test) — I like that this is built into the Google search result page
  • Your ISP Speed Test — I like to use these test results when working with my ISP to troubleshoot speed issues. If these results are significantly better than one of the other speed tests, I’ll share both results with my ISP.

If you need to measure the available bandwidth between two Linux hosts, consider using “iperf”. You must be able to install the tool on both hosts.

Thanks for Reading!

I intend to update this article with example outputs when my Internet connection is having issues or some point between my Internet provider and another network is having issues.

Did you find this helpful? Let me know by sending me a comment. I tend to update and maintain posts more frequently if I know others find them helpful. Thanks for visiting!

AWS WAF (Web Application Firewall) with Laravel Getting Started Guide

Our SaaS runs a Laravel PHP web application on AWS ECS (Elastic Container Service) behind an AWS ALB (Application Load Balancer). Manually configuring the AWS WAF (Web Application Firewall) to protect our web application was very easy.

This guide will walk you through manually configuring AWS WAF to protect your Laravel PHP web application running behind an AWS ALB. Keep in mind that best practice is to write all of your AWS infrastructure as code so that your infrastructure can be easily rebuilt or duplicated. You should write a CloudFormation template that can build your WAF after you become more familiar with AWS WAF.

Pricing

Like any other AWS service, AWS WAF pricing was not clear at first. We created an ACL ($5/mo), then added several AWS Managed Groups ($1/mo each), then manually created a rule of our own to allow certain traffic that was being blocked by the AWS Managed Groups ($1/mo). Our total cost to enable WAF on our application was $11/mo.

TIP: Once the ACL has been created and configured, the same ACL and associated rules can be applied to multiple resources in your account (e.g. one or more ALB plus one or more CloudFront distributions, etc) meaning we only pay the $11/mo once even if we use this WAF configuration on multiple resources. You can only configure an ALB to use one WAF ACL at one time.

Terminology

ACL ($5/mo)

The ACL contains all of the rules you want to apply to your web application. Your web application will only need one ACL. If you have multiple web applications and you need different WAF rules for each application, plan on creating a separate ACL for each application.

Rules ($1/mo each)

You can manually create a variety of rules to block or allow certain situations. For example, we might choose to always allow traffic from a specific IP with a request header for a specific target hostname. This is helpful when fine-tuning the managed rules provided by AWS. We have had to create very few of our own rules.

You can place one or more rules into a rule group. This step is optional.

Marketplace Rule Groups ($/mo price varies)

You can purchase managed rule groups from various security vendors through the AWS marketplace. Pricing for a managed rule group is typically $10-20/month regardless of how many rules the vendor places in their rule group. Vendors offered these managed rule groups before AWS began offering their own managed rule groups (below). We do not currently use any marketplace rule groups.

Some customers complain that the AWS managed rules generate too many false-positives (e.g. blocking legitimate users) for their applications, so they prefer the marketplace rules. You will have to do your own research and decide if the marketplace rules are worth the additional cost compared to the AWS managed rules for your application.

AWS Managed Rule Groups ($1/mo each)

When AWS released WAF v2, AWS announced AWS Managed Rules. Like marketplace rule groups (above), AWS managed rule groups cost a flat fee ($1/mo per group) regardless of how many rules the group may actually contain. If you are new to WAF, I would recommend starting with these rule groups!

Configuring WAF for Laravel

Each ACL has a capacity for up to 1500 compute units. Each individual rule requires a few units. Each AWS managed rule group requires a pre-defined number of compute units. If you are just getting started with AWS WAF and you are wanting to protect a Laravel PHP application on Linux servers, you might consider adding the following AWS managed rule groups to your ACL:

  1. Core rule set (700 units)
  2. Known bad inputs (200 units)
  3. Linux operating system (200 units)
  4. PHP application (100 units)
  5. POSIX operating system (100 units)

Pricing: If you add these 5 rule groups to a single WAF ACL, your total cost would be $10/mo. We calculate this by adding $5/mo for the ACL plus $5/mo for these 5 rule groups ($1/mo per rule group).

Capacity: These 5 rule groups would consume a total of 1300 units of your WAF’s total capacity of 1500 units, leaving some headroom for an additional managed ruleset or your own manual rules.
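The pricing and capacity math above is easy to script if you want to try different combinations of rule groups. This sketch only covers the flat monthly fees and unit counts quoted in this article; the constant and function names are my own.

```python
ACL_MONTHLY_USD = 5          # per ACL
RULE_GROUP_MONTHLY_USD = 1   # per AWS managed rule group
CUSTOM_RULE_MONTHLY_USD = 1  # per manually created rule
ACL_CAPACITY_UNITS = 1500

# Capacity units per AWS managed rule group, as listed above.
MANAGED_GROUP_UNITS = {
    "Core rule set": 700,
    "Known bad inputs": 200,
    "Linux operating system": 200,
    "PHP application": 100,
    "POSIX operating system": 100,
}


def monthly_cost(managed_groups: int, custom_rules: int = 0) -> int:
    """Flat monthly cost in USD for one ACL with the given rules."""
    return (ACL_MONTHLY_USD
            + managed_groups * RULE_GROUP_MONTHLY_USD
            + custom_rules * CUSTOM_RULE_MONTHLY_USD)


def capacity_used(group_units: dict) -> int:
    """Total capacity units consumed by the selected rule groups."""
    return sum(group_units.values())


used = capacity_used(MANAGED_GROUP_UNITS)
print(f"${monthly_cost(len(MANAGED_GROUP_UNITS))}/mo for the 5 rule groups")
print(f"${monthly_cost(len(MANAGED_GROUP_UNITS), custom_rules=1)}/mo with 1 custom rule")
print(f"{used} of {ACL_CAPACITY_UNITS} units used, {ACL_CAPACITY_UNITS - used} left")
```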

Instructions

  1. Create your first WAF ACL (e.g. “application-name-waf”)
  2. Go to the ACL “Rules” tab, choose “Add Rules”, then choose “Add managed rule groups”
    1. Expand “AWS managed rule groups”
    2. Toggle “Add to web ACL” next to each AWS managed rule group you would like to use
    3. IMPORTANT: Click the “Edit” button below each toggle, then enable “Set all rule actions to count” for the rule group. When we first enable WAF on our application, this option will prevent the rule group from actually blocking any traffic and will allow us to figure out if legitimate traffic will be blocked BEFORE we enable blocking (see below).
    4. Repeat prior 2 steps for each AWS managed rule group you would like to use.
  3. Go to the ACL “Associated AWS Resources” tab, then choose “Add AWS resources”
    1. Assuming your application is running behind an AWS ALB, you would choose “Application Load Balancer”, then you would choose your application’s ALB (e.g. “application-name-alb”), then you would click “Add”. You could also configure an API Gateway or AppSync on this page.
  4. Go to the ACL “Logging and Metrics” tab, click “Edit” next to logging.
    1. Configure your “Logging destination” as “CloudWatch Logs log group”. You will need to create a new Log Group with a name that begins with “aws-waf-logs-”.
      1. Log Group TIP: I suggest the full Log Group name use the required prefix plus your WAF ACL name (e.g. “aws-waf-logs-application-name-waf”) so that your WAF ACL name corresponds with your Log Group name.
      2. Log Group TIP: When creating your log group in CloudWatch under Log Groups, be sure to configure a Retention Period for the group. 30 days is probably fine for most people. We have a longer retention period so we can go back further to look at trends, but please be aware that you have to pay a small amount for CloudWatch log storage ($0.03/mo per GB). If your web application receives a LOT of traffic, keep an eye on the “Stored bytes” value on your Log Group page in CloudWatch and adjust retention as necessary.
      3. Redacted fields are optional. Skip for now.
      4. Filter logs are optional. Skip for now.
      5. Click “Save”
      6. Note: If you are prompted for the names of your CloudWatch metrics, you can accept the default names (e.g. “AWS-AWSManagedRulesCommonRuleSet”, etc)

Congrats! At this point, your new AWS WAF is inspecting traffic for your Laravel PHP web application running behind AWS ALB. Because every rule group is set to count, nothing is being blocked yet.
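Once you are comfortable with the console steps, the same ACL can be described as code. The sketch below is not the template we use in production. It uses Python to generate a CloudFormation template (as JSON) for an AWS::WAFv2::WebACL resource containing the five AWS managed rule groups in count mode. The ACL name is the placeholder used earlier, the metric names follow the “AWS-…” defaults mentioned above, and the ALB association and logging configuration are left out. Please verify the property names against the current CloudFormation documentation before deploying.

```python
import json

# AWS managed rule group names for the five groups listed above.
MANAGED_GROUPS = [
    "AWSManagedRulesCommonRuleSet",          # Core rule set
    "AWSManagedRulesKnownBadInputsRuleSet",  # Known bad inputs
    "AWSManagedRulesLinuxRuleSet",           # Linux operating system
    "AWSManagedRulesPHPRuleSet",             # PHP application
    "AWSManagedRulesUnixRuleSet",            # POSIX operating system
]


def visibility(metric_name):
    """Metrics and sampling settings required on the ACL and on every rule."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def managed_rule(name, priority, count_only=True):
    """One rule entry that references an AWS managed rule group."""
    return {
        "Name": f"AWS-{name}",
        "Priority": priority,
        # Count = observe only; None = let the group's own actions (block) apply.
        "OverrideAction": {"Count": {}} if count_only else {"None": {}},
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": name}
        },
        "VisibilityConfig": visibility(f"AWS-{name}"),
    }


def build_template(acl_name, count_only=True):
    """Return a CloudFormation template (as a dict) for a regional web ACL."""
    rules = [managed_rule(name, priority, count_only)
             for priority, name in enumerate(MANAGED_GROUPS)]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "WebACL": {
                "Type": "AWS::WAFv2::WebACL",
                "Properties": {
                    "Name": acl_name,
                    "Scope": "REGIONAL",  # REGIONAL is the scope used for an ALB
                    "DefaultAction": {"Allow": {}},
                    "VisibilityConfig": visibility(acl_name),
                    "Rules": rules,
                },
            }
        },
    }


print(json.dumps(build_template("application-name-waf"), indent=2))
```

Calling build_template("application-name-waf", count_only=False) produces the same template with blocking enabled, which mirrors the console toggle described later in this guide.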

Monitoring WAF Logs for Blocked Traffic

You have configured your first WAF ACL with a few AWS managed rules with each rule action set to count (instead of block) and you have applied the WAF ACL to your ALB. At this point, you should log a few days or weeks of usage, then review all of the traffic that would have been blocked by your new WAF ACL.

Viewing WAF Activity Graph in the WAF Console

Open the AWS WAF console and drill-down to your new WAF ACL. The “Overview” tab for the ACL will show a graph of all requests for the selected period (e.g. 1 hour, 3 hours, 12 hours, 1 day, etc).

If the “ALL AllowedRequests” metric and the “application-name-waf AllowedRequests” metric always show the exact same count, your WAF ACL is not detecting any requests that need to be blocked. Make sure you added one or more AWS managed rule groups in the Rules section of your WAF ACL. Otherwise, we may need to wait for more traffic before analyzing your logs.

If you see any “Blocked” metrics, we need to make sure the “count” setting is enabled for each rule group. Go to your WAF ACL, go to WAF Rules, click “Add Rules”, click “Add managed rule groups”, expand “AWS Managed Rule Groups”, find each group where “Add to web ACL” is enabled, and click “Edit”, then make sure “Set all rule actions to count” is enabled.

If the “ALL CountedRequests” metric ever shows a non-zero count, we will need to investigate further. At this point, we use the CloudWatch console to find/review the individual WAF ACL requests that were counted.

Viewing WAF Activity Logs in the CloudWatch Console

Open the AWS CloudWatch console, go to Log Groups, and drill-down to your new WAF log group (e.g. “aws-waf-logs-application-name-waf”), go to Log Streams, click Search all Log Streams. This will allow you to search individual requests for specific activity. If you leave the search input blank and choose a time filter option (e.g. 30 minutes), you will see all individual requests processed by WAF during that time period.

While you may not normally need to look at the individual requests, you should open/expand at least one log entry and become familiar with the payload of the logged request before you begin reviewing the “counted” requests that your new WAF ACL would have normally blocked.

Reviewing WAF Counted Requests in the CloudWatch Console

Open the AWS CloudWatch console, go to Logs Insights, select a Log Period (e.g. 12 hours), select your Log Group (e.g. “aws-waf-logs-application-name-waf”), then run the following query:

fields @timestamp
| filter (@message like 'excludedRules":[{"exclusionType":"EXCLUDED_AS_COUNT","ruleId":' and @message like 'terminatingRuleId":"Default_Action"')
| parse @message '"name":"Host","value":"*"' as headersHost
| parse @message '"name":"host","value":"*"' as headersHostLower
| fields coalesce(headersHost, headersHostLower) as targetHost
| parse @message '"ruleId":"*"' as ruleName
| display @timestamp, httpRequest.clientIp, httpRequest.country, httpRequest.httpVersion, targetHost, httpRequest.uri, ruleName, httpRequest.requestId
| limit 100

This query will filter all requests for rules that were excluded as a count action (i.e. “exclusionType” equals “EXCLUDED_AS_COUNT”) and where the terminating rule was the default action (i.e. “terminatingRuleId” equals “Default_Action”).

This query will parse the host value from the HTTP request headers (as “targetHost”) and will also parse the Rule ID that was counted (as “ruleName”).
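If you export WAF log entries (each entry is a JSON document), the same filter can be applied offline. The sketch below assumes the field names the query matches on (“terminatingRuleId”, “ruleGroupList”, “excludedRules”, “httpRequest.headers”). The sample record is hypothetical and trimmed to just those fields. Unlike the query, which captures only the first “ruleId” it finds, this version returns every counted rule.

```python
def counted_rules(record):
    """Return rule IDs that were counted for a request that was otherwise allowed."""
    if record.get("terminatingRuleId") != "Default_Action":
        return []
    rule_ids = []
    for group in record.get("ruleGroupList", []):
        # excludedRules can be null in a log entry, hence the "or []".
        for excluded in group.get("excludedRules") or []:
            if excluded.get("exclusionType") == "EXCLUDED_AS_COUNT":
                rule_ids.append(excluded["ruleId"])
    return rule_ids


def target_host(record):
    """Return the Host header value, matching the header name case-insensitively."""
    for header in record.get("httpRequest", {}).get("headers", []):
        if header.get("name", "").lower() == "host":
            return header.get("value")
    return None


# Hypothetical, trimmed log record for illustration only.
sample_record = {
    "terminatingRuleId": "Default_Action",
    "ruleGroupList": [
        {
            "ruleGroupId": "AWS#AWSManagedRulesCommonRuleSet",
            "excludedRules": [
                {"exclusionType": "EXCLUDED_AS_COUNT",
                 "ruleId": "SizeRestrictions_BODY"}
            ],
        }
    ],
    "httpRequest": {
        "clientIp": "203.0.113.10",
        "uri": "/api/import",
        "headers": [{"name": "host", "value": "app.example.com"}],
    },
}

print(target_host(sample_record), counted_rules(sample_record))
```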

Your query should look similar to this:

Your query results should look similar to this:

False Positives

Each query result above would have been a blocked request if you had not enabled the “count” action on each of your AWS managed rule groups. You should review all of your query results and determine if any counted requests were legitimate traffic.

If the AWS managed rules are blocking legitimate traffic, you will have to do one of the following:

  1. Create a manual rule that allows the specific traffic condition that is being blocked
  2. Modify your application so that the path or variable or content that is being blocked no longer triggers the rule. For example, you may be able to rename a variable name that is triggering a rule.
  3. Disable a portion of the AWS managed rule group or the entire AWS managed rule group.

In our case, a 3rd party integration uses XML payloads to transmit data to an older API so we created a custom rule that allowed that specific traffic. This one custom rule increased our AWS WAF cost by $1/mo. After saving the rule, WAF allowed us to configure the priority of the rule so that our custom rule was applied before any of the AWS managed rule groups.

Blocking Requests

After you have reviewed your counted requests and have eliminated false positives, you are ready to configure AWS WAF to begin blocking requests!

  1. Open the AWS WAF Console
  2. Go to your WAF ACL
  3. Go to WAF Rules
  4. Click “Add Rules”
  5. Click “Add managed rule groups”
  6. Expand “AWS Managed Rule Groups”
    1. Find each group where “Add to web ACL” is enabled
    2. Click “Edit”
    3. Disable “Set all rule actions to count”
    4. Repeat steps above for each group where “Add to web ACL” is enabled

Congrats! Your new WAF ACL is now protecting your Laravel PHP web application.


Travel Tips: Downtown Charlotte, NC

Planning to visit downtown Charlotte, NC with a few guys for a long weekend. Sharing my travel notes for other first-time visitors.

Resources

Public Transit vs Ride Sharing vs Car Rental

We initially thought we would need to ride share or rent a car, but it sounds like Charlotte, NC has a great public transit system. We plan to take a bus from the airport to downtown (the bus runs every 15 minutes), then take the light rail system and buses around downtown. Cost is about $2/ride or $30/week for unlimited rides.

CATS Bus Route: Airport Terminal to Downtown (Airport to downtown is 21 minute ride)
https://charlottenc.gov/cats/bus/routes/Pages/

CATS Light Rail: Lynx Blue/Gold Rail Lines
https://charlottenc.gov/cats/rail/Pages/

CATS Map of the Blue Line (04/2022). The trains currently stop every 15 minutes during the day and every 30 minutes in the evenings.

Here’s a Google Map of Blue Line from McCullough Station (8312 N Tryon St) to Arrowood Station (7430 South Blvd). We aren’t planning to go to the far ends of the line (I-485 or UNC) so I didn’t include the last few stops, but it is interesting to see where the line travels through the Charlotte Metro area.
https://bit.ly/map-clt-rail-blue

CATS Map of the Gold Line (04/2022). The streetcars currently stop every 20 minutes during the day and every 30 minutes in the evenings. The Blue and Gold lines meet at CTC/Arena (Charlotte Transportation Center).

Charlotte has plans to continue expanding the Gold and Blue lines and to add additional lines, so definitely check the CATS website for the latest maps and schedule.

Nightlife near Light Rail Stations

I compiled a list of the (currently) highest-rated attractions near Charlotte Blue Line Stations w/ the current Google Rating and Review Count. Based on the Google and Yelp reviews I’ve read, Cellar at Duckworth’s, Wooden Robot, and Aura Rooftop get my votes.

9th St Station

7th St Station

CTC/Arena Station

3rd St Station

Stonewall Station

Carson

Bland Station

East/West Blvd Station

New Bern Station

List Inspired by: Bars and Restaurants Along Charlotte’s Blue Line Light Rail
https://www.thrillist.com/eat/charlotte/bars-and-restaurants-along-charlottes-blue-line-light-rail

Charlotte Attractions

Lodging

Based on comments I’ve seen in many different downtown B&B reviews, if you can find a hotel or B&B near a public transit station, you’ll be able to get around the downtown area without renting a car or relying on ride share services (e.g. Lyft, Uber).

We initially booked a B&B so that we could stay together, but the host cancelled on us 4 days before the trip due to a water/flooding emergency. We ended up booking a hotel near one of the Blue Line light rail stations. We couldn’t find another B&B near a Blue Line station that seemed like a good fit for us.


How to apply Excel “Center Across Selection” in PhpSpreadsheet XLSX files

Someone suggested replacing Merged Cells in our Excel-formatted (.xlsx) reports with “Center Across Selection” which gives the same visual result without the undesirable side effects of merged cells.

We currently use PhpSpreadsheet v1.16 to generate our Excel-formatted (.xlsx) reports.

I could NOT find any PhpSpreadsheet documentation or discussion of this feature, other than the following PhpSpreadsheet CHANGELOG reference from over a decade ago:

Horizontal center across selection - @Erik Tilt

I dug into Microsoft file format reference material and finally determined that the PhpSpreadsheet equivalent to Excel “Center Across Selection” is the “Alignment::HORIZONTAL_CENTER_CONTINUOUS” value (aka “centerContinuous”).

Here’s how to apply Excel “Center Across Selection” to a range in PhpSpreadsheet:

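use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Style\Alignment;

// "centerContinuous" is Excel's "Center Across Selection"; the cells are NOT merged.
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$sheet->getStyle("A1:C1")->getAlignment()->setHorizontal(Alignment::HORIZONTAL_CENTER_CONTINUOUS);
$sheet->setCellValue("A1", "Example");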

The code above will write text “Example” to cell A1. The text will be centered across 3 cells (A1,B1,C1).


AWS Aurora MySQL Manual Failover Instructions – Demote Writer instance to Reader instance

This article is for you if…

  • You are using Aurora MySQL on Amazon Web Services (AWS).
  • You have one Writer instance and one Reader instance.
  • You would like to perform a manual failover (so that your Writer instance is demoted to a Reader instance).

I needed to do this recently, but could not find clear instructions anywhere. Here are step-by-step instructions with screenshots.

  1. Login to your AWS Console
  2. Go to the RDS Console and select your Aurora MySQL Cluster
  3. Highlight your Aurora MySQL Writer Instance, choose Failover from the Actions menu

    Comment: Highlighting a specific instance may not be important. I believe if you have multiple Reader instances, the failover will use the “Failover priority” (e.g. “tier-0”, “tier-1”, … “tier-15”) to determine which Reader to promote to Writer. When I added a temporary 2nd Reader with “Failover priority” of “tier-0” and performed failover, that Reader was promoted to Writer. Please post a comment at the bottom of this page if you can confirm how failover is handled with multiple Reader instances! If the “Failover priority” is used to select which Reader instance will be promoted, please confirm if “tier-0” is indeed considered first when promoting a Reader to a Writer and I will update these instructions.

  4. Confirm you would like to failover the cluster.
  5. Go to the RDS Console and select your Aurora MySQL Cluster. The page will not show any changes for a few seconds.
  6. Go to the RDS Console and select your Aurora MySQL Cluster (or refresh the page). Repeat until the page shows that the Writer instance has been demoted to a Reader instance.
  7. Failover complete!

In my experience, the failover occurs within 10-15 seconds after submitting the failover request. Our application is aware of the separate reader and writer instances. We have not noticed any application read errors during the manual failover process. However, we have noticed temporary write errors during the failover process. Our application is designed to handle brief write failures and retry the writes, so this was not an issue.
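For anyone wondering what “retry the writes” can look like, here is a minimal sketch of retrying with a growing delay. This is not our application code (our application is PHP). The exception class, function names, and delay values are hypothetical, and the fake writer only simulates a couple of failed writes during failover.

```python
import time


class TransientWriteError(Exception):
    """Stand-in for the error your database driver raises during failover."""


def write_with_retry(write, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write(), retrying on transient errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except TransientWriteError:
            if attempt == attempts:
                raise  # out of retries, let the caller handle the failure
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


def make_flaky_writer(failures):
    """Return a fake write function that fails a fixed number of times first."""
    state = {"remaining": failures}

    def write():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientWriteError("writer is failing over")
        return "ok"

    return write


# Simulate two failed writes, recording the delays instead of really sleeping.
delays = []
print(write_with_retry(make_flaky_writer(2), sleep=delays.append), delays)
```

With the defaults above, the retries span several seconds in total. Tune the attempts and delay so the total wait comfortably covers the failover time you observe in your own cluster.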
