[CONJ-751] Support Aurora mode with a Master only connection - Jira

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: N/A
Component/s: aurora
Labels: None

Description

Related to https://jira.mariadb.org/browse/CONJ-551

Hi!

Following a recent manual failover in production, I was tasked with investigating the long recovery time of our instances on a two-node Aurora cluster. Since we have strict consistency requirements, we always need to use the master instance to avoid inconsistencies caused by replica lag. Therefore, we use the "failover" (or "loadbalance") mode of the driver.

My investigation showed that the connection maximum lifetime was the only mechanism by which connections were reestablished to the newly elected master. Because the failover is manual, the now-reader node is still up, but it throws the "read-only" exception on every modifying query. Nothing handles that exception, so the connections are only reestablished to the new master once their max lifetime has elapsed (usually, DNS propagation has happened by then). This can lead to very long downtime!

We ran tests in our development environment with the aurora mode of the driver and the results were really impressive. With a connection pool of 12 connections and 144 inserts/sec across 12 threads, the downtime following a manual failover was next to none: each thread logged one error and that was it, which means less than a second of downtime. Like I said, impressive, and good job on that!

This brings me to the title of this Jira: would it be possible to have an "aurora" mode combined with the "MasterProtocol", so we could leverage the awesome failover capabilities while retaining the master-only connection and data consistency?

In case someone else finds this, here are some possible workarounds:

Have a background thread that resolves the IP address of the cluster endpoint and evicts the connections of your pool when it changes (Hikari has a neat softEvictConnections method). We will be using this for now.
Have a wrapper around your datasource that catches the read-only exception, evicts the connection and rethrows the exception.
After a manual failover, reboot the new reader (the ex-master) so that the existing connections die.
Set a very small max lifetime on your connections (something like 1-2 minutes).
Have a validation query that checks the read-only status of the instance.
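
For anyone wanting a starting point, here is a rough sketch of the building blocks behind the first two workarounds. It is not code from the driver or from our application: the class and method names (FailoverEviction, EndpointWatcher, isReadOnlyError) are made up for illustration, and the read-only check assumes that the reader rejects writes with MySQL error code 1290 (ER_OPTION_PREVENTS_STATEMENT), which you should verify against the exception you actually see in your logs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.sql.SQLException;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;

/** Illustrative helpers only; every name in this class is hypothetical. */
final class FailoverEviction {

    // Assumption: writes on a read-only node fail with MySQL error 1290
    // (ER_OPTION_PREVENTS_STATEMENT). Check this against your own logs.
    static final int ER_OPTION_PREVENTS_STATEMENT = 1290;

    /** Returns true if the exception or one of its causes carries error code 1290. */
    static boolean isReadOnlyError(SQLException e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof SQLException
                    && ((SQLException) t).getErrorCode() == ER_OPTION_PREVENTS_STATEMENT) {
                return true;
            }
        }
        return false;
    }

    /** Resolves a host name (e.g. the cluster endpoint) to a sorted set of IP addresses. */
    static Set<String> resolve(String host) throws UnknownHostException {
        Set<String> ips = new TreeSet<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            ips.add(address.getHostAddress());
        }
        return ips;
    }

    /** Runs onChange each time the resolved address set differs from the previous one. */
    static final class EndpointWatcher implements Runnable {
        private final Callable<Set<String>> resolver;
        private final Runnable onChange;
        private Set<String> lastSeen; // null until the first successful lookup

        EndpointWatcher(Callable<Set<String>> resolver, Runnable onChange) {
            this.resolver = resolver;
            this.onChange = onChange;
        }

        @Override
        public synchronized void run() {
            Set<String> current;
            try {
                current = resolver.call();
            } catch (Exception e) {
                return; // lookup failed: keep the last known addresses, retry on the next tick
            }
            if (lastSeen != null && !lastSeen.equals(current)) {
                onChange.run(); // e.g. soft-evict every connection in the pool
            }
            lastSeen = current;
        }
    }
}
```

The watcher would typically be scheduled every few seconds on a ScheduledExecutorService, with the resolver set to () -> FailoverEviction.resolve(clusterEndpoint) and the callback set to something like () -> dataSource.getHikariPoolMXBean().softEvictConnections() when using HikariCP. Keep in mind that the JVM caches DNS lookups (networkaddress.cache.ttl), so the polling interval alone does not determine how fast a change is noticed. For the wrapper approach, isReadOnlyError can be called in the catch block before evicting the connection (HikariDataSource has evictConnection) and rethrowing.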

Issue Links

duplicates: CONJ-723 Aurora: allow disabling of load balancing (Closed)

relates to: CONJ-551 Provide a way to ignore the load balancing in AWS Aurora while retaining failover (Closed)

People

Assignee: Diego Dupin

Reporter: Jacques-Etienne Beaudet

Votes: 0

Watchers: 1

Dates

Created: 2019-12-19 22:22

Updated: 2020-03-06 10:11

Resolved: 2020-03-06 10:11
