[CONJ-419] Insert Query failed after failover and not able to get master connection Created: 2017-01-30  Updated: 2017-11-21  Resolved: 2017-11-21

Status: Closed
Project: MariaDB Connector/J
Component/s: Failover
Affects Version/s: 1.5.6
Fix Version/s: 2.1.0, 1.6.3

Type: Bug Priority: Major
Reporter: Seonmi Anderson Assignee: Diego Dupin
Resolution: Fixed Votes: 1
Labels: None
Environment:

AWS Aurora
Spring JdbcTemplate and SimpleJdbcCall
rewriteBatchedStatements=true
Execute INSERT query using jdbcTemplate
Execute Stored Procedure using SimpleJdbcCall


Issue Links:
Relates
relates to CONJ-502 Aurora JDBC connections leaking betwe... Closed

 Description   

Inserting records failed after failover and throws an exception as following;
SQL state [HY000]; error code [1290]; (conn:6) The MySQL server is running with the --read-only option so it cannot execute this statement

Any exception starting with '08xxx' or '25Sxxx' handled and reset connection. Error code '1290' is treated general SQL exception and eventually application instance should be restarted to fix it.
In AuroraProtocol, it passes MustBeMasterConnection before executing query but it seems not used to validate the master connection.
Please help me if I missed any parameters in connection configuration.



 Comments   
Comment by Diego Dupin [ 2017-01-31 ]

Hi Seonmi,
I'll investigate this, in the meantime could you confirm that you haven't set driver to use slave connection ( by using "@Transactional(readOnly = true) or directly with connection.setReadOnly() ).

Comment by Seonmi Anderson [ 2017-02-01 ]

Thanks for looking into the issue.
The database connection was created for Inserting record only, so it was not annotated readOnly.
The connection is used by 'batch insert' up to 100 records at once and insert query (won't rewritten by BatchRewrite) or stored procedure call for single record insert. It seems 'batch insert' transactions are properly closed after failover but calling procedure and insert query for single record are continue throwing "read-only" (1290) exception. Our app persists 3000 requests per sec.

Comment by Diego Dupin [ 2017-02-09 ]

After failover code review, the only possibility i see for now seems to be aurora master having global/session variable "read_only = ON;", but that require SUPER priviledge.
So that's doubtful this is the case, but this part is now handle with this commits 1 and 2 .

That part will be included in 1.5.8, and when failover error occur, now current instance name will be part of the exception, permitting better analysis if this occur again.

Comment by Paul B Johnson [ 2017-02-16 ]

I am no DB expert, but I believe I am seeing the same issue. I have a service which is using Spring Data JPA/Hibernate with mariaDB driver to access AWS Aurora DB. I am including the "aurora" keyword in the JDBC connection string. That is, the connection string is "jdbc:mariadb:aurora://<path-to-aws-aurora-cluster>"

Everything works fine until I execute a manually-invoked (via AWS console) failover. After a short time, I get the error described above:

WARN o.h.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 1290, SQLState: HY000
ERROR o.h.engine.jdbc.spi.SqlExceptionHelper - (conn:7) The MySQL server is running with the --read-only option so it cannot execute this statement

It seems that the appearance of this error is triggered by some DB activity (the error does not always appear immediately after failover). But I have not yet been able to isolate the trigger. Once it starts to fail, all subsequent DB writes fail with this error.

I saw this with mariadb-java-client-1.5.5. I upgraded to 1.5.8 and see the same thing. I do not see where "read only" is set on the DB, but, again, I am no expert. I can attach to cluster with HeidiSQL and read/write without error, so the DB itself is not read-only.

Note that failover back to original instance does not fix the problem. Restarting my service is only way I have found so far to get things working again once they start failing.

Any suggestions are welcome.

Thanks,
Paul

Comment by Paul B Johnson [ 2017-02-16 ]

Just to be super-clear on this: my mariadb driver is configured to talk to the AWS Aurora cluster. The failover is happening in AWS. I was under the impression that this sort of failover would be essentially transparent to the driver. So, I'm confused now. I'm wondering if this could be a bug in AWS clustering?? Do you have any thoughts?

Comment by Seonmi Anderson [ 2017-02-17 ]

Diego,

I tested 1.5.8 yesterday and couldn't be able to resolve the issue. As Paul stated, SQL error continues until ec2 instance is restarted.
You have made code change at line 215 of FailoverProxy but it seems the method never gets called in my case. My SQL exception (error 1290) is caught at line 157 but Driver can't be reset since the error code doesn't start with '08xxx'.

Comment by Paul B Johnson [ 2017-02-17 ]

I have spent more hours today testing. Results are essentially the same. The only new data point is that I tried specifying the DB instance endpoints (instead of the cluster endpoint) in the jdbc: string. Same behavior was observed.

It seems that the issue may be triggered by updating a row in the DB after the failover. The impression is that the original (originally writable, now read-only (as set by AWS)) master instance is cached somewhere with the data read from the row and when i go to update the row it tries to write back to that original instance and fails (this is just my impression).

Comment by Paul B Johnson [ 2017-03-06 ]

Still battling this issue with no progress to report. I did upgrade to tomcat connection pool with hope that this would help, but the failover behavior is the same. Does anyone else have any progress they can report?

Diego: is there a preferred connection pool implementation for the mariadb driver?

Comment by Seonmi Anderson [ 2017-03-06 ]

Paul,

You may try dbcp2 pool instead of tomcat pool. Tomcat pool is preferred by Spring. Hikari, dbcp, and dbcp2 can be used as well.

Thanks,
Seonmi

Comment by Diego Dupin [ 2017-03-08 ]

There is not "prefered" pool for driver. Lot of people use Hikari, dbcp, and dbcp2 or tomcat pool.
The only disavantage of tomcat pool is not having an option "connection max life time" (server has a wait_timeout option that can conflict if not set to a big value, then option sessionVariables=wait_timeout=2147483 must be set)

Comment by Paul B Johnson [ 2017-03-17 ]

Good news, I think. I believe I have figured out that, for me, this issue is triggered by calling the JPA CrudRepository "findOne()" API (http://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html?is-external=true#findOne-ID-). After replacing this call with actual queries on the primary ID, the issue seems to have gone away.

Diego: does this make any sense? Is there a reason that using findOne() could result in writes going to the wrong DB instance?

Comment by Diego Dupin [ 2017-03-21 ]

Spring data reading methods like findAll() and findOne(…) are using @Transactional(readOnly = true). That mean using replicates. That's typically to avoid using master server too much.

pbj To be sure of the issue : your application use findOne with default @Transactional(readOnly = true) and then, triggering a failover, you have an exception. What is this exception ? (must not be "The MySQL server is running with the --read-only" since that error will be thrown for write operation).

Comment by Paul B Johnson [ 2017-03-21 ]

The normal sequence is:
1) use findOne() to get entity from DB
2) modify entity
3) call CrudRepository save() method to store the modified entity back to the DB

[Maybe I should not be using findOne() in this situation, but it works in this normal (non-failover) case.]

The failover sequence is:
1) test above normal sequence
2) perform failover
3) use findOne() to get entity from DB (succeeds)
4) modify entity
5) call JPA CrudRepository save() method to store the modified entity back to the DB (fails with SQL Error: 1290, SQLState: HY000)
6) make any other call that attempts to write to the DB (even if unrelated to the findOne() call in step #3) (fails with SQL Error: 1290, SQLState: HY000)

The only way I have found to recover at this point is to restart my service.

Paul

Comment by Diego Dupin [ 2017-04-03 ]

Hi pbj

This is the normal use of spring data (findOne, update, then save).
I've tryed multiple failover without success to reproduced these issue.

Didn't you set option "failOnReadOnly" to true, that would lead to that ?

If your application use at least java 8, could you reproduced this issue with log enable (profileSql=true) and version 2.0.0-SNAPSHOT ?
Advantage of this version is that logging for master/slaves has been improved, displaying log like :

2017-04-03 18:03:17 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:116(M) - 110,55 ms - "SELECT 1"
2017-04-03 18:03:17 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:55(S) - 119,136 ms - "set autocommit=0"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:55(S) - 118,726 ms - "select user0_.id as id1_0_0_, user0_.email as email2_0_0_, user0_.name as name3_0_0_ from users user0_ where user0_.id=?, parameters [1]"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:55(S) - 123,878 ms - "COMMIT"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:55(S) - 117,939 ms - "set autocommit=1"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:116(M) - 110,357 ms - "set autocommit=0"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:116(M) - 111,551 ms - "update users set email=?, name=? where id=?, parameters ['dd@diego.fr','s45',1]"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:116(M) - 112,953 ms - "COMMIT"
2017-04-03 18:03:18 [http-nio-8091-exec-1] INFO  o.m.j.i.logging.ProtocolLoggingProxy - Query - conn:116(M) - 109,261 ms - "set autocommit=1"

with "(M)" indicating master connection and "(S)" slave

2.0.0-SNAPSHOT is available using :

<repositories>
    <repository>
        <id>sonatype-nexus-snapshots</id>
        <name>Sonatype Nexus Snapshots</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>
</repositories>
 
<dependencies>
    <dependency>
        <groupId>org.mariadb.jdbc</groupId>
        <artifactId>mariadb-java-client</artifactId>
        <version>2.0.0-SNAPSHOT</version>
    </dependency>
</dependencies>

Comment by Paul B Johnson [ 2017-04-05 ]

Diego,

I have emailed to you the requested log file. Note that I can only reproduce this problem as part of our standard automation test suite. So at this point I suspect there is a very particular sequence of events that results in this problem. Hopefully the log file will help you out.

I did not explicitly set "failOnReadOnly" (or "failOverReadOnly" for that matter).

Paul

Comment by Diego Dupin [ 2017-04-05 ]

Hi Paul,

Thanks for the logs, i'll check them in detail tomorrow now. I hope to bring good news next time.

Diego.

Comment by Paul B Johnson [ 2017-04-06 ]

Note that since I have found and implemented the work-around of not calling "findOne()", our failover tests all pass. Thus the issue has become less of a priority, and more "academic" in nature. We just hope that there are not similar, but non-resolved, issues lurking.

Comment by Diego Dupin [ 2017-04-10 ]

See spring-data documentation about @Transactional.
Persisting and deleting objects requires a transaction in JPA. Thus we need to make sure a transaction is running, which we do by having the method annotated with @Transactional.

Methods that will insert/update database will have to be marked as @Transactional ( = readOnly=false).
If method contain find* and save methods without explicit @Transactional (without readOnly=true) then transaction will use the one initiated with find*, resulting in an exception ("The MySQL server is running with the --read-only option so it cannot execute this statement" using aurora). This isn't related to failover but to load balancing (read and write) to be distributed over multiple servers. .

find* methods, that have implicit @Transactional(readOnly = true), Spring-data will call driver Connection.setReadOnly(true) methods.

When using masters/slaves configuration (like aurora) query will be launched on a slave(replica) server.
Update/insert will then fail with an exception

Adding "@Transactional(readOnly = false)" or just "@Transactional" will solve the issue.

Comment by Diego Dupin [ 2017-11-21 ]

Description correspond to CONJ-502 that is now corrected

Generated at Thu Feb 08 03:15:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.