[CONJ-1061] Many threads blocked in MariaDbStatement#executeInternal waiting on lock - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not a Bug
Affects Version/s: 2.7.2, 2.7.8
Fix Version/s: N/A
Component/s: Other
Labels: None

Description

We are seeing customer thread dumps in which hundreds of threads are blocked in MariaDbStatement.executeInternal at line 340 (in v2.7.2), which is a lock.lock() call. Although I have no repro, I suspect that somewhere the lock is not being unlocked.

EDIT: The following paragraph describes the 2.7.2 code, which appears to have been patched since; the code I mention is no longer present in 2.7.8. Apologies, I was unaware. However, we are still seeing customers with hundreds of threads waiting on the lock in 2.7.8.
Code inspection shows that the unlock happens in finally blocks, but some statements that run before it in those blocks can throw a RuntimeException, in which case the unlock() call is never reached. For example, the finally block of executeInternal calls executeEpilogue before it unlocks. A subroutine of executeEpilogue, stopTimeoutTask(), calls Future#get(), which throws CancellationException; that is a RuntimeException, would not be caught, and would prevent the unlock. Additionally, stopTimeoutTask() calls Thread.currentThread().interrupt(), which can also throw SecurityException, another RuntimeException.

ASK: Please ensure that unlock() is called in every method that takes the lock. Either ensure that the calls preceding it in the finally block cannot throw any exception, or unlock first if possible. (EDIT: perhaps this means wrapping everything that runs before the unlock in a try/catch, since the underlying methods are not required to declare the RuntimeExceptions they throw.)
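
As an illustration of the ask above, here is a minimal sketch, not the driver's code. It assumes a plain ReentrantLock and a hypothetical failingCleanup() that stands in for any cleanup step (such as executeEpilogue) that throws an unexpected RuntimeException. The class and method names are made up for the example. It compares cleanup followed by unlock() in one finally block against a nested try/finally that unlocks regardless.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

public class UnlockSketch {

    // Hypothetical stand-in for a cleanup step; throws to simulate an unexpected RuntimeException.
    static void failingCleanup() {
        throw new IllegalStateException("simulated cleanup failure");
    }

    // Fragile ordering: cleanup runs before unlock() in the same finally block.
    public static void fragile(ReentrantLock lock) {
        lock.lock();
        try {
            // statement work would go here
        } finally {
            failingCleanup();
            lock.unlock(); // not reached when failingCleanup() throws
        }
    }

    // Safer ordering: a nested try/finally runs unlock() even if cleanup throws.
    public static void safe(ReentrantLock lock) {
        lock.lock();
        try {
            // statement work would go here
        } finally {
            try {
                failingCleanup();
            } finally {
                lock.unlock();
            }
        }
    }

    // Runs the given body on a fresh lock and reports whether the lock is still held afterwards.
    public static boolean stillLockedAfter(Consumer<ReentrantLock> body) {
        ReentrantLock lock = new ReentrantLock();
        try {
            body.accept(lock);
        } catch (RuntimeException e) {
            // expected: the simulated cleanup failure propagates to the caller
        }
        return lock.isLocked();
    }

    public static void main(String[] args) {
        System.out.println("fragile leaves lock held: " + stillLockedAfter(UnlockSketch::fragile)); // true
        System.out.println("safe leaves lock held: " + stillLockedAfter(UnlockSketch::safe));       // false
    }
}
```

In the fragile variant the exception still reaches the caller, but the lock stays held by the throwing thread, so every other thread calling lock() would block. The nested finally keeps the exception visible while still releasing the lock.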

Affected versions: 2.7.2 and 2.7.8 for sure, but also probably the versions in between and possibly earlier versions as well.

Server: MariaDB server v10.6.11

Repro: None

Activity

Diego Dupin added a comment - 2023-03-01 13:46 - edited

Just curious: are you using a MySQL server or MariaDB < 10.2, in which case the connector implements the timeout using another thread?

I'm trying to identify what may be causing this.

"Unfortunately, a subroutine, stopTimeoutTask(), calls Future#get() which throws CancellationException which is a RuntimeException and would not be caught, preventing the unlock."

This Future.get call is surrounded by a try/catch:

  private void stopTimeoutTask() {
    if (timerTaskFuture != null) {
      if (!timerTaskFuture.cancel(true)) {
        // could not cancel, task either started or already finished
        // we must now wait for task to finish ensuring state modifications are done
        try {
          timerTaskFuture.get();
        } catch (InterruptedException | ExecutionException | CancellationException e) {
          // ignore error, likely due to interrupting during cancel
        }
        // we don't catch the exception if already canceled, that would indicate we tried
        // to cancel in parallel (which this code currently is not designed for)
      }
      timerTaskFuture = null;
    }
  }

So I wonder how the exception could go uncaught. The catch could be made even broader to rule out any problem, but as written it should normally catch every exception that can be thrown there.
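
To make the cancel()/get() interplay discussed above concrete, here is a small sketch using only the standard java.util.concurrent API. It is not connector code: the class and method names are hypothetical, and it assumes the timeout task is scheduled on a ScheduledExecutorService. It shows that get() throws CancellationException on a future that was already cancelled (where a further cancel() returns false), while on a future that already completed, cancel() returns false and get() returns normally.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FutureCancelSketch {

    private static final Runnable NOOP = () -> { };

    // Calls get() and reports how it ended, without letting any exception escape.
    static String describeGet(Future<?> future) {
        try {
            future.get();
            return "returned";
        } catch (CancellationException e) {
            return "CancellationException";
        } catch (InterruptedException | ExecutionException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Task that never starts: the first cancel succeeds, the second fails, and get() throws.
    public static String pendingTask() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            Future<?> future = executor.schedule(NOOP, 1, TimeUnit.HOURS);
            boolean first = future.cancel(true);  // true: task had not run yet
            boolean second = future.cancel(true); // false: already cancelled
            return first + "/" + second + "/" + describeGet(future);
        } finally {
            executor.shutdownNow();
        }
    }

    // Task that already finished: cancel fails and get() returns normally.
    public static String finishedTask() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            Future<?> future = executor.schedule(NOOP, 0, TimeUnit.MILLISECONDS);
            describeGet(future);                      // wait for the task to complete
            boolean cancelled = future.cancel(true);  // false: already completed
            return cancelled + "/" + describeGet(future);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(pendingTask());  // true/false/CancellationException
        System.out.println(finishedTask()); // false/returned
    }
}
```

The first case is the one where a caller that only calls get() after cancel() returned false can still see a CancellationException, which is why including it in the catch clause matters.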

"Additionally, stopTimeoutTask() calls Thread.currentThread().interrupt() which may also throw SecurityException which is another RuntimeException."

I do not follow what you mean by that.

Since you have hundreds of threads waiting on the lock, there is probably something wrong, but right now I don't see how it can happen.
By the way, could you tell me which Java implementation you are using? (There was a threading issue with the IBM JDK a long time ago.)

Julian Bui added a comment - 2023-03-01 14:17 - edited

Hi Diego. Sorry, there was an error on my part. The original description of the ticket was based on code from 2.7.2, which I didn't realize had been patched. Yes, you're right that 2.7.8 handles CancellationException from the Future#get call. There is also no interrupt() call in 2.7.8 any more, so please ignore that comment.

We are seeing this behavior against MariaDB server v10.6.11 with driver v2.7.8. We are using JRE 1.8.

However, the driver still calls Future#cancel without any try/catch. Though I'm not sure, it's hypothetically possible for it to throw a RuntimeException. Would it be safer to wrap ALL code before the unlock() with a try/catch to catch RuntimeExceptions? After all, we cannot rely on documentation to list all the RuntimeExceptions that could possibly be thrown.

Julian Bui added a comment - 2023-04-07 12:26

Hi Diego, I think you can close this. I've lost enough context with the customer incidents to follow up in any meaningful way.

Diego Dupin added a comment - 2023-04-11 14:28

Closing as requested by reporter