[MDEV-31818] Server crashes in choose_best_splitting - Jira

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.11.2, 10.11.4
Fix Version/s: 10.11
Component/s: Server
Labels:
- crash
Environment:
ProLiant DL360 Gen10, 48 cores, 128GB memory, Centos 8

Description

We run a primary with two replicas and have seen both replicas crash almost simultaneously on the same query. At other times of day the same query runs without problems, and unfortunately we have not been able to reproduce the crash.
So far we have seen exactly the same behaviour on only one other occasion, two weeks ago, when a very similar query crashed both replicas.
In both cases a bulk import was running on the primary and being replicated to the crashing replicas, but into a different schema and table than the ones the query reads.

Attachments


- 2023-08-03-mariaDB-logs.zip (14 kB, 2023-08-03 08:57)
- 2023-11-08-mariaDB-logs.zip (4 kB, 2023-11-08 11:20)
- xtradb-3.mariadb.log (24 kB, 2023-08-01 10:31)
- xtradb-4.mariadb.log (45 kB, 2023-08-01 10:31)

Issue Links

relates to

- MDEV-31403 Server crashes in st_join_table::choose_best_splitting (still) [Closed]
- MDEV-31440 choose_best_splitting: crash on update query using correlated subquery after minor update [Confirmed]
- MDEV-32064 Crash when searching for the best split of derived table [Closed]

Activity


Christian Braeuner added a comment - 2023-08-03 09:12

To clarify our setup: one replica runs MariaDB 10.11.2, the other 10.11.4. The replicas are used exclusively for reads. The master is not used for queries and only imports data.
We had another crash of both replicas today. Again a bulk import was being replicated from the master at the same time, but into an unrelated schema and a different table than when the replicas last crashed. The imports on the master run relatively frequently, so this could still be a coincidence.


Christian Braeuner added a comment - 2023-11-08 11:28

We are still experiencing the crashes at random intervals, on average 3-4 times a month.
I have uploaded the latest crash dump from today, although it shows nothing new: the crash is in the same method, "choose_best_splitting". So far we have only ever seen the crashes when the problematic query runs on a replica while a table import on the primary is being replicated to it.
Is there anything we can try that would help find the problem?


Richard DEMONGEOT added a comment - 2023-11-09 17:37

Hello cbefin;

Could you take a look at https://jira.mariadb.org/browse/MDEV-32064? I think it's very similar.

If so, a patch has been written but not yet released. It should be in the next release.

Regards;


Christian Braeuner added a comment - 2023-11-13 10:35

Hi Richard,
thanks. I have experimented with the in_predicate_conversion_threshold setting. While I can trigger a crash with the steps given in that report, I was not able to reproduce a crash with our own problematic query under load, even with different values of in_predicate_conversion_threshold. The query takes a lot longer when the value is set too low, but it does not crash the database.
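
A sweep like this could be scripted roughly as follows. This is only an illustrative sketch, not the commands actually used: in_predicate_conversion_threshold is the MariaDB system variable named above, while the helper name `build_sweep`, the placeholder query and the threshold values are assumptions made for the example. The script only builds the SQL text; executing it against a server is left to whichever client is in use.

```python
# Sketch: build the SQL statements for a threshold sweep.
# in_predicate_conversion_threshold is a MariaDB system variable;
# build_sweep, the query text and the values below are hypothetical.

def build_sweep(query, thresholds):
    """Return one list of SQL statements per threshold value."""
    runs = []
    for value in thresholds:
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"threshold must be a non-negative integer: {value!r}")
        runs.append([
            # Session scope keeps the change local to the test connection.
            f"SET SESSION in_predicate_conversion_threshold = {value}",
            query,
        ])
    return runs


if __name__ == "__main__":
    # Placeholder for the real production query, which is not part of this report.
    query = "SELECT 1"
    for statements in build_sweep(query, [0, 10, 1000]):
        for statement in statements:
            print(statement + ";")
```

Setting the variable per session rather than globally means other connections on the same server keep their usual behaviour while the test runs.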


Christian Braeuner added a comment - 2023-11-20 09:33

Last week we changed our configuration to also use the primary node for requests, in an attempt to eliminate replication as one of the factors. Today the primary and one replica crashed simultaneously, which tells us that replication is not causing the instability. The crash again happened during a bulk import of an unrelated table in a separate schema.


Alice Sherepa added a comment - 2023-12-15 16:28

Is it possible for you to upgrade to a recent MariaDB version? This might be the same issue as MDEV-31440, and with the test case provided there the crash no longer happens.


Christian Braeuner added a comment - 2024-01-30 14:38

Hi, in the meantime we have changed the query so that it no longer uses a subselect with DISTINCT. We have had no crashes since. As the crash was only ever observed in our production environment, we do not want to put the dangerous query back in order to test later versions of MariaDB.


People

Assignee: Sergei Petrunia

Reporter: Christian Braeuner

Votes: 2

Watchers: 7

Dates

Created: 2023-08-01 10:44

Updated: 2024-01-30 14:38
