[MDEV-31818] Server crashes in choose_best_splitting Created: 2023-08-01  Updated: 2024-01-30

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.11.2, 10.11.4
Fix Version/s: 10.11

Type: Bug Priority: Major
Reporter: Christian Braeuner Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 2
Labels: crash
Environment:

ProLiant DL360 Gen10, 48 cores, 128GB memory, Centos 8


Attachments: Zip Archive 2023-08-03-mariaDB-logs.zip     Zip Archive 2023-11-08-mariaDB-logs.zip     Text File xtradb-3.mariadb.log     Text File xtradb-4.mariadb.log    
Issue Links:
Relates
relates to MDEV-31403 Server crashes in st_join_table::choo... Closed
relates to MDEV-31440 choose_best_splitting: crash on updat... Confirmed
relates to MDEV-32064 Crash when searching for the best spl... Closed

 Description   

We have a primary with 2 replicas and have seen both replicas crash almost simultaneously on the same query. At other times during the day the same query ran without problems, and unfortunately we have not managed to reproduce the crash either.
So far we have seen exactly the same behaviour only once before, two weeks ago, when a very similar query crashed both replicas.
In both cases a bulk import was running on the primary server and being replicated to the crashing replicas, but on a different schema and table than the ones requested in the query.



 Comments   
Comment by Christian Braeuner [ 2023-08-03 ]

To clarify our setup: one replica is MariaDB 10.11.2, the other 10.11.4. The replicas are used exclusively for reading. The master is not used for queries and is only importing data.
We had another crash of both replicas today, and again a bulk import was being replicated from the master in parallel, although on an unrelated schema and a different table than the last time the replicas crashed. The master imports run relatively frequently, so this could still be a coincidence.

Comment by Christian Braeuner [ 2023-11-08 ]

We are still experiencing the crashes at random intervals, on average 3-4 times a month.
I have uploaded the latest crash dump from today, although there is nothing new in it: the crash is in the same function, "choose_best_splitting". So far we have only ever seen the crashes when a table import on the primary is being replicated to the replica while the problematic query is executed on the replica.
Is there anything we can try to help find the problem?

Comment by Richard DEMONGEOT [ 2023-11-09 ]

Hello cbefin;

Could you read the https://jira.mariadb.org/browse/MDEV-32064 issue? I think it's very similar.

If so, a patch has been written but not released yet. It should be in the next release.

Regards;

Comment by Christian Braeuner [ 2023-11-13 ]

Hi Richard,
thanks, I have experimented with the in_predicate_conversion_threshold setting. While I can make the server crash with the steps given in that report, I was not able to reproduce a crash with our own problematic query, either under load or with different values for in_predicate_conversion_threshold. The query takes a lot longer when the value is set too low, but it does not crash the db.

Comment by Christian Braeuner [ 2023-11-20 ]

Last week we changed our configuration to also use the primary node for requests, in an attempt to eliminate replication as one of the factors. Today we had a simultaneous crash of the primary and one replica, which tells us that replication is not causing the instability. The crash again happened during a bulk import of an unrelated table in a separate schema.

Comment by Alice Sherepa [ 2023-12-15 ]

Is it possible for you to upgrade to a recent MariaDB version? It might be the same as MDEV-31440, and with the test case that was provided there, the crash does not happen anymore.

Comment by Christian Braeuner [ 2024-01-30 ]

Hi, we have changed the query in the meantime to no longer use a subselect with distinct. Since then we have had no crashes. As this crash was only ever observed in our production environment, we do not want to put the dangerous query back in order to test later versions of MariaDB.

Generated at Thu Feb 08 10:26:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.