Status: Closed (View Workflow)
10.0, 10.1, 10.1.23, 10.2
CentOS7, Ubuntu 16.04
Over the past few weeks I've had three different servers with MariaDB 10.1.20 through 10.1.23 inclusive all crash seemingly at random with a Signal 11.
No common time of day, no common operation other than operating on data stored in InnoDB tables.
Attached is the GDB trace from the core dump I managed to obtain as well as configs.
I can make the core dump available if needs be however it is over 200GB uncompressed, approx 50GB with gzip.
The overall configuration is a single read/write master with slaves attached. One of the slaves is used for SELECT statements by the same application while the other is a dormant stand by.
The crash happens on three separate servers of identical hardware. CentOS7 and Ubuntu 16.04 have both been the running OS while the crash has occurred. The crash has only occurred while the server has been operating as the master with slaves attached, as yet we've not seen an active read slave or dormant slave exhibit the same crash. Potential concurrency issue?
We DO have a vast number of tables in the DB, close to 700k, approx 400k of those active during a normal business day. table_open_cache is currently set at 524288 because we've found if we don't set it/set it low, we get locked up waiting for mysql to look through the table cache for an evict-able table before it gives up and adds a new table_cache_entry anyway.
Also attached is the my.cnf and the query/query plan for the query in the stack trace for the thread that segfaulted, though we've seen it happen on optimize table statements before now.
On the current active write master I've set the optimizer_switch to default as I see MRR involved in the thread that died, something we turned on at some point and I note in MariaDB defaults, it's switched off.
- relates to
MDEV-13425 Signal 11 on get_share() inside HASH_SEARCH and strcmp