Am running the following on a slave:
- Largish (24h, 600M rows, 200G) ALTER TABLE
- Scheduled events running INFORMATION_SCHEMA queries
- Thread pool active (thread_handling=pool-of-threads)
- Replication active
- No other significant traffic
After several hours, MariaDB locks up: CPU usage drops to 0%, disk activity stops, and neither existing nor new connections get a response, whether on the regular port, the extra_port, or the socket.
Attached are gdb backtraces for two occurrences, examples of the ALTER and the INFORMATION_SCHEMA activity, and other info. Would appreciate any insight from devs to identify the deadlock, and to narrow down the variables for a test case that isn't 200G.
Am presently trialing the ALTER outside the thread pool by connecting via the extra_port, with all other settings unchanged.
- It doesn't seem to be thread pool overload, as the backtraces show too few threads for the pool to be exhausted.
- The INFORMATION_SCHEMA event traffic uses GET_LOCK to serialize some activity and prevent pile-up.
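
For anyone trying to build a smaller test case, the GET_LOCK pattern mentioned above looks roughly like the sketch below. This is not the actual event from this server (examples of that are in the attachments); the event name, lock name, schedule, and the monitoring.table_sizes target table are all hypothetical and only illustrate the shape of the traffic.

```sql
-- Illustrative sketch only: event name, lock name, schedule, and the
-- monitoring.table_sizes table are assumptions, not taken from the server.
DELIMITER //

CREATE EVENT IF NOT EXISTS sample_table_sizes
ON SCHEDULE EVERY 1 MINUTE
DO
BEGIN
  -- Timeout of 0: return at once if an earlier run still holds the lock,
  -- so slow runs are skipped instead of piling up behind each other.
  -- GET_LOCK returns 1 on success, 0 on timeout, NULL on error.
  IF GET_LOCK('sample_table_sizes', 0) = 1 THEN
    INSERT INTO monitoring.table_sizes
      (sampled_at, table_schema, table_name, data_length)
    SELECT NOW(), TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA NOT IN
      ('information_schema', 'performance_schema', 'mysql');

    -- Release explicitly; the lock would otherwise be held until the
    -- event's session ends.
    DO RELEASE_LOCK('sample_table_sizes');
  END IF;
END//

DELIMITER ;
```

With a zero timeout, a run that cannot get the lock exits immediately rather than waiting, so at most one instance of the INFORMATION_SCHEMA query should be active at a time.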