Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.11.13
-
debian bookworm
Description
During a routine schema change the server got totally stuck.
Schema change:
alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id); CREATE INDEX cl_timestamp_id ON categorylinks (cl_target_id, cl_timestamp);
|
Table definition before the schema change:
Create Table: CREATE TABLE `categorylinks` (
|
`cl_from` int(8) unsigned NOT NULL DEFAULT 0,
|
`cl_to` varbinary(255) NOT NULL DEFAULT '',
|
`cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
|
`cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
|
`cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
|
`cl_collation` varbinary(32) NOT NULL DEFAULT '',
|
`cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
|
`cl_collation_id` smallint(5) unsigned NOT NULL DEFAULT 0,
|
`cl_target_id` bigint(20) unsigned DEFAULT NULL,
|
PRIMARY KEY (`cl_from`,`cl_to`),
|
KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
|
KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
|
KEY `cl_sortkey_id` (`cl_target_id`,`cl_type`,`cl_sortkey`,`cl_from`)
|
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED
|
Expected table definition after the schema change:
Create Table: CREATE TABLE `categorylinks` (
|
`cl_from` int(8) unsigned NOT NULL DEFAULT 0,
|
`cl_to` varbinary(255) NOT NULL DEFAULT '',
|
`cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
|
`cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
|
`cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
|
`cl_collation` varbinary(32) NOT NULL DEFAULT '',
|
`cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
|
`cl_collation_id` smallint(5) unsigned NOT NULL DEFAULT 0,
|
`cl_target_id` bigint(20) unsigned NOT NULL,
|
PRIMARY KEY (`cl_from`,`cl_target_id`),
|
KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
|
KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
|
KEY `cl_sortkey_id` (`cl_target_id`,`cl_type`,`cl_sortkey`,`cl_from`),
|
KEY `cl_timestamp_id` (`cl_target_id`,`cl_timestamp`)
|
) ENGINE=InnoDB DEFAULT CHARSET=binary
|
This change ran finely across multiple hosts and tables but on this host it has gotten stuck for many hours and the server was totally unresponsive (to the point that I had to kill -9).
cumin2024@db1229.eqiad.wmnet[(none)]> show full processlist;
|
+------------+-----------------+----------------------+-------------+---------+---------+------------------------------------------+---------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+------------+-----------------+----------------------+-------------+---------+---------+------------------------------------------+---------------------------------------------------------------------------------------------------------+----------+
|
| 2 | event_scheduler | localhost | NULL | Daemon | 2749346 | Waiting for next activation | NULL | 0.000 |
|
| 11 | orchestrator | 208.80.155.103:36444 | NULL | Sleep | 3 | | NULL | 0.000 |
|
| 12 | orchestrator | 208.80.155.103:36450 | NULL | Sleep | 3 | | NULL | 0.000 |
|
| 13 | orchestrator | 208.80.155.103:36462 | NULL | Sleep | 3 | | NULL | 0.000 |
|
| 1127202449 | cumin2024 | 10.64.48.98:57588 | enwikiquote | Query | 48100 | Committing alter table to storage engine | alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id) | 0.000 |
|
| 1127202523 | root | localhost | ops | Connect | 47880 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127202846 | root | localhost | ops | Connect | 46980 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127203177 | root | localhost | ops | Connect | 46080 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127203462 | root | localhost | ops | Connect | 45180 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127203745 | root | localhost | ops | Connect | 44280 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127204068 | root | localhost | ops | Connect | 43380 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127204399 | root | localhost | ops | Connect | 42480 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127204682 | root | localhost | ops | Connect | 41580 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127204967 | root | localhost | ops | Connect | 40680 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127205291 | root | localhost | ops | Connect | 39780 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127205622 | root | localhost | ops | Connect | 38880 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127205911 | root | localhost | ops | Connect | 37980 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127206194 | root | localhost | ops | Connect | 37080 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127206515 | root | localhost | ops | Connect | 36180 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127206846 | root | localhost | ops | Connect | 35280 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127207129 | root | localhost | ops | Connect | 34380 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127207414 | root | localhost | ops | Connect | 33480 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127207737 | root | localhost | ops | Connect | 32580 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127208068 | root | localhost | ops | Connect | 31680 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127208353 | root | localhost | ops | Connect | 30780 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127208636 | root | localhost | ops | Connect | 29880 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127208959 | root | localhost | ops | Connect | 28980 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127209290 | root | localhost | ops | Connect | 28080 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127209578 | root | localhost | ops | Connect | 27180 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127209861 | root | localhost | ops | Connect | 26280 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127210183 | root | localhost | ops | Connect | 25380 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127210514 | root | localhost | ops | Connect | 24480 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127210799 | root | localhost | ops | Connect | 23580 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127211082 | root | localhost | ops | Connect | 22680 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127211404 | root | localhost | ops | Connect | 21780 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127211735 | root | localhost | ops | Connect | 20880 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127212024 | root | localhost | ops | Connect | 19980 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127212307 | root | localhost | ops | Connect | 19080 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127212628 | root | localhost | ops | Connect | 18180 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127212959 | root | localhost | ops | Connect | 17280 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127213242 | root | localhost | ops | Connect | 16380 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127213527 | root | localhost | ops | Connect | 15480 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127213850 | root | localhost | ops | Connect | 14580 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127214181 | root | localhost | ops | Connect | 13680 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127214466 | root | localhost | ops | Connect | 12780 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127214749 | root | localhost | ops | Connect | 11880 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127215071 | root | localhost | ops | Connect | 10980 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127215402 | root | localhost | ops | Connect | 10080 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127215686 | root | localhost | ops | Connect | 9180 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127215971 | root | localhost | ops | Connect | 8280 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127216296 | root | localhost | ops | Connect | 7380 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127216627 | root | localhost | ops | Connect | 6480 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127216913 | root | localhost | ops | Connect | 5580 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127217196 | root | localhost | ops | Connect | 4680 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127217518 | root | localhost | ops | Connect | 3780 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127217849 | root | localhost | ops | Connect | 2880 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127218134 | root | localhost | ops | Connect | 1980 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127218419 | root | localhost | ops | Connect | 1080 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127218448 | root | localhost | NULL | Sleep | 987 | | NULL | 0.000 |
|
| 1127218744 | root | localhost | ops | Connect | 180 | Opening tables | delete from event_log where stamp < now() - interval 1 day and server_id = NAME_CONST('sid',171970754) | 0.000 |
|
| 1127218810 | cumin2024 | 10.64.48.98:53996 | NULL | Query | 0 | starting | show full processlist | 0.000 |
|
+------------+-----------------+----------------------+-------------+---------+---------+------------------------------------------+---------------------------------------------------------------------------------------------------------+----------+
|
61 rows in set (0.002 sec)
|
|
cumin2024@db1229.eqiad.wmnet[(none)]> show engine innodb status\G
|
*************************** 1. row ***************************
|
Type: InnoDB
|
Name:
|
Status:
|
=====================================
|
2025-07-18 06:18:20 0x7f9b0cb666c0 INNODB MONITOR OUTPUT
|
=====================================
|
Per second averages calculated from the last 42 seconds
|
-----------------
|
BACKGROUND THREAD
|
-----------------
|
srv_master_thread loops: 33 srv_active, 0 srv_shutdown, 2700380 srv_idle
|
srv_master_thread log flush and writes: 2700402
|
----------
|
SEMAPHORES
|
----------
|
------------
|
TRANSACTIONS
|
------------
|
Trx id counter 181844730732
|
Purge done for trx's n:o < 181844730730 undo n:o < 0 state: running
|
History list length 1
|
LIST OF TRANSACTIONS FOR EACH SESSION:
|
---TRANSACTION 181844730731, ACTIVE 48121 sec renaming table
|
11 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 3
|
MariaDB thread id 1127202449, OS thread handle 140303901337280, query id 28250671597 10.64.48.98 cumin2024 Committing alter table to storage engine
|
alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id)
|
---TRANSACTION 181844730729, ACTIVE 48121 sec
|
mysql tables in use 1, locked 1
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 1127202449, OS thread handle 140303901337280, query id 28250671597 10.64.48.98 cumin2024 Committing alter table to storage engine
|
alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id)
|
Trx read view will not see trx with id >= 181844730728, sees < 181844730728
|
---TRANSACTION (0x7ffacc24a780), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7ffacc248680), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7ffacc249180), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7ffacc247b80), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
--------
|
FILE I/O
|
--------
|
Pending flushes (fsync): 0
|
501803960 OS file reads, 207739151 OS file writes, 191731558 OS fsyncs
|
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
|
---
|
LOG
|
---
|
Log sequence number 55824729055508
|
Log flushed up to 55824661967567
|
Pages flushed up to 55823338001381
|
Last checkpoint at 55823329660192
|
----------------------
|
BUFFER POOL AND MEMORY
|
----------------------
|
Total large memory allocated 405907963904
|
Dictionary memory allocated 2413549608
|
Buffer pool size 24530688
|
Free buffers 1536
|
Database pages 24075345
|
Old database pages 8887167
|
Modified db pages 88919
|
Percent of dirty pages(LRU & free pages): 0.369
|
Max dirty pages percent: 75.000
|
Pending reads 0
|
Pending writes: LRU 0, flush list 0
|
Pages made young 545099336, not young 2580188305
|
0.00 youngs/s, 0.00 non-youngs/s
|
Pages read 501771254, created 5242410, written 205819881
|
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
|
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
|
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
|
LRU len: 24075345, unzip_LRU len: 907598
|
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
|
--------------
|
ROW OPERATIONS
|
--------------
|
1 read views open inside InnoDB
|
state: flushing log
|
----------------------------
|
END OF INNODB MONITOR OUTPUT
|
============================
|
|
1 row in set (0.002 sec)
|
cumin2024@db1229.eqiad.wmnet[(none)]> show global status like 'Uptime';
|
+---------------+---------+
|
| Variable_name | Value |
|
+---------------+---------+
|
| Uptime | 2749412 |
|
+---------------+---------+
|
1 row in set (0.002 sec)
|
I have attached stacktraces from the running process.