Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-37270

Stuck server during an ALTER TABLE

    XMLWordPrintable

Details

    Description

      During a routine schema change the server got totally stuck.
      Schema change:

      alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id); CREATE INDEX cl_timestamp_id ON categorylinks (cl_target_id, cl_timestamp);
      

      Table definition before the schema change:

      Create Table: CREATE TABLE `categorylinks` (
        `cl_from` int(8) unsigned NOT NULL DEFAULT 0,
        `cl_to` varbinary(255) NOT NULL DEFAULT '',
        `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
        `cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
        `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
        `cl_collation` varbinary(32) NOT NULL DEFAULT '',
        `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
        `cl_collation_id` smallint(5) unsigned NOT NULL DEFAULT 0,
        `cl_target_id` bigint(20) unsigned DEFAULT NULL,
        PRIMARY KEY (`cl_from`,`cl_to`),
        KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
        KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
        KEY `cl_sortkey_id` (`cl_target_id`,`cl_type`,`cl_sortkey`,`cl_from`)
      ) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED
      

      Expected table definition after the schema change:

      Create Table: CREATE TABLE `categorylinks` (
        `cl_from` int(8) unsigned NOT NULL DEFAULT 0,
        `cl_to` varbinary(255) NOT NULL DEFAULT '',
        `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
        `cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
        `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
        `cl_collation` varbinary(32) NOT NULL DEFAULT '',
        `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
        `cl_collation_id` smallint(5) unsigned NOT NULL DEFAULT 0,
        `cl_target_id` bigint(20) unsigned NOT NULL,
        PRIMARY KEY (`cl_from`,`cl_target_id`),
        KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
        KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
        KEY `cl_sortkey_id` (`cl_target_id`,`cl_type`,`cl_sortkey`,`cl_from`),
        KEY `cl_timestamp_id` (`cl_target_id`,`cl_timestamp`)
      ) ENGINE=InnoDB DEFAULT CHARSET=binary
      

      This change ran finely across multiple hosts and tables but on this host it has gotten stuck for many hours and the server was totally unresponsive (to the point that I had to kill -9).

      cumin2024@db1229.eqiad.wmnet[(none)]> show full processlist;
      +------------+-----------------+----------------------+-------------+---------+---------+------------------------------------------+---------------------------------------------------------------------------------------------------------+----------+
      | Id         | User            | Host                 | db          | Command | Time    | State                                    | Info                                                                                                    | Progress |
      +------------+-----------------+----------------------+-------------+---------+---------+------------------------------------------+---------------------------------------------------------------------------------------------------------+----------+
      |          2 | event_scheduler | localhost            | NULL        | Daemon  | 2749346 | Waiting for next activation              | NULL                                                                                                    |    0.000 |
      |         11 | orchestrator    | 208.80.155.103:36444 | NULL        | Sleep   |       3 |                                          | NULL                                                                                                    |    0.000 |
      |         12 | orchestrator    | 208.80.155.103:36450 | NULL        | Sleep   |       3 |                                          | NULL                                                                                                    |    0.000 |
      |         13 | orchestrator    | 208.80.155.103:36462 | NULL        | Sleep   |       3 |                                          | NULL                                                                                                    |    0.000 |
      | 1127202449 | cumin2024       | 10.64.48.98:57588    | enwikiquote | Query   |   48100 | Committing alter table to storage engine | alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id)                      |    0.000 |
      | 1127202523 | root            | localhost            | ops         | Connect |   47880 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127202846 | root            | localhost            | ops         | Connect |   46980 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127203177 | root            | localhost            | ops         | Connect |   46080 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127203462 | root            | localhost            | ops         | Connect |   45180 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127203745 | root            | localhost            | ops         | Connect |   44280 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127204068 | root            | localhost            | ops         | Connect |   43380 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127204399 | root            | localhost            | ops         | Connect |   42480 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127204682 | root            | localhost            | ops         | Connect |   41580 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127204967 | root            | localhost            | ops         | Connect |   40680 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127205291 | root            | localhost            | ops         | Connect |   39780 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127205622 | root            | localhost            | ops         | Connect |   38880 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127205911 | root            | localhost            | ops         | Connect |   37980 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127206194 | root            | localhost            | ops         | Connect |   37080 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127206515 | root            | localhost            | ops         | Connect |   36180 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127206846 | root            | localhost            | ops         | Connect |   35280 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127207129 | root            | localhost            | ops         | Connect |   34380 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127207414 | root            | localhost            | ops         | Connect |   33480 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127207737 | root            | localhost            | ops         | Connect |   32580 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127208068 | root            | localhost            | ops         | Connect |   31680 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127208353 | root            | localhost            | ops         | Connect |   30780 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127208636 | root            | localhost            | ops         | Connect |   29880 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127208959 | root            | localhost            | ops         | Connect |   28980 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127209290 | root            | localhost            | ops         | Connect |   28080 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127209578 | root            | localhost            | ops         | Connect |   27180 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127209861 | root            | localhost            | ops         | Connect |   26280 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127210183 | root            | localhost            | ops         | Connect |   25380 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127210514 | root            | localhost            | ops         | Connect |   24480 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127210799 | root            | localhost            | ops         | Connect |   23580 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127211082 | root            | localhost            | ops         | Connect |   22680 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127211404 | root            | localhost            | ops         | Connect |   21780 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127211735 | root            | localhost            | ops         | Connect |   20880 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127212024 | root            | localhost            | ops         | Connect |   19980 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127212307 | root            | localhost            | ops         | Connect |   19080 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127212628 | root            | localhost            | ops         | Connect |   18180 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127212959 | root            | localhost            | ops         | Connect |   17280 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127213242 | root            | localhost            | ops         | Connect |   16380 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127213527 | root            | localhost            | ops         | Connect |   15480 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127213850 | root            | localhost            | ops         | Connect |   14580 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127214181 | root            | localhost            | ops         | Connect |   13680 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127214466 | root            | localhost            | ops         | Connect |   12780 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127214749 | root            | localhost            | ops         | Connect |   11880 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127215071 | root            | localhost            | ops         | Connect |   10980 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127215402 | root            | localhost            | ops         | Connect |   10080 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127215686 | root            | localhost            | ops         | Connect |    9180 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127215971 | root            | localhost            | ops         | Connect |    8280 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127216296 | root            | localhost            | ops         | Connect |    7380 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127216627 | root            | localhost            | ops         | Connect |    6480 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127216913 | root            | localhost            | ops         | Connect |    5580 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127217196 | root            | localhost            | ops         | Connect |    4680 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127217518 | root            | localhost            | ops         | Connect |    3780 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127217849 | root            | localhost            | ops         | Connect |    2880 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127218134 | root            | localhost            | ops         | Connect |    1980 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127218419 | root            | localhost            | ops         | Connect |    1080 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127218448 | root            | localhost            | NULL        | Sleep   |     987 |                                          | NULL                                                                                                    |    0.000 |
      | 1127218744 | root            | localhost            | ops         | Connect |     180 | Opening tables                           | delete from event_log where stamp < now() - interval 1 day and server_id =  NAME_CONST('sid',171970754) |    0.000 |
      | 1127218810 | cumin2024       | 10.64.48.98:53996    | NULL        | Query   |       0 | starting                                 | show full processlist                                                                                   |    0.000 |
      +------------+-----------------+----------------------+-------------+---------+---------+------------------------------------------+---------------------------------------------------------------------------------------------------------+----------+
      61 rows in set (0.002 sec)
      
      

      cumin2024@db1229.eqiad.wmnet[(none)]> show engine innodb status\G
      *************************** 1. row ***************************
        Type: InnoDB
        Name:
      Status:
      =====================================
      2025-07-18 06:18:20 0x7f9b0cb666c0 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 42 seconds
      -----------------
      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 33 srv_active, 0 srv_shutdown, 2700380 srv_idle
      srv_master_thread log flush and writes: 2700402
      ----------
      SEMAPHORES
      ----------
      ------------
      TRANSACTIONS
      ------------
      Trx id counter 181844730732
      Purge done for trx's n:o < 181844730730 undo n:o < 0 state: running
      History list length 1
      LIST OF TRANSACTIONS FOR EACH SESSION:
      ---TRANSACTION 181844730731, ACTIVE 48121 sec renaming table
      11 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 3
      MariaDB thread id 1127202449, OS thread handle 140303901337280, query id 28250671597 10.64.48.98 cumin2024 Committing alter table to storage engine
      alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id)
      ---TRANSACTION 181844730729, ACTIVE 48121 sec
      mysql tables in use 1, locked 1
      0 lock struct(s), heap size 1128, 0 row lock(s)
      MariaDB thread id 1127202449, OS thread handle 140303901337280, query id 28250671597 10.64.48.98 cumin2024 Committing alter table to storage engine
      alter table categorylinks drop primary key, add primary key(cl_from, cl_target_id)
      Trx read view will not see trx with id >= 181844730728, sees < 181844730728
      ---TRANSACTION (0x7ffacc24a780), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      ---TRANSACTION (0x7ffacc248680), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      ---TRANSACTION (0x7ffacc249180), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      ---TRANSACTION (0x7ffacc247b80), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      --------
      FILE I/O
      --------
      Pending flushes (fsync): 0
      501803960 OS file reads, 207739151 OS file writes, 191731558 OS fsyncs
      0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
      ---
      LOG
      ---
      Log sequence number 55824729055508
      Log flushed up to   55824661967567
      Pages flushed up to 55823338001381
      Last checkpoint at  55823329660192
      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 405907963904
      Dictionary memory allocated 2413549608
      Buffer pool size   24530688
      Free buffers       1536
      Database pages     24075345
      Old database pages 8887167
      Modified db pages  88919
      Percent of dirty pages(LRU & free pages): 0.369
      Max dirty pages percent: 75.000
      Pending reads 0
      Pending writes: LRU 0, flush list 0
      Pages made young 545099336, not young 2580188305
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 501771254, created 5242410, written 205819881
      0.00 reads/s, 0.00 creates/s, 0.00 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 24075345, unzip_LRU len: 907598
      I/O sum[0]:cur[0], unzip sum[0]:cur[0]
      --------------
      ROW OPERATIONS
      --------------
      1 read views open inside InnoDB
      state: flushing log
      ----------------------------
      END OF INNODB MONITOR OUTPUT
      ============================
       
      1 row in set (0.002 sec)
      

      cumin2024@db1229.eqiad.wmnet[(none)]>  show global status like 'Uptime';
      +---------------+---------+
      | Variable_name | Value   |
      +---------------+---------+
      | Uptime        | 2749412 |
      +---------------+---------+
      1 row in set (0.002 sec)
      

      I have attached stacktraces from the running process.

      Attachments

        Activity

          People

            marko Marko Mäkelä
            marostegui Manuel Arostegui
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.