Details

    • Bug
    • Status: Needs Feedback (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 10.11.11
    • None
    • None

    Description

      This is a regression affecting the open source OpenSVC monitoring database
      The issue is triggered with a 20 rows scheduler table that is not inside opensvc code but a generic web2py internal scheduler

      CREATE TABLE `scheduler_task` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `uuid` varchar(255) DEFAULT NULL,
        `args` longtext DEFAULT NULL,
        `vars` longtext DEFAULT NULL,
        `enabled` char(1) DEFAULT NULL,
        `start_time` datetime DEFAULT NULL,
        `next_run_time` datetime DEFAULT NULL,
        `stop_time` datetime DEFAULT NULL,
        `repeats` int(11) DEFAULT NULL,
        `retry_failed` int(11) DEFAULT NULL,
        `period` int(11) DEFAULT NULL,
        `timeout` int(11) DEFAULT NULL,
        `sync_output` int(11) DEFAULT NULL,
        `times_run` int(11) DEFAULT NULL,
        `times_failed` int(11) DEFAULT NULL,
        `last_run_time` datetime DEFAULT NULL,
        `prevent_drift` char(1) DEFAULT NULL,
        `group_name` varchar(255) DEFAULT NULL,
        `function_name` varchar(255) DEFAULT NULL,
        `status` varchar(255) DEFAULT NULL,
        `task_name` varchar(255) DEFAULT NULL,
        `application_name` varchar(255) DEFAULT NULL,
        `assigned_worker_name` varchar(512) DEFAULT '',
        PRIMARY KEY (`id`),
        UNIQUE KEY `uuid` (`uuid`),
        KEY `idx_dispatch` (`assigned_worker_name`(255),`status`),
        KEY `idx_expire` (`status`,`stop_time`)
      ) ENGINE=InnoDB AUTO_INCREMENT=166 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
      

      The affected user capture a partial processlist before restarting the server

      -e "SHOW FULL PROCESSLIST" 
      +---------+---------+-----------------+---------+---------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
      | Id      | User    | Host            | db      | Command | Time    | State                     | Info                                                                                                                                                                                                                                                                                         | Progress |
      +---------+---------+-----------------+---------+---------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
      |      11 | opensvc | 127.0.0.1:32928 | opensvc | Sleep   |      11 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |      45 | opensvc | 127.0.0.1:33220 | opensvc | Sleep   |       7 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     121 | opensvc | 127.0.0.1:33862 | opensvc | Sleep   |      50 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     186 | opensvc | 127.0.0.1:34456 | opensvc | Query   | 1261600 | Updating                  | UPDATE `scheduler_task` SET `status`='RUNNING',`last_run_time`='2025-03-03 00:39:36' WHERE (`scheduler_task`.`id` = 19)                                                                                                                                                                      |    0.000 |
      |     187 | opensvc | 127.0.0.1:34460 | opensvc | Sleep   |      17 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     193 | opensvc | 127.0.0.1:34522 | opensvc | Sleep   |      50 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     195 | opensvc | 127.0.0.1:34540 | opensvc | Sleep   |      49 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     201 | opensvc | 127.0.0.1:51328 | opensvc | Sleep   |       8 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     206 | opensvc | 127.0.0.1:51374 | opensvc | Query   | 1261585 | Searching rows for update | UPDATE `scheduler_task` SET `status`='EXPIRED' WHERE ((`scheduler_task`.`status` IN ('ASSIGNED','QUEUED')) AND (`scheduler_task`.`stop_time` < '2025-03-03 00:39:51'))                                                                                                                       |    0.000 |
      |     207 | opensvc | 127.0.0.1:33216 | opensvc | Sleep   |       1 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     209 | opensvc | 127.0.0.1:33224 | opensvc | Sleep   |      19 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |     208 | opensvc | 127.0.0.1:33222 | opensvc | Sleep   |      19 |                           | NULL                                                                                                                                                                                                                                                                                         |    0.000 |
      |   71278 | opensvc | 127.0.0.1:53904 | opensvc | Query   | 1264427 | Updating                  | UPDATE `nodes` SET `last_comm`='2025-03-02 23:52:29' WHERE (`nodes`.`node_id` = 'c0994f20-1e29-42a1-9c85-68f2d512df85')                                                                                                                                                                      |    0.000 |
      |   71282 | opensvc | 127.0.0.1:47578 | opensvc | Query   | 1264419 | Updating                  | UPDATE `nodes` SET `last_comm`='2025-03-02 23:52:37' WHERE (`nodes`.`node_id` = '755b77c4-9f98-4114-b993-2438cc85771d')                                                                                                                                                                      |    0.000 |
      |   71286 | opensvc | 127.0.0.1:56316 | opensvc | Query   | 1264365 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71292 | opensvc | 127.0.0.1:59498 | opensvc | Query   | 1264354 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71298 | opensvc | 127.0.0.1:34314 | opensvc | Query   | 1264295 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71302 | opensvc | 127.0.0.1:59928 | opensvc | Query   | 1264241 | Updating                  | UPDATE `nodes` SET `last_comm`='2025-03-02 23:55:35' WHERE (`nodes`.`node_id` = '775975b4-b85b-4211-ad58-278c5b51774a')                                                                                                                                                                      |    0.000 |
      |   71304 | opensvc | 127.0.0.1:59948 | opensvc | Query   | 1264235 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71310 | opensvc | 127.0.0.1:59450 | opensvc | Query   | 1264178 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71314 | opensvc | 127.0.0.1:41286 | opensvc | Query   | 1264170 | Updating                  | UPDATE `nodes` SET `last_comm`='2025-03-02 23:56:45' WHERE (`nodes`.`node_id` = 'bce5b121-24e8-4874-bf97-97eca661c52f')                                                                                                                                                                      |    0.000 |
      |   71332 | opensvc | 127.0.0.1:60228 | opensvc | Query   | 1264043 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71336 | opensvc | 127.0.0.1:40108 | opensvc | Query   | 1263986 | Updating                  | UPDATE `nodes` SET `last_comm`='2025-03-02 23:59:49' WHERE (`nodes`.`node_id` = '5fe0a2ad-d284-4a94-ac7e-998a39286ecc')                                                                                                                                                                      |    0.000 |
      |   71340 | opensvc | 127.0.0.1:38404 | opensvc | Query   | 1263981 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71346 | opensvc | 127.0.0.1:37182 | opensvc | Query   | 1263923 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71350 | opensvc | 127.0.0.1:45466 | opensvc | Query   | 1263869 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      |   71354 | opensvc | 127.0.0.1:57502 | opensvc | Query   | 1263861 | Update                    | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') |    0.000 |
      ....
      

      The following query look at innodb TRX state

       
      -e "SELECT r.trx_id AS blocking_transaction, r.trx_mysql_thread_id AS blocking_thread, r.trx_state AS blocking_state, b.trx_id AS blocked_transaction, b.trx_mysql_thread_id AS blocked_thread, b.trx_state AS blocked_state FROM information_schema.innodb_lock_waits w JOIN information_schema.innodb_trx b ON w.requesting_trx_id = b.trx_id JOIN information_schema.innodb_trx r ON w.blocking_trx_id = r.trx_id;"
      +----------------------+-----------------+----------------+---------------------+----------------+---------------+
      | blocking_transaction | blocking_thread | blocking_state | blocked_transaction | blocked_thread | blocked_state |
      +----------------------+-----------------+----------------+---------------------+----------------+---------------+
      |              5650814 |           71675 | LOCK WAIT      |             5650815 |          71698 | LOCK WAIT     |
      |              5650813 |           71674 | LOCK WAIT      |             5650815 |          71698 | LOCK WAIT     |
      |              5650813 |           71674 | LOCK WAIT      |             5650814 |          71675 | LOCK WAIT     |
      |              5622544 |             206 | RUNNING        |             5650815 |          71698 | LOCK WAIT     |
      |              5622544 |             206 | RUNNING        |             5650814 |          71675 | LOCK WAIT     |
      |              5622544 |             206 | RUNNING        |             5650813 |          71674 | LOCK WAIT     |
      +----------------------+-----------------+----------------+---------------------+----------------+---------------+
      4:22 PM
      

      -e "SHOW EXPLAIN FOR 71675;"
      --------------
      SHOW EXPLAIN FOR 71675
      --------------
       
      ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
      -e "SHOW EXPLAIN FOR 71674;"
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      | id   | select_type | table          | type  | possible_keys | key        | key_len | ref  | rows | Extra                     |
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      |    1 | SIMPLE      | scheduler_task | range | idx_expire    | idx_expire | 774     | NULL |    2 | Using where; Using buffer |
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      -e "SHOW EXPLAIN FOR 71698;"
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      | id   | select_type | table          | type  | possible_keys | key        | key_len | ref  | rows | Extra                     |
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      |    1 | SIMPLE      | scheduler_task | range | idx_expire    | idx_expire | 774     | NULL |    2 | Using where; Using buffer |
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      -e "SHOW EXPLAIN FOR 71675;"
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      | id   | select_type | table          | type  | possible_keys | key        | key_len | ref  | rows | Extra                     |
      +------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
      

      From a previous crash freeze on 10.11.7

      *************************** 1. row ***************************
        Type: InnoDB
        Name: 
      Status: 
      =====================================
      2025-02-10 15:53:40 0x7ff936f4c640 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 37 seconds
      -----------------
      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 363824 srv_idle
      srv_master_thread log flush and writes: 363735
      ----------
      SEMAPHORES
      ----------
      ------------
      TRANSACTIONS
      ------------
      Trx id counter 3279232
      Purge done for trx's n:o < 3272353 undo n:o < 0 state: running but idle
      History list length 0
      LIST OF TRANSACTIONS FOR EACH SESSION:
      ---TRANSACTION 3279231, ACTIVE 30 sec starting index read
      mysql tables in use 1, locked 1
      LOCK WAIT 2 lock struct(s), heap size 1128, 1 row lock(s)
      MariaDB thread id 1013514, OS thread handle 140708360341056, query id 2850795 127.0.0.1 opensvc Updating
      UPDATE `nodes` SET `last_comm`='2025-02-10 15:53:09' WHERE (`nodes`.`node_id` = '755b77c4-9f98-4114-b993-2438cc85771d')
      Trx read view will not see trx with id >= 3279231, sees < 3272354
      ------- TRX HAS BEEN WAITING 30598033 ns FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 99 page no 4 n bits 16 index k_node_id of table `opensvc`.`nodes` trx id 3279231 lock_mode X waiting
      Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
       0: len 30; hex 37353562373763342d396639382d343131342d623939332d323433386363; asc 755b77c4-9f98-4114-b993-2438cc; (total 36 bytes);
       1: len 4; hex 8011653e; asc   e>;;
       
      .......
      Update
      INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
      Trx read view will not see trx with id >= 3272372, sees < 3272354
      ---TRANSACTION 3272365, ACTIVE 271477 sec inserting
      mysql tables in use 1, locked 1
      1 lock struct(s), heap size 1128, 0 row lock(s)
      MariaDB thread id 136431, OS thread handle 140708701066816, query id 2300994 127.0.0.1 opensvc Update
      INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
      Trx read view will not see trx with id >= 3272365, sees < 3272354
      ---TRANSACTION 3272362, ACTIVE 271483 sec inserting
      mysql tables in use 1, locked 1
      1 lock struct(s), heap size 1128, 0 row lock(s)
      MariaDB thread id 136419, OS thread handle 140708719388224, query id 2300936 127.0.0.1 opensvc Update
      INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
      Trx read view will not see trx with id >= 3272362, sees < 3272354
      ---TRANSACTION 3272356, ACTIVE 271541 sec starting index read
      mysql tables in use 1, locked 1
      2 lock struct(s), heap size 1128, 1 row lock(s)
      MariaDB thread id 136357, OS thread handle 140708337927744, query id 2300670 127.0.0.1 opensvc Updating
      UPDATE `nodes` SET `last_comm`='2025-02-07 12:27:59' WHERE (`nodes`.`node_id` = 'c0994f20-1e29-42a1-9c85-68f2d512df85')
      Trx read view will not see trx with id >= 3272354, sees < 3272354
      ---TRANSACTION 3272355, ACTIVE 271544 sec inserting
      mysql tables in use 1, locked 1
      1 lock struct(s), heap size 1128, 0 row lock(s)
      MariaDB thread id 136352, OS thread handle 140708301592128, query id 2300646 127.0.0.1 opensvc Update
      INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
      Trx read view will not see trx with id >= 3272354, sees < 3272354
      ---TRANSACTION 3272354, ACTIVE 271545 sec updating or deleting
      mysql tables in use 1, locked 1
      3 lock struct(s), heap size 1128, 2 row lock(s)
      MariaDB thread id 136349, OS thread handle 140708720309824, query id 2300631 127.0.0.1 opensvc Updating
      UPDATE `nodes` SET `last_comm`='2025-02-07 12:27:55' WHERE (`nodes`.`node_id` = 'bce5b121-24e8-4874-bf97-97eca661c52f')
      Trx read view will not see trx with id >= 3272354, sees < 3272354
      ---TRANSACTION (0x7ffa5cf36780), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      ---TRANSACTION (0x7ffa5cf37d80), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      ---TRANSACTION 3272557, ACTIVE 270013 sec updating or deleting
      mysql tables in use 1, locked 1
      2 lock struct(s), heap size 1128, 1 row lock(s)
      MariaDB thread id 19, OS thread handle 140708703524416, query id 2308189 127.0.0.1 opensvc Updating
      UPDATE `scheduler_task` SET `status`='RUNNING',`last_run_time`='2025-02-07 12:53:27' WHERE (`scheduler_task`.`id` = 19)
      Trx read view will not see trx with id >= 3272557, sees < 3272354
      ---TRANSACTION 3272558, ACTIVE 269998 sec starting index read
      mysql tables in use 1, locked 1
      2 lock struct(s), heap size 1128, 1 row lock(s)
      MariaDB thread id 188, OS thread handle 140708336084544, query id 2308239 127.0.0.1 opensvc Searching rows for update
      UPDATE `scheduler_task` SET `status`='EXPIRED' WHERE ((`scheduler_task`.`status` IN ('ASSIGNED','QUEUED')) AND (`scheduler_task`.`stop_time` < '2025-02-07 12:53:42'))
      ---TRANSACTION (0x7ffa5cf34680), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      ---TRANSACTION (0x7ffa5cf33b80), not started
      0 lock struct(s), heap size 1128, 0 row lock(s)
      --------
      FILE I/O
      --------
      Pending flushes (fsync): 0
      1857 OS file reads, 451773 OS file writes, 257778 OS fsyncs
      0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
      ---
      LOG
      ---
      Log sequence number 3505933275
      Log flushed up to   3505933275
      Pages flushed up to 3505926452
      Last checkpoint at  3505926452
      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 4328521728
      Dictionary memory allocated 26740392
      Buffer pool size   259584
      Free buffers       257536
      Database pages     2048
      Old database pages 768
      Modified db pages  4
      Percent of dirty pages(LRU & free pages): 0.002
      Max dirty pages percent: 40.000
      Pending reads 0
      Pending writes: LRU 0, flush list 4
      Pages made young 592, not young 47425
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 1676, created 92317, written 279544
      0.00 reads/s, 0.00 creates/s, 0.00 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 2048, unzip_LRU len: 0
      I/O sum[0]:cur[0], unzip sum[0]:cur[0]
      --------------
      ROW OPERATIONS
      --------------
      138 read views open inside InnoDB
      state: sleeping
      ----------------------------
      END OF INNODB MONITOR OUTPUT
      ============================
      

      Found in 10.11.10 that may trigger such case
      MDEV-34515

      This is so far all what we have but we will report progress on the issue any help on is welcome

      Attachments

        Activity

          People

            marko Marko Mäkelä
            stephane@skysql.com VAROQUI Stephane
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.