Details
-
Bug
-
Status: Needs Feedback (View Workflow)
-
Critical
-
Resolution: Unresolved
-
10.11.11
-
None
-
None
Description
This is a regression affecting the open source OpenSVC monitoring database
The issue is triggered with a 20 rows scheduler table that is not inside opensvc code but a generic web2py internal scheduler
CREATE TABLE `scheduler_task` ( |
`id` int(11) NOT NULL AUTO_INCREMENT, |
`uuid` varchar(255) DEFAULT NULL, |
`args` longtext DEFAULT NULL, |
`vars` longtext DEFAULT NULL, |
`enabled` char(1) DEFAULT NULL, |
`start_time` datetime DEFAULT NULL, |
`next_run_time` datetime DEFAULT NULL, |
`stop_time` datetime DEFAULT NULL, |
`repeats` int(11) DEFAULT NULL, |
`retry_failed` int(11) DEFAULT NULL, |
`period` int(11) DEFAULT NULL, |
`timeout` int(11) DEFAULT NULL, |
`sync_output` int(11) DEFAULT NULL, |
`times_run` int(11) DEFAULT NULL, |
`times_failed` int(11) DEFAULT NULL, |
`last_run_time` datetime DEFAULT NULL, |
`prevent_drift` char(1) DEFAULT NULL, |
`group_name` varchar(255) DEFAULT NULL, |
`function_name` varchar(255) DEFAULT NULL, |
`status` varchar(255) DEFAULT NULL, |
`task_name` varchar(255) DEFAULT NULL, |
`application_name` varchar(255) DEFAULT NULL, |
`assigned_worker_name` varchar(512) DEFAULT '', |
PRIMARY KEY (`id`), |
UNIQUE KEY `uuid` (`uuid`), |
KEY `idx_dispatch` (`assigned_worker_name`(255),`status`), |
KEY `idx_expire` (`status`,`stop_time`) |
) ENGINE=InnoDB AUTO_INCREMENT=166 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci; |
The affected user capture a partial processlist before restarting the server
-e "SHOW FULL PROCESSLIST"
|
+---------+---------+-----------------+---------+---------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+---------+---------+-----------------+---------+---------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|
| 11 | opensvc | 127.0.0.1:32928 | opensvc | Sleep | 11 | | NULL | 0.000 |
|
| 45 | opensvc | 127.0.0.1:33220 | opensvc | Sleep | 7 | | NULL | 0.000 |
|
| 121 | opensvc | 127.0.0.1:33862 | opensvc | Sleep | 50 | | NULL | 0.000 |
|
| 186 | opensvc | 127.0.0.1:34456 | opensvc | Query | 1261600 | Updating | UPDATE `scheduler_task` SET `status`='RUNNING',`last_run_time`='2025-03-03 00:39:36' WHERE (`scheduler_task`.`id` = 19) | 0.000 |
|
| 187 | opensvc | 127.0.0.1:34460 | opensvc | Sleep | 17 | | NULL | 0.000 |
|
| 193 | opensvc | 127.0.0.1:34522 | opensvc | Sleep | 50 | | NULL | 0.000 |
|
| 195 | opensvc | 127.0.0.1:34540 | opensvc | Sleep | 49 | | NULL | 0.000 |
|
| 201 | opensvc | 127.0.0.1:51328 | opensvc | Sleep | 8 | | NULL | 0.000 |
|
| 206 | opensvc | 127.0.0.1:51374 | opensvc | Query | 1261585 | Searching rows for update | UPDATE `scheduler_task` SET `status`='EXPIRED' WHERE ((`scheduler_task`.`status` IN ('ASSIGNED','QUEUED')) AND (`scheduler_task`.`stop_time` < '2025-03-03 00:39:51')) | 0.000 |
|
| 207 | opensvc | 127.0.0.1:33216 | opensvc | Sleep | 1 | | NULL | 0.000 |
|
| 209 | opensvc | 127.0.0.1:33224 | opensvc | Sleep | 19 | | NULL | 0.000 |
|
| 208 | opensvc | 127.0.0.1:33222 | opensvc | Sleep | 19 | | NULL | 0.000 |
|
| 71278 | opensvc | 127.0.0.1:53904 | opensvc | Query | 1264427 | Updating | UPDATE `nodes` SET `last_comm`='2025-03-02 23:52:29' WHERE (`nodes`.`node_id` = 'c0994f20-1e29-42a1-9c85-68f2d512df85') | 0.000 |
|
| 71282 | opensvc | 127.0.0.1:47578 | opensvc | Query | 1264419 | Updating | UPDATE `nodes` SET `last_comm`='2025-03-02 23:52:37' WHERE (`nodes`.`node_id` = '755b77c4-9f98-4114-b993-2438cc85771d') | 0.000 |
|
| 71286 | opensvc | 127.0.0.1:56316 | opensvc | Query | 1264365 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71292 | opensvc | 127.0.0.1:59498 | opensvc | Query | 1264354 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71298 | opensvc | 127.0.0.1:34314 | opensvc | Query | 1264295 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71302 | opensvc | 127.0.0.1:59928 | opensvc | Query | 1264241 | Updating | UPDATE `nodes` SET `last_comm`='2025-03-02 23:55:35' WHERE (`nodes`.`node_id` = '775975b4-b85b-4211-ad58-278c5b51774a') | 0.000 |
|
| 71304 | opensvc | 127.0.0.1:59948 | opensvc | Query | 1264235 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71310 | opensvc | 127.0.0.1:59450 | opensvc | Query | 1264178 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71314 | opensvc | 127.0.0.1:41286 | opensvc | Query | 1264170 | Updating | UPDATE `nodes` SET `last_comm`='2025-03-02 23:56:45' WHERE (`nodes`.`node_id` = 'bce5b121-24e8-4874-bf97-97eca661c52f') | 0.000 |
|
| 71332 | opensvc | 127.0.0.1:60228 | opensvc | Query | 1264043 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71336 | opensvc | 127.0.0.1:40108 | opensvc | Query | 1263986 | Updating | UPDATE `nodes` SET `last_comm`='2025-03-02 23:59:49' WHERE (`nodes`.`node_id` = '5fe0a2ad-d284-4a94-ac7e-998a39286ecc') | 0.000 |
|
| 71340 | opensvc | 127.0.0.1:38404 | opensvc | Query | 1263981 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71346 | opensvc | 127.0.0.1:37182 | opensvc | Query | 1263923 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71350 | opensvc | 127.0.0.1:45466 | opensvc | Query | 1263869 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
| 71354 | opensvc | 127.0.0.1:57502 | opensvc | Query | 1263861 | Update | INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}') | 0.000 |
|
....
|
The following query look at innodb TRX state
|
-e "SELECT r.trx_id AS blocking_transaction, r.trx_mysql_thread_id AS blocking_thread, r.trx_state AS blocking_state, b.trx_id AS blocked_transaction, b.trx_mysql_thread_id AS blocked_thread, b.trx_state AS blocked_state FROM information_schema.innodb_lock_waits w JOIN information_schema.innodb_trx b ON w.requesting_trx_id = b.trx_id JOIN information_schema.innodb_trx r ON w.blocking_trx_id = r.trx_id;"
|
+----------------------+-----------------+----------------+---------------------+----------------+---------------+
|
| blocking_transaction | blocking_thread | blocking_state | blocked_transaction | blocked_thread | blocked_state |
|
+----------------------+-----------------+----------------+---------------------+----------------+---------------+
|
| 5650814 | 71675 | LOCK WAIT | 5650815 | 71698 | LOCK WAIT |
|
| 5650813 | 71674 | LOCK WAIT | 5650815 | 71698 | LOCK WAIT |
|
| 5650813 | 71674 | LOCK WAIT | 5650814 | 71675 | LOCK WAIT |
|
| 5622544 | 206 | RUNNING | 5650815 | 71698 | LOCK WAIT |
|
| 5622544 | 206 | RUNNING | 5650814 | 71675 | LOCK WAIT |
|
| 5622544 | 206 | RUNNING | 5650813 | 71674 | LOCK WAIT |
|
+----------------------+-----------------+----------------+---------------------+----------------+---------------+
|
4:22 PM
|
-e "SHOW EXPLAIN FOR 71675;"
|
--------------
|
SHOW EXPLAIN FOR 71675
|
--------------
|
|
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
|
-e "SHOW EXPLAIN FOR 71674;"
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
| 1 | SIMPLE | scheduler_task | range | idx_expire | idx_expire | 774 | NULL | 2 | Using where; Using buffer |
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
-e "SHOW EXPLAIN FOR 71698;"
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
| 1 | SIMPLE | scheduler_task | range | idx_expire | idx_expire | 774 | NULL | 2 | Using where; Using buffer |
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
-e "SHOW EXPLAIN FOR 71675;"
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+----------------+-------+---------------+------------+---------+------+------+---------------------------+
|
From a previous crash freeze on 10.11.7
*************************** 1. row ***************************
|
Type: InnoDB
|
Name:
|
Status:
|
=====================================
|
2025-02-10 15:53:40 0x7ff936f4c640 INNODB MONITOR OUTPUT
|
=====================================
|
Per second averages calculated from the last 37 seconds
|
-----------------
|
BACKGROUND THREAD
|
-----------------
|
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 363824 srv_idle
|
srv_master_thread log flush and writes: 363735
|
----------
|
SEMAPHORES
|
----------
|
------------
|
TRANSACTIONS
|
------------
|
Trx id counter 3279232
|
Purge done for trx's n:o < 3272353 undo n:o < 0 state: running but idle
|
History list length 0
|
LIST OF TRANSACTIONS FOR EACH SESSION:
|
---TRANSACTION 3279231, ACTIVE 30 sec starting index read
|
mysql tables in use 1, locked 1
|
LOCK WAIT 2 lock struct(s), heap size 1128, 1 row lock(s)
|
MariaDB thread id 1013514, OS thread handle 140708360341056, query id 2850795 127.0.0.1 opensvc Updating
|
UPDATE `nodes` SET `last_comm`='2025-02-10 15:53:09' WHERE (`nodes`.`node_id` = '755b77c4-9f98-4114-b993-2438cc85771d')
|
Trx read view will not see trx with id >= 3279231, sees < 3272354
|
------- TRX HAS BEEN WAITING 30598033 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 99 page no 4 n bits 16 index k_node_id of table `opensvc`.`nodes` trx id 3279231 lock_mode X waiting
|
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
|
0: len 30; hex 37353562373763342d396639382d343131342d623939332d323433386363; asc 755b77c4-9f98-4114-b993-2438cc; (total 36 bytes);
|
1: len 4; hex 8011653e; asc e>;;
|
|
.......
|
Update
|
INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
|
Trx read view will not see trx with id >= 3272372, sees < 3272354
|
---TRANSACTION 3272365, ACTIVE 271477 sec inserting
|
mysql tables in use 1, locked 1
|
1 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 136431, OS thread handle 140708701066816, query id 2300994 127.0.0.1 opensvc Update
|
INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
|
Trx read view will not see trx with id >= 3272365, sees < 3272354
|
---TRANSACTION 3272362, ACTIVE 271483 sec inserting
|
mysql tables in use 1, locked 1
|
1 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 136419, OS thread handle 140708719388224, query id 2300936 127.0.0.1 opensvc Update
|
INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
|
Trx read view will not see trx with id >= 3272362, sees < 3272354
|
---TRANSACTION 3272356, ACTIVE 271541 sec starting index read
|
mysql tables in use 1, locked 1
|
2 lock struct(s), heap size 1128, 1 row lock(s)
|
MariaDB thread id 136357, OS thread handle 140708337927744, query id 2300670 127.0.0.1 opensvc Updating
|
UPDATE `nodes` SET `last_comm`='2025-02-07 12:27:59' WHERE (`nodes`.`node_id` = 'c0994f20-1e29-42a1-9c85-68f2d512df85')
|
Trx read view will not see trx with id >= 3272354, sees < 3272354
|
---TRANSACTION 3272355, ACTIVE 271544 sec inserting
|
mysql tables in use 1, locked 1
|
1 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 136352, OS thread handle 140708301592128, query id 2300646 127.0.0.1 opensvc Update
|
INSERT INTO `log`(`svc_id`,`log_level`,`log_user`,`node_id`,`log_action`,`log_fmt`,`log_dict`) VALUES (NULL,'warning','feed',NULL,'node.auth','node authentication error: %(e)s from %(nodename)s','{"e": "agent xxx not registered", "nodename": "xxx"}')
|
Trx read view will not see trx with id >= 3272354, sees < 3272354
|
---TRANSACTION 3272354, ACTIVE 271545 sec updating or deleting
|
mysql tables in use 1, locked 1
|
3 lock struct(s), heap size 1128, 2 row lock(s)
|
MariaDB thread id 136349, OS thread handle 140708720309824, query id 2300631 127.0.0.1 opensvc Updating
|
UPDATE `nodes` SET `last_comm`='2025-02-07 12:27:55' WHERE (`nodes`.`node_id` = 'bce5b121-24e8-4874-bf97-97eca661c52f')
|
Trx read view will not see trx with id >= 3272354, sees < 3272354
|
---TRANSACTION (0x7ffa5cf36780), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7ffa5cf37d80), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION 3272557, ACTIVE 270013 sec updating or deleting
|
mysql tables in use 1, locked 1
|
2 lock struct(s), heap size 1128, 1 row lock(s)
|
MariaDB thread id 19, OS thread handle 140708703524416, query id 2308189 127.0.0.1 opensvc Updating
|
UPDATE `scheduler_task` SET `status`='RUNNING',`last_run_time`='2025-02-07 12:53:27' WHERE (`scheduler_task`.`id` = 19)
|
Trx read view will not see trx with id >= 3272557, sees < 3272354
|
---TRANSACTION 3272558, ACTIVE 269998 sec starting index read
|
mysql tables in use 1, locked 1
|
2 lock struct(s), heap size 1128, 1 row lock(s)
|
MariaDB thread id 188, OS thread handle 140708336084544, query id 2308239 127.0.0.1 opensvc Searching rows for update
|
UPDATE `scheduler_task` SET `status`='EXPIRED' WHERE ((`scheduler_task`.`status` IN ('ASSIGNED','QUEUED')) AND (`scheduler_task`.`stop_time` < '2025-02-07 12:53:42'))
|
---TRANSACTION (0x7ffa5cf34680), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7ffa5cf33b80), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
--------
|
FILE I/O
|
--------
|
Pending flushes (fsync): 0
|
1857 OS file reads, 451773 OS file writes, 257778 OS fsyncs
|
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
|
---
|
LOG
|
---
|
Log sequence number 3505933275
|
Log flushed up to 3505933275
|
Pages flushed up to 3505926452
|
Last checkpoint at 3505926452
|
----------------------
|
BUFFER POOL AND MEMORY
|
----------------------
|
Total large memory allocated 4328521728
|
Dictionary memory allocated 26740392
|
Buffer pool size 259584
|
Free buffers 257536
|
Database pages 2048
|
Old database pages 768
|
Modified db pages 4
|
Percent of dirty pages(LRU & free pages): 0.002
|
Max dirty pages percent: 40.000
|
Pending reads 0
|
Pending writes: LRU 0, flush list 4
|
Pages made young 592, not young 47425
|
0.00 youngs/s, 0.00 non-youngs/s
|
Pages read 1676, created 92317, written 279544
|
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
|
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
|
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
|
LRU len: 2048, unzip_LRU len: 0
|
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
|
--------------
|
ROW OPERATIONS
|
--------------
|
138 read views open inside InnoDB
|
state: sleeping
|
----------------------------
|
END OF INNODB MONITOR OUTPUT
|
============================
|
Found in 10.11.10 that may trigger such case
MDEV-34515
This is so far all what we have but we will report progress on the issue any help on is welcome
Attachments
Activity
Transition | Time In Source Status | Execution Times |
---|
|
10h 59m | 1 |