[MCOL-4784] Replication lag due to waiting for metadata lock on CS table Created: 2021-06-30  Updated: 2021-09-15  Resolved: 2021-09-02

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: 5.5.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Sandhya Balakrishnan (Inactive) Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 1
Labels: None
Environment:

CentOS Linux release 7.9


Issue Links:
Relates
relates to MCOL-4515 Binlog replication getting stuck when... Closed

 Description   

Truncate table is not able to execute due to waiting for metadata lock (thread id 1122427). There are many selects running for more than 7hrs. 3 of them are in Killed state (thread id 1361504, 1361536, 1361567) and truncate may be waiting for this

MariaDB [(none)]> show processlist;
+---------+-----------------+------------------+-----------+-----------+--------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+---------+-----------------+------------------+-----------+-----------+--------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 1122426 | system user | | NULL | Slave_IO | 237516 | Waiting for master to send event | NULL | 0.000 |
| 1122427 | system user | | dm_fait | Slave_SQL | 5560 | Waiting for table metadata lock | TRUNCATE TABLE dm_master.dim_structure | 0.000 |
| 1355372 | userbi | mxs1:55280 | dm_fait | Query | 28634 | Executing | SELECT
u.CODE_TYPE_USER AS CODE_TYPE_USER_UT,
GROUP_CONCAT(DISTINCT RA.HEAD_ID) RUBRIQUE
FROM
dm | 0.000 |
| 1355574 | userbi | mxs1:56124 | dm_fait | Query | 28694 | Executing | SELECT
u.CODE_TYPE_USER AS CODE_TYPE_USER_UT,
GROUP_CONCAT(DISTINCT RA.HEAD_ID) RUBRIQUE
FROM
dm | 0.000 |
| 1360885 | userbi | mxs1:50452 | dm_fait | Query | 28770 | Executing | SELECT
'P' AS CODE_TYPE_USER_P,
GROUP_CONCAT(DISTINCT HEAD_ID) AS RUBRIQUE
FROM
dm_master.dim_ | 0.000 |
| 1361504 | userbi | mxs1:53054 | dm_fait | Killed | 29006 | Executing | select * from (select `INVOICE`.`INV_NUMBER` as `c0`, `INVOICE`.`INV_DATE` as `c1`, COALESCE(CAST(CA | 0.000 |
| 1361536 | userbi | mxs1:53168 | dm_fait | Killed | 28988 | Executing | select * from (select `INVOICE`.`INV_NUMBER` as `c0`, `INVOICE`.`INV_DATE` as `c1`, COALESCE(CAST(CA | 0.000 |
| 1361567 | userbi | mxs1:53276 | dm_fait | Killed | 28970 | Executing | select * from (select `INVOICE`.`INV_NUMBER` as `c0`, `INVOICE`.`INV_DATE` as `c1`, COALESCE(CAST(CA | 0.000 |
| 1361587 | userbi | mxs1:53346 | dm_fait | Query | 28961 | Executing | select * from (select `INVOICE`.`INV_NUMBER` as `c0`, `INVOICE`.`INV_DATE` as `c1`, COALESCE(CAST(CA | 0.000 |
| 1361829 | userbi | mxs1:54582 | dm_fait | Query | 28768 | Executing | SELECT IF(COUNT(1)<>0,1,0) AS TEST_A_SUPP
FROM dm_fait.fait_rav
WHERE ENRG_A_SUPP = 1 | 0.000 |

The truncate is running on a columnstore table so I have installed metadata_lock_info plugin to check what is holding the lock. I tried to kill one of the blocking thread but it remains in killed state and does not clear. To fix this for now, I have killed the mariadb process and started up again and the processlist is clear and there is no replication lag.

--threads which are holding metadata locks.
MariaDB [(none)]> SELECT CONCAT('Thread ',P.ID,' executing "',P.INFO,'" IS LOCKED BY Thread ',M.THREAD_ID) WhoLocksWho FROM INFORMATION_SCHEMA.PROCESSLIST P,INFORMATION_SCHEMA.METADATA_LOCK_INFO M WHERE LOCATE(lcase(LOCK_TYPE), lcase(STATE))>0 limit 5;
+-----------------------------------------------------------------------------------------------+
| WhoLocksWho |
+-----------------------------------------------------------------------------------------------+
| Thread 1122427 executing "TRUNCATE TABLE dm_master.dim_structure" IS LOCKED BY Thread 1396804 |
| Thread 1122427 executing "TRUNCATE TABLE dm_master.dim_structure" IS LOCKED BY Thread 1355574 |
| Thread 1122427 executing "TRUNCATE TABLE dm_master.dim_structure" IS LOCKED BY Thread 1361915 |
| Thread 1122427 executing "TRUNCATE TABLE dm_master.dim_structure" IS LOCKED BY Thread 1355372 |
| Thread 1122427 executing "TRUNCATE TABLE dm_master.dim_structure" IS LOCKED BY Thread 1361977 |
+-----------------------------------------------------------------------------------------------+
5 rows in set (0.008 sec)
 
MariaDB [(none)]> select ID,USER,HOST,DB,COMMAND,TIME,INFO from information_schema.processlist where ID=1396804;
+---------+--------+------------+---------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | INFO |
+---------+--------+------------+---------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1396804 | userbi | mxs1:37548 | dm_fait | Query | 5562 | SELECT 'dm_fait.fait_orders' AS 'COL', ENRG_A_SUPP AS ENRG_A_SUPP, COUNT(1) AS NB_LIGNES, CAST(MIN(DATE_MAJ) AS CHAR(20)) AS 'DATE_MIN', CAST(MAX(DATE_MAJ) AS CHAR(20)) AS 'DATE_MAX'
FROM dm_fait.fait_orders GROUP BY 1,2 |
+---------+--------+------------+---------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
MariaDB [(none)]> select ID,USER,HOST,DB,COMMAND,TIME,INFO from information_schema.processlist where ID=1355574;
+---------+--------+------------+---------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | INFO |
+---------+--------+------------+---------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1355574 | userbi | mxs1:56124 | dm_fait | Query | 39752 | SELECT
u.CODE_TYPE_USER AS CODE_TYPE_USER_UT,
GROUP_CONCAT(DISTINCT RA.HEAD_ID) RUBRIQUE
FROM
dm_fait.fait_ra AS RA
INNER JOIN dm_master.uti_uti_org AS uo ON
RA.ORG_ID_STRUCTURE_EU = uo.ORG_ID
INNER JOIN dm_master.uti_utilisateur AS u ON
u.uti_id = uo.uti_id
WHERE
RA.ENRG_A_SUPP = 0
AND uo.UTI_LDAP_ID = 'supervista2@pixid.biz'
GROUP BY 1 |
+---------+--------+------------+---------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.003 sec)
 
MariaDB [(none)]> select ID,USER,HOST,DB,COMMAND,TIME,INFO from information_schema.processlist where ID=1361915;
+---------+--------+------------+---------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | INFO |
+---------+--------+------------+---------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1361915 | userbi | mxs1:55092 | dm_fait | Query | 39819 | SELECT
u.CODE_TYPE_USER AS CODE_TYPE_USER_UT,
GROUP_CONCAT(DISTINCT RA.HEAD_ID) RUBRIQUE
FROM
dm_fait.fait_ra AS RA
INNER JOIN dm_master.uti_uti_org AS uo ON
RA.ORG_ID_STRUCTURE_EU = uo.ORG_ID
INNER JOIN dm_master.uti_utilisateur AS u ON
u.uti_id = uo.uti_id
WHERE
RA.ENRG_A_SUPP = 0
AND uo.UTI_LDAP_ID = 'supervista2@pixid.biz'
GROUP BY 1 |
+---------+--------+------------+---------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)
 
--killing one of the blocking queries and thread remains in killed state
MariaDB [(none)]> kill 1396804;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> select ID,USER,HOST,DB,COMMAND,TIME,INFO from information_schema.processlist where ID=1396804;
+---------+--------+------------+---------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | INFO |
+---------+--------+------------+---------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1396804 | userbi | mxs1:37548 | dm_fait | Killed | 6162 | SELECT 'dm_fait.fait_orders' AS 'COL', ENRG_A_SUPP AS ENRG_A_SUPP, COUNT(1) AS NB_LIGNES, CAST(MIN(DATE_MAJ) AS CHAR(20)) AS 'DATE_MIN', CAST(MAX(DATE_MAJ) AS CHAR(20)) AS 'DATE_MAX'
FROM dm_fait.fait_orders GROUP BY 1,2 |
+---------+--------+------------+---------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.003 sec)
 
-- after mariadb restart
MariaDB [(none)]> show processlist;
+-----+-------------+------------------+------+-----------+------+--------------------------------------------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+------------------+------+-----------+------+--------------------------------------------------------+------------------+----------+
| 5 | system user | | NULL | Slave_IO | 18 | Waiting for master to send event | NULL | 0.000 |
| 6 | system user | | NULL | Slave_SQL | 0 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
| 7 | mxs | mxs1:44510 | NULL | Sleep | 0 | | NULL | 0.000 |
| 10 | monitoring | 10.10.1.98:38794 | NULL | Sleep | 0 | | NULL | 0.000 |
| 24 | monitoring | mxs1:44520 | NULL | Sleep | 0 | | NULL | 0.000 |
| 105 | root | localhost | NULL | Query | 0 | starting | show processlist | 0.000 |
+-----+-------------+------------------+------+-----------+------+--------------------------------------------------------+------------------+----------+
6 rows in set (0.000 sec)



 Comments   
Comment by Manjot Singh (Inactive) [ 2021-07-15 ]

Workaround - use schema replication filters in server to skip the CS schema (but it skips DDL changes too)

Generated at Thu Feb 08 02:52:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.