[MCOL-859] Running TRUNCATE on many tables in parallel seems to eventually deadlock
Created: 2017-08-07  Updated: 2020-08-25  Resolved: 2017-10-26

Status: Closed
Project: MariaDB ColumnStore
Component/s: DDLProc
Affects Version/s: 1.0.10
Fix Version/s: 1.0.12, 1.1.1

Type: Bug
Priority: Major
Reporter: Andrew Hutchings (Inactive)
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 1
Labels: None

Sprint: 2017-18, 2017-19, 2017-20, 2017-21

 Description   

Reported by a customer who is doing parallel jobs of TRUNCATE followed by LOAD DATA.



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-09-15 ]

Got a test case to reproduce this. Create two tables:

create table mcol859a (a int, b varchar(50)) engine=columnstore;
create table mcol859b (a int, b varchar(50)) engine=columnstore;

Create a tab-separated CSV file:

1	this
2	is
3	a
4	test

Create two PHP files as follows (make one use mcol859a and the other mcol859b):

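<?php
// Connect to the local server as root with an empty password, database "test".
$conn = new mysqli("127.0.0.1", "root", "", "test");
// Repeat TRUNCATE followed by LOAD DATA until the hang shows up.
for ($i = 0; $i < 1000000; $i++)
{
	$conn->query("truncate table mcol859b");
	$conn->query("load data infile '/home/linuxjedi/tmp/mcol859.csv' into table mcol859b;");
	// One dot per completed iteration; the dots stop when the script is stuck.
	echo ".";
}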

Run the two PHP scripts in parallel; the hang reproduces in less than a minute. Both threads are stuck on the TRUNCATE command, and things lock up so badly that the MariaDB client hangs on new connections.

Comment by Andrew Hutchings (Inactive) [ 2017-09-15 ]

The cause is a DDLProc thread in the system catalog deadlocking itself.

colTypeDct takes lk3 (a lock on fDctTokenMapLock) at the end and then calls colType, which calls checkSysCatVer. If the cached catalogue version is wrong, checkSysCatVer calls flushCache, which also locks fDctTokenMapLock. The thread already holds that lock from colTypeDct, so it deadlocks on itself.

Comment by Andrew Hutchings (Inactive) [ 2017-09-15 ]

Switched the mutex in question to a recursive mutex and after half an hour of running I've not been able to reproduce it any more (it took 1 minute before the fix).

Pull request for 1.0 and 1.1. Do not merge the 1.1 until after 1.1.0 is released.

For QA: Please see my earlier comment with the test case.

Comment by Daniel Lee (Inactive) [ 2017-10-26 ]

Build verified: Github source for 1.0.12

/root/columnstore/mariadb-columnstore-server
commit a42eb6d1e74e44c9e8fd9bb8290e6ce7dbf909f5
Merge: 2965fc8 6a14ced
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Oct 3 10:12:33 2017 -0500

Merge pull request #69 from mariadb-corporation/MCOL-940

MCOL-940

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit a8414b9a8f586917e20c2688f053d30d36b725d4
Merge: 22f5c04 16990c8
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Wed Oct 25 14:28:31 2017 +0300

Merge pull request #297 from mariadb-corporation/MCOL-985-1.0

MCOL-985 Add return code test after call to buildReturnedColumn

Build verified: Github source for 1.1.1

/root/columnstore/mariadb-columnstore-server
commit f6cd94ea167789970db7b5b501569a6549495d10
Merge: 3d846d3 91b2553
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Oct 24 09:15:58 2017 -0500

Merge pull request #72 from mariadb-corporation/MCOL-982

MCOL-982 Merge MariaDB 10.2.9

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 4985f3456e02b4ade6254f3dcc05a26a6bc4d338
Merge: 723620d 7aa588f
Author: David.Hall <david.hall@mariadb.com>
Date: Thu Oct 26 09:15:42 2017 -0500

Merge pull request #302 from mariadb-corporation/MCOL-984

MCOL-984 Fix API bulk insert rowID/HWM calulation

Verified with my own test scripts:

[root@localhost tests]# cat t1.sql
truncate table mcol859a;
load data infile '/tmp/mcol859.csv' into table mcol859a fields terminated by "|";
[root@localhost tests]# cat t1.sh
#!/bin/bash
#
for i in `seq 1 1000000`;
do
/data/qa/autopilot/common/sh/execSQLScript.sh mytest t1.sql 1
done

[root@localhost tests]# cat t2.sql
truncate table mcol859b;
load data infile '/tmp/mcol859.csv' into table mcol859b fields terminated by "|";
[root@localhost tests]# cat t2.sh
#!/bin/bash
#
for i in `seq 1 1000000`;
do
/data/qa/autopilot/common/sh/execSQLScript.sh mytest t2.sql 1
done
