[MDEV-10804] main.stat_tables_par fails sporadically in buildbot Created: 2016-09-13  Updated: 2017-08-17  Resolved: 2017-05-11

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.2
Fix Version/s: 10.2.6

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 1
Labels: None

Attachments: Text File sync.diff    
Issue Links:
Duplicate
duplicates MDEV-3891 Deadlock with statistics tables and A... Closed
Relates
relates to MDEV-7069 Fix buildbot failures in main server ... Stalled
relates to MDEV-13266 Race condition in ANALYZE TABLE / sta... Closed
relates to MDEV-13544 main.stat_tables_par failed in buildbot Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/p8-rhel71-bintar-debug/builds/1194/steps/test/logs/stdio

main.stat_tables_par                     w1 [ fail ]
        Test ended at 2016-09-13 02:56:52
 
CURRENT_TEST: main.stat_tables_par
--- /home/buildbot/maria-slave/power8-vlp06-bintar-debug/build/mysql-test/r/stat_tables_par.result	2016-09-12 23:22:49.532112558 -0400
+++ /home/buildbot/maria-slave/power8-vlp06-bintar-debug/build/mysql-test/r/stat_tables_par.reject	2016-09-13 02:56:52.463254631 -0400
@@ -176,7 +176,6 @@
 dbt3_s001	lineitem	i_l_orderkey_quantity	2	1.0404
 dbt3_s001	lineitem	i_l_partkey	1	30.0250
 dbt3_s001	lineitem	i_l_receiptdate	1	2.6477
-dbt3_s001	lineitem	i_l_shipdate	1	2.6500
 dbt3_s001	lineitem	i_l_suppkey	1	600.5000
 dbt3_s001	lineitem	i_l_suppkey_partkey	1	30.0250
 dbt3_s001	lineitem	i_l_suppkey_partkey	2	8.5786
@@ -191,7 +190,6 @@
 dbt3_s001	lineitem	i_l_orderkey_quantity	2	1.0404
 dbt3_s001	lineitem	i_l_partkey	1	30.0250
 dbt3_s001	lineitem	i_l_receiptdate	1	2.6477
-dbt3_s001	lineitem	i_l_shipdate	1	2.6500
 dbt3_s001	lineitem	i_l_suppkey	1	600.5000
 dbt3_s001	lineitem	i_l_suppkey_partkey	1	30.0250
 dbt3_s001	lineitem	i_l_suppkey_partkey	2	8.5786
 
mysqltest: Result length mismatch



 Comments   
Comment by Elena Stepanova [ 2017-01-24 ]

Still happens. Most recent occurrence:
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/2081/steps/test/logs/stdio

Comment by Daniel Black [ 2017-02-16 ]

The diffs are in the results of these two (identical) queries: https://github.com/MariaDB/server/blob/10.2/mysql-test/t/stat_tables_par.test#L147..L154

Seems to be a sparatic error however I did have it once.

The removal of the disable_

{result_log|warnings}

for connection1/2 before these statements will probably provide the insight required. Both analyze statements should predictably return:

Table  Op      Msg_type        Msg_text
dbt3_s001.lineitem     analyze status  Engine-independent statistics collected
dbt3_s001.lineitem     analyze status  OK

I also noticed the following timeout which is further down in the test:

| Id | User        | Host      | db        | Command | Time | State                                   | Info                                                                                                                               | Progress |
| 15 | root        | localhost | dbt3_s001 | Query   |   98 | Waiting for table level lock            | analyze table lineitem persistent for all                                                                                          |    0.000 |
| 16 | root        | localhost | dbt3_s001 | Query   |   98 | debug sync point: statistics_read_start | select * from mysql.index_stats, lineitem where index_name= 'i_l_shipdate' and l_orderkey=1 and l_partkey=68 order by prefix_arity |    0.000 |

Comment by Michael Widenius [ 2017-05-08 ]

Original bug was fixed as part of MDEV-3891, but the test case had a bug that cause a random failure in debug builds while running tests.

Comment by Michael Widenius [ 2017-05-08 ]

The reason for this is that the following code almost always causes a deadlock that will eventually be resolved with a timeout:

connection con1;
set debug_sync='statistics_update_start SIGNAL parker WAIT_FOR go1 EXECUTE 1';
set debug_sync='thr_multi_lock_after_thr_lock SIGNAL go2 EXECUTE 2';
use dbt3_s001;
--send analyze table lineitem persistent for all

connection con2;
set debug_sync='open_and_process_table WAIT_FOR parker';
set debug_sync='statistics_read_start SIGNAL go1 WAIT_FOR go2';
use dbt3_s001;
--send select * from mysql.index_stats, lineitem where index_name= 'i_l_shipdate' and l_orderkey=1 and l_partkey=68 order by prefix_arity;

The deadlock happens because the SELECT will first run and take a read lock for index_stats, then the analyze statement will run and tries to take a write lock on table_stats, column_stats and index_stats.
This will block as SELECT already have the lock.

The tests works 'sometimes' depending on lock order in memory, as thr_multi_lock_after_thr_lock was done after the first lock.
Another problem is that the sync point thr_multi_lock_after_thr_lock was hit multiple times by the analyze
thread, which caused the sync point to become inactive before it was time to use it.

Can be fixed by adding a new sync point just before all thr locks and ensuring that thr_multi_lock_before_thr_lock is not triggered for the first call.

This should work as the bug fix for MDEV-3891 will notice that the select should not use statistics for this query and the analyze will continue after the select finishes.

The attached patch fixes these issues

Comment by Michael Widenius [ 2017-05-11 ]

Fixed by moving DEBUG_SYNC's to make things predictable and adjusting test

This also reduce the time for the test from 300 seconds to a few seconds

Generated at Thu Feb 08 07:45:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.