[MDEV-7142] main.index_merge_innodb fails sporadically in buildbot with wrong result or timeout Created: 2014-11-19  Updated: 2023-11-29  Resolved: 2023-11-29

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0, 10.1, 10.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Won't Fix Votes: 0
Labels: buildbot, tests

Issue Links:
Relates
relates to MDEV-7069 Fix buildbot failures in main server ... Stalled
relates to MDEV-14121 Assertion failure 'slot != NULL' in o... Closed
relates to MDEV-15203 innodb.innodb_max_recordsize_64k fail... Closed
Sprint: 10.2.1-5

 Description   

http://buildbot.askmonty.org/buildbot/builders/bintar-rhel7-p8/builds/16/steps/test/logs/stdio

main.index_merge_innodb 'xtradb'         w1 [ fail ]
        Test ended at 2014-11-19 12:27:44
 
CURRENT_TEST: main.index_merge_innodb
--- /home/buildbot/maria-slave/power8-vlp03-bintar/build/mysql-test/r/index_merge_innodb.result	2014-11-19 10:43:11.911410000 -0500
+++ /home/buildbot/maria-slave/power8-vlp03-bintar/build/mysql-test/r/index_merge_innodb.reject	2014-11-19 12:27:43.950083391 -0500
@@ -313,7 +313,7 @@
 update t1 set key2=key1,key3=key1;
 explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	index_merge	i2,i3	i3,i2	4,4	NULL	REF	Using sort_union(i3,i2); Using where
+1	SIMPLE	t1	ALL	i2,i3	NULL	NULL	NULL	REF	Using where
 select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
 key1	key2	key3
 31	31	31

It happens from time to time on different builds.



 Comments   
Comment by Sergei Petrunia [ 2014-11-20 ]

I suspect this something that is not a problem of index_merge. index_merge code hasn't been touched in years. Perhaps, innodb started to return different record estimates...

elenst, can you track when (at least approximately) this issue started to show up?

Comment by Sergei Petrunia [ 2016-09-05 ]

Debugging this example, in get_best_disjunct_quick

The table has 1024 rows.

costs of merged index scans are:

(gdb) p (*cur_child)->read_cost
  $7 = 1.6124384296513044
...
(gdb) p (*cur_child)->read_cost
  $8 = 2.2160960741282612

After adding costs merging and of using Unique we construct TRP_INDEX_MERGE
with this cost:

(gdb) print imerge_cost
  $20 = 12.928837487575034

There is a huge difference with cost of full scan:

(gdb) print read_time
  $13 = 207.90000000000001
 
(gdb) p param->table->stat_records()
  $12 = 1024

So, this failure doesn't look like an "optimizer switches between two query plans with nearly identical costs" failure.

Comment by Sergei Petrunia [ 2016-09-05 ]

The only reason for this failure that I can think of is InnoDB's statistics update being delayed. This will cause the optimizer to see a very low stat_records(), and it will pick full scan instead of index_merge.
Pushed this patch: https://github.com/MariaDB/server/commit/a14f61ef749ad9f9ab2b0f5badf6754ba7443c9e

elenst, feel free to re-open if the issue still shows up.

Comment by Elena Stepanova [ 2016-11-06 ]

It still happens.
http://buildbot.askmonty.org/buildbot/builders/kvm-deb-jessie-amd64/builds/1308/steps/test_4/logs/stdio

main.index_merge_innodb 'xtradb'         w3 [ fail ]
        Test ended at 2016-11-01 04:24:15
 
CURRENT_TEST: main.index_merge_innodb
--- /usr/share/mysql/mysql-test/r/index_merge_innodb.result	2016-11-01 01:56:57.000000000 -0400
+++ /dev/shm/var/3/log/index_merge_innodb.reject	2016-11-01 04:24:15.250529474 -0400
@@ -316,7 +316,7 @@
 test.t1	analyze	status	OK
 explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	index_merge	i2,i3	i3,i2	4,4	NULL	REF	Using sort_union(i3,i2); Using where
+1	SIMPLE	t1	ALL	i2,i3	NULL	NULL	NULL	REF	Using where
 select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
 key1	key2	key3
 31	31	31
 
mysqltest: Result length mismatch

Normally we don't re-open issues after (attempted) bugfixes have been released, but since it's just a test, I suppose there is no big harm.

Comment by Elena Stepanova [ 2017-02-08 ]

Still happens, now on 10.2:
https://internal.askmonty.org/buildbot/builders/win32-packages/builds/2110/steps/test/logs/stdio

Comment by Elena Stepanova [ 2017-05-24 ]

It also fails with timeout on valgrind:
http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10064/steps/test/logs/stdio

main.index_merge_innodb 'xtradb'         w1 [ fail ]  timeout after 9000 seconds
        Test ended at 2017-05-23 02:58:00
 
Test case timeout after 9000 seconds
 
== /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/index_merge_innodb.log == 
key2 int not null,
pktail1ok  int not null,
pktail2ok  int not null,
pktail3bad int not null,
pktail4bad int not null,
pktail5bad int not null,
pk2copy int not null,
badkey  int not null,
filler1 char (200),
filler2 char (200),
key (key1),
key (key2),
/* keys with tails from CPK members */
key (pktail1ok, pk1),
key (pktail2ok, pk1, pk2),
key (pktail3bad, pk2, pk1),
key (pktail4bad, pk1, pk2copy),
key (pktail5bad, pk1, pk2, pk2copy),
primary key (pk1, pk2)
);
 
 == /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/tmp/1/mysqld.1.sock' (111 "Connection refused")
 
 
 - skipping '/mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/main.index_merge_innodb-xtradb/'
***Warnings generated in error logs during shutdown after running tests: funcs_1.is_columns_is main.index_merge_innodb
 
Attempting backtrace. You can use the following information to find out

Comment by Laurynas Biveinis [ 2017-05-24 ]

My attempt to debug this issue (the original one, not the timeout) has resulted in https://bugs.mysql.com/bug.php?id=84366 (InnoDB index dives do not detect concurrent tree changes, return bogus estimates)

Comment by Alice Sherepa [ 2017-09-05 ]

it fails on 10.3 too, now with innodb. http://buildbot.askmonty.org/buildbot/builders/kvm-deb-stretch-x86/builds/1183/steps/mtr/logs/stdio

main.index_merge_innodb 'innodb'         w4 [ fail ]
        Test ended at 2017-09-05 03:31:47
CURRENT_TEST: main.index_merge_innodb
--- /usr/share/mysql/mysql-test/r/index_merge_innodb.result	2017-09-05 02:12:54.000000000 -0400
+++ /dev/shm/var/4/log/index_merge_innodb.reject	2017-09-05 03:31:47.271347035 -0400
@@ -316,7 +316,7 @@
 test.t1	analyze	status	OK
 explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	index_merge	i2,i3	i3,i2	4,4	NULL	REF	Using sort_union(i3,i2); Using where
+1	SIMPLE	t1	ALL	i2,i3	NULL	NULL	NULL	REF	Using where
 select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
 key1	key2	key3
 31	31	31
 
mysqltest: Result length mismatch

Generated at Thu Feb 08 07:17:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.