[MDEV-30395] Wrong result with semijoin and Federated as outer table Created: 2023-01-12  Updated: 2023-02-10  Resolved: 2023-01-13

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - Federated
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.11.2, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: 11.0-sel

Issue Links:
Problem/Incident
is caused by MDEV-26974 Improve selectivity and related costs... Closed
Relates
relates to MDEV-30569 Assertion `!(p->table->table->file->h... Closed

 Description   

INSTALL SONAME 'ha_federatedx'; 
 
eval create server s foreign data wrapper mysql options (host "127.0.0.1", database "test", user "root", port $MASTER_MYPORT);
set optimizer_switch="materialization=off";
 
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (3),(2),(3);
 
CREATE TABLE t2 (pk INT PRIMARY KEY);
INSERT INTO t2 VALUES (1),(2),(3),(4);
 
CREATE TABLE t2_fed ENGINE=FEDERATED CONNECTION='s/t2';
 
SELECT * FROM t2_fed WHERE pk IN ( SELECT a FROM t1 );
SET optimizer_switch='semijoin=off';
SELECT * FROM t2_fed WHERE pk IN ( SELECT a FROM t1 );
 
# Cleanup
 
DROP TABLE t2_fed, t1, t2;
DROP SERVER s;
 
UNINSTALL SONAME 'ha_federatedx';

bb-11.0 78c07ed17

SELECT * FROM t2_fed WHERE pk IN ( SELECT a FROM t1 );
pk
3
2
3
SET optimizer_switch='semijoin=off';
SELECT * FROM t2_fed WHERE pk IN ( SELECT a FROM t1 );
pk
2
3

2 rows which are returned without semijoin is the expected result.

Plans with semijoin enabled:

bb-11.0

EXPLAIN EXTENDED SELECT * FROM t2_fed WHERE pk IN ( SELECT a FROM t1 );
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where; Start temporary
1	PRIMARY	t2_fed	eq_ref	PRIMARY	PRIMARY	4	test.t1.a	1	33.33	End temporary
Warnings:
Note	1003	select `test`.`t2_fed`.`pk` AS `pk` from `test`.`t2_fed` semi join (`test`.`t1`) where `test`.`t2_fed`.`pk` = `test`.`t1`.`a`

baseline

EXPLAIN EXTENDED SELECT * FROM t2_fed WHERE pk IN ( SELECT a FROM t1 );
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<subquery2>	ALL	distinct_key	NULL	NULL	NULL	3	100.00	
1	PRIMARY	t2_fed	eq_ref	PRIMARY	PRIMARY	4	test.t1.a	1	100.00	
2	MATERIALIZED	t1	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where
Warnings:
Note	1003	select `test`.`t2_fed`.`pk` AS `pk` from `test`.`t2_fed` semi join (`test`.`t1`) where `test`.`t2_fed`.`pk` = `test`.`t1`.`a`



 Comments   
Comment by Michael Widenius [ 2023-01-12 ]

I have verified that this an old bug.

Adding
"set optimizer_switch="materialization=off" to the test cases
causes 3 rows to be returned for the first query also in 10.5 and 10.11

Comment by Elena Stepanova [ 2023-01-12 ]

Updated versions etc. based on the above.

Comment by Michael Widenius [ 2023-01-12 ]

The problem was that federated engine does not support comparable rowids which was not taken into account by semijoin code.

Fixed by checking that we don't use semijoin with tables that does not support comparable rowids.

Comment by Michael Widenius [ 2023-01-13 ]

Finding the cause and finding the best way to fix this.
Also a lot of discussions with Petrunia about the right way to fix this.

Generated at Thu Feb 08 10:15:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.