[MDEV-26247] Spider: Valid LEFT JOIN results in ERROR 1064 Created: 2021-07-26  Updated: 2023-12-05  Resolved: 2023-11-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - Spider
Affects Version/s: 10.4.11, 10.5.11, 10.3
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Critical
Reporter: Bob Gladhill Assignee: Yuchen Pei
Resolution: Fixed Votes: 1
Labels: 1064, not-10.2, spider-gbh
Environment:

Back-end database instance: Mariadb 10.4.11 with 32 GB RAM 6 x 1 core Intel(R) Xeon(R) CPU E7-4850 v3 @ 2.20GHz

Spider node instance: Mariadb 10.4.11 with 300 GB RAM 24 x 1 core Intel(R) Xeon(R) CPU E7-4850 v3 @ 2.20GHz Linux 3.10.0-1127.19.1.el7.x86_64 #1 Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 SMP GNU/Linux

All instances are running the following in a VM environment:
CentOS 7 Linux 3.10.0-1127.19.1.el7.x86_64 #1 Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 SMP GNU/Linux


Issue Links:
Blocks
blocks MDEV-29163 Server crash with SIGSEGV or dynamic-... Closed
blocks MDEV-30392 Syntax error upon query with subquery... Closed
blocks MDEV-31645 Spider doesn't recognize semi JOIN Closed
PartOf
includes MDEV-29163 Server crash with SIGSEGV or dynamic-... Closed
includes MDEV-30392 Syntax error upon query with subquery... Closed
includes MDEV-31645 Spider doesn't recognize semi JOIN Closed
Relates
relates to MDEV-32238 Add a switch to disable spider group ... Closed
relates to MDEV-32273 Broken queries are passed to the grou... Open
relates to MDEV-28691 Spider: fall back to usual handler if... Open

 Description   

The following is a simplified version of my original query where the bug was first discovered. When attempting to do the following join of two Spider tables, the following error message is returned for a valid query:

select nextID from tab1 LEFT join tab2 on id = currentID where id in ( 1 );

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'join `forensic`.`tab2` t0 on (t0.`currentID` = 1 where 1' at line 1

If I run the query on the back-end database instance, it works as expected with no error. If I run it on the Spider node instance, I get the 1064 error.

I have tested this with Mariadb 10.4.11 and 10.5.11 (both with Spider version 3.3.15). I have also tested this with both tables as Innodb tables. The same error results.

I have reduced the contents of my.cnf to the following for simplicity (but I've tested with and without these and they don't effect the result) and there are file path definitions that I define:

log_warnings=1
default_storage_engine=MyISAM
innodb=OFF
general_log=ON
expire_logs_days=2

create database mydb default character set latin1;
use mydb
 
create user spidy@`%`;
set password for spidy@`%` = password('password123');
grant select on *.* to spidy@`%`;
 
-- grant all on mydb.tab1 to spidy@`%`;
-- grant all on mydb.tab2 to spidy@`%`;
 
create table tab1 (
  id int(4) not null auto_increment,
  status varchar(25) default null,
  primary key (id)
) engine=MYISAM default charset=latin1;
 
create table tab2 (
  currentID int(4) not null default 0,
  nextID int(4) not null default 0,
  primary key (currentID, nextID)
) engine=MYISAM default charset=latin1;
 
insert into tab1 values 
(null, 'WORKING'),
(null, 'READY FOR QC'),
(null, 'READY TO CREATE'),
(null, 'CREATED'),
(null, 'APPROVAL ERROR'),
(null, 'CREATION ERROR'),
(null, 'AWAITING'),
(null, 'NEEDS RTL'),
(null, 'RTL SUBMITTED'),
(null, 'ARCHIVED');
 
insert into tab2 values 
(1,2),
(1,7),
(1,8),
(1,10),
(2,6),
(2,10),
(3,4),
(3,6),
(3,10),
(5,1),
(5,10),
(6,1),
(6,10),
(7,2),
(7,10),
(8,5),
(8,9),
(8,10),
(9,2),
(9,5),
(9,7),
(9,10),
(10,1),
(10,2),
(10,3),
(10,5),
(10,6),
(10,7),
(10,8),
(10,9);
 
create or replace server dataNode1 foreign data wrapper mysql options (PORT 3306, HOST 'hostname', USER 'spidy', PASSWORD 'password123');
select * from mysql.servers;
select db_name, table_name, server from mysql.spider_tables;
 
create table mydb.tab1 engine=SPIDER comment='wrapper "mysql", srv "dataNode1", table "tab1"';
create table mydb.tab2 engine=SPIDER comment='wrapper "mysql", srv "dataNode1", table "tab2"';
 
-- Test cases:
 
-- produces ERROR
select nextID 
from tab1
LEFT join tab2 on id = currentID
where id in ( 1 );
 
-- produces ERROR
select nextID
from tab2
RIGHT join tab1 on currentID = id
where id in ( 1 );
 
-- produces ERROR
select nextID
from tab2
RIGHT join tab1 on currentID = id
where id in ( 1,2 );
 
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'join `forensic`.`tab2` t0 on (t0.`currentID` = 1 where 1' at line 1
 
-- OK - no error
select nextID 
from tab1
LEFT join tab2 on id = currentID
where id in ( 1,2 );
 
-- OK - no error
select nextID 
from tab2
LEFT join tab1 on id = currentID
where id in ( 1 );
 
-- OK - no error
select nextID
from tab1
RIGHT join tab2 on currentID = id
where id in ( 1 );



 Comments   
Comment by Nayuta Yanagisawa (Inactive) [ 2021-08-11 ]

bgladhill Thank you for your report! I will check it soon.

Comment by Todd McDowell [ 2021-08-11 ]

Thank you! bgladhill and I are excited for a resolution to this issue. Keep us posted. Thanks again!

Comment by Nayuta Yanagisawa (Inactive) [ 2021-08-13 ]

The bug reproduced on 10.3, 10.4,10.5, and 10.6 HEAD, but not on 10.2 HEAD.

The following wrong query rewrite is done on 10.3 or above.

select nextID from sp1 LEFT join sp2 on id = currentID where id in ( 1 );    /* Query on Spider */
=> select t0.`nextID` `nextID` from  left join `test`.`tab2` t0 on (t0.`currentID` = 1) where 1    /* Query on data node */ 

Comment by Nayuta Yanagisawa (Inactive) [ 2021-08-26 ]

I created a simpler test case:
https://github.com/MariaDB/server/tree/bb-10.3-mdev-26247

Comment by Nayuta Yanagisawa (Inactive) [ 2021-08-27 ]

The following function generates wrong queries:

int spider_db_mbase_util::append_from_and_tables(
  ha_spider *spider,
  spider_fields *fields,
  spider_string *str,
  TABLE_LIST *table_list,
  uint table_count
) {
   ...
}

What is storage is that, when the following query is executed, the table_list only includes tbl_b and table_count is 1.

SELECT d FROM tbl_a LEFT JOIN tbl_b ON a = c WHERE a IN (1);

This is just a guess but the query optimizer possibly optimizer away tbl_a. That is because the optimizer know there is a single tbl_a row matching WHERE a = 1 and thus there is no need to scan tbl_a.

Comment by Todd McDowell [ 2021-08-29 ]

Thanks nayuta-yanagisawa. This is great progress. Looking forward to a fix!

Comment by Nayuta Yanagisawa (Inactive) [ 2021-08-31 ]

I noticed that the bug is a regression and tried git-bisect, but it was difficult to run git-bisect correctly. I'm now trying manual git-bisecting.

Comment by Nayuta Yanagisawa (Inactive) [ 2021-08-31 ]

The result of git-bisect implies that the bug has existed since the direct join is implemented. Therefore, it was not possible, by git-bisecting, to identify which part of the code had the problem.

6caf9ec425a12ed54741cdf4c4a238c45cabd898 is the first bad commit
commit 6caf9ec425a12ed54741cdf4c4a238c45cabd898
Author: Kentoku <kentokushiba@gmail.com>
Date:   Tue Nov 20 00:12:58 2018 +0900
 
    Update Spider to version 3.3.14. Add direct left outer join/right outer join/inner join feature
 
 storage/spider/ha_spider.cc                        | 191 ++++-
 storage/spider/ha_spider.h                         |  92 ++-
 storage/spider/hs_client/hs_compat.h               |   2 +-
 .../spider/include/direct_join_using_deinit.inc    |   9 +
 .../spider/include/direct_join_using_init.inc      |  13 +
 .../spider/include/direct_left_join_deinit.inc     |   9 +
 .../spider/include/direct_left_join_init.inc       |  13 +
 .../include/direct_left_join_nullable_deinit.inc   |   9 +
 .../include/direct_left_join_nullable_init.inc     |  13 +
 .../direct_left_right_join_nullable_deinit.inc     |   9 +
 .../direct_left_right_join_nullable_init.inc       |  13 +
 ...direct_left_right_left_join_nullable_deinit.inc |   9 +
 .../direct_left_right_left_join_nullable_init.inc  |  13 +
 .../spider/include/direct_right_join_deinit.inc    |   9 +
 .../spider/include/direct_right_join_init.inc      |  13 +
 .../include/direct_right_join_nullable_deinit.inc  |   9 +
 .../include/direct_right_join_nullable_init.inc    |  13 +
 .../direct_right_left_join_nullable_deinit.inc     |   9 +
 .../direct_right_left_join_nullable_init.inc       |  13 +
 ...irect_right_left_right_join_nullable_deinit.inc |   9 +
 .../direct_right_left_right_join_nullable_init.inc |  13 +
 .../mysql-test/spider/include/init_spider.inc      |  42 +-
 .../mysql-test/spider/r/direct_join_using.result   | 108 +++
 .../mysql-test/spider/r/direct_left_join.result    | 108 +++
 .../spider/r/direct_left_join_nullable.result      | 113 +++
 .../r/direct_left_right_join_nullable.result       | 113 +++
 .../r/direct_left_right_left_join_nullable.result  | 112 +++
 .../mysql-test/spider/r/direct_right_join.result   | 108 +++   
 .../spider/r/direct_right_join_nullable.result     | 113 +++
 .../r/direct_right_left_join_nullable.result       | 112 +++
 .../r/direct_right_left_right_join_nullable.result | 113 +++
 .../mysql-test/spider/r/show_system_tables.result  |  37 +
 .../mysql-test/spider/t/direct_join_using.test     | 197 +++++
 .../mysql-test/spider/t/direct_left_join.test      | 197 +++++
 .../spider/t/direct_left_join_nullable.test        | 212 +++++
 .../spider/t/direct_left_right_join_nullable.test  | 212 +++++
 .../t/direct_left_right_left_join_nullable.test    | 212 +++++
 .../mysql-test/spider/t/direct_right_join.test     | 197 +++++
 .../spider/t/direct_right_join_nullable.test       | 212 +++++
 .../spider/t/direct_right_left_join_nullable.test  | 212 +++++
 .../t/direct_right_left_right_join_nullable.test   | 212 +++++
 .../mysql-test/spider/t/show_system_tables.test    |  26 +
 storage/spider/scripts/install_spider.sql          |  88 +-
 storage/spider/spd_conn.cc                         |   9 +-
 storage/spider/spd_conn.h                          |   4 +-
 storage/spider/spd_copy_tables.cc                  |  45 +-
 storage/spider/spd_db_conn.cc                      | 413 ++++++----
 storage/spider/spd_db_conn.h                       |  23 +-
 storage/spider/spd_db_handlersocket.cc             | 105 +--
 storage/spider/spd_db_handlersocket.h              |   7 +-
 storage/spider/spd_db_include.h                    |  10 +-
 storage/spider/spd_db_mysql.cc                     | 902 +++++++++++++++++----
 storage/spider/spd_db_mysql.h                      |  43 +-
 storage/spider/spd_db_oracle.cc                    | 288 ++++---
 storage/spider/spd_db_oracle.h                     |   9 +-
 storage/spider/spd_direct_sql.cc                   |  79 +-
 storage/spider/spd_environ.h                       |  15 +-
 storage/spider/spd_group_by_handler.cc             |  63 +-
 storage/spider/spd_i_s.cc                          |   4 +-
 storage/spider/spd_include.h                       |  85 +-
 storage/spider/spd_param.cc                        |  27 +-
 storage/spider/spd_param.h                         |   5 +-
 storage/spider/spd_ping_table.cc                   |   2 +-
 storage/spider/spd_sys_table.cc                    | 163 +++-
 storage/spider/spd_sys_table.h                     |  33 +-
 storage/spider/spd_table.cc                        |  42 +-
 storage/spider/spd_trx.cc                          |  14 +-
 67 files changed, 5237 insertions(+), 692 deletions(-)
 create mode 100644 storage/spider/mysql-test/spider/include/direct_join_using_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_join_using_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_join_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_join_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_join_nullable_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_join_nullable_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_right_join_nullable_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_right_join_nullable_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_right_left_join_nullable_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_left_right_left_join_nullable_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_join_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_join_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_join_nullable_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_join_nullable_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_left_join_nullable_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_left_join_nullable_init.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_left_right_join_nullable_deinit.inc
 create mode 100644 storage/spider/mysql-test/spider/include/direct_right_left_right_join_nullable_init.inc
 create mode 100644 storage/spider/mysql-test/spider/r/direct_join_using.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_left_join.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_left_join_nullable.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_left_right_join_nullable.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_left_right_left_join_nullable.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_right_join.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_right_join_nullable.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_right_left_join_nullable.result
 create mode 100644 storage/spider/mysql-test/spider/r/direct_right_left_right_join_nullable.result
 create mode 100644 storage/spider/mysql-test/spider/r/show_system_tables.result
 create mode 100644 storage/spider/mysql-test/spider/t/direct_join_using.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_left_join.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_left_join_nullable.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_left_right_join_nullable.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_left_right_left_join_nullable.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_right_join.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_right_join_nullable.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_right_left_join_nullable.test
 create mode 100644 storage/spider/mysql-test/spider/t/direct_right_left_right_join_nullable.test
 create mode 100644 storage/spider/mysql-test/spider/t/show_system_tables.test

Comment by Nayuta Yanagisawa (Inactive) [ 2021-10-08 ]

spider_db_mbase_util::append_from_and_tables() seems to wrongly handles the first table appears in table_list, judging by the generated query corresponding to SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (1,2). The function must put the first table, just after FROM, without any (LEFT/RIGHT) JOIN clause.

Comment by Nayuta Yanagisawa (Inactive) [ 2021-10-08 ]

SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (1,2)
#=> select t0.`d` `d` from  join `auto_test_remote`.`tbl_b` t0 on (t0.`c` = t1.`a`),`auto_test_remote`.`tbl_a` t1 where (t1.`a` in( 1 , 2))

Comment by Todd McDowell [ 2021-10-09 ]

Thanks for resuming work on fixing this bug, nayuta-yanagisawa.

Comment by Nayuta Yanagisawa (Inactive) [ 2021-10-12 ]

spider_db_mbase_util::append_from_and_tables() seems to expect that the first table in TABLE_LIST satisfies !table_list->outer_join && !table_list->on_expr && !table_list->join_using_fields. If the condition does not holds, the function print a JOIN clause. However, the expectation does not hold for the following queries:

  • SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (1,2)
  • SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (2)
  • SELECT d FROM tbl_a LEFT JOIN tbl_b ON a = c WHERE a IN (1)
Comment by Nayuta Yanagisawa (Inactive) [ 2021-10-12 ]

For the first query, the ON condition is attached to the first table of TABLE_LIST (not OK). For the second query, one of the two tables is optimized away but the remaining table has an ON condition (not OK). The third query is the same as the second one.

For the first query, swapping the first table and the second one in TABLE_LIST fixes the present bug but it affects other test cases. For the second and the third query, we need to move the search condition from the ON clause to the WHERE clause.

Comment by Nayuta Yanagisawa (Inactive) [ 2021-10-12 ]

I forgot to paste my MTR test case:

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
--connection child2_1
CREATE DATABASE auto_test_remote;
USE auto_test_remote;
 
eval CREATE TABLE tbl_a (
  a int,
  primary key (a)
) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
eval CREATE TABLE tbl_b (
  c int,
  d int,
  primary key (c, d)
) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
INSERT INTO tbl_a VALUES (1), (2), (3);
INSERT INTO tbl_b VALUES (1, 11), (2, 22), (3, 33);
 
--connection master_1
CREATE DATABASE auto_test_local;
USE auto_test_local;
 
eval CREATE TABLE tbl_a (
  a int,
  primary key (a)
) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "tbl_a"';
 
eval CREATE TABLE tbl_b (
  c int,
  d int,
  primary key (c, d)
) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "tbl_b"';
 
SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (1,2);
SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (2);
SELECT d FROM tbl_a LEFT JOIN tbl_b ON a = c WHERE a IN (1);
 
--connection master_1
DROP DATABASE IF EXISTS auto_test_local;
--connection child2_1
DROP DATABASE IF EXISTS auto_test_remote;
 
--disable_query_log
--disable_result_log
--source ../../t/test_deinit.inc
--enable_result_log
--enable_query_log

Comment by Sergei Petrunia [ 2021-10-20 ]

Looking at the first query

SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (1,2);

  #0  spider_db_mbase_util::append_from_and_tables (this=0x7fffec5eed90 <spider_db_mysql_utility>, spider=0x7fff90146c30, fields=0x7fff901af460, str=0x7fff90152ea0, table_list=0x7fff90015028, table_count=2) at /home/psergey/dev-git2/10.6-cp/storage/spider/spd_db_mysql.cc:7683
  #1  0x00007fffec372aff in spider_mbase_handler::append_from_and_tables_part (this=0x7fff90152e40, fields=0x7fff901af460, sql_type=1) at /home/psergey/dev-git2/10.6-cp/storage/spider/spd_db_mysql.cc:16414
  #2  0x00007fffec37c2aa in spider_group_by_handler::init_scan (this=0x7fff901b5730) at /home/psergey/dev-git2/10.6-cp/storage/spider/spd_group_by_handler.cc:1323
  #3  0x0000555555f5ec21 in Pushdown_query::execute (this=0x7fff9001a0d8, join=0x7fff90017268) at /home/psergey/dev-git2/10.6-cp/sql/group_by_handler.cc:49

Spider has produced:

(gdb) p str->str.Ptr
  $68 = 0x7fff901bc4b8 "select t0.`d` `d` from  join `auto_test_remote`.`tbl_b` t0 on (t0.`c` = t1.`a`),`auto_test_remote`.`tbl_a` t1"

That is, it first added

join `auto_test_remote`.`tbl_b` t0 on (t0.`c` = t1.`a`)

and then it added

,`auto_test_remote`.`tbl_a` t1

I'm not sure what is the intent of the code in spider_mbase_handler::append_from_and_tables_part, but it does it incorrect.

Here's what I get when I ask the SQL layer to print the code:

(gdb) up 3
  #3  0x0000555555f5ec21 in Pushdown_query::execute (this=0x7fff9001a0d8, join=0x7fff90017268) at /home/psergey/dev-git2/10.6-cp/sql/group_by_handler.cc:49
 
(gdb) p dbug_print(join->select_lex)
  $71 = 0x555557a70be0 <dbug_item_print_buf> "select tbl_b.d AS d from tbl_a left join tbl_b on(tbl_b.c = tbl_a.a) where tbl_a.a in (1,2)"

note the RIGHT join was changed to the LEFT one.

Comment by Sergei Petrunia [ 2021-10-20 ]

The problem is not limited to the RIGHT join...

(gdb) p thd->query_string
  $98 = {string = {str = 0x7fff90014240 "SELECT d FROM tbl_a LEFT JOIN tbl_b ON a = c WHERE a IN (1)",

  Thread 24 "mariadbd" hit Breakpoint 9, spider_mbase_handler::append_from_and_tables_part (this=0x7fff901aaec0, fields=0x7fff901b4820, sql_type=1) at /home/psergey/dev-git2/10.6-cp/storage/spider/spd_db_mysql.cc:16417
(gdb) p str->str.Ptr
  $90 = 0x7fff90050088 "select t0.`d` `d` from  left join `auto_test_remote`.`tbl_b` t0 on (t0.`c` = 1)"

incorrect, note "... left join tbl_b"

again, please check out how dbug_print(SELECT_LEX*) does it:

(gdb) p dbug_print(join->select_lex)
$93 = 0x555557a70be0 <dbug_item_print_buf> "select tbl_b.d AS d from tbl_a left join tbl_b on(tbl_b.c = 1) where 1"

Comment by Nayuta Yanagisawa (Inactive) [ 2021-10-21 ]

psergei Thank you very much for your analysis.

In my understanding, spider_mbase_handler::append_from_and_tables() and spider_db_mbase_util::append_table() seem to assume that tables in table_list (Query::from) are in the same order as it appears in the query (join->select_lex).

However, the assumption does not hold for the query below. As a result, the Spider (wrongly) prints the tbl_b first and then print tbl_a.

SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (1,2)
#=> select t0.`d` `d` from  join `auto_test_remote`.`tbl_b` t0 on (t0.`c` = t1.`a`),`auto_test_remote`.`tbl_a` t1 where (t1.`a` in( 1 , 2))

Do you think the assumption is valid? If not, how can I find the correct order of the table?

Or, the idea to print the tables appeared in table_list one by one itself is a bad idea? For the second query, table_list doesn't include table_a.

SELECT d FROM tbl_b RIGHT JOIN tbl_a ON c = a WHERE a IN (2)

Comment by Todd McDowell [ 2022-04-11 ]

Hello. I'm still really interested in a fix for this bug. Has there been any updates or progress on this ticket? Thanks so much!

Comment by Elena Stepanova [ 2022-07-08 ]

Apparently related crash:

--source plugin/spider/spider/include/init_spider.inc
 
SET spider_same_server_link= on;
eval create server s foreign data wrapper mysql options (host "127.0.0.1", database "test", user "root", port $MASTER_MYPORT);
 
CREATE TABLE t1 (a INT);
CREATE TABLE t2 (b INT);
CREATE TABLE t3 (c INT, PRIMARY KEY(c));
 
CREATE TABLE t1_spider (a INT) ENGINE=SPIDER COMMENT = "wrapper 'mysql', srv 's', table 't1'";
CREATE TABLE t2_spider (b INT) ENGINE=SPIDER COMMENT = "wrapper 'mysql', srv 's', table 't2'";
CREATE TABLE t3_spider (c INT, PRIMARY KEY(c)) ENGINE=SPIDER COMMENT = "wrapper 'mysql', srv 's', table 't3'";
 
SELECT t1_spider.* FROM t1_spider LEFT JOIN t2_spider LEFT JOIN t3_spider ON b = c ON a = b;
 
# Cleanup
DROP TABLE t1_spider, t2_spider, t3_spider, t1, t2, t3;
 
--source plugin/spider/spider/include/deinit_spider.inc

10.3 d6e80c21

#2  0x000055d328afb26a in handle_fatal_signal (sig=11) at /data/src/10.3/sql/signal_handler.cc:365
        curr_time = 1657238605
        tm = {tm_sec = 25, tm_min = 3, tm_hour = 3, tm_mday = 8, tm_mon = 6, tm_year = 122, tm_wday = 5, tm_yday = 188, tm_isdst = 1, tm_gmtoff = 10800, tm_zone = 0x55d32a909fe0 "EEST"}
        thd = 0x7fc4b0000d90
        print_invalid_query_pointer = false
#3  <signal handler called>
No locals.
#4  0x00007fc4c15c62ae in spider_db_mbase_util::append_table (this=0x7fc4c16564d0 <spider_db_mysql_utility>, spider=0x7fc4b00bb4a8, fields=0x7fc4b01059b0, str=0x7fc4b00c9188, table_list=0x7fc4b0013af8, used_table_list=0x7fc4c169f360, current_pos=0x7fc4c169f3bc, cond_table_list_ptr=0x7fc4c169f3d0, top_down=true, first=true) at /data/src/10.3/storage/spider/spd_db_mysql.cc:5569
        on_expr = 0x55d329392da5 <code_state+167>
        error_num = 32708
        use_cond_table_list = false
        db_share = 0x55d3286c526a <String::q_append(char const*, unsigned long)+66>
        dbton_hdl = 0x7fc4c1615f1e
        table_holder = 0x0
        cond_table_list = 0x0
        spd = 0x7fc4c169f050
        _db_stack_frame_ = {func = 0x7fc4c161e050 "spider_db_mbase_util::append_tables_top_down", file = 0x7fc4c161b2d8 "/data/src/10.3/storage/spider/spd_db_mysql.cc", level = 2147483664, line = -1, prev = 0x7fc4c169f130}
#5  0x00007fc4c15c6b63 in spider_db_mbase_util::append_tables_top_down (this=0x7fc4c16564d0 <spider_db_mysql_utility>, spider=0x7fc4b00bb4a8, fields=0x7fc4b01059b0, str=0x7fc4b00c9188, table_list=0x7fc4b0014568, used_table_list=0x7fc4c169f360, current_pos=0x7fc4c169f3bc, cond_table_list_ptr=0x7fc4c169f3d0) at /data/src/10.3/storage/spider/spd_db_mysql.cc:5732
        error_num = 691468691
        outer_join_backup = 21971
        cur_table_list = 0x7fc4b0013af8
        prev_table_list = 0x0
        cond_table_list = 0x7fc4b0014568
        first = true
        _db_stack_frame_ = {func = 0x7fc4c161e160 "spider_db_mbase_util::append_embedding_tables", file = 0x7fc4c161b2d8 "/data/src/10.3/storage/spider/spd_db_mysql.cc", level = 2147483663, line = -1, prev = 0x7fc4c169f200}
        __PRETTY_FUNCTION__ = "int spider_db_mbase_util::append_tables_top_down(ha_spider*, spider_fields*, spider_string*, TABLE_LIST*, TABLE_LIST**, uint*, TABLE_LIST**)"
        it1 = {<base_list_iterator> = {list = 0x7fc4b0014bc8, el = 0x7fc4b0014c38, prev = 0x0, current = 0x0}, <No data fields>}
#6  0x00007fc4c15c753a in spider_db_mbase_util::append_embedding_tables (this=0x7fc4c16564d0 <spider_db_mysql_utility>, spider=0x7fc4b00bb4a8, fields=0x7fc4b01059b0, str=0x7fc4b00c9188, table_list=0x7fc4b0014568, used_table_list=0x7fc4c169f360, current_pos=0x7fc4c169f3bc, cond_table_list_ptr=0x7fc4c169f3d0) at /data/src/10.3/storage/spider/spd_db_mysql.cc:5885
        error_num = 32708
        embedding = 0x0
        _db_stack_frame_ = {func = 0x7fc4c161df18 "spider_db_mbase_util::append_table", file = 0x7fc4c161b2d8 "/data/src/10.3/storage/spider/spd_db_mysql.cc", level = 2147483662, line = -1, prev = 0x7fc4c169f2e0}
        __PRETTY_FUNCTION__ = "int spider_db_mbase_util::append_embedding_tables(ha_spider*, spider_fields*, spider_string*, TABLE_LIST*, TABLE_LIST**, uint*, TABLE_LIST**)"
#7  0x00007fc4c15c5e6b in spider_db_mbase_util::append_table (this=0x7fc4c16564d0 <spider_db_mysql_utility>, spider=0x7fc4b00bb4a8, fields=0x7fc4b01059b0, str=0x7fc4b00c9188, table_list=0x7fc4b0013438, used_table_list=0x7fc4c169f360, current_pos=0x7fc4c169f3bc, cond_table_list_ptr=0x7fc4c169f3d0, top_down=false, first=false) at /data/src/10.3/storage/spider/spd_db_mysql.cc:5475
        error_num = 0
        use_cond_table_list = false
        db_share = 0x7fc4b00c0aa0
        dbton_hdl = 0x7fc4b00c9130
        table_holder = 0x7fc4b01a8a20
        cond_table_list = 0x0
        spd = 0x7fc4b00bb4a8
        _db_stack_frame_ = {func = 0x7fc4c161e2c0 "spider_db_mbase_util::append_from_and_tables", file = 0x7fc4c161b2d8 "/data/src/10.3/storage/spider/spd_db_mysql.cc", level = 2147483661, line = -1, prev = 0x7fc4c169f3f0}
#8  0x00007fc4c15c77a7 in spider_db_mbase_util::append_from_and_tables (this=0x7fc4c16564d0 <spider_db_mysql_utility>, spider=0x7fc4b00bb4a8, fields=0x7fc4b01059b0, str=0x7fc4b00c9188, table_list=0x7fc4b0013438, table_count=2) at /data/src/10.3/storage/spider/spd_db_mysql.cc:5939
        error_num = 0
        current_pos = 1
        roop_count = 1
        backup_pos = 1
        outer_join_backup = 32708
        table = 0x7fc4b0100970
        used_table_list = 0x7fc4c169f360
        prev_table_list = 0x0
        cond_table_list = 0x0
        _db_stack_frame_ = {func = 0x7fc4c1622830 "spider_mbase_handler::append_from_and_tables_part", file = 0x7fc4c161b2d8 "/data/src/10.3/storage/spider/spd_db_mysql.cc", level = 2147483660, line = -1, prev = 0x7fc4c169f480}
#9  0x00007fc4c15f2b1b in spider_mbase_handler::append_from_and_tables_part (this=0x7fc4b00c9130, fields=0x7fc4b01059b0, sql_type=1) at /data/src/10.3/storage/spider/spd_db_mysql.cc:14247
        error_num = 0
        str = 0x7fc4b00c9188
        table_holder = 0x7fc4b01a8a20
        table_list = 0x7fc4b0012d78
        _db_stack_frame_ = {func = 0x7fc4c1623da0 "spider_group_by_handler::init_scan", file = 0x7fc4c1623198 "/data/src/10.3/storage/spider/spd_group_by_handler.cc", level = 2147483659, line = -1, prev = 0x7fc4c169f530}
#10 0x00007fc4c15fbc45 in spider_group_by_handler::init_scan (this=0x7fc4b01a8eb0) at /data/src/10.3/storage/spider/spd_group_by_handler.cc:1321
        error_num = 0
        link_idx = 1
        dbton_id = 0
        dbton_hdl = 0x7fc4b00c9130
        select_lex = 0x7fc4b00053d8
        select_limit = 9223372036854775807
        direct_order_limit = 9223372036854775807
        share = 0x7fc4b00bd1a0
        conn = 0x55d3293951dc <_db_enter_+282>
        result_list = 0x7fc4b00bb9f8
        link_idx_chain = 0x7fc4c169f560
        link_idx_holder = 0x7fc4c169f5c0
        _db_stack_frame_ = {func = 0x55d329429907 "Pushdown_query::execute", file = 0x55d3294298e0 "/data/src/10.3/sql/group_by_handler.cc", level = 2147483658, line = -1, prev = 0x7fc4c169f5c0}
        field = 0x7fc4b01aa0d8
        __PRETTY_FUNCTION__ = "virtual int spider_group_by_handler::init_scan()"
#11 0x000055d32889bcdf in Pushdown_query::execute (this=0x7fc4b01a6bf8, join=0x7fc4b0015a90) at /data/src/10.3/sql/group_by_handler.cc:49
        err = -1342174000
        max_limit = 140482743196304
        reset_limit = 0x0
        reset_item = 0x0
        thd = 0x7fc4b0000d90
        table = 0x7fc4b01a8fd8
        _db_stack_frame_ = {func = 0x55d329421220 "do_select", file = 0x55d32941e5c0 "/data/src/10.3/sql/sql_select.cc", level = 2147483657, line = -1, prev = 0x7fc4c169f660}
#12 0x000055d328853f35 in do_select (join=0x7fc4b0015a90, procedure=0x0) at /data/src/10.3/sql/sql_select.cc:19331
        res = 21971
        rc = 0
        error = NESTED_LOOP_OK
        _db_stack_frame_ = {func = 0x55d32941f2e1 "JOIN::exec_inner", file = 0x55d32941e5c0 "/data/src/10.3/sql/sql_select.cc", level = 2147483656, line = -1, prev = 0x7fc4c169f6c0}
        __PRETTY_FUNCTION__ = "int do_select(JOIN*, Procedure*)"
#13 0x000055d32882b14b in JOIN::exec_inner (this=0x7fc4b0015a90) at /data/src/10.3/sql/sql_select.cc:4151
        columns_list = 0x7fc4b0005500
        _db_stack_frame_ = {func = 0x55d32941f396 "mysql_select", file = 0x55d32941e5c0 "/data/src/10.3/sql/sql_select.cc", level = 2147483655, line = -1, prev = 0x7fc4c169f7a0}
        __PRETTY_FUNCTION__ = "void JOIN::exec_inner()"
#14 0x000055d32882a50e in JOIN::exec (this=0x7fc4b0015a90) at /data/src/10.3/sql/sql_select.cc:3945
No locals.
#15 0x000055d32882b837 in mysql_select (thd=0x7fc4b0000d90, tables=0x7fc4b0012d78, wild_num=1, fields=@0x7fc4b0005500: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fc4b0012d20, last = 0x7fc4b0012d20, elements = 1}, <No data fields>}, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fc4b0015a68, unit=0x7fc4b0004c40, select_lex=0x7fc4b00053d8) at /data/src/10.3/sql/sql_select.cc:4354
        err = 0
        free_join = true
        _db_stack_frame_ = {func = 0x55d32941e5fd "handle_select", file = 0x55d32941e5c0 "/data/src/10.3/sql/sql_select.cc", level = 2147483654, line = -1, prev = 0x7fc4c169f860}
        join = 0x7fc4b0015a90
#16 0x000055d32881cdad in handle_select (thd=0x7fc4b0000d90, lex=0x7fc4b0004b80, result=0x7fc4b0015a68, setup_tables_done_option=0) at /data/src/10.3/sql/sql_select.cc:372
        unit = 0x7fc4b0004c40
        res = false
        select_lex = 0x7fc4b00053d8
        _db_stack_frame_ = {func = 0x55d3294132c8 "mysql_execute_command", file = 0x55d3294126a8 "/data/src/10.3/sql/sql_parse.cc", level = 2147483653, line = -1, prev = 0x7fc4c169fed0}
#17 0x000055d3287e3d90 in execute_sqlcom_select (thd=0x7fc4b0000d90, all_tables=0x7fc4b0012d78) at /data/src/10.3/sql/sql_parse.cc:6339
        save_protocol = 0x0
        lex = 0x7fc4b0004b80
        result = 0x7fc4b0015a68
        res = false
        __PRETTY_FUNCTION__ = "bool execute_sqlcom_select(THD*, TABLE_LIST*)"
#18 0x000055d3287da798 in mysql_execute_command (thd=0x7fc4b0000d90) at /data/src/10.3/sql/sql_parse.cc:3870
        privileges_requested = 1
        res = 0
        up_result = 0
        lex = 0x7fc4b0004b80
        select_lex = 0x7fc4b00053d8
        first_table = 0x7fc4b0012d78
        all_tables = 0x7fc4b0012d78
        unit = 0x7fc4b0004c40
        have_table_map_for_update = false
        rpl_filter = 0x388293951dc
        _db_stack_frame_ = {func = 0x55d3294145a0 "mysql_parse", file = 0x55d3294126a8 "/data/src/10.3/sql/sql_parse.cc", level = 2147483652, line = -1, prev = 0x7fc4c16a0400}
        __PRETTY_FUNCTION__ = "int mysql_execute_command(THD*)"
        orig_binlog_format = BINLOG_FORMAT_MIXED
        orig_current_stmt_binlog_format = BINLOG_FORMAT_STMT
#19 0x000055d3287e8090 in mysql_parse (thd=0x7fc4b0000d90, rawbuf=0x7fc4b0012ad8 "SELECT t1_spider.* FROM t1_spider LEFT JOIN t2_spider LEFT JOIN t3_spider ON b = c ON a = b", length=91, parser_state=0x7fc4c16a05b0, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:7870
        found_semicolon = 0x0
        error = 32708
        lex = 0x7fc4b0004b80
        err = false
        _db_stack_frame_ = {func = 0x55d329412b32 "dispatch_command", file = 0x55d3294126a8 "/data/src/10.3/sql/sql_parse.cc", level = 2147483651, line = -1, prev = 0x7fc4c16a0590}
        __PRETTY_FUNCTION__ = "void mysql_parse(THD*, char*, uint, Parser_state*, bool, bool)"
#20 0x000055d3287d48c5 in dispatch_command (command=COM_QUERY, thd=0x7fc4b0000d90, packet=0x7fc4b0008f31 "", packet_length=91, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1852
        packet_end = 0x7fc4b0012b33 ""
        parser_state = {m_lip = {lookahead_token = -1, lookahead_yylval = 0x0, m_thd = 0x7fc4b0000d90, m_ptr = 0x7fc4b0012b34 "\004", m_tok_start = 0x7fc4b0012b34 "\004", m_tok_end = 0x7fc4b0012b34 "\004", m_end_of_query = 0x7fc4b0012b33 "", m_tok_start_prev = 0x7fc4b0012b33 "", m_buf = 0x7fc4b0012ad8 "SELECT t1_spider.* FROM t1_spider LEFT JOIN t2_spider LEFT JOIN t3_spider ON b = c ON a = b", m_buf_length = 91, m_echo = true, m_echo_saved = 12, m_cpp_buf = 0x7fc4b0012b90 "SELECT t1_spider.* FROM t1_spider LEFT JOIN t2_spider LEFT JOIN t3_spider ON b = c ON a = b", m_cpp_ptr = 0x7fc4b0012beb "", m_cpp_tok_start = 0x7fc4b0012beb "", m_cpp_tok_start_prev = 0x7fc4b0012beb "", m_cpp_tok_end = 0x7fc4b0012beb "", m_body_utf8 = 0x0, m_body_utf8_ptr = 0x100002936f993 <error: Cannot access memory at address 0x100002936f993>, m_cpp_utf8_processed_ptr = 0x0, next_state = MY_LEX_END, found_semicolon = 0x0, ignore_space = false, stmt_prepare_mode = false, multi_statements = true, yylineno = 1, m_digest = 0x0, in_comment = NO_COMMENT, in_comment_saved = PRESERVE_COMMENT, m_cpp_text_start = 0x7fc4b0012bea "b", m_cpp_text_end = 0x7fc4b0012beb "", m_underscore_cs = 0x0}, m_yacc = {yacc_yyss = 0x0, yacc_yyvs = 0x0, m_set_signal_info = {m_item = {0x0 <repeats 12 times>}}, m_lock_type = TL_READ_DEFAULT, m_mdl_type = MDL_SHARED_READ}, m_digest_psi = 0x7fc4b0004658}
        net = 0x7fc4b0001098
        error = false
        do_end_of_statement = true
        _db_stack_frame_ = {func = 0x55d3294128bd "do_command", file = 0x55d3294126a8 "/data/src/10.3/sql/sql_parse.cc", level = 2147483650, line = -1, prev = 0x7fc4c16a0df0}
        drop_more_results = false
        __PRETTY_FUNCTION__ = "bool dispatch_command(enum_server_command, THD*, char*, uint, bool, bool)"
        res = <optimized out>
#21 0x000055d3287d3283 in do_command (thd=0x7fc4b0000d90) at /data/src/10.3/sql/sql_parse.cc:1398
        return_value = false
        packet = 0x7fc4b0008f30 "\001"
        packet_length = 92
        net = 0x7fc4b0001098
        command = COM_QUERY
        _db_stack_frame_ = {func = 0x55d3297947d0 "?func", file = 0x55d3297947d6 "?file", level = 2147483649, line = -1, prev = 0x0}
        __PRETTY_FUNCTION__ = "bool do_command(THD*)"
#22 0x000055d328950878 in do_handle_one_connection (connect=0x55d32aadc170) at /data/src/10.3/sql/sql_connect.cc:1403
        create_user = true
        thr_create_utime = 3660717642552
        thd = 0x7fc4b0000d90
#23 0x000055d3289505e3 in handle_one_connection (arg=0x55d32aadc170) at /data/src/10.3/sql/sql_connect.cc:1308
        connect = 0x55d32aadc170
#24 0x000055d3292ffb62 in pfs_spawn_thread (arg=0x55d32abdca40) at /data/src/10.3/storage/perfschema/pfs.cc:1869
        typed_arg = 0x55d32abdca40
        user_arg = 0x55d32aadc170
        user_start_routine = 0x55d3289505b3 <handle_one_connection(void*)>
        pfs = 0x7fc4c59706c0
        klass = 0x55d32a8cf280
#25 0x00007fc4c7812ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140483035272960, -5235067572857094810, 140727913514734, 140727913514735, 140483035271104, 311296, 5246921248254967142, 5246907717992693094}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#26 0x00007fc4c7742def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Reproducible on 10.3-10.10.
The same test case but without primary key on t3_spider causes a syntax error.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-07-24 ]

Thanks elenst. I've created a separated issue for the above crash, MDEV-29163.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-07-26 ]

A tentative fix for the issue: https://github.com/MariaDB/server/commit/38c72f816155ede02bc66fc0cdb9f6567834bab5

Comment by Todd McDowell [ 2023-02-17 ]

Any updates? We're anxiously awaiting a fix. Thanks to all that have and are working this!

Comment by Sergei Golubchik [ 2023-02-17 ]

tjmac1200, the next release is planned for the end of April, so we'll actively start working on bug fixing somewhere mid-March. It's unlikely there will be any updates before that.

Comment by Yuchen Pei [ 2023-07-14 ]

What happens is, when the spider GBH rewrites a query, and when the
query selects from something like

tbl_a join tbl_b on tbl_a.a = tbl_b.c where tbl_a.a in (1),

t1 is optimised into a const table with the condition transformed
into where tbl_b.c = 1, the GBH skips over constant tables
(tbl_a) and only writes tbl_b in the query.

The part in the query rewrite code that differentiates between inner
and outer join is

int spider_db_mbase_util::append_table(
...
    if (
      table_list->outer_join ||
      table_list->on_expr ||
      table_list->join_using_fields
    ) {
// if outer join
// then append "outer join ${table_name}"
// else append "join {table_name}"

With inner join, the if condition does not hold, because on_expr has
also been optimised away[1], so we end up outside of this if, where
the GBH simply append the table name.

The query according to dbug_print:
"select tbl_b.d AS d from tbl_a join tbl_b where tbl_b.c = 1"

The query sent to the data node:
"select t0.`d` `d` from `auto_test_remote`.`tbl_b` t0 where (t0.`c` = 1)"

With outer join, both table_list->outer_join and
table_list->on_expr are non-null, which caused the GBH to append
"outer join ${table_name}".

Here, the query according to dbug_print:
"select tbl_b.d AS d from tbl_a left join tbl_b on(tbl_b.c = 1) where 1"

[1]: the trace at the time when on_expr gets optimised away:

simplify_joins > simplify_joins > JOIN::optimize_inner > JOIN::optimize > mysql_select > handle_select > execute_sqlcom_select > mysql_execute_command > mysql_parse > dispatch_command > do_command > do_handle_one_connection > handle_one_connection > pfs_spawn_thread

Comment by Yuchen Pei [ 2023-07-17 ]

dbug_print() can't really help here because is does not always
print an equivalent query (MDEV-31715), though the spider GBH seems to
be doing similar things.

If we remove the constant table skipping behaviour in spider gbh[1],
we get the same query rewrite result as dbug_print, for both join and
left join:

# original query
SELECT d FROM tbl_a JOIN tbl_b ON a = c WHERE a IN (1);
# dbug_print(join->select_lex): query is not equivalent
select tbl_b.d AS d from tbl_a join tbl_b where tbl_b.c = 1;
# query sent to data node, no skip const, same as dbug_print
select t1.`d` `d` from `auto_test_remote`.`tbl_a` t0 join `auto_test_remote`.`tbl_b` t1 where (t1.`c` = 1)
# query sent to data node, skip const. query is equivalent
"select t0.`d` `d` from `auto_test_remote`.`tbl_b` t0 where (t0.`c` = 1)"
 
# original query
SELECT d FROM tbl_a LEFT JOIN tbl_b ON a = c WHERE a IN (1);
# dbug_print query: query is not equivalent
select tbl_b.d AS d from tbl_a left join tbl_b on(tbl_b.c = 1) where 1;
# query sent to data node, no skip const table, same as dbug_print
select t1.`d` `d` from `auto_test_remote`.`tbl_a` t0 left join `auto_test_remote`.`tbl_b` t1 on (t1.`c` = 1) where 1
# skip const table, missing a table
select t0.`d` `d` from  left join `auto_test_remote`.`tbl_b` t0 on (t0.`c` = 1) where 1

[1] https://github.com/MariaDB/server/commit/5978b71abb9

If we look at how the server handles the select without gbh, e.g. with
the following test.

CREATE TABLE tbl_a (
  a int,
  primary key (a)
);
 
CREATE TABLE tbl_b (
  c int,
  d int,
  primary key (c, d)
);
 
INSERT INTO tbl_a VALUES (1), (2), (3);
INSERT INTO tbl_b VALUES (1, 11), (2, 22), (3, 33);
 
# original left join query, const table optimization
SELECT d FROM tbl_a LEFT JOIN tbl_b ON a = c WHERE a IN (1);
# same query when reaching the exec phase, no const table
select tbl_b.d AS d from tbl_a left join tbl_b on(tbl_b.c = 1) where 1;
 
# original join query, const table optimization
SELECT d FROM tbl_a JOIN tbl_b ON a = c WHERE a IN (1);
# same query when reaching the exec phase, no const table
select tbl_b.d AS d from tbl_a join tbl_b where tbl_b.c = 1;
 
drop table tbl_a,tbl_b;

The relevant bit seems to be the following lines in do_select()

    JOIN_TAB *join_tab= join->join_tab +
                        (join->tables_list ? join->const_tables : 0);
...
      error= join->first_select(join,join_tab,0);

where the first_select was always sub_select() in these examples,
which calls evaluate_join_record(), and and the following line
works on the next join table

    if (found)
    {
      enum enum_nested_loop_state rc;
      /* A match from join_tab is found for the current partial join. */
      rc= (*join_tab->next_select)(join, join_tab+1, 0);

So without const tables, tbl_a is passed to first_select(), and
then tbl_b is passed to tbl_a->next_select(). With const table,
only tbl_b is passed to first_select().

I don't see how this comparison could help with the present ticket,
though, as spider is not in the business of calling these functions...

Comment by Yuchen Pei [ 2023-07-20 ]

For this ticket, we can fix the bug with the following solution:

  • if it's not an outer join, simply skip the const table
  • if it is an outer join, print (select 1) alias

See also the code in print_join().

I have a patch[1] that fixes the relevant testcases, but it fails
regression test spider.direct_right_left_join_nullable. As I look
further into how spider gbh rewrites joins, I found more issues. For
example, here's another testcase the can fail due to nonsensical
rewrite of the table join

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
--connection child2_1
CREATE DATABASE auto_test_remote;
USE auto_test_remote;
 
eval CREATE TABLE t1 (a int) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
eval CREATE TABLE t2 (a int) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
eval CREATE TABLE t3 (a int) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (1), (2);
INSERT INTO t3 VALUES (1), (2), (3);
 
--connection master_1
CREATE DATABASE auto_test_local;
USE auto_test_local;
 
eval CREATE TABLE t1 (a int) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "t1"';
eval CREATE TABLE t2 (a int) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "t2"';
eval CREATE TABLE t3 (a int) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "t3"';
 
# ok
select * from t3 left join t1 on t3.a = t1.a left join t2 on t3.a = t2.a;
# parsing error due to gbh rewrite
select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a;
# parsing error due to gbh rewrite
select * from t3 left join (t1 left join t2 on t1.a = t2.a) on t3.a = t1.a;
 
drop table t1, t2, t3;
 
--connection master_1
DROP DATABASE IF EXISTS auto_test_local;
--connection child2_1
DROP DATABASE IF EXISTS auto_test_remote;
 
--disable_query_log
--disable_result_log
--source ../t/test_deinit.inc
--enable_query_log
--enable_result_log

[1] https://github.com/MariaDB/server/commit/66263d01f87

The code of functions called directly or indirectly by
append_from_and_tables_part(), often with names following the
globbing pattern append_*table*() that cause such problems, is
complex and undocumented, and if we just fix the issues in this ticket
with added complexity it could make it harder to fix other bugs in
join rewrite down the road. So I think it would make sense to extend
the scope of this ticket to fix all known issues involving rewriting
joins, possibly by a rewrite of these append_*table*()
functions. I have also labelled bugs caused by spider gbh with
spider-gbh:

https://jira.mariadb.org/issues/?jql=labels+%3D+spider-gbh

Comment by Yuchen Pei [ 2023-07-21 ]

I have been able to re-implement the spider gbh join rewrite in a way
similar to print_join(), see [1] and its commit message.

Some test case exposes existing issues with the item printing.
However, since the scope of the present ticket is to fix the table
join rewrite (i.e. part belonging to
append_from_and_tables_part()), let us defer the fix of item
printing to another ticket.

[1] https://github.com/MariaDB/server/commit/834b989bd9a

Comment by Yuchen Pei [ 2023-07-26 ]

I'll put a bookmark here and come back to this task in September.

To summarise:

  • The surface issue is that the current spider group by handler (GBH)
    does not handle constant tables correctly in its query rewrite.
  • The deeper issue is that the spider GBH query rewrite is
    unnecessarily complex legacy undocumented code. Simply trying to fix
    the surface issue will make it more complex and potentially more
    fragile: https://github.com/MariaDB/server/commit/66263d01f87
  • My proposed solution is to rewrite the spider GBH query rewrite,
    modelled after dbug_print() but adapting to spider's needs
  • Thee scope of this ticket should be fixing the join rewrite only
    (i.e. part belonging to append_from_and_tables_part()) without
    introducing regressions
  • Current wip poc is
    https://github.com/MariaDB/server/commit/3527a48b5fa
Comment by Yuchen Pei [ 2023-09-21 ]

Here's a little demo showing disabling the spider GBH "fixes" the
issue. It could be a fallback / workaround, and it would be
interesting to see which other gbh bugs can also be mitigated by
this workaround.

637c1a3c315 upstream/bb-11.0-ycp-mdev-26247-disable-gbh MDEV-26247 [demo] disabling the spider gbh fixes the issue

update on [2023-09-22 Fri]: I confirm that all other bugs current
labeled with spider-gbh can be mitigated by this workaround:

So it makes sense to add a system variable e.g.
spider_disable_group_by_handler. I have opened MDEV-32238 for this.

Comment by Yuchen Pei [ 2023-09-22 ]

The previous patch from July caused regressions, caused by spider
field chain not in the same order between the two passes: spider gbh
creation (during optimizer phase) and the call to init_scan() (during
exec phase).

On further inspection, the sole purpose of the field chains (and field
holders) is to call append_column_name_with_alias() on the correct
spider_db_share, with the correct table aliases passed. However, such
information is already available in spider_fields::table_holder, so
the field chains in the first pass are redundant. The reason they were
used is probably some kind of optimization. But the time complexity is
the same if we skip this step and do linear lookups on tables. So we
do the latter and get rid a whole lot of code (see commit 14eed9fe44a
below). This also fixes regressions on the tip (139967b1d1d below).

The current WIP commits are

139967b1d1d upstream/bb-11.0-ycp-mdev-26247 MDEV-26247 [PoC] Re-implement spider gbh query rewrite of tables
c60ac6382f4 MDEV-26247 [wip] Spider gbh should send correct queries involving const tables
e0e0135981e MDEV-26247 clean up spider_group_by_handler::init_scan()
14eed9fe44a MDEV-26247 [wip] Spider gbh query rewrite should get table for fields in a simple way
6cf66aa6761 MDEV-26247 Remove two unused methods of spider_fields

Comment by Yuchen Pei [ 2023-09-27 ]

We have a series of patches that works pretty well with existing
cases and cases reported in this ticket:

51c98783527 upstream/bb-11.0-ycp-mdev-26247 MDEV-26247 [wip] Re-implement spider gbh query rewrite of tables
581da268ae4 MDEV-26247 clean up spider_group_by_handler::init_scan()
eb8df64200c MDEV-26247 [wip] Spider gbh query rewrite should get table for fields in a simple way
67fb0378d58 MDEV-26247 Remove some unused methods

But they fail a case where the optimizer changes certain valid joins
into invalid ones before asking the storage engine to execute the
query.

Here's an example:

--echo #
--echo # MDEV-26247 Spider: Valid LEFT JOIN results in ERROR 1064
--echo #
 
--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
--connection child2_1
CREATE DATABASE auto_test_remote;
USE auto_test_remote;
 
eval CREATE TABLE t1 (a int) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
eval CREATE TABLE t2 (a int) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
eval CREATE TABLE t3 (a int) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (1), (2);
INSERT INTO t3 VALUES (1), (2), (3);
 
--connection master_1
CREATE DATABASE auto_test_local;
USE auto_test_local;
 
eval CREATE TABLE t1 (a int) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "t1"';
eval CREATE TABLE t2 (a int) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "t2"';
eval CREATE TABLE t3 (a int) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='srv "s_2_1", table "t3"';
 
# ok
/* 1 */ select * from t3 left join t1 on t3.a = t1.a left join t2 on t3.a = t2.a;
 
# fails
/* 2 */ select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a;
 
# equivalent to the previous query, fails
/* 3 */ select * from t3 left join (t1 left join t2 on t1.a = t2.a) on t3.a = t1.a;
 
drop table t1, t2, t3;
 
--connection master_1
DROP DATABASE IF EXISTS auto_test_local;
--connection child2_1
DROP DATABASE IF EXISTS auto_test_remote;
 
--disable_query_log
--disable_result_log
--source ../t/test_deinit.inc
--enable_query_log
--enable_result_log

The 2nd and 3rd selects fail because after optimizer stage the select
query is printed as

select t1.a AS a,t2.a AS a,t3.a AS a from t3 left join (t1 left join t2 on(t2.a = t3.a)) on(t1.a = t3.a) where 1;

which will fail if you directly try to execute this query because
t3.a is not a valid column at the context of the t1 left join t2.This
is caused by the optimizer replaces t1.a = t2.a with multiple
equal(t3.a, t1.a, t2.a) in build_equal_items(), then replaces the
multiple equality with t2.a = t3.a in
substitute_for_best_equal_field().

Unexecutable dbug_print() output is a symptom, but what happens is the
underlying select_lex/join has been modified by the optimizer in an
inconsistent way at a high level, i.e. the
join->tables_list->join_list->elem(0)->on_expr is not valid for
the join. And this could be problematic if the inconsistency remains
at a certain time, e.g. when we are in do_select() and outsourcing the
exec phase to a storage engine, because the latter has no idea how to
recover to an executable query (or has it?):

do_select(JOIN *join, Procedure *procedure)
{
...
  if (join->pushdown_query)
  {
...
    /* The storage engine will take care of the group by query result */
    int res= join->pushdown_query->execute(join);
 
    if (res)
      DBUG_RETURN(res);
...
  }

Opened MDEV-32273.

Update on [2023-09-28 Thu]: we have a fix for this specific case, by
tracking the tables encountered so far:

028b6e8bc9e upstream/bb-11.0-ycp-mdev-26247 MDEV-26247 Add detection of reference to outer fields in join on_expr

Comment by Yuchen Pei [ 2023-09-28 ]

Given this is a rewrite, it makes sense to check as many spider GBH
issues as possible, and cover the relevant ones with the fix for this
ticket.

Tickets that should be covered are listed as a part of this ticket.
Of known ones, only MDEV-29163 needs to be fixed by covering
eliminated tables.

Tickets that we still want to check whether they should be part of
this ticket: MDEV-29546, MDEV-26345. These are irrelevant.

Comment by Yuchen Pei [ 2023-10-05 ]

Hi holyfoot, ptal thanks:

1. 88765e7d583 upstream/bb-11.0-mdev-26247 MDEV-26247 Re-implement spider gbh query rewrite of tables
2. fe3d8dcf492 MDEV-26247 clean up spider_group_by_handler::init_scan()
3. 5b486c4893b MDEV-26247 Clean up spider_fields
4. dd9eea879e3 MDEV-26247 Remove some unused spider methods
---- no review needed for the following two commits ----
5. dfb5616184e MDEV-29502 Fix some issues with spider direct aggregate
6. a88b8bdd705 MDEV-31673 MDEV-29502 Remove spider_db_handler::need_lock_before_set_sql_for_exec

Of these commits, 1 is the actual bugfix, and it also fixes tickets included in this ticket. 2, 3, and 4 are cleanups. 5 and 6 are from MDEV-29502 which is fixed but have not propagated to 11.0 yet, and they do not require review. They also help make MDEV-26345 fail in the correct way (wrong results), rather than crash.

Tested on ES-23.08 and 10.4 too:

ES-23.08 (a slightly earlier version)

827da11c103 upstream/23.08-enterprise-mdev-26247 MDEV-26247 Re-implement spider gbh query rewrite of tables
4b5fb4bb7ef MDEV-26247 clean up spider_group_by_handler::init_scan()
5638e375b8b MDEV-26247 Clean up spider_fields
8da5898302e MDEV-26247 Remove some unused spider methods
53294d52f8e MDEV-28998 remove a known reference to a SPIDER_CONN when it is freed

10.4

1ea1d1f6bf1 upstream/bb-10.4-ycp MDEV-26247 Re-implement spider gbh query rewrite of tables
a0266d3f5b1 MDEV-26247 clean up spider_group_by_handler::init_scan()
b1ecfa0ce02 MDEV-26247 Clean up spider_fields
788354732db MDEV-26247 Remove some unused spider methods

Comment by Alexey Botchkov [ 2023-10-15 ]

Agree with the solution.
Still there are two issues about the patch in the comment.

Comment by Yuchen Pei [ 2023-10-18 ]

Hi holyfoot, thanks for the comment, which I have addressed at [1]. ptal the updated commits, thanks:

11fde0f8bd7 upstream/bb-11.0-mdev-26247 MDEV-26247 Re-implement spider gbh query rewrite of tables
fe3d8dcf492 MDEV-26247 clean up spider_group_by_handler::init_scan()
5b486c4893b MDEV-26247 Clean up spider_fields
dd9eea879e3 MDEV-26247 Remove some unused spider methods

[1] https://github.com/MariaDB/server/commit/88765e7d583e3aeae59be0c5f1d3f2894fea72f9#commitcomment-130266611

Comment by Alexey Botchkov [ 2023-10-30 ]

ok to push.

Comment by Yuchen Pei [ 2023-11-17 ]

pushing the following to 10.4

178396573a1 upstream/bb-10.4-mdev-26247 MDEV-26247 Re-implement spider gbh query rewrite of tables
0bacef76177 MDEV-26247 clean up spider_group_by_handler::init_scan()
2d1e09a77f0 MDEV-26247 Clean up spider_fields
8c1dcb2579e MDEV-26247 Remove some unused spider methods

conflict solutions:

  • 10.4->10.5: 1328e6298651fa92af92605153f5119b9c33cb97
  • 10.5->10.6: 90d6006cbfffdb9226a822b8842f474f8cfe5168
  • 10.6->10.11: 21671e090b9e1596f8e355541788b7b0f2dcfb54
  • 10.11->11.0: 9c7bf73f66e991c9697c1c278581f2dd8b46b267
Generated at Thu Feb 08 09:43:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.