[MDEV-25985] Spider handle ">=" as ">" in some cases Created: 2021-06-22 Updated: 2021-09-17 Resolved: 2021-07-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - Spider |
| Affects Version/s: | 10.5.9, 10.6.2, 10.2.39, 10.3.30, 10.4.20 |
| Fix Version/s: | 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Richard Stracke | Assignee: | Nayuta Yanagisawa (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
Testcase to reproduce:
Happens on Spider 3.3.15 and 3.4 |
| Comments |
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've tested on 10.4.20 (why not 10.5.9? Because I've not constructed a test environment for 10.5. I will soon test on 10.5.). The bug reproduced.
When I executed the above query on the Spider node, the following corresponding query was executed on the data node. The clause c `col3` > _latin1'2021-04-01 00:00:00' seems to be the source of the problem.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Richard Stracke [ 2021-07-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Interesting point here is, that adding keyword limit with random number gives the correct result SELECT DISTINCT A.col1, A.col2, B.col3 FROM test.local_tab A INNER JOIN test.spider_data_tab2 B ON A.col1 = B.col1 AND A.col2 = B.col2 AND A.col1 IN ('00166' , '00174') AND B.col4 <> 1 AND B.col3 >= '2021-04-01 00:00:00' AND B.col3 <= '2021-04-03 00:00:00' limit 5555; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Richard I'm afraid that you may be querying the data node because spider_data_tab2 is on the data node. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The bug is reproducible even on 10.2.39 and 10.3.30. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I generated, on 10.2 HEAD(99f700a820ef90b5b36ef765fb1532145ab3e907), the debug trace of the query that is causing the problem. A partial trace is attached to the present issue. By combining the debug trace and GDB debugging on 10.2 HEAD, I found that the wrong condition (`col3` > _latin1'2021-04-01 00:00:00') is added by the function spider_db_append_key_where_internal(), especially by spd_db_conn.cc: L1867-L1925.
My insight is supported by the fact that the following modified version of the query (B.col4 <> 1 is removed) works as expected. The Spider also calls the function spider_db_append_key_where_internal() for the modified query but does not goes into the problematic part, L1867-L1925.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Allen Lee (Inactive) [ 2021-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
nayuta-yanagisawa is there any reason you've removed condition (B.col4 <> 1)? I thought this is the key problem causing different result? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
> Nayuta Yanagisawa is there any reason you've removed condition (B.col4 <> 1)? I tested the modified query just because it was suggested as a workaround in the support ticket. I need more time to detect the root cause. > Also, do you also know why `_latin1` is added to query execution though both spider and data node uses utf8? I've not found why `_latin1` is added. I will investigate it later. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The bug is reproducible, on 10.2 HEAD, without the local table, while the table definitions and data inserted are the same as the ones in the issue description. On data node:
On Spider node:
The corresponding query executed on the data node is the following:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I took a deeper look at the bug. The function spider_db_append_key_where_internal() converts HA_READ_AFTER_KEY to > (see the code block above).The conversion seems to be correct for single-column indexes because HA_READ_AFTER_KEY means read the key after the provided value. However, how about multi-column indexes? Assume that there is a multi-column index on c1 and c2 and we would like to search with WHERE c1 >= 100 AND c2 > 200. I think that the key_range.flag corresponds to the search condition could be HA_READ_AFTER_KEY . In such a case, we could not simply convert HA_READ_AFTER_KEY to >. I think that the bug reproduction without the local table is probably a real world example of the case. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This seems to be also possible for HA_READ_BEFORE_KEY. See the following two queries on the spider node and the data node (tested on 10.2 HEAD):
EDIT: This seems to be a bug, but I noticed that this may be another bug. That is because, when executing the first query on the Spider node, the server didn't go into case HA_READ_BEFORE_KEY: and the corresponding query executed on the data node is the following:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Simply replacing SPIDER_SQL_LT_XX by SPIDER_SQL_LTEQUAL_XX does not work. That result in failures of existing tests, say spider.direct_aggregate and spider.direct_aggregate_part.
Output of spider.direct_aggregate:
NOTE: The output of spider.direct_aggregate_part is almost the same as one of spider.direct_aggregate. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
serg I've got stuck now. Do you have any idea about why the above fix (comment-193668) doesn't work? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I just asked a question, but I think I found the answer by myself. The fix is based on the assumption that MariaDB's executor has the capability to filter out rows that do not match search conditions. However, while this is just a guess from the test name, the Spider SE seems to push down some aggregation functions. In such cases, we have to push down all exact search conditions. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I got a bit confused in a weird way while I've already got to the heart of the issue. In conclusion, the correct conversion seems to be to convert HA_READ_AFTER_KEY to '>' only for the last column in tgt_key_part_map and to convert HA_READ_AFTER_KEY to '>=' for the other column. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I minimized the test case:
The corresponding query executed on the data node:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I created another JIRA issue related to the present issue: MDEV-26111 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Another buggy behavior here, while the table difinitions and data are the same as ones in comment-193869.
The corresponding query executed on the data node:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I pushed a patch for the present issue: https://github.com/MariaDB/server/commit/da70f190279738804b7ddbc15a71e1c3cf088dd0 Most of the files in `storage/spider/mysql-test/spider/bugfix` is copies of these on 10.3 HEAD. So, the fundamental differences are in:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
serg According to your review comment, I merged the test case for the issue into the existing test for partitioned tables. The code is as is, only the test is updated. https://github.com/MariaDB/server/commit/737ced02f897ce757fd66330fd6843c5b2e5b834 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2021-07-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
737ced02f89 is ok to push | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-07-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Merged https://github.com/MariaDB/server/commit/0f6e170c34700a2964556839c676c1b7768f3ffb |