[MDEV-8954] unnecessary fetch of entire table - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1(EOL)
Fix Version/s: 10.2.6
Component/s: Storage Engine - Spider
Labels:
None
Environment:
Ubuntu

Sprint:
2017-02, 10.1.22

Description

set global optimizer_switch="engine_condition_pushdown=on"

Ouch! It is now pulling back the entire table 200 rows at a time.

Query: SELECT DISTINCT x FROM table
PRIMARY KEY (x,y,z)
Backend queries: SELECT x FROM table x > 12345 ORDER BY x,y,z LIMIT 200.

(And it is taking much longer than when ecp=OFF)

Question 1: Why fetch the table rather than executing the query?

Question 2: When you have a multi-part PK, that code is likely to be "wrong". Note that it always compared only x, not y and z. And it always said 200.

Attachments

Issue Links

causes

MDEV-13282 post-fix for MDEV-8954

Closed

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2015-10-17 15:41 - edited

rjasdf,

Please note that engine_condition_pushdown switch has been deprecated since 10.1.1. As you can see from the explanation, it is not expected to produce any optimization, only an overhead.

I will assign the report to psergey anyway, in case he wants to look into it and see if there is any problem outside the scope of engine_condition_pushdown switch. If it turns out to be solely the switch'es guilt, I recommend closing it as 'Won't fix'.

rjasdf, please also specify the version where you observed the problem.

Elena Stepanova added a comment - 2015-10-17 15:41 - edited rjasdf , Please note that engine_condition_pushdown switch has been deprecated since 10.1.1 . As you can see from the explanation, it is not expected to produce any optimization, only an overhead. I will assign the report to psergey anyway, in case he wants to look into it and see if there is any problem outside the scope of engine_condition_pushdown switch. If it turns out to be solely the switch'es guilt, I recommend closing it as 'Won't fix'. rjasdf , please also specify the version where you observed the problem.

Rick James (Inactive) added a comment - 2015-10-20 18:01

OK, engine_condition_pushdown may be a red herring. Still, I have 2
questions that are not necessarily related to that.

On Mon, Oct 19, 2015 at 5:47 PM, Elena Stepanova (JIRA) <

Rick James (Inactive) added a comment - 2015-10-20 18:01 OK, engine_condition_pushdown may be a red herring. Still, I have 2 questions that are not necessarily related to that. On Mon, Oct 19, 2015 at 5:47 PM, Elena Stepanova (JIRA) <

Sergei Petrunia added a comment - 2015-11-11 23:53

This looks like a problem inside Spider. If the query is SELECT DISTINCT x FROM table, there are no conditions that SQL layer could push (or not push) down to Spider.

I am not sure if I should attempt to fix Spider bugs.

Sergei Petrunia added a comment - 2015-11-11 23:53 This looks like a problem inside Spider. If the query is SELECT DISTINCT x FROM table , there are no conditions that SQL layer could push (or not push) down to Spider. I am not sure if I should attempt to fix Spider bugs.

Jacob Mathew (Inactive) added a comment - 2017-01-06 22:48

I'm attempting to reproduce the bug.

Jacob Mathew (Inactive) added a comment - 2017-01-06 22:48 I'm attempting to reproduce the bug.

Jacob Mathew (Inactive) added a comment - 2017-01-07 02:20

I don't think that engine condition pushdown is pertinent, as the setting is deprecated and is therefore always ON.

I am working with a table with 1 local node and 1 remote node, with 3 INT columns that comprise the primary key. The table has 2000 rows, where each distinct value of column 1 appears 10 times in the table. My SELECT statement has a WHERE clause that qualifies the greatest 10 of those distinct values. Index fetch next is executed in the remote table exactly 100 times, once for each occurrence of the 100 rows that qualify the WHERE clause. This seems like correct behavior.

Jacob Mathew (Inactive) added a comment - 2017-01-07 02:20 I don't think that engine condition pushdown is pertinent, as the setting is deprecated and is therefore always ON. I am working with a table with 1 local node and 1 remote node, with 3 INT columns that comprise the primary key. The table has 2000 rows, where each distinct value of column 1 appears 10 times in the table. My SELECT statement has a WHERE clause that qualifies the greatest 10 of those distinct values. Index fetch next is executed in the remote table exactly 100 times, once for each occurrence of the 100 rows that qualify the WHERE clause. This seems like correct behavior.

Jacob Mathew (Inactive) added a comment - 2017-01-17 00:25

I continued testing with a 10000-row table. All columns in every row have a nonzero value. SELECT DISTINCT C1 FROM ST WHERE C1 > 0 ORDER BY C1,C2 visits each row once. Removal of the ORDER BY, the WHERE clause, or both of them results in each row being visited many, many times. This is the bug, and it pertains to DISTINCT. Now that I understand the problem, I can reduce the table size to continue debugging it.

Jacob Mathew (Inactive) added a comment - 2017-01-17 00:25 I continued testing with a 10000-row table. All columns in every row have a nonzero value. SELECT DISTINCT C1 FROM ST WHERE C1 > 0 ORDER BY C1,C2 visits each row once. Removal of the ORDER BY, the WHERE clause, or both of them results in each row being visited many, many times. This is the bug, and it pertains to DISTINCT. Now that I understand the problem, I can reduce the table size to continue debugging it.

Jacob Mathew (Inactive) added a comment - 2017-01-17 23:25

Testing with a 15-row table and lots of breakpoints, I can see what is happening. For SELECT DISTINCT, for a table that contains 5 distinct values (in my test case 1..5) for the first column, we get 5 queries sent to the back-end spider servers that access the table rows as follows:

1. All rows with values 1..5 for the first column value.
2. All rows with values 2..5 for the first column value.
3. All rows with values 3..5 for the first column value.
4. All rows with values 4..5 for the first column value.
5. All rows with values 5..5 for the first column value.

Jacob Mathew (Inactive) added a comment - 2017-01-17 23:25 Testing with a 15-row table and lots of breakpoints, I can see what is happening. For SELECT DISTINCT, for a table that contains 5 distinct values (in my test case 1..5) for the first column, we get 5 queries sent to the back-end spider servers that access the table rows as follows: 1. All rows with values 1..5 for the first column value. 2. All rows with values 2..5 for the first column value. 3. All rows with values 3..5 for the first column value. 4. All rows with values 4..5 for the first column value. 5. All rows with values 5..5 for the first column value.

Jacob Mathew (Inactive) added a comment - 2017-02-02 21:44

After doing more extensive testing with different spidered configurations, it is clear that the reported performance problem occurs only with the basic, unpartitioned configuration. I have augmented my fix to pertain to only this configuration.

I have also found other problems in the basic, unpartitioned configuration pertaining to table statistics. These problems cause the optimizer to choose the wrong query plan and yield incorrect query results. The problems disappear after the next background refresh of the table statistics from the back-end server.

Jacob Mathew (Inactive) added a comment - 2017-02-02 21:44 After doing more extensive testing with different spidered configurations, it is clear that the reported performance problem occurs only with the basic, unpartitioned configuration. I have augmented my fix to pertain to only this configuration. I have also found other problems in the basic, unpartitioned configuration pertaining to table statistics. These problems cause the optimizer to choose the wrong query plan and yield incorrect query results. The problems disappear after the next background refresh of the table statistics from the back-end server.

Jacob Mathew (Inactive) added a comment - 2017-02-07 01:02

I have committed locally my changes to fix the reported problem, awaiting review.

Jacob Mathew (Inactive) added a comment - 2017-02-07 01:02 I have committed locally my changes to fix the reported problem, awaiting review.

Jacob Mathew (Inactive) added a comment - 2017-04-26 21:56

The fix was committed in: https://github.com/MariaDB/server/commit/ab841ac6c7eed993d4a69f8f205b97d43ebc3847 and merged into the 10.2-spider branch. Note that this will be released in a 10.3 release of MariaDB.

Jacob Mathew (Inactive) added a comment - 2017-04-26 21:56 The fix was committed in: https://github.com/MariaDB/server/commit/ab841ac6c7eed993d4a69f8f205b97d43ebc3847 and merged into the 10.2-spider branch. Note that this will be released in a 10.3 release of MariaDB.

Elena Stepanova added a comment - 2017-05-01 00:28

It shouldn't really be closed until it makes to a main tree (10.3 in this case).

Elena Stepanova added a comment - 2017-05-01 00:28 It shouldn't really be closed until it makes to a main tree (10.3 in this case).

Jacob Mathew (Inactive) added a comment - 2017-05-01 18:57

Note that builds from the 10.2-spider branch will be given to customers. We therefore need to be able to mark bugs fixed in that branch as CLOSED.

Jacob Mathew (Inactive) added a comment - 2017-05-01 18:57 Note that builds from the 10.2-spider branch will be given to customers. We therefore need to be able to mark bugs fixed in that branch as CLOSED.

Michael Widenius added a comment - 2017-07-06 11:44

Code needs to fixed according to review

Michael Widenius added a comment - 2017-07-06 11:44 Code needs to fixed according to review

Sergei Golubchik added a comment - 2017-07-10 06:35

after-review fixes are in ~~MDEV-13282~~

Sergei Golubchik added a comment - 2017-07-10 06:35 after-review fixes are in MDEV-13282

People

Assignee:: Jacob Mathew (Inactive)

Reporter:: Rick James (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2015-10-16 21:30

Updated:: 2017-07-10 06:35

Resolved:: 2017-07-10 06:35

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server