[MCOL-790] DISTINCT in subquery is optimized away Created: 2017-06-28 Updated: 2017-08-09 Resolved: 2017-08-09 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | MariaDB Server |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Developer | Assignee: | Unassigned |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Description |
| Comments |
| Comment by David Thompson (Inactive) [ 2017-06-28 ] | ||||||||||||
|
Hi Shashank, there are a few clues if you look at the difference in trace results. Take a look at this kb article on tuning config: Generally all of these are applicable to both infinidb and columnstore so can you compare the values in your infinidb calpont.xml with cs columnstore.xml? For example you can see that the HJS primitive step is executing on the UM in columnstore but is distributed on the PM in infinidb, i suspect you probably have a larger PmMaxMemorySmallSide setting on infinidb. Also if you look at the first BPS column scan entry you can see that the elapsed rows count is a lot larger so there is a much larger intermediate result set: even on the rewrite this is a lot bigger. How many rows would you normally expect for this on the subquery? It could still mostly be the fact that the join is executing on the um vs the pm so transferring the data is the main overhead. | ||||||||||||
| Comment by Developer [ 2017-06-28 ] | ||||||||||||
|
Also if you look at the first BPS column scan entry you can see that the elapsed rows count is a lot larger so there is a much larger intermediate result set: -------------------------------------------------- I am worried about above point only. Why same query elapsed small amount of rows in infinidb database which has same records. I have just exported from infinidb and imported to mariadb. BPS PM ferrero_mults 30228 (GID,SNO) 0 76935 0 1.317 616380 *(InfiniDB Result) * Thatswhy I am worried does any issue in my mariadb setup? As I mentioned I have multiple PM and each has 1 dbroot attached with them. So does it data distribution issue? or anything else? ---------------------------------------------------------------- For this I can say I can expect near about 10000 rows in subquery. | ||||||||||||
| Comment by David Thompson (Inactive) [ 2017-06-28 ] | ||||||||||||
|
can you confirm on the tuning settings i referenced and if they are different in infinidb from cs? Lets exclude that first. Also how did you migrate - did you use our procedure on the knowledge base? | ||||||||||||
| Comment by Developer [ 2017-06-28 ] | ||||||||||||
|
can you confirm on the tuning settings i referenced and if they are different in infinidb from cs? Lets exclude that first. Okay that I will check and confirm you. Also how did you migrate - did you use our procedure on the knowledge base? I am exporting data using SELECT * INTO OUTFILE 'tablename.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' FROM tablename; Than loading data to columnstore as LOAD DATA INFILE 'tablename.csv' INTO TABLE tablename FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'; I am changing ENGINE in table structure as ColumnStore from InfiniDB ) ENGINE=ColumnStore DEFAULT CHARSET=utf8; Thank you. | ||||||||||||
| Comment by David Thompson (Inactive) [ 2017-06-28 ] | ||||||||||||
|
using that migration path, it's possible the data might have become reordered so if you were relying on partition eliminatiion in infinidb, that could have been changed in the migration. You could use editem to compare: | ||||||||||||
| Comment by Developer [ 2017-06-29 ] | ||||||||||||
|
Hi David, I have verified infinidb and mariadb .xml files both has same settings. The variables which you mentioned "PmMaxMemorySmallSide" that is also matching. I have found 2 variables that are not matching after fixing that I have restarted mariaDB system but still same issue of long execution in case of subquery. Even I noticed that when I run query having subquery though all 3 PM resources(CPU and Memory) are free that are not used by query. I have also look onto editem which you told but that is also not giving any output. I have run below command that just output nothing. --------------------
--------------------
root@ip-172-31-0-155:/home/ubuntu# /usr/local/mariadb/columnstore/bin/editem -o 3157 -t Please check and let me know your view. | ||||||||||||
| Comment by Developer [ 2017-06-30 ] | ||||||||||||
|
Hi David, can you please check above comments? Also I need to clear few concepts and doubts as below. Please help me to clear concepts. Thank You. | ||||||||||||
| Comment by Developer [ 2017-07-03 ] | ||||||||||||
|
Hi David, Can you please respond to above points? Thank You. | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-07-04 ] | ||||||||||||
|
OK, the cause is due to optimizer differences between MySQL 5.1 which InfiniDB uses and MariaDB 10.1 that ColumnStore uses. The DISTINCT in the subquery gets optimized away due to this: https://mariadb.com/kb/en/mariadb/optimizing-group-by/ This causes the DISTINCT to get dropped. In your case the WHERE condition in the subquery is matching every row in the table and you are relying in DISTINCT to filter. Assuming these are INTs that means 350MB of data gets transferred from the PMs to the UM and this entire set is used in the join. I think the fix for this would be to either implicitly add the DISTINCT or run a TAS step in the PM and UM. We need to think through the possible scenarios this will work and break first. The workaround is to rewrite the query as the join as you have indicated in the mailing list. Your questions from the 30th are being tracked in | ||||||||||||
| Comment by Developer [ 2017-07-04 ] | ||||||||||||
|
The workaround is to rewrite the query as the join as you have indicated in the mailing list. As you can find that I have also send you the trace report for join query on MariaDB that is taking more time than InfiniDB why? can you please provide me details for that? | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-07-04 ] | ||||||||||||
|
In your InfiniDB setup you are using a single server, in ColumnStore you are using multi server. The extra time is likely the network round trip time to send the 27228 blocks of data (213MB). If your data was sorted before inserted in such a way that blocks could be eliminated (sorting on GID) then the query would likely run a little faster. This is dependant on the range of values for GID. | ||||||||||||
| Comment by Developer [ 2017-07-05 ] | ||||||||||||
|
Hi Andrew, I have one question as per my current architecture (Multi Node 1 UM and 3 PM) if I need to modify variable like key_buffer_size, sort_buffer_size, read_buffer_size How I need to modify those? can you please help me? Do I need to update this is my.cnf file for all four server? How can I check if its updated or not? Please help me. Thank You. | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-07-05 ] | ||||||||||||
|
Since you only have 1 UM you only need to set those in /usr/local/mariadb/columnstore/mysql/my.cnf on the UM. | ||||||||||||
| Comment by Developer [ 2017-07-06 ] | ||||||||||||
|
okay thanks. can you please suggest some variables which I can upgrade to improve query performance on my UM by considering old discussion? Thank you. | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-07-06 ] | ||||||||||||
|
this isn't relevant to the bug report. Can we please take this to the Google Group. Otherwise it is going to be much harder to track the real information about this bug. | ||||||||||||
| Comment by Developer [ 2017-07-06 ] | ||||||||||||
|
Okay fine I have added post there. Please reply me there. Thank You. |