[MDEV-18844] Implement EXCEPT ALL and INTERSECT ALL operations Created: 2019-03-07 Updated: 2020-11-18 Resolved: 2019-09-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | 10.5.0 |
| Type: | Task | Priority: | Major |
| Reporter: | Igor Babaev | Assignee: | Igor Babaev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | gsoc19 | ||
| Issue Links: |
|
||||||||||||
| Description |
|
SQL Standard allows to use EXCEPT ALL and INTERSECT ALL as set operations. The goal of this task is to support EXCEPT ALL and INTERSECT ALL |
| Comments |
| Comment by KHAMAR ALI SHAIKH [ 2019-03-17 ] |
|
Hi Igor , i am student from RV college of engineering banglore . I am interested to be a part of GSOC 2019 and want to take this feature implementation for the same . Please guide me for the further implementations or proceedings. Thanks , |
| Comment by Igor Babaev [ 2019-03-18 ] |
|
Hi Khamar, I'm glad that you've chosen this task. For the beginning look how other set operations were incorporated into MariaDB. See sql_yacc.yy Regards, |
| Comment by KHAMAR ALI SHAIKH [ 2019-03-18 ] |
|
Hi Igor , |
| Comment by Igor Babaev [ 2019-03-18 ] |
|
Khamar, What exactly problems do you experience with enrollment into GSoC 2019? |
| Comment by KHAMAR ALI SHAIKH [ 2019-03-19 ] |
|
Hi Igor , I am not facing any problems with enrollment into GSoc 2019 , the thing i am asking for is when i submit the project proposal for GSOC it would be very helpful if you could accept the proposal as i ll be working on the project "Implement EXCEPT ALL and INTERSECT ALL operations" . The application for GSoc wont be accepted unless mentoring organization accepts it |
| Comment by Vicențiu Ciorbaru [ 2019-03-25 ] |
|
Hi khamar! As you probably know from GSoC website, students get accepted based on their proposals and slot allocation from Google. Naturally we will accept well written proposals. In order for you to get a good proposal, I suggest you submit a draft as soon as possible and ask plenty of questions. We will provide feedback for you to improve on it. It helps to give you a better chance if you also take one of the beginner friendly bugs we have available and submit a patch for that. The list for those is here: Take note that we value community interaction a lot. Looking forward to seeing your proposal. Vicențiu |
| Comment by Ruihang Xia [ 2019-06-04 ] |
|
For `A EXCEPT ALL B`, turn A (now A is a list of records) into a temporary table with an extra field on each row that stands for how many same record there are. Then for each record in B (B is a list of records too) try to find it in the temporary table. If find decrease that extra field's number by 1. Means delete the record. After all check the temporary and delete all rows whose extra field less than zero. |
| Comment by Ruihang Xia [ 2019-06-09 ] |
Details about two new fields.Now there are two extra field in the temporary table. One is mentioned in Idea for EXCEPT ALL. Another one is sepcially add for INTERSECT ALL. Let's call the first one "counter" and the new one "intersect counter". Counter stands for how many duplicate of a record there are. Intersect counter is a temporary variable. What it actually counts is the duplicate number in B. Counter is new added in this work. intersect counter can use the field count already exist, added by Oleksandr when implementing INTERSECT and EXCEPT. Or another way, put two logical field into one physical field. Idea for INTERSECT ALLFor `A INTERSECT ALL B`, now A is a temporary table and B is a list of records. First set field intersect counter to zero as initialization. Then go through B. For each record in B try to find it in A. If find increase intersect counter by one. After all set counter to the minimal between counter and intersect counter. |
| Comment by Ruihang Xia [ 2019-06-16 ] |
|
Examples:
Failed test cases:
|
| Comment by Ruihang Xia [ 2019-06-24 ] |
|
Test cases added and passed. Only one bug left (output mismatch in main.limit_rows_examined). Prepare to add a pull request. |
| Comment by Ruihang Xia [ 2019-07-01 ] |
|
In this phase, I spent approximately 50 hours. A major part of the time was spent on running test and solve bugs founds by test cases. This part is a bit falling behind the schedule. In my perspective, this can blame on my lack of ability of debugging and not familiar to codes around my work. Thanks to my mentors for guiding me when I run into an impasse and giving me advice that prevents me from detours. That helps a lot. |
| Comment by Ruihang Xia [ 2019-07-19 ] |
|
I write the simplification of the sequence of set operations and add another test case for it. Make sure the test works by running the coverage test. In my implementation, I get all six types of set operations involved. But this may have some problem in both passing test and performance. Treat complex and easy operation equally will bring some extra calculation. So I need to bypass some operation. This is not finished yet and I'll continue working on it. |
| Comment by Ruihang Xia [ 2019-07-21 ] |
|
I spent approximately 30 hours in the optimize phase. Following are something I learned from my work. I'm working with a huge system. In the early stage, I focus on how to make my part works. After this work was done, however, I find it is also important that make my part work with the whole system. I mixed up my logits with those already existed and broken something without aware of it. This brings me some trouble but also teaches me not to change things unless I'm clear about what I'm doing. The second thing is about the test case. Under my mentor's help, I find lots of bugs not checked by test cases. Some of them are covered by test case but not gives an error because the test case is too shallow. Though this is unavoidable in some aspect we still need to write test case as more and detailed as we can. Then is the comment. When writing code I may leave just a few rough words. Some have ambiguity and some do not tell what those codes actually do. Comments like this are not very helpful. So I need to make my comment more detailed and more helpful. |
| Comment by Ruihang Xia [ 2019-07-29 ] |
|
This week I try to refine my code. Include deciding the routine of processing a record in advance rather than do it every time before execution or reduce the number of loops etc. |
| Comment by Ruihang Xia [ 2019-08-14 ] |
|
In past days I change the code with my mentor. Run the test with a new option "--ps-protocol" and make it passed. Remove a callback function in send_data() and some dead code to make the code more clear. And changed some logits may lead to error or make the program run slower like unfolding duplicate records as soon as possible or adding a flag to prevent from optimizing twice. The behavior of SHOW CREATE VIEW also has changed. Now it will give the raw query rather than the query after optimized. There still has some problem in my code and I will continue work with it. |
| Comment by Ruihang Xia [ 2019-08-22 ] |
|
This week I changed the way two extra fields are used. Make it more natural. The extra field closest to the data will now be used first. |
| Comment by Oleksandr Byelkin [ 2019-08-27 ] |
|
I reviewed your patch, there is 2 small thing you have to fix: 2) you made this public: you do not need it public, maximum protected, please remove this hack. |
| Comment by Igor Babaev [ 2019-09-10 ] |
|
The patch for the task was merged into 10.5 |