[MCOL-1559] Regression on working_tpch1/misc/bug3669 string pad compare not correct Created: 2018-07-13 Updated: 2020-04-01 Resolved: 2020-02-12 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | PrimProc |
| Affects Version/s: | 1.0.0 |
| Fix Version/s: | 1.2.6, 1.4.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | David Hall (Inactive) | Assignee: | David Hall (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Sprint: | 2019-04, 2019-06 | ||||||||||||||||||||||||||||||||||||
| Description |
|
Probably because of a change made in Using a padded string (i.e., "abc ") in a where clause should compare favorably with a column that has no padding. It no longer works that way. |
| Comments |
| Comment by David Hall (Inactive) [ 2019-03-11 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Regression test cases: | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-04-10 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Since I used | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-04-10 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Your comment in | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2019-04-19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Build verified: 1.2.4-1 nightly Reproduced the issue in 1.1.5-1 Verified the fix in 1.2.4-1 MariaDB [mytest]> create table stringtest (c1 char(10), c2 varchar(10), c3 varchar(6))engine=columnstore; MariaDB [mytest]> insert into stringtest values ('abc','cde','abc'), ('cde','abc','cde'); MariaDB [mytest]> select * from stringtest where c1='abc' or c2='abc';
-----
----- MariaDB [mytest]> select * from stringtest where c1= 'abc ';
-----
----- MariaDB [mytest]> select * from stringtest where c2='abc ';
-----
----- MariaDB [mytest]> select * from stringtest where c1='abc ' or c2='abc ';
-----
----- MariaDB [mytest]> select * from stringtest where c1 = 'abc' or c3='cde ';
-----
----- MariaDB [mytest]> select concat('abc ', c2), concat('cde ', c3) from stringtest;
-------------------
------------------- | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-05-22 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
The fix causes a serious performance regression. We need to find a way to do this without string copy. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-07-05 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Added a new fix. QA: This needs to be checked for performance, as the last fix hurt in this area. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-07-23 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
QA: If comparing to a constant (the most common of filters), these are the rules: 1. If the column is a VARCHAR or a CHAR, a compare for an operator (=, <, >, etc) should be true if there are trailing spaces in the column data or in the compare data or both. 2. For CHAR and VARCHAR, if the column value has no trailing space, then it should compare false for LIKE if the constant has a trailing space and true if the constant has no trailing space. 3. For CHAR, if the column value has a trailing space, then it should compare false for LIKE if the constant has a trailing space and would otherwise compare (go figure). 4. For VARCHAR, if the column value has a trailing space, then it should compare true for LIKE if the constant has a trailing space and would otherwise compare. 5. For CHAR, if the column value has a trailing space, then it should compare true for LIKE if the constant has no trailing space and otherwise compares. 6. For VARCHAR, if the column value has a trailing space, then it should compare false for LIKE if the constant has no trailing space and otherwise compares. Truth table for LIKE
| |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-07-24 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Regression test queries/working_tpch1_compareLogOnly/misc/ | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-07-24 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
I changed the code to always use boost::regex_match() rather than #ifdefing to regexec(). I'm not sure why the difference was made. Perhaps because of performance (regex == faster?). Removed a memcpy of the string into a stack allocated array. This seems wasteful, and is surely dangerous if the string was long and blew up the stack. And the code was not portable, using a gcc extension for variable length arrays (VLA). Future versions of gcc will not allow VLA with this syntax. We could do the same thing with alloca, but the copy and potential stack problems say we probably shouldn't. // Ugh. The strings returned by getStrVal have null padding out to the col width. boost::regex This comment for this bit of code makes no sense. Perhaps in the past there was no version of regexec or regex_match that took a null-terminated string, so it was needed? Basically, the string was copied and then put into a new std::string (making two physical copies), ostensibly with only one NULL. But we can pass in a NULL terminated string to regexec or to boost::regex_match, so what the what? std::string.c_str() works just fine. For 1.4, this should all be changed to use std::regex_match (C++11). | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-07-24 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Don't forget about performance testing here. Do we want Daniel to do his thing before we merge? | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2019-12-10 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Build tested: 1.2.6-1 engine commit: Reproduced the issue in bug3669.sql and verified the issue in 1.2.6-1 MariaDB [mytest]> select * from stringtest where c1='abc' or c2='abc';
-----
----- MariaDB [mytest]> select * from stringtest where c1= 'abc ';
-----
----- MariaDB [mytest]> select * from stringtest where c2='abc ';
-----
----- MariaDB [mytest]> select * from stringtest where c1='abc ' or c2='abc ';
-----
----- MariaDB [mytest]> select * from stringtest where c1 = 'abc' or c3='cde ';
-----
----- MariaDB [mytest]> select concat('abc ', c2), concat('cde ', c3) from stringtest;
-------------------
------------------- | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2019-12-10 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Did a performance comparison for concat() on integer, date, char-1byte, char-dictionary for 1gb database lineitem table. On average, the fix is about 8% slower. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2019-12-10 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Opened ticket MCOL-3661 for further performance investigation. Closing this ticket now. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Gregory Dorman (Inactive) [ 2020-02-07 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
This needs to be reopened. The fix as made has created an unacceptable performance impact. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2020-02-12 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
This was placed in stalled because of some performance concerns. MCOL-3661 addressed those concerns, so this can be closed. |