[MCOL-5196] REPLACE can provoke invalid capacity assertion in binary processing mode Created: 2022-08-16  Updated: 2023-02-17  Resolved: 2022-09-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 5.6.5, 6.4.2
Fix Version/s: 22.08.1

Type: Task Priority: Major
Reporter: Sergey Zefirov Assignee: Sergey Zefirov
Resolution: Fixed Votes: 0
Labels: None


 Description   

REPLACE('a', 'pqrs', 'b') can trigger an invalid length assertion when used with a charset that takes the binary-comparison path, that is, one that is not multibyte or has the MY_CS_BINSORT flag set. latin1_bin, which is both, is one example.

The code:

  const string& str = fp[0]->data()->getStrVal(row, isNull);
  if (isNull)
    return "";
  size_t strLen = str.length();
 
  const string& fromstr = fp[1]->data()->getStrVal(row, isNull);
  if (isNull)
    return "";
  if (fromstr.length() == 0)
    return str;
  size_t fromLen = fromstr.length();
 
  const string& tostr = fp[2]->data()->getStrVal(row, isNull);
  if (isNull)
    return "";
  size_t toLen = tostr.length();
 
  bool binaryCmp = (cs->state & MY_CS_BINSORT) || !cs->use_mb();
  string newstr;
  size_t pos = 0;
  if (binaryCmp)
  {
    // Count the number of fromstr in strend so we can reserve buffer space.
    int count = 0;
    do
    {
      ++count;
      pos = str.find(fromstr, pos + fromLen);
    } while (pos != string::npos);
 
    newstr.reserve(strLen + (count * ((int)toLen - (int)fromLen)) + 1); // <- the culprit.
 
    uint32_t i = 0;
    pos = str.find(fromstr);
    if (pos == string::npos)
      return str;
    // Move the stuff into newstr
    do
    {
      if (pos > i)
        newstr = newstr + str.substr(i, pos - i);
 
      newstr = newstr + tostr;
      i = pos + fromLen;
      pos = str.find(fromstr, i);
    } while (pos != string::npos);
 
    newstr = newstr + str.substr(i, string::npos);
  }
...

We start counting occurrences at 1, so count is 1 even when the string str contains no occurrence of fromstr at all. We then compute the signed integer difference between the lengths of the replacement string tostr and the search string fromstr, and multiply it by count, which is one more than the number of matches the search loop actually finds.
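The counting step can be done without the phantom first occurrence. A minimal sketch, assuming non-overlapping matches as in the replacement loop; countOccurrences is a hypothetical helper, not part of ColumnStore:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not ColumnStore code: counts non-overlapping
// occurrences of `from` in `str`, starting the count at zero.
std::size_t countOccurrences(const std::string& str, const std::string& from)
{
  if (from.empty())
    return 0;  // the quoted code returns before counting in this case

  std::size_t count = 0;
  std::size_t pos = str.find(from);  // search from offset 0, not fromLen
  while (pos != std::string::npos)
  {
    ++count;
    pos = str.find(from, pos + from.size());  // resume after this match
  }

  // Non-overlapping matches cover at most the whole string.
  assert(count * from.size() <= str.size());
  return count;
}
```

With a count that starts at zero, the matches cover at most strLen bytes, so strLen - count * fromLen + count * toLen can never go below zero and can be computed entirely in size_t.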

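For the string to process 'a', the search string 'pqrs' and the replacement string 'b', the difference is 1 - 4 = -3, count is 1 and strLen is 1, so the argument to reserve() is 1 + (-3) + 1 = -1. Because strLen is a size_t, the negative int is converted to an unsigned value and the sum wraps around to SIZE_MAX: about 4G bytes on a 32-bit platform and far more on a 64-bit one. That exceeds std::string::max_size(), so the requested capacity is invalid (reserve() throws std::length_error for such a request).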
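The wraparound can be reproduced outside ColumnStore. A minimal sketch that copies the reserve() expression from the quoted code; buggyReserveArg and reserveThrowsLengthError are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Same expression as the quoted reserve() argument. For
// REPLACE('a', 'pqrs', 'b') the inputs are strLen = 1, count = 1,
// toLen = 1, fromLen = 4.
std::size_t buggyReserveArg(std::size_t strLen, int count,
                            std::size_t toLen, std::size_t fromLen)
{
  // count * (1 - 4) is the int -3. Adding it to the size_t strLen converts
  // -3 to SIZE_MAX - 2, so the sum wraps modulo 2^N instead of going negative.
  return strLen + (count * ((int)toLen - (int)fromLen)) + 1;
}

// Returns true if std::string::reserve() rejects the request with
// std::length_error, which the standard requires when n > max_size().
// Other exceptions (e.g. std::bad_alloc) are deliberately not caught.
bool reserveThrowsLengthError(std::size_t n)
{
  std::string s;
  try
  {
    s.reserve(n);
  }
  catch (const std::length_error&)
  {
    return true;
  }
  return false;
}
```

With the inputs from this example, buggyReserveArg(1, 1, 1, 4) equals SIZE_MAX, and reserveThrowsLengthError(SIZE_MAX) returns true on common standard library implementations, whose max_size() is below SIZE_MAX.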



 Comments   
Comment by alexey vorovich (Inactive) [ 2022-08-16 ]

drrtuy sergey.zefirov Are we planning to do this for 220801 or later?

Comment by Sergey Zefirov [ 2022-08-22 ]

The comments contain a link to the relevant PR.

Comment by Daniel Lee (Inactive) [ 2022-09-06 ]

Build verified: 22.08 (#5531)

Reproduced the issue in 6.4.2 and verified the fix.

6.4.2

MariaDB [mytest]> select replace(c1, 'pqrs','b') from t1;
ERROR 1815 (HY000): Internal error: An unexpected condition within the query caused an internal processing error within Columnstore. Please check the log files for more details. Additional Information: error in BatchPrimitivePro

22.08

MariaDB [mytest]>  select replace(c1, 'pqrs','b') from t1;
+-------------------------+
| replace(c1, 'pqrs','b') |
+-------------------------+
| a                       |
+-------------------------+
1 row in set (0.022 sec)
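For comparison with the 22.08 result above, a minimal sketch of the binary-comparison branch with a capacity computation that cannot wrap around. The fix itself is only linked from the issue, so this is not necessarily what the PR changed; replaceBinary is a hypothetical name and the charset check is left out:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of the binary-comparison REPLACE branch, not the
// actual ColumnStore fix.
std::string replaceBinary(const std::string& str, const std::string& from,
                          const std::string& to)
{
  if (from.empty())
    return str;  // same early return as the quoted code

  // Count non-overlapping matches, starting from zero.
  std::size_t count = 0;
  for (std::size_t pos = str.find(from); pos != std::string::npos;
       pos = str.find(from, pos + from.size()))
    ++count;

  if (count == 0)
    return str;  // nothing to replace, so no reservation is needed

  // Matches never overlap, so they cover at most the whole string and this
  // unsigned subtraction cannot wrap around.
  assert(count * from.size() <= str.size());
  std::string result;
  result.reserve(str.size() - count * from.size() + count * to.size());

  std::size_t i = 0;
  for (std::size_t pos = str.find(from); pos != std::string::npos;
       pos = str.find(from, i))
  {
    result.append(str, i, pos - i);  // text before the match
    result.append(to);               // the replacement
    i = pos + from.size();
  }
  result.append(str, i, std::string::npos);  // tail after the last match
  return result;
}
```

replaceBinary("a", "pqrs", "b") returns "a", matching the 22.08 output, and skips the reservation entirely when there is nothing to replace.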

Generated at Thu Feb 08 02:56:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.