[MCOL-707] GROUP_CONCAT function on TEXT/TEXT got ColumnStore stuck in "Join or subselect exceeds memory limit" error Created: 2017-05-08  Updated: 2017-05-12  Resolved: 2017-05-10

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.1.0
Fix Version/s: 1.1.0

Type: Bug Priority: Major
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File dttCreateTable_c.sql     File dttInsertRowsMidValues.sql    
Issue Links:
Relates
relates to MCOL-711 Backport MCOL-707 to 1.0 Closed
Sprint: 2017-10

 Description   

Build tested: Github source

[root@localhost mariadb-columnstore-server]# git show
commit 349cae544b6bc71910267a3b3b0fa3fb57b0a587
Merge: bd13090 2ecb85c
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu May 4 16:06:16 2017 -0500

Merge pull request #50 from mariadb-corporation/10.2-fixes

10.2 fixes

[root@localhost mariadb-columnstore-server]# cd mariadb-columnstore-engine/
[root@localhost mariadb-columnstore-engine]# git show
commit 9ef603c14aa3cb8e9fe7f21c9965f6cf3f0a11d8
Merge: 19342ef 82c983e
Author: dhall-InfiniDB <david.hall@mariadb.com>
Date: Fri May 5 13:09:34 2017 -0500

Merge pull request #169 from mariadb-corporation/MCOL-701

MCOL-701 stop join on BLOB columns

Test queries are at the end of this comment.

The following query would cause this the "Sorting length Exceeded" error: (query #22)

MariaDB [mytest]> select cidx, CTEXT, GROUP_CONCAT(CTEXT) from datatypetestm group by cidx, CTEXT order by cidx;
ERROR 1815 (HY000): Internal error: IDB-2015: Sorting length exceeded. Session variable max_length_for_sort_data needs to be set higher.

The following query would cause this the "Aggregation/Distinct memory limit Exceeded" error: (query #23)
MariaDB [mytest]> select GROUP_CONCAT(CLONGTEXT) from datatypetestm;
ERROR 1815 (HY000): Internal error: IDB-2003: Aggregation/Distinct memory limit is exceeded.

If I execute the script that include the following 45 queries, ColumnStore would get into a condition that it will return a "IDB-2001: Join or subselect exceeds memory limit." even if I just execute a select count.

MariaDB [mytest]> select count from datatypetestm;
ERROR 1815 (HY000): Internal error: IDB-2001: Join or subselect exceeds memory limit.

MariaDB [mytest]> create table t2 (c1 int) engine=columnstore;
Query OK, 0 rows affected (0.07 sec)
MariaDB [mytest]> select count from t2;
ERROR 1815 (HY000): Internal error: IDB-2001: Join or subselect exceeds memory limit.

I need to restart the system to get out of this condition.

select GROUP_CONCAT(CCHAR2) from datatypetestm;
select GROUP_CONCAT(CCHAR3) from datatypetestm;
select GROUP_CONCAT(CCHAR4) from datatypetestm;
select GROUP_CONCAT(CCHAR5) from datatypetestm;
select GROUP_CONCAT(CCHAR6) from datatypetestm;
select GROUP_CONCAT(CCHAR7) from datatypetestm;
select GROUP_CONCAT(CCHAR8) from datatypetestm;
select GROUP_CONCAT(CCHAR9) from datatypetestm;
select GROUP_CONCAT(CCHAR255) from datatypetestm;
select GROUP_CONCAT(CVCHAR1) from datatypetestm;
select GROUP_CONCAT(CVCHAR2) from datatypetestm;
select GROUP_CONCAT(CVCHAR3) from datatypetestm;
select GROUP_CONCAT(CVCHAR4) from datatypetestm;
select GROUP_CONCAT(CVCHAR5) from datatypetestm;
select GROUP_CONCAT(CVCHAR6) from datatypetestm;
select GROUP_CONCAT(CVCHAR7) from datatypetestm;
select GROUP_CONCAT(CVCHAR8) from datatypetestm;
select GROUP_CONCAT(CVCHAR255) from datatypetestm;
select GROUP_CONCAT(CTEXT) from datatypetestm;
select GROUP_CONCAT(CLONGTEXT) from datatypetestm;
select GROUP_CONCAT('Hello') from datatypetestm;
select GROUP_CONCAT('Hello, World') from datatypetestm;
select cidx, CCHAR1, GROUP_CONCAT(CCHAR1) from datatypetestm group by cidx, CCHAR1 order by cidx;
select cidx, CCHAR2, GROUP_CONCAT(CCHAR2) from datatypetestm group by cidx, CCHAR2 order by cidx;
select cidx, CCHAR3, GROUP_CONCAT(CCHAR3) from datatypetestm group by cidx, CCHAR3 order by cidx;
select cidx, CCHAR4, GROUP_CONCAT(CCHAR4) from datatypetestm group by cidx, CCHAR4 order by cidx;
select cidx, CCHAR5, GROUP_CONCAT(CCHAR5) from datatypetestm group by cidx, CCHAR5 order by cidx;
select cidx, CCHAR6, GROUP_CONCAT(CCHAR6) from datatypetestm group by cidx, CCHAR6 order by cidx;
select cidx, CCHAR7, GROUP_CONCAT(CCHAR7) from datatypetestm group by cidx, CCHAR7 order by cidx;
select cidx, CCHAR8, GROUP_CONCAT(CCHAR8) from datatypetestm group by cidx, CCHAR8 order by cidx;
select cidx, CCHAR9, GROUP_CONCAT(CCHAR9) from datatypetestm group by cidx, CCHAR9 order by cidx;
select cidx, CCHAR255, GROUP_CONCAT(CCHAR255) from datatypetestm group by cidx, CCHAR255 order by cidx;
select cidx, CVCHAR1, GROUP_CONCAT(CVCHAR1) from datatypetestm group by cidx, CVCHAR1 order by cidx;
select cidx, CVCHAR2, GROUP_CONCAT(CVCHAR2) from datatypetestm group by cidx, CVCHAR2 order by cidx;
select cidx, CVCHAR3, GROUP_CONCAT(CVCHAR3) from datatypetestm group by cidx, CVCHAR3 order by cidx;
select cidx, CVCHAR4, GROUP_CONCAT(CVCHAR4) from datatypetestm group by cidx, CVCHAR4 order by cidx;
select cidx, CVCHAR5, GROUP_CONCAT(CVCHAR5) from datatypetestm group by cidx, CVCHAR5 order by cidx;
select cidx, CVCHAR6, GROUP_CONCAT(CVCHAR6) from datatypetestm group by cidx, CVCHAR6 order by cidx;
select cidx, CVCHAR7, GROUP_CONCAT(CVCHAR7) from datatypetestm group by cidx, CVCHAR7 order by cidx;
select cidx, CVCHAR8, GROUP_CONCAT(CVCHAR8) from datatypetestm group by cidx, CVCHAR8 order by cidx;
select cidx, CVCHAR255, GROUP_CONCAT(CVCHAR255) from datatypetestm group by cidx, CVCHAR255 order by cidx;
select cidx, CTEXT, GROUP_CONCAT(CTEXT) from datatypetestm group by cidx, CTEXT order by cidx;
select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT) from datatypetestm group by cidx, CLONGTEXT order by cidx;



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-05-09 ]

The cause will be some kind of query with a join or subselect that will trigger that error. Then for some reason that error gets stuck. It doesn't look like any of those queries meets the criteria so it likely happened before you executed them.

Comment by Daniel Lee (Inactive) [ 2017-05-09 ]

Running these statements in a script caused this error condition. I restarted the system and verified simple queries were executed successfully. Once I executed this script, ColumnStore stuck in this condition.

Comment by Andrew Hutchings (Inactive) [ 2017-05-09 ]

After some debugging I found this query is executing the offending join:

select GROUP_CONCAT(CLONGTEXT) from datatypetestm;

Comment by Daniel Lee (Inactive) [ 2017-05-09 ]

As I mentioned in the ticket, this query returned the "Aggregation/Distinct memory limit is exceeded." error. I was still able to run a "select count" query after that. It was later that ColumnStore got stuck in the "Join or subselect exceeds memory limit." error.

Comment by Andrew Hutchings (Inactive) [ 2017-05-09 ]

The problem is twofold:

1. We get the errors ("Aggregation/Distinct memory limit Exceeded" from ExeMgr and "Join or subselect exceeds memory limit" from PrimProc) at the same time
2. The error repeats after the first query

You will be able to continue after point 1 until you start a new session (I haven't quite figured out why yet, but probably not important).

I believe what is happening is in PrimProc the error code and attempted memory usage is getting stuck in resource manager after the initial error. The initial error itself is probably valid but I will see if there is a workaround.

Comment by Daniel Lee (Inactive) [ 2017-05-09 ]

I agree that the initial error is probably valid. When the query that caused this error exits, resource should be released and subsequent simple queries should be able to get executed.

Comment by Andrew Hutchings (Inactive) [ 2017-05-10 ]

This bug is different to what I originally thought and over a long period of time could affect 1.0 too. This is what is happening:

  • ExeMgr is using ResourceManager to count memory usage
  • GroupConcat's requirement exceeds available memory (by about 2TB) as designated by TotalUmMemory and throws the "Aggregation/Distinct memory limit is exceeded." error
  • The error does not tell ResourceManager to release the memory usage counter
  • The next attempt to access ResourceManager memory in the same ExeMgr will get a memory exceeded error. This is the join processor, hence "Join or subselect exceeds memory limit."

This can happen in 1.0 in theory if a query somehow hits "Aggregation/Distinct memory limit Exceeded" or a couple of similar errors many times. It is just massively magnified by LONGTEXT.

Comment by David Hall (Inactive) [ 2017-05-10 ]

After testing for 1.1, this should be put back into the queue for a fix in 1.0

Comment by Andrew Hutchings (Inactive) [ 2017-05-10 ]

No need, I forked MCOL-711 for the backport so it could be done after 1.0.9

Comment by Daniel Lee (Inactive) [ 2017-05-10 ]

Build tested: Github source
[root@localhost mariadb-columnstore-server]# git show
commit 349cae544b6bc71910267a3b3b0fa3fb57b0a587
Merge: bd13090 2ecb85c
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu May 4 16:06:16 2017 -0500
Merge pull request #50 from mariadb-corporation/10.2-fixes
10.2 fixes
[root@localhost mariadb-columnstore-server]# cd mariadb-columnstore-engine/
[root@localhost mariadb-columnstore-engine]# git show
commit 03c62656b3750fd5f9759897f048bda8423c4aae
Author: david hill <david.hill@mariadb.com>
Date: Wed May 10 10:35:56 2017 -0500
update post-install
removed test lines of echo adn early exit
diff --git a/oam/install_scripts/post-install b/oam/install_scripts/post-install
index c09e7d5..69dfb58 100755
— a/oam/install_scripts/post-install
+++ b/oam/install_scripts/post-install
@@ -36,10 +36,6 @@ for arg in "$@"; do
echo "post-install: ignoring unknown argument: $arg" 1>&2
fi
done
-
-echo $installdir
-echo $prefix
-exit 0
if [ $user != "root" ]; then
export COLUMNSTORE_INSTALL_DIR=$installdir

Still having the same issue. It can be reproduced by running the following queries in a .sql script or pasted to the sql client.

select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3) from datatypetestm group by cidx, CLONGTEXT ;
select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3,'Calpont') from datatypetestm group by cidx, CLONGTEXT ;
select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3,'Calpont' separator '#') from datatypetestm group by cidx, CLONGTEXT ;
select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3,'Calpont' ORDER BY CIDX ASC) from datatypetestm group by cidx, CLONGTEXT ;
select count from datatypetestm;

It seems to be a timing issue. If I run these queries one at a time manually, it does not have the "stuck" issue.

ERROR 1815 (HY000): Internal error: IDB-2003: Aggregation/Distinct memory limit is exceeded.
MariaDB [mytest]> select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3) from datatypetestm group by cidx, CLONGTEXT ;
ERROR 1815 (HY000): Internal error: IDB-2003: Aggregation/Distinct memory limit is exceeded.
MariaDB [mytest]> select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3,'Calpont') from datatypetestm group by cidx, CLONGTEXT ;
ERROR 1815 (HY000): Internal error: IDB-2003: Aggregation/Distinct memory limit is exceeded.
MariaDB [mytest]> select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3,'Calpont' separator '#') from datatypetestm group by cidx, CLONGTEXT ;
ERROR 1815 (HY000): Internal error: IDB-2003: Aggregation/Distinct memory limit is exceeded.
MariaDB [mytest]> select cidx, CLONGTEXT, GROUP_CONCAT(CLONGTEXT,CCHAR3,'Calpont' ORDER BY CIDX ASC) from datatypetestm group by cidx, CLONGTEXT ;
ERROR 1815 (HY000): Internal error: IDB-2003: Aggregation/Distinct memory limit is exceeded.
MariaDB [mytest]> select count from datatypetestm;
ERROR 1815 (HY000): Internal error: IDB-2001: Join or subselect exceeds memory limit.

Comment by Andrew Hutchings (Inactive) [ 2017-05-10 ]

Additional commit to fix "order by" memory accounting in the same way as found in Daniel's tests.

Comment by Daniel Lee (Inactive) [ 2017-05-10 ]

Build verified: Github source 1.1.0

[root@localhost mariadb-columnstore-server]# git show
commit 349cae544b6bc71910267a3b3b0fa3fb57b0a587
Merge: bd13090 2ecb85c
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu May 4 16:06:16 2017 -0500

Merge pull request #50 from mariadb-corporation/10.2-fixes

10.2 fixes

[root@localhost mariadb-columnstore-server]# cd mariadb-columnstore-engine/
[root@localhost mariadb-columnstore-engine]# git show
commit 85a5eaac31a539e66bdb5bfd650e2e257783abc5
Merge: 03c6265 f369ee6
Author: dhall-InfiniDB <david.hall@mariadb.com>
Date: Wed May 10 17:05:03 2017 -0500

Merge pull request #181 from mariadb-corporation/MCOL-707

MCOL-707 Fix memory accounting for ORDER BY

Repeated GROUP_CONCAT test case and issue has been fixed.

Generated at Thu Feb 08 02:23:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.