[MCOL-5366] Research TPC-DS queries failing due to memory/resource constraints Created: 2022-12-20  Updated: 2023-06-08

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Critical
Reporter: Gagan Goel (Inactive) Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Blocks
Relates
relates to MCOL-5501 Joins on longtext using exorbitant me... Open

 Description   

The following TPC-DS queries need further investigation because they fail with memory-related errors:

query2:
On a 64GB RAM, 16 cores, 256GB SSD system:

  [root@tntnatbry-rockylinux8 queries]# mariadb tpc_ds < query2.sql 
  ERROR 1815 (HY000) at line 2: Internal error: (437) MCS-2001: Join or subselect exceeds memory limit.

On a 128GB RAM, 32 cores, 256GB SSD system:

  [root@tntnatbry-rockylinux8-2 queries]# mariadb tpc_ds < query2.sql
  ERROR 1815 (HY000) at line 2: Internal error: MCS-2003: Aggregation/Distinct memory limit is exceeded.

With disk-based aggregation and joins enabled, the query ran for over 2.5 hours, at which point it was killed with Ctrl+C.

query14a:
On a 64GB RAM, 16 cores, 256GB SSD system:

  [root@tntnatbry-rockylinux8 queries]# mariadb tpc_ds < query14a.sql : FROZE

On a 128GB RAM, 32 cores, 256GB SSD system:

[root@tntnatbry-rockylinux8-2 queries]# mariadb tpc_ds < query14a.sql
ERROR 1815 (HY000) at line 2: Internal error: InetStreamSocket::readToMagic: Remote is closed
 
Dec 14 20:44:56 tntnatbry-rockylinux8-2 env[229238]: Too much memory allocated!
Dec 14 20:44:56 tntnatbry-rockylinux8-2 env[229238]: ExeMgr[229238]: 56.449486 |0|0|0| C 16 CAL0044: FATAL ERROR: ExeMgr has allocated too much memory! Percent allocation-96, allowed-95. ExeMgr is restarting.
Dec 14 20:44:58 tntnatbry-rockylinux8-2 ExeMgr[229238]: 56.449486 |0|0|0| C 16 CAL0044: FATAL ERROR: ExeMgr has allocated too much memory! Percent allocation-96, allowed-95. ExeMgr is restarting.
Dec 14 20:45:39 tntnatbry-rockylinux8-2 env[229238]: Warning: 2072 bytes lost at 0x7f3f6b9d7270, allocated by T@0 at 0x7f5e1027d7a6, 0x7f5e10275b24, 0x7f5e1027b947, 0x7f5e10274d07, 0x7f5e10275296, 0x7f5e0ec3012e, 0x7f5e0ec30161, 0x7f5e10212545
Dec 14 20:46:10 tntnatbry-rockylinux8-2 kernel: Unspecified invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

^ PrimProc crash

With disk-based aggregation and joins enabled:

[root@tntnatbry-rockylinux8-2 queries]# mariadb tpc_ds < query14a.sql
ERROR 1815 (HY000) at line 2: Internal error: InetStreamSocket::readToMagic: Remote is closed
 
Dec 16 23:02:35 tntnatbry-rockylinux8-2 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mcs-primproc.service,task=PrimProc,pid=261137,uid=994
Dec 16 23:02:35 tntnatbry-rockylinux8-2 kernel: Out of memory: Killed process 261137 (PrimProc) total-vm:276619940kB, anon-rss:129170584kB, file-rss:0kB, shmem-rss:0kB, UID:994 pgtables:334836kB oom_score_adj:0
Dec 16 23:02:35 tntnatbry-rockylinux8-2 kernel: oom_reaper: reaped process 261137 (PrimProc), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Dec 16 23:02:35 tntnatbry-rockylinux8-2 env[255052]: Warning: 2072 bytes lost at 0x7f99e79ce870, allocated by T@0 at ??:0, ??:0, ??:0, ??:0, ??:0, 0x7fb3ea57d12e, ??:0, Printing to addr2line failed
Dec 16 23:02:35 tntnatbry-rockylinux8-2 env[255052]: 0x7fb3ebb5f545
Dec 16 23:02:38 tntnatbry-rockylinux8-2 kernel: PrimProc invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0

^ kernel OOM killed PrimProc
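
For a sense of scale, the sizes in the kernel's "Out of memory" line above can be converted from kB to GiB. The following is a minimal Python sketch and not part of the original report; the helper name `oom_sizes_gib` and the regular expression are illustrative assumptions based only on the log format shown in this ticket.

```python
import re

# Kernel OOM line copied from the log above (timestamp and host omitted).
OOM_LINE = (
    "Out of memory: Killed process 261137 (PrimProc) "
    "total-vm:276619940kB, anon-rss:129170584kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:994 pgtables:334836kB oom_score_adj:0"
)

KB_PER_GIB = 1024 * 1024


def oom_sizes_gib(line):
    """Return {field: size in GiB} for every '<name>:<N>kB' pair in the line."""
    pairs = re.findall(r"([\w-]+):(\d+)kB", line)
    return {name: int(kb) / KB_PER_GIB for name, kb in pairs}


sizes = oom_sizes_gib(OOM_LINE)
for name, gib in sizes.items():
    print(f"{name}: {gib:.2f} GiB")
```

Under these assumptions, anon-rss works out to roughly 123 GiB of resident memory for PrimProc at the time it was killed on the 128GB system.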

query72:
On a 64GB RAM, 16 cores, 256GB SSD system:

[root@tntnatbry-rockylinux8 queries]# mariadb tpc_ds < query72.sql
ERROR 1815 (HY000) at line 2: Internal error: InetStreamSocket::readToMagic: Remote is closed

^ PrimProc crash

On a 128GB RAM, 32 cores, 256GB SSD system, both with and without disk-based aggregation and disk-based joins:

[root@tntnatbry-rockylinux8-2 queries]# mariadb tpc_ds < query72.sql : HUNG for 1hr45mins

query95:

[root@tntnatbry-rockylinux8 queries]# mariadb tpc_ds < query95.sql
ERROR 1815 (HY000) at line 2: Internal error: (437) MCS-2001: Join or subselect exceeds memory limit.

query67a:

[root@tntnatbry-rockylinux8-2 queries]# mariadb tpc_ds < query67a.sql
ERROR 1815 (HY000) at line 2: Internal error: TupleAggregateStep::threadedAggregateRowGroups()[19] MCS-2003: Aggregation/Distinct memory limit is exceeded.
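
The client-side errors listed above fall into a few recurring categories. As a rough aid for triaging reruns, the Python sketch below classifies an error message by the strings seen in this ticket. It is an illustration only, not an existing ColumnStore tool; the function name `classify_failure` and the category labels are hypothetical.

```python
# Substrings taken from the error output in this ticket, mapped to
# hypothetical labels. The first matching pattern wins.
PATTERNS = [
    ("MCS-2001", "join-memory-limit"),
    ("MCS-2003", "aggregation-memory-limit"),
    ("InetStreamSocket::readToMagic: Remote is closed", "remote-closed"),
]


def classify_failure(stderr_text):
    """Return a label for the first known pattern found in stderr_text."""
    for needle, label in PATTERNS:
        if needle in stderr_text:
            return label
    return "unclassified"


# Error messages copied from the transcripts above.
samples = {
    "query95": "ERROR 1815 (HY000) at line 2: Internal error: (437) "
               "MCS-2001: Join or subselect exceeds memory limit.",
    "query67a": "ERROR 1815 (HY000) at line 2: Internal error: "
                "TupleAggregateStep::threadedAggregateRowGroups()[19] "
                "MCS-2003: Aggregation/Distinct memory limit is exceeded.",
    "query72": "ERROR 1815 (HY000) at line 2: Internal error: "
               "InetStreamSocket::readToMagic: Remote is closed",
}

results = {query: classify_failure(text) for query, text in samples.items()}
for query, label in results.items():
    print(f"{query}: {label}")
```

The runs marked FROZE or HUNG produce no error text, so a real harness would also need a timeout around each query to record those cases.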



 Comments   
Comment by Roman [ 2023-01-11 ]

What scale factor was used, tntnatbry? Did you activate external JOIN and GROUP BY in the config?
