[MCOL-5565] Queries stuck in MDB waiting for an answer from PP Created: 2023-08-31  Updated: 2024-01-13

Status: In Progress
Project: MariaDB ColumnStore
Component/s: MDB Plugin, PrimProc
Affects Version/s: 23.02.4
Fix Version/s: 23.10.1

Type: Bug Priority: Critical
Reporter: Roman Assignee: Roman
Resolution: Unresolved Votes: 0
Labels: triage

Issue Links:
Relates
relates to MCOL-5559 Shmem segment remap causes SEGV in Ex... Closed
Sprint: 2023-10, 2023-11, 2023-12

 Description   

PrimProc (PP) runs a query as multiple parts called primitive jobs.
A bug corrupts the in-memory representation of a primitive job, so the job cannot be stopped and never leaves the PP execution thread pool. Once enough such primitives accumulate, they occupy all worker threads in PP.
In effect, all subsequent queries appear stuck.
The issue hits our CI. One of the customers is also affected by the same issue.
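
The starvation mechanism described above can be illustrated with a minimal sketch. This is not ColumnStore code (PrimProc is written in C++); it uses Python's standard thread pool only to show the effect, and every name in it (`demonstrate_starvation`, `stuck_primitive`, `healthy_query`, the `release` event) is hypothetical. It assumes a fixed-size pool where each stuck job holds a worker until an external signal arrives.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def demonstrate_starvation(workers=4, wait_seconds=0.2):
    """Fill a fixed-size pool with jobs that never finish on their own,
    then check whether a later job gets a chance to run."""
    # Hypothetical stand-in for the stop signal that a corrupted
    # primitive job never reacts to.
    release = threading.Event()

    def stuck_primitive():
        # Blocks its worker thread until released from outside.
        release.wait()
        return "stuck job finally released"

    def healthy_query():
        return "query answered"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One stuck job per worker is enough to occupy the whole pool.
        stuck = [pool.submit(stuck_primitive) for _ in range(workers)]
        # This job is queued behind the stuck ones and cannot start.
        later = pool.submit(healthy_query)
        try:
            later.result(timeout=wait_seconds)
            starved = False
        except TimeoutError:
            starved = True
        # Unblock the workers so the sketch terminates cleanly.
        release.set()
        answer = later.result(timeout=5)
        for job in stuck:
            job.result(timeout=5)
    return starved, answer


if __name__ == "__main__":
    starved, answer = demonstrate_starvation()
    print("later query starved:", starved)
    print("after release:", answer)
```

In this sketch the later job is healthy, yet it cannot run until a worker is freed, which matches the observation that new queries look stuck even though nothing is wrong with them.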



 Comments   
Comment by alexey vorovich (Inactive) [ 2023-09-01 ]

allen.herrera, this is in develop as of now. Please try to test it.

Comment by Roman [ 2023-10-20 ]

I have fixed 2 corner cases that cause queries to get stuck in the PP processing thread pool.
There is another one that I am still working on. It manifests itself when one runs a workload on a tiny machine, e.g. 5 queries on a 4-CPU node. This scenario is still unstable, so I'll leave the issue open even though the original cases are solved.

Comment by Roman [ 2023-11-16 ]

There is a repro discovered by allen.herrera.

Generated at Thu Feb 08 02:58:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.