When too little memory is available in Docker, a destructive problem occurs: currently PrimProc crashes, and no queries can be run afterwards. Prior commits improved the situation, but not enough. The failure is now an error message rather than a crash, and in some cases the cluster remains operational, but not always: there are cases where the cluster is non-operational after the error message.
In order to complete this ticket, the remaining problem needs to be corrected. Even after the memory is exceeded and an error message is generated, the cluster should be operational, and be able to execute queries that fit in memory.
We will continue working on this problem under MCOL-4733. The goal for that ticket is to prevent the error message from happening in the first place: memory should never be over-allocated.
The problem is not restricted to Docker environments or to clusters with low-memory nodes. Some bigger jobs, such as a large insert into ... select from ..., may crash PrimProc even when 16GB of memory is available, and this can happen both on-prem and in Docker.
Three things need to happen:
- When there is larger memory (e.g. 16GB), things should just work with defaults.
- If someone wants to run on lower memory (like 4GB), they should get a reasonable error message saying that memory is insufficient for the job. Smaller jobs should continue to work.
- In smaller-memory deployments, one should be able to lower TotalUMMemory (25% default) and NumBlocksPCT (50% default) and then be able to run even bigger jobs.
There may be deeper problems on very small settings like 1GB. Once we fix and verify the above, we should investigate what to do in those cases.
2. Technical description is below:
The goal is to implement internal real-time tracking of the memory used by each process, ExeMgr and PrimProc. This removes the need to poll the system at intervals to check memory usage and compare it against a threshold (MaxPct). By checking before each further allocation, we can detect more quickly that a process is approaching OOM. (Previously, detection relied on the interval at which the system's view of process memory was refreshed, so usage could go further above MaxPct than could be recovered from.) This should allow queries to be killed before they consume so much memory that future queries are blocked.
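The per-allocation check described above could look roughly like the following. This is only a minimal sketch, not existing ColumnStore code: the class name MemoryBudget and its methods are hypothetical, and it assumes the limit has already been computed in bytes (for example from MaxPct and total system memory).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-process memory budget (illustration only, not a ColumnStore class).
// Callers reserve bytes BEFORE allocating and release them after freeing.
class MemoryBudget
{
 public:
  // limitBytes is assumed to be precomputed, e.g. from MaxPct of system memory.
  explicit MemoryBudget(std::int64_t limitBytes) : limit_(limitBytes), used_(0)
  {
  }

  // Atomically reserve `bytes` if doing so keeps usage within the limit.
  // Returns false without changing usage when the request would exceed it,
  // so the caller can fail the query instead of over-allocating.
  bool tryReserve(std::int64_t bytes)
  {
    std::int64_t current = used_.load(std::memory_order_relaxed);
    do
    {
      if (bytes > limit_ - current)
        return false;
      // On failure, compare_exchange_weak reloads `current` and we re-check.
    } while (!used_.compare_exchange_weak(current, current + bytes, std::memory_order_relaxed));
    return true;
  }

  // Give back bytes that were previously reserved.
  void release(std::int64_t bytes)
  {
    used_.fetch_sub(bytes, std::memory_order_relaxed);
  }

  std::int64_t used() const
  {
    return used_.load(std::memory_order_relaxed);
  }

 private:
  const std::int64_t limit_;
  std::atomic<std::int64_t> used_;
};
```

Because the counter is updated on every reservation, the process always knows its own usage without asking the operating system, and a rejected reservation leaves the counter unchanged so later, smaller queries can still run.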
To complete the solution, we would implement better management of who holds memory and allow the system to free most of the held memory. At an OOM event, this would essentially reset the block cache and all instances holding memory, ensuring that the next query runs as if the processes had been reset, without having to restart them.
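One way to picture "management of who holds memory" is a registry in which each holder (block cache, join buffers, and so on) registers a callback that frees what it holds. The sketch below is an assumption made for illustration: MemoryHolderRegistry, registerHolder and releaseAll are hypothetical names, not part of ColumnStore.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of memory holders (illustration only).
// Each holder supplies a callback that frees its memory and reports bytes freed.
class MemoryHolderRegistry
{
 public:
  using ReleaseFn = std::function<std::size_t()>;

  // Register a named holder, e.g. "block cache".
  void registerHolder(std::string name, ReleaseFn fn)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    holders_.emplace_back(std::move(name), std::move(fn));
  }

  // Called at an OOM event: ask every holder to drop its memory.
  // Returns the total number of bytes the holders reported as freed.
  std::size_t releaseAll()
  {
    std::vector<std::pair<std::string, ReleaseFn>> snapshot;
    {
      // Copy under the lock so callbacks run without holding it.
      std::lock_guard<std::mutex> lock(mutex_);
      snapshot = holders_;
    }
    std::size_t freed = 0;
    for (auto& holder : snapshot)
      freed += holder.second();
    return freed;
  }

 private:
  std::mutex mutex_;
  std::vector<std::pair<std::string, ReleaseFn>> holders_;
};
```

After releaseAll() returns, the process is in roughly the state it would be in after a restart as far as these holders are concerned, which is the behaviour the paragraph above asks for.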