[MCOL-5057] EM index code miscalculates RAM needed to allocate its structures Created: 2022-04-18  Updated: 2022-05-05  Resolved: 2022-04-22

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: 5.6.5, 6.3.1
Fix Version/s: 6.3.1

Type: Bug Priority: Blocker
Reporter: Roman Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MCOL-5050 Worker node crash after DDL . Possibl... Closed
is caused by MCOL-4912 MCS bulk insertion is slow Closed
Sprint: 2021-17

 Description   

As of MCS 5.6.5 there are cases when the initial EM load (load_brm) causes a boost::interprocess::bad_alloc exception in ExtentMap::loadVersion4() while populating the EM index.

There is an extent map example attached to this issue that can be used to reproduce the problem. At a certain record, the EM index managed shmem segment has 1.2 MB free, but the pool is so fragmented that even 1.2 KB cannot be allocated in a contiguous chunk. unordered_map rehashing throws bad_alloc in this case.
Steps to reproduce:

  • stop MCS
  • save the problematic EM as /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em (the EM image has an impossible number of records, so the original file must be edited with a hex editor; the proper number of EMEntries can be calculated from the size of the image)
  • run load_brm /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves
    load_brm throws the above-mentioned exception.

Moreover, I think this can also happen in real time, setting a cluster to read-only.



 Comments   
Comment by David Hall (Inactive) [ 2022-04-22 ]

We reverted MCOL-4912 and found the following.
On standard machines, the reverted code worked as expected.
On Docker-based tests, load_brm crashed every time.

Linux maintains a maximum limit on how large shared memory is allowed to grow. On a normal Linux system this defaults to one half of total available RAM, but Docker defaults it to 64 MB. The file we were attempting to load tries to grow shared memory to 423 MB, hence the crash.
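A quick way to confirm the limit from inside a container is to check the size of the /dev/shm mount, which reflects the --shm-size setting (64M by default in Docker):

```shell
# Inside the container: the "Size" column shows the shared memory limit.
df -h /dev/shm
```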

The solution:
If not using docker-compose, add --shm-size=512m to the docker run command

docker run -it --shm-size=512m oracle11g /bin/bash

If using docker-compose, add the following to your docker-compose.yml:

services:
  your_service:
    build:
      context: .
      shm_size: '512mb'   # sets the size when BUILDING
    shm_size: '512mb'     # sets the size when RUNNING

I've tried to discover why Docker sets the default so small and whether there are any negative effects of setting it larger. I have found nothing useful. Perhaps those with more Docker experience can help.

512 MB may be far larger than we need. Remember, the system we got this file from is gigantic – far larger than anything sky is likely to ever see. If there are negative effects of increasing shm-size, we should look at scaling it to the Docker host's size. If there are not, 512 MB is a good maximum that will likely not be reached for a good many years.

Comment by alexey vorovich (Inactive) [ 2022-04-22 ]

drrtuy We should consider adding a log message in the case of shared-memory issues, if possible.

Comment by Daniel Lee (Inactive) [ 2022-04-22 ]

Build verified: 6.3.1-1 (#4308), cmapi 1.6.3 (#628)

With a new, modified BRM_saves_em file provided by development, load_brm loaded the file successfully on VMs. On Docker images, the same test ended with a core dump. As noted in David.Hall's comment above, increasing shm would fix the core dump problem.

The recommended amount of shm for running ColumnStore in Docker should be determined and published for internal and external use.

Comment by Roman [ 2022-05-05 ]

The solution is to use try/catch in EM::insert3dLayer(): catch the bad_alloc, grow() the EM index shmem, and retry the insert.

Generated at Thu Feb 08 02:55:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.