  1. MariaDB ColumnStore
  2. MCOL-6075

Research/Reproduce 23.10.4 Memory Allocation Related Errors

Details

    • Task
    • Status: In Progress
    • Critical
    • Resolution: Unresolved
    • 2025-6, 2025-9, 2025-10

    Description

      Summary
      After upgrading to 23.10.4+, users report new errors on existing queries that they say ran successfully on 23.02.x. The current hypothesis is that the memory allocator in 23.10.4 calculates that the query would need too much memory. Whatever the cause, users are requesting clearer error logs and, if possible, a clearer SQL client error.
      Ideally the error reports how much memory was used and how much is estimated to be needed.

      Reproduction:
      See Edward's comments and the attached Instructions_for_reproducing_the_bug_MCOL-6075.txt

      Actual:
      The SQL client returns the error:

      MCS-2001: Join or subselect exceeds memory limit.
      

      and the debug log shows std::bad_alloc:

      Sep 19 18:07:13 ip-172-31-39-250 joblist[50051]: 13.552385 |0|0|0| I 05 CAL0000: (358) MCS-2001: Join or subselect exceeds memory limit.         %%10%%
      Sep 19 18:07:13 ip-172-31-39-250 threadpool[50051]: 13.554690 |0|0|0| E 22 CAL0005: threadFcn: Caught exception:  std::bad_alloc
      Sep 19 18:07:13 ip-172-31-39-250 threadpool[50051]: 13.555167 |0|0|0| E 22 CAL0005: threadFcn: Caught exception:  std::bad_alloc
      Sep 19 18:07:13 ip-172-31-39-250 threadpool[50051]: 13.555672 |0|0|0| E 22 CAL0005: threadFcn: Caught exception:  std::bad_alloc
      Sep 19 18:07:13 ip-172-31-39-250 threadpool[50051]: 13.556261 |0|0|0| E 22 CAL0005: threadFcn: Caught exception:  std::bad_alloc
      Sep 19 18:07:13 ip-172-31-39-250 ExeMgr[50051]: 13.556548 |6|0|0| D 16 CAL0042: End SQL statement
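
When checking a reproduction attempt, it can help to count how many of these memory-related entries a debug log contains. The following is a minimal Python sketch, assuming the log lines have the same layout as the excerpt above; the regular expression and the function name `summarize_memory_errors` are illustrative and not part of ColumnStore.

```python
import re
from collections import Counter

# Pattern for the syslog-style lines quoted in this ticket, for example:
# "Sep 19 18:07:13 host threadpool[50051]: 13.554690 |0|0|0| E 22 CAL0005: threadFcn: ..."
# The layout is inferred from the excerpt only and may not cover every log line.
LINE_RE = re.compile(
    r"^\s*(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]:\s+\S+\s+\|\d+\|\d+\|\d+\|\s+"
    r"(?P<level>[A-Z])\s+\d+\s+(?P<code>CAL\d+):\s+(?P<msg>.*)$"
)


def summarize_memory_errors(lines):
    """Count MCS-2001 and std::bad_alloc entries per logging process."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a log entry
        msg = match.group("msg")
        if "std::bad_alloc" in msg:
            counts[(match.group("proc"), "std::bad_alloc")] += 1
        elif "MCS-2001" in msg:
            counts[(match.group("proc"), "MCS-2001")] += 1
    return counts
```

Run against the excerpt above, this should report one MCS-2001 entry from joblist and four std::bad_alloc entries from threadpool.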
      

      Expected:
      1) If the query ran in 23.02.x, it should also run in 23.10.x.
      2) Clearer error messages stating how much RAM is needed.
      Example:

      Client:
      MCS-2001: Join or subselect exceeds memory limit of x.xGB. Estimated need of x.xGB
       
      Logs:
      Sep 19 18:07:13 ip-172-31-39-250 threadpool[50051]: 13.554690 |0|0|0| E 22 CAL0005: threadFcn: Caught exception:  std::bad_alloc: Ran out of allocated memory xGB. Need approximately x GB
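
To make the requested client message concrete, here is a small Python sketch that fills in the `x.xGB` placeholders from two byte counts. It only illustrates the proposed wording: the function name `format_memory_limit_error` is hypothetical, and treating "GB" as 1024^3 bytes is an assumption, since the ticket does not say which unit is intended.

```python
# Assumption: "GB" in the proposed message means 1024**3 bytes.
GIB = 1024 ** 3


def format_memory_limit_error(limit_bytes, estimated_bytes):
    """Build the client error text proposed in this ticket (hypothetical helper)."""
    return (
        "MCS-2001: Join or subselect exceeds memory limit of "
        f"{limit_bytes / GIB:.1f}GB. "
        f"Estimated need of {estimated_bytes / GIB:.1f}GB"
    )


if __name__ == "__main__":
    # Illustrative values only: a 2 GB limit and a 5.5 GB estimate.
    print(format_memory_limit_error(2 * GIB, 5 * GIB + GIB // 2))
```

Reporting both numbers in one message lets the user compare the configured limit against the estimate without consulting the debug log.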
      

      -----------------------------------------------------

      The old ticket description also listed the following error messages, which users experienced after upgrading, but no reproduction has been found for them:

      Internal error: MCS-2004: Cannot connect to ExeMgr.
      Internal error: MCS-2033: Error occurred when calling system catalog.
      Internal error: InetStreamSocket::readToMagic: Remote is closed
      

      Attachments

        1. screenshot-1.png
          48 kB
          Edward Stoever
        2. PrimProc.154392.log
          0.6 kB
          Allen Herrera
        3. Instructions_for_reproducing_the_bug_MCOL-6075.txt
          3 kB
          Edward Stoever
        4. Screenshot_20251010_022917.png
          55 kB
          Aleksei Antipovskii

            People

              alexey.antipovsky Aleksei Antipovskii
              allen.herrera Allen Herrera
              Votes: 2
              Watchers: 8
