MariaDB ColumnStore / MCOL-3983

segv from cpimport bulk load preparation


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 6.3.1
    • Component/s: cpimport, Storage Manager
    • Labels:
      None
    • Environment:
      SkySQL, ColumnStore 1.4.3-1
    • Sprint:
      2021-11, 2021-12, 2021-13, 2021-14, 2021-15

      Description

      A customer ran into a problem that caused StorageManager (SM) to restart continuously. The core file contained 886 threads, and the backtraces I looked at were implausible. For example, according to gdb, the ultimate cause of the crash was an assertion failure in a string destructor called from the metadataObject destructor, yet the source line gdb points at constructs a metadataObject rather than destroying one. That assertion then invokes fatalHandler(), which itself segfaults, causing fatalHandler() to run again.
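      One piece of that cascade can be guarded against directly: a fatal-signal handler that faults and re-enters itself. Below is a minimal sketch, assuming a POSIX sigaction-based handler; fatalHandlerSketch and installFatalHandlerSketch are hypothetical names, not the actual StorageManager fatalHandler(). An atomic flag makes a second fault skip the diagnostics and fall back to the default action (so a core file is still written), and the handler sticks to async-signal-safe calls.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <initializer_list>
#include <signal.h>
#include <unistd.h>

namespace {
// Set on first entry. std::atomic<bool> is lock-free on mainstream Linux
// targets, which is what makes exchange() usable inside a signal handler.
std::atomic<bool> inFatalHandler{false};
}  // namespace

// Hypothetical handler, NOT StorageManager's real fatalHandler().
void fatalHandlerSketch(int sig) {
    if (inFatalHandler.exchange(true)) {
        // A second fatal signal arrived while we were already handling one
        // (e.g. the handler itself faulted). Skip the diagnostics and fall
        // back to the default action so the process terminates.
        std::signal(sig, SIG_DFL);
        std::raise(sig);
        return;
    }
    // Only async-signal-safe calls belong here: write(2) rather than
    // iostreams, printf, or anything that allocates (e.g. std::string).
    const char msg[] = "fatal signal caught, terminating\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    // Re-raise with the default disposition so the usual core dump happens.
    std::signal(sig, SIG_DFL);
    std::raise(sig);
}

// Installs the hypothetical handler for the usual crash signals.
void installFatalHandlerSketch() {
    struct sigaction sa{};
    sa.sa_handler = fatalHandlerSketch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    for (int sig : {SIGSEGV, SIGABRT, SIGBUS, SIGFPE, SIGILL}) {
        sigaction(sig, &sa, nullptr);
    }
}
```

      This does not address the underlying corruption; it only keeps the crash from turning into a recursive fault that obscures the original backtrace.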

      I suspect a general synchronization problem that results in memory corruption, along with all the random fallout that can follow from it. Need to follow up on places like Synchronizer::process(), where we hold references to strings in a list (need to verify that the iterator can't be invalidated, or the value changed, while the reference is in use).
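      A minimal sketch of the hazard and a safer pattern, assuming the list is protected by a single mutex. PendingKeys, popFront() and drainConcurrently() are hypothetical names that only illustrate the idea; they are not the Synchronizer code. The point is that no reference into the list survives past the lock that protected it.

```cpp
#include <atomic>
#include <cassert>
#include <list>
#include <mutex>
#include <optional>
#include <set>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shared, mutex-protected list of object keys.
class PendingKeys {
public:
    void push(std::string key) {
        std::lock_guard<std::mutex> lk(mtx_);
        keys_.push_back(std::move(key));
    }

    // Anti-pattern this replaces (do not do this):
    //   const std::string& key = keys_.front();  // taken under the lock
    //   /* lock released */  use(key);           // another thread may have
    //                                             // erased or reassigned it
    // The reference outlives the lock, so a concurrent erase leaves it
    // dangling and a later read or destructor call can corrupt the heap.
    //
    // Safer: move the value out and erase it while still holding the lock,
    // so the caller owns an independent string.
    std::optional<std::string> popFront() {
        std::lock_guard<std::mutex> lk(mtx_);
        if (keys_.empty()) return std::nullopt;
        std::string key = std::move(keys_.front());
        keys_.pop_front();
        return key;
    }

private:
    std::mutex mtx_;
    std::list<std::string> keys_;
};

// Demo: producers and consumers run concurrently; returns the set of keys
// consumed. Every pushed key should come out exactly once.
std::set<std::string> drainConcurrently(int producers, int perProducer) {
    PendingKeys pending;
    const int total = producers * perProducer;
    std::atomic<int> taken{0};
    std::mutex seenMtx;
    std::set<std::string> seen;
    std::vector<std::thread> threads;

    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&pending, p, perProducer] {
            for (int i = 0; i < perProducer; ++i)
                pending.push("obj-" + std::to_string(p) + "-" + std::to_string(i));
        });
    }
    for (int c = 0; c < producers; ++c) {
        threads.emplace_back([&] {
            while (taken.load() < total) {
                if (auto key = pending.popFront()) {
                    ++taken;
                    std::lock_guard<std::mutex> lk(seenMtx);
                    seen.insert(std::move(*key));
                } else {
                    std::this_thread::yield();  // list momentarily empty
                }
            }
        });
    }
    for (auto& t : threads) t.join();
    return seen;
}
```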

      This ticket covers general hardening of StorageManager. We also need to figure out how the process reached 886 threads so quickly (or at all, for that matter).

      It is unclear whether licensing restrictions prevent me from saving the core file somewhere and linking it from this ticket. I'll do that once I get the go-ahead.

      They were running MariaDB Enterprise Server 10.4.12-6 with ColumnStore at 'columnstore-1.4.3-1'.

      Update: I found a bug in the config listeners for Downloader and Synchronizer. If the config file sets max_concurrent_uploads/downloads = 20, those thread pools never have a limit imposed at startup. That could explain the whole problem: if the OS decides we've started too many threads too fast, or we hit a thread limit, a wide range of failures could follow. Still, a scan through the code looking for races is justified.
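      The ticket does not spell out the exact mechanism, so the following is only a guess at how such a listener bug can look: the listener caches a "current" limit pre-initialized to the default (20) and only forwards changes, while the pool itself starts unlimited. ThreadPoolSketch, BuggyListener and FixedListener are hypothetical names. With a configured value equal to the cached default, the buggy listener never calls setMaxThreads(); the fix sketched here applies the configured value unconditionally the first time.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical thread pool: 0 means "no limit" until setMaxThreads() is called.
class ThreadPoolSketch {
public:
    void setMaxThreads(std::size_t n) { maxThreads_ = n; }
    // Effective cap on concurrent threads; unlimited if never configured.
    std::size_t effectiveLimit() const {
        return maxThreads_ == 0 ? std::numeric_limits<std::size_t>::max() : maxThreads_;
    }
private:
    std::size_t maxThreads_ = 0;
};

// BUGGY listener: remembers the "current" value, pre-initialized to the
// default, and only pushes a change to the pool when the value differs.
class BuggyListener {
public:
    explicit BuggyListener(ThreadPoolSketch& pool) : pool_(pool) {}
    void configUpdated(std::size_t maxConcurrent) {
        if (maxConcurrent != current_) {  // 20 == 20 at startup: never applied
            current_ = maxConcurrent;
            pool_.setMaxThreads(current_);
        }
    }
private:
    ThreadPoolSketch& pool_;
    std::size_t current_ = 20;  // matches the default, but the pool was never told
};

// FIXED listener: always applies the value the first time, then only on change.
class FixedListener {
public:
    explicit FixedListener(ThreadPoolSketch& pool) : pool_(pool) {}
    void configUpdated(std::size_t maxConcurrent) {
        if (!applied_ || maxConcurrent != current_) {
            current_ = maxConcurrent;
            applied_ = true;
            pool_.setMaxThreads(current_);
        }
    }
private:
    ThreadPoolSketch& pool_;
    std::size_t current_ = 0;
    bool applied_ = false;
};
```

      The same first-call rule applies to any listener that caches config values: the cache must start in an "unknown" state rather than at the default, or the default is never pushed to the component it governs.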

              People

              Assignee:
              Ben Thompson
              Reporter:
              Patrick LeBlanc (Inactive)
              Votes:
              1
              Watchers:
              4
