  1. MariaDB ColumnStore
  2. MCOL-6326

cpimport rollback fails to clear Casual Partitioning Min/Max metadata, causing MAX() to return 0000-00-00

Details

    • Bug
    • Status: Needs Feedback
    • Major
    • Resolution: Unresolved
    • 23.02.18

    Description

      Symptom:
      After an interrupted or failed bulk load (cpimport), queries using aggregate functions like MAX(date) or MIN(date) return invalid results such as 0000-00-00, even though a SELECT COUNT(1) WHERE date='0000-00-00' returns 0.
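
      The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming a table with a DATE column and a working mariadb client; classify_symptom and check_table are hypothetical helper names, not part of ColumnStore. Only the standard client flags -N (no column names), -B (batch output) and -e are used.

```shell
# classify_symptom MAX_VALUE ZERO_ROW_COUNT
# Prints "stale-metadata" when MAX() reports the zero date although no row holds it,
# otherwise "consistent".
classify_symptom() {
    if [ "$1" = "0000-00-00" ] && [ "$2" = "0" ]; then
        echo "stale-metadata"
    else
        echo "consistent"
    fi
}

# check_table DB TABLE DATE_COLUMN
# Runs the two queries from the symptom description and classifies the result.
# (Hypothetical helper; it needs a reachable server, so it is only defined here.)
check_table() {
    max_val=$(mariadb -N -B -e "SELECT MAX($3) FROM $1.$2;") || return 1
    zero_cnt=$(mariadb -N -B -e "SELECT COUNT(1) FROM $1.$2 WHERE $3='0000-00-00';") || return 1
    classify_symptom "$max_val" "$zero_cnt"
}
```

      For the reproduction table used later in this report, the call would be: check_table test_cp_rb date_crash dt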

      Root Cause:
      When cpimport allocates a new extent, the Casual Partitioning (CP) Min/Max headers are initialized in the Extent Map (BRM). For standard numeric/date types, the uninitialized internal representation defaults to 0, which a DATE column renders as 0000-00-00.

      If the import is interrupted (e.g., mariadbd crashes or cpimport receives a SIGSEGV), the BulkRollbackMgr successfully deletes or truncates the physical .cdf files. It then calls BRMWrapper::rollbackColumnExtents_DBroot() to return the extents to the free list or adjust the High-Water Mark (HWM).

      The Bug:
      During this partial rollback in extentmap.cpp (where emEntry.HWM = fboLo - 1 is set), the engine fails to invalidate or clear the CP Min/Max values. The 0 is left permanently stranded in the shared memory Extent Map. Future MAX() queries read this metadata 0 instead of scanning the physical files, resulting in the 0000-00-00 output.

      Workaround:
      The corrupted Extent Map headers can be flushed without a cluster restart by using the editem utility to clear the limits for the affected OID:

      editem -c <OID>
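
      Because every column of a table has its own OID, the workaround usually has to be repeated per column. The following is a minimal sketch, assuming that editem -c <OID> behaves as described above. build_editem_cmds is a hypothetical helper; it only prints the commands so they can be reviewed before anything is run.

```shell
# build_editem_cmds: read one OID per line on stdin and print one
# "editem -c <OID>" command per valid OID. Nothing is executed.
# Empty or non-numeric lines are skipped with a warning on stderr.
build_editem_cmds() {
    while IFS= read -r oid; do
        case "$oid" in
            ''|*[!0-9]*) echo "skipping invalid OID: '$oid'" >&2 ;;
            *) echo "editem -c $oid" ;;
        esac
    done
}
```

      One possible source for the OID list, assuming the system catalog exposes column OIDs as calpontsys.syscolumn.objectid (an assumption that should be verified on the installed version):

      mariadb -N -B -e 'SELECT objectid FROM calpontsys.syscolumn WHERE `schema`="test_cp_rb" AND tablename="date_crash";' | build_editem_cmds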
      

      Steps to Reproduce:
      This bash script reproduces the issue by sending a SIGSEGV to cpimport mid-flight, triggering the flawed rollback mechanism.

      #!/bin/bash
      # Reproduces stale CP min/max metadata after an interrupted cpimport.

      DB_NAME="test_cp_rb"
      TB_NAME="date_crash"
      DATA_FILE="/tmp/test_data.csv"

      echo "1. Setting up test schema..."
      mariadb -e "CREATE DATABASE IF NOT EXISTS ${DB_NAME};"
      mariadb -e "CREATE TABLE ${DB_NAME}.${TB_NAME} (id INT, dt DATE) ENGINE=ColumnStore;"

      echo "2. Generating test data..."
      # 5,000,000 rows, all with the same valid date
      seq 1 5000000 | awk '{print $1",2026-03-01"}' > "${DATA_FILE}"

      echo "3. Starting cpimport..."
      cpimport -s ',' "${DB_NAME}" "${TB_NAME}" "${DATA_FILE}" &
      CP_PID=$!

      echo "4. Waiting 1 second for extents to allocate..."
      sleep 1

      echo "5. Sending SIGSEGV to force BulkRollbackMgr execution..."
      kill -SEGV "${CP_PID}"
      wait "${CP_PID}"

      echo "6. Testing MAX(dt) - Will incorrectly return 0000-00-00:"
      mariadb -e "SELECT MAX(dt) FROM ${DB_NAME}.${TB_NAME};"
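
      The reproduction relies on a fixed one-second sleep, so on a fast host cpimport could finish before the signal arrives and the bug would not be triggered. The following is a minimal sketch for checking that. It relies on the shell convention that wait reports status 128+n for a child terminated by signal n, so 139 corresponds to SIGSEGV (11) on Linux. describe_wait_status is a hypothetical helper. If cpimport installs its own SIGSEGV handler and exits on its own, or simply exits with a code above 128, the status is ambiguous, so the output is a hint rather than proof.

```shell
# describe_wait_status STATUS
# Interprets the exit status returned by the shell's `wait` builtin.
describe_wait_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "completed successfully"
    elif [ "$status" -gt 128 ]; then
        # Shell convention: 128 + signal number for a signal-terminated child
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit code $status"
    fi
}
```

      In the script, this could be called directly after the wait on the cpimport PID, passing $? as the argument. A result of "completed successfully" would mean the load finished before the kill and the test should be rerun with more rows or a shorter sleep.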
      

      Proposed Fix:
      Update the rollback logic within extentmap.cpp (or DBRM::rollbackColumnExtents_DBroot) so that when an extent's HWM is rolled back or an extent is freed, its corresponding CP min/max values are explicitly set to their invalid/uninitialized state (similar to the behavior of editem -c).
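
      The intended difference can be illustrated with a toy model. This is only a sketch of the reasoning in this report, not the real Extent Map code: the variable names, the single-extent layout and the "valid" flag are all assumptions made for illustration.

```shell
# Toy model: one extent described by plain shell variables.
# Names and values are illustrative, not the real Extent Map layout.
ext_hwm=10; ext_min=0; ext_max=0; ext_cp_valid=1

# Rollback as described in the bug: only the HWM moves (HWM = fboLo - 1),
# the min/max values stay marked as valid.
rollback_buggy() { ext_hwm=$(( $1 - 1 )); }

# Rollback as proposed: the HWM moves and min/max are marked invalid.
rollback_fixed() { ext_hwm=$(( $1 - 1 )); ext_cp_valid=0; }

# How a MAX() lookup behaves in this model.
answer_max() {
    if [ "$ext_cp_valid" -eq 1 ]; then
        echo "metadata:$ext_max"   # trusts the stale 0
    else
        echo "scan"                # falls back to reading the column data
    fi
}
```

      In this model, calling rollback_buggy 1 followed by answer_max prints metadata:0, while rollback_fixed 1 followed by answer_max prints scan.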

          People

            Assignee: Unassigned
            Reporter: Kyle Hutchinson (kyle.hutchinson)