Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39102

Wrong result for COUNT(DISTINCT) on large MyISAM table with WHERE condition

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Incomplete
    • 10.5.19
    • N/A
    • None
    • Linux

    Description

      Hello,

      we have encountered a query result that appears to be incorrect on a very large MyISAM table in MariaDB and would like to report it in case this indicates a bug.

      Environment
      MariaDB version: 10.5.19-MariaDB-0+deb11u2
      Storage engine: MyISAM
      Table size: approximately 706 million rows
      During all tests there were no concurrent writes to the table.

      Problem description
      The following query is executed on table I_01_100_DOCUMENT:

      SELECT COUNT(DISTINCT SYSROWID)
      FROM I_01_100_DOCUMENT
      WHERE MIGCR='I1';
      

      The query returns the value: 206512802

      However, based on our data validation the correct result should be: 207649165

      So the result returned by the query is lower by roughly 1.1 million rows.

      Data consistency verification
      To verify that the issue is not caused by duplicate SYSROWID values, we executed the following checks.

      First we verified global uniqueness:

      SELECT COUNT(*), COUNT(DISTINCT SYSROWID)
      FROM I_01_100_DOCUMENT;
      

      Both values are identical, which confirms that SYSROWID is globally unique.

      We also explicitly searched for duplicates:

      SELECT SYSROWID, COUNT(*)
      FROM I_01_100_DOCUMENT
      GROUP BY SYSROWID
      HAVING COUNT(*) > 1
      LIMIT 1;
      

      This query returned an empty result set, confirming that no duplicate SYSROWID values exist.

      Execution plan
      The execution plan for the problematic query is:

      type: ALL
      key: NULL
      rows: 706459104
      Extra: Using where

      So the optimizer performs a full table scan.

      Comparison with another table
      We have another table called "documents" with a very similar logical structure and indexes.
      Running the equivalent query there produces the correct result:

      SELECT COUNT(DISTINCT SYSROWID)
      FROM documents
      WHERE CR='I1';
      

      This query returns: 207649165

      The execution plan for that query is different:

      type: ref
      key: CR_i
      Extra: Using index condition

      So in this case MariaDB uses an index.

      Repair attempts
      We attempted to rebuild and repair the table, but the result remained unchanged.

      The following operations were executed:

      REPAIR TABLE I_01_100_DOCUMENT;
      

      and

      ALTER TABLE I_01_100_DOCUMENT ENGINE=MyISAM;
      

      The ALTER TABLE operation rebuilt the table and all indexes.

      Additional observations
      The issue only appears when using the WHERE MIGCR='I1' filter together with COUNT(DISTINCT SYSROWID).
      A global COUNT(DISTINCT SYSROWID) without the filter returns the correct result.
      Smaller tables do not show this behavior.
      The problem only occurs on this very large MyISAM table (~706 million rows).

      Hypothesis
      Based on our observations, the incorrect result appears when MariaDB executes COUNT(DISTINCT) using a full table scan on a very large MyISAM table. The issue might be related to the internal DISTINCT aggregation path when the optimizer chooses a full scan instead of an index-based plan.

      Expected result: 207649165
      Actual result: 206512802

      EXPLAIN SELECT COUNT(DISTINCT SYSROWID) FROM I_01_100_DOCUMENT WHERE MIGCR='I1';
      

      Result:

      id: 1
      select_type: SIMPLE
      table: I_01_100_DOCUMENT
      type: ALL
      possible_keys: CR_ix
      key: NULL
      key_len: NULL
      ref: NULL
      rows: 706459104
      Extra: Using where

      EXPLAIN SELECT COUNT(DISTINCT SYSROWID) FROM documents WHERE CR='I1';
      

      Result:

      id: 1
      select_type: SIMPLE
      table: documents
      type: ref
      possible_keys: CR_i
      key: CR_i
      key_len: 49
      ref: const
      rows: 164281932
      Extra: Using index condition

      EXPLAIN SELECT COUNT(*) FROM I_01_100_DOCUMENT WHERE MIGCR='I1';
      

      Result:

      id: 1
      select_type: SIMPLE
      table: I_01_100_DOCUMENT
      type: ref
      possible_keys: CR_ix
      key: CR_ix
      key_len: 15
      ref: const
      rows: 224080784
      Extra: Using where; Using index

      EXPLAIN SELECT COUNT(*) FROM documents WHERE CR='I1';
      

      Result:

      id: 1
      select_type: SIMPLE
      table: documents
      type: ref
      possible_keys: CR_i
      key: CR_i
      key_len: 49
      ref: const
      rows: 164281932
      Extra: Using where; Using index

      SHOW CREATE TABLE I_01_100_DOCUMENT;
       
      CREATE TABLE I_01_100_DOCUMENT (
      id int(11) unsigned NOT NULL AUTO_INCREMENT,
      XHDOC varchar(45) DEFAULT NULL,
      XXUSER varchar(11) DEFAULT NULL,
      XXLEVEL decimal(5,0) DEFAULT NULL,
      XXFLAGS decimal(10,0) DEFAULT NULL,
      SYSACLIST varchar(11) DEFAULT NULL,
      SYSCONTENTTYPE varchar(255) DEFAULT NULL,
      SYSCREATEDATE datetime DEFAULT '0000-00-00 00:00:00',
      SYSCREATEUSER varchar(129) DEFAULT NULL,
      SYSDATASIZE float DEFAULT NULL,
      SYSDISPOSITIONTIME datetime DEFAULT '0000-00-00 00:00:00',
      SYSDOCTYPES varchar(250) DEFAULT NULL,
      SYSDOCUMENTDESCRIPTION text DEFAULT NULL,
      SYSFILENAMES varchar(250) DEFAULT NULL,
      SYSFOLDERID varchar(250) DEFAULT NULL,
      SYSFOLDERTABLES text DEFAULT NULL,
      SYSINDEXSTATE decimal(10,0) DEFAULT NULL,
      SYSLCMSTAGE decimal(10,0) DEFAULT NULL,
      SYSLITIGATIONHOLD varchar(2) DEFAULT NULL,
      SYSMODIFYUSER varchar(129) DEFAULT NULL,
      SYSPAGES decimal(10,0) DEFAULT NULL,
      SYSRETENTIONCLASS varchar(255) DEFAULT NULL,
      SYSRETENTIONTIME datetime DEFAULT '0000-00-00 00:00:00',
      SYSROWID varchar(45) DEFAULT NULL,
      SYSSTORAGESIZE float DEFAULT NULL,
      SYSTIMESTAMP datetime DEFAULT '0000-00-00 00:00:00',
      SAPDOCID varchar(41) NOT NULL,
      CONTENTREPOSITORY varchar(30) DEFAULT NULL,
      DOCIDSTR varchar(40) DEFAULT NULL,
      ATTRIBUTE varchar(128) DEFAULT NULL,
      FASTLINK_PROFILE varchar(10) DEFAULT NULL,
      SCANPROFILE varchar(2) DEFAULT NULL,
      SCANUSER varchar(20) DEFAULT NULL,
      SYSSAPBARCODE varchar(51) DEFAULT NULL,
      SYSSAPBARCODEERROR varchar(201) DEFAULT NULL,
      SYSSAPBARCODESTATUS decimal(10,0) DEFAULT NULL,
      ADM_UNIQID varchar(80) DEFAULT NULL,
      BEGAUFB datetime DEFAULT '0000-00-00 00:00:00',
      DAUERAUFB decimal(10,0) DEFAULT NULL,
      ERZEUGER varchar(128) DEFAULT NULL,
      DOKART varchar(30) DEFAULT NULL,
      FK_LAND varchar(3) DEFAULT NULL,
      FK_MODUL varchar(3) DEFAULT NULL,
      FK_SYSTYP varchar(3) DEFAULT NULL,
      ORGKRIT1 varchar(20) DEFAULT NULL,
      ORGKRIT2 varchar(20) DEFAULT NULL,
      QUELLSYS varchar(30) DEFAULT NULL,
      VERSION varchar(50) DEFAULT NULL,
      VERTRDOK varchar(25) DEFAULT NULL,
      CONVERTED varchar(11) DEFAULT NULL,
      MIGCR varchar(4) DEFAULT NULL,
      time timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
      PRIMARY KEY (id),
      UNIQUE KEY SYSROWID_uq (SYSROWID),
      KEY time_i (time),
      KEY XHDOC_ix (XHDOC),
      KEY CONTENTREPOSITORY_ix (CONTENTREPOSITORY),
      KEY CR_ix (MIGCR)
      ) ENGINE=MyISAM AUTO_INCREMENT=706459105 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
       
      CREATE TABLE `documents` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `SYSROWID` char(45) NOT NULL DEFAULT '',
        `XHDOC` char(45) NOT NULL DEFAULT '',
        `DUMP` char(50) NOT NULL DEFAULT '',
        `DDC` char(25) NOT NULL DEFAULT '',
        `CR` char(12) DEFAULT '',
        `year` int(11) DEFAULT 0,
        `status` int(11) DEFAULT 0,
        `errno` int(11) DEFAULT 0,
        `data_status` int(11) DEFAULT 0,
        `has_idx` int(11) DEFAULT 0,
        `ixos_doc` smallint(6) DEFAULT 0,
        `checkouttime` datetime DEFAULT NULL,
        `checkout_packet` int(11) DEFAULT 0,
        `no_doxis_docs` int(11) DEFAULT 0,
        `SERID` char(84) DEFAULT NULL,
        `importtime` datetime DEFAULT NULL,
        `time` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
        PRIMARY KEY (`id`),
        UNIQUE KEY `doc_ui` (`SYSROWID`),
        UNIQUE KEY `sapid_2` (`XHDOC`),
        KEY `CR_i` (`CR`),
        KEY `DDC_i` (`DDC`),
        KEY `year_i` (`year`),
        KEY `status_i` (`status`),
        KEY `errno_i` (`errno`),
        KEY `data_status_i` (`data_status`),
        KEY `ixos_doc_i` (`ixos_doc`),
        KEY `time_i` (`time`),
        KEY `checkout_packet_ix` (`checkout_packet`)
      ) ENGINE=MyISAM AUTO_INCREMENT=735414289 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
      

      Additional information regarding SYSROWID

      Although the column SYSROWID in table I_01_100_DOCUMENT is defined as

      SYSROWID varchar(45) DEFAULT NULL
      

      in practice the column does not contain NULL values.

      This was verified during testing and no rows with NULL SYSROWID were found.

      Additionally, SYSROWID is defined with a unique index:

      UNIQUE KEY SYSROWID_uq (SYSROWID)
      

      The following checks were executed:

      SELECT COUNT(*) , COUNT(DISTINCT SYSROWID)
      FROM I_01_100_DOCUMENT;
      

      Both values were identical, confirming that SYSROWID is globally unique.

      We also checked explicitly for duplicates:

      SELECT SYSROWID, COUNT()
      FROM I_01_100_DOCUMENT
      GROUP BY SYSROWID
      HAVING COUNT() > 1
      LIMIT 1;
      

      This returned an empty result set.

      Therefore the incorrect result from

      SELECT COUNT(DISTINCT SYSROWID)
      FROM I_01_100_DOCUMENT
      WHERE MIGCR='I1';
      

      cannot be explained by duplicate or NULL values in SYSROWID.

      Best regards

      Attachments

        Activity

          People

            Unassigned Unassigned
            meb Melinda
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.