Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
-
10.5.19
-
None
-
Linux
Description
Hello,
we have encountered a query result that appears to be incorrect on a very large MyISAM table in MariaDB and would like to report it in case this indicates a bug.
Environment
MariaDB version: 10.5.19-MariaDB-0+deb11u2
Storage engine: MyISAM
Table size: approximately 706 million rows
During all tests there were no concurrent writes to the table.
Problem description
The following query is executed on table I_01_100_DOCUMENT:
SELECT COUNT(DISTINCT SYSROWID) |
FROM I_01_100_DOCUMENT |
WHERE MIGCR='I1'; |
The query returns the value: 206512802
However, based on our data validation the correct result should be: 207649165
So the result returned by the query is lower by roughly 1.1 million rows.
Data consistency verification
To verify that the issue is not caused by duplicate SYSROWID values, we executed the following checks.
First we verified global uniqueness:
SELECT COUNT(*), COUNT(DISTINCT SYSROWID) |
FROM I_01_100_DOCUMENT; |
Both values are identical, which confirms that SYSROWID is globally unique.
We also explicitly searched for duplicates:
SELECT SYSROWID, COUNT(*) |
FROM I_01_100_DOCUMENT |
GROUP BY SYSROWID |
HAVING COUNT(*) > 1 |
LIMIT 1;
|
This query returned an empty result set, confirming that no duplicate SYSROWID values exist.
Execution plan
The execution plan for the problematic query is:
type: ALL
key: NULL
rows: 706459104
Extra: Using where
So the optimizer performs a full table scan.
Comparison with another table
We have another table called "documents" with a very similar logical structure and indexes.
Running the equivalent query there produces the correct result:
SELECT COUNT(DISTINCT SYSROWID) |
FROM documents |
WHERE CR='I1'; |
This query returns: 207649165
The execution plan for that query is different:
type: ref
key: CR_i
Extra: Using index condition
So in this case MariaDB uses an index.
Repair attempts
We attempted to rebuild and repair the table, but the result remained unchanged.
The following operations were executed:
REPAIR TABLE I_01_100_DOCUMENT; |
and
ALTER TABLE I_01_100_DOCUMENT ENGINE=MyISAM; |
The ALTER TABLE operation rebuilt the table and all indexes.
Additional observations
The issue only appears when using the WHERE MIGCR='I1' filter together with COUNT(DISTINCT SYSROWID).
A global COUNT(DISTINCT SYSROWID) without the filter returns the correct result.
Smaller tables do not show this behavior.
The problem only occurs on this very large MyISAM table (~706 million rows).
Hypothesis
Based on our observations, the incorrect result appears when MariaDB executes COUNT(DISTINCT) using a full table scan on a very large MyISAM table. The issue might be related to the internal DISTINCT aggregation path when the optimizer chooses a full scan instead of an index-based plan.
Expected result: 207649165
Actual result: 206512802
EXPLAIN SELECT COUNT(DISTINCT SYSROWID) FROM I_01_100_DOCUMENT WHERE MIGCR='I1'; |
Result:
id: 1
select_type: SIMPLE
table: I_01_100_DOCUMENT
type: ALL
possible_keys: CR_ix
key: NULL
key_len: NULL
ref: NULL
rows: 706459104
Extra: Using where
EXPLAIN SELECT COUNT(DISTINCT SYSROWID) FROM documents WHERE CR='I1'; |
Result:
id: 1
select_type: SIMPLE
table: documents
type: ref
possible_keys: CR_i
key: CR_i
key_len: 49
ref: const
rows: 164281932
Extra: Using index condition
EXPLAIN SELECT COUNT(*) FROM I_01_100_DOCUMENT WHERE MIGCR='I1'; |
Result:
id: 1
select_type: SIMPLE
table: I_01_100_DOCUMENT
type: ref
possible_keys: CR_ix
key: CR_ix
key_len: 15
ref: const
rows: 224080784
Extra: Using where; Using index
EXPLAIN SELECT COUNT(*) FROM documents WHERE CR='I1'; |
Result:
id: 1
select_type: SIMPLE
table: documents
type: ref
possible_keys: CR_i
key: CR_i
key_len: 49
ref: const
rows: 164281932
Extra: Using where; Using index
SHOW CREATE TABLE I_01_100_DOCUMENT; |
 |
CREATE TABLE I_01_100_DOCUMENT ( |
id int(11) unsigned NOT NULL AUTO_INCREMENT, |
XHDOC varchar(45) DEFAULT NULL, |
XXUSER varchar(11) DEFAULT NULL, |
XXLEVEL decimal(5,0) DEFAULT NULL, |
XXFLAGS decimal(10,0) DEFAULT NULL, |
SYSACLIST varchar(11) DEFAULT NULL, |
SYSCONTENTTYPE varchar(255) DEFAULT NULL, |
SYSCREATEDATE datetime DEFAULT '0000-00-00 00:00:00', |
SYSCREATEUSER varchar(129) DEFAULT NULL, |
SYSDATASIZE float DEFAULT NULL, |
SYSDISPOSITIONTIME datetime DEFAULT '0000-00-00 00:00:00', |
SYSDOCTYPES varchar(250) DEFAULT NULL, |
SYSDOCUMENTDESCRIPTION text DEFAULT NULL, |
SYSFILENAMES varchar(250) DEFAULT NULL, |
SYSFOLDERID varchar(250) DEFAULT NULL, |
SYSFOLDERTABLES text DEFAULT NULL, |
SYSINDEXSTATE decimal(10,0) DEFAULT NULL, |
SYSLCMSTAGE decimal(10,0) DEFAULT NULL, |
SYSLITIGATIONHOLD varchar(2) DEFAULT NULL, |
SYSMODIFYUSER varchar(129) DEFAULT NULL, |
SYSPAGES decimal(10,0) DEFAULT NULL, |
SYSRETENTIONCLASS varchar(255) DEFAULT NULL, |
SYSRETENTIONTIME datetime DEFAULT '0000-00-00 00:00:00', |
SYSROWID varchar(45) DEFAULT NULL, |
SYSSTORAGESIZE float DEFAULT NULL, |
SYSTIMESTAMP datetime DEFAULT '0000-00-00 00:00:00', |
SAPDOCID varchar(41) NOT NULL, |
CONTENTREPOSITORY varchar(30) DEFAULT NULL, |
DOCIDSTR varchar(40) DEFAULT NULL, |
ATTRIBUTE varchar(128) DEFAULT NULL, |
FASTLINK_PROFILE varchar(10) DEFAULT NULL, |
SCANPROFILE varchar(2) DEFAULT NULL, |
SCANUSER varchar(20) DEFAULT NULL, |
SYSSAPBARCODE varchar(51) DEFAULT NULL, |
SYSSAPBARCODEERROR varchar(201) DEFAULT NULL, |
SYSSAPBARCODESTATUS decimal(10,0) DEFAULT NULL, |
ADM_UNIQID varchar(80) DEFAULT NULL, |
BEGAUFB datetime DEFAULT '0000-00-00 00:00:00', |
DAUERAUFB decimal(10,0) DEFAULT NULL, |
ERZEUGER varchar(128) DEFAULT NULL, |
DOKART varchar(30) DEFAULT NULL, |
FK_LAND varchar(3) DEFAULT NULL, |
FK_MODUL varchar(3) DEFAULT NULL, |
FK_SYSTYP varchar(3) DEFAULT NULL, |
ORGKRIT1 varchar(20) DEFAULT NULL, |
ORGKRIT2 varchar(20) DEFAULT NULL, |
QUELLSYS varchar(30) DEFAULT NULL, |
VERSION varchar(50) DEFAULT NULL, |
VERTRDOK varchar(25) DEFAULT NULL, |
CONVERTED varchar(11) DEFAULT NULL, |
MIGCR varchar(4) DEFAULT NULL, |
time timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), |
PRIMARY KEY (id), |
UNIQUE KEY SYSROWID_uq (SYSROWID), |
KEY time_i (time), |
KEY XHDOC_ix (XHDOC), |
KEY CONTENTREPOSITORY_ix (CONTENTREPOSITORY), |
KEY CR_ix (MIGCR) |
) ENGINE=MyISAM AUTO_INCREMENT=706459105 DEFAULT CHARSET=utf8 COLLATE=utf8_bin; |
 |
CREATE TABLE `documents` ( |
`id` int(11) NOT NULL AUTO_INCREMENT, |
`SYSROWID` char(45) NOT NULL DEFAULT '', |
`XHDOC` char(45) NOT NULL DEFAULT '', |
`DUMP` char(50) NOT NULL DEFAULT '', |
`DDC` char(25) NOT NULL DEFAULT '', |
`CR` char(12) DEFAULT '', |
`year` int(11) DEFAULT 0, |
`status` int(11) DEFAULT 0, |
`errno` int(11) DEFAULT 0, |
`data_status` int(11) DEFAULT 0, |
`has_idx` int(11) DEFAULT 0, |
`ixos_doc` smallint(6) DEFAULT 0, |
`checkouttime` datetime DEFAULT NULL, |
`checkout_packet` int(11) DEFAULT 0, |
`no_doxis_docs` int(11) DEFAULT 0, |
`SERID` char(84) DEFAULT NULL, |
`importtime` datetime DEFAULT NULL, |
`time` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), |
PRIMARY KEY (`id`), |
UNIQUE KEY `doc_ui` (`SYSROWID`), |
UNIQUE KEY `sapid_2` (`XHDOC`), |
KEY `CR_i` (`CR`), |
KEY `DDC_i` (`DDC`), |
KEY `year_i` (`year`), |
KEY `status_i` (`status`), |
KEY `errno_i` (`errno`), |
KEY `data_status_i` (`data_status`), |
KEY `ixos_doc_i` (`ixos_doc`), |
KEY `time_i` (`time`), |
KEY `checkout_packet_ix` (`checkout_packet`) |
) ENGINE=MyISAM AUTO_INCREMENT=735414289 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci; |
Additional information regarding SYSROWID
Although the column SYSROWID in table I_01_100_DOCUMENT is defined as
SYSROWID varchar(45) DEFAULT NULL |
in practice the column does not contain NULL values.
This was verified during testing and no rows with NULL SYSROWID were found.
Additionally, SYSROWID is defined with a unique index:
UNIQUE KEY SYSROWID_uq (SYSROWID) |
The following checks were executed:
SELECT COUNT(*) , COUNT(DISTINCT SYSROWID) |
FROM I_01_100_DOCUMENT; |
Both values were identical, confirming that SYSROWID is globally unique.
We also checked explicitly for duplicates:
SELECT SYSROWID, COUNT() |
FROM I_01_100_DOCUMENT |
GROUP BY SYSROWID |
HAVING COUNT() > 1 |
LIMIT 1;
|
This returned an empty result set.
Therefore the incorrect result from
SELECT COUNT(DISTINCT SYSROWID) |
FROM I_01_100_DOCUMENT |
WHERE MIGCR='I1'; |
cannot be explained by duplicate or NULL values in SYSROWID.
Best regards