[MDEV-15519] CHECKSUM TABLE gives different results across MariaDB 10.0 and 10.2 Created: 2018-03-08  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.0, 10.2
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Nemanja Stambolic Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: None
Environment:

GNU/Linux


Attachments: File MariaDB102_CSUM_dump.sql    
Issue Links:
Relates
relates to MDEV-28020 CHECKSUM TABLE calculates different c... Closed
relates to MDEV-15377 10.2+ Integer DEFAULT values missing ... Closed
relates to MDEV-15407 DEFAULT NULL forced when no DEFAULT s... Closed
relates to MDEV-30428 checksums on multiple servers are not... Open

 Description   

According to https://mariadb.com/kb/en/library/checksum-table/ only older MySQL servers would yield different results.

As it turns out, supported, widely used and deployed MariaDB instances will also return different results for the exact same table structure and data?!

Comparing 10.0 (Ubuntu LTS repo release) and 10.2 (MariaDB official stable repo release).

CHECKSUM TABLE badcs;

MariaDB 10.0: 1873571331
MariaDB 10.2: 3472699532

Even worse, MySQL is different on it's own:
MySQL 5.6.34: 260370412

But I'm interested here in MariaDB only.
It seems that MyISAM tables will produce same results across each server version/flavour, so all OK as expected with checksumming process done on exact same table/data.

Only InnoDB will (mis)behave like this, thus effectively breaking entire point of CHECKSUM-ming data across servers.

Isn't this suppose to be standard's compliant command?

In attachment there's my sample table with few rows. So it's easy to be reproduced.



 Comments   
Comment by Elena Stepanova [ 2018-03-09 ]

I assume you mean that the checksum is different if you create a new table on 10.0 and 10.2 (as opposed to migrating the table from 10.0 to 10.2).

In this case, the likely reason of the difference is the different format of temporal fields. Versions 10.1 and higher have mysql56_temporal_format variable which is set to on by default. It affects TIMESTAMP fields which, in turn, affects the checksum. Could you please check that with mysql56_temporal_format=off on 10.2 you are getting expected checksums?

Comment by Nemanja Stambolic [ 2018-03-09 ]

Not exactly. If you take dump I attached, and import it onto 10.0 and 10.2 respectively you'll see the difference even though table structure and data are exactly the same.

I will check temporal fields settings.

Comment by Nemanja Stambolic [ 2018-03-09 ]

So this was default:

Variable_name                        Value
mysql56_temporal_format       ON

I changed to:

Variable_name                        Value
mysql56_temporal_format	     OFF

Still getting same results on both 10.2 instances in this case, with 1st set to ON and 2nd set to OFF for mysql56_temporal_format.
So that did not make any difference for checksums on 10.2. Simply 10.2 will have different results no matter what, while 10.0 will have it's own results.

You can see it for yourself with sample SQL dump in attachment.

Comment by Elena Stepanova [ 2018-03-09 ]

Still getting same results on both 10.2 instances in this case, with 1st set to ON and 2nd set to OFF for mysql56_temporal_format.
...
You can see it for yourself with sample SQL dump in attachment.

Yes, I did use your SQL file, and I am getting identical results after adjusting the timestamp format, see below, that's why I asked you to confirm.
Did you re-create the table after you have changed mysql56_temporal_format? The option has an effect when you build the table, not when you calculate the checksum.

If you did, but are still getting different results, there are other factors that might be important. They don't change anything for me with your example, but maybe they do for you.

First, between 10.0 and 10.2 the default value of ROW FORMAT for InnoDB tables has changed. When you create your table from the attached SQL file in 10.0, it gets created with ROW_FORMAT=Compact; but in 10.2, it gets created with ROW_FORMAT=Dynamic. The documentation does say that row format is important for checksum calculation, and while it doesn't affect the outcome for me, maybe it does for you, it would be a legitimate reason.

So, please add the explicit ROW_FORMAT=... (with the value of your choice) to dump.

Secondly, the default innodb_checksum_algorithm is different in 10.0 vs 10.2. While to my understanding it shouldn't have any effect on the result, maybe it does.

If after adjusting those you still get different results, please attach your cnf file(s) or the output of SHOW VARIABLES, we'll dig further.

MariaDB [test]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.0.34-MariaDB |
+-----------------+
1 row in set (0.00 sec)
 
MariaDB [test]> DROP TABLE IF EXISTS `badcs`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
 
MariaDB [test]> CREATE TABLE `badcs` (
    ->   `bad_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    ->   `partner_id` mediumint(8) NOT NULL,
    ->   `service_type` varchar(50) COLLATE utf8_bin NOT NULL,
    ->   `service_level` tinyint(3) unsigned NOT NULL,
    ->   `service_start` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    ->   `service_end` timestamp NULL DEFAULT NULL,
    ->   `invoice` tinyint(1) unsigned NOT NULL,
    ->   PRIMARY KEY (`bad_id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Query OK, 0 rows affected (0.56 sec)
 
MariaDB [test]> 
MariaDB [test]> INSERT INTO `badcs` (`bad_id`, `partner_id`, `service_type`, `service_level`, `service_start`, `service_end`, `invoice`) VALUES
    -> (1,123,'CREATE TABLE badcs',5,'2018-03-02 10:48:23',NULL,1),
    -> (2,123,'zzzzzzzzzzzzz',4,'2018-03-02 10:48:46',NULL,1),
    -> (3,255,'trrrrrrrrrrrr',4,'2018-03-02 11:56:37',NULL,0);
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> checksum table badcs;
+------------+------------+
| Table      | Checksum   |
+------------+------------+
| test.badcs | 3449432362 |
+------------+------------+
1 row in set (0.00 sec)

MariaDB [test]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.2.13-MariaDB |
+-----------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select @@mysql56_temporal_format;
+---------------------------+
| @@mysql56_temporal_format |
+---------------------------+
|                         0 |
+---------------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> DROP TABLE IF EXISTS `badcs`;
Query OK, 0 rows affected (0.57 sec)
 
MariaDB [test]> CREATE TABLE `badcs` (
    ->   `bad_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    ->   `partner_id` mediumint(8) NOT NULL,
    ->   `service_type` varchar(50) COLLATE utf8_bin NOT NULL,
    ->   `service_level` tinyint(3) unsigned NOT NULL,
    ->   `service_start` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    ->   `service_end` timestamp NULL DEFAULT NULL,
    ->   `invoice` tinyint(1) unsigned NOT NULL,
    ->   PRIMARY KEY (`bad_id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Query OK, 0 rows affected (0.78 sec)
 
MariaDB [test]> 
MariaDB [test]> INSERT INTO `badcs` (`bad_id`, `partner_id`, `service_type`, `service_level`, `service_start`, `service_end`, `invoice`) VALUES
    -> (1,123,'CREATE TABLE badcs',5,'2018-03-02 10:48:23',NULL,1),
    -> (2,123,'zzzzzzzzzzzzz',4,'2018-03-02 10:48:46',NULL,1),
    -> (3,255,'trrrrrrrrrrrr',4,'2018-03-02 11:56:37',NULL,0);
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> checksum table badcs;
+------------+------------+
| Table      | Checksum   |
+------------+------------+
| test.badcs | 3449432362 |
+------------+------------+
1 row in set (0.00 sec)

Comment by Nemanja Stambolic [ 2018-03-13 ]

No, I haven't re-created the table after

mysql56_temporal_format

modification.. which basically means that I would need to drop entire database on 10.2 and re-import it from backup.... what a not nice situation.

I'll test scenario again with table being recreated.

Thanks

Comment by Elena Stepanova [ 2018-03-13 ]

I cannot yet claim whether it's a bug or not that the timestamp format causes the checksum change. For now I just want to make sure that it's indeed what causes the difference on your side.

Comment by Nemanja Stambolic [ 2018-03-13 ]

At last I managed to get the same checksums across 10.0 and 10.2 MariaDB in our system! Thank you Elena!

P.S. STILL getting different checksum on MySQL 5.6.34.

So to conclude: this method is not usable any longer and should be avoided (CHECKSUM TABLE). Or, simply only use it with exact same RDBMS engine...

Thanks again Elena!

IMHO this should be documented in bold... whenever bug or not.

Comment by Nemanja Stambolic [ 2018-03-13 ]

Hmm, I was looking what are implications disabling this temporal format.
This is linked from https://mariadb.com/kb/en/library/server-system-variables/#mysql56_temporal_format

https://jira.mariadb.org/browse/MDEV-10723

If set (the default), MariaDB uses the MySQL 5.6 low level formats for TIME, DATETIME and TIMESTAMP instead of the MariaDB 5.3 version. The version MySQL introduced in 5.6 requires more storage, but potentially allows negative dates and has some advantages in replication. There should be no reason to revert to the old MariaDB 5.3 microsecond format.

IDK why this is enabled by default - when it does not keep compatibility with MySQL 5.6 or so?

Comment by Elena Stepanova [ 2018-03-16 ]

While the question about checksum with different values of mysql56_temporal_format is vague, the documentation does say that MariaDB and MySQL are supposed to calculate checksums the same way, so it appears there is a bug here, or the documentation is outdated.

The difference is not related to possibly peculiarities of InnoDB; for simplicity, we can rule out InnoDB completely and check MyISAM, still the same difference.

DROP TABLE IF EXISTS `badcs`;
 
CREATE TABLE `badcs` (
   `bad_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `partner_id` mediumint(8) NOT NULL,
   `service_type` varchar(50) COLLATE utf8_bin NOT NULL,
   `service_level` tinyint(3) unsigned NOT NULL,
   `service_start` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
   `service_end` timestamp NULL DEFAULT NULL,
   `invoice` tinyint(1) unsigned NOT NULL,
   PRIMARY KEY (`bad_id`)
 ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
 
 
INSERT INTO `badcs` (`bad_id`, `partner_id`, `service_type`, `service_level`, `service_start`, `service_end`, `invoice`) VALUES
(1,123,'CREATE TABLE badcs',5,'2018-03-02 10:48:23',NULL,1),
(2,123,'zzzzzzzzzzzzz',4,'2018-03-02 10:48:46',NULL,1),
(3,255,'trrrrrrrrrrrr',4,'2018-03-02 11:56:37',NULL,0);
 
CHECKSUM TABLE badcs;
SELECT @@version, @@version_comment;

MySQL 5.6

MySQL [test]> CHECKSUM TABLE badcs;
+------------+-----------+
| Table      | Checksum  |
+------------+-----------+
| test.badcs | 240480577 |
+------------+-----------+
1 row in set (0.00 sec)
 
MySQL [test]> SELECT @@version, @@version_comment;
+--------------+---------------------+
| @@version    | @@version_comment   |
+--------------+---------------------+
| 5.6.39-debug | Source distribution |
+--------------+---------------------+
1 row in set (0.00 sec)

MariaDB 10.0

MariaDB [test]> CHECKSUM TABLE badcs;
+------------+------------+
| Table      | Checksum   |
+------------+------------+
| test.badcs | 3314725531 |
+------------+------------+
1 row in set (0.00 sec)
 
MariaDB [test]> SELECT @@version, @@version_comment;
+-----------------------+---------------------+
| @@version             | @@version_comment   |
+-----------------------+---------------------+
| 10.0.35-MariaDB-debug | Source distribution |
+-----------------------+---------------------+
1 row in set (0.00 sec)

The question of mysql56_temporal_format is also open, though.

10.2 mysql56_temporal_format=ON

MariaDB [test]> CHECKSUM TABLE badcs;
+------------+------------+
| Table      | Checksum   |
+------------+------------+
| test.badcs | 3379190625 |
+------------+------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select @@mysql56_temporal_format;
+---------------------------+
| @@mysql56_temporal_format |
+---------------------------+
|                         1 |
+---------------------------+
1 row in set (0.00 sec)

10.2 mysql56_temporal_format=OFF

MariaDB [test]> CHECKSUM TABLE badcs;
+------------+------------+
| Table      | Checksum   |
+------------+------------+
| test.badcs | 3449432362 |
+------------+------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select @@mysql56_temporal_format;
+---------------------------+
| @@mysql56_temporal_format |
+---------------------------+
|                         0 |
+---------------------------+
1 row in set (0.00 sec)

Comment by Nemanja Stambolic [ 2018-05-09 ]

Any news on this? It's classified as major bug, which for sure it is...

Thanks

Comment by Sergei Golubchik [ 2018-05-09 ]

No, not yet, sorry.

Comment by Sergei Golubchik [ 2018-07-16 ]

There are (at least) two options that affect the checksum calculations.

One you already know — @@mysql56_temporal_format, it determines the on-disk format of temporal fields. To have MySQL-5.6 compatible format, the table has to be created when this variable is set to ON. To have MariaDB-10.0 compatible format the table has to be created when this variable is set to OFF.

The other is @@old. Checksum format was changed in MySQL in 2007 (before MariaDB was created), but apparently this change never got into any MySQL release. So @@old=1 enables "old" pre-2007 (and apparently all current MySQL too) checksum algorithm. For InnoDB it's enough to set it before CHECKSUM TABLE statement, not before the table is created.

Comment by Sergei Golubchik [ 2021-01-10 ]

I don't seem to be able to repeat it with MyISAM.

MariaDB [test]> checksum table badcs100, badcs102_53temp, badcs102_56temp;
+----------------------+------------+
| Table                | Checksum   |
+----------------------+------------+
| test.badcs100        | 3975984850 |
| test.badcs102_53temp | 3975984850 |
| test.badcs102_56temp | 3472699532 |
+----------------------+------------+

here the first table was created in 10.0, the second in 10.2 with @@mysql56_temporal_format being 0, and the last one was created in 10.2 with @@mysql56_temporal_format being 1.

CHECKSUM TABLE returns the same values (as above) both in 10.0 and 10.2.

Comment by Elena Stepanova [ 2021-01-15 ]

I can't reproduce it on a single machine either.
It is possible that I was running 10.0 and 10.2 on different machines back then. I could have failed to mention it as I wouldn't expect it to be of any importance, but apparently it is.

Below are the results I'm getting on Ubuntu Focal x86_64 vs Ubuntu Xenial x86_64. Both are running the same server from MariaDB 10.0.38 / 10.2.36 GLIBC_2.14 binary tarballs

$ ls -l  *gz
-rw-rw-r-- 1 buildbot buildbot 447462919 Jan 30  2019 mariadb-10.0.38-linux-glibc_214-x86_64.tar.gz
-rw-rw-r-- 1 buildbot buildbot 814589442 Nov 10 20:21 mariadb-10.2.36-linux-glibc_214-x86_64.tar.gz

Focal

MariaDB [test]> CHECKSUM TABLE badcs100, badcs100_53temp;
+----------------------+------------+
| Table                | Checksum   |
+----------------------+------------+
| test.badcs100        | 3449432362 |
| test.badcs100_53temp | 3449432362 |
+----------------------+------------+
2 rows in set (0.000 sec)

Xenial

MariaDB [test]> CHECKSUM TABLE badcs100, badcs100_53temp;
+----------------------+------------+
| Table                | Checksum   |
+----------------------+------------+
| test.badcs100        | 3088635921 |
| test.badcs100_53temp | 3088635921 |
+----------------------+------------+

I don't know if it's expected.

Comment by Marko Mäkelä [ 2021-10-12 ]

Did anyone test this with Valgrind or MemorySanitizer, while disabling the questionable function Field_varstring::mark_unused_memory_as_defined()? InnoDB will leave the unused part of VARCHAR buffers uninitialized when returning data to the SQL layer. That function (and equivalent Valgrind suppressions) are silencing warnings about uninitialized data being used.

Comment by Marko Mäkelä [ 2021-10-12 ]

I tested this with the following patch:

diff --git a/sql/field.cc b/sql/field.cc
index 08bde5f58e1..46e0a2e5954 100644
--- a/sql/field.cc
+++ b/sql/field.cc
@@ -7865,8 +7865,10 @@ bool Field_varstring::send(Protocol *protocol)
 #ifdef HAVE_MEM_CHECK
 void Field_varstring::mark_unused_memory_as_defined()
 {
+#if 0
   uint used_length __attribute__((unused)) = get_length();
   MEM_MAKE_DEFINED(get_data() + used_length, field_length - used_length);
+#endif
 }
 #endif
 

and the following test case on a recent 10.5. But, it did not generate any messages about using uninitialized memory. I invoked Valgrind manually while passing the --manual-gdb option to mtr, so no suppressions should have been present.

--source include/have_innodb.inc
CREATE TABLE badcs (
  bad_id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  partner_id mediumint(8) NOT NULL,
  service_type varchar(50) NOT NULL,
  service_level tinyint(3) unsigned NOT NULL,
  service_start timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  service_end timestamp NULL DEFAULT NULL,
  invoice tinyint(1) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET utf8mb3;
 
INSERT INTO badcs VALUES
(1,	123,	'CREATE TABLE badcs',	5,	'2018-03-02 10:48:23',	NULL,	1),
(2,	123,	'zzzzzzzzzzzzz',	4,	'2018-03-02 10:48:46',	NULL,	1),
(3,	255,	'trrrrrrrrrrrr',	4,	'2018-03-02 11:56:37',	NULL,	0);
CHECKSUM TABLE badcs;
DROP TABLE badcs;

I got the same checksum reported when I invoked the test with

./mtr --mysqld=--innodb-default-row-format=redundant 

Comment by Marko Mäkelä [ 2021-10-12 ]

For the record, the checksum that I got reported was 3379190625, which is differs from any checksums that were reported here earlier.

Comment by Michael Widenius [ 2023-01-11 ]

I also tested the original data and I got checksum 3379190625
(This with 11.0)

I don't think the VARCHAR or blobs are a problem.
The code in handler.cc:calculate_checksum() ensures that for BLOB, VARCHAR, GEOMETRY and BIT,
we only calculate the checksum for the active data.

Generated at Thu Feb 08 08:21:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.