  MariaDB Server / MDEV-32174

ROW_FORMAT=COMPRESSED table corruption due to ROLLBACK

Details

    • Type: Bug
    • Status: Needs Feedback
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.11.3, 10.6.12
    • Fix Version/s: 10.6
    • Component/s: None

    Description

      After upgrading to Ubuntu 22.04, and alongside that to MariaDB 10.6, we started experiencing table corruption during our nightly restores. Because these restores are to staging servers, loss of data was acceptable, so we fixed the problem by simply removing the corrupted table. But this obviously won't fly should we ever have to do a production restore (which is very rarely the case).
      The database portion of the restores is done using mydumper with the default of 4 threads.
      What makes the problem particularly nasty is that it cannot be consistently reproduced; we noticed it seemed to happen in roughly 1 out of every 7 restores (where each restore takes about 40-50 minutes).
      This made us think it was related to parallelism, so we tried running mydumper single-threaded, which did not solve the problem.
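      For reference, an illustrative mydumper/myloader invocation along the lines of what our restores use (host names, user names, directories and the password are placeholders; the exact command lines are not included here):

      # Dump with the default of 4 threads (we also tried running single-threaded, which did not help).
      mydumper --host=db-host --user=backup --password=*** \
               --database=ourdb --outputdir=/backups/ourdb --threads=4
      # Restore onto the staging server.
      myloader --host=staging-host --user=restore --password=*** \
               --database=ourdb --directory=/backups/ourdb --threads=4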
      We have also tried upgrading to various versions of mariadb/mydumper, most notably:

      MariaDB   mydumper
      10.6.12   0.15
      10.11.3   0.10
      10.11.3   0.15

      But with all of the above version combinations the problem still occurred.
      Eventually we found that we could no longer reproduce the problem when running MariaDB 10.5.21 with mydumper 0.10, but we are still unsure of the underlying cause.

      Provided files:
      The included table structure is just one table of our database, as we were able to reproduce the problem by restoring backups of only this table.
      Because it is quite time-consuming to reproduce and we would have to generate a large amount of dummy data (existing dumps all contain customer data), we have not included a database dump for now.
      But we still wanted to create this bug report in case you already see something strange based on the included table structure and error log.
      In case there is no immediately apparent problem and you still want to look further into this, we would of course be happy to provide a database dump.

      Update:
      After running on 10.6.15 I was again able to reproduce it. I generated a stacktrace from the core dump (mariadbd_full_bt_all_threads.txt).
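      A full backtrace of all threads, like the attached one, can be produced from a core dump roughly as follows (the exact command used is an assumption; the binary and core file paths are placeholders):

      gdb /usr/sbin/mariadbd /var/lib/mysql/core.12345 --batch \
          -ex "set pagination off" -ex "thread apply all bt full" \
          > mariadbd_full_bt_all_threads.txt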

      Attachments

        1. configuration.txt
          78 kB
        2. mariadbd_full_bt_all_threads.txt
          149 kB
        3. MDEV-32174_ps.txt
          335 kB
        4. restore-failure-error-log.txt
          20 kB
        5. syslog-restore-10.6.15.txt
          24 kB
        6. table_structure.txt
          1 kB


          Activity

            Roel Van de Paar added a comment

            No corruption issues thus far on recent & current 10.6 tests:

            CS 10.6.21 c05e7c4e0eff174a1f2b5e6efd53d80954f9e34b (Optimized)

            2025-01-28 15:19:35 0 [Note] /test/MD210125-mariadb-10.6.21-linux-x86_64-opt/bin/mariadbd: ready for connections.
            Version: '10.6.21-MariaDB-log'  socket: '/test/MD210125-mariadb-10.6.21-linux-x86_64-opt/socket.sock'  port: 12427  MariaDB Server
            2025-01-28 15:20:32 1511 [Note] InnoDB: Number of pools: 2
            2025-01-28 15:20:45 3015 [Note] InnoDB: Number of pools: 3
            2025-01-28 15:21:01 4524 [Note] InnoDB: Number of pools: 4
            2025-01-28 15:21:20 6027 [Note] InnoDB: Number of pools: 5
            2025-01-28 15:22:11 7533 [Note] InnoDB: Number of pools: 6
            2025-01-28 15:25:26 1804 [Note] Detected table cache mutex contention at instance 1: 55% waits. Additional table cache instance activated. Number of instances after activation: 2.
            

            I did, however, see a few duplicate key errors again:

            MDEV-32174 CS 10.6.21 831f5bc66f8d2147edd7991caf69e34058566b67 (Debug)

            ERROR 1062 (23000) at line 1: Duplicate entry '908-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '869-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '969-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '924-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1062-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1227-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1195-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1166-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1290-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1275-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1416-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1461-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1753-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1861-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1853-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '878-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1325-2025-01-28 15:57:58' for key 'PRIMARY'
            ERROR 1062 (23000) at line 1: Duplicate entry '1897-2025-01-28 15:57:58' for key 'PRIMARY'
            

            The scarcity/infrequency of these errors (they are almost never seen, even when running 8k threads), combined with the fact that they should not really happen as far as I can see, makes me believe there is a highly sporadic race condition bug involving LAST_INSERT_ID. The query that produced these errors is:

            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
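
            For context, a minimal schema sketch that would match this INSERT and the 'id-timestamp' duplicate-key pattern (hypothetical; the actual t1/t2 definitions used in the test are not shown in this comment):

            -- Hypothetical sketch only; column names are placeholders.
            CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, payload INT) ENGINE=InnoDB;
            CREATE TABLE t2 (
              created_at TIMESTAMP NOT NULL,
              flag INT NOT NULL,
              ref_id BIGINT NOT NULL,            -- taken from LAST_INSERT_ID() of the preceding t1 insert
              PRIMARY KEY (ref_id, created_at)
            ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
            -- Each connection first inserts into t1, then into t2:
            INSERT INTO t1 (payload) VALUES (1);
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()));

            Under such a sketch, LAST_INSERT_ID() is per-connection and each t1 insert yields a unique id, so duplicates on (ref_id, created_at) should not normally be possible; their presence is what points to the suspected race condition.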
            

            Roel Van de Paar added a comment - edited

            An overnight run did not produce any new occurrences. Please do have a look at the duplicate key issue mentioned in the last comment. It happens only very sporadically.


            Marko Mäkelä added a comment

            The duplicate key errors might be related to the corner that I cut while fixing MDEV-30882. Does CHECK TABLE…EXTENDED (MDEV-24402) report any errors for these ROW_FORMAT=COMPRESSED tables?


            Roel Van de Paar added a comment

            Had an interesting run this morning with this. This time I used:

            threads=8000   # Number of concurrent threads
            queries=100    # Number of t1/t2 INSERTs per thread/per test round
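
            For context, a minimal sketch of what such a driver (script1.sh) could look like; this is hypothetical, as the actual script is not attached, and the t1 INSERT is a placeholder:

            #!/bin/bash
            # Hypothetical sketch of the test driver; client path, socket and schema are placeholders.
            threads=8000; queries=100          # as configured above
            client=./bin/mariadb; user=root; socket=./socket.sock; db=test
            SQL="INSERT INTO t1 () VALUES (); INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()));"
            for ((t = 1; t <= threads; t++)); do
              ( for ((q = 1; q <= queries; q++)); do
                  ${client} -A --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
                done ) &
              (( t % 100 == 0 )) && echo "Count: ${t}"
            done
            wait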
            

            And here is what I saw:

            MDEV-32174 CS 10.6.21 831f5bc66f8d2147edd7991caf69e34058566b67 (Debug)

            /test/MDEV-32174_mariadb-10.6.21-linux-x86_64$ ./script1.sh 
            Count: 100
            Count: 200
            Count: 300
            Count: 400
            Count: 500
            ...normal count operation continues...
            Count: 2600
            Count: 2700
            Count: 2800
            Count: 2900
            Count: 3000
            --------------
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
            --------------
             
            ERROR 1062 (23000) at line 1: Duplicate entry '2198-2025-02-10 06:38:40' for key 'PRIMARY'
            --------------
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
            --------------
             
            ERROR 1062 (23000) at line 1: Duplicate entry '1931-2025-02-10 06:38:40' for key 'PRIMARY'
            --------------
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
            --------------
            ...additional similar duplicate entries...
            ERROR 1062 (23000) at line 1: Duplicate entry '2781-2025-02-10 06:38:40' for key 'PRIMARY'
            --------------
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
            --------------
             
            ERROR 1062 (23000) at line 1: Duplicate entry '3123-2025-02-10 06:38:40' for key 'PRIMARY'
            --------------
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
            --------------
             
            ERROR 1062 (23000) at line 1: Duplicate entry '3179-2025-02-10 06:38:40' for key 'PRIMARY'
            Count: 3100
            Count: 3200
            Count: 3300
            Count: 3400
            Count: 3500
            ...normal count operation continues...
            Count: 4600
            Count: 4700
            Count: 4800
            Count: 4900
            Count: 5000
            --------------
            INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
            --------------
             
            ERROR 1062 (23000) at line 1: Duplicate entry '858-2025-02-10 06:39:03' for key 'PRIMARY'
            Count: 5100
            Count: 5200
            Count: 5300
            Count: 5400
            Count: 5500
            Count: 5600
            Count: 5700
            Count: 5800
            Count: 5900
            Count: 6000
            ./script1.sh: line 23: 2193819 Killed                  ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
            ./script1.sh: line 23: 2193829 Killed                  ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
            ...many similar such messages...
            ./script1.sh: line 23: 2194480 Killed                  ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
            ./script1.sh: line 23: 2194481 Killed                  ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
            Count: 6100
            Count: 6200
            Count: 6300
            Count: 6400
            Count: 6500
            ...script continues...
            

            Firstly, it seems interesting that the single '858-2025-02-10 06:39:03' duplicate entry occurrence happened well apart from the earlier batch of duplicate entries, again suggesting some sort of (highly sporadic) race condition bug, as we were discussing. Immediately after this (let's say around count 5500), I used CHECK TABLE and the table was fine:

            MDEV-32174 CS 10.6.21 831f5bc66f8d2147edd7991caf69e34058566b67 (Debug)

            10.6.21>CHECK TABLE t2 EXTENDED;
            +---------+-------+----------+----------+
            | Table   | Op    | Msg_type | Msg_text |
            +---------+-------+----------+----------+
            | test.t2 | check | status   | OK       |
            +---------+-------+----------+----------+
            1 row in set (0.037 sec)
            

            However, a little later the client kills happened, likely as a result of the automated server resource monitoring we use.
            Immediately after this, CHECK TABLE reported a warning:

            MDEV-32174 CS 10.6.21 831f5bc66f8d2147edd7991caf69e34058566b67 (Debug)

            10.6.21>CHECK TABLE t2 EXTENDED;
            +---------+-------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | Table   | Op    | Msg_type | Msg_text                                                                                                                                                                                                                                           |
            +---------+-------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | test.t2 | check | Warning  | InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4]    (0x80000002),[5]   i (0x99B5D469D5),[6]    \ (0x000000015CBF),[7]      T(0xD000000C0D0554),[8]        (0x0000000000000000)} |
            | test.t2 | check | status   | OK                                                                                                                                                                                                                                                 |
            +---------+-------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            2 rows in set (0.655 sec)
            

            And checking the error log, we see it sits right amid the killed connections:

            MDEV-32174 CS 10.6.21 831f5bc66f8d2147edd7991caf69e34058566b67 (Debug)

            ...more aborted connections before...
            2025-02-10  6:39:32 5829 [Warning] Aborted connection 5829 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
            2025-02-10  6:39:32 5832 [Warning] Aborted connection 5832 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
            2025-02-10  6:39:33 4052 [Warning] InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4]    (0x80000002),[5]   i (0x99B5D469D5),[6]    \ (0x000000015CBF),[7]      T(0xD000000C0D0554),[8]        (0x0000000000000000)}
            2025-02-10  6:39:33 5848 [Warning] Aborted connection 5848 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
            2025-02-10  6:39:33 5912 [Warning] Aborted connection 5912 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
            2025-02-10  6:39:34 5924 [Warning] Aborted connection 5924 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
            ...more aborted connections after...
            

            Could client kills (or flaky connections) be the cause not just of this warning but also of the earlier observed corruptions?

            Roel Van de Paar added a comment - edited

            The "Clustered index record with stale history in table" warning is readily reproducible, even at different locations:

            MDEV-32174 CS 10.6.21 831f5bc66f8d2147edd7991caf69e34058566b67 (Debug)

            | test.t2 | check | Warning  | InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4]   l(0x8000006C),[5]   z (0x99B5D47AE8),[6]      (0x0000000C86B7),[7]   "   (0xD00000220716A0),[8]        (0x0000000000000000)} |
            ...
            | test.t2 | check | Warning  | InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4]    (0x80000016),[5]   { (0x99B5D47B0C),[6]     a(0x0000000C8661),[7]      8(0xFA00000CF30938),[8]        (0x0000000000000000)} |
            ...
            | test.t2 | check | Warning  | InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4]    (0x80000020),[5]   {G(0x99B5D47B47),[6]     h(0x0000000DDD68),[7]   $ ) (0xC5000024D6299C),[8]        (0x0000000000000000)} |
            

            It seems these are "repaired" without problems; the table shows OK on a subsequent check. However, when many threads are running, these warnings are observed near-constantly.

            I also tried reproducing corruption by running a kill script which randomly kills connections, but no corruption was observed, which seems consistent with your findings.
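
            For context, the kill script was along these lines (a hypothetical sketch; the actual script is not attached, and client path and socket are placeholders):

            #!/bin/bash
            # Randomly kill one currently-running INSERT every 0.2 seconds.
            client=./bin/mariadb; socket=./socket.sock
            while true; do
              id=$(${client} -N -u root -S ${socket} -e "SELECT id FROM information_schema.processlist WHERE command='Query' AND info LIKE 'INSERT%' ORDER BY RAND() LIMIT 1")
              [ -n "${id}" ] && ${client} -u root -S ${socket} -e "KILL ${id}"
              sleep 0.2
            done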


            People

              Assignee: Marko Mäkelä (marko)
              Reporter: Bart-Jan Hilbrands (bjhilbrands)
              Votes: 0
              Watchers: 8

