  1. MariaDB Server
  2. MDEV-27401

Randomly destroyed tables after migration to 10.6.4 (also under 10.6.5)

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.6.4
    • Fix Version/s: 10.5.13, 10.6.5

    Description

      Hi,

      About two weeks ago, I updated from OpenBSD 6.9 (MariaDB 10.5.9) to OpenBSD 7.0 (MariaDB 10.6.4). Since then, one of my tables breaks nearly every day. Each time it is a different, seemingly random table in a random database. If I "dump", "select", or "check" that table, the MariaDB server leaves this plane of existence for some time before coming back to life. This can happen to tables that were created before or after the update. I also noticed that only chunks of a table are damaged: if I "select" rows before or after the broken segment, I can still get the data.

      At first I thought that something was wrong with the official OpenBSD package, or that some libraries were not the same as in the official OpenBSD build. So I compiled 10.6.5 from source on that server.

      cogniumweb# mysql_upgrade -u root -p
      Enter password:
      Phase 1/7: Checking and upgrading mysql database
      Processing databases
      mysql
      mysql.column_stats OK
      [...]
      Phase 2/7: Installing used storage engines... Skipped
      Phase 3/7: Fixing views
      [...]
      Phase 4/7: Running 'mysql_fix_privilege_tables'
      Phase 5/7: Fixing table and database names
      Phase 6/7: Checking and upgrading tables
      Processing databases
      [...]
      ntcognium.oc_file_locks OK
      ntcognium.oc_filecache OK
      ntcognium.oc_filecache_extended OK
      ntcognium.oc_files_trash OK
      [...]
      Phase 7/7: Running 'FLUSH PRIVILEGES'
      OK

      cogniumweb# mysqlcheck -A -u root -p
      [...]
      ntcognium.oc_federated_reshares OK
      ntcognium.oc_file_locks OK
      mysqlcheck: Got error: 2013: Lost connection to server during query when executing 'CHECK TABLE ... '

      I dropped that table and then recreated it via:
      CREATE TABLE `oc_filecache` (
      `fileid` bigint(20) NOT NULL AUTO_INCREMENT,
      `storage` bigint(20) NOT NULL DEFAULT 0,
      `path` varchar(4000) COLLATE utf8mb4_bin DEFAULT NULL,
      `path_hash` varchar(32) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
      `parent` bigint(20) NOT NULL DEFAULT 0,
      `name` varchar(250) COLLATE utf8mb4_bin DEFAULT NULL,
      `mimetype` bigint(20) NOT NULL DEFAULT 0,
      `mimepart` bigint(20) NOT NULL DEFAULT 0,
      `size` bigint(20) NOT NULL DEFAULT 0,
      `mtime` bigint(20) NOT NULL DEFAULT 0,
      `storage_mtime` bigint(20) NOT NULL DEFAULT 0,
      `encrypted` int(11) NOT NULL DEFAULT 0,
      `unencrypted_size` bigint(20) NOT NULL DEFAULT 0,
      `etag` varchar(40) COLLATE utf8mb4_bin DEFAULT NULL,
      `permissions` int(11) DEFAULT 0,
      `checksum` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
      PRIMARY KEY (`fileid`),
      UNIQUE KEY `fs_storage_path_hash` (`storage`,`path_hash`),
      KEY `fs_parent_name_hash` (`parent`,`name`),
      KEY `fs_storage_mimetype` (`storage`,`mimetype`),
      KEY `fs_storage_mimepart` (`storage`,`mimepart`),
      KEY `fs_storage_size` (`storage`,`size`,`fileid`),
      KEY `fs_id_storage_size` (`fileid`,`storage`,`size`),
      KEY `fs_mtime` (`mtime`),
      KEY `fs_size` (`size`),
      KEY `fs_storage_path_prefix` (`storage`,`path`(64))
      ) ENGINE=InnoDB AUTO_INCREMENT=1009 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED;

      Now the system is happy again, but I expect it to kill the next table tomorrow...
      It has happened to Nextcloud databases as well as Drupal databases.

      I am running out of ideas about what to do... Help? Please!

      best wishes
      David

      Attachments

        1. err
          8 kB
        2. my.cnf
          0.9 kB

        Issue Links

          Activity

            marko Marko Mäkelä added a comment:

            The 10.6.4 release would be affected by a bug that was introduced in MDEV-26040 and fixed in MDEV-26537.

            Do you get files corrupted if you upgrade straight from 10.5.9 to 10.6.5?

            davrot David Rotermund added a comment:

            Sorry, I don't have that kind of backup of the old database to test it. (My backups are SQL dumps of the databases, and I don't keep backups of the /var/mysql directory for more than a few days.)

            marko Marko Mäkelä added a comment:

            davrot, let me put it another way: Does MariaDB 10.6.5 corrupt your files if you start from a SQL dump? I am pretty sure that 10.6.4 can do that when using an operating system or file system where stat() or fstat() reports an st_blksize that is not a power of 2. In a comment in MDEV-26537 you can find a simple test that I used for repeating the problem:

            seq -f 'create table t%g engine=innodb select * from seq_1_to_200000;' 1 100|mysql test&
            seq -f 'create table u%g engine=innodb select * from seq_1_to_200000;' 1 100|mysql test&

            When the server is configured with a small enough innodb_buffer_pool_size, it should notice the corruption when it has to read pages back from the file system. The bug was that data files were being corrupted while they were being extended.
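The condition mentioned above (an st_blksize that is not a power of 2) can be checked from user space. The following is a minimal sketch in Python, assuming a Unix-like system where os.stat() exposes st_blksize. The function names is_power_of_two and blksize_report are hypothetical and chosen only for illustration. The sketch only reports what the operating system returns for a path; it does not detect corruption and is not the check that the server itself performs.

```python
import os


def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def blksize_report(path):
    """Return (st_blksize, is_power_of_two) as reported by stat() for path.

    Assumption: st_blksize is available, which holds on Unix-like systems.
    """
    blksize = os.stat(path).st_blksize
    return blksize, is_power_of_two(blksize)


if __name__ == "__main__":
    # "." is a placeholder; point this at the data directory you care about.
    size, ok = blksize_report(".")
    print("st_blksize=%d power_of_two=%s" % (size, ok))
```

Running it against the data directory (for example /var/mysql, the directory named in this thread) would show whether that file system reports a block size that is not a power of 2.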

            davrot David Rotermund added a comment:

            I tried to set innodb_buffer_pool_size=1M, but the server automatically increased it to 5M. Nevertheless, the test database grew to 1.6 GB and was happy. Afterwards I ran mysqlcheck on the database and it was still happy. I did this several times and everything was okay.

            The moral of the story for me: I will drop all my databases, re-populate them from the backup SQL dumps, and wait...

            Thank you for your help!
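To repeat the test from this thread with a different number of tables or rows, the statements that the seq -f one-liners emit can be generated programmatically. This is a small sketch only: the function name create_statements and its parameters are hypothetical, and it merely builds the statement text. Feeding the two streams to the server concurrently, as the original shell lines do with "|mysql test&", is left out.

```python
def create_statements(prefix, count=100, rows=200000):
    """Build the CREATE TABLE ... SELECT statements for tables prefix1..prefixN.

    The statement text and the seq_1_to_N table name are copied from the
    one-liner quoted in this thread; count and rows default to its values.
    """
    return [
        "create table %s%d engine=innodb select * from seq_1_to_%d;" % (prefix, i, rows)
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Print the first stream (tables t1..t100); the second stream uses prefix "u".
    for statement in create_statements("t"):
        print(statement)
```

The output of create_statements("t") and create_statements("u") could then be piped into two separate client sessions to reproduce the concurrent load.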

            marko Marko Mäkelä added a comment:

            davrot, thank you for confirming that this was likely a duplicate of MDEV-26537.

            Again, I am sorry that we do not have any flavor of BSD in any of our CI systems. OK, except maybe for IBM AIX, which currently fails to build 10.6 or later due to recently changed build-time dependencies.

            People

              marko Marko Mäkelä
              davrot David Rotermund