  1. MariaDB Server
  2. MDEV-32817

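After a recent upgrade to 10.11.5, frequent reads and writes against a table soon produce "index for table xxxx is corrupt", then "tablespace xxxxxx corrupted", and finally "Tablespace is missing for a table"; the table is now completely unusable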

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 10.11.5
    • Fix Version/s: N/A
    • Component/s: None
    • Labels: None

          Activity

            ls ls added a comment -

            Nearly two weeks have passed. Has the cause of this problem been determined? I am considering upgrading to 10.11.8, but since I do not know what caused the problem, I cannot be sure that 10.11.8 is free of it, and that is keeping me from upgrading.

            danblack Daniel Black added a comment -

            I tried the sysbench test from MDEV-34480 (without your configuration file, and with the InnoDB buffer pool size set to 5G) and got the errors in MDEV-34566.

            I encourage you to try the docker compose file and adjust the version in the container image. It should work just as well on AArch64 (sorry, the sysbench image is amd64 only, but it will probably emulate fast enough to generate a load).

            debarun Debarun Banerjee added a comment - - edited

            I ran sysbench with the exact same configuration as in MDEV-34479 on x86-64. My machine has 16 cores and 24 CPUs (hyper-threaded). I tried using 4, 8, and all of the cores.

            taskset -c 0,2,4,6,8,10,12,14 /home/hdd/deb/maria-src5/bld_rel_10.11.5_7875294b6b/sql/mariadbd
            

            The load was successful in all cases and the issue didn't repeat. Looking back, I see that ls already mentioned that the test runs fine with the official MariaDB binary.

            So, we have 3 bugs reported from this issue. Here is what I understand as the current state.

            MDEV-34479: x86_64: Index corrupt during sysbench load. Requires building MariaDB from source with a specific CentOS version to repeat.
            MDEV-34480: aarch64: Page number, offset assert. Requires building MariaDB from source with a specific CentOS version to repeat.

            To investigate MDEV-34479/MDEV-34480 further, we first need to repeat them by rebuilding MariaDB in the exact environment specified in the respective MDEVs (CentOS version, compiler version, etc.).

            MDEV-34566: "Crash recovery was broken" message. This is unrelated to the other two bugs and happens because the redo log size is kept at 96M. That is too small and leaves insufficient margin when 32 concurrent transactions are running. This is a known limitation of InnoDB and is generally resolved by using a larger redo log. We could improve the error message here. Solving the root cause would require a major design overhaul, which may not justify the ROI because users generally do not encounter the problem when the redo log is well configured.
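
            As a minimal sketch of the larger-redo-log workaround mentioned for MDEV-34566, the option-file fragment below raises innodb_log_file_size. The 1G value and the choice of the [mariadb] option group are illustrative assumptions, not settings taken from this ticket; a suitable size depends on the write load.

            ```ini
            # Hypothetical option-file fragment; the size is an assumed example value.
            [mariadb]
            # Redo log size. The test configuration discussed above used 96M.
            innodb_log_file_size = 1G
            ```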


            marko Marko Mäkelä added a comment -

            ls, I believe that MDEV-33588 could have fixed this. The fix was included in the previous quarterly releases (including 10.11.8), over 3 months ago. The most recent quarterly releases included some performance fixes, but sadly not a fix of MDEV-34689, which I think would be much harder to hit than the bugs that MDEV-33588 fixed.

            It looks like debarun mistyped MDEV-34479, which is another ticket from you. Can you reproduce the corruption with 10.11.8 or 10.11.9?

            marko Marko Mäkelä added a comment -

            I filed MDEV-34823 for the incorrect tablespace ID output in the corruption message.

            People

              debarun Debarun Banerjee
              1406921854@qq.com yangtao
              Votes: 0
              Watchers: 10
