  MariaDB Server / MDEV-32817

After recently upgrading to 10.11.5, shortly after frequent read and write operations on a table, "index for table xxxx is corrupt" appears, then "tablespace xxxxxx corrupted", and finally "Tablespace is missing for a table"; the table is completely unusable

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 10.11.5
    • Fix Version/s: N/A
    • Component/s: None
    • Labels: None

    Description



      Attachments

        Issue Links

          Activity

            1406921854@qq.com yangtao created issue -
            1406921854@qq.com yangtao made changes -
            Field Original Value New Value
            Attachment image-2023-11-16-15-06-43-866.png [ 72493 ]
            Attachment image-2023-11-16-15-06-13-780.png [ 72494 ]
            Attachment image-2023-11-16-15-05-26-913.png [ 72495 ]
            Affects Version/s 10.11.6 [ 29020 ]
            Description  !image-2023-11-16-15-05-26-913.png|thumbnail!
             !image-2023-11-16-15-06-13-780.png|thumbnail!
             !image-2023-11-16-15-06-43-866.png|thumbnail!
            Priority Major [ 3 ] Critical [ 2 ]
            Summary: (original value, truncated fragment) "After recently ... mara" → (new value) "After recently upgrading to 10.11.6, shortly after frequent read and write operations on the table, index for table xxxx is corrupt appears, then tablespace xxxxxx corrupted, and finally Tablespace is missing for a table; the table is completely unusable"
            1406921854@qq.com yangtao made changes -
            Affects Version/s 10.11.5 [ 29019 ]
            Affects Version/s 10.11.6 [ 29020 ]
            Summary: the version number in the title was changed from 10.11.6 to 10.11.5; the title is otherwise unchanged
            danblack Daniel Black added a comment -

            google translate:

            After recently upgrading the version to 10.11.6, shortly after frequent read and write operations on the table, "index for table xxxx is corrupt" appeared, then "tablespace xxxxxx corrupted", and finally "Tablespace is missing for a table"; this table is completely unavailable for use.

            Linked is a potentially related issue under investigation.

            danblack Daniel Black made changes -

            marko Marko Mäkelä added a comment -

            The text in the image file image-2023-11-16-15-06-43-866.png mentions a tablespace ID 281473165384968 (0xffff94095d08). This is somewhat strange. In MariaDB Server 10.7, the data type of fil_space_t::id was changed from ulint (size_t) to uint32_t, and I do not see where the tablespace ID could be sign-extended to only 48 bits instead of 64 bits. It turns out that this actually is a fil_space_t* being misinterpreted as unsigned long int in row_mysql_get_table_status():

            			if (push_warning) {
            				ib_push_warning(trx, DB_CORRUPTION,
            					"Table %s in tablespace %lu corrupted.",
            					table->name.m_name, table->space);
            			}
            

            Unfortunately, no progress has been made on MDEV-21978, and therefore this function is not declared with the GCC function attribute printf so that such mismatch could be caught at compilation time.
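            As an aside, here is a minimal sketch of how such a printf-style format attribute lets GCC or Clang flag the mismatch at compile time (the push_warning helper and the one-field fil_space_t below are simplified stand-ins, not the actual InnoDB declarations):

            #include <stdarg.h>
            #include <stdio.h>

            /* Simplified stand-in for ib_push_warning(), declared with a printf-style
               format attribute so that -Wformat checks the variadic arguments. */
            __attribute__((format(printf, 1, 2)))
            static void push_warning(const char *fmt, ...)
            {
                va_list ap;
                va_start(ap, fmt);
                vfprintf(stderr, fmt, ap);
                va_end(ap);
            }

            /* Simplified stand-in for the InnoDB tablespace object. */
            struct fil_space_t { unsigned id; };

            int main(void)
            {
                struct fil_space_t space = { 42 };
                /* -Wformat warns here: "%lu" expects unsigned long, but a pointer is
                   passed, which is the same mismatch as described above. */
                push_warning("Table %s in tablespace %lu corrupted.\n", "t1", &space);
                /* Correct: pass the 32-bit tablespace ID with a matching conversion. */
                push_warning("Table %s in tablespace %u corrupted.\n", "t1", space.id);
                return 0;
            }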

            If this is an LP64 platform, sizeof(unsigned long)=sizeof(void*) should hold, so the pointer in fact should be 0xffff94095d08. If this were an LLP64 platform (such as Microsoft Windows), then we should have sizeof(unsigned long)=4 and only the least significant 32 bits of the pointer should have been reported.

            The pointer looks invalid to me. I would expect functions like malloc() to return at least 16-byte aligned memory, but here the pointer is only aligned to 8 bytes.

            Which operating system (OS) and instruction set architecture (ISA) does this occur on? Has the system memory been tested? Is the bug repeatable when starting up the database on a copy of this data directory in another environment? If the hardware is fine, the cause of this could also be a memory corruption due to a software bug.

            1406921854@qq.com yangtao added a comment -

            This occurs on the Kylin operating system, on the ARM architecture. The system has more than 100 GB of memory available. The initial version was 10.11.4, where table index corruption appeared several times; we later found that 10.11.5 fixed many problems, so we upgraded to that version and then observed the current phenomenon (index for table is corrupt => table xxxx in tablespace xxxx corrupted => Tablespace is missing). No hardware problems were found.


            marko Marko Mäkelä added a comment -

            Thank you. The wrong error message is just icing on the cake; we would need to find the cause of the corruption. 10.11.4 suffered from the bug MDEV-31767, which I think triggered bogus error messages that claimed tables to be corrupted when there was no actual corruption.

            If you are using FOREIGN KEY constraints, the recently released 10.11.6 would include a fix of MDEV-30531. But, your error messages seem to suggest that the problem is something else, not limited to secondary indexes.

            Would it be possible to make a logical dump (mariadb-dump or mysqldump) and load it to a newly created MariaDB Server 10.11.6 instance? This would include some corruption bug fixes, such as MDEV-31826, MDEV-32552, and MDEV-32511.
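            For reference, such a dump and reload could look roughly like this (a sketch only; the file name is a placeholder and connection options are omitted):

            mariadb-dump --all-databases --routines --events > full_dump.sql
            mariadb < full_dump.sql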

            By starting the server from scratch, we could be sure that there is no pre-existing corruption in the data files. For example, if the undo logs or the transaction system metadata are in some way corrupted, that could cause trouble. Likewise, there could be some old garbage in the change buffer, even though it was already disabled by default for new data in MDEV-27734.

            1406921854@qq.com yangtao added a comment -

            We have reproduced the index corrupted problem once again:
            Warning : InnoDB: Index PRIMARY is marked as corrupted
            Warning : InnoDB: Index ix_tgom_operation_log_created_time is marked as corrupted
            Warning : InnoDB: Index ix_tgom_operation_log_status is marked as corrupted
            Warning : InnoDB: Index ix_tgom_operation_log_level is marked as corrupted
            error : Corrupt

            In the logs we again found entries from the same point in time:
            [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
            2023-11-20 1:00:05 15122 [ERROR] mariadbd: Index for table 'tgom_operation_log' is corrupt; try to repair it

            The log timestamps of this anomaly and the previous one are about the same. The operations common to both incidents are: 1. at that point in time the table was under heavy read and write load; 2. at that point in time a mysqldump backup was being executed.


            marko Marko Mäkelä added a comment -

            The use of mysqldump (or mariadb-dump) could cause a lot of data to be loaded from the file system to the InnoDB buffer pool. If you were using a version that was affected by the race condition bug MDEV-31767, this could be a "false alarm", that is, the data is not actually corrupted. But, you reported this bug for MariaDB 10.11.5.

            Unfortunately, it looks like the table really is corrupted. What would the following statement report?

            CHECK TABLE tgom_operation_log EXTENDED;
            

            It could be a good idea to execute that for all tables. There is also a separate mariadb-check or mysqlcheck utility for running such commands; see MDEV-30129.
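            For example, the extended check can be run against every table from the command line (a sketch; connection options omitted):

            mariadb-check --all-databases --extended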

            If you do not have a backup of a corrupted table, you could try to dump its contents in multiple parts:

            SELECT * FROM tgom_operation_log WHERE pk > XXXX;
            

            Replace pk with the name of the PRIMARY KEY column, and XXXX with something larger than the latest primary key value that was reported in the previous SELECT.
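            As a sketch of that approach (id stands in for the actual PRIMARY KEY column; the LIMIT only keeps each pass small):

            SELECT * FROM tgom_operation_log WHERE id > 0 ORDER BY id LIMIT 100000;
            -- note the largest id returned, then continue past it:
            SELECT * FROM tgom_operation_log WHERE id > 100000 ORDER BY id LIMIT 100000;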

            1406921854@qq.com yangtao added a comment -

            After executing CHECK TABLE tgom_operation_log EXTENDED, the result is as follows:

            Table                    | Op    | Msg_type | Msg_text
            tgom.tgom_operation_log  | check | Warning  | InnoDB: Index PRIMARY is marked as corrupted
            tgom.tgom_operation_log  | check | Warning  | InnoDB: Index ix_tgom_operation_log_created_time is marked as corrupted
            tgom.tgom_operation_log  | check | Warning  | InnoDB: Index ix_tgom_operation_log_status is marked as corrupted
            tgom.tgom_operation_log  | check | Warning  | InnoDB: Index ix_tgom_operation_log_level is marked as corrupted
            tgom.tgom_operation_log  | check | error    | Corrupt

            What exactly could the root cause of this be, and has it been fixed in 10.11.6? (We are using a General cluster.)


            marko Marko Mäkelä added a comment -

            Is there any more output from the CHECK TABLE statement in the server error log?

            It is hard to say for sure if this error has been fixed, because we do not know what exactly caused this corruption in the first place. A number of bugs that affect crash recovery and backup have been fixed between 10.11.4 and 10.11.6. A good candidate would be MDEV-32552. It has been there for a rather long time, but we only noticed it recently.

            I would suggest to upgrade the server to 10.11.6 and restore everything from a logical dump (create a new data directory). If the corruption still occurs, we would need more details to analyze the root cause.


            marko Marko Mäkelä added a comment -

            This is a long shot, but MDEV-29832 and MDEV-29610 hint that this could also be a Linux kernel bug. In MDEV-30728, you can find links to a kernel bug that was fixed on POWER. I can imagine that something similar could affect specific ARM setups, such as multi-socket CPU configurations. Note: I do not think that MDEV-28430 can cause this form of corruption.

            1406921854@qq.com yangtao added a comment -

            However, in the links you gave I saw a related issue, MDEV-32170, with a comment saying that downgrading from 10.11 to 10.3 or 10.5 completely solves the corruption problem. Is that true?

            marko Marko Mäkelä added a comment - - edited

            If there is a Linux kernel bug that would explain this corruption, that would likely be in the io_uring interface for which MariaDB implemented support in MDEV-24883. It is also possible to compile the server with the older libaio (io_submit(), io_getevents()). The libaio interface is much older and more stable, originally appearing somewhere in Linux 2.6.

            1406921854@qq.com yangtao added a comment -

            We did not specify anything when compiling the server; it seems libaio is used by default.

            1406921854@qq.com yangtao added a comment -

            In the system log we found that the disk where our service resides frequently prints "VFS: Open an exclusive opened block device for write xxxx". We do not know whether this could have an impact.


            marko Marko Mäkelä added a comment -

            Thank you for the update. By default, the MariaDB build scripts use whatever is available. If there is libaio-dev but not liburing-dev installed in the build environment, then the older interface will be used. Because your build used libaio, we can rule out a possible kernel bug that would affect io_uring on ARM.

            What is your storage layer like? Which file system(s) is the data stored on? Is there any LVM or software RAID? NFS? NAS? You might want to check if reverting to buffered I/O (innodb_flush_method=fsync) would stop the corruption from occurring. The default was changed in MDEV-24854.
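            For example, a minimal my.cnf fragment for that test (a sketch; section placement may differ in your setup):

            [mariadb]
            innodb_flush_method = fsync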

            danblack, since you are much more familiar with the Linux kernel than me, can you comment on that VFS error message?

            danblack Daniel Black added a comment -

            It's from https://patchwork.kernel.org/project/linux-fsdevel/patch/1557845106-60163-2-git-send-email-yi.zhang@huawei.com/

            Notably this commit isn't in the current Linus kernel.

            Per the message it is from two open write handles on the same block device.

            Are you using block devices directly for InnoDB? Any idea what else could be running? An LVM resize is a possibility.

            Per the commit message it need not be the cause, but it would be good to know which things are opening the block device.

            Storage information previously asked for would be quite useful. Which kernel version?

            1406921854@qq.com yangtao made changes -
            Attachment screenshot-1.png [ 72517 ]
            1406921854@qq.com yangtao added a comment -

            Hello, recently we have been trying to reproduce this in order to find the necessary conditions for it to occur. The latest reproduction happened again on 10.11.5.

            The mysql log only shows:
            2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 1073 of index `PRIMARY` of table `tgom`.`tgom_operation_log`
            InnoDB: broken FIL_PAGE_NEXT link
            2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 446 of index `ix_tgom_operation_log_level` of table `tgom`.`tgom_operation_log`
            InnoDB: broken FIL_PAGE_NEXT link
            2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 439 of index `ix_tgom_operation_log_created_time` of table `tgom`.`tgom_operation_log`
            InnoDB: broken FIL_PAGE_NEXT link
            2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 3560 of index `ix_tgom_operation_log_status` of table `tgom`.`tgom_operation_log`
            InnoDB: broken FIL_PAGE_NEXT link

            We are now also trying the same method on 10.11.6 to see whether it can be reproduced there.


            marko Marko Mäkelä added a comment -

            1406921854@qq.com, thank you, and good luck with narrowing this down. Currently I would suspect a bug in the kernel version that you are using, but we will see how it turns out.

            1406921854@qq.com yangtao made changes -
            Attachment screenshot-2.png [ 72523 ]
            1406921854@qq.com yangtao added a comment -

            Hello, after testing and verification, we have now also reproduced this on 10.11.6:


            marko Marko Mäkelä added a comment -

            The 18446744073709551615 in screenshot-2.png corresponds to ULINT_UNDEFINED in the source code. I think that we report that value when the PRIMARY key index tree (also known as the clustered index) is corrupted. That the "next" and "previous" page links are inconsistent is a serious corruption.

            1406921854@qq.com yangtao added a comment - - edited

            We have now downgraded to 10.3.39 and tried to reproduce it with the same method; after running for nearly a day, the problem has not reappeared.


            marko Marko Mäkelä added a comment -

            Thank you. Is the problem reproducible in 10.11.6 if you configure innodb_flush_method=fsync? The recent new default O_DIRECT (MDEV-24854) turned out to cause corruption on some file systems in some Linux kernel versions.

            serg Sergei Golubchik made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]

            marko Marko Mäkelä added a comment -

            If you run MariaDB Server 10.3.39 with innodb_flush_method=O_DIRECT, will you experience similar corruption? The Linux kernel bug that was behind MDEV-30728 is specific to the POWER ISA, so it should be unrelated to the ARMv8 ISA that you are using.

            1406921854@qq.com yangtao added a comment -

            We are currently running 10.3.39 with innodb_flush_method=fsync; we have verified this several times and the problem has not reappeared.

            serg Sergei Golubchik made changes -
            Assignee Marko Mäkelä [ marko ]
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]

            marko Marko Mäkelä added a comment -

            Would MariaDB Server 10.3.39 start to corrupt data when you set the parameter innodb_flush_method=O_DIRECT?

            Would MariaDB Server 10.11.6 avoid data corruption if you set innodb_flush_method=fsync?

            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]
            elenst Elena Stepanova made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Incomplete [ 4 ]
            Status Needs Feedback [ 10501 ] Closed [ 6 ]
            fuxuegang daeco added a comment -

            With MariaDB Server 10.11.6, innodb_flush_method=fsync, on arm64, table index corruption occurs:

            sysbench.sbtest11
            Warning : InnoDB: The B-tree of index k_11 is corrupted.
            Warning : InnoDB: Index 'k_11' contains 7190331 entries, should be 10000000.
            error : Corrupt
            sysbench.sbtest13
            Warning : InnoDB: The B-tree of index k_13 is corrupted.
            Warning : InnoDB: Index 'k_13' contains 825258 entries, should be 10000000.
            error : Corrupt

            marko Marko Mäkelä added a comment - - edited

            fuxuegang, which Linux kernel version and file system did you use? I see that you tried with innodb_flush_method=fsync, like I suggested earlier.


            marko Marko Mäkelä added a comment -

            I went through the InnoDB changes since the 10.11.6 release (November 2023), and I did not find anything that would obviously explain this. MDEV-33379 is not applicable, because you can reproduce the corruption also with innodb_flush_method=fsync. Nevertheless, it would be interesting to repeat this experiment with the latest 10.11 release (currently 10.11.8, released in May 2024).

            marko Marko Mäkelä made changes -
            Resolution Incomplete [ 4 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s N/A [ 14700 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] Needs Feedback [ 10501 ]
            fuxuegang daeco added a comment -

            Marko Mäkelä, Linux version 4.19.90-24.4.v2101.ky10.aarch64 ext4


            marko Marko Mäkelä added a comment -

            OK, so this must be using the older libaio interface and not io_uring. What is the CPU type?

            fuxuegang daeco added a comment -

            Architecture: aarch64
            CPU op-mode(s): 64-bit
            Byte Order: Little Endian
            CPU(s): 8
            On-line CPU(s) list: 0-7
            Thread(s) per core: 1
            Core(s) per socket: 8
            Socket(s): 1
            NUMA node(s): 1


            marko Marko Mäkelä added a comment -

            Can you get more specific data from /proc/cpuinfo or elsewhere? I mean the manufacturer and the model. Hardware bugs related to caches are not impossible; see for example https://lore.kernel.org/dri-devel/20240617105846.1516006-1-uwu@icenowy.me/ (different ISA, so not directly related to this).
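            For example, on aarch64 something like this usually identifies the core (a sketch; the exact fields printed vary by kernel):

            grep -E 'implementer|part|variant|revision' /proc/cpuinfo | sort -u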

            We do have several ARMv8 targets in buildbot.mariadb.org, and I think that tests have been rather stable ever since MDEV-28430 was fixed. That said, the regression test suite does not involve much concurrency; it typically plays a single-threaded test client against each mariadbd process, and there are only a few tests that really exercise the buffer pool or crash recovery. Most of our own performance testing tends to take place on various implementations of the x86-64 ISA.

            ls ls added a comment -

            My MariaDB 10.11.5 also encountered the same problem on the x86_64 architecture. I compiled MariaDB 10.11.5 on a CentOS 8.1 system and ran a stress test. I tested it once yesterday and again today; both times, the error "Index for table 'sbtestxx' is corrupt; try to repair it" appeared. The bug shows up every time.
            https://jira.mariadb.org/browse/MDEV-34479

            I also tested it on a CentOS 8.1 system on aarch64, and the same problem appeared.
            https://jira.mariadb.org/browse/MDEV-34480

            But when I used the official mariadb-10.11.5-linux-systemd-x86_64.tar.gz package in the CentOS 8.1 x86_64 environment, this problem did not occur.
            Based on the information from my tests above, my judgment is that MariaDB 10.11.5 compiled on a CentOS 8.1 system will have this problem, and that it has nothing to do with the CPU architecture. There is also a separate "InnoDB: Assertion failure" problem on the aarch64 architecture; it is not clear whether it has the same cause as the problem discussed here.


            marko Marko Mäkelä added a comment -

            ls, because you have reproduced this in multiple environments, it should be a software bug. There have been some fixes related to InnoDB corruption handling between 10.11.5 and 10.11.8, but I cannot find anything obvious that would explain this.

            Still, I would recommend that you test MariaDB Server 10.11.8. On high-end aarch64 implementations it would also be interesting to know whether the configuration parameter that was introduced in MDEV-33515 would be useful.

            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            marko Marko Mäkelä added a comment -

            I will leave on vacation, but I think that it is important for us to reproduce this corruption.

            Can someone share the configuration parameters and a Sysbench invocation for reproducing this?

            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Debarun Banerjee [ JIRAUSER54513 ]
            ls ls added a comment - - edited

            I tested MariaDB 10.11.8 compiled and run on CentOS 8.1 (aarch64 and x86_64): no exception occurs, and the sysbench data import succeeds.

            I also tested compiling MariaDB 10.11.5 in a CentOS Stream 9 x86_64 environment: no exception occurs.


            marko Marko Mäkelä added a comment -

            ls, thank you. It is annoying when we don’t know if a bug still exists or where it might have been fixed. Come to think of it, MDEV-33588 #3098 may not have been a mere performance improvement. It could also have fixed some operations where we would buffer-fix blocks but not acquire a page latch. Unlike x86, ARM and POWER use weak memory ordering, and therefore some race conditions are more easily reproducible there. Earlier, the similar bug MDEV-31767 was fixed, but that fix appeared already in 10.11.5. That bug was reproducible on x86 with some effort.

            Can you try to revert the MDEV-33588 fix or test its parent commit, to see if this is the explanation?

            ls ls added a comment -

            Just saw your comment above. I have put together the system environment, the compilation process and commands, and the my.cnf configuration file needed to reproduce this problem in MDEV-34480. I also wrote down the sysbench command that was executing when the failure occurred. I think this information should be enough to reproduce it.
            The following disk and file system information may also be helpful:
            Disk: SSD
            File system: ext4

            ls ls added a comment -

            Nearly two weeks have passed. I wonder whether the cause of this problem has been determined. I am considering upgrading to 10.11.8 soon, but since I am not sure what caused this problem, I am not sure whether 10.11.8 will hit it again, and that is keeping me from upgrading.

            danblack Daniel Black added a comment -

            I tried the sysbench from MDEV-34480 (without your configuration file, and with the InnoDB buffer pool size set to 5G) and got the errors in MDEV-34566.

            I encourage you to try the docker compose file and adjust the version in the container image. It should work just as well on aarch64 (though the sysbench image is amd64 only, sorry; it will probably emulate fast enough to generate a load).

            debarun Debarun Banerjee added a comment - - edited

            I ran sysbench with the exact same configuration as in MDEV-34479 on x86-64. My machine has 16 cores with 24 CPUs (hyper-threaded). I tried using 4, 8, and all of the cores.

            taskset -c 0,2,4,6,8,10,12,14 /home/hdd/deb/maria-src5/bld_rel_10.11.5_7875294b6b/sql/mariadbd
            

            The load was successful in all cases and the issue didn't repeat. Looking back, I see that ls already mentioned that the test runs fine with the official MariaDB binary.

            So, we have 3 bugs reported from this issue. Here is what I understand as the current state.

            MDEV-34479: x86_64: Index corrupt during sysbench load. Requires building MariaDB from source with specific CentOS version to repeat.
            MDEV-34480: aarch64: Page number, offset assert. Requires building MariaDB from source with specific CentOS version to repeat.

            To investigate MDEV-34779/MDEV-3448 further, we need to first repeat it by re-building mariadb in the exact environment specified in respective MDEVs with centos, compiler version etc.

            MDEV-34566: "Crash recovery was broken" message. This is unrelated to the other 2 bugs and happens because redo log size is kept at 96M. This is too small and there is not enough margin when 32 concurrent transactions are running. This is known limitation in Innodb in general and is generally resolved with larger redo log size. We could improve the error message here. Solving the root cause would require major design overhaul and may not justify the ROI as it is generally not encountered by the user base if redo is well configured.


            marko Marko Mäkelä added a comment -

            ls, I believe that MDEV-33588 could have fixed this. The fix was included in the previous quarterly releases (including 10.11.8), over 3 months ago. The most recent quarterly releases included some performance fixes, but sadly not a fix of MDEV-34689, which I think would be much harder to hit than the bugs that MDEV-33588 fixed.

            It looks like debarun mistyped MDEV-34479, which is another ticket from you. Can you reproduce the corruption with 10.11.8 or 10.11.9?

            marko Marko Mäkelä made changes -

            marko Marko Mäkelä added a comment -

            I filed MDEV-34823 for the incorrect tablespace ID output in the corruption message.

            serg Sergei Golubchik made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Incomplete [ 4 ]
            Status Needs Feedback [ 10501 ] Closed [ 6 ]

            People

              Assignee: debarun Debarun Banerjee
              Reporter: 1406921854@qq.com yangtao
              Votes: 0
              Watchers: 10
