[MDEV-32817] 在最近将版本升级到10.11.5后,针对表进行频繁的读写操作不久后,出现index for table xxxx is corrupt,随后此表tablespace xxxxxx corrupted,最后Tablespace is missing for a table,此表已完全不可用 Created: 2023-11-16  Updated: 2024-02-07

Status: Needs Feedback
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.11.5
Fix Version/s: 10.11

Type: Bug Priority: Critical
Reporter: yangtao Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File image-2023-11-16-15-05-26-913.png     PNG File image-2023-11-16-15-06-13-780.png     PNG File image-2023-11-16-15-06-43-866.png     PNG File screenshot-1.png     PNG File screenshot-2.png    
Issue Links:
Relates
relates to MDEV-32811 Potentially broken crash recovery if ... Closed

 Description   





 Comments   
Comment by Daniel Black [ 2023-11-16 ]

google translate:

After recently upgrading the version to 10.11.6, shortly after frequent read and write operations on the table, index for table xxxx is corrupt appeared, then tablespace xxxxxx corrupted, and finally Tablespace is missing for a table, this table is completely unavailable. use

Linked is potentially related issue under investigation.

Comment by Marko Mäkelä [ 2023-11-16 ]

The text in the image file image-2023-11-16-15-06-13-780.png mentions a tablespace ID 281473165384968 (0xffff94095d08). This is somewhat strange. In MariaDB Server 10.7, the data type of fil_space_t::id was changed from ulint (size_t) to uint32_t. I do not see where the tablespace ID could be sign-extended, to only 48 bits, instead of 64 bits. It turns out that this actually is a fil_space_t* being mis-interpreted as unsigned long int in row_mysql_get_table_status():

			if (push_warning) {
				ib_push_warning(trx, DB_CORRUPTION,
					"Table %s in tablespace %lu corrupted.",
					table->name.m_name, table->space);
			}

Unfortunately, no progress has been made on MDEV-21978, and therefore this function is not declared with the GCC function attribute printf so that such mismatch could be caught at compilation time.

If this is a LP64 platform, sizeof(unsigned long)=sizeof(void*) should hold, so the pointer in fact should be 0xffff94095d08. If this were an LLP64 platform (such as Microsoft Windows), then we should have sizeof(unsigned long)=4 and only the least significant 32 bits of the pointer should have been reported.

The pointer looks invalid to me. I would expect functions like malloc() to return at least 16-byte aligned memory, but here the pointer is only aligned to 8 bytes.

Which operating system (OS) and instruction set architecture (ISA) does this occur on? Has the system memory been tested? Is the bug repeatable when starting up the database on a copy of this data directory in another environment? If the hardware is fine, the cause of this could also be a memory corruption due to a software bug.

Comment by yangtao [ 2023-11-16 ]

此现象发生在麒麟操作系统,ARM架构。系统内存有一百多G可用;起初版本是10.11.4,多次出现过表索引损坏的现象,后续发现10.11.5版本修复了很多问题,故将版本升级到此版本后,发现了当前现象(index for table is corrupt => table xxxx in tablespace xxxx corrupted => Tablespace is missing),未发现硬件问题

Comment by Marko Mäkelä [ 2023-11-16 ]

Thank you. The wrong error message is just an "icing on the cake"; we would need to find the cause of the corruption. 10.11.4 suffered from the bug MDEV-31767, which I think triggered bogus error messages that claimed tables to be corrupted when there was no actual corruption.

If you are using FOREIGN KEY constraints, the recently released 10.11.6 would include a fix of MDEV-30531. But, your error messages seem to suggest that the problem is something else, not limited to secondary indexes.

Would it be possible to make a logical dump (mariadb-dump or mysqldump) and load it to a newly created MariaDB Server 10.11.6 instance? This would include some corruption bug fixes, such as MDEV-31826, MDEV-32552, and MDEV-32511.

By starting the server from the scratch, we could be sure that there is no pre-existing corruption in the data files. For example, if the undo logs or the transaction system metadata are in some way corrupted, that could cause trouble. Likewise, there could be some old garbage in the change buffer, even if it was disabled by default for new data already in MDEV-27734.

Comment by yangtao [ 2023-11-20 ]

我们再一次复现出index corrupted的问题:
Warning : InnoDB: Index PRIMARY is marked as corrupted
Warning : InnoDB: Index ix_tgom_operation_log_created_time is marked as corrupted
Warning : InnoDB: Index ix_tgom_operation_log_status is marked as corrupted
Warning : InnoDB: Index ix_tgom_operation_log_level is marked as corrupted
error : Corrupt

通过日志再次发现同时间点出现的日志:
[ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-20 1:00:05 15122 [ERROR] mariadbd: Index for table 'tgom_operation_log' is corrupt; try to repair it

这次与上次异常打印的日志的时间点都差不多,两次异常相同的操作有:1、当时时间点对此表有大量的读写操作,2、当时时间点有执行过mysqldump进行备份。

Comment by Marko Mäkelä [ 2023-11-20 ]

The use of mysqldump (or mariadb-dump) could cause a lot of data to be loaded from the file system to the InnoDB buffer pool. If you were using a version that was affected by the race condition bug MDEV-31767, this could be a "false alarm", that is, the data is not actually corrupted. But, you reported this bug for MariaDB 10.11.5.

Unfortunately, it looks like the table really is corrupted. What would the following statement report?

CHECK TABLE tgom_operation_log EXTENDED;

It could be a good idea to execute that for all tables. There is also a separate mariadb-check or mysqlcheck utility for running such commands; see MDEV-30129.

If you do not have a backup of a corrupted table, you could try to dump its contents in multiple parts:

SELECT * FROM tgom_operation_log WHERE pk > XXXX;

Replace pk with the name of the PRIMARY KEY column, and XXXX with something larger than the latest primary key value that was reported in the previous SELECT.

Comment by yangtao [ 2023-11-20 ]

执行CHECK TABLE tgom_operation_log EXTENDED后结果如下:
-------------------------------------------------------------------------------------------------------------+

Table Op Msg_type Msg_text

-------------------------------------------------------------------------------------------------------------+

tgom.tgom_operation_log check Warning InnoDB: Index PRIMARY is marked as corrupted
tgom.tgom_operation_log check Warning InnoDB: Index ix_tgom_operation_log_created_time is marked as corrupted
tgom.tgom_operation_log check Warning InnoDB: Index ix_tgom_operation_log_status is marked as corrupted
tgom.tgom_operation_log check Warning InnoDB: Index ix_tgom_operation_log_level is marked as corrupted
tgom.tgom_operation_log check error Corrupt

-------------------------------------------------------------------------------------------------------------+
此现象的根因到底会是什么呢,而且10.11.6版本是否修复了呢(我们使用的是General集群)

Comment by Marko Mäkelä [ 2023-11-20 ]

Is there any more output from the CHECK TABLE statement in the server error log?

It is hard to say for sure if this error has been fixed, because we do not know what exactly caused this corruption in the first place. A number of bugs that affect crash recovery and backup have been fixed between 10.11.4 and 10.11.6. A good candidate would be MDEV-32552. It has been there for a rather long time, but we only noticed it recently.

I would suggest to upgrade the server to 10.11.6 and restore everything from a logical dump (create a new data directory). If the corruption still occurs, we would need more details to analyze the root cause.

Comment by Marko Mäkelä [ 2023-11-20 ]

This is a long shot, but MDEV-29832 and MDEV-29610 hint that this could also be a Linux kernel bug. In MDEV-30728, you can find links to a kernel bug that was fixed on POWER. I can imagine that something similar could affect specific ARM setups, such as multi-socket CPU configurations. Note: I do not think that MDEV-28430 can cause this form of corruption.

Comment by yangtao [ 2023-11-20 ]

但是我在你给的链接中看到一个关联的 MDEV-32170 看到一个评论:将10.11降级到10.3或10.5就可以完全解决损坏问题,是真的吗

Comment by Marko Mäkelä [ 2023-11-20 ]

If there is a Linux kernel bug that would explain this corruption, that would likely be in the io_uring interface for which MariaDB implemented support in MDEV-24883. It is also possible to compile the server with the older libaio (io_submit(), io_getevents()). The libaio interface is much older and more stable, originally appearing somewhere in Linux 2.6.

Comment by yangtao [ 2023-11-21 ]

我们编译服务器的时候没有指定,似乎默认就是用libaio

Comment by yangtao [ 2023-11-21 ]

我们在系统日志中发现我们服务所在盘经常打印VFS: Open an exclusive opened block device for write xxxx ,不知道此是否会有影响呢

Comment by Marko Mäkelä [ 2023-11-21 ]

Thank you for the update. By default, the MariaDB build scripts use whatever is available. If there is libaio-dev but not liburing-dev installed in the build environment, then the older interface will be used. Because your build used libaio, we can rule out a possible kernel bug that would affect io_uring on ARM.

What is your storage layer like? Which file system(s) is the data stored on? Is there any LVM or software RAID? NFS? NAS? You might want to check if reverting to buffered I/O (innodb_flush_method=fsync) would stop the corruption from occurring. The default was changed in MDEV-24854.

danblack, since you are much more familiar with the Linux kernel than me, can you comment on that VFS error message?

Comment by Daniel Black [ 2023-11-22 ]

Its from https://patchwork.kernel.org/project/linux-fsdevel/patch/1557845106-60163-2-git-send-email-yi.zhang@huawei.com/

Notably this commit isn't in the current Linus kernel.

Per the message it is from two open write handles on the same block device.

Are you using innodb block devices directly? Any idea what else could be running? A LVM resize is a possibility.

Per the commit message it, need not be the cause, but knowing which are the things opening the block device would be good to know.

Storage information previously asked for would be quite useful. Which kernel version?

Comment by yangtao [ 2023-11-22 ]

你好,最近我们在尝试复现,看看此现象出现的必要条件,最新一次在10.11.5版本上又复现,

从mysql日志中只看到:
2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 1073 of index `PRIMARY` of table `tgom`.`tgom_operation_log`
InnoDB: broken FIL_PAGE_NEXT link
2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 446 of index `ix_tgom_operation_log_level` of table `tgom`.`tgom_operation_log`
InnoDB: broken FIL_PAGE_NEXT link
2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 439 of index `ix_tgom_operation_log_created_time` of table `tgom`.`tgom_operation_log`
InnoDB: broken FIL_PAGE_NEXT link
2023-11-22 14:21:17 12935 [ERROR] InnoDB: In page 3560 of index `ix_tgom_operation_log_status` of table `tgom`.`tgom_operation_log`
InnoDB: broken FIL_PAGE_NEXT link

目前也在相同的方法在10.11.6版本尝试是否能复现~

Comment by Marko Mäkelä [ 2023-11-22 ]

1406921854@qq.com, thank you, and good luck with narrowing this down. Currently I would suspect a bug in the kernel version that you are using, but we will see how it turns out.

Comment by yangtao [ 2023-11-22 ]

你好,经过我们测试验证,目前在10.11.6版本中也复现:

Comment by Marko Mäkelä [ 2023-11-22 ]

The 18446744073709551615 in screenshot-2.png corresponds to ULINT_UNDEFINED in the source code. I think that we report that value when the PRIMARY key index tree (also known as the clustered index) is corrupted. That the "next" and "previous" page links are inconsistent is a serious corruption.

Comment by yangtao [ 2023-11-23 ]

目前我们将版本降级到10.3.39版本,以相同的方法复现,运行了接近1天,未曾复现此问题

Comment by Marko Mäkelä [ 2023-11-23 ]

Thank you. Is the problem reproducible in 10.11.6 you configure innodb_flush_method=fsync? The recent new default O_DIRECT (MDEV-24854) turned out to cause corruption on some file systems in some Linux kernel versions.

Comment by Marko Mäkelä [ 2023-12-05 ]

If you run MariaDB Server 10.3.39 with innodb_flush_method=O_DIRECT, will you experience similar corruption? The Linux kernel bug that was behind MDEV-30728 is specific to the POWER ISA, so it should be unrelated to the ARMv8 ISA that you are using.

Comment by yangtao [ 2023-12-05 ]

当前我们一直用的是10.3.39版本 innodb_flush_method=fsync;验证了多次,未曾复现当前问题

Comment by Marko Mäkelä [ 2024-01-09 ]

Would MariaDB Server 10.3.39 start to corrupt data when you set the parameter innodb_flush_method=O_DIRECT?

Would MariaDB Server 10.11.6 avoid data corruption if you set innodb_flush_method=fsync?

Generated at Thu Feb 08 10:34:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.