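[MDEV-32817] After a recent upgrade to 10.11.5, shortly after frequent read and write operations on a table, "index for table xxxx is corrupt" appeared, followed by "tablespace xxxxxx corrupted" for the same table, and finally "Tablespace is missing for a table"; the table is now completely unusable - Jira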

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.11.5
Fix Version/s: N/A
Component/s: None
Labels: None

Description

Attachments

image-2023-11-16-15-05-26-913.png (2023-11-16 07:05, 82 kB, yangtao)
image-2023-11-16-15-06-13-780.png (2023-11-16 07:06, 55 kB, yangtao)
image-2023-11-16-15-06-43-866.png (2023-11-16 07:06, 44 kB, yangtao)
screenshot-1.png (2023-11-22 06:37, 13 kB, yangtao)
screenshot-2.png (2023-11-22 11:35, 70 kB, yangtao)

Issue Links

relates to

MDEV-32811 Potentially broken crash recovery if a mini-transaction frees a page, not modifying previously clean pages (Closed)
MDEV-34479 mariadb 10.11.5 bulk insert: Index for table 'sbtest24' is corrupt; try to repair it (Closed)
MDEV-34480 mariadb 10.11.5 bulk insert: crash, errorlog: InnoDB: Assertion failure (Closed)
MDEV-34823 Invalid arguments in ib_push_warning() (Closed)
MDEV-33189 Server crash after reading outside of bounds on ibdata1 , file corrupted, no auto-recovery (Closed)
MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 (Closed)

(1 relates to)

Activity

ls added a comment - 2024-07-11 03:05

Nearly two weeks have passed. I wonder whether the cause of this problem has been determined. I am considering upgrading to 10.11.8, but because I do not know what caused this problem, I cannot be sure that 10.11.8 is free of it, and that is keeping me from upgrading.

Daniel Black added a comment - 2024-07-11 04:37

I tried the sysbench from ~~MDEV-34480~~ (without your configuration file, and with the InnoDB buffer pool size set to 5G) and got the errors in ~~MDEV-34566~~.

I encourage you to try the docker compose file and adjust the version in the container image. It should work just as well on AArch64 (though the sysbench image is amd64 only, sorry; it will probably emulate fast enough to generate a load).

Debarun Banerjee added a comment - 2024-07-11 19:02 - edited

I ran sysbench with the exact same configuration as in ~~MDEV-34479~~ on x86-64. My machine has 16 cores and 24 CPUs (hyper-threaded). I tried using 4, 8, and all of the cores.

taskset -c 0,2,4,6,8,10,12,14 /home/hdd/deb/maria-src5/bld_rel_10.11.5_7875294b6b/sql/mariadbd

The load was successful in all cases and the issue didn't repeat. Looking back, I see that ls already mentioned that the test runs fine with the official MariaDB binary.
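For readers repeating the pinned runs, the CPU list passed to `taskset -c` can be generated instead of typed by hand. This is only a minimal sketch: the helper name `cpu_list` is hypothetical, and it assumes that logical CPUs 2k and 2k+1 are hyper-thread siblings, which is not confirmed for the machine above and should be checked (for example with `lscpu -e`) before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of every second
# logical CPU, suitable for "taskset -c <list> <command>".
# Assumption: CPUs 2k and 2k+1 share a physical core on this machine.
cpu_list() {
    n="$1"      # number of CPUs to pin to
    i=0
    list=""
    while [ "$i" -lt "$n" ]; do
        # Append a comma only when the list is already non-empty.
        list="${list}${list:+,}$((i * 2))"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Reproduces the 8-CPU list used in the command above.
cpu_list 8
```

The result can be passed straight to `taskset -c "$(cpu_list 8)"` followed by the server binary.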

So, we have 3 bugs reported from this issue. Here is what I understand as the current state.

~~MDEV-34479~~: x86_64: Index corrupt during sysbench load. Requires building MariaDB from source with a specific CentOS version to repeat.
~~MDEV-34480~~: aarch64: Page number, offset assert. Requires building MariaDB from source with a specific CentOS version to repeat.

To investigate ~~MDEV-34779~~/~~MDEV-3448~~ further, we first need to repeat it by rebuilding MariaDB in the exact environment specified in the respective MDEVs (CentOS version, compiler version, etc.).

~~MDEV-34566~~: "Crash recovery was broken" message. This is unrelated to the other 2 bugs and happens because the redo log size is kept at 96M. That is too small and does not leave enough margin when 32 concurrent transactions are running. This is a known limitation in InnoDB in general and is usually resolved by configuring a larger redo log. We could improve the error message here. Solving the root cause would require a major design overhaul and may not justify the ROI, as users generally do not encounter it when the redo log is well configured.
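A larger redo log is configured through `innodb_log_file_size`. The sketch below writes a drop-in option file that sets it. The helper name `write_redo_conf`, the file name `redo-log.cnf`, and the 2G value are illustrative assumptions and not a recommendation from this ticket; the size should be chosen for the actual write load, and the file must be placed in a directory that the server's option files include.

```shell
#!/bin/sh
# Hypothetical helper: write an option file that raises the InnoDB redo
# log size. The file name and the size passed in are illustrative only.
write_redo_conf() {
    dir="$1"    # directory included by the server's option files
    size="$2"   # for example 2G; the ticket reports 96M as too small
    printf '[mariadb]\ninnodb_log_file_size = %s\n' "$size" > "$dir/redo-log.cnf"
}

# Demonstrate against a scratch directory rather than a live server.
tmp=$(mktemp -d)
write_redo_conf "$tmp" 2G
cat "$tmp/redo-log.cnf"
```

A setting made in an option file takes effect at the next server start.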

Marko Mäkelä added a comment - 2024-08-19 05:42

ls, I believe that ~~MDEV-33588~~ could have fixed this. The fix was included in the previous quarterly releases (including 10.11.8), over 3 months ago. The most recent quarterly releases included some performance fixes, but sadly not a fix of ~~MDEV-34689~~, which I think would be much harder to hit than the bugs that ~~MDEV-33588~~ fixed.

It looks like debarun mistyped ~~MDEV-34479~~, which is another ticket from you. Can you reproduce the corruption with 10.11.8 or 10.11.9?

Marko Mäkelä added a comment - 2024-08-27 15:20

I filed ~~MDEV-34823~~ for the incorrect tablespace ID output in the corruption message.
