[MDEV-26552] MyISAM index corruption on 400M rows Created: 2021-09-06  Updated: 2021-11-26  Resolved: 2021-11-26

Status: Closed
Project: MariaDB Server
Component/s: N/A
Affects Version/s: 10.5.11, 10.5.12
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Vassilis Virvilis Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: corruption
Environment:

Debian unstable


Attachments: File mariadb-groupby-fails-bug.sh     File mysql-index-corruption-bug.sh    
Issue Links:
Relates
relates to MDEV-16461 MyISAM creates defect indexes on varc... Confirmed

 Description   

Hi

I have a table corruption detected by CHECK TABLE.

I narrowed my ~400M-row table down to a single integer column. The error appears when CHECK TABLE is run after index creation.

I managed to recreate it with a random set of 70K values distributed in 400M rows.

See the attached script. It completes in 32 minutes on my system (an oldish Xeon with NVM) and fails
with

--------------
CHECK TABLE bugtable
--------------
 
+---------------+-------+-------+------------------------------------------+
| budb.bugtable | check | error | Key in wrong position at page 1023494144 |
| budb.bugtable | check | error | Corrupt                                  |
+---------------+-------+-------+------------------------------------------+
2 rows in set (38.701 sec)

The script is easily run with:

time ./mysql-index-corruption-bug.sh bugdb bugtable;
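
The attached script is not reproduced in this export. A minimal sketch of the scenario described above might look like the following, assuming MariaDB's Sequence engine (`seq_1_to_N`) is available. The function name `repro_sql`, the column `v` and the index `idx_v` are hypothetical, and drawing values with `FLOOR(RAND() * 70000)` is a simplification of the "random set of 70K values" that may not match the original script. The function only prints the SQL.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached script.
# Prints SQL that builds a single-integer-column MyISAM table, fills it
# with random values, adds an index, and then runs CHECK TABLE.
# Assumes the MariaDB Sequence engine (seq_1_to_N) is enabled.
repro_sql() {
    db=$1
    table=$2
    rows=${3:-400000000}    # row count from the report (~400M)
    distinct=${4:-70000}    # number of distinct values (~70K)
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
USE ${db};
DROP TABLE IF EXISTS ${table};
CREATE TABLE ${table} (v INT NOT NULL) ENGINE=MyISAM;
INSERT INTO ${table} SELECT FLOOR(RAND() * ${distinct}) FROM seq_1_to_${rows};
ALTER TABLE ${table} ADD INDEX idx_v (v);
CHECK TABLE ${table};
EOF
}
```

Piping the output into the client, for example `repro_sql bugdb bugtable | mariadb`, would run it. Passing a smaller row count as the third argument gives a quick dry run of the statements.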



 Comments   
Comment by Vassilis Virvilis [ 2021-11-25 ]

I think I found another manifestation of this bug.

GROUP BY fails to produce unique output (produces duplicate rows) in filesort mode.

I theorize that this is because when in filesort mode it first creates a sort index and that part fails.

Further experimentation showed that the bug manifests when the number of rows is above 390136719 (fewer than 390039063 works).

Finally, the bug does not exist in 10.6.4.

How to replicate this one: check the replicator script ./mariadb-groupby-fails-bug.sh

You can run it with
./mariadb-groupby-fails-bug.sh database bugtable

It would be great to know
a) the commit that fixed the issue
b) whether a test could be devised to guard against this
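
One possible guard, sketched here under assumptions rather than taken from the attached script: since each grouped value should appear exactly once in GROUP BY output, any value that shows up twice signals the failure described above. The helper name `duplicate_groups` and the column name `v` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads one grouped value per line on stdin and
# prints every value that occurs more than once. Correct GROUP BY
# output should therefore produce no lines at all.
duplicate_groups() {
    sort | uniq -d
}
```

It could be fed from the client in batch mode, for example `mariadb -N -B -e "SELECT v FROM bugtable GROUP BY v" bugdb | duplicate_groups`, with a non-empty result treated as a test failure.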

Vassilis

Comment by Vassilis Virvilis [ 2021-11-25 ]

First of all - my bad

It was a configuration error after all.

I upgraded to 10.6.4 and the bug persisted. Since I had other machines with 10.6.4 where the bug doesn't manifest, I thought it must be a configuration error.

Indeed, I checked again and, lo and behold, I spotted this:

sort-buffer-size = 8G

It was supposed to be 8M.

So I switched it to 8M and everything works now.

Maybe this situation warrants a warning...
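
Until the server emits such a warning, a pre-flight check on the configured value can be sketched in a few lines. This is only an illustration: the function names and the 256M threshold are assumptions, not a limit documented by MariaDB.

```shell
#!/bin/sh
# Convert a size value with an optional K/M/G suffix (e.g. 8M, 8G) to bytes.
to_bytes() {
    value=$1
    case "$value" in
        *[Kk]) echo $(( ${value%?} * 1024 )) ;;
        *[Mm]) echo $(( ${value%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${value%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$value" ;;   # already a plain byte count
    esac
}

# Hypothetical threshold: anything above 256M is flagged as suspicious.
LIMIT_BYTES=$(( 256 * 1024 * 1024 ))

# Print a verdict for the given setting; return non-zero if it is too large.
check_sort_buffer() {
    bytes=$(to_bytes "$1")
    if [ $(( bytes > LIMIT_BYTES )) -eq 1 ]; then
        echo "WARNING: sort-buffer-size=$1 ($bytes bytes) looks too large"
        return 1
    fi
    echo "OK: sort-buffer-size=$1 ($bytes bytes)"
}
```

With these definitions, `check_sort_buffer 8G` reports a warning while `check_sort_buffer 8M` passes. The value to check can be taken from the configuration file or from `SELECT @@global.sort_buffer_size`, which returns bytes.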

Sorry for the noise.

Vassilis

Comment by Alice Sherepa [ 2021-11-26 ]

Great - I'm glad to hear this!

Generated at Thu Feb 08 09:46:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.