Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 12.2.2
Fix Version: None
Environment:
  HW: Dell R720xd, 64 GB ECC RAM, 2x E5-2630 v2 (12 cores, 24 threads in total)
  OS: Debian Bookworm (oldstable) amd64
  Dedicated (not running anything else)
Description
Hello everybody,
I rolled out an upgrade on one of our servers, 10.7.GodKnowsWhat to 12.2.2, this weekend.
We have three tables, call them log%u, regularly reaching 1B, 3B, and 25M rows, respectively.
(Far from ideal, sure, but ingress is so dense that the maintenance EVENT I added has had no real chance to obtain a table-level lock.)
Until today, unpartitioned.
So after the upgrade, I needed to reconfigure the primary key for partitioning.
(Before starting, I stopped all connecting services and disabled that event with ALTER EVENT ... DISABLE, so none of these could be the cause.)
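For clarity, that pre-flight step was nothing more than this (maintainer_ev is a placeholder; the real event name does not matter):

```sql
-- Disable the maintenance event before touching the tables.
-- maintainer_ev is a placeholder for the actual event name.
ALTER EVENT maintainer_ev DISABLE;
```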
In doing this, I tried to:
1. Drop the PRIMARY KEY (failed, it needs to be an index, beyond me why - not the issue at hand)
2. Drop AUTO_INCREMENT from the PRIMARY KEY column: ALTER TABLE %s CHANGE id id BIGINT(20) UNSIGNED NOT NULL;
3. Drop the PRIMARY KEY again
4. Recreate the PRIMARY KEY as a compound key (the previous AUTO_INCREMENT column, as well as the TIMESTAMP ON UPDATE CURRENT_TIMESTAMP() creation timestamp column)
5. Make the ID column AUTO_INCREMENT again
Points 2-5 above were the flow I would have followed for the rest of the tables.
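Spelled out, steps 2-5 would look roughly like this (log1 stands in for the real table name, and id/created are my guesses at illustrative column names, not the actual schema):

```sql
-- Sketch of steps 2-5; names are placeholders, not the real schema.

-- 2. Drop AUTO_INCREMENT from the PK column
ALTER TABLE log1 CHANGE id id BIGINT(20) UNSIGNED NOT NULL;

-- 3. Drop the primary key
ALTER TABLE log1 DROP PRIMARY KEY;

-- 4. Recreate it as a compound key over the old PK column
--    and the creation timestamp column
ALTER TABLE log1 ADD PRIMARY KEY (id, created);

-- 5. Re-add AUTO_INCREMENT
ALTER TABLE log1 CHANGE id id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
```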
To my utter dismay, the server started all phases with "Stage 1 of 4, copying", then at "Stage 2 of 4, enabling keys" it regularly stopped, throwing:
#1034 Index for table 'user' is corrupt; try to repair it.
Now REPAIR TABLE %s obviously did nothing, the tables being InnoDB, not MyISAM.
ANALYZE TABLE %s EXTENDED said "all OK".
Then the same ALTER failed again in the exact same manner.
The only thing that worked was CREATE TABLE %s LIKE %s_old, then doing the modifications, and finally INSERT INTO %s SELECT * FROM %s_old for all the data.
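Spelled out, that workaround looked something like this (log1 is a stand-in for the real table name, and I presume the original was first renamed out of the way, matching the %s_old pattern above):

```sql
-- Workaround sketch; log1 is a placeholder for the real table name.
RENAME TABLE log1 TO log1_old;           -- keep the original aside
CREATE TABLE log1 LIKE log1_old;         -- fresh, empty copy of the definition
-- ... apply the primary-key modifications to the empty table here ...
INSERT INTO log1 SELECT * FROM log1_old; -- copy all the data across
```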
The very next time I tried to do something else with the new table, it presented the same error.
I contemplated whether this was because I missed the part where I should have run mariadb-upgrade (I have no idea what it even does; I was under the impression that all of this, if any, would be done by triggers from the deb package.)
But, most importantly, this did not manifest with the table with 1-5M rows; it only happened with the other two tables.
So I strongly lean towards believing this has something to do with whether the whole table fits in RAM or not.
I may be completely wrong.
For the record, I checked the syslog and found no ECC correction notices.
(Nor anything from the OOM killer, for the record. I did the tuning - I'd say - with quite some consideration on that version, and have not touched it since.)
I then figured out that one of the operations - changing the datetime default from current_timestamp to utc_timestamp - which failed the same way at the same stage, completed without issue when I used ALGORITHM=INPLACE.
Not sure if this helps at all; probably not.
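For reference, the variant that succeeded was roughly of this shape (table and column names are placeholders; the point is the ALGORITHM clause):

```sql
-- Hypothetical names; the relevant part is ALGORITHM=INPLACE,
-- which avoids the table-copy path that kept failing.
ALTER TABLE log1
  MODIFY created DATETIME NOT NULL DEFAULT utc_timestamp(),
  ALGORITHM=INPLACE;
```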