[MDEV-24108] Our server crashed and there was no possibility to restart it Created: 2020-11-03  Updated: 2021-04-25  Resolved: 2021-04-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Sergii Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-514.el7.x86_64
Architecture: x86-64


Attachments: Text File mariadb-for-jira.log    
Issue Links:
Relates
relates to MDEV-19916 Corruption after instant ADD/DROP and... Closed

 Description   

Suddenly we started receiving the following errors:
Error Code: 2013 Lost connection to MySQL server at 'reading initial communication packet', system error: 0
Error Code: 2003 Can't connect to MySQL server on '185.156.42.233' (10061)

After checking the log, we found that MariaDB is unable to restart after a fatal crash. Please investigate the attached log, which contains the trace for the whole day; the crash starts at about 12:55.



 Comments   
Comment by Marko Mäkelä [ 2020-11-03 ]

Could this report be a duplicate of MDEV-19916, which was fixed in 10.3.17?
mariadb-for-jira.log contains the following, which might match that bug:

mariadb-10.3.11

2020-11-03 12:55:04 0x7fc7877fe700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.11/storage/innobase/rem/rem0rec.cc line 820
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
InnoDB: about forcing recovery.
201103 12:55:04 [ERROR] mysqld got signal 6 ;
2020-11-03 12:55:05 0 [ERROR] InnoDB: Wrong owned count 11, 3, rec 7484

I cannot say this for sure, and this is a bug tracker, not a customer support platform.
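To make the version reasoning above explicit, here is a minimal sketch that checks whether the reported version predates the release containing the MDEV-19916 fix. The helper name `parse_version` is hypothetical and not part of any MariaDB tooling; the two version strings are taken from this report.

```python
def parse_version(text):
    """Turn 'mariadb-10.3.11' or '10.3.17' into a tuple of ints (hypothetical helper)."""
    digits = text.rsplit("-", 1)[-1]  # drop an optional 'mariadb-' prefix
    return tuple(int(part) for part in digits.split("."))


affected = parse_version("mariadb-10.3.11")  # version shown in the attached log
fixed_in = parse_version("10.3.17")          # release that fixed MDEV-19916

# Tuples compare element by element, so this is a numeric version comparison.
predates_fix = affected < fixed_in
print(predates_fix)
```

A result of `True` only shows that the server could have been exposed to MDEV-19916; it does not confirm that this crash is a duplicate.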

Comment by Sergii [ 2020-11-04 ]

As I understand from the logs, the root cause of this behavior is described by these statements:

2020-11-03 12:55:05 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3123149291, calculated checksums for field1: crc32 3123149291/2544593751, innodb 4231256809, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3123149291, calculated checksums for field2: crc32 3123149291/2544593751, innodb 3996955962, none 3735928559, page LSN 1315 2769137356, low 4 bytes of LSN at page end 2769137356, page number (if stored to page already) 469, space id (if created with >= MySQL-4.1.1 and stored already) 403452

2020-11-03 12:55:05 0 [Note] InnoDB: Page may be an index page where index id is 541398

2020-11-03 12:55:05 0 [Note] InnoDB: Index 541398 is `PRIMARY` in table `pxbjyhhu_main1`.`oc_promo_mailing_queue`

2020-11-03 12:55:05 0 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page [page id: space=403452, page number=469] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.

To me, it looks like the server was about to write an index page to the data file, determined that the page was invalid, and crashed itself to prevent data corruption.
I also found the same trace from when we had this issue half a year ago. The only difference was the table name.
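As a rough aid for reading the page dump quoted above, the following sketch extracts a few fields from that note and compares them. It assumes that the two numbers after `page LSN` are the high and low 32-bit words of the LSN; the function name `summarize_page_note` is hypothetical, and the input string is copied from the log line in this comment.

```python
import re

# Excerpt of the "[Note] InnoDB: Uncompressed page, ..." line quoted above.
NOTE = (
    "Uncompressed page, stored checksum in field1 3123149291, "
    "calculated checksums for field1: crc32 3123149291/2544593751, "
    "innodb 4231256809, page type 17855 == INDEX.none 3735928559, "
    "stored checksum in field2 3123149291, "
    "calculated checksums for field2: crc32 3123149291/2544593751, "
    "innodb 3996955962, none 3735928559, page LSN 1315 2769137356, "
    "low 4 bytes of LSN at page end 2769137356, "
    "page number (if stored to page already) 469"
)


def summarize_page_note(note):
    """Compare stored and calculated values found in the note (hypothetical helper)."""
    stored = int(re.search(r"stored checksum in field1 (\d+)", note).group(1))
    crc32_values = [
        int(v) for v in re.search(r"field1: crc32 (\d+)/(\d+)", note).groups()
    ]
    lsn_high, lsn_low = (
        int(v) for v in re.search(r"page LSN (\d+) (\d+)", note).groups()
    )
    lsn_end = int(re.search(r"LSN at page end (\d+)", note).group(1))
    return {
        "checksum_matches_crc32": stored in crc32_values,
        "lsn_tail_matches": lsn_low == lsn_end,
        # Assumption: the two printed words are the high and low halves of the LSN.
        "full_lsn": (lsn_high << 32) | lsn_low,
    }


result = summarize_page_note(NOTE)
print(result)
```

In this dump the stored checksum equals one of the calculated crc32 values and the LSN tail matches the header, so the complaint appears to come from the page contents (compare the `Wrong owned count` error) rather than from a checksum mismatch.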

Comment by Daniel Black [ 2021-03-28 ]

If you've restored from a backup and are running a version later than 10.3.17, we'd like to know whether this happens again.

Generated at Thu Feb 08 09:27:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.