[MDEV-4662] InnoDB: Use of large externally-stored fields makes crash recovery lose data Created: 2013-06-14  Updated: 2014-11-10  Resolved: 2014-11-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.3, 5.5.31
Fix Version/s: 10.0.14

Type: Bug
Priority: Major
Reporter: Jeremy Cole
Assignee: Unassigned
Resolution: Fixed
Votes: 0
Labels: upstream-fixed
Environment: All

Attachments: File innodb_blob_unrecoverable_crash.test    

 Description   

When blob fields that are too large for the redo logs are used, the administrator is notified only by a rather innocuous-looking message:

InnoDB: ERROR: the age of the last checkpoint is XXX,
InnoDB: which exceeds the log group capacity YYY.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

I would have expected that this means that InnoDB is stalling in order to make more space in its redo logs. However, what it actually means is that InnoDB has overwritten its most recent checkpoint in its redo logs. This compromises crash recovery, potentially causing data loss (or even metadata loss, such as writes to data dictionary tables or system tablespace data). This is easily reproducible using the attached test case.
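The failure mode can be illustrated with a toy model of a circular redo log. This is only a sketch: the class and field names below are hypothetical and do not correspond to InnoDB's actual log structures, and it assumes the checkpoint age is simply the number of bytes written since the last checkpoint.

```python
from dataclasses import dataclass


@dataclass
class ToyRedoLog:
    """Toy circular redo log; all names are illustrative, not InnoDB's."""
    capacity: int            # usable bytes in the log group
    lsn: int = 0             # total bytes ever written to the log
    checkpoint_lsn: int = 0  # recovery would start reading from here

    def checkpoint_age(self) -> int:
        # Bytes written since the last checkpoint.
        return self.lsn - self.checkpoint_lsn

    def write(self, nbytes: int) -> None:
        # One batch, written with no opportunity to checkpoint in between.
        self.lsn += nbytes

    def checkpoint_overwritten(self) -> bool:
        # Once the age exceeds the capacity, the write position has wrapped
        # past the records that recovery would need to start from.
        return self.checkpoint_age() > self.capacity


log = ToyRedoLog(capacity=100)
log.write(60)                        # ordinary traffic, age is 60
small_ok = not log.checkpoint_overwritten()
log.write(50)                        # one large batch, age becomes 110
large_bad = log.checkpoint_overwritten()
```

In this model the second write pushes the checkpoint age to 110 against a capacity of 100, which is the condition the error message reports after the fact.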

This appears to happen because externally-stored fields are always written in a single batch to the redo logs, all while holding the log mutex, thus making it impossible to checkpoint during that write. There are at least two possible solutions to this:

1. Allow flushing to "catch up" and checkpoint during large external field writes. This will involve releasing the log mutex during the write, which is likely complex.

2. Disallow (at least optionally) such large writes. Disallowing external field writes which sum to more than 10% of the total redo log space will in theory prevent this problem, because log_free_check() is called before the write of the external field, and (although it has some races) it should ensure that 10% of the log space is available before starting the write.
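Option 2 amounts to a simple size check before the write. The following is a minimal sketch of that arithmetic, not InnoDB's implementation: the function names are hypothetical, and it assumes the limit is measured against the combined redo log size in bytes.

```python
MIB = 1024 * 1024


def blob_write_allowed(total_blob_bytes: int, redo_log_bytes: int) -> bool:
    """Return True if the external-field write stays within 10% of the redo log.

    Hypothetical helper: integer arithmetic is used so the boundary case
    (exactly 10%) is not affected by floating-point rounding.
    """
    if redo_log_bytes <= 0:
        raise ValueError("redo_log_bytes must be positive")
    return total_blob_bytes * 10 <= redo_log_bytes


def min_redo_log_bytes(largest_row_bytes: int) -> int:
    """Smallest combined log size satisfying the '10 times' rule in the message."""
    return largest_row_bytes * 10


# A 100 MiB redo log admits writes of up to 10 MiB and rejects anything larger.
at_limit = blob_write_allowed(10 * MIB, 100 * MIB)
over_limit = blob_write_allowed(10 * MIB + 1, 100 * MIB)
```

The same rule read in the other direction gives the sizing advice from the error message: a largest row of 4 MiB calls for at least 40 MiB of combined log space.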

This issue exists in all versions of MySQL and MariaDB.



 Comments   
Comment by Elena Stepanova [ 2013-06-14 ]

I think it was even documented somewhere, wasn't it?
Or maybe it was an old bug report...

Comment by Jeremy Cole [ 2013-06-14 ]

Elena: There have been a few bug reports about this, but none of them have touched on the core issue: this compromises the actual ACID properties of InnoDB itself. Personally I would prefer that InnoDB assert with "durability may be compromised by continuing" instead of its current behavior, which could best be described as "keep going and hope for the best".

Most of the bug reports have been either "Can't repeat" or have had people increase their logs and "it doesn't happen anymore". This bug report was a first effort to discuss the actual problem and propose some fixes.

Comment by Jeremy Cole [ 2013-06-14 ]

Also reported as: http://bugs.mysql.com/bug.php?id=69477

Comment by Elena Stepanova [ 2013-06-14 ]

Right, thanks. I just had a vague recollection and was wondering if it's the same issue.
Now that you mention increasing the logs, I recall it: the problem happened in tests a lot, and the usual recipe was "the log size is not enough for your flow, set it to a higher value" (and I think the default value was eventually increased).
I certainly agree it would be good to fix it properly.

Comment by Jeremy Cole [ 2014-05-31 ]

Note that this was marked as fixed the other day in upstream MySQL Bug 69477. Perhaps the fix should be merged in once they release it.

Comment by Elena Stepanova [ 2014-11-10 ]

Upstream bugfix in 5.6.20:

revno: 5958
revision-id: annamalai.gurusami@oracle.com-20140522155303-y5bvfo4sq0tdls98
parent: marko.makela@oracle.com-20140522115539-2yijjno0m7n65i7o
committer: Annamalai Gurusami <annamalai.gurusami@oracle.com>
branch nick: mysql-5.6
timestamp: Thu 2014-05-22 21:23:03 +0530
message:
  Bug #16963396 INNODB: USE OF LARGE EXTERNALLY-STORED FIELDS MAKES CRASH
  RECOVERY LOSE DATA
  
  Problem:
  
  When too-large blob fields are used, InnoDB overwrites its most recent
  checkpoint in its redo logs.
  
  Solution:
  
  Ensure that the total blob length does not exceed 10% of the redo log file
  size.
  
  rb#5399 approved by Marko, Nuno, Manish. 
  Venkat also contributed to patch (in replication related test case).

Not reproducible on the current 10.0 (10.0.14+) tree, which is expected since InnoDB 5.6.20 was merged into 10.0.14.
