[MDEV-8421] ibdata1 size growing too much but cannot be explained by usage Created: 2015-07-03 Updated: 2021-01-16 Resolved: 2021-01-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 10.0.14-galera |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Petros Moisiadis | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Linux db1 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64 GNU/Linux |
||
| Description |
|
We recently had an issue with one node of our 3-node galera cluster eating all hard disk space. It turned out to be that the ibdata1 file had grown up too much in size. However, this increase cannot be explained by the cluster's everyday usage. The cluster is hosting one database that has reached the size of 25GB over a span of 5 years. When all free space was consumed, the size of ibdata1 had grown up to ~12GB, which is about 48% of the whole database size. Using 'innochecksum' and 'innodb_space' tools, we found out that 97% of pages in the ibdata1 file is undo log pages. Unfortunately the cluster was loosely monitored at the time of the event, so we do not know if the increase was rapid or increamental over a longer period of time. In any case, the size of 12GB seems to be too big to have been caused by normal usage. After recovering from the situation (we rebootstrapped the cluster on one of the other two nodes and rejoined the affected node after removing its data files), we started monitoring ibdata1 on all 3 nodes. With the same everyday usage on the cluster, ibdata1 now remains the same size (~76MB) on all 3 nodes. So we are not seeing any incremental increase in size. Some other, possibly relevant, facts:
We have kept a copy of the 12GB ibdata1 file for further investigation, but we are not sure about what to investigate. Maybe, there is some repeatability of data that could make us think that an endless loop wrote 12GB of data before eating all space, or maybe there is something else. We will really appreciate it if we get any advice on this by innodb / xtradb experts out there. Thank you in advance. |
| Comments |
| Comment by Elena Stepanova [ 2015-07-08 ] |
|
Could you please take a look at this and maybe advise on how to proceed with the investigation more efficiently. |
| Comment by Jan Lindström (Inactive) [ 2015-07-21 ] |
|
See http://blog.jcole.us/2014/04/16/the-basics-of-the-innodb-undo-logging-and-history-system/. While InnoDB purge does freeing undo log pages and unlinking them from the global history list to make them available for re-use. They remain on ibdata file. These free pages are never currently returned to OS and naturally there are several situations where these pages can't be really re-used e.g. on situation where new undo pages are bigger than fragmented datafile pages. I do not know any means to really investigate after something mysterious has happened, only way is to monitor current disk usage and if some unexplained allocations are done see what operations really did cause that. |
| Comment by Elena Stepanova [ 2015-08-06 ] |
|
According to the initial description, the mysteriously big ibdata file is available. Is there any way to analyze its contents to see what really takes all the space, whether it the undo pages or anything else? |
| Comment by Petros Moisiadis [ 2015-08-17 ] |
|
As we have already reported, 97% of pages consists of undo pages indeed, but the size of the data is too big to have been produced by normal usage. So, we suspect that something has stuck and caused an abnormal amount of undo logs to be written. Since an infinite loop could lead to a lot of undo logs written with the same It's also important to note that this happened only few days after we intorduced the "INSERT INTO ... SELECT ..." issue in our code and has not happened again since fixing this. So, it is very likely that the bug |
| Comment by Marko Mäkelä [ 2016-11-23 ] |
|
The worst case scenario with regard to undo log growth involves big transactions with DML operations and long-running transactions (which could be the large DML operation, or just some read-only transaction that was started but not committed yet). An unlikely possibility could be a bug in the InnoDB page allocation, causing the data file to grow even though reusable (freed) pages exist. https://bugs.mysql.com/bug.php?id=59783 was fixed in 2011. |