[MDEV-32663] mariadb filesystem size increasing for wp_options.ibd files Created: 2023-11-02 Updated: 2024-01-23 |
|
| Status: | Needs Feedback |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.15 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Gabriel Tziotzis | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Ubuntu 20.04 |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
I'm in a situation with a wordpress site where a table wp_options keeps increasing in filesystem size really fast even though it only has 10000 records and we have to constantly optimize it to reduce the usage. It's my understanding that when records are constantly deleted / inserted they are not actually deleted from the ibd file but they're kind of soft-deleted. Thus the filesystem size on the file actually increases constantly reaching even 500G within 2 days. In order to fix the issue I need to actually see what is being written / deleted from the table constantly to find possible plugins that might be affecting it. I can safely assume it's wordpress transients however I can't say with certainty which type of transient is causing this bloat without examining the data. So in order to find this out I'm looking for ways to read the soft-deleted data from the bloated ibd file. I've tried a couple of tools like undrop-for-innodb but it's not yielding any results. Any ideas how I can read the mass of deleted data that's occupying the filesystem space? I also think that there should be a limit to the expansion of the table with deleted data. Like, why would a table of 20MBs in size need 500GB of deleted data in the filesystem? |
| Comments |
| Comment by Marko Mäkelä [ 2023-11-02 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I think that this could be a duplicate of Furthermore, MariaDB Server 10.6.16 with a fix of | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-02 ] | ||||||||||||||||||||||||||||||||||||||||||
|
With the table at 1.6GB size the history length was 2317 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
With the table at 50GB size after a single day History list length 24897 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
The history list is measured in transactions, not rows. If the transactions are updating indexed columns, they could cause the secondary indexes to grow a lot. Note that InnoDB never shrinks .ibd files, except when the table is being rebuilt due to an operation like OPTIMIZE TABLE, ALTER TABLE, or TRUNCATE TABLE. So, even if the purge of history caught up, the file size would likely remain at 50GB until it needs to be extended again. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I'm not sure how to respond to this as from a user's point of view, having the table increase to 500GB by deleted data or indexes which are oversized compared to the actuall necessary data appears unreasonble. Basically as a user when I delete the data I expect the data to be deleted and the space reclaimed if all went properly. Does the table description help in deducing why the space is never reclaimed? If not how can I assist in troubleshooting this?
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
To understand why this is as it is, you could watch these presentations: | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
OK sure, but what is the end-user supposed to do if that's the case? 3 facts remain at the very lest 1. Obviously we need the many reads / writes / deletes otherwise the app wouldn't do it in the first place. How is a user supposed to tackle this? From the settings
-------------------------
------------------------- By default it's turned to off, but does it mean that enabling innodb_undo_log_truncate would actually reclaim the space? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
The setting innodb_undo_log_truncate is related to truncating dedicated innodb_undo_tablespaces. The space that is occupied by undo log records in the system tablespace can only be reclaimed in very recent versions that include The end user is expected to upgrade to MariaDB 10.6.16 once it is released, and report if the problem was fixed by that. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
OK I see, it sounds reasonable. Until then is it possible to mitigate ? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hi @Marko I'm now on Server version: 10.6.16-MariaDB-1:10.6.16+maria~ubu2004 mariadb.org binary distribution and the issue's impact is reduced but not eliminated I still get a bloated filesystem within a few days. At least it's not 500G which is progress but still the increase is there for indeterminate time and resources are not being released. While I understand the concept you shared in the videos, having ever-growing tables in the filesystem holding deleted data doesn't help. Optimizing the DB tables constantly doesn't help with performance or downtimes either. Before optimize After optimize I'll provide the history list length when it increases again as after optimize it's > History list length 25 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
The same table is now 1.2G and history list length is 160 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
ioweb.gr, thank you for the update. It seems to me that after the fix of The remaining growth could also be related to the way InnoDB allocates pages within the data file. That may not have been improved enough in Can you try to get some page-level statistics from the data file? To force all data to be written back from the buffer pool to the data file, you might execute
(or alternatively tune innodb_max_dirty_pages_pct and wait some time as noted in MDEV-30000) and then execute the innochecksum utility on the wp_options.ibd file, to determine which indexes are consuming the most space within the data files. The INDEX_ID should correspond to INFORMATION_SCHEMA.INNODB_SYS_INDEXES.INDEX_ID. There is an attribute COMMENT 'MERGE_THRESHOLD=50' that was introduced in MySQL 5.7 and MariaDB Server 10.2, for tables as well as individual indexes. You could try to experiment with it. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
This didn't work. The size remains at over 1.3G
What should I put for innodb_max_dirty_pages_pct? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
ioweb.gr, the only purpose of the SQL statements was to ensure that innochecksum can access a reasonably recent version of the data in the .ibd file. We’d want to see what the data pages inside the .ibd file are being used for. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I see and please excuse me if I'm not being easy to assist but I'm not well versed into database administrative tasks. I'm assuming I need to run
And then check for example index_id 31538956 which seems to consume most size Which yields
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Thank you, this is better. I don’t think that the innochecksum tool currently parses the allocation bitmap pages. That is, some of the pages that it identifies as index or BLOB pages could actually be marked as free in the tablespace. Can you post the results of the following:
If you are using the default innodb_page_size=16k, you would seem to have 81971*16384 = 1,343,012,864 bytes in current or former BLOB data pages inside the file, which is almost 10 times the file size after OPTIMIZE TABLE. Based on the information gathered so far, the bug would seem to be the way how BLOB pages are allocated in InnoDB. It would be great if you can write a script that would simulate your workload. Using such a script we should be able to reproduce and fix the anomaly. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I see. Unfortunately I'm not able to provide info for the workload which is causing it. I understand something is writing and deleting data which is temporary in the wp_options table which is common to a lot of plugins in wordpress but also wordpress itself and in the wordpress world they're called transients as a form of object cache. It's certain that it's some type of transient that's repeating too often the write/delete process However to be able to reproduce this, I need to see what actual data is stored in the deleted space otherwise I can't even guess what type of data is being created or deleted constantly to find the pattern It feels like chasing my tail. In order to find the reason I need to see the data to start tracking which query writes them and when. In order to find the bug you need to see the workload that causes it. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Can you give any hints about the workload? Would option_id ever be changed in an UPDATE? For InnoDB BLOBs, that is handled as a special case: the newly inserted record with the updated PRIMARY KEY will inherit the BLOBs from the old record that will be delete-marked, and the old record will ‘disown’ the BLOBs. When BLOBs themselves are updated, a new copy is always allocated and written, and the old copy should be freed as part of purging the undo log records. There could be some bug in this area. Can you provide copies of the allocation bitmap pages for the file? That is, if you are using the default innodb_page_size=16384, we would need copies of the files produced by the following:
These pages contain bitmaps that identify which pages inside the file are allocated and not garbage. Please also post the innochecksum output for the file at the same point of time. If most of the BLOB pages are actually marked as allocated, then this would be a bug in the purge of BLOBs. If they are marked as free, then the failure to reuse the freed pages would be a bug of the page allocator. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
As far as I know option_id is never changed. I've done a grep in code and I don't see this key anywhere in SQL update / insert commands. Regarding the files request, do they contain sensitive information? If so is there a secure way to share them where the link won't be publicly accessible? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
For BLOBs and PRIMARY KEY updates, we have the open bug The allocation bitmap pages do not contain any sensitive information or any user data. Because your data is mostly ASCII text (except for the option_id which would be stored as a big endian 64-bit integer), you could run the strings utility on the files to ensure that there is no legible text stored there. Index pages should have the strings infimum and supremum on them. Page 0 contains the size of the file in pages, as well as information on which page numbers between 0 and innodb_page_size-1 are allocated. There also are some linked lists of allocated file segments or extents which span all the allocation bitmap pages. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-11-23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
OK I've added the required files in blob.zip | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-12-10 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hi, Anything else needed on my end to help reproduce this issue? For the record we still get extreme filesystem sizes daily | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-12-11 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hi, sorry, I was occupied with other tasks. In the file innochecksum_result.txt inside blob.zip
The first one corresponds to the PRIMARY KEY index, and the next two should correspond to the UNIQUE KEY and the non-unique KEY. BLOB pages will be allocated from the same "file segment" as primary key leaf pages (BTR_SEG_LEAF). According to the #leaf_pages output above, there only are 191 primary key leaf pages or BLOB pages in total. There are at least 2 index leaf pages, because we see that there is one non-leaf page (the total number of pages is one larger). Inside the file we also have statistics saying that there are 480,609 BLOB pages. It looks like at most 189 of them will actually be allocated for BLOBs. The allocation bitmap page format is a little cumbersome to parse manually from a hexdump, because the bitmap bytes are interleaved with something else. There are two bits per page. One is always set, so 0xaa would refer to 4 allocated pages and 0xff would refer to 4 free pages. In page65536.bin I see 0xfaffff…. The first byte 0xfa means that the first 2 pages (page 65536 and 65537) are allocated by the system (for the allocation bitmap itself, and for a change buffer bitmap). The files page32768.bin and page49152.bin look similar. In the file page16384.bin the bitmap starts with 0xfaffff… but there are some scattered allocations: 0xabfffffbfffffbfaffbfaaaaaaaaaa. Based on this quick look, it seems that only some of the first 20,000 pages (or 320 MiB) of the file may actually be used, while the file is much larger. In total, only 294 pages (4.6 MiB) inside the file are used by actual payload, according to innochecksum. Actually, there should be 30 allocation bitmap pages in total. This looks like a possible bug in the page allocation algorithm. I say "possible", because this would be an expected situation if there had been a huge purge lag at some point, and those BLOB pages were only recently freed. InnoDB never shrinks .ibd files. Possibly the secondary indexes will contribute to the purge lag; this would eventually be fixed by a file format change (MDEV-17598). Can you monitor the history list length over time (once per second or minute), to see if there still are some bursts? Would the .ibd file keep growing further from its current size, or would the former BLOB pages be eventually reused when the next burst of workload arrives? Can you try to write a script that simulates the updates to this table? I am afraid that without such a script, it is hard to do anything. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2023-12-11 ] | ||||||||||||||||||||||||||||||||||||||||||
|
My real problem is determining what updates are running. Any ideas how to log all queries for a specific database and a specific table ? Perhaps if we parse a log file logging everything we can track the query that's mostly executing and try to simulate it. I can write a script that monitors the history list length of course, it will run every minute (cron limitation) and log it . Do you also need me to output the filesize of the table as well in the same log? It would output something like <list length> - <size> | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-12-14 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I’m not familiar with any query filtering, but maybe danblack can help with that. Yes, the history size and the wp_options.ibd file size would be useful to sample together. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-12-15 ] | ||||||||||||||||||||||||||||||||||||||||||
|
The collection of not so great options include:
The better option is probably:
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-12-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
There also is a MyISAM bug MDEV-28499 that mentions the table wp_options. As far as I understand, the .MYD heap file would keep growing until TRUNCATE or OPTIMIZE. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-02 ] | ||||||||||||||||||||||||||||||||||||||||||
|
ioweb.gr, do you have any results that you could share? If the majority of read-write transactions is modifying something else than the wp_options table, then the history list length (which is measured in transactions, not rows) would not necessarily correlate with the growth of the file. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2024-01-04 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hi, sorry for the delay, I had to contact my customer about these options to get permission as they could potentially impede the normal functioning of the website
What would you need me to provide after adding this to the table? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-04 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I would not recommend enabling system versioning on the table on a production server. With system versioning, any UPDATE statements will be converted to INSERT and any DELETE to UPDATE. That is, the table will preserve the entire history from the time when system versioning was enabled. It would be better to set up replication on this table only, and to enable the system versioning on the replica. It could be very helpful to have the history that corresponds to a time frame where the file is growing on the primary server. This of course together with the monitoring of the history list length on the primary server. On the replica, we would only be interested in the history of this table, nothing else. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2024-01-15 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hi, We'll have to try and think of easier potential workarounds that are feasible to perform on my end without breaking our production server. Any other ideas on how to approach this? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2024-01-23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
The reason for the table growth is system versioning. This means that nothing is never deleted from the table and any change will just grow it indefinitely until versioning data is truncated. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Gabriel Tziotzis [ 2024-01-23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hi, in my case it's not because simply ALTER TABLE wp_options DROP SYSTEM VERSIONING; |