[MDEV-27153] ibdata1 file Leaking ? (errno 135) - using innodb_undo_tablespaces Created: 2021-12-01  Updated: 2022-10-03  Resolved: 2022-10-03

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.6.5
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Tech Magos Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Linux Centos 7.4


Issue Links:
Relates
relates to MDEV-19514 Defer change buffer merge until pages... Closed
relates to MDEV-21952 ibdata1 file size growing in MariaDB Closed

 Description   

Been using mariadb server for years. We always set the server via this so we put a bound to ibdata1 fiile :
innodb_data_file_path=ibdata1:20m:autoextend:max:5G
This always worked, never hit an error.

With 10.6.5, (but 10.5.x and after configuring to use innodb_undo_tablespaces,
we get below error when e.g. creating a simple table with 3 rows, this error hits.

Can't create table `XXX` (errno: 135: No more room in record file")

This hits when the ibdata1 file raaches the bounds given. So we have to keep growing the max bound, by 2GB ever few days as it keeps hitting the limit, so this looks like an idbata1 LEAK.

When this error hits , all 4 undo files are around half their max size, so not exhausted.

We were hoping that auto truncating undo tablespaces would avoid this very issue to ever happen, but it actually seems to trigger it !

Below are the relevant parts of our config.

innodb_undo_tablespaces=4
innodb_undo_log_truncate=1
innodb_max_undo_log_size=1024m
innodb_log_file_size=1024m
innodb_log_files_in_group=2
innodb_flush_log_at_timeout=20

transaction_isolation=READ-COMMITTED
innodb_file_per_table=1
innodb_data_file_path=ibdata1:20m:autoextend:max:5G
innodb_autoextend_increment=200 # m incrs
innodb_log_file_size=750m # TODO: try 2G+
innodb_log_files_in_group=2
innodb_checksum_algorithm=crc32
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=2
innodb_autoinc_lock_mode=2
innodb_use_native_aio=1
innodb_open_files=3000
innodb_stats_on_metadata=0

query_cache_type=1
query_cache_size=600m
query_cache_limit=100m
query_alloc_block_size=32k

table_cache=12000
table_open_cache=8192
table_definition_cache=8192



 Comments   
Comment by Marko Mäkelä [ 2022-01-03 ]

Can you please provide a test case for reproducing this?
Another possible source of InnoDB system tablespace bloat would be the change buffer. Did you try innodb_change_buffering=none?

I would not recommend creating data files with the non-default setting innodb_checksum_algorithm=crc32. The default setting full_crc32 is safer and more efficient.

The option innodb_log_files_in_group has no effect already in MariaDB Server 10.5. A single file was found to be slightly more efficient in MDEV-20907. A log file size of only 1 gigabyte or 750 megabytes feels small. It could be set close to the buffer pool size (which you did not disclose).

Comment by Tech Magos [ 2022-01-30 ]

Thanks Marko

(yes, aware of those options deprecated in 10.5/10.6)

innodb_log_file_size of 2GB and checksum chang did not make a diff. The "leakage" continues as the ibdata1 limit was set to 7.5GB, it needed more after 4-5 weeks , set to 8.5Gb and waiting again (the rate at which this happens makes it v hard for me to give you a repro case)

Other settings we use, which i did not provided above:
innodb_buffer_pool_size=33000m
innodb_log_buffer_size=450m
innodb_max_dirty_pages_pct=90
innodb_commit_concurrency=0
innodb_purge_threads=4
innodb_thread_concurrency=32
innodb_read_io_threads=32
innodb_write_io_threads=3
innodb_io_capacity=1000
innodb_flush_log_at_timeout=20
performance_schema=1

Your suggestion for log_size close to be close to buffer pool: that will mean around 30GB storage waste, for a db server that houses dbs of total sizse < 40GB..

The issue here is that the same database server under 10.3.x had a 5GB ibdata1 limit for over 2 yrs and this error never appeared, ibdata1 never needed to grow, same settings except for new innodb_undo_tablespaces, the new feature, which we were hoping to avoid undo data palced in ibdafa1, by construction). So pls take this into consideration.

Did not try this yet but plan to : innodb_change_buffering=none; but will this change not slow down our current insert/update/delete operations (and we never had to touch this one in years of running 10.3.x)

Comment by Marko Mäkelä [ 2022-01-31 ]

With the innochecksum tool you should be able to see what is occupying the most space in the ibdata1 file.

If you follow the analysis in MDEV-21952, the best candidate is the InnoDB change buffer. The situation could have been made worse by MDEV-19514, which aimed to avoid crash restart loops and to remove an innodb_force_recovery option that can cause permanent data loss.

I would recommend disabling the change buffer not only because it makes the system tablespace grow, but also because of bugs that we have been unable to reproduce, such as MDEV-26977. In some performance tests that we conducted on SSD last week, the change buffer improved throughput on 10.5.13 and 10.6.5 by at most 5%, and sometimes reduced performance.

Comment by Marko Mäkelä [ 2022-03-07 ]

technomagos@gmail.com, did you try setting innodb_change_buffering=none or checking with innochecksum -S what is occupying the space in the system tablespace?

Comment by Tech Magos [ 2022-03-23 ]

Sorry for the delay to respond.

NO did not set that yet. was worried about perf impact, sugegsted by yoru docs.

I did did below and have not seen the issue hit again, but i have 'leaked' a few gbytes of ibdata1 already., which require to rebuild all databses from backup from scratch (not fun).

I set:
innodb_change_buffer_max_size=15

I changed:
innodb_io_capacity=up to 1800
innodb_log_file_size=uppped to 2400m
innodb_flush_log_at_timeout=down to 8

I removed this but i assume this does nothing as the checksum is "stamped' on my existing tables?
innodb_checksum_algorithm=crc32

Comment by Marko Mäkelä [ 2022-08-04 ]

Sorry, I missed the update.

Meanwhile, the InnoDB change buffer was disabled by default (MDEV-27734) due to hard-to-reproduce data corruption. No significant performance regression was observed.

Can you try to disable the InnoDB change buffer?

Generated at Thu Feb 08 09:50:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.