[MDEV-27153] ibdata1 file Leaking ? (errno 135) - using innodb_undo_tablespaces - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.6.5
Fix Version/s: N/A
Component/s: Server
Labels: None
Environment: Linux CentOS 7.4

Description

Been using MariaDB server for years. We always configure the server like this, to put an upper bound on the ibdata1 file:
innodb_data_file_path=ibdata1:20m:autoextend:max:5G
This always worked, never hit an error.

With 10.6.5 (but also 10.5.x), and after configuring the server to use innodb_undo_tablespaces,
we get the error below, e.g. when creating a simple table with 3 rows.

Can't create table `XXX` (errno: 135 "No more room in record file")

This hits when the ibdata1 file reaches the given bound. So we have to keep raising the max bound by 2GB every few days as it keeps hitting the limit, so this looks like an ibdata1 LEAK.

When this error hits, all 4 undo files are around half their max size, so not exhausted.

We were hoping that automatically truncating undo tablespaces would prevent this very issue from ever happening, but it actually seems to trigger it!

Below are the relevant parts of our config.

innodb_undo_tablespaces=4
innodb_undo_log_truncate=1
innodb_max_undo_log_size=1024m
innodb_log_file_size=1024m
innodb_log_files_in_group=2
innodb_flush_log_at_timeout=20

transaction_isolation=READ-COMMITTED
innodb_file_per_table=1
innodb_data_file_path=ibdata1:20m:autoextend:max:5G
innodb_autoextend_increment=200 # autoextend in 200 MB increments
innodb_log_file_size=750m # TODO: try 2G+
innodb_log_files_in_group=2
innodb_checksum_algorithm=crc32
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=2
innodb_autoinc_lock_mode=2
innodb_use_native_aio=1
innodb_open_files=3000
innodb_stats_on_metadata=0

query_cache_type=1
query_cache_size=600m
query_cache_limit=100m
query_alloc_block_size=32k

table_cache=12000
table_open_cache=8192
table_definition_cache=8192
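
For readers unfamiliar with the innodb_data_file_path syntax used above, here is a minimal sketch of how the `ibdata1:20m:autoextend:max:5G` value breaks down into an initial size and a cap. It is an illustration only: the helper names (`parse_size`, `parse_data_file_spec`, `headroom_bytes`) are hypothetical, it assumes a single data file with K/M/G suffixes, and it is not how the server itself parses the option.

```python
import re

# Size suffixes assumed for this sketch (binary units).
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '20m' or '5G' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_data_file_spec(spec):
    """Split a single-file spec like 'ibdata1:20m:autoextend:max:5G'."""
    parts = spec.strip().split(":")
    if len(parts) < 2:
        raise ValueError(f"expected name:size[...], got {spec!r}")
    attrs = [p.lower() for p in parts[2:]]
    max_bytes = None
    if "max" in attrs:
        # The cap is the field that follows the 'max' keyword.
        max_bytes = parse_size(parts[2 + attrs.index("max") + 1])
    return {
        "name": parts[0],
        "initial_bytes": parse_size(parts[1]),
        "autoextend": "autoextend" in attrs,
        "max_bytes": max_bytes,
    }


def headroom_bytes(spec, current_file_bytes):
    """Bytes left before the file reaches its cap (None if uncapped)."""
    cap = parse_data_file_spec(spec)["max_bytes"]
    return None if cap is None else max(cap - current_file_bytes, 0)


if __name__ == "__main__":
    info = parse_data_file_spec("ibdata1:20m:autoextend:max:5G")
    print(info)
```

Comparing `headroom_bytes(...)` against the actual size of ibdata1 on disk would show how close the file is to the cap at which the errno 135 error was reported.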

Issue Links

relates to

MDEV-19514 Defer change buffer merge until pages are requested

Closed

MDEV-21952 ibdata1 file size growing in MariaDB

Closed

Activity

Tech Magos created issue - 2021-12-01 22:17

Sergei Golubchik made changes - 2021-12-06 21:35

Field	Original Value	New Value
Workflow	MariaDB v3 [ 128080 ]	MariaDB v4 [ 143403 ]

Marko Mäkelä made changes - 2022-01-03 14:48

Status

Open [ 1 ]

Needs Feedback [ 10501 ]

Marko Mäkelä added a comment - 2022-01-03 14:53

Can you please provide a test case for reproducing this?
Another possible source of InnoDB system tablespace bloat would be the change buffer. Did you try innodb_change_buffering=none?

I would not recommend creating data files with the non-default setting innodb_checksum_algorithm=crc32. The default setting full_crc32 is safer and more efficient.

The option innodb_log_files_in_group has had no effect since MariaDB Server 10.5. A single file was found to be slightly more efficient in ~~MDEV-20907~~. A log file size of only 1 gigabyte or 750 megabytes feels small. It could be set close to the buffer pool size (which you did not disclose).

Tech Magos added a comment - 2022-01-30 11:45

Thanks Marko

(Yes, aware of those options being deprecated in 10.5/10.6.)

Setting innodb_log_file_size to 2GB and changing the checksum did not make a difference. The "leakage" continues: the ibdata1 limit was set to 7.5GB and it needed more after 4-5 weeks, so it is now set to 8.5GB and we are waiting again. (The rate at which this happens makes it very hard for me to give you a repro case.)

Other settings we use, which I did not provide above:
innodb_buffer_pool_size=33000m
innodb_log_buffer_size=450m
innodb_max_dirty_pages_pct=90
innodb_commit_concurrency=0
innodb_purge_threads=4
innodb_thread_concurrency=32
innodb_read_io_threads=32
innodb_write_io_threads=3
innodb_io_capacity=1000
innodb_flush_log_at_timeout=20
performance_schema=1

Your suggestion to set the log file size close to the buffer pool size would mean around 30GB of storage wasted, for a DB server that houses databases with a total size < 40GB.

The issue here is that the same database server under 10.3.x had a 5GB ibdata1 limit for over 2 years and this error never appeared; ibdata1 never needed to grow. The settings are the same except for innodb_undo_tablespaces, the new feature, which we were hoping would keep undo data out of ibdata1 by construction. So please take this into consideration.

I did not try innodb_change_buffering=none yet, but plan to. Will this change not slow down our current insert/update/delete operations? (We never had to touch this one in years of running 10.3.x.)

Marko Mäkelä made changes - 2022-01-31 12:18

Link

This issue relates to ~~MDEV-21952~~ [ ~~MDEV-21952~~ ]

Marko Mäkelä made changes - 2022-01-31 12:18

Link

This issue relates to ~~MDEV-19514~~ [ ~~MDEV-19514~~ ]

Marko Mäkelä added a comment - 2022-01-31 12:18

With the innochecksum tool you should be able to see what is occupying the most space in the ibdata1 file.

If you follow the analysis in ~~MDEV-21952~~, the best candidate is the InnoDB change buffer. The situation could have been made worse by ~~MDEV-19514~~, which aimed to avoid crash restart loops and to remove an innodb_force_recovery option that can cause permanent data loss.

I would recommend disabling the change buffer not only because it makes the system tablespace grow, but also because of bugs that we have been unable to reproduce, such as ~~MDEV-26977~~. In some performance tests that we conducted on SSD last week, the change buffer improved throughput on 10.5.13 and 10.6.5 by at most 5%, and sometimes reduced performance.

Marko Mäkelä added a comment - 2022-03-07 11:36

technomagos@gmail.com, did you try setting innodb_change_buffering=none or checking with innochecksum -S what is occupying the space in the system tablespace?
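
For anyone following along, the page-type summary asked about here could be collected with a small wrapper like the sketch below. The function names and the example path `/var/lib/mysql/ibdata1` are assumptions made for illustration; `--page-type-summary` is the long form of `-S`. innochecksum is normally run against a file that the server does not have open, so this would typically be done while the server is shut down or on a copy of the file.

```python
import shutil
import subprocess
from typing import List, Optional


def innochecksum_summary_cmd(tablespace_path: str) -> List[str]:
    """Build the argv for a page-type summary of one tablespace file."""
    return ["innochecksum", "--page-type-summary", tablespace_path]


def page_type_summary(tablespace_path: str) -> Optional[str]:
    """Run innochecksum and return its combined output.

    Returns None when the innochecksum binary is not on PATH.
    """
    if shutil.which("innochecksum") is None:
        return None
    result = subprocess.run(
        innochecksum_summary_cmd(tablespace_path),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical default data directory; adjust to the real datadir.
    output = page_type_summary("/var/lib/mysql/ibdata1")
    print(output if output is not None else "innochecksum not found on PATH")
```

The per-page-type counts in the output are what would indicate whether undo log pages, change buffer pages, or something else dominates the system tablespace.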

Tech Magos added a comment - 2022-03-23 21:59

Sorry for the delay to respond.

No, I did not set that yet. I was worried about the performance impact suggested by your docs.

I did the below and have not seen the issue hit again, but I have already 'leaked' a few GB of ibdata1, which would require rebuilding all databases from backup from scratch (not fun).

I set:
innodb_change_buffer_max_size=15

I changed:
innodb_io_capacity: raised to 1800
innodb_log_file_size: raised to 2400m
innodb_flush_log_at_timeout: lowered to 8

I removed this, but I assume that does nothing, as the checksum is 'stamped' on my existing tables?
innodb_checksum_algorithm=crc32

Sergei Golubchik made changes - 2022-04-25 08:54

Status

Needs Feedback [ 10501 ]

Open [ 1 ]

Marko Mäkelä added a comment - 2022-08-04 06:12

Sorry, I missed the update.

Meanwhile, the InnoDB change buffer was disabled by default (~~MDEV-27734~~) due to hard-to-reproduce data corruption. No significant performance regression was observed.

Can you try to disable the InnoDB change buffer?

Marko Mäkelä made changes - 2022-08-04 06:12

Assignee		Marko Mäkelä [ marko ]
Status	Open [ 1 ]	Needs Feedback [ 10501 ]

Elena Stepanova made changes - 2022-09-04 23:28

Status

Needs Feedback [ 10501 ]

Open [ 1 ]

Elena Stepanova made changes - 2022-09-04 23:28

Status

Open [ 1 ]

Needs Feedback [ 10501 ]

Sergei Golubchik made changes - 2022-10-03 11:46

Fix Version/s		N/A [ 14700 ]
Resolution		Incomplete [ 4 ]
Status	Needs Feedback [ 10501 ]	Closed [ 6 ]

People

Assignee: Marko Mäkelä

Reporter: Tech Magos

Votes: 0

Watchers: 3

Dates

Created: 2021-12-01 22:17

Updated: 2022-10-03 11:46

Resolved: 2022-10-03 11:46
