MariaDB Server / MDEV-27153

ibdata1 file leaking? (errno 135) - using innodb_undo_tablespaces

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version: 10.6.5
    • Fix Version: N/A
    • Component: Server
    • Labels: None
    • Environment: Linux CentOS 7.4

    Description

      We have been using MariaDB Server for years. We have always configured the server with the following setting, to put a bound on the ibdata1 file:
      innodb_data_file_path=ibdata1:20m:autoextend:max:5G
      This always worked; we never hit an error.

      With 10.6.5 (and 10.5.x), after configuring the server to use innodb_undo_tablespaces, we get the error below when, for example, creating a simple table with 3 rows:

      Can't create table `XXX` (errno: 135: No more room in record file)

      This hits when the ibdata1 file reaches the bound given, so we have to keep growing the max bound by 2GB every few days as it keeps hitting the limit. This looks like an ibdata1 LEAK.

      When this error hits, all 4 undo files are around half their maximum size, so they are not exhausted.

      We were hoping that auto-truncating undo tablespaces would prevent this very issue from ever happening, but it actually seems to trigger it!
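      For context, a sketch of the failure mode (the table definition below is invented, not from the report): once ibdata1 reaches its configured max bound, even trivial DDL can fail.

          -- any small DDL that needs system-tablespace pages can then fail:
          CREATE TABLE t1 (a INT, b INT, c INT) ENGINE=InnoDB;
          -- fails with: Can't create table `t1` (errno: 135 "No more room in record file")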

      Below are the relevant parts of our config.

      innodb_undo_tablespaces=4
      innodb_undo_log_truncate=1
      innodb_max_undo_log_size=1024m
      innodb_log_file_size=1024m
      innodb_log_files_in_group=2
      innodb_flush_log_at_timeout=20

      transaction_isolation=READ-COMMITTED
      innodb_file_per_table=1
      innodb_data_file_path=ibdata1:20m:autoextend:max:5G
      innodb_autoextend_increment=200 # m incrs
      innodb_log_file_size=750m # TODO: try 2G+
      innodb_log_files_in_group=2
      innodb_checksum_algorithm=crc32
      innodb_flush_method=O_DIRECT
      innodb_flush_log_at_trx_commit=2
      innodb_autoinc_lock_mode=2
      innodb_use_native_aio=1
      innodb_open_files=3000
      innodb_stats_on_metadata=0

      query_cache_type=1
      query_cache_size=600m
      query_cache_limit=100m
      query_alloc_block_size=32k

      table_cache=12000
      table_open_cache=8192
      table_definition_cache=8192


          Activity

            Can you please provide a test case for reproducing this?
            Another possible source of InnoDB system tablespace bloat would be the change buffer. Did you try innodb_change_buffering=none?
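            For reference, a sketch (not part of the original comment): innodb_change_buffering is a dynamic variable, so it can be changed without a restart, or persisted in the configuration file.

                SET GLOBAL innodb_change_buffering = 'none';
                -- or, in my.cnf under [mysqld]:
                --   innodb_change_buffering = none

            Setting it to none stops new changes from being buffered; entries already in the change buffer are still merged over time.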

            I would not recommend creating data files with the non-default setting innodb_checksum_algorithm=crc32. The default setting full_crc32 is safer and more efficient.

            The option innodb_log_files_in_group already has no effect in MariaDB Server 10.5. A single log file was found to be slightly more efficient in MDEV-20907. A log file size of only 1 gigabyte or 750 megabytes feels small; it could be set close to the buffer pool size (which you did not disclose).

            marko Marko Mäkelä added a comment
            technomagos@gmail.com Tech Magos added a comment -

            Thanks Marko

            (Yes, I am aware of those options being deprecated in 10.5/10.6.)

            An innodb_log_file_size of 2GB and the checksum change did not make a difference. The "leakage" continues: the ibdata1 limit was set to 7.5GB, it needed more after 4-5 weeks, so I set it to 8.5GB and am waiting again. (The rate at which this happens makes it very hard for me to give you a reproduction case.)

            Other settings we use, which I did not provide above:
            innodb_buffer_pool_size=33000m
            innodb_log_buffer_size=450m
            innodb_max_dirty_pages_pct=90
            innodb_commit_concurrency=0
            innodb_purge_threads=4
            innodb_thread_concurrency=32
            innodb_read_io_threads=32
            innodb_write_io_threads=3
            innodb_io_capacity=1000
            innodb_flush_log_at_timeout=20
            performance_schema=1

            Your suggestion to set the log size close to the buffer pool size would mean around 30GB of storage waste, for a database server that houses databases with a total size of < 40GB.

            The issue here is that the same database server under 10.3.x had a 5GB ibdata1 limit for over 2 years and this error never appeared; ibdata1 never needed to grow. The settings were the same except for the new innodb_undo_tablespaces feature, which we had hoped would, by construction, keep undo data out of ibdata1. So please take this into consideration.

            I did not try innodb_change_buffering=none yet, but plan to. Will this change not slow down our current insert/update/delete operations? (We never had to touch this setting in years of running 10.3.x.)


            With the innochecksum tool you should be able to see what is occupying the most space in the ibdata1 file.

            If you follow the analysis in MDEV-21952, the best candidate is the InnoDB change buffer. The situation could have been made worse by MDEV-19514, which aimed to avoid crash restart loops and to remove an innodb_force_recovery option that can cause permanent data loss.

            I would recommend disabling the change buffer not only because it makes the system tablespace grow, but also because of bugs that we have been unable to reproduce, such as MDEV-26977. In some performance tests that we conducted on SSD last week, the change buffer improved throughput on 10.5.13 and 10.6.5 by at most 5%, and sometimes reduced performance.
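            As a rough way to watch the change buffer (a sketch; the exact output format varies by version), its current size is reported in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of the engine status:

                SHOW ENGINE INNODB STATUS\G
                -- look for a line such as:
                --   Ibuf: size 1, free list len 0, seg size 2, 0 merges
                -- "seg size" is the change buffer's allocated size in pages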

            marko Marko Mäkelä added a comment

            technomagos@gmail.com, did you try setting innodb_change_buffering=none or checking with innochecksum -S what is occupying the space in the system tablespace?
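            As a sketch (the data directory path is an example; the server must be shut down, or the file copied, before running innochecksum on it):

                innochecksum --page-type-summary /var/lib/mysql/ibdata1
                # -S is the short form of --page-type-summary; the summary
                # counts pages by type (index, undo log, ibuf, allocated, ...)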

            marko Marko Mäkelä added a comment
            technomagos@gmail.com Tech Magos added a comment -

            Sorry for the delay in responding.

            No, I did not set that yet; I was worried about the performance impact suggested by your docs.

            I did the below and have not seen the issue hit again, but I have already 'leaked' a few gigabytes of ibdata1, which requires rebuilding all databases from backup from scratch (not fun).

            I set:
            innodb_change_buffer_max_size=15

            I changed:
            innodb_io_capacity=up to 1800
            innodb_log_file_size=upped to 2400m
            innodb_flush_log_at_timeout=down to 8

            I removed this, but I assume it does nothing since the checksum is 'stamped' on my existing tables?
            innodb_checksum_algorithm=crc32


            Sorry, I missed the update.

            Meanwhile, the InnoDB change buffer was disabled by default (MDEV-27734) due to hard-to-reproduce data corruption. No significant performance regression was observed.

            Can you try to disable the InnoDB change buffer?

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              technomagos@gmail.com Tech Magos
