  MariaDB Server / MDEV-14462

Confusing error message: ib_logfiles are too small for innodb_thread_concurrency=0

Details

    Description

      Start server with --innodb_thread_concurrency=0 --innodb_log_file_size=1M --innodb_log_files_in_group=1
      (innodb_thread_concurrency=0 is actually default, so it can be omitted).

      2017-11-21 23:45:22 140609571486592 [ERROR] InnoDB: Cannot continue operation. ib_logfiles are too small for innodb_thread_concurrency=0. The combined size of ib_logfiles should be bigger than 200 kB * innodb_thread_concurrency. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html
      2017-11-21 23:45:22 140609571486592 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2120] with error Generic error
      

          Activity

            Related to MDEV-14425, we plan to change how the redo log is written.
            As part of introducing a dedicated log writer thread, we can remove this limitation altogether and have the dedicated thread initiate log checkpoints when necessary.

            marko Marko Mäkelä added a comment

            As part of introducing a dedicated log writer task, the following needs to be done:

            • Remove srv_max_n_threads and some related code (as noted in MDEV-16264).
            • Remove log_free_check() calls from user threads. Checkpoints will be initiated by the dedicated log writer task only.
            • Replace mtr_t::m_memo and mtr_t::m_log with something like std::map<buf_block_t*,byte*> whose ownership can be transferred from the user thread doing mtr_t::commit() to the dedicated log writer task. At least, try to replace mtr_t::m_memo with fewer pointers: latched_space, latched_index, array of buf_block_t*.
            • If a user thread wants to ensure that a mini-transaction has been made durable, some synchronization with the log writer task will be needed.
            • The dedicated log writer task must write log snippets to the global buffer in LSN order (no matter whether the LSNs are counting bytes or mini-transactions). We might want that task to assign LSNs once it has copied the log, and to signal the user threads via some ‘mini-transaction handle’.
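The hand-off described in the list above could look roughly like the following. This is only a minimal, single-threaded sketch under stated assumptions: `LogWriterQueue`, `PendingLog`, and `MtrHandle` are hypothetical names that do not exist in InnoDB, LSNs are assumed to count bytes, and all locking and waiting is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical "mini-transaction handle": filled in by the writer task.
struct MtrHandle {
  bool written = false;  // set once the log has been copied
  lsn_t end_lsn = 0;     // LSN assigned by the writer task
};

// Hypothetical stand-in for a detached mtr_t::m_log.
struct PendingLog {
  std::vector<unsigned char> bytes;
  std::shared_ptr<MtrHandle> handle;
};

class LogWriterQueue {
 public:
  // Called by the user thread at commit: ownership of the log moves here.
  std::shared_ptr<MtrHandle> submit(std::vector<unsigned char> bytes) {
    auto handle = std::make_shared<MtrHandle>();
    pending_.push_back(PendingLog{std::move(bytes), handle});
    return handle;
  }

  // Called by the dedicated writer task: copy snippets in submission order,
  // assign byte-counting LSNs, and mark each handle as written.
  void drain() {
    while (!pending_.empty()) {
      PendingLog p = std::move(pending_.front());
      pending_.pop_front();
      buf_.insert(buf_.end(), p.bytes.begin(), p.bytes.end());
      lsn_ += p.bytes.size();
      p.handle->end_lsn = lsn_;
      p.handle->written = true;
    }
  }

  std::size_t buffered() const { return buf_.size(); }

 private:
  std::deque<PendingLog> pending_;  // a real queue would need synchronization
  std::vector<unsigned char> buf_;  // stands in for the global log buffer
  lsn_t lsn_ = 0;
};
```

Because only the writer task assigns LSNs, the snippets reach the buffer in LSN order by construction; the cost is that a committing thread does not know its LSN until the handle has been filled in.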
            marko Marko Mäkelä added a comment (edited)

            In log_set_capacity() we have the following code:

            	/* For each OS thread we must reserve so much free space in the
            	smallest log group that it can accommodate the log entries produced
            	by single query steps: running out of free log space is a serious
            	system error which requires rebooting the database. */
             
            	free = LOG_CHECKPOINT_FREE_PER_THREAD * (10 + srv_thread_concurrency)
            		+ LOG_CHECKPOINT_EXTRA_FREE;
            	if (free >= smallest_capacity / 2) {
            		ib::error() << "Cannot continue operation because log file is "
            			       "too small. Increase innodb_log_file_size "
            			       "or decrease innodb_thread_concurrency. "
            			    << INNODB_PARAMETERS_MSG;
            		return false;
            	}
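
To see why a 1 MiB log fails this check even with innodb_thread_concurrency=0, the arithmetic can be replayed in isolation. This is a sketch with assumed values: the two constants are taken to be 4 and 8 pages of 16 KiB respectively, and smallest_capacity is treated as the raw file size; the real code may define the constants differently and subtracts headers and a margin from the capacity.

```cpp
#include <cstdint>

// Assumed values, not stated in the quoted code.
constexpr std::uint64_t kPageSize = 16 * 1024;
constexpr std::uint64_t kFreePerThread = 4 * kPageSize;  // assumed LOG_CHECKPOINT_FREE_PER_THREAD
constexpr std::uint64_t kExtraFree = 8 * kPageSize;      // assumed LOG_CHECKPOINT_EXTRA_FREE

// Mirrors the quoted condition; true means startup would be allowed.
constexpr bool log_capacity_ok(std::uint64_t smallest_capacity,
                               std::uint64_t thread_concurrency) {
  return kFreePerThread * (10 + thread_concurrency) + kExtraFree
         < smallest_capacity / 2;
}
```

Under these assumptions the reserve for a concurrency of 0 is 10 × 64 KiB + 128 KiB = 768 KiB, which is not less than half of 1 MiB, so the check fails regardless of the thread concurrency setting.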
            

            As noted in MDEV-23382, the practical limit on concurrent writers may be rather close to 128 threads.

            We might have no other good choice than to remove this message. There is a kind of inherent race condition between threads that do the following:

            log_free_check();
            mtr.start();
            …
            mtr.commit();
            

            We can have N threads that successfully execute log_free_check(), each noting that there is enough free space in the redo log. Then, each thread could generate redo log and, at mtr_t::commit(), attempt to write that log to the global buffer. If each concurrent thread generates the worst-case amount of log (say, each one is splitting a very tall B-tree from the leaf to the root, rewriting about 64 pages), each thread could be attempting to write 64 × 16 KiB = 1 MiB of redo log. So, should we ensure that N × 1 MiB is available in the redo log before the checkpoint would be overwritten? In some setups this is reasonable, but it could be serious overkill for smaller server instances.
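
The check-then-act race and the worst-case arithmetic can be put into a small model. This is a deliberately simplified sketch: the function names are hypothetical, and the real log_free_check() does not compare against a fixed 1 MiB per thread.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 16 * 1024;  // 16 KiB pages, as in the text
constexpr std::uint64_t kWorstCasePages = 64;   // pages rewritten by one deep split

// Worst-case redo log volume if every thread splits a tall B-tree at once.
constexpr std::uint64_t worst_case_log_bytes(std::uint64_t threads) {
  return threads * kWorstCasePages * kPageSize;
}

// Every thread checks the same free space before any of them writes, so each
// individual check can pass while the combined writes still exceed the space.
constexpr bool can_overrun(std::uint64_t free_bytes, std::uint64_t threads) {
  const std::uint64_t per_thread = kWorstCasePages * kPageSize;
  const bool each_check_passes = free_bytes >= per_thread;
  return each_check_passes && worst_case_log_bytes(threads) > free_bytes;
}
```

With 4 MiB free, four such threads fit exactly, while eight threads would each pass their own check and together need 8 MiB.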

            A proper fix to this problem seems to require a change to how the redo log is written. We should experiment with something that allows us to trigger a log checkpoint in mtr_t::commit() and possibly allows us to remove log_free_check().

            marko Marko Mäkelä added a comment

            Because srv_thread_concurrency=0 actually means unlimited, the formula 10 + srv_thread_concurrency seems to be incorrect. With the current formula, we would actually require more safety margin when an innodb_thread_concurrency limit has been specified than when concurrency is unlimited.

            marko Marko Mäkelä added a comment

            Yes, the message is confusing, and it was slightly changed when innodb_thread_concurrency was deprecated in MDEV-23379.

            Thanks to MDEV-23855 in MariaDB Server 10.5.7, problems due to misconfiguration should be less likely, but not impossible. With MDEV-23855, the log checkpoints will typically be initiated by the dedicated page cleaner thread.

            In MariaDB Enterprise Server 10.5, the log can be resized by SET GLOBAL without restarting the server.

            marko Marko Mäkelä added a comment

            We could remove log_free_check() and possibly significantly reduce contention on log_sys.mutex if we changed mtr_t::commit() to detach the mtr_t::m_log and mtr_t::m_memo and pass their ownership to a dedicated thread that would write the data to log_sys.buf. That dedicated thread might even be the buf_flush_page_cleaner, because that thread is normally the one that executes log checkpoints.

            A potential drawback of such a solution is that a thread that accesses the same block again soon afterwards would end up waiting on the page latch, which the dedicated log writer would hold until the log has been written.

            Furthermore, mtr_t::commit_lsn() would have to wait for the log writer to finish. Most callers of that are related to rare special cases, such as creating or renaming files. The data flow in trx_t::commit_in_memory() and trx_commit_complete_for_mysql() and possibly trx_prepare_low() would have to be refactored to avoid excessive waiting. wlad, ideally we would want to pass a ‘durability callback’ that would be notified once the log records have not only been written to log_sys.buf but also from there to the redo log file (so that log_sys.flushed_to_disk_lsn includes everything up to the mini-transaction commit).
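
One possible shape for such a ‘durability callback’ is sketched below. It is an illustration only: `DurabilityNotifier` and its methods are hypothetical names, the flushed LSN is a plain member rather than log_sys.flushed_to_disk_lsn, and there is no locking.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

using lsn_t = std::uint64_t;

class DurabilityNotifier {
 public:
  // Register a callback to run once the log up to commit_lsn is durable.
  // If that is already the case, run it immediately.
  void on_durable(lsn_t commit_lsn, std::function<void()> cb) {
    if (commit_lsn <= flushed_lsn_) {
      cb();
      return;
    }
    waiting_.emplace(commit_lsn, std::move(cb));
  }

  // Called after the log file has been durably written up to new_flushed_lsn.
  void advance_flushed(lsn_t new_flushed_lsn) {
    if (new_flushed_lsn <= flushed_lsn_) return;
    flushed_lsn_ = new_flushed_lsn;
    // Everything ordered at or before the flushed LSN is now durable.
    auto end = waiting_.upper_bound(flushed_lsn_);
    for (auto it = waiting_.begin(); it != end; ++it) it->second();
    waiting_.erase(waiting_.begin(), end);
  }

 private:
  lsn_t flushed_lsn_ = 0;
  std::multimap<lsn_t, std::function<void()>> waiting_;  // ordered by LSN
};
```

The committing thread would not block; it registers the continuation and returns, which is the kind of refactoring of the commit data flow that the comment above anticipates.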

            marko Marko Mäkelä added a comment

            Hypothetically speaking, an alternative might be to switch from a circular log file to a log file that is being appended to, such as the binlog. That would introduce different types of problems, as noted in MDEV-27803.

            An attempt to solve this by introducing a dedicated log writer thread would introduce lots of thread context switches and likely destroy the performance improvements that were gained in MDEV-27774.

            Furthermore, even for a dedicated log writer thread, it could be impossible to advance the start of the circular log file, so that the tail would not overwrite it. Advancing the start (or the checkpoint LSN) will require writing out the oldest modified page, so that buf_pool.get_oldest_modification() can return a later checkpoint LSN. In the worst case, that very page is being modified (and already exclusively latched) by the mini-transaction whose log we are attempting to write. If we wanted to advance the checkpoint, we would need to have a copy of the unmodified page in the buffer pool, so that it could be written out. Implementing that would be very complicated.
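
The constraint on a circular log can be stated as a pair of small functions. This is a sketch with hypothetical names; it assumes byte-counting LSNs with lsn >= checkpoint_lsn, and a capacity that already excludes any file headers.

```cpp
#include <cstdint>

using lsn_t = std::uint64_t;

// True if appending len bytes at lsn would overwrite log that recovery
// still needs, that is, anything from checkpoint_lsn onward.
constexpr bool would_overwrite_checkpoint(lsn_t checkpoint_lsn, lsn_t lsn,
                                          std::uint64_t len,
                                          std::uint64_t capacity) {
  return (lsn - checkpoint_lsn) + len > capacity;
}

// The checkpoint can advance only as far as the oldest modification that
// has not yet been written out from the buffer pool.
constexpr lsn_t next_checkpoint(lsn_t checkpoint_lsn,
                                lsn_t oldest_modification) {
  return oldest_modification > checkpoint_lsn ? oldest_modification
                                              : checkpoint_lsn;
}
```

If the oldest modified page is the one latched by the committing mini-transaction, the oldest modification cannot move, the checkpoint stays where it is, and no amount of waiting by a log writer frees the space.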

            In MariaDB 10.8 or later, when the log is located in persistent memory, log overwrites should be extremely unlikely.

            marko Marko Mäkelä added a comment

            This message was removed as part of MDEV-27774. The minimum innodb_log_file_size was increased from 1MiB to 4MiB.

            The log overwrite error messages (as noted in MDEV-27784) will remain. They are a serious indicator for users that their database is not crash-safe. For performance, innodb_log_file_size should be set reasonably large (sometimes maybe even larger than innodb_buffer_pool_size) in order to reduce the amount of checkpoint flushing.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              elenst Elena Stepanova