[MDEV-13080] [ERROR] InnoDB: Missing MLOG_CHECKPOINT between the checkpoint x and the end y - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.6, 10.2.17, 10.3(EOL)
Fix Version/s: 10.2.25, 10.3.16, 10.4.5
Component/s: Storage Engine - InnoDB, Tests
Labels:
None
Environment:
Ubuntu 14.04.4 LTS

Description

Hello,

I've been using MariaDB 10.2.6 with slave_run_triggers_for_rbr for some aggregation tables.
However, there's a big memory leak with this setup and the server regularly hangs.

Last time it hangs, innodb crash recovery could not start because MLOG_CHECKPOINT was not found.

My server has got a BBWC so any fsync should be persisted on disk. I dont understand how that could happen ?

Attachments

Issue Links

duplicates

MDEV-17680 innodb.undo_truncate_recover fails in buildbot with Missing MLOG_CHECKPOINT

Closed

relates to

MDEV-15282 innodb.autoinc_persist failed in buildbot, Assertion failed: recv_sys->mlog_checkpoint_lsn <= recv_sys->recovered_lsn

Closed

MDEV-18038 Assertion failure in innodb.undo_truncate_recover: "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i"

Closed

MDEV-13830 Assertion failed: recv_sys->mlog_checkpoint_lsn <= recv_sys->recovered_lsn

Closed

MDEV-19346 Assertion `recv_sys->mlog_checkpoint_lsn <= recv_sys->recovered_lsn' failed in recv_parse_log_recs during mariabackup --prepare

Closed

Activity

Ascending order - Click to sort in descending order

View 18 older comments

Marko Mäkelä added a comment - 2019-03-08 13:20

kevg, that is interesting. Could it be that we sometimes update last_checkpoint_lsn before the corresponding write to the checkpoint header becomes persistent? Could it be that fil_aio_wait() is invoked on the redo log pseudo-tablespace for something else than the latest pending checkpoint write? After all, we do allow multiple log_checkpoint() executions in parallel, interleaved with each other.

Could you add an assertion to log_complete_checkpoint() or its call path, to verify that the completed write was for the expected checkpoint?

Note: The redo log is not being extended while the server is running. Only the redo log write buffer is extended during this in 10.2 and 10.3. In 10.4, less redo log is being written thanks to the ~~MDEV-17138~~ MLOG_MEMSET record.

Marko Mäkelä added a comment - 2019-03-08 13:20 kevg , that is interesting. Could it be that we sometimes update last_checkpoint_lsn before the corresponding write to the checkpoint header becomes persistent? Could it be that fil_aio_wait() is invoked on the redo log pseudo-tablespace for something else than the latest pending checkpoint write? After all, we do allow multiple log_checkpoint() executions in parallel, interleaved with each other. Could you add an assertion to log_complete_checkpoint() or its call path, to verify that the completed write was for the expected checkpoint? Note: The redo log is not being extended while the server is running. Only the redo log write buffer is extended during this in 10.2 and 10.3. In 10.4, less redo log is being written thanks to the MDEV-17138 MLOG_MEMSET record.

Eugene Kosov (Inactive) added a comment - 2019-03-26 14:56

My assumption about `write_always = true` was incorrect at least for ~~MDEV-18038~~ which sometimes happen.

Eugene Kosov (Inactive) added a comment - 2019-03-26 14:56 My assumption about `write_always = true` was incorrect at least for MDEV-18038 which sometimes happen.

Marko Mäkelä added a comment - 2019-04-23 13:10 - edited

I implemented debug instrumentation that should make bugs in crash recovery and mariabackup --prepare more repeatable.

Usage:

./mtr --mysqld=--debug=d,ib_log_checkpoint_avoid …

Marko Mäkelä added a comment - 2019-04-23 13:10 - edited I implemented debug instrumentation that should make bugs in crash recovery and mariabackup --prepare more repeatable. Usage: ./mtr --mysqld=--debug=d,ib_log_checkpoint_avoid …

Eugene Kosov (Inactive) added a comment - 2019-05-13 07:33 - edited

The bug is inside log_buffer_extend(). ~~MDEV-18038~~ too.

This patch makes log_buffer_extend() run on every mtr.commit().

diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h

index 16399edad50..dd549baddce 100644

--- a/storage/innobase/include/log0log.h

+++ b/storage/innobase/include/log0log.h

@@ -81,7 +81,7 @@ log_free_check(void);

 /** Extends the log buffer.

 @param[in]     len     requested minimum size in bytes */

-void log_buffer_extend(ulong len);

+void log_buffer_extend();

 /** Check margin not to overwrite transaction log from the last checkpoint.

 If would estimate the log write to exceed the log_group_capacity,

diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc

index 409b42c452a..4ffa906f86b 100644

--- a/storage/innobase/log/log0log.cc

+++ b/storage/innobase/log/log0log.cc

@@ -159,8 +159,10 @@ log_buf_pool_get_oldest_modification(void)

 /** Extends the log buffer.

 @param[in]     len     requested minimum size in bytes */

-void log_buffer_extend(ulong len)

+void log_buffer_extend()

+       size_t len = srv_log_buffer_size + 1;

        byte    tmp_buf[OS_FILE_LOG_BLOCK_SIZE];

        log_mutex_enter_all();

diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc

index 92b1aa38a81..d4950ffba05 100644

--- a/storage/innobase/mtr/mtr0mtr.cc

+++ b/storage/innobase/mtr/mtr0mtr.cc

@@ -764,9 +764,7 @@ mtr_t::Command::prepare_write()

        ut_ad(len > 0);

        ut_ad(n_recs > 0);

-       if (len > srv_log_buffer_size / 2) {

-               log_buffer_extend(ulong((len + 1) * 2));

-       }

+       log_buffer_extend();

        ut_ad(m_impl->m_n_log_recs == n_recs);

And with that patch I always see ~~MDEV-13080~~ or ~~MDEV-18038~~ in 10.3 with this simple test:

create or replace table t (a int) engine=innodb;

insert into t values (1);

drop table t;

Eugene Kosov (Inactive) added a comment - 2019-05-13 07:33 - edited The bug is inside log_buffer_extend() . MDEV-18038 too. This patch makes log_buffer_extend() run on every mtr.commit() . diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h index 16399edad50..dd549baddce 100644 --- a/storage/innobase/include/log0log.h +++ b/storage/innobase/include/log0log.h @@ -81,7 +81,7 @@ log_free_check(void); /** Extends the log buffer. @param[in] len requested minimum size in bytes */ -void log_buffer_extend(ulong len); +void log_buffer_extend(); /** Check margin not to overwrite transaction log from the last checkpoint. If would estimate the log write to exceed the log_group_capacity, diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc index 409b42c452a..4ffa906f86b 100644 --- a/storage/innobase/log/log0log.cc +++ b/storage/innobase/log/log0log.cc @@ -159,8 +159,10 @@ log_buf_pool_get_oldest_modification(void) /** Extends the log buffer. @param[in] len requested minimum size in bytes */ -void log_buffer_extend(ulong len) +void log_buffer_extend() { + size_t len = srv_log_buffer_size + 1; + byte tmp_buf[OS_FILE_LOG_BLOCK_SIZE]; log_mutex_enter_all(); diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc index 92b1aa38a81..d4950ffba05 100644 --- a/storage/innobase/mtr/mtr0mtr.cc +++ b/storage/innobase/mtr/mtr0mtr.cc @@ -764,9 +764,7 @@ mtr_t::Command::prepare_write() ut_ad(len > 0); ut_ad(n_recs > 0); - if (len > srv_log_buffer_size / 2) { - log_buffer_extend(ulong((len + 1) * 2)); - } + log_buffer_extend(); ut_ad(m_impl->m_n_log_recs == n_recs); And with that patch I always see MDEV-13080 or MDEV-18038 in 10.3 with this simple test: create or replace table t (a int ) engine=innodb; insert into t values (1); drop table t;

Marko Mäkelä added a comment - 2019-05-15 12:34

We will keep the log_buffer_extend() but remove the log file write from it.

Marko Mäkelä added a comment - 2019-05-15 12:34 We will keep the log_buffer_extend() but remove the log file write from it.

MariaDB Server

[ERROR] InnoDB: Missing MLOG_CHECKPOINT between the checkpoint x and the end y

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration