Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13080

[ERROR] InnoDB: Missing MLOG_CHECKPOINT between the checkpoint x and the end y

Details

    Description

      Hello,

      I've been using MariaDB 10.2.6 with slave_run_triggers_for_rbr for some aggregation tables.
      However, there's a big memory leak with this setup and the server regularly hangs.

      Last time it hangs, innodb crash recovery could not start because MLOG_CHECKPOINT was not found.

      My server has got a BBWC so any fsync should be persisted on disk. I dont understand how that could happen ?

      Attachments

        Issue Links

          Activity

            Fuyar Farid UYAR created issue -

            Can you paste the body of the trigger?
            We have a confirmed report about a leak in a trigger which uses a subquery: MDEV-12992 . Can yours be similar?

            elenst Elena Stepanova added a comment - Can you paste the body of the trigger? We have a confirmed report about a leak in a trigger which uses a subquery: MDEV-12992 . Can yours be similar?
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Labels need_feedback
            Fuyar Farid UYAR added a comment -

            Dont really understand russian so cant comment on the previous report. But yeah on my triggers I'm storing some variables like this :

            SET @var = (SELECT IFNULL(total,0) from table_name where ...).

            Then using those variables to insert them (simple aggregations). Could it be that those var are never freed from memory ?

            Fuyar Farid UYAR added a comment - Dont really understand russian so cant comment on the previous report. But yeah on my triggers I'm storing some variables like this : SET @var = (SELECT IFNULL(total,0) from table_name where ...). Then using those variables to insert them (simple aggregations). Could it be that those var are never freed from memory ?

            Sorry I forgot to mention that, you didn't need to read the whole report, there were a couple of comments at the end with the scenario and all.

            That particular scenario involved a subquery in the trigger, although possibly there were other variations. Or, does your WHERE clause have a subquery?

            elenst Elena Stepanova added a comment - Sorry I forgot to mention that, you didn't need to read the whole report, there were a couple of comments at the end with the scenario and all. That particular scenario involved a subquery in the trigger, although possibly there were other variations. Or, does your WHERE clause have a subquery?
            Fuyar Farid UYAR added a comment -

            Okay I show your comment on the other issue. I dont really have SELECT subqueries like in your example.
            Most of my triggers are like this :

            IF NEW.var1 = X AND NEW.var2 = Y THEN
                 INSERT INTO or DELETE FROM
            END IF;
            

            Seems pretty simple to me.

            The most complex ones are like this :

            IF NEW.var1 = X THEN
               SET @x = (SELECT count(*) FROM other_table WHERE column1 = NEW.xx AND column2 = yy)
               IF @x > 0 THEN
                      INSERT INTO agg_table VALUES (NEW.id, @x)
               ELSE
                      DELETE FROM agg_table 
               END IF;
            END IF;
            

            So no subqueries, only @var usage to store count aggregate.

            Fuyar Farid UYAR added a comment - Okay I show your comment on the other issue. I dont really have SELECT subqueries like in your example. Most of my triggers are like this : IF NEW.var1 = X AND NEW.var2 = Y THEN INSERT INTO or DELETE FROM END IF; Seems pretty simple to me. The most complex ones are like this : IF NEW.var1 = X THEN SET @x = (SELECT count(*) FROM other_table WHERE column1 = NEW.xx AND column2 = yy) IF @x > 0 THEN INSERT INTO agg_table VALUES (NEW.id, @x) ELSE DELETE FROM agg_table END IF; END IF; So no subqueries, only @var usage to store count aggregate.
            elenst Elena Stepanova made changes -
            Labels need_feedback

            Are other_table and agg_table here actual tables, or is any of them a view?
            And in the first trigger, what does DELETE have in WHERE clause, and what kind of INSERT is it – INSERT .. VALUES or INSERT ... SELECT?

            elenst Elena Stepanova added a comment - Are other_table and agg_table here actual tables, or is any of them a view? And in the first trigger, what does DELETE have in WHERE clause, and what kind of INSERT is it – INSERT .. VALUES or INSERT ... SELECT ?
            elenst Elena Stepanova made changes -
            Labels need_feedback

            Fuyar,

            Did you have a chance to upgrade to 10.2.7 and if you did, did it resolve the problem?

            elenst Elena Stepanova added a comment - Fuyar , Did you have a chance to upgrade to 10.2.7 and if you did, did it resolve the problem?
            amramsay Andrew Ramsay added a comment -

            I nearly logged a new fault, but the missing checkpoint message in the log says the problem is the same, even if the cause is not.
            I believe 10.2.7 also has [the] problem, although possibly not to the same degree:

            2017-07-29 11:58:59 4208 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
            2017-07-29 11:58:59 4208 [Note] InnoDB: Uses event mutexes
            2017-07-29 11:58:59 4208 [Note] InnoDB: Compressed tables use zlib 1.2.3
            2017-07-29 11:58:59 4208 [Note] InnoDB: Number of pools: 1
            2017-07-29 11:58:59 4208 [Note] InnoDB: Using generic crc32 instructions
            2017-07-29 11:58:59 4208 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M
            2017-07-29 11:59:00 4208 [Note] InnoDB: Completed initialization of buffer pool
            2017-07-29 11:59:00 4208 [Note] InnoDB: Highest supported file format is Barracuda.
            2017-07-29 11:59:00 4208 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1070287781678
            2017-07-29 11:59:00 4208 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 1070287781678 between the checkpoint 1070287781678 and the end 1070287807736.
            2017-07-29 11:59:00 4208 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
            2017-07-29 11:59:00 4208 [Note] InnoDB: Starting shutdown...
            2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' init function returned error.
            2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2017-07-29 11:59:01 4208 [Note] Plugin 'FEEDBACK' is disabled.
            2017-07-29 11:59:01 4208 [ERROR] Unknown/unsupported storage engine: InnoDB
            2017-07-29 11:59:01 4208 [ERROR] Aborting

            happened for the first time two days ago, although seems more stable than 10.2.6. This happened after I had taken the server up and down a few times in fairly quick succession due to rebooting win7x64 after multiple software upgrades (this could be a red herring though!). server refused to start until innodb_force_recovery=6.

            amramsay Andrew Ramsay added a comment - I nearly logged a new fault, but the missing checkpoint message in the log says the problem is the same, even if the cause is not. I believe 10.2.7 also has [the] problem, although possibly not to the same degree: 2017-07-29 11:58:59 4208 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions 2017-07-29 11:58:59 4208 [Note] InnoDB: Uses event mutexes 2017-07-29 11:58:59 4208 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-07-29 11:58:59 4208 [Note] InnoDB: Number of pools: 1 2017-07-29 11:58:59 4208 [Note] InnoDB: Using generic crc32 instructions 2017-07-29 11:58:59 4208 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M 2017-07-29 11:59:00 4208 [Note] InnoDB: Completed initialization of buffer pool 2017-07-29 11:59:00 4208 [Note] InnoDB: Highest supported file format is Barracuda. 2017-07-29 11:59:00 4208 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1070287781678 2017-07-29 11:59:00 4208 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 1070287781678 between the checkpoint 1070287781678 and the end 1070287807736. 2017-07-29 11:59:00 4208 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2017-07-29 11:59:00 4208 [Note] InnoDB: Starting shutdown... 2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' init function returned error. 2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2017-07-29 11:59:01 4208 [Note] Plugin 'FEEDBACK' is disabled. 2017-07-29 11:59:01 4208 [ERROR] Unknown/unsupported storage engine: InnoDB 2017-07-29 11:59:01 4208 [ERROR] Aborting happened for the first time two days ago, although seems more stable than 10.2.6. This happened after I had taken the server up and down a few times in fairly quick succession due to rebooting win7x64 after multiple software upgrades (this could be a red herring though!). server refused to start until innodb_force_recovery=6.

            It appears that there are three problems (either related to each other or not) reported in this issue:

            there's a big memory leak

            the server regularly hangs

            innodb crash recovery could not start because MLOG_CHECKPOINT was not found

            amramsay, as I understand, you have had the third kind. Have you experienced the other two?

            elenst Elena Stepanova added a comment - It appears that there are three problems (either related to each other or not) reported in this issue: there's a big memory leak the server regularly hangs innodb crash recovery could not start because MLOG_CHECKPOINT was not found amramsay , as I understand, you have had the third kind. Have you experienced the other two?
            amramsay Andrew Ramsay added a comment -

            1. To my knowledge I have not experienced the memory leak, I have been watching memory usage quite closely, during my current processing - it has remained static at 18.5GB.
            2.I am not aware of the server hanging, although on occasion it has run a bit slow, but I put that down to the amount of data being processed/imported - I have been moving fairly substantial amounts of data (for me anyway) over the last few weeks (~50GB binlogs per 24hour period) - repeated db imports, all has run smoothly.
            3. as you said - yes, the checkpoint issue, once.

            amramsay Andrew Ramsay added a comment - 1. To my knowledge I have not experienced the memory leak, I have been watching memory usage quite closely, during my current processing - it has remained static at 18.5GB. 2.I am not aware of the server hanging, although on occasion it has run a bit slow, but I put that down to the amount of data being processed/imported - I have been moving fairly substantial amounts of data (for me anyway) over the last few weeks (~50GB binlogs per 24hour period) - repeated db imports, all has run smoothly. 3. as you said - yes, the checkpoint issue, once.

            amramsay, thanks.

            In this case, let's treat this issue as being about the missing MLOG_CHECKPOINT (as the subject suggests, anyway), and consider other parts of it possibly irrelevant.

            jplindst, any thoughts about the checkpoint problem?

            elenst Elena Stepanova added a comment - amramsay , thanks. In this case, let's treat this issue as being about the missing MLOG_CHECKPOINT (as the subject suggests, anyway), and consider other parts of it possibly irrelevant. jplindst , any thoughts about the checkpoint problem?
            elenst Elena Stepanova made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Assignee Jan Lindström [ jplindst ]
            Labels need_feedback
            amramsay Andrew Ramsay added a comment -

            For information - I note that MLOG_CHECKPOINT issue is still present in 10.2.8. I installed 10.2.8 on 18-Aug, crash occurred on 20-Aug, minimal db activity between upgrade and crash recovery:

            – restart after crash, but no encryption key present –
            2017-08-20 20:43:38 6472 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
            2017-08-20 20:43:38 6472 [Note] InnoDB: Uses event mutexes
            2017-08-20 20:43:38 6472 [Note] InnoDB: Compressed tables use zlib 1.2.3
            2017-08-20 20:43:38 6472 [Note] InnoDB: Number of pools: 1
            2017-08-20 20:43:38 6472 [Note] InnoDB: Using generic crc32 instructions
            2017-08-20 20:43:38 6472 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M
            2017-08-20 20:43:39 6472 [Note] InnoDB: Completed initialization of buffer pool
            2017-08-20 20:43:39 6472 [Note] InnoDB: Highest supported file format is Barracuda.
            2017-08-20 20:43:39 6472 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618
            2017-08-20 20:43:39 6472 [ERROR] InnoDB: Incorrect log record type:90
            2017-08-20 20:43:39 6472 [Warning] InnoDB: Log scan aborted at LSN 656734850560
            2017-08-20 20:43:39 6472 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
            2017-08-20 20:43:39 6472 [Note] InnoDB: Starting shutdown...
            2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' init function returned error.
            2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2017-08-20 20:43:40 6472 [Note] Plugin 'FEEDBACK' is disabled.
            2017-08-20 20:43:40 6472 [ERROR] Unknown/unsupported storage engine: InnoDB
            2017-08-20 20:43:40 6472 [ERROR] Aborting

            – restart, now with encryption key present –
            2017-08-20 21:15:29 4800 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
            2017-08-20 21:15:29 4800 [Note] InnoDB: Uses event mutexes
            2017-08-20 21:15:29 4800 [Note] InnoDB: Compressed tables use zlib 1.2.3
            2017-08-20 21:15:29 4800 [Note] InnoDB: Number of pools: 1
            2017-08-20 21:15:29 4800 [Note] InnoDB: Using generic crc32 instructions
            2017-08-20 21:15:29 4800 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M
            2017-08-20 21:15:30 4800 [Note] InnoDB: Completed initialization of buffer pool
            2017-08-20 21:15:30 4800 [Note] InnoDB: Highest supported file format is Barracuda.
            2017-08-20 21:15:30 4800 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618
            2017-08-20 21:15:30 4800 [ERROR] InnoDB: Incorrect log record type:90
            2017-08-20 21:15:30 4800 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 656734850618 between the checkpoint 656734850618 and the end 656734861665.
            2017-08-20 21:15:30 4800 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
            2017-08-20 21:15:30 4800 [Note] InnoDB: Starting shutdown...
            2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' init function returned error.
            2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2017-08-20 21:15:31 4800 [Note] Plugin 'FEEDBACK' is disabled.
            2017-08-20 21:15:31 4800 [ERROR] Unknown/unsupported storage engine: InnoDB
            2017-08-20 21:15:31 4800 [ERROR] Aborting

            ...I had to go to recovery level 6 to get database online for dumping.
            I noted that the log message was slightly different on the first pass, presumably because data could not be decrypted for interpretation, due to lack of key (my fault)?

            amramsay Andrew Ramsay added a comment - For information - I note that MLOG_CHECKPOINT issue is still present in 10.2.8. I installed 10.2.8 on 18-Aug, crash occurred on 20-Aug, minimal db activity between upgrade and crash recovery: – restart after crash, but no encryption key present – 2017-08-20 20:43:38 6472 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions 2017-08-20 20:43:38 6472 [Note] InnoDB: Uses event mutexes 2017-08-20 20:43:38 6472 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-08-20 20:43:38 6472 [Note] InnoDB: Number of pools: 1 2017-08-20 20:43:38 6472 [Note] InnoDB: Using generic crc32 instructions 2017-08-20 20:43:38 6472 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M 2017-08-20 20:43:39 6472 [Note] InnoDB: Completed initialization of buffer pool 2017-08-20 20:43:39 6472 [Note] InnoDB: Highest supported file format is Barracuda. 2017-08-20 20:43:39 6472 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618 2017-08-20 20:43:39 6472 [ERROR] InnoDB: Incorrect log record type:90 2017-08-20 20:43:39 6472 [Warning] InnoDB: Log scan aborted at LSN 656734850560 2017-08-20 20:43:39 6472 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2017-08-20 20:43:39 6472 [Note] InnoDB: Starting shutdown... 2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' init function returned error. 2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2017-08-20 20:43:40 6472 [Note] Plugin 'FEEDBACK' is disabled. 2017-08-20 20:43:40 6472 [ERROR] Unknown/unsupported storage engine: InnoDB 2017-08-20 20:43:40 6472 [ERROR] Aborting – restart, now with encryption key present – 2017-08-20 21:15:29 4800 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions 2017-08-20 21:15:29 4800 [Note] InnoDB: Uses event mutexes 2017-08-20 21:15:29 4800 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-08-20 21:15:29 4800 [Note] InnoDB: Number of pools: 1 2017-08-20 21:15:29 4800 [Note] InnoDB: Using generic crc32 instructions 2017-08-20 21:15:29 4800 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M 2017-08-20 21:15:30 4800 [Note] InnoDB: Completed initialization of buffer pool 2017-08-20 21:15:30 4800 [Note] InnoDB: Highest supported file format is Barracuda. 2017-08-20 21:15:30 4800 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618 2017-08-20 21:15:30 4800 [ERROR] InnoDB: Incorrect log record type:90 2017-08-20 21:15:30 4800 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 656734850618 between the checkpoint 656734850618 and the end 656734861665. 2017-08-20 21:15:30 4800 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2017-08-20 21:15:30 4800 [Note] InnoDB: Starting shutdown... 2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' init function returned error. 2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2017-08-20 21:15:31 4800 [Note] Plugin 'FEEDBACK' is disabled. 2017-08-20 21:15:31 4800 [ERROR] Unknown/unsupported storage engine: InnoDB 2017-08-20 21:15:31 4800 [ERROR] Aborting ...I had to go to recovery level 6 to get database online for dumping. I noted that the log message was slightly different on the first pass, presumably because data could not be decrypted for interpretation, due to lack of key (my fault)?
            elenst Elena Stepanova made changes -
            Fix Version/s 10.2 [ 14601 ]

            http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/14336

            innodb.undo_truncate_recover '16k,2,innodb' w1 [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2018-09-21 04:45:22
            line
            2018-09-21  4:45:20 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 24653138 between the checkpoint 23916606 and the end 24653138.
            2018-09-21  4:45:20 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2012] with error Generic error
            2018-09-21  4:45:21 0 [ERROR] Plugin 'InnoDB' init function returned error.
            2018-09-21  4:45:21 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            ^ Found warnings in /mnt/buildbot/build/mariadb-10.3.10/mysql-test/var/1/log/mysqld.1.err
            ok
            

            elenst Elena Stepanova added a comment - http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/14336 innodb.undo_truncate_recover '16k,2,innodb' w1 [ fail ] Found warnings/errors in server log file! Test ended at 2018-09-21 04:45:22 line 2018-09-21 4:45:20 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 24653138 between the checkpoint 23916606 and the end 24653138. 2018-09-21 4:45:20 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2012] with error Generic error 2018-09-21 4:45:21 0 [ERROR] Plugin 'InnoDB' init function returned error. 2018-09-21 4:45:21 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. ^ Found warnings in /mnt/buildbot/build/mariadb-10.3.10/mysql-test/var/1/log/mysqld.1.err ok
            elenst Elena Stepanova made changes -
            Component/s Tests [ 10800 ]
            Fix Version/s 10.3 [ 22126 ]
            Affects Version/s 10.3 [ 22126 ]
            danblack Daniel Black added a comment - - edited

             docker logs a80641d42796
            MySQL init password for existing database...
            MySQL init process in progress...
            2018-10-15 13:12:54 70366838301360 [Note] mysqld (mysqld 10.2.17-MariaDB-1:10.2.17+maria~bionic) starting as process 43 ...
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Uses event mutexes
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Compressed tables use zlib 1.2.11
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using Linux native AIO
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Number of pools: 1
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using POWER8 crc32 instructions
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Completed initialization of buffer pool
            2018-10-15 13:12:54 70366127255728 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Highest supported file format is Barracuda.
            2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4101625
            2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 4101625 between the checkpoint 4101305 and the end 4101632.
            2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
            2018-10-15 13:12:55 70366838301360 [Note] InnoDB: Starting shutdown...
            2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' init function returned error.
            2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2018-10-15 13:12:55 70366838301360 [Note] Plugin 'FEEDBACK' is disabled.
            2018-10-15 13:12:55 70366838301360 [ERROR] Unknown/unsupported storage engine: InnoDB
            2018-10-15 13:12:55 70366838301360 [ERROR] Aborting
            

            update:

            uploaded files to ftp.mariadb.org:

            • MDEV-13080-docker-export-ppc64le.tar.xz - the result of the docker export of the container. Will be exceptionally close to the docker mariadb official mariadb:10.2.18-bionic - basic configuration.
            • MDEV-13080-datadir.tar.xz - datadir (unrecovered)
            danblack Daniel Black added a comment - - edited docker logs a80641d42796 MySQL init password for existing database... MySQL init process in progress... 2018-10-15 13:12:54 70366838301360 [Note] mysqld (mysqld 10.2.17-MariaDB-1:10.2.17+maria~bionic) starting as process 43 ... 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Uses event mutexes 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Compressed tables use zlib 1.2.11 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using Linux native AIO 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Number of pools: 1 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using POWER8 crc32 instructions 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Completed initialization of buffer pool 2018-10-15 13:12:54 70366127255728 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Highest supported file format is Barracuda. 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4101625 2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 4101625 between the checkpoint 4101305 and the end 4101632. 2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2018-10-15 13:12:55 70366838301360 [Note] InnoDB: Starting shutdown... 2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' init function returned error. 2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2018-10-15 13:12:55 70366838301360 [Note] Plugin 'FEEDBACK' is disabled. 2018-10-15 13:12:55 70366838301360 [ERROR] Unknown/unsupported storage engine: InnoDB 2018-10-15 13:12:55 70366838301360 [ERROR] Aborting update: uploaded files to ftp.mariadb.org: MDEV-13080 -docker-export-ppc64le.tar.xz - the result of the docker export of the container. Will be exceptionally close to the docker mariadb official mariadb:10.2.18-bionic - basic configuration. MDEV-13080 -datadir.tar.xz - datadir (unrecovered)
            danblack Daniel Black made changes -
            Affects Version/s 10.2.17 [ 23111 ]
            elenst Elena Stepanova made changes -
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Jan Lindström [ jplindst ] Thirunarayanan Balathandayuthapani [ thiru ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            MDEV-13830 and MDEV-15282 could share the root cause with this.

            marko Marko Mäkelä added a comment - MDEV-13830 and MDEV-15282 could share the root cause with this.

            MLOG_CHECKPOINT is present in the redo log partially. MLOG_CHECKPOINT record is 9 bytes. 1 byte for MLOG_CHECKPOINT marker and 8 bytes for CHECKPOINT LSN.

            MLOG_CHECKPOINT record is present partially (only 3 bytes) at the end of log block and next block is empty.

            Scenario could be that checkpoint info is written before MLOG_CHECKPOINT record is written in the next block and server could be killed
            before writing the block.

            Yet to write mtr test case for it and read the code too

            thiru Thirunarayanan Balathandayuthapani added a comment - MLOG_CHECKPOINT is present in the redo log partially. MLOG_CHECKPOINT record is 9 bytes. 1 byte for MLOG_CHECKPOINT marker and 8 bytes for CHECKPOINT LSN. MLOG_CHECKPOINT record is present partially (only 3 bytes) at the end of log block and next block is empty. Scenario could be that checkpoint info is written before MLOG_CHECKPOINT record is written in the next block and server could be killed before writing the block. Yet to write mtr test case for it and read the code too

            I tried to review the code, and I did not find anything obviously wrong, except potential race conditions in log_write_up_to(), for example here:

            	if (flush_to_disk
            	    && (log_sys->n_pending_flushes > 0
            		|| !os_event_is_set(log_sys->flush_event))) {
            		/* Figure out if the current flush will do the job
            		for us. */
            		bool work_done = log_sys->current_flush_lsn >= lsn;
             
            		log_write_mutex_exit();
             
            		os_event_wait(log_sys->flush_event);
             
            		if (work_done) {
            			return;
            

            How can we be sure that the desired work was actually completed here? Could there have been multiple pending writes, and could the event have been signaled for an earlier write, for something older than log_sys->current_flush_lsn? Perhaps we should check log_sys->flushed_to_disk_lsn before returning.

            I tried to reproduce the failure, but in the end I only reproduced (and subsequently developed a fix for) MDEV-17849.

            marko Marko Mäkelä added a comment - I tried to review the code, and I did not find anything obviously wrong, except potential race conditions in log_write_up_to() , for example here: if (flush_to_disk && (log_sys->n_pending_flushes > 0 || !os_event_is_set(log_sys->flush_event))) { /* Figure out if the current flush will do the job for us. */ bool work_done = log_sys->current_flush_lsn >= lsn;   log_write_mutex_exit();   os_event_wait(log_sys->flush_event);   if (work_done) { return ; How can we be sure that the desired work was actually completed here? Could there have been multiple pending writes, and could the event have been signaled for an earlier write, for something older than log_sys->current_flush_lsn ? Perhaps we should check log_sys->flushed_to_disk_lsn before returning. I tried to reproduce the failure, but in the end I only reproduced (and subsequently developed a fix for) MDEV-17849 .
            kevg Eugene Kosov (Inactive) made changes -
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Eugene Kosov [ kevg ]

            To reproduce this bug I use mtr -mem -force -max-test-fail=9999 -suite=innodb -par=5 innodb.undo_truncate_recover{,,,} -repeat=100 --. Sometimes also https://jira.mariadb.org/browse/MDEV-18038 happens.

            Both failures are rare.

            log_buffer_extend() is called in this test. When I disable it with --innodb-log-buffer-size=100000000 I see no errors at all. But I'm unsure because it's not easy to reproduce. Anyway this is my current guess that it's data races during redo log extending.

            kevg Eugene Kosov (Inactive) added a comment - To reproduce this bug I use mtr -mem -force -max-test-fail=9999 -suite=innodb -par=5 innodb.undo_truncate_recover{,,,} -repeat=100 -- . Sometimes also https://jira.mariadb.org/browse/MDEV-18038 happens. Both failures are rare. log_buffer_extend() is called in this test. When I disable it with --innodb-log-buffer-size=100000000 I see no errors at all. But I'm unsure because it's not easy to reproduce. Anyway this is my current guess that it's data races during redo log extending.

            I put write_always = true; at the very beginning of log_checkpoint(). This makes checkpoints flushed more frequent. And both failures again disappear. Maybe the last checkpoint sometimes is not committed and this is the source of problems?

            kevg Eugene Kosov (Inactive) added a comment - I put write_always = true; at the very beginning of log_checkpoint() . This makes checkpoints flushed more frequent. And both failures again disappear. Maybe the last checkpoint sometimes is not committed and this is the source of problems?

            kevg, that is interesting. Could it be that we sometimes update last_checkpoint_lsn before the corresponding write to the checkpoint header becomes persistent? Could it be that fil_aio_wait() is invoked on the redo log pseudo-tablespace for something else than the latest pending checkpoint write? After all, we do allow multiple log_checkpoint() executions in parallel, interleaved with each other.

            Could you add an assertion to log_complete_checkpoint() or its call path, to verify that the completed write was for the expected checkpoint?

            Note: The redo log is not being extended while the server is running. Only the redo log write buffer is extended during this in 10.2 and 10.3. In 10.4, less redo log is being written thanks to the MDEV-17138 MLOG_MEMSET record.

            marko Marko Mäkelä added a comment - kevg , that is interesting. Could it be that we sometimes update last_checkpoint_lsn before the corresponding write to the checkpoint header becomes persistent? Could it be that fil_aio_wait() is invoked on the redo log pseudo-tablespace for something else than the latest pending checkpoint write? After all, we do allow multiple log_checkpoint() executions in parallel, interleaved with each other. Could you add an assertion to log_complete_checkpoint() or its call path, to verify that the completed write was for the expected checkpoint? Note: The redo log is not being extended while the server is running. Only the redo log write buffer is extended during this in 10.2 and 10.3. In 10.4, less redo log is being written thanks to the MDEV-17138 MLOG_MEMSET record.

            My assumption about `write_always = true` was incorrect at least for MDEV-18038 which sometimes happen.

            kevg Eugene Kosov (Inactive) added a comment - My assumption about `write_always = true` was incorrect at least for MDEV-18038 which sometimes happen.
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä added a comment - - edited

            I implemented debug instrumentation that should make bugs in crash recovery and mariabackup --prepare more repeatable.

            Usage:

            ./mtr --mysqld=--debug=d,ib_log_checkpoint_avoid …
            

            marko Marko Mäkelä added a comment - - edited I implemented debug instrumentation that should make bugs in crash recovery and mariabackup --prepare more repeatable. Usage: ./mtr --mysqld=--debug=d,ib_log_checkpoint_avoid …
            marko Marko Mäkelä made changes -
            kevg Eugene Kosov (Inactive) added a comment - - edited

            The bug is inside log_buffer_extend(). MDEV-18038 too.

            This patch makes log_buffer_extend() run on every mtr.commit().

            diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h
            index 16399edad50..dd549baddce 100644
            --- a/storage/innobase/include/log0log.h
            +++ b/storage/innobase/include/log0log.h
            @@ -81,7 +81,7 @@ log_free_check(void);
             
             /** Extends the log buffer.
             @param[in]     len     requested minimum size in bytes */
            -void log_buffer_extend(ulong len);
            +void log_buffer_extend();
             
             /** Check margin not to overwrite transaction log from the last checkpoint.
             If would estimate the log write to exceed the log_group_capacity,
            diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
            index 409b42c452a..4ffa906f86b 100644
            --- a/storage/innobase/log/log0log.cc
            +++ b/storage/innobase/log/log0log.cc
            @@ -159,8 +159,10 @@ log_buf_pool_get_oldest_modification(void)
             
             /** Extends the log buffer.
             @param[in]     len     requested minimum size in bytes */
            -void log_buffer_extend(ulong len)
            +void log_buffer_extend()
             {
            +       size_t len = srv_log_buffer_size + 1;
            +
                    byte    tmp_buf[OS_FILE_LOG_BLOCK_SIZE];
             
                    log_mutex_enter_all();
            diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
            index 92b1aa38a81..d4950ffba05 100644
            --- a/storage/innobase/mtr/mtr0mtr.cc
            +++ b/storage/innobase/mtr/mtr0mtr.cc
            @@ -764,9 +764,7 @@ mtr_t::Command::prepare_write()
                    ut_ad(len > 0);
                    ut_ad(n_recs > 0);
             
            -       if (len > srv_log_buffer_size / 2) {
            -               log_buffer_extend(ulong((len + 1) * 2));
            -       }
            +       log_buffer_extend();
             
                    ut_ad(m_impl->m_n_log_recs == n_recs);
            

            And with that patch I always see MDEV-13080 or MDEV-18038 in 10.3 with this simple test:

            create or replace table t (a int) engine=innodb;
            insert into t values (1);
            drop table t;
            

            kevg Eugene Kosov (Inactive) added a comment - - edited The bug is inside log_buffer_extend() . MDEV-18038 too. This patch makes log_buffer_extend() run on every mtr.commit() . diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h index 16399edad50..dd549baddce 100644 --- a/storage/innobase/include/log0log.h +++ b/storage/innobase/include/log0log.h @@ -81,7 +81,7 @@ log_free_check(void); /** Extends the log buffer. @param[in] len requested minimum size in bytes */ -void log_buffer_extend(ulong len); +void log_buffer_extend(); /** Check margin not to overwrite transaction log from the last checkpoint. If would estimate the log write to exceed the log_group_capacity, diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc index 409b42c452a..4ffa906f86b 100644 --- a/storage/innobase/log/log0log.cc +++ b/storage/innobase/log/log0log.cc @@ -159,8 +159,10 @@ log_buf_pool_get_oldest_modification(void) /** Extends the log buffer. @param[in] len requested minimum size in bytes */ -void log_buffer_extend(ulong len) +void log_buffer_extend() { + size_t len = srv_log_buffer_size + 1; + byte tmp_buf[OS_FILE_LOG_BLOCK_SIZE]; log_mutex_enter_all(); diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc index 92b1aa38a81..d4950ffba05 100644 --- a/storage/innobase/mtr/mtr0mtr.cc +++ b/storage/innobase/mtr/mtr0mtr.cc @@ -764,9 +764,7 @@ mtr_t::Command::prepare_write() ut_ad(len > 0); ut_ad(n_recs > 0); - if (len > srv_log_buffer_size / 2) { - log_buffer_extend(ulong((len + 1) * 2)); - } + log_buffer_extend(); ut_ad(m_impl->m_n_log_recs == n_recs); And with that patch I always see MDEV-13080 or MDEV-18038 in 10.3 with this simple test: create or replace table t (a int ) engine=innodb; insert into t values (1); drop table t;
            kevg Eugene Kosov (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            We will keep the log_buffer_extend() but remove the log file write from it.

            marko Marko Mäkelä added a comment - We will keep the log_buffer_extend() but remove the log file write from it.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2019-05-15 12:34:11.0 2019-05-15 12:34:11.213
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.25 [ 23408 ]
            Fix Version/s 10.3.16 [ 23410 ]
            Fix Version/s 10.4.5 [ 23311 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            Resolution Fixed [ 1 ]
            Status In Review [ 10002 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 81222 ] MariaDB v4 [ 152322 ]

            People

              kevg Eugene Kosov (Inactive)
              Fuyar Farid UYAR
              Votes:
              2 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.