MariaDB Server
MDEV-20605

Awakened transaction can miss records inserted by another transaction due to incorrect persistent cursor restoration

Details

    Description

      Some users are seeing duplicate key errors like the following when using the optimistic and aggressive modes of parallel replication:

      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 172.30.0.154
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000055
                Read_Master_Log_Pos: 593262418
                     Relay_Log_File: ip-172-30-0-228-relay-bin.000002
                      Relay_Log_Pos: 33725828
              Relay_Master_Log_File: mariadb-bin.000039
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 1062
                         Last_Error: Error 'Duplicate entry '...' for key 'name'' on query. Default database: 'sample3'. Query: 'INSERT INTO ...'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 705697132
                    Relay_Log_Space: 17101184558
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error:
                     Last_SQL_Errno: 1062
                     Last_SQL_Error: Error 'Duplicate entry '...' for key 'name'' on query. Default database: 'sample3'. Query: 'INSERT INTO ...'
        Replicate_Ignore_Server_Ids:
                   Master_Server_Id: 1
                     Master_SSL_Crl:
                 Master_SSL_Crlpath:
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 0-1-42163518
            Replicate_Do_Domain_Ids:
        Replicate_Ignore_Domain_Ids:
                      Parallel_Mode: aggressive
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State:
      1 row in set (0.00 sec)
      

      However, when the problem is analyzed, it is clear that the duplicate key should not occur.

      The configuration:

      [mariadb]
      server_id=2
      log_bin=mariadb-bin
      log_slave_updates
      binlog_format=MIXED
      slave_parallel_threads=16
      slave_parallel_max_queued=1048576
      slave_parallel_mode=aggressive
      

      The configuration with slave_parallel_mode=optimistic was identical.

      Some observations:

      • When the table is queried on the slave, the row does indeed exist, so it is clear that the duplicate key error is real.
      • The SQL script contains an UPDATE and a DELETE before the INSERT, so the row should have been deleted, which should have prevented the duplicate key error.
      • The row still exists, so it seems that the DELETE statement had no effect.
      • The values in the existing row in the table are the old non-updated values, so it seems that the UPDATE statement also had no effect.
      • The slave's binary log contains both the UPDATE and DELETE statements, which should indicate that the statements were successfully executed on the slave.
      • I haven't yet been able to reproduce the problem when the master has binlog_format=ROW set, so row-based replication may be immune to the problem.
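
      A hypothetical statement sequence matching the observations above (all table and column names are illustrative; the actual statements were elided in the report, which only shows a unique key named 'name'):

      ```sql
      -- Illustrative sketch only, not the reporter's schema.
      CREATE TABLE t (id INT PRIMARY KEY, name VARCHAR(64), UNIQUE KEY name (name)) ENGINE=InnoDB;
      INSERT INTO t VALUES (1, 'a');

      UPDATE t SET name = 'b' WHERE id = 1;  -- present in the slave's binlog
      DELETE FROM t WHERE id = 1;            -- present in the slave's binlog
      INSERT INTO t VALUES (2, 'a');         -- on an affected slave: ERROR 1062, duplicate entry 'a' for key 'name'
      ```

      On a healthy slave the final INSERT succeeds, since the conflicting row was renamed and then deleted; the bug manifests as the UPDATE and DELETE silently having no effect, leaving the old row in place.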

      Attachments

        1. MDEV-20605.patch
          8 kB
        2. thiru_20605.yy
          0.9 kB
        3. MDEV-20605-restore-cursor.test
          3 kB
        4. MDEV-20605-deadlock.test
          2 kB
        5. MDEV-20605-restore-cursor-v2.test
          3 kB
        6. MDEV-20605-locking.test
          3 kB
        7. MDEV-20605-mysql.test
          3 kB


          Activity

            GeoffMontee Geoff Montee (Inactive) created issue -
            marko Marko Mäkelä added a comment -

            I debugged this in Elkin’s environment today and concluded that InnoDB is to blame.

            The InnoDB purge thread seems to be wrongly converting a waiting gap lock request that is attached to a purgeable record, to a granted gap lock request on the page supremum.

            I had suspected that the gap-lock widening done by the purge thread could cause problems, but I mostly had non-waiting scenarios in mind and did not come up with a specific example. The gap locking rules are summarized in MDEV-16406.

            MDEV-20605.patch introduces an assertion into the current MariaDB 10.4, which should make this bug easier to catch. I hope that mleich can reproduce the bug with RQG and use the RQG simplifier to produce a simple grammar.

            The test case that I debugged was running on a replication slave, and it would not fail every time. It is much more convenient to test and fix this without involving replication.
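
            The conversion described above can be illustrated with a toy model. This is NOT InnoDB code; the struct, function names, and bit values below are invented for illustration (the real logic lives in storage/innobase/lock/lock0lock.cc):

            ```c
            /* Toy model of lock inheritance during purge -- illustrative only. */
            #include <stdio.h>

            #define LOCK_GAP  0x200  /* arbitrary bit values for this sketch */
            #define LOCK_WAIT 0x100

            typedef struct {
                unsigned heap_no;    /* record the lock is attached to */
                unsigned type_mode;  /* mode bits, including LOCK_WAIT */
            } toy_lock;

            /* When purge removes a record, locks on it are moved to the next
               record (here: the page supremum).  The reported misbehaviour
               corresponds to dropping the LOCK_WAIT bit during this move,
               i.e. granting a lock that was still only requested. */
            static void move_lock_on_purge(toy_lock *lock,
                                           unsigned supremum_heap_no,
                                           int buggy)
            {
                lock->heap_no = supremum_heap_no;
                lock->type_mode |= LOCK_GAP;
                if (buggy)
                    lock->type_mode &= ~LOCK_WAIT;  /* wrongly treat as granted */
            }

            int main(void)
            {
                toy_lock ok  = { 5, LOCK_GAP | LOCK_WAIT };
                toy_lock bad = { 5, LOCK_GAP | LOCK_WAIT };

                move_lock_on_purge(&ok,  1 /* supremum */, 0);
                move_lock_on_purge(&bad, 1 /* supremum */, 1);

                printf("correct move keeps wait bit: %s\n",
                       (ok.type_mode & LOCK_WAIT) ? "yes" : "no");
                printf("buggy move keeps wait bit: %s\n",
                       (bad.type_mode & LOCK_WAIT) ? "yes" : "no");
                return 0;
            }
            ```

            In the buggy path the waiting transaction suddenly holds a granted gap lock on the supremum, matching the assertion `!(lock->type_mode & LOCK_WAIT)` that the patch adds.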

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Assignee Andrei Elkin [ elkin ] Matthias Leich [ mleich ]
            mleich Matthias Leich added a comment - - edited

            10.4 b05be3ef8c8668ddbcbe8e1c08dcd4fcc88eb4cf 2019-10-11 + the MDEV-20605.patch mentioned above compiled with debug.
            I frequently get the following assertion failure:
            Version: '10.4.9-MariaDB-debug-log'  socket: ...
            2019-10-11 17:01:20 0x7f8100ff9700  InnoDB: Assertion failure in file /home/mleich/work/10.4/storage/innobase/lock/lock0lock.cc line 2447
            InnoDB: Failing assertion: !interesting || !(lock->type_mode & LOCK_WAIT)
            ...
            Query (0x0): 
            Connection ID (thread ID): 3
            Status: NOT_KILLED
            ...
            #3  <signal handler called>
            #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
            #5  0x00007f815f52437a in __GI_abort () at abort.c:89
            #6  0x00005628d0b2022a in ut_dbg_assertion_failed (expr=0x5628d12ae780 "!interesting || !(lock->type_mode & LOCK_WAIT)", file=0x5628d12ad928 "storage/innobase/lock/lock0lock.cc", line=2447) at storage/innobase/ut/ut0dbg.cc:60
            #7  0x00005628d093fd94 in lock_rec_inherit_to_gap (heir_block=0x7f813c02e500, block=0x7f813c02e500, heir_heap_no=1, heap_no=2, interesting=true) at storage/innobase/lock/lock0lock.cc:2447
            #8  0x00005628d0942b67 in lock_update_delete (block=0x7f813c02e500, rec=0x7f813c97807f "\200", interesting=true) at storage/innobase/lock/lock0lock.cc:3393
            #9  0x00005628d0b69241 in btr_cur_optimistic_delete (cursor=0x5628d46e5480, flags=0, mtr=0x7f8100ff82d0, interesting=true) at storage/innobase/btr/btr0cur.cc:5721
            #10 0x00005628d0a66ceb in row_purge_remove_clust_if_poss_low (node=0x5628d46e53d8, mode=2) at storage/innobase/row/row0purge.cc:143
            #11 0x00005628d0a66f5b in row_purge_remove_clust_if_poss (node=0x5628d46e53d8) at storage/innobase/row/row0purge.cc:194
            #12 0x00005628d0a68aaf in row_purge_del_mark (node=0x5628d46e53d8) at storage/innobase/row/row0purge.cc:784
            #13 0x00005628d0a6a67c in row_purge_record_func (node=0x5628d46e53d8, undo_rec=0x5628d46e5988 "", thr=0x5628d46e5208, updated_extern=false) at storage/innobase/row/row0purge.cc:1195
            #14 0x00005628d0a6a989 in row_purge (node=0x5628d46e53d8, undo_rec=0x5628d46e5988 "", thr=0x5628d46e5208) at storage/innobase/row/row0purge.cc:1262
            #15 0x00005628d0a6abe6 in row_purge_step (thr=0x5628d46e5208) at storage/innobase/row/row0purge.cc:1321
            #16 0x00005628d09d55c9 in que_thr_step (thr=0x5628d46e5208) at storage/innobase/que/que0que.cc:1037
            #17 0x00005628d09d5860 in que_run_threads_low (thr=0x5628d46e5208) at storage/innobase/que/que0que.cc:1099
            #18 0x00005628d09d5ab0 in que_run_threads (thr=0x5628d46e5208) at storage/innobase/que/que0que.cc:1139
            #19 0x00005628d0ab5647 in srv_task_execute () at storage/innobase/srv/srv0srv.cc:2457
            #20 0x00005628d0ab5870 in srv_worker_thread (arg=0x0) at storage/innobase/srv/srv0srv.cc:2505
            #21 0x00007f81603676da in start_thread (arg=0x7f8100ff9700) at pthread_create.c:456
             
            The RQG testing scenario:
            Session 1 at begin:
                CREATE TABLE t13 ( pk INTEGER AUTO_INCREMENT, col1 VARCHAR(255), PRIMARY KEY (pk)) ENGINE=innodb ;
                INSERT INTO t13 VALUES  (NULL, 'yourself') ;
            After that, sessions 1 through 10 run again and again only
               DELETE FROM t13 WHERE col1 <> ( 'otto' ) ORDER BY `pk` LIMIT 1 ;
            until the assertion above is hit.
            
            

            marko Marko Mäkelä made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Assignee Matthias Leich [ mleich ] Thirunarayanan Balathandayuthapani [ thiru ]
            mleich Matthias Leich made changes -
            Attachment thiru_20605.yy [ 49219 ]

            Let me give some background for understanding the problem and a possible solution.

            InnoDB implements multi-versioning by guaranteeing to keep the full history of committed transactions until there is no old read view that would see the database state as it was before those transactions were committed. In particular, records are not deleted immediately; they are merely marked for deletion. Only the ROLLBACK of an INSERT can immediately delete a record.

            So, there may be many delete-marked records in indexes. Some of them could be purgeable. There may be a significant purge lag; for example, at the instant when a very old read view is destroyed by the commit of its transaction, millions of records could become purgeable.

            InnoDB implements next-key locking by gap locks that are anchored to the record at the upper end of the gap, that is, to the successor of the record that precedes the gap. (There are two mutually exclusive types of gap locks, in MDEV-16406 called read-gap and write-gap locks. Traditionally, read-gap locks are called ‘gap locks’ and write-gap locks are ‘insert intention’ locks.)
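            The anchoring rule can be sketched as a toy model (plain Python, not InnoDB code; the function name is invented for illustration):

```python
import bisect

def gap_lock_anchor(index_keys, key):
    """Return the record a gap lock protecting `key` is anchored to:
    the smallest index key strictly greater than `key`, or the page
    supremum (modeled as None) if no such key exists."""
    pos = bisect.bisect_right(index_keys, key)
    return index_keys[pos] if pos < len(index_keys) else None

keys = [0, 5, 9, 13]              # 5 and 9 may be delete-marked records
print(gap_lock_anchor(keys, 2))   # anchored to 5: the lock covers the gap (0,5)
print(gap_lock_anchor(keys, 20))  # no successor: anchored to the page supremum
```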

            I think that at the root of the problem is the fact that the anchor records of the gap locks may be committed (and possibly purgeable) delete-marked records. This problem has been described earlier in MDEV-14589 (InnoDB should not lock a delete-marked record).

            The problem

            As proven with MDEV-20605.patch, it seems that when a waiting gap lock request is attached to a purgeable record, and another transaction holds a conflicting gap lock on the successor of the purgeable record, purge would wrongly convert the waiting gap lock to a granted lock on that successor record, even though the other transaction is holding a conflicting type of gap lock on that record.
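            A toy model of the faulty inheritance (hypothetical names, not InnoDB's lock module): every lock on the purged record is re-anchored to the successor with its wait flag cleared, ignoring conflicting locks already held there.

```python
WAITING, GRANTED = "waiting", "granted"

class Lock:
    def __init__(self, trx, anchor, mode, state):
        self.trx, self.anchor, self.mode, self.state = trx, anchor, mode, state

def buggy_inherit_to_gap(locks, purged, successor):
    """Mimic the reported bug: re-anchor locks from the purged record to
    its successor and treat them as granted, without checking conflicts."""
    for lock in locks:
        if lock.anchor == purged:
            lock.anchor = successor
            lock.state = GRANTED   # BUG: ignores conflicting locks on successor

locks = [
    Lock("trx1", anchor=9, mode="insert-intention", state=GRANTED),
    Lock("trx2", anchor=5, mode="gap", state=WAITING),  # waits behind trx1
]
buggy_inherit_to_gap(locks, purged=5, successor=9)
# trx2's request is now "granted" on record 9 although trx1 still holds a
# conflicting lock there: both transactions proceed, so trx2 can miss rows
# inserted by trx1 -- matching the duplicate key errors seen in replication.
```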

            Possible workaround: Suspend purge

            A simple ‘fix’ could be to stop the purge when it detects this situation. There are numerous problems with this, such as:

            • How would purge be signaled that it should retry?
            • What if other transactions keep arriving, attaching gap locks on the to-be-purged record? Shouldn’t these transactions be blocked, to avoid starving the purge?
            • How to ensure that purge will proceed nevertheless on shutdown?

            Possible fix: Acquire gap locks on all successive delete-marked committed records

            There seems to be another way of guaranteeing that purge will be invisible to transactions. Let us assume that the index content is as follows, and we want a gap lock that covers the non-existing key (2):

            (0),delete-marked(5),delete-marked(9),13

            Currently, we would only acquire a gap lock on the delete-marked key (5), and this gap lock would cover the keys in the open range (0,5). If purge removes the key (5), it would ‘widen’ the gap by attaching the gap lock on the delete-marked key (9). So, the gap would cover (0,9).

            What if we changed the gap lock acquisition algorithm? Instead of acquiring a gap lock only on the successor record, let us keep acquiring record+gap locks on its successors until we reach the end of the index, or a record that is not delete-marked. With this approach, we would lock the semi-open ranges (0,5],(5,9] before locking the final open range (9,13). This combined range (0,5],(5,9],(9,13) is equivalent to (0,13). Thus, we are immune to the scenario where purge would remove the delete-marked anchor records in between!
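            The range arithmetic of this proposal can be sketched as a toy model (plain Python, invented names; records are (key, delete_marked) pairs):

```python
import bisect

def extended_gap_lock_ranges(records, key):
    """Return the ranges the proposed algorithm would lock to cover `key`:
    a semi-open (lower, k] record+gap lock per committed delete-marked
    successor, then one open range ending at the first live record (or the
    end of the index, modeled as +inf)."""
    keys = [k for k, _ in records]
    pos = bisect.bisect_right(keys, key)
    lower = keys[pos - 1] if pos else float("-inf")
    ranges = []
    while pos < len(records) and records[pos][1]:       # delete-marked
        ranges.append((lower, records[pos][0], True))   # semi-open (lower, k]
        lower = records[pos][0]
        pos += 1
    upper = keys[pos] if pos < len(records) else float("inf")
    ranges.append((lower, upper, False))                # final open range
    return ranges

idx = [(0, False), (5, True), (9, True), (13, False)]
print(extended_gap_lock_ranges(idx, 2))
# (0,5],(5,9],(9,13) -- together equivalent to (0,13), so the coverage
# survives purge removing the delete-marked anchors 5 and 9.
```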

            Note: I am proposing that we acquire record locks on the delete-marked records as well. I think that we are already doing that, except that at the READ COMMITTED isolation level, the fix of MDEV-14589 skips that.

            With this extended gap locking, whenever we remove a record, we can simply remove the attached gap lock without having to create a gap lock for the successor record. If there are waiting gap lock requests on the record that we are removing, we would have to wake up the waiting transactions.
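            With the extended locking, purge-time lock handling reduces to the following toy model (illustrative names, not InnoDB code): discard locks anchored to the removed record, no inheritance, and wake any waiters so they re-evaluate.

```python
WAITING, GRANTED = "waiting", "granted"

def purge_record_locks(lock_table, heap_no, wake):
    """Drop all locks anchored to the purged record. Waiting requests are
    cancelled and their transactions woken to retry against the index."""
    remaining = []
    for trx, anchor, state in lock_table:
        if anchor != heap_no:
            remaining.append((trx, anchor, state))
        elif state == WAITING:
            wake(trx)          # waiter must re-request after the removal
    return remaining

woken = []
table = [("trx1", 5, GRANTED), ("trx2", 5, WAITING), ("trx3", 9, GRANTED)]
table = purge_record_locks(table, 5, woken.append)
print(table)   # only trx3's lock on record 9 survives
print(woken)   # trx2 is woken instead of being silently "granted"
```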

            Note on determinism

            I believe that the extended gap locking could make InnoDB gap locks fully deterministic and thus possibly help make some cases of replication deterministic. Currently, without my proposed change, depending on how much purge is lagging behind, gap locks could be ‘narrow’ or ‘wide’. Let us consider my above example again:

            create table t(k int primary key) engine=innodb;
            insert into t values (0),(5),(9),(13);
            delete from t where k=5 or k=9;
            commit;
            connection a;
            begin; select * from t where k=2 lock in share mode;
            connection b;
            begin; insert into t set k=10;
            

            With my proposed fix, both the SELECT and INSERT would lock the whole range of 0<k<13, and they would conflict with each other, leading to a lock wait timeout for one of the transactions if neither transaction is committed.

            Currently, depending on when purge removes the delete-marked committed keys k=5, k=9, both transactions could actually succeed, and never observe a lock wait timeout, no matter how long they wait. And, I believe that both would eventually be holding conflicting locks (read-gap a.k.a. ‘gap’ lock and write-gap a.k.a. ‘insert intention’ lock) on the full range 0<k<13. This feels very wrong!

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            The following test case can crash the instrumented patch, which basically checks whether purge grants the waiting lock.

            --source include/have_innodb.inc
            # Stops the purge
            connect(con_purge,localhost,root,,);
            begin;
            start transaction with consistent snapshot;
             
            connection default;
            create table t1 (f1 int not null, primary key(f1))engine=innodb;
            insert into t1 values(1);
            delete from t1;
             
            connect(con1,localhost,root,,);
            begin;
            select * from t1 where f1 > 2 for update;
            f1
             
            # Connection con2 will wait for LOCK_ORDINARY in delete marked record.
            connect(con2,localhost,root,,);
            begin;
            set debug_sync="lock_wait_suspend_thread_enter SIGNAL purge_start";
            send select * from t1 where f1 > 2 for update;
             
            connection con_purge;
            set debug_sync="now WAIT_FOR purge_start";
            # Allow purge to start and crash at the instrumented patch. (grant lock_wait for con2)
            commit;
             
            connection con2;
            reap;
            f1
            
            

            I wrote the comments in the test case as explanation. I am not convinced that the instrumented patch reproduces the actual issue.
            The above test case is the same as the following one, which doesn't involve purge at all:

             
            --source include/have_innodb.inc
            create table t1 (f1 int not null, primary key(f1))engine=innodb;
             
            connect(con1,localhost,root,,);
            begin;
            select * from t1 where f1 > 2 for update;
            f1
             
            connect(con2,localhost,root,,);
            begin;
            select * from t1 where f1 > 2 for update;
            f1
            
            

            I may need to get marko's perspective as well.

            thiru Thirunarayanan Balathandayuthapani added a comment - - edited
            marko Marko Mäkelä made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            My debug instrumentation may be too eager. I still think that my idea of acquiring multiple gap locks (and simply discarding them on purge) is correct. I must come up with an executable test case that demonstrates my line of thought.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            marko Marko Mäkelä made changes -

            I debugged thiru’s test cases (with some fixes) on 10.3, with MDEV-20605.patch ported to 10.3 (one trivial conflict).

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
            #--source include/have_sequence.inc
             
            connect(con_purge,localhost,root,,);
            start transaction with consistent snapshot;
             
            connection default;
            create table t1 (f1 int not null, primary key(f1))engine=innodb;
            #insert into t1 select -seq from seq_10000_to_1;
            insert into t1 values(1);
            delete from t1 where f1=1;
             
            connect(con1,localhost,root,,);
            begin;
            select * from t1 where f1 > 2 for update;
             
            # Connection con2 will wait for LOCK_ORDINARY in delete marked record.
            connect(con2,localhost,root,,);
            begin;
            set debug_sync="lock_wait_suspend_thread_enter SIGNAL purge_start";
            send select * from t1 where f1 > 2 for update;
             
            connection con_purge;
             
            set debug_sync="now WAIT_FOR purge_start";
            # Allow purge to start and crash at the instrumented patch. (grant lock_wait for con2)
            commit;
            connection con2;
            reap;
            disconnect con2;
            disconnect con_purge;
            disconnect con1;
            connection default;
            drop table t1;
            SET DEBUG_SYNC='RESET';
            

            Curiously, if I do not add the records f1<0 to the table, the SELECT * FROM t1 WHERE f1>2 FOR UPDATE will actually lock the delete-marked record f1=1, because the WHERE condition would not be passed down to prebuilt->search_tuple. Apparently, it will be a full table scan, via handler::ha_index_first(). This is causing some unnecessary locking.

            The SELECT should not really acquire a lock on the delete-marked record itself, but instead only acquire a special gap lock that will prevent the delete-marked key from being inserted by other transactions. That is, there should be no conflict between the two locking SELECT operations to begin with, because neither one matched any records. Both SELECT should be allowed to continue in the first place. If the extra records f1<0 are present in the table, then that will be the case: only a gap lock on the supremum record will be acquired, only blocking any future INSERT.

            I believe that implementing my idea of acquiring multiple gap locks when encountering delete-marked records should also fix this scenario. If the record was committed as delete-marked, we must acquire a read-gap-only lock on that record, and keep advancing to the next records until an uncommitted record, a committed non-delete-marked record, or the end of the index is reached. Furthermore, we must ensure that row_ins_sec_index_entry_by_modify() and row_ins_clust_index_entry_by_modify() and similar functions will conflict with any read-gap-only locks that might exist on the successor of the committed delete-marked record.
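            The scan-time rule described above can be sketched as a toy model (hypothetical names; records are (key, committed, delete_marked) tuples):

```python
def locks_for_delete_marked_run(records):
    """Walk forward from the first record of interest: take a read-gap-only
    lock on each committed delete-marked record, and stop at the first
    uncommitted or committed non-delete-marked record (or the index end).
    Returns (keys receiving read-gap-only locks, stopping key or None)."""
    gap_locked = []
    for key, committed, delete_marked in records:
        if committed and delete_marked:
            gap_locked.append(key)
        else:
            return gap_locked, key
    return gap_locked, None        # reached the end of the index

run = [(1, True, True), (3, True, True), (4, False, False)]
print(locks_for_delete_marked_run(run))
# read-gap-only locks on 1 and 3, normal locking resumes at key 4
```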

            marko Marko Mäkelä added a comment -

            I tried and failed to create a test case that involves INSERT (which was part of the original replication scenario). While debugging my test, I see that we are unnecessarily locking a delete-marked committed record f1=1 (but not the preceding gap) here:

            		if (index == clust_index
            		    && mode == PAGE_CUR_GE
            		    && direction == 0
            		    && dtuple_get_n_fields_cmp(search_tuple)
            		    == dict_index_get_n_unique(index)
            		    && 0 == cmp_dtuple_rec(search_tuple, rec, offsets)) {
            no_gap_lock:
            			lock_type = LOCK_REC_NOT_GAP;
            		}
             
            		err = sel_set_rec_lock(pcur,
            				       rec, index, offsets,
            				       prebuilt->select_lock_type,
            				       lock_type, thr, &mtr);
            

            The current statement is

            select * from t1 where f1 between 1 and 3 for update;
            

            The delete-marked committed record is f1=1, which is at the very start of the search range.

            The search is also locking ‘one row too much’, f1=4 (and its preceding gap), which is after the search range. If I add two ‘sentinel records’, hoping to not see a conflict between the INSERT and SELECT, also f1=5 will be locked by the SELECT:

            connect(con_purge,localhost,root,,);
            start transaction with consistent snapshot;
             
            connection default;
            create table t1 (f1 int not null, primary key(f1))engine=innodb;
            insert into t1 select -seq from seq_10000_to_1;
            insert into t1 values(1),(3),(4);
            delete from t1 where f1>0;
             
            connect(con1,localhost,root,,);
            begin;
            insert into t1 values(5);
             
            # Connection con2 will wait for LOCK_ORDINARY in delete marked record.
            connect(con2,localhost,root,,);
            begin;
            set debug_sync="lock_wait_suspend_thread_enter SIGNAL purge_start";
            send select * from t1 where f1 between 1 and 3 for update;
            

            For that particular bug (locking too many records after a range), MySQL 8.0.18 included a fix that we should evaluate and try to port to MariaDB.

            Side note: When purge is removing the record f1=1, it is creating a LOCK_GAP|LOCK_X|LOCK_REC on the successor record f1=3, which is delete-marked and committed. It fails to notice that there already was a covering LOCK_X|LOCK_REC on that record, and unnecessarily allocates another record lock. My proposed fix for this bug would avoid any lock creation when purge or rollback are deleting records. Such unnecessary allocation would also be avoided if we introduce a single record lock bitmap with 4 bits per record, as proposed in MDEV-16406.
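            For illustration, a 4-bits-per-record lock bitmap along the lines floated in MDEV-16406 could look like this toy sketch (bit assignments and names invented): re-locking an already-covered record just sets more bits in the record's nibble instead of allocating another lock struct.

```python
# Illustrative bit assignment, one nibble per heap number.
REC_NOT_GAP, GAP, READ, WRITE = 1, 2, 4, 8

class NibbleLockBitmap:
    def __init__(self, n_recs):
        self.bits = bytearray((n_recs + 1) // 2)   # 4 bits per record

    def set(self, heap_no, flags):
        byte, shift = divmod(heap_no, 2)
        self.bits[byte] |= (flags & 0xF) << (4 * shift)

    def get(self, heap_no):
        byte, shift = divmod(heap_no, 2)
        return (self.bits[byte] >> (4 * shift)) & 0xF

bm = NibbleLockBitmap(8)
bm.set(3, REC_NOT_GAP | WRITE)   # X record lock on heap_no 3
bm.set(3, GAP | WRITE)           # later gap lock: same nibble, no allocation
print(bin(bm.get(3)))            # both locks visible in one 4-bit entry
```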

            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Affects Version/s 5.5 [ 15800 ]
            Affects Version/s 10.0 [ 16000 ]
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Vladislav Lesin [ vlad.lesin ]
            marko Marko Mäkelä added a comment - - edited

            An alternative approach was suggested, to make purge more visible to concurrent transactions. We already have an unfortunate case of that, for SPATIAL INDEX (see MDEV-15284 and MDEV-16269).

            What would be the benefit of making purge more visible? To me, doing so would seem to cause more locking inconsistency between server instances in a replication or Galera setup, which is something that should be avoided.

            I would prefer things to be so regular that they can be described in simple abstract terms. A maintenance task that runs in the background and lacks theoretical foundation should be as invisible as possible. Side note: this was a main motivation for removing the InnoDB background change buffer merge in MDEV-19514.

            It is a bug if purge or rollback are creating or acquiring any locks.

            • Anything that rollback modifies should already be exclusively locked by the transaction that is being rolled back (whether partially or fully).
            • By design, the purge of history is supposed to be invisible to any transactions. Purge is not run within a transaction, so it cannot acquire record locks for itself. And I am reasoning that it should not acquire record locks on behalf of other transactions either.
            julien.fritsch Julien Fritsch made changes -
            Assignee Vladislav Lesin [ vlad.lesin ] Julien Fritsch [ julien.fritsch ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Julien Fritsch [ julien.fritsch ] Vladislav Lesin [ vlad.lesin ]
            julien.fritsch Julien Fritsch made changes -
            Rank Ranked higher
            vlad.lesin Vladislav Lesin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            vlad.lesin Vladislav Lesin added a comment - - edited

            Let's consider the following test (the same test that was previously considered by marko and thiru under 10.2):

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
            create table t1 (f1 int not null, primary key(f1))engine=innodb;
            insert into t1 values (10), (20), (30);
             
            connect(con_purge,localhost,root,,);
            start transaction with consistent snapshot;
             
            connection default;
            delete from t1 where f1 = 10 or f1 = 20;
             
            connect(con1,localhost,root,,);
            begin;
            select * from t1 where f1 < 0 for update; // <--- first select
            # Connection con2 will wait for LOCK_ORDINARY in delete marked record.
            connect(con2,localhost,root,,);
            begin;
            set debug_sync="lock_wait_suspend_thread_enter SIGNAL purge_start";
            send select * from t1 where f1 < 0 for update; // <--- second select
             
            connection con_purge;
            set debug_sync="now WAIT_FOR purge_start";
            # Allow purge to start and crash at the instrumented patch. (grant lock_wait for con2)
            commit;
             
            connection con2;
            reap;
            disconnect con2;
            disconnect con_purge;
            disconnect con1;
            connection default;
            drop table t1;
            SET DEBUG_SYNC='RESET';
            

            The first "select" sets LOCK_ORDINARY locks on the delete-marked records (10), (20) and on record (30). The locks are set with the following stack trace:

            #0  lock_rec_lock (impl=false, mode=3, block=0x7ffff40cc260, heap_no=3, index=0x7fff9c112a88, thr=0x7fff9c09ddc8)
                at ./storage/innobase/lock/lock0lock.cc:2148
            #1  0x0000555556008153 in lock_clust_rec_read_check_and_lock (flags=0, block=0x7ffff40cc260, rec=0x7ffff45c4093 "\200", index=0x7fff9c112a88, 
                offsets=0x7fffefd1a280, mode=LOCK_X, gap_mode=0, thr=0x7fff9c09ddc8)
                at ./storage/innobase/lock/lock0lock.cc:6130
            #2  0x00005555560e8f53 in sel_set_rec_lock (pcur=0x7fff9c09d590, rec=0x7ffff45c4093 "\200", index=0x7fff9c112a88, offsets=0x7fffefd1a280, mode=3, 
                type=0, thr=0x7fff9c09ddc8, mtr=0x7fffefd1a660) at ./storage/innobase/row/row0sel.cc:1261
            #3  0x00005555560f143e in row_search_mvcc (buf=0x7fff9c0a0110 "\377", mode=PAGE_CUR_G, prebuilt=0x7fff9c09d3c8, match_mode=0, direction=1)
                at ./storage/innobase/row/row0sel.cc:5133
            #4  0x0000555555f88aec in ha_innobase::general_fetch (this=0x7fff9c09cc38, buf=0x7fff9c0a0110 "\377", direction=1, match_mode=0)
                at ./storage/innobase/handler/ha_innodb.cc:9736
            

            Let's take a look at the following code in row_search_mvcc():

            ... row_search_mvcc(...) 
            {
            ...
            locks_ok:
            	/* NOTE that at this point rec can be an old version of a clustered
            	index record built for a consistent read. We cannot assume after this
            	point that rec is on a buffer pool page. Functions like
            	page_rec_is_comp() cannot be used! */
             
            	if (rec_get_deleted_flag(rec, comp)) {
            locks_ok_del_marked:
            		/* In delete-marked records, DB_TRX_ID must
            		always refer to an existing undo log record. */
            		ut_ad(index != clust_index
            		      || row_get_rec_trx_id(rec, index, offsets));
             
            		/* The record is delete-marked: we can skip it */
             
            		/* This is an optimization to skip setting the next key lock
            		on the record that follows this delete-marked record. This
            		optimization works because of the unique search criteria
            		which precludes the presence of a range lock between this
            		delete marked record and the record following it.
             
            		For now this is applicable only to clustered indexes while
            		doing a unique search except for HANDLER queries because
            		HANDLER allows NEXT and PREV even in unique search on
            		clustered index. There is scope for further optimization
            		applicable to unique secondary indexes. Current behaviour is
            		to widen the scope of a lock on an already delete marked record
            		if the same record is deleted twice by the same transaction */
            		if (index == clust_index && unique_search
            		    && !prebuilt->used_in_HANDLER) {
             
            			err = DB_RECORD_NOT_FOUND;
             
            			goto normal_return;
            		}
             
            		goto next_rec;
            	}
             
            ...
            }
            

            Please pay attention to the "goto next_rec;": when a delete-marked record is locked, a lock on the next record is acquired as well. That is why the first "select" locks the two delete-marked records (10), (20) and also the record (30).
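            The walk described above can be sketched as a toy model (plain Python, not InnoDB code; the function and its arguments are invented for illustration):

```python
# Toy model (not InnoDB code) of the locking walk in row_search_mvcc():
# every visited record gets a LOCK_ORDINARY (next-key) lock *before* its
# delete-mark or the end-of-range condition is checked, and a delete-marked
# record sends the scan on to the next record ("goto next_rec").

def locked_records(records, delete_marked, in_range):
    """records: ordered key values; delete_marked: set of keys; in_range:
    predicate for the search condition. Returns the keys that receive
    LOCK_ORDINARY, with 'SUP' standing for the page supremum."""
    locks = []
    for key in records:
        locks.append(key)        # the lock is taken first
        if key in delete_marked:
            continue             # "goto next_rec"
        if not in_range(key):
            return locks         # end of range reached: stop here
    locks.append('SUP')          # ran past the last record: lock supremum
    return locks

# The test above: (10) and (20) are delete-marked, so (30) is locked too.
assert locked_records([10, 20, 30], {10, 20}, lambda k: k < 0) == [10, 20, 30]
# The one-record variant: only delete-marked (10), so the supremum is locked.
assert locked_records([10], {10}, lambda k: k < 0) == [10, 'SUP']
```

            The model reproduces both observed lock sets: the three-record test stops after locking (30), while the one-record variant runs off the page and locks the supremum.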

            If we apply the patch proposed by thiru and modify it a little bit:

            @@ -3523,8 +3527,8 @@ lock_update_delete(
                    lock_mutex_enter();
             
                    /* Let the next record inherit the locks from rec, in gap mode */
            -
            -       lock_rec_inherit_to_gap(block, block, next_heap_no, heap_no);
            +       if (!interesting)
            +               lock_rec_inherit_to_gap(block, block, next_heap_no, heap_no, interesting);
             
                    /* Reset the lock bits on rec and release waiting transactions */
            

            That is, if the purge thread does not extend the lock from a delete-marked record to its successor, the second "select" will still fail with a "lock wait timeout" error even after all delete-marked records are purged, because the first "select" set a conflicting lock on record (30).

            In other words, the functionality requested by marko is already implemented.

            If so, why is the lock granted to the second "select" in the original test after the delete-marked record (10) is purged?

            Let's now consider the original test, i.e. apply the following replacements:

            -insert into t1 values (10), (20), (30);
            +insert into t1 values (10);
            ...
            -delete from t1 where f1 = 10 or f1 = 20;
            +delete from t1 where f1 = 10;
            

            In this case the first "select" locks the delete-marked record (10) with LOCK_ORDINARY, and also locks the supremum. All locks on the supremum are considered gap locks.

            When the purge thread removes the delete-marked (10), the waiting second "select" is woken up (see the lock_rec_reset_and_release_wait() call in lock_update_delete(), which is invoked from the purge thread) and acquires a lock on the supremum. The lock is granted with the following stack trace:

            #0  lock_rec_has_to_wait (for_locking=true, trx=0x7ffff495b478, type_mode=3, lock2=0x7ffff495a448, lock_is_on_supremum=true)
                at ./storage/innobase/lock/lock0lock.cc:669
            #1  0x0000555555ffa3e9 in lock_rec_other_has_conflicting (mode=3, block=0x7ffff40cc260, heap_no=1, trx=0x7ffff495b478)
                at ./storage/innobase/lock/lock0lock.cc:1186
            #2  0x0000555555ffd37a in lock_rec_lock_slow (impl=0, mode=3, block=0x7ffff40cc260, heap_no=1, index=0x7fff9c112a88, thr=0x7fff9c09ddc8)
                at ./storage/innobase/lock/lock0lock.cc:2097
            #3  0x0000555555ffd676 in lock_rec_lock (impl=false, mode=3, block=0x7ffff40cc260, heap_no=1, index=0x7fff9c112a88, thr=0x7fff9c09ddc8)
                at ./storage/innobase/lock/lock0lock.cc:2169
            #4  0x0000555556008124 in lock_clust_rec_read_check_and_lock (flags=0, block=0x7ffff40cc260, rec=0x7ffff45c4070 "supremum", index=0x7fff9c112a88, offsets=0x7fffefccf400, mode=LOCK_X, gap_mode=0, 
                thr=0x7fff9c09ddc8) at ./storage/innobase/lock/lock0lock.cc:6130
            #5  0x00005555560e8f67 in sel_set_rec_lock (pcur=0x7fff9c09d590, rec=0x7ffff45c4070 "supremum", index=0x7fff9c112a88, offsets=0x7fffefccf400, mode=3, type=0, thr=0x7fff9c09ddc8, 
                mtr=0x7fffefccf7e0) at ./storage/innobase/row/row0sel.cc:1261
            #6  0x00005555560f07db in row_search_mvcc (buf=0x7fff9c0a0110 "\377", mode=PAGE_CUR_G, prebuilt=0x7fff9c09d3c8, match_mode=0, direction=0)
                at ./storage/innobase/row/row0sel.cc:4835
            

            Let's look at the following code in lock_rec_has_to_wait():

            ...
            lock_rec_has_to_wait()
            {
            ...
            		/* We have somewhat complex rules when gap type record locks
            		cause waits */
             
            		if ((lock_is_on_supremum || (type_mode & LOCK_GAP))
            		    && !(type_mode & LOCK_INSERT_INTENTION)) {
             
            			/* Gap type locks without LOCK_INSERT_INTENTION flag
            			do not need to wait for anything. This is because
            			different users can have conflicting lock types
            			on gaps. */
             
            			return(FALSE);
            		}
            ...
            }
            

            As locks on the supremum are considered gap locks, and gap locks without the LOCK_INSERT_INTENTION flag do not conflict with each other, the above code and behaviour are correct.
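            The rule in this branch can be restated as a toy predicate (plain Python, not InnoDB code; only this one branch of lock_rec_has_to_wait() is modelled, and the bit values follow InnoDB's lock mode flags):

```python
# Toy model (not InnoDB code) of the quoted branch of lock_rec_has_to_wait():
# a request that is gap-type, or that targets the supremum (always treated
# as a gap), only has to wait if it carries LOCK_INSERT_INTENTION.

LOCK_GAP = 0x200                 # flag values modelled on lock0lock.h
LOCK_INSERT_INTENTION = 0x800

def gap_branch_has_to_wait(type_mode, lock_is_on_supremum):
    if ((lock_is_on_supremum or (type_mode & LOCK_GAP))
            and not (type_mode & LOCK_INSERT_INTENTION)):
        return False    # plain gap locks never wait for each other
    return True         # fall through to the remaining, stricter checks

# con2's LOCK_X request (type_mode=3) on the supremum is granted despite
# con1's existing supremum lock:
assert gap_branch_has_to_wait(3, lock_is_on_supremum=True) is False
# ...whereas an INSERT's insert-intention into the same gap must wait:
assert gap_branch_has_to_wait(LOCK_GAP | LOCK_INSERT_INTENTION, False) is True
```

            This is why the second "select" of the original test is granted its supremum lock the moment purge wakes it up.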

            Note that if we try to insert some record into the table after the delete-marked (10) is purged, the "insert" will block as expected even if we do not invoke lock_rec_inherit_to_gap() for the delete-marked (10) from the purge thread, because the first "select" set a lock on the supremum. I would also add that even if we do invoke lock_rec_inherit_to_gap() from the purge thread, it will not add a new lock for the supremum, or for (30) in the previous test, because it does not create a new lock if one already exists; so there is no need to change the code here.

            I have not yet checked the behaviour on "rollback", but the above allows us to claim that the test for the root cause is wrong and that the requested functionality is already implemented.


            marko Marko Mäkelä added a comment -

            A few quick observations:

            • The SQL snippets that I posted in this ticket are not the original simplified test case, but something that attempted (and failed) to simplify that test case even further. The test case that I analyzed involves replication and is not completely deterministic.
            • The fact that the row_search_mvcc() function is acquiring locks on delete-marked records and preceding gaps seems to give us permission to remove the lock_rec_inherit_to_gap() calls from purge, and probably also rollback and all other cases.
            • The function row_sel() duplicates the logic for the InnoDB internal SQL parser, which is used for updating InnoDB data dictionary statistics, persistent index cardinality statistics, and fulltext indexes. We must ensure that the locking works in the same way in both functions. Both functions are used for locking SELECT as well as UPDATE and DELETE (which is an update of the delete-mark flag).
            • As hinted in MDEV-14589 and MySQL Bug #19762, ideally InnoDB should not lock delete-marked records themselves, but make it appear so that a contiguous gap lock exists all the way from the preceding record to the next visible and committed record, or to the end of the index.
            • Note: In MDEV-14589 we only fixed a special case of "InnoDB should not lock a delete-marked record". Bogus deadlocks are still possible in other cases.

            My observation on 2019-10-09 suggests that it is necessary to remove some lock_rec_inherit_to_gap() calls to fix the scenario that I analyzed:

            The InnoDB purge thread seems to be wrongly converting a waiting gap lock request that is attached to a purgeable record, to a granted gap lock request on the page supremum.

            It may be the case that in order to fix the problems, we not only have to remove lock_rec_inherit_to_gap(), but also actually fix MDEV-14589 in all cases. That is, we may have to introduce a new gap lock mode that is not considered to lock the committed delete-marked record itself, but instead locks a contiguous gap. For example, if we have the keys (a),(b),(c),(d) in the index and (b) is a committed delete-marked record, and we are doing a locking read of the range between (a) and (c), currently we would acquire LOCK_REC_NOT_GAP on (a) and LOCK_ORDINARY on both (b) and (c). Perhaps we should acquire a special type of gap lock on the delete-marked record (b) that says that (b) itself is not locked. In this way, there would be no bogus locking conflict on (b) if multiple transactions are trying to delete or update (b), which has already been deleted but not purged. Once (b) is finally purged, there should be absolutely no observable change to the gap locks; we would hold a contiguous gap lock that covers the range from (a) to (c). Remember: multiple 'read gaps' or 'write gaps' can be acquired on any given gap, as long as all gap locks on the gap are of the same type. Implementing this fix might benefit from refactoring the data structures, as suggested in MDEV-16406.
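            The proposal can be sketched as a toy model (plain Python, not InnoDB code; the lock-kind names are invented for illustration):

```python
# Toy model (not InnoDB code) of the proposed gap-only lock mode. A lock is
# (key, kind): 'rec' locks only the record, 'ordinary' the record plus the
# gap before it, 'gap' only the gap before it. Keys a < b < c < d; (b) is a
# committed delete-marked record, and the range between (a) and (c) is read.

CURRENT  = {('a', 'rec'), ('b', 'ordinary'), ('c', 'ordinary')}  # today
PROPOSED = {('a', 'rec'), ('b', 'gap'), ('c', 'ordinary')}       # proposal

def rec_locked(locks):
    return {k for k, kind in locks if kind in ('rec', 'ordinary')}

def gap_locked(locks):
    """Gaps are named by the record that follows them."""
    return {k for k, kind in locks if kind in ('gap', 'ordinary')}

def purge_record(locks, key):
    """Purging `key` merges the gap before it into the gap before its
    successor; a gap-only lock on `key` can then be dropped silently."""
    return {(k, kind) for k, kind in locks if k != key}

assert 'b' in rec_locked(CURRENT)        # today (b) itself is locked:
                                         # bogus conflicts on a deleted row
assert 'b' not in rec_locked(PROPOSED)   # proposal: (b) is never locked,
assert gap_locked(PROPOSED) == gap_locked(CURRENT)  # same gap coverage
# After purging (b), the physical gaps (a,b) and (b,c) merge into (a,c),
# which is exactly the still-held gap before (c): coverage is unchanged.
assert gap_locked(purge_record(PROPOSED, 'b')) == {'c'}
```

            The last assertion illustrates the key property: purge of (b) changes nothing observable about what the transaction has locked.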


            Further observations:
            If we fully fix MDEV-14589, then there should never be a lock on a delete-marked record itself, but only a ‘redundant’ gap lock that is by design (of row_search_mvcc() and row_sel()) guaranteed to extend to a gap lock on the following record. Thus, if rollback or purge removes such a purgeable record, it will not need to wake up any waiting transactions, and that wake-up code should be removed as well.

            A full fix of MDEV-14589 requires ensuring that read-gap and write-gap locks are mutually exclusive. Let me illustrate this with a .test file:

            --source include/have_innodb.inc
            SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
             
            let $show_hang=0;
            --echo # let show_hang=1;
             
            create table t(k tinyint unsigned primary key) engine=innodb;
            connect (prevent_purge,localhost,root,,);
            start transaction with consistent snapshot;
            connection default;
            begin;
            insert into t values (0),(5),(9),(13);
            delete from t where k=5 or k=9;
            commit;
            begin;
            insert into t values(3);
            connect (con1,localhost,root,,);
            set transaction isolation level serializable;
            begin;
            select * from t where k=6;
            connection default;
            if ($show_hang) {
            disconnect prevent_purge;
            --source include/wait_all_purged.inc
            }
            insert into t values(10);
            if (!$show_hang) {
            disconnect prevent_purge;
            --source include/wait_all_purged.inc
            }
            disconnect con1;
            connection default;
            rollback;
            select * from t;
            drop table t;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
            

            I was actually expecting that the SELECT would block due to the preceding INSERT, but that is not the case, even if the DELETE statement is removed or the condition k=6 is replaced with anything other than k=3. Only if I change the condition to use >=, such as k>=10, is there a locking conflict on record heap number 6, which is the record k=3. This is apparently because of an optimizer misfeature that prefers a full table scan over range scans for small tables.

            vlad.lesin, for a complete understanding of this, I hope that you can determine when exactly INSERT is acquiring write-gap locks. I think that in the basic case, INSERT relies solely on implicit locking (no record locks). If the INSERT were acquiring a write-gap lock, then it obviously should block a later locking read operation with k=4, because that key would fall within the same gap.

            Anyway, the last INSERT after the locking SELECT is showing the problem that I wanted to illustrate. As long as the delete-marked record k=5 exists, the last INSERT can proceed. If the record k=5 is purged while both transactions are active, then we will have both a write-gap and read-gap lock on the remaining gap, which is a violation of the gap locking rules!

            I think that replication (as well as some users) can reasonably expect locking to be completely agnostic to the purge of history. Fixing this (that is, ensuring that the SELECT and INSERT will conflict no matter if delete-marked records exist between them) might introduce more locking conflicts, so we might want to introduce a configuration option (or abuse innodb_locks_unsafe_for_binlog) to enable the old non-conflicting gap locking. Note: that deprecated option was removed from MariaDB Server 10.5.0 in MDEV-19544.

            marko Marko Mäkelä added a comment -
            vlad.lesin Vladislav Lesin added a comment - - edited

            The above test shows the problem. The following explains why the last INSERT is blocked after purge but not before purge.

            1. Connection "default", repeatable read, "insert into t values(3)":

            (0)...(3 - just inserted, X-lock)...(5 - dmc)...(9 - dmc)...(13)
            

            where dmc - delete-marked committed

            2. Connection "con1", serializable, "select * from t where k=6":

            (0)...(3 - X-lock)...(5 - dmc)...(9 - dmc, gap S-lock)...(13)
            

            The lock on (9) is set with the following backtrace:

             #0  lock_rec_lock (impl=false, mode=514, block=0x7ffff40cc260, heap_no=4, index=0x7fff9c02c468, thr=0x7fff9c02e5a8)
                 at ./storage/innobase/lock/lock0lock.cc:2148
             #1  0x00005555560081f4 in lock_clust_rec_read_check_and_lock (flags=0, block=0x7ffff40cc260, rec=0x7ffff45c40a3 "\t", index=0x7fff9c02c468, 
                 offsets=0x7fffefd1a140, mode=LOCK_S, gap_mode=512, thr=0x7fff9c02e5a8)
                 at ./storage/innobase/lock/lock0lock.cc:6130
             #2  0x00005555560e93e4 in sel_set_rec_lock (pcur=0x7fff9c02dd70, rec=0x7ffff45c40a3 "\t", index=0x7fff9c02c468, offsets=0x7fffefd1a140, mode=2, 
                 type=512, thr=0x7fff9c02e5a8, mtr=0x7fffefd1a520)
                 at ./storage/innobase/row/row0sel.cc:1261
             #3  0x00005555560f143f in row_search_mvcc (buf=0x7fff9c09dac0 "\377", mode=PAGE_CUR_GE, prebuilt=0x7fff9c02dba8, match_mode=1, direction=0)
                 at ./storage/innobase/row/row0sel.cc:4968
             #4  0x0000555555f87f67 in ha_innobase::index_read (this=0x7fff9c09df18, buf=0x7fff9c09dac0 "\377", key_ptr=0x7fff8c013920 "\006", key_len=1, 
                 find_flag=HA_READ_KEY_EXACT) at ./storage/innobase/handler/ha_innodb.cc:9469
            

            3. Connection "default", repeatable read, "insert into t values(10)":

            a) before purge

            (0)...(3 - X-lock)...(5 - dmc)...(9 - dmc, gap S-lock)...(10 - just inserted, X-lock)...(13)
            

            (10) is inserted as there is no gap lock on (13)

            b) after purge

            (0)...(3 - X-lock)...(13 - gap S-lock inherited from purged 9)
                              ^^^(10) can't be inserted as there is gap S-lock on (13)
            

            As we can see, the problem is that gap locks on delete-marked records are not propagated to their non-delete-marked successors for point queries:

            dberr_t
            row_search_mvcc(...)
            {
            ...
            	if (match_mode == ROW_SEL_EXACT) {
            ...
            				err = sel_set_rec_lock(
            					pcur,
            					rec, index, offsets,
            					prebuilt->select_lock_type, LOCK_GAP,
            					thr, &mtr);
            ...
            			goto normal_return;
            	  } else if (match_mode == ROW_SEL_EXACT_PREFIX) {
            ...
            				err = sel_set_rec_lock(
            					pcur,
            					rec, index, offsets,
            					prebuilt->select_lock_type, LOCK_GAP,
            					thr, &mtr);
            ...
            			goto normal_return;
            	}
            ...
            }
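
            The gap-lock inheritance in step b) can be sketched as a toy model (illustrative Python, not InnoDB code; in a real index the lock would be inherited by the page supremum when no successor record exists):

            ```python
            # Toy model of gap-lock inheritance on purge: when a record carrying a
            # gap lock is purged, the lock is inherited by the next record,
            # widening the gap that the lock covers.
            def purge(index, gap_locks, key):
                # index: sorted list of keys; gap_locks: {key: set of lock kinds}
                pos = index.index(key)
                successor = index[pos + 1]  # assume a successor record exists
                inherited = gap_locks.pop(key, set())
                gap_locks.setdefault(successor, set()).update(inherited)
                index.remove(key)

            index = [0, 3, 5, 9, 13]
            gap_locks = {9: {"S-gap"}}      # the gap S-lock set by the SELECT
            purge(index, gap_locks, 5)
            purge(index, gap_locks, 9)
            assert index == [0, 3, 13]
            assert gap_locks == {13: {"S-gap"}}  # inherited from purged (9)
            ```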
            


            I will take over this. But I am afraid that I must address MDEV-23399 first. So, this is not truly ‘in progress’ at the moment.

            marko Marko Mäkelä added a comment -

            marko I have just pushed my draft to https://github.com/MariaDB/server/tree/10.2-MDEV-20605-gap-locks-debug.

            I have implemented and tested changes in row_search_mvcc().

            I have implemented but not tested changes in row_sel(). I started implementing a system variable for testing row_sel(); you can see it in one of the commits.

            I have implemented changes for the foreign key constraint check. But I don't like this implementation, because its complexity is n*m. See the comment in the commit message.

            I have not implemented the duplicate check for the foreign key check.

            vlad.lesin Vladislav Lesin added a comment -

            Sorry for the delay; I was occupied by 10.6 development work.

            The changes in the 10.2 based branch look mostly OK to me. For testing, I think that we will need a 10.6-based branch.

            My understanding is that the function row_sel() is never invoked on spatial indexes. It might also be that all locking use of it is for system tables or InnoDB persistent statistics tables. If functions like fts_doc_fetch_by_doc_id() never execute a locking read, then we might not have to modify row_sel() at all.

            A more thorough description of how gap locks work in the FOREIGN KEY locks will be needed.

            Also, whenever we check that a record is delete-marked, is it certain that the record has been committed? I am fairly sure that a gap lock can be safely attached to a delete-marked non-committed record.

            Another related doubt: Is it safe to attach a gap lock to a record that has been inserted by a not-yet-committed transaction? Maybe, in such cases we should always also acquire a lock on the ‘anchor’ record itself (so that we will wait for the transaction commit), and not only on the preceding gap.

            As part of testing this, I think that we must repeat and analyze the failure of MDEV-22889.

            marko Marko Mäkelä added a comment -
            TheWitness Larry Adams added a comment -

            I'm on 10.5.8 and struggling with this problem. In 'optimistic' mode the server encounters a 1964 error, Slave_SQL_Running goes to No, and the whole thing breaks. So, optimistic mode does not work in 10.5.8. I'm testing with 'conservative', but the slaves are unable to keep up with the master at present. Just watching as I type, I have not seen Seconds_Behind_Master decrease since I restarted in 'conservative' mode. I'm using the 'mixed' binlog format. Switching to 'row' likely won't help either; we are already generating 5TB of binary logs per day in 'mixed' mode.


            I think that the following must be part of the fix:

            diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc
            index 635e7f659eb..097b67ca0f0 100644
            --- a/storage/innobase/row/row0ins.cc
            +++ b/storage/innobase/row/row0ins.cc
            @@ -1678,9 +1678,7 @@ row_ins_check_foreign_constraint(
             							    offsets));
             
             				err = row_ins_set_shared_rec_lock(
            -					skip_gap_lock
            -					? LOCK_REC_NOT_GAP
            -					: LOCK_ORDINARY, block,
            +					LOCK_ORDINARY, block,
             					rec, check_index, offsets, thr);
             				switch (err) {
             				case DB_SUCCESS_LOCKED_REC:
            

            In a FOREIGN KEY check, if we encounter a delete-marked record, we must ensure that the record is committed, by acquiring a lock on both the record and the preceding gap. If that is the case, we must continue acquiring such locks on succeeding records until the end of the index, or until a record that is not delete-marked. On that non-delete-marked record, we must acquire a gap-only lock. And we must not recurse into that non-delete-marked record, because we already found that the first record was delete-marked.
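The traversal described above can be sketched as follows (an illustrative Python model of the proposed rule, not the actual row_ins_check_foreign_constraint() code):

```python
# Sketch of the proposed FOREIGN KEY check locking: lock each delete-marked
# record together with its preceding gap (LOCK_ORDINARY), and finish with a
# gap-only lock (LOCK_GAP) on the first non-delete-marked successor.
def fk_check_locks(records):
    # records: list of (key, delete_marked) in index order, starting at the match
    locks = []
    for key, delete_marked in records:
        if delete_marked:
            locks.append((key, "LOCK_ORDINARY"))  # record + preceding gap
        else:
            locks.append((key, "LOCK_GAP"))       # preceding gap only
            break                                 # do not recurse further
    return locks

assert fk_check_locks([(5, True), (9, True), (13, False)]) == [
    (5, "LOCK_ORDINARY"), (9, "LOCK_ORDINARY"), (13, "LOCK_GAP")]
```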

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä added a comment - - edited

            Why should it suffice to acquire a gap-only lock on that last non-delete-marked record in the traversal in row_ins_check_foreign_constraint()? What if that last record is later delete-marked and committed? Can it be purged, so that the last segment of our gap lock would be widened in purge?

            The answer is that normally, the transaction that is executing the traversal in row_ins_check_foreign_constraint() has a read view that will be older than a possible future commit of delete-marking the record that was not delete-marked during our traversal. Because purge cannot advance beyond the oldest existing read view, the record could not be purged.

            However, there are exceptions at two isolation levels:
            READ UNCOMMITTED: The transaction would not have any read view.
            READ COMMITTED: The read view will be reset at statement boundary.

            In either case, there is nothing that would prevent purge from advancing to that last record. In these cases, it could be simplest to acquire LOCK_ORDINARY on the last segment of the gap (when the anchor record is not delete-marked). That would prevent such a future delete-mark operation from occurring before the gap-owning transaction is committed or rolled back.
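
            The purge constraint underlying this argument can be sketched as a toy model (illustrative Python; transaction numbers and read views are simplified to integers):

            ```python
            # Toy model: purge may only remove history that is older than the
            # oldest open read view.
            def can_purge(commit_trx_no, open_read_views):
                # A delete-marked record committed at commit_trx_no is purgeable
                # only if every open read view was created after that commit.
                return all(view > commit_trx_no for view in open_read_views)

            # REPEATABLE READ: the checking transaction's view (trx 100) predates
            # a later delete-mark commit (trx 105): the record cannot be purged.
            assert not can_purge(105, open_read_views=[100])
            # READ COMMITTED: the view is reset at statement end; if no older
            # view remains, purge may proceed past the record.
            assert can_purge(105, open_read_views=[110])
            ```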

            TheWitness Larry Adams added a comment -

            Per the recommendation made by others, I switched to ROW. So far so good.


            One more thought: In MDEV-19544 we finally removed an option that had been deprecated a long time ago:

            When innodb_locks_unsafe_for_binlog was set, InnoDB would skip gap locking in many cases. The non-descriptive name of the option hints that it was known to break statement-based replication.

            Later, the option was made redundant by incorporating such logic in the READ UNCOMMITTED and READ COMMITTED isolation levels. That those isolation levels acquire fewer gap locks is expected, as noted in MDEV-24224.

            All this suggests that the limitations that applied to innodb_locks_unsafe_for_binlog=ON will also apply to the READ UNCOMMITTED and READ COMMITTED isolation levels.

            Having said all this, we definitely do have a problem with the gap locks on delete-marked records that affects the default REPEATABLE READ isolation level.

            marko Marko Mäkelä added a comment -

            The option innodb_locks_unsafe_for_binlog was deprecated in MySQL 5.6.3. Already back then, the logic had been incorporated into the READ UNCOMMITTED and READ COMMITTED isolation levels. So, those isolation levels are as unsafe to use with replication as the deprecated setting innodb_locks_unsafe_for_binlog=1. We will not change the reduced acquisition of gap locks in this ticket MDEV-20605 (except when it comes to delete-marked gap anchor records).

            Note: I do not know what exactly was or is unsafe about innodb_locks_unsafe_for_binlog. I hope that Elkin can shed light on that.

            marko Marko Mäkelä added a comment -
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5 [ 23123 ]

            Could you make a small test case to demonstrate that behavior? You can use, for example, sleep() to make sure statements are executed in the desired order on the slave.

            serg Sergei Golubchik added a comment -
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-20605-restore-cursor.test [ 60381 ]
            vlad.lesin Vladislav Lesin added a comment - - edited

            The test case is attached. It reproduces the customer's root cause with secondary key locking, but most likely the same wrong cursor position restoration behaviour can be reproduced for primary keys too. The test was developed for 10.6; I did not check it against earlier versions, where some syncpoints might be missing. MDEV-20605-restore-cursor.test

            vlad.lesin Vladislav Lesin made changes -
            Attachment row0upd.cc.diff [ 60406 ]
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-20605-deadlock.test [ 60407 ]
            vlad.lesin Vladislav Lesin added a comment - - edited

There is one more test case, with a deadlock: MDEV-20605-deadlock.test

It looks strange to me. The situation is the following: we have two transactions and one record. The first transaction holds an ORDINARY S-lock on the record; the second transaction created a waiting ORDINARY X-lock and waits for the first transaction. Then the first transaction requests an insert-intention lock on the record, and this lock conflicts with the waiting ORDINARY X-lock of the second transaction, which causes the deadlock. Why should it conflict? The first transaction already holds a lock on the record, and the second transaction's lock carries the "waiting" flag. This should be reported in a separate MDEV; see MDEV-27025.

            vlad.lesin Vladislav Lesin added a comment - - edited

marko, I think the cause is not commit https://github.com/MariaDB/server/commit/f4fd68857b2e7767b559dbc09a8d4abbe7ccec29. If we look at the initial 5.0 import to git, we see the following code:

            static                                                                          
            ibool                                                                           
            sel_restore_position_for_mysql(...)                                             
            {                                                                               
                    ibool   success;                                                        
                    ulint   relative_position;                                              
                                                                                            
                    relative_position = pcur->rel_pos;                                      
                                                                                            
                    success = btr_pcur_restore_position(latch_mode, pcur, mtr);             
                                                                                            
                    *same_user_rec = success;                                               
                                                                                            
                    if (relative_position == BTR_PCUR_ON) {                                 
                            if (success) {                                                  
                                    return(FALSE);                                          
                            }                                                               
                                                                                            
                            if (moves_up) {                                                 
                                    btr_pcur_move_to_next(pcur, mtr);                       
                            }                                                               
                                                                                            
                            return(TRUE);                                                   
                    }                                                                       
            ...                                                                             
            }
            

            The comment of btr_pcur_restore_position() says:

                                                    /* out: TRUE if the cursor position     
                                                    was stored when it was on a user record 
                                                    and it can be restored on a user record 
                                                    whose ordering fields are identical to  
                                                    the ones of the original user record */
            

So "success" in sel_restore_position_for_mysql() combines two conditions joined with "and":

1. The restored position is not the supremum or infimum record;
2. The restored position points to a record equal to the stored one.

I understand why the cursor is moved forward if the first condition is false, i.e. the restored position is the infimum or supremum. But I don't understand why it should be moved if the second condition is false. In our case the cursor position was restored to a new record with the same key as the stored record, but with a different value. I think this is an error: when the code was developed, the developer took the first condition into account but not the second one.

So the fix could be to separate those two conditions. btr_pcur_restore_position() could, for example, return true if the cursor is not on the supremum or infimum, and have one more out parameter to report whether the restored position points to a record with the same fields as the stored one.

            What do you think about it?

            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-20605-restore-cursor-v2.test [ 60741 ]
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-20605-locking.test [ 60742 ]
            vlad.lesin Vladislav Lesin added a comment - - edited

Some updates.

First, I changed my test a little. There is no need to execute the last INSERT in parallel with the previous DELETE; we can finish the DELETE and then execute the INSERT. The updated version is here: MDEV-20605-restore-cursor-v2.test

I fixed the cursor position restore function, and the test passes with the fix. But the fix causes other test failures: rpl.rpl_parallel_optimistic_nobinlog and rpl.rpl_parallel_optimistic. I have not yet fully analysed them, but I suppose the issue is with locking and commit order.

About the locking order: Elkin found that the test fails even if we remove purging. I analysed it too and came to the conclusion that these are two independent issues. Here is the test: MDEV-20605-locking.test. The lock order is depicted here: https://docs.google.com/presentation/d/1ISG85S8A06kRVO6esq2cWPMMWS-1-fxcxEDVeZ0mu2I/edit?usp=sharing. The issue is that the last INSERT acquired its lock on heap_no 3 before the previous DELETE acquired its lock on heap_no 2, and so the INSERT was executed before the DELETE. I also explained this in the test's comments.

Theoretically the slave must resolve such issues by making commits on the slave follow the same order as on the master. But I have some doubts about it, and need to analyse the rpl.rpl_parallel_optimistic_nobinlog failures to be sure.


vlad.lesin Vladislav Lesin added a comment -

Update: local testing of my fix looks good; pushed https://github.com/MariaDB/server/tree/bb-10.6-MDEV-20605-cur-pos-fix for buildbot testing.
            vlad.lesin Vladislav Lesin added a comment - - edited

An update and some analysis.

            The initial issue.

During the latest RQG testing, "CHECK TABLE" reported an error from the following code:

            int ha_innobase::check(...)                                                     
            {                                                                               
            ...                                                                             
                            if (index == dict_table_get_first_index(m_prebuilt->table)) {   
                                    n_rows_in_table = n_rows;                               
                            } else if (!(index->type & DICT_FTS)                            
                                       && (n_rows != n_rows_in_table)) {                    
                                    push_warning_printf(                                    
                                            thd, Sql_condition::WARN_LEVEL_WARN,            
                                            ER_NOT_KEYFILE,                                 
                                            "InnoDB: Index '%-.200s' contains " ULINTPF     
                                            " entries, should be " ULINTPF ".",             
                                            index->name(), n_rows, n_rows_in_table);        
                                    is_ok = false;                                          
                                    dict_set_corrupted(index, "CHECK TABLE; Wrong count",   
                                                       false);                              
                            }                                                               
            ...                                                                             
            }
            

The number of rows does not match between the first index and the subsequent indexes.

The initial analysis

The scenario is the following. Suppose we have a pair of records in a secondary index with the same unique fields; the first record is non-delete-marked, the second is delete-marked-committed. The "CHECK TABLE" transaction does a non-locking read of the secondary index. After a record is read from the secondary index in row_search_mvcc(), the corresponding clustered index record is also requested (see the mtr_has_extra_clust_latch local variable in row_search_mvcc()). After that, before requesting the next secondary index record, row_search_mvcc() stores the cursor position, commits the mtr, starts a new mtr and restores the cursor position (to preserve the clustered index page latching order). In the window when the page was unlatched (after the mtr commit), the cursor's record was purged, and the cursor position was then restored to the previous secondary index record. As the previous record had the same unique fields as the record stored in the cursor, the cursor was not moved forward, and the same record was pushed into the result set twice.

The above case can happen only if the cursor position is stored, the page is unlatched, the purge thread removes the record stored in the cursor, and the cursor position is then restored. There are 3 cases when row_search_mvcc() does the above (see the btr_pcur_store_position() and sel_restore_position_for_mysql() calls in row_search_mvcc()):

1) when row_search_mvcc() is finishing; it can be invoked with the same cursor again to continue index reading, see ha_innobase::index_next() as an example;
2) before suspending when it encounters a waiting lock;
3) when searching a secondary index and the clustered index record must also be requested (see the above "CHECK TABLE" example).

            Consider the cases for locking and non-locking reads.

            Non-locking read

Case (1) should not cause the issue because if the read view can see the record and the view is still open, the record can't be purged.

            Case (2) can't happen for non-locking read by definition.

For a non-locking read in case (3), the following code requests the clustered index record even if the secondary index record is delete-marked:

            dberr_t                                                                         
            row_search_mvcc(...)                                                            
            {                                                                               
            ...                                                                             
                    if (prebuilt->select_lock_type != LOCK_NONE) {                          
                    ...                                                                     
                    } else {                                                                
                    ...                                                                     
                            if (trx->isolation_level == TRX_ISO_READ_UNCOMMITTED            
                                || prebuilt->table->is_temporary()                          
                                || prebuilt->table->no_rollback()) {                        
                                                                                            
                                    /* Do nothing: we let a non-locking SELECT read the     
                                    latest version of the record */                         
                                                                                            
                            } else if (index == clust_index) {                              
                            ...                                                             
                            } else {                                                        
                                    /* We are looking into a non-clustered index,           
                                    and to get the right version of the record we           
                                    have to look also into the clustered index: this        
                                    is necessary, because we can only get the undo          
                                    information via the clustered index record. */          
                                    ...                                                     
                                    if (!srv_read_only_mode) {                              
                                            ...                                             
                                            if (trx->read_view.sees(trx_id)) {              
                                                    goto locks_ok;                          
                                            }                                               
                                            /* We should look at the clustered index.       
                                            However, as this is a non-locking read,         
                                            we can skip the clustered index lookup if       
                                            the condition does not match the secondary      
                                            index entry. */                                 
                                            switch (row_search_idx_cond_check(              
                                                            buf, prebuilt, rec, offsets)) { 
                                            ...                                             
                                            case CHECK_POS:
                                                    goto requires_clust_rec;                
                                            }                                               
                                                                                            
                                            ut_error;                                       
                                    }                                                       
                            }                                                               
                    }                                                                       
            ...                                                                             
            }            
            

That is why, for a non-locking read, row_search_mvcc() can unlatch the page after the clustered index record has been read:

            dberr_t                                                                         
            row_search_mvcc(...)                                                            
            {                                                                               
            ...                                                                             
                    } else if (mtr_has_extra_clust_latch) {                                 
                            /* If we have extra cluster latch, we must commit               
                            mtr if we are moving to the next non-clustered                  
                            index record, because we could break the latching               
                            order if we would access a different clustered                  
                            index page right away without releasing the previous. */        
                                                                                            
                            btr_pcur_store_position(pcur, &mtr);                            
                            mtr.commit();                                                   
                            mtr_has_extra_clust_latch = FALSE;                              
                                                                                            
                            mtr.start();                                                    
                                                                                            
                            if (sel_restore_position_for_mysql(&same_user_rec,              
                                                               BTR_SEARCH_LEAF,             
                                                               pcur, moves_up, &mtr)) {     
                                    goto rec_loop;                                          
                            }                                                               
                    }                                                                       
                                                                                            
            ...                                                                             
            }
            

And while the page is unlatched, purge can remove the record stored in the cursor, which causes row duplication in the result set.

marko proposed using mtr_t::get_savepoint() and mtr_t::release_s_latch_at_savepoint() to solve the issue, i.e. we can create an mtr savepoint before requesting the clustered index record, and then release only the clustered index page latches instead of releasing all of the mtr's latches. See MDEV-27557.

Another way to fix it is not to execute the "don't move the cursor forward if a record with the same unique fields is found" logic for non-locking reads at all, because it does not make sense for a non-locking read. For a locking read, that logic was implemented so that a just-inserted record is found after the reading transaction is awakened and the cursor position is restored; this is necessary for correct locking order. For a non-locking read, locking order does not matter, and even if something was inserted after the read view was created, it will not be visible to that read view (depending on the isolation level).

I would prefer to create a separate task to implement marko's idea as a future optimisation, and to implement my approach now.

            Locking read

For a locking read, row_search_mvcc() contains the following code:

            dberr_t row_search_mvcc(...)                                                    
            {                                                                               
            ...                                                                             
            locks_ok:                                                                       
                    /* NOTE that at this point rec can be an old version of a clustered     
                    index record built for a consistent read. We cannot assume after this   
                    point that rec is on a buffer pool page. Functions like                 
                    page_rec_is_comp() cannot be used! */                                   
                                                                                            
                    if (rec_get_deleted_flag(rec, comp)) {                                  
            locks_ok_del_marked:                                                            
                            /* In delete-marked records, DB_TRX_ID must                     
                            always refer to an existing undo log record. */                 
                            ut_ad(index != clust_index                                      
                                  || row_get_rec_trx_id(rec, index, offsets));              
                                                                                            
                            /* The record is delete-marked: we can skip it */               
                                                                                            
                            /* This is an optimization to skip setting the next key lock    
                            on the record that follows this delete-marked record. This      
                            optimization works because of the unique search criteria        
                            which precludes the presence of a range lock between this       
                            delete marked record and the record following it.               
                                                                                            
                            For now this is applicable only to clustered indexes while      
                            doing a unique search except for HANDLER queries because        
                            HANDLER allows NEXT and PREV even in unique search on           
                            clustered index. There is scope for further optimization        
                            applicable to unique secondary indexes. Current behaviour is    
                            to widen the scope of a lock on an already delete marked record 
                            if the same record is deleted twice by the same transaction */  
                            if (index == clust_index && unique_search                       
                                && !prebuilt->used_in_HANDLER) {                            
                                                                                            
                                    err = DB_RECORD_NOT_FOUND;                              
                                                                                            
                                    goto normal_return;                                     
                            }
                                                                                            
                            goto next_rec;                                                  
                    }                                                                       
            ...                                                                             
            }
            

I.e. a delete-marked record is ignored, and either the next record is read or row_search_mvcc() returns DB_RECORD_NOT_FOUND, so case (1) is impossible. Case (3) is also impossible for the same reason: the above code shows that the clustered index record is not requested for a delete-marked secondary index record.

Consider case (2). To repeat the issue there must be two records with the same unique fields; the first record (in index order) must be non-delete-marked and the second delete-marked. Some transaction must hold a lock on the delete-marked record, while some other transaction holds a lock on the non-delete-marked record, creates a waiting lock on the delete-marked record, and suspends. In any case the first record will always be locked before the second record, so any transaction with a conflicting lock will be blocked on the first record. The locks are released on transaction commit, and during this process the lock hash table cell is latched, which means any other transaction will wait until both locks are released. So there can't be a situation where some other transaction acquires the lock on the first record and then blocks on the lock on the second record (which would let purge remove the record stored in the persistent cursor). Thus case (2) is impossible too.

            Summary

The issue caused by my changes does not exist for locking reads. It does exist for non-locking reads, but for non-locking reads my changes don't make sense, so we can simply switch them off in that case. We can also avoid committing the mtr when we have to read a clustered index record during a secondary index search, and unlatch only the clustered index pages, as marko suggested; this is safe because the clustered index pages are S-latched in this case (see MDEV-27557).

            vlad.lesin Vladislav Lesin added a comment - - edited Some update and analyses. The initial issue. During the last RQG testing "CHECK TABLE" showed error in the following code: int ha_innobase::check(...) { ... if (index == dict_table_get_first_index(m_prebuilt->table)) { n_rows_in_table = n_rows; } else if (!(index->type & DICT_FTS) && (n_rows != n_rows_in_table)) { push_warning_printf( thd, Sql_condition::WARN_LEVEL_WARN, ER_NOT_KEYFILE, "InnoDB: Index '%-.200s' contains " ULINTPF " entries, should be " ULINTPF "." , index->name(), n_rows, n_rows_in_table); is_ok = false ; dict_set_corrupted(index, "CHECK TABLE; Wrong count" , false ); } ... } The number of rows does not match for the first and the next indexes. The initial analyses The scenario is the following. Suppose we have a pair of records in secondary index with the same unique fields, the first record is non-delete-marked, the second record is delete-marked-committed. The transaction of "CHECK TABLE" does non-locking read of secondary index. After the record from secondary index is read in row_search_mvcc(), the correspondent clustered index record is also requested(see mtr_has_extra_clust_latch local variable in row_search_mvcc()). After that before requesting the next secondary index record in row_search_mvcc() cursor position is stored, mtr is committed, mtr is started and cursor position is restored(to preserve clustered index pages latching order). In the room when the page was unlatched(after mtr commit) the cursor's record was purged, and then cursor position was restored to the previous secondary index record. As the previous record had the same unique fields as the unique fields in the stored in cursor record, the cursor was not moved forward, and the same record was pushed twice in results set. The above case can happen only if cursor position is stored, the page is unlatched, purge thread removed stored in cursor record, and then the cursor position is restored. 
            There can be 3 cases in which row_search_mvcc() does the above (see the btr_pcur_store_position() and sel_restore_position_for_mysql() calls in row_search_mvcc()):

            1) when row_search_mvcc() is finishing; it can be invoked with the same cursor again to continue reading the index, see ha_innobase::index_next() as an example;
            2) before suspending, when it encounters a waiting lock;
            3) when there is a search in a secondary index and we need to request the clustered index record too (see the "CHECK TABLE" example above).

            Consider the cases for locking and non-locking reads.

            Non-locking read

            Case (1) should not cause the issue, because if the read view can see the record and the view is still open, the record can't be purged. Case (2) can't happen for a non-locking read by definition. For a non-locking read in case (3), the following code requests the clustered index record even if the secondary index record is delete-marked:

                dberr_t row_search_mvcc(...)
                {
                ...
                        if (prebuilt->select_lock_type != LOCK_NONE) {
                        ...
                        } else {
                        ...
                                if (trx->isolation_level == TRX_ISO_READ_UNCOMMITTED
                                    || prebuilt->table->is_temporary()
                                    || prebuilt->table->no_rollback()) {
                                        /* Do nothing: we let a non-locking SELECT read the
                                        latest version of the record */
                                } else if (index == clust_index) {
                                ...
                                } else {
                                        /* We are looking into a non-clustered index,
                                        and to get the right version of the record we
                                        have to look also into the clustered index: this
                                        is necessary, because we can only get the undo
                                        information via the clustered index record. */
                                ...
                                        if (!srv_read_only_mode) {
                                        ...
                                                if (trx->read_view.sees(trx_id)) {
                                                        goto locks_ok;
                                                }
                                                /* We should look at the clustered index.
                                                However, as this is a non-locking read,
                                                we can skip the clustered index lookup if
                                                the condition does not match the secondary
                                                index entry. */
                                                switch (row_search_idx_cond_check(
                                                        buf, prebuilt, rec, offsets)) {
                                                ...
                                                case CHECK_POS:
                                                        goto requires_clust_rec;
                                                }
                                                ut_error;
                                        }
                                }
                        }
                ...
                }
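            The trx->read_view.sees(trx_id) check above can be illustrated with a toy snapshot-visibility model. This is a simplification, not the real InnoDB ReadView (which also keeps an up-limit fast path); the class and field names here are hypothetical.

                ```python
                # Toy sketch of read-view visibility: a snapshot "sees" a transaction id
                # only if that transaction had already committed when the view was created.

                class ToyReadView:
                    def __init__(self, low_limit_id, active_ids):
                        self.low_limit_id = low_limit_id        # smallest id not yet assigned
                        self.active_ids = frozenset(active_ids) # uncommitted at view creation

                    def sees(self, trx_id):
                        return trx_id < self.low_limit_id and trx_id not in self.active_ids

                view = ToyReadView(low_limit_id=50, active_ids={42, 47})
                print(view.sees(10))   # True:  committed before the view was opened
                print(view.sees(42))   # False: still active when the view was opened
                print(view.sees(60))   # False: started after the view was opened
                ```

            When sees() returns true the secondary index record is known to be the committed version and the clustered index lookup can be skipped, which is exactly the goto locks_ok shortcut in the excerpt.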
            That is why, for a non-locking read, row_search_mvcc() can unlatch the page after the clustered index record is read:

                dberr_t row_search_mvcc(...)
                {
                ...
                        } else if (mtr_has_extra_clust_latch) {
                                /* If we have extra cluster latch, we must commit
                                mtr if we are moving to the next non-clustered
                                index record, because we could break the latching
                                order if we would access a different clustered
                                index page right away without releasing the
                                previous. */

                                btr_pcur_store_position(pcur, &mtr);
                                mtr.commit();
                                mtr_has_extra_clust_latch = FALSE;

                                mtr.start();

                                if (sel_restore_position_for_mysql(&same_user_rec,
                                                                   BTR_SEARCH_LEAF,
                                                                   pcur, moves_up, &mtr)) {
                                        goto rec_loop;
                                }
                        }
                ...
                }

            And while the page is unlatched, purge is able to remove the record stored in the cursor, which causes row duplication in the result set.

            marko proposed to use mtr_t::get_savepoint() and mtr_t::release_s_latch_at_savepoint() to solve the issue. I.e., we can create an mtr savepoint before requesting the clustered index record, and then release only the clustered index page latches instead of releasing all the mtr latches. See MDEV-27557.

            One more way to fix it is not to execute the "don't move the cursor forward if a record with the same unique fields is found" logic for non-locking reads at all, because it does not make sense there. For locking reads that logic was implemented to find a just-inserted record after the reading transaction is awoken and the cursor position is restored; this is necessary for correct locking order. For non-locking reads the locking order does not matter, and even if something was inserted after the read view was created, it will not be visible to that read view (depending on the isolation level).

            I would prefer to create a separate task to implement marko's idea as a future optimisation, and implement my approach now.

            Locking read

            For a locking read, row_search_mvcc() contains the following code:

                dberr_t row_search_mvcc(...)
                {
                ...
            locks_ok:
                        /* NOTE that at this point rec can be an old version of a clustered
                        index record built for a consistent read. We cannot assume after this
                        point that rec is on a buffer pool page. Functions like
                        page_rec_is_comp() cannot be used! */

                        if (rec_get_deleted_flag(rec, comp)) {
            locks_ok_del_marked:
                                /* In delete-marked records, DB_TRX_ID must
                                always refer to an existing undo log record. */
                                ut_ad(index != clust_index
                                      || row_get_rec_trx_id(rec, index, offsets));

                                /* The record is delete-marked: we can skip it */

                                /* This is an optimization to skip setting the next key lock
                                on the record that follows this delete-marked record. This
                                optimization works because of the unique search criteria
                                which precludes the presence of a range lock between this
                                delete marked record and the record following it.

                                For now this is applicable only to clustered indexes while
                                doing a unique search except for HANDLER queries because
                                HANDLER allows NEXT and PREV even in unique search on
                                clustered index. There is scope for further optimization
                                applicable to unique secondary indexes. Current behaviour is
                                to widen the scope of a lock on an already delete marked record
                                if the same record is deleted twice by the same transaction */
                                if (index == clust_index && unique_search
                                    && !prebuilt->used_in_HANDLER) {

                                        err = DB_RECORD_NOT_FOUND;
                                        goto normal_return;
                                }

                                goto next_rec;
                        }
                ...
                }

            I.e., a delete-marked record is ignored, and either the next record is read or row_search_mvcc() returns DB_RECORD_NOT_FOUND. So case (1) is impossible. Case (3) is impossible for the same reason: the above code shows that the clustered index record is not requested for a delete-marked secondary index record.

            Consider case (2). To repeat the issue there must be two records with the same unique fields, where the first record (in index order) is non-delete-marked and the second record is delete-marked.
            Some transaction must hold a lock on the delete-marked record, and some other transaction must hold a lock on the non-delete-marked record, then create a waiting lock and suspend on the delete-marked record. In any case the first record will always be locked before the second record, so any transaction with a conflicting lock will be blocked on the first record. The locks are released on transaction commit, and during this process the lock hash table cell is latched, which means any other transaction will wait until both locks are released. So there can't be a situation where some other transaction has acquired the first record's lock and then blocked on the second record's lock (letting purge remove the record stored in the persistent cursor). Thus case (2) is impossible too.

            Summary

            The issue caused by my changes does not exist for locking reads. It does exist for non-locking reads, but for non-locking reads my changes don't make sense, so we can just switch them off for this case. We can also prevent the mtr commit in the case when we have to read a clustered index record during a secondary index search, and unlatch only the clustered index pages, as marko suggested; it's safe because the clustered index pages are S-latched in this case (see MDEV-27557).
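            marko's proposal can be pictured as a savepoint over a latch stack. The MiniMtr class and its method names below are hypothetical stand-ins for the real mtr_t API, just to show the idea: latches taken after the savepoint can be dropped while the secondary index page stays latched, so purge cannot remove the record under the stored cursor.

                ```python
                # Toy latch stack illustrating the MDEV-27557 idea (not the real mtr_t).

                class MiniMtr:
                    def __init__(self):
                        self.latches = []
                    def s_latch(self, page):
                        self.latches.append(page)
                    def get_savepoint(self):
                        return len(self.latches)        # remember the stack depth
                    def release_s_latches_at_savepoint(self, savepoint):
                        del self.latches[savepoint:]    # drop only newer latches

                mtr = MiniMtr()
                mtr.s_latch("secondary index leaf")
                sp = mtr.get_savepoint()
                mtr.s_latch("clustered index leaf")     # extra clustered-index latch
                # Instead of mtr.commit(), which would drop every latch, release only
                # the latches taken after the savepoint:
                mtr.release_s_latches_at_savepoint(sp)
                print(mtr.latches)   # ['secondary index leaf']
                ```

            Because the secondary index page never becomes unlatched, the store/restore of the persistent cursor is no longer needed at this point and the race window disappears.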
            vlad.lesin Vladislav Lesin made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]
            vlad.lesin Vladislav Lesin made changes -
            Assignee Vladislav Lesin [ vlad.lesin ] Matthias Leich [ mleich ]

            mleich Matthias Leich added a comment -

            origin/bb-10.5-MDEV-20605-cur-pos-fix 9f032f03fa8fd674c9b63401a12aed144ad3aa34 2022-02-10T21:52:50+03:00
            performed well in RQG testing.
            The issues observed occur on main source trees too.
            mleich Matthias Leich made changes -
            Assignee Matthias Leich [ mleich ] Vladislav Lesin [ vlad.lesin ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            vlad.lesin Vladislav Lesin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            vlad.lesin Vladislav Lesin made changes -
            Summary With optimistic and aggressive modes of parallel replication, some replicated statements have no effect Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration
            vlad.lesin Vladislav Lesin made changes -
            Component/s Replication [ 10100 ]

            marko Marko Mäkelä added a comment -

            OK to push. The MDEV-27557 optimization can be implemented later.
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.5.16 [ 27508 ]
            Fix Version/s 10.6.8 [ 27506 ]
            Fix Version/s 10.7.4 [ 27504 ]
            Fix Version/s 10.8.3 [ 27502 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.3.35 [ 27512 ]
            Fix Version/s 10.4.25 [ 27510 ]

            vlad.lesin Vladislav Lesin added a comment -

            The branch bb-10.2-MDEV-27025-MDEV-20605 is for 10.2-based custom builds.
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 136328 160775 151737
            Zendesk active tickets 201658
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-20605-mysql.test [ 74058 ]

            vlad.lesin Vladislav Lesin added a comment -

            I've just attached a modified test for MySQL (it's still not fixed in MySQL 9.0.1): MDEV-20605-mysql.test.

            People

              vlad.lesin Vladislav Lesin
              GeoffMontee Geoff Montee (Inactive)
              Votes:
              5 Vote for this issue
              Watchers:
              21 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:
