MariaDB Server / MDEV-25036

Use the bulk insert optimization for multiple INSERT statements

Details

    Description

      After MDEV-515, InnoDB can handle bulk inserts better, but the optimization still only applies to the first INSERT into an empty table. It works well for INSERT...SELECT and LOAD DATA; however, mysqldump-generated dumps contain a series of INSERT statements, and that usage is still poorly optimized.

      The problem is that in bulk insert mode InnoDB cannot roll back individual statements; all the work done in bulk insert mode is atomic as a whole.

      There should either be a way to express this in SQL, or a guarantee that no statement-level rollbacks will be needed.

      Another complication is that in bulk insert mode InnoDB locks the whole table rather than individual rows.

      A typical SQL dump looks like this:

      /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
      /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
      LOCK TABLES `t1` WRITE;
      /*!40000 ALTER TABLE `t1` DISABLE KEYS */;
      INSERT INTO `t1` VALUES (...), (...), ...
      /*!40000 ALTER TABLE `t1` ENABLE KEYS */;
      UNLOCK TABLES;
      /*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
      /*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
      

      It seems that with UNIQUE_CHECKS=0 and FOREIGN_KEY_CHECKS=0 there should be no rollbacks, although with CHECK constraints there still could be. And as the inserts are done under LOCK TABLES, InnoDB's fine-grained row-level locking is wasted here. So it seems possible to get MDEV-515 to work with no syntax extensions in the specific case of mysqldump output.
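      To illustrate, here is a minimal sketch of the mysqldump special case, assuming InnoDB were taught to treat consecutive INSERTs under these settings as one bulk load. This is the proposal, not current MariaDB behavior, and the table name is illustrative:

```sql
-- Hypothetical: not current MariaDB behavior. With both checks disabled
-- and the table write-locked, consecutive INSERTs into an empty table
-- could be treated as a single atomic bulk load.
SET UNIQUE_CHECKS=0, FOREIGN_KEY_CHECKS=0;
LOCK TABLES t1 WRITE;
INSERT INTO t1 VALUES (1,'a'),(2,'b');
INSERT INTO t1 VALUES (3,'c'),(4,'d');  -- continues the same bulk load
UNLOCK TABLES;                          -- the bulk load would end here
```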


      To make it work in the general case, we will need new syntax that means "this block of work can only be rolled back as a whole".

      A standard approach would be something like

      BEGIN ATOMIC;
      DECLARE UNDO HANDLER FOR SQLEXCEPTION RESIGNAL;
      

      which means pretty much exactly what we want to do. But the syntax is, perhaps, too awkward?
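      As a sketch of how the standard syntax might look if it were supported here (BEGIN ATOMIC ... END is standard SQL/PSM; MariaDB does not currently accept it in this role):

```sql
-- Hypothetical sketch of standard SQL/PSM syntax; not accepted by
-- MariaDB for this purpose today. Any exception would undo the whole
-- block and be re-raised, matching InnoDB's all-or-nothing bulk
-- insert semantics.
BEGIN ATOMIC
  DECLARE UNDO HANDLER FOR SQLEXCEPTION RESIGNAL;
  INSERT INTO t1 VALUES (1,'a'),(2,'b');
  INSERT INTO t1 VALUES (3,'c'),(4,'d');
END;
```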

      Non-standard suggestions, copied from MDEV-24818:

      START TRANSACTION ATOMIC;
      --
      START TRANSACTION WITH ATOMIC ROLLBACK;
      --
      START TRANSACTION FOR BULK LOAD;
      

      Activity

            The setting unique_checks=0 only has an effect on the InnoDB change buffer. If InnoDB decides to buffer an insert into a unique secondary index (with the default unique_checks=1 that can never even be attempted), then a unique key violation would go unnoticed until some later point in time when the buffered change is actually merged and the corruption of the index might be noticed, possibly leading to a crash of the server, or some funny symptoms like MDEV-9663. I think that for exactly this reason, unique_checks should have been deprecated and removed years ago, or at least ignored by InnoDB.

            My guess is that the motivation of the setting unique_checks=0 was to allow change buffering to speed up large data loads, in case the user really knows that no duplicate keys can occur. A much better speedup would be provided by MDEV-24621.

            That said, InnoDB could theoretically interpret the combination of unique_checks=0 and foreign_key_checks=0 to enable the bulk insert optimization. After all, the user swore that there cannot be any errors, hence we should be free to do what we like if those errors are present.

            Comment by Marko Mäkelä (marko)

            thiru pointed out that if MDEV-24621 is to support multiple INSERT statements per transaction (which is currently possible thanks to the MDEV-24818 hack of SET unique_checks=0, foreign_key_checks=0), we would have to report duplicate key errors after the INSERT statement has already been executed, for example, during a subsequent non-INSERT statement. Because reporting a duplicate key error for a previous statement could be extremely confusing, I think it would be easiest to allow the multi-statement INSERT operation of MDEV-24621 only when the transaction has been started in a special mode.
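            An illustration of the confusion this deferred check would cause (hypothetical behavior; exactly which later statement would surface the error is unspecified):

```sql
-- Hypothetical: duplicate key checking deferred until after the INSERT.
SET unique_checks=0, foreign_key_checks=0;
BEGIN;
INSERT INTO t VALUES (1),(2);
INSERT INTO t VALUES (2),(3);  -- duplicate silently accepted for now
SELECT COUNT(*) FROM t;        -- a duplicate-key error could be raised
                               -- here, long after the offending INSERT
```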

            Comment by Marko Mäkelä (marko)

            When this is being implemented, please also improve the cryptic error message currently seen when executing:

            SET autocommit=0,foreign_key_checks=0,unique_checks=0;
            CREATE TABLE t (c1 INT KEY,c2 INT UNIQUE) ENGINE=InnoDB;
            BEGIN;
            INSERT INTO t VALUES (1,0),(2,0);
            

            Leading to:

            10.8.0 ccb345e2a3616590ea741830ded1b7f645639de0 (Optimized)

            10.8.0-opt>SET autocommit=0,foreign_key_checks=0,unique_checks=0;
            Query OK, 0 rows affected (0.000 sec)
             
            10.8.0-opt>CREATE TABLE t (c1 INT KEY,c2 INT UNIQUE) ENGINE=InnoDB;
            Query OK, 0 rows affected (0.018 sec)
             
            10.8.0-opt>BEGIN;
            Query OK, 0 rows affected (0.000 sec)
             
            10.8.0-opt>INSERT INTO t VALUES (1,0),(2,0);
            ERROR 1180 (HY000): Got error 1 "Operation not permitted" during COMMIT
            

            History: see MDEV-26947

            Comment by Roel Van de Paar (Roel)

            We want something that is trivial to use and highly compatible with old applications.
            There is no benefit for the user (or the server) in setting unique_checks=0.

            • Users do not want to disable unique checks.
            • There is almost zero cost to checking for duplicate keys as part of creating a unique index.
              (MyISAM has done that for 40 years...)

            I suggest the following 'easy solution' that would allow new users and old applications to benefit
            from the new optimization:

            • Add an option to START TRANSACTION, such as ATOMIC, that enables the optimization for empty tables.
            • Add a session variable 'innodb-optimize-insert-into-empty-table' (or similar) which enables the optimization of inserts into empty tables.
            • The optimization should happen if either of the two options is used.

            I would also like to suggest that the first auto-commit insert into a table should also use this optimization.
            This would make CREATE TABLE; LOAD DATA INFILE fast!

            I would expect that most users would use innodb-optimize-insert-into-empty-table=1 because of its great benefits, in spite of
            the problem outlined in MDEV-24818 (Concurrent use of an InnoDB table is impossible until the first transaction is finished).
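            The two proposed interfaces could be used like this (names are the suggestions from this comment; neither exists in MariaDB at the time of writing):

```sql
-- Sketch of the proposals above; the ATOMIC option and the variable
-- are hypothetical, not existing syntax.
SET SESSION innodb_optimize_insert_into_empty_table=1;  -- proposed variable
-- or, per transaction:
START TRANSACTION ATOMIC;                               -- proposed option
INSERT INTO t1 VALUES (1,'a');
INSERT INTO t1 VALUES (2,'b');  -- still part of the same bulk load
COMMIT;
```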

            Comment by Michael Widenius (monty)

            People

              Assignee: Unassigned
              Reporter: Sergei Golubchik (serg)