Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22760

Bulk INSERT...ON DUPLICATE KEY UPDATE updates only a fraction of rows

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.4.13, 10.4(EOL), 10.5
    • 10.5
    • Server
    • None
    • Windows 10 64-bit

    Description

      Consider the following table:

      CREATE TABLE `demo_upsert` (
          `ID` BIGINT(20) NOT NULL AUTO_INCREMENT,
          `Url` TEXT NOT NULL COLLATE 'utf8mb4_unicode_ci',
          `LastListID` BIGINT(20) NULL DEFAULT NULL,
          `Statistics` BIGINT(20) NOT NULL DEFAULT '0',
          PRIMARY KEY (`ID`) USING BTREE,
          UNIQUE INDEX `Url` (`Url`) USING HASH
      )
      COLLATE='utf8mb4_unicode_ci'
      ENGINE=InnoDB
      ;
      

      When running the following program:

      #include <cassert>
      #include <string>
      #include <random>
      #include <iostream>
      #include "mariadb/mysql.h"
      #include "Secret.h" // for database credentials
      using namespace std;
       
      string randStr(const int len, mt19937_64& rng) {
        string ans;
        for (int i = 0; i < len; i++) {
          ans += ('a' + rng() % 26);
        }
        return ans;
      }
       
      int main() {
        MYSQL mysql;
        mysql_library_init(0, nullptr, nullptr);
        mysql_thread_init();
        MYSQL* pmsRet = mysql_init(&mysql);
        assert(pmsRet);
        pmsRet = mysql_real_connect(&mysql, ConnProp::_cpHost, ConnProp::_cpUser, ConnProp::_cpPasswd,
          ConnProp::_cpDB, ConnProp::_cpPort, nullptr, CLIENT_COMPRESS | CLIENT_LOCAL_FILES | CLIENT_MULTI_STATEMENTS
          | CLIENT_MULTI_RESULTS);
        assert(pmsRet);
        const int lastListID = 10;
        mt19937_64 rng;
        for (int i = 0; i < 3; i++) {
          string query;
          query = "INSERT INTO demo_upsert (Url, LastListID) VALUES ";
          for (int j = 0; j < 15000; j++) {
            if (j != 0) {
              query += ", ";
            }
            query += "('" + randStr(3, rng) + "', " + to_string(lastListID) + ")";
          }
          query += " ON DUPLICATE KEY UPDATE LastListID=" + to_string(lastListID);
          int iRet = mysql_real_query(&mysql, query.c_str(), query.size());
          assert(iRet == 0);
          int64_t nAffected = mysql_affected_rows(&mysql);
          cout << nAffected << ' ';
          query = "UPDATE demo_upsert SET LastListID=NULL";
          iRet = mysql_real_query(&mysql, query.c_str(), query.size());
          assert(iRet == 0);
          nAffected = mysql_affected_rows(&mysql);
          cout << nAffected << endl;
        }
        return 0;
      }
      

      I get for example the following output:
      10130 10130
      11627 5814
      11620 5811

      It indicates the following:
      Initially, when the table is empty, bulk INSERT...ON DUPLICATE KEY UPDATE seems to insert correctly 10130 distinct values (out of 15000 non-distinct) because the following UPDATE query sets exactly that amount to NULL.
      However, the next bulk INSERT...ON DUPLICATE KEY UPDATE sets LastListID to non-null for only 5814 distinct rows out of ~10K expected distinct rows out of 15K non-distinct. We infer this because the following UPDATE reports affected only 5814 rows, meaning that the previous INSERT...ON DUPLICATE KEY UPDATE called UPDATE only for that number of rows.

      I cannot reproduce the problem from HeidiSQL 11.0.0.5989. I could only reproduce it when using Connector/C versions 3.1.7 and 3.1.8 (I didn't try the older versions). The problem appears for both a direct and prepared statement.

      MariaDB server versions tried: 10.4.13 and 10.4.12.

      Attachments

        Issue Links

          Activity

            georg Georg Richter added a comment - - edited

            Thanks for your bug report.

            Can you please also report the numbers HeidiSQL returns and in which sql mode HeidiSQL is running (select @@sql_mode) ?

            georg Georg Richter added a comment - - edited Thanks for your bug report. Can you please also report the numbers HeidiSQL returns and in which sql mode HeidiSQL is running (select @@sql_mode) ?
            rserge Serge Rogatch added a comment -

            Thank you for starting to look into this quickly. I'm going to do the above experiments right now and will post here afterward.

            rserge Serge Rogatch added a comment - Thank you for starting to look into this quickly. I'm going to do the above experiments right now and will post here afterward.
            rserge Serge Rogatch added a comment -

            When I just dump the queries into a SQL file and then copy-paste the file into HeidiSQL and run the 6 queries, I get the following output:
            /* Affected rows: 55,132 Found rows: 0 Warnings: 0 Duration for 6 queries: 1.140 sec. */

            The updated program and the script are attached.
            MariaDBBugInsertUpdate.cpp QueryLog.txt

            Below is the output for select @@sql_mode in HeidiSQL:
            STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

            When I run the queries one by one, I get:
            INSERT #1: /* Affected rows: 10,130 Found rows: 0 Warnings: 0 Duration for 1 query: 0.594 sec. */
            UPDATE #1: /* Affected rows: 10,130 Found rows: 0 Warnings: 0 Duration for 1 query: 0.110 sec. */
            INSERT #2: /* Affected rows: 11,627 Found rows: 0 Warnings: 0 Duration for 1 query: 0.235 sec. */
            UPDATE #2: /* Affected rows: 5,814 Found rows: 0 Warnings: 0 Duration for 1 query: 0.093 sec. */
            INSERT #3: /* Affected rows: 11,620 Found rows: 0 Warnings: 0 Duration for 1 query: 0.188 sec. */
            UPDATE #3: /* Affected rows: 5,811 Found rows: 0 Warnings: 0 Duration for 1 query: 0.062 sec. */

            So apparently HeidiSQL behaves the same way when the queries are run one by one. Sorry for misaddressing the bug report. Now I tend to think that the error is in the core DB server. Is there a way to migrate the issue to the proper project?

            rserge Serge Rogatch added a comment - When I just dump the queries into a SQL file and then copy-paste the file into HeidiSQL and run the 6 queries, I get the following output: /* Affected rows: 55,132 Found rows: 0 Warnings: 0 Duration for 6 queries: 1.140 sec. */ The updated program and the script are attached. MariaDBBugInsertUpdate.cpp QueryLog.txt Below is the output for select @@sql_mode in HeidiSQL: STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION When I run the queries one by one, I get: INSERT #1: /* Affected rows: 10,130 Found rows: 0 Warnings: 0 Duration for 1 query: 0.594 sec. */ UPDATE #1: /* Affected rows: 10,130 Found rows: 0 Warnings: 0 Duration for 1 query: 0.110 sec. */ INSERT #2: /* Affected rows: 11,627 Found rows: 0 Warnings: 0 Duration for 1 query: 0.235 sec. */ UPDATE #2: /* Affected rows: 5,814 Found rows: 0 Warnings: 0 Duration for 1 query: 0.093 sec. */ INSERT #3: /* Affected rows: 11,620 Found rows: 0 Warnings: 0 Duration for 1 query: 0.188 sec. */ UPDATE #3: /* Affected rows: 5,811 Found rows: 0 Warnings: 0 Duration for 1 query: 0.062 sec. */ So apparently HeidiSQL behaves the same way when the queries are run one by one. Sorry for misaddressing the bug report. Now I tend to think that the error is in the core DB server. Is there a way to migrate the issue to the proper project?
            georg Georg Richter added a comment -

            changed to server bug (MDEV)

            georg Georg Richter added a comment - changed to server bug (MDEV)

            Thanks for the report and test case.

            The culprit here is the long unique index on blob (text) column.
            For a workaround, you could use a prefixed unique key instead of the long blob, e.g.

            CREATE TABLE `demo_upsert` (
                `ID` BIGINT(20) NOT NULL AUTO_INCREMENT,
                `Url` TEXT NOT NULL COLLATE 'utf8mb4_unicode_ci',
                `LastListID` BIGINT(20) NULL DEFAULT NULL,
                `Statistics` BIGINT(20) NOT NULL DEFAULT '0',
                PRIMARY KEY (`ID`) USING BTREE,
                UNIQUE INDEX `Url` (`Url`(20))
            )
            COLLATE='utf8mb4_unicode_ci'
            ENGINE=InnoDB
            ;
            

            (or whatever reasonable length suits your actual data).

            The actual problem here appears to be not as much that existing values don't get updated, but mainly that new values don't get inserted. It is not easy to infer from the behavior of UPDATE.

            Let's take the 1st and 2nd INSERT/UPDATE from your SQL an insert them into separate identical tables. demo_upsert1 below contains results of the 1st INSERT, and demo_upsert2 – results of the 2nd INSERT.

            MariaDB [test]> select count(*) from demo_upsert1;
            +----------+
            | count(*) |
            +----------+
            |    10130 |
            +----------+
            1 row in set (0.032 sec)
             
            MariaDB [test]> select count(*) from demo_upsert2;
            +----------+
            | count(*) |
            +----------+
            |    10014 |
            +----------+
            1 row in set (0.032 sec)
             
            MariaDB [test]> select count(*) from (select Url from demo_upsert1 intersect select Url from demo_upsert2) sq;
            +----------+
            | count(*) |
            +----------+
            |     5817 |
            +----------+
            1 row in set (1.157 sec)
             
            MariaDB [test]> select count(*) from (select Url from demo_upsert2 except select Url from demo_upsert1) sq;
            +----------+
            | count(*) |
            +----------+
            |     4197 |
            +----------+
            1 row in set (1.119 sec)
            

            So, the 1st INSERT is supposed to insert 10130 rows, which it does.
            Then the 1st UPDATE is supposed to update all 10130 rows from 10 to NULL, which it also does.
            The 2nd INSERT is supposed to update 5817 duplicate rows, and insert 4197 ones.

            For new 4197 rows the value of LastListID would be inserted as 10, for 5817 duplicates it will be updated from NULL to 10, but there would also be 10130-5817=4313 rows left intact with LastListID NULL. Thus, just from the behavior of UPDATE you can't say whether it updates a fraction of values because the previous upsert didn't update existing ones or didn't insert new ones.

            Here is a simplified test case where we can see the actual values. The structures and values are taken from the original one, I just reduced the number of rows.

            --source include/have_innodb.inc
             
            DROP TABLE IF EXISTS demo_upsert;
             
            CREATE TABLE `demo_upsert` (
                `ID` BIGINT(20) NOT NULL AUTO_INCREMENT,
                `Url` TEXT,
                `LastListID` BIGINT(20) NULL DEFAULT NULL,
                PRIMARY KEY (`ID`) USING BTREE,
                UNIQUE INDEX `Url` (`Url`)
            ) ENGINE=InnoDB;
             
            INSERT INTO demo_upsert (Url) VALUES ('iaa'),('qlf');
             
            SELECT * FROM demo_upsert;
             
            INSERT INTO demo_upsert (Url) VALUES
              ('knx'),
              ('qlf'), # This is a duplicate value
              ('bdu')
            ON DUPLICATE KEY UPDATE LastListID=10;
             
            SELECT * FROM demo_upsert;
             
            # Cleanup
            DROP TABLE demo_upsert;
            

            The first SELECT returns the expected two rows which were just inserted:

            MariaDB [test]> SELECT * FROM demo_upsert;
            +----+------+------------+
            | ID | Url  | LastListID |
            +----+------+------------+
            |  1 | iaa  |       NULL |
            |  2 | qlf  |       NULL |
            +----+------+------------+
            2 rows in set (0.000 sec)
            

            The second INSERT is supposed to update one row (qlf) and insert two new ones, knx and bdu.

            MariaDB [test]> INSERT INTO demo_upsert (Url) VALUES
                ->   ('knx'),
                ->   ('qlf'), # This is a duplicate value
                ->   ('bdu')
                -> ON DUPLICATE KEY UPDATE LastListID=10;
            Query OK, 5 rows affected (0.048 sec)
            Records: 3  Duplicates: 2  Warnings: 0
            

            Stats are already wrong. There is only 1 duplicate, not 2, and the total number of affected rows should be 4 (because upserted rows are counted twice), but not 5.
            And the following SELECT shows that the contents is wrong as well:

            MariaDB [test]> SELECT * FROM demo_upsert;
            +----+------+------------+
            | ID | Url  | LastListID |
            +----+------+------------+
            |  1 | iaa  |       NULL |
            |  2 | qlf  |         10 |
            |  3 | knx  |         10 |
            +----+------+------------+
            3 rows in set (0.000 sec)
            

            The duplicate row did get updated, but out of 2 new ones only one got inserted.

            elenst Elena Stepanova added a comment - Thanks for the report and test case. The culprit here is the long unique index on blob (text) column. For a workaround, you could use a prefixed unique key instead of the long blob, e.g. CREATE TABLE `demo_upsert` ( `ID` BIGINT (20) NOT NULL AUTO_INCREMENT, `Url` TEXT NOT NULL COLLATE 'utf8mb4_unicode_ci' , `LastListID` BIGINT (20) NULL DEFAULT NULL , ` Statistics ` BIGINT (20) NOT NULL DEFAULT '0' , PRIMARY KEY (`ID`) USING BTREE, UNIQUE INDEX `Url` (`Url`(20)) ) COLLATE = 'utf8mb4_unicode_ci' ENGINE=InnoDB ; (or whatever reasonable length suits your actual data). The actual problem here appears to be not as much that existing values don't get updated , but mainly that new values don't get inserted . It is not easy to infer from the behavior of UPDATE. Let's take the 1st and 2nd INSERT/UPDATE from your SQL an insert them into separate identical tables. demo_upsert1 below contains results of the 1st INSERT, and demo_upsert2 – results of the 2nd INSERT. MariaDB [test]> select count (*) from demo_upsert1; + ----------+ | count (*) | + ----------+ | 10130 | + ----------+ 1 row in set (0.032 sec)   MariaDB [test]> select count (*) from demo_upsert2; + ----------+ | count (*) | + ----------+ | 10014 | + ----------+ 1 row in set (0.032 sec)   MariaDB [test]> select count (*) from ( select Url from demo_upsert1 intersect select Url from demo_upsert2) sq; + ----------+ | count (*) | + ----------+ | 5817 | + ----------+ 1 row in set (1.157 sec)   MariaDB [test]> select count (*) from ( select Url from demo_upsert2 except select Url from demo_upsert1) sq; + ----------+ | count (*) | + ----------+ | 4197 | + ----------+ 1 row in set (1.119 sec) So, the 1st INSERT is supposed to insert 10130 rows, which it does. Then the 1st UPDATE is supposed to update all 10130 rows from 10 to NULL, which it also does. The 2nd INSERT is supposed to update 5817 duplicate rows, and insert 4197 ones. For new 4197 rows the value of LastListID would be inserted as 10 , for 5817 duplicates it will be updated from NULL to 10, but there would also be 10130-5817=4313 rows left intact with LastListID NULL. Thus, just from the behavior of UPDATE you can't say whether it updates a fraction of values because the previous upsert didn't update existing ones or didn't insert new ones. Here is a simplified test case where we can see the actual values. The structures and values are taken from the original one, I just reduced the number of rows. --source include/have_innodb.inc   DROP TABLE IF EXISTS demo_upsert;   CREATE TABLE `demo_upsert` ( `ID` BIGINT (20) NOT NULL AUTO_INCREMENT, `Url` TEXT, `LastListID` BIGINT (20) NULL DEFAULT NULL , PRIMARY KEY (`ID`) USING BTREE, UNIQUE INDEX `Url` (`Url`) ) ENGINE=InnoDB;   INSERT INTO demo_upsert (Url) VALUES ( 'iaa' ),( 'qlf' );   SELECT * FROM demo_upsert;   INSERT INTO demo_upsert (Url) VALUES ( 'knx' ), ( 'qlf' ), # This is a duplicate value ( 'bdu' ) ON DUPLICATE KEY UPDATE LastListID=10;   SELECT * FROM demo_upsert;   # Cleanup DROP TABLE demo_upsert; The first SELECT returns the expected two rows which were just inserted: MariaDB [test]> SELECT * FROM demo_upsert; + ----+------+------------+ | ID | Url | LastListID | + ----+------+------------+ | 1 | iaa | NULL | | 2 | qlf | NULL | + ----+------+------------+ 2 rows in set (0.000 sec) The second INSERT is supposed to update one row ( qlf ) and insert two new ones, knx and bdu . MariaDB [test]> INSERT INTO demo_upsert (Url) VALUES -> ( 'knx' ), -> ( 'qlf' ), # This is a duplicate value -> ( 'bdu' ) -> ON DUPLICATE KEY UPDATE LastListID=10; Query OK, 5 rows affected (0.048 sec) Records: 3 Duplicates: 2 Warnings: 0 Stats are already wrong. There is only 1 duplicate, not 2, and the total number of affected rows should be 4 (because upserted rows are counted twice), but not 5. And the following SELECT shows that the contents is wrong as well: MariaDB [test]> SELECT * FROM demo_upsert; + ----+------+------------+ | ID | Url | LastListID | + ----+------+------------+ | 1 | iaa | NULL | | 2 | qlf | 10 | | 3 | knx | 10 | + ----+------+------------+ 3 rows in set (0.000 sec) The duplicate row did get updated, but out of 2 new ones only one got inserted.
            rserge Serge Rogatch added a comment -

            Thank you for looking into our issue.

            rserge Serge Rogatch added a comment - Thank you for looking into our issue.
            stevenatdbaid steven chang added a comment -

            Guys,

            To my understanding, INSERT...ON DUPLICATE KEY UPDATE can only be applied tables with only one unique key (either pk or uk) . Your case have 2 .

            stevenatdbaid steven chang added a comment - Guys, To my understanding, INSERT...ON DUPLICATE KEY UPDATE can only be applied tables with only one unique key (either pk or uk) . Your case have 2 .

            People

              serg Sergei Golubchik
              rserge Serge Rogatch
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.