[MDEV-11760] Weird duplicate entry on insert Created: 2017-01-10  Updated: 2017-01-11  Resolved: 2017-01-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5.50, 10.1.20
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Andy Salnikov Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: need_feedback
Environment:

CentOS7 x86_64



 Description   

I'm getting occasional weird errors trying bulk insert into InnoDB table with a composite primary key. By bulk insert I mean "INSERT INTO TABLE (columns) VALUES (tuple1), (tuple2), ...". The number of tuples in a statement is quite large, in the order of 10k. Most of the time it works fine but occasionally I get errors like

sqlalchemy.exc.IntegrityError: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry '1017678102-2020-01-05 02:54:45' for key 'PRIMARY'") [SQL: 'INSERT INTO "DiaObject" ...

(I run data loading from Python using sqlalchemy).

The schema for a table looks like:

CREATE TABLE "DiaObject" (
  "diaObjectId" bigint(20) NOT NULL AUTO_INCREMENT,
  "validityStart" timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  "validityEnd" timestamp NULL DEFAULT NULL,
  "lastNonForcedSource" timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  "ra" double NOT NULL,
  "raSigma" float NOT NULL,
  -- ~50 more columns
  "flags" bigint(20) NOT NULL,
  "htmId20" bigint(20) NOT NULL,
  PRIMARY KEY ("diaObjectId","validityStart"),
  KEY "IDX_DiaObject_validityStart" ("validityStart"),
  KEY "IDX_DiaObject_htmId20" ("htmId20")
) ENGINE=InnoDB

The validityStart is unique for each INSERT statement. sqlalchemy dumps the whole INSERT statement with the exception and I do not see a repeating diaObjectId there, so I am 100% sure there cannot be primary conflict in this case. And if I re-run the query which fails using command-line tool the INSERT works just fine.

I saw this first with CentOS7-standard 5.5.50 mariadb, later I upgraded from mariadb repo to current 10.1.20, but the issue still remains. I also saw the same error with a different table which is also InnoDB with a 2-column composite PK.

Did anyone see anything similar before? I can post more details if needed, I have complete query and I have a huge query log file taken with 5.5.50.



 Comments   
Comment by Andy Salnikov [ 2017-01-10 ]

One more observation that I had looking at those multiple failures. Our data has two sets of possible diaObjectId values, one is in the range 0-22M, second set is 1 billion and above. The INSERT statement has tuples ordered by diaObjectId so INSERT looks like:

INSERT INTO "DiaObject" ("diaObjectId","validityStart","validityEnd", ..., ,"flags","htmId20")
  VALUES 
(0,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',180.5557152710876,-3.545548360056213,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,10996812867709),
(497,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',183.0533180237157,-3.386658124787378,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,11011473617183),
(3456,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',182.84962581849368,-4.085850996191277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,11011292570412),
-- many more tuples
(21438550,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',181.6881834221752,-3.725247734858977,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,10996656988967),
(21438712,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',183.54836706566272,-2.6828919969996843,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,11009581270647),
-- start of the IDs 1 billion and above, this is the ID which causes mariadb error
(1017678102,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',183.5325548818888,-2.3912437851373647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,11009566420519),
(1017678103,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',182.19123817035205,-2.024478683765525,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,10999310848994),
-- many more tuples

In all cases of failure that I saw so far mariadb complains about first diaObjectId which is grater than 1 billion. Also in all observed cases the first tuple has diaObjectId=0. Not sure if this can be a hint to anything.

Comment by Elena Stepanova [ 2017-01-11 ]

Great observations in the last comment, it helps really a lot.
Given them, and unless you have NO_AUTO_VALUE_ON_ZERO which I guess you would have mentioned, the result looks quite natural. Consider the following sequence of events.

First, we create a new empty table.

CREATE TABLE DiaObject (
  diaObjectId bigint(20) NOT NULL AUTO_INCREMENT,
  validityStart timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  validityEnd timestamp NULL DEFAULT NULL,
  lastNonForcedSource timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ra double NOT NULL,
  raSigma float NOT NULL,
  flags bigint(20) NOT NULL,
  htmId20 bigint(20) NOT NULL,
  PRIMARY KEY (diaObjectId,validityStart),
  KEY IDX_DiaObject_validityStart (validityStart),
  KEY IDX_DiaObject_htmId20 (htmId20)
) ENGINE=InnoDB;

Run first INSERT:

INSERT INTO DiaObject (diaObjectId,validityStart,validityEnd,lastNonForcedSource,ra,raSigma ,flags,htmId20)  VALUES 
(497,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',183.0533180237157,-3.386658124787378,0,11011473617183),
(1017678102,'2020-01-05 02:54:45',NULL,'2020-01-05 02:54:45',183.5325548818888,-2.3912437851373647,0,11009566420519);

Now the AUTO_INCREMENT attribute for the table is max(diaObjectId)+1, that's important.

CREATE TABLE `DiaObject` (
  `diaObjectId` bigint(20) NOT NULL AUTO_INCREMENT,
  `validityStart` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `validityEnd` timestamp NULL DEFAULT NULL,
  `lastNonForcedSource` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `ra` double NOT NULL,
  `raSigma` float NOT NULL,
  `flags` bigint(20) NOT NULL,
  `htmId20` bigint(20) NOT NULL,
  PRIMARY KEY (`diaObjectId`,`validityStart`),
  KEY `IDX_DiaObject_validityStart` (`validityStart`),
  KEY `IDX_DiaObject_htmId20` (`htmId20`)
) ENGINE=InnoDB AUTO_INCREMENT=1017678103 DEFAULT CHARSET=latin1

Then we run the INSERT of the pattern given above:

INSERT INTO DiaObject (diaObjectId,validityStart,validityEnd,lastNonForcedSource,ra,raSigma ,flags,htmId20)  VALUES 
(0,'2020-01-05 03:54:45',NULL,'2020-01-05 02:54:45',180.5557152710876,-3.545548360056213,0,10996812867709),
(500,'2020-01-05 03:54:45',NULL,'2020-01-05 02:54:45',183.0533180237157,-3.386658124787378,0,11011473617183),
(1017678103,'2020-01-05 03:54:45',NULL,'2020-01-05 02:54:45',182.19123817035205,-2.024478683765525,0,10999310848994);

The first row with zero uses the AUTO_INCREMENT column and thus inserts the PK 1017678103-2020-01-05 03:54:45.
The second row is inserted normally with given values.
But the third row has the explicit PK 1017678103-2020-01-05 03:54:45, so it fails.

Comment by Andy Salnikov [ 2017-01-11 ]

Nice, thanks for explanation! Although the behavior seems a bit confusing to me as I would expect AUTO_INCREMENT to apply only when value is NULL (or missing) and not 0, but my expectations are probably be wrong. I guess the reason why I see this is AUTO_INCREMENT attribute for diaObjectId column, so I need to figure out why sqlalchemy decided to add that attribute to the column and disable that.

Thanks again,
Andy

Comment by Elena Stepanova [ 2017-01-11 ]

Your expectations aren't really wrong, it can be both ways, that's why the above mentioned sql_mode=NO_AUTO_VALUE_ON_ZERO exists. It's not default, but if you want zeros to remain zeros, you can set it in your config file. Here is the difference:

MariaDB [test]> create table t1 (pk int auto_increment primary key);
Query OK, 0 rows affected (0.40 sec)
 
MariaDB [test]> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> insert into t1 values (0);
Query OK, 1 row affected (0.04 sec)
 
MariaDB [test]> insert into t1 values (0);
Query OK, 1 row affected (0.05 sec)
 
MariaDB [test]> select * from t1;
+----+
| pk |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)

MariaDB [test]> set sql_mode='NO_AUTO_VALUE_ON_ZERO';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> insert into t1 values (0);
Query OK, 1 row affected (0.05 sec)
 
MariaDB [test]> select * from t1;
+----+
| pk |
+----+
|  0 |
|  1 |
|  2 |
+----+
3 rows in set (0.00 sec)
 
MariaDB [test]> insert into t1 values (0);
ERROR 1062 (23000): Duplicate entry '0' for key 'PRIMARY'

Generated at Thu Feb 08 07:52:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.