[MDEV-11760] Weird duplicate entry on insert Created: 2017-01-10 Updated: 2017-01-11 Resolved: 2017-01-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 5.5.50, 10.1.20 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Andy Salnikov | Assignee: | Unassigned |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | need_feedback | ||
| Environment: |
CentOS7 x86_64 |
||
| Description |
|
I'm getting occasional weird errors trying bulk insert into InnoDB table with a composite primary key. By bulk insert I mean "INSERT INTO TABLE (columns) VALUES (tuple1), (tuple2), ...". The number of tuples in a statement is quite large, in the order of 10k. Most of the time it works fine but occasionally I get errors like
(I run data loading from Python using sqlalchemy). The schema for a table looks like:
The validityStart is unique for each INSERT statement. sqlalchemy dumps the whole INSERT statement with the exception and I do not see a repeating diaObjectId there, so I am 100% sure there cannot be primary conflict in this case. And if I re-run the query which fails using command-line tool the INSERT works just fine. I saw this first with CentOS7-standard 5.5.50 mariadb, later I upgraded from mariadb repo to current 10.1.20, but the issue still remains. I also saw the same error with a different table which is also InnoDB with a 2-column composite PK. Did anyone see anything similar before? I can post more details if needed, I have complete query and I have a huge query log file taken with 5.5.50. |
| Comments |
| Comment by Andy Salnikov [ 2017-01-10 ] | ||||||||||||||||||||||||||||||||||||||
|
One more observation that I had looking at those multiple failures. Our data has two sets of possible diaObjectId values, one is in the range 0-22M, second set is 1 billion and above. The INSERT statement has tuples ordered by diaObjectId so INSERT looks like:
In all cases of failure that I saw so far mariadb complains about first diaObjectId which is grater than 1 billion. Also in all observed cases the first tuple has diaObjectId=0. Not sure if this can be a hint to anything. | ||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-01-11 ] | ||||||||||||||||||||||||||||||||||||||
|
Great observations in the last comment, it helps really a lot. First, we create a new empty table.
Run first INSERT:
Now the AUTO_INCREMENT attribute for the table is max(diaObjectId)+1, that's important.
Then we run the INSERT of the pattern given above:
The first row with zero uses the AUTO_INCREMENT column and thus inserts the PK 1017678103-2020-01-05 03:54:45. | ||||||||||||||||||||||||||||||||||||||
| Comment by Andy Salnikov [ 2017-01-11 ] | ||||||||||||||||||||||||||||||||||||||
|
Nice, thanks for explanation! Although the behavior seems a bit confusing to me as I would expect AUTO_INCREMENT to apply only when value is NULL (or missing) and not 0, but my expectations are probably be wrong. I guess the reason why I see this is AUTO_INCREMENT attribute for diaObjectId column, so I need to figure out why sqlalchemy decided to add that attribute to the column and disable that. Thanks again, | ||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-01-11 ] | ||||||||||||||||||||||||||||||||||||||
|
Your expectations aren't really wrong, it can be both ways, that's why the above mentioned sql_mode=NO_AUTO_VALUE_ON_ZERO exists. It's not default, but if you want zeros to remain zeros, you can set it in your config file. Here is the difference:
|