MySQL 5.7.4 changed the behaviour of INSERT…ON DUPLICATE KEY UPDATE in the InnoDB storage engine. Upon encountering a duplicate key, it would no longer directly fall back to INSERT, but instead it would proceed to acquire an exclusive lock on every index record for the row on which the UPDATE failed.
The extra locking was motivated by a public bug report: MySQL Bug#50413 insert on duplicate key update sometimes writes binlog position incorrectly (Oracle internal BUG#11758237). The fix was followed up by a couple of regression fixes. For one user, reverting these changes significantly reduces the deadlock rate of INSERT…ON DUPLICATE KEY UPDATE.
There also is a related MySQL Bug #52020 InnoDB can still deadlock on just INSERT...ON DUPLICATE KEY. One of the factors was that when the pluggable storage engine interface was created in MySQL 5.1, the function innobase_query_is_update() was removed without replacement, and MySQL Bug #7975 (which lacked a test case) was reintroduced.
In a comment in MySQL Bug #52020 I anticipated that the deadlocks would be caused in a scenario where the INSERT phase fails, then some other transaction locks some of the index records, causing the ON DUPLICATE KEY UPDATE phase to wait for those locks or to deadlock. Acquiring the locks for all index records already in the INSERT phase would make the UPDATE phase wait-free, but it could cause more conflicts with other accesses, as the hold time of the locks is extended.
valerii posted some insightful comments on Bug #52020. I hope he can construct a test case that demonstrates the increased deadlock rate, so that we can see what can be improved here.
Sven Sandberg suggested in MySQL Bug #50413 that the INSERT phase should have acquired a gap lock, so that conflicting INSERT with that key would be prevented. I assume that he meant the PRIMARY key, because his example involves two unique keys: PRIMARY KEY(a), UNIQUE KEY(b). He also filed MySQL Bug #58637 Mark INSERT...ON DUPLICATE KEY UPDATE unsafe when there is more than one key.
Apparently the nondeterminism that the extra locking is trying to prevent is caused by the ambiguity of the ON DUPLICATE KEY syntax. It does not specify the key! An unambiguous syntax would be something like:
INSERT INTO t1 VALUES(1,2,3) ON DUPLICATE KEY(PRIMARY) UPDATE …;
|
INSERT INTO t1 VALUES(1,2,3) ON DUPLICATE KEY(u) UPDATE …;
|
Statement-based replication is obviously affected by this ambiguity.
I hope that Elkin and seppo can comment on whether row-based replication and parallel forms of replication (including Galera Cluster and MySQL 5.7 group replication) are affected, and how exactly the operations would be logged by the master and applied on the slave.
Note: Comments in MySQL Bug #50413 suggest that innodb_autoinc_lock_mode settings 0 and 1 are equivalent in this respect. I’d also like to know whether this parameter is at all relevant outside statement-based replication (that is, when innodb_autoinc_lock_mode=2 could be safe to use). With the setting 2, InnoDB does not acquire any auto-increment lock within the transaction. With the settings 0 or 1, InnoDB will hold a lock until the end of the current statement. This would suggest that the setting only matters in statement-based replication.
Galera replication exercises optimistic concurrency control, whatever happens during the transaction processing, in master node, does not really matter. At commit time, Galera populates a replication write set which will contain binlog events for the transaction and key information for modified rows (primary keys, unique keys and foreign keys). If insert succeeds, there should be write rows events, and if insert execution deviated for updating, there should be update rows events in the binlog events set.
In the slave node side, the write set is applied directly by using primary keys in the respective binlog events, so this is rather straightforward operation, and does not involve excessive locking.
However, there will be some harm for replication performance, in multi-master topologies, if transaction locks more rows than what will be needed during the write set applying phase in slave node. In such "asymmetric locking situation", the INSERT...ON DUPLICATE... execution, which has advanced in replication phase, may still end up as victim for earlier replicated transaction, and it has to rollback. If the conflict happened over such locks, which are not used in applying phase, Galera will still abort the transaction and immediately replay. This rollback-replay cycle, in master node, will slow down the master node somewhat, especially if such non related conflicts are frequent.