[MDEV-16248] Row based replication to spider with float column fails on delete/update Created: 2018-05-22  Updated: 2021-08-11  Resolved: 2019-07-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.3.7
Fix Version/s: 10.4.7

Type: Bug Priority: Critical
Reporter: Mattias Jonsson Assignee: Kentoku Shiba (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux


Attachments: File mdev_16248.cnf     File mdev_16248.result     File mdev_16248.test    
Issue Links:
Blocks
is blocked by MDEV-16872 Add CAST(expr AS FLOAT) Closed
Duplicate
is duplicated by MDEV-26341 Spider - Update sharding key fails if... Closed
Relates
relates to MDEV-18989 Wrong result update float_column in u... Closed

 Description   

Replicating using binlog_format = ROW to a spider table can fail for float columns for DELETE and UPDATE statements, due to spider sends delete/update queries to its datanodes that tries to match every column in the table definition, which may fail since you cannot do a comparison like 'float_column = 0.671437' reliably.

After finding a repeatable test case (see attached test case files) i have also updated this description. I missed to add the 'UPDATE' to the test case, but just after the DELETE statement, add:
UPDATE t1 SET f = 0.632 WHERE id = 2;

Also notice in the .cnf file there is an additional way to even make the master fail.

I have not tested federatedX, but assume it suffers from the same issue?

The rest of this description is from the original bug report, where we found the inconsistency. This happened due to this bug but we found it first due to the row was never deleted, so the following part is more of a part of our troubleshooting:
This was found after stopping an intermediate master, dumping the table (using --where) to several different spider data nodes (thus partitioning the data), setting up a spider head using those spider data nodes.

Every now and then the replication fails with:
Could not execute Write_rows event on table test_db.test_table; Duplicate entry '1234567890' for key 'PRIMARY', Error_code: 1062; handler error No Error!; the event's master log binlog.000001, end_log_pos 12345678

From the error log on the spider head:
20180522 13:35:40 [SEND SPIDER SQL] from 8688 to [testserver2.example.com] 22452051: sql: insert high_priority into
`test_db`.`test_table2`(`id`,...)values(1234567890,...)
20180522 13:35:40 [ERROR SPIDER RESULT] to 8688: 1062 Duplicate entry '1234567890' for key 'PRIMARY'
2018-05-22 13:35:40 8688 [ERROR] Slave SQL: Could not execute Write_rows event on table test_db.test_table2; Duplicate entry '1234567890' for key 'PRIMARY', Error_code: 1062; handler error No Error!; the event's master log binlog.000001, end_log_pos 12345678, Internal MariaDB error code: 1062
2018-05-22 13:35:40 8688 [Warning] Slave: Duplicate entry '1234567890' for key 'PRIMARY' Error_code: 1062

From the query log on the spider head:
8688 Query BEGIN
8688 Query mysql testserver2.example.com set session transaction isolation level read committed ;set session autocommit = 1;start transaction
8688 Query mysql testserver2.example.com insert high_priority into `test_db`.`test_table_1`(`id`,...)values(1234567890,...)
8688 Query mysql testserver2.example.com insert high_priority into `test_db`.`test_table_2`(`id`,...)values(1234567890,...)
8688 Query mysql testserver2.example.com rollback

From the query log one of the data nodes:
12345678 Connect test_user@testserver.example.com as anonymous on
12345678 Query set session transaction isolation level read committed;set session autocommit = 1;start transaction
12345678 Query SET NAMES latin1
12345678 Init DB test_db
12345678 Query insert high_priority into `test_db`.`test_table1`(`id`,...)values(1234567890,...)
12345678 Query insert high_priority into `test_db`.`test_table2`(`id`,...)values(1234567890,...)
12345678 Query rollback
12345678 Quit

And we can see a row in test_table_2 with that id, but not in test_table_1.

show slave status output:

Slave_IO_State: Waiting for master to send event
Master_Host: intermediate_test_master
Master_User: replication_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000537
Read_Master_Log_Pos: 94002275
Relay_Log_File: relaylog.001004
Relay_Log_Pos: 86488413
Relay_Master_Log_File: binlog.000051
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test_db.test_table_2; Duplicate entry '1234567890' for key 'PRIMARY', Error_code: 1062; handler error No Error!; the event's master log binlog.000051, end_log_pos 87949203
Skip_Counter: 0
Exec_Master_Log_Pos: 87948048
Relay_Log_Space: 49417412287
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows event on table test_db.test_table_2; Duplicate entry '1234567890' for key 'PRIMARY', Error_code: 1062; handler error No Error!; the event's master log binlog.000051, end_log_pos 87949203
Replicate_Ignore_Server_Ids:
Master_Server_Id: 217056038
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 0



 Comments   
Comment by Mattias Jonsson [ 2018-05-24 ]

Changing to minor, since currently I suspect something with our test replication setup.

Comment by Mattias Jonsson [ 2018-05-24 ]

The duplicate PK is correct, but due to a DELETE was missed in the replication flow. Still investigating.

Comment by Mattias Jonsson [ 2018-05-24 ]

I think the DELETE that did not remove the row, failed due to the replicated deleted row had a float column, which did not match the full delete query sent by the spider node, like:
From the query log on the spider head node:
53840 Query mysql test_server1.example.com delete from `test_db`.`test_table` where `id` = 1 and `float_col` = 0.671437 limit 1

And since the float_col probably compares as DOUBLE (or some other floating number comparison issues) it does not get any match and did not actually delete the old row.
When update/deletes comes via replication:
Should the delete/update WHERE clause only be done by PK?
Should the float columns be compared as strings? (using "`float_col` LIKE 0.671437" seems to work)

Comment by Mattias Jonsson [ 2018-05-25 ]

Also the binlog was created with the default binlog_row_image=FULL
Another way to solve this and https://bugs.mysql.com/bug.php?id=83788 would be to add functions to convert between float32 and uint32 as well as between double64 and uint64.

It could be implemented like: https://godbolt.org/g/3SGZq8
(memcpy is more clear that using union and will be compiled to the same instructions: https://godbolt.org/g/hiX8oL )

We will try with binlog_row_image=minimal.

Comment by Eric Herman [ 2018-06-28 ]

I have created https://github.com/MariaDB/server/pull/805 which adds functions which might be useful in this case as well.

See also: https://bugs.mysql.com/bug.php?id=83788

Comment by Mattias Jonsson [ 2018-07-22 ]

I think this shows the real issue:

1> create table tfdd (id int unsigned primary key, f float, d double,
decimal_col decimal(30,20));
2> insert into tfdd values (1, 0.632, 0.632, 0.632);
3> insert into tfdd SELECT 2, f, f, f from tfdd where id = 1;
4> insert into tfdd SELECT 3, d, d, d from tfdd where id = 1;
5> select * from tfdd;
+----+-------+--------------------+------------------------+
| id | f     | d                  | decimal_col            |
+----+-------+--------------------+------------------------+
|  1 | 0.632 |              0.632 | 0.63200000000000000000 |
|  2 | 0.632 | 0.6320000290870667 | 0.63200002908706670000 |
|  3 | 0.632 |              0.632 | 0.63200000000000000000 |
+----+-------+--------------------+------------------------+
6> select * from tfdd where decimal_col = 0.632000000000000000001;
Empty set (0.00 sec)
7> select * from tfdd where d = 0.632000000000000000001;
+----+-------+-------+------------------------+
| id | f     | d     | decimal_col            |
+----+-------+-------+------------------------+
|  1 | 0.632 | 0.632 | 0.63200000000000000000 |
|  3 | 0.632 | 0.632 | 0.63200000000000000000 |
+----+-------+-------+------------------------+
2 rows in set (0.00 sec)
 
8> select * from tfdd where d = 0.632;
+----+-------+-------+------------------------+
| id | f     | d     | decimal_col            |
+----+-------+-------+------------------------+
|  1 | 0.632 | 0.632 | 0.63200000000000000000 |
|  3 | 0.632 | 0.632 | 0.63200000000000000000 |
+----+-------+-------+------------------------+
2 rows in set (0.00 sec)
 
9> select * from tfdd where f = 0.632;
Empty set (0.00 sec)
 
10> select * from tfdd where f = 0.6320000290870667;
+----+-------+--------------------+------------------------+
| id | f     | d                  | decimal_col            |
+----+-------+--------------------+------------------------+
|  1 | 0.632 |              0.632 | 0.63200000000000000000 |
|  2 | 0.632 | 0.6320000290870667 | 0.63200002908706670000 |
|  3 | 0.632 |              0.632 | 0.63200000000000000000 |
+----+-------+--------------------+------------------------+
3 rows in set (0.00 sec)

We need to specify the value after FLOAT -> DOUBLE conversion is done for float columns!

Comment by Jacob Mathew (Inactive) [ 2018-07-24 ]

I will continue working on this issue after Bar's changes to the FLOAT data type are committed to 10.3.

Comment by Eric Herman [ 2018-07-30 ]

Is there an MDEV or a branch or anything to track Bar's changes to FLOAT?

(I don't see one, with a quick search)

Comment by Jacob Mathew (Inactive) [ 2018-08-01 ]

Bar's tasks for his changes to the FLOAT data type are MDEV-16872 and MDEV-16873.

Comment by Jacob Mathew (Inactive) [ 2018-10-17 ]

Reassigning to Bar, who has been working on MDEV-16873 and MDEV-16872.

Comment by Alexander Barkov [ 2019-03-21 ]

Kentoku, if we implement a new cast type:

CAST(expr AS FLOAT)

would it help to solve this issue?

We currently have only CAST( AS DOUBLE).

Here's what happens:

The original query:

DELETE FROM t1 WHERE id = 1;

is translated to this binary log entry:

### DELETE FROM `auto_test_remote`.`t1`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0.671437             /* FLOAT meta=4 nullable=1 is_null=0 */

which is, when sent to the backend, is translated to:

delete from `auto_test_remote2`.`t1` where `id` = 1 and `f` = 0.671437 limit 1

Wrapping float constants to CAST should help:

delete from `auto_test_remote2`.`t1` where `id` = 1 and `f` = CAST(0.671437 AS FLOAT) limit 1

Comment by Alexander Barkov [ 2019-03-21 ]

Here's the proof of the idea, which emulates CAST(AS FLOAT) using a stored function:

DROP FUNCTION IF EXISTS to_float;
DROP TABLE IF EXISTS t1;
CREATE FUNCTION to_float(a DECIMAL(20,6)) RETURNS FLOAT RETURN CAST(a AS DOUBLE);
CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY, a FLOAT) ENGINE=MyISAM;
INSERT INTO t1 VALUES (1, 0.671437),(2, 0.671437);
DELETE FROM t1 WHERE a=0.671437;
SELECT * FROM t1;

+----+----------+
| id | a        |
+----+----------+
|  1 | 0.671437 |
|  2 | 0.671437 |
+----+----------+

DELETE FROM t1 WHERE a=to_float(0.671437);
SELECT * FROM t1;

Empty set (0.00 sec)

Comment by Kentoku Shiba (Inactive) [ 2019-04-16 ]

bar
CAST(AS FLOAT) is helpful for this case. Please add it.

Comment by Alexander Barkov [ 2019-05-16 ]

"MDEV-16872 Add CAST(expr AS FLOAT)" is now pushed to 10.3
As agreed, Kentoku will take over this issue from this point.

Comment by Kentoku Shiba (Inactive) [ 2019-05-24 ]

willhan
I fixed this using CAST(AS FLOAT). Would you please review the following commit?
319697f

Comment by willhan [ 2019-06-28 ]

kentoku
delete from `auto_test_remote`.`tbl_a` where `pkey` = 1 and `f` = cast(0.671437 as float) limit 1

which mysql/mariadb version can support cast(0.671437 as float) ?

Comment by Alexander Barkov [ 2019-06-28 ]

`MDEV-16872 Add CAST(expr AS FLOAT)` was added in 10.4.5.

Comment by willhan [ 2019-07-05 ]

kentoku

It's OK

Comment by Kentoku Shiba (Inactive) [ 2019-07-06 ]

Thank you for checking this.
it's merged.

Generated at Thu Feb 08 08:27:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.