[MXS-3353] Tee filter loses statements if branch target is slower Created: 2020-12-24 Updated: 2022-07-21 Resolved: 2021-09-28 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | tee |
| Affects Version/s: | 2.5.6, 2.5.10 |
| Fix Version/s: | 6.2.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oli Sennhauser | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Linux, Debian 10, n.a. |
| Sprint: | MXS-SPRINT-122, MXS-SPRINT-140 |
| Description |
|
When we run our simple insert test or our slightly more complex mixed test (SIUD) at a very high pace (10 us delay between statements), MaxScale stops sending all statements to the tee'd instance after a short time. |
| Comments |
| Comment by Oli Sennhauser [ 2020-12-24 ] |
|
Example: |
| Comment by markus makela [ 2020-12-27 ] |
|
The tee filter executes statements on the branch target asynchronously. If I understand this correctly, you are doing inserts and then immediately checking the results, which the filter does not currently support. The filter documentation doesn't appear to mention this, which we should correct first. If you believe that this would be a useful feature, please submit a new feature request. I'll convert this into a bug about the filter documentation. |
| Comment by Oli Sennhauser [ 2021-01-04 ] |
|
@Markus: I waited for a while after the inserts to give MaxScale a chance to catch up, but the queries did not arrive and the connection aborted in an unclean way. |
| Comment by markus makela [ 2021-01-05 ] |
|
When you see these lost statements, how soon after the write is the read done? If you do the writes and then read the data several seconds later, do you see all the rows or are some rows still lost? I'm trying to understand whether the transactions are not being committed on the tee'd backend or whether the statements are never sent there. |
| Comment by Oli Sennhauser [ 2021-01-06 ] |
|
Hi Markus,

I repeated the test. Please find all the information attached. You can also provoke the situation by setting innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1 on the tee'd instance while these values are set to 2 and 0 on the primary instance, even with the slow pace.

Test with slow pace (10000 us sleep, everything is fine):

shell> date ; ./mixed_test.php

Primary instance: mariadb-105
SQL> SELECT COUNT
End of General Log:
330 Query SELECT * FROM test WHERE id = 383966

Tee'd instance: mysql-57
SQL> SELECT COUNT
End of General Log:
2021-01-06T08:19:26.942277Z 360 Query SELECT * FROM test WHERE id = 383966

Test with fast pace (10 us sleep, rows are missing):

shell> date ; ./mixed_test.php ; date

Primary instance: mariadb-105
350 Query SELECT * FROM test WHERE id = 934
SQL> TRUNCATE TABLE test.test;

Secondary instance: mysql-57
2021-01-06T08:27:30.470174Z 383 Query DELETE FROM test WHERE id = 61171
SQL> TRUNCATE TABLE test.test;

570 rows missing. No warning in the MaxScale error log.

The test table:
CREATE TABLE `test` (

And the test script: |
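For reference, the durability settings mentioned at the start of this comment (not the test script) could be applied at runtime along these lines. This is only a sketch: it assumes both servers accept `SET GLOBAL` for these variables and that the session has the privilege to change them. The same values could instead be placed in each server's configuration file.

```sql
-- On the tee'd (branch) instance: flush and sync on every commit,
-- which makes commits slower than on the primary.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;

-- On the primary instance: relaxed durability, faster commits.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL sync_binlog = 0;
```

With these values the branch target commits more slowly than the main target, which is the condition under which the missing rows were reported.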
| Comment by Oli Sennhauser [ 2021-01-06 ] |
| Comment by Oli Sennhauser [ 2021-01-06 ] |
|
All the other details here: https://fromdual.com/traffic-mirroring-with-mariadb-maxscale |
| Comment by markus makela [ 2021-01-08 ] |
|
I've managed to reproduce this with a local setup. This seems to happen much more easily when the branched target is slower than the main target. I've created |
| Comment by markus makela [ 2021-05-03 ] |
|
The fix for this was inadequate and caused a session reference leak. The leak made it appear as if the filter worked as expected. |