[CONC-659] The replication API always uses semi-sync Created: 2023-07-31  Updated: 2023-08-17

Status: Stalled
Project: MariaDB Connector/C
Component/s: Replication/Binlog API
Affects Version/s: 3.3.5
Fix Version/s: 3.3.6

Type: Bug Priority: Blocker
Reporter: markus makela Assignee: Georg Richter
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocks
blocks MXS-4668 Binlogrouter eventually stops working... Closed

 Description   

The replication API always expects that the two extra bytes are present that indicate whether a semi-sync acknowledgement needs to be sent. These two bytes are not sent if semi-sync replication is not enabled on the server where the connector replicates from.



 Comments   
Comment by markus makela [ 2023-08-01 ]

The replication API also needs a separate function for sending the semi-sync ACK as otherwise an acknowledgement might be sent before the event is stored on disk. The purpose of semi-sync is to make sure the replication event is received at least in one additional location before the transaction is committed on the origin server. Since the acknowledgement is currently sent before the caller even has a chance to process it, this weakens the guarantees that semi-sync replication is intended to provide. Based on this, I would say that using the replication API in version 3.3.5 with a MariaDB server that has semi-sync replication enabled is not recommended until there's a way to turn off the semi-sync code.

Comment by markus makela [ 2023-08-04 ]

As discussed, it would also be possible for the replication API to delay sending the ACK by one rpl_fetch() call so that when the next event is read, the ACK of the previous event is sent. This would make the semi-sync mechanism much easier to use and it should be a reasonable compromise between performance and ease of use. Compared to a separate ACK function, the automatic ACK is less efficient if the "durability" of the event is achieved early and the rest of the work done by the application is unrelated to the replication. One could argue that semi-sync is only useful in cases where the data is actually used as a recovery mechanism of some sorts (e.g. the binlogrouter in MaxScale) which means that a large part of the work done by the application would be the useful work of making the event durable.

Comment by markus makela [ 2023-08-15 ]

The code in the 3.3.6-MS tag is broken and does not seem to work. The following patch seems to fix it:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index eb034ad..642fdbe 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -135,7 +135,7 @@ SET(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -DDBUG_OFF
 
 IF(CMAKE_COMPILER_IS_GNUCC)
   INCLUDE(CheckCCompilerFlag)
-  SET(GCC_FLAGS -Wunused -Wlogical-op -Wno-uninitialized  -Wall -Wextra -Wformat-security -Wno-init-self -Wwrite-strings -Wshift-count-overflow -Wdeclaration-after-statement -Wno-undef -Wno-unknown-pragmas)
+  SET(GCC_FLAGS -Wunused -Wlogical-op -Wno-uninitialized -Werror -Wall -Wextra -Wformat-security -Wno-init-self -Wwrite-strings -Wshift-count-overflow -Wdeclaration-after-statement -Wno-undef -Wno-unknown-pragmas)
   FOREACH(GCC_FLAG ${GCC_FLAGS})
     CHECK_C_COMPILER_FLAG("${GCC_FLAG}" HAS_${GCC_FLAG}_FLAG)
     IF(${HAS_${GCC_FLAG}_FLAG})
diff --git a/libmariadb/mariadb_rpl.c b/libmariadb/mariadb_rpl.c
index 4ebd3b2..2a8e9e4 100644
--- a/libmariadb/mariadb_rpl.c
+++ b/libmariadb/mariadb_rpl.c
@@ -962,7 +962,7 @@ static uint8_t mariadb_rpl_send_semisync_ack(MARIADB_RPL* rpl, MARIADB_RPL_EVENT
     rpl_set_error(rpl, CR_BINLOG_SEMI_SYNC_ERROR, 0, "semi synchronous replication is not enabled");
     return 1;
   }
-  if (!event->is_semi_sync || !event->semi_sync_flags != SEMI_SYNC_ACK_REQ)
+  if (!event->is_semi_sync || event->semi_sync_flags != SEMI_SYNC_ACK_REQ)
   {
     rpl_set_error(rpl, CR_BINLOG_SEMI_SYNC_ERROR, 0, "This event doesn't require to send semi synchronous acknoledgement");
     return 1;
@@ -972,7 +972,7 @@ static uint8_t mariadb_rpl_send_semisync_ack(MARIADB_RPL* rpl, MARIADB_RPL_EVENT
   buf = alloca(buf_size);
 
   buf[0] = SEMI_SYNC_INDICATOR;
-  int8store(buf + 1, event->next_event_pos);
+  int8store(buf + 1, (uint64_t)event->next_event_pos);
   memcpy(buf + 9, rpl->filename, rpl->filename_length);
 
   ma_net_clear(&rpl->mysql->net);
@@ -2084,6 +2084,7 @@ int STDCALL mariadb_rpl_get_optionsv(MARIADB_RPL *rpl,
   {
     unsigned int* semi_sync = va_arg(ap, unsigned int*);
     *semi_sync = rpl->is_semi_sync;
+    break;
   }
 
   default:

Comment by markus makela [ 2023-08-16 ]

I submitted a pull request that fixes the problem: https://github.com/mariadb-corporation/mariadb-connector-c/pull/229

Generated at Thu Feb 08 03:06:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.