[MXS-3939] Debug assertion during transaction replay Created: 2022-01-07  Updated: 2022-01-13  Resolved: 2022-01-13

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 6.2.0
Fix Version/s: 6.2.1

Type: Bug Priority: Major
Reporter: markus makela Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None


 Description   

This debug assertion was seen during transaction replay when a COM_STMT_PREPARE was being replayed:

debug assert at /home/timofey_turenko_mariadb_com/MaxScale/server/modules/protocol/MariaDB/mariadb_backend.cc:906 failed: History response callback must not be installed on failure (ok || !data->history_info[this].response_cb)
 
  /lib64/libpthread.so.0(+0xf630): sigaction.c:?
  /lib64/libpthread.so.0(raise+0x2b): ??:?
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN24MariaDBBackendConnection17compare_responsesEv+0x7b0): server/modules/protocol/MariaDB/mariadb_backend.cc:905 (discriminator 15)
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN24MariaDBBackendConnection11normal_readEv+0x108a): server/modules/protocol/MariaDB/mariadb_backend.cc:749
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN24MariaDBBackendConnection17ready_for_readingEP3DCB+0x607): server/modules/protocol/MariaDB/mariadb_backend.cc:507
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB14process_eventsEj+0x980): server/core/dcb.cc:1261
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13event_handlerEPS_j+0x2f): server/core/dcb.cc:1313
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB12poll_handlerEP13MXB_POLL_DATAP10MXB_WORKERj+0x8e): server/core/dcb.cc:1352
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0xf72): maxutils/maxbase/src/worker.cc:854
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x207): maxutils/maxbase/src/worker.cc:563
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker11thread_mainEPS0_PNS_9SemaphoreE+0x23): maxutils/maxbase/src/worker.cc:688
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZSt13__invoke_implIvPFvPN7maxbase6WorkerEPNS0_9SemaphoreEEJS2_S4_EET_St14__invoke_otherOT0_DpOT1_+0xac): /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:60
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZSt8__invokeIPFvPN7maxbase6WorkerEPNS0_9SemaphoreEEJS2_S4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOS8_DpOS9_+0xe3): /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:95
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6thread8_InvokerISt5tupleIJPFvPN7maxbase6WorkerEPNS2_9SemaphoreEES4_S6_EEE9_M_invokeIJLm0ELm1ELm2EEEEDTcl8__invokespcl10_S_declvalIXT_EEEEESt12_Index_tupleIJXspT_EEE+0x5f): /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:234
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6thread8_InvokerISt5tupleIJPFvPN7maxbase6WorkerEPNS2_9SemaphoreEES4_S6_EEEclEv+0x95): /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:243
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7maxbase6WorkerEPNS3_9SemaphoreEES5_S7_EEEEE6_M_runEv+0x1c): /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:186
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0xa4c13f): thread48.o:?
  /lib64/libpthread.so.0(+0x7ea5): pthread_create.c:?
  /lib64/libc.so.6(clone+0x6d): ??:?

Upon further study, this appears to happen when a transaction is replayed because of a deadlock that was triggered by a command which is considered a session command. As a result, the same session command ID is added multiple times to the list of IDs to check, which leaves one extra value in the container.
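
The effect of queuing the same ID more than once can be sketched as follows. This is only an illustration, not the MaxScale code: IdsToCheck, add, add_once, checked and leftover are hypothetical names, and the sketch assumes that a completed check consumes exactly one queued entry.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the list of session command IDs to check.
struct IdsToCheck
{
    std::deque<std::uint32_t> ids;

    // Queues the ID on every execution, including replays.
    void add(std::uint32_t id)
    {
        ids.push_back(id);
    }

    // Queues the ID only if it is not already waiting to be checked.
    void add_once(std::uint32_t id)
    {
        if (std::find(ids.begin(), ids.end(), id) == ids.end())
        {
            ids.push_back(id);
        }
    }

    // Assumption: a completed check consumes a single entry for the ID.
    void checked(std::uint32_t id)
    {
        auto it = std::find(ids.begin(), ids.end(), id);
        if (it != ids.end())
        {
            ids.erase(it);
        }
    }
};

// Entries left over when one command is queued `executions` times
// and then checked once.
std::size_t leftover(int executions, bool deduplicate)
{
    IdsToCheck list;
    const std::uint32_t id = 1;

    for (int i = 0; i < executions; ++i)
    {
        if (deduplicate)
        {
            list.add_once(id);
        }
        else
        {
            list.add(id);
        }
    }

    list.checked(id);
    return list.ids.size();
}
```

With two executions of the same command, leftover(2, false) leaves one stale entry behind, while leftover(2, true) leaves the container empty.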



 Comments   
Comment by markus makela [ 2022-01-08 ]

The fix to MXS-3924 makes this possible, as the interrupted command of a transaction can now be a session command. Previously the session would have been killed before this problem could occur. With the fix in place, the same command appears to be executed multiple times from the backend connection's point of view. This means that the old method of storing all responses (of which there was previously only one) no longer works, and it must be changed to store only the latest response.
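
A minimal sketch of storing only the latest response per session command ID is shown here. It is not the actual fix: LatestResponses, store, is_ok and replayed_twice are hypothetical names, and a response is reduced to a single OK/error flag for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical history that keeps one response per session command ID.
struct LatestResponses
{
    std::map<std::uint32_t, bool> ok_by_id;

    // A later execution of the same command overwrites the earlier response.
    void store(std::uint32_t id, bool ok)
    {
        ok_by_id[id] = ok;
    }

    // True only if a response is stored for the ID and it was successful.
    bool is_ok(std::uint32_t id) const
    {
        auto it = ok_by_id.find(id);
        return it != ok_by_id.end() && it->second;
    }
};

// The same session command (ID 1) executed twice, as seen during a replay.
LatestResponses replayed_twice(bool first_ok, bool second_ok)
{
    LatestResponses history;
    history.store(1, first_ok);
    history.store(1, second_ok);
    return history;
}
```

Because the ID is the key, a replayed command never grows the container: replayed_twice(false, true) holds one entry, and that entry reflects the second execution.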

Even if only the latest command is stored, the same session command can still be validated twice. The original response may install the confirmation callback, expecting it to be called when the accepted response arrives. When the session command is retried, the accepted answer may arrive before the second execution of the session command completes on a backend. In this case verification happens twice: the original response is verified when the confirmation callback is called on arrival of the accepted response, and the final response is verified when it arrives from the backend. If both responses are identical, there are no problems. If either one of them is different, the connection will be discarded due to session command mismatches.
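
The ordering described above can be modelled with a small simulation. It is a sketch under stated assumptions, not the code in mariadb_backend.cc: BackendSketch, on_backend_response, on_accepted_response and replay_scenario are hypothetical names, and a response is reduced to an OK/error flag that is compared against the accepted answer.

```cpp
#include <functional>
#include <utility>

// Hypothetical model of one backend connection verifying a session command.
struct BackendSketch
{
    bool has_accepted = false;              // Is the accepted answer known yet?
    bool accepted = false;                  // The accepted answer, once known
    std::function<void(bool)> confirm_cb;   // Installed while the answer is pending
    int  checks = 0;                        // How many verifications took place
    bool discarded = false;                 // Set when a mismatch is found

    void verify(bool response, bool accepted_answer)
    {
        ++checks;
        if (response != accepted_answer)
        {
            discarded = true;
        }
    }

    // A response to the session command arrives from this backend.
    void on_backend_response(bool ok)
    {
        if (has_accepted)
        {
            verify(ok, accepted);
        }
        else
        {
            // Verification is deferred until the accepted answer arrives.
            confirm_cb = [this, ok](bool answer) { verify(ok, answer); };
        }
    }

    // The accepted answer arrives and any pending callback is called.
    void on_accepted_response(bool ok)
    {
        has_accepted = true;
        accepted = ok;
        if (confirm_cb)
        {
            auto cb = std::move(confirm_cb);
            confirm_cb = nullptr;
            cb(ok);
        }
    }
};

// Original response, then the accepted answer, then the retried response.
BackendSketch replay_scenario(bool first, bool accepted, bool second)
{
    BackendSketch backend;
    backend.on_backend_response(first);
    backend.on_accepted_response(accepted);
    backend.on_backend_response(second);
    return backend;
}
```

In this model every scenario performs two checks. The connection survives only when both backend responses match the accepted answer, for example replay_scenario(true, true, true).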

Generated at Thu Feb 08 04:25:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.