[MDEV-32669] DML statements written to audit logs but were not executed in the database Created: 2023-11-03  Updated: 2024-01-16

Status: Needs Feedback
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.4.12
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Jaya Karthik Karri Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Hello Team,

Today we have identified a weird situation.. We observed that two of the DML's are written to the audit logs with retcode 0 but these statements did not change the database table rows. I even analysed the binary logs during the time statements were written to audit logs but could not find them in the binary logs. Meaning, somehow these statements are not written to redo logs and table page were not updated. We are using innodb_flush_log_at_trx_commit =1 and sync_binlog=1, also we are using autocommit.

I'm wondering how can a statement is written to audit logs with success code before committing in the database and writing to binary logs. I hope I can't reproduce this issue but I have the audit logs and binary logs saved during that time. I could not find any error messages in the error logs during that time. It would be great if someone could help us understand what went wrong as now we fear there might be some DML's which are silently being ignored (not written to database tables or redo logs).. Let me know if you need more information..



 Comments   
Comment by Jaya Karthik Karri [ 2023-11-07 ]

Today I have identified, when our application job runs( monthly twice) it creates 4 parallel connections to the DB and INSERTS lot of data into multiple tables. Out of this 4 connections, one of the connection successfully inserted data into the tables initially (around 13000) and after certain time all the INSERTS (around 900) executed by this particular connection were recorded in the audit log but were not executed in the database. During this time rest of the connections were successfully inserting data into those tables, meaning there was no issue with the table locks etc, because if that was the case then other 3 sessions should not INSERT data... At the end of the JOB all the 4 connection successfully log out of the database ( I can see the DISCONNECT from audit logs)..

I assume that audit logging plugin is some how interacting in between the client and the database and stopping the client inserts reaching the database and also not giving any kind of errors back to application and application assumes all the inserts were successful.

I want some insights from MariaDB team to understand the issue.. It would be great if someone have a look into the issue.

Thank You
Jaya

Comment by Sergei Golubchik [ 2023-11-07 ]

can you attach the relevant part of the audit log, please?

Comment by Jaya Karthik Karri [ 2023-11-08 ]

Hi Sergei, Thanks for picking this ticket..

If I include all the missing statements the file size is very big, For reference I picked two queries which are written to audit log but not executed in DB.. Let me know if you need all the statements... I modified the db hostname, username and client IP in below statements.

20231101 09:10:08,db_hostname,DBService,<clinet IP>,160975436,6521714934,QUERY,payment,'DELETE FROM metadata  WHERE `key` in (\'lastUpdate\', \'currentRun\', \'lastPendingAggregation\')',0
 
20231101 09:12:15,db_hostname,DBService,<clinet IP>,160975436,6522063976,QUERY,payment,'INSERT INTO paymentIdTable (paymentId, sourceTable) VALUES (4558005, \'202011\'),(4562672, \'202011\'),(4591382, \'202011\'),(4643417, \'202011\'),(4659498, \'202011\')',0

Comment by Sergei Golubchik [ 2023-11-09 ]

What do you mean, there is nothing in binlog? Perhaps you replicate in row mode and DELETE did not find any rows to delete? Then you would not have any Delete_row events in the binary log. INSERT should've always worked, though.

Audit plugin doesn't (or at least shouldn't) interact with the client or error status, it's a passive observer. And it shouldn't be possible for it to log a failed INSERT as successfully completed.

Comment by Jaya Karthik Karri [ 2023-11-09 ]

I just pasted two sample queries, but we missed around 800 INSERT statements which are written to audit log with 0 retcode but not executed inside the database and not even logged to binary logs. We are using STATEMENT based binary logging. Later we manually inserted that data taking INSERT statements from audit logs. Let me know if you want me to attache few more statements from audit logs.

Comment by Jaya Karthik Karri [ 2023-11-21 ]

Hi Sergei, It would be great if we get some insights from MariaDB Team about the issue we faced.. so we can confidently use the MariaDB for our production databases...

Regards
Jaya

Comment by Sergei Golubchik [ 2024-01-16 ]

I don't see have any insights to provide and don't see how that could've happened.

Except that if your DML statements were part of the transaction that was later rolled back? Like

START TRANSACTION;
DELETE FROM metadata  WHERE `key` in ('lastUpdate', 'currentRun', 'lastPendingAggregation');
INSERT INTO paymentIdTable (paymentId, sourceTable) VALUES (4558005, '202011'),(4562672, '202011'),(4591382, '202011'),(4643417, '202011'),(4659498, '202011');
ROLLBACK;

You wrote you have autocommit enabled, but a transaction can be started explicitly too
If that was not the case — we need a way to reproduce this bug somehow.

Comment by Jaya Karthik Karri [ 2024-01-16 ]

We are not using transactions to run statements, I don't think we can reproduce this issue as we don't have a clue what happened. We have not seen this issue in the past couple of months..

Thanks again for your support. You can close this ticket as I can't reproduce the issue.

Generated at Thu Feb 08 10:33:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.