[MDEV-29044] Lost update anomaly found in Galera-Maria cluster Created: 2022-07-06  Updated: 2023-03-09

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Si Liu Assignee: Daniele Sciascia
Resolution: Unresolved Votes: 3
Labels: Galera, lost-update

Attachments: File client-result-debug.log     File client-result-sync.log     File client-result.log     PDF File lost-update.pdf     File my.cnf     File my.conf-sync     Zip Archive server-log-sync.zip     Zip Archive server-logs-debug.zip     Zip Archive server-logs.zip     Zip Archive tools.zip    

 Description   

If I understand it correctly, transactions issued on different nodes in a Galera-Maria cluster should be free of lost-update anomalies, as claimed in https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html. However, we have observed such an anomaly in our tests.

Experimental Setup

We have set up our tests with the following configuration:

We run a cluster of two nodes with docker-compose, using MariaDB 10.7.3. We use a simple database schema with a single two-column table, each row representing a key-value pair:

CREATE TABLE IF NOT EXISTS dbcop.variables (var BIGINT(64) UNSIGNED NOT NULL PRIMARY KEY, val BIGINT(64) UNSIGNED NOT NULL)

We have one client session on each node; both sessions are executed concurrently. First, we initialize the table (using a different session) with values set to 0. After the initialization, both clients run a stream of transactions produced by our workload generator. The values written by the clients are guaranteed to be unique.

Query results

The results observed by the client are in client-result.log:

Transaction (0, 0)
WRITE(0, 0)
 
...

where each transaction is identified by (session id, txn id). Within a session, transactions with larger txn ids are executed after those with smaller ids finish. Queries are shown as READ(var, val) or WRITE(var, val).

When running the experiment, a failed transaction is retried until all its operations succeed and the transaction is successfully committed. Only the successful attempt is logged on the client side; all earlier executions are shown as ROLLBACK in the query logs.

The lost-update violation

Both transactions (1, 5) and (2, 13) read var=0, val=4 written by transaction (1, 4), and both successfully commit their writes on var=0. A lost update is found! See lost-update.pdf for the complete scenario.
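For concreteness, the condition we check can be sketched in a few lines of Python (our illustration of the lost-update criterion on a committed history in the format above; this is not the actual dbcop/PolySI code, and the written values in the example literal are placeholders):

```python
from collections import defaultdict

def find_lost_updates(history):
    """history: {(session id, txn id): [(op, var, val), ...]} with op in
    {'READ', 'WRITE'}, containing committed transactions only.
    Lost update: two committed txns read the same (var, val) version
    and both commit a write on that var."""
    candidates = defaultdict(list)  # (var, val read) -> txns reading it and writing var
    for txn, ops in history.items():
        reads = {(var, val) for op, var, val in ops if op == 'READ'}
        writes = {var for op, var, val in ops if op == 'WRITE'}
        for var, val in reads:
            if var in writes:
                candidates[(var, val)].append(txn)
    return {k: txns for k, txns in candidates.items() if len(txns) > 1}

# The reported scenario: (1, 5) and (2, 13) both read var=0, val=4
# (written by (1, 4)) and both commit a write on var 0.
history = {
    (1, 4): [('WRITE', 0, 4)],
    (1, 5): [('READ', 0, 4), ('WRITE', 0, 5)],
    (2, 13): [('READ', 0, 4), ('WRITE', 0, 13)],
}
print(find_lost_updates(history))  # {(0, 4): [(1, 5), (2, 13)]}
```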

Database Logs

The server-side logs of the nodes are attached in server-logs.zip. The query logs and error logs are stored in mariadb_general.log and mariadb_error.log. The binary logs are in mariadb_bin*.

Reproducing the violation

The tools to reproduce the violation are attached in tools.zip.

1. Start the Galera cluster in Docker

The docker-compose file is generator/docker/docker-compose.yml. Database
logs are stored in /tmp/ in the containers.

2. Record and verify a history

First build the tools:

    # tested on ubuntu 22.04
    apt install g++ openjdk-11-jdk cmake libgmp-dev zlib1g-dev pkg-config libssl-dev cargo
    cd generator
    cargo b --release
    cd ../PolySI
    ./gradlew jar
    cd ..
    

Then generate txns and record history:

    # Run txns and record history
    rm -rf /tmp/a/ /tmp/b
    ./generator/target/release/dbcop generate -d /tmp/a/ -e 2 -n 2 --nhist 1 -t 10 -v 1
    ./generator/target/release/dbcop run --db galera -d /tmp/a/ -o /tmp/b/ 127.0.0.1:3306 127.0.0.1:3307
    
    # requires Java 11; will print [[[ REJECT ]]] if a violation is found
    java -jar ./PolySI/build/libs/PolySI-1.0.0-SNAPSHOT.jar audit -t dbcop --no-pruning /tmp/b/hist-00000/history.bincode
    
    # print history
    java -jar ./PolySI/build/libs/PolySI-1.0.0-SNAPSHOT.jar dump -t dbcop /tmp/b/hist-00000/history.bincode
    

Note that, since we are running black-box testing with randomized workloads, we cannot reproduce exactly the same violating histories. However, violations manifest very frequently: for example, we observed 8 violating histories out of 10 collected.



 Comments   
Comment by Si Liu [ 2022-07-06 ]

@sciascid @Daniele Sciascia

FYI. Regardless of the confusing claim on SI (see our discussion at https://github.com/codership/galera/issues/609), there seems to be no doubt that Galera-Maria prohibits the lost update anomaly: https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html

BTW, how's the improvement to the documentation going https://github.com/codership/galera/issues/609#issuecomment-1046763283 ?

Comment by Jan Lindström (Inactive) [ 2022-08-23 ]

I must say the output is confusing. Let's take an example where r is a read, w is a write, c is a commit, and the number n in "tn" is the transaction identifier.

r1(x=0) w1(x=6) c1 r2(x=6) w2(x=7) c2

Now it looks like x=6 is lost, but this is a serializable (actually serial) execution of transactions t1->t2 and is totally correct. To have a lost update you should have something like:

r1(x=0) r2(x=0) w1(x=6) c1 w2(x=7) c2

Now transaction t2 has read the old value and still updates it with a new value ==> we lose the update x=6.
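The two schedules above can be replayed mechanically. A minimal sketch (ours, not Galera code; it approximates the cluster as a single copy of one variable x, with writes buffered until commit) flags the second schedule and not the first:

```python
def lost_update(schedule):
    """schedule: list of (txn, op, val) with op in {'r', 'w', 'c'}.
    Flags a lost update when two committed txns based their writes
    on the same read value of x."""
    read_val, written, committed, x = {}, {}, [], 0
    for txn, op, val in schedule:
        if op == 'r':
            read_val[txn] = x          # a read sees the last committed value
        elif op == 'w':
            written[txn] = val         # buffered until commit
        else:                          # 'c'
            x = written[txn]
            committed.append(txn)
    seen = set()
    for t in committed:
        if read_val[t] in seen:        # second committer overwrote a stale read
            return True
        seen.add(read_val[t])
    return False

# r1(x=0) w1(x=6) c1 r2(x=6) w2(x=7) c2  -> serial, no anomaly
serial = [(1, 'r', None), (1, 'w', 6), (1, 'c', None),
          (2, 'r', None), (2, 'w', 7), (2, 'c', None)]
# r1(x=0) r2(x=0) w1(x=6) c1 w2(x=7) c2  -> update x=6 is lost
lost = [(1, 'r', None), (2, 'r', None), (1, 'w', 6),
        (1, 'c', None), (2, 'w', 7), (2, 'c', None)]
print(lost_update(serial), lost_update(lost))  # False True
```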

Comment by Jan Lindström (Inactive) [ 2022-08-24 ]

Looking into the client-result.log file, this looks suspicious:

Transaction (1, 7)
READ(0, 19)
WRITE(0, 7)

But why does the tool not print the exact order of operations? Here the read gets value 19, I assume, but where did it come from? There has to be a transaction before this one that wrote that value, but in the output the write appears later.

Comment by Zhenge Chen [ 2022-08-24 ]

Hi, this write is done in another session, in transaction (2, 19); (1, 7) means (session id, transaction id). We are preparing a detailed explanation and will upload it soon.

Comment by Si Liu [ 2022-08-24 ]

Hi Jan,

We are updating the bug report: files uploaded and a readme on the way. We will let you know when it's done.

Comment by Si Liu [ 2022-08-24 ]

Hi Jan,

We have finished updating the bug issue description. Please let us know if you have any questions.

Comment by Jan Lindström (Inactive) [ 2022-08-24 ]

nobiplusplus May I have your node configuration, i.e. my.cnf? I'm especially interested in what value you use for the wsrep_sync_wait variable (I would assume it is 15). In the picture, does (2,11) mean that the write is issued on node_2 (the second node)? But how do you know the execution order on node_1 (the first node), as there it is executed as a binlog event? In my understanding, Galera does not support global distributed consistency up to the serializable level.

Comment by Si Liu [ 2022-08-24 ]

Hi Jan @Jan Lindström

A quick reply:

1. Zhenge Chen will upload my.cnf later.

2. Yes, (2,11) means the 11th txn issued by client session 2 on node 2. Each client executes txns in a closed loop: a txn starts only after its predecessor finishes. That's what "so" (session order) means in the graph.

3. We are not checking serializability. Instead, we are reporting a lost-update anomaly. According to https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html, Galera txns should be free of lost updates if issued on separate cluster nodes (we are using a cluster of two nodes). Of course, a history exhibiting lost updates also violates serializability.

Comment by Jan Lindström (Inactive) [ 2022-08-25 ]

Thanks. So you use the default, which is 0, meaning there are no causality checks; this is bad if you expect values to be consistent. It means that a transaction on, say, node_2 reading a value does not wait until all previous writes from node_1 have been applied. See more in https://galeracluster.com/library/documentation/mysql-wsrep-options.html#wsrep-sync-wait . Note that lost update is defined on a single database, i.e. it is not defined for the distributed case. This means that you should check how those transactions are executed on node_1 alone to find a lost update, and in a totally isolated way on node_2. Is your example from a distributed transaction execution showing their serializability graph? Remember that reads are not replicated in Galera to other nodes; only writes are replicated. So even if a transaction reads a different value on node_2 and then writes something, only the write is replicated to node_1.
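For reference, causality checks are enabled through a server option along these lines (a hypothetical my.cnf fragment of ours, not taken from the attached files; the bitmask values are as described in the linked wsrep documentation):

```ini
[mysqld]
# wsrep_sync_wait is a bitmask: 1 = READ (SELECT), 2 = UPDATE/DELETE,
# 4 = INSERT/REPLACE, 8 = SHOW; 15 enables the causality check for all
# of these. 0 (the default) performs no check before local statements.
wsrep_sync_wait = 15
```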

Comment by Si Liu [ 2022-08-25 ]

I'm not convinced by your claim that "lost update is defined on a single database, i.e. it is not defined for the distributed case":

1. Lost update is commonly defined independently of database architecture or implementation.

2. In the case of Galera, the document https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html explicitly says that "transactions issued on separate cluster nodes, are subject to the ‘first committer wins’ rule. This will avoid the lost update problem...". In fact, the second lost-update example given there (which Galera claims to avoid) has two transactions on separate cluster nodes.

Regarding the parameter "wsrep_sync_wait", we have one question before trying to set it to 1: the link you refer to is about MySQL wsrep, while we are actually running MariaDB-Galera. So, does the same apply to MariaDB?

Comment by Daniele Sciascia [ 2022-08-25 ]

nobiplusplus Are all of the transactions in client-result.log successfully committed? Could it be that a transaction involved in the lost update was aborted?

BTW, how's the improvement to the documentation going https://github.com/codership/galera/issues/609#issuecomment-1046763283 ?

I removed any claims on SI or "partially supporting SI" we had in the documentation.
See:

https://github.com/codership/documentation/commit/cc8d6125f1767493eb61e2cc82f5a365ecee6e7a

and

https://github.com/codership/documentation/commit/d87171b0d1b510fe59973cb7ce5892061ce67b80

Comment by Zhenge Chen [ 2022-08-25 ]

Hello, we have set wsrep_sync_wait=1, as shown in my.conf-sync.

Still, we observed lost updates on the client side, this time between transactions (1, 7) and (2, 15), as they both read val=6 and write different values.

The client-side results are in client-result-sync.log, and the server-side logs are in server-log-sync.zip.

Comment by Si Liu [ 2022-08-25 ]

Hi Daniele,

1. The transactions involved in the lost update issue are all committed.

2. Thanks for the update on the documentation!

3. Note that the issue reported here is lost update (though SI must be free of lost updates). There might be confusing statements on SI; however, I see clearly that Galera claims to avoid lost updates: https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html

BTW, should I @sciascid or @Daniele Sciascia ? Neither seemed to work.

Comment by Jan Lindström (Inactive) [ 2022-08-25 ]

nobiplusplus Is that transaction id the real transaction id from InnoDB? As noted in the documentation, the idea is that the first committer wins, and this could mean that a conflicting transaction's write might be "lost" as it is not done.

Comment by Daniele Sciascia [ 2022-08-25 ]

Could you guys reproduce the issue one more time with the following server options?

wsrep_debug=4

and

debug=d,query,exec_query

That would be helpful in analyzing the logs.

Comment by Si Liu [ 2022-08-25 ]

Hi Jan,

The txn ids reported in the violation are assigned by clients. We can always find the corresponding txn ids in the database binlog. For example, the txn ids in our last finding below correspond to Xid 29 and Xid 30.

"Still, we observed lost updates on the client side, this time between transactions (1, 7) and (2, 15), as they both read val=6 and write different values"

Regarding your interpretation of "lost update" in the documentation, the problem is that if the update is lost then the associated txn should be aborted. However, in the reported counterexamples, all txns are committed.

Comment by Zhenge Chen [ 2022-08-25 ]

Hi Daniele,

We enabled the options you mentioned and attached the logs in client-result-debug.log and server-logs-debug.zip .

This time, the conflicting transactions are (1, 8) and (2, 14) on the client side, and they have Xid=32,33, GTID=0-1-67,0-1-68 in the binlog.

Comment by Daniele Sciascia [ 2022-08-26 ]

Hi Zhenge,

Thanks! Unfortunately, something went wrong: with the option "debug=d,query,exec_query" we should see the SQL commands logged in the mariadb_error.log files. For some reason, they are not there. Did you add the option to the configuration file?

Comment by Zhenge Chen [ 2022-08-27 ]

It turns out that this option only works with debug builds... I re-uploaded these two files.

The conflicting transactions are (1, 4) and (2, 13) from the client side, both read var=0, val=3. Their Xids in the binary log are 26 and 27.

Comment by Jan Lindström (Inactive) [ 2022-08-31 ]

nobiplusplus czg A Galera cluster can provide only those consistency levels that are supported by the InnoDB storage engine. Now, InnoDB repeatable read has some adaptations for MVCC; see https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html. Remember that UPDATE will also do a read, and this read does not come from the consistent view created by the SELECT: as that SELECT does not take row locks, values may have changed by the time the UPDATE reads them again using row locking. Currently, the only way to make sure that the SELECT and the UPDATE see exactly the same database state is to use SELECT ... FOR UPDATE, which will set row locks.
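Applied to the table from this report, the locking pattern described above would look like the following (our sketch; the value written is an arbitrary placeholder):

```sql
START TRANSACTION;
-- FOR UPDATE takes row locks, so the value read here cannot be changed
-- by a concurrent local transaction before the UPDATE below runs.
SELECT val FROM dbcop.variables WHERE var = 0 FOR UPDATE;
UPDATE dbcop.variables SET val = 42 WHERE var = 0;  -- 42: placeholder value
COMMIT;
```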

Comment by Daniele Sciascia [ 2022-09-01 ]

The claim that Galera prevents "lost update" on transactions issued on different nodes appears to be wrong: https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html

In the article above, the following example is given:

# initially table t contains (1,0)
# trx 1 and trx 2 execute on node 1 and node 2, respectively 
trx 1: begin;
trx 2: begin;
trx 1: select * from t; # returns row (1,0)
trx 2: select * from t; # returns row (1,0)
trx 1: update t set j=1 where i=1;
trx 2: update t set j=1 where i=1;
trx 1: commit; # success
trx 2: commit; # deadlock error

The execution above happens to prevent a "lost update". However, it is sufficient to swap the order of the commit of trx 1 and the update of trx 2 to obtain an execution in which both transactions commit successfully:

# initially table t contains (1,0)
# trx 1 and trx 2 execute on node 1 and node 2, respectively 
trx 1: begin;
trx 2: begin;
trx 1: select * from t; # returns row (1,0)
trx 2: select * from t; # returns row (1,0)
trx 1: update t set j=1 where i=1;
trx 1: commit; # success
trx 2: update t set j=1 where i=1; # make sure this happens after the commit of trx 1 is applied to node 2
trx 2: commit; # success

Notice that, for both transactions to commit, the "update" of trx 2 must happen after the commit of trx 1 is applied to node 2.

Comment by Zhenge Chen [ 2022-09-01 ]

I have confirmed Daniele's finding using our tool. So, is that simply an incorrect example, or does Galera actually not prevent lost updates BY DESIGN?

Comment by Daniele Sciascia [ 2022-09-02 ]

czg The documentation is wrong. Galera does not prevent lost updates.
Thanks for trying your tool on Galera and for bringing this up. On our part, we will remove that claim from the documentation.

Comment by Si Liu [ 2022-09-02 ]

Thanks, Daniele, for your confirmation and help along the way!

Generated at Thu Feb 08 10:05:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.