[MDEV-19660] wsrep_rec_get_foreign_key() is dereferencing a stale pointer to a page that was previously latched Created: 2019-05-31  Updated: 2020-08-28  Resolved: 2019-07-09

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 5.5.33a-galera, 10.0.19-galera, 10.1.6, 10.2.0, 10.3.0, 10.4.0
Fix Version/s: 10.2.26, 10.1.41, 10.3.17, 10.4.7

Type: Bug Priority: Major
Reporter: Thirunarayanan Balathandayuthapani Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 1
Labels: corruption, foreign-keys, race


 Description   

In row_ins_foreign_check_on_constraint(), clustered index record is being passed to wsrep_append_foreign_key() after releasing the latch. If a record has been changed by other thread in the meantime then it could lead to a crash when
wsrep_rec_get_foreign_key () tries to access the record.

The following is the problematic code :

        btr_pcur_store_position(pcur, mtr); 
 
        if (index == clust_index) {
                btr_pcur_copy_stored_position(cascade->pcur, pcur);
        } else {
                btr_pcur_store_position(cascade->pcur, mtr);
        }
 
        mtr_commit(mtr);
 
        ut_a(cascade->pcur->rel_pos == BTR_PCUR_ON);
        
        cascade->state = UPD_NODE_UPDATE_CLUSTERED;
        
#ifdef WITH_WSREP
        err = wsrep_append_foreign_key(
                                        thr_get_trx(thr),
                                        foreign,
                                        clust_rec,
                                        clust_index,
                                        FALSE,
                                        (node) ? TRUE : FALSE);



 Comments   
Comment by Marko Mäkelä [ 2019-05-31 ]

As far as I can tell, this was introduced in 5.5.25-galera, 10.0.19-galera, 10.1.6.

Comment by Jan Lindström (Inactive) [ 2019-06-04 ]

In row_ins_check_foreign_constraint same function is called inside a active mtr.

Comment by Jan Lindström (Inactive) [ 2019-06-04 ]

https://github.com/MariaDB/server/commit/42a1ad314700b705077333f42393250c978c92d7

Comment by Marko Mäkelä [ 2019-06-10 ]

I think that we need a test case that exercises the error handling code. Note: I am not asking for a test that reproduces the race condition.

Also, please address my review comments regarding the code changes.

Comment by Marko Mäkelä [ 2019-06-10 ]

I wonder if we could simply replace clust_rec with cascade->pcur->old_rec in the call.

Comment by Jan Lindström (Inactive) [ 2019-06-18 ]

https://github.com/MariaDB/server/commit/859052dfcc3da3d61e3023c7401ef009cea50bcd

Comment by Marko Mäkelä [ 2019-06-25 ]

Can we merely replace the clust_rec with cascade->pcur->old_rec and omit all other changes? I am concerned about adding so much code for error handling or reporting, especially when that code is not being backed by an ‘organic’ test case that does not resort to fault injection.

Comment by Jan Lindström (Inactive) [ 2019-07-01 ]

Sure, I will do that for 10.1-10.4.

Generated at Thu Feb 08 08:53:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.