[MDEV-15752] Possible race between DDL and accessing I_S.INNODB_TABLESPACES_ENCRYPTION Created: 2018-04-02 Updated: 2023-04-27 |
|
| Status: | Confirmed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.1, 10.2, 10.3 |
| Fix Version/s: | 10.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Thirunarayanan Balathandayuthapani |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
https://api.travis-ci.org/v3/job/360919346/log.txt
Not reproducible so far. |
| Comments |
| Comment by Marko Mäkelä [ 2018-04-07 ] | |||||||||||||||||||||||||||
|
while some DDL operation is in progress (even on non-encrypted tables). In this case, a good candidate is ALTER TABLE on a partitioned table:
I do not understand why fil_space_crypt_get_status() needs to invoke buf_page_get_gen() at all. Shouldn’t this information be updated directly in crypt_data when the first page is read or updated in the first place? In threads
The call fil_space_get_page_size() can result in !found even if the tablespace exists, because it is invoking fil_space_get_space() which contains some non-trivial logic. I believe that the code can and should be made more robust:
A more proper fix would be to remove this whole function, to make the crypt_data a more integral part of fil_space_t, and to ensure that crypt_data is always updated in sync with any access to the file (be it the very first time the first page is read, or any update of the encryption status). | |||||||||||||||||||||||||||
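The design suggested above (keeping the encryption status inside the tablespace object and updating it whenever the first page is read or written) can be sketched as follows. This is a minimal illustration under stated assumptions, not MariaDB code: `tablespace_t`, `crypt_status_t`, `on_page0_io()` and `crypt_status()` are hypothetical names standing in for `fil_space_t` and its `crypt_data`, and a single `std::mutex` stands in for whatever latch would protect the tablespace metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the cached encryption state of one tablespace.
struct crypt_status_t {
    uint32_t key_version = 0;     // meaning of 0 is an assumption of this sketch
    bool     page0_seen  = false; // set once the first page has been read or written
};

// Hypothetical stand-in for fil_space_t with the crypt data as an integral member.
class tablespace_t {
public:
    // Called from the code path that reads or rewrites the first page, so the
    // cached status is updated in sync with the file access itself.
    void on_page0_io(uint32_t key_version) {
        std::lock_guard<std::mutex> guard(mutex_);
        crypt_.key_version = key_version;
        crypt_.page0_seen  = true;
    }

    // Status query: copies the cached state under the same mutex and never
    // fetches a page, so it cannot race with DDL over a buffer pool read.
    // Returns false if the first page has not been seen yet.
    bool crypt_status(crypt_status_t* out) const {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!crypt_.page0_seen) {
            return false;
        }
        *out = crypt_;
        return true;
    }

private:
    mutable std::mutex mutex_;  // one latch for metadata and crypt state
    crypt_status_t     crypt_;
};

// Small usage example for the sketch.
inline void sketch_usage() {
    tablespace_t   space;
    crypt_status_t st;
    assert(!space.crypt_status(&st));  // nothing cached before page 0 is seen
    space.on_page0_io(3);
    assert(space.crypt_status(&st) && st.key_version == 3);
}
```

With this shape, a reader of the status (such as an INFORMATION_SCHEMA query) only copies cached fields, so it would not need a page fetch like the `buf_page_get_gen()` call questioned above.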
| Comment by Marko Mäkelä [ 2018-04-07 ] | |||||||||||||||||||||||||||
|
I pushed a work-around to 10.1 that should reduce the probability of the race, but not prevent it altogether. I think that we should tightly integrate crypt_data with fil_space_t and as part of that effort remove the function fil_crypt_read_crypt_data(). | |||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-24 ] | |||||||||||||||||||||||||||
|
Related to this, I think that we should remove SYNC_NO_ORDER_CHECK, especially for the encryption mutexes and rw-locks. Maybe we could use fil_space_t::latch or just fil_space_acquire() for encryption, and no separate mutex in fil_space_t::crypt_data?
| |||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-24 ] | |||||||||||||||||||||||||||
|
It may be necessary to run […] to repeat the hangs, because the hangs could depend on some purge or change buffer workload created by earlier tests. | |||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-24 ] | |||||||||||||||||||||||||||
|
The test encryption.innodb_encryption_tables seems to always fail. When running the test by itself, you just have to wait 10 minutes for the first message to appear:
The message will be repeated several times per second. | |||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-07-01 ] | |||||||||||||||||||||||||||
|
In […], the latching order checks were replaced in […]. elenst, have you encountered this bug in any version recently? | |||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2021-07-01 ] | |||||||||||||||||||||||||||
|
No, I don't have any recent records of hitting it. | |||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-07-01 ] | |||||||||||||||||||||||||||
|
I see that the function fil_crypt_read_crypt_data() still exists. In an earlier comment, I suggested that it should be removed. |