[MDEV-22258] Limit innodb_encryption_threads to 255 Created: 2020-04-16 Updated: 2020-12-17 Resolved: 2020-05-20 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 10.1, 10.2, 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.5.4, 10.1.46, 10.2.33, 10.3.24, 10.4.14 |
| Type: | Bug | Priority: | Major |
| Reporter: | Roel Van de Paar | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | affects-tests | ||
| Issue Links: |
|
||||||||||||
| Description |
|
Leads to:
Bug confirmed present in: Bug confirmed not present in: |
| Comments |
| Comment by Roel Van de Paar [ 2020-04-16 ] | |||||||||||||||||||||||||||||||||||||||||||
|
axel Marko informs me that a bug fix is trivial once he has some reasonable suggestion for a limit. Would you have a recommendation from your experience? Thank you! | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Axel Schwenke [ 2020-04-16 ] | |||||||||||||||||||||||||||||||||||||||||||
|
From the discussion with marko in Slack it seems as if this background thread does nothing more than marking pages as dirty so they would be picked up by the checkpointing mechanism and (re)encrypted. I fail to see why one would need multiple such threads at all. The encryption itself and the IO would be the bottleneck of that background operation. I will however do some testing to find out if multiple such threads do any good. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2020-04-25 ] | |||||||||||||||||||||||||||||||||||||||||||
|
This testcase;
Leads to another double assert:
| |||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2020-05-04 ] | |||||||||||||||||||||||||||||||||||||||||||
|
This testcase:
Produces the issue quicker without CLI/Server hangs. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2020-05-04 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Discussing with Axel. Test done (and same result for innodb_sync_array_size=1000);
| |||||||||||||||||||||||||||||||||||||||||||
| Comment by Axel Schwenke [ 2020-05-04 ] | |||||||||||||||||||||||||||||||||||||||||||
|
That backtrace from Roel hints that the number of innodb_encryption_threads counts against the number of sync array slots (srv_max_n_threads). Having too many innodb_encryption_threads would just fill the sync array and eventuelly overflow it and then crash the server. The limit for innodb_encryption_threads should hence depend on srv_max_n_threads. It should probably be no larger than 1/10th of it or the encryption threads would disrupt normal operations. Now I see in | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2020-05-04 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Proposal to set it to an arbitrary but more sensible limit? 255? 100? 10? | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-05-20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
I would prefer an exact proposal, so that I do not have to investigate too much myself. Now that innodb_sync_array_size was mentioned, I see that its minimum value is 1. Does that minimum value cause crashes if encryption is not involved? | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-05-20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
I performed a quick benchmark myself. I believe that this test keeps the server idle while the files are being encrypted in the background.
This was on a system with 2×10 CPU cores, or 40 threads in total with hyperthreading. I also tried innodb_encryption_threads=255 and innodb_sync_array_size=1 without any problem. The test execution time was about the same as with innodb_encryption_threads=16. Based on this experiment, I think that 255 would be a reasonable upper limit. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Axel Schwenke [ 2020-05-20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
marko the innodb_sync_array_size variable is misnamed. This variable does not set the size of the sync array, but in how many partitions it is split (to reduce mutex contention). The crash with very big innodb_encryption_threads happens because each such thread allocates one slot in the sync array. Then the array overflows eventually. The sync array is statically sized using srv_max_n_threads. I could not find where this in turn is set, but it must be somewhere when InnoDB is initialized. Probably related to max_connections? In any case there is an implied relationship between the size of the sync array and the number of threads in InnoDB. It was just never foreseen that there could be a huge number of internal threads using the sync array. IMHO the whole logic for determining the size of the sync array should be checked and fixed. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-12-17 ] | |||||||||||||||||||||||||||||||||||||||||||
|
axel, sorry, I missed your comment until now. In After I repeated my previous benchmark, on 10.5 and 10.6, both compiled with the following:
This was with 10.5 e8217d070fc3e60870131615a48515836c773b07 and 10.6 af1335c2d6c3f3d656e4227bc0c925c5f7051d7e. So, we still seem to have some benefit from multiple threads. But, what if we use innodb_encryption_threads=1 (there is a single buf_flush_page_cleaner thread that does the actual work, after all) and a larger value of innodb_encryption_rotation_iops?
There is some random variation in the numbers, but we still seem to benefit from multiple threads that are dirtying the pages concurrently. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-12-17 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Having a large number of encryption threads may lead to extreme thrashing, as noted in |