[MDEV-11972] Assertion failure in sync_array_get_and_reserve_cell() Created: 2017-02-02 Updated: 2020-09-06 Resolved: 2020-09-06 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.0.9, 10.1.0, 10.1.19, 10.2 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Ganesan | Assignee: | Marko Mäkelä |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | innodb, need_feedback, need_rr, upstream | ||
| Environment: |
Microsoft Windows Server 2012 R2 Standard |
||
| Description |
|
MariaDB 10.1.19 crashed with exception 0x80000003 and out of memory
|
| Comments |
| Comment by Daniel Black [ 2017-02-06 ] | ||||
|
This looks like an out-of-memory situation. While MariaDB could handle this condition more gracefully, it is ultimately the user's responsibility to provide sufficient hardware and an appropriate configuration so that MariaDB never runs out of memory. https://mariadb.com/kb/en/mariadb/mariadb-memory-allocation/ may provide some useful tips on configuration. If this is a major memory-consuming query, consider using T.T_STAMP = p.T_STAMP as a JOIN criterion rather than a WHERE criterion, and ensure that the column is indexed correctly. Also consider moving the data in batches of 1000 rows or so rather than running the entire query in one go. I'd suggest you vote on MDEV-8307 if this meets your use case. | ||||
| Comment by Elena Stepanova [ 2017-02-06 ] | ||||
|
Given that the system has 256G memory and the buffer pool alone is 100G, I don't think a hardware deficiency is to blame here. Of course, it would be helpful to know more about the data and the query, as well as the situation on the machine when the problem occurred. Marappa, | ||||
| Comment by Ganesan [ 2017-02-06 ] | ||||
|
This happened only once. No information is available about the spike at that time. | ||||
| Comment by Daniel Black [ 2017-02-06 ] | ||||
|
Sorry, I didn't (and still don't) see the hardware/buffer pool memory size. My mention of MDEV-8307 was based on this appearing to be a batch insert before a delete. An ```INSERT INTO .. DELETE FROM ... LIMIT 1000 RETURNING ...``` could be run multiple times to limit memory; however, DELETE from multiple tables with LIMIT isn't supported. In hindsight it's not the most productive idea. I suggest adding more memory monitoring on the server; you can also monitor memory per node, which could rule out or confirm whether NUMA is a cause. If VMware is manipulating memory, that needs to be monitored too. | ||||
| Comment by Ganesan [ 2017-02-15 ] | ||||
|
There are about 67 jobs (events) running, some every 1 minute, some every 3 minutes and some every 10 minutes, processing data from 483 tables by creating many temporary tables and loading the processed data into 154 summary tables. InnoDB buffer pool usage is always at 99%. See below ---------------------- ---------------------- innodb_engine_status.txt | ||||
| Comment by Marko Mäkelä [ 2017-05-09 ] | ||||
|
The failing assertion and the preceding code are pretty interesting:
This code was introduced into MariaDB 10.0 and 10.1 by a merge of MySQL 5.6.15, which contains It looks like the author knew that the assertion is invalid, but still decided to make it a hard assertion that can crash the production server; not a debug assertion. | ||||
| Comment by Marko Mäkelä [ 2017-05-09 ] | ||||
|
I think that the algorithm should be changed to loop over all sync_wait_array[] until an available slot is found. There is a theoretical possibility that the sync_wait_array[] runs out of slots when there is enough contention between the threads in the server, or if multiple threads are holding a large number of rw-locks or mutexes, such as when multiple threads are splitting (different) b-trees concurrently. | ||||
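The proposed change (try every wait array before giving up) can be illustrated with a minimal, self-contained C++ sketch. This is not the InnoDB code: `Cell`, `SyncArray`, `SyncArrayPool` and their members are hypothetical stand-ins chosen for illustration, and the rotating start index is an assumption about how load might be spread across the arrays.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a wait cell; not the InnoDB sync_cell_t.
struct Cell {
    bool in_use = false;
};

// Hypothetical stand-in for one wait array with a fixed number of cells.
struct SyncArray {
    explicit SyncArray(std::size_t n_cells) : cells(n_cells) {}

    // Returns a free cell, or nullptr when every cell is taken.
    Cell* reserve() {
        std::lock_guard<std::mutex> guard(mutex);
        for (Cell& cell : cells) {
            if (!cell.in_use) {
                cell.in_use = true;
                return &cell;
            }
        }
        return nullptr;
    }

    std::mutex mutex;
    std::vector<Cell> cells;
};

// Hypothetical pool of wait arrays (illustrative, not sync_wait_array[]).
struct SyncArrayPool {
    SyncArrayPool(std::size_t n_arrays, std::size_t cells_per_array) {
        for (std::size_t i = 0; i < n_arrays; ++i) {
            arrays.push_back(std::make_unique<SyncArray>(cells_per_array));
        }
    }

    // Visits each array at most once, starting from a rotating index.
    // Returns nullptr (and sets *out_array to nullptr) only when all
    // arrays are full or the pool is empty, so the caller can decide
    // how to react instead of hitting a hard assertion.
    Cell* get_and_reserve_cell(SyncArray** out_array) {
        *out_array = nullptr;
        const std::size_t n = arrays.size();
        if (n == 0) {
            return nullptr;
        }
        const std::size_t start = next.fetch_add(1, std::memory_order_relaxed);
        for (std::size_t i = 0; i < n; ++i) {
            SyncArray* candidate = arrays[(start + i) % n].get();
            if (Cell* cell = candidate->reserve()) {
                *out_array = candidate;
                return cell;
            }
        }
        return nullptr;
    }

    std::vector<std::unique_ptr<SyncArray>> arrays;
    std::atomic<std::size_t> next{0};
};
```

With a pool of two single-cell arrays, the first two calls reserve one cell in each array and the third call returns `nullptr`, which the caller would have to handle (for example by retrying or waiting) rather than crashing.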
| Comment by Marko Mäkelä [ 2017-05-09 ] | ||||
|
The same problem exists in MariaDB 10.2 as well. Only in the merge of MySQL 5.7.9 was the assertion changed to ut_a(*cell != NULL), just as it was changed by the MySQL 5.7 version of the Oracle change. | ||||
| Comment by Marko Mäkelä [ 2017-05-11 ] | ||||
|
On a related note, at least the following functions (or their callers) seem to be broken when sync_array_size == 1 does not hold: I think that we should try to get rid of the sync_array altogether. | ||||
| Comment by Marko Mäkelä [ 2017-05-18 ] | ||||
|
One more flaw in sync_array_get_and_reserve_cell() is that if srv_sync_array_size==0, it would return an uninitialized value. This will cause GCC 7.1.0 to emit warnings for the callers of this function when building with -O3 (several warnings like the following):
| ||||
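The uninitialized-return flaw described above follows a common pattern: a result variable that is assigned only inside a loop stays indeterminate when the loop runs zero times. The sketch below shows the usual remedy under stated assumptions; `WaitArray`, `free_cells` and `pick_array` are hypothetical names for illustration, not the InnoDB identifiers, and the container size plays the role of `srv_sync_array_size`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one wait array; only tracks free capacity.
struct WaitArray {
    std::size_t free_cells;
};

// Picks the first array with free capacity, starting at `preferred`.
// The result is initialized up front, so an empty `arrays` (the
// analogue of a zero array count) yields a well-defined nullptr
// instead of an indeterminate pointer.
WaitArray* pick_array(std::vector<WaitArray>& arrays, std::size_t preferred) {
    WaitArray* picked = nullptr;
    const std::size_t n = arrays.size();
    for (std::size_t i = 0; i < n && picked == nullptr; ++i) {
        WaitArray& candidate = arrays[(preferred + i) % n];
        if (candidate.free_cells > 0) {
            picked = &candidate;
        }
    }
    return picked;
}
```

Initializing the result this way also gives the compiler a definite value on every path, which is typically what silences "may be used uninitialized" diagnostics in callers.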
| Comment by Marko Mäkelä [ 2020-08-07 ] | ||||
|
I do not think that we have fixed anything directly in this area. |