[MDEV-23867] insert... select crash in compute_window_func Created: 2020-10-01 Updated: 2021-05-18 Resolved: 2020-10-23 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer - Window functions, Server |
| Affects Version/s: | 10.4.14, 10.2, 10.3, 10.5 |
| Fix Version/s: | 10.2.35, 10.3.26, 10.4.16, 10.5.7 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Rick Pizzi | Assignee: | Varun Gupta (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Description |
|
A customer is experiencing crashes when an ETL query is executed. |
| Comments |
| Comment by Varun Gupta (Inactive) [ 2020-10-02 ] | ||||||||||||||||||||||||||||||||||||
|
rpizzi can you also attach the global variables list | ||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-10-02 ] | ||||||||||||||||||||||||||||||||||||
|
reproducible test case, with encrypt-tmp-files=ON
| ||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-10-02 ] | ||||||||||||||||||||||||||||||||||||
|
The stacktrace for the failure on debug build
| ||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-10-06 ] | ||||||||||||||||||||||||||||||||||||
|
Patch addressing part of the problem http://lists.askmonty.org/pipermail/commits/2020-October/014343.html | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-08 ] | ||||||||||||||||||||||||||||||||||||
|
Ok, so I see two issues that have caused this MDEV. Both are caused by two features not taking each other into account:
Issue #1: memory allocation.An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data, decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc). The logic to do this is here in init_io_cache():
init_slave_io_cache() doesn't know about this. It allocates memory like so:
This is why we get a memory overrun. Varun's patch tries to fix this by copying the logic:
I would suggest a dedicated member saying how much was allocated. Issue #2: IO_CACHE::seek_not_doneWhen IO_CACHE objects are cloned, they still share the file descriptor. In order to avoid this, there is IO_CACHE::next_file_user which is a circular list of objects using the same file. If an IO_CACHE operates on the file, it it sets IO_CACHE::seek_not_done=1 for every other IO_CACHE in the list. _my_b_cache_read has pieces like:
But my_b_encr_read doesn't do this. This needs to be fixed. | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-08 ] | ||||||||||||||||||||||||||||||||||||
|
varun but you've also mentioned there was a 3rd problem? I can't observe it on my side- when I apply the attached patch, the example works correctly for me. | ||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-10-09 ] | ||||||||||||||||||||||||||||||||||||
|
psergey the problem you mentioned in Issue #2 is the one I was talking about.
So your approach covers it | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-16 ] | ||||||||||||||||||||||||||||||||||||
|
Draft patch: https://github.com/MariaDB/server/commit/e877e752ba2c0f107d4732301dc4a41d2d57a79d . Input provided over email. | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-21 ] | ||||||||||||||||||||||||||||||||||||
|
Two concerns about the write -> pwrite conversion: Concern #1:
Imagine the following situation: Execution before this patch:
After the patch this becomes:
Is there any logic which would allow to say that the above will not happen? | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-21 ] | ||||||||||||||||||||||||||||||||||||
|
Concern #2 is this code in mi_check.c:
Does the code that follows this assume that info->dfile is seeked to some particular offset?
.. and I have the same question about sort_delete_record() in storage/maria/ma_check.c.
Before the patch, row_info->rec_cache.file was "seeked" to somewhere, now it is not. Any thoughts? If it is not immediately clear, let's ask Monty (this is very old code so I dont think anybody else rememers). | ||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2020-10-21 ] | ||||||||||||||||||||||||||||||||||||
|
No problem with mi_check or ma_check(), They are taking the file descriptor of the newly re-created data file and removing a duplicate row from it. The remove-row code is using pread/pwrite and thus is not affected by the position of the IO_CACHE file descriptor and will not affect the IO_CACHE code | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-22 ] | ||||||||||||||||||||||||||||||||||||
|
Take-aways from yesterday discussion:
Due to this:
| ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-23 ] | ||||||||||||||||||||||||||||||||||||
|
Ok to push the "conservative" fix for aec8072f99ef4b1a4a966692a1cb761abde96cf0 . | ||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-28 ] | ||||||||||||||||||||||||||||||||||||
|
varun please file an MDEV for the "complete" fix and link with this one |