[MDEV-11004] Unable to start (Segfault or os error 2) when encryption key missing Created: 2016-10-10 Updated: 2016-11-29 Resolved: 2016-10-29 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Encryption, Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 10.1.18 |
| Fix Version/s: | 10.1.19 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Richard Oakham | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | innodb | ||
| Environment: |
Debian Jessie (up to date as of 10/10/16), MariaDB 10.1.18, 24GB RAM, 16 core Intel processor, 2TB data (90% free) |
||
| Description |
|
Updated to MariaDB 10.1.18 on a development server. Tables are encrypted using data-at-rest (DAR) encryption, with key 1 for system use and key 19 for specific tables.

Prior to 10.1.18 you could start the server with only key 1 available (this file is always available on the server). The server would fail to access any table encrypted with another key, but it would still start up. We would then restart with both keys available to get access to the main data files. This method let updates (apt-get update) work correctly without the data being available on restart. It is a protection against someone gaining root access to the server: key 19 is installed only for data restarts after upgrades and is then removed from the system.

With 10.1.18, a restart without key 19 fails with OS error number 2, as if the file cannot be accessed. The file can be accessed, however: starting with key 19 present works as expected (full startup). This breaks updates, because the update process has only key 1, not key 19. Even after starting with both keys and then doing a formal shutdown, a restart without key 19 fails. This is different from previous behaviour.

Expected behaviour: MariaDB starts up, but the tables encrypted with a missing key are not available until the key is provided at restart.

Note that, in the log below, the table name or database name does not matter: moving a reported "broken" table out (so that the tablespace gets updated without that table) simply moves the issue to the next file or database, depending on what the server is trying to open.

Log entries
|
| Comments |
| Comment by Richard Oakham [ 2016-10-10 ] |
|
Addendum: The update process (apt-get update) will fail as follows: Setting up mariadb-server-10.1 (10.1.18+maria-1~jessie) ... |
| Comment by Elena Stepanova [ 2016-10-11 ] |
|
Oakham, the provided error log relates to the case where the server was started after a crash or an otherwise unclean shutdown. I can reproduce this refusal to start, but it is not new: at least 10.1.17 behaves the same way. |
| Comment by Richard Oakham [ 2016-10-12 ] |
|
Log files attached, with annotations describing what I was doing at the time. All my annotations start with **. I have also updated the title of this bug because of the segfaults showing. logset01: original upgrade attempt, followed by starting up with one key, then both keys, then back to one key (both signal 11 and OS error 2 failures to start), then moving one data directory out of access to see whether the issue was that directory or all databases |
| Comment by Elena Stepanova [ 2016-10-12 ] |
|
Oakham, thanks for all the data, we'll review it in detail shortly. At first glance, the root problem here is the crash with SIGSEGV. After it happens, the next server startup triggers InnoDB recovery, which fails with "We do not continue the crash recovery" etc. According to the error text, this part (not continuing recovery without all the keys) is intentional. Maybe it can be reconsidered, but it's not new. On a previous version you didn't have the crash, hence there was no need for recovery, hence the failure to recover was never triggered. But the crash is apparently new, and it obviously mustn't happen. |
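The chain of events described above (clean shutdown versus crash, then startup with a key missing) can be illustrated with a small model. This is a minimal sketch and not MariaDB source code: `Table`, `StartupReport`, `start_server`, and the `RecoveryRefused` result are hypothetical names invented for illustration. It assumes only what the thread states: after a clean shutdown, tables whose key is missing are skipped, while after a crash, recovery is refused unless all keys are present.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the startup behaviour discussed in this issue.
// None of these names come from the MariaDB code base.
enum class StartupResult { Started, RecoveryRefused };

struct Table {
    std::string name;
    unsigned key_id;  // encryption key the table was written with
};

struct StartupReport {
    StartupResult result;
    std::vector<std::string> unavailable;  // tables skipped for lack of a key
};

StartupReport start_server(const std::vector<Table>& tables,
                           const std::set<unsigned>& available_keys,
                           bool clean_shutdown) {
    StartupReport report{StartupResult::Started, {}};
    for (const Table& t : tables) {
        if (available_keys.count(t.key_id) != 0) {
            continue;  // key present, table is usable
        }
        if (!clean_shutdown) {
            // Assumption from the thread: crash recovery does not
            // continue when a required key is missing.
            report.result = StartupResult::RecoveryRefused;
            report.unavailable.clear();
            return report;
        }
        // Clean shutdown: the server starts, this table stays unavailable.
        report.unavailable.push_back(t.name);
    }
    return report;
}
```

In this model, starting with only key 1 after a clean shutdown yields `Started` with the key 19 table listed as unavailable, which is the behaviour the reporter expects. The same call after a crash yields `RecoveryRefused`, which matches the refusal to start seen once the SIGSEGV has occurred.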
| Comment by Richard Oakham [ 2016-10-12 ] |
|
Hi. |
| Comment by Richard Oakham [ 2016-10-12 ] |
|
Added a GDB thread backtrace of the sigsegv occurring - debug build using 10.1 branch from Git |
| Comment by Elena Stepanova [ 2016-10-12 ] |
|
Thanks for this. Assigning to jplindst, as the full stack trace might be enough for him to see the cause of the problem. |
| Comment by Richard Oakham [ 2016-10-18 ] |
|
Hi - is there anything else I can add to this to help track it down? |
| Comment by Jan Lindström (Inactive) [ 2016-10-29 ] |
|
commit 885577fb10cba63a4a140bd91257a3cd2b402159 Two problems: (1) When pushing warning to sql-layer we need to check that thd != NULL (2) At tablespace key rotation if used key_id is not found from |