[MXS-3332] MaxScale 2.5.5. Signal 11 on database select Created: 2020-12-09 Updated: 2021-03-10 Resolved: 2021-03-10 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core |
| Affects Version/s: | 2.5.5 |
| Fix Version/s: | 2.5.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Bryan Alsdorf | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Environment: |
Ubuntu 18.04.4 LTS connecting to single SkySQL Server. |
||
| Attachments: |
|
| Sprint: | MXS-SPRINT-122, MXS-SPRINT-123, MXS-SPRINT-124, MXS-SPRINT-125, MXS-SPRINT-126 |
| Description |
|
When selecting a database, MaxScale crashes with signal 11 (segmentation fault).
Specifying a database on the initial connection works fine.
Stacktrace:
|
| Comments |
| Comment by Bryan Alsdorf [ 2020-12-10 ] |
|
I successfully downgraded to 2.4 as a workaround. |
| Comment by markus makela [ 2021-01-11 ] |
|
Based on the stacktrace, the most likely cause of this is the use of session_track_schema=ON in the server configuration. MaxScale obviously shouldn't crash even when it is on but, as a workaround, I believe it can be disabled. Initial testing with the community 10.5.8 binaries in a Docker container doesn't reveal anything wrong in the current code. |
| Comment by markus makela [ 2021-01-11 ] |
|
Tested with the 10.5.8 enterprise packages and MaxScale 2.5.5 but wasn't able to reproduce it. Same story with 10.5.5-3. |
| Comment by markus makela [ 2021-01-11 ] |
|
Looking at the code, the most likely reason for the crash is a read past the end of the buffer. The code assumes that the server sends valid packets, and I'm guessing that the server sent something that MaxScale wasn't able to decode, which in turn caused it to think it needed to read more data than was available. |
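For illustration only, here is a minimal sketch of the kind of bounds check that would prevent a read past the end of a buffer when decoding length-encoded fields of the MySQL protocol. This is not MaxScale's actual code: `Reader`, `read_lenenc_int` and `read_lenenc_string` are hypothetical names invented for this example, and the sketch assumes the whole packet payload is already in one contiguous buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical cursor over a received packet payload (not a MaxScale type).
struct Reader
{
    const uint8_t* pos;
    const uint8_t* end;

    size_t remaining() const
    {
        return static_cast<size_t>(end - pos);
    }
};

// Reads a MySQL length-encoded integer. Returns false instead of reading
// past the end when the buffer is too short or the prefix byte is not a
// valid length prefix.
bool read_lenenc_int(Reader& r, uint64_t* out)
{
    if (r.remaining() < 1)
    {
        return false;
    }

    const uint8_t first = r.pos[0];
    size_t extra = 0;

    if (first < 0xfb)
    {
        extra = 0;      // the byte itself is the value
    }
    else if (first == 0xfc)
    {
        extra = 2;      // 2-byte little-endian value follows
    }
    else if (first == 0xfd)
    {
        extra = 3;      // 3-byte little-endian value follows
    }
    else if (first == 0xfe)
    {
        extra = 8;      // 8-byte little-endian value follows
    }
    else
    {
        return false;   // 0xfb (NULL) and 0xff are not valid lengths here
    }

    if (r.remaining() < 1 + extra)
    {
        return false;   // truncated integer
    }

    uint64_t value = first;

    if (extra > 0)
    {
        value = 0;
        for (size_t i = 0; i < extra; ++i)
        {
            value |= static_cast<uint64_t>(r.pos[1 + i]) << (8 * i);
        }
    }

    r.pos += 1 + extra;
    *out = value;
    return true;
}

// Reads a length-encoded string. The declared length is validated against
// the bytes actually available before anything is copied, and the reader
// is left untouched on failure.
bool read_lenenc_string(Reader& r, std::string* out)
{
    Reader tmp = r;
    uint64_t len = 0;

    if (!read_lenenc_int(tmp, &len) || len > tmp.remaining())
    {
        return false;
    }

    out->assign(reinterpret_cast<const char*>(tmp.pos), static_cast<size_t>(len));
    tmp.pos += len;
    r = tmp;
    return true;
}
```

With a check like this, a packet that declares more data than it contains is rejected by the caller rather than dereferenced.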
| Comment by Bryan Alsdorf [ 2021-01-11 ] |
|
I've attached the output of SHOW GLOBAL VARIABLES for the server. Since it is SkySQL, I do not have access to the raw configuration files. You are correct that session_track_schema is on. |
| Comment by markus makela [ 2021-03-08 ] |
|
Bryan, do you remember if this ever happened with the readwritesplit router? |
| Comment by Bryan Alsdorf [ 2021-03-08 ] |
|
I do not believe we had readwritesplit set up (unless it is enabled by default). We only had one backend enabled, though. |
| Comment by markus makela [ 2021-03-08 ] |
|
If you have the chance to test this again, try changing the router from readconnroute to readwritesplit. This should give us more information about whether this is only a problem with readconnroute or if it affects readwritesplit as well. The number of backends doesn't matter in this case. I did find a bug where readconnroute was unnecessarily executing the code where the crash happened. With the fix in place I wouldn't expect this to happen again, but I don't yet fully understand why the crash happened in the first place, which is why reproducing this would still be valuable. |
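As a rough sketch, the router switch described above would be a one-line change in the service section of maxscale.cnf. The section name, server name and credentials below are placeholders, not values from the reporter's setup.

```ini
# Hypothetical service definition; only the router line is the point here.
[Read-Service]
type=service
# Previously: router=readconnroute
router=readwritesplit
servers=server1
user=maxscale_user
password=maxscale_pw
```

MaxScale needs to be restarted for the changed router to take effect.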
| Comment by markus makela [ 2021-03-10 ] |
|
I'll close this as fixed in 2.5.9 since the code that caused this isn't going to be used by readconnroute anymore. In addition, with the exact same versions of the products and the steps described in the issue, I wasn't able to reproduce the problem. If you can, please let us know if you see this again now that 2.5.9 has been released. |
| Comment by Bryan Alsdorf [ 2021-03-10 ] |
|
Thanks, I'll upgrade the server to 2.5.9 when I get a chance and report back. |