[MXS-1216] Fatal error while converting data Created: 2017-04-03 Updated: 2017-05-22 Resolved: 2017-05-22 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | avrorouter |
| Affects Version/s: | 2.0.5 |
| Fix Version/s: | 2.0.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Frédérick Pop | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | CentOS 6.8 |
| Issue Links: | |
| Sprint: | 2017-32, 2017-33, 2017-34 |
| Description |
|
I tried version 2.0.5 to test if
Improvement since last try during
|
| Comments |
| Comment by markus makela [ 2017-04-03 ] |
|
This could be related to |
| Comment by Frédérick Pop [ 2017-04-03 ] |
|
Right, I also generated avsc file with the script (not the one written in Go). |
| Comment by markus makela [ 2017-04-20 ] |
|
I've built packages from commit a418387d0a8fa2372f78eb2fe351122c6b3ab024: http://max-tst-01.mariadb.com/ci-repository/2.0-apr20/mariadb-maxscale/ These packages contain the fixes for the handling of ALTER TABLE statements that can cause crashes. |
| Comment by Frédérick Pop [ 2017-04-21 ] |
|
Ok it seems to go a little further but I have a new stack :
|
| Comment by markus makela [ 2017-05-08 ] |
|
The packages that were previously built didn't have the avrorouter module in the packages. I apologize for this inconvenience and I would like to ask you to test with these new packages and report if the crash still occurs. The packages can be found here: http://max-tst-01.mariadb.com/ci-repository/2.0-may8/mariadb-maxscale/ |
| Comment by Frédérick Pop [ 2017-05-08 ] |
|
No problem, don't worry. I flushed the *.avro files, avro-conversion.ini and avro.index to start the conversion from the beginning. I tried again with this new build and here is a new stack:
|
| Comment by markus makela [ 2017-05-08 ] |
|
Thanks for the swift response! This stack trace will really help as it triggered a debug assertion which tells us exactly where things went wrong. |
| Comment by markus makela [ 2017-05-09 ] |
|
I found a bug in the fixed length string processing in the avrorouter, which the debug assertion in that stack trace pointed me to. It could possibly cause a crash if long fixed length strings were used. This bug is not present with variable length strings, so an immediate workaround is to change the types from CHAR to VARCHAR. The fix is quite simple and later I can provide packages for testing and verification so that we can be sure that the bug is fixed on your end. |
| Comment by markus makela [ 2017-05-09 ] |
|
I've built packages from commit 898bc3444eadae7a72d9c19a741ec678bcfe18cc which you can find here: http://max-tst-01.mariadb.com/ci-repository/2.0-may9/mariadb-maxscale/ These packages should fix the problems with fixed length strings and I'm cautiously optimistic that the root cause of these crashes is fixed. I thank you for reporting these problems and taking the time to test our fixes. |
| Comment by Frédérick Pop [ 2017-05-10 ] |
|
I'm glad to help! And thank you too for being so responsive. I just installed this new package and I've got a new debug assertion a little further on from the last one:
|
| Comment by markus makela [ 2017-05-10 ] |
|
Hmm, it still seems to be the same part of code. Would you happen to have a way to reproduce this crash without providing the exact binlogs? If you don't have a simple way of reproducing this, you could upload the binlog files confidentially to ftp://ftp.mariadb.com/uploads where we can look at them. |
| Comment by Frédérick Pop [ 2017-05-10 ] |
|
I don't really know how to reproduce it, and I can't manage to connect to the FTP server. I'm using FileZilla on ftp.mariadb.com port 21 with anonymous authentication (I tried my JIRA credentials too) and the default folder set to /uploads/, but it doesn't work. Am I doing something wrong? |
| Comment by markus makela [ 2017-05-10 ] |
|
If you have curl installed, the following should work.
where <file> is the file to upload. |
| Comment by markus makela [ 2017-05-10 ] |
|
For FileZilla, connect to ftp.mariadb.com and then navigate to the private subdirectory. There you can upload the files. |
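For a scripted upload, the FileZilla steps above can be approximated with Python's standard `ftplib`. This is only a sketch under assumptions: it presumes the server accepts anonymous logins and that the upload directory is named `private`, as described above. The helper names `upload_file` and `main` are made up for illustration.

```python
from ftplib import FTP
import os


def upload_file(ftp, local_path, remote_dir="private"):
    """Upload local_path into remote_dir over an already-connected FTP session."""
    ftp.cwd(remote_dir)  # navigate to the upload directory
    remote_name = os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        # STOR transfers the file in binary mode under its base name
        ftp.storbinary("STOR " + remote_name, handle)
    return remote_name


def main(local_path):
    # Assumption: the server accepts anonymous logins.
    with FTP("ftp.mariadb.com") as ftp:
        ftp.login()  # no arguments means an anonymous login
        print("uploaded", upload_file(ftp, local_path))
```

Calling `main()` with the path of a binlog file would attempt the upload. Keeping the connection separate from `upload_file` makes the transfer logic easy to exercise against a stand-in object without touching the network.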
| Comment by Frédérick Pop [ 2017-05-11 ] |
|
My bad, it was a network restriction on my side. I uploaded a file named " |
| Comment by markus makela [ 2017-05-11 ] |
|
I've managed to reproduce the crash with the binlog and I'm proceeding with my investigation. I will post an update once I've figured out why it crashes. |
| Comment by markus makela [ 2017-05-11 ] |
|
Which version of the server were these binlogs created with? |
| Comment by Frédérick Pop [ 2017-05-11 ] |
|
MariaDB [(none)]> show global variables like 'version%';
|
| Comment by markus makela [ 2017-05-12 ] |
|
Upon further investigation, the problem seems to be the DATETIME type processing for events created by a MariaDB 10.0 server. This can be reproduced with the following test case on a 10.0 server.
There doesn't seem to be an immediate workaround for this apart from using a MariaDB 10.1 server. The replication events generated by the 10.1 server use a newer format which is processed correctly. I'll continue my investigations into finding a way to resolve this issue. |
| Comment by markus makela [ 2017-05-15 ] |
|
I've created packages from commit 5a0d2c54bd564688af44695067953ac16a09ee85 which fixes the crash on 10.0 DATETIME(1..6). The packages can be found here: http://max-tst-01.mariadb.com/ci-repository/2.0-may15/mariadb-maxscale/ |
| Comment by Frédérick Pop [ 2017-05-15 ] |
|
Tried this new build :
|
| Comment by markus makela [ 2017-05-15 ] |
|
Could it be possible to get the table definition for the table which was being processed? I believe the table in question is mediator_execution. I suspect that a DATETIME value with sub-second precision might be causing these problems. I have encountered some problems when defining the DATETIME values with 10.0 as DATETIME(0). |
| Comment by Frédérick Pop [ 2017-05-15 ] |
|
I don't have DATETIME(0) in this table, only DATETIME(3) :
|
| Comment by markus makela [ 2017-05-15 ] |
|
Thanks for the table definition, I'll continue my investigations. I'll post updates when I have more information. |
| Comment by markus makela [ 2017-05-16 ] |
|
Would it be possible for you to upload the binlog in question that is causing the crash? I haven't been able to reproduce the crash locally. |
| Comment by Frédérick Pop [ 2017-05-16 ] |
|
I think it's still the same binlog as before, do you want me to reupload it ? |
| Comment by markus makela [ 2017-05-16 ] |
|
Curious, I was able to convert the whole binlog without a crash. I'll take a closer look at the binlog again. |
| Comment by Frédérick Pop [ 2017-05-16 ] |
|
Here is the avro-conversion.ini content :
|
| Comment by markus makela [ 2017-05-16 ] |
|
Ah, this is due to the fact that I lack the .avsc schema files for those tables. I'll be able to create them using the table definition you provided. |
| Comment by Frédérick Pop [ 2017-05-16 ] |
|
Oh yes, I had to create them with a script. I thought it would be able to create them itself as it was a fresh load on the db, but for some reason the CREATE statements were missing from the binlog position I set (it wasn't a fresh MariaDB database). |
| Comment by markus makela [ 2017-05-16 ] |
|
I managed to reproduce the crash by defining the schema for the table. I had to add two extra fields to the Avro schema to store the real type and the length of each field. If a field doesn't define the two new fields, real_type and length, then the conversion process crashes due to a debug assertion. By adding the extra information to the DATETIME fields I'm able to convert the binlog correctly:
You can manually add the fields to the schema or use this Python script to generate them: https://github.com/mariadb-corporation/MaxScale/blob/2.0-avro-datetime/server/modules/protocol/examples/cdc_schema.py |
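As a rough illustration of the manual route, the sketch below adds the two extra fields, `real_type` and `length`, to selected fields of an Avro schema loaded as a Python dict. The exact values the avrorouter expects are not shown in this thread, so the `"datetime"` type string and the length `3` are assumptions, as are the record name, the column names and the helper name `annotate_fields`. The linked cdc_schema.py script remains the authoritative way to generate the files.

```python
import copy
import json


def annotate_fields(schema, column_info):
    """Return a copy of an Avro schema dict with real_type/length set per field.

    column_info maps a column name to a (real_type, length) tuple.
    """
    annotated = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in annotated.get("fields", []):
        info = column_info.get(field["name"])
        if info is not None:
            field["real_type"], field["length"] = info
    return annotated


# Hypothetical schema; real .avsc files for a table will differ.
schema = {
    "type": "record",
    "name": "ExampleRecord",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "created_at", "type": "string"},
    ],
}

# Assumed values for a DATETIME(3) column.
annotated = annotate_fields(schema, {"created_at": ("datetime", 3)})
print(json.dumps(annotated, indent=2))
```

Only the fields named in the mapping are changed, so the same helper can be run over every schema file with a per-table mapping.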
| Comment by Frédérick Pop [ 2017-05-16 ] |
|
I will regenerate all avsc schema files to be sure to have the correct format for each table. |
| Comment by Frédérick Pop [ 2017-05-16 ] |
|
In the end I had to edit the file by hand (I can't run the script as I'm on CentOS 6 and there seems to be no python3/mysql connector for CentOS 6), and it seems to run smoothly. |
| Comment by markus makela [ 2017-05-16 ] |
|
Yes, this applies to all DATETIME fields that have a precision definition, e.g. DATETIME(3). It is only an issue with 10.0; 10.1 should generate correct events regardless of the DATETIME precision. |
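To see which columns are affected, one could scan a table definition for DATETIME columns declared with a precision. The following is a minimal sketch using a hypothetical table (`example_events`), not the reporter's actual definition. Regex-based parsing of SQL is approximate and only meant for a quick check; the function name is made up for illustration.

```python
import re

# Matches an optionally backtick-quoted column name followed by DATETIME(n).
DATETIME_WITH_PRECISION = re.compile(
    r"`?(\w+)`?\s+DATETIME\s*\(\s*([0-6])\s*\)", re.IGNORECASE
)


def datetime_columns_with_precision(create_table_sql):
    """Map each DATETIME(n) column name in the statement to its precision n."""
    return {
        name: int(precision)
        for name, precision in DATETIME_WITH_PRECISION.findall(create_table_sql)
    }


# Hypothetical table definition used only for illustration.
ddl = """CREATE TABLE `example_events` (
  `id` INT NOT NULL,
  `started_at` DATETIME(3) NOT NULL,
  `finished_at` datetime(3) DEFAULT NULL,
  `label` VARCHAR(32),
  `plain_ts` DATETIME
)"""

print(datetime_columns_with_precision(ddl))
```

Columns declared as plain DATETIME, such as `plain_ts` here, are not reported because they carry no precision definition.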
| Comment by Frédérick Pop [ 2017-05-17 ] |
|
I'll regenerate the avsc files from a CentOS 7 VM, put them on my CentOS 6 VM and try to convert all the binlogs. |
| Comment by markus makela [ 2017-05-22 ] |
|
Closing as fixed. Please reopen this if you find any problems. |