[MDEV-14549] ERROR 2013 (HY000): Lost connection persists
Created: 2017-11-30  Updated: 2017-12-08

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.11
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Fabio Valeri
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: innodb
Environment:

Windows 7
Workstation / Lenovo D30


Attachments: Zip Archive WS20130124-2_20171130-2.zip     Zip Archive WS20130124-2_20171207_NEW.zip     Text File fire_curr_ddl_20171013.sql     GIF File jira_01_select.gif     PNG File jira_02_Program Files_MariaDB 10.2_data.png     PNG File repair.png    

 Description   

I have several databases:

  • fire_curr: 10 tables, number of rows between 100 and 16 million
  • fire_curr_add: 20 tables, number of rows between 100 and 16 million

and many other databases on the same MariaDB instance. When I submit an SQL query to fire_curr_add, for example:

SELECT COUNT(*) FROM fire_curr_add.patient

I get a connection to the server and results. When I send the query to fire_curr, for example:

SELECT COUNT(*) FROM fire_curr.patient

I get the error:

MariaDB: ERROR 2013 (HY000): Lost connection to MySQL server during query

and the service stops. After I restart the service, it stops again each time I try to access the database fire_curr. This is not the case for nearly all other databases of the same instance. The instance and databases have existed for 6 months and I never had such problems before.

Other affected databases, tables and columns:
I have access to database information_schema. For example, I can count the rows of table system_variables:

SELECT COUNT(*) FROM system_variables

But if I try the same with table columns:

SELECT COUNT(*) FROM columns

the connection will be lost.

This works:

SELECT COUNT(*) FROM tables

Also this:

SELECT table_schema, table_name FROM tables

But if I submit

SELECT * FROM tables

or

SELECT table_schema, table_name, engine, table_rows 
FROM tables 
LIMIT 10

the connection will be lost.

Summary: queries against some databases, and against some columns of certain tables, make the MySQL server drop the connection.

This problem was also submitted to Stack Overflow.

Attachments

  • WS20130124-2_20171130-2.zip: contains the complete err-file.
  • jira_01_select.gif: mysql-terminal output

Other relevant information:

  • 20-23.11.2017 All tables were copied from fire_curr to fire_curr_add and processed in fire_curr_add.
  • 29.11.2017 Version 10.2.6 was upgraded to 10.2.11, but the problem persists.


 Comments   
Comment by Vladislav Vaintroub [ 2017-11-30 ]

The crash from the error log

2017-11-30 16:26:26 7164 [ERROR] InnoDB: Tried to read 16384 bytes at offset 4947968, but was only able to read 0
2017-11-30 16:26:26 7164 [ERROR] InnoDB: File (unknown): 'read' returned OS error 0. Cannot continue operation
171130 16:26:26 [ERROR] mysqld got exception 0x80000003 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.11-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=65537
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 136057 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7ea49fb8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
mysqld.exe!my_parameter_handler()[my_init.c:256]
mysqld.exe!raise()[signal.cpp:516]
mysqld.exe!abort()[abort.cpp:71]
mysqld.exe!os_file_handle_error_cond_exit()[os0file.cc:5224]
mysqld.exe!os_file_read_page()[os0file.cc:5106]
mysqld.exe!os_file_read_func()[os0file.cc:5494]
mysqld.exe!fil_io()[fil0fil.cc:5272]
mysqld.exe!buf_read_page_low()[buf0rea.cc:192]
mysqld.exe!buf_read_page()[buf0rea.cc:462]
mysqld.exe!buf_page_get_gen()[buf0buf.cc:4322]
mysqld.exe!btr_cur_search_to_nth_level()[btr0cur.cc:1115]
mysqld.exe!btr_pcur_open_on_user_rec_func()[btr0pcur.cc:598]
mysqld.exe!dict_load_foreign()[dict0load.cc:3447]
mysqld.exe!dict_load_foreigns()[dict0load.cc:3695]
mysqld.exe!dict_load_table_one()[dict0load.cc:3071]
mysqld.exe!dict_load_table()[dict0load.cc:2809]
mysqld.exe!dict_table_open_on_name()[dict0dict.cc:1169]
mysqld.exe!ha_innobase::open_dict_table()[ha_innodb.cc:6764]
mysqld.exe!ha_innobase::open()[ha_innodb.cc:6424]
mysqld.exe!handler::ha_open()[handler.cc:2499]
mysqld.exe!open_table_from_share()[table.cc:3319]
mysqld.exe!open_table()[sql_base.cc:1874]
mysqld.exe!open_and_process_table()[sql_base.cc:3409]
mysqld.exe!open_tables()[sql_base.cc:3926]
mysqld.exe!open_and_lock_tables()[sql_base.cc:4682]
mysqld.exe!execute_sqlcom_select()[sql_parse.cc:6367]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3462]
mysqld.exe!mysql_parse()[sql_parse.cc:7892]
mysqld.exe!dispatch_command()[sql_parse.cc:1807]
mysqld.exe!do_command()[sql_parse.cc:1359]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:366]
mysqld.exe!tp_callback()[threadpool_common.cc:192]
ntdll.dll!TpPostWork()
ntdll.dll!RtlRealSuccessor()
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x74c38f80): SELECT COUNT(*) FROM fire_curr.patient
Connection ID (thread ID): 8
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Comment by Marko Mäkelä [ 2017-11-30 ]

giordano, thank you for the report. I believe that the issue is that the InnoDB system tablespace is shorter than expected, and some pages of the internal tables SYS_FOREIGN or SYS_FOREIGN_COLS are located in the missing portion of the system tablespace.

I have some questions. Please answer them.
What is the history of the database like? Were the files copied at any point? Any previous crashes before this problem started to occur? Any DDL? Did innodb_data_file_path ever contain references to multiple data files?

Finally, let me copy my response from the StackOverflow discussion:

Based on the stack trace, it seems to be the InnoDB system tablespace that is shorter than expected. When the function dict_load_foreigns() is accessing the InnoDB system table SYS_FOREIGN or SYS_FOREIGN_COLS, it is requesting a page that is not in the buffer pool. The page read request causes InnoDB to commit suicide, because the file is too short.

InnoDB notoriously does not report the problematic file name. We should refactor the I/O code in MariaDB at some point. In this case, we do know that the problem is in the InnoDB system tablespace, because the InnoDB internal SYS_ tables are located there.
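
As a rough illustration of the "file too short" diagnosis, the following Python sketch recomputes the page number from the offset in the error log and checks whether a data file is long enough to contain that page. The function names are made up for illustration, and which file to check is an assumption (the error log does not name it); the sketch takes the path as a command-line argument rather than guessing it.

```python
import os
import sys

PAGE_SIZE = 16384  # bytes per read request, as reported in the error log


def page_number(offset, page_size=PAGE_SIZE):
    """Return the index of the page that starts at the given byte offset."""
    if offset % page_size != 0:
        raise ValueError("offset is not aligned to the page size")
    return offset // page_size


def file_covers_page(file_size, offset, page_size=PAGE_SIZE):
    """True if a file of file_size bytes fully contains the page at offset."""
    return file_size >= offset + page_size


def check_data_file(path, offset, page_size=PAGE_SIZE):
    """Compare the on-disk size of path with the failed read request."""
    return file_covers_page(os.path.getsize(path), offset, page_size)


if __name__ == "__main__":
    failed_offset = 4947968  # offset taken from the error log
    print("page number:", page_number(failed_offset))
    # Optional argument: path to the data file to inspect (an assumption,
    # for example the system tablespace file in the data directory).
    if len(sys.argv) > 1 and os.path.isfile(sys.argv[1]):
        print("file long enough:", check_data_file(sys.argv[1], failed_offset))
```

With the values from the log, the failed read targets page 302, so any file shorter than 4964352 bytes cannot contain that page.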

There already exist some related bugs in the MariaDB tracker. I think that this scenario is already covered by these:

MDEV-13542 Crashing on a corrupted page is unhelpful (Yes, I copied the title of the ancient MySQL Bug#10132)
MDEV-11633 Make the InnoDB system tablespace optional (This is not going to happen soon, and the design is not finalized)

It would be interesting to know how the corruption occurred in the first place. Before MDEV-11556, InnoDB data file extension in MariaDB was not fully crash-safe. (MySQL does not contain this fix at all.)

Could it be that the files were copied at some point? A bug in the copy procedure? Or could the system tablespace have originally consisted of multiple files, but the server was started up with the wrong innodb_data_file_path so that the last file(s) were ignored? Everything would appear fine until a page in the ‘missing’ files is being accessed.
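
To check the multiple-files hypothesis, one can look at the configured innodb_data_file_path value (for example with SHOW VARIABLES LIKE 'innodb_data_file_path') and count how many files it names. Below is a minimal Python sketch, assuming the documented format file_name:file_size[:autoextend[:max:max_file_size]] with entries separated by semicolons, and assuming file names that contain no colon. The function names are hypothetical.

```python
def parse_size(text):
    """Convert a size such as '12M' or '1G' to a number of bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def parse_data_file_path(value):
    """Split an innodb_data_file_path value into (file name, size) pairs.

    Assumption: file names contain no ':' (no absolute Windows paths).
    Trailing attributes such as 'autoextend' are ignored here.
    """
    files = []
    for spec in value.split(";"):
        spec = spec.strip()
        if not spec:
            continue
        parts = spec.split(":")
        if len(parts) < 2:
            raise ValueError("expected file_name:file_size in %r" % spec)
        files.append((parts[0], parse_size(parts[1])))
    return files


if __name__ == "__main__":
    # Example values only; substitute the value from the actual server.
    for value in ("ibdata1:12M:autoextend", "ibdata1:50M;ibdata2:50M:autoextend"):
        files = parse_data_file_path(value)
        print(value, "->", len(files), "file(s):", files)
```

If a current or earlier configuration yields more than one file, the scenario of ignored trailing files becomes worth investigating.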

You might ask: How to work around this error? Unfortunately, I don’t think that there currently is any way to skip the read of the foreign key metadata. So, if the metadata tables are corrupted, in the worst case you will be unable to access any InnoDB tables. For this, I would welcome a MariaDB bug report.

I think that we should repurpose this report as "Implement an option for InnoDB to ignore SYS_FOREIGN, SYS_FOREIGN_COLS when loading table definitions".
Actually I think that this could be made part of innodb_force_recovery>=3.

Comment by Fabio Valeri [ 2017-12-01 ]

Thank you very much for your help. I will try to answer your questions as well as possible. Since I'm not a database specialist, I may not have answered your questions correctly.

What is the history of the database like?

I'm not sure if I understand this question.
I have two MariaDB instances, one on drive C (HDD) and one on drive D (SSD). The former causes the problems and contains the database fire_curr.
The database (fire_curr) was filled regularly (once per month) with XML files through a VB application. Every second month, tables from fire_curr were copied to the database fire_curr_add and processed. Some processing (SQL statements) in fire_curr_add takes 10 hours, but fire_curr (the defective one) is not involved.

The second instance of MariaDB, on drive D of my workstation, is version 10.1.23.

I have used MariaDB/fire_curr for 4 years, starting with version 5.

I added a new attachment, jira_02_Program Files_MariaDB 10.2_data.png, which shows the contents of \data.

Were the files copied at any point?

What do you mean by files? If you mean the files in
C:\Program Files\MariaDB 10.2\data
then no, definitely not. I never touched them.

Any previous crashes before this problem started to occur?
I'm not aware of any crashes since the installation of MariaDB 10.2.6 six months ago.

Any DDL?
I put the DDL in the attachment (fire_curr_ddl_20171013.sql).

Did innodb_data_file_path ever contain references to multiple data files?
I don't understand the question. Can you give me some hints on how I can answer it?

If there is no solution, what would you suggest I do?

  1. Drop the database fire_curr and reload an older SQL dump?
  2. Uninstall the MariaDB instance, install a new instance, and import all SQL dumps?

Comment by Fabio Valeri [ 2017-12-08 ]

I wanted to reinstall MariaDB. When I started the uninstall/change of MariaDB through the application wizard of Windows 7, it asked whether I wanted to Change/Repair/Remove. I chose Repair. After that, MariaDB worked as usual; that is, I could submit queries without losing the connection.

Upgrading MariaDB from 10.2.6 to 10.2.11 didn't help, but the repair did.

I attached a screenshot of the "repair" GUI and the new err.log (WS20130124-2_20171207_NEW).

Lesson learnt: repair MariaDB before asking on SO.

Generated at Thu Feb 08 08:14:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.