[MXS-841] MaxScale crash under load Created: 2016-08-26  Updated: 2017-12-21  Resolved: 2016-09-13

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 1.4.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: dsi Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

MaxScale 64-bit, Debian Wheezy


Attachments: File maxscale.cnf     Text File maxscale1.log    

 Description   

Hello,

During the night of 16–17 June we tried to put MaxScale into production, but it did not hold up under load after several hours. Here are the error logs:
2016-06-16 17:36:25 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 17:46:27 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 18:48:27 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 19:06:10 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 19:13:09 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 19:57:02 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 20:38:52 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 20:50:29 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 21:01:20 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 21:18:45 error : Write to dcb 0x7f24d400da30 in state DCB_STATE_NOPOLLING fd 301 failed due errno 9, Bad file descriptor
2016-06-16 21:18:45 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 21:23:14 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 21:42:24 error : Write to dcb 0x7f24d8024870 in state DCB_STATE_NOPOLLING fd 549 failed due errno 9, Bad file descriptor
2016-06-16 21:42:24 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 22:07:46 error : Write to dcb 0x7f24d0002890 in state DCB_STATE_NOPOLLING fd 259 failed due errno 9, Bad file descriptor
2016-06-16 22:07:46 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 22:23:34 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 22:31:33 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 22:36:49 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-16 23:35:52 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-17 00:49:38 error : Write to dcb 0x7f24d80070f0 in state DCB_STATE_NOPOLLING fd 182 failed due errno 9, Bad file descriptor
2016-06-17 00:49:38 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-17 01:03:46 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-17 01:27:05 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-06-17 01:28:54 error : Write to dcb 0x7f24d8010b30 in state DCB_STATE_DISCONNECTED fd -1 failed due errno 9, Bad file descriptor
2016-06-17 01:28:54 error : Attempt to write buffered data to backend failed due internal inconsistent state.

AFAIK, the 'Attempt ...' messages are related to issues with the databases themselves. Does this cause any problems in production?

Now, regarding the crash at ~2 AM, which is the reason this ticket was opened, this is what we can see in the log:

2016-06-17 01:57:05 error : Fatal: MaxScale 1.4.3 received fatal signal 11. Attempting backtrace.
2016-06-17 01:57:05 error : Commit ID: c2706bab69d1f3b5f1f3ec70f78b1a4803788fec System name: Linux Release string: undefined Embedded library version: (null)
2016-06-17 01:57:05 error : /usr/bin/maxscale() [0x40406c]
2016-06-17 01:57:05 error : /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f24f8dd40a0]
2016-06-17 01:57:05 error : /usr/lib/x86_64-linux-gnu/maxscale/libMySQLBackend.so(+0x36d4) [0x7f24de8e76d4]
2016-06-17 01:57:05 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x4645f) [0x7f24f96f145f]
2016-06-17 01:57:05 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(poll_waitevents+0x5df) [0x7f24f96f0f69]
2016-06-17 01:57:05 error : /usr/bin/maxscale(worker_thread_main+0x2a) [0x405067]
2016-06-17 01:57:05 error : /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f24f8dcbb50]
2016-06-17 01:57:05 error : /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24f793a30d]

What can we do to resolve this problem?

Thank you in advance.

Kind regards,

Aziz Jouhdi DSI



 Comments   
Comment by Johan Wikman [ 2016-08-29 ]

Interpreted stacktrace:

/home/vagrant/workspace/server/core/gateway.c:377
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f24f8dd40a0]
/home/vagrant/workspace/server/modules/protocol/mysql_backend.c:622
/home/vagrant/workspace/server/core/poll.c:941
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(poll_waitevents+0x5df) [0x7f24f96f0f69]
/home/vagrant/workspace/server/core/gateway.c:1028
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f24f8dcbb50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24f793a30d]

Comment by dsi [ 2016-08-29 ]

Hello Johan,

I did not understand what you are asking of me.
Do you speak French?

Kind regards,

Comment by Johan Wikman [ 2016-08-29 ]

Hello,

Sorry, no, I don't speak French (but Google Translate did an adequate job).

I have not asked you for anything yet. I just converted the stacktrace from your log into one where file and line number information is available, and stored it here for later reference.

Comment by dsi [ 2016-08-29 ]

Hi Johan,

Thanks for your response; let me know if you have any questions.

Kind regards

Comment by Johan Wikman [ 2016-09-05 ]

Hi,

The attached log seems to be from multiple runs with various issues.

In case you still have problems, please provide a log from a recent run.

Johan

Comment by markus makela [ 2016-09-05 ]

Please also describe the client activity between the restart at 2016-06-16 17:33:25 and the crash at 2016-06-17 01:57:05, as well as what sort of client activity you had after the crash happened and MaxScale was restarted.

Comment by dsi [ 2016-09-06 ]

Hi,

We have no recent logs because MaxScale has not been in production since the crash.
We are a large site showing classified ads for agricultural equipment.
We are present worldwide.
We have more than 450,000 ads and 7 million visits per month.

Kind regards

Comment by Johan Wikman [ 2016-09-13 ]

Without more information we cannot reproduce this.

Please reopen or recreate if this occurs again.

Comment by dsi [ 2016-09-14 ]

Hi,

We reviewed the configuration with our consultant at Percona.
We will put MaxScale into production tomorrow.
We will let you know.

Thx

Comment by dsi [ 2016-09-21 ]

Hi,

MaxScale has been in production since last week.
Everything seems OK, but we see these errors (not many of them):

2016-09-17 23:16:25 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-18 10:19:17 error : Write to dcb 0x7f406c01efd0 in state DCB_STATE_NOPOLLING fd 255 failed due errno 9, Bad file descriptor
2016-09-18 10:19:17 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-18 14:52:03 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-19 04:33:59 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-19 05:22:07 error : Write to dcb 0x7f406c0160f0 in state DCB_STATE_NOPOLLING fd 149 failed due errno 9, Bad file descriptor
2016-09-19 05:22:07 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-19 07:56:51 error : Write to dcb 0x7f406c008210 in state DCB_STATE_NOPOLLING fd 121 failed due errno 9, Bad file descriptor
2016-09-19 07:56:51 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-20 13:10:15 error : Write to dcb 0x7f40680027c0 in state DCB_STATE_NOPOLLING fd 41 failed due errno 9, Bad file descriptor
2016-09-20 13:10:15 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-20 16:28:13 error : Attempt to write buffered data to backend failed due internal inconsistent state.
2016-09-20 18:16:37 error : Write to dcb 0x7f40680055a0 in state DCB_STATE_NOPOLLING fd 153 failed due errno 9, Bad file descriptor
2016-09-20 18:16:37 error : Attempt to write buffered data to backend failed due internal inconsistent state.

Can you explain what is happening?

Thx

Kind regards

Comment by markus makela [ 2016-09-22 ]

The error message in question is logged on line 644 in mysql_backend.c where the code is trying to write more data to a backend network socket but the state of the connection is not what is expected.

It is possible that there is a short window where the connection appears to be in a bad state (DCB_STATE_NOPOLLING) but it is still in the polling system. On line 393 in poll.c the state is changed before the connection is removed from the polling system. This is done to avoid locking the connection while it is being removed from the polling system. It is possible that inside this small window, after the state change and before the actual removal from the polling system, another thread picks up an outgoing write event and starts to process it. This other thread then sees that the connection is not in the proper state and proceeds to log an error message on line 644 in mysql_backend.c. Above this, on line 622, we can see that the dcb->writeq is accessed. As this is done without a lock, it is possible that invalid data is read. It could also be possible that this is what caused the crash that you saw earlier.

We've changed the reading and writing of data to backends in the upcoming 2.0.1 release of MaxScale. It is likely that the error message does not appear on 2.0.1 because the code now handles state changes differently and grabs the internal connection buffers into local variables before processing them.

Comment by dsi [ 2016-09-22 ]

Hi,

Thx for your response.
If I understand correctly, this is a bug in MaxScale 1.4.3 and you recommend upgrading to 2.0.1.
But I would like to know whether the configuration parameters are the same.

Kind regards

Comment by dsi [ 2017-12-21 ]

Hi,

We still have this crash; it has happened about once per month over the last 3 months.
We have a core file of 617 MB. How can we provide it to you? Or how can we split it?

Best Regards,

Comment by Johan Wikman [ 2017-12-21 ]

Unfortunately, we no longer provide anything other than security fixes for MaxScale 1.4.

You can find documentation on the upgrading of MaxScale here: https://github.com/mariadb-corporation/MaxScale/tree/2.1/Documentation/Upgrading

Generated at Thu Feb 08 04:02:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.