[MXS-265] infinite non-blocking epoll loop Created: 2015-07-11  Updated: 2015-07-13  Resolved: 2015-07-13

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: None
Fix Version/s: 1.2.0

Type: Bug Priority: Major
Reporter: Yuval Hager Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None


 Description   

I'm not sure how to recreate this — it sometimes happens when a client disconnects while the backend server is still processing a request. With 4 threads configured, MaxScale uses 300% CPU (I'm guessing 3 threads in epoll infinite loops, 1 working with the backend server).

It's sometimes impossible to get debug information when this happens, but I managed to get 'show epoll':

MaxScale> show epoll
 
Poll Statistics.
 
No. of epoll cycles: 				109414647
No. of epoll cycles with wait: 			64
No. of epoll calls returning events: 		32
No. of non-blocking calls returning events: 	21
No. of read events:   				14
No. of write events: 				17
No. of error events: 				0
No. of hangup events:				3
No. of accept events:				3
No. of times no threads polling:		4
Current event queue length:			2
Maximum event queue length:			4
No. of DCBs with pending events:		1
No. of wakeups with pending queue:		2
No of poll completions with descriptors
	No. of descriptors	No. of poll completions.
	 1			32
	 2			0
	 3			0
	 4			0
	 5			0
	 6			0
	 7			0
	 8			0
	 9			0
	>= 10			0

MaxScale had been running for only a few seconds at this point; note the number of epoll cycles.

It seems to be an infinite non-blocking loop with no events; here's the gdb session:

458			if (pollStats.evq_pending == 0 && timeout_bias < 10)
(gdb) n
463			atomic_add(&n_waiting, 1);
(gdb) 
471			if (thread_data)
(gdb) 
473				thread_data[thread_id].state = THREAD_POLLING;
(gdb) 
476			atomic_add(&pollStats.n_polls, 1);
(gdb) 
477			if ((nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, 0)) == -1)
(gdb) 
499			else if (nfds == 0 && pollStats.evq_pending == 0 && poll_spins++ > number_poll_spins)
(gdb) 
514				atomic_add(&n_waiting, -1);
(gdb) 
517			if (n_waiting == 0)
(gdb) 
523			if (nfds > 0)
(gdb) 
609			if (process_pollq(thread_id))
(gdb) 
612			if (thread_data)
(gdb) 
613				thread_data[thread_id].state = THREAD_ZPROCESSING;
(gdb) 
614			zombies = dcb_process_zombies(thread_id);
(gdb) 
615			if (thread_data)
(gdb) 
616				thread_data[thread_id].state = THREAD_IDLE;
(gdb) 
618			if (do_shutdown)
(gdb) 
633			if (thread_data)
(gdb) 
635				thread_data[thread_id].state = THREAD_IDLE;
(gdb) 
637		} /*< while(1) */
(gdb) 
458			if (pollStats.evq_pending == 0 && timeout_bias < 10)

Here's an excerpt from 'strace -tt':

17:05:48.556305 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556319 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556334 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556349 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556363 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556378 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556392 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556407 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556421 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556436 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556450 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556465 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556479 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556493 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556507 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556522 epoll_wait(12, {}, 1000, 0) = 0
17:05:48.556536 epoll_wait(12, {}, 1000, 0) = 0



 Comments   
Comment by Dipti Joshi (Inactive) [ 2015-07-11 ]

yhager: What version of MaxScale and what OS are you running it on?

Comment by Yuval Hager [ 2015-07-11 ]

I'm running the develop branch, but I don't have the latest. I'll update it next week and report back here.

Comment by Dipti Joshi (Inactive) [ 2015-07-13 ]

Reported issues should be filed against a stable release branch.

Generated at Thu Feb 08 03:58:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.