MariaDB MaxScale / MXS-265

infinite non-blocking epoll loop


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Core
    • Labels: None

      Description

      I'm not sure how to reproduce this. It sometimes happens when a client disconnects while the backend server is still processing a request. With 4 threads configured, MaxScale uses 300% CPU (my guess: 3 threads stuck in infinite epoll loops and 1 working with the backend server).

      It's sometimes impossible to get debug information when this happens, but I managed to get 'show epoll':

      MaxScale> show epoll
       
      Poll Statistics.
       
      No. of epoll cycles: 				109414647
      No. of epoll cycles with wait: 			64
      No. of epoll calls returning events: 		32
      No. of non-blocking calls returning events: 	21
      No. of read events:   				14
      No. of write events: 				17
      No. of error events: 				0
      No. of hangup events:				3
      No. of accept events:				3
      No. of times no threads polling:		4
      Current event queue length:			2
      Maximum event queue length:			4
      No. of DCBs with pending events:		1
      No. of wakeups with pending queue:		2
      No of poll completions with descriptors
      	No. of descriptors	No. of poll completions.
      	 1			32
      	 2			0
      	 3			0
      	 4			0
      	 5			0
      	 6			0
      	 7			0
      	 8			0
      	 9			0
      	>= 10			0

      MaxScale had been running for only a few seconds when this was captured. Note the number of epoll cycles: over 109 million, of which only 64 waited.

      It seems to be an infinite non-blocking loop with no events. Here's the gdb session:

      458			if (pollStats.evq_pending == 0 && timeout_bias < 10)
      (gdb) n
      463			atomic_add(&n_waiting, 1);
      (gdb) 
      471			if (thread_data)
      (gdb) 
      473				thread_data[thread_id].state = THREAD_POLLING;
      (gdb) 
      476			atomic_add(&pollStats.n_polls, 1);
      (gdb) 
      477			if ((nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, 0)) == -1)
      (gdb) 
      499			else if (nfds == 0 && pollStats.evq_pending == 0 && poll_spins++ > number_poll_spins)
      (gdb) 
      514				atomic_add(&n_waiting, -1);
      (gdb) 
      517			if (n_waiting == 0)
      (gdb) 
      523			if (nfds > 0)
      (gdb) 
      609			if (process_pollq(thread_id))
      (gdb) 
      612			if (thread_data)
      (gdb) 
      613				thread_data[thread_id].state = THREAD_ZPROCESSING;
      (gdb) 
      614			zombies = dcb_process_zombies(thread_id);
      (gdb) 
      615			if (thread_data)
      (gdb) 
      616				thread_data[thread_id].state = THREAD_IDLE;
      (gdb) 
      618			if (do_shutdown)
      (gdb) 
      633			if (thread_data)
      (gdb) 
      635				thread_data[thread_id].state = THREAD_IDLE;
      (gdb) 
      637		} /*< while(1) */
      (gdb) 
      458			if (pollStats.evq_pending == 0 && timeout_bias < 10)
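      One reading of the trace above: the blocking branch at line 499 is never taken, so every iteration ends in the non-blocking epoll_wait with timeout 0. The sketch below is a simplified model of that condition, not MaxScale's actual code. The struct and both function names are hypothetical; only the variable names and the condition itself are taken from the trace. It also assumes that the "No. of DCBs with pending events: 1" counter in 'show epoll' corresponds to pollStats.evq_pending, which the report does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the loop state seen in the gdb session.
 * Field names mirror variables in the trace; the struct is an assumption. */
struct poll_state {
    int evq_pending;        /* pending-event counter (pollStats.evq_pending) */
    int poll_spins;         /* consecutive empty non-blocking polls so far */
    int number_poll_spins;  /* spin limit before a blocking wait is allowed */
};

/* Returns true when the loop would fall through to a blocking wait,
 * following the condition shown at line 499 of the trace. Because of
 * short-circuit evaluation, poll_spins only advances when nfds == 0
 * and evq_pending == 0. */
static bool would_block(struct poll_state *s, int nfds)
{
    assert(s != NULL);
    return nfds == 0 && s->evq_pending == 0 &&
           s->poll_spins++ > s->number_poll_spins;
}

/* Counts the non-blocking iterations that run before the loop would block.
 * Returns -1 if it never blocks within max_iters iterations. */
static long spins_until_block(struct poll_state *s, long max_iters)
{
    long i;
    for (i = 0; i < max_iters; i++) {
        if (would_block(s, 0))  /* nfds == 0: no events, as in the trace */
            return i;
    }
    return -1;
}
```

      In this model, a state with evq_pending == 0 reaches the blocking branch once poll_spins exceeds number_poll_spins, while a state whose evq_pending stays at 1 never does and poll_spins never even increments. That would match a pending-event count that is never drained, for example if the DCB it belongs to went away when the client disconnected, but this is a guess rather than a confirmed diagnosis.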

      Here's an excerpt from strace -tt:

      17:05:48.556305 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556319 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556334 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556349 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556363 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556378 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556392 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556407 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556421 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556436 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556450 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556465 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556479 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556493 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556507 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556522 epoll_wait(12, {}, 1000, 0) = 0
      17:05:48.556536 epoll_wait(12, {}, 1000, 0) = 0
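      To put a number on how tight the loop is, the gaps between the timestamps above can be averaged. This is a small sketch: the array holds the microsecond parts of the 17 timestamps in the excerpt (all within second 17:05:48), and the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Microsecond parts of the 17 strace timestamps in the excerpt. */
static const long strace_us[] = {
    556305, 556319, 556334, 556349, 556363, 556378,
    556392, 556407, 556421, 556436, 556450, 556465,
    556479, 556493, 556507, 556522, 556536
};

/* Mean gap in microseconds between consecutive timestamps:
 * (last - first) divided by the number of intervals. */
static double mean_interval_us(const long *ts_us, size_t n)
{
    assert(ts_us != NULL && n >= 2);
    return (double)(ts_us[n - 1] - ts_us[0]) / (double)(n - 1);
}
```

      For this excerpt the span is 231 microseconds over 16 intervals, so about 14.4 microseconds per call, or roughly 69,000 epoll_wait calls per second on the traced thread. strace adds its own overhead to every system call, so the untraced rate is probably higher.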

            People

            Assignee: Unassigned
            Reporter: Yuval Hager (yhager)
            Votes: 0
            Watchers: 2
