NUMA hardware is becoming more common. Accessing RAM that is not local to a CPU's node is more expensive than accessing local memory. MariaDB should implement mechanisms that arrange the workload so that the CPUs of a node mostly access their local memory.
Example NUMA architecture:
$ numactl --hardware
available: 2 nodes (0,8)
node 0 cpus: 0 8 16 24 32 40 48 56 64 72
node 0 size: 130705 MB
node 0 free: 80310 MB
node 8 cpus: 80 88 96 104 112 120 128 136 144
node 8 size: 130649 MB
node 8 free: 81152 MB
node distances:
node   0   8
  0:  10  40
  8:  40  10
Components of the implementation include:
A meaningful configuration that makes conflicts with existing settings obvious
Each InnoDB buffer pool instance to be constrained to a NUMA node
SQL threads to be allocated to nodes by a user-configurable map keyed on one or more of: user, connecting host, default database (based on the initial connection)
The user SQL thread will be pinned to the CPUs associated with a node
InnoDB accesses by the SQL thread will go to/from the InnoDB buffer pool instances on its node first
Accounting of CPU/memory utilization per mapping identifier, to enable automated or configuration-based assignment of a node to this mapping identifier
InnoDB background threads to be per node, so that each InnoDB buffer pool instance is processed locally
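As a rough illustration of the mapping component listed above, the sketch below builds a mapping identifier from the selected connection attributes and resolves it to a node. All names (ConnInfo, KeyPart, mapping_key, NodeMap) are hypothetical and not part of MariaDB, and the hash-based fallback for unmapped identifiers is only one possible policy, assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical: attributes known when the connection is established.
struct ConnInfo {
  std::string user, host, db;
};

// Hypothetical bit flags selecting which attributes form the identifier.
enum KeyPart : unsigned { BY_USER = 1, BY_HOST = 2, BY_DB = 4 };

// Build the mapping identifier from the parts selected in `parts`.
std::string mapping_key(const ConnInfo &c, unsigned parts) {
  std::string key;
  if (parts & BY_USER) key += "u=" + c.user + ";";
  if (parts & BY_HOST) key += "h=" + c.host + ";";
  if (parts & BY_DB) key += "d=" + c.db + ";";
  return key;
}

// Maps identifiers to NUMA node numbers (e.g. 0 and 8 in the example above).
class NodeMap {
 public:
  explicit NodeMap(std::vector<int> nodes) : nodes_(std::move(nodes)) {
    assert(!nodes_.empty());
  }

  // Configuration-based assignment of a node to an identifier.
  void assign(const std::string &key, int node) { explicit_[key] = node; }

  // Explicit assignment wins; otherwise spread identifiers by hash
  // (assumed fallback policy, stable for a given key within one process).
  int node_for(const std::string &key) const {
    auto it = explicit_.find(key);
    if (it != explicit_.end()) return it->second;
    std::size_t idx = std::hash<std::string>{}(key) % nodes_.size();
    return nodes_[idx];
  }

 private:
  std::vector<int> nodes_;
  std::unordered_map<std::string, int> explicit_;
};
```

An accounting-driven variant would replace the hash fallback with a choice based on per-identifier CPU/memory utilization.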
(Marko, Jan, et al. please edit with important design/implementation details)
I'm willing to mentor this (with help).
Issue Links
relates to MDEV-5774: Enable numa interleaving by default when required conditions are met
Sergei Golubchik
added a comment - Thanks! This sounds interesting and useful, a good project.
It'll need to be much more clearly defined though, unless you expect a student to fill in all the blanks (it's a valid assumption, but, in my opinion, a bit optimistic).
A couple of thoughts:
What kind of meaningful configuration? Example?
Pinned SQL threads - OK, and any thread-local allocation should use the appropriate NUMA node.
Daniel Black
added a comment - Implementation plan from a configuration point of view.
numa=off,on - a read-only system variable and mysqld start option that enables NUMA support; defaults to off.
numa_scheduler={user,host,db} - one or more of these elements, by which the server will allocate a node.
numa_scheduler_host_mask - a CIDR mask that is applied to the host for the purposes of NUMA scheduling (IPv6?)
thread_handling=one-thread-per-connection - with thread cache entries, cached threads will have a NUMA node assigned. A thread that already has the desired NUMA affinity will be used before altering the affinity of another existing thread.
thread_handling=pool-of-threads (Unix only) - thread_pool_size will be limited to multiples of the number of NUMA nodes. Each thread has affinity to the CPUs corresponding to its NUMA node.
innodb_buffer_pool_instances - when numa is enabled, this will start off as one per node only; can expand if time permits.
innodb_read_io_threads and innodb_write_io_threads - default to two threads per node, both affinity bound.
innodb_page_cleaners - one per NUMA node.
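The thread cache rule proposed for one-thread-per-connection could look roughly like the sketch below: prefer an idle thread already bound to the wanted node, and only fall back to re-binding another thread when none matches. CachedThread and take_thread are hypothetical names for illustration, not existing MariaDB code, and the choice of the most recently cached thread as the fallback is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one idle thread in the thread cache.
struct CachedThread {
  int id;    // illustrative thread identifier
  int node;  // NUMA node the thread is currently bound to
};

// Take an idle thread for a connection mapped to wanted_node.
// Returns false if the cache is empty. *needs_rebind is set when no cached
// thread was already on wanted_node, so the caller must change its affinity.
bool take_thread(std::vector<CachedThread> &cache, int wanted_node,
                 CachedThread *out, bool *needs_rebind) {
  if (cache.empty()) return false;
  std::size_t pos = cache.size() - 1;  // fallback: most recently cached thread
  bool match = false;
  for (std::size_t i = 0; i < cache.size(); ++i) {
    if (cache[i].node == wanted_node) {
      pos = i;
      match = true;
      break;
    }
  }
  *out = cache[pos];
  out->node = wanted_node;  // node the thread will serve from now on
  *needs_rebind = !match;
  cache.erase(cache.begin() + static_cast<std::ptrdiff_t>(pos));
  return true;
}
```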
Best guesses so far:
The mysqld client thread loop will be bound to a node, as will innodb_encryption_threads.
Unsure how to handle:
slave_parallel_threads - mapping per domain, per (master) connection, or just group them
innodb_ft_sort_pll_degree
innodb_mtflush_threads
innodb_purge_threads
Threads generally - will have sched_setaffinity/SetThreadAffinityMask set to the CPU set corresponding to the NUMA node.
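For the Linux side of the affinity call mentioned above, a minimal sketch follows. It assumes glibc (cpu_set_t and the CPU_* macros) and a CPU list already obtained elsewhere, for example from the "node N cpus" line of numactl --hardware. The helper names make_cpu_set and bind_current_thread are hypothetical; a Windows counterpart would use SetThreadAffinityMask instead.

```cpp
#include <sched.h>  // sched_setaffinity, cpu_set_t, CPU_* macros (Linux/glibc)

#include <cassert>
#include <vector>

// Build a cpu_set_t from a list of CPU numbers belonging to one NUMA node.
// CPU numbers outside [0, CPU_SETSIZE) are ignored in this sketch.
cpu_set_t make_cpu_set(const std::vector<int> &cpus) {
  cpu_set_t set;
  CPU_ZERO(&set);
  for (int cpu : cpus) {
    if (cpu >= 0 && cpu < CPU_SETSIZE) CPU_SET(cpu, &set);
  }
  return set;
}

// Restrict the calling thread to the given CPUs. Returns true on success.
// On Linux, a pid argument of 0 refers to the calling thread.
bool bind_current_thread(const std::vector<int> &cpus) {
  cpu_set_t set = make_cpu_set(cpus);
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Binding CPUs this way does not by itself place memory on the node; thread-local allocations would still need a node-aware allocation policy.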
The NUMA implementation will be abstracted and will support the equivalent Windows NUMA functions - https://msdn.microsoft.com/en-us/library/windows/desktop/aa363804(v=vs.85).aspx
Eventually - persistent table of mappings
Out of scope:
MyISAM - key cache segments maybe eventually
All other storage engines
Sergey Vojtovich
added a comment - Eventually we'll have to bind table cache instances and MDL instances (not yet implemented) to NUMA nodes, as well as PFS counters and some status variables. Please keep this in mind.
Marko Mäkelä
added a comment - innodb_mtflush_threads and its replacement were removed, and in MDEV-23855 the single page cleaner thread was simplified. MDEV-16264 refactored many of the InnoDB background threads into tasks.
I think that it would be very challenging to make all users of the buffer pool aware of NUMA (say, actively migrate execution threads to the NUMA node that owns most of the data that is likely to be addressed). I wonder if it could make sense to partition the buf_pool.page_hash in such a way that pages would be mapped to NUMA nodes by some simple formula like page_id.raw()%N_NUMA. All entries of a buf_pool_numa[i].page_hash would point to buffer pool block descriptors and blocks that reside in that NUMA node. I think that we should keep a global buf_pool.LRU and buf_pool.flush_list in any case.
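The partitioning formula suggested above can be sketched as follows. The page_id class here is a simplified stand-in, not InnoDB's actual type: it assumes raw() returns the tablespace id in the upper 32 bits and the page number in the lower 32 bits. N_NUMA and numa_partition_of are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a page identifier (assumed layout:
// tablespace id in the high 32 bits, page number in the low 32 bits).
class page_id {
 public:
  page_id(std::uint32_t space, std::uint32_t page_no)
      : m_id(std::uint64_t{space} << 32 | page_no) {}
  std::uint64_t raw() const { return m_id; }

 private:
  std::uint64_t m_id;
};

// Number of NUMA nodes; fixed at 2 here to match the example machine.
constexpr unsigned N_NUMA = 2;

// Index i of the buf_pool_numa[i] partition that would own this page.
inline unsigned numa_partition_of(page_id id) {
  return static_cast<unsigned>(id.raw() % N_NUMA);
}
```

Under the assumed layout, whenever N_NUMA is a power of two the result depends only on the page number, so consecutive pages of a tablespace alternate between nodes.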
People
Assignee: Unassigned
Reporter: Daniel Black