[MDEV-32537] Set thread names for MariaDB Server threads
Created: 2023-10-21  Updated: 2024-01-25

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: 11.5

Type: Task
Priority: Major
Reporter: Valerii Kravchuk
Assignee: Vladislav Vaintroub
Resolution: Unresolved
Votes: 0
Labels: None


 Description   

Multi-threaded Linux programs (like Firefox, or MySQL since 8.0.27 or so) usually set a name for each thread. This name appears in the output of top and other utilities and helps to answer questions like "Which thread uses the most CPU?"

Consider this example of MySQL 8.0.29:

openxs@ao756:~/dbs/8.0$ for task in $(ls /proc/$(pidof mysqld)/task/); do name=$(cat /proc/$(pidof mysqld)/task/${task}/comm); echo "TASK: ${task} (${name})"; done
TASK: 567049 (mysqld)
TASK: 567079 (ib_io_ibuf)
TASK: 567080 (ib_io_log)
TASK: 567081 (ib_io_rd-1)
TASK: 567082 (ib_io_rd-2)
TASK: 567083 (ib_io_rd-3)
TASK: 567084 (ib_io_rd-4)
TASK: 567085 (ib_io_wr-1)
TASK: 567086 (ib_io_wr-2)
TASK: 567087 (ib_io_wr-3)
TASK: 567088 (ib_io_wr-4)
TASK: 567089 (ib_pg_flush_co)
TASK: 567096 (ib_log_checkpt)
TASK: 567097 (ib_log_fl_notif)
TASK: 567098 (ib_log_flush)
TASK: 567099 (ib_log_wr_notif)
TASK: 567100 (ib_log_writer)
TASK: 567114 (ib_srv_lock_to)
TASK: 567115 (ib_srv_err_mon)
TASK: 567116 (ib_srv_mon)
TASK: 567117 (ib_buf_resize)
TASK: 567118 (ib_src_main)
TASK: 567119 (ib_dict_stats)
TASK: 567120 (ib_fts_opt)
TASK: 567122 (xpl_worker-1)
TASK: 567123 (xpl_worker-2)
TASK: 567124 (xpl_accept-1)
TASK: 567893 (ib_buf_dump)
TASK: 567894 (ib_clone_gtid)
TASK: 567895 (ib_srv_purge)
TASK: 567896 (ib_srv_wkr-1)
TASK: 567897 (ib_srv_wkr-2)
TASK: 567898 (ib_srv_wkr-3)
TASK: 567899 (evt_sched)
TASK: 567900 (sig_handler)
TASK: 567902 (xpl_accept-3)
TASK: 567903 (gtid_zip)

For MariaDB (10.6 here for example) we get:

openxs@ao756:~/dbs/maria10.6$ for task in $(ls /proc/$(pidof mariadbd)/task/); do name=$(cat /proc/$(pidof mariadbd)/task/${task}/comm); echo "TASK: ${task} (${name})"; done
TASK: 568093 (mariadbd)
TASK: 568095 (mariadbd)
TASK: 568096 (mariadbd)
TASK: 568097 (mariadbd)
TASK: 568098 (mariadbd)
TASK: 568099 (mariadbd)
TASK: 568104 (mariadbd)
TASK: 568105 (mariadbd)
openxs@ao756:~/dbs/maria10.6$

See also https://bugs.mysql.com/bug.php?id=70858

It makes sense to use the same thread name as used in Performance Schema.



 Comments   
Comment by Vladislav Vaintroub [ 2023-10-21 ]

Compared to MySQL, MariaDB has only a few single-purpose threads. There is a timer thread, a page cleaner thread, and a main thread that accepts connections, and nothing interesting otherwise. There can be a bunch of background threads from a thread pool that handle InnoDB background work, and a bunch of foreground threads that handle user queries (from the thread pool or not). I'm wondering whether you can gain much from such info.

Comment by Valerii Kravchuk [ 2023-10-21 ]

I'd really be happy to know whether CPU is used by a specific foreground user thread, one of the slave applier threads, or some background InnoDB thread.

Comment by Nikita Malyavin [ 2024-01-18 ]

Left a few notes in https://github.com/MariaDB/server/commit/7e53e7729bf58a0e806424d538ed086f44304c1b

Comment by Sergei Golubchik [ 2024-01-18 ]

Maybe let's avoid multiplying entities without necessity?
There's the performance schema, and it already has names for all threads. For example:

static PSI_thread_info all_mysys_threads[]=
{
  { &key_thread_timer, "statement_timer", PSI_FLAG_GLOBAL}
};

The thread is started as:

  if (mysql_thread_create(key_thread_timer, &timer_thread, &thr_attr,
                          timer_handler, NULL))

So in mysql_thread_create() the performance schema can call pthread_setname_np() (or whatever is available) on the thread handle that pthread_create() produces, setting the name for the new thread. Users will then see the same name in ps and in select * from performance_schema.threads. Less confusion.
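As a rough illustration of this proposal, the sketch below names a thread from the creator side, right after pthread_create(). It assumes Linux with glibc, where pthread_setname_np() takes a thread handle. create_named_thread(), wait_on_gate() and demo_name_from_creator() are hypothetical names made up for this sketch; they are not the actual mysql_thread_create() code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes pthread_setname_np()/pthread_getname_np() */
#endif
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Gate that keeps the demo thread alive until the creator releases it. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical thread body: blocks on the gate, then exits. */
static void *wait_on_gate(void *arg)
{
  (void) arg;
  pthread_mutex_lock(&gate);
  pthread_mutex_unlock(&gate);
  return NULL;
}

/* Hypothetical stand-in for a thread-creation wrapper: creates the thread
   and names it from the outside. Naming is best effort, so only the
   pthread_create() result is returned. */
static int create_named_thread(pthread_t *thr, const char *name,
                               void *(*start)(void *), void *arg)
{
  int err = pthread_create(thr, NULL, start, arg);
  if (err == 0)
    (void) pthread_setname_np(*thr, name); /* Linux/glibc signature */
  return err;
}

/* Creates one thread named "statement_timer" and reads the name back
   into buf. Returns 0 on success, otherwise an errno value. */
static int demo_name_from_creator(char *buf, size_t len)
{
  pthread_t thr;
  int err;

  assert(len >= 16); /* glibc needs room for 15 characters plus NUL */
  pthread_mutex_lock(&gate); /* hold the new thread until we are done */
  err = create_named_thread(&thr, "statement_timer", wait_on_gate, NULL);
  if (err)
  {
    pthread_mutex_unlock(&gate);
    return err;
  }
  err = pthread_getname_np(thr, buf, len);
  pthread_mutex_unlock(&gate);
  pthread_join(thr, NULL);
  return err;
}
```

While the thread is alive, the same name should be visible in /proc/<pid>/task/<tid>/comm and therefore in ps and top.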

Comment by Vladislav Vaintroub [ 2024-01-18 ]

serg, this does not work.

  • You assume the name can always be set from outside. It can't, portably. On macOS, a platform we support, there is no thread handle or ID that you can use pthread_setname_np() with.
  • It can make sense to change the thread name sometimes. For example, if the connection id is 10, one might want (in a debug version, say) to change the name of the thread to conn_10. That makes it easier to find your connection.
  • Not all threads are created with mysql_thread_create. Some are created with std::thread, and some are created by an OS-controlled thread pool, e.g. on Windows, where the OS takes over the thread lifecycle. I'd like to have a thread name nonetheless, to distinguish between connection thread pool threads and InnoDB's background threads.
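The points above suggest naming a thread from inside the thread itself. A minimal sketch of that approach follows, assuming only Linux and macOS matter; other platforms are left as a no-op here. set_current_thread_name(), name_connection_thread() and connection_thread() are hypothetical helpers invented for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes pthread_setname_np()/pthread_getname_np() */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: names the calling thread. Returns 0 on success. */
static int set_current_thread_name(const char *name)
{
#if defined(__APPLE__)
  return pthread_setname_np(name);                 /* macOS: calling thread only */
#elif defined(__linux__)
  return pthread_setname_np(pthread_self(), name); /* Linux: takes a handle */
#else
  (void) name;
  return 0; /* assumption: silently do nothing on other platforms */
#endif
}

/* Hypothetical debug helper: renames the calling thread to conn_<id>.
   snprintf() silently truncates ids that do not fit into 15 characters. */
static int name_connection_thread(unsigned long conn_id)
{
  char name[16];
  int n = snprintf(name, sizeof name, "conn_%lu", conn_id);
  assert(n > 0);
  return set_current_thread_name(name);
}

/* Thread body: names itself, then copies the name the OS reports into
   the caller's 16-byte buffer (left empty on failure). */
static void *connection_thread(void *arg)
{
  char *seen = arg;
  if (name_connection_thread(10) != 0 ||
      pthread_getname_np(pthread_self(), seen, 16) != 0)
    seen[0] = '\0';
  return NULL;
}
```

Because the thread names itself, the same call works for threads started with std::thread or handed out by an OS-controlled pool, as long as the code runs on that thread.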

Thread names are for debugging and profiling, and that's it.

Comment by Sergei Golubchik [ 2024-01-18 ]

I thought another option could be to set the name inside the thread, but have a DBUG_ASSERT that it matches the name in perfschema. Then it'll still be "one entity" from the user's point of view, even if we specify the name in two places (or one can #define STATEMENT_TIMER_THREAD_NAME and use it everywhere).

As for many threads with the same name, I thought it'd be useful to print them like, say, "statement_timer:1234" where "1234" is, ideally, the id in performance_schema.threads.

The point is still to show not some arbitrary info but something that correlates with other information sources. For example, one can use ps to find the thread that takes 100% CPU and then use performance_schema to find out more.

Comment by Vladislav Vaintroub [ 2024-01-23 ]

serg, I sort of did this verification in the last patch. I'm not sure it is a good idea, and strictly speaking we can't have the very same entity from the user's point of view.

Because Linux does not accept long names (longer than 15 characters), one can't have thread names like "thread/innodb/page_cleaner"; one can't even have names like 'innodb/page_cleaner'. Only names like 'page_cleaner' are possible, due to this Linux limitation.
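The limit can be observed directly. The sketch below assumes Linux with glibc, where pthread_setname_np() reports ERANGE for names longer than 15 characters; probe_names() is a hypothetical thread body written for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes pthread_setname_np() */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical thread body: tries three candidate names on the calling
   thread and stores each pthread_setname_np() result in the caller's
   int[3] array. */
static void *probe_names(void *arg)
{
  int *results = arg;
  assert(results != NULL);
  /* 26 characters: too long */
  results[0] = pthread_setname_np(pthread_self(), "thread/innodb/page_cleaner");
  /* 19 characters: still too long */
  results[1] = pthread_setname_np(pthread_self(), "innodb/page_cleaner");
  /* 12 characters: fits */
  results[2] = pthread_setname_np(pthread_self(), "page_cleaner");
  return NULL;
}
```

On such a system the first two calls are expected to return ERANGE and the third 0, which is why only the short form is usable as an OS-level thread name.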

I thought a bit about many threads with the same name. I would actually prefer them to have the same name by default (unless maybe an option is set).
The profiler I use does a nice aggregation based on thread names, so I can figure out how much CPU the foreground threads are using and how much the background ones are. I'd rather not lose that. Besides, some work has already been done to show the OS thread id and whatnot in the processlist and performance_schema.threads, which can help find a thread during debugging.

Comment by Sergei Golubchik [ 2024-01-24 ]

Thanks. That explains why Firefox names threads like:

AudioIP~ver RPC
Backgro~ #51598
BgIOThr~ol #600
CanvasRenderer
Compositor
Cookie
DNS Res~r #1879
DNS Res~r #1882
DNS Res~r #1883

Agreed about not appending numeric ids, not even for your profiler's sake, but simply because length is such a scarce resource; we shouldn't waste it on something that can be stored elsewhere, such as a thread id.

Then what about using the same name as the last component of the name in perfschema? Assert if such a last component is longer than 15 characters. It'll limit our imagination a bit, but I'm sure we'll manage.

PERFORMANCE_SCHEMA.THREADS has columns NAME and OS_THREAD_ID; they should match the values from

ps H -o tid,comm

("Match" in the above sense: for tid it's "equal", for comm it's "last component of".)
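One way to picture this rule is a small helper that derives the OS-level name from the perfschema name and asserts on the length. This is only a sketch of the idea: os_thread_name() and OS_THREAD_NAME_MAX are hypothetical names, and the plain assert() stands in for whatever debug assertion the server would use.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: the Linux limit of 15 characters. */
#define OS_THREAD_NAME_MAX 15

/* Hypothetical helper: returns the last '/'-separated component of a
   perfschema-style thread name, e.g. "thread/innodb/page_cleaner" gives
   "page_cleaner". Asserts that the component fits the OS limit. */
static const char *os_thread_name(const char *psi_name)
{
  const char *slash = strrchr(psi_name, '/');
  const char *last = slash ? slash + 1 : psi_name;
  assert(strlen(last) <= OS_THREAD_NAME_MAX);
  return last;
}
```

The returned pointer aliases the input string, so no allocation is needed and the result can be passed straight to whichever call sets the thread name.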
