[MCOL-3735] Investigate using unix sockets instead of full tcp/ip sockets for localhost connections
Created: 2020-01-20  Updated: 2020-01-31  Resolved: 2020-01-31

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: None
Fix Version/s: 1.4.3

Type: Task Priority: Minor
Reporter: Patrick LeBlanc (Inactive) Assignee: Patrick LeBlanc (Inactive)
Resolution: Won't Do Votes: 0
Labels: None

Sprint: 2020-1

 Description   

Investigate whether there is a benefit to using Unix domain sockets for localhost connections. In particular, this tests whether Linux is smart enough to skip the TCP and IP stacks for localhost connections.
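To make the question concrete, here is a minimal Python sketch (not ColumnStore code, which is C++) that bounces a small message between two connected sockets, once over a Unix domain socket pair and once over a TCP connection on 127.0.0.1. The helper names (`unix_pair`, `tcp_loopback_pair`, `ping_pong`), the 1 KiB payload and the round-trip count are illustrative assumptions, and Python's own overhead will likely blunt any difference between the two transports.

```python
import socket
import time


def unix_pair():
    """Return a connected pair of Unix domain stream sockets."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def tcp_loopback_pair():
    """Return a connected (client, server) pair over TCP on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel choose a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    # Disable Nagle's algorithm so small writes go out immediately.
    for s in (client, server):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return client, server


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return bytes(buf)


def ping_pong(a, b, payload, rounds):
    """Send payload a->b, echo it b->a, `rounds` times; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)
        b.sendall(recv_exact(b, len(payload)))
        if recv_exact(a, len(payload)) != payload:
            raise RuntimeError("echoed data does not match")
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = b"x" * 1024  # small enough to fit in the kernel socket buffers
    rounds = 5_000
    for name, make_pair in (("unix", unix_pair), ("tcp", tcp_loopback_pair)):
        a, b = make_pair()
        with a, b:
            secs = ping_pong(a, b, payload, rounds)
        print(f"{name}: {secs:.3f}s for {rounds} round trips")
```

Both transports go through the same `sendall`/`recv` calls, so only the socket family differs; a single-threaded ping-pong keeps the payload smaller than the socket buffers to avoid blocking.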



 Comments   
Comment by Patrick LeBlanc (Inactive) [ 2020-01-20 ]

Will resume after a couple of bug fixes.

Comment by Patrick LeBlanc (Inactive) [ 2020-01-24 ]

Mixed results in initial testing.

Benchmarking the Unix sockets mod
-----------------------------
time mariadb tpch1 -e 'select l_orderkey from lineitem order by 1 limit 100;'
develop-1.4:

  • first run: 5.1s, PrimProc 18.11s, ExeMgr 12.25s
  • second run: 4.7s, PrimProc 32.71s, ExeMgr 24.29s

localhost-sockets:

  • first run: 4.9s, PrimProc 16.84s, ExeMgr 11.54s
  • second run: 4.7s, PrimProc 28.93s, ExeMgr 23.03s

time mariadb tpch1 -e 'select l_comment from lineitem order by 1 limit 100;'
develop-1.4:

  • first run: 25.8s, PrimProc 1:59.27, ExeMgr 1:34.19
  • second run: 23.5s, PrimProc 3:34.93, ExeMgr 3:02.50

localhost-sockets:

  • first run: 33.7s, PrimProc 1:45.61, ExeMgr 1:34.31
  • second run: 37.8s, PrimProc 3:00.57, ExeMgr 3:00.27

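The second-run PrimProc and ExeMgr figures above are roughly double the first-run ones, which suggests (an assumption; the ticket does not say how they were collected) that they are cumulative process CPU times, for example as read from ps, rather than per-run costs. Under that assumption, the CPU used by the second run is the difference between the two readings. The Python sketch below only copies the reported figures; the helper names are hypothetical.

```python
def to_seconds(reading):
    """Parse a CPU-time reading such as '18.11s' or '1:59.27' into seconds."""
    reading = reading.rstrip("s")
    minutes, _, seconds = reading.rpartition(":")
    return int(minutes or 0) * 60 + float(seconds)


def second_run_cpu(first, second):
    """CPU seconds used by the second run, ASSUMING the readings are cumulative."""
    return round(to_seconds(second) - to_seconds(first), 2)


# (branch, sorted column, process) -> (first-run reading, second-run reading),
# copied from the benchmark comment above.
READINGS = {
    ("develop-1.4", "l_orderkey", "PrimProc"): ("18.11s", "32.71s"),
    ("develop-1.4", "l_orderkey", "ExeMgr"): ("12.25s", "24.29s"),
    ("localhost-sockets", "l_orderkey", "PrimProc"): ("16.84s", "28.93s"),
    ("localhost-sockets", "l_orderkey", "ExeMgr"): ("11.54s", "23.03s"),
    ("develop-1.4", "l_comment", "PrimProc"): ("1:59.27", "3:34.93"),
    ("develop-1.4", "l_comment", "ExeMgr"): ("1:34.19", "3:02.50"),
    ("localhost-sockets", "l_comment", "PrimProc"): ("1:45.61", "3:00.57"),
    ("localhost-sockets", "l_comment", "ExeMgr"): ("1:34.31", "3:00.27"),
}

if __name__ == "__main__":
    for (branch, column, proc), (first, second) in READINGS.items():
        print(f"{branch:<18} {column:<10} {proc:<8} {second_run_cpu(first, second):7.2f}s")
```

Read this way, second-run PrimProc CPU is 14.60s (develop-1.4) vs 12.09s (localhost-sockets) for the l_orderkey sort and 95.66s vs 74.96s for l_comment, even though the l_comment wall-clock time was worse with sockets. If the readings are not cumulative, these differences are meaningless.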
Comment by Patrick LeBlanc (Inactive) [ 2020-01-30 ]

Some additional test results. I ran the DBT3 queries against a 10 GB load. These timings are from the second run of each query, when all of the data is cached. I also left time gaps between queries to ensure no CPU time was being spent on cleanup tasks when a query started.

At least on my laptop, and with queries like these, there appears to be no benefit to using Unix sockets over TCP/IP sockets. I will repeat these benchmarks on a 'real' machine later.

localhost-sockets:
ExeMgr CPU time: 3:45.15
PrimProc CPU time: 28:26.45
1.sql: 0:06.78
2.sql: 0:00.80
3.sql: 0:01.37
4.sql: 0:03.50
5.sql: 0:01.91
6.sql: 0:00.70
7.sql: 0:06.69
8.sql: 0:01.57
9.sql: 0:22.46
10.sql: 0:03.24
11.sql: 0:00.44
12.sql: 0:02.57
13.sql: 0:06.46
14.sql: 0:01.11
15.sql: 0:01.53
16.sql: 0:01.03
17.sql: 0:12.05
18.sql: 0:05.82
19.sql: 0:15.80
20.sql: 0:03.96
21.sql: 0:10.06
22.sql: 0:01.63

develop-1.4:
ExeMgr CPU time: 3:37.82
PrimProc CPU time: 27:11.10
1.sql: 0:06.50
2.sql: 0:00.79
3.sql: 0:01.45
4.sql: 0:04.05
5.sql: 0:01.79
6.sql: 0:00.63
7.sql: 0:06.00
8.sql: 0:01.56
9.sql: 0:20.75
10.sql: 0:03.58
11.sql: 0:00.39
12.sql: 0:02.22
13.sql: 0:06.54
14.sql: 0:01.08
15.sql: 0:01.36
16.sql: 0:01.00
17.sql: 0:11.42
18.sql: 0:05.58
19.sql: 0:14.92
20.sql: 0:04.14
21.sql: 0:09.48
22.sql: 0:01.62
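As a rough cross-check of the "no benefit" conclusion, the two lists can be totalled. This Python sketch only copies the reported second-run figures and adds no new measurements; treating the sum of the 22 query times as a single score is an assumption, not something the ticket does.

```python
# Second-run wall-clock seconds for DBT3 queries 1-22 (10 GB load), copied from above.
localhost_sockets = [6.78, 0.80, 1.37, 3.50, 1.91, 0.70, 6.69, 1.57, 22.46, 3.24, 0.44,
                     2.57, 6.46, 1.11, 1.53, 1.03, 12.05, 5.82, 15.80, 3.96, 10.06, 1.63]
develop_1_4 = [6.50, 0.79, 1.45, 4.05, 1.79, 0.63, 6.00, 1.56, 20.75, 3.58, 0.39,
               2.22, 6.54, 1.08, 1.36, 1.00, 11.42, 5.58, 14.92, 4.14, 9.48, 1.62]


def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


def to_seconds(stamp):
    """Parse an 'm:ss.xx' CPU-time reading into seconds."""
    minutes, _, seconds = stamp.partition(":")
    return int(minutes) * 60 + float(seconds)


total_sockets = sum(localhost_sockets)
total_develop = sum(develop_1_4)
# Query numbers (1-based) where the Unix-socket branch was strictly faster.
faster = [q for q, (s, d) in enumerate(zip(localhost_sockets, develop_1_4), start=1) if s < d]

if __name__ == "__main__":
    print(f"total wall time: {total_sockets:.2f}s vs {total_develop:.2f}s "
          f"({pct_change(total_sockets, total_develop):+.1f}%)")
    print(f"queries faster with unix sockets: {faster}")
    for proc, sock, dev in (("ExeMgr", "3:45.15", "3:37.82"),
                            ("PrimProc", "28:26.45", "27:11.10")):
        print(f"{proc} CPU: {pct_change(to_seconds(sock), to_seconds(dev)):+.1f}%")
```

With these figures the localhost-sockets branch totals 111.48s against 106.85s for develop-1.4 (about 4% more) and is faster on only 5 of the 22 queries, which fits the reading that Unix sockets bought nothing here.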

Comment by Patrick LeBlanc (Inactive) [ 2020-01-31 ]

A big PM join query against a 30 GB DBT3 load.

set columnstore_ordered_only=on;
select count(*) from orders, lineitem where l_orderkey = o_orderkey;

(Second run; data is fully cached.)
localhost-sockets: 17.5s, ExeMgr 0:26.35, PrimProc 2:55.81; observed sys utilization of 6-9% during the transfer
develop-1.4: 17.6s, ExeMgr 0:26.67, PrimProc 2:58.10; observed sys utilization of 6-9% during the transfer

This isn't worth bringing into the code at the moment. If there is any overhead in using TCP/IP sockets for a localhost connection, it's minimal. I'll keep this in the 'localhost-sockets' branch of my fork in case we want to experiment with it later.

Generated at Thu Feb 08 02:45:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.