Totally unscientific and mostly unrealistic benchmark that the go-faster/ch project uses to understand performance.
The main goal is to measure the minimal client overhead (CPU, RAM) needed to read data, i.e. data block deserialization and transfer.
Please see the Notes section for more details about the results.
```sql
SELECT number FROM system.numbers_mt LIMIT 500000000
```

```
500000000 rows in set. Elapsed: 0.503 sec. Processed 500.07 million rows, 4.00 GB (993.26 million rows/s., 7.95 GB/s.)
```
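For reference, here is a minimal sketch of what such a read loop can look like in Go. This is not the actual benchmark code: it assumes the current ch-go style API (`ch.Dial`, `ch.Query`, `proto.ColUInt64`), and exact names and signatures may differ between versions of go-faster/ch.

```go
package main

import (
	"context"
	"fmt"

	"github.com/go-faster/ch"       // assumption: ch-go style API
	"github.com/go-faster/ch/proto" // assumption: ch-go style proto package
)

func main() {
	ctx := context.Background()
	// Connect to a local ClickHouse server over the native protocol.
	c, err := ch.Dial(ctx, ch.Options{Address: "127.0.0.1:9000"})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	var (
		rows uint64
		data proto.ColUInt64 // column-oriented buffer, reused for every block
	)
	if err := c.Do(ctx, ch.Query{
		Body: "SELECT number FROM system.numbers_mt LIMIT 500000000",
		Result: proto.Results{
			{Name: "number", Data: &data},
		},
		// OnResult is called once per received data block; data already
		// holds the decoded values of that block.
		OnResult: func(ctx context.Context, b proto.Block) error {
			rows += uint64(len(data))
			return nil
		},
	}); err != nil {
		panic(err)
	}
	fmt.Println("rows:", rows)
}
```

The point of this shape of API is that whole blocks are decoded into column slices and no per-row objects are allocated, which is exactly the overhead the benchmark tries to isolate.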
Note: due to the row-oriented design of most libraries, the per-row overhead is significantly higher there, so the results can be somewhat surprising.
Name | Time | RAM | Ratio |
---|---|---|---|
go-faster/ch (Go) | 347ms | 9M | ~1x |
clickhouse-client (C++) | 381ms | 91M | ~1x |
clickhouse-rs (Rust, inferred[^1]) | 490ms | 192M | 1.41x |
clickhouse-cpp (C++) | 531ms | 6.9M | 1.53x |
vahid-sohrabloo/chconn (Go) | 750ms | 12M | 2.16x |
clickhouse-jdbc (Java, HTTP) | 10s | 702M | 28x |
loyd/clickhouse.rs (Rust, HTTP) | 10s | 7.2M | 28x |
clickhouse-rs (Rust) | 27s | 192M | 77x |
clickhouse-driver (Python) | 37s | 60M | 106x |
clickhouse-go (Go) | 38s | 184M | 109x |
mailru/go-clickhouse (Go, HTTP) | 4m13s | 13M | 729x |
See RESULTS.md and RESULTS.slow.md.
Both go-faster/ch and clickhouse-client are kept at ~1x because they are effectively equal, so there is no point in calculating a relative speedup between them.
The mean results are nearly identical, and the C++ client has much lower dispersion:
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
clickhouse-cpp | 575.2 ± 36.5 | 531.3 | 686.1 | 1.00 |
clickhouse-client | 611.5 ± 161.1 | 393.2 | 1102.6 | 1.06 ± 0.29 |
go-faster | 626.4 ± 90.9 | 395.5 | 805.1 | 1.09 ± 0.17 |
Since the best result is selected, clickhouse-client's 393 ms is compared against clickhouse-cpp's 531 ms, even though their mean results are much closer.
Benchmarks were performed on a Ryzen 9 5950X, where Rust behaves surprisingly badly:
```
Benchmark 1: go-faster
  Time (mean ± σ):     644.6 ms ± 53.8 ms    [User: 109.7 ms, System: 352.5 ms]
  Range (min … max):   586.8 ms … 719.4 ms    5 runs

Benchmark 2: clickhouse-cpp
  Time (mean ± σ):     579.5 ms ± 23.2 ms    [User: 381.7 ms, System: 185.1 ms]
  Range (min … max):   541.8 ms … 599.0 ms    5 runs

Benchmark 3: clickhouse-rs
  Time (mean ± σ):     27.122 s ± 1.342 s    [User: 26.129 s, System: 1.024 s]
  Range (min … max):   24.760 s … 28.106 s    5 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: vahid-sohrabloo/chconn
  Time (mean ± σ):     5.066 s ± 0.115 s    [User: 4.632 s, System: 0.535 s]
  Range (min … max):   4.901 s … 5.204 s    5 runs

Benchmark 5: clickhouse-go
  Time (mean ± σ):     38.254 s ± 0.098 s    [User: 74.100 s, System: 1.179 s]
  Range (min … max):   38.120 s … 38.366 s    5 runs

Benchmark 6: clickhouse-client
  Time (mean ± σ):     507.6 ms ± 97.2 ms    [User: 135.3 ms, System: 197.7 ms]
  Range (min … max):   408.5 ms … 615.7 ms    5 runs
```
However, on Intel the results are much closer:
```
Benchmark 1: ch-bench-rust
  Time (mean ± σ):     5.309 s ± 1.845 s    [User: 4.852 s, System: 0.727 s]
  Range (min … max):   2.055 s … 8.683 s    10 runs

Benchmark 2: ch-bench-faster
  Time (mean ± σ):     1.435 s ± 0.138 s    [User: 0.364 s, System: 0.767 s]
  Range (min … max):   1.122 s … 1.588 s    10 runs

Summary
  'ch-bench-faster' ran
    3.70 ± 1.33 times faster than 'ch-bench-rust'
```
Also, on AMD EPYC they are even closer:
```
$ hyperfine ch-bench-rust ch-bench-faster
Benchmark 1: ch-bench-rust
  Time (mean ± σ):     3.949 s ± 1.324 s    [User: 2.133 s, System: 2.188 s]
  Range (min … max):   2.672 s … 6.198 s    10 runs

Benchmark 2: ch-bench-faster
  Time (mean ± σ):     2.020 s ± 0.091 s    [User: 0.348 s, System: 1.399 s]
  Range (min … max):   1.893 s … 2.225 s    10 runs

Summary
  'ch-bench-faster' ran
    1.95 ± 0.66 times faster than 'ch-bench-rust'
```
Please create an issue if you can help me improve the results on the Ryzen 9 5950X; the Rust client is pretty good and should perform better.
I've measured my localhost performance using iperf3 and got about 10 GiB/s, which correlates with the top results. For example, one of the go-faster/ch runs is `390ms 500000000 rows 4.0 GB 10 GB/s` (4.0 GB in 0.39 s is roughly 10 GB/s).
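If you want a quick sanity check without iperf3, a rough loopback throughput probe can be sketched in Go. This is purely illustrative and not part of the benchmark; it just streams bytes over a local TCP connection and reports the achieved rate.

```go
// localpipe: a rough localhost throughput check in the spirit of iperf3.
// It streams zero-filled buffers over a loopback TCP connection for a few
// seconds and prints the achieved rate. Numbers are only indicative.
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Reader side: drain everything the writer sends.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		_, _ = io.Copy(io.Discard, conn)
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 1<<20) // 1 MiB chunks
	var sent int64
	start := time.Now()
	for time.Since(start) < 3*time.Second {
		n, err := conn.Write(buf)
		if err != nil {
			panic(err)
		}
		sent += int64(n)
	}
	elapsed := time.Since(start).Seconds()
	fmt.Printf("%.2f GB/s over loopback\n", float64(sent)/elapsed/1e9)
}
```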
I've also implemented a mock server in Go that simulates the ClickHouse server to reduce overhead, because the main bottleneck in this test is currently the server itself (and probably localhost). With it, go-faster/ch was able to achieve `257ms 500000000 rows 4.0 GB 16 GB/s`, which should be the maximum possible burst result, but I'm not 100% sure.
In go-faster/ch micro-benchmarks I'm getting up to 27 GB/s without accounting for any network overhead (i.e. purely in-memory).
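As a rough illustration, an in-memory decoding micro-benchmark could look like the sketch below: one block of a UInt64 column is encoded once, then decoded repeatedly from a byte slice, so no network or server is involved. The `proto.Buffer`, `proto.Reader`, and `ColUInt64` names assume the current ch-go style API and may not match every version of the library; this is not the repo's actual benchmark code.

```go
package bench

import (
	"bytes"
	"testing"

	"github.com/go-faster/ch/proto" // assumption: ch-go style proto package
)

// BenchmarkDecodeColUInt64 measures pure column deserialization speed:
// a single block of UInt64 values is encoded once, then decoded
// repeatedly from an in-memory buffer.
func BenchmarkDecodeColUInt64(b *testing.B) {
	const rows = 65536

	// Prepare one encoded block of UInt64 values.
	col := make(proto.ColUInt64, rows)
	for i := range col {
		col[i] = uint64(i)
	}
	var buf proto.Buffer
	col.EncodeColumn(&buf)

	b.SetBytes(int64(len(buf.Buf)))
	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		// Fresh column each iteration so decoded rows do not accumulate.
		var dec proto.ColUInt64
		r := proto.NewReader(bytes.NewReader(buf.Buf))
		if err := dec.DecodeColumn(r, rows); err != nil {
			b.Fatal(err)
		}
	}
}
```

With `b.SetBytes` set to the encoded block size, `go test -bench` reports MB/s for deserialization alone, which is the same kind of figure as the in-memory numbers above.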
[^1]: Not a real measurement; extrapolated from the AMD EPYC results to Ryzen 9. See the notes on Rust above.