103 lines
4.4 KiB
Plaintext
103 lines
4.4 KiB
Plaintext
Demonstrations of ssllatency, the Linux bpftrace/eBPF version.
|
|
|
|
ssllatency traces OpenSSL handshake functions. It's the statistical summary
|
|
version of sslsnoop. This is useful for performance analysis with different
|
|
crypto algorithms or async SSL acceleration by CPU or offload device.
|
|
For example:
|
|
|
|
# wrk -t 1 -c 10 -d 1s -H 'Connection: close' https://localhost:443/0kb.bin
|
|
Running 1s test @ https://localhost:443/0kb.bin
|
|
1 threads and 10 connections
|
|
Thread Stats Avg Stdev Max +/- Stdev
|
|
Latency 839.94us 323.68us 5.98ms 98.50%
|
|
Req/Sec 1.28k 9.05 1.29k 54.55%
|
|
1400 requests in 1.10s, 414.26KB read
|
|
Non-2xx or 3xx responses: 1400
|
|
Requests/sec: 1272.97
|
|
Transfer/sec: 376.67KB
|
|
|
|
# ./ssllatency.bt
|
|
Attaching 26 probes...
|
|
Tracing SSL/TLS handshake... Hit Ctrl-C to end.
|
|
^C
|
|
Latency distribution in microsecond:
|
|
|
|
@hist[SSL_write]:
|
|
[0, 200) 1401 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@hist[SSL_read]:
|
|
[0, 200) 1401 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@hist[SSL_do_handshake]:
|
|
[600, 800) 1359 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
[800, 1000) 12 | |
|
|
[1000, ...) 32 |@ |
|
|
@hist[rsa_ossl_private_decrypt]:
|
|
[600, 800) 1359 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
[800, 1000) 44 |@ |
|
|
@histF[SSL_do_handshake]:
|
|
[0, 200) 1410 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@histF[SSL_read]:
|
|
[0, 200) 2804 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
|
|
@stat[SSL_read]: count 1401, average 2, total 2834
|
|
@stat[SSL_write]: count 1401, average 5, total 7062
|
|
@stat[rsa_ossl_private_decrypt]: count 1403, average 643, total 902780
|
|
@stat[SSL_do_handshake]: count 1403, average 706, total 991605
|
|
|
|
@statF[SSL_read]: count 2804, average 1, total 3951
|
|
@statF[SSL_do_handshake]: count 1410, average 29, total 41964
|
|
|
|
This output shows latency distribution for wrk benchmark which saturated
|
|
one CPU core used by nginx server. wrk issued 1400 requests in 1.10s, and
|
|
req/s is 1272. Server side RSA function is counted 1403 times averaging
|
|
643us latency. And there's same amount(1410/1403) of failed/successful
|
|
SSL_do_handshake calls for the round trip. This is the default behavior.
|
|
|
|
# wrk -t 1 -c 10 -d 1s -H 'Connection: close' https://localhost:443/0kb.bin
|
|
Running 1s test @ https://localhost:443/0kb.bin
|
|
1 threads and 10 connections
|
|
Thread Stats Avg Stdev Max +/- Stdev
|
|
Latency 448.67us 148.67us 1.28ms 82.00%
|
|
Req/Sec 2.95k 43.03 2.99k 80.00%
|
|
2933 requests in 1.00s, 867.87KB read
|
|
Non-2xx or 3xx responses: 2933
|
|
Requests/sec: 2930.53
|
|
Transfer/sec: 867.14KB
|
|
|
|
# ./ssllatency.bt
|
|
Attaching 26 probes...
|
|
Tracing SSL/TLS handshake... Hit Ctrl-C to end.
|
|
^C
|
|
Latency distribution in microsecond:
|
|
|
|
@hist[SSL_write]:
|
|
[0, 200) 2933 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@hist[SSL_read]:
|
|
[0, 200) 2933 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@hist[SSL_do_handshake]:
|
|
[0, 200) 2941 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@histF[SSL_read]:
|
|
[0, 200) 5873 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
@histF[SSL_do_handshake]:
|
|
[0, 200) 5884 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
|
|
|
@stat[SSL_read]: count 2933, average 4, total 12088
|
|
@stat[SSL_write]: count 2933, average 7, total 20683
|
|
@stat[SSL_do_handshake]: count 2941, average 51, total 151149
|
|
|
|
@statF[SSL_read]: count 5873, average 2, total 13942
|
|
@statF[SSL_do_handshake]: count 5884, average 19, total 113061
|
|
|
|
This is the hardware accelerated result by using async SSL and CPU crypto
|
|
SIMD. req/s is more than doubled under same wkr workload. Peak throughput
|
|
can be more than 3x if adding more wrk connections. Keep using same
|
|
workload for comparison.
|
|
|
|
libcrypto_mb.so is used instead of libcrypto.so, to batch process multiple
|
|
async requests simultaneously by SIMD. As a result, wrk issued 2933 requests
|
|
in 1s. Failed SSL_do_handshake calls has doubled(5884), and successful
|
|
calls(2941) returned quickly(51us). This is expected from async routines.
|
|
|
|
The above effect is based on the huge bottleneck of RSA which is very CPU
|
|
intensive. If change to ECDSA, the overhead would be much less, and overall
|
|
improvement is less obvious.
|