%PDF- %PDF- Mini Shell
Mini Shell

Direktori : /usr/share/doc/bpftrace/examples/
Current File : //usr/share/doc/bpftrace/examples/ssllatency_example.txt
Demonstrations of ssllatency, the Linux bpftrace/eBPF version.

ssllatency traces OpenSSL handshake functions. It's the statistical summary
version of sslsnoop. This is useful for performance analysis with different
crypto algorithms or async SSL acceleration by CPU or offload device.
For example:

# wrk -t 1 -c 10 -d 1s -H 'Connection: close' https://localhost:443/0kb.bin
Running 1s test @ https://localhost:443/0kb.bin
  1 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   839.94us  323.68us   5.98ms   98.50%
    Req/Sec     1.28k     9.05     1.29k    54.55%
  1400 requests in 1.10s, 414.26KB read
  Non-2xx or 3xx responses: 1400
Requests/sec:   1272.97
Transfer/sec:    376.67KB

# ./ssllatency.bt
Attaching 26 probes...
Tracing SSL/TLS handshake... Hit Ctrl-C to end.
^C
Latency distribution in microsecond:

@hist[SSL_write]:
[0, 200)            1401 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@hist[SSL_read]:
[0, 200)            1401 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@hist[SSL_do_handshake]:
[600, 800)          1359 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[800, 1000)           12 |                                                    |
[1000, ...)           32 |@                                                   |
@hist[rsa_ossl_private_decrypt]:
[600, 800)          1359 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[800, 1000)           44 |@                                                   |
@histF[SSL_do_handshake]:
[0, 200)            1410 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@histF[SSL_read]:
[0, 200)            2804 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@stat[SSL_read]: count 1401, average 2, total 2834
@stat[SSL_write]: count 1401, average 5, total 7062
@stat[rsa_ossl_private_decrypt]: count 1403, average 643, total 902780
@stat[SSL_do_handshake]: count 1403, average 706, total 991605

@statF[SSL_read]: count 2804, average 1, total 3951
@statF[SSL_do_handshake]: count 1410, average 29, total 41964

This output shows latency distribution for wrk benchmark which saturated
one CPU core used by nginx server. wrk issued 1400 requests in 1.10s, and
req/s is 1272. Server side RSA function is counted 1403 times averaging
643us latency. And there's same amount(1410/1403) of failed/successful
SSL_do_handshake calls for the round trip. This is the default behavior.

# wrk -t 1 -c 10 -d 1s -H 'Connection: close' https://localhost:443/0kb.bin
Running 1s test @ https://localhost:443/0kb.bin
  1 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   448.67us  148.67us   1.28ms   82.00%
    Req/Sec     2.95k    43.03     2.99k    80.00%
  2933 requests in 1.00s, 867.87KB read
  Non-2xx or 3xx responses: 2933
Requests/sec:   2930.53
Transfer/sec:    867.14KB

# ./ssllatency.bt
Attaching 26 probes...
Tracing SSL/TLS handshake... Hit Ctrl-C to end.
^C
Latency distribution in microsecond:

@hist[SSL_write]:
[0, 200)            2933 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@hist[SSL_read]:
[0, 200)            2933 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@hist[SSL_do_handshake]:
[0, 200)            2941 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@histF[SSL_read]:
[0, 200)            5873 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@histF[SSL_do_handshake]:
[0, 200)            5884 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@stat[SSL_read]: count 2933, average 4, total 12088
@stat[SSL_write]: count 2933, average 7, total 20683
@stat[SSL_do_handshake]: count 2941, average 51, total 151149

@statF[SSL_read]: count 5873, average 2, total 13942
@statF[SSL_do_handshake]: count 5884, average 19, total 113061

This is the hardware accelerated result by using async SSL and CPU crypto
SIMD. req/s is more than doubled under same wkr workload. Peak throughput
can be more than 3x if adding more wrk connections. Keep using same
workload for comparison.

libcrypto_mb.so is used instead of libcrypto.so, to batch process multiple
async requests simultaneously by SIMD. As a result, wrk issued 2933 requests
in 1s. Failed SSL_do_handshake calls has doubled(5884), and successful
calls(2941) returned quickly(51us). This is expected from async routines.

The above effect is based on the huge bottleneck of RSA which is very CPU
intensive. If change to ECDSA, the overhead would be much less, and overall
improvement is less obvious.
Zerion Mini Shell 1.0