Measuring Golang and Echo against Kotlin and Vert.x
Intro
Software benchmarking is extremely tough, but it is something I really enjoy doing. Whether that is running Apache Bench against HTTP servers, running redis-benchmark, or pgbench, it is always interesting to see how various tweaks impact performance.
Two of my go-to languages for building anything are Kotlin and Golang, and in those my go-to libraries for building HTTP services are Vert.x and Echo respectively. It was natural instinct to see how they perform when put under a stress test.
The internet is filled with comments about the JVM being slower than native code. It is, while the code is being JIT-compiled, but after that it should perform at a similar level to native code.
App code
Let’s look at the code for the two applications.
A minimal Kotlin version:

```kotlin
import io.vertx.core.Vertx

// Minimal sketch: serve a 200 with an empty body on port 9999.
fun main() {
    Vertx.vertx()
        .createHttpServer()
        .requestHandler { req -> req.response().end("") }
        .listen(9999)
}
```
A minimal Go version:

```go
package main

import (
	"net/http"

	"github.com/labstack/echo/v4"
)

// Minimal sketch: serve a 200 with an empty body on port 1323.
func main() {
	e := echo.New()
	e.GET("/", func(c echo.Context) error {
		return c.String(http.StatusOK, "")
	})
	e.Logger.Fatal(e.Start(":1323"))
}
```
I am using the OpenJ9 JVM on JDK 13.
```shell
$ java -version
```
Both applications start a server on their respective port and return a 200 on /
with an empty string response. Something to keep in mind is that the Vert.x HTTP server is single-threaded, whereas the Echo server is not.
Curl timing
The first thing I wanted to see was a breakdown of the time it takes to do a single request. curl can print this breakdown when given the following format file:
```
# curl-format.txt
     time_namelookup:  %{time_namelookup}s\n
        time_connect:  %{time_connect}s\n
     time_appconnect:  %{time_appconnect}s\n
    time_pretransfer:  %{time_pretransfer}s\n
       time_redirect:  %{time_redirect}s\n
  time_starttransfer:  %{time_starttransfer}s\n
                     ----------\n
          time_total:  %{time_total}s\n
```
As expected, the first request to the JVM application was quite slow, coming in at around 180ms.
```shell
$ curl -w "@curl-format.txt" -o /dev/null -s http://localhost:9999/
```
The first request to the Go server was already quite fast, responding in under 5ms.
```shell
$ curl -w "@curl-format.txt" -o /dev/null -s http://localhost:1323/
```
The JVM performs optimisations based on the traffic it sees, which became evident as I made a few more requests.
```shell
$ curl -w "@curl-format.txt" -o /dev/null -s http://localhost:9999/
```
The response times fell from 180ms down to 5ms once the JVM had optimised that code path, at which point the code was running at similar speeds to the native binary Go had produced.
Load testing
The next step was to hammer them with HTTP requests using wrk.
wrk was configured to use 2 connections and 1 thread for 60 seconds. Since the server and the load tester were both running on the same machine, I wanted to limit the amount of resources wrk would use.
The Go Echo server results are as follows: it achieved 37k requests per second with an average latency of 50 microseconds and a standard deviation of 44 microseconds.
```shell
$ wrk -t 1 -c 2 -d60s http://localhost:1323/
```
The Kotlin Vert.x server results are as follows: it achieved 47k requests per second with an average latency of 271 microseconds and a standard deviation of 4.8 milliseconds.
```shell
$ wrk -t 1 -c 2 -d60s http://localhost:9999/
```
Conclusion
This seems to indicate that Vert.x was able to achieve a higher throughput but with a much wider band of response times, whereas Go sacrificed throughput to keep response times very tight. That makes sense, since Go is optimised for low latency.
Optimising for high throughput AND low latency at the same time is quite difficult. A good read on this trade-off can be found here.
Pick your technology stack based on your application's requirements, not Hacker News headlines.