Counting hits using Cloudflare workers

Introduction

One of the best parts of having a server is the ability to quickly spin up and deploy ideas on it. When you have somewhere to deploy projects, ideas come more frequently than when you don’t and have to go hunting for scraps in ‘free’ compute services.

I have a server running on DigitalOcean that has a couple of services running. You can get $200 in credit with DigitalOcean using my referral link -> https://m.do.co/c/b5f565690240

Some of the public apps I have running are:

  1. The backend that allows searching for recently posted Reddit comments - https://rcs.aawadia.dev/
  2. A curation of various interesting conversations - https://conversations.aawadia.dev/home
  3. Some misc API functionality such as object detection, language detection, and a SQL parser - https://blog.aawadia.dev/api/

And some private apps such as a Filebrowser [https://github.com/filebrowser/filebrowser], a Postgres DB, and a Docker registry.

Analytics

With these services running, it is always nice to get some usage numbers. What options are there to get a gauge of how much traffic the services are receiving?

The easiest option is Google Analytics embedded directly in the frontend. But what about the backend? Do you look at the bandwidth usage of the server, or at the Cloudflare request dashboard? Either way it felt a bit disjointed, and the numbers never quite agree between the different solutions. I also wanted something that would let me cross-verify the data points I was getting from Google and Cloudflare.

Old school hit counters

Back in the early days of the web, websites would show a hit counter somewhere on the page - sometimes they would even show a live count of how many users were on the site at that moment.

I took inspiration from those hit counters and wanted something similar for my services. A hit counter that gets incremented anytime the hostname gets hit.

The counter store server

Fundamentally the counter store that keeps track of the hits is a Key-Value store where the key is the hostname of the service and the value is the counter that continuously gets incremented.

Counting things is extremely common, so I built it once as a server and exposed it as an API; it stores all this information on the server itself using RocksDB.
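The core of such a store is tiny. Here is a minimal sketch of the idea (names are hypothetical; the real service persists to RocksDB, while a Map stands in here):

```javascript
// minimal sketch of a counter store: a key-value map where each key is a
// hostname and each value is a monotonically increasing hit count.
// the real service persists to RocksDB; an in-memory Map stands in here.
class CounterStore {
  constructor() {
    this.counters = new Map()
  }

  // increment the counter for a key and return the new value
  hit(key) {
    const next = (this.counters.get(key) || 0) + 1
    this.counters.set(key, next)
    return next
  }

  // dump all counters, e.g. to serve a "show me everything" endpoint
  dump() {
    return Object.fromEntries(this.counters)
  }
}

const store = new CounterStore()
store.hit("blog.aawadia.dev")
store.hit("blog.aawadia.dev")
store.hit("rcs.aawadia.dev")
console.log(store.dump()) // blog.aawadia.dev -> 2, rcs.aawadia.dev -> 1
```

The API then just maps a POST on a key to `hit()` and a GET to `dump()`.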

The client

Now that the server is ready to store this information there needs to be something that sits at the very edge and sends count requests for each service that gets hit.

The code itself is relatively straightforward, but where do you run it? The answer is in the description itself: the code needs to run at the ‘edge’, between the request and response life cycle, like a middleware. That is exactly where Cloudflare Workers sit. Cloudflare is my DNS provider, so it can easily inject code and knows exactly which service is being accessed.

The code gets written in a JavaScript [TypeScript and Rust are other options] file and uses the wrangler node package for publishing and testing. It is a decent dev experience for small middleware-type functionality - better than writing lambdas/FaaS.

First step is to install the wrangler package and log in to Cloudflare:

npm install -g wrangler
wrangler login

Then init a new project using wrangler init requests-tracking-analytics

Thankfully the Cloudflare Worker devs understand the dev life cycle and there is dev-server functionality built in (wrangler dev), so you can test your code before pushing it to production.

I like to live on the edge, so I yolo deploy it via npx wrangler publish src/index.js --name requests-tracking-analytics

The next step is to go to the Cloudflare dashboard and connect the worker to the urls that should trigger it.
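Alternatively, the routes can be declared in the project’s wrangler.toml so the mapping lives in the repo alongside the code. A sketch (the patterns here are illustrative):

```toml
name = "requests-tracking-analytics"
main = "src/index.js"

# each pattern routes matching requests through the worker
routes = [
  { pattern = "blog.aawadia.dev/*", zone_name = "aawadia.dev" },
  { pattern = "rcs.aawadia.dev/*", zone_name = "aawadia.dev" }
]
```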

The logs

All console.log messages can be streamed live directly in the dashboard - this is very useful during debugging.

Bot detection

I realised that most of the traffic I get is from bots. Cloudflare does have bot detection built in, and it can pass that data to your worker so that you can skip counting those requests, but it is only available on the higher paid plans.
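For reference, on those plans the worker sees a bot score on the request’s cf object. A sketch of what the skip could look like - the botManagement field is only populated on plans with Bot Management enabled, and the threshold here is an assumption:

```javascript
// decide whether to skip counting a request based on Cloudflare's bot score
// (1-99, lower means more likely automated). request.cf.botManagement is
// only present on plans with Bot Management enabled - hence the guards.
function isLikelyBot(cf) {
  const score = cf && cf.botManagement && cf.botManagement.score
  if (typeof score !== "number") {
    return false // no signal available - count the hit
  }
  return score < 30 // threshold is an assumption, tune to taste
}

console.log(isLikelyBot({ botManagement: { score: 2 } }))  // true
console.log(isLikelyBot({ botManagement: { score: 95 } })) // false
console.log(isLikelyBot(undefined))                        // false
```

In the worker, the counting fetch would simply be wrapped in `if (!isLikelyBot(request.cf)) { ... }`.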

The code

The code attaches an event handler as a middleware. It is regular JavaScript that has access to the incoming request and decides what the outgoing response should be.

// the main entry point that you must fill
addEventListener("fetch", (event) => {
  // fail open if the worker fails - don't want to fail the client request
  event.passThroughOnException()
  event.respondWith(handleRequest(event.request))
})

// sketch (the original check isn't shown in the post): treat calls to the
// counter endpoint itself as recursive so they are passed through uncounted
function checkForRecursion(request) {
  return request.url.startsWith("https://api/")
}

// my code
async function handleRequest(request) {
  // avoid recursive calls - respond to the client as is
  if (checkForRecursion(request)) {
    return await fetch(request)
  }

  const requestOptions = {
    method: "POST",
    headers: {
      "content-type": "application/json;charset=UTF-8"
    }
  }

  // grab the hostname being accessed
  let counterName = request.url
  if (request.headers.has("host")) {
    counterName = request.headers.get("host")
  }

  // asynchronously make the request to the counter service
  // don't wait for this to finish before responding back to the client
  fetch("https://api/" + counterName, requestOptions)

  // return the response to the client
  return await fetch(request)
}

The outcome

Now I can easily grab a dump of the counters at any given moment to see how much the services are being accessed.

{
  "counter:aawadia.dev": 1475,
  "counter:api.aawadia.dev": 1060,
  "counter:blog.aawadia.dev": 30595,
  "counter:consulting.aawadia.dev": 1214,
  "counter:conversations.aawadia.dev": 3171,
  "counter:rcs.aawadia.dev": 1382,
  "counter:wordle.aawadia.dev": 128,
  "counter:wschat.aawadia.dev": 53
}

Daily analytics

I opted to keep it as a ‘total’ counter instead of giving it any time-based granularity because I didn’t want to create tonnes of different counters.

So how do you get the analytics in a more granular, time-based manner?

This is where GitHub Actions’ cron schedule feature comes into play. I have a small Kotlin app that gets triggered every morning, takes a snapshot of the counter state, and saves it to a local SQLite DB that gets committed to the same git repo.
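The schedule itself is a few lines of workflow YAML. A sketch - the run step and times are placeholders, not my actual setup:

```yaml
on:
  schedule:
    - cron: "0 6 * * *"   # every morning at 06:00 UTC

jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Take counter snapshot
        run: ./gradlew run   # runs the Kotlin snapshot app
      - name: Commit the updated SQLite DB
        run: |
          git config user.name "counter-bot"
          git config user.email "bot@example.com"
          git commit -am "daily counter snapshot" && git push
```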

The code looks something like

fun trackCounterInSqlite(host: String, count: Int) {
    val db = Jdbi.create("jdbc:sqlite:counters.db")
    db.useHandle<Nothing> {
        // the timestamp is automatically inserted with a default of (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
        it.createUpdate("insert into counters (name, count) values (:name, :count)")
            .bind("name", host)
            .bind("count", count)
            .execute()
    }
}

For any sort of dashboarding or further analytics the SQLite database can then be imported into https://sqliteviz.com/ and queried as required.
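Since each snapshot stores the cumulative total, per-day traffic is just the difference between consecutive snapshots. A sketch of that transformation (in JavaScript for brevity, though my snapshot app is Kotlin; the numbers are made up):

```javascript
// turn cumulative daily snapshots for one host into per-day hit counts:
// deltas[i] = snapshots[i+1].count - snapshots[i].count
function dailyHits(snapshots) {
  const deltas = []
  for (let i = 1; i < snapshots.length; i++) {
    deltas.push(snapshots[i].count - snapshots[i - 1].count)
  }
  return deltas
}

const blog = [
  { day: "2023-01-01", count: 30000 },
  { day: "2023-01-02", count: 30250 },
  { day: "2023-01-03", count: 30595 }
]
console.log(dailyHits(blog)) // [ 250, 345 ]
```

The same computation can be done in SQL at query time, but having it as a helper makes ad hoc scripting over the snapshots easy.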

Conclusion

Next time you need some middleware-style functionality, check out Cloudflare Workers as an option. They are priced generously and also provide other features such as KV/SQL DBs and an object store if required.