Ring Buffer Logging For Fun And Profit

Sometimes, all we really need is a big pile of logs

The Problem

Logging isn't cheap, so we often disable debug level logs (and lower) when we deploy our code to production. When things are going well, this isn't an issue. But when an error log pops up, much of the context around what was happening in the code is missing. A single line about the stack trace often just isn't enough. It'd be pretty nice to have a more detailed picture of what was happening at the time of failure. If only we had turned on those debug and trace logs.

Well, there's a reason we don't. Simply put, it's expensive. Often, there's an order of magnitude more debug and trace level logs than all other logging levels combined. So if we were to log it all, we'd have to spend multiples more.

We have to choose between a bad option and a worse option:

Don't debug log and miss the crucial context when an error occurs
Log everything and pay up

This is where most conversations around this end. Option 2 is a non-starter so we go option 1. If an error pops up we try to replicate the bug locally with debugging on and work from there.

Using A Ring Buffer

Luckily, this is a false dichotomy. We have a third option. Use a ring buffer to store debug logs and flush it when an error occurs.

This technique is conceptually simple. It takes just a sentence to describe, and not much has changed about the way things are logged. We still use the same logging libraries and the same logging software patterns, but now we have access to far more information in our logs when things go wrong.

You can have your cake and eat it too. You can avoid logging verbosely while at the same time having access to those verbose logs when it matters.

An Example

Say that we're building a Django backend for our online widget store, and say that there's a bug somewhere in the pseudo code below.

# handles the order
def handle_order_post_request(req):
  customer = get_customer(req)
  if not is_authorized_customer(customer):
    return HttpStatusUnauthorized()

  orders = get_orders(req)
  verify_prices(orders) # verify that prices are up to date
  verify_inventory(orders) # removes items from checkout if their out of stock

  checkout_info = process_order(order)
  save_checkout(checkout_info)
  return HttpStatusOk(checkout_info)

# the code for processing orders
def process_order(orders: List[OrderItems]) -> CheckoutInfo:
  logger.debug("total price: %d", order.price)

  total_price = sum([order.price for order in orders])

  most_expensive_item = max(
    orders, key=lambda order: order.price
  )

  return CheckoutInfo(
    total_price,
    most_expensive_item,
  )

We've launched our widget store, and it's a hit. The widgets our flying off the shelves. Things are going great but every once in a while our backend errors and a would-be customer leaves our site unhappy. Without ring buffer logging, here are the logs.

[INFO handle_order_post_request] POST /checkout
[ERROR save_receipt] non-nullable field most_expensive_item is null

Something is going wrong, but what is it? with just a single line error it's difficult to tell what's going wrong even if we had the stack trace. However, if we had all the contextual debug logs we'd have far more useful information:

[INFO handle_order_post_request] POST /checkout
[DEBUG get_orders][ring_id 2022] number of orders: 2
[DEBUG verify_prices][ring_id 2022] prices changes: 1
[DEBUG verify_inventory][ring_id 2022] items out of stock: 2
[DEBUG total_price][ring_id 2022] total price: 0.0
[ERROR save_receipt][ring_id 2022] non-nullable field most_expensive_item is None

Now we see a more full picture of the story. The transaction began by processing two orders from the customer. One of the prices were updated, but both items were out of stock.

Aha! when items were out of stock, they were removed from the checkout meaning that there was no most expensive item since the list was empty. Now we know what to do to fix this problem.

Now, this example isn't enough to tell you whether or not ring buffer logging is worth it for your codebase. I'm not going to make an argument telling you what you should be doing when I have no context in what you deal with. But I definitely do recommend playing around with it to see if it works for you.

Some Useful Links

More On Ring Buffer Logging

Brian Marick made has this pdf that's a great reference for the use cases of this method.
Will Sargent made has this blog post. It shed's some light on the history of this method. Turns out we used it to go to the moon in the 70s.

Other Interesting Ways To Log

I've been building this python tool that let's you dynamically insert new logs to running python without restarts. Check out the repo on GitHub

If you know any other links that I missed, or just want to get in touch, email me at <my-first-name>@<my-last-name>.com

Shalom Yiblet

Ring Buffer Logging For Fun And Profit

The Problem

Using A Ring Buffer

An Example

Some Useful Links

Shalom
Yiblet