Debug like never before


You have written a good amount of code, and finally have an application ready to serve its users. But how would you see what’s really going on inside this application? The best way to do this, as the twelve-factor app rightly suggests, is logging.

“Logs provide visibility into the behavior of a running app.”

However, running a piece of code in development is very different from running it in production. When running locally, you have a single server responding to the handful of requests you send it. You can clearly see the stream of logs in your terminal and know exactly what’s happening. That is not the case when you’re running at least two servers for high availability or load distribution. It gets even harder to comprehend in a Service Oriented Architecture (SOA), where you have hundreds (if not thousands) of servers, and those servers keep coming and going based on scaling events. If you’re downloading logs from each server repeatedly, you’re doing it wrong. So how do we deal with this kind of situation?

Enter centralized logging.

This seems the most natural approach: a central location where you can see logs from all your servers and filter them by service, log type and, of course, time. Maybe even parse them to create fields, or perform a full-text search.

So centralized logging is not so bad after all. But how do we make use of it? There are quite a number of options, including hosted logging services. We went with an approach that is cost-effective at our scale. The key areas: collection, shipping, storage, analysis and alerting. The solution: Elasticsearch, Logstash and Kibana, commonly known as the ELK stack.

Basic architecture of ELK stack

Logstash-forwarder runs as an agent on every server, tailing the required log files and shipping them to the central Logstash server via the lumberjack protocol. Logstash parses these logs line by line, according to the provided configuration. Logstash has a lot of plugins for different kinds of inputs, parsing filters and outputs. In our case, we use the lumberjack input plugin to receive log events from the logstash-forwarder agents running on various servers. In addition, we use the S3 input plugin to pull in the AWS Elastic Load Balancer (ELB) logs. For better analysis and aggregation, we parse these logs using:

  • Grok filter for nginx and ELB logs
  • JSON filter for application logs (we emit application logs in JSON)
  • Multiline filter for keeping stack traces together
  • Date filter to keep the timestamps in logs in sync

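As a rough illustration, a trimmed Logstash pipeline along those lines might look like the sketch below. The ports, certificate paths, bucket name and type names are all hypothetical, and option names vary a bit between Logstash versions:

```conf
input {
  # Receive events from logstash-forwarder agents over the lumberjack protocol
  lumberjack {
    port            => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"
    ssl_key         => "/etc/pki/tls/private/logstash.key"
  }
  # Pull ELB access logs from S3 (bucket name is a placeholder)
  s3 {
    bucket => "my-elb-logs"
    region => "us-east-1"
    type   => "elb"
  }
}

filter {
  if [type] == "nginx" {
    # Parse the access-log line into structured fields
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  }
  if [type] == "app" {
    # Application logs are already JSON, so just expand them into fields
    json { source => "message" }
  }
  if [type] == "java" {
    # Glue stack-trace continuation lines (leading whitespace) to the previous event
    multiline { pattern => "^\s" what => "previous" }
  }
  # Use the timestamp from the log line as the event timestamp
  date { match => [ "timestamp", "ISO8601" ] }
}
```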
The output goes to an Elasticsearch cluster, which indexes and stores the logs, making them ready to be accessed and searched via Kibana. Kibana is essentially a front-end app that connects to the Elasticsearch cluster and queries it for relevant logs. It can be used to simply view the logs, filtered in a number of ways, or to draw charts based on them. For retention, we let the logs sit in Elasticsearch for about a month or so; the older indices are deleted using curator.
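The corresponding output section is short. Writing to daily indices (the host below is a hypothetical client-node address; `logstash-%{+YYYY.MM.dd}` is the conventional default pattern) is what lets curator drop whole indices once they age out:

```conf
output {
  elasticsearch {
    host  => "es-client.internal"       # hypothetical Elasticsearch client node
    index => "logstash-%{+YYYY.MM.dd}"  # daily indices, pruned later by curator
  }
}
```

With daily indices in place, a scheduled curator run can then delete anything older than, say, 30 days; the exact flags depend on the curator version you run.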

To make ELK highly available and reliable, we run multiple Logstash instances behind an ELB. The Elasticsearch cluster also runs multiple data and master nodes for replication, plus a client node for load balancing. One thing missing here is Redis. A lot of people use Redis as a broker in front of Logstash: it acts as a buffer for incoming logs in case a Logstash machine goes down or buckles under heavy load. In our experience, we didn’t need it, since we’re running multiple Logstash instances behind a load balancer, and logstash-forwarder spools the logs in case the target Logstash instance is unable to receive them.
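On the agent side, pointing every forwarder at the load balancer is what makes this work. A minimal logstash-forwarder config might look roughly like this (the ELB hostname, certificate path and log paths are placeholders):

```json
{
  "network": {
    "servers": [ "logstash-elb.internal:5043" ],
    "ssl ca": "/etc/pki/tls/certs/logstash.crt",
    "timeout": 15
  },
  "files": [
    { "paths": [ "/var/log/nginx/access.log" ], "fields": { "type": "nginx" } },
    { "paths": [ "/var/log/app/*.log" ],        "fields": { "type": "app" } }
  ]
}
```

The `fields` entries tag each event with a `type`, which is what the conditional filters on the Logstash side key off.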

Kibana 4 dashboard

That’s pretty much it. Don’t wait: put on your debugging goggles and play with ELK. Do share your thoughts or questions in the comments below. 🙂

P.S.: The logstash-forwarder project has recently been replaced by Filebeat. In practice, that means using Filebeat instead of logstash-forwarder on your servers.
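Filebeat keeps the same overall shape: tail files, tag them, ship them to Logstash. A minimal early-Filebeat config equivalent to the forwarder setup above might look like this (hostname and paths are again placeholders, and the config layout has shifted across Filebeat versions):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/app/*.log
      fields:
        type: app    # same type tag the Logstash filters expect
output:
  logstash:
    hosts: ["logstash-elb.internal:5043"]
```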


Debug like never before was originally published in Coding Big — The Crowdfire Engineering Blog on Medium.

Source: Crowdfire