Website Caching a.k.a Full page caching — Learnings from housing.com


Have you ever considered serving cached HTML pages for your dynamic website? If your answer is any of the below, this article might be worth reading:

“No, but what’s a big deal about it? I can do it anytime”

“No, and I don’t see any benefit of the same”

“Yes, done it for static pages only, dynamic pages were too much to handle”

“Yes, we want to do it but there are too many unknowns”

“Yes, but never got a desired hit ratio”

As an engineer you must have heard of many caching techniques (Memcached, Redis, Aerospike, CDN, etc.), and very likely you have implemented at least one of them in a live project as well. However, these are typically used to store key-value pairs, API responses in JSON, static assets, and so on. Some of the challenges/dilemmas faced while working on caching include choosing a cache key, the cache-invalidation policy, the eviction policy, etc.

Let’s discuss some of the similar challenges we faced while caching the HTML pages of https://www.housing.com.

Platform — We had primarily two choices:

  1. Self-managed — Varnish or Nginx caching can be a good option due to lower cost and flexibility. However, maintenance can become a concern while scaling up the application. A word of caution: if Google crawl time is something you care about, we saw a negative impact from Varnish caching on our other website, Makaan.com.
  2. Cloud-based — CloudFront, Akamai, Cloudflare, etc. are some of the top choices, offering out-of-the-box solutions for problems like scaling, security and high availability, though at a higher cost.

We at Housing opted for AWS CloudFront, since most of our other infrastructure is on AWS.

Architecture — All requests/responses are passed through a Lambda function before hitting the CloudFront cache. The request Lambda is primarily used to generate the cache key for the request.

[Figure: Request flow]

Some fancy stuff is done in those request/response Lambda functions, which eventually helped improve the overall hit ratio and enabled better logging/debugging as well. To list a few:

  1. In the request Lambda, we removed all query parameters and cookies that didn’t impact the server response. Any parameters/cookies needed on the client side are saved in headers. In the response Lambda, before serving the final response to users, the parameters are stitched back into the URL and the cookies are set again.
  2. For requests that miss the CDN cache, the request reaches the origin server with CloudFront’s client information instead of the actual user’s. To overcome this, all the required client info is patched into custom headers, which are later used by the web application servers.
  3. For a few specific routes, we hit a backend API and rewrite the URL to improve the hit ratio.
  4. We check the request cookies and, if the page is part of an A/B experiment, adjust the cache key accordingly.
  5. Many other checks are done to help set the cache key, including device type (mobile/desktop), browser (to cover differential loading for modern and old browsers), language (Hindi, English), login status, etc.
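
The query-parameter handling from step 1 can be sketched roughly as below. This is a minimal illustration, not the actual Lambda code: the allow-list, function name and the `x-original-params` header are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

# Illustrative allow-list: parameters that actually change the
# server-rendered HTML. Everything else is stripped from the cache key
# and stashed in a custom header so the response lambda can stitch it
# back into the URL before the page reaches the user.
CACHEABLE_PARAMS = {"page", "sort"}

def normalize_request(uri: str, querystring: str) -> dict:
    """Split the querystring into the part that stays in the cache key
    and the part carried along in a (hypothetical) custom header."""
    kept, stashed = [], []
    for key, value in parse_qsl(querystring, keep_blank_values=True):
        (kept if key in CACHEABLE_PARAMS else stashed).append((key, value))
    return {
        "uri": uri,
        "querystring": urlencode(sorted(kept)),   # stable ordering helps the hit ratio
        "x-original-params": urlencode(stashed),  # restored by the response lambda
    }
```

Sorting the kept parameters means `?sort=price&page=2` and `?page=2&sort=price` map to the same cache entry.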

Cache hit ratio — CloudFront provides an out-of-the-box dashboard to check the hit ratio of the CDN. Alternatively, this can also be computed if detailed logs are enabled. Like any other e-commerce website, housing.com has millions of URL variations for SRPs (search result pages). Each URL can have nearly 48 copies in the cache at a time. The maths goes like this:

2 A/B experiments (on average) × 2 variations per experiment × 2 supported devices (mobile and desktop) × 3 browser tiers (modern, medium and legacy) × 2 request types (user and bot) = 48
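The arithmetic above can be reproduced by enumerating the cache-key dimensions. The dimension and value names below are illustrative, not the production key format:

```python
from itertools import product

# Illustrative cache-key dimensions mirroring the arithmetic above.
DIMENSIONS = {
    "experiment": ["exp1", "exp2"],             # 2 A/B experiments on average
    "variant": ["a", "b"],                      # 2 variations per experiment
    "device": ["mobile", "desktop"],            # 2 supported devices
    "browser": ["modern", "medium", "legacy"],  # 3 browser tiers
    "client": ["user", "bot"],                  # 2 request types
}

def cache_key(url: str, **dims: str) -> str:
    """Compose a cache key from the URL plus one value per dimension."""
    return "|".join([url] + [f"{name}={dims[name]}" for name in DIMENSIONS])

# Every combination of dimension values is a distinct cached copy of the URL:
copies_per_url = len(list(product(*DIMENSIONS.values())))  # 2*2*2*3*2 = 48
```

Each extra dimension multiplies the copy count, which is why trimming non-essential parameters in the request Lambda matters so much for the hit ratio.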

With so many variations, we are able to achieve a cache hit ratio of around 40% by optimizing the TTL values of different pages. As of now, we have static TTL values for different pages, controlled in the code base. We are planning to cache all pages forever and invalidate the cache only when something changes in the DB layer (that is going to be the next challenge).
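A minimal sketch of per-page-type static TTLs controlled in code, as described above. The page types and TTL values are hypothetical; only the header mechanics are standard HTTP:

```python
# Hypothetical per-page-type TTLs (in seconds). Returning them via the
# Cache-Control header lets the CDN cache the page at the edge
# (s-maxage) while keeping browsers from caching it (max-age=0).
PAGE_TTLS = {
    "home": 3600,
    "srp": 900,      # search result pages change more often
    "detail": 1800,
}
DEFAULT_TTL = 300

def cache_control(page_type: str) -> str:
    """Build the Cache-Control header for an origin response."""
    ttl = PAGE_TTLS.get(page_type, DEFAULT_TTL)
    return f"public, max-age=0, s-maxage={ttl}"
```

Using `s-maxage` rather than `max-age` keeps the TTL a CDN-only concern, so a later purge immediately reaches users instead of waiting out browser caches.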

Other considerations — A few other tricks done in order to have better control over the overall system include:

  1. A Jenkins job to purge the cache for specific URL patterns on demand.
  2. Updating the Lambda code with each web release to make sure things remain in sync.
  3. Keeping a way to bypass the CDN for any URL by appending a query parameter (e.g. passCache=true).
  4. Enabling CloudFront logs for debugging and monitoring.
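
The CDN-bypass check from item 3 can be sketched as below. The `passCache` parameter name comes from the article; the function itself is illustrative:

```python
from urllib.parse import parse_qs

def should_bypass_cdn(querystring: str) -> bool:
    """True when the request carries passCache=true, in which case it is
    forwarded straight to origin instead of being served from cache."""
    params = parse_qs(querystring)
    return params.get("passCache", ["false"])[0].lower() == "true"
```

Such an escape hatch is handy for debugging: you can compare the cached and uncached versions of any page without invalidating anything.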

Conclusion — As it stands, here are some of the benefits we have achieved across our overall traffic by enabling website caching:

  1. Improvement in users’ TTFB (time to first byte) by ~200ms and Googlebot crawl time by ~250ms ⏩
  2. Load on the main infra reduced by ~30% 💸
  3. No need to worry about infra scaling during traffic surges from marketing campaigns 🕊

Thanks to Sachin Agrawal for leading the project and making it live with zero bugs (just kidding: there were a few, but they were patched as soon as they were identified), and to Sukhdeep Handa for review and suggestions.


Website Caching a.k.a Full page caching — Learnings from housing.com was originally published in Engineering @ Housing/Proptiger/Makaan on Medium.
