Website Caching a.k.a Full page caching — Learnings from housing.com
Have you ever considered serving cached HTML pages for your dynamic websites? If your answer is any of the following, this article might be worth reading:
“No, but what’s the big deal about it? I can do it anytime”
“No, and I don’t see any benefit in it”
“Yes, done it for static pages only, dynamic pages were too much to handle”
“Yes, we want to do it but there are too many unknowns”
“Yes, but never got the desired hit ratio”
As an engineer, you must have heard of many caching technologies (Memcached, Redis, Aerospike, CDNs, etc.), and you have very likely used at least one of them in a live project. However, we typically use them to store key-value pairs, API responses in JSON format, static assets, and the like. Some of the challenges and dilemmas faced while working on caching include choosing the cache key, the cache invalidation policy, and the eviction policy.
Let’s discuss some similar challenges that we faced while caching the HTML pages of https://www.housing.com.
Platform — We had primarily two choices.
We at Housing opted for AWS CloudFront, as most of our other infrastructure is already on AWS.
Architecture — All requests and responses pass through a Lambda function before hitting CloudFront. The request Lambda is used primarily to generate the cache key for the request.
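To make the idea of a request Lambda generating a cache key concrete, here is a minimal sketch in Python. It is not the actual housing.com implementation: the user-agent patterns, the `x-ab-variants` header, and the `x-cache-key` header are all assumptions made for illustration, and the sketch assumes the CloudFront cache policy is configured to include the resulting header in the cache key. Only the event shape (`Records[0].cf.request` with lower-cased header names) follows the Lambda@Edge event format.

```python
import re

# Hypothetical patterns; the article does not list the real classification rules.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)
MOBILE_PATTERN = re.compile(r"mobile|android|iphone", re.IGNORECASE)


def header_value(headers, name, default=""):
    """Return the first value of a header from a CloudFront-style headers dict."""
    values = headers.get(name.lower())
    return values[0]["value"] if values else default


def classify_browser(user_agent):
    """Assumed three-tier split (modern / medium / legacy) based on the user agent."""
    if "MSIE" in user_agent or "Trident/" in user_agent:
        return "legacy"
    if re.search(r"Chrome/|Firefox/|Edg/", user_agent):
        return "modern"
    return "medium"


def build_cache_key(headers):
    """Collapse the request into a small set of buckets so similar requests share a cache entry."""
    user_agent = header_value(headers, "user-agent")
    device = "mobile" if MOBILE_PATTERN.search(user_agent) else "desktop"
    browser = classify_browser(user_agent)
    request_type = "bot" if BOT_PATTERN.search(user_agent) else "user"
    # Assumed header carrying the A/B experiment assignments of this visitor.
    experiments = header_value(headers, "x-ab-variants", "none")
    return "|".join([device, browser, request_type, experiments])


def handler(event, context):
    """Request handler: attach the normalized key as a header and forward the request."""
    request = event["Records"][0]["cf"]["request"]
    key = build_cache_key(request["headers"])
    request["headers"]["x-cache-key"] = [{"key": "X-Cache-Key", "value": key}]
    return request
```

Bucketing the raw user agent into a handful of values is what keeps the number of cached copies per URL small; keying on the full user-agent string would make almost every request a miss.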
Some additional logic in these request/response Lambda functions eventually helped improve the overall hit ratio and made logging and debugging easier. To list a few of them:
Cache hit ratio — CloudFront provides an out-of-the-box dashboard for checking the CDN's hit ratio. Alternatively, it can be computed from the logs if detailed logging is enabled. Like any other e-commerce website, housing.com has millions of URL variations for its SRPs (search result pages). Each URL can have roughly 48 copies in the cache at a time. The math goes like this:
2 A/B experiments (average) * 2 variations per experiment (average) * 2 supported devices (mobile and desktop) * 3 browser variations (modern, medium and legacy) * 2 request types (users and bots) = 48
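The count above can be checked by enumerating the combinations. In this small sketch only the number of values per dimension comes from the article; the labels themselves are placeholders.

```python
from itertools import product

# Placeholder labels; only the number of values per dimension comes from the article.
experiment_a = ["a1", "a2"]
experiment_b = ["b1", "b2"]
devices = ["mobile", "desktop"]
browsers = ["modern", "medium", "legacy"]
request_types = ["user", "bot"]

# Every combination is a distinct cached copy of the same URL.
variants = list(product(experiment_a, experiment_b, devices, browsers, request_types))
print(len(variants))  # 48
```

Each extra dimension multiplies the number of copies, so adding a third two-way experiment would double the count to 96 and dilute the hit ratio further.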
With so many variations, we are able to achieve a cache hit ratio of around 40% by optimizing the TTL values of different pages. As of now, each page type has a static TTL value controlled in the codebase. We plan to cache all pages indefinitely and bust the cache only when something changes in the DB layer (that is going to be our next challenge).
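As an illustration of static, per-page TTLs kept in code, the sketch below sets a `Cache-Control` header in a response Lambda. The page types, URL prefixes, and TTL values are all hypothetical, since the article does not publish the real ones; only the event shape follows the Lambda@Edge response event format.

```python
# Hypothetical page types and TTLs in seconds; the real values are not given in the article.
TTL_BY_PAGE_TYPE = {"home": 300, "srp": 3600, "detail": 900}
DEFAULT_TTL = 60


def page_type_for(uri):
    """Map a request path to a page type (the URL prefixes are assumptions)."""
    if uri == "/":
        return "home"
    if uri.startswith("/search/"):
        return "srp"
    if uri.startswith("/property/"):
        return "detail"
    return "other"


def cache_control_for(uri):
    """Build the Cache-Control value for a path from the static TTL table."""
    ttl = TTL_BY_PAGE_TYPE.get(page_type_for(uri), DEFAULT_TTL)
    return f"public, max-age={ttl}"


def response_handler(event, context):
    """Response handler: stamp the page's TTL on the response before it is cached."""
    cf = event["Records"][0]["cf"]
    response = cf["response"]
    response["headers"]["cache-control"] = [
        {"key": "Cache-Control", "value": cache_control_for(cf["request"]["uri"])}
    ]
    return response
```

Keeping the table in one place makes it straightforward to tune a single page type's TTL and watch the effect on the hit ratio.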
Other considerations — A few other tricks we used to get better control over the overall system include:
Conclusion — As it stands, these are some of the benefits we have achieved for our overall traffic by enabling website caching:
Thanks to Sachin Agrawal for leading the project and taking it live with 0 bugs (just kidding, there were a few, but they were patched as soon as they were identified), and to Sukhdeep Handa for reviews and suggestions.
Website Caching a.k.a Full page caching — Learnings from housing.com was originally published in Engineering @ Housing/Proptiger/Makaan on Medium.