Five Milestones on the Road to the Cloud
HomeAway’s cloud migration strategy uses change data capture (CDC) streams to unlock data at rest and accelerate our movement to the cloud. Our event streams presently unlock data from over 25 services using over 200 relational tables. This data feeds 400 new data structures in distributed data technologies such as Apache Cassandra, Neo4j, MongoDB, and Elasticsearch. Are we done? Not a chance, but we have a winning pattern. Our goal is to unlock all legacy data.
The size of your architecture, the number of services, and the amount of data to migrate can make a cloud migration daunting. That is why having a strategy is so important. I often hear mandates to shut down services by a certain date, but with little thought given to downstream components, intelligent routing, and strangulation.
In most cases, attempts to move to the cloud either fail entirely or are completed much later than estimated. I think it is important for teams to think through the entirety of the problem. The challenge in moving to the cloud is not as simple as moving your piece of the puzzle. Typically, there is a larger ecosystem that has to remain functional while migrating to the cloud.
The following diagram is a small excerpt of the service interactions at HomeAway before applying the strangler pattern. It depicts how two services connect at both the service level and the data level.
When I included this image in a technical presentation, the audience reacted with sheer disbelief. After my talk, a few people jokingly told me that the image hurt their heads.
In retrospect, I presented this image prematurely. My intent was to illustrate our capability to identify dependencies throughout the stack, but what the audience saw was a demotivating, tangled mess.
Looking at the dependency graph, teams began to ask themselves, “How can my service move to the cloud, if my dependencies are not ready to go?” This is the point at which I think reality set in.
The problem with dependency tracking is that the burden always falls on the other team, which quickly becomes a finger-pointing match: X is waiting on Y, Y is waiting on Z, and so on. In this world, no services move to the cloud.
The question that I asked myself and my team was, “How do we non-intrusively move services to the cloud ahead of their dependencies?” We began work on a blueprint that includes deprecating and decoupling on-premises data center resources.
We spent the better part of last year laying the foundation we will now build upon. The infrastructure, pipelines, and tooling required to accelerate teams to the cloud are now available.
The most important reason to consider a strangler application over a cut-over rewrite is reduced risk. A strangler can give value steadily and the frequent releases allow you to monitor its progress more carefully. Many people still don’t consider a strangler since they think it will cost more — I’m not convinced about that. Since you can use shorter release cycles with a strangler you can avoid a lot of the unnecessary features that cut over rewrites often generate. — Martin Fowler
Martin Fowler identified the strangler pattern a number of years ago as a way to migrate legacy applications while minimizing risk. We started to look at ways we can leverage this thought process to streamline our cloud migration blueprint. The primary components of the blueprint are the cornerstone of the strangler pattern: event interception and asset capture.
The asset that we are all after is data; therefore, to be successful, one must be able to intercept or subscribe to the system of record (SOR) change stream. There are few tools in this market, and most CDC tools work only with a particular class of data technology, not in a polyglot environment. HomeAway developed a tool that acts as an event interceptor, captures the change data stream from both SQL and NoSQL data platforms and, importantly, synchronizes data between those platforms.
The tool is called DataSync, a CDC service that reads changes from the commit/transaction logs of multiple data platforms such as SQL Server, MongoDB, Kafka, and Cassandra. These change events are persisted to Photon, an internally developed pub/sub event store that allows us to write and consume a continuous stream of data mutations. Photon provides bi-directional synchronization among heterogeneous data platforms, with strong consistency guarantees and exactly-once semantics. For example, you can stream changes from SQL Server to Cassandra or from Cassandra to SQL Server, or you can synchronize Cassandra and MongoDB.
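DataSync and Photon are internal HomeAway tools, so the sketch below is only a conceptual illustration of the core idea: reading an ordered change stream and applying it idempotently to a second store, so redelivered events are harmless. The `ChangeEvent` shape and the in-memory "stores" are hypothetical, not DataSync's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeEvent:
    """One mutation captured from a source commit/transaction log."""
    seq: int        # position in the source's ordered change stream
    table: str
    key: str
    payload: dict   # row/document state after the change

class ChangeApplier:
    """Applies a CDC stream to a target store, skipping already-seen
    events so replays and redeliveries are effectively-once."""
    def __init__(self):
        self.target = {}      # (table, key) -> latest payload
        self.last_seq = -1    # high-water mark of applied events

    def apply(self, event: ChangeEvent) -> bool:
        if event.seq <= self.last_seq:
            return False      # duplicate delivery: ignore
        self.target[(event.table, event.key)] = event.payload
        self.last_seq = event.seq
        return True

applier = ChangeApplier()
stream = [
    ChangeEvent(0, "listing", "L1", {"city": "Austin"}),
    ChangeEvent(1, "listing", "L1", {"city": "Dallas"}),
    ChangeEvent(1, "listing", "L1", {"city": "Dallas"}),  # redelivered
]
applied = [applier.apply(e) for e in stream]
print(applied)                            # [True, True, False]
print(applier.target[("listing", "L1")])  # {'city': 'Dallas'}
```

Tracking a high-water mark per stream is one simple way to get the "apply each change exactly once" behavior described above; a real implementation would persist that mark alongside the target data.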
The last key ingredient is a router of some kind to redirect traffic to either the legacy service or the new microservice, depending on the functionality at hand. My preference here would be intelligent routing at the edge layer, but there are multiple ways to achieve the desired result.
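The routing logic itself can be very small. Below is a minimal functionality-based route table, assuming hypothetical hostnames; the point is only that strangled functionality goes to the cloud while everything else falls through to the legacy service.

```python
# Route table: functionality -> backend. Hostnames are placeholders,
# not real endpoints.
ROUTES = {
    "search":  "https://cloud.example.com/search-svc",
    "reviews": "https://cloud.example.com/reviews-svc",
}
LEGACY = "https://legacy.example.com/monolith"

def route(functionality: str) -> str:
    """Send migrated functionality to its cloud microservice;
    everything not yet strangled falls through to legacy."""
    return ROUTES.get(functionality, LEGACY)

print(route("search"))    # cloud microservice
print(route("payments"))  # not yet migrated -> legacy monolith
```

As more functionality is strangled, entries move from the legacy default into the route table, and the legacy service quietly shrinks to nothing.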
With these tools in place, we can move any service to the cloud ahead of its dependencies while maintaining legacy service contracts.
There are 5 milestones to complete the strangulation of any given service. For this example, let’s assume there are three monolithic services hitting the same database. Within a single service, “Legacy Service 1,” three microservices have been identified.
The following storyboards illustrate the process of strangling “Legacy Service 1.”
Milestone 0 is ground zero, where most legacy services are today.
The first milestone is about rearchitecting legacy services as cloud-optimized microservices. Once in the cloud, each new microservice can read local data; however, writes still occur in the original data center.
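This read-local/write-remote split can be captured in a small repository sketch. The in-memory dicts below stand in for the cloud-local replica and the data center system of record; in the real architecture the replica would be kept fresh by the CDC stream rather than updated inline as shown.

```python
class MigratingRepository:
    """Milestone-1 data access (sketch): reads come from the
    cloud-local replica, writes go to the system of record in the
    original data center."""
    def __init__(self, local_replica: dict, datacenter_sor: dict):
        self.local = local_replica    # replica living in the cloud
        self.sor = datacenter_sor     # system of record in the DC

    def read(self, key):
        return self.local.get(key)    # served locally in the cloud

    def write(self, key, value):
        self.sor[key] = value         # write lands in the origin DC
        # In production, CDC would replicate this change back into
        # self.local asynchronously; we simulate that inline here.
        self.local[key] = value

repo = MigratingRepository(local_replica={}, datacenter_sor={})
repo.write("L1", {"city": "Austin"})
print(repo.read("L1"))  # {'city': 'Austin'}
```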
The second milestone places emphasis on establishing heterogeneous or homogeneous multi-master replication, so writes in both the data center and the cloud are synchronized. This milestone is the turning of the tide, because both reads and writes can now be served in the cloud. Strangulation cannot occur without completing this stage.
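One common way to make two masters converge is a last-writer-wins rule on a logical timestamp. The sketch below illustrates that idea with in-memory stores; it is an assumption for illustration, not Photon's actual conflict-resolution algorithm.

```python
def sync(store_a: dict, store_b: dict) -> None:
    """Merge two stores of key -> (timestamp, value) so both masters
    converge on the newest write for every key (last-writer-wins)."""
    for key in set(store_a) | set(store_b):
        a = store_a.get(key, (-1, None))
        b = store_b.get(key, (-1, None))
        newest = a if a[0] >= b[0] else b   # keep the later write
        store_a[key] = newest
        store_b[key] = newest

# Divergent writes on both sides before a sync pass:
datacenter = {"L1": (1, "Austin")}
cloud      = {"L1": (2, "Dallas"), "L2": (1, "Houston")}
sync(datacenter, cloud)
# Both masters now agree: L1 -> (2, 'Dallas'), L2 -> (1, 'Houston')
```

Last-writer-wins is the simplest policy and can silently drop concurrent writes; systems needing stronger guarantees use vector clocks or application-level merge logic instead.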
You will remain in milestone 2 as long as there are services or functionality dependent on the legacy service.
The third milestone places emphasis on iterating the pattern applied in milestone 2 until all functionality for a given service has been rearchitected into cloud-optimized microservices.
This is the most important milestone: the fifth and final milestone is decommissioning strangled services.
Moving to the cloud can be difficult and having a strategic plan is critical. The key element to our strategy at HomeAway is leveraging CDC for asset capture and event redirection.
One of the biggest challenges companies will face is unwinding their dependencies. In my experience, dependencies are typically what prolong cloud initiatives. Unlocking data at rest provides a means to move services and data to the cloud ahead of their dependencies and opens the door for the strangler pattern.
I believe there are 5 milestones to the strangler pattern:

0. Ground zero: the legacy service and its data live entirely in the original data center.
1. Rearchitect legacy functionality into cloud-optimized microservices that read data locally in the cloud, while writes still occur in the data center.
2. Establish multi-master synchronization so both reads and writes can be served in the cloud.
3. Iterate on the milestone 2 pattern until all functionality has been rearchitected into cloud-optimized microservices.
4. Decommission the strangled legacy services.
Note: Each of these phases will have a router, or equivalent logic, that determines which service (legacy or new) to route to for specific functionality.
I believe in learning from and leveraging the ideas others have shared in the software community. I hope that sharing HomeAway’s cloud migration strategy gives back to that community and encourages dialogue about alternative strategies.