How to simplify your Cloud Data Migration Strategy?
A structured approach for an efficient cloud data migration strategy
New trends emerge in the technology industry every few years. The phenomenal impact that Artificial Intelligence (AI) is creating in today’s tech world is remarkable. AI has been around for a long time, but this specific era of AI is about scaling the foundational models we have been building for years. Advances in technology, infrastructure, and cloud computing now support the scale these model architectures demand, hence the push toward the idea that every company, in the future, will be an AI company.
The ‘need to scale’ drives innovation and trends
Around 2015-2016, I remember a big push from cloud providers to move workloads to the cloud and take advantage of the scale its infrastructure provides. Right around that time came the Big Data tsunami. Organizations needed scale to manage Big Data, and open-source frameworks such as Hadoop-based architectures provided the solution.
Technology breakthroughs take hold when there is an easier, more structured way of doing things, so that the organization can focus on driving business initiatives.
The key question in such trends is how the organization adopts such a structured approach. The answer lies in clearly defining the ‘needs vs. wants’ and the ‘as-is vs. to-be’; in short, a well-defined transformation approach backed by a strong architectural strategy.
The new trends in AI and its flavors, such as adaptive AI and generative AI, will also go through these phases, with defined approaches and architectures refined for fit until they become mainstream implementation needs.
A structured approach drives adoption
One example from the past: when there was a big push to move workloads to the cloud, Gartner rationalized a structured approach by defining the “5 Rs” for cloud migration (Retire, Rehost, Rebuild, Re-architect, Repurpose), a simple way to decide which workloads can go to the cloud and which route each should take.
This approach worked well for workloads in the form of services, applications, and endpoints. There are several articles available on this, and every cloud vendor has its own version to rationalize cloud migration.
When my customers with large data footprints go through a transformation, there is always a question of how to consolidate data, which data sources to move to the cloud, and how to move them. Any given enterprise has many data sources and targets. Over the years, a lot of data flows through these systems and is made available for BI. Often, multiple data silos are created when access to data becomes a challenge, and moving large volumes of data introduces redundancy.
When such organizations go through a transformation, there are several key objectives for getting the best value proposition and ROI from migrating to the cloud, including consolidating the data footprint.
One way of doing this is the big-bang approach: just move everything to the cloud with the available data-migration tools, land it in a data lake, and figure out the rest as you go. But that is not a good start for any transformation project.
My experience from leading several successful transformation projects across various industry verticals is simple: I start by understanding the end goal the customer is trying to achieve, then work backward to a current-state analysis. One approach that has helped my customers is a Data Tagging Process, similar to the 5 Rs that Gartner defined for cloud migration.
The Data Tagging Process to simplify transformation and adoption
Each data source, or “system of record”, can be tagged as follows. The list below is sorted from the least to the greatest value driver; for example, the highest value proposition of data migration to the cloud is achieved through categories 3, 4, and 5. (A minimal code sketch of such a tagging inventory appears after the list.)
1. Retire: Data that is sitting in a source or target but is not relevant or used by any application can be tagged for retirement. With this tag, it is clear that the data does not need to be considered for migration. Examples include redundant copies of data that have gone stale over time, data sources created for a one-off report or for quick access to newly acquired raw data for temporary analysis, and old backups kept for temporary retention.
2. Rehost or lift-and-shift: Data that needs to be lifted and shifted, or rehosted, as-is. For example, if the data exists in a structured format inside a relational database and there are dependencies requiring it to stay in that format, such data sources would be tagged as lift-and-shift sources. Examples include data sources that collect logging or exception-handling data, or the backend database of a third-party application that is itself being rehosted in the cloud, so its backend has to be rehosted as well. Note that better cloud-native logging and exception-management tools are available; in that case, this example may belong in the Refactor category instead.
3. Rebuild or Re-imagine: Some thought is involved in rebuilding or reimagining this data. One use case is data that cannot be moved into the cloud because of regulatory, compliance, or third-party vendor dependencies; the source of this data may have to stay on-prem and in the same structure. The key is to re-imagine how such data sources can be co-located with other cloud data sources so that there is limited data movement. The best way to do this is data virtualization: data virtualization tools avoid redundancy and movement of data by providing on-demand access through direct connections to the sources. Cloud-ready tools such as TIBCO, Denodo, or Dremio can play a key role here (a small federated-query sketch appears after this list). Other examples arise when the data is acquired through a third-party data provider: does the provider offer modern ways of acquiring data in real time through SaaS endpoints, rather than periodic point-in-time bulk updates that need to be stored?
4. Refactor or Re-architect: This is the key category where cloud investments are put to work and you start taking advantage of the cloud. The beauty of the cloud is elasticity, scalability, better performance, and often a more modern and efficient way of dealing with data. The most common examples are modernizing on-prem data warehouse implementations into cloud-native data-warehouse services or, if the use cases are expanding toward an analytics-first approach, converting them into cloud-native data-lake implementations. There may also be a need to leverage cloud-native data-mart solutions or to address all of these use cases with a data lakehouse implementation. Common cloud-native services include Amazon Redshift, Microsoft OneLake, and Google BigQuery, and well-known vendors such as Snowflake and Databricks further simplify these use cases with additional built-in capabilities. For example, Snowflake can simplify cloud data warehousing and segment data to provide data marts in the cloud, while Databricks can help build a robust lakehouse implementation that addresses both warehousing and data-lake use cases.
5. Relate or Reflect: This category can lead to an evolutionary next step of the Data Tagging exercise: harvesting new use cases. Established organizations usually do not see a need for this until a major transformation project is underway. Seamlessly connected and governed data can drive the most critical decisions, ownership, and governance of data. With a common understanding and semantics of data, advanced practices like DataOps can be enabled, or the organization can be made ready for a Data Fabric or Data Mesh architecture as a relevant next step. This, in my view, is either a point of finding a relation with the data or a point of reflection based on the tagging exercise, i.e., what’s next? If there are key missing elements or use cases that the organization needs to mature its data practices, these can be documented and prioritized for funding. Common examples are Master and Metadata Management, for building a trusted source of data with quality, accuracy, and reliability; Enterprise Reference Data Management, for ensuring the data is augmented with the right context; a centralized Data Catalog, to identify the critical data elements within the organization and their lineage; an active Data Quality scoreboard, for a continuous understanding of the health of the data (a small scoreboard sketch appears after this list); and a robust Data Observability platform, for advanced data-job monitoring, alerting, and self-healing.
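To make the tagging exercise concrete, here is a minimal sketch of how a data-source inventory and its tags might be captured in code. The source names, owners, and rationales are hypothetical illustrations, not part of any specific tool or customer engagement.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    """The five tags of the Data Tagging Process, ordered from least to greatest value driver."""
    RETIRE = 1    # not migrated; scheduled for decommissioning
    REHOST = 2    # lift-and-shift as-is
    REBUILD = 3   # stays in place, exposed via data virtualization
    REFACTOR = 4  # modernized into a cloud-native warehouse, lake, or lakehouse
    RELATE = 5    # feeds new use cases: catalog, quality, observability


@dataclass
class DataSource:
    name: str        # hypothetical system-of-record name
    owner: str       # accountable team, useful for the governance discussion
    tag: Tag
    rationale: str   # why this tag was chosen, kept for the migration runbook


# Hypothetical inventory produced during the current-state analysis.
inventory = [
    DataSource("stale_report_copy", "finance", Tag.RETIRE, "one-off report, unused for two years"),
    DataSource("app_error_logs_db", "platform", Tag.REHOST, "structured logs; dependencies need the same schema"),
    DataSource("regulated_claims_db", "compliance", Tag.REBUILD, "must stay on-prem; expose via virtualization"),
    DataSource("onprem_edw", "analytics", Tag.REFACTOR, "modernize into a cloud-native warehouse"),
    DataSource("customer_master", "data-office", Tag.RELATE, "candidate for MDM, catalog, and quality scoring"),
]

# The highest value proposition comes from categories 3, 4, and 5.
high_value = [s for s in inventory if s.tag.value >= Tag.REBUILD.value]
for s in high_value:
    print(f"{s.name:<22} -> {s.tag.name:<9} ({s.rationale})")
```

In practice, this inventory would live in a catalog or a simple shared sheet; the point is that every system of record carries exactly one tag and a documented rationale before any migration work starts.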
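For the Rebuild/Re-imagine category, the sketch below illustrates the data virtualization idea: querying an on-prem source in place and joining it with cloud data on demand, without copying the regulated on-prem data. It uses DuckDB purely as a lightweight stand-in for a dedicated virtualization tool such as Denodo or Dremio; the connection string, bucket path, and table names are hypothetical.

```python
import duckdb

con = duckdb.connect()

# Federated access to an on-prem Postgres source that must stay in place
# (hypothetical connection details).
con.execute("INSTALL postgres; LOAD postgres;")
con.execute(
    "ATTACH 'host=onprem-db.internal dbname=claims user=readonly' AS onprem (TYPE postgres, READ_ONLY)"
)

# Access to data already landed in the cloud data lake (hypothetical bucket;
# assumes S3 credentials are configured for the session).
con.execute("INSTALL httpfs; LOAD httpfs;")

# Join the on-prem table with cloud parquet data on demand; no bulk copy of
# the regulated on-prem data is made.
result = con.execute("""
    SELECT c.claim_id, c.status, e.event_time
    FROM onprem.public.claims AS c
    JOIN read_parquet('s3://example-lake/claim_events/*.parquet') AS e
      ON c.claim_id = e.claim_id
    WHERE c.status = 'OPEN'
""").fetchdf()

print(result.head())
```

A dedicated virtualization platform adds the governance, caching, and security layers this sketch omits, but the core pattern is the same: the consumer sees one logical view while the data stays where it must.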
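For the Relate/Reflect category, here is a minimal sketch of an active data quality scoreboard: a few simple metrics computed per dataset that can be tracked over time. The column names, sample values, and thresholds are hypothetical; a real implementation would typically sit on a dedicated quality or observability tool.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical extract of a customer dataset being scored.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, None],
    "email": ["a@x.com", None, "b@x.com", "c@x.com", "d@x.com"],
    "updated_at": pd.to_datetime(
        ["2024-05-01", "2024-05-02", "2024-05-02", "2023-01-15", "2024-05-03"],
        utc=True,
    ),
})


def scoreboard(frame: pd.DataFrame, key: str, freshness_days: int = 30) -> dict:
    """Return simple completeness, uniqueness, and freshness scores between 0.0 and 1.0."""
    completeness = float(frame.notna().mean().mean())            # share of non-null cells
    uniqueness = float(frame[key].dropna().is_unique)            # 1.0 if the key has no duplicates
    cutoff = datetime.now(timezone.utc) - timedelta(days=freshness_days)
    freshness = float((frame["updated_at"] >= cutoff).mean())    # share of recently updated rows
    return {
        "completeness": round(completeness, 2),
        "uniqueness": uniqueness,
        "freshness": round(freshness, 2),
    }


print(scoreboard(df, key="customer_id"))
```

Publishing scores like these per critical dataset, and alerting when they drop below an agreed threshold, is what turns a one-time tagging exercise into an ongoing data practice.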
In conclusion, the data tagging process is a primary step for cloud data migration, ensuring that the right data sources and data use cases are identified, documented, and implemented to make the transformation project successful. Needless to say, the identification of these use cases can take advantage of modern capabilities such as machine learning and artificial intelligence (AI/ML), self-healing, and auto-discovery. Stay subscribed for more on these advanced capabilities and data practices in upcoming newsletters.