AWS to Google Cloud migration

Data Warehouse migration from AWS to BigQuery

About

01.

Launch

2023

02.

Scope

data warehouse migration

03.

SOLUTION

04.

RESULTS

BigQuery based EDW

About the client

To protect our clients' privacy, we can't always share the details of the projects and services we provide. Instead, we sometimes share anonymized project stories to show what is possible in the data and AI world today.

Our client wanted to change their data warehouse solution and move their data from AWS to Google Cloud, specifically to BigQuery. In effect, they needed a new data warehouse on Google Cloud populated with their existing data. They wanted to achieve this without integrating the data from the original sources from scratch, which would have duplicated work already done in AWS.

To build the new data warehouse and transfer the company's existing data from AWS, we used several Google Cloud services. The data warehouse was built in Google BigQuery. To move the data, we used Google Cloud's Storage Transfer Service, and Cloud Composer running Apache Airflow 2.

01.

Our approach

In AWS, the data from the tables to be transferred to BigQuery was stored and backed up in an S3 bucket. Using Google Cloud's Storage Transfer Service, we synchronized the AWS bucket with a Google Cloud Storage bucket. The synchronized raw files were then loaded into BigQuery through BigQuery external tables, and the loading process was orchestrated with Apache Airflow. For a brief period both warehouses ran in parallel, so that the BigQuery warehouse could be tested with real data before the final migration. By the time of the final migration it was already set up to ingest all the data stored in AWS automatically.
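The load step described above can be sketched as a small generator of BigQuery DDL statements, which an orchestrator such as Airflow could then submit one by one. This is only an illustrative sketch, not the project's actual code: the `TableSpec` class, the dataset, table, and bucket names, the `_ext` suffix, and the Parquet file format are all assumptions, since the case study names none of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """One warehouse table whose exported files were synchronized to Cloud Storage."""
    dataset: str                  # hypothetical BigQuery dataset name
    table: str                    # hypothetical table name
    gcs_prefix: str               # hypothetical gs:// prefix with the synchronized files
    file_format: str = "PARQUET"  # assumption: the source does not name the file format


def external_table_ddl(spec: TableSpec) -> str:
    """Build DDL for an external table that reads the raw files in place."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{spec.dataset}.{spec.table}_ext`\n"
        f"OPTIONS (format = '{spec.file_format}', uris = ['{spec.gcs_prefix}/*']);"
    )


def load_ddl(spec: TableSpec) -> str:
    """Build DDL that materializes the external table as a native BigQuery table."""
    return (
        f"CREATE OR REPLACE TABLE `{spec.dataset}.{spec.table}` AS\n"
        f"SELECT * FROM `{spec.dataset}.{spec.table}_ext`;"
    )


def migration_statements(specs):
    """Return the statements in execution order: external table first, then load."""
    statements = []
    for spec in specs:
        statements.append(external_table_ddl(spec))
        statements.append(load_ddl(spec))
    return statements


if __name__ == "__main__":
    # Hypothetical table and bucket, used only to show the generated SQL.
    example = TableSpec("edw", "orders", "gs://example-bucket/orders")
    for statement in migration_statements([example]):
        print(statement, end="\n\n")
```

Because every statement uses `CREATE OR REPLACE`, rerunning the whole sequence after a fresh bucket synchronization simply rebuilds the tables, which is one way to get the repeatable test migrations the project relied on.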

02.

Testing & Savings

Because the process was automated and easy to repeat, the test migration generated no additional costs, and a backup was always available. The original source data did not have to be integrated again from scratch, which saved significant cost and operational effort.
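One simple check that fits a parallel-run test period is comparing per-table row counts between the old and the new warehouse. The sketch below is a hypothetical illustration, not the validation the project actually used: the `reconcile_row_counts` helper and the table names are assumptions, and the counts would have to be fetched from each warehouse by separate queries.

```python
def reconcile_row_counts(source_counts, target_counts):
    """Compare per-table row counts from the two warehouses.

    Returns a list of (table, source_count, target_count) tuples for every
    table whose counts differ. None marks a table missing on one side.
    """
    mismatches = []
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src != tgt:
            mismatches.append((table, src, tgt))
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts, used only to show the shape of the report.
    aws_counts = {"orders": 1200, "customers": 310}
    bigquery_counts = {"orders": 1200, "customers": 309}
    for table, src, tgt in reconcile_row_counts(aws_counts, bigquery_counts):
        print(f"{table}: source={src} target={tgt}")
```

Row counts only catch missing or duplicated rows. Comparing column-level aggregates or checksums would be a natural next step if stricter validation is needed.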

03.

Results

The project gave our client a new data warehouse on Google Cloud, with all the aggregated data from the previous solution ingested into BigQuery in a reliable, secure, and automated way. Because the original source data did not have to be integrated again from scratch, the client made significant financial and operational savings. The solution also allowed the warehouse to be tested on real data, which helped minimize the risk of failures and mistakes during the final migration.