Why Deploy an Enterprise Data Warehouse on a Hybrid Cloud Architecture?
Analytics and artificial intelligence (AI) solutions are profoundly transforming how businesses and governments engage with consumers and citizens. Across many industries, high-value transformative use cases in personalized medicine, predictive maintenance, fraud detection, cybersecurity, logistics, customer engagement, geospatial analytics, and more are rapidly emerging.
Deploying and scaling AI across the enterprise is not easy, especially as the volume, velocity, and variety of data continue to explode. What's needed is a well-designed, agile, scalable, high-performance, and cloud-native data and AI platform that lets clients adopt AI efficiently, with trust and transparency. An enterprise data warehouse (EDW) is a critical component of this platform.
EDWs are central repositories of integrated data from many sources. They store current and historical data that organizations use extensively for analysis, reporting, insights, and decision-making. Historically, data warehouse appliances (DWAs) have delivered high query performance and scalability, but they now struggle to turn the exploding volume of data into timely, actionable insights.
A hybrid, open, multi-cloud platform allows organizations to take advantage of their data and applications wherever they reside: on-premises or across many clouds. Here are some key pros and cons of deploying EDWs on-premises, on hybrid clouds, or on public clouds (Figure 1):
Figure 1: Comparing Enterprise Data Warehouses on On-Premises, Public and Hybrid Cloud
- Strategic for the long term: About 80% of enterprise workloads are still on-premises [1] and remain strategic. However, the public and hybrid cloud is even more strategic, driving most of the innovation, growth, and investment in analytics.
- Total long-term costs: On-premises costs are predictable and become more favorable with greater utilization. Public cloud costs are less predictable; they suit short, infrequent, spiky workloads, and consumption-based pricing makes users more accountable for what they consume. However, these costs grow steeply at the high utilization typical of most EDWs today. There are also many hidden costs, such as long-term contracts, incremental and supplementary licensing fees, and more.
With hybrid cloud EDWs, customers can prudently optimize costs by using on-premises assets for predictable workloads and offloading spiky workloads to the public cloud. This is effective over the long term because a smaller on-premises hardware footprint can meet immediate requirements, while the public cloud satisfies incremental resource needs during peaks. Key components of the total costs include:
- Data Transfer/Migration Costs: For on-premises, these are negligible since most of the data for the entire analytics workflow typically resides on-premises. They are significant for public clouds, since many analytics workflows require substantial movement of data to and from the public cloud. Enterprises are often limited in their ability to move datasets from the cloud back to their on-premises equipment or to another cloud. Moreover, cloud providers charge fees for transferring data out of their cloud environments, which dramatically increases costs, particularly as datasets continue to grow. Migrating on-premises workloads to the public cloud is also hard and time-consuming.
In hybrid clouds, there is limited movement of data throughout the analytics workflow to and from the public cloud, and so these costs are low to medium. With consistent cloud-native architectures, migrating workloads from on-premises to public clouds is also relatively easy and less expensive.
- Capital Costs: On-premises IT infrastructure requires significant capital investment to handle peak loads, which may result in lower, sub-optimal utilization under normal operations. For public clouds, customer capital costs are negligible. For hybrid clouds, some capital investment in IT infrastructure is needed so that certain critical analytics workloads can run on-premises, with the rest offloaded to the public cloud. This may result in better utilization and lower capital costs than the all-on-premises alternative.
- Upgrade Costs: On-premises IT infrastructure requires significant capital expense for hardware upgrades over time to stay modern and drive innovation. For public clouds, the customer incurs negligible capital expense for hardware upgrades, since the provider is responsible for the infrastructure. Hybrid clouds require a modest capital expense for hardware upgrades over time to modernize the on-premises portion of the infrastructure.
- Operating Costs: Since the customer typically owns and operates on-premises assets, costs are predictable, and high-utilization environments offer better economics than public clouds, which are better suited to short, spiky workloads. With a hybrid cloud, the customer can prudently minimize costs by largely using on-premises assets for predictable workloads and offloading spiky workloads to the public cloud.
- Deployment Costs (no Integration/Customization): These are significant for on-premises, since provisioning and deploying resources and analytics workflows take more time and effort. Costs are low on public clouds, where automation makes provisioning and deployment faster. On hybrid clouds, costs are significant, since connecting on-premises and public cloud environments and maintaining both adds another layer of complexity. However, a consistent cloud-native containerized architecture can alleviate this.
- Management/Maintenance: Moderately hard for on-premises, since customers must invest in scarce skills and resources to maintain and operate these environments. Much easier with public clouds, since customers can typically use a centralized portal with process automation. For hybrid clouds, maintenance and operation are relatively straightforward with the right predetermined policies and procedures for placing workloads on-premises or in the cloud.
- Integration/Customization: Easier for on-premises customers to customize and integrate newer solutions with their legacy solutions. This is harder to do on public clouds. On hybrid clouds, it is easier to integrate legacy systems with newer custom solutions from the edge to multiple clouds seamlessly.
- Business Continuity/Serviceability: On-premises deployments can be tailored to provide higher service level agreements (SLAs). This is harder on public clouds, although they can deliver excellent business continuity. Hybrid clouds can provide high SLAs and excellent business continuity, even in the event of disasters.
- Performance/Scalability: EDWs offer excellent performance on-premises with hardware accelerators, faster storage, and proximity to data, but they are harder to scale to address new business requirements. Public clouds deliver lower performance for large-scale analytics, since maintaining data proximity is hard and optimized storage and computing infrastructure is typically not available. Public clouds can easily scale to meet new business requirements for smaller data sizes; however, as datasets continue to grow exponentially beyond a few hundred terabytes, these environments have limited elasticity. Hybrid EDWs offer excellent performance with hardware accelerators, faster storage, and proximity to data, either on-premises or in the cloud, and can also scale easily to meet new business requirements.
- Governance/Compliance: Excellent for on-premises since these operations can be tailored to meet individual enterprise and regulatory requirements. Public clouds have limited ability to tailor these operations for individual customers since they are set broadly by the cloud provider. Hybrid clouds are excellent since these operations can be tailored to meet individual enterprise and regulatory requirements consistently end-to-end.
- Data Protection/Security: On-premises and hybrid clouds are excellent, since sensitive data can be stored and managed according to individual customer requirements and protocols. Public clouds are somewhat more vulnerable because their infrastructure is shared, and many enterprises are reluctant to part with their mission-critical data.
- Vendor Lock-in: Lock-in is strong for on-premises and public clouds, especially with the underlying software infrastructure. Data migration to an alternative solution is also complex and expensive.
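The cost trade-offs described above (on-premises capacity sized for peaks, consumption-based public cloud pricing, and a hybrid that keeps a baseline on-premises and bursts peaks to the cloud) can be illustrated with a toy model. This is a hypothetical sketch, not a method from the article: the per-node-hour rates, the hourly demand profiles, and all function names are assumptions chosen for arithmetic clarity, and the model ignores data transfer, upgrade, licensing, and the other cost components listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices (illustrative only, not vendor pricing)."""
    onprem_node_hour: float  # amortized on-prem cost per provisioned node-hour
    cloud_node_hour: float   # pay-per-use cloud price per consumed node-hour


def on_prem_cost(demand: list[int], rates: Rates) -> float:
    # On-premises capacity is sized for the peak and paid for every hour,
    # whether or not it is used.
    return max(demand) * len(demand) * rates.onprem_node_hour


def public_cloud_cost(demand: list[int], rates: Rates) -> float:
    # Consumption-based pricing: pay only for node-hours actually used.
    return sum(demand) * rates.cloud_node_hour


def hybrid_cost(demand: list[int], baseline: int, rates: Rates) -> float:
    # Keep `baseline` nodes on-premises for the predictable load and
    # burst anything above it to the public cloud.
    onprem = baseline * len(demand) * rates.onprem_node_hour
    burst = sum(max(0, d - baseline) for d in demand) * rates.cloud_node_hour
    return onprem + burst


def best_baseline(demand: list[int], rates: Rates) -> tuple[int, float]:
    # Baseline 0 is all-cloud; baseline max(demand) is all-on-prem, so the
    # best hybrid split is never worse than either extreme in this model.
    b = min(range(max(demand) + 1), key=lambda x: hybrid_cost(demand, x, rates))
    return b, hybrid_cost(demand, b, rates)


if __name__ == "__main__":
    rates = Rates(onprem_node_hour=1.0, cloud_node_hour=2.5)
    spiky = [10] * 22 + [40] * 2   # steady 10 nodes, 2-hour spike to 40
    steady = [40] * 24             # consistently high utilization
    for name, demand in [("spiky", spiky), ("steady", steady)]:
        print(name,
              on_prem_cost(demand, rates),
              public_cloud_cost(demand, rates),
              best_baseline(demand, rates))
```

With these assumed rates, the spiky profile costs 960 on-premises (sized for the 40-node peak), 750 fully in the cloud, and 390 with a 10-node on-premises baseline plus cloud bursting. The steady, high-utilization profile reverses the picture: 960 on-premises versus 2,400 in the cloud, and the best "hybrid" split is simply all on-premises. This mirrors the qualitative claims above, but real outcomes depend on actual pricing, egress fees, and workload shapes.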
A hybrid multi-cloud environment empowers customers to experiment with and choose the tools, programming languages, algorithms, and infrastructure to build data pipelines, train analytics and AI models and make them production-ready in a governed way for the enterprise, and share insights throughout the workflow.
[1] Nagendra Bommadevara, Andrea Del Miglio, and Steve Jansen, “Cloud adoption to accelerate IT modernization”, McKinsey & Company, 2018