Emerging Computing Technology Trends for 2022

M. R. Pamidi, Ph. D.
Happy New Year to all. Here are some thoughts on key emerging computing technology trends for 2022:
We see AI expanding in a variety of ways.
  1. Reliable data: Organizations, continuing their efforts to build digital-first business models, will further explore using AI to enhance customer acquisition, improve customer experience, and increase customer retention. To achieve these goals, they need reliable data that is both clean and structured. Today, these tasks are performed by highly paid data scientists who maintain their own notebooks with very little shared infrastructure. We expect AI to take over many of these activities and create pools of priceless data and pipelines of valuable dataflow.
  2. Conversational and ethical AI: Debates will continue. Recent (ab)use of AI by major social media companies and state agencies, involving facial recognition, monopoly, and privacy issues, has caught the attention of politicians and lawmakers worldwide, and many countries have imposed hefty fines on these companies. This pressure will force media companies to adopt more ethical AI.
  3. AI for all: We’ve often heard the phrase “data is the new oil.” This may be true but, remember, OPEC and a few other countries control oil extraction, production, and distribution, whereas data is much more democratic. Even oil-scarce but technologically savvy countries (e.g., Israel) in a flat world can exploit AI and show their prowess.
  4. Improving lifestyle: Researchers have developed a machine-learning (ML) program that can be connected to a human brain and used to command a robot. The program adjusts the robot’s movements based on electrical signals from the brain. With this invention, tetraplegic patients will hopefully be able to carry out more day-to-day activities on their own.
  5. Entertainment: On a lighter note, classical-music lovers know that Beethoven composed nine symphonies and was reportedly working on his 10th when he died in 1827, leaving behind 40 sketches for this symphony. Music lovers, musicologists, academics, and AI experts, mainly from Europe and the U.S., got together a few years ago and decided to complete Beethoven’s unfinished 10th using AI, in a project sponsored by Deutsche Telekom.[1] The work had its premiere in Bonn on October 9, 2021.[2] This is surely one of the most creative and fascinating applications of AI.
  6. Patents: In an excellent article on AI in The Wall Street Journal, the reporter notes that South Africa in July 2021 granted a patent to an invention that listed an AI system as the inventor.[3] The system came up with an idea for a beverage container based on fractal geometry. It was the first time a government awarded a patent for an invention made by AI. The U.S. grants patents only to human beings, or “natural persons.”
Cloud Computing
  1. Continuing growth: Cloud Computing (CC) continues to make deeper inroads into the enterprise, and spending on CC is expected to surpass spending on non-cloud IT before 2025. CC has traditionally been a technology disruptor and will eventually morph into a business disruptor in many areas, e.g., bio-pharma, the public sector, consumer goods, banking and financial services, oil and gas, energy, and technology, to name a few. We expect the current leaders (Amazon Web Services, Microsoft Azure, and Google GCP) to maintain their strong positions with continued growth (Figure 1), although Google and Microsoft are gaining market share at the expense of Amazon.[4]
Figure 1. Public Cloud Market Shares
  2. Security and privacy: As CC proliferates, so will concerns about cybersecurity. Traditionally, security has been an afterthought in enterprises; they will soon realize that, if IT is a cake, security should be baked into it like eggs, not just brushed on later as icing. DevOps will gradually be replaced by DevSecOps, and we expect the beginning of public clouds distributed across different physical locations with due consideration for geo-fencing and privacy laws, such as Brazil’s Lei Geral de Proteção de Dados (LGPD), the California Consumer Privacy Act/California Privacy Rights Act of 2020 (CCPA/CPRA), the EU’s GDPR, and South Africa’s Protection of Personal Information (POPI) Act. The U.S. is still loosey-goosey on privacy issues and appears to echo what Scott McNealy of Sun Microsystems said over 20 years ago: “You have zero privacy anyway. Get over it.” We hope the rest of the U.S. learns from California, which has long led the nation on such issues.
  3. Complements AI: AI and CC will complement one another, because AI with ML and Deep Learning requires large amounts of computing resources (CPUs/GPUs/IPUs/TPUs, speed, storage, and network bandwidth), and CC can easily deliver these to those in need. AI will get smarter and more resourceful, creating its own algorithms as it ‘learns’ from experience, with very little help from humans.
  4. Serverless: “Serverless Computing,” a buzz phrase for the past few years, will make a deeper footprint through AWS Lambda, Microsoft Azure Functions, and IBM Cloud Functions. Serverless means enterprises are not acquiring or leasing servers but are using a cloud provider on a pay-as-you-go basis. So “serverless” is really a misnomer; someone out there pays for and owns those servers. It’s more like “less-server” or, to please our grammarian readers, “Fewer-Servers Computing.”
  5. Streaming: Finally, with the increased emergence and embrace of 5G and Wi-Fi 6E, not only more, but new kinds of, data, such as streams from Amazon Luna and Google’s Stadia gaming platforms, will be flowing across networks. Only CC can accommodate such burst-load spikes, as it has successfully done on Black Fridays and Cyber Mondays in recent years.
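To make the pay-as-you-go model in item 4 concrete, here is a minimal sketch of a serverless function written in the style of an AWS Lambda Python handler. The event payload field and the local invocation at the bottom are illustrative assumptions, not any provider’s exact contract.

```python
import json

def handler(event, context=None):
    """Minimal AWS-Lambda-style handler: the provider invokes this function
    per request and bills only for execution time, so the caller manages no
    servers. The 'name' field in the event payload is a hypothetical input."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local invocation, simulating what the cloud platform would do on a request:
response = handler({"name": "cloud"})
```

In a real deployment, the provider wires this handler to an HTTP gateway or event source; the enterprise never provisions the underlying machines, which is exactly why “serverless” is a (useful) misnomer.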
High-Performance Computing
  1. Accelerated computing: Years ago, High-Performance Computing (HPC), born in traditional on-premises datacenters, was done using expensive water-cooled supercomputers[5] and parallel-processing techniques to execute multiple time-consuming tasks simultaneously. However, edge computing and AI have redefined HPC, which can now deliver these tasks very inexpensively. What has made this possible is a combination of AI, new kinds of processors beyond traditional CPUs (such as GPUs from Nvidia, TPUs from Google, and IPUs from Graphcore), and improvements in traditional ASICs and FPGAs.
  2. Mainstreaming: AI, CC, and HPC complement one another in that AI, as noted above, drives the HPC engine, while CC democratizes IT infrastructure and delivers a level playing field. Once the domain mainly of academia, national labs, and defense, HPC has been widely embraced by aerospace, bio-pharma, energy, healthcare, oil and gas, Wall Street, and other industries. With CC delivering HPC as a Service (HPCaaS), edge computing will further expand HPC’s footprint. These trends will continue as Exascale Computing appears on the horizon, with performance measured in exaFLOPS (1 quintillion, or 10^18, FLOPS). But we are still far from achieving the late, great Seymour Cray’s vision of 4-T Computing: terahertz chip speed, terabit bandwidth, terabyte memory (achieved), and terabyte storage (achieved).
Quantum Computing
The concept of Quantum Computing (QC) was first posited by the Nobel Prize-winning physicist Richard Feynman, who explained that classical computers could not process calculations that describe quantum phenomena, and that a quantum computing method was needed for these complex problems.[6] Since then, QC has made significant strides, and established companies and nations are investing heavily to gain leadership positions in this field.
On the commercial front:
  1. Honeywell recently completed the business combination of its Honeywell Quantum Solutions division with Cambridge Quantum, forming a new company, Quantinuum. The previously announced combination leaves Honeywell with a majority stake in Quantinuum. Honeywell and IBM were both prior investors in Cambridge Quantum. Jointly headquartered in Cambridge, U.K., and Broomfield, CO, Quantinuum plans to launch a “quantum cyber security product” this year, followed later in the year by an enterprise software package that applies quantum computing to complex scientific problems in pharmaceuticals, materials science, specialty chemicals, and agrochemicals.
  2. PlatformE, the fashion technology company enabling on-demand production for top brands, has acquired Catalyst AI, an artificial intelligence company based in Cambridge, UK. The deal will see Catalyst AI’s ML tools for optimizing fashion supply chains bolster PlatformE’s services for efficient on-demand and made-to-order fashion.
  3. IBM recently announced[7] its new 127-qubit ‘Eagle’ processor at the IBM Quantum Summit 2021, its annual event to showcase milestones in quantum hardware, software, and the growth of the quantum ecosystem. IBM measures progress in quantum computing hardware through three performance attributes:
     • Scale: the number of qubits (quantum bits) on a quantum processor, which determines how large a quantum circuit can be run.
     • Quality: measured by Quantum Volume, which describes how accurately quantum circuits run on a real quantum device.
     • Speed: measured by CLOPS (Circuit Layer Operations Per Second), a metric IBM introduced in November 2021 that captures the feasibility of running real calculations composed of a large number of quantum circuits.
“IBM’s Quantum System Two offers a glimpse into the future quantum computing datacenter, where modularity and flexibility of system infrastructure will be key to continued scaling,” said Dr. Jay Gambetta, IBM Fellow and VP of Quantum Computing. “System Two draws on IBM’s long heritage in both quantum and classical computing, bringing in new innovations at every level of the technology stack.”
Expected to be up and running in 2023, IBM Quantum System Two is designed to work with IBM’s future 433-qubit and 1,121-qubit processors and is based on the concepts of flexibility and modularity. The control hardware has the flexibility and resources necessary to scale, including control electronics that allow users to manipulate the qubits, and cryogenic cooling that keeps the qubits at a temperature low enough for their quantum properties to manifest.
QC will not replace traditional computing anytime soon, but will coexist with it. When it does mature, QC applications will be widespread in climate-change studies, new drug discoveries, revolutionary agriculture resulting in reduced carbon emissions, systems biology, and cognitive computing processes—involving programs that are capable of learning and becoming better at their jobs—using vast neural networks. Quantum-powered AI will yield machines that are able to think and learn more quickly than ever, although machines may never equal humans in creative and emotional aspects.
Cybersecurity
Cybercrime reportedly caused damages totaling US$6 trillion globally in 2021. Measured as a country, cybercrime would be the world’s third-largest economy after the U.S. and China, and it is expected to grow at a 15% CAGR, reaching US$10.5 trillion by 2025.
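As a quick sanity check, the US$10.5 trillion projection follows directly from compounding the reported 2021 base at 15% a year:

```python
base_2021 = 6.0          # US$ trillion, reported 2021 cybercrime damages
cagr = 0.15              # 15% compound annual growth rate
years = 2025 - 2021      # four compounding periods

projected_2025 = base_2021 * (1 + cagr) ** years
print(round(projected_2025, 1))  # ~10.5, matching the US$10.5 trillion figure
```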
  1. Security, like CC, is a journey and not a destination, and security threats from hackers, fraudsters, phishers, and scammers are only expected to get worse and more frequent. Ransomware attacks, for instance, were three times higher in the first quarter of 2021 than during all of 2019, according to the UK National Cyber Security Centre, and sixty-one percent of respondents to a PwC research survey expect ransomware attacks to increase in 2022. Ransomware locks files behind hard-to-break encryption and threatens to wipe them if a ransom is not paid. Not only organizations but also individuals have become targets. AI, again, is coming to the rescue of cybersecurity professionals, as it did in financial fraud detection involving money-laundering schemes. AI can identify unusual patterns of behavior in systems dealing with hundreds of thousands of events per second. As IT security professionals encourage companies to invest in AI, cybercriminals are equally adept and aware of AI’s benefits and will try to outsmart IT. In fact, they have developed new threats that use ML technology to bypass cybersecurity (think of ‘sandbox’ evasion). Again, it will be a battle of good vs. evil using the same technology, AI, and the savvy ones will win. This is not to discourage security spending, but to encourage spending it wisely.
  2. Phishing or spear phishing, either in the form of employees tempted to click on an innocent-looking link, thus welcoming malware, or via USB devices that employees pick up for free at trade shows, is also becoming more common. Stuxnet is one of the best-known incidents of the latter kind.
  3. Finally, the Internet of Things (IoT), with about 18 billion devices expected to be connected by 2022, is another attractive target for cybercriminals. The targets include billions of smart appliances, light bulbs, autonomous vehicles, and plant and control systems (chemical, electric power, manufacturing, traffic, oil and gas, water supply…). Thus, IoT may have to be rechristened IoVT: the Internet of Vulnerable Things.
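The AI-driven behavioral detection described in item 1 can be illustrated with a deliberately simple sketch: flag any observation whose event rate deviates far from the baseline. A real system would apply trained ML models to hundreds of thousands of events per second; the z-score rule, threshold, and login-rate numbers below are illustrative assumptions only.

```python
from statistics import mean, stdev

def flag_anomalies(event_counts, threshold=2.5):
    """Return indices of observations whose z-score exceeds the threshold.
    A toy stand-in for the ML-based behavioral detection described above."""
    mu = mean(event_counts)
    sigma = stdev(event_counts)
    return [i for i, x in enumerate(event_counts)
            if sigma > 0 and abs(x - mu) / sigma > threshold]

# Login attempts per minute: a sudden burst suggests credential stuffing.
counts = [12, 11, 13, 12, 10, 11, 14, 12, 11, 480]
suspicious = flag_anomalies(counts)  # flags the burst in the final minute
```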
Summary
The IT industry is never dull and 2022 will be no different.
AI will invade more fields and also attract the attention of central governments worldwide concerning privacy, racial profiling, and facial recognition.
CC will continue to grow fueled by its leaders’ growth. New players will face daunting challenges from established vendors.
HPC, aided by AI and CC, will become cheaper to embrace and expand its footprint by entering new fields.
QC is still in its early stages and, except for a few marquee use cases, may take 5 to 10 years to reach practical implementation.
Security will face more challenges, with hacksters (hackers + fraudsters) trying to outsmart cybersecurity experts. Central governments have to play a key role in preventing infrastructure meltdowns caused by individuals (seeking fun, money, or both) or state-sponsored actors. While our Defense Brass is stuck in 20th-century warfare (mass killings, carpet bombing), the 21st century will face cyber warfare. Einstein is famously reported to have said, “I do not know with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.” We beg to disagree with probably the greatest scientist and humanitarian of all time and state: the next World War will be fought with ‘0’s and ‘1’s. It will be a cyber war. The mass destruction of past wars will be replaced by mass disruption.
[1] “Beethoven’s 10th Symphony Completed By AI: Premiere October 2021,” https://www.udiscovermusic.com/classical-news/beethovens-10th-symphony-ai/
[2] “Welturaufführung: Beethoven X,” October 9, 2021.
[3] “For AI, 2021 Brought Big Events,” John McCormick, The Wall Street Journal, December 30, 2021.
[4] “Rivals Tap Cash Piles To Win In Cloud,” Tripp Mickle and Aaron Tilley, The Wall Street Journal, December 30, 2021 (may need subscription for access).
[5] Seymour Cray, often called the Father of Supercomputing, once quipped, “I’m an overpaid plumber.”
[6] W. Knight, “Serious Quantum Computers Are Here. What Are We Going To Do With Them?”, MIT Technology Review, February 2018.
[7] “IBM Unveils Breakthrough 127-Qubit Quantum Processor,” November 16, 2021.

As the world gathers to fight climate change, let’s recognize the critical role of HPC, AI and Cloud Computing

This week in Glasgow, the COP26 summit will bring global leaders together to accelerate action toward combating climate change. This is happening as energy consumption, mostly from polluting fossil fuels, is at an all-time high. That may seem good for the oil, gas, and coal industries, but it isn’t. The grave reality is that the fossil fuel industry is under immense pressure to mitigate climate change and decarbonize, since high levels of energy consumption are causing unsustainable levels of CO2 and other greenhouse gases.

 

So, the energy industry is undergoing a profound transition from fossil fuels to renewable and clean energy sources. As oil and gas companies decarbonize, they are looking into new and economically viable solutions, including potentially becoming carbon-neutral energy companies. To this end, they are investing heavily in physical infrastructure and boosting investments in High-Performance Computing (HPC) and artificial intelligence (AI) to innovate and quickly solve critical problems in this transition.

 

Oil and gas companies have used HPC (seismic processing and reservoir simulation) and certain forms of AI technology (analytics) for decades to improve decision-making in exploration and production and to reduce investment risks, particularly as fossil fuels become harder to extract. Now, as oil and gas companies transition, they must be able to handle newer interdisciplinary workloads in geophysics, computer-aided engineering (CAE), life sciences, combustion/turbulence modeling, weather modeling, materials science, computational chemistry, nuclear engineering, and advanced optimization.

 

For this, oil and gas companies are increasing their investments in HPC, AI, and other innovative and agile solutions to handle exploding compute and storage requirements. As the use of AI and HPC continues to grow in the energy industry, cloud computing is making it easier to process spiky workloads and improve the overall user experience. So, many oil and gas companies are using hybrid cloud solutions with several deployment options.

 

The harmful impact of climate change is becoming more severe day by day, threatening the survival of the planet. The entire world is coming together to fight it. Oil and gas companies are doing their part by transitioning to renewable energy sources. This transition is hard and expensive but is made easier by novel applications and extensions of the proven HPC and AI solutions oil and gas companies have used for years. As the world focuses this week on climate change, let’s also recognize the critical role HPC and AI play in solving one of mankind’s most pressing challenges.

 

You can learn more by reading this Hewlett Packard Enterprise whitepaper that Cabot Partners recently helped create.

HPC and AI enable breakthroughs in genomics for better healthcare

Many Life Sciences organizations are using digital technologies to meet the needs and expectations of patients. These technologies help treat and manage diseases in new ways. HPC and AI solutions are at the forefront of this. They are needed to accelerate breakthroughs in large-scale genomics.

Genomics is a sub-discipline of molecular biology that focuses on the storage, function, evolution, mapping, and editing of genomes. It is a vital and growing field because it can improve the lifestyle and outcomes for patients. In the next five years, the economic impact of genomics is estimated to be in the hundreds of billions to a few trillion dollars a year.

Next-generation Sequencing (NGS), Translational and Precision Medicine

High-performance computing (HPC) and Artificial Intelligence (AI) are essential for genomics primarily because they help speed up NGS, which processes raw sequencing data and reduces it into a usable format. NGS helps determine the sequence of DNA or RNA to study genetic variations associated with diseases or other biological phenomena.

After this NGS step, it is important to establish the relationship between genotypes and outcomes to understand the influence individual DNA variances have on disease and medical outcomes. This is translational medicine. The final step is precision or personalized medicine, which customizes disease prevention and treatment for an individual based on their genetic makeup, environment, and lifestyle. This last step uses all the data collected using HPC and AI technology to create a personalized approach for the patient.
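The step from raw sequence to genetic variants can be illustrated with a toy example: compare a sequenced sample against a reference genome and report single-nucleotide differences. Real NGS pipelines handle alignment, insertions/deletions, and quality scores; the short sequences and helper function below are purely illustrative.

```python
def find_variants(reference, sample):
    """Return (position, ref_base, sample_base) for each single-nucleotide
    difference -- a toy stand-in for the variant-calling step that follows
    sequencing in an NGS pipeline."""
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

reference = "ACGTACGTAC"
sample    = "ACGAACGTTC"
variants = find_variants(reference, sample)  # differences at positions 3 and 8
```

It is variants like these, aggregated across many genomes, that translational medicine then correlates with disease and treatment outcomes.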

How HPC and AI help accelerate genomics

Genomics is difficult, but new HPC technology is making the process easier. NGS involves complex algorithms that require a lot of memory to assemble and solve. HPC and AI solutions help speed up this process and make it cheaper and more accurate. Using the cloud and new IT solutions, data sequence analysis also becomes easier and can be done at a much larger scale.

Translational medicine requires HPC solutions that can process a lot of data efficiently. This step looks for relationships between a lot of genes, DNA, and diseases and makes it possible to provide personalized treatment for a patient.

HPC and AI are game changers in life sciences. They make genomics easier and faster for healthcare providers, so they can provide highly effective personalized care for their patients. We expect HPC and AI in Life Sciences to continue to grow even more rapidly, especially in the post-COVID era, by enabling breakthroughs in vaccines, personalized medicine, and healthcare.

You can learn more by reading this Hewlett Packard Enterprise and NVIDIA whitepaper that Cabot Partners recently helped create.

The Promise and Peril of RPA

RPA or Robotic Process Automation emulates human activity when interacting with digital software. It automates tedious and mundane business processes. Artificial Intelligence (AI) when integrated with RPA increases business value. AI can directly be used in bots to execute tasks without human intervention. This results in better efficiency, and improved customer and employee experiences.

RPA software revenue is growing rapidly despite economic disruptions caused by the COVID-19 pandemic; it is projected to reach $1.89 billion in 2021 and to sustain double-digit growth rates through 2024.

Automating processes with RPA seems like a great solution in theory, but practice is messier. RPA has been successful for some but disappointing for others. While many organizations are relatively happy with their automation investment, most haven’t fully realized the ROI promised by RPA software vendors. For this reason, clients need to carefully evaluate the various RPA vendors before making this strategic investment.

Read this Cabot Partners paper for more details.

A Fresh Look at the Latest AMD EPYC 7003 Series Processors for EDA and CAE Workloads

When it comes to high-performance computing (HPC), engineers can never get enough performance. Even minor improvements at the chip level can have dramatic financial impacts in hyper-competitive industries such as computer-aided engineering (CAE) for manufacturing and electronic design automation (EDA).

With their respective x86 processor lineups, Intel and AMD continue to battle for bragging rights, leapfrogging one another in terms of absolute performance and price-performance. Both Intel and AMD provide a comprehensive set of processor SKUs optimized for various HPC workloads.

In March 2021, AMD “upped the ante” with the introduction of its 3rd Gen AMD EPYC™ processors. Dubbed the world’s highest-performing server processors, AMD 7003 series processors deliver up to 19% more instructions per clock (IPC) than the previous generation. The new “Zen 3” processor cores deliver industry-leading amounts of cache per core, a faster Infinity Fabric™, and industry-leading memory bandwidth of 3200 MT/s across eight channels of DDR4 memory. HPC users are particularly interested in the recently announced 7xF3 high-frequency SKUs, with boost speeds of up to 4.1 GHz.
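The quoted memory bandwidth follows from the DDR4 numbers themselves: each 64-bit channel moves 8 bytes per transfer, so eight channels at 3200 MT/s give the theoretical per-socket peak. This is a sketch of the arithmetic, not a measured result.

```python
transfers_per_sec = 3200e6   # 3200 MT/s per DDR4 channel
bytes_per_transfer = 8       # 64-bit (8-byte) channel width
channels = 8                 # eight channels per socket

peak_gb_s = transfers_per_sec * bytes_per_transfer * channels / 1e9
print(peak_gb_s)  # 204.8 GB/s theoretical peak per socket
```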

In two recently published whitepapers sponsored by AMD, Cabot Partners looked at the latest AMD EPYC 7003 series processors (aka “Milan”) in HPE Apollo and HPE ProLiant server platforms, characterizing their performance for various CAE and EDA workloads. Among the headlines: EPYC 7003 series processors deliver 36% better throughput and up to 60% more simultaneous simulations per server than the previous 2nd Gen EPYC processors.

These performance gains benchmarked on the latest HPE servers make these processors worth a look. Readers can download the recently published whitepapers here:

TVO Analysis of Federated Learning with IBM Cloud Pak for Data

Analytics and AI are profoundly transforming how businesses and governments engage with consumers and citizens. Across many industries, high-value transformative use cases in personalized medicine, predictive maintenance, fraud detection, cybersecurity, logistics, customer engagement, and more are rapidly emerging. In fact, AI adoption alone has grown an astounding 270% in the last four years, and 40% of organizations expect it to be the leading game changer in business[1]. However, for analytics and AI to become an integral part of an organization, numerous deployment challenges with data and infrastructure must be overcome: data volumes (50%), data quality and management (47%), and skills (44%)[2].

In addition, many companies are beginning to use hybrid cloud and multi-cloud computing models to knit together services to reach higher levels of productivity and scale. Today, large organizations leverage almost five clouds on average. About 84% of organizations have a strategy to use multiple clouds[3].

IBM Cloud Pak for Data is an end-to-end Data and AI platform that reduces complexity, increases scalability, accelerates time to value, and maximizes ROI with seamless procedures to extend to multiple clouds. While Cloud Pak for Data can run on any public or private cloud, it is also modular and composable, allowing enterprises to embrace just the capabilities they need on-premises. So, it is truly a hybrid multi-cloud platform.

Recently, IBM announced enhancements to IBM Cloud Pak for Data (Version 3.5). These enhancements can be broadly grouped into two key themes: cost reduction and innovation to drive digital transformation. Customers can drive down costs through automation, consolidated management, and an integrated platform. On the innovation front, accelerated AI, Federated Learning, improved governance and security, and an expanded ecosystem are the key focus areas. In this blog, we primarily focus on the value of Federated Learning.

Federated learning (also known as collaborative learning) is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local datasets, without transferring those datasets (Figure 1). The data stays local, which allows deep learning algorithms to execute while preserving privacy and security. This approach differs from traditional centralized machine learning techniques, in which all the local datasets are uploaded to one server and ML algorithms are executed on the aggregated dataset.

Figure 1: Comparison of Federated Learning and a Standard Approach

Federated learning enables multiple actors to build a common, robust machine learning model without sharing data, thus maintaining data privacy, data security, and data access rights while providing access to heterogeneous data. Many industries, including defense, telecommunications, IoT, healthcare, manufacturing, and retail, use federated learning and are getting significant additional value from their AI/ML initiatives.
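The core server-side step of federated learning, often called federated averaging (FedAvg), can be sketched in a few lines: each client trains locally and reports only its model weights and sample count, which the server combines. The hospital scenario, weight values, and single-round aggregation below are simplifying assumptions for illustration.

```python
def federated_average(client_updates):
    """Server-side step of federated averaging (FedAvg): combine locally
    trained weight vectors, weighted by each client's sample count. Raw
    data never leaves the clients -- only the weights travel."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total
            for i in range(dim)]

# Three hospitals train the same 2-weight model on private local data
# and report only (weights, num_samples) to the coordinating server.
updates = [([0.9, 2.1], 100),
           ([1.1, 1.9], 300),
           ([1.0, 2.0], 600)]
global_weights = federated_average(updates)
```

In a full system this round repeats: the averaged weights are sent back to the clients, which train further on their local data before the next aggregation.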

For IBM Cloud Pak for Data, this additional value can be quantified using the Cabot Partners Total Value of Ownership (TVO) framework.

High Level TVO Framework for Federated learning

TVO analysis is an ideal avenue to quantify and compare the value of Federated Learning against the standard approach to machine learning. In the TVO analysis, the Total Value (Total Benefits – Total Costs) of the IBM Cloud Pak for Data solution with Federated Learning is compared against the IBM Cloud Pak for Data solution without Federated Learning.

The TVO framework (Figure 2) categorizes the interrelated cost/value drivers (circles) for Analytics by each quadrant:  Costs, Productivity, Revenue/Profits and Risks. Along the horizontal axis, the drivers are arranged based on whether they are primarily Technical or Business drivers. Along the vertical axis, drivers are arranged based on ease of measurability: Direct or Derived.

The cost/value drivers for Analytics are depicted as circles whose size is proportional to the potential impact on a client’s Total Value (Benefits – Cost) of Ownership or TVO as follows:

  • Total Costs of Ownership (TCO): Typical costs include one-time acquisition costs for hardware and deployment, and annual costs for software, maintenance, and operations. For the case without Federated Learning, the costs associated with data transfer to a central repository must also be considered.

Figure 2: TVO Framework for Federated Learning with Cost/Value Drivers

  • Improved Productivity: The TVO model quantifies the value of productivity gains for data scientists, data engineers, application developers, and the organization. It should also consider the value associated with the availability of additional heterogeneous data due to Federated Learning. For example, Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on the device, decoupling the ability to do machine learning from the need to store the data in the cloud; the value of this innovation needs to be considered where applicable.
  • Revenue/Profits: A key benefit of Federated Learning is access to a larger pool of data, resulting in increased machine learning performance while respecting data ownership and privacy. Faster time to value with better performance results in greater innovation and better decision-making, which spur growth and revenues and improve profits.
  • Risk Mitigation: Federated Learning enables multiple actors to build a common, robust machine learning model without sharing data, allowing users to address critical issues such as data privacy, data security, and data access rights, which in turn improves governance and compliance.

The above framework is a simplified pictorial view of TVO analysis. In a rigorous TVO analysis, which is a major offering of Cabot Partners, the elements of the framework are quantified and expressed in easily understandable business terms. In addition, the analysis can be expanded to include other innovation features.
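As a purely illustrative sketch of how such an analysis reduces to numbers, Total Value can be computed for the two scenarios compared above. Every figure below is hypothetical, chosen only to show the mechanics; none is a Cabot Partners result.

```python
def total_value(benefits, costs):
    """TVO as described above: Total Value = Total Benefits - Total Costs."""
    return sum(benefits.values()) - sum(costs.values())

# Hypothetical annualized figures (US$ thousands) for the two scenarios
# compared in the text; all numbers are illustrative assumptions.
with_fl = total_value(
    benefits={"productivity": 400, "revenue": 300, "risk_mitigation": 150},
    costs={"platform": 250},
)
without_fl = total_value(
    benefits={"productivity": 300, "revenue": 250, "risk_mitigation": 50},
    costs={"platform": 250, "data_transfer": 120},
)
fl_advantage = with_fl - without_fl  # incremental value of Federated Learning
```

A rigorous engagement would populate each driver from measured client data rather than assumptions, but the arithmetic of the comparison is exactly this.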

Conclusions

IBM recently announced enhancements to IBM Cloud Pak for Data (version 3.5). The enhancements focus primarily on cost reduction and innovation to drive digital transformation. A major element of innovation is Federated Learning. As detailed above, Federated Learning amplifies the value of IBM Cloud Pak for Data through:

  • Lower costs: no costs associated with migrating data to a central database location
  • Availability of heterogeneous data, improving the quality of ML models
  • Access to a larger pool of data, resulting in increased ML performance
  • Improved security
  • The ability for multiple actors to build a common, robust ML model without sharing data, addressing critical issues such as data privacy and data access rights

[1] https://futureiot.tech/gartner-ai-adoption-growing-despite-skills-shortage/

[2] Ritu Jyoti, “Accelerate and Operationalize AI Deployments Using AI – Optimized Infrastructure”, IDC Technology Spotlight, June 2018  

[3] RightScale® State of the Cloud Report 2019 from Flexera™

IBM Storage Simplified for Multi-cloud and AI

A profound digital transformation is underway as High-Performance Computing (HPC) and Analytics converge with Artificial Intelligence/Machine Learning/Deep Learning (AI/ML/DL). Across every industry, this is accelerating innovation and improving companies’ competitive positions and the quality and effectiveness of their products/services, operations, and customer engagement. Consequently, with 2018 revenues of $28.1 billion, the relatively new AI market is growing rapidly at 35.6% annually[1].

As the volume, velocity, and variety of data continue to explode, spending on storage systems and software just for AI initiatives is already almost $5 billion a year and expected to grow rapidly.[2] In addition, many companies are beginning to use hybrid cloud and multi-cloud computing models to knit together services to reach higher levels of productivity and scale. Today, large organizations leverage almost five clouds on average. About 84% of organizations have a strategy to use multiple clouds[3] and 56% plan to increase the use of containers[4].

What’s needed to handle the data explosion challenges are simple, high-performance and affordable storage solutions that work on hybrid multi-cloud environments (Figure 1).

Figure 1: Data Challenges, Storage Requirements and Solutions for HPC, Analytics and AI

 

Key Storage Requirements

Scalable and affordable: These two attributes don’t always co-exist in enterprise storage. Historically, highly scalable systems have been more expensive on a cost/capacity basis. However, newer architectures allow computing and storage to be integrated more pervasively and cost-effectively throughout the AI workflow.

Intelligent software: This helps with cumbersome data curation and cleansing tasks, and helps run and monitor compute- and data-intensive workloads efficiently and reliably from the edge to the core to multiple clouds. It also greatly improves the productivity of highly skilled Data Scientists, Data Engineers, Data Architects, Data Stewards, and others throughout the AI workflow.

Data integration/gravity: This provides the flexibility to simplify and optimize complex data flows for performance, even with data stored in multiple geographic locations and environments. Wherever possible, moving the algorithms to where the data resides can accelerate the AI workflow and eliminate expensive data movement, especially when the same data is reused iteratively.

 

Storage Solutions Attributes

Parallel: As clients add more storage capacity (including Network Attached Storage – NAS), they are realizing that the operating costs (including downtime and productivity loss) of integrating, managing, securing, and analyzing exploding data volumes are escalating. To reduce these costs, many clients are using high-performance, scalable storage with parallel file systems, which stripe data across multiple networked servers. These systems facilitate high-performance access through concurrent, coordinated input/output operations between clients and storage nodes across multiple sites/clouds.
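The striping idea behind parallel file systems can be sketched in a few lines. This is a toy illustration of the placement scheme, not any particular file system: a file's blocks are spread round-robin across storage nodes, so clients can fetch different blocks from different nodes concurrently.

```python
# Toy sketch of block striping across storage nodes (not a real file system).
BLOCK = 4  # bytes per block, tiny for illustration
nodes = {0: {}, 1: {}, 2: {}}  # node_id -> {block_index: bytes}

def write_striped(data):
    # Split into fixed-size blocks and place them round-robin across nodes
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    for idx, blk in enumerate(blocks):
        nodes[idx % len(nodes)][idx] = blk
    return len(blocks)

def read_striped(nblocks):
    # Each block lives on a known node, so reads can proceed in parallel;
    # here we just reassemble them sequentially
    return b"".join(nodes[i % len(nodes)][i] for i in range(nblocks))

n = write_striped(b"parallel file systems stripe data")
assert read_striped(n) == b"parallel file systems stripe data"
print({nid: sorted(store) for nid, store in nodes.items()})
```

Because consecutive blocks land on different nodes, a large sequential read is served by all nodes at once, which is where the aggregate bandwidth of these systems comes from.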

Hybrid: Different data types and stages in an AI workflow have varying performance requirements. The right mix of storage systems and software is needed to meet the simultaneous needs for scalability, performance and affordability, on premises and on the cloud. A hybrid storage architecture combines file and object storage to achieve an optimal balance between performance, archiving, and data governance and protection requirements throughout the workflow.

Software-defined: It is hard to support and unify many siloed storage architectures and optimize data placement to ensure the AI workflow runs smoothly with the best performance from ingest to insights. With no dependencies on the underlying hardware, Software-defined Storage (SDS) provides a single administrative interface and a policy-based approach to aggregate and manage storage resources with data protection and scale out the system across servers. It also provides data-aware intelligence to dynamically adapt to real-time needs and orchestrate IT resources to meet critical service level agreements (SLAs) in parallel, virtual and hybrid multi-cloud environments. SDS is typically platform agnostic and supports the widest range of hardware, AI frameworks, and APIs.

Integrated: A lot of AI innovation is occurring in the cloud. So, regardless of where the data resides, on-premises storage systems with cloud integration will provide the greatest flexibility to leverage cloud-native tools. Since over 80% of clients are expected to use two or more public clouds[5], there will be a need for smooth and integrated data flow to/from multisite/multi-cloud environments. This requires more intelligent storage software for metadata management and for integrating physically distributed, globally addressable storage systems.

IBM Spectrum Storage provides these attributes and accelerates the journey to AI from ingest to insights.

IBM Spectrum Storage and Announcements – October 27, 2020

IBM Spectrum Storage is a comprehensive SDS portfolio that helps to affordably manage and integrate all types of data in a hybrid, on-premises, and/or multi-cloud environment with parallel features that increase performance and business agility. Already proven in HPC, IBM Spectrum Storage software comes with licensing options that provide unique differentiation and value at every stage of the AI workflow, from ingest to insights.

On October 27, 2020, IBM announced new capabilities and enhancements to its storage and modern data protection solutions that are designed to:

  • Enrich protection for containers, and expand cloud options for modern data protection, disaster recovery, and data retention
  • Expand support for container-native data access on Red Hat OpenShift
  • Increase container app flexibility with object storage

These enhancements are primarily designed to support the rapidly expanding container and Kubernetes ecosystem, including Red Hat OpenShift, and to accelerate clients’ journeys to hybrid cloud. This announcement further extends an enterprise’s capabilities to fully adopt containers, Kubernetes, and Red Hat OpenShift as standards across physical, virtual and cloud platforms.

IBM announced the following new capabilities designed to advance its storage for containers offerings:

  • The IBM Storage Suite for Cloud Paks is designed to expand support for container-native data access on OpenShift. This suite aims to provide more flexibility for continuous integration and continuous delivery (CI/CD) teams who often need file, object, and block as software-defined storage. This is an enhancement with new Spectrum Scale capabilities.
  • Scheduled for release in 4Q 2020, IBM Spectrum Scale, a leading filesystem for HPC and AI, adds a fully containerized client and run-time operators to provide access to an IBM Spectrum Scale data lake, which could be IBM Elastic Storage systems or an SDS deployment. In addition, IBM Cloud Object Storage adds support for the open-source s3fs file-to-object storage interface bundled with Red Hat OpenShift.
  • For clients who are evaluating container support in their existing infrastructure, IBM FlashSystem provides low latency, high performance, and high-availability storage for physical, virtual, or container workloads with broad CSI support. The latest release in 4Q 2020 includes updated Ansible scripts for rapid deployment, enhanced support for storage-class memory, and improvements in snapshot and data efficiency.
  • IBM Storage has outlined plans for adding integrated storage management in a fully container-native software-defined solution. This solution will be managed by the Kubernetes administrator, and is designed to provide the performance and capacity scalability demanded by AI and ML workloads in a Red Hat OpenShift environment.
  • IBM intends to enhance IBM Spectrum Protect Plus to protect Red Hat OpenShift environments in 4Q 2020. Enhancements include easier deployment, with the ability to run the IBM Spectrum Protect Plus server as a container using a certified Red Hat OpenShift operator; metadata protection, which enables recovery of applications, namespaces, and clusters to a different location; and expanded container-native and container-ready storage support. IBM also announced the availability of a beta of IBM Spectrum Protect Plus on the Microsoft Azure Marketplace.

Conclusions

As HPC and Analytics grow and converge, clients can continue to leverage these new IBM storage capabilities to overcome the many challenges of deploying and scaling AI across their enterprises. These simple storage solutions, on-premises or on hybrid multi-clouds, can accelerate their AI journey from ingest to insights.

[1] https://www.idc.com/getdoc.jsp?containerId=US45334719

[2] http://www.ibm.com/downloads/cas/DRRDZBL2

[3] RightScale® STATE OF THE CLOUD REPORT 2019 from Flexera™

[4]https://www.redhat.com/cms/managed-files/rh-enterprise-open-source-report-detail-f21756-202002-en.pdf

[5] https://www.gartner.com/smarterwithgartner/why-organizations-choose-a-multicloud-strategy/

Cloudera Introduces Analytic Experiences for Cloudera Data Platform

Cloudera recently announced new enterprise data cloud services on Cloudera Data Platform (CDP): CDP Data Engineering; CDP Operational Database; and CDP Data Visualization. The new services include key capabilities to help data engineers, data analysts, and data scientists collaborate across the entire analytics workflow and work smarter and faster. CDP enterprise data cloud services are purpose-built to enable data specialists to navigate the exponential data growth and siloed data analytics operating across multiple public and private clouds.

Data lifecycle integration enables data engineers, data analysts and data scientists to work on the same data securely and efficiently, no matter where that data may reside or where the analytics run. CDP not only helps to improve individual data specialist productivity, it also helps data teams work better together, through its unique hybrid data architecture that integrates analytic experiences across the data lifecycle and across public and private clouds. Effectively managing and securing data collection, enrichment, analysis, experimentation and analytics visualization is fundamental to navigating the data deluge. The result is that data scientists and engineers can collaborate better and deliver data-driven use cases more rapidly. The new enterprise data cloud services are:

CDP Data Engineering: A powerful Apache Spark service on Kubernetes that includes key productivity-enhancing capabilities typically not available with basic data engineering services. Preparing data for analysis and production use cases across the data lifecycle is critical for transforming data into business value. CDP Data Engineering is a purpose-built service that accelerates enterprise data pipelines, from collection and enrichment to insight, at scale.

CDP Operational Database: A high-performance NoSQL database service that provides scale and performance for business-critical operational applications. It offers evolutionary schema support, which leverages the power of data while preserving flexibility in application design by allowing changes to the underlying data model without changes to the application. It also provides auto-scaling based on the workload utilization of the cluster to optimize infrastructure utilization.
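Utilization-based auto-scaling of the kind described above can be sketched as a simple threshold rule. The thresholds and limits here are illustrative assumptions, not CDP's actual algorithm:

```python
# Hedged sketch of threshold-based auto-scaling: scale the cluster out when
# utilization is high, in when it is low. All thresholds are illustrative.
def autoscale(nodes, utilization, low=0.3, high=0.8, min_nodes=1, max_nodes=16):
    """Return the new node count after one scaling decision."""
    if utilization > high and nodes < max_nodes:
        return nodes + 1   # scale out under heavy load
    if utilization < low and nodes > min_nodes:
        return nodes - 1   # scale in when idle to save cost
    return nodes           # within the band: hold steady

nodes = 4
for util in (0.9, 0.85, 0.5, 0.1):
    nodes = autoscale(nodes, util)
    print(util, nodes)
```

The dead band between the low and high thresholds prevents the cluster from oscillating between sizes when utilization hovers near a single cutoff.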

CDP Data Visualization: Simplifies the curation of rich visual dashboards, reports and charts to provide agile analytical insight in the language of the business, democratizing access to data and analytics across the organization at scale. Technical teams can rapidly share analysis and machine-learning models using drag-and-drop custom interactive applications, while business teams and decision makers get the data insights they need to make trusted, well-informed business decisions.

These data cloud services, in combination with CDP, are purpose-built for data specialists. They deliver rapid, real-time business insights with enterprise-grade security and governance, and should help Cloudera remain a leader in data science.

Why Deploy an Enterprise Data Warehouse on a Hybrid Cloud Architecture?


Analytics and artificial intelligence (AI) solutions are profoundly transforming how businesses and governments engage with consumers and citizens. Across many industries, high-value transformative use cases in personalized medicine, predictive maintenance, fraud detection, cybersecurity, logistics, customer engagement, geospatial analytics, and more are rapidly emerging.

Deploying and scaling AI across the enterprise is not easy especially as the volume, velocity, and variety of data continue to explode. What’s needed is a well-designed, agile, scalable, high-performance, modern, and cloud-native data and AI platform that allows clients to efficiently traverse the AI space with trust and transparency. An enterprise data warehouse (EDW) is a critical component of this platform.

EDWs are central repositories of integrated data from many sources. They store current and historical data used extensively by organizations for analysis, reporting, and better insights and decision-making. Historically, data warehouse appliances (DWAs) have delivered high query performance and scalability, but with the data explosion they now struggle to transform data into timely, actionable insights.

A hybrid, open, multi-cloud platform allows organizations to take advantage of their data and applications wherever they reside: on-premises and across many clouds. Here are some key pros and cons of deploying EDWs on-premises, on hybrid clouds, or on public clouds (Figure 1):

 

Figure 1: Comparing Enterprise Data Warehouses On-Premises, on Public Clouds, and on Hybrid Clouds

  • Strategic for the long-term: Although about 80% of enterprise workloads are still on-premises[1] and remain strategic, the public/hybrid cloud is even more strategic, driving most of the innovation, growth, and investment in analytics.
  • Total long-term costs: On-premises costs are predictable and become more favorable with greater utilization. Public cloud costs are unpredictable; they suit short, infrequent, spiky workloads, and consumption-based pricing produces greater accountability in the user population. However, these costs grow steeply with the higher utilization typical of most EDWs today. In addition, there are many other hidden costs, such as long-term contracts and incremental, supplementary licensing fees.

With hybrid cloud EDWs, customers can prudently optimize costs using on-premises assets for predictable workloads and offload spiky workloads to the public cloud. This is very effective for the long-term as a smaller on-premises hardware footprint can meet immediate requirements, and incremental needs for resources during peaks can be satisfied by the public cloud.  Key components of the total costs include:

  • Data Transfer/Migration Costs: Negligible for on-premises deployments, since most of the data for the entire analytics workflow typically resides on-premises. Significant for public clouds, since many analytics workflows require substantial movement of data to and from the public cloud. Enterprises are often limited in their ability to move datasets from the cloud back to their on-premises equipment or to another cloud. Moreover, cloud providers charge fees for transferring data out of their cloud environment, which dramatically increases costs, particularly as datasets continue to grow. Migrating on-premises workloads to the public cloud is also hard and time-consuming.

In hybrid clouds, there is limited movement of data throughout the analytics workflow to and from the public cloud, and so these costs are low to medium. With consistent cloud-native architectures, migrating workloads from on-premises to public clouds is also relatively easy and less expensive.

  • Capital Costs: Significant capital investment in on-premises IT infrastructure is needed to handle peak loads, which may result in lower, sub-optimal utilization under normal operations. For public clouds, customer capital costs are negligible. For hybrid clouds, some capital investment in IT infrastructure is needed for certain critical analytics workloads that run on-premises, with the rest offloaded to the public cloud. This may result in better utilization and lower capital costs compared to the all-on-premises alternative.
  • Upgrade Costs: For on-premises, significant capital expense for hardware upgrades is needed over time to modernize IT infrastructure and drive innovation. For public clouds, the customer incurs negligible capital expense for hardware upgrades, since the provider is responsible for the infrastructure. For hybrid clouds, a modest capital expense for hardware upgrades over time is needed to modernize infrastructure.
  • Operating Costs: Since the customer typically owns and operates on-premises assets, costs are predictable, and high-utilization environments provide better economics than public clouds, which are better suited to short, spiky workloads. With a hybrid cloud, the customer can prudently minimize costs by using on-premises assets for predictable workloads and offloading spiky workloads to the public cloud.
  • Deployment Costs (no Integration/Customization): Significant for on-premises, since provisioning and deploying resources and analytics workflows take more time and effort. Costs are low on public clouds, with faster provisioning and deployment because the process is automated. On hybrid clouds, costs are significant, since connectivity between the on-premises and public cloud environments and maintaining both could add another layer of complexity. However, this can be alleviated with a consistent cloud-native containerized architecture.
  • Management/Maintenance: Moderately hard for on-premises, since customers must invest in scarce skills and resources to maintain and operate these environments. Much easier with public clouds, since customers can typically use a centralized portal with process automation. For hybrid clouds, maintenance and operation are relatively straightforward with the right pre-determined operating policies and procedures for workload placement on-premises or on the cloud.
  • Integration/Customization: Easier for on-premises customers to customize and integrate newer solutions with their legacy solutions. This is harder to do on public clouds. On hybrid clouds, it is easier to integrate legacy systems with newer custom solutions from the edge to multiple clouds seamlessly.
  • Business Continuity/Serviceability: On-premises deployments can be tailored to provide higher service level agreements (SLAs). This is harder to do on public clouds, although they can deliver excellent business continuity. Hybrid clouds can provide high SLAs and excellent business continuity, even through disasters.
  • Performance/Scalability: EDWs offer excellent performance on-premises with hardware accelerators, faster storage, and proximity to data, but are harder to scale to address new business requirements. Public clouds deliver lower performance for large-scale analytics, since maintaining data proximity is hard and optimized storage and computing infrastructure is typically not available; they can, however, easily scale to meet new business requirements for smaller data sizes. As data sets grow beyond a few hundred terabytes, these environments have limited elasticity. Hybrid EDWs deliver excellent performance with hardware accelerators, faster storage, and proximity to data either on-premises or on the cloud, and can also easily scale to meet new business requirements.
  • Governance/Compliance: Excellent for on-premises since these operations can be tailored to meet individual enterprise and regulatory requirements. Public clouds have limited ability to tailor these operations for individual customers since they are set broadly by the cloud provider. Hybrid clouds are excellent since these operations can be tailored to meet individual enterprise and regulatory requirements consistently end-to-end.
  • Data Protection/Security: On-premises and hybrid clouds are excellent since sensitive data can be stored and managed for individual customer requirements and protocols. Public clouds are somewhat vulnerable since their infrastructure is shared and many enterprises are reluctant to part with their mission-critical data.
  • Vendor Lock-in: Strong for on-premises and public clouds especially with the underlying software infrastructure. Also, data migration to an alternate solution is complex and expensive.

 

A hybrid multi-cloud environment empowers customers to experiment with and choose the tools, programming languages, algorithms, and infrastructure to build data pipelines, train and make analytics/AI models ready for production in a governed way for the enterprise, and share insights throughout the workflow.

[1] Nagendra Bommadevara, Andrea Del Miglio, and Steve Jansen, “Cloud adoption to accelerate IT modernization”, McKinsey & Company, 2018

 

Total Value of Ownership (TVO) of IBM Cloud Pak for Data

The speed and scope of the business decision-making process is accelerating because of several emerging technology trends – Cloud, Social, Mobile, the Internet of Things (IoT), Analytics and Artificial Intelligence/Machine Learning (AI/ML). To obtain faster actionable insights from this growing volume and variety of data, many organizations are deploying Analytics solutions across the entire workflow.

For strategic reasons, IT leaders are focused on moving existing workloads to the cloud or building new workloads on the cloud and integrating those with existing workloads. Quite often, the need for data security and privacy makes some organizations hesitant about migrating to the public cloud. The business model for cloud services is evolving to enable more businesses to deploy a hybrid cloud, particularly in the areas of big data and analytics solutions.

IBM Cloud Pak for Data is an integrated data science, data engineering and app-building platform built on a hybrid cloud foundation that provides all the benefits of cloud computing inside the client’s firewall, along with a migratory path should the client want to leverage public clouds. IBM Cloud Pak for Data clients can get significant value from its unique capabilities to connect their data (no matter where it is), govern it, find it, and use it for analysis. It also enables users to collaborate from a single, unified interface, and IT staff do not need to deploy and connect multiple applications manually.

These IBM Cloud Pak for Data differentiators enable quicker deployments, faster time to value, lower risk of failure, and higher revenues/profits. They also enhance the productivity of data scientists, data engineers, application developers and analysts, allowing clients to optimize their Total Value of Ownership (TVO), defined as Total Benefits minus Total Costs.

The comprehensive TVO analysis presented in a recent Cabot Partners paper compares the IBM Cloud Pak for Data solution with a corresponding In-house solution alternative for three configurations – small, medium and large. This cost-benefit analysis framework considers cost/benefit drivers in a 2 by 2 continuum: Direct vs. Derived and Technology vs. Business mapped into four quantified quadrants: Costs, Productivity, Revenues/Profits and Risks.

Compared to using an In-house solution, IBM Cloud Pak for Data can improve the three-year ROI for all three configurations. Likewise, the Payback Period (PP) for the IBM Cloud Pak for Data solution is shorter than the In-house solution; providing clients faster time to value. In fact, these ROI/PP improvements grow with configuration size; offering clients better investment protection as they progress in their Analytics and AI/ML journey and as data volumes and Analytics model complexities continue to grow.
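The ROI and Payback Period arithmetic behind such comparisons works as follows. The figures below are hypothetical placeholders, not numbers from the Cabot Partners study:

```python
# Illustrative ROI and Payback Period arithmetic with made-up numbers
# (not figures from the Cabot Partners study).
def roi(total_benefits, total_costs):
    # Net benefit (TVO) relative to cost, expressed as a fraction
    return (total_benefits - total_costs) / total_costs

def payback_period_months(total_costs, monthly_benefit):
    # Months until cumulative benefits recoup the investment
    return total_costs / monthly_benefit

costs, benefits_3yr = 1_000_000, 2_500_000  # hypothetical 3-year figures
print(f"3-year ROI: {roi(benefits_3yr, costs):.0%}")
print(f"Payback: {payback_period_months(costs, benefits_3yr / 36):.1f} months")
```

A solution that raises benefits or lowers costs for the same configuration improves both metrics at once, which is why the ROI and Payback Period advantages tend to move together as configuration size grows.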

You can access the full report here.