Accelerating AI for growth: the key role of infrastructure

This article is part of a VB special. Read the full series here: The CIO Agenda: The 2023 Roadmap for IT Leaders.



Enterprises everywhere are recognizing the central role of artificial intelligence (AI) in driving transformation and business growth. In 2023, many CIOs will shift from the “why” of AI to the “how.” More specifically: “What is the best way to scale AI into production quickly and economically, at a level that creates value and business growth?”

It’s a high-stakes balancing act: CIOs must enable rapid, broad development, deployment and maintenance of impactful AI workloads. At the same time, enterprise IT leaders need to manage spending more accurately, including costly “shadow AI,” so they can better focus and maximize strategic investments in the technology. That, in turn, can help fund continued, profitable AI innovation, creating a virtuous circle.

High-performance AI infrastructure — purpose-built platforms and clouds with optimized processors, accelerators, networking, storage and software — provides a powerful way for CIOs and their enterprises to balance these seemingly competing demands, enabling them to cost-effectively manage and accelerate the growth and “industrialization” of production AI.

In particular, standardizing on a public cloud-based, accelerated “AI-first” platform provides on-demand services that can be used to quickly build and deploy robust, high-performing AI applications. This end-to-end environment can help companies manage related spend, lower the barrier to entry for AI, reuse valuable IP and, crucially, keep valuable internal resources focused on data science and AI, not infrastructure.

Three key requirements for accelerating the growth of AI

A major benefit of focusing on AI infrastructure as a key enabler of AI and business growth is its ability to help enterprises meet three key requirements. We and others have observed these in our own pioneering work in the area and, more generally, in the development and adoption of technology over the past 20 years. These are: standardization, cost control and governance.

Let’s look at them briefly.

1. AI standardization

Enabling orderly, rapid, cost-effective development and implementation

Like big data, cloud, mobile and PCs before it, AI is a transformative game-changer – with even greater potential impact both inside and outside the organization. As with earlier innovations – virtualization, databases, SaaS and many others – smart enterprises will, after careful evaluation, want to standardize on accelerated AI platforms and cloud infrastructure. This brings a whole host of well-understood benefits to this latest set of universal tools. Major banks, for example, owe much of their vaunted ability to rapidly expand and scale to standardized, global platforms that enable fast development and deployment.

With AI, standardizing on optimized stacks, pre-integrated platforms, and cloud environments helps enterprises avoid the many drawbacks that often come with handling a chaotic variety of products and services. Chief among these: unmanaged procurement, sub-optimal development and model performance, duplication of efforts, inefficient workflows, pilots that cannot be easily replicated or scaled, more expensive and complex support, and lack of specialized staff. Perhaps most serious is the inordinate time and expense associated with selecting, building, integrating, tuning, deploying, and maintaining a complex stack of hardware, software, platforms, and infrastructures.

To be clear, enterprise standardization of AI platform and cloud doesn’t mean one-size-fits-all, exclusivity with one or two vendors, or a return to strictly centralized IT control.

Rather, modern AI cloud environments should provide tiered services optimized for a wide range of use cases. The “standardized” AI platform and infrastructure must be purpose-built for different AI workloads and must provide appropriate scalability, performance, software, networking, and other capabilities. A cloud marketplace, familiar to many business users, offers AI developers a variety of approved choices.

As for portability, containerization, Kubernetes, and other open, cloud-native approaches allow easy movement between providers and multiclouds, easing concerns about lock-in. And while enterprise standardization restores a CIO’s overall visibility and control, it can overlay existing procurement policies and procedures, including decentralized approaches – a win-win.

2. AI cost management

Focusing and releasing funds for continued innovation and value

According to various estimates, unauthorized spending, often by business units, adds 30-50% to technology budgets. While specific numbers for such “shadow AI” are hard to come by, surveys of enterprise IT priorities for 2023 suggest it’s a good bet that hidden investments in products and services will eat up a large portion of AI infrastructure costs. The good news is that centralized procurement and delivery of enterprise-standard AI services restores institutional control and discipline while still providing flexibility to consumers across the organization.

With AI, as with any workload, cost is a function of how much infrastructure you need to buy or rent. CIOs want to help groups developing AI avoid both overprovisioning (often expensive but underutilized on-premises infrastructure) and underprovisioning (which can slow model development and deployment and lead to unplanned capital purchases or cloud-service overruns).

To avoid these extremes, it makes sense to look at AI costs in a new way. Accelerated processing for training or inference on a powerful, optimized platform may (or may not) cost more per hour. Still, the work gets done faster, meaning you rent less infrastructure for less time, lowering the bill. And, importantly, the model can be deployed sooner, which can provide a competitive advantage. This accelerated time-to-value is analogous to the difference between driving from Chicago to Dallas (15 hours) and flying nonstop (5 hours). One may cost less (or, at current gas prices, more); the other gets you there much faster. Which is more “valuable”?
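The trade-off above can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch; the hourly rates, durations, and speedup below are illustrative assumptions, not actual cloud pricing or benchmark figures.

```python
# Hypothetical TCO comparison: accelerated vs. non-accelerated training.
# All rates and durations are illustrative assumptions, not vendor pricing.

def training_cost(hourly_rate: float, hours: float) -> float:
    """Total rental cost = instance rate x hours needed to finish the job."""
    return hourly_rate * hours

# Non-accelerated: cheaper per hour, but the job runs much longer.
baseline = training_cost(hourly_rate=10.0, hours=300)     # $3,000

# Accelerated: 4x the hourly rate, but the job finishes 6x faster.
accelerated = training_cost(hourly_rate=40.0, hours=50)   # $2,000

savings = 1 - accelerated / baseline
print(f"baseline=${baseline:,.0f}  accelerated=${accelerated:,.0f}  "
      f"savings={savings:.0%}")
```

Under these assumed numbers, the pricier-per-hour platform still cuts the total bill by about a third, and the model ships 250 hours sooner, which is the point of evaluating TCO rather than the hourly sticker price alone.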

In AI, assessing development costs from a total cost of ownership (TCO) perspective can help you avoid the common mistake of looking only at raw costs. As the travel analogy shows, arriving faster, with less wear and tear and fewer opportunities for detours, accidents, traffic jams or wrong turns, is the smarter choice for our road trip. So it is with fast, optimized AI processing.

Faster training times accelerate time to insight, maximizing the productivity of an organization’s data science teams and enabling faster deployment of the trained network. There is another important advantage: lower costs. Customers often experience a 40-60% cost reduction versus a non-accelerated approach.

Train an advanced large language model (LLM) on thousands of GPUs? Optimize an existing model on a handful of GPUs? Run real-time inference worldwide for inventory? As we noted above, understanding and budgeting AI workloads prior to deployment helps ensure that infrastructure is properly matched to the task and budget.

3. AI governance

Ensure accountability, measurability, transparency

The term AI governance has recently taken on a variety of meanings, from ethics to explainability. Here it refers to the ability to measure cost, value, auditability and regulatory compliance, especially around data and customer information. As AI expands, the ability of companies to easily and transparently ensure ongoing accountability will only become more important.

Again, a standardized AI cloud infrastructure can provide automations and metrics to support this critical requirement. In addition, security mechanisms built into multiple layers of purpose-built infrastructure services — from GPUs to networks, databases, developer kits and more, soon to include confidential computing — help provide deep, essential confidentiality for AI models and sensitive data.

A final reminder about roles and responsibilities: Rapidly achieving profitable, compliant AI growth and maximizing value and TCO using advanced, AI-first infrastructure cannot be a solo act for the CIO. As with other AI initiatives, it requires close collaboration with the chief data officer (or equivalent), data science lead, and, in some organizations, chief architect.

In short: focus on the how

Most CIOs today know the “why” of AI. It’s time to make the “how” a strategic priority.

Enterprises that master this critical skill – accelerating the development and deployment of AI – will be much better positioned to maximize the impact of their AI investments. That could mean speeding the innovation and development of new applications, enabling easier and broader adoption of AI across the enterprise, or shortening time-to-production value in general. Technology leaders who fail to do so risk AI that sprouts wildly in expensive patches, slowed development and adoption, and lost advantage to faster, better-managed competitors.

Where do you want to be at the end of 2023?

Visit the Make AI Your Reality hub for more AI insights.

#MakeAIYourReality #AzureHPCAI #NVIDIAonAzure

Nidhi Chappell is general manager of Azure HPC, AI, SAP and confidential computing at Microsoft.

Manuvir Das is VP of enterprise computing at Nvidia.


VB Lab Insights content is created in conjunction with a company that either pays for the post or has a business relationship with VentureBeat, and it is always clearly marked. For more information, please contact sales@businesskinda.com.