Meta tries to speed up AI inference with open-source AITemplate


Without inference, an artificial intelligence (AI) model is just math: it doesn’t execute or predict much, if anything.

To date, AI inference engines have largely been tied to the specific hardware for which they were designed. That degree of hardware lock-in means developers have to build separate software for each hardware target, and it could also slow the pace of innovation in the industry in general.

The challenge of managing inference hardware has not been lost on social media giant Meta (formerly Facebook). Meta uses many different kinds of hardware across its infrastructure and has faced its share of challenges when deploying inference solutions. To help solve that problem, Meta has been working on a technology it calls AITemplate (AIT), which it defines as a unified inference system that will initially support both Nvidia TensorCore and AMD MatrixCore inference hardware. Meta announced yesterday that it is open sourcing AITemplate under an Apache 2.0 license.

“Our current version of AIT focuses on supporting Nvidia and AMD GPUs, but the platform is scalable and could support Intel GPUs in the future if there was a demand for it,” Ajit Matthews, technical director at Meta, told VentureBeat. “Now that we have open sourced AIT, we welcome all silicon providers interested in contributing to it.”

The need for GPU and inference engine abstraction

The idea of AI hardware lock-in is not limited to just inference engines; it’s also a concern others in the industry, including Intel, have about GPUs for accelerated computing.

Intel is one of the leading proponents of the open-source SYCL specification, which seeks to create a unified programming layer for GPUs. The Meta-led AIT effort is similar in concept, though different in what it enables. Matthews explained that SYCL sits closer to the GPU programming level, while AITemplate focuses on high-performance TensorCore/MatrixCore AI primitives.

“AIT is an alternative to TensorRT, Nvidia’s inference engine,” Matthews said. “Unlike TensorRT, it is an open source solution that supports both Nvidia and AMD GPU backends.”

Matthews noted that AIT first characterizes the model architecture and then works to fuse and optimize layers and operations specific to that architecture.
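To make the idea of fusing operations concrete, here is a minimal toy sketch in Python. It is not AITemplate's API or its actual algorithm: the op names, the `FUSION_PATTERNS` table and the `fuse` function are all hypothetical, chosen only to illustrate how a sequence of separate operations might be collapsed into fewer combined kernels.

```python
# Toy illustration of operator fusion. All names here are hypothetical
# and are NOT taken from AITemplate.

# Assumed fusion rules: a run of op names -> the name of one fused kernel.
FUSION_PATTERNS = {
    ("gemm", "bias_add", "relu"): "gemm_bias_relu",
    ("gemm", "bias_add"): "gemm_bias",
}


def fuse(ops):
    """Greedily scan left to right, replacing the longest matching run
    of ops with a single fused op. Unmatched ops pass through as-is."""
    patterns = sorted(FUSION_PATTERNS, key=len, reverse=True)
    fused = []
    i = 0
    while i < len(ops):
        for pattern in patterns:
            window = tuple(ops[i:i + len(pattern)])
            if window == pattern:
                fused.append(FUSION_PATTERNS[pattern])
                i += len(pattern)
                break
        else:
            # No pattern starts at this position; keep the op unchanged.
            fused.append(ops[i])
            i += 1
    return fused


if __name__ == "__main__":
    model = ["gemm", "bias_add", "relu", "gemm", "bias_add", "softmax"]
    print(fuse(model))
```

In this sketch, six separate operations become three, which in a real engine would mean fewer kernel launches and fewer round trips to GPU memory. How AIT itself selects and generates fused kernels is not described in Matthews' comments.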

It’s not about competition

AIT is not just about creating a common software layer for inference; it is also about performance. In early tests, Meta is already seeing performance improvements over models running inference without AIT on both Nvidia and AMD GPUs.

“For AIT, the goal is to bring flexible, open, more energy-efficient AI inference to GPU users,” Matthews said.

Meta is building AIT not only to serve the greater good, but also to meet its own AI needs. Matthews said Meta’s workloads are evolving, and to keep up with them the company needs solutions that are open and performant. He also noted that Meta tends to want the top layers of its technology stacks to be hardware agnostic. AIT does that today with AMD and Nvidia GPUs.

“We see opportunities with many of our current and future inference workloads to take advantage of AIT,” he said. “We think AIT has the potential for widespread adoption as the highest-performing unified inference engine.”
