Jack Vernon (Senior Research Analyst)

AI advances and innovations continue to make weekly news headlines, and almost everything we hear about concerns AI applications in software. Without hardware innovation, however, the current wave of AI excitement would not exist. What’s more, hardware innovation will continue to shape the art of the possible for AI for years to come.

NVIDIA is the best-known innovator in the AI compute hardware market; its graphics processing units (GPUs), originally designed for high-end media work and gaming, are today used to accelerate both training and inferencing of the deep learning systems that underpin the advances we see in applications like computer vision, natural language processing (NLP), and voice recognition.

AI Compute Hardware Market

NVIDIA was certainly quick to spot its AI opportunity — and it pivoted accordingly, building impressive go-to-market programs and specialized software tools so early adopters in industries like automotive, healthcare, and financial services could access and benefit from its GPUs quickly and flexibly.

But there’s more to specialized AI compute hardware than GPUs, and NVIDIA has a growing field of competitors.

The fundamental design point of GPUs — their ability to carry out huge quantities of numerical calculations on large datasets in parallel — suits the requirements of deep neural-network-based systems in particular. However, GPUs were never designed from the ground up for AI workloads, and it turns out that AI workloads themselves vary widely in their hardware requirements.
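To make the parallelism point concrete, here is a minimal, illustrative sketch (plain Python, with made-up sizes) of the multiply-accumulate pattern at the heart of a neural network's dense layer. Every output element is computed independently of the others, which is exactly the structure that GPUs exploit by running thousands of such computations simultaneously:

```python
# Illustrative only: the core computation of a dense neural-network layer
# is a matrix-vector product. Each output element below is independent of
# the others, which is the pattern GPUs accelerate in parallel.

def dense_layer(weights, inputs):
    """Compute outputs[i] = sum_j weights[i][j] * inputs[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Toy sizes; real models involve millions of such multiply-accumulates.
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
inputs = [2.0, 4.0]
print(dense_layer(weights, inputs))  # [2.0, 3.0, 8.0]
```

On a CPU this loop runs largely sequentially; on a GPU, each row's sum can be assigned to its own thread, which is why the same mathematical workload maps so naturally onto parallel hardware.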

For example, system training and inferencing bring different workload requirements. In addition, different use cases bring different requirements around the ideal locations for training and inferencing to occur: in an on-premises datacenter, on a public cloud platform, or on an edge device or gateway. Multiple real-world factors (including latency, enterprise concerns about data residency, and the need to keep datasets anonymous) influence which architecture choices make the most sense — and those choices in turn place constraints on compute hardware features (such as power consumption).
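A rough back-of-the-envelope sketch illustrates why training and inferencing stress hardware so differently. All the figures here are assumptions for illustration: the common rule of thumb that a training step costs roughly 3x a forward pass (forward plus backward), a hypothetical 10M-parameter model, and an invented dataset size:

```python
# Rough sketch: why training is far more compute-hungry than inferencing.
# Assumptions (illustrative, not measured): a dense forward pass costs
# about 2 FLOPs per parameter per sample; a training step costs about 3x
# the forward pass (forward + backward); training revisits the whole
# dataset over many epochs, while inference is a single forward pass.

def forward_flops(params, samples=1):
    return 2 * params * samples

def training_flops(params, dataset_size, epochs):
    return 3 * forward_flops(params, dataset_size) * epochs

params = 10_000_000  # hypothetical 10M-parameter model
inference = forward_flops(params)                          # one prediction
training = training_flops(params, dataset_size=100_000, epochs=10)

print(f"inference: {inference:.1e} FLOPs")
print(f"training:  {training:.1e} FLOPs")
```

Under these assumptions, a single prediction costs on the order of 10^7 floating-point operations while a full training run costs on the order of 10^13 — a gap of six orders of magnitude, which is why the two workloads can justify entirely different hardware.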

Beyond GPUs, there are three further compute hardware choices in the mix:

  • General-purpose CPUs. It’s easy to overlook their role, but CPUs are relevant in many areas of AI and machine learning and are still widely used for a range of tasks. Even in the field of deep learning, where general-purpose CPUs aren’t optimized, it’s still possible to run algorithms on CPU-based systems, and it may make sense to do so when training and inferencing speed aren’t a high priority.
  • FPGAs (field-programmable gate arrays). The fundamental computing capabilities of FPGAs can be configured and reconfigured after manufacturing, so customers can specify and arrange hardware functions (including logic functions as well as memory) in the field to fully adapt them to the needs of a specific workload. FPGA vendors such as Xilinx and Intel are positioning FPGAs as particularly well suited to AI inferencing, and Microsoft uses FPGAs to run AI inference workloads for systems like Bing and its Azure infrastructure. Other hyperscale cloud vendors are starting to offer customers FPGA instances for AI workloads.
  • ASICs (application-specific integrated circuits). These are designed with one workload in mind and aim to maximize efficiency for a single task or algorithm. ASICs make sense when there is a large-volume market for a very well-defined workload: audio processing is one example. Google’s Tensor Processing Unit (TPU), which it offers customers through its Google Cloud Platform, is arguably an example of an ASIC for deep learning; the company claims that it has avoided building 12 additional cloud datacenters because of its investment in TPU technology. Other companies, like UK start-up Graphcore, are bringing their own new approaches to market for AI. Graphcore has developed a new architecture it calls the Intelligence Processing Unit (IPU), which places memory on the same silicon as the processing cores and, unlike an ASIC, can address a wide range of AI-related workloads.

Compute Hardware for AI: Risks and Recommendations

All this innovation around compute hardware for AI workloads is great in terms of pushing the art of the possible, but it also presents risks for unwary adopters. Assessing hardware benchmarks is part of the answer to understanding what will work best for a given workload type and use case, but adopters will also need to explore:

  • The energy consumption of different hardware choices
  • The upfront costs of purchasing new systems, whether on premises or in the cloud
  • Workload-specific performance requirements (across training and inferencing)
  • The skills and tooling constraints around specific hardware choices, among other factors

If you want to learn more about this topic or have any questions, please contact Jack Vernon, or head over to https://uk.idc.com and drop your details in the form on the top right.