Meta announces AI training and inference chip project

Meta's new data centre design would be 31% cheaper and could be built twice as quickly as its current data centres

NEW YORK (Reuters) - Meta Platforms on Thursday shared new details on its data centre projects to better support artificial intelligence work, including a custom chip "family" being developed in-house.

The Facebook and Instagram owner said in a series of blog posts that it designed a first-generation chip in 2020 as part of the Meta Training and Inference Accelerator (MTIA) program. The aim was to improve efficiency for the recommendations models it uses to serve ads and other content in news feeds.

The company had previously not planned to deploy its first in-house AI chip widely and was already working on a successor. The blog posts portrayed the first MTIA chip as a learning opportunity.

The first MTIA chip was focused exclusively on an AI process called inference, in which algorithms trained on huge amounts of data make judgments about whether to show, say, a dance video or a cat meme as the next post in a user's feed, the posts said.

Joel Coburn, a software engineer at Meta, said during a presentation about the new chip that Meta had initially turned to graphics processing units, or GPUs, for inference tasks, but found they were not well suited to the work.

"Their efficiency is low for real models, despite significant software optimizations. This makes them challenging and expensive to deploy in practice," Coburn said. "This is why we need MTIA."

A Meta spokesperson declined to comment on deployment timelines for the new chip or elaborate on plans to develop chips that could train the models as well.

Meta has been engaged in a massive project to upgrade its AI infrastructure in the past year, after executives realized it lacked the hardware and software to support demand from product teams building AI-powered features.

As a result, the company scrapped plans for a large-scale rollout of an in-house inference chip and started work on a more ambitious chip capable of performing training and inference, Reuters reported.

Meta's blog posts acknowledged that its first MTIA chip stumbled with high-complexity AI models, but noted that it handled low- and medium-complexity models more efficiently than competitor chips.

The MTIA chip also used only 25 watts of power - a fraction of what market-leading chips from suppliers such as Nvidia Corp consume - and used an open-source chip architecture called RISC-V, Meta said.

Meta also provided an update on plans to redesign its data centres around more modern AI-oriented networking and cooling systems, saying it would break ground on its first such facility this year.

The new design would be 31% cheaper and could be built twice as quickly as the company's current data centres, an employee said in a video explaining the changes.

Meta said it has an AI-powered system to help its engineers create computer code, similar to tools offered by Microsoft Corp, Amazon.com Inc. and Alphabet Inc.