AI Architecture Scalability: The Compute Leap from Llama 3 to Llama 5

Category: Education Technology

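The conversation explores how well AI architectures scale, noting that the newer Transformer-based architectures have not yet hit a performance ceiling. Taking the Llama series as an example, the number of GPUs used for training grew from 10,000 to 20,000 for Llama 3 to more than 100,000 for Llama 4, which raises the question of where the limits of scaling lie.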

Duration: 79 seconds · Accent: American · Speed: 150 wpm

Subtitle transcript

In which field?

You're in so many.

I am particularly curious about the combination of AI and hardware,

but I realize that we've covered a lot.

So I'm curious the direction you'd take this on a question that occupies you right now.

Gosh, I mean,

I think maybe one that's a little more AI specific.

Is there a current set of methods that seem to be scaling very well?

So with past AI architectures,

you could kind of feed an AI system a certain amount

of data and use a certain amount of compute,

but eventually it hit a plateau.

And one of the interesting things

about these new transformer-based architectures over the last five

to 10 years is that we haven't found the end yet.

So that leads to this dynamic where Llama 3,

we could train on 10,000 to 20,000 GPUs.

Llama 4,

we could train on more than 100,000 GPUs.

Llama 5,

we can plan to scale even further.

And there's just an interesting question of how far that goes.

It's totally possible that at some point we just hit a limit.

And just like previous systems,

there's an asymptote and it doesn't keep on growing.