a16z - DeepSeek: AI's Sputnik Moment? Steven Sinofsky and Martin Casado Discuss
Published: 2025-02-06 21:52:21
The video transcript captures a discussion between tech experts about the recent release of the DeepSeek R1 AI model by a Chinese hedge fund/research organization. The conversation revolves around the surprising emergence of this model, its implications for the AI landscape, and the broader lessons learned from the internet's development.
The experts acknowledge that DeepSeek R1's release took the AI community by surprise. Key factors behind the buzz include performance comparable to existing models (such as GPT-4), a remarkably low estimated training cost of $6 million, and release under the permissive MIT license, which allows free use and modification. On top of that, DeepSeek published the model's reasoning traces (chain of thought), which allows smaller models to be trained more quickly and cheaply. Together, these mark a shift in the trajectory of the AI industry.
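To make concrete what publishing reasoning traces enables, here is a minimal sketch (my illustration, not anything described in the discussion) of fine-tuning a smaller "student" model on a teacher's chain-of-thought outputs. The JSONL schema, the model name, and the formatting are all assumptions.

```python
# Hypothetical sketch: fine-tuning a small "student" model on published reasoning
# traces. The JSONL schema ({"prompt", "reasoning", "answer"}), the model name, and
# the <think> formatting are illustrative assumptions, not DeepSeek's actual recipe.
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B"  # any small open base model would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def to_training_text(record: dict) -> str:
    # One supervised example: the prompt, the teacher's chain of thought, the answer.
    return (
        f"Question: {record['prompt']}\n"
        f"<think>{record['reasoning']}</think>\n"
        f"Answer: {record['answer']}{tokenizer.eos_token}"
    )

model.train()
with open("reasoning_traces.jsonl") as f:
    for line in f:
        batch = tokenizer(to_training_text(json.loads(line)),
                          return_tensors="pt", truncation=True, max_length=2048)
        # Standard causal-LM objective: the student learns to reproduce the trace.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```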
The discussion highlights that DeepSeek is not a sudden emergence but the culmination of sustained research by a highly skilled team in China, whose contributions have been available in the public literature, though not well aggregated. Even so, the panelists believe the strong reaction to DeepSeek's release is warranted and that the industry should respond appropriately.
One theory is that their advantage stems from the engineering constraints the team had to work under. It is also thought that the model had access to the vast Chinese-language internet in addition to the public internet, giving the team a data advantage.
The experts criticize the prevailing "hyperscaler view" in the Western AI community, which over-emphasizes compute power and data volume. They argue that this approach ignores the potential for innovation driven by engineering constraints, i.e., doing more with fewer resources. The Chinese team's achievement exemplifies what can be accomplished with fewer resources and greater ingenuity.
The discussion then turns to lessons from the internet's development, suggesting that the current AI trajectory mirrors the internet's evolution. Early on, the focus was on monetizing core technologies like HTML and HTTP, while the real value emerged higher in the stack, in areas like shopping, travel, and media. Similarly, in AI the emphasis today is on base models and the rush to monetize them, but the panelists expect a shift in which the real economic value accrues to the application layer and standardized applications are built on top.
The discussion then compares scaling up with scaling out. Scaling up means increasing the compute behind a single model, whereas scaling out means increasing the number of endpoints, giving users more control over what is happening. Scaling out would also reduce costs.
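As an illustration of the scale-out idea (my sketch, not something described in the episode), the following shows a trivial round-robin router that spreads inference requests across many small endpoints instead of funneling everything to one large model. The endpoint URLs and the request/response payload shape are assumptions.

```python
# Hypothetical sketch of "scaling out": many small inference endpoints behind a
# simple round-robin router, rather than one ever-larger centralized model.
# The endpoint URLs and the request/response payload shape are assumptions.
import itertools

import requests

ENDPOINTS = [
    "http://edge-node-1:8000/generate",
    "http://edge-node-2:8000/generate",
    "http://edge-node-3:8000/generate",
]
_next_endpoint = itertools.cycle(ENDPOINTS)

def generate(prompt: str, max_tokens: int = 256) -> str:
    """Send the request to whichever endpoint is next in the rotation."""
    url = next(_next_endpoint)
    resp = requests.post(url, json={"prompt": prompt, "max_tokens": max_tokens}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    print(generate("Why might scale-out lower serving costs?"))
```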
The conversation points out a significant paradigm shift in how computation is distributed. The speakers argue that the most expensive MIPS (millions of instructions per second) are currently those in nuclear-powered data centers, whereas the MIPS on individual smartphones are essentially free.
They warn against repeating the mistakes of AT&T, WorldCom, and AOL, and argue that the industry is already heading down the same path; companies should learn from where those firms went wrong. Instead, the focus must be on empowering innovation at the edge, enabling creativity, and democratizing access to AI capabilities.
The speakers highlight the importance of specialized models and customized application development. Just as JavaScript revolutionized web development by enabling dynamic functionality within browsers, the ability to fine-tune and deploy specialized AI models at scale will unlock new applications.
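One common way this plays out in practice is parameter-efficient fine-tuning, which lets many specialized variants share a single base model. The sketch below is my illustration (not from the transcript), using LoRA via the Hugging Face peft library; the model name, target modules, and hyperparameters are assumptions.

```python
# Hypothetical sketch: one LoRA adapter per specialized task on top of a shared
# base model, using the Hugging Face peft library. The model name, target modules,
# and hyperparameters are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")  # assumed base model

lora = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections, a common choice
    task_type="CAUSAL_LM",
)
specialized = get_peft_model(base, lora)
specialized.print_trainable_parameters()   # only a small fraction of weights train

# Each trained adapter is small relative to the base model, so many specialized
# variants can be stored and swapped on top of a single shared base.
```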
The panel then pivots to the regulatory implications. The speakers emphasize the need for the US to invest in research and development to compete in the AI race.
The speakers argue that export controls and other restrictive policies will do very little to actually prevent China from obtaining these kinds of chips. To maintain a competitive edge, the United States must invest more in AI research and stay ahead, so that restrictions become unnecessary.