I'm thrilled to introduce our next and final speaker, Andrej Karpathy. Andrej probably needs no introduction. Most of us have probably watched his YouTube videos at length. He's renowned for his research in deep learning. He designed the first deep learning class at Stanford, was part of the founding team at OpenAI, led the computer vision team at Tesla, and is now a mystery man again now that he has just left OpenAI. So we're very lucky to have you here. And, Andrej, you've been such a dream speaker, and we're so excited to have you and Stephanie close out the day. Thank you.
Andrej's first reaction as we walked up here was, oh my God, to his picture. It's a very intimidating photo. I don't know when it was taken, but he's impressed. Okay, amazing. Andrej, thank you so much for joining us today, and welcome back. Yeah, thank you. Fun fact that most people don't actually know: how many folks here know where OpenAI's original office was? That's amazing. Nick. I'm going to guess right here. Right here, on the opposite side of our San Francisco office, where actually many of you were just in huddles. So this is fun for us, because it brings us back to our roots, back when I first started at Sequoia and when Andrej first co-founded OpenAI. Andrej, in addition to living out the Willy Wonka dream of working atop a chocolate factory, what were some of your favorite moments working from here? Yes, OpenAI was right there.
And this was the first office after, I guess, Greg's apartment, which maybe doesn't count. And so, yeah, we spent maybe two years here, and the chocolate factory was just downstairs, so it always smelled really nice. The team was 10, 20 plus. And we had a few very fun episodes here. One of them was alluded to by Jensen at GTC, which happened just yesterday or two days ago. Jensen was describing how he brought the first DGX and how he delivered it to OpenAI, and that happened right there. That's where we all signed it. It's in the room over there.
So Andrej needs no introduction, but I wanted to give a little bit of backstory on some of his journey to date. As was mentioned in the introduction, he was trained by Geoff Hinton and then Fei-Fei. His first claim to fame was his deep learning course at Stanford. He co-founded OpenAI back in 2015, and in 2017 he was poached by Elon. I remember this very, very clearly. For folks who don't remember the context then, Elon had just transitioned through six different autopilot leaders, each of whom lasted six months. And I remember when Andrej took this job, I thought, congratulations and good luck.
Not too long after that, he went back to OpenAI and has been there for the last year. Now, unlike all the rest of us today, he is basking in the ultimate glory of freedom, in both time and responsibility. And so we're really excited to see what you have to share today. A few things that I appreciate the most about Andrej are that he is an incredibly fascinating futurist thinker, he is a relentless optimist, and he's a very practical builder. And so I think he'll share some of his insights around that today.
To kick things off: AGI, even seven years ago, seemed like an incredibly impossible task to achieve even in the span of our lifetimes. Now it seems within sight. What is your view of the future over the next ten years? Yes, I think you're right. A few years ago I sort of felt like, with AGI, it wasn't clear how it was going to happen. It was very academic, and you would think about different approaches. And now I think it's very clear, and there's a lot of space, and everyone is trying to fill it. So there's a lot of optimization. Roughly speaking, I think the way things are happening is that everyone is trying to build what I refer to as this LLM OS. Basically, I like to think of it as an operating system. You have to get a bunch of peripherals that you plug into this new CPU, or something like that. The peripherals are, of course, text, images, audio and all the modalities. And then you have a CPU, which is the LLM transformer itself. And it's also connected to all the Software 1.0 infrastructure that we've already built up for ourselves. So I think everyone is trying to build something like that, and then make it available as something that's customizable to all the different nooks and crannies of the economy. That's roughly what everyone is trying to build out, and what we also heard about earlier today.
So I think that's roughly where it's headed: we can bring up and down these relatively self-contained agents that we can give high-level tasks to and specialize in various ways. So yeah, I think it's going to be very interesting and exciting. And it's not just one agent, it's many agents, and what does that look like? And if that view of the future is true, how should we all be living our lives differently? I don't know. I guess we have to try to build it, influence it, make sure it's good, and just try to make sure it turns out well. So now that you're a free independent agent, I want to address the elephant in the room, which is that OpenAI is dominating the ecosystem. And most of our audience here today are founders who are trying to carve out a little niche, praying that OpenAI doesn't take them out overnight. Where do you think opportunities exist for other players to build new independent companies, versus what areas do you think OpenAI will continue to dominate even as its ambition grows? Yeah, so my high-level impression is that OpenAI is trying to build out this LLM OS, and I think, as we heard earlier today, it's trying to develop this platform on top of which you can position different companies in different verticals. Now, I think the OS analogy is also really interesting, because when you look at something like Windows, these are also operating systems, and they come with a few default apps. Like, a browser comes with Windows; you can use the Edge browser.
And so I think, in the same way, OpenAI or any of the other companies might come up with a few default apps, quote unquote, but it doesn't mean that you can't have different browsers running on it, just like you can have different chat agents running on that infrastructure. So there will be a few default apps, but there will also potentially be a vibrant ecosystem of all kinds of apps that are fine-tuned to all the different nooks and crannies of the economy. And I really like the analogy of the early iPhone apps and what they looked like; they were all kind of like jokes. It took time for that to develop, and I absolutely agree that we're going through the same thing right now. People are trying to figure out: what is this thing good at? What is it not good at? How do I work with it? How do I program with it? How do I debug it?
How do I actually get it to perform real tasks, and what kind of oversight does it need? Because it's quite autonomous, but not fully autonomous. So what does the oversight look like? What does the evaluation look like? There are many things to think through, just to understand the psychology of it. And I think that's what's going to take some time to figure out: exactly how to work with this infrastructure. So I think we'll see that over the next few years. So the race is on right now with LLMs: OpenAI, Anthropic, Mistral, Llama, Gemini, the whole ecosystem of open source models, and now a whole long tail of small models. How do you foresee the future of the ecosystem playing out? Yeah, so again, I think the operating system analogy is interesting, because we basically have an oligopoly of a few proprietary systems, like, say, Windows, Mac OS, etc. And then we also have Linux, and Linux has an infinity of distributions. And so I think maybe it's going to look something like that. I also think we have to be careful with the naming, because a lot of the ones that you listed, like Llama and Mistral, I wouldn't actually say are open source. It's kind of like tossing over a binary for an operating system. You can kind of work with it, and it's useful, but it's not fully useful, right?
And there are a number of what I would say are fully open source LLMs. There's, you know, the Pythia models, LLM360, OLMo, etc. They're fully releasing the entire infrastructure that's required to compile the operating system, right? To train the model, from gathering the data onwards. When you're just given a binary, it's still much better than nothing, of course, because you can fine-tune the model, which is useful. But it's subtle: you can't fully fine-tune the model, because the more you fine-tune it, the more it's going to start regressing on everything else. So if you want to add a capability without regressing the other capabilities, what you actually want to do is train on some kind of mixture of the previous dataset distribution and the new dataset distribution. You don't want to regress the old distribution; you just want to add knowledge.
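To make that concrete, here is a minimal sketch of the kind of data mixture being described; the function names and the 70/30 ratio are illustrative assumptions, not anything from the talk:

```python
import random

def mixed_batches(old_data, new_data, old_frac=0.7,
                  batch_size=4, num_batches=100, seed=0):
    """Yield fine-tuning batches that mix the old (pre-training)
    distribution with the new capability data, so the model keeps
    seeing the old distribution and does not regress on it."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            # Sample each example from the old distribution with
            # probability old_frac, otherwise from the new data.
            source = old_data if rng.random() < old_frac else new_data
            batch.append(rng.choice(source))
        yield batch

# Toy corpora standing in for the two dataset distributions.
old_corpus = [f"old_doc_{i}" for i in range(10)]
new_corpus = [f"new_doc_{i}" for i in range(10)]

counts = {"old": 0, "new": 0}
for batch in mixed_batches(old_corpus, new_corpus):
    for example in batch:
        counts["old" if example.startswith("old") else "new"] += 1
```

The point of the sketch is just the sampling step: without the training loop and access to the original dataset, you cannot construct a mixture like this from the released weights alone.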
And if you're just given the weights, you can't do that. You need the training loop, you need the dataset, etc. So you are actually constrained in how you can work with these models. Again, I think it's definitely helpful, but we need slightly better language for it, almost. There are open weights models, open source models, and then proprietary models, I guess, and that might be the ecosystem. And yeah, probably it's going to look very similar to the ones that we have today. And hopefully you'll continue to help build some of that out.
So I'd love to address the other elephant in the room, which is scale. Simplistically, it seems like scale is all that matters, scale of data, scale of compute, and therefore the large research labs, large tech giants have an immense advantage today. What is your view of that? And is that all that matters? And if not, what else does? So I would say scale is definitely number one. I do think there are details there to get right. And I think a lot also goes into the data set preparation and so on, making it very good and clean, etc. That matters a lot. These are all sort of like compute efficiency gains that you can get.
So there's the data, the algorithms, and then of course the training of the model and making it really large. So I think scale will be the primary determining factor, the first principal component of things, for sure. But there are many other things that you need to get right. It's almost like the scale sets some kind of a speed limit, but you do need the other things too. If you don't have the scale, then you fundamentally just can't train some of these massive models, if you are going to be training models. If you're just going to be doing fine-tuning and so on, then maybe less scale is necessary, but we haven't really seen that fully play out just yet.
And can you share more about some of the ingredients that you think also matter, maybe lower in priority, behind scale? Yeah, so the first thing is, you can't just train these models. If you just give someone money and scale, it's actually still really hard to build these models. And part of it is that the infrastructure is still so new, still being developed, and not quite there. Training these models at scale is extremely difficult, and it's a very complicated distributed optimization problem. The talent for this is fairly scarce right now. It basically turns into this insane thing running on tens of thousands of GPUs, all of them failing at random at different points in time. So instrumenting that and getting it to work is actually an extremely difficult challenge. GPUs were not intended for 10,000-GPU workloads until very recently.
And so I think a lot of the infrastructure is sort of creaking under that pressure, and we need to work through that. But right now, if you just give someone a ton of money or a ton of scale or GPUs, it's not obvious to me that they can just produce one of these models, which is why it's not just about scale. You actually need a ton of expertise on the infrastructure side, the algorithm side, and then the data side, and you need to be careful with that. So I think those are the major components. The ecosystem is moving so quickly. Even some of the challenges we thought existed a year ago are being solved more and more today: hallucinations, context windows, multimodal capabilities, inference getting better, faster, cheaper.
What are the LLM research challenges today that keep you up at night? What do you think are meaty enough problems, but also solvable problems, that we can continue to go after? So I'll say, on the algorithm side, one thing I'm thinking about quite a bit is this distinct split between diffusion models and autoregressive models. They're both ways of representing probability distributions, and it just turns out that different modalities are apparently a good fit for one of the two. I think there's probably some space to unify them, or to connect them in some way, and get some best of both worlds, or figure out how we can get a hybrid architecture and so on.
It's just odd to me that we have two separate points in the space of models, and they're both extremely good; it feels wrong to me that there's nothing in between. So I think we'll see that carved out, and I think there are interesting problems there. And then the other thing that maybe I would point to is that there's still a massive gap in the energetic efficiency of running all this stuff. My brain is 20 watts, roughly. Jensen was just talking at GTC about the massive supercomputers that they're building now; the numbers are in megawatts, right? So maybe you don't need all that to run a brain. I don't know how much you need exactly, but I think it's safe to say we're probably off by a factor of a thousand to a million somewhere, in terms of the efficiency of running these models.
And I think part of it is just because the computers we've designed are not a good fit for this workload, and NVIDIA GPUs are a good step in that direction. You need extremely high parallelism. We don't actually care about sequential computation that is data-dependent in some way; we just need to blast the same algorithm across many different array elements, you can think about it that way. So I would say number one is just adapting the computer architecture to the new data workflows.
Number two is pushing on a few things that we're currently seeing improvements on. The first may be precision. We're seeing precision come down from what originally was 64-bit doubles; we're now down to, I don't know, four, five, six bits, or even 1.58, depending on which papers you read. So I think precision is one big lever for getting a handle on this. The second one, of course, is sparsity. That's another big delta: your brain is not always fully activated, and so sparsity, I think, is another big lever. But the last lever is the von Neumann architecture of computers and how they're built, where you're shuttling data in and out and doing a ton of data movement between memory and the cores doing all the compute.
This is all kind of broken as well; it's not how your brain works, and that's why the brain is so efficient. So I think it should be a very exciting time in computer architecture. I'm not a computer architect, but it seems like we're off by a factor of a thousand to a million, something like that, and there should be really exciting innovations there that bring that down. I think there are at least a few builders in the audience working on this problem. Okay, switching gears a little bit. You've worked alongside many of the greats of our generation: Sam, Greg, and the rest of the OpenAI team, and Elon Musk. Who here knows the joke about the rowing team, the American team versus the Japanese team? Okay, great. So this will be a good one. Elon shared this at our last base camp, and I think it reflects a lot of his philosophy around how he builds cultures and teams.
So you have two teams. The Japanese team has four rowers and one steerer, and the American team has four steerers and one rower. And can anyone guess, when the American team loses, what do they do? Shout it out. Exactly. They fire the rower. Elon shared this example, I think, as a reflection of how he thinks about hiring the right people and building the right teams at the right ratio. From working so closely with incredible leaders like these, what have you learned? Yeah, so I would say definitely Elon runs his companies in an extremely unique style. I don't actually think that people appreciate how unique it is. Even when you read about it, you don't understand it, I think. It's even hard to describe; I don't even know where to start. But it's a very unique, different thing. I like to say that he runs the biggest startups, and I don't really know how to describe it. It almost feels like a longer sort of thing that I have to think through.
Well, number one is, he likes very small, strong, highly technical teams. At companies, by default, teams grow and get large. Elon was always a force against growth. I would have to work and expend effort to hire people; I would basically have to plead to hire people. And the other thing is that at big companies, it's usually really hard to get rid of low performers. Elon is very friendly, by default, to getting rid of low performers. So I actually had to fight for people to keep them on the team, because he would by default want to remove people. So that's one thing: keep a small, strong, highly technical team, with no middle management that is non-technical, for sure. So that's number one.
Number two is the vibes of how everything runs, and how it feels when he walks into the office. He wants it to be a vibrant place. People are walking around, they're pacing around, they're working on exciting stuff, they're chatting about something, they're coding. He doesn't like stagnation; he doesn't like it to look that way. He doesn't like large meetings. He always encourages people to leave meetings if they're not being useful. You actually do see this: in a large meeting, if you're not contributing and you're not learning, just walk out. This is fully encouraged, and I think this is something that you don't normally see. So vibes is a second big lever that he really instills culturally. Part of that also is that a lot of big companies pamper employees; there's much less of that. The culture is that you're there to do your best technical work, and there's intensity and so on. And maybe the last one that is very unique, very interesting, and very strange is just how connected he is to the team. Usually the CEO of a company is a remote person, five layers up, who talks to their VPs, who talk to their reports and directors, and eventually you talk to your manager. That's not how Elon's companies run. He will come to the office, he will talk to the engineers. Many of the meetings we had were 50 people in the room with Elon, and he talks directly to the engineers. He doesn't want to talk just to the VPs and the directors. Normally people would spend maybe 99% of the time talking to the VPs; he spends maybe 50% of the time, and he just wants to talk to the engineers. If the team is small and strong, then the engineers and the code are the source of truth.
And so they have the source of truth, not some manager and he wants to talk to them to understand the actual state of things and what should be done to improve it. So I would say like the degree to which he's connected with the team and not something remote is also unique.
And also just his large hammer, and his willingness to exercise it within the organization. So maybe when he talks to the engineers, they bring up: what's blocking you? I just don't have enough GPUs to run my thing. And he's like, oh, okay. And if he hears that twice, he's going to be like, okay, this is a problem. So what is our timeline? And when you don't have satisfying answers, he's like, okay, I want to talk to the person in charge of the GPU cluster. And someone dials the phone, and he's just like, okay, double the cluster right now. Let's have a meeting tomorrow. From now on, send me daily updates until the cluster is twice the size. And then they kind of push back, and they're like, okay, well, we have this procurement set up, we have this timeline, and NVIDIA says that we don't have enough GPUs and it will take six months or something. And then you get a raise of an eyebrow, and he's like, okay, I want to talk to Jensen. And then he just removes bottlenecks. So I think the extent to which he's extremely involved, removes bottlenecks, and applies his hammer is also not appreciated. So there are a lot of these kinds of aspects that are very unique, I would say, and very interesting. And honestly, going to a normal company outside of that, you definitely miss aspects of that. So yeah, maybe that's a long rant, and I don't think I hit all the points, but it is a very unique thing, and it's very interesting. And yeah, I guess that's my rant. Hopefully there are tactics here that most people can employ.
Taking a step back, you've helped build some of the most generational companies. You've also been such a key enabler for many people, many of whom are in the audience today of getting into the field of AI. Knowing you, what you care most about is democratizing access to AI. Education, tools, helping create more equality in the whole ecosystem. At large, there are many more winners. As you think about the next chapter in your life, what gives you the most meaning?
Yeah, I think you've described it in the right way. Where my brain goes by default is: I've worked for a few companies, but ultimately I care not about any one specific company. I care a lot more about the ecosystem. I want the ecosystem to be healthy. I want it to be thriving. I want it to be like a coral reef of a lot of cool, exciting startups in all the nooks and crannies of the economy. And I want the whole thing to be like this boiling soup of cool stuff. And only Andrej dreams about coral reefs. You know, it's going to be a cool place. And that's why I love startups and I love companies, and I want there to be a vibrant ecosystem of them. By default, I would say I'm a bit more hesitant about, you know, five megacorps taking over, especially with AGI being such a magnifier of power. I would be kind of worried about what that could look like and so on. So I have to think that through more. But yeah, I love the ecosystem and I want it to be healthy and vibrant.
Amazing. We'd love to have some questions from the audience. Yes, Brian. Hi. How much would you recommend founders follow Elon's management methods, or is it kind of unique to him and you shouldn't try to copy him? Yeah, I think that's a good question. I think it's up to the DNA of the founder. You have to have that same kind of DNA and that same kind of vibe. And I think when you're hiring the team, it's really important that you make it clear upfront that this is the kind of company that you have. When people sign up for it, they're very happy to go along with it, actually. But if you change it later, people are unhappy with that, and that's very messy.
So as long as you do it from the start and you're consistent, I think you can run a company like that. But, you know, it has its own pros and cons as well, so it's up to people, but I think it's a consistent model of company building and running. Yes, Alex. Hi, I'm curious if there are any types of model composability that you're really excited about, maybe other than mixture of experts. I'm not sure what you think about model merges, frankenmerges, or any other things to make model development more composable.
Yeah, that's a good question. I see papers in this area, but I don't know that anything has really stuck. Maybe the composability, I don't exactly know what you mean, but there's a ton of work on things like parameter-efficient training. I don't know if you would put that in the category of composability in the way I understand it. But it is the case that traditional code is very composable, and I would say neural nets are a lot more fully connected and less composable by default. But they do compose, and can be fine-tuned as part of a whole. As an example, if you're building a system where you want, say, ChatGPT plus images or something like that, it's very common that you pre-train components, and then you plug them in and fine-tune, maybe through the whole thing. So there's some composability in those aspects: you can pre-train small pieces of the cortex outside and compose them later, through initialization and fine-tuning. So to some extent, those are my scattered thoughts on it, but I don't know if I have anything very coherent otherwise.
Yes, Nick. So we've got these next word prediction things. Do you think there's a path towards building a physicist or a von Neumann type model that has a mental model of physics that's self-consistent and can generate new ideas for how you actually do fusion? How do you get faster than light travel if it's even possible? Is there any path towards that or is it like a fundamentally different vector in terms of these AI model developments?
I think it's fundamentally different in one aspect. I guess what you're talking about is maybe a capability question, because the current models are just not good enough. And I think there are big rocks to be turned over here. People still haven't really seen what's possible in this space at all. Roughly speaking, I think we've only done step one of AlphaGo: the imitation learning part. There's step two of AlphaGo, which is the RL, and people haven't done that yet. And that's the part that actually made it work and made something superhuman. So I think there are big rocks in capability still to be turned over here, and the details of that are potentially tricky. But long story short, we haven't done step two of AlphaGo. We've just done imitation.
And I don't think people appreciate, for example, number one, how terrible the data collection is for things like ChatGPT. Say you have a problem: some prompt is some kind of mathematical problem. A human comes in and gives the ideal solution to that problem. The problem is that human psychology is different from model psychology. What's easy or hard for the human is different from what's easy or hard for the model. And so the human writes out some kind of trace that comes to the solution, but some parts of that are trivial to the model, and some parts of that are a massive leap that the model doesn't understand. And so you're kind of just losing it, and then everything else is polluted by that later. Fundamentally, what you need is for the model to practice itself how to solve these problems.
It needs to figure out what works for it or does not work for it. Maybe it's not very good at four-digit addition, so it's going to fall back and use a calculator. But it needs to learn that for itself, based on its own capability and its own knowledge. So that's number one: that's totally broken, I think. It's a good initializer, though, for something agent-like. And the other thing is, we're doing reinforcement learning from human feedback, but that's a super weak form of reinforcement learning. It doesn't even count as reinforcement learning, I think.
Like, what is the equivalent of RLHF in AlphaGo? What is the reward model? What I call it is a vibe check. Imagine if you wanted to train an AlphaGo with RLHF: you would be giving two people two boards and asking which one they prefer. Then you would take those labels, train a reward model, and then RL against that. Well, what are the issues with that? Number one, that's just vibes of the board; that's what you're training against. Number two, if the reward model is a neural net, then it's very easy for the model you're optimizing to overfit to that reward model, and it's going to find all these spurious ways of hacking that reward model. That's the problem. AlphaGo gets around these problems because there's a very clear objective function you can RL against. So RLHF is nowhere near RL, I would say; it's silly.
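A toy illustration of that reward-hacking failure mode, with everything contrived for the example (the "reward model" here is just a hand-written function with a deliberate flaw, not a trained network):

```python
def true_quality(text):
    """Stand-in for what human labelers actually value: here,
    just a crude proxy for a reasonably detailed answer."""
    return min(len(text.split()), 20)

def learned_reward(text):
    """Toy 'reward model': it agrees with true_quality on normal
    text, but has an exploitable quirk the labels never covered,
    because it also rewards exclamation marks."""
    return min(len(text.split()), 20) + 5 * text.count("!")

def optimize(reward, candidates):
    """Stand-in for RL against the reward model: pick whatever
    scores highest under the learned reward."""
    return max(candidates, key=reward)

normal = ["a short reply", "a longer and more detailed reply over here"]
degenerate = "wow " + "!" * 50  # spurious high-reward input

best = optimize(learned_reward, normal + [degenerate])
```

The optimizer ends up picking the degenerate string, because it scores highest under the flawed learned reward even though its true quality is lower than the normal replies; a fixed, well-specified objective like a game's win condition has no such gap to exploit.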
And the other thing is, imitation learning is super silly too. RLHF is a nice improvement, but it's still silly. I think people need to look for better ways of training these models, so that the model is in the loop with itself and its own psychology, and I think there will probably be unlocks in that direction. So it's sort of like graduate school for AI models. It needs to sit in a room with a book and quietly question itself for a decade.
Yeah. I think that would be part of it, yes. And when you are learning stuff and you're going through a textbook, there are exercises in the textbook. What are those? They are prompts to you to exercise the material. And when you're learning material, you're not just reading left to right. Number one, you're exercising, but maybe you're also taking notes, rephrasing, reframing. You're doing a lot of manipulation of this knowledge in the process of learning that knowledge. And we haven't seen equivalents of that at all in LLMs. So it's super early days, I think. Yeah. It's cool to be optimistic and practical at the same time.
So I would be asking: how would you align the priority between, A, doing cost reduction and revenue generation, or, B, finding better quality models with better reasoning capabilities? So, maybe I understand the question. What I see a lot of people do is they start out with the most capable model, no matter what the cost is. So you use GPT-4, you super-prompt it, you do RAG, et cetera.
So you're just trying to get your thing to work. You're going after accuracy first, and then you make concessions later: you check if you can fall back to 3.5 for certain queries, you check if you can make it cheaper later. So I would say go after performance first, and then make it cheaper later. That's the paradigm that I've seen work for the few people I've talked to about this. And maybe it's not even just a single prompt; think about all the ways in which you can even just make it work at all.
Because if you can just make it work at all, say you make 10 prompts or 20 prompts and you pick the best one, you have some debate, or whatever kind of crazy flow you come up with, just get your thing to work really well. Because if you have a thing that works really well, then one other thing you can do is distill it. You can get a large distribution of possible problem types, run your super expensive thing on it to get your labels, and then fine-tune a smaller, cheaper model on those labels. So I would say: always get it to work as well as possible, no matter what, first, and then make it cheaper later. That's what I would suggest.
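The distillation recipe he outlines (sample a problem distribution, label it with the expensive pipeline, fit a cheap model on those labels) can be sketched in miniature. The "expensive model" here is a stand-in function, and the cheap student is a polynomial fit; both are illustrative assumptions, not anything specific from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the expensive pipeline (big model + prompt
# ensembles + debate); here it is just an arbitrary fixed function.
def expensive_model(x):
    return np.sin(x) + 0.1 * x

# Step 1: sample a large distribution of problem types.
inputs = rng.uniform(-3, 3, size=500)

# Step 2: run the expensive pipeline once to produce labels.
labels = expensive_model(inputs)

# Step 3: distill -- fit a small, cheap student (a degree-5 polynomial)
# on the expensive pipeline's outputs.
coeffs = np.polyfit(inputs, labels, deg=5)
cheap_model = np.poly1d(coeffs)

# The cheap student now approximates the expensive pipeline on-distribution,
# so inference no longer needs the expensive flow at all.
test_x = rng.uniform(-3, 3, size=100)
err = np.max(np.abs(cheap_model(test_x) - expensive_model(test_x)))
print(f"max on-distribution error: {err:.3f}")
```

The key caveat, as with any distillation, is "on-distribution": the student is only trustworthy on the problem distribution you sampled in step 1.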
Hi, Sam. Hi. One question. This past year we saw a lot of impressive results from the open source ecosystem. I'm curious what your opinion is of how that will, or won't, keep pace with closed source development as the models continue to improve in scale.
Yeah, I think that's a very good question. I don't really know. Fundamentally, these models are so capital intensive. One thing that is really interesting is, for example, you have Facebook and Meta and so on, who can afford to train these models at scale, but it's also not the thing that they do: their money printer is unrelated to that. So they have an actual incentive to potentially release some of these models, so that they empower the ecosystem as a whole and can actually borrow all the best ideas.
So that to me makes sense. But so far, I would say, they've only done the open weights model. I think they should actually go further, and that's what I would hope to see. I think it would be better for everyone. Potentially, maybe they're squeamish about some of the aspects of it, with respect to data and so on. I don't know how to overcome that. Maybe they should try to find data sources that they think are very easy to use, or something like that, and constrain themselves to those. So I would say those are kind of our champions, potentially. And I would like to see more transparency also coming from them. I think Meta and Facebook are doing pretty well: they released papers, they published a logbook, and so on. They're doing well, but they could do much better in terms of fostering the ecosystem. And maybe that's coming; we'll see. Peter.
Yeah, maybe this is an obvious answer given the previous question, but what do you think would make the AI ecosystem cooler and more vibrant, or what's holding it back? Is it openness, or is there other stuff that's also a big thing you'd want to work on? Yeah, I certainly think one big aspect is just the stuff that's available. I had a tweet recently: number one, build the thing; number two, build the ramp. I would say there are a lot of people building the thing, and a lot less happening in building the ramps, so that people can actually understand all this stuff. And you know, we're all new to all of this. We're all trying to understand how it works. We all need to ramp up and collaborate to some extent to even figure out how to use this effectively. So I would love for people to be a lot more open with respect to what they've learned, how they trained all this, what works, what doesn't work for them, et cetera, so that we can all learn a lot more from each other. That's number one. And number two, I also think there is quite a bit of momentum in the open ecosystems as well, so that's already good to see. And maybe there are some opportunities for improvement, which I talked about already. So, yeah.
Last question from the audience: Michael. To get to the next big performance leap from models, do you think it's sufficient to modify the transformer architecture with, say, thought tokens or activation beacons, or do we need to throw that out entirely and come up with a new fundamental building block to take us to the next big step forward, or AGI? Yeah, I think that's a good question. Well, the first thing I would say is, the transformer is amazing. It's just so incredible. I don't think I would have seen that coming, for sure. For a while before the transformer arrived, I thought there would be an insane diversification of neural networks, and that was not the case. It's the complete opposite, actually: it's all the same model. So it's incredible to me that we have that.
I don't know that it's the final neural network. I think there will definitely be changes. Given the history of the field, and I've been in it for a while, it's really hard to say that this is the end of it. Absolutely it's not. And I feel very optimistic that someone will be able to find a pretty big change to how we do things today. On the front of autoregression versus diffusion, which is the modeling and loss setup, I would say there's definitely some fruit there, probably. But also on the transformer itself: I mentioned these levers of precision and sparsity, and as we drive those, together with the co-design of the hardware and how that might evolve, network architectures could become a lot more tuned to those constraints and how all that works. To some extent, the transformer is kind of designed for the GPU, by the way. The big leap in the transformer paper, and where they were coming from, was: we want an architecture that is fundamentally extremely parallelizable, because the recurrent neural network has sequential dependencies, which is terrible for the GPU. The transformer basically broke that through attention, and that was the major insight there. It has some predecessor insights, like the neural GPU and other papers at Google that were thinking about this, but it is a way of targeting the algorithm to the hardware you have available. So I would say future changes will be in that same spirit. But long story short, I think it's very likely we'll see changes to it still, but it's been remarkably resilient, I have to say. It came out many years ago now, and the original transformer and what we're using today are not super different. Yeah.
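The parallelizability point can be seen directly in a minimal numpy sketch (toy sizes and random weights, purely illustrative): the RNN's hidden state forces a sequential loop over time steps, while self-attention computes every position in a single batch of matrix multiplies.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 8, 4                        # sequence length, model width
x = rng.normal(size=(T, d))

# RNN: each step depends on the previous hidden state -- an inherently
# sequential chain over t, which is what makes it a poor fit for GPUs.
Wh = rng.normal(size=(d, d)) * 0.1
Wx = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(T):                 # cannot be parallelized across time steps
    h = np.tanh(h @ Wh + x[t] @ Wx)

# Self-attention: every position attends to every other in one batch of
# matmuls -- there is no dependency between time steps.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)                        # (T, T) all pairs at once
scores -= scores.max(axis=1, keepdims=True)          # numerically stable softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ V                                    # (T, d), computed for all t
```

The contrast is the whole story: the loop's iteration count grows with sequence length, while the attention path is three matmuls and a softmax regardless of T, which maps cleanly onto GPU hardware.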
As a parting message to all the founders and builders in the audience, what advice would you give them as they dedicate the rest of their lives to helping shape the future of AI? So, yeah, I don't usually have crazy generic advice. I think maybe the thing that's top of my mind is that founders of course care a lot about their startup, but I also wonder: how do we have a vibrant ecosystem of startups? How do startups continue to win, especially with respect to big tech? How does the ecosystem become healthier, and what can you do? Sounds like you should become an investor. Amazing, thank you so much for joining us, Andrej, for this, and also for the whole day today.