Latent Space: The AI Engineer Podcast - NeurIPS 2023 Recap — Top Startups
Episode Date: December 30, 2023

We are running an end of year listener survey! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here.

We can’t think of a more Latent-Space-y way to end 2023 than with a mega episode featuring many old and new friends recapping their biggest news, achievements, and themes and memes of the year!

We previously covered the Best Papers of NeurIPS 2023, but the other part of NeurIPS being an industry friendly conference is all the startups that show up to hire and promote their latest and greatest products and papers! As a startup-friendly podcast, we of course were ready with our mics to talk to everyone we could track down.

In lieu of an extended preamble, we encourage you to listen and click through all the interviews and show notes, all of which have been curated to match the references mentioned in the episode.

Timestamps & Show Notes
* [00:01:26] Jonathan Frankle - Chief Scientist, MosaicML/Databricks
  * see also the Mosaic/MPT-7B episode
  * $1.3B MosaicML x Databricks acquisition
* [00:22:11] Lin Qiao - CEO, Fireworks AI
  * Fireworks Mixtral
* [00:38:24] Aman Sanger - CEO, Anysphere (Cursor)
  * see also the Cursor episode
  * $8m seed from OpenAI
  * Tweet: Request-level memory-based KV caching
  * Tweet: GPT-4 grading and Trueskill ratings for rerankers
* [00:51:14] Aravind Srinivas - CEO, Perplexity
  * 1m app installs on iOS and Android
  * pplx-online api 7b and 70b models
  * Shaan Puri/Paul Graham Fierce Nerds story
* [01:04:26] Will Bryk - CEO, Metaphor
  * “Andrew Huberman may have singlehandedly ruined the SF social scene”
* [01:12:49] Jeremy Howard - CEO, Answer.ai
  * see also the End of Finetuning episode
  * Jeremy’s podcast with Tanishq Abraham, Jess Leao
  * Announcing Answer.ai with $10m from Decibel VC
  * Laundry Buddy, Nov 2023 AI Meme of the Month
* [01:37:13] Joel Hestness - Principal Scientist, Cerebras
  * CerebrasGPT, all the Cerebras papers we discussed
* [01:56:34] Jason Corso - CEO, Voxel51
  * Open Source FiftyOne project
  * CVPR Survival Guide
* [02:02:39] Brandon Duderstadt - CEO, Nomic.ai
  * GPT4All, Atlas, Demo
* [02:12:39] Luca Antiga - CTO, Lightning.ai
  * PyTorch Lightning, Lightning Studios, LitGPT
* [02:29:46] Jay Alammar - Engineering Fellow, Cohere
  * The Illustrated Transformer

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Hello, hello. This is swyx back again with part two of our NeurIPS coverage. This time we're going to cover startups. And it's a special episode because this is the last episode of 2023. We are definitely looking back at the year with rose-colored glasses. This has been a fantastic year. We only started this podcast in February. And it's grown so much. Thanks to all of you who've listened and given feedback and shared it with your friends. And we actually managed to invite a few of our former guests back on the pod together with some new friends
and probably some new voices that you're going to be hearing next year.
So this is not a hard-hitting interview series.
You know, it's not that kind of interview.
It's not that kind of podcast where we try to go too deep.
Today we're just going to go broad.
And we're just going to check in on a bunch of startups that we like and monitor
and that were present at NeurIPS.
So first up is Jonathan Frankle of MosaicML.
We last talked to him in May for the MPT-7B episode.
That's episode 13.
And I have to say that was one of the best performing episodes of the whole year.
So you're welcome to go back and listen to that if you missed it.
And since then, they were bought by Databricks for $1.3 billion.
And actually during the interview, they were in the process of getting acquired.
They just couldn't say anything about it.
But it's definitely one of the biggest AI news stories of the year.
And you can listen to what it was like, or what was going through Jonathan's mind back then,
as well as now, today, six months later.
Hey, Jonathan.
Welcome back to the pod.
Thank you so much.
This is an interesting place to have the pod under the overpass of interstate whatever it is.
Yeah, interstate whatever in the city of New Orleans.
Yeah, it's really good to see you.
Since you were last on the pod, Mosaic got acquired.
Yeah, thank you.
I think you really deserve all the credit for this.
No, you guys were sitting on that news, and we didn't know what was going to happen.
But I did come away from your interview with a very, very high impression.
Like, you guys are in a perfect place, perfect time, and it makes a lot of sense to join forces with Databricks.
Yeah, they're kind of, I mean, I will say we really didn't want to get acquired.
You did not?
We didn't.
I mean, we loved being independent.
Sure.
Like, we loved doing our own thing.
Sure.
But this just made too much sense.
Like, you know, they do data.
We do LLMs.
We both do enterprises.
We're all a bunch of academics.
Like, it was just kind of, we couldn't think of a better match.
And so it just, we kind of came to the conclusion, like,
okay, I guess we can't not do this.
Like, it's too perfect.
Yeah, yeah.
And you've done a bunch of other podcasts on the acquisition.
So I don't, we don't need to retread.
I'll send people that way.
Just, like, what's new in Mosaic world? In Mosaic world, honestly, like, we're just
cooking. I think we've been a little quiet lately. Yeah. Or at least we look quiet from the
outside. It is certainly not that we haven't been busy and it's certainly not that, you know,
we're not doing cool stuff. Part of it is that, you know, getting acquired, there's a bit of
administrivia involved. You know, we had to go through new employee orientation, get health insurance,
you know, meet our amazing new colleagues. Part of it is, like, you know, the field has moved
toward bigger stuff and we've moved toward bigger stuff. So I think we'll have some exciting
stuff to talk about soon, but my philosophy is always, like, speak through the work. Yeah. So I don't
want to hype, I don't want to, like, get people excited. You know, you'll see the work and you judge for
yourself. Yeah. You talk about the industry moving towards bigger stuff. What trends are
notable to you in the, I say, second half of this year? Everybody's figured out how to build
LLMs. Like it's no longer a coveted skill of, you know, a handful of people. But now we've all
become LLM builders. The field has kind of narrowed in aperture again. And, you know,
like yesterday, when we were all figuring out how to train ImageNet, now we're all figuring out
how to build really big, really powerful models.
And that's now just an assumed skill.
The rest is kind of what do you do with that skill?
How do you build a product?
How do you differentiate?
What cool thing can you do that's different from everybody else?
That's going to determine kind of what 2024 is going to be like.
Yeah.
I guess, like, a lot of people are banking on multimodal being, well, 2024 being the year of multimodal
LLMs.
I feel like that's a little bit too broad a brush.
I don't know, like, what's valuable in that front?
I mean, so multimodal is going to be a huge deal.
Like, it's, but it's already a huge deal.
Like, we made multimodal models.
The LLaVA paper author I also interviewed on this pod.
Yeah, like, LLaVA's amazing.
Like, you know, I've been playing with it a bunch personally.
It's awesome.
And we've got Bard, and we've got Gemini, and we've got GPT-4V,
and, you know, I'm sure there are going to be plenty more where that came from.
I think the question is, as with all good things, you know,
cool promise is different than, like, delivering value.
Yeah.
And I'm really curious,
like, you know, what do people genuinely do with this in real production settings
and the settings that will actually pay off the huge investment that's made to build these multimodal models?
Right.
I'm also kind of curious, like, are we going to start to see some big open source multimodal models?
Like, you know, we've got LLaVA.
It's moving in the right direction.
But, like, is somebody going to, you know, build something that looks a lot like GPT-4V
or something on that trajectory and kind of start another arms race in that direction?
Like, it'll be interesting to see.
I'm honestly pretty curious, and I'm watching with bated breath for what everybody does.
Yeah, well, I think in our chat earlier today, you said, you know, we kind of live in a diverse world where like every company has kind of found its niche, maybe?
If you want to go through that logic.
Yeah, I'm kind of like, I think there are, you know, the optimistic and pessimistic scenarios for where we go.
Okay.
Like, you know, I don't know.
I kind of think there's a boring scenario where everybody basically is building these giant LLMs and maybe language, you know, image models or what have you.
And they're all kind of the same.
It's just you've got the Google version and the OpenAI version and the Amazon
version, and it almost feels like cloud providers in some sense. Like, you know, what
distinguishes AWS from GCP? It's kind of, you know, where you are and slightly
different consoles and, you know, it's like a different interface. Maybe you prefer one, or
maybe, like, you've been using one for a while and, like, you're used to it, or your IT
person really likes this one because, you know, they used to work at that company or what
have you. That would be a pretty boring world. But I think that's unlikely to be the case.
I'm kind of, I'm looking at, like, you know, all the cool stuff coming out of Gemini, you know,
all the cool stuff coming out of OpenAI,
and then, like, I'm looking at Adobe.
Firefly, really?
Firefly.
They're building like a different model
with a creative perspective.
Like I'm kind of looking at this and going,
maybe we'll have a wide diversity of models
and everybody will be building models
like just by virtue of the fact that we need so much data
to build any of these models.
Everybody's going to play to their strengths
and, you know, use every resource they have at their disposal
and Google has, you know, they have YouTube.
I don't know if they're using it,
but like that's a cool resource.
OpenAI has put a ton of energy into text data.
Adobe gets creative people.
And there are a few other companies where that came from.
So I'm kind of like,
I'm honestly curious if we're going to just see
really different models for different people.
And I don't know, that's a pretty cool world to live in.
That is great.
We won't see this arms race.
We'll just kind of see diversity.
Yeah.
And we shouldn't forget Bloomberg,
which has BloombergGPT,
and that's a source of significant tokens,
the financial world.
Yeah, yeah.
Like it's, I mean, my whole business
on the Mosaic and Databricks side
is, you know,
helping people leverage the data they have.
So I'm excited about a world of diversity because, you know, it's,
not only do we have like these crazy diverse foundation models at the larger scales.
Yeah.
But everybody embraces whatever they have.
Like our friends at Replit do a code model and, you know, I don't know,
Bloomberg does another finance model and like somebody does a healthcare model.
And, like, everybody draws on their strengths.
And that's a cool world.
Are you bullish on every company training their own model?
Oh, sorry.
That's a stupid question to ask you.
I mean, I think, you know, I'll give you the honest answer because I think it's,
You know, the business Mosaic answer is, oh, yeah, I'm super bullish.
Everybody should train their own model.
Come do it on Databricks right now.
Like, you should start from a base model that everyone shares, right?
Like, that's kind of useful.
Maybe or, like, you work your way up.
I think it's like there's a journey with playing with any of these models
that may or may not end with training your own depending on where you go on that journey.
Like, you start by playing with an API and maybe you do some retrieval and maybe you do some fine tuning
and then maybe you build your own model.
Yeah.
But like, it's a journey.
and I think there's a destination many people will get to that involves training their own models.
Yeah, totally.
What other trends are going on that you're liking or seeing or hating?
Honestly, maybe this gets to the question of, like, you know, overall impressions of NeurIPS.
Like, I thought this was a pretty garden-variety NeurIPS in some sense,
which feels weird to say in the age of, you know, ChatGPT and everything else that's happened in the past year.
Yeah.
But this felt like the most normal conference I've had since, like, 2019.
You know, I mean, we've had a pandemic in between and everything, but, like,
the past couple years, actually internally at Mosaic, I always do a long write-up of every conference and the trends I see.
Okay.
Like some public, some that are more relevant to what we're doing.
And like a lot of the write-ups I've done over the past year or two have been like all about like the unease.
Sometimes it was just like my write-up for ICML 2022 was all about people capitulating to scale and the five stages of grief and, you know, how different people were responding.
Academics, people at Google Brain, back when it existed.
you know, all that stuff.
And it almost looks quaint to think that it was insightful to say people have capitulated
to scale, in this day and age where, you know, tens of billions of parameters
looks mundane.
Yeah.
But this kind of felt like, okay, the academics are trying to find their way forward.
It's no longer just kind of coping and ignoring, but like trying to find their way
forward.
The industry folks are doing their thing.
A lot more people keeping secrets than used to, but it's still like, you know, a lot of
people also aren't keeping secrets.
and can talk about what they're doing still.
Yeah.
So it kind of felt like, you know, equilibrium.
I don't know how long it'll last,
but this was a lot less of a frantic and stressful conference
than I think I'm used to, at least in the past couple years.
You know, I'm in a new role in some sense.
I'm on the business side now.
I'm on the industry side, and I'm trying to find my own path.
But I felt like a lot of us have changed roles in some sense
as the past couple years have, you know,
have taken place and everybody's moved around
and figured out what they want to do.
But we've all kind of found our place at this point.
I feel like, you know,
We may be in different places, but the ecosystem, the community has kind of sustained with, you know, a bunch of new PhD students and all that good stuff.
Like, it's kind of, you know, I don't know, it's nature healing in some sense from the insanity of the past couple years.
And a reminder that, you know, we're all kind of small pieces in a much bigger, you know, ecosystem and community.
Yeah, and it's still growing, though.
Apparently the latest stats was something like 15,000 attendees.
Oh, my God.
Oh, my God.
I will say one big difference.
You know, in the time right before the pandemic,
deep learning was getting so popular,
the conferences would sell out the day registration opened.
Like, as a student, you'd have to rush to register
or you wouldn't even get to go.
That I don't think is happening anymore.
It's easier.
Yeah, and there was also live streaming stuff, you know, so.
Yeah, but it's kind of interesting that, like,
I guess we've adjusted to the huge capacity
and everything that's, you know, going on.
But it's, you know, even so with it getting bigger,
it didn't feel that different, to be honest.
Maybe it's just that I joined the community,
when things were already big.
Yeah.
But like, you know, there were some journalists here, some VCs here,
but that's always been the case.
Yeah, it's always been the case.
You always have, you know, overrated, underrated papers.
We'll maybe save the overrated stuff for later,
but any underrated stuff that you want to highlight from this year,
it doesn't have to be at the conference,
but I just want to mine you for underrated papers
that people should pay attention to.
I'm going to flip this a different way.
Okay.
Because I'm not a fan of overrated or underrated,
and I'm not a fan of passing judgment on stuff.
I just don't, like, far be it from me.
Like, one of my big gripes is we shouldn't have best paper awards. And I say that
having gotten one back in the day, so I feel like I have the ability to say that not just out of
bitterness, but out of, like, recognition that it's dumb. Sure. But test of time, though? Test of time is great.
Test of time is awesome. Yeah. You know, and, you know, I look forward to everybody using lottery tickets
in 2029. I don't know, if you're working on lottery tickets, you know, there's a lot of other cool stuff out
there. But I think it's really, I'll turn that question into, like, what areas
should academics be thinking about?
I don't know. What would I work on as a PhD student right now?
Or what would I recommend a student work on?
And all the biggest questions in the field come down to how you measure and how you evaluate.
Those are just such fundamental questions.
Until we know how to measure things, until we know how to evaluate anything,
you can't really even do any science.
We don't know what we're even talking about.
Yeah.
And so I'm also thinking a lot about synthetic data.
Can we generate useful evaluation sets for all the little properties we want to find out about in an LLM?
Creating data sets is really hard, but a model can help us do that.
So I'm kind of curious, like, you know, can we bootstrap the evaluation process with synthetic data,
figure out good ways to help ourselves build good data sets, and then, you know, from there,
maybe we can start to really take a bite out of the evaluation questions and get moving on the actual science of understanding what's going on with these LLMs.
All that seems very academically viable.
None of those require huge amounts of compute.
They require creativity, ingenuity, but that's in abundance in academia, even when compute isn't.
Yeah, I would say that that's actually one thing I
had a big delta on for this year.
Yeah, tell me more. I'm curious.
Synthetic data, I always thought you're just kind of sampling from a known distribution
anyway that you know it's imperfect and doesn't match human preferences.
And it's Kanjun from Imbue that actually changed my mind on this.
Tell me more. That is a smart person you're talking to.
She's like, you actually don't want to match human preferences.
You want to spike it in different ways and useful ways.
And so you want to synthesize data in useful ways that don't necessarily match human
preferences.
And once she said that, I was like, oh, okay, I think I'm actually sold on this as a viable practice.
I would actually make a completely different argument, but she's right.
So I'm probably going to make a wrong argument now because Kanjun is pretty much always right.
And, you know, when she disagrees with me, it means I'm wrong.
But, you know, the way that I look at it is synthetic data is not about, like, it's not about relying solely on the model.
Like, we as computer scientists love the idea that once you automate something, you fully automate it.
Okay.
It's really about, like, how do you reduce the amount of work necessary
to create something that's truly useful?
And so synthetic data is not about can we, like,
whip up a data set automatically and then make a model better.
It's about how can we use human time most effectively?
And maybe labeling data or creating a data set from scratch
is not the most effective use of human time.
Maybe it's curating a data set that a model generated,
you know, to pick the examples you like most
and edit a few of them.
When I think about the millions of different small properties of LLMs we want to study,
like in some sense, the unit tests of LLMs that we want to develop,
You know, that's going to require a bunch of tiny eval sets on specific, really niche things.
It's really hard for a human to just write from scratch.
Nobody has the time or patience for that.
If a model can help you do it and you can curate, you don't end up in a full feedback loop.
You have a human there, but you're just making better use of your mind.
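A minimal sketch of the workflow described above, in which a model drafts a tiny, niche eval set and a human only curates it (accept, edit, or reject) before it is used as a "unit test" for a model. Everything here is an assumption for illustration: `EvalCase`, `curate`, `pass_rate`, and the stand-in `draft_cases`, `reviewer`, and `toy_model` functions are hypothetical names, and the drafting model and human reviewer are replaced by hard-coded stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class EvalCase:
    """One tiny 'unit test' for a model: a prompt and the expected answer."""
    prompt: str
    expected: str


def curate(
    candidates: Iterable[EvalCase],
    review: Callable[[EvalCase], Optional[EvalCase]],
) -> List[EvalCase]:
    """Keep only the cases a reviewer approves.

    `review` returns the case unchanged (accept), an edited case (fix),
    or None (reject). The human never writes a case from scratch.
    """
    kept = []
    for case in candidates:
        decision = review(case)
        if decision is not None:
            kept.append(decision)
    return kept


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of curated cases the model answers exactly right."""
    if not cases:
        return 0.0
    passed = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    return passed / len(cases)


# --- Hypothetical stand-ins so the sketch is runnable ---

def draft_cases() -> List[EvalCase]:
    """Stub for a model that drafts candidate cases (some of them flawed)."""
    return [
        EvalCase("2 + 2 =", "4"),
        EvalCase("Capital of France?", "Paris"),
        EvalCase("3 * 3 =", "6"),   # wrong label: the reviewer will edit it
        EvalCase("asdf", ""),       # junk: the reviewer will reject it
    ]


def reviewer(case: EvalCase) -> Optional[EvalCase]:
    """Stub for the human curator: reject junk, fix one label, accept the rest."""
    if not case.expected:
        return None
    if case.prompt == "3 * 3 =":
        return EvalCase(case.prompt, "9")
    return case


def toy_model(prompt: str) -> str:
    """Stub for the model under test; it only knows arithmetic."""
    answers = {"2 + 2 =": "4", "3 * 3 =": "9"}
    return answers.get(prompt, "unknown")


eval_set = curate(draft_cases(), reviewer)
score = pass_rate(toy_model, eval_set)
print(f"{len(eval_set)} curated cases, pass rate {score:.2f}")
```

The design point is that the human effort goes into the `review` step alone, which is usually much cheaper than authoring each case by hand.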
Yeah, it makes sense.
I would just observe that this sounds like weak labeling, and I talked to Raza Habib from Humanloop,
who actually pivoted away from weak labeling.
Interesting, tell me more.
I don't know.
I just think, it might have just been too early.
I still believe her. This is the thing about all of deep learning. You never know
whether you're too early, and too early is often six months too early. It's no longer the, like,
you know, Yoshua Bengio and everybody being 20 years too early. And Schmidhuber. And Schmidhuber,
of course. We have to salute, you know, Schmidhuber as well. It's not like being 20
years too early. It's like you might be six months too early and some crazy thing is going to
happen or, like, something will finally click. Yeah. And there goes that. Yeah, yeah.
Totally. Cool. We're almost at probably your destination.
The workshops tomorrow, you said, are kind of the highlights for you for NeurIPS?
Yeah, yeah.
What should be my workshop strategy? I don't, I haven't, I picked out a few, but.
Oh, Wander.
How do you do NeurIPS well, basically?
Wander and go to a lot of the poster sessions.
Like, the talks at workshops are always great, but, you know, often, honestly, the workshops
are pretty eclectic in terms of talks.
You try your best as a workshop organizer to put together a coherent program, but, you know,
presenters are going to do what presenters are going to do, and you can't really stop that.
But instead, you know, I love the poster sessions
because, like, you get students who are working on, like,
really crazy, creative stuff that isn't even ready for the conference yet.
Like, you're actually seeing things that have not been put out on Twitter yet.
And that's such a nice change from NeurIPS,
where all conference papers have been out for months, if not longer.
Oh, wait, I observe the opposite.
Things that have been on Twitter for, like, forever are now out of date,
and they're posters because that's how long it takes to submit a paper.
Yeah, yeah.
But for the workshop poster sessions.
It's the workshop poster sessions,
that are awesome because you're truly seeing stuff
that was created this fall,
may not be on arXiv yet, nobody's talked about it,
probably makes no sense yet,
but may evolve into something really cool.
And you also, there's not as much competition
to talk to the people, you can just kind of chill.
So I love to like wander from poster session
to poster session throughout the workshops,
because like that's my favorite part.
I don't know, I can hear, you know,
somewhat important people talk anytime,
but it's like talking to the people
and seeing, like getting a glimpse of what might be ahead.
You know, being able to say, like, oh my gosh, I remember seeing the poster for this paper that a year later becomes very important.
And, like, kind of asking yourself, you know, is this nonsense or is this brilliant?
And, like, not actually knowing the answer or having 50 million people on Twitter having told you the answer,
that's kind of, I don't know, it's fun.
It takes me back to, like, what the conferences were like for me, you know, when I was early in my career.
Like, you know, it was just kind of some random people coming and chatting with me,
and you never really knew what was important and what wasn't, but it was all kind of cool and fun.
Maybe it's: form a hypothesis and search that way.
Yeah, so I'm looking forward to tomorrow.
If you find anything interesting, just let me know and I'll go interview them.
I've been recording sessions with poster presenters all the time.
And I want to show people who don't come to NeurIPS that this is what goes on.
And there's so much, I found so much talent that does a lot of work that you don't hear about online
because they're just not online or they just don't have the reach that, you know, you and I do.
So, like, I want to give them that reach.
Yeah, I think there's like, you know, I'll say two things kind of to close
up. One is kind of that, like, I feel like there's now so much hype attached to NeurIPS and ICLR
and ICML just by virtue of the hype that's attached to the field. I don't know, this feels
pretty mundane and boring to me. Like, it's really cool, but it's also just, you know, it's just
a bunch of academics, like, walking around having boring conversations, getting coffee, and, like,
pretending to party. I definitely, my experience of...
Pretending to party. I love it. No, I'll say that, you know, I'll tell you, like, my experience
of NeurIPS last year. Like, I don't know, these conferences have a reputation of being over the
top with industry parties and things like that.
And my impression was that was probably true in 2017.
Like, that year is known as the NeurIPS that broke NeurIPS for various reasons.
I wasn't there at that time.
That was before I was even in the field.
But my experience last year, especially post-pandemic, was a whole generation of students
had like heard stories and these stories had been built up in their minds.
And they were trying to live out the fantasy of what they thought NeurIPS had been like.
So these very boring happy hours, people tried to turn into ragers, and it was hilarious.
It was just adorable in some sense.
So, you know, it's worth remembering, like, you know,
there's the fantasy and there's the reality,
and the reality is, you know,
it's a boring industry conference
where people are, or academic conference
with some industry component,
where people are trying to make money
and convince people to look at their posters
and get a few citations and...
Lots of hiring. Lots of hiring.
Lots of hiring.
I think things have really settled into a new normal.
And, you know, with all the hype
and all the craziness over the past couple years,
people feel like everything is just exploding
and changing all the time.
Like, you see those LinkedIn posts
of everything has just changed.
I hate those.
I hate those so much.
I hate LinkedIn.
If anyone is a LinkedIn influencer,
I hate you.
But, you know, it's kind of like,
this felt like, okay,
like maybe there's a steady state again.
Maybe we can all catch our breath a bit.
And it kind of felt like after a pandemic,
after all the technical development
that's happened in the past couple years,
like, it's nice.
We can chill.
Like, we can kind of breathe a little bit.
And there's something really nice about that.
Yeah, love that.
Well, it's so nice to have you on again
and chat and catch up.
Thank you so much.
It's good to see you.
In case it wasn't obvious, that was not up to the usual standards of our recordings
because that was a walking interview.
I was carrying these portable mics all over NeurIPS.
And really the only way to schedule podcast interviews with people, especially busy people like Jonathan, at NeurIPS,
is to show up with a portable mic, shove it in their face and talk to them.
And that's what the majority of the podcast conversations are for this episode,
because that's the only way: I can see someone, grab someone, ask "do you have 15 minutes?",
and talk through something. That's what happens.
That's how we schedule interviews with a whole bunch of people
that we would not get otherwise.
NeurIPS is too chaotic to schedule anything else otherwise.
One takeaway from Jonathan's interview, which I want to highlight,
apart from the whole "it's the new normal" conversation,
is the focus on synthetic data generation.
This is a recurring theme that is continually coming up
from my conversations with literally everybody in the space.
And how do you do it right?
How do you do it with the blessing of OpenAI?
ByteDance was recently banned from OpenAI because they were considered to be distilling from GPT-4,
which is not allowed under the terms of service.
I've heard that they're not the only company that is accused of, or thought of, or rumored to be doing that.
Probably the right approach is something that looks like the approach of DeepMind,
which on Monday of NeurIPS published a paper called Beyond Human Data:
Scaling Self-Training for Problem-Solving with Language Models.
And the concept is honestly not that complicated.
For the domains of math and for coding,
they were able to computer-generate data for training on,
and they found that training PaLM 2 on that synthetically generated data
improved their results and performance on the benchmarks for those relevant domains.
It makes sense that we can scale beyond human data on those dimensions.
That's the trivially easy stuff,
and the question is how do you scale beyond the verifiably correct?
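To make the "verifiably correct" idea concrete, here is a small sketch of the generate-and-filter step that this kind of self-training relies on: sample several candidate answers per problem, keep only those an automatic checker confirms, and use the survivors as training data. This is an illustration under stated assumptions and not the paper's implementation. The toy addition task, the `noisy_solver` stand-in for a language model, and the function names are all hypothetical, and the actual fine-tuning step is omitted.

```python
import random
from typing import Callable, List, Tuple

Problem = Tuple[int, int]  # toy task: add two integers


def verify(problem: Problem, answer: int) -> bool:
    """Automatic checker. For math or code this is the cheap, reliable part."""
    a, b = problem
    return answer == a + b


def noisy_solver(problem: Problem, rng: random.Random) -> int:
    """Hypothetical stand-in for sampling from a model: right about half the time."""
    a, b = problem
    return a + b + rng.choice([-1, 0, 0, 1])


def generate_verified(
    problems: List[Problem],
    sample: Callable[[Problem, random.Random], int],
    samples_per_problem: int,
    rng: random.Random,
) -> List[Tuple[Problem, int]]:
    """Sample up to `samples_per_problem` answers per problem and keep the
    first one that passes verification. Unverified samples are discarded."""
    kept = []
    for problem in problems:
        for _ in range(samples_per_problem):
            answer = sample(problem, rng)
            if verify(problem, answer):
                kept.append((problem, answer))
                break
    return kept


rng = random.Random(0)
problems = [(rng.randint(0, 99), rng.randint(0, 99)) for _ in range(20)]
training_set = generate_verified(problems, noisy_solver, 4, rng)
print(f"kept {len(training_set)} verified examples from {len(problems)} problems")
```

In a full loop, the model would be fine-tuned on `training_set` and the sampling repeated with the improved model. The filter is what keeps synthetic errors from feeding back into training, and it is also why the approach is limited to domains where such a checker exists.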
If you listen to part one of our NeurIPS coverage,
we talked about DPO, which is more efficient usage of existing information
so not exactly using synthetic information.
But just as a sneak peek of 2024, we've actually already recorded an episode with Nathan Lambert,
now of the Allen Institute, on RLHF and RLAIF,
and I think those approaches might scale beyond just the narrow domains of math and code.
So next up is someone who's new to the pod, but not new to me.
I've talked with Lin from Fireworks a bunch over the past few months,
and they've definitely blown up in the inference space.
So in some sense, you can think of Fireworks as a competitor to
Together AI or Replicate or any other inference serving platform that you might think about,
but they have a really good team and they've been doing some very good work with Mistral.
Lin and her team have an amazing track record, which you'll hear about in the interview,
and their customer list is pretty stellar too.
So it's worth checking out and checking in on the inference business with Lin Qiao from Fireworks AI.
Okay, so who are you and what is Fireworks?
Hey, Shawn.
We started Fireworks last year. Me and a few founding engineers,
we had been working at Meta on building the AI platform, and specifically PyTorch, for five years.
When we started PyTorch, it was a framework for researchers.
And we took on the mission to build one framework for both production and research
and streamline the research-to-production transition, operating PyTorch at a huge scale for Meta and for the industry.
So by the time we left last year,
it was running more than 5 trillion inferences per day
across 50 data centers for Meta.
And we feel like this is a great impact we have landed.
But when we look at the industry, it's really, really behind.
And we founded Fireworks to really bring this expertise
to help the industry adopt AI in the fastest way,
to adopt state-of-the-art research
into production in a very streamlined way.
And why the name Fireworks?
Because PyTorch holds fire,
and we want this fire to be everywhere.
That's why we came up with the name Fireworks.
Nice, nice.
Well, there's also Lightning, and Lightning Labs
is kind of a spin-off of that effort.
Right, right.
And basically, I think there are multiple teams
working on better inference for PyTorch.
Could you elaborate, like,
how do you see the
landscape of sort of inference-as-a-service companies?
I don't know if you consider yourself that,
like infrastructure companies in general, I guess.
Right.
So I think when we think about inference optimization,
there are different angles, right.
I still think the PyTorch team,
when I was there and now,
they are still doing a great job
pushing for PyTorch performance optimization
across training and inference
through the PyTorch Compile project.
The goal here is to keep the simple PyTorch programming API,
which is really good for researchers,
and then take on the heavy lifting of doing optimization in an automatic way.
But then, because PyTorch has to support and sustain a broad community,
the workload is much more diversified when they think about optimization.
And here at Fireworks, we take the same philosophy.
We want to keep the simple API of the PyTorch programming language
and take on the heavy lifting of the optimization,
but more specifically targeted at industry verticals.
For example, when we started a company,
we started from ranking recommendation,
and we have a product around that.
And then later on, the customers we engaged with,
they were asking us, hey, can you help on GenAI?
Because all the GenAI models are bigger and more complex. They're even harder
to operate and optimize. So then we started a vertical on GenAI across large language models
and image generation and other modalities as well. But because we focus on verticals, we
can afford to take a much more specialized optimization approach. And that is complementary
to PyTorch Compile, with PyTorch driving for a broader audience. So that's a more specialized approach.
That's where we are.
And I will say, because of our PyTorch expertise,
we are the best when it comes to performance
optimization across the following areas, right?
The performance of GenAI models is pretty complicated
because there's no one bottleneck from a system resource
consumption point of view.
The bottleneck can scatter across CPU-to-GPU communication,
the compute itself, memory bandwidth,
and many other things.
So we developed a very special scaling algorithm that allows us to tackle those bottlenecks independently
instead of blending them together.
So that's a very unique thing we are doing.
The second is we build custom kernels across attention, especially multi-query attention,
matmul, and all-reduce, and those custom kernels outperform anything in the industry.
Yeah, we also do a lot of adaptive technology, so that just as we run the inference,
performance will get better.
The more you run the same workload, it will start to adapt to the workload
and become better and better.
So across all this, that enables us to be in the leading position as a GenAI inference provider.
Just to give people a mental image, obviously they can go to the website,
you have a self-serve option that people can try out.
You mostly have a library of existing popular open source models.
You just started creating your own models, which we can talk about.
I didn't know that.
That's super exciting.
You actually recently enabled Mixtral one day after the release.
By reverse engineering the code?
That's right.
What's the high level of that?
Yeah, so I think we did that twice, right?
The first time was when Mistral 7B got released. The same day,
they released in the morning, and in the afternoon we launched Mistral 7B.
We were the first to get it working.
And this is basically, like, they release weights but no code.
And then you have to implement the code by guessing the...
Right.
For Mixtral, that happened last week.
They only released the weights, and there was no code.
And I think it's really fun for us, right?
Because thanks to the technology
we developed over time, we have actually built a slew of componentized libraries, so that enabling a new model is not built from scratch every time.
Because all these models share a similar kind of model architecture underneath, with different components,
that's why we have the velocity.
But it was actually fun to hack it.
Dima, he goes by Dmytro Dzhulgakov.
Your CTO.
Yeah, our CTO.
He basically took the Llama model and tried to retrofit it to the Mistral weights, and it worked.
We were, like, thrilled.
Oh, it's actually working pretty well.
But on top of that, it was just a base model.
It's not an instruction-tuned model.
It's not really usable for chat.
And then overnight, we tuned a chat model
and deployed it to Poe bots, and it's used by many other users already at high scale.
And the feedback is really, really good.
Of course, now we have switched to Mistral Instruct as the official version,
but we still keep getting user feedback that
our overnight-tuned chat model sometimes even performs better.
Wow.
So, yeah, so that's what we do.
When it comes to quality and velocity at high speed,
we are the best company in the industry.
Mentioning speed, I should also mention that a lot of AI engineers listening to the podcast would be familiar with the Vercel AI playground, which you are the primary provider for, right?
I mean, that's the one that's most visible because they name you, but I don't know if there are any others that you serve that you can name, as you're the sort of inference provider.
Here's just kind of a very highly selective list of customers.
Yeah, we get the marketing rights.
Yeah.
So we already serve Tome.
Yeah.
They're doing really good PowerPoint generation. If you haven't used
it, please try it out. It's really cool.
Yeah, I used it for my keynote for my conference.
Oh, that's fantastic. Yeah. It's basically, I use like a Magic Trackpad to serve the
Tome, and then obviously whenever I need to generate images, I actually generate them from
inside Tome. So I was using Fireworks without knowing it. That's fantastic. We also
serve copilot kinds of applications. For example, Sourcegraph released Cody.
And we, by this time, by the time this releases, we'll release our episode of Sourcegraph and Steve
Oh, that's great.
Yeah, we're good friends.
We also are the inference back-end provider for Poe.
That is a very popular chatbot.
And Poe is building...
Wait, isn't it just Anthropic or GPT?
At the beginning.
Okay, now they have their own models.
They are going big on open source models.
I see.
To provide a variety of different experiences and much better performance.
And of course, from their point of view, like, cost efficiency.
There are many other big enterprises, for example, DoorDash. They're using us.
Did they say for what?
Yeah. So we actually, yeah, we released a ranking and recommendation stack with them to power
their main business. Because when you go to that website, there is a lot of ranking
happening, including ads and restaurant search recommendations, and so on.
One thing I wonder about is, for something like a DoorDash, and, you know, I'm a bit
new to RecSys in general, shouldn't those be precomputed? Like, why does it have to be
fast or live? It doesn't have to be live, right?
Actually, there is a lot of dynamism, right? Because your personal preference can change, right? It's also
quickly learning. And their distribution
channel, the participating restaurants may change, a menu may change. There's a lot of
dynamism in the matching criteria here. And as I worked at Meta for a long time, to actually do
highly adaptive, personalized ranking and recommendation yields the
best performance when it comes to relevance and revenue. Yeah, I'm just asking
like offline versus online. I don't know how sensitive
this is to the latency requirements.
Oh yeah, yeah.
No, so a lot of the time, most people, like,
of course, at, you know, big companies,
people do online training.
But for those enterprises,
I haven't seen the need to go to online training yet.
So usually training is offline, and then,
but it's periodic, right?
You have to refresh with new information,
and then you launch and deploy periodically.
Yeah.
Okay.
And so I teased this earlier.
I didn't know that you had your own models
that you're also training.
So you just released a clean LLaVA.
Yeah.
What's the story behind that?
Right.
So I think everyone knows, like, GPT-4V
and kind of the space of multimodality, right?
I think, as I talked about in one of the interviews
when I was at Meta for PyTorch,
at the end, the moderator asked me,
hey, what do I think is the future?
My answer was multimodality.
Because we live in a world that has so many different modalities across image,
audio, text, video, and so many other things.
And that is the mix of our world and the real-world experience.
So yeah, we really think multimodality will be a very important aspect.
So we took the very popular LLaVA model from Microsoft, but it has the kind of
GPT-4 training data, so we replaced that with our own training data to make sure it's commercially
usable. Yeah, we're super excited about this. Yeah, I mean, it sounds like you'll be exploring
more models as well and just putting them all on your platform, and you're the fastest way to access
them. We're here at NeurIPS. You're talking to a lot of industry folks. Any other top-of-mind
conversations that you're just hearing a lot that may be surprising to people? So I mostly talk
with many startups that are emerging.
So number one, it's really refreshing to me,
but not surprising, that there's so much product innovation
that's happening across the board,
so much energy there, and a lot of those
are built on top of GenAI.
Of course it's not surprising,
but it's kind of validating that fundamentally innovative technology
can reboot
a huge part of the industry. So that's really, really refreshing. The second is,
I think there is a lot more of, hey, how do we think about working together, right? How do we build a bigger,
more interesting product for a broader audience together? I think those conversations are very,
very interesting to me. Yeah, yeah. Okay, very cool. And you're also here to hire or recruit.
Maybe put out a call.
Who are you looking for?
What's the profile?
Yeah, we are definitely growing very fast as a company.
We are looking for systems engineers.
Hey, we already have rock-solid inference serving,
but we are scaling it quickly and aggressively.
So anyone with cloud infrastructure experience who moves really fast, join us.
We are also looking for researchers who have a lot of experience and understand data a lot,
understand quality a lot, and can quickly help our customers get to high quality.
And whether through training our own models or fine-tuning models and building task-specific fine-tuning services,
those are the areas we are pushing really aggressively on.
And of course, we are hiring across the board
for go-to-market people,
all the way from marketing, solution architects, sales reps, and so on.
Yeah, yeah.
Seems like you're scaling very quickly.
Thanks for coming on.
Oh, thank you for having me.
Cool.
When I first met Fireworks, I was very impressed by their team,
but since then, I've been more impressed by their execution,
and my guess is that this will not be the only time
that you'll hear about them on the Latent Space pod.
So far, in organizing and editing this podcast,
I've been trying to bias towards reintroducing previous guests of the pod
as a form of, you know, end of year check-in episode with friends, right?
But so many of them actually mentioned Fireworks,
as you'll see later with Cursor and Perplexity, that I had to put Fireworks first,
just because that many people have interacted with them, used them, and love them, or compete with them.
I think it's a really interesting open question as to how much moat any one inference or
commodity infrastructure provider can have.
The people who are not in the business say there's no moat.
And the people who are in the business, like Lin, see tons of moat in the software that they write,
which obviously is proprietary to them.
It's also interesting to see them start training and releasing their own models.
And Fireworks released a LLaVA variant, which we previously covered in our previous
NeurIPS episode as one of the best papers of 2023.
So I highly encourage you to check out that conversation with Haotian if you are interested.
So I say all that to preface the conversation that we're going to have with the next two
guests.
The first is a return guest, which is Aman Sanger from cursor.so.
We had them on in August to talk about their amazing rise to power as the AI-first code
editor. They've definitely exploded all over my timeline. And at the time of the interview, I myself
was a VS Code, Cody, Codeium, Copilot fan. And since then, I've actually switched my own
workflow over to Cursor because of the better workflow that they provide. But still, there's a lot of
open questions around their business. Just like Mosaic, during our podcast interview, they were actually
sitting on a fundraise, and they recently announced their fundraise with OpenAI. So let's check in on Cursor.
Okay, cool. So I'm back with Aman. Hey.
Hey, how's it going?
Hard to catch you. You're a difficult man to find.
I guess so.
You've been exploring NeurIPS, and you also announced your fundraise since our last episode.
Yeah, so we raised $8 million from OpenAI.
They've been a fantastic partner, and I think it was a great decision.
Yeah. OpenAI uses you themselves.
Yes, we have a lot of OpenAI users, and we're growing pretty fast inside the org.
The thing that we like to say is, like, Cursor is the means by which research happens faster, right?
Like, as we make programming happen faster and faster, as we make programmers much more efficient,
we're making researchers more efficient.
And the bottleneck for research is really just implementation.
If you can come up with an idea and then actually have the code,
have the experiment all written for you immediately,
research just happens much faster.
And so that's the goal that we're working towards.
And I think, you know, we're a tiny bit of the way there
with a lot of OpenAI users.
Yeah.
What's the funniest or most interesting sort of feedback you get from OpenAI
people versus regular coders?
Like, do they prompt differently because they work at OpenAI?
So they actually probably have less feedback than some of our other users who are less familiar with language models.
Because they know what the deficiencies are.
They kind of know what's going on underneath the hood.
You can probably give them interesting input on what people are trying and failing with.
Yeah, that's true.
We do give them a lot of feedback on a lot of their early alphas and whatnot.
And so you've been tearing up the Twitters recently, putting in some effort.
What are your sort of top messages that have been really resonating with people?
I was a big fan of the KV caching tweet.
It's surprising that not too many people, it seemed like not too many people knew about this before.
So when people learn about Transformers, it's actually not in the documented literature and the academic side of things that KV caching is a common industry practice.
Yeah.
You only find out when you talk to industry people that you have a KV cache.
So, like, when you say KV cache, it's really confusing because the KV cache, like, the KV cache can be cached, right?
It's like almost like a double caching.
But the key idea here is, well, let's look at all the big closed model providers, right?
They all have like these chat models.
And with chats and with conversations, like, the first n conversation messages are always fixed.
And that means, like, the first, let's say, n tokens are going to be fixed.
And that means when I put the next token in, why do I need to redo all the work of
recomputing the keys and values for those first n tokens?
Yeah.
And a standard inference trick for this is you take
those keys and values, you move them from GPU RAM to CPU RAM.
You store them there for some period of time before they're evicted.
And then if another request comes in with a matching prefix, the matching original conversation
history, you just load those back into GPU RAM.
And you save a ton of time on compute; your time to first token goes down.
And then because you're saving on compute, you can increase your throughput.
And this is a trick that you don't really see in any of the open source inference implementations.
So you don't see that, but people implement it on top of it.
All the use of the intellectual artists.
My understanding is, like, I think Together, for example,
I think is implementing this.
Yeah, and I just talked with Lin Qiao from Fireworks as well,
who's just doing that.
So one of the interesting, I always assume that it's because of personalization.
Like, hey, in my system prompts, I have today's date.
I'm going to have to update that once a day, fine.
Like, no big deal.
Yeah.
But maybe if people have more customized prompts,
but, you know, you said there's some kind of cache eviction policy
where if there's like a 95%
match, you use the cache?
Yeah, I don't know what the exact eviction policy would be.
You could probably use, like, assume you have, I don't know, like 100 gigabytes of space per
device, probably a lot more actually, probably have up to a terabyte of CPU RAM per device or
maybe per machine.
You could just do something like least recently used.
And then if you start to use up more space than exists on device, you just evict the least
recently used request.
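
To make the trick concrete, here is a minimal Python sketch of a host-memory prefix cache with least-recently-used eviction. It is an illustration under stated assumptions, not how any provider implements it: `PrefixKVCache`, `put`, and `longest_prefix` are hypothetical names, the keys and values are stood in for by opaque blobs with a declared byte size, and the GPU-to-CPU copy is left out. The sketch only reuses an exact token prefix, because with causal attention the keys and values at a position depend on every token before it.

```python
from collections import OrderedDict
from typing import Any, Optional, Sequence, Tuple


class PrefixKVCache:
    """Toy store for KV blobs keyed by token prefix, with LRU eviction."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps a token prefix (tuple of ids) to (opaque KV blob, size in bytes).
        # Insertion order doubles as recency order: oldest entries come first.
        self._entries: "OrderedDict[Tuple[int, ...], Tuple[Any, int]]" = OrderedDict()

    def put(self, tokens: Sequence[int], kv_blob: Any, size_bytes: int) -> None:
        key = tuple(tokens)
        if key in self._entries:
            # Replacing an entry: release its old size first.
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (kv_blob, size_bytes)
        self.used_bytes += size_bytes
        # Evict least recently used entries until everything fits again.
        while self.used_bytes > self.capacity_bytes and self._entries:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size

    def longest_prefix(self, tokens: Sequence[int]) -> Tuple[int, Optional[Any]]:
        """Return (matched_length, kv_blob) for the longest cached prefix."""
        tokens = tuple(tokens)
        # Linear scan over prefix lengths; fine for a toy, slow for real traffic.
        for end in range(len(tokens), 0, -1):
            key = tokens[:end]
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return end, self._entries[key][0]
        return 0, None
```

On a hit, only the tokens after `matched_length` would need a fresh forward pass, which is where the time-to-first-token saving comes from.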
You are a consumer mostly of the GPT-4
API. Yes. They don't really expose this. They don't. How does this affect you?
I think it's actually pretty important to understand what's going on underneath the hood to take
advantage of these things. So like we use dedicated instances. Yeah. So they expose their capability
to you? Like somewhat. But like the key thing is they expose very little actually. And and
isn't it weird? I mean, yeah, but, like, the only way that you can really take advantage of this, and I kind of
had another tweet about this, is, like, you need to really understand what's going on
underneath the hood, so, like, you can then plan for when memory utilization is
spiking based on how many tokens you're currently using or how much memory the
instance, you can speculate, is using, or when you are getting a lot of cache hits, so
you don't expect to be using as much compute, which means you can then
increase your throughput without worrying about latency spiking or
things going down. Yeah. And I don't know if you've... I've taken this
thought to quite an extreme level, like, you can use this to cache RAG stuff, like RAG results.
Yeah, just general prompts, right? You can. So I did have another tweet about this where,
no one's done this to the best of my knowledge, and I think this would be very, very hard to do,
but you could technically cache the entirety of some corpus
in something like S3 if you have a model which has smaller-size keys and values. So this would be, instead of full multi-head
attention, it could be something like grouped-query attention, which is I think usually
around 8x smaller, or even multi-query, which can be 64 to 256x smaller.
And so then what that means is you can actually read the weights from blob storage.
If you have everything, like, really optimized, you can read it into RAM a decent bit faster
than it would actually take to recompute the KV cache. I think that will
be very tricky to implement. And I think there are actually not too many
use cases where it would be useful.
I think codebases is actually one where it could be.
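
The size ratios mentioned here follow from simple arithmetic on the number of key/value heads. A rough sketch, assuming a hypothetical model with 80 layers, 64 query heads, a head dimension of 128, and 16-bit values (none of these numbers come from the conversation):

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache stored for one token of context."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 80, 64, 128

mha = kv_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)  # one KV head per query head
gqa = kv_bytes_per_token(N_LAYERS, 8, HEAD_DIM)              # 8 shared KV heads
mqa = kv_bytes_per_token(N_LAYERS, 1, HEAD_DIM)              # a single shared KV head

print(f"multi-head:    {mha:>9,} bytes/token")
print(f"grouped-query: {gqa:>9,} bytes/token ({mha // gqa}x smaller)")
print(f"multi-query:   {mqa:>9,} bytes/token ({mha // mqa}x smaller)")
```

With these assumed numbers the reduction equals the ratio of query heads to KV heads, so 8x for eight KV groups and 64x for multi-query; a model with more query heads would see a larger multi-query reduction.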
Yeah.
My final observation on this is, OpenAI
had the opportunity to offer caching to people
with the Assistants API.
And again, they're charging you for the whole thing
every single time you send a message to the Assistants API.
And I find it, like, is there some explanation?
Is it just like a, we can do it so we're going to do it?
I mean, it's tricky when you're not using,
I don't know what they're doing underneath the
hood, but if you assume they're doing something like caching at a machine level, like,
so I assume they're not, they're serverless, right? So you have to load and unload, and
that costs a cold start, and that's a problem for them. Yeah. So it's like really
trivial when you have, like, server-based endpoints or, like,
dedicated instances, right? It's probably quite tricky to get right. I mean, I'm not
really confident as to, like, what their decision-making was there, but I'd imagine it's
much more difficult to get right. Got it. What was your second
tweet that we prepped?
One of them that I thought was interesting was generating a retrieval data set.
Yes.
Synthetic data.
Using synthetic data.
I mean, the key thing here is there's a lot of using synthetic data to, like, using the outputs
of models to actually train weaker models.
And so a lot of people have done this with GPT-4 outputs.
This is actually, I think, that requires, like, I guess, the claim that you can train on GPT-4
outputs and you'll still get, like, pretty good models out of that.
Yeah, it's well-established.
Yeah, which seems reasonable, but we're actually relying on a weaker claim because all we're doing is, I mean, people can check out the tweets to see it in more detail, but GPT-4 is quite good at this task of ordering four candidate documents given a query as to their relevance to the query, right?
That's, like, there have been papers that show this, like, list-wise re-ranking, and it works really well.
So if you do that for enough documents and you do it in an efficient way, for which we kind of use a variant of
Elo called TrueSkill, you can then get a really high-quality re-ranking data set,
a really high-quality ordering over, let's say, 100 candidate documents given some query.
So we use GPT-4 in the loop for a bunch of different synthetic data stuff.
This is one of them, and I feel like more people should be doing it for this kind of stuff.
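
As a rough illustration of how many small list-wise judgments can be folded into one global ordering, here is a sketch that uses plain Elo updates. This is a simplification: the conversation mentions a TrueSkill-style variant, which this is not, and `judge` is a hypothetical stand-in for a call that asks a model to order a handful of documents from most to least relevant. The rating constants (1000 starting rating, k of 16) are arbitrary assumptions.

```python
import itertools
import random
from typing import Callable, Dict, Hashable, List, Sequence


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_from_ranking(ratings: Dict[Hashable, float],
                        ranked_ids: Sequence[Hashable],
                        k: float = 16.0) -> None:
    """Treat a best-to-worst ordering as pairwise wins and apply Elo updates."""
    # combinations() keeps input order, so `winner` always ranks above `loser`.
    for winner, loser in itertools.combinations(ranked_ids, 2):
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta


def rank_documents(doc_ids: List[Hashable],
                   judge: Callable[[List[Hashable]], List[Hashable]],
                   rounds: int = 200,
                   group_size: int = 4,
                   seed: int = 0) -> List[Hashable]:
    """Repeatedly sample small groups, ask the judge to order them, update ratings."""
    rng = random.Random(seed)
    ratings = {doc_id: 1000.0 for doc_id in doc_ids}
    for _ in range(rounds):
        group = rng.sample(doc_ids, group_size)
        update_from_ranking(ratings, judge(group))
    return sorted(doc_ids, key=lambda d: ratings[d], reverse=True)
```

In practice `judge` would wrap the model call and parse its ordering; for experimentation it can be any function that returns the sampled group sorted by some known relevance score.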
Yeah.
Yeah, I think people are exploring synthetic data a lot in the back half of this year,
using, like, models as judges, models as synthetic data generators.
Yeah, I think models as judges is like almost certainly going to work.
If you, like, use chain of thought, it's a very easy task.
I think this is a very easy task.
This is how we do RLAIF?
Yeah, yeah.
Though it's interesting, RLAIF, I was looking at that paper again.
And it seemed to really be good for, if you look at it compared to RLHF, it helped with
harmlessness.
I don't believe it actually helped in helpfulness.
It helped to achieve the Pareto-optimal trade-off, which is no decline in the other two.
I think if you compare it to RLHF, it was pretty neck-and-neck.
I don't think there's a statistically significant difference with helpfulness, at least.
But it is interesting.
Like, RLAIF is just effectively getting better at censoring the model rather than improving its, almost like capabilities, right?
Its helpfulness.
Well, RLAIF, like, it'll do it as well as RLHF, but it doesn't offer anything additional there, which kind of makes sense to me.
Yeah. First impressions of NeurIPS?
I mean, very interesting. Lots of very smart people.
I've had lots of very interesting conversations. I'll probably be back next year.
I was kind of lukewarm on it coming in because everyone goes like, oh, it's a big conference.
It's hard to navigate and all that. But then you run into a few papers, people or authors that are interesting.
And then, like, you're here, like, a bunch of other people I want to meet are all here.
Like it's a nice way to get everyone in one place and just catch up on everything.
The house parties are fun. Yesterday it was just a
lot of parties. Yeah. I don't know. To me it's very overwhelming, but I
think the more exposures or epochs that you have on NeurIPS the better. And I'm basically
doing this audio experience to try to bring people in because there's many people
who just never come. Yeah. But they should get a sense of what's going on here. Like I find
there are people here who you've never heard of on Twitter. They're not on there. They just know
more than... yeah. Because they've just done the work. Exactly. They read everything. Have you seen
the DataComp paper?
I'll walk you over and show you.
I was very impressed by their work.
These people, they just come out of nowhere,
and once a year they do this,
and this is the place to find them.
So that's why I'm here.
Yeah, I mean, I completely agree.
Right?
There's really such a good congregation of, like, very good researchers, right?
Yeah.
Are you trying to hire them?
Let's make a hiring call.
Yeah, I mean, look, I think right now we're very small,
very strong team.
We were five last time.
Yeah, so we are seven now.
Only six engineers though.
Yeah.
So very small.
You're more millions than people.
Yeah.
Look, we're a very small, strong team and we're looking to grow the team, but we're looking to grow it very carefully and slowly because I think a lot of companies fall into the pitfall of hiring too quickly.
Yes.
So yeah, we're really looking for fantastic people.
We're seeing incredible traction, incredible growth.
There's a lot more really interesting problems to tackle.
And people should check out our blog posts on that because I think,
like, it's very exciting.
The fundraising post?
Yeah, there's a fundraising post
and then we kind of link there.
There's a problems post
if you go to Anysphere.com slash problems 2023.
There's lots of interesting work to do.
Yeah.
And I think we have a really good chance
of being the team that can crack codegen.
So it's a really exciting space.
I think you'd be joining a very small, strong team.
And so yeah, if you're interested in working with us at Cursor,
we'd love to talk.
You can just reach out to aman at cursor.sh.
Nice.
S-H? Oh, okay. Yeah, well, I thought it was .so.
We might try to get .ai or .com. We'll see. We'll see.
Cool. Well, thanks for jumping on.
Yeah, for sure. Thanks for having me.
So there again, you see one of the topics that I highlighted from my conversation with Jon
Frankle, which is why I put it at the start, which is synthetic data generation in all
its glory. And for Aman and Cursor, they're particularly interested in LLMs as
rankers or LLMs as judges. And that seems to be generally a more blessed way than
directly distilling the output of LLMs.
And you can look out for our episode with Nathan in 2024 to go deeper on that.
Another founder that recently raised and that is the talk of the AI community, particularly
with Guillermo Rauch and Tobi Lütke recently endorsing the product, is Aravind Srinivas of Perplexity
AI, which started off being, maybe we will construct SQL queries for you, and they went
to, maybe we'll construct SQL queries on our Twitter scrape for you.
And now they've blown up as a potential Google replacement,
which is a huge increase in ambition,
but they have the web app and the mobile apps to prove it.
So here is Aravind with Perplexity.
And so congrats on all your success with Perplexity.
The two most recent accomplishments,
which I have seen at least on my feed,
is one, you hit a million people on your mobile app.
That's huge.
On both platforms.
Yeah.
Android and iOS, independently.
Is that because of your slick video editing skills?
Actually, we have a good brand marketing designer.
But, I mean, more than everything,
I think the app's really good and fast. We spent a lot of time on it. In fact,
our first rollout of the app was not that great. It was slow. It used to crash. Users
complained and we listened to that and, like, recruited a good mobile team, and it's much faster
and more reliable. Any technical decisions that drove that? Like, is it React Native that's
slow, or something else? It's all native. We're not, like, on one common React
stack, and the reason to do that is that that's the only way to make the apps feel
fast, right? And I believe ChatGPT also does this. Like, yeah, they don't use React Native.
And then the other accomplishment is pplx-online, which you have, which you're showing on
screen here. Yeah. What are the headline things that people should know if they haven't
heard of pplx-online? Well, it's like the only LLM API that has no knowledge cutoff. Yeah.
So if you're a developer and you just want to prototype products that need information from
the web or, like, have no knowledge cutoff, this is the only way to do that. And it's super fast,
pretty accurate. You have two versions, a 7B and a 70B.
So 7B is super fast, 70B is a little slower, but also better quality.
And we plan to bring it up in the context of the Mixtral MoE as well,
that's been recently released.
I think you've been pretty transparent that they are fine-tunes of Llama 2.
That's right.
We're not in the business of pre-training.
But what do you fine-tune for between Llama 2 and what you have?
Yeah, we fine-tuned for like summarization, the ability to take a bunch of sources
and accurately give you a nice summary.
And you are, I think, the only provider right now with online access or whatever.
But also, like, Grok has access to Twitter, which you don't have.
And they'll release an API at some point.
I'm not sure.
I'm not sure I trust them.
I mean, if they release, we'll be happy to use it, right?
Our goal is to just give accurate answers on the web.
And Twitter is just one part of the web.
Their vision is like Twitter is the everything app.
We believe, like, there's information out there that exists outside of Twitter that's also super valuable.
In fact, like, you can even make an argument that information outside Twitter may even be a lot
more valuable than information within Twitter
because most of the links that get
shared on Twitter are all from outside anyway.
Yes. So it's only like what do you miss out
on. It's like a specific person's
opinion. And usually like
journalists pick on that and like write web articles
so it's all going to diffuse, right?
Good ideas usually diffuse the rest of the web.
So we're not really missing out much.
It's a different source of data.
Yeah, it's a different source of data. Also it's all
about like what do you want?
Is your source or citation, like, an already
highly curated human artifact, or is it, like, some tweet? These are all, like, questions worth asking.
One thing that you do show off, so I was watching you demo just now, you have sentence-by-sentence
citation. That's right. Yeah. That's a design choice. Yeah. Because realistically your source articles
actually overlap. That's right. The full paragraph. So why did you choose to impose sentence by sentence?
That's how we write papers. I'm an academic. Every sentence you write in a paper needs to have
a corresponding citation. As a user, it can be confusing, like, when I click
that link, maybe it's like the third paragraph in a, like... That's right. We can do better at, like,
exactly navigating you to the right part of the link, but we're looking into all that. Yeah, yeah, of course.
I mean, I do see you as like a search engine first with a very good language model team. That's right. Yeah.
Right? Yeah. Answer engine. I'll call it answer engine. Answer engine. You are doing a really good job
with that. I also noticed in your pplx blog post that you also talk about the FreshLLMs paper.
That's right. Maybe could you introduce that and, like, did you talk to the authors? Are they here at
NeurIPS, like, any...
I did not talk to the authors.
Like, you know, it's not like we took a lot of inspiration from it.
Okay.
But it made sense to, like, you know, attribute the citation to them.
Yeah, the intellectual background.
Yeah.
What do you look for at NeurIPS, say, a conference like this?
Oh, we're here for recruiting good, strong researchers to join our team,
especially if they're more focused on, like, shipping models to a search product.
It's used by millions of people.
Awesome.
We'll talk about your
hiring call to action in a bit.
I'm also interested in, like, labs, like Perplexity Labs.
Yeah.
It seems like a place for you guys to experiment
with serving models.
That's right.
Yeah, everybody thinks like, you know,
you start as a wrapper and then one magic day,
you just switch over from 3.5 to like, you know,
like your own model.
That's not how it works.
So in practice, your GPUs crash,
or like your nodes are not working,
or like Kubernetes doesn't work as expected,
and like requests are like not having the throughput required.
You optimize for latency, but then you are worse on throughput,
so you're not able to handle the spike requests.
So all these things can happen, right?
So you only know about these if you start small
and serve a playground where people come and test your own infrastructure
and see how it holds up, and then take the lessons from there
and use it to serve it on production, right?
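The latency-versus-throughput tension described here can be made concrete with a toy model. This is only a sketch under invented assumptions: the cost numbers (`fixed_overhead`, `per_request`) are hypothetical and do not describe Perplexity's infrastructure. It shows why batching more requests per GPU step raises throughput while also making each request wait longer.

```python
def batch_step_seconds(batch_size: int,
                       fixed_overhead: float = 0.020,
                       per_request: float = 0.002) -> float:
    """Hypothetical cost of one batched inference step, in seconds.

    fixed_overhead and per_request are made-up illustrative numbers.
    """
    return fixed_overhead + per_request * batch_size


def latency_and_throughput(batch_size: int) -> tuple[float, float]:
    """Return (seconds each request waits, requests served per second)."""
    step = batch_step_seconds(batch_size)
    return step, batch_size / step


if __name__ == "__main__":
    for size in (1, 8, 32):
        latency, throughput = latency_and_throughput(size)
        print(f"batch={size:>2}  latency={latency * 1000:.0f} ms  "
              f"throughput={throughput:.0f} req/s")
```

In this toy model a small batch keeps latency low but leaves capacity unused, which is the "not able to handle the spike requests" failure mode; a large batch absorbs spikes at the cost of slower individual responses.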
So labs are sort of our playground for testing open source models
and our in-house models that have been fine-tuned for factual accuracy
and helpfulness.
And it's like a nice way for people to test open source models if they're curious about it,
especially if they think about them as alternatives to ChatGPT.
And then it's also a nice way for us to like battle test our infrastructure.
Same thing goes to the API.
Like it's not like I believe these APIs are going to take over GPT-3.5 APIs or something.
But it's a nice way for developers who want an alternative to like explore,
especially those who want to use faster small models, like the 7B models.
And it's also a good way for us to know
how we can handle like search requests and things like that.
Yeah.
I mean, so, like, I want to push back on this.
Like, you said your playground is a way to battle test.
Yeah.
But I think you probably get orders of magnitude more traffic
on your main app than your side app.
Look, we can't just directly ship to the main app, right?
And you can never simulate real use.
It's like a staging environment.
Yeah, it's like a staging environment.
Yeah, it's not just meant to be a staging for,
I don't want to like downplay the importance of labs.
Labs is sort of one of the few places on the internet today
for you to go and explore and compare different open source models.
And it also tells the user how fast our inference is.
We give you all the metrics like tokens per second, the time to first token.
It's also a very transparent way to communicate the speed of our infrastructure,
which helps us also recruit good talent for infrastructure.
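The two metrics named here, time to first token and tokens per second, can be measured around any streaming response. The sketch below is an assumption-laden illustration, not how Perplexity Labs computes them: `measure_stream` and `StreamMetrics` are hypothetical names, and the token stream is any Python iterable standing in for a real streaming API.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamMetrics(NamedTuple):
    time_to_first_token: float  # seconds from request start to first token
    tokens_per_second: float    # decode rate measured after the first token
    total_tokens: int


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamMetrics:
    """Consume a token stream and report latency and throughput metrics."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # the first token marks the end of the prefill wait
        last = now
        count += 1
    if first is None:
        return StreamMetrics(0.0, 0.0, 0)  # nothing was generated
    decode_time = last - first
    rate = (count - 1) / decode_time if decode_time > 0 else 0.0
    return StreamMetrics(first - start, rate, count)
```

Passing a fake `clock` makes the function deterministic for testing; in real use the default `time.perf_counter` is enough.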
Yeah, but I think you're pretty opinionated that you are an app company first.
Yeah, we are not an infra company.
That's right.
You just happen to have a good.
We're not competing with Together AI or Fireworks.
Fireworks or like OctoML.
Yeah.
You know, there are too many of them, actually, honestly.
What do you think they need to do to win as an objective?
I think they need to raise an insane amount of capital
and subsidize the cost so much and capture the market.
Otherwise it's basically going to be impossible
because you're all offering the same thing, more or less.
And Nvidia is basically commoditizing it, right?
Like with TRT-LLM and Megatron and things like that.
So most people's stacks are going to get standardized.
So then why am I paying you?
I'm paying you for the GPUs then.
That's the only game you can play.
It's an economy-of-scale thing.
Which you're also buying your own GPUs and running your own stack.
That's right.
But we care about buying GPUs to serve our own product more than helping other people serve their products.
Yeah.
What have you learned being like a, I don't know, I feel like you're both an infra CEO and an application sort of product CEO.
How do you balance that?
Yeah, it's difficult, but, you know, like you, one thing exists in service of the other, right?
Yeah.
Infrastructure exists in service of the product.
You always have to remember that.
For some people, product exists in service of infrastructure.
That's not how we are.
What does Perplexity become a year or two years from now?
Hopefully, like, a lot more people start using it as a Google replacement.
I see.
You already, I read some stats somewhere.
You're 10% of Bing traffic?
I don't know about that, but.
Someone was measuring, like a third party, like a Similarweb type of thing.
Yeah, maybe for, actually for Bing Chat, we might be even further ahead.
Okay.
Like, it's just Perplexity versus Bing Chat, not Bing.com.
Which is crazy given that they have so much distribution, right?
Oh, yeah.
And marketing power.
But you are more AI native than they are?
That's right.
In a sense.
That's right.
You are a different search index.
Like you have your own crawlers.
We have our own crawlers.
Yeah.
So like if I don't want Bing, then I use your stuff.
And maybe you turn up for your stuff.
That's right.
Yeah.
That's cool.
Okay.
Hiring.
What are you looking to hire?
What people,
what should people demonstrate when joining you?
I think you have a very strong perspective on the kind of culture that you're building.
Yeah.
I mean, we work pretty hard and like we want to get stuff done fast.
So if you enjoy like fast shipping cycles and.
Can you give an
illustration? What do you mean by that?
You know, every two weeks, like, we have some announcements we make.
So we work on very clear, precise projects that have, like, clear deliverables, and we
kind of constantly want to keep improving the product.
So as a machine learning research engineer, if you're excited about, like, training models
and shipping them to production for such a useful use case, like consumer search, and want
to do it at the same velocity as, like, as us, like a startup rather than a big company that
has to wait for several months to get something into production, that's a unique spot,
like, to be in, right? And you also want to be part of a growing exponential rather than something
that's trying to defend its territory, right? Like, defend its territory. Yeah, like, Google.
Google's defending. I see. Yeah, yeah, yeah. So, they're attacked. Yeah. So you want to be
an attacking case. Have you heard, like, what does Google say about you? Like, are they interested in
buying you? I think they're, I think they've been pretty appreciative and respectful of the product, right?
But like SGE is not great for some reason.
Yeah.
By the way, I don't think Google people are not talented.
Like they're probably more talented than we are.
I think it's just that their incentives are not clear
and they might have to cannibalize their own business model.
This is the classic innovator's dilemma, right?
Exactly.
They have a cash cow and they need to preserve that.
You don't have ads, but you're serving subscriptions.
And that's the main business model for now.
As of today, yeah.
That's it.
Well, thank you very much.
Cool.
All the people that we talked to so far and some of the best founders I know, whether or not
they're in AI, are fierce nerds.
And Aravind definitely reminds me of the fierce nerds concept.
But I don't think I'm the best person to tell that story.
Maybe I'll tag in Shaan Puri.
Have you ever read that Paul Graham blog post called Fierce Nerds?
No, what is it?
It's an amazing post.
I'm going to read you a couple pieces of it, but it's one of those, like, Paul Graham, I think,
somebody said this earlier, they go,
what's the guy, Andrew Tate?
They said some tweet that was really funny was,
Paul Graham was my Andrew Tate, like growing up.
Same.
So funny.
So it's such a funny, it's such a deep cut joke.
But if you get it, you're like, it just hits the spot.
All right.
So he wrote this post and he goes, most people think of nerds as quiet,
you know, sort of like diffident people, right?
Just sort of like, you know, passive.
And in most social situations, they are.
They're quiet.
And, you know, they're not the star quarterback in the middle of the gym, right?
They're kind of a fish out of water in a bunch of different things.
He goes, but this is an illusion because that only happens when non-nerds observe them,
because they're observing them in non-nerdy situations.
So you see a nerd at prom, you just see them as a quiet sort of passive nerd.
There's no alpha in them.
But in fact, some nerds are quite fierce.
Fierce nerds are a small but interesting group.
They are extremely competitive, more competitive, I would say, than competitive non-nerds,
because the competition is more personal to them, partly because they're not emotionally mature
enough to distance themselves from it,
but also because there's less randomness
in the types of competition that they engage in.
Therefore, they're justified in making it more personal.
I'll cut it off there.
That's a clip from the My First Million podcast,
and that's a story about how Dharmesh Shah,
the HubSpot CTO, is a fierce nerd.
And I really like that concept
because one, it helps to validate that nerds can also win
and why nerds can sometimes win more than regular people.
And obviously for more, you can read that Paul Graham essay.
But I think Aravind is a fierce nerd, and I think Perplexity is a fierce nerd company.
They do have competition, though.
It's not like Perplexity is the only company going after Google, not the only company
going after search.
One of my favorite parts in compiling these ensemble episodes is juxtaposing two competitors
next to each other, or people who disagree and have different worldviews.
Like, you just heard Perplexity, we just heard Aravind dunk on all the infrastructure companies,
including Fireworks, which we just had on.
Now, I'm not the right person to tell you
who's right and who's wrong, but I know for a fact that they cannot all be right, and that's
what's fascinating. That's what makes a market. So next, in full disclosure, is a personal friend of
mine. It's Will Bryk from Metaphor Systems. Metaphor launched end of 2022 with an AI search
engine narrative as well, but their approach is more of a pre-trained LLM research engine as opposed
to Aravind's answer engine. They're all very minor differences in the end. At the end of the day,
people want to punch in a query and get results. And Metaphor's approach is different. They are going
after the infrastructure play rather than the application plus infrastructure play.
And it's just nice to contrast them together.
And I'll leave the conclusions to you.
What is Metaphor?
Metaphor is a search engine over the internet, but it's better than Google at handling complex queries.
Okay.
Why is that?
Why is that?
Because we train an algorithm from scratch, a search algorithm from scratch to handle complex
queries, basically.
It's a totally different algorithm, yeah.
Why are you at NeurIPS?
I'm at NeurIPS because we want to learn about all the cool things people are working on
and also because we want to hire some crazy good researchers
to help build the future of search.
Metaphor has a search engine.
That's what you launched this last year.
And then you also released an API.
And I've actually been using the API.
It's actually really good for augmenting LLMs with search.
I don't know how much you want to lean toward
being an app versus an infrastructure company.
Yeah, so we're leaning towards search infrastructure.
So we really see ourselves as like,
we want people to build applications on top of us.
We see the future as like everyone will use LLMs
as interface to everything, and we want to be powering
the search that underlies that.
I think we want people to build really cool UIs
on top of our search, but the hard part
and the thing we're focusing on
is really good search results.
Can you give examples?
You have some really cool examples of like tweets
and books and PDFs and stuff?
People really get excited about like researchers
working on something similar to them
in the Bay Area or something like that.
People have actually met.
Oh yeah, yeah, competitive intel research as well.
People have met people in real life based on searches
because the results are so high quality
and they're not, you know, SEO spammed in any way.
It's just like exactly what you're asking for.
That, you know, it's cool to see that, like, digital information to real-world interaction thing happen.
I actually also interviewed Arvin from Perplexity, who I feel like is also in that sort of search domain.
But he's less focused on search infrastructure.
He's more focused on just being a search engine.
I don't know if you, like, compared yourself to Perplexity in that way.
Yeah, I know.
We get asked this a lot.
I mean, Perplexity is doing a great job at combining LLMs, you know, with search results.
and that makes, you know, that it does make for a better search engine.
That is the future of the user interaction.
But we're just like more focused on the search results themselves
and really trying to handle the queries that, you know, Google, Bing are not good at.
Yeah.
So, I mean, we want people to build LLM style interactions on top of our thing as well.
Wait, so you say Google Bing are not good at?
Do you think that people will use you in complement to Google and Bing?
Or do you just completely replace that?
At least in the beginning.
Like we're going to be used in places where Google and Bing don't
work well. So I mean if your application wants to know the weather, wants to know like that Taylor Swift
song, basically if your application knows the right keywords to search with, then sure Google and Bing
are going to be fine for you. Yeah. But if you want to, you know, make these complex, almost
metaphorical queries with natural language, which are really the most powerful ones, then you
should be using Metaphor. Yeah, yeah. I was actually walking from your, we're walking from your
sort of sushi party that you just had, like at a recruiting event. I hope the food was good.
It was pretty good. It's weird. I love me a little bit of sushi. And I was actually talking to
people about your sort of auto-prompting feature because a lot of people, there was someone from
Midjourney there and they were saying how DALL-E 3 also does sort of auto-prompting or rewriting of the
prompts. Yeah. Is there an art to auto-prompting? How do you feel about your
auto-prompting feature, basically? Yeah, autoprompt is like we convert, we use ChatGPT, basically,
to convert the queries that come into the search engine into queries that are formatted for
Metaphor's models, because Metaphor is trained to predict links given text. So the model really, like
the best way to prompt Metaphor is to search in a way where a link naturally follows,
which can be confusing. So we have this auto-prompt that converts into the right format.
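The rewrite step described here can be sketched as a small wrapper around any text-in, text-out language model call. Everything named below is an assumption for illustration: `REWRITE_INSTRUCTION`, `autoprompt`, and the `rewrite` callable are hypothetical and are not Metaphor's actual prompt or API. The one idea taken from the conversation is that the final query should read like text a link would naturally follow.

```python
from typing import Callable

# Hypothetical instruction text; the real autoprompt wording is not public here.
REWRITE_INSTRUCTION = (
    "Rewrite the user's search query as a short statement that someone would "
    "write immediately before sharing a link. Query: {query}"
)


def autoprompt(query: str, rewrite: Callable[[str], str]) -> str:
    """Turn a raw query into a link-completion style prompt.

    `rewrite` is any LLM call that maps a prompt string to a completion string.
    """
    candidate = rewrite(REWRITE_INSTRUCTION.format(query=query.strip())).strip()
    # End with a colon so that a URL is the natural next thing to appear.
    if not candidate.endswith(":"):
        candidate = candidate.rstrip(".!? ") + ":"
    return candidate


if __name__ == "__main__":
    # Stub standing in for a real model call.
    stub = lambda prompt: "This is the best tutorial on async Rust."
    print(autoprompt("how do I learn async rust?", stub))
```

Injecting `rewrite` as a parameter keeps the sketch independent of any particular model provider.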
You can kind of think of Metaphor as in the same state as like what GPT-3 was in.
I don't know if you guys remember, or if you remember.
It's not instruction-tuned. Yeah.
Yeah, it's like, you know, two years ago, GPT-3 was autocomplete.
So you had to like prompt it in order to get the best output from it.
It had a lot of power, but it just had this weird user interface.
Metaphor is in a similar situation.
The problem is when you RLHF, you like, and we've tried this, like, it does reduce the power of the model.
And like it's just okay to keep, because like often we're using this auto prompt,
like it's okay to keep this model the way it is requiring this auto-complete type search.
And yeah, would you call yourself a search LLM?
Like very, very long ago, the original pitch for Metaphor that I heard from you guys
was you're an LLM that predicts links instead of tokens.
Oh, well, an LLM is like, yeah, I mean LLM is like, it's modeling like, yeah, usually like language.
And we're not really, we're not exactly generating the links.
We search over an index.
They're not hallucinated at all, right?
They're actually from an index.
Yeah, I wouldn't call it a search LLM.
It's more like really a search engine.
You might even think of it as a research engine.
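The point that results come from an index and so cannot be hallucinated can be illustrated with a generic nearest-neighbor lookup. This is a sketch of one common retrieval technique (cosine similarity over stored vectors), not a description of Metaphor's model; the function names, the toy two-dimensional vectors, and the example URLs are all made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_index(query_vec: Sequence[float],
                 index: Dict[str, Sequence[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k indexed URLs most similar to the query vector.

    Every returned URL is a key of `index`, so nothing can be invented.
    """
    scored = [(url, cosine(query_vec, vec)) for url, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_index = {
        "https://example.com/a": [1.0, 0.0],
        "https://example.com/b": [0.0, 1.0],
        "https://example.com/c": [0.9, 0.1],
    }
    for url, score in search_index([1.0, 0.0], toy_index, k=2):
        print(url, round(score, 3))
```

A generative model can emit a URL that does not exist; a lookup like this can only rank documents that were actually crawled and stored.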
There are a lot of different ways we're trying to explain.
I mean, I think we're using terms that were developed in an old era for a new type of thing.
So we might have to invent new words or wait until they are created.
Yeah, yeah.
What else should people know about Metaphor in general?
What's the interesting work you guys are doing?
I think just like the vision is super exciting, and I think people don't realize how exciting the vision is.
Basically, the vision is to solve search.
What does that mean?
It means no matter how complex the query,
Metaphor should be able to handle it.
So we're talking like, you know, like AI researchers,
similar to you who are in the Bay Area,
who've worked on Rust before,
who went to so-and-so college,
who would be a great candidate for this startup,
whatever it is.
Like, we should be able to handle it.
And language models are powerful enough
to understand language at the level of a human.
So you should theoretically be able
to make a system like this.
It's just a matter of how fast can it be.
And we want to make these things
like do all those complex queries
really fast. And imagine if you could do this, imagine if this was possible, and then you combine
that with like, you know, GPT-4, GPT-5, and that's how we want our customers to combine us with,
you know, combine us with GPT-4, GPT-5. Suddenly now you have the ability to literally answer any information
query, no matter how complex. That's, like, the entire world's knowledge is at your fingertips.
That's, like, insane. We basically become all-knowing. Yeah. You know, omnipotent.
Yeah, that's...
Omniscient. Omniscient, and then omnipotent, because knowledge is power.
I skipped a step.
No, no, yeah, I can do that sort of QED proof of why omniscient equals omnipotent.
I am very excited about you guys, you know, I've seen you grow literally from your living room and it's definitely not over.
What's it like having a meme celebrity CTO who keeps tweeting viral shit?
I mean, I love it.
Like Jeff literally just goes, Jeff has figured out Twitter.
He just knows how to go viral because he has really good takes.
And we often throw a party in response to his viral tweets.
So you want to talk about the Andrew Huberman party?
So he had a tweet that was like, Andrew Huberman has single-handedly
destroyed the SF social scene because like everyone, whatever,
is like sober at parties and goes home early.
And so of course we had an anti-Huberman party
where everyone stayed late and we had like a bunch of beer and like everyone.
Well, my favorite was all over the apartment that we had the party in.
You plaster like quotes from Andrew Huberman about alcohol alcohol's graphrey.
Alcohol will destroy your brain and all these things.
Look, I mean everything in balance, right?
Like we should have fun in life, but
also, you know, be safe and everything.
And then he had another tweet about, you know,
how he was going to go on a date, but the girl ghosted him
and that allowed him to focus on coding that night.
So, of course, we had to have, like, a "ghosted in SF" party
where everyone came to code together
because you're already going to be ghosted on Friday night.
You might as well code together while you're at it.
Yeah, I love that part of the social scene.
I think Metaphor is also really driving that somehow.
So congrats for all you do, and it's just nice to check in with you.
I've personally been enjoying the Metaphor approach to
LLM search APIs. I've often said this in the context of the capabilities of GPTs. So if you think
about it, what are the capabilities of ChatGPT as it is today, as well as GPTs as announced on Dev Day,
right? There's the LLM base layer, but then you tack on three core capabilities on top of it,
right? One is retrieval-augmented generation, when you upload files and then you do RAG on them.
Second is a code interpreter, where you generate code in a sandbox and then you run code and you
correct code and finally you execute it. And third is you have a search feature. And so we have a
bunch of companies competing for the RAG functionality. You can check our episodes with
Harrison of LangChain and Jerry of LlamaIndex this year. There's a bunch of companies competing for
the code interpreter capability. There's obviously Replit, but then abstractly there's also Deno and
Val Town, and anyone who runs code is in that game, basically. But what is surprisingly uncontested is
open web search. And so far I think it's Perplexity
and Metaphor that are leading the pack in their different approaches. One, the PPLX API is an
integrated LLM plus search API, and then two is Metaphor, which is search only and you kind of
bring your own LLMs. For our next guest, we're actually going to go over to our last return guest,
which is one of our most recent hits, which is Jeremy Howard, previously of fast.ai, but now of
Answer.ai. It seems that all people want is answers and Jeremy doesn't have them, but he has
questions.
I almost grabbed the wine.
I realized I had to be the interviewer, and I was like, I probably should have wine.
And I had to pick a wine, and Sean told me, pick the most expensive one.
Yeah, it's on Decibel.
Because Decibel is paying for it.
The one I'm having is from a $160 bottle, and it's really good.
And I did the same.
And I'm not having any wine.
You're too young to drink.
Yes.
Could we go around and identify voices for people listening?
Maybe Tanishq, do you want to start?
Sure.
My name is Tanishq Abraham.
I am the CEO of MedARC, which is a medical AI research organization.
I also work as a research director at Stability AI,
and I've been collaborating with Jeremy Howard for more than a year,
like a couple of years maybe,
and he's also the president of MedARC,
and he's been heavily involved in my venture as well.
And you have a podcast together, which I really enjoyed.
Oh, yeah, yes.
Jeremy had me on his podcast, which was a...
Your first and only episode, or what the hell?
Yeah.
It turns out that maintaining a podcast is hard.
Ah, it's easy.
Just shove microphones in front of people's faces.
So I'm Jeremy.
This is my voice, and as of today I'm Jeremy Howard of Answer.ai, I guess, and a repeat guest on Latent Space.
Your last episode did really well in terms of the number of views. Yeah, you guys are good
interviewers. Well, also you drop a lot of spice, which is what we like as podcasters.
We also have Jess Leao for the first time. Hey, yes, hello, I'm Jess Leao and I'm a partner
at Decibel, excited to be here. Excited to be providing the wine also.
Standing in for Alessio. Oh, so good. Alessio ditched us tonight, right?
So you're the better replacement.
Yeah, it's good because in a previous conference,
Alessio was wearing my badge and replacing me,
so now I can be Alessio for today.
Well, you just were a shorter version of Alessio, basically.
So today was the Answer AI announcement.
Maybe you wanted to cover that.
Just like, what should people know about it?
What should people know about it?
Oh, I don't know, man.
You went from fast.ai to the dark side now.
No, it's not at all.
It is the light side.
It is actually, it is the light side.
But fast.ai, look, I spent the last week in San Francisco, and the amount of love I received
for fast.ai was overwhelming.
I couldn't believe how many people told me it changed their life, you know, which is just amazing.
But I have to say it's actually time to be rejuvenated.
You know, the mission is the same, bring AI to as many people as possible.
But now we can't do it on the back of my
bank account. I've been paying for everything with my wife. You know, we can't afford it.
You've had donations and stuff? No, no, no, nothing, actually. Sorry, you were
very steadfastly against donations, I remember this. Yeah, no donations, no revenue of any
kind, totally independent. But now, you know, I think we can do a better job by having a
bank account with money in it. So thank you, Jess, for sending us money.
Jess, what is it like when someone like Jeremy comes and, like, goes, you know,
we need a bank account?
You know, there are some people that you go through a pitch,
and then there's some people that you email and you start prepping the wire,
and I would say that Jeremy fell into the latter.
I didn't even ask for this money.
I was just going to have a chat with Alessio to get some advice.
And then Jess turned up, and Jess's other partner, John turned up,
and I was like, what are you guys doing here?
They're like, oh, we'd like to give you money.
So I was like, oh, okay.
So that was good.
They have good taste, right?
Yeah.
I've talked to you a bit, especially at the Modular conference, which I'm wearing the badge of.
Nice hoodie, yeah.
The hoodie is really nice.
So you're interested in fine-tuning.
You're interested in fundamental research.
Could you list out the main areas of interest, maybe?
I mean, basically the interest is in making AI as useful and valuable as possible.
Yeah.
That's how we make it, like, as accessible as possible, as widely used as possible,
help as many possible people as we can with this technology, right?
So how do we do that?
It needs to be cheaper, it needs to be faster,
it needs to be easier to use,
and it needs to be more integrated into people's day-to-day lives
into the stuff that they do.
This is like hard, you know,
and so in the end, I guess I was inspired by Thomas Edison's invention factory
in the late 19th century,
where they had the same situation.
They were like, oh, look, electricity has been invented.
Okay, what do we do with this?
It's a source of power.
I don't know.
And they're like, oh, let's create the record player
and the light bulb and the refrigerator.
And, you know, it's like recognizing that now you have electricity,
you can make all these things.
That's hard.
It requires really smart researchers
who deeply understand the underlying technology.
Recognize, like, oh, there are some gaps
here, but they could be filled if we, like, use this different kind of filament or whatever.
And so you actually need, like, deep technical experts who also have the, like, curiosity and playfulness
and spontaneity to, like, think, like, oh, what if the world had this new thing in it? I wonder if we
could put that thing in the world now that we have AI. Yeah, you were very complimentary of, like,
the open source... so we last met at the open source
meetup as well.
We met so many times.
And you're very complimentary of their approach towards just trying things,
like model stacking, for example.
Is that the kind of people that you're looking to collaborate with?
I think partly, you know, I'm deeply involved in the open source community,
and I want to continue to do that.
You know, all the best kind of models outside of your kind of OpenAI
and stuff are all created by the open source community at the moment through just trying
crazy things. But it'll be a mix, you know. I also want to work really closely with the
best academics in the world, you know. And I also want to collaborate with the people
in parts of the world we've never even heard of, who never get a chance because
nobody gave them a chance. And, you know, so one of the things we're going to be doing a lot
of is, like, recruiting in really weird ways, you know, to find those people who are
underappreciated. And would it be like a challenge, like a Kaggle-type challenge?
Yeah, like Kaggle-y kinds of things and, you know, potentially find ways, you know,
or through, like, open source bounties and stuff like that. Like, basically give people an
opportunity to show that they can do amazing shit that nobody else can do. Yeah. Doesn't
matter how old they are or where they live or what color their skin is or whatever.
Yeah, I think what the fast.ai community has shown is that there are a lot of people who don't have a traditional background
who are really talented people.
And I think, yeah, it's great that that was there for, that the fast.ai community was there
and that Jeremy continues to highlight those talents as well.
Let me give props to Tanishq as an example, right?
So Tanishq is the CEO of a research lab of which I'm the president, MedARC.
And how old are you, Tanishq?
I'm 20 years old, which is why I'm not drinking the wine.
You know, so, like, Tanishq is a great example of somebody that most people wouldn't hire as a CEO.
But why the hell not?
Like, he finished high school 10 years ago.
He finished high school at 10.
You know, he had his first degree at, what, 14?
Like, he's somebody who's, you know, doesn't die.
That's somewhat, like, he went after the traditional accreditation, the pieces of
paper that you would pursue to show yourself as qualified. So in a way, he's part of that
status quo. In a way, but, you know, unfortunately people are ageist. Yes, they are. And so...
And I also know that I never actually did a computer science degree or anything like this.
My start with AI was actually through the Fast AI course. Yeah. So, yeah, and so it's been a long journey since then. Yeah.
What would you ask him about Answer?
Because I already know a lot of what's going on at the company.
What is he not saying
that he's too humble to say?
I think what he's not saying, he already has a great team of researchers that, you know,
there are already two researchers that are at Answer AI that are amazing researchers
that I've had the chance to also interact with over the past maybe a year or so closely
and also just more generally.
I'm looking forward to seeing what Answer does.
And I'm really excited to continue to collaborate with Jeremy.
I think this would be even better for me.
Like I'm selfishly, I'm very excited because I think it'll be better for me, you know,
to work closely with Jeremy as well.
Even though, you know, he's in his own research lab,
but I think the collaborations that will come out of this will just be amazing.
So that's what I'm excited for.
And Jeremy, last time you were on the podcast,
you said that, you know, one of the most consistent pieces of the advice
that you always give is that people just need to show up, follow through,
do the work, that stuff.
Obviously, Tanishq did that.
Yeah, so Tanishq is one of those rare people, right?
But, like, I feel like Tanishq is more special than that.
Like, what else did he do really well?
Yeah.
So, I mean, God, how old were you when I first came across you?
Like 15 or something, maybe?
Wait, what?
It's been so long?
Yeah.
Because he only took fast.ai a year and a half ago.
No, no, no.
He was a fast.ai student back then.
Okay.
And, you know, he kind of got on the forum,
helped answer questions, you know, asked interesting questions of his own. To stick with that for
five years, that's tenacity, you know. And the last course we did was the hardest course
we've ever had. It was the diffusion course. It was the first ever Stable Diffusion course,
and none of us knew what the hell was going on. And, you know, he was the one who slogged through
the math, figured out
what the hell all those Greek letters were saying, and did
the first "math of
Stable Diffusion" video that,
as far as I know, ever existed.
You did that with
Wassim, right?
Along with Wassim.
So, you know, he's like,
he slogs through difficult shit.
And the thing that I
noticed now is like, you know,
Tanishq is kind of famous, or was kind of famous,
as a child prodigy.
You did a TED Talk when you were 14.
He did a TED... I was nine, but I did it.
He was nine, okay.
And like, so I kind of thought like, oh, things are easy for child prodigies.
You know, they're so smart that they just, it's easy.
And I'm like, oh, no, actually, Tanishq's nearly as dumb as me.
And so he just works really, he just works really hard.
And he's, and like, I'm like, Tanishq, what does this mean?
He's like, I don't know.
Like, oh, okay, we better figure it out.
And so that's been interesting.
to see that, like, actually child prodigies have to work really, really hard as well. You
know, that's part of what makes them a child prodigy, is that they're tenacious and they don't
give up, even over five years. Does it look that way to you? Is that what you? Yeah, I think so. And I think,
again, part of it... You agree you're nearly as dumb as me? I know. Say it again for the pod. I think
Jeremy's trying to trick me here, but I think, um, the fast.ai community has been so
friendly that it's been a really pleasant experience to stay with that community. And I think
that has also enabled my tenacity, because I enjoy being in that community so much. So that's why I've
stuck around in that community for so long. So without that, without the community that Jeremy has built,
I don't think there's any way I would have stuck with it. Yeah, I had the same with freeCodeCamp. So, you know, I think a lot of it
has to do with... I'm gonna cry. I think a lot of it has to do with building good communities,
and Jeremy has done a really good job of doing that.
And it's actually a lot of hard work to build a good community
and to nurture and grow that community.
And I've been in many communities
and I've kind of observed how different communities
in the AI field have grown.
And fast.ai still is one of the best communities
that I've had a chance to be a part of.
So, you know, again, props to Jeremy for doing that as well.
I'm so embarrassed right now.
I want to give you the perspective.
You've been an AI investor for a while.
Yeah.
And, like, how do you view this community
at this moment here?
The one thing I will say to the conversation that we were just having that I think is awesome
is we can move here a little bit.
Yeah.
There's so that people keep coming and drinking more wine.
It's great.
It's a mobile studio.
Yeah, more truly.
Mobile studio, middle of New Orleans.
Let's go.
One of my favorite heuristics as an investor is distance traveled,
rather than just, like, what do I see today in your resume or whatnot.
Because I think if you just
go by a certain pedigree or credential or whatnot,
you miss a lot of people who have traveled a really big distance,
who didn't have advantages to certain opportunities,
or came from different places, or not from the US.
Like you name all the different, you know, all the different lists.
And I always try to look for those kinds of people
because they're the ones that are always pushing the frontier
and really run through walls.
And I think this conversation is a good example of that, right?
I mean, no one has a longer distance traveled than Jeremy.
100%.
Well, literally and in that sense.
Literally from a lot of time.
Australia, yes. Yes. And when we were, and I think when we were meeting last week, you were talking
about this a little bit around looking for engineers and people at places that, it's not necessarily
where everyone else would be looking at, but that has yielded some of the best, like, deepest
relationships you've had, right?
Oh, absolutely. I mean, companies turn resources into valuable products and services, right?
Like, what are the resources that we suck in? It's like, it's people and GPUs, you know?
And money.
And, well, we need the money to get those GPUs and the people, right?
Like, the GPUs are, you know, reasonably, like, you can replace one with another, no worries.
So it's actually the competitive advantage, the thing that makes you different is the people.
So this is the most important thing for us to achieve our mission is to build this team, you know,
to build this really special team.
And I think the way to do that and the way I've always built teams is to say,
is to look at people and say like, okay, where is this person now?
And what would it have taken them to get there?
You know, like, so if somebody's like, you know, was kicked out of high school,
you know, because they were dyslexic or because somebody was like grew up in the mountains
of Bangladesh and didn't have a PC until they were 16 or, you know,
somebody fought against, you know, a woman who grew up in an environment in which she had to fight
against, like, institutionalized sexism or whatever. It's like, these are the people to me. I just
kind of go like, okay, this person's gone from like negative 43 up to 99. Yes. Overcome a lot.
That's a kick-ass amazing person, where somebody who's gone from like 98 to 99 is like, okay,
it's cool. But they're probably not the people who are going to
like change the world. Yeah. And so we want to be a small team where like literally every person
is somebody who can change the world. And the nice thing is when you're in a small team like
that, it's just really enjoyable because everybody's like just really great to be around, you know,
really inspiring. And so yeah, that's why we're kind of looking for these extremely special
individuals. Yeah, cool. So that's a hiring call explicitly, you know, if anyone's listening,
who fits that profile and really wants to work with you, they should reach out, right?
Yes, absolutely. And now we have a website to send people to. So I was going to wrap it up with
just overall NeurIPS tips, right? Like, what is it like to be at NeurIPS this year if you've
been here before? And also, like, what's your best tip for doing NeurIPS right? Anyone can take it.
I guess I'll start. This is my second NeurIPS, so maybe I don't have a lot of experience with it, but I mean, I've been enjoying it a lot so far.
For me, I think it's about networking with people, and that's the best part of NeurIPS, because at the end of the day, AI moves so fast that half of these papers are already kind of outdated.
Like, you know, we've already seen like...
They were written months ago, right?
Yeah, yeah.
In order to get here, they have to be reviewed.
Exactly. So, you know, we're already seeing the second
version or the third version of a lot of these models already. And, you know, so, I mean,
it's, for me... So arXiv is all you need.
arXiv is all you need, I guess, yeah. So for me, the value comes out of talking with people and
meeting with people and networking. And that's why, you know, we're coming to events like these
to network and, you know, make these connections. And, you know, I actually meet a lot of, like,
collaborators and other researchers at all these conferences. And just to be clear, when you say networking,
like, it's not, like, networking in that sense of, like, getting ahead. It's
a really nerdy kind of networking.
So like earlier, Tanishq and I were at another reception
where it's like, oh, there's Albert Gu.
He's the guy that like two days ago released the Mamba paper.
And we got to go and say, like, oh, you know,
let's... we had a conversation about state space models
and why he's using that and what he thinks
the opportunities and limitations are,
or is there still room for attention?
And like, so when we say networking, you know,
we mean like geeking out on deep conversations
about people's academic areas of interest.
Yeah.
I always follow up the question of like,
okay, like what's your name where you work?
And then what are your interests?
And then we tend to go from there.
Yeah, just like what paper did you write last or?
You know, I will say one thing.
So even though the posters, there are a bunch
that truly you go by and even the people presenting
are like, yeah, this is kind of out of date.
The one hack that's really fun is a lot of those people
are also already working on the next thing
and they can give you sort of an early preview
of something that actually is not on arXiv yet.
And so that has actually always been one of
my favorite parts of the conference,
actually just walking around the poster session, shaking hands with people who are presenting and
learning about what they're most excited about, what they're working on, what are some of the new
things. So I find that really fun. And also in my case, since I'm a VC, my best tip is throw an event
with a lot of good wine and let the people come.
Jeremy, you have any tips?
I mean, like Tanishq, it's only my second NeurIPS. But I've been to quite a few conferences
in general. And my number one tip for all conferences
is don't go to any sessions.
Yeah, just stay outside and talk.
Like, whatever they're saying very, very slowly,
and they're probably not an expert at verbal communication either.
You can probably get the better version
by just reading the damn paper that they're reading out to you.
So don't bother with that.
So I'm like, yeah, hang outside, you know, in the hallway,
look on the app to see who else is around
and reach out to them and, like, try and, like,
find a group of six or so interesting people
to go and, like, check out the, you know,
local Louisiana sausage special outlet with whatever.
Yeah, that's...
Reception hopping.
Yeah, reception hopping.
This is our fourth reception tonight.
Oh, my God.
Fourth and best, right, Jeremy?
Oh, fourth and best.
This is why we came to this one last,
so we can hang out here until the wine's finished.
So a lot of people hate on the official NeurIPS conference app, Whova.
But I kind of like it because of one thing: people can organize their own meetups and list them there.
It's awesome.
It's awesome.
It's actually really good.
Yeah, so I'm Brazilian and there's a Brazil like little chat.
And it's so fun.
Everyone's talking in Portuguese, talking all the time.
They're sharing all the things that.
And these are people talking about actually like interesting concepts in Portuguese.
So it's actually really fun.
I love the app.
And I didn't even know you're Brazilian.
I am.
Yeah, Leao with the old school.
Yeah, my accent kind of like trips people.
And it also trips people when I say something incorrectly and you can't really tell.
But I'm like really Brazilian.
Yeah.
Well, we should we should do a steakhouse next time.
Yes, please.
Yes, please.
That's one of those dinners.
Churrascarias, right?
Yeah, churrascaria.
Exactly.
My favorite was, there was a meetup for people who are interested in sushi.
That was the meetup.
I love it, yeah.
Nothing machine learning about it.
So at ICML, it was really fun.
There was one meetup that I went to that was just like swimming in the morning because it was in Hawaii.
It was actually kind of awesome.
And then people were like actually discussing like super legit topics in the ocean.
I'm actually kind of sad I missed on ICML, but like it felt indulgent to go to Hawaii for that.
Yeah.
Okay.
bring it to a close. The last thing I was going to say is, Jeremy, I don't know if you know,
I picked your meme as the best meme of November 2023. It was Laundry Buddy. So what's up with
laundry buddy? Why do you hate it so much? What did it do to you? No. It did nothing to me.
For people who are out of the meme, what did you do? I couldn't have walked it back more.
Jeremy did walk it back on Twitter. Are you really going to make me revisit my shame?
I just think it's a fun story.
Just for your show.
Some people don't know.
Some people don't know.
I made a bold claim that Laundry Buddy was not the peak of OpenAI's path to societally beneficial artificial general intelligence.
I was wrong.
It is, in fact, very much on that path.
It is well loved.
To be able to know that the world's best artificial
intelligence can help you figure out how to sort out your whites and your colors,
whether to use powder or pods, and what to do, you know, if you get a stain and you don't have
laundry nearby, it's special, it's important, and it's a part of my life that I will
never want to be without. I love that, so ChatGPT now has an official Twitter account
and they even got in on the Laundry Buddy meme, which is amazing to me.
I actually spent a couple of hours this morning hanging out with Boris Power from OpenAI,
who, you know, was in there batting for Laundry Buddy from the start.
Wait, there's an anti- and pro-Laundry Buddy?
No, I mean, he was just a particularly strong enthusiast.
He had the grace to not even bring it up, unlike you.
I had to. It was so funny. I cracked up so much. It was great.
Well, thanks for chatting. I'll return you back to your evenings.
May your clothes be well-laundered.
Thanks for having us.
Cheers.
Thanks.
That was Jeremy Howard, together with Tanishq Abraham and Jess Leao.
Tanishq and Jeremy recorded a podcast,
I believe, so if you want to learn more about Tanishq,
he's done long-form interviews in more detail than I can cover
because it's a lot of biomedical stuff,
and that's one of the areas that we are not very knowledgeable on.
And for Jess Leao, she was an investor in Mosaic,
is one of the newest partners at Decibel,
and led the round in Answer.ai.
Next, we're going to go to some people on the show
floor of the NeurIPS Expo. They're not people I had prior relationships with, but they're
still doing interesting work nonetheless.
And the first is we're going to check in with Cerebras, which is not only producing giant,
massive GPUs, but also publishing interesting research.
So here's my conversation with Joel Hestness, principal research scientist at Cerebras Systems.
That started working about a year ago.
We started building out multi-box systems so that we could do cluster level training, so larger-scale
models. And so this last year we've just been, like, showing off what it's
capable of. So early this year we started with our Cerebras-GPT models that
showed, yes, compute-optimal scaling for this, so Chinchilla-style scaling, but it's
open source. All those models we released open source. Based on that work, we got the
attention of a few different groups. One of them was the Opentensor Foundation,
and they came to us and said,
hey, we want a great 3 billion parameter model
that does, so it's something that's easy to deploy,
like in a laptop or something,
and we want it to do very general language capabilities,
long sequence length.
And so we trained the BTLM language model for that.
Concurrently with that, we also had an engagement
that started up with Group 42 in the United Arab Emirates.
So that's this poster, Core42.
They had interest to train large Arabic language models.
So the first demos that we did for them were just Arabic models,
but then they said, let's do multilingual Arabic and English.
So we've been training the Jais 13 billion and 30 billion parameter models this year.
We've released both of those publicly.
The first version of the 30 billion just came out.
And the quality of that model in Arabic
is better than any other public model currently.
And then in English, it's competitive with models like Falcon 40B.
So we're on a good track there.
More releases to come through Core 42.
We're excited to have that be open source
and to contribute to the community there.
Yeah.
And then...
I think totally, I mean, since we're already chatting,
so might as well keep going.
But the UAE also notably has the Falcon, or TII,
Institute. Are they related? Are they competing with each other? What's going on?
Initially there was a little bit of competition. They're funded by different people,
different groups. But there is a countrywide effort going on in the United Arab Emirates
to consolidate a lot of their AI efforts. So that's why we're seeing very
impressive and good pushes towards let's make it open, let's collaborate some more. And so
there might be opportunities in the future for us to coordinate directly with TII.
And we have looked at things like their datasets, like RefinedWeb.
So there has been some exchange.
Yeah, with the macrodata refinement process, which, I don't know if you know, was a reference
to an Apple TV show.
Okay.
Severance.
Anyway.
It's my fun fact.
A little editor's note: the TII people were actually there at NeurIPS,
presenting a poster on RefinedWeb, the dataset that they did for Falcon 180B and 40B,
so I asked them about the name.
My last question is about the name.
Is it from Apple?
Is it from Severance?
Yes.
So what's the story?
It was a...
No, it's just like...
But in the end,
we had someone look at the data
every now and then,
like go through the thing.
And that's like looking at the scary numbers.
So, you know, this was...
You know, nobody comments about this.
I know.
I was like, wait, I saw this in Severance.
Yeah, I know.
Right?
Like, I was like, this is a good joke
because it's exactly what you do
when you do filtering.
Exactly.
If you haven't seen Severance,
It's a great show.
It's on Apple TV.
Great watch for the holidays.
Pretty short.
And it's interesting.
I guess you can call it AI-related now.
But it's cool that...
Well, so one of the things I often get asked about
because we have listeners in a lot of different countries.
Should every country have their own model?
You know?
I think this is a really tough question
because the volume of data in different languages
is power-law, Zipf's-law distributed.
So the number of low-resource languages is massive.
We're talking over 100 languages that are low resource.
You just have too few tokens to do a lot with in the language modeling context.
So it's much harder to deal with those.
Now, we've actually seen a few different techniques at NeurIPS that are targeting those sorts of settings.
And they're doing things like train a base language model in English and then do a transfer process where you co-train with both languages.
That makes a lot of sense.
It makes a lot of sense.
In that setting, you want to get the knowledge representation from one language and then try
to adapt the style or grammar syntax, I guess, the easier part.
In Arabic, we're in a sort of medium resource language.
There I think it makes more sense to try to mix two languages if you want to do multilingual,
and then it helps you do things like translation.
And then higher-resource languages, so if you're talking European
languages, French, Spanish, German.
Those I think you can do probably from scratch in those languages.
And probably pretty easy to do multilinguality also.
Yeah.
So, yeah, it's definitely a very interesting open direction.
We're pushing for.
In fact, maybe I'll reference, we have a multilingual workshop on Friday
where we've invited a bunch of groups to come.
and give talks about their experiences with training different language models.
Cool.
Well, people can check out the authors.
I'm sure this is published and findable online.
Yes.
Cool.
So we should probably get to intros a little bit.
I mean, we're already recording.
Who are you and what do you work on?
And what does your team work on?
So my name is Joel Hestness.
I'm a principal research scientist at Cerebras Systems.
And I'm the lead of our core machine learning group.
So I've helped us bring up our foundation language models first and helped kind of set some of the direction for expanding outward from there.
So we started by expanding out a lot on the language, the common language functionality.
And now we're expanding into other places where transformer models can be used.
So targeting things like multimodal and other workloads that are similar.
Okay.
So a lot of our effort has been bringing this up and coordinating with the broader Cerebras organization to lower these applications down, get them compiled to run at efficiency on our hardware.
So there's been a lot of performance optimization, making sure numerics are correct for training large models, making sure things train stably, things like that.
Yeah.
So, yeah, we're focusing on scaling out right now, getting much larger clusters.
We've sold a couple already.
To G42.
G42.
Yeah, exciting things to come there, I think.
Exciting things to come.
So we're going to cover some of the other posters that you have here.
But one thing, I guess, people are very unfamiliar with anything but Nvidia.
What should people know when working with a Cerebras chip?
Sure, yeah.
I think maybe people might be familiar with our wafer.
So Cerebras uses a full wafer for our processor
instead of cutting the wafer apart into pieces.
If you cut it apart, you end up packaging it into a bunch of different cards,
and then you package those into a box.
And then you have to network them all together with a bunch of extra software.
That's very complicated for large-scale applications.
And so instead of doing that, we leave it together on a single wafer.
That single wafer goes into a single big box.
The performance is roughly... our CS-2 box is roughly equivalent to maybe 20
A100 GPUs, and you can program it like running on a single GPU, so it's just much
easier to use. Nice. And is it cost-effective as well? I assume it is, because, yes, you're
saving a whole bunch of overhead. Right. So the manufacturing
process has a lot lower cost because we don't have to deal with as many
moving parts. With fewer points of failure, reliability is quite good. And
we aim to be
performance comparable to GPU systems.
Cool. Awesome. That's the hardware stuff. We're also going to talk about the streaming things in a bit.
But yeah, I'd love to hear whatever you want to pick next as one of your works for this year.
Just give an overview of some of our research directions.
So our hardware is, it has native support for completely unstructured sparsity.
What that means is, say, if we're using the weight
streaming mode, which I mentioned, a weight comes in and we can do a vector multiply with some
activations. So you can use that in your matrix multiplies on the wafer, but you can do that
on a per weight basis. You don't need to load the whole thing at once. You don't need to load the
whole thing to do matrix multiply. So what that means is we can do unstructured sparsity, just send in
the weights that you actually want to use in the matrix multiply, and you can get a sparse matrix
multiply.
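Editor's note: to make the "just send in the weights you actually want to use" idea concrete, here is a minimal sketch in plain Python. It is not Cerebras software or its API; the function names and the (row, col, value) triple format are assumptions made up for illustration. Each nonzero weight is handled on its own, so pruned weights never cost a multiply.

```python
# Illustrative only: plain Python, not Cerebras software or its API.
def sparse_matvec(nonzero_weights, activations, n_out):
    """Compute y = W @ x using only the nonzero weights of W.

    nonzero_weights: iterable of (row, col, value) triples, one per kept weight.
    activations: list of input activations x.
    """
    y = [0.0] * n_out
    for row, col, value in nonzero_weights:
        # Each incoming weight triggers exactly one multiply-accumulate;
        # weights that were pruned are never sent, so they cost nothing.
        y[row] += value * activations[col]
    return y


def dense_matvec(weights, activations):
    """Reference dense product, touching every entry including zeros."""
    return [sum(w * x for w, x in zip(row, activations)) for row in weights]


W = [[0.0, 2.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, -1.0],
     [0.0, 0.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
triples = [(r, c, v) for r, row in enumerate(W) for c, v in enumerate(row) if v != 0.0]

print(sparse_matvec(triples, x, n_out=3))  # same result as the dense product
print(dense_matvec(W, x))
```

In this toy case the dense product does 12 multiplies and the sparse one does 3, with identical results.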
Isn't the decision for, like this is the argument, classic argument against that kind of sparsity
is that the decision actually takes longer than just doing the math anyway.
Like the branching, the sort of Turing-complete branching.
That's a, yeah.
So part of the approach that we're using is a weight sparse approach, which means the
sparsity is in our, in the model itself.
And so then while you're training that, you'd prefer those weights to be the same sparsity
structure for a while.
Okay.
So there are techniques that train.
Some kind of constraint, some regularization thing.
Right, yeah.
So the early works in this are things like the lottery ticket hypothesis,
where you'd find the...
Jonathan Frankle is like 10 feet from us.
And there you find the mask by doing some heavy-duty training,
and then you rewind and retrain the model from scratch.
Yeah.
Now that's static sparse,
so you have the same weight sparsity all the way throughout.
that works great on our hardware.
We have, however, added a bunch of new functionality that's sort of beta in our recent release
that allows you to change the sparsity throughout training.
And so that's something that's being used in recent research works like the Rigging the Lottery work,
so RigL, and then another one called SET, a different approach to deciding how to change the sparsity,
but those updates happen infrequently enough that it doesn't harm the performance on our hardware.
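Editor's note: as a rough picture of what "changing the sparsity throughout training" means, here is a sketch of one drop-and-regrow mask update in the spirit of SET: drop the smallest-magnitude active weights, regrow the same number at random inactive positions. The function name is hypothetical, starting regrown weights at zero is an assumption of this sketch, and RigL picks what to regrow differently (using gradient information), which is not shown.

```python
import random


def set_style_update(weights, mask, drop_fraction, rng):
    """One SET-like topology update on a flat weight list.

    Drops the smallest-magnitude active weights and regrows the same
    number of connections at random inactive positions, so the overall
    sparsity level is unchanged.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    n_swap = min(int(drop_fraction * len(active)), len(inactive))

    # Drop: the active weights closest to zero.
    to_drop = sorted(active, key=lambda i: abs(weights[i]))[:n_swap]
    # Grow: random positions that were inactive before this update.
    to_grow = rng.sample(inactive, n_swap)

    new_mask = list(mask)
    new_weights = list(weights)
    for i in to_drop:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in to_grow:
        new_mask[i] = True
        new_weights[i] = 0.0  # assumption: regrown weights start at zero
    return new_weights, new_mask


weights = [0.5, 0.0, -0.1, 0.0, 0.9, 0.0, 0.05, 0.0]
mask = [True, False, True, False, True, False, True, False]
new_weights, new_mask = set_style_update(weights, mask, 0.5, random.Random(0))
print(sum(mask), sum(new_mask))  # the number of active weights is unchanged
```

Because an update like this only runs every so often, the mask stays fixed for many steps in between, which matches the point above about infrequent updates.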
That's cool. Awesome.
So this is Sparse-IFT, the paper that you published.
Yes. So our Sparse-IFT work looks at different ways that you can swap out layers for sparse versions.
Yeah.
Using the same FLOPs, that might be able to get you better representation capability.
So if you have pressure in your representation that's in your activations, for instance,
let's widen the layer and sparsify it to give the model more activations.
You can store more in those activations.
Those end up staying dense.
So our results here show that we can get something like a 2 to 3x performance improvement at 75% sparse,
or you could flip it around and you can get, for the same FLOPs, a better
model by sometimes three to five percent.
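Editor's note: the "widen the layer and sparsify it, using the same FLOPs" idea comes down to simple arithmetic. The sketch below assumes a single matrix multiply whose input and output widths are both scaled by a factor k, and counts only nonzero weights. That setup and the function names are assumptions for illustration, not necessarily the exact recipe in the paper.

```python
import math


def layer_flops(d_in, d_out, density=1.0):
    """Multiply-accumulates for one matrix multiply, counting only nonzero weights."""
    return d_in * d_out * density


def iso_flop_widening(sparsity):
    """Width multiplier k such that a layer widened on both sides by k and
    pruned to the given sparsity costs the same as the dense original:
    (k * d_in) * (k * d_out) * (1 - sparsity) == d_in * d_out."""
    return 1.0 / math.sqrt(1.0 - sparsity)


k = iso_flop_widening(0.75)
dense = layer_flops(1024, 1024)
wide_sparse = layer_flops(1024 * k, 1024 * k, density=0.25)
print(k, dense, wide_sparse)
```

Under these assumptions, a 75% sparse layer can be twice as wide on each side for the same multiply-accumulate count, which is where the extra activations come from.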
That's probably budget-wise.
I guess you're choosing between pre-training and inference, just like many people,
like what you're optimizing for.
Yes.
That's great.
Awesome.
And what else are you leading?
So I'm also working on some of the pre-training efforts that we're doing that look
at things like gradient noise to estimate good batch sizing and make sure that we're
making efficient use of the compute.
So there are techniques.
So we have a poster, the Efficient and Approximate Per-Example Gradient Norms paper.
This is, yes.
So we have this published at the WANT workshop at NeurIPS.
And the basic idea is, typically, if you wanted to do the gradient norm calculation,
you'd aggregate all the gradients together and then calculate the norm.
And you do that over your batch.
So it's helpful if you want to measure some training dynamics,
but if you want to look at something like critical batch size
to understand how well is my model training in terms of efficiency,
you actually want to have sub-batches.
You want to understand the grad norms of the sub-batches also.
You use that and then the large-batch grad norm,
you can calculate noise statistics, like signal to noise, maybe.
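Editor's note: one common way to turn a small-batch and a large-batch gradient norm into a noise statistic is the "simple noise scale" estimator from the large-batch training literature. It assumes the expected squared norm of a batch gradient is the true squared norm plus (trace of the per-example covariance) divided by the batch size. Whether this is exactly the statistic meant here is an assumption, and the function and argument names are made up for illustration.

```python
def noise_scale_estimate(sq_norm_small, sq_norm_big, b_small, b_big):
    """Estimate the simple gradient noise scale from two squared gradient norms.

    sq_norm_small: squared norm of the gradient averaged over b_small examples.
    sq_norm_big:   squared norm of the gradient averaged over b_big examples.
    Returns (true_grad_sq, trace_cov, noise_scale).
    """
    # Estimate of the squared norm of the true (noise-free) gradient.
    true_grad_sq = (b_big * sq_norm_big - b_small * sq_norm_small) / (b_big - b_small)
    # Estimate of the trace of the per-example gradient covariance.
    trace_cov = (sq_norm_small - sq_norm_big) / (1.0 / b_small - 1.0 / b_big)
    return true_grad_sq, trace_cov, trace_cov / true_grad_sq


# Synthetic check: true squared norm 1.0 and covariance trace 64.0 imply
# an expected squared norm of 65.0 at batch size 1 and 2.0 at batch size 64.
g_sq, s, b_simple = noise_scale_estimate(65.0, 2.0, b_small=1, b_big=64)
print(g_sq, s, b_simple)
```

The ratio that comes back is a rough guide to how large a batch can get before extra examples stop helping much.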
If you use this technique that was defined by one of my teammates, Gavia,
we can do an approximation that allows us to run some statistics over activations
and run some statistics over the delta gradient values coming back.
And then you can take a dot product, an element-wise product, of those. Now
it's much more compute-efficient to calculate.
For each example, this is an approximation of
the grad norm for that sample.
And then you can arbitrarily kind of combine those back together to get
estimates of gradient noise.
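Editor's note: the simplest case of "statistics over activations times statistics over the gradients coming back" is a linear layer with no sequence dimension, where the identity is exact rather than approximate. The sketch below shows only that well-known 2D case; the approximation for higher-dimensional tensors described on the poster is not reproduced here, and the function names are made up.

```python
import math


def per_example_grad_norms(inputs, output_grads):
    """Per-example gradient norms for a linear layer y = x @ W.

    For one example the weight gradient is the outer product of its input x
    and its output gradient g, whose Frobenius norm is ||x|| * ||g||.
    """
    norms = []
    for x, g in zip(inputs, output_grads):
        x_sq = sum(v * v for v in x)
        g_sq = sum(v * v for v in g)
        norms.append(math.sqrt(x_sq * g_sq))
    return norms


def explicit_grad_norm(x, g):
    """Reference: build the full outer-product gradient, then take its norm."""
    return math.sqrt(sum((xi * gj) ** 2 for xi in x for gj in g))


inputs = [[3.0, 4.0], [1.0, 0.0]]
output_grads = [[1.0, 2.0, 2.0], [0.0, 0.5, 0.0]]
print(per_example_grad_norms(inputs, output_grads))
print([explicit_grad_norm(x, g) for x, g in zip(inputs, output_grads)])
```

The shortcut costs on the order of d_in + d_out per example, while materializing the per-example gradient costs on the order of d_in times d_out.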
Okay.
So this is something where we improve the compute requirements.
We use this in a few different contexts currently,
but it improves the compute requirement for this,
for high-dimensional tensors, from the dimension of the tensor down to
linear-time computation. Nice. And is there, like, I forget
what this is called. It's kind of like an annealing curve or something where you use this
technique at the start to initialize and then eventually you sort of wean yourself off it.
So if you, so this is something you do want to track throughout training. Yeah.
Especially if you're doing like phase training or if you're changing the data distribution or something,
it's really helpful to have these statistics to decide, am I using an appropriate batch size so that I'm getting good generalization with the new data?
It helps you set learning rates and things.
So this is something you want to track throughout training.
It gives you an estimate of how big the batch size could be.
Yeah, excellent.
Very cool.
Any one more?
Sure, so then given that we have a sparse accelerator,
we're also looking at applications where you can deploy sparse models.
And part of our work is figuring out how to find those sparse models that you use in a deployment setting.
And so we have other work that's related to, like, the SparseGPT work that's been recently released,
where we do some pruning after dense pre-training,
and we do some retraining to get the capabilities of the model back
before you would put it in deployment.
How much of it can you get back?
Actually, I'm not totally familiar.
This was work from my team members.
I know we can do, so for large, very large language models
that have not been trained on a huge number of tokens,
you can do easily upwards of 50% sparsity
and fully recover the upstream losses from this retraining.
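Editor's note: here is a minimal sketch of the prune-then-retrain loop at 50% sparsity. It swaps in plain magnitude pruning, which is simpler than SparseGPT and is not the method described above; the function names, the toy weights, and the gradients are all made up for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a flat weight list.

    Returns (pruned_weights, mask); mask[i] is True where the weight is kept.
    """
    n_prune = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [i not in pruned_idx for i in range(len(weights))]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


def masked_sgd_step(weights, grads, mask, lr):
    """One retraining step that updates kept weights and leaves pruned ones at zero."""
    return [w - lr * g if keep else 0.0 for w, g, keep in zip(weights, grads, mask)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(pruned)

# Retraining keeps the sparsity pattern fixed while the kept weights adapt.
stepped = masked_sgd_step(pruned, grads=[1.0] * 6, mask=mask, lr=0.1)
print(stepped)
```

The point of the mask in the retraining step is that the model recovers quality without the pruned weights growing back.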
So this is a really big next step challenge for a lot of the organizations that we work with.
They're interested in now they're able to pre-train a very large model with the hardware.
Now they're interested in figuring out how to deploy it in an efficient manner.
So we're working with a few different groups on this.
So we're working with Qualcomm and
another group called Neural Magic that does inference for these large models.
Yeah.
Amazing.
I was going to ask if you need the same data set to retrain, but it looks like you train on the Pile.
So I guess that's a no.
Yes, you can actually shift here.
Obviously, different data distribution means you have to be a little bit careful about how you do the retraining.
So I think there are a few different things we've learned about different learning rate warmups.
different learning rate levels, I guess, because if you're doing a big distribution shift,
you want to allow the model to shift a little bit, and so you want a slightly higher learning rate.
But like, for example, you prune Llama 2, and we don't know what the original
dataset was.
Yeah, I mean, well, so we kind of know that Llama 2 is a little bit similar to something
like SlimPajama.
Okay.
And Llama 1. But yeah, it is definitely a different data set.
We do know that the Pile and SlimPajama have a fair bit of overlap in some things, but it is definitely a different distribution.
Yeah.
So this is a lot of work that our applied ML team is working on.
We're expanding that team currently, by the way.
So Cerebras is hiring, for anybody listening who's interested.
You can check out our website, cerebras.net slash join dash us, if you'd like to check it
out. Send us your resume and we'll take a look.
Yeah, thanks for spending some time with us.
Before we go, what's one NeurIPS tip that you want to give to people if they're attending
NeurIPS? Like, you know, how do you do NeurIPS right?
How do you do NeurIPS right? Well, so it's grown roughly 5x in the time that I've been attending
NeurIPS, so it gets more overwhelming every year, so pace yourself.
And I like that they've kind of backed off a
bit on the talks in favor of poster sessions. Like, you just got to go wander around. You got to
talk to people. Yeah. You got to check out posters and kind of let stuff sink in and ask questions.
So yeah. Yeah. Excellent. Well, thanks so much for your time. Definitely. Thanks. Thank you. That's it.
I think Cerebras is doing very interesting work here. Most people know them for their hardware,
but I think they're doing very interesting work on the software and LLM training side. And I'd be interested
to have them on again in 2024. So next, we're going to
go walk down the floor to Voxel51, which is not a company I've actually come across before,
but it seems to be an interesting pair together with the next guest as well. So this is another
one of those situations where I get to put two competitors next to each other and let you decide
as to how they differ and how they talk about themselves. Sure, my name is Jason Corso. I'm the co-founder
and chief scientist at Voxel51. I'm also on the faculty of EECS and Robotics at the University
of Michigan. So Voxel51 is a spin-out of my lab. We make
a toolkit for AI engineers that sits on top of things like PyTorch and TensorFlow.
And I think of it like a model and dataset debugger.
The key problem that we face is not that we can go download datasets and then train models
on them or even with foundation models, go pull one off the shelf and then expect it to work
exactly the way you want.
The problem is really the co-development of a dataset to then go and actually use one of those
models or train or fine-tune your own model.
So FiftyOne lets you represent the data that you use
or are building alongside your models in a way that is extensible, visualizable, and flexible
so that you can write single Python lines of, like simple, single lines of code in Python,
to do queries of your data sets and your models.
Like, show me the corner cases where Model A is outperforming Model B and it's outdoors,
or show me, you know, intersections in my BDD data,
or let me visualize my embeddings that are either just vision or point cloud-based or
multimodal.
and then visually interact with them with lassoing on the 3D embedding.
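Editor's note: to give a feel for the kind of one-line query described here ("show me where Model A is outperforming Model B and it's outdoors"), below is a plain-Python stand-in. It deliberately does not use FiftyOne's API; the records, field names, and helper function are all hypothetical.

```python
# Hypothetical per-sample records; the field names are made up for this
# illustration and are not FiftyOne's schema or API.
samples = [
    {"id": "img_001", "scene": "outdoor", "model_a_correct": True, "model_b_correct": False},
    {"id": "img_002", "scene": "indoor", "model_a_correct": True, "model_b_correct": False},
    {"id": "img_003", "scene": "outdoor", "model_a_correct": True, "model_b_correct": True},
    {"id": "img_004", "scene": "outdoor", "model_a_correct": False, "model_b_correct": True},
    {"id": "img_005", "scene": "outdoor", "model_a_correct": True, "model_b_correct": False},
]


def a_beats_b(samples, scene):
    """IDs of samples in the given scene where model A is right and model B is wrong."""
    return [
        s["id"]
        for s in samples
        if s["scene"] == scene and s["model_a_correct"] and not s["model_b_correct"]
    ]


print(a_beats_b(samples, "outdoor"))
```

The real value of a dataset tool is doing this over millions of samples and then looking at the matching images, which a list comprehension obviously does not give you.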
Is the concept of active learning still in vogue, or is it not cool these days?
Well, I mean, so FiftyOne is a pretty flexible ecosystem of capabilities.
The heart of it really is that data-centric data model of unstructured data.
So we support images, video, and point clouds.
In fact, there's a blog that one of my colleagues at Voxel51 wrote maybe like a month ago
on how to implement an active learning workflow on
top of FiftyOne. So it's possible. It seems like it will lend itself easily. Yeah, exactly. It's plausible.
I mean, the challenge with active learning is, you know, will just more data help or do you need
the right more data? Of course the right. I think that's kind of a, you know, it's, you know,
that's the question, I think, right? Is it primarily a vision that you work on or it's just anything?
Yeah, so my experience is in computer vision, mostly video understanding and imaging
problems. So that's where we got started. However, the software is pretty flexible, so you can
add your own data type, like, you know, we're considering adding audio, adding text, IoT,
you know, like temporal signals. But right now it's images, video, and point clouds. I've often heard
it said that, you know, the best researchers and the best engineers are really the people who get
their hands dirty in the datasets. Oh yeah, you have to get your hands dirty. And this is,
so in some sense, the whole company exists because I was worried no one was getting their hands
dirty enough. Yeah. Right? Like they were just expecting to take a data set, take a model,
and then train it once and then out pops like your usable thing. Yeah. No, that's
not the way it works, right?
This is a hard problem. Building intuition,
building a comfort or like an ability to take a 10 million sample dataset
and find like the 1,000 samples that are giving you this problem here, is hard to do.
And that's what FiftyOne really lets you do.
Yeah, yeah.
What's the 51? Actually, I have to ask.
Well, we had 50 bad ideas.
And this is the 51st, the one that was actually good.
Well, that's what we say now.
But the actual original way we got started as a company was as a
video-understanding-as-a-service platform.
And so that's why, so the voxel in the name is in the space-time volume of pixels.
And 51 was just to elicit ideas of Area 51.
Like, can you find the right voxel?
Is it there?
That kind of thing.
We've subsequently pivoted way away from that, as most startups will do at some point
in their journey.
Yeah, it makes the domain easy to buy.
Sure.
Exactly.
So anything else people should know about your platform, like top use cases, top customers
that you always brag about?
Sure.
Well, I mean, it is open source, right?
So as long as you meet the three
key assumptions (local data, one user, one machine),
there's no limitation on the machine learning
that you can do with FiftyOne.
When you want to violate one of those assumptions,
like work on a team or work in the cloud or whatever,
then we have an enterprise product that you would talk to us
to purchase, basically.
And that's kind of like a Google Drive layer
on top of the open source one.
Very reasonable.
Yeah, I mean, we sell to a lot of companies that
do use it.
I'm not going to name them here.
You can go to the website.
There's a logo wall of those we can name.
Yeah.
But it would be great if you're listening to give us a GitHub star.
Yeah.
That's our, like, we're here at NeurIPS to get users.
Stars for swag.
Yeah, stars.
Yeah, excellent.
You published a guide to doing CVPR right.
I did.
We're here at NeurIPS.
What would be your guide for doing NeurIPS right?
So how to do NeurIPS right?
I think there's some key things of doing large conferences right.
One is like don't expect to do too much per day.
Yeah.
Right?
So what I've always done, even when conferences were like a quarter of the size or less,
is for any one day, identify five to ten papers in the morning that I just want to understand
for that day, right?
So then I will make sure, though, to spend time with that poster presenter or at the oral talk.
To me, that's the key.
And then at the end of that day, I do tend to write a summary from my own brain, my own notes,
of what I did, like what the key points were from those papers.
That's definitely one, one winning strategy for a big conference like this.
All right.
Any other advice for people building,
or any papers that you're excited for this year?
Well, I mean, advice, I don't know.
If you don't know your data, then you don't know what you're doing,
is the way I would probably say it.
And indeed, like getting close to your data
is part of the model building process, right?
Like, just to say it again,
like I think of it as a co-development process
of datasets and models, not of a model training problem.
Yeah, I actually had a really interesting chat
with someone from Cerebras, actually,
where they talked about how they were doing evals
on their loss per region on a data set
as they were training their large language models
so that they could increase the exposure
on a specific subdomain if they saw that specifically
like loss was not progressing as well
in that particular subdomain.
So it's kind of like online training
and like watching their models evolve while they're training.
Yeah, I guess it sounds like on specific subsets
of the data.
Yes.
Which is really important.
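The idea described here, tracking loss per subdomain during training and increasing exposure where loss has stalled, can be sketched as follows. This is a minimal illustration under assumed names and thresholds (`boost`, `min_improvement`, and the domain labels are hypothetical), not a description of how Cerebras actually does it.

```python
from collections import defaultdict


def mean_loss_by_domain(records):
    """records: iterable of (domain, loss) pairs from one evaluation pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for domain, loss in records:
        totals[domain] += loss
        counts[domain] += 1
    return {d: totals[d] / counts[d] for d in totals}


def sampling_weights(prev, curr, boost=2.0, min_improvement=0.05):
    """Upweight domains whose mean loss improved by less than min_improvement.

    boost and min_improvement are illustrative values, not tuned settings.
    Returns weights normalized to sum to 1.
    """
    raw = {}
    for domain in curr:
        improvement = prev[domain] - curr[domain]
        raw[domain] = boost if improvement < min_improvement else 1.0
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


# Two evaluation passes taken at different points in training (made-up numbers).
prev = mean_loss_by_domain([("code", 2.0), ("code", 2.2), ("poetry", 3.0), ("poetry", 3.2)])
curr = mean_loss_by_domain([("code", 1.6), ("code", 1.8), ("poetry", 3.0), ("poetry", 3.15)])

weights = sampling_weights(prev, curr)
print(weights)  # the stalled "poetry" domain gets the larger share
```

In this toy run the "code" loss is still dropping, while "poetry" has barely moved, so the sampler shifts exposure toward "poetry" for the next stretch of training.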
Cool.
Well, thanks so much for your time.
Thanks very much.
Nice to chat with Sean.
Coming from data engineering, it's pretty interesting
to see this space develop.
It's interesting also that a lot of them emphasize open source,
which we'll see with the next speaker, Brandon from Nomic.
Who are you, and what's Nomic?
Yeah, hey, everyone. My name is Brandon Duderstadt. I'm a co-founder and CEO of Nomic.
Nomic is a company that does many things, but we have two main products right now.
One of them is GPT4All, which is an open source ecosystem of low-resource language models.
So it lets you do things like run, you know, Mistral 7B, fine-tuned on OpenOrca, on a MacBook
or, you know, some esoteric GPU, things like this.
The second product is a tool called Atlas.
It lets you explore massive unstructured data sets in your web browser.
Since we're here at NeurIPS, a lot of people seem to respond to calling it massive, clickable t-SNE as a service.
Yes.
I was actually thinking, is it t-SNE or UMAP?
Yeah, so it turns out if you squint closely enough, they're the same algorithm, up to a choice of low-dimensional kernel.
So we optimize the t-SNE objective function.
One of our pieces of IP is we have the world's fastest optimizer for it.
So if you take, say, the NVIDIA RAPIDS UMAP implementation, which is
kind of the fastest version of this in the wild,
off the shelf and run it on Wikipedia,
on the biggest machine on AWS.
It's going to take you a couple of days
to actually get that map,
and we can do it in about four hours.
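The "same algorithm up to a choice of low-dimensional kernel" remark can be made concrete by comparing the two low-dimensional similarity functions. The sketch below covers only the kernel; it leaves out normalization and the rest of each method's objective, and the non-default `a` and `b` values are illustrative rather than anything UMAP fits in practice.

```python
def tsne_kernel(d):
    """Unnormalized t-SNE low-dimensional similarity: a Student-t (Cauchy) kernel."""
    return 1.0 / (1.0 + d * d)


def umap_kernel(d, a=1.0, b=1.0):
    """UMAP low-dimensional similarity: 1 / (1 + a * d^(2b)).

    With a = b = 1 this reduces to the t-SNE kernel above.
    """
    return 1.0 / (1.0 + a * d ** (2.0 * b))


distances = [0.0, 0.5, 1.0, 2.0]

# With a = b = 1 the two kernels agree at every distance.
same = all(abs(tsne_kernel(d) - umap_kernel(d)) < 1e-12 for d in distances)
print(same)

# Other (illustrative) a, b values change how fast similarity decays with distance.
for d in distances:
    print(d, tsne_kernel(d), umap_kernel(d, a=1.5, b=0.9))
```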
Excellent.
It lets you make the maps part of your iterative daily workflow
as opposed to having to wait a week to get them.
Nice.
We'll throw a video of this in the show notes,
but maybe you could sort of narratively show what you're showing.
You showed a TikTok example and a Twitter example, right?
So these are really for visualizing massive multimodal datasets.
Yeah, so the fundamental thesis
behind the tool is that the shape of data that people have has fundamentally changed as a result of generative AI.
Instead of having these big Excel spreadsheets of tabular things, you now have vectors plus metadata.
And we need to rethink visualization, you know, and the implications of that for the visualization stack.
You are kind of seeing that start to penetrate at the database layer with vector DBs and stuff,
but I think there's going to be radical kind of implications for that change all the way up the stack.
And so you can use it on, you know, getting back to your original question, Twitter data,
TikTok data, images, sounds, text, anything that you can stuff into a vector, which is pretty much
anything these days, you can map and you can understand.
Yeah. Can I bring my own custom embeddings and see the impact of that?
You can. So there's two ways to get data into the platform. One way is bring your own embeddings,
and then you just pip install nomic, from nomic import atlas, and then atlas.
You supply your embeddings, you supply metadata on top of them, and then a couple minutes later,
you'll get a web link back to a map where you can click on it and fly around it.
If you just have raw data, we have a bunch of out-of-the-box embedders that we develop
and we work with partners to develop, that you can use to map it out of the box as well.
Yeah.
And this is not open source, but GPT4All is...
So there are aspects of the platform that are open source.
The entire thing runs on a graphics engine that we developed called Deepscatter.
It's the only tool out there that can render billion-point scatter plots in a web browser.
And to do that, you have to, again, kind of fundamentally rethink how graphics in the browser
work from the ground up.
That is source-available, but unfortunately it is not fully open source.
It's okay.
Yeah, you don't have
to apologize for anything. I do have to. You know, I wish we could open source everything,
but like we are unfortunately subject to capitalism and so we cannot. But in the limit,
I would love to open source everything. I also maybe heard you in another introduction
talk about this as like Looker for language models. Can you elaborate more
on that? Do you have a query language? Like what are you thinking about as the overall
vision? Yeah. So I want to bring it back to the analogy of like the new shape of data
disrupting the stack, right? So the first place we see it hitting is at the database layer.
You know, we see vector databases;
there's a million of them nowadays. I think that that
change is going to propagate all the way up the stack
and we are interested in, you know, what happens
to the BI analytics, you know, visualization
layer. And so really what we're thinking
of this as is sort of like a Tableau
for unstructured data, or a Looker, Power BI,
or something like this, where
we've built the entire visualization system
with embeddings as a first class citizen. And so
that enables a lot of different actions.
Some are already in the platform, some I can't
tease yet, unfortunately.
But having embeddings as a first
class primitive enables a lot of very, very useful things that you're not going to be able to get unless you have that.
What do people use Atlas for? Like just maybe list out some more use cases that might not be obvious to people just thinking about visualization.
Yes, we'll start with the most technical and we'll go to the least technical.
A lot of ML engineers use it to understand and evaluate their models and training data.
So we just did some work with Hugging Face on their OBELICS dataset, which they used to train their IDEFICS model,
doing some evaluation and training data analysis, looking at what is.
We actually interviewed those guys. I was in Paris and I talked to Leo and Victor.
Yeah, those guys are sick.
But yes, we worked with them on this and we discovered, you know, a couple of things in their
training data that they should have actually cleaned out of it. There was like a bunch of,
you know, "end of sentence token to be replaced" that made it through, stuff like this, some really garbage content.
Do you do anomaly detection? Or is that up to people to code themselves?
Yeah, so the anomalies usually manifest as like the little moons on the outside of the map.
Sure, okay. And then you can just like hit them with the little lasso tool and stuff like this.
But one of the things about the Hugging Face map that I found fascinating was because we supply
like a topic model out of the box, you can look at things like, are there topics where
the loss tends to like cluster together?
And for the Hugging Face model, there was this high loss mode in the poetry topic, which
I thought was super interesting.
And so I've got two theories for it.
One is that poetry includes the distinct subversion of like common linguistic patterns.
And so of course language models will be bad at it.
But the more perhaps optimistic theory is that poetry captures something that's
fundamentally human that the machines have not grasped yet.
The pragmatic version, I think, is probably what's happening, but I like to be optimistic.
IDEFICS is a visual dataset, and you are multimodal.
Yeah.
Okay.
So they have poetry in there.
Yep.
Interesting.
It's sort of interleaved webpages of like it'll be an image and then some poetry.
Right.
So that's the more technical side.
And then coming down to the less technical side, you know, a lot of our customer base at this point
is like consulting type companies.
And they find the product really useful for connecting domain experts with large data
sets. So generally what will happen is you'll have these domain experts, be it like a doctor or
someone in regulation, someone with subject matter expertise that will be handed this massive set
of documents from a client and be like, I don't even know where to start. I don't even know what's in
this. And so a couple of the consulting partners we work with actually now have a KPI that's like
time to Atlas where it's like how quickly from the data set hitting the company does it get to
Atlas so that we can send an analyst to the map and they can start to explore it. And so we're
really excited about enabling sort of traditionally non-technical people to explore and analyze
these massive data sets with this no-code interface. And what you should do, you should hook up with
Google. Doesn't Google have a big set of publicly available datasets? Yeah, so we've actually
done a couple of collaborations with Google Cloud on some of those data sets. We can maybe link the
blog posts or something. Sure, yeah, yeah. Okay, awesome. Just NeurIPS in general, you've been here a number
of years. What do you look for when you come to NeurIPS? Any tips that you have for people coming to
NeurIPS? Oh, that's a good one. Yeah. Big tip is just like if you see
someone cool, like they're probably nice, so chase them down and like have them talk to you.
Shove a microphone in their face. Yeah, yeah. No, I love it. But it was like my second
NeurIPS or something, I saw Oriol Vinyals walk by and he had just done like the StarCraft stuff.
And I was like, okay, this guy is sick. He's doing some really cutting edge stuff. So I like ran up
and asked him for life advice. And he was so down to earth and like chatted with me for a bunch
of time about like modeling and life and you know how to think about my career and stuff. And so like,
Yeah, if you see a hero, like shoot your shot.
Yeah, very, very cool.
Any papers that you're keen on this year or like maybe really affected you in previous years?
Oh, that's a good one too.
This year I think QLoRA's here, which I think is like very, very interesting.
Tim is speaking tomorrow.
Yeah, yeah, yeah, yeah.
It has a very interesting set of implications for like the low resource world.
Can you elaborate?
Yeah, so one of the things we think a lot about at Nomic is the accessibility of AI technology.
And one of the things that's become very clear to us,
and I think everyone this year, is like there's the
GPU rich and the GPU poor. And so I think methods that make it so that anyone in the world
can interact with this technology, like QLoRA, are just like so, so, so valuable. And so I think
any research into like low resource training of models and low resource deployment of models is just
going to be so good for everybody, especially like the open source community that I really love to
see it. Yeah. You just reminded me. So we forgot to talk about GPT4All. Yeah.
Very, very early win, I think, in the overall space of things. But now more recently in my mind,
llama.cpp has come out to be its own platform.
Ollama is emerging as a thing.
There's a bunch of ways in which people run models locally.
How should people think about GPT4All in the context of all that?
Yeah, so one thing that a lot of people don't realize
is that a lot of the core contributors to llama.cpp actually work at Nomic.
And so I guess the operant advice here is just like play nice with open source, right?
Like GPT4All is this thing that's going to be free forever for our community.
We're going to keep trying to improve it as, you know, our Discord recommends and as people call
for, but, you know, if we can do things like go and, you know, contribute to other open source
projects that are high impact, we're going to, right? And so the hope here is that, like,
you know, as economic pressures apply, like open source stays collaborative is really,
really the goal for us, I think. Okay, cool. Well, that's it. Any other last words? What are you
looking for? How do people find you? Yeah, you can follow us on Twitter, @nomic_ai.
You can also find our website, nomic.ai. Hiring engineers, researchers, what?
Yeah, I mean, we're always looking for super interesting people.
Yeah, come chat about interesting things in our Discord, really.
You can just go to our website and stuff.
But really the best way to get involved is like make some maps, do some open source work.
Like a lot of the people that we hired in this last kind of spree of hiring were like big open source contributors.
And so like, yeah, just give back to the community and then, you know, we'll try and find you and boost you.
Yeah, awesome.
Well, thanks so much for your time.
Yeah, take care.
I think the way that Nomic is embracing and supporting open source AI is encouraging,
and I think more companies should learn from that,
but they're definitely far from the only open source AI company out there.
Lightning AI is one of the oldest, I guess,
if you can call that old in the space.
And I happened to catch Luca, the CTO, at their booth.
And at NeurIPS, they were there to launch Lightning Studio,
which is their new development environment.
Hey, Luca. Welcome.
Good to see that you guys are launching a new product today.
Yeah, sure.
It's super exciting.
It's the result of many months, if not years of work and realizations.
So maybe let's establish a baseline.
Most people will have heard of PyTorch Lightning.
What was the evolution to Lightning AI?
So PyTorch Lightning
has a very healthy community of people using it.
We are at 5.5 million downloads
per month, about 80 million downloads in total.
And it's one of the frameworks that comes from the era of
traditional, quote unquote, deep learning
that is one of the main actors in the
gen AI space, because, for example, Stable Diffusion was trained using PyTorch Lightning, and a bunch
of other models. PyTorch Lightning powers NeMo from NVIDIA. Yeah, their custom chip design
language model. Yeah. So basically PyTorch Lightning has evolved and grown into
gen AI, and with the release of 2.0 and 2.1 we've tried to make it better and better for use
cases in which you have very large models and you have a hard time not
going out of memory.
So PyTorch Lightning
has always been very focused on distributed training.
It's one of the things that it did best.
But when models get very, very large,
I think that's where we improved a lot this year.
We also launched Fabric, Lightning Fabric,
which is
a companion framework to PyTorch Lightning,
where you get all the constituents
of the Lightning Trainer, but now you can write your own training loops.
So for people doing very optimized, very bespoke,
I don't know, collective calls, they want to place them where they want,
they want to fully own the training loop,
or they're doing stuff like reinforcement learning,
where it's not the traditional training loop.
You can still do it with the Trainer, but it's a bit more difficult.
Then, Fabric lets you just write your for loops.
But we'll still abstract away strategies, precision plugins,
the logging, the aggregation of metrics, and all this stuff.
I like to think about these frameworks as frameworks
that reduce the surface area for mistakes.
Because mistakes, well, a few years ago,
mistakes, exactly, right?
They cost a lot of time to a PhD student.
Right now they cost a lot of money.
So you don't want to make too many mistakes there.
And TorchMetrics is a third
project that we have that is very healthy and it's powering a lot of the metric computation.
Again, you don't want to compute accuracy and aggregate it across a multi-machine job in a wrong way,
right? Because you'll get wrong indications and it's really easy to do it incorrectly.
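A small sketch of the mistake being described: averaging per-worker accuracies gives the wrong answer when workers see different numbers of samples, whereas summing the raw counts first gives the right one. This is plain Python for illustration, not the TorchMetrics implementation, and the per-worker numbers are made up.

```python
def accuracy_naive(per_worker):
    """Average of per-worker accuracies. Wrong when shard sizes differ."""
    return sum(correct / total for correct, total in per_worker) / len(per_worker)


def accuracy_from_counts(per_worker):
    """Sum correct and total counts across workers, then divide once."""
    correct = sum(c for c, _ in per_worker)
    total = sum(t for _, t in per_worker)
    return correct / total


# (correct, total) per worker; the last worker only received a small final shard.
per_worker = [(90, 100), (1, 10)]

naive = accuracy_naive(per_worker)          # (0.9 + 0.1) / 2
exact = accuracy_from_counts(per_worker)    # 91 / 110
print(naive, exact)
```

The design point is that workers should exchange the underlying state (counts) rather than the finished metric, so the final division happens once over the whole dataset.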
Yeah.
And this year we started doing...
Yeah, I should mention these are mostly open source.
Yeah, these are all 100% open source.
I think Fabric in particular was pretty popular.
Yeah, yeah. So Fabric has also powered
our language model repositories,
Lit-LLaMA and Lit-GPT.
Basically, back when LLaMA was originally released,
the weights were leaked.
The weights were leaked.
Yeah, the weights, exactly, exactly.
But at some point there was a model
being published by Meta as well.
It was a GPL license, so we didn't really like that.
And so we said, why don't we take
nanoGPT, because I was working with
nanoGPT at the time, and turn it into
LLaMA? And that started the whole thing of
minimal implementation, single file. You have
everything there, you have no layers to go through to
understand how your layers are. And that
became something that was very
popular within many organizations. So it's still
very popular. So for the LLM Efficiency Challenge,
the starter kit had Lit-GPT in it.
And Lit-GPT today supports many different models, but it's very easy to get to the bottom of the implementation of every single thing.
So it's very hackable.
My philosophy is:
make it hackable before you make it fast, right?
Because more people can contribute to it and we have contributors being very successful.
There have been initiatives of models being pre-trained using that, like TinyLlama and 360 AI, I think,
a few days ago came out and they said they used Lit-LLaMA to pre-train their 7 billion
parameter models.
So it's great.
And a lot of those learnings went back into Fabric and back into PyTorch Lightning.
And this is how we're kind of growing organically towards supporting gen AI use cases.
What's an example of one of those learnings from that outside usage of Lit-LLaMA, I guess?
Sorry, what's an example of one of those learnings that you got from like 360 contributing back?
Well, 360 is very young in the sense that we just learned, I think, the day before yesterday
that they used us.
So it's great.
We're very happy about that.
From TinyLlama, they did some optimizations on top of our code.
And they trained a 1.1 billion parameter model on 3 trillion tokens.
I think they're still doing that.
I don't think they're done.
And then some of the improvements that they made, we upstreamed
to ours, like for example, I think, chunked cross entropy, some kernels that they were using.
And then we were happy to see that even our dataset, which we optimized because it chunks your
data and can stream very quickly, worked for them.
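For readers unfamiliar with the term, chunked cross entropy generally means computing the loss over slices of the sequence so that only one slice of logits needs to be held at a time, while the final value matches the unchunked loss. The sketch below shows only that equivalence in plain Python; it is a generic illustration under that assumption and not the Lit-GPT or TinyLlama code, and all names in it are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def full_loss(all_logits, targets):
    """Mean cross entropy over every position at once."""
    return sum(cross_entropy(l, t) for l, t in zip(all_logits, targets)) / len(targets)


def chunked_loss(all_logits, targets, chunk_size):
    """Same mean, accumulated chunk by chunk.

    In a real model each chunk's logits would be produced on demand from the
    hidden states, so the full logits tensor is never materialized at once.
    """
    total = 0.0
    for start in range(0, len(targets), chunk_size):
        chunk_logits = all_logits[start:start + chunk_size]
        chunk_targets = targets[start:start + chunk_size]
        total += sum(cross_entropy(l, t) for l, t in zip(chunk_logits, chunk_targets))
    return total / len(targets)


# Five positions over a three-token vocabulary (made-up numbers).
all_logits = [
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 0.3],
    [-0.5, 1.5, 0.0],
    [1.0, 1.0, 1.0],
    [3.0, -2.0, 0.5],
]
targets = [0, 2, 1, 0, 2]

print(full_loss(all_logits, targets), chunked_loss(all_logits, targets, chunk_size=2))
```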
So it's kind of a mutual thing that we're doing.
And also, all the quantization support, for example, right now, Fabric and PyTorch
Lightning support bitsandbytes natively.
And it's basically one of the few solutions where you can use quantization on any kind of model,
and not just the model that the original authors decided to support.
Yeah, it's kind of flexible.
But here today, I think the main thing we're doing today is launching our platform.
Yeah, you just launched the studio today.
Yeah, exactly.
Lightning Studio, again, is a result of many months and years of work.
It basically lets you build AI at scale, but it feels like it's your laptop.
So to me, it's kind of the first time I've seen a platform not leaking the abstraction of orchestration on the cloud and so on.
Literally, there's nothing to learn.
You put VS Code in the browser and then you add all that.
You can even connect from your local VS Code and code there.
You have the whole machine, like it's a whole machine.
Yeah, it's a cloud development environment.
Exactly.
And it's built around reproducible environments.
Yeah, exactly.
But when you go in there, it's not that you need to build your Docker container.
You just go in there, you're presented with a machine, you can start working immediately.
If you pip install something and then you decide to switch instance type, your dependencies will carry over.
Or if you decide to duplicate my studio, everything that is set up on that studio, from the environment to
the data, the code, the checkpoints eventually that you put there, you will find them.
And so you will spend zero time setting up your environment.
Are you snapshotting memory?
How does this work?
Well, that's secret sauce.
Secret sauce.
You're not using containers, you said.
Yeah, well, I mean, we do. If you think about it, it's not
too complicated
fundamentally, but it's very complicated to actually get the perfect experience out of it.
Like maybe your design constraints.
What are you optimizing for?
We're optimizing for velocity.
So we don't want people to spend time thinking about things they shouldn't think about.
Yeah.
Like when you're coding on a machine and you now want 4 GPUs,
you should just be able to get 4 GPUs and keep working, right?
Without thinking about, oh, now I need to go to a console, spin things up,
set up my environment, attach drives.
Like, these are all things you shouldn't think about.
And again, it goes back to limiting the surface area for mistakes, right?
Because you can do what you're good at and not do what you shouldn't mess with.
It's like the fabric philosophy that's sort of expanded to the dev environment.
Exactly.
Yeah.
So, yeah, we're very excited.
You can do small things like in Colab, except that your data is persistent and you can switch off and switch on and everything will be there.
Yeah.
Or you can even train large language
models. Yeah. What are the larger customers doing? You know, what are you doing for them?
Because I feel like this might be targeted towards the smaller customers. No, actually,
we work with very big financial institutions. And we're actually pre-training models ourselves.
So the scale at which you can operate is pretty large. It looks like something that
you can do small stuff with, which is true. Yeah.
It's super smooth there, but if you need to launch a job on 100 GPUs, you can just do it,
provided that you have the machines. But we manage reservations, so we can target reservations,
or you can attach your own cloud account and negotiate your quotas with your cloud provider,
and we'll just orchestrate on your cloud account.
Yeah. Any cloud providers you would shout out as particularly, I mean, people know the big three clouds,
but any other providers that you would shout out as very good partners to work with so far?
Right now, we have been focusing on AWS.
We'll expand, of course, because...
Yeah, everyone needs everyone else.
Yeah, exactly.
Apparently, Oracle's doing very well.
Yeah, yeah, yeah.
We talked to Oracle.
We talked to most of the cloud providers out there.
To us, it's more a matter of sequencing.
Yeah.
We have a very good relationship, of course, with AWS right now.
They've been supporting us for the launch and so on.
But surely we'll get into getting the best machines for customers.
Yeah.
And in the near future, we'll also
support on-prem clusters in terms of orchestration, like Slurm as an orchestrator or as a scheduler.
People have mixed feelings about Slurm.
Well, yeah, but in this case, you do not have to deal with it, right?
Yes, yeah, yeah, yeah.
We take away the pain, and you still can orchestrate on top of that.
It's still not out, but it will come in the near future.
Yeah, we're already doing that with some companies.
Yeah.
So I want to talk about the workshop that you're doing on Friday, your efficiency challenge.
Was it motivated by a paper?
I saw there was sort of a cramming paper.
What's the maximum you can do with one day of compute, something like that?
Yeah.
So we noticed because Mark Saroufim and the other organizers ended up choosing Lit-GPT as one of the models for the starter kit.
And we were happy about it, of course.
And so we said, yeah, what can we do together?
And we really like the principle. We believe, you know, smaller models can empower
people a lot in getting control and understanding how to extract value from AI.
And so I think there's a dire need of consolidation, getting smaller, getting more efficient,
and getting the result you want in the shortest time possible. And that's how the velocity
will increase and how eventually open source will get on par with, if not beyond, what's available in the closed source world.
So we are fully supportive of that.
The way we ended up contributing is we maintained a public leaderboard.
And it was a nice experience because we integrated with Discord.
There is a Discord channel.
This is for the Efficiency Challenge Discord.
Yeah, for exactly the efficiency challenge discord.
And we set up an agent that was running on a few of our machines.
And people could submit through a DM to the bot so that the bot would then spin up a job,
run things in a queue, get back the results from the HELM evaluation,
and then essentially get a ranking on where they were.
Yeah, and that I think helped a lot motivating people to compete against each other.
Yeah, but in a very constructive way.
And to be honest, in the first month, it's been very, very bumpy with that.
Like, it was all new infrastructure and we were doing it kind of in our spare time.
So it wasn't the best of experiences.
So together with the community that was in there, they helped us figure out what was not going well.
And I think at the end we had like more than a thousand
submissions that were successful. Like, yeah, there were many more submissions that didn't
complete because of submission problems, yeah, like user code problems. There were
more than a thousand submissions that were actually fully evaluated, yeah, on that
leaderboard. So the challenge is over, but I don't know if you've done the
analysis on anything you learned from the winning entries. So
the rest of the organizers, yeah, put a lot of effort in the next three,
four weeks to re-evaluate everything, run the first ones from scratch, and they've done an
amazing job. And some of the code that we wrote for the public leaderboard ended up being
part of this evaluation infrastructure. I was very, very busy with the launch.
Of course. So I didn't participate there. So I'll try to talk to Sebastian.
On Friday. Yeah. Yeah. Okay. We'll get the details
from the winners.
Yes.
But I may say that all the community has been super nice.
They were being super constructive.
I remember when Mistral first came out,
there was a huge thread of people just getting in there,
analyzing it, trying to fine-tune it.
It was so much energy that we definitely want to push it forward
and we'll create public studios with evaluation frameworks on them.
So studios are shareable, of course.
Yes.
Yes, that makes sense.
shareable between you and me, but also community-wide.
So there will be a lot of things.
You can go in there, use your free credits to just run
the evaluation on your model.
You can do that.
Yeah, great.
So last question.
I've been asking everybody this.
You've been coming to NeurIPS for many years.
What are your NeurIPS tips?
Oh, wow.
Yeah, go to posters, which I cannot do.
But, okay, I've been to, like, you know,
there's been three poster sessions so far.
The popular ones are just crowded.
There's just no way.
I don't like to go to popular ones.
Yeah.
Yeah, okay, go to the less popular ones
and talk to them.
So I had so many
super engaging conversations.
Even in topics like, even if something is not
apparently
something that you should focus on,
your brain will oxygenate itself
a lot, and typically after these
conferences I always come back
with a head full of ideas.
And so
I would say get enriched as much as you can by interacting with people, having very honest
conversations with them.
I had an off-the-record conversation with one of the presenters who said like, yeah,
I don't, like, this paper I'm presenting, I don't believe in it.
I was like, wow, that's really honest.
Because, like, they submitted it months ago, right?
And since then, the world has moved on.
Yeah, yeah, well, that's part of the, you know, the struggle.
I don't know how it must be, being a postdoc or PhD student or even a master's student nowadays in AI.
It must be, like, so stressful.
Back in the day it was a lot easier.
You know, the prizes have got bigger.
So, yeah. Yeah. Okay, well, thank you so much for your time, and congrats on your launch.
Yeah, I think we'll, you know, meet each other on the platform, maybe.
Yes, I definitely will try it out.
Thanks a lot. Bye.
A few of my AI engineer and ML engineer friends checked out Lightning Studio and they were pretty impressed, so I'm personally interested to check it out next year.
But last and not least, I want to give the mic to Jay Alammar of Cohere and LLM University, but more importantly of the Illustrated Transformer, who is now writing a new book.
We're here with Jay Alammar, educator of many things. I've learned so much from you.
It was literally one of those moments where at NeurIPS you just kind of see someone walking
and I'm like, is that Jay? And then I had to get your attention a few times.
But it's so nice to finally meet you.
It's great to meet you and great to be here and sort of meet all kinds of brilliant folks.
I've watched your stuff and sort of been watching the revolution and how you're helping
sort of crystallize people's thinking about this new domain of AI engineering.
Yeah.
The title is very helpful in categorizing that class.
Yeah, trying to do for my audience what you do for just general ML education,
which is something that you've really done an incredible job of.
Yeah, no, that's wonderful. It's what the community needs, definitely. As machine learning and AI sort of goes out of research and goes into industry and people.
It's a different persona, different background. The kind of people, one of the reasons I'm doing this recording is, like, the kind of people that follow my stuff don't come here.
Yeah.
And maybe they shouldn't, right? Like, some of this is like too in depth.
True.
But I'm curious, like, you know, so you've been to many NeurIPS. Like, what is your general take on the vibe? What are people talking about? What's top of mind?
There's a lot of LLMs that's interesting to see.
Suddenly a lot of interest, right?
Yes.
Yes, that is, let's say, maybe possibly a new development at NeurIPS.
That's the area that's growing.
And a couple of interesting keywords or groups or directions are diffusion, even diffusion for text models.
There was a paper on that yesterday.
Yeah.
I'm not too, what's the point of diffusion for text?
Don't people want to stream things out?
Well, I mean, if you think of, like, on the application side,
auto-regressive generation has some problems.
So if the model makes a mistake with token 5, you're stuck with that problem.
Yes, that's what Tree of Thought solves, which the Tree of Thought guy was here.
Yeah, it's always, yeah, it's one, let's say one avenue.
Yes.
But it's like maybe, if the model does not get locked into a mistake in that way,
you can unlock new sort of different applications.
But also all the image generation stuff, that's really where.
I'll make a plug.
I actually had a...
So there's a lot of house parties
that happen after NeurIPS,
which is fantastic.
I ran into a guy
from Midjourney for the first time.
They have this new storytelling
section,
and they are actually exploring text diffusion
because of storytelling,
because you have to generate a coherent story
just like you would an image.
True, true.
So I would buy that as a use case.
That is fascinating.
And then with the agent stuff,
like you're interested
in the future of agents,
which is, there's a lot of the reasoning stuff.
The reasoning research at NeurIPS
will most likely sort of inform what's going to happen in agents next.
So chain of thought, tree of thought, that domain of research for me is very fascinating
because it's going to be applied very quickly.
The ReAct paper comes out.
It's in LangChain.
Everybody's sort of using it.
Everybody has a sense of what agents are.
But that really shows you the potential of what they're going to be in the future.
We're still in early days on agents.
Any other top of mind sessions?
Did you go to the Chris Ré one?
I thought that was pretty cool.
I have that one and a couple of the other sort of keynotes.
I'll be re-watching them.
Yeah.
But mostly I'm just talking with people, recording video.
That's been my, and sort of trying to orient myself.
It's an overwhelming amount of content and people and posters and talks.
And so I've been, yeah, looking at visualizations of, you know, these are the papers at NeurIPS.
These are the ones that could be interesting to you.
Yeah, people have published, like, t-SNE things of them and it's good, but, like, it's not as good as just kind of seeing the vibes.
And I actually think the conference organizers do a good job of curating
like what the, you know, oral session papers should be.
True.
You know, like I generally found them like generally very insightful.
I just found out about DataComp from one of the oral sessions.
Okay.
I don't know if you've seen them.
No.
Which is like, oh, that's cool.
New benchmark.
And yeah, I mean, like it's, to me, I'm taking it all in.
So it's impressive how many people do so much work and you've never heard of them.
That's true.
That's true.
Yeah.
And they're conversant in all the techniques, all the papers, all the stuff that you see online.
They're just not online.
And they just do research quietly.
And then once a year they show up here.
Yeah.
And sometimes you meet somebody here and you're like, and they would mention, they worked
on that other paper.
And it's a paper that you're very familiar with.
And then you go into their Google Scholar or something and you're like, I've been reading
this person's work for years, but the name never really sort of specifically popped up
until you meet them in person.
So that's why it's definitely an interesting experience.
No particular order, but have you had those, like any underrated person you would call out as like, hey, everyone should pay more attention to the work that this person is doing?
One thing that comes across, which is why workshops are good, and we can get into that sort of, like, there's David Bau's work on interpretability and editing language models and editing their knowledge. That was one thing that sort of really stood out to me after I, you know, met David and sort of heard about his work.
Editing by editing weights?
Yeah, they have a method of editing the model.
Exactly, yes. There's a...
Is this the one where they played Go?
This is where they convinced a model, using that method, that the Eiffel Tower is in Rome. And then they have subsequent methods: let's say if you make a hundred edits like that, the model degrades, so they have subsequent work on, okay, this is a better method to do many more of that. But also things like, I mean, we've seen logit lens and sort of where in the model is this token being suggested. Like, is that layer one or is that layer five? That localization is interesting work.
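For readers who have not seen the logit lens idea mentioned here, the following is a minimal toy sketch of it, not code from any of the papers discussed. It assumes we already have the hidden state at the last position after each layer and an unembedding matrix; every number, the three-word vocabulary, and the helper names `project`, `top_token_per_layer`, and `first_layer_predicting` are made up for illustration. A real logit lens would also apply the model's final layer norm before projecting, which this sketch leaves out.

```python
# Toy logit-lens sketch. All values below are invented for illustration only.
VOCAB = ["Paris", "Rome", "London"]

# Hypothetical unembedding matrix: one row of weights per vocabulary token.
UNEMBED = [
    [1.0, 0.0],  # Paris
    [0.0, 1.0],  # Rome
    [0.5, 0.5],  # London
]

# Hypothetical hidden state at the last position after each layer.
HIDDEN_BY_LAYER = [
    [0.9, 0.1],  # layer 1
    [0.6, 0.5],  # layer 2
    [0.3, 0.8],  # layer 3
    [0.1, 1.2],  # layer 4
]


def project(hidden):
    """Map a hidden state to one logit per vocabulary token via dot products."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in UNEMBED]


def top_token_per_layer(hidden_by_layer):
    """Return the highest-logit token when each layer's state is projected."""
    tokens = []
    for hidden in hidden_by_layer:
        logits = project(hidden)
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        tokens.append(VOCAB[best])
    return tokens


def first_layer_predicting(token, hidden_by_layer):
    """Return the 1-based layer where `token` first becomes the top prediction."""
    for layer, top in enumerate(top_token_per_layer(hidden_by_layer), start=1):
        if top == token:
            return layer
    return None


if __name__ == "__main__":
    print(top_token_per_layer(HIDDEN_BY_LAYER))
    print(first_layer_predicting("Rome", HIDDEN_BY_LAYER))
```

In this made-up example the top token switches from "Paris" to "Rome" partway through the stack, which is the kind of "is that layer one or layer five" question the localization work asks.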
Yeah. So you do all these interviews on your YouTube.
We'll send people there. Is this part of your work
at Cohere? A little bit, yes.
What is your deal?
Yes. That's true.
So a bunch of them go on the Cohere
YouTube channel and the Cohere socials as well.
So yeah, my work at Cohere, I get to
learn in public, basically.
I love that. So Cohere builds language models for
embeddings, re-ranking, and generation.
And through selling them, I get to see how industry is solving problems with them.
And that, to me, is very fascinating.
To see the technology coming out of research and then how it goes into industry and how people
use them, how people sort of need to be educated on the best ways of using them.
That view to me is something I'm lucky to have.
Yeah, yeah, it's a good job to get, I'll be honest, if you love that stuff,
you might as well get paid to do it.
You probably don't know this, but I actually have written a book on learning in public,
and I am a big advocate of getting developers and engineers to learn in public.
Well, you do it so well.
Yeah, this is my way of doing it.
The final piece is, you know, you've written a lot of foundational work on transformers.
A lot of people are talking about state space models and what happens after transformers.
Do you have personal views on that?
Not yet. Not yet. I'm on the lookout.
So there are always new ideas that, you know, there's maybe poster number 5002 here that nobody paid
attention to. Yes.
Maybe in six months we'll see that, oh, it crushes everything else.
So that is always something you can never sort of expect.
Yeah. My favorite fact about the Transformers paper is that it itself was not accepted as, it was like a poster-only paper, right?
I don't know the story behind that. It was a big deal for machine translation.
Yes. But it's like, okay, yeah, there's a cool translation paper. It's one of many, right?
Yeah, we already have BERT. One new attention method. We had Bahdanau attention and we had Luong attention and, like, now we have also, you know, one more. But then BERT comes out and it's like, okay, this is more than translation. And then GPT comes out and it's like, oh,
I'm still missing a good survey paper on everything that happened since Attention Is All You Need.
Like the evolution towards the modern decoder-only paradigm.
And I feel like someone needs to write that.
Everyone's too busy inventing new things to stop and write.
That's true.
Because it's a massive thing.
There are a few people who, because there's a lot of work on different kinds of attention.
For the Transformers specifically, how to improve it for this problem, that problem.
But one thing that I'm doing is rewriting the Illustrated Transformer with the ideas that have stood the test of time since then.
So it's like six years after, which ideas... so people are using FlashAttention, and then, yeah, RoPE and ALiBi, let's say, positional encoding, localized attention. So some ideas that people are continuing to use over and...
Group query.
Group query, yes.
And then sliding window?
Not yet. So it's in Mistral, but I, you know, I need, we maybe need to see it in more work.
My conspiracy theory about Mistral is that, so the Mistral paper heavily features sliding window attention, and everyone is like, bullshit.
Yeah, I mean, because you see it, you saw that.
For two years after the Transformer,
everybody was proposing new ideas.
And if you put this in the Transformer,
it does better on this.
But then what stood the test of time?
The Vanilla Transformer really stood the test of time
and did better than even a lot of these enhanced,
quote-unquote, enhancements.
But these ones, let's say, stood the test of time.
So this rewriting is going to be part of the book I'm sort of currently writing.
Are you writing a book?
Yes.
Writing a book for O'Reilly called Hands-On Large Language Models, including this as a chapter.
So if you want an updated, illustrated transformer, that's going to be a part of it.
When you launch your book, you should come on and do a full episode with us.
That's amazing.
Yeah, exactly.
And then just general NeurIPS tips, you know, as an attendee.
Like, if people are coming for the first time, what would you advise them to do?
I really love the visualization by, I posted about this,
by Hendrik Strobelt and Ben Hoover, of the t-SNE of all the papers,
but also it's clustered, so if you're interested in language models,
that is clustered.
I use that for my planning.
It's so useful.
Like, these things are absolutely incredible.
I got to meet Hendrik, and they have a lot of very interesting ideas there,
and it helps you sort of orient yourself.
And I've also seen work kind of like it,
but where you can do semantic search on,
so you can say, you know, agent papers,
and it doesn't need to match the actual keywords.
With Cohere, we have a demo on RAG on NeurIPS papers as well.
So you can ask a question.
You're like, okay, I'm interested in LLM efficiency.
It'll say, okay, this paper, this paper, this paper.
And it's retrieval augmented sort of generation.
So these are the three tools, but I think we need a lot more of these tools to make sense of...
Yeah, I need it for the meetups too.
You know, in the conference app, there's all these like meetups for very specific things.
That's true.
I started one for Singaporeans because I'm a Singaporean in tech.
And yeah, there's just a bunch of very, very specific, like running meetups,
like nothing to do with tech specifically, but like, you know, this is also a social event, right?
Like, you're meeting people.
You wouldn't happen to have been at EMNLP?
No, why?
Some people did that because it was like last week.
And some people went to EMNLP in Singapore and then flew back here.
That's a tough call.
Yeah, I'm not going to do that.
That's wrong.
That's wrong.
Well, thanks very much.
It's a pleasure to have you on, a pleasure to meet in person.
Good to meet you, love your work.
Thank you.
Any calls to action for people?
Well, I'm Jay Alammar on Twitter and YouTube,
and we have LLM University.
llm.university.
Like, I collaborate with Luis and Meor Amer.
Some of the best YouTube,
very short, but very comprehensive, authoritative.
I'm very lucky to collaborate with these folks.
It's incredible.
But yeah, thanks for doing all that.
Thank you.
Appreciate it.
Okay.
And that's it for our NeurIPS coverage and for Latent Space pod in 2023. We are still doing a listener
survey, so if you are listening through to here, you're definitely a big fan. We definitely want to
hear from you what you like about the podcast and what you want to hear in 2024. We've got a
couple of really good episodes already recorded for the start of 2024, so we're going to start
the year strong and come up to the one-year anniversary of Latent Space. So thanks for all your
support. Have a wonderful end of the year, and we'll see you soon. DJ, hit the outro.
Thank you.
