Microarch Club - 10: Thomas Sohmers
Episode Date: February 28, 2024

Thomas Sohmers joins to discuss dropping out of high school at age 17 to start a chip company, lessons from the successes and failures of past processor architectures, the history of VLIW, and the new AI hardware appliances he and his team are building at Positron AI.

Thomas on X: https://twitter.com/trsohmers
Thomas' Site: https://www.trsohmers.com/

Show Notes
Welcome Thomas Sohmers (00:01:22)
Growing Up Around Computers (00:03:13)
Digging Beneath the Software (00:05:56)
Learning Python, C, and Arduino C (00:07:05) https://www.arduino.cc/reference/en/
Learning About the Thiel Fellowship (00:07:44) https://thielfellowship.org/
Starting Research at MIT at Age 14 (00:09:24)
Dropping Out of High School and Starting the Thiel Fellowship at Age 17 (00:10:36)
MIT ISN Lab (00:11:09) https://isn.mit.edu/
Evaluating ARM Processors for High Performance Computing (00:11:28) https://en.wikipedia.org/wiki/ARM_architecture_family
ARM Calxeda Processor (00:11:38) https://en.wikipedia.org/wiki/Calxeda https://www.zdnet.com/article/what-the-death-of-calxeda-means-for-the-future-of-microservers/
Scaling Out Low Power Processors for Data Center Compute (00:12:27)
Incorporating REX Computing (00:13:42) http://rexcomputing.com/ https://fortune.com/2015/07/21/rex-computing/
Facebook and the Open Compute Project (00:14:18) https://www.opencompute.org/
Deciding Against Arm (00:14:49)
ARMv8 (00:15:12) https://en.wikichip.org/wiki/arm/armv8
Deciding to Design a New Architecture (00:16:26)
Multiflow (00:18:23) https://en.wikipedia.org/wiki/Multiflow
Good Architecture Ideas from the Past (00:18:35)
Thomas' Talk at Stanford (00:18:59) https://youtu.be/ki6jVXZM2XU
RISC vs. CISC Debate (00:19:37) https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/
SPARC Instruction Set (00:20:04) https://en.wikipedia.org/wiki/SPARC
The Importance of History (00:20:58)
RISC Came Before CISC (00:23:08)
CDC 6600 (00:23:20) https://en.wikipedia.org/wiki/CDC_6600
Load-Store Architecture (00:23:53) https://en.wikipedia.org/wiki/Load–store_architecture
IBM System/360 (00:24:02) https://en.wikipedia.org/wiki/IBM_System/360
PowerPC (00:24:29) https://en.wikipedia.org/wiki/PowerPC
VLIW (00:25:02) https://en.wikipedia.org/wiki/Very_long_instruction_word
ELI-512 and Josh Fisher (00:25:05) https://dl.acm.org/doi/pdf/10.1145/800046.801649 https://en.wikipedia.org/wiki/Josh_Fisher
Floating Point Systems, Inc. (FPS) (00:26:45) https://en.wikipedia.org/wiki/Floating_Point_Systems
Multiflow Compiler (00:26:52) https://www.cs.yale.edu/publications/techreports/tr364.pdf
Instruction Level Parallelism (00:27:33) https://en.wikipedia.org/wiki/Instruction-level_parallelism
Intel Itanium (00:28:20) https://en.wikipedia.org/wiki/Itanium
Itanium Is Not a VLIW Architecture (00:29:04)
Explicitly Parallel Instruction Computing (EPIC) (00:29:22) https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing
x86 and Pentium (00:30:18) https://en.wikipedia.org/wiki/X86 https://en.wikipedia.org/wiki/Pentium
Impact of Branch Prediction and Caching on Determinism (00:31:34) https://en.wikipedia.org/wiki/Branch_predictor https://en.wikipedia.org/wiki/CPU_cache
Why Itanium Failed (00:32:27)
REX's NEO Architecture (00:35:29) http://rexcomputing.com/#neoarch
Hard Real-Time Determinism (00:35:41)
Scratchpad Memory (00:35:54) https://en.wikipedia.org/wiki/Scratchpad_memory
Removing Memory Management (TLB, MMU, etc.) (00:36:18) https://en.wikipedia.org/wiki/Translation_lookaside_buffer https://en.wikipedia.org/wiki/Memory_management_unit
ALU, FPU, and Register Files (00:37:14) https://en.wikipedia.org/wiki/Arithmetic_logic_unit https://en.wikipedia.org/wiki/Floating-point_unit https://en.wikipedia.org/wiki/Register_file
Benefits of Removing Implicit Caching Layers (00:38:30)
VLIW in Signal Processing (00:39:51) https://en.wikipedia.org/wiki/Digital_signal_processor
VLIW Won in a Silent Way (00:40:49)
Original Reason for Hardware-Managed Caching (00:41:26)
Impact of VLIW and Software-Managed Memory on Compile Times (00:42:41) http://www.ai.mit.edu/projects/aries/Documents/vliw.pdf
LLVM and Sufficiently Advanced Open Source Compilers (00:42:49) https://llvm.org/
Apple Transition from PowerPC to x86 to Arm (00:43:31) https://en.wikipedia.org/wi...
Transcript
Hey folks, Dan here. Today on the MicroArchClub podcast, I am joined by Thomas Sohmers.
Thomas is currently the founder and CEO of Positron AI, which recently emerged from stealth with its first product, Atlas,
which is a transformer inference appliance.
We talk about what makes Atlas different than traditional inference hardware,
and how Positron is able to deliver significantly higher performance per dollar compared to current GPU solutions.
However, we start out at the beginning of Thomas' fascinating journey, which began with dropping out
of high school to start a chip company at age 17. We cover his experience in the Thiel Fellowship
program, why there's a need for a new instruction set architecture, and how he and a team of three
other engineers were able to tape out and deliver a new chip in a highly compressed time frame. Thomas is an ardent student of computing history, which made this discussion
a ton of fun as we connect the dots from research papers in the 70s and 80s to the cutting edge
processors of today. Before we jump in, I want to thank Michael, who is the co-founder and CTO at
Lambda, for initially connecting Thomas and I a few months ago. With that,
let's get into the conversation. All right. Hey, Thomas, welcome to the show.
Hey, thanks for having me.
Absolutely. For folks who don't have context, which I guess is everyone,
Thomas and I spoke a few months back. Michael over at Lambda was nice enough to connect me
with Thomas. I know you all have worked together in the past. And I was just trying to talk to folks
who had experience that I was interested in as I was learning more about processor design and
trying to learn more about the industry. And Thomas, you were nice enough to spend some time
chatting with me. And then you have a new venture that you're working on, which I'm sure we'll talk
about at some point in this show. So now is kind of a great time as you all have come out of stealth
to circle back around and have you on the show. Yeah, well, thanks again for inviting me. And
yeah, always excited to be able to talk about computer architecture and computer history.
Absolutely. Well, you know, it's interesting. As a little behind-the-scenes here, I guess, I haven't released any episodes yet.
So I've just been kind of doing recordings of this show.
And most of the folks I've talked to worked in the industry kind of in the 70s and 80s
and are now retired and are very happy to talk about their experiences.
But I'm also trying to get folks who are hands-on today and really get a sense for, you know, what has inspired the work that's happening today, as well as what, you know, modern work on processors looks like. You have a very, very interesting background,
though, that I think I'll probably not have anyone else on the show who has something similar, but
you, I believe, dropped out of high school, right? After getting the
Thiel Fellowship. Do you want to talk a little bit about, you know, how you got interested in
computer architecture and then how that whole process came about? Yeah. So I think some of my
earliest memories, like three, four years old, were when I was in front of computers. So I've
been a computer user and very interested in them since then.
So this is right at the turn of the millennium.
So, yeah, back, you know, still like early memories of dial-up tones and such.
But, you know, the fact that my parents let me on the Internet in, you know, 2001 when I was five.
Right.
Maybe, you know, good or bad idea.
Not sure.
But I just had to, you know, it was my obsession to be playing with computers.
And I started to connect, like, more than just using it as a tool, like, the interest in, okay, I could play games, but I can, you know, actually learn things online. Throughout elementary school I was very into trying to research as much as I could and just diving into Wikipedia for three-plus hours at a time. But yeah, my dad also purchased electronics kits from the 60s and 70s, the old, you know, 100-projects-in-one things. And so I also then got very interested in, okay, not only can computers do all these amazing things, but I can actually take, you know, a little piece of cardboard with a whole bunch of components and follow this instruction booklet, which I think I actually started using those booklets before I actually knew how to read, and can make a, you know, light turn on, and then, you know, graduate to a crystal radio, and so on and so forth. So, yeah, it was really those early formative years that got me interested in electronics.
Um, and then by the time, you know, late elementary school, middle school, I just
would take apart everything in the house. And thankfully had parents that were okay with me
taking things apart as long as I convinced them that I could put it back together. And I had a
pretty good success rate of that. But even when I couldn't put something back together
if I showed that I actually learned something about it, then they weren't too mad.
That's awesome. We've talked a little bit, you know, about kind of like digging through abstraction layers. And, you know, I've heard a lot of folks who have, you know,
somewhat of a similar early interest in computing who end up being software engineers,
but you kind of pushed down and maybe it was some of that, you know,
electronic kit influence and that sort of thing.
But what was it that kind of, you know, brought you from using computers
to want to dig down and understand, you know, mechanically how they work?
Yeah.
I think with those early experiences with just really basic electronics kits, I really appreciated putting my hands on things, and especially experimenting, like, okay, are there other ways that I can get this to function other than what the instructions tell me? And I think it's, you know, similar, probably unsurprisingly: I loved Lego sets, but I would go buy a Lego set,
put together what it said, at least maybe half of it,
and then have more fun just making something entirely new.
And so that hands-on aspect, I think, was just always with me.
And then by the time I actually started learning, you know, computer programming, thankfully, in middle school, I was able to get into a public charter school called the Advanced Math and Science Academy.
And they actually, you know, were starting teaching Python in sixth grade. And so I actually got much more interested when they were teaching Python, you know, learning that. But then on the side, you know, I had Arduinos and a few, you know, BASIC Stamps and stuff like that, and started more on the C side,
but also learning Python. But I guess even though I learned and could appreciate and build some things on Python and higher level software side,
there was nothing like writing C code
and then getting the very simple Arduino version of C
and being able to get that to run and actually blink an LED.
So it was really just always on that, on that hardware side for me.
Gotcha. And so then you get into high school, and I assume that's when you applied to the
Thiel Fellowship. How did you learn about that? And what was kind of the process for getting
into that program? Yeah, so when I was in eighth grade, so still middle school,
the director of the computer science program at my high school
recommended that I go to this event at the Google Cambridge office. And I grew up in central Massachusetts. And so there was a day, you know, encouraging, you know, middle and high school students that they should, you know, study computers.
And I was already pretty hard set on that.
But there was a woman there who was, I believe, a manager, director or something at iRobot, which is a local Boston area company. I had met her, and she thought I was smart. And then a couple
weeks later, she emailed me out of the blue through my school and sent a link to the announcement of
the Thiel Fellowship. This was back in 2011. And she said, you know, she knew one of the organizers of the Thiel Fellowship.
And she said that it looked like something interesting for me.
You know, the kind of crazy thing about it, I was, you know, 14 at the time. I was, yeah, a bit young for it. So around the same time, in 2010, I started actually being a research affiliate at a research lab at MIT, and worked there for three years. And then finally, in 2013, I got accepted into the Thiel Fellowship. That's incredible. I mean, you obviously showed quite
a lot of promise, but also really, really great of her to reach out. And, you know, I think a lot
of times we look at folks, you know, maybe being in eighth grade or something like that and assume, well, maybe they can do that later. But that's really cool that someone was looking out for that and, you know, thought it's never too early. So that obviously had a really positive impact.
I think the good thing was, the director of our CS program at our school, I don't think she realized that dropping out of school was, like, the encouraged thing. And so I'm sort of doubting that, if she knew that, she would have made that connection. But, right. But
yeah, it was, you know, pretty serendipitous. Absolutely. So you get into the Thiel
Fellowship and, um, I believe there, um, is kind of like a period of time where you're kind of
figuring out what to work on, right?
How long was that period of time?
And then obviously you and some other folks founded REX Computing. What was kind of like the process of getting into the Thiel Fellowship and then getting to that point of starting the company?
Yeah.
So the application for the Thiel Fellowship at that point closed, you know, December 31st, 2012, so I applied then.
But for basically the past year and a half, roughly before that, the main project I was working on at MIT was maintaining and doing tests on the cluster computing system at the ISN, the lab I was at.
And the particular focus,
I was evaluating a whole bunch of new ARM-based processors
for use for cluster and high-performance computing.
And we were the first to benchmark the Calxeda,
if anyone remembers them,
the first ARM server processor back in 2011, 2012
timeframe. And so, you know, those were really my formative years learning about distributed computing, and aspects really at a much higher level than the actual processor architecture. But I was very interested in the idea that processors that were designed for embedded systems,
ARM at the time was really just embedded in industrial and mobile,
and really smartphones were really only just starting to really kick off,
and ARM was getting some real steam with that. But the key thing there when
applying for the Thiel Fellowship was this idea that I wanted to bring that sort of research work
and my belief that, okay, these low power processors are going to be able to actually
scale out, and utilize them for general, you know, server, industrial, you know, web server processing. And that was, I would say, a pretty risky, novel idea; you know, Calxeda at the time was basically the only company pursuing it. And so that was my Thiel Fellowship application: utilizing and continuing that work, but actually starting a company around it.
So there was a whole multi-round interview process for the fellowship and then got selected in May of 2013 and moved out to the Bay Area in June.
So basically that start was, like, really just getting the lay of the land in the Bay Area. You know, I was 17, had just moved out,
never having been away from home for more than two weeks.
And so getting to meet the other folks in the Thiel Fellowship,
there's 20 selected per year, all under 20 years old.
And I spent basically that summer getting the lay of the land, and actually incorporated REX in August of 2013.
Yeah.
So pretty quick from the beginning.
And when you incorporated, you already had kind of the idea for what you wanted to do
as well?
Yeah.
Well, the idea originally was generically, you know, go after low power, high performance computing.
And the target market for that was big data, that being the big buzzword at the time, data centers.
And the earliest folks that I was talking with were at Facebook.
So I got involved in the Open Compute Project pretty early on.
Later became a coordinator for the high-performance computing group within OCP.
But yeah, effectively, the really high-level outline was there.
The first about six months after that, basically through the end of 2013,
was really focused on utilizing ARM and was after having a lot of conversations
with all of the ARM chip vendors, ARM themselves, looking at what IP
and core designs they had in their roadmap, et cetera, that
came to the sad realization that it's not quite there yet.
So ARMv8 specification just came out that summer.
All of the chips out there were still 32-bit ARMv7.
And so ARMv8 was still right around the corner, but even that first generation of them, I didn't think was going to be really competitive in the market against what was already out there. So that said, okay, well, if ARM isn't there... I think the general thought was, okay,
with ARMv8, they're actually embracing a lot of the CISC elements of x86. And they're still not actually providing a real competitive advantage. And the real root of the idea, like end of 2013,
early 2014 was, I have this Thiel Fellowship; you know, I'm getting $50,000 a year, which is nothing, you know, to live on in the Bay Area, but I was okay living with, you know, five, six other people. So basically what the Thiel Fellowship really
gave me was that freedom to think a really crazy idea, like, why don't I make my own processor architecture? And to actually take the time to explore that in a way that didn't feel like the immediate time pressure of, I need to support myself beyond the fellowship.
Right. That makes a lot of sense. What about the ARM IP that was available? You know, you're kind of saying that you have that freedom. So maybe
this wasn't as much of a consideration. But was the cost of licensing ARM IP, did that come into
play? Or was it mostly just like the capabilities weren't there at that point?
It was really the capabilities. Thankfully, had some really great connections with folks at ARM. And they were trying to be, you know, extremely supportive.
Early on, from, you know, the connections made at MIT, the folks at ARM that wanted to go up to, you know, higher performance and capabilities, you know, were trying to be as
helpful as possible.
But as we've seen in this past decade,
ARM's only actually gotten competitive in the past three or four years.
So it really took time for all of that to evolve.
And I should say that we also evaluated other architectures. And I think a large part of the drive to think about designing, you know, a new architecture from scratch was seeing that there were plenty of what I thought were good, interesting ideas, you know, be it a decade before that with, you know, MIT RAW and Tilera, and then, you know, later, as I learned about Multiflow two decades before, etc.,
that my thinking at the time, especially being in the Bay Area
with the entrepreneurial vibe and a lot of people's thinking
being around, there were plenty of good ideas in the past
that failed, not because they were bad ideas,
but it wasn't the right time,
you know, being a big, big thing,
kind of lent credence and belief on my part
that, you know, this could actually be the right time
to do something new.
Absolutely.
That's one of the things, so, you know,
most of the research I did for our discussion today
is from this talk you gave at Stanford where you really go through what y'all built at REX,
essentially. And I'll link that in the show notes and would encourage folks to watch.
But one of the things that I really appreciate, and we'll talk about some of these concepts here
in a second, is the callbacks you do to talk about these designs that have maybe been used in the past or concepts that have been
used in the past, and in some cases are kind of like mocked, right? They're regarded as not working out, and you kind of go through it and debunk some of those. That's something I find really interesting, looking at even like the RISC versus CISC debate, right? Because that happened in the 70s and 80s and things like that.
And then we seem to go the way of CISC architectures with x86.
And then obviously, right, with what we're talking about with ARM and RISC-V and, you
know, some of the stuff y'all were doing, obviously RISC has come back into vogue here.
Actually, tomorrow, as another kind of behind-the-scenes note, I'm recording with Robert Garner, who designed the SPARC instruction set, and we're focusing on register windows and talking about, you know, why they succeeded and why they didn't. So a lot of times the reason things didn't succeed is not that they just weren't good ideas, right? There was some other factor. And maybe we can use that to kind of jump into what y'all were starting to build there. So I think my main takeaway was
like two big concepts from the architecture that y'all designed. And that was VLIW, so very long instruction word, and scratchpad memory were kind of like the
two things that I took away from that talk.
Do you want to maybe start with VLIW and talk about the history of that and why y'all chose
to go with that architecture?
Yeah.
So I think a big reason I gravitate towards the history is, you know, other than my love of electronics as a kid, I was very bored in school in general.
But the one class I was always very actively involved in throughout school was history. And so I just loved learning about, you know, everything from, you know,
ancient Egypt and Rome to, uh, World War II and, and, uh, really getting an idea of, you know, the,
the causes and effects and, you know, how, how, you know, everything has branched out, uh, from,
you know, over, you know, only a few thousand years of, you know, modern history.
So that always fascinated me. And then, you know, computer history and how, like the fact that really, you know, electronic
or if I expand a little bit, mechanical computers have a much, you know, shorter, you know,
less than 150, 200 year time frame.
And the evolution there has been remarkable.
And I just, me learning any of the technical details of things,
it just becomes so much easier.
And I think I make deeper connections,
both from a just personal enjoyment standpoint,
but also like being able to analyze things
in I think a different way than others
by actually learning the real, you know, historical line of things. Like if you understand
why, you know, in general, smart people made certain decisions that in retrospect,
people ridicule. If you actually look back and look at the context of how they came to that decision, it gives a lot more insight and usually suggests that, no, they weren't dumb.
They actually made a pretty reasonable choice given their constraints and such. An interesting thing, just since we brought up the RISC versus CISC wars of the 80s and beyond: a lot of people don't actually recognize that RISC was actually a thing before CISC, back in the 60s. It's kind of overlooked, but a lot of people, you know, that are nerdy about this sort of thing would argue the CDC 6600 was the first RISC machine.
That was, you know, the first, you know, real supercomputer, designed by Seymour Cray, that had mass market adoption.
And fundamentally, like, what is RISC?
A lot of people, probably because it's the easy thing for more lay people to understand, just think, oh, well, you have a smaller set of instructions that you build out, and do pipelining and things like that.
But fundamentally, it's a load store architecture.
The fact that you're having to actually move all of your data into registers and you're operating on registers is a very different paradigm from, you know, early CISC machines,
you know, going, you know, System/360 onwards. And so, if we go back to what I think of as being like the first real RISC versus CISC war, it was between, you know, Control Data Corp, which Seymour left and then formed Cray, you know, the first real RISC architectures there, versus System/360, which Power...
I actually haven't looked into Power 10 recently, exactly where they have compatibility break off.
But a huge constraint on the power architecture for a very long time was IBM's insistence on being able to have binary compatibility all the way back to 1968.
I think they still have it on the Z machines, which isn't power.
There's other processing elements there.
But yeah, on the evolution of RISC to VLIW: I would say the first true VLIW was the ELI-512, which was Josh Fisher's, and which then led to Multiflow that I mentioned earlier. But a lot of people don't actually know about Floating Point Systems and their supercomputers from the mid-70s, which didn't call their thing VLIW at all but have all of the key characteristics. The name is misleading in saying "very long instruction word." Everyone just thinks, okay, that must be what defines a VLIW. But in the same way that I would say RISC is fundamentally that load-store architecture,
you're doing all of your operations on registers,
and you're needing to move that in and out of whatever other parts of your memory
before and after operating.
The core thing with VLIW is this idea that you can actually be doing concurrent operations
that is very distinct and different from just parallel operations.
And that you can encode in such a way
that your instruction stream
and that very long instruction,
all of the operations that are being given
in that single word are things that
the program, or whatever is emitting that instruction word, guarantees to the system can all operate concurrently without hazards.
Yeah, absolutely. That was an extremely powerful idea for the mid-70s, with FPS in the supercomputer space.
And then it was really, I would say, Multiflow that tried to take that to a much grander level
and be actually saying that we're going to build a compiler that can do this for you,
that it's not the programmer, a really brilliant person drawing
things on paper in the 70s with FPS and trying to manually schedule this, but that this can
actually be a control flow graph and you can algorithmically determine what things can be
done in parallel. And that, frankly, has been the holy grail problem
for 30, 40 years now as it relates to VLIW systems.
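To give a flavor of the scheduling problem being described here, the sketch below is a deliberately tiny greedy list scheduler in C++. It is not Multiflow's trace scheduling or REX's compiler; the operation names, the dependence graph, and the two-slot bundle width are all made up for illustration. It just shows the core move: pack operations whose inputs are already computed into the same issue bundle.

    #include <cstdio>
    #include <vector>

    // Toy list scheduling: pack ready operations into fixed-width bundles.
    // Everything here (op names, bundle width) is invented for illustration.
    struct Op {
        const char* text;          // human-readable operation
        std::vector<int> deps;     // indices of ops whose results this op consumes
    };

    int main() {
        // Compute a*b + c*d: the two multiplies are independent, the add is not.
        std::vector<Op> ops = {
            {"mul t0, a, b", {}},
            {"mul t1, c, d", {}},
            {"add t2, t0, t1", {0, 1}},
        };
        const std::size_t slots_per_bundle = 2;   // pretend two issue slots per bundle

        std::vector<bool> done(ops.size(), false);
        std::size_t scheduled = 0;
        int cycle = 0;
        while (scheduled < ops.size()) {
            std::vector<int> bundle;
            for (std::size_t i = 0; i < ops.size() && bundle.size() < slots_per_bundle; ++i) {
                if (done[i]) continue;
                bool ready = true;
                for (int d : ops[i].deps) ready = ready && done[d];
                if (ready) bundle.push_back(static_cast<int>(i));
            }
            std::printf("cycle %d:", cycle++);
            for (int i : bundle) std::printf("  [%s]", ops[i].text);
            std::printf("\n");
            for (int i : bundle) done[i] = true;  // results become visible next cycle
            scheduled += bundle.size();
        }
        return 0;
    }

Run on this tiny graph it prints the two multiplies in cycle 0 and the add in cycle 1; the hard part that real VLIW compilers wrestle with is doing this across branches, memory aliasing, and variable latencies.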
Absolutely.
And when you talk about different ways
to get instruction-level parallelism,
what we're kind of talking about,
and this is with RISC and CISC as well, perhaps,
is do you do more work in the hardware, pushing more complexity into the hardware, or do you push it into the compiler? And, you know, the way computing systems have
evolved have changed that trade off. So some of the discussions, you know, looking back at things
in the 70s and 80s, you know, assumptions about, for instance, you mentioned a human, you know,
writing a program, it's very different when you're comparing that to a compiler generating
machine code.
So that's, you know, one of the things that I frequently see with some of the research from
back in that time and then applying it today. That's just one vector where the trade-offs may
have changed, you know, throughout history. The architecture, the processor that you
referenced in that talk that kind of gave VLIW a bad name was Itanium,
which I'm sure a number of folks are familiar with. So with VLIW, we can potentially reduce
the complexity of the processor, right, because we're pushing more of that into the compiler.
Can you talk about why Itanium was not successful in doing that or why
it's maybe mocked today? Yeah. Well, as I mentioned in that talk, and I'll stick to my stance here,
I do not consider, you know, this is trying to like disown a, not even my own child, but something
that gets associated with the thing that I love. But I do not consider Itanium, Itanic, to be a VLIW architecture.
And my best evidence or thing I can point to support that
is Intel's own marketing from the time,
which they were very explicitly trying to say that it wasn't VLIW back then.
They coined this Epic name,
so it's the explicitly parallel, you know,
instruction set computer. And I appreciate whoever in the marketing department at that time,
you know, was doing that. But even the, you know, technical reasoning for doing that
was their belief that they could actually take the really
powerful, amazing thing about VLIW of being able to have these explicitly encoded parallel or
concurrent operations that could be done. And I would say they had a bit of, and I've heard
different stories from different people involved at the time. But in general, I think everyone would agree that there were strong battles internally in the both Itanium team itself.
But, you know, being heavily, heavily constrained and influenced by other parties within Intel that were really trying to heavily push the, you know, x86 Pentium continuation elements. And what I think the real downfall for Itanium in diverging from sort of the pure VLIW roots,
which have ended up being successful in many other products that we'll talk about, but
was them trying to still have support and some level of compatibility with x86. And all the elements and kitchen sink pieces
that were getting moved in from other parts of Intel
and being bolted onto Itanium
that kind of turned it into not being VLIW
and I would say very much being Epic
and turning out to be an epic fail.
But yeah, I just, you know, to, I guess, expand a little bit on it.
It fundamentally comes down to what I was saying before, where with VLIW, it's the programmer, the compiler, whoever, whatever is giving that instruction.
It's its job to actually be saying that we know at this cycle that you're
going to be having no control or data hazards with the data provided. And when you start to
try to add a lot of advanced processor features like branch prediction, any sort of caching
systems, and especially if you want to actually have some instruction compatibility with x86,
which was a big goal that Intel never actually even delivered on with Itanium.
You add in so much requirements of non-deterministic features
that that guarantee can't be made.
It's impossible. The delusion and
the reason I think that the Itanium failure is so well known by people that aren't compute architects
but that they still associate it as VLIW bad because of Itanium, don't realize that it
wasn't that they just weren't able to make a sufficiently smart compiler.
That was an impossible task.
I would say probably not provable as impossible, like P is or is not equal to NP. I would say it's similarly non-determinable, but any rational
person, I think, would be saying that P is not equal to NP. And if it was, then the whole world
collapsed and none of us, nothing we say about it matters if it was. And so I would make a similar
assertion that Itanium's mythical magical compiler was impossible due to all of the
non-VLIW cruft and design directions that they tried to do it to make it an appealing
product from Intel's internal perspective.
And fundamentally, I think that those decisions were rooted in Intel didn't want to
actually diverge from x86. Right, right. Absolutely. That's kind of a,
once again, kind of referencing back to like where you put the complexity,
you have to make sure wherever you push the complexity that you give sufficient information
to make the decisions in that place. And it sounds like in this case, right, the, um, it, you know, compiler authors
or human, you know, humans writing a machine code or assembly, um, you know, they didn't have
sufficient information to reason about performance and maybe some of those hazards as well. Um,
so that makes a lot of sense. And then if you have to take a full pipeline flush of 40-plus cycles, which is what it was at that time, if I remember correctly, you're going to have a really bad time.
So I think the key thing to remember is code worked on Itanium.
You could run programs. They just performed horribly, because those guarantees, that correct level of abstraction, and where the handoffs in complexity are, weren't there. I think this is the other thing, and why I don't think a lot of the engineers and people associated with Itanium like to talk about it and don't like to correct the record on a broader scale: it became, within the broader computer science culture, like a known failure beyond just among computer architects, and people don't give credit to the engineers.
The engineers actually want to make a good product.
They actually want to do the right thing.
They made a lot of good decisions.
But when you have the wrong requirements going in
and the wrong request demands of a system,
you're going to end up, you know, not making the right
thing. Right, right. Absolutely. Well, so in your NEO architecture that y'all designed at REX,
y'all took a different approach. And you talked about in that talk, having hard real time
determinism. What was kind of the strategy?
Are there any kind of like things you could point out specifically in the processor architecture
that allowed you to be able to do that in contrast to something like Itanium?
Yeah.
So probably the biggest element that stands out but also vastly simplifies everything, is the fact that we went to a purely scratchpad-based memory system.
So in terms of what would normally be L1, L2 caches in traditional processors,
rather than having the TLBs and MMU and all of these extra pieces of logic,
which take up a lot of area, a lot of power,
a lot of design time and complexity, and add latency and such. Basically, think of you've
got your L1 cache, but we just rip out all of that, and you're just directly accessing that
memory, the physical addresses for those pieces. So at the individual
core level, it was, you know, and I think a lot of how we talked about it to most people,
just think of it as being a simple RISC core to start off with. In reality, it's VLIW, but VLIW got such a bad rap, and you can still say that; it still meets the definition of RISC. And, you know, most people
don't actually care how instructions are generated for it if they're not responsible for doing
anything related to it. But, you know, each core had, you know, a 64-bit ALU, an IEEE 754-2008 compatible,
64-bit floating point unit. And then, you know, two separate register files for each of those,
and then the scratchpad memories, which were divided into multiple banks.
And really, the VLIW, the instruction word that is used to control these elements, enabled you to
be doing an ALU op, a floating point op, as well as two load and store operations simultaneously,
all in a single cycle.
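As a concrete picture of that instruction word, here is a toy C++ sketch. The field widths, slot names, and layout are invented for illustration and are not the actual NEO encoding; the point is the contract that whoever emits a bundle has already guaranteed its four operations are hazard-free.

    #include <cstdint>

    // Toy VLIW bundle. Field widths and slot layout are placeholders, not the
    // real NEO encoding. The contract: the emitter guarantees the four slots
    // have no data or control hazards among them, so the hardware can issue
    // them all in one cycle with no dependency checking or reordering.
    struct Slot {
        std::uint8_t opcode;   // operation for this functional unit
        std::uint8_t dst;      // destination register
        std::uint8_t src0;     // first source register
        std::uint8_t src1;     // second source register (or immediate selector)
    };

    struct VliwBundle {
        Slot alu;    // integer ALU slot
        Slot fpu;    // floating-point slot
        Slot mem0;   // first load/store slot (e.g. a scratchpad bank access)
        Slot mem1;   // second load/store slot
    };

    // Issue is then trivially parallel each cycle, conceptually:
    //   issue(bundle.alu); issue(bundle.fpu); issue(bundle.mem0); issue(bundle.mem1);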
And at that basic core level, that sounds pretty simple.
It's no more difficult than, and in a lot of ways a lot easier than, programming computers pre-1985-ish, before the proliferation of caches.
Like you just directly access the memory system,
and it happens that a single one of those cores had more, you know,
main memory just embedded within that core than a lot of the early computers
that probably a lot of people listening to this podcast grew up with
in the early 80s, 90s.
So, but yeah, we expanded.
So the core difference is that just having that direct access to memory drastically simplifies things, and the benefit for us on the software layer is that you have exact deterministic operation: every single instruction you're doing, all the way through memory accesses, takes a predetermined number of cycles, a predetermined number of nanoseconds.
When you're actually generating, when you have your compiler and the compiler is trying to
generate a control flow graph for your program,
it knows exactly when data is going to be where, and that, you know, there's not
the potential that the hardware is actually going to be doing something in secret and there's going to be a stall that needs to be inserted for God knows how many cycles. Right. And when you were
describing that, it kind of made me think, it feels like a similar kind of pushing the complexity, or maybe even you could say pushing the capability, onto the compiler, not just in the instruction and control process, but also in the memory system now. Which is kind of interesting, to lean fully into that kind of model.
Yeah. And it's not like no one was doing that between Itanium and us. There are plenty of VLIW architectures that had great success, primarily in the signal processing world. And so, you know, between, you know, 2000 and
at this point, 2014, 2015, for REX, you know, they were in basically every single cell phone.
If you think about the number of devices there, maybe by 2015, I may just, I think it would be believable that there are more VLIW processor cores out in the world than x86 processor cores, because it would be in every single baseband for your cell phone.
It would be in every DSP that's doing any ADC/DAC work with anything that has a microphone connected to it. So, you know, the
VLIW, what a lot of people don't realize is VLIW won in a very silent way in application segments
where you needed that determinism. Like in most of those DSP applications, it was a requirement
going into the design of that processor and the overall product that those processors would be going into to have some hard guarantees on timing.
And the easiest thing for those designers to do was ensure that the hardware behaved exactly as designed and not to have these elements that were originally designed to make general programmers' lives easier.
The reason that hardware-managed caching took over was back in the late 80s through 90s,
processor architects were getting free transistors every two years.
And so if you just keep getting free transistors and you almost don't know what to do with it,
you're going to start adding complexity to your design in order to make what is thought of as the,
you know, your end customers, the people that actually need a program and utilize these
machines, make their lives easier, allow them to address, you know, a virtual memory space
and not have to understand all of the increasing complexity
that's being introduced. And I think fundamentally, you know, our thesis for REX was we wanted to do much grander programs and applications than what DSPs were restricted to.
And we were targeting, you know, a higher level of performance,
et cetera. And our belief, and I think, you know, we did prove out on this, is you can have a sufficiently advanced compiler to be able to do that sort of scheduling, as long as you make those
guarantees, and if you're willing to actually spend the time with compilations. Like, our compiler wasn't fast. It was developed off of LLVM. So I think the big advancements that only really came in the early 2010s was sufficiently advanced open source compilers were there. So the LLVM project was the biggest advancement to actually allow
architecture independence and Apple's investment of actually making that open is
one of the best saving graces for the entire industry in the past decade. I don't think we can thank Apple enough and Chris Lattner and everyone involved in that project for not only
doing a good job and making it possible for Apple to do the portability that they wanted
for Mac going from PowerPC to x86,
and then they wanted to also enable those developers to develop on an x86 Mac
but for an ARM iPhone,
and then eventually them now moving fully to Apple Silicon based on ARM.
They could have kept all of that closed and to themselves,
but they realized that that was also a really big problem for Google and Microsoft,
and actually making that a whole community was one of the developments of like the past 10, 15 years,
in my opinion.
But, you know, it directly enabled us at REX to be able to start off with the really advanced, you know, SSA form, the static single assignment intermediate representation of LLVM IR,
and then be able to do our specific optimizations and that control graph work
to be able to emit, you know, efficient performance assembly for the architecture.
And even though that compilation time took longer than if we just went with a RISC and had all that complexity in the hardware, compared to the late 80s or early 90s or 2000 with Itanium, just base computers, just the desktop computers that we're doing compilation jobs on,
were so much faster.
So I think VLIW, to actually have VLIW be broadly successful
and usable for larger applications than what they,
in the embedded spaces where they were successful,
really needed you to be willing to spend
a lot more horsepower on that compilation stage
and just the natural evolution of computing performance
enabled that to actually be practical.
Absolutely.
You talked about last time when we chatted a few months ago
about the kind of like software side of things
sometimes being the hardest part of developing new hardware. So you mentioned there, you know, that y'all leveraged LLVM, which was obviously hugely advantageous. Other than that, and obviously there's porting of operating systems and things like that, what was kind of top of mind for y'all when tackling things other than the compiler that you thought were necessary to enable the NEO architecture to get adoption?
Yeah.
So our real focus on the compiler side was getting base LLVM and Clang working.
So Clang is the C and C++ front end for LLVM.
At a high level, LLVM is kind of broken into three pieces.
There's LLVM front ends for basically any high level language you can think of.
Those front ends then actually output this LLVM IR, the intermediate representation, which is a static single assignment representation of the program.
But basically there's no mapping to actual registers or anything that is hardware-specific at all.
But you have a control flow graph, and you can get a lot of insight and do a lot of optimizations just on that SSA form.
And then finally, there's the backend that takes in that LLVM IR and actually generates assembly and machine code for the target architecture.
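For anyone who hasn't touched LLVM, here is a minimal, generic sketch of that middle layer using the LLVM C++ API; it is not REX's toolchain code. It builds a trivial add function and prints its IR, which is in SSA form and knows nothing about any target's registers; a backend like the one REX wrote is what lowers that IR to real machine code.

    // Minimal sketch of building target-independent LLVM IR with the C++ API.
    // Build roughly like:
    //   clang++ ir_demo.cpp $(llvm-config --cxxflags --ldflags --libs core)
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    int main() {
        llvm::LLVMContext ctx;
        llvm::Module mod("demo", ctx);
        llvm::IRBuilder<> b(ctx);

        // Equivalent of: int add(int x, int y) { return x + y; }
        auto* fnTy = llvm::FunctionType::get(
            b.getInt32Ty(), {b.getInt32Ty(), b.getInt32Ty()}, /*isVarArg=*/false);
        auto* fn = llvm::Function::Create(
            fnTy, llvm::Function::ExternalLinkage, "add", &mod);
        auto* entry = llvm::BasicBlock::Create(ctx, "entry", fn);
        b.SetInsertPoint(entry);
        llvm::Value* sum = b.CreateAdd(fn->getArg(0), fn->getArg(1), "sum");
        b.CreateRet(sum);

        // The printed IR is SSA and target-independent; an architecture
        // backend (x86, ARM, Hexagon, NEO, ...) lowers it from here.
        mod.print(llvm::outs(), nullptr);
        return 0;
    }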
So with LLVM, we were standing on the shoulders of giants there for those front ends and a lot of the IR optimizations.
But we had to build a custom backend. Thankfully, there already was a pretty good VLIW backend that Qualcomm
released for their Hexagon DSP. And if I remember correctly, it wasn't actually mainline LLVM yet at
that time. They had their own branch. Then I would say we more learned from that in developing our own backend than like,
there are plenty of differences with, with hexagon, but yeah,
that, that was the biggest, biggest lift, you know,
but even once you have a compiler, even if you've got a C front end,
that's kind of useless if you don't have like a lib C that is, you know,
actually targeted for your architecture so
You know, my co-founder, Paul Sebexen, was really the software brains behind REX, doing all of that work basically single-handedly. I also didn't really talk about the founding of REX or anything, but, you know, I'm very, very proud of what we did with, you know, basically four employees, going from, you know, raising our seed round in 2015 to taping out in less than a year from raising money, basically six months from actually getting started on
real RTL that ended up in the final thing. So having all of that, and even just getting to
that early level of, you know, a C compiler that can actually take, you know, basic C
was a really great starting point. But on the software side, like, our plans were not, in that reasonable time frame, to have an operating system running on there, but to actually have a good CUDA... or sorry, not CUDA, ah, thinking of modern things now, but to have a good BLAS library,
so the basic linear algebra subroutines, be able to have those optimized for our design.
Like the first benchmarks, like what we showed at that Stanford talk, were the DGEMM kernel and FFT.
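For context, DGEMM is the double-precision general matrix multiply routine from BLAS. A deliberately naive reference version, with no blocking or vectorization and nothing like the tuned kernel you would actually write for a scratchpad machine, is just the three loops below.

    #include <vector>

    // Naive reference DGEMM: C = alpha * A * B + beta * C, row-major,
    // A is MxK, B is KxN, C is MxN. Real BLAS kernels tile for the memory
    // hierarchy (or for scratchpad banks); this shows only the math.
    void dgemm_naive(int M, int N, int K, double alpha,
                     const std::vector<double>& A,
                     const std::vector<double>& B,
                     double beta, std::vector<double>& C) {
        for (int i = 0; i < M; ++i) {
            for (int j = 0; j < N; ++j) {
                double acc = 0.0;
                for (int k = 0; k < K; ++k)
                    acc += A[i * K + k] * B[k * N + j];
                C[i * N + j] = alpha * acc + beta * C[i * N + j];
            }
        }
    }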
So getting those to actually compile and run through was an accomplishment, but obviously the industry as a whole has much higher demands of supporting much larger packages and libraries than what we could tackle with our team size. With the kind of like accelerated timeline you had because of funding, but then also,
you know, the time it takes to get a chip taped out and get it fabricated and that sort of thing,
you obviously had to be working on software in parallel with the hardware. You talked a little
bit about this in that Stanford talk as well. But what kind of strategies did y'all use? Did
y'all have simulators that y'all built
for the architecture? What was kind of your process there as a team? Yeah, in large part,
because we didn't have much time just from a like base burn rate situation. And because we didn't,
we only raised $2 million, which sounded like a lot when I was 18. And, you know, but yeah, it doesn't last long.
And, you know, we couldn't afford any fancy simulation or emulation tools. And so our
software development and all of the verification that we did on the design, basically,
we started, you know, RTL and hired our head of engineering in November
of 2015. Between then and when we taped out in June of 2016, was all done with Verilator,
which is an open source project. I should clarify that. was, you know, waveform simulation done just like for individual
blocks that was done just with incisive, the cadence simulation tool.
But anything that was like a full core and greater was done using Verilator, which a lot of people told us was very crazy,
you know, using an open source tool to be the thing that we're trusting to
actually tape out a test chip on. But, you know, we did have to make a lot of modifications to increase performance with Verilator to get to the level we wanted.
You know, we started off with very crazy ambition of being able to do a full core level synthesis.
So going from RTL to having a netlist, you know, every single day.
And we did get there basically by January of 2016.
So, like, we were doing very rapid iterations on our cores and, you know, at least weekly having full top level synthesized and having, you know, our golden model was that RTL that was being synthesized and taking that exact RTL, running it through Verilator to generate
that C++ simulator of that RTL and be able to actually then use Verilator to generate waveforms
to compare with waveforms of individual blocks. And so that whole development flow was
completely foreign to every single person we talked to in the industry.
But if we didn't do that, there was no way we were going to be able to build a chip on
that schedule.
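For anyone unfamiliar with the flow being described, Verilator compiles your Verilog into a C++ model that you drive from an ordinary program. A minimal harness looks roughly like the sketch below, where Vtop and the clk/rst port names are placeholders for whatever your top-level module defines, not REX's actual design.

    // Minimal Verilator harness sketch. "Vtop" is the class Verilator generates
    // from a top-level module named "top", e.g. built with:
    //   verilator --cc --trace top.v --exe harness.cpp && make -C obj_dir -f Vtop.mk
    // The clk/rst port names below are placeholders.
    #include <verilated.h>
    #include <verilated_vcd_c.h>
    #include "Vtop.h"

    int main(int argc, char** argv) {
        Verilated::commandArgs(argc, argv);
        Verilated::traceEverOn(true);        // enable waveform tracing support

        Vtop dut;
        VerilatedVcdC trace;
        dut.trace(&trace, 99);               // record 99 levels of hierarchy
        trace.open("dump.vcd");

        vluint64_t t = 0;
        dut.rst = 1;                         // hold reset for a few cycles
        for (int cycle = 0; cycle < 1000; ++cycle) {
            if (cycle > 4) dut.rst = 0;
            dut.clk = 0; dut.eval(); trace.dump(t++);
            dut.clk = 1; dut.eval(); trace.dump(t++);
        }

        trace.close();
        dut.final();
        return 0;
    }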
And yeah, it's funny now having spent the past six-ish months working on some new hardware,
haven't been able to replicate that level of...
What we're doing now is a bit more complex and everything,
but it's crazy to think back that we got that done purely out of necessity,
and it worked without major problems.
Absolutely. Yeah, I've used Verilator before myself, and it's excellent. And it's another thing that's improved a lot in the last maybe decade; a lot of the problems you'd have had back in 2015 have been fixed. I'd imagine so. I'd imagine so. One of the things that, so
I've mostly worked on FPGAs, and, you know, it's a similar process of going through synthesis and netlist generation and that sort of thing, but then you, you know, eventually generate your bitstream to load onto the FPGA. And
one of the things that's been really nice, um, you know, being an individual who's doing much less than y'all were as a four person team, but is the open source tooling around
synthesis and that sort of thing, which has really in the last decade, maybe even five years, you
could say, you know, grown up quite a lot. But everyone I talked to who works hands on in the
industry still indicates that there's like a pretty significant
gap between what's available open source and the proprietary tools. Yeah. And yeah, I have played around with the open source synthesis tools, and I guess I feel like I'm a spoiled brat in saying that, yeah, I just can't. That's it. Verilator, I think, still does the specific things it does really well,
which is enabling you to have very, very fast simulation. Like we were doing full-chip simulation, generated from the RTL, at like one and a quarter megahertz. And this was simulating the full 16-core chip on, you know,
2015 era, you know, low-cost Xeons.
So, like, you know, our Incisive simulations were running in the hundreds of kilohertz. No, wait, sorry, way less than that. Yeah, I think it was somewhere in the order of 10 to 50 kilohertz.
So being able to actually do,
and I think that was like single core.
Like we couldn't even,
I don't think we actually did anything larger
than a single core with Incisive. So yeah, Verilator later enabled us to do, you know, constrained random test generation and actually do, you know, nightly regressions and everything; I'm sure we could have bought a very expensive tool from Cadence or someone else to do that.
But I'm pretty sure we actually ended up with something that worked better for us, just with that free open source tooling.
So, and then like when it comes to Yosys and everything today, yeah, I think in large part it's because there's not really a practical way to use all open source tooling to get to building anything close to a leading edge chip.
I know there's the open, I'm forgetting the name of it, the Google sponsored project for doing shuttle runs.
Right, OpenMPW. Yeah. And so
like, I think that's great for, for students and I'm glad that something like that exists,
but I guess I've gotten spoiled there where, you know, I'm never going to make a chip at, you know... I already made a chip at 28 nanometer. I need to be going smaller and smaller for the next tape out. Right. So, but yeah.
Oh, and finally, just on the FPGA note, we actually didn't start doing FPGA simulation, you know, testing of our chip, until after we taped out. We were really so focused on getting the chip working. We actually did buy a pretty high-end Xilinx UltraScale FPGA to do simulation with.
And we tried doing some things before tape out,
and it was taking too much time,
and we weren't making progress with it.
So we just said, hold off on it.
And we didn't actually start using the FPGA until we were doing some physical testing of connecting to the development boards that we were designing, before we got the chips back. We wanted to test some of, like, okay, can we have
this on the FPGA and actually connect the microcontroller developer board that was going to be put onto our chip. Can we take that evaluation kit we got from STMicro and connect that to the FPGA and actually
have that communication work?
And then we found bugs in our design post-tape out on that FPGA, which was sad.
But because we had that time of knowing those bugs existed, we figured out ways to get around them by the time the real chips came back.
Right. And, obviously, you know, the company kind of wound down a little bit after y'all got the chips fabricated. I do have to ask: where are the chips today? Do you still have them on hand? And, you know, what are the interesting use cases that they've been used for so far?
They make nice display pieces, but they're not very good heat generators. On some cold nights I'm like, I have 181 of these chips, you know, in my closet; it would be nice, but we made it too power efficient, so it just can't be a heater.
Yeah.
The long story short on that is
2017, I think we officially ran out of money
two or three months after the Stanford talk.
So April, May of 2017.
A lot of reasons for that, which we could go into.
But yeah, I think the key thing is, what can the chips do? We had, you know, amazing floating point performance and efficiency for, you know, what we had built, this little dinky 16-core test chip. You know, the measured, validated double precision floating point performance per watt on 28 nanometer was better; it was basically equivalent to an NVIDIA A100 on, what, TSMC 7.
So, like, we were able to demonstrate that worked.
I would also say, even though the software was really early,
it had the components there to show that, you know, with more people,
more time, the software stack can function.
The core problem with the test chip was we, in the early fundraising we did,
and kind of our belief primarily as technologists when starting the company,
was that we needed to build a chip.
That was what we had to do to show potential customers, investors, et cetera, that, you know, we're serious. And, you know, a large part of that is also, we were both really young, you know,
18, my co-founder was 20. And so we kind of had blinders on of, we just need to get this chip.
And once we have the chip, that will be what unlocks us to get to the next step. And,
well, I should say chip plus the basic software tools around there.
And then, you know, all of these investors that to our faces had told us that, you know,
they didn't believe we could actually make a chip when they decided not to invest in
us.
Then they would come around once we demonstrated we could do that.
And it wasn't until, you know, after we had the chip, and we were very proud in demonstrating it, that it became very clear that, oh, this thing we built greatly demonstrates the technical potential, but it doesn't actually demonstrate more than that. It was a little, you know, 12 square millimeter test chip, and it was never designed to actually do the thing that the 100 square millimeter version would, you know, actually be useful for. So, right, that focus on just getting to have silicon and proving that would have probably actually worked out fine if we
had raised three million or five million, if we had some more buffer,
because we were running on fumes by February of 2017
when giving that talk.
Basically, by the time when we started talking
to investors and customers,
and they realized our cash situation,
everything turned into
acquisition conversations. Right. So it still worked out in some ways. But the other, you know, big problem was, you know, we gave up a very, very large amount of the company in our seed round.
So for that $2 million initial investment, it was on a $3 million pre-money valuation.
So we gave up 40% of the company at the very beginning. And we actually did have two different investment offers that would have recapped the company.
But it turns out
investors that have the ability to block, you know, new investment coming in and such don't like having themselves diluted, you know, a large amount.
So yeah, I think
we kind of almost like signed our death certificate from the very beginning where we didn't raise enough to ensure that we could get the chip and still have enough viability that we could have more extended conversations and make that real proof point with the technology demonstrator that we had.
And then secondly, we just gave up way too much of the company to make an appealing investment to anyone else. When it came to the acquisition talks, folks looked at the cap table, saw that we owned such a small part of our own company, and asked, why would they want to pay the investors and have them get all of the benefits? After that deal is over, the primary reason they do an acqui-hire is to get the people and some valuable technology; they frankly don't care about the investors and would like them to get as little as possible. So that was a very tricky situation.
Yeah, yeah, absolutely. Well, so you went and did a few different things after that. We're going to fast forward through those. Maybe in a future episode, we'll have you back to talk about some of those, but you recently started a new company and just kind of came out of stealth, as we mentioned at the beginning of the show. Do you want to talk a little bit about what y'all are doing at Positron, and maybe also some of your recent announcements?
Yeah. So back in April of 2023, I started a new company, Positron AI, developing new hardware for accelerating machine learning models in the general sense.
But like 95% of our focus is on transformers and specifically like the large language model and large multimodal network types.
And so we haven't announced too much.
We did exhibit at the NeurIPS conference last month and sort of had a soft launch,
but we're going to be making some more real product and shipping announcements this spring. But at a high level, what we did show was a working demo of our PCIe card-based accelerators having about a 5x performance per dollar advantage over NVIDIA H100 for Llama inference.
And some of the upcoming announcements are going to be significantly greater than that
for some of the newer, exciting model types
that are out there.
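As a rough illustration of what a performance-per-dollar comparison like that measures, here is a minimal sketch; the throughput and cost numbers below are hypothetical placeholders, not figures from the episode.

    # Performance per dollar for LLM inference: sustained tokens/second
    # divided by system cost. All numbers are hypothetical placeholders,
    # purely to show the shape of the comparison.
    def perf_per_dollar(tokens_per_second: float, system_cost_usd: float) -> float:
        return tokens_per_second / system_cost_usd

    baseline = perf_per_dollar(tokens_per_second=1_000, system_cost_usd=300_000)   # hypothetical GPU server
    candidate = perf_per_dollar(tokens_per_second=2_500, system_cost_usd=150_000)  # hypothetical accelerator appliance
    print(f"relative advantage: {candidate / baseline:.1f}x")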
So yeah, our core belief is that
what are considered large language models today
are going to be, you know, puny
in the not so distant future. And we're making a very strong bet on having, you know, very large
amounts of memory capacity and having the right balance of memory bandwidth to compute. And like
the core thesis of, you know, why we started Positron was that the sort of first generation
of AI accelerator chips that, you
know, all the companies that were founded basically 2016, 17 till, you know, 2020 at
least, and even a number after that, really based their designs on the ratio of compute
to memory that existed for convolutional neural networks and other neural network architectures
where you could throw as many flops as you want and it improves your actual realized performance.
And fundamentally, what's different with transformer models, especially on the inference side, is that if you want good throughput and latency for the model, a vastly different ratio of memory bandwidth and memory capacity is required.
And so we're targeting,
like with this first generation product,
having an order of magnitude greater memory capacity
than what is possible on NVIDIA GPUs,
be it today or in their upcoming product releases, and having a much higher
ratio of memory bandwidth to the actual flops.
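To make the bandwidth-versus-flops point concrete, here is a minimal back-of-the-envelope sketch of batch-1 autoregressive decoding, where each generated token has to stream essentially all of the weights from memory; the model size and bandwidth figures are illustrative assumptions, not Positron specs.

    # Rough ceiling on batch-1 decode speed for a dense transformer:
    # every generated token reads ~all weights once, so throughput is
    # bounded by memory bandwidth / weight bytes, regardless of spare FLOPs.
    params = 70e9          # assumed parameter count
    bytes_per_param = 2    # FP16/BF16 weights
    weight_bytes = params * bytes_per_param          # ~140 GB of weights
    mem_bandwidth = 3.35e12                          # assumed ~3.35 TB/s memory bandwidth
    max_tokens_per_s = mem_bandwidth / weight_bytes  # upper bound at batch 1
    print(f"bandwidth-bound ceiling: about {max_tokens_per_s:.0f} tokens/s")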
What does the software integration look like for a processor that's specifically focused on accelerating inference? Obviously, NVIDIA has CUDA. There's a couple of others; I think it's AMD that has ROCm. There's a couple of different models there. Does y'all's architecture fit into some of those software stacks, or are y'all developing your own?
So the big thing, and what got a lot of interest and people looking at us very differently from, I would say, all of the other people trying to compete with NVIDIA, is the fact that our software strategy is that we actually don't have a compiler. Or rather, there's no compiler that any user of our stuff would ever know about or care about. And so really, the operating model for users of this first generation product is that we ingest the trained model file for these transformer models directly. So .pt is the PyTorch output file that you get when you do training, primarily on NVIDIA GPUs, and there's also safetensors and a couple of other formats. But taking that trained model file, that is directly what we ingest and put on the device.
And so there is no porting. There's none of the story that AMD and Intel and many others have tried to tell, with varying levels of success at actually doing anything close to it in reality: oh, well, we support PyTorch, or we support TensorFlow, or we have a process where you can go through 16 steps to convert your PyTorch model to ONNX. And, oh, well, actually, we don't have that op set supported. And ONNX is always six to twelve months out of date on new features being supported, or they just decided they're never going to support that particular op. Rather than dealing with all of that, which has been the bane of the existence of anyone trying to move off of NVIDIA, we're able to take your config.json for your Hugging Face Transformers compatible model and that model file directly, and load them onto the device.
So there's zero compatibility process. Or rather, our compatibility process is: if you trained it, presumably on NVIDIA, but really if you trained it on anything, and it's a model architecture supported by Hugging Face Transformers, we're able to ingest it directly.
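For context on the artifacts being described, here is a minimal sketch of what a Hugging Face Transformers checkpoint looks like on the user side, using standard Hugging Face and safetensors APIs; the checkpoint path is a placeholder, and the final hand-off call to the accelerator is purely hypothetical, not Positron's actual API.

    # A Hugging Face Transformers checkpoint is just a config.json describing the
    # architecture plus one or more weight files (.safetensors or PyTorch .pt/.bin).
    from transformers import AutoConfig
    from safetensors.torch import load_file

    repo_dir = "./my-llama-checkpoint"                    # hypothetical local checkpoint
    config = AutoConfig.from_pretrained(repo_dir)         # parses config.json
    weights = load_file(f"{repo_dir}/model.safetensors")  # dict of tensor name -> tensor

    print(config.model_type, config.num_hidden_layers, config.hidden_size)
    # A vendor ingesting this directly would map those tensors onto its hardware
    # here; the call below is illustrative only, not a real API.
    # accelerator.load(config, weights)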
Wow, that's definitely a huge benefit there. I could also imagine, not only is that really nice from an integration perspective, but if you are, you know, a startup, a smaller company, removing the need to build all of that software support is also nice from an organizational perspective.
Yeah. Yeah, it's drastically simpler for the customers and users,
which is the number one thing, but anything that can also make our lives easier. And,
you know, we're very clear in our stated mission and goals. If you're using, say, some 3D U-Net type model, yeah, technically our hardware can support it, technically we could. And actually, in one case we've gotten a very simple version of that running. But at least in the foreseeable future, we're not going to be offering that to customers as an option.
We want this to be the simplest process for us to be able to get a product
out to market and people actually having a viable alternative to NVIDIA.
And, you know, I'm knocking on wood a little bit. We're approaching one year from the company being founded, and it was our goal from the beginning that we would actually have real product in customers' hands within a year, and it's looking like we will. The big lesson from REX and other experiences that I've had since then is identifying what it is that the customer is actually craving, what they're actually making their decision on, narrowly focusing on that, and keeping your hand on the pulse of the customer throughout the entire development process. Be it myself with REX or otherwise, I think the failure point for a lot of companies, new processors or not, probably one of the main reasons companies fail, is that they're too focused internally on building something that they themselves want rather than really deeply understanding and being obsessed with making something that the customer wants. There is a line, and you have to use a lot of judgment, and there's a process to it, but, you know, the customer is not always right; they're just usually more right than you are about what is going to cause them to part with their hard-earned dollars for your thing.
So you obviously don't want to just build a faster horse when you could have made a car.
But there's a balance which I'm still trying to find, and I think all entrepreneurs need to work on that. But it's definitely a really hard problem and an exciting one.
It's the fundamentals of how do you actually build a successful business.
Right.
On that set of customers that are interested in a processor that's going to accelerate inference, what are those key pain points, or the key things you think people are willing to switch for? I mean, some that come to mind based on how you've described your product: the interface feels like it's really important, right? Having a really simple, maybe even you might say high-level, interface to your architecture could go a long way. I know availability is also just a big thing today. In terms of other aspects of y'all's product, what are things that you've heard from customers and then tried to incorporate as key tenets?
Yeah.
I think, fundamentally, for the vast majority of purchasing decisions, if you assume that it's functional for the application they care about and that they can use it in whatever method they desire, then the next step, where things boil down to, is performance per dollar. So it's: is this actually going to be a net savings?
And when going in against an incumbent, it can't be something that's 10% or 20% better. Ideally, everyone says, it needs to be an order of magnitude. So somewhere between those two extremes is a great point. And the nice thing about this particular market and how we're
trying to address this is that what we're selling is actually not the chips; we're selling an appliance, a full system. And we're really treating that as a black box for LLM inference. So you can either use a publicly available model, or, you know,
we're aiming to work with partners that have their own closed source
proprietary models that would be able to be run on our hardware and accessible to others,
or you can bring your own model. But from that point on, it's no different than the interaction
method with any NVIDIA system, where we have an OpenAI-compatible API front end. We've built a load balancing system that is actually hardware agnostic. So requests can hit that load balancer and be distributed to a Positron box, to an NVIDIA box, to an AMD box, as long as that API interface is the same as what those customers are used to. And most of the industry has kind of gravitated to the spec that OpenAI never officially defined as a standard but created for their own services. And if you support that, there's really no change from that user's perspective.
They're simply sending tokens in as a prompt and they're getting tokens back as a response.
And so that token-in, token-out model is really one of the biggest enablers to making this easy from an actual usage and adoption standpoint.
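As an illustration of that token-in, token-out interface, here is a minimal sketch of a request against an OpenAI-compatible chat completions endpoint; the base URL and model name are hypothetical placeholders, and nothing here is Positron-specific.

    # Any OpenAI-compatible server exposes the same /v1/chat/completions shape,
    # so client code is identical whichever hardware sits behind the load balancer.
    import requests

    BASE_URL = "http://inference.example.com/v1"   # hypothetical endpoint behind a load balancer
    payload = {
        "model": "llama-2-70b-chat",               # hypothetical deployed model name
        "messages": [{"role": "user", "content": "Summarize VLIW in one sentence."}],
        "max_tokens": 64,
    }
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
    print(resp.json()["choices"][0]["message"]["content"])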
Once you get beyond that point of,
okay, is the model that I want to actually run
capable of being run on this
hardware?
Absolutely.
Absolutely.
Obviously, things are moving at a pretty rapid pace in this area of the software and hardware industry. Are there any kind of concerns or future-proofing y'all do around building this hardware that adheres to that interface, you know, to ensure that your product will be viable for an extended period of time?
Yeah, a key thing is that even though we call what we've built principally a transformer accelerator, really it's a linear algebra accelerator that has a very clever method and connections, both for the memory system and for scalable networking of many of these things put together.
So from the perspective of new model advancements, just in the past few months there have been the first, I would say, real viable competitors to transformers, in the case of state space models. There's Hyena and there's Mamba, two things that caught a lot of attention last month at NeurIPS. And while it's still, I think, too early to tell if they are going to actually scale out to wider usage in their current state, the fact that we're fundamentally a linear algebra accelerator means the hardware maps just as well to those different ways of solving similar problems as it does to multi-headed attention and transformers.
So we feel really good about having that flexibility
of changing as model architectures do.
And I think the biggest advantage we have, the thing that I don't think the other hardware providers are really equipped for right now, is being able to scale to larger model sizes, and not necessarily dense models: the whole emergence of mixture of experts with GPT-4, Bard, and now the Gemini version of Bard, and now Mixtral in the open source space.
Having many experts, with only a small number of them used for any given token, actually increases the total amount of memory you need to hold the entire model. And so that's perfect for this architecture we've designed. Current things are capping out at around eight experts; I'd be happy if things moved to 32, 64, or more experts, because we actually have the memory that can hold that, and we can do it affordably. I think that even the next generations of all the new AI accelerators
and things that are heavily embracing HBM as a memory technology are sort of going in the wrong
direction. Yeah, HBM is great from a bandwidth perspective, it's in the name, but you're going to be paying at least four times more per gigabyte. And you have hard caps on how much memory can actually be supported per HBM stack and how many stacks you can actually put onto a CoWoS package.
And so we're embracing commodity memory for our scaling story, which is what's really giving us that massive
amount of capacity and having the right balance of bandwidth to compute.
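To make the mixture-of-experts memory point concrete, here is a minimal sketch comparing total resident weights to the weights actually touched per token; the parameter counts and expert sizes are hypothetical placeholders, not any specific model.

    # Mixture of experts: every expert's weights must be resident in memory,
    # but only top_k experts are computed per token, so memory capacity grows
    # much faster than per-token compute as you add experts.
    bytes_per_param = 2          # FP16/BF16
    shared_params = 10e9         # hypothetical attention + embedding parameters
    expert_params = 5e9          # hypothetical parameters per expert
    top_k = 2                    # experts activated per token

    for num_experts in (8, 32, 64):
        total = shared_params + num_experts * expert_params
        active = shared_params + top_k * expert_params
        print(f"{num_experts:3d} experts: "
              f"{total * bytes_per_param / 1e9:6.0f} GB resident, "
              f"{active * bytes_per_param / 1e9:4.0f} GB touched per token")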
Absolutely. That makes a ton of sense. Well, Thomas, I know we're getting close to the end
of the time we allocated here, and I don't want to take up too much of your time. But I do hope
to have you on again in the future
as you all roll out some new products.
What are some ways that myself and others
can keep up with you and Positron
as you all continue to roll stuff out?
Yeah.
So I would say the best ways right now are on Twitter, or X. There's my personal trsohmers account and the Positron account, I think it's underscore AI. But you can find that from my Twitter, my X account, at least. And then our website, positron.ai.
Awesome. Well, we'll make sure to link those in the show notes as well.
But it sounds like there's a lot of really exciting stuff coming up. So definitely we'll keep an eye on that. And yeah, thank you again for joining for one of the early episodes of this show. I'm very appreciative of the folks who are willing to jump on for an interview before they've heard any of the other ones. But I'm sure folks will get a lot out of this one. So really appreciate
you stopping by. All right. Well, thank you, Dan. Absolutely.