The Changelog: Software Development, Open Source - Programming with LLMs (Interview)
Episode Date: February 19, 2025
For the past year, David Crawshaw has intentionally sought ways to use LLMs while programming, in order to learn about them. He now regularly uses LLMs while working and considers their benefits a net positive on his productivity. David wrote down his experience, which we found both practical and insightful. Hopefully you will too!
Transcript
I'm Jared and you're listening to the changelog.
Where each and every week we have conversations with the hackers, the leaders, and the innovators
of the software world.
We pick their brains, we learn from their mistakes, we get inspired by their accomplishments,
and we have a lot of fun along the way.
For the past year, David Crawshaw has intentionally sought ways to use LLMs while programming
in order to learn about them.
He now regularly uses LLMs while working and considers their benefits a net positive on his
productivity.
David wrote down his experience, which we found both practical and insightful. Hopefully you will too.
But first, a quick mention of our partners at fly.io,
the public cloud built for developers who ship.
Check them out at fly.io.
Okay, David Crawshaw on the changelog.
Let's do it.
Well, friends, before the show, I'm here with my good friend David Hsu over at Retool. Now David, I've known about Retool for a very long time.
You've been working with us for many many years and speaking of many many years Brex
is one of your oldest customers.
You've been in business almost seven years.
I think they've been a customer for almost all those seven years to my knowledge,
but share the story.
What do you do for Brex?
How does Brex leverage Retool?
And why have they stayed with you all these years?
So what's really interesting about Brex
is that they are an extremely operationally heavy company.
And so for them, the quality of the internal tools
is so important because you can imagine
they have to deal with fraud, they have to deal with underwriting,
they have to deal with so many problems, basically.
They have a giant team internally basically just using internal tools day in and day out
and so they have a very high bar for internal tools.
And when they first started, we were in the same YC batch actually, we were both at Winter
17 and they were, yeah, I think maybe customer number five or something like that for us.
I think DoorDash was a little bit before them, but they were pretty early.
And the problem they had was they had so many internal tools they needed to go and build,
but not enough time or engineers to go build all of them.
And even if they did have the time or engineers, they wanted their engineers focused on building
external facing software because that is what would drive the business forward.
Brex mobile app, for example, is awesome.
The Brex website, for example, is awesome, the Brex
expense flow, all really great external facing software.
So they wanted their engineers focused on that as opposed to building internal crud
UIs.
And so that's why they came to us and it was honestly a wonderful partnership.
It has been for seven, eight years now.
Today I think Brex has probably around a thousand Retool apps they use in production, I want to say every week, which is awesome. And their
whole business effectively runs now on Retool. And we are so, so privileged to be a part
of their journey. And to me, I think what's really cool about all this is that we've managed
to allow them to move so fast. So whether it's launching new product lines, whether
it's responding to customers faster, whatever it is, if they need an
app for that, they can get an app for it in a day, which is a
lot better than you know, six months or a year, for example,
having schlep through spreadsheets, etc. So I'm really,
really proud of our partnership with Brex.
Okay, Retool is the best way to build, maintain, and deploy
internal software. Seamlessly connect to databases, build with
elegant components, and customize with code.
Accelerate mundane tasks and free up time for the work that really matters for you and
your team.
Learn more at Retool.com.
Start for free.
Book a demo.
Again, Retool.com. We are here with David Crawshaw, CTO and co-founder of Tailscale.
David, welcome to the changelog.
Yeah. Oh, I'm not actually the CTO anymore.
Oh, no. Your LinkedIn is outdated.
Oh, does it still say that?
I thought I had updated it.
Are you masquerading, David?
That's real time LinkedIn updates here.
We can do it.
Let me check. LinkedIn updates.
I read it somewhere.
I usually. Me too.
Well, snap, I think show starts.
I think my LinkedIn, it might be confusing
because it still lists that I was the CTO.
I stepped back from the CTO role last year.
Okay, so what are you doing now?
I am spending my time exploring sort of new product spaces,
things that can be done.
So both inside and outside of Tailscale.
So.
Very cool.
Most of my work inside Tailscale is around helping
on the sort of customer side, you
know, talking to users or potential users about how it can be useful.
And then because I have such an interest in sort of the world of large language models,
I've been exploring that.
But that is not a particularly good fit for the Tailscale product. You know, I spent quite a long time looking for ways to use this
technology inside Tailscale, and like, it doesn't really fit. And I actually think
that's a good thing. You know, it's really nice to find clear lines like that
when you find something where it's not particularly useful. And I wouldn't want
to try and you know, a lot of companies are attempting to make things work,
even if they don't quite make sense.
And I think it's very sensible of Tailscale
to not go in that direction.
Do you mean like deploying LLMs
inside of the Tailscale product,
or how do you mean it wouldn't fit?
Well, yeah, so what would Tailscale do with LLMs
is the question I was asking from a Tailscale perspective.
I think Tailscale is extremely useful for running LLMs
yourself, as a network backplane.
Right.
In particular because of the sort of surprising nature
of the network traffic associated with LLMs.
On the inference side, so you can kind of think about working with models from both
a training and an inference side. These are sort of two sides of the coin here. And training is very,
very data heavy and is usually done on extremely high bandwidth, low latency networks,
InfiniBand style setups on clusters of machines
in a single room.
Or if they spread beyond the room,
the next room is literally in the building next door.
The inference side looks very different.
There's very little network traffic involved
in doing inference on models in terms of bandwidth.
The layout of the network is surprisingly messy.
This is because the nature of finding GPUs is tricky even still today, despite the fact
that this has been a thing for years now.
If you have...
Very tricky.
Yeah.
I feel I should try and explain it just because it's always worth trying to
explain things, but I'm sure you all know this, which is that if you're running a service
on a cloud provider that you chose years ago for very good reasons, all the cloud providers
are very good at fundamental services, but they all have some subset of
GPUs, and they have them available in some places and not others.
And it's never quite what you're looking for.
And if you are deciding to run your own model and do inference on it, you might find your
GPU is in a region across the country, or it's on a cloud provider that's different
than the one you're using.
Or your cloud provider can do it, but it's twice the price of another one you can get.
And this leads to people ending up far more
in sort of multi-cloud environments
than they do in sort of traditional software.
And so, Tailscale actually is very useful there.
So for users, I think it's a great fit.
But what does the product actually need
as like new features to support that?
And the answer is, it actually is really great
as it is today for that.
There's no specific AI angle that you can add to the product
and immediately make it more useful.
Yeah, I think that's right.
I mean, there are, we came up with some proposals,
but they're not exciting.
Like they would be very much, we'd be doing it
because corporate at headquarters told us
to find an angle for AI or something like that.
And like we as a startup have the option
of just not doing that.
And so we didn't.
Well, we should probably just claim
that Tailscale is a past and, of course,
hopefully a future sponsor of the Changelog.
And that Adam's a huge Tailscale user and brings it up often.
But this is not a sponsored episode. In fact, well, first of all, we don't do sponsored guest appearances
But also, I had no idea that you were a co-founder of Tailscale when I read your blog post.
Me neither. I found it out afterwards. I was like, oh cool.
That's great.
I didn't know we'd be talking about
Tailscale at all when I came here today.
So that we're both basically on the same page.
Yeah, there we go.
I still work there.
I'm just kidding.
Real time update.
I did double check LinkedIn.
You are correct.
It says 2019 to 2024 was CTO,
but you just see co-founder, Tailscale,
and then CTO next to it, and you move on.
And that's probably what Adam and I both did.
Same.
We didn't realize there was an end date
on that particular role.
Yeah, the nomenclature on the metadata usage
on LinkedIn, their UI is like, which date,
which month did you begin, and it says present.
So the assumption was there.
I didn't read your byline on the job role.
Maybe what I'll do is I'll put something new above it
and that'll make it clearer.
But I don't want to mislead anyone.
Fair enough, fair enough.
I also honestly don't check LinkedIn very often.
It's not a big part of my life.
And so it.
We usually check it right before a show
to make sure we get someone's title right,
which is why we're both eating crow right now
for getting it wrong.
But.
Well, to get back into the mood or the groove,
whichever you wanna call it.
Let's get into both.
Well, I'm a Tailscale user, as you know.
I just trimmed some machines
because I've been doing some more home labbing.
So I use Tailscale really only in my home lab.
And thank you so much for this free tier, because I
don't want to give you any money.
Honestly, I'm just kidding with you.
I think you're amazing, but like, I gotta put a hundred machines on my tailnet before I have to pay you
any money. I got 18. There's no way I'm ever gonna pay you money based on your tiers.
Your tiers, not mine. That's totally great.
Which is, that's by design.
That's by design.
And I think, you know, one, thank you,
because it's let me be more network curious
and more home lab curious.
So you, as a corporation,
Tailscale have allowed me and enabled me
and so many others to do just that.
And that's so cool.
And I think that's,
I applaud you all for that choice by design.
Well thank you, that's excellent.
That being said, I mean, at the same time I gotta dig.
And it's not really a dig, it's just really,
Tailscale's kinda boring in the fact that
I don't have to do much to make it work.
You know, I put it in, I tailscale up,
and I'm done.
Okay, you just work.
I never have to worry about you working unless you're not up.
Oh, you're not gonna crash on me, or what are you looking for?
I'm just saying, like, it's pretty boring.
You know, unless I'm doing like serves, or I'm sharing a disk across the network.
I'm not doing that kind of stuff.
But you know, this whole
multi-cloud shared GPU thing is super cool, because you can have a tailnet on top of a different network and share that GPU access,
which I'm assuming is what you meant by that. Just so cool.
Honestly, it is. I mean, I love boring software.
And so for me, the fact that you're having a boring experience is perfect. Yeah, no surprises.
Yeah, it's a product that's designed to enable you to do more things not for you to spend your days having to configure it
It is so smooth. The DevEx on this thing is bar none.
You know, I know my machines, I know where they're at, I know they're up to date.
It's pretty easy to do that kind of stuff.
And as an avid user and a lover of Tailscale, again, not sponsored, just super passionate,
I can't see how an LLM would fit in there, either.
I just can't see how you would work in AI to make the platform better.
I mean, I haven't thought about it deeply besides the, this 20 ish minutes so far
in the conversation, but I mean, give me some time and I might.
Yeah. If you can come up with anything, let me know.
Well, I'm very excited about the idea of it.
But software has to be in some sense, true to itself.
You have to think about its purpose when you're, when you're working on it and
not step too far outside that.
So I similarly wouldn't build a computer game
into Tailscale.
I don't think that would be a particularly,
you know, good fit for a product.
And I feel sort of-
It's like an Easter egg.
As an Easter egg would be great actually.
Like a little quiz game or something
built into the terminal.
You want to have a hundred machines in your tailnet,
you get access to,
you unlock a secret machine name
that's on your tailnet by default.
Or the Pro plan.
Yeah.
There you go.
Right, it can ask questions like
what is the oldest machine on your tailnet?
Something like that.
Oh yeah.
So yeah, that would be a lot of fun actually.
There are some questions I would probably ask the tail net.
Like there are actually some things
I don't know about my tail net
that I could discover via a chat LLM interface.
So I mean, there are some things I can see some value in,
but I mean, does everybody want that or need that?
Maybe, I don't know.
Yeah, I don't know either.
I very much went looking for something
I would use features like that for,
and I didn't come up with anything.
If you do come up with anything,
again, I'd be very happy to hear about it.
Honestly, now that I'm thinking about it in real time,
you know, you have a column whenever you're on your admin
and you're on your machines dashboard, essentially,
you can see last seen or ones that are out of date.
And unless you're savvy,
you probably haven't enabled
Tailscale's ability to auto update.
Maybe you have, maybe you haven't.
I forget which machines I've done it on.
Like every one I install, again,
once I knew that update was there,
I do enable that, but sometimes I forget.
So I might be like, okay,
are there any of my machines
that haven't been seen in a while?
Are there any versions that are out?
Give me a list of the ones that are out of date
that I should probably concern with around security.
Because you're probably not emailing me
about my security concerns,
but my tailnet knows which ones are too far out of date
if I haven't auto updated.
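The check Adam is wishing for could be sketched in a few lines. To be clear, this is purely hypothetical, not a Tailscale feature: the record shape below (`hostname`, `lastSeen`, `updateAvailable`) loosely mirrors fields in Tailscale's public v2 device-list API, but the helper function and sample data are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

def stale_devices(devices, now, max_age_days=30):
    """Flag devices not seen within max_age_days, or with an update pending.

    The record shape is an assumption loosely based on Tailscale's v2
    device-list API; this is an illustration, not an official client.
    """
    cutoff = now - timedelta(days=max_age_days)
    flagged = []
    for d in devices:
        # lastSeen is an ISO 8601 timestamp; normalize the trailing "Z".
        last_seen = datetime.fromisoformat(d["lastSeen"].replace("Z", "+00:00"))
        if last_seen < cutoff or d.get("updateAvailable"):
            flagged.append(d["hostname"])
    return flagged

# Hypothetical sample records for a small tailnet:
sample = [
    {"hostname": "homelab-1", "lastSeen": "2025-01-25T00:00:00Z", "updateAvailable": False},
    {"hostname": "homelab-2", "lastSeen": "2024-06-01T00:00:00Z", "updateAvailable": False},
    {"hostname": "nas",       "lastSeen": "2025-01-28T00:00:00Z", "updateAvailable": True},
]
```

With `now` set to February 1, 2025, this flags `homelab-2` (not seen in months) and `nas` (update pending). An LLM interface over a tailnet would presumably sit on top of exactly this kind of deterministic query.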
That's true.
I think we did actually email customers once
about an out of date version
where we were concerned about security.
I think that has only come up once.
Mostly, keeping Tailscale up to date
is sort of proactive good security practice.
It has fortunately not been a significant source of issues, in part,
you know, due to careful design, you know,
a lot of engineers work very hard to make it that way.
For sure. And you got a lot of amazing engineers there. Yeah.
It's a great team. I guess now I'm thinking about it. I do have some ideas.
Nice. I mean, I think this is the idea I have for Notion as well. I use Notion a lot.
Anything where you have a platform where you can create your own things on top of it, a tailnet...
You know, one tailnet is not the same as the next, even though they operate the same. The way I use mine may not be the way you use yours.
It would be nice to have an interface where I can just ask Tailscale
how to Tailscale, basically.
Like I have an idea, I wanna create a new group
or a whatever, or I can be introduced to new features.
It's discovery.
And you're essentially, by not having your own,
you force people to go into the, you know,
public LLMs essentially, into the ChatGPTs,
into the Claudes, into
the Ollamas, into the DeepSeeks, or whatever you might have out there. And if you can corner
the market and own your own, I think you'd be one better, sure. Cause you know your documentation
better too. You know, it's the deterministic nature of it is maybe non-deterministic, but
you can probably fine tune it a bit more to be more focused on your customer base. I'd probably ask my, uh,
Tailscale LLM more questions if I could.
Yeah. So that's,
there's like an interesting sort of meta question there about LLMs around how
many models there should be in the world from a sort of a consumer perspective in a sense,
because that's almost like, you know, you just, you know, consuming it like where, and
this sounds very similar to like the question of how does search work on websites, which
you could have asked 10 years ago or 20 years ago.
You know, do I use the Wikipedia search or do I go to Google and type in my search and maybe put
the word Wiki at the end to bring the Wikipedia links to the top?
Both of these are valid strategies for searching Wikipedia.
I honestly don't use the Wikipedia search and haven't in a while, so it may be amazing.
But I have as a consumer a general concept that the search systems on individual websites
are not terrific and Google is baseline decent.
And so as long as I'm searching public data, I would generally prefer the Google search.
I guess in a sense that's a less and less true statement every year because the large
chunks of websites are just
not public data anymore. Like you can't search Facebook with Google, right? Or search Instagram
with it. You can't find a TikTok with it or anything like that. And so, um, the existence
of those, uh, I think they sometimes get called walled gardens says that we should have more fine tuned tools like that.
And there's just a lot of similarities there. So should a
startup the size of Tailscale build customized models like that
for its users? I think that is a sort of a big open ended question
around how the model space will evolve.
And I think, you know, in my last year of working with models, fine tuning them, training them, carefully prompting them, I've found
you can do more and more just with carefully structured
prompts and long contexts that you used to have to use
fine tuning to achieve.
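As a rough illustration of what "carefully structured prompts and long contexts" means in practice, here is a minimal sketch. The section headings and the helper name are invented conventions, not a standard; the pattern itself (instructions, then few-shot examples, then retrieved reference material, then the actual input) is a common way to get fine-tuning-like behavior out of a stock model.

```python
def build_prompt(instructions, examples, reference_docs, user_input):
    """Assemble a structured long-context prompt in place of fine-tuning.

    All section headings here are illustrative conventions, not a standard.
    """
    parts = [f"Instructions: {instructions}", ""]
    # Few-shot examples teach the model the expected input/output format.
    for example_in, example_out in examples:
        parts += [f"Input: {example_in}", f"Output: {example_out}", ""]
    # Long context: paste in retrieved documentation instead of training on it.
    if reference_docs:
        parts.append("Reference material:")
        parts += [f"- {doc}" for doc in reference_docs]
        parts.append("")
    parts += [f"Input: {user_input}", "Output:"]
    return "\n".join(parts)
```

The resulting string would then be handed to whatever model you are running; the larger the model's context window, the more reference material you can include before fine-tuning becomes necessary.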
But all of this, my sort of big takeaway is that they are actually extremely hard to turn
into products and to get those details right in a general sense for shipping to users.
They're actually quite easy to get going for yourself.
And I think if anything, more people should explore running models locally and playing
with them because they're a ton of fun and they
can be very productive very quickly. But in much the same
way that it's really easy to write a Python script that you
run yourself on your desktop, versus a Python script you ship
in production for users. LLMs have this huge sort of
complexity gap when it comes to trying to build products for
others. And so I agree that that sort of tooling would be fun and should exist.
I also think where we are today, it's quite hard for a team the size of a startup to ship
that as not part of their core product experience.
What if it enabled so much deeper and greater usage?
Because the one thing you want to do as a startup
or a brand like you are, I would imagine at least
from the outside is a deeper customer is better
than a shallow customer, right?
If I've only got a few machines, well, one, my affinity
and my usage is lower.
So maybe my value is lower. But if I'm deeply entrenched
in it, it's as a result of great documentation,
which you have. But docs, they are good when you have a particular, precise thing and you want to read and understand and discover how a feature works.
And they only go so far, and sometimes are even out of date. Just hypothesizing then, what would be required? One, in terms of a lift, one in engineering power and two,
potentially financial power.
And then two, what is that costing you by lack of more deep users
and, you know, shallow users in comparison?
I think that is exactly the right way to frame the question for a business.
And I don't know the answer to a lot of those questions.
I can talk to some of the more technical costs involved.
What the benefits would be to the company
is extremely open-ended to me.
Like I don't actually,
I can't imagine a way to measure that
based on talking to customers of Tailscale who deploy it,
thinking about the companies where,
and so to go back to something you said earlier about how you use it and you don't pay for
it, I think that's great, because Tailscale has no intention of making money off individual
users.
That's not a major source of revenue for the company.
The company's major source of revenue is corporate deployments.
And there's a blog post by my co-founder Avery
about how the free plan stays free on our website,
which sort of explains this,
that individual users help bring Tailscale to companies
who use it for business purposes
and they fund the company's existence.
So looking at those business deployments,
you do see that Tailscale gets rolled out initially
at companies for some tiny subset of the things
that could be used for.
And it often takes quite a while to roll out for more.
And even if the company has a really good roadmap
and a really good understanding of all of the ways
they could use it, it can take a very long time to solve all of their problems with it.
And that's assuming they have a really good understanding of all of the things it can
do.
And the point you're making, Adam, that people often don't even realize all the great things
you can do with it is true.
And I'm sure a tool that helps people explore what they could do would have some effect
on revenue.
In terms of the sort of the technical side of it and the challenges,
one of the, there is several challenges in the very broad
sense, the biggest challenge with LLMs is just the enormous
amount of what you might call traditional non-model
engineering has to happen out the front of them
to make them work well.
It's surprisingly involved.
I can talk to some things I've been working on
over the last year to give you a sense of that.
Beyond that, the second sort of big technical challenge
is that one of sort of Tailscale's core design principles
is all of the networking is end-to-end encrypted.
And the main thing an LLM needs to give you insight
is a source of data.
And the major source of data would be what is happening on your network, what talks to
what, how does it all work?
And that means that any model telling you how you could change your networking layout
or give you insight into what you could do would need access to data that we as a company
don't have and don't want.
And so we're back to it would have to be a product
you run locally and have complete control over,
which is absolutely, you know, those sorts of,
my favorite sorts of products are that, you know,
I like open source software that I can see the source code
for, compile myself, run locally.
That's how I like all things to be.
But trying to get there with LLMs in the state they are today
is actually, I think, pretty tricky.
I don't think I've seen an actually shipped product that
does that really well for people.
There's one.
There's a developer tool that I hear a lot of good talk
about that I don't.
I'm just trying to live search for it for you.
Nope, that's the wrong one.
That's magic shell history, which also sounds really cool.
I should use that one.
Is that Atuin?
Atuin, yeah.
That one's awesome.
Oh, you've used it?
Oh, great.
I'm a daily user, yeah.
No LLMs involved on that one.
Yeah, I thought that was the LLM.
There's another one that is in the sort of agent space
for developers as they're writing programs and it helps you.
It's like a local Claude effectively.
And it's primarily built around helping you construct prompts really carefully for existing
open models.
And it's come up several times and I'm sorry, it's falling out of my head.
I will look it up later.
I'm sorry.
But I hear very positive things about it. And that's the closest I've seen to sort of a shipped, completely local product
that does that sort of thing. On which models to use, I think given the state
of models that exist today, open models, the major shipped open models are so amazing that it always makes sense to start with one
of those sort of models as a, if nothing else, as a pre-trained base for anything that's
happening.
Building a model from scratch is a very significant undertaking.
And I don't think it's necessary for most tasks.
The available open models are extremely general purpose. And so
at worst, you would be fine tuning from one of those to build a product. If you take one of the
Llamas, or, I mean, there's a lot of talk about DeepSeek, which produces terrific results. It's
a very large model. It'd be very hard to start with it, though I understand there's some very good distilled work coming from it using other models.
Well friends, I am here with a new friend of mine, Scott Dietzen, CEO of Augment Code.
I'm excited about this. Augment taps into your team's collective knowledge: your code base, your documentation, your dependencies.
It is the most context-aware developer AI,
so you won't just code faster,
you'll also build smarter.
It's an ask me anything for your code.
It's your deep thinking buddy.
It's your Stack Overflow antidote.
Okay, Scott.
So for the foreseeable future, AI assisted is here to stay.
It's just a matter of getting the AI to be a better assistant.
And in particular, I want help on the thinking part, not necessarily the coding part.
Can you speak to the thinking problem
versus the coding problem
and the potential false dichotomy there?
A couple of different points to make.
You know, AIs have gotten good at making incremental changes,
at least when they understand the customer's software.
So first and the biggest limitation
that these AIs have today,
they really don't understand anything about your code base.
If you take GitHub Copilot, for example,
it's like a fresh college graduate,
understands some programming languages and algorithms,
but doesn't understand what you're trying to do.
And as a result of that,
something like two thirds of the community on average
drops off of the product,
especially the expert developers.
Augment is different.
We use retrieval augmented generation to deeply mine the knowledge that's inherent inside
your code base.
So we are a copilot that is an expert, and it can help you navigate the code base,
help you find issues and fix them and resolve them over time much more quickly than you
can trying to tutor up a novice on your software.
So you're often compared to GitHub Copilot.
I gotta imagine that you have a hot take.
What's your hot take on GitHub Copilot?
I think it was a great 1.0 product,
and I think they've done a huge service in promoting AI,
but I think the game has changed.
We have moved from AIs that are new college graduates
to, in effect, AIs that are now among the best developers in your code base.
And that difference is a profound one for software engineering in particular.
You know, if you're writing a new application from scratch, you want a web
page that'll play tic-tac-toe, piece of cake to crank that out.
But if you're looking at, you know, a tens of millions of lines code
base, like many of our customers, Lemonade is one of them.
I mean, 10 million line mono repo,
as they move engineers inside and around that codebase
and hire new engineers,
just the workload on senior developers
to mentor people into areas of the codebase
they're not familiar with is hugely painful.
An AI that knows the answer and is available seven by 24,
you don't have to interrupt anybody
and can help coach you through
whatever you're trying to work on
is hugely empowering to an engineer
working on unfamiliar code.
Very cool.
Well, friends, Augment Code is developer AI
that uses deep understanding of your large code base
and how you build software to deliver personalized
code suggestions and insights.
A good next step is to go to augmentcode.com. That's A-U-G-M-E-N-T-C-O-D-E.com. Request a free
trial, contact sales, or if you're an open source project, Augment is free for you to use. Learn more at augmentcode.com. That's A-U-G-M-E-N-T-C-O-D-E.com.
Augmentcode.com.
So you've been using those in your day-to-day programming work
for the last year and back in early January,
you wrote this post, How I program with LLMs, which I found to be
refreshingly practical and straightforward, your findings.
You said you've been actively trying these things.
I feel like I've been passively trying them,
not really trying to optimize my setup,
but just like, you know, like a Neanderthal
kind of poking at a computer box, you know,
like, oh, does this work?
No, okay, for the last couple of years.
So I do use these things,
but I don't think as effectively as most or at least some.
And I loved your findings,
and of course you're building something as a result of it,
but can you take us on that journey over the last year or so
where you started with LLMs
and what you found
in your day-to-day programming?
Yeah, I don't think your experience is unusual, actually.
I think almost everyone has your experience.
And for most software, I am in the same category.
I try things at a very surface level when they're newish
and see if there's any really obvious way they help me,
and if they don't, I put them aside and come back later.
A great example of that is the Git version control system.
It was 10 years before I really sat down and used it.
I was using other version control systems.
After 10 years, I was like,
okay, this thing's probably going to stick around.
I guess I'll get over its user interface.
Fine.
Git it is.
I was reluctant, but I got there in the end.
LLMs really struck me as fascinating.
I decided to, you know, I made this active decision
to not do that with them.
And like set out on a process of trying to actively use them,
which has involved learning just a really appalling amount.
Honestly, like, it's very reasonable that most engineers haven't done really significant things with LLMs yet, because it is
too much cognitive load.
Like, you know, if you're writing computer programs, you're trying to solve a problem.
Alright?
You only have so much of your brain available for the tools you use for solving problems, because you have to fit the problem in
there as well, and the solution you're building, and that should be most of what you're thinking about.
The tools should take up as little space as possible.
And right now to use LLMs effectively, you need to know too much about them.
And that is my sort of big, that was my big takeaway, you know, 11 months ago or so, which
is why I started working on tools with some friends to try and figure this out.
Because there has to be a way to make this easier.
And my main conclusion from all of that
is there's an enormous amount of traditional engineering
to do in front of LLMs to get there.
So the first really effective thing I saw from LLMs
is the same thing I think most engineers saw,
which was GitHub Copilot, which is a code completion.
Also, actually, GitHub Copilot has taken on new meanings. It's more than that now, right? Yeah, it's
an umbrella brand that means all sorts of products and I actually honestly
haven't even used most of those products at this point. The original product is a
code completion system built into Visual Studio Code, where as you type, it suggests a completion for the line
or a few lines beyond that of where you are,
which is building on a very well-established paradigm
for programming editors.
Visual Studio 6.0 did this 25 years ago with IntelliSense for completing methods in C++.
This is not a new idea.
Around the same time, we had etags for Emacs, or ctags, I should say, which gave us similar
things in the Unix world.
This is extending that idea by bringing out some of the
knowledge of a, of a large language model in the process of completing.
And I'm really enamored with the entire model.
Like, you know, Copilot's original experience, when it came out, was magical. It was just like, there was nothing like this before.
It really, I think, jumpstarted a lot of interest in the space from people who hadn't
been working on it, which was almost all of us.
And from my perspective, the thing that really struck me was, wow, this works really well.
And wow, it makes really obvious silly mistakes.
It was both sides of this.
It would suggest things that just don't compile in ways that are really obvious to anyone
who takes a moment to read it.
And it would also make really impressive cognitive leaps where it would suggest things that,
yes, that is the direction I was heading and it would have taken me several minutes to
explain it to someone and it got there immediately.
And so I spent quite a lot of time working on code completion systems with the goal of
improving them by focusing on a particular programming language.
And we've made some good progress there.
We actually hope to demonstrate some of that publicly soon, like in the next few weeks,
probably in this sketch.dev thing that we've been building.
We'll integrate it so that people can see it and give it a try.
But so those models are interesting because they're not the LLM experience that most users have.
Like when everyone talks about AI today, they talk about ChatGPT or Claude,
or these chat-based systems. And the thing I really, really like about the original copilot
code completion model is it's not a chat system. It's a different user interface experience for
the knowledge in an LLM. And that's really a lot of fun. And in fact, the technology is a little bit different too.
There's a concept in the model space called fill in the middle, which is a lot of fun, where a model is taught
a few extra tokens that don't exist in the standard chat model.
It's taught a prefix token, a suffix token, and a middle token.
What you do is you feed in as a prompt to the model the file you're in.
All the characters before where the cursor is get fed right after a prefix token.
You feed in prefix, all the characters of the file,
then you feed in suffix,
and you feed in all the tokens after the cursor,
and then you feed in the middle token,
and then you feed in whatever goes into the middle
to complete it.
And that's the prompt structure for one of these models.
And then the model keeps completing the thing that you fed in, it writes the next characters.
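To make that prompt structure concrete, here's a rough Go sketch of assembling a fill-in-the-middle prompt. The sentinel strings are placeholders, not the real tokens; actual FIM tokens are model-specific (CodeGemma and StarCoder, for example, each define their own).

```go
package main

import "fmt"

// buildFIMPrompt assembles a fill-in-the-middle prompt from the text
// before and after the cursor. The literal sentinel strings here are
// illustrative stand-ins for model-specific special tokens.
func buildFIMPrompt(beforeCursor, afterCursor string) string {
	return "<|fim_prefix|>" + beforeCursor +
		"<|fim_suffix|>" + afterCursor +
		"<|fim_middle|>"
	// The model then generates the "middle": the tokens that belong
	// between the prefix and the suffix.
}

func main() {
	prompt := buildFIMPrompt("func add(a, b int) int {\n\t", "\n}\n")
	fmt.Println(prompt)
}
```

Whatever the model emits after the middle token is exactly the completion that belongs at the cursor.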
And you train the model by taking existing code out there.
There's a few papers on how these models are trained because Meta published one of these
models and Google published one of these models under the Gemma brand.
There's a few others out there.
There's one from Qwen and some other derived ones.
And you take existing code files.
You choose some section of code.
You mark everything before it as the prefix, everything
after it as the suffix,
and the section itself as the middle.
And that's your training data.
You generate a lot of that by taking existing code
and breaking it up into these pieces randomly,
by randomly inserting a cursor.
Then you've taught a model how to use these extra characters
and how to complete them.
And so it's not a chat model at all.
It's a sort of a sequence to sequence model.
It's a ton of fun.
And the advantage of these systems is they're very fast compared to chat models.
And that's the key to the whole code completion product experience is you want your code completion
within a couple hundred milliseconds of you typing a character.
Whereas if you actually time Claude or you time one of the OpenAI models, they're very
slow.
Like they take a good minute to give you a result.
And there's a lot of UI tricks in hiding that minute.
They move the text around on the screen,
then they stream it in.
Streaming, yeah, it's just very clever.
Because you're reading it word by word as it comes out,
but it's like, it's basically stalling.
You're like, come on, just give me the answer already.
That's right.
Yeah, exactly.
You can really feel it with the new reasoning models.
Oh, one of these things, because there's this pause at the beginning.
It's a thinking phase.
I'm like, come on.
And it tells you what it's thinking, which is cool.
But it's like, think faster.
I don't care what you're thinking. Give me the answer.
I think it's actually kind of cool when you see that.
I mean, you get to see, like,
I feel like this is the closest we've glimpsed into the future
than we've ever been able to, by watching the reasoning in real time.
You see the act of reasoning as it's happening. It explains the reasoning: the user asks me this,
I'm gonna think about that. Okay, I thought about that, which causes this. And it's like this step process, and it reminds me of how I think too.
So I'm like, that's pretty dang cool.
But it's also a great trick. I agree. Yeah, it is a ton of fun to watch.
I agree, and it is a lot of insight into how the models work too,
because the insides of the models are a large number of floating point numbers holding intermediate state, and
it's very hard to get insight into those.
But the words, you can read them.
You can make some sense of them.
Right.
So code completion is, I think, extremely useful to programmers.
It varies a lot depending on what you're writing
and how experienced models are with it
and just how sort of out on the edge of programming you are.
If you're really out in the weeds, the models
can get less useful. I used a model for writing a big chunk of AVX assembly a few months ago.
And the model was both very good at it and very bad at it simultaneously. And it was
very different from the typical asking a model to help with programming experience.
It would constantly get the order of operations wrong,
or overcomplicate things, or misunderstand.
It's a very different experience than typical programming.
What model was this? How did you find it?
I used all of them for that. Okay. And this is what I meant by,
I'm spending a lot of time actively exploring the space.
Yeah.
I'm putting far too much work into exploring the model space as I do work.
It makes sense that there are specific models that are good for autocomplete versus search
versus chat.
But have you found the correct one for each particular subtask?
Or what's your advice there?
Is it like use them all or just stick with this,
you'll be good?
I can't advise people to use them all.
You know, that's too much work.
Use a bunch of them.
Yeah, and this I think is the big problem.
And you mentioned that most programmers
are probably using this.
As far as we can tell,
not even one fifth of programmers are using these tools today.
How can you tell that?
Through surveys.
A couple of people have done surveys of programmers.
And it seems to come back that most people are not
using these tools yet.
Which is both shocking to me, because they're so useful,
and also makes a lot of sense.
Because it's a lot of work figuring out how to use them.
I have an analogy that I'd like to share.
If you're a runner, you probably wear running shoes, right?
You're probably not going to run barefoot.
I think it's like admitting to running barefoot.
Like you wouldn't do that.
You wouldn't run a marathon with rocks on the road and debris and things like that barefoot, versus running shoes designed to aid you in the process of running, to make it more speedy, comfortable, agile, etc.
I feel like that's where we're at.
Like I've I've changed my tune, let's just say, because I feel like it's not going to go away.
And to hear that one fifth, I haven't dug
into these surveys, but that's surprising. One fifth is using it, and it seems like
I guess then the four of the five are saying no, or for the moment denying it.
Do either of you disagree with that analogy? Is it way off, or is it, Jared, that's why you kinda like shake your head a little bit?
What's your thoughts on that?
No?
I mean, I don't think that these tools
have come to the place that the running shoe has.
I also think there's probably plenty of world-class runners
who run shoeless and would never run with a shoe on,
because that's for fools.
But I'm not gonna go there.
Well, would you run a New York City marathon with no shoes?
I wouldn't be so foolish as to try a New York City marathon.
Okay.
You have to admit though that having the shoes on is probably better for you than worse for
you.
Well, at what point has the shoe proven itself to be useless?
Because these tools routinely prove themselves to not just be wrong, but dramatically wrong
in ways that if you follow them,
you will be like Michael Scott
who drives directly into the pond.
No, no, no, no, no, no, look.
It means go up to the right, bear right over the bridge
and hook up with 307.
Make a right turn.
Maybe it's a shortcut Dwight, it said go to the right.
It can't mean that, there's a lake there.
I think he knows where it is going.
This is the lake.
The machine knows.
This is the lake.
Stop yelling at me.
No, it's not the lake.
Stop yelling.
There's no road here.
Oh, yes.
Because his voice assistant or his GPS tells him to keep going straight and he just keeps
going straight.
I concur on that one too.
I concur.
So I can see why you could get frustrated and throw up your hands and say, I'm going
to come back to this in a year or two years, but I'm going to let all the frontiers people
like David, figure all this stuff out.
In the meantime, I got code to write.
I can see a lot of people saying that.
I'm not, I could see myself saying that.
I haven't because I am curious
and I don't wanna fall behind.
But I still don't feel like this is a must have
for everybody today.
But there are moments where I'm like, that was amazing.
So, for sure.
I'm actively trying to fit it in into everything I do,
is I guess my perspective.
I'm actively, if it's Home Lab, it's that.
If it's contracts, agreements, proposals,
if it's thinking, if it's exploration, if it's coding,
if it's, you pick it, your kind of thing.
I'm trying to fit it in and I'm just, so I'm sitting down
on my bench, I got my socks on and I'm trying to put
the shoe on, let's just say.
You know, to kind of extend my analogy.
You're gonna wear it.
Yeah, I believe that, you know, I'm gonna put this shoe on,
I'm gonna wear it for every scenario that makes sense
because I can tell you I move faster, I think differently
when I'm in those modes.
Are they wrong?
Do I always check it?
Of course.
But I know that it's coming for almost everything we do.
Every task we do that's productive,
coding, thinking, writing, whatever,
it's coming for it in a positive way.
I mean, I totally agree that it is coming for it.
I also think it's very early days
and a great reason to not learn this technology today is
that it's changing so fast. Yeah. And that you can spend a very long time
figuring out how to make it work and then that can all all of that accumulated
skill can be sort of made useless tomorrow. Right. By some new product. If you
remember, Stable Diffusion first dropped probably two years ago now.
And we were enamored with prompt engineering.
And what was that artist's name that you'd always,
if you added it to your stable diffusion prompt,
it automatically got awesome.
And then he got mad because everyone's using him
to like make better pictures.
Like that whole technology, you know,
that magical incantation is just completely moot
at this point.
Like this probably, it's easier now to get better pictures without being such a wizard.
And whatever name you're invoking in the past is just that name doesn't do what it did
on the last version of stable diffusion, just as one instance, like prompt engineering
has changed dramatically.
And anybody who was ignoring it all and just listening to us talk about it on the Changelog
and just, like, staying with
their regular life, like they've saved themselves a lot of time
compared to those of us who dove in and decided they were gonna memorize all the magic words.
Yeah, absolutely like a year ago a common technique with open models that existed was to offer them money to solve problems
You start every prompt by saying I'll give you $200 if you do this. And it greatly improved outcomes.
Yeah, or let's take this step by step.
Like that phrase was one of those magical things
that made it better.
All of those techniques are gone now.
If you try bribing a model, it doesn't help.
There was a great example I saw of that
where someone kept saying,
I'll give you $200 if you do this.
And they did it in a single prompt several times, and they got to the nth case, and it said, but you haven't paid me for the previous ones. Yeah.
We had a deal
No
No money means no, there you go. All right, they're very funny
Well, so I spent a long time believing, and I still believe this in the long term, that chat is currently our primary user interface
on the models, and it's not the best interface
for most things.
The way to get the most value out of models today
when you program is to have conversations with models
about what you're writing.
And that's, I think, it's quite the mode shift to do that.
It's quite taxing to do that.
And it feels like a user interface problem that
hasn't been solved yet.
And so I've been working a lot with Josh Bleecher Snyder
on these things.
And we spent a long time looking for how
can we avoid the chat paradigm and make use of models.
That's why code completion initially was so interesting because it's an example of using
models without chat and it's very effective.
We spent a long time exploring this to give you another example of something we built
in this space because we've just been trying to build things to see what's actually useful. We built something called Merd, merd.ai,
which I think we put up a few weeks ago.
And it does merge commits for you.
So if you try and push a git commit or do a rebase,
and you get a merge conflict, you
can actually use LLMs to generate sophisticated merge
commits for you.
It turns out that's a much harder problem than it looks.
Like you would think you just paste in all of the files
to the prompt and you ask it to generate the correct files
for you.
Even the frontier models are all really bad at this.
You almost never get a good merge commit out of them.
But with a whole stack of really mundane engineering out the front,
mundane is not the right word because a lot of it's actually really very
sophisticated, but it's not, it doesn't involve the LLM itself.
It's about carefully constructing. Traditional is a much better word.
Yeah.
With that, you can actually get very good merge commits out of it.
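As a taste of the kind of traditional engineering that sits in front of the model, here is a hedged Go sketch of one plausible pre-processing step: parsing git conflict hunks out of a file, so the model can be shown the disputed regions rather than whole pasted files. This is illustrative only, not the actual pipeline behind the tool.

```go
package main

import (
	"fmt"
	"strings"
)

// conflict holds one git conflict hunk: the "ours" lines and the
// "theirs" lines between the standard conflict markers.
type conflict struct {
	ours, theirs []string
}

// parseConflicts walks a file with <<<<<<< / ======= / >>>>>>> markers
// and extracts each hunk.
func parseConflicts(file string) []conflict {
	var out []conflict
	var cur *conflict
	side := 0 // 0 = outside a hunk, 1 = ours, 2 = theirs
	for _, line := range strings.Split(file, "\n") {
		switch {
		case strings.HasPrefix(line, "<<<<<<<"):
			cur, side = &conflict{}, 1
		case strings.HasPrefix(line, "=======") && side == 1:
			side = 2
		case strings.HasPrefix(line, ">>>>>>>") && side == 2:
			out = append(out, *cur)
			cur, side = nil, 0
		case side == 1:
			cur.ours = append(cur.ours, line)
		case side == 2:
			cur.theirs = append(cur.theirs, line)
		}
	}
	return out
}

func main() {
	f := "a\n<<<<<<< HEAD\nx := 1\n=======\nx := 2\n>>>>>>> feature\nb\n"
	for _, c := range parseConflicts(f) {
		fmt.Println("ours:", c.ours, "theirs:", c.theirs)
	}
}
```

Each hunk, plus surrounding context, can then be handed to the model as a much smaller, more focused prompt than the raw files.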
And that user experience seems much better for programmers to me
that you could imagine that being integrated into your workflows
to the point where you send a PR, there's a merge conflict,
it proposes a fix right on the PR for you.
And in fact, we attempted a version of that
where there's a little Gitbot that you can at mention on a PR
and it sort of generates another PR based on it that
fixes the merge conflict for you.
And that sort of experience doesn't require the chat interface to be exposed to the programmer
to make use of the intelligence in the model.
And that is where I dream of developer tools getting so that everyone can use them without
having to learn a lot about them.
You shouldn't have to learn all the tricks for convincing a model to write a merge commit for you.
It should be a button, or not even a button. It should just do it when GitHub says there's a merge conflict.
And so it's actually, it works pretty well. We've seen it generate some very sophisticated merge commits for us.
I'd love to see more people give it a try
and let us know what the state of that is.
But so just because that is such a hard state to get to,
we built Sketch, which exposes the traditional chat interface
in the process of writing code.
Because we're just not,
we don't think the models
are at a point yet where we can completely get away
from chat being part of the developer's workflow.
So at what level of granularity is Sketch working at?
And do you imagine it moving up eventually,
wherever it is, because the panacea, right,
the silver bullet is what some folks
are trying to do with Devon, for instance,
where it's like, you describe at a very high level
a system, and it goes and builds that system.
v0 from Vercel is another one that's doing these things.
And they're very much at, in my opinion,
the prototype slash demo level of quality,
not the production level of quality, in their output.
And it seems like they're very difficult.
In my limited experience with these things,
they're very difficult to actually mold or,
what do you do?
I'm losing a word here.
A sculpt, I don't know, like a sculpture.
To actually like sculpt what they come out with
and change it into something
that you actually would write or like.
But those are like the very high level of like,
well it should have a contact form
that submits to this thing.
But maybe you're looking down more where the,
where I use them currently which is like,
yo write me a function that does this particular thing.
And at that level, it seems a lot easier
to even chat to if I have to.
I would rather not chat to it,
but spit out code that I could copy, paste, and modify,
versus being like, I'm gonna have to throw this away
and rewrite it.
Right, yes, I think that lines up really well
with the way Josh and I think about these things, where
today if you open up a model, a cloud provider's frontier model or a local DeepSeek or even
a Llama 70B, you can ask it to write a Python script that does something.
It could be a Python script to go to the GitHub API and grab some data and present it neatly for you.
And it will do a great job.
These great models can basically do this in a single shot
where you write a sentence and the outcomes of Python script
that solves a problem.
And like that's an astonishing technical achievement.
I really, it's amazing how quickly I've got used to that
as a thing, but.
Yeah, you're not even impressing me right now.
I know.
Yes, it can do that.
We all know.
Exactly.
But it is amazing.
Exactly.
Like five years ago, if you told me that, I would struggle to believe it.
And yet now I just take it for granted.
Yes.
And so that works.
We've got that.
We've got a thing that can write really basic Python scripts for us.
Similarly, these systems, at least the frontier models, are good at writing a small React
component for you.
You can give almost any of them like a...
You need more than a single sentence.
You need just a few sentences to structure the React component, but out comes some HTML
and some JavaScript in the React syntax, the JSX syntax, or the TSX syntax.
And it's pretty close.
It might need some tweaking.
You might have some back and forths to get there, but you can get about that out of it.
And clearly, models are going to improve.
There's no evidence to suggest we're at the limit here as the models keep improving every
month at this rate.
And part of what we're interested in Sketch
is getting beyond helping you write a function,
which I also use today, right?
I get for Frontier Models to write functions for me,
to sort of, how can we sort of climb
the complexity ladder there?
And so the point we chose is a point that, you know,
is comfortable for us and what is helpful for us
is the Go package. How can we get a model to help us build a Go package to solve a problem?
And there's an implicit assumption here in that the shape of Go packages looks slightly
different at the end of this. Packages are a little bit smaller and you have a few more
of them than you would in a sort of traditional Go program you wrote
by hand.
But I don't think that is necessarily a bad thing.
Honestly, in my own programming, as a Go programmer, I tend to write larger packages because there's
a lot of extra work involved in me breaking it into smaller packages.
And there's often this thought process going on in my mind of like, oh, in the future,
this would be more maintainable as more packages.
But it's more work for me to get there today.
So I'll combine it all now and maybe refactor it another day.
And switching to trying to have LLMs write significant chunks
of packages for you makes you do that work up front.
That's not necessarily a bad thing.
It's perhaps more the way we'd like our code to end up.
And so Sketch is about taking an LLM
and plugging a lot of the tooling for Go
into the process of using the LLM to help it.
So an example is, I asked it the other day
to write some middleware to brotli-compress HTTP responses
under certain circumstances, because Chrome can handle brotli encoding and it's very
efficient. It's not in the standard library, at least it wasn't the last time I looked.
And the first thing it did was it included a third party package that Andy had written
that has a brotli encoder in it.
And so Sketch go-gets that in the background in a little
container as you're working, and has a little go.mod there
that modifies so that as you're editing the code,
you get all the code completions from that module,
just like you would in a programming environment.
And more importantly, we can take that information
and feed it into the model as it's working. If we run the Go build system as part of it, and if a build error
appears, we can take the build error, feed it into the model. It's like, here's the error,
and we can let it ask questions about the third-party package it included, which helps
with some of the classic problems you see when you ask Claude to write you some Go code
where it includes a package and then makes up a method
in there that doesn't exist that you really wish existed
because it would solve your problem.
And so this sort of tool, automated tool feedback
is doing a lot of the work I have to do manually
when I use a frontier model.
And so I'm trying to cut out some of those intermediate steps
where I said, that doesn't exist,
could you do it this way?
Anything like that you can automate saves me time,
it means I have to chat less.
And so that's the goal is to slightly climb
the complexity ladder in the piece of software
we get out of a frontier model
and to chat less in the process.
Are you achieving that by having a system prompt
or are you actually fine tuning?
Like how are you as the sketch.dev creators
taking a foundation model and doing something to get here?
Today it is almost entirely prompt driven.
There's actually more than one model in use under the hood
as we try different things.
For example, we use a different model for solving the problem of
if we want to go get a package,
what module do we get to do that? Which sounds like a mechanical process, but it actually isn't.
There's a couple of steps there. So a model helps out with that. There's very different sorts of
prompts you use for trying to come up with the name of a sketch than there are for answering questions. But at the moment, it's entirely prompt driven
in the sense that a large context window
and a lot of careful context construction
can handle this, can improve things.
And that can include a lot of tool use.
Tool use is a very fun feature of models
where you can instruct it.
So to back up and give you a sense of how the models work, an LLM generates the next
token based on all the tokens that come before it.
When you're in chat mode and you're chatting with a model, you can at any point stop and
have the model generate the next token.
It could be part of the thing you're asking it or its response.
That meta information about who is talking is sort of built into just a stream of tokens.
So similarly, you can define a tool that a model can call.
You can say, here's a function that you can call and it will have a result.
And the model can output the specialized token
that says call this function, give it a name,
write some parameters.
And then instead of the model generating the next token,
you pause the stream, you the caller go and run some code.
You go and run that function call that it defined,
paste the result of that function call in
as the next set of tokens and then ask the
model to generate the token after it.
So that technique is a great way to have automated feedback into the model.
So a classic example is a weather function.
And so you define a function which says current weather.
Then you can ask the model, hey, what's the weather?
And the model can say, call function, current weather. Your software that's printing out
the tokens pauses, calls current weather, which says sunny, and you paste sunny in there.
And then the model generates the next set of tokens, which is the chat response saying,
Oh, it's currently sunny. And that's the sort of easy
way to plug external systems into a model. This is going on under the hood of the user interfaces
you use on frontier models. So this is happening in ChatGPT, in Claude, all these systems.
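The weather example, sketched from the caller's side in Go. The toolCall shape and the function name are illustrative, not any particular provider's schema: the model emits a structured call instead of ordinary text, your code runs it, and the result gets pasted back into the token stream before the model continues.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is the shape of a function-call request a model can emit in
// place of ordinary text. Field names here are illustrative; each
// provider's API defines its own exact schema.
type toolCall struct {
	Name string          `json:"name"`
	Args json.RawMessage `json:"arguments"`
}

// dispatch runs the requested tool and returns its result, which the
// caller pastes back into the conversation before asking the model to
// generate the next token.
func dispatch(c toolCall) string {
	switch c.Name {
	case "current_weather":
		// A real implementation would query a weather service.
		return "sunny"
	default:
		return "error: unknown tool " + c.Name
	}
}

func main() {
	// Pretend the model just emitted this instead of a normal token.
	raw := []byte(`{"name": "current_weather", "arguments": {"city": "Oakland"}}`)
	var call toolCall
	if err := json.Unmarshal(raw, &call); err != nil {
		panic(err)
	}
	fmt.Println("tool result:", dispatch(call))
	// The result ("sunny") is appended to the history, and the model
	// resumes generating: "It's currently sunny."
}
```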
Sometimes they show it to you happening, which is how you know. You see it less now, but about six months ago, you could see in the GPT-4 model, you
would ask it questions and it would generate Python programs and run them and then use
the output of the Python program in its answer.
I had a really fun one where I asked it how many transistors fit on the head of a pin.
And it started producing an answer and it said like, well, transistors are about this
big, pins are about this big.
And so I guess the magic little emoji appeared that this means this many transistors fit
on the head of a pin, some very large number.
And if you click on the emoji, it shows you the Python program it generated to do the
arithmetic.
It executed that as a function call and
came back with the result.
And that saved it the trouble of trying to do
the arithmetic itself, which all LLMs
notoriously struggle with doing.
This is a great thing to outsource to a program.
And so-
It's a funny work around, because you know,
if you're a calculator for words,
you're not necessarily a calculator for numbers.
Yeah, they're much better.
And if you can't do those reliably,
then you could just write a program that does it
and returns the same thing every time.
Yes, they're very good at writing programs
to do the arithmetic, very bad at doing the arithmetic.
So it's a great compromise.
The thing we do with Sketch is try to give
the underlying model access to information
about the environment it's writing code in using
function calls.
So a lot of our work is not fine-tuning the model.
It's about letting it ask questions about not just
the standard library, but the other libraries it's trying
to use so that it can get better answers.
It can look up the Go Doc for a method
if it thinks it wants to call it,
use that as part of its decision-making process
about the code it generates.
Can you describe "let it ask"?
I mean, you've said a couple of times
that I've been curious about this.
When you say let it ask, what does that mean?
Like decompress that compressed definition.
So at the beginning in your system prompt
or something like your system prompt,
depends on the API on exactly how the model works,
you say there is a function call which is get method docs
and it has a parameter which is name of method.
And then you can construct a question to an LLM that says,
generate a program that does this
with the system prompt which explains that there's a tool call there.
And so as your LLM is generating that program,
it can pause, make a function call, a tool call that says,
get me the docs for this.
And so the LLM decides that it wants to know something
about that method call.
And then you go and run a program,
which gets the result,
gets the documentation for that method
from the actual source of truth.
You paste it into the prompt.
And then the LLM continues writing the program,
using that documentation as now part of its prompt.
And so this is the model driving the questions
about what it wants to know about.
And just blocks and waits for that to come back.
Yes.
Effectively.
Yeah.
Yeah, so it's like an embed.
If you step back to, like, running llama.cpp yourself
or something like this,
you can sort of oversimplify one of these models
as every time you want to generate a token,
you hand the entire history of the conversation you've had
or whatever the text is before it to the GPU
to build the state of the model.
And then it generates the next token.
It actually generates a probability value for every token in its token set.
And then the CPU picks the next token, attaches it to the full set of tokens,
and then does that whole process again of sending over the entire conversation
and then generating the next token.
And so if you think about that very long, big giant for loop around the outside,
every time there's a new token,
the token is chosen from the set of probabilities
that comes back and is added to the set,
and then a new set of probabilities
is generated for the next token.
You can imagine in the middle of that for loop,
having some very traditional code in there
that inserts
a stack of tokens that wasn't actually decided by the LLM, but that then becomes part of the history
that the LLM is generating the next token from.
And so that's how those embeds work.
You can effectively have the LLM communicate with the outside world in the middle there
by it driving that, or you don't even have to have it drive it.
You could have software outside the LLM that looks at the token set as
it's appeared and then inserts more tokens for it.
So this is all the fun stuff you can do by running your models yourself.
Yeah, I know. That's so fun.
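That giant for loop, with traditional code injecting tokens in the middle, can be sketched with a toy deterministic "model". Everything here is illustrative: step stands in for the GPU forward pass, and the string tokens stand in for a real vocabulary.

```go
package main

import "fmt"

// contains reports whether token s appears anywhere in the history.
func contains(h []string, s string) bool {
	for _, t := range h {
		if t == s {
			return true
		}
	}
	return false
}

// step stands in for one forward pass: given the entire history, it
// produces the next token. A real model emits a probability for every
// token in its vocabulary and the CPU samples one; this toy is
// deterministic.
func step(history []string) string {
	if contains(history, "sunny") {
		return "It's sunny." // tool result is in the history now
	}
	return "CALL:current_weather" // model "asks" for the tool
}

func main() {
	history := []string{"What's the weather?"}
	for len(history) < 8 {
		next := step(history)
		if next == "CALL:current_weather" {
			// Traditional code in the middle of the big for loop:
			// run the tool and insert tokens the model never chose.
			history = append(history, next, "sunny")
			continue
		}
		history = append(history, next)
		if next == "It's sunny." {
			break
		}
	}
	fmt.Println(history)
}
```

The injected "sunny" token was never sampled from the model's probabilities, yet it shapes every token generated after it, which is the whole trick.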
Well, friends, I'm here with Samar Abbas, co-founder and CEO of Temporal.
Temporal is the platform developers use to build invincible applications.
But what exactly is Temporal?
Samar, how do you describe what Temporal does?
I would say to explain Temporal is one of the hardest challenges of my life.
It's a developer platform and it's a paradigm shift.
I've been doing this technology for almost like 15 years.
The way I typically describe it, imagine like all of us
when we were writing documents in the 90s,
I used to use Microsoft Word.
I love the entire experience and everything,
but still the thing that I hated the most is
how many documents or how many edits I have lost
because I forgot to save
or like something bad happened and I lost my document.
You get in the habit when you are writing up a document back in the 90s to do control S.
Literally every sentence you write. But in the 2000s, Google Doc doesn't even have a save button.
So I believe software developers are still living in the 90s era.
Where the majority of the code they're writing has some state which
needs to live beyond multiple request/responses.
The majority of the development is: load that state, apply an event, take some actions,
and store it back.
80% of software development is this constant load and save.
So that's exactly what Temporal does.
It gives you a platform where you write a function, and if during the execution of that function a failure happens,
we will resurrect that function on a different host
and continue executing where you left off
without you as a developer writing a single line of code
for it.
Okay, if you're ready to leave the nineties
and build like it's 2025 and you're ready to learn
why companies like Netflix, DoorDash and Stripe
trust Temporal as their secure,
scalable way to build invincible applications.
Go to temporal.io, once again temporal.io.
You can try their cloud for free
or get started with open source.
It all starts at temporal.io.
Is Go particularly well suited for this kind of tooling
because of the nature of the language or is it just your favorite or why Go?
Yeah, that's a really good question.
The best programming language for LLMs today is Python and I believe that is a historical artifact of the fact that all of the researchers working on generative models
work in Python.
And so they spend the most time testing it with Python and judging a model's results
by Python output.
There was a great example of this in one of the open benchmarks I looked at, and I believe
this has all been corrected since then. This is all about a year old. There was a multi-language
benchmark that tested how good a model is across multiple languages. I opened up the
source set for it and looked at some of the Go code, because I'm a Go programmer, and
it had been machine translated from Python
so that all of the variable names in this Go code
used underscores instead of camel case.
And the models were getting a certain percentage success
rate generating these results.
So Josh went through, actually, and made
these more idiomatic
in the Go style of using camel case and putting everything
in the right place.
And the model gave much better results on this benchmark.
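As a concrete illustration of the kind of fix being described (the function and the names below are made up for this example, not taken from the actual benchmark): both versions compile, but only the second is idiomatic Go, and a model trained mostly on idiomatic Go completes the second form more reliably.

```go
package main

import "fmt"

// Machine-translated from Python: legal Go, but the underscore names
// are non-idiomatic and unlike most Go in a model's training data.
func compute_total_price(item_prices []float64) float64 {
	total := 0.0
	for _, p := range item_prices {
		total += p
	}
	return total
}

// The same function in idiomatic Go style, camel case throughout,
// like the corrected benchmark sources.
func computeTotalPrice(itemPrices []float64) float64 {
	total := 0.0
	for _, p := range itemPrices {
		total += p
	}
	return total
}

func main() {
	prices := []float64{1.5, 2.5}
	fmt.Println(compute_total_price(prices), computeTotalPrice(prices))
}
```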
And so that's an example of where languages beyond the basic ones
that the developers of the models care about
are not paid as much attention as you would like. But things are getting a lot better there.
The models are much more sophisticated.
The teams building them are much larger.
They care about a larger set of languages.
And so I don't think it's all as Python centric as it used to be.
But that is still very much the first and most
important of the languages.
As for how well Go works, it seems to work pretty well.
Models are good at it by our benchmarks. Like we said, if we took the benchmarks and made them more
Go-like, the models actually got better results. They have a real tendency to understand the
language. We think it's a pretty good fit. There are definitely times when models struggle,
but it's a garbage-collected language, which helps, because just as garbage collection reduces the cognitive load
for programmers as they're writing programs, it reduces the load on the LLM in the
same way.
They don't have to track the state of memory and when to free it.
So they have a bit more thinking time to worry about solving your problem.
So in that way, it's a good language.
It's not too syntax heavy, and it doesn't have ambiguities that humans struggle with.
Yeah, it seems to work well.
Pretty small.
Yeah.
I haven't seen much research
into what is the best language for an LLM.
It does seem like an eminently testable thing.
In fact, it may end up influencing programming language
design. Imagine you are building a new programming language, and you develop
a training set that's automatically generated by translating some existing programs into your
language, and you train models on it.
You could imagine tweaking the syntax of your new language, regenerating the training set,
and then seeing if your benchmarks improve or not.
So you can imagine driving readability of programming languages
based on your ability to train an LLM to write the language.
There's lots of really fun things that will happen long term, and I don't think anyone has
started on work like that yet.
Right, so the level that you all are working at with Sketch
with Go in particular, is the prompting you're doing
and the contexting and everything else that you're building,
is it at a layer of abstraction where you could replace Go relatively easily
with insert general programming language?
Or is it like, well, that would be a new product
that we would build?
Like how hard is that?
Yeah, it's a good question.
All of the techniques we're applying are general,
but each technique requires a lot of Go-specific implementation.
It's much the same as with a language server for
a programming language, the systems inside VS Code that generate information
about programs.
The techniques, like determining what methods are available on an object, are very similar
in Go as they would be in Java, for example.
But the specifics of implementing them for both languages are radically different. And
I think it's a lot like that for Sketch. The tricks we're using for Sketch are very Go
specific. And if we wanted to build one for Ruby, we would have to build something very,
very different.
Okay.
So yes, I consider it very much a Go product right now, and I really like the focus that gives us.
Because Go is a big enough problem on its own,
let alone all of programming.
Yeah, yeah, yeah.
I'm just asking that because I wonder how valuable
and important tooling like this would be
for each language community to either provide or fund
or hope that somebody builds. Because if the LLM-related
tooling for Go, because of Sketch, just hypothetically
becomes orders of magnitude more useful
than just talking to ChatGPT about my Elixir code,
for instance, well, that's a real advantage for Go
and the Go community. I mean, it's great for
productivity for gophers. And going back to maybe the original question about you know should
Tailscale have its own little chat bot built into it? Like does each community need to take up this
mantle and say we need better tooling or is it like VS Code should just do it for everybody?
I mean, that's a really good question.
Good job, Drew.
Yeah, so, you know, I very much admire VS Code.
I use it, though I don't actually have to admire
a program to use it.
That's better than admiring without using, I think.
Yeah, that's right, but I actually, I do both.
Like I both admire it and use it.
Okay, fair.
But to look at the inside of VS Code, which I've been doing a bunch of recently,
VS Code didn't actually solve language servers for all programming languages. They built
JavaScript and TypeScript, JSON, and I think they maintained the C-Sharp plugin. They started the Go plugin, I think,
and then it got taken over by the Go team at Google, who now
maintain the Go support in VS Code.
I don't think the Microsoft team built the Ruby support in VS
Code.
I don't know who did the Python implementation.
But a lot of the machinery in VS code is actually
community maintained for these various programming languages.
And so I'm not sure there is another option than imagining a world where each of these
communities supports the tooling in some form.
I don't know if each programming language needs to go out and build their own sketch.
Maybe there is some generalizable intermediate layer, some equivalent of a language server
that can be written to feed underlying models.
Given our...
We're just starting to explore this space.
Sketch is very new.
We basically started it some time near the end of November, so there's not much to it
yet.
Yeah, but so far what we've found
is it's far more than the sort of language server environment
that you get with VS code.
More machinery is needed to really give the LLM
all the tooling it needs.
The language server is very useful.
We actually use the Go language server in Sketch.
Gopls is a big part of our infrastructure.
It's really wonderful software.
But there's far more to it than that.
To the point where we need to maintain an entire Linux VM
to support the tooling behind feeding the model.
So what each community needs to provide,
I think that's the research in progress,
is figuring that out.
Yeah.
It's an interesting question
and one that I think will be open for a while.
I do not wanna see a world where Python continues
to proliferate merely because of its previous position.
I do see, with tooling like Devin and Bolt and v0, these are very front-end,
JavaScript-y companies that are producing these things, which is fine. But it's like,
if you are just going to go use that, it's going to produce for you a React and Next.js
front-end with a Prisma-based back-end. It's all very much like, these are the tools it uses.
And that's all well and good,
but that's gonna proliferate more and more
of that one thing.
Whereas I'd love to see a diversity where it's like,
yeah, is there a specific thing for Rails people?
Is there one for people who like Zig,
moving outside of the world of web development?
But you know what I'm saying,
and I think your answer might be right,
which is like, well, every community's gonna have
to provide some sort of greasing of the skids
for whatever editor is popular or used
in order to make their tooling work well
inside of these LLM-based
helpers beyond just being like ChatGPT knows about you,
which is kind of like what people are at right now,
is like, does ChatGPT know about me?
It's the new, am I Googleable?
It's the new SEO at this point.
I've heard people talk about that.
A startup founder, who I won't name,
mentioned that they were busy retooling their product
so that the foundation models under things like v0 and Bolt
would be more likely to npm install their package
to solve a problem.
That's super smart to do that right now.
I agree.
Did they divulge any of the how?
What are the mechanical steps to do that?
I was actually really happy that they said
that their plan was to make it really easy
to npm install their package and not require a separate signup flow
to actually get started.
Oh, that's nice.
Yeah, I've thought it was wonderful.
Like their solution to make their product more ChatGPT-able,
I guess you might say, is just to make their product better.
Which, you know, if that's.
How avant-garde of them.
Yeah.
Yeah.
I'm sure one day we'll end up
in the search engine optimization world
of Frontier models, but today.
It's definitely gonna be some black magic for sale.
You know, here's how you really do it.
Yeah.
I don't see why a frontier model couldn't
run an ad auction for deciding what fine-tuning
set to bring in.
Again, to talk about experiences:
I was using one of the voice models
and talking to it as I was walking down the street.
And I asked it some question about WD-40
because I had a squeaky door. And I think I described in my question WD-40 as a lubricant. And it turns out I just
didn't understand that it's not a lubricant, it's a solvent. And the purpose of it is to
remove grease.
It took me years to realize that. I think someone finally told me because I've been
using it as a lube all these years.
Yeah.
Oh my gosh.
Why do you got to keep reapplying it?
You know, it's not very good lube.
Well, I just had your experience, but it was an LLM that told me.
Oh, hilarious.
And it mentioned in passing, it's like, yeah, you could also, you know, you could use WD-40
and then use a lubricant like, and then it listed some brand name.
At the moment, I heard the brand name.
I was like, oh, I see a Frontier model could run an ad auction
on fine tuning which brand name to inject there.
That would be a really-
100%.
Yeah, it wouldn't require baking it into the pre-training
months ahead of time.
You could do that sort of on an hour by hour basis.
So that world is coming, and then once there's a world
of ads, there's a world of SEO and all the rest of it.
Well the more paramount they become,
and Adam you can probably speak to this
because you're injecting it into every aspect of your life.
Like if the answer includes a product right there,
like you're just gonna be like,
all right I gotta get that.
Sometimes you don't even realize Kleenex is a product.
You think that that's a category,
but no that's a product.
Yeah, absolutely.
Hard to tell, honestly. Kleenex is an easy one for me because we don't have Kleenex in Australia, where I'm from.
So I came here and started calling tissues Kleenex, and it was a bit of a surprise to me.
It's like Coke or Coca-Cola, something like that. You know? Yeah, right?
Exactly. Yeah, you know I don't know if
I've gotten some hallucinations, let's just say on products
and even limited information of what's the true
good option when it comes to product search.
I haven't done a ton of it, mainly on like the motherboard search.
I want to do something that has the option for either an AMD Ryzen or a Threadripper with, you know, more of a workstation enterprise-class CPU.
And I want to maximize some PCI lanes.
So I'm just trying to like figure out what's out there.
I'd prefer the chat interface to find things versus the Google interface, which is search
by nature to find things.
But thus far, it hasn't been
super fruitful. I think eventually it'd be cool, but it's not there yet.
I imagine in the YouTube video of this, a little Intel Xeon banner will appear just
as you say.
That's right. Yeah. On the ad.
So yeah.
Yeah, exactly.
So I'm a fan of Intel Xeons too. I got the Intel Xeon 4210.
Well, now it's really popping up.
There you go. Bing, bing, bing. It's like in Silicon Valley when Dinesh was
talking out loud and they had the,
I think they had AI in the VR and it was,
yeah it was doing some cool stuff.
It was pulling up ads real time.
It was cool, it was cool.
But yeah, Intel's cool, AMD's cool,
but PCIe lanes are even cooler, you know?
Give me the max, x16, you know?
David, maybe we close with this.
For those who aren't gophers out there, of course,
brand new, hot off the press, still in development,
three months old, sketch.dev, check it out if you're into Go
and those kind of things.
But let's imagine you're just a Ruby programmer out there
and you came across your blog post
about what you've been doing,
these three methods of working with AIs.
You have autocomplete, you've got chat, you've got search.
Where should folks get started if they haven't yet?
First of all, is today the day?
Like is it worth it now?
Or should I wait?
And then secondly, if I am gonna dive in
and I just wanna use it in my local environment
to like just code better today, what would you suggest?
Yeah, good question, especially for non-Gophers.
I would suggest trying out the code completion engines
because they take a little bit of getting used to,
but not a lot.
And depending, if you're writing the sorts of programs
they're good at, they're extremely helpful.
They save a lot of typing. And it turns out, I was surprised to learn this,
but what I learned from code completion engines is a lot of my programming is fundamentally typing limited.
There's only so much my hands can do every day. And they're extremely helpful there.
The state of code completion engines is they're pretty good at all languages,
with the caveat that they're probably not very good at COBOL or Fortran. But all the sort of
general languages, especially like Ruby, I'd expect them to be decent at. I suspect the world
of code completion engines will get better at specific languages as people go deeper on the
technology.
It's a thing I continue to work on and so I feel confident that it can be improved.
The other place that I think most programmers could get value today,
if they're not a Go programmer, is
writing small isolated pieces of code in a chat interface.
So you could try out a chat GPT or a Claude,
or if you really want to have some fun,
run a local model and ask it to solve problems.
Like, try llama.cpp, try Ollama,
try these various local products,
grab one of the really fun models.
It's especially easy to try on a Mac
with their unified memory.
If you're on a PC, you might have to find a model that fits in your GPU.
But it's a ton of fun, and use it to, say, write me a Ruby function that takes these
parameters and produces this result.
And I suspect the model will give you a pretty good result.
So those are the places I would start because those require the least
amount of learning how to hold the model correctly and you'll get the most benefit
quickly. Love it. Good answer. I'm just wondering out loud and feel free not to
know but when it comes to prompting I know we're past the age of magical
incantations but as you guys have been building
out a product, which is basically sophisticated prompting,
are there guides that are useful or are there like,
I remember finding a site, I can't remember right now,
there's like, people are just sharing their system prompts
for certain things they do, like,
maybe there's like a Ruby prompting guide,
which makes it a little bit easier
to get quality results out faster. Does either of you guys know?
I've seen people write guides like that.
I would say the guides I've read are now out of date.
Like we were saying earlier, guides go out of date.
The thing I find most useful is to think of
the model I'm talking to as someone who just joined the company. Sometimes I think of them as an intern, though every now and again the models produce much
better code than I can. But interns have done that too. That happens. And then as you're
writing the question for it, imagine it as
like, here's a, you know, I'm talking to a smart person who knows nothing about what
I'm doing and they need some background.
And that gets me really far with the current frontier models.
And so that would be my general piece of advice that I think applies to any programming.
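To make that concrete, here is the shape of a prompt that treats the model like a new teammate who needs background; the project details are invented for illustration:

```text
I'm working on a Go service that syncs customer records between our
Postgres database and a third-party CRM. We use sqlc for queries and
run everything behind a single worker goroutine.

Write a function that takes a slice of customer IDs and returns the
ones that exist in Postgres but not in the CRM. Assume I already have
clients for both. Errors should be wrapped with context, and the
function will be called from a retry loop, so it must be idempotent.
```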
I agree with that, too. With the random just-give-me-X, you'll get a result,
but you'll have to massage it further, and it'll ask you more.
I will often give it context.
Like you said, this intern, the smart person that's new to the context,
they don't have the background awareness that you want somebody to have.
And I'll often like give it a lot of that,
have a particular request, but then also say,
is there anything else I can give you
or any other information you need
to give me a successful result,
just some version of, like, be successful with our goal.
And it's strange even talking like that too,
as I even say it out loud, like our goal,
as if it's, you know, human.
And Jared, you know where we stand on this,
but Jared and I have some history,
so I've been very kind, please and thank you.
He's very nice, he talks to me like
we're on the same team and stuff.
If it gives me a great result, I say fantastic, you know,
I'm like, you know.
High fives.
Why would I, you have high fives,
why would I be any different?
You offer it money.
No, I haven't done that yet, I gotta try that. I gotta try that, honestly.
But I will ask it, like,
will, do you, could you be more successful?
Is there anything else I can give you?
Any more information I can give you
to get us to our goal, you know?
And I've found that that's it, like, it's context.
It's a full circumference of the problem set,
as much as you can, that makes sense.
And then you will have a more fruitful interaction.
I will also say that I'm more exclusively using ChatGPT.
And so the O1 model, while it's expensive,
let's just say I don't have the expensive plan.
I still don't feel like I can be the person
that spends 200 bucks a month on this thing.
I would much rather buy a GPU than spend 200 bucks a month.
Somehow that math makes more sense to me.
But then, like, o1's been pretty successful
with thinking and iterating and being more precise,
whereas 4o was a bit more brute.
But that's just my personal take.
I think you might be onto something
with being nice to models.
I caught myself being pretty curt with models
a few months back, and discussing this a lot with Josh,
the conclusion we came to was that
one of the challenges of not being nice to models
is it sort of trains you to not be nice to people.
Yeah.
Because you're using all of the same tools.
And so it might just be good for you to be nice to models.
I mean, I just don't, if it's humanistic even, you know, similar to a human, why not just
be kind?
You know, why not?
I hate to break it to you, Adam, but this is not similar to a human.
The iteration is, it certainly is.
If I were collaborating, if that was a human over there
giving me the answers back, it would be very human,
volleyball iterative.
If.
If it was.
Right.
I get that it's not, but I'm also like, like David, why not?
I think I'm somewhere between your two positions
because I do think it's just a machine and it's just a tool.
I don't think it's human.
Don't let me think that.
Well, you kind of just said that you think it is.
Did I?
I mean, that's what I interpreted.
I just meant that if it,
maybe what I mean by that, just to be more clear,
is kind of keying off of David's,
which is like being kind.
Just why not?
I don't know.
I'm like overly like, thank you so much, you're amazing.
I think it's just, I'm a kind person.
When it got you a result in the search engine,
like you just.
Well, this, it is not a prompt where there's an ebb and a flow
or a back and a forth.
You know, it just simply returns an answer.
Yeah.
Well, how are we doing?
You know, ask you at the end, how are we doing?
You know, I'm not, I, that being said,
I'm not like, thank you very much.
I'm just, I'm just antagonizing him at this point.
Yeah, I know you are.
You're really digging into me, but I,
I can catch myself saying, like, that's awesome,
or great job, or yeah, I agree with that. Isms like that,
like you would say to another person.
If that makes you feel more like a nice human,
then you should just keep on doing that, but I don't think it's doing anything for the computer.
I don't think it is either. It's actually costing resources.
It doesn't help the computer.
I say please and thank you to the models now so that I remember to say please and thank you to humans. You know, I don't want to get out of the habit.
Exactly. It's all about training yourself.
That's fair. Self-training.
Yeah, I don't feel like I have to.
I feel like it's just a natural automatism.
It's how I do things, how I operate.
It's who I am in my core.
I'm a kind person.
Well, David, thanks so much for joining us, man.
Thanks for sharing all your knowledge.
You've learned a lot and I've learned a lot from you.
So we appreciate your time.
This was a ton of fun.
Thanks for having me.
Okay, so hopefully hearing David's experience has helped you on your programming with LLMs journey.
I'm sure you have thoughts on the matter.
Let us hear them in the comments.
Yes, there is a Zulip topic dedicated to this episode
and I'm sure there's lots of insightful things
being posted by ChangeLog community members
right after they listen.
There's a link in your show notes
so you can see what's up and join for $0
at changelog.com slash community.
Let's give one more shout out to our sponsors
of this awesome conversation.
Thank you to Fly.io, to Retool,
retool.com slash changelog, to Temporal,
find them at temporal.io, and to Augment Code.
Head to augmentcode.com to get started.
And thanks of course to our Beat Freak in residence,
the one, the only, Breakmaster Cylinder.
Yeah, I like beats.
Oh, and we have a little bonus
for our favorite kind of listener.
That's a ChangeLog++ listener, of course.
Stay tuned, friends.
We go even one layer deeper
on what a potential Tailscale AI might look like.
If you aren't a Plus Plus member,
head to changelog.com slash plus plus today.
Right now even.
You're free right now, aren't you?
Okay, that's it.
This one's done, but we'll talk to you again on Changelog & Friends on Friday.
Bye y'all.