SemiWiki.com - Video EP7: The impact of Undo’s Time Travel Debugging with Greg Law
Episode Date: May 30, 2025
In this episode of the Semiconductor Insiders video series, Dan is joined by Dr Greg Law, CEO of Undo. He is a C++ debugging expert, well-known conference speaker, and the founder of Undo. Greg explains the history of Undo, initially as a provider of software development and debugging tools for software vendors. He explains that…
Transcript
Hello, my name is Daniel Nenni, the founder of SemiWiki, the open forum for semiconductor
professionals.
Welcome to the Semiconductor Insiders video series, where we take 10 minutes to discuss
leading-edge semiconductor design challenges with industry experts.
My guest today is Greg Law, CEO of Undo.
He's a C++ debugging expert, well-known conference speaker, and the founder of Undo. Welcome
to Semiconductor Insiders, Greg.
Hey, very happy to be here. Thanks for having me.
So, Greg, for the last 12 years, you have worked with all the EDA tool vendors and some
of their customers building semiconductors. What trends and common challenges are you
seeing across the board?
Hey, well, I think it's really just that the pressure to
deliver quality designs on time gets ever greater, right? And I think we've seen that
particularly in the last few years. With all the advances that we're seeing, there's ever more
pressure for a faster design cycle, getting those designs to customers. But they've got to be quality
as well, right? It's not just speed, it's also the quality.
Right, and how does Undo help with this problem
of engineering teams spending far too much time debugging?
Yeah, sure.
So, I mean, some of the shifts we're seeing
in terms of the approach
to getting these designs out quicker and at higher quality
are really through the shift left, okay,
getting more of the development done earlier
in the cycle.
And that means virtual prototyping, okay,
it means modeling of architectures,
often just in pure software
before we try and turn them into gates.
And really all of these approaches are
essential now.
They've changed from being perhaps what the
leading companies were doing maybe five years ago
to now being table stakes. You have to be doing this stuff just to keep up. And where Undo comes into that is
that this is really becoming increasingly about software. So our background is we come from a space of
helping software companies. So it was the EDA vendors in the early days. In fact, they were
some of the first companies that we started to work with. Since then, we've worked with all of the enterprise software
people that you might expect, right?
So Amazon, Bloomberg, and Cisco, and people like this.
And then what we're seeing just in the last maybe year or two
is increasing numbers of silicon companies
are becoming our customers, right?
So not just the EDA companies,
but the customers of the EDA companies too.
And that's because of this shift left, which means that like more and more of the development
effort is really being done in software, right? Through virtual prototyping and all the other stuff.
And so the same things about Undo that help these software companies to
produce better code faster are now just becoming super relevant to silicon companies
as well.
And, you know, a lot of these silicon companies are systems companies, right?
So Apple and Tesla, I mean, they make the whole system.
So there's a huge amount of software behind it.
Absolutely.
Yeah, yeah.
So yeah, indeed, there's a lot of that vertical integration.
But, you know, all of the silicon companies now produce
a lot of software.
But what's changed in the last few years is that it's also the silicon teams who are actually
producing more software.
Not that they're shipping that software, right?
Because it's a C model of the design or peripherals or it's virtual prototyping, right?
That software never leaves the building, but it is essential to develop the latest version
of the chip.
So could you translate that into the kind of business impact engineering leaders can
make with your solution?
Yeah, yeah, yeah.
So I mean, the context of the problem here we just spoke about, right?
So, you know, hardware and silicon developers are spending up to 50% of their
time debugging these models and these SystemC designs.
So it could be a model of a Verilog implementation, or
increasingly these days it could be SystemC, right, just implementing the chip directly
in C++, basically. And then what we help with at Undo is to allow the software
engineers to see exactly what their code did, right?
Rather than what they thought it was going to do.
And we do that through a kind of three-phase approach, right?
So this is record, then replay, and then finally resolve the issue.
So you record the execution.
So that might be through high-level synthesis.
It might be a SystemC design, or it might be a model of the chip you're making.
It might be a model of the peripheral.
And you run that code,
and when it does something that you weren't expecting,
you have a recording of it.
And this recording allows you then to replay the execution
right down to the line-by-line level.
The developer can now see every single line of code
that executed and every piece of data for every line,
and they can wind back to any point in
the program execution. So it's like complete information about what that piece of
software was doing, which then makes that third step of resolution just, you know, super straightforward.
Right? There's no longer any of this "But how did that happen? How did I get here? I think maybe..." You
can just see exactly what happened and resolve these issues much faster,
and therefore get much more coverage,
much faster implementation of what you're doing.
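As a rough illustration of the record, replay, and wind-back idea described above, here is a toy sketch in Python. It is not how Undo works: it assumes a single pure-Python function, uses the standard `sys.settrace` hook to snapshot local variables at every executed line, and the names `record`, `rewind_to`, and `running_total` are hypothetical, chosen only for this example.

```python
import sys


def record(func, *args):
    """Run func(*args) and record (line number, copy of locals) per executed line."""
    history = []
    target = func.__code__

    def tracer(frame, event, arg):
        # Ignore every frame except the function under study.
        if frame.f_code is not target:
            return None
        # "line" fires before a line runs; "return" captures the final state.
        if event in ("line", "return"):
            # Shallow copy: enough for ints, mutable objects would still alias.
            history.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    except Exception as exc:  # keep the recording even when the run fails
        result = exc
    finally:
        sys.settrace(previous)
    return result, history


def rewind_to(history, predicate):
    """Walk backward through the recording to the last step where predicate holds."""
    for index in range(len(history) - 1, -1, -1):
        if predicate(history[index][1]):
            return index
    return None


def running_total(values):
    """Toy model with a suspicious reset that loses the accumulated total."""
    total = 0
    for v in values:
        if v < 0:
            total = 0
        else:
            total += v
    return total


# Record once, then inspect the run without re-executing it.
result, history = record(running_total, [3, 4, -1, 5])
step = rewind_to(history, lambda state: state.get("total") == 7)
line_number, state = history[step]
print(f"last step with total == 7: line {line_number}, locals {state}")
```

The point of the sketch is only the workflow: the run happens once, and every later question is answered from the recording rather than by adding a print statement and running again.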
Okay, so that's what we do, right?
That's what this time travel debugging approach is.
And what that means really in terms of how that looks,
whether it's to the engineer
or to the development organization:
the traditional one is at the top here, right?
Where we have this loop and you go around this loop,
like the code's not doing what I thought it was doing.
Typically what people do is like add more logging, right?
Or they get, maybe they get a dump or something.
The most common actually is just add another printf,
run it again, let's see what happened that time.
And then, you know, step by step,
I go around this loop an indeterminate number of times.
It's one of the problems actually, I've got no idea how many times I'm going to go
around this loop until I've finished, or maybe even given up.
Now at the bottom, you've got the time travel debugging approach, which
is just this straight line.
Okay.
You take the program recording, you replay it, you resolve it.
There's no iteration, going around again, new builds, more information.
Everything you need is just there.
So it's so much more, it's not just faster, it's so much more predictable. And that's really kind of how that looks. And then what that means
in terms of the benefits that people see, which was really the question you were asking, so I'm
going to get to that, is understanding what really happened, and knowing what really happened,
rather than trying to make these guesses, which in turn lets the engineer
root cause what's really happened with ease. They can even trace back through the code flow,
through the data flow. I've got some piece of data, some signal or some variable that's in a state
that I didn't expect. I can just wind the tape straight back to where that last got changed. And I can keep on doing that, keep following the chain back to the ultimate root cause,
which is especially valuable when it comes to intermittent failures. Okay,
and everybody now has these very expansive regression test suites.
And those regression tests, they're not always 100%, okay?
Sometimes you might get a failure,
you know, one in 100, one in 1,000.
If you can get a recording of that,
capture it just once,
then the intermittent bug problem basically goes away.
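The "wind the tape straight back to where that last got changed" step mentioned above can be sketched as a backward search over a log of writes. This is a toy model under stated assumptions, not Undo's interface: the `Journal` and `Write` classes, and the signal and source names, are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Write:
    step: int     # position in the recorded execution
    name: str     # signal or variable that was written
    value: int    # value that was stored
    source: str   # hypothetical label for the code location doing the write


class Journal:
    """Append-only log of writes captured during one recorded run."""

    def __init__(self):
        self.writes = []

    def record(self, name, value, source):
        self.writes.append(Write(len(self.writes), name, value, source))

    def last_write(self, name, before=None):
        """Search backward from step `before` for the most recent write to `name`."""
        end = len(self.writes) if before is None else before
        for write in reversed(self.writes[:end]):
            if write.name == name:
                return write
        return None


# A made-up recording of a peripheral model.
journal = Journal()
journal.record("fifo_depth", 4, "init")
journal.record("irq_mask", 0xFF, "init")
journal.record("fifo_depth", 0, "reset_handler")
journal.record("status", 1, "poll")

# fifo_depth is unexpectedly 0: jump to the write that made it so,
# then keep following the chain back to the write before that.
culprit = journal.last_write("fifo_depth")
earlier = journal.last_write("fifo_depth", before=culprit.step)
print(culprit)
print(earlier)
```

Because the search runs over a recording, a failure that shows up one time in a thousand only has to be captured once for this kind of backward trace to be repeatable.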
We had a customer just recently who was struggling:
fewer than 50% of their clean regression runs
would come out, you know, all green, even without
any changes having been made. And using this, they got that up into the high 90s, like 97, 98%
reliable green runs, which then has a big cultural impact, right? Because down at
that level of 50% failure, or even when 70, 80% is green, the engineers stop trusting
the tests.
Okay, and they see a failure and they say,
well, it probably wasn't my fault.
There's this inherent kind of flakiness in my test suite.
I'll just run it again, see if I get lucky next time.
Yeah, okay, off I go.
And that just then becomes this self-fulfilling prophecy
and just gets worse and worse.
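To see why rare flaky failures can drag the all-green rate of a whole suite down so far, here is a back-of-the-envelope sketch. It assumes every test fails independently with the same small probability, which real suites will not match exactly, and the suite size of 700 tests is a made-up number rather than anything from the customer described above.

```python
def all_green_probability(num_tests, per_test_failure_rate):
    """Chance that every test passes in one run, assuming independent failures."""
    return (1.0 - per_test_failure_rate) ** num_tests


def max_failure_rate_for(target_green, num_tests):
    """Per-test failure rate that yields the target all-green probability."""
    return 1.0 - target_green ** (1.0 / num_tests)


# Hypothetical suite of 700 tests, each failing spuriously 1 run in 1,000.
print(all_green_probability(700, 0.001))

# Per-test flakiness needed for the same suite to be all green 97% of the time.
print(max_failure_rate_for(0.97, 700))
```

Under these assumptions, a one-in-a-thousand failure rate per test is already enough to make a clean run come out all green just under half the time, which is why fixing a handful of intermittent bugs can move the suite-level number so much.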
And then the other final thing
I just wanna comment on here
is the ability for collaboration.
So these days, the systems we build are so complex, no single human can get them all in their head. It's a
collaborative effort. And if you look at how engineers collaborate on resolving any kind of
issue, you'll often get a long trail of comments on your GitHub or Bugzilla or whatever it is you're
tracking this in. "It could be this," and they're asking questions
and people are jumping in.
Now you can take a recording.
And one of the nice things about these recordings
is that they're portable.
So you can take that recording, give it to your colleague
and say, hey, I've seen this thing at like,
time six minutes, 14 seconds, and that looks weird to me.
Can you explain what's going on there?
And it's just a big collaboration win,
especially as we work
increasingly not all in the same office, right?
So it's not so easy these days to say,
hey, come and take a look at this,
and pull up a chair and work on it together.
Sometimes you get to do that,
but a lot of times people are remote and they're asynchronous.
And so the collaboration you get through recordings
is a big win.
That's great.
So Greg, how do companies generally engage with Undo?
People always want to take this and try it for themselves.
We work through it.
We collaborate closely with customers
while they are going through what
we call the bug hunt process, where one of the common
reactions we get is, well, this sort of sounds great,
but surely this is too good to be true. And people want to see it working for real as they engage with us. So yeah, there's a
number of different ways that we kind of go about that. It will depend exactly on the customer.
And then what they're looking for is to validate not just that the technology works, because yeah,
it's kind of cool. But we don't want something
that's just cool and a neat trick.
We want to understand that it really has the business impact.
Right.
And so what we're looking to do in that evaluation process is
demonstrate these three key points.
Right.
So the first is that, using this technology, customers
can get to market faster, right?
They can reduce the time taken to produce these complex SoCs, ASICs and the rest,
and not just getting to market faster, but doing it more productively. Okay, so getting to market
faster with the same or sometimes even fewer resources, certainly no extra resources. And
even better than that, the kind of third leg of the stool here is that not only do you get out faster
with better productivity, what you get out the end is also better. You get improved quality.
One of our silicon customers was explaining to me the other day how they make a model of a chip
in C++ before, or really in parallel with, the RTL team turning it into silicon.
And they run these workloads on the model,
trying to characterize them.
Previously, they were able to get like
50% coverage of the workloads they wanted to cover
on the model, because they just
couldn't get the weird little differences
out of the model enough to run all of the workloads.
Now, with Undo, with time travel,
they can get 80, 90% coverage.
So you have a higher understanding of how that silicon is going to perform when it ships.
So it's these three key business impacts, faster time to market, more productive, and
better quality of what you do ship.
That's what we're looking to demonstrate in the process when we engage with a customer.
Once you've demonstrated all those things, then it's a bit of a no-brainer really
to then adopt the technology.
Great conversation, Greg.
Thank you for your time.
We will see you next at the Design Automation Conference.
That concludes our video.
Thank you for watching and have a nice day.