Disseminate: The Computer Science Research Podcast - Gina Yuan | In-Network Assistance With Sidekick Protocols | #54
Episode Date: June 10, 2024

Join us as we chat with Gina Yuan about her pioneering work on sidekick protocols, designed to enhance the performance of encrypted transport protocols like QUIC and WebRTC. These protocols ensure privacy but limit in-network innovations. Gina explains how sidekick protocols allow intermediaries to assist endpoints without compromising encryption. Discover how Gina tackles the challenge of referencing opaque packets with her innovative quACK tool, and learn about the real-world benefits, including improved Wi-Fi retransmissions, energy-saving proxy acknowledgments, and the PACUBIC congestion-control mechanism. This episode offers a glimpse into the future of network performance and security.

Links:
NSDI'2024 Paper
Gina's Homepage
Sidekick's GitHub Repo

Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello everyone, Jack here from Disseminate the Computer Science Research Podcast and welcome to another episode of our Cutting Edge series.
We're going to be talking about networking today and specifically we're going to be talking to Gina Yuan who will be telling us everything we need to know about her recent NSDI paper,
Sidekick: In-Network Assistance for Secure End-to-End Transport Protocols, which won two
awards actually at the conference. It won the Outstanding Paper Award and the Community Award.
So that's awesome stuff. Gina is a PhD student at Stanford University, where her research interests are building systems and designing algorithms for networks.
Welcome to the show, Gina. Thanks, Jack. It's really great to be here and thanks for the introduction.
Awesome stuff. I'm really excited about today's podcast. This paper is absolutely
jam-packed with really cool stuff and a lot of acronyms as well. So we're probably going to have
to have a breakdown of a few of these acronyms throughout this conversation today. But let's
start off as we always start off our podcasts by getting you to tell us your story.
Yeah, currently I'm in my fifth year of the PhD program at Stanford.
Before that, I was at MIT for my undergrad and my master's degrees, also in computer science.
Outside of that, I like to run and bike.
But how did I get into systems research?
First, how did I get into computer science? I think in undergrad, I didn't really know what to do at the time or what I wanted to do.
So I kind of just did what I thought was cool.
And I think everyone was really getting into computers at that point.
And I applied to a few clubs at MIT.
I applied to the trading club and I applied to the HackMIT club, and they both rejected me.
And so I ended up having to do research, darn.
But I, yeah, I really
started getting into computer science because MIT had this UROP program, which is an undergraduate
research program. And I wanted to have some academic extracurricular. So what I did was I
just emailed a bunch of people and only a few got back to me. And that's just how I got started in
research. But actually, the first research I did as a freshman was not really research because I didn't really know anything about computers. But I was building apps,
basically. I built an Alexa app. I built JavaScript apps. But I learned a lot about coding and
engineering during this time. And this really set the foundation for me enjoying building systems
and writing code. And yeah, that's how I got into systems. Awesome stuff. Yeah. I mean, just kind of jumping back a second there. You said that
there's this concept of applying to clubs at MIT, and they accept you or reject you. That's such an alien concept. I mean, we had societies at the universities in the UK, but they were just happy for anyone to come along. Maybe it's a different sort of structure in the States. Yeah, maybe it was even that the fact that you had to apply
made it seem even cooler
because then you could be part of this exclusive club.
But, you know, yeah, I guess they need like a few people
to run the organization.
I mean, yeah, then it becomes a tight-knit community.
But yeah, had to apply.
I mean, yeah, I kind of see it from the sports perspective. I'm getting kind of waylaid a little bit now, but when you've only got a set number of positions on a team, right, you've got a trial, etc., etc. But yeah, I know the math society or the computer science society at the universities I've attended were always happy for more members, right? I mean, more people to go and get drunk with on a Wednesday night, normally. Exactly, right? Yeah. That's interesting. And also, when I was doing some of my background research on you, Gina, I noticed that you ran something called Battlecode. This was new on me, so can you maybe tell us a little bit about Battlecode real quick? Yeah, it's another one of those clubs at MIT. Okay. This club creates a programming
competition where you write code for these virtual robots.
And then these virtual robots fight another team's virtual robots in this virtual world.
And every year the game changes. And I kind of forgot exactly what the game was in the years
that I did it. I kind of remember robots that shoot bullets, and then they hit trees, so it's not too violent, because they're trees. And you form teams, and then there's brackets and live tournaments, and you commentate them. So I just started competing in it one year, and then the next year, because I had so much fun competing in it, I ended up joining the organizing team for the club and leading it for a few years.
Yeah, that was a really fun experience and also very systems oriented because now I was
touching all the infrastructure in the cloud and just writing code that all these people were
going to use and iterating really fast because it wasn't a company. It was just a bunch of
grad students and undergrads at MIT writing some code for people to win, like, thousands of dollars, hopefully. That's a big prize as well then. Yeah, yeah, just some students writing some code. That's good. It reminds me, I don't know if you ever had this in the States, but it's called Robot Wars. It was a big thing when I was a kid. People would basically build, I mean, these things are pretty gnarly actually, these little robots, and they'd have axes on them and stuff. Basically, you'd build your robot and then you'd compete against somebody else. I can send you a link on YouTube after this so you can watch Robot Wars in action, because I'm probably not going to do it justice. Basically, you'd build a robot and you'd try and destroy somebody else's robot. But anyway, it was cool, we loved it as kids. Anyway, let's get back on topic. So let's talk about your paper and your research in networking. Let's set some context for the chat today before we do anything else. Tell us about your research, the area it's in, the canonical model of the internet, what performance-enhancing proxies are. And yeah, tell us all these important terms we need to talk about Sidekick.
Yeah, yeah. And definitely stop me if I say a term that I've not defined well.
So my research is mainly in computer networking and systems. And when I tell people about
networking, who are not really familiar with networking,
I kind of just talk about the internet. Anything we do on the internet involves a computer network.
So I like to think of the network as a collection, like a network of nodes with links connecting
these nodes. And we want to send data from one node in this network to another node in the network.
All the nodes on
the edge of the network are like our end devices, like our computers, our cell phones. And the whole
goal of the internet is to be transferring data across these nodes in the way that we want them
to. This paper is related to something called PEPs, Performance-Enhancing Proxies, P-E-Ps.
And before I dive directly into what PEPs are, I kind of like to
start talking about my paper by talking about a problem that we are trying to solve that feels
real to people. And so this problem is like, imagine you're on a moving train and your laptop
is connected to the Wi-Fi on this train. And then you're connected to this Wi-Fi by connecting your
laptop to a router. And then this router in turn connects you to the rest of the internet. And it probably
has like a cellular connection or something like that. So if you were to use the Wi-Fi on the train,
you'd probably think it's kind of slow. This is a common experience, slow internet. But what does
it mean for your internet to be slow? Okay, yeah. So imagine you're on this moving train
and your browser is uploading a really large file to the internet.
Your browser is doing this using something called a transport protocol.
And some of these popular transport protocols are TCP and QUIC.
So TCP might be the transport protocol
that if you've taken a networking class
or kind of just
interacted with networks that you've probably heard of first. This was kind of the original
transport protocol. And then QUIC is this new transport protocol. It was invented at Google in 2012. It does a lot of the same things as TCP. It helps you have reliable connections, and it just forms this idea of a connection to begin with. And so the surprising thing is, in this imaginary train scenario, your laptop could be uploading a large file using TCP, and that connection is actually very fast. But then your upload using QUIC is actually very slow. And this is kind of surprising, because TCP has been around since the beginning of time, and QUIC is a really new transport protocol. Yeah, so what is different about TCP and QUIC?
And the difference,
the main difference between these two transport protocols is that QUIC is a secure transport protocol.
So it's completely encrypted on the wire
versus TCP has some unencrypted information
like the metadata in its headers,
such as its sequence number.
So every TCP packet has like a different
consecutive sequence number assigned
to it to ensure that they all arrive at the other end host. Yeah, so transport is end-to-end
for secure protocols. And this is kind of how the internet was designed to be. There's information
in these packets that your routers, middle boxes, proxies, these are all going to be the same thing
in case I use these words interchangeably.
These middle boxes were not intended to participate in transport. And that's kind of how it is for QUIC, because everything's totally encrypted. But sometimes your connection can actually be better
if you do know things about what's going on in the middle of the network. And so they could be
better if transport wasn't end-to-end. For example, in our train scenario, we have two different path segments going on in this
connection.
We have the one between you and the router, which is very short, but you might have a
lot of wireless loss.
But you also have one between the router and the rest of the internet, which can be long,
but also more reliable.
And so because it actually can be better for these middle boxes to use information
about what's going on in the middle of the network, since TCP has these unencrypted transport headers,
actually what a lot of these middle boxes do is that they will look at this information and use
it to help your connection. And so this can be really great because these performance enhancing proxies
or PEPs that are looking at your TCP connection can actually help you. But at the same time,
they're going to be like changing the data on the wire, and then your transport is not going to be
end-to-end as originally intended. Yeah, so that's the great part about PEPs. You can have
awesome performance with PEPs. And they're widely
deployed. They're deployed on 20 to 40% of internet paths. And almost all popular cellular
companies will probably be deploying a PEP on your path. So this is the great part about PEPs.
Yeah, one might ask now, then why would you invent QUIC if PEPs are so great? And because there's
also a bad side to PEPs. And yeah, when I talk about the bad side of PEPs in my conference talk, I put up this slide with a graveyard of tombstones, and on each of these tombstones is a different idea that has been, yeah, killed. Not because PEPs are so bad, but because PEPs exist the way they are. I don't have the slide here, so in your head, imagine this graveyard. And now we have some of these tombstones. So some of these tombstones have these great ideas for TCP. So like changes to TCP,
extensions to TCP, some of these include ECN++, TCP Crypt, Multipath TCP.
These are all ideas that have been really hard or maybe even impossible to deploy.
And this is because PEPs, in helping your connection and modifying data on the wire that, remember, they weren't originally intended to see, have kind of come to expect
these packets to be formatted a certain way. And so now
this expectation has been put in place and you can't change this format anymore. And this is
something that's called protocol ossification. Yeah, so protocol ossification. Protocols have
become fossils that are not able to change or upgrade anymore. And that's terrible because our internet has changed a lot in the past few decades.
Yeah, the other half of the tombstones are some of the research ideas
that have been previously discussed to try to solve this problem of protocol ossification
or of performance-enhancing proxies not being able to help secure transport protocols like QUIC.
But in the end, a lot of these ideas end up
co-designing the protocols and the PEPs or explicitly credentialing these PEPs to have
access to certain fields on the protocols. And we kind of feel like these solutions, while they
enable performance enhancements for secure transport protocols, they kind of result in the same problems of ossification as before.
That's a fantastic explanation. When you started giving this example, you're like, oh, the train Wi-Fi might be a bit sucky, and that is my experience. On the UK rail network, all of the Wi-Fi absolutely sucks, it's terrible. So yeah, they need to do something about that, I don't know what it is. But okay, cool. So I'm going to relay a few things you said back to you, to see if I can grasp this. So originally, the proposal was that transport would be end-to-end, right? The middle boxes are agnostic to what's getting passed through them. They're just passing the packet along to the next host. Obviously, the topology of the internet is very heterogeneous. Like you said, there's cellular, satellite, or whatever, and there are various different paths on this network, and they have different properties. So sometimes it's good to use these PEPs, these performance-enhancing proxies: if they can be aware of the path they're on, they can do smarter things to make performance better. But that comes with the caveat that they make assumptions about, or change, data in a way that then, well, it becomes that problem of ossification, right? Where they're making assumptions and changing things, which stops future development. Or the graveyard of research ideas, the things people tried to do but can't do because of PEPs, right? And QUIC, which is, what does that stand for, by the way? QUIC stands for Quick UDP Internet Connections. Okay, cool. So that's a pretty new protocol which takes PEPs back out of the equation by encrypting things, so they can't peek inside and meddle with the packets. But then that obviously stops us getting the benefits of PEPs, and it can be slower, because I guess there are overheads with encryption and whatnot, so that can maybe make things a little bit slower. So PEPs are good and bad at the same time, I guess, and your work is going to allow us to get the best of both worlds, I guess, is the intuition I'm getting from this. But yeah, in your own words, give us the elevator pitch for Sidekick and what the goal is.
Yeah, the elevator pitch would be,
it's a way to have a universal PEP. It's a way, yeah, for a PEP to help any transport protocol that's already out there. And for all these transport protocols
to be able to keep changing
and keep upgrading and evolving over time. Love that. That's great. So let's talk about how you went about achieving that then. So
in your paper, you sort of break this down into three technical challenges.
So first of all, there's something called quack. So tell us about quack.
Actually, before even quack, just the idea of a sidekick. And so the idea of a sidekick
is instead of how PEPs
exist today, which is actively manipulating your data on the wire to provide better performance,
the idea of a sidekick protocol is for the PEP to instead send information back to the end host
about the connection. And then the end host is making the decisions that will result in better
performance. So the active manipulation of data happens at the end host and is end-to-end as originally intended, versus the proxy, where the PEP only observes the data. So that's the idea of the sidekick
protocol. Now, the first challenge was, our solution is called a quack, but what was the
challenge? The challenge is what are you actually sending on this sidekick protocol? So what can you send that is useful information about the connection back to the sender or to the data sender?
And the way we think about this is, something that's already been a very useful signal in these transport protocols is an ACK.
So saying which packets have been received for that connection at that specific node in the
network. And so the proxy could just say which packets it has already received from that
connection. Now, why is this a challenge? So let's think about how you would do this for TCP.
So for TCP, it's kind of easy because TCP has the sequence numbers that
I mentioned before. Each packet is numbered consecutively. And so you can easily say as a
proxy, for example, I received packets one through 10, and I also received 13 and also 15 through
27. So you can say this pretty concisely. But now if you're using QUIC and all your packets are totally
encrypted, your packets look like total garbage, you don't know, you just can't express that you've
received a range of packets, or even if any of these packets are missing because you don't know
what the consecutive order is. And so that's kind of where the idea of the quack comes in.
We wanted a way to be able to concisely acknowledge a set of completely
random looking packets. And there's several properties that we need to make this practical.
It can't take up too much space on the wire. It has to be really fast if you're processing
lots of packets at the same time. And if you lose any of these quacks on the wire, you need to be able to have
the same kind of correctness guarantees as before. That's kind of what we needed, but how did we
actually do this? And I can talk about the origin of this later, but it took a lot of different
connections. But our final solution at the end was we realized that there's this related theoretical
work in this idea called set reconciliation or in a more relevant networking context,
and they call it straggler identification. And basically it uses some ideas from math,
from algebra, coding theory, to be able to concisely and efficiently represent a set
of random packets that we've totally, that we've received in the middle of the network.
Yeah, so we went deeper into this math and this algebra.
We implemented a data structure called a quACK that uses these ideas.
We highly optimized it so that it could be more practical to use, and then we added some extra tricks so that it's more applicable and practical in this networking setting that we're using, whereas previously it's really been studied more in a theoretical context.
Yeah, Gina, I'll just jump in real quick and say, obviously there's some magical mathematical work happening there to make this work. But can you give us an intuition of the math behind it and what properties it's using? Because, like you say, I've got this totally anonymized, encrypted set of packets I'm sending. How do I work out what I've got and say, yeah, I've received this? It kind of feels like an impossible problem almost.
So how does the math help us there?
Yeah.
So some intuition.
First, at a high level, how we're modeling this as a mathematical problem is that we have a data sender and then we have a proxy.
So the data sender is sending some set of packets, S, and we represent each of these packets by a 32-bit packet identifier, which you can do by hashing the payload to 32 bits or something like that.
Okay, so that's a set S. And then on the proxy or maybe receiver of packets, we have a set R of all the packets that the proxy has received.
And so we consider the set of packets
R to be a subset of the packets of S. And so as a mathematical problem, what we're trying to do
is figure out what is in the set R, or alternatively, what is in the set of S without R.
So the set difference. And we're trying to find this at the sender, which already knows S, so finding either of these values is equivalent. And the way we're able to do this is, eventually, we're able to model the set of missing packets as a system of symmetric polynomial sums. And that's fancy, but back to just basic algebra, what we have is, if we're missing M packets, a system of M equations and M variables. And all we had to do is solve this system. And that gives us an exact answer of which packets we're missing.
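And here is a toy decoder to go with it, again illustrative Python with hypothetical packet IDs, not the paper's implementation: the sender subtracts the quACK's power sums from its own power sums over everything it sent, turns the differences into polynomial coefficients via Newton's identities, and tests which of its sent IDs are roots of that polynomial.

```python
# Toy decoder for the power-sum quACK idea: recover the missing set S \ R
# from power-sum differences, using Newton's identities over a prime field.
# Assumes R (what the proxy saw) is a subset of S (what the sender sent).

Q = (1 << 32) - 5      # a prime just below 2^32 (assumed parameter)

def power_sums(ids, m):
    sums = [0] * m
    for x in ids:
        term = 1
        for i in range(m):
            term = (term * x) % Q
            sums[i] = (sums[i] + term) % Q
    return sums

def missing_packets(sent, received_count, received_sums):
    m = len(sent) - received_count                     # number missing
    d = [(s - r) % Q for s, r in zip(power_sums(sent, m), received_sums)]
    # Newton's identities: k*e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i
    e = [1]
    for k in range(1, m + 1):
        acc = 0
        for i in range(1, k + 1):
            acc = (acc + (-1) ** (i - 1) * e[k - i] * d[i - 1]) % Q
        e.append(acc * pow(k, -1, Q) % Q)
    # Missing IDs are the roots of x^m - e1*x^(m-1) + e2*x^(m-2) - ...
    def is_root(c):
        return sum((-1) ** k * e[k] * pow(c, m - k, Q) for k in range(m + 1)) % Q == 0
    return [x for x in sent if is_root(x)]

sent = [0xDEADBEEF, 0x12345678, 0xCAFEF00D, 0x0BADF00D]   # hypothetical IDs
got  = [0xDEADBEEF, 0xCAFEF00D]                           # what the proxy saw
print(missing_packets(sent, len(got), power_sums(got, 2)))
# -> [305419896, 195948557], i.e. 0x12345678 and 0x0BADF00D
```

Awesome. Cool. So how did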
you then go about evaluating this solution you developed, and how effective it was? And yeah, what were the results? Yeah, so before the NSDI paper, we actually had a HotNets paper.
And the only thing we'd really implemented at that point in time was the quack. And so we were like,
what kind of performance guarantees do we need on this quack in order to like make it practical to
use? And one of them was that it has to be fast to encode.
So every time you update the state of your quack, given a new packet, it has to be very fast.
And it also has to be fast to decode. So every time you receive a quack,
it has to be very fast to figure out which packets are missing. And yeah,
when we were writing the HotNets paper, one of our co-authors, David Zhang, he doesn't work in networking, he actually does more stuff with high-performance programming. So he really played a big role in helping to optimize some of these hardware operations for the quack. Yeah, ultimately, because we found
this algebraic solution, and it's just additions and multiplications, which are all very fast operations on a CPU, we were able to create a pretty good implementation of it. Nice, cool. So yeah, tell us about the results then. So like, how quick is it? How good is it at decoding and encoding? And yeah, what are the overheads of it?
Yeah, so it depends on the exact parameters,
but in the main experiment of our paper,
it takes 33 nanoseconds per packet
to encode a single packet into the quack.
And it takes three microseconds
to decode a quack at the end host.
Interesting, why is the asymmetry there? I kind of wonder why it's not the same either way. I guess, yeah, there's probably a reason for that.
Yeah. So encoding is much more important to be fast from our perspective because a router or
a middle box in the network could be processing a lot of packets at the same time because it's
not just your device's traffic, it's all the devices that are sharing the network. In comparison, decoding a quack happens at your end host, and your end host is probably just handling the connections related to you. And actually, the logic of processing the information from a quack, or, you know, even general acknowledgements, takes much longer than what it takes to just decode the quack.
Awesome stuff. Of course, we've got our quack there. So the quack's component one,
that's technical challenge number one solved. So let's talk about technical challenge number two
and something called Robin. And yeah, tell us what the challenge was and then how Robin
solves this challenge for us as part of these sidekick protocols.
Yeah.
So Robin is kind of just our name for the system that we built around the idea of sidekick protocols.
So like, what can we actually do in response to quacks?
Now that we have a sidekick connection and we have the information from quacks, what
kind of what we call path-aware sender behavior can we implement in a sender so that
it is giving us performance benefits that are as good as what we were getting before with
unencrypted transport protocols and PEPs. And so kind of how we thought about this problem is,
first, what kind of information are quacks giving us now? Quacks are telling us which packets are
lost, just like ACKs do. Or, well, it tells you which packets are received, and then you determine which packets are lost based on that. It's telling you
where this loss is occurring. So when you receive an end-to-end ACK, all you know is that somewhere
along your end-to-end connection, a packet is being lost. But now that we have a quack, we know,
for example, packets are being lost on this near path segment to us,
between us and the quacker. And we could even infer, based on that, which packets are being
lost on the far path segment. Yeah, so we know which packets, we know where, and then also we
know this signal a lot sooner. So for example, if in the train scenario, the router is quacking to
you, then you can know which packets are received much sooner than if the end host, like all the way in another country, is telling you when packets are lost.
So with these new signals, we think that we can get a lot more, a lot better sender behavior, because now it's aware of things that are going on on the path.
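As a toy illustration of those extra signals, with hypothetical ID sets: the sender can split its end-to-end loss into near-segment loss, before the quACKer, and inferred far-segment loss beyond it (ignoring, for simplicity, packets still in flight).

```python
# Toy illustration of localizing loss with a quACK (hypothetical sets).
sent      = {1, 2, 3, 4, 5, 6}   # IDs the sender transmitted
proxy_saw = {1, 2, 4, 5, 6}      # decoded from the quACK
e2e_acked = {1, 2, 4, 6}         # from the normal end-to-end ACKs

near_loss = sent - proxy_saw       # {3}: lost on the short lossy Wi-Fi hop
far_loss  = proxy_saw - e2e_acked  # {5}: lost somewhere past the proxy
print(near_loss, far_loss)
```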
Yeah, so then we thought about, okay, what are we actually changing in the behavior?
And so we think about end-to-end ACKs. What do protocols like QUIC and TCP do with end-to-end ACKs? And so we kind of have four things here. One thing a protocol does is it uses them to retransmit lost packets. The second thing is it uses these ACKs as signals of loss and congestion, and it uses this to update the congestion control. The third thing is it updates the flow-control window.
And the fourth thing is it discards packets, forgets them because they've been reliably
received and we no longer need to do anything with them. Okay, so for quacks, we ignore the fourth one,
just because for safety reasons,
just because a quack has received a packet,
it doesn't mean that your end-to-end connection
will have received a packet.
So we'll really focus on the first three.
Yeah, so for retransmission, what can you do?
Well, now you know the packets are there.
You know which packets are lost sooner.
So if your protocol kind of wants you to have low latency transmission, you can just retransmit
the packet immediately. And in the case of QUIC for retransmitting a packet, so something that's
a little different, if you're familiar with TCP retransmission, TCP will tell you that you lost,
say, packet seven, and then TCP will just retransmit this exact same packet again, packet seven. But with QUIC, instead what it does is it kind of references
the data that's in QUIC.
And instead of retransmitting the exact packet, it's going to rebundle all this data in a
new packet, possibly with other data that needs to be sent.
And then it's going to transfer this over in a completely new, randomly looking packet
back to the end host.
And so we wanted to follow the spirit of QUIC
instead of retransmitting the exact packet that was transmitted previously. We modify the client
on QUIC to rebundle its data and retransmit the data in that lost packet back to the end host.
So yeah, retransmission, it's not too complicated, because in the end it seems like a lot of protocols just want to retransmit their data. Although that's not true if you have some more lossy media protocol, this is kind of how we thought about retransmission.
Cool, nice. So I guess, given retransmission, how does it handle congestion, and the other signals for congestion control as well? Do you use the quacks?
And by the way, I love the fact that it's called the quackers and you're referring to it as quackers.
I think it's brilliant.
It's an awesome name.
Yeah.
So how do the quacks help on the congestion control side of things?
Yeah.
On the congestion control side of things, consider the train scenario.
Why really is your connection slow? And it's not just that you're not
retransmitting things fast enough. It's because your laptop thinks, because there's so much loss occurring, that it's in congestion collapse, basically. Because loss is occurring, it keeps sending slower and slower, but the same loss is still occurring, so it just keeps sending slower. And because it does this, your connection is really slow. The exponential backoff, it's like, oh, this is failing, okay, I'm going to try again in an hour, I'm going to try again in two hours, I'm going to try again in three hours. But obviously, yes. Okay, cool. Yeah. So it's really not a retransmission problem. It's a congestion control problem. And just to give a
little more background on TCP PEPs: the way that PEPs help here is that they don't just retransmit the packet sooner. A TCP PEP in this scenario could actually be
splitting the connection into two completely separate connections. And so it has like one connection for the short lossy path
and it's sending signals about loss very quickly. And then it has the other side of the connection, which it can tell is more reliable, and it's just keeping it more saturated over time. And so the way we integrate this into quacks, kind of at a high level, is that
quacks are now telling you where the loss
is occurring. So now you know that you have like a really short segment where all the loss is
occurring. And then you have a really long segment where actually you're not really losing that many
packets. And so that kind of helps you. You know, you still have to update your congestion control. You can't just retransmit the packet without doing anything to congestion, because
anytime you're just adding more data to the wire without kind of considering its impact
on the connections around you, that might not be fair.
And so our goal here was to mimic the behavior of a TCP PEP in how it's handling congestion.
And so we're using the signal
about where loss occurs to kind of think about our congestion window and sending rate of packets
to be just as fair as TCP PEPs, but also as fast as them. And yeah, I don't know if this is the right answer for congestion, but definitely the main takeaway I want is that, you know, we have more signals now that we could incorporate to have better, faster, but still fair congestion control in our networks.
Nice, yeah, it makes total sense. So how does Robin perform then? How good are the results? Tell us about the results and the experiments, I guess, that you ran to get them.
Yeah, yeah.
The main experiment that we ran is, so you know that train scenario? We didn't end up riding a train, unfortunately. But we did set up a router, or access point, in our computer science building and then attached
a cellular modem to it to emulate that like cellular connection.
And then I sat, you know, a few feet away from the router so that the Wi-Fi wasn't as good, and there were people in the office, so there was probably some interference as well. So we had a short, lossy network and a long, more reliable network. And so we tried uploading a large file with QUIC,
and then we were able to improve the speed of that upload by about 50%.
And then we also kind of implemented a simple media protocol to test whether we could have
better latency.
And so the exact metric that we measured is de-jitter buffer latency.
So if on the other end of your audio stream, you kind of have to wait for the packets to arrive so that you can play them in order.
And then the de-jitter delay is how long your packet has to sit in that queue before you can play it.
We were able to reduce it from 2.3 seconds to 204 milliseconds, which is about a 91% reduction.
It's a big reduction actually.
Awesome. Cool. So, yeah, I mean, we've heard about quack we've heard about robin and let's there's one last component in this in this
in your paper that you talk about and that's i don't know it's pa cubic or pa cubic i don't know
how to pronounce it but yeah cubic with a pa in front of it. So yeah, tell us what that means and what's it trying to solve and how it solves it.
Yeah, I guess I would call it PACUBIC.
You can say it however you want.
It's fine if it's funny.
PACUBIC sounds good.
Yeah, so I kind of talked about this.
This is kind of integrated with the second challenge
in that it's a new kind of congestion control algorithm
that integrates information about where loss is occurring on the path.
Nice.
Yeah.
And yeah, we don't know if it's the right answer for sure.
And there's a long appendix in the paper kind of going more into the theoretical side of it.
But I do think it's really exciting that we can have more signals that could inform behavior on the end host, including congestion
control. Yeah, I guess that's a nice segue then into the sort of the future directions for sidekick
protocols then. And yeah, where do you go next for this? Is it exploring the angle and sort of
seeing how you can combine all these different signals together to make even better, smarter,
fairer, quicker networks? Yeah, what happens next, Gina? I think some interesting things could be, you know, if you want to make it practical. And also, is the quack the right signal? Are there other signals from the middle of the network that we can incorporate into sidekick protocols, now that we have a quack, or kind of a more practical implementation of these ideas from straggler identification or set reconciliation?
Are there other networking settings in which this kind of information could be useful?
But really, what I am thinking about right now is, right now, sidekick protocols only send information from the proxy to the data sender. But if the bottleneck of that link was actually on the other side of the proxy, then sidekick protocols are not really helping. And yeah, so right now, sidekick protocols are awesome. They can do lots of things, but they can't do everything that PEPs can do. And so I want to make sidekick protocols as good as PEPs. And I think one of the things that possibly needs to be done is we need buffering, or a generic way to think about behavior at the proxy. Just the next steps, so that there's kind of no reason not to replace PEPs with sidekick protocols, if sidekick protocols can do everything that existing
PEPs can do. Yeah, nice. It definitely reduces the barriers to adoption, right? If it's a clear win on every different metric, however you want to view the comparison between the two, then it's going to definitely help adoption and have a bigger impact, I guess. So yeah, I don't know if this is a valid question, Gina, so excuse me if it doesn't make any sense.
But can you have a quack between quackers?
Because at the minute you've kind of got this sort of end host
and then like in your train scenario, it's the router on the train, right?
Or whatever, it's there.
Can you have quackers all the way along your path
who are then communicating with each other, and also then the proxy itself can leverage these signals as well, rather than just the end host? Or is that again breaking that model of end-to-end, I guess? Yeah, it wouldn't. Yeah, so imagine if you have an end host and then two quackers between you. So if these quackers are quacking to you, the data sender, you can learn not just whether loss is occurring on the near path segment or the far path segment, but on which of these three, or any number of, path segments. So, you know, one concept is that you have even more information about the network. Now, at some point, I guess there's probably diminishing returns on more information about the network this way. But
yeah, that's definitely a possibility that could merit more exploration.
Cool. Yeah, just quacking between quackers. I don't know, it just kind of came to me. Yeah, I'm going quackers. I don't know, there's a joke in there or something.
Quack-ception.
Quack-ception. There we go. Quackers all the way down. Cool. So we touched on impact there slightly
and sort of trying to ease the kind of the barriers to adoption and making this a clear
win over the current state of the art of how things are done. So I guess, yeah, what's your
take on this, and sort of the impact you think Sidekick can have, or the impacts you wish it to have, or maybe it has already had, going forward? Yeah, yeah. I know one of your goals of this podcast is to connect research to industry people, and I think this is where industry people could definitely have an impact. So sometimes me and my co-authors think about what it would take to make sidekick protocols real. And I think one of the things we need is interest from the parties that would be involved. And so one party involved is whoever sends a lot of data: anyone who develops your browsers that are used by, you know, just day-to-day users who are sending data. The other party involved is what I've
been calling, we've been calling the quacker, the proxies or the middle boxes. So I think,
you know, satellite companies could find this interesting if they find that all their users
have been switching to QUIC, and they're not able to give as good performance as before. Cellular companies who are deploying
TCP peps along their path and also finding similar behavior. Anyone who manages their own
internal network and kind of deploys performance enhancing proxies. Anyone, anywhere that TCP
PEPs exist now, and who are finding that the people who are using their network are using a lot more of the secure transport protocols. So we really need interest from these people involved before you start just bombarding people with signals about your network. So yeah, just getting the idea out there and people
thinking, you know, maybe this could actually be a solution to their problems. And then, yeah, kind of a third perspective is, you know, internet standards people.
And these are, you know, the awesome people that write these very detailed RFCs and, you
know, garner interest from other internet standards people and come together to like
create, you know, like this, like just wonderful, very detailed document so that, you know,
everyone's on the same page
and we can kind of be confident that it will work. And, you know, luckily one of our co-authors,
Michael Welzl, is very involved with the IETF. He's been bringing Sidekick, previously Sidecar,
into these discussions and kind of like thinking about how it would fit in with what people are
talking about in that community.
Yes, there's lots of things that we did not address in this paper that are needed to make sidekick protocols practical. And that's like, you know, discovery.
How can the two ends of the sidekick connection discover each other?
And then how can we do it securely?
What other security mechanisms do we use?
You know, maybe what other signals can we send? Just very practical things that need to be addressed before it will really happen. Yeah, yeah. There's a lot of stakeholders there, but it's a call to action, this part of the podcast. Hopefully someone somewhere listens to this and goes, oh, actually, maybe I should check these sidekick protocols out, and it can help you realize that goal of getting these out there. I mean, at the end of the day, as end users, as consumers, we all want faster, more reliable networks, right? So if this is going to get us there, it feels like a no-brainer to me. But I guess there's a lot of work needed to be done to make it practical. The early signals are definitely very positive, though, that this is a very viable approach. So yeah, awesome stuff. Let's change gears a little bit and talk about some surprises that you found while working on Sidekick. So I guess the question is, what's maybe the most interesting lesson that you've learned while working on these types of protocols, and in this space maybe?
I think prior to this project, I wouldn't have really considered myself a networking person. But you've been converted, you're a networking person now, Gina. Now I feel it. But I think working in systems or networking, these computer science fields have been around for relatively long now, at least for computer science. It's not
exactly LLMs, but there's a lot of history to these fields, especially in systems. And yeah,
even though I have vague memories about dial-up internet from back in the day, I definitely grew
up in an age that was way past that. And so I think working on this project, and then now that
the project is out, and I meet a lot of people who are interested in this project, it's really
awesome to have a different perspective on the internet and the network from people like my
advisors and collaborators, and just other people out there who have lived through the development of TCP and the frustrations of PEPs, and then reflections from the inventors of QUIC, or just about the early internet and what it has evolved into today. People have very strong feelings about, you know, even the definition of middle boxes. People would probably feel very strongly about the way that I'm using that word, middle boxes. And it's so interesting that, because people
grew up in this different time that there's like all these emotions attached to how this field has
changed over time. Yeah, essentially. I mean, I'm probably going to date myself a little bit here, but I remember dial-up internet. The first computer we had in our house was dial-up, and obviously there's the noise that goes with it, right? And if someone else was on the phone when I was on the computer, then the connection would drop. So I'd be like, Mum, get off the phone, I'm trying to play, I don't know what stupid video game I was playing at the time. But yeah, it's really interesting to get people's take on it, when you've lived through the whole development of these things. A lot of people just, I mean, come into the world today, right, and the internet just exists. But there was a time when it didn't, which is kind of hard to get your head around, I guess. But yeah, cool. Let's talk about the origin, because
you mentioned it earlier on when we were chatting, the origin story, and that we might get into it a little later on. So maybe now it's time to do that. So yeah, can you tell us the origin story for the paper, some backstory? How did you arrive at the sidekick protocols? Yeah, I think the origin
story starts from before this project was even in my head.
I was working on something kind of very different. I wanted to be able to audit the logs of
a router that may be trying to hide its malicious traffic. And so I was working on this with David Mazieres and Matei at the time. And we kind of distilled it into
this mathematical problem, which is this set reconciliation problem that I mentioned before.
You have a set S and a set R, and how do you communicate the least amount of information to
find the subset? But actually, at the time, I didn't know anything about set reconciliation. I was kind of just deriving it and forming it as a mathematical problem, with some suggestions from Dan Boneh, another professor in the department, on how to do this. And the three of us, we were more like systems people. And we're kind of
just like, yeah, just trying to figure it out. And then one day I was like, I had this like piece of
paper on my desk and I had this like set of polynomial equations. And I'm like, how do I
solve this? I have no idea. You know, Google isn't really helping me. And then just another
grad student, Matthew, one of the co-authors on this paper, wanders in and is like, what's up?
And I'm just like, I have this hard math problem.
And then he likes math.
So then he takes this problem.
He goes and talks to his friend down the hall, David, that's David Zhang, who ends up being another co-author on this paper.
And then he also likes math.
And then they take it to some math mixer and they talk to some math PhD students
and they come back like, we have an answer to this.
And yeah, I'll keep rambling a little bit.
But, you know, the end conclusion is going to be this touched a lot of people to get to where it is, even if it's just like a sentence in the hallway.
And then, yeah, Matthew and I in particular worked a lot on like the implementation of yeah once we figured out this like algebra we worked
on like you know making it work with this router that we had and like real packets and like in a
real networking setting and yeah we we like scrapped up some paper that was rejected about
this like malicious router stuff yeah um and then in like the very last minute we kind of discovered
this whole body of like
related theoretical work that is solving this problem that we thought we had just like solved
so now we're like you know i don't know if i can swear on this podcast yeah go for it
yeah then we were like shit and then we like talked then we went downstairs to the first floor to the theory department.
We talked to Mary Wootters, who, you know, does a lot of research in coding theory.
And it's just like, yeah, this just makes sense to her. She just knows this stuff. But of course, her perspective is from the very theoretical side. And so we felt like, oh my gosh, this has already been done. But also, we felt like there's still something there, you know, in bringing together these two worlds, networking and theory. To each side, their half of the problem is so obvious, but you really need to connect them in order to get the solution. But now, yeah, now we realize that there is stuff in coding theory that exists. And so this is how the math of the quack, the idea,
came about. Now, on a very separate branch: Keith Winstein and Michael Welzl, both authors on this paper. Keith is a professor of computer science at Stanford, and Michael Welzl is a professor at the University of Oslo, and he's also, as I mentioned, very active in the IETF. So I think Michael is very much more in the IETF community, versus Keith is in this networking research community. And even though they both work on networking, they're actually also quite disjoint. But anyway, they ended up talking somewhere. And then they came up with
this idea of what was previously called the sidecar protocol, where you send information
about the network, instead of manipulating the data. And so they had this idea of a sidecar
protocol, but they were always stuck on, how can we send the right information? We're either always sending too much information, or it'll take too long to figure out this information. And so they were kind of just sitting on this idea.
And so I don't know.
I think one day I think I was just talking to Keith because we just passed by each other or something or someone told us to talk.
And then he's just like, yeah, I have this like terrible problem.
And I'm like, I think I have a solution.
He's like, no, you don't understand. It's such a hard problem. And I'm like, but I think it works. And he's like, I don't think it'll work, but whatever. And then I'm like, I think it works. And so, honestly, I think we had to really implement it and bring it to him before he was like, oh yeah. I think I needed empirical evidence to prove it to him, like, okay, now I believe it. Yeah, yeah. So yeah, long story, but I think it's
just amazing that if you connect the right people at the right time, then something wonderful can come out of it. And it took a lot of chance. I think, with the people from theory, from networking, from systems, once we combined it all, then we had this really cool paper. And I didn't talk about the last co-author, David Mazières. He really helped, I think, to zoom out in this project and help us think about sidekick protocols as more than just helping QUIC, but, you know, really helping any transport protocol, and just zooming out and thinking about how this can really have the most impact. Yeah. That's a lovely story to
see how, I mean, there are huge elements of serendipity to that. Like, yeah, you bumped into this person in the hallway, and then a colleague came in one day and said,
what are you working on?
And then took it to some math mixer
and then they figured it out.
And that's our whole journey.
It's lovely to see it.
And it's like, it's a really nice sort of illustration
of how I think research should work in a way
in that you're in that environment
surrounded by people from all sorts of areas,
the fact that you can just pop downstairs
and be speaking to someone who's an expert on something,
whose expertise is kind of tangentially related, but then you put those two people together and you can create this
really cool paper out of it. And it's great to see that it's a culmination of numerous individuals as well, all bringing their own strengths to the table to produce a really awesome piece of work. So no, that's a really nice origin story, Gina. I guess this kind of leads into my next question. As long-time listeners will know, this is my favorite question, about the creative process: how you operate, how you approach the act of creating really cool ideas, generating ideas and then selecting which ones to work on. So can you maybe elaborate on how you do that? I was thinking about this
and how I would describe myself. I think I've heard many times it's important to pick
impactful problems, but I actually just like interesting solutions. And interesting doesn't mean complex.
And I guess it can be very subjective.
So what's interesting to me?
So I feel like maybe one way to solve a problem is you stare at the problem for a very long time.
And you kind of branch out from there, about what others have done, or what might be a natural first solution. And yeah, this is important to explore, of course. But I think the interesting stuff, to me, comes from if you can develop knowledge bases in more eclectic areas, or different areas, at least enough to draw connections between those.
Then you get the kind of unexpected connections, when someone goes, oh, that's interesting, because it's just not something that you expect. And that little sense of astonishment is kind of what makes things interesting, I feel. But it's always important
to go deep at some point. And so possibly, you know, after those connections have been made.
But this is like, I don't know if this is a research philosophy or life philosophy. I think
it's fun to go broad and to always be learning because you never know when something that you
learn is going to be used in your life
And yeah, that's it. I don't really have a formula. I feel kind of chaotic sometimes, just pursuing the thing that's right in the moment. And then, hopefully, as long as you're always pursuing something, at some point it'll circle back and it'll all make sense.
No, I love that, how it's not necessarily a research philosophy but kind of a philosophy for a way to approach life in general, right? I think being creative and that are kind of one and the same, almost, these two sorts of things. But I love that, just trying new things all the time, because you never know whether you're going to encounter something you really enjoy doing, or have this great idea.
And you only do that by exposing yourself to new areas and new things.
Right.
So, yeah, no, that's a lovely, lovely answer to that question.
Another one for me to add to my collection.
Cool.
So Gina, we've come to time for the last word now.
So what's the one thing you want the listener to take away from this podcast episode today? But in general, I think the takeaway is, if you want to learn about sidekick protocols, you could probably also just watch my conference talk. But I hope from this podcast, or in general,
listeners have learned something interesting in the origin of the story, or the people behind the story, and in just learning about the background behind all of these wonderful research ideas and papers that are out there.
That's great.
Great message to end on.
Thank you so much, Gina.
It's been a fascinating chat today.
I've really enjoyed it.
I'm sure the listener will have as well.
Where can we find you on social media or anything? Or where can listeners connect with you to keep up to date with all the cool research you're doing?
I'm not super active, but I have a website
and my email is on that website.
And I always love talking to people.
Cool. Awesome.
So we'll put links to everything we've mentioned today in the show notes as well.
So you can go and check that out
and learn more about sidekick protocols.
And, as I say, all the cool research Gina's done and will continue to do over her career, I'm sure.
And yeah, we'll see you all next time
for some more awesome computer science research.