Computer Architecture Podcast - Ep 5: Datacenter Architectures and Cloud Microservices with Dr. Christina Delimitrou, Cornell University
Episode Date: June 5, 2021
Dr. Christina Delimitrou is an assistant professor in the Electrical and Computer Engineering Department at Cornell University. Prof. Delimitrou has made significant contributions to improving resource efficiency of large-scale datacenters, QoS-aware scheduling and resource management techniques, performance debugging, and cloud security. She received the 2020 IEEE TCCA Young Architect Award for leading research in ML-driven management and design of cloud systems. She talks to us about datacenter architectures, cloud microservices, and applying machine learning techniques to optimizing and managing these systems.
Transcript
Hi, and welcome to the Computer Architecture Podcast,
a show that brings you closer to cutting-edge work in computer architecture
and the remarkable people behind it.
We are your hosts. I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Today we have with us Professor Christina Delimitrou,
who is an assistant professor in the Electrical and Computer Engineering Department
at Cornell University.
Christina has made significant contributions to improving resource efficiency of large-scale data centers, QoS-aware scheduling and resource
management techniques, performance debugging, and cloud security. She received the 2020 IEEE
TCCA Young Architect Award for leading research in ML-driven management and design of cloud systems.
Prior to Cornell, she earned a PhD in electrical engineering from Stanford University.
Today, she's here to talk to us about data center and cloud architectures and applying
machine learning techniques to optimizing and managing these systems.
A quick disclaimer that all views shared on this show are the opinions of individuals
and do not reflect the views of the organizations they work for.
Christina, welcome to the podcast. We're so happy to have you here with us today.
Well, let's kick this off with a nice simple question. These days, what gets you up in the morning?
A few things. Non-work related, getting the vaccine and going back to normal life. Work related, like you mentioned, there's a few problems
that we're working on, both on what's the right way to build servers for modern data centers and
also what's the right way to build the applications that go on top of the servers and specifically
what is the role that machine learning can play in improving the design and management of these large-scale systems.
Right.
So what are the major trends that are changing the landscape of these data center architectures
and cloud computing more broadly?
Yeah, that's a good question.
So if you look at cloud computing five or 10 years ago, its key competitive advantage
was that it was using commodity equipment.
That was the main difference between cloud computing and HPC, high-performance computing.
And that is how it got economies of scale
and improved cost efficiency.
So even back then,
it's not that it had a single type of server.
There were still different generations of servers,
different server configurations,
but it was a much more homogeneous picture.
If you look at what cloud computing systems
look like today,
there's a lot more heterogeneity,
whether that is through reconfigurable acceleration fabrics, whether it's through special purpose
accelerators.
And there's a lot of good reasons why people are switching to accelerators.
They have performance, power, in many cases, cost benefits, but they also introduce complexity
in system management.
And then on the software side, you have a similar picture, which is that traditionally
cloud applications were built as what we call monoliths.
So a monolith is an application that includes the entire functionality as a single service and then is deployed as a single binary.
And if at any point in time it needs more resources, you scale out multiple copies of that application across multiple machines.
And as long as the application remains small in scale and in complexity, the monolithic
design approach is fine. The problem is that when the application increases in scale and complexity,
as with every other area of systems research, you want modularity. So that is the primary reason why
you see programming models like microservices and like serverless compute popping up.
And again, there's many good reasons why people are using these programming models.
We can talk about them in more detail,
but they do introduce more complexity
on the software side as well.
And it's for those two reasons
why a lot of the work that we do
is around automating the system design and management
and trying to abstract away that complexity.
So neither the end user nor the cloud operator
has to deal with it on a day-to-day basis.
So let's expand a little bit on microservices.
Could you tell our audience a little bit about how these microservices are different from
monolithic cloud computing applications?
Are there any unique challenges that these essentially bring up that makes them harder
to manage?
Sure.
So again, a monolithic design is what you would call a
single application. So you have a single code base, compiles down to a single binary, and is
deployed as a single application. You can have slightly more complex versions of that, where you
have a front end, that would be the web server, and then you have a back end, which would be the
database. And then the in-between, the mid-tier, is where
all the logic of the application is implemented. So that's not entirely a monolithic application,
but as far as microservices are concerned, it would still qualify as a monolithic design.
Microservices take modularity to an extremely fine granularity. So you still have the front-end web
server, you still have the back-end databases,
but the middle tier, which would be one service, can now be hundreds of unique services,
and then each of them scaled out to multiple replicas. So the advantages of this programming
model is that modularity makes the application easier to understand. So if you are in a big
company and
you are developing an application, you don't need to be familiar with the entire code base. You need
to be familiar with the microservice that you're responsible for, and then the API to interact with
all the other microservices. And usually these are standardized APIs, either through RPCs more
commonly or HTTP requests in some cases. The other advantage is that it ties nicely
with this idea of a containerized data center
where each service is deployed in its own container.
And then if it needs more resources,
you just scale up or scale out that service.
You don't have to scale out the entire deployment.
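To make that concrete, here is a minimal sketch of what one such service could look like, using plain HTTP from the Python standard library. This is only an illustration of the idea of a narrow service behind a standardized API that can be scaled on its own; the service name, route, and port are made up, and as mentioned above, real deployments more commonly use an RPC framework than raw HTTP.

```python
# A minimal, hypothetical "user" microservice: one narrow piece of functionality
# behind a standardized API. Other tiers only see this interface.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class UserServiceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/user/"):
            body = json.dumps({"user_id": self.path.split("/")[-1], "status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Scaling this service means launching more copies (containers) of just this
    # process behind a load balancer, not redeploying the whole application.
    HTTPServer(("0.0.0.0", 8081), UserServiceHandler).serve_forever()
```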
It also helps in the sense of software heterogeneity.
So you're not tied to a single language
that the entire application is written in.
If, let's say, the front end would benefit from a higher level language, you can do that.
If other tiers would benefit from lower-level languages that optimize for performance, you can have that.
It's a longer discussion.
How many languages do you want to have in your system?
Because that adds complexity as well.
But it gives you the possibility of having that language heterogeneity. Now, on the cons side, it doesn't come without its challenges.
So first of all, servers today are not designed for microservices. They are designed for
applications that have performance requirements at least in the millisecond, if not multiple
millisecond granularity. If you have a service implemented
as microservices, then the end-to-end performance target is milliseconds, but then you might have
100 microservices on the critical path. So the target for each of them would be in the microsecond
granularity. So you need much more predictable performance, much more low latency operation,
even from the hardware. And then there's all the overhead that the software layers add. There are also challenges in the sense of dependencies
between microservices, because even though they are loosely coupled, they're not independent from
each other. So the problem this can introduce is that if one microservice becomes a bottleneck, it can create back pressure on other components of the system, and that can propagate across
the system and even become worse and worse.
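As a toy illustration of that back-pressure effect (not a model of any real system; the tier count, buffer sizes, and rates below are invented), here is a small discrete-time simulation in which one slow tier fills its buffer and causes queues to build up in the tiers upstream of it:

```python
# Toy simulation of back-pressure in a chain of microservices: each tier has a
# finite buffer, so a slow tier fills up and blocks the tiers in front of it.

NUM_TIERS = 4
BUFFER = 50                   # max requests a tier can hold
CAPACITY = [10, 10, 2, 10]    # requests each tier can process per tick; tier 2 is the bottleneck
ARRIVALS = 8                  # new client requests entering tier 0 per tick

queues = [0] * NUM_TIERS

for tick in range(1, 101):
    # new requests enter the front tier (excess is dropped once its buffer is full)
    queues[0] = min(queues[0] + ARRIVALS, BUFFER)

    # process from the back of the chain forward so freed space propagates
    for i in range(NUM_TIERS - 1, -1, -1):
        ready = min(queues[i], CAPACITY[i])
        if i < NUM_TIERS - 1:
            # can only forward as many requests as the next tier has room for
            ready = min(ready, BUFFER - queues[i + 1])
            queues[i + 1] += ready
        queues[i] -= ready

    if tick % 20 == 0:
        print(f"tick {tick:3d}  queue depths: {queues}")
```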
So that is difficult to diagnose because you don't always know what was the root cause
of the performance issue, and that motivated a lot of the work that we did on performance
debugging.
And it can also take a long time to recover from because essentially you have to go tier by tier and correct the resource allocation until the entire performance of the service recovers.
So that's, at a high level, why people are switching to this programming model.
So, you know, one of the potential aspects of a monolithic system is that you can sort of understand it end to end, because it's one monolith.
And now in order to sort of distribute complexity into modules, at some point, you still want a
sort of monolithic picture, at least, of what is happening. So you don't have sort of like cascading,
you know, effects
rippling throughout where nobody understands where they start or where they're going.
So are there aspects of this where you still need something that's kind of monolithic that's
looking at the whole thing? Or is that more that someone needs to be on the side pulling telemetry, and then having a monolithic processing sort of program on the other end that's pulling things from multiple aspects?
But somebody still needs to understand the whole thing in order to know what they're looking at.
Is that one of the complexities you're talking about?
Right, exactly.
So you definitely need tracing.
And most systems, at least that we're familiar with, have some end-to-end tracing where they track once a request arrives in the system,
what is the latency breakdown? So what machines does it traverse? What services does it traverse until it goes back to the user? So you definitely need some global visibility into the system.
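For a small illustration of that kind of RPC-level trace (the span layout and service names here are hypothetical, not any particular tracing system's format), the latency breakdown can be computed by charging each service its exclusive time, that is, its span duration minus the time spent in its child spans:

```python
# Toy end-to-end trace: each span records which service handled part of a
# request, when it started, how long it took, and which span called it.
spans = [
    {"id": 1, "parent": None, "service": "frontend",     "start_us": 0,    "dur_us": 12000},
    {"id": 2, "parent": 1,    "service": "mid-tier",     "start_us": 500,  "dur_us": 10500},
    {"id": 3, "parent": 2,    "service": "user-db",      "start_us": 1000, "dur_us": 4000},
    {"id": 4, "parent": 2,    "service": "post-storage", "start_us": 5500, "dur_us": 4500},
]

# exclusive time = a span's duration minus the time spent in its children
children_time = {}
for s in spans:
    if s["parent"] is not None:
        children_time[s["parent"]] = children_time.get(s["parent"], 0) + s["dur_us"]

total = spans[0]["dur_us"]
for s in spans:
    exclusive = s["dur_us"] - children_time.get(s["id"], 0)
    print(f'{s["service"]:13s} {exclusive:6d} us  ({100 * exclusive / total:4.1f}% of end-to-end)')
```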
The problem is that if you task a person with having that global visibility,
you still revert back to the very complicated monolithic application where somebody has to
understand the entire code base. So you need to understand the topology,
but you don't need to understand the details
of each individual microservice.
Right, so this seems like something ripe
for where all those inputs may go
into some sort of ML system
that can sort of help you interpret what is happening.
So would you say your focus is more
on the software side of the story and managing that complexity or on the hardware side of the story and managing that complexity?
Because you kind of discussed both.
Right. So I am looking at both, because if you look at where the performance unpredictability comes from and where the resource inefficiency comes from, both hardware and software are responsible for that. So on the hardware side, what I'm looking at is what's the right way to build
servers for these new programming models that can offer more predictable performance, more low
latency operation, whether accelerators can play a role in that. And if they can, what is the right
type of accelerator? So one example is that because microservices talk over the network
to each other, they spend a large fraction of their overall latency processing network requests.
So if you look at a breakdown of latencies, more than 50% in some cases is just processing network
requests. So that's clearly very inefficient. You're not doing useful work. You're just
receiving RPC packets and then sending RPC packets. So given this observation,
what we are looking at is
what is the right acceleration fabric
for networking in microservices?
And we've built this system based on an FPGA
that offloads the entire networking stack,
TCP included and RPC framework included
on an FPGA that's very tightly coupled to the main CPU.
The reason why you want it to be very tightly coupled is to have very efficient data transfer
between the host CPU and the FPGA.
And the reason why we're using an FPGA is to have a reconfigurable fabric that can adjust
to the needs of different microservices because you won't only accommodate one service.
You might have tens, if not hundreds, of microservices that have very
different network traffic requirements and network traffic characteristics. So that is kind of a
first step in the acceleration that you can do. There are other system tasks, things like garbage
collection, things like remote memory access, encryption, machine learning models, that could
also be accommodated in reconfigurable accelerators
like that, which brings up questions on virtualizing the FPGAs, allowing resource isolation,
resource partitioning on the FPGAs. That's more on the hardware side. On the software side, it's more
questions of how do you manage resources for this application? So one example is what is a cluster manager
that can take into account the dependencies
between different microservices
in a way that guarantees the end-to-end quality of service?
Or the performance debugging that we were talking about before,
which is if something goes wrong in the system,
how do you figure out what the root cause of the problem was?
And how do you also correct it so that it doesn't happen again in the future? What's important with all this work is whether you can get some insight out of it. You can design a machine learning system that has high accuracy, but a problem with machine learning in systems is that it's difficult to get explainable outputs from these machine learning algorithms.
So get something where there is an insight that can help you design the system better,
not just get better performance right now.
So that is a general issue with using machine learning in systems.
We have been making some progress with explainable machine learning techniques.
So one thing that we've been looking at is
for performance debugging specifically,
can you use the output of the performance debugging system
to correct design bugs in the application?
And right now, this is a limited set of design bugs.
So it can be things like blocking connections,
maybe shared data structures that create bottlenecks,
maybe cyclic dependencies between microservices,
things that even though the applications were designed by people
that have a lot of expertise in cloud applications,
there are still bugs that are difficult to find.
But if you use the machine learning system,
it can help you pin down where the problem might be,
and then it still needs some human intervention
to actually fix the problem.
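As one concrete example of the kind of design bug mentioned here, a cyclic dependency between microservices can be checked for directly once you have the RPC dependency graph. A minimal sketch, with hypothetical service names:

```python
# Detect a cyclic dependency in a microservice call graph with depth-first search.
deps = {
    "frontend": ["user", "post"],
    "user":     ["user-db"],
    "post":     ["media", "user"],
    "media":    ["post"],        # post -> media -> post forms a cycle
    "user-db":  [],
}

def find_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:                      # back edge: cycle found
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None

print(find_cycle(deps))   # e.g. ['post', 'media', 'post']
```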
That sounds really exciting.
Let's dig a little deeper into the machine learning system that you mentioned, especially
one that's able to pinpoint where there are sources of bugs or performance bottlenecks
in the system.
So what does this machine learning system look like?
Is it different from the machine learning systems that we see for other kinds of tasks?
Is it your vanilla CNNs and LSTMs,
or do you have some other kinds of structures in your machine learning model that aid this
particular task? Yeah, so we built two systems for performance debugging. I can tell you what
the first one was. So the first one was using techniques that you would find in other systems.
It was a hybrid network that had a CNN followed by an LSTM. So that was Seer.
And the goal of it was to identify patterns in space and in time that if you don't do
anything, if you don't take any action, will turn into a quality of service violation in
the near future.
And the reason why we were looking at the near future is so that before the problem
occurs, you can take an action and essentially avoid it.
Because if it happens and you don't notice it,
then it takes a long time to recover.
So, Seer relied on tracing that was both distributed and per-machine.
So the distributed tracing is your typical RPC level tracing
which collects the latency breakdown of an application
from the beginning to the end.
So what are the services it traverses? What is the latency for each service? And then the second
level of tracing was per-machine utilization metrics, or performance counters, if that is
something that you have access to. So these are the traces that would stream into the model and
the output would be the probability that a microservice would be the root cause in the near future. Now, this is a supervised learning technique, which means that
to train it, you have to give it some annotated traces. Annotated traces with root causes that
you know are correct. Now, how do you know when a root cause is correct? Only if you've caused it.
So the way we did this was by injecting sources
of unpredictable performance
through some contentious applications.
And that allowed us to know
where the quality of service violation started from
and annotate the trace correctly.
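For a sense of the model shape being described, here is a rough sketch in PyTorch. This is not the actual Seer implementation: the number of services, features per service, window length, and layer sizes are all placeholder assumptions. The CNN looks for spatial patterns across services at each timestep, the LSTM tracks how they evolve across the trace window, and the head produces a per-microservice probability of being a future root cause; training would fit this against the annotated traces just described, for example with a binary cross-entropy loss.

```python
import torch
import torch.nn as nn

NUM_SERVICES = 100   # microservices being traced (placeholder)
NUM_FEATURES = 8     # e.g. latency percentiles, queue depth, utilization (placeholder)
WINDOW = 32          # timesteps of tracing fed to the model (placeholder)

class RootCausePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        # spatial patterns across services at each timestep
        self.cnn = nn.Sequential(
            nn.Conv1d(NUM_FEATURES, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # temporal patterns across the trace window
        self.lstm = nn.LSTM(input_size=32 * NUM_SERVICES, hidden_size=256, batch_first=True)
        self.head = nn.Linear(256, NUM_SERVICES)

    def forward(self, x):
        # x: (batch, time, features, services)
        b, t, f, s = x.shape
        z = self.cnn(x.reshape(b * t, f, s))          # (batch*time, 32, services)
        z = z.reshape(b, t, -1)                       # (batch, time, 32*services)
        out, _ = self.lstm(z)
        return torch.sigmoid(self.head(out[:, -1]))   # per-service root-cause probability

model = RootCausePredictor()
traces = torch.randn(4, WINDOW, NUM_FEATURES, NUM_SERVICES)  # a random batch of trace windows
print(model(traces).shape)                                   # torch.Size([4, 100])
```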
Now that ended up having very high accuracy.
The disadvantage is that when you're in a production system, you can't really go and start hurting the performance of the application when it's live, because obviously the user experience will be degraded. So to address that, and also some other issues with the requirements that Seer had, which were very frequent tracing and quite a bit of instrumentation in the kernel, things that are not
easy to do in a production environment where you might have third-party applications that you can't necessarily instrument.
So the follow-up to that was the Sage system that you mentioned in the beginning, which
is, again, a performance debugging system.
It, again, relies on machine learning, but it's entirely unsupervised.
So its goal is not to improve the accuracy over SEER.
It's to improve the practicality and scalability.
And that system relies on two techniques. The one is building the graph topology of the different
microservices. So essentially it builds this causal Bayesian network, which gives you the
dependencies between microservices in the end-to-end application. And then the second
technique that it uses is what is called counterfactuals.
Counterfactuals are these hypothetical scenarios of what would happen to the end-to-end application
if I were to tweak something in one of the existing microservices. So if, for example,
I see that I'm experiencing poor performance, if I assume that the performance of one microservice
was normal, does that fix the end-to-end problem? So that is how Sage works.
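To make the counterfactual idea concrete, here is a deliberately tiny stand-in. Sage actually builds a causal Bayesian network over the microservice graph; the serial three-tier topology, latencies, and latency target below are invented purely to illustrate the question "if this service had behaved normally, would the end-to-end latency have met its target?"

```python
# Toy counterfactual root-cause check over a made-up three-tier chain.
import copy

deps = {"frontend": ["mid-tier"], "mid-tier": ["db"], "db": []}    # downstream dependencies
observed_ms = {"frontend": 2.0, "mid-tier": 9.0, "db": 1.5}        # per-service time, this request
normal_ms   = {"frontend": 2.0, "mid-tier": 1.0, "db": 1.5}        # typical per-service time
TARGET_MS = 6.0

def end_to_end(lat, node="frontend"):
    # serial composition: a node's own contribution plus its dependencies'
    return lat[node] + sum(end_to_end(lat, d) for d in deps[node])

print("observed end-to-end:", end_to_end(observed_ms), "ms")

for svc in observed_ms:
    counterfactual = copy.copy(observed_ms)
    counterfactual[svc] = normal_ms[svc]          # "what if this service were normal?"
    fixed = end_to_end(counterfactual) <= TARGET_MS
    print(f"normalizing {svc:9s} -> {end_to_end(counterfactual):4.1f} ms, meets target: {fixed}")
```

Running it flags the mid-tier: it is the only service whose hypothetical return to normal behavior brings the end-to-end latency back under the target.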
The accuracy that it achieves is pretty similar to SEER,
which was good.
Actually, it was better than we expected
because usually supervised learning works better.
But it's much more practical to deploy at scale
because it doesn't need as much instrumentation at the kernel level.
Even at the application level,
all you need is the
end-to-end RPC level tracing.
Oh, that's really cool. So you're able to deploy the latter and
just have it kind of determine things on its own purely by essentially noticing phenomena and then
conjecturing about what would happen. Interesting. And so what sorts of systems are you actually deploying
and testing these ML systems on?
Yeah, so Sage is in collaboration with Google.
We have tested it in Cornell's clusters
and also Google Compute Engine
for a more large-scale experiment.
And this is using some applications that we developed.
So this is a benchmark suite called DeathStarBench, which people usually ask me why it's called that. It's called that because the dependency graphs between microservices are called death star graphs; it's these bubbles that show all the edges between microservices. The suite has a social network, it has a movie reviewing and browsing site, it has an e-commerce site, a banking system, and then a couple of applications that are more related to IoT, so swarms of edge devices like
drones. So both systems we evaluated with those applications and then both on smaller clusters
that are fully dedicated and controlled, and also public cloud clusters.
So do you foresee, this is kind of a joke question, do you foresee something like DeathStarBench replacing, say, SPEC?
It wouldn't replace it. It would be nice if it complemented it for cloud
applications. This is always a problem with cloud research in academia, right? You don't have
realistic applications. You'll never get access to the internal applications
that Twitter or Netflix have, which I mention because those are two companies that were among the first to use microservices. The idea with DeathStarBench was to build something that uses realistic components.
So we use individual microservices that you would find in a real system.
But of course, the functionality is simpler than what you would have on Twitter or Netflix.
It is extensible, though, and also open source.
So if anyone that's hearing this is interested
in this research, feel free to give it a try.
We welcome feedback and contributions.
Very cool.
And so is there, earlier you were talking about
how these sorts of microservices also have sort of different demands on the kind of hardware that they run on.
Does this particular benchmark have a potential dual purpose where, on the one hand, you can use it to sort of train this ML system to figure out what's happening and be able to understand what's happening within this sort of faux microservices deployment,
but at the same time use it to consider
what kind of hardware changes would need to be made?
Yeah, that's a good point.
Absolutely, yeah, you can do that.
The caveat there is making sure
that the logic tiers are representative.
So like I was saying before,
this is not a production application.
So the logic is simpler than what you would find
in a system like Twitter or Netflix. But you can
add any logic that you want on top of it to make it more
sophisticated or even simpler. The front end, which are the web
servers and the backend databases, both in memory and
persistent, those are production class. So those are systems that
you would find in the production system today. And in fact, these
are also the applications that we're using for the acceleration work. So for the network accelerator, those are the applications
that we use to quantify how much time we spend doing network processing and also the performance
benefits once you offload the network stack to the FPGA.
I see. So, okay. So let me make sure
I understand. So what you were saying is before when you were using ML to sort of trace potential dependencies between microservices and figure out what the problem was,
the key there is understanding the relationship between the microservices and therefore the logic inside of them can be very simple because you're just looking at the connections between them.
But when you're talking about running it on hardware, the fact that you've kind of like hollowed out what's happening inside of the microservices for the sake of looking at communication now means that it's not necessarily
the kind of thing you want to understand when you're running on top of the server, because
you have these essentially empty shells that are just communicating. And so in order to validly
evaluate what would happen in hardware, you would want to actually fill them out
with real semi-complicated logic of what microservices
would be trying to accomplish.
Is that right?
So you can add more complexity to each individual microservice.
They do implement the logic that they're responsible for.
So they do implement a social network where you can add posts, communicate with others,
send direct messages, reply, get recommendations, get advertisements, all these things that
you would find in a typical social network.
But of course, it's not production class code, so you don't have all the complexity
that you would find in a social network. So if the hardware study that you're doing depends on
having all the complexity, then you might get different results. But they are simple microservices
that you can start from, and you can add more sophistication in individual components if the
study that you're looking at requires more complexity.
I wanted to circle back to the ML aspect of things that you've been working on.
So you talked about how these microservices and cloud-based systems,
there's a lot of complexity between these different things.
And when you're trying to apply ML to these,
normally for a different task,
like for a natural language processing task or something, machine learning model developers try to build in some inductive bias. Is there some semantic understanding of the system, or of the way the system is architected, that needs to be induced into the ML model?
Or is that broadly not a concern right now, but it would be good if you could induce some of those things inside?
Have you had conversations about these with ML researchers as well?
Yeah, that's a good question.
It depends what is the goal of the ML system.
So, of course, if the system has some semantic knowledge of what the application is
doing, that would absolutely be useful. The ML techniques that we've been using so far,
they are relatively general purpose, so you would find them in other domains. They do take into
account the topology. So even if they don't take into account the logic, that is, the functionality of the application, they do take into account
the end-to-end graph topology.
And that gets you most of the way there.
I think if, for example, the scope of the ML technique is to find design bugs or security
threats, things that are not necessarily related to resource provisioning questions, then having
some semantic knowledge embedded in the model
would be very useful. For the use cases that we've looked at so far, we didn't see that a lot of
accuracy was lost by not having the semantic knowledge, but as we expand to the correctness
debugging as well, I think we'd have to somehow expose that. Are there any particular idiosyncrasies
about designing ML for systems-related tasks? So
you touched upon a few of them. I think in your papers, you have talked about training data and
whether you have access to training data or if you can even generate training data and so on.
Are there any other kinds of idiosyncrasies in systems design that make it uniquely challenging
or require a different mindset compared to vanilla ML tasks?
Yeah. So if you look at many ML papers, the primary goal is improving accuracy. And that's
not always the case with ML in systems. Sometimes it is perfectly fine to drop some accuracy on the
table as long as you can keep the inference time low, because there's a system that works online.
These are applications that
have very strict latency requirements, in fact, tail latency requirements, which is even more
challenging. You cannot have a performance debugging system or a cluster scheduler that
takes minutes to decide where to place the task or how many resources to allocate to it,
or even correct the system after it had some performance issue. So it's more about making
the system very interactive and very latency
sensitive, less about getting the absolutely optimal decision.
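A trivial sketch of that mindset (the model, input size, and the 5 ms budget below are arbitrary placeholders, not numbers from any real deployment): before an online model goes anywhere near the serving path, check its decision latency against the budget, since a model that is slightly more accurate but too slow to act on is not useful here.

```python
# Check whether a candidate online model fits a decision-latency budget.
import time
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 100))
BUDGET_MS = 5.0

x = torch.randn(1, 256)
model(x)                                   # warm-up
start = time.perf_counter()
for _ in range(100):
    with torch.no_grad():
        model(x)
elapsed_ms = (time.perf_counter() - start) * 1000 / 100
print(f"avg inference latency: {elapsed_ms:.2f} ms, within budget: {elapsed_ms <= BUDGET_MS}")
```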
Do you find that sometimes with adding ML into the decision-making
processes of managing microservices and such, that the ML itself injects variability that you
don't want to accept? Meaning that, given that it could be slightly non-deterministic depending on what the inputs are, it can be a sensitive system where your input is a little bit different, so your output is different. But really the inputs shouldn't be that different, because they're a manifestation of the same thing; it's just that this one is 1.1 and this one is 1.12, or something like that. But the result is different, and if the result is different, now you have variability in the after-effects of a decision where you didn't want to see any. Do you ever see problems like that manifest, where you've injected some amount of non-determinism into the decision-making process?
Yeah, absolutely. That can happen. So that's a
question of how do you design the technique, whether you fed it with the right training data
and whether the technique has some explainability. That's why I was mentioning explainability in the
beginning, because if you don't know why the technique told you what it told you, there's no
way to say if this is the right output or not.
There's still a lot of work that needs to be done with making the output of ML interpretable.
But I'm glad to see that the ML community is also focusing on this problem instead of just improving the accuracy or scalability of techniques.
There are already some explainability techniques that we are applying in our systems.
And the idea is to essentially interpret the output and use that to gain some insight on how to better architect the system.
So that was how we found a lot of the design bugs that we've identified.
It's still early days, so there's a lot that needs to be done in that area.
But that would address the problem that you mentioned.
I see. I see.
And so what do you see as kind of the harder problem
to solve?
You know, you've been kind of straddling this hardware side
and software side throughout this discussion.
Yeah, which one seems harder?
That's a good question.
So I think managing the dependencies is the hardest problem and is
the most different from what traditional cloud applications look like. Because I had worked on
using machine learning and cluster management before, I was a bit more familiar with solving
that problem. So in that sense, the hardware acceleration was newer to me, but I don't think
that makes it necessarily harder than the other. I think both problems, both sides are two sides of a coin.
They're both challenging. They just have different challenges.
So on the one side, it's more of an implementation of a hardware accelerator. You need to understand the networking stack in a lot of detail. You need to decide what parts of the networking stack it makes sense to have reconfigurable or hard-coded. On the other side, it's more about selecting the
machine learning
algorithms that are suitable for this problem, potentially developing new ones if that's required,
and then collecting the traces, deploying the system. It's more of a distributed system
design in that case. Right. So this sounds like an incredibly complex problem that touches multiple
different fields and multiple layers of the stack.
You know, we have talked about ML, we have talked about systems, networking, operating system, hardware, and so on.
So how do you think about tackling such a problem, right?
Like, for someone with a computer architecture background, normally researchers do make simplifying assumptions. So are there simplifying assumptions that you use, or are there useful signals that you have had to figure out? Like, here's the part of the stack that makes the most sense to go and address first. So how do you go about scoping out this problem?
It's modularity, right? This is what microservices do. So you break it down to the
individual components of the system stack. We're not trying to change everything at the same time.
The first thing we did was design the applications
and then do a characterization study.
So from hardware level to distributed settings,
what are the challenges of these applications?
And you change one thing at a time.
So you look at existing servers,
how well do they behave for these applications?
Where are the bottlenecks?
We in fact revisited some of these old questions
of do you want the big cores,
do you want the small cores for this programming model? And then you go to operating system and networking. What are the
bottlenecks there? What does it make sense to change? At the cluster management level, what is
the impact that dependencies have? And at the programming level, programming framework level,
what is the right programming framework for this type of applications? And then based on that,
you prioritize. You see that if 50% of your end-to-end latency goes towards networking, then that's a huge factor.
That's something that needs to be optimized.
Once that gets out of the way, there are other system tasks that also consume a lot of cycles, but nothing that is as dominant. Similarly, if bottlenecking one microservice can cause my entire system to collapse because the bottleneck has
propagated across tiers and made the end-to-end performance, I don't know, a hundred times worse,
then that's a big problem to tackle as well. But it's on a different level of the system stack. So
not trying to change everything at the same time.
Do you find that there's cohesion across your students in terms of understanding the whole problem, all the way from hardware up through software? Or how do you find students that are able to understand across the spectrum? Or do you sort of find, okay, this one is going to be targeted at this particular problem, this one is going to be targeted at that particular problem? Do you modularize your students, I suppose, a little bit?
A little bit. I do tend to take students both from the CS and the ECE side. It's not always that the ECE students want to do architecture and the CS students want to do systems; sometimes it's the other way around. But I think it's good to have a mix, because just from their backgrounds, ECE students tend to have a deeper knowledge of how to build hardware, how to work with FPGAs, what the tradeoffs in architectures are. CS students have a more global understanding of how the software stack will affect the performance of the system, perhaps not always as detailed a knowledge of the hardware levels.
There is a bit of separation.
So there are some students that work more on the hardware side.
There are some students that work more on the machine learning for systems. Hopefully everybody understands what the big challenges
are. They are all working with the same infrastructure, the same applications,
and the same servers. So in that sense, there's a lot of collaboration between them.
Great. So this seems like potentially a time to transition into asking about your experience as a junior faculty, since we sort of touched on students and student hiring. I believe you're our first junior faculty member on the podcast, and so we'd love to hear some of your perspectives about the transition from being a grad student to being faculty: how you got going, chose your topic, the grant writing process, all that, for some of our audience members.
Sure. Let me remember, because it's been a couple of years at this point. I should say, I did not radically change my topic when I switched from PhD student
to junior faculty. So I still stayed within cloud
computing. I started looking at different types of applications. So for my PhD, I had only looked
at monolithic applications. There was still this idea of applying machine learning to systems,
but not for the complex applications that we have today. And at the time, microservices were not a
thing. And also, I had not worked on hardware acceleration at the time. That is an entirely new topic. What I found most challenging switching from student to faculty was
time management, first of all. As a student, especially a senior student, most of what you do is working on your project. And at that point, you know how to work. You're
very productive. You have good knowledge of the area, good knowledge of your project, you can make a lot
of progress. As a junior faculty, first, you're most likely switching to a new place. So that's
a very subjective thing, how easily people adjust to a new place and new people. I'm one of those
people that don't adjust very easily. So that takes some time. But more than that,
you have to adjust to this idea of advising students as opposed to doing the work yourself.
So sometimes it's very tempting to jump in and do a lot of the work yourself, but you're not there
to do the work for the students. You're there to train the students to learn how to do the work
themselves. Another challenge is that as a student, you know what advising style works for you.
But as a professor, you have to adjust to the advising style that works for each of your students.
And it's not always the same.
So some students might like more hands-on advice, more frequent meetings, going through the implementation.
Some students might prefer more high-level discussions, more infrequent meetings,
perhaps longer ones. So finding the right balance between what each student needs to be successful
and to make progress, that is challenging at the beginning. And I guess one good piece of advice that I was given, and that I also give, is to not start with too many students, because people are very enthusiastic when they start as junior
faculty. They want to do a lot very quickly, but people should not underestimate how long it takes
to learn how to advise others. Even if you've had some experience with advising undergrads
or even other PhD students, it's still different when you're the main person that's responsible
for that student as opposed to helping your advisor with a student. So it's better to start with one or two students at the beginning,
learn the game and then scale up.
Yeah, no, that sounds like very good advice for someone just starting out.
You were asking also about the grants. I can say a little bit about that.
So that's part of the time management too, right?
You have to learn to divide your time between working with students and
teaching,
which I was fortunate that at Stanford,
I did get a chance to teach a couple of classes.
But again, it's not the same
when you are the person that's responsible for the class.
And then grant writing,
which I didn't really have experience as a student.
So that was something that was entirely new.
And you have all the service in the department
and outside the department.
For grant writing,
it's something that you get much better at as time goes by and you learn how to frame what you want to do.
So rarely the first grant that somebody writes is going to get accepted.
Maybe if it's a small one, maybe if it's an industry grant.
But if you're submitting to NSF or DARPA, usually the first one's not going to get it, and that's fine. It takes some time to get used to
writing grants, expressing something that you want to do as opposed to something that you have done,
which is what you learn to do as a student. But as with writing research papers, it's something
that you pick up the more you do it.
And again, once you start, when somebody starts as a junior faculty, at that point, hopefully, you know fairly well how to write papers.
So from that to learning how to write grants, it's not that hard.
I do try to not submit too many grants. I also try to not have a really large group because I want to know what each student is doing and meet frequently with each student.
I don't really like to have a hierarchy of multiple postdocs.
And because I don't have a huge group, I don't have to submit a ton of grants.
So that helps as well to put more effort into the ones that I do submit.
You touched upon teaching very briefly here.
Maybe you can
expand on that a little bit, especially the last year we've had COVID and lockdowns and so on.
How has that affected both teaching classes as well as advising graduate students as well?
Yeah, I can tell you about teaching first. So it is challenging, definitely. I am fortunate in the
sense that my classes, neither the undergrad nor the graduate one,
need physical presence.
So students work with machines, but they can do that remotely.
So in that sense, they've been fairly easy to transition online.
Of course, you have to deal with different time zones.
Some people are taking the class at 3 a.m.
So they cannot call in.
They have to watch it later.
So you have to adjust a lot of things.
For example, I used to do short
quizzes at the beginning of my undergrad class. You can't do that if it's 3 a.m. for some, because it's not fair. So some things had to be adjusted, but it's been easier, I think, for my classes
compared to a lot of other people's that have in-person FPGA experiments, have robotics experiments
that people need to be there for. So it hasn't been too challenging.
I've seen people struggle much more with teaching online,
but of course it has its challenges.
Advising students online,
it also has the challenge of not being able to sit in front of a whiteboard
and just think for a couple of hours.
It's usually a Zoom meeting.
More often than not, we have a lot of Zoom meetings in a day, so you don't want to make
them even longer. If you have six hours, you don't want to make them eight or ten.
So, of course, that limits a little bit the time that you spend with each student.
But I am glad that all my students, first of all, are healthy. They didn't contract the virus. They are safe.
Most of them are in the U.S.
I have a couple that got stuck abroad,
but hopefully will be making their way back to the U.S. soon.
But they have remained productive and happy with the research.
Okay, maybe they're a bit less productive than they would be normally.
That's okay. It's a difficult time for everybody.
Indeed, indeed. So a little bit more on teaching. What are you teaching? A computer architecture class? What kind of classes are you teaching?
Yeah, so the undergrad is a computer architecture class. It's a senior level.
So it's not the introduction to computer architecture,
but we do simple processors.
So the five-stage pipeline,
single cycle, FSMs.
We do caches.
And then we do a little bit about advanced processors.
So out of order,
superscalar,
branch prediction.
Let me see if I remember the whole list: memory disambiguation, speculative execution, and a couple more techniques. And then they have to build a processor in Verilog at the end.
So the reason I ask is because I'm actually on the side teaching a class, that same class
right now. And one thing that I've noticed is that a lot of things are the same from when I
learned it a long time ago.
And one of the things that you were talking about in the beginning is how we may have to change how
we build and design hardware, given this new world of cloud computing and how everything looks a
little bit different now. Do you think so sort of from a philosophical sense, we need to change how
we teach computer architecture? That's a good question. I think you still need the basics.
You still need people to understand what a pipeline processor is. But if you look at computer
architecture classes 20 years ago, right, the main thing that they did was, I don't know, single cycle
designs or ISA designs, and that was it. The classes that you see today have been augmented with several things. One thing that I have done is
I have incorporated a discussion of accelerators and then large-scale systems and very low-power
systems, so cloud computing and then embedded devices. It's not the focus of the class. There
are higher-level classes, including graduate classes, that are specific for cloud computing,
which is one class that I'm teaching. There are also similar classes
for embedded computing, but I do try to mention them even in the undergrad class so that people
know that, while you need the basic information, that's not the only way computing systems look today.
Makes sense, makes sense. Yeah, I've struggled with this a little bit myself, because you definitely don't want to release a computer architect into the wild who doesn't understand, you know, a five-stage pipeline.
But at the same time, there are a lot of things that are going on in the field today that are very different from thinking at that level.
What are you thinking about for the future? So you've had this beautiful body of work about ML systems and cloud and QoS and all
the stuff that we've been talking about.
Do you foresee yourself continuing down?
Like, is there a lot more to squeeze out of this area?
Or are there other things that you're thinking about?
What else makes you excited?
Yeah.
So fortunately, cloud computing is one of these areas that keeps transforming every few years.
So you don't run out of problems to work on. I do plan to continue this work on microservices.
I think there's still a lot of problems that are open, specifically with applying machine learning to systems.
We've only scratched the surface of what you can do. There's a lot more.
For example, what's the right way to design microservices?
If you have a monolith and you want to transition to microservices, how do you do that?
The way people do that today is very empirically.
So they look at the monolith and they start chipping away to design individual microservices.
That's not a very systematic way.
I think machine learning would help in that as well.
I think it would also help with not only performance debugging, but correctness debugging.
So I mentioned in the beginning finding some design bugs.
We're still at the early stages of what you can do with that. So part of it is finding the bugs. You might also
be able to automatically fix some of those bugs using ML. Same thing with security attacks. You
might be able to automatically detect when a system is being attacked and potentially block it.
On the hardware side, the networking acceleration is kind of the first step.
There is a lot of other system tasks that you can accelerate.
There's a lot of work that can be done in programmability for these accelerators, especially since you are exposing them to the end cloud user at this point, who has neither designed
the applications nor the accelerator.
So you need an interface that's much more user-friendly.
And then slightly, not outside cloud computing, but
in conjunction with cloud computing, I also have this project on coordination control for
swarms of edge devices. And the idea there is using a programming framework like serverless,
which is well-suited for applications with intermittent activity, applications that have short-lived tasks,
and a lot of data level parallelism
to offload some of the computation
to a back-end cloud system.
And in that case, there are hardware questions.
What is the right way to build the hardware that
goes into the edge devices?
The questions that I'm more interested in are: how can cloud acceleration help with performance predictability
in applications that span the cloud and the edge? What is the right way to manage resources? So how
do you decide what to run at the cloud, what to run at the edge, and what's the right interface
to design applications that go into the systems, which don't only have the complexity of the cloud
and the multiple components of the application, but also have part of the application that runs on this unreliable edge device, which has very limited resources and
unreliable connectivity.
I see, that sounds exciting too.
That one has been challenging during the pandemic, because we can't access the drones.
Yeah, I can imagine. Yeah, because now
computing is, there's a whole question of how to divide it up between the cloud and something that is on the edge, and how to do that divide. Like, architecture sits at the boundary of hardware and software, right? So based on this
work, have you had any thoughts or insights into, you know, how should these either microservices
based systems or looking at these edge based systems or edge plus cloud based systems?
Is there something that's missing in our programming abstractions that would enable us to deploy these systems or manage the growing heterogeneity and things like
that in a lot more seamless manner?
That's a good question.
So architects are not going to like this answer, but higher level interfaces are better
for these complex programming models. Of course it adds inefficiency; the higher up you go in the system stack, the more inefficiency it adds. But the complexity is such in many cases that you cannot expect the user to be exposed to all that complexity,
whether that is defining all the APIs between microservices, whether it is deciding what's
going to run in the cloud versus what will run at the edge, whether it's specifying constraints,
scheduling constraints, resource management constraints, security constraints, all this, which application designers have to do today.
Even as it is, it adds inefficiency because many times people get it wrong or they don't truly understand what the application needs or how it should run.
Yes, high-level interfaces add inefficiency, but they also abstract away that complexity. And then if you have a system that can understand the application and understands the system's capabilities and requirements, you can recoup that loss of efficiency there.
So do you think the answer then
is higher level interfaces for architects to think about? Or do you think that it's inserting a
runtime layer where the architectural interface can sort of remain the same,
but you still have this kind of another layer inserted
that enables higher-level interfaces?
Yeah, probably something like that.
Of course, again, you're adding another level of indirection,
so that has its issues.
But the more complex systems become, the less you can expect
the user to have full, global visibility over all constraints and requirements.
Yeah. Any words of wisdom or advice that you would like to share with our listeners,
especially, you know, younger faculty or graduate students as they're charting this territory and this new and exciting space?
Let's see, this is not very new, but pick a problem or several problems that you love
working on because you'll spend a lot of time doing it and it's better to work on something
that you really like even if it's challenging, even if it's a heavy design project, which is something that many times
people might try to avoid just because of the amount of time that it takes. So even though it
might take longer, design projects are very useful, especially in the systems community.
And it's important to focus more on the quality of work than the quantity.
I'm also glad to see, since you talked about junior faculty, that a lot of tenure evaluations are starting to focus more on the quality of work instead of just
the number of papers that somebody publishes.
And hopefully that will continue.
That's great.
So Christina, thank you so much for joining us today.
It's been an absolute delight talking to you about these various topics.
We were so glad to have you here today.
Thank you.
Thank you for having me.
And to our listeners,
thank you for being with us
on the Computer Architecture Podcast.
Till next time, it's goodbye from us.