Signals and Threads - The Thermodynamics of Trading with Daniel Pontecorvo
Episode Date: July 25, 2025

Daniel Pontecorvo runs the “physical engineering” team at Jane Street. This group blends architecture, mechanical engineering, electrical engineering, and construction management to build functional physical spaces. In this episode, Ron and Dan go deep on the challenge of heat exchange in a datacenter, especially in the face of increasingly dense power demands—and the analogous problem of keeping traders cool at their desks. Along the way they discuss the way ML is changing the physical constraints of computing; the benefits of having physical engineering expertise in-house; the importance of monitoring; and whether you really need Apollo-style CO2 scrubbers to ensure your office gets fresh air.

You can find the transcript for this episode on our website.

Some links to topics that came up in the discussion:

ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers)
Some research on CO2’s effects on human performance, which motivated us to look into CO2 scrubbers
The Open Compute Project
Rail-optimized and rail-only network topologies
Immersion cooling, where you submerge a machine in a dielectric fluid!
Transcript
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane
Street.
I'm Ron Minsky.
It is my pleasure to introduce Dan Pontecorvo.
Dan has worked here at Jane Street for about 13 years on our physical engineering team.
And I think this is a thing our audience is not particularly conversant with.
So maybe just to start off with, what is physical engineering and why does Jane Street have a physical engineering team?
Thanks for having me, Ron, I appreciate it.
Yeah, I think physical engineering is a term
I think we came up with here
to represent a couple of different things.
But really the team thinks about all of our physical spaces,
be it data centers, offices, co-locations.
And the team's really responsible for thinking about
leasing spaces, renting spaces, designing
and building them, and operating them in a way that allows us to run our business.
So let's dive into the data center space for a bit, because data centers are a place where
trading firms are really quite different.
And there's a bunch of ways in which we've talked about in previous episodes of this
podcast how the networking level of things is different, right?
The vast majority of activity at most ordinary companies, even highly technical ones, happens over the open Internet.
We operate in a bunch of colocation sites near data centers and do cross-connects in
the back of the network rather than the trunk of the Internet, or at least not for our core
trading activity.
But how does the classic trading network and trading data center differ at the physical
level?
What are the unique requirements that show up there?
I mean, I think proximity is one that's an important note. I think there's trading venues
that you need to be close to. Latency becomes a big concern all the way down to the length of the
fiber when you're talking about microseconds and lower. So proximity is key. I think performance is
also very important. There's different hardware that is used, and from a power and cooling standpoint that also poses some challenges. I think being able to scale over time and not being boxed in matters too. So, thinking about optionality and growth, and what that growth means: you don't want to build a data center that's properly located and then run out of space or power there, and then have to build another one, and then the distance between those two becomes an issue. So I think there's a few different things we
have to think about. A lot of it comes down to performance at the end.
And when you think about the physical space, a lot of those performance questions come
down to cooling.
Yes, yes.
Cooling is an interesting one because it's a byproduct of consuming a lot of power.
And cooling has seen a few different evolutions over the last 25 years, if you will.
And people are constantly balancing efficiency with performance. Cooling is the largest consumer of power in a data center, besides the IT equipment itself.
So there's lots of effort and there's been efforts over the years to drive down
PUEs to a place where the amount of power you're spending cooling your space is manageable.
What's a PUE?
Power usage effectiveness.
It's a measure of how much total power you consume
divided by the power that you're using for your compute.
So like what fraction of your power
is actually powering the computer
versus all the other stuff that you need
to make the data center work?
That's correct.
And you'll see ranges from low end 1.1,
people might claim lower, but let's say 1.1
up to the worst data centers, 1.8 or 2.
So 1.1 means I'm wasting roughly 10% of the power.
Yep, that's right.
You do that by utilizing different things
like cooler ambient temperatures to do economizer cycles,
to use outside air, ways that you
can use less mechanical cooling, which
is running compressors and big fans that use a lot of energy.
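To put rough numbers on that definition, here's a quick sketch of the PUE arithmetic (illustrative figures, not measurements from any site mentioned in the episode):

```python
# Quick PUE arithmetic, following the definition above:
# PUE = total facility power / IT (compute) power.
# Figures are illustrative, not measurements from any real site.

def pue(total_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT power."""
    return total_kw / it_kw

def overhead_fraction(pue_value: float) -> float:
    """Share of total facility power that is NOT going to the IT load."""
    return 1.0 - 1.0 / pue_value

for total, it in [(1100.0, 1000.0), (1500.0, 1000.0), (2000.0, 1000.0)]:
    p = pue(total, it)
    print(f"total={total:.0f} kW, IT={it:.0f} kW -> PUE={p:.2f}, "
          f"overhead={overhead_fraction(p):.0%} of total power")

# A PUE of 1.1 means roughly 9% of the total power (about 10% of the IT power)
# is going to cooling, fans, lighting, and other non-compute loads.
```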
So, OK, let's do data center cooling 101
just to understand the basic thermodynamics
of the situation.
I want to throw a bunch of computers into a bunch of racks in a data center.
What are the basic issues I need to think about?
And also, other than the computers themselves, what are the physical components that go into
this design?
Yeah.
So you'll ask yourself a few questions, but in the most basic data centers, you could
use a medium which we call chilled water, which is water that is cooled down to, say,
50 degrees Fahrenheit through maybe 65 degrees Fahrenheit.
And you use this by utilizing refrigerant cycles, maybe chillers on the roof, we call
them air-cooled chillers.
Blow air over a coil, you run a vapor compression cycle, you leave that chiller with some cooled
water that now can be converted back to cool air at these devices called CRAH units (computer room air handlers).
So basically, we're taking that warm air that leaves the server
and blowing it over a coil, and that heat's being transferred
to that chilled water medium, and then blowing that air
back into the data center.
So that's the most basic.
Just to zoom out, there's things that are glorified air
conditioners or something.
Except they're not air conditioners,
they're water conditioners.
You're cooling the water.
And then the water is the medium of distribution for the cold.
It holds the coldness, and you can ship it in little pipes all over the building.
Yeah, it becomes very flexible.
Right.
And then the CRAH unit is a thing that sits relatively close to the stuff you're trying
to cool where it's got a big metal radiator in the middle of it and some fans.
You blow hot air over the radiator, energy moves from the air into the radiator.
That water then gets cycled back into the cooling system.
That's correct. Yeah, it's a closed loop and continuously runs. The closer you could get those CRAH units or those coils to the load, the better you are, the better heat transfer,
the less losses you have with air recycling in the data center. So talking about that
most basic design, over the years there's been efforts on optimizing by moving it closer
to the load, by increasing the temperatures because the servers could
withstand higher temperatures and you could save energy there.
So a lot of work on optimizing and saving energy has been done over the years.
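To make the chilled-water side a bit more concrete, here's a minimal energy-balance sketch, assuming standard water properties and an invented load; the heat the loop carries is flow times heat capacity times temperature rise:

```python
# Minimal chilled-water energy balance: Q = m_dot * c_p * dT.
# Standard water properties; the load and delta-Ts are invented.

CP_WATER = 4186.0    # J/(kg*K), specific heat of water
RHO_WATER = 997.0    # kg/m^3, density of water
GAL_PER_L = 0.264172

def water_flow_l_per_s(load_kw: float, delta_t_k: float) -> float:
    """Chilled-water flow (liters/second) needed to absorb `load_kw` of heat
    while the water warms up by `delta_t_k` Kelvin."""
    mass_flow = (load_kw * 1000.0) / (CP_WATER * delta_t_k)  # kg/s
    return mass_flow / RHO_WATER * 1000.0                    # L/s

load_kw = 500.0                 # hypothetical IT load for one data hall
for dt in (6.0, 8.0, 12.0):     # typical-ish chilled-water delta-Ts, in Kelvin
    lps = water_flow_l_per_s(load_kw, dt)
    gpm = lps * GAL_PER_L * 60.0
    print(f"{load_kw:.0f} kW at dT={dt:.0f} K -> {lps:.1f} L/s (~{gpm:.0f} GPM)")

# A wider delta-T means less water to pump for the same amount of heat.
```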
Got it. Also, you don't always have to
use this closed loop design.
If you're sitting close to the water,
you can literally use the water from the Hudson.
Yeah. I mean, there's some salt in there,
we'll have to deal with that,
but you could reject heat into that.
There's lots of big hyperscalers that use moderately tempered outside air.
You evaporate some water there.
You have that latent heat of vaporization.
You're able to bring that air temperature down and cycle it to the data center.
So there's many, many ways to cool these servers, and with air-cooled servers, for many years it was a function of: what's the warmest temperature that I could bring to the face of these servers and have them run well? And you try to ride that upper limit so you don't use as much mechanical energy to get that air down nice and cold.
I'm hearing two issues here. One is we can make the overall thing more efficient by just tolerating higher temperatures for the equipment itself, and presumably that's also tolerating higher failure rates.
Yeah, and I think there's a lot of work. ASHRAE is one body, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, that's done some work
and written some white papers about allowable
and recommended ranges for temperature and humidity,
and done enough testing there to get comfortable
where OEM server manufacturers are using those as guidelines.
So we run CFD studies to look at those air-cooled scenarios
and try to understand where we can design our systems to allow for both good normal operation but also good operation during failure
scenarios of failed mechanical equipment.
I guess the failure scenarios come up because if you allow your equipment to run at a higher
temperature then when some bad thing happens and your AC isn't working for a little while,
you're closer to disaster.
That's right.
And there's a balance, right?
You can add more CRAH units, you can add more chillers, up to a point at which it becomes too costly or too complex.
So you want to look at some failure analysis and understand what are the more likely devices to fail? Those are the ones we want redundant.
When there is a failure, how quickly do you respond? What are your ways to mitigate that?
And then for us, how quickly do we communicate to our business that there's a failure or likely failure about to happen?
What does that mean to our business and how do they respond to that?
Got it, so there's a bunch of pieces here.
There's the air conditioners, the chillers
that cool the water, or I guess not quite air conditioners
as we said.
We've got the CRAH units that deliver the localized cooling.
Sounds like there's all sorts of monitoring
that we're going to need for understanding the system.
And then there's the design of the actual room
with the racks and computers.
What goes into that?
How do you make a room full of machines more or
less efficient by adjusting the design?
Yeah, I mentioned moving those cooling units closer to the load.
There's this concept of rear door heat exchanger that bolts a cooling coil right to the back of the cabinet.
So it's within inches to a foot from the back of the server,
allowing that heat transfer so you don't have this potential recirculation of that hot air back into the inlet.
So, in a thermodynamics level, why does this matter?
You said, I want to bring it closer.
Why do I care if it's closer?
What does it matter if the hot air has to travel a while before getting back to the
CRAH unit to get cooled again?
There's a couple things.
One is you run the risk of that air moving in a direction you don't want it to go in
and then coming back into the inlet of the server.
And now you have an even higher inlet temperature of the server.
The other thing is having to move large volumes of air
to get this parcel of hot air back to a cooling unit takes energy.
Lots of fan energy to move that around.
And the energy consumed by fans goes with the cube of the velocity.
You've got to move that air, and the further you have to move it,
the more power it's consuming.
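A quick illustration of that fan-law point, using the textbook affinity relation (power scaling with the cube of flow) and made-up reference numbers:

```python
# Fan affinity-law sketch: fan power scales roughly with the cube of the
# flow rate for a fixed air path. Reference point below is invented.

def fan_power_kw(base_power_kw: float, base_flow_cfm: float, new_flow_cfm: float) -> float:
    """Estimate fan power at a new flow rate using P ~ Q^3."""
    return base_power_kw * (new_flow_cfm / base_flow_cfm) ** 3

base_kw, base_cfm = 5.0, 10_000.0   # hypothetical reference operating point
for cfm in (10_000.0, 15_000.0, 20_000.0):
    print(f"{cfm:>8,.0f} CFM -> ~{fan_power_kw(base_kw, base_cfm, cfm):.1f} kW of fan power")

# Doubling the airflow costs roughly 8x the fan power, which is why containment
# and short air paths (less air to move around) matter so much.
```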
So why is this mixing so important?
So here's like a degraded model of how cooling works,
which is just not physically right at all,
but it's, I think, 20 years ago when I started thinking
about this, how I thought about it,
which is I have this CRAH unit whose job is to
extract energy and it can extract some number of
joules per second or whatever from the system.
And then, I don't know, why do I care about the airflow,
as long as the air conditioner can pull out energy
at a rate that matches the rate at which the machines
are running and inserting energy into the system.
Why am I so worried about things like airflow?
In the data center, you have various different types of servers, network switches, various different types of equipment.
They're not always built to work very nicely with each other.
For years, we've had situations where you have these servers that move airflow a standard
way and the network switches that might move it in opposite ways.
Now you have to move that air around differently.
So really understanding where
these devices are pulling the air from,
making sure that that area of the data center,
that part of the data center is getting
the cool air that you want and that
hot air is being contained in a way
or the cold air is being contained in a way
where you funnel it right to where you want to consume it, and not allow this short-cycling mixing. You can imagine taking a home PC and putting it in an enclosed desk and running it, and seeing what happens: over time, that heat would just build up in there and it would keep consuming more and more hot air.
Right, so I think you can get hot spots and then some equipment can just get hotter than
you want it to be even if the average temperature is fine.
But I think there's another issue that you also won't successfully lower the average
temperature because that thing I said before about the air conditioning can just remove
some amount of energy per second, it's just not true, right?
It's conditional on maintaining this large temperature differential.
Can you talk a little bit more about why that temperature differential is important and
how that guides the way you build the data center?
The temperature differential is directly proportional to the amount of heat you can reject,
which is also proportional to the amount of airflow.
So as you have larger delta Ts and change in temperature,
you can reduce the amount of airflow you need.
So there's a balance between how much delta T or change in temperature
and the amount of airflow to cool a specific amount of power or amount of heat rejected.
So the industry does things like 20 to 30 degrees Fahrenheit
on the Delta T at servers, that's a nice sweet spot
where you get a flow rate that's manageable
and also a Delta T that's manageable.
There's ways where you can withstand higher Delta Ts
and get less airflow.
Also, that's more likely a play at reducing
the amount of fan energy and energy consumption
used by the mechanical systems.
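To put numbers on that airflow-versus-delta-T trade-off, here's a small sketch using the standard sensible-heat balance for air; the rack load and temperatures are illustrative, not a design spec:

```python
# Air-side sensible-heat balance: Q = m_dot * c_p * dT.
# Standard air properties; the rack load is invented.

CP_AIR = 1006.0          # J/(kg*K), specific heat of air
RHO_AIR = 1.2            # kg/m^3, air density near sea level
CFM_PER_M3_PER_S = 2118.88

def airflow_cfm(load_kw: float, delta_t_k: float) -> float:
    """Airflow (CFM) needed to carry `load_kw` of heat at a given air delta-T."""
    mass_flow = (load_kw * 1000.0) / (CP_AIR * delta_t_k)  # kg/s
    return mass_flow / RHO_AIR * CFM_PER_M3_PER_S          # cubic feet per minute

rack_kw = 15.0                   # hypothetical air-cooled rack
for dt_f in (20.0, 30.0):        # the 20-30F delta-T sweet spot mentioned above
    dt_k = dt_f * 5.0 / 9.0
    print(f"{rack_kw:.0f} kW rack at dT={dt_f:.0f}F -> ~{airflow_cfm(rack_kw, dt_k):,.0f} CFM")

# Moving from a 20F to a 30F delta-T cuts the required airflow by a third.
```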
And just to think about how this shows up in different parts, this delta T matters in
at least two places.
One is you want this delta to be high when you are pumping air into the CRAH unit, right?
Because you're going to cool that air, and then the higher the difference in temperature
between the air and the water, the faster energy is going to move.
You're going to get better heat transfer.
And then the exact same thing is true within one of your, like you have my 1U box that
I stuff into a rack and basically the difference in temperature between that hot CPU or other
hot equipment within the machine and the air that's blowing through.
Yeah, and you have to be very careful inside that box, inside that server, as that cold air parcel enters, right? It's passing over different pieces of equipment. And the last device that it passes over,
be it a power supply or memory,
you want to make sure that the temperature
at that point is still cool enough
that it could reject that last bit of heat.
So if you have too little airflow
and it increases too rapidly in the beginning,
you don't have any cooling left towards the end of the box
as it's passing over component after component.
So it really matters the physical location
of the things being cooled
and what's the direction of airflow.
And you have to make sure that you're cooling the whole thing by enough.
Yep, and when server manufacturers are designing them,
they're specifically placing memory and chips and power supplies in locations
where they have an expected temperature at a different point in the box itself.
So there are clearly bad things that happen if the delta T is too small.
Yep.
Is there anything bad that happens if you make the Delta T too large?
Yeah, I think there's a point at which that air, that warm air that gets back, eventually gets back to the chilled water, becomes a problem at the chillers, where they lose their heat transfer abilities above a certain temperature, right? They're designed at a certain capacity with a tested Delta T. Above that, you're running into areas where
you're not able to
reject heat efficiently back at those chillers,
and you run into issues at the chillers too.
Why is that? If the air itself is too hot,
it's not going to be able to cool it?
Yeah, so the air comes back, goes through the CRAH, and now the water warms up, and it goes back to the chiller. And the chiller has to be able to reject that amount of heat. It has a Delta T that it's expecting too, so if the water is coming back higher, it can still only do so much delta T. So if the water's 10 degrees higher at the same delta T, it's gonna be leaving 10 degrees higher as well.
Maybe this is in some sense,
partially this is about the delta T,
but partially it's also just about the total amount
of energy that you're capable of cooling at the end.
If you exceed the capacity of the system,
you're just in trouble.
You're in trouble and you're gonna dip into redundancy
and all sorts of things.
Balancing flows will get mismatched.
So you'll have some issues there.
I mean, it's not a place you want to be, but unfortunately, sometimes you run into
performance issues or failures and you have to respond to failures and deal with these
situations.
Got it.
So we want to maintain this separation of hot and cold air in order that the air going
into the chiller is as hot as possible and the air going into the machines as cold as
possible.
What do you end up doing in the physical design of the space in order to make that happen?
Yeah, what you're looking at is flow rates really.
Like I said, you have this fixed heat.
You kind of understand what your heat rejection is going to need to be based on how much power
you're consuming, right?
It's directly proportional to the amount of heat or power you're consuming.
So the amount of heat you have in the space, you have two different ways to deal with it,
whether it's air or water,
you have the ability to adjust flow or adjust the delta T.
So we're sizing pipes, sizing ductwork,
sizing fans for specific flow rates.
The servers are also sized for a specific flow rate of air,
let's say in this case,
and you're trying to match those flow rates,
moving that liquid or that air around
such that you're getting the expected
delta T by providing the correct flow rate.
And then there's also stuff you do in the physical arrangement of computers.
You're talking about the direction in which air flows.
So this is very basic idea of cold row designs, right?
Where you basically line up all the computers so they're all pulling air in the same direction.
So there's one side where the cold air comes in and one side where the hot air comes out
and then you try and blow the air from the hot side.
Back to the CRAH unit.
Yeah, that's exactly right.
Cold aisle, hot aisle.
It's the concept that came around.
It's one of the early concepts of as things started getting slightly more dense, people
are like, well, we just have these machines in a room and we're just putting cold air
everywhere.
At some point, you start to deal with this air recirculation issue that I described earlier.
So they said, okay, well, let's really contain it.
So you can think of containment like a piece of ductwork
that's just funneling the air either where you want
to bring it, so i.e. cold air to the inlet of the server,
or hot air from the back of the server to the cooling unit
to get that heat transfer back into the water.
Got it.
And then one of the things that we've dealt with
over the years is the way in which all of this moving
of hot air around connects to fire suppression.
Yeah.
So can you talk a little bit about what the story is there?
Yeah.
So obviously with this amount of power and just being in a building, you have to think
about fire suppression.
So most fire suppression around the world, there's other ways you can do it with foam
and gaseous substances, but water is still a big key component in fire suppression.
So you kind of use these devices called pre-action systems.
Ultimately what they are is a valve setup that allows you to delay these sprinkler pipes
from holding water above your racks until you can prove that there's heat or smoke or both.
So we've had situations where maybe you have a cooling failure and the data center gets
warmer than you expect, and the sprinkler heads melt at a certain temperature.
They have a fluid inside there that is in a glass housing
and melts and opens a valve.
Now, thankfully, when this happened,
there was no water in the pipes.
It was a lesson learned from us of, hey, maybe
standard temperature sprinkler heads aren't
sufficient in a data center, especially
when you have a failure.
So something we looked at in detail
and changed our design to have more resilient, higher temperature-rated sprinkler
heads to prevent these failure modes.
I have to say, I love the brute physicality
of the mechanism here.
It's not like, oh, there's a sensor and a microchip
that detects.
It's like, no, no, no.
It melts.
And when it melts, it opens.
And then water comes out.
Yeah, fire suppression, you don't want to mess.
Keep it simple.
Get it done.
Get the water where it needs to be,
and not make it too complicated.
A critical part of this is this two-phase design.
One is you have the pre-action system
where nothing bad can happen until the water flows in.
And the other piece is these actual sprinklers
where the top has to melt in order
for water to actually come out and destroy
all of your equipment.
A key part of that, I imagine, is monitoring.
If you build a system where there are multiple things that have to be tripped
before the bad thing happens, and here the bad thing is water comes out when there
isn't really a fire. If there's a fire, you would then like the-
I'm not sure which one's worse, the water or the fire.
Right. I mean, I think once there's a fire, you probably want to put it out.
That seems good. Having those two steps only really helps you if you have a chance to notice
when one of them is breached. And so monitoring seems like a really critical part of all this.
Yeah, that's right, and it doesn't start or stop at the fire protection systems, right?
Monitoring is key throughout the entire building, cooling systems, power systems,
various different things, lighting, it could be anything, but traditionally
there's been different platforms for different things. Power systems and
mechanical systems would have different
either control or monitoring solutions.
And over time, they've gotten to a place where it's unwieldy.
If you're trying to manage a data center,
you're looking at three or four pieces of software
to get a picture of what's going on inside the data center.
So at Jane Street, we've worked over the years
to develop our own software that pulls in data
from all these places and puts it in
a nice format for us to be able to monitor and look at in a single pane of glass, if
you will, understand exactly what these alerts mean, and put our own thresholds on the alerts, thresholds that reflect what we care about, maybe not necessarily the manufacturer.
Maybe the manufacturer is a little bit more or less conservative and we want to be more
conservative.
We want to get an early alert.
We're able to change our own thresholds.
And then we're able to use our software
to deal with projections as well on top of the real-time
monitoring and help us understand
where we're power constrained, where we're cooling constrained.
If we were to build a new phase, how much capacity
do we actually have?
Is there stranded capacity that we can use and give our data
center admin folks a look as to, hey, we
have some stranded capacity here.
Why don't we look at racking the next servers
in this location?
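As a toy sketch of what operator-owned thresholds look like in practice; this is not Jane Street's actual monitoring software, and the sensor names and numbers are invented:

```python
# Toy sketch of operator-owned alert thresholds layered over vendor telemetry.
# Not Jane Street's monitoring software; sensor names and levels are invented.

from dataclasses import dataclass

@dataclass
class Threshold:
    warn: float       # our early-warning level
    critical: float   # our "wake someone up" level

# Hypothetical thresholds chosen by the operator rather than the manufacturer.
THRESHOLDS = {
    "rack_inlet_temp_c": Threshold(warn=27.0, critical=32.0),
    "chilled_water_supply_c": Threshold(warn=12.0, critical=16.0),
}

def evaluate(sensor: str, reading: float) -> str:
    """Classify a reading against our own thresholds for that sensor."""
    t = THRESHOLDS[sensor]
    if reading >= t.critical:
        return "CRITICAL"
    if reading >= t.warn:
        return "WARN"
    return "OK"

for sensor, reading in [("rack_inlet_temp_c", 28.5), ("chilled_water_supply_c", 9.0)]:
    print(f"{sensor}={reading} -> {evaluate(sensor, reading)}")
```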
Yeah, and I think it's actually really hard for people
who are creating general purpose software
to do a good job of alerting at the right moment,
because there's a delicate balance, right?
You tune it in one direction, and you don't see things
until it's too late.
And you tune it in the other direction,
you see way too many alerts.
And that's also like not seeing anything.
You need to kind of adjust to the right level where it shows up just enough
and at a high enough percentage of the times where it says something, it's a real issue.
Yeah. It's a Goldilocks problem. It's one of those things that I don't know that there's any way to
get really good at it without reps. And we've used both our building, construction, and testing,
and commissioning to help tune our alerting.
We've had real-time incidents which help us understand
if we're getting the right level of alerting and reporting.
And when we do postmortems, we're looking back,
hey, when was the first early indication or early warning?
Was there something we could have done then?
Was there maybe a different alert that we could have set up
that would have given us even earlier notice?
So yeah, I think it is a bit of the art of
understanding when to alert and when not to alert,
especially out of hours, waking people up,
trying to respond to different things.
You really want to make sure it's an emergency
or something that needs to be responded to.
Do you have any examples of real-world stories of
failures where you feel like the monitoring system that we have has allowed us to respond in a much more useful way?
Yeah, I think there's lots of examples. I can give one. We had a data center that was
using chilled water, as I mentioned, as the medium. It was late in the day and we're noticing, sooner than the provider in many cases, that temperature is increasing. And we have temperature sensors at various points. You can have temperature sensors at the CRAH units, but you could also
have temperature sensors at the servers, at the racks. When you're at the rack measuring temperature, you're able to see smaller changes much quicker than at the large volume, either at the chillers or at the CRAH unit.
So in this one case, we saw some temperature changes and
we investigated, poked around, and were able to uncover a bigger problem, which was an unfortunate draining down of a chilled water system that caused a major incident for us that we had to respond to.
But had we not had the monitoring system, we probably wouldn't have been able to communicate to the business what
was happening, why it was happening, and how long it might take to recover.
Right, so this was like a major incident. This was someone who was servicing the building
basically opened a valve somewhere and drained out all the chilled water.
Yeah, that's right.
And during the trading day, notably, it was towards the end of the trading day.
3:58, I think, was when we first got our alert. So it was a very scary time to see these alerts.
And the first couple of moments in any incident
is very much a scramble trying to understand
what do these signals mean, what is happening,
trying to gather as much information
but also not make any bold claims initially
until you have a good clear picture of what's going on.
But what happened here was there was a maintenance event,
switching of chillers, normal operation,
except for one
of the chillers was out of service and the valving that was manipulated ended up diverting
chilled water to an open port and drained down thousands of gallons of chilled water
that we critically needed to cool our space.
A couple things here that we learned, I mean I think something called the method of procedure
or MOP is something that is extremely important.
What you don't want in a data center or any critical facility is for technicians to go
around and do maintenance and do service on a whim or off the top of their head.
You want a checklist.
You want something that is vetted and in a room when there's no stress.
And you can kind of create a step-by-step process to avoid opening, closing, doing anything
incorrectly.
So really the time to plan and figure out the procedure is before the activity, not
during the activity.
I think this may not be obvious to people who work in other industries: you might think it's like, okay, you messed something up and now this data center is in a bad state.
But data centers are kind of units of failure.
You just fail over to the other data center or something.
And we do have a lot of redundancy and there are other data centers we can use.
But they're not the same because locality is so important.
So if you have a data center that's in the right place,
yeah, you can fail over to other places,
but it's not like a transparent hot swap over.
The physical properties are different.
The latencies are very different.
And so it really is important at the business level
that you understand, okay, we're now in a bad state,
temperature is starting to climb
and how long can we safely run
under the present circumstances?
And the difference between being able to finish
the trading of the day and getting to the closing
and being able to trade a few more minutes after versus not
could be a very material difference
to just the running of the business.
Yeah, and that's a great point.
I think it's a key distinction
between what you would call hyperscalers
and financial enterprises as far as
locality of their data centers and why we tend to think a lot more about the resiliency of a site
rather than, as you mentioned, being able to fail over site to site. So we do spend more time
thinking about our design and the resiliency in our designs because of that fact. And there's
knock-on effects. You have an issue like this. You have to refill all this water, and that takes a period of time, right? So to your point, being
able to communicate how long this is going to take. We're there doing back-of-the-envelope
calculations of, all right, we have this flow rate on this hose here. How long will it take
to fill up a 12-inch pipe that goes up however many feet, and being able to do that on the
fly and report back. Also, going there in person and being able to talk to the people involved.
We have a team that responds in person.
We don't just rely on a third party.
So we had individuals go to site, supervise, ask questions,
be able to feedback to the rest of the team what the progress is,
what the likely recovery time or recovery looks like,
and how could we change our business or trading based on those inputs
that we're getting from the rest of the team.
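The back-of-the-envelope refill estimate is just pipe volume divided by hose flow rate; here's a sketch in that spirit, with invented dimensions and flow:

```python
# Back-of-the-envelope: how long to refill a drained riser from a hose.
# All dimensions and flow rates are invented for illustration.

import math

GALLONS_PER_CUBIC_FOOT = 7.48052

def pipe_volume_gallons(diameter_in: float, length_ft: float) -> float:
    """Volume of a cylindrical pipe in US gallons."""
    radius_ft = (diameter_in / 12.0) / 2.0
    return math.pi * radius_ft**2 * length_ft * GALLONS_PER_CUBIC_FOOT

volume = pipe_volume_gallons(diameter_in=12.0, length_ft=400.0)  # hypothetical riser
hose_gpm = 50.0                                                  # hypothetical hose flow
hours = volume / hose_gpm / 60.0
print(f"~{volume:,.0f} gallons to refill; at {hose_gpm:.0f} GPM that's ~{hours:.1f} hours")
```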
I feel like a really important part of making that part go well is being able to have open
and honest conversations with the people who are involved.
And how do we try and maintain a good culture around this where people feel open to talking
about mistakes that they've made in the challenging circumstance where some of the people are
not people who work for Jane Street, but people who work for our service providers. How does that even work?
Yeah, I mean it happens long before the incident. If you don't have those
relationships in place prior, you're not going to be able to do it in real time
when something's going wrong. So the team that I sit on is responsible for
everything about our physical spaces from negotiating leases to designing the
spaces to building them to operating them. So we're sitting with the key stakeholders of these third-party
operators many times from day one and they see the same people in the same
team, appreciate the inputs that we're giving along the way, but because we've
developed that relationship for many months at times for years, we're able to
have those real conversations where they know we're gonna ask questions we want
to understand if they have a problem or not.
We'd rather hear the bad news than be surprised later.
And the only way you get there is by putting that work in early and often.
And developing a lot of trust.
Developing a lot of trust both ways and showing them that mistakes will happen.
We are building these sites knowing mistakes will happen.
How we respond to those as a team, cross walls, if you will. We're not the same firm.
How we respond to those in a way that
allows us to mitigate or lessen the blow
is going to make or break it.
The mistake already happened.
How do we get to the next point?
So all of the discussions we've had here
are, in some sense, around the traditional trading-focused
data center.
And in the last few years, we've pivoted pretty hard
towards adding a lot of machine learning-driven infrastructure to our world.
And that has changed things in a bunch of ways, and I think, obviously, it's changed lots of things at the software level and at the networking level.
What kind of pressures has doing more machine learning work put on the physical level?
Yeah, that's a great question. And I think this is kind of an industry-wide thing where the densities for some of this GPU compute
have just increased a lot, the power densities and power consumed.
I think that poses a couple of big questions.
And if I focus on the cooling and the power side of that, it's doing a lot of the same
stuff that we're doing but differently, tighter, closer, bigger capacities, bigger pipes, bigger
wires, things like that.
Some of the numbers are getting so large that the amount of power in a suite in a data center, or in a couple of rows of racks, could now be consumed in a single rack. And that's something that is scaring people, but it's also creating
a lot of opportunity for interesting designs, different approaches. We can talk a little
bit about liquid-cooled computers and GPUs. I think that that's something that has really
pushed the industry to hurry up and come up with solutions,
something that maybe the high-performance computing world was doing for a bit longer,
but now anyone that's looking to do any AI ML stuff will have to figure out pretty quickly.
I think the first part of this conversation, in some sense, can be summarized by,
water is terrifying.
That's right.
And then now we're talking about actually we want to like put the water really, really close to the computers.
So first of all, actually, why? Again, from a physical perspective, why is using water for cooling more effective than using air?
Based on the specific heat and the density of water versus air, it's three to four thousand times more effective at capturing heat.
Three to four thousand times more effective at what, exactly? Is that measuring the rate at which I can transfer heat, how much heat I can pack in per unit?
Like what is the thing that's 4,000 times faster?
Yeah, the specific heat is like four times more,
you know, more heat capacity per unit mass of water versus air. And then the density is multiples, obviously, higher in water. So you combine those two, and per unit volume, you're able to kind of hold more energy in.
So the point is we're able to move a ton more mass, because water is so much denser than air.
That's right.
It's in a smaller pipe rather than this larger duct.
Got it.
Okay.
So, water is dramatically more efficient.
More efficient.
And that's why it was being used to move chilled water from the chiller to the CRAH. You're using these smaller pipes and then when you get to the air side, it gets very
large in the duct size.
So, it's being used in data centers for many years, but to your point, scary at the rack,
and something that we've tried for many years
to keep outside of the data center,
or outside of the white space, if you will.
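Here's the arithmetic behind that "three to four thousand times" figure, comparing volumetric heat capacity (density times specific heat) with standard textbook property values:

```python
# Volumetric heat capacity comparison: how much heat a given volume of coolant
# absorbs per degree of temperature rise. Standard textbook property values.

water = {"rho": 997.0, "cp": 4186.0}   # density kg/m^3, specific heat J/(kg*K)
air   = {"rho": 1.2,   "cp": 1006.0}

def volumetric_heat_capacity(fluid: dict) -> float:
    """Joules per cubic meter per Kelvin: density * specific heat."""
    return fluid["rho"] * fluid["cp"]

ratio = volumetric_heat_capacity(water) / volumetric_heat_capacity(air)
print(f"water: {volumetric_heat_capacity(water):,.0f} J/(m^3*K)")
print(f"air:   {volumetric_heat_capacity(air):,.0f} J/(m^3*K)")
print(f"ratio: ~{ratio:,.0f}x")   # on the order of 3,500x, i.e. "three to four thousand"
```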
Got it. And so now, what are the options when you think,
okay, how can I bring in the water closer to the machines
to make it more efficient? Like, what can I do?
Yeah, there's a couple things you can do.
One, you could do something called immersion,
where you can dunk your entire server
right into this dielectric fluid and be able to transfer that heat right
to that liquid by touching the entire server. And because the fluid is non-conductive, safe
to do.
I want to interrupt. I want to go back and answer the question in the other direction.
I feel like there's levels of increasing terror and I want to start from the least terror
to the most terror.
Sure.
So I feel like starting with the heat exchanger, not with the dunking. We're starting with the heat exchanger doors, and then the direct liquid cooling, and then where we don't use water at all and do the direct kind of immersion thing.
Yeah, yeah.
So with the rear door heat exchangers, it's getting that liquid very close to the server,
but not actually touching it, right?
So you're inches away.
And this is the moral equivalent of stapling the CRAH unit to the back of the rack.
Yeah, just pushing it over, bolting it to the back.
Other people have done other things like putting a roof in there with a coil.
But yes, it is getting it as close to the rack as possible.
What's the roof thing?
I think Google for years was doing something where in that hot aisle at the top of it,
you're putting cooling units, custom CRAH units that sit at the top of this hot aisle containment.
Let that hot air pull in, use fans to pull that hot air in,
cool it and then send it back to the data center.
So you then put an actual roof over the top?
Yes.
But then how does that interact with fire suppression?
So they have these panels that also melt.
Amazing.
Yeah. Roof panels that in a thermal event at
a certain temperature will shrink and then fall out of
their grid system
and now allow sprinklers to be able to get to the fire below. That's amazing. Okay, so we could do
really serious containment by physically building a containment thing around it and then we don't
have to bring the water that close in. We could bring the water really close in by stapling the
CRAH units to the back of the door and like moving water around. What else can we do?
So the other one which is most prevalent now with GPUs
is something called DLC or direct liquid cooling.
This is bringing water or liquid to the chip.
And when I say to the chip, you can imagine
an air-cooled chip has this nice chunky heat sink
on the back where you blow air over
and you transfer that heat out.
Take that off for a second and bolt on a coil
or heat exchanger, if you will.
So maybe it's a copper or brass-like heat sink that sits on there and has very small channels for
a liquid to pass through and absorb the heat.
So now you have this heat sink on the GPU and
you have to get some liquid to it.
So the liquid is something that we have to be very careful about because of
these small channels on these, what we're calling cold plates, on these GPUs.
And those are essentially just radiators.
That's right. Radiators, but instead of blowing air through, you're pushing water. Instead of a big air-cooled heat sink, it's a
radiator or coil that's sitting on a chip and some thermal paste to have some
nice contact there and transfer as much heat as possible. You've used these
micro channels to spread that water out to give you the greatest surface area to transfer heat over. And then the liquid that you're passing over is something
that you're just very conscious about the quality of that liquid. You don't want to plug these very
tiny micron sized channels. You're doing things like very, very fine filtration. You're doing
things like putting propylene glycol in there to prevent bacterial growth
within the pipe. All these things can lead to worse
performance, lower heat transfer, perhaps chips that
overheat and degrade.
Part of running water through my data centers, I have
to be worried about algae or something.
Sure. Yeah, absolutely. The types of materials you're
using, how do they react, how do two different
materials react and how do they corrode over time, dissimilar metals, things like that.
So there's this list of wetted materials.
Like once you're touching this cold plate at the server, you have to be very careful
about the types of materials.
So we're using types of plastic piping or stainless steel piping because we're very
concerned about just the particulates coming off of the piping and any small debris.
Okay.
So that's the whole problem that hadn't occurred to me before.
But another maybe more obvious problem is, I don't know, pipes have leaks sometimes.
Now we're piping stuff into the actual servers.
I assume if there's a leak in the server, that server is done.
Yeah, and maybe the ones below it or adjacent to it.
And in fact, there's some concerns about if it's leaking, what do you do?
Do you go to the server?
Can you even touch it?
Yeah, human health and safety. Like, there's 400 volts potentially at this rack.
So, there's a lot of procedures and standard operating procedures,
emergency operating procedures, and how do you interact with this fluid
or potential leak in a data center?
What are the responsibilities, both of the provider and also the data center occupier?
So, is there anything you can do at the level of the design of the physical pipes
to drive the probability of leaks very low? Yeah I
think one of the things that we do is really consider where the pipe
connections are, minimizing them. Off-site welding so we have nice solid
joints instead of a mechanical bolted joint or threaded joint. So thinking
about the types of connections, thinking about the locations of the connections,
putting leak detection around those connection points.
So monitoring again.
Monitoring, yep, of course.
And with monitoring, it's, well, what do you do?
We just sensed the leak.
Are we going to turn things off?
Are we going to wait and see?
Are we going to respond in person to see how bad it is?
Potentially, you're shutting down maybe a training run that's
been going on for a month.
Although hopefully you have checkpoints more recently.
Sure, sure, sure.
But it's still impactful.
Even if it's a couple of days or a day since your last checkpoint, whatever
it is, we don't want to be, as the physical engineering folks, we don't want to be the
reason why either a training job has stopped or, furthermore, inference, where it could
be much more impactful to trading.
We have all of these concerns that are driven by power.
Can you give me like a sense of how big the differences in power are? What do the machines that we put out there 10 years ago look like and
what do they look like now? Yeah, 10 years ago you're talking about 10 to 15
kW per rack as being pretty high. KW kilowatts, we're talking about like
amount of power per second essentially. Yeah, power is energy per second being
consumed at a voltage and a current.
And we've done things over the years, like 415 volt distribution to the rack
to get to the point where, with the higher voltage, you're able to get more power per wire size. So being able to scale helped us early on in designing those power distribution
systems.
10 to 15 kW was a high end.
Now we have designs at 170 kW per
rack, so more than 10 times. If you listen to Jensen, he's talking about 600 kW at
some point in the future, which is a mind-blowing number, but a lot of the
thermodynamics, it stays the same, but there's many, many different challenges
that you'll have to face at those numbers. One of the issues is you're
creating much more in the way of these power hotspots, right? You're putting tons
of power in the same place, and the data centers we used to build just
could not tolerate the power at that density at all.
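To see why wires and distribution gear get so big at these densities, here's a rough sketch using the standard three-phase current formula; the voltage and power factor are illustrative assumptions, not a design from the episode:

```python
# Rough three-phase current per rack: I = P / (sqrt(3) * V_line * PF).
# Voltage and power factor below are illustrative assumptions.

import math

def three_phase_amps(power_kw: float, line_voltage: float, power_factor: float = 0.95) -> float:
    """Line current in amps for a balanced three-phase load."""
    return power_kw * 1000.0 / (math.sqrt(3) * line_voltage * power_factor)

for kw in (15.0, 170.0, 600.0):   # the rack densities mentioned above
    print(f"{kw:>5.0f} kW rack at 415 V three-phase -> ~{three_phase_amps(kw, 415.0):,.0f} A")
```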
If you go into some of our data centers now that have been retrofitted to have GPUs in
them, you might have a whole big rack of which there is one computer in that row, because
that computer is on its own consuming the entire amount of power that we had planned
for that rack.
Yeah, it looks pretty interesting, yeah.
If you're looking to deploy as quickly as possible and use your existing infrastructure,
you're having to play with those densities and say,
all right, well, this one device consumes as much as five or 10 of those other devices,
so just rack one and let it go.
But the more bespoke and custom data centers that we're building,
we want to be more efficient with the space, right,
and be able to pack them in more dense.
So you end up with less space for the computers and racks
and more space for the infrastructure that supports it.
So the space problem isn't as much of a problem
because things are getting so dense.
What's the actual physical limiting factor
that stops you from taking all of the GPU machines
and putting them in the same rack?
Is it that you can't deliver enough power to that rack?
Or you could, but if you did, you couldn't cool it? Because obviously the overall site has enough
power. So what stops you from just taking all that power and like running some extension
cords and putting it all in the same rack?
I mean, the pipes and wires just get much bigger. And as these densities are increasing,
you're having to increase both, right? If you're bringing liquid to the rack, your pipe size
is proportional to the amount of heat you're rejecting. So you're able to increase that up to a point at which it just doesn't fit.
And then the same thing with power.
And power is becoming interesting because not only do you have to have a total
amount of capacity, you also have to break it down and
build it in components that are manageable.
So we have these UPS systems, uninterruptible power supplies, right?
And they're fixed capacity.
So if I have a megawatt, or, yeah, say a megawatt UPS, and I need to feed a two or three megawatt cluster, I have to bring multiple of these together and now distribute them in a way that, if one of them fails, where does the load swing over? So you're thinking about all these failure scenarios. So it's not just bringing one large wire over and dropping it. It gets very cumbersome and messy, and there's also different approaches by different OEMs in how their power is: is it DC power, is it AC power, at what current, where are you putting your power distribution units
within the rack, where do they fit. So there's a lot of different constraints
that we have to consider. Yeah it's interesting the degree to which power
has now become the limiting factor in how you design these spaces and how you
think about how you distribute the hardware.
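A tiny sketch of the fixed-size-building-blocks problem with UPS modules, in an N+1 style; the module size and loads are invented for illustration:

```python
# N+1 sizing for fixed-capacity UPS modules: enough units that the load still
# fits with any single module failed. Module size and loads are invented.

import math

def ups_modules_needed(load_kw: float, module_kw: float, redundancy: int = 1) -> int:
    """Modules required so `load_kw` is covered even with `redundancy` failed."""
    return math.ceil(load_kw / module_kw) + redundancy

for load in (1000.0, 2500.0, 3000.0):    # hypothetical cluster loads in kW
    n = ups_modules_needed(load, module_kw=1000.0)
    print(f"{load:,.0f} kW cluster on 1 MW modules -> {n} modules (N+1)")

# The part this sketch ignores is the hard one: how the load actually swings
# over to the surviving modules when one of them drops out.
```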
And then you mentioned it's not good to waste space, and that's one reason to put things
close to each other.
But it's also miserable from a networking perspective to have things like splayed across
the data center.
One thing that maybe most people don't realize is just that the nature of networking for
modern GPUs has completely changed.
The amount of data that you need to exchange between GPUs is just like dramatically higher and there's all new network
designs. One thing which has really required a lot of thinking about just how you physically achieve it is this thing called a rail-optimized network. The old classic design is like: I have a bunch of computers, I stick them in a rack, there's a top-of-rack switch, and then I have uplinks from the top-of-rack switch to some more central switches, and I have this tree-like structure.
But now you sort of think much more at the GPU level.
You maybe have a NIC married to each individual GPU.
And then you're basically wiring the GPUs to each other directly in a fairly complicated
pattern and it just requires a lot of wiring and it's very complicated and it's a real
pain if they're going to be far from each other.
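Here's a toy sketch of that wiring pattern, where GPU k on every node cables to rail switch k; it's a simplification of real rail-optimized fabrics, with hypothetical node and GPU counts:

```python
# Toy model of rail-optimized wiring: GPU index k on every node cables to
# "rail" switch k, instead of all NICs in a rack sharing one top-of-rack switch.
# Hypothetical counts; real fabrics add spine layers and more structure.

def rail_wiring(num_nodes: int, gpus_per_node: int) -> dict[str, list[str]]:
    """Map each rail switch to the (node, GPU) ports that plug into it."""
    rails: dict[str, list[str]] = {f"rail-{k}": [] for k in range(gpus_per_node)}
    for node in range(num_nodes):
        for k in range(gpus_per_node):
            rails[f"rail-{k}"].append(f"node{node}-gpu{k}")
    return rails

wiring = rail_wiring(num_nodes=4, gpus_per_node=8)
for rail, ports in wiring.items():
    print(rail, "->", ", ".join(ports))

# Every node needs a cable out to every rail switch, which is why the physical
# cabling gets painful if the nodes are spread far apart.
```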
Yeah, and being able to fit all that fiber or that InfiniBand or wiring, whatever it may be, within the rack, also leaving room for airflow or leaving room for
pipes, you end up looking at some of these racks and not only do you have all
these GPUs, but you have all these wires, all these network cables, all these
pipes now, and you're trying to fit everything together, right?
So it really does become a physical challenge in the rack.
And it's one where maybe the racks get bigger over time,
just to give you more space,
since you're not using as many as you used to.
Maybe let them get bigger so you can fit
all these components in more effectively.
Yeah, and maybe just more kind of customization
of the actual rack to match.
Because you're building, like, in some sense,
these fairly specialized supercomputers
at this point.
There's some folks working on this, called the Open Compute Project. That is, they are thinking about what the next generation of rack looks
like and DC power distribution, wider racks, taller racks, various different ways.
And I think different folks have different ways of approaching the problem.
What's clear right now is standardization is not really set in stone and it's
going to take a little while before folks start to agree on some standards.
Yeah, and a lot of this is just driven by the vendors
announcing it's like, we're gonna do this big thing
in two years and like, good luck guys.
Yeah, let us know how you figure it out.
Yeah.
The other thing that always strikes me about these setups
is that they're actually quite beautiful.
A lot of work goes into placing the wires just so
so that it turns out the highly functional design
is also quite pretty to look at.
Yeah, and I think it's extremely important
for troubleshooting.
Imagine you run a fiber and that fiber gets nicked or fails,
and you have this messy bundle.
It's like, good luck finding that,
and how long is it going to take to find it and replace it?
We have a great team of data center admins
that take a lot of care in placing things, designing
things, thinking about not just how quickly can we build it,
but also how functional and how
maintainable it is over time.
So we spend a lot of time talking about data centers,
but a lot of what our physical engineering team thinks
about is the physical spaces where we work.
And I think one particularly important aspect of that,
at least from my perspective, is just the desks.
So can you talk a little bit about how desks work
at Jane Street and why they're important
and what engineering challenges come from them?
Yeah, that's a good one.
You know, I think back early in my career here at Jane Street, and it was my first time
working at a trading firm or financial firm, and it was very interesting to see everyone sitting
at these similar desks.
But at the time, these desks were fixed.
If we wanted to move someone around, it was breaking down their entire setup, their monitors,
their keyboard, their PCs, and moving it all around is just very time consuming.
And it caused desk moves to happen less frequently than we wanted to, just as
teams grew and people wanted to sit closer to other people. So, at the time,
before we moved into a current building, we said, hey, there's got to be a better
way to do this. We hadn't seen it at the time. So, we said, very simply, why don't
we just put our desks on wheels and move them around? And from a desk surface
level, like...
I want to stop for a second.
We're talking about how to solve this problem, but why do we have this problem?
Maybe you can, for a second, paint a picture of what does the trading floor look like,
and why do people want to do desk moves, and what's going on anyway?
Yeah, I think that for our business, we really value collaboration,
and everyone sits next to each other.
There's no private offices.
There's no, hey, this group sits in its own corner.
We very much have these large open trading floors. People want to be able to look down
an aisle and shout and talk about something that's happening on the desk in real time.
And so we have these long rows of desks, people sitting close together. They're four feet
wide. And really it's about having close communication and collaboration.
And I will say, there used to be more shouting than there is now. And the shouting is much
more on the trading desks, especially when things are really busy, and there are more
different kinds of groups.
You go to the average developer group, and it's a little bit more chill than that.
But it is still the case that the density and proximity is highly valued, and the ability
to stand up and walk over to someone and have a conversation about the work is incredibly
important.
And we also have side rooms where people go and can get some quiet space to work and all
of that.
It is still very different from like, I don't know, certainly places where offices are the dominant mode,
or even the cubicle thing. It's just way more open and connected than that.
Yeah, some of the best conversations we have in our group are just spinning around in our chair
and talking to the person behind you or across.
And we do enough moves, if you will, throughout the year that you get to sit next to
different people and have different interactions. So I think from a culture standpoint, from the way
we work at Jane Street, we really value their close proximity to each other.
And how often do we have these desk moves?
Once a week, varied sizes. So there's a dedicated MAC team, moves, adds, and changes, that executes the move. At times, it's hundreds of people. It's amazing, but it's because the physical engineering team worked very closely with our IT
teams to develop a system where you're able to move these desks. Now, like I
said, the surface and the physical desk, putting those on wheels, that's fine, you could do that, right? But now you've got to think about the power and the networking
and the cooling, you know, all things we talked about earlier, and those were the
challenges on this project, where how do we create a modular wiring system where it's resilient, it works, it doesn't get
kicked and unplugged and stuff like that, it doesn't pose any harm, but also can be
undone once a week and plugged in somewhere else. How do we think about cooling? We use this underfloor cooling distribution system where you're
able to move the cooling to support a user or to cool their PC under the desk
by moving these diffusers around the floor
because of this raised floor system.
So yeah, let's talk about how that works.
What's physically going on with the cooling there?
So what we do here, again,
we use a chilled water medium in our offices,
but we build these air handlers
that discharge air below the floor.
So in essence, you take that cold water,
you blow that warm air over it and push it under the floor.
We supply between 60 and 65 degrees Fahrenheit,
maybe closer to 65. And you get this nice cooling effect where you're sitting.
There's a real floor, and then there's space, a plenum, I guess?
Yeah, like 12 to 16 inches, depending on our design.
And then a big grid of metal or something and tiles that we put on top of it.
Yep, concrete tiles that sit there that have holes in them.
Various ones have holes for airflow, and also cable pass through for our fiber to the end
of the row. And the air underneath is pressurized. It's
pressurized, very low pressure, but it's pressurized and it gets to the extents
of our floor. And as an individual you're able to lean over and adjust the amount
of flow by rotating this diffuser. You're able to provide your own comfort
level where you sit, but also pretty importantly be able to cool the desks.
And the traders have pretty high-energy, high-power PCs under the desk, and they're enclosed, and we're able to get some cold air to them.
It was a design that was much better than a previous thing we did in London,
which was CO2 to these coils in the desk, which was kind of scary.
Right.
That's a little bit more like that was a case where we'd done piping of.
Yeah.
It was one of those knee-jerk things where the desks are getting hot, so let's make sure we squash this problem. And that was prior to my time, but it was something that I think a few other firms were doing.
Liquid cooling or CO2 cooling to the desk, it's an approach that's died down at this point.
In some sense, the approach we have now is one where we want the desks to be modular.
So you can literally physically come and pick it up and move it somewhere else,
and someone's setup just remains as it was. You don't have to reconfigure their computer every time you do a move.
Yeah, that's the key.
And that's kind of incompatible if we're going to do the cooling by having, like, copper pipes carrying CO2 everywhere.
Yeah, you just couldn't move it. It's just not gonna work. And if you have overhead cooling, it's also not great, because it's not landing exactly where the desk is landing. So we have a lot of flexibility here. But to your point, you know, one of the main reasons of doing it
is people set up their desk exactly how they like them.
Their keyboard, their mouse, their monitor set up.
You come to Jane Street, you get a desk,
and that's the desk that stays with you
and it moves around with you.
So when you come in the next day after a move,
besides being in a different spot on the floor,
you feel exactly the same as you did the day before.
I wonder if this sounds perverse to people,
ah, there's a move every week.
It's worth saying, it's not like any individual moves
every week, but somebody is moving every week.
And there are major moves that significantly reorganize
the floors and which teams work where,
at least once, probably twice a year.
That's right, making room for interns and.
Right, some of it's ordinary growth, some of it's interns.
And I guess another thing that I think is important:
we in part do it because we value the proximity.
And so as we grow, we kind of at every stage want to find what is the optimal set of adjacencies
that we can build so teams can more easily collaborate.
And there's also just some value in mixing things up periodically.
I think that's like true on a personal level.
If you change who you sit next to, even just by a few feet, it can change the rate of collaboration by a lot.
And it's also really true between teams.
At some point, the tools and compilers team
used to not work very much with the research
and training tools team.
And then research and training tools
grew a Python infrastructure team.
And suddenly, there was a need for them to collaborate a lot.
And we ended up putting the teams next to each other
for a few months.
And then six or 12 months later, when
we had to do the next move, we decided, ah, that adjacency was now less critical and other things were
more important and we did it in other ways.
Yeah, it lowers the bar for asking for these moves, right? If we know we can kind of revert
it, it allows us to take more chances and put teams closer together, see how the collaboration
works. I think it's done wonders for our culture, being able to have maybe tenured folks next
to new joiners to allow them to learn a little bit faster.
I think it's been great for our team as well.
Yeah.
And even though a lot of engineering has gone to make it easy, one shouldn't understate
the fact that it's actually a lot of work.
Yeah.
And the team that does these moves works incredibly hard to make them happen.
And they happen really reliably and in a timely way.
It's very impressive.
Did you have to do anything special with the actual physical desks to make this work?
Yeah. We work closely with some of the manufacturers to come up with a Jane Street standard desk,
figuring out exactly where our cable tray would land for the power and the networking,
using end-of-row switches that we have, being able to open perforations for airflow to flow nicely through the desk,
putting wheels on the desk, wheels that lock and move into position, to allow
us to wheel them around pretty carefully.
And we did this globally too, right?
So for the desk we've created, we had to pick a standard to use; we built them to a metric
standard and we've shipped them all over the world.
So we have this one desk that we use globally at Jane Street or one style of desk that we
use globally at Jane Street and we're able to move it in different locations.
So we had to find a manufacturer that would meet all those needs.
The shape, the size, fitting our PCs,
having our monitor arms that we like,
having the raise lower feature,
having a pathway for our power and data to flow.
So there's a few different things that we had to factor in there.
But once we got a design that we're happy with,
we're able to deploy it pretty rapidly.
Actually, how does the power and data connections work?
I imagine you have wires internally in the desk,
but how do they connect from desk to desk?
What's the story there?
Yeah, so under this floor, under this 12 to 16 inch raised
floor, we have these power module boxes
where you gang together a lot of circuits.
And then you have these module plugs that plug in.
So we'll use an electrician to come in and plug them
in underneath the floor.
We'll lift the floor tile, which is very easy to do.
And then we have these predetermined whips depending on what position the desk is.
They're fixed lengths or we could adjust them if we need to, we can shorten them.
And you run these whips out to the end of the row where we have something called a hub,
and basically a pass-through for these wires to come up from under the floor and run along
the back of the desk in a nice cable tray.
For the networking side, we ran into a design constraint where it was like, at some point you're just running copper
from your IDF rooms out, your network switches
out to the desk, but you end up with these giant bundles
of copper.
Obviously they have a distance limitation,
but also they've gotten so large over time
that they would block the airflow under the floor.
So now we're like, okay, well, here's a new constraint.
So then we started designing, bringing fiber. And this was a while
ago that we decided this, bringing fiber to the
end of the row and housing our switches, our
network switches in these custom enclosures at
the end of the row that bring power to, bring
cooling. We cool our switches out there with the
same underfloor cooling that we use to cool
people. So now we have these very small fibers
that don't block the airflow, land at a switch
and the copper stays above the floor behind the desk.
So instead of a top of rack switch, you have an end of row switch.
That's right.
We like to joke that our offices feel a lot like data centers just stretched out a little
bit with people in them.
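As a rough illustration of why those copper bundles became an airflow problem under the floor, here is a minimal back-of-the-envelope sketch. The cable counts and diameters below are assumptions chosen for illustration, not actual figures from this build:

```python
import math

# Rough, illustrative numbers only -- assumed diameters and counts,
# not actual Jane Street cable specs.
CAT6A_DIAMETER_MM = 7.5        # a typical Cat6A cable is roughly 7-8 mm across
FIBER_TRUNK_DIAMETER_MM = 6.0  # a multi-strand fiber trunk can be in the ~5-7 mm range
DESKS_PER_ROW = 48             # hypothetical row size
DROPS_PER_DESK = 2

def bundle_area_mm2(diameter_mm: float, count: int) -> float:
    """Total cross-sectional area of `count` round cables of the given diameter."""
    return count * math.pi * (diameter_mm / 2) ** 2

copper = bundle_area_mm2(CAT6A_DIAMETER_MM, DESKS_PER_ROW * DROPS_PER_DESK)
fiber = bundle_area_mm2(FIBER_TRUNK_DIAMETER_MM, 1)

print(f"copper home runs: ~{copper / 100:.0f} cm^2 sitting in the plenum")
print(f"single fiber trunk: ~{fiber / 100:.1f} cm^2")
print(f"ratio: ~{copper / fiber:.0f}x more blockage for copper")
```

With these assumed numbers, copper home runs to every desk occupy on the order of a hundred times the cross-section of one fiber trunk to an end-of-row switch, which is the intuition behind keeping the copper above the floor.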
So other than this physical arrangement of the desks, what are other things that we do
in these spaces to make them better places for people to work and talk and communicate
and collaborate in?
Yeah, that's a great question. I mean, I think one of the things that we try to do as a group
is really talk to our coworkers and understand what they need and what they want.
Some things that we've done, you know, our lighting system,
we spend a lot of time thinking about the quality of lighting.
We have circadian rhythm lighting, which changes color throughout the day to match the circadian rhythm,
where you come in the morning, it's nice and warm, allows you to grab a cup of coffee,
warm up, get ready for the day, peaks at a cooler
temperature midday after lunch, and then fades back
at the end of the day.
So that's something that we think is pretty cool,
something we've been doing globally for a while now.
How do we know if that actually works?
How can you tell?
Obviously you can tell if the light temperature
is changing in the way that's expected,
but how do you know if it has the effect on people
that you think it does?
Yeah, that's a good question.
I mean, I think the only way is to talk to them.
And the folks that we've asked about it feel pretty good about the effect it has.
I mean, I think speaking for myself, I know coming in in the morning to something
like 4,000K lighting color temperature, it's just harsh.
And coming in at 2,700, 3,000 feels a little bit easier to adapt to.
Is there also like outside world research that validates this stuff?
Yeah, I don't know that any of them tie to performance, but there is logic as to
why the color temperature throughout periods of the day has an energizing or
relaxing effect on you.
But once you design the system and build it, we have complete control over it.
We can do things like have it follow a circadian rhythm, or we can pick one color that we think
everyone likes and say, all right, that's going to be the color from now on. So by designing it and building it with
this functionality we're able to on the software side make changes as we need to.
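To illustrate the kind of software-side control being described, here is a minimal sketch of a circadian color-temperature schedule. The breakpoint times, the Kelvin values, and the linear interpolation are assumptions for illustration, not the actual lighting system's curve:

```python
# Minimal sketch of a circadian color-temperature schedule.
# Breakpoint times and temperatures are illustrative assumptions.
SCHEDULE = [
    (7.0, 2700),   # early morning: warm
    (9.0, 3500),   # ramp up through the morning
    (13.0, 5000),  # peak "coolest" color temperature after lunch
    (17.0, 3500),  # fade back through the afternoon
    (20.0, 2700),  # evening: warm again
]

def color_temp_kelvin(hour: float) -> float:
    """Linearly interpolate the target color temperature for a time of day (0-24h)."""
    if hour <= SCHEDULE[0][0]:
        return SCHEDULE[0][1]
    if hour >= SCHEDULE[-1][0]:
        return SCHEDULE[-1][1]
    for (t0, k0), (t1, k1) in zip(SCHEDULE, SCHEDULE[1:]):
        if t0 <= hour <= t1:
            frac = (hour - t0) / (t1 - t0)
            return k0 + frac * (k1 - k0)
    return SCHEDULE[-1][1]

# The "pick one color everyone likes" override is just a constant function instead.
print(color_temp_kelvin(8.0))   # ~3100 K mid-morning
print(color_temp_kelvin(13.0))  # 5000 K at the midday peak
```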
Okay, so color is one thing. What else do we do?
Yeah, I think we touched on the cooling, and I think the underfloor cooling is another example of where we
think about thermal comfort and giving people the ability to adjust temperature
at their desk, but also the fact that we're cooling underfloor keeps that air very close to the breathing zone.
So that air comes out of the floor,
comes up five or six feet,
and it's as fresh as it could be right at the desk.
So we're mixing outside air,
we're mixing that air and sending it out
and allowing you to consume it right
when it comes out of the floor.
The other thing that it allows us to do is
by keeping a smaller Delta T, we move a lot more volume.
And by moving a lot more volume, we have more air changes.
You're getting more fresh air.
We use something called MERV 16 filters, like hospital surgical grade filtration,
to clean our air at twice the normal rate, because we're moving twice the
volume that you normally would.
It gives us the ability to keep our air very fresh at the breathing zone where
people are working.
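A quick worked example of that delta-T trade-off, using the standard sensible-cooling relationship Q ≈ 1.08 × CFM × ΔT (IP units). The load and temperatures below are assumptions for illustration, not actual trading-floor design figures:

```python
# Sensible cooling in IP units: Q [BTU/hr] ~= 1.08 * CFM * delta_T [F].
# The load and temperatures below are illustrative assumptions.

def cfm_required(load_btu_per_hr: float, delta_t_f: float) -> float:
    return load_btu_per_hr / (1.08 * delta_t_f)

LOAD = 500_000      # hypothetical sensible load for a floor, BTU/hr
ROOM_TEMP_F = 72

for supply_f in (55, 65):   # conventional colder supply vs warmer underfloor supply
    delta_t = ROOM_TEMP_F - supply_f
    print(f"supply {supply_f}F (dT={delta_t}F): ~{cfm_required(LOAD, delta_t):,.0f} CFM")

# The warmer 65F underfloor supply needs roughly 2.4x the airflow of a 55F system here,
# which is what drives the extra air changes and extra passes through the filters.
```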
Actually, this reminds me, there's one topic that I know we've talked a bunch about over
the years is CO2.
What's the story with thinking about the amount of CO2 in the air and what have we done about
that?
Yeah, there have been some reports of varying quality talking about performance versus the
CO2 concentration.
Human performance.
Human performance, yes.
Yes.
And it's hard to tell exactly the impact, but it does seem that there's enough evidence
that it does impact folks.
And like roughly at high levels of CO2, you get kind of dumb.
Yeah, that's right.
That's roughly correct.
Yeah.
What are those levels like at parts per million?
What's totally good?
Where do you start getting nervous?
I think you start getting nervous above 1,500 to 2,000 parts per million.
Outside is probably around 400 parts per million, depending where
you measure. Interior, you'll see anywhere between 750 and 1,200. It just really depends.
And for our trading floors, people are close together. There's lots of people. CO2 is driven
by people.
People are exhaling.
Yeah, people are breathing. Yeah. So we've done a couple things. First here, you kind
of start with the monitoring. You got to see what the problem is. So we've done a lot of
air quality monitoring throughout our space to measure various things.
We publish them internally for folks, and you're able to see what the data is.
But then we've done other things, like we've brought in more outside air.
We've mixed in that outside air to try to dilute the CO2 with fresh air and exhausting some of the stale air.
But also we've tested and been testing CO2 scrubbers, things that were used on spacecraft. Those are challenging at the volumes that we're talking about.
We have large double height trading floors, hundreds of thousands of square feet.
It's very hard to extract all of that, but these are things that the team is looking at and testing and planning.
But wait a second, we've got these whole space-age CO2 scrubbers.
Why isn't mixing in outside air just the end of the story and that makes you happy
and you can just stop there?
Yeah, because if you want to get down to, you know,
five, six, 700 parts per million,
then, starting at 400 parts per million outside,
the amount of volume that you need to bring in
is a challenge.
Moving that much outside air into the space
becomes very difficult.
One, from just a space standpoint,
duct work, getting all that air into a building,
into an existing building,
but also the energy it takes, whether on the coldest day to heat that air, on the
warmest day to cool all that air.
Typical air conditioning systems recycle inside air to allow more efficient cooling, so you're
not bringing in the warmest air on the design day and cooling it down, right?
It just takes a tremendous amount of energy.
So it's a mix of bringing in more outside air, thinking about things like scrubbers,
and trying to find the balance there.
And moving the air where you need it, when you need it.
If you have a class, moving the air to the classroom.
If you're at the trading desk,
moving the air to the trading desk.
So moving the air where you need it
is also an approach that we look at.
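To see why dilution alone gets expensive, the standard steady-state mass balance is C_indoor ≈ C_outdoor + G/Q, where G is the CO2 people generate and Q is the outside-air flow per person. Here is a hedged sketch with typical textbook-style per-person numbers, which are assumptions rather than measured figures for these floors:

```python
# Steady-state CO2 balance per person: C_in ~= C_out + G / Q.
# G is CO2 generation per person, Q is outside air per person; values are rough assumptions.

G_CFM_PER_PERSON = 0.011   # ~0.011 CFM of CO2 for light office activity (approximate)
C_OUTDOOR_PPM = 400

def outside_air_cfm_per_person(target_ppm: float) -> float:
    delta_fraction = (target_ppm - C_OUTDOOR_PPM) / 1_000_000
    return G_CFM_PER_PERSON / delta_fraction

for target in (1200, 1000, 800, 600):
    cfm = outside_air_cfm_per_person(target)
    print(f"hold ~{target} ppm: ~{cfm:.0f} CFM of outside air per person")

# Going from ~1200 ppm down to ~600 ppm roughly quadruples the outside air you have
# to bring in, heat, or cool -- the ductwork and energy problem described above.
```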
That sounds super hard though.
Jane Street is not a place where people
pre-announce all the things they are going to do, right?
There's a lot of like random,
you say, oh, let's go to that classroom and do a thing.
But looking at the sensors and seeing the CO2 climb and being able to move dampers around
and redirect air based on sensor input.
Is that a thing that we have that's fully automated or do we have people who are paying
attention notice things are happening?
I think it's a little bit of both.
We can make it fully automated, but I think it's important to have a human looking
at it, because if you have large congregations in different areas, you can get fooled as
to where you should send the air, and you have to think about that.
So it's not something we're doing as a fully automated thing.
It's something we're aware of and we're able to make tweaks and adjustments.
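For a sense of what "a little bit of both" could look like, here is a minimal sketch of a sensor-driven advisory loop: it only proposes damper changes from zone CO2 readings and leaves the decision to a person, since congregation patterns can fool a purely automated rule. The zone names, thresholds, and readings are all hypothetical:

```python
# Hedged sketch: propose (not actuate) airflow changes from zone CO2 readings.
# Zone names, thresholds, and readings below are hypothetical.

TARGET_PPM = 900
DEADBAND_PPM = 100

def propose_damper_changes(zone_co2_ppm: dict[str, float]) -> list[str]:
    """Return human-readable suggestions; a person reviews them before anything moves."""
    suggestions = []
    for zone, ppm in sorted(zone_co2_ppm.items(), key=lambda kv: -kv[1]):
        if ppm > TARGET_PPM + DEADBAND_PPM:
            suggestions.append(f"{zone}: {ppm:.0f} ppm -- consider opening dampers / more outside air")
        elif ppm < TARGET_PPM - DEADBAND_PPM:
            suggestions.append(f"{zone}: {ppm:.0f} ppm -- could trim airflow here and redirect it")
    return suggestions

readings = {"trading-floor-north": 1150, "classroom-3": 1480, "quiet-area": 620}
for line in propose_damper_changes(readings):
    print(line)
```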
Back to the space age thing.
Let's say we wanted to try and run these scrubbers.
What are the practical impediments there?
So I think the chemical process of pulling the CO2 from the air, the material that's
used in these scrubbers, it gets saturated with CO2 over time.
It's proportional again to the amount of CO2 in the air.
And the way you release that CO2 from that material is by burning it off with heat.
So now we have the situation where you capture a bunch of CO2, you store it, it gets saturated,
it stops being effective, and now you have to discharge it.
So not only do you need the power to burn that off,
but you also have to be able to duct that CO2-laden air out of the space.
So it's a physical challenge, physical space challenge.
These things get large, they're power hungry,
and you have to have a path to get the air outside.
Is it clear that the CO2 scrubbers would be net more efficient than just pulling in the
outside air at the level that you need?
It's not clear.
I think we're still analyzing it, looking at it.
If you think about the power consumption and space required, you can make arguments both
ways.
So I think the outside air is a more tried and true situation, but we've increased it
pretty significantly over time.
We're going to keep doing that and looking at that.
But there are many people in the industry increasingly looking at CO2 as a measure of indoor air quality.
For many years, though, it's been frowned upon because of the energy that it consumes.
So you have to balance that.
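One way to see why "it's not clear" is to compare the two energy paths for offsetting one person's CO2. This is a very rough, assumption-heavy sketch: the per-person CO2 rate, the sorbent regeneration heat, the extra outside-air quantity, and the design-day temperature difference are all made-up illustrative values, and it ignores latent load and off-design conditions:

```python
# Rough comparison of two ways to offset one person's CO2:
# (a) dilute with extra conditioned outside air, (b) capture it and regenerate the
# sorbent with heat. All numbers are illustrative assumptions, not vendor data.

CO2_KG_PER_PERSON_HR = 0.04    # ~1 kg of CO2 per person per day, roughly
REGEN_MJ_PER_KG_CO2 = 3.5      # assumed sorbent regeneration heat, MJ per kg CO2

EXTRA_OA_CFM_PER_PERSON = 20   # assumed extra outside air for the same ppm target
DESIGN_DELTA_T_F = 20          # hot design day: outside vs supply temperature
BTU_PER_HR_TO_W = 0.293

scrubber_w = CO2_KG_PER_PERSON_HR * REGEN_MJ_PER_KG_CO2 * 1e6 / 3600
outside_air_w = 1.08 * EXTRA_OA_CFM_PER_PERSON * DESIGN_DELTA_T_F * BTU_PER_HR_TO_W

print(f"scrubber regeneration heat: ~{scrubber_w:.0f} W per person")
print(f"conditioning extra outside air (sensible only): ~{outside_air_w:.0f} W per person")

# With these made-up inputs the two land within a small multiple of each other,
# which is why the answer swings on sorbent chemistry, climate, and duct space.
```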
So one thing that's striking to me about this whole conversation
is just the diversity of different kinds of issues that you guys have to think about.
How do you think about hiring people for this space who can deal with all of these different kind of complicated issues and also integrate
well into the kind of work that we do here?
Yeah, it's interesting. I think, first of all, many people don't think of Jane Street
right away when talking about these disciplines: physical engineering, mechanical, electrical, architecture,
construction project management. So part of it is explaining to them the level of detail
we think about these things in. Right, that there's an interesting version of the problem. Absolutely,
and why it matters for our business is very important. And for the right person, they want
to be impactful to a business, right? For many people who work in the physical engineering world,
you're there to support a business, but you don't always see the direct impact of your work. And
here I feel like we get to see the
direct impact. I get to talk to you and hear about how desk moves helped your
team or how our data center design being flexible allows us to put machines
where we need them, when we need them. How the feedback we get from our interns or
our new joiners about the food and the lighting and the space and all the
things that we build. Those things go a long way in helping people here on the team understand the impact that they're having.
And for people who get to work with us, it only takes a few meetings to see how much
we care about these details and how deep we're willing and able to go on these topics.
And to what degree, when we're looking to grow the team, are we looking for people who
are industry veterans who know a lot already about the physical building industry?
And to what degree are we getting people out of school?
Yeah.
You know, we just started an internship, so that's really exciting for us.
And I think that it's a blend of the two.
I think we really value people with experience, but we also feel very confident in our ability
to teach.
And if we bring someone in with the right mindset and willingness to learn and cast
a wide net of knowledge, I think they're very successful here at Jane Street because you come in without these preconceived
notions of how things are done and you're able to challenge the status quo.
You're able to say, hey, these desks don't work the right way.
We want to move them around. Or, hey, we need to bring liquid cooling to a data center, which is
something that is very much on the cutting edge now.
Those are the types of problems.
We want people who are excited by those problems,
excited by looking at it through a different lens.
Awesome.
All right, well, maybe let's end it there.
Thanks so much for joining me.
You'll find a complete transcript of the episode
along with show notes and links at signalsandthreads.com.
Thanks for joining us.
See you next time.