Into the Impossible With Brian Keating - Physicists Missed These Particle Tracks for Decades (ft. Daniel Whiteson)
Episode Date: December 26, 2025
Please join my mailing list here 👉 https://briankeating.com/yt to win a meteorite 💥 From the electrifying environment of high-speed particle collisions to the challenge of sifting signals from heaps of experimental noise, you'll hear how Prof Whiteson and his team are pushing boundaries. They discuss bold new algorithms capable of spotting non-standard tracks: think wild trajectories that defy classical expectations and could reveal surprises nature has kept hidden. Practical questions about detector design, efficiency, and even the mathematics of “smooth” particle paths make for a rich, dynamic dialogue. If you’re curious about how physicists ask the universe its most challenging questions, the frustrations and breakthroughs of innovation, and the fascinating interplay between theory and experiment, this episode will take you to the front lines of discovery. Plus, hear how machine learning might help us find not just the next weird particle, but perhaps the next Nobel-worthy revelation. Get ready for a fascinating journey into the impossible! Daniel Whiteson is a physicist whose research spans a wide range of topics at UC Irvine. By day, he works on the ATLAS experiment, one of the major physics collaborations at the Large Hadron Collider, where he contributes to Higgs boson precision measurements and develops advanced techniques in machine learning, data acquisition, and trigger systems. His research group is known for applying machine learning innovations to physics problems, including projects beyond ATLAS, like using approximate symmetries or jet parton matching. Recently, his team has been focused on machine learning projects to identify unusual particle tracks, always pushing the frontier between physics and data science.
Timestamps:
00:00 Revisiting Discovery with New Tools
04:43 Particle Tracking Constraints Explained
06:56 Challenges in Non-Helical Track Detection
10:29 Non-Helical Tracks and Dark QCD
14:38 Track Reconstruction and Efficiency
18:43 Quirk Detection and Reconstruction
23:27 Testing Generalization Beyond Memorization
25:23 Quirk Tracks and Overlap Analysis
30:36 Smooth Paths and Signal Control
31:17 Training Pipeline on Weird Tracks
35:55 Filtering Standard Model Tracks
38:24 Challenges in Parameter Optimization
43:15 Neural Networks Learn Complex Mappings
44:38 Machine Learning for Track Detection
Join this channel to get access to perks like monthly Office Hours: https://www.youtube.com/channel/UCmXH_moPhfkqCk6S3b9RWuw/join
📚 Get my books:
Think Like a Nobel Prize Winner, with productivity tips from 9 Nobel Prize winners: https://a.co/d/03ezQFu
Focus Like a Nobel Prize Winner, with life-changing interviews with 9 Nobel Prizewinners: https://a.co/d/hi50U9U
My tell-all cosmic memoir Losing the Nobel Prize: http://amzn.to/2sa5UpA
The first-ever audiobook from Galileo: Dialogue Concerning the Two Chief World Systems: Ptolemaic and Copernican https://a.co/d/iZPi9Un
Follow me to ask questions of my guests:
🏄♂️ Twitter: https://twitter.com/DrBrianKeating
🔔 Subscribe https://www.youtube.com/DrBrianKeating?sub_confirmation=1
📝 Join my mailing list; just click here http://briankeating.com/list
✍️ Detailed Blog posts here: https://briankeating.com/blog
🎙️ Listen on audio-only platforms: https://briankeating.com/podcast
#universe #podcast #briankeating #intotheimpossible #science #astronomy #cosmology #cosmicmicrowavebackground
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Today you're in for a special treat, a lecture delivered by UC Irvine professor Daniel
Whiteson, who takes us straight to the edge of what we know and then pushes us over the edge.
Particles collide at near light speed, detectors light up, and data pours in by the petabyte.
But what does it all mean?
Most of it is noise, but somewhere deep inside is a signal that shouldn't exist unless nature is
trying to tell us something new.
This is how modern experimental physics works.
Ask a precise question, build a machine to listen, and let the universe answer.
Let's go with the incomparable Daniel Whiteson.
But today I want to talk to you about a project I've been working on recently about finding weird tracks.
All right.
So, very brief introduction to who I am.
I have a sort of broad research program up at UC Irvine.
My day job is working on ATLAS, so we're friendly competitors and collaborators of your CMS colleagues here.
We do Higgs precision measurements, machine learning unfolding, and trigger and data acquisition stuff.
But I also have folks in my group who do machine learning for physics who are not on ATLAS.
And so, for example, we work on applying approximate symmetries or jet parton matching.
Today I'll be talking to you about the machine learning tracking projects they've been working on.
And I have a sideline in astrophysics, so I do some neutron star work, and have a project on building a cosmic ray telescope using smartphones,
as well as a couple of projects in high energy theory,
exploring high dimensional theoretical spaces with machine learning.
And at the very end, I'll tell you a little bit about my experience
in science communication as well if we have time.
Okay, so today I'm talking about machine learning,
and machine learning is everywhere in particle physics.
You hear about it everywhere.
All the cool people are doing it.
It's very powerful and popular, and it's been a big success.
And mostly the story of machine learning is in optimization.
We have really big data sets.
We use physics intuition to compactify them to one or two dimensions so we can analyze
them effectively, but that's limited because our intuition is limited and the data
never perfectly follows it.
So machine learning helps us extract more information from our data.
And that's wonderful, and that's fantastic, and I totally approve of it and participate
in it.
But I also hope for something more.
I'm hoping that machine learning can do more than just improve our reconstruction of the data,
but that it can knock down the walls that have prevented us from making certain kinds of discoveries.
That it can let us tap into problems we always thought were impossible.
We just sort of put them on the shelf and said, well, nobody's ever going to do that.
So let's just not even think about it.
Now that we have these new tools, let's go back and look at that shelf of impossible problems
and see if we can actually crack any of them.
Because it would be awesome if machine learning could do more than just optimize our analysis.
If it could make what was once impossible, possible.
And so I want to go back to like the old days of particle physics and think about what discovery used to mean.
It used to be you could make a single event discovery.
Back when the signals were high and the backgrounds were low, before all the easy stuff was taken, right?
You could, like, see one positron, and then your data-to-Nobel-Prize ratio was one to one, right?
Pretty awesome ratio there.
All the good stuff has been used up, and so now we have to make low-rate, high-background discoveries, which are mostly statistical.
So you can't look at one event and say, this was a Higgs or that was a Higgs.
It's all statistical, which doesn't mean it's not valid, but it's just sort of a different kind of experience.
And the reason we do this is that the new signals are rare, right?
And so you need high-rate experiments.
And when you go from low-rate experiments to high-rate experiments, you have to upgrade your technology.
You can't do tracking the way we used to do tracking, with "computers."
These were computers back in the day, humans doing computing, who could look at
the data and say, here's a track, here's a track, here's a track, here's a track.
Because it turns out your brain is really good at that.
But they can't keep up with sort of high rate modern tracking or high rate modern data
taking.
And so we have to move to a new kind of tracking, which allows for high rate analysis but
comes at a cost because tracking is a hard problem.
So let's make sure we're talking about the same thing.
You produce a bunch of particles.
They fly out.
You have detector elements.
You don't see the whole particle trace.
Often you look at an event display and you're like, there's the muon.
But what you actually see are individual hits where the muon has left a blip.
And the task of tracking is going back and saying, oh, all of these hits were on one
particle.
And these hits were on a different particle.
And it's pretty simple for this example.
But now imagine you have a thousand particles, 10,000 hits.
How many ways are there to assign 10,000 hits to a thousand particles?
Right? 10,000 choose 1,000 is a big number, right?
And so in principle the problem is impossible to brute force.
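A back-of-the-envelope sketch of that combinatorics, in Python (our illustration; the function names are hypothetical and nothing here comes from the talk's software). "10,000 choose 1,000" is the spoken shorthand; if instead each of the 10,000 hits may be given any one of 1,000 particle labels, the count is 1,000^10,000 (this overcounts relabelings of the particles, but the scale is the point). Both are computed as base-10 logarithms because the integers are far too large to enumerate.

```python
import math

# Illustrative sketch; function names are hypothetical.


def log10_comb(n: int, k: int) -> float:
    """log10 of the binomial coefficient C(n, k), via lgamma so no huge integer is built."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(10)


def log10_labelings(n_hits: int, n_particles: int) -> float:
    """log10 of n_particles ** n_hits: each hit independently gets one particle label."""
    return n_hits * math.log10(n_particles)


if __name__ == "__main__":
    print(f"C(10000, 1000) is roughly 10^{log10_comb(10_000, 1_000):.0f}")
    print(f"1000^10000 is 10^{log10_labelings(10_000, 1_000):.0f}")
```

Either way, the count has more than a thousand digits, which is why brute force is off the table.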
And so we have approximate approaches.
In order to make this problem accessible, we've made some assumptions.
And that's fine.
That's the way you should first do things.
Every time you start a hard physics problem nobody's ever done before, you assume something
spherical or you take the first order, whatever, right?
Let's think beyond that.
So the typical assumptions that are made in tracking are two.
Number one, you assume that particles begin at the interaction point, because most of them do.
And most of the particles you're looking for are emitted from the hard interaction.
And so if you're looking for a set of hits that make up a particle, and you constrain it so that the track has to go through the vertex,
you get a dramatic reduction in the number of combinations, right?
Very quickly.
Much easier problem.
And most of the particles do that, so awesome.
The other assumption typically made is that tracks move as a helix, because particles are either neutral, in which case they're invisible anyway, or they're charged under electromagnetism, in which case in our constant magnetic field, you get a helix.
Another huge simplification.
If you only have to look for helices, then you know how to look for particles and where to look for them.
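For reference, a minimal Python sketch of the helix assumption. It uses the standard relation p_T [GeV] ≈ 0.3 · B [T] · R [m] between transverse momentum, field strength, and radius of curvature for a unit-charge particle. The 2 T default field along +z, the origin as the production vertex, and the function names are our assumptions for illustration.

```python
import math

# Illustrative sketch; function names and the 2 T default field are assumptions.


def curvature_radius_m(pt_gev: float, b_tesla: float) -> float:
    """Radius of the transverse circle for a unit-charge particle: R = pT / (0.2998 * B)."""
    return pt_gev / (0.2998 * b_tesla)


def helix_point(pt_gev, eta, phi0, charge, s, b_tesla=2.0):
    """Position (m) after transverse arc length s (m) for a track starting at the origin.

    The track leaves the origin with azimuth phi0. With the field along +z,
    a positive charge bends clockwise in the transverse plane (omega < 0).
    """
    omega = -charge / curvature_radius_m(pt_gev, b_tesla)  # signed curvature, 1/m
    x = (math.sin(phi0 + omega * s) - math.sin(phi0)) / omega
    y = (math.cos(phi0) - math.cos(phi0 + omega * s)) / omega
    z = s * math.sinh(eta)  # pz / pT = sinh(eta), so z grows linearly with arc length
    return x, y, z
```

Five numbers (pT, eta, phi0, charge, and the vertex, fixed here at the origin) pin down the whole trajectory, which is what makes the search so cheap.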
Very important assumptions made for tracking, but these also limit our discovery.
If you're not looking for particles that don't come from the hard interaction, you can't find them.
If you're not looking for particles that are not helices, you can't find them.
Now there's been a lot of work recognizing that we should look for particles that don't come from the vertex.
There's a whole cottage industry of these folks now, and it's great work, and one of my postdocs is working on this.
I think it's very creative and lots of possibilities for discovery.
You know, there's things that are created later and things that are invisible and then pop up, all sorts of awesome stuff.
So clearly we've recognized that we should dig into these assumptions and figure out ways to avoid making them.
Much less attention has been paid to the question of finding non-helical tracks.
And one reason is that the assumption of a helical track is built into our track-finding algorithms.
Typically the way this works is you find a little stub, a few hits in a line,
and then you fit a helix to it so that you can project forward
and figure out where your next hits should be.
You don't have to consider a hit all the way over there or a hit all the way over here, because
you know where this track is going, and you have an envelope with uncertainties and a
Kalman filter, and it's very sophisticated, and it works really well if you know the
parametric form of your track, if you assume it.
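A stripped-down Python sketch of "fit a stub, project forward, look in a window." To keep it short, the helix is reduced to its circle in the transverse plane, the circle is fit exactly through the last three stub hits, and a fixed search window stands in for the Kalman filter's uncertainty envelope. Those simplifications and the function names are hypothetical, not any experiment's code. Notice that the prediction step only exists because a circle was assumed.

```python
import math

# Illustrative sketch; function names are hypothetical.


def circle_from_three(p1, p2, p3):
    """Center and radius of the circle through three transverse-plane points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("stub hits are collinear; a straight-line fit would be needed")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    ux = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    uy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (ux, uy), math.hypot(x1 - ux, y1 - uy)


def predict_on_layer(center, radius, layer_radius, last_hit):
    """Where the projected circle crosses a cylindrical layer (None if it never does)."""
    cx, cy = center
    d = math.hypot(cx, cy)  # distance from the beam axis to the circle center
    if d == 0 or not abs(d - radius) <= layer_radius <= d + radius:
        return None
    a = (layer_radius ** 2 - radius ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(layer_radius ** 2 - a ** 2, 0.0))
    bx, by = a * cx / d, a * cy / d
    crossings = [(bx - h * cy / d, by + h * cx / d), (bx + h * cy / d, by - h * cx / d)]
    # Of the two crossings, keep the one that continues on from the last stub hit.
    return min(crossings, key=lambda p: math.dist(p, last_hit))


def extend_stub(stub, layer_hits, layer_radius, window):
    """Pick the hit on the next layer closest to the circle-based prediction, within a window."""
    center, radius = circle_from_three(*stub[-3:])
    predicted = predict_on_layer(center, radius, layer_radius, stub[-1])
    if predicted is None:
        return None
    candidates = [hit for hit in layer_hits if math.dist(hit, predicted) < window]
    return min(candidates, key=lambda hit: math.dist(hit, predicted), default=None)
```

Every hit outside the small window is never even considered, which is where the speed comes from, and also why a track that leaves the predicted circle is lost.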
So track finding and fitting are closely woven together, right?
We assume it's a helix.
So how could this algorithm possibly find non-helical tracks?
It's woven deeply into this strategy.
So, well, you might think, well, let's just put in another parameterization, right?
Just plug in something else.
I have a helix, plug in a schmelix.
Why don't you look for schmelices, right?
Okay, that's cool.
And we can do that, and I'll show you an example, but I want to do something bigger.
I want to find something where I didn't have to come up with the parameterization in advance.
I want to be prepared for that crazy weird particle that's in my detector, which,
if I just showed you the event display, would look like
the positron discovery. Your eyes would be like, look at that, that's something. I don't know what it is,
but it's something in our detector. I want to discover that, find a real surprise. So that's the
big hard problem that I think has been seen as impossible: to look effectively for
potential single-event discoveries waiting in our data. If only you knew which event to look at,
but currently we have no algorithms capable of finding these things.
And this is my fantasy, right?
Find something that makes the theorists go, no, that's not possible.
You can't have discovered that.
We proved that doesn't exist in the universe.
Like, awesome, let's do it.
So how do we do this?
Well, we piggyback on the good work of other people.
So there's a collaboration of folks called Exa.TrkX
and other people around the world who developed a new way
to find tracks using machine learning.
For completely separate motivations, they
were trying to develop tracking that works well
in high-pileup environments, where there are lots and lots of particles, because the traditional
method doesn't work really well when you have huge numbers of particles.
So they developed a completely new approach which uses graph networks.
So you recognize that your data looks like a graph, your network looks like a graph.
It's all very, very cool.
And it works.
In high pileup environments, they're good at finding tracks.
But along the way, they reinvented tracking, and their tracking separates finding and fitting.
Machine learning tracking does not assume a helix. It learns that the tracks are helices from the
training examples that you give it. And it separates the finding and the fitting. It says,
here's a set of points I think are on a track. Then later you fit it to a helix. So it's separated
these two tasks, which is exactly what we need if we're going to find things that are
not helices. All right. So that's the setup. And that's what made me wonder when I saw these
papers, can we use machine learning tracking to sort of generalize track finding, to find
things that are unexpected?
And then as a bonus, I think we came up with a way to speed up the track fitting part
of it.
All right, so first I'm going to talk to you about generalized track finding.
How do you find tracks that are not helices?
Then talk about fast and precise fitting of those tracks.
Okay, so first, you might think, I like the idea of craziness, but I need an example.
Is this even possible?
Like, you know, weird discoveries are cool, but give me an example.
So there are folks up at UC Davis who came up with this idea that maybe there are
particles that don't just have electromagnetic charges, but that are like charged under a dark
QCD.
So for example, the quirk, right?
The quirk is a dark matter particle; it's like a dark analog of a quark, right?
But it has a dark analog of QCD.
But in dark QCD, the confinement scale and the mass scale have a different relationship.
So when you create quirks and they get macroscopically far apart, instead of having a flux
tube that pops new particles out of that energy, which happens in QCD because the quark mass
is low, here the quirk mass is high.
And so it can't pop new quirks out of that flux tube.
And so they remain separated and they oscillate macroscopically.
So you create them in a pair and they go like, whoa, whoa, whoa, whoa, whoa, whoa,
through your detector, making
weird trajectories. So that's very cool. It's a good example of something which we
cannot find at all. And so we thought it was a good initial test case to work on. Don't
start with the full problem, start with a baby step. So we generated a bunch of quirks in
MadGraph, passed them through a simulated detector. And as an aside here, this detector that
we use is sort of a toy setup. We just have a bunch of layers of detector and as a particle
passes through it, we say there was a hit.
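A toy layered detector of this kind can be sketched in a few lines of Python. This is our guess at such a setup, not the authors' code, and the function name is hypothetical: layers are assumed to be concentric cylinders around the beam axis, the trajectory is assumed to be given as finely sampled 3D points, and a hit is recorded wherever the path crosses a layer radius, using linear interpolation between samples. Nothing assumes a helix, and a path that turns back records a second hit on the same layer.

```python
import math

# Illustrative sketch; the function name and geometry are assumptions.


def layer_hits(trajectory, layer_radii):
    """Return (layer_index, x, y, z) for every crossing of a cylindrical layer.

    trajectory: finely sampled (x, y, z) points along the particle's path, in order.
    Crossings are located by interpolating linearly in transverse radius between
    consecutive samples, which is accurate only when the sampling is fine.
    """
    hits = []
    for (x0, y0, z0), (x1, y1, z1) in zip(trajectory, trajectory[1:]):
        r0, r1 = math.hypot(x0, y0), math.hypot(x1, y1)
        for layer, radius in enumerate(layer_radii):
            if (r0 - radius) * (r1 - radius) < 0:  # the step straddles this layer
                t = (radius - r0) / (r1 - r0)
                hits.append((layer, x0 + t * (x1 - x0), y0 + t * (y1 - y0), z0 + t * (z1 - z0)))
    return hits
```

Each hit is a bare space point plus a layer index, with no timing, matching the "just a bunch of points" the pipeline sees later in the talk.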
We'd love to do the more realistic study of actually modeling what happens to a quirk
as it passes through many layers of detectors and doing that in Geant.
But it's very tricky to modify Geant in this specific way.
And the lore is that there was a student at Berkeley who spent three years working on this,
failed, and then left the field.
And so people were like, oh, it's impossible.
But recently there's a student at Irvine working on importing quirks into Geant for a separate project
for FASER, and he tells me it's going well.
So we'll see.
Anyway, for this study, we began with a simplified detector, just a bunch of layers.
And you see that you get lots of different weird kinds of behavior, depending on the
quirk parameters, of course.
So there's this ratio of the mass of the quirks and the confinement scale.
If the ratio is correct, you get oscillations on the order of centimeters, which is very cool.
Otherwise you might get oscillations on the order of like nanometers, in which case they're invisible,
or really, really large, in which case they're invisible.
But in this intermediate scale, you get all sorts
of really weird tracks you would just never expect
from the standard model.
So can we find these things?
And most of the actual work was done by Max Fai,
who's now a postdoc at Fermilab, and Chi Yu Sha in Beijing.
And we use this machine learning track pipeline.
And I won't go into great detail about how it works,
but the idea is that you learn to find tracks;
you don't assume they're helices.
And the way you do this is you take your track,
which has a bunch of hits in it,
and you map it to some abstract latent space.
What is this latent space?
We don't know.
We're going to learn this mapping,
which defines the latent space,
and we're going to learn a mapping
so that tracks,
so that hits that are on the same track
end up near each other in this latent space,
and hits that are not on the same track
end up far apart from each other.
That sounds cool, but like,
how do you come up with this mapping, right?
That's the joy of machine learning.
You don't have to know how to construct this or write this down or find some analytical solution;
you just train it to do that.
And so that's what this pipeline does.
You give a bunch of examples of things that are on the same track and things that are not
on the same track, it learns this mapping.
So that in the latent space, things that are on the same track are near each other.
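One common way to express "same-track hits near each other, different-track hits far apart" is a pairwise hinge-style loss on latent-space distances. The Python sketch below assumes that form (a squared distance for same-track pairs, a squared hinge with a margin for different-track pairs); the exact loss, margin, and network in the pipeline used in the talk may differ. The embeddings are simply given here rather than produced by a trained network, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Illustrative sketch; the loss form, margin, and function names are assumptions.


def pair_loss(emb_a, emb_b, same_track, margin=1.0):
    """Pull same-track hits together; push different-track hits at least `margin` apart."""
    d = math.dist(emb_a, emb_b)
    if same_track:
        return d * d
    return max(0.0, margin - d) ** 2


def embedding_loss(embeddings, track_ids, margin=1.0):
    """Mean pair loss over all hit pairs in an event (brute force, O(n^2) pairs)."""
    pairs = list(combinations(range(len(embeddings)), 2))
    total = sum(pair_loss(embeddings[i], embeddings[j], track_ids[i] == track_ids[j], margin)
                for i, j in pairs)
    return total / len(pairs)
```

Training would adjust the network that produces the embeddings to drive this number down; an embedding that clusters each track's hits scores lower than one that mixes them.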
You can just be like, tracking is easy now.
These are all clustered together.
Obviously they're on the same track, right?
So it's like saying, could you map my hard problem to an easy problem?
Thanks.
And that's effectively what it does.
And you notice, along the way, it never assumes a helical path.
It just says this set of hits is on one track, that set of hits is on another track.
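Once hits are embedded, one simple way to read off track candidates is to link any two hits closer than some radius in latent space and take connected components. This Python sketch does exactly that with a brute-force pair loop and union-find; the radius is a free parameter and the function name is hypothetical. Pipelines of this kind usually add further steps (for example a neighbor-search structure and a learned filter on the candidate links) that are not shown. As in the talk, no step assumes a helical path.

```python
import math

# Illustrative sketch; the function name and radius are assumptions.


def cluster_hits(embeddings, radius):
    """Group hit indices whose latent-space embeddings are linked by chains of short distances."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if math.dist(embeddings[i], embeddings[j]) < radius:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Each returned group of hit indices is a track candidate; isolated hits come back as single-hit groups.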
So I'm going to quote the performance of our tracker in terms of efficiency.
And technically I define that as the number of reconstructed tracks,
divided by the reconstructable tracks.
Reconstructable meaning you went through a bunch of layers.
Reconstructed, meaning that it's matched to a true track:
a double majority match to the truth, which means that most of the true track's hits are on
your reconstructed track, and most of the hits on the reconstructed track are from the true track.
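The efficiency definition above, written as a Python sketch. Tracks are represented as collections of hit IDs; "reconstructable" is approximated by a minimum number of hits (min_hits, our placeholder for "went through a bunch of layers"), "most" is taken to mean strictly more than half, and the function names are hypothetical.

```python
# Illustrative sketch; min_hits and the function names are assumptions.


def is_double_majority_match(reco_hits, true_hits):
    """True if most true hits are on the reco track and most reco hits come from the true track."""
    reco, true = set(reco_hits), set(true_hits)
    shared = len(reco & true)
    return shared > len(true) / 2 and shared > len(reco) / 2


def tracking_efficiency(reco_tracks, true_tracks, min_hits=5):
    """Fraction of reconstructable true tracks that have a double-majority-matched reco track."""
    reconstructable = [t for t in true_tracks if len(set(t)) >= min_hits]
    if not reconstructable:
        return 0.0
    matched = sum(any(is_double_majority_match(r, t) for r in reco_tracks) for t in reconstructable)
    return matched / len(reconstructable)
```

Requiring the majority in both directions stops a huge reconstructed blob that swallows a true track from counting as a success.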
All right, and we can get into the details of that if somebody has a question.
So we started by saying, hey, can we get this pipeline to run at all?
Because this is like a research thing.
It's not like an app you download on your phone that just works.
So we got an account at NERSC.
We got this thing to run.
This step was not easy.
I'll be honest.
Like just getting this code to run.
We got some help from the authors.
But first thing we did is like, all right, feed it standard model tracks.
Ask it if it can find standard model tracks.
Answer is yes.
Woo-hoo.
All right, first step, right?
Then we said, well, let's see how hard this problem is.
Let's train it on standard model tracks and then just feed it a bunch of quirks.
Maybe it's just good at it.
Maybe we're underestimating it.
Answer is no.
It sometimes finds quirk tracks, but only when those tracks happen to look
exactly like standard model tracks.
All right.
So it doesn't find any of the interesting quirks.
So this tells us the pipeline works.
And how many times did it find things
that weren't there?
Oh, yes, the fake rate.
We'll talk about that at the end.
It's very, very low.
If you just take like random sets of hits,
it almost never chooses those to be tracks.
Yeah.
So, for example, these, right?
It's good at finding these.
These are cases when the oscillations are not visible because the parameters give you either microscopic or very large oscillations.
Yeah, so it fails at this kind of stuff.
Okay, so the pipeline works, but the quirk problem is hard.
If this had just worked, we would have been like, oh, well, standard tracking can already find this; solved problem.
You may have said this, but as a theorist, I have to ask: what are the characteristics of the tracker that you're simulating here?
Oh, that's an experimental question.
We have two configurations, one with eight layers and one with 25 layers, just simple cylinders.
Each layer does X1?
Each layer gives you a space point, so effectively X, Y, Z.
Yeah.
Other questions?
So data set, in this line, yes.
Just to see, this is a stand-in for traditional trackers.
If something that's good at standard model tracks can find quirks, then the problem is too easy.
Right? So this was just to see if the problem was hard enough.
Exactly. We expected it to be bad. It was bad. So that was good.
And the performance here depends a lot on these parameters. In some cases, arrangements
of the parameters, you get like most of the tracks look like standard model tracks. In some
cases you get very few of them. So anyway, in the paper you can see this whole table.
Then we did the harder thing. We said, all right, now let's teach it to find quirks. Let's give it
a bunch of quirk examples; can it find them?
Right?
This is the thing we were excited about.
And the answer is yes.
So there really isn't like some built-in assumption about helices in this pipeline.
The only thing we had to change was we had to change the assumption that the order of the hits are the order in time,
because sometimes quirks go backwards and then go out.
So we just have to make a little tweak to the pipeline, but then it can learn them.
So we were very excited that this pipeline can learn to associate sets of hits
that are not in a helix.
But as somebody asked, like, okay, well, if you only have one quirk in your detector,
then like, how hard is this problem anyway?
So, you know, we're taking baby steps there.
But here's an example of an event where you have two quirks, and the truth quirk looks
like this, and the reconstructed quirk are these dots.
So it doesn't find every single hit, right?
You see there's some places it intercepts, but it finds a majority of the hits and definitely
enough to reconstruct this quirk to identify this as like something weird in your detector.
That's an example.
Okay, so then we said, all right, let's get more realistic.
We're going to run this thing on events with obviously standard model particles also.
So let's train it with standard model particles and quirks in the training data set and then
test it on events that have a mixture, right?
Because obviously events are going to have lots of standard model background.
And this efficiency is the quirk efficiency, not the standard
model efficiency, because here we labeled these negatively.
We said, don't find the standard model tracks.
Here's a bunch of examples of what not to find.
So we trained it to find these and to ignore these.
And then we measured the efficiency on events that had a big mixture,
mostly standard model, of course, but measured the efficiency
only on quirks.
And so even when it's drowning in standard model tracks,
it can still find quirks with a reasonable efficiency.
And I think that was somebody's question about it.
Mixing them.
Okay.
I know I'm not supposed to ask questions like this, but.
Oh, I love those; that's my favorite preface to a question.
What are the characteristics of the tracks, geometrically, that it's actually...
do you have any idea what it's training on, how it's training?
Oh, like how does it solve this problem?
Right, I mean with standard model we have some ideas that we're looking for, as you
said like a helix, we have the idea of the geometry.
Uh-huh.
And sorry, these are like pixels with timing or no time?
No timing, just 3D space points.
Just a bunch of points.
Just a bunch of points.
It's kind of hard.
Yeah, it is hard.
So we did examples with eight layers and examples with 25 layers.
Here's an example of a pair of quirks.
I'm not showing the standard model tracks just for clarity,
because they'd drown you out, but this trajectory was found,
and it was able to disentangle these, right?
It's like this is the purple, this is the red,
this is the purple, this is the red.
Because it learns some pattern about how the quirks.
What did it learn here?
How is it doing it?
Every training sample is quirks and standard model
in the same detector.
But we label the tracks as like quirk or standard model.
So find these, but don't find these.
We never show it quirks without standard model.
In the last line.
Thank you.
What is the ratio?
How many?
Yeah, great question.
I don't know that number, but we used ttbar as the background.
And we, I think, overlaid two ttbar events on every quirk.
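Putting these answers together (quirks and standard model tracks in the same event, labeled find or don't-find, with background events overlaid and, as mentioned later in the Q&A, random noise hits added), a training event could be assembled roughly like this. This Python sketch is our illustration, not the authors' code: the function name is hypothetical, tracks are lists of (x, y, z) points, noise hits get track ID -1, the noise is drawn uniformly in a made-up unit box, and hits are shuffled so their order carries no information.

```python
import random

# Illustrative sketch; the function name, hit format, and noise box are assumptions.


def overlay_event(quirk_tracks, background_events, n_noise=0, seed=0):
    """Merge one signal event with background events.

    Returns hits as (x, y, z, track_id, is_target) tuples, where is_target
    marks the hits the tracker should learn to find (the quirks).
    """
    rng = random.Random(seed)
    hits, next_id = [], 0
    for track in quirk_tracks:
        hits += [(*point, next_id, True) for point in track]
        next_id += 1
    for event in background_events:  # e.g. two ttbar events per quirk event
        for track in event:
            hits += [(*point, next_id, False) for point in track]
            next_id += 1
    hits += [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1), -1, False)
             for _ in range(n_noise)]
    rng.shuffle(hits)
    return hits
```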
So how often does it fit?
Yeah, almost never.
So the fake rate is very, very low.
Sorry, and when you train, you train on like one quirk string tension or something.
But is there some?
I'll talk about that in a minute.
Yes, exactly.
Right.
So in the paper, we scan these two parameters.
There's the confinement scale and the mass.
And we measure the efficiency across this whole plane.
And in some cases, the efficiency is excellent.
In some cases, it's pretty good.
In some cases, it's a little lower.
These cases are mostly when the two quirks are overlapping on top of each other.
And so you can't reconstruct both of them.
So your maximum efficiency is 50% anyway, like here.
So you say, you train, you're training one.
Yeah.
Exactly.
And so what I'm doing here is I have, you know, what, 20 examples?
These are 20 trackers, each one for a specific quirk parameter.
Now if I'm going to go to ATLAS and say,
I want to run a new tracking algorithm on the data,
they're going to laugh at me anyway,
even if I have a great idea.
And then if I say I have 20 tracking algorithms
I want to run, it's never going to happen.
And so we don't want to develop an individual tracking algorithm
for every point.
So we are interested in generalizing.
Can we develop a single one that can find any kind of quirk?
So we did some generalization tests.
We said, what if we train on one point,
and then test on that point?
Okay, so that's the baseline.
But what if we train on everything but this point?
We have all these points that aren't our target.
We train on all the other ones and test on one of these points.
What's the efficiency?
It goes down, but it's not devastating.
Which tells you that it's learning something quirky.
It's not just memorizing these tracks or these paths;
there's something about the quirkiness of these tracks
that it can generalize across these parameters.
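The two generalization tests (train and test on the same parameter point, versus train on all the other points and test on the held-out one) can be organized as in this Python sketch. The train and evaluate callables are placeholders for whatever model and efficiency metric you use, the function name is hypothetical, and each parameter point, for example a (mass, confinement scale) pair, is assumed to come with its own training and test examples.

```python
# Illustrative sketch; the function name and data layout are assumptions.


def generalization_test(datasets, train, evaluate):
    """datasets: parameter point -> (train_examples, test_examples).

    Returns (baseline, held_out) dicts of scores per parameter point.
    """
    baseline, held_out = {}, {}
    for point, (train_ex, test_ex) in datasets.items():
        # Baseline: train and test on the same parameter point.
        baseline[point] = evaluate(train(train_ex), test_ex)
        # Leave-one-out: train on every other point, test on this one.
        others = [ex for other, (tr, _) in datasets.items() if other != point for ex in tr]
        held_out[point] = evaluate(train(others), test_ex)
    return baseline, held_out
```

With a pure memorizer (train returns the set of seen examples), the baseline can look fine while the held-out score collapses; a model that has learned something general should degrade much less.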
Then we did another one similar for a different point, and you see
a similar relative behavior. So this was exciting for us because the whole time, like, yeah,
quirks, whatever, I'm not actually interested in finding quirks. I want to find something new and
weird, and our hope was to use this as a stepping stone to generalize this into finding
any kind of weird track. All right. So before I get there, let me answer your question in more detail,
I think it was yours, or somebody's, about the fakes, because you're going to run this thing on
data, there's lots and lots and lots of tracks in there, how often do you get something that
looks like a quirk but is actually a standard model track because you're drowning in standard
model tracks.
So what I did is I wrote a quirk fitter that takes a bunch of points and, just like a helix
fitter, finds the best parameters of quirks that describe those points.
And so then you can calculate the chi-squared of the quirk hypothesis.
What is the distance from those points to the quirk track, the best fit quirk track, and just
like you can for the helix.
And then you can effectively subtract those two chi-squared values.
It's like a likelihood ratio of your quirk to helix hypothesis.
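For Gaussian hit uncertainties, -2 ln(L_helix / L_quirk) = χ²_helix - χ²_quirk, which is the sense in which the difference acts like a likelihood ratio. The Python sketch below computes only the helix side, simplified to a circle fit in the transverse plane (the Kåsa algebraic fit) with a single assumed per-hit resolution sigma. The quirk fitter itself is not reproduced here, so its χ² is taken as an input; function names are hypothetical. Larger positive values favor the quirk hypothesis.

```python
import math

# Illustrative sketch; function names and the single resolution sigma are assumptions.


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points):
    """Kasa fit: least-squares D, E, F in x^2 + y^2 + D x + E y + F = 0."""
    rows = [(x, y, 1.0) for x, y in points]
    rhs = [-(x * x + y * y) for x, y in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    d = det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("degenerate points; circle fit is undefined")
    solution = []
    for k in range(3):  # Cramer's rule on the 3x3 normal equations
        m = [row[:] for row in ata]
        for i in range(3):
            m[i][k] = atb[i]
        solution.append(det3(m) / d)
    big_d, big_e, big_f = solution
    cx, cy = -big_d / 2, -big_e / 2
    return (cx, cy), math.sqrt(cx * cx + cy * cy - big_f)


def chi2_helix(points, sigma):
    """Chi-squared of transverse hits about the best-fit circle."""
    (cx, cy), r = fit_circle(points)
    return sum(((math.hypot(x - cx, y - cy) - r) / sigma) ** 2 for x, y in points)


def delta_chi2(chi2_helix_value, chi2_quirk_value):
    """Helix minus quirk chi-squared; large positive values favor the quirk hypothesis."""
    return chi2_helix_value - chi2_quirk_value
```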
So here are standard model tracks, and here are quirk tracks.
And you see the separation.
This is a log scale.
So most of the quirk tracks are way over here, and the standard model tail falls very,
very rapidly.
There is some overlap.
There are some quirk tracks that don't fit great to the quirk hypothesis because they're
missing some of the hits.
There are some standard model tracks that
do look a little quirky that the quirk fitter finds some parameters.
It turns out that's when the tracks are almost back to back.
Essentially you have two standard model tracks.
If you have a standard model track and you try to fit it to the quirk, it will find, it will
assume there was another quirk back to back and that the oscillation length was too small
or too big to see.
So you can almost always find some place in quirk parameter space to describe a standard model
track.
But then you get these two tracks that are back to back.
So if you make the cut here, this is all the standard model tracks, you can remove essentially all of the background.
And, you know, how efficient you want to be versus how much you want to tolerate background is a question of choice.
But these are definitely two separable populations.
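Choosing where to cut is a trade-off you can tabulate. This Python sketch (our illustration; function names are hypothetical and the scores and cuts in the tests are made up) reports, for each cut on a Δχ²-style score, the fraction of quirk tracks kept and the fraction of standard model tracks that leak through.

```python
# Illustrative sketch; function names are hypothetical.


def cut_performance(signal_scores, background_scores, cut):
    """Return (signal efficiency, background pass fraction) for score > cut."""
    sig_eff = sum(s > cut for s in signal_scores) / len(signal_scores)
    bkg_pass = sum(b > cut for b in background_scores) / len(background_scores)
    return sig_eff, bkg_pass


def scan_cuts(signal_scores, background_scores, cuts):
    """Tabulate (cut, signal efficiency, background pass fraction) for several cuts."""
    return [(c, *cut_performance(signal_scores, background_scores, c)) for c in cuts]
```

Raising the cut trades quirk efficiency for a cleaner sample; where to stop is, as said above, a choice.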
So that was studying the background. Yes.
Yeah.
Yeah, we sign every fit.
When you want to make sure you're getting all the hits, otherwise you just have like a thousand hits,
but you picked up two tracks and then you're going to be able to.
Do you understand what I'm asking?
Are you asking whether random hits that are not from quirks get fitted as quirks or whether you're using up all the hits?
Yeah, whether you're using up all the hits.
Because I could imagine some approach where I'm going to fit all the standard model things and just look at what's left behind.
Oh, I see.
Something like this or you're a lot of...
Like you want to get all the particles.
Right.
How do you know you're picking up everything?
I don't think you're usually using all the hits because there's also a lot of noise.
Yeah, it's an interesting question.
In these studies we don't have pileup, but we did add random hits as noise.
And I think you might be suggesting that, as a strategy for finding weird tracks, you could
just fit all the standard model tracks, remove them, and then consider what's left.
Right, yeah.
Yeah, which I think is a reasonable thing to do. And it's sort of what's happening here,
because here I'm also fitting all the tracks to the standard model hypothesis and using
that as a way to reject them. But you're suggesting not even including them in the track
finding, which I think is a reasonable strategy, but we haven't explored that.
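Here is a sketch of the "fit the standard model first, then look at what's left" strategy raised in the question. The speaker says they have not explored it. The sketch assumes a conventional helix track finder has already returned, for each candidate, its hit indices, chi-squared, and number of degrees of freedom. The function name and the chi-squared-per-degree-of-freedom threshold are hypothetical.

```python
def leftover_hits(n_hits, tracks, max_chi2_per_dof=3.0):
    """Hypothetical helper: indices of hits not claimed by any well-fit helix track.

    tracks: iterable of (hit_indices, chi2, ndof) from a standard helix track finder.
    Hits on tracks with chi2/ndof below the threshold are treated as standard model
    and removed; everything else is left over for a weird-track search.
    """
    used = set()
    for hit_indices, chi2, ndof in tracks:
        if ndof > 0 and chi2 / ndof < max_chi2_per_dof:
            used.update(hit_indices)
    return sorted(set(range(n_hits)) - used)


# Two candidate tracks among 10 hits: the first fits a helix well, the second does not.
tracks = [([0, 1, 2, 3], 2.0, 2), ([4, 5, 6, 7], 40.0, 2)]
print(leftover_hits(10, tracks))  # [4, 5, 6, 7, 8, 9]
```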
So for these quirks, do you assume a specific confinement scale?
No, it fits for the best confinement scale.
Yeah, which is why it always ends up here for standard model tracks.
It finds one that effectively makes the quirks look like straight lines.
Okay.
And then you can calculate how many quirks you would see, et cetera, et cetera.
The next step is to actually get this into Geant and do realistic studies; this was a sort of proof
of principle.
We had a paper out last year.
You can go and read up on the details.
But that was really just the warm-up, because what we actually want to do is find
something weird. Quirks are interesting, they're not helical, but they're still an idea
somebody had. And what I really want to do is find an idea that nobody had, a discovery
that's a surprise. So, number one, how do you find things when you don't
know what you're looking for, and how do you even define the space of things
you might be looking for so you can train your network? Our goal is to find all
possible weird tracks, even unanticipated ones. So I pitched this problem to Levi,
a grad student at UCI.
I said, here's a hard problem.
What do you think about it?
And he said, well, let's make some restrictions.
You can't be sensitive to everything,
because then you could take literally any set of hits
and say, I have a magic unicorn that flies
through my detector and deposits these hits.
And any set of hits could be a hypothesis.
So he said, let's restrict it to any smooth path,
something that is infinitely differentiable,
because then you could imagine there's some force
that's causing the particle to move that way.
And then let's describe the path of the particles in the basis of Fourier modes.
So you can take any path through space and express it as a sum of Fourier modes at various
frequencies.
And he discovered this mathematical tool called the Schwartz function, which guarantees
that your path is smooth.
If the amplitude at high frequency falls fast enough, then your path
is guaranteed to be smooth, satisfying this infinite-differentiability condition.
So this is called the Schwartz function.
I'd never heard of it before.
Turns out there's a guy in our math department
who's an expert in them.
So that was cool.
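For reference, below is the standard definition of the Schwartz space for one coordinate of a path, together with the textbook reason that rapidly decaying Fourier amplitudes guarantee smoothness. This is general analysis, not a detail taken from the paper.

```latex
\mathcal{S}(\mathbb{R}) = \Bigl\{\, f \in C^{\infty}(\mathbb{R}) \;:\;
  \sup_{x \in \mathbb{R}} \bigl| x^{m} f^{(n)}(x) \bigr| < \infty
  \ \text{for all integers } m, n \ge 0 \Bigr\}

% Smoothness from fast-decaying Fourier amplitudes (periodic case):
x(t) = \sum_{k \in \mathbb{Z}} c_k \, e^{2\pi i k t}, \qquad
|c_k| \le C_N \, (1 + |k|)^{-N} \ \text{for every } N
\;\Longrightarrow\;
x^{(n)}(t) = \sum_{k \in \mathbb{Z}} (2\pi i k)^{n} c_k \, e^{2\pi i k t}
\ \text{converges uniformly for every } n, \ \text{so } x \in C^{\infty}.

% Example: Gaussian amplitudes, c_k \propto e^{-k^2 / (2 w^2)}, decay faster than any power of k.
```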
And then you can generate example Schwartz functions, or tracks built from a Schwartz function.
And here's one.
So this one, right, it starts at the vertex and then does this crazy thing.
And it's definitely not a quirk,
it's not a helix, it's some other weird path.
It's just an example of the kind of weird track we're hoping to be able to find.
In order to guarantee that the path is infinitely differentiable, that it doesn't have a kink in it,
we limit the amplitude of the high-frequency modes.
Because if the high-frequency modes can have very large amplitudes, then you can get crazy behavior.
So we don't want to describe any possible function.
We want to describe a subset of functions that are smooth, and this is the way that we can sample from the space
of smooth paths. And here's a distribution of the number of hits
in a 25-layer detector.
If you start with just one mode,
like you just pick one frequency,
then you get this distribution.
And as you go up to 25 modes,
you start to get some pretty weird tracks
that go in and out and do weird stuff,
which is a lot of fun.
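Below is a minimal Python sketch of this kind of sampler. It rests on several assumptions. The path is a straight outward segment plus sine modes, sin(kπt), which all vanish at t = 0, so every path starts at the vertex. Mode amplitudes are drawn with a Gaussian envelope in k, one possible rapidly decaying choice. A hit is recorded each time the path crosses one of 25 cylindrical layers. The parameter values (`length`, `amp`, `width`) are hypothetical. With a finite number of modes every such path is already infinitely differentiable; the decaying envelope is what keeps the high-frequency wiggles small as more modes are added.

```python
import numpy as np


def sample_smooth_path(rng, n_modes, length=30.0, amp=3.0, width=4.0, n_steps=4001):
    """Hypothetical generator: a straight outward path plus n_modes sine modes.

    Mode k gets a random amplitude whose scale falls off like a Gaussian in k,
    suppressing high-frequency wiggles. sin(k*pi*t) vanishes at t=0, so every
    path starts at the vertex (the origin) and ends at radius `length`.
    """
    t = np.linspace(0.0, 1.0, n_steps)
    direction = rng.uniform(0.0, 2.0 * np.pi)
    x = length * t * np.cos(direction)
    y = length * t * np.sin(direction)
    for k in range(1, n_modes + 1):
        scale = amp * np.exp(-k**2 / (2.0 * width**2))
        x += rng.normal(0.0, scale) * np.sin(k * np.pi * t)
        y += rng.normal(0.0, scale) * np.sin(k * np.pi * t)
    return x, y


def count_hits(x, y, layer_radii):
    """One hit each time the sampled path's radius crosses a layer radius (inward or outward)."""
    r = np.hypot(x, y)
    hits = 0
    for radius in layer_radii:
        f = r - radius
        outward = (f[:-1] < 0) & (f[1:] >= 0)
        inward = (f[:-1] >= 0) & (f[1:] < 0)
        hits += int(np.count_nonzero(outward | inward))
    return hits


layers = np.arange(1, 26, dtype=float)  # 25 cylindrical layers, unit spacing
rng = np.random.default_rng(0)
for n_modes in (0, 1, 25):
    counts = [count_hits(*sample_smooth_path(rng, n_modes), layers) for _ in range(200)]
    print(n_modes, "modes: mean hits =", np.mean(counts))
```

Because every path here ends beyond the outer layer, each layer is crossed at least once; adding modes only adds extra in-and-out crossings.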
So we played the same game.
We said, well, let's see what happens.
So we started out by training our pipeline
on the standard model and saying,
let's feed it some weird tracks.
And so this is the number of
Fourier modes, up to 25.
And you see it does terribly.
Occasionally one of these things will happen to look helical enough to be reconstructed,
but usually not.
So it's a hard enough problem.
And then we said, well, let's train it on our tracks.
So let's feed it a bunch of weird tracks and see if it can learn the relationship between
the hits so that it can reconstruct them.
And here, for training and testing, these are one-frequency modes,
but obviously different tracks.
We're not giving it exactly the same tracks, just new samples from the same space.
And it can do it.
So it's not rejecting these tracks.
It's rejecting helices mostly, but it's not rejecting these tracks, which is exciting.
And then we said, all right, well, let's do it with standard model tracks.
So we now are mixing standard model tracks and weird tracks together, training it on that,
and measuring the efficiency on the weird tracks.
Still pretty good.
An important detail here is that now, because standard model
tracks are in principle a subset of the weird tracks, we can't label them as "don't find
these," because we want a generalized track finder.
So we might use your idea, like first find those, delete those hits, and then run this track finder.
But here we didn't negatively label those tracks.
Well, it's continuous?
It's continuous.
It doesn't have to go to the outer layer?
No.
But it has to start at the interaction point and then follow a smooth trajectory.
I think I have some examples.
Yeah.
And so as you get more modes, I think it's maybe learning something more general about
the problem.
But I'll talk about generalization in a minute.
Here are some examples of tracks that it has found.
So the green circles are the hits on the reconstructed track.
Again, it doesn't find all of them, but it finds this guy.
It identifies it.
It gets the majority of the hits.
And these have ttbar events also.
I'm just not showing those.
Okay, so the question then is, well, you've trained it on a certain set of tracks,
and you've tested it on other tracks, and the efficiency is pretty good,
but have you really learned to find any smooth path?
That's the big claim.
How do you span that whole space?
Well, we can't.
And one reason that we can't is that there's an infinite number of possible Schwartz functions.
A Schwartz function satisfies a condition about how quickly the amplitudes fall.
A Gaussian satisfies it, but there's an infinite number of them.
So you could choose any of them, and they each define a different space
of smooth tracks.
But what we can do is try to sample this space.
We can choose a couple different functions
and see what happens.
So if we make one choice of Schwartz function,
A, for example, and train and test on that one,
we get pretty good efficiency.
If we train on that same set and then we test
on a different set, B, where we remove the overlap,
so we remove all tracks that are in both A and B
and test only on the B tracks, we get good efficiency,
even higher efficiency.
Okay, so it's definitely learning to reconstruct tracks that are not represented in its training
sample.
It's not just learning some specific frequencies or Fourier modes, or memorizing some details
about them, it's learning some property of these weird tracks, some smoothness that it's
then using to be able to generalize to other tracks it's not seen before, which was very
exciting to me.
And again, you have to worry about fakes.
Like if you're talking about weird tracks, you're also going to be susceptible to some garbage, or something that radiated away, or something that just looks ugly.
And so here we don't have a hypothesis to fit with, as we do with the quirks, where we can fit to the quirk hypothesis and compare it to the helix.
All we can do is try to fit to a helix.
And sometimes, like this thing, it just does not fit a helix.
You get a terrible chi-squared.
But sometimes a weird track will get a pretty reasonable chi-squared for the helix hypothesis.
So you can't be sensitive to all of them, but you definitely can reject a lot of the standard model background, because
it looks like standard model tracks.
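Here is a minimal sketch of the helix test in the transverse plane, where a helix projects to a circle. It uses the algebraic (Kåsa) circle fit as a simple stand-in for a full helix fit. It assumes 25 layers, a 0.01 hit resolution, and a toy "weird" track with a smooth azimuthal wiggle; all numbers are illustrative.

```python
import numpy as np


def fit_circle(x, y):
    """Algebraic (Kasa) circle fit: solve x^2 + y^2 + D*x + E*y + F = 0 by linear least squares."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    xc, yc = -D / 2.0, -E / 2.0
    R = np.sqrt(xc**2 + yc**2 - F)
    return xc, yc, R


def circle_chi2(x, y, sigma):
    """Chi-squared of hits about the best-fit circle (radial residuals, Gaussian errors sigma)."""
    xc, yc, R = fit_circle(x, y)
    residuals = np.hypot(x - xc, y - yc) - R
    return float(np.sum((residuals / sigma) ** 2))


rng = np.random.default_rng(0)
radii = np.arange(1, 26, dtype=float)  # 25 layers
sigma = 0.01                           # assumed hit resolution
kappa, phi0 = 0.02, 0.3                # toy curvature and initial direction

# Helical track through the origin: a circle in the transverse plane.
phi = phi0 + np.arcsin(kappa * radii / 2.0)
x_helix = radii * np.cos(phi) + rng.normal(0.0, sigma, radii.size)
y_helix = radii * np.sin(phi) + rng.normal(0.0, sigma, radii.size)

# Toy "weird" track: same start, plus a smooth wiggle in azimuth.
phi_weird = phi + 0.05 * np.sin(radii / 3.0)
x_weird = radii * np.cos(phi_weird) + rng.normal(0.0, sigma, radii.size)
y_weird = radii * np.sin(phi_weird) + rng.normal(0.0, sigma, radii.size)

ndof = radii.size - 3  # three circle parameters
print("helix chi2/ndof:", circle_chi2(x_helix, y_helix, sigma) / ndof)
print("weird chi2/ndof:", circle_chi2(x_weird, y_weird, sigma) / ndof)
```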
So before I move on to the next step, any questions? That's essentially what I'm trying
to do here: I'm trying to show how you could try to filter these things out.
Effectively, you keep what has a bad chi-squared, so you have to remove the standard model tracks
that also have a bad chi-squared.
That's dangerous, because the performance here comes down to cutting this tail.
And this tail depends a lot on the details.
Like in this simplified environment, most of your standard model tracks fit well to a
helix.
You go to a more realistic environment, you're going to get all sorts of crap that's happening.
This tail is going to get worse and worse and worse.
So when we go to a more realistic environment, I think this is going to get a lot harder.
So then I got interested in this question of fitting, because essential to discovering
a weird track is identifying the standard model tracks
and removing them.
This is essentially anomaly detection.
In the language of anomaly detection,
we typically use an autoencoder,
which is a classic approach.
With an autoencoder, you start from some dataset,
which is physical, say, for example, hits.
And you find a mapping to a latent space,
which is lower dimensional,
and you train your autoencoder to find a mapping
so that it can map to the latent space
and then back and recover
the original data.
If you can do that, then you've done some sort of effective compression.
You've found some way to encapsulate your complicated data in a lower-dimensional space so that
you can expand it back out.
That's the principle of anomaly detection: anything that doesn't come back out well is anomalous.
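The autoencoder idea can be sketched with its simplest instance: a linear autoencoder, which for squared reconstruction error spans the same subspace as principal component analysis. This is a swapped-in technique for illustration, not the network used in the talk. The toy "hits" are azimuths on 25 layers for nearly straight tracks, phi = phi0 + kappa*r/2, so normal tracks lie close to a two-dimensional subspace. The track model, noise level, and wiggle are assumptions.

```python
import numpy as np


class LinearAutoencoder:
    """PCA used as a linear autoencoder: encode = project onto top-k directions, decode = map back."""

    def __init__(self, n_latent):
        self.n_latent = n_latent

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Rows of Vt are principal directions, ordered by explained variance.
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_latent]
        return self

    def encode(self, X):
        return (X - self.mean_) @ self.components_.T

    def decode(self, Z):
        return Z @ self.components_ + self.mean_

    def anomaly_score(self, X):
        """Mean squared reconstruction error per sample."""
        return np.mean((X - self.decode(self.encode(X))) ** 2, axis=1)


rng = np.random.default_rng(0)
radii = np.arange(1, 26, dtype=float)


def toy_tracks(n, wiggle=0.0):
    """Hit azimuths on 25 layers: phi = phi0 + kappa*r/2, plus an optional smooth wiggle and noise."""
    phi0 = rng.uniform(-np.pi, np.pi, size=(n, 1))
    kappa = rng.normal(0.0, 0.01, size=(n, 1))
    phi = phi0 + kappa * radii / 2.0 + wiggle * np.sin(radii / 3.0)
    return phi + rng.normal(0.0, 1e-3, size=phi.shape)


ae = LinearAutoencoder(n_latent=2).fit(toy_tracks(2000))
print("normal:", ae.anomaly_score(toy_tracks(5)))
print("weird: ", ae.anomaly_score(toy_tracks(5, wiggle=0.05)))
```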
But tracking is essentially already that.
We start from a hit space and then we try to describe that using a lower dimensional parameter
space, the helical parameters, right?
Essentially you're saying, I'm going to summarize all of these hits with
a five-parameter helix.
And if I've done it correctly, then when I go the other direction, I could predict where
the hits were, if I know where the detector layers are.
And if I try to fit something which is totally not a helix, I'm going to get some crazy fit,
which when I go back is not going to recover my original hits.
So tracking already is in the language of anomaly detection.
But the problem with tracking is that we know how to go from parameters to hits.
If I tell you a helix, I could tell you exactly where that helix goes.
But I don't know how to go from hits to parameters.
If I give you a set of hits, there's no function I can write down that says, here are the
helical parameters.
This mapping is unknown.
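The easy forward direction fits in a few lines. This sketch keeps only the transverse plane and assumes the track starts at the origin. Of the five helix parameters, only the initial direction phi0 and the signed curvature kappa remain: the two impact parameters are set to zero and the dip angle is ignored. A circle through the origin reaches radius r at azimuth phi0 + arcsin(kappa*r/2). The function name is hypothetical.

```python
import numpy as np


def helix_hits_transverse(kappa, phi0, layer_radii):
    """Forward map from track parameters to hit positions (transverse plane only).

    Assumes the track starts at the origin with initial direction phi0 and signed
    curvature kappa (1/radius). A circle through the origin reaches radius r at
    azimuth phi0 + arcsin(kappa * r / 2); layers it cannot reach get NaN.
    """
    r = np.asarray(layer_radii, dtype=float)
    s = kappa * r / 2.0
    reachable = np.abs(s) <= 1.0
    phi = np.full_like(r, np.nan)
    phi[reachable] = phi0 + np.arcsin(s[reachable])
    return r * np.cos(phi), r * np.sin(phi)


# Curvature 0.05 means a circle of radius 20, so it never reaches the layer at r = 50.
x, y = helix_hits_transverse(kappa=0.05, phi0=0.0, layer_radii=[10.0, 20.0, 30.0, 50.0])
print(np.column_stack([x, y]))
```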
And so traditionally, what do we do?
Well, we do this horrible scan where you explore this space and you say, how about this point?
And you map it back to hits and compare the hits.
No, that didn't work.
How about this point?
No, how about this point?
And you have this optimization problem where you're scanning the parameter space looking for the point,
which best describes your hits.
And yes, we have optimizers, but they're all terrible.
Anybody who has worked on a realistic experiment knows there's always one guy, usually some Russian dude,
whose job it is to make the track fitter work.
And there's so many things in there that you need to tune just right because it's such a pain.
And tracking is filled with local minima.
It's a really hard task.
And it's slow.
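The "how about this point?" loop can be written as a literal brute-force scan. This sketch is deliberately naive. It uses the transverse-plane circle model phi(r) = phi0 + arcsin(kappa*r/2) in a self-contained form, fits only (kappa, phi0), and evaluates chi-squared on a fixed grid. Real track fitters use iterative methods instead. The grid ranges, resolution, and true values are assumptions.

```python
import numpy as np


def hit_azimuths(kappa, phi0, radii):
    """Transverse-plane circle through the origin: phi(r) = phi0 + arcsin(kappa * r / 2).

    The argument is clipped to [-1, 1] so unreachable layers do not produce NaNs during the scan.
    """
    return phi0 + np.arcsin(np.clip(kappa * radii / 2.0, -1.0, 1.0))


def scan_fit(measured_phi, radii, sigma, kappas, phi0s):
    """Brute-force fit: evaluate chi-squared at every (kappa, phi0) grid point, keep the minimum."""
    predicted = hit_azimuths(kappas[:, None, None], phi0s[None, :, None], radii[None, None, :])
    chi2 = np.sum(((measured_phi - predicted) / sigma) ** 2, axis=-1)  # shape (n_kappa, n_phi0)
    i, j = np.unravel_index(np.argmin(chi2), chi2.shape)
    return chi2[i, j], kappas[i], phi0s[j]


rng = np.random.default_rng(1)
radii = np.arange(1, 26, dtype=float)  # 25 layers
sigma = 1e-3                           # assumed azimuthal resolution (radians)
true_kappa, true_phi0 = 0.012, 0.4
measured = hit_azimuths(true_kappa, true_phi0, radii) + rng.normal(0.0, sigma, radii.size)

chi2_min, kappa_hat, phi0_hat = scan_fit(
    measured, radii, sigma,
    kappas=np.linspace(-0.03, 0.03, 151),
    phi0s=np.linspace(0.0, 1.0, 251),
)
print(f"chi2={chi2_min:.1f}  kappa={kappa_hat:.4f}  phi0={phi0_hat:.3f}")
```

Even this small two-parameter grid costs tens of thousands of chi-squared evaluations per track, and its precision is limited by the grid spacing. Both costs grow quickly with the full five parameters.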
If you have to run your optimizer every time and restart it
from different initial conditions just to make sure you've found the minimum,
it's a pain.
So it's slow, and it's unreliable.
In addition, it assumes Gaussian hit noise, right?
In this parameterization, you're assuming that your hits have a Gaussian distribution.
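In symbols, let m_i be the measured hit coordinate on layer i, h_i(θ) the position predicted by a helix with parameters θ, and σ_i the hit resolution. The conventional fit minimizes chi-squared, which is the maximum-likelihood estimate exactly when the hit noise is independent and Gaussian:

```latex
\chi^2(\boldsymbol{\theta}) = \sum_{i=1}^{N} \frac{\bigl(m_i - h_i(\boldsymbol{\theta})\bigr)^2}{\sigma_i^2},
\qquad
\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = \arg\min_{\boldsymbol{\theta}} \chi^2(\boldsymbol{\theta})

m_i \sim \mathcal{N}\bigl(h_i(\boldsymbol{\theta}), \sigma_i^2\bigr) \ \text{independently}
\;\Longrightarrow\;
-2 \ln L(\boldsymbol{\theta}) = \chi^2(\boldsymbol{\theta}) + \sum_{i=1}^{N} \ln\bigl(2\pi\sigma_i^2\bigr)
```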
And finally, it optimizes for the thing we're not interested in.
It optimizes for the point in parameter space that brings the track closest to my hits.
But what we're actually interested in are the parameters.
Like, say these are your hits, and the true track is the green line.
The fitted track might be this red one, because what you're optimizing for is the distance from the fitted track to the hits.
Nobody actually cares about the distance between the fitted track and the hits.
What you care about are what was the momentum of this electron, which means what you want to know is,
what was the curvature, what was the direction.
Those are the physical things we're interested in.
We don't care about this distance.
And one lesson from machine learning is to make sure you're optimizing for the thing you actually care about.
Not some proxy, which usually is a good proxy for it, but sometimes isn't, right?
Always optimize for the thing you actually care about.
So I said, well, we don't know this function to go from hits to parameters.
What if we just try to learn it?
Instead of searching this space here, let's learn this mapping from hits to parameters.
There's no search.
There's no assumption of Gaussian distributions.
And it can optimize for the thing we're actually interested in, which is the parameters.
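The contrast can be written as two objectives. Here (m_j, θ_j) are simulated hits and their true parameters, and f_w is the learned map with weights w. The mean-squared-error loss is an assumption, since the talk does not name the loss. Under that loss, with enough data and a flexible enough model, f_w approaches the conditional mean E[θ | m]. Evaluating it is a single forward pass with no search.

```latex
\text{hit space (least squares):}\quad
\hat{\boldsymbol{\theta}}(\mathbf{m}) = \arg\min_{\boldsymbol{\theta}}
  \sum_{i} \frac{\bigl(m_i - h_i(\boldsymbol{\theta})\bigr)^2}{\sigma_i^2}

\text{parameter space (learned):}\quad
\mathbf{w}^{\ast} = \arg\min_{\mathbf{w}} \frac{1}{J} \sum_{j=1}^{J}
  \bigl\lVert f_{\mathbf{w}}(\mathbf{m}_j) - \boldsymbol{\theta}_j \bigr\rVert^2,
\qquad
\hat{\boldsymbol{\theta}}(\mathbf{m}) = f_{\mathbf{w}^{\ast}}(\mathbf{m})
```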
So I gave this task to a machine learning grad student, and he came up with a model which
takes hits as input and gives parameters as output.
Because there is a mapping, right?
We don't know it.
It might not be easy to express, but it exists.
Each set of hits defines parameters, right?
So you should be able to learn the parameters of the track, the helical parameters.
Well, one of the parameters
includes distance of closest approach.
So with the origin...
Well, the track is not required
to go through the origin.
Now, the problem is that
there is multiple scattering and random material effects,
which could take an electron...
Absolutely.
Absolutely, yes.
So it would be targeting the origin
or some other point that...
I guess you...
Well, thank you for raising that.
The beauty of this is that we don't have to assume,
for example,
that the hits have a Gaussian distribution.
We can include all of that stuff implicitly
in the training sample.
Give it a bunch of electrons and say,
here were the true parameters of the electrons,
here were the hits.
And it'll learn to associate those.
Where is the origin, right?
You need to pass the true value, absolutely.
And you can generate...
Well, the five parameters define the helix,
but that doesn't require it to go through the origin.
So wherever you generated your tracks,
maybe they all came from the origin,
in which case they do pass through it,
or maybe you have a spread
of them. Is that what you're asking about, how we generated the training sample?
No, I mean...
Well, I mean, it's...
Material effects will make it not move in a helical path, right?
They'll deviate from a perfect helix.
Yeah.
Maybe we can talk about it offline, yeah.
I think we're having a semantic issue.
Anyway, he produced a model which learns to map hits to helical parameters.
And in the case that the hits actually are Gaussian distributed around the track,
it reproduces the performance of least-squares fitting.
So here's the residual in a couple of the parameters.
I'm not showing all of them.
They're in the paper.
And this is the difference, so you want it to peak at zero
and to be narrow.
So the neural network has the same performance
as the least-squares fit.
And least squares, in the case of Gaussian-distributed hits,
is the best you can do in terms of minimizing the residual.
And so we hope to match it.
And in this case, the neural network does match
the idealized least-squares fit, which is good.
So we show that we can learn this mapping.
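Below is a toy version of this comparison, under simplifying assumptions. Hits are azimuths on 25 layers in the small-curvature approximation phi_i = phi0 + kappa*r_i/2, so both fits are linear. Ordinary linear regression trained on simulated (hits, parameters) pairs stands in for the neural network. With Gaussian noise, the learned map should reproduce the least-squares resolution. The sample sizes and resolutions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
radii = np.arange(1, 26, dtype=float)                         # 25 layers
sigma = 1e-3                                                  # assumed azimuthal resolution
design = np.column_stack([np.ones_like(radii), radii / 2.0])  # phi_i ~ phi0 + kappa * r_i / 2


def gaussian_noise(size):
    return rng.normal(0.0, sigma, size)


def simulate(n, noise):
    """Toy sample: true (phi0, kappa) per track and the resulting hit azimuths."""
    theta = np.column_stack([rng.uniform(-1.0, 1.0, n), rng.normal(0.0, 0.01, n)])
    return theta @ design.T + noise((n, radii.size)), theta


def least_squares(hits):
    """Classic fit: minimize the hit-space chi-squared (equal resolutions, so plain least squares)."""
    theta, *_ = np.linalg.lstsq(design, hits.T, rcond=None)
    return theta.T


def train_regressor(hits, theta):
    """Learned fit: regress true parameters on hits (a linear stand-in for the neural network)."""
    X = np.column_stack([hits, np.ones(len(hits))])  # hits plus a bias column
    W, *_ = np.linalg.lstsq(X, theta, rcond=None)
    return lambda h: np.column_stack([h, np.ones(len(h))]) @ W


train_hits, train_theta = simulate(20000, gaussian_noise)
test_hits, test_theta = simulate(5000, gaussian_noise)
learned = train_regressor(train_hits, train_theta)

res_ls = least_squares(test_hits) - test_theta
res_learned = learned(test_hits) - test_theta
for k, name in enumerate(["phi0", "kappa"]):
    print(f"{name} residual width: least squares {res_ls[:, k].std():.2e}, "
          f"learned {res_learned[:, k].std():.2e}")
```

In this linear toy the two estimators end up nearly the same linear function of the hits, so matching widths is the expected outcome. A flexible model such as a neural network matters when the true map is nonlinear.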
But what if you have a more complicated situation?
Let's skip that for now.
So what if you have a more complicated situation?
What if your hits are not Gaussian distributed?
What if instead they're skewed from the true path,
or there's an offset, or there's multiple scattering?
Now you can describe that implicitly in your training sample
just by adding those deviations to the hits in the training sample,
and the neural network will still learn the
true parameters, right?
Because it has seen the examples.
It says if your hits are over here, the true track was over there.
It will learn that relationship.
Whereas with least squares, you have to encode your exact noise distribution directly into the model.
Here, we only have to encode it implicitly.
We can run some complicated simulation or whatever.
We can include as many effects as we want implicitly in the training sample.
The network will learn that.
Least squares has to explicitly encode in the chi-squared exactly what the noise model is.
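The same kind of toy can illustrate the implicit noise model. Here the assumed detector response is one-sided (exponential) smearing, which is both skewed and offset from the true path. The least-squares fit assumes zero-mean Gaussian noise, so it absorbs the offset into phi0 and is biased. The regressor trained on simulated pairs learns the correction without the noise model ever being written down. As before, the small-curvature model, the linear regressor standing in for the network, and all numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
radii = np.arange(1, 26, dtype=float)
sigma = 1e-3
design = np.column_stack([np.ones_like(radii), radii / 2.0])  # phi_i ~ phi0 + kappa * r_i / 2


def skewed_noise(size):
    """Assumed non-Gaussian response: one-sided exponential smearing with mean sigma."""
    return rng.exponential(sigma, size)


def simulate(n):
    theta = np.column_stack([rng.uniform(-1.0, 1.0, n), rng.normal(0.0, 0.01, n)])  # (phi0, kappa)
    return theta @ design.T + skewed_noise((n, radii.size)), theta


def least_squares(hits):
    # The chi-squared fit assumes zero-mean noise, so a constant offset ends up in phi0.
    theta, *_ = np.linalg.lstsq(design, hits.T, rcond=None)
    return theta.T


def train_regressor(hits, theta):
    # The noise model is never written down; it is only present in the simulated training pairs.
    X = np.column_stack([hits, np.ones(len(hits))])
    W, *_ = np.linalg.lstsq(X, theta, rcond=None)
    return lambda h: np.column_stack([h, np.ones(len(h))]) @ W


train_hits, train_theta = simulate(20000)
test_hits, test_theta = simulate(5000)
learned = train_regressor(train_hits, train_theta)

bias_ls = np.mean(least_squares(test_hits)[:, 0] - test_theta[:, 0])
bias_learned = np.mean(learned(test_hits)[:, 0] - test_theta[:, 0])
print(f"phi0 bias, least squares: {bias_ls:.2e}")
print(f"phi0 bias, learned:       {bias_learned:.2e}")
```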
And of course this is much, much faster.
So it's about a thousand times faster than searching the parameter space to try to find
the best values of the parameter.
And it works very well as an anomaly detector.
So in the case that you have some deviation from Gaussianity, your neural network is much better
at separating non-helical tracks from helical tracks
than your least-squares fitter.
All right.
So in conclusion, machine learning tracking is much more flexible.
By separating the finding and the fitting,
we can do finding of non-helical tracks
and we can do fast and precise fitting of the helical tracks.
All right, thanks very much.
If this fascinating lecture by Daniel Whiteson sparked your interest,
don't let your curiosity end there.
Watch my full podcast conversation with Daniel
about his latest book, Do Aliens Speak Physics?, where we ask what the laws of nature say about
life beyond Earth. Don't forget to like this video, comment with your biggest question about
particle physics, and subscribe for more trips to the edge of the universe and beyond, and don't forget
to share it with your friends.
