Into the Impossible With Brian Keating - Physicists Missed These Particle Tracks for Decades (ft. Daniel Whiteson)


Starting point is 00:00:16 Today you're in for a special treat, a lecture delivered by UC Irvine professor Daniel Whiteson, who takes us straight to the edge of what we know and then pushes us over the edge.

Starting point is 00:00:39 Particles collide at near light speed, detectors light up, and data pours in by the petabyte. But what does it all mean? Most of it is noise, but somewhere deep inside is a signal that shouldn't exist unless nature is trying to tell us something new. This is how modern experimental physics works. Ask a precise question, build a machine to listen, and let the universe answer. Let's go with the incomparable Daniel Whiteson. But today I want to talk to you about a project I've been working on recently about finding weird tracks.

Starting point is 00:01:08 All right. So, very brief introduction to who I am. I have a sort of broad research program up at UC Irvine. My day job is working on ATLAS, so the competitors, or friendly collaborators, of your CMS colleagues here. We do Higgs precision measurements, machine learning unfolding, trigger and data acquisition stuff. But I also have folks in my group who do machine learning for physics who are not on ATLAS. And so, for example, we work on applying approximate symmetries or jet-parton matching. Today I'll be talking to you about the machine learning tracking projects they've been working on.

Starting point is 00:01:43 And I have a sideline in astrophysics, so I do some neutron star work, and have a project on building a cosmic ray telescope using smartphones, as well as a couple of projects in high energy theory, exploring high dimensional theoretical spaces with machine learning. And at the very end, I'll tell you a little bit about my experience in science communication as well if we have time. Okay, so today I'm talking about machine learning, and machine learning is everywhere in particle physics. You hear about it everywhere.

Starting point is 00:02:14 All the cool people are doing it. It's very powerful and popular, and it's been a big success. And mostly the story of machine learning is in optimization. We have really big data sets. We use physics intuition to compactify them to one or two dimensions so we can analyze them effectively, but that's limited because our intuition is limited and the data never perfectly follows it. So machine learning helps us extract more information from our data.

Starting point is 00:02:41 And that's wonderful, and that's fantastic, and I totally approve of it and participate in it. But I also hope for something more. I'm hoping that machine learning can do more than just improve our reconstruction of the data, but that it can knock down the walls that have prevented us from making certain kinds of discoveries. That it can let us tap into problems we always thought were impossible. We just sort of put them on the shelf and said, well, nobody's ever going to do that. So let's just not even think about it.

Starting point is 00:03:06 Now that we have these new tools, let's go back and look at that shelf of impossible problems and see if we can actually crack any of them. Because it would be awesome if machine learning could do more than just optimize our analysis. If you could make what was once impossible, possible again. And so I want to go back to like the old days of particle physics and think about what discovery used to mean. It used to be you could make a single event discovery. Back when the signals were high and the backgrounds were low, before all the easy stuff was taken, right? You could, like, see one positron, and then your data-to-Nobel-Prize ratio was one to one, right?

Starting point is 00:03:42 Pretty awesome ratio there. All the good stuff has been used up and so now we've got to make low-rate, high-background discoveries, which are mostly statistical. So you can't look at one event and say, this was a Higgs or that was a Higgs. It's all statistical, which doesn't mean it's not valid, but it's just sort of a different kind of experience. And the reason we do this is that the new signals are rare, right? And so you need high-rate experiments. And when you go from low-rate experiments to high-rate experiments, you have to upgrade your technology. You can't do tracking the way we used to do tracking with computers.

Starting point is 00:04:17 These are computers back in the day, humans doing computing, where you could look at the data and say, here's a track, here's a track, here's a track, here's a track. Because it turns out your brain is really good at that. But they can't keep up with sort of high rate modern tracking or high rate modern data taking. And so we have to move to a new kind of tracking, which allows for high rate analysis but comes at a cost because tracking is a hard problem. So let's make sure we're talking about the same thing.

Starting point is 00:04:47 You produce a bunch of particles. They fly out. You have detector elements; you don't see the whole particle trace. Often you look at an event display and you're like, there's the muon. But what you actually see are individual hits where the muon has left a blip. And this task, tracking, is going back and saying, oh, all of these hits were on one particle.

Starting point is 00:05:09 And these hits were on a different particle. And it's pretty simple for this example. But now imagine you have a thousand particles, 10,000 hits. How many ways are there to assign 10,000 hits to a thousand particles? Right? 10,000 choose 1,000 is a big number, right? And so in principle the problem is impossible to brute force. And so we have approximate approaches. In order to make this problem accessible, we've made some assumptions.
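To get a feel for how big that number is, here is a rough back-of-the-envelope sketch in Python. It is not from the talk: the function name `log10_binomial` is made up for illustration, and the second count (every hit labelled independently with one of the particles) is an added, cruder bound.

```python
import math

def log10_binomial(n: int, k: int) -> float:
    """log10 of C(n, k), using log-gamma so we never build a huge integer."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_c / math.log(10)

n_hits, n_particles = 10_000, 1_000

# The figure quoted in the talk: 10,000 choose 1,000.
log10_choose = log10_binomial(n_hits, n_particles)

# A cruder count: label each hit independently with one of the particles.
log10_labelings = n_hits * math.log10(n_particles)

print(f"C(10000, 1000) is roughly 10^{log10_choose:.0f}")
print(f"1000^10000 is 10^{log10_labelings:.0f}")
```

Either way the count has well over a thousand digits, which is why nobody enumerates the combinations directly.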

Starting point is 00:05:40 And that's fine. That's the way you should first do things. Every time you start a hard physics problem nobody's ever done before, you assume something spherical or you take the first order, whatever, right? Let's think beyond that. So the typical assumptions that are made in tracking are two. Number one, you assume that particles begin at the interaction point, because most of them do. And most of the particles you're looking for are emitted from the hard interaction.

Starting point is 00:06:06 And so if you're looking for a set of hits that are a particle, constrain it so that the track has to go through the vertex: dramatic reduction in the number of combinations, right? Very quickly. Much easier problem. And most of the particles do that, so awesome. The other assumption typically made is that tracks move as a helix, because particles are either neutral, in which case they're invisible anyway, or they're charged under electromagnetism, in which case, in our constant magnetic field, you get a helix. Another huge simplification. If you only have to look for helices, then you know how to look for particles and where to look for them.
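As an illustration of what the helix assumption buys you, here is a minimal sketch of the trajectory of a unit-charge particle leaving the origin in a uniform field along z. It uses the standard relation r = pT / (0.2998 B) with pT in GeV, B in tesla and r in metres. The function name, the 2 T default field and the turning direction are assumptions made for this example, not details from the talk.

```python
import numpy as np

def helix_points(pt_gev, eta, phi0, b_tesla=2.0, n_points=50, max_turn=0.5):
    """Points on the helix of a unit-charge particle produced at the origin.

    Radius in metres from r = pT / (0.2998 * B), pT in GeV, B in tesla.
    The turning direction (charge sign times field direction) is fixed
    by convention here; flip the sign of `turn` for the opposite charge.
    """
    radius = pt_gev / (0.2998 * b_tesla)
    turn = np.linspace(0.0, max_turn, n_points)      # turning angle in radians
    x = radius * (np.sin(phi0 + turn) - np.sin(phi0))
    y = -radius * (np.cos(phi0 + turn) - np.cos(phi0))
    z = radius * turn * np.sinh(eta)                 # pz / pT = sinh(eta)
    return radius, np.column_stack([x, y, z])

radius, points = helix_points(pt_gev=1.0, eta=0.5, phi0=0.3)
print(f"bending radius: {radius:.2f} m, first point: {points[0]}")
```

With only a handful of parameters (pT, eta, phi0, charge), a few hits are enough to predict where every later hit should be, which is the simplification the talk describes.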

Starting point is 00:06:44 Very important assumptions made for tracking, but these also limit our discovery. If you're not looking for particles that don't come from the hard interaction, you can't find them. If you're not looking for particles that are not helices, you can't find them. Now there's been a lot of work recognizing that we should look for particles that don't come from the vertex. There's a whole cottage industry of these folks now, and it's great work, and one of my postdocs is working on this. I think it's very creative and there are lots of possibilities for discovery. You know, there's things that are created later and things that are invisible and then pop up, all sorts of awesome stuff. So clearly we've recognized that we should dig into these assumptions and figure out ways to avoid making them.

Starting point is 00:07:25 Much less attention has been paid to the question of finding non-helical tracks. And one reason is that the assumption of a helical track is built into our track-finding algorithm. Typically the way this works is you find a little stub, a few hits in a line, and then you fit a helix to it so that you can project forward and figure out where your next hits should be. You don't have to consider a hit all the way over there or a hit all the way over here because you know where this track is going and you have an envelope with uncertainties and a Kalman filter, and it's very sophisticated, and it works really well if you know the

Starting point is 00:08:03 parametric form of your track, if you assume it. So track finding and fitting are closely woven together, right? We assume it's a helix. So this algorithm, how could it possibly find non-helical tracks? It's woven deeply into this strategy. So you might think, well, let's just put in another parameterization, right? Just plug in something else. I have a helix, plug in a schmelix.

Starting point is 00:08:28 Why don't you look for schmelices, right? Okay, that's cool. And we can do that, and I'll show you an example, but I want to do something bigger. I want to find something where I didn't have to come up with the parameterization in advance. I want to be prepared for that crazy weird particle that's in my detector, which, if I just showed you the event display, would look like the positron discovery. Your eyes would be like, look at that, that's something. I don't know what it is, but it's something in our detector. I want to discover that, find a real surprise. So that's the

Starting point is 00:08:58 big hard problem that I think has been seen as impossible, you know, to look for effectively potential single event discoveries waiting in our data. If only you knew which event to look at, but currently we have no algorithms capable of finding these things. Okay, so this is my fantasy, right? Find something that makes the theorists go, no, that's not possible. You can't have discovered that. We proved that doesn't exist in the universe. Like, awesome, let's do it.

Starting point is 00:09:29 So how do we do this? Well, we piggyback on the good work of other people. So there's a collaboration of folks called Exa.TrkX and other people around the world who developed a new way to find tracks using machine learning. For completely separate motivations, they were trying to develop tracking which is good at high pileup environments, where there's lots and lots of particles, because the traditional

Starting point is 00:09:52 method doesn't work really well when you have huge numbers of particles. So they developed a completely new approach which uses graph networks. So you recognize that your data looks like a graph, your network looks like a graph. It's all very, very cool. And it works. In high pileup environments, they're good at finding tracks. But along the way, they reinvented tracking, and their tracking separates finding and fitting. Machine learning tracking does not assume a helix. It learns the tracks are a helix from the

Starting point is 00:10:22 training examples that you give it. And it separates the finding and the fitting. It says, here's a set of points I think are on a track. Then later you fit it to a helix. So it's separated these two tasks, which is exactly what we need if we're going to find things that are not helices. All right. So that's the setup. And that's what made me wonder when I saw these papers, can we use machine learning tracking to sort of generalize track finding, to find things that are unexpected? And then as a bonus, I think we came up with a way to speed up the track fitting part of it.

Starting point is 00:10:56 All right, so first I'm going to talk to you about generalized track finding. How do you find tracks that are not helices? Then talk about fast and precise fitting of those tracks. Okay, so first, you might think, I like the idea of craziness, but I need an example. Is this even possible? Like, you know, weird discoveries are cool, but give me an example. So there are folks up at UC Davis who came up with this idea that maybe there are particles that don't just have electromagnetic charges, but that are like charged under a dark

Starting point is 00:11:26 QCD. So for example, the quirk, right? The quirk is a dark matter particle that, it's like a dark analog of a quark, right? But it has a dark analog of QCD. But in dark QCD, the confinement scale and the mass scale have a different relationship. So when you create quirks and they get macroscopically far apart, instead of having a flux tube which then pops new quarks out of that energy because the quark mass is low, the quirk mass is high.

Starting point is 00:11:56 And so it can't pop new quirks out of that flux tube. And so they remain separated and they oscillate macroscopically. So you create them in a pair and they go like, whoa, whoa, whoa, whoa, whoa, whoa, through your detector. Weird trajectories. So that's very cool. It's a good example of something which we cannot find at all. And so we thought it was a good initial test case to work on. Don't start with the full problem, start with a baby step. So we generated a bunch of quirks in MadGraph, passed them through a simulated detector. And as an aside here, this detector that

Starting point is 00:12:33 we use is sort of a toy setup. We just have a bunch of layers of detector and as a particle passes through it, we say there was a hit. We'd love to do the more realistic study of actually modeling what happens to a quirk as it passes through many layers of detectors and doing that in Geant. But it's very tricky to modify Geant in this specific way. And the lore is that there was a student at Berkeley who spent three years working on this, failed and then left the field. And so people are like, oh, it's impossible.

Starting point is 00:13:02 But recently there's a student at Irvine working on importing quirks into Geant for a separate project for FASER, and he tells me it's going well. So we'll see. Anyway, for this study, we began with a simplified detector, just a bunch of layers. And you see that you get lots of different weird kinds of behavior, depending on the quirk parameters, of course. So there's this ratio of the mass of the quirks and the confinement scale. If the ratio is correct, you get oscillations on the order of centimeters, which is very cool.

Starting point is 00:13:33 Otherwise you might get oscillations on the order of like nanometers, in which case they're invisible, or really, really large, in which case they're invisible. But in this intermediate scale, you get all sorts of really weird tracks you would just never expect from the standard model. So can we find these things? And most of the actual work was done by Max Fai. He's now a postdoc at Fermilab and Chi Yu Sha at Beijing.

Starting point is 00:14:00 And we use this machine learning track pipeline. And I won't go into great detail about how it works, but the idea is that you learn to find tracks; you don't assume they're helices. And the way you do this is you take your track, which has a bunch of hits in it, and you map it to some abstract latent space. What is this latent space?

Starting point is 00:14:17 We don't know. We're going to learn this mapping, which defines the latent space, and we're going to learn a mapping so that tracks, so that hits that are on the same track end up near each other in this latent space, and hits that are not on the same track

Starting point is 00:14:30 end up far apart from each other. That sounds cool, but like, how do you come up with this mapping, right? That's the joy of machine learning. You don't have to know how to construct this or write this down or find some analytical solution, you just train it to do that. And so that's what this pipeline does. You give a bunch of examples of things that are on the same track and things that are not

Starting point is 00:14:49 on the same track, it learns this mapping. So that in the latent space, things that are on the same track are near each other. You can just be like, tracking is easy now. These are all clustered together. Obviously they're on the same track, right? So it's like saying, could you map my hard problem to an easy problem? Thanks. And that's effectively what it does.
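The talk does not spell out the loss the pipeline uses, so the following is only a generic metric-learning sketch of the idea: a pairwise hinge loss that is small when same-track hits sit close together in the latent space and different-track hits sit at least a margin apart. The function name and the `margin` parameter are assumptions for illustration.

```python
import numpy as np

def pairwise_hinge_loss(embedded, track_ids, margin=1.0):
    """Contrastive loss over all hit pairs in one event.

    embedded:  (n_hits, dim) array of latent-space coordinates
    track_ids: (n_hits,) array, the true track label of each hit
    Same-track pairs are penalised by their squared distance;
    different-track pairs are penalised only if closer than `margin`.
    """
    diff = embedded[:, None, :] - embedded[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = track_ids[:, None] == track_ids[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each pair once
    pull = (dist ** 2)[same & upper]
    push = (np.maximum(0.0, margin - dist) ** 2)[~same & upper]
    return (pull.sum() + push.sum()) / upper.sum()

track_ids = np.array([0, 0, 0, 1, 1, 1])
clustered = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                      [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
scrambled = clustered[[0, 3, 1, 4, 2, 5]]   # tracks mixed in latent space
print(pairwise_hinge_loss(clustered, track_ids))
print(pairwise_hinge_loss(scrambled, track_ids))
```

In a real pipeline the embedding would come from a trained network and this kind of loss would be minimised by gradient descent; here the two hand-made embeddings just show that a well-clustered latent space scores lower than a scrambled one.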

Starting point is 00:15:08 And you notice, along the way, it never assumes a helical path. It just says this set of hits is on a track, this set of hits is on a track. So I'm going to quote the performance of our tracker in terms of efficiency. And technically I define that as the number of reconstructed tracks divided by the number of reconstructable tracks. Reconstructable meaning you went through a bunch of layers. Reconstructed meaning that you are matched to the truth, a double majority match to the truth, which means the true track, most of its hits are on

Starting point is 00:15:41 your reconstructed track, and most of the hits on the reconstructed track are from the true track.
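The efficiency definition just given can be written down in a few lines. This is a sketch under assumptions: tracks are represented as collections of hit IDs, and "went through a bunch of layers" is stood in for by a hypothetical `min_hits` threshold that is not specified in the talk.

```python
def is_double_majority_match(reco_hits, true_hits):
    """True if more than half of the true track's hits are on the reco track
    AND more than half of the reco track's hits come from the true track."""
    reco, true = set(reco_hits), set(true_hits)
    shared = len(reco & true)
    return shared > 0.5 * len(true) and shared > 0.5 * len(reco)

def efficiency(true_tracks, reco_tracks, min_hits=3):
    """Matched reconstructable tracks divided by reconstructable tracks.

    true_tracks, reco_tracks: dict mapping a track id to its hit ids.
    min_hits is an assumed stand-in for the reconstructability requirement.
    """
    reconstructable = [hits for hits in true_tracks.values()
                       if len(set(hits)) >= min_hits]
    if not reconstructable:
        return 0.0
    matched = sum(
        any(is_double_majority_match(reco, true) for reco in reco_tracks.values())
        for true in reconstructable
    )
    return matched / len(reconstructable)

true_tracks = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
reco_tracks = {0: [1, 2, 3, 9], 1: [5, 10, 11, 12]}
print(efficiency(true_tracks, reco_tracks))  # track "a" matches, "b" does not
```

Requiring the majority in both directions prevents one giant reconstructed track that swallows many hits from counting as a match to everything.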

Starting point is 00:16:17 All right, and we can get into the details of that if somebody has a question. So we started by saying, hey, can we get this pipeline to run at all? Because this is like a research thing. It's not like an app you download on your phone that just works. So we got an account at NERSC. We got this thing to run. This step was not easy.

Starting point is 00:16:34 I'll be honest. Like just getting this code to run. We got some help from the authors. But first thing we did is like, all right, feed it standard model tracks. Ask it if it can find standard model tracks. Answer is yes. Woo-hoo. All right, first step, right?

Starting point is 00:16:48 Then we said, well, let's see how hard this problem is. Let's train it on standard model tracks and then just feed it a bunch of quirks. Maybe it's just good at it. Maybe we're underestimating it. Answer is no. It sometimes finds quirk tracks, but only when those tracks happen to look exactly like standard model tracks. All right.

Starting point is 00:17:06 So it doesn't find any of the interesting quirks. So this tells us the pipeline works. And how many times did it find things that weren't there? Oh, yes, the fake rate. We'll talk about that at the end. It's very, very low. If you just take like random sets of hits,

Starting point is 00:17:21 it almost never chooses those to be tracks. Yeah. So, for example, these, right? It's good at finding these. These are cases when the oscillations are not visible because the parameters either give you microscopic oscillations or very large ones. Yeah, so it fails at this kind of stuff. Okay, so the pipeline works, but the quirk problem is hard. If this had just worked, we would have been like, oh, well, standard tracking can find this already; it's a solved problem.

Starting point is 00:18:08 You may have said this, but as a theorist, I'll have to ask, what are the characteristics of your tracker that you're simulating here? Oh, that's an experimental question. We have two configurations, one with eight layers and one with 25 layers, just simple cylinders. Each layer does X1? Each layer gives you a space point, so effectively X, Y, Z. Yeah. Other questions? So data set, in this line, yes.

Starting point is 00:18:43 Just to see, this is a stand-in for traditional trackers. If something that's good at standard model can find quirks, then the problem is too easy. Right? So this was just to see if the problem was hard enough. Exactly. We expected it to be bad. It was bad. So that was good. And the performance here depends a lot on these parameters. For some arrangements of the parameters, most of the tracks look like standard model tracks. In some cases you get very few of them. So anyway, in the paper you can see this whole table. Then we did the harder thing. We said, all right, now let's teach it to find quirks. Let's give it a bunch of

Starting point is 00:19:26 quirk examples, can it find them? Right? This is the thing we were excited about. And the answer is yes. So there really isn't like some built-in assumption about helices in this pipeline. The only thing we had to change was the assumption that the order of the hits is the order in time, because sometimes quirks go backwards and then go out. So we just had to make a little tweak to the pipeline, but then it can learn them.

Starting point is 00:19:50 So we were very excited that this pipeline can learn to associate sets of hits that are not in a helix. But as you said, somebody asked, like, okay, well, if you only have one quirk in your detector, then like, how hard is this problem anyway? So, you know, we're taking baby steps there. But here's an example of an event where you have two quirks, and the truth quirk looks like this, and the reconstructed quirk are these dots. So it doesn't find every single hit, right?

Starting point is 00:20:19 You see there's some places it intercepts, but it finds a majority of the hits and definitely enough to reconstruct this quirk to identify this as like something weird in your detector. That's an example. Okay, so then we said, all right, let's get more realistic. We're going to run this thing on events with obviously standard model particles also. So let's train it with standard model particles and quirks in the training data set and then test it on events that have a mixture, right? Because obviously events are going to have lots of standard model background.

Starting point is 00:20:53 And this efficiency is the quirk efficiency, not the standard model efficiency, because here we labeled these negatively. We said, don't find the standard model tracks. Here's a bunch of examples of what not to find. So we trained it to find these and to ignore these. And then we measured the efficiency on events that had a big mixture, mostly standard model, of course, but measured the efficiency only on quirks.

Starting point is 00:21:16 And so even when it's drowning in standard model tracks, it can still find quirks with a reasonable efficiency. And I think that was somebody's question about it. Mixing them. Okay. I know I'm not supposed to ask questions like this, but. Oh, I love those. That's my favorite preface for a question. What is the, what are the characteristics of the tracks geometrically that it's actually,

Starting point is 00:21:40 have any idea of what, what it's training, how it's training? Oh, like how does it solve this problem? Right, I mean with standard model we have some ideas that we're looking for, as you said like a helix, we have the idea of the geometry. Uh-huh. And sorry, these are like pixels with timing or no time? No timing, just 3D space points. Just a bunch of points.

Starting point is 00:21:59 Just a bunch of points. It's kind of hard. Yeah, it is hard. So we did examples with eight layers, an example of 25 layers. Here's an example of a pair of quirks. I'm not showing the standard model tracks just for clarity, because they'd drown you out, but this trajectory was found, and it was able to disentangle these, right?

Starting point is 00:22:17 It's like this is the purple, this is the red, this is the purple, this is the red. Because it learns some pattern about how the quirks. What did it learn here? How is it doing it? Every training sample is quirks and standard model in the same detector. But we label the tracks as like quirk or standard model.

Starting point is 00:22:34 So find these, but don't find these. We never show it quirks without standard model. In the last line. Thank you. What is the ratio? How many? Yeah, great question. I don't know that number, but we used TT bar as the background.

Starting point is 00:22:54 And we, I think, overlaid two TT bar events on every quirk. So how often does it fit? Yeah, almost never. So the fake rate is very, very low. Sorry, and when you train, you train on like one quirk string tension or something. But is there some? I'll talk about that in a minute. Yes, exactly.

Starting point is 00:23:19 Right. So in the paper, we scan these two parameters. There's the confinement scale and the mass. And we measure the efficiency across this whole plane. And in some cases, the efficiency is excellent. In some cases, it's pretty good. In some cases, it's a little lower. These cases are mostly when the two quirks are overlapping on top of each other.

Starting point is 00:23:43 And so you can't reconstruct both of them. So your maximum efficiency is 50% anyway, like here. So you say, you train, you're training one. Yeah. Exactly. And so what I'm doing here is I have, you know, what, 20 examples? These are 20 trackers, each one for a specific quirk parameter. Now if I'm going to go to ATLAS and say,

Starting point is 00:24:05 I want to run a new tracking algorithm on the data, they're going to laugh at me anyway, even if I have a great idea. And then if I say I have 20 tracking algorithms I want to run, it's never going to happen. And so we don't want to develop an individual tracking algorithm for every point. So we are interested in generalizing.

Starting point is 00:24:22 Can we develop a single one that can find any kind of quirk? So we did some generalization tests. We said, what if we train on one point, and then test on that point? Okay, so that's the baseline. But what if we train on everything but this point? We have all these points that aren't our target. We train on all the other ones and test on one of these points.

Starting point is 00:24:46 What's the efficiency? It goes down, but it's not devastating. Which tells you that it's learning something quirky. It's not like just memorizing these tracks or these paths; there is something about the quirkiness of these tracks that it can generalize across these parameters. Then we did another one similar for a different point, and you see a similar relative behavior. So this was exciting for us because the whole time, like, yeah,

Starting point is 00:25:14 quirks, whatever, I'm not actually interested in finding quirks. I want to find something new and weird, and our hope was to use this as a stepping stone to generalize this into finding any kind of weird track. All right. So before I get there, let me answer your question in more detail, I think it was yours, or somebody's, about the fakes. Because you're going to run this thing on data, there's lots and lots and lots of tracks in there. How often do you get something that looks like a quirk but is actually a standard model track, because you're drowning in standard model tracks? So what I did is I wrote a quirk fitter that takes a bunch of points and just like a helix

Starting point is 00:25:48 fitter finds the best parameters of quirks that describe those points. And so then you can calculate the chi-squared of the quirk hypothesis. What is the distance from those points to the quirk track, the best-fit quirk track, just like you can for the helix. And then you can effectively subtract those two chi-squareds. It's like a likelihood ratio of your quirk to helix hypothesis. So here are standard model tracks, and here are quirk tracks. And you see the separation.
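The chi-squared difference can be illustrated with a toy in Python. This is not the quirk fitter from the talk: a quadratic curve stands in for the "smooth, helix-like" hypothesis, a quadratic plus a sinusoid of fixed wavelength stands in for the "oscillating, quirk-like" hypothesis, and both are fit by linear least squares. All function names and numbers are assumptions chosen for the example.

```python
import numpy as np

def chi2_of_linear_fit(design, y, sigma):
    """Least-squares fit y ~ design @ coeffs; return chi-squared at the best fit."""
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return float(np.sum((residuals / sigma) ** 2))

def delta_chi2(x, y, sigma, wavelength):
    """chi2(smooth hypothesis) minus chi2(oscillating hypothesis).

    Large positive values favour the oscillating hypothesis.
    """
    smooth = np.column_stack([np.ones_like(x), x, x ** 2])
    k = 2.0 * np.pi / wavelength
    oscillating = np.column_stack([smooth, np.sin(k * x), np.cos(k * x)])
    return (chi2_of_linear_fit(smooth, y, sigma)
            - chi2_of_linear_fit(oscillating, y, sigma))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)            # one hit per layer, 25 layers
sigma = 0.005                            # assumed hit resolution
noise = rng.normal(0.0, sigma, x.size)
helix_like = 0.3 * x ** 2 + noise
quirk_like = 0.3 * x ** 2 + 0.05 * np.sin(2.0 * np.pi * x / 0.2) + noise

print("smooth track:     ", delta_chi2(x, helix_like, sigma, wavelength=0.2))
print("oscillating track:", delta_chi2(x, quirk_like, sigma, wavelength=0.2))
```

Because the oscillating model contains the smooth one, the difference is never negative; a cut on it separates the two populations, and where to place the cut is the efficiency-versus-background trade-off discussed in the talk. The real fitter also has to fit the quirk parameters themselves, which this toy sidesteps by fixing the wavelength.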

Starting point is 00:26:15 This is a log scale. So most of the quirk tracks are way over here, and the standard model tail falls very, very rapidly. There is some overlap. There are some quirk tracks that don't fit great to the quirk hypothesis because they're missing some of the hits. There are some standard model tracks that do look a little quirky that the quirk fitter finds some parameters.

Starting point is 00:26:37 It turns out that's when the tracks are almost back to back. Essentially you have two standard model tracks. If you have a standard model track and you try to fit it to the quirk, it will find, it will assume there was another quirk back to back and that the oscillation length was too small or too big to see. So you can almost always find some place in quirk parameter space to describe a standard model track. But then you get these two tracks that are back to back.

Starting point is 00:27:02 So if you make the cut here, this is all the standard model tracks, you can remove essentially all of the background. And, you know, how efficient you want to be versus how much you want to tolerate background is a question of choice. But these are definitely two separable populations. So that was studying the background. Yes. Yeah.

Starting point is 00:27:43 Yeah, we sign every fit. When you want to make sure you're getting all the hits, otherwise you just have like a thousand hits, but you picked up two tracks and then you're going to be able to. Do you understand what I'm asking? Are you asking whether random hits that are not from quirks get fitted as quirks or whether you're using up all the hits? Yeah, whether you're using up all the hits. Because I could imagine some approach where I'm going to fit all the standard model things and just look at what's left behind. Oh, I see.

Starting point is 00:28:21 Something like this or you're a lot of... Like you want to get all the particles. Right. How long are you picking up everything? I don't think you're usually using all the hits because there's also a lot of noise. Yeah, it's an interesting question. In these we don't have pileup, but we did add random hits as noise. And so I think you might be suggesting that, as a strategy for finding weird tracks, you could

Starting point is 00:28:53 just fit all the standard model tracks, remove them, and then consider what's left. Right. That's yeah. Yeah. Which I think is a reasonable thing to do. And it's sort of what's happening here because here I'm fitting all the tracks to the standard model hypothesis also and using that as a way to reject them. But you're suggesting don't even include them in the track finding, which I think is a reasonable strategy, but we haven't explored that. So for this quirk, do you assume a specific confinement scale? No, it fits the best confinement scale.

Starting point is 00:29:24 Yeah, which is why it always ends up here for standard model tracks. It finds one to effectively make the quirks look like straight lines. Okay. So, and then you can calculate like how many quirks you would see, et cetera, et cetera. And the next step is to actually get this into Geant and do realistic studies; this was a sort of proof of principle. And we had a paper out last year. You can go and read up on the details.

Starting point is 00:29:51 But that was really just the warm up, because what we actually want to do is find something weird. Like quirks are interesting, they're not helical, but they're still an idea somebody had. And what I really want to do is find an idea that nobody had, a discovery that's a surprise. And so, number one, how do you find things when you don't know what you're looking for, and how do you even define the space of things you might be looking for so you can like train your network? So our goal is to find all possible weird tracks, even unanticipated ones. So I pitched this problem to Levi. He's a grad student at UCI.

Starting point is 00:30:25 I said, here's a hard problem. What do you think about it? And he said, well, let's make some restrictions. You can't be sensitive to anything, because then you could take literally any set of hits and say, I have a magic unicorn that flies through my detector and deposits these hits. And any set of hits could be a hypothesis.

Starting point is 00:30:44 So he said, let's restrict it to any smooth path, something that is infinitely differentiable, because then you could imagine there's some force that's causing the particle to move that way. And then let's describe the path of the particles in the basis of Fourier modes. So you can take any path through space and express it as a sum of Fourier modes at various frequencies. And he discovered this mathematical tool called the Schwartz function, which will guarantee

Starting point is 00:31:14 that your path is smooth. So if the amplitude at high frequency falls fast enough, then your path is guaranteed to be smooth, to satisfy this infinite differentiability condition. So this is called the Schwartz function. I'd never heard of it before. Turns out there's a guy in our math department who's an expert in them. So that was cool.
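As a rough illustration of the idea just described, here is a minimal Python sketch that samples a toy path as a sum of Fourier modes whose amplitudes fall off under a Gaussian envelope. It is not the code used in the study: the function name `sample_smooth_path`, the `width` parameter, the Gaussian choice of envelope, and the one-dimensional "transverse offset per layer" picture are all assumptions made for the example. Note that with a finite number of modes any such sum is already smooth; the fall-off matters for keeping the high-frequency modes small as more of them are added.

```python
import math
import random

def sample_smooth_path(n_modes, n_layers=25, width=3.0, seed=0):
    """Toy transverse offsets of a track at the vertex and at each layer."""
    rng = random.Random(seed)
    modes = []
    for k in range(1, n_modes + 1):
        envelope = math.exp(-(k / width) ** 2)  # Gaussian fall-off in frequency
        modes.append((k, rng.gauss(0, 1) * envelope, rng.gauss(0, 1) * envelope))
    path = []
    for layer in range(n_layers + 1):
        s = layer / n_layers  # 0 at the vertex, 1 at the outermost layer
        offset = 0.0
        for k, a, b in modes:
            angle = 2 * math.pi * k * s
            # sin(0) = 0 and cos(0) - 1 = 0, so every mode starts at the vertex
            offset += a * math.sin(angle) + b * (math.cos(angle) - 1.0)
        path.append(offset)
    return path

one_mode = sample_smooth_path(n_modes=1)     # a single gentle wiggle
many_modes = sample_smooth_path(n_modes=25)  # more structure, still suppressed at high k
```

Changing `seed` draws a different path from the same space, and changing the envelope function would define a different space of smooth paths.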

Starting point is 00:31:34 And then you can generate example Schwartz functions, or tracks that satisfy a Schwartz function. And here's one. So this one, right, it starts at the vertex and then does this crazy thing. And it's definitely not a quirk, it's not a helix, it's some other weird path. It's just an example of the kind of weird track we're hoping to be able to find. In order to guarantee that the path is infinitely differentiable, that it doesn't have a kink in it, we limit the amplitude of the high frequency modes.

Starting point is 00:32:09 Because if the high frequency modes can have very large amplitudes, then you can get crazy behavior. So we don't want to describe any possible function. We want to describe a subset of functions that are smooth, and this is the way that we can sample from the space of smooth paths. And here's a distribution of the number of hits in a 25 layer detector. If you start with just one mode, like you just pick one frequency, then you get this distribution.

Starting point is 00:32:33 And as you go up to 25 modes, you start to get some pretty weird tracks that go in and out and do weird stuff, which is a lot of fun. So we did the same game. We said, well, let's see what happens. So we started out by training our pipeline on the standard model and saying,

Starting point is 00:32:49 let's feed it some weird tracks. And so this is the number of Fourier modes, up to 25. And you see it does terribly. Occasionally one of these things will happen to look helical enough to be reconstructed, but usually not. So it's a hard enough problem. And then we said, well, let's train it on our tracks.

Starting point is 00:33:08 So let's feed it a bunch of weird tracks and see if it can learn the relationship between the hits so that it can reconstruct them. And here, for the training and testing, these are both one-frequency modes, but obviously different tracks. We're not giving it exactly the same tracks, just new samples from the same space. And it can do it. So it's not rejecting these tracks. It's rejecting helices mostly, but it's not rejecting these tracks, which is exciting.

Starting point is 00:33:35 And then we said, all right, well, let's do it with standard model tracks. So we now are mixing standard model tracks and weird tracks together, training it on that, and measuring the efficiency on the weird tracks. Still pretty good. An important detail here is that now, because standard model tracks are in principle a subset of the weird tracks, we can't label them as "don't find these", because we want a generalized track finder. So we might use your idea: first find those, delete those hits, and then run this track finder.

Starting point is 00:34:05 But here we didn't negatively label those tracks. Well it's continuous. It's continuous. It doesn't have to go to the outer layer. No. But it has to start at the interaction point and then follow a smooth trajectory. I think I have some examples. Yeah.

Starting point is 00:34:33 And so as you get more modes, I think it's maybe learning something more general about the problem. But I'll talk about generalization in a minute. Here are some examples of tracks that it has found. So the green circles are the hits on the reconstructed track. Again, it doesn't find all of them, but it finds this guy. It identifies it. It gets the majority of the hits.

Starting point is 00:35:01 And these have ttbar events also. I'm just not showing those. Okay, so the question then is, well, you've trained it on a certain set of tracks, and you've tested it on other tracks, and the efficiency is pretty good, but have you really learned to find any smooth path? That's the big claim. How do you span that whole space? Well, we can't.

Starting point is 00:35:21 And one reason that we can't is that there's an infinite number of possible Schwartz functions. A Schwartz function is some condition about how quickly the amplitudes fall. A Gaussian satisfies it, but there's an infinite number of them. So you could choose any of them, and they each define a different space of smooth tracks. But what we can do is try to sample this space. We can choose a couple different functions and see what happens.

Starting point is 00:35:44 So if we make one choice of Schwartz function, A, for example, and test on that one, we get pretty good efficiency. Then we train on the same set and test on a different set, B, where we remove the overlap. So we remove all tracks that are in both B and A, test only on the remaining B tracks, and we get good efficiency, even higher efficiency.

Starting point is 00:36:08 Okay, so it's definitely learning to reconstruct tracks that are not represented in its training sample. It's not just learning some efficiencies, some Fourier modes, or memorizing some details about them, it's learning some property of these weird tracks, some smoothness that it's then using to be able to generalize to other tracks it's not seen before, which was very exciting to me.

Starting point is 00:36:39 And again, you have to worry about fakes. Like if you're talking about weird tracks, you're also going to be susceptible to like some garbage or the thing radiated away or just looks ugly. And so here we don't have a hypothesis to fit with, like, as we do with the quirks, where we can fit to the quirks and compare it to the helix.

Starting point is 00:37:18 All we can do is try to fit to a helix. And sometimes, like this thing just does not fit to a helix. You get a terrible chi-squared. But sometimes a weird track will get a pretty reasonable chi-squared for the helix hypothesis. So you can't be sensitive to all of them, but you definitely can reject a lot of the standard model because they look like standard model tracks. So before I move on to the next step, questions about, that's essentially what I'm trying

Starting point is 00:37:48 to do here is I'm trying to show how you could try to filter these things out. Effectively, you have to remove stuff that has a bad chi-squared in the standard model tracks. That's dangerous because the performance here, that means cutting this tail. And this tail depends a lot on the details. Like in this simplified environment, most of your standard model tracks fit well to a helix. You go to a more realistic environment, you're going to get all sorts of crap that's happening.
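The chi-squared filter described here can be sketched in a few lines of Python. This is a simplified stand-in, not the fit used in the talk: instead of a full five-parameter helix it assumes a track leaving the origin with small curvature, so that its transverse projection is roughly y = b*x + c*x^2. The function name `parabola_chi2`, the layer positions, and the hit resolution `SIGMA` are hypothetical.

```python
import math

def parabola_chi2(xs, ys, sigma):
    """Least-squares fit of y = b*x + c*x^2 to hits, returning (b, c, chi2)."""
    sxx = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    # Solve the 2x2 normal equations in closed form.
    det = sxx * sx4 - sx3 * sx3
    b = (sxy * sx4 - sx2y * sx3) / det
    c = (sxx * sx2y - sx3 * sxy) / det
    chi2 = sum(((y - b * x - c * x * x) / sigma) ** 2 for x, y in zip(xs, ys))
    return b, c, chi2

LAYERS = [float(i) for i in range(1, 26)]  # hypothetical 25 layer positions
SIGMA = 0.05                               # hypothetical hit resolution

helix_like = [0.1 * x + 0.002 * x * x for x in LAYERS]
weird = [y + 0.5 * math.sin(x) for x, y in zip(LAYERS, helix_like)]

_, _, chi2_helix = parabola_chi2(LAYERS, helix_like, SIGMA)
_, _, chi2_weird = parabola_chi2(LAYERS, weird, SIGMA)
```

In this noise-free toy the helix-like track fits almost perfectly and the wiggly one does not, so a cut on chi-squared separates them. With realistic material effects the standard model tail grows, which is the difficulty raised above.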

Starting point is 00:38:15 This tail is going to get worse and worse and worse. So when we go to a more realistic environment, I think this is going to get a lot harder. So then I got interested in this question of fitting, because essential to discovering a weird track is filtering out the weirdest tracks and identifying them, which means finding the standard model tracks and removing them. This is essentially anomaly detection. In the language of anomaly detection, we typically use an autoencoder,

Starting point is 00:38:44 which is a classical approach. And in an autoencoder, you start from some data set, which is physical, say, for example, hits. And you find a mapping to a latent space, which is lower dimensional, and you train your autoencoder to find some mapping so that it can map to the latent space and then back and recover

Starting point is 00:39:04 the original data. If you can do that, then you've done some sort of like effective compression. You found some way to encapsulate your complicated data in a lower dimensional space so that you can expand it back out. That's the principle of anomaly detection. But tracking is essentially already that. We start from a hit space and then we try to describe that using a lower dimensional parameter space, the helical parameters, right?

Starting point is 00:39:30 Essentially you're saying, I'm going to summarize all of these hits with a five-dimensional parameter, a five-dimensional helix. And if I've done it correctly, then when I go the other direction, I could predict where the hits were, if I know where the detector layers are. And if I try to fit something which is totally not a helix, I'm going to get some crazy fit, which when I go back is not going to recover my original hits. So tracking already is in the language of anomaly detection. But the problem with tracking is that we know how to go from parameters to hits.

Starting point is 00:40:03 If I tell you a helix, I could tell you exactly where that helix goes. I don't know how to go from hits to parameters. If I give you a set of hits, there's no function I can write down that says, here are the helical parameters. This mapping is unknown. And so traditionally, what do we do? Well, we do this horrible scan where you explore this space and you say, how about this point? And you map it back to hits and compare the hits.
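To make the asymmetry concrete, here is a small Python sketch of the easy direction (parameters to hits) and of a brute-force scan for the hard direction (hits to parameters). It is only a two-parameter, transverse-plane toy, assuming a circle that leaves the origin and concentric layers at hypothetical radii; the names `helix_to_hits`, `mismatch` and `grid_scan` and the grid ranges are invented for the example, and a real track fit uses all five helix parameters and a proper optimizer.

```python
import math

LAYER_RADII = [float(r) for r in range(1, 26)]  # hypothetical 25 layers

def helix_to_hits(phi0, kappa):
    """Azimuth of the hit on each layer for a circle leaving the origin.

    phi0 is the initial direction and kappa the signed curvature (1 / radius).
    A chord of length r on a circle of radius R makes an angle asin(r / (2R))
    with the tangent at the origin.
    """
    return [phi0 + math.asin(0.5 * kappa * r) for r in LAYER_RADII]

def mismatch(hit_phis, phi0, kappa):
    """Sum of squared arc-length distances between hits and a candidate track."""
    predicted = helix_to_hits(phi0, kappa)
    return sum((r * (h - p)) ** 2
               for r, h, p in zip(LAYER_RADII, hit_phis, predicted))

def grid_scan(hit_phis):
    """Brute-force search: try every grid point and keep the best one."""
    best = None
    for i in range(-50, 51):
        phi0 = 0.01 * i
        for j in range(-20, 21):
            kappa = 0.001 * j
            cost = mismatch(hit_phis, phi0, kappa)
            if best is None or cost < best[0]:
                best = (cost, phi0, kappa)
    return best

observed = helix_to_hits(0.2, 0.005)          # forward map: one function call
cost, phi0_fit, kappa_fit = grid_scan(observed)  # inverse: thousands of trials
```

Even in this toy, recovering two parameters takes 4,141 forward evaluations, and the cost grows quickly with the number of parameters and the fineness of the grid.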

Starting point is 00:40:30 No, that didn't work. How about this space? No, how about this space? And you have this optimization problem where you're scanning the parameter space looking for the point which best describes your hits. And yes, we have optimizers, but they're all terrible. Anybody who has worked on a realistic experiment knows there's always like one guy, usually some Russian dude, whose job it is to make the track fitter work.

Starting point is 00:40:52 And there's so many things in there that you need to tune just right because it's such a pain. And tracking is filled with local minima. It's a really hard task. And it's slow. If you have to run your optimizer every time and start it from different initial conditions just to make sure you found the minimum, it's a pain. So it's slow, it's unreliable.

Starting point is 00:41:11 In addition, it assumes Gaussian hit noise, right? You have in this parameterization, you're assuming that your hits have a Gaussian distribution. And finally, it optimizes for the thing we're not interested in. It optimizes for what is the point of the parameters that brings me closest to my hits. But what we're actually interested in are the parameters. Like, say these are your hits, and the true track is the green line. The fitted track might be this red one, because what you're optimizing for is the distance from the fitted track to the hits. Nobody actually cares about the distance between the fitted track and the hits.

Starting point is 00:41:49 What you care about are what was the momentum of this electron, which means what you want to know is, what was the curvature, what was the direction. Those are the physical things we're interested in. We don't care about this distance. And one experience from machine learning is make sure you're optimizing for the thing you actually care about. Not some proxy, which usually is a good proxy for it, but sometimes isn't, right? Always optimize for the thing you actually care about. So I said, well, we don't know this function to go from hits to parameters.

Starting point is 00:42:18 What if we just try to learn it? Instead of searching this space here, let's learn this mapping from hits to parameters. There's no search. There's no assumption of Gaussian distributions. And it can optimize for the thing we're actually interested in, which is the parameters. So I gave this task to a machine learning grad student and he came up with a model which will take hits as input and give parameters as output. Because there is a mapping, right?

Starting point is 00:42:45 We don't know it. It might not be easy to express, but it exists. Each set of hits defines parameters, right? So you should be able to get the parameters of the track, the helical parameters. Well, one of the parameters includes distance of closest approach. So with the or... Well, the track is not required

Starting point is 00:43:08 to go through the origin. Now, the problem is that there is multiple scattering and random material effects, which should take an electron... Absolutely. Absolutely, yes. So it would be targeting the origin or some other point that...

Starting point is 00:43:28 I guess you... Well, the beauty, thank you for raising that. The beauty of this is that we don't have to assume, for example, that the hits have a Gaussian distribution. We can include all of that stuff implicitly in the training sample. Give it a bunch of electrons and say,

Starting point is 00:43:41 here were the true parameters of the electrons, here were the hits. And it'll learn to associate those. Where is the origin, right? You need to pass the true value, absolutely. And you can generate... Well, the five parameters define the helix, but that doesn't require it to go through the origin.

Starting point is 00:43:59 So wherever you generated your tracks, maybe they all came from the origin, in which case they do pass through it, or maybe you have a spread of them. Is that what you're asking about, how we generated the training sample? No, I mean... Well, I mean, it's... Material effects will make it not move in a helical path, right?

Starting point is 00:44:14 They'll deviate from a perfect helix. Yeah. Maybe we can talk about it offline, yeah. I think we're having a semantic issue. Anyway, he produced a model which learns to map hits to helical parameters. And in the case that the hits actually are Gaussian distributed around the track, it reproduces the performance of least squares fitting.

Starting point is 00:44:39 So here's the residual in a couple of the parameters. I'm not showing all of them. They're in the paper. And this is the difference, so you want it to peak at zero and you want it to be narrow. So the neural network has the same performance as the least squares. And the least squares in the case of Gaussian distributed hits

Starting point is 00:44:56 is the best you can do in terms of minimizing the residual. And so we hope to match it. And so in this case, the neural network does match the idealized least squares, which is good. We show that we can learn this mapping. But what if you have a more complicated situation? Let's skip that for now. What if you have a more complicated situation?

Starting point is 00:45:17 What if your hits are not Gaussian distributed? What if instead they're skewed from the true path, or there's an offset, or there's multiple scattering? Now you can describe that implicitly in your training sample just by adding those deviations to the hits in the training sample, and the neural network will still learn the true parameters, right? Because it has seen the examples.

Starting point is 00:45:41 It says if your hits are over here, the true track was over there. It will learn that relationship. Whereas with the least squares, you have to encode your exact noise distribution directly into the model. Here, we only have to encode it implicitly. We can run some complicated simulation or whatever. We can include as many effects as we want implicitly in the training sample. The network will learn that.
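The point about encoding the noise only implicitly can be illustrated with a deliberately tiny Python example. Everything here is an assumption made for illustration: a plain linear regression stands in for the neural network, the "track" is a straight line through the origin with a single slope parameter, the layer positions are invented, and the skewed noise is an exponential shift applied to every hit. The learned map is trained on (hits, true slope) pairs and minimizes the error on the slope itself, while the comparison fit is an ordinary least-squares line that knows nothing about the skew.

```python
import random

LAYERS = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical layer positions

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def make_track(rng):
    """One simulated track: hits with skewed (exponential) noise, plus the truth."""
    slope = rng.uniform(-1.0, 1.0)
    hits = [slope * x + rng.expovariate(2.0) for x in LAYERS]  # noise mean 0.5
    return hits, slope

def train(samples):
    """Learn slope ~ w . hits + bias by minimizing the squared slope error."""
    feats = [hits + [1.0] for hits, _ in samples]
    n = len(feats[0])
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(n)] for i in range(n)]
    atb = [sum(f[i] * s for f, (_, s) in zip(feats, samples)) for i in range(n)]
    return solve(ata, atb)

def predict(weights, hits):
    return sum(w * h for w, h in zip(weights, hits + [1.0]))

def least_squares_slope(hits):
    """Classical fit of y = slope * x, implicitly assuming zero-mean noise."""
    return sum(x * y for x, y in zip(LAYERS, hits)) / sum(x * x for x in LAYERS)

rng = random.Random(1)
weights = train([make_track(rng) for _ in range(2000)])
test_set = [make_track(rng) for _ in range(1000)]
mse_learned = sum((predict(weights, h) - s) ** 2 for h, s in test_set) / len(test_set)
mse_lsq = sum((least_squares_slope(h) - s) ** 2 for h, s in test_set) / len(test_set)
```

In this toy the learned map absorbs the systematic shift because it saw it in the training sample, so its slope error comes out smaller than that of the uncorrected least-squares fit. A classical fit could of course be corrected too, but only by writing the noise model into it explicitly.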

Starting point is 00:46:04 The least squares has to explicitly encode in the chi-squared exactly what the noise model is. And of course this is much, much faster. So it's about a thousand times faster than searching the parameter space to try to find the best values of the parameters. And it works very well as an anomaly detector. So in the case that you have some deviation from Gaussianness, then your neural network is much better at separating non-helical tracks from helical tracks than your least squares fitter.

Starting point is 00:46:38 All right. So in conclusion, machine learning tracking is much more flexible. By separating the finding and the fitting, we can do finding of non-helical tracks and we can do fast and precise fitting of the helical tracks. All right, thanks very much. If this fascinating lecture by Daniel Whiteson sparked your interest, don't let your curiosity end there.

Starting point is 00:47:02 Watch my full podcast conversation with Daniel about his latest book, Do Aliens Speak Physics, where we ask what the laws of nature say about life beyond Earth. Don't forget to like this video, comment with your biggest question about particle physics, and subscribe for more trips to the edge of the universe and beyond, and don't forget to share it with your friends.