Microsoft Research Podcast - 092 - MMLSpark: empowering AI for Good with Mark Hamilton

Episode Date: October 2, 2019

If someone asked you what snow leopards and Vincent Van Gogh have in common, you might think it was the beginning of a joke. It’s not, but if it were, Mark Hamilton, a software engineer in Microsoft’s Cognitive Services group, budding PhD student and frequent Microsoft Research collaborator, would tell you the punchline is machine learning. More specifically, Microsoft Machine Learning for Apache Spark (MMLSpark for short), a powerful yet elastic open source machine learning library that’s finding its way beyond business and into “AI for Good” applications such as the environment and the arts. Today, Mark talks about his love of mathematics and his desire to solve big, crazy, core knowledge-sized problems; tells us all about MMLSpark and how it’s being used by organizations like the Snow Leopard Trust and the Metropolitan Museum of Art; and reveals how the persuasive advice of a really smart big sister helped launch an exciting career in AI research and development. https://www.microsoft.com/research

Transcript
Starting point is 00:00:00 It's one thing to count the number of photos of leopards, but how do you tell the difference between one very narcissistic leopard who likes to get their photo taken and several incredibly shy leopards? The only data that you have in order to tell the leopards apart is their spots and their spot patterns. Many kinds of patterned creatures across the ecosystem, their patterns are kind of like human fingerprints in that they're unique but slightly varied, so it's a very difficult task. So that's really the next step. You're listening to the Microsoft Research Podcast, a show that brings you closer to
Starting point is 00:00:34 the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizenga. If someone asked you what snow leopards and Vincent van Gogh have in common, you might think it was the beginning of a joke. It's not. But if it were, Mark Hamilton, a software engineer in Microsoft Cognitive Services Group, budding PhD student, and frequent Microsoft Research collaborator, would tell you the punchline is machine learning. More specifically,
Starting point is 00:01:06 Microsoft Machine Learning for Apache Spark, MML Spark for short, a powerful yet elastic open-source machine learning library that's finding its way beyond business and into AI for good applications such as the environment and the arts. Today, Mark talks about his love of mathematics and his desire to solve big, crazy, core knowledge-sized problems, tells us all about MML Spark and how it's being used by organizations like the Snow Leopard Trust and the Metropolitan Museum of Art, and reveals how the persuasive advice of a really smart big sister helped launch an exciting career in AI research and development. That and much more on this episode of the Microsoft Research Podcast. Mark Hamilton, welcome to the podcast.
Starting point is 00:02:01 Thank you. Thank you for having me. I always start my podcasts with introductions, and they're usually pretty straightforward, but you're differently situated than most researchers I get in the booth. So I'm going to start by letting you tell me and our listeners who you are, what you do, where you work, and in general, what gets you up in the morning. Yes, I'm Mark Hamilton. I'm a software engineer on Microsoft's Cognitive Services team, and I run an open-source machine learning library called Microsoft Machine Learning for Apache Spark. And this library has kind of brought me into the more applied research space, so we do a lot of AI for good type projects with it.
Starting point is 00:02:37 And recently, I've started my PhD over at MIT. Right now, I'm part-time at Microsoft. I used to be full-time. I've been working at Microsoft for the past three years. And during this next chapter of my life, I'll be part-time, both getting a PhD in computer science and mathematics, as well as working at Microsoft. My team creates things like the Cognitive Services, so translation, text analytics, computer vision, these types of projects. And we really heavily collaborate with Microsoft Research. So we collaborate with John Langford and the Vowpal Wabbit team. We collaborate with the LightGBM and DMTK team at MSR Asia. We also collaborate with the AI for Earth team and Lucas Joppa, his whole side of the organization. So we really have a lot of different projects that kind of span into Microsoft Research, where personally, I really like to work. I really like these kind of research problems.
Starting point is 00:03:34 I like working on mathematics and things that aren't necessarily easy to implement. Right. So since you're at MIT, you're physically situated in Massachusetts. Are you working with Microsoft there in Massachusetts as well? Yeah, I work at the Microsoft New England Research and Development Center, or the NERD, colloquially put. And MIT is right next door, so it makes it fairly easy to kind of go back and forth between these two. Although my team is actually situated in Redmond, so I come out here about every month and visit Siddharth and my manager, say hi, you know, make the rounds, that kind of stuff.
Starting point is 00:04:08 And that's why I'm looking at your face today instead of doing this remote, which is awesome. Well, you've been working as a software engineer and a developer, but you just started your PhD at MIT, as you've just told us. What made you decide to do that? And what lines of research are you most interested in right now? It was a tough decision because I couldn't imagine working with a better team than I do now at Microsoft. But really what it came down to is that I just missed math. I missed staring at a mathematics textbook all day. And really that kind of intense learning is what I
Starting point is 00:04:42 really missed in my life. And so that's kind of what drove me back to the PhD. But I didn't really want to leave my team because they're an amazing group of people and amazing collaborators. So I definitely wanted to keep that connection alive as I moved on to the PhD. Some of the things I really want to start to tackle in my PhD is really dive deep into the mathematical foundations of deep learning and how to use topics from more advanced and algebraic mathematics to really influence the kinds of architectures that we can create to learn. And so some of these particular threads that I'm interested in is information theory, because information theory provides these really nice
Starting point is 00:05:21 mathematical tools to describe knowledge in a pure and crystallized way. And what's really exciting is, you know, now we have the ability through things like adversarial networks to really control how information flows through deep networks. And that can really yield a lot of interesting techniques and applications. One of the things that I'm particularly interested in is using information theory to help us understand complex systems that we ordinarily would have no idea where to start. So things like our own thoughts or the thoughts of kind of the larger organism, namely the whole human race, that kind of thing would be really interesting to see if algorithms could pull out structure and really tell us what it is we're doing when
Starting point is 00:06:10 we're thinking or when we're communicating. Small aspirations there, Mark. Yeah, I'm always driven by the crazy core of knowledge kinds of problems. I like them big. We hear a lot about AI for good these days, which is sort of an umbrella term for applying artificial intelligence technologies to things we feel good about, as opposed to just how can I get more clicks on my ads? It manifests itself in fields like medicine, agriculture, the environment, and the arts. So we'll talk about some of those specific projects shortly, but first let's wade upstream and talk more philosophically about using our power for good
Starting point is 00:06:49 and not for advertising. What are the promises of AI that excite you most? Yeah, I mean, AI is very exciting to me because it's one of the most powerful tools that a developer has at their disposal in order to create things of unprecedented intelligence and ability to do work in the real world. I think that it's important that organizations like Microsoft and organizations across the world really think about devoting some of their resources to AI for good type problems, because it not only brings a diversity of problems to the table, it also really can make an extraordinary impact, where it's not like you're just optimizing the last three decimal places out of an advertising click-through scenario.
Starting point is 00:07:33 You're really fundamentally changing the shape of a problem and its solution in some way that may or may not have had a lot of machine learning in the past. Well, and I want to stop there for a second, because usually the problems that get the most attention are the ones that potentially make the most money or affect the bottom line of some organization or person. And some of these problems under the AI for good umbrella tend to be important things, but things that don't have a huge ROI, at least at the outset in people's minds. And so it's like, why should we put our minds over there when someone's going to pay us to put our minds here? So how's that mapping out? I mean, you see Microsoft putting its muscle and mind and money behind AI for Earth. So what do you think about that?
Starting point is 00:08:22 Yeah, I think it's a lot like research or like any other kind of problem where there's an incredible value to diversity, where if your entire business model is parked in a single kind of task or a single kind of automated solution, you, one, are incredibly sensitive to shocks in the market. And two, you never really know where the next good idea is going to come from. A lot of times you get a huge boost from diversity. Like when I used to work in automated theorem proving, the hot topic when I worked there was agent-based theorem provers. And they're kind of like lots of little tiny algorithms that all try to prove different parts of the theorem. And when you look at how long each
Starting point is 00:09:05 one of these little tiny algorithms takes to solve an individual task, it either solves it instantly or it took years. It would never solve it. And so that's when people kind of started realizing we need a diverse collection of these and we need to pool them together because, you know, for some lines of research or lines of thinking, the problem can immediately be solved. Whereas with other lines of thinking and lines of work, the problem will never be solved. Until quantum. Billions of years in 10 minutes. Maybe. Let's get more specific and talk foundationally about a really cool machine learning framework
Starting point is 00:09:56 that you've already alluded to, and you gave it its full name, but it's called MML Spark. Give us an overview of MML Spark and describe some of the features that set it apart. Yeah. So what Microsoft Machine Learning for Apache Spark, or MML Spark for short, aims to do is really bring together a lot of the different technologies that currently exist in the Microsoft and kind of broader computational ecosystem and put them all under the same roof with a special set of superpowers. And these superpowers are massive distribution, so that you can run it on hundreds to thousands of machines at a time, and elasticity. So what this means is that you can kind of add in new computers as you
Starting point is 00:10:37 see fit or kill off some computers if they're taking up extra resources. And so this kind of lets you make computations that grow or shrink depending on how much data is actually flowing through the computation. We wanted to tackle the largest scales that any industry could possibly see, but also be able to scale down to a single node if you don't have that kind of money lying around. More specifically, what MML Spark brings to the table is deep learning in this kind of large distributed big data environment, efficient gradient boosted trees with LightGBM.
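The "elasticity" Mark describes, where the same computation runs unchanged whether it is spread over one machine or a thousand, can be sketched with a toy partitioner. This is plain Python standing in for what Spark actually does; none of the names below are MMLSpark's real API.

```python
# Toy stand-in for Spark-style elastic scale-out (not MMLSpark's actual API).
# The job is split into one partition per available worker, and the worker
# count can grow or shrink without changing the answer.

def partition(data, n_workers):
    """Round-robin the records across however many workers we have right now."""
    return [data[i::n_workers] for i in range(n_workers)]

def run_job(data, n_workers):
    """Each 'worker' reduces its own partition; the partial results are merged."""
    partials = [sum(x * x for x in part) for part in partition(data, n_workers)]
    return sum(partials)

data = list(range(1000))

# Same answer whether we "rent" 1 machine or 16 -- only wall-clock time changes.
assert run_job(data, 1) == run_job(data, 16) == sum(x * x for x in data)
print(run_job(data, 16))  # 332833500
```

The point of the sketch is that the partition count is a runtime knob, which is what lets a cluster add or kill machines as the volume of data flowing through the computation changes.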
Starting point is 00:11:12 We've recently added Microsoft Research work in Vowpal Wabbit on Spark. So kind of bringing that into this distributed ecosystem. Talk about Vowpal Wabbit for a minute. Every time I say it, I feel like Elmer Fudd, but I suppose that's the point. Yeah. So, Vowpal Wabbit is one of the newest additions to MML Spark. And what Vowpal Wabbit provides is really hyper-efficient text analytics, and now increasingly more it's broadening into the scope of reinforcement learning and multi-armed contextual bandits. So, one thing that VW, colloquially put, is incredibly useful for is working with, like, text classifiers, text regressors, and also optimizing
Starting point is 00:11:52 different text-based situations. So for instance, MSN uses it to optimize all of their ads and do what's called multi-world testing. And a lot of other companies are starting to use it to kind of automatically update and refine their content. All right. So keep going a little bit more on the MML Spark value proposition, if you will. There's like a bunch of random different things that are kind of all under the roof. I mean, one extra thing that we've kind of lit up in the past year is the ability for Spark not just to be a big data platform, like a big cluster computing framework, but to serve as what's called a microservice orchestrator. And so in software engineering, it's really useful to take all of your different components and encapsulate
Starting point is 00:12:38 them behind nice little packages, little boxes that talk to each other in the same way that you and I talk to each other. And this kind of provides a nice separation of church and state between different components of your world. It makes it easier for teams to collaborate on them, and things like this. And so what we do in MML Spark is we give the building blocks to create these kinds of ecosystems of algorithms that all talk to each other, so that you can kind of take your existing Spark computation and turn it into a web service, or you can take your big collection of Spark clusters and use it to talk to a web service. And so it kind of provides these two pieces, as well as doing integration
Starting point is 00:13:17 with other machine learning frameworks. All right. So what's the origin of it, and who's contributed? Because I'm hearing some academic contributors, some Microsoft Research contributors, and even Microsoft proper. It's a product. Yeah. Yeah. I mean, the product originally came out of what's called Azure Machine Learning. And what we first started on was the Cognitive Toolkit on Spark, or taking deep learning and trying to parallelize it across hundreds of computers at a time. And then the first project that we ever worked on was the Snow Leopard Trust. And that really provided the impetus to grow the library, in that people saw its potential
Starting point is 00:13:54 through working on Snow Leopard recognition and gave us a lot more time and gave us a lot more ideas and a lot more challenges that we had to kind of solve with the same library and really forced it to grow in a lot of different directions. All right. I want to talk about that right now, the Snow Leopard Trust. And there's a project under AI for Earth that you've done with the Snow Leopard Trust. And this is sort of AI technologies in conservation. This is about identifying, counting and tracking snow leopards, which are very elusive cats, with motion-trigger cameras, or camera traps. So why do we need to do that? Well, there were claims that they discovered a whole new leopard habitat, or a whole new swath of leopards. But they really want to go back to some of their existing models before you take an animal off the endangered species list, because that can be very detrimental to the animal's protected status. It suddenly means that a whole bunch of rules that were designed to actually preserve and protect the leopard no longer apply. And so what the Snow Leopard Trust
Starting point is 00:15:20 really aims to do is create a much more robust collection of data to really hone in on the true number of snow leopards so that we can accurately say what's going on with the population of snow leopards and should they or should they not be endangered. And you might think like, well, what do I care about this big cat in the mountains? But in order to protect the snow leopard, because it's the apex predator, it's at the top of the food chain, you kind of need to protect the entire food chain in order to keep that funnel going up there. So protecting the apex predator of an ecosystem is incredibly important for the ecosystem's health as a whole. If you don't keep the apex predator alive, some other portions of the ecosystem start to swell. For instance, when they brought the gray wolf back into Yellowstone, suddenly it was a much lusher place because all of the herbivores, that kind of secondary trophic level, were taken out by the wolves. And so suddenly plants could grow again. And so, you know, the influence of an apex predator can sometimes have like profoundly interesting and beneficial effects on food webs and
Starting point is 00:16:26 the stability and biodiversity. Right. And also, the snow leopard being a stealthy cat, you would have some need to really understand how to count them, because are they not there, or are they just hiding? Yeah, this is a particularly tough challenge, because in the Snow Leopard Trust's, like, multiple decades of work on these creatures, they've really only been able to collar in the tens of leopards. And you can't really get much data out of 10 leopards. And so the only real option is to use camera traps. And camera traps are these automated systems that have a camera, and they have an infrared detector that detects motion, and they'll fire a burst of photos every single time something moves in front of it. And one of the problems with this is that there's
Starting point is 00:17:16 some snow leopards, but there's also a lot of waving blades of grass and goats and even, like, locals that go and dance in front of the camera. There's a ton of fun stuff in this data set. So there's a lot to cull through. So that's a good segue into why MML Spark. What does machine learning add to the mix here? It can be incredibly difficult to actually look at one of these photos and rule out that there's a leopard in the photo, because leopards are kind of designed for this.
Starting point is 00:17:41 They're designed to hide. They're designed to be indistinguishable from their surroundings. So for a human being, this can be fairly difficult and time consuming if you really want to be sure that there's no leopards there. And so we've estimated that it would take around 20,000 hours to cull through the Snow Leopard Trust's like 1.2 million images that they have in their backlog from all of these 50 to 60 cameras that are out in various different locations. And so what we really need is a system that can handle this incredibly large upfront cost of processing 1.2 million images, but then elastically scale back down to handle the kind of day-to-day flow of data through the system.
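Mark's 20,000-hour figure is consistent with a reviewer spending about a minute per image; the per-image time below is my assumption for illustration, not a number from the episode.

```python
# Back-of-the-envelope check of the labeling-cost estimate.
images = 1_200_000            # the Trust's stated backlog
seconds_per_image = 60        # hypothetical careful-review speed (my assumption)

hours = images * seconds_per_image / 3600
print(hours)  # 20000.0
```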
Starting point is 00:18:24 How's it going then? Have they said this new data set confirms our suspicions that it's not endangered or what? Yeah, so there's a few extra steps that need to happen before we can really get those concrete numbers. One of them, which is the task that we're working on now, is that it's one thing to count the number of photos of leopards, but how do you tell the difference between one very narcissistic leopard who likes to get their photo taken and several incredibly shy leopards? The only data that you have in order to tell the leopards apart is their spots and their spot patterns.
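The individual-identification step Mark is describing can be sketched as comparing spot "fingerprints." This toy reduces each sighting to a set of hypothetical spot descriptors and matches by set overlap; real re-identification tools such as HotSpotter work on image keypoints, so everything below is an illustrative stand-in, not the Trust's actual pipeline.

```python
# Toy spot-pattern matcher: attribute a sighting to a known individual
# by overlap between spot-descriptor sets (all descriptors are made up).

def jaccard(a, b):
    """Overlap between two descriptor sets: 1.0 means identical patterns."""
    return len(a & b) / len(a | b)

known = {
    "leopard_A": {"spot_01", "spot_07", "spot_12", "spot_19"},
    "leopard_B": {"spot_03", "spot_08", "spot_14", "spot_22"},
}

def identify(sighting, threshold=0.4):
    """Best-matching known individual, or a new one if nothing matches well."""
    name, score = max(((n, jaccard(sighting, s)) for n, s in known.items()),
                      key=lambda t: t[1])
    return name if score >= threshold else "unknown_individual"

print(identify({"spot_01", "spot_07", "spot_12", "spot_25"}))  # leopard_A
print(identify({"spot_30", "spot_31", "spot_32"}))             # unknown_individual
```

The threshold is the interesting design choice: set it too low and several shy leopards get merged into one narcissist; too high, and one leopard photographed from two angles becomes two leopards.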
Starting point is 00:18:57 Many kinds of patterned creatures across the ecosystem, their patterns are kind of like human fingerprints in that they're unique but slightly varied. So it's a very difficult task. So that's really the next step. And what we've already provided is kind of a great burden lifted off the Snow Leopard Trust's shoulders. But this next burden of matching them up, they still have to do it kind of CSI-style, where they print out all the photos and they plaster them all over whiteboards and try to piece this puzzle together manually in a conference room, as opposed to something that is more efficient and automated and scalable. Okay, so that's interesting, because I was thinking that's exactly where the machine is going to do better than I would do at telling the difference between the narcissist and the introvert. And that said, then we're still back to human labeling and
Starting point is 00:19:46 data identification. Yeah? No? Yeah. So, you know, there's a lot of nice technologies out here now that really aim to solve this problem in a generic way. The one that we're looking at is called Hotspotter. A lot of researchers in the literature have kind of come up with automated ways in order to do species identification based on patterns. And a lot of these algorithms, they really require you to understand not just is there a leopard in the photo, but where is the leopard in the photo? So some of the work that we've done to try to address this is that we have this classifier, this thing that can say, yes, there's a leopard and no,
Starting point is 00:20:25 there's not a leopard. But it would be ridiculous to think that this classifier didn't actually know where the leopard was and yet performed well. And so we can actually pull this information out of the classifier by asking it a series of questions. And this method is called LIME, or Local Interpretable Model-agnostic Explanations, which can take any sort of classifier and look into its brain, so to speak, and figure out what is actually causing the classification to occur. And when you do this with a leopard classifier, you find that it kind of hones in on the leopard's body. And we hope to then pipe this into Hotspotter and really use that to complete the end-to-end pipeline in a way that doesn't really require any human effort or any human labeling.
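The probing idea behind LIME, perturb the input many times and watch how the black box's score moves, can be illustrated with a stripped-down toy. This is a masking-based attribution sketch in the spirit of LIME, not the actual LIME algorithm or library, and the "image" and classifier below are invented for the example.

```python
import random

random.seed(0)

# Toy "image" with 8 regions; the hypothetical leopard occupies regions 2-4.
LEOPARD = {2, 3, 4}

def classifier(mask):
    """Stand-in black box: score = fraction of leopard regions left visible."""
    return sum(1 for r in LEOPARD if mask[r]) / len(LEOPARD)

# Perturb the input with random masks, then measure how much each region's
# presence moves the black box's score on average (a LIME-flavored probe).
masks = [[random.random() < 0.5 for _ in range(8)] for _ in range(2000)]
scores = [classifier(m) for m in masks]

importance = []
for r in range(8):
    on = [s for m, s in zip(masks, scores) if m[r]]
    off = [s for m, s in zip(masks, scores) if not m[r]]
    importance.append(sum(on) / len(on) - sum(off) / len(off))

top = sorted(sorted(range(8), key=lambda r: importance[r], reverse=True)[:3])
print(top)  # [2, 3, 4] -- the probe locates the leopard without any labels
```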
Starting point is 00:21:20 Well, another area that falls under the AI for good umbrella is a fairly recent trend to deploy AI in cultural preservation and engagement. And these are two sides of an important coin. So you've got a great story about how Microsoft, MIT, and the Metropolitan Museum of Art in New York City got together to create what you call GenStudio and employ GANs, or Generative Adversarial Networks, in the world of fine art. Yeah, so this collaboration came out of an initiative that the Metropolitan Museum of Art started, where they took their entire collection, and they took really high-quality photos of everything in a lot of different angles, put it together with all of the metadata about the artists and put these as open access online so that anyone from across the world, developers, anyone looking to enjoy and use this art in their projects could really grab it without needing to worry about the rights, needing to worry about licensing. So seriously, I can go and use any piece in the Met collection in anything I want to do? Yeah, they've done this for about 400,000 different pieces in the Met's collection.
Starting point is 00:22:29 How many are there? I think there might be like a few million. I suppose. But you can't quote me on those numbers. I'm not going to. Yeah. So they released this large open access catalog, and then they employed a team to go through the collection and tag the art with what's actually being shown in the art. So people and flowers and dogs and cats and turkeys and what
Starting point is 00:22:50 have you. And so not only do you have the actual images, you have some semantic knowledge about what's going on inside. And this semantic knowledge is slightly different than what you'd get from a normal classifier, because it's not that it's tagging a physical turkey. It might be tagging the engraved metal in the shape of a turkey. So things that wouldn't normally be picked up by algorithms. So it's a very useful data set. And what they really wanted to do was create an environment where people could use all this data to create interesting applications or create applications to reach a broader segment of people and that's really where we came in is that we took this data and we really wanted to create an experience that got people excited about
Starting point is 00:23:32 not just the Metropolitan Museum of Art's collection, but the way that we and algorithms kind of think and make art, in a real Platonic and philosophical sense. Keep going. I mean, I love all aspects of this project, especially the idea of generative adversarial art. Can you unpack that a little bit? Because, wow. Yeah. So what a generative adversarial network tries to do is it tries to model different distributions. It tries to kind of mimic the collection of data that you have available. And the collection of data that we had available was the Metropolitan Museum's collection. And so the particular algorithm that we used learned kind of like
Starting point is 00:24:16 a human being does, in that it starts out by creating images about the size of a postage stamp, you know, eight by eight pixels. And slowly but surely, as it learns, it grows the image. And what this allows you to do is generate very high-resolution photos of art. What we really aim to do primarily is to create an application that lets you explore with this generative adversarial network and then find related pieces to your creations in the actual collection. So not only do we have a GAN, we have an ability to take created works and reference them via a custom reverse image search
Starting point is 00:24:51 and pull them up in the Met's actual search collection and then find these kinds of connections and give people a starting point to branch out and explore this kind of 400,000-work database that they have assembled. Definitely one of our goals as we continue on in this vein of work is to create things that would get school kids excited, get people who wouldn't ordinarily go to the museum, get people who have never been in the same 100-square-mile radius as the museum to start to play with these, because it's hosted online. Anyone can go to the website and start to explore.
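The reverse-image-search piece Mark describes can be sketched as nearest-neighbor lookup over feature vectors. The titles and three-dimensional embeddings below are invented for illustration; the real system would presumably use high-dimensional features from a deep network over the Met's images.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings for a few works (made-up names and vectors).
collection = {
    "vase_with_turkeys": [0.9, 0.1, 0.0],
    "portrait_in_oil":   [0.1, 0.9, 0.2],
    "engraved_armor":    [0.6, 0.4, 0.3],
}

def reverse_image_search(query_vec, k=2):
    """Rank the collection by similarity to the query image's embedding."""
    ranked = sorted(collection,
                    key=lambda name: cosine(query_vec, collection[name]),
                    reverse=True)
    return ranked[:k]

print(reverse_image_search([0.85, 0.15, 0.05]))  # ['vase_with_turkeys', 'engraved_armor']
```

A generated GAN image gets embedded the same way as the catalog images, so "find related pieces to your creations" reduces to this kind of ranking.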
Starting point is 00:25:27 What is the website so we know? So it's gen.studio. That's it? That's it. That's G-E-N dot studio. Okay, I'm going there next. I'm really interested in some common themes I'm detecting here. We're using AI to preserve, conserve, and save,
Starting point is 00:25:45 but also to create, generate, and innovate. Let's head back to the more philosophical for a minute and talk about creativity and machines in general. I know you have big ideas. Why don't you share some of your thoughts and then tell us what the world looks like if you're wildly successful? Right now, I think that AI creativity is a bit limited in that it really mimics what humans have created. And there are ways to think about taking this outside of the box. You know, you can think about encoding the temporal evolution of art and seeing if an algorithm could extrapolate into the future. But it doesn't really, to me, feel like it addresses that problem. I mean, the closest thing that I've found to something that
Starting point is 00:26:30 I hope exists or that we can discover is that a few years back, there was a lot of hype about algorithms that created their own languages. And in some sense, what this research kind of showed is that a language is not something that is learned or taught. It really emerges out of the physical world in that a language is needed in order to communicate and solve problems in the world and can emerge by itself in a multi-agent system. And so one of the things that I think would be a nice holy grail is if we can create a kind of multi-agent system that rediscovers art or where art emerges as a real pivotal fundamental concept. You know, it seems like art and language are intimately connected and that visual arts, at least they convey ideas, they beam ideas across
Starting point is 00:27:18 from the wall into your brain in the same way that I'm beaming ideas to you and vice versa through our speech. And so the hope is that these can kind of be modeled in similar ways and they have similar structures. It will be interesting to see, you know, if there is a way to kind of create art without just mimicking human beings. Like, does it require multiple agents? Does it require just one? My guess is that it would require multiple, and that this is kind of how language has to arise. And I would assume that it's just as complicated as language, at the very least. Well, that's a good segue into the next question. I can't let any podcast go by without asking what could possibly go wrong. And I always ask, what keeps you up at night? When we start talking about generating art from art that already exists, are there going to be any problems with copyright or plagiarism? Or even, is it valuable if a machine made it, in the same way it's valuable if a person made it? Yeah, and I think those are really good questions that we need to tackle, in that, you know, that is something that these algorithms will really never have in quite the same way as a human being does.
Starting point is 00:28:50 There are problems in generating art, like cheapening the human experience, and also plagiarism. But I think that what's probably more problematic is adversarial networks when they're employed by really bad actors. There's been a huge amount of controversy around, like, deepfakes and these kinds of really just sinister applications of adversarial networks. And it's very difficult to square this with pursuing research in machine learning, because it seems like, you know, once you create something, you can't put the genie back in the bottle. You know, the humanities and the social sciences have the institutional review board. I mean, I don't see why AI shouldn't have a similar thing, where people kind of think about what they're doing before they make it. You know, it might make growth a little bit slower, but it makes it a little bit stabler. And I'd much rather have a little bit slower,
Starting point is 00:29:41 stabler growth than very, very volatile, potentially fast-paced growth. All right, it's story time. Mark, you're only 25, so your story's not a long one yet. But I think you already have a few plot twists. Tell us your background and how you ended up working where you're working and going where you're going. Yeah, so my research life has definitely taken a lot of twists and turns in that when I originally started research, I was working in a photonics lab, and then I went to particle physics at the LHC, and then astronomy, and then metamathematics, kind of trying to chase the foundation of life, and then finally to machine learning. But I think that along the way, you pick up a lot of
Starting point is 00:30:38 different exciting viewpoints, you hear about random other whole branches of thought. And you also get to interact with really powerful mentors. And I think that what's helped me throughout my life is the help of people like my sister and my teachers and my current manager, who really help form your research directions, your opinions, and how you want to conduct your research and work with people and collaborate. Come back to your sister, because it's the first time anyone has cited their sister as an inspiration. My sister is a really big inspiration for me. I think that my sister, from the very get-go in high school, was like, no, Mark, you're taking science research. I don't care how nerdy you think it is.
Starting point is 00:31:21 I don't care how many friends you're going to lose. Who's your sister? What's she do? Yeah, my sister, she's eight years older than me. She's just opened her lab at Arizona State University studying Legionnaires' disease. All right. What's her name? So it's Kerry Hamilton. She's actually collaborated with MSR. We had a joint project where we looked at figuring out ways to measure Legionella.
Starting point is 00:31:44 Legionella is a waterborne pathogen that causes Legionnaires' disease. We wanted to figure out how we can take the Legionella data and construct limits so that the government can say, hey, we don't want to see any more than this amount of Legionella bacteria flowing out of your tap if you want people to be safe. Are there other Hamiltons we ought to know about? Those are the two that are in the research sphere. Those are the ones we want to know about. All right. Well, where'd you go to school? Yeah. So I went to Yale in New Haven and studied math and physics there. So I started out my work there at the Large Hadron Collider and kind of looking for certain alternatives to supersymmetry, namely the vector-like quark. One of the most influential advisors in my life was Meg Urry, and she worked in active galactic nuclei. And what we aimed to do
Starting point is 00:32:31 there was to look at kind of the farthest black holes, namely these things called active galactic nuclei, and figure out their distance from a very small number of measurements. And then from there, I kind of moved into more abstract mathematics that took me into Germany, into this group that created a system of metamathematics. So it's math that creates math. And the goal is to kind of create things like theorem provers up in this very abstract metamathematical space and then apply them to all the languages that come out of this metamathematical language.
Starting point is 00:33:05 And so that was a fun little foray into an intense realm of mathematics. Where in Germany is this math haven? It's at Jacobs University in Bremen, Germany. All right. So then you came back here, obviously. Yeah. And landed at Microsoft? And kind of followed him very closely as he really was one of the foundational people who built Microsoft Machine Learning for Apache Spark and continues to support it and support all of its related efforts. Well, my new favorite question is, what's one interesting thing about you that people might not know, and how has it influenced your career path?
Starting point is 00:34:00 I guess there's a few things that I do outside of just thinking about math, and one of the things that I've really come to love is cooking these days. You know, I think that cooking is one of the best ways to take your hard elbow grease and turn it into something that's incredibly satisfying without having to wait very long. You know, it's not like coding where you put in a lot of hours and then you change the color of pixels. You actually get to enjoy the things that you create. As we close, since you're on the kind of front end of your life in research, if that's where you land, I'm curious about your vision for the future horizon in AI. What parting thoughts would you leave with our listeners in terms of maybe what you're most looking forward to as you move ahead in the field?
Starting point is 00:34:46 Yeah, I think that one of the things I'm most looking forward to is just getting a large diversity of ideas and, you know, meeting more of the research community and seeing what kind of things are keeping them up at night and seeing kind of how they relate. I'm excited to start to uncover more of the beautiful structure underlying intelligence, and I think that it's surprising and it's complex. And also one of the things that I hope to do in the PhD is really explore how the math established in the pure math community and abstract algebra and things like this can help influence some of the work we're doing in machine learning in that it kind of feels like we are really doing a lot of algebra these days in machine learning and that we're thinking about the structures of different things and how these structures relate. And so I'd love to understand a little bit more about how these different kinds of mathematical tools can create systems capable of understanding the world to a much better
Starting point is 00:35:48 and more interpretable degree. Real quick before we go, do you have any experience having gone to the areas where snow leopards live, or is this just all abstract for you? Yeah, I mean, one thing that was really exciting was that we actually got to go to Kyrgyzstan and meet up with the Snow Leopard Trust and take a Honda CR-V deep into the mountains, where we got several flat tires, and actually, like, see some of these camera traps. And we went to what's called the Shamshi River Valley. And it was really nice in that it kind of made this data set feel a lot more real, in that we were actually able to, like, see one of the cameras and also just see how remote and difficult it is to actually get out there. And what was particularly exciting about this valley is that recently when we had first started doing this
Starting point is 00:36:35 work, they only had one camera there and they found a snow leopard, and it was kind of the first documented evidence of finding a leopard in this particular valley. And so now they really want to scale out their efforts to actually understand, is this a single leopard passing through, because they have incredibly large ranges? Or is this indicative of a more established leopard population that needs to be properly preserved? So they're really starting to build out an infrastructure around this river valley, and it's really exciting to see them do this. And actually getting to meet Koustubh, kind of the head of the Snow Leopard Trust there,
Starting point is 00:37:12 and getting to meet his whole team in that we're just not the only people doing really exciting things around snow leopard technology. They have collaborations in Asia creating 3D snow leopard camera traps and all sorts of really exciting things. They've now invested in a series of drones so they can do thermal imaging and kind of sweep through the entire ecosystem and get understanding of biomass and things like this. Well, I'm thrilled that you were able to join us in person so I didn't have to use a camera trap to see your face. Mark Hamilton, thanks for joining us on the podcast today. Thank you so much.
Starting point is 00:37:52 To learn more about how researchers are using machine learning for social good, visit microsoft.com slash research.
