Microsoft Research Podcast - What’s Your Story: Ivan Tashev

Starting point is 00:00:00 To succeed at Microsoft, you have to be laser-focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve. Do your job and be very good at it. Those are the most important rules I have used in my career at Microsoft. Microsoft Research works at the cutting edge.

Starting point is 00:00:29 But how much do we know about the people behind the science and technology that we create? This is What's Your Story, and I'm Johannes Gehrke. In my 10 years with Microsoft, across product and research, I've been continuously excited and inspired by the people I work with, and I'm curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now I'm sharing their stories with you. In this podcast series, you'll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

Starting point is 00:01:11 In this episode, I'm talking with partner software architect Ivan Tashev in the anechoic chamber in Building 99 on our Redmond, Washington, campus. Constructed of concrete, rubber, and sound-absorbing panels that make it impervious to outside noise, this chamber has played a significant role in Ivan's 25 years with Microsoft. He's put his expertise in audio processing to work in this space, helping to design and study the audio components of products such as Kinect, Teams, and HoloLens. Here's my conversation with Ivan, beginning with his childhood in Bulgaria, where he was raised by two history teachers. So I was born in a city called Yambol in Bulgaria, my country of origin. The city was founded 2,000 years BC and now sits on both banks of the river called

Starting point is 00:01:59 Tundzha. It has always been an important transportation and agricultural center for the entire region. I grew up there in a family of two lecturers. My parents taught history, and they loved to travel. So everywhere I went, I had excellent tour guides with me: this happened in this place, in this year. Were there quizzes afterwards?

Starting point is 00:02:33 But it turned out that I was more fond of engineering, technology, and math; all kinds of devices and mechanical things just fascinated me. When I read about parachutes in a book, I decided I had to try it, and I jumped from the second floor of a building with an umbrella to see how much it would slow me down. It didn't. And how did you end up? Oh, I ended up with a twisted ankle. Nothing more. So you were always hands-on, that's what you're telling me, right? Always the

Starting point is 00:03:12 experimenter. Yep. So I was doing a lot of this stuff, but I also had good math teachers, and I started going to the mathematical Olympiad competitions in fifth grade. Pretty much every year they were well organized at the school, city, and regional levels. And I remember how in sixth grade I won first place in the regional Olympiad. The prize was an 8mm movie camera. That, I would say, changed my life. It has been my hobby ever since. I have carried a movie camera, through several generations of them, everywhere I go and travel.

Starting point is 00:04:02 In Moscow, in Kiev, in Venice, everywhere my parents traveled, I was shooting 8mm films, and I continue this to this day. Today I have much better equipment, and also very powerful computers to do the processing. I produce three to five Blu-ray discs pretty much every year: performances of the choir or the dance groups at the Bulgarian Cultural and Heritage Center of Seattle, mostly. Wow, that's fascinating. Was that hobby somehow connected to your entry into science, and then actually doing a PhD

Starting point is 00:04:42 and then actually going into audio processing? The mathematical high school I attended in the city where I was born was one of the five strongest in the country, which meant math every day, twice a day on two of those days, and physics every day. By the end of ninth grade, we had finished the entire high school curriculum and started to study differential and integral calculus, something closer to university math courses. This meant that I would have had no problem with the math entrance exams of any university. I didn't even have to take them, because in one year, my 11th grade, I qualified to become a member

Starting point is 00:05:35 of the Bulgarian national teams for both the International Mathematical Olympiad and the International Physics Olympiad. They actually coincided, so I had to choose one, and I chose physics. Since then I have been saying that math is the language of physics, and physics is the language of engineering. And that kind of showed the tendency. So literally, in 11th grade I could legitimately pick any university, and I decided to study electronic engineering at the Technical University of Sofia. And then how did you end up in the US?

Starting point is 00:06:12 So that's another interesting story. I graduated from the university and defended my PhD thesis. It was something amazing. What was it on, actually? It was a control system for a telescope, not just for observing celestial objects but for tracking satellites and ranging the distance to them. Literally, one measurement is: you shoot the laser, it goes to the satellite, which is 60 centimeters in diameter, it returns, and you measure the time with an accuracy of 100

Starting point is 00:06:45 picoseconds. This was part of studying how the Earth rotates and how the satellites move. There were around 44 stations like this on the entire Earth, and the data were public. They were used by NASA to finalize the models for those satellites, which later became GPS; by the Russians to finalize the models for their GLONASS system; and by people who studied the precession and rotation of the Earth. A lot of interesting PhD theses came from the results of this device, including on tides. For example, I found that the Balkan Peninsula moves up and down two meters every day because of the tides.
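A quick sanity check of the ranging arithmetic in this story, as a minimal Python sketch: the laser pulse travels to the satellite and back, so the one-way range is half the round-trip time multiplied by the speed of light, and a timing accuracy of 100 picoseconds therefore bounds the range resolution. The function names are illustrative only, not taken from any real tracking software.

```python
# Speed of light in vacuum, in m/s (exact by the SI definition of the metre).
C = 299_792_458.0


def range_from_round_trip(t_round_trip_s: float) -> float:
    """One-way distance to a target for a pulse that travels out and back."""
    return C * t_round_trip_s / 2.0


def range_resolution(timing_accuracy_s: float) -> float:
    """One-way distance uncertainty implied by a round-trip timing accuracy."""
    return C * timing_accuracy_s / 2.0


if __name__ == "__main__":
    print(f"light travels {C * 1e-9 * 100:.1f} cm in 1 ns")
    print(f"100 ps timing -> {range_resolution(100e-12) * 100:.2f} cm of range")
```

Run as a script, this prints 30.0 cm for one nanosecond of light travel and 1.50 cm of one-way range for 100 ps of timing accuracy, which is why the measurement needed to be "way shorter" than a nanosecond.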

Starting point is 00:07:32 So the Earth is liquid inside, and there are tides under us in the same way as in the oceans. Oh wow, super interesting. I actually wanted to come back to the unit, just to get the right comparison. Picoseconds, right? Because I know what a nanosecond is: a nanosecond is ten to the minus ninth of a second, and a picosecond is ten to the minus twelfth. Okay, good, just to put that in perspective. Exactly. So this was the accuracy. Light travels 30 centimeters in one nanosecond, and we needed to go way shorter than that. But why was this project so fascinating for me? Can you imagine, this is 1988, and people had

Starting point is 00:08:18 Apple II or compatible computers, playing with a joystick a very famous game where you have a crosshair in space and you shoot the satellites with a laser. And I was sitting behind the eyepiece, moving a joystick and shooting at real satellites. Not with the goal to destroy them, of course. No, the energy of the laser was one joule. You could put your hand in front of it, but the pulse was very short, one nanosecond. So it can go and return, and you do have the resolution to measure the distance. And after that I became an assistant professor at the Technical University of Sofia. How I came to the US: a friend of mine came back from a scientific institution in former East Germany.

Starting point is 00:09:18 He shared how much money West Germany had poured into the East German economy to bring it up to standard: I think it was 900 billion Deutsche Marks. This was after the changes, after East and West Germany reunited, in the first nine years after the changes. And then we looked each other in the eye and said, wait a minute, if you model this as a first-order system, this is the time constant.

Starting point is 00:09:51 And the process will finish after two more time constants, and then we'll need another 900 billion marks. You cannot imagine how exact that prediction turned out to be about when East Germany would be economically equal to West Germany. But then we looked each other in the eye and said, what about Bulgaria? We don't have a West Bulgaria. And this started to make me think that most probably there will still be a Technical University of Sofia, but in this economic crisis there will be no money for research, for development, for building skills, for going to conferences. And then, pretty much around the same time, somebody said, hey, you know, Microsoft is coming here to hire.
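The first-order model in this back-of-the-envelope argument can be made concrete. A first-order system responding to a step approaches its final value as 1 − e^(−t/τ): about 63% after one time constant, about 86% after two, and about 95% after three, a common engineering threshold for "finished". The minimal sketch below assumes, as in the story, that the first nine years correspond to one time constant τ; it models only the timing, not the money.

```python
import math


def first_order_step(t: float, tau: float) -> float:
    """Fraction of the final value reached at time t by a first-order system."""
    return 1.0 - math.exp(-t / tau)


TAU_YEARS = 9.0  # assumption: the first nine years are treated as one time constant

for k in (1, 2, 3):
    t = k * TAU_YEARS
    print(f"after {k} time constant(s), {t:.0f} years: {first_order_step(t, TAU_YEARS):.1%}")
```

This prints 63.2%, 86.5%, and 95.0% at 9, 18, and 27 years, which is why "two more time constants" reads as the point where the process is essentially complete.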

Starting point is 00:10:40 And I sent my resume knowing that, okay, I'm an assistant professor, I can program. And it turned out that I could program quite well, from implementing all of those control systems for the telescope, and so on. And literally… And so was there a programming test as part of the interview? Oh, the interviews were with three or four people, one hour each, asking programming questions. The opening was for a software engineer.

Starting point is 00:11:10 Like on a whiteboard? Like on a whiteboard. And then I got an email saying, Ivan, we liked your performance, we want to bring you to Redmond for further interviews. I flew here in 1997. After the interviews, I returned to my hotel and the offer was waiting for me at the reception. So this is how we decided to move here to Redmond. I started, and went through two full shipping cycles as a programmer. So you didn't start out in MSR,

Starting point is 00:11:40 right? Nope. Where were you first? So actually, I was lucky enough that both products were version 1.0. One of them was COM+. This is the transaction server and the COM technology, which is the backbone of Windows. It was the component model, basically, at that point in time? Component Object Model. Basically, you create an object, get the interface, and call the methods there. And my experience with low-level programming in assembly language and with microprocessors came in very handy here. We shipped this as part of Windows 2000, and the second product was Microsoft Application Center 2000, which was a cluster management system.
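The "create an object, get the interface, call the methods" pattern can be illustrated with a toy Python analogue. Real COM does this in C/C++ through `CoCreateInstance`, `IUnknown::QueryInterface` with 128-bit GUIDs, and `AddRef`/`Release` reference counting; the class names, string IDs, and registry below are hypothetical stand-ins that mimic only the three-step sequence.

```python
from typing import Optional


class IUnknown:
    """Root interface: any object can be asked whether it supports another interface."""

    def query_interface(self, iid: str) -> Optional[object]:
        raise NotImplementedError


class ICalculator:
    def add(self, a: int, b: int) -> int:
        raise NotImplementedError


class Calculator(IUnknown, ICalculator):
    # Hypothetical interface IDs; real COM identifies interfaces by GUID.
    _interfaces = {"IID_IUnknown": IUnknown, "IID_ICalculator": ICalculator}

    def query_interface(self, iid: str) -> Optional[object]:
        iface = self._interfaces.get(iid)
        # Hand back the object through the requested interface, or None,
        # the analogue of COM's E_NOINTERFACE, if it is not supported.
        return self if iface is not None and isinstance(self, iface) else None

    def add(self, a: int, b: int) -> int:
        return a + b


def create_instance(clsid: str) -> IUnknown:
    registry = {"CLSID_Calculator": Calculator}  # hypothetical class registry
    return registry[clsid]()


obj = create_instance("CLSID_Calculator")       # 1. create the object
calc = obj.query_interface("IID_ICalculator")  # 2. get the interface
print(calc.add(2, 3))                            # 3. call a method: prints 5
```

Asking for an interface the object does not implement returns `None` instead of raising, which mirrors how a COM client checks the result of `QueryInterface` before calling through it.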

Starting point is 00:12:22 But both of them had nothing to do with signal processing, right? Nope. There was some load balancing in Application Center, but they had nothing to do with signal processing. Just pure programming skills. And then in 2000, there was the first TechFest. I went to see it and said, wait a minute. There are PhDs in this company, and they're doing this amazing research.

Starting point is 00:12:47 My place is here. And TechFest, maybe you want to explain briefly what TechFest is? TechFest is an annual event where researchers from Microsoft Research show and demonstrate technologies they have created. So it used to be in the Microsoft Conference Center, a really big two-day event. In the Microsoft Conference Center, and visited by 6,000 to 7,000 Microsoft employees. Usually Microsoft Research, all of the branches, showed around 150 demos. And it was amazing.

Starting point is 00:13:17 And that was the first such event. Pretty much not only… Oh, the very first time? The very first TechFest. Got it. And pretty much not only me, but the rest of Microsoft Corporation learned that we do have a research organization. In short, within three months I started at Microsoft Research.

Starting point is 00:13:35 How did you get a job here then? How did that happen? So, visiting TechFest made me think seriously that I should return to research. I opened the careers website with the open positions, and there were two suitable for me. One of them was in Henrique Malvar's Signal Processing Group, and the other was in the Communication, Collaboration, and Multimedia group led by Anoop Gupta.

Starting point is 00:14:05 So I sent my resume to both of them. Anoop replied in 15 minutes. The next week I had an informational interview with him. When Rico replied, I already had an offer from Anoop to join the team. Got it. And that's where your focus on communication came from then? Yes. So our first project was RingCam.

Starting point is 00:14:26 So it's a 360-degree camera with an eight-element microphone array in the base. The purpose was to record meetings and do meeting diarization: to have a 360-degree view, but also, based on signal processing and face detection, a speaker view, a separate camera for the whiteboard, and diarization based on who is speaking, using the direction from the microphone array. Ross Cutler was the creator of the 360 camera; I was doing the microphone array. Honestly, even today when you read our 2002 paper, you say, wow, that was something super exciting and super advanced.
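Locating who is speaking from the direction reported by a microphone array usually starts from time differences of arrival between microphones. The interview does not say which algorithm RingCam used; as a swapped-in, commonly taught technique, here is a minimal GCC-PHAT sketch for a single microphone pair under a far-field assumption, with a speed of sound of 343 m/s and a hypothetical 10 cm spacing. A full eight-element ring would combine many such pairs.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumption)


def gcc_phat_delay(x: np.ndarray, y: np.ndarray, fs: float, max_delay_s: float) -> float:
    """Estimate how many seconds later signal y arrives than signal x (GCC-PHAT)."""
    n = len(x) + len(y)                       # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(y, n=n) * np.conj(np.fft.rfft(x, n=n))
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = max(1, int(round(max_delay_s * fs)))
    # Reorder so that index i corresponds to a lag of (i - max_shift) samples.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


def direction_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle between the source direction and the microphone-pair axis."""
    cos_theta = np.clip(SPEED_OF_SOUND * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


if __name__ == "__main__":
    fs, spacing, true_lag = 16_000, 0.10, 3
    rng = np.random.default_rng(0)
    source = rng.standard_normal(4096 + true_lag)
    mic1 = source[true_lag:]                  # hears the source first
    mic2 = source[: len(mic1)]                # same signal, 3 samples later
    d = gcc_phat_delay(mic1, mic2, fs, spacing / SPEED_OF_SOUND)
    print(f"delay {d * 1e6:.1f} us, angle {direction_deg(d, spacing):.1f} deg")
```

The PHAT weighting discards magnitude and keeps only phase, which sharpens the correlation peak and makes it less sensitive to the spectral shape of speech and to reverberation than plain cross-correlation.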

Starting point is 00:15:11 And then you brought it all the way to shipping, right, and it became a Microsoft product? So, yes. At some point, it was actually monitored personally by Bill Gates. And at some point… So he was PMing it, basically? He basically was… He was just a graphic. I personally installed the distributed meeting system in Bill Gates' conference room. We actually have 360-degree images of Bill Gates attending a meeting.

Starting point is 00:15:42 But anyway, it was believed that this was something important, and a product team was formed to make it a product. Ross Cutler left Microsoft Research and became the architect of that team. And this is what became the Microsoft RoundTable device. It was licensed to Polycom and for many years was sold as the Polycom CX5000. Yeah, actually I remember that many meetings I was in used to have exactly that device

Starting point is 00:16:08 in the middle. And the nice thing was that even if you were remote, you could see all the people around the table, and you got this really nice view of who was next to whom, not the transactional windows that you have right now in Teams. That's a really interesting view. So as you can see, a very exciting start. But then Anoop went on to become Bill Gates' technical assistant, and the signal processing people from his team were merged into Rico Malvar's signal processing team. And this is how I continued to work on microphone arrays and speech enhancement.

Starting point is 00:16:45 And this is what I do to this day. And you mentioned amazing Microsoft products, like Kinect and so on, right? You were involved in the audio processing layer of all of those. And part of it was actually designed here in this room? Yep. So tell me a little bit more about that. You know, at the time, I was fascinated by a problem that was considered theoretically impossible.

Starting point is 00:17:07 Multichannel acoustic echo cancellation. There was a paper written in 1998 by the inventor of acoustic echo cancellation, from Bell Labs, stating that stereo acoustic echo cancellation is not possible. And he proved it? Or what does that mean? He just… Look, it's very simple. You have two unknowns, the two impulse responses from the left and the right loudspeakers, and one equation:

Starting point is 00:17:30 that's the microphone signal. What I did was to circumvent this. When you start Kinect, you'll hear some melodic signals; this is the calibration. Then at least you know the relation between the two unknowns, and now you have one unknown, which is basically discovered using an adaptive filter, the classic acoustic echo cancellation. So technically, Kinect became the first device ever shipped with surround sound acoustic echo cancellation. The first device ever that could

Starting point is 00:18:08 recognize human speech from four and a half meters away while the loudspeakers are blasting and gamers are listening at very loud levels. So let me just ask, for the audience: what does it mean to do acoustic echo cancellation? What is it actually good for? What does it do? So, in general, speech enhancement is removing unwanted noises and sounds from the desired signal. Some of them we don't know anything about; that's the surrounding noise. For some of them we have a pretty good understanding: the sound from our own loudspeakers.
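The "classic acoustic echo cancellation" Ivan refers to, an adaptive filter that learns on the fly how much of the loudspeaker signal reaches the microphone and subtracts that estimate, is commonly taught with the normalized least-mean-squares (NLMS) algorithm. The sketch below is a single-channel NLMS canceller under simplifying assumptions (a linear, time-invariant echo path, a hypothetical synthetic room, no near-end talker, no double-talk detection); it is not the Kinect algorithm. In the stereo case described above, two such unknown paths share one microphone equation, which is the ambiguity the calibration signals were used to resolve.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, num_taps=64, mu=0.5, eps=1e-6):
    """Single-channel NLMS acoustic echo canceller.

    far_end: samples sent to the loudspeaker (known to the device).
    mic: microphone samples = echo of far_end (+ anything else in the room).
    Returns (residual, taps): mic minus estimated echo, and the learned
    estimate of the loudspeaker-to-microphone impulse response.
    """
    w = np.zeros(num_taps)        # echo-path estimate
    x_buf = np.zeros(num_taps)    # latest far-end samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_estimate = w @ x_buf
        e = mic[n] - echo_estimate                    # echo-cancelled output sample
        residual[n] = e
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # normalized LMS update
    return residual, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    far_end = rng.standard_normal(10_000)
    # Hypothetical room echo path: 32 taps with an exponentially decaying envelope.
    true_path = rng.standard_normal(32) * np.exp(-np.arange(32) / 8.0)
    room_noise = 1e-3 * rng.standard_normal(len(far_end))
    mic = np.convolve(far_end, true_path)[: len(far_end)] + room_noise
    residual, taps = nlms_echo_canceller(far_end, mic)
    erle_db = 10 * np.log10(np.mean(mic[-1000:] ** 2) / np.mean(residual[-1000:] ** 2))
    print(f"echo reduction over the last 1000 samples: {erle_db:.1f} dB")
```

Normalizing the step by the far-end buffer energy keeps adaptation speed roughly independent of playback volume, which matters when, as in the Kinect scenario, the loudspeakers may be very loud.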

Starting point is 00:18:42 So you send the signal to the loudspeakers, then try to estimate on the fly how much of it is captured by the microphone, and subtract this estimate; this is called acoustic echo cancellation. It is part of every single speakerphone. It is one of the oldest applications of adaptive filtering. So the right way to think about this is that noise cancellation is cancelling unwanted noise from the outside. Unknown noises. Whereas acoustic echo cancellation is cancelling your own noise. Which we know about. Right, okay. And that was amazing work, but it also started at TechFest. I designed this surround sound echo cancellation, and my target at the time was Windows Media Center. It was a device designed to

Starting point is 00:19:31 stay in a media room and control all of those loudspeakers, and I made sure to bring all of the VPs of Windows and Windows Media Center. And then I noticed that I was repeatedly seeing some faces I hadn't invited and didn't know, but they came over and over and over. And after TechFest, a person called me and said, look, we are working on a thing

Starting point is 00:19:59 for which your technology is a very good fit. And this is how I started to work on Kinect. In the process, I had to talk with industrial designers about the design of the microphones, with electrical designers about the circuitry and the requirements for identical microphone channels, and with the software team

Starting point is 00:20:24 that had to implement my algorithms. At some point I had an office in their building and was literally embedded, working with them day and night, especially at the end of the shipping cycle when the device had to go out. And this was not a time when you could update the software on the device; the device would go out as is, right? Actually, this was one of the first devices like that. It could? Yep. The Kinects were already manufactured, boxed, and distributed to the stores, but there was a deadline by which we had to provide the image that is uploaded when you connect it to your Xbox.

Starting point is 00:21:10 I get that, but once it was actually connected to the Xbox, you could still update the firmware on it? Yes. Oh, that's really cool. Okay. But that also had a deadline. So that was an amazing trip. It literally left all of us breathless.

Starting point is 00:21:29 There were plenty of serious technological challenges to overcome. A lot of first-ever technology was brought to this device. And this is just the audio. Next to us were the video people, the gaming people, and the designers, and everybody was excited and working like hell so we could bring this to the customers. Wow, that's super exciting. I mean, even just being involved. I think that's one of the really big things that's so much fun here at Microsoft, right? That you can get whatever you do into the hands of, you know, millions, if not hundreds of millions, of people.

Starting point is 00:22:08 Coming back to your work now in audio signal processing: that whole field, like many other fields right now, is being revolutionized by AI. Absolutely. Photography, one of the other fields that you're very passionate about, is also being revolutionized by AI, of course. Also revolutionized. You know, in terms of the changes you've made in your career, how do you deal with such changes?

Starting point is 00:22:35 You have been an expert in a certain class of algorithms, and now suddenly there is completely new technology coming along and you need to shift. How are you dealing with this? How do you deal with this personally? In some sense, you risk becoming a bit of a dinosaur after a while… Oh, not at all! Yeah, that's interesting. I wouldn't be in research!

Starting point is 00:22:54 Exactly. How did you overcome that? So first, each one of us was working and trying to produce better and better technology. At the time, signal processing, speech enhancement, most of audio processing, was based on statistical signal processing. You build statistical models, distributions, hidden Markov models, and get certain improvements. Yep. And all of us started to sense that this set of tools had started to saturate. It was simple: we used simple models from which we could derive results. Let's say speech has a Gaussian distribution, noise has a Gaussian distribution, and you derive the suppression rule.

Starting point is 00:23:47 But this simplifies reality. If you apply a more precise model of the speech signal distribution, then you cannot easily derive the suppression rule, for example in the case of noise suppression. And it was literally hanging in the air that we had to find a way to learn from data. I actually have several papers, from before neural networks appeared, saying: let's get a big data set and learn from the data.
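The "derive the suppression rule" step can be made concrete. If speech and noise spectral coefficients are both modeled as zero-mean Gaussian, the minimum mean-square-error estimate of the clean coefficient is the noisy coefficient scaled by the Wiener gain G = ξ / (1 + ξ), where ξ is the a priori SNR in that frequency bin. The minimal sketch below assumes a known noise power spectrum, estimates ξ crudely by power subtraction, and applies a gain floor; all three are simplifications for brevity, and practical systems use smoother estimators such as the decision-directed approach.

```python
import numpy as np


def wiener_gain(noisy_power, noise_power, gain_floor=0.1):
    """Per-bin Wiener suppression gain for Gaussian speech and noise models."""
    # Crude a priori SNR estimate by power subtraction, clipped at zero.
    xi = np.maximum(noisy_power / noise_power - 1.0, 0.0)
    gain = xi / (1.0 + xi)
    return np.maximum(gain, gain_floor)   # the floor limits over-suppression artifacts


def enhance_frame(frame, noise_power, gain_floor=0.1):
    """Apply the suppression rule to one frame in the frequency domain."""
    spectrum = np.fft.rfft(frame)
    gain = wiener_gain(np.abs(spectrum) ** 2, noise_power, gain_floor)
    return np.fft.irfft(gain * spectrum, n=len(frame))
```

For example, bins whose noisy power is 1, 2, and 11 times the noise power get gains of 0.1 (floored from 0), 0.5, and about 0.91. A real system would also window the frames and overlap-add them; the point here is only that the closed-form gain falls out of the Gaussian assumption, which is exactly what stops being true once richer speech models are used.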

Starting point is 00:24:20 So a more data-driven approach already. A data-driven approach. I have several papers on that. And by the way, they were not well accepted by my audio processing community. All of them were published at adjacent conferences, not at the core conferences, where I got those papers rejected. But then neural networks appeared. Not that they were something new: we had neural networks in the 80s, and they didn't work well. The miracle was that now we had an algorithm that allowed us to train them. Literally, the year after Geoff Hinton's work was published on the implementation

Starting point is 00:25:02 of deep learning, several things happened. First, my colleagues in the speech research group started to do neural-network-based speech recognition, and my audio group and I started to do neural-network-based speech enhancement. This was in 2013 or 2014. We had a neural-network-based speech enhancement algorithm surpassing the existing

Starting point is 00:25:26 statistical signal processing algorithms literally instantly. It was big, it was heavy, but better. When did the first of these ship? Can you tell any interesting shipping stories about this? The first neural-network-based speech enhancement algorithm shipped in Teams in 2020. Okay. We had to work with that team for quite a while. Actually, it took us four years of working with Teams to find... You see, here in an industrial research lab, we have a slightly different perspective. It's not just about making it work.

Starting point is 00:26:01 It's not just about making it a technology. That technology has to be shippable. It has to meet a lot of other requirements and limitations in memory, in CPU, and in reliability. It's one thing to publish a paper with very cool results on your limited data set, and completely different to throw this algorithm into the wild, where it can face everything. And this is why it took us around four years to ship the first prototype in Teams. That makes sense. And I think a lot of the infrastructure was also not there at that point in time, early

Starting point is 00:26:36 on, right, in terms of, you know, how do you upload a model to the client, and all the model profiling, you know, architecture search, quantization, and other tooling that now exists, where you can take a model and squeeze it into the right computational footprint. That's correct. So you did all of that manually, I guess, at that point in time. Initially, yes. But new architectures arrived. The cloud.

Starting point is 00:27:00 Wow! It was a savior. You can press a button and get 100 or 1,000 machines. You can run multiple architectures in parallel. You can really select the optimal one from every single standpoint. Actually, we ended up with a set of speech enhancement algorithms: given a computing budget, we can tell you the best architecture for it. Or if you want to hit a certain improvement,

Starting point is 00:27:32 I can tell you how much CPU you will need for that. That trade-off is very typical for an industrial research lab and not very well understood in academia. Makes sense. Well, let me switch gears one last time. You have made quite a few changes throughout your career.
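The two questions Ivan says the team could answer, the best architecture for a given compute budget and the compute needed for a target improvement, amount to two lookups over the same set of measured (cost, quality) pairs. A minimal sketch, assuming such measurements already exist; the `Candidate` type, the candidate names, and the numbers below are hypothetical placeholders, not Microsoft results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    cpu_cost: float     # compute cost in some chosen unit, e.g. % of one core
    quality: float      # measured enhancement quality improvement (higher is better)


def best_within_budget(cands: Sequence[Candidate], cpu_budget: float) -> Optional[Candidate]:
    """Highest-quality candidate whose cost fits the budget, or None if none fits."""
    feasible = [c for c in cands if c.cpu_cost <= cpu_budget]
    return max(feasible, key=lambda c: c.quality, default=None)


def cheapest_reaching(cands: Sequence[Candidate], target_quality: float) -> Optional[Candidate]:
    """Lowest-cost candidate that reaches the target quality, or None."""
    good_enough = [c for c in cands if c.quality >= target_quality]
    return min(good_enough, key=lambda c: c.cpu_cost, default=None)


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    candidates = [
        Candidate("small", 1.0, 0.2),
        Candidate("medium", 3.0, 0.5),
        Candidate("large", 10.0, 0.8),
    ]
    print(best_within_budget(candidates, cpu_budget=5.0).name)      # medium
    print(cheapest_reaching(candidates, target_quality=0.6).name)   # large
```

The expensive part in practice is filling in the table, which is where running many architectures in parallel in the cloud helps; once the measurements exist, both questions are simple filters.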

Starting point is 00:27:51 You started as an assistant professor, then became a core developer, then were a member of a signal processing group, and now you're driving a lot of the audio processing research for the company. How do you deal with this change? Do you have any advice for our listeners on how to keep your career going, especially as the rate of change seems to be accelerating all the time? In 25 years at Microsoft, I have learned several rules I follow. The first is dealing with ambiguity. It is not just the technology that changes,

Starting point is 00:28:30 but also the team, the organizations, and so on. Recognize that these are things you cannot change, things you cannot hide from. Just accept them and go on. And here comes the second rule. To succeed at Microsoft, you have to be laser-focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve. Do your job and be very good at it. This is the most important. Those are the two most important rules.

Starting point is 00:29:05 I have used them in my career at Microsoft. Okay. Super, super interesting, Ivan. Thank you very much for this amazing conversation. Thank you for the invitation, Johannes. To learn more about Ivan's work or to see photos of Ivan pursuing his passion for shooting film and video, visit aka.ms/researcherstories.
