ACM ByteCast - Eugenio Zuccarelli - Episode 45

Episode Date: October 25, 2023

In this episode of ACM ByteCast, Rashmi Mohan hosts Eugenio Zuccarelli, Data Science Manager at CVS Health, where he leads innovation efforts for complex chronic care. He's a business-focused data science leader who has worked for other Fortune 500 companies across several industries, including healthcare analytics, automotive, financial, and fintech. He has also worked with the COVID-19 Policy Alliance task force, using analytics to develop policy recommendations for the White House and find solutions to fight the pandemic. In addition to scientific journals, his work has been featured in Forbes, The Washington Post, Bloomberg, and the Financial Times, and his recognitions include Forbes 30 Under 30, Fortune 40 Under 40, and being a TEDx speaker. Eugenio discusses how early passions in engineering, technology, and robotics led him to work in AI and data science, and how a perceived lack of the human component in these fields has driven his work. He describes his work on MIT Media Lab's Project Us, which uses AI and advanced biosignal processing to help people become more effective and empathetic leaders and organizations make tangible progress towards their HR goals, and how that research shifted when COVID hit and people worked from home. Eugenio and Rashmi also touch on the common challenges and concerns across different industries, such as data sharing and privacy, and his views on synthetic data. He also shares some of the most important lessons learned in his career and offers advice for students looking to build solutions with machine learning.

Transcript
Starting point is 00:00:00 This is ACM ByteCast, a podcast series from the Association for Computing Machinery, the world's largest educational and scientific computing society. We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice. They share their experiences, the lessons they've learned, and their own visions for the future of computing. I am your host, Rashmi Mohan. If you have ever sat in a hospital urgent care room, agonizing over when your turn with
Starting point is 00:00:35 the doctor will come, you have surely thought of many creative ways to solve that problem. Well, fret not. Artificial intelligence solutions today are analyzing waiting room data to streamline patient movement and optimize for the least wait time. And that's just the tip of the iceberg when it comes to the applications of AI, ML in healthcare. Our next guest is at that perfect confluence of these exciting fields. Eugenio Zuccarelli is a data science leader with the Fortune 5 company, CVS Health. He has worked across multiple industries, has been featured in many leading forums and publications for his exceptional contributions, including being listed in the Forbes 30 under
Starting point is 00:01:22 30 list and being a TEDx speaker. He has studied in premier institutions across the world, including MIT, Harvard, Imperial College London, and the University of Genoa. Eugenio, welcome to ACM ByteCast. Thank you for having me. It's a pleasure to be here. Wonderful. So I'd love to lead with our question that I ask all my guests, Eugenio, which is if you could please introduce yourself and talk about what you currently do, as well as what drew you into the field of computer science. Definitely. So I'm a data scientist. Right now, I'm more on the management side. So leading a team of data scientists, but I definitely come from the data science spectrum. And as you said, I've been studying it across MIT, Harvard, Imperial College, and the University of General. So a lot of training on the technical side,
Starting point is 00:02:11 and also quite a bit of an experience on a few different industries, from healthcare to automotive to fintech. And so I basically started my career working in data science from the technicalities, writing Python code, writing machine learning models. And I still do that too, but I'm now leaning more towards how can we better develop a team of data scientists? How can we do this at scale? Wonderful. That sounds very exciting. And I love the diversity of the work that you've done. What drew the interest into computer science, Eugenio?
Starting point is 00:02:47 Did that start early on while you were in school? Yeah, I've always been passionate about engineering and robotics and all of these technologies. I would say my family have always been pretty passionate about engineering, maybe on a few different areas of engineering, like civil engineering. But we've always been very technical. And so I've always had this passion on the technological aspects. And I started working when I was younger on side projects.
Starting point is 00:03:18 I felt this fascination with robotics, especially. So I started studying electronic and software engineering in college. And from then, you've been moving a bit more towards the software side. But that's pretty much where it started. That sounds great. Was there a certain teacher or a certain project that really led you towards sort of machine learning and AI?
Starting point is 00:03:42 Or was that just by chance? I would say it mostly started when studying college, specifically electronic and software engineering, I felt a bit of a sense of, you know, something missing. It was amazing technology that we built, but it was a bit more for ourselves. So to say, you know, we've always been really passionate, we're really passionate about technology, but it was missing some component, which I quickly realized was a human component. So doing something for a specific purpose and especially something that could help people. And so I started working a lot at the intersection of neurosciences and artificial intelligence, electronics, software engineering.
Starting point is 00:04:25 And that led them to this fascination and passion with AI and data science. You know, doing a little bit of research on your background, that came across pretty strongly, Eugenio. I think the idea that you speak of the human connect and understanding how technology can sort of benefit or help us understand humanity better. I was reading a little bit about your Project Us, and I was wondering if you could tell us more about it. It sounded really intriguing, but I would love for our audience to hear more from you.
Starting point is 00:04:58 Yeah, Project Us was one of those projects, and Steely is one of those projects that is a great way to showcase both a concept, you know, not very specifically tied to healthcare sometimes. It's a bit more on the space of maybe, you know, mental health, which sometimes is left a bit out of the healthcare conversation. But it started as a way to try and understand through artificial intelligence. If we can understand empathy, if we can foster empathy, so we can make people be better in tune, more in tune with other people. And it started as a tool that people
Starting point is 00:05:53 wore as a bracelet. And the question was, can we infer the emotions? Can we infer the empathy from the signals that we perceive from these bracelets and use AI to do all of the computation. And then COVID happened as we moved much more from the physical components of a bracelet to online interactions, to Zoom. And so it became a bit more about how can we use AI and data science to understand emotions through online conversations, through Zoom calls, and how can we align these emotions from the two participants so that we can make them more empathetic, so more aligned with each other, but also they can understand each other better. And this is extremely important when you're talking to your boss and maybe there are some
Starting point is 00:06:43 miscommunications, or if you're talking with people from across the world. And so there might be also cultural components there. Yeah, that sounds really fascinating. What exactly are you measuring? So whether it is the wearable device that you were speaking about, which is the bracelet, or, for example, even if you're looking at I'm guessing maybe text from online conversations you know transcripts from online conversations to understand emotions and then I don't know do you give an empathy score do you give guidance to somebody how does this work it's a great question
Starting point is 00:07:17 so it's a multi-modal system so it takes in a lot of different inputs. One is the images, you know, all the faces of the people exactly as we see them on Zoom. And so there is a lot of image recognition on the faces, on the emotions to try and infer the emotions from the image themselves. But we also extract a lot of the text from the conversation. And so sort of NLP input together with also the signals from the voice. And a lot of the, as we all know, a lot of the communication is not just about the words.
Starting point is 00:07:52 It's a lot about the facial expressions and also about how we say things like the intonation. And so with all of these data points, we're trying and we've been trying to build a single model that tries to approximate, so to say, a person, at least in the communication aspects, to try and then put a sort of score, as you said, on the emotion and on the balance of the conversation. So if it's positive or negative. Now, obviously, the idea is to try and do everything perfectly and trying to understand all of the different emotions all of the different types of interactions so to say but we've started with something simple so positive or negative emotional interaction wow yeah no that's phenomenal i mean did you ever think about how this would eventually
Starting point is 00:08:43 i mean i understand it was research and to be able to get to even that level of analysis to say, you know, is that conversation positive or negative is phenomenal. Did you ever think about how this might make its way into being like productized? And now there's a specific team working on this project and also carrying it forward from research to something a bit more on the product spectrum. But there are countless of possible applications. You can think of the HR type of applications. So we can create better training tools where it's not just about the communication, the text, the words,
Starting point is 00:09:25 but it's also about the emotions and the empathy components of interaction. So that's fascinating at least to me because it's a new way to understand training and HR systems, but also take into account the human component, the cultural component. And you can also think about other possible product applications in vehicles where people are driving. You can try and understand the current state of emotion of a person driving a vehicle or even just having any other type of interaction with a device that can be a car or can be any other tool but it's pretty challenging and also could lead to some negative outcomes and
Starting point is 00:10:14 so it's one of those situations where obviously you would like to better understand the emotional status of a person so that you can take care of that in case anything happens yeah definitely countless of possible applications for sure yeah no and i'm so excited to hear that you know there there is continued work that's happening along this i mean i'd love to see what eventually comes out of it i mean i'm guessing there's a lot of this data is already making its way into product but yeah no i'm very excited to hear that there is continued work around it but i know know that for you personally as well, you've worked in so many different domains. You were speaking earlier, you're talking about working in the automotive domain, working in fintech. What was that common thread that sort of you learned through all of those various domains? Like what led you to each of those those are pretty i mean it's it's one thing to talk about you know ai and ml research which is what your expertise is in data science but also understanding applications of that in different domains did you find the common thread across those yeah definitely and i would say one of the common threads as i said is obviously the technology like ai and data science is
Starting point is 00:11:21 a fantastic tool because it's applicable to basically every industry, every sector. Regardless of a company, regardless of the application, you will always have data. And all of these methodologies and approaches are very scalable and they're applicable to basically any possible application for an industry or for a company. But I would say probably the common thread I've found is the initial thought that maybe some industries
Starting point is 00:11:49 might be less sort of risky or sort of less challenging in terms of applying and implementing an AI system. And so some might say that the healthcare industry, for instance, the healthcare sector might be an area which is extremely challenging in terms of applying AI systems because it's so high risk in terms of applications. You're working with people's lives. But I've got to be honest, working in all these different industries, you realize very quickly how basically especially now every sector every company really deals with a very challenging situation so there is no single industry where now you can say well applying an ai system is easier than another industry and for a lot of reasons you know every
Starting point is 00:12:40 company has sensitive information as a lot of data and has a huge responsibility in terms of our users. And so while at the beginning it might seem that some industries might have been easier in terms of implementation, easier in terms of regulatory components or bias or ethics, it's actually not the case. Every single industry has a lot of challenges and we really have to be careful in how we use these technologies to do the right thing and also take into account the human component all the time. I think that's a very, very important aspect
Starting point is 00:13:16 that you just brought out, Eugenio, because you're right. When you think about AI and healthcare in people's lives, that tends to be, you know, we automatically tend to take that a lot more seriously and a lot more cautiously. I think there's a lot more probably apprehension in terms of saying, I don't know if I can really use it for something that important.
Starting point is 00:13:35 However, when you talk about, especially when you're talking about driverless vehicle navigation, et cetera, those have some very significant safety considerations. When you're talking about fintech, I mean, there's people people's financial health and i think one thing across the board is around you know data privacy right i don't want my financial data to be exposed any more than i want my health data to be exposed that's totally right and you know back in the days probably 10 years ago when social media and all of these other industries might have not been at the point that they are now, some of the conversations were, well, the healthcare industry industry might be less challenging, might be easier to implement a system and not have to go through regulatory approvals or not having to focus on explainable AI.
Starting point is 00:14:35 But now we can see that, you know, obviously social media is one of those areas that has to be really understood. All of those models have a huge impact on people, on the younger generations, on elections and so on. So there is really no single industry now that's exempt to interesting and challenging conversations on bias, ethics, and so on. Yeah, and I think the thing that you had brought up earlier as well around AI in mental health,
Starting point is 00:15:05 right, also and often ignored, maybe there's more sort of attention now, but we in general tend to think of surgery or decisions being made in the hospital rooms versus things like, you know, using AI to think about better mental health care, right, whether it is talk therapy or medication management, etc. That's also a significant area where I think there could be tremendous impact. One of the things that you spoke about in your TEDx talk as well, Eugenio, was around the prohibitive cost of health care in developing nations and also in developed nations. And one of the things you specifically mentioned
Starting point is 00:15:45 is around the fact that sharing of data in healthcare is not easy. Does that come from, I mean, when I say sharing of data, I don't mean between like, you know, two organizations, but even just between like a provider and a patient or between two different disparate systems that a single hospital is using. I would love for you to sort of expand on that a little bit more. You know, since the talk that you gave, do you think that we are making strides in the right direction? We definitely are, but still, it's one of the greatest challenges of the healthcare industry.
Starting point is 00:16:20 It's what's usually called the interoperability issue. So systems that do not really talk to each other they should talk to each other but they do not really do so and as i said even within the same hospital or organization across different maybe your specialties in medicine of same hospital you still have systems that don't talk to each other and so this creates a lot of issues on the patient and i'll say this probably started because there is no patient centric person centric approach in terms of data sharing or at least there's not been up until now and this is true for every single country, developing or developed country,
Starting point is 00:17:09 really have a big issue in terms of sharing data. And obviously for a person like me that has lived in a few different countries, that's an even worse situation. But even for people just that have been moving across cities, you know, in the US or in any other country, they still face this issue where the records that are about their health, they obviously should always be the same.
Starting point is 00:17:31 Health is still the same. The history of their diagnosis, procedures, and so on still stays the same. But it's not really the case in terms of data. And so this creates a lot of issues, obviously clear issues on the AI machine learning side. Models are not able to better understand what's the real situation of a person. And so they might create predictions
Starting point is 00:17:55 or to take decisions on missing information. But at the same time, also patients themselves, if they do not really have the information available, they cannot share it with doctors in an easy way. And so that creates a lot of issues in terms of diagnosing and using the whole available information to actually do the diagnosis and take the right decisions. And so sharing data is not just about, you know, administrative tasks or reimbursements or insurance claims.
Starting point is 00:18:31 It's more about having a complete picture of a patient so that all of the parties involved, like the doctors, the nurses, and so on, can really take and make the best decisions for the patient with all of the information available? Got it. Yeah, no, I understand that. But I wonder, Eugenio, especially when it comes to patients, and if they're not that
Starting point is 00:18:53 sort of maybe familiar with what data is being captured about them or how to share this data, I'm wondering how much of this problem is that, in that there is no simple way to capture the data, forget about sharing it. Yeah, I'd say that's definitely a good point. But to some extent, it's also a matter of, you know, how we can create better processes for this. Because you're definitely right in saying the data collection process is challenging. We've got still a lot of doctors across the world that capture this information on paper. And so that's definitely a big issue. And so sharing comes after and acquiring data and understanding the importance of it comes first and i'd say that's also sometimes one of the sort of cultural issues with you know doctors having to prioritize the care of a patient versus
Starting point is 00:19:53 acquiring data correctly and so obviously that's the right decision you want a doctor to focus firstly on the patient and their health but at same time, you also want to create a system that allows a patient to do that while capturing information correctly. And ideally, that's going to then lead to better decision-making, better systems, more data sharing, and also more artificial intelligence
Starting point is 00:20:20 that can help with the whole decision-making process. Yeah, no, I think, I mean, there's just so much scope for better solutions to be built in that space. I think, you know, it's a great area for anybody who wants to explore that. How much of this problem, Eugenio, is a trust issue? It's, I know that I can share the data and I know how to, but I'm not comfortable with it because I don't know what is going to be done with it. That's right.
Starting point is 00:20:47 I'd say trust is probably the key word in here. And everything revolves around trust. You have data sharing, which is a matter of trusting the different parties, trusting who you're sharing the data with, but also the whole process in between. And I was actually speaking at a panel on privacy-enhancing technologies, so on technologies that can help doctors and hospitals share data better.
Starting point is 00:21:16 And one of the key insights there was that we can have the best technologies to share data, to do all of these processes, but if we don't have a trust, and there is not trust built between parties, across parties, then no technology is going to solve the issue. And so I would say trust is really key. And sometimes we have to really invest
Starting point is 00:21:40 in those areas to then speed up all of the process of data sharing and artificial intelligence. And that's one of the key areas in AI as well. A lot of investments are going towards explainable AI, private model training. All these technologies are not done just to improve performance, but also to ensure that privacy is in place and doctors can trust the whole process related with artificial intelligence. ACM ByteCast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher, and TuneIn.
Starting point is 00:22:24 If you're enjoying this episode, please subscribe and leave us a review on your favorite platform. Yeah, no, that sounds like the absolute right place to begin as well. And it's a little bit of a, I guess, a chicken and egg problem, right? I mean, if you show solutions that work, you show solutions that protect privacy and are yet effective in helping doctors do their job better. Similarly, if patients that see better outcomes, there's probably a continued amount of trust that gets built. Have you seen anything specific done or by specific, you know, I'm trying to understand who are the sort of responsible parties who can really make
Starting point is 00:23:03 an impact here? Would it be researchers and practitioners in the computing field? Or would it be like, say, evangelists, like doctors who we are able to sort of talk with and help them see the benefits of this technology and then them being sort of evangelists for this in their communities? How do you see this sort of getting better? I would say both parties. So the clinical community, which are the domain knowledge experts,
Starting point is 00:23:30 and the technical community, so the ones that develop the tools. And that's why probably there's been a bit of slowness, a bit of an issue so far, is that these two communities, clinicians basically, and let's say data scientists, speak two different languages and have two different sets of priorities. Doctors obviously want to focus on people,
Starting point is 00:23:53 the qualitative aspects on patients' health, as it should be, and data scientists are focused on technicalities and model performance and so on. And so I would say over the years, there has been progress that has been made in trying to understand the two different sides of the same coin. Something like AI and healthcare,
Starting point is 00:24:17 data science and healthcare is not just technicalities. It's about people and vice versa. It's not just about patients' outcomes and their health, but also acquiring the data and using that data. And so on the technical side, it's key to invest and implement new technologies that leverage some of the explainability techniques and some of the also bias and fairness method fairness methodologies so we want to focus on
Starting point is 00:24:48 technologies which are more about how can we make models simpler so to say more understandable more explainable so that we can show them to doctors and explain to them how they're working so going away to some extent from black box technologies to gain the trust of the doctors. And at the same time, on the doctor's side and on the clinician's side, there's an important aspect of having to sort of understand the value of some of these technologies, some of these models that are not going to be there
Starting point is 00:25:20 to replace their job or perform diagnosis without the doctor's oversight. But actually to understand that these are tools that are going to be enhancing their work, and they're also going to be taking over some of the administrative tasks, some of the burdens of the healthcare community. So it's a sort of dialogue between these two parties, and each one of them has to do some work to improve in their respective area and get to some sort of common ground. Yeah, no, absolutely. I think you hit the nail on the head. It's that
Starting point is 00:25:57 dialogue, it's that bringing together of those communities, each of them experts in their own areas, and trying to see where the intersection happens. I do have to ask you, Eugenio, I know you've spoken about this as well in the past. In the absence of, say, real data or scenarios that we haven't seen before, there is a concept of using synthetic data. And I know you've spoken about this in the past as well. What is your opinion on, I mean, one, if you could explain what synthetic data means and what do you think of it? Do you think it's useful? Do you think it could be useful in some situations? So synthetic data is data that's generated artificially through algorithms, through methodologies,
Starting point is 00:26:41 not acquired through real life scenarios, through real people. And so I'm not personally a huge fan of synthetic data. I know that there are new technologies now that can create synthetic data even better than it's ever been done, but I'm not a huge
Starting point is 00:26:58 fan because usually synthetic data is something that is generated through algorithms by data scientists, not acquired in real-life scenarios. And so what happens here is that you usually have a very small amount of data, which is usually what's captured through real-life scenarios. And then it's expanded through synthetic data technologies.
Starting point is 00:27:23 So you start with a small amount of data and you try to expand it and generate more samples so that you can have a bigger data cell, a bigger population. And obviously, I would say everyone can understand that if you have a very small population, you've got
Starting point is 00:27:39 a very small subset of what's the total population. An algorithm cannot really understand all of the possibilities of a dataset. And this becomes extremely challenging. If you think about, let's say, a patient population, we are going to capture a very small percentage of a population. This might not capture the whole broad aspect of possible diseases, possible procedures, but even more so all of the possible diseases, possible procedures,
Starting point is 00:28:08 but even more so all of the possible demographics, so to say. We might be focusing on only people that have a higher net worth than the average. And we might be, because of data acquisition limitations, not be focusing on people that are from lower income backgrounds. And so synthetic data sometimes tends to perpetrate and continue some of the discriminations and some of the bias that we can see in the real life scenarios and expands it even further because it then creates even more data.
Starting point is 00:28:39 So I think it's obviously great technology. It tends to be the great technology, in my opinion, for theoretical situations, maybe a bit more on the research side. And when you have real-life scenarios, it becomes relatively challenging, and especially on the bias and fairness side. Yeah, no, I see the point that you're making. The inherent biases that we have in our data collection will only be exacerbated when you use it to generate
Starting point is 00:29:06 more data. The last point that you made around it maybe being used more in the research side than in sort of the, you know, the applied side. I'm wondering though, the data that's used in the research side is what is sort of in general generating these models for us, right? So is there a risk to using it there? Also, are there certain, you know, fields or domains that might be more, you know, accepting of synthetic data or it could not be as maybe harmful? I mean, bias in data is, I think, harmful no matter where, but I was just curious as to the now prolification of synthetic data and like what people find beneficial in it. Yeah, there are definitely risks. And I would say this also ties back
Starting point is 00:29:53 to what we were saying before. There might have been in the past some applications that were not so impactful on the people, on the outcomes you on the outcomes. You can think of, I don't know, video apps, where if you recommend a specific video to a person, well, if video is not exactly the correct one, it doesn't really matter.
Starting point is 00:30:17 I can scroll past that. And so you can think of social media applications and anything like that. So I would say that maybe in the past could have been a good example of an application where synthetic data and all of these applications might have been less impactful if wrong if we look into false positives and false negatives so would have not been too big of a deal having used maybe synthetic data incorrectly or some of the algorithms but let's say now there is not really an application that i can think of that's not going to have a pretty important impact on the user population and you know even if we think now about social media as we said
Starting point is 00:31:02 it's something that now can really have huge impacts on a lot of big decisions all over the world so i would say synthetic data still remains in my opinion something that has to be really considered and really go through a thorough process of bias evaluation fairness evaluation before being Obviously, it's a great technology. It's just that it's something that has to be vetted and not just used blindly. I would say sometimes that's what happens in the technical areas.
Starting point is 00:31:35 Sometimes you might have technical experts that might be experts on the algorithms that create this synthetic data, but they might be lacking the domain knowledge, the domain expertise of, let's say, a doctor that knows that that synthetic data might not be a great representation
Starting point is 00:31:54 of our population. It might be missing someone in terms of ethnicity, income, and all different other demographics. And so they would know that that could lead to outsized negative impacts. Yeah, no, that's a very, very relevant and, you know, an example that really sort of hits the point home.
Starting point is 00:32:16 Thank you for sharing that. So to change gears a little bit, Eugenio, I know you a student, as a researcher, you've been in sort of smaller academic type of settings where you have the luxury of sort of moving fast and going from, say, ideation to implementation with very limited friction. And then from there on, you're now working for a Fortune 5 company. And while you're still in i guess in your capacity as a leading research how have you had to change your sort of working style and and expectations sort of what's your philosophy as you know working as as i guess as an entrepreneur now well that's a good question because i definitely have to say that throughout all of these different experiences one commonality one of the things that
Starting point is 00:33:06 you learn is that you always have to adapt your skills and your attitude to the situation and even though I might be doing data science and AI in all of these different areas so from academia to a big corporation to startups there are still a lot of different aspects that change a lot of the day-to-day. And so one of the areas that change a lot, going from academia, from research, and being a student or a grad student to working for a big company
Starting point is 00:33:38 is definitely the level of understanding of the stakeholder environment. So obviously the academic environment is much more about the technical components. How can we push the state of the art as much as we can? How can we develop tools and technologies that might not be immediately helpful? They might not be immediately answering a question, but we have this belief and hope that they will at some point. So it's much more about how can we develop better tools and technologies with the most state-of-the-art technologies.
Starting point is 00:34:15 Well, working for a big corporation, especially with a big company, it's actually quite the opposite. It's how can we dump things down to something that can be actually put into production? It's not about trying to build the most complex and innovative solution. It's rather about how can we find the needs and the priorities of all of our stakeholders and try to understand how we can really make the simplest solution that can answer those questions. Because of the fact that the more complex the solution, the more difficult it is to implement it, to manage it,
Starting point is 00:34:55 to implement it in production, and also to maintain it over time, taking into account all of the different other issues of ethics, bias, and so on. And so definitely a lot of differences and much more on the people side and much more on the trying to make things simpler, maybe linear regression or logistic regression, government deep learning, to try and get things done that can help the stakeholders and all the users.
Starting point is 00:35:23 And I think that point can't be sort of emphasized enough because I feel like even as, I mean, I come from an engineering background, but even as engineers, you oftentimes want to build something that's exciting. You want to, because that's what you do. You build things, right? But I think keeping in mind, who are your users? How do you keep this most simple? Not just so, you know, your end user is
Starting point is 00:35:46 one thing, but all of the various functions that you interact with in a large corporation and trying to sort of, you know, take them along on that journey of adoption of the solution, I think is so critical for the success of any solution. Yeah. And that's also one of the biggest issues and challenges i see especially on the more junior data scientists like we're all so passionate about building models do hyperparameters tuning and play with machine learning and some of the latest technologies could be charge apt could be anything else but sometimes we have to really stop ourselves from going down those paths and stop to understand some of the requirements and some of the needs and actually push ourselves
Starting point is 00:36:32 towards doing simpler solutions rather than the more complex ones. So with more seniority also comes a lot of understanding that sometimes we really have to stop ourselves from going maybe the deep learning route and stop at the linear regression, logistic regression side, because that's enough, gets the job done, and then we can iterate later on. Got it, yeah. And although I have to ask, Eugenio,
Starting point is 00:36:58 because this is a question that I often get asked as well, is from moving from being sort of hands-on researcher and data scientist, I don't know how much of your time today is spent on those activities, to, you know, managing and leading teams. How do you find a sense of accomplishment? I mean, oftentimes, when you're building something yourself, you know, you can see the outcome of your efforts, versus in situations like when you're leading a team, the collective result of the team's work is what you have to show. So have you thought about that?
Starting point is 00:37:30 Has that something that's crossed your mind at all? Yeah, it definitely has. And it's one of those things, as we said before, that change a lot in the same way that going from academia to a big company, you have to adapt a lot in the same way. If you go from an individual contributor to a people manager, you have to adapt a lot in the same way. If you go from an individual contributor to a people manager,
Starting point is 00:37:48 things change a lot. And I would say right now, success to me is more about seeing in other members, in other people in my team, success on themselves. So seeing actually the developments that they do and see how they can achieve the
Starting point is 00:38:08 requests the requirements and put things into production effectively that's to some extent indirectly i feel it's also a success to some extent on my part and i really value a lot when communication flows effectively and when people work seamlessly to some extent. So I'd say that's now a bit more of the definition of success to me, or at least what makes me feel that, yes, something has been done actually well compared to when before it was more about performance metrics of a model. I love that. Easy flow of communication. Do you have an effective way of measuring that? Because I think that that is super crucial.
Starting point is 00:38:51 And I think that that would be a great sense of accomplishment for anybody who's leading a team. Yeah, it's a difficult one. I would say the number of miscommunications, so to say. So the number of times that at least I was expecting something and that actually came across like that. And also with stakeholders, the number of times that they request something, request something on time, and actually the solution answers their question.
Starting point is 00:39:18 But it's one of the challenges of going from technicalities and being an individual contributor to a people manager that's all much more qualitative rather than quantitative. going from technicalities and being an individual contributor to a people manager, that's all much more qualitative rather than quantitative. And so it's one of the big challenges, especially for data scientists, trying to navigate success in a very qualitative world of people management, as they call it, management. For sure. Yeah, no, thank you. That's a great way of looking at it. With all these problems that we're trying to solve using data, Eugenio, there is a burgeoning sort of need for data scientists, right? I see a lot of colleges and universities that are offering a data science program now, in addition to, say, a computer science or a computer engineering program in undergraduate education. What are your thoughts on some of these programs, how they are structured, and how can one make the most use of it if somebody is trying to get
Starting point is 00:40:10 into the field? So first of all, I'm a big proponent of just communication and education in data science and AI broadly. So regardless of university-based courses, even just free courses, I'm a big proponent of trying to share the knowledge of these topics as much as possible. One of the caveats that I see oftentimes is that people that only rely on technical courses miss probably the most important lessons there. So the importance of understanding the requirements from a stakeholder, understanding the needs of a user, of a person. And so I wish that universities and also just for the open source courses
Starting point is 00:40:56 focus more on real-life applications of data science and how much it's not just about developing a model or working on feature engineering. That's actually a lot about trying to better understand the needs of a client or stakeholder and translating that into something that can get the job done. And at the same time, also the bias and ethics applications of AI data science. I feel that a lot of times data scientists are trained
Starting point is 00:41:27 on the algorithms, on the technicalities, but not on the possible real-life implications of what they do. And so oftentimes there's a bit of a disconnect with the domain knowledge experts. So there's often a disconnect between clinicians maybe and data scientists. And so having more courses or more focus
Starting point is 00:41:51 on all of the real-life applications or real-life domain applications, I feel would help a lot in going for also the career and experience of a data scientist. Yeah, no, that's excellent advice. In the absence of the courses being sort of modified to include more real-life applications, do you have any suggestions for students
Starting point is 00:42:15 on how could they, in their own time, pursue a better sort of understanding of customer challenges or user challenges and build solutions that are effective? It's a difficult one, but I would say to try and work on side projects that come not from technical experts, maybe, you know, professors on technical topics, but rather side projects that come from domain knowledge experts. So it might be asking the local
Starting point is 00:42:48 doctor for ways that they can help them through data science, through technology, improve some of their needs. I think that in this way you can understand a lot of the cultural components, a lot of the real needs of domain experts. And
Starting point is 00:43:04 this is a fantastic lesson for data scientists that it's not just about the technicalities, it's not just about the algorithms and the data, so to say. It's actually about the real-life problems and everything that's around the world of AI and data science. That's such pertinent advice, Eugenio. Thank you for sharing that. It's a very, very great nugget of information for our listeners who want to sort of break into the field of data science, especially in the healthcare domain. So for our
Starting point is 00:43:39 final bite, I'd love to hear from you sort of what are the large problems that you're trying to solve? Is it part of your current role or, you know, in general in the field of healthcare and AI, what are you most excited about over the next few years? So I'm definitely excited about the ability to use AI and machine learning to try and predict and so prevent diseases. I feel that AI and data science now has the ability to shift the concept of medicine and healthcare from something which is more about curing and a more reactive approach to something which is about preventing diseases before they even happen. So I think that's going to be completely shifting how we live life and how we approach also going to a doctor. It's going to be more about prevention rather than curing something.
Starting point is 00:44:31 And I'm also particularly interested in everything that's about personalization. So trying to use, again, machine learning and all of these technologies to try and understand what's the best way to have positive impacts and positive outcomes on a person thanks to all of the past data that we have. So given all of these analysis and all of these models, we can better understand what could be the best ways to intervene on a specific person. And the third and probably the single most interesting thing right now, not just for healthcare, but in general, is how we can use all of these technologies like charge UPT
Starting point is 00:45:11 and large language models in the healthcare sector, taking into account the possible issues with obviously having such powerful models dealing with such challenging sector, but also all the possible ways we can use, for instance, chat GPT or large language models to release a lot of the burden on the administrative tasks on doctors. So I see, for instance, very soon chat GPT being able to deal with a lot of the admin tasks from doctors.
Starting point is 00:45:46 So when they enter information on a patient and things like that to allow the doctor to focus more on the health of a patient. So definitely a lot more to come in that area. Yeah, I love the vision that you're painting, Eugenio, both in the healthcare industry and across. You know, this has been really very fascinating. I've enjoyed conversing with you. Thank you so much for taking the time
Starting point is 00:46:12 to speak with us at ACM ByteCast. Thank you for having me. It's been a pleasure. ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board. To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes,
Starting point is 00:46:34 please visit our website at learning.acm.org slash bytecast. That's learning.acm.org slash b-y-T-E-C-A-S-T.
