Software Misadventures - Automating away your job as a Data Scientist | Melissa Runfeldt (Salesforce, CueIn)
Episode Date: December 12, 2023. Before joining CueIn last year as a Founding Data Scientist, Melissa was a Lead Data Scientist at Salesforce working on the Einstein Platform that focused on automating Data Science workflows. In this conversation we dive into Melissa’s unique journey, what to do in the face of increasing job automation and explore the latest developments in practical AI. Segments: [00:02:13] Melissa’s background in computational neuroscience [00:06:08] 7 years at Salesforce vs startup [00:11:31] Joining CueIn [00:19:30] Chatbot observability [00:28:16] Feedback loops [00:33:10] Use LLM to observe.. LLMs? [00:39:06] AI automating jobs [00:43:01] Doing ML in 2017 vs now [00:50:35] Few shot learning, Hugging Face Show Notes: Melissa’s Linkedin: https://www.linkedin.com/in/melissajanerunfeldt/ Stay in touch: 👋 Let us know who we should talk to next! hello@softwaremisadventures.com
Transcript
I've been constantly aware of the irony that my entire career has been focused on automating my job away.
Versus Salesforce, you know, doing automated machine learning, we were automating everything that a data scientist could do.
So feature engineering, sanity checking, everything that goes into like training a model and hosting it for predictions was completely automated. And now at CueIn, like my job as a data scientist
continues to shift
and turn more and more into something
where like even I get to experience
the job uncertainty.
Like, wait a minute,
is my boss going to replace me with a model?
As long as they hallucinate, they won't.
Yeah, just dumping LSD into the model being like, do the work.
You always want a human in the loop.
And that is something I've been discovering more now, working with companies and having to spend so much time on really getting good labels, and also being able to translate what the customer actually wants into labels and into a model.
And that's even where I feel like my career is going. It's towards product
and kind of being the human that's translating the other human to the machine.
Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guan.
As engineers, we are interested in not just the technologies, but the people and the stories behind them.
So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned, and of course, the misadventures along the way.
Hi everyone, this is Guang.
In this episode, we're chatting with Melissa about her unusual career journey in data science.
We'll explore the lessons she's learned from automating away her jobs, transitioning from
a lead data scientist at Salesforce to her current role focusing on chatbot observability
at CueIn.
Join us as we delve into Melissa's unique journey,
what to do in the face of increasing job automation,
and explore the latest developments in practical AI.
Hi, Melissa.
Okay, jumping right into it.
So you have a PhD in computational neuroscience,
and you studied single-cell electrophysiology
and two-photon calcium imaging in acute slice preparations, which involves delicately removing mouse brains.
Tell us more about that.
That sounded very intriguing.
So, yeah, I did my PhD at University of Chicago, focusing on computational neuroscience.
And I was studying information coding in the neocortex. So the neocortex is the part of the brain that I think is more exciting.
Like when you think of, just think of a human, you have all of these senses, all of the sensory
information that you're processing and receiving. So your sensory information comes in through your
eyes and your ears and your nose and your skin, and it gets relayed to the neocortex through a
structure called the thalamus. And the pathway leading to the neocortex is fairly structured
in the sense of the responses of the neurons. It's fairly easy. It's very parallel to the stimulus itself. So it's pretty straightforward
to decode what a stimulus is from the neural activity and these kind of earlier brain
structures. But once it gets to the neocortex, it's where a lot of the interesting computation
happens. And that's where all of our thinking happens, right? So that's where we're not only
flatly trying to perceive what's there, but we're perceiving it in a way that makes sense to us,
in a way that can be incorporated with your other sensory perceptions and then ultimately is going
to lead to a thought or an action or a motor output. So we were really focused on understanding how groups of neurons in the
neocortex encode information. And, you know, all data science is like the data that you're looking
at kind of changes how you view the system. So we had new technology, this is two-photon calcium
imaging that let us look at an entire field of neurons in relative real time. So you could see the action potential
activity of neurons and you can see where they are in space. So that really allowed us to move
beyond like traditional neural encoding methods where you're just looking at one single neuron
spiking. And instead we could look at the entire group of neurons and relate their activity to
what the stimulus is, to like what computations are being performed and how those groups of neurons are representing the information.
So I was in a pitch dark room because it was like imaging, right, two photon with a big laser, pitch black.
And like my lab, we love like listening to like hardcore metal or like death metal the whole time.
Black hoodie, patching individual neurons with the microscope too it was really fun stuff
How do you guys analyze the data? Because it's going from analog to then digital, and then, like...
Yeah, it depends on the recording method. So for two-photon calcium imaging, the signal that you get is
counting photons. Basically, so you're moving a laser beam around with these two mirrors,
these galvo mirrors. And because you are moving, you're controlling the mirrors, you know where
the laser beam is pointed. So whenever the laser beam hits a certain specific spot in three-dimensional space, you know where the laser beam is pointing.
So any photons that you count at that moment in time, those photons are coming from that one specific spot.
So the input is coming in, you know, as photon counts, but you're reconstructing that field of view.
And the laser moves really, really fast.
So you could cover like a millimeter surface area pretty quickly, like a few seconds.
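To make the reconstruction idea concrete, here's a toy sketch in Python: because the galvo mirror positions are commanded, every photon count can be assigned to the pixel the laser was pointed at. The field-of-view size and the Poisson photon counts below are purely illustrative, not the actual acquisition software.

```python
# A toy illustration of the reconstruction idea: because you control the galvo
# mirrors, each photon count can be assigned to the pixel the laser was on.
import numpy as np

height, width = 4, 4                       # tiny field of view for illustration
rng = np.random.default_rng(0)

image = np.zeros((height, width))
for y in range(height):                    # raster-scan the laser across the field
    for x in range(width):
        photon_count = rng.poisson(lam=3)  # photons detected while pointed at (y, x)
        image[y, x] = photon_count         # position is known from the mirror command

print(image)                               # one frame; repeat frames to get activity over time
```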
Comparing that to what you do now, how do these two worlds compare or contrast rather?
So now I'm on the founding team of a startup called CueIn. I think that, you know, the most biological aspect is just interacting with humans. That's where all the variability comes
from, like understanding what customers want and how like translating that into labels.
There are definitely a lot less mishaps in software as opposed to biology by far. I would
say that the more complicated that your system gets, the more opportunity there is for stuff to go wrong.
Before CueIn, I was with Salesforce, and I worked on a few different teams within
Einstein. Salesforce Einstein was kind of like the catch-all for machine learning
at Salesforce. And I worked on the Einstein platform. So we were building a machine learning
platform, completely automated, multi-tenant machine learning platform for all of
Salesforce. And as that platform grew, there's just more and more components that can go,
that can be misaligned or, you know, not work. And that was actually more like science working
on that platform because something would go wrong and you would truly not know why.
Everyone had their own different expertise and
things that they knew. So it's kind of like science in the sense of you have your tools
and you're sampling from these different dimensions to try to understand this root
phenomena. But you don't know if you're looking in the right place, if you're using the right tools.
There's some uncertainty and some knowledge nuggets that
you have to reveal through banging your head against the computer.
So you mentioned you're the founding data scientist or the machine learning engineer.
I get these titles confused and I don't know what the difference is between them,
which is a different topic which we can get into.
Oh, yeah. So data science is more
of a catch-all term. It usually involves that you're interacting more directly with the data,
maybe working with scripts, prototyping, whereas machine learning engineer is like you're building
software to scale up kind of what a data scientist does. I don't know if that's helpful.
No, that helps. So what's the founding
story of CueIn? How did that come to be?
So our CEO, his name is Mayuk Bawal, and he was a product
manager at Salesforce. And we overlapped sometime at Salesforce on the Einstein platform. So him
and his good friend, Vinesh Ganapathy. They both went to Stanford together and were good friends
and always wanted to start a company together. Vinesh was at Google and then later at Uber, leading engineering teams, so very experienced in the engineering space.
Mayuk was a product manager and he was working on Einstein bots. During COVID times, unsurprisingly, bots got very popular. A lot of companies really
needed to like spin up chatbots really, really fast. And Mayuk was with the group that was
working with customers and setting up those bots. And he kept on kind of coming up against a shortcoming in the tools available on that platform to give customers both the observability and insight that they wanted into how their bots were performing and how customers were interacting with the bots, as well as, like, what can you do to make your bots better?
Analytics can give you insight. You say, oh, this is where it's going wrong. But as a company, you always want to fix it. You want to fix it right away.
So Mayuk saw that opportunity, investigated the market space more generally, and then him and Vinesh started CueIn, and Kevin joined as a founding data scientist. And it's been about two years since. Now our team is seven people and we are so busy.
It's really awesome.
We have a lot of customers coming in and our use cases are just like growing.
We're scaling up.
It's a fun time.
Like from doing neuroscience to seven years at Salesforce
to now co-founding a startup,
like you must have some like perceptions about what building a startup is like. Were there any like big surprises when you joined?
Good question. There weren't any
big surprises. It was different for sure, but it was different in all the ways that I hoped it
would be. I get to do, like, end-to-end solutions, from communicating with the customer, understanding what they want,
developing the model and solution, exploring the data, getting a prototype model up for them,
showing them the results, making sure they're happy, reaching out to the rest of our team and
working with them to get the end result up to the UI and building the infrastructure around that,
getting it into the database, getting it to the lambdas, making sure everything's working.
And then like showing it to the customers, and like doing all of this within like a month's time span, often even faster. Doing it all very, very quickly and getting that fast feedback.
So that's like what I was hoping for.
That's what I was looking for. I got it. And yeah, it's a lot of work, but it's definitely rewarding.
What did the process of making the decision look like? How did the founders of CueIn reach out to you to join? And then what made you actually take the decision to leave Salesforce and then join CueIn, which is, again, just two years in? So
lots of risk of joining a startup, lots of excitement too, lots of adventures.
So what did that decision making look like for you?
Good question. I think at the time when Mayuk approached me to join CueIn, I was actually
really happy with my team, both horizontal and vertical. So it was, you know, it was a little bittersweet to leave at
that point in time where like things were going well and I just got my promotion and everyone was
happy. But like sometimes you get an opportunity and you just have to go for it. For me, like I
asked myself, like, would I regret not doing this? And for me usually the answer is yes, like I would regret not
having this experience and like when I think of career paths I think I was at the point where
like management was kind of more of like the next step for me if I really wanted to keep like
growing this is capitalism it's all exponential growth you can't just like
have one derivative of improvement so I was like do I want to be a manager or do I want to have
this new experience and like I knew it would be like a little bit of a pay cut but it would put me
in a place where like moving forward in my career I think I would be doing more of what I wanted to
be doing. Like everything's going well at CueIn, like we're all very excited for like our equity to turn
into like real money one day. But the goal isn't really for me to like stop working. It's to be
enjoying my job and to be able to have an impact not only on the technology,
but on the culture, on people's lives and experiences. For me, the point of getting
power is to like make other people's lives better. Like that's what I like to do. I'm like,
we can all just be kind and nice and happy and do great, exciting work together.
And that's kind of like what I prioritized and went for and you know it's hard for people to
leave a corporation i think everybody gets a little soul crushed in one way or another so
it's just time for a change.
Um, so speaking of career path, when you first started at Salesforce, did you have, like, a rough idea of, I'll try to climb the ladder and go for the, how did you put it, no, first-order derivative speed of growth?
Exponential growth. Sorry, sorry.
Yeah. Oh my gosh, physics. Oh, sorry.
and maybe another way of putting this is like two people right like who are just starting their
career maybe in data science like would you recommend
this sort of try to go for the exponential growth but then at the same time kind of keeping an eye
out for the startup opportunities which are more of the unknown right like because you could go
really well but you could also just go really horribly wrong which i've got plenty of stories
about so like how do you recommend newbies in data science like think about
that okay well i'm going to start with your last statement about startups and kind of what i've
heard from my network it's either a really positive experience or it's absolutely terrible
and traumatizing and like you need better help for like at least two years afterwards
and i think in making that
decision you really have to talk to the people on the team and really like get a good assessment
of how happy they are and like what their mentality is are they there to grow and learn
together as a team or are they just trying to crush something out themselves so for joining
a startup,
you really have to focus on the people and the culture, because you don't know if it's going to succeed or not.
The gods of stochasticity will decide that for you.
The only thing you're guaranteed to get from it
is the experience that you get.
So are you going to be doing stuff
where you're growing technology-wise
or growing new skillsets?
And this is the case for any job, right? You focus on what skill sets you want to grow,
and then also ask yourself, is this an environment where I can grow?
When I joined Salesforce, I really wanted to grow as a software engineer. Like my background was neuro. So I was really like comfortable with reading manuscripts, like the foundation and mathematics
and dynamics and optimization.
And so I felt comfortable in the ML space and in kind of like the research space at
the time.
But like I've been programming in MATLAB for like seven years and I don't even know if
you should put that on your resume sometime.
I empathize with that.
It's so beautiful, though.
I really liked it.
I had a lot of fun with MATLAB.
I love building GUIs.
So easy.
Yeah, so easy.
Parallel processing.
You ran out of RAM so much. Anyways, I digress. So I
just kind of taught myself Python at the tail end of my postdoc and preparing for data science. But
I was in San Francisco at the time. And I thought, like, this is the time and place to learn how to
be a software engineer. And this is like, especially with the team I was with, that was
building a machine learning platform, which was a very new thing at the time. I saw that as an opportunity to gain knowledge that
wasn't knowledge you're going to get from a textbook or even maybe a Coursera.
So that's kind of like the muscle that I wanted to grow. And I'm glad I made that investment.
Of course, it does get to a certain point where you're like, okay, I miss data.
I miss architecture and modeling too. But yeah, focus on where you want to grow.
And going back to the startups, if you were given two choices to join startup A, which has really,
really fascinating tech, but then like the people aspect, it's like, you know, you don't really get along with the people; versus startup B, where the tech is maybe mediocre, but then the people are amazing.
Which one would you pick?
I think it really depends on where you are in your life and what's valuable.
I think there are different times in life where you're like, I got energy.
I'm going to conquer.
I don't care about all that soft stuff and feelings.
I'm just going to power it through.
And there can be value in that until you're so miserable that like nothing's effective. But up until then,
you can really grow just by like focusing on some technology that you're really passionate about.
And there are different times in your life where like it's really important to just be
stable and to be able to work 40 hours a week and not have extra stress.
So, you know, it kind of depends on where you are.
I think having children is a big component, even though I have a three and a five-year-old
and I'm at a startup.
But, you know, like...
Was that part of the calculation for you for joining a startup?
Is it like, once you've got a hang of the small-human-being-raising process, it's like, okay, now I can take some risk?
No, I'm totally contradicting myself.
Life is crazy. But you know, I lucked out with this startup because, like, I was excited about the tech,
it was totally like my space and i knew the people, or I knew some of
the people, and I knew that they were really smart and really focused and not petty or anything like
that. And then, so I was able to get both. So that's kind of why I went in that direction.
Also, a lot of people on our team have young children. So I'm not the only one whose meetings get crashed by little bodies. It's pretty common, but that also helps.
So you mentioned QN is in the space of customer experience transformation, which includes like
chatbot observability. Can you talk more about this problem space and what kind of products QN
builds? Yeah, so more broadly, it's customer experience transformation.
But I think you'll understand better
once I describe kind of like
our tech stack and process.
So we pool data in,
we build connectors to any vendor
that a customer is using.
Let's pause for a second.
Let's take a step back, for many people
who might not know
what customer experience here means
or like the vendors that you work with. Would you mind just giving a brief description of what these terms mean?
Yeah, so customer experience is kind of like the evolution of customer support.
I think customer support is kind of one component of it, but these are groups that are focused on the customer and the experience they have interacting with a company.
So that includes communicating through a chatbot.
It includes communicating with live agents through telephone conversations or through chat conversations, as well as email and asynchronous ticket like ticket based conversations. So the typical use cases,
you have a retail company and people are reaching out to the retail company saying like,
I want a refund, I want to return, this is too big, this is too small, this broke,
what are your promotions, stuff like that. Most companies have some form of customer support
because they have a product and customers are
using it and things break. So yeah, our goal is really to improve customer experience. And
right now, really what companies are focusing on is we all have to do more with less. So everyone
has less people, you have less engineers than you used to. You have less human agents that you can pay for.
So everyone wants to automate more of the customer experience process,
but they want to keep customer satisfaction, CSAT, NPS,
which is like company net promoter score.
They want to keep these metrics up
while automating more of that customer experience.
And they also want observability.
And this is really where we come in is we connect all of the different channels and we offer like observability. So you can see the entire customer journey from an individual customer to an aggregate,
which is really what we focus on, like grouping all of these different conversations and aggregate.
So you can see where things are going wrong and what agents are saying.
So the use cases are pretty large and expansive.
Our canonical use case is chatbot observability and optimization,
like identifying where your chatbot is getting confused, where your customers are getting frustrated. So we identify those, like, missed intents. When you have a traditional chatbot, the system is intent-based. So basically the bot has a concept of a list of intents, and those intents are mapped to a response. So the bot's job is: a customer types something, you identify what the intent is, and then you provide a response.
So you can imagine if the intent isn't already there in the chatbot system, it gets confused, either it misclassifies or it has like a, I'm sorry, I'm confused response.
And so we can identify those and we do unsupervised learning.
So we identify new intents. And then we look at the agent side
of the conversation, because when a chatbot fails, it will usually go, it'll get escalated to a
living human agent. And we look at all those different agent responses, we summarize them,
we aggregate them. And we can export both of those back to the bot. So we could say, here's a new intent.
This is what customers are asking for.
This is how your agents are responding.
And they can look through the different
kind of aggregated and summarized agent responses
to choose the appropriate one that they want to use.
And then you update your bot with that.
So then you have a new response for the bot as well.
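A minimal sketch of that intent-based setup, with a fallback that logs "missed intents" for later review. The intents, example phrases, similarity threshold, and TF-IDF matching here are illustrative stand-ins; real platforms use trained intent classifiers rather than this toy matching.

```python
# A toy intent-based bot: match the utterance to known intents, fall back and
# log a "missed intent" when nothing matches well enough (escalate to a human).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

INTENTS = {
    "return_item":  (["I want to return this", "this is too big", "refund please"],
                     "Sure, I can start a return for you."),
    "order_status": (["where is my order", "track my package"],
                     "Let me look up your order status."),
}

examples, labels = [], []
for name, (phrases, _) in INTENTS.items():
    examples.extend(phrases)
    labels.extend([name] * len(phrases))

vectorizer = TfidfVectorizer().fit(examples)
example_vecs = vectorizer.transform(examples)
missed_intents = []  # utterances the bot couldn't map to any intent

def respond(utterance: str, threshold: float = 0.3) -> str:
    sims = cosine_similarity(vectorizer.transform([utterance]), example_vecs)[0]
    best = sims.argmax()
    if sims[best] < threshold:
        missed_intents.append(utterance)   # candidate for a *new* intent
        return "I'm sorry, I'm confused."  # escalate to a human agent
    return INTENTS[labels[best]][1]

print(respond("I'd like to return a jacket that's too small"))
print(respond("do you price match competitors?"))   # likely a missed intent
print(missed_intents)
```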
That's super interesting because
observability in the machine learning context right usually is to help you build a better model
right in terms of you have better labels and you clean it up but then in this case it can be like
very different things right like a large language model to something that's like handcrafted by
someone like sitting on the product side trying to just figure out hey these are like the branches you want to draw is that like
something that you guys experienced? How do you go about bridging between sort of all those different, like, types or levels of complexity, I guess?
Yeah, okay, I think I get that. So you said a few things. One, you mentioned, you know, using LLMs in bots, and that is a hot topic. So right now, I think where most of the industry is at is not letting a generative bot loose on their customers. That's something that people are quite apprehensive of. It's also something that we, you know, are moving forward towards, but like, you have to set up a lot of guardrails and kind of
like, make sure that you have control over the responses, you're not giving misinformation
or anything like that. So as we as an industry move more towards generative bots, the need for observability is even greater. Because, like, you probably don't want to release your new generative bot on all of your customers.
You want to start, you know, A-B testing with a small group.
You want to observe what the, and record what those questions are and what those bot responses
are so that you can continuously build confidence in what you've built and then, you know, expand
it for more customers.
In this case, when you're looking at a bunch of this data
coming from different models,
and let's say in this specific case,
it's just generative model on some slice of customers,
other chatbots on other slice of customers.
You mentioned you do unsupervised learning.
I don't understand much about it,
but what I understand is labeling data is helpful
to recognize patterns and train your models. Is that something that's done in this space, or is it all, like, let the data figure out what makes sense?
A lot of what we do is much more founded in traditional natural language understanding. When we analyze
conversations, we do some supervised learning. So we categorize the conversation, the different
components. And by categorizing the utterances, that helps us simplify what a conversation kind
of looks like so we can aggregate. And in that case, we do supervised learning. So we do
labeling. We do labeling internally and work with a team of labelers to make sure that we get these
labels correct. And then the unsupervised part that kind of, unsupervised, it's like clustering.
Let's say you've got a bunch of utterances and you embed them using some embedding model like BERT. And
then you're trying to like figure out what are these different groups. And it's always been
a fuzzy process. You have different domains, you have different data sets. So you can,
you can kind of get an idea of like what is a good parameter space for you, but there's very much a human in the loop component of it.
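For concreteness, a rough sketch of that embed-then-cluster step: encode utterances with a small BERT-style model and group them so a human can review and name each candidate intent. The model name, utterances, and cluster count are illustrative assumptions, not CueIn's actual pipeline.

```python
# Embed utterances and cluster them to surface candidate "new intents".
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "do you price match competitors?",
    "can you match the price I saw elsewhere?",
    "my discount code isn't working",
    "the promo code gives an error at checkout",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # a small BERT-style encoder
embeddings = model.encode(utterances, normalize_embeddings=True)

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for label, text in sorted(zip(clusters, utterances)):
    print(label, text)   # a human then reviews each group and names the intent
```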
And when we do that human in the loop component,
I mean, it's kind of fun.
Like you iterate locally, you try different parameters,
but ultimately like I really try to make sure
that I get the final result up in the UI,
not just like looking at spreadsheets, but like, I want to see the
decisions that I'm making with the machine learning models. I want to see how this impacts
what the customer is actually seeing and then like what value the customer is getting from it.
So I think that in general is always important to have that human in the loop because we also
get feedback from our customers too. Like, we're a really agile team, so like, we get feedback from them, we change it right away.
Um, so yeah, especially when you're working in the unsupervised space, you'll see there's always going to be a human in the loop somewhere. And that's also something we're working with companies on, where we're like proposing new tags or new labels, but they have a human in the loop that's like approving them or slightly modifying them, so that it's, you know, kind
of more consistent with the current taxonomy they have, or something.
And the feedback loop here is that you made the changes to the system for the customer, their CSAT or NPS goes up, and that tells you the system's working?
Or is there a different feedback loop? Yeah, that's like the big loop. That's the big slow
loop. The faster loop is taking their data and they say, okay, here's all of our conversations.
And then we wave our machine learning magic wand and we put it up in the UI and it's all
organized in a way
that's supposed to be actionable and insightful for them and they tell us what's useful like what
looks right uh what is useful what they'd like to see more of taking in all of the knowledge and
expertise that they have of their system and their business, um, and giving that feedback
so that we can adjust it accordingly is this uh online or is it like offline sort of batches like
they give you guys the data and then you guys do sort of the analytics or this is all happening
within the platform kind of real time uh right now most of it is a batch meaning that like customers don't need to know like minute to minute what their
CSAT score is. Um, yeah, so we do a batch. You know, everything's productionized, in the sense of
like we have an automated incremental pipeline so once we've made all of our modeling decisions
you know and trained our models then the incremental pipeline is running on its own. We do have one
real-time feature, which is an Answers API. I think this is something a lot of people in the
space are familiar with. A company has a bunch of knowledge articles or like all of their knowledge
base, and they want to be able to just ask a question and then get a generated response from
that knowledge base. So, you know, this is this is i think the direction people are thinking of going with bots that's kind of
where you want to go with bots where you have your knowledge articles and as long as you're
updating them appropriately you want someone to ask any question from the knowledge article and
get a good response um so that's something we're doing right now. It's not being used for bots yet.
We're using it for agents.
So human agents,
especially a new agent
or really any of them
can just type in a question
and get an answer.
And that's a hosted API.
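A bare-bones sketch of that answers-over-a-knowledge-base flow: retrieve the most relevant article, then assemble a prompt that a hosted LLM would answer from. The articles, the embedding model, and stopping at the prompt (rather than calling an actual LLM) are simplifying assumptions for illustration.

```python
# Retrieve the best-matching knowledge article, then build the prompt an LLM
# would answer from. In the real service this prompt goes to a hosted model.
from sentence_transformers import SentenceTransformer, util

articles = {
    "returns": "Items can be returned within 30 days with proof of purchase.",
    "shipping": "Standard shipping takes 3-5 business days within the US.",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(list(articles.values()), convert_to_tensor=True)

def answer(question: str) -> str:
    q = encoder.encode(question, convert_to_tensor=True)
    best = util.cos_sim(q, doc_embeddings).argmax().item()   # closest article
    context = list(articles.values())[best]
    prompt = f"Answer using only this article:\n{context}\n\nQuestion: {question}\nAnswer:"
    return prompt

print(answer("How long do I have to send something back?"))
```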
I see.
Like tying back to earlier
where you said LLMs are
way harder to implement
as a chatbot
for specific use cases for companies
because of
the guardrails that you need. Like, do you see a parallel between, like, now, even though the ideal would be something that's very flexible, that's LLM-based, when you are looking at it in practice, it's all very much decision trees? Like, comparing that to back in 2008, or like when Hadoop and big data
first became a thing where like everybody's like talking about big data just like now right
everybody's talking about LLMs, but then you look under the hood, like, nobody's actually doing it. And I was wondering, do you see some parallels? Back then maybe it was like the complexity of infrastructure setup that was sort of preventing people. What do you think it is now that kind of prevents LLM adoption in, like, chatbots? Is it just, like, guardrails?
Yeah, that's
a good question i mean that's always the case it's so funny that you mentioned big data because
nobody uses that term anymore. It's just data. It's all big.
It's all big. Yeah, that's always the case.
You know, there's some new technology, you know, it used to come from research.
Now maybe it came from a company and it does something that people weren't able to do before.
But there are some kinks to be worked out and some hesitation and caution. And the more like a company has to lose, the more
risk averse they're going to be, and the slower they're going to be, the more time they're going to
take and the more guardrails they're going to set up. I think, you know, in the LLM space,
it's just going to keep moving forward. Like I would be really surprised if in 10 years,
like everyone wasn't using
generative bots i'd be like that's weird what happened um yeah so i think it's just a matter
of taking time and building trust like you know with any new technology you have to build that
trust. You have to understand how it works. And if you don't understand it, at least
see that it performs consistently enough
where you're like, it's like physics, it just works.
And so there is a case of using generative LLMs when it comes to the chatbot itself. You also see a lot of use cases these days, like people putting a bunch of small APIs on top of something like ChatGPT or other models, where you upload an article or transcript of a conversation and we'll summarize it for you, or you can ask questions back and forth. Do you see opportunities of using LLMs in the observability side itself? Where you have all this data, you have all these transcripts, would you have an LLM summarize it and then work off of that? Or is that
just like no don't do that
because you can't trust the system enough
to rely on it for observability.
Oh, we totally do.
We totally use LLMs, and we use them all the time.
We host our own, you know,
we're fortunate enough to have,
I was going to say we're fortunate enough
to have some amazing data scientists,
but I realized being one of the three of those that like...
You can still say that.
Kevin did most of the work on the recent LLMs. Kevin Moore, and then Jasmine Wong, who's one of our newer hires, has been working on our LLMs. But yeah, we fine-tune our own LLMs and host them. Different companies have different policies on where they want their data to go. Like, some companies are okay with using OpenAI endpoints for, like, exploration
and labeling. For me, I'm always worried about some other company hosting a live service,
especially like, I get a lot of emails from OpenAI saying something's down or degraded. Like, I don't
want i don't want my customers to have to suffer from that but yeah we totally use it um i think
they're really great in, like, early exploration. And like you said, summarizing. Like, by summarizing text data, it kind of works as a normalization, where it's making the language more similar, so that makes it easier to cluster. And guardrails have to be in place. In general, what we find is hallucinations, or what we call factual nonsense, which is a hallucination that sounds very reasonable, I guess you'd call it.
I forgot who said it, but it's like, LLMs are bullshitters, or professional bullshitters of sorts.
Professional bullshitters, yeah. You gotta put your guardrails in place, and like, it depends on
what the the problem space is you're working on i think as soon as you've refined your problem
space and you're working on one specific thing, it's pretty quick to build like confidence and like the responses
are being generated and catching where there are errors and how you can fix that, which is usually
providing more information or like better input data. What do guardrails look like in this case?
Because it's not, it's just internal, right? So you don't have to worry about prompt injection and things like that.
So what do you have to do for the guardrails?
That's a lot of like, you have a script, you've got your prompts, you're generating results,
and then you look at the results.
Like there's very much like a data scientist in the loop kind of reviewing things.
You know, of course, you're writing scripts, so you're like grouping things and you're
like, oh, that looks weird.
And you dig into it.
It was pretty easy to build like a hallucination detection, like classification model for one
use case I was working on where it was just like a very specific use case.
And it was really obvious in the response that
there were hallucinations happening. So it was, you know, I got like 100 examples together and trained like a simple classifier for hallucination detection. But it's easier to do the more constrained
your problem space is. Yeah, I think, just really nodding back to observability: if you have observability, that is your guardrail, where you can identify really quickly when something stands out, when something's an anomaly, and then investigate how to fix that, just like any bug.
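A minimal sketch of that "simple hallucination classifier" idea: roughly a hundred labeled responses, lightweight features, a linear model. The example texts and labels here are invented, and a real setup would use many more examples and proper evaluation.

```python
# Train a small classifier to flag "factual nonsense" in bot responses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

responses = [
    "Your order #4411 ships tomorrow via our partner warehouse on Mars.",   # hallucination
    "Your order has shipped; tracking was sent to your email.",
    "We offer a 90-year warranty on all socks.",                            # hallucination
    "Returns are accepted within 30 days with a receipt.",
    # ... roughly 100 labeled examples in practice
]
labels = [1, 0, 1, 0]   # 1 = hallucination, 0 = grounded

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(responses, labels)
print(clf.predict(["Your refund was approved by the CEO personally this morning."]))
```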
And can you share with us a little bit on the hardware side? Because recently I read something that there are, like, modern LLMs that are compact enough that you can just run on a single video-gaming GPU.
What do you guys do?
We use AWS, like most startups.
Yeah, we have an AWS stack.
So we just spin up whatever we'd like.
Depending on the problem, the harder the problem, the more parameters you need,
like the bigger GPU you need, the more money you pay.
So like try to use the small models for the small problems
and then big ones for the big ones.
This was a discussion we were having internally at one point.
Many teams target to train X million, 100 billion,
whatever number of parameter models
and come up with numbers that keep growing.
I mean, you keep seeing with all these new models coming out,
oh, now we hit this new target.
Oh, by the way, now there's a new target.
How does one go about evaluating that there is enough ROI
and that for the business?
Like, yes, the model is better.
How do you know if it's worth it
for all the money you paid for the hardware?
Yeah.
You always start small, right? I think the bigger models, you know, generalize better; they cover more use cases.
If you're thinking about constraints, like, you ask yourself: is it cheaper to train, like, five different 7-billion-parameter models, versus one 100-billion or trillion-parameter model? The bigger one is going to be a lot more expensive.
Especially for us, where we have a lot of different internal machine learning use cases: you build the model for the use case and make it as light as you can without losing out on performance. And there's a lot of improvements and just like deltas on like quantization
and different adapters that you can use as well.
So you don't have to retrain everything.
And there's been a lot of focus in that space as everyone's trying to get
like the memory footprint down for these huge models.
As you build observability into these models, or in general as chatbots improve, people start using more LLMs, and they keep getting better, the number of human agents you have in the loop technically goes down. Essentially what I'm getting at is, many people have this fear that AI will result in many people losing jobs. What's your take on this, as someone who is working in this space?
It's true.
But people don't want jobs.
They want money to live.
This is something I empathize with, you know, especially now, like economy has gone to crap and it's really hard for people to live. Like I have friends and family members whose jobs have been automated away, you know, and it sucks. They don't miss talking on the phone
to angry people all day. They miss like having a stable paycheck. So I actually worked at a call
center in college doing customer service for UPS. And like, on one hand, it was the best job I had.
I made $9 an hour, which was a lot of money back then because I'm old. And, you know, the people
were nice. The managers were nice. You got breaks. But like, it was so miserable. I dreaded every
day because people were so mean. And I had different categories
for the different types of ways that people were going to be mean. Men tend to just yell at you
and then apologize, whereas women would kind of get more personal until you cried. It was terrible.
So now it's funny
because a large part of my job
is like reading through
these customer support transcripts
and like I don't feel bad
about automating this job.
People could do better things
with their time
instead of being yelled at.
They really can.
Yeah.
And I don't think people are going to miss these jobs. I think people want, you know, they want stability so that they can live their lives.
so like a part of it feels like oh maybe that's a cop-out maybe i should take more responsibility
but also like i'm not the government i feel like that's their job they're taking our money. They should help us be stable. They know we're automating
things. Yeah, it's always going to be the case, right? Industrial revolution, people left the
fields. It'll just keep happening. Makes sense. I agree with that perspective. On similar lines,
as AI or these LLMs and other sorts of models help us become more efficient and automate things.
What's your take on using LLMs for data labeling itself? Because if LLMs are able to understand
data, could they also label it? Yeah, they can. Is that reliable or would that be reliable?
I would say anything that you're doing in an automated fashion, you need observability.
So like, is it, can you just
do it out of the box? Yeah, but you're not going to get very good labels. Yeah, we've definitely
tried that, like doing labels like, oh, let's see, let's see how these LLMs handle it. And
then we're like, not so great. So actually a big focus of ours is like needing high quality labels that do require a human and usually like lots of iterations
of human like initial human labeling and then it gets handed off to a data scientist like me where
I review it for consistency you could train a model on your existing labels and then look at
variability across different model trainings and try to
understand like how consistent your labels are and then go back to the humans and say,
okay, these are the ones, this is how we need to change our rubric. This is how we need to
change our process so that we can get more consistent labels. Yeah. So it's iterative.
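One way that iteration can look in code: train on your own labels with cross-validation and flag the examples the model keeps disagreeing with, then send those back to the labelers or the rubric. The texts, labels, and model choice below are toy assumptions, not CueIn's pipeline.

```python
# Check label consistency by training on your own labels and flagging disagreements.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

texts  = ["refund please", "where is my package", "money back now", "track my order"]
labels = np.array([0, 1, 0, 1])            # 0 = refund, 1 = order status (toy labels)

X = TfidfVectorizer().fit_transform(texts)
preds = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=2)
for t, y, p in zip(texts, labels, preds):
    if y != p:
        print("review this label:", t)      # send back to the human labelers / rubric
```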
So in 2017, you wrote a post about the future of commercial AI, where you highlighted, like, the key themes at the time. Since we're talking about LLMs and AI, if you were to do the same for today, what would it look like?
Interesting. So, one, that makes me feel really old, because that was six years ago and it feels like it was just yesterday. Yeah, it was the AI Frontiers 2017.
And that was a really cool conference.
It really pulled in a lot of people
from just a lot of different big companies
and small companies.
You had your big topics,
which at the time was autonomous vehicles.
And then also speech recognition.
I like almost forgot that was like a big deal at the time.
Like we can use deep learning for speech recognition.
Like voice to text was not a given.
Now we just kind of like take it for granted.
And like, yeah, maybe there are some nuance problems with specific systems.
But, you know, you can get decent speech-to-text out of the box,
especially if your audio quality is really good. What else were people talking about? Oh,
yeah, this was back when we were talking about, when I say we, I mean like we as a field,
moving from traditional feature engineering to just deep learning, just having a deep learning model handle
it. And guess what? Deep learning won. Unless you have a really specific use case that you need
to hand engineer features because you want to control what those features are so that when
you provide a prediction, you want to know exactly what those features were for the prediction.
That would be a use case for hand feature
engineering. But for the most part, we just let deep learning handle it.
But to caveat that, it's more for, like, big-scale problems at, like, big companies, right? Like, if you were to start something small, like, you still
want to, like, handcraft your features, right?
Not necessarily.
Why not?
Just, like, I feel like now it's just, like, throwing out a big model, see
how it does if you don't like the results then dig into them and like try something more traditional
what you're saying sounds promising because the noobs like me can do machine learning like throw
stuff at the wall and see what sticks.
Yeah, see what sticks. You still have to dig in. It'll still give you nonsense if you're not careful. I feel like that process has
changed where it used to be like starting from the bottom and building something up where now
it's just like it's cheap and affordable and hugging face is awesome and you can just go on
there and get a model and see how well it does on your data and then
if there are shortcomings then you ask yourself like do i need to fine-tune do i need to dig into
the architecture. But most of the time the issues are on the data side, you know, like having enough data and having really good labels. Where, like, unless you really have a specific use case, we're just really moving away from that traditional machine learning. And there are specific use cases that people have that we do want to use traditional machine learning for, but less and
less these days like so to put it very concretely like say um i'm finding this super
interesting because i'm like starting to get like we were talking about so say you're doing like
fraud detection right like and like in finance and then you have these sort of uh i don't know
like 200 000 uh records of like based on like the different transactions that come through
and then you have a set of labels of saying fraud or not fraud. So you're saying like, basically, we can get like, say like, even like an LLM, right?
And then just feed it through basically turning feature engineering to like prompt engineering.
And then going to like, right.
And then it's like, hey, this is like the data that you're working with.
So then whenever there's a new data point, and then you literally just, you know, add it again, transform it to the problem
and asking the model again, it's like, hey,
do you think this is a fraud or not? Like, that's
like basically what you're describing, right?
Yeah, something you could do. You know,
if you're trying to just do a classification
and you might want to go with
just an encoder model,
the decoder models that do generative
are kind of more heavy.
So, yeah, you probably want to try a few different models,
but I think usually that is a really good place to start.
And you can iterate quickly.
And like, even if it's not the final model that you use,
you can get insight pretty quickly.
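A rough sketch of the encoder-model route for a classification problem like the fraud example: serialize each record to text, embed it with a small encoder, and fit a lightweight classifier on top. The records, labels, and model name are invented for illustration; this is one quick way to iterate, not a recommended fraud system.

```python
# Serialize each record to text, embed with an encoder, fit a light classifier.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

records = [
    {"amount": 12.40,   "country": "US", "hour": 14, "merchant": "coffee shop"},
    {"amount": 9800.00, "country": "US", "hour": 3,  "merchant": "gift cards"},
    {"amount": 55.00,   "country": "CA", "hour": 19, "merchant": "grocery"},
    {"amount": 7200.00, "country": "RO", "hour": 4,  "merchant": "wire transfer"},
]
labels = [0, 1, 0, 1]   # 1 = fraud, 0 = legitimate (toy labels)

def to_text(r):   # "feature engineering" becomes writing the record as a sentence
    return f"{r['merchant']} purchase of ${r['amount']} from {r['country']} at hour {r['hour']}"

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # an encoder, not a generative decoder
X = encoder.encode([to_text(r) for r in records])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
new = {"amount": 8400.00, "country": "US", "hour": 2, "merchant": "gift cards"}
print(clf.predict(encoder.encode([to_text(new)])))   # quick signal before fine-tuning anything
```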
Interesting, interesting.
And yeah, like you don't need a ton of data to fine-tune
uh especially now that they have like adapters that are out there
where you don't have to like tune all the parameters of the model you just have like
a small subset of parameters that you're tuning for your use case.
I see. In that regard, almost, like, it is, for, like, in terms of automation, right? Like
it is automating even like some of the data science jobs right because it's like you can turn
feature engineering into like prompt engineering well i guess another way of putting it though is
that it's kind of democratizing right like the power that it harnesses right so like more and
more people have access to it, instead of, right, like, you having to understand all these sorts of things. But
then i guess the flip side of it is a lot of things can go wrong and you don't really understand how
things work and yeah yeah i see yeah you're always gonna want that specialist there when things don't
work who absolutely goes in there and like can really really tear all the wires out and figure out what's wrong.
But yeah, I've been constantly aware of the irony that my entire career has been focused on automating my job away.
Versus Salesforce, doing automated machine learning, we were automating everything that a data scientist could do. So feature engineering, sanity checking, everything that goes into, like, training a model and hosting it for predictions was completely automated. And now at CueIn, like, my job as a data scientist continues to
shift and turn more and more into something where like even i get to experience the job uncertainty like wait
a minute is my boss gonna replace me with a model as long as they hallucinate they won't
Yeah, just dumping LSD into the model being like, do the work.
But no, there's always, I think, gonna be, you always want a human in the loop. And that
is something i've been discovering more now working with companies and having to spend so
much time on really getting good labels, and also being able to translate what the customer actually wants into labels and into a model. And that's even where I feel like my career is going. It's kind of
going more towards product
and kind of being the human
that's translating the other human
to the machine.
Which is cool, right? We don't want to do
engineering for the sake of engineering, right?
We want to do engineering for the sake of building
better products.
Yeah, it depends.
I don't know if people like to engineer for the sake of engineering.
I take that back.
This is an engineering podcast.
I take that back.
So you mentioned a couple of terms, and I've heard about those terms.
I think I know them, but I'm pretty sure I don't.
And I would love to get to know them from you.
So you said something about few-shot learning. What does that mean?
Shot means that you're just putting examples in your prompt.
I see. So you would, let's say, I'm googling it, I'm like, could I just... brain fart, wait, let me make sure. Yeah, yeah, you put it in the prompt. Yeah. Can you give an example of this?
So let's say I have a recording of,
it's just a transcript of the conversation that we've been having.
And I want to know when I told a joke that was funny.
Wow, this is all of a sudden very relevant.
Please continue, Melissa. so i want to know yeah
when I said stuff like that was funny. Um, so I take the entire transcript and I put in the prompt, like, when did Mel make a joke that was really funny and they weren't just laughing to be nice. And so, like, if you ask that, the
model will kind of like do okay but it'd be even better if i gave it some examples
so i would say here are some examples and you know i can provide the text of like when this
is a positive example and it really was a joke. And this was a negative example where people were just uncomfortable and these were uncomfortable
ha-has, you know, and so you just, you just put your labels there in the prompt and, you
know, you're restricted by the prompt size.
So prompts can't be really big.
So you want to kind of try to be concise as much as you can.
And that's few shot.
It works.
Oh, I, you know,
I wouldn't build an entire product on it.
It's a starting point,
but it's not always great.
You usually want to iterate.
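Here's a toy version of that few-shot setup in code: the labeled examples go straight into the prompt, and the whole string is what gets sent to the model. The snippets, labels, and the placeholder for the new transcript chunk are all made up for illustration.

```python
# Build a few-shot prompt: labeled examples go directly into the prompt text.
examples = [
    ("Guang: ...and that's why I still put MATLAB on my resume.\nAll: [laughter]", "funny"),
    ("Melissa: ...so the pipeline runs incrementally.\nGuang: ha, right.",          "polite laugh"),
]

question = "When did Mel make a joke that was really funny, not just polite laughter?"

prompt_parts = ["You label podcast transcript snippets as 'funny' or 'polite laugh'.\n"]
for snippet, label in examples:                       # the "shots"
    prompt_parts.append(f"Snippet:\n{snippet}\nLabel: {label}\n")
prompt_parts.append(f"Now answer: {question}\nSnippet:\n{{new_transcript_chunk}}\nLabel:")

prompt = "\n".join(prompt_parts)
print(prompt)   # this string is what gets sent to the LLM; keep it under the context limit
```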
What does iteration look like
after something like this?
I mean, iteration largely
is building up your labels
and going from, you know,
like Fewshot to actual fine tuning. And fine tuning from, you know, like hue shot to actual fine tuning. And fine tuning is,
you know, our traditional machine learning where you have your input and your desired output.
And you train the model on lots of examples of those input output pairings. And then eventually
the model figures out that relationship. So that's fine tuning and you need more data for that. But
that's going to give you obviously much, much better consistent performance.
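And a sketch of what fine-tuning with an adapter can look like, using the Hugging Face peft library: only a small set of LoRA parameters gets trained on your input-output pairs. The base model and hyperparameters below are illustrative choices, not what CueIn actually uses.

```python
# Configure a LoRA adapter so only a small subset of parameters is trained.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
lora = LoraConfig(
    task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],   # DistilBERT attention projections
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()   # a tiny fraction of the full model
# From here you train as usual on (input, label) pairs, e.g. with transformers.Trainer.
```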
And I have a couple more new questions. Thanks for being so patient.
So you mentioned, you mentioned Hugging Face. How do you use Hugging Face yourselves?
Um, it's a great place to just really quickly scope out
the effort that will be involved in delivering a new product or a new aspect of the product
How so?
Yeah, so let's say somebody... I actually did this back in the day, before LLMs were like really big.
I was trying to figure out how to extract root causes from conversations.
And root cause is like a really specific kind of like concept.
Like you want to know not just like what the customer wants, but like the why, like the underlying thing that happens.
You know, and this is very, like... obviously LLMs are great at this now, but at the time they weren't really available. So I just went on Hugging Face and I got some QA models, and I felt so clever, because I was like, we just need a QA model. We can ask a question and then we get the answer.
And then I do like a bunch of like data processing and data cleaning and kind of move some things around.
And then, you know, there we go.
There we have our root causes.
So that was an example where I just pulled a QA model, a question answer model.
And that was like a fun way to.
It's funny because it was before prompt engineering, and that's kind of what I was trying to do. It's like, I didn't have time to, like, label a thousand
conversations or train people to label a thousand conversations. So yeah, I tried the QA model.
And that comes up a lot where like somebody, they want to know, oh, well, let's look at the
sentiment. You know, we're looking at CSAT, but I want to see how the sentiment of the user changes over time and how that's related to their ultimate customer satisfaction. So let's pull a sentiment model off of Hugging Face and, like,
evaluate how well it performs and see if it can give us insight. And if we like it, then we can take it, pull it,
set it up, make any changes or fine tuning that we want and use it in our pipeline.
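A quick sketch of that pull-a-model-off-Hugging-Face workflow with the transformers pipeline API, first for question answering and then for sentiment. The model names and the sample conversation are illustrative defaults; you'd evaluate on your own data before putting anything into a production pipeline.

```python
# Scope out a problem quickly with off-the-shelf Hugging Face models.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
conversation = ("Customer: my tracker stopped syncing after the last app update, "
                "so I want a refund. Agent: sorry about that, I can help.")
print(qa(question="Why does the customer want a refund?", context=conversation))

sentiment = pipeline("sentiment-analysis")   # default English sentiment model
print(sentiment(["my tracker stopped syncing", "thanks, that fixed it!"]))
```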
So Hugging Face is more like a repository of models that you can pick from with some indication
about what the model does. I see. When I first looked at Hugging Face, I was a little confused
about what the product was. I hadn't seen it until maybe early this year.
And it had like a hosting API.
It had this repository of models.
It has like a code space, some leaderboards.
And I'm like, I'm super confused as to what this thing is.
I should ask someone who uses it to know what it actually is.
Yeah, definitely a repository of models.
Just like a really great knowledge base. You know, what's funny is, now that I'm thinking about it, Hugging Face is almost like turning into what OpenAI wanted to be when they started out. Like, do you guys remember when OpenAI was open and they were facilitating, like, learning about AI? Democratizing AI was like a hot term in the early 2010s. And they had a reinforcement learning playground where people could test their different reinforcement learning algorithms and share them. And it was really cool. I think they did a lot more, like, outreach and communications and talks and stuff.
And then, you know, things happen.
Capitalism happens, COVID happened.
And, you know, then they started kind of focusing
on these, you know, privatized use cases.
And, you know, now we have Hugging Face,
which it's not like a reinforcement learning playground. I think it's even more um you know now we have hugging face which it's it's not like a reinforcement
learning playground i think it's even more you know power for that powerful than that because
it's just like lots of machine learning models that you get to try out real quick and iterate and
you know the license is there you know who wrote it you know you can give homage to the authors
It's like GitHub for ML models, of sorts.
Absolutely.
yeah with some fun and bells and whistles yes a lot of bells and whistles right now that i think
of it, OpenAI had, like, the Dota competition thing, right? Like, they were doing, like, RL
with like different video games and trying to right right right wow it, it's very different now. You're right.
Yeah.
By the way, for folks who are
new like myself and who
maybe want to learn more or
play around with some of these models,
do you have any recommendations for a place to start?
Like, hey, go do this course
or something,
some posts or blogs that you came across
which are helpful?
I think that I'm going to give the advice that I think is pretty canon,
or a lot of people give, is just start getting your hands wet right away.
I think courses can be helpful for people,
especially if they don't have engineering experience.
It's hard to get off the ground when you like can't install Python.
I think it's very frustrating,
but like if you have like some like basic engineering skills,
I really recommend just getting your hands wet.
People traditionally for data science,
people used to say,
go to Kaggle,
Kaggle,
like host data sets and competitions.
And it was a place where you can go to get data to play around
with um but now honestly i recommend that people check out upwork so upwork is a company where
like companies can hire people as contractors for different tasks and one of those tasks can be data
analysis machine learning so upwork i think is a good place to hone a new skill. And like you're
motivated because like companies will post different jobs. So you can see like what a
company wants to do. And you can do whatever like their take home interview, see how well you're
doing and see what like people are actually hiring for. And this is what you used to only be able to
do by like going through the grueling process of like applying to a bunch
of companies and hoping that they like this is what before i started insight this is kind of how
i got started doing real data science is like getting a job interview at a company like uber
facebook and failing miserably and then going back and resolving that problem and being like okay now i can do this so
like now you can do it through upwork and like you can you know i i think it's less soul crushing
maybe than having to like get all excited about working for a big company and realizing that
you're not there yet yeah that's good advice relatable yeah cool
um
so we want to be
respectful of your time
sorry that we're already
going over Melissa
but
um
maybe the
the last question
um
so like
suppose you had the power
to send a tweet
that everyone
will see
what would your
280 character message say
Stop using Twitter.
Damn.
You got us there.
We need a better question.
Yeah. Oh man.
Like we, well, to stay
on theme with the LLM, you know, we actually
use the LLM to come up with a question
like this, you know, that sounds somewhat
reasonable, and I guess in
also a very LLM fashion
that, you know, it could lead to
hilarious failures.
Anything else that you'd like to share
with our listeners before we close up?
Um, no.
Be kind. Kindness is
free. Kind to each other.
Kind to your bots.
Some people think that one day they will rule
the earth and they have all the data.
So be polite.
Awesome. Thanks so much, Melissa.
Thanks for reminding us.
Yeah, it was a lot of fun. Thanks for having me.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts
and learn more about us at softwaremisadventures.com.
You can also write to us at hello at softwaremisadventures.com.
We would love to hear from you.
Until next time, take care.