Software Misadventures - Automating away your job as a Data Scientist | Melissa Runfeldt (Salesforce, CueIn)
Episode Date: December 12, 2023. Before joining CueIn last year as a Founding Data Scientist, Melissa was a Lead Data Scientist at Salesforce working on the Einstein Platform that focused on automating Data Science workflows. In this conversation we dive into Melissa’s unique journey, what to do in the face of increasing job automation and explore the latest developments in practical AI. Segments: [00:02:13] Melissa’s background in computational neuroscience [00:06:08] 7 years at Salesforce vs startup [00:11:31] Joining CueIn [00:19:30] Chatbot observability [00:28:16] Feedback loops [00:33:10] Use LLM to observe.. LLMs? [00:39:06] AI automating jobs [00:43:01] Doing ML in 2017 vs now [00:50:35] Few shot learning, Hugging Face Show Notes: Melissa’s Linkedin: https://www.linkedin.com/in/melissajanerunfeldt/ Stay in touch: 👋 Let us know who we should talk to next! hello@softwaremisadventures.com
Transcript
I've been constantly aware of the irony that my entire career has been focused on automating my job away.
Versus Salesforce, you know, doing automated machine learning, we were automating everything that a data scientist could do.
So feature engineering, sanity checking, everything that goes into like training a model and hosting it for predictions was completely automated. And now at CueIn, like my job as a data scientist
continues to shift
and turn more and more into something
where like even I get to experience
the job uncertainty.
Like, wait a minute,
is my boss going to replace me with a model?
As long as they hallucinate, they won't.
Yeah, just dumping LSD into the model being like, do the work.
You always want a human in the loop.
And that is something I've been discovering more now, working with companies and having to spend so much time on really getting good labels, and also being able to translate what the customer actually wants into labels and into a model.
And that's even where I feel like my career is going. It's towards product
and kind of being the human that's translating the other human to the machine.
Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guan.
As engineers, we are interested in not just the technologies, but the people and the stories behind them.
So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned, and of course, the misadventures along the way.
Hi everyone, this is Guang.
In this episode, we're chatting with Melissa about her unusual career journey in data science.
We'll explore the lessons she's learned from automating away her jobs, transitioning from
a lead data scientist at Salesforce to her current role focusing on chatbot observability
at CueIn.
Join us as we delve into Melissa's unique journey,
what to do in the face of increasing job automation,
and explore the latest developments in practical AI.
Hi, Melissa.
Okay, jumping right into it.
So you have a PhD in computational neuroscience,
and you studied single-cell electrophysiology
and two-photon calcium imaging in acute slice preparations, which involves delicately removing mouse brains.
Tell us more about that.
That sounded very intriguing.
So, yeah, I did my PhD at University of Chicago, focusing on computational neuroscience.
And I was studying information coding in the neocortex. So the neocortex is the part of the brain that I think is more exciting.
Like when you think of, just think of a human, you have all of these senses, all of the sensory
information that you're processing and receiving. So your sensory information comes in through your
eyes and your ears and your nose and your skin, and it gets relayed to the neocortex through a
structure called the thalamus. And the pathway leading to the neocortex is fairly structured
in the sense of the responses of the neurons. It's fairly easy. It's very parallel to the stimulus itself. So it's pretty straightforward
to decode what a stimulus is from the neural activity and these kind of earlier brain
structures. But once it gets to the neocortex, it's where a lot of the interesting computation
happens. And that's where all of our thinking happens, right? So that's where we're not only
flatly trying to perceive what's there, but we're perceiving it in a way that makes sense to us,
in a way that can be incorporated with your other sensory perceptions and then ultimately is going
to lead to a thought or an action or a motor output. So we were really focused on understanding how groups of neurons in the
neocortex encode information. And, you know, all data science is like the data that you're looking
at kind of changes how you view the system. So we had new technology, this is two-photon calcium
imaging that let us look at an entire field of neurons in relative real time. So you could see the action potential
activity of neurons and you can see where they are in space. So that really allowed us to move
beyond like traditional neural encoding methods where you're just looking at one single neuron
spiking. And instead we could look at the entire group of neurons and relate their activity to
what the stimulus is, to like what computations are being performed and how those groups of neurons are representing the information.
So I was in a pitch dark room because it was like imaging, right, two photon with a big laser, pitch black.
And like my lab, we love like listening to like hardcore metal or like death metal the whole time.
Black hoodie, patching individual neurons with the microscope too it was really fun stuff
How do you guys analyze the data? Because it's going from analog to then digital, and then, like...
Yeah, it depends on the recording method. So for two-photon calcium imaging, the signal that you get is
counting photons. Basically, so you're moving a laser beam around with these two mirrors,
these galvo mirrors. And because you are moving, you're controlling the mirrors, you know where
the laser beam is pointed. So whenever the laser beam hits a certain specific spot in three-dimensional space, you know where the laser beam is pointing.
So any photons that you count at that moment in time, those photons are coming from that one specific spot.
So the input is coming in, you know, as photon counts, but you're reconstructing that field of view.
And the laser moves really, really fast.
So you could cover like a millimeter surface area pretty quickly, like a few seconds.
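To make the reconstruction idea concrete, here's a toy sketch in Python: because the galvo mirror positions are commanded, every photon count can be assigned to the pixel the laser was pointed at. The field-of-view size and the Poisson photon counts below are purely illustrative, not the actual acquisition software.

```python
# A toy illustration of the reconstruction idea: because you control the galvo
# mirrors, each photon count can be assigned to the pixel the laser was on.
import numpy as np

height, width = 4, 4                       # tiny field of view for illustration
rng = np.random.default_rng(0)

image = np.zeros((height, width))
for y in range(height):                    # raster-scan the laser across the field
    for x in range(width):
        photon_count = rng.poisson(lam=3)  # photons detected while pointed at (y, x)
        image[y, x] = photon_count         # position is known from the mirror command

print(image)                               # one frame; repeat frames to get activity over time
```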
Comparing that to what you do now, how do these two worlds compare or contrast rather?
So now I'm on the founding team of a startup called CueIn. I think that, you know, the most biological aspect is just interacting with humans. That's where all the variability comes
from, like understanding what customers want and how like translating that into labels.
There are definitely a lot less mishaps in software as opposed to biology by far. I would
say that the more complicated that your system gets, the more opportunity there is for stuff to go wrong.
Before CueIn, I was with Salesforce, and I worked on a few different teams within
Einstein. Salesforce Einstein was kind of like the catch-all for machine learning
at Salesforce. And I worked on the Einstein platform. So we were building a machine learning
platform, completely automated, multi-tenant machine learning platform for all of
Salesforce. And as that platform grew, there's just more and more components that can go,
that can be misaligned or, you know, not work. And that was actually more like science working
on that platform because something would go wrong and you would truly not know why.
Everyone had their own different expertise and
things that they knew. So it's kind of like science in the sense of you have your tools
and you're sampling from these different dimensions to try to understand this root
phenomena. But you don't know if you're looking in the right place, if you're using the right tools.
There's some uncertainty and some knowledge nuggets that
you have to reveal through banging your head against the computer.
So you mentioned you're the founding data scientist or the machine learning engineer.
I get these titles confused and I don't know what the difference is between them,
which is a different topic which we can get into.
Oh, yeah. So data science is more
of a catch-all term. It usually involves that you're interacting more directly with the data,
maybe working with scripts, prototyping, whereas machine learning engineer is like you're building
software to scale up kind of what a data scientist does. I don't know if that's helpful.
No, that helps. So what's the founding
story of CueIn? How did that come to be?
So our CEO, his name is Mayuk Bawal, and he was a product
manager at Salesforce. And we overlapped sometime at Salesforce on the Einstein platform. So him
and his good friend, Vinesh Ganapathy. They both went to Stanford together and were good friends
and always wanted to start a company together. Vinesh was at Google and then later at Uber, leading engineering teams, so very experienced in the engineering space.
Mayuk was a product manager and he was working on Einstein bots. During COVID times, unsurprisingly, bots got very popular. A lot of companies really
needed to like spin up chatbots really, really fast. And Mayuk was with the group that was
working with customers and setting up those bots. And he kept on kind of coming up against a shortcoming in the tools available on that platform to give customers both the observability and insight that they wanted into how their bots were performing and how customers were interacting with the bots, as well as, like, what can you do to make your bots better?
Analytics can give you insight. You say, oh, this is where it's going wrong. But as a company, you always want to fix it. You want to fix it right away.
So Mayuk saw that opportunity, investigated the market space more generally, and then him and Vinesh started CueIn, and Kevin joined as a founding data scientist. And it's been about two years since. Now our team is seven people and we are so busy.
It's really awesome.
We have a lot of customers coming in and our use cases are just like growing.
We're scaling up.
It's a fun time.
Like from doing neuroscience to seven years at Salesforce
to now co-founding a startup,
like you must have some like perceptions about what building a startup is like. Were there any like big surprises when you joined?
Good question. There weren't any
big surprises. It was different for sure, but it was different in all the ways that I hoped it
would be. I get to do, like, end-to-end solutions, from communicating with the customer, understanding what they want,
developing the model and solution, exploring the data, getting a prototype model up for them,
showing them the results, making sure they're happy, reaching out to the rest of our team and
working with them to get the end result up to the UI and building the infrastructure around that,
getting it into the database, getting it to the lambdas, making sure everything's working.
And then like showing it to the customers, and like doing all of this within like a month's time span, often even faster. Doing it all very, very quickly and getting that fast feedback.
So that's like what I was hoping for.
That's what I was looking for. I got it. And yeah, it's a lot of work, but it's definitely rewarding.
What did the process of making the decision look like? How did the founders of CueIn reach out to you to join? And then what made you actually take the decision to leave Salesforce and then join CueIn, which is, again, just two years in? So
lots of risk of joining a startup, lots of excitement too, lots of adventures.
So what did that decision making look like for you?
Good question. I think at the time when Mayuk approached me to join CueIn, I was actually
really happy with my team, both horizontal and vertical. So it was, you know, it was a little bittersweet to leave at
that point in time where like things were going well and I just got my promotion and everyone was
happy. But like sometimes you get an opportunity and you just have to go for it. For me, like I
asked myself, like, would I regret not doing this? And for me usually the answer is yes, like I would regret not
having this experience and like when I think of career paths I think I was at the point where
like management was kind of more of like the next step for me if I really wanted to keep like
growing this is capitalism it's all exponential growth you can't just like
have one derivative of improvement so I was like do I want to be a manager or do I want to have
this new experience and like I knew it would be like a little bit of a pay cut but it would put me
in a place where like moving forward in my career I think I would be doing more of what I wanted to
be doing. Like everything's going well at CueIn, like we're all very excited for like our equity to turn
into like real money one day. But the goal isn't really for me to like stop working. It's to be
enjoying my job and to be able to have an impact not only on the technology,
but on the culture, on people's lives and experiences. For me, the point of getting
power is to like make other people's lives better. Like that's what I like to do. I'm like,
we can all just be kind and nice and happy and do great, exciting work together.
And that's kind of like what I prioritized and went for and you know it's hard for people to
leave a corporation i think everybody gets a little soul crushed in one way or another so
it's just time for a change.
Um, so speaking of career path, when you first started at Salesforce, did you have, like, a rough idea of, I'll try to climb the ladder and go for the, how did you put it, no, first-order derivative speed of growth?
Exponential growth. Sorry, sorry.
Yeah. Oh my gosh, physics. Oh, sorry.
and maybe another way of putting this is like two people right like who are just starting their
career maybe in data science like would you recommend
this sort of try to go for the exponential growth but then at the same time kind of keeping an eye
out for the startup opportunities which are more of the unknown right like because you could go
really well but you could also just go really horribly wrong which i've got plenty of stories
about so like how do you recommend newbies in data science like think about
that okay well i'm going to start with your last statement about startups and kind of what i've
heard from my network it's either a really positive experience or it's absolutely terrible
and traumatizing and like you need better help for like at least two years afterwards
and i think in making that
decision you really have to talk to the people on the team and really like get a good assessment
of how happy they are and like what their mentality is are they there to grow and learn
together as a team or are they just trying to crush something out themselves so for joining
a startup,
you really have to focus on the people and the culture, because you don't know if it's going to succeed or not.
The gods of stochasticity will decide that for you.
The only thing you're guaranteed to get from it
is the experience that you get.
So are you going to be doing stuff
where you're growing technology-wise
or growing new skillsets?
And this is the case for any job, right? You focus on what skill sets you want to grow,
and then also ask yourself, is this an environment where I can grow?
When I joined Salesforce, I really wanted to grow as a software engineer. Like my background was neuro. So I was really like comfortable with reading manuscripts, like the foundation and mathematics
and dynamics and optimization.
And so I felt comfortable in the ML space and in kind of like the research space at
the time.
But like I've been programming in MATLAB for like seven years and I don't even know if
you should put that on your resume sometime.
I empathize with that.
It's so beautiful, though.
I really liked it.
I had a lot of fun with MATLAB.
I love building GUIs.
So easy.
Yeah, so easy.
Parallel processing.
You ran out of RAM so much. Anyways, I digress. So I
just kind of taught myself Python at the tail end of my postdoc and preparing for data science. But
I was in San Francisco at the time. And I thought, like, this is the time and place to learn how to
be a software engineer. And this is like, especially with the team I was with, that was
building a machine learning platform, which was a very new thing at the time. I saw that as an opportunity to gain knowledge that
wasn't knowledge you're going to get from a textbook or even maybe a Coursera.
So that's kind of like the muscle that I wanted to grow. And I'm glad I made that investment.
Of course, it does get to a certain point where you're like, okay, I miss data.
I miss architecture and modeling too. But yeah, focus on where you want to grow.
And going back to the startups, if you were given two choices to join startup A, which has really,
really fascinating tech, but then like the people aspect, it's like, you know, you don't really get along with the people; versus startup B, where the tech is maybe mediocre, but then the people are amazing.
Which one would you pick?
I think it really depends on where you are in your life and what's valuable.
I think there are different times in life where you're like, I got energy.
I'm going to conquer.
I don't care about all that soft stuff and feelings.
I'm just going to power it through.
And there can be value in that until you're so miserable that like nothing's effective. But up until then,
you can really grow just by like focusing on some technology that you're really passionate about.
And there are different times in your life where like it's really important to just be
stable and to be able to work 40 hours a week and not have extra stress.
So, you know, it kind of depends on where you are.
I think having children is a big component, even though I have a three and a five-year-old
and I'm at a startup.
But, you know, like...
Was that part of the calculation for you for joining a startup?
Is it like, once you've got a hang of the small-human-being-raising process, it's like, okay, now I can take some risk?
No, I'm totally contradicting myself.
Life is crazy. But you know, I lucked out with this startup because, like, I was excited about the tech,
it was totally like my space and i knew the people, or I knew some of
the people, and I knew that they were really smart and really focused and not petty or anything like
that. And then, so I was able to get both. So that's kind of why I went in that direction.
Also, a lot of people on our team have young children. So I'm not the only one whose meetings get crashed by little bodies. It's pretty common, but that also helps.
So you mentioned QN is in the space of customer experience transformation, which includes like
chatbot observability. Can you talk more about this problem space and what kind of products QN
builds? Yeah, so more broadly, it's customer experience transformation.
But I think you'll understand better
once I describe kind of like
our tech stack and process.
So we pool data in,
we build connectors to any vendor
that a customer is using.
Let's pause for a second.
Let's take a step back, for many people
who might not know
what customer experience here means
or like the vendors that you work with. Would you mind just giving a brief description of what these terms mean?
Yeah, so customer experience is kind of like the evolution of customer support.
I think customer support is kind of one component of it, but these are groups that are focused on the customer and the experience they have interacting with a company.
So that includes communicating through a chatbot.
It includes communicating with live agents through telephone conversations or through chat conversations, as well as email and asynchronous ticket like ticket based conversations. So the typical use cases,
you have a retail company and people are reaching out to the retail company saying like,
I want a refund, I want to return, this is too big, this is too small, this broke,
what are your promotions, stuff like that. Most companies have some form of customer support
because they have a product and customers are
using it and things break. So yeah, our goal is really to improve customer experience. And
right now, really what companies are focusing on is we all have to do more with less. So everyone
has less people, you have less engineers than you used to. You have less human agents that you can pay for.
So everyone wants to automate more of the customer experience process,
but they want to keep customer satisfaction, CSAT, NPS,
which is like company net promoter score.
They want to keep these metrics up
while automating more of that customer experience.
And they also want observability.
And this is really where we come in is we connect all of the different channels and we offer like observability. So you can see the entire customer journey from an individual customer to an aggregate,
which is really what we focus on, like grouping all of these different conversations and aggregate.
So you can see where things are going wrong and what agents are saying.
So the use cases are pretty large and expansive.
Our canonical use case is chatbot observability and optimization,
like identifying where your chatbot is getting confused, where your customers are getting frustrated. So we identify those, like, missed intents. When you have a traditional chatbot, the system is intent-based. So basically the bot has a concept of a list of intents, and those intents are mapped to a response. So the bot's job is: a customer types something, you identify what the intent is, and then you provide a response.
So you can imagine if the intent isn't already there in the chatbot system, it gets confused, either it misclassifies or it has like a, I'm sorry, I'm confused response.
And so we can identify those and we do unsupervised learning.
So we identify new intents. And then we look at the agent side
of the conversation, because when a chatbot fails, it will usually go, it'll get escalated to a
living human agent. And we look at all those different agent responses, we summarize them,
we aggregate them. And we can export both of those back to the bot. So we could say, here's a new intent.
This is what customers are asking for.
This is how your agents are responding.
And they can look through the different
kind of aggregated and summarized agent responses
to choose the appropriate one that they want to use.
And then you update your bot with that.
So then you have a new response for the bot as well.
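A minimal sketch of that intent-based setup, with a fallback that logs "missed intents" for later review. The intents, example phrases, similarity threshold, and TF-IDF matching here are illustrative stand-ins; real platforms use trained intent classifiers rather than this toy matching.

```python
# A toy intent-based bot: match the utterance to known intents, fall back and
# log a "missed intent" when nothing matches well enough (escalate to a human).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

INTENTS = {
    "return_item":  (["I want to return this", "this is too big", "refund please"],
                     "Sure, I can start a return for you."),
    "order_status": (["where is my order", "track my package"],
                     "Let me look up your order status."),
}

examples, labels = [], []
for name, (phrases, _) in INTENTS.items():
    examples.extend(phrases)
    labels.extend([name] * len(phrases))

vectorizer = TfidfVectorizer().fit(examples)
example_vecs = vectorizer.transform(examples)
missed_intents = []  # utterances the bot couldn't map to any intent

def respond(utterance: str, threshold: float = 0.3) -> str:
    sims = cosine_similarity(vectorizer.transform([utterance]), example_vecs)[0]
    best = sims.argmax()
    if sims[best] < threshold:
        missed_intents.append(utterance)   # candidate for a *new* intent
        return "I'm sorry, I'm confused."  # escalate to a human agent
    return INTENTS[labels[best]][1]

print(respond("I'd like to return a jacket that's too small"))
print(respond("do you price match competitors?"))   # likely a missed intent
print(missed_intents)
```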
That's super interesting because
observability in the machine learning context right usually is to help you build a better model
right in terms of you have better labels and you clean it up but then in this case it can be like
very different things right like a large language model to something that's like handcrafted by
someone like sitting on the product side trying to just figure out hey these are like the branches you want to draw is that like
something that you guys experienced? How do you go about bridging between sort of all those different, like, types or levels of complexity, I guess?
Yeah, okay, I think I get that. So you said a few things. One, you mentioned, you know, using LLMs in bots, and that is a hot topic. So right now, I think where most of the industry is at is not letting a generative bot loose on their customers. That's something that people are quite apprehensive of. It's also something that we, you know, are moving forward towards, but like, you have to set up a lot of guardrails and kind of
like, make sure that you have control over the responses, you're not giving misinformation
or anything like that. So as we as an industry move more towards generative bots, the need for observability is even greater. Because, like, you probably don't want to release your new generative bot on all of your customers.
You want to start, you know, A-B testing with a small group.
You want to observe what the, and record what those questions are and what those bot responses
are so that you can continuously build confidence in what you've built and then, you know, expand
it for more customers.
In this case, when you're looking at a bunch of this data
coming from different models,
and let's say in this specific case,
it's just generative model on some slice of customers,
other chatbots on other slice of customers.
You mentioned you do unsupervised learning.
I don't understand much about it,
but what I understand is labeling data is helpful
to recognize patterns and train your models. Is that something that's done in this space, or is it all, like, let the data figure out what makes sense?
A lot of what we do is much more founded in traditional natural language understanding. When we analyze
conversations, we do some supervised learning. So we categorize the conversation, the different
components. And by categorizing the utterances, that helps us simplify what a conversation kind
of looks like so we can aggregate. And in that case, we do supervised learning. So we do
labeling. We do labeling internally and work with a team of labelers to make sure that we get these
labels correct. And then the unsupervised part that kind of, unsupervised, it's like clustering.
Let's say you've got a bunch of utterances and you embed them using some embedding model like BERT. And
then you're trying to like figure out what are these different groups. And it's always been
a fuzzy process. You have different domains, you have different data sets. So you can,
you can kind of get an idea of like what is a good parameter space for you, but there's very much a human in the loop component of it.
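For concreteness, a rough sketch of that embed-then-cluster step: encode utterances with a small BERT-style model and group them so a human can review and name each candidate intent. The model name, utterances, and cluster count are illustrative assumptions, not CueIn's actual pipeline.

```python
# Embed utterances and cluster them to surface candidate "new intents".
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "do you price match competitors?",
    "can you match the price I saw elsewhere?",
    "my discount code isn't working",
    "the promo code gives an error at checkout",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # a small BERT-style encoder
embeddings = model.encode(utterances, normalize_embeddings=True)

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for label, text in sorted(zip(clusters, utterances)):
    print(label, text)   # a human then reviews each group and names the intent
```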
And when we do that human in the loop component,
I mean, it's kind of fun.
Like you iterate locally, you try different parameters,
but ultimately like I really try to make sure
that I get the final result up in the UI,
not just like looking at spreadsheets, but like, I want to see the
decisions that I'm making with the machine learning models. I want to see how this impacts
what the customer is actually seeing and then like what value the customer is getting from it.
So I think that in general is always important to have that human in the loop because we also
get feedback from our customers too. Like, we're a really agile team, so like, we get feedback from them, we change it right away.
Um, so yeah, especially when you're working in the unsupervised space, you'll see there's always going to be a human in the loop somewhere. And that's also something we're working with companies on, where we're like proposing new tags or new labels, but they have a human in the loop that's like approving them or slightly modifying them, so that it's, you know, kind
of more consistent with the current taxonomy they have, or something.
And the feedback loop here is that you made the changes to the system for the customer, their CSAT or NPS goes up, and that tells you the system's working?
Or is there a different feedback loop? Yeah, that's like the big loop. That's the big slow
loop. The faster loop is taking their data and they say, okay, here's all of our conversations.
And then we wave our machine learning magic wand and we put it up in the UI and it's all
organized in a way
that's supposed to be actionable and insightful for them and they tell us what's useful like what
looks right uh what is useful what they'd like to see more of taking in all of the knowledge and
expertise that they have of their system and their business, um, and giving that feedback
so that we can adjust it accordingly is this uh online or is it like offline sort of batches like
they give you guys the data and then you guys do sort of the analytics or this is all happening
within the platform kind of real time uh right now most of it is a batch meaning that like customers don't need to know like minute to minute what their
CSAT score is. Um, yeah, so we do a batch. You know, everything's productionized, in the sense of
like we have an automated incremental pipeline so once we've made all of our modeling decisions
you know and trained our models then the incremental pipeline is running on its own. We do have one
real-time feature, which is an Answers API. I think this is something a lot of people in the
space are familiar with. A company has a bunch of knowledge articles or like all of their knowledge
base, and they want to be able to just ask a question and then get a generated response from
that knowledge base. So, you know, this is this is i think the direction people are thinking of going with bots that's kind of
where you want to go with bots where you have your knowledge articles and as long as you're
updating them appropriately you want someone to ask any question from the knowledge article and
get a good response um so that's something we're doing right now. It's not being used for bots yet.
We're using it for agents.
So human agents,
especially a new agent
or really any of them
can just type in a question
and get an answer.
And that's a hosted API.
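A bare-bones sketch of that answers-over-a-knowledge-base flow: retrieve the most relevant article, then assemble a prompt that a hosted LLM would answer from. The articles, the embedding model, and stopping at the prompt (rather than calling an actual LLM) are simplifying assumptions for illustration.

```python
# Retrieve the best-matching knowledge article, then build the prompt an LLM
# would answer from. In the real service this prompt goes to a hosted model.
from sentence_transformers import SentenceTransformer, util

articles = {
    "returns": "Items can be returned within 30 days with proof of purchase.",
    "shipping": "Standard shipping takes 3-5 business days within the US.",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(list(articles.values()), convert_to_tensor=True)

def answer(question: str) -> str:
    q = encoder.encode(question, convert_to_tensor=True)
    best = util.cos_sim(q, doc_embeddings).argmax().item()   # closest article
    context = list(articles.values())[best]
    prompt = f"Answer using only this article:\n{context}\n\nQuestion: {question}\nAnswer:"
    return prompt

print(answer("How long do I have to send something back?"))
```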
I see.
Like tying back to earlier
where you said LLMs are
way harder to implement
as a chatbot
for specific use cases for companies
because of
the guardrails that you need. Like, do you see a parallel between, like, now, even though the ideal would be something that's very flexible, that's LLM-based, when you are looking at it in practice, it's all very much decision trees? Like, comparing that to back in 2008, or like when Hadoop and big data
first became a thing where like everybody's like talking about big data just like now right
everybody's talking about LLMs, but then you look under the hood, like, nobody's actually doing it. And I was wondering, do you see some parallels? Back then maybe it was like the complexity of infrastructure setup that was sort of preventing people. What do you think it is now that kind of prevents LLM adoption in, like, chatbots? Is it just, like, guardrails?
Yeah, that's
a good question i mean that's always the case it's so funny that you mentioned big data because
nobody uses that term anymore. It's just data. It's all big.
It's all big. Yeah, that's always the case.
You know, there's some new technology, you know, it used to come from research.
Now maybe it came from a company and it does something that people weren't able to do before.
But there are some kinks to be worked out and some hesitation and caution. And the more like a company has to lose, the more
risk averse they're going to be, and the slower they're going to be, the more time they're going to
take and the more guardrails they're going to set up. I think, you know, in the LLM space,
it's just going to keep moving forward. Like I would be really surprised if in 10 years,
like everyone wasn't using
generative bots i'd be like that's weird what happened um yeah so i think it's just a matter
of taking time and building trust like you know with any new technology you have to build that
trust. You have to understand how it works. And if you don't understand it, at least
see that it performs consistently enough
where you're like, it's like physics, it just works.
And so there is a case of using generative LLMs when it comes to the chatbot itself. You also see a lot of use cases these days, like people putting a bunch of small APIs on top of something like ChatGPT or other models, where you upload an article or transcript of a conversation and we'll summarize it for you, or you can ask questions back and forth. Do you see opportunities of using LLMs in the observability side itself? Where you have all this data, you have all these transcripts, would you have an LLM summarize it and then work off of that? Or is that
just like no don't do that
because you can't trust the system enough
to rely on it for observability.
Oh, we totally do.
We totally use LLMs, and we use them all the time.
We host our own, you know,
we're fortunate enough to have,
I was going to say we're fortunate enough
to have some amazing data scientists,
but I realized being one of the three of those that like...
You can still say that.
Kevin did most of the work on the recent LLMs. Kevin Moore, and then Jasmine Wong, who's one of our newer hires, has been working on our LLMs. But yeah, we fine-tune our own LLMs and host them. Different companies have different policies on where they want their data to go. Like, some companies are okay with using OpenAI endpoints for, like, exploration
and labeling. For me, I'm always worried about some other company hosting a live service,
especially like, I get a lot of emails from OpenAI saying something's down or degraded. Like, I don't
want i don't want my customers to have to suffer from that but yeah we totally use it um i think
they're really great in, like, early exploration. And like you said, summarizing. Like, by summarizing text data, it kind of works as a normalization, where it's making the language more similar, so that makes it easier to cluster. And guardrails have to be in place. In general, what we find is hallucinations, or what we call factual nonsense, which is a hallucination that sounds very reasonable, I guess you'd call it.
I forgot who said it, but it's like, LLMs are bullshitters, or professional bullshitters of sorts.
Professional bullshitters, yeah. You gotta put your guardrails in place, and like, it depends on
what the the problem space is you're working on i think as soon as you've refined your problem
space and you're working on one specific thing, it's pretty quick to build like confidence and like the responses
are being generated and catching where there are errors and how you can fix that, which is usually
providing more information or like better input data. What do guardrails look like in this case?
Because it's not, it's just internal, right? So you don't have to worry about prompt injection and things like that.
So what do you have to do for the guardrails?
That's a lot of like, you have a script, you've got your prompts, you're generating results,
and then you look at the results.
Like there's very much like a data scientist in the loop kind of reviewing things.
You know, of course, you're writing scripts, so you're like grouping things and you're
like, oh, that looks weird.
And you dig into it.
It was pretty easy to build like a hallucination detection, like classification model for one
use case I was working on where it was just like a very specific use case.
And it was really obvious in the response that
there were hallucinations happening. So it was, you know, I got like 100 examples together and trained like a simple classifier for hallucination detection. But it's easier to do the more constrained
your problem space is. Yeah, I think, just really nodding back to observability: if you have observability, that is your guardrail, where you can identify really quickly when something stands out, when something's an anomaly, and then investigate how to fix that, just like any bug.
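A minimal sketch of that "simple hallucination classifier" idea: roughly a hundred labeled responses, lightweight features, a linear model. The example texts and labels here are invented, and a real setup would use many more examples and proper evaluation.

```python
# Train a small classifier to flag "factual nonsense" in bot responses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

responses = [
    "Your order #4411 ships tomorrow via our partner warehouse on Mars.",   # hallucination
    "Your order has shipped; tracking was sent to your email.",
    "We offer a 90-year warranty on all socks.",                            # hallucination
    "Returns are accepted within 30 days with a receipt.",
    # ... roughly 100 labeled examples in practice
]
labels = [1, 0, 1, 0]   # 1 = hallucination, 0 = grounded

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(responses, labels)
print(clf.predict(["Your refund was approved by the CEO personally this morning."]))
```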
And can you share with us a little bit on the hardware side? Because recently I read something that there are, like, modern LLMs that are compact enough that you can just run on a single video-gaming GPU.
What do you guys do?
We use AWS, like most startups.
Yeah, we have an AWS stack.
So we just spin up whatever we'd like.
Depending on the problem, the harder the problem, the more parameters you need,
like the bigger GPU you need, the more money you pay.
So like try to use the small models for the small problems
and then big ones for the big ones.
This was a discussion we were having internally at one point.
Many teams target to train X million, 100 billion,
whatever number of parameter models
and come up with numbers that keep growing.
I mean, you keep seeing with all these new models coming out,
oh, now we hit this new target.
Oh, by the way, now there's a new target.
How does one go about evaluating that there is enough ROI
and that for the business?
Like, yes, the model is better.
How do you know if it's worth it
for all the money you paid for the hardware?
Yeah.
You always start small, right? I think the bigger models, you know, generalize better; they cover more use cases.
If you're thinking about constraints, like, you ask yourself: is it cheaper to train, like, five different 7-billion-parameter models, versus one 100-billion or trillion-parameter model? The bigger one is going to be a lot more expensive.
Especially for us, where we have a lot of different internal machine learning use cases: you build the model for the use case and make it as light as you can without losing out on performance. And there's a lot of improvements and just like deltas on like quantization
and different adapters that you can use as well.
So you don't have to retrain everything.
And there's been a lot of focus in that space as everyone's trying to get
like the memory footprint down for these huge models.
As you build observability into these models, or in general as chatbots improve, people start using more LLMs, and they keep getting better, the number of human agents you have in the loop technically goes down. Essentially what I'm getting at is, many people have this fear that AI will result in many people losing jobs. What's your take on this, as someone who is working in this space?
It's true.
But people don't want jobs.
They want money to live.
This is something I empathize with, you know, especially now, like economy has gone to crap and it's really hard for people to live. Like I have friends and family members whose jobs have been automated away, you know, and it sucks. They don't miss talking on the phone
to angry people all day. They miss like having a stable paycheck. So I actually worked at a call
center in college doing customer service for UPS. And like, on one hand, it was the best job I had.
I made $9 an hour, which was a lot of money back then because I'm old. And, you know, the people
were nice. The managers were nice. You got breaks. But like, it was so miserable. I dreaded every
day because people were so mean. And I had different categories
for the different types of ways that people were going to be mean. Men tend to just yell at you
and then apologize, whereas women would kind of get more personal until you cried. It was terrible.
So now it's funny
because a large part of my job
is like reading through
these customer support transcripts
and like I don't feel bad
about automating this job.
People could do better things
with their time
instead of being yelled at.
They really can.
Yeah.
And I don't think people are going to miss these jobs. I think people want, you know, they want stability so that they can live their lives.
so like a part of it feels like oh maybe that's a cop-out maybe i should take more responsibility
but also like i'm not the government i feel like that's their job they're taking our money. They should help us be stable. They know we're automating
things. Yeah, it's always going to be the case, right? Industrial revolution, people left the
fields. It'll just keep happening. Makes sense. I agree with that perspective. On similar lines,
as AI or these LLMs and other sorts of models help us become more efficient and automate things.
What's your take on using LLMs for data labeling itself? Because if LLMs are able to understand
data, could they also label it? Yeah, they can. Is that reliable or would that be reliable?
I would say anything that you're doing in an automated fashion, you need observability.
So like, is it, can you just
do it out of the box? Yeah, but you're not going to get very good labels. Yeah, we've definitely
tried that, like doing labels like, oh, let's see, let's see how these LLMs handle it. And
then we're like, not so great. So actually a big focus of ours is like needing high quality labels that do require a human and usually like lots of iterations
of human like initial human labeling and then it gets handed off to a data scientist like me where
I review it for consistency you could train a model on your existing labels and then look at
variability across different model trainings and try to
understand like how consistent your labels are and then go back to the humans and say,
okay, these are the ones, this is how we need to change our rubric. This is how we need to
change our process so that we can get more consistent labels. Yeah. So it's iterative.
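One way that iteration can look in code: train on your own labels with cross-validation and flag the examples the model keeps disagreeing with, then send those back to the labelers or the rubric. The texts, labels, and model choice below are toy assumptions, not CueIn's pipeline.

```python
# Check label consistency by training on your own labels and flagging disagreements.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

texts  = ["refund please", "where is my package", "money back now", "track my order"]
labels = np.array([0, 1, 0, 1])            # 0 = refund, 1 = order status (toy labels)

X = TfidfVectorizer().fit_transform(texts)
preds = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=2)
for t, y, p in zip(texts, labels, preds):
    if y != p:
        print("review this label:", t)      # send back to the human labelers / rubric
```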
So in 2017, you wrote a post about the future of commercial AI, where you highlighted, like, the key themes at the time. Since we're talking about LLMs and AI, if you were to do the same for today, what would it look like?
Interesting. So, one, that makes me feel really old, because that was six years ago and it feels like it was just yesterday. Yeah, it was the AI Frontiers 2017.
And that was a really cool conference.
It really pulled in a lot of people
from just a lot of different big companies
and small companies.
You had your big topics,
which at the time was autonomous vehicles.
And then also speech recognition.
I like almost forgot that was like a big deal at the time.
Like we can use deep learning for speech recognition.
Like voice to text was not a given.
Now we just kind of like take it for granted.
And like, yeah, maybe there are some nuance problems with specific systems.
But, you know, you can get decent speech-to-text out of the box,
especially if your audio quality is really good. What else were people talking about? Oh,
yeah, this was back when we were talking about, when I say we, I mean like we as a field,
moving from traditional feature engineering to just deep learning, just having a deep learning model handle
it. And guess what? Deep learning won. Unless you have a really specific use case that you need
to hand engineer features because you want to control what those features are so that when
you provide a prediction, you want to know exactly what those features were for the prediction.
That would be a use case for hand feature
engineering. But for the most part, we just let deep learning handle it.
But to caveat that, it's more for, like, big-scale problems at, like, big companies, right? Like, if you were to start something small, like, you still
want to, like, handcraft your features, right?
Not necessarily.
Why not?
Just, like, I feel like now it's just, like, throwing out a big model, see
how it does if you don't like the results then dig into them and like try something more traditional
what you're saying sounds promising because the noobs like me can do machine learning like throw
stuff at the wall and see what sticks.
Yeah, see what sticks. You still have to dig in. It'll still give you nonsense if you're not careful. I feel like that process has
changed where it used to be like starting from the bottom and building something up where now
it's just like it's cheap and affordable and hugging face is awesome and you can just go on
there and get a model and see how well it does on your data and then
if there are shortcomings then you ask yourself like do i need to fine-tune do i need to dig into
the architecture. But most of the time the issues are on the data side, you know, like having enough data and having really good labels. Where, like, unless you really have a specific use case, we're just really moving away from that traditional machine learning. And there are specific use cases that people have that we do want to use traditional machine learning for, but less and
less these days like so to put it very concretely like say um i'm finding this super
interesting because i'm like starting to get like we were talking about so say you're doing like
fraud detection right like and like in finance and then you have these sort of uh i don't know
like 200 000 uh records of like based on like the different transactions that come through
and then you have a set of labels of saying fraud or not fraud. So you're saying like, basically, we can get like, say like, even like an LLM, right?
And then just feed it through basically turning feature engineering to like prompt engineering.
And then going to like, right.
And then it's like, hey, this is like the data that you're working with.
So then whenever there's a new data point, and then you literally just, you know, add it again, transform it to the problem
and asking the model again, it's like, hey,
do you think this is a fraud or not? Like, that's
like basically what you're describing, right?
Yeah, something you could do. You know,
if you're trying to just do a classification
and you might want to go with
just an encoder model,
the decoder models that do generative
are kind of more heavy.
So, yeah, you probably want to try a few different models,
but I think usually that is a really good place to start.
And you can iterate quickly.
And like, even if it's not the final model that you use,
you can get insight pretty quickly.
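A rough sketch of the encoder-model route for a classification problem like the fraud example: serialize each record to text, embed it with a small encoder, and fit a lightweight classifier on top. The records, labels, and model name are invented for illustration; this is one quick way to iterate, not a recommended fraud system.

```python
# Serialize each record to text, embed with an encoder, fit a light classifier.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

records = [
    {"amount": 12.40,   "country": "US", "hour": 14, "merchant": "coffee shop"},
    {"amount": 9800.00, "country": "US", "hour": 3,  "merchant": "gift cards"},
    {"amount": 55.00,   "country": "CA", "hour": 19, "merchant": "grocery"},
    {"amount": 7200.00, "country": "RO", "hour": 4,  "merchant": "wire transfer"},
]
labels = [0, 1, 0, 1]   # 1 = fraud, 0 = legitimate (toy labels)

def to_text(r):   # "feature engineering" becomes writing the record as a sentence
    return f"{r['merchant']} purchase of ${r['amount']} from {r['country']} at hour {r['hour']}"

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # an encoder, not a generative decoder
X = encoder.encode([to_text(r) for r in records])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
new = {"amount": 8400.00, "country": "US", "hour": 2, "merchant": "gift cards"}
print(clf.predict(encoder.encode([to_text(new)])))   # quick signal before fine-tuning anything
```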
Interesting, interesting.
And yeah, like you don't need a ton of data to fine-tune
uh especially now that they have like adapters that are out there
where you don't have to like tune all the parameters of the model you just have like
a small subset of parameters that you're tuning for your use case.
I see. In that regard, almost, like, it is, for, like, in terms of automation, right? Like
it is automating even like some of the data science jobs right because it's like you can turn
feature engineering into like prompt engineering well i guess another way of putting it though is
that it's kind of democratizing right like the power that it harnesses right so like more and
more people have access to it, instead of, right, like, you having to understand all these sorts of things. But
then i guess the flip side of it is a lot of things can go wrong and you don't really understand how
things work and yeah yeah i see yeah you're always gonna want that specialist there when things don't
work who absolutely goes in there and like can really really tear all the wires out and figure out what's wrong.
But yeah, I've been constantly aware of the irony that my entire career has been focused on automating my job away.
Versus Salesforce, doing automated machine learning, we were automating everything that a data scientist could do. So feature engineering, sanity checking, everything that goes into, like, training a model and hosting it for predictions was completely automated. And now at CueIn, like, my job as a data scientist continues to
shift and turn more and more into something where like even i get to experience the job uncertainty like wait
a minute is my boss gonna replace me with a model as long as they hallucinate they won't
Yeah, just dumping LSD into the model being like, do the work.
But no, there's always, I think, gonna be, you always want a human in the loop. And that
is something i've been discovering more now working with companies and having to spend so
much time on really getting good labels, and also being able to translate what the customer actually wants into labels and into a model. And that's even where I feel like my career is going. It's kind of
going more towards product
and kind of being the human
that's translating the other human
to the machine.
Which is cool, right? We don't want to do
engineering for the sake of engineering, right?
We want to do engineering for the sake of building
better products.
Yeah, it depends.
I don't know if people like to engineer for the sake of engineering.
I take that back.
This is an engineering podcast.
I take that back.
So you mentioned a couple of terms, and I've heard about those terms.
I think I know them, but I'm pretty sure I don't.
And I would love to get to know them from you.
So you said something about few-shot learning. What does that mean?
Shot means that you're just putting examples in your prompt.
I see. So you would, let's say, I'm googling it, I'm like, could I just... brain fart, wait, let me make sure. Yeah, yeah, you put it in the prompt. Yeah. Can you give an example of this?
So let's say I have a recording of,
it's just a transcript of the conversation that we've been having.
And I want to know when I told a joke that was funny.
Wow, this is all of a sudden very relevant.
Please continue, Melissa. so i want to know yeah
when I said stuff like that was funny. Um, so I take the entire transcript and I put in the prompt, like, when did Mel make a joke that was really funny and they weren't just laughing to be nice. And so, like, if you ask that, the
model will kind of like do okay but it'd be even better if i gave it some examples
so i would say here are some examples and you know i can provide the text of like when this
is a positive example and it really was a joke. And this was a negative example where people were just uncomfortable and these were uncomfortable
ha-has, you know, and so you just, you just put your labels there in the prompt and, you
know, you're restricted by the prompt size.
So prompts can't be really big.
So you want to kind of try to be concise as much as you can.
And that's few shot.
It works.
Oh, I, you know,
I wouldn't build an entire product on it.
It's a starting point,
but it's not always great.
You usually want to iterate.
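Here's a toy version of that few-shot setup in code: the labeled examples go straight into the prompt, and the whole string is what gets sent to the model. The snippets, labels, and the placeholder for the new transcript chunk are all made up for illustration.

```python
# Build a few-shot prompt: labeled examples go directly into the prompt text.
examples = [
    ("Guang: ...and that's why I still put MATLAB on my resume.\nAll: [laughter]", "funny"),
    ("Melissa: ...so the pipeline runs incrementally.\nGuang: ha, right.",          "polite laugh"),
]

question = "When did Mel make a joke that was really funny, not just polite laughter?"

prompt_parts = ["You label podcast transcript snippets as 'funny' or 'polite laugh'.\n"]
for snippet, label in examples:                       # the "shots"
    prompt_parts.append(f"Snippet:\n{snippet}\nLabel: {label}\n")
prompt_parts.append(f"Now answer: {question}\nSnippet:\n{{new_transcript_chunk}}\nLabel:")

prompt = "\n".join(prompt_parts)
print(prompt)   # this string is what gets sent to the LLM; keep it under the context limit
```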
What does iteration look like
after something like this?
I mean, iteration largely
is building up your labels
and going from, you know,
like Fewshot to actual fine tuning. And fine tuning from, you know, like hue shot to actual fine tuning. And fine tuning is,
you know, our traditional machine learning where you have your input and your desired output.
And you train the model on lots of examples of those input output pairings. And then eventually
the model figures out that relationship. So that's fine tuning and you need more data for that. But
that's going to give you obviously much, much better consistent performance.
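And a sketch of what fine-tuning with an adapter can look like, using the Hugging Face peft library: only a small set of LoRA parameters gets trained on your input-output pairs. The base model and hyperparameters below are illustrative choices, not what CueIn actually uses.

```python
# Configure a LoRA adapter so only a small subset of parameters is trained.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
lora = LoraConfig(
    task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],   # DistilBERT attention projections
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()   # a tiny fraction of the full model
# From here you train as usual on (input, label) pairs, e.g. with transformers.Trainer.
```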
And I have a couple more new questions. Thanks for being so patient.
So you mentioned, you mentioned Hugging Face. How do you use Hugging Face yourselves?
Um, it's a great place to just really quickly scope out
the effort that will be involved in delivering a new product or a new aspect of the product
How so?
Yeah, so let's say somebody... I actually did this back in the day, before LLMs were like really big.
I was trying to figure out how to extract root causes from conversations.
And root cause is like a really specific kind of like concept.
Like you want to know not just like what the customer wants, but like the why, like the underlying thing that happens.
You know, and this is very, like... obviously LLMs are great at this now, but at the time they weren't really available. So I just went on Hugging Face and I got some QA models, and I felt so clever, because I was like, we just need a QA model. We can ask a question and then we get the answer.
And then I do like a bunch of like data processing and data cleaning and kind of move some things around.
And then, you know, there we go.
There we have our root causes.
So that was an example where I just pulled a QA model, a question answer model.
And that was like a fun way to.
It's funny because it was before prompt engineering, and that's kind of what I was trying to do. It's like, I didn't have time to, like, label a thousand
conversations or train people to label a thousand conversations. So yeah, I tried the QA model.
And that comes up a lot where like somebody, they want to know, oh, well, let's look at the
sentiment. You know, we're looking at CSAT, but I want to see how the sentiment of the user changes over time and how that's related to their ultimate customer satisfaction. So let's pull a sentiment model off of Hugging Face and, like,
evaluate how well it performs and see if it can give us insight. And if we like it, then we can take it, pull it,
set it up, make any changes or fine tuning that we want and use it in our pipeline.
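A quick sketch of that pull-a-model-off-Hugging-Face workflow with the transformers pipeline API, first for question answering and then for sentiment. The model names and the sample conversation are illustrative defaults; you'd evaluate on your own data before putting anything into a production pipeline.

```python
# Scope out a problem quickly with off-the-shelf Hugging Face models.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
conversation = ("Customer: my tracker stopped syncing after the last app update, "
                "so I want a refund. Agent: sorry about that, I can help.")
print(qa(question="Why does the customer want a refund?", context=conversation))

sentiment = pipeline("sentiment-analysis")   # default English sentiment model
print(sentiment(["my tracker stopped syncing", "thanks, that fixed it!"]))
```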
So Hugging Face is more like a repository of models that you can pick from with some indication
about what the model does. I see. When I first looked at Hugging Face, I was a little confused
about what the product was. I hadn't seen it until maybe early this year.
And it had like a hosting API.
It had this repository of models.
It has like a code space, some leaderboards.
And I'm like, I'm super confused as to what this thing is.
I should ask someone who uses it to know what it actually is.
Yeah, definitely a repository of models.
Just like a really great knowledge base. You know, what's funny is, now that I'm thinking about it, Hugging Face is almost like turning into what OpenAI wanted to be when they started out. Like, do you guys remember when OpenAI was open and they were facilitating, like, learning about AI? Democratizing AI was like a hot term in the early 2010s. And they had a reinforcement learning playground where people could test their different reinforcement learning algorithms and share them. And it was really cool. I think they did a lot more, like, outreach and communications and talks and stuff.
And then, you know, things happen.
Capitalism happens, COVID happened.
And, you know, then they started kind of focusing
on these, you know, privatized use cases.
And, you know, now we have Hugging Face,
which it's not like a reinforcement learning playground. I think it's even more um you know now we have hugging face which it's it's not like a reinforcement
learning playground i think it's even more you know power for that powerful than that because
it's just like lots of machine learning models that you get to try out real quick and iterate and
you know the license is there you know who wrote it you know you can give homage to the authors
It's like GitHub for ML models, of sorts.
Absolutely.
yeah with some fun and bells and whistles yes a lot of bells and whistles right now that i think
of it, OpenAI had, like, the Dota competition thing, right? Like, they were doing, like, RL
with like different video games and trying to right right right wow it, it's very different now. You're right.
Yeah.
By the way, for folks who are
new like myself and who
maybe want to learn more or
play around with some of these models,
do you have any recommendations for a place to start?
Like, hey, go do this course
or something,
some posts or blogs that you came across
which are helpful?
I think that I'm going to give the advice that I think is pretty canon,
or a lot of people give, is just start getting your hands wet right away.
I think courses can be helpful for people,
especially if they don't have engineering experience.
It's hard to get off the ground when you like can't install Python.
I think it's very frustrating,
but like if you have like some like basic engineering skills,
I really recommend just getting your hands wet.
People traditionally for data science,
people used to say,
go to Kaggle,
Kaggle,
like host data sets and competitions.
And it was a place where you can go to get data to play around
with um but now honestly i recommend that people check out upwork so upwork is a company where
like companies can hire people as contractors for different tasks and one of those tasks can be data
analysis machine learning so upwork i think is a good place to hone a new skill. And like you're
motivated because like companies will post different jobs. So you can see like what a
company wants to do. And you can do whatever like their take home interview, see how well you're
doing and see what like people are actually hiring for. And this is what you used to only be able to
do by like going through the grueling process of like applying to a bunch
of companies and hoping that they like this is what before i started insight this is kind of how
i got started doing real data science is like getting a job interview at a company like uber
facebook and failing miserably and then going back and resolving that problem and being like okay now i can do this so
like now you can do it through upwork and like you can you know i i think it's less soul crushing
maybe than having to like get all excited about working for a big company and realizing that
you're not there yet yeah that's good advice relatable yeah cool
um
so we want to be
respectful of your time
sorry that we're already
going over Melissa
but
um
maybe the
the last question
um
so like
suppose you had the power
to send a tweet
that everyone
will see
what would your
280 character message say
Stop using Twitter.
Damn.
You got us there.
We need a better question.
Yeah. Oh man.
Like we, well, to stay
on theme with the LLM, you know, we actually
use the LLM to come up with a question
like this, you know, that sounds somewhat
reasonable, and I guess in
also a very LLM fashion
that, you know, it could lead to
hilarious failures.
Anything else that you'd like to share
with our listeners before we close up?
Um, no.
Be kind. Kindness is
free. Kind to each other.
Kind to your bots.
Some people think that one day they will rule
the earth and they have all the data.
So be polite.
Awesome. Thanks so much, Melissa.
Thanks for reminding us.
Yeah, it was a lot of fun. Thanks for having me.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts
and learn more about us at softwaremisadventures.com.
You can also write to us at hello at softwaremisadventures.com.
We would love to hear from you.
Until next time, take care.