Software Huddle - Practical AI for LLMs with Emanuel Lacić
Episode Date: May 22, 2024
Today we have Emanuel Lacić on the show. He was in academia for a while. Now he's been working at Infobip for the last couple of years, building some of this AI stuff and putting it into production. We picked his brain about the best practices when it comes to AI and what we can expect to see over the next couple of years.
Transcript
Hey, folks, this is Alex. And today we have Emanuel Lacić on the show.
Emanuel has a super interesting background in AI, LLMs, all that sort of stuff.
He was in academia for a while, so went really deep on this stuff.
Now he's been in industry. He's been working at Infobip for the last couple of years,
building some of this AI stuff and putting it into production.
So Sean and I were able to catch him live at Infobip Shift.
And he just told us, you know, what are some of the best practices,
what we can expect to see over the next couple of years and things like that. Really interesting conversation. As always,
if you have any comments, suggestions, guests, anything like that, feel free to reach out to me
or Sean. With that, let's get to the show. Emanuel, welcome to the show.
Thanks for having me.
Yeah, I'm super excited to have you here because we're at Infobip Shift. You're talking about
chatbots and LLMs and all that stuff,
which is a very hot topic right now, but you have not only some practical experience,
but a lot of deep research background on that, so I'm excited about that.
Could you maybe introduce yourself and tell us a little bit about your background?
Sure, sure, yeah. I started as an engineer. I was already working while in college, and my first job was also a normal software engineering role. Then at some point I said, okay, I really want to try out a bit of science and research, because some of the research topics are really interesting. So I went to Austria to work at the University of Graz. There I started my PhD, then moved to a company
doing applied research, and this is what really excites me. A lot of people who just go into research work on science stuff and say, okay, this is a theoretical thing, I don't see the application or where it ends up. This was a really nice mix, I would say, because on the one hand I had the opportunity with my team to do state-of-the-art research, publish at top-tier conferences, and work together with the research community, and at the same time to work with different companies in the industry: hey, you're the experts in this field of AI, can you work with us? Let's build something cool, deploy it, and really see how it performs, whether it's helping their users, and how we can improve on that. And usually when you combine those kinds of things, you see a bunch of different problems.
Yeah, that's where the magic is though, making it real for a lot of folks.
That's awesome.
Can you tell us about what your research area is sort of focused on?
I specialize in recommender systems. These are basically the algorithms behind how Amazon decides what products to show you, how YouTube decides what content to show you, how Spotify generates whole playlists. The applications are really far and wide. Usually, whenever you have some kind of user interacting with some kind of content, that's basically it. It's a subfield of information retrieval, where you're searching for stuff and trying to show it to the user before they even explicitly request it, basically helping them with the burden of information overload, because I don't know how many videos are on Instagram, like Reels. There's no way I'm going to search through everything.
Are those recommender systems based on essentially building some sort of model representation of the user, so we can say, my model kind of looks like your model, we know what you did, and we can essentially predict what this user is going to do based on that? Or are they often based on similarity of the content, where this content is related to that other content, and we've seen people bridge the gap between those things, so we're going to essentially recommend that next?
It's both of those combined together, and on top of that, different
kinds of hybrid, specialized models. The recommender systems research community is, I would say, highly collaborative with the industry, so there are a lot of different outcomes out there. You have specialized models for modeling the user perspective and behavior with neural networks or other methods. Then, what do you do when you have a lot of user data from logged-in users, and what do you do if you just have anonymous user sessions?
Those kinds of things. Then it depends on what you're trying to achieve. For instance, think about the human resources domain: if you're matching candidates with job opportunities, maybe you need to ensure fairness, that you're not discriminating against some sensitive attributes. You also have to consider, let's say, the perspective of the content as well as the user perspective, and there are also seasonal factors. Usually it's a combination of multiple models in a huge pipeline, and it changes over time. So what works today doesn't mean it's going to work in a few months. So there's constant innovation in that field.
Yeah, and historically I feel like a lot of AI innovation has been driven by the research community. But now, especially with LLMs and the cost of GPUs and running this stuff at scale, is a lot of the innovation going to be driven by industry, or is it sort of a combination of the two?
Actually, I would say, I mean, this is the most fascinating thing: it has become so much easier for anybody to try new stuff out. So I would say this is exactly the case; a lot of innovations, now more of them, come from the industry. I think it's always a collaboration of both, but I would say the main way to achieve innovation is to scale it up to as many people as possible. It doesn't matter if you have a scientific background. You don't know, maybe there's somebody out there who now has the motivation, has got an idea, and will try it. I would say the best thing you can do is open it up and scale it up as much as possible.
Yeah, and someone with a different background might actually be able to connect the dots in a new, innovative way, where someone coming from more of a traditional theoretical computer science background is kind of locked into their way of thinking and not necessarily able to connect those dots.
I can actually give you an example. I mean, again, a bit from research, but it doesn't matter, it's maybe just a parallel. This cross-collaboration with people from different backgrounds, I think, is the most useful thing you can do. For instance, in my previous research team we collaborated with psychologists, so they know about psychological models from human memory theory, going back decades. Just by collaborating with them, you see, okay, maybe this is some kind of model that we can test out as an algorithm and adapt, and then out of that, for instance, we came up with a new algorithm to recommend stuff based on recency and frequency, on how the human brain basically forgets things. Usually there are many other applications, so I would say the best thing you can do is really collaborate with different kinds of backgrounds, because you're in your own filter bubble and usually don't see far outside of it.
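The recency-and-frequency forgetting model described here resembles the base-level activation equation from cognitive architectures like ACT-R, where an item's score is the log of a sum of power-law decays over its past accesses. A minimal sketch; the decay exponent and the toy listening history are illustrative assumptions, not the actual algorithm from his papers:

```python
import math

def base_level_activation(ages_hours, decay=0.5):
    """ACT-R-style base-level activation: items accessed often and
    recently score higher (log of summed power-law decays)."""
    return math.log(sum(t ** -decay for t in ages_hours))

# Hypothetical history: hours since each time a user played a track.
history = {
    "track_a": [1, 5, 30],    # played often and recently
    "track_b": [200, 300],    # played long ago
}
ranked = sorted(history, key=lambda i: base_level_activation(history[i]), reverse=True)
print(ranked)  # track_a ranks above track_b
```

The appeal of this family of models is that recency and frequency fall out of a single formula instead of being hand-weighted features.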
You mentioned a few different companies that use this stuff, like Netflix for videos, Spotify for music, Amazon or Etsy for products. Are they using pretty similar patterns, or are they going to be pretty different and distinct depending on the area?
So I would say at every larger company it's a huge pipeline of different stuff, and you have specific teams who focus on optimizing, let's say, certain KPIs. For instance, imagine a website, and this is a really simple example: you have the home page, and then you have a details page for a specific product. Usually there are different algorithms, or even different teams, working on the specific parts where you're going to show that content. And the same thing holds depending on the scenario or use case you're trying to improve. There are different things, of course; they share the same family of algorithms, but it's always domain-specific. There is no silver bullet, no one algorithm that's going to work on your example, because you have different kinds of data and different user behavior. The user behavior, I would say, is the most important thing here, because how I behave on Netflix is not the same as how I behave on YouTube or Instagram, where my attention span is much shorter. There it's, just give me something, now, now. So I would say user behavior drives it mostly, and also the content on the platform.
You mentioned how models sort of get out of date, how you have to always be updating your model. Is that because the baseline is shifting? Like, it was this good, but now everyone's getting that good, we need to get better. Or is it because the model is even shifting user behavior? What accounts for that?
One thing is really, I would say, that it's a fruitful community, so a lot of progress has been made over the past years, and it's still an ongoing thing, so out of that, people just keep trying new stuff. And then it also depends, as I said, on the user behavior.
If you have, let's say, seasonal patterns, then something that's going to work in winter, during Christmas, is not the same kind of algorithm. So maybe you're specializing the algorithms based on some kind of objective, let's say, in that timeframe.
Or another thing: you're changing something in your product, in the UI. So you're basically also changing the way users behave and how they perceive the content. Even small things about how you present something in the UI may mean that in the end you're going to change the algorithm, because after that UI change users perceive things differently. For instance, you have this position bias, an effect in human psychology where people are more likely to click on the first thing. When did you last go to the second page of Google results? Yeah, never. Yeah, exactly, that was really the answer I needed. It then depends on whether you're showing things in a list or a carousel. Netflix, for example, has these different carousels, one below the other, and if you look at visual representations from eye-tracking software, people usually focus on the first items and move down and to the right. Then if you go to infinite scrolling, there are again different kinds of behavior. And usually, if you have your own product, you're going to change things up. You want to stay relevant. You want to be better than the competition, even because of that.
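Position bias like this is commonly corrected with inverse-propensity scoring when training on click logs: each click is reweighted by the estimated probability that its rank was examined at all. A minimal sketch; the examination propensities and the click log are made-up illustrative numbers:

```python
# Clicks at lower ranks are rarer simply because users look there less often.
# Inverse-propensity scoring reweights each click by the estimated
# probability that its position was examined at all.
examine_prob = {1: 1.0, 2: 0.6, 3: 0.35, 4: 0.2}  # assumed propensities

click_log = [
    {"item": "a", "position": 1, "clicked": True},
    {"item": "b", "position": 3, "clicked": True},
    {"item": "a", "position": 2, "clicked": False},
]

def ips_weight(event):
    """Weight a clicked event by 1 / P(position examined)."""
    return 1.0 / examine_prob[event["position"]] if event["clicked"] else 0.0

weights = [ips_weight(e) for e in click_log]
print(weights)  # the rank-3 click counts roughly 2.9x as much as the rank-1 click
```

In practice the propensities themselves are estimated from randomized result orderings or an examination model rather than assumed.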
And then, let's say, one more thing: it's not only about what the underlying algorithm is; there's a specific field of how to explain what a black-box model does. Why did it show me that? And this is complementary to the baseline algorithm, so now you're stacking other things on top of that, and then it's a combination of everything.
Yep. Where are we at with that sort of thing, having these black-box models and getting some explanatory power on them? Are we getting better at understanding why the model is recommending what it is?
I would say this is one of the things that's really
getting much more traction, especially in Europe.
In Europe you have now this AI Act,
where the idea is really to try to bring more transparency
and fairness to those black boxes.
So, let's say, especially if you are applying any AI, it doesn't matter whether it's recommendation, in some high-risk domain, let's say health, or something really impacting people, you may be required to explain those decisions.
And a lot of things have been done, and they're gaining much more traction, especially, I would say, now with large language models and this generative AI trend. At least from what I saw in recent trends in the research community, many more people are working on it, and not only there.
So this is, I would say, one of the things that is also highly driven by the industry, because it has been shown that, even if you don't need to, if you explain to your users why they are getting something, it correlates strongly with the acceptance rate and their engagement. So you're actually inclined to do this. The more transparent you are, the more the user trusts your platform and the less likely they are to churn.
Yeah. So you brought up LLMs, and I know for me,
I would say they really came on the scene
with the release of ChatGPT.
I'd sort of heard about them before,
but I wasn't really paying attention.
I would say for a lot of engineers, that was true,
and definitely the general public.
For you, though, someone who was more steeped in this research, did ChatGPT seem like a big step change? Or was it more like, hey, I've seen GPT-1 and 2 and 3, and it was sort of a progression?
I was aware of those models at that time.
But I think this is a really nice example of collaboration between the industry and the science community. Because if you're really able to open it up at that scale... I would say with ChatGPT, when it came out, it was so easy: here, go to the website, use the API. You're able to enable so much new research. Let's put it in perspective: say you're a scientist working somewhere, and you don't have the necessary resources. How are you going to pay the hardware cost to host your own LLM, to train it for however many hours? I mean, it's really remarkable how many man-hours and how much money went into that so that, in the end, there's some kind of interface people can use. The biggest problem in science and research is usually reproducibility: basically, whenever somebody says, hey, we've done this, how can I now run it and use it? And here basically somebody has said, hey, everybody, use it. So this is, I would say, one of the driving factors.
Yeah, and I think one of the things that ChatGPT did which
was unique was that it became essentially the user interface for AI. Whereas before, and you had some familiarity with AI, I think it was kind of this thing that sounded like science fiction to non-technical people who weren't involved with it. And it was hard to explain what the things are that you can do with it. Then suddenly you created this super useful tool that you can point people to. It's like, here's an application of AI.
Yeah, I mean, this is, I would say, the combination. You can have the best AI model, whatever it does, but if you don't apply it in the right way, it's not going to be useful to anybody. So on their side, it was also a combination with the whole engineering department: how to make it user-friendly, how to actually make it usable. And usually it's collaboration, as you said: hey, go out of your filter bubble, work with multiple disciplines. I think that's the perfect example. And, for instance, one of the really good benefits that came out of that: now you have multiplying projects and GitHub frameworks to train your own LLM and fine-tune it. If you have some kind of user feedback from people, you may want to do some reinforcement learning, or there are now some new human-aware loss optimization functions where, if you have implicit feedback, you can tune your model even better. So now you have many more things which weren't available before.
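The preference-tuning losses alluded to here (RLHF-style methods and "human-aware" losses such as KTO, with DPO as a closely related example) all compare the policy's log-probabilities on preferred versus rejected responses against a frozen reference model. A toy sketch of the DPO loss on precomputed log-probabilities; all numbers are made up for illustration, and a real implementation would operate on batched token log-probs in a training framework:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the
    policy (pi_*) and under a frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# If the policy prefers the chosen answer more than the reference does,
# the loss drops below ln(2); if it prefers the rejected one, the loss grows.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # policy favors chosen: lower loss
print(dpo_loss(-14.0, -10.0, -12.0, -12.0))  # policy favors rejected: higher loss
```

Minimizing this pushes the policy to widen its preference margin over the reference without needing a separately trained reward model.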
Yeah, very cool. Okay, so this is a good time, I think, to shift to your talk, which is about whether prompting is enough. Can you just tell us maybe about your talk?
Yeah. So what I'm going to talk about, in general: with this rise of generative AI, people are making copilots. The most prominent kind of copilot, like GitHub Copilot, is going to generate code. Midjourney is also an example of a copilot; it generates images. Microsoft has now also started putting those features into Windows and Office. But the trend is that many software products are going to want a feature where the user just writes natural language, and that natural language should achieve some kind of action or do some kind of task.
Does ChatGPT fall under copilot, or is that just a different thing?
So copilot is, let's say, the general term for that kind of behavior. The user writes something, text, whatever: I want to do something. And now your domain-specific software, it doesn't need to be, let's say, GitHub, but your own software which usually does something, translates this into some kind of meaningful action. But the input is natural language from a user: I tell you what I want to do. And ChatGPT, let's say OpenAI and its services, can be used for that; you can just make some kind of, let's say, prompts. And what I'm going to go into in my talk is: okay, sure, you
would start with prompting and try to see whether you can transfer that natural language into some kind of code block or element that you can use in your own domain-specific solution. But at least in our case we had to fine-tune, because with prompting you can achieve only so much. So in my talk I'm going to go in this direction: okay, how are you even going to measure how well it's performing? Is it hallucinating, and what do hallucinations actually mean for your domain? How accurate are your predictions? And then, if you want to go beyond that, you can fine-tune either commercially available models, so GPT-3.5, or open-source models. This is what we also did at Infobip, and I'm going to show, okay, here is the process. This was our process, but usually it's similar; there are already some interesting research papers and blog posts that go over what you need to do. But usually it's domain-specific data; you have to set up the whole pipeline again to know, in your domain, in your business
scenario, what you want to achieve.
Yeah, I want to get into specifics there, but maybe before that: I feel like there's this growing trend where people are building more and more copilots, essentially some sort of free-form text input powered by LLMs. Do you think there's a danger, from a user experience perspective, with stuffing copilots into every piece of tooling? It's kind of like the era when social media first came out, and suddenly everything had to have a social component, whether it made sense or not. And if you give people free-form entry, sometimes that's too much choice, and it can actually lead to a bad user experience.
I think with the last part you nailed it perfectly. The one thing that looks like it's going to happen is that users are going to expect there to be some kind of functionality like this, but it's now your job to say, hey, look, here it doesn't make sense. Because you're right, you don't need those kinds of features at every step of the process; some things just don't make sense. Sometimes it's easier to do some different UI solution, or something in the backend. So I would say everybody, just because of the hype, will want to have some kind of feature like this. But over time it's going to settle down: okay, here it really makes sense, here it doesn't. Those are the use cases where you would usually like to have a copilot functionality, because we saw it really drives user engagement.
Yeah, cool.
So based on your experience building these chatbots, I want to walk through some of the questions I hear come up a lot and hear your experience with them. So maybe let's start with just choosing a model, right? I think OpenAI and the GPTs sort of kicked this off, but we've got all the Anthropic models, we've got a bunch of open-source models, we've got Llama 3 that just came out. In choosing a model, how much should I just be going with the standard OpenAI-type stuff?
How much should I be looking at other ones?
There is this principle I like: fail fast, fail often. So, whatever helps you to fail as fast as possible. And this is, I would say, one of the benefits. I mean, it's not that I'm doing a commercial, but if you have the necessary engineering expertise, you can just deploy some kind of model yourself. Even with Llama 3, this is now really easily available: if you have the necessary hardware infrastructure, just run it. Or you have smaller models. Whatever basically helps you to make a proof of concept as fast as possible, because I would say this is the most important thing; you don't want to spend, I don't know, several months on it. OpenAI makes it really easily accessible, but the same thing holds if you have the necessary engineering expertise to run your own model, or if you already have some kind of GPU or hosted instance. I mean, it's not hard to get, so it's possible. Just don't, I would say, over-engineer in the beginning. Try to see if you can come up with something useful, and then, based on the insights, think about, okay, good, now how can I improve it if it brings me value? And it can be that, okay, look, as you said, maybe it doesn't bring me value. Maybe this AI-assisted solution is not that useful. You have to test it out.
One of the trends that I've heard recently, and that I'm seeing from customers that I work with, is that a lot of people will start in that fail-fast mindset: let's start with OpenAI or some sort of public model that has an API interface, so I can get up and running quickly. But then they reach a point in that journey where they need to take this to production and they need more control. So they end up going more of a private-LLM route, where they can really adjust the knobs as needed. Is that something that you think is necessary for the era that we're in, in terms of the technology right now, that at some point you do need to get in there and kind of tweak things in order to create a really good experience?
That's it. And on the first point, a lot of times it's also a money thing. Once you have traction, let's say a larger number of requests, then it makes sense to want to save on costs. But yeah, I would say at that point it really helps you improve on your specific use case, because then you also have the power to tune the model, to adapt it; you have more control over it. In the beginning, I would say, maybe it's not that important; you just need something where you can gain traction. But once the traction is there and you're running it and you're thinking, okay, do I need higher throughput, do I maybe want to improve that model to boost user engagement, or to, I don't know, add some specific steps to my user journey to help users achieve better results, then it makes sense.
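One low-effort way to prepare for that hosted-to-private migration is to hide the model behind a narrow interface from day one, so swapping backends is a one-line change. A minimal sketch with stubbed backends; the class and method names are illustrative, not any particular SDK's API:

```python
from typing import Protocol

class TextModel(Protocol):
    """The only surface the application is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

class HostedAPIModel:
    """Would wrap a vendor SDK call; stubbed here."""
    def complete(self, prompt: str) -> str:
        return f"[hosted completion for: {prompt}]"

class PrivateModel:
    """Would wrap a self-hosted inference server; stubbed here."""
    def complete(self, prompt: str) -> str:
        return f"[private completion for: {prompt}]"

def answer_user(model: TextModel, question: str) -> str:
    # Application code never knows which backend it is talking to.
    return model.complete(f"Answer briefly: {question}")

print(answer_user(HostedAPIModel(), "What is a copilot?"))
print(answer_user(PrivateModel(), "What is a copilot?"))
```

The same seam also makes A/B-testing two backends against each other trivial once traffic grows.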
And I would say it's more of a process. You start with whatever helps you to be as fast as possible, and once you reach a certain threshold, you move to a privately hosted model. I mean, you have, let's say, different frameworks which already help you with that, so it's a community which really grows, and there is already a lot out there.
Yeah, one thing I saw someone talking about recently is whether you should maybe start with a more powerful LLM and then move to a weaker one to save on costs, or go the other way?
Do you have any thoughts on that? Like, hey, should I prove it out first and then try and
go cheaper? How do you think about that?
We went with the approach of going with a more powerful LLM and then cutting the cost, because it's, I don't know, sort of like an investment. You want to see as fast as possible what's possible. Sure, there are costs, but okay, we're going to see within a given timeframe, let's say a month, whether it's useful. Because it could be, if you go with a small, not-that-sophisticated model, that you think, okay, there's no way this solution is going to achieve any traction, but maybe the only reason is that you went with a smaller model. I would say the initial costs are justified.
You just have to take that.
And if you're starting with a more powerful model and you get something you're happy with, and then you go to sort of a cheaper, smaller model, does that also help from a testing perspective? Because you can essentially compare from an input-to-output perspective: this is what we get with the higher-cost, more powerful model; are we able to essentially replicate that for our specific use case using the lower-power, less expensive model?
Exactly. It's a trade-off. So you have to see, okay, am I maybe satisfied with a little bit less performance, let's say in how accurate the model is, if in the end it's enough, and save on the cost and the runtime and the throughput? At the same time, one of the recent trends is using really large models to fine-tune your own smaller models. What I'm also going to talk about in my presentation is that there are already some smaller models which, let's say, took Llama 2 and pruned away parts of the model which were not necessary, and in the end it's still possible to achieve good performance with those kinds of models. So there are already new strategies coming up for how to start with a larger model and then use that knowledge to, let's say, build your specialized one. I think the trend will go more in the direction of a lot of domain-specific, specialized models, which are smaller and more performant. Expert models, you could say.
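The "use a larger model to build a smaller specialized one" strategy is usually some form of knowledge distillation: the small model is trained to match the large model's softened output distribution rather than just hard labels. A toy sketch of the distillation target; the logits and the temperature are made-up numbers, and a real setup would run this over token sequences in a training framework:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature > 1 softens the distribution, exposing the teacher's
    'dark knowledge' about which wrong answers are almost right."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

teacher_logits = [4.0, 1.5, 0.5]   # big model's scores over 3 tokens
good_student = [3.8, 1.4, 0.6]     # compressed model that mimics the teacher
bad_student = [0.5, 4.0, 1.5]      # compressed model that diverged

print(distillation_loss(good_student, teacher_logits))
print(distillation_loss(bad_student, teacher_logits))   # much higher
```

In practice this term is blended with the ordinary hard-label loss, but the soft targets are what let the small model recover accuracy the compression step destroyed.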
Yeah, I've heard that from a company that I talked to that is working on deploying models to essentially phones and other edge devices. So they have to compress the model, and when they compress the model, presumably you're going to lose some level of accuracy, but they can then fine-tune it against the original model to get a compressed model that actually recovers some of that performance.
Exactly.
I mean, if, let's say, in your use case you usually only need 20% of the content the model can generate, then you don't care about the other 80%, whether it generates or predicts correctly. It depends on your case, and phones are really, I would say, a perfect example of that, because I think this is one of the areas which are now also gaining traction: how can I make my own language model for an Android phone, an iPhone, whatever kind of smartphone, which runs on the device? Because then I don't need to waste time or network sending data, and I also then basically have improved privacy, because the privacy aspect is also something which people will, I would say, start to think about more and more as time goes on.
Yeah.
And there are certain applications where you have to do it on-device in order to support the use case, like real-time translation. You're not going to be able to do a network call in there to do the translation and then come back; it would never work.
Yeah. So this is, I would say, one of the next big construction sites that are going to happen. Because yeah, that's a problem, and that's something which really helps. It improves the user experience a lot, especially if you're traveling somewhere and don't have internet at that point in time but still want some kind of functionality. These are real cases.
Yeah.
You talked about moving from the platforms to private LLMs for control and different things like that. What about purely on cost? If I'm maybe using GPT-3.5, or I'm using an equivalent model that's private and I'm running it myself, am I going to be able to save a lot of money doing that, or is it not going to net out?
Let's have a look. One thing, maybe, just to relate to that previous question: one of these cost-saving strategies is to shift the cost to somebody else. So, let's say, if you're running that model on somebody's mobile phone, you're not paying for that.
Right.
So that's one of the strategies you can use. But usually, yeah, once you go bigger in scale, you're going to have, I would say, better cost control with your own models. And I think this is also why, when you see somebody write on Reddit or somewhere that ChatGPT now works worse or better, usually the company is also experimenting in the background to see, okay, whether they can maybe save some cost or maybe improve, and where the trade-off is in quality. This is something that I think every company is doing with every AI model, regardless of whether it's a large language model or any other kind of AI.
And then does that take into account also the fact that you might have to
hire new people that have the domain expertise? Plus there's an infrastructure cost as well. It's not just, hey, can I get this thing running on a server; I'm now responsible for scale and reliability. I mean, there's a lot of value there; that's why people go to public cloud services, like, I don't want to run my own data center. People still sometimes make the choice to run their own data center, but in the cloud it's a choice that you're making at some certain level of scale, and there are always going to be these trade-offs from the maintenance perspective.
I think it's the same progression as you had with DevOps. Now you have MLOps. This is still growing, and if you compare the charts which show all the tools related to machine learning operations a few years ago to now, it just keeps growing. This is, I would say, a new field of expertise, new roles that companies will need to acquire, because there's a long way, I mean, it doesn't need to be that long, but there are several steps from purely experimenting with some kind of model to it really running in production, where you're even thinking about having multiple models, switching between them, measuring online performance, and incrementally doing some kind of updates. That's a burden which not just one person or one role can manage. You have to have specialized roles for that.
Yeah, you're in charge now,
not just of deployment of the model,
but all the updates and upgrades as well.
Exactly.
Exactly. Because just think about the amount of time. Let's say you're specializing in data science for these models; you want to spend your time thinking about, okay, what kind of data maybe brings me a better improvement? It makes sense to focus on that. But if I also need to focus on how I'm going to deploy it, well, the day has 24 hours, so you'll have to cut something off. So I would say, as the usage of your own LLMs grows in scale, you're going to need specialized roles for those kinds of things.
Yeah, you mentioned fail fast, and even deploying improvements to your models, things like that. How are you even measuring improvements, or measuring failure, with this non-deterministic system? What are some patterns there?
So usually, let's say in our case, I would say 80% of the work on any AI project is preparing the data so that you're able to run an offline experiment that simulates the online behavior as closely as possible. That's what you're trying to achieve. If we talk about LLMs, the input is something written by the user. You also have open-source benchmarks you can use, and then on your test set you're trying to see, okay, this was the input, this was the generated output, and then you have different accuracy metrics. One of the recent trends is even to use a large language model as a judge, to grade the accuracy, or even to grade, from a psychology perspective, the user acceptance. For instance, one of the recent papers I read describes large language models trained on Reddit data. Reddit is, I would say, a really interesting community; people write really interesting stuff. What they did in that paper was to see, okay, given this input and this generated output, if that were posted on Reddit, how many upvotes or downvotes would you get? Or how likely is it that your output would generate a whole tree-like structure of replies going downwards, so how much further conversation, or how interesting is the response? So one of the trends is going to be, hey, let's use GPT-4 to tell me how likely a user would be to accept this: is it interesting for them, is it correct, those kinds of things. The hypothesis is, I mean, large language models have "large" in the name, so a large amount of data has been used to train them, and the inherent knowledge is in there somewhere. The task is, how can I get that knowledge back out, to say, hey, given whatever you've been trained on, there's a high probability you can tell me whether my generated response is correct or not, or interesting or not. So those kinds of new evaluation measures are going to, I would say, pop up even more.
Yeah. One thing, when I've talked to people about, hey, what are the ways I can improve the results I'm getting from my copilot-type system, things like that: I think I could make my prompts better and work on that.
I could choose a better model that would maybe fit better.
I could fine-tune, which you brought up a few times.
I could do RAG, retrieval-augmented generation.
How do you think about those four?
Are there certain ones you'd say, hey, try these first before you go into those? Or ones where you're not seeing much effectiveness? How do you think about those different approaches?
It will depend on the domain. Okay, let's say retrieval-augmented generation is one of those popular architectures now which everybody is using. But besides just retrieving stuff, maybe you want to perform some kind of action. So you have to have some way to understand: am I in a RAG scenario, or do I need to do something else? There are now also other strategies that try, before you even retrieve, to decide: okay, here you actually have to call something, or do something else. I think retrieval-augmented generation was just the most popular because it's really easy to understand, and it falls into a lot of common scenarios and use cases that people can apply it to.
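The loop he's describing can be sketched in a few lines. This is a self-contained toy, assuming a bag-of-words "embedding" and an in-memory document list; a real pipeline would use a sentence-embedding model, a vector store, and an actual LLM call where the prompt is returned:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # sentence-embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank every document by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # The assembled prompt would then be sent to whatever LLM you use.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Infobip provides messaging APIs.",
    "RAG retrieves documents before generating an answer.",
    "Bananas are rich in potassium.",
]
print(build_prompt("How does RAG generate an answer?", docs))
```

The point of the sketch is how little machinery the happy path needs, which is exactly why, as he says, it's the architecture everybody starts with.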
And maybe easier for a non-specialist, like a normal engineer, to do compared to fine-tuning.
We have different vector stores available, and there are already frameworks for converting text into embeddings.
Basically, coming back to that fail fast, fail often idea: it's really easy to quickly develop something. And then you have to see how to continue with it. Because, as we also saw, it really depends on your use case. If it's just question answering, then RAG is enough. But usually, once you start with that, you see, oh, okay, we have to continue with something else.
Yeah. And with RAG, I think it's easy conceptually, and it's easy to get something to a prototype stage. But to go beyond that, at least from what I hear, that's where it takes a certain level of expertise, and there's a lot of tweaking and figuring out how do I actually get the correct context, and not too much, in order to get the result that I need. That's where I think the hard part is.
Completely correct.
And then you have some other effects that may happen that you're not even aware of, or that you want to control. Let's say you have some content in your vector store that you're retrieving, and it ends up in the generated response. One of the things that can happen is that your model becomes biased towards popularity: depending on how you implemented your retrieval stage, some parts of the content will just be more likely to be seen in the end. And then the question is, okay, in your RAG pipeline, do you have some mechanism to track how often something gets retrieved? Do you even know whether this is a problem or not? If yes, okay, can you maybe, as I said, adapt the retrieval phase to get a bigger context, or say, hey, listen, this paragraph, this type of content, gets retrieved a lot; maybe you want to add some more serendipity or something, depending on what you want to do.
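The tracking mechanism he asks about can start very small. A sketch, assuming each retrieved chunk carries an ID (the class and threshold here are illustrative, not any product's feature):

```python
from collections import Counter

class RetrievalLogger:
    """Counts how often each chunk ID comes back from the retriever."""

    def __init__(self):
        self.hits = Counter()

    def record(self, retrieved_ids):
        # Call this once per request with the IDs the retriever returned.
        self.hits.update(retrieved_ids)

    def top_share(self, n: int = 1) -> float:
        # Fraction of all retrievals taken by the n most-retrieved chunks;
        # a value near 1.0 signals a popularity-biased pipeline.
        total = sum(self.hits.values())
        top = sum(count for _, count in self.hits.most_common(n))
        return top / total if total else 0.0

log = RetrievalLogger()
log.record(["doc-1", "doc-2"])
log.record(["doc-1", "doc-3"])
log.record(["doc-1", "doc-2"])
print(log.top_share(1))  # doc-1 shows up in every single request
```

Watching that share over time answers his "do you even know if this is a problem" question before you invest in any re-ranking or serendipity mechanism.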
Yeah, do you need to, when you're doing RAG, it probably depends on the use case, but do you need to add some level of randomness or variance in there, so that you're not always getting, maybe you don't always want the perfect context based on probability? Similar to how an LLM, when generating the next token, you don't always want the highest-probability token, because when you do that you actually get language that isn't that great, super repetitive and stuff like that. So they have to introduce some level of randomness.
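That randomness knob is usually the sampling temperature. A small sketch of how temperature reshapes a next-token distribution, with toy logits rather than anything from a real model:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Lower temperature sharpens the distribution toward the argmax
    # (repetitive but safe); higher temperature flattens it, adding
    # variety at the cost of more surprising tokens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                 # toy scores for three tokens
greedy = softmax_with_temperature(logits, temperature=0.1)
varied = softmax_with_temperature(logits, temperature=2.0)
print(greedy[0], varied[0])              # top token dominates only at low temp
```

The same idea transfers to his retrieval question: you can temper the similarity scores before picking context chunks, instead of always taking the top-k deterministically.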
Well, that would now be a really interesting research project, to really find out when is the right time to...
When to hallucinate or when not to. I mean, if you go back
in machine learning, you have different algorithms like multi-armed bandits,
those kinds of things. There's always this question of when I'm doing exploration versus exploitation.
This is usually a task that you need to solve. In the beginning, you're
not thinking about it.
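The textbook illustration of the exploration-versus-exploitation trade-off he mentions is epsilon-greedy bandit selection. A generic sketch, not how any particular system decides:

```python
import random

def epsilon_greedy(estimated_rewards, epsilon=0.1, rng=random):
    """With probability epsilon, explore a random arm; otherwise
    exploit the arm with the best current reward estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_rewards))
    return max(range(len(estimated_rewards)),
               key=estimated_rewards.__getitem__)

# e.g. click-through estimates for three prompt/model variants
rewards = [0.2, 0.5, 0.35]
picks = [epsilon_greedy(rewards, epsilon=0.1) for _ in range(1000)]
print(picks.count(1) / len(picks))  # variant 1 wins most of the time
```

In his framing, epsilon is exactly the dial you turn down for a legal-domain deployment and up when you can afford to experiment.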
Sure, like, why would you?
But then once you have something that works, you start to think, okay, when should I maybe focus more on exploring different stuff? Or maybe sometimes you really need to be factual. You have to see, okay, this is, let's say, the legal domain. If you're, I don't know, retrieving stuff which may help you win some kind of case, do you want to explore a bit differently? It depends where you're applying that model. And especially with some kind of randomness factor: do you want to maybe sample the tokens to be more engaging? But then in the legal domain you need to be really factual, and you have to be explicit in what you're saying.
I think that's one of the big challenges right now with LLMs
is the notion of memorization.
There are certain times when I want a memorized result.
If I'm asking for what is the quote that Mark Twain made
about San Francisco or something like that,
I want the exact quote.
I don't want a proxy to that quote or something like that.
But if I'm saying like, tell me Alex's social security number or something like that,
then I don't want it to give me a memorized result there.
So there's some times when memorization is okay.
And then there's other times where you absolutely want to guardrail against it.
And I think that's one of the tricky parts. It's hard to differentiate those at the LLM level.
Exactly.
How can you also safeguard it? Okay, maybe some of the data you're using for retrieval you don't care about, but maybe you have some sensitive data, like personally identifiable stuff. So how can you ensure that you're anonymizing things at the right point in time? Not in the beginning, maybe later. So what I think it's going to be, or what it looks like, is that people are going to build pipelines, adding new features which, you know, anonymize this stuff, or make sure, hey, here I want to double-check, make an additional prompt to get exactly the kind of content that was asked for. Or even, before prompting, to see, okay, now I have specialized prompting strategies, or even models, depending on the task, and first there's going to be a router: here, go left, and it can point you in the right direction. It's going to be, I would say, a more complex pipeline as we go forward in time.
Yeah, I think you have to move it to the pipeline level, where the governance around the data is better understood, rather than trying to do it at the LLM level.
Because essentially what you put in the LLM, it's a little bit like you have a soup at that point.
It's like a broth.
So if I have a broth, I can't essentially...
You can't pull out the potato.
Yeah, I can't pull out the potato with my spoon.
But that's essentially what you're trying to do sometimes where people are trying to
essentially shove that responsibility onto the soup to try to differentiate between potatoes
and carrots, but it's much easier to do that at the pipeline level.
Exactly.
I mean, even if you talk about finger-pointing: if you generate something bad, something unexpected, you're going to get the finger pointed at you. Hey, why did this AI generate this? So yeah, it goes back to the explainability stuff. So I would say there's a nice set of different LLM-related features or action plans which are going to come.
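A pipeline stage like the one they describe can start as small as a regex pass that runs before anything reaches retrieval or the model. The patterns below are illustrative assumptions only, nowhere near a complete PII detector:

```python
import re

# Illustrative patterns only; production systems use dedicated
# PII-detection tooling with many more rules and validation.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def anonymize(text: str) -> str:
    # Replace each detected span with a typed placeholder token.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

def build_prompt(user_message: str) -> str:
    # Scrub first, so the raw PII never enters the "soup".
    return f"User said: {anonymize(user_message)}"

print(build_prompt("Mail alex@example.com, SSN 123-45-6789."))
```

This is exactly the "potato before the soup" point: removing the sensitive token at the pipeline stage is trivial; removing it after it has influenced a generated response is not.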
Do you think RAG is here to stay or do you think it's one of the better tools that we have at the moment?
I would say information retrieval has been relevant for a long time and is still relevant. So it's going to evolve into something.
One of the really recent trends we saw is generative information retrieval. I think this is the next step people are experimenting with, where you want to train your LLM to already generate what should be retrieved. The current most popular setup is: okay, I have some kind of index, and then I'm doing embedding comparisons, or some strategies with maybe re-ranking, whatever, and then I'm using that with OpenAI or Llama or whatever kind of LLM. But now the trend is to already train the model to give me, basically, the IDs, let's say, of the documents which should be retrieved. I think in April there was the European Conference on Information Retrieval, and they had a nice tutorial on that. And if you also look at the SIGIR community, those are now, I would say, the trends, what people are trying to achieve.
I think one reason RAG has been so popular has been, you know, the smaller context windows, but now we've really seen those grow. Do you think in five years it'll be the case where context is cheap and plentiful, where you can stuff in millions and millions of tokens and it won't cost that much? Especially now that we're seeing Gemini with a million-token context window.
I think it depends heavily on the hardware, because in the end, this context is a matrix, so you're doing matrix multiplications. If you're able, at the hardware level, to really do it efficiently, that will be the main driver, from, I don't know, Nvidia and whoever is going to be the first one who really makes it possible. Or maybe some new solution will come up; I mean, the whole point of the attention mechanism is to see what part of the context is maybe not really important. So either those kinds of new strategies will come up, or really the hardware. I would say if we get a really huge boost again, a new jump from the GPU perspective, then probably the next thing that's going to happen is even bigger contexts.
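The quadratic cost behind that answer is easy to see with back-of-the-envelope numbers: one attention-score matrix over n tokens has n² entries. This sketch assumes fp16 (2-byte) scores and a single head and layer, and ignores tricks like chunked attention that avoid ever materializing the full matrix:

```python
def attention_matrix_gib(n_tokens: int, bytes_per_entry: int = 2) -> float:
    # Size of one n x n attention-score matrix in GiB, assuming
    # fp16 entries; real models have many heads and layers on top.
    return n_tokens ** 2 * bytes_per_entry / 2**30

for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9} tokens -> {attention_matrix_gib(n):10.2f} GiB")
```

An 8k context stays comfortably under a gigabyte, while a naive million-token matrix would need terabyte-scale memory per head per layer, which is why he frames bigger contexts as a hardware-or-algorithm race.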
Do you have thoughts on vector databases, in terms of the specialized vector databases where that's all they do, versus the databases that started as relational databases, or even document stores like MongoDB, that now support vectors? Where do you see the world of structured data and vector databases going?
So this is also interesting. I would say the cool thing is that a lot of NoSQL and SQL solutions, like, I know Postgres now has pgvector support. So depending on where the expertise lies in your engineering team, if you already have vector support, use it. This falls into the fail fast, fail often principle: test it out as fast as possible. I mean, sure, if you're going for performance, if you already know that you're going to have, I don't know, billions of transactions, then those specialized vector databases definitely make sense.
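For the Postgres route he mentions, the pgvector flow is roughly the SQL below, held here as Python strings you would run with any Postgres client. The table and column names are made up for illustration; `<=>` is pgvector's cosine-distance operator (smaller means more similar):

```python
# Minimal pgvector setup; names and the 384 dimension are illustrative.
SETUP = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector(384)
);
"""

# Nearest-neighbour lookup: order rows by cosine distance to the
# query embedding supplied as a parameter, keep the closest five.
QUERY = """
SELECT body
FROM chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5;
"""

print(QUERY.strip())
```

The appeal is exactly his point: if the team already runs Postgres, this is a schema migration and one operator away, not a new database to operate.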
The thing is, it really depends on your own team and, let's say, the company organization. For instance, there was one research paper which basically asked: is Lucene enough? You have Elasticsearch and Apache Solr, which are built on Lucene. And the thing is, those are really optimized for fast retrieval. You don't only have vectors, because they were focused more on text search, so you have multiple things you can use in addition to embeddings. Even in Elasticsearch, besides dense embeddings, you have sparse embeddings, those kinds of things, which you can also try out to see if they help, or you can make some kind of hybrid combinations.
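As a sketch of the hybrid combination he mentions: in recent Elasticsearch versions a single search request can carry both a lexical `query` clause (BM25) and a `knn` clause over a dense-vector field. The index layout, field names, boosts, and the toy 3-dimensional vector below are all placeholders:

```python
# Request body for POST /my-index/_search (recent Elasticsearch 8.x).
# Field names, boosts, and the toy vector are illustrative only.
hybrid_search = {
    "query": {                      # sparse/lexical side: BM25 match
        "match": {"body": {"query": "reset my password", "boost": 0.3}}
    },
    "knn": {                        # dense side: vector similarity
        "field": "embedding",
        "query_vector": [0.1, 0.2, 0.3],
        "k": 10,
        "num_candidates": 100,
        "boost": 0.7,
    },
    "size": 10,
}
print(sorted(hybrid_search))
```

Elasticsearch sums the boosted scores from both sides, which is one simple way to get the dense-plus-sparse behavior he describes without a separate vector database.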
But to get back to the paper, the statement was: if your business is already established, and you have a dedicated team who already supports Elasticsearch, go with that, just from a business perspective, for the business value, because you will develop something much faster. And then, if you really get to the point where, I don't know, you have millions of transactions, or really billions of documents stored in it, then think about a real specialist solution. If you're at the start, I think, go with whatever you feel will help you achieve something good. And, I mean, the whole Lucene stack is really well known and well maintained. You have a lot of features; I would say you won't go wrong with that. Once you see, okay, I have performance requirements, then you can think about going for the specialized ones. But it depends highly on your own team, your own preferences and proficiency.
Well, we're coming up on time here, so I want to close off with some quick-fire questions.
Sure.
If you could master one skill that you don't have right now, what would it be?
To try to be much more on time. I think I got here one minute before, one minute before, so maybe I should have planned better.
That's great. What about: what wastes the most time in your day?
I mean, it's communication. It's still important, but it's hard sometimes to say, hey, okay, I have to focus more on my own stuff. It depends. I mean, it happens to everybody: if you collaborate with a lot of people, you always want to help somebody, and that takes time. But maybe it's not a waste; it's a kind of investment. Maybe short-term you think, ah, I don't have time, but when I think about it, yeah.
Yeah, that's good. Cool. If you could invest in one company that's not Infobip, not who you currently work for, which company would that be?
There is no other company I would invest in.
Great answer. That's the first time I've heard that.
Yeah, they're monitoring your answer right now.
Yeah, cool. What tool or technology could you not live without?
Tool or technology? I mean, I don't think that there's a lot of things.
You mean more like software development?
Yeah, you can take it anywhere you want.
You could say airplanes, I don't know.
I wouldn't say I fall into being really mainstream, but I'm hooked on my mobile.
Yeah, for sure.
What about: what one person influenced you the most in your career?
Basically, I would say I had the luck that at every company where I worked, I really had good mentors, and usually that really defines how you're going to develop; it can really help and bootstrap you in your career. I think I really had that luck wherever I went, and especially now at Infobip, you're working with people who are great, and that really makes or breaks you a lot of times.
Yeah, absolutely.
All right, last one.
It's a good one for you.
Five years from now,
will there be more people writing code or less?
Oh, I would say more people, but in different ways.
I think it's going into the direction of opening up.
I hope so, at least. I'm a fan of it.
Yeah, well, Emmanuel, this was a great talk.
Thanks for doing this. I really appreciate you coming on.
And best of luck on your talk later today.
Sure, thanks.
Thanks for coming on. Cheers.