Orchestrate all the Things - Evaluating and building applications on open source Large Language Models. Featuring Deci CEO / Co-founder Yonatan Geifman

Episode Date: March 6, 2024

If we look at the current status quo in AI as a case of demand and supply, what can we do to close the gap between the exponentially growing demand on the side of AI models and the linearly growing supply on the side of AI hardware? This formulation was the premise on which Yonatan Geifman co-founded Deci in 2019. Today, with the generative AI explosion in full bloom, demand is growing faster than ever, and Deci is a part of this by contributing a number of open source models. Join us as we explore:

- How AI models are different than traditional software and what open source means in AI
- Choosing between GPT-4, Claude 3 and open source LLMs
- Customizing LLMs and fine-tuning vs. RAG
- Evaluating LLMs
- Market outlook

Article published on Orchestrate All the Things: https://linkeddataorchestration.com/2024/03/06/evaluating-and-building-applications-on-open-source-large-language-models/

Transcript
Starting point is 00:00:00 Welcome to Orchestrate all the Things. I'm George Anadiotis and we'll be connecting the dots together. Stories about technology, data, AI and media, and how they flow into each other, shaping our lives. If we look at the current status quo in AI as a case of demand and supply, what can we do to close the gap between the exponentially growing demand on the side of AI models and the linearly growing supply on the side of AI hardware? This formulation was the premise on which Yonatan Geifman co-founded Deci in 2019. Today, with the generative AI explosion in full bloom, demand is growing faster than ever
Starting point is 00:00:37 and Deci is a part of this by contributing a number of open source models. Join us as we explore how AI models are different from traditional software and what open source means in AI, choosing between GPT-4, Claude 3 and open source LLMs, customizing LLMs and fine-tuning versus RAG, evaluating large language models, and the market outlook. I hope you will enjoy this. If you like my work on Orchestrate all the Things, you can subscribe to my podcast, available on all major platforms, my self-published newsletter, also syndicated on Substack, Hackernoon, Medium and DZone, or follow Orchestrate all the Things on your social media of choice. Hi George, I'm happy to be here. I'm Yonatan, the CEO and co-founder at Deci. And as for my background, I did my PhD in computer science at the Technion from 2015 to 2019.
Starting point is 00:01:35 It was the early days of deep learning. And in that point in time, it was mostly around CNNs and understanding what people can do with deep learning. But one of the understandings that I had in that point in time is that computational complexity of deep learning is something that is really going to block us from achieving or getting to the full potential of adopting AI in industry applications and real-life applications. And we can see it today, but the hypothesis that we had in the past was that basically computational complexity of models are growing exponentially.
Starting point is 00:02:20 If we'll think about GPT-3, GPT-4, and those large language models, we can see larger models that require more and more compute. And the available compute given by the hardware is something that is growing somewhat linearly, especially when we're comparing it per watt or per cost, per dollar. And the idea is that there's a gap between those two numbers, kind of a supply-demand problem, that the demand is coming from the algorithm and the supply is coming from the hardware, and there's some gap that needs to be resolved,
Starting point is 00:02:57 either by building better hardware, which is something that I don't know how to do, or building better algorithms that are more efficient and better utilizing or leveraging the hardware. So that's where we started with this understanding. It was pre the LLM and generative AI world of models that are getting larger and larger and things like that. And it was really based on data hypothesis.
Starting point is 00:03:24 So we thought, okay, that process of building AI is very manual. How can we automate that? How can we use AI to build better AI? And we ended up building a company that is focused on the algorithmic part of AI, building models that are more efficient and more accurate. And today we have a range of foundation models that we built, most of them in open source models like the CLM and other models that are showing great accuracy and efficiency together where companies can take them and benefit from those models together with our
Starting point is 00:04:03 commercial offering, which is a set of tools to take those models, customize them to the use case with fine-tuning and training techniques, and then deploy them very efficiently. So that's kind of the short story about what we saw in the early days of deep learning and how we decided to start DESI and what is the mission of DESI. So that's the origin of DESI. And people ask me also, why DESI? What is the name DESI?
Starting point is 00:04:36 So it's coming from Latin, from decimus, which is one-tenth. And this is exactly what we're trying to do for the modern AI, to make it one-tenth of the computational complexity of the development time and help companies to ship AI faster with better performance. Great. Great. Thank you for the intro. I have to say, personally, I was familiar with a big part of that
Starting point is 00:05:00 because we've had the pleasure of connecting previously, but I'm sure it's been very useful for people who may not have heard about you and what you do previously. And actually, I also learned something new about the origin of the name, which I didn't know. So thanks for that. And since we've connected before, I was already aware of the fact that you work on the algorithmic side of things, of models. And I was also familiar with your previous, let's say, line of products, which was also open source, by the way, and was focused on computer vision. And last time we connected was in 2022.
Starting point is 00:05:44 So before this whole large language model craze took over the world. But I know that you also have a line of products, a line of language models that are open source and large language models and how these two interplay, let's say, I thought you are a very suitable person to have this kind of conversation around open source and large language models. Before we get into the weeds, however, I thought it's worth starting by actually defining open source models because, because well it's not as self-evident as it may seem at first and the reason I'm saying that is well because open source software has been around for a long time and it's sufficiently well understood even though you know different licensing schemes always have their peculiarities let's say, that somehow require a bit of expert
Starting point is 00:06:47 knowledge to navigate. However, open source AI models are relatively new and they are different from traditional software because there are more artifacts involved. So it's not just code. And also their lifecycle is different. So you have the whole AI model life cycle from training to convergence and release and subsequent release and so on. So I would like to start by just going through the different artifacts that constitute an AI model and by doing that, we should also be able to more properly understand and define what constitutes an open source AI model. So we have the source code, we have training data and processes and weights and different
Starting point is 00:07:34 metrics. So some people, when releasing models, they also include metrics and the binary code, so the final product, let's say. So based on your own understanding and involvement, since you also deal in open source models, which of these parts would you say constitutes an open source model? That's a good question. And I think that in order to answer it, we should go back to the goal of open source. In my perspective, it's to let people build and use on top of other work and collaborate around those derivatives that are in open source. And if we think about those goals,
Starting point is 00:08:20 first and foremost is having the model available out there for people to use. So it's the weights and the code to run the model. Those are the most important thing to put in open source. The collaterals on what is the accuracy and what is the metrics are important, but those could be also reproduced by testing the model. And we see great initiatives like LLM Leaderboard that takes any model and run it through a lot of tests to show the numbers and the accuracies on various data sets and things like that. So I think that the most important aspect is the weights and the code to run the model,
Starting point is 00:09:08 to help or enable people to use that model in their applications and to build on top of those models either by fine-tuning and other techniques, continue the work of that model and bring derivatives out of it. So those are the most important aspects. Alongside those, there are the data and the training process, which are also important, but not at the level that the model and the weight. And some companies, Desi and Mistral, keep those a little bit more closed. And I don't think that it affects the fact that the models are open
Starting point is 00:09:46 and preventing people to use those models because those are kind of trade secrets today. So it's not like what is the exact data, but it's also how to mix the data and what is the right mixture of different components of the data in order to get to this and that model accuracy. So those things that are significantly less important than actually putting the model itself in open source.
Starting point is 00:10:11 So those are the components. And in terms of, you know, the traditional definition of open source, I think that the community concluded to a point where instead of calling it open source models, we can call it open weights models. So I think that that's the current stage of open source in AI. Obviously, there are also examples that, you know, the training data is shared and the training process also, that's shared for most of our models.
Starting point is 00:10:46 Sometimes it's useless because most of the organizations don't have the amount of compute in order to reproduce the training of those models. So basically, the most important aspect is to release the techniques, the code, and the weights of the model in order to people being able to use that model and also build new models on top of that derivatives or that open source that the company shared. So those are the important aspects of open source AI. I think that the training process could be perceived as the development process. So in terms of open source, if you have some algorithm in open source,
Starting point is 00:11:29 in some cases you see code, and in some cases it's hard to understand the code, and in some cases it's easy to understand the code, but you don't get a log of the brain function of the developer that built that open source. And that's similarly the training process. So you don't really need a training process in order to use those open source models. Yeah, it's interesting.
Starting point is 00:11:58 I think I tend to agree with the gist of what you said, at least if I got it right, which is basically that, well, open source models are a different animal compared to open source software. So because precisely of the fact that the process is different, the artifacts are different, maybe
Starting point is 00:12:19 people should come up with a different definition that covers open source in AI models, because, to be honest with you, I think it may be a bit confusing for people who are not familiar with the intricacies of everything that producing an AI model entails, to call a model open source when there's no one-to-one equivalence, let's say. So maybe we should come up with a new terminology that makes things more clear. Also, interestingly, what you mentioned about the training data and process not being that important, I see the point and I think that makes sense if the intent is what you
Starting point is 00:13:02 also said for people to take that model and use it in whatever applications they're building if that is the case then yes i think i think you're right and the training data and process are not that important because what matters is what you can do with the end product but if the intent is for people to be able to reproduce what has been generated as an as an open source model i think I think it's quite important. I just saw yesterday, actually, a statement by an engineer that works in open AI that was, and he was basically saying, well, in my experience, you know, all the hyperparameter
Starting point is 00:13:40 tuning and even all the different architectures don't matter as much as the data set. Eventually, you know, no matter how much tweaking you may do, no matter how many different architectures you may try, what matters most in terms of converging towards something is the data that you're using. And I'm sure that the process and what you also highlighted, so how you take different parts of that data set and what you emphasize and what you de-emphasize is also important. So, again, it depends on what you want to do with the end product. There is a spectrum from the closed source models like OpenAI, GPT-3, GPT-4,
Starting point is 00:14:29 to full open source as we think about it with the data hyperparameters and everything. And most of the AI models in the generative AI area are somewhere in the middle. I'm talking about LAMA models, Mistral, DESI. Most of those are sharing all the things that are needed in order to take and use that end product, but keeping some of the aspects closed. And we're in an ongoing debate about sharing more and more. So in some cases where the data is not proprietary, we are sharing what data we use and what learning rate and hyperparameters and in some cases we cannot share that.
Starting point is 00:15:11 The goal is to do as much as we can to push the community, the open source community, and release as much models with the best accuracy that we can. But sometimes we have limitations on that. I believe that also Meta has some limitations in ensuring what data they use to train Lama, and the same happens for Mistral. Yes, you're right. It's a spectrum, and it all depends on what you want to achieve in the end.
Starting point is 00:15:40 And definitely there's a whole debate whether it's best to use open source models and customize them and tweak them or just use something of the self. And there are many aspects to that debate. So obviously, by leading a company who is releasing open source models, it's de facto what your position is. You're obviously for open source models. However, let's try and just go through the sides, let's say, in this debate. And I think the first thing to examine in terms of looking at this from the end user, let's say, point of view is the classical build versus buy.
Starting point is 00:16:27 So the question there is, so is it worth for a company to invest in having their own model? So can they see it as a strategic asset that they need to own? Or is it fine for them to just sort of outsource that to OpenAI or, I don't know, whatever other closed source company and, you know, just use their APIs and not worry about that. And there's many things to consider there. So the competitive edge. So if you're using something that's the same for everyone in your domain, then what's the differentiation? It's data ownership and, you know, concerns about how much of the data that enterprises share through API calls really belong to them
Starting point is 00:17:15 and what happens with data that goes over the wire. And I'm not talking about technical considerations, but more like things like terms of use or potential data leaks and so on, and also safety and robustness. So how safe it is to use these third party models. So what in your experience, and I'm sure that you talk to lots of clients that bring those issues in front of you. How do you advise them? Yeah, so first, I suggest to zoom out and to understand the role of open source in AI and the risk that we are seeing in reducing or leaning towards closed sources we see today. If Google and Facebook wouldn't write PyTorch and TensorFlow,
Starting point is 00:18:04 and if Google wouldn't open source or write the academic paper about transformers, attention is all you need, we wouldn't be even close to the point that we're at today in terms of the capabilities of AI, the potential of AI, and the adoption of AI. So AI is built on top of open source, either in framework for development, algorithms, training, innovation, and things like that. So today, the benefit and all what we're seeing is based on open source, open source academic research that some or even most of it have been done in the large tech companies like Google, Microsoft, and Facebook and others. So there's a very important role of open source in the field of AI to the continuous progress of AI in the world,
Starting point is 00:19:11 which I am very positive about, although there are some discussions about safety and things like that, but I am very, very positive and supportive for the progress in generative AI in general these days. So that's just to put things in context about the importance of continuing to contribute to open source, publishing academic paper with innovative findings and algorithms, etc. So that's from a high-level perspective about the question. Now we're diving deeper today. AI practitioners have two options. One of them is to go to the closed source APIs, things like OpenAI and traffic and others. And the other alternative is to use open source AI models,
Starting point is 00:19:51 completely on-premise, taking them from Hugging Face, using some open source frameworks. And as discussed, there is a spectrum. You can take those open source and use them on the inferences of service providers, companies like Together, Together AI, but you can also use some tools, either open source or commercial solutions to run those models in your premise. So there is a range. what you need to think about that range is that probably the easier thing that you can do in order to get up and running is to use one of the closed source API. The time that it will take you, the proficiency that you need, and the effort that you will need to put in place in order to get that up and running is relatively low,
Starting point is 00:20:42 and you can do it really quick, and that will help you to get to the POC level. But in some cases, that will not scale nicely as your organization is expecting or needs because several kind of drawbacks of closed source API. First of all, it's a black box. So you will probably be able to build a nice demo, but when you want further customizations, you will be blocked in the amount of fine-tuning or customizations that you can do to those and the control that you have on the algorithms that are running on those models.
Starting point is 00:21:20 Also, for example, when the model will be updated, it will update your application without you having any control on any derivation in quality that you will perceive in your scenario. Another aspect is the cost of closed source AI. Basically, you are paying a premium here that is based on the number of API calls that you're running. And when you scale your application, you can get to very high costs based on the amount of tokens that you are generating through those APIs. And many, many, many other challenges. We talked about the data privacy issue. Most of those APIs are working as a hosted solution
Starting point is 00:21:59 that you need to share the data with the API provider. In some cases, you can take those into your VPC, like in the Microsoft and OpenAI Azure services. But in most cases, you will use the API, a kind of a pay-as-you-go approach. So there's also data privacy issues with sharing your data and ownership and license issues about the data that is generated and things like that.
Starting point is 00:22:26 So all of those are challenges or drawbacks of using the API that works so well in order to build a demo, the POC for the integration that you are trying to build. I'm not saying that open source is very easy and will bring you what you need very, very fast, but there's also challenges for open source. But in a general perspective, we can think about open source in model development as open source in software. You're getting more openness in terms of you understand what is running there underneath. You're getting more control. You can customize the application.
Starting point is 00:23:06 It's an open source model. You can fine-tune and you can customize the model to your specific needs based on your data. It's cheaper. You're not paying for the model. You can run it on your infrastructure, scale that infrastructure in the cost that you're paying for your cloud provider, and that will be your cost at scale, basically. So all of those are kind of the benefits of
Starting point is 00:23:32 using open source models. And what we see, and I think that it's the state of the market at the moment, is that companies are starting to experiment with the closed source API, and then when they understand that they need something more complex or more sophisticated than what they can get with those APIs, they are starting to work with open source. Because one of the considerations that I mentioned, either it's the data privacy or the scalability in terms of cost or the control and product customization capabilities that you are getting with open source.
Starting point is 00:24:09 And all of those are usually the motivation for organizations to go from their first POC that they usually do on open AI or one of their competitors to implementing open source approach for generative AI. And as the market will mature and we'll get more sophisticated, I believe that we'll see more and more open source model development and open source adoption in AI. Another thing that we need to take into consideration is the gap between models accuracy when you are considering GPT-4
Starting point is 00:24:44 and, I don't know, the best open source available out there. So there is still a gap. GPT-4 is the best model today, I guess. And you can get something that is even close to that in open source. But for most B2B applications, you don't need the GPT-4 accuracy.
Starting point is 00:25:05 Open source and that customizability and control, letting you choose a model that is in the right size and the right complexity to the task that you're trying to solve. If you're trying to summarize documents, you don't need GPT-4 to do that. You can do it with a mid-size LLMs, with good performance, lower latency, better cost performance. So, yeah, if you need high-end use cases, you should go to GPT-4. But if you're trying to build some B2B application that can work with smaller models, that's probably the way to go to open-source AI models and use them.
Starting point is 00:25:41 So, yeah, and I believe that this gap will be closed. I believe that we'll see by the end of the year models that are with the capabilities or very close to GPT-4 in open source. So that consideration will probably, will be closed towards the end of the year, I believe. Yeah, you raised a number of interesting points.
Starting point is 00:26:04 Let's see, I'll try and highlight them in reverse order. So starting by the last thing you said about closing the gap in performance, I actually just saw yesterday a new release of LoRa fine-tuned models, a collection actually of them for specific tasks that purportedly actually match GPT performance for specific tasks. So, you know, it just goes to show that if you want to highlight, if you want to focus on specific tasks and you are able to fine tune a smaller open source model, you may be actually able to get GPT level performance. And also, well, on this classical, let's say, dilemma, the build versus buy, I think
Starting point is 00:26:51 yes, it makes sense what you said. So many organizations probably start by just leveraging closed source APIs because, well, it's the easiest thing to do and the easiest thing to get them off the ground. But as they move along, I think what we'll probably end up seeing is what we also see in any other type of software. So you'll have the organizations that want to own and control and customize their deployment
Starting point is 00:27:20 and also the ones that don't see it as such a strategic investment or just prefer to have someone else to take over and actually also point fingers in case something breaks. And so eventually, I think organizations will find their place along this spectrum. Talking about performance, actually, which is a very important consideration in this evaluation, I think, again, we need maybe to take a step back and actually ask ourselves, so what do we mean when we talk about performance here?
Starting point is 00:27:58 You mentioned accuracy, and that's a good metric. There's also speed. So how fast can you get a reply? But then, you know, there's also the interplay of those. So are we talking about easy things? You mentioned document summarization, for example. And again, there are many parameters to consider there. What kind of document? What kind of context windows are we talking about?
Starting point is 00:28:23 So small documents, extensive documents, technical documents, general content documents. So already, just by getting a bit into the weeds and just asking these questions out loud, I think it becomes quite evident that evaluating models is not at all straightforward. So the question there is, how do people currently do it? What do you think are some meaningful metrics that organizations can consult in order to have a better idea of what works for their specific needs? You mentioned the Hugging Face leaderboards, which I guess are kind of the de facto evaluation that people use. You also mentioned open source models kind of closing the gap. And I think we're seeing that happen.
Starting point is 00:29:11 There are many documentations to that. Some people are also even saying that maybe there's a sort of plateau that we've reached, at least a temporary one in terms of performance. So how do you evaluate performance, actually? Yeah, so as you mentioned, performance is not only accuracy. It's a combination of metrics, accuracy, latency, throughput, cost, and other factors that we need to take into consideration to ask ourselves about AI inference.
Starting point is 00:29:43 And I think that one of the things that I didn't mention before is that using the largest models like GPT-4 comes not only with a cost, but with high latency that is really hard to incorporate into real-time applications. So the models that are 7 or 13 billion parameters are working faster in an order of magnitude compared to GPT-4 models. So if you need speed, you should better work with smaller fine-tuned tailored models for your applications. In terms of accuracy evaluation, it's still an unsolved problem, but we see a lot of organizations trying to
Starting point is 00:30:29 understand how they want to evaluate their models. Some people are using human evaluation, so writing some tests and comparing two models to see which responses are better in which cases and try to analyze which model is better. That's one approach. Another approach is using a large language model to evaluate another large language model. So that's the Alpaca eval and the empty bench. So those are very useful approaches to use LLMs
Starting point is 00:31:01 or very large LLMs, it could be GPT-4, in order to do the evaluation of which model produces better results. But we've seen also that for example in alpaca eval, there is a bias towards long answers which the LLM thinks to be
Starting point is 00:31:20 more accurate or more comprehensive or higher quality which is not always the truth. So, for example, if you consider measuring summarization results with large language models, and it will tend to prefer longer answers, it will probably miss the target that you are trying to achieve with summarized answers.
Starting point is 00:31:43 So it's still an unsolved problem. Most of the organizations today are employing human evaluation or human-in-the-loop evaluation. Some of them are using large language models in order to evaluate the models, but the idea is that
Starting point is 00:32:00 you really need to evaluate the model in the context that you're going to use it. The LLM leaderboard means nothing for your company if you're trying to do summarization, for example. You need to evaluate the model in summarization tasks. And as you mentioned, summarization for long or short documents would be very different and for technical documents would be very different from general documents. So you really need to build the evaluation pipelines for the models that you're using in the scenario, in the data distribution that you are using. And it's a challenging problem today.
Starting point is 00:32:37 So I don't have good solution except human evaluation and large language models that is being used for evaluating models. Yeah, I can totally understand how challenging it can be. And yeah, it's an evolving field and even in fields that are not evolving. So if you take something like a database, for example, sure, you have benchmarks. But in the end, what matters most is you need to set up a representative environment of the application that you intend to use the software for and evaluate different options there. And I guess the same goes for LLMs as well. And just the last one before we actually go to talk a little bit more about your specific offering. So the open source LLMs that you're providing,
Starting point is 00:33:26 just to cover one base on customization. So lots of people these days, whether they're using off-the-shelf closed source APIs or open source customized models, they want to sort of fine-tune them or use something called rag- show retrieval made at Generation in order to basically tailor the model more to the data sets and therefore to their needs. So do you think it's easier
Starting point is 00:33:56 to employ something like that, whether it's fine-tuning or RAG, using a proprietary or an open-source LLM? I would tend to say intuitively probably an open source, but I wonder what your take is. That's a good question. And there's a huge difference between fine-tuning and RAG. Fine-tuning is changing the actual model for a specific dataset, and RAG is more around the model, around the generation,
Starting point is 00:34:24 which is the retrieval and augmentation part. and RAG is more around the model, around the generation, which is the retrieval and augmentation part. I think that it will be much easier to do, or I would say that in RAG there is not a lot of difference if you're looking to do it for open source models versus to closed source models. In open source models, and in fine tuning, it will be much easier to do for an open source models, and in fine tuning, it will be much easier to do for an open source model and experiment
Starting point is 00:34:48 and get the results and understand them better, etc. So, but I think that in the debate of fine tuning versus RAG, the future is kind of a combination of both of them. There's some benefits of doing RAG, mostly from data privacy. You don't want to train the model on private data, but
Starting point is 00:35:04 in RAG, you can supplement the model with private data in the prompt. So that's one of the benefits of using RAG, and also the update, the fact that you don't need to fine-tune all the time the models in order to get it updated, so you can give updated data. You see very interesting products today that are using RAG, like Perplexity and u.com and other LLMs or systems that are using RAG in order to bring up-to-date data from the Internet. So those are kind of the benefits or the pros and cons of each one of them.
Starting point is 00:35:44 And the future is probably the mix of RAG and fine-tuning models and obviously fine-tuning it much easier in open source. So overall, I would say that building those more complex use cases will be much easier to do in open source versus closed source in the future, especially when the accuracy gap will be closed completely between the closed source and open source models. So with that, I think it's about time we actually sit focused to what you do specifically around open source models
Starting point is 00:36:18 and open source language models, to be more precise. So I was wondering, basically basically if you can just quickly introduce us to how it works for you basically. So I know that you had a line of open source models even before you started focusing on language models. So how do those fit in your business model, in your overall offering? And we can start from that part. And then we can focus specifically on the language model. So what is your offering there? And what are the different features?
Starting point is 00:36:56 How do people use them? And so on. Sure. So this is building foundation models. We usually put them in open source. So most of our models are available in open source. You can check either SuperGradients, which is an open source repository for training computer vision models with computer vision open source models inside,
Starting point is 00:37:16 or you can check the Hugging Face page of DESI where you can find text-to-image models and LLMs. So basically we have one component in our offering, which is the open source models that are free to use, and you can try and use them. And then comes the tooling layer, which is tools to customize those models like super gradients to fine-tune those models and to adapt them to specific data,
Starting point is 00:37:41 and then the tool to deploy them that is called in ferry so the idea let's take for example the clm7b the clm7b is a model that is similar in its accuracy to the industrial 7b but the interesting aspect about the clm7b is that it's significantly more efficient or performant in terms of inference speed than Mistral in this example. So Desi LM running in 4.5x better throughput compared to Mistral 7b. If you take Desi LM for open source, what you'll see is that it's running around 2x faster than Mistral. But if you're using Ferry, which is is DES's runtime engine for LLMs,
Starting point is 00:38:27 Inferi LLM is for LLMs, and Inferi is for regular models, you will see that you can get additional boost out of the commercial offering. So basically, it's an open-core approach where the models are open source. But if you want to get extended value in the area of performance,
Starting point is 00:38:42 you can use our commercial offering, which is called Inferi, and the tools that is complementing the models in order to shorten the development cycles and improve the performance of the models as they run at the inference workloads in production. So that's the general approach on how open source models working with our commercial offering and we have a variety of open source models from computer vision like yellow nest and yellowness pose for pose estimation and now a new model that is called ulana sat for for small object detection with examples from satellite image analysis and then in the llm world we, we have DESI Coder 1B,
Starting point is 00:39:26 which is a coding assistant model, which is very small and can work on the edge or in the cloud. And the CLM 6B, which outperform CodeLambda 7B and StarCoder 15B. And the CLM 7B that I just mentioned that is similar to Mistral in terms of accuracy and running about 4.5x faster at inference time. So those are DLMs and we have some text-to-image models that are similar to stable diffusion 1.5 and 2.1 and all of those are available in Hugging Face in our company's page. Okay, great. Thank you. And yeah, I can definitely see why you decided to address
Starting point is 00:40:09 the other language model market, let's say, obviously, because there's been a huge demand for that. And interesting that you mentioned you have models that are focused on programming, basically, sort of like coding co-pilots. And I wonder specifically for those, if you could share,
Starting point is 00:40:29 are they also able to integrate with coding, with IDEs, or people's coding setups in some way? Yes, so one of the nice things that when you release open source models, that is people building cool stuff around them. So one of the open source contribution is a tool that can take that 1 billion power meter models and use it as a coding assistant in IDE. And that model can work locally at the speed that you're typing and coding and to assist you with code completion and code correction in real time on your laptop.
Starting point is 00:41:05 So those can be easily integrated with tools, either through API tools that are working in the cloud or tools that are working locally on laptops. What kind of use cases are you seeing? Obviously, for the ones that you just mentioned, it's pretty clear. And even though if you have specific clients that are using it that you just mentioned, it's pretty clear. And even though if you have specific clients that are using it that you're able to share, I'm sure that would be interesting. But for the other ones, for the generic, let's say, language models, what are you seeing people using them for? Yeah, we see a lot of use cases and there's a lot of excitement in trying new approaches and new use cases.
Starting point is 00:41:46 We try to focus on the business area or the business vertical where our uniqueness is the most compelling, which is usually the performance side of the model. So if we'll think, for example, on chatbots for customer care, there are two approaches. One of them are chatbots that are really talking with the customers, and the second one is the co-pilot approach, that is a chatbot that's assisting the customer care representative in giving service to the customer during the call. And in both of those cases, we have customers that are using our models
Starting point is 00:42:22 and tools in order to support their workflows and their customers. And in those cases, it's very important to have a low latency model that can respond in real time and really be an assistance so the customer won't need to wait a lot of time for the LLM to compute or to generate the text that is needed. Another area that we're working in with customers is the financial services. We have customers there that are doing financial analysis and other aspects of assets analysis with LLMs. There's a lot of business use cases in enterprises like summarizations and highlights from calls,
Starting point is 00:43:06 either from calls with customers or from internal calls and other aspects that people are summarizing, also summarization of documents. So those are mainly the use cases that we see people are using there. Let's call that mid-size LLMs that we're currently providing, together with the tool, with the very strong emphasis on working on-premise with high performance. Yeah, you did mention that technically those models that you have, I think you said they go up to 15 billion parameters, so I think technically they would be called probably mid-size, not large-size.
Starting point is 00:43:43 Do you have plans for potentially maybe releasing also larger-sized models? Do you actually see the need for that? Or are the use cases that you're targeting well-served by the models in the range that you already have? Yeah, so we are going towards larger models, and we'll probably release later this year models that are larger than the models that we released so far so so definitely we see need for models that are more capable we're trying to push the boundaries of the capabilities of the models in the middle mid-size but we'll also get to release models that are larger than what we have today for for sure. And what's the process, actually, of you building those models?
Starting point is 00:44:28 Obviously, you have to start by finding the right data set, which is something you talked about in the beginning. But let's put it that way. Is there something special? Is there maybe some kind of secret sauce that you apply besides the obvious of having this extra speed inference layer that is available as your value-add proposition. In the actual process of training the model,
Starting point is 00:44:54 is there something special that sets your models apart? Yes, so we're trying to build the most accurate model. So at the moment, our model is slightly more accurate than Mistral and more accurate than Alarma 2. We're working on a more accurate model in the 7 billion parameter range and also on an 11 billion parameter range. And we're trying to break the state of the art and the capabilities of the models in those size categories. So first, we put a lot of emphasis on the instruction following capabilities of the models and their accuracy.
Starting point is 00:45:32 Second, we try to also think about how easy to use those models, to customize them, fine-tune them, and to use them in your workflows and in your organizations with our set of tools that we're providing. And third, we're thinking about high performance. How can we improve that performance, reducing the latency, reducing the cost of using those models, and improving the throughput that you're seeing on the hardware that you're using, etc.
Starting point is 00:45:58 So those are kind of the priority. Best accurate models, easy to use, time to value, short development cycle, and performance on cost. I see. And I wonder, as lately, obviously, the shift of focus to language models and coding or assistant models as well has been evident. I wonder how that has played into, well, your overall trajectory,
Starting point is 00:46:26 let's say, as a company. I saw, for example, that you had a very recent, let's say, Series B round. And so I wonder how, if you're able to, let's say, pinpoint the role that this new line of products that you are releasing now with the language and coding assistance how it has been contributing towards your growth and what is your say what are your future plans and what part do you see for this specific line of models in those plans yeah that's that's a good question i think that year ago, maybe a bit more than a year ago, with the, we call it the change GPT moment,
Starting point is 00:47:08 we all saw new capabilities emerging in AI and a new set of enterprises that didn't think about integrating AI started to think about it. So basically what we can think that as an AI platform, our market just tripled or even more in the last year with the amount of companies that are trying to integrate AI into their products and workflows. For us, it was natural to expand to those areas. Usually, our mission was to solve that problem that I mentioned at the beginning of the podcast about the gap between the computational complexity of running those models and the available compute.
Starting point is 00:47:51 So basically, that's a natural gap that even increased or the problem even increased when thinking about those large LLMs. So it was very clear to us that we will have to enter into that territory, and it's a territory that is very interesting to solve both because of the excitement from those models and the new capabilities and the potential of them being integrated in enterprises across many verticals and also the technical challenge of making those models to run faster with the high accuracy that we see from going to larger and larger models.
Starting point is 00:48:30 So those are kind of the two reasons why it was very natural for us to expand to those areas. So today, we have two segments in the business. One of them is the computer vision, which is the old one that we're continuing to growing there very fast with more and more customers as this side of the business is more mature in terms of market. The market maturity is higher. People know what they want, they know what they need, and they are more looking towards production use cases
Starting point is 00:48:57 and more and more multiple use cases per company and team. And in the journey, people are more early in the journey, more in an exploratory phase. And we believe that this year will be kind of the production year that more and more companies will try to get to production. So you can think about it
Starting point is 00:49:17 as tackling two segments of the market, which one of them is more mature and more business-oriented, and the second one is more exploratory at the moment. But both of them have a lot of potential to grow the business of the company in the long term. So I'm very positively thinking about both of them as being kind of two components of
Starting point is 00:49:40 an AI platform that is enabling the organization to benefit from the potential of AI. Yeah, thanks. And I think you're right. I think lots of people are expecting that, well, this year should be the year that actual deployments started to happen because yes, so far for many organizations, it has been mostly around experimenting and understanding their own needs and trying to find what works for them and so on. But hopefully this should be actually the year where the rubber hits the road, so to speak. And I guess you'll be in a good position to tell us how things have actually worked out about, let's say, about this time next year. Thanks for sticking around. πώς τα πράγματα έγιναν, ας πούμε, αυτή τη στιγμή το επόμενο χρόνο. Ευχαριστώ που παρακολουθήσατε. Για περισσότερα ιστορία όπως αυτή, εγγραφείτε το σχόλιο στο βιβλίο και ακολουθήστε την οδηγία κοινωνικών δεξιών.
