The Data Stack Show - 162: Accelerating Enterprise AI Transformation With Open Source LLMs Featuring Mark Huang of Gradient

Episode Date: November 1, 2023

Highlights from this week's conversation include:
- The potential of AI-driven applications (1:34)
- The need for hardware infrastructure in AI experimentation (2:40)
- Oligopoly on the closed side (11:50)
- Advantages of private side vs. open source (13:18)
- Leveraging valuable data within enterprises (16:00)
- The urgency of adopting LLMs in the enterprise (24:02)
- Expansion of LLMs into new business verticals (25:06)
- The challenges of operationalizing LLMs (29:32)
- Seamless experience with OpenAI (37:29)
- Operationalizing with Gradient (38:36)
- The early genesis of Gradient (48:53)
- The democratization of AI through endpoints (51:44)
- What is the future of language models? (54:07)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Welcome to the Data Stack Show. Each week, we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Kostas, this week's show is with Mark Huang of Gradient.ai. And I'm really excited about this conversation because I think it's a great example of the type of thing that will further accelerate the usage and
adoption of LLMs. So Gradient essentially takes open source LLMs, so let's say like Llama 2, right? You want to operationalize Llama 2 for some use case. They actually package Llama 2 into a service and essentially give you access to it via an API endpoint. So you get an API endpoint, and now you can literally send and receive data from Llama 2, and they take care of literally all
of the infrastructure, which is pretty fascinating. I think one of the big topics here is what this means for MLOps. But I think there are also implications for data roles that we sort of traditionally see as data engineering, data science workflows, that sort of manage this data lifecycle. And you're almost jumping over a lot of that, which is fascinating. Yeah, 100%. And unfortunately, I didn't make it on this recording, but I had the luxury to listen to the recording already. So I have to say that's like a very fascinating conversation that you had with Mark there. But yeah, like 100%,
like I totally agree with you. I think there are a couple of different things here. The first one is access to the technology itself, right? Which, I mean, just putting like a REST API there, like it literally makes, like lowers the bar of
accessing such a complicated technology so much that pretty much everyone can go out there and build anything. Just by being a front-end developer, you can go now and build an AI-driven application, right? Which is amazing in terms of
the potential innovation that can be created. But there's, I think, another factor there that many people might not think about, and that has to do with infrastructure and primarily the hardware infrastructure,
right? It is extremely hard today. For someone who wants to go and experiment and build around these systems, you just cannot get access to the hardware that you need to go and do that. So figuring out using a service that removes all the logistics around doing that, I think it's like an amazing opportunity, and it is one of the reasons that we see so much growth right now happening around AI, right? It's not just the technology, it's also how the industry managed to react really fast and start
delivering products so that pretty much everyone out there can go and start accessing these technologies, which is amazing. So let's go and listen to what Mark has to say. He's the expert here. And this is a very fascinating area. And we will have more guests around that because things just change too fast, right? So nothing is settled yet and many things will change. So we need to keep an eye on that and learn as much as we can.
Starting point is 00:03:49 So let's go and hear what Mark has to say. Let's do it. Mark, welcome to the Data Sack Show. Hi, Eric. Great to be here. All right. Well, we'll start where we always do. Give us your background, which is fascinating. Can't wait to dig into that. And then tell us what you're doing today with Gradient.
Starting point is 00:04:17 Yeah. So I'm the co-founder and chief architect at Gradient. We are a LLM developer platform. And I spent years all in the data space. Most recently, I was at Splunk working on streaming distributed analytics systems and machine learning there. And I also prior to that was a data scientist, sort of working with the business stakeholders, shipping machine learning and data science there. And I actually spent in another life half of my career as a quantitative trader at algorithmic hedge funds. I actually started my own in Hong Kong for about a year. Wow. That's wild. Definitely want to ask about that. How did you get into quantitative trading in the first place? Did you just start doing that out of school? Yeah, I guess I always had a bend towards statistics and data and always knew that it would be some intersection of the application of how do you leverage it? How do you get the most value? And how do you solve solutions and problems?
Starting point is 00:05:21 So going to the University of Pennsylvania, you Pennsylvania, half the class always goes out into Wall Street. And I ended up going into it, but I got way more fascinated about the technology and the methodology aspects of all of that. Yeah, totally. When we were chatting before the show, you mentioned that, I think you said Citadel won because they had sort of the best like end-to-end stack, which is really interesting. Can you explain that in sort of the like algorithmic trading world? What does the stack look like? Is it similar to sort of the stack that you would think about, you know, sort of like with a modern SaaS company or something? There's really interesting parallels. I think we're in an exciting time in AI.
Starting point is 00:06:06 So yeah, it kind of reminds me of how, as much as I can talk about how hedge funds work, but it's not really one thing. It's not one strategy. It's not one system. It's not one set of tooling and insight that leads to someone to win out in their market. It's actually being able to have that the most frictionless environment for any researcher or any new trader
Starting point is 00:06:34 to plug in and then actually leverage their strategy the best and being able to ship that and tweak it over time. And Citadel just really have one of the best systems in the world for that. And they've shown that over the last 10 years. So that's sort of my belief in the AI space too. We're kind of in an arms race where there is a need for the picks and shovels to be able to democratize that and actually help the enterprises that need to leverage it the most. Yeah. Were the friction points similar in like the hedge fund environment? I mean, you think about friction points in sort of algorithmic work, right? And you have like data inputs, right? Okay.
Starting point is 00:07:16 Well, that can certainly be a, you know, a friction point. You like actually running analysis, analyses, and sort of the time that it takes to do that. Are those the same types of friction points that you saw in the hedge fund space? Yeah. I mean, it's almost me coming from that space and going into the SaaS software space felt really natural for me because it's almost like doing the same type of things, right? Taking a ton of data, figuring out how to explore it, figuring out how to clean it up. And then how do you scale out the pipeline that it needs to put it into production?
Starting point is 00:07:54 And those are entirely the same on both sides of it. It's just like, what's the product? The product is different, right? At a hedge fund, the product is just the returns that you can give your investors and then SaaS businesses. It's like, how do you sell your software so other people can get their problem solved by what you build? Yeah, totally. And what motivated you? I mean, starting your own hedge fund is pretty wild. What was the point? Do you remember the moment where you said to yourself, I think I'm going to start my own hedge fund? I think it was a brash decision, to be honest.
Starting point is 00:08:32 It was feeling a little bit like the industry was ripe for a little bit of disruption, particularly. I went over to Hong Kong on the emerging market side, there was a lot more opportunity and being a young, you know, 25 year old wanting to just start a new venture. And just not knowing, like, it wasn't clear to me the exact problem I was going to solve with respect to that. But I knew there was enough interest there to allow me the opportunity to explore it. And I reflect on it, and I have this list of things I would have done differently. And coming into Gradient, it felt a lot different. It felt more so the timing is now, these are the ways that I approach
Starting point is 00:09:17 it. And you have your initial plan that obviously changes, but you have your initial plan at least. Yeah, totally. Did you, in starting a hedge fund, when we think about going back to the most sophisticated end-to-end stack that Citadel has, for example, did they have economies of scale on the tech side that you didn't have as a you know a new sort of startup hedge fund i think that's absolutely true and in effect you you actually see that you know particularly today where there's sort of this gravity effect towards some of the largest fund managers and and and being someone where you know you could have actually the same exact set of strategies and the same model predictions coming out,
Starting point is 00:10:10 but because you plug into their system, they have everything more optimized. The transaction cost analysis and the way that they're able to execute the trades and the way that you're able to get feedback from all that, that you just can't get where you have to roll out your entire stack, right? It's like developing software all over itself. And we all know that's a business in itself. Yeah, yeah, for sure. No, that's just infrastructure as an advantage,
Starting point is 00:10:36 especially sort of the end-to-end. That's super interesting. Okay, I want to switch over to talking about Gradient and LLMs. But what's interesting about Gradient is that you focus on open source models and private models and sort of enabling those things. But before we dig into that, could you give us a 101 on the model landscape? And I mean, I know a lot of our listeners are familiar with AI. I know we have a lot of listeners who are probably working on AI stuff, but we also probably have a lot of data engineers or analysts or people around the space who have heard of these things, have maybe read about them, but maybe they don't have a wide view of the horizon when it comes to the different
Starting point is 00:11:26 options that are available, right? So of course, you know, OpenAI and ChatGPT, you know, most people have heard about that by now and have, you know, prompted ChatGPT for something. But when we sort of go a layer deeper and think about, you know, LLMs as an entity, what are the options out there? Give us a sense of the landscape. Yeah, absolutely. So, you know, on the closed side, there's a few major players and it would be OpenAI, Anthropic, Google has their own and then Cohere has their models too as well. And there's kind of that oligopoly on that side of things. But then on the open source side, it's actually really interesting because you just have these
Starting point is 00:12:15 open source models, right? These are bases to be able to actually train more data on top of. And Meta has been pushing forth the open source constantly for that. So everybody has heard of Lama and Lama 2. And it kind of felt like the moment Lama 2 came out, there was an explosion of builders, like these AI builders in the open source who are releasing their own models off of the base and all the hard work that Meta had. So amongst those choices, you get models like Lama 2, you have Code Lama, you get a few of the other ones from Hugging Face like Bloom, and everybody's sort of taking all of these democratized foundational models,
Starting point is 00:13:00 which is what they're called, to be able to build on top of. Yeah. Yeah. That makes total sense. What creates the oligopoly on the private side? Why is there such a huge concentration there? Is it just access to additional resources? Yeah. I mean, I think if you can raise a billion dollars early on, then you can basically get that, right? That's the gravity. That is the gravity that is pushing the oligopoly on the closed side. It's effectively very concentrated groups being able to raise a lot of money
Starting point is 00:13:34 and basically be able to fund the research and the compute that's necessary to effectively create these state-of-the-art models. So on the open source side, it's a bunch of people collaborating together with the democratized access. And I think that's where I'm very excited about. Yeah. So let's talk about the... Of course, it sounds on the surface a little bit of a David and Goliath type story. That's obviously an imperfect analogy.
Starting point is 00:14:06 But can you explain maybe the advantages from your perspective of the private side that's, you know, you can sort of do infinite training, you know, access to infinite sort of hardware resources, all that sort of stuff. What does that get you and then what does the open source approach and sort of the collaborative approach get you like what are the advantages of each approach i think when it comes down to it a lot of ai has has the principled aspects of machine learning and data science. They're still the same where basically it's about the data. So on the close and the private model side with OpenAI and all those folks, they've taken a lot of care in curating the data, cleaning it in, and making sure that they can train
Starting point is 00:15:04 these extremely large models. Sure, they have all the compute, but if you look at all the model architectures, if you look at all that PyTorch code and look at all the different architectures of the new models coming out, there's not a lot different, actually. They're just trained on the data across each other. And even more so the fact that enterprises may have some of the most valuable data out there inside of themselves, but they're sensitive to it. So then be able to leverage it is going to probably take us into kind of the next era of what I see in model development. Super interesting. Let's dig into that a little bit. So when you say enterprises have some of the most valuable data within themselves, give us an example of that. Yeah. I mean, Stripe, for example, they have all this transactional data that they couldn't ever release. They need to be incredibly careful about it. And it probably greatly outstrips all the transactional data in the internet, because that's just so sensitive, right? So from the standpoint of, you know, some of the products they create, I believe it's Stripe raised right radar which is an anomaly detection service that they have internally and all the modeling that can be done with that data you know they're basically sitting on a corpus that the entire world can't
Starting point is 00:16:37 see and other companies have the same exact aspect there too as well. And particularly even governments and healthcare, right? Like those are kind of some of the hardest places to penetrate into for AI and services. But, you know, you see this pressure for adoption and a lot of them are starting to, you know, become a little bit more open to adopting AI within their enterprises. Yeah, that makes total sense.
Starting point is 00:17:08 Let's talk about the data and the comparison between, you know, so obviously, you said open AI, like one of the advantages is just massive amounts of data, carefully curated and prepared, you know, so that the model can produce outputs that are, you know, highly tuned. Even if you think about an enterprise or think about Stripe, right? It's a much smaller data set relatively, right? That they would be training a model on
Starting point is 00:17:39 than say like what you put into, you know, open AI, right? But the data set's also very different, right? It's a like homogenous data set, you know, that generally sort of represents like a data structure that's very similar across all of the data points within it, et cetera, right? Whereas, you know, open AI is sort of, you know, a huge variety of different types of data.
Starting point is 00:18:06 Can you talk through that tension? Because you said a model is only as good as the data you put into it. One of the aspects of that is scale. One of the aspects is quality. When we think about an enterprise leveraging an LLM to get more value out of the data that they have? Are there thresholds or economies of scale that they need in order to sort of actually operationalize, say, an open source LLM? Yeah, I think that's sort of our, you know,
Starting point is 00:18:37 our gradient, what we sort of believe in is that it's not one model that's going to be sitting in an enterprise. So it's going to be like a thousand models. Like what does the world look like when these large enterprises are launching a thousand, they have a thousand models inside their enterprise that are helping them either improve productivity, operationalize their work, or actually even become user facing. And what they need is to be able to have like custom models that are really good at what they need to be, right? Like from a business standpoint, a lot of times, you kind of know what you want. You want this model to do X, Y, and Z. And then
Starting point is 00:19:20 everything else, you know, it's fine if it's interesting that it's really good at, but what you just need is just the new model that will do really well on that specific task. So if you can take, you know, that subset of data, you actually don't need so much of it. And you can actually, you kind of call it, we call it fine tune your base from Lama model to like do that particular task better. Yep. Do you see,
Starting point is 00:19:50 I want to get into the fine tuning and the specifics of gradient in a minute, but I'm interested to know, you know, from your perspective and what you're hearing, you know, from people using gradient, especially in the enterprise, when you think about the,
Starting point is 00:20:03 like an enterprise adopting technology, you know, there's, there can be different rationale for the type of technology that you adopt, right? So, you know, going back to the old adage of like, no one ever got fired for buying IBM, you know, you sort of have like a large, like well-established company, you know, with tons of resources, right? And so maybe that's one of the members of the oligopoly that's, okay, this is stable, supported, you know, all that. But then also there's the portability and sort of flexibility and lack of vendor lock-in that comes with open source. What are you seeing in the enterprise of these companies who have this really valuable private
Starting point is 00:20:45 data i mean both of those are rationale that people have used to adopt new technology in the enterprise yeah i think that you know our observation is everybody will have tried open ai first it is a fool's journey to think that you'll get in as the vendor before them. Yeah. But what we've noticed is effectively almost everybody says either we need a complementary solution to them, like we want to iterate ourselves and learn a little bit more how to develop these models for our custom use cases, or we know we need to move off of them, which is kind of interesting as it's almost so new in developing the technology and people already kind of planning for the future. So yeah, they want the, they, the first thing they ask us before I even opened my mouth to say that
Starting point is 00:21:37 we have the access to the open source models or they say, so do you have Lama too? Do you have code Lama? Like That's the first thing they ask me and I'm just like, yeah, we have that. That's the point of us, to be able to give you the guys the access much easier. Super interesting. Okay. Another question about the enterprise. This is just fascinating because I think things are changing really quickly. So the insight is really, really helpful here. Inside of enterprise organizations, the, and I want to talk about Gradient's API. But, you know, previously, or let's talk about maybe even going back to the, when you were working in hedge fund world. You have dedicated infrastructure for algorithms, bespoke development of these models that are driving things,
Starting point is 00:22:33 and just massive investment in these teams and infrastructure that it takes to actually do this. Now we're seeing that become democratized, of course, through the oligopoly and through open source. But enterprises aren't necessarily the most quick to change. Is this actually accelerating organizational change inside the enterprise? I've actually been shocked to see how quickly enterprises are trying to adopt an AI strategy and also how quickly they are willing to talk to someone like us, right? Gradient is a startup and why should we get in the door for a conversation when they have been trusted partners with many other vendors?
Starting point is 00:23:21 But it's the fact of the matter is they see our ability to deliver the solutions that they need. And then they have the pressure to want to adopt AI that I think that, yeah, it's kind of the first time I've seen something where they, in a sense, put the cart before the horse where they really want someone to help them adopt it. They want to know, hey, can I automate this? Can I do all these things? And what do you offer to help me do that? Rather than asking where the other vendors sit, they just know that they need something really badly. Yeah. I believe that a lot of people believe, and of course I do as well, that this is such a monumental, this is a huge step. And I think this is a sea change in sort of
Starting point is 00:24:15 all sorts of things, right? That LLMs are going to drive in terms of even just baseline productivity every day, but like customer experiences, I mean, it's going to impact so many things. But is that urgency in the enterprise, almost like a FOMO? That's probably too informal of a term. But, you know, I mean, is that like a we have to figure out LLMs or we're going to fall behind? Or are you seeing a lot of companies be really strategic and know exactly how they want to deploy it or leverage it? So you kind of see both. I'm not going to lie, right? Yeah. If open AI can almost force Google's hand, then you know that AI is, there's a little bit of FOMO effect across all enterprises, right? Being afraid that you'll fall behind. But then with some of the enterprise
Starting point is 00:25:11 customers we work with, particularly on the automation side, what's interesting is that people are looking to expand themselves into other business lines by getting LLMs in there. So like on the auto, you always saw like the UiPaths of the world do like process automation and the automation was sort of the first iteration of that. But then what happened next these days is what I view is they take that process and now they're generalizing to other business verticals where now the flexible LLM is going to open up new doors to them. So they're viewing it as a revenue generator more than anything. Fascinating. And you mentioned vendors. How many vendors are there out there? I know that sounds like a dumb question, but of course you know, the oligopolies.
Starting point is 00:26:05 You have a handful there of like the big, well-funded private ones. But like how many vendors are there? And I mean, it seems like they're sort of cropping up every day. But is it a pretty fragmented landscape already? You know, you kind of if you take the thousand foot view, you mostly you can't miss the oligopoly of the super large research institution companies such as OpenAI or Anthropic. And then on the smaller side and you kind of have the open source side, Hugging Face kind of stands as the monolith there. But then you have all the other smaller vendors around. And I think that the space is definitely more fragmented than you would think. But it's interesting to see, like the problems are the same, which is, you know, why we sort of set out to work here is, it's just not easy to get started, to get access to these models and do the things and build and customize them for like a single developer. And it's even harder for enterprises to get started on it from lack of
Starting point is 00:27:07 knowledge or an inability to scale beyond these vendor solutions for the workloads that an enterprise actually expects. Okay, one more question before we dive into gradient specifics. Sorry, I've been keeping thinking of interesting questions to ask. How is this impacting people who have worked in ML in the enterprise for some time? Because to some extent, you almost see a leapfrogging of, let's call it sort of the traditional like ML process or workflow, right? I mean, it's an API now, right? Which I want to dig into, but that's pretty significant, right? For sort of traditional ML teams who are running, you know, full end to end, you know, especially even like sort of the on-prem like ML ops infrastructure. Is that shifting a lot in the enterprise? I think that's probably the main
Starting point is 00:28:07 observation we made. And it defined, honestly, it defined the way that we released our product. We made sure to learn from the OpenAI release, which was that it was the easiest way, it was the easiest interface for anyone to get started. So us being like, you know, web APIs to call on models and to run their fine tuning and to run their completions and inference on top, like that was the way that we wanted people to experience it and unlock, like just basically remove all the developer friction to be able to do. Okay, can we, let's all the developer friction to be able to do. Okay.
Starting point is 00:28:45 Can we, let's talk about developer friction and this is a great way to dive into the specifics of gradient. So let's talk about Lama two without gradient. If I just, you know, I'm at company X and I want to go use Lama to operationalize whatever it is we're trying to do with LLMs, a recommendation or whatever we're trying to leverage it for, right? Information
Starting point is 00:29:15 retrieval for our users, whatever app we're building. Can you walk me through, what do I need to do to operationalize that just on my own sort of hand rolling it and then talk us through what is that experience like with gradient yeah absolutely so well first thing you're gonna do is you gotta call your aws or gcp you know salesperson tell them that you need a reservation on a bunch of A100s or other GPUs, he's probably going to make you pay three years up front in order to
Starting point is 00:29:52 run your development tests because you just can't get them. Is the demand really that high? Yeah, I would say I've never had so much trouble just trying to get hardware in my life. Maybe we should go just sell...
Starting point is 00:30:08 Maybe we should go be hardware salesmen. I mean, I think that's kind of the funny FOMO effect of a few companies these days. We decided to build software and there is a bit of FOMO on like, hey, the hardware could have been pretty nice business too, right? Okay. So sorry to interrupt there. Yeah, for sure. That just struck me.
Starting point is 00:30:28 Okay. So I'm calling up AWS and I'm going to pay three years up front for the hardware that I need. Yep. Okay. All right. So from there, you're going to need to, depending on how large the model is, like we support on our public interface, the 13 billion parameter model.
Starting point is 00:30:50 And in enterprise, we support the biggest one, the 70 billion parameter model. So you're going to have to learn a little bit of distributed computing, understanding how to distribute this model across multiple GPUs, do a lot of load testing in terms of, you know, any time that you send a piece of data into it, you will come across in your tests, the dreaded out of memory error. It's called the OOM, right? And on the GPU, unfortunately, it's kind of a one way door. It OOMs and you've got to restart the entire system.
Starting point is 00:31:26 So you go through that type of pain in operationalization of that. So you have to build out that system. And then now, probably one of the hardest aspects that we've worked a lot in trying to facilitate is having a system that can run at high concurrency, like having a lot of users come in. And then even beyond that, how do you ensure that every user can do training and customization of their model without blocking everybody else? Because as it is, you're effectively pushing these GPUs to their memory limits. And having like 10,000 users come in one day to try to throw their data at this model and drain it is going to cause system outages and slowdowns in latency. So building out that entire infrastructure being highly available, concurrent in low latency and having, being able to handle the micro batches.
Starting point is 00:32:30 Those are all the things that you're going to have to think of in a system standpoint. So. Well, that's like DevOps and SRE. I mean, you're just talking about like you haven't even really gotten into like working on the model itself yeah like that's just to be able to run a singular experiment in the friction to be able to you know what i like to say is like you know play around with these models so you have to go through all that in order to get the harness that you need to be able to run your experiments. And then to actually start operationalizing different experiments on top,
Starting point is 00:33:08 it's like you have to grab the data, you have to make sure that you can format the data in the correct format to send to these models. And then you have to also ensure that, hey, what you're doing is going to be available to everybody to kind of perform those tests too. And then when we talk about outputs and operationalizing those, how much further of a push is that
Starting point is 00:33:35 to sort of actually deliver those outputs, you know, in some sort of experience downstream, right? Because, I mean, even if you, so you get your experiments up and running and let's say you have, you know, you have this thing running, you know, running in a way that it's actually operational. Well, then you actually have to sort of package the outputs and then deliver them as part of whatever downstream,
Starting point is 00:34:00 like experience that you're building, say, right? So like essentially delivering the output as a user experience, right? I mean, one way to sort of operationalize an LLM. What does that process look like? Maybe let's say last mile, right? If you're trying to build a product for your end users that leverages the output of the LLM. Yeah, I think from that aspect, let's say you have this system
Starting point is 00:34:27 and you can have it highly available. You containerize it. I don't know. Some people like Docker. You can use whatever type of virtualization or containerization you want. And part of having the model actually run in production is being able to send data, questions, prompt, they call them prompts, right? These are language models, you send your prompt in, and it gives back a response. effectively an event-driven service to handle that, the last mile for us was actually the hardest part for making it available to users. Like being able to have the inference calls correctly on the custom models that people have trained and then making it so that they can spin up new models and try those out and maybe even ensemble the models and deliver it into a product through one endpoint. Interesting.
Starting point is 00:35:36 Can you dig into the inference piece of that? You said that was the hardest problem. Why was that the hardest problem? Is it because you actually had to build that essentially an event-driven system that could process those things? Of course, you probably need to manage ordering and all of those different things. Can you dig into what made it hard
Starting point is 00:36:03 and then how you solved it? Yeah, it's a lot of the same things I sort of alluded to earlier with respect to, unlike the CPU where you can process data and, you know, you're able to kind of allow the different batches and queuing systems to handle the data. So even if the hardware goes into some weird state, you're able to recover from it. On the GPU, the moment you get data that's too large, for instance, people talk about context limits in these models. So suppose you're still under the model context limit, but your system context limit is much smaller. And you have a user that brings in a really long set of text and having it not take down the entire server for the other 10,000 users. That's a problem in micro-batching. able to deliver and serve that request in a reasonable amount of time, like being able to distribute that workload across multiple GPU chips and handle the fact that at the same time, someone actually might even be training their model on top of training their
Starting point is 00:37:21 data on top of the same model. And having that all available to all the users, all at the same time, we've really only seen that with OpenAI. They're one of the vendors that has delivered that experience, right? And we all kind of know how popular they are due to that seamlessness. Yeah. Yeah, it is actually... You know, it's so crazy to think about, like, ugh, I'm getting this timeout on, you know, GPT-4 API,
Starting point is 00:37:54 you know, and it's like... Thinking about what's going on under the hood is insane, right? I mean, the amount of, like, requests that they're handling is pretty wild. Okay. So this is interesting. So I'm thinking about being in an enterprise. I've tried one of the big players, you know, big private vendors. Okay. It gives me an idea of what I need to do, but I need something more bespoke that I have more control over. And then now I'm at a point where I realize, okay, well, actually hand rolling this myself is a way more of a DevOps and sort of, you know, software, like, you know, high transaction,
Starting point is 00:38:42 super low latency software engineering challenge than I thought because I just want to use an LLM to deliver this product. And so I come to Gradient. Okay. So walk me through operationalizing it with Gradient. Well, all you got to do is go to our website, gradient.ai and click sign up. And from there, we have $10 of free credit so anybody can try it actually right at this very moment and you just need to create an account download either our sdk or cli for better experience you you honestly you can just use python and hit it through crow or use crow wow it's request based And you just hit our endpoints
Starting point is 00:39:26 and you run your completions on it. You can try the Lama 2 right now and ask it whatever questions you want and also send data in to train that model and get your own custom model out of it and be able to see what the change was. The funniest kind of use case that we have internally is someone's trying to train a model on Rick and Morty scripts, like making a Rick bot.
Starting point is 00:39:55 So yeah, we're just playing around with it. It's pretty hilarious, but it's kind of that type of frictionless experience. Don't have to think about the infrastructure. All it is to you, it's just a product. It's a service that should give you answers or completions whenever you hit the endpoints. Super interesting. Okay, so can you talk about your SDKs a little bit? So what is that? So let's talk about the, I mean, obviously you can hit it with curl, just write some
Starting point is 00:40:28 Python to ask it a question. But if we're talking about, you know, putting this thing into production, what's the SDK experience like? It's a lot like OpenAI to be honest. So basically with your token, right, you will have access to our endpoint and all you have to do is you point it to the name of the model that you want. We're going to be supporting many more models, but to start, we mostly just have the Lama 2 flavors of the models. Yep. And then you have a text string. So you import the Python library, do gradient.ai, and then give it the model name and then type
Starting point is 00:41:13 complete, because it's a completion technically, and then some sort of question. Let's say you've already trained that model on the Rick and Morty scripts that I was saying, like you can be like, hey, Rick, why are you so mean? And then it'll give you back a tongue in cheek response. So that's the, you know, kind of the experience there. And with the SDK, you can do a lot more interesting things. I think we have some notebook examples out there where you can actually build entire systems on top. Three-volt augmented generation is a topic that a lot of people are talking about to be able to build effectively a knowledge base that you can ask questions on and prevent hallucination of the model, the model telling you things that it's just making up. Yep. And okay, so that's the SDK side.
Starting point is 00:42:07 When we talk about like, you know, feeding the model of data to train it, what is that experience like? It would just be a, for us currently, what we support is just a list of JSON. So you just send it a JSON that has just a, it's a string. So you send it a string of questions and it should give you you have question query response pairs for the model and you just send it into our
Starting point is 00:42:42 endpoint and the model strain just straight off of that. And it's sitting in that server, governed access for you, and you can run your completion on top of that. So you don't really have to save anything. You don't have to think about it. Oh, interesting. And we give you the model ID and the version. So conceivably, you can build a ton of models and actually have like this system or ensemble models in whatever product or system you want to build.
Starting point is 00:43:10 Interesting. And do you, so that's interesting on the versioning, because I would guess that's really interesting for companies who want to sort of tune these, right? So do I access that metadata like via the SDK or the CLI? Yeah, that's exactly how you would do it. CLI experience is very slick. It's very easy to do. And then the SDK gives you a little bit more power and slightly a little harder, but it's
Starting point is 00:43:40 like one line extra. Yeah. Okay. And then do you keep a history of the sort of versions and the like the versions of the model that have been run? So like, is there a history there or is it just sort of one to one? Like I'm getting the metadata back and I would need to store that. So long as the user currently our experience is so long as the user doesn't do a delete on that model ID is it'll exist. Interesting. They were able to just grab it.
Starting point is 00:44:15 So, you know, we had a user the other day's building. He wants to build a chatbot service and he wants the interesting use case that our technology was uniquely suited for is like he wanted every user to have their personalized chatbot experience so he wanted to launch a thousand models on our service and asked us if we could handle the load and i couldn't actually tell him at the exact moment if we could, but apparently we can actually do it. He was able to just launch all those models in a for loop and just get them all and try to create that experience for all the users. Wow. That's super cool. Yeah. I'm just thinking about situations where you may be bringing new people in to work on models and sort of having access to that historical metadata is really
Starting point is 00:45:04 interesting, right? Because you may be looking at output, you know, that was produced previously. So that's super interesting. So you said, you know, a lot of Lama 2 flavors. What other models are you excited about bringing into the Gradient platform? Yeah, I think we focused in, that was intentionally, thus far we have all the text modalities, so everything is sort of natural language, but I'm particularly excited
Starting point is 00:45:40 to start supporting the multimodal models, like the text-to image or like image synthesis and taking in images and kind of the one-to-many modality models, like text to audio, text to image, text to video, like all those types of flavors of things that I just think are getting unlocked at such a rapid pace, right? It's actually hard to keep up. And being able to have those up, I think will bring on even more experiences and bring in a lot of different types of people that want to work on these. So I'm really excited to start doing that like in the next quarter. Yeah.
Starting point is 00:46:22 Okay, let's talk about fine tuning. And so, you know, we think about, you know, sort of, okay, big vendor, private, you know, sort of get what you get. I want to actually do something myself. You know, so I start using gradient, but I want to start fine tuning this model. What does that process look like? And then also maybe like, I want to use my tuning this model what does that process look like and then also maybe like i want to use my own private model not necessarily like one of the llama flavors because that's possible too in gradient right is using your own private sort of operationalizing your own private model yeah so on the enterprise side we allow folk we do support people bringing their own models
Starting point is 00:47:03 and putting it on top of our platform, depending on how compatible the model architecture itself is to our platform. But you can bring in your own model and try it out. And also, if you want to just take the open source model and train it on your own data, it's yours. We don't keep any of the model weights, unlike the closed source vendors. And then particularly for the closed source vendors, you'll see in the licensing that you don't technically own the model itself, but you own the completions of the model. So I don't know the implications of that, but all I know is that we just allow you to have it, right?
Starting point is 00:47:51 Like it's yours. If you want to download it later in the enterprise, we just allow them to have the government access. Yeah. Yeah, that is super interesting. That's like a interesting way to achieve lock-in.
Starting point is 00:48:05 Yeah. You do all this work and then like you can only, you basically like fund putting all these parts, you know, into the machine to make it run the way you want, but then you can only get the output. Yeah. Super interesting. Okay. I want to rewind just a little bit. And I didn't ask you about this specifically earlier, but when did you decide to start Gradient? What was the moment where you said, okay, I want to... I mean, thinking about companies who want access to the open source, or even just this is infrastructure that allows me to just have an endpoint to operationalize my own model.
Starting point is 00:48:46 Super compelling. Was that the original idea? Can you tell us about the very early genesis of Gradient? Yeah. I think the early genesis for us... So I come from a machine learning background. My co-founder, Chris, who I've known since freshman year of college, who I trust a lot, he worked at Netflix. So you can just imagine the experiences that they were there. And he was head of sort of the for the enterprise and anyone in general like users of these platforms they just had so much developer friction to be able to use ai and machine learning and then the other part was it just never felt quite right in terms of, hey, I want to go from A to B, right? Something that you talked about was it's almost like my mom or my dad, they wanted to say, tomorrow I had this business and I think I could do better
Starting point is 00:50:13 because I have all this data and I want to improve the product, right? Like that's almost the thing that was always missing. So through, you know, a few iterations of what we wanted to develop, we finally landed, you know, like you were saying, kind of this wave of this unstoppable wave of, you know, the release of chat GPT, that this was the right experience. Like this was exactly what
Starting point is 00:50:40 we needed. And luckily to, to get it out there, to like bring it to people, explain it like I was five, right? Now we didn't have to explain to everybody like they're five, like they already understood it. So, you know, that was the decision point. And we got the company sort of gone through the same vision that we thought it would kind of get to, but maybe, you know, did we expect it to be purely large language model-based? Not necessarily, but it was always going to be kind of an experience,
Starting point is 00:51:12 an endpoint experience for everybody to be able to adopt their AI. Yeah. I mean, it sounds like it's so funny, but yeah, I think, you know, obviously chat GPT, or at least in my opinion, will go down as one of the most high impact interface decisions, you know, in tech history for sure. And what's interesting about it is hearing myself say like, okay, you just, you have an endpoint, you know, to operationalize your LLM, right? It's that easy. Five years ago, it would have sounded like a massive enterprise project, etc. But ChatGPT makes that make sense, right? Because an endpoint is essentially just a way to exchange information via a very simple interface, right?
Starting point is 00:52:02 And so it is interesting that chat gpt like just created a context you know for that which is fascinating yeah i think i think it was sort of that shift too from the standpoint of which we found our icp was actually the app developers right like those were the people like oh wow yeah yeah like i have a lot of empathy for the deep learning scientists because that's sort of what I worked on for many years in machine learning and understanding all the interesting aspects of that. But then you kind of get towards the people that want to productize it. And then you realize that that's what our product was made for. It was made for those people to be able to easily embed it into what they needed and not actually need to know how it works. Like when you build a distributed system, everybody sort of says,
Starting point is 00:52:57 you know, you wish you would have never done distributed systems at all because it's just such a thing. Yeah. Yeah. No, I think that's a really good thing, right? Is that the democratization of this, giving people an endpoint that allows them to... I think about it as unlocking creativity really, right? So we you know, we kind of talked about like jumping over the ML, you know, the entire ML team and literally giving an app developer an endpoint and saying, look, you can create experiences that weren't possible for you to create before. And all you have to do is sort of leverage this endpoint, right? And it's easy to actually put it into your app. Just super cool. I think it's going to unlock and already has just huge amounts of creativity, which is super exciting. Okay, we're close to the buzzer here. One more question. What excites you the most about the future of
Starting point is 00:53:59 LLMs and what scares you the most about it? I think what excites me is seeing what's going to happen with people running like their thousand models, right? Like I kept on putting inside of the enterprise and like, what is that going to look like to combine the sets of data that already exist out there? So you have all the databases and all the database companies in the world, and people were used to structured data, and then LLMs were mostly unstructured data. And what is it going to look like when you have entire stacks built out with the gravity of all the data coming into like these language models or these, you know, the embeddings models for vector search. And what, how is that going to transform
Starting point is 00:54:52 what the enterprise stack looks like? I don't have an answer to that. And it's something that you just kind of, every time I talk to another client for that, it kind of looks a little bit different, which is really cool. On the side of what scares me are, you know, there's a few things I think people do talk about it's more so that the education and understanding of like how do you actually govern access and how do you make the models do what you think is safe so like alignments and all that like alignment is a huge topic in the industry and additionally the most the maybe part of why we started the company is sort of feeling like, now you have these LLMs. Billions of dollars are being put into AI in these models.
Starting point is 00:55:53 And the infrastructure is still built on a house of cards. It just feels so brittle. And that kind of terrifies, you know, over time, it terrifies me that you have all this and the experience and the frictions need to be made better. But I mean, plenty of companies out there are trying to work on that, ourselves included. I think that'll eventually mature. So those are all interesting things. I think it's super exciting time. Yeah. Well, Mark, this has been absolutely incredible. The time flew by. I feel like we started talking five minutes ago, but I guess it's closing in on an hour. But congrats on Gradient. Such a cool product. I can't wait to go actually sign up and use my free $10 and send a curl request.
Starting point is 00:56:41 So yeah, congrats again. And thanks for coming on the show. Amazing convo. Yeah, appreciate it. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
