This Week in Startups - Highlighting Data Intelligence with Databricks & Bonbon’s Reward Innovation | E2020

Starting point is 00:00:00 I'd have a lot of conversations with my internal team as well, like my co-founders. And the framework we used internally was: which path allows us to influence the world more? We're all academics. I care about this field. I care about technology. We care very much about how AI is going to impact humanity. There's a large component that's extremely mission-driven here. You know, I would say not all startups form that way. You know, if you have some SaaS startup that's building a cool little feature, you're really trying to make money, which is fine. There's nothing wrong with that. I have no issue with it. I don't think you're really thinking about humanity so much. And I'm not trying to be on a high horse or anything like that. I'm just trying to

Starting point is 00:00:36 really say that we cared about this. That was our initial motivation. This Week in Startups is brought to you by OpenPhone. Create business phone numbers for you and your team that work through an app on your smartphone or desktop. Twist listeners can get an extra 20% off any plan for your first six months at openphone.com slash twist. LinkedIn Jobs. A business is only as strong as its people, and every hire matters. Go to LinkedIn.com slash twist to post your first job for free. Terms and conditions apply. And beehiiv. Power your newsletters with AI tools, referral programs, and ad network features, all in one platform. Get 30 days free and 20% off your first three months at beehiiv.com slash twist. Hey, everybody. Welcome back to This Week in Startups.

Starting point is 00:01:28 My name is Alex, and we have a very, very excellent episode coming your way. Databricks, as you know, is one of the most valuable private-market technology companies in the world today. It last raised $500 million in a Series I that gave it a valuation of around $43 billion. But also, in roughly the last year, the company bought MosaicML. And before that deal was actually put together, announced, and then executed, we had Naveen Rao, the CEO and co-founder of that company, on the show. Well, I have dragged Naveen back because we are now over a year past the acquisition. Much has happened in the world of AI, especially on the open source front.

Starting point is 00:02:06 So please welcome back to the show Naveen Rao, now from Databricks. Naveen, hey, how are you? Great. Thanks for having me on. Cool to close the loop a little bit: last time we were in the midst of the acquisition, and now we're on the other side of it. So, all right, let's start there.

Starting point is 00:02:21 So, a $1.3 or $1.4 billion deal. How long did it take to go from you and Ali Ghodsi talking to when the actual paperwork was signed? Well, I guess the first conversation was probably the end of March, and then we signed the deal, uh, around mid-June. Actually getting to the point of, here is what we want to do, the concept of, yes, we want to work toward an acquisition, was probably early May. I think it was 62 days from that conversation to the actual close, which is kind of record time. Yeah, that's a lot of work to do in that amount of time, because you were a venture-backed company.

Starting point is 00:03:01 So there are a lot of parties involved, a lot of signatures. Totally. I have to ask, though, because I was just going back through Databricks's acquisition history, prepping for our talk today. And the more I think about how MosaicML fits into Databricks, the more I think you guys are super important, because Databricks, of course, built on the open source Delta Lake project and really popularized the lakehouse format for BI across structured and unstructured data, as everyone knows.

Starting point is 00:03:25 And then you guys brought the ability to train, essentially, AI models for corporations that have data, which nests very neatly on top of what Databricks had already built. So a pretty core component of what Databricks is today. And given where prices are for some AI companies, Naveen, did you perhaps sell for a little bit less than you otherwise might have? I know it's rude, but I'm curious if you stay up at night thinking about the yacht you didn't buy.

Starting point is 00:03:52 I mean, we're doing just fine. Honestly, the math was about: can we come together with Databricks and actually make the whole better? That was the way I looked at it. And honestly, the valuations are high right now, but, you know, they're somewhat paper valuations. I had a conversation with Ali beforehand. It's like, really, the valuation and stock price of Databricks is strictly

Starting point is 00:04:13 less risky than Mosaic. So we're moving from kind of high risk, you know, very, very high beta, to something that's slightly lower risk, probably still high beta as far as public stocks go. But, you know, we saw that there was an opportunity to make the whole better. And really, that's what the thesis was. I can kind of walk through the logic. It was actually quite simple. If we went at it alone, you know, things were growing really fast. I was looking at a three-year horizon, and it was like, can we get to, call it, $300 million in revenue?

Starting point is 00:04:46 That would put us in the, you know, $5 to $7 billion range as far as valuation goes. We'd probably need to raise a round, a round and a half, to get there, so there's going to be some dilution taken. So that's scenario one. And that has certain risks associated with it, of course: the market, what happens with AI, etc. Then there's Databricks, where basically I asked, in that same time frame,

Starting point is 00:05:08 can we double the price? And I think that doesn't sound crazy, right? In three years, can I double the price? And then we're actually at the same place in terms of how much our stock is worth. So I looked at it from this perspective, and the Databricks path was lower risk. And really, the first conversation I had with Ali about it was

Starting point is 00:05:25 this all sounds great. I'd love to do it. The economics just have to make sense. Those numbers have to line up, right? And so we got there. Yeah. Well, this brings us to founder mode. You had a tweet from a couple of weeks back where you said, "The funny thing about founder mode is that if you have to tell everyone you're in it, maybe you aren't." I found that really funny.

Starting point is 00:05:44 But I do think that you have to be in founder mode to get a deal done in 62 days from start to finish. Did you have to jostle investors a little bit to get them to agree to this, or were they pretty on board from the start when you put the deal together? Yeah, that's a great question. Literally every one of my investors wanted me to keep going. Really? Yeah. And it wasn't in a bad way. I wouldn't say this in any negative way. I've known many of my investors for a long time. And they were like, look, let's go at it. Let's not sell. Let's go. Right. They were like, if you want to do this and you think it's the right

Starting point is 00:06:25 path, we'll support you. But we're totally all in to keep going. So I thought that was a great message. And really, it was hard, because I'd have a lot of conversations with my internal team as well, like, you know, my co-founders, and the framework we used internally was: which path allows us to influence the world more? That was it. Do you think that perspective is the right one for founders in general to take as they consider a possible acquisition offer? Or is it more of a MosaicML-specific perspective? I would say it was definitely a Mosaic thing. We're all academics. I care about this field. I care about technology.

Starting point is 00:07:07 We care very much about how AI is going to impact humanity. So there's a large component that's extremely mission-driven here. I would say not all startups form that way. If you have some SaaS startup that's building a cool little feature, you're really trying to make money, which is fine. There's nothing wrong with that. I have no issue with it. I don't think you're really thinking about humanity so much. And I'm not trying to be on a high horse or anything like that.

Starting point is 00:07:30 I'm just trying to really say that we cared about this. That was our initial motivation. Well, I mean, your comment was whimsical, and, you know, it had a little edge to it. But I don't think you're off base. And I think what we've learned watching SaaS companies as a specific cohort over the last two years is that if you built a SaaS company that was essentially one very strong feature and raised money in 2021, you may not survive in 2024. So I don't think your comment, even though it was slightly sharp-elbowed, was off base. Now, on the mission point, though, this is a good thing to touch on, because Databricks, of course, has an open source heritage, if you will.

Starting point is 00:08:07 And the MosaicML company also put together a number of very popular open source models, including MPT-7B and MPT-30B. One of them got more than 3 million downloads, which I think at the time, Naveen, made it the most downloaded open source LLM. Is that right? It was, yeah. Yeah. So it seems like there's real mission agreement when it comes to open source in general at Databricks plus Mosaic. And this brings us to the recent news item in which we saw that SB 1047 in California,

Starting point is 00:08:40 an AI regulatory bill, was vetoed. Now, one point of criticism of that bill was that it might have made open source AI development, frankly, more legally risky. Now that's behind us. I'm curious if there's anything that you think is standing in the way of open source AI development when we consider it in a competitive race with closed source. I mean, I think the potential for lawsuits over copyrighted information obviously is real. There's still the executive order that was put out at the end of last year that put some

Starting point is 00:09:16 hard limits in terms of compute. It's not clear how this is going to impact open source. I think whether something is open source or closed source is a business decision, but outlawing it actually causes problems. We're already seeing it with Llama models: the multimodal Llama models aren't rolling out in Europe because of the regulations they've put in. So I think the net of this is that the consumer suffers, right? They have fewer choices, and there are fewer ways for them to modify and customize these

Starting point is 00:09:45 things for their purposes. So I think it's a net negative if you start to restrict what people can do, especially this early. You know, maybe in five years, when we truly understand the impacts and how the liabilities work, etc., okay, we can start to put some rules in, but it's just too early. All right, everybody, I'm on the road, I mean, all the time, right? And that means I am always juggling phones and laptops, apps, all these different services I use. And when you use multiple

Starting point is 00:10:11 devices and you have all these different apps running your business, you need to have one single phone number that is perfect. And that perfect phone number is OpenPhone. It's going to simplify all the communications in your organization, because OpenPhone has rethought the modern business phone. It's so magical. It works with a single elegant app that you can put on all your devices, and it works right on your existing phone. It even works on your desktop. Our sales team uses it here at Launch. Why do we use it? Well, we don't want people talking to customers on their private phone lines, and our account executives don't want to give their personal phone numbers out. That's just weird. You want to have everything tight, and OpenPhone will make it tight, and tight is right in this regard.

Starting point is 00:10:55 Shared phone numbers are also awesome for things like customer support or when we run events. We like to have a field phone number. So, hey, if you're a VIP and you're at the liquidity conference, just call that number. OpenPhone is super affordable. It's just $13 a month. Twist listeners get an extra 20% off because they've got you covered. Openphone.com slash twist. And what if you have an existing phone number?

Starting point is 00:11:15 No problem. OpenPhone is going to port them over at no extra cost. Head over to openphone.com slash twist to start your free trial and get 20% off. So on Meta not bringing its most recent family of Llama models to Europe: I think that was announced around the 3.1 release, and we're now at 3.2 for the Meta Llama family. I'm curious what your take is on this. Is that actually Meta saying, look, we can't bring them to Europe because of regulatory confusion? Or are they trying to make a point, saying that as a way to politely influence how AI regulation is put together in Europe?

Starting point is 00:11:52 Honestly, I don't think there's anything too galaxy-brain going on here. I think it really is perceived risk. And it's like, okay, we don't want to take that risk. Every company that's deploying these solutions has to look at that and make a decision around the risk. And yes, maybe in the back of our minds there's, okay, we want to change that environment and make it more friendly. But, you know, I think in the short term, we're really just looking at it like, is it worth

Starting point is 00:12:16 the risk? Is it a risk-reward tradeoff you're willing to make? And Meta said, no, it's not. So I don't think there's anything else going on here. I think this is a consequence of early legislation. Yeah. Does Databricks have any restrictions or setup changes based on different geographic regions for its product set? Because Databricks plus Mosaic is now a pretty broad company, so I don't know exactly where every single product line you guys have is available. But I'm curious if regulations have also impacted how you're rolling out new features, open source models, and so forth. Absolutely. Yeah, they have. I mean, we conform to a lot of these same rules. If Meta is not rolling something out in Europe, it's unlikely we would.

Starting point is 00:13:00 Okay. So essentially, I can't run a Llama 3.2 model on the Databricks Mosaic system in Europe? At the moment, we're still looking at the exact details of this. But yeah, we're not trying to provide an end run or anything like that. That's not the goal, right? Right, right. I was not leading toward trying to get you to provide an end run around Mark Zuckerberg. But I am keeping tabs on how open source AI development is playing out around the world,

Starting point is 00:13:30 because, frankly, as an open source sympathizer, personally, I'm really excited about what open source AI models have done in the last 12 months. It's made me excited that there's not going to be much of a performance gap between what's open source and available on, say, Hugging Face and what's available via an OpenAI API call. And that brings me to the Llama family of models. When those came out in July of this year and then later in September, were they better than you guys expected them to be? Because the market seemed to be very pleasantly surprised by them. Were they better? I think the big surprise for us with Llama 3 and 3.1 was really,

Starting point is 00:14:12 maybe not the quality of the base models. We knew that was going to be within a certain window. It was really the care that was taken on the RLHF side, where they actually did a lot of work. And, you know, frankly, they put a lot of money into it. So that was probably surprising for us, how much they did. And I think it was really a statement of: here's our commitment, this is how big we're going to go with this.

Starting point is 00:14:36 So it wasn't clear. When the Llama 2 models came out, they were good. But, you know, DBRX and other open source models came out, Mistral and others, that were better at the time. So it wasn't that they were far and away ahead. But honestly, in the last six months, things have really shifted. They're investing heavily and they're building really great stuff. Yeah, absolutely. And just so everyone knows, RLHF is reinforcement learning from human feedback. I'd actually forgotten part of the acronym, so I want to bring that up. Can you just explain what that is and

Starting point is 00:15:05 why it matters for folks out there who are AI-interested but perhaps not AI experts? Yeah, so when you put these models in the wild, you can't just give people a raw model. There's all kinds of weird stuff these models can say, you know, things that might be correct but not necessarily kosher to say, or sometimes they can wander down paths and go very dark, and we don't necessarily want our models to do that. Yeah, especially if you're talking about psychological safety: if somebody's depressed and they type in, you know, what's the best way to commit suicide, I don't want the model to instruct them. So it's about providing balance and guiding things in a positive direction, or saying, I won't answer. Really, it's pushing all the probabilities around: where that knowledge lives and how to express it. And it's actually quite hard to do this well.
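A minimal sketch of one common ingredient of RLHF, assuming the standard two-stage recipe: first train a reward model on human comparisons between two answers, then fine-tune the language model against that reward (often with an algorithm such as PPO). This is not Meta's specific method, which the conversation doesn't describe. The code shows only the first stage's pairwise (Bradley-Terry) loss, with the reward model shrunk to two hypothetical scalar scores so you can watch the preferred answer's score get pushed up and the rejected one pushed down.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    It is small when the reward model scores the human-preferred answer higher.
    """
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def update(r_chosen: float, r_rejected: float, lr: float = 0.5) -> Tuple[float, float]:
    """One gradient-descent step on the two scores.

    dLoss/dMargin = -sigmoid(-margin), so the preferred score moves up and
    the rejected score moves down by the same amount.
    """
    step = lr * sigmoid(-(r_chosen - r_rejected))
    return r_chosen + step, r_rejected - step


# Hypothetical scores for two answers to a sensitive prompt: a supportive
# refusal (preferred by labelers) and a harmful answer (rejected).
r_refusal, r_harmful = 0.0, 0.0
losses = []
for _ in range(20):
    losses.append(preference_loss(r_refusal, r_harmful))
    r_refusal, r_harmful = update(r_refusal, r_harmful)

print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
```

In a real system the scores come from a neural network and the gradient flows into its weights. Much of the "pushing the probabilities around" then happens in the second stage, when the language model itself is tuned toward answers this reward model scores highly.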

Starting point is 00:15:51 Yes. And if you're going to do a lot of it, you're going to have a lot of humans actually doing this. Ergo, it's expensive. And that's why your point about what Meta put into the Llama models matters: they're not just writing code, they're also doing the flesh-and-blood work to make these models pretty good. Now, you mentioned DBRX, which is the open source model that Databricks put out in March of 2024, if memory serves. Correct. Yeah. In March, I believe.

Starting point is 00:16:16 And the company said at the time that it spent about $10 million training this model. That number just kind of sat in my head because I couldn't quite figure out what it meant. It didn't seem like a huge amount of money. It didn't seem like it was free. But one thing I did recall is that at MosaicML, one of your goals as a company was to make training models a little bit cheaper, more efficient. So I'm curious: one, are you all going to put out a DBRX 2? And two, should I expect models of that size and complexity to become cheaper to train over time, from companies like Databricks and perhaps other corporations? Yeah, so on the first one,

Starting point is 00:16:52 you know, we're partnering very closely with Meta now. So we basically said, what's the point in providing another choice unless there's direct innovation? We can choose to work together with them and say, hey, these are things we see our customers needing, new potential directions to take to make things more efficient. So now we're just working more closely with them, and at the moment, we have no plans for a DBRX 2. On the second point, you know, we're constantly working toward making these things cheaper. We're still doing basic research on these kinds of techniques.

Starting point is 00:17:24 Like, how do we extract the most from our hardware? How do we make these neural networks learn faster and cheaper? And, you know, we're going to communicate that to our partners like Meta. DBRX was cool because it was a very large mixture-of-experts model. Really, this was a way to make things train more cheaply to a certain quality, and not only that, to actually make inference faster and cheaper. And we're going to see this happening in open source more and more. Mistral actually had an open source MoE model as well. So, you know, I anticipate we're going to see more of these techniques coming out.
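A toy sketch of the mixture-of-experts idea, assuming the common top-k routing design: a small router scores every expert for each token, only the k best-scoring experts actually run, and their outputs are blended with softmax weights. That is why an MoE model can hold many parameters while spending compute on only a few experts per token. The router vectors, the toy experts, and the choice of 2 of 8 experts are hypothetical values for illustration (Mistral's Mixtral 8x7B, for reference, routes each token to 2 of its 8 experts); this is not DBRX's actual code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_layer(token: Vector, router: Sequence[Vector],
              experts: Sequence[Expert], k: int = 2) -> Tuple[Vector, List[int]]:
    """Send one token to its top-k experts and blend their outputs.

    router[i] is a (hypothetical) learned gate vector for expert i; the
    routing score is its dot product with the token embedding.
    """
    scores = [dot(gate, token) for gate in router]
    # Keep only the k highest-scoring experts; the others never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(token)
    for weight, idx in zip(weights, chosen):
        expert_out = experts[idx](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


# Toy setup: 8 experts, each scaling its input by a different factor,
# with a counter that records which experts were actually evaluated.
calls = [0] * 8


def make_expert(i: int) -> Expert:
    def expert(v: Vector) -> Vector:
        calls[i] += 1
        return [(i + 1) * x for x in v]
    return expert


experts = [make_expert(i) for i in range(8)]
router = [[float(i), 0.0] for i in range(8)]  # hypothetical gate vectors
out, chosen = moe_layer([1.0, 2.0], router, experts, k=2)
print(chosen, calls)  # [7, 6] [0, 0, 0, 0, 0, 0, 1, 1]
```

Only experts 7 and 6 run for this token, so two of the eight experts are evaluated; a dense layer holding the same total parameters would do work for all of them on every token.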

Starting point is 00:17:58 And really, the benefit is going to go to the user. You're going to see faster responses with higher quality for less money. That's where we're moving. I still believe this idea that we talked about at Mosaic. We called it Mosaic's law: a 4x reduction in the price to get to a similar-quality model, year on year. It's real. We're seeing that today. We're seeing it from OpenAI. We're seeing it from everyone.
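The rule of thumb Naveen calls Mosaic's law compounds quickly: if the cost to reach a fixed quality falls 4x every year, then after n years it is the starting cost divided by 4^n. The sketch is illustrative arithmetic only, under the assumption that the 4x rate holds steadily; it borrows the roughly $10 million DBRX training figure from this conversation as a hypothetical starting point and is not a forecast for any real model.

```python
def projected_cost(initial_cost: float, years: int, yearly_reduction: float = 4.0) -> float:
    """Cost to train to a fixed quality after `years`, assuming the cost
    falls by a constant `yearly_reduction` factor every year."""
    return initial_cost / yearly_reduction ** years


# Illustration only: start from the ~$10 million figure quoted for DBRX
# and apply the 4x-per-year rule of thumb.
for year in range(4):
    print(f"year {year}: ${projected_cost(10_000_000, year):,.0f}")
# year 0: $10,000,000
# year 1: $2,500,000
# year 2: $625,000
# year 3: $156,250
```

Three years at 4x per year is a 64x cumulative reduction, which is why the price of a given quality tier can fall so far in a short time.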

Starting point is 00:18:18 We're seeing it from everyone. So this actually brings to me a thought that I think six months ago wasn't talked about much, but it's now pretty common, which is that people are concerned about the value in AI models and how quickly they can depreciate based on progression. And I think we can see this in OpenAI's price points for a model over time does decline pretty precipitously. So does working with meta allow essentially Databricks writ large to have leading AI models inside of its platform without having to do the work to train them? It feels very much, and I say this politely, like the corporate version of getting your cake and eating it too. Maybe that's what we're shooting for.

Starting point is 00:18:54 No, I mean, you know, look, they want to invest in this and they have good reasons to do so. Yeah. And, you know, we're close with them. We're non-competitive in a lot of ways, right? Meta is doing this to enable a developer community. They're a social platform. They're using the models internally for all kinds of stuff. Those things don't overlap with what we do.

Starting point is 00:19:14 We're focused on enterprise. And so it's actually a really great partnership. We're very complementary. They want to build a developer community. We want to enable developers. So it works great. So why not? Why don't we do this together?

Starting point is 00:19:27 I mean, I just realized that I have to ask you, once we stop recording, not to sell your company to Meta, because you guys are so well matched. I mean, Meta buying Databricks wouldn't be the strangest thing we've seen in the last 24 months, but it would be a top-five thing. Yeah. Founders, I know that you're keeping a close eye on your burn rate. I am too. In today's venture market, every single hire you make has to be perfect, right? You can't make mistakes. You've got to keep that runway as long as possible so that you can run more experiments, and you need talented people to run those experiments and figure out: how are you going to get product-market fit, how are you going to scale your

Starting point is 00:20:04 company? And that's why you need to use LinkedIn Jobs. As you know, LinkedIn brings you the candidates that you can't find anywhere else. LinkedIn passed the one billion member mark. Think about that. One billion members. And 70% of LinkedIn users don't visit the other leading job sites. This is a phenomenal statistic. They don't even go to the other job sites. Why? Because they might not be looking, and those are the best hires. But they're hanging out on LinkedIn, doing professional development, checking in on their network, building their network, sharing content, finding leads, all that great stuff. Bottom line: there are amazing hires waiting for your company on LinkedIn and nowhere else. And they have a special deal right now: post a job for free. F-R-E-E. What a great price. LinkedIn.com slash twist. That's right,

Starting point is 00:20:47 LinkedIn.com slash twist to post your job for free. Terms and conditions, of course, apply. So, on models getting better over time and also cheaper: the thing that I've been playing with the most lately has been OpenAI's o1 model. I think people are still digesting it a little bit, but it does feel a bit like a high watermark. It uses a lot of reinforcement learning and goes a little bit slower. Compared to the open source world, is closed source now ahead of what we see from open source? Or is this just kind of the give and take of progress, if you will? I mean, the closed source guys are still ahead.

Starting point is 00:21:23 There's no doubt about it. I mean, OpenAI has to stay ahead of the curve. And as you pointed out, the half-life of a model is actually quite short. It speaks to the economics of this whole industry right now, by the way: you spend a billion dollars on a model, and it's only going to generate maybe six months of revenue. And so that's kind of hard, right? In terms of capital investments, I mean, we don't see this anywhere else.
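The capital-recovery problem here can be made concrete with the two figures in this exchange: a $1 billion model that earns for about six months, and the $100 million car design sold for five years that Naveen compares it with. A minimal sketch, assuming straight-line recovery and ignoring margins, serving costs, and cost of capital, computes the monthly revenue each needs just to break even.

```python
def breakeven_monthly_revenue(investment: float, selling_months: float) -> float:
    """Revenue per month needed just to recoup an up-front investment within
    its selling window (ignores margins, serving costs, cost of capital)."""
    return investment / selling_months


model = breakeven_monthly_revenue(1_000_000_000, 6)   # frontier model, ~6 months
car = breakeven_monthly_revenue(100_000_000, 5 * 12)  # car design, ~5 years

print(f"model: ${model:,.0f}/month")  # model: $166,666,667/month
print(f"car:   ${car:,.0f}/month")    # car:   $1,666,667/month
print(f"ratio: {model / car:.0f}x")   # ratio: 100x
```

Under these simplifying assumptions the model has to bring in about 100 times as much revenue per month as the car, which is the sense in which "the market has to be really big."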

Starting point is 00:21:48 Like, if I invest, you know, $100 million into designing a car, I can sell that car for five years. Yeah. If I can only sell it for six months, you know, the market has to be really big. So that is definitely going to be a hard thing for the industry to work through, and I don't know what the answer is. But in terms of closed source, closed source is still ahead, no doubt about it. Meta is putting the investments in to make open source catch up. And what I think is interesting is that if you look at the o1 model, I actually don't think there's a huge step forward on the model itself, the LLM. It's actually more the orchestration of that model: how that model self-assesses and comes up with better answers. Because of the way these models work, they actually come up with probability distributions of outputs.

Starting point is 00:22:36 So if you think of it that way, there's a probability that any given output is incorrect, but the mean of that distribution is correct. So you have to do multiple generations to get to the right answer. And really, that's what we're seeing here. And it's interesting that OpenAI, which has the most resources thrown at this, has gone that direction. Instead of saying, I'm going to make the output probability correct, it's like, no, no, no, I'm going to think about retries and doing this over and over again.
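OpenAI hasn't published how o1 works internally, so this is not a description of it. As a hedged illustration of the general "retries" idea, here is a minimal sketch of self-consistency, a known technique: sample the same question several times and keep the most common answer, so a model whose single most likely answer is correct becomes much more reliable in aggregate. The noisy model is a hypothetical stand-in that returns the right answer 60% of the time and otherwise scatters across three wrong answers.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(generate: Callable[[], str], n_samples: int) -> str:
    """Sample the model n_samples times and return the most common answer."""
    answers = [generate() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def make_noisy_model(rng: random.Random, p_correct: float = 0.6) -> Callable[[], str]:
    """Hypothetical stand-in for an LLM: right with probability p_correct,
    otherwise one of several scattered wrong answers."""
    def generate() -> str:
        if rng.random() < p_correct:
            return "42"
        return rng.choice(["41", "43", "24"])
    return generate


rng = random.Random(0)
sampler = make_noisy_model(rng)
trials = 2000
single = sum(sampler() == "42" for _ in range(trials)) / trials
voted = sum(majority_vote(sampler, 15) == "42" for _ in range(trials)) / trials
print(f"single sample: {single:.2f}, majority of 15: {voted:.2f}")
```

With these settings a single draw is right about 60% of the time, while the 15-sample vote is right far more often. The trade-off is cost: each vote multiplies inference compute by the number of samples.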

Starting point is 00:23:05 So it's a peek into where things are going: the idea of orchestrating the model to, you know, have retries and get to a more correct solution through iteration. Help, help. I can only get so bullish about AI. Okay, so this is super cool. So essentially what you're saying is the core technology is still advancing, but what OpenAI has done in this case is use what currently exists kind of recursively

Starting point is 00:23:29 to improve the output. This is the reinforcement learning element, correct? No, not exactly. This is all done at inference time. So inference time is basically... Oh, I see. The model is somewhat static. So you're basically asking the model to produce multiple results, having some way of judging the quality of those results, and then saying, ah, okay, this is a better step forward than that one was. If you think about what an agent is, like a human agent, you're thinking through multiple steps to solve a problem. You break a problem down, and each one of those steps has to be correct. If you're incorrect in one step, it takes you down a bad path. So you need to self-assess: did I make the

Starting point is 00:24:08 right call on this step? I think I did. Then move on to the next step. And really, that's what's happening now. How far can we push this point about orchestration to improve model output quality? If we can get gains from orchestration, does that road have a long way to run? Are we just scratching the surface of what we can do with it? Or has o1 perhaps shown the limits of that new technique? It's a way to get better results out. Oh, no, this is just the beginning. And the idea that you can break a problem down is not new, right? We actually wrote a blog on this, I don't know, six or seven months ago, about compound AI systems. And really, the idea here is, if you look at any engineered system, let's say, I don't know, the original coding for computers, right?

Starting point is 00:24:55 We would write at the assembler level, and it was this big monolithic blob of code, probably completely unmaintainable. If you think about it by today's standards, it's like, oh my God, why would we do that? Because we didn't have abstractions then. So we had this big blob of code that was completely non-introspectable. It was hard to maintain. Over time, we moved to a place where we could modularize. We could break down code into functions and objects. Each of these things can be independently verified and then strung together.
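Naveen's point is that LLM systems will be decomposed the way programs were: a front end that parses the request, a backend that decides whether to query data or call a function, and an output stage that formats the result, each checked on its own. A minimal sketch of that shape, with every stage a plain Python stub and no model called. `parse_request`, `backend`, `assemble`, and `PRICE_TABLE` are hypothetical names invented for illustration; in a real compound AI system some of these stages would be LLM calls.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class Request:
    intent: str
    argument: str


def parse_request(text: str) -> Request:
    """Front end (stub): split a command into an intent and its argument."""
    verb, _, rest = text.strip().partition(" ")
    return Request(intent=verb.lower(), argument=rest.strip())


# Hypothetical data source the backend can route to.
PRICE_TABLE: Dict[str, float] = {"widget": 9.5, "gadget": 20.0}


def backend(req: Request) -> Dict[str, object]:
    """Decide whether to query the 'database' or call a function."""
    if req.intent == "price":
        return {"item": req.argument, "price": PRICE_TABLE.get(req.argument)}
    if req.intent == "count":
        return {"text": req.argument, "words": len(req.argument.split())}
    return {"error": f"unsupported intent: {req.intent}"}


def assemble(result: Dict[str, object]) -> str:
    """Output stage: format the result for the caller, here as JSON."""
    return json.dumps(result, sort_keys=True)


def pipeline(text: str) -> str:
    return assemble(backend(parse_request(text)))


# Each stage can be verified on its own before the stages are chained.
assert parse_request("Price widget") == Request("price", "widget")
assert backend(Request("count", "a b c")) == {"text": "a b c", "words": 3}
print(pipeline("price widget"))  # {"item": "widget", "price": 9.5}
```

Because each stage has a narrow input and output, a failing request can be traced to the stage that broke, which is the reliability argument for compound systems over one monolithic model call.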

Starting point is 00:25:24 And so advancements in programming languages allow us to do this. We're going to see the same thing in large language models. First, we start out with this giant blob that just does something, and it proves the concept that we can make something work. Now we're moving to a point where it's like, well, hang on, if I'm an enterprise, I actually want to modularize this. I want to say, maybe I want a language front end that can parse multiple languages or multiple modalities. Maybe I want a reasoning engine that understands how to reason over a particular kind of data. Maybe I want a backend that understands when I should be calling a database

Starting point is 00:25:57 or calling functions. And then I want something that assembles all this stuff and creates an output that's formatted the way my users need it. Maybe that's JSON. Maybe that's English. Maybe it's something else. So all of these pieces come together, and you've got to orchestrate them, and you want to independently verify them. This way, we move to much higher reliability. Instead of talking about this giant monolithic blob, now I have independently verifiable components that I can actually make really good, and I can find bugs and debug the system. Okay. So one thing that I saw maybe five years ago, and I loved this, was a lot of companies thinking about no-code and low-code programming: essentially, abstracting one layer above actually typing code, connecting things,

Starting point is 00:26:37 moving boxes around, et cetera. As someone who hasn't really written code since C++ in high school, I loved that. It's fantastic. I felt like I had superpowers. In the world that you're describing, is this a future in which people who are not hardcore developers will be able to set up these different models and functions like you're describing? Or will this, do you think, remain the remit of the engineering department even in, say, three years?

Starting point is 00:27:01 I think it's going to remain the remit of the engineering departments. That's the reality, I think. I mean, it's one of these things. I think coding assistants have really opened the aperture of who can play with these things. I think there's a certain level of complexity you could express that way. But human language is naturally kind of imprecise. You know, we talk about coding as if it's a bad thing, but it's just a very precise way to describe a problem. That's all it is. There's nothing inhuman about it. It's just a more precise way to describe it than English. I actually don't want to program in English. I want to program in a programming language, because once I'm facile with it, it's more precise. I know exactly what the effects are. So if I think about it like that, we need high levels of precision. I need the model to behave in a certain way under certain conditions. I need to describe those conditions very precisely. I can't sort of hand-wave and just say, yeah, go figure it out, right? I need to actually work all the way down to precision. Well, I suppose my question then would be: how long until those parameters you have

Starting point is 00:28:00 to set, or rules you want to set up, or tests you want to run are entirely done in natural language versus code, because at that point, it would open the aperture quite a lot, but it sounds like, frankly, not soon. I mean, look, I think there are patterns that we see today that are common. Let's say 80% of the use cases can probably be captured in, you know, some kind of pattern that I can call from English. And so I can get most of the work done with English, but then there's going to be that long tail of stuff that needs to be really precise.

Starting point is 00:28:28 And I think there's just no shortcut. You've got to be really precise. I mean, it's like a legal document. It's the same thing. Yes, it's written in English, but it's not really English. It's a very precise language that's meant to describe something. It could be English, but, you know, it's still got to be precise. Does that make legalese, then, a very high-level object-oriented programming language?

Starting point is 00:28:48 It absolutely is, in fact. No, it totally is, right? Actually, there have been attempts to make it into code, where I have strict syntactical requirements, which can potentially enforce correctness and verifiability. This is a programming-languages topic that's been around forever, you know. I suppose, though, if we do then use AI to automate the legal industry out of most of its employment, we can then just take lawyers and turn them into developers

Starting point is 00:29:14 because they'll already be familiar with the style of communication. There we go. Problem solved. Labor market fixed, everybody. You heard it here first. I want to talk about the enterprise, though. And one reason why is I am a little bit muddled about agentic AI. And I know you guys have talked

Starting point is 00:29:31 recently about the Mosaic AI Agent Framework and agent evaluation services. Sierra, from Bret Taylor, is working on agents inside the enterprise. To me, it's become a bit of a catch-all term, if that makes sense. And so I'm kind of curious: how do you, slash Databricks, define agentic AI? And then I want to dig into that a little bit. But let's start there. Yeah, you're right.

Starting point is 00:29:52 It is a catch-all term. It became the new cool thing to say. And it's not well-defined as to what it really means. I would say the term came about as building something that can work through a set of steps, you know, some sort of a workflow: break a problem down and actually work through multiple steps to get there. So, you know, we see this all the time with RPA, robotic process automation. Yeah. Okay, everybody, you know I love newsletters. I love sharing knowledge and what I'm up to through

Starting point is 00:30:24 my newsletter. And if you've got a newsletter, you've got to check out Beehive. It's spelled B-E-E-H-I-I-V. I use it for the Twist Ticker and our Twist 500 newsletter. My team is raving about Beehive because it is an all-in-one platform that not only powers our newsletter, but it's got all these incredible platform features that are helping us grow our subscriber base, which is what these newsletters are about. Listen to this. Beehive's co-founder is the same person who helped Morning Brew reach millions of subscribers. In other words, they took all those tactics and they put them into a platform. They've got a great feature I just want to tell you about. It's called the AI Post Builder. This makes writing easy. I'm a great writer, but sometimes I need inspiration. Well,

Starting point is 00:31:10 they will take your ideas as inputs and then shape and optimize your content for maximum impact. It's perfect for busy founders, right? And it's available 24 hours a day. They also have a referral program that turns your audience into ambassadors. It works great. Plus, hey, if you want to monetize, they've got an ad network. And you know what? It's super affordable, starting at just $39 a month. So here's a great call to action: 30-day free trial, plus 20% off your first three months. Go to Beehive.com slash twist, Beehive.com slash twist, B-E-E-H-I-I-V dot com. What a great product. Yeah, so we want to say, like, here's a common pattern I do over and over again. Maybe it's like, I have a bunch of data in one file, I cut and paste that, I drop it into an

Starting point is 00:31:57 Excel format, then I go and apply this formula. Like, it's a common pattern we see over and over again, but it's just very hard to automate. Yeah. So I think that's what the RPA stuff is all about. And agents were kind of a way to say, well, maybe I can have it observe what a person does and then learn. That's where this started.

Starting point is 00:32:15 I think we're not quite there from the perspective of "I can just turn it loose and it'll be 100% correct." But what we want to work toward is doing that within Databricks or other data platforms, because these are the kinds of workflows that dominate inside of Databricks. People ingest data, they do some sort of transformation, then some sort of dashboarding, and then they make some kind of decision. Can we start to automate some pieces here? Really, that's the overall vision.
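To make the ingest, transform, dashboard, decide loop a little more concrete, here is a minimal Python sketch. It assumes a toy comma-separated input, and the function names (`ingest`, `transform`, `decide`) and the sales data are hypothetical illustrations, not Databricks APIs. The point is only that when each step is a separate, plain function, individual pieces of the workflow become candidates for automation one at a time.

```python
from dataclasses import dataclass


# Hypothetical record type; a real pipeline would read from tables, not literals.
@dataclass
class Sale:
    region: str
    amount: float


def ingest(raw_rows: list[str]) -> list[Sale]:
    """Parse 'region,amount' lines: the 'cut and paste from a file' step."""
    sales = []
    for row in raw_rows:
        region, amount = row.split(",")
        sales.append(Sale(region.strip(), float(amount)))
    return sales


def transform(sales: list[Sale]) -> dict[str, float]:
    """Total the amounts per region: the 'apply this formula' step."""
    totals: dict[str, float] = {}
    for s in sales:
        totals[s.region] = totals.get(s.region, 0.0) + s.amount
    return totals


def decide(totals: dict[str, float], target: float) -> list[str]:
    """Flag regions below target: the decision a person would make off a dashboard."""
    return sorted(region for region, total in totals.items() if total < target)


raw = ["east, 120.0", "west, 40.0", "east, 30.0", "north, 90.0"]
flagged = decide(transform(ingest(raw)), target=100.0)
print(flagged)  # ['north', 'west']
```

Because each function takes and returns plain data, any one of them could be swapped for an automated or model-driven version without touching the others.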

Starting point is 00:32:41 We have a few different ways that this looks today. Okay. I want to pause and talk about this, because when you describe breaking up a problem into smaller pieces and taking steps, honestly, going back one step, that reminds me of what I see with o1 when I ask it to show me its logic. So to me, it feels like the agentic AI you're describing is kind of like what

Starting point is 00:33:02 that model is already doing. And I presume here that I'm confused and I'm oversimplifying things. So can you just set me straight? No, conceptually, you're right. And I think the big difference between that and what we're doing is the idea that I want something that's truly bespoke and built from my process, my data, and the learnings that I have within my company. o1 is great. It's a concept, and it works well under the conditions where they tested it. But you can break it pretty fast.

Starting point is 00:33:31 I mean, if you just go on Twitter and look, there are lots of people that broke it and made it do stupid things with really simple questions. And I think this shows that to make these kind of systems work, you do need specificity. You need to work from your data. Your data is the ground truth of what should really happen.

Starting point is 00:33:47 It's not a general problem. And this is not a foreign concept. If we look at humans, it's the same problem. Like, if I got a generic smart kid out of college and said, go solve this problem that I have 20 years of institutional experience around in my company, they're going to fall apart too. They're not going to get the question right. Maybe if they're really smart, over the next couple of years, you could train them

Starting point is 00:34:09 to be really good at it. So I think we're seeing the same thing here: we see the value in building things that are bespoke and expert within a particular domain. We call this concept data intelligence, where you can use your data to create intelligence to actually automate things and make them useful for your business. Yeah, and actually I watched a clip you guys sent over about the difference between general intelligence, which is essentially a large model that can do cool stuff, and data intelligence, which is that model applied to your existing corporate data set.

Starting point is 00:34:39 And this, by the way, is why Databricks' data lakehouse plus MosaicML always made sense to me, because why not have a brain on top of your bucket of data? But at the same time, in that same post about the Mosaic AI Agent Framework, you said, or your company said, sorry, that while building a proof of concept for your GenAI application is relatively straightforward, delivering a high-quality app has proven to be challenging for a large number of customers. And I'm curious if that issue is around the data they need to have cleaned and ready to be put into the model, or if it's something else that's making it difficult

Starting point is 00:35:12 to turn a bucket of cool data and cool models into an actual product people want to use inside their company today. Yeah, I mean, there are a lot of things to pick on here. So data quality, data cleaning, all of that, absolutely. You need to get your data in very good shape, because if you're going to start calling this data the ground truth, you need to make sure that's really true. So bad data leads to bad outcomes. That's true for humans too.

Starting point is 00:35:34 If you give someone the wrong information, they're going to do the wrong thing. Sometimes you give them the right information and they do the wrong thing, or the wrong information and they do the right thing. Humans are humans. We could talk about that a lot too. I know. Let's not get bogged down. Sorry, it's an election year, so I have a lot of thoughts about that. But let's, sorry, back to you, Naveen. But humans are actually still quite amazing at understanding causation and how to get to an end goal, which we haven't quite cracked

Starting point is 00:36:02 yet in this field. When we look at some of the problems I've seen from different companies, you know, I would characterize the second half of 2023 and the first half of 2024 as every Fortune 500 going, oh my God, what's our GenAI strategy? Right. Everybody had to have this. I've talked to boards of large companies, and they all said they're asking their management, you know, what is your strategy? And the strategy was go get an OpenAI account. Right. And I think, you know, people don't know. That was my strategy too, Naveen. I mean, in their defense. And that's fine. You know, I think it's a place to start. But really, you need to understand what success is. This has to have a business impact. Like, just putting a new

Starting point is 00:36:42 feature on your banking app may or may not have any effect on your business. It has to be something that your users want, that they get value from, and that makes them actually use your platform more, or do more banking, or take out more loans, or whatever it is. So actually thinking through that is still a business strategy question. It's really not an AI problem at that point. Is this a good thing for my customers? Perhaps at some point, AI will be able to tell me that. Today, that's not really true. Okay, so now we can say, well, I've identified a problem. Great. Now I want to think through what the success criteria are. So I can take components that are in Databricks and I can start to assemble them.

Starting point is 00:37:18 I can get to a working demo pretty fast, as you said. And that demo might have some kind of performance attributes to it. It's going to break in certain places. Oftentimes we find that this looks cool from a demo perspective, but if you actually go and try to roll it out, you'll see all kinds of degenerate use cases, all kinds of places where it breaks. Okay, so let's characterize that.

Starting point is 00:37:38 Let's build an evaluation and give it a score. This is 37% correct for my application, or it's 37% useful for my users; start to score this stuff and evaluate it. Once you score it, now you're in a place where you can actually start to improve it. We have all kinds of tools that can say, okay, well, let's try this and see how it moves the score. But you need to get to that point. And I think this is actually the blocker for most of these applications.
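As a rough illustration of "build an evaluation and give it a score," here is a minimal Python sketch of an evaluation harness. It assumes the simplest possible grader, a case-insensitive exact match against an expected answer; real evaluations often use richer graders, and `toy_app`, `score`, and the test cases are hypothetical stand-ins, not any Databricks tool.

```python
from typing import Callable

# Each case pairs an input with the answer we consider correct for this application.
EvalCase = tuple[str, str]


def score(app: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the app's output matches the expected answer."""
    if not cases:
        raise ValueError("need at least one eval case")
    correct = sum(
        1
        for prompt, expected in cases
        if app(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


# Hypothetical stand-in for a GenAI app: answers two questions, misses the rest.
def toy_app(prompt: str) -> str:
    canned = {"capital of france?": "Paris", "2 + 2?": "4"}
    return canned.get(prompt.lower(), "I don't know")


cases = [
    ("Capital of France?", "paris"),
    ("2 + 2?", "4"),
    ("Capital of Peru?", "Lima"),
    ("3 * 3?", "9"),
]
baseline = score(toy_app, cases)
print(f"{baseline:.0%} correct")  # 50% correct
```

Once a baseline number like this exists, each change to a component can be judged by rerunning `score` on the same cases and checking whether the number moved.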

Starting point is 00:38:05 Beyond that, there's all the plumbing stuff of security and governance; you need to get all that right. So I think there are a lot of steps that are kind of impeding progress for enterprises. And we're trying to break down those steps and make them systematic for our users. So I'm familiar with the CEO of a company called Skyflow, Anshu Sharma. He's a friend of mine and likes to explain things to me. And one thing he was early on was, you know, helping corporations keep PII safe. And then the GenAI boom came and everyone was talking about putting their data to work. And I presume that there are the same kinds of tensions between people wanting to put their data to work and keeping it properly permissioned so only the right people can see it. And I can't ask about

Starting point is 00:38:42 Naveen's work data and so forth. Have Databricks and other vendors solved that issue, or are we still working out how enterprises can bring their data safely into an AI context today? Yeah, that's a great question. It's actually a big

Starting point is 00:38:58 part of what we do at Databricks. We call this data governance. The cool thing about Databricks is that we already had a framework for this. We call it Unity Catalog. And this is basically a universal source of truth for permissions and lineage and, you know, logs of what happened.

Starting point is 00:39:16 If you're in a regulated industry, if you're a bank or a healthcare company, you need that. You can't actually go willy-nilly and say, okay, here's some data, I'm just going to throw it into a model and, you know, hope for the best. Like, it doesn't work that way, right? That's a one-way ticket to sitting before Congress being told that you're a bad person. Yes. And actually, not just sitting in front of Congress, but massive lawsuits and huge damage to your business.

Starting point is 00:39:38 I mean, what's really interesting is that banks and other regulated industries work on trust. Their whole business is based on trust. I'm trusting that that bank is doing the right thing with my money when I put it in there. Otherwise, I wouldn't give them my money, right? So they have to be kind of conservative in this regard. And, you know, that's been a real big blocker to getting this stuff out to their customers. What we're seeing today is they're thinking about a blast radius.

Starting point is 00:40:01 Like, okay, I can try building a solution. Maybe I can expose it to internal users as a tool or an agent. Sure. And start there, and then maybe work your way up to something that goes external. That will happen as we start to trust these solutions and are able to engineer them through all of the edge cases. But really, this is a huge blocker. So what we've done is, inside of Databricks, we've integrated a bunch of stuff that came from Mosaic, as well as a lot of things that have been developed here, under the umbrella of Unity Catalog. So, for instance, if you fine-tune a model, that model is fine-tuned on some data.

Starting point is 00:40:35 That data had some attributes associated with it. Maybe it had certain access controls. Those access controls need to be inherited by the model, because that model actually is a function of that data to some degree. Or when you use a vector database and you do a pattern like RAG, or retrieval-augmented generation, there may be data in that vector database that can be viewed by certain users and not others. When a user hits that endpoint, the vector database needs to be aware that, oh, this user can't see that data, so don't pull that data.
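The two ideas here, a model inheriting the access controls of its training data and a vector search that respects the caller's permissions, can be sketched in a few lines of Python. This is a hypothetical illustration, not Unity Catalog's API: the `Chunk` type, the group names, and the tiny hand-written embeddings are all made up, and the inheritance rule (a user may query the model only if they could read every dataset it was tuned on, i.e. the intersection of the source ACLs) is one conservative policy we are assuming for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # copied from the source table's access controls
    embedding: tuple[float, ...]


def inherit_acl(training_acls: list[frozenset[str]]) -> frozenset[str]:
    """Assumed policy: a fine-tuned model is usable only by groups that could
    read every dataset it was trained on (the intersection of source ACLs)."""
    if not training_acls:
        return frozenset()
    return frozenset.intersection(*training_acls)


def dot(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: tuple[float, ...], index: list[Chunk],
             user_groups: frozenset[str], k: int = 2) -> list[str]:
    """Filter by permission *before* ranking, so hidden chunks are never returned."""
    visible = [c for c in index if c.allowed_groups & user_groups]
    visible.sort(key=lambda c: dot(query, c.embedding), reverse=True)
    return [c.text for c in visible[:k]]


index = [
    Chunk("Salary bands", frozenset({"hr"}), (1.0, 0.0)),
    Chunk("Holiday calendar", frozenset({"hr", "eng"}), (0.9, 0.1)),
    Chunk("Build instructions", frozenset({"eng"}), (0.1, 0.9)),
]
query = (1.0, 0.0)
print(retrieve(query, index, frozenset({"eng"})))       # ['Holiday calendar', 'Build instructions']
print(retrieve(query, index, frozenset({"hr"}), k=1))   # ['Salary bands']
print(inherit_acl([frozenset({"hr", "eng"}), frozenset({"hr"})]))  # frozenset({'hr'})
```

Filtering before ranking is the design choice that matters: the engineering user's query is closest to "Salary bands," but that chunk never enters the candidate list, so it cannot leak into the generated answer.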

Starting point is 00:41:04 So these things, they seem simple. They're conceptually not difficult problems, but managing them at scale gets very, very complicated. And so we were able to leverage this actually quite mature framework, which we just open-sourced, by the way. It was open-sourced at the Data + AI Summit in June. And Matei Zaharia, who is the CTO and co-founder of Databricks, actually hit the button on stage that open-sourced it. I thought it was kind of cool.

Starting point is 00:41:28 No, no, I mean, I'm always here for pageantry. And why not push the button live? Where was it open-sourced to? Where did you guys put the code online? I think it's on a GitHub repo. Cool. I'll throw a link to that in the show notes if anyone wants to check it out. Absolutely. So anyway, I think the point is that we had a quite mature way of thinking about this in terms of data governance.

Starting point is 00:41:49 And we've applied that now to GenAI. So different components like fine-tuning, embedding, you know, vector databases, function calling, all of these things live under the umbrella of Unity Catalog, which sort of takes care of the security and governance plumbing side of things to unblock some of these use cases. Now, getting to correct models, useful models, all of this stuff is the next frontier, I'll call it. Okay, but listening to you talk about this

Starting point is 00:42:16 and thinking about the issues that have come up and that have been fixed, or at least worked on, by Databricks and other companies to get enterprises from "we have an OpenAI account" to "we're using our own data in production for a feature that's either internal or external," it seems that a lot of the pathway has been smoothed.

Starting point is 00:42:32 So has it helped a lot of companies go from "we have a GenAI strategy, which is we're tinkering, we're playing so we don't fall behind, but we're not really in production yet" to "this is something that we've delivered both to customers and to our employees"? Yeah, they're getting there, right? This is still so new. I mean, remember, ChatGPT came out not even two years ago.

Starting point is 00:42:52 We're coming up on it in a month. It was November of 2022. So it's actually quite unprecedented how quickly the industry is moving, right? Two years is amazing to actually even get a POC going on this stuff. Well, that is, yes, that is enterprise speed. If you're curious why two years to get a proof of concept counts as fast, go work for a private-equity-owned company for a while and you'll find that two years to do anything is fantastic.

Starting point is 00:43:18 So I hear you. It is. So we are starting to see that now. And, you know, we're having conversations almost daily with these enterprises about how to do this and how to do it well and what some of the best practices are that we've seen. We're getting these governance tools out there; we launched this stuff this year. Within the last six months, we've launched a lot of these features. And, you know, you need to be a trusted partner. I know there are a lot of startups building cool stuff, and absolutely continue building cool stuff, there's no reason not to, but, you know, it's hard for a big

Starting point is 00:43:47 company like, I don't know, name your favorite big bank. You know, they can't take and use... Favorite bank? I mean, who? Pick your favorite supervillain. Yeah, who charges you ATM fees? Chase, I guess. Okay, anyway. Sure, but yeah, take Chase. For them to use a startup, it's quite hard, right? I mean, even just getting a tool approved is hard. Maybe they can get some POCs going, but they need trust in their vendors. And, you know, Databricks has already built this trust. So putting these features in on top of a trusted platform they already have

Starting point is 00:44:21 has actually been quite transformative. Our business is growing very fast. And, you know, I can't talk about the precise numbers, but we're seeing a lot of uptake. I've written a lot about Databricks' growth over time. And this is actually a beautiful segue to what I wanted to ask, which is that before GenAI became a household name, the thing that everyone talks about in the world of technology, Databricks was growing very quickly by itself. In the pre-MosaicML days, if you will. Yes, absolutely.

Starting point is 00:44:47 Now, of course, when you pull up Databricks' website, it is the data and AI company, clearly an emphasis across the branding and, really, let's be honest, the industry. So now when new customers are coming into the Databricks world, are they coming in because they're more interested in the lakehouse product, data storage, and BI, or is AI now a leading vector by which you guys land new logos and accounts? I would say it's a combination of the two, honestly, because we still see a lot of companies out there who are struggling with migrations of their data platform. I mean, again, going back to enterprise speed, I know it sounds like, you know, things from 10 years ago, but these are still problems from 10 years ago, like these legacy systems.

Starting point is 00:45:29 I actually talked to a company, not in the U.S., but in Asia, that was literally still running on, you know, old IBM mainframes from 20 years ago. And they're like, how do I modernize? I mean, I know this sounds silly, but, like, this is how these industries work, right? And that's why COBOL programmers have 100% employment right there. Exactly. And actually, they're getting paid a lot. Yes. Because, actually, that's my backup plan to journalism, by the way: if this all falls apart, I'm going to go buy a couple of books on COBOL and just

Starting point is 00:45:59 nestle myself deep in someone's basement. It's going to be great. Yeah, take a year, go learn it really well. I think you'd have, you know, great career prospects for the next 10 years. Yeah. So I think we are seeing a combination of companies who want to leverage their data for AI.

Starting point is 00:46:15 And this is squarely the vector that we are winning on right now: we are the combination of those two things. And frankly, it's not something that's new to Databricks. In fact, Databricks started there. Spark started there. It was actually the Netflix challenge.

Starting point is 00:46:31 If you remember this thing, I think from 2012 or 2011, where Netflix published a bunch of data and they said, okay, go build something that can match, you know, preferences of our users to movies they might want to watch. And they published a big data set. So, uh, Matei and others actually made Spark in response to that

Starting point is 00:46:49 because they wanted to scale it out over a whole bunch of CPUs. So, uh, you know, AI has been something that's been front and center for Databricks. In fact, many people use Databricks for AI and then use other data platforms. Oh, interesting. Yeah. Okay. I want to touch on that, because one thing we've heard so much about at an industry level

Starting point is 00:47:08 is the need for more compute to avoid being GPU-poor. How well do you know Jensen? Will he send you H100s? But, you know, prepping for today's show, I'm seeing Databricks on GCP, Databricks on Azure, Databricks on AWS. And you guys, you know, you mentioned you're not working on DBRX 2. So does this mean that Databricks doesn't need to build a hyperscale network of data centers around the world? And if so, is that a risk to the company?

Starting point is 00:47:38 Well, we do not build data centers. We never have. We run on top of the cloud. We're a software company. As for being GPU-poor or GPU-rich, we actually still do a lot of fundamental research, so we still have a sizable footprint of GPUs. But we're not building those open-source models.

Starting point is 00:47:55 It doesn't mean we're not doing other work. That's on the internally facing side. Externally facing, we're running lots of inference. We're seeing inference kind of take off for these models. I mean, as we get richer use cases, things like agents that we've been talking about doing multi-step, like there's a ton of pressure on inference, making inference really fast, making it really cheap, making it very high quality. All of that takes GPU compute.

Starting point is 00:48:19 So I don't see a world where we are not doing a lot of stuff on GPUs. We're just going to have more and more of this going forward. And, you know, we're expanding into multiple geos. We're not having everything in one place now. So I think the pattern of deployment of those GPUs is maybe different. Instead of having one giant monolithic cluster, we're breaking things apart. But I don't see the numbers going down any time soon. I worry about platform risk, because you guys are dependent in a way on these major scaled cloud platforms.

Starting point is 00:48:50 And at the same time, Microsoft is making its own AI models. It has its own BI tools. They don't want to share. Right now, they'll be besties with you because it's convenient for them. But you guys have such an ability right now to raise capital at whatever price you want. So, to me, I'm just kind of curious: why not offer a Databricks cloud alternative to these big tech companies, who I don't think ultimately want you guys to win? Yeah, I mean, I would say you should ask my boss about that one. Well, tell him to text me back.

Starting point is 00:49:21 I mean, geez, come on, Ali. No, I mean, I think all possibilities are there in the future. We'll see how things go. I mean, what we have done is establish a particular point in the stack of trust. So we do own these abstractions for data. Now we're owning a lot of these abstractions on how to take that data and make models that are really great and how to deploy them. So when you have that abstraction, you can move down the stack, typically. Moving up the stack becomes very hard.

Starting point is 00:49:49 So you see this. If you're a hardware company, moving to cloud becomes hard. If you're a cloud company, perhaps moving down to build your own hardware is less hard. We see this. Every cloud company is building their own inference chips now. Yes. So, you know, how is it going to run in the future? That does not include Databricks, just to be clear. Correct. We are not doing hardware. I mean, we don't do the cloud, so we're not doing hardware. I'm just making sure that we talk about these other companies. It's not you guys. You're not making your own. Google is, Microsoft is,

Starting point is 00:50:18 Meta is, everybody else. Okay, cool. That's right. Exactly. So, how are things going to shake out in 10 years? I don't know. But right now, I mean, there's a good reason for the clouds to work with us. We drive a lot of incremental revenue to them. We store a ton of data. And now, with the Tabular acquisition, lots of data in the cloud is in our formats.

Starting point is 00:50:40 So, you know, that gives us a pretty good control point. And, you know, the clouds are incentivized to work with us. At some point, maybe that's not true. Well, I think what I'll do is I'll just go start buying small servers and plugging them into the back of your guys' HQ, and then slowly build up a cluster that you guys can offer as a PaaS. To me, I don't know.

Starting point is 00:51:00 I just, one of the things that I'm most scared of right now is that the largest tech companies, the ones that are the hardest to knock off, that have the most market weight, are also the companies that own the major cloud platforms. And that, to me, makes them so monolithic and hard to kill that I'm worried we're going to see less innovation over a long time frame,

Starting point is 00:51:24 over many different technology cycles. It just makes me a little scared. And I know it takes a lot of money to build a hyperscale cloud, so there are only so many companies that can do it. And so the fact that you guys currently don't plan to, just me as an individual, that bums me out, although I understand from a business perspective how it would frankly be distracting to you guys right now.

Starting point is 00:51:45 I just, I hope that in five years you're showing me around, you know, Databricks Data Center No. 9 or whatever it is. That'd be great. Yeah, but I mean, if you think about it on a long time horizon, one of the largest incumbency events in tech happened in the late '70s. What are the two largest companies in the world today? Apple and Microsoft. They both started then, and they both started with the PC. They've leveraged that advantage over and over again through multiple

Starting point is 00:52:12 tech transitions. And, you know, has it really slowed innovation, right? They've maybe been behind, but they buy into it, or, you know, buy a startup that's doing something innovative. So I don't know. I mean, I like that argument. I just wish that instead of the Magnificent Seven, it was the Magnificent 27, you know, like just more competition. I don't disagree, but I mean, it's sort of how these ecosystems evolve.

Starting point is 00:52:37 You have to have certain capital to have that incumbency. And, you know, they have it. And it keeps going, right? It keeps rolling forward. Yeah. It's great for you guys, because you can work with all of them at once, and therefore no matter where your customers are, you are as well. So I get that.

Starting point is 00:52:51 There's a lot of talk about small models versus large models. And my impression, going back to the models you guys made at Mosaic, is that quite often we can see a 7-billion-parameter model be as good as a larger one if it's well-tuned. What is the Mosaic slash Databricks perspective on corporations using larger versus smaller models today? Yeah, I mean, I think it's about building the right thing for your application. Do the thing that matters. If latency really matters, go small.

Starting point is 00:53:17 If quality really matters, maybe go bigger. But one thing you'll see as a trend is that literally no one has built a model bigger than GPT-4. That happened, I don't know, 16 months ago, something like that. And that model is still the biggest one out there. And even OpenAI has been building smaller and smaller models. You have GPT-4o and 4o mini. 4o mini is probably less than 10 billion parameters.

Starting point is 00:53:39 I don't know with specificity, but it's very fast, and we can look at that as kind of a metric for how big it is. And I think we're seeing that they're able to extract higher levels of quality from a smaller model. So the science is by no means done here. There's a lot more work to do. There's a lot more we can wring out of smaller models. And then chaining these smaller models together, as in these compound AI systems, is the way forward, right? That's how we're going to do this economically.
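The compound-AI idea that comes up in this conversation, small components chained together, each one independently verifiable, can be sketched as a pipeline in which every stage carries its own check. This is a minimal Python illustration under stated assumptions: the `Stage` type, `run_pipeline`, and the number-extraction "model" are hypothetical stand-ins for real models, retrievers, or tool calls, not a Databricks or Mosaic API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]     # stand-in for a small model, retriever, or tool call
    check: Callable[[Any], bool]  # independent test of this stage's output


class StageFailed(Exception):
    pass


def run_pipeline(stages: list[Stage], value: Any) -> Any:
    """Run stages in order, verifying each output before passing it on."""
    for stage in stages:
        value = stage.run(value)
        if not stage.check(value):
            raise StageFailed(f"{stage.name} produced invalid output: {value!r}")
    return value


def extract_amounts(text: str) -> list[float]:
    """Hypothetical 'extraction model': pull numeric tokens out of free text."""
    return [float(tok) for tok in text.split() if tok.replace(".", "", 1).isdigit()]


stages = [
    Stage("extract", extract_amounts, lambda xs: len(xs) > 0),
    Stage("total", sum, lambda t: t >= 0),
    Stage("format", lambda t: json.dumps({"total": t}), lambda s: "total" in json.loads(s)),
]

print(run_pipeline(stages, "paid 12.5 then 7.5"))  # {"total": 20.0}
```

If the input contains no numbers, the pipeline stops at `extract` with a `StageFailed` error naming that stage, which is the debugging benefit of verifying components one at a time instead of one monolithic blob.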

Starting point is 00:54:06 That's how we're going to do it in a modular fashion, where we can independently verify each module and build end-to-end systems that are very reliable. Well, do large models then exist as a way to help create smaller derivative models? Yeah, that kind of had to happen. It seems like we built this giant model, and then we said, oh, well, we can actually take the outputs of that model and start to modify smaller models to have better performance. And so, you know, Microsoft did a bunch of work with the Phi models that they published.

Starting point is 00:54:35 Those are like 1.3 billion parameters, and they're very high quality. So we're seeing that trend over and over again, where, and it's not distilling, I'm using synthetic data generation from a bigger model to basically bootstrap and train a smaller model. And it works. So maybe we need to go the route of building something really big in order to build something really small that's high quality. Okay.
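As a deliberately tiny analogy for "synthetic data from a bigger model to train a smaller model," the Python sketch below uses a fixed function as a stand-in "teacher," queries it on freshly sampled inputs to build a synthetic labeled set, and fits a two-parameter "student" line to those labels only. This is a toy under loud assumptions: `teacher`, `make_synthetic`, and `fit_student` are hypothetical, and the example says nothing about how Phi or any language model was actually trained.

```python
import math
import random


def teacher(x: float) -> float:
    """Stand-in for the large model: we can query it, but treat it as expensive."""
    return 3.0 * x + 2.0 + 0.1 * math.sin(5.0 * x)


def make_synthetic(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Sample inputs and label them with the teacher; no human-labeled data involved."""
    rng = random.Random(seed)
    return [(x, teacher(x)) for x in (rng.uniform(0.0, 10.0) for _ in range(n))]


def fit_student(data: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least squares for a line: a two-parameter 'small model'."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_student(make_synthetic(1000))
print(f"student: y = {slope:.3f}x + {intercept:.3f}")
```

The student never sees the teacher's internals, only its outputs, yet it recovers the teacher's overall behavior closely; it cannot capture the small `sin` wiggle, which is the kind of detail a smaller model gives up.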

Starting point is 00:54:55 On the synthetic data point, I wasn't going to bring this up, but you brought it up, so I can blame you if we go over a little bit. Synthetic data makes a lot of sense to me, but, just based on the way my brain works, I always consider it to be of slightly lesser quality than non-synthetic data. Is that a bias I need to get rid of in my thinking? I wouldn't say get rid of it yet. It's kind of, I don't know if you remember that movie Multiplicity, where the guy makes a copy of himself to do more work.

Starting point is 00:55:23 It's actually kind of a funny movie. Basically, he makes a copy of himself to have one at work and one at home, and then he makes another copy, and a copy of a copy. Michael Keaton. Michael Keaton, that's right, yeah. I've not seen this movie, but it looks hilarious. That's going on the family list. Thank you very much.

Starting point is 00:55:38 Yeah, so anyway, it's a little bit like that. I mean, you start to become more unbound from the real distributions as you make more copies and more synthetic generations. Think of it as making mashups. Some of those mashups are not accurate reflections of the real world. We call this concept model collapse. If you train a model on a bunch of these synthetic objects, maybe you get something that actually doesn't give you anything useful.
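Model collapse can be illustrated with a toy simulation. This is a minimal sketch under loud assumptions: the "model" is just a one-dimensional Gaussian refit every generation, `simulate_generations` and all of its parameters are hypothetical, and nothing here reflects how Databricks or anyone else actually trains models. Each generation fits a mean and standard deviation to a small sample; when that sample is entirely synthetic (drawn from the previous generation's fit), the fitted spread tends to drift toward zero, while mixing fresh real samples into every generation, in the spirit of keeping synthetic data grounded, keeps it anchored.

```python
import random
import statistics


def simulate_generations(generations: int, n: int, real_per_gen: int, seed: int = 0) -> float:
    """Toy model collapse: each generation 'trains' by fitting a Gaussian to n samples.

    real_per_gen samples come from the true N(0, 1) distribution; the remaining
    n - real_per_gen are synthetic draws from the previous generation's fit.
    Returns the standard deviation of the final fitted model.
    """
    if not 0 <= real_per_gen <= n or n < 2:
        raise ValueError("need n >= 2 and 0 <= real_per_gen <= n")
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the real distribution
    for _ in range(generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(real_per_gen)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - real_per_gen)]
        data = real + synthetic
        # Maximum-likelihood fit: sample mean and population standard deviation.
        mu, sigma = statistics.fmean(data), statistics.pstdev(data)
    return sigma


collapsed = simulate_generations(500, n=10, real_per_gen=0)  # purely synthetic
grounded = simulate_generations(500, n=10, real_per_gen=5)   # half fresh real data
print(f"all synthetic: {collapsed:.3g}, half real: {grounded:.3g}")
```

With purely synthetic training, the final spread ends up orders of magnitude below the true value of 1.0; with half real data, it stays on the order of the true scale. Exact numbers depend on the seed.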

Starting point is 00:56:04 So far, we're seeing it as a good way to augment models and actually allow them to explore the distributions of the real world, if you can keep tabs on quality and keep it grounded. So I think it's a useful technique. It's not something that's going to solve everybody's problems. I still think we have some major issues in the field of AI. We've gone very brute force. You know, 20 trillion tokens of data to train a model.

Starting point is 00:56:29 That's 20,000 human lifetimes of text. This is not how humans work. I mean, humans and animals actually use a lot less data to get much higher-quality causal models. So I still think we have a lot of science to do to build these things economically and, you know, at very low power. Your brain runs on 20 watts of energy. A mouse runs on 800 milliwatts, and they do amazing things. No, that's one of the most encouraging things that I've seen.

Starting point is 00:56:56 I've seen people showing, like, brain watts versus data center watts and how many, you know, equivalent FLOPs you can get and so forth. Do you think that eventually, and this is completely off topic, so this might be a stupid question, Naveen, but like, there was an effort, I read about this in, like, Businessweek 10 or 15 years ago, to replicate the human brain, thinking that it is the neural network that is so efficient and can do so much, so we should do it. We seem to have kind of moved away from that in the transformer revolution. Do we bend back towards biology over time?

Starting point is 00:57:29 So, okay, that's the Human Brain Project and stuff like that, I think, that you're referring to. I mean, I'm a neuroscientist, and I thought those were very poor facsimiles of what was going to lead to intelligence. That's because you're trying to replicate something without understanding why that thing exists. You get one little parameter wrong, and the whole system falls apart. These are delicate systems. So I do think going down the path of trying to extract principles makes sense. Now, I do think that looking at biology is still potentially a useful path forward. There is a different dimension, beyond more data, that we are currently missing in these systems. They're not trying to understand how to make better decisions and self-critique. Maybe we're getting there now with this kind of chaining idea. But, you know, our brains have multiple systems that actually interact in very interesting and precise ways. So you may have read Daniel Kahneman's book, Thinking, Fast and Slow. Yeah.

Starting point is 00:58:25 Very, you know, this is a great book if anyone hasn't read it. It actually has roots in neurobiology. There are systems that are very old in our brain that kind of do learning without parameters, without a model. Then there are these other systems, which actually try to simulate different realities and then make good decisions. We use both of these things in concert. And the systems we build today, like AI systems, don't really do this.

Starting point is 00:58:49 They don't have a good grounding of "this is a real representation of the world that I can verify." They're just sort of almost pattern matching against things they've seen in the training data. So am I saying we're on the wrong path? Not necessarily, but I think there's a lot of work to do, and we can take some inspiration from biology without direct mimicry. I think the biggest takeaway from our conversation today for me is the idea that there's still so much improvement coming in our ability to build intelligent software systems.

Starting point is 00:59:20 Like the orchestration point from o1, and what you just said about this. I mean, this to me feels like we have 10 years of incredibly quick development ahead of us, and we're going to end up with, I mean, something truly marvelous at the end. Absolutely. Look at the internet, right? I mean, I was a freshman in college when really the first web browser came out in 1993. All the basic concepts of what the web was going to look like sort of existed then. But look at what we have now.

Starting point is 00:59:48 We have cloud computing. We have ways to do live updates and applications that run within a browser. None of that stuff was really contemplated at that time. And again, it went from this big monolithic thing that sort of worked to now, where we can engineer modules and make things work really interactively. And we created a set of systems that are highly reliable. Companies like Google basically figured out how to build resilient infrastructure. This needed to happen to have, like, you know, nine nines of uptime.

Starting point is 01:00:18 You know, all of this engineering has to happen. And we're just at the beginning of this. ChatGPT was two years ago. You know, we've got 10 years of innovation left ahead of us to make these things really, really amazing. I think you should point to that sign that says ChatGPT came out less than two years ago. Every time someone complains that AI hasn't solved world peace, I'd be like, all right, it's been 22 months. Everybody calm down. Actually, I just realized that my first child is basically as old as ChatGPT. Huh. That's going to be a really weird milestone to have in my life.

Starting point is 01:00:49 Okay. One last question before I let you go, Naveen. There are a lot of startups out there working on AI-driven solutions and services, lots of people building cool stuff in AI. Databricks has been acquisitive. It's how you joined. Tabular, of course, was a big deal.

Starting point is 01:01:04 I'm curious about two things. One, how often are AI founders calling you up, very politely? And two, just how interested is Databricks today in doing more, perhaps smaller, tuck-in acquisitions of AI-focused startups? Yeah, I mean, I do angel invest. And, you know, I typically try to pick companies or founders where I feel I could be useful to them. As an angel investor, I kind of see myself as a de facto advisor or helper whenever I can.

Starting point is 01:01:32 And, you know, things that I find interesting, you know, if it's a new, cool technology, all right, let's give it a go. I am really interested in places where we can apply AI to actually solve vertical problems. Healthcare. Healthcare is actually a very important one to me because I think it's something so fundamental to what it means to be human. And we have the pieces to actually go make healthcare accessible to many people, but it still hasn't happened. There are a lot of structural reasons, things like that. So let's get to that. So I like to invest in these kinds of things. So yeah, I do get called up quite a lot. And I'm happy to talk to any founders who are building

Starting point is 01:02:07 something cool. So there's that. Sorry, what was the second part of the question? Just how acquisitive is Databricks going to be, say, in the next 12 to 18 months? A lot of companies have been built. Many of them probably aren't going to make it to the next level. So how many do you guys want to tuck in? Yeah, I mean, we're always open to these tuck-ins. We've done a few of them. We're always looking.

Starting point is 01:02:24 We have a really good corporate development team that's basically scanning, finding places that fill in holes that we have. So we're always looking at this. But we're very thoughtful about it. We don't want to buy something just because, you know, it looks cool. We're going to look at it like, okay, how is this going to work strategically with what we're building? Does it solve a problem that we don't have a good solution for today? Does it have a customer base that we want access to? Something like that. There's got to be some reason. So,

Starting point is 01:02:49 we're pretty thoughtful about how we assess it. And when we bring these companies in, I mean, some of our agent evaluation stuff has come from a tuck-in that we did, a company called Lilac AI. They actually created a really nice UI for, you know, embedding data and visualizing it. We've actually morphed that into something that is going to be much bigger and very impactful for our customers. So we are absolutely interested in more tuck-ins, and, you know, we're growing fast. Our revenue is growing more than 60% year-on-year. And so I think we are going to be one of the major players for these companies to go find an exit toward, and, you know, keep in touch with us, build a relationship with us. You literally just dropped "we're growing fast,"

Starting point is 01:03:30 60% growth. I was not going to bring up you guys going public, but, dear God, you're demanding it. So I know the answer is we'll do it eventually, not yet, blah, blah, blah. But, like, H1 2025. If you guys don't do it then, I think I'm going to submit, like, my tears in a vial. Like, come on, give me the filing. So just so you know, that's coming next summer if you guys don't pull the trigger. But Naveen, thank you so much for coming on. We appreciate it.

Starting point is 01:03:54 We'll have you back in another 250 episodes or whatever it was the first time. And where can people find you online, just before we go? Yeah, I'm on Twitter pretty actively. NaveenGRao is my handle there. I'm also on LinkedIn, of course, so you can find my name there. But yeah, Twitter is usually where a lot of the AI stuff happens and where we have a lot of cool discussions. So follow me. Absolutely.

Starting point is 01:04:19 For everyone else, more interviews, more live news, more TWiST coming your way. I'm Alex. I'll see you soon. Bye. Okay, everybody, welcome back. It's time for a Jam with J-Cal session. What's a Jam with J-Cal session? This is where a startup founder tells me what they're working on, who their customers are, what their product does, what they're trying to

Starting point is 01:04:38 achieve, essentially their vision for how they're going to change the world. And then I ask them, hey, what's your biggest challenge, or the things you're struggling with? And then, using my experience, having invested in 400 companies, taken well over 10,000 pitches from founders, and done 2,000 of these episodes of This Week in Startups, sometimes, you know what? I know where some of the bumps in the road are. And we have a dialogue about how to solve problems, which is what startups are all about. So with me today is a gentleman named Elliot Easterling. He's the CEO of a company called Bonbon.tech.

Starting point is 01:05:20 And that is our partner today. Dot tech is a great domain name. I use it for Founder Fridays.com. Many people out there are using dot tech domain names because they're awesome. They let people know, hey, you're a technology company. And so thanks to our friends at dot tech for supporting this segment of the show. We pick all the companies that come on. Welcome to the show, Elliot. Thanks, J-Cal. Thanks for having me. I appreciate it. I just want to walk you through a little bit

Starting point is 01:05:47 of the business. I'll take you through a deck, and then I'd like to jump into some questions I have about our go-to-market. Review the deck. I like reviewing a good deck. Let's see how you did. Great. Awesome. Well, I'm the CEO, as you mentioned, of Bonbon Technologies, and we are a rewards platform for publishers, allowing publishers to reward anything. And that ultimately drives a lot more engagement and much higher registration rates. So let's just get started. Really, the pain that we're trying to solve is for ad-focused publishers, the 99% of publishers that are ad-focused.

Starting point is 01:06:19 And they've been really suffering from wave after wave of big tech changes. Things like cookie deprecation, which started in Safari and now is moving to Chrome. Search results pages are referring less and less traffic out. Social media algorithms are referring less and less traffic out. And then now AI, which is going to have a massive impact as big tech platforms use AI to drive more engagement on their platforms and ultimately refer less traffic out. So what we're seeing is publishers looking right now for solutions.

Starting point is 01:06:47 They're planning for a smaller user footprint, but they also want to drive much deeper, more profitable customer relationships with the users that are on their platforms. And so what we built, basically, is a fix: rewarding engagement. We built the first publisher rewards platform that gives consumers relevant rewards, access to unique content, simple and transparent data and privacy controls, and ultimately a better user experience. Publishers get logins, and logins re-enable cookies and lost IDs. They also allow the publisher to build direct relationships, which they're starved for in a world where they're mediated by these big tech platforms. We also give the publishers gamified engagement, a points program that

Starting point is 01:07:28 drives things like repeat visits, page views, and video views, and ultimately, all of that has delivered five times more monetization per user. All right, you got me so far. It's interesting. I find myself nodding as a publisher, just so we check in here as you run me through the deck. Yeah. Publishers do have these problems. And if their consumers are logged in and you know a little bit about them, you can monetize them better.

Starting point is 01:07:54 And people like gamification. So let's see how this works. Cool. Yeah, and you're a media maven. So I knew you'd really understand the business pretty quickly. So we've cracked the code on sculpting consumer behavior with this rewards program. We built an optimization engine that drives outcomes like 300% higher registration rates, 100% more engagement, 250% higher ad rates.

Starting point is 01:08:16 And then we're finding that 54% of people, after they log in, will actually complete their data profiles and provide an even richer understanding to the publishers of who's visiting their site. And this is all at a moment when publishers need this most. And so let me just walk you through, really quickly, kind of a sample of the tech. Finally, I want to see the product. Okay, here we go. And so on the left, you see a little icon, a little tab. That's us.

Starting point is 01:08:38 We control that. And that's the user's navigation, where they can access the rewards program. But on the right-hand side here, what you see is that at any time through the process, we can allow a publisher to trigger a rewards window, either inline or as a pop-up like this, that essentially runs eight or nine offers simultaneously to figure out, through machine learning, what the users of that site care about most that will make them willing to register. And ultimately, this technology delivers, like, a 3x higher registration rate than anything they've been doing prior.
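Bonbon doesn't say what machine learning sits behind running eight or nine offers at once and learning which ones drive registrations. One common technique for that kind of job is a multi-armed bandit, so here is a minimal Thompson-sampling sketch, assuming each visitor is shown one offer and each offer has an unknown registration rate. The function names, the three made-up rates, and the simulation loop are all hypothetical; this is not Bonbon's engine.

```python
import random


def thompson_pick(successes: list[int], failures: list[int], rng: random.Random) -> int:
    """Pick an offer by drawing a plausible registration rate for each one from
    its Beta(successes + 1, failures + 1) posterior and taking the highest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate_offers(true_rates: list[float], rounds: int, seed: int = 0) -> list[int]:
    """Show one offer per simulated visitor and return impressions per offer.

    true_rates are hypothetical registration probabilities, unknown to the picker.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        offer = thompson_pick(successes, failures, rng)
        if rng.random() < true_rates[offer]:  # this visitor registered
            successes[offer] += 1
        else:
            failures[offer] += 1
    return [s + f for s, f in zip(successes, failures)]


impressions = simulate_offers([0.02, 0.05, 0.20], rounds=5_000)
print(impressions)
```

Because each offer's posterior narrows as evidence accumulates, traffic drifts toward the best-converting offer while weaker ones still get occasional exposure. Running nine offers instead of three is just a longer `true_rates` list.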

Starting point is 01:09:08 Okay, so for the audience that's listening, there's a little arrow on the side, like a little chip on your phone. If you click it, it pops up and says, hey, would you like to win this television set or whatever, some kind of sweepstakes? If you log in, if you register, you automatically get entered, yeah?

Starting point is 01:09:25 Yeah, that's exactly right. You get entered into the contest. I love it. Yep. And then after someone enters or registers, when we verify their email, we actually gamify the registration process. Like we say,

Starting point is 01:09:34 hey, tell us a little more about yourself. Give us your name. 94% of people will give us their name for more points. Zip code, 91% give us their zip. Gender, 89% give us their gender. Even 54% verify their phone number with us. And we're just getting started. The other thing we do is give people points for reading.

Starting point is 01:09:50 Every article they read, they earn more points, and that drives 100% more engagement. And just at a high level, we provide them with what we call three parts of the platform. Open Identity Manager, which allows them to collect and manage first-party data. The rewards engine, which runs hundreds of offers and gives publishers the ability to reward anything. And then we have a bunch of front-end tools that consumers interact with that basically

Starting point is 01:10:12 sort of deliver the product. And we also have an API that publishers can call to issue rewards on their own. Very nicely done. Have you started to deploy this with any publishers yet? Where are you at in terms of building this business? Yeah, we are live on 27 websites. We have 60,000 Bonbon members who have actually registered. The thing is, when a user logs in to the publisher, they become a Bonbon Rewards member and also part of the publisher's first-party data. So we've been able to build our file

Starting point is 01:10:42 up to 60,000 as of last week, and we're running about 60 million monthly page views across our publisher network. Great. You know, publishers are looking for tools like this, and if you can help them build their profiles and give them those gamification tools, I could see a number of them wanting to participate in this. The challenge, of course, is you're trying to solve a problem for publishers who are really struggling, which means they're not high growth.

Starting point is 01:11:13 And so they might have very small budgets, right? And that is an issue: they're constrained. So has that come up, and how are you picking your ideal customer profiles when, hey, you know, the publications I used to run, whether it was Engadget or Autoblog, or the ones that Vox created themselves, like The Verge, et cetera, a lot of those publications are having challenges these days, right? So how do you think about that? Because you're picking publishers, a group of people whose businesses might be flat, might be contracting, might be slow growth, and may not have budgets. Yeah, so we offer two solutions for enterprise publishers.

Starting point is 01:11:54 We have a SaaS platform where publishers can pay a SaaS fee, and we also have a free-with-ads version. Publishers have to have a minimum size to qualify for free-with-ads, but we basically inject ads into all our modals, and that essentially pays for the full program, including the rewards. Got it. So there's nothing to lose for those publishers by using this tool. And on the enterprise version, I assume I can opt out of, like, you owning the profiles, and they're

Starting point is 01:12:25 just for me if I pay you enough money. No, the whole program is a cross-publisher rewards program that's backed by Bonbon. And one of the things we do is issue the rewards across publishers. Ah, okay, that's clever. So hold on. That's an important thing to pause on there. If you're going to give away this OLED TV that costs $5,000, that $5,000 cost gets abstracted across 10 publishers. So it's really, net-net, $500 each. Or if it was 100 publishers, it would be $50 each, which is super clever. Okay, I understand. Yes, we have that benefit: we basically can amortize the cost across publishers.
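The cost-sharing arithmetic here is simple division, sketched below under the assumption (ours, not a policy Bonbon describes) that a shared prize is split evenly across every participating publisher; `per_publisher_cost` is a hypothetical helper.

```python
def per_publisher_cost(prize_cost: float, publishers: int) -> float:
    """Return each publisher's share of one prize, assuming an even split."""
    if publishers <= 0:
        raise ValueError("need at least one participating publisher")
    return prize_cost / publishers


# The figures from the conversation:
print(per_publisher_cost(5_000, 10))   # $5,000 OLED TV, 10 publishers -> 500.0
print(per_publisher_cost(5_000, 100))  # same TV, 100 publishers -> 50.0
print(per_publisher_cost(1_500, 30))   # $1,500 iPhone giveaway, 30 publishers -> 50.0
```

The point of the design is that each publisher's cost falls in proportion to network size while every user still sees the full-value prize.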

Starting point is 01:13:07 Number two, we put in a privacy guarantee. We allow users to come into our system and opt out of any publisher they want. We know, generally speaking, consumers don't like to do that. They just don't want to mess with the settings, but it is a really key part of our value proposition. One of the other nice things here is, you know, the user can participate or not participate. So what I like about what you're doing is you might have somebody like me who looks at it, you know, as a 53-year-old now, and I'm like, I don't want anything for free. I don't want to be part of any sweepstakes. But I might want you, as a publisher, to know a little bit more about me

Starting point is 01:13:45 because I want to get certain mailings, et cetera. So I have an email address I use just for my shopping websites. And I don't mind them knowing about me, because I would rather get sales for men than for women. You get the idea. Or I want tickets to Knicks games, not, you know, I don't know, Miami Heat games or Chicago Bulls games. So that kind of personalization is a benefit because you don't waste my time. And when I was younger, if I couldn't afford the $5,000 OLED TV,

Starting point is 01:14:21 I might very much want to join that sweepstakes. Thank you very much. So I think that's a very interesting approach, too: this data will help us personalize content for you. And then there's, hey, this data might let you get into a sweepstakes. And you didn't mention the gamification, but that one was a highlight for me as well. So do you have an example of gamification yet, or is that on the roadmap? Yeah.

Starting point is 01:14:44 So one of the things we do is, after people register, we deliver email campaigns to them. We send them a weekly email that basically says, hey, here are some articles you could read that are personalized to you that earn you extra points. And so we're essentially starting to enable gamification through our newsletter program. And we also offer point bonuses for doing things like, go to this one publisher and play one of their games, and that will earn you 100 points. So we're early on in the gamification process.

Starting point is 01:15:15 But I do think that that's going to be one of the critical things. And my promise to publishers is, hey, not only can I get you a lot more registered users than you could get on your own, but these users are hyper-engaged, because through communication we can sculpt traffic, directing it back to your site to spend time on the things that you care about, and ultimately reward them

Starting point is 01:15:34 for that behavior and that activity. One thing that's tried and true is invite-a-friend or member-bring-a-member kind of gamification. So you might find some interesting, you know,

Starting point is 01:15:48 engagement there from publishers if you could say, hey, I'm already a member. I think my brother should know about this content. You know, who else should know about this story? I enter somebody's email. It emails them. They click. They register. I get points. So that's tried and true. You see that in Robinhood, where you gift a stock, you get a stock. Uber: you give a ride, you get a ride. Dropbox: you give some

Starting point is 01:16:12 storage, you get some storage. So I think you've got a really interesting tool-based and network-based business here; that's two ways to win. And so what's your question for me? Any challenges right now in the business? Yeah. So, you know, in today's fundraising environment, ever since the market turned, VCs sort of behave similarly. And the mantra now is, I want to see revenue. All they care about is revenue, revenue, revenue. Now, this is a network effect business.

Starting point is 01:16:38 And we win by building distribution. Right now, we're building users at zero CAC. And when you think about most rewards businesses, they're paying, you know, $5 to $15 per user. We're not paying for users. And so our go-to-market really should be 100% focused on building distribution and getting more users, but at the same time, VCs and the market want to see revenue, revenue, revenue, revenue.

Starting point is 01:17:01 Putting pressure on publishers to essentially pay us is going to slow my business down. So how would you navigate that? Yeah, sure. This is a great question. Network-based businesses do have a carve-out in our industry when it comes to monetization. So if you're a SaaS business, if you're Superhuman and you charge a dollar a day for an email product, or Slack and you charge $8 to $35

Starting point is 01:17:26 a month per user, depending on which plan you're on, then yes, of course, people want to see the number go up. In your business, however, if you could prove that, if you have these 60,000 members, and you said, hey, we're giving away

Starting point is 01:17:42 the iPhone 16, and we just want you to take a survey about mobile phones and visit these three reviews, or, you know, we're going to present you with 10 pages. Every time you visit one of these 10 pages in our network and you scroll to the bottom and, you know, click on the iPhone 16, you get entered in. Now, you'll get a bunch of sweepstakes jerks who, you know, kind of just do this in a scammy kind of way. But you also,

Starting point is 01:18:13 if it's good content, might get people who are actually interested in iPhone 16s and technology to visit 10 pages. That's good for the publishers. And it cost you $1,500 to set up the sweepstakes and give it away. Now, you could prove to people that when you do that, the publishers get page views, and the cost of the iPhone 16 is spread across 30 publishers, so it's only 50 bucks a publisher. And you just run those experiments without having the publishers tell you

Starting point is 01:18:48 they'll pay for it. You're running a $1,500 experiment, and you see, look, of the 60,000 people we sent it to, 3% of them, you know, actually did something. Imagine if we had 6 million people and 3% did something. Now we'd really have a business. So it's up to you to prove it to me, the investor, that you could do these little experiments, and, whoa, look at the interest that this created. And they had to click on an email to confirm they were in it.

Starting point is 01:19:17 And if they put in their friend's email, they got, you know, 10 more entries, and their friend got double the entries. So there are little experiments like that where you could show growth in the user base and in the page views you're driving, and be able to correlate them. So it's really up to you to run those small tests

Starting point is 01:19:37 and prove to us you can grow, you know, 5 to 10% week over week. That would be what viral growth looks like in your business. Now, for SaaS products, you show me a SaaS business growing 10% a month, I'm interested. You show me a business like yours growing 10% a month, I would probably think it's not very interesting.
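The gap between 10% a week and 10% a month is easier to see once both are compounded over a year. A minimal sketch, assuming a constant rate, 52 weeks and 12 months per year, and no churn; `growth_multiple` is a hypothetical helper, not a standard metric definition.

```python
def growth_multiple(rate_per_period: float, periods: int) -> float:
    """Multiple of the starting value after compounding a constant rate."""
    return (1 + rate_per_period) ** periods


for label, rate, periods in [
    ("10% per month, 12 months", 0.10, 12),
    ("5% per week, 52 weeks", 0.05, 52),
    ("10% per week, 52 weeks", 0.10, 52),
]:
    print(f"{label}: {growth_multiple(rate, periods):.1f}x")
```

Under these assumptions, a year of 10% monthly growth is roughly 3.1x, while 5% and 10% weekly growth come to about 12.6x and 142x, which is why the bar for a network business is set in weeks rather than months.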

Starting point is 01:19:57 So you've got to get to that 5% to 10% week over week. That's a lot of tests to run. But you could actually do that with very low dollar amounts, yeah? Yeah, yeah. I think where you're pointing is a unit economics story that we need to tell investors like you, to basically say, hey, we're getting these users for free. And then we run these experiments that actually show that we're getting revenue traction, and the revenue traction's building, or the activity traction's building,

Starting point is 01:20:20 on a per-unit basis. Engagement. Yeah. Engagement is the name of the game. So you could say to me, you know, if you came to me with these 60,000 and said, okay, it cost us, you know, six months, and we burned $100,000 in six months getting to 60,000. So we're going to need to spend a million dollars to get to, you know, what we think is 3 million, because we're getting better at it. It's a fixed-cost business, whatever. But if we did this many prizes,

Starting point is 01:20:49 we would get to, you know, 10 million. And then every week or every day, we're going to run an experiment with a million of those 10 million, so we don't burn them out. Every 10 days, they're going to get some sort of offer to engage them, and then we'll start charging the publishers, because they'll be addicted to it.
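The rotation described here (10 million members, about a million contacted per day, each person getting an offer every 10 days) amounts to splitting the member base into fixed cohorts. A minimal sketch, assuming numeric member IDs spread roughly evenly across cohorts; `users_to_contact` and its parameters are hypothetical, not part of any product mentioned in the conversation.

```python
def users_to_contact(user_ids: list[int], day: int, cycle_days: int = 10) -> list[int]:
    """Return the users whose cohort is scheduled on `day`.

    Each user belongs to cohort user_id % cycle_days, so every user is offered
    exactly one engagement per cycle and roughly 1/cycle_days of the base is
    contacted on any given day (assuming IDs are spread evenly).
    """
    if cycle_days <= 0:
        raise ValueError("cycle_days must be positive")
    todays_cohort = day % cycle_days
    return [uid for uid in user_ids if uid % cycle_days == todays_cohort]


members = list(range(10_000))  # stand-in for the 10 million member IDs
print(len(users_to_contact(members, day=0)))  # 1000
```

A fixed cohort per user keeps the contact frequency predictable, which is the "don't burn them out" constraint; a production system would more likely hash IDs than rely on them being sequential.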

Starting point is 01:21:03 But, you know, you have to get the flywheel started, right? And so getting the publishers to agree, getting the VCs to agree, all this stuff takes work. So you may have to just invest small dollar amounts to show it

Starting point is 01:21:18 on a micro basis. Yeah. And then say, imagine we add two zeros. Yeah. And here's what it costs to add two zeros to the velocity I'm showing. Yeah. Yeah. Makes tons of sense.

Starting point is 01:21:26 Pretty straightforward. Yeah. And that's, like, your investment. But the good news is I think you can show this for very small amounts of money. How many people are working on the team? I'm curious how many developers you have. Yeah. So we have a product person and a couple of engineers who are really, really good,

Starting point is 01:21:41 working on the core product. Outsourced engineers or full-time employees? Offshore? Offshore, part-time. Got it. That's basically what the team looks like, yeah. So you're in year one. You're an early-stage startup, I take it.

Starting point is 01:21:56 We're pretty early, yeah. We're under two years. Yeah. Okay. Yeah. How much have you raised? 1.4, basically. Oh, okay.

Starting point is 01:22:03 Wow. So you've raised a decent amount of money. Yeah. You should be able to get that network effect going, just invest in giving away whatever the product of the moment is that aligns with this. I would also start thinking about, you know, the virality of social media and TikTok. You went after publishers,

Starting point is 01:22:20 which is where attention is leaving. So they need your tool, obviously. They really need it. But their businesses are contracting, and you may be selling, like, you know, deck chairs on the Titanic. Now, what businesses are growing? TikTok, Shorts, video, podcasts.

Starting point is 01:22:41 There's a bunch of things that are growing in the world. So I'd also think about these tools and say, hey, if we had 60,000 TikTokers and people making Shorts, and they were engaged in the network, what might this look like? Because sometimes, you know, a surfer is only as good as the waves presented to them. But when you go big wave surfing, right? Yeah. If you've got these little tiny waves, three, four, five-foot ones, okay, fine, you can surf them, but it's not like you're going to the North Shore and hitting a 20- or 30-foot wave where, like,

Starting point is 01:23:17 whoa, it's impressive if you can catch one of those waves. So I encourage you to really think deeply about the beach you're surfing at. At the beach you're surfing at, the tide might be going out faster than you can build a business. I love looking at, you know, and I've done this myself, right? I caught the magazine and zine wave in the '90s. I caught the blog wave. I caught the podcasting wave 14 years ago when I started, and I'm still riding that wave. And I caught the angel and seed investing and incubator wave.

Starting point is 01:23:50 Sometimes there are waves that are bigger than you. And if you catch them right, man. That's why I'm doing more thinking about live and Shorts. You know, TikTok shorts. Yeah. Or, well, TikToks aren't called Shorts. I think YouTube calls them Shorts. But anyway, I've been thinking about alternative formats outside of podcasting. And I've been thinking about live a whole bunch.

Starting point is 01:24:09 So I encourage you to think about that as well. You have an interesting business. I mean, if you had presented this business to me to come to our accelerator, I'd be like, yeah, let's do it. But I think you're in a weird position, having raised a bunch of money but not having violent product-market fit or market pull yet. So you really have to get some market pull. And I think you might need to find another beach where the waves are a little bit bigger. So I would think about that. E-commerce is another one, direct-to-consumer.

Starting point is 01:24:37 The people in direct-to-consumer were getting slaughtered, slaughtered, trying to get Facebook to work. And Facebook worked for a long time, then it stopped working. But the ones who went to TikTok and social media and podcasts did make it work. So sometimes it's not you, it's the environment you're applying your skill in. You seem very talented, yeah. Yeah, and we are, I can't disclose what we're doing, but I think some of the things you've indicated are areas where we're testing right now,

Starting point is 01:25:05 doing some experimentation. Yeah, I'll leave it at that. I don't want you to tip your cards. Great job. And for everybody listening who wants to learn more, give us your URL again. I know that you're a dot tech, but remind everybody of your domain name.

Starting point is 01:25:20 Yeah, bonbon.tech. What a great domain name. Bonbon.tech. If you would like to get one of these great dot tech domain names, just go to get.tech and get one of those great

Starting point is 01:25:36 dot tech domain names. Tell them your boy J-Cal sent you. We'll see you all next time on Jam with J-Cal and This Week in Startups.
