Software Huddle - Lessons from Building Tagged.com + AI-Driven Database Optimization with Johann Schleier-Smith

Starting point is 00:00:00 There are a lot of things that you can think of and that might be cool, but they just don't work as a business. It's like all the pieces really have to come together. And so there are these particular constellations in particular, let's say, on the growth. You have a growth mechanism, you link it up with a product and together. And there's got to be a revenue angle, too. Right. And together, these pieces make a business.

Starting point is 00:00:24 And there are a lot of other things that are cool, and they're not businesses. And they never will be. What was this all built in? Was this LAMP stack? This is PHP? All of our first stuff was built in PHP, and this is very much about that era. And Yahoo was successful. Yahoo was built in PHP. If you want to be successful, why not just do what Yahoo did? And when did LLMs and general AI come onto your radar and what was your first impressions? Hey, everyone. Sean here, and I'm really excited

Starting point is 00:01:00 to introduce our guest today. We have Johann Schleier-Smith, the CEO and founder of Crystal DBA, on the show. I've known Johann for over 10 years. He was an investor in my company way, way back when I was a founder. He's a really, really smart guy, multi-time founder. During the interview, we spent some time discussing his first company, Tagged, a social discovery website founded in 2004 that Johann helped grow to 300 million members and a team of several hundred based out of San Francisco. We discussed some of the scale problems and challenges they faced back in the heyday of social networking and how some companies in that space ended up failing primarily because they simply couldn't meet the technical

Starting point is 00:01:41 demands of scaling. We also chatted about his new venture, Crystal DBA, which is an agentic AI system that gives every engineer instant access to a database expert to help you improve your database durability, reliability, and scale. They're still early, but it's super cool technology. I love the vision of democratizing the DBA. Anyway, let's get you over to the show. As always, if you have questions or suggestions for Huddle, feel free to reach out to Alex or myself. Johann, welcome to Software Huddle.

Starting point is 00:02:11 Hi, Sean. Good to be here. Yeah, absolutely. So I wanted to start off talking a little bit about your background. You have a pretty impressive resume. I'm sure you've heard that before. Two PhDs, one from Stanford, one from Berkeley. And your second PhD was actually focused on serverless computing. And you might not know this, but when we met up, I think we got coffee like eight years ago or something like that. And I was asking you for some advice because I'd just finished, sort of wrapped stuff up with my own startup. I was trying to figure out what I was going to do next.

Starting point is 00:02:43 And you were explaining some of your PhD work, your second PhD work, at least at that time. And that was actually my first introduction to serverless. It was from you. I wasn't super familiar with it at the time. So thank you for bridging the gap for me. Yeah. Absolutely. So it's been quite a journey here in, I guess I'll say, you know, California, Silicon Valley for a number of years. You know, I think I'll say I actually have one degree, one Ph.D., but it's split between time spent at Stanford and time spent at Berkeley. And actually, the whole thing, and it just kind of is what it is. It's probably the longest PhD ever or close to it because it took me about 20 years to get the thing done. But yeah, I mean, I definitely persevered through it.

Starting point is 00:03:42 So I'll give myself some credit for that. I think a lot of folks go into these things knowing it's going to be hard, but not actually realizing how hard it is. And I will say my respect for that degree basically got higher and higher every year. So it is something that I'm proud of. But yeah, what can I tell you about it? I mean, you know, we definitely have the journey. Serverless is something that is obviously, you know, really exciting, a big trend in cloud that is continuing. Yeah, absolutely.

Starting point is 00:04:24 So was the timeline essentially that you had started a PhD, and then you founded Tagged, you left to go do that, and then once your time at Tagged finished, you essentially went and finished the PhD? So I'll walk you through. So I came out, and, you know, had finished my undergrad in physics and math at Harvard. And so, you know, when you kind of do that, then sort of like the thing that is like the thing to do, I guess you've been, you know, you've been a student all your life. So like, let's keep being a student. Let's go to grad school. That was kind of the path that I was just on, you know, doing well academically. So just like keep doing it. Right. And so I did make a fairly deliberate decision to come out to California. You know, I'd been in San Francisco for a conference, maybe like a year before. I was just really struck by the beauty. So four out of five schools that I applied to for grad school

Starting point is 00:05:30 were in California. Ended up selecting Stanford together with Greg Tseng, who was my co-founder at Tagged, which was later rebranded as If We. And it kind of became a larger umbrella for social apps. But yeah, and look, that decision, the decision to come to Stanford, the decision to come to Silicon Valley, was not an accident. We had been doing sort of side project businesses, including back in college.

Starting point is 00:06:01 And so the proximity there to Sand Hill Road and that whole ecosystem was very much kind of a draw for us. I think, I mean, Greg more than I, but, you know, was very enamored with what was happening, you know, in the dot-com boom. And so, you know, sitting on the East Coast, kind of feeling like we're missing out on it, and then actually getting that chance to come was very exciting for us. So in particular, so what we had, you know, at the time, we had built a simple website that, you know, had this mechanism, which, you know, became known later as growth hacking and viral marketing. But it was a very silly idea, which is, what if there's a database of everyone's

Starting point is 00:06:55 secret love interests? And you could tell people only if they had a match. And so this goes viral. And we had built that up and had actually signed up millions of users. It was very popular in high schools and so forth. So prior to social networking, we were kind of doing these social kinds of things just on the side, and then kind of we evolved that business into what became Tagged, which initially wasn't really social networking. It was very much at that early pre-social networking era, where, we say, we were afraid to add photos. Could we handle the bandwidth? What if people

Starting point is 00:07:42 uploaded inappropriate photos? We had all these concerns, but at some point it became pretty clear, particularly with some of the things that Facebook and Friendster and others were doing, that this was something, it was an exciting thread. So we made a pretty quick move and then the things started taking off. So to kind of set the stage a little bit around 2004, actually in grad school, doing really exciting work. So my advisor, W.E. Moerner, somebody I have a huge amount of respect for. He's since won a Nobel Prize doing these very sensitive measurements, basically making videos of enzymes, digesting milk sugars and stuff like that. Really, really cool work. So really great opportunity. At the same time, you've got a million members or so on an early social network. It's just a question of like, what do you do? And in hindsight, look, a few things. I mean, one, just really blessed to have

Starting point is 00:08:47 such great opportunities, right, to work with Nobel Prize-winning scientists and, you know, to be going at the early stages of, you know, what eventually becomes a trillion-dollar industry. So we picked, you know, we picked social, knowing that we could always come back to academia if that's what we wanted to do. And ultimately over time, that's, you know, what I did end up doing, you know, a little over 10 years later. Yeah. I mean, I think, you know, I had a kind of similar journey in some sense as well. I was pulled to the Valley because I also felt like I was missing out on what was happening at the forefront of tech. I only ended up lasting a... These Bay Area colleges are losing graduate and undergraduate students at a steady pace as they jump out to go and raise money on Sand Hill Road and start

Starting point is 00:09:53 their own thing. Because there's just so much happening and that pull is very, very strong. It's hard to, unless you really, really want it, I think, stay there eating Top Ramen on the student salary versus going and chasing the dream of creating something. Yeah, absolutely. I mean, it's definitely very tempting. And I know when we were there, it was a bit of an anomaly. When we were there, when we went to the department chair, and it was actually kind of scary kind of walking in there and saying like, hey, you know, we're taking a leave is what it was called. We'll be back in two years or whatever, which was obviously wishful thinking. But, you know, kind of the way you think at the time, because we were really passionate about it. But it was actually met with encouragement, which we were really pleasantly surprised by.

Starting point is 00:10:47 On the other hand, my understanding is that over the years, it became really sort of a problem for the university, actually, and for the faculty there with just students coming in to really just use it as a stepping stone. But I think the reality is that people got to do what's right for them. And I think that, you know, you also don't want people in the program who aren't actually there to do academia over the long run, or at least to make the most of the opportunities that are given to them. Yeah, absolutely. And that first idea that you're talking about, this sort of a way of storing your secret love, was that at all inspired by Craigslist's Missed Connections feature from way back? You know, not directly, but I'd say that there were really a lot of ideas being thrown around and sort of tried out in that early era of consumer internet.

Starting point is 00:11:59 And the thing that I think that was really key and that really linked together both the success of our businesses, as well as, frankly, a number of the larger success stories in consumer, was having a scalable, highly efficient growth mechanism. And so in this particular case, the way the mechanism works is that you're building the list of people that maybe you're a high school student, you have secret crushes or something like that, you're building that list, and then they know that they're in the system. They can come join and see if there's a match. Right. And so it just kind of spreads. And the same thing with social,

Starting point is 00:12:49 you have not the same dynamics, but social networks are interesting because all these products are sort of naturally, we call them naturally viral products, which is to say it's not like something that's added on as a marketing gimmick. It's like the core mechanism of the product is doing things with other people. And so when one person joins social networks, using them alone is like really boring.

Starting point is 00:13:17 Right. So you have to bring in other people to make that product useful and valuable to you. And so it's just a win-win to build up that community. And you see the same thing, whether it's the social networking, the professional networking, and then even more recently, the networking that's happening outside of the people that you already know. So another thing about the Tagged story. So I'll kind of fast forward a

Starting point is 00:13:46 bit. There was a very fluid phase. And then we really landed on our part of the market, which was social discovery. So what is social discovery about? We basically recognized, we said in 2008, what is something that we can do that we can focus on that other folks like Facebook, for example, which at the time actually wasn't that much bigger. I think maybe by 2008 they were a bit bigger. But when we go back to 2007 or so when we're making these decisions, it was something like they're 100 people, we're 30 people. But they're really smart. And unless they really make a mistake, they're just going to stay ahead of us. There's just no way that we can get around. And so we need to do something that they're not going to do for a while, like, let's say 10 years

Starting point is 00:14:38 or something like that. And that's when we made a decision to focus on this potential that social has to create new connections, to bring people together. We already have that experience in the dating space. So, you know, one is romantic connections. One is sort of shared interests. And then we also built a pretty good business around games and bring people together who kind of had a shared passion around a certain game. Right. So we said, that's what we're going to do. And that's, you know, if you think about business takeaways, finding kind of a well-defined purpose that you have

Starting point is 00:15:15 that's sufficiently differentiated, that's what, I mean, that's what made that business work. And that's what made it so that, you know, we didn't end up being another MySpace or any of the many, many social networks that basically failed when the Facebook network effect just destroyed everything. We were actually the first social network to get profitable. So, yeah, we're really proud of what we built there over the years. Yeah, that's awesome. And I think that's a key point about when it comes to virality or viral growth. I think it has to be natural virality versus something where it's jammed in. Because there was a period, and this was something that I experienced when I

Starting point is 00:16:06 was, you know, building my company. Because of the success of social networks, there was like a lot of pressure, essentially, like, what is your virality angle? Like now, I'm sure everybody's like, what is your, you know, gen AI play? So whether it made sense for your product or not, you were like forced to have some sort of viral angle that you just jammed in, and like it was a feature. But it's never, ever going to work. For certain products, it doesn't make sense to go and send your friends 50 invites to also join the same thing if that's not like a natural part of the entire experience. Yeah.

Starting point is 00:16:42 Yeah, absolutely. And so I think that actually, to me, brings up a larger point, which is there are a lot of things that you could think of and that might be cool, but they just don't work as a business. It's like all the pieces really have to come together. And so there are these particular constellations in particular, let's say on the growth. You have a growth mechanism, you link it up with a product and together and there's got to be a revenue angle too, right? And together these pieces make a business. And there are a lot of other things that are cool and they're not businesses and they never will be. And it kind of is what it is.

Starting point is 00:17:25 Yeah. I mean, I think that's why building large, you know, large scale companies is really hard because you got to get all those pieces to sort of come together. And then you're also taking into account all the challenges of like, you know, scaling people, hiring the right people to essentially drive all those different business areas. There's just so many things that can potentially go wrong, regardless of how amazing the vision of the company and the product is. Tell me about it. Yeah, so we built Tagged, and then to kind of fill in here, so another social network, we're actually involved with getting it off the ground, which is hi5.

Starting point is 00:18:03 And so that one, we ended up merging it, putting it all together, creating If We as the umbrella brand. We had a few other brands underneath as well, some that we built ourselves, some that we had acquired in. But that company, we ended up scaling. It ended up being about 200 people in downtown San Francisco. And yeah, I knew it was great. I'm really proud of the team that we built. Downtown San Francisco is pretty different now, which is kind of sad. But I, you know, I'm optimistic that it's, you know, that it's coming back and there's going to be potential to, you know, bring together great teams like that in San Francisco again in the

Starting point is 00:18:46 future. Yeah. So what were some of the hard technical challenges that you ran into in this kind of height of, you know, the social networking era? I'm sure scalability must have been a challenge. You know, a lot of these decisions you're making back in like, you know, 2008, sort of pre, you know, really like public cloud and all the types of things that we take for granted today. Yeah, so absolutely. So just the technical scalability of can you do it, and can you do it at reasonable cost, was a really big deal. So when we had raised our Series B, you know, shortly before then, maybe a year or two before, Friendster had sort of spectacularly, basically, gone up in flames because they didn't manage to scale, right? So they had the first mover advantage, and they had the money. They'd raised $50 million, which today is still a lot of money.

Starting point is 00:19:52 Then it was really a lot of money, and they couldn't make it work. I think it took them multiple rewrites, and by the time they got there and having their single, I think it was a MySQL database that everything was running on. And that thing just basically couldn't take the load, kept crashing, kept crashing.

Starting point is 00:20:14 And they just straight up couldn't do it. That said, you had proven that it was possible. Facebook had gone with a different architecture. Originally, their architecture was basically they would run one database, one cluster, per school because they were focused on college campuses. And so they kind of had an architecture for sharding that.

Starting point is 00:20:37 That was really natural. And then eventually they evolved it. But yeah, look, there were a ton of challenges, largely because people just hadn't done this before. So you had bits and pieces of technology. You had Memcache, which was a really important component. So we started out, we had the database, the first version of it. It didn't have any caching at all. Every single page, it hit the database.
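[Editor's sketch] The progression described around this point, from every page hitting the database to a cache in front of a sharded database, can be illustrated roughly as below. This is only a minimal sketch under stated assumptions, not how Tagged or Facebook actually implemented it: plain dictionaries stand in for the database shards and for the cache, and the names `NUM_SHARDS`, `shard_for`, `save_profile`, `get_profile`, and `stats` are all hypothetical.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, purely illustrative

# Stand-ins for real infrastructure: one dict per database shard, one dict as the cache.
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}
stats = {"db_reads": 0}  # counts how often a read falls through to a shard


def shard_for(user_id: int) -> int:
    """Map a user id to a shard using a stable hash (CRC32 of the decimal id)."""
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS


def save_profile(user_id: int, profile: dict) -> None:
    """Write to the owning shard, then invalidate any cached copy."""
    shards[shard_for(user_id)][user_id] = profile
    cache.pop(user_id, None)


def get_profile(user_id: int):
    """Cache-aside read: try the cache, fall back to the shard, then fill the cache."""
    if user_id in cache:
        return cache[user_id]
    stats["db_reads"] += 1
    profile = shards[shard_for(user_id)].get(user_id)
    if profile is not None:
        cache[user_id] = profile
    return profile
```

Repeated calls to `get_profile` for the same user only reach a shard once until `save_profile` invalidates the entry, which is the cost and load reduction the caching tier is there to provide.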

Starting point is 00:21:08 Then you put in the caching layer, you know, then you start sharding the database. Then you start adding services that handle specific components like, you know, one for understanding how people are connected on the graph, you know, different sorts of searches and so forth. So we ended up going to a microservices architecture, which has good things and bad things about it. But it did make it work and we did scale it. We're doing all of this on-prem, right? So we don't have access to cloud. And that means that you really have to be looking ahead and making decisions about budgeting for capacity, buying those racks, buying the servers, reserving the space, reserving the power, understanding what network switches you're going to need in terms of your

Starting point is 00:21:56 capacity for your network over time so that you're able to plug in more resources and so forth. So really a lot of planning that goes into it. And I think one of the things that was great for me and for our team was, you know, our scale, you know, thousands of servers, but, you know, we weren't like tens of thousands or hundreds of thousands. Right. And so, you know, we really got to own all of the problems and all the challenges and really just solve end to end, which is definitely pretty cool. State management is definitely the thing that stands out. So in social in particular, when you think about it, you look at a page,

Starting point is 00:22:43 traditional page. Now you're in mobile apps and it's a bit different, but on a webpage, you've got at least a hundred distinct pieces of information that get rendered into that page. Oftentimes it's more, it might be like 200. And some of those are coming off of lists of photos or friends or comments or something where you, you know, you might actually have a lot more than that. And then you have to take the top N and so forth. So it's actually just like a huge amount of information that gets aggregated and rendered. And so whether you're sending that to a database or you are going to a caching tier, which ultimately you have to in order to get the performance and the cost to make sense, it's like actually legitimately a lot of work

Starting point is 00:23:28 that needs to be done to put that together. And you start to have issues like, you know, can my switches, I'm going to go out, I'm going to fan out all this stuff. Maybe we're going to do it concurrently, right? And now I'm going to get a whole lot of packets back. You start blowing out buffers on switches and so forth. And so it gets really down in the weeds technical pretty quickly. So smart

Starting point is 00:23:51 people figured out how to solve all these problems. I mean, it was legitimately a lot of engineering challenges. So fun times. Yeah, absolutely. And what was this all built in? Was this, like, you know, LAMP stack? Is this PHP? Yes, yes. So all of our first stuff was built in PHP, and this is very much kind of about that era, right? And, you know, Yahoo is successful. Yahoo is built in PHP. If you want to be successful,

Starting point is 00:24:29 you know, why not just do what Yahoo did? Look, I think that these sorts of languages, these scripting languages for just getting stuff done are really practical. And I mean, still today, you know, you're seeing it come up in different flavors or variants. And I don't want to say Python is the same thing as PHP. I mean,

Starting point is 00:24:50 actually, well, for both languages, I think Python is actually, you know, a nice, beautiful language, but has a lot of those same drawbacks, right? Like actually, the performance isn't that great. Now we're getting, you know, a JIT for it. So, you know, hopefully that's going to get better, but a whole bunch of limitations. But the fact is that you can build stuff quickly. And that, I will say, is the other piece of this when we talk about the scalability, and it was very much on my mind and on the team's mind, because people were moving so fast. And the key to being competitive and staying competitive in the social space, you know, here now, you know, we're going to talk 15 years ago, right, really was, you know, could you put out better product and try ideas and so forth faster than anybody else in

Starting point is 00:25:42 order to drive that innovation forward. Particularly, you know, we're getting to the gaming space, just lots of stuff that you're just like, you need to build, you need to build it fast. And, oh, by the way, you have to ship it. It has to work at scale. So, you know, the era that we're starting in is sort of pre-CI/CD. Those terms didn't really exist, but we were doing it. And essentially what we had is we had some great technical minds on our team who were in tune with the ideas that were circulating.

Starting point is 00:26:20 And they basically figured all this stuff out for us. So, for example, how do I make it so that a developer is able to actually run the entire system? Because think about it, you've got a system, it's like a thousand servers or whatever, it's running in a data center. And now I want to be able to do development on that. Well, actually, the first versions of that, we would have this shared dev environment,

Starting point is 00:26:46 which I think now sounds, well, actually, it's probably actually not that crazy because I think some people, I think, are maybe still in that era, right? But being able to get every developer the ability to run the full stack for themselves so that they could do the testing and, you know, that we could get all the tests so that we could reliably ship every day and so forth.

Starting point is 00:27:11 Like there was a whole bunch of stuff that went into building that. What type of, you know, how much RAM do we need in the workstations? What type of virtualization are we using? Like all these things, you know, we didn't have Kubernetes or anything or Docker or anything like that. So, you know, we had to build a lot of different stack of technologies. Yeah. And I think, you know, that was, I think you talked about Friendster, you know, essentially

Starting point is 00:27:35 they weren't able to meet those scale needs. I think that was a big differentiating factor for companies that were successful in that era was if they were actually getting traction, could they meet the technical demands to have to invent a lot of this stuff that just didn't exist at that time? I want to transition a little bit to talk about your latest venture of Crystal DBA. Before we jump there, my understanding is it sits a little bit at the intersection between like generative AI and databases.

Starting point is 00:28:09 And I wanted to pick your brain a little bit about generative AI. Like you, like me, your master's was focused in machine learning, if my understanding is correct. And when did like LLMs and generative AI kind of come onto your radar, and what were your first impressions? Yeah, so when I was at Berkeley, both in the master's and the PhD, I spent a decent bit of time on machine learning in addition to systems, in addition to databases. And now we're talking sort of the 2016, 2017, 2018 era. And machine learning in that context, generative AI was definitely on the horizon, but it wasn't in the same terms. And frankly, the technology that's dominant today was basically being invented at that moment to some extent at Berkeley and at other places as well at Google and so forth. So for example,

Starting point is 00:29:14 in 2016, I remember taking a course. It was a scalable machine learning course with John Canny. And there was just a slide there and it said, attention. I was like, I have no idea what attention is. And I remember the lecture and talking about it. And I think probably the whole class was trying to figure out like, what is this thing? Right. What is attention and why should we care? And, you know, but it was kind of like, you know, it was like one of many, many things that we were talking about. And it wasn't something that we paid particular attention to.

Starting point is 00:29:52 You know, kind of fast forward, you know, six to 12 months. And now you start seeing some remarkable results coming out of the Transformers work in, you know, the context of language translation. And then, you know, fast forward a bit more and there are some other results that are, you know, that are kind of mind blowing, which is like, hey, we developed this thing for language translation, but now we're throwing a bunch of non-translation NLP tasks at it. And it's actually doing really well. And this is actually, you know, it's pretty surprising because at the time people were spending a lot of energy sort of engineering

Starting point is 00:30:39 these architectures of the neural networks in order to solve specific problems. And then this idea that you engineer the architecture for one thing and it's doing well on something else. It's like, wait a minute, that doesn't make any sense. Why is that? So all these things were kind of in the water, in the air, whatever, in the ecosystem. But again, the technologies were pretty different.

Starting point is 00:31:05 So we were looking at neural network-based architectures for NLP, but it was the RNN and LSTM and it wasn't actually these architectures that are popular today. However, there was kind of enough going on there that I had in the back of my mind, like, hey, this is going to get really interesting. And it certainly was.

Starting point is 00:31:38 And like when I was thinking about, you know, kind of some of the things to do next and my background in social, I was like, yeah, it's very clear to me that we're going to be having conversations with these AIs. Like, that was pretty obvious to me, you know, even in like 2016 or 2017. Other things along the generative front, like, you know, you start seeing these photos, whether they are images of flowers or faces or whatever. And the first ones were actually not very good. But still, you're like, that's crazy. I've never seen anything like that before. And fast forward, and of course, it's amazing. Yeah. I actually have a slide in a gen AI talk that I do where it's like a picture generated from 2022.

Starting point is 00:32:33 I'm sure at that time people were blown away by it. Because you can tell that it's supposed to be a person, but like the hands are kind of like not so great. And it looks a little bit like, you know, abstract art or something, but then a year later, it's like, I can't tell the difference between the picture and a picture of a model, basically. Like, it looks perfect. And then you fast forward, like, less than a year later, and you have prompt to, you know, video through things like Sora. And now, you know, I know at re:Invent this week, they just announced a model on AWS that can also do that.

Starting point is 00:33:08 So everybody's sort of following. It's just moving so fast. And I think one of the big differences, especially from the machine learning that I took in school and did my master's degree around, is all that was so bespoke. Like it was very like, you know, I'm going to take really domain specific data and build a really specific model that then I can use to solve this one thing. But I can't just pick that model up and like apply it to some other new thing. And I think that's like sort of the foundational difference between the ML of old and what we're seeing essentially now with generative AI is that you can take something that was built for your example translation, and then suddenly you can solve other problems with it. And that's really, really transformative. Yep. Yep. Absolutely. And then the GPT on top of that and the interface. I mean, the idea now that we can solve so many of these tasks where it used to be like, well, let me start building

Starting point is 00:34:08 a data set and let me come up with a neural architecture and all this stuff. And now we're just like, oh, there's an API, and we just pass stuff in. I mean, the accessibility of the technology that the GPT and the chatbots have enabled is really incredible. Yeah, absolutely. So you've been working on, you know, Crystal DBA for a couple of years now. What was the original idea and sort of just before generative AI really broke out into the mainstream. And the company originally was a bit more of a systems company than it is today. And I'll get to what it is right now. But actually, I'll start with the problem that we targeted, which actually hasn't changed. And really it does go back to what I experienced and what I saw building up Tagged and hi5 and moving fast in the social space and operating at scale, which is really, you know, I just could never really understand, I'll just put this, like,

Starting point is 00:35:27 why it was so hard. Okay, so let me frame that a little bit. And actually, I want to talk about Twitter rather than, you know, what we were doing, because Twitter is just such a simple product, right? And it stands to reason the first version of it was built by a small team in, like, two weeks, right? And then you fast forward a bit and, you know, they start having all these challenges around making it keep working as you scale, right? Yeah. And they get the fail whale. And it's surprising because the product is so simple, right? And you build up a team and you have like 100 engineers. And look, unlike Friendster, they made it work, right? But it actually wasn't obvious at the time.

Starting point is 00:36:17 And frankly, if they hadn't had the right people on the team, it might not have. And so the thing that, you know, that was actually really sort of the motivation also for my work in serverless. But it was really like, why can't these things just work? Why, if I've sort of specified out the, you know, the application, and I have it working, why can't it just stay working? And so often, when things break with these systems, it all comes down to the state management, right? And so, so many of the challenges that we saw building these things and making them scale, whether it's at Twitter or whether it's at, you know, hi5 and Tagged,

Starting point is 00:37:02 it was all coming down to, like, we've got to make this database, you know, run better and scale. And that means that we're, you know, doing indexing, we're sharding, we're tuning the internal knobs and the parameters. We were using Oracle, but, you know, tuning all that, managing the storage, getting all these things just right so that the application would continue to run. And we had a team of experts who really knew the system, these DBAs. And so we were able to do it. And I think what we hear from so many of these stories, you know, including the failures, including, say, the Friendsters, you know, if they ... developers to move fast and build whatever they can imagine without getting stuck. And in the original iteration, that was a more scalable, easier-to-manage database, Postgres. It's since changed to really take advantage of the technology that is now commonly available, which is this generative AI, and specifically the interactivity, the interface, so the natural language interface and the reasoning capabilities of the LLM. So what is Crystal DBA? It's an agentic system. Okay, so you can think of it as taking

Starting point is 00:38:54 Postgres knowledge and expertise, wrapping it up in a box, right? And then connecting it to your databases, connecting it to your Slack, so that now every developer basically has access, instant access to an expert DBA, an expert Postgres DBA, whenever they want, to help them through whatever it is, the challenge that they're seeing. So it could be in a production challenge around, you know, this query is slow. What's going on?

Starting point is 00:39:29 Why is it doing that? It could be around the capacity planning. Right. So say you go to RDS, AWS RDS, and they have many, many different shapes and sizes of Postgres instances and different options for the storage. And it's actually very confusing. Which one should I use?

Starting point is 00:39:50 You pay too much. If you oversize it, you're wasting money. On the other hand, what we often see with folks that we go into is that they're actually wasting money, and also they're having performance problems, because they've tried to throw hardware at it and, you know, it works to a point and then it stops working. And so you're really sort of in this worst of all worlds scenario. And it's really just because you don't have the expertise. It's like you can do it, right? And smart people actually can figure it out. Developers are smart. They read enough books or blog posts or whatever and try things.

Starting point is 00:40:26 They're going to figure it out. But most people have better things to do. They want to be developing their application, adding features, coding, whatever. Figuring out what's going on with the database is the last thing that they want to be doing. So there's an entire plethora of various database solutions that exist that are available to handle, solve a variety of different scale issues. Do you think, I'm just curious, based on your experience, are some of those technologies created because you could actually do most of this stuff with, I don't know, MySQL or Postgres

Starting point is 00:41:16 or name your relational database if you had the DBA expertise there to tune it to reach scale, but then other products essentially, are they taking advantage of the fact that, like, you know, a lot of people don't have that? So then they're able to offer something that scales in some other way, that out of the box provides, you know, certain things like horizontal sharding,

Starting point is 00:41:37 or if you go to certain distributed databases, you give up some of the value of relationships so that you can scale in different ways. Is essentially the lack of having database expertise why we end up with so many flavors of databases? I love that question. And there's a lot to say here. Frankly, you could probably write a blog or a book on this. So let me think about how to answer concisely. Let's actually just focus a little bit on relational versus NoSQL, because that's been and continues to be kind of a big question that developers have. Like, which way do I go? And one of the advantages of the NoSQL systems

Starting point is 00:42:30 is that there's this sort of belief, which I think is largely true, that you don't really need DBAs that much. So if you're in a key-value store, what you've done is you've basically moved... Like, you can't do joins. Right. So these systems, as long as you're kind of operating on one piece of data at a time, right, then they work really well. But then when you want to operate across multiple pieces of data, what you're going to do is you're going to bring all that data into the application and you're going to do it there. And if you're a developer and you

Starting point is 00:43:06 know how to work in the application, this is actually a much more natural thing for you to do. That said, the Go or Java or JavaScript program in order to do a join is way, way bigger of a program than a SQL program to do the same thing. Right. And so what you've basically done is you've kind of moved the problem around. And so when I think about relational versus non-relational, I think that really you ask, what is the shape of my data, right? If my data is a bunch of documents, then actually, and I use it as documents. So I kind of do one at a time operations, you know, and if maybe you're in a social profile, right? I take it,

Starting point is 00:43:59 I display it, right? There isn't really anything that is table-like about that. I can put it in a table. You can always put it in the other model. You can always transform it. Then that's going to be something that makes sense. But if I have tabular data, which a lot of business applications do, whether it's financial data or customer data or whatever, it makes a lot of sense to store it in tables. And what you also commonly want to do is you want to recombine that data in a whole variety of different ways in order to do different analysis or different presentation and so on and so forth. And if that's what you're doing, then relational databases are a really good solution. And the SQL language is a very expressive way to sort of define those programs.
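
To make the point about joins concrete, here is a minimal sketch of the same computation written both ways: once as a SQL join and once as an application-side join of the kind a key-value workflow pushes onto the developer. Everything in it is an assumption for illustration only: the `customers` and `orders` tables, the rows, and the function names are invented, and Python's built-in `sqlite3` module stands in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")


def totals_in_sql(conn):
    """Declarative version: say what is wanted, let the database pick the how."""
    return conn.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """).fetchall()


def totals_in_app(conn):
    """Application-side version: fetch both data sets and join them by hand."""
    names = {cid: name for cid, name in conn.execute("SELECT id, name FROM customers")}
    sums = {}
    for customer_id, total in conn.execute("SELECT customer_id, total FROM orders"):
        name = names.get(customer_id)
        if name is None:
            continue  # inner-join semantics: drop orders with no matching customer
        sums[name] = sums.get(name, 0.0) + total
    return sorted(sums.items())


sql_result = totals_in_sql(conn)
app_result = totals_in_app(conn)
```

Both functions return the same rows. The hand-written version also fixes the join strategy (a hash lookup on customer id) in application code, whereas the SQL version leaves that choice to the database.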

Starting point is 00:44:51 And whether that's people writing it or now increasingly generative AI writing it, if you can write a simpler, more concise program, it's going to be better. Now the challenge becomes then how do you make that run well? And so one of the sort of key innovations in relational databases, and this is going back to the 1970s, is this idea that you separate your logical model from your physical model, and that you can have a physical model and sort of an actual execution plan, and then you have kind of a logical plan, which is what are you trying to compute? Right. And it's essentially declarative. And then to run it, computers, you know, computers actually don't work declaratively. They work imperatively. They work on a sequence of instructions. You need to translate that. And that translation, which is happening in the... It's essentially a compiler and optimizer, right? We don't always think about it that way, but that's what it really is, right? And what is that doing? It's taking a high-level language and

Starting point is 00:46:02 translating it into a low level thing. And the reason why things get complicated basically is because there are limits on how well those systems work. Right. And so oftentimes what we're doing is we're supplementing with human expertise to basically tune what in the original concept is an automatic transformation. Right. Yeah. I mean, that makes sense. Like you're,

Starting point is 00:46:34 you're going to have, you know, even if you're doing that compilation and it's, it's, you know, super tuned, it's not going to be a hundred percent, especially they're going to solve that in generality.

Starting point is 00:46:45 And for specific use cases or specific problems, it might not essentially be the best query planner sequence that you could run because for your very niche use case, you might have to do something a little bit more custom to get the performance that you need. So I think maybe let me just kind of cap this out a little bit, and let me finish it a little bit more. So what we have today in relational databases is, you know, it's like the problem is solved maybe 70 or 80% of the way. This sort of translation from high level to the more low-level execution. But it's not solved 100% of the way.
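
The split between the logical query and the physical plan can be seen directly by asking a database how it intends to run a statement. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for Postgres's `EXPLAIN`, and the `orders` table, the `idx_orders_customer` index, and the `plan` helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# The logical request never changes: "totals for one customer".
QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def plan(conn, sql, params):
    """Return the planner's chosen physical strategy as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text


plan_before = plan(conn, QUERY, (42,))  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY, (42,))   # same SQL, now an index search

print(plan_before)
print(plan_after)
```

The query text is identical in both calls; only the physical design changed, and the optimizer picked a different plan on its own. Tuning work of the kind a DBA does happens mostly on that physical side.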

Starting point is 00:47:33 And actually, people have been trying to solve it 100% of the way for a really long time, right? I mean, you go back and Andy Pavlo and his group at Carnegie Mellon, they've written on this a whole bunch and they're actually doing some of the cutting edge research in this area as well. But basically, we just haven't gotten all the way. And so that's where the humans come in, the human database administrators come in, you know, and they're essentially patching over that last little bit. If you go to the NoSQL model, it becomes very... You basically give it to the humans 100%, right? Because you don't have a lot of machinery under there in order to do these optimizations.

Starting point is 00:48:17 That's changing a bit with some of the latest versions of Mongo and so forth. They're incorporating relational and the two are kind of blending. But the thing that's really different now also, because if you look at it, say, Oracle and Microsoft, they've actually had automation around databases and database administration for 25 years, right? Microsoft has had the AutoAdmin group. They've had things in SQL Server since sort of the, you know, probably somewhere in the 2005 to 2010 era. It's the same time that Oracle came out with, you know, 10g, which had a lot of optimizations and so forth

Starting point is 00:49:00 that were done automatically. It eliminated a lot of the cumbersome manual tuning that people might need to do. But it didn't go all the way for various reasons. And when I talk with Oracle DBAs, actually, many of them still today, and Oracle's been promoting the autonomous database and all that, which doesn't benefit from generative AI,

Starting point is 00:49:22 I want to be clear. Many DBAs actually start out, they just turn that stuff off and say, well, it's resource consumptive. And actually what it does is it gives bad advice. And actually, in our experience, we had a whole bunch of outages at one point that turned out to be something undocumented, automated. It was trying to make some optimizations and actually, you know, it ended up causing outages. So there are a number of things that are very different today with Crystal DBA, both in terms of the technology that's available to us. And I think also importantly, there are a number of things that we do differently in terms of the approach. So when it comes to operational data, the number one thing, right, is the durability, the security and the availability of that

Starting point is 00:50:19 data system. Performance also matters, right? But nobody wants the performance if it's going to break the reliability or the availability of the system. And so you really have to look at the objectives that you're giving to the AI system. Number one, this thing has to be, you know, has to be secure. You know, it has to not mess up the data, not corrupt anything, not lose anything. And it has to be available. It has to be available for queries. Right. And so if you start out with an approach of how can I get 10% more performance from this system, and then you try to maybe add these things later, you're just not going to get there. You're going down the wrong path.

Starting point is 00:51:10 And so that's why the way we've engineered and the way that we test Crystal DBA is first and foremost on understanding, are we making it reliable? So that's one thing that's different. And then, frankly, the generative AI is just a great tool for being able to reason through complex chains, such as root cause analysis, and definitely also for interfacing with a customer. So how do you bake in all this DBA knowledge essentially into the AI so that it can make good decisions or provide at least good advice? Yeah.

Starting point is 00:51:55 So first of all, I'll say the way that we're doing it right now is almost certainly going to be different in six months and then 12 months just because the tools are changing. But I can tell you, it is an agentic system. It has knowledge of Postgres that is baked into it, where we have basically created the manual, if you will, the training, the DBA training guide that we've written. And then the other thing that is really key is the tools, right? The ability to run Postgres in a sandbox environment, to install indexes, to update statistics, to see what Postgres will do in this particular situation. And we also have a training data set that's generated in a gym where it's seen a lot of situations already that it can map.
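
As a rough idea of what a sandbox tool of this kind could look like, the sketch below copies a database into a scratch environment, installs a candidate index there, refreshes statistics, and asks the planner whether it would actually use the index. This is not Crystal DBA's implementation, which the conversation does not describe in detail. It is a hypothetical illustration that uses Python's built-in `sqlite3` module in place of Postgres, and `make_sandbox`, `index_helps`, the `events` table, and the index names are all invented.

```python
import sqlite3


def make_sandbox(source):
    """Copy the source database into a throwaway in-memory database."""
    sandbox = sqlite3.connect(":memory:")
    source.backup(sandbox)
    return sandbox


def index_helps(source, query, params, index_ddl, index_name):
    """Try a candidate index in a sandbox and report whether the planner uses it."""
    sandbox = make_sandbox(source)
    try:
        sandbox.execute(index_ddl)  # install the index (sandbox only)
        sandbox.execute("ANALYZE")  # update planner statistics
        rows = sandbox.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
        return any(index_name in row[3] for row in rows)
    finally:
        sandbox.close()


# Hypothetical "production" database for the demonstration.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
primary.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 50, "event-%d" % i) for i in range(1000)],
)
primary.commit()

QUERY = "SELECT payload FROM events WHERE user_id = ?"

useful = index_helps(
    primary, QUERY, (7,),
    "CREATE INDEX idx_events_user ON events (user_id)", "idx_events_user",
)
useless = index_helps(
    primary, QUERY, (7,),
    "CREATE INDEX idx_events_payload ON events (payload)", "idx_events_payload",
)
primary_indexes = primary.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
```

The experiment never touches the primary database, so a bad suggestion costs nothing, and the answer comes from the planner itself rather than from a guess about what it might do.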

Starting point is 00:53:05 So what this ends up delivering as a result is that the advice is actionable. And then because it's connected to your database, it's actionable in your environment. And we've also done a lot of work, which is essentially the prompting, to cut out any of the generic responses that the LLMs might give you. But high level, it's an agentic system, and the current system is not using fine-tuning. We expect that that's probably going to be part of it as well, but you could actually go a pretty long way without doing that. And I think the other thing that I just kind of want to emphasize is that the

Starting point is 00:53:51 architecture that we have today wouldn't have been, I mean, I guess it would have been a research thing, but it wouldn't have been a sort of well-accepted technique, I think, 12 months ago, and I really don't know what it will be 12 months from now. It's moving very fast. But the things that stay the same are the validation. And that is, I guess I'm going back to the things that let you move fast when you were building, say, social networks 15 years ago. It's all the same stuff.

Starting point is 00:54:25 It's all the same software development stuff, right? Like, well, how does CI/CD work for interactive chat experiences with databases? So those are a lot of the problems that we're working on, too. Yeah. So you're doing some sort of post-response validation basically to make sure that things are runnable, it's quality advice, no chance for hallucinations and so forth. Exactly.
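
One way to picture post-response validation is a gate that every suggested statement must pass before a user sees it. The sketch below is an assumption-heavy illustration, not the checks Crystal DBA actually runs: `validate_suggestion`, the `ALLOWED_PREFIXES` allowlist, and the `orders` table are hypothetical, and Python's built-in `sqlite3` module again stands in for a Postgres sandbox.

```python
import sqlite3

# Hypothetical policy: only statements that cannot destroy data are allowed.
ALLOWED_PREFIXES = ("CREATE INDEX", "ANALYZE", "EXPLAIN")


def validate_suggestion(sandbox, statement):
    """Return (ok, reason) for one SQL statement suggested by a model."""
    normalized = " ".join(statement.strip().split()).upper()
    if not normalized.startswith(ALLOWED_PREFIXES):
        return False, "statement type not on the allowlist"
    try:
        sandbox.execute(statement)  # a runnable suggestion must at least execute
    except sqlite3.Error as exc:
        return False, "failed in sandbox: %s" % exc
    return True, "ran cleanly in sandbox"


sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

good = validate_suggestion(sandbox, "CREATE INDEX idx_orders_customer ON orders (customer_id)")
hallucinated = validate_suggestion(sandbox, "CREATE INDEX idx_orders_region ON orders (region)")
destructive = validate_suggestion(sandbox, "DROP TABLE orders")
```

The second suggestion refers to a column that does not exist, which is the shape a hallucination typically takes in SQL advice, and it is caught simply because the statement fails to run. The third is rejected by policy before it is ever executed.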

Starting point is 00:54:51 Exactly. And I think what's really fascinating to me, too, right now is how we're really seeing data science and software engineering. They're kind of merging and blurring. And I think this is going to be really interesting to watch over the next five years. The skill sets actually that developers have end up being quite different. Yeah, absolutely. So what's sort of next for you guys?

Starting point is 00:55:18 What's the state of the company and what are you focused on for the next couple quarters? So the company is seed stage. Right now, we have the product in POCs with customers. I think I'm quite excited about an upcoming self-serve, anybody can download it and just get started model. So that's coming. We don't have a date set for that announcement, but we're pretty excited about it and it's not going to be long. Awesome. Well, I'm looking forward to that. Johan, we'll have to have you back down the road to dig into some more details as you guys progress as well.

Starting point is 00:56:02 But thanks so much for being here. Yeah. Thanks very much for having me, Sean. And good time chatting as always. Yeah, absolutely. Cheers.
