Software Huddle - Serverless Clickhouse with Tyler Wells

Episode Date: March 19, 2024

Today's episode is with Tyler Wells. Tyler is the CTO and co-founder at Propel. He was an early employee at Skype (and Microsoft after the acquisition) as well as Twilio. While at Twilio, Tyler helped... build a data platform to power customer-facing analytics for a major Twilio feature. Propel is the productized version of that for other teams looking to build similar experiences. In this episode, we see how this real-time, flexible analytics problem is tricky for a lot of teams, as well as how Propel is helping to solve the problem. We also cover some of Alex's favorite hobby horses for infrastructure developers -- what it's like building infrastructure services, how to think about billing, how S3 is becoming ubiquitous, and what to do about cross-AZ network costs.

Timestamps
02:29 Introduction
08:05 What is Propel?
22:28 ClickHouse
29:15 Target Customers
30:28 Billing Model
35:10 S3 becoming a key part?
36:47 Cross AZ Network Costs
41:56 Current Support
51:39 Access Policies
55:39 Rapid Fire
01:03:16 AI replacing Software Engineers?

Show Notes
Data Chaos Podcast
https://www.propeldata.com/

Transcript
Starting point is 00:00:00 A lot of us coming from the builder background, that engineering background, especially having been in larger, more established companies, you're not thinking about the go-to-market. And that's definitely been a steep learning curve. Are you running separate ClickHouse instances for each one of your customers, or are you running ClickHouse sort of multi-tenant? I don't know how ClickHouse works on that aspect. So we're doing multi-tenant. So ClickHouse is designed as a single tenancy, but we're kind of giving each customer their own database and then using the access control policies inside of ClickHouse
Starting point is 00:00:38 to ensure that, you know, customer A only ever sees customer A's data. And that's it. And so we've made it multi-tenant and we built all the security and sort of the safeguards to ensure it's multi-tenant, but we've also evolved now. So originally it was like, hey, we're going to have the serverless multi-tenant thing where, you know, we'll run this giant cluster and maintain it and ensure the performance and everything else like that. But we're now starting to evolve to where, yes, you still have the serverless, but a customer
Starting point is 00:01:08 can come to us and say, hey, I want to have single tenancy. I want to have my own virtual instance, run that for me, give me all the same things that Propel has, but then now separate that from the multi-tenant solution. So I've got a little more control over performance and don't have noisy neighbor problems or't have noisy neighbor problems or potentially have noisy neighbor problems and stuff like that. In 10 years, do you think we'll have more or fewer software engineers?
Starting point is 00:01:34 Hey folks, this is Alex. Today's episode is with Tyler Wells. Tyler is the CTO and co-founder at Propel. And I thought it had a really interesting backstory, right? He was at Skype, early Skype. He worked for Microsoft after Microsoft acquired Skype. He was early at Twilio, you know, when there were 200 people was there as they went public and grew and stayed there seven or eight years. And then started Propel based on some problems, some challenges he saw at Twilio and
Starting point is 00:02:00 at previous places. So I thought that was really interesting to talk about this sort of data platform they're doing to give customer facing analytics to their users. It's really interesting to think about scaling and multi-tenancy and building on top of AWS, building the infrastructure service and how you think about charging and pricing and fairness and all those sorts of things. So I hope you enjoy this episode. If you have any questions, comments, other guests you want on, feel free to reach out to me or to Sean.
Starting point is 00:02:23 And with that, let's get to the show. Tyler, welcome to the show. Alex, thank you very much for having me. It's a pleasure to be here. Absolutely. So you are the CTO and co-founder of Propel, which is an interesting, unique data company that I'm sure I would have liked to have used at a few different companies in the past. Can you just give people a little bit on your background and what Propel is?
Starting point is 00:02:43 Sure. I'll start with my background. So like you said, I'm Tyler Wells. And prior to starting Propel, which we started in February of 21, I spent seven and a half years at Twilio. Joined Twilio in 2013 as a technical lead and left Twilio in 21 as a senior director of engineering. Did a couple of different things when I was there. Started the entire video organization. So when I joined Twilio, there was only SMS and voice.
Starting point is 00:03:14 Obviously, I needed to round that out by adding real-time video. And so I joined them to start that organization and build that up. Ended up also opening an office in Mountain View for them, also opening an office in Spain. Spent a lot of time obviously building out all of the infrastructure for video plus the front-end clients. All of that was based on WebRTC. That also was sort of like a pretty fun journey in terms of data that we got to embark on there.
Starting point is 00:03:44 Obviously with WebRTC and having a global client base, we collected a lot of data, needed a lot of that data to troubleshoot, ended up using a lot of that data to basically answer questions about when a call center maybe was having a bad day or things were going wrong there, and that ended up turning into a product, one of our first data products that we shipped at Twilio, and was a lot of, I would say, sort of the initial impetus for why we started Propel. But more on that later.
Starting point is 00:04:13 Spent like six plus years doing the whole video thing and then left to start the SRE organization. That was sort of like my final piece there. Started that organization from scratch, built it up. Really got involved with reliability across the entire company there. Enjoyed that a lot and left to go start Propel where I'm at today. Yeah, very cool. Just to give people an idea, when did, I guess, when did Twilio start? When did it go public?
Starting point is 00:04:39 Where were you sort of in that timeline there? Yeah, so I came on a few years. I think Twilio started, I want to say like 2008 ish, 2009 ish. I don't remember. Um, exactly. It was before my time, uh, went public in 2016. So I was there a good three years, uh, before we went public. So I got to see all of that. I got to see our growth. Like I joined, I was employee like one 87, I think I was. And by the time I left, we were five or 6,000 employees. It was, it was a crazy, like, you know, a rise. It was just nuts. All the people where we went from. Oh yeah. Some true changes. I'm sure you saw there. Oh, huge. Um, you know,
Starting point is 00:05:16 it's, yeah, I mean, there's a lot of things that stayed true to form. Um, it was always a small team culture and we tried to continue to embrace that. But things have to evolve and change when you become larger and larger and you've got more surface area that you have to cover because of the products and everything else like that. But is there a change like around the going public time specifically, either like right before that in preparation for it or right around that time? Or is it just more gradual as you get bigger? It's just sort of always changing. And the public private distinction isn't isn't like sort of a true marker in any way. I mean, the public private distinction, largely what I saw as is more more formal trainings that you had to take. You know, at the end of the day, you know, we're building for our customers, right? And customers always remain number one. And that was the first and our goal was first and foremost to delight our customers.
Starting point is 00:06:09 And so a lot of those values, those core values did not change over time. They evolved, we refined them. But at the end of the day, we had to develop, we had developers we had to serve. And so even through the whole course of going public, that was always the core focus, right? Even the day that we went public, it was like, you know, this is day one or day zero, whatever the saying was. We'd gone public and it was like, nothing changes. Like, keep going, keep doing the exact same thing, keep delighting your customers. The only thing I would say, like I said, that changed was a little more administrative stuff, especially for leadership. A lot more mandatory classes that you had to take
Starting point is 00:06:45 in order to meet the different compliance standards and things of that nature. But no, at the end of the day, the mission stayed the same, and that was build amazing developer-focused products that allow people to communicate globally. Yeah, absolutely. And Twilio is one of those just amazing companies that seems to have kept that culture for a really long time,
Starting point is 00:07:03 always known for great developer experience and education and, and, and just great products as well. So I'm sure it was fun to see through that. But anyway, I interrupted you go back to, to Propel. So 2021, you leave Twilio. Uh, did you jump right into Propel or did you take some time off and what did that look like? I tried to take a grand total of two weeks off, um, which didn't really work. Um, you know, I'd said, okay, I'm going to, I'm going to total of two weeks off, which didn't really work. I'd said, okay, I'm going to unplug for two weeks, but was excited about starting Propel with my co-founder, Nico. And you just kind of end up doing things, right? Like you can't stop thinking. So I didn't really take any time off between that seven and a half years and jumping straight into Propel.
Starting point is 00:07:44 Just went right after it, was invigorated by the opportunity. It was essentially building on something that we'd already built at Twilio and there was no platform for it. So we really wished, we essentially built the platform we wish we had when we were building the whole client and voice insights product. Yeah, absolutely. So tell me about, yeah, what is Propel and what is that product you had at Twilio and you sort of wanted to bring to the masses? Yeah, so I think if I start with what is Propel and I'll go backwards. So Propel is a platform for developers, product engineers to deliver customer-facing analytics or analytics of any nature quickly and efficiently. And it encompasses everything that you need on the back end, from the whole data storage to the fast query response times, all the way to the front end from the React components that you
Starting point is 00:08:41 need to deliver the visualizations and everything in between there and so when we built that at twilio we built that specifically for a product which we'll call you know which was called voice insights um long story short we had a lot of data that was coming off of our kafka pipeline that was originally being landed in s3 and then somehow making it you know making its way into redshift we could answer some questions that way internally, but needed a platform that would evolve that we could eventually turn into a customer-facing application. Graduated that data from S3 into Elasticsearch. Started using Elasticsearch and Kibana to answer a lot of the questions. We would provide Kibana graphs to
Starting point is 00:09:26 customer support so they could actually turn those around and give those to customers. And then came the idea of like, let's just short circuit this and let's empower our customers with all this data. And so then we started building all of the pieces on top of that that you need in order to deliver a customer facing solution. Yeah. And customer facing, yeah, being huge. You mentioned Redshift and I know like Snowflake and Databricks or Quickhouse have been big lately. Why not just expose your Redshift or your Snowflake behind some sort of API to your customers? What's wrong with that sort of setup? Let's see. Redshift, oh gosh, that would have been very scary to do.
Starting point is 00:10:00 Right. I definitely wouldn't have done that. Yeah. Like, let's just leave that one alone. We'll just skip that. Back then, especially, that would have been... When it was like five queries at a time. Yeah. The concurrency was terrible. It's not going to scale for you well. I mean, that was always one of those things. You run a query, then you go away, get a cup of coffee,
Starting point is 00:10:20 come back and hope it finishes, right? And you can't deliver a customer-facing application like that. That's not going to work. And then if you go to Snowflake, I mean, those early days, we didn't have Snowflake at Twilio. We were not utilizing it. Those were probably very early days for it. And even today, if I wanted to build a customer-facing application with high concurrency and everything else like that, one, you're not going to get the concurrency you need. To get the performance you need, that's going to cost you an arm and a leg to keep those instances running all the time.
Starting point is 00:10:51 I mean, the whole reason for Snowflake is that separation of compute and storage. So when you're not running queries against it, you can spend the expensive storage down and keep the cheap compute. Well, if you've got a customer-facing app, you never know when somebody's going to hit that, so you've got to keep that up and running all the time.
Starting point is 00:11:07 So that's costing you a ton of money. And then there is no real API other than a JDBC connection on top of Snowflake today. That doesn't exist. Actually, at the Snowflake Summit, they were running a class on how to create a Flask app with a REST API on top of, you know, a Flask app with a REST API on top of Snowflake and all the things you have to do for that. Yeah, that just like
Starting point is 00:11:31 routes through it. Yeah, just routes through it. I mean, it was a very like, you know, hugely attentive. There's a ton of people attending that class because it doesn't exist. And if you want to build off that Snowflake data, you know, you just can't, you can't do it. And so for us going back to Twilio, you know, we had spun up Elasticsearch first because we were struggling trying to answer questions efficiently using Redshift. We started putting all that data into Elasticsearch indexes. Now we were getting the speed that we wanted and we were getting the whole front end through Kibana so we could actually visualize everything. And then at that point it was like well we're using this already this works what if we put in like what if we continue to use the api on top of this because they did offer that
Starting point is 00:12:15 but then build out the middleware and then essentially build that into the the console the twilio console uh in the form of visualizations like you know all your you know you know your time series and everything else like that, so you could actually see what was happening with the call. Yeah, very cool. And so when you sort of mentioned customer support having access to that Elasticsearch Kibana, when you rolled that out to more customer-facing, like actually end users,
Starting point is 00:12:39 was that still Elasticsearch behind the scenes doing that, or did that architecture then switch? Well, we couldn't. So to go to the first part, so we couldn't actually give customers support access. Like some of them, maybe later on that were a little more advanced, we would allow them access to that Kibana interface. But typically what would happen was support would go to Slack and say, hey, here's a call SID that was reporting issues. Here's the customer. Can you tell me what happened? And then we would take that and, you that and go through a series of steps to figure out, okay, what's
Starting point is 00:13:10 happening? Start asking questions, get a visualization to turn back and say, hey, we could do as much as say, okay, for this particular call, we could see in this leg here, all of a sudden the CPU started spinning up really, really high and the call, and there was a whole bunch of jitter, the call quality dropped. And then eventually the call just dropped off and we could return that in the form of a graph, or at least tell that story, give it back to customer support who could then go to the end customer and say, here's what happened. Obviously that becomes very expensive as you, as you scale and get larger and larger and so i think at one point we had calculated like each customer support ticket was like 35 yeah oh yeah because you basically have a
Starting point is 00:13:50 you need a data analyst on demand as well as well as all that infra and yeah saying someone that can be sharp and think about that stuff yeah and we tried to put as much tooling as we could around there like we wrote some slack bots to like where they could drop in a call sit and would automatically run a bunch of sort of basic analysis and things like that, and then make it more self-service internally. And then that was, you know, that was largely when it was like, well, can we build this for customers? And when it was time to build it for customers, we kept the same data in Elasticsearch and continued to use Elasticsearch for both internal and external. And then we just essentially built a middleware tier on top of that
Starting point is 00:14:27 that would broker the inbound REST requests from the console to Elasticsearch and back. And then Pridewall, all the access control and everything else like that. How big was that Elastic cluster, are you allowed to say? I can definitely say, it's just more can I remember. I mean, it was in the terabytes of data, and it was probably, I don't know, at least like eight to 10 nodes. Yeah. Okay. Well, I would have, I would have expected bigger than that if I sort of customer
Starting point is 00:14:53 facing step, maybe larger. Uh, yeah, it might've been larger. Um, yeah, I'm just trying to think that was, it was, it was, it cost us a considerable amount of money a year, put it that way. Yeah. And we had five people. It took me five people to staff a team in the beginning to build that entire application, the whole customer-facing stuff. And so it definitely was not a cheap endeavor. And then the biggest problem was all the other product lines inside of Twilio wanted that same thing. Um, and so it was, you know, we couldn't support it on what we were running because we, like I said, we're generating a ton of data already. And it was folks inside of my team having to manage that, keep it operational. Um, and then somebody like, you know, the chat team comes along and it's like, Hey, can we start filling up indexes?
Starting point is 00:15:44 And it's like, no, um, you gotta, you gotta spin up your own. And then it's, you know, the chat team comes along and it's like, hey, can we start filling up indexes? And it's like, no, you got to spin up your own. And then it's, you know, the messaging team is like, hey, can we, no, you need to, so each, all of these, you know, inside of each kind of like organization, you end up having to staff their own like mini data engineering teams and, you know, spin up a Elasticsearch instance and run all this stuff and kind of follow the same blueprint we had. But, you know, that's when we started thinking, like, why is this not a platform? This needs to exist as a platform. This seems like a pretty common problem. Yep, yep.
Starting point is 00:16:12 Also, just going back to, you know, we talked about why not Redshift or Snowflake. Why not, you know, if I have a Postgres instance that's handling this particular feature thing, why can't I just run these sort of queries and show that to my customers, you know, run against my Postgres or my SQL or whatever, whatever sort of transactional store I'm using? Sure. I mean, I think if you enjoy getting paged for incidents and pager duty and everything else like that. I mean, originally there were products inside of Twilio that did exactly that. And so we would have our transactional databases running. These are all highly normalized that are optimized for writes and transactions.
Starting point is 00:16:55 And somebody would come in and say, hey, you know what? I've got to get some analytics out of this. And so the first thing they would do is they would start querying that. And then some query would get awry and would lock up a table. And then the use case that it was designed for would no longer work. And then we query we'd get a ride and would lock up a table. And then the, you know, the, the use case that it was designed for would no longer work. And then we'd have an outage and then we'd all get paged and we'd come in and figure, you know, forensically figure out what is going on and, Oh, who ran this query? Oh, that's from, you know, that's from the reporting side of the house. Shoot. Why are we doing that? Okay. Let's do read replicas. And then you kind
Starting point is 00:17:24 of graduate to that. And then it just becomes, we actually need to separate this. Like we actually need to separate these two use cases. And, you know, yeah, that's when we started sending things to Redshift or, you know, obviously in the case of what we were building, we weren't using a lot of databases.
Starting point is 00:17:41 So our data was, essentially our services were generating events that were going onto a Kafka pipeline. And then that was making their way into Elasticsearch. Yep, absolutely. Yeah, I think that's like, there's like this middle ground where, you know, the strict OLTP stuff that's handled really well by Postgres, MySQL, all sorts of flavors of database. And then the strict OLAP reporting stuff is handled well by Redshift and Stoflake. But that middle tier where it's like, I want to be doing aggregations and flexible queries and things like that. But it's going to be at a higher volume. It's going to be exposed to my customers.
Starting point is 00:18:12 And that's a harder problem to solve. And I've seen that be difficult for a lot of folks. So I guess, what's the approach at Paprel? What's going on there? That's where we wish we had ClickHouse. And if we'd had ClickHouse at that time, I think that probably would have been the technical choice we would have made. How interesting.
Starting point is 00:18:32 But coming out of Twilio, going into the early design phase of Propel, it was like, I don't want to run Elasticsearch again. I don't want to be responsible for spinning up this, you know, this giant cluster of nodes and the type of problems that we had. This doesn't seem to make a lot of sense for what we want to do if we want to platformize this and have it be multi-tenant and everything else like that.
Starting point is 00:18:59 And so this is where ClickHouse came in. Yeah. Just to interrupt you, I feel like that's the feeling of, of like everyone that's done that. They're like, you know, I'll stick with it while it's here. But like the next time this sort of comes up that someone suggests this, it's like, please, can we just use anything else? Cause it's operationally, like it's very difficult. It's amazing and powerful, you know, in a lot of ways, but it's also like operationally it's, it's a difficult beast and, you know, try to avoid it or limit it as much as you can. Well, it's, it's, it's funny because it's also like operationally, it's a difficult beast. And try to avoid it or limit it as much as you can.
Starting point is 00:19:27 Well, it's funny because it's like all the things you're hitting on are unfortunately things that I've done and have cut my teeth on and hurt myself at other places. We tried the whole Postgres thing. We did that at Skype. We would regularly knock over. Heroku had a hosted Postgres. And we would destroy that thing with the amount of data that we were sending to it. So we would have to get on the call with like the CTO of Heroku and like,
Starting point is 00:19:52 hey, sorry, we did it again. They would have to try to recover it for us. We'd only be able to recover a certain amount. And then, you know, a few days would go by and we'd kill it again. And they were angry at us. And so it was like, yeah, don't, don't do this type of workload on Postgres. And then obviously at Twilio was the, okay, we're doing this on Elasticsearch and now Propel, we're doing all this on Clickhouse. Yeah. That's amazing. I just, I just interviewed Craig Kirsteins from, you know, Heroku Postgres for a long time. I wish I would've known that story. I could ask him, Hey, what do you think about, what do you think about Tyron Wells and the Skype team and their Postgres usage? I'm sure you're going to love that one. Yeah, it was a kind of interesting story.
Starting point is 00:20:28 When I was at Skype, I had built the Facebook video calling that was all powered by Skype. And we had shipped, well, we'd launched it. And day one, we launched it to like 13 million Facebook users. And they had it sort of like geographic specific on where they were going to allow it. Of course, the way that they'd done it didn't quite hold up and people were figuring out how to game it. And so all of a sudden we went from one of the sort of like caveats of the deal was we had to hit a certain like call completion percentage or something like that. So if somebody attempted to make a call, it had to complete, I don't know, like 95% of the time or something like that. And so we had to generate CDRs, which is kind of like a very SIP type world, you know, call data records.
Starting point is 00:21:25 But we had to have some place to shove those CDRs so we could actually calculate the call, you know, the call completion percentages. And we couldn't shove those into the back end of Skype. They were like, no way, you know, that would have taken a long time. And so that's where we ended up spinning up Heroku. So it was like, okay, we got out the company credit card, paid the money for hosted instances of Heroku and a couple of other things, and started jamming all that data in there. Well, all of a sudden you go from $13 million to $25 million to $100 million, and you can imagine all the CDRs you're generating. The billions upon billions arose, and then finally, I think we just fall over. Yeah. Oh, yeah, I'm sure.
Starting point is 00:22:08 Yeah. I mean, we finally got to the point where we would just have a cron job that was just going in and just all it did to delete rows, just deleted rows constantly. And then we had to introduce sampling because even doing that, we couldn't keep up with it, especially as that got opened up to, you know, the 750 million Facebook users that were there at the time and as they were nearing a billion. Yeah. That's wild. What a fun story. Yeah. So, okay. So sounds like you've sort of graduated to ClickHouse at Propel. Tell me, yeah. Tell me what that looks like sort of under the hood.
Starting point is 00:22:41 Yeah. So under the hood, the first thing we did is kind of did a steel thread. And we were like, okay, what's the performance of this at a billion rows? And so we wrote a very, I would say, rudimentary API running in a Lambda function. It was written in Node.js or something like that. And it could connect to a ClickHouse. At the time, we were just running a ClickHouse on an EC2 instance, probably underpowered or whatever else like that.
Starting point is 00:23:08 And we created a synthetic data set that had over a billion rows. And it was like, hey, can we get the performance we want of this? And that was like sub-second. We want sub-second performance over a billion rows doing a bunch of dynamic aggregations and everything else like that. And obviously, if you know ClickHouse, it was like, yeah, no problem. That's a piece of cake. And so from there, it was like, okay, so now let's actually build this thing. So we'd proved to ourselves that the things we want to do are going to make sense from
Starting point is 00:23:35 what we want, you know, the platform we want to create at Propel. Let's go make it real now. And so it was like, okay, how do we want to do this? We at Twilio had done a bunch of things. We're just standalone EC2 instances. You have to spin a lot of that stuff up by hand, or you've got to build a lot of kind of supporting infrastructure to handle deployments and everything else like that. And we're like, you know, let's do Kubernetes.
Starting point is 00:23:59 You know, there's a lot of cool stuff around tooling around Kubernetes. Let's use that. Let's stand up EKS. Let's use the ClickHouse operator and go with that. And that's how we're running it today. And so, you know, we're running a pretty decently sized cluster. We've got three replicas, you know, running at all time. It's sitting inside of our production environment under AWS EKS.
Starting point is 00:24:26 We use CDK for all of the deployments, as well as the application of those either Helm charts and or manifest to manage the infrastructure. And yeah, it's been running now quite well for us for a couple of years. I've been real happy with that. It's been very stable quite well for us for a couple of years. I've been real happy with that. It's been very stable and fast. Gotcha. And so are you running separate ClickHouse instances for each one of your customers, or are you running ClickHouse sort of multi-tenant? I don't know how ClickHouse works on that aspect.
Starting point is 00:24:57 So we're doing multi-tenant. So ClickHouse is designed as a single tenancy, but we're kind of giving each customer their own database and then using the access control policies inside of ClickHouse to ensure that, you know, customer A only ever sees customer A's data, and that's it. And so we've turned it in, we've made it multi-tenant, and we've built all the security and sort of the safeguards
Starting point is 00:25:23 to ensure it's multi-tenant. But we've also evolved now. So originally it was like, hey, we're going to have this serverless multi-tenant thing that we'll run this giant cluster and maintain it and ensure the performance and everything else like that. But we're now starting to evolve to where, yes, you still have the serverless, but a customer can come to us and say, hey, I want to have single tenancy. I want to have my own virtual instance, run that for me, give me all the same things that Propel has, but then now separate that from the multi-tenant solution. So I've got a little more control over performance and don't have noisy neighbor problems or potentially have noisy neighbor problems and stuff like that. Yeah, gotcha.
Starting point is 00:26:02 And when they do that is that still entirely within propel's aws account or is that like you know partially in someone else's account but you manage it for them like what's i we i've seen a few like different hosting models for new data infrastructure like what are you seeing what are you all doing there today it's 100 in our aws account okay and so we have everything hosted in usc 2. All that data stays inside of US East 2 and we manage all of that infrastructure on behalf of our customers. Gotcha. Is that in the same Kubernetes cluster, but just like sort of different pods and things like that if someone wants their own virtual? Yeah. Okay. Yeah, correct. So we have a set of pods in their own namespace today that's all the serverless. And then if a customer comes in, we'll spin up a new set of pods in their own namespace and keep it there. Gotcha. And I'm just like not as familiar with
Starting point is 00:26:50 ClickHouse and what's going on in this thing there. I know it's at least columnar. I guess can you describe to me a little bit on like how ClickHouse is working so well? Because I've seen a lot of people talking about how well it works for these sorts of use cases. Yeah, I mean, it's a, you know, it's a it's a columnar data store. So it's going to be insanely fast. Specifically, you know, we, we do most everything with with time series data. And so when people are bringing us that data, you know, it's either going to be doing kind of append only data where they're just constantly adding to it. So it's like, you know, they've got events that are taking place and these events are not mutating. And so we're going to create a data pool for them in that nature.
Starting point is 00:27:32 They're going to say, hey, here's the timestamp that I want to use. Here's the column that's the timestamp that I want to use. That allows us to do all the ordering and clustering everything that way. And then ClickHouse takes care of the partitioning and everything like that for us and then makes all those sort of on-the-fly dynamic aggregations extremely quick. Or a customer can come to us. We also have the ability to do updating data pools as well. And so if you are mutating records inside of that data set, we'll use a replacing merge tree for that. And that'll allow us to, hey, this data is, we need to mutate it
Starting point is 00:28:08 sometimes. Things are going to happen to our backend. Maybe we're running state machines or things like that. And I've got a single record that's going to mutate through states. We'll pick all of those changes up and then apply those to that data pool as well. Gotcha. And what are sort of the trade-offs on if I have more of like an update style workflow versus, you know, an insert only, append only type workflow? Am I going to see, like, lower latency or, I guess, higher latency on that update only workflow or just like some delay in getting those? Like, what does that look like? I know that's difficult for sort of warehouse type things generally. For warehouse type, yeah. I mean, I think from what we've experienced so far, it's been minimal. I would say it's been, I'm not going to quantify it very well right now, but it hasn't been enough to sort of like, you know, make me stay up at night going like, oh, crap, is this thing going to work or not, right? So all of our testing and everything else like that and our customers that are in production utilizing it today, we're not seeing any sort of latent issues that we didn't expect.
Starting point is 00:29:12 Things are working as expected. Gotcha. And is there a sweet spot for Propel customers of amount of data or different things like that for like, hey, this is when this really shines for you and either bigger is too hard or like smaller is like something else might be more effective for you or what sort of target customer for you in that sense? Yeah, I mean, we do all of our benchmarking at a billion rows plus, a billion to 10 billion rows. And we have data sets that we keep around that. So when we're running through all of our unit testing and everything else like that that we can utilize those those larger data sets i feel like where we can shine is
Starting point is 00:29:49 you starting to get in that 100 million row plus um you know category um there's things that we can do there that are that are really going to make that data set snappy that are going to continue to ensure that you've got that you know, sub second response time, whether you're running, you know, a simple count, whether you're running any sort of, you know, time series type aggregations and things of that. I mean, that's all just going to be very snappy and responsive. And so your customer's customer, or our customer's customer, they're going to be able to, you know, experience kind of near real time data analytics, you know, inside of analytics inside of the console or whatever they're trying to build for. Okay, so tell me, you all have a pretty serverless billing model,
Starting point is 00:30:33 which is interesting to me and kind of surprising. I feel like a lot of data places still have trouble with that sort of thing. So you're paying per gig of storage, you're paying per gig of data scan during a query type operation. Was that a difficult thing to pull off? Have you had to adjust or re-level that or has that worked pretty well? How did you sort of go through that process? Well, first and foremost, having dealt with a number of other, I would say,
Starting point is 00:30:59 like embedded analytic solutions and things of that nature that we're trying, that they have seat-based licenses. How do you deliver a solution like this with a seat-based license? That's very hard. So if I want to build on my data and it's like, well, how many seats do I have? No idea. Am I going to buy one per customer? Am I going to take one seat and kind of hack my way through it? Like we felt that that just never worked. And so, you know, we took our experience at Twilio where everything was usage-based and we said, okay, how are we going to apply usage-based billing model, you know, to this type of analytics? And we broke it down and like, what are the things that we have
Starting point is 00:31:33 to pay for that you should have to pay for as well? And that's storage because we're providing value on top of just like, it's not just sitting in S3 or it's not just sitting on an SSD. You get a whole lot more value out of that data that's there um and that's something we have to keep up and running at all times and then what's the next piece that you know that we is providing value and that's the compute because compute is providing you that low latency and so it was very easy for us to look at how long did and click house makes it simple as well because click out soon is going to give you statistics in terms of how many rows that it scanned how much data was that, how long did it take, and everything else like that.
Starting point is 00:32:09 That just becomes an event in our system that goes on the bus, and then we use it for billing later on. Has that been hard to – I mean, are people receptive to usage-based billing? I mean, in some sense, you see a lot of this with certain serverless services, but then also people in some sense have been used to paying for instances and just thinking about that and having more predictability out of it. Like what, what's been the, the reaction to, you know, having a fully usage based billing model? I'd say it's 50, it's 50, 50. Um, some people that have come from like the Snowflake model or have used Snowflake before, where it's very usage basedbased understand it very well others start to do this crazy sort of mental calculus where they're they're taking these absolute worst case scenarios and then multiplying your your cost per you know gigabyte queried or you know storage and everything else like that
Starting point is 00:32:59 because everybody at first goes to oh i'm definitely going to have a terabyte, and I'm going to serve millions of queries. And, you know, at least to date, that's not been true. But they kind of go through that, and it immediately starts to freak them out. And they're like, that's gonna be way too expensive. And that's, that's hurt us at times. And so, you know, people have sort of extrapolated out these worst case scenarios, or best case in some, you know, however you want to look at it. And they're like, oh, that just costs us way too much money. We can never do it. And it's like, but if you're serving that same level of traffic and you're running it yourself, let's talk about the number of engineers you're going to need to run that. Let's talk about your infrastructure that you need to run that. You're going to be paying probably 10x um than what it
Starting point is 00:33:45 would cost to go and propel um and you know if you've been there and you've done that you're like okay i get it like i understand it's it's also if you start to get to those levels of you know i'm doing millions of queries you're probably not going to be playing rack rate either like we didn't when we were at aws and use them them, you start to have those negotiations. We start to put together sort of like tiered schedules and things like that. And so it's kind of been 50-50. If you're used to buying instances, it throws you off a little bit. And so that's why we've actually just last week started redoing our entire price model.
Starting point is 00:34:26 We've also made some internal changes in terms of how we store data that have given us a lot of efficiency. So we dropped our, our storage price from 250 a gigabyte down to, I think we're at 30 cents a gigabyte. Wow. Yeah. I saw 30 cents. Wow. That's a, that's a big drop there. Yeah. Yeah. It's a huge drop. And then we started, we're starting to introduce pricing now for those dedicated virtual instances. And so we're starting to put that in. I think that's running, you know, for like a medium type instances, like, you know, like $1.30 an hour or something like that. I don't remember exactly. And then so that way, you know, if you understand the pricing model of usage of, hey, I want to be serverless, I'm going to pay for storage,
Starting point is 00:35:01 I'm going to pay for gigabytes process per query. You can take that route. If it's like, no, I want my dedicated, I understand buying instances, you can take that route now as well. Yeah. With a lot of different data products over the last, I would say, you know, six, seven, eight years, we've seen S3 become like a key part of that. Is that something that factors into your product or your roadmap in terms of maybe like the tiering type things? Or is it just like, hey, we're a low latency provider and like we have to be on, you know, SSDs and rather than, you know, taking that hit from pulling cold stuff from S3? Or how do you think about that? It'll become a combination where it's already become
Starting point is 00:35:39 a combination of both, right? So S3 is obviously cheap storage. Customers aren't accessing all the data all the time. So you want to be able to have hot tiers and you want to be able to have some things that are more available or obviously use smart caching. And so we'll have a combination of that. So we've started to move towards a more S3 based model. If you look at like ClickHouse Cloud, they have a whole write up of, you know, all of their stuff is S3, then they use caching. Our first versions of all this stuff were basically SSDs, and that just gives us the crazy insane performance. But then you've got to deal with how do I scale that infrastructure? What do I do if, okay, I've deployed my infrastructure with this many terabytes of SSDs attached.
Starting point is 00:36:23 Somebody comes in and immediately starts to exceed that. What do I have got to do? So obviously we have things that are monitoring and everything else in place that then start to spin up and attach a different, you know, additional SSDs. But if you take the S3 approach, it's unlimited storage, right? And so I feel like long-term, that's the way people will go. And then you just become smart about how you're going to tier that data or cache that data. Yep. Yep, absolutely. Another thing I like to talk about with sort of data infra-type people, think about underlying costs, is sort of cross-AZ network costs, which is something I see come up a lot.
Starting point is 00:36:59 Is that like a big factor for you all? I think you mentioned three replicas for this. Do you have a lot of cross-AZ traffic? Is that a big factor for you all? Like, you know, doing, I think you mentioned three replicas for this. Do you have a lot of like cross AZ traffic? Is that a meaningful cost for you? Or is that sort of overrated, overblown in terms of what it looks like? Right now, overrated, overblown. We're not seeing it. So yes, I've got three replicas.
Starting point is 00:37:16 They're deployed in three different AZs inside of the same region. So inside of US East 2. That doesn't even, when I look at that AWS bill, which I look at a ton, spend a lot of time there. It's not even, it doesn't even when i look at that aws bill which i look at a ton spent a lot of time there it's not even it doesn't even factor in um my compute costs are sort of number one my storage costs are number two um you you know you got to get start to get things like dynamo and everything else in there uh surprisingly at one point our nat gateways were very very expensive um we've since done away with majority of of those and eliminated a lot of that cost. But no, the cross-AZ traffic, I just don't even see it right now.
Starting point is 00:37:51 Yeah. Is it tough to have this infrastructure product? And you're much more than just a reseller of AWS because you're, significant value add on top of those those things. But is it tough to like manage that? You know, there's that that core underlying infrastructure costs? And how do you sort of build that out to, to customers and go through that, that sort of like, how did you go through that? Did you talk with other info providers? Like, what did that process look like? Yeah, I mean, I think in the beginning, we did a cost model, right? So we looked at what does all of the underlying infrastructure cost for us to keep this thing up and running with, you know, kind of hypothetical this many number of customers
Starting point is 00:38:34 and hypothetical this much data stored on us. And what makes sense for us to kind of break that down to, you know, not be insanely greedy in terms of your margins there, but, you know, something that's reasonable. And then obviously you start to look at some of the people that are playing in the same space. I think one thing that was hard was finding people that were doing what we were doing at the time. ClickHouse Cloud wasn't out yet. There's a company called Tiny Bird that is doing, I think, similar things. CubeJS was different. And so we kind of had to take a swag and said, okay, here, what makes sense? What makes sense on what we are spending? And then the value that
Starting point is 00:39:23 we're adding in the way that we looked at a lot of that value was how much would it cost for a company to do this themselves? And how many engineers would it take to maintain this and manage this? Because this was like the same sort of infrastructure play that we would have taken if, say, for instance, I had to build this again at another company and then factored all that in. Yep, absolutely. What about in terms of, you mentioned compute being your highest cost.
Starting point is 00:39:49 I remember talking to a friend at Datadog and talking about how their compute ran at pretty low utilization because they just need to be ready for, you know, a customer query that comes in and spikes that super quickly. Do you have to run your compute pretty low just to account for, like, you know, these huge queries that can spike through that? Yeah, 100%. that super quickly. Do you have to run your compute pretty low just to account for these
Starting point is 00:40:05 huge queries that can spike through that? Yeah, 100%. Obviously, I use reserved instances to help offset some of that. But yeah, we're a platform, right? And available 24-7. Our customers are building their analytic use cases for their customers on top of us. We have no idea when they're going to hit us. We have no no idea when they're going to hit us. We have no idea how often they're going to hit us. We have to be available 24-7 with, you know, I'd love to say five nines. We're probably not quite there yet, but we're pretty close to that. And so, yeah, we've got instances of specifically on the data tier that are relatively low utilization.
Starting point is 00:40:45 And, you know, they have periods where it's like, okay, it's getting there. specifically on the data tier that are relatively low utilization. And, you know, they have periods where it's like, okay, it's getting there. We're getting a lot of traffic and then periods where it's just absolutely nothing. But I think, you know, that's pretty common. We used to see a lot of similar type patterns, you know, at Twilio. And there's some things you can do with that. But, you know, for the time being, I think we just kind of got to eat that and, and, you know, make sure that I would, I would rather focus on the availability and the responsiveness than try to do some trickery with, you know, auto scaling, especially when it comes to people's data. Right. I would hate to have something where it's like, oh, you know, we try to be smarter or clever, you know, with our utilization and I lost data. Like that's one thing as a data company I never want to have happen is lose a customer's data.
Starting point is 00:41:35 Yeah, absolutely. Granted, we have a ton of redundancy built in and ways that we can recover it through backups and replicas and everything else like that, but still two huge fears as a data company is I lose somebody's data or I have a data breach or customer A runs their queries and they see customer B's data. Those would just be horrible. Yep. Yep. Absolutely. Okay. Let's shift back to the product a little bit. In terms of getting data in to Propel, as I was looking at it, it looks like I would say three main buckets of ways, right? One would be Snowflake directly. People hook up their Snowflakes and you're just ingesting data there. Another one would be Webhook, right, where I can just sort of send data to the Propel API. You'll ingest it and put it in.
Starting point is 00:42:18 And the other one I would say is, like, is, you know, like S3 Parquet, of which a bunch of things can sort of export to, whether that's Kinesis Firehose, different databases and things like that. Am I sort of understanding that correctly? 100% correct. Yep, that is it. So, yeah, the first one we built was Snowflake. We really, really liked that ecosystem and felt that was, at the time, kind of a gold standard. Still at the time, I mean, still is, still is, kind of gold standard of warehouses.
Starting point is 00:42:46 And we're very hopeful that a number of Snowflake customers would need that API that we provided at Propel, that GraphQL API to build applications. And so we wanted to make that sort of the standard. And that allowed us to then venture into, hey, you know what, we should probably have a webhook. We should have S3. We should be able to, you know, we do a lot of the stuff in Parquet format, that's sort of our native format throughout Propel. And then we're starting to open it up to more things like, you know, see Kafka coming here shortly, we're almost done with that. You know, and we've looked at things like Databricks, we've looked at things like BigQuery, we haven't seen as much sort of
Starting point is 00:43:22 market demand for those. But yeah, I mean, essentially, that's how we can ingest that data. Yeah. Okay. And with the S3 Parquet, is Parquet something that ClickHouse speaks natively, or is it just easy for you to sort of suck that up and then ingest it into Parquet, or I guess why the Parquet choice? Yeah, ClickHouse speaks that natively. We have no issues with that, so we can ingest that very simply. Okay. Yeah, and efficiently, too. So, like, we've had to do some work around that, but I feel like our Parquet support is very strong. And so we'll even take, you know, if somebody's sending us JSON through webhooks, we convert all of that stuff. And then we were able to slurp that into ClickHouse easily and efficiently.
Starting point is 00:44:01 Gotcha. With those webhooks, do you do some sort of like batching and aggregation and maybe upload every 15 minutes or something like that, like with Kinesis Firehose? Or are you taking those and inserting them right away? Or what's the webhook story look like?
Starting point is 00:44:14 Depending on the request per second, we will insert those straight away. But we have the ability also to throttle those and slow it down a little bit. And then we'll use things like Kinesis in order to do the batching and everything else like that.
Starting point is 00:44:26 And so I think we're probably doing, what are we at now? 50 messages per second or something like that on the webhook connector, which today most of our customers are not exceeding. We can obviously burst well beyond that, but we've kind of, we start to slow it down at about 50. Okay. And so ClickHouse does pretty well with frequent small inserts. Like I'm from like the, again, like the redshift mindset of just like, hey, make sure you have batches of data. I mean, you're not just like writing the little tiny inserts, but ClickHouse does well with just sort of writing little tiny records in. Yeah, that was one of the really cool things.
Starting point is 00:45:06 I mean, I remember if I go back again to my Skype days of why, you know, I was always interested, oh, let's do Hadoop. Like we can run all this. We've got like, you know, all the power of Hadoop, and we can run all these MapReduce jobs and really make this more efficient. But when you're doing CDRs, those are all very small, you know, small amounts of data coming in frequently, and that was not going to work.
Starting point is 00:45:27 You had to batch all that stuff up, and that's how we ended up in Postgres land. But ClickHouse has no issue with that. So you send us one or two records, fine. It's going to get in there quickly and easily. You start to send us thousands of records, or we're going to chunk that up, especially if it's coming from Snowflake. We're going to break that into logical pieces because there is sort of a limit of how much you can push through Snowflake at one time without knocking it over.
Starting point is 00:45:49 And so we've experimented with all of that and tested all of that. And so our ingestion systems will pull that in, chunk it up appropriately, and then dump it into ClickHouse and make it available. Gotcha. What about straight streaming from transactional databases, whether it's like DynamoDB Streams or Mongo Oplog or Postgres and MySQL, sort of the replication log? Are people looking at that or is this the type of data that is more like you're saying the event type data that's going through Kafka, maybe just into Snowflake, but not sort of your core transactional data? So in the earlier days, I would say it was more of the analytical data. And so this would be like post DBT type data was sort of the first use case that we went to. So you've kind of built that universal table. You've done all of your joins. You've enhanced that data, enriched that data from a bunch of other tables. You've landed in Snowflake and
Starting point is 00:46:41 now you brought it into Propel and now you've got all of that fast-serving layer. That's evolved into now multiple data sources coming in and then wanting to join that data. And it's evolving again until, hey, what if I've slapped Debezium on this, and now I've got Debezium feeding CDC into Kafka, and now I want to land that into Propel. And I think at some point, we'll see it also where those types of use cases bypass even a Kafka and go straight to Propel. And I think at some point we'll see it also where those types of use cases bypass even a cough can go straight to Propel. Yeah. Okay. Do I have to, you sort of mentioned like the joins ahead of time, some of the denormalization work, like, I guess, how does that ingest work? Is it just going to take it exactly as I come? Is it going to infer my columns? Should I do some transformations on it? Like in terms of like optimal querying from this, what should that
Starting point is 00:47:27 look like? Yeah, that's a great question. This is again, always evolving as data and we talk to more people. The original thesis was people were utilizing, they were using Postgres for transactional. They were transferring all of that data into something like, like Snowflake. And so extracting it all, dumping it into Snowflake, taking something like DBT, running their jobs on top of that, landing that into that nice wide table. Then we could come in and slurp all of that up. That was like our happy path. Obviously, when you get into the real world,
Starting point is 00:48:05 I've probably got two customers that actually do that. And they're great. They're very easy. It's like, wow, this is a piece of cake. And you actually listen to us and some of our advice and you built those pipes that way. And that thing just runs and it's super available for them. But the real world doesn't operate like that.
Starting point is 00:48:26 And so as we've learned more, we've had to evolve and we've had to say, yes, why we would love for you to do that. You're probably not going to. So we are starting to give them, our customers, more control over being able to do all of the joins and some more of the sophisticated sort of cleaning of that data and or enriching that data into whether it's a new view or, you know, materialized view, something of that nature. Those are all things that we're having to now evolve to because a shift that I think we're seeing is if you think a lot of the data space today, you know, how many different pieces of infrastructure are people utilizing? So let's move beyond just the, hey, I'm going to have to have, in most scenarios, a transactional database. So let's just say you've got a Postgres, MySQL, or a Mongo on that left side, and then now I've got to sort of build the rest of my infrastructure
Starting point is 00:49:22 to do analytics, to build applications and everything else like that. Today, a lot of that is a whole bunch of pieces of infrastructure, right? So you've got everything from, okay, I've got a Kafka pipeline. Sure, I can continue with that. I've got to do some transformation and some enrichment. I'm probably going to bring in DBT. Or if I'm going Snowflake, I'm probably going to send that to Snowflake. And then I'm going to use our dynamic tables. I'm going to create these materialized views. I'm going Snowflake, I'm probably going to send that to Snowflake, and then I'm going to use their dynamic tables. I'm going to create these materialized views. I'm going to probably have to bring in another piece of data to sort – there's a lot of cobbling of Lego pieces, right?
Starting point is 00:50:07 And I think one of the trends we're kind of seeing is people want to have potentially like that single solution Swiss army knife of maybe I'm not at the scale of Snowflake or I'm not at the scale of Databricks yet, or maybe do I even need to be, but I want to be able to do all of these things on top of my data at maybe a somewhat smaller scale. And I want to use that data to build applications on top of it. That's a lot of different skills and sort of components that you have to bring together. And so I think the trend that we're kind of seeing is where do they go when they want that in a single solution? And if they want that in a single solution, does it exist today? Maybe. You know, are we marching very quickly towards that at Propel? We're trying to. Because again, our customers are driving that. Our customers are saying like,
Starting point is 00:50:48 I don't really want to use dbt. I don't really want to do this additional step. Why can't I send you my Kafka, my webhooks, and pull in data from maybe Parquet files, and then do all of that joining and enrichment inside of Propel, and then utilize your APIs and your front-end UI components to deliver that to my customers? Yep, absolutely. Does ClickHouse have materialized views natively? Yes, absolutely. You can do some really cool things, like a materialized view of a Postgres table inside of ClickHouse. So your ClickHouse is actually connecting to your Postgres and you're creating materialized views
Starting point is 00:51:28 out of the CDC in that Postgres. It's pretty neat. So yeah, there are some very cool tricks in there.
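As a rough illustration of the two things mentioned here, native materialized views and ClickHouse talking directly to Postgres, here is a sketch using the clickhouse-connect Python client. It assumes an existing events table; the hosts, credentials, and table names are invented; the MaterializedPostgreSQL engine is experimental (the gating setting name can vary by ClickHouse version); and none of this reflects how Propel wires things up internally.

```python
import clickhouse_connect  # the clickhouse-connect Python client, assumed installed

client = clickhouse_connect.get_client(host="localhost", username="default", password="")

# 1. A native ClickHouse materialized view: rows inserted into an existing
#    `events` table (with columns ts and event_type) are pre-aggregated into
#    a daily rollup as they arrive.
client.command(
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS events_daily
    ENGINE = SummingMergeTree() ORDER BY (day, event_type)
    AS SELECT toDate(ts) AS day, event_type, count() AS events
    FROM events GROUP BY day, event_type
    """
)

# 2. Mirroring a Postgres database over its logical-replication (CDC) stream.
#    Experimental feature; the setting name below may differ across versions.
client.command(
    """
    CREATE DATABASE IF NOT EXISTS pg_mirror
    ENGINE = MaterializedPostgreSQL('postgres.internal:5432', 'app', 'replica_user', '********')
    SETTINGS materialized_postgresql_tables_list = 'orders,customers'
    """,
    settings={"allow_experimental_database_materialized_postgresql": 1},
)
```

The first statement is the plain, long-supported feature Alex asks about; the second is the newer trick Tyler alludes to, where ClickHouse itself subscribes to the Postgres change stream.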
Starting point is 00:51:52 Yeah, very cool. One other thing I loved as I was looking through is that you have access policies. Obviously there will be API keys and things like that locking it down, but I mean specifically the multi-tenant access policy stuff, right? Where I'm exposing this to end users, of which I'm probably going to have a lot, and their data is going to be intermingled together. And basically, native in Propel, I can set up these policies that say, hey, they're going to pass this parameter in and they can only query for the data that matches it, right? So, like you were sort of saying, you don't want customer A to see customer B's data. If I'm customer A, I don't want customer one seeing customer two's data, that sort of thing. And you make that a lot easier. Yeah. So we don't use API keys in that case. We actually don't support them. We use JWTs. In order to utilize the Propel API, the GraphQL API, you need to create a JWT. And then one of the things you do to enforce that multi-tenancy is use custom claims inside of that JWT to state what your tenant ID is, your column of tenancy.
Starting point is 00:52:46 And so you can imagine you've got an account ID: you create that JWT with the account ID embedded and cryptographically signed inside of it. That token gets passed to the backend, we validate it, and that tenant ID is then used. The best way to think about it is that it's stamped into that WHERE clause at all times. So your front-end developers don't ever have to worry about it. They're not thinking to themselves, okay, gosh, I've got to make sure that SQL query contains the WHERE clause
Starting point is 00:53:15 that ensures the account ID is set and everything of that nature. It's just part of the token. And so when that gets to the back-end Propel infrastructure, that's enforced by us. We're ensuring that when you say, give me the count of revenue or the sum of revenue for a given month for customer ABCD, it's going to be customer ABCD and not some other customer inside of your tenancy.
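To make the token flow concrete, here is a minimal Python sketch using the PyJWT library. The claim name account_id, the shared secret, and the HS256 signing choice are illustrative assumptions (Propel's docs define the exact claims and signing setup it expects), and the WHERE-clause string at the end only illustrates the idea of the tenant ID being stamped onto every query, not how Propel actually builds queries.

```python
import datetime
import jwt  # PyJWT, assumed installed

SIGNING_SECRET = "replace-with-your-shared-secret"  # hypothetical shared secret

def mint_tenant_token(account_id: str) -> str:
    """Mint a short-lived token whose custom claim carries the tenant ID."""
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": "end-user-123",        # whoever is logged in
        "account_id": account_id,     # custom claim: the column of tenancy (name is illustrative)
        "iat": now,
        "exp": now + datetime.timedelta(hours=1),
    }
    return jwt.encode(claims, SIGNING_SECRET, algorithm="HS256")

def tenant_filter(token: str) -> str:
    """Back-end side: verify the signature, then derive the tenant filter from the claim."""
    claims = jwt.decode(token, SIGNING_SECRET, algorithms=["HS256"])
    # Conceptually this predicate is appended to every query the token can run.
    # (Illustration only: real code should use parameterized queries, not string building.)
    return f"account_id = '{claims['account_id']}'"

token = mint_tenant_token("acct_ABCD")
print(tenant_filter(token))  # -> account_id = 'acct_ABCD'
```

The point of the design is that the tenant ID rides along in a signed claim, so a front-end developer can never forget it and an end user can never spoof it.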
Starting point is 00:53:40 Yeah, exactly. And that makes me think back to Snowflake Summit, when you were talking about building an API on top of Snowflake. That's another thing you need to think about: making sure in your Flask API, or whatever's fronting it, that you're always adding that WHERE clause. And it's interesting, we're seeing some more tenant-aware tools like this. We have Nile for transactional Postgres now,
Starting point is 00:54:05 which also is a tenant-aware solution. You all have tenant awareness built into it. I think that's pretty cool and is going to help a lot of these multi-tenant SaaS solutions that folks have out here. Yeah, and those are all lessons, obviously, we took from Twilio, where Twilio was a multi-tenant platform and we were that way from day one.
Starting point is 00:54:22 And so as we were designing Propel, it's like, okay, multi-tenancy, top of the list. Like, it's got its table stakes. It's got to be there. And we've got to make sure we do it right. So yeah, that was important for us. Cool. One thing I was thinking about
Starting point is 00:54:33 as I was looking through this, you know, data pool is sort of like your storage concept there. If I'm a tenant of yours, should I always just have all my data in one data pool? Or should I potentially have split data pools? Like given there's that multi-tenancy aspect, are there other reasons
Starting point is 00:54:49 to split up into different data pools? Or how do you think about that? Yeah, I mean, I think it just comes down to use cases and kind of how you're modeling on your own back end for the time being. I mean, that is changing here very quickly to where you can start to bring in all your tables and then create kind of these derived views on top of that inside of Propel. Today, it's going to be how are you sort of doing that modeling in your backend and you're thinking about the use case that you're wanting to serve in your frontend, in your console, or your application. And so some of our largest customers will have two or three data pools that are just built for different use cases that don't necessarily contain the same data.
Starting point is 00:55:31 There may be some overlap there, like the account IDs and things of that nature, but they're built for different use cases that are serving up inside of their analytics, their applications. Okay, cool. Super slick. All right, I want to close this out with some common questions we've started asking all of our guests here, sort of rapid fire and get your thoughts on them. So there are six of them. First one is if you could master one skill that you don't have right now, what would it be? Go to market. Go to market. Nice. That's good. That's a good, that's a hard one, right?
Starting point is 00:56:00 Yes, it is. Just being builders all the time and thinking, how do I connect the dots to that end customer? Yeah, hopefully I don't slow down the rapid fire here, but I think a lot of us coming from that builder background, that engineering background, especially having been in larger, more established companies, you're not thinking about the go-to-market. And that's definitely been a steep learning curve as we're building and coming out. Because if there's ever a biggest fallacy, it's the whole notion of build it and they will come. Not a chance, right? Yeah, I mean, sometimes that happens.
Starting point is 00:56:36 You're lucky. But I think in this case, yeah, I would go with go-to-market. Yeah, very cool. Great answer. What wastes the most time in your day? What wastes the most time? I don't know. I mean, as a founder, it's hard to say that anything is a waste, because everything you're doing is, you know, for survival. So, I don't know, sleep,
Starting point is 00:57:05 you know? If I'm sleeping, I'm not necessarily working towards making Propel successful. But no, it's really hard to say, because everybody probably says, you probably get a lot of, oh, meetings. Well, as a founder, who else is going to take the meetings? I have to. And those meetings are going to be everything from sales calls to marketing to product design to operations. I mean, it's across the board. So yeah, maybe sleep. Yeah, that's a great one. Cool. So if you could invest in one company that's, you know, not Propel, ideally a private company so it's actually harder to invest in, who would it be? Which company are you bullish on?
Starting point is 00:57:48 It's got to be OpenAI right now. Yeah. It'd be hard not to. That's a great answer. Or maybe pick off one of the smaller up-and-comers. But if you could get in on OpenAI. Perplexity or Mistral or something. Yeah, like a Mistral or somebody like that.
Starting point is 00:58:27 You may be able to get in a little earlier, so you get a little more percent ownership. But yeah, I think you'd have to go AI. So what's your sort of usage like? I would say daily. I use it for just about everything. I was building something the other day and I was having to convert from, oh gosh, what was I doing? It was giving me something in these crazy metric units, and I needed to figure out how to take this and convert it so I could actually use a tape measure. How do I translate this metric value to a tape measure? It was doing things like that. I've had it do everything from, hey, I've got these three ingredients, give me a recipe, to, hey, I really don't want to write code to parse XML, can you give me that? Oh yeah, it's amazing. Or generate fake data. It's all kinds of amazing stuff. Yeah, it's pretty wild.
Starting point is 00:59:11 Yeah, so we use it for all that stuff internally, like on-the-job stuff. We've got a bunch of prototyping that we've done for Propel to make some of the analytics more efficient for our customers and to help them get to insights faster, but we haven't released it yet. So we're still playing with it, and I think you'll start to see it soon. We actually had a really cool thing one of our customers did: they wanted to offer conversational analytics on top of the data they have inside of Propel. And so he built it utilizing a RAG, and he made this sort of conversational insight so people could ask questions of the data in English, in natural language.
Starting point is 00:59:56 Yeah. Wow. That's pretty cool. That's awesome. Yep. All right. What tool or technology could you not live without? I guess I'm just saying my computer, right? I mean, it'd be kind of hard to live without that. And if I take it down a step from there, I could live without an IDE. I wouldn't be very happy about it, but, you know, I kind of like having my IDEs for that stuff. But no, I mean, I just keep it simple. Yep. Absolutely. What person influenced you the most in your career? Man, I don't know if I have a single person. I mean, there's definitely been a lot throughout my career.
Starting point is 01:00:55 I'll go to my early career. I'll say early in my career, as a senior software engineer kind of making my way into early leadership, as an early director of engineering, stuff like that. There was a guy by the name of Paul Emery, who was our VP of engineering, and he was just, I mean, he was amazing. He got the tech side of things, but he got the empathy side of things as well. And so he influenced a lot of my early leadership style, I would say. Was Paul at Twilio? No, Paul was not.
Starting point is 01:01:32 So Paul and I worked at a company called FaceTime Communications, and it was not the FaceTime that became Apple FaceTime. It was only the name that they bought from that company. Okay. They actually bought the name from that company? Yeah, they actually bought the trademark and the name from that company. That actually helped keep the company going, and the company ended up renaming themselves to Actiance, as in active compliance. And yeah, Paul and I worked there at the time, when it was FaceTime Communications. Wow. Cool. Great answer. All right, last one. What is your probability that AI
Starting point is 01:02:07 equals doom for the human race? Zero. Zero? You're feeling good. I'm feeling good. It's all upside from here. Yeah, I mean, I think a lot of it gets overblown, and I think there are enough people thinking about the problem from that side of the house that it's just not going to run away from us. It could always happen, but I think it's very close to zero. That's where I'm at too. But it's fun to think about. I think there's just too much good, right?
Starting point is 01:02:40 I think about just the efficiency gains you can get from it. I mean, there are probably a million projects swirling around in my head that I would love to do more with it, but obviously running a company takes up a lot of time and having a family takes up a ton of time. But it just makes my life easier in a number of aspects. And so, you know, maybe I'm bought in too much because I'm too much of a technologist. But I can't say that I've ever stayed up at night thinking, oh my God, AI is going to take over the world and we're all going to be hunted down by these Terminators or something like that. Yeah. What about, and this is off script, but since you're clearly thinking about this stuff a little bit: in 10 years, do you think we'll have more or fewer software engineers?
Starting point is 01:03:28 I don't see software engineers going away anytime soon. I don't think people's jobs are necessarily at stake today because I can't think of anything where I could punch in an idea. I can't go to GPT today, punch in an idea, and have a fully deployed solution for that idea. It's just not going to work. Probably going to get way too many hallucinations. And two, there's still today, like, you can't just say deploy. And all of a sudden, magically, all of that infrastructure it conjured up is now up and running.
Starting point is 01:04:04 And, you know, pick your flavor of cloud providers. That doesn't work. Will it get there? Probably. But today, in the next decade, no, I don't think so. I think there'll still be plenty of roles here for people that – because AI doesn't solve everything, right? I mean, we've looked at what we're building at Propel, and every time we bring up AI, we look at LLMs, it's not going to increase our market share, right? It's not going to increase our TAM.
Starting point is 01:04:37 Oh, all of a sudden I introduce this thing that like, oh, AI, we're using LLMs to do X. My TAM doesn't increase magically because of that. My existing customers are able to do things possibly more efficiently, faster, in a more repeatable nature without having to know as much. But I don't see my TAM wildly expanding because I've all of a sudden introduced a bunch of functionality there. Yeah, yeah, that's right. You know, when sort of ChatGPT first came out and first started playing with it last January, February,
Starting point is 01:05:12 I was like, wow, this is going to replace me soon. And then the more you play with it, you're like, okay, it's super useful in a lot of ways, like you were talking about earlier, but it still needs a lot of guidance. I think it needs a couple more jumps before it really starts replacing people. I think it's going to open doors for a lot of people and increase that. So it's been fun to see. But the other thing that I'm seeing that's kind of interesting is there's
Starting point is 01:05:38 sort of a class of developers that don't want anything to do with it. Yeah. They're very anti-GPT. They're like, screw that thing, I don't want anything to do with it. And, you know, I've been at this for like 25 years, and there are things I don't want to write anymore. I don't want to remember how to do certain things, right? And it's been great at that. Like, just sitting down and saying, oh, I don't want to write this again. God, I've done this, I mean, how many times?
Starting point is 01:06:06 And like, okay, now I'm not doing it in this language, but I'm doing it in this language over here. Just go have it do it. And obviously I can read it, look at it and be like, well, okay, that's pretty close. That's close enough. Okay, I can take it from here and just write the rest of it. And I think those efficiency gains, at least for me, when it comes to just development, have been really, really cool. I did use it for one thing that I thought was pretty fun the other day, which was how do you mint a JWT? You know, how do you pick your
Starting point is 01:06:38 language of choice and how do you create a JWT? Because that's kind of the first thing a developer has to do when using the Propel API. And we had similar issues, I would say, back at Twilio, because on the Twilio client we used JWTs as well for authentication. And I was like, you know, we should just have code examples in a whole bunch of different languages of how to just mint a token. And so I started simple: did Python, did Ruby. Then I was like, well, let's do JavaScript. Let's do TypeScript.
Starting point is 01:07:10 Well, hey, let's do C#. Let's do Rust. And so next thing you know, it was like, over a weekend, utilizing GPT, telling it what I wanted to do and refining my prompt every time. Some things were kind of getting a little screwy, so I kept refining my prompt.
Starting point is 01:07:30 And I would get to the point where I was like, okay, this is now spitting out working code that would generate this JWT, and it was fully testable as well. So it would also generate all the unit tests and everything else like that. And now all that stuff is available in our open source on GitHub. And a customer can go look at that and
Starting point is 01:07:51 it's like, yeah, this is kind of cool. This saved me a lot of time. And it's like, you could have probably done all those. It would have taken a lot of mental energy, especially for the languages you don't know quite as well. But just being able to just really, really reduce the mental burden to the point where you're checking it and making sure it's all right and tweaking it a little bit rather than doing that pure creation aspect of it for something that's kind of menial. Yeah, that's a great use case.
Starting point is 01:08:17 I love doing that. I expand my applications and add these little things on that would seem kind of annoying or hard to do, and now it just lowers the barrier to doing that sort of thing. Yeah, I mean, there's enough we've got to think about every day, right? And something like that is, like you said, something I could probably sit down and do. But what value am I necessarily going to get out of it? I want this for my customers, and I'd rather be efficient about doing it and, you know, learn on the prompting and kind of go from there. So
Starting point is 01:08:50 that's been pretty cool. And are you using it at all for any of your content or content generation? You know, not a ton, although I did have something recently where the fear of the blank page was just so strong that I went into my IDE where I have Copilot set up, and I put just a little comment at the beginning, like, hey, I'm writing this, and in here, sort of the points I wanted to make. And then I started writing and just had it complete some paragraphs. And often I would change it, but it would at least be like, okay, that's sort of what I want to say, or I'd word it differently or things like that, but it just helps with that.
Starting point is 01:09:25 And then also if you have code examples in there, just being like, here's an example of retrieving a user in this particular thing. And then it does it. And you can tweak it a little bit, but again, just getting some of that stuff out of there. I found that pretty helpful. I might start doing more of that.
Starting point is 01:09:42 Yeah, I do the same. I find that especially with the blank page, getting over that, being able to refine what I have or kind of take different spins on it, it's definitely been super efficient, and yeah, I liked it. Yeah. It's like, I don't love the voice, but it helps keep the ball rolling, and it's like, okay, I would rewrite it this way, but that's the same sort of idea that I wanted to get across. Yeah. I've had it try to do a few intros for my podcast, and it's a bit too bombastic in its language, where you're just
Starting point is 01:10:15 like, okay, I would never speak that way. That's too much. You've gone, like I said, be enthusiastic, but that's above enthusiastic. You've gone too far now. I've got to pull it back. Yeah, I know. It doesn't know how to just do a nice, genuine, you know, I'm excited about this, but I'm not, like, all exclamation points and way over the
Starting point is 01:10:39 top and lots of big adjectives and things like that. Yeah, I don't want the word fantastical used all the time. Yeah, that's great. Well, Tyler, this has been a lot of fun. I've enjoyed having you as a guest and learning more about Propel and what's going on behind the scenes there. As you mentioned, you have your own podcast, so maybe tell folks where they can find you, where they can find the Data Chaos Podcast, Propel, all those sorts of things. Yeah, absolutely.
Starting point is 01:11:07 So the Data Chaos Podcast is on all the major platforms out there. Obviously, using Spotify for Podcasters makes that super simple, so it's distributed across Google, Apple Podcasts, and Spotify, which is kind of the main one. That's been very easy from a distribution standpoint, and I've enjoyed that a lot. Just starting the second season of that. I actually put out my first episode of season two today. Had a repeat guest come back on, so that was a lot of fun. Talked a lot about AI, RAG, and LlamaIndex. So that was pretty neat. And then as far as finding Propel, it's www.propeldata.com. You can also find all the podcast episodes there
Starting point is 01:11:55 as well, and our blog and everything else that goes along with Propel and the platform itself. Cool. Awesome. Well, we will link those in the show notes. And Tyler, thanks for taking the time today. It's been great. Alex, really appreciate it. I've had a lot of fun in this conversation. So, yeah, it's been good times. Thank you. Cool. Thank you.
