The Data Stack Show - 109: How Does Headless Business Intelligence Work? Featuring Artyom Keydunov and Pavel Tiunov of Cube Dev

Episode Date: October 19, 2022

Highlights from this week's conversation include: The context of Headless BI (3:31), What Cube Dev does (9:24), How Headless BI works with other tools (13:03), An analysis of LookML (18:04), User interaction with Cube Dev (23:40), Who manages data artifacts (25:22), Taking care of the developer experience (30:37), Levels of performance (30:37), Artyom and Pavel's background and career journey (35:47), Why you should use Cube Dev (43:38), Roles within a data organization (48:55), and How Cube Dev impacts visualization (53:35).

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. Today, we are going to talk with the people who started Cube, which is a super interesting tool in the analytics space. Costas, they claim that they're headless BI, which is super interesting.
Starting point is 00:00:43 We may have had the concept of headless come up, but I don't know if we've had a company come on who just explicitly calls themselves headless, which is really interesting. And that being in the BI space is really interesting. So my burning question is actually just around the topic of headless itself. Like a lot of things we discussed on the show, you know, like say CDC or other topics, like the technology or the concept itself isn't brand new, but the way that it's being implemented is actually pretty new. And I think Hube is doing that in the BI space. And so I just want to ask them about like headless as sort of a concept and how that plays itself out in various forms of technology. And of course, specifically what that means for BI. But they talk about data
Starting point is 00:01:41 modeling and all that sort of stuff. So I know you have some burning questions too. Yeah. First of all, whenever I say the term headless, it has nothing to do with Cube. For some reason, I always expect to start watching the Tim Burton movie or something. I don't buy it. Of course. Headless. Of course, headless horse.
Starting point is 00:02:09 Tim Burton movies. We'll have Tim Burton on the show and we can talk about this. We should do that. Absolutely. Yeah, I'm pretty excited. I think what we are seeing here is some very interesting short-range and lean patterns. They are getting employed on a more architectural level. seen here, like, some very interesting sort of ranging link patterns that they're getting, like, employed, like, on a more architectural level, and I think that's
Starting point is 00:02:31 what also, like, happening here with the headless behind, like, how you pick up all things so you can have, like, more control. So yeah, I'd love, like, to hear, like, how they do it, why they do it, and how it is used. And most importantly, what are the challenges? Great. Well, let's dive in and talk. RDM and Pavel, welcome to the Data Stack Show. We are so excited to talk with you about both analytics and Headless BI in particular, but also your story. So thanks for giving us some time on the Datasack show.
Starting point is 00:03:13 Thank you. Yeah, really excited to be here today to chat about data and the Headless BI. Thank you. Yeah. Hi, everyone. Okay. I'd actually like to start by sort of setting the table around the context of headless. So headless is, you know, as sort of a conceptual, like, data flow isn't actually new, right? It's sort of one of those things that became popular in, like, various forms and sort of gained the term headless in contrast to existing technology. And I'm not an expert. This is just sort of my perception. And I think one of the first big public-facing marketing pushes that we saw for headless was the CMS, right? Content Management System, right? for headless was the cms right content management system right and so people are used to dealing
Starting point is 00:04:05 with like enterprise like brutal enterprise content management management systems like you know adobe experience manager or you know at scale like really gnarly wordpress you know stuff and so like you know which not only from like a data and user standpoint was tough, but like from a performance standpoint, it was really bad. Right. So headless comes out and solves a bunch of these problems. Can you talk to us and especially speak to the listeners who I'm guessing a lot of them know what headless is, but like when we say, when you say BI, like you don't think about
Starting point is 00:04:43 headless as a concept, you know, sort of like a first order concept. So could you just help sort of go back and explain what is the concept of headless and maybe even break down the technology flow? And that I think will help us understand like why it's important for BI. Yeah. Yeah, definitely. I think the headless CMS actually is a great example too. And overall, I think the idea of the headless lies in a decoupling
Starting point is 00:05:16 of the visualization plane and the data plane or like a sort of a control plane. And that's not, like the concept itself is very old. And you know, you can, you can find roots of this concept in a software engineering and, you know, like a patterns and software engineering and all of this, right. When we have models, we have controllers, we have use, right.
Starting point is 00:05:38 We try to decouple logic and decouple, you know, like responsibilities of different pieces of code, how they operate in a software. And I think when we roll up that idea on a higher level, like on the application level, we're starting to talk about, like, can we apply the headless idea, you know, like a decoupling idea to the application layer lane. And one example is the headless CMS, where we decouple visualization layer, basically, from the data layer, right? So with the headless CMS, you can have a database of your content. And when you create this database, you don't think about how that's going to be presented,
Starting point is 00:06:18 because the presentation layer is a different one. Front-end development teams, they can use Gatsby, they can use Next.js, all of this like a folder of Jamf's technology to present data however you want, right? So you do not control as a data, as a content manager, you do not control the presentation layer and they communicate with each other over the API. I think with headless BI, the same idea is, is it possible to decouple the data model layer with visualization layer? The good example that many people talk about is the looker, because they have such good data modeling layer, like a lookML. And many times when we talk about the headless BI concept and idea, Many people, they build an example around lookers. Like, what if I can have a looker with a look ML,
Starting point is 00:07:07 but without the visualization coming from a looker, right? Like, what if I can still use look ML, but then use any visualization, any other dashboard tool on top of my look ML model? So the idea is here, what if we take the BI, we unbundle the BI into the data model layer? That's one piece with caching, access control, all of other layers that kind of naturally belongs into data model, the data plane, and then we'd have a
Starting point is 00:07:35 separate visualization layer decoupled from the data model. And I know Pavel, you talk and think a lot about that headless concept too. So maybe if you have any thoughts. Pavel Tchernovskyi, Yeah, I guess all that's right. And I guess the whole idea around like this headless part is basically whenever you're building something within BI, you're actually locked in with this BI because you're building logic around your data. And when you're building security controls, when you're building role security,
Starting point is 00:08:12 and when you design who can see this data and who cannot, and do advanced calculations on top of it, you're actually building logic inside visualization where, which is basically when they're looking to many users. And it's why we actually started being asked to, can we just query kubectl? And it was at first very like crazy idea but over time we realized it's actually made sense to have this like feature could you dig into that a little more so like when you say you started getting and actually maybe we should step back a little bit and can you describe what cube does we sort of started
Starting point is 00:09:08 off with like the baseline concept of headless but can you describe what tube does and why a user might have wanted or why users started asking to query t with sql yeah i guess i can i can start there oh i'll feel free to follow up. What we usually talk about in Q is that we talk about four layers of features that Qube has, and I mentioned LookML already, right? So imagine LookML as a data modeling layer that's done in Colt, right? That's a foundation of the Q. So every other feature is pretty much downstream of the data model. So the first thing that cube has is a data model.
Starting point is 00:09:48 So you put your cube on top of your data warehouse, and you start building the data model in cube. What are your data sets? What are the measures? What are the dimensions? How data sets relate to each other? So you're building sort of a data model similar to LookML data model. Then inside the data model, you can define how you want to cache data, how you want to refresh that cache, what is the access control rules
Starting point is 00:10:14 or security controls, right? Who can access the data and how the data can be accessed. And then finally, Kube provides a set of APIs. We provide a REST API, GraphQL API, and a SQL API. This way, the data can be the data model, and the data source of data model can be accessed from all these downstream tools. The very sweet spot for the Kube is a custom deal,
Starting point is 00:10:39 what's sometimes called data apps, or sometimes called embedded analytics, right? You're building a software product, you're having the dashboards, you're having some reports features built in your product, and you want to expose that features to your customers. And most likely you're going to build with something like React, you're going to use maybe ChartGF, Charting Library, DreamJS. You want to build a really custom native experience,
Starting point is 00:11:05 but the data play, like data API, that Kube can solve that. the charting library, D3JS, you want to build a really custom native experience. But the data play, like data API, that Kube can solve that. In that case, the users will query the Kube through the REST API, through the GraphQL API, get data, and then display it to the customers, and Kube will provide the data model, caching, security, all of it. And then what Pavel just mentioned, the SQL API, that's where many of our community members, they started to want to use different BI visualization or dashboarding tools, maybe like Metabase or Apache SuperSat or Hex, just to connect from all of these tools to Kube and to be able to consume data, not
Starting point is 00:11:44 directly from the warehouse, but consume data models tools to Kube and to be able to consume data, not directly from the warehouse, but consume data models built in Kube. Fascinating. Okay. I have a ton more questions, but Kostas, there's so much in there that I know is going to be interesting to you. And I'm interested to know what questions you're going to ask. So I'm going to pass the mic over.
Starting point is 00:12:03 Sure. going to ask, so I'm going to pass the mic over. Okay, sure. Let me start with a question on what you were just talking about. So can you explain a little bit in more detail how a simplest BI works together with the rest of, let's say, the common tools that we have in data stack out there, right? Like we still need some kind of job activation and jQuery has to do that. And then we also have a query engine somewhere where like data gets queried and all these things. So we have like quite a few new parts there.
Starting point is 00:12:45 So how do we orchestrate all this together? How do we work with all this together? And where should we start from when we build a new site? Right. Yeah. I mean, that's a great question. So, and I think that obviously what I believe in and probably believe in more like a data warehouse sort of data lake centric architecture, so what we see
Starting point is 00:13:15 with either on the data storage and the compute side, right, we usually see either data warehouse or sort of something like a data briggs or Trino, you know, like something more like a query engine, like a data lake architecture, right? So it's usually one of those two. And obviously there are like a lot of magic, like rather stack getting events into this, right, like NTL things, like loading data from different places. But in terms of, you know, like a compute and storage, that's what we see usually with a stack.
Starting point is 00:13:45 And then it would depend on the use case. So if we're talking about consuming data internally through the dashboarding or reporting tools, right? The use case usually would be that maybe different teams in an organization wants to consume data differently. Like we have a Tableau for one team, we have maybe Jupyter notebooks for the second team and a database for the third, right? So in that case, the stack would look like that. They would use Kube on top of the Databricks, right? Something like Snowflake.
Starting point is 00:14:20 They will build data model in Kube. They also can transform data with tools like a dbt and stuff like that. We see that's very, very common, right? So they can, a data engineering team can transform data with dbt upstream, and then they put a Qube on top of this. Qube will create this, the meta layer or semantic layer, right? Like sometimes, sometimes it's called just data model. So we usually, we usually use term data model
Starting point is 00:14:45 at Kube. So data engineers will build the data model layer in Kube, and then they will expose that to Tableau, they will expose that to the Apache Super Set and Jupyter Notebook. And then all the teams, they will just consume data through these tools, but they would not consume data
Starting point is 00:15:01 directly from the Snowflake, right? But they would consume data through the cube. And all the metrics, all the data definition, everything would be defined in cube. And security controls would be defined in cube as well. Who can access what kind of data, right? If you wanted to mask some of the fields or provide role-level security.
Starting point is 00:15:19 And finally, caching would be done in cube as well. So just to make sure that the data is cached on the same, you know, like with the same rules for every downstream tool. So that would be use architecture for internal use case. When we're talking about exposing data to the customers, right? In many cases, it's actually the same data set and same data model definition. So what would happen is the developer team would build on top of the API from
Starting point is 00:15:48 Kube some sort of, you know, like React application with charting libraries, and they would use this as a part of the front-end application serving to customers, right? In that case, it's going to be like embedded analytics, right, or data app. That's go through the Kube. So from React, it will be like REST API or GraphQL API call to the cube. Cube will do data modeling, processing, queries, a snowflake, right? Get data back and then sends to the front end.
Starting point is 00:16:16 But the idea is just like, regardless of the data, downstream data application, right? They all go through the cube. So it sort of centralize the data model. And is Kube, let's say, a gateway that is used only for BI or, let's say, the rest of the interactions with the data warehouse will also happen, like, through Kube, let's say, I'm a data engineer and I'm mainly building ETL pipelines, right? Is this something that I still should do through Qube or is like something that I
Starting point is 00:16:52 bypass completely, like I'm not even aware of Qube being there? Yeah, I think in that case, it happens more upstream from Qube. I would think about Qube being more like a downstream, so ETL pipeline, data collection, transformation itself that happens upstream from Kube, and then Kube usually works with data, either raw data or data already transformed by tools like DPG. You did. Okay.
Starting point is 00:17:20 And you mentioned, you mentioned Looker and LookML. So based on your experience, I mean, what are, let's talk a little bit about Looker, first of all, and LookML, and I want to ask that because I think that the introduction of LookML was like a very interesting thing that happens, the market. And it's been around like for a while. So I'd like to share from you, like in your experience, like what do you think about LookML, the things that they did right and a couple of things that you think they did wrong? Yeah, I can take this one. Yeah, I think like Looker and LookML is definitely a unique piece of technology.
Starting point is 00:18:13 However, if you try to remember what was before, you can find out like different BI tools, which basically have pretty the same approach as a LookML and basically introduction of data model in a declarative way, where you define your measures, dimensions, and then basically a web model and basically a WebCube. In a sense, what Looker introduced well and what it introduced like at first, it was a so-called true-lab based model, relational lab, because it, coincidence or not, Looker started when data warehouses started to be really responsive. So you can do live querying. And if you remember before that, it's actually most of the BI tools were only working on downloading a copy of the data.
Starting point is 00:19:22 And Looker, one of the first tools who was working on a live query mode, and right now it is standard for all BI tools. So this thing was done really right. And another thing, they just introduced this data modeling in a new level, which allowed you to define all the relations between all of your, basically, tables.
Starting point is 00:19:51 But in fact, it's more like cubes or in Looker, it's so-called views. And also relations between them are called explore, which is where you can define those like joints right so and this is was done done
Starting point is 00:20:10 video right and it it felt like basically it's really
Starting point is 00:20:16 great demand for it at that time from what we already discussed
Starting point is 00:20:22 and what looker can do better is basically visualization part, which is in fact not really so great. And if you compare it with Tableau, it's very limiting.
Starting point is 00:20:36 Yeah. But in fact, the modeling layer itself is actually very powerful. And if you separate this as a separate product, it makes sense to have it in general. Stas Miliuszak Woldanewski So how model it happens in Q?
Starting point is 00:20:58 And it's like, do you have a landmarks defined similarly to look at mail? Or you do it have a language defined, like, similarly to Looker mail? Or like, you do it in a different way? Gabor Karinawitschina- Yeah, I mean, we have a concept called cube that's how you get a name. So the cube is similar to cube in Looker. So it's basically the reference to the table in your data warehouse. In a more simple way, it just like selects start from specific table, right? It's just a reference to some physical table in your database.
Starting point is 00:21:37 It can be a little bit more complicated. It's like you write a select statement, right? And then it's a table by the end of the day, but it's just like a more like a dynamic table, right? That we define and apply. But every cube is backed by the table. Once you have a cube and a table, you define a measure. Measure is basically the reference to the columns,
Starting point is 00:21:57 but you apply aggregations to these references, right? Like the classic example would be, you have like a product, you know, like a mount column, right, you apply sum and then the total amount of the products, right, that's an aggregation, that's a measures, they always have aggregation. And then you have dimensions. Dimensions usually just mapping to the columns, right, as a properties of a single row, very common as the same as lookML does. And then in Kube, you also can define relationship between Kubes.
Starting point is 00:22:29 So if you have a Kube orders, you may have Kube users, right? And then you can define that users, they can have many orders, right? And orders, they always belong to the user and you can define this relationship. That's useful when you query that, when you want to get analysis, right? You know, like of orders by users, age or by users, country, right? A Kube already knows about this relationship. Kube knows the data graph and then can construct the correct SQL. We didn't have a concept of explore, the same as a looker, but we just define the relationship
Starting point is 00:23:06 of the looker and the cubes on a different level. I think we are thinking it more... there is a new interesting project from a looker called Malloy. I think we are looking at more like in direction of this, like more like closer to Malloy from the joints perspective rather than the look of it, but it's all, you know, like details, but overall it's very the same concept of making the data modeling with cubes, measures, and dimensions. Stas Miliuszak Woldekewicz It makes sense.
Starting point is 00:23:37 And how is the user interacting with that? Is it like a markup language that is used or is it like a user interface and it's more like, let's say, like UI driven, like how is the experience that the user has like with the product? Yeah. And we are, we are indeed both engineers, so we did it in code. And I, I also believe that that was one of the greatest innovation of Looker since we mentioned it. They put data modeling in a code.
Starting point is 00:24:10 The data modeling being in BI for a while, right? But it's usually been done through the user interfaces, you know, like a drop and drop. And I believe that should be done in code. I believe we should apply best software engineering practices to the data management. And it all starts with putting that in a code, putting that code under version control, so we're doing that in code too. For our case right now, we're doing more like a JavaScript, JSON-based data modeling. However, we got a lot of requests from the community to support YAML as well,
Starting point is 00:24:41 you know, like, which is, Looker is doing YAML-based, right? So I think we'll support YAML as well, you know, like, which is, with Looker is doing YAML-based, right? So I think we'll support YAML soon too, but right now it's like a JavaScript, JSON-based. Stas Mislavskyi. And okay. Based on what you're saying, like we are talking about like some pretty technical artifacts that have to be managed, right?
Starting point is 00:25:02 Who is the owner of these artifacts? Is it like an analyst? Is it an analyst? Is it a data engineer? Who is, let's say, the person inside the command that has ownership and is responsible for managing these artifacts that come from Q? Yeah, let me take this one. This is a great question. And basically, data ownership is, I guess, one of the emerging areas at Kube.
Starting point is 00:25:29 So in fact, as you may imagine part of your data model is defined. So, in fact, what you usually see with the teams, in fact, teams can apply the very same forces they apply to code. For example, you can have multiple directors and multiple departments, data engineering departments can have different
Starting point is 00:26:09 tools on merging PRs to those directors and different policies. You can set up those policies at GitHub using code on your files, for example. But you can use even more advanced workflows if you have different enterprise tools for your
Starting point is 00:26:28 basically source code management. But the idea is quite simple. This data modeling clear, in fact, will tap governance from different data engineering groups. So what we are also working on is metrics, like our data catalog. So in fact, it's basically the catalog feature will allow you to see who is owner of different pieces, like different pieces of your data model. Yeah, so, but it's still in progress. Yeah.
Starting point is 00:27:09 All right. So we have a model that is represented in a JSON construct that's near monitoring. And I would assume that like what happens is that this piece of JSON gets translated into SQL and then executed on the data warehouse, and the results of that are written back to the user, right? Right. What is the... like, in your experience, like, so far, because there's some code generation, like, that is involved there, and the reason I must, because of my experience with Lookout, because things
Starting point is 00:27:46 kind of like re-carried at some point. It's really hard to go and do any kind of debugging on the data warehouse, right? When you have so much auto-generated SQL. So how do you take care of the developer experience around that when it comes to debugging and being able to understand at the end what my tool is doing, right? You actually spot on that. That's a big problem. And it's interesting that we realized that too, because Kube started as an open source project, right?
Starting point is 00:28:27 And we had a community, we have a community now, and we started to get a lot of questions about how to debug, how to understand what's an opportunity for us to build a commercial product because we were like, when we were back then, we didn't have a commercial product and we were thinking about what we would build, how our commercial product would look like, because it's, you know, like, it's always a question for, you know, like for open source companies, right? Like how you go from open source product to the commercial product, what features you should have, right? That's every meeting when open source product to the commercial product, what features you should have, right? That's every mid-season when we were raising the ground asking us like,
Starting point is 00:29:12 you know, like what features are going to build in a commercial product? And back then we did it all friendly. Like we all will figure it out. And I think there's like a deep Bible trace and developer experience. That was the biggest one. So we actually rebuilt a good set of tools to help, you know, like navigate queries, how they've been executed, cross-reference them, to go to data model, to look what's happened. Did it cache?
Starting point is 00:29:30 Did it get, you know, source database? How did, what was the queue status? All of that, you know, like things. Frankly, it's not easy answer. It's like build this specific tools. It's usually built a bunch of tools that can develop first, the data engineers can use, you know, to debug this issue.
Starting point is 00:29:47 And we're still building that. But that's, luckily, that's been a really good, you know, like sort of differentiator for commercial offering. That's awesome. And did you, like, what's your, let's say,
Starting point is 00:30:01 interaction with the data warehouse itself and some of the ergonomics of the data warehouse? Typically it has views, defining views, materializing views, which is something that sounds important when you want to work with the performance. So what are you doing there? And how easy it is to use that stuff from the data warehouse? And those are all great questions.
Starting point is 00:30:36 So in terms of performance, we have really multiple layers to achieve the result. The very first layer is basically this one lies pretty much outside of the cube. We just discussed it in the beginning. So you can transform data before the cube. And people usually
Starting point is 00:30:56 would use dbt or else or any other ELT transforming tools just to get the data in shape before getting to to cube. And at this point there will be a views, materialized views created and just in a cube view just write select from these views. But this is just the first step. But when it boils down to joins
Starting point is 00:31:28 and many relationships, it's where it can get really complicated. And at this point, we provide two-level caching systems. First one is in-memory. It's enabled by default. So it's basically every query result which you're going to queue and back will be cached.
Starting point is 00:31:52 And there are rules based on so-called refresh key. It's either time-based or SQL-based that can tell you for how long do you want to cache those results. So this is one layer. This layer is basically aggregate awareness. In Qubits called pre-aggregations. So you can pre-aggregate the data which lies in the warehouse.
Starting point is 00:32:17 For example, there is a complex join and you have materialized use in your snowflake, but in fact, you're joining them. And there are various compositions of the joins. So you can join this in a data warehouse, and persist as a rollout in our cache, in kubecache. Kubecache in Claro is called kubestore and it's basically really highly scalable cache. It's designed to store like billions of rows of raw data. It's not designed to store raw data. So there is a single purpose for KubeStore is download pre-built
Starting point is 00:33:05 rollup and serve it effectively. So in designing to store pretty sizable rollups in size, it can be like hundreds of gigabytes. And so in basically queried on a scale, so the latency time is low. So we are aiming at second response times. Okay. That's super interesting. And this casting is hosted by Kube.
Starting point is 00:33:35 Like how does the user like interact with the cast? Like something that like lives on your own, like on your systems? Like if someone wants to have something on-prem, let's say like, how does this work? Yeah, it's a great question. So the cache itself is an open source. So those parts are open source. So what we offer in a KubeCloud is basically an enterprise runtime.
Starting point is 00:34:04 So for these technologies, which is, for example, for KubeStore, it's available on demand in very same way as you access BigQuery. So you don't host BigQuery, right? And you are using
Starting point is 00:34:19 in a serverless way. And this is very same for KubeCloud. But essentially, technology is the same. The implementation reference implementation is the same, but the runtime is different for kube cloud. Okay, super cool. All right, guys. I asked like, quite accurately, let's say one problem in technical questions. And I want to ask something before I give the microphone back to you, where it has to do with your journey. Like I see you like all this time and I hear from like two deeply
Starting point is 00:34:58 knowledgeable people about what the market needs and what the technology also needs to have. Right. And it takes time to put that, right? I mean, I'm not going to argue like how smart you are, but it doesn't matter. I mean, like experience, experience and pull will take time, right? Like we cannot fight like our time dimension, unfortunately. So that was a little bit about like your experience, how you ended up like starting to like two years ago, if I remember correctly, based on like LinkedIn and what you've done before and how these things are linked together.
Starting point is 00:35:40 Right? Yeah. I would love, love to share our journey. So actually Pavel and I, we met when we started to work on a company called Statsbot. And that was a company that was before Cube. And the idea for that company was to build a Slack application. It started all as my hobby project. It was 2016, so as Slack, or even 2015, so Slack was, you know, like growing really fast, mobile companies started to use it. And we started to use Slack at my old company. And I was running an engineering company called Headnumbers, where like doing educational tech and rebuilding software for schools, for kids in schools, mostly on the East Coast. And we were using Slack a lot.
Starting point is 00:36:33 And what I wanted to build, I wanted to draw the Slack applications that can get data from different places, from different systems. I think we used New Relic and some Spanel, maybe Google Analytics. I mean, we had a bunch of Postgres databases. So what I wanted to do, I wanted to get data from all of these places and just put it on Slack or something, you know, like a control plane and monitoring QI. So I built some sort of a Slack bot, Slack integration. I put it out on Reddit after I realized that it was very helpful for my team. So I put it out on Reddit, Patreon, all of these places so people can start using it. And people started to use it.
Starting point is 00:37:12 That was fun. And then Slack reached out. And they were saying that they wanted to launch a Slack application directory. And they know that StatB bot was one of the bots that had been already used by different teams, so they want to put it, you know, like on a, on as an application directory. So that was great. And I was like, yes, let's do that.
Starting point is 00:37:34 But that came a lot of traffic, like a lot of people started to use it. And Power was one of them, you know, like she started to use it. Then she texted me, we kind of chatted and then she joined to help me because I was really like, I had a Ruby on Rails application and that was not scaling well. And it's like, I really need, I really need as many hands as you know, like, as I can get to, to, to be able to scale that. So Pavel joined, he needed a lot of magic and it started to actually be stable. And then VCs started to reach out and they were like, would you like to raise money for that project?
Starting point is 00:38:18 And we didn't consider that at first because we thought that would be a hobby project, like a side project that could, you know, like make some money. But then we thought, like, let's give it a shot. And I was in LA back then. I moved up to San Francisco. We, we went, we went to the 500 startups accelerator. So that was, that was a lot of fun. That was my first exposure to really, you know, like the startup ecosystem. And that was, that was like a lot of, you know, like accelerator, you know, like sort of, you know, like style, hustle, and all of that was really, really fun. And we, we continued to work on StatsBot after the accelerator, but over time, you know, like we were like building more and more technology that eventually became Qube because we needed a technology that would help us to get data from different places and to build some sort of a data model layer.
Starting point is 00:39:05 And then we would apply semantics with the natural language so people would be able to query that through the Slack. We added support to Teams. We even started to build support for voice apps because we thought that maybe voice could be a big thing. So the idea was for us how we can build an engine that can potentially support main interfaces, right? Exactly the headless BI use case.
Starting point is 00:39:27 But over time, we realized engine was more valuable than product that we were building around it. Some of the companies started to work without digging deeper to understand how they can use the cube as an engine to power some of their internals projects or some of their external customer facing applications. So we started to dig deeper. And eventually that led us to the idea that let's give it a shot. Let's open source the engine and let's see. Maybe people would find it useful on its own.
Starting point is 00:40:07 So we open sourced it and it clicked. So people started to use it. And when we saw that and we just decided, let's just kind of pivot from stats about to cube because it seems it's just a bigger, bigger thing and frankly, just more fun to work on. I mean, we are all engineers, right? And it's like building more engineering thing. It just like sounds simple.
Starting point is 00:40:27 It's just more interesting. Right. So then like that Slack application was fine, but I mean, you can't compare it to the, you know, like the data engine. Right. So yeah, that that's the story. And, and then, you know, like we've been doing all the open source for a couple of years and then only recently know, like we've been doing all the open source for a couple of years.
Starting point is 00:40:45 And then only recently we released cube cloud, our commercial product, which is super, super early still. Okay. That's awesome guys. Like I hope people like we get inspired by the journey. I think it's very important to hear like when you build stuff, there's always a genuine role and the main event takes time. So like there's no, like, you know, the fairy tale of like waiting out for a new forming and coming up with an idea and suddenly the next day, like you change the work, like it doesn't work like that. So I think it's, it's important for people like to hear this kind of experience. It's like, right.
Starting point is 00:41:22 And you think you're just focusing on technology, but also like we, you know, like this monastic, like the journey, like the growth and like everything that is involved for sorts of like, extremely, extremely valuable for people like to, to share. Yeah. So thank you for sharing that. Yeah, sure. All right.
Starting point is 00:41:41 All yours. How exciting. No, congratulations. Yeah, that is a super exciting journey. One thing I'd like to do, so you did a great job talking about LookML and some of the things that they did right and some of the things that they did wrong.
Starting point is 00:41:59 I actually want to zoom out a little bit on the BI space and almost play devil's advocate to cube. If you would, you know, if you would entertain me. So let's say that I'm building like a modern BI function, right? So like I have a data engineer and they're sort of doing all the stuff that's upstream from cube, you know, and my warehouse is in a pretty big place. And at this point, I have a huge number of options to choose from out in the marketplace, right? Like, you know, people who have been around for a long time,
Starting point is 00:42:35 you know, may sort of choose to run their, like, Tableau playbook. There's Sigma. There's, you know, sort of the Metabase, you know, sort ofase open source visualization pieces. There's metrics layers tools that feed directly into it. You could use tools like Hex. What is the motivation for me to use Qt?
Starting point is 00:43:07 And specifically, maybe like, do you see, and I guess that's also like in a sort of self-service context, and I know you've said like you do a lot of editing analytics as well, you know, which there are a lot of tools that, you know, sort of drive that drive that piece as well. But like, what is the motivation for someone who's out there browsing the analytics space to adopt the Q methodology? Yeah, this is a really great question. And we were asking ourselves at the beginning, really why people would use it. And it's actually, we're starting to build it, give it an
Starting point is 00:43:47 advance of people and see what the answer is. And the answer is pretty interesting. So there are multiple use cases here. So first of all, you just told us like a whole bunch of tools. You need to choose the vendor and commit to it before it's right. And even if you didn't work with it a lot, you're going to do a lot of investment beforehand. So the thing is, people actually want to avoid when they're looking.
Starting point is 00:44:42 And the use case here, as I already mentioned, they want to avoid defining security at BI level, which is very costly to redefine in any other BI tool if they're not sure if they're going with a Tableau or QuickSight or they want a Metabase. Or in the case they have multiple BI tools, it's even cost-perper-activity to have really all the security controls set up in every BI tool consistently. There is no way you can do it like an organization which uses multiple data tools and BI tools. So security controls is one of the most significant parts here for internal data consumers.
Starting point is 00:45:27 Another part, which is you already mentioned the embedded analytics part. And so as we started with this use case, it's actually a lot of people at first deployed Kube to their customers as customer facing analytics, and they started to ask, can we use the same for our internal app, just not to redefine the data model? So, and that's when the SQL API thing appeared like in the first place. So, but what we are right now, what embedded use cases, it goes even further. It's basically people starting to ask, can we provide this SQL API to our customers?
Starting point is 00:46:14 So the customers want to use their data tools, but they can't connect to data warehouse directly because for security reasons. And the data is not theirs there, but also other customers and there is no really great security controls in place to limit access of different terms to the data. But you perfectly solved this problem as well. Yeah. So that's why. Stan Mallow- Kata, super interesting. And I'd love to know, could you dig into a little bit more around like the methodologies
Starting point is 00:46:51 sort of that people who migrate to Qt for embedded analytics, you know, serving analytics in their application? Like, what are a lot of the sort of methodologies or technologies that you see companies using before they migrate to Qt for embedded analytics? I think we see a lot of what we call integrated BI's. So, you know, like at SideSense,
Starting point is 00:47:16 good data, power BI, they all have embedded offerings, right? The problem with that is usually because they are coupled with the data modeling, they are coupled with the rest of the BI stack. They are very, very inflexible. So it's really hard to customize.
Starting point is 00:47:35 It's really hard to make them look and behave the way you want. So you have a lot of restrictions on what you can do, what you cannot do. And they're always not very fast for exactly the same reason, because the coupling, their processing engine with a visualization and you started to have like 20 charts on the page, and then it started to be extremely slow. So it's clunky, not flexible, not customizable solution. And that's a very common sentiment we're hearing from a company that wanted to migrate to something more fabulous.
Starting point is 00:48:12 I guess, we started this whole conversation with Padlet, who is an example of Padlet CMS, right? For example, imagine moving from Adobe CMS or a bunch of customized WordPress to something more like the modern stack, right? Like where you have Contentful and then you have a bunch of MagJets, Gatsby, BTS applications. So that's pretty much the same story here. Got it. Makes total sense. Yeah, that's super interesting. One thing I'd like to do in our last little bit of time here is talk through your views on roles within a data organization or an analytics organization. So, you know,
Starting point is 00:48:56 in the last couple of years, this concept of like an analytics engineer has been popularized, you know, I think by DBT and other, you know, vendors in the space, you know, it's just sort of like a hybrid role between like an analyst and a data engineer. You know, you have sort of pure analysts, right? You have, you know, data engineers who do some analysis or like feed some analysis. Within the context of Kube, how do you see teams using Kube? And then what do you view as sort of the ideal role of analysts within the data organization? Yeah, I think that's a really good question. That's the unfortunately not hard yet answer. it's, it's all, you know, like, it's always not a point of time, right? It's always like evolution at some sort of vectors that, you know, like
Starting point is 00:49:52 as technology changes, the behavior changes, you know, like the roles are changing and I think that analytics engineering is a great idea, frankly, you know, it's a great to have, you know, like to start to see that role. And I think the main, one of the main drivers here is an idea of applying best practices of software engineering to the data, right? So that movement and that, you know, like created this role basically. And I think we're still trying to understand, you know, like how that should fit in the industry into the knowledge landscape.
Starting point is 00:50:25 I think for QoF, we mostly see data engineers, analytics engineers, or application developers being the main like owners of the project, like those who are building to know like data models and then on a more like a consumption layers, right, like internally, we usually consider, like, our users into two groups, like more like builders, developers,
Starting point is 00:50:49 and then consumers. Right. So builders, developers, and really doing engineers, analytics engineers, so like application
Starting point is 00:50:55 developers, are based on use case and, you know, like type of the company. And then, well,
Starting point is 00:51:02 consumers could be front-end developers that actually, you know, like if you're building embedded analytics, you're mostly going to consume, you know, well, consumers could be front-end developers that actually, you know, like if you're building embedded analytics, you're mostly going to consume, you know, like as a front-end developer, you're going to use Kube's API, or if you're using more like the BI dashboard in Fisopad, so that could be analytics. So for us, analysts would be more like consumers rather than owners of the cube.
Starting point is 00:51:28 But it would also depend on the structure and the culture of the organization. Because in many cases, a main organization would try to build a model where analytics engineers and data engineers would build all the models and sometimes would build the dashboards too. And then we'll just give dashboards to the business units. Right.
Starting point is 00:51:49 And then ideally the business units would be able to look at the dashboards on their own. So sometimes there is really no like an analyst role in some organization, but if there is, if there are any analyst role, it's more like a consumer for the cube, I would say, rather than those who build the data model. But again, every organization is different. So sometimes we see data analysts building models as well. Got it. Super interesting. Yeah, the space is changing in so many ways. And with all the new technologies, you see different roles and different tools and all that sort of stuff.
Starting point is 00:52:25 I'd love to ask a question about visualization, you know, because if you think about one of the core value propositions of Qube, you're decoupling, you know, sort of the data layers from the visualization layer. Have you found that that actually helps drive better visualization? Or how has that impacted visualization?
Starting point is 00:52:46 I mean, visualization is one of those interesting components of analytics where you can stop visualization with good data or bad data, right? Like, good data makes visualization easier, but you can still screw it up, right? Like, good data is not a guarantee for good visualization. But it seems like the approach of taking a cube
Starting point is 00:53:12 where you're sort of modeling the data and making it available via API can have an interesting influence on the way that people do visualization, no matter what tool they're in. Have you seen that to be true, or how are you seeing cube impacted impact, you know, sort of the actual materialization or visualization in dashboards?
Starting point is 00:53:31 Yeah, this one is a really great question. And in terms of visualization, there are actually multiple layers to it. So first one, if you're talking about visualization, which is for embedded analytics, and it was the very first value proposition of Q, where you can build
Starting point is 00:53:58 very custom-made and custom-tailored visualizations. So basically, you can build a very product-native experience for your users, which can be embedded in your application and those users can even do not realize it's something like behind the scenes, like such a cube, which is providing this data and there will be no way to distinguish it from your product. As opposed to most of embedded BI solutions where you just embed an iframe and that's it.
Starting point is 00:54:37 And you basically in your product you're looking at the iframe dashboard, which is looking quite a bit different and it doesn't feel native. So, and then it was very first value proposition of Kube and why it was successful in the very beginning. But then we started to realize it's actually as good as it is to provide this custom tailored visualization where it's also time-consuming for people to build those layers because those require front-end engineers, which is not cheap. And in fact, people wanted to use like actually data engineers to build like dashboards and data analysts at the first step. And that's why also SQL API started to pick up in an embedded space. So people just want to start with simple visualization, like embedded
Starting point is 00:55:37 iframe based on superset or Metabase as a first version to just validate the product and not to spend front-end team resources. But data model itself is not so hard to build. So they're building it in any way. And in the point they realized they want really custom charts for some of the product, it's just replacing them one by one.
Starting point is 00:56:04 It would happen if they feel of a part of their product. It's just replacing them one by one. It's what happens if they feel they need a different level of quality for their product in these places. Henry Suryawirawan, Phelps University, Pavel, thank you so much for that answer. You know, it's funny that you mentioned iframes. I think about like some of the worst web experiences I've had. And iframes are are a serious offender. Although they do make a lot of sense as an MVP for embedded analytics.
Starting point is 00:56:35 But yeah, it's super interesting to hear that that's sort of the flow of actually implementing embedded analytics, which is super interesting. And super interesting that they're starting really simple, even with like the SQL query stuff, right? That way a data engineer can produce like really amazing stuff. So it's super interesting.
Starting point is 00:56:55 Well, I think we are at the buzzer here. But guys, this has been a really incredible conversation. I've learned a ton. And amazing to see what keeps happening in the analytics space and the stuff that companies like you are building. So thanks for telling us about that today. Definitely. Yeah, thank you for having us.
Starting point is 00:57:16 That was a really, really great conversation. Thank you for all the questions. They were great. Costas, as always, that was a fascinating conversation mainly because of me having to move to another room and you hearing the cathedral-esque echo in my microphone.
Starting point is 00:57:36 That was no extra charge. Yeah. It felt really good, to be honest. It's like this feeling of peace that you get when you're inside the church. Yeah, it's a natural repose. So you're welcome to that. No, I think my biggest takeaway...
Starting point is 00:57:54 I think my biggest takeaway, actually, one thing that was really interesting was the core value prop of Kube is embedded analytics, which makes total sense based on the discussion we had around headless but it's fascinating to me that they see this current of adoption around people using it for their own bi right which is which is just super interesting. And I think in many ways, I think that that is possibly reflective of a product that's really hit a nerve with sort of a workflow or a paradigm for how to manage certain things, right? Where users will actually pull additional use cases out of the product, you know, without the company even really thinking about it. So that was really cool. That was my big takeaway.
Starting point is 00:58:46 Yeah. Yeah. What I would take from the conversation that we talked about is how products change because of the impact that engineering has on how we do things. And that's not to do with too many things that are happening here. One is, first of all, we have more engineering roles out there. Pretty much, you start seeing more and more companies having engineers working in there where you wouldn't probably expect a couple of years ago. The other is that as software engineering is maturing and what
Starting point is 00:59:42 patterns and lessons learned from applying and trying to build products and technologies, some of these patterns are more universal and we can use them to also do other things, even if an engineer is not necessarily evolved. And that's what we see here with this decoupling of the visualization part with the modeling of the IER and then the start of the new part. So yeah, that's very fascinating. It's like a sign of progress, right?
Starting point is 01:00:18 And so the product is super interesting and I'm looking forward to chatting with them again, like in the future. And we have like many more things to discuss about with them. So looking forward to that. Absolutely. All right. Well, thanks again for joining us on the Data Stack Show. We will catch you on the next episode. Tell a friend about the show if you haven't.
Starting point is 01:00:41 Subscribe if you haven't. And you like this episode and we'll catch you on the next one. We hope you enjoyed this episode of the Datastack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rutterstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rutterstack.com.
