Drill to Detail - Drill to Detail Ep.107 'Cube, Headless BI and the AI Semantic Layer' with Special Guest Artyom Keydunov
Episode Date: July 6, 2023
Mark Rittman is joined in this episode by Artyom Keydunov, Founder at Cube, to talk about embedded analytics and Cube's origin story, headless BI, query acceleration, and how Cube and Delphi enable an AI-powered conversational interface for the semantic layer.
Transcript
maybe set the scene a little bit really about what the product landscape was like when you thought about Cube at the time, and what problem did you initially try and solve within that market?
Yeah, yeah, good question. I think, to be honest, the initial idea was: what if we built a headless LookML, but bring a different BI on top of it.
So hello and welcome to Drill to Detail, and I'm your host, Mark Rittman.
So I'm very pleased to be joined on the show today by Artyom Keydunov, CEO of Cube. So Artyom, great to have you on the show.
Thank you for having me today, Mark. I'm really excited about our conversation. My name is Artyom,
co-founder and CEO at Cube, started Cube in 2019. So really looking forward to today's conversation.
Fantastic. So we've been working together a little bit recently as well. We're a Cube partner, and certainly myself and the team have been very excited to be working with the Cube product and the Cube company.
So it's great to talk to you. It's always great to talk to the original brains behind the product. So it'd be quite good to go through in this episode some of your background, but then the area of semantic models is really hot at the moment, and I'm particularly interested to understand what your thinking is around this area, what the differentiator really is for the Cube product, and where you see this all going. So let's start off with, I suppose, your kind of backstory. Tell us about Statsbot, which was a product that I actually used a little while ago, and I was pleased to see that it was your company, or you, that was involved in that as well. So what was Statsbot, and what was the story behind it?
Yeah, good to hear that, you know, that you enjoyed the product.
So I started StatsBot, I think, in 2016 or so.
And I was running engineering at a company that was building software for schools back then.
And we had a good engineering team that was starting to use Slack a lot.
And I thought, what if we turn Slack into like some sort of a BI tool, right?
Like bring charts or analytics data into Slack, because we already spend a lot of time in Slack, right? So I built that integration with Slack and a few other places, you know, like Google
Analytics, databases, Salesforce. And so Statsbot was able to pull data from different places and just display it in Slack, either in real time or on a scheduled basis. And I started it as just a kind of side hustle project, but it started to grow quickly.
Slack came with this application directory,
and they reached out to me and said,
hey, Statsbot is already getting some traction.
We wanted to feature it in our upcoming application directory launch.
So they did this.
I got a lot of traffic, got a lot of users.
My co-founder at Cube was actually one of the first users of Statsbot. He texted me, and I was like, my Ruby on Rails application on Heroku is not doing well, can you help me? He jumped in and, you know, started to help me.
So, and it was going well. And then at some point, VCs started to reach out, saying, okay, we see some traction in Statsbot. And it was, you know, during the days when a lot of people were talking about conversational interfaces; it was like Magic, the SMS, text-based app, right? There were a bunch of bots, Facebook bots. So there was a little bit of hype in the venture world about that as well. So VCs reached out, and they kind of wanted to fund Cube, or fund Statsbot, sorry. So we decided to do that. We quit our jobs, we raised a small seed round for that.
And I think it was a good run. I liked what we built with Statsbot. I think the problem with Statsbot was that its initial idea was just to build a side project. And it was really good as a side hustle, but it was not a big venture story. And then Slack really kind of stopped growing at some point, and it was all applications built on top of Slack. So at some point we decided that we wanted to focus on a bigger opportunity. And at this point, actually, we started to look more into Cube.
And Cube was an engine we built for Statsbot. Because essentially what we needed at Statsbot was a roll-up engine, a relational roll-up, which every BI needs, right? You generate some sort of SQL query from the multidimensional model, and then you run it in a warehouse. So we thought, what if we just take these internals of Statsbot, what if we take Cube and put it out on GitHub, so people are going to use it for building their own data apps, building their own analytics products? So we did this, and it took off. A lot of people started to use it, and we decided to pivot completely and focus on Cube.
Interesting, interesting.
So I think maybe I used it at the time
for doing GA reporting within Slack.
So yeah, it was good.
And there's also a company called Allcount as well that was involved in your history, and that's where Pavel, your co-founder, came from. So I know Allcount wasn't maybe a direct product of yours, but how does that fit into the story?
Yeah. Allcount: my co-founder Pavel was working on the open source project called Allcount before he joined me at Statsbot. It was a rapid application building platform, based on Node.js.
Okay. And I suppose, stepping forward a little bit to the start of Cube, it was Cube.js, I think, at the time.
And was that kind of where, I suppose,
the focus on Cube being embedded came from,
or was it just coincidence, really?
Yeah, yeah.
I think the main reason is that our data model back then
was JavaScript-based.
So right now we still have JavaScript,
but we also have a YAML based model.
And it seems the YAML-based models are getting more traction; it's just a more natural environment for data engineers, I would say.
But our original framework for the data modeling
was only in JavaScript.
So that's why we had the JS in the name. At some point, we decided to remove that, because we started to see some confusion from people thinking that Cube is some sort of, you know, JavaScript visualization library or something like that. Because many charting libraries have JS in the name, like D3.js or Chart.js, right? So having JS in the name was not good for us. So we removed it, probably almost two years, maybe one year ago.
Okay, okay.
So we're going to talk about Cube really in this episode, and I suppose the wider analytics market, semantic models within there, and the position that Cube hopes to have within that market. Okay, so let's just take a step back. So when I first heard about Cube, there was a lot of talk about headless BI and metrics layers and so on. So I suppose, maybe set the scene a little bit really about what the product landscape was like when you thought about Cube at the time, and what problem did you initially try and solve within that market?
Yeah, yeah. Good question. I think, to be honest, the initial idea was: what if we built a headless LookML? And, uh,
we were like big fans of the Looker product
and we have many Lookers on the team right now.
So I think the Looker model is just a great product
and I have a lot of respect for the team.
So we thought, what if we build that?
And that was the idea.
What if we just build a headless data model?
And we started to think what kind of use cases
are people going to use it for, right?
When we released it in open source,
we started to see most of the people
that were building embedded analytics
or like interactive data apps, right?
Which felt natural, right?
Just you have the data model that you can run
on top of warehouse.
It has all the data modeling capabilities,
but it also has some sort of SQL execution engine with some caching.
And then you have API.
And then you can build your own charts, like a React with Chart.js
or something like that.
So that was our major use case back then.
And then probably around 2020 or 2021, more and more people started to talk about the
metrics layer and headless BI as a term and semantic layer. So we started to see some ideas
of people wanting to take something like a LookML but bring a different BI on top of it, right?
Not only to do embedded analytics, but to bring a different BI, like Tableau or Superset, on top of a LookML model.
And there were like some blog posts talking about this idea,
like a headless BI or something like that.
And every time these blog posts would come out,
people in our community would point to it and say like,
hey, that's exactly what Cube is doing.
We're like, yeah, that sounds about right.
And we started to think more about that use case.
And we started to see, you know, some pull from the community. And speaking of the landscape, there were several companies trying to do that. There was one great project, MetriQL, which was probably one of the first to say, okay, what if we build a data model and then let different BI tools connect to MetriQL? MetriQL was Presto-based, so they were exposing a Presto SQL interface. That was a good idea. And then there were companies like Supergrain and Transform. I think there were a few others.
And we at Cube, we started to think a lot about that use case as well.
So I think at some point, from a naming perspective also,
it was a little bit like a chaos.
Some people were using the term headless BI, and we at Cube were using that as well.
And then some people were using metrics layer,
metrics store, and then semantic layer.
And now I feel like everything is converging
to like semantic layer, which is good.
So we have a one single term right now,
but it was interesting times, like two years ago, you know, with a lot of companies and projects getting into the space.
Okay. So I suppose, without getting into Cube's particular details at the moment,
what are the, I suppose, the generic challenges in trying to build a headless BI, a headless semantic model?
And I suppose, why also would a company want to do that,
want to use that rather than, say,
just using LookML and Looker, for example?
So what are the challenges and why bother, really, I suppose, in some respect?
Yeah, yeah.
I think, on why and what is the value: if I would try to summarize it, it's the overall idea of bringing software engineering practices to data management. And one big best practice is DRY, meaning do not repeat yourself. And what's happening right now is that when we use multiple visualization tools, every visualization tool has some sort of data modeling layer, right? It's very advanced in Looker, but Tableau has it, Power BI has it, to some extent Metabase; every BI has it. And what's happening is that we're repeating ourselves in these places. Every time we do a new visualization, every time we bring another tool, or even embedded analytics, we repeat the data modeling in that layer. So the idea is: do not repeat yourself, and extract the data modeling upstream into some sort of component that provides unified consistency and accuracy. So that's the idea behind the semantic layer and why to use it.
And what are the challenges? Because, I mean, I guess I would imagine having to support lots of different databases, you know, different BI tools. I mean, I imagine it's not a trivial thing to do, really, is it?
Yeah, yeah, exactly.
I think the main challenge would be around the BI tools, because, coming back to the point that every BI has its own semantic layer: usually this semantic layer enables the user interface, it controls the end user experience. We've mentioned Looker a few times already, but let's take Looker as an example, right? We have Explores in Looker. So every time I create an Explore in LookML, it will pop up in my list of Explores, right? And then I will go into that UI. So everything I'm doing in the data model affects my UI, for myself and for end users. So that's the challenge: we still need to have this semantic layer on the BI level, because all the BI controls are connected to it to provide a native experience. So if we try to bypass that, then the experience of the end user would be very, very bad,
Right. It would not be native. So I think the challenge is: how do you implement a semantic layer, but still keep the same native experience in the BI for end users?
Okay, okay. So let's get into the detail of Cube then, really. So you mentioned how, I suppose, Statsbot was an inspiration, and some of the roots of Cube were in that.
But you went down the route, I suppose,
of open sourcing Cube, didn't you?
So maybe talk about the first couple of years, really,
and when it was Cube.js
and how you kind of, I suppose,
leveraged the community a little bit in this really as well.
So maybe the starting sort of origin story there
would be quite interesting.
It was a lot of fun.
So it was mostly me and my co-founder running the open source project.
We were just doing mostly three things: writing code, obviously, talking to users; we put a Slack community out there, so that was good. Every time people would run into some exception or some problem, they would join our Slack and ask a question. And then we'd use that as an opportunity to build a conversation and relationship with those users, and this way we'd be able to learn how they were using the product. So: writing code, talking to customers, and just blogging a little.
You know, just: how do you solve this problem with Cube? How do you solve that problem with Cube? How do you use Cube with that technology? So blogging was really a way for us to attract new users, and by talking to users, we were able to shape the product. So it was pretty much the loop, you know: put a blog post out, get new users, talk to them, write code, and then repeat.
Let's go into, I suppose, some of the detail of how Cube works. Okay, so there are different layers to Cube, aren't there? There's caching, and there are APIs, and pre-aggregations and so on. Maybe just talk us through a little bit about how the product works, and the choices you made over how it's architected and built.
When we talk about Cube, we usually talk about the four layers of the product. And the first layer is the data modeling, and pretty much all other layers are coupled to the data modeling. In the data modeling, you define your data models, right? Hence the name. So in the Cube world, we have two objects for the data modeling. One is called cubes, and the other is called views. The purpose of cubes: a cube is a business entity. So you take a user as a business entity, you take an order, a transaction; you define what measures and what dimensions these business entities have, and then you also define the relationships between them. One to many, many to one, all of that. So you're sort of building a data graph of your business entities. And then views: the job of views is to act as data marts, or slices of data. So you can take some measures and dimensions from specific cubes and then present them as an interface to the end user, some sort of curated data sets. And also on the views level, you can control the join paths, because the cubes constitute the data graph, but it's not directed. On the views level, you give direction to joins, because potentially you may have multiple ways to direct your graph, right? So on the views level, you can control that direction. So that's the fundamental idea of the data model.
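As an editorial illustration of the two objects described here, this is a minimal sketch of a Cube-style YAML data model for a hypothetical `orders` entity; the table, member, and view names are illustrative, not taken from the episode:

```yaml
cubes:
  # A cube is a business entity: its measures, dimensions, and joins.
  - name: orders
    sql_table: public.orders
    joins:
      - name: users
        sql: "{CUBE}.user_id = {users}.id"
        relationship: many_to_one
    measures:
      - name: count
        type: count
      - name: total_amount
        type: sum
        sql: amount
    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time

views:
  # A view acts as a data mart: a curated, directed slice of the cube graph.
  - name: orders_view
    cubes:
      - join_path: orders
        includes:
          - status
          - created_at
          - total_amount
```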
Then on top of this, we have access control, which is coupled into the data model. So every time we execute a query in Cube, we execute it in the context of some security context; we call that idea the security context. The security context can affect the data model, meaning that for different users you may have a different version of the data model, which could be row-level security, column-level security, or just removing an entire set of measures or an entire set of dimensions. So your data model can be flexible based on the context of the query.
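The flexible-data-model idea can be sketched conceptually in Python. This is not Cube's actual API, just an illustration of row-level and column-level security driven by a security context; all member names are hypothetical:

```python
def apply_security_context(query: dict, ctx: dict) -> dict:
    """Rewrite a multidimensional query under a security context."""
    rewritten = dict(query)
    # Row-level security: force a tenant filter into every query.
    rewritten["filters"] = list(query.get("filters", [])) + [
        {"member": "orders.tenant_id", "operator": "equals",
         "values": [ctx["tenant_id"]]}
    ]
    # Column-level security: strip members this role may not see.
    if ctx.get("role") != "admin":
        rewritten["dimensions"] = [
            d for d in rewritten.get("dimensions", []) if d != "orders.cost"
        ]
    return rewritten

q = {"measures": ["orders.count"],
     "dimensions": ["orders.status", "orders.cost"]}
print(apply_security_context(q, {"tenant_id": "t1", "role": "analyst"}))
```

The same query object thus yields a different effective data model per user, which is the essence of the security-context idea.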
So that's how we implement access control. And then we have a caching layer; the idea of the caching layer is aggregate awareness.
So Cube can build the aggregates based on measures and dimensions, and Cube has its own storage for the caching. Cube runs the initial aggregation in the data source and then downloads the result into its own cache. And the aggregates, again, are like tables, right? It's a relational cache, meaning that it can potentially serve a lot of different permutations of measures and dimensions in a query.
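The "one roll-up serves many permutations" point can be sketched conceptually like this (an illustration of the matching idea, not Cube's real aggregate-matching logic):

```python
def can_serve(pre_agg: dict, query: dict) -> bool:
    """A pre-aggregation can serve a query if it contains every measure
    and every dimension the query asks for."""
    return (set(query["measures"]) <= set(pre_agg["measures"])
            and set(query["dimensions"]) <= set(pre_agg["dimensions"]))

# A single roll-up of orders by status and country...
rollup = {"measures": ["count", "total_amount"],
          "dimensions": ["status", "country"]}

# ...can answer any subset of those members:
print(can_serve(rollup, {"measures": ["count"], "dimensions": ["status"]}))      # True
print(can_serve(rollup, {"measures": ["count"], "dimensions": ["user_email"]}))  # False
```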
And then Cube can refresh that either with its own job scheduler, or you can use an orchestrator like Airflow.
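Continuing the hypothetical `orders` model from earlier, a pre-aggregation definition in Cube's YAML style might look like this; the member names are illustrative, and the refresh schedule corresponds to the built-in scheduler mentioned here:

```yaml
cubes:
  - name: orders
    sql_table: public.orders
    # ...measures and dimensions as before...
    pre_aggregations:
      # A daily roll-up of order counts by status, built in the data source,
      # downloaded into Cube's own storage, and refreshed every hour.
      - name: orders_by_status_daily
        measures:
          - count
        dimensions:
          - status
        time_dimension: created_at
        granularity: day
        refresh_key:
          every: 1 hour
```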
And then the final layer of the product is the APIs. We have a REST API and a GraphQL API; that's where we started. People usually use those APIs if they want to build embedded analytics or interactive data apps. And then we have a SQL API; our users use the SQL API to connect BI tools.
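The same logical question can be expressed against either style of API. A sketch in Python (the endpoint path and member names are hypothetical; the MEASURE() syntax is the SQL extension explained a little later in the conversation):

```python
import json

# REST API style: the query is a JSON document of measures, dimensions,
# and time dimensions, sent to a load endpoint such as /cubejs-api/v1/load.
rest_query = {
    "measures": ["orders.total_amount"],
    "dimensions": ["orders.status"],
    "timeDimensions": [{
        "dimension": "orders.created_at",
        "granularity": "month",
    }],
}
payload = json.dumps({"query": rest_query})

# SQL API style: the same question over the Postgres wire protocol,
# with the aggregation deferred to the data model via MEASURE().
sql_query = """
    SELECT status, DATE_TRUNC('month', created_at), MEASURE(total_amount)
    FROM orders_view
    GROUP BY 1, 2
"""

print(payload)
print(sql_query.strip())
```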
Okay. So how does Cube translate queries against your data model into SQL that can be sent to BigQuery or Snowflake? How do you do that? How do you handle the different dialects and so on?
Yeah. We first build a multidimensional query, right? Measures, dimensions. And then we translate that into SQL based on the different dialects. Cube has a concept of a driver, so every time you need to support a different data warehouse, you need to build a driver. And a driver needs to implement the connection,
you know, like how connection is done technically, right?
Different protocols, all of that.
And then it needs to implement the SQL dialect as well.
So because sometimes, you know, predicates, filters, all of that, they just have slightly different syntax, right? So that's how it works. Every time we want to introduce support for a new warehouse, a new database, we need to implement the driver. And our open source community has implemented a lot of drivers already, which is really good; we have this luxury. So yeah, that's how Cube works on that side.
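The dialect part of a driver can be sketched conceptually (this is not Cube's real driver interface): the same multidimensional query renders to slightly different SQL per warehouse, for example the argument order of DATE_TRUNC differs between BigQuery and Postgres-style dialects:

```python
def date_trunc(dialect: str, granularity: str, column: str) -> str:
    """Render a date-truncation expression in the target dialect."""
    if dialect == "bigquery":
        # BigQuery: DATE_TRUNC(column, MONTH)
        return f"DATE_TRUNC({column}, {granularity.upper()})"
    # Postgres / Snowflake style: DATE_TRUNC('month', column)
    return f"DATE_TRUNC('{granularity}', {column})"

def to_sql(dialect: str, measure_sql: str, dimension: str,
           time_dimension: str, granularity: str, table: str) -> str:
    """Translate one multidimensional query into dialect-specific SQL."""
    bucket = date_trunc(dialect, granularity, time_dimension)
    return (f"SELECT {dimension}, {bucket} AS period, "
            f"{measure_sql} AS value FROM {table} GROUP BY 1, 2")

print(to_sql("bigquery", "SUM(amount)", "status", "created_at", "month", "orders"))
print(to_sql("postgres", "SUM(amount)", "status", "created_at", "month", "orders"))
```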
On the querying side, it's interesting, because with the SQL API, Cube pretends to be a SQL database too, right? So your BI can connect to Cube: Tableau would connect to Cube as to a Postgres database, or Superset would connect to Cube as to a Postgres database. The missing piece here is the notion of a measure, because Cube has measures inside it, but the SQL spec doesn't have the idea of a measure. So here we're extending the SQL spec, and we're adding the idea of the measure. The measure is just a special type of column, and it has a special function, which we call MEASURE as well. Meaning that when you query Cube and you say, I want to get this measure, you don't need to do any aggregation or calculation with that measure; it will use the calculations that you have in your data model. So we're adding this extension to SQL.
Okay, okay. So I think, probably,
if I was listening to this as the audience, I'd be thinking: well, how does this compare to, say, the dbt metrics layer, or to Looker and LookML? So with the dbt metrics layer, I suppose the classic version, and maybe what's coming with the Transform acquisition: how is Cube differentiated from that product family? And would Cube compete with that, or is it a complement? Generally, how do people understand the difference between Cube and the dbt semantic layer?
I think Cube is definitely complementary to dbt as a transformation tool,
and, you know, in many cases I encourage our customers to use a transformation tool like dbt upstream from Cube. With the dbt semantic layer, it's a little in flux right now, because it's unclear how the end product will look, right? We know the history: dbt announced that they wanted to build a semantic layer, that's how they called it originally. And then at some point, apparently, they decided what they built was not meeting all the requirements, so they decided to buy Transform, right? And Transform is a great team, and they have a great product. I think the question is how these two things are going to converge, and what the end product is going to look like. So it's a little hard for me to tell at this point how it compares to Cube, because I don't know what the end product will be.
But, you know, I think fundamentally there are two questions in a semantic layer where different people have different views: how the metrics should be defined, and how the metrics should be queried. In the Cube world, and I already described Cube's architecture, we are data-set-centric, meaning that we believe the semantic layer, as an interface, as a product, should give you a data set. It can contain multiple measures, it can contain multiple dimensions, but it should be a data set about some specific business entity or some specific business area, right? Like users, sign-ups, transactions. But what I saw from the previous iteration of the dbt semantic layer, they thought in a more metric-centric way, where they would ship a metric and attach multiple dimensions to it, then ship another metric and attach multiple dimensions. So it's less data-set-oriented and more metric-oriented. So that's one difference. Again, it may change with Transform coming in, but that was before.
And the second big area in a semantic layer is how you query it. The Cube way of doing this is to query it through SQL. I think that's the right approach: SQL is the language of data, right? Every tool knows how to speak SQL, so we should support SQL. The Transform team had a little bit of a different approach, and the metrics layer from dbt had it too. The Transform team was using, I think it was called MQL, metrics query language: their own metric query language, which was not SQL. And with dbt, I think it was Jinja templated into SQL. So it's a little different. I think that's an interesting area. I don't know, again, how it's going to look in the end state, but Cube was SQL-first from the beginning, and I think that's the right approach.
Okay, okay.
So I suppose a combination of Cube and Preset is something that we've had a fair bit of success with recently on client projects. Maybe just talk about, I suppose, the particular value in integrating, say, Cube with Preset, and sort of Superset, and I suppose how Cube is investing in that area in the future.
We're having a lot of users and customers with Superset and Preset, so that's why we're excited specifically to build more integration
with that BI tool.
One area where we're working on improvements is integrating with Superset's semantic layer itself. We call this feature Semantic Layer Sync. The idea is to let Cube's data model be pushed downstream to all the different visualization tools, and to synchronize Cube's data model with the BI's data model as well. So in Superset, you have data sets. Cube now can programmatically build and manage data sets in Superset, and define all the metrics and all the dimensions, so users don't have to put them in manually. And every time you make a change in Cube's data model, Cube automatically synchronizes it with the data model in Superset. And that gives a native experience to the end users. I think that's one of the biggest challenges in a semantic layer implementation, how you give this native experience to the end users, and I think Semantic Layer Sync is a way to solve this problem.
Okay, okay. So going back a moment: you talked about the caching layer in Cube, and certainly, again, from our experience, the caching layer, and you mentioned aggregate awareness there, are, I suppose, particularly defining features of Cube. So maybe let's go back to that a little bit. How does that work, and how does it then give you query performance that can be quite fast? I presume you do query rewriting as part of that as well. How does the caching layer work, and what's the end user experience like when that's working properly?
So I'll start with just the high-level architecture, what we have. In Cube, we have our own engine, which is used for orchestration and caching. The first part of the job is to do orchestration, because Cube instances are headless and stateless, meaning that it lets you horizontally scale, but they also need a synchronization point. So our caching engine manages an execution queue; this way it orchestrates all the queries, but it also manages the caching. The caching piece is divided into a router node and worker nodes. So it's a distributed query engine, where the cold storage is Parquet files and the hot storage is the memory of the worker. The way it usually works is that when you define your aggregate in your data model, Cube's caching engine will go to your data source, say Snowflake, will execute a query in Snowflake, and then download the whole result of that query into its own storage. Then it will do some repartitioning, re-indexing, re-sorting, and put it all into the Parquet file format.
And then when a query comes to Cube, say from a BI tool or from embedded analytics, Cube will know that it has an aggregate. That's where aggregate awareness comes in, right? Cube will have the knowledge that the aggregate exists for this specific query, and then it will go and query the aggregate from Cube Store, which is our caching engine, and which is inherently much faster, because we already pre-processed and pre-indexed everything and can load it into the workers' memory for fast querying. So that's why the cache is really fast for the specific queries that are being processed by Cube.
Okay, so this reminds me quite a lot of the days when I used to work with tools like Essbase
and Microsoft OLAP,
when we had things like aggregate storage
and we had things like calculation plans
and aggregation at various levels in the hierarchy.
I mean, do you have a background in OLAP?
And is OLAP something that is on your mind really
about when you think about where the product is going?
Yeah, I mean, my co-founder and I, we spent some time with BI systems like Mondrian and all of that. So, you know, that's where we saw a lot of implementations like this. We try to use ideas from these tools, because it's still multidimensional analysis, right? But I don't think we wanted to rebuild one of the old systems; we try to understand how the same ideas can be applied to the data-warehouse-centric world.
So I suppose we've been talking implicitly
about kind of internal BI use cases for Kube,
but certainly I suppose the origins of Cube were, you know,
maybe access via API and the embedded market.
Tell us a bit about the kind of customers that are using Cube in an embedded context. And how do you actually include Cube in your, say, SaaS application?
How does that kind of work?
There are a lot of tech companies using Cube for that use case, because, as you can imagine, every tech company has some software they sell, and if it's a B2B company, they usually have some sort of insights and monitoring features, like dashboarding features, inside their applications, because customers now demand a lot of analytics features in the product. So we have a big cohort of customers who are using Cube to power customer-facing analytics inside their applications. That's probably been the biggest segment of our customers so far, before we started to see more of the internal BI use case. And in that stack, you would usually run Cube on top of a warehouse or a SQL database and expose the API to the front-end team. And then the front-end team will build some sort of visualization with tools like React or Angular and different charting libraries. We even ship some SDKs and libraries to integrate natively with React and Angular.
Looking to the future
now then, you've got a market
I suppose, something that's been
a validation of your strategy is
the fact that there are other players in the market
now, and in particular, I suppose, you've got Looker with their new universal semantic model and Looker Modeler and so on. So I just wondered: where do you see Cube fitting into the market going forward, and what's the unique space that you think you guys would be in that would differentiate you from, say, the other players, and make you a valid choice to be chosen in preference to, say, something that's maybe more of a safe bet, or certainly more of a known product? So where does Cube fit into the future, do you think?
Yeah. I think we'll see some semantic layers that are going to be not open, but more like closed ecosystems. LookML, or Looker Modeler, that's a good example, right? It's most likely going to exist mostly within the Google ecosystem, with a goal of selling more BigQuery, right?
That I think one of huge difference that we wanted to take with Kube as a product,
we wanted to create not only universal semantic layer, but we wanted to make sure it's open.
So first of all, we open source, which is a big difference, right?
Like in our core offering,
it's an open source.
While we have a cloud product,
but main features, many, many features,
they're still in open source.
So I think being open
and open from any affiliation
from the cloud vendors like GCP,
also open from a code-based perspective,
that's going to be a very big difference for the cube.
And to be honest, I think it's a safe bet for the enterprises and organizations
because, again, the underlying technology is open source, right?
And it's not going to work with big query or you know like
if ibm decides to enter the warehouse market right like you can just build an ibm driver
and use cube with that i think open source is a huge huge difference here do you think there's a
Do you think maybe the way the market might evolve in the future is actually not necessarily that you choose one semantic model and that's it, but that you might, for example, link Cube to Looker? They announced that they were going to open up Looker's semantic model via a SQL interface, so you could imagine Cube running on top of that, maybe to make it easier to embed data from there. I mean, do you think there's a kind of multi-semantic-layer future ahead as well?
That's a good question. I think it may be, and I think it would be a good future if we were able to make it true. The prerequisite for that would be some standardization in the market, and at Cube I would be happy to push for an open semantic layer standard. If some other vendors supported that, it would help bring this standardization. And if we have that standardization, then different semantic layers would be able to integrate, and even merge with each other, which would eventually give you this sort of cross-vendor integration.
Okay. And you've mentioned open source a few times there, and I think, again, a unique differentiator for Cube is that community. How important is the community to Cube going forward? I mean, it was there at the start, but what role will it play in Cube going forward, do you think?
I think it's an essential part of the product, and an essential part of every open source project. You cannot separate the community from the product or from the open source project, because usually they define and influence each other. As much as the product or project influences who your users are, the users influence Cube just as much. Even if we're not talking about direct code commits, just getting feedback, asking questions, and moving the product in a specific direction, the community will influence it. So it's hard for me to imagine the two things as separate entities.
Yeah, okay. And just to round things off: you mentioned Statsbot at the start, and you talked about conversational BI. We actually had David from Delphi Labs on the show a little while ago, and we've been trialling that product since then, and it links in with Cube. It sounds to me that what he's doing with that product, and the way it uses Cube, is kind of going back to some of the things you were trying to do, but doing it with a few more years' worth of technology around, and the whole AI world. Just tell us a bit about that integration you've done with Delphi Labs and how it works.
Yeah, we just announced it a few days ago, and I'm super excited about it. First, obviously, because of Statsbot; I have a soft spot for that. But I think it's a great idea, and large language models right now really enable the use case. That's what we didn't have back then with Statsbot, and that's what David and the team have now. So I'm really bullish on it. There's still a lot to build, of course, and it's still an early use case, but I think we now have a real chance to make it work. So I'm super excited about it.
So maybe, for anybody that doesn't know what you're talking about, just describe what it is and what it adds to what you did originally with Statsbot.
Yeah. So with Delphi, you can actually run a conversational interface. Think about ChatGPT, but it knows about your data, it knows about your data model. You take a Cube data model, you give it to Delphi, and Delphi learns about it. It also knows a lot about analytics in general. Then you can ask questions like, "Hey, how are my sales doing? What is that? What is this?", and it can go and look at your data model, generate a query through your data model, and give the results back. So it's kind of like an analyst that can work with your data model, which is pretty cool. I think that's really what large models and semantic layers enable right now. If we have more adoption of the semantic layer, that will be really good for Delphi and the team, for them to provide a lot of value on top of it. Because semantic layers give meaning, they give business meaning to the data, which is exactly what these systems need.
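As a toy illustration of that idea (a keyword lookup stands in for the large language model, and the metric names are hypothetical, not Delphi's actual implementation): the assistant's job is to map a question onto measures the semantic layer already defines, and emit a Cube query rather than raw SQL.

```typescript
// Toy translation of a natural-language question into a Cube query.
// In a product like Delphi an LLM does this mapping; here a simple
// keyword catalog stands in for it. All metric names are hypothetical.

const metricCatalog: Record<string, string> = {
  sales: 'Orders.totalRevenue',
  orders: 'Orders.count',
  customers: 'Customers.count',
};

function questionToCubeQuery(question: string) {
  const q = question.toLowerCase();
  const measures = Object.keys(metricCatalog)
    .filter((keyword) => q.includes(keyword))
    .map((keyword) => metricCatalog[keyword]);
  return {
    measures,
    timeDimensions: [
      // "this month" is hard-coded for the sketch; a real assistant
      // would parse the date range out of the question too.
      { dimension: 'Orders.createdAt', granularity: 'month', dateRange: 'this month' },
    ],
  };
}

const cubeQuery = questionToCubeQuery('How are my sales doing this month?');
// cubeQuery.measures -> ['Orders.totalRevenue']
```

Because the query goes through the semantic layer, the model never has to guess at table joins or raw SQL; which column means "revenue" is already encoded in the Cube data model.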
I mean, my experience with it, and we've only been using it for a day or so, is that it's like having an analyst working with you on Slack, really. You can just say, "How are sales doing this month?", and, like you say, the large language model in the background is the thing that allows it to have a conversation with you. It will connect to Cube, for example, and then say, "By revenue, do you mean this particular measure here? By region, do you mean this one here?" It has this kind of conversational interaction with you, which I think is the difference between it and what Statsbot maybe was, where there was a certain way in which you had to ask questions, but it couldn't have a conversation with you as such. It then links in with Cube, and you access your data through there. So it's a fantastic way, I suppose, of exposing Cube to a less technical audience, and doing it in the environment they like to use, which is Slack. I was really impressed with it.
Yeah, exactly. I think you're spot on about the fact that it can ask follow-up questions and do a little bit of investigation into what you actually mean. I think that was the missing piece for Statsbot; that's what we realized was not possible to do back then. But now, with large models, it is possible. So this system can ask follow-up questions, build that knowledge, that memory, about what you actually want, and then go and get the data for you. Yes, I think that changes a lot.
Fantastic. So how do people find out more about Cube? And also, maybe just explain what Cube Cloud is as well.
We rely a lot on inbound, as with every open source project, right? It's mostly word of mouth, and maybe a little bit of blogging. People find out about Cube through open source awareness, and then we have a cloud option. The cloud option is not just a hosted version of Cube; it's more like a full-featured product built on top of Cube. That's the way we position it. There are some additional features and additional integrations that we have in the cloud, and in many cases people just opt for the cloud instead of Cube core.
Okay, fantastic.
And Rittman Analytics as well: we're a Cube partner, and we do a quickstart package to get you up and running with Cube. So just a little plug there, really.
But, Artyom, it's been great having you on the show. Thank you very much for coming on and telling us about the origin story for Cube. And, yeah, we'll keep an eye on the product in the future, and best of luck with everything you're doing in taking it forward.
Thank you. Thank you for having me today. It was a really good conversation.