Orchestrate all the Things - Universal semantic layer: Going meta on data, functionality, governance, and semantics. Featuring Cube Co-founder Artyom Keydunov
Episode Date: May 16, 2024
What is a universal semantic layer, and how is it different from a semantic layer? Are there actual semantics involved? Who uses that, how, and what for? When Cube Co-founder Artyom Keydunov started hacking away at a Slack chatbot back in 2017, he probably didn't have answers to those questions. All he wanted to do was find a way to access data using a text interface, and Slack seemed like a good place to do that. Keydunov had plenty of time to experiment, validate, and develop Cube, as well as get insights along the way. We caught up and talked about all of the above, as well as Cube's latest features and open source core. Article published on Orchestrate all the Things.
Transcript
Welcome to Orchestrate all the Things. I'm George Anadiotis, and we'll be connecting the dots together.
Stories about technology, data, AI and media, and how they flow into each other, shaping our lives.
What is a universal semantic layer, and how is it different from a semantic layer?
Are there actual semantics involved? Who uses it, how, and what for?
When Cube co-founder Artyom Keydunov started hacking away at a Slack chatbot back in 2017,
he probably didn't have answers to those questions. All he wanted to do was find a way to access data
using a text interface, and Slack seemed like a good place to do that. Keydunov had plenty of time
to experiment, validate and develop Cube, as well as get insights along the way.
We caught up and talked about all of the above, as well as Cube's latest features and open
source core.
I hope you will enjoy this.
If you like my work on Orchestrate all the Things, you can subscribe to my podcast, available
on all major platforms, my self-published newsletter, also syndicated on Substack,
Hackernoon, Medium, and DZone, or follow Orchestrate
all the Things on your social media of choice.
My name is Artem. I'm co-founder and CEO at Cube.
My background is in computer science, software engineering,
data engineering.
I started Cube as a side project, actually, back in 2017 or so.
And I started with the idea to build a Slack bot. That was my first side project: let's just try to build a Slack bot, back when bots were all the hype, before LLMs.
And I built that Slack bot. And for myself, for my company, that Slack bot was able to get data
from different places, connect to different systems, organize them into a single data model
and push that kind of information into Slack. But by building that side project, I and my co-founder now, who joined quickly to work
on that stats bot, we built Cube.
And then we decided to open source Cube, and then we eventually built a company around
that.
So that's an interesting story: Cube as a project was really created as a side effect of another project, which is no longer around.
And I felt like it's a very cool story.
But yeah, we started Cube as an open source in 2019.
And then in 2020, we built a company around it.
And now I'm co-founder and CEO. Okay, yeah. Indeed, I wouldn't have guessed that what you do now, which is basically all around semantic layers, started out as a Slack bot.
And to be honest, I don't really see the connection. So if you want to build a bot, whether you have a good LLM as we have right now, or back then, you know, seven years ago, without LLMs, regardless, if you want your chatbot to be able to access data in warehouses or databases, you need to break it down into two steps. So one step would be how I go from natural language
to some multidimensional data model representation,
and then how I go from that sort of a data model to SQL.
So it's really almost impossible to go from text to SQL directly.
So you need to go from text to data model
and from data model to SQL, essentially.
So that's how Cube came to be created, because my co-founder and I
were working on that Slack bot.
We built this intermediate layer, which is Cube now.
So our bot was able to go from text to data model
and then from data model to the query in a database or data warehouse.
And even now, if you think about using it with an LLM, that's what a lot of people do right now with Cube. We've kind of come full circle here, but people are building chatbots with Cube right now. And what they're doing is taking all the context from a Cube semantic layer and using that context to generate correct SQL queries. So that's essentially why we started Cube in the first place. So that's the connection.
Well, yes, it does make more sense now, because either I missed it or maybe you didn't mention it, but when you talked about the Slack bot, I missed the part where you said what exactly the Slack bot was supposed to do. And I'm guessing, based on what you just said, from the context, that the purpose of the Slack bot was to actually serve as a textual interface to query, I don't know, a data warehouse or something like this, right? So now it does make sense. I can see the connection indeed.
Right. Yeah. And you're right. That's what this Slack bot was doing. So the idea was that,
you know, like a lot of teams, they were using Slack, still using Slack, and they have a lot of different systems to access data. And the idea was, what if we can bring this data right
directly into the Slack, right? So you're still on a Slack, you do all this, you know,
like a conversation, discussion,
but what if you can just pull your data,
say from Salesforce or from, you know,
like some performance analytics
from New Relic or some other system
right into Slack,
so you can just do a query right there in Slack with natural language.
So that was the idea: to pull the data from different sources
and put it into Slack. So I think the key, well, at least in some ways, the key to all that is what you just
said. So, data from different sources. As long as you're only using a single data source,
that also implies you have a single schema, so things are relatively straightforward.
It starts getting much more complicated as soon as you add data sources, because in all
likelihood, your two or more data sources will not have the same schema, or maybe they
don't have a schema at all.
And so you have to somehow start mapping things, basically.
And I guess that's what semantic layers are all about.
And so there's a number of things that we would like to cover in this conversation.
And also the fact, as I mentioned, that you have some news coming up, which we're going
to touch upon.
But I think it's best if we start from the actual definition of a semantic layer, because
I think for many people, it may
not be entirely clear. So how would you define what you do? So what constitutes a semantic layer?
I think we have all used semantic layers, in the way that I define them, at some point in our professional work, when we used BI.
So many of us use some business intelligence tools at some point.
We kind of drag and drop measures and dimensions onto the canvas to build a chart.
Think of Tableau or some other BI tool.
So that is essentially a semantic layer.
When we do something like drag and dropping,
what the BI tool does is take our metrics,
our measures from the UI,
and then it translates them into the query
to the underlying data source.
In many cases, it's a SQL data source.
It used to sometimes be something different, especially back then.
But today's semantic layers inside BI tools,
they essentially take measures, take dimensions,
all of this high-level business objects,
revenue, active users, all these definitions,
and then translate them down to the underlying SQL queries.
And sometimes it's not one-to-one.
So sometimes you bring multiple measures, multiple dimensions,
and they can translate into multiple SQL queries
because you need to do more complicated calculations
to get to these metrics and then do some post-processing.
But at the end of the day, it's how we go from a business-level definition,
business-level metrics, into the database SQL queries.
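To make the translation concrete, here is a hedged sketch of the kind of rewriting a semantic layer performs: a business-level request for "revenue by quarter" might be compiled into warehouse SQL roughly like the query below, where the orders table, the amount column, and the metric name are all hypothetical examples.

```sql
-- Hypothetical output of a semantic layer compiling the business metric
-- "revenue", broken down by the "quarter" time dimension, into warehouse SQL.
SELECT
  DATE_TRUNC('quarter', o.created_at) AS quarter,
  SUM(o.amount)                       AS revenue
FROM orders AS o
GROUP BY 1
ORDER BY 1;
```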
And the semantic layer as an idea is not new.
It's been around for a while, since the first days of business intelligence.
In fact, Business Objects had a patent for a sort of semantic layer idea in the 90s.
And then MicroStrategy successfully defended against it.
And the interesting problem that happened, and why Cube and the idea of a universal semantic layer are relevant now, is that we went from the first wave of BI, where we had one big BI tool like Business Objects or MicroStrategy being used inside an organization, to this current wave where we have just an explosion of different data visualization and data consumption tools within the organization.
So more data consumers wanted to access data in a more self-service way.
There is this whole idea of democratizing data, right?
So we want to give more tools to data consumers to slice and dice data however they want.
But it also comes with a cost of having more tools.
And cloud architecture made it easier; it's easier to buy new tools.
And there are a lot of BI tools out there.
So what's happening is that in an organization right now, you have 10 or 20 BI tools, and every BI tool has a semantic layer. So now you have sort of multiplied your semantic layers inside your organization. And the problem is that you don't know which one is correct. You might have one BI tool with all the metrics, and a second BI tool with sort of the same metrics, and they are not going to be in sync.
It's always like this in engineering: when we repeat ourselves, the system is going to be error prone; it's going to get out of sync, inconsistent. We even have this idea of DRY, right? Do not repeat yourself. So essentially, what we're doing with this universal semantic layer is trying to apply the DRY principle to the data architecture at scale. The idea is, we say: okay, you have all these 10 BI tools, they all have semantic layers, they all have definitions, and you probably don't want to repeat yourself and build the same metric across multiple systems. So let's take the definition out and put it in one place
where you can have it under the version control,
you can have all the best practices, how you want to manage your metric,
and then just use that metric across all of these BI tools.
And if you want to add more, 10 more BI tools, just add them.
You still have your metrics in a single place.
So that's the idea of the universal semantic layer.
In a way, it sounds like a new twist to an old problem.
I mean, you could be in pretty much the same situation if you had, well, it sounds like a kind of meta layer in a way. So yes, you did have semantic layers ever since you had BI tools.
But now, as you're describing, because of the fact that you have many BI tools, you need to go one level up and define a meta-semantic layer to talk to all these different semantic layers.
So it's like integrating different databases with different schemas, in a way, just on a different level.
Right. Yeah, correct. I think the problem here really comes with just the scale of the stack. If you have only one BI tool, and you're going to use only that one BI tool for the entire life of the organization, you probably don't need an extra semantic layer; what you have inside that BI tool is fine. But that's realistically not going to happen, right? People are going to use Excel, Google Spreadsheets,
custom-built applications, all of that.
So at this point, you really need to make sure that your semantic layer can work outside of your BI tool, right? Because if it's only working within that BI tool, and Google Spreadsheets is not using the metrics, or Excel users are not using the metrics defined in that BI tool, that's really bad, because you want those same metrics to be consistent across the organization.
The difference with the database analogy, let's say, is that at least in the database
scenario, assuming you're talking about a relational database, you do have a
standard that you can reuse. So you can always, if you have your queries, you can
always revert, let's say, to using ANSI SQL and you're pretty certain that your
query will work across your databases. But how do you ensure that, you know,
your definitions will actually work across your different BI tools?
Because do you have something like a SQL for metrics?
That's a great question.
Yeah, I think we call it, and think about it as, interoperability.
When you think about the semantic layer, data modeling is an extremely important layer: it's how we help users build the actual metrics. But the other important layer is how people can actually use these metrics, right? Interoperability. So Cube currently has several APIs. One API is just a REST API, and a sort of sister API to that is GraphQL.
That's usually being used if you want to build your own application or query data programmatically.
So you can just send a JSON structured query or a GraphQL structured query
with a list of measures, dimensions, and filters, and just retrieve the data back.
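To make that concrete, a JSON structured query of the kind described here might look roughly like the sketch below; the orders cube and its measures, dimensions, and filter values are hypothetical examples, not something discussed in the episode.

```json
{
  "measures": ["orders.total_amount"],
  "dimensions": ["orders.status"],
  "timeDimensions": [
    { "dimension": "orders.created_at", "granularity": "quarter" }
  ],
  "filters": [
    { "member": "orders.status", "operator": "equals", "values": ["completed"] }
  ]
}
```

The data would then come back as rows keyed by those member names, which the application can render however it likes.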
The other interface we have is SQL.
So Cube pretends to be a PostgreSQL database, a wire-compatible PostgreSQL database. The only caveat here is that the tables in this database, which we call views in Cube, are sort of multidimensional.
It means that they're not real; they're not materialized, right?
We actually materialize them on the fly when you query them.
So that's the caveat here.
But the system handles legitimate SQL; it's Postgres compatible, and we support all the different SQL functions.
And this way, tools like Tableau and other BI tools can connect to Cube.
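As a brief illustration, a query arriving over that Postgres-compatible interface could look something like the following; the orders view and its fields are hypothetical, and the MEASURE() aggregation is a sketch of how a semantic-layer measure can be referenced from SQL, so treat the exact syntax as an assumption rather than a spec.

```sql
-- Querying a Cube view as if it were a regular Postgres table;
-- the measure is resolved by the semantic layer, not by a raw SUM here.
SELECT
  status,
  MEASURE(total_amount) AS total_amount
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1;
```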
And the other layer we support now is MDX.
That's for more like Excel, Power BI, Microsoft ecosystem,
because that's the most common query language in the Microsoft ecosystem.
And then we assume we're going to support DAX as well,
because DAX seems like it's going to replace MDX eventually;
Microsoft is pushing DAX more than MDX right now.
So if I get it correctly, then those different ways,
let's say, of interacting with the underlying BI tools have to do with execution
time really. But how about the actual definition of metrics? So if I want to define something like,
I don't know, sales per quarter or whatever, do I have like a lingua franca that I can use to do that?
Yeah, the way we think about it is that it's all code. One of Cube's big philosophies is being code first, so your data model essentially is just a code base: it's a set of YAML files or Python files. We use YAML for most of it because it's declarative, it's easier to define. So if you want to create a measure, for example sales per quarter, you can just put five lines of YAML as a definition, and then you're going to get that measure.
If you want to do more complex things, sometimes you can write Python code to do more dynamic, imperative things. But most of our customers, most of our users,
when they build a data model in Cube,
they just use YAML definitions.
So that's a YAML code base.
You put it under version control, you use something like GitHub to manage it,
and then you just develop, add more definitions to it.
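As a rough sketch of what such a definition can look like, here is a minimal YAML data model with a couple of measures and dimensions; the orders table and all the field names are hypothetical examples rather than anything from the episode.

```yaml
# Minimal sketch of a Cube-style YAML data model; names are illustrative only.
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        type: count
      - name: total_amount
        sql: amount
        type: sum

    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
```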
I think the way we think about developing the data model is more like software.
So if you want to add a new measure or a new dimension, you would go into a feature branch, create the new measure or dimension, put it on a staging environment, test things. If it makes sense, then you can merge it, and you can have a pull request review process and all of that. But at the end of the day, it's just a code base. Okay, that's an interesting idea. It
just makes me wonder, you said that the majority of your users work with YAML files. So it makes
me wonder: what are the profiles of the majority of your users? Because in my experience, it would be more the engineering type of person that would be comfortable doing that, not so much the analysts. And in my mind, at least, analysts would be the number one audience for these BI tools, right?
Yeah, great question. I would say most of our users who are working with the data model in Cube directly are data engineers.
The way I think about it is that
if you have a data model semantic layer
in your organization, right,
that role is essentially sort of a data stewardship, right?
It's someone who is building data products, right?
Like some measures, dimensions, data models,
something that can be used downstream
to build visualizations.
And people who are usually building visualizations
in tools like Tableau or some other BI tools,
it's more like a data analyst, right?
Like people who consume data, data consumers.
So our primary users, I would say,
would be more for data stewards, data engineers, people who are building this model.
And they're usually more familiar with code-based tools.
Especially in recent years, I feel there is a big movement in the data world to apply software engineering best practices to data management and putting more things into code.
Now we see Airflow, which is very code-first driven, it's Python code.
Some other tools like Dagster and Prefect, and transformation tools like SQLMesh, are all code-first as well.
So it's really a big movement toward code-first workflows in the data stack.
Yes, you're right about that. And in many ways,
I think there's this idea, this principle, let's say, of applying the same principles that have
worked well for software engineering also in data and data engineering. It's probably a good idea.
The hard part really, in my experience at least, is actually getting non-technical users to do that.
So, you know, people, first of all,
most of the time they have a bit of issue,
let's say a bit of trouble getting their heads around,
you know, the key concepts like pull requests
and pushing code and Git and all of those things.
But even if they do, then actually doing that, I don't know,
firing up an IDE or a command line and just doing those things,
I wouldn't really expect business users to be able to do that.
So it sounds like the type of user that is the key audience for your tool is actually sitting in the middle, in a way: they're sort of taking the requirements from the business users and trying to translate them into code so that they can work it out.
That's correct. I think that's where we are right now. Cube's whole workflow is very engineering and technically driven.
I think that's the correct foundation for building infrastructure.
But I think you brought up an interesting concern: how do we make it more friendly or accessible for non-technical people to be able to make changes to the data model? I think historically a very big problem in data governance, and in data modeling in general, is that it's relatively slow moving, in the sense that if you need to make a change to your data model, a measure or a dimension, you need to go through multiple cycles before you can see a result. And if the workflow is technical, then a non-technical person cannot just do that.
They would need to go and submit a Jira ticket to the data engineering team and just kind
of wait for the next sprint so that the ticket would go in and get finished. I think the opportunity we have now is that the new large language models are really good at generating code. We know that. And there are specifically some models that are really built for code generation. GitHub Copilot works fine and a lot of users use it. So it's a living example of
how LLMs can generate code that is correct and helpful. So I believe, in the future, if you think about the data model as just being code, why can't LLMs generate that code? If we don't have a measure here, why can't an LLM just generate it? It could develop and fix the data model, or do something about it. And if we take these AI agents that can make changes to the data model and pair them with domain experts who may not be technical, they may be from finance, from marketing, they're just data consumers, but they're domain experts.
And we pair them with these technical AI agents that can actually make changes to the code base.
Then the magic can happen.
Then we can iterate on the data model quite fast.
But at the same time, we keep the right foundation in place.
It's still a code base, meaning that we can apply all the software engineering best practices to it. We can have a pull request review system from a central governance data team.
We can do static analysis, impact analysis, CI/CD, all of that.
So I think that would be an interesting opportunity in future for us,
how to leverage LLMs to move faster with data models.
Interestingly, so initially, as you also pointed out, you had this chronic, let's say, situation in which even the slightest change had to go through long, long cycles of, you know, approval and then implementation and all of that stuff. And the whole idea behind
specifically self-service BI tools was to do away with that and kind of make things simpler and
faster, having in mind precisely those business users and being able to serve them, in that they wouldn't be dependent on the technical people to implement, you know, even the simplest of changes.
And now by adding this universal semantic layer, it sounds like this has sort of come in through the back door again.
So we're back in a situation where self-service BI won't cut it.
Yeah, it's always about the balance.
So on one side, you have a well-governed data,
trusted metrics.
On the other side, you have a flexibility and velocity
that you want to give to your data consumers.
So the question is,
how you can give them velocity and flexibility,
but at the same time still keep the correct governance.
Because what could go wrong is that if you give a lot of freedom, people can start building wrong metrics, and these metrics are going to spread across the organization. And the person who built them may not be a data person at all. They built them, they're wrong, that person quits the job, and then the metrics still remain there.
No one knows who built it, how it's working
but people have built layers on top of that
and now a lot of things are broken.
So that's a problem.
But the other problem is that if you put everything under governance,
then you start to move very slowly, right?
People cannot just do ad hoc analysis.
So I think what we need to do is
we need to give a good layer of data governance
that we can maintain,
the data team can maintain,
that's going to be like a gold layer
or platinum layer,
there are a lot of references to metals in the data industry.
But we can keep that governance layer in place.
But what we also need to support is ability to create ad hoc calculations
on top of governed metrics inside these BI tools.
So the analyst, the data consumer, can go into a BI tool, Tableau for example, load the data model from Cube into Tableau and just start doing some analysis.
And then they see that a metric is missing.
And then they should be able to quickly build maybe an LOD calculation in Tableau to test something.
And that should be supported on top of these basic metrics.
Once it's built, then the question is: okay, if it makes sense,
we would need to bring it back into the governance layer.
And that's where the workflow could happen.
It could be through the data team in future.
It could be through AI agent to make it faster.
But we still need to give a way for people to quickly build calculations
on the fly, on the edge, to test things because it's not possible to cover everything inside the
data model, inside the governed data model. Yeah, totally agree. I was also going to
add to what you said that, yes, you're right.
You do need to have some sort of process through which you govern your metrics,
because otherwise you're going to end up with chaos.
But I see that as kind of orthogonal to a universal semantic layer, actually.
Another question that I had, actually, was when you mentioned previously that the main way through which you define metrics is actually pretty simple, so YAML files. And I was wondering, because personally I come from a background of, well, more elaborate, let's say, data modeling, so knowledge graphs and ontologies and kind of formal semantics and all of that stuff. And to me, when people talk about semantic layers, it always kind of raises the question: okay, is anyone using that kind of technology in an actual semantic layer?
Have you considered that?
Do you think it would make sense
to introduce something like that?
Or if not, why not?
Right.
Yeah, I think naming, the semantics of semantics, has always been a challenge, even in our industry. When we first started to think about building what we now call a universal semantic layer, we had different names for that type of technology. Some people were calling it a metrics layer. Some people were calling it headless BI. And then eventually, as an industry, we all arrived at a term: semantic models, semantic layer. But at the same time, you have knowledge graphs and ontologies and all of that. So I would give you a definition and
try to scope what we mean by semantic layer at Cube, and what other technologies in the data space, specifically in analytics, mean by semantic layer. By others I mostly mean the likes of AtScale, dbt, and the Power BI semantic model.
So usually it's all about building measures, dimensions,
and a relationship between them.
So you would have some entities that hold your measures, dimensions,
and these entities are tables essentially, right?
It could be materialized tables in your warehouse.
It could be virtual tables.
These tables will hold measures and dimensions, and you will have relationships between tables that sort of create your data graph. And then you might have some slices of that data graph. We call them views at Cube. Looker calls them Explores. Power BI calls them perspectives. But essentially, they're just slices of this data graph that you present to the end users.
So that's how we define the semantic layer right now.
It's all for analytics purposes.
So it's not a general purpose
knowledge graph
from that perspective,
but it's enough to cover
analytics use cases.
Let's fast forward a little bit
because you actually already mentioned
something that hinted to some new features
that you're about to release.
And I had the chance to read a preview of those features
and it seems like at least a big part of it
is centered around using LLMs to do what, precisely? Is it, again, a sort of natural language interface for people to define their metrics, or a bit more than that?
Right. Yeah, I already mentioned one potential use of LLMs with the semantic layer and the data modeling, and how an LLM can generate a data model. So essentially, if you think about that use case,
eventually, I believe that LLMs will build and maintain semantic models over time.
That might not happen tomorrow, but it will happen over the course of the next few years.
That's one use case.
The other use case,
and that's what we're releasing this week,
is how we can leverage semantic layer context
and the knowledge to create better AI assistants
that can access data in our data warehouses.
So if we think about the problem from an engineering standpoint, it's a text-to-SQL problem, right?
So imagine you need to build an AI agent
that should be able to give you the correct answer
based on your data in your data warehouse.
Maybe it needs to run some calculation,
calculate the revenue per salesperson and compare that to the average revenue across all of them,
all the account executives. And to do that, an AI agent needs to generate a SQL code.
It's not a problem to generate SQL code as such, to make it just ANSI SQL, because there are so many examples of ANSI SQL on the internet.
The problem is that your AI agent
doesn't know about your metrics.
It doesn't know about your data model.
If you just give it a table or a set of tables, and give the DDL of these tables as context, it's not going to really learn a lot.
It needs to get more prepared data on one hand.
So the metrics already defined,
but it also needs to get more context about it.
What is that metric?
Why is it here?
How is it connected to that other metric?
How is that revenue connected to other metrics? How is your revenue connected to account executives? What is an account executive? How does it sit in the organization? So all of this knowledge, essentially a knowledge graph, all these relationships, it needs to have that context.
That context can be represented as just text at the end of the day, describing everything that you have in your data model. So you can take that text and give it to the LLM; now your LLM has all this context, and it can generate a correct SQL query. You also don't want that LLM to generate a query directly against your data warehouse, because the query is still going to be big.
It might involve multiple joins, all this calculation.
So there is a lot of room to hallucinate and make a mistake.
And it would be hard to debug that.
You wanted that to make a query to your semantic layer.
So the query is going to be very simple.
It's going to be just kind of,
give me this metric with that dimension
and apply that filter.
Once you send that to your semantic layer, the semantic layer can apply the data model it has inside and then generate the real, big SQL in a more deterministic way, right? And then execute that SQL and send the data all the way back to the AI agent.
So that's what you're releasing
as a sort of first step toward that vision,
which is releasing an API endpoint
where you would be able to send your natural language query
saying like, hey, I want to look at my average,
compare my average, you know, like sales
with average sales per specific reps in that region.
And then the LLM can generate all these queries,
run them, execute them, and give you results back.
And then you can use it if you're building a chatbot,
you can use it directly,
or if you're building maybe like a more complicated AI agent,
you can use it as part of your chain, right?
So you can get that result based on this information,
apply some
other action and kind of keep it going. It's going to be just an API endpoint that can plug
into any chain, into any RAG architecture.
Interesting. So, in the process that you mentioned, one crucial part is the context, which you need to bring in so that the LLM can have the right background information to actually translate that textual input into something more structured to pass on to the semantic layer.
How do you actually feed that context to the LLM currently?
It depends.
If the context, if the data model, is not huge, you can just take it and put it into the prompt as is. If it's a larger context, you probably want to turn it into embeddings beforehand and then apply some RAG architecture, where you do a vector search for the relevant context based on the query you're getting, and then apply only a subset of that context. So there are some optimizations you can do. But from an architectural perspective, once we have the context, once we're able to do all this metadata indexing and understand what relevant information we have, we can turn it into embeddings. And then from there, we're just good to go, which is applying best practices for building RAG architectures.
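As a very rough sketch of the flow described here, and not Cube's actual implementation, the context handling could look something like the Python below; embed, vector_store, llm_complete, and run_semantic_layer_query are hypothetical placeholders for whatever embedding model, vector database, LLM client, and semantic layer API are actually used.

```python
# Hedged sketch of RAG over a semantic layer's data model, with placeholder
# dependencies injected as arguments; none of these names come from Cube itself.

def build_context(data_model_docs, question, vector_store, embed, top_k=5):
    """Pick the most relevant slices of the data model for this question."""
    question_vector = embed(question)
    # Vector search over pre-embedded descriptions of cubes, measures, dimensions.
    hits = vector_store.search(question_vector, limit=top_k)
    return "\n".join(data_model_docs[hit.id] for hit in hits)


def answer(question, data_model_docs, vector_store, embed,
           llm_complete, run_semantic_layer_query):
    """Turn a natural language question into a simple semantic-layer query."""
    context = build_context(data_model_docs, question, vector_store, embed)
    prompt = (
        "You can query a semantic layer. Available measures and dimensions:\n"
        f"{context}\n\n"
        "Return a JSON query with 'measures', 'dimensions' and 'filters' "
        f"that answers: {question}"
    )
    semantic_query = llm_complete(prompt)  # small, simple query, easy to check
    # The semantic layer expands this into the real SQL deterministically.
    return run_semantic_layer_query(semantic_query)
```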
Okay, so it sounds like you're using a kind of mixed approach.
Let's say sometimes you use the context window.
Sometimes you use RAG, basically.
I have a tip here for you, just to tie this back into the previous conversation on knowledge graphs and so on. Some people from the knowledge graph community did an interesting experiment and tried to compare and contrast these two different approaches, specifically for RAG. In one instance, they did precisely what you are trying to do here, so basically feed context to LLMs in order to access a back-end database. In one set of experiments they used the schema information from the database as is for the LLM, and in the other set of experiments they used the same information, but morphed into a knowledge graph. They compared the results of the two sets of experiments, and they found that by using knowledge graph-based RAG, they got much better results.
Yeah, I think I saw that paper you're referring to.
And I remember folks in our community, what they did is they took that experiment and ran it, but instead of knowledge graphs they used the Cube semantic layer, and I think they got even better results. Many things are already sort of pre-calculated in a Cube semantic layer. So if you think about it, going from a DDL to a knowledge graph is one improvement, and then going from a knowledge graph to a semantic layer is another improvement.
Interesting. I didn't know about that last bit. But yeah, I guess it just goes to show that
there's lots of... It's in the early stages, so there's lots of room for improvement, I guess.
Yeah, for sure.
Okay, and the other new feature that I saw you releasing has to do with chart prototyping.
So what's that exactly?
Yeah, so Cube is commonly used to build different data applications.
We were talking a lot about using BI tools and LLMs, right? But sometimes you just need to build your internal data app, maybe in React or Python or some other visualization tool, and you just want to present it internally, show it to customers. So we are releasing a way to quickly prototype different types of views and visualizations on top of Cube. We are still a headless technology, so we're not providing you with a visualization layer, but we give you a way to quickly iterate on different visualization options, just to bring it to production sooner. That's been a very common request from many customers: how they can quickly prototype charts on top of Cube APIs so they can build different apps. One kind of closing,
I guess, question, because we're almost out of time: you did mention previously
that the core technology, at least, is open source. And now we are just talking about new features and so on. So I wonder which parts
are the core open source ones and which are the value-add features. And so if people want to get
started with Cube, where do they start and what can they start experimenting with?
Yeah, we have Cube Cloud; think about it as our commercial cloud product. It's just a more feature-rich platform.
All the different AI capabilities are built into the cloud platform.
Things like chart prototyping are built into the cloud platform.
There are more integrations with BI tools built into the cloud platform,
integration with Excel built into the cloud platform, all of that.
But the core data modeling experience, you can still create your data model in open source, right?
You can write these YAML files in open source.
You can compile them.
You can run them.
So there is like an open source core.
The easiest way to start is just to sign up at cube.dev.
That's going to be the cloud product.
We have a free tier, a very generous free tier, in that you can develop, you can build a staging environment and even a small production on it.
And that would probably be the best way to start with Cube.
Well, thanks for the conversation.
It was a good, interesting one.
I learned a few things,
which is always what I like to walk away
from conversations with.
And I hope it was a good one for you as well.
And I guess we can wrap up right about here
unless there's anything that you feel
we didn't cover and we should.
No, that was a great conversation.
Thank you.
I really enjoyed it.
Thanks for sticking around.
For more stories like this,
check the link in bio
and follow Linked Data Orchestration.