Drill to Detail - Drill to Detail Ep.112 ‘From Delphi to Cube’s New Semantic Model AI Features’ with Special Guest David Jayatillake

Episode Date: September 12, 2024

Mark Rittman is joined by returning guest David Jayatillake, VP of AI at Cube.dev, to talk about Delphi Labs' journey from a standalone data analytics chatbot to now becoming the basis of Cube's new AI features within its composable semantic model product.

- Drill to Detail Ep.102 'LLMs, Semantic Models and Bringing AI to the Modern Data Stack' with Special Guest David Jayatillake
- Drill to Detail Ep.107 'Cube, Headless BI and the AI Semantic Layer' with Special Guest Artyom Keydunov
- Introducing the AI API and Chart Prototyping in Cube Cloud
- A Practical Guide to Getting Started with Cube's AI API
- Cube Rollup London: Bringing Cube Users Together

Transcript
Starting point is 00:00:00 So welcome back to a new series of the Drill to Detail podcast, sponsored by Rittman Analytics, and I'm your host, Mark Rittman. So in this first episode of the new series I'm very pleased to be joined by returning guest and fellow Brit, David Jayatillake. So welcome back to the show, David. Thanks for having me, Mark. It's great to be here. So David, well, we had you on the show 18 months ago, and you were talking about the startup you'd just founded, Delphi Labs. Give us a bit of history there and tell us a bit about what you're doing now.
Starting point is 00:00:45 Yeah, so my co-founder, Michael, and I started Delphi Labs in March of 2023, which I guess in hindsight isn't that long ago. It feels like a lot has happened. But we set out to provide an AI interface on top of semantic layers. And this was, at the time, a pretty contrarian view, because everyone at the time was thinking that AI would learn how to write SQL perfectly, and that would be the way that we enabled self-service access to data.
Starting point is 00:01:20 And the reason, you know, and this has been proven, the reason they want to do that is because it's easy and it doesn't require a lot of engineering, whereas using a semantic layer does require engineering. But, you know, as you'd expect, doing things properly pays dividends. And so that was our philosophy, and we achieved a level of product success where we know, on benchmarks, we achieved the highest accuracy in being able to answer questions using our method versus text-to-SQL, which was very inaccurate. But some of the things that we encountered were really to do with go-to-market, and that made things difficult. So obviously, at the start, we kind of assumed that someone would have a semantic layer. And
Starting point is 00:02:10 often that might have been LookML. Now, the problem with LookML is usually LookML sprawls over time. And so some of the companies we helped had six-year-old Looker instances, which were huge. They had many duplicates and poor naming conventions, no documentation. And on those types of semantic layers, you know, if a human can't go into it and understand what it's about, an LLM won't be able to either. So it won't be able to pick the right metric or the right dimension for a question if it's very, very difficult to understand. And what that led us to was realizing that we needed to build the semantic layer for people. So we started building people Cube semantic layers and embedding Cube into Delphi. And when we did that, we found we had great results.
Starting point is 00:03:06 Actually, people were able to get very accurate answers, much like with the benchmark. And then we realized that we'd ended up being more or less a BI tool. We needed to build all the charting and all the other features that a BI tool had. And really, when we raised money last summer, we raised money to build a much more lightweight tool than a BI tool. Building a BI tool is extremely capital intensive. And at that point, we thought, you know, Cube is our best partner. Cube is the semantic layer we chose to embed into our product. Maybe they would be interested in us joining them, rather than us going and building a BI tool. Because, to be honest,
Starting point is 00:03:45 I thought we could do it, but the risks were very high that we would fail. And I thought, as a responsible, you know, custodian of investors' money, maybe they'd be more likely to get a return if we joined Cube. And Cube were interested in us joining. The rest is more or less how we joined. And so since we've joined Cube, what we've been doing is very similar to what we were doing at Delphi. My role at Cube is VP of AI, and roughly that is a product management role that's focused on delivering AI features. And the first AI features we delivered were very much aligned with what we had at Delphi, which was allowing that natural language interface to data. So we released the Cube AI API, which allows people to build their own chatbots on top of their Cube semantic layer. And then we've also got the AI Assistant, which
Starting point is 00:04:42 allows people inside Cube Cloud to ask those kinds of natural language data questions. And we'll keep iterating and building more as we go along. Okay. Okay. So that's a good little bit of history, I suppose, of what's happened in the last 18 months. But let's kind of just take a step back, really. So for anybody that doesn't know you and doesn't know your history in the analytics community, the market and so on, give us a little bit of a biography of what you did prior to starting Delphi, and why you're interested in this area. Yeah, so I'd worked in data before entering data startups for about 12 years.
Starting point is 00:05:22 So I started out at a company called Ocado. I always ask people whether they've heard of Ocado, because when I started out there, not everyone had. But now most people know who Ocado are. They're an online grocer. And that's where my data journey started. That's where I learned my SQL and played around with something like a very early version of Tableau for the first time. And, you know, I primarily used Excel as a BI tool, and learned all my Excel stuff from a bunch of consultants who worked for Ocado at the time. They were, you know, very good at Excel. But from there I moved into the payments space, and that's, I think, where we overlapped a bit, at WorldPay. And so, you know, with our good colleague Chris Tabb, who's also
Starting point is 00:06:05 now pretty big in the data world. That's where I spent a lot of my time in my career. I started out there as what was called a business analyst, but actually it was more like what you'd call an analytics engineer today. I spent most of my time building data sets, and building stored procedures to construct those data sets and refresh those data sets. And then I did analytics on top of that as well. And so that's kind of what's led me to semantic layers, in some ways: those early experiences where I was acting as a semantic layer, where people would come up to me and ask me a question, and then I'd translate that question,
Starting point is 00:06:45 using my knowledge of the data model and how to use it, into a SQL query. And that's very much how something like Cube works with AI, right? Someone asks a question and then it constructs a query, except it's without a human and it's codified. And so I then spent a lot of time at WorldPay doing different roles, mostly what you'd call data or analytics engineering type roles, leading teams, but I led a commercial analytics team as well there, which was really, really enjoyable. Then I moved back into more generic data roles. I led a BI and analytics team at a company called Elevate Credit, which was a short-term lender. Then I moved to Lyst.
Starting point is 00:07:29 So some of you may have heard of the fashion marketplace Lyst. And I led a data team there, finally leading a mixed analytics engineering, analytics, and data science team of 25, before moving into startups. But what made you, or what gave you the, I wouldn't say inspiration, but what was the trigger to actually start your own startup? And I'm interested to understand how that kind of went, in terms of how much it turned out to be how you expected, and what it was like trying to get that product-market fit and so on. What was that experience like, of actually founding your own sort of product startup? So, I mean, it's interesting to think that, you know, Delphi wasn't my first startup. When I left Lyst, I ended up working at Avora, where I was trying to spin a new startup out of an existing company.
Starting point is 00:08:29 And I had an experienced co-founder in Ricky Thomas with me. He's built a number of companies, and is now the CTO and co-founder of Streamkap. And so that was my first experience, really, of trying to run a startup and trying to raise money and all that kind of stuff. And the great thing about that was, you know, if I hadn't had Ricky with me, I wouldn't have done it, because I would have been too afraid of all of the things I didn't know. Because Ricky knew those things and kind of guided me through it, I always felt okay to do it. Then the second time round, what happened was I wrote a post. I write a Substack, David SJ's Substack. And I wrote a post on there about a text-to-SQL tool called AskEdith. And in the comments, my co-founder, who wasn't my co-founder at the time, Michael, wrote, hey, I've built this prototype of a tool
Starting point is 00:09:27 called Delphi, and instead of doing text-to-SQL, it does text-to-semantic-layer. At the time he'd got it working with dbt; subsequently we got it working with Lightdash, because that's what I had available to me at the time. And that's how we ended up kind of hacking together on it. And because I'd been a founder before and I'd built this network of VCs, I'd built the skill set up, but hadn't been successful in building the spin-out from Avora that we had wanted to. I could see this big potential in Delphi, but I didn't know what Michael wanted.
Starting point is 00:10:05 And I ended up asking, what do you want to do with this? Do you want this to be a business? Is this an open source project? Is this just a hobby? And Michael said, I do want to make this a business, but I don't really know how to do that. And funnily, weirdly enough, that's the thing that I had learned just before, which is how to raise money and start a company around something like this. And so I offered, not thinking that he would
Starting point is 00:10:30 accept, because I thought, you know, Michael has worked for some pretty big companies in America, he must have someone that he might want to found the company with. And actually he didn't, and he was really interested in founding the company with me. And that's how that happened. So you said actually you were embedding Cube inside your product at one point as well, because I think the time when we were kind of working with you a little bit was when you would connect Delphi to, say, sort of Looker or to various models. So did it evolve on from that, then, to actually embedding a semantic model in the product? Yeah, so from those experiences where we didn't get the results that we wanted, that we knew we could get. And the thing that really inspired embedding Cube and building semantic models for people was actually this benchmark
Starting point is 00:11:14 that data.world released, which dbt then went and recreated using their semantic layer and an LLM. Michael and I reproduced it, and we achieved the best of the lot. We achieved 100% accuracy on those answers. And that made me realize it was the quality of the semantic layer that was really the blocker to us actually achieving the accuracy we wanted. And so then we thought, well, maybe we need to build this for people. Because actually, firstly, that's going to give us the accuracy and the wow factor when we demo this for people and we do POCs for people.
Starting point is 00:11:50 But secondly, most companies who use data don't have semantic layers. And that enables us to reach out to a lot more people, basically anyone, rather than just people who had Looker or dbt or Cube already. So let's move on to Cube now, actually, and we'll come back to the features that you've been adding into Cube for AI, based on that learning experience with Delphi, as we go on. But for anybody that doesn't know Cube at the moment, what is Cube the product? Where does it fit into the data stack, and what
Starting point is 00:12:21 problem is it designed to solve, in general, not just around AI? So Cube is a universal semantic layer. And I guess, firstly, a good thing to do would be to explain what a semantic layer is. A semantic layer provides an abstraction on top of data that allows someone to request objects. So this could be a customer, or revenue, or attributes about those things, from the semantic layer, without having to understand the structure of the data at all. And then the semantic layer will compile that request into raw SQL that can be executed against a data warehouse, and then run that query and return the results. And therefore, the semantic layer also stores the definitions of
Starting point is 00:13:17 what these entities are, what their metrics are, what their dimensions are, how to join data together, and all of those things that enable it to deterministically compile a request like that. And so a universal semantic layer like Cube is designed to work with many, many different data warehouses. So we integrate with a great deal of them, and with multiple of them at the same time where needed. But also, and this is the main point, with multiple consumption points. So some of Cube's customers today have embedded analytics needs. They build their own applications, in JavaScript or whatever it might be, and then they use Cube's APIs to expose their data, and they might be selling that product to their customers. Then they also have internal analytics use cases, where they might be using a BI tool like Tableau
Starting point is 00:14:10 or something like that. And they might have multiple BI tools. So they might have Tableau, Superset, and something else. And they can all feed off the same semantic layer, same definitions, and then have that consistency and governance centrally. There's obviously a few players in the semantic model market, really. You've got, obviously, Cube, you've got AtScale, you've got products like LookML, for example, with Looker. And you've also got, I suppose, BI tools that embed their own semantic layers, like, say, Omni, for example, or things like Power BI.
Starting point is 00:14:42 How would you say Cube's approach is differentiated from the others? And why do you think Cube took the approach it did around, say, composability, independence, and that sort of thing? So I think with the BI tools, in some ways the BI tools have an advantage when they embed the semantic layer in their tool: they can develop their BI features alongside the semantic features, and then it's a much more streamlined process. And that means it works really well for BI out of the box. The problem is that it doesn't particularly work well for anything else. So if you want to build your own embedded analytics solution,
Starting point is 00:15:24 it's not a great experience. And if you ever want to move from that BI tool, it's really, really problematic. And that's one thing I always advise people who are looking at something like Looker or Omni or ThoughtSpot, where they kind of store the semantic definitions within the BI tool: you're kind of stuck with them for a really long time, because coming off them, when you've put 10,000 lines of LookML or whatever into the product, and it's stored there, they own it. You think you might own it, but they kind of own it. And because of that, you can't leave. And so that's, I think, a key differentiator between something like Cube and those tools. When you think about why Cube is different to the other, I guess, headless semantic layers,
Starting point is 00:16:11 AtScale has been around for quite a long time, and what they've really specialized in is the Microsoft stack. So AtScale looks like SSAS, speaking MDX, and it's really intended for use with Power BI or Excel. They don't have great APIs for builders to build their own data products with. That is where Cube is completely dominant: where people are using either open-source Cube or Cube Cloud
Starting point is 00:16:43 to build their own data products. It's the number one embedded semantic layer, for sure. And then I guess the other thing is, Cube wants to go far beyond just one set of BI tools. We are gradually supporting Microsoft now: we support Excel right now with our MDX API, we will launch support for Power BI very, very soon, and we also support Tableau, Superset, and many more BI tools. And so we want to support a wider range of BI tools.
Starting point is 00:17:19 And I think that's probably a differentiator, whereas AtScale is very, very focused on enterprise, very, very focused on Microsoft. Okay, fantastic. So let's dig into a bit more of the story about you moving into Cube, and I suppose the transition from Delphi to the features you're building at the moment. So go back in time, go back a little bit.
Starting point is 00:17:38 You said that the challenge you were finding was that not every customer had a great semantic layer, and I suppose a lot of them didn't have one at all. But why did you choose to embed Cube, given that my understanding was that it wasn't so much the type of semantic layer, it was how well defined it was, and how unambiguous the semantic layers were? So, I mean, maybe just elaborate a little bit on the semantic layers from the customers you had to try and work with not being up to the job, really, and why, again, that led to you sort of joining Cube. So there is an element about
Starting point is 00:18:15 quality as well. When choosing something to embed: firstly, Cube has a fantastic open source offering, which we could just use. And actually, there wasn't really anything else out there that has something like that. dbt's semantic layer is not open source, and neither is AtScale's. And so
Starting point is 00:18:37 that was a big deal for us: we could just embed it if we wanted to. Secondly, Cube's APIs are the best. That's what we found, you know, objectively comparing the different systems at Delphi: Cube's APIs are the nicest to use with an LLM. For example, dbt's has places
Starting point is 00:18:59 where you need to inject SQL into a GraphQL query, which is unfortunate and makes it difficult for the LLM to get the answer correct. And Cube's semantic layer is more capable, in that it can do multi-hop joins out of the box. You can define how those should work in a given scenario using Cube's views, which are similar to Looker Explores. That's not possible in dbt. So there are a number of reasons why we chose Cube, and, yeah, Cube being
Starting point is 00:19:33 the open standard is a really strong reason for that. Okay, okay. So tell us a bit about what you've been doing at Cube since you joined. You mentioned the AI Assistant and you mentioned the API. You know, in preparation for the session on Monday, but also just out of my own interest, I've been playing around with the API, and I'm really impressed with it, actually. But tell us what the AI API is, and, I suppose, the thinking behind it, to start with, first of all.
Starting point is 00:20:16 Yeah, so I guess this goes back to Cube's absolute strength, which is in the embedded analytics market. And so many of our great customers build their own products on top of Cube. And this was some of the stuff that we knew before we joined Cube, and we'd spoken to some of these customers, actually. And so we knew that one of the first things that we wanted to do on joining Cube was to make a way for those customers
Starting point is 00:20:35 to use AI with Cube without having to, like, roll their own application with an LLM and Cube. Because we knew that they were all going to have to build their own retrieval-augmented generation method. They were all going to have to build so much stuff that, if we build it once, and do it once well for everyone, it would make a lot more sense.
Starting point is 00:20:59 So that was the thinking behind the AI API. It means that they can make their own chatbots, and so you can do conversational with the AI API. You can get it to output a suggested chart configuration: that was something someone asked for, which we released recently. You can get it to run a query, or just to generate the Cube queries to run elsewhere, if the Cube queries can take a really long time to run.
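The request and response flow described here can be sketched in a few lines of Python. To be clear, the field names and response shape below are assumptions for illustration, not Cube's documented contract; check the AI API documentation for the real one.

```python
# Hedged sketch of an AI-API-style exchange: you send the user's question as
# plain text, optionally asking the service to run the generated query. The
# "prompt"/"runQuery"/"summary"/"query"/"results" names are illustrative
# assumptions, not Cube's actual field names.
import json

def build_request(question: str, run_query: bool = True) -> str:
    """The request is basically just the user's text plus a run-query flag."""
    return json.dumps({"prompt": question, "runQuery": run_query})

def handle_response(body: str):
    """Pull out the pieces described in the episode: a natural-language
    summary, the generated semantic-layer query, and (if run) the results."""
    resp = json.loads(body)
    return resp["summary"], resp["query"], resp.get("results")

# Simulated round trip with a canned response, standing in for the service:
canned = json.dumps({
    "summary": "Sure - I'll show you revenue by order month.",
    "query": {"measures": ["orders.revenue"],
              "dimensions": ["orders.order_month"]},
    "results": [{"orders.order_month": "2024-01", "orders.revenue": 1200}],
})
summary, query, results = handle_response(canned)
print(query["measures"])  # the query names semantic-layer members, not SQL
```

The point of the design is visible in the last line: the LLM never emits SQL, only references to members the semantic layer already defines.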
Starting point is 00:21:27 So, yeah, and we've also released a chart prototyping app for the AI API. So people can just take a little TypeScript application which shows you how to build a chatbot application using the AI API. Okay, okay. So, I mean, the reason I thought this was interesting was that we recently did a PoC for a customer where we built a chatbot for them. And it was something simple in JavaScript.
Starting point is 00:21:55 And then it would call a REST API. And the REST API that we built was, you know, a Google Cloud Function, and behind it was some Python, which would take in the question. It would then use LangChain and the SQL agent to, you know, go through the steps a SQL agent does, to query the data dictionary and then sample the data and so on. And then it would return the results back to the front end, which would be, you know, the revenue amount or whatever, that sort of thing. So it was a Q&A chatbot to query your data. Now, looking at what you've built with that API, it looks like it would replace the need for that back end, or
Starting point is 00:22:34 certainly replace the need for a lot of the complexity of that back end. Well, tell me how it would compare to, say, building something yourself using LangChain, for example. Yeah, so it should really replace that. It's meant to be a turnkey solution, where someone who's building that application in JavaScript doesn't then need to go and build retrieval-augmented generation. They can just literally call this API and it works. Right, okay. And so where do you see that feature going? So, just to be clear on this, because Cube already has an API for querying it, just for normal sort of BI-type queries.
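For context on that "normal" query API: a regular Cube query is itself structured JSON naming measures and dimensions from the semantic layer, not SQL. The shape below follows Cube's REST API query format (sent to its `/cubejs-api/v1/load` endpoint), though the member names are invented for illustration:

```python
# A sketch of a plain (non-AI) Cube-style query for "revenue by month".
# Member names like orders.revenue are made-up examples; the overall shape
# (measures / timeDimensions with a granularity) follows Cube's query format.
import json

def month_revenue_query(time_dimension: str = "orders.created_at") -> dict:
    """Build the structured query that a question like 'number of sales by
    month' would resolve to, once names are matched against the model."""
    return {
        "measures": ["orders.revenue"],
        "timeDimensions": [
            {"dimension": time_dimension, "granularity": "month"}
        ],
    }

query = month_revenue_query()
print(json.dumps(query, indent=2))
```

This is the target the AI API generates into, which is why its output stays deterministic to compile: the hard part is only choosing the right members.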
Starting point is 00:23:11 How does this work? You send a prompt, a query, to the AI API, and then does it send back the results, or does it send back a definition of a Cube query to actually get the results? So it depends on how you use it. So there is a run-query parameter. So if you think that your query is pretty small,
Starting point is 00:23:27 it's not going to generate loads of data, and it'll run quite fast, you can set this run-query parameter to true. And so what it will do is, you'll send a very straightforward API request, which is literally the blob of text that the user put in as their question. And then it will return a little summary of what
Starting point is 00:23:46 it's chosen to do, using some terms from the semantic layer. So if you ask it, I want to know the number of sales by month, it will say something like, you know, sure, I'll tell you revenue, because that's what it's interpreted as the amount of sales, by order month or something, which is something it knows about from the semantic layer. And then, if you've chosen to run the query, it will also show you the Cube query that it's generated, and then it will run it and give you the results. If you choose not to run the query,
Starting point is 00:24:17 and there's a good reason, around timeouts and stuff, not to, then it will just send you the generated query in JSON, to run elsewhere. Okay. And you said you're working on a charting part of that as well, so it actually will display charts, maybe? Well, what some of our customers said was, that's great that it gives us how to pull the data, but how do we know exactly how to chart it? And, you know, that really depends on the data that comes back. So we built a capability for the LLM to actually recommend a chart type for the type of data. And usually it's choosing between line or bar, based on whether it's like a ratio-type metric or it's like a cumulative-type metric. Yeah, okay. And you mentioned LLMs there as well. So what
Starting point is 00:25:06 LLM does it use in the background for this? So we use the GPT-4 family of models. I think we're using GPT-4o at the moment, but we have implemented, for some customers, Claude 3.5 Sonnet, which works quite well as well. And this is some of the development that we will have for the AI API: we're obviously going to continue to experiment with different LLMs, open source, everything,
Starting point is 00:25:38 and figure out which is best to get that balance of speed and accuracy that people need. Okay. Okay. Now, I guess this is probably outside of your remit, maybe, but
Starting point is 00:25:50 have you been speaking to, say, Preset, for example, about using that API to give their BI tools that conversational sort of ability? Or, you mentioned that, you know,
Starting point is 00:26:00 application builders could use this. Could you maybe see open-source BI tool builders using this? We'd probably need greater alignment with one of those open-source tools. So I can imagine some of them who have more of a native integration with Cube would consider doing that. But I think those kinds of open-source tools haven't focused heavily on putting AI
Starting point is 00:26:28 inside the tool, because, you know, given the costs, the ongoing costs of using AI, having AI features in an open-source tool is somewhat contradictory, because you can't really offer them, because they're not free. So what about the AI Assistant? What's that in Cube now? Yeah, so the AI Assistant is very similar to the AI API, except that it actually has a visual interface inside Cube Cloud. So you can go to a tab in Cube Cloud and ask a question, and it will output an answer in a chart. The only difference between it and the AI API
Starting point is 00:27:02 is that the AI Assistant uses this other new feature called Cube Catalog. So Cube Catalog is a way to explore what's in your semantic layer, and what's upstream of it in your data warehouse. As well as what's in the Cube semantic layer, it can also say, oh, you've got a Tableau dashboard that looks similar to your question as well. And that's the key difference between it and the AI API. Interesting. So, I mean, I'm guessing this is just the first of the things you want to try and build with Cube in this area. Within the realms of what you're allowed to talk about, what do you see as being the next thing that you'll be doing in this area? What can we look forward to as kind of the next manifestation of your dream, really, in this area? So there are a number of things about enhancing both of those
Starting point is 00:27:59 features. So a typical problem, one we'd planned to solve at Delphi and we're planning to solve here as well, is how do you deal with things like where someone just refers to a value of a dimension, for example? Like, you know, let's say rather than saying where the marketing channel is paid, they just say, show me all paid sales, right? And you don't know what that is from the metadata in the semantic layer; it's actually real data that's inside a dimension or a column. And there's no magic to doing this. You kind of end up needing to store the values of a column, or vectorize the values of a column. And so one of the things we're planning to build really soon, hopefully before we GA these features,
Starting point is 00:28:49 is a way for people to elect to profile dimensions in this way, because a number of customers have asked us for this, and we knew this was a problem when we were at Delphi as well. And more generically, we want to use AI broadly in Cube to help whoever it is. So if it's the engineer, can we help them write code more easily, generate the Cube semantic layer
Starting point is 00:29:16 more easily, either at startup or in maintenance mode? Is there a way to schedule things that need to happen in Cube? For example, you know, Cube has some really powerful features around caching and pre-aggregation, but some of that requires configuration. Could we do that automatically for someone? So these aren't necessarily LLM-type AI things to do; maybe they're more traditional ML-type stuff. And, yeah, we want to see what we can do for people. Okay. And in your role looking after AI within Cube, what are you seeing as being good customer use cases and examples of this being used, really?
Starting point is 00:29:56 For us, the most acute need that we see is where someone is already selling a data product to their customers. They have paying customers, paying for data that they have, right? And they're using Cube to power their existing application. That's where they are desperate to get AI in, because their customers are saying, hey, can we have an AI interface to this, so I don't have to click around and figure out how this works?
Starting point is 00:30:20 I could just ask a question. So either their customers or their board are saying, we need this. And so that's where we see a lot of interest. And we've even seen some deals won on that basis, where someone is interested in Cube because of the AI API. What about, I suppose, internal BI use cases? Are you seeing it being used within companies to help summarize financial results,
Starting point is 00:30:43 or to have, you know, the classic thing about chatting with your data, or things like that? Are you seeing actual good examples of that in practice now with customers? Yeah, so we have a fair amount of interest. It's still early, but I think there's a fair amount of interest in both Cube Catalog and the AI Assistant from that kind of internal analytics point of view. We released those two features more recently than the AI API. Okay, okay. And just to say on that, you and I are both speaking at a Cube event on Monday, next Monday. Just to kind of plug that as well, what is that event on Monday,
Starting point is 00:31:15 David? Yeah, so Cube is having its first-ever conference this year. It's called Cube Rollup, and we're having an event in London at RSA House on Durham Street, in the Durham Street Auditorium, which is very, very central, near Charing Cross. And so the London event is the first leg of the conference, and we'll have a second leg in San Francisco on October the 15th. So what we want to do is to bring Cube users and customers together, celebrate a bit about how far Cube has come from the original Cube.js project,
Starting point is 00:31:56 which started as far back as 2018, and which has now grown into this really great community. It was really interesting, actually. Igor, who looks after our community a lot, but is also head of product at Cube, showed that we have more GitHub stars than dbt, which is amazing, because of how well-loved Cube is by engineers around the world.
Starting point is 00:32:21 And so we'll celebrate things like that, but also show people what we've released recently, show a bit about what we're going to release really soon, and also unveil some new product features. And then also just bring some other people on board, like, who are great partners of ours, or who've built stuff with Cube. So, obviously, you're speaking as someone who's used Cube in your engagements as a consultant; we've got someone else who's built a product around Cube; and then we've also got a great customer story, in Permutive, as well. So, yeah, we're trying to
Starting point is 00:33:05 cover kind of all the facets of Cube. That's really good. Thanks very much for coming on the show. It's been great to have you back, and looking forward to seeing you in person again on Monday. Yeah, thanks for having me, and looking forward to Monday. Thank you.
