Drill to Detail - Drill to Detail Ep.112 ‘From Delphi to Cube’s New Semantic Model AI Features’ with Special Guest David Jayatillake
Episode Date: September 12, 2024
Mark Rittman is joined by returning guest David Jayatillake, VP of AI at Cube.dev, to talk about Delphi Labs' journey from a standalone data analytics chatbot to now becoming the basis of Cube's new AI features within its composable semantic model product.
Drill to Detail Ep.102 'LLMs, Semantic Models and Bringing AI to the Modern Data Stack' with Special Guest David Jayatillake
Drill to Detail Ep.107 'Cube, Headless BI and the AI Semantic Layer' with Special Guest Artyom Keydunov
Introducing the AI API and Chart Prototyping in Cube Cloud
A Practical Guide to Getting Started with Cube's AI API
Cube Rollup London: Bringing Cube Users Together
Transcript
So welcome back to a new series of the Drill to Detail podcast, sponsored by Rittman Analytics,
and I'm your host Mark Rittman.
So in this first episode of the new series I'm very pleased to be joined by returning guest
and fellow Brit, David Jayatillake. So welcome back to the show, David.
Thanks for having me, Mark. It's great to be here.
So David, well, we had you on the show 18 months ago, and you were talking about the
startup you'd just founded, Delphi Labs. Give us a bit of history there and tell us a bit
about what you're doing now.
Yeah, so my co-founder, Michael,
and I started Delphi Labs in March of 2023,
which I guess in hindsight isn't that long ago.
It feels like a lot has happened.
But we set out to provide an AI interface
on top of semantic layers.
And this was, at the time, a pretty contrarian view, because everyone was thinking that AI would learn how to write SQL perfectly,
and that would be the way that we enabled self-service access to data.
And the reason they want to do that, and this has been proven, is because it's easy and it doesn't require a lot of engineering.
Whereas using a semantic layer does require engineering.
But, you know, as you'd expect, doing things properly pays dividends.
And so that was our philosophy, and we achieved a level of product success where we know, on benchmarks,
we achieved the highest accuracy in being able to answer questions using our method versus
text-to-SQL, which was very inaccurate. But some of the things that we encountered
were really to do with go-to-market, and that made things difficult.
So obviously, at the start, we kind of assumed that someone would have a semantic layer. And
often that might have been LookML. Now, the problem with LookML is usually LookML sprawls
over time. And so some of the companies we helped had six-year-old Looker instances, which were huge.
They had many duplicates and poor naming conventions, no documentation.
And on those types of semantic layers, if a human can't go into it and understand what it's about, an LLM won't be able to either. So it won't be
able to pick the right metric or the right dimension for a question if it's very, very
difficult to understand. And what that led us to was realizing that we needed to build
the semantic layer for people. So we started building Cube semantic layers for people and embedding
Cube into Delphi. And when we did that, we found we had great results.
Actually, people were able to get very accurate answers, much like with the benchmark.
And then we realized that we'd ended up being more or less a BI tool;
we needed to build all the charting and all the other features that a BI tool had.
And really, when we raised money last summer, we raised money to build
a much more lightweight tool than a BI tool. Building a BI tool is extremely capital intensive.
And at that point, we thought, you know, Cube is our best partner. Cube is the semantic layer we
chose to embed into our product. Maybe they would be interested in us joining them rather than us
going and building a BI tool. Because, to be honest,
I thought we could do it, but the risks were very high that we would fail. And I thought, as a
responsible custodian of investors' money, maybe they'd be more likely to get a return
if we joined Cube. And Cube were interested in us joining, and the rest is more or less how we joined.
And so since we've joined Cube, what we've been doing is very similar to what we were doing at Delphi.
My role at Cube is VP of AI.
And roughly, that is a product management role that's focused on delivering AI features. And the first AI features we delivered were very much aligned with what we had at Delphi, which was allowing that natural
language interface to data. So we released the Cube AI API, which allows people to build their own
chatbots on top of their Cube semantic layer. And then we've also got the AI Assistant, which
allows people inside Cube Cloud to ask those kinds of natural language data questions.
And we'll keep iterating and building more as we go along.
Okay. So that's a good little bit of history, I suppose, of what's happened in the last 18 months.
But let's just take a step back, really.
So for anybody that doesn't know you and doesn't know your history in the analytics community and the market,
give us a little bit of a biography of what you did prior to starting Delphi
and why you're interested in this area.
Yeah, so I worked in data for about 12 years before entering data startups.
So I started out at a company called Ocado.
I always ask people whether they've heard of Ocado, because when I started out there, not everyone had. But now
most people know who Ocado are. They're an online grocer. And that's where my data journey started.
That's where I learned my SQL and played around with something like a very early version of Tableau
for the first time. And, you know, we primarily used Excel as a BI tool,
and I learned all my Excel stuff from a bunch of consultants who worked for Ocado at the time,
and they were very good at Excel. But from there I moved into the payments space, and
that's, I think, where we overlapped a bit at Worldpay, along with our good colleague Chris Tabb, who's also
now pretty big in the data world. That's where I spent a lot of my time in my career. I
started out there as what was called a business analyst, but actually it was more like what
you'd call an analytics engineer today. I spent most of my time building data sets and building
stored procedures to construct those data sets and refresh those data sets.
And then I did analytics on top of that as well.
And so that's kind of what's led me to semantic layers, in some ways: those early experiences where I was acting as a semantic layer,
where people would come up to me and ask me a question,
and then I'd translate that question,
using my knowledge of the data model and how to use it, into a SQL query. And that's very
much how something like Cube works with AI, right? Someone asks a question,
and then it constructs a query, except it's without a human, and it's codified. And so I then spent
a lot of time at Worldpay doing different roles, mostly what you'd call data or analytics engineering type roles, leading teams,
but led a commercial analytics team as well there, which was really, really enjoyable as well.
Then moved back into more generic data roles.
I led a BI and analytics team at a company called Elevate Credit, which was a short-term lender.
Then I moved to Lyst.
So some of you may have heard of the fashion marketplace Lyst.
And I led a data team there, finally leading a mixed analytics engineering, analytics, and data science team of 25 before moving into startups.
But what made you, or, I wouldn't say what gave you the inspiration, but what was the trigger
to actually start your own startup? And I'm interested to understand how that went,
in terms of how much it turned out to be how you expected, and what it was like trying to get that
product-market fit and so on. What was that experience like, of actually founding your own product
startup?
So, I mean, I think it's interesting to think that, you know, Delphi wasn't
my first startup. When I left Lyst, I ended up working at Avora, where I was trying to spin a new startup out of an existing company.
And I had an experienced co-founder in Ricky Thomas with me.
He's built a number of companies and is now CTO and co-founder of Streamkap.
And so that was really my first experience of trying to run a startup and trying to
raise money and all that kind of stuff. And the great thing about that was, you know, if I hadn't
had Ricky with me, I wouldn't have done it, because I would have been too afraid of all of the things
I didn't know. Because Ricky knew those things and kind of guided me through it, I always felt okay to do it. Then the second time round, what happened was I wrote a post. I write a Substack, davidsj.substack.com.
And I wrote a post on there about a text-to-SQL tool called AskEdith. And in the comments,
my co-founder, who wasn't my co-founder at the time, Michael, wrote: hey, I've built this prototype of a tool
called Delphi, and instead of doing text-to-SQL, it does text-to-semantic-layer. At the time
he had it working with dbt; subsequently we got it working with Lightdash, because that's what I had
available to me at the time. And that's how we ended up kind of hacking together on it.
And because I'd been a founder before and I'd built this network of VCs,
I'd built the skill set up,
but hadn't been successful in building the spin-out from Avora that we had wanted to.
I could see this big potential in Delphi,
but I didn't know what Michael wanted.
And I ended up asking, what do you want to do with this?
Do you want this to be a business?
Is this an open source project?
Is this just a hobby?
And Michael said, I do want to make this a business, but I don't really know how to do that.
And weirdly enough, that's the thing that I had learned just before:
how to raise money and start a company
around something like this. And so I offered, not thinking that he would
accept, because I thought, you know, Michael has worked for some pretty big companies
in America, he must have someone that he might want to found the company with. And actually, he didn't,
and he was really interested in founding the company with me. And that's how that happened.
So you said actually you were embedding Cube inside your
product at one point as well. Because I think the time when we were kind of working
with you a little bit was when you would connect Delphi to, say, Looker or to various
models. So did it evolve on from that, then, to actually embedding a semantic model in the
product?
Yeah. So it came from those experiences where we didn't get the results that we wanted, that we knew we could get. And the thing that really
inspired embedding Cube and building semantic models for people was actually this benchmark
that data.world released, and that dbt then went on to recreate using their semantic layer and an LLM.
Michael and I reproduced it, and we achieved the best result of the lot. We achieved
100% accuracy on those answers. And that made me realize that the quality of the semantic layer
was really the blocker to us actually achieving the accuracy we wanted. And so then
we thought, well, maybe we need to build this for people because actually, firstly, that's going to
give us the accuracy
and the wow factor when we demo this for people
and we do POCs for people.
But secondly, most companies who use data don't have semantic layers.
And that enables us to reach out to a lot more people,
basically anyone, rather than just people who had Looker or dbt
or Cube already.
So let's move on to Cube now,
actually, and we'll come back to the features that you've been adding into Cube for AI, based on
that learning experience with Delphi, as we go on. But for anybody that doesn't know Cube at
the moment, what is Cube the product? Where does it fit into the data stack, and what
problem is it designed to solve, in general, not just around AI?
So Cube is a universal semantic layer, and I guess firstly a good thing would be
to explain what a semantic layer is. A semantic layer provides an abstraction on top of data
that allows someone to request objects.
So this could be a customer or revenue or attributes about those things
from the semantic layer without having to understand the structure of the data at all.
And then the semantic layer will compile that request into raw SQL
that can be executed against a data warehouse, and then run that query
and return the results. To do that, the semantic layer also stores the definitions of
what these entities are, what their metrics are, what their dimensions are, how to join
data together, and all of the things that enable it to deterministically compile a request like that.
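As a minimal illustration of that compilation step, here is a toy sketch in Python. The model definitions, member names, and the `compile_query` function are all hypothetical, just to show the shape of the idea; a real Cube data model is far richer and handles joins, filters, and security.

```python
# Toy semantic layer: metric and dimension definitions map friendly names
# to SQL fragments, so a request never mentions the physical schema.
SEMANTIC_MODEL = {
    "measures": {"revenue": "SUM(orders.amount)"},
    "dimensions": {"order_month": "DATE_TRUNC('month', orders.created_at)"},
    "table": "orders",
}

def compile_query(measures, dimensions):
    """Deterministically compile a semantic request into raw SQL."""
    select_parts = [f"{SEMANTIC_MODEL['dimensions'][d]} AS {d}" for d in dimensions]
    select_parts += [f"{SEMANTIC_MODEL['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {SEMANTIC_MODEL['table']}"
    group_by = ", ".join(str(i + 1) for i in range(len(dimensions)))
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql

# A consumer asks for "revenue by order_month" without knowing the schema:
print(compile_query(["revenue"], ["order_month"]))
```

The point is that the translation is deterministic: the same request always yields the same SQL, which is what makes a semantic layer a reliable target for an LLM.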
And so a universal semantic layer like Cube is designed to work with many, many different data warehouses.
So we integrate with a great deal of them, and with multiple of them at the same time where needed.
But the main point is that it also works with multiple consumption
points. So some of Cube's customers today have embedded analytics needs: they build their own
applications in JavaScript, or whatever it might be, and then they use Cube's APIs to expose
their data, and they might be selling that product to their customers. Then they also have internal analytics use cases, where they might be using a BI tool like Tableau
or something like that. And they might have multiple BI tools. So they might have Tableau,
Superset, and something else. And they can all feed off the same semantic layer, the
same definitions, and then have that consistency and governance centrally.
There's obviously a few players in the semantic model market, really.
You've got, obviously, Cube, you've got AtScale,
you've got products like LookML, for example, with Looker.
And you've also got, I suppose, BI tools that embed their own semantic layers,
like, say, Omni, for example, or things like Power BI.
How would you say Cube's approach is differentiated from the others?
And why do you think Cube took the approach it did around, say, composability, independence, and that sort of thing?
So I think with the BI tools, in some ways, the BI tools have an advantage
when they embed the semantic layer in their tool,
in that they can develop their BI
features alongside the semantic features, and then it's a much more streamlined process. And that
means it works really well for BI out of the box. The problem is that it doesn't particularly
work well for anything else. So if you want to build your own embedded analytics solution,
it's not a great experience.
If you ever want to move from that BI tool, it's really, really problematic. And
that's one thing I always advise people who are looking at something like Looker or Omni or
ThoughtSpot, where they kind of store the semantic definitions within the BI tool:
you're kind of stuck with them for a really long time, because coming off them, when you've put 10,000 lines of LookML or whatever into the product and that's stored there,
they own it. You think you might own it, but they kind of own it. And because of
that, you can't leave. And so that's, I think, a key differentiator between something like Cube
and those tools. When you think about why Cube is different to the other, I guess, headless semantic layers:
AtScale has been around for quite a long time,
and what they've really specialized in
is the Microsoft stack.
So AtScale looks like SSAS, speaks MDX,
and is really intended for use with Power BI or Excel.
They don't have great APIs for builders to build their own data products with.
That is where Cube is completely dominant:
people using either open-source Cube or Cube Cloud
to build their own data products.
It's the number one embedded semantic layer, for sure.
And then I guess the other thing is,
Cube wants to go far beyond just one set of BI tools.
Like, we are gradually supporting Microsoft now:
we support Excel right now with
our MDX API, and we will launch support for Power BI very, very soon. And we also support Tableau,
Superset, and many more BI tools. So we want to support a wider range of BI tools.
And I think that's probably a differentiator, whereas AtScale is very, very focused on
enterprise, very, very focused on Microsoft.
Okay, fantastic.
So let's dig into this bit more of the story
about you moving into Cube,
and I suppose the transition from Delphi
to the features you're building at the moment.
So go back in time a little bit.
You said that the challenge you were finding
was that not every customer had a great semantic layer,
and I suppose a lot
of them didn't have one at all. But why did you choose to embed Cube, given that my understanding
was that it wasn't so much the type of semantic layer, it was how
well defined it was and how unambiguous the semantic layers were? So, I mean, maybe
just elaborate a little bit on the semantic layers from the customers you had to work with not being up to the
job, really, and why that led to you joining Cube.
So there is an element about quality as well. But when choosing something to embed:
firstly, Cube has a fantastic open-source offering,
which we could just use. And actually,
there wasn't really anything
else out there that has
something like that. The dbt Semantic
Layer is not open source, and neither is AtScale's.
And so
that was a big deal for us:
that we could just embed
it if we wanted to.
Secondly, Cube's APIs are the best.
So that's what we found, you know,
objectively comparing the different systems at Delphi:
Cube's APIs are the nicest to use with an LLM.
Like, for example, dbt's has places
where you need to inject SQL into a GraphQL query,
which is unfortunate and makes it difficult
for the LLM to get the answer correct.
And Cube's semantic layer is more capable,
in that it can do multi-hop joins out of the box.
You can define how those should work in a given scenario
using Cube's views, which are similar to Looker Explores.
That's not possible in dbt. So there are a number of reasons why we chose Cube, and, yeah, Cube being
the open standard is a really strong reason for that. Okay, so tell us a bit
about what you've been doing at Cube since you joined. You mentioned the AI Assistant and you mentioned the API. In preparation for the
session on Monday, but also just out of my own interest, I've been playing around with the API, and I'm
really impressed with it, actually. But tell us what the AI API is and, I suppose,
the thinking behind it, to start with, first of all.
Yeah, so I guess this goes back to Cube's
absolute strength,
which is in the embedded analytics market.
And so many of our great customers
build their own products on top of Cube.
And this was some of the stuff that we knew
before we joined Cube;
we'd spoken to some of these customers, actually.
And so we knew that one of the first things
that we wanted to do on joining Cube
was to make a way for those customers
to use AI with Cube
without having to roll their own
application with an LLM and Cube.
Because we knew that they were all going to have to build
their own retrieval-augmented generation method.
They were all going to have to build so much stuff
that if we build it once, and do it once well for everyone,
it would make a lot more sense.
So that was the thinking behind the AI API.
So it means that they can make their own chatbots,
and you can do conversational queries with the AI API.
You can get it to output a suggested chart configuration;
that was something someone asked for, which we recently released.
You can get it to run a query,
or just to generate the Cube queries to run elsewhere,
since Cube queries can take a really long time to run.
So, yeah, and we've also released a chart prototyping demo for the AI API,
so people can just take a little TypeScript application
which shows you how to build a chatbot application using the AI API.
Okay, okay.
So, I mean, the reason I thought this was interesting
was that we recently did a PoC for a customer
where we built a chatbot for them.
And it was something simple in JavaScript,
and then it would call a REST API.
And the REST API that we built was, you know,
a Google Cloud Function, and behind it
was some Python, which would take in the question. It would then use LangChain and the SQL agent to,
you know, go through the steps a SQL agent does to query the data dictionary, and then sample the
data and so on. And then it would return the results back to the
front end, which would be, you know, the revenue amount or whatever, that sort of thing. So it was a Q&A chatbot to query your data. Now, looking at what
you've built with that API, it looks like it would replace the need for that back end, or
certainly replace the need for a lot of the complexity of that back end. Well,
tell me how it would compare to, say, building something yourself using LangChain, for
example.
Yeah, so it should really replace that. It's meant to be a turnkey solution, where someone
who's building that application in JavaScript doesn't then need to go and build retrieval-augmented
generation. They can just literally call this API and it works.
Right, okay. And so where do you see that feature going? So this is an API currently that you can call,
and just to be clear on this,
because Cube already has an API for querying it for normal sort of BI-type queries,
how does this work?
You send a prompt, a query, to the AI API,
and then does it send back the results,
or does it send back a definition of a Cube query
to actually get the results?
So it depends on how you use it.
So there is a run-query parameter.
So if you think that your query is pretty small,
it's not going to generate loads of data,
and it'll run quite fast,
you can set this run-query parameter to true.
And so what you'll send
is a very straightforward API request,
which is literally the blob of text
that the user put in as their question.
And then it will return a little summary of what
it's chosen to do, using some terms from the semantic layer. So if you ask it, "I want
to know the number of sales by month", it will say something like, "Sure, I'll tell
you revenue", because that's what it's interpreted the amount of sales as, by
order month or something, which is something it knows about from the semantic layer.
And then, if you've chosen to run the query,
it will also show you the Cube query that it's generated,
and then it will run it and give you the results.
If you choose not to run the query,
and there's a good reason around timeouts and the like not to,
then it will just send you the generated Cube query in JSON instead.
Okay.
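That request/response flow could be sketched roughly as follows. The payload shape, field names, and response structure here are illustrative assumptions for the purpose of the example, not Cube's documented API contract.

```python
import json

def build_request(question: str, run_query: bool) -> str:
    """Serialize the request body: just the user's free-text question
    plus a flag saying whether the server should execute the query."""
    return json.dumps({"query": question, "runQuery": run_query})

def handle_response(body: dict) -> dict:
    """Pick out the pieces described above: a natural-language summary,
    the generated semantic-layer query, and (optionally) the results."""
    out = {"summary": body["summary"], "generated_query": body["query"]}
    if "results" in body:  # only present when the server ran the query
        out["results"] = body["results"]
    return out

# Simulated server response for "number of sales by month" with runQuery=true:
response = {
    "summary": "Sure -- I'll show you revenue by order month.",
    "query": {"measures": ["orders.revenue"], "dimensions": ["orders.order_month"]},
    "results": [{"orders.order_month": "2024-08", "orders.revenue": 1200}],
}
print(handle_response(response))
```

The appeal of a turnkey endpoint like this is exactly what is described above: the caller sends one blob of text and gets back a summary, a query, and optionally results, with the retrieval-augmented generation handled server-side.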
And you said you're working on a charting part of that as well, then? So it actually will display charts, maybe?
Well, what some of our customers said was: that's great that it gives us how to pull the
data, but how do we know exactly how to chart it? And that really depends on the data
that comes back. So we built a capability for the LLM to actually recommend a chart type for
the type of data. And usually it's choosing between line or bar, based on whether it's a
ratio-type metric or a cumulative-type metric.
Okay. And you mentioned LLMs there as well. So what
LLM does it use in the background for this?
So we use the GPT-4 family of models. I think we're
using GPT-4o at the moment, but we have implemented Claude 3.5 Sonnet for some customers, and that
works quite well as well.
And some of the development
that we will have for the AI API
is that we're obviously going to continue
to experiment with different LLMs,
open source, everything,
and figure out which is the best
to have that balance of speed and accuracy
that people need.
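The line-versus-bar choice mentioned a moment ago could be caricatured as a heuristic like the one below. This is a hypothetical sketch: in the real feature an LLM makes the recommendation from the returned data, and the mapping of metric kinds to chart types here is my assumption, not the product's logic.

```python
def recommend_chart(metric_kind: str) -> str:
    """Return a chart type for a metric classified as 'ratio' or 'cumulative'."""
    if metric_kind == "cumulative":
        return "line"   # running totals read naturally as a line over time
    if metric_kind == "ratio":
        return "bar"    # ratios compare better side by side as bars
    return "table"      # fall back to a plain table when unsure

print(recommend_chart("cumulative"))
```

The value of having the model (rather than a fixed rule) do this is that it can look at the actual shape of the result set, not just the metric's declared type.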
Okay.
Now, I guess this is probably outside of your remit, maybe,
but have you been speaking to,
say, Preset,
for example,
about using that API to give their BI tools
that conversational sort of ability?
You mentioned that
application builders could use this.
Could you maybe see open-source BI tool builders using this?
We probably need greater alignment
with one of those open-source tools.
So I can imagine some of them
who have more of a native integration with Cube
would consider doing that.
But I think those kinds of open-source tools haven't focused heavily on putting AI
inside the tool, because, given the ongoing costs of using AI,
having AI features in an open-source tool is somewhat contradictory: you can't really
offer them, because they're not free. So what about the AI Assistant?
What's that in Cube now? Yeah, so the AI Assistant is very similar to the AI API,
except that it actually has a visual interface inside Cube Cloud.
So you can go to a tab in Cube Cloud and ask a question,
and it will output an answer in a chart.
The only difference between it and the AI API
is that the AI Assistant uses this other new feature called Cube Catalog.
Cube Catalog is a way to explore what's in your Cube semantic layer, as well as what's upstream of it in your data warehouse.
It can also say, oh, you've got a Tableau dashboard that looks similar to your question as well.
And that's the key difference between it and the AI API.
Interesting. So I'm guessing
this is just the first of the things you want to try and build with Cube in this area. I mean,
within the realms of what you are allowed to talk about, what do you see as being the next thing that you'll
be doing in this area? What can we look forward to as kind of the next manifestation of your
dream, really, in this area?
So there are a number of things about enhancing both of those
features. So a typical problem, which we'd planned to solve at Delphi and which we're planning to solve here as well, is how you deal with things like where someone just refers to a value of a dimension, for example.
Like, you know, let's say rather than saying where the marketing channel is "paid", they just say, show me all paid sales.
Right. And you don't know what that is from the metadata in the
semantic layer. This is actual data that's inside a dimension or a column. And so there's no
magic to doing this: you kind of end up needing to store the values of a column or vectorize the
values of a column. And so one of the things we're planning to build really soon
in order, hopefully,
before we GA these features
is a way for people to elect
to profile dimensions in this way
because a number of customers
have asked us for this
and we knew this was a problem
when we were at Delphi as well.
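A minimal sketch of that dimension-profiling idea is below. The dimension names, stored values, and substring matching are all illustrative assumptions; a production version would more likely vectorize the values and resolve terms by embedding similarity.

```python
# Profiled dimensions: distinct values captured ahead of time, so that a
# free-text term like "paid" can be resolved to a concrete filter.
PROFILED_DIMENSIONS = {
    "orders.marketing_channel": ["paid", "organic", "referral"],
    "orders.status": ["completed", "refunded", "pending"],
}

def resolve_term(term: str):
    """Find which dimension (if any) contains a value matching the term,
    and return it as a query filter; None means the term is unknown."""
    needle = term.lower()
    for dimension, values in PROFILED_DIMENSIONS.items():
        for value in values:
            if needle == value or needle in value:
                return {"member": dimension, "operator": "equals", "values": [value]}
    return None  # unknown term: fall back to asking the user

# "show me all paid sales" -> a filter on the marketing channel dimension
print(resolve_term("paid"))
```

Without a lookup like this, "paid" is invisible to the model: it appears nowhere in the metric or dimension names, only in the data itself, which is exactly why profiling has to be opt-in per dimension.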
And more generically, we want to use AI broadly in Cube to help whoever it is.
So if it's the engineer, can we help them write code more easily, generate Cube semantic layers
more easily, either at startup or in maintenance mode? Is there a way to schedule
things that need to happen in Cube? For example, you know, Cube
has some really powerful features around caching and pre-aggregation, but some of that requires
configuration. Could we do that automatically for someone? So these aren't necessarily
LLM-type AI things to do; maybe they're more traditional ML-type stuff. And, yeah, we want to see what we can do for people.
Okay. And in your role looking after AI within Cube,
what are you seeing as being good customer use cases
and examples of this being used really?
For us, the most acute need that we see
is where someone is already selling a data product to their customers.
They have paying customers paying for data that they have, right?
And they're using Cube to power their existing application.
That's where they are desperate to get AI in
because their customers are saying,
hey, can we have an AI interface to this
so I don't have to like click around and figure out how this works?
I could just ask a question.
So either their customers or their board are saying we need this.
And so that's, you know, that's where we see a lot of interest.
And we've even seen some deals won on that basis
where someone is interested in you because of the AI API.
What about, I suppose, internal BI use cases?
Are you seeing it being used within companies
to help summarize financial results,
or to have, you know, the classic
thing about chatting with your data, or things like that? Are you seeing actual good examples
of that in practice now with customers?
Yeah, so we have a fair amount of
interest. It's still early, but I think there's a fair amount of interest in both Catalog and the AI
Assistant from that kind of internal analytics point of view. We released those two features
more recently than the AI API.
Okay. And just to say on that: you and I are both speaking at a Cube event on Monday,
next Monday, just to kind of plug that as well. So what is that event on Monday,
David? Yeah, so Cube is having its first-ever conference this year. It's called Cube Rollup,
and we're having an event in London at RSA House
on Durham Street, in the Durham Street Auditorium, which is very, very central, near Charing
Cross. And the London event is the first leg of the conference; we'll have a second leg in
San Francisco on October the 15th. So what we want to do is bring
Cube users and customers together,
and celebrate a bit about how far Cube has come
from the original Cube.js project,
which started as far back as 2018
and which has now grown into this really great community.
It was really interesting, actually:
Igor, who looks after our community a lot,
but is also head of product at Cube,
showed that we have more GitHub stars than dbt,
which is amazing, because of how well loved Cube is
by engineers around the world.
And so we'll celebrate things like that,
but also show people what we've released recently,
show a bit about what we're going to release really soon, and also unveil
some new product features. And then also just bring some other people on board,
who are great partners of ours, or who've built stuff with Cube. So, obviously, you're speaking,
as someone who's used Cube in your engagements as a consultant; we've got someone else who's built
a product around Cube; and then we've also got a great customer story in Permutive
as well. So, yeah, we're trying to cover
kind of all the facets of
Cube. That's really good. Thanks very much
for coming on the show. It's been great to have you back
and looking forward to seeing you in person again on
Monday. Yeah, thanks for having me and
looking forward to Monday. Thank you.