Drill to Detail - Drill to Detail Ep.66 'ETL, Incorta and the Death of the Star Schema' with Special Guest Matthew Halliday
Episode Date: May 27, 2019
Mark Rittman is joined by Matthew Halliday to talk about the challenge of ETL and analytics on complex relational OLTP data models, previous attempts to solve these problems with products such as Oracle Essbase and Oracle E-Business Suite Extensions for Oracle Endeca, and how those experiences, and others, led to his current role as co-founder and VP of Products at Incorta.
Show notes:
The Death of the Star Schema: 3 Key Innovations Driving the Rapid Demise
Accelerating Analytics with Direct Data Mapping
Accelerating Operational Reporting & Analytics for Oracle E-Business Suite (EBS)
The Good, the Bad, and the Ugly of Extract Transform Load (ETL)
E-Business Suite Extensions for Endeca: Technical Considerations
The Pain of Operational Reporting Solutions for Oracle E-Business Suite (EBS)
Transcript
So hello and welcome to the Drill to Detail podcast and I'm your host Mark Rittman.
In this episode I'm joined by Matthew Halliday who's previously an applications architect at Oracle
and now more recently he's actually a co-founder at Incorta. So Matthew, do you want to introduce yourself really? And welcome to the show.
Yeah, thanks, Mark. It's great to be here. Yeah, so as I mentioned, I started out my career at
Oracle. I got tricked into joining the company in some respects, or at least I didn't really
know what I was getting into. I joined Oracle because I saw a fancy brochure and I thought,
wow, that looks like a cool company. This was back in the late 90s when the internet hadn't
really taken off in the way that it has now. And so I joined this company and on day one,
I was sitting next to an accountant with an accountant in front of me for a week and a half
talking about general ledger, accounts payables, accounts receivables. I thought, what in the world
have I got myself into?
Fast forward a few years, I ended up becoming the applications architect for all of the financials and procurement products within Oracle. And with that came a lot of responsibility for actually creating the very first Oracle Fusion environment, where I had to source 55,000 tables, which is, I'm guessing, maybe the largest ETL process in the world, and bring all those objects in and create that Fusion data model, bringing PeopleSoft and Oracle EBS together. I then went to work at Microsoft. I wanted a change of pace,
spent a few years there working on their enterprise applications. That was kind of a fun ride. And
then Oracle kind of coerced me to come back and
join them as an applications architect, working with some of their largest customers through
United States and helping them understand and leverage the Oracle technology for their business
applications. It was at that time that I met one of the co-founders of Incorta, and we worked very closely on some products together. And then one day, he shared the idea for Incorta.
And it was at that point that I was like, I really couldn't go back to what I was doing.
I was just so excited about the potential and the possibility of what Incorta could provide
that I was all in from the get-go.
Fast forward again, five years, I'm responsible for products at Incorta and working with our development teams,
working with our customers to build out this next generation
or next transformative change, I would say, in the business of analytics.
Okay, okay.
So you said you were surprised when you joined Oracle and it was all about accounting.
I remember speaking to Mike Darwin, who's one of the PMs in the Oracle Analytics team,
and he said when he joined Oracle in the late 80s,
he thought it was the actual teletext company that competed with Ceefax. So I imagine the surprise he had when he found himself product manager for Discoverer shortly afterwards.
Yeah, that's quite funny.
I remember those days when you used to find out
what was going on and what was going to be on the TV and the channels, et cetera.
Yes, the football scores and that sort of thing.
But, yeah, what interested me about speaking to you was the fact that,
as you say, you were one of the architects working on, I suppose,
the ETL processes and data migrations into the Fusion sort of tables,
Fusion apps.
And I suppose how that has fed into what you're doing now at Incorta, and how you're trying, I suppose, to address some of the challenges, you know, some of the things customers were trying to do then that were a challenge.
I mean, take us back then really to when you worked at Oracle
and you're working on that task there of that ETL process and those ETL routines.
Customers were speaking to you about what they were trying to do.
What are the challenges that you were seeing there at the time
that have led, I suppose, into what you're doing now at Incorta?
You know what?
It really spans back to a presentation I remember Larry Ellison gave at Oracle,
saying when he would ask
for just the number of employees at Oracle, it would take them a couple of weeks to give him
the answer. And I was kind of blown away, right? With all of this technology that we had with this
immensely powerful database, why couldn't we answer those simple questions? You know,
some of it at the time was down to, you know, global single instance and merging all these different instances of ERP into one.
And then once we did that, we ended up creating another monstrosity of a problem called
incredible scale of transactions on a highly normalized data model where we had all of
these join relationships, which was fantastic for OLTP-based applications.
So when you're updating, inserting, they work really great.
But when you want to do analysis across everything,
it became almost impossible.
And so one of the things, especially in the finance sector
that we were trying to address is,
how do you take something like highly aggregated GL postings
and then be able to drill down into those details?
And so Hyperion became a dominant player in that space.
And then Oracle acquired Hyperion.
And then we spent a lot of time figuring out how do we merge these two together?
And so one of the projects I worked on there was where we took Hyperion inside and started
maintaining the cubes as well as the OLTP and then trying to create connections between
them.
It's always been this challenge of going from aggregate all the way down to detail. That's always been there. And because we've not been able to really get at that,
customers have had to go down other paths.
So we look at, okay, let's take the data.
Let's take it from 55,000 tables, which reside in the Oracle EBS environment.
And let's try and reduce that to a single big fact table with a bunch of dimension tables
around it.
Because largely, we were just trying to work around the data model, which worked really
well for data entry, for updates, and for concurrency of users.
But for analysis, it didn't really cut it.
And that's where the data warehouse
started coming out of. And that becomes a whole different beast and animal in its own right,
bringing all this data in, connecting, transforming the shape, flattening of data,
aggregating it, understanding the requirements of what the questions people are going to ask,
all becomes incredibly complex. And whenever someone wants to make a change, it felt like it
was a house of cards, right? To go back and say, okay, I'll make this change. And then, well,
how does that affect all my ETL processes? How do I really understand that this is not going to mess
things up? And it became really, really error prone, very, very difficult, and certainly nothing
that someone could just take off the shelf, purchase and start getting instant data from. So it's always been this painful challenge. And in some respects,
I think people take pride in the fact that they're fixing a really complicated problem
and using these complicated tools and doing it in a complicated way because it really shows,
hey, we're smart, we can do this. But also there's got to be a better way of addressing this common problem
of going from aggregates to detail and having flexibility at the same time
to navigate where you want to go.
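To make the workaround Matthew describes concrete, here is a minimal sketch of the star-schema shape: one fact table plus a couple of dimension tables, so the anticipated aggregate queries are cheap. SQLite and all table and column names here are illustrative, not anything from EBS or Incorta.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE fact_sales   (customer_key INTEGER, date_key INTEGER, amount REAL);

INSERT INTO dim_customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
INSERT INTO dim_date     VALUES (1, '2019-01', 2019), (2, '2019-02', 2019);
INSERT INTO fact_sales   VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 75.0);
""")

-- Aggregates are cheap, but only for the questions the model anticipated
for row in db.execute("""
    SELECT c.region, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d     ON d.date_key = f.date_key
    GROUP BY c.region, d.month
"""):
    print(row)
```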
Okay, okay.
So I remember sitting in an Oracle OpenWorld presentation about that link, I think, between Essbase and BI and EBS, and at the time thinking that was, you know, I suppose quite a brave challenge to take on, really, to do that. And I don't think it ever went any further than that. But one thing I saw that you were also involved in was the Endeca extensions to EBS. And again, I can see, in a way, parallels, or certainly a kind of common theme there, with the things you're doing now at Incorta. What was that kind of product, or that initiative, about at the time?
Yeah, so, you know, I remember when I first saw Endeca and the EBS extensions for Endeca, I was kind of really blown away.
I thought this is really, really cool.
I got super excited about that multifaceted search capability that you could, you know, drill down to the details that you wanted to see.
And so I remember I kind of got my hands on the Endeca product, and I pulled the Oracle bug database into Endeca and started looking at it that way, partly because of my frustrations on how to navigate around bugs inside of the Oracle tool set. What then happened is I started going and talking to customers and presenting to them the EBS extensions for Endeca.
And the room would change.
People would get so like, you know, on the edge of their seat, you see them paying attention.
They would put their phones down.
And everything was going great, right?
They would say, this is amazing.
This is like, this has brought new life to Oracle EBS.
And it really had this great promise.
Everything went well until there was always one question.
And it's like, can this replace my data warehouse and OBIEE?
And it was always, no, this is an addition.
It handles current data sets, maybe three months worth of data.
And there were just a lot of problems, because one of the things with Endeca is you needed a flattened dataset, right? And to do that, you had to put those against these views. So we created these Endeca views inside of Oracle EBS. The problem with that is those views were really hard to tune.
And so we spent the majority of our time not building dashboards, but working on performance tuning of the views just to pump the data out.
And so you couldn't get to near real time, and you couldn't bring in the data volume people wanted. And that's when I saw people were just kind of, okay, it became then just a UX improvement for, you know, navigating my open transactions; and also, I don't think I'm going to go down that path. So it didn't get the traction that I thought it was going to get at the beginning. That was the gap that, with Incorta, kind of got me excited.
Yeah, yeah. I mean, I think the point, I suppose, to me is that it's pointing to a need there, isn't it? You know, the fact that people were on the edge of their seats and they did put their phones down, it must have said to you, you know, this is a problem that has yet to be solved.
And there's a lot of value in it for customers if you do that.
Yeah, no, absolutely.
I mean, at Oracle, I remember whenever we'd get our quarterly results, our VP would have, you know, a spreadsheet. And they would go through and do analysis and would say, how was the quarter? How did our products do? It would take a few weeks before they could ingest and digest that data and synthesize it in a way that we could then learn about, you know, how our products were doing in the marketplace. That's with all of the horsepower of the Sun acquisition, all of the engineering and product development; we still didn't have what I would say is really, you know, freedom and data access to look at the data that was pertinent to our business, to understand how
to use it.
Okay. So the other reason I was keen to speak to you is that, I think, you put out quite a provocative blog post recently, which was, you know, about the problem with ETL and star schemas, generally making the kind of point, I suppose, that our current way of trying to build data warehouses and do analytics on these complex OLTP-type systems is kind of broken and not fit for purpose, I suppose, really. I mean, maybe starting at the start of that post and talking about, I suppose, the roots of relational databases and how they come out of Codd's rules for data normalization. What is the inherent problem, really, in trying to do analytics on these types of data sources? And how does ETL, if anything, make this kind of worse, really?
Yeah, so I will say there's two parts to ETL, right? There's the good, which I would say is
the data enrichment. You're bringing additional value to the data. Maybe you're cleaning up the
data because of duplicates or you're creating additional business rules.
Maybe there's, like, revenue, for example, where you want to factor in royalties or things that you might have to pay, right? Those are kind of bringing value to the data. The other part, the dirty, ugly
kind of underbelly of ETL, which I think is the kind of core problem is that you need to transform
the data. You need to take the shape of the data and then make some assumptions
on what are the columns that are important?
How do I want to aggregate this?
How am I going to put this into a data model
that I can slice and dice?
And the first question you have to ask
before you can even start that process
is you need to go to the business and ask them,
what are the questions that you're going to ask
for the next three years of your data
so I can go build a data model to satisfy it?
And I'll come back to you in nine to 12 months.
That, well, one is the question to business is completely unrealistic, right?
Imagine our conversation today, if you could not ask a follow-up question to anything,
and I'd say something, you're like, wow, I didn't anticipate that. I wish I could ask a follow-up,
but we'll say, well, maybe next year we'll have that. The conversation would be really jagged and weird. And it would just be like, well, that's not how we converse. That's not how we think. We make connections and see things as things develop. And so that's kind of
one of the core problems. Now, what I think is fundamentally the root cause of all this is a few things.
So first of all, we started off predominantly disk-based with storing our data, right? Memory was scarce. When I was at Oracle, I remember when we got, I think it was like a 52 megabyte memory machine or something like that. It was like, wow, so much memory, I can give one and a half megs to this Oracle instance or something. Everything was disk-based.
And then it was also stored at a row level, right? So all the information that you needed for a single transaction would be held in those pieces. Now, when it came to slicing and dicing it, that became problematic, because we were storing the data in a way that wasn't conducive to analysis.
So the first thing we did is, let's put things in a columnar format. Great, because when I do analysis, I normally look at column A and column D; I don't look at all of the columns for everything. And so storing it in that format lent itself to that.
Second bit was memory.
Memory all of a sudden plummeted in cost and we saw memory
footprint increasing. And that was phenomenal and great. And of course, we started to leverage that,
right? And that made things a lot faster. There's one final bit that was kind of never,
ever resolved. And that was, how do I join data? How do I go from one table to another without having to go through this hash table join function? Without getting too complicated, something whose cost doesn't compound as you add more joins, but is linear, right? How do I get to the point where I can run a query that literally has 60 tables in one single select statement, and have that perform at scale, when I'm not just looking for a particular
month with a particular cost center, but I'm wanting to look at all my cost centers across
all my months of business, right? That has been the Achilles heel of everything. And it's because
of that, we've had to change the shape of the data. We've had to put it in a way that we can
leverage some of these great advancements we've seen in terms of analytics. But largely, I would say the innovation we've seen in the last 10 years in analytics has
been around flattened singular data sets, which is great if you're an IoT or click stream data.
But if you're analyzing business systems, your ERP systems, your supply chain, your ITSM,
all of those applications, that's not how your data is stored.
And that has always been the major problem.
It's stored in a format you could never use.
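To illustrate the join problem being described, here is a toy sketch contrasting a per-query hash join with a join pointer resolved once at load time, which is roughly the flavor of idea behind "direct data mapping" as discussed here. This is a simplified illustration under my own assumptions, not Incorta's actual engine, and all data is made up.

```python
# Toy data: "customers" is a parent table, "orders" a child table with a
# foreign key. Entirely made up for illustration.
customers = [{"id": 10, "name": "Acme"}, {"id": 20, "name": "Globex"}]
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 20, "amount": 99.5},
    {"order_id": 3, "customer_id": 10, "amount": 410.0},
]

# Conventional approach: build and probe a hash table at query time,
# once per join, per query.
def hash_join(orders, customers):
    by_id = {c["id"]: c for c in customers}
    return [(o, by_id[o["customer_id"]]) for o in orders]

# "Direct mapping" flavor: resolve each foreign key to the parent row's
# position once, when the data is loaded...
customer_pos = {c["id"]: i for i, c in enumerate(customers)}
order_to_customer = [customer_pos[o["customer_id"]] for o in orders]

# ...so at query time each join hop is a plain array lookup, and chaining
# many joins stays linear in the rows scanned.
def direct_join(orders, customers):
    return [(o, customers[p]) for o, p in zip(orders, order_to_customer)]

assert hash_join(orders, customers) == direct_join(orders, customers)
```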
Okay, okay.
But what about, I mean, you're ex-Oracle, so wouldn't the solution to this be, say, putting it into, say, an Essbase cube?
Yeah, sure.
I mean, there's two problems, right, with that. One is you've got to, again, know all the questions you want to ask up front to define the cubes, because the cubes are going to define what you want to look at. So if we would say, you know, how many pens do you have? Okay, great, we're going to roll up the number of pens we have, and we'll have that, and we'll store pens by month, and that'll be one of the, you know, dimensions of your cube. The next day you come in and you say,
well, I need to know how many of those are black pens and red pens and blue pens. Like, well,
we don't know that, right? We have no insight into that level of detail. So then you say, okay,
let's change the cube structure. Let's go ahead and then create the world roll up by color of pen.
And then someone says, well, what about dry erase pens versus ballpoint pens? You're like, well,
I didn't know you wanted that. So let's go back and do it again.
And then all of a sudden, now you're multiplying the number of types of pens, times the number of colors, times the number of months.
Before long, your cubes don't perform.
You know, they perform well
up until maybe 5 million records.
Beyond that, good luck.
And so the maintenance of all these cubes,
the loading of the cubes,
and you still have ETL to feed into those cubes,
it's not going to cut it either.
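The combinatorial growth Matthew describes is easy to make concrete; the figures in this small sketch are made up for illustration.

```python
# Each new dimension multiplies the number of cells a cube has to
# pre-aggregate and maintain. Illustrative figures only.
months = 36
pen_colors = 12
pen_types = 8     # ballpoint, dry erase, ...
stores = 500

cells = months
for dim in (pen_colors, pen_types, stores):
    cells *= dim
print(f"{cells:,} potential cells to maintain")  # 1,728,000 with these figures
```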
Okay, okay. And the ETL part itself, I mean, just the sheer time it takes to build these mappings, to change them up as data changes and so on. I mean, I haven't seen a full-blown data warehouse project in years, really, where people have committed to a kind of project of several years of ETL development to do this sort of thing. Do you find, again, that there's less appetite for that in the market now?
Yeah, it's interesting. You know, I'm definitely seeing ETL processes still there,
especially with large organizations that have been around for a while. What I am seeing,
which is pretty interesting, is there's two things I think going on. One is there's people
who've never built a data warehouse in their current company, right? So maybe they're a growth company, and they've got to this point where they're like,
we're at that point where we kind of need to go to that next level, but we've got, you know,
challenges with or concerns about taking the data warehouse approach. And so people say,
what alternatives do I have? I know that that's fraught with expense and problems and maintenance
and it's restrictive.
And generally those projects don't bring customer delight to your end users.
The other flip side of what I see in those larger organizations is that there's some activities
taking place where they're looking at BI modernization, looking at the analytics footprints
and figuring out how do we change this. And there's an appetite to get out of the ETL business
because I've been at companies where they literally employ two people permanently
just to keep their ETL processes running for accounts receivable.
Just one area of their e-business suite.
And they have two people because they always have problems.
There's things they need to check.
And they need to make
sure that those reports are correct. I've seen countless customers where, prior to Incorta, they have these ETL processes, and then, through no fault of their own, they have human error in them, right? It's going to happen. When they go to Incorta, they say the results don't match.
And then we find out that they were living with problems in the ETL process they didn't have
because it's not easy to get right.
It's not something you can just pick up and do.
You need to have really good technical skills in a particular product.
Plus, you need to understand the business requirements and the application.
So it's a very unique role that understands so many different areas that it's really become super difficult to sustain this with any kind of sanity and reality.
Okay, okay. So let's move on then to Incorta. I mean, just give us the, I suppose, the elevator
pitch, what the product is, and then we'll kind of go into a bit of detail then really of how it
solves these problems you've been talking about. So just give us the, I suppose, the high level
overview of what the product is, really.
Yeah. So the way I like to refer to Incorta is, it's an analytics platform that really enables you to ingest your data as is from the source system, so you don't have to do all those ETL processes, and then run queries against that, whether you use Incorta's own visualization or
use something else like Power BI or Tableau or MicroStrategy or even Excel, against huge volumes and scales of data that you would not even come close to seeing come back, or perform, in other systems. So we have examples
of customers that have transactions that, in their systems prior to Incorta, were taking
six to 20 hours to run a standard report.
And then they're able to give that exact same report
with additional benefits of additional visualizations
and ways of looking at their data
and have it literally come back in sub seconds.
It's pretty unbelievable when you see it.
Most people are skeptical.
Most people don't believe it's true.
And I'm always a great fan of that
because I say that means we're doing something really
exciting.
If you could believe it, it's probably a marginal improvement.
When people just say, I don't believe you, the only thing I say is, that's great.
You understand the problem.
You understand what we're saying is pretty audacious.
But then I just say, well, we've got some amazing customers who are all referenceable
with 100% renewal and retention that will speak
to you about how this is true. And then, if that is true, really the onus is on you to ask the question: if these claims are true, what does that mean for your business? Because if it's that transformative to your business, really the onus is on you to figure out, is this real, or are we just making this up?
And obviously, with the customers that we have, the renewal rates we have, I'm definitely in the camp that this is changing the way we approach analytics.
Okay. So you're responsible for product within Incorta. So just give us a bit more substance about what the actual product is, in terms of how does it handle the ETL? What does it store data in, and so on?
Maybe kind of a bit more of a technical thing
as to how it actually solves these problems.
Yeah, sure.
So the product, first of all,
you know, one of the first questions I get is, is it in the cloud? Is it on-premise?
And the answer is yes and yes.
So it really gives a good option or a capability
for people who maybe have cloud in their strategy,
but are not ready to go there quite yet, that they can install Incorta on-premises. We have a close relationship with Azure; you can go on Azure or AWS and run Incorta as well, so we work in both of those environments. It's a complete software solution: there's no hardware, there's no GPU-type processing on it, so we're platform agnostic from that perspective. Everything's built around HTML5
and JavaScript. So there's no tools that you need to download for any part of the system from
configuration installation all the way through to application users going in looking at their data
or analysts looking at their data. The whole thing is built around that HTML5 interface.
The core heartbeat of Incorta is definitely this direct data mapping engine which we've created.
And a little bit about what that engine is and what is direct data mapping. So
in traditional databases, you have gather schema statistics, where you go out and profile the data,
generate some schema stats, which then your cost-based optimizer will leverage to make
informed guesses, decisions, around the execution plan for a query. With Incorta, we've removed that, made it completely redundant. There is no cost-based optimizer inside of Incorta.
When your queries run, they run immediately. And what's unique is this data map that we have really provides the ability for us to know exactly how data in one table relates to data in other tables. And so it doesn't have to guess, should I filter by the city or by this
product before applying filters and going through the execution plan, it literally will understand, oh, this transaction relates to this one
and it knows almost like how to directly get there.
Just kind of like jump to that data point and be able to get it
without having to sort the table and go through all of that complexity.
That's kind of the heartbeat, and what kind of unlocked a lot of the ability to remove ETL and those things. The product also has this ability for you, very easily with a schema wizard, to just point at source systems, whether those are applications like Salesforce or ServiceNow or, you know, NetSuite, or whether those are databases that you already have, right? Any kind of database you have internally or in the cloud, you can connect to those. And then also big data applications or big data technologies like Kafka, for example, where you can hook Incorta up to that, and then bring that data from those applications or from those data sources into the Incorta platform. That process, we can manage the orchestration, the loading. We generate Parquet files, which are an open file format standard.
We keep that, so your data is not locked into some proprietary format that only we can understand.
We keep that data for you.
And then we leverage that data for our direct data mapping engines that can give you this query performance on top of it.
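For readers unfamiliar with Parquet, here is a minimal sketch of writing and reading that open columnar format using the pyarrow library; the file name and column names are illustrative and not Incorta-specific.

```python
# Write table data to Parquet (columnar, open format), then read back only
# the columns a query needs.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "invoice_id": [1, 2, 3],
    "customer_id": [10, 20, 10],
    "amount": [250.0, 99.5, 410.0],
})
pq.write_table(table, "invoices.parquet")

# Columnar reads: fetch just the columns needed for analysis
subset = pq.read_table("invoices.parquet", columns=["customer_id", "amount"])
print(subset.to_pydict())
```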
We also provide a SQL interface to it too, right? Because of the commoditization of visualization tools, a lot of people say, I'm a Tableau shop or I'm a Power BI shop. And so we wanted people to realize you don't have to use our visualization; you can connect a different product to it. What our visualization tool gives
over others is that it has that tight lineage between the data in the platform and the analyzer
tool. So if you're looking at things, you can see a sampling of the data, you can look at what the definition of this column is, whether there are any descriptions about it. So all the metadata around the data, to curate meaningful data sets, is kind of brought to
the surface.
So those who are generating insights are able to look at it.
So that's, as a nutshell, kind of the main bit.
There's one final bit, right?
We've seen a lot of talk about AI.
And so we also have PySpark embedded within our platform.
And we can leverage that
so you can build machine learning algorithms
inside the platform and orchestrate those
and have transactional data merged with
or joined with AI-based models
and the output of those AI-based models
so that business users can then slice and dice that
and interact with it.
without having to kind of go through some lengthy data science route.
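As a sketch of the general pattern being described, not Incorta's API: score data with a PySpark ML model, then join the predictions back onto transactional rows so they can be sliced and diced together. All table, column, and app names here are made up.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-sketch").getOrCreate()

# Hypothetical per-customer features with a known label
train = spark.createDataFrame(
    [(1, 5.0, 12.0, 0), (2, 1.0, 2.0, 1), (3, 4.0, 9.0, 0)],
    ["customer_id", "orders_per_month", "tenure_months", "churned"],
)
assembler = VectorAssembler(
    inputCols=["orders_per_month", "tenure_months"], outputCol="features"
)
model = LogisticRegression(labelCol="churned").fit(assembler.transform(train))

# Score, then join the predictions onto transactional data for analysis
scored = model.transform(assembler.transform(train)).select("customer_id", "prediction")
transactions = spark.createDataFrame(
    [(1, 250.0), (2, 99.5), (3, 410.0)], ["customer_id", "amount"]
)
transactions.join(scored, "customer_id").show()
```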
Okay. I mean, there's a lot of products and a lot of technology in there, really.
How do you, I mean, you mentioned earlier on that you were involved in bringing together data from, say, I don't know, PeopleSoft and EBS and so on. And presumably part of the challenge of that was to come up with a single kind of customer master, to reconcile customer records across different systems? I mean, how does Incorta help with that kind of problem, really?
Yeah.
So when I was at Oracle,
I was working on creating that very first Fusion development instance. And that was largely, at the time, Oracle Warehouse Builder. We took Warehouse Builder, and we actually created a program to automatically generate the graphs to run as part of the ETL jobs. And that was taking these objects from EBS Release 12 and
then pushing them directly into Fusion and then also bringing the data along with it.
It became pretty challenging, of course, when you would have things like your TCA models or your customer definition, which would span across both: which was going to be the source of truth? There was no UI for developers, so they couldn't enter
data. It had to be done by pulling those things in. So there was a lot of work around doing that
to get something that the development teams could, who are focusing on the back end at the beginning, the EOs and the VOs and the AMs, for example, for them to be able to
get up and running. So it was, you know, a full-time job just to kind of manage this. And,
you know, that's probably when I went completely bald. And so that's kind of the fundamental challenge, if you like. In the world of Incorta, it's kind of been fun to see companies. I worked with one company, they literally had 40 ERP systems, which, you know, I kind of hadn't heard of for a while, so it kind of surprised me. But with all of these ERP systems, within Incorta they're able to create one data model and feed all of those in, using multi-source queries, just to be able to bring them in. And again, because the data isn't fundamentally having to be changed, it becomes really easy. You can just bring the tables in, replicate them. We literally have examples where we've done disparate data sets and brought them in in half a day, right? You go in, install the software, hook it up to a couple of systems, bring in the data.
And that same day, you're seeing insights that literally customers said, I would expect
it six weeks, six months before I see anything.
We had one example of product profitability.
There was a customer that has 28,000 stores worldwide.
And they want to analyze all of their SKUs by store location.
And they knew at a regional level what was going on,
but they didn't know down to the store level for every single SKU
for every day what was going on
and how profitable each one of their products were.
They'd allocated about $2 million for this project
and about a year to do it.
With Incorta, it was done in 10 weeks. They were able to bring that in and turn that around. And now imagine: the business users have an extra nine to ten months of access to data that they didn't have before.
Plus the other beauty is they have access to 40,000 columns worth of EBS data in this particular
case. And they didn't have to make those assumptions of
like, well, what's the data we want to bring in? It's all there. At any point in time, they can say,
hmm, that's interesting. What if I slice it this way? And they're able to go in, add that filter
or add that column and be able to make that change. And it literally becomes like a 20 second
exercise, which our customers are telling us, prior to Incorta, would be 10, 12 weeks for any change to their data warehouse.
Okay. So obviously the ETL side sounds interesting.
How do you then store the data in such a way that
you can slice and dice it by any way you want? It sounds like you've got some
of the performance of an OLAP cube,
but you've got also the, I suppose,
the flexibility of a relational database
and maybe kind of, I suppose,
the openness of, say, a NoSQL key value store database.
I mean, what is the kind of the underlying database engine technology
that you work with, really?
So we actually built the Incorta engine from the ground up. It's 100% Incorta designed and engineered.
And we leverage the data being stored in a columnar format in Parquet.
But once we have that data there, we use our engine to run against that.
And we've really focused on the one use case.
We didn't want to make a multipurpose database that could be used to run your business applications on.
We wanted to create something for the sole purpose of doing analytics.
And so we really focused on that.
And you can kind of think of it as we built a race car, right?
And when you build a race car, it's very different than building a luxury car for people to commute in.
So there's no AC.
There's no electric mirrors and all those kind of things.
We made it go really, really fast,
but we made it handle what you want to do in analytics as the primary use case.
And so with that, these engines are able to,
there's actually multiple engines,
but they all leverage the direct data mapping.
So we have a pivot engine, an aggregation engine,
a search engine, a filtering engine,
all of those things inside of Incorta are
specifically built to do that task.
And they all leverage this direct data map, which really is the secret to how Incorta
is able to give this earth-shattering performance.
It's one of those things where, once people understand what this is and how transformative it is, it kind of never ceases to amaze us how fast it is. You kind of forget sometimes after you've been using it, and I've been using Incorta now for years and years. And when I actually go back and sometimes see customers' environments, I just forget. I hired someone a few years back, and after six months he said, I just had my most frustrating day ever. I was like, oh man, what happened? He goes, well, I wasn't using Incorta. I had to use, you know, I won't say the name, this other product, and it was so frustrating. I couldn't get anything out. It took me hours just to get five metrics to do a comparison. And sometimes you just forget. We get so adapted to change, people don't realize that their systems shouldn't be the way they are and that there is a better way of doing it. And then other customers that get into the Incorta way, they forget what the world was like. It's amazing how our memories are like that.
Okay, okay. So talk me through a typical, I suppose, onboarding and, say, development process that you'd have.
Imagine I was a customer and I had EBS and I had financials and I had PeopleSoft and whatever.
How does the onboarding process go, and how does a typical, I suppose, first engagement with this work, really?
Yeah.
In reality, as I mentioned, no one believes us, right? People are skeptical, and they say, no way, you're doing something behind the scenes, I don't believe you. Now, what generally happens is, you know, we do POCs; no one's generally willing to buy a product when they think, I can't believe that's real. What that looks like is, let's say in the case of EBS, we'd go to a customer and we would install Incorta in about 20 minutes, everything's up and running, and then we'd connect to the data source. That generally is the most difficult part: making sure that the servers we have actually have the ports open so we can actually create a JDBC connection. It's kind of funny, but that's the most difficult bit, right? The moment someone can give us a valid JDBC connection from that box, we're kind of off to the races.
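A minimal sketch of that connectivity step, assuming the jaydebeapi library and placeholder host, credentials, and driver jar path; the AR table queried is a standard EBS table, but everything else here is illustrative.

```python
import socket
import jaydebeapi

host, port = "ebs-db.example.com", 1521  # placeholder host

# Is the port even open? (The "most difficult bit" in practice.)
with socket.create_connection((host, port), timeout=5):
    print(f"{host}:{port} is reachable")

conn = jaydebeapi.connect(
    "oracle.jdbc.OracleDriver",
    f"jdbc:oracle:thin:@{host}:{port}/EBSDB",  # placeholder service name
    ["apps_readonly", "secret"],               # placeholder credentials
    "/opt/drivers/ojdbc8.jar",                 # path to the JDBC driver jar
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM ar.ra_customer_trx_all")  # AR transactions
print(cur.fetchall())
```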
Then within 15 minutes, you've probably brought in their accounts receivable data or payables data
and have some dashboards up and running. Literally, we have application modules that you can run through Incorta. You can say, hey, I'm interested in this particular topic, and Incorta will do the data lineage, figure out these are the objects I need.
Here are the joins. If you're familiar with EBS, there's no foreign key relationships in the
database, but we've automatically built in detection that will say, these are the joins
that we know about and we'll deploy those for you. And so you don't have to do that work. And then you can
just start slicing and dicing on it. And so really customers see that and they go, wow, that's really
unbelievable. And then they just want to start pushing the limits and they might spend a couple
of weeks just using the product, trying to find out, can it do more? Can it do more? Can it do
more? And they keep throwing more at it, right? Bring in more datasets. Well, let me see if I
bring in this other dataset or this legacy dataset? Can it handle this stuff I have on mainframes? All right, we're seeing customers go down those paths as well. It's been pretty extraordinary to see the different use cases getting thrown at us at this point.
Okay, okay. So I suppose, then, the elephant in the room we've not really mentioned here is Oracle themselves, and there's the BI Apps, for example, out there, which I think is in one of its transition periods at the moment.
But particularly, you know, customers are being encouraged to move to the cloud
and there's solutions coming along there.
You know, if a customer said to you, this is very interesting,
but we're thinking about moving our data into the cloud
and we're thinking about a package app solution in general,
you know, what would your reaction to that be?
How would you kind of potentially position your product
against maybe a packaged solution running in the cloud?
Yeah, I mean, that's just doing the same thing you've done for 20 years, but, you know, in the cloud. Sure, you get some elasticity,
or you don't have to pay for the support
of the hardware and the services.
But in essence, nothing's changed, right? It's still
the same old ETL process behind it. And I'm not going to say there's no benefit to going to the cloud; there's definitely benefits, right? I'm a fan of the cloud, absolutely. Going to the cloud makes sense, but that doesn't really change anything. Your business users, how is that going to change their lives, right? Maybe it makes a little bit of a marginal improvement for yourself.
And that's what I've seen, right? People can say, hey, that's something I understand,
something I can get. It's not that disruptive to my flow. It gives me a marginal improvement.
I'll go and put this in the cloud and then, you know, use those systems that way.
But really, what has changed?
I mean, not a lot, right?
It's still the same thing behind the scenes.
I think, you know, you'd be too polite to say this, really. But I mean, one of the kind of the, I suppose, the dirty secrets of any packaged application, really,
is that I think it typically sells well to the people that don't actually have to use it. And, you know, a packaged solution is good, but it often never seems to be the thing that users actually need. A lot of the content is often thrown away, and either it's not customized to what they want, or the work to customize what they've got is kind of massive. I mean, looking at that with your product, once you've done the initial onboarding, how easy is it to then evolve what you're delivering, and evolve the analytics as the needs actually emerge, really, so you're not stuck with what it is you did on the first day?
Yeah, great question. Shutterfly is a customer of ours, and they took Oracle EBS, their supply chain and advanced supply chain products, and some other EBS modules around inventory, et cetera, and they leveraged Incorta.
So we went in, and within four weeks, or four to five weeks, they were in production on five modules, I believe, on EBS.
And what we were able to do for them was we put in a semantic layer in place.
So we brought the physical tables in as is, mirrored them from source.
We then had a semantic layer, because, quite honestly, your analysts don't want to deal with tens of tables, 20, 30, 40, 50 tables, when they're doing analysis. They want to look at somewhat flattened views of the world. The problem is that the flattening of those is very expensive. So we say, just don't flatten them physically; you can virtually flatten them. You can have a definition that looks like a view, right? A descriptive view. So we have those inside of our platform. And then we gave some sample dashboards. And then what we found is that the
business users, those who had never built dashboards before in their lives, went out and
said, I like your dashboards, but I'm going to build my own. And Rachel from Shutterfly went out and built 30 dashboards to
run her business. And what's pretty cool is when you look at those, days on hand supply,
all of these kinds of things that they have available, they're able to look at.
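A small sketch of the "virtual flattening" idea mentioned above: define a view over normalized tables rather than physically flattening them via ETL. SQLite stands in for the real platform here; the table names and the days-on-hand calculation are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE stock (item_id INTEGER, store TEXT, on_hand INTEGER, daily_usage REAL);

INSERT INTO items VALUES (1, 'PEN-BLK'), (2, 'PEN-RED');
INSERT INTO stock VALUES (1, 'NYC', 120, 10.0), (2, 'NYC', 30, 6.0);

-- No data is copied: analysts query this as if it were one flat table
CREATE VIEW v_inventory AS
SELECT i.sku, s.store, s.on_hand, s.on_hand / s.daily_usage AS days_on_hand
FROM stock s JOIN items i ON i.item_id = s.item_id;
""")

for row in db.execute("SELECT * FROM v_inventory"):
    print(row)
```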
And what's really, I think, quite exciting is the willingness of our customers to then share that with us.
And so they have been sharing it.
Broadcom has been sharing.
Keysight has been sharing the application content that they build.
And then other customers are able to benefit from it.
And because we're not going through an ETL process that's curating data to a certain format, these things are massively deployable across customers.
And so you can literally take the work that's been done at Shutterfly and deploy it at another
customer.
We did that at Guittard Chocolate, you know, a very small company with a very small IT team, but they were able to benefit from the collective knowledge that our Incorta customers are sharing, being able to leverage those and say, oh, this is how Shutterfly looks at this, so we can look at my chocolate
bars in stock and days on hand of supply. All these kinds of things become very easy for people
to leverage across. If you have an ETL process, it's a black box. It's got tons of stuff in it, and nobody has an off-the-shelf ETL box that works, right? It doesn't stay a black box: you end up getting it, pulling it apart, and trying to put it back together. I don't care who it is, right? Domo looks like it's a nice way to bring it in, but it's still an ETL process behind the scenes. ThoughtSpot, still an ETL process behind the scenes. That's kind of the bit that people don't
like to show, right? They never lead with that in a demo. They always show other things. And then when you push them, so how
is the data going in? Where's that going? How did you make that assumption? It's like, okay,
it's still there. It's a star schema. It's a flattened view. It's aggregated tables and it's
data pipelines. And then we have companies that are jumping up and down about, well, let's just automate it. Let's just put investment into data warehouse automation. And I'm like, it's still the same thing, still the same way of doing it,
put it in the cloud, automate it, still the same thing. Sure, it may be a little bit less painful,
but we're really slapping band-aids on everything versus going to the root cause.
And the root cause is, why do we need to change the shape of the data? When I took my very first class in school on SQL, I didn't know the queries wouldn't run at scale,
right? I learned the select statement, put it together, boom, run it. It's like, imagine if I only ever took SQL 101, or the first year of SQL, and then took it to an enterprise application or a Fortune
100 company running their business applications
on an Oracle database or any other database, right? We're completely agnostic. I know we
spoke a lot about Oracle EBS. It's my background, but we have customers who are not EBS at all.
On the whole, most of our customers are not. A lot of them have some ERP systems, but those queries won't run, right? There'll be 'snapshot too old' error messages.
You'll have queries that'll never come back.
It's just a mess.
But what if that did work? What would that mean, if I could just run the SQL, very rudimentary, in the way that I thought I could but was never able to?
And I kind of scratched my head.
And I remember at Oracle, I would go to Ahmed Alomari or Lester Gutierrez, who were like the performance gurus, right?
Super smart guys. I wasn't smart enough to figure out how to get my SQL to work.
And then they say, oh, we need to denormalize. We need to take this data. Let's get rid of that
join. And we would do all these things, even within application development. We would say,
if you're doing a join just to get like a status code or something, let's put the status code on the transaction.
Let's get rid of that join. It's going to be faster. If it's more than three fields, okay,
then we'll leave it out. We started to have to do all these things to constantly work around
this one limitation. No one's been able to fix it until now. Now that we've changed it, it changes the approach, but everyone is so gung-ho going down these paths. I'm just saying, look, we're all going in the wrong direction. It's innovation down the same path, and it's the wrong path. There's a different way, and that's where we need to be going.
Fantastic, fantastic. So I'm going to ask you in a second how people find out more about Incorta, but before I do that: you're tackling now the problem of ETL and so on. To your mind, what's the next kind of customer problem that you see out there that hasn't been addressed, or the next challenge, the next kind of, I suppose, in a way, speed bump in getting analytics into people's hands?
I think that's around self-service. I think what we're seeing, right, is
people are coming in and, almost like back in the 90s, I don't know if you remember, on people's resumes and CVs they would put Microsoft Word and Microsoft Excel as some of their skills, right? It kind of seems ridiculous to put that on your resume right now; like, I know PowerPoint or I know Keynote, people kind of laugh at you. Today, I think the new one is data-driven, or, you know, can do analysis and things like that. And so you've got all these people, and a proliferation of content. I've seen companies with 17,000-plus dashboards
that they've created and they have no clue what's going on with them. You ask them,
which ones have been used? No idea. But then they want to move, you know, to my platform, and say, well, I need a migration path because I've got 17,000 reports. I'm like, 17,000 reports? What in the world is going on?
And this is only going to get worse.
I think it's only just begun.
And so are you able to answer questions?
And how do you manage that, right?
How do I bring formal process to things that maybe get viral, right?
Someone creates a dashboard and then shares it with someone.
And then all of a sudden that becomes the hot thing.
That probably should be productized.
Someone should look at it and say, wow, is this actually correct? Are people using the right data?
I think people are using data like a hammer and they're going around hitting people.
I've heard nightmare stories of people literally laying off people because of data and then finding
out the data was wrong. I'm like, that's pretty bad, right? And, you know, no one died, but hey, that's affecting livelihoods, and the ramifications of how we use data in that way, I think, need to
be figured out. And so one of the things I'm pretty passionate about is how do you bring
sanity to what we're building? And I kind of feel like the illustration is this. It's like
before we had CRM software, people were just managing their sales and they
would kind of say, I think we look good for the quarter or whatever. When it comes to analytics,
I feel it's a little bit like that. What's the analytical app that people use to deliver
analytical applications? How are people doing A-B testing? Are the analytics you're building
actually even doing anything? I contend that a lot of these dashboards that people build are pretty much the most expensive
pieces of virtual art sitting on a virtual corridor that maybe nobody looks at.
And even if they do look at it, maybe it doesn't even change or move the needle in any shape
or form.
How are you measuring that?
How do you know which users actually use your data?
Which dashboards are actually being used, which reports are useful, and which ones actually you can attribute investment as being worthwhile, right?
All of those things, I think we've got to figure out because this proliferation of people saying
they're data savvy, that they know what they're doing, and just give me access to the data.
And then you get in these rooms and you have people saying, well, my report says this,
and someone says my report says that.
And then everyone's scratching their head going,
you know, which one's real.
You know, we've got to come to a point, I think, of being data literate, in the sense that we really understand
how to use data and how to ask questions
and really how to challenge data.
I see sometimes that people don't do that. You know, we spoke about journalism, and the rigor of investigative journalism and how that's dying, and people only read the headlines, and all those kinds of things.
I feel like that's happening with data. People just read the headlines. They don't dig into
the details. And often they couldn't, because all they had was aggregate data. How could you drill
down to the details when you never had them?
It became super expensive to say, okay, here's an aggregate number.
We're down on this product.
I should not sell it anymore.
But then how do I drill into that to actually see what's going on, to understand exactly
what were the transactions behind it?
How do I have confidence that the high-level aggregations I'm looking at are correct? Because I would contend that probably over 50 to 60% of the time,
somewhere you have data problems that you don't even know you have.
I've worked with customers that have literally had values reported to the street that were incorrect.
And then they found out.
And nobody wants that.
Interesting.
I'll look out for your thoughts on that in the future then,
because I totally agree.
Yeah, I absolutely agree.
I think that's one of the next big challenges, really.
So how would people find out about Incorta, and how would they, I suppose, get to experience the technology and get to try and, in a way, test out what you're saying here, really?
I mean, it sounds fantastic.
What's the next stage in establishing if this is the right thing for them?
Yeah, I think, obviously, there's incorta.com.
No need to even say that, right?
But there's two things.
The first thing is to understand the problem that we solve. I think a lot of people sometimes look at Incorta and they just bring in, like, a single-source data set, right? A single flattened table, and then just evaluate it as if it's a visualization tool. Completely the wrong way of doing it.
And if you're going to do that, I'd honestly say you could find better products at this
point.
If you were to bring in a highly complex data set, something that more closely mirrors the backend systems that you have, the application data models that you have, and leverage those, I guarantee that you'll find nothing that comes close to Incorta.
So there's a number of ways you can learn more, right? On our website, you'll find there's
blog entries, there's eBooks, there's webinars.
There was a webinar, actually just yesterday, where Keysight demoed their live system. We have another one where Shutterfly demoed their live system, showing Incorta, showing how they're using it and how it's transformed their business for them.
There's also, you can reach out and schedule demos.
We're happy to show it.
But as I mentioned, there's a lot of skepticism. People who are not skeptics generally don't understand the problem. Those who are skeptics become some of our biggest ambassadors, tell others about the product, and become real passionate champions of it. So I've had conversations with diehard data modeler types, who actually make a living presenting at data warehousing conferences, and spent days with them. And literally, when they get it, they're like, oh my goodness, this really is changing everything in terms of what we've been doing and how we've approached analytics, data warehousing, ETL, and just what we're doing in that space.
Okay, fantastic. Well, brilliant.
Well, it's been great speaking to you, Matthew.
Appreciate you coming on the show
and good luck with the product.
And yeah, hopefully some people will kind of check you out
and maybe get some kind of benefit
out of what you're doing.
Great. Thanks, Mark.
It was a privilege speaking to you.
It was kind of fun reminiscing
about some of the old times at Oracle
and seeing where we'll head in the future,
but great chatting.
Warehouse Builder. Excellent. Made me laugh.
Cheers.
Okay.
Take care. Thank you.