Drill to Detail - Drill to Detail Ep.66 'ETL, Incorta and the Death of the Star Schema' with Special Guest Matthew Halliday

Episode Date: May 27, 2019

Mark Rittman is joined by Matthew Halliday to talk about the challenge of ETL and analytics on complex relational OLTP data models, previous attempts to solve these problems with products such as Oracle Essbase and Oracle E-Business Suite Extensions for Oracle Endeca, and how those experiences, and others, led to his current role as co-founder and VP of Products at Incorta.

Links:
- The Death of the Star Schema: 3 Key Innovations Driving the Rapid Demise
- Accelerating Analytics with Direct Data Mapping
- Accelerating Operational Reporting & Analytics for Oracle E-Business Suite (EBS)
- The Good, the Bad, and the Ugly of Extract Transform Load (ETL)
- E-Business Suite Extensions for Endeca: Technical Considerations
- The Pain of Operational Reporting Solutions for Oracle E-Business Suite (EBS)

Transcript
So hello and welcome to the Drill to Detail podcast and I'm your host Mark Rittman. In this episode I'm joined by Matthew Halliday who's previously an applications architect at Oracle and now more recently he's actually a co-founder at Incorta. So Matthew, do you want to introduce yourself really? And welcome to the show. Yeah, thanks, Mark. It's great to be here. Yeah, so as I mentioned, I started out my career at Oracle. I got tricked into joining the company in some respects, or at least I didn't really know what I was getting into. I joined Oracle because I saw a fancy brochure and I thought, wow, that looks like a cool company. This was back in the late 90s when the internet hadn't really taken off in the way that it has now. And so I joined this company and on day one,
I was sitting next to an accountant, with an accountant in front of me, for a week and a half, talking about general ledger, accounts payables, accounts receivables. I thought, what in the world have I got myself into? Fast track a few years, and I ended up becoming the applications architect for all of the financials and procurement products within Oracle. And with that came a lot of responsibility for actually creating the very first Oracle Fusion environment, where I had to source 55,000 tables, which is, I'm trying to think, maybe the largest ETL process in the world, I'm guessing, and bring all those objects in and create that Fusion data model, bringing PeopleSoft and Oracle EBS together. I then went to work at Microsoft. I wanted a change of pace, spent a few years there working on their enterprise applications. That was kind of a fun ride. And then Oracle kind of coerced me to come back and join them as an applications architect, working with some of their largest customers throughout the United States and helping them understand and leverage the Oracle technology for their business applications. It was at that time that I met one of the co-founders of Incorta and we worked very closely on some products together. And then one day, he shared the idea for Incorta. And it was at that point that I was like, I really couldn't go back to what I was doing. I was just so excited about the potential and the possibility of what Incorta could provide
that I was all in from the get-go. Fast forward again five years, and I'm responsible for products at Incorta, working with our development teams and working with our customers to build out this next generation, or next transformative change, I would say, in the business of analytics.
Okay, okay. So you were surprised, you said, when you joined Oracle and it was about accounting. I remember speaking to Mike Durran, who's one of the PMs in the Oracle Analytics team, and he said when he joined Oracle in the late 80s, he thought it was the actual Teletext company that competed with Ceefax. So I imagine the surprise he had when he found himself product manager for Discoverer shortly afterwards. Yeah, that's quite funny. I remember those days, when you used to find out what was going on and what was going to be on the TV and the channels, et cetera.
Yes, the football scores and that sort of thing. But, yeah, what interested me about speaking to you was the fact that, as you say, you were one of the architects working on, I suppose, the ETL processes and data migrations into the Fusion tables, the Fusion apps. And I suppose how that has fed into what you're doing now with Incorta, and how you're trying, I suppose, to address some of the challenges, you know, some of the things customers were trying to do then that were a challenge. I mean, take us back then, really, to when you worked at Oracle and you were working on that task there, of that ETL process and those ETL routines. Customers were speaking to you about what they were trying to do. What are the challenges that you were seeing there at the time that have led, I suppose, into what you're doing now at Incorta?
You know what? It really spans back to a presentation I remember Larry Ellison gave at Oracle,
saying that when he would ask for just the number of employees at Oracle, it would take them a couple of weeks to give him the answer. And I was kind of blown away, right? With all of this technology that we had, with this immensely powerful database, why couldn't we answer those simple questions? You know, some of it at the time was down to, you know, the global single instance, and merging all these different instances of ERP into one. And then once we did that, we ended up creating another monstrosity of a problem: an incredible scale of transactions on a highly normalized data model, where we had all of these join relationships, which was fantastic for OLTP-based applications.
So when you're updating, inserting, they work really great. But when you want to do analysis across everything, it became almost impossible. And so one of the things, especially in the finance sector, that we were trying to address is, how do you take something like highly aggregated GL postings and then be able to drill down into those details? And so Hyperion became a dominant player in that space.
And then Oracle acquired Hyperion, and we spent a lot of time figuring out, how do we merge these two together? And so one of the projects I worked on there was where we brought Hyperion in and started maintaining the cubes as well as the OLTP, and then trying to create connections between them. It's always been this challenge of going from aggregate all the way down to detail. So that's always been there.
And because we've not been able to really get at that, customers have had to go down other paths. So we look at, okay, let's take the data. Let's take it from 55,000 tables, which reside in the Oracle EBS environment, and let's try and reduce that to a single big fact table with a bunch of dimension tables around it. Because largely, we were just trying to work around the data model, which worked really well for data entry, for updates, and for concurrency of users. But for analysis, it didn't really cut it. And that's where the data warehouse came out of. And that becomes a whole different beast and animal in its own right: bringing all this data in, connecting, transforming the shape, flattening data, aggregating it, understanding the requirements of what questions people are going to ask; it all becomes incredibly complex. And whenever someone wants to make a change, it felt like it was a house of cards, right? To go back and say, okay, I'll make this change, and then, well, how does that affect all my ETL processes? How do I really understand that this is not going to mess things up? And it became really, really error prone, very, very difficult, and certainly nothing that someone could just take off the shelf, purchase, and start getting instant data from. So it's always been this painful challenge. And in some respects, I think people take pride in the fact that they're fixing a really complicated problem, and using these complicated tools, and doing it in a complicated way, because it really shows, hey, we're smart, we can do this. But also, there's got to be a better way of addressing this common problem of going from aggregates to detail, and having flexibility at the same time to navigate where you want to go. Okay, okay.
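To make the flattening Matthew is describing concrete, here is a minimal sketch in Python with SQLite. The table and column names are hypothetical, not from the episode; the point is how the join paths and the choice of columns get baked into one wide fact table up front, which is exactly why any later change ripples back through the whole pipeline.

```python
# A minimal sketch of classic ETL flattening: a normalized OLTP model is
# collapsed into a single fact table, baking in assumptions as it goes.
# All table and column names here are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized OLTP source: great for inserts and updates, awkward for analysis.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EMEA'), (2, 'AMER');
INSERT INTO products  VALUES (10, 'Pens'), (11, 'Paper');
INSERT INTO orders    VALUES (100, 1, 10, 25.0), (101, 2, 11, 40.0);
""")

-- # The transform step: bake the joins and the chosen columns into one wide
# fact table. Any question not anticipated here needs a pipeline change.
conn.executescript("""
CREATE TABLE fact_sales AS
SELECT o.order_id, c.region, p.category, o.amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = o.product_id;
""")

for row in conn.execute("SELECT region, SUM(amount) FROM fact_sales GROUP BY region"):
    print(row)  # e.g. ('AMER', 40.0) and ('EMEA', 25.0)
```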
So I remember sitting in an OpenWorld presentation about that link, I think, between Essbase and the BI tools and EBS, and at the time thinking that was, you know, I suppose quite a brave challenge to take on really, to do that. And I don't think it ever went any further than that. But one thing I saw that you were also involved in was the Endeca extensions to EBS. And again, I can see, in a way, parallels, or certainly a kind of common theme there, with the things you're doing now at Incorta. What was that kind of product, or that initiative, about at the time?
Yeah, so, you know, I remember when I first saw Endeca and the EBS extensions for Endeca, I was kind of really blown away. I thought, this is really, really cool. I got super excited about that multifaceted search capability, that you could, you know, drill down to the details that you wanted to see. And so I remember I kind of got my hands on the Endeca product and I pulled the Oracle bug database into Endeca and started looking at it that way, partly because of my frustrations on how to navigate around bugs inside of the Oracle tool set. What then happened is I started going and talking to customers and presenting to them the EBS extensions for Endeca. And the room would change.
People would get so, like, you know, on the edge of their seat; you'd see them paying attention. They would put their phones down. And everything was going great, right? They would say, this is amazing. This has brought new life to Oracle EBS. And it really had this great promise. Everything went well until there was always one question. And it's like, can this replace my data warehouse and OBIEE?
And it was always, no, this is an addition. It handles current data sets, maybe three months' worth of data. And there were just a lot of problems, because one of the things with Endeca is you needed a flattened dataset, right? And to do that, you had to put those against these views. So we created these Endeca views inside of Oracle EBS. The problem with that is those views were really hard to tune. And so we spent the majority of our time not building dashboards, but working on performance tuning of the views just to pump the data out.
And so you couldn't get to near real time, and you couldn't bring in the data volume people wanted. And that's when I saw people just kind of say, okay, it's become just a UX improvement for, you know, navigating my open transactions, and I don't think I'm going to go down that path. So it didn't get the traction that I thought it was going to get at the beginning. That was the thing that, with Incorta, kind of got me excited.
Yeah, yeah. I mean, I think the point, I suppose, to me is that it's pointing to a need there, isn't it? You know, the fact that people were on the edge of their seats, and they did put their phones down, it must have sort of said to you, you know, this is a problem that has yet to be solved, and there's a lot of value in it for customers if you do that.
Yeah, no, absolutely. I mean, at Oracle, I remember whenever we'd get our quarterly results, our VP would have, you know, a spreadsheet. And they would go through and do analysis and would say, how was the quarter? How did our products do? It would take a few weeks before they could ingest and digest that data and synthesize it in a way that we could then learn about, you know, how our products were doing in the marketplace. That's with all of the horsepower of the Sun acquisition, all of the engineering and product development; we still didn't have what I would say is really, you know, freedom and data access to look at the data that was pertinent to our business, to understand how
to use it. Okay, so the other reason I was keen to speak to you is, I think you put out quite a provocative blog post recently, which was, you know, on the problem with ETL and star schemas, generally making the kind of point, I suppose, that our current way of trying to build data warehouses and do analytics on these complex OLTP-type systems is kind of broken and not fit for purpose, I suppose, really. I mean, maybe starting at the start of that post and talking about, I suppose, the roots of relational databases and how they come out of the kind of Codd and Date rules for data normalization: what is the inherent problem, really, in trying to do analytics on these types of data sources? And how does ETL, if anything, make this kind of worse, really?
Yeah, so I will say there's two parts to ETL, right? There's the good, which I would say is the data enrichment. You're bringing additional value to the data. Maybe you're cleaning up the data because of duplicates, or you're creating additional business rules. Maybe there's, like, revenue, for example, where you want to factor in royalties or things that you might have to pay, right? Those are kind of bringing value to it. The other part, the dirty, ugly kind of underbelly of ETL, which I think is the core problem, is that you need to transform the data. You need to take the shape of the data and then make some assumptions on: what are the columns that are important? How do I want to aggregate this? How am I going to put this into a data model that I can slice and dice? And the first question you have to ask, before you can even start that process, is you need to go to the business and ask them, what are the questions that you're going to ask for the next three years of your data,
so I can go build a data model to satisfy it? And I'll come back to you in nine to 12 months. Well, for one, that question to the business is completely unrealistic, right? Imagine our conversation today if you could not ask a follow-up question to anything. I'd say something, and you're like, wow, I didn't anticipate that, I wish I could ask a follow-up; but, well, maybe next year we'll have that. The conversation would be really jagged and weird. And it would just be like, well, that's not how we converse. That's not how we think. We make connections and see things as things develop. And so that's kind of one of the core problems. Now, what I think is fundamentally the root cause of all this is a few things. So first of all, we started off predominantly disk-based with storing our data, right? We started off with very little memory. When I was at Oracle, I remember when we got, I think it was, like, a 52-megabyte memory machine or something like that. It was like, wow, so much memory, I can give one and a half megs to this Oracle instance. Everything was disk-based. And then it was also stored at a row level, right? So all the information that you needed for a single transaction would be held in those pieces. Now, when it came to slicing and dicing it, that became problematic, because we were storing the data
in a way that wasn't conducive. So the first thing we did is, let's put things in a columnar format. Great, because when I do analysis, I normally look at column A and column D; I don't look at all of the columns for everything. And so storing it in that format lent itself to that. The second bit was memory. Memory all of a sudden plummeted in cost, and we saw memory footprints increasing. And that was phenomenal and great, and of course, we started to leverage that, right? And that made things a lot faster. There's one final bit that was kind of never, ever resolved. And that was: how do I join data? How do I go from one table to another without having to go through this hash table join function? Something that doesn't get too complicated, that's not order n log n, or exponential as you add more joins, but is linear, right? How do I get to the point where I can run a query and literally have 60 tables in one single select statement, and have that perform at scale, when I'm not just looking for a particular
month with a particular cost center, but I'm wanting to look at all my cost centers across all my months of business, right? That has been the Achilles heel of everything. And it's because of that that we've had to change the shape of the data. We've had to put it in a way that we can leverage some of these great advancements we've seen in terms of analytics. But largely, I would say the innovation we've seen in the last 10 years in analytics has been around flattened, singular data sets, which is great if you're on IoT or clickstream data. But if you're analyzing business systems, your ERP systems, your supply chain, your ITSM, all of those applications, that's not how your data is stored. And that has always been the major problem. It's stored in a format you could never use.
Okay, okay. But so what about, I mean, you're ex-Oracle, so wouldn't the solution to this be, say, putting it into an Essbase cube?
Yeah, sure. I mean, if you could define... well, there's two problems, right, with that. One is you've got to, again, know all the questions you want to ask up front to define the cubes, because the cubes are going to define what you want to look at. So if we would say, you know, how many pens do you have? Okay, great, we're going to roll up the number of pens we have, and we'll have that, and we'll store
pens by month, and that'll be one of the, you know, dimensions of your cube. The next day you come in and you say, well, I need to know how many of those are black pens and red pens and blue pens. Like, well, we don't know that, right? We have no insight into that level of detail. So then you say, okay, let's change the cube structure. Let's go ahead and then create the roll-up by color of pen. And then someone says, well, what about dry erase pens versus ballpoint pens? You're like, well, I didn't know you wanted that. So let's go back and do it again. And then all of a sudden, now you're multiplying the number of types of pens, times the number of colors, times the number of months. Before long, your cubes don't perform. You know, they perform well up until maybe 5 million records. Beyond that, good luck. And so with the maintenance of all these cubes, the loading of the cubes, and the ETL you still have to feed into those cubes, it's not going to cut it either.
Okay, okay. And the ETL part itself, I mean, just the sheer time it takes to build these mappings, to change them as data changes, and so on. I mean, I haven't seen a full-blown data warehouse project in years, really, where people have invested, who've committed to a kind of project of several years of ETL development to do this sort of thing. I mean, do you find again that there's less appetite for that in the market now?
Yeah, it's interesting. You know, I'm definitely seeing ETL processes still there,
especially with large organizations that have been around for a while. What I am seeing, which is pretty interesting, is there's two things I think going on. One is there's people who've never built a data warehouse in their current company, right? So maybe they're a growth company, and they've got to this point where they're like, we're at that point where we kind of need to go to that next level, but we've got, you know, challenges with, or concerns about, taking the data warehouse approach. And so people say, what alternatives do I have? I know that that's fraught with expense and problems and maintenance, and it's restrictive. And generally those projects don't bring customer delight to your end users. The flip side of what I see, in those larger organizations, is that there's some activities taking place where they're looking at BI modernization, looking at the analytics footprint and figuring out, how do we change this? And there's an appetite to get out of the ETL business, because I've been at companies where they literally employ two people permanently just to keep their ETL processes running for accounts receivable. Just one area of their E-Business Suite. And they have two people because they always have problems. There's things they need to check, and they need to make sure that those reports are correct. I've seen countless customers where, prior to Incorta, they have these ETL processes and then, through no fault of their own, they have human error in them, right? It's going to happen. When they go to Incorta, they say the results don't match. And then we find out that they were living with problems in the ETL process they didn't know they had, because it's not easy to get right. It's not something you can just pick up and do. You need to have really good technical skills in a particular product. Plus, you need to understand the business requirements and the application. So it's a very unique role, one that understands so many different areas, and it's really become super difficult to sustain this with any kind of sanity and reality.
Okay, okay. So let's move on then to Incorta. I mean, just give us, I suppose, the elevator pitch, what the product is, and then we'll go into a bit of detail, really, of how it solves these problems you've been talking about. So just give us, I suppose, the high-level overview of what the product is, really.
Yeah. So the way I like to refer to Incorta is: it's an analytics platform that really enables you to ingest your data as is from the source system, so you don't have to do all those ETL
processes, and then run queries against that, whether you use Incorta's own visualization or something else like Power BI or Tableau or MicroStrategy or even Excel, against huge volumes and scales of data that you would not even come close to seeing come back, or perform, in other systems. So we have examples of customers with transactions where, in their systems prior to Incorta, a standard report was taking six to 20 hours to run. And then they're able to get that exact same report, with the additional benefits of extra visualizations and ways of looking at their data,
and have it literally come back in sub seconds. It's pretty unbelievable when you see it. Most people are skeptical. Most people don't believe it's true. And I'm always a great fan of that because I say that means we're doing something really exciting. If you could believe it, it's probably a marginal improvement. When people just say, I don't believe you, the only thing I say is, that's great. You understand the problem. You understand what we're saying is pretty audacious. But then I just say, well, we've got some amazing customers who are all referenceable with 100% renewal and retention that will speak to you about how this is true. And then if that is true, really the onus is on you to figure out what does that mean for your business? And to ask that question, if these claims are true, what does that mean? Because if those are that transformative to your business,
really the onus is on you to figure out, is this real, or are we just making this up? And obviously, with the customers that we have, the renewal rates we have, I'm definitely in the camp that this is changing the way we approach analytics. Okay. So you're responsible for product within Incorta. So just give us a bit more substance about what the actual product is, in terms of how does it handle the ETL? What does it store data in, and so on? Maybe kind of a bit more of a technical take on how it actually solves these problems.
Yeah, sure. So the product, first of all, you know, one of the first questions I get is, is it in the cloud? Is it on-premise? And the answer is yes and yes. So it really gives a good option, or a capability, for people who maybe have cloud in their strategy but are not ready to go there quite yet: they can install Incorta on-premises. We have a close relationship with Azure, and you can go on Azure or AWS and run Incorta as well, so we work in both of those environments. It's a complete software solution: there's no hardware, there's no GPU-type processing on it, so we're platform agnostic from that perspective. Everything's built around HTML5 and JavaScript. So there are no tools that you need to download for any part of the system, from configuration and installation all the way through to application users going in and looking at their data, or analysts looking at their data. The whole thing is built around that HTML5 interface. The core heartbeat of Incorta is definitely this direct data mapping engine which we've created.
And a little bit about what that engine is and what direct data mapping is. In traditional databases, you have 'gather schema statistics', where you go out and profile the data and generate some schema stats, which your cost-based optimizer will then leverage to make informed guesses and decisions around the execution plan for a query. With Incorta, we've removed that, made it completely redundant. There is no cost-based optimizer inside of Incorta. When your queries run, they run immediately. And what's unique is this data map that we have: it really provides the ability for us to know exactly how data in one table relates to data in other tables. And so it doesn't have to guess, should I filter by the city or by this product before applying filters, and work through an execution plan; it literally will understand, oh, this transaction relates to this one, and it knows almost how to directly get there. It can just kind of jump to that data point and get it, without having to sort the table and go through all of that complexity. That's kind of the heartbeat, and what unlocked a lot of the ability to remove ETL and those things. The product also has this ability for you to very easily, with a schema wizard, just point at source systems, whether those are applications like Salesforce or ServiceNow or NetSuite, or whether those are databases that you already have, right? Any kind of database you have internally or in the cloud, you can connect to those. And then also big data technologies like Kafka, for example: you can hook Incorta up to that and bring the data from those applications, or from those data sources, into the Incorta platform. That process, we can manage: the orchestration, the loading. We generate Parquet files, which are an open file format standard. We keep it that way so your data is not locked into some proprietary format that only we can understand. We keep that data for you. And then we leverage that data with our direct data mapping engines that can give you this query performance on top of it.
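Incorta hasn't published the internals of direct data mapping, so the following is only a toy illustration of the general idea as described here, with made-up data: resolve the join relationships once at load time, so that at query time a join becomes a direct positional lookup rather than a hash-table build and probe, and no cost-based optimizer is needed.

```python
# A toy illustration of the idea behind direct data mapping (not Incorta's
# actual implementation). Joins are resolved once at load time; at query
# time, each child row jumps straight to its parent row by position.
# All data here is hypothetical.

# Columnar storage: one Python list per column.
orders_customer_id = [1, 2, 1, 2, 1]
orders_amount      = [25.0, 40.0, 10.0, 5.0, 30.0]
customers_id       = [1, 2]
customers_region   = ["EMEA", "AMER"]

# Load time: build the data map once (child row -> parent row position).
pos_of_customer = {cid: i for i, cid in enumerate(customers_id)}
order_to_customer_row = [pos_of_customer[cid] for cid in orders_customer_id]

# Query time: "total amount by region" with no hash join and no optimizer;
# the mapping already says exactly where each related row lives.
totals = {}
for row_pos, amount in zip(order_to_customer_row, orders_amount):
    region = customers_region[row_pos]
    totals[region] = totals.get(region, 0.0) + amount
print(totals)  # {'EMEA': 65.0, 'AMER': 45.0}
```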
We've also provided a SQL interface to it, too, right? Because, you know, there's the commoditization of visualization tools: a lot of people say, I'm a Tableau shop, or I'm a Power BI shop. And so we wanted people to realize you don't have to use our visualization; you can connect a different product to it. What our visualization tool gives over others is that it has that tight lineage between the data in the platform and the analyzer tool. So if you're looking at things, you can see a sampling of the data, you can look at what the definition of this column is, whether there are any descriptions about it. So all the metadata around the data, to curate meaningful data sets, is kind of brought to the surface, so those who are generating insights are able to look at it. So that's, in a nutshell, kind of the main bit. There's one final bit, right?
We've seen a lot of talk about AI. And so we also have PySpark embedded within our platform. And we can leverage that so you can build machine learning algorithms inside the platform, orchestrate those, and have transactional data merged with, or joined with, AI-based models and the output of those AI-based models, so that business users can then slice and dice that and interact with it, without having to go through some lengthy data science kind of route.
Okay. I mean, there's a lot of products and a lot of technology in there, really. I mean, you mentioned earlier on that you were involved in bringing together data from, say, I don't know, PeopleSoft and EBS and so on. And presumably part of the challenge of that was to come up with a single kind of customer master, or to, uh, pick up customer records from different systems? I mean, how does Incorta help with that kind of problem, really?
Yeah. So when I was at Oracle, I was working on creating that very first Fusion development instance. And that was largely, at the time, Oracle Warehouse Builder. We took Warehouse Builder and actually created a program to automatically generate the graphs to run as part of the ETL jobs. And that was taking these objects from EBS Release 12 and then pushing them directly into Fusion, and also bringing the data along with it. It became pretty challenging, of course, when you would have things like your TCA models or your customer definition, which would span across both: which was going to be the source of truth? There was no UI for developers, so they couldn't enter data. It had to be done by pulling those things in. So there was a lot of work around doing that, to get something that the development teams, who were focusing on the back end at the beginning, the EOs and the VOs and the AMs, for example, could use to get up and running. So it was, you know, a full-time job just to kind of manage this. And,
you know, probably that's when I went completely bald, during that time. And so that's kind of the fundamental challenge, if you like. In the world of Incorta, it's kind of been fun to see companies... I worked with one company that literally had 40 ERP systems, which, you know, I kind of hadn't heard of for a while, so that kind of surprised me. But with all of these ERP systems, within Incorta they're able to create one data model and feed all of those in, using multi-source queries, just to be able to bring them in. And again, because the data isn't fundamentally having to be changed, it becomes really easy. You can just bring the tables in,
replicate them. We literally have examples where we've taken disparate data sets and brought them in in half a day, right? You go in, install the software, hook it up to a couple of systems, bring in the data. And that same day, you're seeing insights where customers literally said, I would expect six weeks, six months before I see anything. We had one example around product profitability. There was a customer that has 28,000 stores worldwide, and they wanted to analyze all of their SKUs by store location. And they knew at a regional level what was going on, but they didn't know, down to the store level, for every single SKU, for every day, what was going on and how profitable each one of their products was. They'd allocated about $2 million for this project and about a year to do it. With Incorta, it was done in 10 weeks. They were able to bring that in and turn that around. And now, imagine, the business users have an extra nine to 10 months of access to data that they didn't have before. Plus, the other beauty is they have access to 40,000 columns' worth of EBS data in this particular case. And they didn't have to make those assumptions of, like, well, what's the data we want to bring in? It's all there. At any point in time, they can say, hmm, that's interesting, what if I slice it this way? And they're able to go in, add that filter or add that column, and make that change. And it literally becomes like a 20-second exercise, where our customers are telling us, prior to Incorta, that would be 10, 12 weeks for any change to their data warehouse. Okay. So obviously the ETL side sounds interesting. How do you then store the data in such a way that
you can slice and dice it by any way you want? It sounds like you've got some of the performance of an OLAP cube, but you've got also the, I suppose, the flexibility of a relational database and maybe kind of, I suppose, the openness of, say, a NoSQL key value store database. I mean, what is the kind of the underlying database engine technology that you work with, really?
So we actually built the Incorta engine from the ground up. It's 100% Incorta designed and engineered. And we leverage the data being stored in a columnar format in Parquet. But once we have that data there, we use our engine to run against that. And we've really focused on the one use case. We didn't want to make a multipurpose database that could be used to run your business applications on. We wanted to create something for the sole purpose of doing analytics. And so we really focused on that.
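As a side note on why columnar Parquet suits this workload: an analytical question usually touches only a couple of columns, and a columnar reader never has to decode the rest. A minimal, self-contained sketch using the open-source pyarrow library (the file name and columns are hypothetical):

```python
# A minimal sketch of columnar reading from Parquet: only the columns a
# question touches are read. File name and column names are hypothetical;
# requires a reasonably recent pyarrow package (pip install pyarrow).
import pyarrow as pa
import pyarrow.parquet as pq

# Write a small demo file so the example is self-contained.
table = pa.table({
    "cost_center": ["100", "200", "100", "300"],
    "month":       ["2019-01", "2019-01", "2019-02", "2019-02"],
    "amount":      [12.5, 80.0, 7.25, 41.0],
    "description": ["a", "b", "c", "d"],  # a wide column we never read back
})
pq.write_table(table, "gl_postings.parquet")

# The aggregation touches two columns; the others are skipped entirely.
subset = pq.read_table("gl_postings.parquet", columns=["cost_center", "amount"])
totals = subset.group_by("cost_center").aggregate([("amount", "sum")])
print(totals.to_pydict())  # {'cost_center': [...], 'amount_sum': [...]}
```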
And you can kind of think of it as we built a race car, right? And when you build a race car, it's very different than building a luxury car for people to commute in. So there's no AC. There's no electric mirrors and all those kind of things. We made it go really, really fast, but we made it handle what you want to do in analytics as the primary use case. And so with that, these engines are able to, there's actually multiple engines,
but they all leverage the direct data mapping. So we have a pivot engine, an aggregation engine, a search engine, a filtering engine; all of those things inside of Incorta are specifically built to do that task. And they all leverage this direct data map, which really is the secret to how Incorta is able to give this earth-shattering performance. It's one of the things that, once people understand what this is and how transformative it is, it kind of never ceases to amaze us how fast it is. You kind of forget sometimes after you've been using it, and I've been using Incorta now for years and years. And when I actually go back and sometimes see customers' environments, I just forget. I hired someone a few years back, and after six months, he said, I just had my most frustrating day ever. I was like, oh man, what happened? He goes, well, I wasn't using Incorta. I had to use, you know, I won't say the name, this other product, and it was so frustrating. I couldn't get anything out. It took me hours just to get five metrics to do a comparison. And sometimes you just forget. We get so adapted to change that people don't realize their systems shouldn't be the way they are, and that there is a better way of doing it. And then customers that get into the Incorta way, they forget what the world was like. It's amazing how our memories are like that.
Okay, okay. So talk me through a typical, I suppose, onboarding and, say, development process that you'd have. Imagine I was a customer and I had EBS and I had Financials and I had PeopleSoft and whatever. How does the onboarding process go, and how does a typical, I suppose, first engagement go to work with this, really?
Yeah. In reality, as I mentioned, no one believes us,
right? People are skeptical, and they say, no way, you're doing something behind the scenes, I don't believe you. Now, what generally happens is, you know, we do POCs; no one's generally willing to buy a product when they think, I can't believe that's real. What that looks like is, let's say in the case of EBS, we'd go to a customer and we would install Incorta in about 20 minutes, everything's up and running, and then we'd connect to the data source. That generally is the most difficult part: making sure that the servers we have actually have the ports open so we can create a JDBC connection. It's kind of funny, but that's the most difficult bit, right? The moment someone can give us a valid JDBC connection from that box, we're kind of off to the races.
Then within 15 minutes, you've probably brought in their accounts receivable data or payables data and have some dashboards up and running. Literally, we have application modules that you can run through Incorta. You can say, hey, I'm interested in this particular topic, and Incorta will do the data lineage and figure out, these are the objects I need, here are the joins. If you're familiar with EBS, there are no foreign key relationships in the database, but we've built in automatic detection that will say, these are the joins that we know about, and we'll deploy those for you. And so you don't have to do that work. And then you can just start slicing and dicing on it. And so really, customers see that and they go, wow, that's really unbelievable. And then they just want to start pushing the limits, and they might spend a couple of weeks just using the product, trying to find out, can it do more? Can it do more? Can it do more? And they keep throwing more at it, right? Bring in more datasets. Well, let me see if I bring in this other dataset, or this legacy dataset. Can it handle this stuff I have on mainframes? We're seeing customers go down those paths as well. It's been pretty extraordinary to see the different use cases getting thrown at us at this point.
Okay, okay. So I suppose then the elephant in the room we've not really mentioned here is Oracle themselves, and there's the BI Apps, for example, out there, which I think is in one of its transition periods at the moment.
But particularly, you know, customers are being encouraged to move to the cloud, and there's solutions coming along there. You know, if a customer said to you, this is very interesting, but we're thinking about moving our data into the cloud and we're thinking about a packaged app solution in general, you know, what would your reaction to that be? How would you kind of potentially position your product against maybe a packaged solution running in the cloud?
Just doing the same thing you've done for 20 years, but just, you know, in the cloud. Sure, you get some elasticity, or you don't have to pay for the support of the hardware and the services. But in essence, nothing's changed, right? It's still the same old ETL process behind it. And I'm not going to say there are no benefits, right? I'm not going to say there's no benefit to going to the cloud. I'm a fan of the cloud, absolutely. Going to the cloud makes sense, but that doesn't really change anything. Your business users, how is that going to change their lives, right? Maybe it makes a little bit of a marginal improvement for yourself. And that's what I've seen, right? People say, hey, that's something I understand, something I can get. It's not that disruptive to my flow. It gives me a marginal improvement. I'll go and put this in the cloud and then, you know, use those systems that way. But really, what has changed? I mean, not a lot, right? It's still the same thing behind the scenes.
I think, you know, you'd be too polite to say this, really. But I mean, one of the kind of, I suppose, dirty secrets of any packaged application, really, is that I think it typically sells well to the people that don't actually have to use it. And, you know, a packaged solution is good, but it never seems, often, to be the thing that users actually need. A lot of the content is often thrown away, and either it's not customized to what they want, or the work to customize what they've got is kind of massive. I mean, looking at that as a thing with your product, once you've done the initial onboarding, how easy is it to then evolve what you're delivering, and evolve the analytics as the needs actually emerge, really, so you're not stuck with what it is you did on the first day?
Yeah, great question. Shutterfly is a customer of ours, and they took Oracle EBS, their supply chain, advanced supply chain products, and some other EBS modules around inventory, et cetera, and they leveraged Incorta. So we went in, and within four weeks, or four to five weeks, they were in production on five modules, I believe, on EBS. And what we were able to do for them was we put a semantic layer in place. So we brought the physical tables in as is, mirrored them from source. We then had a semantic layer because, quite honestly, your analysts don't want to deal with tens or 20, 30, 40, 50 tables when they're doing analysis. They want to look at somewhat flattened views of the world. The problem is that the flattening of those is very expensive. So we say, just don't flatten them; you can virtually flatten them. You can have a definition that looks like a view, right? A descriptive view. So we have those inside of our platform. And then we gave them some sample dashboards. And then what we found is the business users, those who had never built dashboards before in their lives, went out and
said, I like your dashboards, but I'm going to build my own. And Rachel from Shutterfly went out and built 30 dashboards to run her business. And what's pretty cool is when you look at those: days-on-hand supply, all of these kinds of things that they have available, they're able to look at. And what's really, I think, quite exciting is the willingness of our customers to then share that with us. And so they have been sharing it. Broadcom has been sharing, Keysight has been sharing the application content that they build, and then other customers are able to benefit from it.
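For readers trying to picture the "virtually flatten" idea mentioned above: a generic way to express it (this is illustrative SQL inside Python, not Incorta's own syntax) is a view over mirrored, still-normalized tables, so analysts see one wide table while there is no flattened copy, and no ETL job, to maintain. A definition like this is also portable across installations in a way a curated pipeline isn't, which is what makes the sharing described here practical. All names are hypothetical:

```python
# A generic illustration of virtual flattening (not Incorta's own syntax):
# the physical tables stay normalized, mirrored as is from source, and a
# view presents the wide, analyst-friendly shape. Names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items     (item_id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE locations (loc_id  INTEGER PRIMARY KEY, store TEXT);
CREATE TABLE on_hand   (item_id INTEGER, loc_id INTEGER, qty INTEGER);
INSERT INTO items     VALUES (1, 'PEN-BLK'), (2, 'PEN-RED');
INSERT INTO locations VALUES (10, 'Store 1'), (20, 'Store 2');
INSERT INTO on_hand   VALUES (1, 10, 40), (2, 10, 15), (1, 20, 60);

-- The semantic layer: a definition, not a copy, so there is no ETL to break.
CREATE VIEW inventory_flat AS
SELECT i.sku, l.store, h.qty
FROM on_hand h
JOIN items     i ON i.item_id = h.item_id
JOIN locations l ON l.loc_id  = h.loc_id;
""")

for row in conn.execute("SELECT store, SUM(qty) FROM inventory_flat GROUP BY store"):
    print(row)  # ('Store 1', 55) then ('Store 2', 60)
```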
And because we're not going through an ETL process that's curating data to a certain format, these things are massively deployable across customers. And so you can literally take the work that's been done at Shutterfly and deploy it at another customer. We did that at, like, Guittard Chocolate, where, you know, it's a very small company, a very small IT team, but they're able to benefit from the collective knowledge that our Incorta customers are sharing, and to leverage those and say, oh, this is how Shutterfly looks at this, so we can look at my chocolate bars in stock and days-on-hand of supply. All these kinds of things become very easy for people to leverage across. If you have an ETL process, it's a black box. It's got tons of stuff in it, and nobody has an off-the-shelf ETL box that just works, right? It never stays a black box: you end up pulling it apart and trying to put it back together. I don't care who it is, right? Domo: looks like a nice way to bring it in, but it's still an ETL process behind the scenes. ThoughtSpot: still an ETL process behind the scenes. That's kind of the bit that people don't like to show, right? They never lead with that in a demo. They always show other things. And then when you push them: so how is the data going in? Where's that going? How did you make that assumption? It's like, okay, it's still there. It's a star schema. It's a flattened view. It's aggregated tables, and it's
data pipelines. And then we have companies that are jumping up and saying, well, let's just automate it. Let's just put investment into data warehouse automation. And I'm like, it's still the same thing, still the same way of doing it. Put it in the cloud, automate it: still the same thing. Sure, it may be a little bit less painful, but we're really slapping band-aids on everything versus going to the root cause. And the root cause is: why do we need to change the shape of the data? When I took my very first class in school on SQL, I didn't know the queries wouldn't run at scale, right? I learned the select statement, put it together, boom, run it. It's like, what if I only ever took SQL 101, or the first year of SQL, and then took it to an enterprise application, or a Fortune 100 company running their business applications on an Oracle database or any other database, right? We're completely agnostic. I know we spoke a lot about Oracle EBS; it's my background, but we have customers who are not EBS at all. Most of our customers are not, on the whole. A lot of them have some ERP systems. But those queries won't run, right? There'll be 'snapshot too old' error messages. You'll have queries that'll never come back. It's just a mess. And what if that would work? What would that mean, if I could just run the SQL, like, very rudimentary, in the way that I thought I could but was never able to? And I kind of scratched my head. And I remember at Oracle, I would go to Ahmed Alomari or Lester Gutierrez, who were like the performance gurus, right? Super smart guys. I wasn't smart enough to figure out how to get my SQL to work. And they'd say, oh, we need to denormalize. We need to take this data. Let's get rid of that join. And we would do all these things, even within application development. We would say, if you're doing a join just to get, like, a status code or something, let's put the status code on the transaction.
Let's get rid of that join; it's going to be faster. If it's more than three fields, okay, then we'll leave it out. We started to have to do all these things to constantly work around this one limitation. No one's been able to fix it, until now. Now that we've changed it, it changes the approach, but everyone is so gung-ho going down these paths. I'm just saying, look, we're all going in the wrong direction. It's innovation down the same path, and it's the wrong path. There's a different way, and that's where we need to be going.
Fantastic, fantastic. So I'm going to ask you in a second how people find out more about Incorta, but before I do that: you're tackling now the problem of ETL and so on, but what's, to your mind, the next kind of customer problem that you see out there that hasn't been addressed, or the next challenge, or the next kind of, I suppose, in a way, speed bump in getting analytics into people's hands?
I think that's around self-service. I think what we're seeing, right, is people are coming in and, almost like back in the 90s, I don't know if you remember, on people's resumes and CVs they would put Microsoft Word and Microsoft Excel as, like, some of their skills, right? It kind of seems ridiculous to put that on your resume right now; like, I know PowerPoint, or I know Keynote, people kind of laugh at you. Today, I think the new one is data-driven, or, you know, can do analysis, and things like that. And so you've got all these people, and a proliferation of content.
I've seen companies with 17,000-plus dashboards that they've created, and they have no clue what's going on with them. You ask them, which ones have been used? No idea. But then they come to, you know, my platform and say, well, I need a migration path, because I've got 17,000 reports. I'm like, 17,000 reports? What in the world is going on? And this is only going to get worse. I think it's only just begun. And so, are you able to answer questions? And how do you manage that, right? How do I bring formal process to things that maybe go viral, right? Someone creates a dashboard and then shares it with someone, and then all of a sudden that becomes the hot thing. That probably should be productized. Someone should look at it and say, wow, is this actually correct? Are people using the right data? I think people are using data like a hammer, and they're going around hitting people. I've heard nightmare stories of people literally laying off people because of data, and then finding out the data was wrong. I'm like, that's pretty bad, right? And, you know, no one died, but, hey,
that's affecting livelihoods and the ramifications of how we use data in that way, I think need to be figured out. And so one of the things I'm pretty passionate about is how do you bring sanity to what we're building? And I kind of feel like the illustration is this. It's like before we had CRM software, people were just managing their sales and they would kind of say, I think we look good for the quarter or whatever. When it comes to analytics, I feel it's a little bit like that. What's the analytical app that people use to deliver analytical applications? How are people doing A-B testing? Are the analytics you're building actually even doing anything? I contend that a lot of these dashboards that people build are pretty much the most expensive
pieces of virtual art, sitting in a virtual corridor, that maybe nobody looks at. And even if they do look at it, maybe it doesn't even change or move the needle in any shape or form. How are you measuring that? How do you know which users actually use your data? Which dashboards are actually being used, which reports are useful, and which ones you can actually attribute investment to as being worthwhile, right? All of those things, I think, we've got to figure out, because of this proliferation of people saying they're data savvy, that they know what they're doing: just give me access to the data.
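The measurement being asked for here is itself just an aggregation problem. As a toy illustration, with a made-up log format: track dashboard views, then diff against the catalog to find the "virtual art" nobody opens.

```python
# A minimal sketch of "analytics on your analytics": aggregate a
# dashboard-usage log to find what is actually used and what never is.
# The log format, dashboard names, and dates are all hypothetical.
from collections import Counter
from datetime import date

# (dashboard, viewer, day) events, as a usage-tracking system might emit.
view_log = [
    ("AR Aging", "alice", date(2019, 5, 1)),
    ("AR Aging", "bob",   date(2019, 5, 2)),
    ("Pens by Color", "alice", date(2019, 5, 1)),
]
all_dashboards = {"AR Aging", "Pens by Color", "Exec Scorecard"}

views = Counter(dashboard for dashboard, _, _ in view_log)
unused = all_dashboards - set(views)
print("View counts:", dict(views))       # which dashboards earn their keep
print("Never viewed:", sorted(unused))   # retirement candidates: ['Exec Scorecard']
```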
And then you get in these rooms and you have people saying, well, my report says this, and someone else says, my report says that. And then everyone's scratching their heads going, you know, which one's real? Those days, you know, we've got to come to a point, I think, of being data literate, in the sense that we really understand how to use data, how to ask questions, and really how to challenge data.
I see sometimes that people don't do that kind of, you know, we spoke about journalism and rigor of investigative journalism and how that's dying and people only read the headlines and all those kinds of things. I feel like that's happening with data. People just read the headlines. They don't dig into the details. And often they couldn't because it was only aggregate detail. How could you drill down to the details when you never had them? It became super expensive to say, okay, here's an aggregate number. We're down on this product.
I should not sell it anymore. But then how do I drill into that to actually see what's going on, to understand exactly what the transactions behind it were? How do I have confidence that the high-level aggregations I'm looking at are correct? Because I would contend that probably 50 to 60% of the time, somewhere you have data problems that you don't even know you have. I've worked with customers that literally have had values that they reported to the Street that were incorrect. And then they found out.
And nobody wants that. Interesting. I'll look out for your thoughts on that in the future then, because I totally agree. Yeah, I absolutely agree. I think that's one of the next big challenges, really. So how would people find out about Incorta, and how would they, I suppose, get to experience the technology and get to try and, I suppose, in a way, test out what you're saying here, really? I mean, it sounds fantastic. What's the next stage in establishing if this is the right thing for them?
Yeah, well, obviously, there's incorta.com. No need to even say that, right? But there's two things. The first thing is to understand the problem that we solve. I think a lot of people sometimes look at Incorta and they just bring in, like, a single-source data set,
right? A single flattened table, and then they just evaluate it as if it were a visualization tool. Completely the wrong way of doing it. And if you're going to do that, I'd honestly say you could find better products at this point. If you were to bring in a highly complex data set, something that mirrors more exactly the backend systems that you have, the application data models that you have, and leverage those, I guarantee that you'll find nothing that comes close to Incorta. So there's a number of ways you can learn more, right? On our website, you'll find there are blog entries, there are eBooks, there are webinars. There was a webinar actually yesterday where Keysight demoed their live system. We have another one where Shutterfly demoed their live system, showing Incorta, showing how they're using it and how it's transformed their business for them. There's also... you can reach out and schedule demos. We're happy to show it.
But as I mentioned, with a lot of people there's a lot of skepticism. People who are not skeptics generally don't understand the problem. Those who are skeptics become some of our biggest ambassadors, and tell others about the product, and become real passionate champions of it. So I've had conversations with diehard data modeler types, who actually make a living presenting at data warehousing conferences, and spent days with them. And literally, when they get it, they're like, oh my goodness, this really is changing everything in terms of what we've been doing and how we've approached analytics, data warehousing, ETL, and just what we're doing in that space.
Okay. Fantastic. Well, brilliant. Well, it's been great speaking to you, Matthew. Appreciate you coming on the show and good luck with the product. And yeah, hopefully some people will kind of check you out and maybe get some kind of benefit out of what you're doing.
Great. Thanks, Mark. It was a privilege speaking to you. It was kind of fun reminiscing about some of the old times at Oracle and seeing where we'll head in the future, but great chatting. Warehouse Builder! Excellent. Makes me laugh. Cheers. Okay. Take care. Thank you.
