Drill to Detail - Drill to Detail Ep.74 'Accelerating Oracle from Express to Analytic Views' with Special Guest Bud Endress

Starting point is 00:00:00 Welcome to Draw to Detail, and I'm your host, Mark Whitman. So I'm joined today by Bud Endres, Director of Product Management at Oracle, and someone I've known for many years, back from the days when I used to work with Oracle Express, right through to Oracle's in-memory database now, and the recent analytic views feature. So Bud, welcome to the show, and it's great to have you here. Thank you, Mark. It's good to be here. So Bud, I mean, just tell us a bit about the role you do at Oracle at the moment. Yeah, that's fine. So as you mentioned, my title is Director of Product Management. What that means to me is that, first of all, I work in the development organization.

Starting point is 00:00:48 So I sit with engineering. And my role is primarily product planning and delivery into the sales organization. So there's a few things here. I'm part of a larger product leadership team involving product management and engineering. I participate in setting long-term directions and strategy for product within Oracle. Closer to home, my products, which include OLAP, option to the Oracle database, analytic views, part of database and memory, and more recently, participating on the product management teams for Autonomous Database and something that was announced at Open World

Starting point is 00:01:33 and Autonomous Data Platform. I work to drive requirements, participate in specification design processes, things like that. My nature is to be very hands-on with product. I often prototype new features and things like that. And later in the development process, evaluation of features and finally delivery into the sales organization. And then we start the cycle again. Fantastic.

Starting point is 00:02:06 So, I mean, as I said at the start, I've known you, I've known all of you certainly and known you recently for many years. And I think you've probably been the product manager that's had, I suppose had the most influence over the things that I do, funny enough, over that time. I mean, I remember working back, working with Express back in the days, you know, when you were in that area and right through Oracle OL, through to the in-memory aggregation. And we'll cover a lot of those really in this kind of podcast. But take us back to, I suppose, your route into Oracle all those years ago.

Starting point is 00:02:36 What were you doing before that? And how did that lead you into joining Oracle? What was the kind of link with IRI and Express and so on? Yeah, sure. So, you know, out of school, I was involved in a number of startups that failed in pretty severe ways. And one that was really pretty successful. I ended up at Information information resources in 1992. One of the startups I was in, I was involved in products that were transitioning from character mode to Windows to sort of set the calendar there. And I was hired into information resources as they were making a transition of a product called Data Server, which ran on Express Server that you mentioned.

Starting point is 00:03:33 That was transitioning from character mode to Windows. And I had a lot of experience in Windows when people didn't. Oracle acquired information resources, and luckily I came along with that. Oracle acquired my product, Data Server, which became Oracle Sales Analyzer, and the database, Express Server, on which it ran. So that's how I ended up at oracle so i mean people still people people i know in the oracle world we you know we still talk about express with with you know really fond memories and and and it was it had a um it had a kind of a really strong faithful

Starting point is 00:04:18 following and it as a product it it was you know fantastic what was it that you thought was special about express at the time and and why do you think it kind of got know fantastic what was it that you thought was special about express at the time and and why do you think it kind of got such a following and it was you know in the end it was something that oracle you know chose to acquire really what was what was special about express really yeah well express was a really interesting product um you know first um it was, as far as I know, the very first multidimensional database. Storage was array-based rather than row-based, like a relational database. It was highly optimized for very fast retrieval of aggregate data. It naturally understood concepts such as data sparsity. This was reflected in the model, the calculation language, things like that. To Express, all data was data.

Starting point is 00:05:12 It wasn't just a missing row, and it influenced how data was calculated. So that was pretty cool. Express had a 4GL language that was very accessible to business users, pretty smart business users, but business users. Using this language, non-programmers could navigate data at the language level, define calculations, and so on. It was a really great platform in that sense for self-service BI. Express had a really cool method of filtering.

Starting point is 00:05:50 You'd filter dimensions, you know, time is a year-level product, a part-level, and so on. And Express would automatically confine filters on different dimensions to produce a slice of the data. And the next one I think is, I still think it's cool, is that it had the notion of persistent filters. So one of the things you could do is you could set, we called it status then,

Starting point is 00:06:18 but you could set filters on each of the dimensions, and that would remain persistent throughout your session until you change them. So if you were working on a certain area, you'd set your filters and go about your work and more or less forget about that until you wanted to change them. I really like that. I think that was a cool feature. And Express had a really unique transaction model. It was built for what we call what-if applications. And what would happen here is that users could go in and make changes to data for very long periods of time, for hours, without writing them back to the database. And all their changes were isolated.

Starting point is 00:07:03 So they could make changes, aggregate, make other changes, allocate, whatever, all in their own little sandbox. In the end, if they wanted to save it, it was fine. And if they didn't, it just went away. So it's perfect for what-if type applications. There's a lot, but those are the things that really stick out in my mind. Yeah. And a sales analyzer

Starting point is 00:07:27 and financial analyzer i mean when i came into the consulting world it was i worked for a consultancy um and uh which was acg which it was owned by acg in the states and uh which you probably knew from iri and uh and ofi and rsa again were kind, were client applications that had very good customer satisfaction and were very well thought of, really. So Sales Analyzer. Tell us about Sales Analyzer. And again, what did that try and do? And why is that remembered quite fondly, do you think? Yeah, OK.

Starting point is 00:08:01 So you mentioned Financial Analyzer and Sales Analyzer. I grew. So you mentioned financial analyzer and sales analyzer. I grew up on the sales analyzer side of the house. And sales analyzer, to the best of my knowledge, was the very first dimensional end user reporting tool, self-service reporting tool. And it did a really, really good job of presenting the dimensional model to the end user. So it was very natural for a user in a marketing department to understand that sales vary by time, product geography, demographic characteristics, and things like that. And it allowed them to express queries, view data, and define calculations within that context. So it was really powerful, but it really fit into the mentality

Starting point is 00:08:55 of that business user. And again, I think it was the very first product along those lines. Okay, okay. So you were responsible for Sales Analyzer. And there were other products around that time. I remember RAMRA at the time, the Relational Access Manager, those products which were trying to bridge, I suppose, that kind of world between relational and multidimensional.

Starting point is 00:09:22 And as you said, there was then the acquisition of IRI's technology by Oracle. I mean, there must have been, I mean, I remember reading at the time in the tech press that this was a massive big deal, wasn't it? And a massive endorsement, I suppose, of the OLAP world by one of the big database vendors. Is that how you saw it at the time? Yeah, okay.

Starting point is 00:09:44 Yeah, it was a really big deal to us. You know, Oracle was a huge company compared to IRI and particularly the software arm. The Express was first brought to the attention of Oracle because Oracle was a customer. They used Oracle Financial Analyzer. When we were acquired, we were actually the very first large acquisition by Oracle. Oracle had not been in the habit of acquiring companies up until that point. For us, you know, it really meant access to a lot of technology and a huge sales organization.

Starting point is 00:10:30 It also, you know, Oracle had pretty deep pockets relative to what we had. So, you know, it's just a lot of great resources, the ability to hire good people. And, you know, it got access to that sales organization. So, yeah, it was huge. It was really important. Okay. Okay. So I guess when you particularly came to my notice, really, was back in the days of when the announcement was made that Express would be,

Starting point is 00:10:58 I think it was back in the days of Oracle 9i, and there was an announcement that Express, the technology would start to be embedded in the days of Oracle 9i, and there was the announcement that Express, the technology would start to be embedded in the database. And we started to sort of see things like Oracle 9i OLAP and so on. Maybe just tell us a bit about the story and the thinking around that. Because you could have gone different ways. I mean, if you think about, say, what Arbor did with Airspace, it was always kept standalone.

Starting point is 00:11:21 But, you know, you went down the route of embedding it. What was the thinking behind that? Yeah, well, even before we were part of Oracle, we always had customers that wanted to have the features of Express, the dimensional model and the language and things like that, and access large volumes of data. And their choices were to load data into the ExpressCube, the Express database. And, you know, that took time and processing. And the request was to be able

Starting point is 00:11:54 to leave the data in place in a relational database, but still access it in the same way. So even before we were acquired by Oracle, we built something called Relational Access Manager, which was a dynamic bridge between the dimensional model and relational databases. So we began this process really, really early. And the motivation was really seamless access to data and relational databases without the pain of without loading and moving it and that got us greater scalability and you know that's just what customers wanted to do you know later on obviously we got to deeper integration points so so i mean i remember i remember at the at the time it was it was um i think like any kind of, you know, I suppose initial integration, it was, I remember at the time we were thinking, I think the first iteration of 9i OLAP was relational, wasn't it?

Starting point is 00:12:55 And tell us about how 9i OLAP, I suppose, evolved and, you know, the way you went from, you know, relational, multidimensional, that sort of thing. Right. So, you know, with Express, we had a multidimensional database and multidimensional storage. And we thought that it would be a great idea if what we could do is always layer over relational storage, eliminate the need to move data into the Express database. And really at the same time, there was movement around standardization of 11 APIs. I forget what the standard was called, but there was an attempt in the community to define

Starting point is 00:13:39 an API, a standard API for multidimensional access. So in 9i, what we tried doing was developing a Java OLAP API, and it always accessed data from relational tables. This was separate from Express Server. I think the concept was kind of ahead of its time, quite frankly. The database wasn't really ready to handle this load. It just never matched the performance of Express Server.

Starting point is 00:14:32 And also by moving the language from the multidimensional language to a Java API, we lost the focus on the business user. It really became a developer. It was a good experiment, but at the end of the day, it really didn't catch on. It was interesting i mean i think there were i mean looking back at it some things i remember i remember the uh i think when when enterprise manager became the kind of tool that was you manage things with and there was a famous i mean i remember working consulting often you just get an error message with a red bell with a thing clanging against it and no message and it was it was there were early days but everybody knew their early days and and you know it was interesting during i think think it was 9i release too, that, you know, you started to get the multidimensional storage

Starting point is 00:15:09 coming back into it again. And, you know, it was something where you could see the obvious benefits, managing it as part of the database and so on. But it was, you know, it was probably, I think, for me, it was like, I think 10G was when, you know, I think it was that when you had an analytic workspace manager come out. And I think that, having worked in product management, you can see how these things come in stages. And, you know, what was the transition between 9 and 10 like for you, really? Well, it was really the transition between 9i release 1 and 9i release 2.

Starting point is 00:15:43 So we did this Olapi experiment and technically it worked, but it didn't catch on. And our customers told us loud and clear that they liked the dimensional model, the engine and this and that that was express. But at the same time, it was problematic to have a separate server and storage and, you know, all the things like that. So we made the decision to embed the Express Engine into the Oracle kernel. It's a huge, huge undertaking to merge those code bases but we did that and we moved the storage

Starting point is 00:16:27 the multi-dimensional data types into Oracle storage so at this point we really had much of the best of the Express Engine and we had one server which was good we introduced a SQL interface over multidimensional data types, which was really good. And that path went on for a long time, and it's still in the database. One of the questions I always had at the time was whether, because obviously one of the things that made it, the lack of backwards compatibility, the fact that you didn't support Snappy.

Starting point is 00:17:06 And I always wondered to myself at the time, was it impossible to put Snappy support back in again? Or I just wonder, what was you thinking around that, really? I mean, was that a technical thing where you couldn't be supported anymore because of the underlying thing? Or what was your thoughts on that? Okay. So Snappy was the call interface language to Express. So this is, you know, you connected to Express through Snappy and this is where commands and data moved around. We could have

Starting point is 00:17:41 moved that into the Oracle database, but we decided not to. You know, Oracle has OCI. You know, it has SQLNet, whatever you care to call it. And it really made much more sense to leverage everything that OCI had in its connections to the database. You know, one of the things that we wanted to get away from were duplications. So we didn't want two separate servers. We didn't want two separate storage engines. We didn't want two separate sets of database

Starting point is 00:18:19 users and things like that. And the called interfaces fell into this category. You know, there's a lot of things that we got for free once the Express Engine was in the database. You know, we inherited all of the scalability and security of the database, you know, all of the networking infrastructure, all of that. So, you know, first the Oracle customer was on OCI, not Snappy. And second, the amount of resource in the development team that it freed up to work

Starting point is 00:18:55 on core functionality in the multidimensional area was really pretty significant. We did rethink that, the Sna snappy decision at one point and we almost reintroduced it but but we didn't yeah yeah interesting i mean i think from for my day the thing that i i look back at and think was a very interesting and impressive technical achievement was the work you did around sequel really and and in you know, I think you guys took a decision that SQL would be your, your primary interface into multi-tool, into, into OLAP and then ran with that idea. I mean, just tell us again a bit about where the idea came from and what you were trying to achieve with that and, and, you know, why you went down that route rather than, I suppose, the alternative, which is, I suppose, MDX and those kinds of

Starting point is 00:19:42 languages. Yeah. I mean, it's a pretty, it was a really pretty simple decision, right? You're in the Oracle database. The primary languages of the Oracle database is SQL and PLSQL. And if you wanted customers, you had to go there. You know, the number of people that were willing to write to proprietary APIs and languages is really quite small as compared to the number of people that were willing to write to proprietary APIs and languages is really quite small as compared to the number of people, you know, that would be writing SQL. It meant that you had a chance for tools compatibility. You know, if you wanted to work with microstrategy, Cognos, you know, all the things of that day,

Starting point is 00:20:21 you know, the answer was SQL. We did look at MDX over time and we eventually did MDX. SQL was by far more dominant and MDX was really not a standard, so it was kind of squishy to work with. You know, so, I mean, if you're in Oracle, you speak SQL. I think it's in a lot of ways as easy as that. And PL SQL. Yeah. So, I mean, I suppose, again, that was a, I suppose, SQL support, SQL is the language for OLAP. It went through a lot of kind of, I suppose, generations, didn't it? I mean, I remember, you know, I think OLAP table was one thing at a certain point.

Starting point is 00:21:10 I mean, maybe what was your role in that? And, you know, again, how did that sort of evolve over time, really, the way you went from something that was fairly, I suppose, just an API call through to pretty much, I mean, we'll get on to analytic views in a moment. But, you know, that was quite, you know, you can see a lot of generations and evolution in that really over time. Yeah, well, I mean, my works well and it's not necessarily, you know, flat or dimensional. You know, you select certain things and you have certain filters and it returns the data back,

Starting point is 00:21:59 you know, as rows or these days as JSON or something like that. So from an expressibility point of view, it was just fine. I can say in SQL, select geography where level is state just as well as I can in anything else. What was really much, much more difficult

Starting point is 00:22:20 was figuring out the implementation. So you mentioned OLAP table. You know, at first we used external tables as the access method because that's what you did in Oracle. Did I say external tables? This callout mechanism. You know, because that's what was available. And over time, you know, we just went lower and lower in the database stack eventually into the SQL execution engine directly.

Starting point is 00:22:59 But a lot of pragmatic steps, I would say. Yeah, yeah, definitely. And so I suppose the next evolution of that that I saw was, I think it was, I don't know if this is bumping out, kube-organized materialized views. Is that correct? I think it was something that I saw come out where, you know, in the past I'd used things like the kube-by-rollup statement, for example, within Oracle SQL to create aggregates

Starting point is 00:23:24 that had multiple aggregations in there. And then I saw cube organized materialized views. Maybe just remind us what that was and the problem that was trying to solve and how it used your technology. Yeah, okay. First, to step back just a little bit, the cube storage in the database is extremely efficient at query time. The, you know, Express and its storage was colander storage before there was colander storage.

Starting point is 00:23:58 You know, it's an array-based storage mechanism that's pretty, that has a lot of similarities to columnar storage now, but that was built at a time when memory and CPU was absolutely precious, extremely expensive, and disk was slow. So the strategy was to expend an awful lot of time and energy into building and optimizing the cube. That's where all the CPU went. And at query time, it was extremely fast at retrieval and extremely fast at accessing aggregate data. So the idea behind the cube organized materialized view is that it could play the role of aggregate storage

Starting point is 00:24:47 for relational applications. So you had some tables, and you could build aggregate tables in the database, or you could build a cube in the database. The cube was simply much more efficient, very, very fast. The problem was that applications didn't know what a cube was. So what we did is built a rewrite mechanism. So you'd query the tables with some in a group by and the database would recognize that and rewrite that query into the cube where it was much faster

Starting point is 00:25:28 and lighter on CPU. Yeah, I mean, I think that was a very impressive sort of thing. And it obviously allowed you to use any kind of BI tool, but it would kind of still work with your cube storage, really. So, I mean, so there was that. And then I'd heard that you you know as i said you recently involved in every kind of product area that i found interesting then i i found i found you again you were working in the kind of in memory area of the database um maybe just again

Starting point is 00:25:55 just for anybody who's kind of new to this area when what was that what was the product what was going on with the product at the time and and and um what was in memory trying to solve and and and kind of how did it relate to what you're doing with you know cube organized materialized yeah okay so i mentioned a minute ago that um you know at a certain point in time cpu and memory um you know they were just precious it's um you know really expensive stuff and um you were careful with it, used it wisely. These days, you know, CPU is, God, it's almost free, right? And memory isn't almost free, but you can get a lot of it. You know, these things are no longer, you know, what we recognized is that it would be best to make a shift from pre-building a cube or a collection of tables, for that matter, and putting a lot of effort into processing those, pre-processing those with the goal of having more efficient retrieval query with lots of CPU available, lots of memory available, we would make a shift towards dynamic computation, particularly of aggregate

Starting point is 00:27:13 data. And this is a very, very big shift. So the idea with database and memory is that storage would become columnar, a lot of similarities with the ExpressCube, and it would be put in memory where it could be scanned much faster and we would make a very dramatic shift again from pre-calculation and storage to dynamic computation. And there's tremendous advantages to this. One is you just don't pre-build a cube or pre-build a lot of aggregate tables. You save space there.

Starting point is 00:27:59 But I think more importantly, when everything's computed dynamically, you can decide at query time how you want to aggregate the data or how you want to dimension it, dimensional speak. And you could approach much larger data sets, not just in terms of the number of rows in a table, but how wide a dimensional model might be. So if building a 10-dimensional cube was expensive, querying a 10-dimensional table is really cheap. So you could really broaden the scope of dimensional models in this context. Okay. And I think I remember speaking to you a while ago, and you were involved in some of the, I suppose, optimizations that go on in the background when these in-memory aggregation queries are running.

Starting point is 00:28:50 I think that vector group buy-in. What was your involvement in that and what were those in the background doing really? Yeah, okay. So database in-memory, and this is as it stands today, you can think of as having two major parts to it. On the storage side, there's the in-memory columnar tables. So data is loaded from disk and put to memory in this columnar format. The job of the in-memory columnar table is to be very, very fast at filter and scan and eliminate the need for indexes and things like that. The next part to this is the SQL processing side. So if getting data into the SQL engine is basically free from the in-memory column or store, the next question is largely how quickly can you join it and how quickly can you aggregate it

Starting point is 00:29:46 and that's where our team here gets involved we do the sql execution side of database and memory so we did this thing that we call in-memory aggregation if you look at the s plan, it's vector join and vector group by. And both of these just improved the execution of certain types of joins and aggregation by 10 to 50 times. Very, very dramatic changes. Okay. So I suppose it brings us to the present day really I mean so the product or the feature product whoever analytic views I mean just tell us about what that is and I suppose

Starting point is 00:30:33 the problem it's the problem it solves and I suppose you know how it fits into this timeline really of things that you've worked on. Yeah it's you, if you listen to what I was talking about in the past, you can draw a pretty straight line. You know, we've, the dimensional model is incredibly useful. It's really easy to work with. It has a lot of value. The storage, as separate multi-dimensional storage has always been a bit of a liability, often with advantage performance. But what we wanted to do as in-memory became available and, you know, the computing resources became cheap, is we wanted to separate the dimensional model, keep that value, and marry that with the flexibility and now the speed of the in-memory column store and in-memory processing. So the idea behind the analytic view

Starting point is 00:31:39 is that it's a new object in the database, well, it has a port as well, so I guess not that new, but it's new object in the database, well, new as of Oracle 12, so I guess not that new. But it's an object in the database that represents data dimensionally. It has all of the same model constructs. It has hierarchies, levels, attributes, measures, and things like that, as we had in Express, and later the OLAP option, but it has no storage. Everything is accessed dynamically. So you query it and you calculate as a dimensional object, which is really nice, and it

Starting point is 00:32:17 accesses data wherever it might happen to be. You know, be it a local table in the library column store or, you know, it could be, you know, far away an external table or, you know, something like that. You know, that was the fundamental concept and, you know, we did this with RAM. We tried this with Olafi. We returned to RAM and, you know know we eventually separated out the layers really really cleanly in a lot of ways technology caught up with the concept it sounds like it's in a way what you're trying to do with nine iolap back in the day but with with technology and uh and maybe kind of i suppose in a way just the benefit of experience, but particularly technology that makes it possible now.

Starting point is 00:33:06 That's exactly right. And I would say definitely with the benefit of experience. Yeah, yeah. This was really hard to do for a long time. Oh, yeah. How do you keep the best of a dimensional model and how do you layer this over relational and now other storage?

Starting point is 00:33:27 It's not just relational out there, is it? Yeah. So what would the user, I mean, if you're someone who is maybe, I don't know, using the always free scheme now with Oracle Cloud, you've accessed, you've set up an autonomous data warehouse. What would the experience be like with with analytic views is it just writing sql as normal or or how would it be simpler or faster or easier or what really well um first of all it can be writing sql as usual

Starting point is 00:33:56 the return to why you would do this a little bit. When you layer a model over data, you're imparting meaning on that data, right? So, you know, the problem, in a sense, with relational tables is you have a table and the database has no idea what the intent is. So I have a table called sales, and the database says, great, you have a table called sales,

Starting point is 00:34:23 and I got another one called geography, and it says, good for you, you got a table called sales and the database says great you have a table called sales and I got another one called geography that says good for you you got a table called geography it really doesn't know what the relationship between those tables is you might have a constraint but otherwise it has no idea well when you have a dimensional model you can express how you want to use that data. So if I say, you know, I have a sales table and I have product geography and time hierarchies, and if you specify the relationship between those, the database can do some wonderful things. One thing it can do is it can explain how those tables are used to an application. So instead of querying the data dictionary and saying, great, I got a couple of tables, and I guess I'll figure out what to do with it.

Starting point is 00:35:12 You query the data dictionary and it says you have sales and sales is dimensioned by product geography and time. And sales is aggregated in a certain way. And the intent is known and can be communicated to the end user and the application that's that's a really powerful thing you can define calculations really easily so let's say i want to do something as simple as the percent change in sales this year versus last year if i'm doing that in the analytic view, I have one function that says sales this year

Starting point is 00:35:49 versus last year, but the hierarchy is time. The database figures the rest out. If I do this in a query, it's up to me. I have to express that. I have to express the aggregation, the outer joins, you name it, the null data handling, all this other kind of stuff. So you can embed calculations in the analytic view and it's not the database's responsibility. It's not your responsibility at query time to be able to do that.

Starting point is 00:36:20 You can optimize performance. You know, the analytic view knows the relationships and it knows where the data is. So let's say the data happens to be, you're running on Exadata and it's a row store table. The analytic view knows that and it generates the optimal SQL for that. Let's say the data is in an external table.

Starting point is 00:36:48 It knows that and it generates the optimal SQL for that. So you have tons and tons of optimization opportunities when you have the metadata. All that's good stuff. But in a lot of ways to me, it gets down to servicing the end user. And how do you how do you service that user make it easy to communicate the usefulness of the data and things like that? Yeah. So so I mean, how would that mean what you described there about knowing about the relationships and the hierarchies and so on? You obviously could define

Starting point is 00:37:23 a dimension within within the sort of the Oracle database metadata in the past. How does this differ to, say, just a create dimension statement? Well, yeah, the create dimension statement, I'll just say on the relational side of the house for lack of a better term, its purpose is to define the relationship between a dimension table and a back table and a star schema. And it gives a little bit more metadata than a foreign key constraint. It gives you the relationship between certain columns of the dimension table. You know, this level rolls up to this level, it rolls up to this level.

Starting point is 00:38:06 It doesn't do anything beyond that. So its primary purpose is really in the query rewrite mechanism of materialized views. In the analytic view hierarchy, you get that and you get a complete business description of the hierarchy. So the hierarchy says, great, you have levels, month, quarter, and year. You have textual descriptions. You have presentation metadata. So time is calendar year or whatever text description you want.

Starting point is 00:38:45 It is completely extensible. You can add your own application metadata to that. It plays a role in calculation definition. It really combines physical properties, semantic properties, and presentation properties in one object. Okay. So how would a tool other than writing in SQL yourself make use of these features? I work with a tool called Looker quite a lot recently, and it is a straight SQL tool against

Starting point is 00:39:21 kind of databases. How might it leverage some of the, is it something that's completely transparent to the calling application or are there ways that applications can kind of really, I suppose, make more use of these features if they do things in a certain way? Yeah, it can be. We support right now four different access methods

Starting point is 00:39:40 into the AV. One is if you want to take full advantage of the analytic view you can write hierarchical SQL, that's my term, that is SQL that is aware of the dimensional model and the special features of the hierarchy in the analytic view. So if you really want to exploit it, you can do that. If you're an application developer, you're building applications in Application Express or something else that you have control over the SQL,

Starting point is 00:40:17 that's a great way to go because you get the most power, the most features, and it's really a lot easier to write SQL looking for calculations and aggregate data against an analytic view than it is tables. But you have to be aware of the analytic view to do that. If you're a MDX-based application, we do have an MDX interface,

Starting point is 00:40:42 and that's native to the Oracle database. If you're not aware of either of those, but you still want to get at the analytic view, we have a rewrite mechanism. So there is a form of a relational view that has a couple of keywords that tells it that it's accessing an analytic view. And you can then write just regular old star queries to these views, and the database will rewrite that into the analytic view. Okay. And how might that, I mean, is that something that, what would you,

Starting point is 00:41:21 and just describe what you would put into the kind of the view statement to have that happen then um you select from an analytic view and you include two words um factors okay okay and that and that and so that is documented that's something that that you know is out there so i mean that sounds very interesting actually it is um it is uh that that feature first exists in the autonomous data warehouse um in the cloud that's available now and um yeah it's just if you look at uh create view that's part of the create view statement okay we'll follow up on that actually so what so you mentioned hierarchical um sql then and i remember back again, back in the days of Oracle OLAP SQL, there was a similar concept.

Starting point is 00:42:11 So what would that be then? And how does that, I suppose, leverage the features even more directly? Yeah, okay. One of the cool things about the analytic view is that it returns the columns of the original data sources that you had. So if your table had month, quarter, year, city, state, set, things like that, you get those columns in the analytic view. But they also have special columns. We call these hierarchical attributes. We manufacture those. They're pseudo columns. Every hierarchy has certain things. They come for free. We create a unique key for every dimension member. The purpose of this is to make values unique across levels, right?

Starting point is 00:43:06 New York City, New York State. We take care of that automatically. So we give you a key for that. It has information about parentage. So for any value, what is my parent? What is my ancestor? What level do I exist at? Oh, I'm a month, I'm a quarter, I'm a year.

Starting point is 00:43:23 It has all of these types of things. It has hierarchical sorting. So the analytic view of the hierarchies itself always include all of these columns, and they're tremendously useful for navigating data easily. The analytic view returns all data detail and aggregate as rows selecting from the same column. So aggregate data is embedded in the analytic view.

Starting point is 00:43:53 It's calculated dynamically, but it's embedded in the analytic view. So I can say something like select geography member name at country level and a measure. And that's all I need to do. I don't need to explain how data is aggregated, how it's joined. I don't need to explain how a time series calculation is calculated. That's all part of the definition of the analytic view. And I just select it. If I'm selecting from tables, it's up to me to me the application to express all of the logic of

Starting point is 00:44:27 that query so so you know you've i guess that you know the the story of your of your time doing what you're doing is is you know the ultimate thing you've done now is separated the kind of the multi-dimensional model the olaq model from storage really and i think when when when you know you and I first discussed this feature a couple years ago you were really at pains to say this is not like the next thing of Oracle OLAP this is this is about the the I suppose the way in which you access data and the and the metadata model but you know we're de-emphasizing the storage because the storage could be anything in a way is that kind of is that a good summary of it? Yeah, it is. And it's very important. So,

Starting point is 00:45:07 part of this has been getting away from dedicated multidimensional storage and letting something else play that role. And with modern versions of the database in memory now, X data, the storage, offload, and all that kind of stuff. We have a lot of ways to optimize the storage and the retrieval and the calculation of data in the relational engine. So that's all good. We need to keep the model. So you're right. It's about separating those layers and taking best advantage

Starting point is 00:45:46 of each. But it's also about something else. You know, the database has the ability to access a lot of different data types. So, you know, we have regular old row store tables, we have in-memory columnar tables, we have external tables, and underneath an external table might be a flat file. We have big data SQL, where the storage now is far away and disorganized in a Hadoop cluster, for example. All of this can be represented in the Oracle database for SQL query. So the analytic view can now be a common interface to all of these different data types. And from a consumption point of view, the analytic view unifies the presentation of all of these. That is, if you learn the analytic view and know how to query that, it doesn't matter what the underlying storage is.

Starting point is 00:46:53 And then the part two of this is the performance. The analytic view is very, very aware of how data is stored, and it optimizes in SQL for that, and it has its own caches to handle caching, for example, of aggregate data alongside the analytic view. So it's not just running away, in a sense, from the data type you don't want anymore. It's being able to layer a dimensional model over any data type. Okay. So is this your work done now, then?

Starting point is 00:47:24 Has this reached perfection in your mind? model over any data type okay so is is this your work done now then have you have you has this reached perfection in your mind or or yeah is there i suppose is there one more problem to be solved in this area i mean where would you see you know where would you i suppose what areas you're working on what have you announced recently open world or what's the next thing you're trying to do with this um yeah first of all it's never done. We all know that. What we're really concentrating on right now is what I was just talking about. It's the analytic view as the presentation point of the data for any data well organized and in the local node of database, or highly disorganized and far away. You know, a remote table, a Hadoop cluster, you know, a JSON object, you name it.

Starting point is 00:48:13 Anything that can be represented with SQL in the Oracle database can be represented through an analytic view in a consistent and codified and performant way. That's really what we're working on now. It's kind of a last mile problem. You know, you go through all this trouble gathering, storing, organizing data. How is it presented and how is it accessed efficiently?

Starting point is 00:48:41 Okay. And probably quite a few people listening to this podcast wouldn't have wouldn't have be so familiar with oracle these days because they might be from say looking with snowflake or big query or whatever or maybe that developers starting out and and the autonomous data warehouse is is now available isn't it on the always free sort of program with oracle cloud and you mentioned about analytic views being in there. So again, let me just kind of mention, I suppose, how might someone get started with looking at analytic views on there? And what would be a good thing for them to sort of try out to get a flavor of how this works?

Starting point is 00:49:14 Yeah, okay. First of all, analytic views are in every edition of the Oracle database, and there is no charge. It's not a licensed option or anything like that. You know, if I were to do that, I'd do two things. From a learning point of view, there's great tutorials on liveSQL.oracle.com. Just go there and search analytic view, and you'll find them. So if you just want to poke around at them, do that. You can play with them live. One of the, the getting started tutorial is always in the top 10 tutorials at LiveSQL. It's hit a lot. So that's, that's just

Starting point is 00:49:55 a mechanical, how do you learn type thing. The next thing I would do is from a, you know, I'd pick an application that just anything where it's obvious that, you know, some variable, some measure varies by, you know, a few things, you know, sales by product, geography, and time. Start to build something super simple and just get your feet wet and then, you know, just kind of expand from there. You know, unfortunately, it's not rocket science.

Starting point is 00:50:28 I suppose, yeah, that's really good. I mean, so it's, I mean, it's interesting. I mean, it definitely, I've been impressed with what you guys are doing. And I think it's interesting. It seems to me a bit of a general renaissance in the interest in data modeling at the moment. But one of the things actually i find is a little bit of a lost skill or lost awareness is is dimensional modeling and the value of a business model in the database and you know it's it's good to see you guys you know still

Starting point is 00:50:55 putting emphasis on this and and taking the stuff that you worked on you know many years ago but making it relevant for people today and and i suppose in a way making it accessible as well i mean i think the thing about analytic views yeah the thing that is probably not people aren't aware of it is just so easy to use really and in fact the sql you write with it can actually be simpler than than writing normal sql because you know the way you is that the case it's writing the sql to the analytic view can be incredibly easy um easy. I get back to just a simple comparison of I've lost sales this year versus last year. That's not easy to do. You've got to aggregate data to a certain level, then you've got to do the time series calculation you've got to worry about

Starting point is 00:51:45 well what happens if i'm doing this at the daily level and and sales didn't exist last year how do you do the null handling how do you do any of that um you know it's it's it's yeah it's not easy that's good well bud it's been great speaking to you it's been great speaking to you and i'm conscious of time that we're taking you for a long time here but it's um but it's been great speaking to you it's been great speaking to you and i'm conscious of time that we're taking you for a long time here but it's um but it's been great speaking to you and say you know you're someone who all the 20 years of consulting in this space and i think you're you've been the product manager of every new release that i've been thinking this is really every yeah it's always been very interesting really and uh yeah and uh thank you very much for coming on the show and it's been i'll post you know we'll get some links and uh

Starting point is 00:52:24 to the kind of live sequel tutorial and so on and uh and yeah it's been i'll post you know we'll get some links and uh to the kind of live sequel tutorial and so on and uh and yeah it's been fantastic speaking to you all right very good we'll we'll try to keep you happy going into the future mark Thank you.

Drill to Detail - Drill to Detail Ep.74 'Accelerating Oracle from Express to Analytic Views' with Special Guest Bud Endress

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.