Drill to Detail - Drill to Detail Ep.17 ‘Mode 1 Analytics and the Future of Cloud DWs’ With Special Guest Rick Greenwald

Starting point is 00:00:00 So hello and welcome to another episode of Drill to Detail, the podcast series about the world of big data, data warehousing and BI, and I'm your host, Mark Rittman. So I'm joined this week by Rick Greenwald, someone I'd heard of before in my days working in the Oracle database developer community, but we actually met properly for the first time at an event in Malta recently, where we spent all evening at dinner arguing about why IT should be involved in BI projects, where the market's going, and so on. And I said to Rick at the time, we should have this discussion further on the podcast. So he's here now. So Rick, do you want to introduce yourself properly and then we'll make a start? Just tell everyone where you come from and what you do. Sure. My name is Rick Greenwald, and as Mark mentioned, I was around Oracle for a while. I worked for them for 15 years before I joined Gartner about two and a half years ago, and at Gartner I initially joined what they call the Gartner for Technical Professionals group, which is oriented more toward operations and people in IT.

Starting point is 00:01:11 And I'm now part of IT Leaders, which is more aimed at the C-level. My background and expertise is in Oracle. I've written, I think, 19 books on Oracle. But I've also been involved with cloud. When I was away from Oracle for a while, I worked at Salesforce, which was, I don't know, about nine years ago, and then I spent my last three years at Oracle as product manager for the Oracle database cloud.

Starting point is 00:01:34 So I've thought a lot about cloud too. So I'm happy to chat with Mark; I always like a spirited discussion, and I'm ready to go when you are. Excellent. Well, Rick, it's great to have you on here. So we were talking in Malta at the event about a recent, you know, Gartner Magic Quadrant that came out for the BI market. And obviously you weren't involved in writing that, but in a way it crystallized a lot of the debate we've had

Starting point is 00:02:01 recently in the BI industry about the extent to which we should put the tools in the hands of business users and the extent, I guess, to which IT doesn't need to be involved anymore in these kinds of projects. It's been termed Mode 1 and Mode 2 analytics and so on. And we had quite an interesting discussion about why IT is important. So, Rick, do you want to just recap a little bit on your thoughts on that before we get into the detail? Yeah, as will become obvious in my discussion of IT or not IT,

Starting point is 00:02:34 I'm pretty firmly on the IT side, and I think I have some reasons for it too. So, the first thing you should know, and this is really going back to my background, when I was at Oracle, I mainly focused on Oracle as a transactional database. And a lot of the stuff I wrote about involved some of the tricky and complex issues in doing transactional OLTP-type operations. And this involved things like locking and consistency across multiple writers.

Starting point is 00:03:02 And these are really extremely difficult problems that are not really relevant to analytic use of the data at all. When we're talking about what you want to call a data warehouse or analytics or whatever you want to say, that's almost an entirely different work profile. Instead of having to deal with consistency and relatively small amounts of data, repeated many times, you're now dealing with large data sets, small results, and an emphasis on reads, and virtually not caring about consistency at all, because you're typically looking at

Starting point is 00:03:36 data as it is, as if it were consistent. So it's a different set of issues. And because it's a different set of issues, when you're working in an OLTP environment, if you're doing it in a serious enterprise way, you are eventually going to have to wrestle with those issues, very difficult issues like consistency. If you're doing analytics, you don't have to. And I mean, it started with Excel and has continued through analytic tools, and it continues even more with the

Starting point is 00:04:05 cloud, where, as you know, you can become IT if you have a credit card. That's all you need. Now, that's a difference in workloads, and that's a difference in emphasis, and that means that you don't necessarily need the same things to work with data analytically that you do to work with data in an OLTP environment. However, the same qualities of data carry over. And one of these is an area which is difficult to do, a bit difficult to understand,

Starting point is 00:04:38 and therefore is kind of ignored by a lot of people in analytics. And this is the whole idea of data integrity implemented by governance, data quality, all that hard stuff in the background that makes your data correct. Because as we all know, data entered in any way, shape, or form is not necessarily correct, meaning consistent and congruent at a single point in time. That's hard to do. You have to go through the governance process. And the governance process is hard to do because it's not really IT.

Starting point is 00:05:13 It's organizational. It's interactive. It's political. People have to get together and agree on compromises and think problems through. Now, someone who says, just give me the data and let me look at it, is making a secondary statement by that. I don't understand governance. I don't think it has value, so I'm going to skip that.

Starting point is 00:05:35 Okay. But there are risks to run there. And keep in mind, I just want to make it clear when I talk about this from now on, when I talk about risks, that doesn't mean you shouldn't do it. It means you have to understand your tolerance for risk, and you have to calculate the potential effects of risk into what you're doing and the results that you get. So you may be very happy to say the data is a day old that I'm looking at or somewhere between one and 24 hours old because we only do a daily refresh. That may be perfectly okay for your analytic purposes. And that data could be fully consistent within that window.

Starting point is 00:06:18 But you can't say, I'm not even going to think about this, I'm not going to bother about this, I'm going to get data from many different sources and just kind of plug it together, come up with an answer, and that'll be it. You know, it might be right. There is the whole concept of luck. And by the way, to be dead serious about this too, you can make good decisions with bad data and bad decisions with good data, so none of these are written in stone. But for me, I want my data to be consistent and

Starting point is 00:06:50 accurate. That's kind of table stakes. It doesn't come that way automatically; at the least, it has to be examined and validated. Yeah, exactly. And I think there are probably a few dimensions to this argument in favor of Mode 1, as you might call it. Like you've been saying, there's the accuracy of the data: the numbers have to be accurate and they have to be consistent across the organization. And like you say, that's about risk and it's about accuracy. Then there's the thing that perhaps the business would accept is IT's domain, which is, I guess, architecture and performance and that sort of thing. And there's also perhaps the area of efficiency as well.

Starting point is 00:07:36 You know, it might be that we can get you data today and then you can do stuff and hand-wrangle it and so on. But continuing to do that over time is going to make you an inefficient organization. And I think those are probably the three areas that are worth looking at here, really. I'm going to take issue with one thing you said, but it ties into something else I wanted to say. Whenever I talk to people, and by the way, I'm writing research about this right now, it's going into the final review process. Whenever I talk to people about consistency, my colleagues come back and say,

Starting point is 00:08:07 you don't always need the level of consistency that you're talking about. And I say, you're right. You don't. As long as you understand the consistency level you have, as long as you understand the integrity or lack thereof that you have, it's okay. However, you have to be aware of that, number one. And number two, there's the specter of scope creep. You know, I've come this far.

Starting point is 00:08:35 The people who advised me, the people who implemented this, knew the quality of this data. And now someone else is going to get it and extend it further, basically beyond what you could do in terms of consistency. And then it goes over the data integrity cliff. Let me give you an example from the world of IoT, which I think is going to be a massive issue moving forward. So when you talk about consistency, there's a few specters here,

Starting point is 00:09:10 there's a few straw dogs. I mean, one of my least favorites is the whole idea of eventual consistency, which I think is a longer way of saying inconsistent. Data which is not consistent does not necessarily get consistent, period, full stop. Data which is inconsistent cannot necessarily be identified as being inconsistent because it looks just like consistent data. It just isn't. So because of this, once data becomes inconsistent, it only gets worse.

Starting point is 00:09:41 It can't get better. So let's say you're getting sensor data that has to do with a wheel. Is the wheel spinning faster than the other wheels? In other words, are you starting to skid on ice? All right. You have a sensor in every wheel. Now, you could get that data, and there could be lags in the sensor readings, or there could be lags in the sensor transmission. So you wouldn't know whether event A1 came before or after event A2.

Starting point is 00:10:14 Well, that obviously wouldn't do. So people will implement what I would call serialization. Serialization says, I guarantee you that if you get event A2, it happened after event A1 and before event A3. So you implement that and you say, fine, my data has integrity, my data is right. Except, remember, you have one of these in each wheel. So then there's B, and you can once again say that event B1 comes before event B2, which comes before event B3. But here's what you can't say.

Starting point is 00:10:48 You can't say event B1 came before or after event A1. Absolutely. And you can't say whether event B2 came after A1, after A2, or after A3, unless you have a universal time source, which, by the way, is pretty difficult to implement. You can't say that. And here's a situation where if you're going to implement something somewhat operational in the sense that you're going to take corrective action for a skid, does it matter if the right wheel started spinning before the left wheel?
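The serialization point can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`SensorEvent`, `happened_before` and the per-sensor `seq` field are hypothetical, not from any real IoT library): each wheel's stream carries its own sequence number, which orders events within that stream but says nothing about how they interleave with another wheel's events.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorEvent:
    sensor: str  # which wheel produced the reading, e.g. "A" or "B" (hypothetical labels)
    seq: int     # per-sensor sequence number guaranteed by serialization


def happened_before(x: SensorEvent, y: SensorEvent) -> Optional[bool]:
    """Return True or False when the relative order is knowable, None when it is not.

    Serialization only orders events within one sensor's stream, so the
    sequence numbers of two different sensors cannot be compared.
    """
    if x.sensor == y.sensor:
        return x.seq < y.seq
    # No universal time source: cross-sensor order is genuinely unknown.
    return None


a1, a2 = SensorEvent("A", 1), SensorEvent("A", 2)
b1, b2 = SensorEvent("B", 1), SensorEvent("B", 2)

print(happened_before(a1, a2))  # True: same wheel, order guaranteed
print(happened_before(b2, b1))  # False
print(happened_before(b1, a1))  # None: which wheel started spinning first is unknowable
```

A naive pipeline that merged the two streams and sorted by `seq` would silently invent an ordering, and the result would look just like consistent data; returning `None` keeps the missing information explicit instead.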

Starting point is 00:11:21 Oh, yeah. It matters a lot. So if you went from just saying, I want to find when it's skidding, great. But if you went from that to saying, I'm going to combine it with other data, I'm going to assume consistency and take an action based on that, you've just stepped over the cliff. So Rick, one of the complaints, I guess, about IT is that it always says no and it takes too long to do things. And in particular, enterprise architecture, planned architecture and performance

Starting point is 00:11:50 tuning, that kind of performance thinking, is something that IT spends a lot of time on and the business doesn't see the value in. Is that a defunct skill now, or is there value in that, do you think? Well, first of all, it's not possible that it's a defunct skill. I mean, I don't think anyone in the world would say that we're getting less data in our lives and fewer data sources in our lives. And the idea that you're getting more data and more data sources and you need less IT just doesn't make any sense at all. I'm not saying that I don't get calls asking that question frequently, but it really doesn't make any sense.

Starting point is 00:12:26 So let's back up, because you bring up a really important point, which is the way you phrase the question. It's kind of like, is that planned architecture, is that design skill, is that IT, is that Mode 1 stuff, obsolete or not? And the answer is, the question should not be whether it's obsolete, but rather when it's appropriate to use it, okay?

Starting point is 00:12:51 So if we're talking about people who want to do this stuff in a more rapid way, one of the problems we have is that understanding why things like governance and integrity are important is not trivial, and it's something which is much more familiar to people in IT than to people in business. So people in business, or agile people, or people who want to do analysis immediately, if they're rejecting this stuff because they don't understand it, they lose.

Starting point is 00:13:26 That doesn't pass. However, if what I'm saying is true, we obviously are going to have more demand on IT and those sorts of skills than they can fill because the need for IT knowledge and IT systems is exploding, and the IT staff is not. So when do we do this? Well, we do this when the cost of taking those actions is outweighed by the benefits. And this brings up the whole idea of the discovery use of data and the BI use of data, let's just say. And by BI, I mean operational reports that you run every day. I mean planning forecasts that you run on a regular basis.

Starting point is 00:14:13 So you take that kind of lump that you need to design and plan your structure. That's a fixed cost, and you spend it to use this data in a certain sort of way whether you then use it that way once or a hundred times. Obviously, if you're only going to use it once, if you're doing discovery or something like that, it's a bad investment. And by the way, I actually don't believe you're going to get IT to go faster by being smarter or more agile. I've worked with a few IT departments, and I'm pretty sure that none of them have had as part of their interview process,

Starting point is 00:14:52 can you go really slow because we only hire people who go slow. It doesn't work that way. Keep in mind something I say to people on a weekly basis, performance is all a matter of expectations. So if you think they're going slow, it means you want them to go faster. It doesn't mean they're going slow. Let's just be real clear. So the whole idea is that you can use data in mode two.

Starting point is 00:15:20 You can use data for discovery. You can use data for exploration. That's all fine because that's something where you're going to do that analysis, and much of the time it's not going to pan out. You're going to throw away the analysis. You might even throw away the data, right? So spending time up front, that big indigestible lump of planning, for data which you're going to look at once and throw away is a really bad idea.

Starting point is 00:15:49 However, if you're going to be using that data over and over and over and over again, at that point, the cost of that lump is amortized and becomes almost nothing. And remember something else. I don't care what anyone says, that if you're designing your data structures, if you're designing your data flow, it's going to run more efficiently than if you're not. Full stop. You can't throw hardware at the problem and make it run faster indefinitely. Okay?
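The amortization argument is simple arithmetic, sketched here in Python. The figures are hypothetical placeholders chosen only to make the shape visible, not numbers from the discussion: a fixed design-and-governance cost is spread over n uses, so the cost per use is that fixed cost divided by n plus the marginal cost, and the up-front work pays for itself once the per-use saving over ad hoc wrangling has covered the fixed cost.

```python
import math


def cost_per_use(fixed_cost: float, marginal_cost: float, uses: int) -> float:
    """Average cost of one use when an up-front design cost is spread over `uses` uses."""
    if uses < 1:
        raise ValueError("uses must be at least 1")
    return fixed_cost / uses + marginal_cost


def break_even_uses(fixed_cost: float, designed_marginal: float, adhoc_marginal: float) -> int:
    """Smallest number of uses at which the designed path costs no more in total
    than re-wrangling the data ad hoc every time."""
    saving_per_use = adhoc_marginal - designed_marginal
    if saving_per_use <= 0:
        raise ValueError("up-front design never pays off unless it makes each use cheaper")
    return math.ceil(fixed_cost / saving_per_use)


# Hypothetical figures, in analyst-hours: 40 hours of up-front design and governance,
# after which each use costs 0.5 hours, versus 4.5 hours of ad hoc wrangling per use.
print(cost_per_use(40, 0.5, 1))       # 40.5: a bad deal for a one-off discovery query
print(cost_per_use(40, 0.5, 100))     # 0.9: the lump is almost fully amortized
print(break_even_uses(40, 0.5, 4.5))  # 10
```

The model deliberately ignores the efficiency gain the speaker also mentions (designed structures running faster), which would only move the break-even point earlier.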

Starting point is 00:16:20 So it's going to run more efficiently. It's going to run faster. And when that's appropriate, it should be done. This is why I sometimes say, when you're looking at Mode 1 and Mode 2, that Mode 2, your more ad hoc data discovery business practice, is not designed to get answers. It's designed to ask better questions. And then when you come up with a good question,

Starting point is 00:16:43 then you can say, now let's find out the answer. And I think everyone would agree, or I would hope everyone would agree, that the answer is only as good as the question. So if you're asking a bad question, it doesn't matter how fast you get to the answer; it's still going to be a bad answer. So yes, I think everyone would agree with that. But we touched on techniques there, and ways of doing things, that people would describe as maybe a data lake. And my concern with data lakes, I suppose, is this: it's a great idea, and you have to understand the context in which they were thought up, but there are ways in which data lakes are being used now which put a lot

Starting point is 00:17:23 of the burden of data modeling and understanding onto users. And they're becoming, in quotes, operationalized. So what's the Rick view on data lakes, really, and where the value and the risk are? Well, that's a horrible mistake. Look, we have many definitions of data lake, even within Gartner, and there are many more out there in the world. But there's one definition I like: a data lake is where you put data with unknown business value.

Starting point is 00:17:54 I'm not saying it doesn't have business value. You just don't know what it is yet, okay? So the thing that organizations have to be aware of is that there's a continuum and there's got to be a flow. So when you have data in a data lake and when it becomes useful, and by that I mean it's being used frequently, it's being combined with data, curated data from the EDW, it's being used in ways which are tending towards operational decisions or strategic decisions, at that point it has business value.

Starting point is 00:18:26 And at that point, you have to combine it into your overall data warehouse estate. Now, at Gartner, we have this concept called the logical data warehouse. Other people have different names for it. But one of the things it says is you don't necessarily have to move that data from the data lake to the data warehouse, but you have to combine them into a uniform semantic layer so people can access both of them. Now, keep in mind, adding it to the semantic layer is going to require the governance. It's going to require that indigestible lump I talked about before. Keep in mind also that the data lake is, in many cases, not going to be as efficient as it would be

Starting point is 00:19:10 if you combined it with the data in the central repository. So it's my view, and not everyone agrees with me, but it's my view that the data lake is a fluid and ever-changing environment where a lot of data comes in and some data goes out. When that data proves its value, it gets integrated into the larger picture. And whether that's a separate part of the data lake, which is part of the LDW, or whether it actually migrates to a more central repository,

Starting point is 00:19:41 that once again is an issue having to do with efficiency and performance. But I think that's the way that it's going to work. And people have asked this; they say, well, isn't the EDW, the enterprise data warehouse, being replaced by the data lake? No! Not at all. Never. And I don't say never because I'm an old-fashioned curmudgeon, although I am. I say never because, you know, this reminds me of a very impactful experience I had when I was young. I was watching television, a show with one of my favorite early philosophers, Bozo the Clown. And Bozo was in a... you know Bozo, Mark, from America?

Starting point is 00:20:28 Yeah, yeah, yeah. Okay, anyway, sorry. Cultural reference; if you don't know Bozo, sorry. He was a television cartoon clown. He was in a dogfight in airplanes with his enemies, and he's got a biplane with a propeller. And one of his enemies shoots at his biplane and shoots off the propeller, and he says, oh boy, now I'm a jet, now I can fly fast. That's kind of what this is like. You don't turn a data lake into an EDW by wishing, or by using it as if it were an EDW. It lacks the qualities of an EDW.
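The logical data warehouse idea described above, one semantic layer over data that may stay in the lake or live in the EDW, with governance as the price of admission, can be sketched as follows. Everything here is hypothetical: `Dataset`, `SemanticLayer`, the `"lake"`/`"edw"` labels and the dataset names are illustrative assumptions, not a Gartner specification or any product's API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Dataset:
    name: str
    location: str   # where the data physically lives: "lake" or "edw" (hypothetical labels)
    governed: bool  # True once it has been through the governance process


@dataclass
class SemanticLayer:
    """Hypothetical logical-data-warehouse semantic layer: one set of logical
    names over data that may physically live in the lake or the EDW."""
    entries: Dict[str, Dataset] = field(default_factory=dict)

    def add(self, ds: Dataset) -> None:
        # Joining the shared semantic layer is the point where governance is required.
        if not ds.governed:
            raise ValueError(f"{ds.name!r} must be governed before joining the semantic layer")
        self.entries[ds.name] = ds

    def locate(self, name: str) -> str:
        # Consumers ask by logical name; whether the data sits in the lake or
        # the EDW is an efficiency and performance decision behind the layer.
        return self.entries[name].location


layer = SemanticLayer()
layer.add(Dataset("orders", "edw", governed=True))
layer.add(Dataset("wheel_sensor_summary", "lake", governed=True))  # proven value, not moved
print(layer.locate("wheel_sensor_summary"))  # lake

try:
    layer.add(Dataset("raw_clickstream", "lake", governed=False))
except ValueError as err:
    print(err)  # still fine to explore directly in the lake, just not via the layer
```

The design choice mirrors the point in the discussion: physical location stays a separate, changeable decision, while governance gates entry to the shared layer.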

Starting point is 00:21:05 Yeah, exactly. Now, we'll get on later to where the market and technology are going, and it may well be that some of the technology could be used for that. But we've made quite a spirited defence of the things that we think are important, so let's take a step back and think about this argument that IT is slow.

Starting point is 00:21:23 You know, let's think about why that is the perception, and whether it's actually a signal or proxy for something else. So, first of all, why do you hear businesses say that IT is slow, and what do you think they're really saying there? What's the underlying issue, do you think? Oh, I can tell you my opinion. Okay. Yeah. And by the way, to some extent IT brought this on itself, all right, and I'll get back to that in a moment. But my thinking on this really became clear in the course of a discussion I was having with a client.

Starting point is 00:22:08 At one point in the discussion, he began to complain about having hundreds and hundreds of ETL processes, and they were dragging him down. He was spending all his time maintaining them, blah, blah, blah, blah, blah. After a little bit, I said, I think you may have forgotten the value that these processes are providing. He just changed. Like, his voice changed. He goes, you're right. So I think what has happened is a lot of these things,

Starting point is 00:22:36 which are things that, as I mentioned before, IT understands and business doesn't, IT has just not bothered to explain very well. It hasn't bothered to advance a value proposition that explains their value; it was just saying, well, you have to do this with IT. Now, what's happened in the past decade, and it has certainly been accelerated by the cloud, is, no, we don't have to deal with IT. We can go to Amazon, we can go to Microsoft, we can go to anyone and say, we're getting

Starting point is 00:23:07 this stuff in the cloud, and we're seeing an increasing amount of IT-type budget being spent with entities other than IT. And before that, it was IT budget being controlled by entities outside IT, and that's been going on for a while. But now it's being spent without IT even knowing about it. So IT has to represent the value proposition better. The key example of this is I was having a conversation with a guy who was the CTO for a pretty significant product at a pretty significant company,

Starting point is 00:23:39 and I was going through my explanation of how business doesn't understand IT, the stuff IT does that business can't do; they're skipping over that because they don't understand it. He's saying yes, he's in agreement, blah, blah, blah, blah, blah. I say, and IT doesn't understand business. And he goes, oh, no, IT understands business. And I said, you know, sorry, you can't have it both ways.

Starting point is 00:24:08 You can't say that IT understands everything and business understands nothing. It actually doesn't work that way. So there's going to have to be a reset of the relationship between IT and business. Yeah, definitely. I mean, there's a point that I think your colleague, Cindi Howson, made a couple of episodes ago, when she was talking about how it's all about risk and understanding risk. And going back to your point earlier on, you talked about IT being there almost as, I wouldn't say the conscience of the business, but certainly the voice saying, you know,

Starting point is 00:24:47 that there's risk involved in not having numbers matching up. There's risk involved in not curating them and not sharing them and so on. So what's the successful strategy and approach that IT can take to tell the business that it's taking a risk, one the business really carries, without being seen as a blocker? How can it

Starting point is 00:25:10 get that message across without being seen as lecturing the business or getting outside the scope of what it should cover? I think there has to be a new deal, essentially. And what this means is, what business wants, what Mode 2 wants, is what I call frictionless access. They just want to get access to stuff. They don't want anything standing in their way. Okay? What IT wants to provide, if we look at Mode 1, I would say is more frictionless integration.

Starting point is 00:25:45 So this is data that's been properly governed and all that, right? IT has to stop saying no all the time, okay? And the deal, I think, that has to be struck is that IT says, we will give you access, but you have to understand the limits of what you're doing. IT can't stop them. I mean, I have a friend I've known for a long, long, long time. Smart guy, actually has somewhat of a background in IT also. At one point, he was so frustrated, he was telling me, I think we should fire the entire IT staff and hire a new one; we'd be better off.

Starting point is 00:26:32 And I just said, I guarantee you that's a bad idea. I guarantee you that's a bad idea because, number one, if you're not paying more, you're not going to get better people. Number two, if you get the same level of people, at least the people you have now understand your environment and the new people will not. So it's going to be worse while they learn about your environment. I don't just mean technical environment. I mean business and organizational environment.

Starting point is 00:27:03 So there has to be compromise. There have to be bridges built. And really, here's my kind of big idea on this that I suggest. And it's not a technical idea. It's an organizational idea. And I came about this from understanding the cloud. The thing about the cloud is the cloud takes some part of your IT infrastructure and hides it from you, puts it behind a wall or a curtain that you can't see through and you can't get through, right?

Starting point is 00:27:40 So if you were going to implement high availability, if you want to guarantee availability from your cloud provider, you wouldn't do it by saying you need to implement this replica this far away with this CPU. That's not the way it works. That's all hidden from you. What happens instead is your cloud provider says, I'm going to give you this service level agreement: this is how much I'm going to be up,

Starting point is 00:28:09 the percent of availability, and this is how long it's going to take me to recover in the event of a loss of availability, and here's the penalties involved with that, although the penalties are never reasonable in the sense that if you have a really bad outage, a 10% discount on next month's bill is not going to make it better. So what I suggest as a starting point is that IT and business basically come up

Starting point is 00:28:38 with some SLAs of their own. In other words, you say, business, when you make a request of me, you will get a response, not an answer, but a response, within X amount of time. And just like cloud providers, the X amount of time that you give is a time which you will never miss. So if you say you'll get an answer in 10 minutes, but you know you can't make it, that's an SLA which is not going to serve its function. If you say you're going to hear within two hours,

Starting point is 00:29:12 it means, number one, when an end user needs something, they can count on getting a response in two hours. Number two, if they need it by a certain time, they know maybe I should begin two hours early just in case I need help. So this sets up some certainty in the relationship, and the certainty in the relationship starts to build expectations properly as to when things can get done.
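The two-hour response SLA can be expressed as a small Python sketch covering both sides of the deal: IT's promised response time and the business user's "begin two hours early" rule. The function names and dates are hypothetical, and a real service desk would also need business hours, holidays and escalation rules that are left out here.

```python
from datetime import datetime, timedelta

# Hypothetical response SLA taken from the two-hour example: IT commits to a
# response (not necessarily an answer) within this window, and sets it at a
# value it will never miss.
RESPONSE_SLA = timedelta(hours=2)


def response_due(requested_at: datetime, sla: timedelta = RESPONSE_SLA) -> datetime:
    """Latest time by which IT has promised to respond to a request."""
    return requested_at + sla


def sla_met(requested_at: datetime, responded_at: datetime, sla: timedelta = RESPONSE_SLA) -> bool:
    """True if the response arrived inside the promised window."""
    return responded_at <= response_due(requested_at, sla)


def latest_safe_request(needed_by: datetime, sla: timedelta = RESPONSE_SLA) -> datetime:
    """The business side of the deal: ask at least one SLA window before you need help."""
    return needed_by - sla


req = datetime(2024, 3, 4, 9, 0)
print(response_due(req))                                 # 2024-03-04 11:00:00
print(sla_met(req, datetime(2024, 3, 4, 10, 45)))        # True
print(latest_safe_request(datetime(2024, 3, 4, 17, 0)))  # 2024-03-04 15:00:00
```

Tiered commitments, such as a fast acknowledgement followed by a longer feasibility decision, would just be additional `timedelta` values passed as `sla`.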

Starting point is 00:29:33 The other thing that can happen from this is as you build your SLAs on different aspects of what you're doing, non-IT people come to understand the hard bits. So, for instance, you know, putting data in a data lake and giving you an end-user tool to access it, that's not hard. Getting data cleansed with proper quality and integrated with existing data stores, that's actually a lot harder. But say all you want to do is get this data: you identify a data source, we respond to you within an hour as to whether we can consider it, and we respond to you within 24 hours as to whether we can do it and give you a date as to when it's going to be available. That is typically

Starting point is 00:30:24 a good enough response, and you can set expectations around that. Now, if they say, when can I use this with my enterprise data, you can then continue to say, what does that mean? Do you mean you just want to use enterprise data and play around with it, or do you mean you want to use it with enterprise data in a way that you're going to come up with enterprise answers? Because the second requires governance, the first does not. So this is the sort of thing you do to build a foundation, a new way of interacting,

Starting point is 00:30:53 because the reality is, you know, certainly 30 years ago, maybe 20 years ago, probably not 10 years ago, IT could more or less do what they needed to do with the amount of requests that they had. That's no longer true, and it's never going to be true again. Interesting. And that's quite an interesting lead-in, I think, to another area that's interesting to talk to you about, which is cloud. So I suppose one of the responses from businesses is to use cloud for this. Your own old company, Oracle, is doing a lot of work in that space. Just to recap a little bit, tell us what you did on cloud at Oracle before, to set the scene for your experience, and then let's talk about where this is going. So what did

Starting point is 00:31:43 you do with database and cloud at Oracle? First of all, I was the product manager for the Oracle database cloud, the first iteration of it, which they're now calling the database schema service. That was the thing that came out about five years ago. And I was involved in working on what came out later, which is what they call the unmanaged database cloud, where you get a pluggable database. So you get your own database. It's unmanaged, so it's not really a database as a service, but it's more than just infrastructure as a service. Okay.

Starting point is 00:32:21 Okay. So that was – and I think most listeners will be aware of that and and so on there um so so i mean where do you i mean where do you see the database cloud market now uh and where do you see where do you see it going really in terms of technology and the value and so on so what's your view on where it is now first of all rick well i mean first of all let's just talk about the two levels of the cloud which are relevant when we're talking about database. Infrastructure as a service and database as a service, which we call database platform

Starting point is 00:32:55 as a service because it falls into a category of path. And what we're really talking about here is the different levels of the stack where the separation is between the cloud and on-premises. Infrastructure as a service is basically saying I'm giving you a server in the cloud. You can put whatever you want on that server. You can load database software on there. You could get a machine image, which is almost like a VM template, to say when I allocate this, I'm going to get a fully functional, fully installed database with default configurations when I allocate this.

Starting point is 00:33:34 Database as a service does the same thing, but the key difference is that with database as a service, all basic maintenance functions are managed by the cloud. You don't have full access to all the configuration options you would have for that database instance. You, in fact, do not own the instance; you use it. The way I put it, in Oracle terms, or really in general database terms, is that with database as a service, you are not the system administrator. With infrastructure as a service, you are, because you've loaded the software. Now, being the system administrator means you can do anything,

Starting point is 00:34:11 and also that you're responsible for it. So even though infrastructure as a service with database software may come with the software fully installed, including automated maintenance procedures, at the end of the day you're responsible for it. So if the backup isn't done, you have to take some action to make that happen. Now, the infrastructure as a service marketplace, if you ask me, is pretty much over. It's a duopoly: it's Amazon and it's Microsoft.

Starting point is 00:34:52 There are really enormous differences in scale between Amazon and Microsoft and everybody else. Since infrastructure as a service is fairly low level, it's more like the hardware market than the software market in some ways. And we all know the hardware market lives on margin, right? Margin and market share. We've seen this play out in hardware field after hardware field. And when you have a company as dominant in terms of market share as Amazon is now, that's not going to change. Just as an aside, I have long believed that mature markets have three players, a leader and then a second and third, and the second and third may swap positions back and forth, but they basically never catch up

Starting point is 00:35:46 in that market. Markets change and evolve, and new markets come up. And I always thought it odd that in the infrastructure as a service market we're really talking about a duopoly, not three vendors. Whenever you ask people, they'll come up with different ideas for the third one, but whoever they bring up is still a significant distance behind. So what was your take when you saw, I don't know if you saw it, I listened to OpenWorld last year, and Thomas Kurian was very bullishly talking about how Oracle will overtake Amazon, just in that basic market. There was a lot of scepticism, to put it politely, at that. Did you

Starting point is 00:36:26 see that, or hear it at the time? Oh yeah. I'll be diplomatic here. Yeah, I heard it, yes. Yeah, I mean, it's interesting. I don't know if there was an analyst there, you might have been there recently, but it seems that for Oracle in particular, the emphasis has now changed to be more about getting the managed-service business that partners currently have. Trying to compete head-on with Amazon is, I mean, this is a separate issue from databases, but I certainly get the impression that, like you say, Amazon's lead is so unassailable that

Starting point is 00:37:08 it seems kind of ridiculous to think you can get there. But I don't know, I suppose it's a target. Well, look, keep in mind: what's the difference between IaaS and managed hosting? There's not a great deal of difference in many ways, because in both of them you have the ability to do whatever you want on top of that platform. It really has to do with how it's priced and how it's delivered: self-service and pay-by-use.

Starting point is 00:37:53 Managed services are not pay-by-use. Managed services are pay-by-configuration, or whatever you want to call it. So in one way, the cloud is re-energizing that whole thing. And one thing that we are seeing, which is a little bit surprising to me, is that people are getting more and more interested in private cloud, because there are some issues with public cloud. I personally think these issues are legislative and perceptual rather than actual issues, you know, security and all that stuff. But there are also people who say, I want to have the machine in-house. And we're seeing it. And Oracle's done this, by the

Starting point is 00:38:40 way, with Cloud at Customer, where they say, we're putting the machine in your data center. You're paying for it with the exact same pricing model and licensing model that we have in the cloud, which is a really big difference, and we'll manage it for you. So something like this gives people all the advantages of the cloud without having to worry about it being out there, outside of their firewall. Now, what this can also mean is that it's not a gigantic leap to say, okay, and I want a custom configuration here. Because in the cloud, you can't get custom configurations.

Starting point is 00:39:23 That kind of breaks the cloud model. Inside the cloud is a uniform environment, fully under the control of the cloud provider, and the provider benefits because it can automate. They can manage a thousand instances with not a whole lot more effort than managing one. Well, the only way you do that is if all 1,000 instances are just like the one instance, right? If all 1,000 are different, you know,

Starting point is 00:39:51 you're working harder, not smarter. So there's a lot of give and take here. And it was funny: when I was product manager for the cloud, I would get people coming up to me a lot and saying,

Starting point is 00:40:08 I want this, I want that, in terms of configuration. People would even say to me things like, do you use RAC? And you know, if you're Oracle, RAC's a big deal. Do you use RAC?

Starting point is 00:40:21 And I'd say, none of your business. Because I'm a smart-ass. But it really is none of their business. Whether Oracle is using RAC inside their cloud or not, you can't do anything about it. It's a great technology. It's a really good idea, and they really should use it, but customers can't change it, because they don't have access at that level. What they will get is an SLA: what's your percentage of availability, and how long does it take you to recover

Starting point is 00:40:55 from a failover? And whether you implement that SLA with RAC or Data Guard or backup is transparent. Yeah, transparent. Yeah, yeah, exactly. One thing I'm interested to get your take on, Rick, is this: we've talked about Oracle Database in the cloud, but what about some of the things we've seen coming out of, say, Amazon, and Google with BigQuery? I do a lot of work with BigQuery now, and it's an interesting take on things.

Starting point is 00:41:24 The elasticity is something that's interesting there. You've obviously got Athena coming through from Amazon, but Amazon seems to have quite a different approach to all this, in that there isn't just one database engine; there are lots of different ones. So, first of all, what's your take on the way Amazon are doing this? Amazon is really interesting, OK? Because with Amazon, and we run into this

Starting point is 00:41:48 when we try and evaluate them against other vendors, Amazon is taking a philosophically different approach. Let's compare Amazon with Oracle, okay? And I can do this without saying good or bad; it's just different, right? If you look at Oracle, essentially there's the Oracle database, right? And the Oracle database does all kinds of stuff. By the way, same thing with SQL Server. SQL Server has three different processing engines within it, right?

Starting point is 00:42:22 So they have a single unified entity, and this means a lot of things, okay? It means that maintenance procedures are going to be the same. If you have a local database that you're using for transactional work, OLTP and ETL, it's okay: you're doing it all in one place. When you back it up, you back it all up the same way.

Starting point is 00:42:48 When you allocate resources, you allocate from a single pool. There's some efficiency to be gained by a unitary view of what you're doing. Amazon says we're not going to do that. Amazon says we're going to have Aurora, and we're going to have Redshift, and we're going to have EMR, and they all do different things. Now, what Amazon says is, we think we're going to serve your needs better by giving you something designed specifically to do what you want

Starting point is 00:43:19 rather than a more general tool. And any overhead involved in bringing this stuff together is up to you. They're also, by the way, kind of making a bet: when people come to purchase this service, even though they will eventually need multiple services, they're only going to need one to start, and that's the one we'll sell them. And when it comes time for them to do something else,

Starting point is 00:43:47 when you want to speed up what you're doing, we sell you ElastiCache. When you need a document database, we sell you DynamoDB. Now, you get one bill for all this, and you get a management console that manages these as distinct entities from what they call a single pane of glass. We'll give you that, and we're going to leave the rest

Starting point is 00:44:13 up to you. I've talked to Amazon about this repeatedly, and they believe this is the way it should be done. And that's a different approach. Now, of course, they've been wildly successful.

Starting point is 00:44:34 Wildly. When they first started telling us about customer numbers, we were stunned. We frankly didn't even believe it. It's not that we thought they were being dishonest; we just had no idea it had grown that fast. So, time will tell. And by the way,

Starting point is 00:44:56 Amazon may change their tune. They may decide that it's going to be more unified. But that's the way it is now. Yeah, I mean, if you look at, say, Google, BigQuery, the engine they use there, is a lot more, I suppose, consumerized; it's more of a single service. And I think Oracle, for example, might come up with something along those lines at some point.

Starting point is 00:45:22 So what's your view on these big elastic data warehouse engines that, in a way, are one type of engine for everything? Well, first of all, when you have a product that's been... So many, many products were designed for a pre-cloud environment, right? You didn't worry about separation of storage and CPU because when you bought a computer,

Starting point is 00:45:58 you got both of them, right? You could expand storage, but the two came coupled together. That's how those products were designed, and it was such a fundamental assumption that everything, down to the deep core of the product, was designed that way. Now there are products out there that have been designed from the ground up for a cloud environment, meaning there's a shared pool of storage separate from a shared pool of CPU, and you can increment either one of these independently. Now, there's a lot more to it than that, because, of course, it's not just storage and CPU.

Starting point is 00:46:40 There's interconnect, and you need balance across all of that. So this doesn't solve all problems; it solves some problems. For instance, if you need an unbalanced configuration, meaning you need more storage than would normally be allocated for the amount of CPU you have, and the product's been designed from the ground up for separation of compute and storage, you can do that. If it hasn't, it's going to be harder to do.
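To make the "unbalanced configuration" point concrete, here is a minimal sketch assuming a hypothetical node shape of 8 vCPUs and 2 TB of storage. It compares the capacity you end up provisioning when CPU and storage only come bundled in nodes against drawing them from separate pools. The node size and function names are illustrative assumptions, not any vendor's actual sizing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    vcpus: int
    storage_tb: float


# Hypothetical node shape for a design where CPU and storage are bundled together.
NODE_VCPUS = 8
NODE_STORAGE_TB = 2.0


def coupled_allocation(need_vcpus: int, need_storage_tb: float) -> Allocation:
    """Coupled design: only whole nodes can be added, so whichever resource needs more nodes sets the count."""
    nodes = max(math.ceil(need_vcpus / NODE_VCPUS), math.ceil(need_storage_tb / NODE_STORAGE_TB))
    return Allocation(vcpus=nodes * NODE_VCPUS, storage_tb=nodes * NODE_STORAGE_TB)


def decoupled_allocation(need_vcpus: int, need_storage_tb: float) -> Allocation:
    """Decoupled design: compute and storage come from separate pools and are sized independently."""
    return Allocation(vcpus=need_vcpus, storage_tb=need_storage_tb)


# A storage-heavy ("unbalanced") workload: little CPU, lots of data.
print(coupled_allocation(16, 100))    # Allocation(vcpus=400, storage_tb=100.0)
print(decoupled_allocation(16, 100))  # Allocation(vcpus=16, storage_tb=100)
```

Under these assumptions the coupled design provisions 400 vCPUs just to hold 100 TB, 25 times the compute the workload asked for, which is the cost of not being able to scale the two resources independently.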

Starting point is 00:47:09 Now, keep in mind, this is all behind the scenes, inside the cloud, so you can't necessarily see what's happening, but you can feel the effects of it. So something like Redshift is not going to let you expand from a large size to a very large size in a matter of minutes, because of the way it's implemented behind the scenes. Other products may allow you to do that. So that's something which is actually different.

Starting point is 00:47:37 It's designed differently. It's implemented differently. And by the way, even more so, I think you're going to see in the not-too-distant future that some cloud companies have really designed their infrastructure from the ground up. For instance, Oracle has announced, so I'm not telling anything out of school, their bare-metal cloud, where they put a lot of emphasis on how they designed the networking. And they can do that because it's their environment. They can do anything they want.

Starting point is 00:48:11 They can control it any way they want. So we're starting to see the dawn of people doing things which are hard to do in a conventional environment where machines are simply connected by networks, but easy to do if you start with a distributed assumption from the beginning and take steps in your initial underlying architecture to say, we're going to take care of this. So, for instance, you could build a service where you had

Starting point is 00:48:42 an absolutely synchronized atomic clock around the world. That in and of itself doesn't buy you anything, so no one's going to do it just for its own sake. But for something like a distributed system where you have to resolve conflicts based on a timestamp, that's table stakes. So I think we're going to see more and more cloud vendors bringing out architectures that are based on the assumptions of an internal environment, totally controlled by the vendor: what can I do when I'm in that place? Interesting.
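Rick's point about synchronized clocks can be illustrated with last-writer-wins, one common (but not the only) way to resolve conflicts by timestamp. This hypothetical sketch, with made-up replica names and times, shows that the rule is only as trustworthy as the clocks behind it: a replica whose clock runs slow can have a genuinely later write silently discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp_ms: int  # wall-clock time stamped by the replica that accepted the write
    replica_id: str    # tie-breaker when two timestamps are identical


def last_writer_wins(writes):
    """Resolve conflicting writes per key by keeping the one with the latest (timestamp, replica) pair."""
    winners = {}
    for w in writes:
        current = winners.get(w.key)
        if current is None or (w.timestamp_ms, w.replica_id) > (current.timestamp_ms, current.replica_id):
            winners[w.key] = w
    return {key: w.value for key, w in winners.items()}


# Hypothetical scenario: replica "eu" really writes first, replica "us" really writes 2 s later.
true_eu_time, true_us_time = 1_000_000, 1_002_000

# With synchronized clocks, the later write wins, as intended.
synced = [Write("price", "10", true_eu_time, "eu"), Write("price", "12", true_us_time, "us")]
print(last_writer_wins(synced))  # {'price': '12'}

# If the "us" clock runs 5 s slow, its genuinely later write is silently discarded.
skew_ms = -5_000
skewed = [Write("price", "10", true_eu_time, "eu"), Write("price", "12", true_us_time + skew_ms, "us")]
print(last_writer_wins(skewed))  # {'price': '10'}
```

The resolution logic is identical in both runs; only the clock accuracy differs, which is why tightly synchronized time is a prerequisite rather than a luxury for this kind of design.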

Starting point is 00:49:21 So I'm conscious of time, and I think we're almost out of it. But one question I want to ask you, Rick: if you're a developer, or a buyer of database technology, where would you start? What would your starting point be now in terms of technology choices, and how would you evaluate which database to use for analytics? What would your advice be? Would you start in the cloud, or on-premises?

Starting point is 00:49:51 Yeah, I mean, first of all, I would say that at this point in time, saying cloud first for new projects is very feasible, okay? And keep in mind what that means. That doesn't mean cloud is the default choice. That doesn't mean cloud only. That means cloud should be included in your initial selection criteria. And if there's a cloud product which is appropriate for your needs, that should be considered along with on-premises. Now, keep in mind, of course, I hope we've gotten to the point

Starting point is 00:50:27 where the benefits of cloud are not overemphasized. By that I mean, certainly at Gartner, there's very widespread acceptance that cloud is not cheaper; cloud is expensive in a different way. If you're doing one-to-one comparisons, cloud is not going to be cheaper. And by that I mean if you're using cloud 24-7 the same way you would on-premises, with the same hardware and the same resource requirements,

Starting point is 00:50:56 you're changing your licensing model and your cost model, but three to five years out you're going to be paying the same, and then you're going to be paying more. The way you save money on cloud is that you pay as you go, so you don't have to pre-buy hardware, and you don't have to pay for more than you use. So look at the cloud. The other thing is, I hope people realize that the fact that you can deploy a cloud instance in a matter of minutes
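The cost argument can be sketched with made-up prices. Assume a hypothetical $100,000 up-front on-premises purchase plus $1,000 a month to run it, against a hypothetical $4 per instance-hour in the cloud. Under those assumptions, an always-on cloud instance overtakes the cumulative on-premises spend after a few years, while a workload that only runs part of the time never does.

```python
from typing import Optional

# All prices are hypothetical placeholders, not real vendor pricing.
UPFRONT_ON_PREM = 100_000.0  # hardware and licences bought before day one
MONTHLY_ON_PREM = 1_000.0    # power, space, support, etc.
CLOUD_HOURLY_RATE = 4.0      # pay-as-you-go price per instance-hour
HOURS_24X7 = 730             # roughly 24 * 365 / 12 hours in a month


def on_prem_cost(months: int) -> float:
    """Cumulative on-premises cost: a large up-front payment plus a fixed monthly charge."""
    return UPFRONT_ON_PREM + MONTHLY_ON_PREM * months


def cloud_cost(months: int, hours_per_month: float) -> float:
    """Cumulative cloud cost: billed only for the hours actually used."""
    return CLOUD_HOURLY_RATE * hours_per_month * months


def breakeven_month(hours_per_month: float, horizon_months: int = 120) -> Optional[int]:
    """First month in which cumulative cloud spend reaches on-premises spend, or None within the horizon."""
    for month in range(1, horizon_months + 1):
        if cloud_cost(month, hours_per_month) >= on_prem_cost(month):
            return month
    return None


print(breakeven_month(HOURS_24X7))  # always-on: 53
print(breakeven_month(8 * 22))      # office hours only: None
```

With these particular figures the always-on case crosses over in month 53, about four and a half years in, and the office-hours workload never breaks even within ten years. Different prices will move the crossover point, but the shape of the comparison is the one Rick describes.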

Starting point is 00:51:27 as opposed to taking a week to set it up in-house, or something like that, seems like a lot of time saved, and it is in the first month, but over five years it means nothing. So I would never say cloud's your only answer. I'd say look at cloud along with other things too. Excellent, excellent. Well, Rick, thank you very much for your time. It's been great to speak to you. It's great to talk databases,

Starting point is 00:51:51 talk with a veteran and so on. So thank you very much for your time. You're in the States now, I take it, so have a good afternoon, morning, or whatever it is. And thanks so much for coming on. It's been great speaking to you. It's a real pleasure, Mark. Thanks for having me. Okay, cheers. Thank you.
