Drill to Detail Ep.17 ‘Mode 1 Analytics and the Future of Cloud DWs’ With Special Guest Rick Greenwald
Episode Date: February 2, 2017
Mark Rittman is joined by Gartner analyst and ex-Oracle Database Cloud Service PM Rick Greenwald to talk about IT’s continuing (and essential!) role in corporate BI&DW deployments and the debate around Mode1 vs. Mode2 Analytics, how we got here, and the future of data warehousing database platforms as we move into the cloud.
Transcript
So hello and welcome to another episode of Drill to Detail, the podcast series about the world of big data, data warehousing and BI, and I'm your host, Mark Rittman. So I'm joined this week by Rick Greenwald, someone I'd heard of before in my days working in the Oracle Database Developer Community, but then we
actually met properly for the first time at an event in Malta recently, where we spent all evening
at dinner arguing about kind of why IT should be involved in BI projects, you know, where the
market's going, and so on. And I said to Rick at the time, we should have this discussion further
on the podcast. So he's here now. So Rick, do you want to introduce yourself properly, and then we'll kind of make a start? So just tell
everyone where you come from and what you do.
Sure. My name is Rick Greenwald, and as Mark mentioned, I
was around Oracle for a while. I worked for them for 15 years before I joined Gartner about two and
a half years ago. And at Gartner I initially joined in what they call the Gartner for Technical Professionals group, which is more kind of oriented at operations and people in IT.
And I'm now part of IT Leaders, which is more aimed at the C-level.
My background and expertise is in Oracle.
I've written, I think, 19 books on Oracle.
But I've also been involved with cloud.
When I was away from Oracle for a while,
I worked at Salesforce, which was, I don't know,
about nine years ago, and then I spent my last three years
at Oracle as product manager for the Oracle database cloud.
So I've thought a lot about cloud too.
So happy to chat with Mark,
always like a spirited discussion,
and I'm ready to go when you are.
Excellent. Well, Rick, it's great to have you on here.
So we were talking in Malta at the event and we were talking about a recent, you know,
Gartner Magic Quadrant that came along that was for the BI market. And obviously you weren't involved
in writing that, but in a way it kind of crystallized a lot of the debate we've had
recently in the BI industry about the extent to which we should put the tools in the hands of business users
and the extent, I guess, to which IT doesn't need to be involved anymore in these kind of projects.
And it's been termed kind of mode one, mode two analytics and so on.
And we had quite an interesting discussion about why IT is kind of important.
So, Rick, do you want to just kind of like, just recap a little bit on your kind of thoughts on that
before we kind of get into the detail?
Yeah, as will become obvious in my discussion
in the kind of discussion of IT or not IT,
I'm pretty firmly on the IT side,
and I think I have some reasons for it too.
So, the first thing you should know,
and this is really going back to my background,
when I was at Oracle, I mainly focused on Oracle as a transactional database.
And a lot of the stuff I wrote about involved some of the tricky and complex issues
in doing transactional OLTP-type operations.
And this involved things like locking and consistency across multiple writers.
And these are really extremely difficult problems that are not really relevant to analytic use
of the data at all.
When we're talking about what you want to call a data warehouse or analytics or whatever
you want to say, that's almost an entirely different work profile.
Instead of having to deal with consistency and small amounts of data, relatively small
amounts of data, repeated many times, you're now dealing with large data sets, small results, and an emphasis
on reads.
And virtually not caring about consistency at all, because you're typically looking at
data as if it were consistent where it is.
So it's a different set of issues.
And because it's a different set of issues, when you're working in an OLTP environment,
if you're doing it in a serious enterprise way, you are eventually going to have to wrestle with those issues,
very difficult issues of things like consistency.
If you're doing analytics, you don't have to.
And I mean it started with Excel and has continued through analytic tools,
and it continues even more with the
cloud, where, as you know, you can become IT if you have a credit card.
That's all you need.
Now, that's a difference in workloads, and that's a difference in emphasis, and that
means that you don't necessarily need the same things to work with data analytically
that you do to work with data in an OLTP environment.
However, the same qualities of data come over.
And one of these areas is an area which is difficult to do,
a bit difficult to understand,
and therefore is kind of ignored by a lot of people in analytics.
And this is the whole idea of data integrity implemented by governance, data quality,
all that hard stuff in the background that makes your data correct.
Because as we all know, data entered in any way, shape, or form is not necessarily correct,
meaning consistent and congruent at a single point in time.
That's hard to do.
You have to go through the governance process.
And the governance process is hard to do because it's not really IT.
It's organizational.
It's interactive.
It's political.
People have to get together and agree on compromises and think problems through.
Now, someone who says, just give me the data and let me look at it,
is making a secondary statement by that.
I don't understand governance.
I don't think it has value, so I'm going to skip that.
Okay.
But there are risks to run there.
And keep in mind, I just want to make it clear when I talk about this from now on, when I talk about risks, that doesn't mean you shouldn't do it. It means you have to understand
your tolerance for risk, and you have to calculate the potential effects of risk into what you're
doing and the results that you get. So you may be very happy to say the data is a day old that I'm looking at
or somewhere between one and 24 hours old because we only do a daily refresh.
That may be perfectly okay for your analytic purposes.
And that data could be fully consistent within that window.
But you can't say I'm not even going to think about this,
I'm not going to bother about this,
I'm going to get data from many different sources and just kind of plug it together,
come up with an answer, and that'll be it.
You know, might be right.
There is the whole concept of luck.
And by the way, to be dead serious about this too,
you can make good decisions with bad data and bad decisions with good data, so none of these are written in stone. But for me, I want my data to be consistent and
accurate. That's kind of table stakes. It doesn't come that way
automatically. You know, it at least has to be examined and validated.
Yeah, yeah,
exactly. And I think probably there's a few kind of dimensions
to this kind of, you know, argument in favor of mode one, as you might call it.
I suppose there's, like you've been saying, there's the accuracy of the data.
So there's this thing about the numbers have to be kind of accurate and they have to be consistent across the organization and so on there.
And like you say, that's about risk and it's about accuracy and so on. And then there's the kind of the thing that perhaps the business would say that IT is within their domain, which is, I guess, kind of architecture and performance and that sort of thing as well.
And there's also perhaps the area of efficiency as well.
You know, it might be that we can get you data today now and then you can kind of do stuff and hand kind of wrangle it and so on.
But keeping doing that over time is going to make you an inefficient organization.
And I think that there are probably sort of three areas that are worth looking at here, really.
I'm going to take issue with one thing you said, but it ties on to something else I wanted to say.
Whenever I talk to people, and by the way, I'm writing research about this right now,
it's going into the final review process.
Whenever I talk to people about consistency,
my colleagues come back and say,
you don't always need the level of consistency that you're talking about.
And I say, you're right.
You don't.
As long as you understand the consistency level you have,
as long as you understand the integrity or lack thereof that you have, it's okay.
However, you have to be aware of that, number one.
And number two, there's the specter of scope creep.
You know, I've come this far.
The people who advised me, the people who implemented this,
knew the quality of this data.
And now someone else is going to get it and extend it further,
basically beyond what you could do in terms of consistency.
And then it goes over the data integrity cliff.
Let me give you an example from the world of IoT,
which I think this is going to be a massive issue moving forward.
So when you talk about consistency, there's a few specters here,
there's a few straw dogs.
I mean, one of my least favorites is the whole idea of eventual consistency,
which I think is a longer way of saying inconsistent.
Data which is not consistent does not necessarily get consistent, period, full stop.
Data which is inconsistent cannot necessarily be identified as being inconsistent
because it looks just like consistent data.
It just isn't.
So because of this, once data becomes inconsistent, it only gets worse.
It can't get better.
So let's say you're getting sensor data that has to do with a wheel.
And is the wheel spinning faster than the other wheels?
In other words, are you starting to skid on ice?
All right.
You have a sensor in every wheel.
Now, you could get that data, and there could be lags in the sensor readings,
or there could be lags in the sensor transmission.
So you wouldn't know whether event A1 came before or after event A2.
Well, that obviously wouldn't do.
So people will implement what I would call serialization.
Serialization says, I guarantee you that if you get event A2,
it happened after event A1 and before event A3.
So you implement that and you say, fine, my data is in order, my data is right.
Except, remember you have one of these in each wheel.
So then there's B, and you can once again say that event B1 comes before event B2 comes before event B3.
But here's what you can't say.
You can't say event B1 came before or after event A1.
Absolutely.
And you can't say whether event B2 came after A1, after A2, or after A3,
unless you have a universal time source, which, by the way, is pretty difficult to implement.
You can't say that.
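The point about per-sensor serialization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` class, the sensor names "A" and "B", and the per-sensor sequence numbers are hypothetical and are not taken from any real IoT system. With only per-sensor sequence numbers, order is known within one wheel and unknown across wheels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    sensor: str  # hypothetical sensor id, one per wheel, e.g. "A" or "B"
    seq: int     # sequence number assigned by that sensor alone


def happened_before(x: Event, y: Event) -> Optional[bool]:
    """Return True or False when the order is known, None when it is not.

    Sequence numbers are only comparable within a single sensor. Across
    sensors there is no shared clock, so the order is simply unknown.
    """
    if x.sensor != y.sensor:
        return None
    return x.seq < y.seq


a1, a2 = Event("A", 1), Event("A", 2)
b1 = Event("B", 1)

print(happened_before(a1, a2))  # True: same wheel, A1 precedes A2
print(happened_before(b1, a1))  # None: different wheels, order unknown
```

Returning `None` rather than guessing keeps the unknown ordering visible to whatever consumes the events, instead of letting it pass as consistent data.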
And here's a situation where if you're going to implement something somewhat operational
in the sense that you're going to take corrective action for a skid,
does it matter if the right wheel started spinning before the left wheel?
Oh, yeah.
It matters a lot.
So if you went from just saying, I want to find when it's skidding, great. But if you went from that to saying, I'm going to
combine it with other data, I'm going to assume consistency and take an action
based on that, you've just stepped over the cliff.
So Rick, one of the complaints, I guess, about IT is that it always says no and it
takes too long to
do things. And in particular, enterprise architecture and planned architecture, performance
tuning and performance kind of thinking is something that IT spends a lot of time on
and the business doesn't see the value. You know, is that a defunct skill now, or is there value in
that, do you think?
Well, first of all, it's not possible it's a defunct skill. I mean, I don't
think anyone in
the world would say that we're getting less data in our lives and we're getting fewer data sources
in our lives. And the idea that you're getting more data and more data sources and you need
less IT, it just doesn't make any sense at all. I'm not saying that I don't get calls asking that
question frequently, but it really doesn't make any sense.
So let's back up, because you bring up a really important point, which is the way you
phrase the question is kind of like, is that planned architecture?
Is that design skills?
Is that IT?
Is that mode one stuff?
Is that obsolete or not?
And the answer is, it should be not, is it obsolete, but rather, when is it appropriate
to use that, okay?
So if we're talking about people who want to do this stuff in a more rapid way, one
of the problems we have is that understanding why things like governance and integrity are
important,
it's not something that's trivial to understand,
and it's something which is much more familiar to people in IT than people in business.
So people in business who basically, or agile people,
or people who want to do analysis immediately,
if they're rejecting this stuff because they don't understand it, they lose.
That doesn't pass.
However, if what I'm saying is true,
we obviously are going to have more demand on IT and those sorts of skills than they can fill because the need for IT knowledge and IT systems is exploding, and the IT staff is not.
So when do we do this?
Well, we do this when the cost of taking those actions is outweighed by the benefits.
And this brings up the whole idea of the discovery use of data and the BI use of data, let's just say.
And by BI, I mean operational reports that you run every day.
I mean planning forecasts that you run on a regular basis.
So if you take that kind of lump that you need to design and plan your structure,
that's a fixed cost, and you spend it to use this data in a certain
sort of way, whether you use it that way once or a hundred times. Obviously, if you're only
going to use it once, if you're doing discovery or something like that, it's a bad investment.
And by the way, I actually don't believe you're going to get it to go faster by being smarter
or more agile.
I've worked with a few IT departments,
and I'm pretty sure that none of them have had as part of their interview process,
can you go really slow because we only hire people who go slow.
It doesn't work that way.
Keep in mind something I say to people on a weekly basis,
performance is all a matter of expectations.
So if you think they're going slow, it means you want them to go faster.
It doesn't mean they're going slow.
Let's just be real clear.
So the whole idea is that you can use data in mode two.
You can use data for discovery.
You can use data for exploration. That's all fine because that's something where you're going to do that analysis,
and much of the time it's not going to pan out.
You're going to throw away the analysis.
You might even throw away the data, right?
So spending time up front, that big indigestible lump of planning,
for data which you're going to
look at once and throw away is a really bad idea.
However, if you're going to be using that data over and over and over and over again,
at that point, the cost of that lump is amortized and becomes almost nothing.
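The amortization argument can be made concrete with a small worked calculation. This is a sketch with made-up numbers: the upfront planning cost of 100 units, the per-run cost of 1 unit on a designed structure, and the per-run cost of 5 units for ad hoc wrangling are all hypothetical and chosen only to show the shape of the trade-off.

```python
import math


def cost_per_use(upfront: float, per_run: float, uses: int) -> float:
    """Average cost of one use when a fixed upfront cost is spread over all uses."""
    if uses < 1:
        raise ValueError("uses must be at least 1")
    return upfront / uses + per_run


def break_even_uses(upfront: float, designed_per_run: float, adhoc_per_run: float) -> int:
    """Smallest number of uses at which the designed approach costs no more in total."""
    saving_per_run = adhoc_per_run - designed_per_run
    if saving_per_run <= 0:
        raise ValueError("designed runs must be cheaper than ad hoc runs")
    return math.ceil(upfront / saving_per_run)


# Hypothetical figures, in arbitrary cost units.
UPFRONT = 100.0       # the one-off "lump" of design and governance
DESIGNED_RUN = 1.0    # cost of each run on the designed structure
ADHOC_RUN = 5.0       # cost of each run when wrangling by hand

print(cost_per_use(UPFRONT, DESIGNED_RUN, 1))             # 101.0
print(cost_per_use(UPFRONT, DESIGNED_RUN, 100))           # 2.0
print(break_even_uses(UPFRONT, DESIGNED_RUN, ADHOC_RUN))  # 25
```

With these assumed figures, a single use costs far more with upfront design than without it, while at a hundred uses the lump has almost disappeared from the per-use cost.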
And remember something else.
I don't care what anyone says, that if you're designing your data structures,
if you're designing your data flow, it's going to run more efficiently than if you're not.
Full stop.
You can't throw hardware at the problem and make it run faster indefinitely.
Okay?
So it's going to run more efficiently.
It's going to run faster.
And when that's appropriate, it should be done.
This is why I sometimes say when you're looking at Mode 1 and Mode 2,
Mode 2, your more ad hoc data discovery business practice,
it's not designed to get answers.
It's designed to ask better questions.
And then when you come up with a good question,
then you can say, now let's find out the answer
And I think everyone would agree, or I would hope everyone would agree, that
the answer is only as good as the question. So if you're asking a bad question, you're asking a bad
question. It doesn't matter how fast you get to the answer; it's still going to be a bad answer.
So yes, I think everyone would agree with that. But, I mean, we touched on techniques there and ways of doing things that, you know, people
would describe as maybe a data lake. And data lakes, I suppose my concern with them is, it's a great idea,
I think, you know, you have to understand the context in which they were thought
up, but, you know, there's ways in which data lakes are being used now which is putting a lot
of the burden of kind of data modeling and understanding onto users.
And they're becoming kind of, in quotes, operationalized.
You know, what's the Rick view on data lakes, really, and where the value is and risk and so on?
Well, that's a horrible mistake.
Look, a data lake, we have many definitions of data lake,
even in Gartner. There are many more out there in the world.
But there's one definition I like.
A data lake is where you put data with unknown business value.
I'm not saying it doesn't have business value.
You just don't know what it is yet, okay?
So the thing that organizations have to be aware of is that there's a continuum
and there's got to be a flow.
So when you have data in a data lake and when it becomes useful,
and by that I mean it's being used frequently, it's being combined with data,
curated data from the EDW, it's being used in ways which are tending towards
operational decisions or strategic decisions, at that point it has business value.
And at that point, you have to combine it into your overall data warehouse estate.
Now, at Gartner, we have this concept called the logical data warehouse.
Other people have different names for it.
But one of the things it says is you don't necessarily have to move that data from the data lake to the data warehouse,
but you have to combine it together into a uniform semantic layer so people can access both of them.
Now, keep in mind, adding it to the semantic layer is going to require the governance.
It's going to require that indigestible lump that I talked about before.
Keep in mind also that the data lake is, in many cases,
not going to be as efficient as it would be
if you combined it with the data in the central repository.
So it's my view, and not everyone agrees with me,
but it's my view that the data lake is a fluid
and ever-changing environment where a lot of data comes in and some data goes out.
When that data proves its value, it gets integrated into the larger picture.
And whether that's a separate part of the data lake,
which is part of the LDW,
or whether it actually migrates to a more central repository,
that once again is an issue having to do with efficiency and performance.
But I think that's the way that it's going to work.
And when people say, and people have asked this, they say, well, isn't the EDW, isn't the Enterprise Data Warehouse being replaced by the Data Lake?
No! Not at all. Never.
And I don't say never because I'm an old-fashioned curmudgeon, although I am.
I say never because, you know, this reminds me of a very impactful experience I had when I was young.
I was watching television, a show with one of my favorite early philosophers, Bozo the Clown.
And Bozo was in a... you know Bozo, Mark? He's American.
Yeah, yeah, yeah.
Okay, anyway, sorry. Cultural reference, if you don't know Bozo, sorry.
He was a television cartoon clown.
He was in a dogfight in airplanes with his enemies, and he's got a biplane with a propeller.
And one of his enemies shoots at
his biplane and shoots off the propeller. He says, oh boy, now I'm a jet, now I can fly fast.
That's kind of what this is like. You know, you don't turn a data lake into an EDW
by wishing or by using it as if it were an EDW. It lacks the qualities of an EDW.
Yeah, exactly.
Now, we'll get on later on to kind of, I suppose,
where the market and technology is going,
and it may well be that some of the technology could be used for that.
But let's take a step back,
and we made quite a spirited defence of the things that we think are important.
But let's take a sort of step back and think,
well, you know, this argument about IT is considered slow.
You know, let's think about why that is the perception, okay, and is that actually
a kind of a signal or proxy for something else, really? So, first of
all, why do you hear that businesses say that
IT is slow, and what do you think they're kind of really saying there? What's the
underlying issue there, do you think?
Oh, I can
tell you my opinion, okay? And by the way, to some extent IT brought this on themselves, all right?
And I'll get back to that in a moment. But my thinking on this really became clear in the course
of a discussion I was having with a client.
At one point in the discussion, he began to complain about having hundreds and hundreds of ETL processes, and they would drag
them down. He was spending all his time matching them, blah, blah, blah, blah, blah.
After a little bit, I said,
I think you may have forgotten the value that these processes are providing.
He just changed.
Like his voice changed.
He goes, you're right.
So I think what has happened is a lot of these things,
which are things that, as I mentioned before,
IT understands and business doesn't.
IT has just kind of not bothered to explain them very well, not bothered to
advance a value proposition that explains their value, and they were just saying, well,
IT, you have to do this with us.
Now, what's happened in the past decade and certainly has been accelerated by the cloud
is, no, we don't have to deal with IT.
We can go to Amazon, we can go to Microsoft, we can go to anyone and say, we're getting
this stuff in the cloud, and we're seeing an increasing amount of IT-type budget being
spent with entities other than IT.
And before that, it was IT budget being controlled by entities outside IT, which that's been
going on for a while.
But now it's being spent without IT even knowing about it.
So IT has to represent the value proposition better.
The key example of this is I was having a conversation with a guy who was the CTO for a pretty significant product
at a pretty significant company,
and I was going through my kind of explanation
of how business doesn't understand IT.
They really, you know, the stuff IT does that business can't do.
They're skipping over that because they don't understand it.
He's, yes, he's nodding in agreement, blah, blah, blah, blah, blah.
I say, and IT doesn't understand business.
And he goes, oh, no, IT understands business.
And I said, you know, sorry, you can't have it both ways.
You can't say that IT understands everything and business understands nothing.
That actually doesn't work that way.
So there's going to have to be kind of a reset of the relationship between IT and business.
Yeah, definitely. I mean, I think, and there's a point that I think your colleague, Cindi
Howson, made a couple of episodes ago when she was talking about, it's all about risk
and it's about understanding risk and so on. And going back to your point earlier on, you
talked about the IT being there almost as the, I wouldn't say the conscience of the
business, but certainly the kind of the saying, you know,
that there's risk involved in not having kind of numbers matching up.
There's risk involved in not curating them and not sharing them and so on.
How do you find,
what's the successful kind of strategy and approach that IT can have to try
and tell the business that it's kind of taking a risk that the business really
carries the risk for,
and it's not seen as basically being a blocker or being, you know, lecturing the business? How can it
get that message across without being seen as lecturing or getting outside the scope of what
it should cover?
I think there has to be a new deal, essentially. And what this means is, so what business wants, what mode two wants, is what I call frictionless
access.
They just want to get access to stuff.
They don't want anything standing in their way.
Okay?
What IT wants to provide, if we look at mode one, I would say it's more frictionless
integration.
So this is data that's been properly governed and all that, right?
IT has to stop saying no all the time, okay?
And the deal, I think, that has to be struck is IT says, we will give you access, but you have to understand the limits of what you're doing.
They can't stop them.
I mean, I have a friend I've known for a long, long, long time.
Smart guy, actually has somewhat of a background in IT also.
At one point, he was so frustrated, he was telling me, I think we should fire the
entire IT staff and hire a new one we'd be better off.
And I just said, I guarantee you that's a bad idea.
I guarantee you that's a bad idea because, number one, if you're not paying more, you're
not going to get better people.
Number two, if you get the same level of people,
at least the people you have now understand your environment and the new people will not.
So it's going to be worse while they learn about your environment.
I don't just mean technical environment.
I mean business and organizational environment.
So there has to be compromise.
There have to be bridges built.
And really, here's my kind of big idea on this that I suggest.
And it's not a technical idea.
It's an organizational idea.
And I came about this from understanding the cloud.
The thing about the cloud is the cloud takes some part of your IT infrastructure and hides it from you, puts it behind a wall or a curtain
that you can't see through and you can't get through, right?
So if you were going to implement high availability,
if you want to guarantee availability from your cloud provider,
you wouldn't do it by saying you need to implement this replica this far away with this CPU.
That's not the way it works.
That's all hidden from you.
What happens instead is your cloud provider says,
I'm going to give you this service level agreement for this is how frequently I'm going to be,
this is how much I'm going to be up,
the percent of availability,
and this is how long it's going to take me
to recover in the event of a loss of availability,
and here's the penalties involved with that,
although the penalties are never reasonable
in the sense that if you have a really bad outage,
a 10% discount on next month's bill is not going to make it better.
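To make the availability part of a service level agreement concrete, here is a minimal sketch that converts an availability percentage into the downtime it permits. The percentages below are illustrative only and are not quoted from any cloud provider; the 30-day month is also an assumption made for simplicity.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def allowed_downtime_minutes(
    availability_percent: float,
    period_minutes: int = MINUTES_PER_30_DAY_MONTH,
) -> float:
    """Minutes of downtime a given availability percentage permits in one period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * period_minutes


# Illustrative availability levels, not figures from any real SLA.
for pct in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% available -> about {minutes:.1f} minutes down per 30-day month")
```

Under these assumptions, 99.9% availability still allows roughly 43 minutes of outage in a month, which is the kind of number both sides need to understand before agreeing to it.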
So what I suggest as a starting point is that IT and business basically come up
with some SLAs of their own.
In other words, you say, business, when you make a request for me, you will get
a response, not an answer, but a response within X amount of time. And just like cloud
providers, the X amount of time that you give is a time which you will never miss. So if
you say you'll get an answer in 10 minutes,
but you know you can't make it,
that's an SLA, which is not going to serve you as a function.
If you say you're going to hear within two hours,
it means, number one, when an end user needs something,
they can count on getting a response in two hours.
Number two, if they need it by a certain time,
they know maybe I should begin two hours early
just in case I need help.
So this sets up some certainty in the relationship,
and the certainty in the relationship starts to build expectations properly
as to when things can get done.
The other thing that can happen from this is as you build your SLAs
on different aspects of what you're doing,
non-IT people come to understand the hard bits.
So, for instance, you know, putting data in a data lake and giving you an end-user tool to access it, that's not hard.
Getting data cleansed with proper quality and integrated with existing data stores, that's actually a lot harder. But if you say, if all you want to do is to get this data, you identify a data source,
we put it in a data lake, or rather, you identify a data source, we respond to you within an
hour as to whether we can consider it, we respond to you within 24 hours as to whether
we can do it, and give you a date as to when it's going to be available, that is typically
a good enough response, and you can set expectations around that.
Now, if they say, when can I use this with my enterprise data,
you can then continue to say, what does that mean?
Do you mean you just want to use enterprise data and play around with it,
or do you mean you want to use it with enterprise data in a way
that you're going to come up with enterprise answers?
Because the second requires governance, the first does not.
So this is the sort of thing you do to build a foundation, a new way of interacting,
because the reality is, you know, certainly 30 years ago, maybe 20 years ago, probably not 10 years ago, IT could more or less do what they needed to do with the amount of
requests that they had. That's no longer true and that's never going to be true again.
Interesting. And so that's quite an interesting kind of lead-in, I think, because another
area that's interesting to talk to you about is cloud, okay? So
I suppose one of the kind of responses from businesses is to use cloud for this.
Your old company, Oracle, is doing a lot of work in that space. And just to recap a little bit,
can you just tell us what you did on cloud at Oracle before, just to kind of set the scene a little bit
for your kind of experience, and then let's talk about where this is going? So what did
you do with database and cloud at Oracle?
First of all, I was the product
manager for the Oracle Database Cloud, the first iteration of it, which
they're now calling the Database Schema Service. So that was the thing that
came out about five years ago. And I was involved in working on what came out
later, which is what they call the unmanaged database cloud, where you get a pluggable database.
So you get your own database.
It's unmanaged, so it's not really a database as a service, but it's more than just infrastructure as a service.
Okay.
Okay.
So that was, and I think most listeners will be aware of
that and so on there. So, I mean, where do you see the database cloud
market now, and where do you see it going, really, in terms of technology and
the value and so on? So what's your view on where it is now, first of all, Rick?
Well, I mean, first of all, let's just talk about the two levels of the cloud which are
relevant when we're talking about database:
infrastructure as a service and database as a service, which we call database platform
as a service because it falls into the category of PaaS.
And what we're really talking about here is the different levels of the stack where the separation is between the cloud and on-premises.
Infrastructure as a service is basically saying I'm giving you a server in the cloud.
You can put whatever you want on that server.
You can load database software on there.
You could get a machine image, which is almost like a VM template,
to say when I allocate this, I'm going to get a
fully functional, fully installed database with default configurations when I allocate this.
Database as a service does the same thing, but the key difference is for database as a service,
all basic maintenance functions are managed by the cloud. You don't have full access to all the configuration options you would for that database instance.
You, in fact, do not own the instance.
You use it.
What I say is, in Oracle terms, or really in general database terms, for a database as a service,
you are not the system administrator.
For infrastructure as a service, you are, because you've loaded the software.
Now, being system administrator means you can do anything,
and also you're responsible for it.
So even though infrastructure as a service with database software may come
with the software fully installed, including automated maintenance procedures,
at the end of the day, you're responsible for it.
So if the backup isn't done, you have to take some action to make that happen.
So the infrastructure as a service marketplace, if you ask me, is pretty much over.
It's a duopoly.
It's Amazon and it's Microsoft.
Really enormous differences in scale between Amazon and Microsoft and everybody else.
Since infrastructure as a service is fairly low level,
it's more like the hardware market than the software market in some ways.
And we all know the hardware market exists on margin, right? Margin and market share. We've seen this play out in hardware field after hardware field. And when you have a company as dominant in terms
of market share as Amazon is now, that's not going to change.
Just as an aside, I have long believed in the idea that mature markets have three players,
a leader and then a second and third, and the second and third may swap their positions back and forth,
but they basically never catch up in that market.
Markets change and evolve, and new markets come up.
And I always thought it odd that in the infrastructure as a service market,
really we're talking about a duopoly, not three different vendors. I mean, and whenever you ask people, they'll come up with different ideas for the third one, but, you know, whoever they bring up is still a significant distance
behind.
So what was your take? I don't know if you saw, I listened to OpenWorld
last year, and Thomas Kurian was very much bullishly talking about, you know, Oracle
will overtake Amazon, just in that basic market. I mean, there was a lot of kind
of scepticism, to put it politely, at that, really. Did you
kind of see that, or hear that, at the time?
Oh yeah. I'm being diplomatic here. Yeah, I heard it, yes.
Yeah, I mean, it's interesting. I don't know if there was an analyst there, actually,
you might have been there actually recently, and it seems that for our part particularly,
the emphasis now has changed to be more getting the business that's currently managed service
business that partners have.
I think to try and compete head on with Amazon is, I mean, this is a sort of separate issue
with databases, but certainly I get the impression that, like you say, the lead of Amazon is so unassailable, and it
seems kind of ridiculous to think you can get there, really. But I don't know, I
mean, it's a target, I suppose.
Well, I mean, look, keep in mind: what's the difference between IaaS and managed hosting?
What's the difference?
Well, the difference, there's not a great deal of difference in many ways because in both of them you have the ability to do whatever you want on top of that platform.
But it really has to do with how it's priced and how it's achieved.
So self-service and pay-by-use.
Managed services are not pay-by-use. Managed services are pay-by-configuration or whatever you want to call it.
So people are – in one way, the cloud is re-energizing that whole thing.
And one thing that we are seeing, kind of a little bit surprising to me,
is people are getting more and more interested in private cloud,
because there's some issues with public cloud.
And I personally think these issues are legislative and perceptional rather than
actual issues, you know, security and all that stuff. But there's also people who say,
I want to have the machine in-house. And we're seeing it. And Oracle's done this, by the
way, with Cloud at Customer, where they say, we're putting the machine in your data center. You're paying for it with the exact same pricing model and licensing model that we have in
the cloud, which is a really big difference, and we'll manage it for you.
So something like this is giving people all the advantages of the cloud and not having
to worry about it being out there,
outside of your firewall.
Now, what this can also mean is you can also say it's a little,
it's not a gigantic bridge to say, okay, and I want a custom configuration here.
Because in the cloud, you can't get custom configurations.
That kind of breaks the cloud model.
Inside the cloud is a uniform environment, fully under control of the cloud provider,
and it benefits because the cloud provider can automate.
They can manage a thousand instances with not a whole lot more than managing one.
Well, the only way you do that is if all 1,000 instances
are just like the one instance, right?
If all 1,000 are different, you know, it's like work harder, work smarter, not harder.
Sorry.
So there's a lot of give here.
And it was funny, when I was product manager for the cloud,
I would get people coming up to me a lot and saying, I want this, I want that in terms of configuration.
So even when people would say to me things like, do you use RAC?
And you know, if you're Oracle, RAC's a big deal.
Do you use RAC?
And I'd say, none of your business.
Because I'm a smart ass.
But it really is none of their business. Whether Oracle is using RAC inside their cloud or
not, you can't do anything about it. It's a great technology. It's a really good idea.
They really should, but they can't change it
because they don't have access at that level. What they will get is an SLA:
what's your percentage of availability, and how long does it take you to recover
from a failover? And whether you implement that SLA with RAC or Data
Guard or backup, you know, is transparent.
Yeah, yeah, exactly.
I mean, one thing I'm interested to get your take on, really, Rick,
is, you know, we've talked about Oracle database in the cloud
and so on there, but what about some of the things we've seen
coming out of, say, Amazon, for example, and, say, Google with BigQuery?
I mean, I do a lot of work now, actually, with BigQuery,
and it's an interesting take on things.
The elasticity is something that's interesting there.
But you've obviously got Athena coming through from Amazon,
but Amazon seems to have quite a different approach to all this as well,
in that there isn't just one database engine there as well.
There's lots of different ones.
And what's, again, first of all,
what's your take on the way that Amazon are doing this thing, really?
Amazon is really interesting, OK? Because Amazon, and we run into this
when we try and evaluate them against other vendors, because Amazon is taking
just a philosophically different approach. Because let's compare Amazon with Oracle,
okay? And I can do this without saying good or bad.
It's just going to be different, right?
If you look at Oracle, essentially there's the Oracle database, right?
And the Oracle database does all kinds of stuff.
By the way, same thing with SQL Server.
SQL Server has three different processing engines within it, right?
So they have a single unified entity,
and this means a lot of stuff, okay?
This means that maintenance procedures are going to be the same,
even though you, you know,
if you have a local database that you're using for transactional work
and OLTP and ETL, you know, it's okay.
You're all doing it in one place.
When you back it up, you back it up all the same way.
When you allocate resources, you allocate from a single pool.
There's some efficiency to be gained by a unitary view of what you're doing.
Amazon says we're not going to do that.
Amazon says we're going to have Aurora, and we're going to have Redshift,
and we're going to have EMR.
And they all do different things.
Now, what Amazon says is we think we're going to service your need better by giving you something designed specifically to do what you want
rather than a more general tool.
And any overhead involved in bringing this stuff together is up to you.
They're also, by the way, they're kind of making a bet,
saying when people come to purchase this service,
even though they will eventually need multiple services,
they're only going to need one to start.
And that's the one we'll sell them.
And when it comes time for them to do something else,
when you want to speed up what you're doing, we sell you ElastiCache.
When you need a document database, we sell you DynamoDB.
Now, you get one bill for all this,
you get a management console that manages these as distinct entities
from what they call a single pane of glass,
and we'll give you that, and we're going to leave that up to you.
And I've talked to Amazon about this repeatedly,
and they believe this is the way it should be done.
And that's different.
So it will be.
Now, of course, they've been wildly successful.
Wild.
I mean, when they first started telling us about customer numbers, we were stunned.
I mean, we frankly didn't even believe it.
I mean, and it's not that we didn't believe it,
that we thought they were being dishonest,
we just, we had no idea it grew that fast.
So, time will tell.
And by the way, by the way,
Amazon may change their tune.
They may decide that, you know,
it's going to be more unified.
But that's the way it is now.
Yeah, I mean, I guess, to look at, say, Google, BigQuery, the
engine they use there, is a lot more, you know, I suppose, consumerized. It's a lot
more kind of single service, I guess, really. I mean, I think, for
example, Oracle might come up with something at some point along those lines.
So what's your view on these kind of big elastic data warehouse engines
that, I suppose, in a way, are one type of engine for everything?
What's your view on that?
Well, first of all, when you have a product that's been...
So many, many products were designed for a pre-cloud environment, right?
You didn't worry about separation of storage and CPU because when you bought a computer,
you got both of them, right?
You could expand storage, but it came coupled together. So that's how they were designed, and that was such a fundamental assumption
that everything from the deep core of the product was designed that way.
Now you have products out there, some of which
have been designed from the ground up for a cloud environment,
meaning there's a shared pool of storage separate
from a shared pool of CPU, and you can increment either one of these independently.
Now, there's a lot more to it than that, because, of course, it's not just storage and CPU.
There's interconnect, and you need that balancing and all that stuff.
So this doesn't solve all problems.
It solves some problems.
So, for instance, if you need an unbalanced configuration,
meaning sometimes you need more storage than would normally be allocated
for the amount of CPU you have,
if the product's been designed from the ground up for separation of compute and storage, you can do that.
If it hasn't, it's going to be harder to do.
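The unbalanced-configuration point can be made concrete with a little arithmetic. What follows is only a rough sketch: the node sizes and the workload figures are hypothetical, and the function names are made up for illustration rather than taken from any vendor's service.

```python
import math


def coupled_nodes(storage_tb, cpu_cores, node_storage_tb, node_cores):
    """Nodes needed when every node bundles a fixed amount of storage and CPU."""
    by_storage = math.ceil(storage_tb / node_storage_tb)
    by_cpu = math.ceil(cpu_cores / node_cores)
    # You must buy enough nodes to satisfy whichever resource is scarcer.
    return max(by_storage, by_cpu)


def decoupled_units(storage_tb, cpu_cores, unit_storage_tb, unit_cores):
    """Storage and compute units needed when the two pools scale independently."""
    return (math.ceil(storage_tb / unit_storage_tb),
            math.ceil(cpu_cores / unit_cores))


# Hypothetical storage-heavy workload: 100 TB of data, only 16 cores of compute.
# Hypothetical sizing: each node (or unit) offers 2 TB of storage and 8 cores.
nodes = coupled_nodes(100, 16, node_storage_tb=2, node_cores=8)
storage_units, compute_units = decoupled_units(100, 16, unit_storage_tb=2, unit_cores=8)
idle_cores = nodes * 8 - 16  # cores paid for but not needed in the coupled design

print(nodes, storage_units, compute_units, idle_cores)
```

Under these assumed numbers, the coupled design has to allocate as many nodes as the storage demands and carries the unused CPU along with them, while the decoupled design sizes each pool on its own.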
Now, keep in mind, it's behind the cloud.
It's inside the cloud, so you can't necessarily see what's happening,
but you can feel the effects of it.
So something like Redshift is not going to allow you
to expand from a large size to a very large size in a matter of minutes
because of the way it's implemented behind the scenes.
Other products may allow you to do that.
Now, so that's something which is actually different.
It's designed differently.
It's implemented differently.
And by the way, even more so, I think you're going to see in the not-too-distant future, some cloud companies have really designed their infrastructure
from the ground up. So for instance, at Oracle, they've announced this, so I'm not telling
anything out of school, with their bare-metal database, where they put a lot of emphasis
on how they design the networking.
And they can do that because it's their environment.
They can do anything they want.
They can control it any way they want.
So we're starting to see the dawn of people going out and doing stuff, which is hard to do in a kind of unitary environment
where people are connected by networks,
and easy to do if you start with that assumption,
if you start with a distributed assumption from the beginning
and you take steps in your very initial underlying architecture to say,
we're going to take care of this.
So, for instance, you could build a service where you had
an absolutely synchronized atomic clock around the world.
Well, that in and of itself doesn't buy you anything, so no one's going to do that for
starters.
But for things like having a distributed system where you have to resolve conflicts based
on a timestamp, that's table stakes.
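As a rough illustration of resolving conflicts on a timestamp, here is a minimal last-writer-wins sketch. It assumes, as in the example above, that every region stamps writes from a perfectly synchronized clock; the `Write` type, the `resolve` and `merge` functions, and the sample data are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp: float  # seconds, from a clock assumed to be synchronized everywhere
    region: str


def resolve(a: Write, b: Write) -> Write:
    """Last-writer-wins: keep the write with the later timestamp.

    Ties are broken by region name so every replica picks the same winner.
    """
    return max(a, b, key=lambda w: (w.timestamp, w.region))


def merge(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas' views, resolving any key present in both."""
    merged = dict(replica_a)
    for key, write in replica_b.items():
        merged[key] = resolve(merged[key], write) if key in merged else write
    return merged


# Two regions accepted conflicting updates to the same key.
europe = {"status": Write("status", "shipped", 100.002, "eu")}
america = {"status": Write("status", "cancelled", 100.001, "us"),
           "owner": Write("owner", "rick", 99.0, "us")}

merged = merge(europe, america)
print(merged["status"].value, merged["owner"].value)
```

The scheme is only as trustworthy as the clocks behind it: if the regions disagree about the time by more than the gap between two writes, the wrong write wins, which is why the synchronized clock matters for this kind of system.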
So I think we're going to see more and more people, cloud vendors, bringing out architectures that were based on the assumptions of an internal environment, totally controlled by the vendor.
What can I do when I'm in that place?
Interesting.
So I'm conscious of time now, and I think we're almost running out of time.
But one question I want to ask you, Rick, is if you're a developer, if you're a kind of buyer of kind of database technology and so on,
where would you start? What would your starting point be now in terms of your technology choices,
how you look at evaluating which database to use for analytics and so on?
You know, what would your advice be now, really, on this?
Where would you start?
In cloud, or on-premise?
What would your thoughts on that be?
Yeah, I mean, first of all,
I would say that at this point in time,
saying cloud first for new projects is very feasible, okay?
And keep in mind what that means.
That doesn't mean cloud is the default choice.
That doesn't mean cloud only. That means cloud should be included in your initial selection
criteria. And if there's a cloud product which is appropriate for your needs, that should
be considered along with on-premises. Now, keep in mind, of course, I hope we've gotten to the point
where the values of cloud are not overemphasized.
And by that, I mean, at least certainly at Gartner,
there's a very widespread acceptance
that cloud is not cheaper.
Cloud is expensive in a different way.
If you're doing one-to-one comparisons, cloud is not going to be cheaper.
And by that I mean if you're using cloud 24-7 the same way you would on-premises
with the same hardware and the same resource requirements,
you're changing your licensing model, you're changing your cost model.
But three to five years out, you're going to be paying the same
and then you're going to be paying more.
The way you save money on cloud is you pay as you go, so you don't have to pre-buy hardware.
You don't have to use more than you're going to use.
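The cost argument can be sketched as a simple break-even calculation. Every figure below (hardware price, running cost, hourly rate) is a made-up placeholder chosen only to show the shape of the comparison, not real pricing from any vendor, and the function names are hypothetical.

```python
HOURS_24X7 = 730  # approximate number of hours in a month


def on_premises_cost(months, hardware_upfront, monthly_running):
    """Cumulative cost of buying hardware up front and then running it."""
    return hardware_upfront + monthly_running * months


def cloud_cost(months, hourly_rate, hours_per_month):
    """Cumulative pay-as-you-go cost for the hours actually used."""
    return hourly_rate * hours_per_month * months


def breakeven_month(hardware_upfront, monthly_running, hourly_rate,
                    hours_per_month, horizon=120):
    """First month in which cumulative cloud spend exceeds on-premises spend.

    Returns None if that never happens within the horizon (in months).
    """
    for month in range(1, horizon + 1):
        if (cloud_cost(month, hourly_rate, hours_per_month)
                > on_premises_cost(month, hardware_upfront, monthly_running)):
            return month
    return None


# Hypothetical figures: 60,000 of hardware, 500 a month to run it, 3.00 per cloud hour.
always_on = breakeven_month(60_000, 500, 3.0, HOURS_24X7)
part_time = breakeven_month(60_000, 500, 3.0, hours_per_month=200)

print(always_on, part_time)
```

With these assumed numbers, running the cloud instance around the clock becomes the more expensive option about three years in, while a workload that only needs 200 hours a month never reaches that point within the ten-year horizon.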
So look at the cloud.
The other thing is I hope that people would realize that the fact that you can deploy
a cloud instance in a matter of minutes,
as opposed to taking a week to set it up in-house or something like that,
seems like a lot of time, and it is in the first month, but over five years it means nothing.
So I would never say cloud's your only answer.
I'd say look at cloud along with other things too.
Excellent, excellent.
Well, Rick, thank you very much for your time.
It's been great to speak to you, really.
It's great to talk, you know, talk databases,
talk with a veteran and so on there.
So thank you very much for your time.
Have a good, you're in the States, I take it now,
so have a good afternoon, morning, or whatever.
And thanks so much for coming on.
It's been great speaking to you.
It's a real pleasure, Mark. Thanks for having me.
Okay, cheers. Thank you.