Drill to Detail - Drill to Detail Ep.38 'Oracle's Big Data Strategy Before OOW' with Special Guest Jean-Pierre Dijcks
Episode Date: September 19, 2017
Drill to Detail returns for a new season with special guest Jean-Pierre Dijcks, to talk about Oracle's Big Data Strategy now and in the past, thoughts on distributed query and storage in the cloud, and previewing themes and announcements to look forward to at the upcoming Oracle Open World 2017 event running in San Francisco next month.
Transcript
So welcome back to a new series of the Drill to Detail podcast, the show about the world of big
data, analytics and data warehousing, and I'm your host, Mark Rittman. So for this first show of the
new series I'm pleased to be joined by someone who I've known for about as long as I've worked with Oracle technology, back in the days
of Warehouse Builder, back in the 2000s, and Oracle 8i, right through to big data in cloud today.
My special guest, therefore, is Jean-Pierre Dijcks from Oracle. Welcome to the show,
and it's great to speak to you again.
Yeah, it's great, Mark. It's always fun to talk to you,
and happy to be on the podcast.
So Jean-Pierre, or JP as I'll call you actually, tell us a bit about how you got into the world of Oracle and
the products, I suppose, you were involved in first of all, just to set the scene
for everybody and give them an idea of how you came into this and your route into
it.
Yeah, as you mentioned, I've been around Oracle for a while now. We shall refrain from saying
how long.
But most of my life, and I guess that's where we always intersect, is in the whole data analytics,
data warehousing world, and things like that. And around 2010,
I was part of the database product management team for parallel execution. And we started looking at this whole emerging, God, I can't believe I'm saying that, Hadoop market.
We're looking at that and how does that work.
And so as of that time, we at Oracle started looking at Hadoop. And I particularly landed in the product management spot for some of our big data products, initially Big
Data Appliance. And that essentially is what ventured into what I do today, which is
product management for some of our big data platform products and services. So Big Data Cloud
Service, Big Data Appliance, sideways involved with Big Data SQL, things like that. So that's kind of
where I hang out these days.
Okay.
So the reason I want to speak to you, apart from the fact that I've known you for a long,
long time and you were involved with Warehouse Builder back in the day, and the
database and, as you say, through to the BDA and so on, is I think you in particular
have a good perspective on, I suppose, data warehousing and where it's come from
with Oracle and where it's going to, and the Hadoop world and big data, and particularly Oracle's take on Hadoop
and where it can complement, I guess, the data warehouse.
And I want to talk to you really in this episode about, I suppose, Oracle's big data strategy,
but also a theory of mine that both data warehousing and big data,
as they converge into the cloud and as they become services
that are elastic and so on, are in a way becoming very similar. And you
could argue that those worlds are combining a lot more than perhaps I thought
they were going to a few years ago, and I'd be interested to get your views on that really.
But let's start off first of all with the area you're responsible for. So
you said it was the BDA, the Big Data Appliance, first of all, but now your role is slightly wider. Tell us about the role you do there
with big data in Oracle and, I suppose, in a way, what's Oracle's and what's your strategy
around the current range of big data products. You know, what kind of market you're
trying to address and what kind of customer you're looking to serve at the moment.
Yeah, makes sense. And just to kind of funnel it down,
at the end of the day, let's put it in context.
Oracle, from a big data perspective and analytics perspective,
obviously has a wide range of products to look at,
and we're really going to focus on some of the cloud services,
the BDA, the Hadoop platform.
I don't think people have missed the "Oracle went to the cloud" part.
Kept you quiet, though. Kept you quiet.
Yeah.
The role changed very dramatically when we really switched from the, okay, on-premise engineered systems.
We really started focusing on the cloud services.
Like the BDA is still an anchor product, right?
It's still and I think will be serving an important market,
which is people running things on-premise.
They buy their infrastructure.
They manage it.
And I think that's continuing. And we and certainly I, from a product management perspective, spent quite a bit of time on BDA and looking at the hardware platform and looking where we're going and discussing GPUs and analytics and things like that.
So I don't think that's going away.
Right? Let's be frank about that.
The cloud completely changes many of these things,
including hardware platforms and pillars. And so the strategy around this is, how do we enable
a customer at their pace to run their big data infrastructure effectively and efficiently,
right? That's the goal we're trying to get to here is if you would like to buy
something, yeah, sounds good to us, right? We have a really good solution. We can package up the
entire kind of stack in an almost cloud-like fashion where you check a bunch of boxes
and you install a Hadoop cluster, right? It's almost like provisioning just with a few different
steps. And so we're
really focusing on that platform, as well as then making sure we can, like I said, put this into
cloud, either on-prem, which of course is the cloud machine strategy or the cloud at customer
strategy Oracle does. And so we did announce and are shipping something called Big Data Cloud Machine, which replicates kind of the middle thing between the on-prem appliance and then the full cloud service with elasticity and scale out and all of that good stuff.
And so we're really trying to build a platform where we don't force a customer into, oh, but you have to go to cloud.
That's the only thing we have.
We're really trying to optimize these platforms and
go at the pace a customer wants to go. So that's really
I think a big strategic differentiator. And yes,
you hear lots about cloud, cloud, cloud, and obviously that's where everything's going.
And I think we'll chat some more about that. But it's, I think
Oracle's big differentiator is
if you would like to run optimized Cloudera
or other distribution in the cloud side, right,
we offer that to you.
And you have full control over how and when you want to move to cloud
and where you want to run things on-prem.
So I think that's really a big, big kind of thing.
Okay, so let's take a step back then.
So there's quite a few kind of terms in there
that maybe one or two people wouldn't know BDA and so on.
It'd be interesting to kind of take a step back
a little bit in history, really,
and to think about, I suppose, why the BDA came about
and the strategy there was around the connectors and so on.
I remember being at, I think it was a hot sauce event
a few years ago when the connectors were being kind of talked about.
I think it was Dan McClary, of all people, actually, who was kind of talking about them.
There was a presentation that was quite vague about some futures and so on.
And then there was a BDA as well.
I mean, look at the BDA, first of all.
So the big data appliance.
Just describe what that is, first of all, JP.
And then I suppose what I'm interested in is why Oracle went into the hardware market really at that point.
And, you know, that was an interesting sort of move at the time, really.
It was, right?
And so let's go to what's a BDA. So BDA or Big Data Appliance, and I think we're about to do like the X7s, is an engineered system, think Exadata, Exalogic, SuperCluster, stuff like that, where what we do is look at the software, the workloads, and essentially what we think people are going to do with this system, and cover 80 to 90% of that with a turnkey solution.
So it's not like a reference architecture.
It's really, we look at the hardware, we put it together, we benchmark and test it.
We then optimize layouts and things like that.
And so it is like a Cloudera in a box system.
And just to be clear on that as well, it comes with a full Cloudera distribution and only the full Cloudera distribution.
And so that's kind of what it is,
and it runs the full Enterprise Data Hub stack. You can run Impala on it, you can run
Hive on it, you can do Spark on it, and everything that we all love and like about
the Hadoop stack. Now, you were asking how did it come about and why did we go into the hardware?
The reason I ask that is because I remember going in on some pitches before when I was consulting in that area.
And there was this perception with customers that they could do Hadoop on the cheap with cheap hardware.
And you end up getting into an argument with the IT department about this hardware, that hardware, and so on.
It was an unusual move for you guys to make.
But I suppose, in a way, it's a reflection of some of the stuff you've done with Exadata with the hardware balanced on there.
I think that is it, right?
Rightfully or wrongfully, so I'll give you my opinion on that as well.
But we looked at it and said, hey, this Exadata thing is actually highly interesting because rather than having indeed those discussions with IT departments on which HCA or HBA or how many this and how many that and what's a balanced system and all
that good stuff, it's like, here is one, and it works really, really well for, let's just say, 80 or 90
percent of all your workloads. And that worked really well on a mature database platform because
people had for years struggled with optimizing systems and tuning and figuring this out.
And often the hardware caused a problem because it was unbalanced.
They didn't have enough network bandwidth for CPU power, blah, blah, blah, right?
All that good stuff. And so we looked at this and thought like, you know what, massively parallel computing, like large number of nodes, all working in unison. How about we kind of apply the lessons
we learned from the Exadata space and the database appliance space and kind of go there and say like,
why don't we all go out for the next 10 years, try to figure out what hardware should look like.
Let's figure it out, right? Put the system on there, optimize it and go, here you go.
And that was really the thought of it.
It was the, and I think you alluded to it, it was the lessons learned on complex data processing machinery.
And Hadoop looked and smelled, and still does, essentially the same.
So we thought, no brainer.
Obviously, this will work.
And I think the lesson we learned after like a year and a half was we might be a little
bit early for the market because people were going there, right?
People were like, what are you talking about, appliance?
This is like self-healing and
amazing and wonderful and magic, and any pizza box can run it, so we can do this. And I think it
has come around, and people, whatever, six, seven years in, start to appreciate the, oh, you
mean I don't have to think about any of this? You guys have put the roles on the right
servers and you've optimized it so that I can have fewer nodes potentially
but still have the good throughput and whatnot, right?
People start to go like, yeah, I really don't want to do this by myself.
And, yeah, it's really cool that I could, but I have no interest, right?
I want to just load my data and start analyzing stuff or doing what I need to do.
Yeah.
I mean, I think from my experience, it sold well to a certain category of customer.
So for those that would do big deals with you and would buy Exadata machines and would have these big ULAs with Oracle,
the BDA was fantastic.
And certainly the project that I worked on at the time, I remember working on it and it was fantastic.
And particularly the Mammoth utility.
I remember that was very useful, obviously, in doing the updates and so on.
But I guess it obviously meant that you guys only played in a certain space.
I mean, the BDA was going to be an expensive piece of kit,
obviously, as was everything else.
So did you find, in a way, that it precluded you from certain markets, or
was it always the intention for Oracle to sell into
a particular part of the market, the enterprise market and so on?
Well, I mean, whether
it's intentional or not, there's a perception, shall we call it a perception
problem, that if it's an Oracle engineered system, it must be beautiful and amazing but
also expensive and all that good stuff. So I think the perception didn't help,
right? And certainly initially, and there's quite a few studies and things done where people go
like, yeah, it may sound expensive, but if you put everything in a three-year TCO or four-year TCO
and the Cloudera cost is included and all of that stuff. It's not, right?
It's not like crazy or whatever.
Like generally speaking, I think the target market, I don't think,
the target market of a BDA was the enterprise customer, right?
It is not potentially, and we had a discussion with somebody on Twitter about this,
it is not a software development shop that by chance does car sharing, right, or whatever.
And if you have an army of developers and you go into the open source completely and you go like,
I'm going to manage this and maintain this, I'm going to contribute. I'm going to do all sorts of things like that.
A BDA is probably not where you go.
But if you're a commercial company, you're a bank, you're a communications company, you're whatever you are, right?
This becomes very appealing because it's a turnkey solution that gets you going much, much faster.
And I think that was the market we aimed it at.
We were just a little early,
and I think the market is now really kind of gearing up to us.
No, no, this makes sense, right?
And it is commercial customers.
It is Oracle's enterprise market, right,
which in the cloud is actually changing,
but on-prem is still like the Fortune 500, 1000,
whatever you want to call
it.
Yeah, it's easy to forget. I mean, I tend to work mostly in what you'd call the
startup world at the moment, where AWS or Google Cloud would be the obvious
choice. But it's a very different sort of world, and sometimes it's easy to
forget that government, large companies and so on, when they buy,
for example, a Hadoop environment, it's a completely different set of things they're buying, for a completely
different set of objectives. And I mean, I was a big fan of the BDA at the time and I think it worked well.
And I guess the only thing really was that it precluded itself from the startup world.
There's a certain buzz around that world that was there. But certainly for what it was,
it was good. And I think that, I mean, on that point as well, I mean, there was, so there was the
connectors as well.
I mean, I'd be interested to understand, I mean, were you around at the time when the
decisions were made about, I mean, whether to, I mean, obviously the big worry I would
have thought within Oracle with Hadoop is that it cannibalizes the market rather than
it being sort of complementary.
And it must have been quite a lot of kind of like discussions internally, do we do this?
Do we build things like connectors and so on?
I mean, so the thinking process behind all the stuff
that went along with the connectors
and the big data SQL and so on,
what was that really about?
Was that to complement the database
or was it there to take over from that?
What was your view on that?
So we spent quite a bit of time just looking at Hadoop and its characteristics.
And don't forget that in 2010, 2011, Spark did not exist, right?
So a lot of it was MapReduce.
A lot of it was large-scale processing.
We had the luxury, just being on 101 here in Silicon Valley,
to drive down and chat to some of the startup world
and talk to people like at LinkedIn.
And we chatted with some of the admins there and like,
what are you guys doing?
And we chatted with folks at Facebook at the time and various other people.
And I think most have long moved on to other things.
But we were kind of looking at, so what do customers do with this, right?
What do the pioneers today really drive and do?
And we came to a, again, like with the BDA,
we decided that packaging up and turnkeying it was a good thing.
Here we decided that there is no way,
and maybe this was wishful thinking or maybe it was just really smart,
there's no way SQL or databases are going away, right?
And you saw that immediately at Hive being built and the features they were building
into Hive from a partitioning perspective and stuff like that.
It's like, wow, SQL is really going to stick around.
And we have a good SQL engine.
So the decision or the thoughts were like, if we can put them together, just from a 10,000-foot glance at it, that smells like a good thing to do.
And you always have to be conscious that sometimes it's luck and sometimes it's smarts and sometimes it's a combo, right?
But we looked at it and thought, you know what, we have a good SQL execution engine.
If we can combine the two data tiers or whatever you want to call it, there must be something in it, right? And it felt
like a natural extension to kind of go to our customers, which of course have Oracle databases
and say, hey, we see a market here where we could extend this data into all of this wonderful
Hadoop stuff. But obviously, you'd like to join it, connect it, move it, whatever you want to do.
And the connectors were really the first venture into, is this real?
Do people actually do that?
And it essentially evolved while the connectors are still around in their original form.
It essentially shaped and formed the thinking around Big Data SQL
and the much, much deeper offloading integration into the platform, right?
So I think it was really the first venture into connecting the two together.
Exactly, exactly.
I mean, it's, yeah, it's having worked in the world of, in my case,
BigQuery a little bit recently and Hive before that and so on,
when you go back to using something like the Oracle database or any kind of, I suppose, traditional relational database
that is designed to kind of work with, I don't know, I suppose with even things like update statements, insert statements,
and referential integrity and things like that, you get quite a bit more respect for it, really, in some respects.
And certainly, to your point earlier on, databases aren't going to go away.
And I think there was an initial thought at the start with Hadoop,
it's going to take over all this workload.
But certainly, this kind of, I suppose,
I mean, if you were doing one of your architecture talks now, for example,
and a customer was saying to you, my data architecture going forward, where does Hadoop go in there?
Where does kind of relational databases go in?
What would you kind of say at the moment?
How would you kind of set that out for people?
Where's the sweet spot, I guess, between the two things?
Like, actually, if I were to draw it from scratch, I would start to say that all of your data originates in, or goes into,
which is probably a better word because it originates somewhere else, right? But it goes into,
let's put HDFS down there for now, and all the flows go into HDFS first and foremost.
And I would promote them to the relational database based on usage patterns,
right? In other words, I think they complement each other. I think the database is still king in
performance with many concurrent users and bizarrely complex SQL constructs coming out of
BI tools, right? That whole performance, complexity, concurrency stuff, the database is extremely good
at. And I think we've forgotten many of these things,
like cursor sharing, like row-level locking,
all sorts of things like that, right?
They're very material in large scale,
thousands of users querying stuff.
All of that really comes to fruition there.
So I think that's kind of what I would go with.
I would do any of the grunt work on Hadoop. Like, why would you run ETL on your expensive, beautiful, shiny database?
It makes no sense, just brute force it on a Hadoop cluster. And so that's typically my
architecture pitch: land stuff into Hadoop, let people query it. I think you guys wrote a blog
post at some point in time about the federation aspects, right, where you say, hey, I just exposed data in OBIEE or your BI tool.
The data comes from Hadoop.
The SQL engine kind of globs it together, and life is great, and you can be very agile.
And then if people start to hit that data frequently, just lift it and move it into the database.
And all of a sudden, you've got this beautiful query engine and working on that,
but you also have the low cost,
the versatility, the flexibility,
the ease of kind of playing with different formats,
massaging data of the Hadoop platform,
and you have kind of a winner at that point in time.
So I think that's what I would draw these days.
So let's kind of move on a bit there, really.
Obviously, we talked about the BDA
and the connectors and so on,
but that is, you know, in kind of, I suppose, internet terms, software terms, a long time ago now.
Everything now is cloud.
So tell us a bit about what Oracle's strategy is around, I suppose,
kind of big data and data warehousing moving into the cloud.
I mean, what products have you got there at the moment?
And what, again, who is this appealing to, really?
And what are you trying to achieve with this kind of work at the moment?
So there's a spectrum of software that we developed on the on-prem stuff,
like our graph stuff, our spatial areas, and our machine learning and R things.
They are applicable across all of those.
Let's leave those aside for now.
I think they're worth a whole podcast.
But from a cloud perspective, we're doing two things.
On the one hand, we're taking our existing Cloudera infrastructure, like the BDA and kind of that footprint,
and the level of control a customer has over their Hadoop cluster
or their Spark cluster. And we actually take that into our cloud. And then we add all of the cloudy
features to it. On a BDA, you've bought the hardware. So bursting is kind of hard because
you don't have hardware. And so what we did in our cloud infrastructure is we abuse or use some
of the cool networking on InfiniBand.
And we've clustered many, many of these racks together on InfiniBand.
And it enables us to randomly burst any node in our pods and have absolutely fabulous throughput.
And this was in lieu of, to some extent, some of the networking stuff we wanted that is all being revamped and changed.
But we have all of these capabilities of bursting and shrinking.
We have a massive footprint.
And what we were really trying to initially do with our Big Data Cloud Service
is to have a beautifully secure but fully controlled by you,
the customer, Hadoop cluster up and
running with the cloudy features. And that's what we're
basically running towards. So you have edge nodes, you have bursting like I said,
you have root access to the cluster, so you could install any of
your wonderful latest Hadoop libraries or data science workbenches or whatever
you want, essentially, into your
cluster.
And so it's to some extent a bit of a bridging one, right, where you lift and shift these
workloads and you adhere to a very similar pattern to what people saw and a similar control
to what people seem to want of their Hadoop clusters.
Going forward, right?
Yes.
And this is where cloud becomes extremely interesting, right?
Because if you live in an on-prem world
and you have to put the infrastructure down,
it's extremely difficult to build up an object store, right?
You have to have the scale, the triple replication, the multi-site stuff. You have to solve all of those problems.
And then you have to get the scalability and the cost of all of that calculated. And I think that's
where HDFS on-prem is still king. In cloud, I think object store is king, right? Because it's even cheaper. It's also
dumber, but that's a different story. But it's even cheaper. And I think if you look at Big Data
Cloud Service CE, it starts to look like things like EMR, and it starts to look like some of that
where you're really segregating compute from storage. I think that's the other path we're on is how will that evolve?
What do we do there?
We chose to do a different distribution at that point as well.
And again, there's another one of these, where does this go, right?
Are distributions still relevant?
Are they not?
And I think there's a big pivot going on, right,
a big transition in the market. HDFS, yes or no? Distribution, yes or no? Security, probably.
But all of these things are material.
Yeah, and there's a lot you said there that I'd
like to go back to really.
First of all, you said object store.
So for anybody that doesn't know what you're talking about, what is object store?
And why do you think that's interesting?
And what part does that play in the conversation we're having here?
So when we talked about the architecture a bit earlier, right, I said, oh, I would land my data in HDFS and then, upon needing to access it or having
sufficiently frequent and high-performance access, I would move it to a database.
And so that's kind of like playing with cost and access. And object store, which is not really a file system, right?
It deals in objects and it can store anything.
It's kind of like BLOBs and CLOBs in a database.
It's a bunch of bits or bytes and you throw them in a bucket.
And people have access to it in no particular way or API or whatnot, nothing like SQL or whatever. And these things are basically the bit bucket of the world now, where it's my staging area,
my whatnot.
And I dump my documents there or whatever I do.
And we all do this in Dropbox and whatever all these things are.
And what object store is doing is essentially making that enterprise-grade,
where I load my data into a central, quote unquote, central place
and the cloud vendor under the object store just goes off and
triple replicates or mirrors this, or makes sure
that if one site goes down, my files are still there.
And that's what object store is, and the relevance
of it is that it's cheap
and scalable and accessible.
Okay. And you mentioned distributions there. So the thing I noticed, I think it was at Open World last year, was that it was obvious
that the distribution you're using on the cloud, you know, Big Data Compute Edition, was
Hortonworks, which was interesting, and I get there's reasons why you might do that, technical reasons and so on.
But I suppose, in a way, what's interesting is that the actual end services,
the end product didn't really change as far as the user would be concerned.
Because as things go to the cloud, they become more abstracted and so on there.
I mean, you talked about distributions not being so important now.
Do you think that's going to be the case going forward?
Well, technically, I was asking the question
whether they're important, right?
I think they are.
I mean, again, I don't think they are so important now.
I think that individual parts of the stack,
like things like messaging, for example,
might come from, you know, you've got obviously Confluent
and you've got Kafka and so on there.
But whether it's just maybe my own world I
work in now, these things were massively important, you know, what distribution you're
using, whether it was MapR, whether it was Cloudera, whether it was Hortonworks and so on.
But now everything is services, and really what's powering those services under the covers
is largely irrelevant. I mean, I don't know, that's my opinion anyway.
Well, I think you're onto an interesting thread there, right?
And I think the distinction and the definition we have to put in here is
if you expose an API to me, a.k.a. you're going to a managed service,
as long as the service you provide me solves my problem, I'm happy, right?
And as long as your support is sufficient, life is great, right?
So if, and by the way, I do think that the road in cloud is to far more managed services than unmanaged services.
And there it becomes less relevant unless there is truly distinguishing factors between system A and system B. And I think
you see a shift there where, to your point, and I think I agree with you, where the distribution
becomes potentially less important. And I think that is the overall trend, right? I mean, unless
there is very specific IP in a component, they become interchangeable, which is, I think, why, and not to do grandiose predictions here, why I think Oracle Cloud will actually be one of the cloud vendors going forward.
And that is partially because of the database infrastructure.
Interesting.
I mean, certainly for me, the shock, I mean, about a year and a half ago I came out of, I suppose, the
consulting world and out of the on-premise world with Hadoop, and I actually
spent all my time fiddling around with Hadoop installations
and distributions and so on. And I went to work at the place I'm working at now, and
they've been through that. Managing their own infrastructure and managing their own
distributions and on-premise stuff and HBase clusters and so on,
it scales to a certain extent,
but beyond a certain scale, it's unmanageable.
And the thing that struck me going into, I suppose,
the large-scale big data world
is how it's all running in services now.
And nobody now, really,
who's actually using this stuff at scale,
especially in the software startup world, is working with actual servers. It's all about serverless architectures and
services and so on. And that's why I think that, you know, certainly the work you guys are doing
with the Compute Edition big data service is very interesting. But
I think people who work with Hadoop now perhaps don't
realize how much of
an impact the world of services and serverless will have on what we're doing now. I mean,
it's certainly quite a paradigm shift in how we think about things really.
Well, I agree with
you, right? And by the way, I don't think it's all rosy and wonderful, right? There's a whole bunch of things I think need to be solved, right? And
like, I think object store is amazing and will over time, or rapidly replace like things like
HDFS and many other storage points. But it doesn't have a great access control, right?
Its security paradigm is, quote-unquote,
you have or you don't have access.
That's kind of the granularity.
And there, I think, is where,
and this is an opinion, right?
I think in the foreseeable future,
we will have a mix of serverless stuff running, data in object store, whatever, and working with it, and infrastructure where we do run a specific distribution, just because I certified my apps on it, and how do I get everything recertified and all of that? So I think there's a transition period where you do need to offer both. And I think, coming back to what we started with, right, that's one of the big differentiators we do bring to bear, because we have both of these models in place, and it enables a customer to start where they want and go to, right? And keep in mind, right, if, like you were saying, you worked in the startup areas, everything's kind of nice, new and shiny.
Yes.
Because I don't have whatever 7,400 core banking systems lying around, whatever, right?
Totally. And this is actually quite a nice way to get into the bit I want to talk to you about next, really, which is I'm just curious.
Obviously, there's been a profusion of different ways to solve, I suppose, distributed compute and query in the cloud over the last couple of years.
And as we talk about, I mentioned things like BigQuery, we've talked about, you know, those kind of, I suppose, different takes on serverless, I suppose, data warehousing, big data in the cloud.
But I'm kind of curious to get your personal take on some of these technologies and some of the ways they're solving this problem, really. And obviously, I think it's probably fair to guess that Oracle might announce some stuff in the future around this, but nothing's announced yet. Really, I'm interested in your opinion on some of the different solutions and where this technology can go in the future. And something I've always been kind of curious to know about is, you know,
BigQuery, Google BigQuery and Amazon Athena.
So those kind of very canonical takes on serverless, you know,
query in the cloud and that sort of thing.
What's your view on that and where does that work well and where does that
kind of like, you know, run out of steam and that sort of thing?
I think it makes perfect sense, right?
I mean, if I have a question and I need to ask that question kind of now, and I don't want to buy anything, I just want to run my thing, right?
I think the architecture makes perfect sense. But, and this is sometimes maybe ignorance, as long as you can't guarantee SLAs, it becomes
a little hard to depend on some of these things, right?
And I think a big requirement of a whole bunch of BI and analytic queries is it has to finish
in a predictable time.
And I think one of the things we learned in the Exadata days and the data warehousing
is, like, it's really nice that it runs really fast now in like three seconds and in like five minutes tomorrow.
Customers hate that, right?
They hate the fact that this query I need to know now needs to run in three seconds because it did so yesterday.
And I think there you're going to run into issues because how do you guarantee, in air quotes, right?
You can't see that in a podcast, but how do you guarantee
SLAs on serverless?
And I think that's where some of that is potentially not completely the right way to go.
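As an aside, the point about predictable response times can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the latency samples, the five-second SLA and the helper names `percentile` and `meets_sla` are all hypothetical and do not come from BigQuery, Athena or any Oracle product. It shows why a workload with a fast typical response can still miss an SLA that is measured on the slow tail.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies in seconds.

    pct is an integer between 1 and 100.
    """
    ordered = sorted(samples)
    # Ceiling division gives the nearest-rank position (1-based).
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_sla(samples, sla_seconds, pct=95):
    """True if the pct-th percentile latency is within the SLA."""
    return percentile(samples, pct) <= sla_seconds


# Hypothetical query latencies, in seconds.
steady = [4.0, 4.2, 3.9, 4.1, 4.3, 4.0, 4.2, 4.1, 3.8, 4.4]
spiky = [3.0, 3.1, 2.9, 3.0, 3.2, 3.0, 3.1, 2.9, 3.0, 300.0]

for name, samples in (("steady", steady), ("spiky", spiky)):
    print(name,
          "median:", percentile(samples, 50),
          "p95:", percentile(samples, 95),
          "meets 5s SLA:", meets_sla(samples, 5.0))
```

The spiky series has the better median, yet it is the one that fails the check, which is the three-seconds-today, five-minutes-tomorrow problem described above.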
I agree.
I mean, I actually, at the moment, I'm managing a product that runs on BigQuery, and it is
actually an analytic service we provide to customers.
And it's querying, I mean, famously it's querying petabytes of data, and you get a very good response time, but it's never entirely consistent. And the interesting thing is that the last part of the query, you know, getting a query down from, say, 30 seconds to a consistent five seconds, is not, I suppose, the space those things work in. And the other thing that's interesting with those products is
when you port a data warehouse workload to those environments,
so first of all, there's a whole kind of like question around data modeling.
You know, do you port the same sort of normalized structure into those?
The answer is no, obviously, because they're not going to work very well.
Yeah, I was going to go with that.
Yeah, but also, how do you handle
some of the things like, I don't know, slowly changing dimensions?
Some of the things that we take for granted
in the analytics world are not possible in there, really.
And I think that, you know, when you look at something
like BigQuery or Athena, you look at them and think,
that's it, it's game over.
You know, you can do everything you used to do in Oracle in this
or Kudu or stuff like that.
But actually, you know, it's when you come to do it
that you realise the full solution
is not possible.
And I don't know.
I mean, again, I think to really appreciate things
like the Oracle database, you need to kind of work away
from it for a while sometimes.
I think it is.
And I think like we're talking about this, right?
And it's a little bit like the stuff we talked about earlier.
Like we have a database and we have this Hadoop.
Oh, this Hadoop thing is going to kill this other thing.
And I think what we've learned there, or at least I hope people learned, is that it's like don't try to shoehorn everything into one thing, right?
Within reason, you want to simplify your architecture, right?
Obviously, if everybody got to choose, I have one thing that I manage and maintain and life is great, right? But you see these characteristics come up. And I think somebody at some point in time at TDWI said, I think somebody from Facebook, like, don't worry about "and" or "or", right? It's not about "or"; it's not this thing or that thing. If you have a set of workloads or a set of problems and they get really well solved in BigQuery, Athena or whatnot, just go use it, right? But don't assume that because it runs some SQL, it can run everything and does everything and is exactly right for everything you want to do, right? So I think that's the thing we need
to kind of stop doing.
I know.
I think those of us and probably all of us in this camp
that are tech enthusiasts,
we tend to sort of look at something and see the potential
and think this is going to be revolutionary.
But, you know, that is a mistake to try and get this new thing
to do everything the old one used to do
and to try and think, I suppose, that it's going to replace everything.
You know, you will still have database servers around the future.
I guess the kind of the issue is around cost of those sort of things, really.
But certainly, another take on this is SnowflakeDB.
I mean, I'm curious.
I know, obviously, the people behind that are ex-Oracle and so on.
That's always been an interesting product from my perspective,
in that obviously it has the elasticity of some things we're used to running in the cloud, but it seems quite Oracle-like in the way it works. I mean, what's your view on Snowflake? And, to be very clear, this is a personal view. You know, what's your view on what they've done, the problem they're solving and so on?
I think, first and foremost, yes, the guys who built it, right, we know fairly well.
I know, yeah. It strikes me as an interesting one, to reproduce the things about Oracle in the cloud like that. I don't know.
It kind of is, right? I mean, I think the thing it proves... let's ignore the technical implementation for a second, because these guys are smart, they know how to build databases,
and I'm sure the thing works really well.
I think the interesting thing of it is, hey, while everybody's on the Hadoop
and Kafka and whatever bandwagon, they went somewhat far more traditional
and said, you know what, we're just going to build an analytics database engine
that just leverages cloud, right?
So here you see, I think, a good way of saying cloud gives me so many benefits
if I could only architect to it, right?
And it's kind of what you were saying.
Hey, this BigQuery thing is really cool, but I do have to redo my modeling.
And these guys said, like, hey, this cloud thing is really cool,
and I can do X, Y, and Z and that makes my data warehouse solution
really interesting, right?
And they're not the only ones
who figured that one out
because I think,
I'll do my Open World pitch here.
Yeah.
I think you'll hear a whole bunch of it
in terms of data warehousing,
managed service,
stuff like that from Oracle at Open World
because I think that's where databases are going.
Managed services in cloud get away from tuning things, right?
Get away from, I need to have an army of consulting
or an army of specialists or very smart people
that are going
to tweak the parameters and make this work, right? The system needs to take over that role. And I think that's where AI, ML, wonderful buzzwords, are really going to drive the way we're going to deploy services.
Yeah.
I think Snowflake is one take on that, right?
Yeah, Snowflake's an interesting one, in that I talk with, you know, Kent Graziano a fair bit, and I've always been a bit of a skeptic on Snowflake, although I know a few people there and respect what they're doing. But Snowflake, I think, is a little bit like SQL, in that I've had to eat my words a little bit over the last year with these. First of all, people like you a while ago were saying, you know, SQL is the language of big data, and I used to scoff at you from the audience. And actually, the more that I work with this stuff,
the more that I kind of end up agreeing with you.
But also, I was kind of questioning why Snowflake built things like,
I don't know, referential integrity into their engine,
and why bother to support things like updates
if all your workload is a kind of query workload.
But then you realize, well, how do I update dimensions? How do I update reference data? And I've got grudging, you know, more respect now for Snowflake. And I'd be interested to see, if and when Oracle announces something, whether it's along these kind of lines. But you're right, though, that sometimes we try to
innovate so hard
and we launch something that maybe is ahead of the market,
but actually putting the features of an Oracle database
or relational database with the elasticity of the cloud
is a winner, isn't it, in some respects?
Yeah.
And I think what you were saying, right?
I mean, we look at this and say, like, oh, who needs updates?
But I think the thing I learned in financial data warehousing is, people do restate things, right? They get transactions, and then they turn out to be not the actual transaction that truly happened, so they have to restate. So if you want to go to Hadoop and restate, it's like, uh, not so much fun.
Yeah. So what about things like Impala and Presto and so on? I mean, there's obviously a lot of products out there that have gone the MPP kind of route. Do you think that's a bit of, I suppose, an evolutionary dead end? Take Impala, for example, and Presto: what are your thoughts on those?
I think at the end of the day it is a little bit of a dead end
because they're all, at the end of the day, SQL engines.
And he or she who can run the most complex SQL
in the most concurrent manner wins.
So I don't think the game is, oh, I can write a SQL engine
or, oh, I can run queries.
The game is the full-on complexity of a real BI workload.
Now, that doesn't mean that they're not useful or good products.
It simply means that if you want to play in this area,
and by the way, this is, I think, where scale really matters.
If you want to play in this area,
you better deal with what people think a database is and does.
And while they forget very quickly,
the moment you can't do something, you go like,
oh, okay, that's why I had one of these things.
The scale is interesting.
And if customers ask me,
so you're basically telling me I should never use Impala
or I should never use this.
And it's like, no.
But there is a lot of questions
and queries that can be answered by, for example, running Impala on our BDA, for example. You don't
always have to go to Oracle, right? But if you want to run BI dashboards at scale and concurrency,
you probably want to run Oracle. But if you want to do data discovery and you want to hack about
some stuff, why wouldn't you use Impala? Why?
It's like, again, it's like,
it's just combine them and use them
where they really make sense.
But I do think that at scale,
large enterprise deployments,
the SQL engine is to some extent
more important than anything else.
So taking a sort of a look forward to the future,
I mean, again, something that struck me
as I moved into work,
in my case actually more with Google Cloud recently,
was that when I came across BigQuery
and I realized that it had the characteristics
of a big data system
in that obviously it scaled very horizontally
and all this kind of stuff.
But it also had the characteristics of a SQL engine. And it kind of
struck me at the time that as data warehouse workloads, as big data workloads, moved into the
cloud, you know, in a way the technology that underpins it is going to become less relevant.
You know, it's going to become distributed compute and storage, SQL on the top and so on. I mean, do
you think, in a way, the next generations of me and you,
really kind of working on this technology,
will not really have this distinction
of kind of data warehouses, relational databases,
big data and so on.
It would just be one big query service running in the cloud.
And actually, the mechanics of it
and how it works in the end is less relevant.
It would just be a service.
I mean, do you think that's the way it's kind of going?
I mean, that's been my view.
I think so.
Yeah. And I should actually add that I hope so, right? Because at the end of the day, I really don't know how a network works, right? I don't know why I'm talking to you over Skype and how that really gets routed and works, and why would I care, right? And I think this is what we internally look at as well: I just want to ask a question.
And can you just give me an answer, please?
And that's a very simple thing to say.
And that's where I think BigQuery and Athena and whatever kind of plays really well.
But the next level down is I would like to ask any question.
And by the way, my SLA is four seconds.
And by the way, my this is that. So I think it's the combo of this whole serverless scale out architecture,
infrastructure, whatnot, and then the somehow guaranteeing me for a class of workloads SLAs.
For some, I don't want to pay that money to you. So just run it whenever you can. Right. And
I think that whole gamification and that whole kind of cost-based, which I don't mean cost-based
optimizer, but to some extent I do, I guess, cost-based optimization of queries, SLAs and
data positioning, I think that is where it is going. And while I'm a big subscriber to SQL as
the language, I do think people want to ask graph questions
without completely and fully understanding
what that really does, right?
And like nobody argues why certain links
are at the top of the Google pages.
There's an algorithm that does it
and it's probably reasonably good, right?
And so I think we're going to dumb down,
it's a bad word to use, dumbing down,
but the consumption of it is going to be so much simpler
and the skills required to manage, maintain, set up,
all of that is going to be far less.
So in the market going forward,
how's Oracle going to differentiate itself, really, from, say, Google and Amazon and so on
in this space? I mean, we talked a moment ago about it just
becoming services and so on, and you mentioned SLAs and so on there.
But I suppose, what's the angle that you guys are going to have going
forward to convince an organization to use you rather than, say, Google
Cloud or something?
What's the particular Oracle kind of angle or market you're aiming at really here?
I mean, we talked about enterprises earlier on,
but everyone wants to sell to enterprises.
Where do you think Oracle have a particular kind of strength here, really?
So I think it's actually the product manager details as well.
So you know all the wonderful comprehensive and all of these beautiful, like, oh, we have it all words.
But I do think there is kind of a big leg up Oracle has in this, which is, one, we are one of the few who do IaaS, PaaS as well as SaaS, right? And so the integration of it, and potentially the ease of going between these,
let's call them services, right,
going forward
is going to be a very, very big plus to Oracle.
Oracle Data Cloud is the other one, right,
where we have whatever, five, yeah,
so five billion whatever households and whatnot.
So Oracle invested on the acquisition side a lot of money in building up this whole data cloud initiative.
And it is essentially he who has the most data or she who has the most data who is most attractive to have your data come to our cloud. Because if you put customer information in our cloud
and you can now mash it up with a pile of 5 billion other customer records
that we could use to enrich, augment, refine, and doing all of that,
that will tremendously drive the value of your data and your data in our cloud.
And I think Google, Amazon, and various others,
and I'm not the expert on this, actually use all of this, right?
They use the data in Oracle Cloud to augment their stuff.
And I think that's going to really drive some of the macro decisions
as to which one of these clouds do I pick.
What about next generation of developers?
I mean, one of the kind of gripes I've had with kind of Oracle big data
and the cloud and so on is it's so hard to get hold of access.
I mean, you and I have discussed this in the past where, you know,
I mean, because I know people, I know you and so on,
I've always been able to get access.
But I suppose one of the side effects of Oracle selling
just to enterprises and not
to the startup market and so on is that it's actually quite hard for somebody
on the spur of the moment to go and pick up their laptop and get access to Oracle big
data running in the cloud.
I mean, I know part of this will be down to capacity and so on, but are there initiatives
going on to make that a bit easier to try and seed the market a little bit and nurture
the next round of developers, really, in this area?
I think very much so, right? And it's a nightmare.
It's a nightmare for me.
Yeah, I'm with you, right? I mean, keep in mind that transition is not an easy thing to do, right? And I think what you start to see... and I'm sure, like, we were looking at this whole BDA thing in 2010 and this Hadoop thing and scratching our heads, I'm sure if you would have gone to Amazon Cloud in 2010 or whatever, you'd go like, huh?
Right?
I mean, there is this, like there is a set of years invested in certain vendors' infrastructure, and we're investing and catching up quickly.
It has a downside, right?
We still, to some extent, know how to deal with complex contracts and very large procurement
departments, and we can do all of that.
That makes it a little harder to deal with the other side of the coin, right?
And I think other companies are transferring the other way.
So both are kind of complicated.
But we're hard at work to flip this company around and really getting to the point where you should just go to cloud out what are the cool groups to hang out with and kind of what are the things that we've learned from the other implementations.
And so we can implement some of that.
And I think you'll see a lot of that come to fruition in the next-gen infrastructure IaaS stuff because that stuff is really cool.
Exactly.
So we're getting there, I think.
I think so. I mean, certainly for me it was a kind of comedy thing where I tried to sign up for the elastic cloud service,
and I think I was the first person ever
who wasn't part of a company to actually, literally,
swipe a credit card and try to sign up for it. And the order went through,
and everybody just looked at it and didn't know what to do with it, really. But, yeah, it's early days, I know.
And I think it's...
But certainly, I suppose, it would be...
For me personally, it's been quite hard to keep up
with kind of developing on the Oracle Big Data Cloud platform
because it has been so hard to get hold of the software
and so on, really.
And, you know, I suppose that is something
that's quite important, really, going forward.
And, I mean, yeah.
So obviously there's Open World coming up soon,
and you're going to be obviously there and presenting and doing keynotes and stuff and that sort of thing.
I mean, obviously you're limited in what you can talk about,
but give us a flavour of some of the things that you'll be speaking about
and some of the things to look out for really from your area
at Open World in a few weeks' time.
Okay. Let's just start with the one that we alluded to a little bit, and I'll say like four words about it: data warehousing, managed service, cloud. Those three words are probably going to be prominently present in many things, from what I can tell right now.
And I think you'll see a bunch of announcements and things around that.
And so it's really an exciting, I think, opportunity, but it's also an exciting topic that will come out of Open World.
Going to slightly lower granularity topics,
some of the stuff we're really trying to do is kind of what you were saying, like how do you
get the next generation of developers working on stuff. So one of the things we're
really trying to figure out, and we keep on going back to, is how do I make it simpler for
somebody to use something without necessarily
having gone to, like, data science school.
Right.
So one of the things that certainly my session will be about is, how do I have data in my object store, or let's just say a place. How do I get that into my Hadoop cluster? How do I get a notebook running on it? How do I make Hive definitions on top of that?
This must be simple, right?
And I think we spent a lot of time over the past three to four months to really make that workflow blindingly simple.
Right mouse click, right mouse click, right mouse click, and off you go.
Or if you want to load large data sets, or you want to take a training data set of multiple terabytes out of object store into HDFS,
but you don't really want to write a Spark job,
or you really actually don't want to go to ODI
and click all of this together and load it, right?
I just want to go, dude, put this thing in my HDFS and move on.
And so we spent a lot of time building,
like, better mousetraps to some extent, right?
Getting this stuff out of that and driving it into that.
So that's some of the stuff we're looking at.
We're looking at Cloud Machine and what that really is going to drive into the market.
And one thing we're working on pretty hard
is how can we make files much more secure?
And how can we have SQL access to that with kind of baked-in security?
And I'm going to be somewhat vague about this,
but at the end of the day, what we're trying to figure out is
if I go to object stores and I can't implement a fine-grained security mechanism,
somehow I need to be able to define roles and responsibilities,
and the file itself somehow needs to encapsulate that.
And so if I interrogate a file, I should potentially get a different answer
than Mark interrogating that exact same file on the exact same system.
And we've dubbed it "enterprise Parquet" as the working title.
And it actually does run in Parquet.
And that is something where we think there is an interesting future
on making files far more secure and more versatile.
And more from a data governance perspective,
I don't want to create a source file, Parquet schema one, Parquet schema two, ORC schema three. Like, can we condense this all into a single file? And that's something that we'll be talking about at Open World in, like, a small corner of one of the sessions, and just chat about. But I think it has huge potential.
Excellent. Well, it's been great to speak to you, JP. I mean, certainly, it's been about 17 years now since I've known you, back from the days of OWB and 8i and that sort of thing, and you've been proved right over the years. So it's been interesting hearing from you and getting advice from you. And you've obviously seen a lot over the years in terms of technology and so on.
I guess for me, the themes out of this conversation, and the themes, I suppose, from what I've been looking at, are how tabular storage, how SQL, how those sort of things are eternally important, and I suppose also how new technologies that come along just don't replace the other one, really. There would be a need for different ones: you'd have a need for the database, a need for Hadoop and that sort of thing. And yeah, trying to shoehorn everything into the same thing is crazy.
But also keep an open mind, really. I mean, I think you're a good example of someone who has kept an open mind over the years. You've gone from very much client-server ETL tools and so on through to this. And that's why it's all fun, really, isn't it? I think it's why I particularly enjoy working in this industry,
in that every year things change.
The knowledge you've built up can be useful,
and it's certainly, you know, I think you can bet on how old you are,
but the age I am as well, the knowledge you've built up is useful,
but also it's all exciting as well.
So it's been great to speak to you, really.
Yeah, it was fun to be here,
and you did make me feel a little old at the end there, but I guess that's what it is.
No, it's great. I mean, it's about thinking we're still relevant, which is the key thing, really. So yeah, it's been great. Well, I hope Open World goes well, and it's been really good to speak to you. And Stuart Bryson says hello as well, so he'll be seeing you at Open World. I won't be there, but yeah, it's great to speak to you. Take care and speak soon.
Yep, thanks.