Drill to Detail - Drill to Detail Ep.38 'Oracle's Big Data Strategy Before OOW' with Special Guest Jean-Pierre Dijcks

Starting point is 00:00:00 So welcome back to a new series of the Drill to Detail podcast, the show about the world of big data, analytics and data warehousing, and I'm your host, Mark Rittman. So for this first show of the new series I'm pleased to be joined by someone who I've known for about as long as I've worked with Oracle technology, back in the days of Warehouse Builder, back in the 2000s, and Oracle 8i, right through to big data in the cloud today. My special guest, therefore, is Jean-Pierre Dijcks from Oracle. Welcome to the show, and it's great to speak to you again. Yeah, it's great, Mark. It's always fun to talk to you, and happy to be on the podcast. So Jean-Pierre, or JP as I'll call you, actually, tell us a bit about how you got into the world of Oracle, what

Starting point is 00:00:49 you, the products I suppose you were involved in first of all, just to set the scene for everybody, to give them an idea of how you came into this and your route into this. Yeah, as you mentioned, I've been around Oracle for a while now. We shall refrain from saying how long. But most of my life, and I guess that's where we always intersect, is in the whole data analytics, data warehousing world, and things like that. And in the 2010 era, I was part of the database product management team for parallel execution. And we started looking at this whole emerging, God, I can't believe I'm saying that, Hadoop market. We were looking at that and how it works.

Starting point is 00:01:38 And so as of that time, we at Oracle started looking at Hadoop. And I particularly landed in the product management spot for some of our big data products, initially Big Data Appliance, and that essentially is what ventured into what I do today, which is product management for some of our big data platform products and services. So Big Data Cloud Service, Big Data Appliance, sideways involved with Big Data SQL, things like that. So that's kind of where I hang out these days. Okay, so the reason I want to speak to you, apart from the fact that I've known you for a long, long time and you were involved with Warehouse Builder back in the day, and the database and, as you say, through to the BDA and so on, is, I think, particularly for you, you know,

Starting point is 00:02:17 you've got a good perspective on, I suppose, data warehousing and where it's come from with Oracle and where it's going to, and the Hadoop world and big data, and particularly Oracle's take on Hadoop and where it can complement, I guess, the data warehouse. And I want to talk to you in this episode about, I suppose, Oracle's big data strategy, but also a theory of mine that both data warehousing and big data, as they converge into the cloud and as they become services that are elastic and so on, are in a way becoming very similar, and you could argue that those worlds are combining a lot more than perhaps I thought

Starting point is 00:02:52 they were going to do a few years ago, and I'd be interested to get your views on that, really. But let's start off first of all with the area you're responsible for. So you said it was the BDA first of all, Big Data Appliance, but now your role is slightly wider. Tell us about the role you do there with big data in Oracle and, I suppose, in a way, what's Oracle's and what's your strategy around the current range of big data products, you know, what kind of market you're trying to address and what kind of customer you're looking to serve at the moment. Yeah, makes sense. And just to kind of funnel it down, at the end of the day, let's put it in context.

Starting point is 00:03:31 Oracle, from a big data perspective and analytics perspective, obviously has a wide range of products to look at, and we're really going to focus on some of the cloud services, the BDA, the Hadoop platform. I don't think people have missed the "Oracle went to the cloud" part. Kept you quiet, though. Kept you quiet. Yeah.

Starting point is 00:03:58 The role changed very dramatically when we really switched from the, okay, on-premise engineered systems. We really started focusing on the cloud services. Like the BDA is still an anchor product, right? It's still and I think will be serving an important market, which is people running things on-premise. They buy their infrastructure. They manage it. And I think that's continuing. And we and certainly I, from a product management perspective, spent quite a bit of time on BDA and looking at the hardware platform and looking where we're going and discussing GPUs and analytics and things like that.

Starting point is 00:04:38 So I don't think that's going away, right? Let's be frank about that. The cloud completely changes many of these things, including hardware platforms and pillars. And so the strategy around this is, how do we enable a customer at their pace to run their big data infrastructure effectively and efficiently, right? That's the goal we're trying to get to here: if you would like to buy something, yeah, sounds good to us, right? We have a really good solution. We can package up the entire stack in an almost cloud-like fashion where you check a bunch of boxes

Starting point is 00:05:18 and you install a Hadoop cluster, right? It's almost like provisioning just with a few different steps. And so we're really focusing on that platform, as well as then making sure we can, like I said, put this into cloud, either on-prem, which of course is the cloud machine strategy or the cloud at customer strategy Oracle does. And so we did announce and are shipping something called Big Data Cloud Machine, which replicates kind of the middle thing between the on-prem appliance and then the full cloud service with elasticity and scale out and all of that good stuff. And so we're really trying to build a platform where we don't force a customer into, oh, but you have to go to cloud. That's the only thing we have. We're really trying to optimize these platforms and

Starting point is 00:06:07 go at the pace a customer wants to go. So that's really I think a big strategic differentiator. And yes, you hear lots about cloud, cloud, cloud, and obviously that's where everything's going. And I think we'll chat some more about that. But it's, I think Oracle's big differentiator is if you would like to run optimized Cloudera or other distribution in the cloud side, right, we offer that to you.

Starting point is 00:06:35 And you have full control over how and when you want to move to cloud and where you want to run things on-prem. So I think that's really a big, big kind of thing. Okay, so let's take a step back then. So there's quite a few kind of terms in there that maybe one or two people wouldn't know BDA and so on. It'd be interesting to kind of take a step back a little bit in history, really,

Starting point is 00:06:54 and to think about, I suppose, why the BDA came about and the strategy there was around the connectors and so on. I remember being at, I think it was a hot sauce event a few years ago when the connectors were being kind of talked about. I think it was Dan McClary of people, actually, who was kind of talking about them. There was a presentation that was quite vague about some futures and so on. And then there was a BDA as well. I mean, look at the BDA, first of all.

Starting point is 00:07:16 So the Big Data Appliance. Just describe what that is, first of all, JP. And then I suppose what I'm interested in is why Oracle went into the hardware market at that point. And, you know, that was an interesting sort of move at the time, really. It was, right? And so let's go to what's a BDA. So BDA, or Big Data Appliance, and I think we're about to do, like, the X7s, is an engineered system, think Exadata, Exalogic, SuperCluster, stuff like that, where what we do is look at the software, the workloads, and essentially what we think people are going to do with this system, and cover 80, 90% of that with a turnkey solution. So it's not like a reference architecture.

Starting point is 00:08:05 It's really, we look at the hardware, we put it together, we benchmark and test it. We then optimize layouts and things like that. And so it is like a Cloudera-in-a-box system. And just to be clear on that as well, it comes with a full Cloudera distribution and only the full Cloudera distribution. And so that's kind of what it is, and it runs the full Enterprise Data Hub stack, and you can run Impala on it and you can run Hive on it and you can do Spark on it, and everything that we all love and like about the Hadoop stack. Now, you were asking how did it come about and why did we go into the hardware?

Starting point is 00:08:44 The reason I ask that is because I remember going in on some pitches before when I was consulting in that area. And there was this perception with customers that they could do Hadoop on the cheap with cheap hardware. And you end up getting into an argument with the IT department about this hardware, that hardware, and so on. It was an unusual move for you guys to make. But I suppose, in a way, it's a reflection of some of the stuff you've done with Exadata with the hardware balanced on there. I think that is it, right? Rightfully or wrongfully, so I'll give you my opinion on that as well.

Starting point is 00:09:25 But we looked at it and said, hey, this Exadata thing is actually highly interesting, because rather than having indeed those discussions with IT departments on which HCA or HBA, or how many of this and how many of that, and what's a balanced system and all that good stuff, it's like, here is one, and it works really, really well for, let's just say, 80 or 90 percent of all your workloads. And that worked really well on a mature database platform because people had for years struggled with optimizing systems and tuning and figuring this out. And often the hardware caused a problem because it was unbalanced. They didn't have enough network bandwidth or CPU power, blah, blah, blah, right? All that good stuff. And so we looked at this and thought, you know what, massively parallel computing, a large number of nodes, all working in unison. How about we apply the lessons we learned from the Exadata space and the database appliance space and go there and say,

Starting point is 00:10:19 why should we all go out for the next 10 years and try to figure out what the hardware should look like? Let's figure it out, right? Put the system on there, optimize it and go, here you go. And that was really the thought of it. It was, and I think you alluded to it, the lessons learned on complex data processing machinery. And Hadoop looked and smelled, and still does, essentially the same. So we thought, no-brainer. Obviously, this will work. And I think the lesson we learned after like a year and a half was we might be a little

Starting point is 00:10:54 bit early for the market, because people weren't going there, right? People were like, what are you talking about, appliance? This is self-healing and amazing and wonderful and magic, and any pizza box can run it, so we can do this. And I think it has come around, and people, whatever, six, seven years in, start to appreciate the, oh, you mean I don't have to think about any of this? You guys have put the roles on the right servers and you've optimized it so that I can have fewer nodes potentially but still have the good throughput and whatnot, right?

Starting point is 00:11:32 People start to go like, yeah, I really don't want to do this by myself. And, yeah, it's really cool that I could, but I have no interest, right? I want to just load my data and start analyzing stuff or doing what I need to do. Yeah. I mean, I think from my experience, it sold well to a certain category of customer. So those that would do big deals with you and would buy Exadata machines and would have these big ULAs with Oracle. The BDA was fantastic. And certainly the project that I worked on at the time, I remember working on it and it was fantastic.

Starting point is 00:12:02 And particularly the Mammoth utility. I remember that was very useful, obviously, in doing the updates and so on. But I guess it obviously meant that you guys only played in a certain space. I mean, the BDA is going to be an expensive piece of kit, obviously it was, and everything else. So, I mean, did you find, in a way, did it preclude you from certain markets, or was it always the intention for Oracle to sell into a particular part of the market, the enterprise market and so on? Well, I mean, whether

Starting point is 00:12:30 it's intentional or not, I mean, A, there's a perception, shall we call it a perception problem, that if it's an Oracle engineered system, it must be beautiful and amazing but also expensive and all that good stuff. So I think the perception didn't help, right? And certainly initially, and there's quite a few studies and things done where people go like, yeah, it may sound expensive, but if you put everything in a three-year TCO or four-year TCO, and the Cloudera cost is included and all of that stuff, it's not, right? It's not like crazy or whatever. Like generally speaking, I think the target market, I don't think,

Starting point is 00:13:17 the target market of a BDA was the enterprise customer, right? It is not potentially, and we had a discussion with somebody on Twitter about this, it is not a software development shop that by chance does car sharing, right, or whatever. And if you have an army of developers and you go into the open source completely and you go like, I'm going to manage this and maintain this, I'm going to contribute. I'm going to do all sorts of things like that. A BDA is probably not where you go. But if you're a commercial company, you're a bank, you're a communications company, you're whatever you are, right? This becomes very appealing because it's a turnkey solution that gets you going much, much faster.

Starting point is 00:14:02 And I think that was the market we aimed it at. We were just a little early, and I think the market is now really kind of gearing up to us and going, no, no, this makes sense, right? And it is commercial customers. It is Oracle's enterprise market, right, which in the cloud is actually changing, but on-prem is still like the Fortune 500, 1000,

Starting point is 00:14:24 whatever you want to call it. Yeah, it's easy to forget. I mean, I tend to work mostly in what you'd call startup world at the moment, where it would be AWS or Google Cloud that would be the obvious choice, but it's a very different sort of world. And sometimes it's easy to forget that government, large companies and so on, when they buy, for example, a Hadoop environment, it's a completely different set of things they're buying, for a completely different set of objectives. And I mean, I was a big fan of the BDA at the time and I think it worked well, and I guess the only thing really was that it precluded itself from the startup world.

Starting point is 00:14:57 There's a certain buzz around that world that was there. But I mean, certainly for what it was, it was good. And I think that, I mean, on that point as well, there was the connectors as well. I mean, I'd be interested to understand, were you around at the time when the decisions were made about, I mean, obviously the big worry I would have thought within Oracle with Hadoop is that it cannibalizes the market rather than it being sort of complementary. And there must have been quite a lot of discussions internally: do we do this?

Starting point is 00:15:26 Do we build things like connectors and so on? I mean, so the thinking process behind all the stuff that went along with the connectors and the Big Data SQL and so on, what was that really about? Was that to complement the database or was it there to take over from that? What was your view on that?

Starting point is 00:15:41 So we spent quite a bit of time just looking at Hadoop and its characteristics. And don't forget that in 2010, 2011, Spark did not exist, right? So a lot of it was MapReduce. A lot of it was large-scale processing. We had the luxury, just being on 101 here in Silicon Valley, to drive down and chat to some of the startup world and talk to people like at LinkedIn. And we chatted with some of the admins there and like,

Starting point is 00:16:10 what are you guys doing? And we chatted with folks at Facebook at the time and various other people. And I think most have long moved on to other things. But we were kind of looking at, so what do customers do with this, right? What do the pioneers today really drive and do? And we came to a, again, like with the BDA, we decided that packaging up and turnkeying it was a good thing. Here we decided that there is no way,

Starting point is 00:16:41 and maybe this was wishful thinking or maybe it was just really smart, there's no way SQL or databases are going away, right? And you saw that immediately at Hive being built and the features they were building into Hive from a partitioning perspective and stuff like that. It's like, wow, SQL is really going to stick around. And we have a good SQL engine. So the decision or the thoughts were like, if we can put them together, just from a 10,000-foot glance at it, that smells like a good thing to do. And you always have to be conscious that sometimes it's luck and sometimes it's smarts and sometimes it's a combo, right?

Starting point is 00:17:18 But we looked at it and thought, you know what, we have a good SQL execution engine. If we can combine the two data tiers or whatever you want to call it, there must be something in it, right? And it felt like a natural extension to kind of go to our customers, which of course have Oracle databases and say, hey, we see a market here where we could extend this data into all of this wonderful Hadoop stuff. But obviously, you'd like to join it, connect it, move it, whatever you want to do. And the connectors were really the first venture into, is this real? Do people actually do that? And it essentially evolved while the connectors are still around in their original form.

Starting point is 00:18:02 It essentially shaped and formed the thinking around Big Data SQL and the much, much deeper offloading integration into the platform, right? So I think it was really the first venture into connecting the two together. Exactly, exactly. I mean, it's, yeah, it's having worked in the world of, in my case, BigQuery a little bit recently and Hive before that and so on, when you go back to using something like the Oracle database or any kind of, I suppose, traditional relational database that is designed to kind of work with, I don't know, I suppose with even things like update statements, insert statements,

Starting point is 00:18:40 and referential integrity and things like that, you get quite a bit more respect for it, really, in some respects. And certainly, to your point earlier on, databases aren't going to go away. And I think there was an initial thought at the start with Hadoop, it's going to take over all this workload. But certainly, this kind of, I suppose, I mean, if you were doing one of your architecture talks now, for example, and a customer was saying to you, my data architecture going forward, where does Hadoop go in there? Where do relational databases go in?

Starting point is 00:19:09 What would you kind of say at the moment? How would you kind of set that out for people? Where's the sweet spot, I guess, between the two things? Like I actually, if I were to draw it from scratch, I would actually start to say that all of your data originates in, or goes into, which is probably a better word because it originates somewhere else, right? But it goes into, let's put HDFS for now down there, and all the flows go into HDFS first and foremost. And I would promote them to the relational database based on usage patterns, right? In other words, I think they complement each other. I think the database is still king in

Starting point is 00:19:51 performance with many concurrent users and bizarrely complex SQL constructs coming out of BI tools, right? That whole performance, complexity, concurrency stuff the database is extremely good at. And I think we've forgotten many of these things, like cursor sharing, like row-level locking, all sorts of things like that, right? They're very material in large-scale, thousands of users querying stuff. All of that really comes to fruition there.

Starting point is 00:20:21 So I think that's kind of what I would go with. I would do any of the grunt work there. Like why would you run ETL on your expensive, beautiful, shiny database? It makes no sense, just brute force it on a Hadoop cluster. And so that's typically my architecture pitch: land stuff into Hadoop, let people query it. I think you guys wrote a blog post at some point in time about the federation aspects, right, where you say, hey, I just exposed data in OBIEE or your BI tool. The data comes from Hadoop. The SQL engine kind of globs it together, and life is great, and you can be very agile. And then if people start to hit that data frequently, just lift it and move it into the database.

Starting point is 00:21:02 And all of a sudden, you've got this beautiful query engine and working on that, but you also have the low cost, the versatility, the flexibility, the ease of kind of playing with different formats, massaging data of the Hadoop platform, and you have kind of a winner at that point in time. So I think that's what I would draw these days. So let's kind of move on a bit there, really.
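
Editor's sketch: the "land everything in Hadoop, promote to the database when it gets queried often" pitch described above can be illustrated with a few lines of Python. This is only a toy model under stated assumptions, not how any Oracle product decides placement: the `TieringPolicy` class, the `promote_threshold` value and the dataset name are all hypothetical, and a real system would look at much more than a raw query count.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TieringPolicy:
    """Toy model: data lands in the Hadoop tier and is promoted to the
    relational database once it has been queried often enough."""

    promote_threshold: int = 3  # hypothetical cut-off, chosen for illustration
    query_counts: Counter = field(default_factory=Counter)
    in_database: set = field(default_factory=set)

    def record_query(self, dataset: str) -> str:
        """Record one query and return the tier that served it."""
        served_from = "database" if dataset in self.in_database else "hadoop"
        self.query_counts[dataset] += 1
        # Promote after the query is served, so the next query hits the database.
        if (dataset not in self.in_database
                and self.query_counts[dataset] >= self.promote_threshold):
            self.in_database.add(dataset)
        return served_from


policy = TieringPolicy(promote_threshold=3)
tiers = [policy.record_query("clickstream") for _ in range(4)]
print(tiers)
```

The first three queries are served from the Hadoop tier; the third one crosses the threshold, so the fourth is served from the database.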

Starting point is 00:21:22 Obviously, we talked about the BDA and the connectors and so on, but that is, you know, in kind of, I suppose, internet terms, software terms, a long time ago now. Everything now is cloud. So tell us a bit about what Oracle's strategy is around, I suppose, kind of big data and data warehousing moving into the cloud. I mean, what products have you got there at the moment? And what, again, who is this appealing to, really?

Starting point is 00:21:43 And what are you trying to achieve with this kind of work at the moment? So there's a spectrum of software that we developed on the on-prem stuff, like our graph stuff, our spatial areas, and our machine learning, our things. They are applicable across all of those. Let's leave those aside for now. I think they're worth a whole podcast. But from a cloud perspective, we're doing two things. On the one hand, we're taking our existing Cloudera infrastructure, like the BDA and kind of that footprint,

Starting point is 00:22:22 and the level of control a customer has over their Hadoop cluster or their Spark cluster. And we actually take that into our cloud. And then we add all of the cloudy features to it. On a BDA, you've bought the hardware. So bursting is kind of hard because you don't have hardware. And so what we did in our cloud infrastructure is we abuse or use some of the cool networking on InfiniBand. And we've clustered many, many of these racks together on InfiniBand. And it enables us to randomly burst any node in our pods and have absolutely fabulous throughput. And this was in lieu of, to some extent, some of the networking stuff we wanted that is all being revamped and changed.

Starting point is 00:23:06 But we have all of these capabilities of bursting and shrinking. We have a massive footprint. And what we were really trying to initially do with our Big Data Cloud Service is to have a beautifully secure, but fully controlled by you, the customer, Hadoop cluster up and running with the cloudy features, and that's what we're basically running towards. So you have edge nodes, you have bursting like I said, you have root access to the cluster, so you could install any of

Starting point is 00:23:39 your wonderful latest Hadoop libraries or data science workbenches or whatever you want essentially into your cluster. And so it's to some extent a bit of a bridging one, right, where you lift and shift these workloads and you adhere to a very similar pattern to what people saw and a similar control to what people seem to want of their Hadoop clusters. Going forward, right? Yes.

Starting point is 00:24:09 And this is where cloud becomes extremely interesting, right? Because if you live in an on-prem world and you have to put the infrastructure down, it's extremely difficult to build up an object store, right? You have to have the scale, the triple replication, the multi-site stuff. You have to solve all of those problems. And then you have to get the scalability and the cost of all of that calculated. And I think that's where HDFS on-prem is still king. In cloud, I think object store is king, right? Because it's even cheaper. It's also

Starting point is 00:24:47 dumber, but that's a different story. But it's even cheaper. And I think if you look at Big Data Cloud Service CE, it starts to look like things like EMR, and it starts to look like some of that where you're really segregating compute from storage. I think that's the other path we're on is how will that evolve? What do we do there? We chose to do a different distribution at that point as well. And again, there's another one of these, where does this go, right? Are distributions still relevant? Are they not?

Starting point is 00:25:21 And I think there's a big pivot going on, right? A big transition in the market. HDFS, yes or no? Distribution, yes or no? Security, probably. But all of these things are material. Yeah, and you mentioned, I mean, there's a lot you said there that I'd like to go back to, really. I mean, so first of all, you mentioned that about the BDA. First of all, you said object store. So for anybody that doesn't know what you're talking about, what is object store? And why do you think that's interesting?

Starting point is 00:25:56 And what part does that play in the conversation we're having here? So when we talked about the architecture a bit earlier, right, I said, oh, I would land my data in HDFS and then, upon needing to access it, or having sufficiently frequent access and high-performance access, I would move it to a database. And so that's kind of like playing with cost and access. And object store, which is not really a file system, right? It deals in objects and it can store anything. It's kind of like BLOBs and CLOBs in a database. It's a bunch of bits or bytes, and you throw them in a bucket. And people have access to it in no particular way or API or whatnot, or nothing like SQL or whatever. And these things are basically the bit bucket of the world now, where it's my staging area,

Starting point is 00:26:49 my whatnot. And I dump my documents there or whatever I do. And we all do this in Dropbox and whatever all these things are. And what Object Store is doing is enterprising essentially that where I load my data into a central, quote unquote, central place and the cloud vendor under object store just goes off and

Starting point is 00:27:14 triple replicates or mirrors this, or makes sure that if one site goes down, my files are still there. And that's what object store is, and the relevance of it is that it's cheap and scalable and accessible. Okay, okay. And you mentioned distributions there. So the thing I noticed, I think it was OpenWorld last year, Oracle, you know, it was obvious that the distribution you're using on the cloud, you know, Big Data Compute Edition, was

Starting point is 00:27:41 Hortonworks, which was interesting, and I get there's reasons why you might do that, technical reasons and so on. But I suppose, in a way, what's interesting is that the actual end services, the end product, didn't really change as far as the user would be concerned. Because as things go to the cloud, they become more abstracted and so on there. I mean, you talked about distributions not being so important now. Do you think that's going to be the case going forward? Well, technically, I was asking the question whether they're important, right?

Starting point is 00:28:08 I think they are. I mean, again, I don't think they are so important now. I think that individual parts of the stack, like things like messaging, for example, might come from, you know, you've got obviously Confluent and you've got Kafka and so on there. But whether it's just maybe my own world I work in now, but these things were massively important, you know, what distribution you're

Starting point is 00:28:29 using, whether it's MapR, whether it was Cloudera, whether it's Hortonworks and so on. But now everything's services, and really what's powering those services under the covers is largely irrelevant. I mean, I don't know, that's my opinion anyway. Well, I think you're onto an interesting thread there, right? And I think the distinction and the definition we have to put in here is, if you expose an API to me, a.k.a. you're going to a managed service, as long as the service you provide me solves my problem, I'm happy, right? And as long as your support is sufficient, life is great, right?

Starting point is 00:29:07 So if, and by the way, I do think that the road in cloud is to far more managed services than unmanaged services. And there it becomes less relevant unless there are truly distinguishing factors between system A and system B. And I think you see a shift there where, to your point, and I think I agree with you, the distribution becomes potentially less important. And I think that is the overall trend, right? I mean, unless there is very specific IP in a component, they become interchangeable, which is, I think, why, and not to do grandiose predictions here, why I think Oracle Cloud will actually be one of the cloud vendors going forward. And that is partially because of the database infrastructure. Interesting. I mean, certainly for me, the shock, I mean, about a year and a half ago I came out of, I suppose, the

Starting point is 00:30:05 consulting world and out of, I suppose, the on-premise world with Hadoop, and I actually spent all my time, you know, fiddling around with Hadoop installations and distributions and so on there. And, you know, I went to work at the place I'm working at now, and they've been through that, and managing their own infrastructure and managing their own distributions and on-premise stuff and HBase clusters and so on, it scales to a certain extent, but beyond a certain scale, it's unmanageable. And the thing that struck me going into, I suppose,

Starting point is 00:30:35 the large-scale big data world is how it's all running in services now. And nobody now, really, who's actually using this stuff at scale, especially in the kind of software startup world, is working with actual servers now, and it's all about serverless architectures and services and so on. And that's why I think that, you know, certainly the work you guys are doing with the Compute Edition Big Data service is kind of very interesting. But it does mean, I think, people who work with Hadoop now don't seem to realize, perhaps don't

Starting point is 00:31:03 realize how much of an impact the world of services and serverless will have on what we're doing now, really. I mean, it's certainly quite a paradigm shift in how we think about things, really. Well, I agree with you, right? And by the way, I don't think it's all rosy and wonderful, right? There's a whole bunch of things I think need to be solved, right? And like, I think object store is amazing and will over time, or rapidly, replace things like HDFS and many other storage points. But it doesn't have great access control, right? Its security paradigm is, quote-unquote, you have or you don't have access.
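Editor's note: the "you have or you don't have access" point can be illustrated with a small Python sketch. It is only an illustration under stated assumptions; the class names, users, and row data are hypothetical and do not describe any real object store or Oracle API. It contrasts an all-or-nothing object read with a per-user row filter of the kind a database can apply.

```python
from dataclasses import dataclass, field


@dataclass
class CoarseObject:
    """Object-store style (hypothetical): a reader sees the whole object or nothing."""
    rows: list
    readers: set = field(default_factory=set)

    def read(self, user):
        if user not in self.readers:
            raise PermissionError(f"{user} has no access")
        return list(self.rows)


@dataclass
class FineGrainedTable:
    """Database style (hypothetical): each user gets only the rows a policy allows."""
    rows: list
    policies: dict = field(default_factory=dict)  # user -> predicate over a row

    def read(self, user):
        # Users without a policy see no rows at all.
        allowed = self.policies.get(user, lambda row: False)
        return [row for row in self.rows if allowed(row)]


# Made-up sample data for illustration only.
rows = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 20}]
obj = CoarseObject(rows, readers={"mark"})
table = FineGrainedTable(
    rows,
    policies={"mark": lambda r: r["region"] == "EU", "jp": lambda r: True},
)

print(obj.read("mark"))    # the whole object
print(table.read("mark"))  # only the rows Mark's policy allows
print(table.read("jp"))    # a different answer from the same data
```

With the coarse model the only decision is whether a user may open the object; with the fine-grained model two users asking the same question of the same data can legitimately get different answers.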

Starting point is 00:31:51 That's kind of the granularity. And there, I think, is where, and this is an opinion, right? I think in the foreseeable future, we will have a mix of serverless stuff running, data in object store, whatever, and working with it, and infrastructure where we do run a specific distribution, just because I certified my apps on it, and how do I get everything recertified and all of that. So I think there's a transition period where you do need to offer both. And I think that's coming back to what we started with, right? That's, I think, one of the big differentiators we do bring to bear, because we have

Starting point is 00:32:39 both of these models in place and it enables a customer to start where they want and go to, right? And keep in mind, right? If like you were saying, you worked in like the startup areas and everything's kind of nice, new and shiny. Yes. Because I don't have whatever 7,400 core banking systems lying around, whatever, right? Totally. And this is actually quite a nice way to get into the bit I want to talk to you about next, really, which is I'm just curious. Obviously, there's been a profusion of different ways to solve, I suppose, distributed compute and query in the cloud over the last couple of years. And as we talk about, I mentioned things like BigQuery, we've talked about, you know, those kind of, I suppose, different takes on serverless, I suppose, data warehousing, big data in the cloud. But I'm kind of curious to get your personal take

Starting point is 00:33:25 on some of these technologies and some of the ways they're solving this problem really and obviously I think one of the things is probably sort of fair to guess is that Oracle might announce some stuff in the future around this but nothing's announced yet this is really kind of I suppose I want I'm interested in your opinion in some of the different solutions to where this technology can go in the future and and something I've always been kind of curious to know about is, you know, BigQuery, Google BigQuery and Amazon Athena. So those kind of very canonical takes on serverless, you know, query in the cloud and that sort of thing.

Starting point is 00:33:54 What's your view on that and where does that work well and where does that kind of like, you know, run out of steam and that sort of thing? I think it makes perfect sense, right? I mean, it's if I have a question and I need to ask that question kind of now when I don't want to buy anything, I just want to run my thing, right? I think the architecture makes perfect sense. This is sometimes maybe ignorance, but as long as you can't guarantee SLAs, it becomes a little hard to make, like, to depend on some of these things, right? And I think a big requirement of a whole bunch of BI and analytic queries is it has to finish in a predictable time.

Starting point is 00:34:39 And I think one of the things we learned in the Exadata days and the data warehousing is, like, it's really nice that it runs really fast now in like three seconds and in like five minutes tomorrow. Customers hate that, right? They hate the fact that this query I need to know now needs to run in three seconds because it did so yesterday. And I think there you're going to run into issues because how do you guarantee, in air quotes, right? We can't see you in a podcast, but how do you guarantee SLAs on serverless? And I think that's where some of that is potentially not completely like the right thing to go.
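Editor's note: one way to make "predictable time" concrete is to judge a query by a high percentile of its observed latencies rather than by its typical run. The Python sketch below is a minimal illustration under stated assumptions: the function names, the 95th percentile, the 5-second target, and the sample latencies are all hypothetical and are not taken from the conversation or from any vendor's SLA.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(latencies_s, target_s, pct=95):
    """True if the pct-th percentile latency is within the target (seconds)."""
    return percentile(latencies_s, pct) <= target_s


# Hypothetical samples: one query that is usually 3 s but sometimes
# takes 5 minutes, and one that is slower but steady at 4 s.
erratic = [3.0] * 18 + [300.0] * 2
steady = [4.0] * 20

print("erratic meets 5 s target:", meets_sla(erratic, 5.0))
print("steady meets 5 s target:", meets_sla(steady, 5.0))
```

Under this check the usually fast but occasionally very slow query fails the target while the slower, steadier one passes, which is the behaviour the Exadata anecdote describes customers caring about.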

Starting point is 00:35:14 I agree. I mean, I actually, at the moment, I'm managing a product that runs on BigQuery, and it is actually an analytic service we provide to customers. And it's querying, I mean, you know, famously it's querying kind of petabytes of data, and you get a very good response time with that, but they're never entirely consistent. And the interesting thing is that the last part of the query, you know, getting a query down from, say, 30 seconds down to a consistent five seconds, is not, I suppose, the kind of space those things

Starting point is 00:35:42 work in. And the other thing that's interesting with those products is when you port a data warehouse workload to those environments, so first of all, there's a whole kind of question around data modeling. You know, do you port the same sort of normalized structure into those? The answer is no, obviously, because they're not going to work very well. Yeah, I was going to go with that. Yeah, but, you know, also how do you handle some of the things like, I don't know, slowly changing dimensions

Starting point is 00:36:08 and some of the things that we take for granted in the kind of the analytics world are not possible in there really. And I think that, you know, when you look at something like BigQuery or Athena, you look at them and think, that's it, it's game over. You know, you can do everything you used to do in Oracle in this or Kudu or stuff like that. But actually, you know, it's when you come to do it

Starting point is 00:36:24 and you realise the kind of the full part of the solution is not possible. And I don't know. I mean, again, I think to really appreciate things like the Oracle database, you need to kind of work away from it for a while sometimes. I think it is. And I think like we're talking about this, right?

Starting point is 00:36:42 And it's a little bit like the stuff we talked about earlier. Like we have a database and we have this Hadoop. Oh, this Hadoop thing is going to kill this other thing. And I think what we've learned there, or at least I hope people learned, is that it's like don't try to shoehorn everything into one thing, right? Within reason, you want to simplify your architecture, right? Obviously, if everybody got to choose, I have one thing that I manage and maintain and life is great, right? But you see these characteristics come up, and I think somebody at some point in time at TDWI said, I think somebody from Facebook, so like, don't worry about, like, and/or, right? It's not

Starting point is 00:37:22 about or. It's not like this thing or that thing. Just, if you have a set of workloads or a set of problems and they get really well solved in BigQuery, Athena or whatnot, just go use it, right? But don't kind of assume that because it runs some SQL, don't assume it can run everything and does everything and is exactly right for everything you want to do, right? So I think that's the thing we need to kind of stop doing. I know. I think those of us and probably all of us in this camp that are tech enthusiasts,

Starting point is 00:37:50 we tend to sort of look at something and see the potential and think this is going to be revolutionary. But, you know, that is a mistake to try and get this new thing to do everything the old one used to do and to try and think, I suppose, that it's going to replace everything. You know, you will still have database servers around in the future. I guess the kind of the issue is around cost of those sort of things, really. But certainly, another take on this is SnowflakeDB.

Starting point is 00:38:14 I mean, I'm curious. I know, obviously, the people behind that are ex-Oracle and so on. That's always been an interesting product from my perspective, in that, obviously, it has the elasticity of some things we're used to kind of running in the cloud, but it seems quite Oracle-like in the way it works. I mean, what's your view on Snowflake, and this is, to be very, very clear, a personal view, you know, what's your view on what they've done, the problem they're solving and so on? I think, like, first and foremost, yes, the guys who built it, right, we know

Starting point is 00:38:46 fairly well. I know, yeah. It strikes me as an interesting one, to reproduce the things about Oracle in the cloud like that. I don't know. It kind of is, right? I mean, I think the thing it proves, let's ignore the technical implementation for a second, because these guys

Starting point is 00:39:02 are smart they know how to build databases and I'm sure the thing works really well. I think the interesting thing of it is, hey, while everybody's on the Hadoop and Kafka and whatever bandwagon, they went somewhat far more traditional and said, you know what, we're just going to build an analytics database engine that just leverages cloud, right? So here you see, I think, a good way of saying cloud gives me so many benefits if I could only architect to it, right?

Starting point is 00:39:35 And it's kind of what you were saying. Hey, this BigQuery thing is really cool, but I do have to redo my modeling. And these guys said, like, hey, this cloud thing is really cool, and I can do X, Y, and Z and that makes my data warehouse solution really interesting, right? And they're not the only ones who figured that one out because I think,

Starting point is 00:39:54 I'll do my Open World pitch here. Yeah. I think you'll hear a whole bunch of it in terms of data warehousing, managed service, stuff like that from Oracle at Open World, because I think that's where databases are going. Managed services in cloud get away from tuning things, right?

Starting point is 00:40:17 Get away from, I need to have an army of consulting or an army of specialists or very smart people that are going to tweak the parameters and make this work, right? The system needs to take over that role. And I think that's where AI, ML, wonderful buzzwords, are really going to drive the way we're going to deploy services. Yeah, I think Snowflake is one take on that, right? Yeah, Snowflake's an interesting one in that I talked with, you know, Kent Graziano a fair bit, and I've always been a bit of a skeptic on Snowflake, although I know a few people there and respect what they're doing. But Snowflake, I think, is a little bit like SQL

Starting point is 00:40:50 in that I've had to eat my words a little bit over the last year with these, in that, first of all, it struck me, people like you a while ago were saying, you know, SQL is the language of big data, and I used to scoff at you from the audience. And actually, the more that I work with this stuff, the more that I kind of end up agreeing with you. But also, I was kind of questioning why Snowflake built things like, I don't know, referential integrity into their engine, and why bother to support things like updates

Starting point is 00:41:19 if all your workload is a kind of query workload. But then you realize, well, how do I update dimensions? How do I update kind of reference data? And I've got grudging, you know, more respect now for Snowflake. And I'd be interested to see, you know, if and when Oracle announce something, you know, whether it's along these kind of lines. But you're right, though, that actually sometimes we try and sort of innovate so hard and we launch something that maybe is ahead of the market,

Starting point is 00:41:47 but actually putting the features of an Oracle database or relational database with the elasticity of the cloud is a winner, isn't it, in some respects? Yeah. And I think what you were saying, right? I mean, we look at this and say, like, oh, who needs updates? But I think the thing I learned in financial data warehousing is, people do restate things, right? They get transactions and then they turn out

Starting point is 00:42:12 to be not the actual transaction that truly happened. They have to restate. So if you want to go to Hadoop and restate, it's like, not so much fun. Yeah. So what about things like Impala and Presto and so on? I mean, there's obviously a lot of products out there that have gone the MPP kind of route. I mean, do you think there's something that is a bit of a kind of, I suppose, an evolutionary sort of dead end, or what was your view on, you know, take Impala for example, and Presto, what were your thoughts on those? I think at the end of the day it is a little bit of a dead end because they're all, at the end of the day, SQL engines. And he or she who can run the most complex SQL

Starting point is 00:42:52 in the most concurrent manner wins. So I don't think the game is, oh, I can write a SQL engine or, oh, I can run queries. The game is the full-on complexity of a real BI workload. Now, that doesn't mean that they're not useful or good products. It simply means that if you want to play in this area, and by the way, this is, I think, where scale really matters. If you want to play in this area,

Starting point is 00:43:22 you better deal with what people think a database is and does. And while they forget very quickly, the moment you can't do something, you go like, oh, okay, that's why I had one of these things. The scale is interesting. And if customers ask me, so you're basically telling me I should never use Impala or I should never use this.

Starting point is 00:43:42 And it's like, no. But there are a lot of questions and queries that can be answered by, for example, running Impala on our BDA, for example. You don't always have to go to Oracle, right? But if you want to run BI dashboards at scale and concurrency, you probably want to run Oracle. But if you want to have data discovery and you want to hack about some stuff, why wouldn't you use Impala? Why? It's like, again, it's like, it's just combine them and use them

Starting point is 00:44:09 where they really make sense. But I do think that at scale, large enterprise deployments, the SQL engine is to some extent more important than anything else. So taking a sort of a look forward to the future, I mean, again, something that struck me as I moved into work,

Starting point is 00:44:31 in my case actually more with Google Cloud recently, was that when I came across BigQuery and I realized that it had the characteristics of a big data system, in that obviously it scaled very horizontally and all this kind of stuff. But it had also the characteristics of a SQL engine. And it kind of struck me at the time that as data warehouse workloads, as big data workloads moved into the

Starting point is 00:44:53 cloud, you know, in a way the technology that underpins it is going to become less relevant. You know, it's going to become distributed compute and storage, SQL on the top and so on. I mean, do you think, you know, in a way, do you think kind of the next generations of me and you, really, kind of working on this technology, will not really have this distinction of kind of data warehouses, relational databases, big data and so on? It would just be one big query service running in the cloud.

Starting point is 00:45:15 And actually, the mechanics of it and how it works in the end is less relevant. It would just be a service. I mean, do you think that's the way it's kind of going? I mean, that's been my view. I think so. Yeah. And I should actually add that I hope so, right? Because at the end of the day, I really don't know how a network works, right? I don't know why

Starting point is 00:45:34 I'm talking to you over Skype and how that really gets routed and works, and why would I care, right? And I think this is what we internally look at as well, is I just want to ask a question. And can you just give me an answer, please? And that's a very simple thing to say. And that's where I think BigQuery and Athena and whatever kind of plays really well. But the next level down is I would like to ask any question. And by the way, my SLA is four seconds. And by the way, my this is that. So I think it's the combo of this whole serverless scale out architecture,

Starting point is 00:46:12 infrastructure, whatnot, and then the somehow guaranteeing me for a class of workloads SLAs. For some, I don't want to pay that money to you. So just run it whenever you can. Right. And I think that whole gamification and that whole kind of cost-based, which I don't mean cost-based optimizer, but to some extent I do, I guess, cost-based optimization of queries, SLAs and data positioning, I think that is where it is going. And while I'm a big subscriber to SQL as the language, I do think people want to ask graph questions without completely and fully understanding what that really does, right?

Starting point is 00:46:51 And like nobody argues why certain links are at the top of the Google pages. There's an algorithm that does it and it's probably reasonably good, right? And so I think we're going to dumb down, it's a bad word to use, dumbing down, but the consumption of it is going to be so much simpler and the skills required to manage, maintain, set up,

Starting point is 00:47:14 all of that is going to be far less. So in the market going forward, let's imagine we were talking about kind of how's Oracle going to differentiate itself, really, from, say, Google and Amazon and so on in this space. I mean, we talked a moment ago about it just becoming services and so on, and you mentioned about SLAs and so on there, but I suppose what's the kind of angle that you guys are going to have going

Starting point is 00:47:40 forward to convince an organization to use you rather than to use say Google Cloud or something? What's the particular Oracle kind of angle or market you're aiming at really here? I mean, we talked about enterprises earlier on, but everyone wants to sell to enterprises. Where do you think Oracle have a particular kind of strength here, really? So I think it's actually the product manager details as well. So you know all the wonderful comprehensive and all of these beautiful, like, oh, we have it all words.

Starting point is 00:48:09 But I do think there is kind of like a big leg up Oracle has in this, which is, one, we are one of the few who do IaaS, PaaS as well as SaaS, right? And so the integration of it and potentially the ease of going between these, let's call them services, right, going forward is going to be a very, very big plus to Oracle. Oracle Data Cloud is the other one, right, where we have whatever, five, yeah, so five billion whatever households and whatnot. So Oracle invested on the acquisition side a lot of money in building up this whole data cloud initiative.

Starting point is 00:48:54 And it is essentially he who has the most data or she who has the most data who is most attractive to have your data come to our cloud. Because if you put customer information in our cloud and you can now mash it up with a pile of 5 billion other customer records that we could use to enrich, augment, refine, and doing all of that, that will tremendously drive the value of your data and your data in our cloud. And I think Google, Amazon, and various other, and I'm not the expert on this, but actually use all of this, right? They use the data in Oracle Cloud to augment their stuff. And I think that's going to really drive some of the macro decisions

Starting point is 00:49:40 as to which one of these clouds do I pick. What about next generation of developers? I mean, one of the kind of gripes I've had with kind of Oracle big data and the cloud and so on is it's so hard to get hold of access. I mean, you and I have discussed this in the past where, you know, I mean, because I know people, I know you and so on, I've always been able to get access. But I suppose one of the kind of side effects of Oracle selling

Starting point is 00:50:03 just to kind of enterprises and not to the kind of startup market and so on is that it's actually quite hard for somebody on the spur of the moment to go and pick up their laptop and get access to Oracle big data running in the cloud. I mean, I know part of this will be down to capacity and so on, but are there initiatives going on to make that a bit easier, to try and seed the market a little bit and nurture the next round of developers, really, in this area? I think very much so, right? And it's a nightmare. It's a nightmare for me. Yeah, I'm with you, right? I mean, keep in mind that transition is not an easy thing

Starting point is 00:50:36 to do, right? And I think what you start to see, and I'm sure, like we were looking at this whole BDA thing in 2010 and this Hadoop thing and scratched our heads, I'm sure if you would have gone to Amazon Cloud in 2010 or whatever, you'd go like, huh? Right? I mean, there is a set of years invested in certain vendors' infrastructure, and we're investing and catching up quickly. It has a downside, right? We still have some extent. We know how to deal with complex contracts and we know how to deal with very large procurement departments and we can do all of that.

Starting point is 00:51:17 Makes it a little harder to deal with the other side of the coin, right? And I think other companies are transferring the other way. So it's both are kind of complicated. But we're hard at work to flip this company around and really getting to the point where you should just go to cloud out what are the cool groups to hang out with and kind of what are the things that we've learned from the other implementations. And so we can implement some of that. And I think you'll see a lot of that come to fruition in the next-gen infrastructure IaaS stuff because that stuff is really cool. Exactly. So we're getting there, I think.

Starting point is 00:52:08 I think so. I mean, certainly for me it was a kind of comedy thing where I tried to sign up for the elastic cloud service, and I think I was the first person ever who wasn't a part of a company to actually kind of, literally, swipe a credit card and try to sign up for it. And the order went through, and whoever looked at it did not know what to do with it, really. But, yeah, it's early days, I know. And I think it's... But certainly, I suppose, it would be... For me personally, it's been quite hard to keep up

Starting point is 00:52:32 with kind of developing on the Oracle Big Data Cloud platform because it has been so hard to get hold of the software and so on, really. And, you know, I suppose that is something that's quite important, really, going forward. So obviously there's Open World coming up soon, and you're going to be obviously there and presenting and doing keynotes and stuff and that sort of thing.

Starting point is 00:52:54 I mean, obviously you're limited in what you can talk about, but give us a flavour of some of the things that you'll be speaking about and some of the things to look out for, really, from your area at Open World in a few weeks' time. Okay. Let's just start with the one that we alluded to a little bit, and I'll say like four words about it, but data warehousing, managed service, cloud, those three words are probably going to be prominently present in many things, from what I can tell right now. And I think you'll see a bunch of announcements and things around that. And so it's really an exciting, I think, opportunity, but it's also an exciting topic that will come out of Open World.

Starting point is 00:53:43 Going to a little, like, lower grade or lower granularity topics, some of the stuff we're really trying to do is kind of what you were saying, like how do you get the next generation developers, like, working on stuff. So one of the things we're really trying to figure out, and we keep on going back to, is like how do I make it simpler for somebody to use something without necessarily having gone to, like, data science school. Right. So one of the things that,

Starting point is 00:54:09 that certainly my session will be about is, how do I have data in my object store, or let's just say, a place. How do I get that into my Hadoop cluster? How do I get a notebook running on it? How do I make Hive definitions on top of that? This must be simple, right? And I think we spent a lot of time over the past three to four months to really make that workflow blindingly simple.

Starting point is 00:54:36 Right mouse click, right mouse click, right mouse click, and off you go. If you want to load large data sets or you want to take a training data set of multiple terabytes out of object store into HDFS, but you don't really want to write a Spark job or you really actually don't want to go to ODI and click all of this together and load it, right? I just want to go, dude, put this thing in my HDFS and move on. And so we spent a lot of time building like better mousetraps to some extent, right? Getting this stuff out of that and driving it into that.

Starting point is 00:55:08 So that's some of the stuff we're looking at. We're looking at cloud machine and what that really is going to drive into the market and how we think we can working on pretty hard is, is how can we make files much more secure? And how can we have SQL access to that kind of have baked in security? And I'm going to be somewhat vague about this, but at the end of the day, what we're trying to figure out is if I go to object stores and I can't implement a fine-grained security mechanism,

Starting point is 00:55:58 somehow I need to be able to define roles and responsibilities, and the file itself somehow needs to encapsulate that. And so if I interrogate a file, I should potentially get a different answer than Mark interrogating that exact same file on the exact same system. And we've dubbed it an enterprise parquet as the working title. And it actually does run in Parquet. And that is something where we think there is an interesting future on making files far more secure and more versatile

Starting point is 00:56:35 and more from a data governance perspective, I don't want to create a source file, Parquet schema one, Parquet schema two, ORC schema three, like can we condense this all into a single file? And that's something that we'll be talking about at Open World in, like, a small corner of one of the sessions, and just chat about. But I think it has huge potential. Excellent. Well, it's great. Well, it's been great to speak to you, JP. I mean, certainly, you know, it's been about 17 years now since I've known you, back from the days of OWB and 8i and that sort of thing, and you've been kind of proved right over the years and that sort of thing. So it's been

Starting point is 00:57:13 it's been interesting sort of like hearing from you and getting advice from you and that sort of thing. And particularly, you know, you've obviously seen a lot over the years in terms of technology and so on. But a lot of things, you know, I guess for me the themes out of this kind of conversation, and the themes, I suppose, from what I've been looking at, is how, you know, tabular storage, how SQL, how those sort of things are eternally important, and I suppose also how new technologies that come along just don't replace the other one, really. You know, there would be a need for different ones there. You know, you'd have a need for the database, need for kind of Hadoop and that sort of thing there. And yeah, trying to sort of shoehorn everything into the same thing is

Starting point is 00:57:52 crazy. But also keep an open mind, really. I mean, I think you're a good example of someone who has kept an open mind over the years. You've gone from, you know, very much client-server kind of ETL tools and so on through to this, and that's why it's all fun really, isn't it? I think it's why I particularly enjoy working in this industry, in that every year things change. The knowledge you've built up can be useful, and it's certainly, you know, I think you can bet on how old you are, but the age I am as well, the knowledge you've built up is useful,

Starting point is 00:58:18 but also it's all exciting as well. So it's been great to speak to you, really. Yeah, it was fun to be here, and you did make me feel a little old at the end there, but I guess that's what it is. No, it's great. I mean, it's about thinking we're still relevant, which is the key thing, really. So yeah, it's been great. Well, I hope Open World goes well, and it's been really good to speak to you, and Stewart Bryson says hello as well, so he'll be seeing you at Open World. I won't be

Starting point is 00:58:43 there, but yeah, it's great to speak to you. Take care and speak soon. Yep, thanks.
