Drill to Detail - Drill to Detail Ep.4 'Reference Architectures Revisited' with Special Guest Andrew Bond
Episode Date: October 11, 2016
Mark Rittman is joined in this episode by Oracle's Andrew Bond, to talk about their Big Data Reference Architecture two years on...
Transcript
Hello and welcome to Drill to Detail, the podcast series about the world of big data,
business intelligence and data warehousing, and the people who are out there leading the
industry. I'm your host, Mark Rittman, and Drill to Detail goes out twice a month
with each hour-long episode featuring a special guest,
either from one of the vendors
whose products we talk about all the time,
or someone who's out there implementing projects
for customers or helping them understand
how they work and how they all fit together.
You can subscribe for free
at the iTunes Store podcast directory,
and for show notes and details of past episodes,
visit the Drill to Detail website
at www.drilltodetail.com, where you'll also find links to previous episodes and the odd link to something newsworthy that we'll probably end up discussing in an upcoming show.
In this episode, I'm very pleased to be joined by someone I've known actually for many years.
First in my days as a consultant, where I helped his company implement an Oracle financial planning application way back in the early 2000s, but more recently through his work as part of Oracle's enterprise architecture team,
where he and I collaborated on several updates to the reference data warehouse and more recently
big data architecture. His name is Andrew Bond, and like me, he's a fellow Brit. And so, Andrew,
why don't you introduce yourself and the architecture work you do at Oracle?
Yeah, okay. Thanks, Mark. It's nice to be here, and it's nice to think about old times as well. My role now is that I head up both what's called the cloud enterprise architecture team and the client advisor team within Oracle across Europe, Middle East and Africa, and also Asia Pacific.
Of those two teams, the client advisor team is really there to have a business transformation conversation. They tend to work with our biggest accounts, our key accounts, to drive a conversation around, for example, cloud adoption, digitization, and transformation of the core business. And then the architects, the cloud enterprise architects, are really there to talk about how that can be made a reality, both in terms of the building blocks that we can use to create that, but also in terms of creating a roadmap for IT adoption. And, as the name implies, increasingly they're building solutions based on cloud, and I mean that in the broadest sense, but typically cloud-based technologies.
Interesting, interesting. So in the first episode of this podcast I had Stewart Bryson on the call, and we talked about the reference architecture that we worked on with you a few years ago, particularly the one that incorporated some of the thinking around big data and some of the ideas around execution layers, innovation layers, and so on. And we talked on that call about, I suppose, a couple of years on, how do we think about that? How much did we think the architecture that we worked on and talked about with you was being used? How much of it, maybe in retrospect, is not so relevant now? And we also touched on how cloud would affect it as well.
And I know later on in the call we're going to talk about cloud, particularly in the context of big data. But for the listeners on the podcast, do you want to talk about the reference architectures that Oracle do and that we worked on, particularly this last one, and some of the thinking behind the incorporation of, say, big data and fast data and that sort of thing? Just to outline, first of all, what it's all about.
Yeah, okay. So I think, historically, we'd found (and we were sharing experiences with you from a long, long time ago on this) that we had all of the components that were
needed to do good BI solutions,
good data warehousing solutions.
But I think you and I were both seeing projects
where the technology may be great,
but we were getting a lot of these things wrong, frankly,
and our customers were getting a lot of things wrong.
And we started wanting to build a reference architecture
to encapsulate best practice.
And this is going a long time back now.
But probably the first iteration of that started, what, eight, nine years ago,
something like that.
And we went through several iterations working with customers,
working with you in particular and other organizations like yours.
And we came up with the latest iteration, as you said, a couple of years ago.
And it was really building on things that we'd done in the past.
I think we'd found we'd moved already away from a kind of historic view of things being BI and data warehousing. And we
were going off down a route where we were talking about slightly more agile ways of delivering
results. And I remember even in the second generation, we were talking about things like Query Federation and those sorts of things.
But I think what we found was that, particularly in the big data sense, there was a lot of confusion about what big data actually meant. I mean, probably two, three years ago, I think you and I were probably both getting
quite frustrated about these terms like unstructured versus structured data and an understanding of
what that actually meant. And we kind of zeroed in on really, rather than this being about types
of data, it was more about being able to deliver results fast and almost like a pace layering type approach to information delivery.
And I know with you, and with Stewart in particular, we were having a lot of really interesting discussions about agile and scrum type methodologies for delivering an information discovery process, and about that becoming part of the overall information delivery: those were the key things that big data type technologies could be exploited for and could support.
At the same time, as you said, we started to see what had traditionally been on the periphery, maybe the last thing you did, put into the heart of the architecture.
And we'd seen that with some data warehouse and good BI implementation.
But big data really enabled that.
And particularly the piece around, as you said, streaming type analytics and fast data.
And what I would term, you know, IOT type architectures.
And we really started to see these move to the heart of it.
So one of the things we developed was a very high level, in fact, higher level than we'd had before, conceptual model.
And that's pretty much worked for us; it's a nice way of introducing best practice and architectural thinking at a very, very high level to a not necessarily tech-savvy, and certainly not data-warehouse-savvy, audience. And then we moved on from there to really overhauling the logical architecture, specifically introducing techniques around the data reservoir,
and moving from what you could term information discovery through to the
consumption of a system of record type data.
So you could almost lay over this, the PACE layers from Gartner,
but you could take them rather than being systems of innovation,
systems of differentiation, you could almost say this was data of innovation,
this was data of record and so on.
And we had a nice facility for doing that.
So I think those things really worked, and we got a lot right there. The challenges were, and still are, particularly around certain elements of physical choice (and I think we'll talk a lot more about things like tapping into the Apache toolset a little later on) and choices around things like polyglot versus multi-model, specialization versus consolidation and standardization. There were almost two competing forces: a lot of desire by organizations, at certain levels, to exploit new technology, which was great, but at the same time a lot of desire, and understandable desire, from IT to try and keep control over the proliferation of technology.
And I still don't think the architecture per se does a great job of explaining where and what to use for certain use cases, and we've had to do a lot of work, particularly in terms of documenting things like the Apache zoo. The other thing that we probably didn't do a great job of explaining, first of all (partially because I think we weren't really thinking in these terms), was things like Lambda architectures.
So we started to see Lambda emerging as a trend.
And I don't think we'd really worked out, and I don't think our customers had, the use cases for it.
So, for example, we'd find Lambda being considered,
and we didn't really have good rules of thumb,
and the architecture didn't really support
when we should make decisions on that and when we shouldn't.
I mean, obviously, particularly in the context of, you know, the way you talked about it before in terms of fast data and streaming and those sorts of things.
Lambda was intended for the ingestion and processing of timestamped events: rather than data being overwritten, state is determined by the natural, time-based ordering of the data. It had come out of the social media space, and so typically, in the space it came from, the events were mutating over time and maybe accuracy wasn't typically required. Whereas certainly, historically, the architecture that we had was coming from the point of view that consistency and accuracy were very important.
And this is where I start to think, you know, that being able to classify data according to the architecture became really important. As I said, it was the combination of that desire to event-stream process data in an extremely fast, or very fast, way, to better capture events with different message types and be able to exploit them quickly, whilst at the same time being able to use reliable data. So, for example, I would have events that are happening to a customer right now and transactions that a customer is making, but my next best activity may well be determined both by what's happening now and also by what goes on in the future and what's gone on in the past, in terms of things I've derived about that customer. That became really interesting and challenging, first of all, and also more and more technology-driven. So what tended to happen was we would have conversations where people not only told you that they were going to do Lambda, but that the technology choice they were using was, I don't know, Kafka and Flume or some other physical aspect. And then there was a piece that we had to do in terms of revisiting the architecture, and revisiting architecture best practice, to make sure that we were actually doing the right thing.
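For anyone who hasn't met the pattern, here is a minimal, hypothetical sketch in Python of the batch-plus-speed-layer merge being described: timestamped events are appended rather than overwritten, a batch layer periodically recomputes a view over the full history, a speed layer folds in whatever has arrived since, and a query merges the two. The names and event fields are invented, and this is not any particular product's implementation.

```python
# Hypothetical sketch of the Lambda pattern discussed above. Events are
# timestamped and appended (never overwritten); a batch layer recomputes a
# view over the full history, a speed layer keeps a running view of events
# that arrived since the last batch run, and a query merges the two.
from collections import defaultdict

def batch_view(event_log, as_of):
    """Recompute totals per customer from the full, immutable event history."""
    view = defaultdict(float)
    for event in event_log:
        if event["ts"] <= as_of:
            view[event["customer"]] += event["amount"]
    return view

class SpeedLayer:
    """Incrementally folds in events that arrived after the last batch run."""
    def __init__(self, last_batch_ts):
        self.last_batch_ts = last_batch_ts
        self.view = defaultdict(float)

    def ingest(self, event):
        if event["ts"] > self.last_batch_ts:
            self.view[event["customer"]] += event["amount"]

def query(customer, batch, speed):
    """Serving layer: merge the batch view with the real-time view."""
    return batch.get(customer, 0.0) + speed.view.get(customer, 0.0)

# Usage: history up to t=1000 is batch-processed; a newer event then arrives.
events = [{"customer": "c1", "ts": 900, "amount": 25.0},
          {"customer": "c1", "ts": 950, "amount": 10.0}]
batch = batch_view(events, as_of=1000)
speed = SpeedLayer(last_batch_ts=1000)
speed.ingest({"customer": "c1", "ts": 1005, "amount": 5.0})
print(query("c1", batch, speed))   # 40.0: the past plus what's happening now
```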
So, yeah, Andrew, I thought that, looking back... I mean, as you mentioned at the start there, it's been several years, and several years in internet time and big data time is a long time.
And so I think certainly it would be interesting to think about, you know, what the architecture would look like now.
Would it be any different?
Would it be the same? I think particularly some of the things that worked for me in that architecture were, first of all, that it kind of legitimized big data: it brought it into the view of the people that do enterprise architectures and the customers that we deal with, and that was good. And I think particularly the separation of innovation and execution. For me, that's the one thing to take away from the architecture. And for anyone listening, we're going to put the reference architecture link in the show notes so you can read it; it's presentations and some PDFs and so on. But this architecture made a distinction between an execution layer and an innovation layer, on the basis that customers that start a big data project might well start it off as a kind of skunkworks project, and then that goes into production, but then to further innovate you still need that ability to try things differently, try different tools and so on. So that, for me, was really good. And I think some of the bits that were a little bit unclear in the architecture then were things like the data factory: you know, what does that mean, and so on. But in general I think it was good.
But again, looking back, some of the things that have happened since then, or things that are on people's minds a lot, are things like self-service BI, the whole rise of tools that don't require the users to do data modeling at the start, and so on. I'd be interested to get your view on where they fit in.
But also, I suppose, one question that I often used to have is: where does the data reservoir go? Because you could say it's a system that is being maintained, and it's very much a run-the-business system, but there's obviously less control with schema-on-read in there and so on. Just going back to that (and I want to get on to things like cloud later on), did you have any questions from customers about things like self-service and data reservoirs and where they fit in?
Yeah, I mean, we had an awful lot of these conversations. And I think, to your point, one of the things the architecture has never done is describe the operating model, because I think there's as much here about architecture and technology as there is about operating model and governance and those sorts of things. Definitely, the one thing we weren't being prescriptive on, and still aren't, is the operating model and who controlled the data and what went into the lake or reservoir. I was pleased to hear you say reservoir, by the way; we liked the term reservoir because it implied some rigor and some control. And in terms of physically how it manifested itself, I think that was one of the things I was trying to hint at: that was a real challenge, and my belief is it still is a real challenge when I look at what we do now in terms of things like choices on persistent data stores, or what we're going to do about workflow orchestration, what we're going to do about data movement.
I think those are hugely challenging, not least because the technology toolset that we've put underneath it is rapidly evolving, and we need to stay abreast of probably a whole series of technologies that we didn't need to know in the past, and that new capability introduces new architectural possibilities. So I think the principles of the architecture, and some rough rules of thumb, are still applicable. My guiding thoughts are: we're going to do polyglot, but we're going to do it by need rather than by religion. We still want to aim for a small tech estate.
We want minimum number of moving parts and technologies.
We, at the same time, want the shortest data chain that we can possibly have
and the minimum number of copies.
And a lot of our original thinking around that still holds. And I certainly think, to your point about bringing big data into the enterprise class, it's probably fair to say that we saw a fair few projects, first of all, that were probably developer-driven, perhaps by people that didn't entirely understand data and information exploitation so much as understanding MapReduce, and therefore proliferation of data was starting to rear its ugly head again, and we could take the architecture and critique implementations based on that. Striving for data consistency (and if you couldn't immediately get it, trying to get there as quickly as possible) and introducing things like the idea of unified processing logic: those rules of thumb still hold. The complication for me, and where I think you need additional pieces of information, is in particular in the physical mapping. I think that's where we need to supplement a logical architecture with what we would describe as almost a dictionary of the Apache zoo.
Interesting, yeah.
And we'll get onto that later on.
I think that would be an interesting area.
I guess, as a slightly tangential bit to that, the thing I find with any kind of customer or organization doing big data is, like you say, it's often very developer-driven, and it's almost like the days of the fall of the Roman Empire, in that there's loads of projects going on, like the barbarians, sort of thing, and nobody wants to talk about data modeling or data consistency or, you know, the old vendors and that sort of thing. Do you find that these enterprise architectures are being used, beyond the kind of people you speak to in the architecture department? Does it cut right down to the bottom? Do people believe in this, or do we have to do an education job with people doing big data projects about the reason for an architecture and the reason for these sorts of things, even the reason for data modeling as well? Do you find that the work you're doing in enterprise architectures is getting taken up by the developers?
Yes, but (and I don't think this has necessarily changed, Mark) it's just that new technologies have enabled us to make more mistakes more quickly.
I think even historically, I mean, going back to when you and I were a little younger, people were making the same mistakes quite often. These things come in waves, but there's definitely a trend at the moment towards more silo-ification of BI, and there has been ever since big data took off. And whilst I talked warmly about information architecture moving to the heart of things like IoT and customer experience type architectures, at the same time, whilst the information delivery within those may well have subscribed to parts, if not all, of the reference architecture, was that necessarily true at an enterprise scale? Probably not, in the vast majority of cases. And could more have been done at an enterprise scale to both control data and steward data in a good way? I think almost undoubtedly, yes.
To some extent, we are all of us, as big data professionals, as enterprise architects and so on, probably guilty of failing to address that point, and we are still going down the road of fairly siloed information delivery. And of course, it depends entirely on whether you think there's a role for the city-planner type of enterprise architect in the customer organization or not. The more we're solution-driven, the more this silo-ification is likely to happen, and if it is, then the best thing we can do is make sure that the architecture that's put together for that specific silo, for that line of business, at least subscribes to the reference architecture in microcosm, even if we can't do it at the grand level.
Data modeling I kind of shudder at, because I think huge mistakes were made by data modelers historically, in terms of going away into a darkened room for ten years and coming back to an answer to "is that what you wanted?" of probably not, or, even if it was, "yeah, but now I want to answer the next question." I think that was something that you and I and Stewart really latched onto about big data: that we didn't have to put that up front. I think data modeling got a bad name for itself because it took you ages to deliver anything, and then it took you a long time to iterate.
It seemed to me that with big data technologies in particular,
we could iterate much more quickly because of the schema on read factor.
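To make the schema-on-read point concrete, here is a small, hypothetical PySpark sketch with invented paths and field names: the structure is inferred from the raw events when they are read, so answering the next question means writing a new query rather than reworking an upfront model.

```python
# Hypothetical schema-on-read sketch: no upfront dimensional model. Spark
# infers the structure of raw JSON events at read time, and iterating just
# means writing the next query. Paths and field names are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Read raw, semi-structured events straight from the landing area / reservoir.
events = spark.read.json("/data/reservoir/web_events/")   # schema inferred here
events.printSchema()                                        # see what turned up

events.createOrReplaceTempView("web_events")

# First question of the day...
spark.sql("""
    SELECT page, COUNT(*) AS visits
    FROM web_events
    GROUP BY page
    ORDER BY visits DESC
""").show()

# ...and when the question changes, you iterate on the query, not the model.
spark.sql("""
    SELECT customer_id, COUNT(DISTINCT session_id) AS sessions
    FROM web_events
    GROUP BY customer_id
""").show()
```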
Yeah, I think, you know, in a way,
we had a debate on Twitter yesterday, as you do, about data modelling.
And it's an interesting area.
I mean, I think, you know, in a way,
data modelling is an industry that's ripe for disruption, really.
I think that the kind of the people that do data modelling
are often the most kind of pedantic,
the most kind of dogmatic people you can get,
you know, for good reason, the same way you might make...
Yeah, and it sort of struck me that it was definitely an industry
that was ripe for disruption.
And if you look at when the Gartner report came through, the Gartner Magic Quadrant with the modern BI platform, some of the ideas they came up with were that data modeling should be optional, and that it should be as automagic as possible to get schema out of the data. Your first reaction was, well, that's ridiculous. But actually, you can see why customers would not want to spend months and months doing some kind of data model design that would be very rigid, and then, as you say, by the time the questions come along, the questions have changed. But I think the reaction against it was probably too extreme as well, to say we don't need that at all; like all these things, the boring but probably sensible answer is in the middle: some of it's needed and some of it's not. But certainly, like a lot of things, I think the days of customers paying for very large-scale BI projects with lots and lots of upfront data modeling are just gone, really. So, in a way, like you said with the architectures, you've got to accept that's not going to be there anymore, and it's how can you make the best of what you've got, and how can you still try and drive through that quality and so on. But for me, data modeling and self-service are areas that have really changed things.
Another area I want to get onto, which I think is another big change that's happened since that architecture was done, is cloud. And I think, like a lot of things with any kind of new innovation in any industry, at first you just do the new thing in the old way. So a lot of the cloud BI implementations that I saw at the start, or even big data ones, were just porting the same thing into cloud and running it on a shared server rather than your own. But how have you guys, how has Oracle, seen cloud affecting big data enterprise architecture? Is it just a case of putting it all into Oracle Cloud and running it there? Or has it fundamentally changed things, or did it give us new options, really? What's your take on cloud and Oracle?
Yeah, I think this is a fascinating area.
And it's probably gone faster than I thought it would. And I have to caveat what I'm about to say: I think this is where, working for Oracle, I probably get a slightly different view to maybe some of the other vendor employees.
I mean, because we do have such a wide capability. And this becomes important
because there are a lot of elements. I kind of want to focus on what I perceived as the challenge,
first of all, which was around things like data ingestion and creating things like multi-tenancy support
and having trust in that multi-tenancy support.
There were a lot of questions around networking, bulk data loads, incremental data loads,
bidirectional and unidirectional data movement and replication across data centers.
I mean, questions around BI tooling, data discovery, exploitation,
questions around things like access to on-premise tools
when your data's sitting in a cloud.
And then, most obviously, I think the big question a lot of people were asking was around security, governance, organization. So, you know: I want this solution to integrate with the old app, because in many ways this data is my crown jewels; I want to comply with regulation in terms of things like data masking, encryption, audit, monitoring, that sort of stuff. Now, I would argue that a lot of those things customers probably should have been thinking about anyway, but cloud and cloud-based solutions make people think about them even further.
So, to answer your question: can you go and take that architecture and kind of stick it onto an infrastructure service? Possibly. But that's pretty dull, to your point, and probably not a transformation; maybe an IT cost saving, but probably not a true transformation. And I think that, if that's all you do, there are interesting questions to answer, particularly about things like performance, and whether you can deliver the analytics at the speed that you actually require with a fairly standardized infrastructure.
So what we found is that actually we could, and we have, defined end-to-end solutions for big data analytics now, and I think that's quite startling, the fact that we can take not just elements and silos of the architecture. Most obviously, you could take a BI application running off a sales or marketing or ERP solution, and of course you can deliver a BI reporting solution on top of that; I mean, that's obvious. But what surprised me is that you could actually put together an end-to-end solution with a data landing area, a real-time data landing area, a data reservoir, a data warehouse, BI tooling, and a discovery and development type lab. And you could do all of that in a cloud, even though the sources of that data were predominantly on-premises.
And typically, the way we've seen that being architected is you end up with some kind of on-premise data hub, which is then pushing data out into what is a fairly classical architecture. So we've used components like Data Integrator, pushing that up into things like our Storage Cloud Service (you can imagine that as a landing area), then moving that along to a Big Data Cloud Service and an Exadata Cloud Service in our terms, but more generically a database cloud service, with Big Data SQL able to reach through from one to the other. You could do all of that. You could do the data discovery, again, through those cloud services, and the interpretation of the data through things like Big Data Discovery Cloud Service and BI Cloud Service.
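As a rough feel for what that reach-through looks like from the consuming side, here is a hypothetical sketch. The table names, columns and connection details are invented, and it assumes a DBA has already exposed the reservoir data to the database as an external table, which is the kind of thing Big Data SQL is used for.

```python
# Hypothetical sketch of the "reach through": with data in the reservoir
# (Hive/HDFS) exposed to the database as an external table, one SQL statement
# can join warehouse data with reservoir data. Names and connection details
# here are all invented.
import cx_Oracle

conn = cx_Oracle.connect("bi_user", "secret", "dbcs.example.com/ANALYTICS")
cursor = conn.cursor()

# SALES is an ordinary warehouse table; WEB_EVENTS_EXT stands in for an
# external table a DBA has defined over the Hive data, so the join runs as
# a single query across both tiers.
cursor.execute("""
    SELECT s.customer_id,
           SUM(s.amount)      AS revenue,
           COUNT(w.event_id)  AS web_events
    FROM   sales s
    LEFT JOIN web_events_ext w ON w.customer_id = s.customer_id
    GROUP BY s.customer_id
""")

for customer_id, revenue, web_events in cursor:
    print(customer_id, revenue, web_events)

cursor.close()
conn.close()
```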
Now, I'm well aware I've kind of gone away from architecture a little bit here into products, and forgive me for that.
But the reason I do that is because what's really interesting about this is when you have that ability to have an end-to-end solution within a single cloud provider.
And I think what I've seen less of, and this may be because I work for Oracle,
but I guess what I've seen less of is an end-to-end cloud solution
where that cloud solution is heavily distributed across a wide number of cloud
vendors. I genuinely don't think that happens. I think what's more likely to happen in that
scenario is you are going to go for, in effect, an infrastructure as a service approach, which
in effect is a virtualization of various on-premise tooling.
Whereas if we can elevate that to, in effect, PaaS,
and if not bordering on SaaS, I have to say,
then I think this becomes much more interesting in terms of the agility effect that cloud can have
in that fast delivery and that fast scaling
and all of those things that you would expect with cloud, a genuine business benefit to cloud.
So a couple of areas struck me as interesting in moving this area to the cloud. Well, first of all, I noticed there's that Data as a Service that Oracle offer.
And I think this ability to bring in external data,
when that external data could be in the cloud as well,
it's much simpler to bring that in,
especially if Oracle themselves act as a broker.
And I think there's far too little, to my mind,
external data brought into people's systems like this.
And I think, maybe because of the way that Oracle market things, Data as a Service does not appear in the same sort of presentations as things like Big Data Cloud Service.
But that data as a service thing is interesting.
But also, the other thing that's interesting as well is it's not just BI running the cloud.
It's also the customer's CRM.
It's the customer's kind of ERP and so on.
Do you think Oracle particularly have got an angle which is quite interesting, which is that at the same time as you're moving their analytics into the cloud, chances are what drove that was the customer moving their ERP into the cloud? And do you think there's a special opportunity Oracle have got there that, say, Amazon wouldn't have, or Microsoft wouldn't have, and do you think that's interesting for the future?
Yeah, I mean, I think there's a more generic question
around cloud adoption in general,
which is associated with your point,
which is that I think cloud historically
has been seen as either a fairly dumb piece of infrastructure adoption or SaaS adoption.
Now, you know, I mean, Oracle plays in both spaces.
But actually, I think where cloud becomes really interesting is if you see a cloud as the continuum of IT services. And if you start to see cloud as that,
then I think we are almost uniquely placed
to be able to provide that complete continuum
of IT services, starting with SaaS, as you say,
and then moving down through the stack.
I think it's probably fair to say, as I said, we saw plenty: lots and lots of adoptions where customers have started with SaaS, historically CX, HCM, and increasingly now ERP, and you provide the associated business intelligence out of the box, frankly, as you
would have consumed a BI app on premise.
But now you do that out of the box on the cloud solution itself.
And then to your point, you broaden that out.
You broaden that out with other PaaS components, data discovery, and so on.
What's really interesting now is, simultaneous to that, we're seeing end-to-end cloud adoption for the whole thing.
And that's the thing I think that's really shocked me.
I think the previous one was quite predictable.
I think the idea of having a complete end-to-end solution
in a single cloud is amazing, frankly,
and almost regardless of where the data comes from,
to be able to deliver a true enterprise solution with all of the data
discovery elements that we've talked about, with data quality and so on. And again, I feel like we're fairly uniquely placed there. I agree with your observation about data as a service. Now, what's interesting about data as a service, particularly when you pick up on the customer experience stuff, the CRM stuff, is that typically it's being pulled through for that reason. But really, a decent CX architecture will tend to be based, from its information perspective, on the kind of architecture that we were laying out in the reference architecture, and therefore it's the sum of that external series of data sources, maybe about customers, but then that data is merged with, mashed up with, data that you've derived internally about those customers as well, to provide that overall best feed of next best activity. And you can combine that in a fast data solution. And the fact that you can do all of this in the cloud now, that you can do an end-to-end solution in the cloud which is fully auditable, that you can get past regulators even in a financial services type environment... I think, frankly, maybe it's a sign of my age, but the speed at which we've been able to do that has shocked me, and it's very pleasing.
It's very exciting. I mean, it's an interesting area, and I've been doing a few future-of-BI presentations at various user groups over the last few months and years. The saying is always that things change slower than you expect, but I think that an analytics project in a few years' time will be quite different to the ones we saw several years ago, with a few people, or lots of people actually, sitting with desktop tools, carefully modeling data and carefully feeding it in from ERP systems into a data warehouse and so on. I think moving to the cloud means there's a much broader opportunity to apply analytics and machine learning over ERP data in there as well. But I think also, again back to this thing about data, things will need to be a lot more automatic, a lot more of that induction of schema out of data and so on. I think the days of carefully crafting these things, and of weekly loads and so on, will be quite different. I think we'll find analytics will be everywhere, but it will be quite different to what it is now: it will be much more pervasive, it will be much more about taking in data from external sources, it will be about applying analytics to transactions as they're done. But I think there'll be far less engineering going on. It's almost like, say, people talk about having power plants in their factories, and now it's in the grid. I think we'll see massive differences in that way in the work people will do. There'll still be work for consultants and there'll still be work for implementers, but it won't be hand-crafting dimensional models and so on; it'll be about synthesizing data and bringing in external stuff as well. I mean, I think it will be quite different, and I think we currently think about BI and analytics moving into the cloud as just porting the same system that we have now to run there. But I think it'll be quite different, and how it'll be different is hard to tell. But I think all these things coming together means that it'll be, hopefully, far more value, but far less hand-tinkering, really.
Not entirely. I think it's fascinating. I think you present a rosy view, and potentially one which is kind of interesting to business users, but at the same time... That's definitely one area, and the cloud area is interesting from the point of view of automation as you describe it, and also big data preparation, big data discovery, all of those things where you can use algorithms to derive structure. Yeah, I'm fully with you.
What I've also seen now (we made reference to it earlier) is the rise of the developer. And that's true in general, but specifically we've seen the rise of the developer in the big data space. You know, if I took a subject like data preparation and integration at the moment, and I looked at the kind of choices that we had to make there in terms of the tooling that we would use, then sure, you could say, well, it's great, because there's Oracle's Big Data Preparation Cloud Service, right? So that's good, and to your point it's a visual thing delivered over the cloud, it's utilizing Spark under the covers, we're good to go. But at the same time as that, you've got developers actually not using a visual interface, using, I don't know, Morphlines or something like that. You've got Cascading coming along, which will require a plugin to visualize the data pipelines, but frankly the logic is more likely to have been a series of pipe assemblies. And so what I think is happening is that there is a developer aspect to this, and a developer orientation to this, which I think is quite different and quite hard for us to get our heads around. I mean, my seven-year-old's coding Python at the moment, and maybe there is going to be a very significant number of people who don't necessarily sit in IT
but have developer skills.
Exactly, exactly. There was an article, I think I posted it or tweeted it a while ago, and I think it was on the Cloudera blog, and it was about doing BI in Python, and it was just pages of code. You know, very productive, and very different to what we do. And I think I posted at the time, yeah, this is the future of BI, it's not going to be graphical, which was kind of slightly ironic, or slightly pessimistic, or whatever, but it is interesting to see. I think I was at a Cloudera Sessions event in Amsterdam a while ago, and a guy stood up and very proudly said, you know, I'm now doing my data loading using scripts and Python and so on. And you felt like saying, have you heard of the idea of ETL tools? But you'd be like the dad at the party, wouldn't you, telling people to put an old record on. Yeah, people don't want to hear that.
But I think the last thing I want to talk about really on that topic, and you mentioned it at the start, is just the number of these products that are out there, the number of Apache products. You mentioned Morphlines; I think for every day and for every vendor there's a new data pipeline tool, a new NoSQL store and so on. In your architecture before, you were very careful not to get into specifics of products. But why do you think this is an interesting area, and why do you think it's something that you keep coming back to as being an area people do think about? Give us an outline of why.
So I think, historically, our reason for being technology-choice agnostic with the logical architecture still applies and still flies, and I'm quite happy with that. I think one of the reasons we did it is that there were relatively few end-to-end Oracle solutions, and typically we were integrating with something, so we needed an architecture that worked regardless of what the incumbent technologies were, and we were trying to encapsulate best practice. Now, to your point, with big data there is a very large number of vendors, and indeed projects, in varying states of maturity, and a lot of choices are being made. It's almost an architect's nightmare, because ultimately the architects can sit there and do that for a while, and hopefully we don't make the mistake of going down the same route as the data modelers and pontificate for the rest of our lives; hopefully we actually deliver something. But you spend a long time getting the logical architecture right. Yeah, I'm good with that. But then you have to start placing your bets on physical choices. Now, historically, you were placing bets on one or two choices, which were among major vendors. And that could go right or wrong, but typically you were on fairly safe ground, and the cadence of change was fairly easy to handle, fairly easy to absorb. But now that's not the case. I mean, I kind of look
at the projects that I saw being done two or three years ago, where people were hand-coding MapReduce, and I just want to cry, because you have to get used to the notion that you are going to throw all of that away. All of that code that you saw there, Mark, two years ago, and that you talked about, that stuff is going to go. Now, there's one of two things: either you say, well, this can't possibly be right and we're going to run away from it, or you simply accept that the cost of delivering BI now is that cadence of code change. A great example is data capture and delivery.
I think, you know, you and I saw the early iterations of Flume and then Flume Next Generation
being completely different.
And then along comes Kafka and we're thinking, OK, well, this is a different way of thinking
about this.
We've now got something where we've addressed some of the issues of Flume, and it does other things at the same time, but we need a framework underneath it like Storm or Samza or something like that. And then, brilliantly, of course, we start to learn that we can combine the two, and we come up with a fantastic term called Flafka, where we can embrace the two use cases of a traditional message broker and website activity logging with the aggregation capabilities, but at the same time still not necessarily dealing brilliantly with things like bulk load and trickle feed from a DBMS into HDFS, or struggling to deliver mission-critical sensor data to a CEP sink, or something like that.
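As a hypothetical illustration of the two use cases being combined here (Kafka as the durable message broker for website activity, plus Flume-style aggregation into a landing area), the small consumer below batches events into files. Topic names, brokers and paths are invented; in the Flafka pattern proper, Flume's Kafka source and HDFS sink would typically do this batching rather than hand-written code.

```python
# Hypothetical sketch: Kafka carries website activity events, and a consumer
# plays the Flume-style role of aggregating them and landing them in files
# (standing in for an HDFS sink). Topic, broker and path names are invented.
import json
from datetime import datetime
from kafka import KafkaConsumer   # pip install kafka-python

consumer = KafkaConsumer(
    "web-activity",                              # hypothetical topic
    bootstrap_servers="broker1:9092",
    group_id="activity-landing",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

batch, BATCH_SIZE = [], 1000

for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Aggregate a batch of events into one landing file per flush.
        path = f"/landing/web_activity/{datetime.utcnow():%Y%m%d%H%M%S}.json"
        with open(path, "w") as out:
            for event in batch:
                out.write(json.dumps(event) + "\n")
        batch = []
```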
So the problem with this polyglot thing is that it's very, very granular, first of all, so your choices are tough in the first place, and then there's the cadence of change. We've got to a point now where, certainly from an architecture perspective, we basically keep a set of architectural guidelines for physical choices, based on things like evolutionary maturity, and the other element that we're considering here is the breadth of adoption and mindshare of a particular piece. So, I don't know, data query, for example: this is a really interesting emerging area. We have things that are well established, like Hive, but that's had some deficiencies: not necessarily great for interactive applications, typically high latency, with things like the JVM setup for every MapReduce job, writing back to disk, those kinds of things.
Then we've seen things like, you know, Oracle had the XQuery thing. We've seen things like Impala arriving.
And again, drawbacks of that in terms of things like resilience and so on.
And now you're seeing, what, Apache Drill.
You're seeing Stinger.
You're seeing Presto.
You're seeing Phoenix.
Spark SQL.
Big Data SQL. And to my mind, with this whole thing, we're going through an iteration now where we will update what we're doing every month, to try and keep tabs on what we think is going on in the Apache world. So, for example, what's going on with Druid right now? Has it got a future? What's that future likely to be? What does it replace? What could we use instead? What are the use cases for it?
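To give a feel for why the engine keeps changing while the logical layer stays put, here is a hypothetical sketch that pushes the same logical query through two of the engines mentioned, Hive (via PyHive) and Spark SQL. The table names, host and columns are invented; swapping the engine changes the latency and operational trade-offs, not the question being asked.

```python
# Hypothetical sketch: the same logical query run through two of the engines
# discussed above. Table names, hosts and columns are invented; the point is
# that the engine is a physical choice beneath a stable logical architecture.
QUERY = """
    SELECT product, SUM(amount) AS revenue
    FROM reservoir.sales_events
    GROUP BY product
"""

# Option 1: Hive, mature and widely adopted, but batch-oriented and
# higher-latency (MapReduce-era execution, JVM spin-up per job).
from pyhive import hive

conn = hive.connect(host="hadoop-edge.example.com", port=10000)
cursor = conn.cursor()
cursor.execute(QUERY)
print(cursor.fetchall())
cursor.close()
conn.close()

# Option 2: Spark SQL, reading the same tables via the shared metastore,
# with in-memory execution that usually suits interactive work better.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql(QUERY).show()
```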
If I combine everything, like search, like workflow and orchestration, like advanced analytics, we could probably name a hundred different products, a hundred different components, that we could think about at the moment.
It's interesting, isn't it. From my perspective it's had a few impacts on some of the projects I've worked on. Because of the way these things are built (typically they're hand-coded; Hive scripts are typically written by hand, and so on), these projects become baked in against that technology evolution, and there isn't the budget, there isn't the inclination, there isn't really the time sometimes to move on to, say, Spark or stuff like that, because, I suppose, the nature of big data projects is that they're very experimental and very disruptive and so on. You find there isn't the ability to move from, say, Hive to Spark or whatever in customers, and they tend to be locked into a certain version, which is a shame. And the number of projects I see around that are still using old-school Hive, instead of, say, things like Impala or Drill and so on, is kind of sad.
The other thing, really, is that the things people obsess about in this world, whether it's low-latency SQL and so on, mean that in a way they're not thinking about the stuff that is actually important to organizations, like security. To my mind it comes back to what you mentioned about the maturity of these products. One thing I've always found very interesting (I found this with Spark on a project recently) is just how much you expect to be there that isn't there. So the kind of security you get in, say, an Oracle or a Teradata or whatever database is just not there yet. And it's an interesting world. I think a lot of the stuff that we obsess about, whether it's the latest version of a SQL-on-Hadoop engine or the latest NoSQL engine, in a way it's a bit like any open source project: the hard stuff, the stuff that is not sexy but is important, often doesn't get done. And I'm quite pleased to see some of the initiatives coming out of, say, Cloudera with RecordService and so on there. But again, what you tend to find is that there are these big groupings of vendors, the Clouderas, the Hortonworks, the MapRs, and they're all doing their own take on security, their own take on SQL and so on as well. And it's quite saddening in a way, coming from the world of ETL tools and, in a way, databases being a solved problem: we've kind of gone back twenty years, and we've got these very balkanized and very low-maturity sorts of systems. But again, I suppose we might sound like the mainframe people of years ago, who were complaining about these minicomputers coming along that couldn't do the things that mainframes could do, but they took over. I suppose it's classic disruptive technology, really. But going back to the job that you guys do around architecture and thinking about these bigger problems, this is where I think it adds that level of almost adult supervision that sometimes you don't get when these initiatives are driven by IT, or driven particularly by developers.
Yeah, yeah, I think that's right. I mean, it's funny, you undermined the
joke I had coming, which is that, of course, if you think it's sad that people are still using Hive, there are customers out there still using Teradata, which I find even more appalling. But surely that's the reason I go back to the architecture
every time. If we take that approach where we say, look, there is this piece and it's all about innovation, our main focus for it is innovation, it's not data of record, it's not the important stuff that's actually running the business, it is stuff that we can afford to get slightly wrong and stuff that we will reinvent, then if we can create sufficient differentiation with it for a year, good enough, it's paid for itself. And then we accept that for as long as it lives it's a good thing, and at some point it's going to die. If we accept that, in general, things are delivered in that multi-paced form, then why should information technology be any different? What the architecture, it seems to me, gives you the right to do is to classify that capability and put it in the right place. And as long as we do that, I think we're in a good place.
I mean, people ask me whether I think the Oracle database is still something that you're going to need for overall information delivery in the era of big data, and so far I've got to say yes, it is, because of that line that you talked about earlier between discovery and innovation, and then actually the exploitation of mission-critical information. And at the moment, the one thing I think customers can still bank on is that, eventually, the goodness that you get out of that big data part you're going to post off into a data warehouse somewhere.
Now, I'm certainly not arguing for the proliferation of data warehouses.
I'm certainly not arguing for lots and lots of data modeling going on.
And I just think, I mean, you referred to mainframes.
There's still a lot of mainframes around.
And it seems to me that these technologies that are emerging, they will complement everything that we're talking about.
But you have to embrace the side that says this is fast fail.
But it's also something where, even if it succeeds, its half-life is probably significantly less than the things
I've been used to up until now. And as long as you accept that, as long as there's a return
on investment for that, then I think you're good to go.
Okay, brilliant. That's good. Well, Andrew, thanks very much for coming on the show; it's always interesting to speak to you. So thank you very much for that, Andrew, and thanks for coming on the show.
Cheers, cheers. It's been a pleasure. Thank you. Take care.