Drill to Detail - Drill to Detail Ep.4 'Reference Architectures Revisited' with Special Guest Andrew Bond
Episode Date: October 11, 2016
Mark Rittman is joined in this episode by Oracle's Andrew Bond, to talk about their Big Data Reference Architecture two years on...
Transcript
Hello and welcome to Drill to Detail, the podcast series about the world of big data,
business intelligence and data warehousing, and the people who are out there leading the
industry. I'm your host, Mark Rittman, and Drill to Detail goes out twice a month
with each hour-long episode featuring a special guest,
either from one of the vendors
whose products we talk about all the time,
or someone who's out there implementing projects
for customers or helping them understand
how they work and how they all fit together.
You can subscribe for free
at the iTunes Store podcast directory,
and for show notes and details of past episodes,
visit the Drill to Detail website
at www.drilltodetail.com, where you'll also find links to previous episodes and the odd link to something newsworthy that we'll probably end up discussing in an upcoming show.
In this episode, I'm very pleased to be joined by someone I've known actually for many years.
First in my days as a consultant, where I helped his company implement an Oracle financial planning application way back in the early 2000s, but more recently through his work as part of Oracle's enterprise architecture team,
where he and I collaborated on several updates to the reference data warehouse and more recently
big data architecture. His name is Andrew Bond, and like me, he's a fellow Brit. And so, Andrew,
why don't you introduce yourself and the architecture work you do at Oracle?
Yeah, okay. Thanks, Mark. It's nice to be here, and it's nice to think about old times as well. My role now is that I head up both what's called the cloud enterprise architecture team and the client advisor team within Oracle across Europe, Middle East and Africa, and also Asia Pacific.
Of those two teams, the client advisor team is really there to have a business transformation conversation. They tend to work with our biggest accounts, our key accounts, to drive a conversation around, for example, cloud adoption, digitization, and transformation of the core business. And then the architects, the cloud enterprise architects, are really there to talk about how that can be made a reality, both in terms of the building blocks that we can use to create that, but also in terms of creating a roadmap for IT adoption. And, as the name implies, increasingly they're building solutions based on cloud, and I mean that in the broadest sense, but typically cloud-based technologies.
Interesting, interesting. So in the first episode of this podcast I had Stewart Bryson on the call, and we talked about the reference architecture that we worked on with you a few years ago, particularly the one that incorporated some of the thinking around big data and some of the ideas around execution layers, innovation layers, and so on. And we talked on that call about, I suppose, a couple of years on, how do we think about that? How much did we think the architecture that we worked on and talked about with you was being used? How much of it, maybe in retrospect, is not so relevant now? And we also touched on how cloud would affect it as well.
And I know later on in the call we're going to talk about cloud, particularly in the context of big data. But for the listeners on the podcast, do you want to talk about the reference architectures that Oracle do and that we worked on, particularly this last one, and some of the thinking behind the incorporation of, say, big data and fast data and that sort of thing? Just to outline, first of all, what it's all about.
Yeah, okay. So I think, historically, we'd found (and we were sharing experiences with you from a long, long time ago on this) that we had all of the components that were
needed to do good BI solutions,
good data warehousing solutions.
But I think you and I were both seeing projects
where the technology may be great,
but we were getting a lot of these things wrong, frankly,
and our customers were getting a lot of things wrong.
And we started wanting to build a reference architecture
to encapsulate best practice.
And this is going a long time back now.
But probably the first iteration of that started, what, eight, nine years ago,
something like that.
And we went through several iterations working with customers,
working with you in particular and other organizations like yours.
And we came up with the latest iteration, as you said, a couple of years ago.
And it was really building on things that we'd done in the past.
I think we'd found we'd moved already away from a kind of historic view of things being BI and data warehousing. And we
were going off down a route where we were talking about slightly more agile ways of delivering
results. And I remember even in the second generation, we were talking about things like Query Federation and those sorts of things.
But I think what we found was that, particularly in the big data sense, there was a lot of confusion about what big data actually meant. I mean, probably two, three years ago, I think you and I were probably both getting
quite frustrated about these terms like unstructured versus structured data and an understanding of
what that actually meant. And we kind of zeroed in on really, rather than this being about types
of data, it was more about being able to deliver results fast and almost like a pace layering type approach to information delivery.
And I know with you, and with Stewart in particular, we were having a lot of really interesting discussions about agile and scrum type methodologies for delivering an information discovery process, and about that becoming part of the overall information delivery: those were the key things that big data type technologies could be exploited for and could support.
At the same time, as you said, we started to see what had traditionally been on the periphery, maybe the last thing you did, put into the heart of the architecture.
And we'd seen that with some data warehouse and good BI implementation.
But big data really enabled that.
And particularly the piece around, as you said, streaming type analytics and fast data.
And what I would term, you know, IOT type architectures.
And we really started to see these move to the heart of it.
So one of the things we developed was a very high level, in fact, higher level than we'd had before, conceptual model.
And that's pretty much worked for us; it's a nice way of introducing best practice and architectural thinking at a very, very high level to a not necessarily tech-savvy, and certainly not data-warehouse-savvy, audience. And then we moved on from there to really overhauling the logical architecture, specifically introducing techniques around the data reservoir,
and moving from what you could term information discovery through to the
consumption of a system of record type data.
So you could almost lay over this, the PACE layers from Gartner,
but you could take them rather than being systems of innovation,
systems of differentiation, you could almost say this was data of innovation,
this was data of record and so on.
And we had a nice facility for doing that.
So I think those things really worked, and we got a lot right there. The challenges were, and still are, particularly around certain elements of physical choice (and I think we'll talk a lot more about things like tapping into the Apache toolset a little later on) and choices around things like polyglot versus multi-model, specialization versus consolidation and standardization. There were almost two competing forces: a lot of desire by organizations, at certain levels, to exploit new technology, which was great, but at the same time a lot of desire, and understandable desire, from IT to try and keep control over the proliferation of technology.
And I still don't think the architecture per se does a great job of explaining where and what to use for certain use cases, and we've had to do a lot of work, particularly in terms of documenting things like the Apache zoo. The other thing that we probably didn't do a great job of explaining, first of all (partially because I think we weren't really thinking in these terms), was things like Lambda architectures.
So we started to see Lambda emerging as a trend.
And I don't think we'd really worked out, and I don't think our customers had, the use cases for it.
So, for example, we'd find Lambda being considered,
and we didn't really have good rules of thumb,
and the architecture didn't really support
when we should make decisions on that and when we shouldn't.
I mean, obviously, particularly in the context of, you know, the way you talked about it before in terms of fast data and streaming and those sorts of things.
Lambda was intended for the ingestion and processing of timestamped events: rather than data being overwritten, state is determined by the natural, time-based ordering of the data. It had come out of the social media space, and so typically, in the space it came from, the events were mutating over time and maybe accuracy wasn't typically required. Whereas certainly, historically, the architecture that we had was coming from the point of view that consistency and accuracy were very important.
And this is where I start to think, you know, that being able to classify data according to the architecture became really important. As I said, it was the combination of that desire to event-stream process data in an extremely fast, or very fast, way, to better capture events with different message types and be able to exploit them quickly, whilst at the same time being able to use reliable data. So, for example, I would have events that are happening to a customer right now and transactions that a customer is making, but my next best activity may well be determined both by what's happening now and also by what goes on in the future and what's gone on in the past, in terms of things I've derived about that customer. That became really interesting and challenging, first of all, and also more and more technology-driven. So what tended to happen was we would have conversations where people not only told you that they were going to do Lambda, but that the technology choice they were using was, I don't know, Kafka and Flume or some other physical aspect. And then there was a piece that we had to do in terms of revisiting the architecture, and revisiting architecture best practice, to make sure that we were actually doing the right thing.
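For anyone who hasn't met the pattern, here is a minimal, hypothetical sketch in Python of the batch-plus-speed-layer merge being described: timestamped events are appended rather than overwritten, a batch layer periodically recomputes a view over the full history, a speed layer folds in whatever has arrived since, and a query merges the two. The names and event fields are invented, and this is not any particular product's implementation.

```python
# Hypothetical sketch of the Lambda pattern discussed above. Events are
# timestamped and appended (never overwritten); a batch layer recomputes a
# view over the full history, a speed layer keeps a running view of events
# that arrived since the last batch run, and a query merges the two.
from collections import defaultdict

def batch_view(event_log, as_of):
    """Recompute totals per customer from the full, immutable event history."""
    view = defaultdict(float)
    for event in event_log:
        if event["ts"] <= as_of:
            view[event["customer"]] += event["amount"]
    return view

class SpeedLayer:
    """Incrementally folds in events that arrived after the last batch run."""
    def __init__(self, last_batch_ts):
        self.last_batch_ts = last_batch_ts
        self.view = defaultdict(float)

    def ingest(self, event):
        if event["ts"] > self.last_batch_ts:
            self.view[event["customer"]] += event["amount"]

def query(customer, batch, speed):
    """Serving layer: merge the batch view with the real-time view."""
    return batch.get(customer, 0.0) + speed.view.get(customer, 0.0)

# Usage: history up to t=1000 is batch-processed; a newer event then arrives.
events = [{"customer": "c1", "ts": 900, "amount": 25.0},
          {"customer": "c1", "ts": 950, "amount": 10.0}]
batch = batch_view(events, as_of=1000)
speed = SpeedLayer(last_batch_ts=1000)
speed.ingest({"customer": "c1", "ts": 1005, "amount": 5.0})
print(query("c1", batch, speed))   # 40.0: the past plus what's happening now
```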
So, yeah, Andrew, I thought that, looking back... I mean, as you mentioned at the start there, it's been several years, and several years in internet time and big data time is a long time.
And so I think certainly it would be interesting to think about, you know, what the architecture would look like now.
Would it be any different?
Would it be the same? I think particularly some of the things that worked for me in that architecture were, first of all, that it kind of legitimized big data: it brought it into the view of the people that do enterprise architectures and the customers that we deal with, and that was good. And I think particularly the separation of innovation and execution. For me, that's the one thing to take away from the architecture. And for anyone listening, we're going to put the reference architecture link in the show notes so you can read it; it's presentations and some PDFs and so on. But this architecture made a distinction between an execution layer and an innovation layer, on the basis that customers that start a big data project might well start it off as a kind of skunkworks project, and then that goes into production, but then to further innovate you still need that ability to try things differently, try different tools and so on. So that, for me, was really good. And I think some of the bits that were a little bit unclear in the architecture then were things like the data factory: you know, what does that mean, and so on. But in general I think it was good.
But again, looking back, some of the things that have happened since then, or things that are on people's minds a lot, are things like self-service BI, the whole rise of tools that don't require the users to do data modeling at the start, and so on. I'd be interested to get your view on where they fit in.
But also, I suppose, one question that I often used to have is: where does the data reservoir go? Because you could say it's a system that is being maintained, and it's very much a run-the-business system, but there's obviously less control with schema-on-read in there and so on. Just going back to that (and I want to get on to things like cloud later on), did you have any questions from customers about things like self-service and data reservoirs and where they fit in?
Yeah, I mean, we had an awful lot of these conversations. And I think, to your point, one of the things the architecture has never done is describe the operating model, because I think there's as much here about architecture and technology as there is about operating model and governance and those sorts of things. Definitely, the one thing we weren't being prescriptive on, and still aren't, is the operating model and who controlled the data and what went into the lake or reservoir. I was pleased to hear you say reservoir, by the way; we liked the term reservoir because it implied some rigor and some control. And in terms of physically how it manifested itself, I think that was one of the things I was trying to hint at: that was a real challenge, and my belief is it still is a real challenge when I look at what we do now in terms of things like choices on persistent data stores, or what we're going to do about workflow orchestration, what we're going to do about data movement.
I think those are hugely challenging, not least because the technology toolset that we've put underneath it is rapidly evolving, and we need to stay abreast of probably a whole series of technologies that we didn't need to know in the past, and that new capability introduces new architectural possibilities. So I think the principles of the architecture, and some rough rules of thumb, are still applicable. My guiding thoughts are: we're going to do polyglot, but we're going to do it by need rather than by religion. We still want to aim for a small tech estate.
We want minimum number of moving parts and technologies.
We, at the same time, want the shortest data chain that we can possibly have
and the minimum number of copies.
And a lot of our original thinking around that still holds. And I certainly think, to your point about bringing big data into the enterprise class, it's probably fair to say that we saw a fair few projects, first of all, that were probably developer-driven, perhaps by people that didn't entirely understand data and information exploitation so much as understanding MapReduce, and therefore proliferation of data was starting to rear its ugly head again, and we could take the architecture and critique implementations based on that. Striving for data consistency (and if you couldn't immediately get it, trying to get there as quickly as possible) and introducing things like the idea of unified processing logic: those rules of thumb still hold. The complication for me, and where I think you need additional pieces of information, is in particular in the physical mapping. I think that's where we need to supplement a logical architecture with what we would describe as almost a dictionary of the Apache zoo.
Interesting, yeah.
And we'll get onto that later on.
I think that would be an interesting area.
I guess, as a slightly tangential bit to that, the thing I find with any kind of customer or organization doing big data is, like you say, it's often very developer-driven, and it's almost like the days of the fall of the Roman Empire, in that there's loads of projects going on, like the barbarians, sort of thing, and nobody wants to talk about data modeling or data consistency or, you know, the old vendors and that sort of thing. Do you find that these enterprise architectures are being used, beyond the kind of people you speak to in the architecture department? Does it cut right down to the bottom? Do people believe in this, or do we have to do an education job with people doing big data projects about the reason for an architecture and the reason for these sorts of things, even the reason for data modeling as well? Do you find that the work you're doing in enterprise architectures is getting taken up by the developers?
Yes, but (and I don't think this has necessarily changed, Mark) it's just that new technologies have enabled us to make more mistakes more quickly.
I think even historically, I mean, going back to when you and I were a little younger, people were making the same mistakes quite often. These things come in waves, but there's definitely a trend at the moment towards more silo-ification of BI, and there has been ever since big data took off. And whilst I talked warmly about information architecture moving to the heart of things like IoT and customer experience type architectures, at the same time, whilst the information delivery within those may well have subscribed to parts, if not all, of the reference architecture, was that necessarily true at an enterprise scale? Probably not, in the vast majority of cases. And could more have been done at an enterprise scale to both control data and steward data in a good way? I think almost undoubtedly, yes.
To some extent, we are all of us, as big data professionals, as enterprise architects and so on, probably guilty of failing to address that point, and we are still going down the road of fairly siloed information delivery. And of course, it depends entirely on whether you think there's a role for the city-planner type of enterprise architect in the customer organization or not. The more we're solution-driven, the more this silo-ification is likely to happen, and if it is, then the best thing we can do is make sure that the architecture that's put together for that specific silo, for that line of business, at least subscribes to the reference architecture in microcosm, even if we can't do it at the grand level.
Data modeling I kind of shudder at, because I think huge mistakes were made by data modelers historically, in terms of going away into a darkened room for ten years and coming back to an answer to "is that what you wanted?" of probably not, or, even if it was, "yeah, but now I want to answer the next question." I think that was something that you and I and Stewart really latched onto about big data: that we didn't have to put that up front. I think data modeling got a bad name for itself because it took you ages to deliver anything, and then it took you a long time to iterate.
It seemed to me that with big data technologies in particular,
we could iterate much more quickly because of the schema on read factor.
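To make the schema-on-read point concrete, here is a small, hypothetical PySpark sketch with invented paths and field names: the structure is inferred from the raw events when they are read, so answering the next question means writing a new query rather than reworking an upfront model.

```python
# Hypothetical schema-on-read sketch: no upfront dimensional model. Spark
# infers the structure of raw JSON events at read time, and iterating just
# means writing the next query. Paths and field names are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Read raw, semi-structured events straight from the landing area / reservoir.
events = spark.read.json("/data/reservoir/web_events/")   # schema inferred here
events.printSchema()                                        # see what turned up

events.createOrReplaceTempView("web_events")

# First question of the day...
spark.sql("""
    SELECT page, COUNT(*) AS visits
    FROM web_events
    GROUP BY page
    ORDER BY visits DESC
""").show()

# ...and when the question changes, you iterate on the query, not the model.
spark.sql("""
    SELECT customer_id, COUNT(DISTINCT session_id) AS sessions
    FROM web_events
    GROUP BY customer_id
""").show()
```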
Yeah, I think, you know, in a way,
we had a debate on Twitter yesterday, as you do, about data modelling.
And it's an interesting area.
I mean, I think, you know, in a way,
data modelling is an industry that's ripe for disruption, really.
I think that the kind of the people that do data modelling
are often the most kind of pedantic,
the most kind of dogmatic people you can get,
you know, for good reason, the same way you might make...
Yeah, and it sort of struck me that it was definitely an industry
that was ripe for disruption.
And if you look at when the Gartner report came through, the Gartner Magic Quadrant with the modern BI platform, some of the ideas they came up with were that data modeling should be optional, and that it should be as automagic as possible to get schema out of the data. Your first reaction was, well, that's ridiculous. But actually, you can see why customers would not want to spend months and months doing some kind of data model design that would be very rigid, and then, as you say, by the time the questions come along, the questions have changed. But I think the reaction against it was probably too extreme as well, to say we don't need that at all; like all these things, the boring but probably sensible answer is in the middle: some of it's needed and some of it's not. But certainly, like a lot of things, I think the days of customers paying for very large-scale BI projects with lots and lots of upfront data modeling are just gone, really. So, in a way, like you said with the architectures, you've got to accept that's not going to be there anymore, and it's how can you make the best of what you've got, and how can you still try and drive through that quality and so on. But for me, data modeling and self-service are areas that have really changed things.
Another area I want to get onto, which I think is another big change that's happened since that architecture was done, is cloud. And I think, like a lot of things with any kind of new innovation in any industry, at first you just do the new thing in the old way. So a lot of the cloud BI implementations that I saw at the start, or even big data ones, were just porting the same thing into cloud and running it on a shared server rather than your own. But how have you guys, how has Oracle, seen cloud affecting big data enterprise architecture? Is it just a case of putting it all into Oracle Cloud and running it there? Or has it fundamentally changed things, or did it give us new options, really? What's your take on cloud and Oracle?
Yeah, I think this is a fascinating area.
And it's probably gone faster than I thought it would. And I have to caveat what I'm about to say: I think this is where, working for Oracle, I probably get a slightly different view to maybe some of the other vendor employees.
I mean, because we do have such a wide capability. And this becomes important
because there are a lot of elements. I kind of want to focus on what I perceived as the challenge,
first of all, which was around things like data ingestion and creating things like multi-tenancy support
and having trust in that multi-tenancy support.
There were a lot of questions around networking, bulk data loads, incremental data loads,
bidirectional and unidirectional data movement and replication across data centers.
I mean, questions around BI tooling, data discovery, exploitation,
questions around things like access to on-premise tools
when your data's sitting in a cloud.
And then, most obviously, I think the big question a lot of people were asking was around security, governance, organization. So, you know: I want this solution to integrate with the old app, because in many ways this data is my crown jewels; I want to comply with regulation in terms of things like data masking, encryption, audit, monitoring, that sort of stuff. Now, I would argue that a lot of those things customers probably should have been thinking about anyway, but cloud and cloud-based solutions make people think about them even further.
So, to answer your question: can you go and take that architecture and kind of stick it onto an infrastructure service? Possibly. But that's pretty dull, to your point, and probably not a transformation; maybe an IT cost saving, but probably not a true transformation. And I think that, if that's all you do, there are interesting questions to answer, particularly about things like performance, and whether you can deliver the analytics at the speed that you actually require with a fairly standardized infrastructure.
So what we found is that actually we could, and we have, defined end-to-end solutions for big data analytics now, and I think that's quite startling, the fact that we can take not just elements and silos of the architecture. Most obviously, you could take a BI application running off a sales or marketing or ERP solution, and of course you can deliver a BI reporting solution on top of that; I mean, that's obvious. But what surprised me is that you could actually put together an end-to-end solution with a data landing area, a real-time data landing area, a data reservoir, a data warehouse, BI tooling, and a discovery and development type lab. And you could do all of that in a cloud, even though the sources of that data were predominantly on-premises.
And typically, the way we've seen that being architected is you end up with some kind of on-premise data hub, which is then pushing data out into what is a fairly classical architecture. So we've used components like Data Integrator, pushing that up into things like our Storage Cloud Service (you can imagine that as a landing area), then moving that along to a Big Data Cloud Service and an Exadata Cloud Service in our terms, but more generically a database cloud service, with Big Data SQL able to reach through from one to the other. You could do all of that. You could do the data discovery, again, through those cloud services, and the interpretation of the data through things like Big Data Discovery Cloud Service and BI Cloud Service.
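As a rough feel for what that reach-through looks like from the consuming side, here is a hypothetical sketch. The table names, columns and connection details are invented, and it assumes a DBA has already exposed the reservoir data to the database as an external table, which is the kind of thing Big Data SQL is used for.

```python
# Hypothetical sketch of the "reach through": with data in the reservoir
# (Hive/HDFS) exposed to the database as an external table, one SQL statement
# can join warehouse data with reservoir data. Names and connection details
# here are all invented.
import cx_Oracle

conn = cx_Oracle.connect("bi_user", "secret", "dbcs.example.com/ANALYTICS")
cursor = conn.cursor()

# SALES is an ordinary warehouse table; WEB_EVENTS_EXT stands in for an
# external table a DBA has defined over the Hive data, so the join runs as
# a single query across both tiers.
cursor.execute("""
    SELECT s.customer_id,
           SUM(s.amount)      AS revenue,
           COUNT(w.event_id)  AS web_events
    FROM   sales s
    LEFT JOIN web_events_ext w ON w.customer_id = s.customer_id
    GROUP BY s.customer_id
""")

for customer_id, revenue, web_events in cursor:
    print(customer_id, revenue, web_events)

cursor.close()
conn.close()
```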
Now, I'm well aware I've kind of gone away from architecture a little bit here into products, and forgive me for that.
But the reason I do that is because what's really interesting about this is when you have that ability to have an end-to-end solution within a single cloud provider.
And I think what I've seen less of, and this may be because I work for Oracle,
but I guess what I've seen less of is an end-to-end cloud solution
where that cloud solution is heavily distributed across a wide number of cloud
vendors. I genuinely don't think that happens. I think what's more likely to happen in that
scenario is you are going to go for, in effect, an infrastructure as a service approach, which
in effect is a virtualization of various on-premise tooling.
Whereas if we can elevate that to, in effect, PaaS,
and if not bordering on SaaS, I have to say,
then I think this becomes much more interesting in terms of the agility effect that cloud can have
in that fast delivery and that fast scaling
and all of those things that you would expect with cloud, a genuine business benefit to cloud.
So a couple of areas struck me as interesting in moving this area to the cloud. Well, first of all, I noticed there's that Data as a Service that Oracle offer.
And I think this ability to bring in external data,
when that external data could be in the cloud as well,
it's much simpler to bring that in,
especially if Oracle themselves act as a broker.
And I think there's far too little, to my mind,
external data brought into people's systems like this.
And I think, maybe because of the way that Oracle market things, Data as a Service does not appear in the same sort of presentations as things like Big Data Cloud Service.
But that data as a service thing is interesting.
But also, the other thing that's interesting as well is it's not just BI running the cloud.
It's also the customer's CRM.
It's the customer's kind of ERP and so on.
Do you think Oracle particularly have got an angle which is quite interesting, which is that at the same time as you're moving their analytics into the cloud, chances are what drove that was the customer moving their ERP into the cloud? And do you think there's a special opportunity Oracle have got there that, say, Amazon wouldn't have, or Microsoft wouldn't have, and do you think that's interesting for the future?
Yeah, I mean, I think there's a more generic question
around cloud adoption in general,
which is associated with your point,
which is that I think cloud historically
has been seen as either a fairly dumb piece of infrastructure adoption or SaaS adoption.
Now, you know, I mean, Oracle plays in both spaces.
But actually, I think where cloud becomes really interesting is if you see a cloud as the continuum of IT services. And if you start to see cloud as that,
then I think we are almost uniquely placed
to be able to provide that complete continuum
of IT services, starting with SaaS, as you say,
and then moving down through the stack.
I think it's probably fair to say, as I said, we saw plenty: lots and lots of adoptions where customers have started with SaaS, historically CX, HCM, and increasingly now ERP, and you provide the associated business intelligence out of the box, frankly, as you
would have consumed a BI app on premise.
But now you do that out of the box on the cloud solution itself.
And then to your point, you broaden that out.
You broaden that out with other PaaS components, data discovery, and so on.
What's really interesting now is, simultaneous to that, we're seeing end-to-end cloud adoption for the whole thing.
And that's the thing I think that's really shocked me.
I think the previous one was quite predictable.
I think the idea of having a complete end-to-end solution
in a single cloud is amazing, frankly,
and almost regardless of where the data comes from,
to be able to deliver a true enterprise solution with all of the data
discovery elements that we've talked about, with data quality and so on. And again, I feel like we're fairly uniquely placed there. I agree with your observation about data as a service. Now, what's interesting about data as a service, particularly when you pick up on the customer experience stuff, the CRM stuff, is that typically it's being pulled through for that reason. But really, a decent CX architecture will tend to be based, from its information perspective, on the kind of architecture that we were laying out in the reference architecture, and therefore it's the sum of that external series of data sources, maybe about customers, but then that data is merged with, mashed up with, data that you've derived internally about those customers as well, to provide that overall best feed of next best activity. And you can combine that in a fast data solution. And the fact that you can do all of this in the cloud now, that you can do an end-to-end solution in the cloud which is fully auditable, that you can get past regulators even in a financial services type environment... I think, frankly, maybe it's a sign of my age, but the speed at which we've been able to do that has shocked me, and it's very pleasing.
It's very exciting. I mean, it's an interesting area, and I've been doing a few future-of-BI presentations at various user groups over the last few months and years. The saying is always that things change slower than you expect, but I think that an analytics project in a few years' time will be quite different to the ones we saw several years ago, with a few people, or lots of people actually, sitting with desktop tools, carefully modeling data and carefully feeding it in from ERP systems into a data warehouse and so on. I think moving to the cloud means there's a much broader opportunity to apply analytics and machine learning over ERP data in there as well. But I think also, again back to this thing about data, things will need to be a lot more automatic, a lot more of that induction of schema out of data and so on. I think the days of carefully crafting these things, and of weekly loads and so on, will be quite different. I think we'll find analytics will be everywhere, but it will be quite different to what it is now: it will be much more pervasive, it will be much more about taking in data from external sources, it will be about applying analytics to transactions as they're done. But I think there'll be far less engineering going on. It's almost like, say, people talk about having power plants in their factories, and now it's in the grid. I think we'll see massive differences in that way in the work people will do. There'll still be work for consultants and there'll still be work for implementers, but it won't be hand-crafting dimensional models and so on; it'll be about synthesizing data and bringing in external stuff as well. I mean, I think it will be quite different, and I think we currently think about BI and analytics moving into the cloud as just porting the same system that we have now to run there. But I think it'll be quite different, and how it'll be different is hard to tell. But I think all these things coming together means that it'll be, hopefully, far more value, but far less hand-tinkering, really.
Not entirely. I think it's fascinating. I think you present a rosy view, and potentially one which is kind of interesting to business users, but at the same time... That's definitely one area, and the cloud area is interesting from the point of view of automation as you describe it, and also big data preparation, big data discovery, all of those things where you can use algorithms to derive structure. Yeah, I'm fully with you.
What I've also seen now (we made reference to it earlier) is the rise of the developer. And that's true in general, but specifically we've seen the rise of the developer in the big data space. You know, if I took a subject like data preparation and integration at the moment, and I looked at the kind of choices that we had to make there in terms of the tooling that we would use, then sure, you could say, well, it's great, because there's Oracle's Big Data Preparation Cloud Service, right? So that's good, and to your point it's a visual thing delivered over the cloud, it's utilizing Spark under the covers, we're good to go. But at the same time as that, you've got developers actually not using a visual interface, using, I don't know, Morphlines or something like that. You've got Cascading coming along, which will require a plugin to visualize the data pipelines, but frankly the logic is more likely to have been a series of pipe assemblies. And so what I think is happening is that there is a developer aspect to this, and a developer orientation to this, which I think is quite different and quite hard for us to get our heads around. I mean, my seven-year-old's coding Python at the moment, and maybe there is going to be a very significant number of people who don't necessarily sit in IT
but have developer skills.
Exactly, exactly. There was an article, I think I posted it or tweeted it a while ago, and I think it was on the Cloudera blog, and it was about doing BI in Python, and it was just pages of code. You know, very productive, and very different to what we do. And I think I posted at the time, yeah, this is the future of BI, it's not going to be graphical, which was kind of slightly ironic, or slightly pessimistic, or whatever, but it is interesting to see. I think I was at a Cloudera Sessions event in Amsterdam a while ago, and a guy stood up and very proudly said, you know, I'm now doing my data loading using scripts and Python and so on. And you felt like saying, have you heard of the idea of ETL tools? But you'd be like the dad at the party, wouldn't you, telling people to put an old record on. Yeah, people don't want to hear that.
But I think the last thing I want to talk about really on that topic, and you mentioned it at the start, is just the number of these products that are out there, the number of Apache products. You mentioned Morphlines; I think for every day and for every vendor there's a new data pipeline tool, a new NoSQL store and so on. In your architecture before, you were very careful not to get into specifics of products. But why do you think this is an interesting area, and why do you think it's something that you keep coming back to as being an area people do think about? Give us an outline of why.
So I think, historically, our reason for being technology-choice agnostic with the logical architecture still applies and still flies, and I'm quite happy with that. I think one of the reasons we did it is that there were relatively few end-to-end Oracle solutions, and typically we were integrating with something, so we needed an architecture that worked regardless of what the incumbent technologies were, and we were trying to encapsulate best practice. Now, to your point, with big data there is a very large number of vendors, and indeed projects, in varying states of maturity, and a lot of choices are being made. It's almost an architect's nightmare, because ultimately the architects can sit there and do that for a while, and hopefully we don't make the mistake of going down the same route as the data modelers and pontificate for the rest of our lives; hopefully we actually deliver something. But you spend a long time getting the logical architecture right. Yeah, I'm good with that. But then you have to start placing your bets on physical choices. Now, historically, you were placing bets on one or two choices, which were among major vendors. And that could go right or wrong, but typically you were on fairly safe ground, and the cadence of change was fairly easy to handle, fairly easy to absorb. But now that's not the case. I mean, I kind of look
at the projects that I saw being done two or three years ago, where people were hand-coding MapReduce, and I just want to cry, because you have to get used to the notion that you are going to throw all of that away. All of that code that you saw there, Mark, two years ago, and that you talked about, that stuff is going to go. Now, there's one of two things: either you say, well, this can't possibly be right and we're going to run away from it, or you simply accept that the cost of delivering BI now is that cadence of code change. A great example is data capture and delivery.
I think, you know, you and I saw the early iterations of Flume and then Flume Next Generation
being completely different.
And then along comes Kafka and we're thinking, OK, well, this is a different way of thinking
about this.
We've now got something where we've addressed some of the issues of Flume, and it does other things at the same time, but we need a framework underneath it like Storm or Samza or something like that. And then, brilliantly, of course, we start to learn that we can combine the two, and we come up with a fantastic term called Flafka, where we can embrace the two use cases of a traditional message broker and website activity logging with the aggregation capabilities, but at the same time still not necessarily dealing brilliantly with things like bulk load and trickle feed from a DBMS into HDFS, or struggling to deliver mission-critical sensor data to a CEP sink, or something like that.
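As a hypothetical illustration of the two use cases being combined here (Kafka as the durable message broker for website activity, plus Flume-style aggregation into a landing area), the small consumer below batches events into files. Topic names, brokers and paths are invented; in the Flafka pattern proper, Flume's Kafka source and HDFS sink would typically do this batching rather than hand-written code.

```python
# Hypothetical sketch: Kafka carries website activity events, and a consumer
# plays the Flume-style role of aggregating them and landing them in files
# (standing in for an HDFS sink). Topic, broker and path names are invented.
import json
from datetime import datetime
from kafka import KafkaConsumer   # pip install kafka-python

consumer = KafkaConsumer(
    "web-activity",                              # hypothetical topic
    bootstrap_servers="broker1:9092",
    group_id="activity-landing",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

batch, BATCH_SIZE = [], 1000

for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Aggregate a batch of events into one landing file per flush.
        path = f"/landing/web_activity/{datetime.utcnow():%Y%m%d%H%M%S}.json"
        with open(path, "w") as out:
            for event in batch:
                out.write(json.dumps(event) + "\n")
        batch = []
```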
So the problem with this polyglot thing is that it's very, very granular, first of all, so your choices are tough in the first place, and then there's the cadence of change. We've got to a point now where, certainly from an architecture perspective, we basically keep a set of architectural guidelines for physical choices, based on things like evolutionary maturity, and the other element that we're considering here is the breadth of adoption and mindshare of a particular piece. So, I don't know, data query, for example: this is a really interesting emerging area. We have things that are well established, like Hive, but that's had some deficiencies: not necessarily great for interactive applications, typically high latency, with things like the JVM setup for every MapReduce job, writing back to disk, those kinds of things.
Then we've seen things like, you know, Oracle had the XQuery thing. We've seen things like Impala arriving.
And again, drawbacks of that in terms of things like resilience and so on.
And now you're seeing, what, Apache Drill.
You're seeing Stinger.
You're seeing Presto.
You're seeing Phoenix.
Spark SQL.
Big Data SQL. And to my mind, with this whole thing, we're going through an iteration now where we will update what we're doing every month, to try and keep tabs on what we think is going on in the Apache world. So, for example, what's going on with Druid right now? Has it got a future? What's that future likely to be? What does it replace? What could we use instead? What are the use cases for it?
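To give a feel for why the engine keeps changing while the logical layer stays put, here is a hypothetical sketch that pushes the same logical query through two of the engines mentioned, Hive (via PyHive) and Spark SQL. The table names, host and columns are invented; swapping the engine changes the latency and operational trade-offs, not the question being asked.

```python
# Hypothetical sketch: the same logical query run through two of the engines
# discussed above. Table names, hosts and columns are invented; the point is
# that the engine is a physical choice beneath a stable logical architecture.
QUERY = """
    SELECT product, SUM(amount) AS revenue
    FROM reservoir.sales_events
    GROUP BY product
"""

# Option 1: Hive, mature and widely adopted, but batch-oriented and
# higher-latency (MapReduce-era execution, JVM spin-up per job).
from pyhive import hive

conn = hive.connect(host="hadoop-edge.example.com", port=10000)
cursor = conn.cursor()
cursor.execute(QUERY)
print(cursor.fetchall())
cursor.close()
conn.close()

# Option 2: Spark SQL, reading the same tables via the shared metastore,
# with in-memory execution that usually suits interactive work better.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql(QUERY).show()
```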
If I combine everything, like search, like workflow and orchestration, like advanced analytics, we could probably name a hundred different products, a hundred different components, that we could think about at the moment.
It's interesting, isn't it. From my perspective it's had a few impacts on some of the projects I've worked on. Because of the way these things are built (typically they're hand-coded; Hive scripts are typically written by hand, and so on), these projects become baked in against that technology evolution, and there isn't the budget, there isn't the inclination, there isn't really the time sometimes to move on to, say, Spark or stuff like that, because, I suppose, the nature of big data projects is that they're very experimental and very disruptive and so on. You find there isn't the ability to move from, say, Hive to Spark or whatever in customers, and they tend to be locked into a certain version, which is a shame. And the number of projects I see around that are still using old-school Hive, instead of, say, things like Impala or Drill and so on, is kind of sad.
The other thing, really, is that the things people obsess about in this world, whether it's low-latency SQL and so on, mean that in a way they're not thinking about the stuff that is actually important to organizations, like security. To my mind it comes back to what you mentioned about the maturity of these products. One thing I've always found very interesting (I found this with Spark on a project recently) is just how much you expect to be there that isn't there. So the kind of security you get in, say, an Oracle or a Teradata or whatever database is just not there yet. And it's an interesting world. I think a lot of the stuff that we obsess about, whether it's the latest version of a SQL-on-Hadoop engine or the latest NoSQL engine, in a way it's a bit like any open source project: the hard stuff, the stuff that is not sexy but is important, often doesn't get done. And I'm quite pleased to see some of the initiatives coming out of, say, Cloudera with RecordService and so on there. But again, what you tend to find is that there are these big groupings of vendors, the Clouderas, the Hortonworks, the MapRs, and they're all doing their own take on security, their own take on SQL and so on as well. And it's quite saddening in a way, coming from the world of ETL tools and, in a way, databases being a solved problem: we've kind of gone back twenty years, and we've got these very balkanized and very low-maturity sorts of systems. But again, I suppose we might sound like the mainframe people of years ago, who were complaining about these minicomputers coming along that couldn't do the things that mainframes could do, but they took over. I suppose it's classic disruptive technology, really. But going back to the job that you guys do around architecture and thinking about these bigger problems, this is where I think it adds that level of almost adult supervision that sometimes you don't get when these initiatives are driven by IT, or driven particularly by developers.
Yeah, yeah, I think that's right. I mean, it's funny, you undermined the
joke I had coming, which is that, of course, if you think it's sad that people are still using Hive, there are customers out there still using Teradata, which I find even more appalling. But surely that's the reason I go back to the architecture
every time. If we take that approach where we say, look, there is this piece and it's all about innovation, our main focus for it is innovation, it's not data of record, it's not the important stuff that's actually running the business, it is stuff that we can afford to get slightly wrong and stuff that we will reinvent, then if we can create sufficient differentiation with it for a year, good enough, it's paid for itself. And then we accept that for as long as it lives it's a good thing, and at some point it's going to die. If we accept that, in general, things are delivered in that multi-paced form, then why should information technology be any different? What the architecture, it seems to me, gives you the right to do is to classify that capability and put it in the right place. And as long as we do that, I think we're in a good place.
I mean, people ask me whether I think the Oracle database is still something that you're going to need for overall information delivery in the era of big data, and so far I've got to say yes, it is, because of that line that you talked about earlier between discovery and innovation, and then actually the exploitation of mission-critical information. And at the moment, the one thing I think customers can still bank on is that, eventually, the goodness that you get out of that big data part you're going to post off into a data warehouse somewhere.
Now, I'm certainly not arguing for the proliferation of data warehouses.
I'm certainly not arguing for lots and lots of data modeling going on.
And I just think, I mean, you referred to mainframes.
There's still a lot of mainframes around.
And it seems to me that these technologies that are emerging, they will complement everything that we're talking about.
But you have to embrace the side that says this is fast fail.
But it's also something where, even if it succeeds, its half-life is probably significantly less than the things
I've been used to up until now. And as long as you accept that, as long as there's a return
on investment for that, then I think you're good to go.
Okay, brilliant. That's good. Well, Andrew, thanks very much for coming on the show; it's always interesting to speak to you. So thank you very much for that, Andrew, and thanks for coming on the show.
Cheers, cheers. It's been a pleasure. Thank you. Take care.