Drill to Detail - Drill to Detail Ep.99 “Is the Modern Data Stack Dead?” with Special Guest Chris Tabb
Episode Date: February 15, 2023...
Transcript
So welcome back to a new series of the Drill to Detail podcast, and I'm your host, Mark Rittman.
i'm very pleased to be joined today by someone I knew and worked with around eight years ago or so.
We went our different ways then, but like great minds think alike, find ourselves both running consultancies specialising in the modern data stack.
So welcome to the podcast, Chris Tabb, and it's great to have you as our first guest in this new series.
Thank you very much, Mark.
And yeah, it seems like yesterday we worked on that project, but it was quite a long time ago now, wasn't it? Thanks for having me on the show. And I think, as you say, our paths have led us in the same direction, both focusing on the modern data stack, which is a hot topic of mine and something I'm quite passionate about. And I think using the experience that we leverage from back in the day, you know, before the big data bubble, helps us understand what's really needed, what's required, and what the definition of the modern data stack really is.
Chris, so if anyone doesn't know you, you're pretty high profile on LinkedIn and on various social media sites. But more importantly, you're a co-founder of a consultancy called Leap Data. So just start by telling everybody what it is you do, and the role you have within Leap Data.
Yeah, so I'll start with who Leap Data is. So, a relatively new consultancy company, brought together by myself and two other founders who had worked together over that era of the data platform: data warehouses, and now data clouds.
And we came together after a couple of successful Snowflake implementations, when we were on the other side of the fence, and thought there's something here, and we set this consultancy up. And yeah, we focus on the modern data stack, and I think what I also like to say is that we focus on the business value that's delivered by the modern data stack. So we're very much business focused, technology supported, or looking at ways of getting technology to support that, rather than, as some people do, starting with the technology and finding the use cases afterwards.

And yeah, what I do in the company now: due to the expansion, and we're now working in the US as well as the UK, I now run the CCO function, so I'm the Chief Commercial Officer. I did a post about this the other day about what I actually do. People probably just see me travelling around doing conferences, doing these podcasts, and taking photos or selfies with fellow data community people. But I think behind the scenes it's getting to talk about and discuss the different trends, the different movements in the market, the different technologies out there, what's hot, what's not, and validating the approaches that we implement for our clients, constantly optimising that, and always looking at how we can work smarter, not harder.
And something I use a hashtag for: the mean data streets. The focus of that is trying to make them less mean, trying to cut through some of the complications that the modern data stack has created in its evolution, and looking at how you can simplify that and reduce the time to value for our clients and the community that listen to us.
I do quite a lot of LinkedIn posts and podcasts like this.
And I think it's a lot about education, data modeling being just one example
of that, which I think the project we worked on back then was very,
very much focused on a very optimized data model that could support what was then a very high volume, high throughput payment processing system.
So as I said, I've known you for about eight years now. And I was kind of surprised to link what I was seeing on LinkedIn, and the activity of your consultancy, with the Chris Tabb that I knew from before. Because the Chris Tabb I knew from before was a very studious data architect who was central to a project I was working on, along with quite a lot of the company I used to work with at the time. As you say, a payment processor company, but you were doing a very technical and, I suppose, individual contributor type role at the time. So maybe just tell people what you were doing then, as much as you can, and maybe what led into that, first of all. So I think in the past you worked at Cognos and so on, but what was the role you were doing then, really, and I suppose, how did that lead to what you're doing now?

Yeah, so I think it's always a bit of a blend of different things, but
we'll start with that payment company where we met. So that was, I'd say, at the time one of the largest digital transformations happening: a very complex landscape, very complex architecture, and many moving parts. And yeah, the role there was very much a technical, designing, hands-on role, and I loved that. And what I've learned from that: I can no longer do that hands-on work, but now I have a team that still feed me that knowledge, which I can still speak comfortably about, and I can share some of the experiences and the battle scars and mistakes that happen when you embark on some of those sorts of projects.
Yeah, it was very much an Oracle-focused implementation, a complex billing process. And my role there started off as the data architect, the lead data architect, and then I think I became the chief architect towards the end of it.
But it was working on something that, well, I feel was a really good challenge. I met loads of people; for example, four or five people from that project now work for, or are involved in, the company, and we've still kept in contact with them.
But, yeah, you mentioned how I got into it. So my journey into data probably wouldn't be seen as the normal route. I was working in an administration department for a company called Cognos, and just got involved in working on their products because they had some spaces on the training courses, and started using their own data.
So my background, to some degree, and I didn't finish university, or go to university, was business accounts, economics, and maths to some degree. I was not very good at maths; I was okay. Better than my English, at least, anyway.
But I think I had a business lens, and then I could go down to the technical, which has allowed me to bounce between those two different roles over my career: being hands-on as a DBA, as an ETL developer, as a report developer, collecting requirements, because we had to do that back then, understanding what requirements look like, how they feed into projects, and looking at how to focus delivery on them. So I think every aspect, including that payment company, provided me some good foundational skills and a great network.
I think you're no one without your network, and I've got a fantastic network that, you know, helps me with my knowledge; they always give advice, and they also keep me real with some of my ideas, which is always a key thing you need from a team: to call out when something's not possible.
And, yeah, so since that payment company, I think it just kept going. You know, we had that big data era. Maybe that leads on to what I call the evolution piece, really. So, yeah, talking about my career and how that's travelled over the data world, over the past 30 years now. You know, I started young, obviously, to give my age away: I was 19 when I started in the industry, and it's 30 years now, so yeah, we can do the math.
Let me just interject there a little bit.
So you're starting to get into sort of technology, right? And this is really where I want to go with you in this conversation. So you've been doing this now for probably slightly longer than me, because I didn't get into it until I was about 25 or whatever. But we've both been in this industry for quite a while now, and we've seen a lot of trends, we've seen a lot of technologies come and go. And we're both now working in what is now called the modern data stack space, which has got a lot of similarities to what we've done in the past, but it's got things that are different as well, and it's got its challenges and it's got its benefits and so on. You know, the project we worked on back in the day, the one we met on: it used ELT, it had SQL-based transformations, it had a big database and so on. So there are things that haven't changed and things that have, really. So let's start a little bit with your journey. But actually, before we get into what I want to talk about with the modern data stack, you mentioned big data there. So something that happened, probably between that big project and what we're doing now, is that whole world of Hadoop, right? So how did you get involved in that? What was your take on that?
Yeah, so I think I'll just do a bit of background on the data warehouse era, and then go on to the big data one.
So go for it in that order.
So I refer to the data warehouse era as the beginning. You know, I think it was the basic era. So in the early 90s, you know, Oracle for your database, maybe Informatica for your ETL, Cognos or Business Objects for your BI: three vendors all working, doing what they do best. But it was reserved for the big players, you know, and it was expensive.
And later on, that moved into the consolidation era, where you've got Oracle with Exadata, ODI, which we worked with, and OBIEE. So a single vendor, but, you know, three major components: your data storage, your ETL, and the associated things that came with it back then. So it was more like a one-stop orchestration, quality, governance sort of product. And then OBIEE, which is the equivalent of reporting tools like Tableau and Power BI nowadays. But, you know, Oracle was offering them, and that was still only reserved for the big players; it required a lot of planning and budget to go and secure it. And then, as you say, that big data era: what did I make of it? So what it introduced was an era that was
open to the masses. You know, many, many people could get hold of it; it was all commodity hardware. But it was very much technology focused, and I think that created a bigger gap between the business and technology. You know, the engineers weren't really connected to what they were doing; they were using this approach of just sticking it all in a data lake, so there was no ownership, no idea of how it was going to be used. I think the biggest problem in that era is data modeling: data modeling just got forgotten. The view of schema-on-read, and "we'll model it later", meant that there wasn't as much thought going into how you'd combine this data, and not as much thought about how you'd optimise it for its usage, not only for storage, but for access as well.
So I think that was a major downside of big data: it was very hard to tune unless you knew the access path and how the queries were going to come in. So you'd tune it for one scenario, then you'd have to tune it for another scenario. And it was only really good for large volumes of data, and not everyone had the sort of volume of data that really warranted, you know, MapReduce as a way of querying it: large file sizes where you're only wanting to get the odd record out of them. So it didn't quite live up to the dream, or what it promised. And, you know, it got data a bit of a bad name, I think, to some degree, with lots of failed projects.
And I joke around, you know, about the renaming of these things: you can go and get a new budget from the board to go and do another data project, because the last one failed.
So that leads on, and each of these eras has given us something, if we've learned from it and combined the different methodologies, approaches, or frameworks. You know, it led us to the modern data stack. You know, dbt, which we mentioned before: that DataOps approach wouldn't have existed without the influence of DevOps, where you could check in and check out code, you could use your CI/CD, your Jenkins, your deployment framework, because it was very easy to look at the code changes. It wasn't like the tools we referred to before, like ODI, where there's a separate repository where you're putting all the metadata. So you'd have to export it out, and you couldn't look at it visually to say what the differences were and whether to accept them or not; or if you did, you'd have to have specialist knowledge. It was more complex: more complex to live in, or to work with, that DataOps approach. And now, with the templating, the Jinja templating, and the Python wrappers and everything there, that's something good that came out of that big data era that's now being used in the modern data stack.
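To make that point concrete, here's a small sketch (the model filename and SQL are made up for illustration) of why text-based, templated transformations suit a DataOps workflow: the change between two versions of a dbt-style model is just a text diff that any reviewer or CI job can read, which wasn't possible with a GUI tool's binary metadata repository.

```python
import difflib

# Two versions of a dbt-style SQL model, as they would live as plain
# text files in git. (Model name and columns are illustrative.)
old_model = """\
select
    order_id,
    amount
from {{ ref('stg_orders') }}
"""

new_model = """\
select
    order_id,
    amount,
    currency
from {{ ref('stg_orders') }}
"""

# Because the transformation is just text, a reviewer or a CI job can
# see exactly what changed before accepting it.
diff = list(difflib.unified_diff(
    old_model.splitlines(), new_model.splitlines(),
    fromfile="models/orders.sql@main", tofile="models/orders.sql@feature",
    lineterm="",
))
print("\n".join(diff))
```

The same property is what makes automated deployment pipelines practical: a merge tool, Jenkins job, or dbt itself only ever has to deal with ordinary text files.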
I think that the modern data stack itself has evolved. And I think if someone asked you what the modern data stack was five years ago, they would just list Fivetran, dbt, Snowflake, and maybe Tableau or Looker. Actually, if you go back five or six years, that probably would have been the blueprint: Looker being that first SaaS one, Snowflake not being the first, but being the predominant one, and dbt becoming the transformation logic of choice. And most of them would have Airflow; that was probably the standard one. But yeah, things have moved on now, I think.
Okay, okay. So there's a few things in what you just said there that are kind of interesting that I want to dig into, really. So you mentioned big data there, and you talked about its more technical approach and so on. So did you find that the type of person working on these kinds of projects changed a bit when big data came along, and that that in itself led to a lot of this talk around CI/CD and so on? So first of all, you know, has the persona of the practitioner changed over time, do you think?
100%. And I think that term data engineer got born, you know, during that period; before, there were ETL developers. And I speak to Joe Reis, you know, the author of Fundamentals of Data Engineering, and I got to read it and help review it as he was writing it. So I think that role has evolved during that period, and even fragmented to some degree. Because, you know, we used to have an ETL developer that used to build the pipeline, but then with all of that CI/CD you needed more of a DevOps person. So the skill set required changed, and I think this is how I ended up getting more into LinkedIn and doing these podcasts: I was commenting on the skills needed to be a data engineer, and, you know, the list was like 20 or 30 different technologies, and there's no way one person can have all of that. That's a unicorn. So yeah, it did bring a new skill set into it, and I think this is where
there's a team now. Especially a little while ago, you had to have so many different complementary skills to achieve the creation of a data pipeline from source right the way through to a modelled layer. So that was predictable; it was caused by the number of different technologies around it, as well as it becoming more of an engineering, data developer sort of skill set that was needed. Fewer GUIs were involved; you know, the products we were talking about earlier were all very much GUI-led.

Okay. So, I suppose, one of the things that I noticed you getting involved in, on LinkedIn and other forums, is the debate around "is the modern data stack dead?". So maybe just start off by explaining what this, I suppose, meme, or trend, or line of thought going around is, to say that this thing we call the modern data stack is dead. Why is it dead, and why are people saying it's dead, first of all?

Yeah, so I think the
first thing is, you try and ask people to define what the modern data stack is. And I did this; this is what I challenged a lot of people out there with. It was very hard: we got very mixed views of what it was. Some answers would go straight down to technology and just list the technologies, and others would quote the reason for it being dead as it not having data modeling in it, or not having lineage. And for each of these reasons or rationales that I saw for saying the modern data stack was dead, let me say: it's not dead, it's just not been done correctly. You know, we haven't looked at some of the best practice, and the problems we've already solved in the past, and asked why they haven't just been implemented correctly. And I put this down to the bubble, the very large increase in engineers that entered this industry: they entered it in the big data world, and then they moved into the modern data stack. So a lot of the best practice and knowledge that we had prior to that
was reserved to a smaller group. And most of those people may be architects now; I refer to them as recovering architects nowadays, based on "recovering data scientist", which is what Joe refers to himself as. So it's that lack of knowledge, or best advice, not being given to the people using the modern data stack that's caused this issue.
So people started coining the term post-modern data stack. And I said, just because, you know, your first implementation of the modern data stack didn't go well, don't go and give it a new name and call it the post-modern data stack. Maybe call it modern data stack 2.0. I even joked, maybe modern data stack 10.3 patch 2; that was a good Oracle version, you know.
But what about the argument, which some people are making, and which I think has got some value to it, that with, I suppose, the move towards analytics engineering and the modern data stack, you are accumulating a lot of human capital costs? All the sorts of things that, in a way, we went through in the past, where we moved away from, say, scripting, DBA scripts or database scripts, and moved to graphical tools with repositories and so on. Do you think there's a valid argument to say that we're accumulating a lot of human capital costs in this, that people aren't necessarily aware of until it becomes a cost?
Yes, I do. I think there is some truth in that, and I refer to something as the meta-metadata, and I think of it through that automation and working-smarter lens. So think of all of the data as it gets created: when you have a system or something designed, they know the attribution of that data, they know the business usage of it. If, say, for example, going back to that project as well, you had all the business services mapped out, from high level right down to the L5 business services, level five. So you actually know which attribute is linked to each business service, and how that attribute is stored. And then, in the old days, that data would be available, it would be CDC'd somewhere, and then the data team would go and redo it all over again. You know, they'd go and define it all again, and I've never seen a project where all that metadata, once it's been collected or defined at its creation or inception, has actually followed through. So what I do like, and I think is happening, and I think this is a good thing, is that the world of operations and the world of analytics are coming closer together.
Products like Snowflake having Unistore mean you can, maybe not replace OLTP totally, but you can have more apps running there, and the teams working much closer together. I think if we can use knowledge graph information to collect all the metadata, that can then be transferred down to create DDL, but also to create business services, and then follow through into how it's stored in the analytics, to help catalog it and define the ownership. The more we can do with automation, and with metadata-driven approaches and the reuse of metadata, metadata management, the less we'll need that manual effort, or the cost associated with it. I mean, you can go on to the FinOps aspect of it as well: how you can do cost optimisation, or recharging based on business usage, or domain usage, of your data. You know, supporting data products and data mesh approaches, where you know who's using it, there are good contracts in place, you're reducing that cost of ownership, and you're
sharing the cost based on the usage and consumption.

Okay, okay. Interesting. You mentioned cost there, cost attribution and so on. There was, I think on Twitter today, you might've seen it, a post, I think from Hightouch, talking about using their tool, the reverse ETL tool, to help with usage-based billing for SaaS companies. And I think Lauren commented on Twitter about it and said the worst possible thing you can do is use your reverse ETL tool to do your customer billing. So what's your take on, I suppose, activating data, and the use of, say, reverse ETL tools, for that kind of thing?
Oh, I mean, this is another favourite topic of mine. And actually I did a post, while I was with Lauren in New York last year, about ETL in, ETL out. So maybe, before I go into my view on reverse ETL, let's just talk about the use case of going for that billing. You think of things like SOX compliance, and lineage, and accountability, and ownership. So, you know, what goes into your billing system needs to be done by an accountant. It needs to have some level of controls over it. It needs lineage, to understand where that data has come from. And it needs that whole end-to-end pipeline supported, if it's a production system, or with production controls. So unless you can guarantee or provide lineage, or provide that end-to-end assurance and security controls around it, who's going to stop some data engineer going and ingesting another accountancy process and quite easily creating fraud internally, or unintentionally posting the wrong information? Anything that touches your accountancy system needs to have a level of governance controls around it.
And this is where I'll go on to my view of reverse ETL tools. So if we go back to our day, back in the day, if I'd asked for Informatica to bring the data in, and asked for DataStage to take the data out, and asked to go and buy two licenses for it, do you think I would have got the budget? Do you think I'd still be in this job now? Probably not. I mean, they're two separate tools. And I think this is what has happened: the modern data stack evolved, and the likes of Hightouch and Census have come into the market because we've now created business processes on top of our modern data stack, and we've used Fivetran or something to bring the data in, and Fivetran doesn't put the data back out. So I think if you step back, and if you were to design a product now, you'd look at something that did both, and maybe provided more of the governance controls, maybe provided that lineage, maybe provided the security controls. This is a production-class pipeline that needs support, needs monitoring on it; it has different security controls around it, different governance associated with it.
I did a post that was a joke: I had a picture of two Aston Martins, ETL in, ETL out. We need an excuse to buy two products. So there are products that can provide that end to end in one: the Rivery one that we work with, not a plug, but there are others as well. So yeah, for that simplification of the modern data stack, I'd look at what can provide that whole end-to-end capability, and be very, very careful if you're going to be using it to post accounts information.
And Lauren would have had a very different approach of tackling that answer, believe me.
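The controls Chris describes can be sketched in code. This is a toy illustration, not any real billing API: the names (`BillingBatch`, `post_to_billing`, the field names) are hypothetical, but the idea, refusing to push data into an accounting system unless the batch carries lineage and a sign-off, is the point he's making.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BillingBatch:
    """A batch of records destined for an accounting/billing system."""
    records: list
    lineage: list = field(default_factory=list)  # e.g. ["raw.orders", "fct_revenue"]
    approved_by: Optional[str] = None            # an accountant's sign-off

def post_to_billing(batch: BillingBatch) -> str:
    # Governance guard: no lineage or no sign-off means no posting,
    # regardless of which pipeline produced the batch.
    if not batch.lineage:
        raise ValueError("refusing to post: no lineage recorded for this batch")
    if batch.approved_by is None:
        raise ValueError("refusing to post: batch has not been signed off")
    return f"posted {len(batch.records)} records (approved by {batch.approved_by})"

batch = BillingBatch(records=[{"invoice": 1, "amount": 100.0}],
                     lineage=["raw.orders", "fct_revenue"],
                     approved_by="finance-team")
print(post_to_billing(batch))
```

In a real deployment these checks would sit in the pipeline tooling itself, but the shape is the same: the contract and lineage travel with the data, and the load step enforces them.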
Okay.
So, you know, at the risk of sounding like a couple of old farts saying that everything new is awful, and so on. I mean, you've got a consultancy that specialises in the modern data stack, and I think you were Snowflake partner of the year last year, and all that. So what does the data stack look like that your company builds, and what products do you particularly think are good in the market at the moment, in the area we've been talking about?

Yeah, so I'll go back to that whole simplification piece as well. So Fivetran is great as an ingest tool, and it's been the market leader for ages, but you still need an orchestration tool with it as well.
So if I was to break down what I think the modern data stack components are: you need ingest, you need orchestration, you need the ability to do transformation, you need storage and compute, you need some data modeling and data contracts capability.
Observability, I'll touch on that in a minute.
But a data ops framework as well.
And maybe you might need some reverse ETL.
You'll definitely need some visualization.
You'll definitely need some machine learning capability.
So if I was to map that to who I think the players are: Snowflake, I'll start with them. Snowflake for storage and compute, I think, is the de facto one. The only real challenger to it, I'd say, is BigQuery, and then only in certain use cases. So as the de facto, I'd still think they've got the competitive edge, and I think their roadmap with Snowpark is great. Yeah, we've done very well with them, and we like working with them. If I talk about the
next one, which would be, you know, your ingest, your orchestration, and optional reverse ETL, that would be Rivery. Rivery provides that single SaaS ETL product. The underlying architecture, I know personally, is built state-of-the-art on Kubernetes, sort of a modern architecture from a development perspective. And it does that orchestration for you. It does that reverse ETL if you need it. It has a lot of metadata-driven components, which, if I went back to before and had to build that myself, is complicated unless you do it properly. And I know that means getting someone like yourself in as well; you know that we design well, but a lot of the time we get in and the horse has bolted. So having that ingest, orchestration, and reverse ETL, with transformation where you can call out to dbt or run your own scripts from it: that would be what I'd choose there.
there I think another cool player in that market is coalesce so coalesce
we've seen them quite a while I've seen recently it's more of a well it's very
much a dbt replacement or alternative should I. It has that GUI, so your drag and drop approach
to building your data pipelines.
And it provides, go back to that lineage piece anyway,
it provides a lot of that lineage and some of the complexity
with very large-scale DBT projects and maybe some of the in some of the inefficiencies they may they may
introduce because many data engineers work at the same time maybe not with a common data model
um yeah i think having something like that helps encourage that data modeling approach
um i'd say that sql dbm would be my go-to choice for doing data modeling now
um you know having that model be deployed and then you build pipelines that match onto it I'd suggest SQL DBM would be my go-to choice for doing data modeling now.
Having that model be deployed and then you build pipelines that match onto it.
Visualization: I think ThoughtSpot would be my go-to one.

Why is that?

Yeah, let me share my view on that one, and I'll give the other players that I think are in there as well. Looker, you know, was one of the leaders right at the beginning; its acquisition by Google has made it more of a Google focus, and the licensing model has become a little more expensive, but, you know, it's still a good product. Tableau: I think that life cycle of developing Tableau dashboards, where you need to have Tableau developers, meant the time to market, time to value, time to getting the users in touch with the report, took a bit of time. Power BI, you know, is great if you're already a Microsoft shop; it did the job, but you still need those developers. And they seem to encourage you to just download the data to Excel to go and do pivot tables and things like that, which I don't think is the best use of all that data after it's been processed and modeled. So what about ThoughtSpot? It's that self-service capability. If you've got it modelled, with very good star schemas,
and you've built some of the metadata on top of it, which understands the context of the data, you can then prevent the need for an army of people, or loads of dashboards being developed without knowing who they're being used by, and give it to clients, or maybe not-as-tech-savvy people, to actually explore that data and find insight they wouldn't have found easily. And I think what happens with a dashboard is, when you build it, you give it to someone and they say, oh yeah, but what about this now? Then they have to go off and make an alteration to it. If the tool allows that person to ask that second question and that third question without the need to involve tech again, that's why I like it. And I suppose, from a consultancy perspective, you'd think that's a bit of an anti-pattern, because we want to put people in and develop it. So we do help with ThoughtSpot, we help in that initial setup, but it doesn't require that long-term consultancy assistance from us. Whereas if it was maintaining Tableau dashboards and things like that, we'd probably get more repeat work. But yes, that's my reason and rationale for that one.
Okay.
So what's your take on, I suppose, headless BI and metrics layers and so on?
Because that separation, I mean, in some respects, it's nothing new,
having the idea of a semantic model.
But the other bit, I suppose, is who within the organization
would kind of set up and maintain it?
And what does it mean about the kind of workflow?
And what does it mean also about how maybe the industry might get reconfigured
around different ecosystems and so on? So what's your take on metrics layers and semantic models, and, I suppose, the elephant in the room being the dbt Labs one?

Yeah, so, I mean, if you go back, again, that semantic layer is nothing new. If I go back to my days at Cognos, you know, you had the metadata there, in the catalogs, as they referred to them, just as Business Objects had its universe.
So I think the key thing is centralization of business logic.
And things like data vault methodologies,
which I know you're very much in with as well,
having that centralized business logic layer,
whether the rules are consolidated, managed, governed,
reused is essential.
Whether you do that or not, I think it's about defining where the rules go.
And the more reuse, the more I'd like to have them pre-built in the database.
If you have some sort of complex setup, with different teams using different products, maybe some things could be centralized in your ThoughtSpot model, or maybe you're using MicroStrategy or something, so you may have some logic put in that. But the key to getting one source of the truth, and making sure everyone is reporting the same way, is making sure those business rules are in one location. Whether you're modeling Data Vault or a similar approach, as long as it's structured in a way
that allows that business logic
to be put in one place and managed,
that's the end goal.
How you achieve that, I'm less concerned about.
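The "business rules in one location" idea can be sketched in a few lines. This is a minimal, generic illustration, not any particular vendor's metrics layer; the metric names and formulas are invented:

```python
# A toy "semantic layer": business rules defined once, reused by any
# consumer (dashboard, report, notebook). Metric names are hypothetical.

METRICS = {
    # revenue net of refunds, defined once for every consumer
    "net_revenue": lambda row: row["gross_revenue"] - row["refunds"],
    # a derived ratio that reuses the same centralized inputs
    "refund_rate": lambda row: row["refunds"] / row["gross_revenue"],
}

def compute(metric_name, rows):
    """Apply a centrally defined metric to a list of row dicts."""
    rule = METRICS[metric_name]
    return [rule(row) for row in rows]

rows = [
    {"gross_revenue": 1000.0, "refunds": 50.0},
    {"gross_revenue": 400.0, "refunds": 10.0},
]

print(compute("net_revenue", rows))   # [950.0, 390.0]
print(compute("refund_rate", rows))   # [0.05, 0.025]
```

The point of the sketch is the shape, not the code: every consumer calls into the same definitions, so "net revenue" cannot quietly mean two different things in two different tools.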
Moving on a little bit from, I suppose, products and so on,
you've mentioned a few things around data modeling there.
You've mentioned, I think, data contracts earlier on, and you've also just mentioned centralization of logic. Okay, so probably one of the trends, or things that are interesting in the industry at the moment, is around the idea of data contracts and data meshes and so on, right? So what's your take on that? And what's your take on, I suppose, the idea of there being a central kind of warehouse with all the logic in there?
And, I suppose, centralized versus decentralized:
what have you found works in practice over the years?
Okay.
So I'll touch on data contracts first of all. Anyone from a development background will know these; think of your API contract. It's a structure defined with metadata, used to transfer data between one location and another in a standardized format.
So, for example, if we went back to the old world of ISO codes, you know, ISO codes provide structure and they provide consistency
on how we do currency codes or country codes.
And then you can move on to more complex ones,
like payment contracts.
You know, a payment contract was an ISO 8583
that had all the attributes needed
for you to make a credit card payment
or a direct debit payment.
No, sorry, a debit card payment.
So data contracts have been around in the application world for many, many years. I think why they're becoming more prominent in the data world now is that we've got this data product concept, and we've got data mesh as a concept or a methodology, and what that requires is the ability to exchange information in a consistent way that people can build on top of. And that's where it works in the data product world, and with that centralized versus decentralized approach.
So, we talked about business rules as well earlier, and maybe we'll touch on the role
of the analytics engineer as well,
a subset of the data engineer.
So what companies or larger companies should provide
is domain-based structures
that can be subscribed to by multiple data products.
And they would be subscribed to by those multiple data products
by using data contracts.
Those data contracts would be, I think when we did that,
I called them data access layers and data input layers,
which are DALs and DILs.
So it's a way of you exchanging information
between different parts of your data platform
in the same way you would with an API contract.
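As a concrete sketch of that analogy, a data contract can be as little as a published schema that both producer and consumer validate records against. This is a minimal illustration with invented field names; real implementations would typically use something like JSON Schema, Avro, or Protobuf:

```python
# A toy data contract: the producer publishes a schema, and the consumer
# validates each record against it before building on top of the data.
# Field names here are hypothetical.

CUSTOMER_CONTRACT = {
    "customer_id": str,
    "country_code": str,    # e.g. an ISO 3166 code, agreed in the contract
    "lifetime_value": float,
}

def validate(record, contract):
    """Raise if a record breaks the contract; schema changes must be controlled."""
    missing = set(contract) - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field, expected_type in contract.items():
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field}: expected {expected_type.__name__}")
    return record

record = {"customer_id": "C-001", "country_code": "GB", "lifetime_value": 120.5}
validate(record, CUSTOMER_CONTRACT)  # passes; a breaking change would raise
```

The guardrail is the failure mode: a producer who drops or retypes a field breaks the contract loudly at the boundary, rather than silently breaking every downstream data product.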
So why are they so important?
I think without having that defined
and the governance around it,
or the guardrails, as I like to refer to them,
your data products, or your data mesh platform approach, will have fixed dependencies on things that you'd have no control over. Unless these contracts are, as they say, available and supported, and any changes are communicated, or any new attributes are added in a controlled manner,
you're building on top of sand, building on top of things that could cause your project to fail later.
And you have a complex monolith application again,
really.
Okay. So let's put this into sort of practical terms. Imagine your company's being brought in to architect
and build an analytics layer for a reasonable-sized,
I don't know, sort of Series C, Series D kind of company,
and you're talking about a warehouse architecture.
Okay, do you tend to sort of recommend what we might consider
to be a traditional kind of Kimball-style warehouse,
or do you talk about things like data meshes? What's the kind of starting point, really, for a design that you would put your name behind?
Yeah, I mean, it's always a bit of a hybrid; there's no one size fits all. And I think there's a few characteristics that I'd use.
The stability of the source systems, for one.
So for a large transformation or maybe a company
that has gone through many acquisitions,
they may have multiple CRM systems,
maybe another acquisition comes on later,
I'd very much go down the route of a data vault approach
for that one to insulate them from those
and build some solid foundations
that you'll be able
to react quickly later if there were any changes, including planned changes. And it's very hard to work in those environments, doing, you know, legacy testing and new-world testing in the same place, unless you use that approach. So that would be one approach. I'm always very much in favor
of a Kimball insight layer, you know, or information layer, presentation layer. It works very well for things like ThoughtSpot, where we're doing dimensions and facts, and having it modeled in the way that I think we think and operate. You know, we think in measures, we think in dimensions, we think in hierarchies. So for human interaction and analysis, I'd always have that as the end product, the presentation layer. But then,
on the flip side, if it's being used to do machine learning, I might do a feature model, so you'd go that route for that. But the key thing with any of these approaches is to have some structure, and not just have one big data lake with everything dumped in there. So if you don't go full Data Vault, at least have some clear separation of the schemas: what's your customer information, and so on. I always bring in the data based on the source system it came from, so you have that clear separation. And then there's that middle layer that, you know, you'd mix and match depending on the environment. You may go Data Vault; you may go for an Inmon-esque style if that's the simplest structure to build; and then predominantly a Kimball-style dimensional facts presentation layer, sourcing that same data in a different structure but in a consistent way.
And that's where that business rule aspect comes into it.
Because if you have gone to a situation where you've got multiple different ways
that data is being presented,
you want them both to be able to use the same business rules.
So that's why I'd always introduce a common layer,
which may be what you refer to as the semantic
or business rules layer that any of them can
subscribe to. And also, if you make changes, they can be replayed easily, so you can go and replay and re-correct, or adjust, any historical information very easily, which would be complex if you didn't have that.
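That replay point can be sketched as keeping the raw history immutable and expressing business rules as pure functions, so a corrected rule can simply be re-run over all of history. The discount rule and the data below are invented for illustration:

```python
# Toy example: raw history stays immutable; business rules are pure
# functions, so a corrected rule can be replayed over the whole history.
# The discount rule and order data are invented for illustration.

history = [
    {"order_id": 1, "amount": 100.0},
    {"order_id": 2, "amount": 250.0},
]

def rule_v1(order):
    # original rule: flat 10% discount on everything
    return order["amount"] * 0.9

def rule_v2(order):
    # corrected rule: the discount only applies to orders over 200
    return order["amount"] * 0.9 if order["amount"] > 200 else order["amount"]

def replay(rule, orders):
    """Re-derive presentation values from raw history with any rule version."""
    return [round(rule(o), 2) for o in orders]

print(replay(rule_v1, history))  # [90.0, 225.0]
print(replay(rule_v2, history))  # [100.0, 225.0]
```

Because the derived numbers are computed from the raw layer rather than overwritten in place, fixing a rule is a re-run, not a manual correction of historical rows.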
Okay. So just to kind of round things off, really: you're a consultant, I'm a consultant, okay? And one of the things that really surprised me, when I got back into this world after the couple of years I took out, was going to a Looker conference, explaining that we were a Looker partner at the time and that we were consultants, and people not knowing what on earth we were talking about, or why on earth they'd want to hire us in.
And so to you, Chris,
where do you see the value in a consultancy these days, for people implementing these kinds of projects? And, given that most companies we speak to really, ideally, want to hire their own data team, where does the consultant come in, and what unique value do we bring to it, do you think?
So I think there's two, and probably it's about which stage.
So let's start right at the beginning.
And you may have your own team,
but bringing in someone that's done this, as a collective team, in many locations before: the acceleration you can get, and the prevention of going down any rabbit holes, or into areas you may not have foreseen without having seen it done properly in many, many places, and knowing, you know, what's best practice now. So it's getting the right foundations in, the right level of foundations at the right time, and a roadmap to get to where you can have that competitive advantage against anyone in your area.
Building your own team full-time is quite expensive, so I think what we seem to do now is provide the right amount of skill at the right level. So we have our principals, we've got our seniors, we've got our engineers, we've got our juniors. You don't need to have a principal the whole time for a whole project; you have them right at the front, and then after that you use the right level at the right price. And we can be cost-effective, I think, as an alternative route to having a very large, bloated team with multiple skills because you've got a very complex architecture. You know, people like us can come in and show where that simplification can happen,
where you can actually save costs, from an operational cost perspective and from a simplification of delivery. I refer to this as delivery friction.
So we look at all the different components of delivering a data pipeline.
And how can we simplify that?
Sometimes it's a people process and way of working,
or frameworks, or templates, or metadata-driven approaches.
So I'm not saying that everyone in there is out of a job; those people have a job, and their job depends on their work. My mission, or role in a company, is to help people work smarter, not harder: more efficiently, with a quicker time to value. Maybe you don't need the same size teams you've got now.
What do you think about products out there like, say, Portable or Mozart Data, that are kind of, I suppose, the modern data stack in a box, right? Do you have any exposure or experience with those at all? And is that an alternative, or where do you see that fitting in? Is it maybe a different stage in maturity?
What are your thoughts on those products?
So Portable, I know Ethan very well.
I'm going to his meet-up next week.
So Portable, I'd say, he refers to them as the long-tail adapters. So they're not your common connectors, not your large Salesforce ones. Well, maybe they do that too, but they're not the big, big ERPs. So, yeah, I think it boils down to a use case, and I think it does work, that simplification. And it's what you need at that stage, the maturity of a company and the level you're at, what level of complexity you need. We go back to that modern data stack: what's
modern for one is, you know,
maybe not modern to someone else.
If you're like Coca-Cola or Pepsi, you know,
the modern data stack is going to be some serious, you know,
machine learning algorithms that can run against all sorts of things,
optimization on production-line information.
So not everyone needs that level.
If you're a two-man band,
a spreadsheet might be your modern data stack. So it's working out what's right for you, from the skills of the team you've got, the budget you've got, and the complexity of your environment and your requirements. And that goes back to me and you both being technologists that have moved into developing consultancy companies. You know, I think both of us say what's right for the client, and it's less about pushing a vendor-driven solution; it's a business-value-driven solution that's supported by what we think are the best vendors.
Okay, okay. So just to round things up, then, how do people find out
more about your company, and maybe get in touch with you?
Yeah, so you'll find me, I'm quite prominent on LinkedIn, and there's not many Chris Tabbs, so if you look for Chris Tabb, you'll find me. I use the hashtag #MeanDataStreets. I've got a couple; one of the other ones I use is #BringBackDataModeling, but I've had to spell it in two ways, because Americans have got one L and we've got two. But yeah, and our company's www.leit-data.com.
So you can check out our website.
We are on Twitter as well.
And I think we're also on, I'm not too sure, I'd have to check the other outlets as well, but those are the main ones.
Oh, YouTube as well.
But you'll find me on quite a few podcasts yes
So if you google the hashtag #MeanDataStreets, you'll find a lot more content.
Fantastic. Well, Chris, it's been fantastic speaking to you. Thank you very much for sharing your thoughts on the industry and where we are now. Good luck with the company, well, not too much good luck, because obviously there's us in the market as well, but certainly you're a good guy. You know what you're talking about.
There's room for both of us.
Exactly.
But well done though.
And the company's done really well.
So best of luck and thank you very much.
Thank you very much, Mark.
Thanks for having me on.