The Data Stack Show - 10: The Evolution of the BI Market with Huy Nguyen of Holistics
Episode Date: October 14, 2020
In this week's episode of The Data Stack Show, Kostas Pardalis and Eric Dodds are joined by CTO and Co-Founder of Holistics, Huy Nguyen. Holistics takes an approach to business intelligence and data analytics that they call DataOps. They focus on data team productivity and company-wide access to insights.
Important points in the conversation included:
Introduction to Huy and Holistics (3:12)
Approaching BI with more than just visualization (8:59)
How friction between different roles within an organization is addressed by Holistics (15:20)
Holistics as a complementary tool (23:25)
Describing their own data stack (34:47)
History of BI and trends for the future (39:33)
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome back to the Data Stack Show.
Today we are going to talk with Holistics, which is a self-service BI platform.
They do some really interesting things relative to a lot of other options in the market. And I actually had a chance to meet the Holistics team in person over a year ago
in San Francisco at a conference I attended. And their team is incredibly sharp, really enjoyed
meeting them. But from a technical standpoint, I'm interested, Kostas, what do you want to ask
them based on what differentiates them? Because there are so many options in the BI space?
Yeah, actually, it's very interesting that we are discussing with someone from the BI space because
as a market, it has gone through like a lot of changes lately. I mean, it's not been long ago
that we had the acquisition of Looker by Google, for example, the acquisition and going out of the public market of Tableau by Salesforce,
the merger of Sisense with Periscope Data. So it's a very interesting market. There are many
things that are happening. Products are really competing with each other, trying to differentiate.
And we also have to consider that we have a company that is actually based from Asia.
They're based in Singapore and Vietnam.
And they managed to do an amazing job in expanding both in the United States and in Europe.
So it's very interesting, first of all, to see how they managed to differentiate as a product and how they perceive differently the BI problem and the visualization problem
and how they succeed in that, especially with the constraints that they have, right?
I mean, it's really hard to compete in this space, even if you're in an environment like
Silicon Valley.
It's even harder when you have to do that from a completely different culture and time zone,
like being in Asia, for example.
And I think the team there has managed to do an amazing job
in building a technology that really stands out
compared to the rest of the products.
I'm really interested to see because they are approaching the problem
from two sides.
One is from the data analyst, of course,
which is the main consumer of a product like BI.
But at the same time, they're trying to approach
and solve the problems of the data engineer.
And we have seen in this show that the data engineer as a person inside the organization
is becoming more and more important.
And it would be great to see how these problems are addressed from a purely BI tool, which
comes from the visualization space.
So yeah, I'm very interested to see what Huy has to say.
And let's move forward and chat with him.
Hi, Huy.
It's very nice to have you tonight at the Data Stack Show.
Thank you so much for your time.
I'm very excited to hear about yourself and Holistics and your products and the stories
that you have to share around BI and the data industry in general.
So welcome. Can you please quickly introduce yourself and also say a few things about the company?
Okay. Hi, Kostas. Thanks for having me. I'm very excited to be here.
So my name is Huy. I run this company called Holistics. We are sort of like a data platform that helps data teams build and
maintain a central source of truth for analytics logic or business logic, and then expose that
to the business user, to the non-technical users via a very simple interface for them to ask data questions and get answers themselves without bothering
the technical people, the data teams.
Essentially, that's what we do.
That's great.
So, can you give us a little bit more color on the original idea and the background behind
Holistics, like who came up with the idea and the evolution of the idea
because as everything else in this world,
probably it was a bit different at the beginning when we started
compared to how it looks today.
So it would be great to see the evolution of Holistics.
Yeah, okay.
So my background is a software engineer.
I studied computer science.
I have been doing programming all my life.
And then when I studied in Singapore,
and then when I finished university,
I kind of joined this company called Viki, V-I-K-I.
It's actually a US startup, very popular back then,
and still now. They do, you know, Korean drama,
Chinese drama, Taiwanese drama, Japanese drama. So basically, think of it as a Hulu for
an international audience, international movies. So then I joined that company as a data engineer,
my first job out of college, back in 2012, 2013. So then we had a very small data
team back there, right? Only two or three people. And then I was kind of the first hire that they
hire there. And I worked with my boss, the director of analytics. And, you know, in a short
span of, you know, one or two years, we started to build our internal analytics infrastructure
that serves the company internal users and even external users.
So along the way, we can get more about what kind of a data stack back then we were using
in a bit.
But along the way, we built a bunch of internal tools inside a company.
And then one of the tools turns out to be a simple
dashboarding tool, right? You know, you write a SQL query, you slap a chart visualization on top, and
then you write some sort of boilerplate around it to share it with other people, right? So I
started building that thing, and then after a while I realized that, hey, this can be abstracted,
it can be externalized to other people.
So I went to my boss to talk to him, I went to the management team and asked, hey, can I spin
this off into a separate startup? And they all said yes. So the idea kind of started from
there, right? And then I talked to my friends, recruited my current co-founders, and then we kind of worked
on it part-time a bit in the evenings.
Then once we landed our first customer, we went full-time from there.
That was how it started.
That's great.
Where are you based right now, Huy?
The company started in Singapore, but then when it started, I moved back to
Vietnam and then started the product and engineering team here.
So right now the company is split between Singapore, Vietnam and Indonesia.
Oh, that's great.
That's very interesting.
And in terms of your customers, I mean, are you mainly active in the
Asian markets?
Do you have global customers?
Can you share a little bit more about that?
No.
So actually, we started with Southeast Asian customers.
But as of now, the majority of our customers are U.S. and European-based.
That's a common question people ask me, right?
You know, you guys are based in Asia.
You guys only work with Asian customers.
More than 50% of our customers are US and European.
That's amazing.
I have to say it's amazing what you have managed to do.
I tried to do something similar from Greece, from Europe,
and I can understand the difficulties of trying to build something
that is trying to serve the American market, for example,
and do it from outside the American market.
So, yeah, well done for that.
Thank you, thank you.
I mean, if there's anything we learned, it's that we should have come to the US sooner.
That's one of our lessons.
Yeah, that's probably also my lesson, to be honest.
But anyway, that's the topic for a
business podcast. Now we want to discuss a little bit more about technology. So getting back to the
product, because I would like to spend a little bit more time on the product. If I understand
correctly, and because like, I mean, before I ask the question, the perception that most people have
when we're talking about BI is that it's all about the visualization.
We have somehow associated strongly the term of BI tool with visualization.
But from what I understand of Holistics, how do you compare to other tools out there
like Looker, Chartio, Sisense, and the rest of the tools?
And yeah, I mean, at the end, what's special?
What's the secret sauce behind your success?
No, thank you.
I think you brought up a very good point about visualization.
I think for the longest time, when people talk about BI, they think about
visualization. It's a very understandable line of thinking. And I think, you know, the leader in
that space, Tableau, has done a tremendous job educating the people. And it's, think about it,
right, if you are a business user and people talk about data, you know, the first thing that comes
to mind is the visualization, right? And then the next thing that comes to their mind is,
what is the software that does a good job at visualizing the data,
which is Tableau, the leader in that space.
But that's because the business user, the non-technical user,
is not the person who prepared, who went through the entire process
of collecting, you know, cleaning,
validating the data, transforming, preparing the data for the final visualization, right?
The data preparation period, right?
That's the data team's job, right?
So essentially, if you look at the statistic, a lot of people will say that they spend more than 80 or 90% of their time just trying to get the data, preparing and cleaning the data to the right format before visualizing the data.
So from a data team perspective, a majority of the time is not spent on visualizing data, but on preparing it.
So that's kind of where Holistics comes in, as a first cut.
Contrary to, say, Tableau, where in order to visualize the data in Tableau effectively,
you need to get the data into the right format.
That's where Holistics comes in. So that's the first thing. So the second thing
is, if you look at the BI space in general, I mean, I'm going to give you a more long-winded answer
because, you know, we're basically talking to the data analyst audience here, right?
So they want the nuances. If you look into the BI space in general, you see that there are two groups of,
generally there are two groups of BI tools.
On the left is what I call the pre-cloud tools.
These are the BI tools that were built pre-cloud,
built for the desktop era, the server era.
The assumption is that the data warehouse is expensive. So then what they
do is they build a data store for themselves, and then they load all your
data into their proprietary data store, and then they expose kind of a very simple drag and drop interface for, you know, for the users to build the report.
Right.
And then on the other hand, you have a group of tools that are built around SQL.
Right.
You know, when more and more people realize that, hey, you know, I should let the data warehouse do the storing and the processing of the job instead of letting the BI tool do it, a lot of people started building tools that rely on SQL.
So you write a SQL query, you send it to the database, it executes the query and returns the result, and you visualize it. Clear, right? So the difference is that on one hand, the first group of tools is very easy for
the business user to use because they can use the drag and drop interface to build a
report.
But it's not actually very friendly for data people.
The data people prefer SQL.
Right.
And then the second group of BI tools is SQL-native,
so data analysts and data teams really like it
because they prefer working with SQL, right?
But on the flip side,
they are not friendly for the non-technical users
because now the non-technical users have to learn SQL, right?
So Holistics kind of sits in the middle right there, right?
So on the one hand, we are SQL-based.
We are based on SQL, so that is a very friendly experience for the data team.
But on the other hand, we don't require the non-technical users
to learn SQL in order to build their own reports.
So, that's kind of how we are differentiated, right?
Does that kind of answer your question?
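[Editor's sketch] The SQL-native flow described above (write a query, send it to the database, get rows back, visualize them) can be illustrated in a few lines. This is only a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the data warehouse, the `orders` table and its values are invented, and `run_report` and `text_bar_chart` are hypothetical helper names, not anything from Holistics or any other BI tool.

```python
import sqlite3

# Hypothetical sample data standing in for a data warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("APAC", 120.0), ("EU", 80.0), ("US", 200.0), ("EU", 40.0)],
)


def run_report(sql):
    """Send the analyst's SQL to the database and return the result rows."""
    return conn.execute(sql).fetchall()


def text_bar_chart(rows, width=20):
    """Render (label, value) rows as a crude text bar chart."""
    top = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<5} {bar} {value:.0f}")
    return "\n".join(lines)


rows = run_report(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
)
print(text_bar_chart(rows))
```

The database does the storing and aggregating, and the tool only renders the rows it gets back, which is the division of labor the second group of tools relies on.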
Yeah, absolutely. It's great.
I mean, I can understand that,
especially compared to products like Tableau or Power BI, for example,
or even products like, if people remember, Periscope Data
before it became part of Sisense, and compare it with Sisense.
Because Sisense was exactly as you are saying:
you had on one hand something like Sisense
that was very strong at visualizing stuff.
It didn't have the kind of SQL experience that data teams needed,
and that's why the two products also merged together,
trying to deliver at the end this kind of experience.
Something similar also happens in a way with Looker, I would say.
The difference there is that Looker implemented their own language at the end.
So you have LookML, which then becomes something similar to what dbt is.
So yeah, I totally get the vision and I think it makes total sense.
Yeah, yeah.
As always, the devil is in the details.
So it's all about
how you implement the product.
From what I understand, based
on what you've shared with us so far about
the product, Holistics is not
just a product
for the consumer of the data, the person
who is going to create the reports, right?
It's also about the people who care about bringing the data together,
shaping the data, transforming the data, etc.
So it's also about,
outside of the analyst or the business analyst,
it's also about the data engineer,
from what I understand.
If not, please correct me,
but I assume that it's also a tool
that has a special role for the data engineer.
In this case, can you, if this is true, of course, share with us what
problems these people have and how they are addressed
by your platform, how Holistics addresses these problems and solves them?
So you talk about the data engineer, right?
So essentially, we don't really focus on the data engineer that much.
The kind of person, the role we want to focus on is a data analyst.
So basically, we wanted to make the data analyst life better, right?
And the data engineer, soon you'll see that the data engineer is part of the picture, right?
So if you think about the data analyst role, you know, he or she will be the perfect person with the right incentive and with the right kind of a skill set to be the main person driving the data team, right? Because, you know, a data analyst person,
she's technical enough to understand how the data structure looks like,
understand the nuances and the technicality of the data.
She also has enough of a business mind
to be able to work with the business users,
to kind of understand their perspective,
how they think about the business model,
how the business is running, and offer
the data perspective to help them make decisions better, right?
But if you kind of look at the pain point, the problems that the data analyst is usually
facing in the data organization, then you see that she has a lot of bottlenecks, she
has a lot of things she needs to overcome.
For example, right, a very common example is
if your organization has nothing in store,
have no data set up, the data team, sorry,
the data consumer, the non-technical user,
will keep coming to your data analyst for ad hoc reports.
You know, I want to check, hey, how is my sales in this quarter
compared to last quarter?
I want to check if there's any customer abnormality
happening in my department, stuff like that.
And then the analysts have to always spend time
manually compiling those reports to prepare for the business user.
So that is a bottleneck.
I call it a bottleneck, right?
Because the analyst wastes time doing that manual reporting, and then the consumer actually
wastes time waiting for it to happen, right?
The second bottleneck I see is between the data analyst and the data engineer, right?
Because the data analyst usually comes from
not very engineering background,
maybe from economics background, finance background.
She knows a little bit of SQL,
a little bit of business knowledge, right?
But usually she doesn't know how to write code
like Python or Ruby, programming code.
So then whenever she needs some sort of data, she actually has to go to the data engineer
to ask, hey, can you pull in this data for me?
Or can you prepare this data for me so that I can do this report or run data analysis
for the CEO?
So that is the second bottleneck.
The analysts actually have to wait for the engineer
to prep the data for her.
And then the third one you will see is that, as the organization grows,
you have more data analysts, and then you start to get into
analytics chaos, right? You know, the data analysts have no mechanism to collaborate between each
other, right? You know, some analysis you've done or some kind of aggregation work you do
are not being shared or communicated to other data analysts. So then different analysts kind of use different formula
to run the report, to get the numbers.
So it started getting into analytics chaos.
So that is another friction.
So essentially there are three bottlenecks that happen with data analysts.
Between the data analyst and the business user,
the data analyst and the engineer and the data analyst with another data analyst.
So our hope, our vision is to build a platform
that can empower analysts
to remove these bottlenecks altogether.
Does that make sense?
Yeah, absolutely.
So can you share a little bit more information
of how this can be done today with Holistics?
I mean, for these three types of frictions
that exist inside the organization, right?
Do you solve all of them, first of all?
Do you focus on one of them right now?
And yeah, tell us a little bit more about the current state of the product
in solving these problems and your future plans about this.
Yeah, so we kind of solve all three problems in kind of one shot.
So the three values that Holistics provides are that, first of all,
it allows the non-technical users to self-service and get the data directly without going through the data analyst.
So that's the first bottleneck between the data analyst and the business user.
Second of all, a lot of the work that data analysts require the data engineer to do is pipelining work.
So we actually give data analysts kind of data engineering powers to do so, right?
They can load data from different sources into the platform.
They can do simple transformations and procedures.
They can build reports to expose to the business users.
So those are kind of what we call data engineering powers
that previously data analysts didn't have.
And then the third thing,
which is the friction between the data analysts and data analysts,
we actually, as I mentioned earlier in our kind of intro pitch,
we actually help data analysts build a central source of truth
for the analytics logic,
for the business logic,
so that whatever work you do
is being, you know,
checked in,
is being version controlled in a way,
and being communicated
to other people,
so that the team doesn't repeat themselves.
So the way we do this is
we build a logical semantic layer
that sits between the business logic and the data logic, the underlying data warehouse logic,
so that the data team, you know,
they define all the business logic,
all the transformations, all the pipelining
in that semantic layer.
And then that semantic layer becomes the source of truth
for the whole organization to come and get the answers.
Does that make sense to you?
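[Editor's sketch] One way to picture a semantic layer acting as a single source of truth is a small registry where each business term is defined once and every report is generated from it. The sketch below is an illustration only and is not how Holistics implements its modeling layer: the `MEASURES` and `DIMENSIONS` dictionaries, the `build_query` function, and the `orders` table are all hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical semantic layer: each business term is defined exactly once.
MEASURES = {
    "revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "region": "region",
    "order_month": "substr(order_date, 1, 7)",
}


def build_query(measure, dimension, table="orders"):
    """Translate business names into SQL; unknown names raise KeyError."""
    dim_sql = DIMENSIONS[dimension]
    measure_sql = MEASURES[measure]
    return (
        f"SELECT {dim_sql} AS {dimension}, {measure_sql} AS {measure} "
        f"FROM {table} GROUP BY {dim_sql} ORDER BY {dim_sql}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2020-09-03", "EU", 50.0),
        ("2020-09-17", "US", 70.0),
        ("2020-10-01", "EU", 30.0),
    ],
)

# A business user only picks names; the SQL comes from the shared definitions.
result = conn.execute(build_query("revenue", "order_month")).fetchall()
print(result)
```

Because every report goes through the same definitions, two analysts asking for "revenue" cannot end up with two different formulas, and the definitions file itself can be checked into version control.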
Yeah, absolutely.
That's very, very interesting.
So I guess through all the interactions
that you had so far with your customers,
you were also exposed to the data stacks
that your customers have, right?
Because I assume that Holistics
is not the only product
that they are using for their data needs.
So can you share a little bit more about what you have seen out
there? What you have seen in terms of technologies that companies are using, how they try to
architect their data stacks, and of course, at the end, how Holistics fits into these
architectures that we see out there?
Yeah, I think over the years, we have started seeing people shifting.
So your question is specifically around what data stacks that we see our
customer using, right?
What are they thinking about their data stack?
Yeah.
Yeah.
Okay.
Okay.
And what technologies you usually see working together with Holistics?
Okay.
Let me think.
So the way we wanted to position Holistics
is we are not a replacement.
We are a complementary tool, like an augmentative tool
with the existing data stacks.
So when it comes in, they actually don't,
if they have something that's working,
they don't need to replace it completely.
But over the years, I've seen that, you know,
when people were small... I mean, people here are familiar
with data warehouses, right? But first of all, what I see is that when companies
start out, they actually don't need a data warehouse.
Yep.
The first thing they do is they just take some sort of BI tool,
a SQL BI tool, like Holistics,
plug it directly into their production database,
and off they go. They can start building reports.
These are not very frequently accessed reports,
so all the best practice advice that we give people about,
hey, it's going to increase load to your database,
it's going to affect your production applications,
it doesn't apply.
They just need something that works,
that fits their needs, very simple, their analytics needs.
Except for when they have something like a MongoDB database,
for example, where it's very difficult to do analytics on top
of MongoDB. That's where we kind of recommend to them, hey, you know, you should spin up
a data warehouse instance like BigQuery, Redshift, Snowflake, or even a simple Postgres database,
and then pull the data over from MongoDB to the data warehouse and slap a BI tool on top. Right.
So that's one thing.
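[Editor's sketch] Moving document data into a relational warehouse mostly means flattening nested fields into columns so that plain SQL works on them. The following is a minimal sketch under assumptions: the documents are hard-coded Python dictionaries rather than real MongoDB results, SQLite stands in for the warehouse, and the `flatten` helper and `users` table are hypothetical.

```python
import sqlite3

# Hypothetical documents, shaped like what a document store might return.
documents = [
    {"_id": "u1", "name": "An", "plan": {"tier": "pro", "seats": 5}},
    {"_id": "u2", "name": "Binh", "plan": {"tier": "free"}},
]


def flatten(doc, prefix=""):
    """Turn nested keys into flat column names such as plan_tier."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key.lstrip('_')}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


COLUMNS = ["id", "name", "plan_tier", "plan_seats"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id TEXT, name TEXT, plan_tier TEXT, plan_seats INTEGER)"
)
for doc in documents:
    row = flatten(doc)
    # Fields missing from a document become NULL in the table.
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [row.get(col) for col in COLUMNS],
    )

# Once the data is tabular, an ordinary GROUP BY is enough for reporting.
by_tier = conn.execute(
    "SELECT plan_tier, COUNT(*) FROM users GROUP BY plan_tier ORDER BY plan_tier"
).fetchall()
print(by_tier)
```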
The other thing that I see is
kind of a shift.
So recently we see a lot more customers using Snowflake,
you know, the hot new data warehouse.
Yeah. As for the older customers, they are still
on Redshift or BigQuery. But I do see, you know, some trends where people are kind of moving away from
things like Redshift, when they run into some performance problem, over to, say, BigQuery or Snowflake, right?
You know, unless the infrastructure requires them to stay on, say, AWS, right?
The third thing I started
to see is that they're moving away from tools like Google Analytics over to tools like Snowplow and, of course, RudderStack,
because Google Analytics doesn't give you the kind of granularity
in the events data that they need, as compared to collecting the events data themselves.
Right.
And then, you know, we see a lot of companies moving
from Mixpanel to a custom-built in-house solution,
usually an open source solution.
And of course, there's Snowplow and RudderStack that come to mind.
So that's the third thing we see.
I mean, that's what I can remember right now,
if I have more, I can share more.
Yeah, yeah, it makes sense.
I think the, let's say, democratization of data warehouses,
because actually data warehouses,
in the past decade or so, have become much more accessible to almost
everyone. Actually, I think today, even for very small companies, accessing something like
BigQuery, or even Snowflake, is very cheap, right? I mean, all these systems, they charge
based on either the volume that you have there
or the processing that you are doing.
So if your data set is pretty small,
you're not going to be charged a lot of money anyway.
So it's becoming a bit of a no-brainer
for companies to, even at an early stage,
as you said, to use some of these technologies.
Redshift is a bit of a different story
mainly because it requires a lot
more management, although they are working to change that. And I think that's one of the
reasons that we see that fully managed services like Snowflake and BigQuery are winning big.
So to comment more on that, I think, interestingly, if you go back to, say, I remember, 2012 and '13, that was when
Redshift first came out, right? I remember because I was the data engineer back then, working for Viki,
this company. So our data warehouse back then was Postgres.
And when Redshift came out in beta,
we kind of immediately jumped into that
to try it out. And it
was wonderful, right? It worked
great. Basically,
because it's compatible with Postgres,
there was very little that
we needed to do to
migrate the data or to migrate
our reporting system over,
because it's compatible with that.
All right.
So Redshift was the first cloud data warehouse that was popularized, and it basically dropped the
price of data warehouses dramatically.
But I think the downside of that is, I'm sure you know about the history of Redshift,
right? They were based on ParAccel, which is based on Postgres. And then Amazon kind of
struck a deal with them to kind of bring the ParAccel version onto their cloud. I mean,
they did an amazing job of kind of making it more cloudized and making it more accessible
to people.
But essentially,
all this infrastructure, I don't think they are built natively for the
cloud era.
If you look at BigQuery and
Snowflake, one of the main
advantages is the
splitting of compute
and storage
out. So then the compute
and storage don't sit in the same
kind of physical servers, so to speak, right? And that's why, you know, even though
Redshift was the first to come out on the market, it became like an educating factor: it educated
people to start using data warehouses. And then they faced the problem with performance,
usually, and then the cost,
because they have to constrain themselves
to a physical unit of compute,
like a server unit, because storage and compute sit together.
And then that's where kind of BigQuery and Snowflake
kind of come in and take off from there.
So I think that's very interesting
kind of thing to observe over there.
Yeah, and I found very interesting what you
were saying about applications like
Mixpanel and these very specialized
let's say web applications
around analytics because
my feeling is that as
we start having like very
powerful and pretty cheap
to use data
infrastructure like the modern data
warehouse on the cloud and very sophisticated BI and visualization tools
like Holistics, having your own infrastructure to actually do the
product analytics that you could do with this kind of products becomes much much
easier. So instead of like reintroducing another data silo, another product, another like a cost
center inside your company, like you can reuse your data warehouse and actually build like
at least some of these functionalities that you find on these products on your data warehouse
using something like Holistics and something like Simpliq.
Exactly. So, coming back to my last job, back then what we were doing is that for
our events data, we were actually storing them in Hadoop.
So we built this collector, right?
You know, a custom-built collector on top of a tool called Fluentd.
And then, you know, we built a web endpoint, we pushed event data
there, and then we used Fluentd to push it to our S3, and then we slapped a Hadoop cluster
on top, and then we ran some sort of aggregations, and then the aggregated results got pushed
back to our Redshift data warehouse.
Does that make sense to you? Yeah, yeah, yeah. Absolutely.
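[Editor's sketch] The pipeline described here keeps raw events in cheap storage, aggregates them in a batch job, and loads only the small aggregated result into the warehouse. The sketch below imitates just the aggregation step in a map and reduce style. It is an illustration under assumptions, not the actual Viki job: the event fields, the sample lines, and the `map_phase` and `reduce_phase` names are all invented, and plain Python stands in for Hadoop.

```python
import json
from collections import Counter

# Hypothetical raw event log, one JSON object per line, as a collector
# might write it to object storage.
raw_lines = [
    '{"ts": "2013-05-01T10:00:00", "event": "video_play", "user": "a"}',
    '{"ts": "2013-05-01T10:05:00", "event": "video_play", "user": "b"}',
    '{"ts": "2013-05-01T11:00:00", "event": "signup", "user": "c"}',
    '{"ts": "2013-05-02T09:00:00", "event": "video_play", "user": "a"}',
]


def map_phase(line):
    """Emit a ((day, event), 1) pair for one raw event line."""
    record = json.loads(line)
    return (record["ts"][:10], record["event"]), 1


def reduce_phase(pairs):
    """Sum the counts per key, as a reducer would."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


daily_counts = reduce_phase(map_phase(line) for line in raw_lines)

# These small aggregated rows are what would be loaded into the warehouse.
aggregated_rows = sorted(
    (day, event, n) for (day, event), n in daily_counts.items()
)
print(aggregated_rows)
```

Four raw events collapse into three summary rows here; at real volumes the reduction is what keeps the warehouse small while the raw data stays in object storage.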
Yeah. The reason why we did that was, I think, the cost of the data
warehouse. It's not like BigQuery or Snowflake, where they separate compute and storage.
In Redshift, the more events data you push, the more raw events data you push to it,
the more storage is consumed.
And then when you want to upgrade,
you have to actually upgrade the entire cluster.
So essentially, we maintained a dual system,
a dual data warehouse, so to speak.
One runs on top of the Hadoop ecosystem,
and then the other one runs on top of traditional MPP databases.
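[Editor's sketch] The sizing problem with coupled storage and compute can be shown with simple arithmetic: when every node bundles a fixed amount of both, the cluster has to grow to satisfy whichever need is larger. All numbers below are invented for illustration and are not real Redshift node specifications; `coupled_nodes` is a hypothetical helper.

```python
import math


def coupled_nodes(storage_tb, compute_units, tb_per_node, units_per_node):
    """Nodes needed when each node bundles fixed storage and compute."""
    for_storage = math.ceil(storage_tb / tb_per_node)
    for_compute = math.ceil(compute_units / units_per_node)
    # The cluster must satisfy whichever requirement is larger.
    return max(for_storage, for_compute)


# Hypothetical workload: lots of raw event data, modest query load.
storage_tb = 40
compute_units = 4
nodes = coupled_nodes(storage_tb, compute_units, tb_per_node=2, units_per_node=2)

# Compute capacity the cluster carries only because storage forced it to grow.
idle_compute = nodes * 2 - compute_units
print(nodes, idle_compute)
```

In this made-up case storage alone demands 20 nodes, so 36 of the 40 compute units are paid for but unused, which is the kind of pressure that pushes raw events out to a separate, cheaper store.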
Yeah, this is a setup that even now you can see in some companies,
like any company that has to operate on AWS,
and they have huge amounts of data to work with,
especially if they are event-related data,
you can see that they're probably going to implement something like a data lake where the data will be stored on S3.
Then just the subset of this data is going to be loaded in Redshift
or use something like Spectrum and Athena
to prepare and load the data or even query the data directly.
And yeah, I think that this is also a big byproduct
of the architecture and
the amount of data that some companies out there have to deal with. So yeah, what you did with
Hadoop, I mean, I think it's still happening. It's just like the technologies have matured and
things are a little bit easier than spinning up like...
It's convergence, basically. I mean,
if you think about it, then there are two tracks, right?
On the one track, there are things like MPP databases
that came out of C-Store, the columnar storage mechanism.
And then on the other track, you have Hadoop, right?
The idea of separating compute and storage, and MapReduce.
And then what you are saying is that,
and what I'm seeing is that these two tracks will somehow converge to the same idea.
A lot of the concepts from the MPP columnar storage databases have already been applied over to the Hadoop ecosystem and vice versa.
Yeah, I totally agree. That's also what I see.
And it's very interesting to see how this market is going to develop in the future
because I think we are still at the beginning
with what is happening with the technology around data.
So I'm very excited to see what the next couple of years will bring us.
So, okay, we talked about the data stacks that you have seen out there in the wild.
Quickly, because at the end you are also a company, right?
And you also have to work with data internally, do reporting and all that stuff. So very quickly, can you share with us a
little bit about your infrastructure, what kind of tools you use? I assume you're using Holistics
in-house, but if you don't, you can tell us that. So yeah, share with us what you are doing,
what kind of best practices you are following, and what kind of stack you have.
Okay, okay.
I mean, our data, I mean, we're a B2B company, right?
Our data stack is pretty boring, so to say.
You know, we don't have a lot of, you know,
huge volume of data to process.
I mean, it's just kind of standard, right?
We have, you know, our production database,
which is a Postgres database.
And then for our data warehouse, we are using BigQuery right now.
We use Holistics, for sure.
We load our data from Postgres over to BigQuery using Holistics, you know, and then we use
Holistics to do the modeling, you know, to do all the business logic to data logic mapping,
to do all the transformations within the data warehouse, BigQuery.
And then we also use Holistics to expose a kind of self-service interface for the business
users of the predefined dashboards.
We also use Holistics to set up the data push. So, you know, we don't log into Holistics every day, right?
We push data into our Slack channel. So we set up this report and we push the data over to our Slack.
So then every morning we log in to Slack, we open it up, and then we can see a very nice visualization
that sits there to say how many, say, users we got in the last day or the last week, stuff like that.
So that's on the transactional side.
On the kind of event, the analytics side,
we set up Snowplow, you know, that was like a year back.
And then similar thing, Snowplow pushes data to BigQuery.
And then in the BigQuery, we, you know,
we also use Holistics to model a lot of that events data,
page views data,
and then push to the visualization front.
So that's pretty standard.
That's pretty boring.
I mean, essentially, you can reduce it to three things, right?
Snowplow, BigQuery, and then Holistics.
Well, to be honest, I think that it's something that I have encountered a lot
in this podcast. The most successful
data stacks that we have seen so far usually employ some kind of boring technology
and boring design principles, but in the end it makes sense. I mean, you can't just use every
state-of-the-art thing out there because you will pretty much end up duct-taping your infrastructure.
So it's quite important to also use proven technology out there.
And it's very interesting.
We had some discussions,
and this is something that we also do internally at RudderStack.
For example, for our product,
we created some kind of queuing mechanism on top of Postgres, right?
I was talking, and this is an upcoming episode
with the guys at Slapdash, for example.
For their own product, they also needed to build
some kind of, not some kind, actually,
a graph database.
And they decided to do that again on top of Postgres
instead of using one of the state-of-the-art products
around graph databases that you can find out there.
Yeah, that's very cool.
Yeah, I mean, if you think about it,
Postgres is a piece of software that has been developed
for almost, I don't know, 20, 30 years now.
So there has been so much human energy in it.
I mean, it's so mature. And when you're building a product,
in the end you need to make sure that you deliver the best possible experience to your customer,
right? Your customer doesn't care what you're using in the background. Yeah, yeah. So,
yep, yep. I love them, sometimes it's good. Yep, yep. I mean, I have so many good things to say about Postgres.
I mean, it's a very good general-purpose database.
You can use it for a lot of use cases, especially analytics, right? I mean, the SQL
syntax, the functionality around SQL is insane, way better than MySQL.
And I actually wrote a blog post about this, like six years ago, on why you should use
Postgres over MySQL when it comes to analytics.
Yeah.
Yeah, makes total sense.
I can understand that.
So, all right, moving a little bit forward, actually going a little bit back, let's expand
a little bit more around the
BI market. I mean, you're an expert in the BI market. Many things have happened
in the past two years: many acquisitions, products have merged, Looker was acquired by
Google for a huge amount of money. So what do you see that is happening right now in the BI space?
And more
specifically in the visualization space, what are the trends that you see there? And what do you
think is the next big thing when it comes to BI? So, I mean, if we step back a little bit
and look at the, I mean, BI has been around for 60 years, right? I mean, it's been around for a very long time.
But if we really look into the history
and how the BI market evolved,
it's very interesting to look at.
And we wrote about this in our guidebook
on our Holistics website.
If you look at the, we call it the three stages
or three waves of BI.
So at the beginning, this is maybe 40, 50 years ago,
BI was a very centralized system. You had things like Cognos, IBM Cognos. It worked for, I mean, only the big
corporations could afford BI; small companies couldn't afford it. So you invest
millions of dollars to build this BI system. It's centralized, it's managed by IT. Basically,
because the computing resources were so expensive, they would only be able to serve the top-level management in that order.
Basically, there is some sort of a huge system, the data gets loaded into that system,
they run overnight, and then in the morning they churn out some sort of report,
the very standardized report for the business user to look at.
If you have random questions, like ad hoc questions,
you can't, right?
Basically your request goes into a queue in the IT desk.
And then, you know, the IT person will prioritize
the CEO, the C-level executive request over your request.
So usually you wait for maybe one or two months
to get the data, to get the report you need.
And every report will have to go through IT.
So in a way, I call that the centralized era.
And then, you know, the centralized era had all these problems:
basically only the top executives have access to reports.
You know, the mid-level, the low-level operational people don't have access to them.
So then there comes the second era, what I call the decentralized era. The decentralized era
happened when tools like Tableau or even Excel came about. Basically, instead of, you know, submitting the request over to the IT team,
you log into some sort of a system, the CRM system, the production system.
You download the CSV file, the Excel file, export the CSV file, right?
And then you load that CSV file into a desktop program you installed on your computer,
Tableau desktop, for example.
It was awesome.
You know, you load, you dump that CSV into Tableau
and then you started to, you know,
really explore the data, right?
It's completely drag and drop.
You know, it requires no SQL knowledge whatsoever.
Non-technical user could learn to use Tableau.
And, you know, assuming they got the right data extract,
they could come up with fancy graph
for the rest of the company to consume, right?
So that was kind of the second era,
the decentralized era,
in which tools like Tableau are a solution
to the pain points faced in the first era.
Are you following? Are you with me so far? Yeah, of course.
Yeah. So then there comes the problem with the second era, what we call the metric night fight.
So the problem with the second era is because it's so decentralized, and people started using this data through a workaround route,
without going through central IT, it's very easy for the data to come out wrong.
Now the non-technical users are the ones that do the exploring, the building of the reports.
And basically, a scenario will happen where you have someone from the sales department
say that the revenue is X, and then there's someone from the marketing department saying
the revenue is Y.
And then what happened is that each of them maybe extracted the same CSV, but used a different formula
to calculate revenue.
Or each of them may have extracted a different CSV, because one CSV has stale data and one does
not.
And then this will become a disaster, because imagine that you use the wrong data to report
to your board of directors.
Things like that are going to happen.
And then it becomes a total mess.
Does that make sense so far for you?
Yeah, absolutely.
Yeah.
So you see, in the first era, people have little access to data,
but at least because
it went through central IT, they are experienced people, they double-check the data, and the
data is not all over the place.
So then the data is accurate, but in exchange you give up accessibility of the data.
In the second era, where the data is being decentralized, anyone can extract the data from a system and build their own reporting.
Basically, you get an abundance of access to the data.
But in exchange, you don't have the accuracy of the data, which is very important.
Because if you don't trust the data, you stop using it altogether to make a decision.
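To make the second-era problem concrete, here is a minimal sketch of two departments computing "revenue" from the same extract with different formulas. The CSV contents, the column names, and both formulas are invented for illustration; they are assumptions, not anything described in the episode.

```python
import csv
import io

# Hypothetical CRM export; column names and values are made up for illustration.
EXPORT = """order_id,amount,discount,status
1,100.0,10.0,paid
2,50.0,0.0,refunded
3,200.0,20.0,paid
"""


def load_rows(text):
    """Parse the CSV extract into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def sales_revenue(rows):
    """Hypothetical sales formula: gross amount of every order."""
    return sum(float(r["amount"]) for r in rows)


def marketing_revenue(rows):
    """Hypothetical marketing formula: paid orders only, net of discounts."""
    return sum(
        float(r["amount"]) - float(r["discount"])
        for r in rows
        if r["status"] == "paid"
    )


rows = load_rows(EXPORT)
print("sales says:", sales_revenue(rows))          # 350.0
print("marketing says:", marketing_revenue(rows))  # 270.0
```

Both numbers are defensible on their own, which is exactly why nobody notices the disagreement until the two reports meet in the same meeting.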
Then there comes the third era.
Basically, we say,
there is this friction between the business user and the IT team.
The business user wants access to the data,
but at the same time, the data slash IT team wants control
over the accuracy, the consistency of the data.
So that's where tools like Holistics or Looker come about.
Instead of letting the business user download the data and build their own report in tools
like Tableau directly, or locking them out altogether and asking them to log into a central system to view predefined
reports, with no way for them to ask ad hoc questions, tools like Looker and Holistics expose
a semantic layer of data, a modeling layer, right? And then instead of building the report for every
single request from the business user, the data team, the IT team only need to work on maintaining the data modeling layer to make sure that all the business logic is properly recorded.
All the metrics are defined clearly in the modeling layer.
So then this will be exposed, as I mentioned earlier, exposed as a BI interface for the
business user.
So then they can still get the decentralized experience of the second era, but this time
they don't have to rebuild every report from scratch, maybe using the wrong
formula, and they don't have to download a CSV from somewhere
in the system to load into the BI tool anymore, right?
They use the data that the data team,
the IT team provides to them.
They use the definitions of the metrics
that the IT team, the data team prepares for them.
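A modeling layer of this kind can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how Holistics or Looker implement it: the `METRICS` and `DIMENSIONS` registries, the `orders` table, and the `build_query` and `explore` helpers are all hypothetical names, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical modeling layer maintained by the data team: each metric and
# dimension is defined once, as a SQL expression over the "orders" table.
METRICS = {
    "revenue": "SUM(amount - discount)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "country": "country",
    "status": "status",
}


def build_query(metric, dimension):
    """Compile a (metric, dimension) request into SQL; reject unknown names."""
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")
    if dimension not in DIMENSIONS:
        raise KeyError(f"unknown dimension: {dimension}")
    dim = DIMENSIONS[dimension]
    return (
        f"SELECT {dim} AS {dimension}, {METRICS[metric]} AS {metric} "
        f"FROM orders GROUP BY {dim} ORDER BY {dim}"
    )


def explore(conn, metric, dimension):
    """Run the compiled query; this is all a self-service request needs."""
    return conn.execute(build_query(metric, dimension)).fetchall()


# Tiny in-memory stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (amount REAL, discount REAL, country TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (100.0, 10.0, "US", "paid"),
        (50.0, 0.0, "VN", "paid"),
        (200.0, 20.0, "US", "paid"),
    ],
)

print(explore(conn, "revenue", "country"))  # [('US', 270.0), ('VN', 50.0)]
```

The business user only picks names from the registries, so everyone who asks for "revenue" gets the same formula, and a request for an undefined metric fails loudly instead of producing a private variant.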
All they need to do is just explore on that restricted, although flexible,
interface to get that data. So that's kind of the third era, right? That's
already happening now, right? It's not clearly obvious yet, but I think that's going to happen
sooner or later. Does that make sense? Yeah, absolutely. I think you managed to do an amazing description
of what has happened in the BI market from its creation up to today. And I think we still have
like very exciting things to see that will happen in the future. And yeah, I'm looking forward to it.
And I'm pretty sure that Holistics is going to be a company that will make some of these
new things happen. So having said that, and moving to the end of the show for today,
one last question. Would you like to share something about Holistics that is coming in
the future? Something that is really exciting for you and you would like to share with our audience?
Oh, thanks. Maybe not so much on Holistics itself,
but let me comment a little bit more on another trend that we are seeing in the data analytics
space, which at Holistics we're also trying to figure out how to tap into. Would that be okay?
Yeah, of course. Of course. Yeah. So I think the other trend that we see happening, which you can say is the fourth wave or the fourth stage of BI or analytics in general, and which I think is already happening, right, is that
people are actually taking the learnings from the best practices in the software
engineering space or the DevOps space and applying them to the data space, the analytics space,
right? You know, basically whatever principles they apply in the DevOps world,
like CI/CD, continuous delivery, agile development, are being applied over to
the data space. And people call it DataOps, or, as I think Tristan Handy from dbt, Fishtown Analytics, coined the term, analytics engineer
or analytics engineering, which also fits nicely with that kind of trend.
So among those trends, basically applying the software engineering principles over to
data, one of the key elements that I see happening is the use of code or text to represent logic in the data.
If you look at the infrastructure space, there's obviously tools like Terraform.
Are you familiar with Terraform or Ansible?
Yeah, of course. So Terraform and Ansible allow you to write code, or rather text,
to represent your entire infrastructure.
And then you just run a command to kind of recreate that infrastructure
on the cloud, in your production.
So basically, there's no more logging into the system,
UI drag and drop, click here, click there.
Everything is code.
It's coded as text.
And that has amazing benefits.
It enables automation.
It enables maintainability.
It enables reusability.
It enables clarity of logic. It's like a simple practice,
a simple mechanism of using code
to represent infrastructure has a multitude
of benefits to the company, right?
So what I'm seeing happening slowly
is that that has been applied to analytics, right?
Maybe we call it analytics as code, right?
And I mean, tools like Looker, right,
with LookML, you know, tools like dbt
are among the first tools to adopt,
either consciously or subconsciously,
these practices.
And I see more and more tools,
I'm sure more and more will basically catch on
and adopt these practices.
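As a rough illustration of why keeping analytics logic as text helps, the sketch below stores a metric definition as plain text, compiles it to SQL, and diffs two versions the way a code review would. The JSON format and the `compile_metric` and `review_diff` helpers are invented for this example; it is not LookML, dbt, or Holistics syntax.

```python
import difflib
import json

# Two hypothetical versions of a metric definition, as they might sit in a
# Git repository. The format is invented for illustration only.
V1 = """{
  "metric": "revenue",
  "table": "orders",
  "sql": "SUM(amount)"
}"""

V2 = """{
  "metric": "revenue",
  "table": "orders",
  "sql": "SUM(amount - discount)"
}"""


def compile_metric(text):
    """Turn a text definition into the SQL it stands for."""
    spec = json.loads(text)
    missing = {"metric", "table", "sql"} - spec.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return f'SELECT {spec["sql"]} AS {spec["metric"]} FROM {spec["table"]}'


def review_diff(old, new):
    """Return only the changed lines between two definitions."""
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="v1", tofile="v2", lineterm="",
    )
    return [
        line for line in lines
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


print(compile_metric(V2))
for line in review_diff(V1, V2):
    print(line)
```

Because the definition is text, a change to how revenue is calculated shows up as a one-line diff that can be reviewed, tested, and rolled back, rather than as an untracked edit in a UI.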
Does that make sense?
Yeah, absolutely.
I mean, and I totally agree with you
that this is like a huge trend
that is actually currently forming
and we see more and more best practices
from both development and engineering,
but also from infrastructure management,
as you very well said and mentioned,
like Ansible and Terraform.
We see these paradigms coming also to the management of data
and how to work, how to use these kinds of paradigms
to accelerate productivity and increase quality
and solve many of the unsolved problems
around working with data from the past.
So yeah, that's pretty exciting.
And I'm really interested to see what happens there.
And I hope we will see things happening in this space
also from Holistics.
So Huy, thank you so much.
It was great chatting with you today.
I'm pretty sure we will have the opportunity
to chat again in the future.
I think we need a couple of episodes at least
to cover all the different things
we can discuss together.
I would encourage everyone to check your website.
I know that you have an amazing wealth of content there.
So I'm pretty sure that people can find
some very opinionated and interesting stuff
around data, BI, and all the stuff that we discussed together.
And of course, give a try to Holistics, right?
Yeah.
We also wrote a free book.
I mean, sorry about the simple excuse.
We wrote a free book for those of you
who basically want to get a better understanding
of the data and BI space.
We wrote a free guidebook to explain the BI space
or the analytics space in very layman's terms.
You can check it out on our website.
That's great.
I would encourage everyone to go and download it.
And yeah, thank you.
And I'm looking forward to chatting with you again in the future.
Thank you, Kostas.
Thanks for having me.
I appreciate it.
That was a really interesting conversation. I think their approach to separating various components
within the BI ecosystem is fascinating. But Kostas, what piqued your interest and what did
you like most about that conversation? Yeah, first of all, it's a pure delight to chat with Huy.
I mean, it's been more than 50 minutes,
probably our longest episode.
And I feel like we still have a lot of things to discuss with him.
Huy is an amazingly aware person
around what is going on with the BI space
and anything that has to do with data in general.
I think the whole conversation that we had
around the evolution of the BI market
and the products out there was great
with the three different phases,
how things started,
what was the second wave of BI tools,
where we stand right now,
and what's the future.
I think the team there has a crystal clear
vision of what's going to happen with the BI space, and they are executing pretty well on that. It was
a great mix of both business and technology related insights. I think a very interesting
part was where we were discussing analytics as code, and we see that happening a lot lately, where we have companies
and products like dbt and LookML from Looker. LookML was a big part of the success of
Looker, and we see how dbt is becoming one of the most favorite tools for data engineers
and how the same approach can also be used in the BI space in general.
We had the opportunity to even chat about Snowflake, the different data warehouse solutions.
We literally went through the whole data stack, and Huy shared with us his experience from the BI
point of view of every single part of the data stack. And that was extremely interesting.
Unfortunately, we didn't have enough time
to go through everything.
I'm pretty sure that we will have
at least another call with him in the future
to revisit some of these topics
and also see what Holistics is going to come up with next
in their product.
They're really building an amazing product,
and it's very interesting to see how they are going to progress.
I agree.
Well, we will definitely schedule another call with their team
and we'll catch you next time on the Data Stack Show.