The Data Stack Show - 10: The Evolution of the BI Market with Huy Nguyen of Holistics

Starting point is 00:00:00 Welcome back to the Data Stack Show. Today we are going to talk with Holistics, which is a self-service BI platform. They do some really interesting things relative to a lot of other options in the market. And I actually had a chance to meet the Holistics team in person over a year ago in San Francisco at a conference I attended. And their team is incredibly sharp, really enjoyed meeting them. But from a technical standpoint, I'm interested, Kostas, what do you want to ask them based on what differentiates them? Because there are so many options in the BI space? Yeah, actually, it's very interesting that we are discussing with someone from the BI space because as a market, it has gone through like a lot of changes lately. I mean, it's not been long ago

Starting point is 00:00:56 that we had the acquisition of Looker by Google, for example, the acquisition of Tableau by Salesforce, which took it off the public market, and the merger of Sisense with Periscope Data. So it's a very interesting market. There are many things that are happening. Products are really competing with each other, trying to differentiate. And we also have to consider that we have a company that is actually based in Asia. They're based in Singapore and Vietnam. And they managed to do an amazing job in expanding both in the United States and in Europe. So it's very interesting, first of all, to see how they managed to differentiate as a product and how they perceive differently the BI problem and the visualization problem and how they succeed in that, especially with the constraints that they have, right?

Starting point is 00:01:50 I mean, it's really hard to compete in this space, even if you're in an environment like Silicon Valley. It's even harder when you have to do that from a completely different culture and time zone, like being in Asia, for example. And I think the team there has managed to do an amazing job in building a technology that really stands out compared to the rest of the products. I'm really interested to see because they are approaching the problem

Starting point is 00:02:16 from two sides. One is from the data analyst, of course, which is the main consumer of a product like BI. But at the same time, they're trying to approach and solve the problems of the data engineer. And we have seen in this show that the data engineer as a person inside the organization is becoming more and more important. And it would be great to see how these problems are addressed from a purely BI tool, which

Starting point is 00:02:41 comes from the visualization space. So yeah, I'm very interested to see what Huy has to say. And let's move forward and chat with him. Hi, Huy. It's very nice to have you tonight on the Data Stack Show. Thank you so much for your time. I'm very excited to hear about yourself and Holistics and your products and the stories that you have to share around BI and the data industry in general.

Starting point is 00:03:05 So welcome. Can you please quickly introduce yourself and also say a few things about the company? Okay. Hi, Kostas. Thanks for having me. I'm very excited to be here. So my name is Huy. I run this company called Holistics. We are sort of like a data platform that helps data teams build and maintain a central source of truth for analytics logic or business logic, and then expose that to the business users, to the non-technical users, via a very simple interface for them to ask data questions and get answers themselves without bothering the technical people, the data teams. Essentially, that's what we do. That's great.

Starting point is 00:03:57 So, can you give us a little bit more color on the original idea and the background behind Holistics, like who came up with the idea and the evolution of the idea because as everything else in this world, probably it was a bit different at the beginning when we started compared to how it looks today. So it would be great to see the evolution of Holistics. Yeah, okay. So my background is a software engineer.

Starting point is 00:04:23 I studied computer science. I have been doing programming all my life. I studied in Singapore, and then when I finished university, I joined this company called Viki, V-I-K-I. It's actually a US startup, very popular back then, and still now. They do, you know, Korean drama, Chinese drama, Taiwanese drama, Japanese drama. So basically, think of it as a Hulu for

Starting point is 00:04:55 international audiences, international movies. So then I joined that company as a data engineer, my first job out of college, back in 2012, 2013. We had a very small data team back then, right? Only two or three people. And I was kind of the first hire that they made there. And I worked with my boss, the director of analytics. And, you know, in a short span of one or two years, we started to build our internal analytics infrastructure that serves the company's internal users and even external users. We can get more into what kind of data stack we were using back then in a bit.

Starting point is 00:05:38 But along the way, we built a bunch of internal tools inside the company. And then one of the tools turned out to be a simple dashboarding tool, right? You know, you write a SQL query, you slap a chart visualization on top, and then you write some sort of boilerplate around it to share it with other people, right? So I started building that thing, and then after a while I realized that, hey, this can be abstracted, it can be externalized to other people. So I went to my boss to talk to him, I went to the management team and asked, hey, can I spin this off into a separate startup? And they all said yes. So the idea kind of started from

Starting point is 00:06:19 there, right? And then I talked to my friends, recruited my current co-founders, and then we kind of worked on it part-time a bit in the evenings. Then once we landed our first customer, we went full-time from there. That was how it started. That's great. Where are you based right now, Huy? The company started in Singapore, but then when it started, I moved back to Vietnam and then started the product and engineering team here.

Starting point is 00:06:50 So right now the company is split between Singapore, Vietnam and Indonesia. Oh, that's great. That's very interesting. And in terms of your customers, I mean, are you mainly active in the Asian markets? Do you have global customers? Can you share a little bit more about that? No.

Starting point is 00:07:10 So actually, we started with Southeast Asian customers. But as of now, the majority of our customers are U.S. and European-based. That's a common question people ask me, right? You know, you guys are based in Asia. You guys only work with Asian customers. More than 50% of our customers are US and European. That's amazing. I have to say it's amazing what you have managed to do.

Starting point is 00:07:36 I tried to do something similar from Greece, from Europe, and I can understand the difficulties of trying to build something that is trying to serve the American market, for example, and do it outside the American market. So, yeah, well done for that. Thank you, thank you. I mean, if there's anything we learned, it's that we should have come to the US sooner. That's one of our lessons.

Starting point is 00:08:01 Yeah, that's probably also my lesson, to be honest. But anyway, that's a topic for a business podcast. Now we want to discuss a little bit more about technology. So getting back to the product, because I would like to spend a little bit more time on the product. Before I ask the question, the perception that most people have when we're talking about BI is that it's all about the visualization. We have somehow strongly associated the term BI tool with visualization. But from what I understand of Holistics, how do you compare to other tools out there

Starting point is 00:08:46 like Looker, Chartio, Sisense, and the rest of the tools? And yeah, I mean, at the end, what's special? What's the secret sauce behind your success? No, thank you. I think you brought up a very good point about visualization. I think for the longest time, when people talk about BI, they think about visualization. It's a very understandable line of thinking. And I think, you know, the leader in

Starting point is 00:09:11 that space, Tableau, has done a tremendous job educating people. And think about it, right, if you are a business user and people talk about data, you know, the first thing that comes to mind is the visualization, right? And then the next thing that comes to their mind is, what is the software that does a good job at visualizing the data, which is Tableau, the leader in that space. But that's because the business user, the non-technical user, is not the person who went through the entire process of collecting, you know, cleaning,

Starting point is 00:09:46 validating the data, transforming, preparing the data for the final visualization, right? The data preparation period, right? That's the data team's job, right? So essentially, if you look at the statistics, a lot of people will say that they spend more than 80 or 90% of their time just trying to get the data, preparing and cleaning the data into the right format before visualizing the data. So from a data team perspective, the majority of the time is not spent on visualizing data, but spent on preparing it. So that's kind of how Holistics comes in, as a first cut. Contrast that with, say, Tableau, where in order to visualize the data in Tableau effectively, you need to get the data into the right format.

Starting point is 00:10:42 That's where Holistics comes in. So that's the first thing. The second thing is, if you look at the BI space in general, I mean, I'm going to give you a more long-winded answer because, you know, we're basically talking to the data analyst audience here, right? So they want the nuances. If you look at the BI space in general, you see that generally there are two groups of BI tools. On the left is what I call the pre-cloud tools. These are the BI tools that were built pre-cloud, built for the desktop era, the server era.

Starting point is 00:11:21 The assumption there is that the data warehouse is expensive. So then what they do is they build a data store for themselves, and then they load all your data into their proprietary data store, and then they expose kind of a very simple drag and drop interface for the users to build the report. Right. And then on the other hand, you have a group of tools that are built around SQL. Right. You know, when more and more people realized that, hey, I should let the data warehouse do the storing and the processing instead of letting the BI tool do it, a lot of people started building tools that rely on SQL.
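To make the SQL-based approach concrete, here is a minimal sketch of that flow: the tool sends a SQL query to the database, the database does the processing, and the tool only renders the result. It assumes an in-memory SQLite database standing in for the data warehouse; the `orders` table, its rows, and the helper names `run_report` and `text_bar_chart` are hypothetical and are not taken from any product mentioned in the conversation.

```python
import sqlite3


def run_report(conn, sql):
    """Send a SQL query to the database and return (column names, rows)."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def text_bar_chart(rows, width=20):
    """Render (label, value) rows as a plain-text bar chart."""
    top = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<6} {bar} {value}")
    return "\n".join(lines)


# Assumption: in-memory SQLite stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("APAC", 120), ("US", 300), ("EU", 180), ("US", 100)],
)

# The database does the aggregation; the "BI tool" only draws the result.
columns, rows = run_report(
    conn,
    "SELECT region, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY revenue DESC",
)
print(text_bar_chart(rows))
```

The point of the design is that no data is copied into a proprietary store: the only thing that leaves the database is the small aggregated result.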

Starting point is 00:12:13 So you write a SQL query, you send it to the database, it executes the query and returns the result, and you visualize it. Clear. Right. So the difference is that, on one hand, the first group of BI tools is very easy for the business user to use, because they can use the drag and drop interface to build a report. But it's not actually very friendly for data people. The data people prefer SQL. Right. And then the second group of BI tools, because it's SQL native, data analysts and data teams really like it

Starting point is 00:12:48 because they prefer working with SQL, right? But on the flip side, they are not friendly for the non-technical users, because now the non-technical users have to learn SQL, right? So Holistics kind of sits in the middle right there, right? On the one hand, we are SQL-based, so it is a very friendly experience for the data team. But on the other hand, we don't require the non-technical users

Starting point is 00:13:15 to learn SQL in order to build their own reports. So, that's kind of how we are differentiated, right? Does that kind of answer your question? Yeah, absolutely. It's great. I mean, I can understand that, especially compared to products like Tableau or Power BI, for example, or even products like, if people remember, Periscope Data before it became part of Sisense, and compare it with Sisense,

Starting point is 00:13:43 because Sisense was exactly as you are saying, right? You had on one hand something like Sisense that was very strong at visualizing stuff. It didn't have the kind of SQL experience that data teams needed, and that's why the two products also merged together, trying to deliver at the end this kind of experience. Something similar also happens in a way with Looker, I would say. The difference there is that Looker implemented their own language at the end.

Starting point is 00:14:11 So you have LookML, which then becomes something similar to what dbt is. So yeah, I totally get the vision and I think it makes total sense. Yeah, yeah. As always, the devil is in the details. So it's all about how you implement the product. From what I understand, based on what you've shared with us so far about

Starting point is 00:14:34 the product, Holistics is not just a product for the consumer of the data, who is the person who is going to create the reports, right? It's also about the people who care about bringing the data together, shaping the data, transforming the data, etc. So it's also about,

Starting point is 00:14:50 outside of the analyst or the business analyst, it's also about the data engineer, from what I understand. If not, please correct me, but I assume that it's also a tool that has a special role for the data engineer. In this case, can you, if this is true, of course, can you share with us what are the problems that these people have and why these problems and how, actually, they are addressed

Starting point is 00:15:14 by your platform, how Holistics addresses these problems and solves them? So you talk about the data engineer, right? So essentially, we don't really focus on the data engineer that much. The kind of person, the role we want to focus on is the data analyst. So basically, we want to make the data analyst's life better, right? And soon you'll see that the data engineer is part of the picture, right? So if you think about the data analyst role, you know, he or she will be the perfect person, with the right incentive and with the right kind of skill set, to be the main person driving the data team, right? Because, you know, a data analyst is technical enough to understand what the data structure looks like,

Starting point is 00:16:09 understand the nuances and the technicality of the data. She also has enough of a business mind to be able to work with the business users to kind of understand their perspective, how they think about the business model, how the business is running, and offer the data perspective to help them make decisions better, right?

Starting point is 00:16:30 But if you kind of look at the pain points, the problems that the data analyst is usually facing in the data organization, then you see that she has a lot of bottlenecks, she has a lot of things she needs to overcome. For example, right, a very common example is if your organization has nothing in place, has no data setup, the data consumers, the non-technical users, will keep coming to your data analyst for ad hoc reports.

Starting point is 00:17:03 You know, I want to check, hey, how is my sales in this quarter compared to last quarter? I want to check if there's any customer abnormality happening in my department, stuff like that. And then the analysts have to always spend time manually compiling those reports to prepare for the business user. So that is a bottleneck. I call it a bottleneck, right?

Starting point is 00:17:29 Because the analyst wastes time doing that manual reporting, and then the consumer actually wastes time waiting for it to happen, right? The second bottleneck I see is between the data analyst and the data engineer, right? Because the data analyst usually comes from not very engineering background, maybe from economics background, finance background. She knows a little bit of SQL, a little bit of business knowledge, right?

Starting point is 00:17:58 But usually she doesn't know how to write code like Python, Ruby, or other programming code. So then whenever she needs some sort of data, she actually has to go to the data engineer to ask, hey, can you pull in this data for me? Or can you prepare this data for me so that I can do this report or run this data analysis for the CEO? So that is the second bottleneck. The analysts actually have to wait for the engineer

Starting point is 00:18:25 to prep the data for her. And then the third one you will see is that as the organization grows, you have more data

Starting point is 00:18:43 analysts and then you start to get into analytics chaos, right? You know, the data analysts have no mechanism to collaborate with each other, right? You know, some analysis you've done or some kind of aggregation work you do is not being shared or communicated to other data analysts. So then different analysts kind of use different formulas to run the report, to get the numbers. So it starts getting into analytics chaos. So that is another friction. So essentially there are three bottlenecks that happen with data analysts.

Starting point is 00:19:28 Between the data analyst and the business user, the data analyst and the engineer, and the data analyst and another data analyst. So our hope, our vision is to build a platform that can empower analysts to remove these bottlenecks altogether. Does that make sense? Yeah, absolutely. So can you share a little bit more information

Starting point is 00:19:49 of how this can be done today with Holistics? I mean, for these three types of frictions that exist inside the organization, right? Do you solve all of them, first of all? Do you focus on one of them right now? And yeah, tell us a little bit more about the current state of the product in solving these problems and your future plans about this. Yeah, so we kind of solve all three problems in kind of a one-shot.

Starting point is 00:20:14 So the three values that Holistics provides are that, first of all, it allows the non-technical users to self-serve and get the data directly without going through the data analyst. So that's the first bottleneck, between the data analyst and the business user. Second of all, a lot of the work that data analysts require the data engineer to do is pipelining work. So we actually give data analysts kind of data engineering powers to do so, right? They can load data from different sources into the platform. They can do simple transformations and procedures. They can build reports to expose to the business users.

Starting point is 00:21:06 So those are kind of what we call data engineering powers that previously data analysts don't have. And then the third thing, which is the friction between the data analysts and data analysts, we actually, as I mentioned earlier in our kind of intro pitch, we actually help data analysts build a central source of truth for the analytics logic, for the business logic,

Starting point is 00:21:30 so that whatever work that you do is being, you know, checked in, is being version controlled in a way, and is being communicated to other people so that the team doesn't repeat itself. So the way we do this is

Starting point is 00:21:48 we build a logical semantic layer that sits between the business logic and the underlying data warehouse logic, so that the data team defines all the business logic, all the transformations, all the pipelining in that semantic layer. And then that semantic layer becomes the source of truth for the whole organization to come and get the answers.
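The idea of a semantic layer can be sketched in a few lines: business metrics are defined once, in one place, and every report compiles its SQL from those shared definitions instead of each analyst re-deriving the formula. This is an illustrative sketch only and does not represent Holistics' actual modeling language or API; the `Metric` class, the `METRICS` registry, the `orders` table, and `compile_query` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One shared, version-controllable definition of a business metric."""
    name: str
    table: str
    expression: str  # SQL aggregate that defines the metric


# Hypothetical registry: the single place where business logic is defined.
METRICS = {
    "revenue": Metric("revenue", "orders", "SUM(amount)"),
    "order_count": Metric("order_count", "orders", "COUNT(*)"),
}

# Hypothetical whitelist of columns a business user may group by.
ALLOWED_DIMENSIONS = {"orders": {"region", "order_date"}}


def compile_query(metric_name, group_by):
    """Turn a (metric, dimension) request into SQL for the warehouse."""
    metric = METRICS[metric_name]
    if group_by not in ALLOWED_DIMENSIONS[metric.table]:
        raise ValueError(f"{group_by!r} is not a dimension of {metric.table!r}")
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {group_by}"
    )


print(compile_query("revenue", "region"))
```

A non-technical user only picks a metric and a dimension; because the SQL is generated from the shared definition, two people asking for "revenue" cannot end up using different formulas.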

Starting point is 00:22:20 Does that make sense to you? Yeah, absolutely. That's very, very interesting. So I guess through all the interactions that you have had so far with your customers, you were also exposed to the data stacks that your customers have, right? Because I assume that Holistics

Starting point is 00:22:37 is not the only product that they are using for their data needs. So can you share a little bit more about what you have seen out there? What you have seen in terms of technologies that companies are using, how they try to architect their data stacks, and of course, at the end, how Holistics fits into these architectures that we see out there? Yeah, I think over the years, we have started seeing people shifting. So your question is specifically around what data stacks we see our

Starting point is 00:23:12 customer using, right? What are they thinking about their data stack? Yeah. Yeah. Okay. Okay. And what technologies you usually see working together with Holistics? Okay.

Starting point is 00:23:24 Let me think. So one of the ways we wanted to position Holistics is that we are not a replacement. We are a complementary tool, like an augmentative tool for the existing data stack. So when it comes in, if they have something that's working, they don't need to replace it completely.

Starting point is 00:23:43 But over the years, I've seen that, you know, I mean, people here are familiar with data warehouses, right? But first of all, what I see is that when companies are just starting, they actually don't need a data warehouse. Yep. The first thing they do is they just take some sort of BI tool, a SQL BI tool, like Holistics, plug it directly into their production database, and off they go. They can start building reports. These are not very frequently accessed reports,

Starting point is 00:24:19 so all the best practice advice that we give people about, hey, it's going to increase load to your database, it's going to affect your production applications, it doesn't apply. They just need something that works, that fits their needs, very simple, their analytics needs. Except for when they have something like a MongoDB database, for example, where it's very difficult to do analytics on top

Starting point is 00:24:45 of MongoDB. That's where we kind of recommend to them, hey, you know, you should spin up a data warehouse instance like BigQuery, Redshift, Snowflake, or even a simple Postgres database, and then pull the data over from MongoDB to the data warehouse and slap a BI tool on top. Right. So that's one thing. The other thing that I see is kind of a shift: recently we see a lot more customers using Snowflake, you know, the hot new data warehouse.
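Part of why analytics directly on a document store is awkward is that documents are nested, while SQL-based BI tools expect flat tables. The following is a minimal sketch of the "pull the data over" step under stated assumptions: plain Python dictionaries stand in for MongoDB documents, an in-memory SQLite database stands in for the warehouse, and the `users` table, its columns, and the `flatten` helper are hypothetical.

```python
import sqlite3


def flatten(doc, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Assumption: these dicts stand in for documents read from a collection.
documents = [
    {"_id": "u1", "name": "Ann", "address": {"city": "Hanoi", "country": "VN"}},
    {"_id": "u2", "name": "Bob", "address": {"city": "Austin", "country": "US"}},
    {"_id": "u3", "name": "Chi", "address": {"city": "Hue", "country": "VN"}},
]

flat_docs = [flatten(d) for d in documents]

# Assumption: in-memory SQLite stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE users (user_id TEXT, name TEXT, city TEXT, country TEXT)"
)
warehouse.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(d["_id"], d["name"], d["address_city"], d["address_country"])
     for d in flat_docs],
)

# Once the data is relational, an ordinary SQL report works on top of it.
by_country = warehouse.execute(
    "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
).fetchall()
print(by_country)
```

Real documents also contain arrays and fields that appear only in some records, which this sketch does not handle.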

Starting point is 00:25:21 Yeah. As for the older customers, they are still on Redshift or BigQuery. But I do see, you know, some trends where people are kind of moving away from things like Redshift, when they run into some performance problem, over to, say, BigQuery or Snowflake, right? You know, unless the infrastructure requires them to stay on, say, AWS, right? Another thing I started to see is that they're moving away from tools like Google Analytics over to tools like Snowplow and, of course, RudderStack, because Google Analytics doesn't give you the kind of granularity in the events data that they need, as compared to collecting the events data themselves.

Starting point is 00:26:18 Right. And then, you know, we see a lot of companies moving from Mixpanel to a custom-built in-house solution, usually an open source solution. And of course, there's Snowplow and RudderStack that come to mind. So that's the third thing we see. I mean, that's what I can remember right now. If I have more, I can share more.

Starting point is 00:26:39 Yeah, yeah, it makes sense. I think the, let's say, democratization of data warehouses, because actually data warehouses, in the past decade or so, have become much more accessible to almost everyone. Actually, I think today, even for very small companies, accessing something like BigQuery, or even Snowflake, is very cheap, right? I mean, all these systems charge based on either the volume that you have there or the processing that you are doing.

Starting point is 00:27:07 So if your data set is pretty small, you're not going to be charged a lot of money anyway. So it's becoming a bit of a no-brainer for companies to, even at an early stage, as you said, to use some of these technologies. Redshift is a bit of a different story mainly because it requires a lot more management, although they are working to change that. And I think that's one of the

Starting point is 00:27:30 reasons that we see that fully managed services like Snowflake and BigQuery are winning big. So to comment more on that, I think interestingly, if you go back to, say, 2012 and 2013, that was when Redshift first came out, right? I remember because I was the data engineer back then, working for Viki. Our data warehouse back then was Postgres, and when Redshift came out in beta, we kind of immediately jumped into that to try it out. And it was wonderful, right? It worked

Starting point is 00:28:11 great. Basically, because it's compatible with Postgres, there was very little that we needed to do to migrate the data or to migrate our reporting system over, because it's compatible. All right.

Starting point is 00:28:29 So Redshift was the first cloud data warehouse that was popularized, and it basically dropped the price of data warehousing dramatically. But I think the downside of that is, I'm sure you know about the history of Redshift, right? They were based on ParAccel, which is based on Postgres. And then Amazon kind of struck a deal with them to bring the ParAccel version onto their cloud. I mean, they did an amazing job of kind of making it more cloud-friendly and making it more accessible to people. But essentially,

Starting point is 00:29:08 all this infrastructure, I don't think they are built natively for the cloud era. If you look at BigQuery and Snowflake, one of the main advantages is the splitting of compute and storage out. So then the compute

Starting point is 00:29:24 and storage don't sit in the same kind of physical servers, so to speak, right? And that's why, you know, even though Redshift was the first to come out on the market, it became like an educating factor. It educated people to start using data warehouses. And then they face the problem with performance, usually, and then the cost, because they have to constrain themselves to a physical unit of compute, like a server unit, because storage and compute are coupled.

Starting point is 00:29:56 And then that's where BigQuery and Snowflake kind of come in and take off from there. So I think that's a very interesting kind of thing to observe over there. Yeah, and I found very interesting what you were saying about applications like Mixpanel and these very specialized

Starting point is 00:30:12 let's say web applications around analytics because my feeling is that as we start having like very powerful and pretty cheap to use data infrastructure like the modern data warehouse on the cloud and very sophisticated BI and visualization tools

Starting point is 00:30:32 like Holistics, having your own infrastructure to actually do the product analytics that you could do with these kinds of products becomes much, much easier. So instead of introducing another data silo, another product, another cost center inside your company, you can reuse your data warehouse and actually build at least some of the functionality that you find in these products on your data warehouse, using something like Holistics and something like Simpliq. Exactly. So, coming back to my last job, back then what we were doing is that for our events data, we were actually storing them in Hadoop. So we built this collector, right?

Starting point is 00:31:18 You know, a custom-built collector on top of a tool called Fluentd. And then, you know, we built a web endpoint, we pushed event data there, and then we used Fluentd to push it to our S3, and then we slapped a Hadoop cluster on top, and then we ran some sort of aggregations, and then the aggregated results got pushed back to our Redshift data warehouse. Does that make sense to you? Yeah, yeah, yeah. Absolutely. Yeah. The reason why we did that was because of the cost of the data warehouse. It's not like BigQuery or Snowflake where they separate compute and storage.

Starting point is 00:31:59 In Redshift, the more events data you push, the more raw events data you push to it, the more storage is consumed. And then when you want to upgrade, you have to actually upgrade the entire cluster. So essentially, we maintain a dual system, a dual data warehouse, so to speak. One runs on top of Hadoop ecosystem, and then the other one runs on top of traditional MPP databases.
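The shape of that dual setup can be sketched as follows: raw events stay in cheap bulk storage, a batch job aggregates them, and only the small aggregated result is loaded into the warehouse. This is a toy illustration under stated assumptions, not the actual Viki pipeline: a newline-delimited JSON string stands in for the event files on S3, a plain Python function stands in for the Hadoop aggregation job, in-memory SQLite stands in for the warehouse, and the event names and `daily_events` table are hypothetical.

```python
import json
import sqlite3
from collections import Counter

# Assumption: newline-delimited JSON stands in for raw event files in storage.
RAW_EVENT_LOG = """\
{"event": "video_play", "ts": "2013-05-01T10:00:00"}
{"event": "video_play", "ts": "2013-05-01T11:30:00"}
{"event": "signup", "ts": "2013-05-01T12:00:00"}
{"event": "video_play", "ts": "2013-05-02T09:15:00"}
"""


def aggregate_daily(raw_lines):
    """Batch step: count events per (day, event name) from raw JSON lines."""
    counts = Counter()
    for line in raw_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        day = event["ts"][:10]  # keep only the YYYY-MM-DD part
        counts[(day, event["event"])] += 1
    return sorted((day, name, n) for (day, name), n in counts.items())


def load_aggregates(conn, rows):
    """Load step: write only the aggregated rows into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_events (day TEXT, event TEXT, n INTEGER)"
    )
    conn.executemany("INSERT INTO daily_events VALUES (?, ?, ?)", rows)


daily = aggregate_daily(RAW_EVENT_LOG.splitlines())
warehouse = sqlite3.connect(":memory:")
load_aggregates(warehouse, daily)
print(daily)
```

Four raw events become three warehouse rows here; at real event volumes the same reduction is what keeps warehouse storage, and therefore cluster size, under control when storage and compute are coupled.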

Starting point is 00:32:27 Yeah, this is a setup that even now you can see in some companies, like any company that has to operate on AWS, and they have huge amounts of data to work with, especially if they are event-related data, you can see that they're probably going to implement something like a data lake where the data will be stored on S3. Then just the subset of this data is going to be loaded in Redshift or use something like Spectrum and Athena to prepare and load the data or even query the data directly.

Starting point is 00:33:01 And yeah, I think that this is also a big byproduct of the architecture and the amount of data that some companies out there have to deal with. So yeah, what you did with Hadoop, I mean, I think it's still happening. It's just that the technologies have matured and things are a little bit easier than spinning up like... It's converging, basically. I mean, if you think about it, there are two tracks, right? On the one track, there are things like MPP databases that came out of C-Store, the columnar storage mechanism.

Starting point is 00:33:32 And then on the other track, you have Hadoop, right? The idea of separating compute and storage, and MapReduce. And then what you are saying, and what I'm seeing, is that these two tracks will somehow converge to the same idea. A lot of the concepts from the MPP columnar storage databases have already been applied over to the Hadoop ecosystem, and vice versa. Yeah, I totally agree. That's also what I see. And it's very interesting to see how this market is going to develop in the future because I think we are still at the beginning

Starting point is 00:34:07 with what is happening with the technology around data. So I'm very excited to see what the next couple of years will bring us. So, okay, we talked about the data stacks that you have seen out there in the wild. Quickly, because at the end you are also like a company, right? And you also have to work with data internally, do reporting and all that stuff. So very quickly, can you share with us a little bit of your infrastructure, what kind of tools you use? I assume you're using Holistics in-house, but if you don't, you can tell us that. So yeah, share with us what you are doing, what kind of best practices you are also following, and what kind of stack you have.

Starting point is 00:34:46 Okay, okay. I mean, we're a B2B company, right? Our data stack is pretty boring, so to say. You know, we don't have a huge volume of data to process. I mean, we're just kind of standard, right? We have, you know, our production database is a Postgres database.

Starting point is 00:35:03 And then for our data warehouse, we are using BigQuery right now. We use Holistics for sure. We loaded our data from Postgres over to BigQuery using Holistics, you know, and then we use Holistics to do the modeling, you know, to do all the business logic to data logic mapping, to do all the transformations within the data warehouse, BigQuery. And then we also use Holistics to expose kind of a self-service interface for the business users on the predefined dashboards. We use Holistics to set up these pushes, the data push, so, you know, we don't log into Holistics every day, right?

Starting point is 00:35:46 We push data into our Slack channel. So we set up this report and we push the data over to our Slack. So then every morning we log in to Slack, we open it up, and then we can see a very nice visualization that sits there to say how many, say, users we got the last day or the last week, stuff like that. So that's on the transactional side. On the kind of event, the analytics side, we set up Snowplow, you know, that was like a year back. And then similar thing, Snowplow pushes data to BigQuery. And then in BigQuery, we, you know,

Starting point is 00:36:21 we also use Holistics to model a lot of that events data, page views data, and then push to the visualization front. So that's pretty standard. That's pretty boring. I mean, essentially, you can reduce it to three things, right? Snowplow, BigQuery, and then Holistics. Well, to be honest, I think that it's something that I have encountered a lot,

Starting point is 00:36:42 like in this podcast, like the most successful data stacks that we have seen so far, usually they employ some kind of like boring technology and boring design principles, but at the end it makes sense. I mean, you can't just use every state-of-the-art thing out there because you will pretty much end up duct-taping your infrastructure. So it's quite important to also use proven technology out there. And it's very interesting. We had some discussions, and this is something that we also do in Denali at Rudderstack.

Starting point is 00:37:18 For example, for our product, we created some kind of queuing mechanism on top of Postgres, right? I was talking, and this is an upcoming episode with the guys at Slapdash, for example. For their own product, they also needed to build some kind of, not some kind, actually, a graph database. And they decided to do that again on top of Postgres

Starting point is 00:37:44 instead of using like one of the state-of-the-art products around graph databases that you can find out there. Yeah, that's very cool. Yeah, I mean, if you think about it, Postgres is a piece of software that has been developed for almost, I don't know, 20, 30 years now. So there has been so much human energy in it. I mean, it's so mature. And when you're building a product,

Starting point is 00:38:07 at the end you need to make sure that you deliver the best possible experience to your customer, right? Like your customer doesn't care what you're using on the background. Yeah, yeah. So, yep, yep, I love them, sometimes is good. Yep, yep. I mean, I have so many good things to say about Postgres. I mean, it's a very good general-purpose database. You can use it for a lot of the use cases, especially analytics, right? I mean, the SQL syntax, the functionality around SQL is insane, way better than MySQL. And I actually wrote a blog post about, this is like six years ago, on why you should use Postgres over MySQL when it comes to analytics.

Starting point is 00:38:53 Yeah. Yeah, makes total sense. I can understand that. So, all right, moving a little bit forward, actually going a little bit back, let's expand a little bit more around the BI market. I mean, you're an expert in the BI market. Many things have happened in the past two years: many acquisitions, products have merged, Looker was acquired by Google for a huge amount of money. So what do you see that is happening right now in the BI space

Starting point is 00:39:24 and more specifically in the visualization space, what are the trends that you see there? And what do you think is the next big thing when it comes to BI? So, I mean, if we step back a little bit and look at the, I mean, BI has been around for 60 years, right? I mean, it's been around for a very long time. But if we really look into the history and how the BI market evolved, it's very interesting to look at. And then we wrote this in our guidebook

Starting point is 00:39:55 on our Holistics website. If you look at the, we call it the three stages or three waves of BI. So at the beginning, this is maybe 40, 50 years ago, BI is a very centralized system. You have things like Cognos, IBM Cognos. Only the big corporations can afford BI; small companies, they can't afford BI. So you invest millions of dollars to build this BI system. It's centralized, it's managed by IT. Basically, because the computing resource was so expensive, they will only be able to serve the top-level management in that order.

Starting point is 00:40:47 Basically, there is some sort of a huge system, the data gets loaded into that system, they run overnight, and then in the morning they churn out some sort of report, the very standardized report for the business user to look at. If you have random questions, like ad hoc questions, you can't, right? Basically your request goes into a queue in the IT desk. And then, you know, the IT person will prioritize the CEO, the C-level executive request over your request.

Starting point is 00:41:18 So usually you wait for maybe one or two months to get the data, to get the report you need. And every report will have to go through IT. So in a way, I call that the centralized era. And then, you know, with the centralized era, they have all these problems that basically only the top executives have access to reports. You know, the mid-level, the low-level operational person doesn't have access to them. So then there comes the second era, what I call the decentralized era. The decentralized era

Starting point is 00:41:54 happens when tools like Tableau or even Excel come about. Basically, instead of, you know, submitting the request over to the IT team, you log into some sort of a system, the CRM system, the production system. You download the CSV file, the Excel file, export the CSV file, right? And then you load that CSV file into a desktop program you installed on your computer, Tableau Desktop, for example. It was awesome. You know, you dump that CSV into Tableau and then you start to, you know,

Starting point is 00:42:39 really explore the data, right? It's completely drag and drop. You know, it requires no SQL knowledge whatsoever. Non-technical user could learn to use Tableau. And, you know, assuming they got the right data extract, they could come up with fancy graph for the rest of the company to consume, right? So that was kind of the second era,

Starting point is 00:42:59 the decentralized era, in which tools like Tableau are a solution to the pain points faced by the first era. Are you following? Are you with me so far? Yeah, of course. Yeah. So then there comes the problem with the second era, what we call the metric night fight. So the problem with the second era is that because it's so decentralized, and people started using this data through a workaround route, without going through central IT, it's very easy for the data to come out wrong. Now the non-technical users are the ones that do the exploring, the building of the reports.

Starting point is 00:43:46 And basically, a scenario will happen where you have someone from the sales department say that the revenue is X, and then there's someone from the marketing department say the revenue is Y. And then what happened is that each of them maybe extract the same CSV, but use the different formula to calculate revenue. Each of them may extract a different CSV because one CSV is stale data and one is not stale data. And then this will become a disaster because imagine that you use the wrong data to report

Starting point is 00:44:23 to your board of directors. Things like that, it's going to happen. And then it becomes a total mess. Does that make sense so far for you? Yeah, absolutely. Yeah. So you see, in the first era, people have little access to data, but at least because

Starting point is 00:44:45 it went through central IT, they are experienced people, they double-check the data and the data has not been all over the place. So then the data is accurate, but at the cost of accessibility of the data. In the second era, where the data is being decentralized, anyone can extract the data from a system and build their own reporting. Basically, you get an abundance of access to the data. But in exchange, you don't have the accuracy of the data, which is very important. Because if you don't trust the data, you stop using it altogether to make a decision. Then there comes the third era.
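
The sales-versus-marketing mismatch described for the second era can be made concrete with a small Python sketch. Everything in it is invented for illustration and does not come from the conversation: the order rows, the field names `amount` and `refunded`, and both revenue formulas.

```python
# Hypothetical order extract; every value here is made up for illustration.
orders = [
    {"id": 1, "amount": 100.0, "refunded": False},
    {"id": 2, "amount": 250.0, "refunded": True},
    {"id": 3, "amount": 75.0, "refunded": False},
]


def revenue_sales(rows):
    """Assumed sales formula: gross amount, refunded orders included."""
    return sum(r["amount"] for r in rows)


def revenue_marketing(rows):
    """Assumed marketing formula: refunded orders excluded."""
    return sum(r["amount"] for r in rows if not r["refunded"])


# Same extract, two formulas, two different "revenue" numbers.
print(revenue_sales(orders))      # 425.0
print(revenue_marketing(orders))  # 175.0
```

Both numbers are computed correctly from the same CSV-like extract. The disagreement comes only from each team keeping its own definition of revenue.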

Starting point is 00:45:28 Basically, we say, there is this friction between the business user and the IT team. The business user wants access to the data, but at the same time, the data slash IT team wants control over the accuracy, the consistency of the data. So that's where tools like Holistics or Looker come about. Instead of letting the business user download the data and build their own report in tools like Tableau directly, or lock them out altogether and ask them to log into a central system to view predefined

Starting point is 00:46:08 reports, where there's no way for them to ask ad hoc questions, tools like Looker and Holistics expose a semantic layer of data, a modeling layer, right? And then instead of building the report for every single request from the business user, the data team, the IT team only needs to work on maintaining the data modeling layer to make sure that all the business logic is properly recorded. All the metrics are defined clearly in the modeling layer. So then this will be exposed, as I mentioned earlier, as a BI interface for the business user. So then they can still get the decentralized experience of the second era, but this time they don't have to kind of rebuild every report from scratch again with maybe using the wrong

Starting point is 00:47:01 formula, and they don't have to download a CSV from some system to load into the BI tool anymore, right? They use the data that the data team, the IT team provides to them. They use the definitions of the metrics that the IT team, the data team prepares for them. All they need to do is just explore on that restricted, although flexible, but restricted interface to get that data. So that's kind of the third era, right? That's

Starting point is 00:47:33 already happening now, right? It's not clearly obvious yet, but I think that's going to happen sooner or later. Does that make sense? Yeah, absolutely. I think you managed to give an amazing description of what has happened in the BI market from its creation up to today. And I think we still have like very exciting things to see that will happen in the future. And yeah, I'm looking forward to it. And I'm pretty sure that Holistics is going to be a company that will make some of these new things happen. So having said that, and moving to the end of the show for today, one last question. Would you like to share something about Holistics that is coming in the future? Something that is really exciting for you and you would like to share with our audience?
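
The modeling layer described for the third era can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual modeling syntax of Holistics or Looker: the `Metric` class, the `METRICS` and `ALLOWED_DIMENSIONS` registries, the `orders` table and its rows are all hypothetical, and SQLite stands in for the data warehouse.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    table: str
    expression: str  # SQL aggregate owned by the data team


# Hypothetical modeling layer: one agreed definition per metric.
METRICS = {
    "revenue": Metric(
        "revenue", "orders",
        "SUM(CASE WHEN refunded = 0 THEN amount ELSE 0 END)",
    ),
}
# Dimensions the data team chooses to expose for each table.
ALLOWED_DIMENSIONS = {"orders": {"country"}}


def build_query(metric_name, dimension):
    """Turn a business user's request into SQL using the shared definition."""
    metric = METRICS[metric_name]  # KeyError if the metric is not defined
    if dimension not in ALLOWED_DIMENSIONS[metric.table]:
        raise ValueError(f"{dimension!r} is not exposed for {metric.table}")
    return (
        f"SELECT {dimension}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {dimension} ORDER BY {dimension}"
    )


def run(metric_name, dimension):
    """Run the generated query against a tiny in-memory stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (country TEXT, amount REAL, refunded INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("SG", 100.0, 0), ("SG", 250.0, 1), ("VN", 75.0, 0)],
    )
    rows = conn.execute(build_query(metric_name, dimension)).fetchall()
    conn.close()
    return rows


print(run("revenue", "country"))  # [('SG', 100.0), ('VN', 75.0)]
```

The business user only picks a metric name and an exposed dimension. The revenue formula lives in one place, so every report built this way uses the same definition.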

Starting point is 00:48:23 Oh, thanks. Maybe not so much on Holistics itself, but let me comment a little bit more on another trend that we are seeing in the data analytics space, which at Holistics, we're also trying to figure out how to tap on. Would that be okay? Yeah, of course. Of course. Yeah. So I think the other trend that we see happening, which you can say is the fourth wave or the fourth stage of BI or analytics in general, and which I think is already happening, right, is that people are actually taking the learnings from the best practices in the software engineering space or the DevOps space and applying them to the data space, the analytics space, right? You know, basically whatever principles they are applying in DevOps, like CI/CD, like continuous delivery, agile development, are being applied over to

Starting point is 00:49:38 the data space. And people call it DataOps, or as I think Tristan Handy from dbt, Fishtown Analytics, he coined the term analytics engineer or analytics engineering, which also fits nicely with that kind of trend. So among those trends, basically applying the software engineering principles over to data, one of the key elements that I see happening is the use of code or text to represent logic in the data. If you look at the infrastructure space, there are obviously tools like Terraform. Are you familiar with Terraform or Ansible? Yeah, of course. So Terraform and Ansible allow you to write code, or rather text, to represent your entire infrastructure.

Starting point is 00:50:32 And then you just run a command to kind of recreate that infrastructure on the cloud in your production. So basically, there's no more logging into the system, UI drag and drop, click here, click there. Everything is code. It's coded as text. And that has amazing benefits. It enables automation.

Starting point is 00:50:57 It enables maintainability. It enables reusability. It enables clarity of logic. It's like a simple practice, a simple mechanism of using code to represent infrastructure has a multitude of benefits to the company, right? So what I'm seeing happening slowly is that that has been applied to analytics, right?

Starting point is 00:51:23 Maybe what we call analytics as code, right? And I mean, tools like Looker, right? With LookML, you know, tools like dbt are among the first tools to adopt, either consciously or subconsciously, these practices. And I see more and more tools, I'm sure more and more will basically catch on

Starting point is 00:51:45 to adopt these practices. Does that make sense? Yeah, absolutely. I mean, and I totally agree with you that this is like a huge trend that is actually currently forming and we see more and more best practices from both development and engineering,

Starting point is 00:52:03 but also from infrastructure management, as you very well said and mentioned, like Ansible and Terraform. We see these paradigms coming also to the management of data and how to work, how to use these kinds of paradigms to accelerate productivity and increase quality and solve many of the unsolved problems around working with data from the past.
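
The "analytics as code" idea discussed above, together with the CI-style checks borrowed from software engineering, might look roughly like the Python sketch below. It is not LookML, dbt, or Terraform syntax. The JSON layout, the table and metric names, and the `validate` helper are all hypothetical, chosen only to show metric logic kept as text that an automated check can verify before deployment.

```python
import json

# Hypothetical metric definitions kept as plain text in version control.
METRICS_JSON = """
{
  "tables": {"orders": ["id", "amount", "refunded", "created_at"]},
  "metrics": [
    {"name": "revenue", "table": "orders", "column": "amount", "agg": "sum"},
    {"name": "order_count", "table": "orders", "column": "id", "agg": "count"}
  ]
}
"""

ALLOWED_AGGS = {"sum", "count", "avg", "min", "max"}


def validate(spec_text):
    """Return a list of problems; an empty list means the spec passes."""
    spec = json.loads(spec_text)
    problems = []
    seen = set()
    for metric in spec["metrics"]:
        name = metric["name"]
        if name in seen:
            problems.append(f"duplicate metric: {name}")
        seen.add(name)
        columns = spec["tables"].get(metric["table"])
        if columns is None:
            problems.append(f"{name}: unknown table {metric['table']}")
        elif metric["column"] not in columns:
            problems.append(f"{name}: unknown column {metric['column']}")
        if metric["agg"] not in ALLOWED_AGGS:
            problems.append(f"{name}: unsupported aggregate {metric['agg']}")
    return problems


# A CI job could run this on every change and fail the build on any problem.
print(validate(METRICS_JSON))  # []
```

Because the definitions are text, they can be reviewed in pull requests, diffed between versions, and checked automatically, which is the same set of benefits the conversation attributes to infrastructure as code.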

Starting point is 00:52:28 So yeah, that's pretty exciting. And I'm really interested to see what happens there. And I hope we will see things happening in this space also from Holistics. So Huy, thank you so much. It was great chatting with you today. I'm pretty sure we will have the opportunity to chat again in the future.

Starting point is 00:52:46 I think we need a couple of episodes at least to cover all the different things we can discuss together. I would encourage everyone to check your websites. I know that you have an amazing wealth of content there. So I'm pretty sure that people can find some very opinionated and interesting stuff around data, BI, and all the stuff that we discussed together.

Starting point is 00:53:06 And of course, give a try to Holistics, right? Yeah. We also wrote a free book. I mean, sorry about the simple excuse. For those of you who basically wanted to get a better understanding about the data BI space,

Starting point is 00:53:27 we wrote a free guidebook to explain the BI space or the analytics space in very layman's terms. You can check it out on our website. That's great. I would encourage everyone to go and download it. And yeah, thank you. And I'm looking forward to chatting with you again in the future. Thank you, Kostas.

Starting point is 00:53:43 Thanks for having me. I appreciate it. That was a really interesting conversation. I think their approach to separating various components within the BI ecosystem is fascinating. But Kostas, what piqued your interest and what did you like most about that conversation? Yeah, first of all, it's like a pure delight to chat with Huy. I mean, it's been like more than 50 minutes, probably our longest episode. And I feel like we still have a lot of things to discuss with him.

Starting point is 00:54:15 Huy is like an amazingly aware person around what is going on with the BI space and anything that has to do with data in general. I think the whole conversation that we had around the evolution of the BI market and the products out there was great with the three different phases, how things started,

Starting point is 00:54:35 what was the second wave of BI tools, where we stand right now, and what's the future. I think the team there has a very crystal clear, actually, vision of what's going to happen with the BI space and they are executing like pretty well on that. It was like a great mix of both business and technology related insights. I think it was a very interesting part where we were discussing analytics as code, and we see that a lot happening lately where we have companies and products like dbt and LookML from Looker. LookML was like a big part of the success of

Starting point is 00:55:13 Looker, and we see how dbt becomes like one of the most favorite tools for data engineers and how the same approach can also be used in the BI space in general. We had the opportunity to even chat about Snowflake, the different data warehouse solutions. We literally went through the whole data stack and Huy shared with us his experience from the BI point of view of every single part of the data stack. And that was extremely interesting. Unfortunately, we didn't have enough time to go through everything. I'm pretty sure that we will have

Starting point is 00:55:52 at least another call with him in the future to revisit some of these topics and also see what Holistics is going to come up with next in their product. They're really building an amazing product and it's very interesting to see how they are going to progress. I agree. Well, we will definitely schedule another call with their team

Starting point is 00:56:14 and we'll catch you next time on the Data Stack Show.
