The Data Stack Show - 01: Discussing Mattermost Data Infrastructure with Alex Dovenmuehle

Episode Date: August 12, 2020

In this episode, Kostas Pardalis sits down with Alex Dovenmuehle, head of data engineering for Mattermost, an open-source self-hosted communication tool that optimizes dev workflows in highly secure environments. Kostas and Alex discuss:

Alex's background and experience (2:29)
Data stack Mattermost is using (9:25)
How Mattermost built their data stack (21:05)
Using data to understand the story of the customer's journey (24:58)
Focus on privacy and security (26:33)
Practical ways Mattermost is using data (37:14)
What's next for data analytics at Mattermost and wrap up (42:45)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Discussion (0)
Starting point is 00:00:00 Eric Dodds here on the Data Stack Show. Today we're going to talk with the head of data engineering for Mattermost. Mattermost is a company based on open standards. They create communications tools that optimize dev workflows in highly secure environments. Really interesting company. We align
Starting point is 00:00:25 with them pretty closely because we're also based on open standards here at Rudderstack. So really excited to learn more about how they're building their data pipeline. In terms of technical specifics Kostas, what interested you about the conversation we had with Alex? Yeah, the whole conversation with Alex was very interesting. We went through the whole data stack that they have. So we had the opportunity to learn about the different data sources where they pull data from, things like the different data types that they work with
Starting point is 00:01:01 and all the different technologies that they have to incorporate to create a very up-to-date data stack. We were going to discuss and we discussed around the problems that they have and actually some very interesting points that we touched was about data modeling and the importance of data modeling and some technologies like LookML and DPD that they are incorporating in their data stack. And also how important open source is and the different open source tools out there in order to build and maintain such a complex data stack as the one that they currently have. So I'm pretty excited. I think everyone will listen to some very interesting ideas around how to build a complete data stack and what kind of issues you might encounter and also some possible
Starting point is 00:01:57 solutions to them. So hi Alex, thank you so much for your time today. It's a great pleasure to have you here. And yeah, we are here today to discuss about Mattermost, you, and learn more about what you are doing with analytics and your overall experience, of course, and especially what you are doing at Mattermost. So, would you like to start with giving like a brief background, introduction about yourself and also say a few things about Mattermost?
Starting point is 00:02:29 ALEX DOVENMILLEN- Sure. So my name is Alex Dovenmill. You'll have to find my name on the internet to see how to spell it. My background is a computer science degree in college. I went to UNC Charlotte, which is in North Carolina. And for a long time, I just worked, you know, around Charlotte. I did back end, front end, everything.
Starting point is 00:02:57 I did, you know, VB.net, C Sharp, Ruby, Python. I've kind of done all sorts of stuff. And then about five years ago, I started a job at Heroku, which is a Salesforce company. It's sort of used to be bigger back in the day, I guess, but, you know, platform as a service type company. And I joined on the vault team,
Starting point is 00:03:26 which was essentially the team responsible for the systems that handled billing the customers. And then our group was called business operations. And as part of that group, we also ran the data warehouse and the analytics for the company, for Heroku itself, I guess, as a business unit. And that's kind of where I started to really get into a lot of the data engineering.
Starting point is 00:03:55 And I'll say big data, but it wasn't really that big. I always say when people say big data, because big data to me is like petabytes, like Google, Facebook scale kind of stuff. Yeah. I mean, I think we were using Redshift at Heroku, and it was maybe, I think we were getting up into the 30 terabyte range or something, but it's not like it's that big. Come on, guys. Anyway. Yeah. So really learned a lot there um and i was actually switched to like an actual data engineering
Starting point is 00:04:35 role about two years into my time there and um i got them onto airflow i got them onto Airflow. I got them onto DBT. And we actually, at that company, we built our own. We were using Segment at the time at Heroku for user analytics. And we wanted to get away from Segment. So we actually built a homegrown analytics pipeline, essentially, using Amazon Kinesis, which worked fairly well, and actually
Starting point is 00:05:16 had a lot of traffic going through that. But yeah, so then, about six months ago, I left Heroku and started at Mattermost. When I started at Mattermost, they kind of had a data warehouse, they were using Redshift, but, and they were also using Chart.io, which is sort of like you know business intelligence kind of tool like looker or whatever but they really didn't have any actual like what I would consider data engineering things set up and really the warehouse that they had at the time only had essentially user and in our case server analytics but they weren't pulling anything in from
Starting point is 00:06:06 salesforce or zendesk or any of the other myriad of tools that we had and i came in and built a data engineering infrastructure with it's on a amazon eks so it's on Amazon EKS, so it's Kubernetes. It uses Airflow, dbt, and then we're actually using Looker at Mattermost. But that's kind of the, I'll say, that was sort of a long-winded introduction, but we'll just go with it, that's it. Yeah, yeah, I mean, we will have the opportunity
Starting point is 00:06:43 to dive deeper about on the architecture that you have there and how the overall setup is and why you decided to go this way. But it's very interesting what you said and a couple of questions on that. First of all, going from Heroku to Mattermost and it's been quite some time between these two companies. How you've seen things progress and how they have changed in terms of data engineering and the role of the data inside the company and what are the differences that you have experienced as a data engineer going from Kuroko to Mattermost? Well, I think we've tried to take the best of what we learned at Heroku to Mattermost.
Starting point is 00:07:32 And really, the overarching goal is get all your data from all the different places into one warehouse, in our case, Snowflake, and then use a tool like dbt to make sense of all of it and kind of aggregate it up to the levels and stuff so that when you present it in Looker, you can not only know that you're presenting accurate data, but you're also really enriching all the data that you do have and making all the little connections between all the data to really unlock the power of it. Because it doesn't exactly help you if like, I can go log into Zendesk and see this.
Starting point is 00:08:15 I can go log into Salesforce and see that. Being able to combine it and see it all in one place makes it a lot more valuable, in my opinion. Yeah, that's very interesting. And I think it's even more interesting when we see how a common pattern, what you're describing is even between so different companies as Heroku, which is like an infrastructure company, and Mattermost is around chat application, but still the need to collect all the different data from there and put it in one place and try to unlock the value out of it remains
Starting point is 00:08:51 the same. That's great. So moving forward, can you give a little bit more color around the stack that you currently have? What kind of technologies? I mean, you have mentioned some of them already. But give some more information about the whys behind you chose these. And also, it would be interesting to hear
Starting point is 00:09:12 about how these tools also changed from your experience this past few years, starting from Heroku to today, Mattermost. Yeah, so let's see where to start. I think we'll start with Airflow. I did mention that we were running it on EKS, Amazon EKS with Kubernetes. And so Airflow, if you don't know,
Starting point is 00:09:37 is really a job scheduler and runner at its most basic, but it has this concept of DAGs, which are directed acyclic graphs, and you essentially can do lots of fancy things, like say, once these two jobs run successfully, then run this next job, and then it also
Starting point is 00:09:58 has a lot of stuff built in with automatic retries, it'll alert you if anything fails, it logs the logs of a specific job in a specific place. So the UI is kind of nice to be able to make sure that everything's running the way that you think it should and being able to monitor everything and all that kind of stuff.
Starting point is 00:10:19 So Airflow was something I discovered three years ago. I know it's a lot older than that. I think it was originally an Airbnb tool, if I'm not mistaken. Yeah, I think so, yeah. Yeah, and now it's like an Apache open source project. So yeah, that was definitely like day one, I was like, we definitely need Airflow. Like you're just going to have to have it.
Starting point is 00:10:45 So built that out. And one of the we definitely need Airflow. Like, you're just going to have to have it. So built that out. And one of the interesting features that Airflow came out with, I can't remember exactly what release it was or even how long ago, but it's fairly recent. Each job in a DAG is called an operator. And they came up with what they're they call the kubernetes pod operator and what it essentially does is it runs the job in it spins up a kubernetes pod with whatever image that you tell it to and then it runs the job in that pod now what's really cool
Starting point is 00:11:20 about that is now you have decoupled the scheduling and operation of just running the jobs with the actual what is running the job. What does the job actually do? And what's cool about it is Airflow is written in Python, but because you're using a Kubernetes pod operator, you could run a job in any language that you want, because it just needs to be some container, like a pod running in a container or whatever, to run it. So you could have, if the rest of your company's on, you write GoLang or maybe Haskell, I don't know, whatever craziness you want to try,
Starting point is 00:11:59 it doesn't matter, with the Kubernetes pod operator, you can do it. One thing I will call out is that I'm assuming most people have heard of GitLab, but if you haven't, it's sort of like an open source, open core kind of competitor to GitHub. But they actually open sourced all of their data engineering stuff, just like we have at Mattermost. But I sort of took some of the patterns that they were using to get the stuff set up.
Starting point is 00:12:38 So shout out to them for all that stuff, because it's all pretty nice. Yeah, so that's Airflow. Next tool to probably talk about, maybe we'll talk about Snowflake, I guess. So Snowflake is a kind of a data warehouse tool, essentially a database columnar data store. Kind of competes with Redshift. I don't even know what the rest of them are.
Starting point is 00:13:08 I'm sure there's others out there. Yeah, BigQuery. Yeah, BigQuery, right. Now, the big difference between Snowflake and Redshift is that the way that Snowflake works is the storage is decoupled from the compute. And that's really the key to why Snowflake, in my opinion, is really nice. Because it allows you to scale your compute very easily.
Starting point is 00:13:33 Whereas if you're using Redshift, and we used a lot of Redshift at Heroku, if we needed more space, we'd have to add more nodes to our cluster. There wasn't any way to really just add compute if you needed it. Now, I do know that Redshift has been starting to add features like that, but I haven't personally used those.
Starting point is 00:13:58 And honestly, Snowflake's been pretty nice. We use a lot of Snowflake's been pretty nice. You know, we use a lot of Snowflake. And then on top of Snowflake, so for our actual, like, data visualization and sort of business intelligence tool, we're using Looker. And that's mainly because, actually, when I joined Mattermost, I actually joined with somebody else. And basically four people from that business operations group at Heroku, four of us came over to Mattermost within two months of each other. And so we purposely did that. But anyway, we used a lot of Looker at Heroku as well. So it was kind of a tool that we're comfortable with.
Starting point is 00:14:46 And the other nice thing about Looker too, is that we actually have, Looker uses a, I'll call it a language, called LookML for defining your models and views and explorers and all this stuff. And Mattermost, we're actually, we have our LookML for everything that we do in Looker is also open source.
Starting point is 00:15:10 So you can go find that as well, which is kind of cool. Another nice thing about being sort of open source about stuff. The next one I'll talk about is dbt, which stands for data build tool. The purpose of dbt is to essentially take care of all your data transformation in your data warehouse. It's just super good at what it does. and it really makes it easy to build models
Starting point is 00:15:46 that are not only accurate, but also they're just easy to build, and it's easy to figure out what's going on. It also has this cool, has this docs site that it will generate for you, and you can go see, let's say you have this table that's been aggregated from 10 different tables,
Starting point is 00:16:12 and all this stuff has gone to make this one big aggregated table. It'll actually show you the data lineage of where all that data came from, like how is this thing actually calculated, which makes it a lot easier for somebody coming in new to understand what's going on, like how did this data get calculated, and all that kind of stuff,
Starting point is 00:16:34 where if you just have a bunch of crazy SQL all over the place, it's just impossible to figure out where it's coming from. Yeah, actually, from my experience with dbt, I think what they have managed to do in an amazing way is that take all the good practices that engineering in general has and that always were missing from when you had to work in a database environment with SQL
Starting point is 00:16:58 and actually apply them. So you have like, you can version your models. You can roll back and do tests and stuff like that, that we were always discussing about to do that on the database level, but it just wasn't there for whatever reason. And I think that dbt managed to do an amazing job on that. A quick question. You mentioned LookML and dbt.
Starting point is 00:17:29 And just out of curiosity, because you are using both, and I think this is interesting. I mean, there is some kind of overlap between these two. How do you separate these two tools and how do you use them inside MotherMost? Yeah, that's a good question. So to me, the point of DBT is to take all the raw data that you have and jam it together in a way that's accurate.
Starting point is 00:18:03 And like you said, you can do the testing and all that stuff in a very... It has some actual engineering discipline to it. And then we generally only build stuff in the LookML, the views and the models and all that stuff in Looker on top of those already pre-built models from dbt that we made. And then we just start showing those ones in Looker because that way we know, okay, we've verified this model that dbt made,
Starting point is 00:18:41 like this table really, has all this stuff. It's accurate. It's what we want, and then we can show it, use Looker to let people explore the data, and that keeps out a lot of the weird things where if you have people in Looker and they don't necessarily understand exactly what the data means, and then they start making weird explorers that try to map data together that doesn't really, you know, map the way that they think it does.
Starting point is 00:19:13 So it just makes it a lot cleaner and it makes a lot, I think it makes it more approachable for the users and looker. Like you don't want them to have to care or know about the data model at like a super deep level. You just want to give them to care or know about the data model at, like, a super deep level. You just want to give them easy tools where they can just, like, oh, I just want to see this, this, and this, and, you know, give me some group by aggregate sum or whatever. And allow them to do that without, really, without them having to come to you and asking questions about, what does this data mean? That's always the, you know, I mean you get you're gonna get those regardless but trying to minimize you know giving them the giving them the power like empowering them to do stuff is it just makes it more scalable and
Starting point is 00:20:01 that's really what you're going for yeah Yeah, I think there is a good... Because you mentioned earlier about Snowflake, about the power of decoupling the storage from the processing. I think something similar is also happening with these tools where we see that the modeling of the data is actually decoupled from the actual visualization and modeling of the data is actually decoupled from the actual visualization and working with the data and trying to come out with data and all that stuff.
Starting point is 00:20:32 And I think that this kind of decoupling will appear more and more in anything that has to do with data. And I think that it's a very powerful paradigm and we can see it manifesting itself in different ways. And I think this is also what is happening with products like Looker, LookML, DBT, and all that stuff. I mean, this whole industry is still shaping because it's still a work in progress. But I think I'm just kind of decoupling.
Starting point is 00:20:58 It's what actually is moving forward in the industry right now. Cool. So anything else around your data stack that you would like to mention? Oh, yes. There is one more. It's called Rudder Stack. Like Heroku, when I joined Mattermost, they were using Segment. And I think it was about two years ago,
Starting point is 00:21:24 and two years ago at Matter most when I wasn't there. They started getting billed like an insane amount per month because they were going like way over their usage on Segment and they essentially ended up turning off all except 2% of the events that they were sending to Segment. So essentially, and I mean, you know, losing 98% of your data, like just gone, because you can't, you know, deal with Segment paying that much.
Starting point is 00:22:01 So when I came on, I was like, well, I've already gotten away from segment once. I'm going to go ahead and do it again. Not that I don't think segment is like a, I think it's fine if that's what you're looking for, but it was never what I was looking for. So I wanted to get rid of it as soon as possible. And so as part of that, it was actually during my interview with Mattermost,
Starting point is 00:22:29 the CEO Ian was like, Oh, hey, have you heard this thing called Rudderstack? I was like, Oh no, I hadn't heard of that. Let me go check that out. And it's like, Oh, it's this open source second alternative. I was like, yes, this is exactly what we need. So starting about, I think, in March, I want to say, we started this project of getting rid of segment
Starting point is 00:22:55 and replacing it with Rutter Stack. And from a code perspective and everything like that, it was a really simple change. Like, it's not that complicated to, you know, replace segment with rudder stack. It's really quite simple. But it was more of a, it's more of like, what's all the downstream effects and like what's the what else can we do with rudder stack that we weren't able to do with segment because we had such uh limitations on the amount of data we could send so um so like i've implemented or i've had implemented that's more of like a project manager on this project
Starting point is 00:23:45 more than actually doing the code. But essentially, Mattermost now has RutterStack implemented on the server, the web app, the mobile app, the plugins. On Mattermost.com, we're using RutterStack because they're also using Google Analytics on there, but Google Analytics only gives you aggregated views of your data, whereas Rutter Stack, not only can you have custom events that you can just trigger in JavaScript, they click this button, they do that, whatever, but it also does all the page tracking and everything else for you and so having that all go through like the same tool you know it integrates with snowflake just
Starting point is 00:24:32 fine and i know rudder stack has a bunch of other connectors to different data destinations or whatever but obviously i just care about snowflake um so getting all that raw data into Snowflake where now you can really start telling the story of, and this will, I'll just, some foreshadowing. Really get the story of the customer journey with RotorStack. And the other thing that's been really nice with RotorStack is now that we have no real limitations on how many events that we can send,
Starting point is 00:25:07 now it's like, oh yeah, just add an event for that. Why not? It's not going to hurt anything. You want to be a little careful on exactly what events you add and all that kind of stuff so you're not just plowing a bunch of data in there that doesn't mean anything. But if it's a meaningful event, then, you know, it's just opening up a whole new world for, especially the product managers that matter most.
Starting point is 00:25:33 But yeah, Rosh. Yeah. So from what I understand, you're using RouterStack to capture all the interactions that your customers have on many different touch points with them, right? So it's not just your website, it's like many different touch points that you have. And then these data are pulled into Snowflake and from there, dbt is used to do the data modeling and then LookML and Looker are used like for visualization and diving deeper into the data.
Starting point is 00:26:06 Because you mentioned at the beginning that a very important goal in building this kind of analytics infrastructure is to collect all the data into one place. We talked a lot about the customer events and the customer-related behavioral data. Are there other data sources that you pull? And if yes, how do you do it and what kind of tools you are using for that? Yeah, great question.
Starting point is 00:26:39 So actually, one thing, this is sort of tangential, but because Mattermost is a very privacy and security focused company um we actually the data that we're sending to rudder stack is actually sort of there's no pii in it if you will um we're not sending email addresses or anything like that to it it's literally just the you know like if you're on the web app or whatever and you're just chatting, all that gets sent to rudder is like the internal user ID that the server identifies you as and like the server ID, just so we can have that kind of stuff, which is kind of an interesting take on some of that stuff because, you know, like cough, cough,
Starting point is 00:27:17 Facebook is trying to, you know, figure out exactly what you're doing on everything at all times, listening on your phone. I'm sure All that crazy stuff so anyway, that's sort of an aside, but yeah, so as far as like You know getting all the data, and then how do we like put it together? So not only are we getting all the rudder stacks So that's like you said like you know what kind of website stuff like web traffic and that kind of things But also like in product user events and even server telemetry.
Starting point is 00:27:51 But then also we've got, we use Zendesk for support, Salesforce for sales. They're using Marketo to do like marketing type stuff. Those are pretty much the main ones. And one thing that we do, and that's one thing that we kind of brought over from Heroku, is we're using a tool called Heroku Connect, which allows you to have a bidirectional sync between a Salesforce instance and a Heroku Postgres database.
Starting point is 00:28:22 And what that allows us to do is not only can we read the data from Salesforce and get it into Snowflake, but then we can also write back to Salesforce. And what that's really helpful for is that, like, the salespeople and, you know, maybe solution architects, sales engineers or whatever, they live in Salesforce all day. You don't want to have to force them to go to some other system, like even Looker, really. You want the data to live in there. So what we do is we'll generate these data points about, let's say, like an account, you know, like some sales guy is trying to sell
Starting point is 00:29:00 to some account. But maybe we'll sync some data that, you know, like how many users do they have or how many active users have they had in the last week or something like, just something to give them a little bit more context into what's going on with their customer. But we are using Stitch data to sync some of the data from the various data sources.
Starting point is 00:29:22 So like Google Analytics, Zendesk, Jira, but yeah, Jira, I forgot about that one. So Mattermost uses Jira for all of its internal project management, and we're actually pulling that data into the data warehouse, and then we can actually use that
Starting point is 00:29:44 data to give, you know, like, say the VP of engineering a view of, like, how are these different teams doing, and how many tickets are they doing, and all this kind of stuff. So you can kind of even do, like, performance metrics with this data. Yeah, so that's sort of, so it's like, get all the data in with Airflow or Stitch data or Rudder stack, and then use dbt to transform it, then Looker to visualize it. So that's the way we do it. Yeah, sounds great. I mean, it sounds like you have managed to build these single source of truth
Starting point is 00:30:22 around your data and pulling all the different data that you need in one place. Is there something missing? Is there something that data sources that you don't touch yet and you are considering to do it in the future? Do you think that's something missing from the data that you need? Or now it's more about focusing on working on the data and creating things like... And we are going to talk more about that later, but the customer journey that you mentioned. Yeah.
Starting point is 00:30:53 So, yeah, I think it's both, really. One of the challenges we have with the way that Mattermost as a product is distributed, because we don't currently have a SaaS product, is that people have to upgrade their own servers. And most people don't. So, like, when I say, oh, we just released RudderStack and, you know, the 5.23 release that we released, like, a month ago,
Starting point is 00:31:20 like, most people are not on that one. And so that makes it a challenge when you're trying to add some of these telemetry items. You know, if somebody's not on the right version where that telemetry is even implemented, you're just not getting it, and so you don't know. And so I think that's sort of the biggest challenge that we have, and really until we have a SaaS product or somehow make upgrading your server brain dead easy, which I think is kind of tough in itself.
Starting point is 00:31:53 Like, you know, we're always going to be fighting that essentially. Yeah, so like I said previously, when we moved, or like two years ago at Mattermost, they turned off all but 2% of those events coming into segment. So now that we've turned them back on for Rudder, with Rudder stack, now we have to go actually figure out what this data all means because we didn't really have any of it.
Starting point is 00:32:21 So we didn't really know how to model it and do this kind of stuff. So now that the Rudder stack release has been in the wild for a little while, we're finally getting enough data to where we can start we figured it out and we're modeling it, aggregating
Starting point is 00:32:35 it up so you can visualize it in Looker. And then to the customer journey piece is now that we have now that we're using RudderStack across all these different properties, that's if a user visits Mattermost.com, let's say they go to some blog post, or maybe we'll host this podcast, and they went there, right? And then, you know, they're reading around,
Starting point is 00:33:13 and they see all this stuff, and they're like, oh, I'd like to, like, buy that or do a trial or whatever, right? And then you can see, like, okay, they downloaded this trial, and then, well, let's take the version where they actually buy it they'll go to our customer portal which again uses rudder stack so now we can see you know what they're doing in the portal
Starting point is 00:33:33 then they buy it and then we can track which license kind of like a license ID that they're using for that server and so we can say like okay okay, they went to this podcast, they looked at this blog, you know, they read our mattermost.com stuff, they went to the portal and bought the product,
Starting point is 00:33:55 then they actually started using the product. And then, you know, here's how they're using, you know, you actually know how at least that server is using the product, which can really help, I think, you know you actually know how at least that server is using the product which can really help i think you know what that enables you to do is not only like you know i mean you can start doing crazy things like trying to a b test some marketing site um which is something we're planning on like you know and start answering really detailed questions about like, of these people who visited like, you know, some certain page on mattermost.com. Like, not only did they even buy the product, but how did they use the product once they did start buying it? And is there something about like, those set of users that we should be thinking about from, I don't know, a marketing perspective, a sales perspective?
Starting point is 00:34:44 Should we be building more features for these people? Like, all this kind of stuff sort of allows you to really, really unlock the value of that data to a level that, I mean, we never, I don't even, like, we didn't really even get there at Heroku, to be honest. So it's kind of, I don't know, it's like we're going to have it. So, I mean, there's kind of heat that matter most, like we're gonna have it. So I mean, there's a lot of data modeling and like, you know, the stuff goes into it. So we're not totally there yet. But that's sort of what we're moving towards. Yeah, the feeling that I get personally, and I
Starting point is 00:35:19 would like to hear your opinion on that: one of the first problems that data engineering had to face was getting access to all the relevant data, right? Getting access to the data, collecting the data consistently. This is a pretty hard problem, actually, also from an engineering perspective. And you can see that in all these very complex platforms
Starting point is 00:35:40 like Kinesis and Kafka; it's not an easy engineering problem to solve correctly. But for all these years, it was all about how we are going to collect the data and put it in one data warehouse, and then, okay, all the things that we can do in the data warehouse with this next generation of warehouses
Starting point is 00:35:59 like Snowflake. And it seems like even when we solve these problems, there are more problems coming up. And I think that the next cycle will be more about the lifecycle of data, how you track the changes that are happening there. And I think what you said was very interesting, how you can track, for example, events that are coming from many different versions of the product out there. And I mean, it might be more profound in your case, because you have many installations and not all people are updating.
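As a rough illustration of the version-skew problem Kostas raises here, a pipeline ingesting events from many self-hosted servers on different product versions has to tolerate fields that appear, disappear, or get renamed across releases. A minimal Python sketch of one way to normalize such events; the event shapes and field names below are hypothetical, not Mattermost's actual telemetry schema:

```python
# Hypothetical example: normalizing telemetry events emitted by servers
# running different product versions into one common record shape.

def normalize_event(raw: dict) -> dict:
    """Map a raw event from any known schema version to a common record."""
    version = raw.get("schema_version", 1)  # oldest servers predate the field
    if version == 1:
        # v1 (hypothetical) used a flat "user" string and a "ts" epoch field
        return {
            "user_id": raw["user"],
            "event": raw["event"],
            "timestamp": raw["ts"],
        }
    elif version == 2:
        # v2 (hypothetical) nested the actor and renamed the timestamp field
        return {
            "user_id": raw["actor"]["id"],
            "event": raw["event"],
            "timestamp": raw["sent_at"],
        }
    raise ValueError(f"unknown schema_version: {version}")

events = [
    {"event": "post_created", "user": "u1", "ts": 1596240000},
    {"schema_version": 2, "event": "post_created",
     "actor": {"id": "u2"}, "sent_at": 1596240060},
]
normalized = [normalize_event(e) for e in events]
```

The point of the sketch is that the normalization layer, not the downstream models, absorbs the version differences, so old installations can keep sending their old event shapes.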
Starting point is 00:36:33 But I think that as we move forward, these problems will become even more important. And it will be, I think, very interesting to see how the industry will respond to that and what solutions will come up around this. So moving forward, we discussed a lot about your infrastructure, and it sounds like Mattermost is a very data-driven company. You touched on some of the use cases of what you are doing;
Starting point is 00:37:00 can you expand a little bit more on the role of data analytics and data in general inside Mattermost and how the company is using this data? Yeah, yeah. Good question. So, I mean, there are so many aspects to the stuff that we do, so I'll just touch on a few. One of the first things that we actually built, and this was before RudderStack, and really it was just
Starting point is 00:37:32 based on the Salesforce data, but really we were trying to provide a way for the executive level to track financials, and do financial forecasting and modeling and that kind of stuff,
Starting point is 00:37:48 and have that all in Looker so that they can see it. And what we actually have now is sort of a health-of-the-business Looker dashboard that has all these metrics we've defined and populated: financial numbers, like how much revenue is coming in, all that kind of stuff. But then also, how's our support team doing? Are we getting good feedback from the tickets or whatever?
Starting point is 00:38:21 And then also, are more people installing Mattermost? Are more people going to the website? All these really top-level things that they can get a view of, so that they don't have to spend hours clicking around trying to find where this data is, as they would if we had all these dashboards separate or whatever; instead we're providing that data to them
Starting point is 00:38:44 in just one place. And then actually we even built a board-level view, you know, because it's a venture-backed company. You have these guys from Battery Ventures, I should really know these names better, but, you know, the investors, right, they want to see how their investment is doing. And by providing them with a board-level dashboard that's even higher level,
Starting point is 00:39:12 I think the board members are probably like, oh, this company knows what it's doing, they're pretty mature in how they do this stuff. So I think that's pretty cool. But yeah, and now that we have access to all this RudderStack data, we're able to answer so many questions for the product managers that they just didn't have answers for before,
Starting point is 00:39:36 because the product managers want to make the highest-impact and best changes to the product: to either add a feature that we don't have that, you know, Microsoft Teams or Slack has, or just make an existing feature better. Like, for instance, they're working on a project to revamp how threads are done. And threads, if you don't use Mattermost, it does threads a little bit differently than Slack,
Starting point is 00:40:10 where Slack has the collapsed threads and then you have to click it and it's over in the sidebar. Mattermost just has inline threads so you can't collapse them. And that's always been the biggest thing when people switch from Slack. They're like, what are you guys doing? Why is this thread stuff so stupid?
Starting point is 00:40:28 But the product manager for that, he wants to make sure that he's building threads, not just to copy Slack, just so people stop complaining about it, but how do people actually want to use threads? And so we're able to provide him with a bunch of
Starting point is 00:40:44 really specific data points on how people are actually using threads: how many concurrent threads are going on in some of these user chat rooms, how many messages on average are posted under a given thread, and so on. And so he wants to use all this data so that he can make a better decision on which way we actually go with this threading thing. And I think we're seeing that from really all the product managers at Mattermost, where they've been so data starved, and now with RudderStack we really have the view of, hey, how are people actually using this stuff, and how can we make it better? So that's the one I'm kind of most excited about.
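The thread metrics Alex mentions (average messages per thread, concurrently active threads in a channel) could be computed along these lines. This is only a sketch over a made-up message table, not Mattermost's actual analytics code:

```python
from collections import defaultdict

# Hypothetical message records: (thread_id, channel_id, timestamp)
messages = [
    ("t1", "c1", 100), ("t1", "c1", 110), ("t1", "c1", 125),
    ("t2", "c1", 105), ("t2", "c1", 140),
    ("t3", "c2", 200),
]

# Average number of messages posted under a given thread
per_thread = defaultdict(int)
for thread_id, _, _ in messages:
    per_thread[thread_id] += 1
avg_messages = sum(per_thread.values()) / len(per_thread)

# Threads active in a channel within a time window ("concurrent" threads)
def active_threads(channel_id: str, start: int, end: int) -> set:
    return {t for t, c, ts in messages if c == channel_id and start <= ts <= end}

print(avg_messages)                         # 6 messages / 3 threads = 2.0
print(active_threads("c1", 100, 130))       # t1 and t2 both have posts in range
```

In practice these aggregations would run as warehouse queries over the collected event data rather than in-process Python, but the shape of the metrics is the same.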
Starting point is 00:41:37 Yeah, sounds great. I mean, it sounds like we are talking about a company that is pretty much data-driven; almost every aspect of the company runs on some kind of data. And it sounds like the industry and the tools out there are mature enough at this point to enable these kinds of use cases, from the top leadership down to the product manager who's going to use the data to drive their decisions. And that's very interesting to hear, because it's pretty common to hear about data
Starting point is 00:42:12 in very specific areas of the company, like marketing, because of course marketing was one of the first functions in the company to rely on data. But pretty much, if you want to be competitive today, you need to utilize the data you have in everything that you do. Yeah.
Starting point is 00:42:31 That's good to hear that you are doing that at Mattermost. All right, so we are very close to the end of our discussion, so one last question. What is next for data analytics inside Mattermost? What makes you excited about it? It sounds like you have all the data there now to build some very interesting internal and external products. So yeah, what makes Alex excited? Yeah, so I think for me, it's really all that customer journey mapping stuff.
Starting point is 00:43:08 To really have that end-to-end view of things and really make sense of it for people so that non-analytics people can understand what we're showing them. Just being able to do that stuff, I think,
Starting point is 00:43:24 is going to unlock a lot of stuff in that direction. And I mean, you know, we're on the way to it; we're not there yet. I'm going to write some blog posts and stuff once we actually get it up and running,
Starting point is 00:43:39 because I think it'll be really cool. Yeah, sounds great. I'm pretty sure we'll have more opportunities in the future to discuss more exciting things that you will be building at Mattermost. So, Alex, thank you so much.
Starting point is 00:43:59 It was extremely interesting for me to hear what you are doing there. I hope you enjoyed your time, and thank you for the conversation. I'm looking forward to chatting again in the future. Yeah, absolutely. I appreciate you having me. So that was it with Alex. I think it was very interesting. We touched, as we said, on many technical details.
Starting point is 00:44:21 And we covered the whole data stack that they have. A very interesting takeaway from this conversation is how many different moving parts a data stack has, how complex it can become, how important it is to have the right tool for each job in this data stack, and how difficult it is to maintain it and actually deliver the data and enable everyone inside the company to use it, even if you are using all the current best practices for building a data stack and operating it.
Starting point is 00:44:55 Yeah, I agree. I think it's really interesting that just having the data now across teams, in a place that is usable across teams, is opening up all sorts of new opportunities that are going to reach into other departments at Mattermost, if you think about marketing and sales and really empowering them with data, much like they're doing with the product. So it'll be exciting to touch base with Alex in the coming months and see how the data spreads across the organization. Until the next one, thanks for joining us on the Data Stack Show, and we'll catch you on the next episode.
