The Data Stack Show - 21: Data Integrity and Governance with Patrick Thompson and Ondrej Hrebicek from Iteratively

Episode Date: January 20, 2021

On this week’s episode of The Data Stack Show, Kostas and Eric are joined by the co-founders of Iteratively, CEO Patrick Thompson and CTO Ondrej Hrebicek. Iteratively helps companies know that their data can be trusted by helping capture clean, consistent product analytics. Today’s conversation digs into what goes on behind the scenes at Iteratively and how trust in data can help accelerate the velocity of an organization.

Highlights from this week’s episode include:

- Patrick and Ondrej’s background and the biggest problem Iteratively addresses (2:50)
- Why some companies still use spreadsheet schema management and the potential pitfalls they’re setting themselves up for with this (4:39)
- Defining schema in the context of data (7:02)
- Viewing the process as a team sport (11:34)
- Identifying common mistakes and implementing best practices (13:46)
- A walkthrough of Iteratively (17:13)
- Utilizing a JSON schema format (26:58)
- Laying Iteratively on top of or integrating it with an implementation for analytics (30:36)
- Entry point into organizations (33:02)
- Organizational change and velocity realized after implementing Iteratively (36:04)
- What’s next for Iteratively? (42:47)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome back to the Data Stack Show. I have actually been spending time creating content around our data governance API at RudderStack, and that makes me really excited to talk to our guests today. We have Patrick and Ondrej from Iteratively, and they provide really cool tooling around data governance. I think a couple of the things that I'm interested in are, first, how the tool interacts with current analytics setups. You know, whenever you talk about data governance and sort of adjusting the way that you're doing things, you run into instrumentation problems. So from a practical standpoint,
Starting point is 00:00:45 I'm just interested to know how they handle that. And I'm also interested in the fact that there are probably a couple of different people that they serve within an organization, right? Data governance, as we've heard before in our conversation with Steven from Immuta, really crosses the organization across many different roles and teams.
Starting point is 00:01:03 So those are my two questions that I want to make sure I ask. But Kostas, what are you thinking about in terms of data governance and the Iteratively team? I'm very excited to have them today on our show, mainly because, as we have seen in the past, data governance is a very big thing. There are so many different things that need to happen in order to implement data governance. We talked about access control with Immuta, and now we are going to talk more about data quality. And that's something very interesting for me from a product perspective, but also from a
Starting point is 00:01:38 technology perspective, because a very fundamental part of data quality is how we can describe our data, how we can attach syntactic and semantic meaning to this data, and how we can track changes to that. And also, one more, how we can connect this syntactic and semantic meaning with business goals, because we don't do technology for technology's sake; we do technology because we try to achieve something, right? So I think it's going to be very interesting, both from a product perspective and also from a technology perspective, to see what kind of technologies they use, how they represent this information, how they track this information, and also how, by implementing all that stuff, the organization gets value at the end. So let's dive in and let's start with
Starting point is 00:02:25 them. Sounds great. Let's do it. Patrick and Ondrej from Iteratively, welcome to the show, gentlemen. Eric, thanks for having us. Definitely excited to be here today. Good to be here, Eric. Great. Well, why don't we do just a quick intro? We like to start, if each of you could just give a brief background and then just tell us what Iteratively is and the problem that you're solving. Perfect. Yeah, I'm happy to start on our side. So my name is Patrick. I'm one of the co-founders and CEO of Iteratively. I've been working on Iteratively with Ondrej for about the last two years. Before that, I was on the growth team at Atlassian for four years, and before that, I had the opportunity of working with Ondrej at his startup. At Iteratively, we're solving one of the biggest problems that we heard from six
Starting point is 00:03:08 months of customer discovery with different software teams, which was companies not trusting the data that they're capturing, primarily because of human error. We solved that by really trying to centralize the tracking plan within these organizations and making it really easy for teams to collaborate and schematize the data that they're capturing. And yeah, I'm Ondrej. Nice to meet everybody. I'm the CTO and co-founder at Iteratively. I run the product team here. Like Patrick mentioned, before that I was a co-founder of a company called Syncplicity, also in the data space, and prior to that I was at Microsoft. So we have lots of questions about the product. And actually, Patrick, you mentioned the word trust, which we had talked about with a previous guest around data and data governance. So I definitely want to dig into that. And Kostas has a bunch of questions. But I have just a question coming from my background, because I've done the sort of spreadsheet schema management thing for many years. And when we started talking with you about being on the show, I just couldn't help but think that it's crazy that it's now 2021, I guess.
Starting point is 00:04:14 And still so many teams are doing the shared Google Sheet schema management thing, which just seems so wild to me because, I mean, it's primitive really with how advanced we can get with software. But you talk with customers who are using your product every day. Why are we in a place where companies still haven't moved past the spreadsheet for this? Yeah, no, great question. I think really, generally speaking, we were actually surprised by this as well. I mean, we spent a ton of time interviewing these companies and the pain came up time
Starting point is 00:04:47 and time again. And yeah, I mean, it was either a Confluence page, a spreadsheet, a Notion page, you name it. But the reality is that teams just didn't have good tooling available to solve this. And the problem becomes so acute at some point as you grow, and the data is revenue-driving at the end of the day. So a lot of teams are solving this in-house by building out their own internal tooling, but the vast majority of teams out there today are, yeah, simply relying on a spreadsheet or nothing at all, which inevitably
Starting point is 00:05:13 leads to a lot of human error within the process. I think it generally just comes down to a lack of knowledge or foresight. Once you've been bitten by this before and have had to suffer the consequences of bad data, it's definitely something that you look to solve during implementations and as part of your process moving forward. But yeah, most companies don't intuit that there's a solution beyond a spreadsheet for documenting and collaborating around their analytics. Yeah. Yeah. It's, I mean, I've seen situations where it gets so bad that you literally just start over
Starting point is 00:05:45 because fixing it is way, way harder than just starting with a clean slate and doing it right from the get-go, which is pretty wild. Yeah, 100%. We've talked to a lot of companies. I think Airtasker comes top of mind for me, where they had to pause their entire development roadmap for six weeks to, unfortunately, kind of throw the baby out with the bathwater and re-implement and re-architect their data model from scratch, because none of the data that they were capturing was reliable. And it was a huge, huge shift for
Starting point is 00:06:15 them, but definitely something that, you know, when it comes to what's valuable for the business, data is kind of the lifeblood of most organizations these days. So it was definitely worth the investment for them. A quick question. I think before we move forward and dive deeper into the product, let's discuss a little bit more what schema is. I mean, I know that among us, it's a term that is very easy to understand and communicate.
Starting point is 00:06:40 But schema is something that can mean many different things in technology and in data in general. So what is schema? What do you consider as the data schema in your case? Great question. I'll actually pass this puck to Ondrej for helping kind of define some of the personas that we typically work with and then how we think about schema in the context of data. Yeah, definitely, Patrick. It's interesting because the word schema really isn't used very often in this particular space. People call this definition of analytics data a tracking plan, a data plan, a measurement plan. There are all sorts of terms, but schema usually isn't used. And we actually don't use it that often ourselves either, for that reason. At the end of the day, it is schema, and it's actually represented as
Starting point is 00:07:25 schema under the covers. But what it really is, is the structure and the definition of the analytics data that you want to capture and send to your analytics destination. It's the names of the events, the internal structure of those events, such as the attributes or properties that are attached to those events and the types of those properties. Are those properties numbers or strings or true or false values? What are some of the restrictions or rules on those values of those properties? That's all embodied in this so-called schema in order to not just define what the structure is, but then potentially enforce it when the data is actually collected and do some other interesting
Starting point is 00:08:05 things that we'll talk about a little bit later, such as generating code that matches that schema and helps developers instrument analytics. That's very interesting, because for everyone who is involved in technology in general, the term schema is usually associated with something like a database, which in your use case is probably, let's say, the destination where the data is delivered. But what we are talking about here is actually how we can have a schema and how we can enforce or monitor the schema at the source, where the data is created, generated, and captured, right? Yes.
Starting point is 00:08:40 And that's a bit of, let's say, a shift in terms of the perception of where the schema is implemented. So what's the value of doing that at the source? Why is it important to do it there, instead of just doing it at the destination, in the database, and finishing the work that we have to do around the semantics and the structure of the data there? Right, right. Great question, Kostas. And technically you can, and we as an industry have been doing that for a long time. Companies like Segment have gone through a lot of contortions to make it all work and have the schema catch up with the data that's coming in. And it leads to a lot of problems: data quality and data management problems, and data analytics problems as well, which is really the big reason why thinking about schema and structure way ahead of the
Starting point is 00:09:41 ingestion when the data is actually being instrumented and captured is so important. It means that the knowledge around what is going to be stored in a particular database and what the schema of the database is going to look like is known ahead of time. And the whole team can be on the same page about what's being captured, what is it going to look like when it's persisted in a data warehouse, and then how can we then analyze the data given a structure that we're all behind and all aware of? Yeah, I think the one thing that I'd add also is kind of when we think about how teams operate and analyze data, putting the effort ahead of time to define your tracking plan, define the analytics events that you actually do want to capture
Starting point is 00:10:27 is super helpful versus trying to take a reactive approach to cleaning or capturing data. Being as proactive as you can and solving this at the source is very much a best practice when it comes to actually getting data that you can use and consume across your entire data stack. That's super interesting, guys. Actually, we keep mentioning things about the whole lifecycle of the data, right? We started by talking about storing the data in the data warehouse or the database, and then about the
Starting point is 00:10:57 point where the data is captured. And I assume that there are many different roles and many different people who are involved in this, let's say, data lifecycle. So who are the people who are involved? What are the roles? And at the end, who should care about the schema? And after that, we can discuss a little bit how you see who should govern the schema, who should define it, and who is implementing it.
Starting point is 00:11:25 But I get the feeling that there are many people involved in this. And I think this is quite interesting to learn a little bit more about. Yeah, Kostas, great question. We definitely view this whole process as very much a team sport, generally speaking, when it comes to the definition of analytics,
Starting point is 00:11:42 the instrumentation of analytics, and the validation or verification of the data downstream as well. So that typically involves folks like data analysts, data scientists, and product managers defining what the success criteria are for the features that they're working on or the experiments that they're shipping, and what needs to be captured in order to analyze that effectively; then the engineers, who actually have to write instrumentation code to capture that data as part of their work; then downstream data engineers, who may have to update or maintain the data pipelines or the data warehouse; all the way to actually deprecating the collection of that data, right? So generally speaking, if the data is not being used, how do
Starting point is 00:12:20 you remove that data? Is that data still something that's worth capturing? What is our risk profile for actually maintaining and storing that data long-term? That involves everybody from security, legal, and compliance organizations to, typically, the governors within the bigger organizations that we might be working with. So yeah, it's very much dispersed across the organization. The folks that we work with most often are the product organizations. So yeah, typically your PM, your data analysts, and your engineers, who are really trying to, you know, integrate analytics into their software development lifecycle. Well, but I'm just thinking about the situation that we mentioned earlier around things getting pretty bad, you know, just from a data quality standpoint. What are some of the common things you see? And I'm thinking about our listeners who may be in a role where they're working on this, right? Or this is part of their job. And we hear this all the time with, you know, data engineers or developers, especially at earlier-stage companies,
Starting point is 00:13:28 what are some of the things that you see as best practices, maybe just a couple of practical things for people saying, man, I don't want to get into that place, and I feel like I have the opportunity right now to do something about it. I mean, other than signing up for Iteratively to make the process easier, what are some of the big or most common mistakes that you see? Yeah, not having an owner is probably the biggest one. So having, you know, a central gatekeeper, somebody to own your tracking plan, regardless
Starting point is 00:13:53 of whether that's in Iteratively or you are using a spreadsheet or some other type of solution. Having somebody that really maintains the quality and the understanding of this, who, you know, shepherds that documentation throughout the organization, is definitely something that's very important. And then creating consistent standards around taxonomy. So what do your naming conventions look like? How are you representing this data? Being able to pull in folks like the product manager and the analysts and having a conversation around what we should be tracking. Those meetings are super critical to the success of getting good, clean, quality data into tools like RudderStack or you name it.
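The naming-convention point can be made concrete with a small automated check. This is a minimal sketch, not Iteratively's implementation: it assumes a hypothetical convention of Title Case "Object Action" event names (such as "Song Played") and snake_case property names, and flags anything in a tracking plan that breaks it.

```python
import re

# Hypothetical house convention (an assumption, not from the episode):
# event names are Title Case "Object Action" phrases such as "Song Played",
# and property names are snake_case such as "duration_seconds".
EVENT_NAME = re.compile(r"^[A-Z][a-z0-9]*( [A-Z][a-z0-9]*)+$")
PROPERTY_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_naming(plan: dict[str, list[str]]) -> list[str]:
    """Return naming violations for a tracking plan.

    `plan` maps each event name to the names of its properties.
    An empty list means everything follows the convention.
    """
    problems: list[str] = []
    for event, properties in plan.items():
        if not EVENT_NAME.match(event):
            problems.append(f"event {event!r} is not a Title Case 'Object Action' name")
        for prop in properties:
            if not PROPERTY_NAME.match(prop):
                problems.append(f"property {prop!r} on {event!r} is not snake_case")
    return problems


plan = {
    "Song Played": ["song_id", "duration_seconds"],
    "signup_completed": ["planType"],
}
for problem in check_naming(plan):
    print(problem)
```

Run against this plan, the check reports both the `signup_completed` event name and the `planType` property. The particular convention matters less than agreeing on one and checking it automatically rather than by eye in a spreadsheet.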
Starting point is 00:14:27 And generally speaking, and this goes into a little bit more about how the product works for Iteratively, we tend to view having a single source of truth that is codified as definitely the best practice. So being able to generate strongly typed SDKs that match those conventions is very, very important. And it's something that we've seen other companies, obviously Atlassian, and the Airbnbs and the Ubers of the world, adopt, and it really helps improve overall data quality. Other than that, like, you know, the data model for each organization is really specific. So thinking it through, making sure that it's something that can answer the business questions that you have as an organization, is very important. That goes beyond tooling at the end of the day, to building a culture where people can feel empowered to utilize their data and ask insightful questions. So there's definitely a
Starting point is 00:15:18 cultural impact beyond just tooling within most organizations. Sure, yeah, it's interesting. I mean, we've talked to people who have come from different teams, right? Whether it's, you know, on the analyst side or the data engineering side, but you do have almost this role that acts as sort of an internal ambassador of sorts across teams, right? And so they have to interact with various stakeholders, but act as the owner, which is really interesting. And that's a common theme we're hearing more and more, but that really does seem to be a key piece of making it work really well inside of a company. Yeah, definitely. The other thing I would add there, Eric,
Starting point is 00:15:53 is in terms of the kinds of problems and best practices, there's definitely the aspect of the schema: is the structure of the analytics event correct? The other aspect is the event firing at the right time and the right place in the source code. That's another thing that we see folks run into a lot, where they think they've implemented analytics correctly, but it's actually not working that way in production. And the best practice that we see the best teams follow, and it's what we recommend right now as well, is to add automation to analytics, just like you have automation for other functionality in your product, and treat analytics as a first-class citizen inside your application. I feel like the industry has gone through this
Starting point is 00:16:29 mindset change maybe a decade ago on the security and performance side. Now we all think of security and performance as, you know, a feature, something that we pay attention to anytime we ship, but analytics still seems to be a bit of a redheaded stepchild here, which we think is a huge mistake. Moving it up the priority stack a little bit and encouraging engineering teams to add coverage for analytics into their unit and integration tests is paramount, in our opinion. That's great, guys. So can you also give us a quick walkthrough of the product? Let's say I'm a new user, just signing up for the product. What should I expect, and what should I do in order to start realizing the value that I can get from the product?
Starting point is 00:17:13 Yeah, Kostas, that's a great question. So, I mean, a typical lifecycle for somebody who's adopting Iteratively is that they'd start by importing their schema into the tool. And that could just be from a CSV or a Mixpanel or Amplitude export or some other type of export. They'd import that data into the tool to really create that single source of truth. They'd invite their team into the tool as well. It really is a documentation tool for the entire team, so we want as many folks as possible to have access to it to be able to understand what is being tracked and why it's being tracked within their organization. They'd invite their developers into the product as well.
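Importing a spreadsheet into a single source of truth amounts to grouping rows by event and emitting one schema per event. This is a minimal sketch under assumed CSV columns (`event`, `property`, `type`, `required`); the episode doesn't describe how Iteratively's importer actually works.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per (event, property) pair.
# The column names are an assumption for this sketch.
SPREADSHEET_CSV = """event,property,type,required
Song Played,song_id,string,yes
Song Played,duration_seconds,integer,yes
Song Played,source,string,no
Sign Up Completed,plan,string,yes
"""


def csv_to_schemas(text: str) -> dict[str, dict]:
    """Group spreadsheet rows by event and build one JSON Schema object per event."""
    schemas: dict[str, dict] = {}
    for row in csv.DictReader(io.StringIO(text)):
        # First row for an event creates its (closed) object schema.
        schema = schemas.setdefault(
            row["event"],
            {"type": "object", "properties": {}, "required": [], "additionalProperties": False},
        )
        schema["properties"][row["property"]] = {"type": row["type"]}
        if row["required"].strip().lower() == "yes":
            schema["required"].append(row["property"])
    return schemas


schemas = csv_to_schemas(SPREADSHEET_CSV)
print(sorted(schemas))                     # ['Sign Up Completed', 'Song Played']
print(schemas["Song Played"]["required"])  # ['song_id', 'duration_seconds']
```

Setting `additionalProperties` to `False` is a design choice for this sketch: it makes undeclared properties show up as errors instead of silently flowing downstream.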
Starting point is 00:17:48 So for a new feature or a new experiment that you're working on, you'd actually go define your events. You can think of it as kind of like GitHub for analytics. You have all the same features and functionality that you'd be used to for collaborating on code. So you can create a new branch for a new feature or experiment that you're running, add your new events, add your new properties on those events, assign
Starting point is 00:18:09 that to a developer to work on. They'd actually pull down a strongly typed SDK. So all of those new events that you've defined, all those new properties, would get included in a bundle that we generate for your developers. They'd instrument it and actually verify that the instrumentation is correct, and you get a lot of the benefits because of the type safety, which is built into the SDK, but we also validate all of the runtime payloads against the schema or the validation that you've actually defined inside of Iteratively. They update the status of the branch and merge it back in, similar to how you'd merge a feature branch in Git into your mainline branch. And all of this happens really seamlessly by keeping everyone up to date on what's happening, integrating with tools like Slack and Jira, making it really easy for everyone to have insight into how their analytics are evolving.
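The typed-SDK step, together with the earlier recommendation to cover analytics in unit tests, can be sketched in one place. Python stands in here for whatever language the generated SDK targets, and `Tracker`, `song_played`, and `RecordingDestination` are hypothetical names, not Iteratively's generated API. The typed method plays the role of generated code; the test asserts that the event fires with the expected payload.

```python
import unittest
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RecordingDestination:
    """Test double that records events instead of sending them to an analytics tool."""
    events: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def send(self, name: str, properties: dict[str, Any]) -> None:
        self.events.append((name, properties))


class Tracker:
    """Hypothetical generated SDK: one typed method per event in the tracking plan."""

    def __init__(self, send: Callable[[str, dict[str, Any]], None]) -> None:
        self._send = send

    def song_played(self, song_id: str, duration_seconds: int) -> None:
        # Type hints let a static checker such as mypy catch mistakes early;
        # this check catches them at runtime too (bool is excluded because it subclasses int).
        if (not isinstance(song_id, str)
                or not isinstance(duration_seconds, int)
                or isinstance(duration_seconds, bool)):
            raise TypeError("Song Played expects song_id: str and duration_seconds: int")
        self._send("Song Played", {"song_id": song_id, "duration_seconds": duration_seconds})


class SongPlayedAnalyticsTest(unittest.TestCase):
    """Analytics coverage in an ordinary unit test, runnable with `python -m unittest`."""

    def test_song_played_fires_with_expected_payload(self) -> None:
        destination = RecordingDestination()
        tracker = Tracker(destination.send)
        tracker.song_played("abc123", 215)  # in an app, this is called from the play handler
        self.assertEqual(
            destination.events,
            [("Song Played", {"song_id": "abc123", "duration_seconds": 215})],
        )

    def test_wrong_type_is_rejected(self) -> None:
        tracker = Tracker(RecordingDestination().send)
        with self.assertRaises(TypeError):
            tracker.song_played("abc123", "215")  # type: ignore[arg-type]
```

Injecting the destination as a plain callable is what makes the test possible: production code passes the real analytics client, and tests pass a recorder.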
Starting point is 00:18:58 Super important for us. And then we also sync that schema into other third-party tools. So if you're using an analytics tool like Amplitude or Mixpanel, we'll actually federate the schema there as well. So all of the descriptions that you've added for your events show up in all of these third-party tools, which makes it really easy for data consumers to analyze as you're publishing new changes to your tracking plan. Anything I'm missing there, Ondrej?
Starting point is 00:19:20 You covered the main parts, Patrick, definitely. Based on your experience so far, how long does it usually take for someone to deploy the product? From our perspective, it really just depends on the size of the organization and how much analytics they have in place today. If a company is starting from scratch, they can be successful in less than a day. It's really easy to get started and get going. If somebody has analytics in place, typically our recommendation is for them
Starting point is 00:19:43 to adopt the tool iteratively, progressively: use it for a new feature or a new product that you might be deploying. Make sure that it really meets the needs of your team and your organization, put it through its paces, and then treat your existing analytics as technical debt and migrate it over time, adding it to your test coverage as well. Integrate Iteratively into CI/CD to really validate and give you ongoing assurance that your analytics are correct. But yeah, it varies from less than a day to typically around two to three weeks for most of our companies. That's quite fast, to be honest. I would expect it to take longer, especially because you have to include many different roles there and you need to set up some things to start experiencing the value of the
Starting point is 00:20:25 product. But that's a great indication that you are on a good track and you are building a great product experience, so well done, guys. Yeah, thanks. Good stuff. I was going to say, the aha moment for most of these companies is when all of their events show up as green inside the tracking plan, so that's really something that we've been focusing a lot of time and energy on. Depending on the size of the organization, most of the stakeholders we're working with are, as I mentioned earlier, the data analysts and the PMs.
Starting point is 00:20:54 So typically, as long as everybody's bought into analytics being important to the organization, it's a relatively painless process. Great. So guys, let's focus a little bit more on the schema, because I have a feeling that the schema is a very important part of the product. I actually have two questions. One is more technical; the other is more on the business side of things. So technically speaking, what is a schema for you, how is it defined, what kind of serializations do you use or support, and
Starting point is 00:21:26 what are customers out there usually using? Eric mentioned at some point that many people are just using Excel sheets or Google Sheets for that purpose. And yeah, how do you consume that, how do you version it, and all these things around the schema itself? And also, if you can give us a few more statistics from your experience, how often does a schema change in an organization? The first thing I'd say, Kostas, is as far as the user of the platform is concerned,
Starting point is 00:21:56 they really don't interact with the underlying implementation details of the schema itself. We try to hide as much of it as possible because it's not really relevant to the day-to-day operations. Behind the scenes, everything is driven by JSON Schema. That's the format that we decided to double down on for our integrations. It's a de facto standard in the world of analytics data. Pretty much all analytics data
Starting point is 00:22:21 is represented as JSON today. So it was a natural choice for us. And we use JSON Schema not only to push the tracking plan definitions into other tools like Mixpanel, Amplitude, and Snowplow, but also to drive the validation of the data on the client side. So the rules and the definitions in the Iteratively tracking plan are represented as a JSON Schema document, which is bundled into the SDK that we code-generate. And we use standard, best-of-breed JSON Schema validation libraries to validate the payloads against those schemas. And if we detect that anything is,
Starting point is 00:23:02 or if those libraries detect that anything is off, we let the developer know right then and there that there is a problem. The layer that sits on top of this is, like Patrick mentioned, very Git-like. So there is support for versioning of schemas. There's support for branching of schemas as well. And it works very similarly to how source code gets versioned and branched. There is a way for folks to propose changes to the schema in a staging version; they can comment on those and collaborate around those. When they're ready, they publish a new version, which generates a new version of the tracking plan, and ultimately, when they're ready to merge those changes into the mainline tracking plan branch, they go ahead and do that just like they would in, let's say, GitHub or Bitbucket.
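The client-side validation step can be illustrated with a deliberately small validator. Iteratively is described as using standard JSON Schema libraries; this sketch instead hand-rolls a handful of JSON Schema keywords (`type`, `enum`, `minimum`, `properties`, `required`, `additionalProperties`) so it runs with only the standard library. The `SONG_PLAYED` event schema is hypothetical.

```python
from typing import Any

# JSON Schema type names mapped to Python types (a small subset of the spec).
_TYPES: dict[str, Any] = {
    "string": str, "integer": int, "number": (int, float), "boolean": bool, "object": dict,
}


def validate(payload: Any, schema: dict, path: str = "$") -> list[str]:
    """Check `payload` against a tiny subset of JSON Schema; [] means valid.

    Supports only type, enum, minimum, properties, required and
    additionalProperties: false. A real project would use a full
    JSON Schema validation library instead.
    """
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(payload, _TYPES[expected])
        if expected in ("integer", "number") and isinstance(payload, bool):
            ok = False  # bool subclasses int in Python, but JSON Schema keeps them apart
        if not ok:
            return [f"{path}: expected {expected}, got {type(payload).__name__}"]
    errors: list[str] = []
    if "enum" in schema and payload not in schema["enum"]:
        errors.append(f"{path}: {payload!r} is not one of {schema['enum']}")
    if "minimum" in schema and isinstance(payload, (int, float)) and payload < schema["minimum"]:
        errors.append(f"{path}: {payload} is below the minimum of {schema['minimum']}")
    if isinstance(payload, dict):
        properties = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in payload:
                errors.append(f"{path}: missing required property {name!r}")
        for name, value in payload.items():
            if name in properties:
                errors.extend(validate(value, properties[name], f"{path}.{name}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected property {name!r}")
    return errors


SONG_PLAYED = {
    "type": "object",
    "properties": {
        "song_id": {"type": "string"},
        "duration_seconds": {"type": "integer", "minimum": 0},
        "source": {"type": "string", "enum": ["search", "playlist", "radio"]},
    },
    "required": ["song_id", "duration_seconds"],
    "additionalProperties": False,
}

print(validate({"song_id": "abc123", "duration_seconds": 215, "source": "radio"}, SONG_PLAYED))
for error in validate({"song_id": 42, "duration_seconds": -1, "mood": "happy"}, SONG_PLAYED):
    print(error)
```

The valid payload yields an empty list. The second payload reports a wrong type for `song_id`, a negative duration, and an undeclared `mood` property, which is the kind of "right then and there" feedback described above.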
Starting point is 00:23:49 The other thing related to versioning that's probably worth mentioning is that it's not just the tracking plans that get versioned. It's the events as well. So every event that gets changed and published gets a new version. We were inspired by the work that the Snowplow team did with Iglu, and specifically the SchemaVer spec, which defines how you apply semantic versioning to schemas that represent data. And we've adopted that approach for our event versions as well, which lets us and our customers tell whether a particular change to an event schema is minor, meaning it's backwards compatible and forwards compatible, or whether it's a major change that will usually require changes on the backend where the data is stored in order to persist the new version of the schema correctly.
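SchemaVer versions take the form MODEL-REVISION-ADDITION (for example `1-0-0`). The sketch below chooses which part to bump by comparing two event schemas. The compatibility rules are a simplified assumption for illustration, not the full SchemaVer guidance or Iteratively's logic.

```python
def next_version(version: str, old: dict, new: dict) -> str:
    """Suggest the next SchemaVer-style MODEL-REVISION-ADDITION version.

    Simplified, hypothetical rules for this sketch (REVISION is never bumped):
    - removing a property, changing a property's type, or requiring a property
      that was not required before is breaking, so MODEL is bumped;
    - any other change to the properties is treated as compatible, so ADDITION is bumped.
    """
    model, revision, addition = (int(part) for part in version.split("-"))
    old_props, new_props = old["properties"], new["properties"]
    old_required = set(old.get("required", []))
    breaking = (
        any(name not in new_props for name in old_props)
        or any(new_props[name].get("type") != spec.get("type")
               for name, spec in old_props.items() if name in new_props)
        or any(name not in old_required for name in new.get("required", []))
    )
    if breaking:
        return f"{model + 1}-0-0"
    if new_props != old_props:
        return f"{model}-{revision}-{addition + 1}"
    return version


v1 = {"properties": {"song_id": {"type": "string"}}, "required": ["song_id"]}
v2 = {"properties": {"song_id": {"type": "string"}, "source": {"type": "string"}},
      "required": ["song_id"]}
v3 = {"properties": {"song_id": {"type": "integer"}}, "required": ["song_id"]}

print(next_version("1-0-0", v1, v2))  # 1-0-1: optional property added
print(next_version("1-0-0", v1, v3))  # 2-0-0: property type changed
```

A MODEL bump is the signal that, as described above, the backend storage will usually need a change before the new version can be persisted correctly.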
Starting point is 00:24:36 So that's the story there on versioning and branching. As to your last question, Kostas, as to how frequently these change, every customer is a little bit different. There's definitely a lot of work that happens upfront to make sure that the tracking plan is correct initially. There are a lot of common events that everybody wants to capture related to user identity: sign up, sign in, log out, page views, things like that. So there's definitely a lot of activity up front. that cares about measuring the success and the outcome of a particular release or a particular feature will come to the tool on a weekly basis to add new branches and new events to their tracking plan for whatever new features they're working on. Other companies that may not be quite as data mature will come in a little bit less often and only create events when the marketing team or
Starting point is 00:25:41 the customer success team has a new analytics requirement. That makes total sense. Ondrej, I have a bit of a more technical question based on what you mentioned, especially about using JSON as the internal serialization for representing a schema. I'm old enough to have gone through many different technologies that have to do with representing the schema and structure of data and the semantics around it, starting from XML Schema, for example, which is super expressive, right? Even things like ontologies and all that stuff. On the other hand,
Starting point is 00:26:18 we have something like JSON, right? Which was never actually intended for describing something like a schema, and it's very lightweight. It doesn't have the expressivity of the technologies at the other extreme. How limiting is this in terms of capturing, monitoring, and dealing with the schema of the data, and trying to create this top layer, which is more semantic about both the meaning of the data and its structure? Do you think that there's something missing there? Is JSON enough? Or do you see that in the future the
Starting point is 00:26:50 industry will come up with new ways of implementing and describing the schema for data? Yeah, it's a good question, Kostas. As far as our use cases have been concerned so far, the JSON Schema spec has been phenomenal, and we haven't come across any core definitions that we wouldn't be able to represent in JSON Schema format, specifically for analytics data. So we've been very happy with the standard and I think are using it to its full potential. There are a couple of examples where we've had to think about extending the standard, and actually the Iglu standard, which is itself an extension to JSON Schema, comes to mind. We use it as well, for cases where specifying some of the metadata around the schema is not always possible inside
Starting point is 00:27:36 the JSON Schema format. Things like the owners of the schema, the internal name or the display name of the event, or the sources where the particular event is supposed to be captured from. There's really no place to represent that in a JSON Schema document, and you need an extension. But as far as the core structure of the event is concerned, we've been able to get everything that we needed from the JSON Schema format.
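Iglu's self-describing schemas show what this kind of extension looks like: the core event structure stays plain JSON Schema, and an extra `self` block carries the vendor, name, format, and version. This sketch follows that layout. The vendor, the event name, and the `x-owners` key for ownership metadata are hypothetical, and the snippet does not show how Iteratively itself stores such metadata.

```python
# An Iglu-style self-describing JSON Schema. The "self" block identifies the
# schema; the vendor and event names below are made up for illustration.
SONG_PLAYED_SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Fired when a song starts playing",
    "self": {
        "vendor": "com.example",
        "name": "song_played",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    # Hypothetical extension keyword for metadata that JSON Schema has no slot for.
    "x-owners": ["growth-team"],
    # The core event structure stays plain JSON Schema.
    "type": "object",
    "properties": {"song_id": {"type": "string"}},
    "required": ["song_id"],
}


def iglu_uri(schema: dict) -> str:
    """Build the iglu:vendor/name/format/version URI from a schema's "self" block."""
    me = schema["self"]
    return f"iglu:{me['vendor']}/{me['name']}/{me['format']}/{me['version']}"


print(iglu_uri(SONG_PLAYED_SCHEMA))    # iglu:com.example/song_played/jsonschema/1-0-0
print(SONG_PLAYED_SCHEMA["x-owners"])  # ['growth-team']
```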
Starting point is 00:28:04 One last question from me before I let Eric ask his own questions. So far we've talked about the schema from a technical perspective. This question is more about the business perspective inside the organization: how is the schema ultimately connected with my business goals? How do I translate these goals into the schema and track the data that will be used later for analysis, for coming up with KPIs and all that? How do you see this connection happening from an organizational point of view? And you, as a company, need to communicate this value to your potential customers.
Starting point is 00:28:51 How do you communicate this? Great question. The way that typically happens is actually pretty messy within most organizations. But quite often it happens in planning meetings, typically within the team. Imagine you're working on a software team and you're releasing a new experiment: typically you would define both your macro and micro success metrics for that piece of work as part of the spec
Starting point is 00:29:15 that you're publishing before any engineering work actually kicks off. As part of doing that, you'll break the work down into, you know, tickets for your engineering team, and you'll break it down into the micro-level goals you have for that work as well. It's typically at that point where this really comes into play, where you're defining the actual representation of that schema inside our tool and linking those changes into
Starting point is 00:29:40 your product spec, whether that's in Confluence or Notion or some other tool. But yeah, typically those two pieces happen separately. Question, and this is more practical, just thinking about being a user. Let's say I already have an analytics implementation, right? I've implemented Segment, or a direct implementation with Mixpanel, or whatever. When we talk specifically about using Iteratively, it almost sounds like it's replacing the instrumentation I've already done. How does that work? Could you give us a really practical technical run-through: if I already have, say, a Mixpanel implementation instrumented for analytics, how, from a technical
Starting point is 00:30:30 standpoint, do I lay Iteratively on top of that or integrate it? What does that look like? Yeah, there are two ways we advise customers to deal with this, Eric. One is to go in and take out the ad hoc Mixpanel instrumentation you have in your product and replace it gradually, progressively, like Patrick said, with Iteratively. That's what most of our customers do. They come in, sign up for Iteratively, and figure out what their most important events are, the key events they're tracking about their customers. They define those in Iteratively.
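The migration step described here, defining key events in the tracking plan and having developers call them through strongly typed code, can be sketched as follows. This is an illustration under assumptions, not Iteratively's generated SDK: the `SongPlayed` class and `Tracker` wrapper are hypothetical, and the transport is any function that takes an event name and a property dictionary. The design idea is that each planned event gets its own typed method, so a wrong property type or an unplanned value fails in a type checker, a unit test, or a CI run instead of silently reaching the analytics tool.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Literal, Optional, get_args

# Hypothetical transport: any callable taking (event_name, properties),
# e.g. a thin adapter over an existing analytics SDK.
Transport = Callable[[str, dict], None]

# Allowed values for the 'source' property, as the plan might define them.
Source = Literal["search", "playlist", "radio"]


@dataclass(frozen=True)
class SongPlayed:
    """Typed form of a hypothetical 'song_played' event from the plan."""

    song_id: str
    source: Source
    duration_seconds: Optional[int] = None

    def __post_init__(self) -> None:
        # Runtime checks mirroring the plan, so a unit test or CI run catches
        # mistakes even in code that is not type-checked.
        if not isinstance(self.song_id, str):
            raise TypeError("song_id must be a string")
        if self.source not in get_args(Source):
            raise ValueError(f"invalid source: {self.source!r}")
        d = self.duration_seconds
        if d is not None and (isinstance(d, bool) or not isinstance(d, int)):
            raise TypeError("duration_seconds must be an integer")


class Tracker:
    """Exposes one method per planned event instead of a free-form track()."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def song_played(self, event: SongPlayed) -> None:
        # Omit optional properties that were not set.
        props = {k: v for k, v in asdict(event).items() if v is not None}
        self._transport("song_played", props)
```

In use, `Tracker(send).song_played(SongPlayed(song_id="abc", source="search"))` sends `("song_played", {"song_id": "abc", "source": "search"})` through `send`, while `SongPlayed(song_id="abc", source="tv")` raises `ValueError` before anything is sent, which is what makes the event testable in CI.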
Starting point is 00:31:01 They create a new version of that tracking plan, and they ask the developers to migrate those events over to Iteratively to get the strong typing, the CI checks, and the testability support for those events. Then they treat the rest of the events as technical debt that gets chipped away at over the next couple of weeks, months, or years. It really depends on the company. The second thing we're working on right now is the ability to audit and inspect the instrumentation you already have in your product through an SDK like Mixpanel's. We have an SDK of our own, called an audit SDK, that hooks into Mixpanel's, Amplitude's, or Segment's SDK, monitors the events being sent, and reports those
Starting point is 00:31:45 over to Iteratively, which compares them to the tracking plan. If we spot issues, such as events that aren't defined in the tracking plan or events that are tracked differently or with different property types, we alert the tracking plan owner and let them know that, hey, something is wrong. The idea here is that the development team will get so fed up with all the issues raised by the PM team or the analytics team that they'll just go ahead and implement the Iteratively SDK in the product, or at least accelerate that implementation, so that they are in sync with the tracking plan.
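The audit idea can be illustrated without changing existing call sites: wrap the function that already sends events, forward every call unchanged, and compare each event with the tracking plan on the side, reporting anything undefined or mistyped to the plan owner. A minimal sketch under assumptions: the plan format (event name mapped to expected Python types), the `audited` wrapper, and the issue messages are hypothetical; they do not reproduce Iteratively's audit SDK or the hook mechanisms of the Mixpanel, Amplitude, or Segment SDKs. Types are matched exactly, so `True` is not accepted where an `int` is planned.

```python
from typing import Callable

# Signature of an existing tracking call: (event_name, properties) -> None.
TrackFn = Callable[[str, dict], None]

# Hypothetical tracking plan: event name -> {property name: expected type}.
TRACKING_PLAN: dict[str, dict[str, type]] = {
    "song_played": {"song_id": str, "duration_seconds": int},
    "signed_up": {"plan": str},
}


def compare_to_plan(name: str, props: dict, plan: dict) -> list[str]:
    """Return human-readable issues for one event (empty if it conforms)."""
    if name not in plan:
        return [f"event '{name}' is not defined in the tracking plan"]
    expected = plan[name]
    issues = []
    for prop, value in props.items():
        if prop not in expected:
            issues.append(f"{name}: property '{prop}' is not in the plan")
        elif type(value) is not expected[prop]:
            # Exact type match, so bool values are not accepted for int.
            issues.append(
                f"{name}: property '{prop}' should be "
                f"{expected[prop].__name__}, got {type(value).__name__}"
            )
    return issues


def audited(track: TrackFn, plan: dict, report: Callable[[str], None]) -> TrackFn:
    """Wrap an existing track function without changing its behavior.

    Every call is forwarded unchanged; divergences from the plan are passed
    to `report` (for example, an alert to the tracking plan owner).
    """

    def wrapper(name: str, props: dict) -> None:
        for issue in compare_to_plan(name, props, plan):
            report(issue)
        track(name, props)  # never block or alter the original event

    return wrapper
```

Wrapping an existing call as `track = audited(existing_track, TRACKING_PLAN, alerts.append)` keeps every event flowing to the original destination while a misspelled event name such as `"songPlayed"` or a string-typed `duration_seconds` shows up in `alerts`. Because the wrapper only observes, it is safe to deploy before anyone touches the instrumentation itself.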
Starting point is 00:32:20 And these problems just don't arise. Sure. That makes total sense. It's kind of the diagnostic approach, right? You may not know how sick you actually are, so you may be less willing to commit to surgery. That's super interesting. Definitely. You said you work with a lot of product teams. Could you talk a little more about the other teams you work with? Is that your main
Starting point is 00:32:46 entry point in terms of people who are interested in using it? Or do you also begin conversations with analysts? It sounds like the devs get involved after an internal stakeholder who needs higher-quality data has raised the conversation. Yeah, our main entry point within most organizations is the data analyst or the analytics engineer, followed by the head of data or VP of analytics. Typically they're the ones with the most pain related to data munging or data quality, so they're the ones actively looking for solutions. They then typically introduce us to the product manager, which could be a data PM or the head of product, and to the
Starting point is 00:33:30 engineering team for solution validation. One follow-up question to that, and this is more of a selfish one, since Kostas and I work on product all day, every day as well. It's interesting that you have these various stakeholders, and even on your website you have different personas who are involved. How has that influenced the way you think about building the product and prioritizing features, since there are multiple people involved in the equation? Even down to interface decisions and things like that, we'd love to know how you think through the product development process with a couple of different personas using the product. Yeah, it's really hard, Eric. It's hard
Starting point is 00:34:15 to balance these. Honestly, I wish I had a better answer for this one. I think the decision Patrick and I have made is that it's kind of like air and water: we need both. We have to make the data analyst, the person who's responsible for, kind of the gardener of, your tracking plan, successful with the tool. So the interface they interact with on a daily basis, the website, and the support for branching and for shepherding and curating the tracking plan must be top-notch. And the developers have to have a great experience as well, right? They are key to a successful implementation of the analytics platform. You need both in the picture. And yeah, we've had to struggle a little bit there and straddle
Starting point is 00:34:57 both sides of the fence here to build a product that supports the organization as a whole. Sure. This may sound like an interesting question, but talking about stakeholders brought it to mind. Let's quickly talk about the personas we just mentioned: an analyst and a developer. The analyst has a major data quality problem; they find Iteratively, get adoption, and work with the developer to implement it. I'd love to hear any stories about how that has impacted the organization beyond just those stakeholders, because they see the sharpest end of it, right? In that
Starting point is 00:35:41 they're solving a problem that makes their job really difficult every single day. But when you get two or three layers removed from that in an organization, say a VP or even someone on the executive team, are there any stories where the work around data quality has reverberated across the organization at a pretty wide scale? Yeah, definitely. There are two main drivers we typically see when Iteratively is introduced to an organization. One is that a lot of the data quality issues caused purely by human error go away,
Starting point is 00:36:15 which is easy enough for these organizations to quantify through the number of data bugs being raised in Jira. The second is speed of instrumentation. There's a lot of time saved when you have a tool that manages this process and cuts down on all the back-and-forth between stakeholders within the organization.
Starting point is 00:36:33 Typically, when folks have it in place and this process adopted, it's just a lot easier for them to get their analytics tracking code shipped to production. At a higher level, when we think about the reporting aspect within most of these organizations, we tend to get really good qualitative feedback
Starting point is 00:36:49 from VPs of data, like, hey, PMs are starting to take ownership of tracking and are getting more integrated into our release process. We have PMs who actually want to look at the data when they're shipping features. So we tend to see more of an organizational change. It's harder to quantify in terms of the business value being derived
Starting point is 00:37:09 from that. But yeah, there's definitely more of an organizational change when folks are actually planning and thinking about tracking up front, before the feature ships. Quite often, organizations ship work and treat analytics as an afterthought, or the CEO comes down and asks, hey, how did that release go? And you have to scramble to get some data, which isn't a great way of working. Sure. What's been really interesting, Eric, and this was surprising to me in the early days, is how much of the organization actually cares about analytics data beyond the PMs and the analysts directly.
Starting point is 00:37:45 So not going through the analytics or the data team. The analytics data that folks capture doesn't just get sent to Mixpanel or Amplitude, right? You know this firsthand. It gets sent to many other tools these days: marketing automation tools, sales automation tools, customer success tools. And it's the directors and heads of those departments who are actually raising their hands and pushing for quality as well, so that they can rely on this data to do their jobs well. I think that wasn't quite the case in the past, but it's coming up more and more often. And when we look at the customers we've probably made most successful,
Starting point is 00:38:21 it was the rest of the org that supported the Iteratively deployment and was excited about being able to influence the tracking plan and have a say in what's captured, in a collaborative way, and then to count on high-quality data to hook up to email campaigns, drip campaigns, and things like that. Sure. Yeah, it is really interesting. Patrick, you may have mentioned this term specifically, so forgive me if I'm repeating it, but one theme both of you have talked about as a byproduct of accurate data is velocity, which is really interesting. When you think about the word trust, like I trust my data or I trust my
Starting point is 00:39:13 data team to give me the right information, you don't necessarily think, oh, that translates to moving faster, right? But if you don't have to think twice about making decisions based on the data, that's a really big deal for a company. It's funny: in classic startup wisdom, everyone talks about moving really fast, but you don't necessarily think of trust in data as a major lever you can pull to increase velocity. So it's really neat to hear that that's a major consequence of Iteratively being part of an organization. No, definitely. And it cuts both ways, right? I think back to my days on the growth team at Atlassian, where we'd ship experiments only to find out six weeks later that
Starting point is 00:39:58 we had forgotten to put all the tracking in place and weren't able to analyze the success of those features. Having to relaunch the experiment meant six weeks of missed learning for the business, right? At the end of the day, those who win in the market are going to be those who are best enabled to run this high-velocity model of taking these learnings, driving insights, and influencing decisions. So analytics and data trust are huge drivers of that. So we are close to the end of our conversation, which has been super interesting, by the way.
Starting point is 00:40:39 And I have one final question for you. We keep talking about trust in data, how important it is, and how important a product like Iteratively is for increasing that trust. Based on your experience so far, and I would assume you're an expert in data trust by now, what is missing today to increase even further the trust we can have in the data we use in an organization? That's one question. As a continuation of it: what is next for you? What are your plans, how are you going to keep delivering value on this front, and what exciting things are coming from you in the next couple of months? I can take the data trust one and I'll hand the last one off to Ondrej.
Starting point is 00:41:26 I mean, when it comes to data trust, we think of it very much as: how do we get the entire organization bought into a way of working where analytics is integrated into your SDLC? Generally speaking, we want to make analytics less of an afterthought.
Starting point is 00:41:40 So for us, that means living where our customers live and in the tools they use: building great collaboration and workflows with tools like Slack and Jira and the other solutions teams operate in, and then making it really easy for people to understand the data and how it is being used and consumed within the organization. That means being able to tie your tracking plan to the report being consumed in a tool like Amplitude, building out more of an integration with something like dbt, or even linking directly into reporting tools like Looker. All of this is very much about trying to create this
Starting point is 00:42:17 you know, single source of truth around your analytics schema and its ontology as it evolves over time. Ondrej, anything you'd add there? Yeah, Kostas, I would just add a couple of things to what Patrick said. We want to build the best collaboration tool for tracking plans; at the end of the day, that's our big mission. We started by making the experience as nice as possible for the governors, we sometimes call them gardeners, and the developers, in order to improve analytics health overall inside an organization. I think
Starting point is 00:42:45 what's next for us is doubling down on that, but also adding more integrations, both on the analytics destination side and on the schema sync side, and then adding analytics health monitoring for products that have the SDK instrumented in them, so that we can give folks a more holistic view of not just what's implemented, but whether it's actually doing the right thing in production. Is it actually working the way they expected when they created the tracking plan? That's a big thing for us going forward. Ondrej and Patrick, thank you so much for the amazing conversation today. I hope you enjoyed it as much as Eric and I did. We've learned many interesting things about data quality and schema management. I think it's a very important
Starting point is 00:43:32 part of every data stack being built out there, and I'm pretty sure there are very exciting times ahead, especially for you and your company. Let's meet again in a couple of months to see what's new and what has changed, and continue the discussion. I'm pretty sure we'll have even more to talk about. Definitely. Kostas, Eric, thanks so much for having us. Well, that was a really interesting conversation.
Starting point is 00:43:57 I think one thing that stuck out to me was a common trend we're seeing: the concept of trust as it relates to data within an organization. We've heard different ways that trust impacts an organization, and it was really interesting to hear the guys from Iteratively talk about how they see trust in data impacting various teams and the organization as a whole. Since we've heard that theme multiple times, it really stuck out to me.
Starting point is 00:44:29 What jumped out to you, Kostas? Yeah, absolutely, Eric. In the end, data governance is all about trust. To put the right trust in your data, you need many different things to happen at the same time. One is access control, as we said at the beginning. But data quality is also super important, along with the right mechanisms to understand when we can and cannot trust the data anymore and, as a second step, to figure out
Starting point is 00:44:59 what went wrong and how we can fix it. That is of paramount importance for defining and implementing a really robust data supply chain inside the organization. I thought it was an excellent attempt at solving at least part of this problem.
Starting point is 00:45:18 It was very interesting for me to hear how they do it, the technology they use, how they build on top of that, and how they perceive and deliver this product as a productivity suite for the whole organization, but also for the developers. They have these strongly typed SDKs that help developers avoid many bugs and the back-and-forth of fixing problems. And in the end, if we fix that, we can also end up with a very robust quality assurance mechanism inside the organization. So yeah, and it's still quite early, right? I'm pretty sure that in a couple of
Starting point is 00:45:59 months, if we chat with them again, they will have even more exciting things to share with us. Absolutely. Well, we will catch you next time on the Data Stack Show. Subscribe to get our weekly episodes in your podcast feed.
