The Data Stack Show - 154: Making Cross-Company Data Exchange Easy with Pardis Noorzad of General Folders

Episode Date: September 6, 2023

Highlights from this week’s conversation include:
Pardis’ background and journey in data (3:24)
AI before the hype (8:37)
Founding General Folders (12:36)
Data collaboration challenges (15:31)
Examples of data sharing (17:40)
Data transfer in various industries (22:16)
Defining the transfer problem (28:30)
The demand for scalable solutions (32:06)
Data transfer and model exposition (41:02)
Data governance and API (43:23)
Final thoughts and takeaways (56:48)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. Kostas, I'm so excited. We are talking with Pardis, who has been part of data science teams at some really big companies. So a big, huge retail provider in India, Twitter in the social networking space.
Starting point is 00:00:46 And she recently started her own company in the data collaboration space. And I am really excited because I don't think we've covered data collaboration or data sharing, which is sort of, you know, some sort of data transaction between two companies in detail.
Starting point is 00:01:04 And so I actually want to talk with Pardis about how big that world is. I've done some of that in the past, actually. Of course, I've gotten weird with advertising data and marketing. But that's just one slice of it, and I think this is actually a much, much larger footprint of companies doing it than we probably even think. And so that's what I'm going to ask. How about you? So, Eric, like, from what I understand, when we click that button that says do not sell my data, it's like, they should say, do not sell my data to Eric specifically, right? Like, that's exactly what was happening.
Starting point is 00:01:49 You are that person. Like, that's the person. Yeah, I was in a former life, yes. But now, like, you're in, you know, like, the path of light.
Starting point is 00:02:02 That's exactly right. Yes. Yes. I don't know. Maybe I'll have to pay for those sins one day. Getting weird with people's data rarely leads to good outcomes. Yeah. I can guarantee to everyone you are reformed. You are a good citizen today. You're not doing that stuff anymore. Okay, so yeah. First of all, I'm also very excited that we have Pardis on the show. I've known her for a while now. And
Starting point is 00:02:33 she's like an amazing person outside of the problems or the technical solutions that she's going after. She also has a very interesting journey with starting a company. And I'd love to learn more about that. I'd love to hear like her experience and what made her do this.
Starting point is 00:03:00 And we'll take it from there. And of course, we are going to also ask more technical questions and more product-related questions. Indeed. Well, let's dig in with Pardis. Yeah, let's do it. Pardis, welcome to the Data Stack Show. We are so excited to chat with you. Thank you, Eric. Thanks for having me on the show.
Starting point is 00:03:19 I'm excited to be here. Very cool. Well, let's start where we always do. So give us your background. How did you get into data? Did you start working on data in school, after school, and sort of what's your journey been like? Definitely. So I did my undergrad in software engineering. At the time, you know, in high school and in elementary school, I was super interested in, you know, how the internet worked. And I thought it was super fascinating and, you know, wanted a degree
Starting point is 00:03:53 where like, or a program where I could learn more about how, you know, computers are connected to each other and you could share information in that way. And so it wasn't a difficult decision. And then in the ECE program that I was in, we also had this robotics kind of, you know, team. I would always pass by the room where they were testing out these robots. And so I started getting interested in AI, smart software, and things like that. And so I thought the evolution of just regular software is smart software. And so I started reading up on AI, took some courses, started with reinforcement learning, and then actually did a master's in AI where I learned more about more things and deep learning and just
Starting point is 00:04:53 machine learning algorithms and stuff like that. And then from there, I moved into a, you know, as I was getting more interested into like ML and I'm trying to, you know, as we were comparing these various methods to train models, I got into a lot of math. And I thought, wow, wouldn't it be nice to have like a math degree. And I wasn't really ready to commit to a PhD at that point, but I started to apply to a couple of master's programs in math and applied math. And I got into this program, started doing some of what I wanted to do, which was work on some of the theoretical foundations of these ML models. But then I met this professor and he told me all about his work on social networks and graph theory. And I kind of changed what I was working on and got into a lot of that, which was super fascinating. And at the same time, I started taking some courses in the
Starting point is 00:06:07 MBA department. My school was super entrepreneurial and kind of pushing everyone to go and, you know, start a company. And that's where the, you know, the world is going and that's how you should get your job and things like that. And so, which was awesome. And I started taking some MBA courses and in the MBA department, there was this magazine that I picked up one day, a marketing magazine, and they talked about, you know, your area of interest. I was going to say, we're getting into very trepidatious waters here, but I like it. Let's keep going. Yeah. And they did a profile on Hilary Mason and the work she was doing at Bitly.
Starting point is 00:06:51 I then went on Google, searched, okay, and then saw some of her videos talking about writing Hadoop jobs and things like that. And I was like, this is so cool. It's kind of, you know, all the things that I've learned coming into one place in this type of job. And so I started looking for that sort of thing. And luckily, I found this really early stage startup. They were looking for their first data person and they were doing retail analytics and
Starting point is 00:07:26 building kind of an AI platform, analytics platform for retailers online and offline. And I got to work on super interesting problems. And I would say kind of the rest is history, but that's kind of the path to data science. Very cool. Now, I have a question. So I want to talk about what you're doing today. But let's just take a brief detour. So you were doing data science, you know, sort of in the retail space, and then later in the social networking space before it was as hot of a news item as it is today, right? And of course, AI is sort of the, you know, LLMs are creating a huge amount of news, you know, and everyone's talking about it. But back then, to your point, it was data science, it was machine learning, right? I mean, maybe some people called it AI. Can you just talk about, you know, you worked in it for years, sort of before it was as
Starting point is 00:08:34 crazy as it is now, before it hit fever pitch. And so, can you just provide us some perspective on that? You know, has that much changed, actually? Or is it just sort of the next manifestation of things that have been happening for a long time? Totally. I think it was around 2010 and maybe a little earlier when I started to get into AI and things like that. I would say it was already kind of becoming very, you know, interesting topic. A lot of people were talking about it. There were professors in University of Alberta, University of Montreal, and then of course, Stanford and Berkeley, a lot in MIT, a lot of interesting stuff happening. And I was following all of these
Starting point is 00:09:26 kind of professors at these places and following their research, their students' research. And so I would say even at that time, it was pretty hot. And, you know, deep networks were gaining a lot of attention. Sparse learning at the time was super, you know, all the papers in this area would get thousands of references and things like that. So I would say even at that time, I was kind of following the crowd to some extent. I can't get too much credit. But in terms of like how it has changed, I would say it's definitely much more part of the public kind of, you know, conversation today than it was at that time, you know, in the university was a huge deal, but maybe outside you needed to go through, talk about more detail.
Starting point is 00:10:32 Whereas right now, you know, you might go out with some friends and they're in other industries and they will ask, Hey, you know, tell me more about ChatGPT and like how does it work and things like that. So definitely more of the public conversation today. Yeah, for sure. Yeah. That's a helpful perspective because I agree. I think a lot of times news cycles can make things feel very new.
Starting point is 00:10:58 But these are things that have been around for a long time. It's just that now maybe like your mom is texting you, "Have you heard of ChatGPT?" with three question marks. You know, and that just makes it feel a little bit closer to home. For sure. Yeah. And I guess, you know, this kind of textual interface, the ability of, you know, reaching more people, making it more accessible has really helped, you know, everyone kind of feel the magic of AI. Right, as opposed to like having to have a pretty significant amount of domain knowledge
Starting point is 00:11:38 in order to see the magic. Totally. And, you know, like Google has been doing this forever, you know, with search and things like that. But, you know, it's really now that people are seeing, really feeling the magic or things like that, which is really interesting and nice. I mean, I'm definitely happy to see AI, you know, being talked about all the time. I love it. Well, thank you for indulging my little sort of historical AI interlude. What are you doing today? So you've done a huge amount of work at, you know, at multiple different types of companies working on data science, AI stuff. But you recently founded a company, which congratulations, is very exciting.
Starting point is 00:12:36 Thank you. Yeah, I appreciate that. So I started General Folders because, you know, I have seen this recurring problem in every single job I've had. And up until, I would say, my last job, I didn't really, you know, think that this should really be a tool I can buy because up until that point, I wasn't responsible for buying and making infra decisions. And so at my last job, it was really my responsibility to think about,
Starting point is 00:13:15 okay, how can we make our team more efficient? How can we make our data more secure? How can we drive down the cost of managing this infra and build things faster and things like that with decisions that we make in terms of what we buy, what we decide to build and invest in, how we collaborate with engineering and DevOps and some of the other teams to build these things and connect these things together. And so I think that was an interesting role to have because now I could see that, okay, this is something that needs to turn into a company. And so I was really looking for people looking for ideas to tell them, hey, you know, this is something I want. I don't see it on the market. Can you build it? And at that point, I actually, when I left the company, I had a couple of ideas for things to build. And this seemed like something where, you know, you're managing company, other companies
Starting point is 00:14:29 data. It seems like, okay, initially it's a harder thing to build. There's more kind of things that you need to pay attention to, more infra that you need to build to make something like this work well. But I thought out of all the other things, this is something where I truly believe that the need exists. And it's something that I think me and later as a team, we can sell better. And so that's why I kind of, you know, started building this and started talking to a lot of people from various industries, just to make sure that this is not just something that I have been seeing all the time, but people are seeing this across various industries and in various roles and things like that. And so, yeah, I can talk about that problem as well.
Starting point is 00:15:31 Yeah. Well, let's, I'd love to zoom. Well, actually, why don't we do this? In just a couple of sentences, describe what General Folders does or sort of like the mission of the company, just a sort of level set, like at the simplest level? Totally. I would say at the highest level, General Folders is a tool to make business collaboration easy and secure. One very important aspect of business collaboration and partnerships is data transfer and data
Starting point is 00:16:05 collaboration. And so every time you sign a contract with another company, there's always an aspect of data sharing that happens. And you notice this when you become responsible for all the data for the company and you're like, oh, wow, every time we signed a contract, there was this aspect of it that we needed to pay attention to. And so at the highest level, this is what General Folders makes easy and secure. That's the mission of the company. Love it. Let's talk about data collaboration or data sharing, maybe could be a term or data transfer.
Starting point is 00:16:45 There's probably a bunch of different modes in which this actually occurs. Sort of one directional, bi-directional. But I'd just love to talk about, I don't even, Kostas, have we even talked about this on the show? Maybe Brooks has an encyclopedic knowledge of the show catalog, but I don't know if we've ever actually talked about sort of two businesses sharing data and dug into that on the show. So I'd love to just talk about how big of a footprint does that have among businesses? Because I have done some of this in past roles, and it's a way larger world than I ever would have imagined. It's almost like looking at the ocean
Starting point is 00:17:36 and then putting your head under and seeing how big the coral reef is. It's like, whoa, there's way more here than I ever would have thought happening under the surface, right? And there's all sorts of crazy ways that companies share data. Absolutely. And you know, part of the reason why I think when you look online, it's hard to say, okay, how big is this market? Because a lot of the work that happens, I think one of the investors I was talking to kind of had a word for this, said this is gray area. And what that means is that a lot of the work that happens in this area
Starting point is 00:18:15 is kind of do it yourself. So companies just, the moment they see a problem like this, they start building some ad hoc solution. A lot of times as a company, you might not even know how big this is going to become. What type of headache is it going to cause you down the line? And it was the same for us. We started working with Valencia Hospital, started sending data initially, and then this problem became just they had more requests.
Starting point is 00:18:50 They, you know, the type of contract we had with them changed over time. Now we needed to change the pipeline. For some of these places, we didn't have a bigger team to work with. So there was only one person. They weren't necessarily well-versed in helping us manage these pipelines or things like that, which is completely okay. But it was causing so many problems for us to fulfill what we had promised to in these contracts. And so, yeah, I think a bit of a challenge in actually,
Starting point is 00:19:30 you know, calculating the size of the problem is most people don't even know how big of a problem this is, even inside their own company, and only realize it when, you know, you have a data team that's responsible for all the data in the company when you realize, wow, you know, there're sort of performing some sort of healthcare service or you're a software provider that provides some sort of software in a healthcare context that needs to be sort of rolled up into a larger hospital system. Maybe it's a subsidiary, whatever the context is, right? And so you have some sort of data that needs to get, you know, rolled up into a larger entity and shared or something. So that could be one context. Another context that comes to mind is, let's say you're like a payment processor that processes transactions or some sort of financial information on behalf of sort of maybe like an end user facing application, right? Where users are interacting with it, but then you need some
Starting point is 00:20:45 sort of backend service provider, right? Huge. That's, I mean, a huge market for that, right? And so you have data transfer between two businesses in that context. But what are just a couple of others? Because I think one thing that's really interesting to me about the nature of this problem is that it's so varied. I mean, we've talked about healthcare data and we've talked about financial data, right? I mean, two crazy realms in and of themselves, but I think there are probably hundreds more. Definitely. So those two are super important, especially because the data in this space is very regulated. Security and privacy are extremely important in those two spaces. So definitely way more here and, you know, we were working with companies that would provide data for, let's say, wanted to find out where to build the next clinic. So we needed data on all the clinics in a certain city or state or country and, you know, and the types of kind of demographics in those areas.
Starting point is 00:22:05 And so you go to a data marketplace, you want to buy data, you want to explore that data. And finally, you want to transfer that data into your warehouse. None of those steps are really, you know, there's not really a great tool for those steps, of course. And even the past year and in recent years, a lot of major cloud service providers have been providing really good kind of services in this area. But, you know, we for sure have not really had that, even in my last role, where we were seeing a lot of Excel sheets getting sent via email. And some of the procurement kind of process for us getting to that data just took forever,
Starting point is 00:22:55 you know, many months. So that's one. Another one that I'm particularly interested in is working with a lot of AI companies. I really like to work with these companies because I feel like there's a lot to talk about. And I also can get a lot, especially in the earlier stages of a company, get their help in evolving the product as well. But AI companies kind of need their customers' data to kind of build the models. And usually, you know, they all have various ways. If you go on their website, they have different ways that they allow for companies to, you know, deploy. And, but probably a lot of them will prefer SaaS deployment, you know,
Starting point is 00:23:50 forever, you know, software that was deployed with this approach is, you know, more efficient, it's easier, but it's especially for AI and kind of some of these because you want hardware optimization. There's that aspect to it as well, where it kind of makes some sense to like bring the data to their own kind of warehouse and where they are building and training these models. And so they want to move data to their warehouse. And it's extremely important for them for these pipelines to be reliable. So what they do is they kind of end up managing these end-to-end pipelines, asking for their customers' credentials, connecting to their warehouse,
Starting point is 00:24:37 and bringing the data in. And so what we want to be able to do is to tell these kind of customers, hey, we can manage these pipelines for you. We can monitor it. We're going to, you know, no matter what your stack, what your customer's stack, we will be able to offer this stack agnostic solution and monitoring. And also add on some additional kind of features that I think are extremely important,
Starting point is 00:25:10 which is like data validation on both sides. You know, so many times customers will send mistakenly PII or PHI that, you know, they weren't supposed to send. And these are like really simple kind of validation steps that you can do to just avoid any of that sort of activity happening between companies. I love it. Okay.
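As a rough illustration of the "really simple" validation step Pardis mentions, here is a minimal Python sketch of a pre-transfer check that blocks a batch when a cell looks like PII. The function names (`find_pii`, `validate_batch`) and the two regular expressions are assumptions made up for this example, not anything General Folders is known to ship; real PII/PHI detection needs far more than pattern matching.

```python
import re

# Hypothetical, deliberately small pattern set (an assumption for this sketch).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(rows):
    """Return (row_index, column, kind) for every cell that matches a pattern."""
    hits = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    hits.append((index, column, kind))
    return hits


def validate_batch(rows):
    """Raise before the transfer if any cell looks like PII; otherwise pass the rows through."""
    hits = find_pii(rows)
    if hits:
        raise ValueError(f"possible PII found, transfer blocked: {hits}")
    return rows
```

The same check could run on the sending side before data leaves and again on the receiving side before it lands, which is one way to read "validation on both sides".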
Starting point is 00:25:37 I have a question and this is actually for you and for Kostas, okay? And I'm going to ask this question by sort of giving you three scenarios of data sharing. And then I'll pose the question. So this is a group question. I love this.
Starting point is 00:25:58 We don't do this enough, Kostas. I agree. So scenario number one, and let's file this under sort of like primarily infrastructure. So you have, let's use the example of a company, maybe two companies have a partnership or maybe it's a company who has a customer and they need to transfer data to that customer, right? And it's primarily an
Starting point is 00:26:26 infrastructure question, right? Is this a situation where it's a pipeline question? I think Pardis, you said that well, right? I mean, it's a question of pipelines, you know, do we just give you creds to our database? Like, hopefully not. So okay, so someone's running pipelines, which means that I have to send data, you need to receive that data. And it creates a host of infrastructure issues around scheduling, pipelines, differences in infrastructure, validation, and all that sort of stuff, right? But you're my customer. And so it's really just a question of managing infrastructure and how do I get this data to be right. In the middle, let's talk about maybe a clean room or the concept of a clean room, which is where two companies have data that they want to dataset, but it needs to happen in a sort of agnostic environment where it's impossible for either side to get information that they shouldn't have. Let's call that the security category. So we have pure infrastructure,
Starting point is 00:27:39 how do I get the data to you? We have the security side, which is like, well, we need to share, but like security is sort of our top priority. And then the third scenario is what I'm going to call like our contractual exchange, right? And the example here is something that I did a lot of, which is really crazy to think about, but this is in the advertising space. And so there were certain audiences that these companies had that I wanted to advertise to, but they didn't want to send them through a data brokerage, like a traditional sort of cookie or data brokerage or whatever. And so I literally just sent them a check and said, send me some screenshots or a CSV
Starting point is 00:28:28 of the ad performance or whatever. Sort of the most primitive, really there we're talking about, maybe we can call this one sort of like economic exchange. Like I'm transacting and I'm exchanging money and it just so happens that there's sort of like data that's sort of governing the terms of the transaction or whatever. Maybe I'm being long-winded, but what I want to ask you and Kostas is which of those categories, infrastructure, security, or economics, is the primary underlying definition of this problem, right? Because like you said early on, Pardis, there's some sort of contract that governs data exchange, whether it's official or not, right? And infrastructure and security, I guess maybe a question would be like, are infrastructure and security ways to describe some sort of spoken or unspoken contract? Or is it primarily like a security problem or primarily an infrastructure problem?
Starting point is 00:29:31 I would say it's primarily on the infrastructure side. You know, you have a lot of tools kind of right now where, and improving every day, kind of creating stack agnostic data transfer between two sources, whether it be kind of just batch data replication, whether it be streaming. And a lot of these problems are actively being worked on by a lot of great companies. And so I think the fact that, you know, in 2020, when I was working on this problem at my company, I still didn't have a tool to help me was because of that kind of credential management layer where because we're leaving the boundaries of one company, now it's not just an infra problem. It's a kind of, you know, how do we manage the security for two parties problem. That was one. And the other is maybe higher up that stack,
Starting point is 00:30:41 which is kind of on the application layer side, which is problems of, you know, validation, but then also kind of cost accounting. Because now when you have two companies involved, right, who's paying for egress? Who's paying for the pipeline? Who's paying for any compute that maybe either party kind of incurs through transformations that they want? Not just the cost of the transaction. Yes, and the cost of the transaction. So there's all of these bits and pieces that need to be accounted for.
Starting point is 00:31:19 And it's not always clear who the motivated party is. And so that's a complexity that's added when there's two parties involved, or two or more, right? And so you kind of want a thing that will help you kind of, you know, split the check, like, and how to do it, and have some flexibility there, where it's not really clear, I think, who should pay. And so, yeah, so definitely a security problem on one side. And then some little bits and pieces on the application side, where it's around costs and data validation and things like that. Yeah, I totally agree with all that. I would add probably like one more dimension to what both of you have said.
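To make the "split the check" idea concrete, here is a small Python sketch of one possible cost-accounting rule: egress and pipeline costs are treated as shared and divided by an agreed ratio, while each party pays for its own transformation compute. The `TransferCosts` fields, the `split_costs` function, and the 50/50 default are all assumptions for illustration; the conversation does not say how any real product allocates these costs.

```python
from dataclasses import dataclass


@dataclass
class TransferCosts:
    """Hypothetical cost line items for one billing period, in US dollars."""
    egress_usd: float
    pipeline_usd: float
    sender_compute_usd: float
    receiver_compute_usd: float


def split_costs(costs, sender_share_of_shared=0.5):
    """Split shared costs by an agreed ratio; each side keeps its own compute bill."""
    if not 0.0 <= sender_share_of_shared <= 1.0:
        raise ValueError("sender_share_of_shared must be between 0 and 1")
    shared = costs.egress_usd + costs.pipeline_usd
    sender = shared * sender_share_of_shared + costs.sender_compute_usd
    receiver = shared * (1.0 - sender_share_of_shared) + costs.receiver_compute_usd
    return {"sender": round(sender, 2), "receiver": round(receiver, 2)}
```

Making the ratio a parameter reflects the point that it is not always clear who the motivated party is: the same pipeline might be billed entirely to the sender in one contract and split evenly in another.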
Starting point is 00:32:15 I think it's primarily like the problem is like primarily driven by market conditions and it's primarily an economics issue. And what do I mean by that? It's not like today, the first time in history that we have to share data. We literally have been doing that since the inception of the internet. We had protocols, we had FTP, and we made SFTP because we wanted it to be secure and security was always an issue. Right.
Starting point is 00:32:46 And at the end, okay, if you want like to be super, super secure, you can always, you know, mail a whole hard disk to the other person. Right. So what I mean by like market and economics problem is that when, let's say, in the maturity of the markets, the problem of the need to exchange data becomes important enough that it's actually every detail around that transaction becomes important to be figured out. And today, for many reasons that we can talk about, obviously AI is a catalyst for that. There is the need for more companies today to share data with other parties as part of the way that they grow and the things that they're doing. So we need to formalize that. Like we can't just, it's not like a, you know, a problem anymore that we can just be scrappy and everything is fine.
Starting point is 00:33:53 Like what I hear both of you like saying all this time is that you have this problem for like forever. But at the same time, it wasn't that important of a problem for the whole market to try and find a solution and build businesses on top of that and try to optimize it as much as it's possible, right? So I think what I find very fascinating about today is that we reached that critical point where the markets demand from our industry to go out there and find the solutions and turn these into an actual scalable product. So that's my contribution to your question, Eric. I don't know if I helped you, made you any wiser or less wiser.
Starting point is 00:34:39 No, it's interesting. I think it was, I mean, it was certainly a little bit of a loaded question, but it's fascinating to think about the friction in transactions, right? And it really sort of crosses all three of those vectors. And I didn't even think about, you know, it's like, well, we're paying for compute, right? And so how much of that are we responsible for and how much are you responsible for? And so I love the splitting the check analogy. But yeah, it's a very interesting multidimensional problem. And it seems like the transaction friction and the security piece are really driving a lot of that, right? Like, you know, the security breaches today are far more costly than they've ever been. And then, of course, both of these AI is also driving a ton of it.
Starting point is 00:35:35 So, yeah, super fascinating to sort of be at the, you know, at a critical point for that. So, yeah. Thank you. That was very helpful. I feel much more educated. You are welcome. That's why we are here.
Starting point is 00:35:49 Like both me and Pardis, right? Like educate you. All right, Pardis, I have a question. So I was hearing like all this time, talking about like the problems that you are excited about and actually like super excited, excited to the point where you started the company. But I keep hearing about two things.
Starting point is 00:36:14 One thing is, let's say, the technical side of things, which has to do with how we move the data around and how we can have like, you know, like specific guarantees around that and security and like all these things that we touched already. The other thing though, is also like, you talk a lot about what I would call, let's say the product or like the experience, right? That is driven by the need, like the business need, like that these people out there have, right? And you mentioned stuff like, okay, you're having, you're making like a
Starting point is 00:36:54 contract and part of this contract is like some stuff around the data, right? Or there are some requirements around the data that has to be met, right? All these things are not, I mean, they obviously have, let's say, a technical dimension too, because you have to be able to automate these processes. But they are primarily driven, let's say, from a user need, right? Can you tell us a little bit more about that and describe to us what an experience like this looks like? Let's say I am an AI company and I do need to go to Eric and get his marketing data, user behavioral marketing data. Don't have the technology.
Starting point is 00:37:42 Forget about the technology. What does this interaction look like? Like, how do we do it? Definitely. So as, you know, an AI company, you're probably offering some sort of, you know, modeling approach that you can help with, where Eric doesn't have to. As an example, I guess, given some of my experience working in FinTech, one area when you start working directly with customers is the issue of fraud detection. Fraud detection is a super complex problem. It needs so much experience, kind of, you know, just human behavior for you to understand how to solve fraud detection with ML and AI and things like that. And so even what data to collect and how to kind of structure that data just needs
Starting point is 00:38:44 a lot of domain expertise. And so there are companies that say, hey, I can build really great fraud detection models, so that if you start this financial services company, you can just start using this model. And so in this case, that financial services company is now highly motivated to use, or kind of try out, these kinds of AI platform companies, right? And so now there's an initial evaluation phase where, even before I sign a contract, I want to transfer some data and see how it works on some of, you know, the past two days of data and try to see, okay,
Starting point is 00:39:28 are you guys able to use this tool to train a model to capture at least, you know, 80%, 90% of fraudulent activity? And then I can give you more data so you can get higher accuracy on the type of data and things like that. So there's evaluation, but then the moment we decide, okay, this is actually working really well. It's way better than something I can build in that amount of time, in the amount of time that
Starting point is 00:39:56 I have. Let me kind of sign a contract. I sign a contract. Now I want to send my data on a recurring basis for you all to kind of build that model. And it kind of depends on, you know, my volume of transactions and your need for how often you need to update the model, for us to come up with a sort of cadence at which data should be sent. And so if I'm low volume, I might not even need the model to be trained more than once a day. But let's say I'm very high volume, and the amount of stuff that can happen in a day is just too much, I might need a higher cadence or things like that. So it really depends on the type of business and the size of the business in terms of how often they decide to set the cadence of these pipelines.
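The cadence decision described here can be sketched in a few lines of Python. This is only an illustration: the function name and the volume thresholds are assumptions made for the example, since the conversation gives no concrete numbers, only the idea that higher transaction volume calls for more frequent transfers.

```python
from datetime import timedelta


def choose_transfer_cadence(daily_transactions: int) -> timedelta:
    """Pick how often to send data, based on daily transaction volume.

    The thresholds below are hypothetical placeholders, not values
    mentioned in the episode.
    """
    if daily_transactions < 0:
        raise ValueError("daily_transactions must be non-negative")
    if daily_transactions < 10_000:
        return timedelta(days=1)      # low volume: once a day is enough
    if daily_transactions < 1_000_000:
        return timedelta(hours=1)     # medium volume: hourly transfers
    return timedelta(minutes=5)       # very high volume: frequent transfers
```

In practice the two sides would agree on these numbers as part of the contract, and the model owner's retraining needs would matter as much as the sender's volume.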
Starting point is 00:40:57 But essentially, I think that's the workflow. 100%. And okay, so actually, it's interesting what you said, because you also described part of another question that I have. So, okay, we have Eric. Eric gives me his data, and I train a model, and I have to somehow expose this model back to him, right? So there is, let's say, the input that I need from him, and the output of my work, what I'm also getting paid for, is the model, right? I don't think that it makes that much sense right now to get into the logistics around that,
Starting point is 00:41:39 like how this model ends up back to Eric and what Eric does with that. Let's consider this like a solved problem, right? But you mentioned, for example, how often these data transfers should happen, right? Which means that there is, like, it's not a one-time process. It's not like Eric will send to me a data set, I'll train this, and then we say goodbye, right?
Starting point is 00:42:04 Like, we have to iterate on this thing, which reminds me of what a pipeline is usually supposed to be doing in data infrastructure. The difference that I see here is that usually, when we are talking about data infra, like pipelines, we are always talking about something internal, right? Like it's my data infrastructure, my pipelines, I run them. And that has a lot of implications, both in terms of the governance, but also the technology itself, right?
Starting point is 00:42:36 Like it's a different thing to build a pipeline where I know exactly where it runs, like I even know which data center this thing runs in. The software that we will be building for that is completely different compared to, oh, I need now a pipeline that is going to be connected to entities wherever they are, right? Without controlling anything in between. It's over the internet. So from a technical perspective, what does it mean to establish this relationship between me and Eric and send the data at a regular pace and make sure
Starting point is 00:43:16 that we have specific guarantees around the quality, both of the infrastructure and the data itself. Definitely. I can talk a little bit about the initial thing that you were mentioning, how to serve the model back. Usually these companies just expose an API and, you know, for every new data point, you just call the API with the trained model and kind of evaluate that new data point. Let's say whether it's fraud or not, or things like that.
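As a rough illustration of that serving pattern, the Python sketch below scores one new data point through a scoring function and turns the score into a fraud decision. Everything named here is hypothetical: `ScoreFn` stands in for the vendor's exposed API, `fake_vendor_score` is a toy replacement for a trained model, and the 0.5 threshold is an arbitrary assumption.

```python
from typing import Callable, Mapping

# Hypothetical stand-in for the vendor's scoring endpoint. A real
# integration would wrap an authenticated HTTP request instead.
ScoreFn = Callable[[Mapping[str, float]], float]


def is_fraud(transaction: Mapping[str, float], score: ScoreFn,
             threshold: float = 0.5) -> bool:
    """Return True when the fraud score for one transaction reaches the threshold."""
    return score(transaction) >= threshold


def fake_vendor_score(transaction: Mapping[str, float]) -> float:
    # Toy rule used only for this example: large amounts look suspicious.
    return 0.9 if transaction["amount"] > 1000 else 0.1
```

Passing the scoring function in as a parameter keeps the decision logic separate from however the vendor actually exposes the model.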
Starting point is 00:43:48 And so with respect to governance, I think something that's really important to consider about, you know, one, a tool like General Folders, and the other, customers of General Folders, is that it's very similar to every other kind of cloud-based tool. And so, you know, when, let's say, we sign a contract with any cloud service provider, right? We, you know, have some control over, okay, I want to be in this country and I want these particular regions and things
Starting point is 00:44:25 like that. And then we, okay, sign these contracts on the privacy side, security side, and all of that. And this would be very similar. So when we go into business with any of these customers, the two sides already have a business contract. They already have a data contract where one is trusting the other side to manage their data, to kind of, you know, follow whatever limitations or requirements or regulations that this certain business has. And so they've already kind of signed that type of contract together. And that's where we go in and kind of try to adhere to those same rules. There might be cases, though, where the two sides don't have a contract signed.
Starting point is 00:45:22 Let's say one side is trying to evaluate the other kind of product. But we have a contract signed with both sides. For example, to set up a third-party place, a trusted place where the two can collaborate. On that front, you know, now we adhere to whatever requirements there may be. So, hey, I want to be in this country, in this region, on this cloud or wherever. So very much like all the other kinds of data tools, we'll kind of follow those same processes. That makes sense. And what's the difference between what you have in mind as the solution to this problem compared to, let's say, products like the data sharing capabilities that Snowflake has, right?
Starting point is 00:46:15 I mean, they're also trying, or they already succeeded. I don't know exactly how successful it is, but they have kind of like a marketplace in the end, right? Because in the end, what you are describing reminds me a lot of, let's say, a two-sided marketplace transaction where you are, you know, the broker in between, right? You are making it easy for these parties to transact, right? Obviously, in your case, it's not something that you buy like you do on eBay, which is a physical object. It's data, right? So how do you see these attempts to solve a similar problem? That's the first question, and then I'll follow up with another one.
Starting point is 00:47:05 But let's talk a little bit about the competition, let's say. Yeah, definitely. I think with Snowflake, first of all, they've been very public with their data on this particular business unit for data sharing and data marketplaces. They've released their data and it looks really good. I think for me, looking at that, it's actually kind of, you know, confidence building because, you know, this is a company that knows what they're doing there. They know their customers. And so for them to be public about this particular business unit, and to kind of continue to invest on this side, is definitely a good sign. In terms of data marketplaces, it's certainly a one-to-many kind of relationship,
Starting point is 00:47:59 which I think is very important in data collaboration in general. Not the main focus for us, I think, early on. We definitely want the one-to-one kind of connections. We really want to work on that type of experience first. See how that goes. That's one thing. The other is the approach that companies like Snowflake take on the kind of zero ETL side, right? They're saying, hey, don't move data around. You know, there's no need to replicate data from one place to the other. If you're both a customer of Snowflake, we can provide you a view of the data where you have real-time access. You see the exact changes. There's only one copy of the data. It's pretty efficient. From the perspective of this company, we believe there are all these other needs that are not captured with this particular kind of approach. And although this is super efficient and makes so much sense,
Starting point is 00:49:11 and we should be doing this when we can to save cost, we want to be flexible and offer other ways of doing that. And so, yeah, I would say that's how we differentiate: just being more flexible in terms of what is possible. Yep. Yep. No, 100%. Makes sense.
Starting point is 00:49:33 Is there anything, I would say a feature of these platforms, that you really find interesting? And just to make it a little bit easier for you to understand the question, because I think it's quite vague, I'll give you an example. I always found it very fascinating how BigQuery has used the public datasets that they have for marketing purposes, actually, right? So they are exposing, for example, the daily GitHub data, which
Starting point is 00:50:07 is like a huge data set, right? Yeah. And okay. It's an amazing way for someone to get exposed to BigQuery, right? Like I might be looking for something completely, like, just for the data in the end, and I'll hear about BigQuery. So I always found this marketing activity that BigQuery is doing with data sharing very interesting and, I think, successful, in a way.
Starting point is 00:50:36 Is there, but okay. You are much more into that product. So I'd love to hear from you about something that really excites you that someone else has built. I'm definitely very excited about Snowflake. I used the product and bought it at my old company. I really like it. I think it really makes the team quickly become very efficient and kind of independent as well. You know, as a data team, I felt really good to independently manage our infra and move as fast as we needed to.
Starting point is 00:51:20 While, you know, it was also possible to manage our costs, and we had dashboards and monitoring and all of that to manage those. So definitely excited about that. I'm trying to think of other products in the data space that I'm kind of very excited about. Do you want in particular in the data collaboration space? I think you answered my question, to be honest. And I would like to close, because it's my last question and then I have to give the microphone back to Eric.
Starting point is 00:51:59 With asking the same question, but for the things that you are building, something that you are really... I mean, obviously you're proud of everything that has been built, and I can relate to that myself. But if there's something like a feature or, I don't know, even something that you've learned by interacting and trying to solve the problem that has a special place in your heart or mind, let's say, I'd love to hear that. Something that was surprising in a very positive way for you through this journey of building and starting a company and trying to solve this particular problem.
Starting point is 00:52:43 For sure. I mean, I can think of so many things and, you know, one of the things I would say is about people rather than product, which is that over this journey, ever since I started the company, it's been so heartwarming to see, you know, that I'm able, even pre-product, to talk to so many people who will give such good feedback and provide help and support in so many different ways, which just, you know, makes me feel so thankful to be part of this community, to be able to have access to that. So that, I would say, is the number one really super interesting and exciting thing that I've experienced.
Starting point is 00:53:41 Okay. You have to choose only one. So you can say it. Yeah. So definitely the rest is like a way for us to make sure that you come back, because we have to hear. Okay,
Starting point is 00:53:51 so that's all from my side. Eric, the microphone is yours again. Yes. Well, we're at the buzzer, as we say, but Pardis, I have a question. I was thinking about how many different contexts you've worked in within a similar problem space. So you studied software engineering.
Starting point is 00:54:17 You got into sort of data science type stuff, the mathematics behind that. You did that in several forms at several different companies. You're starting a company that sort of is in a related problem space. Now as a founder, is there sort of a lesson or a principle that you feel like has served you really well throughout all of the contexts or a piece of advice that you've returned to over the past decade or so that has sort of been a theme for you throughout all those different contexts? I wish there was something that I knew all along and there definitely wasn't. I mean, but there were so, there were so many ups and downs, and I'm sure there will be
Starting point is 00:55:08 many in my future. Something that I've, I would say, learned and, you know, that I think I want to take with me on this journey is, one, put people first. No matter what happens on the business side or product side, or, you know, something doesn't work out, there are ups and downs and all of that, but really put people and relationships first. That's super important. And two, always try to pay attention to yourself as a human: you know, exercise, eat well, go out with friends and spend a lot of time with other people. Because we need all of that to function well. And so, as I'm thinking about how to manage myself, and as I've become a manager of more people, it's to kind of ensure that everyone has a well-rounded lifestyle, because it's super important for the long run.
Starting point is 00:56:28 Yes, yes. Very wise words. And I think those will certainly serve you well as a founder. Well, Pardis, thank you so much for joining us on the show. It's been a pleasure. So many things we didn't cover, so we'd love to have you back sometime. Thank you so much. This was so fun.
Starting point is 00:56:44 I appreciate you having me on the show. Man, Kostas, this episode with Pardis from General Folders really got me thinking about economics a lot. I mean, I know that sounds weird. But when you think about the idea of two businesses or two entities sort of collaborating around data, whatever that looks like, sharing one way, bi-directional or whatever, it's a huge infrastructure problem, right? I mean, the fragmentation makes it a nightmare for anyone who's had to do that, which is really interesting. But the more Pardis talked about it, the more you realize that the fragmentation on the transactional side is actually far worse than it is on the infrastructure side. And so the infrastructure is almost a proxy for the way that companies are trying to interact. And so I guess maybe you think about the Snowflake marketplace as an interesting sort of economic
Starting point is 00:57:47 layer on top of their ecosystem that gives them a lot of market power. But if you actually zoom out and think about the economics of that in an infrastructure-agnostic way, that's pretty crazy. So I'm going to be thinking about that a whole lot. And that's just such a compelling idea in general from Pardis. So that was my big takeaway. Yeah, 100%. I totally agree with you that there are some very interesting economic implications in all the stuff that we discussed with Pardis. That's what I'm going to keep also, to be honest, because it is, let's say, an extremely strong signal
Starting point is 00:58:34 that data is actually turning into kind of a commodity. Like we will start transacting on top of data. I mean, we were doing it already, but it was more of, let's say, a niche kind of need, right? Like, yeah, financial markets are known for being ahead of their time in transacting over data, because data is actually like the most there, right? Sure, sure. Probably something similar might happen already in the medical space, right? But what we actually see happening here is that we are entering this new,
Starting point is 00:59:17 I'd say, era, in a way, where we're actually going to see a lot of economic activity on top of data. Yeah. Something that is accelerated by this whole AI thing that is happening. But that's just a catalyst that makes things go faster. It's not an enabler. It was there. People predicted that it would come, and it seems that it's coming faster than we thought.
Starting point is 00:59:44 So that's what I keep, and I would encourage everyone to go and listen to the episode, because there are, I think, some very interesting insights around what the future will look like. Yeah, we should do a shop talk on that, because I hadn't thought about this, but you bring up the fintech industry transacting over data, right? And one thing that's really interesting is that what makes that possible is that you assign value to data that allows for transactions on a very wide scale across a very large number of people in the market. And I didn't even think about this being the beginning of an era where you have all this
Starting point is 01:00:28 fragmented data across all this fragmented infrastructure across all these different companies, and beginning to see value being assigned to that even through the fragmentation, that's crazy. Brooks, we should do a shop talk. That's going to be wild. Yeah, 100%. And just to make something clear, it's not
Starting point is 01:00:47 just fintech. Even traditional financial markets were doing that. If you go and see the traders in Wall Street or these HFT things or the
Starting point is 01:01:03 hedge funds, it's all about data. Yep. At the end. Because, and that's an interesting conversation I had with someone who came from there. He's not doing that anymore, but, like, he was saying that, yeah, like, the strategies, the algorithms, they all get known at some point. So,
Starting point is 01:01:20 all that is left at the end, that is, like, consistent, is the data. All right. We're doing it. Okay. Yeah. I think we should definitely do a shop talk and also maybe find people to have a panel or something around that stuff.
Starting point is 01:01:38 But before that, everyone should listen to the episode because there's a lot of discussion around that with Pardis. Yes, and I got to come clean on some past sins in terms of data collaboration with advertising data, which honestly made me feel really clean. So thank you for that opportunity. Yes, you are forgiven. Thank you, Father. All right. Well, thanks for listening to the Data Stack Show. Definitely listen to this episode. Great one. Subscribe if you haven't, tell a friend, and we'll catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite
Starting point is 01:02:20 podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
