The Data Stack Show - 03: Turning All Data at Grofers into Live Event Streams

Episode Date: August 27, 2020

In this week’s episode of The Data Stack Show, Kostas Pardalis connects with Satyam Krishna, a data engineer at Grofers, India’s largest low-price online supermarket. Grofers boasts a network of more than 5,000 partner stores, a user base with three million iOS and Android app downloads, and an efficient supply chain that allows it to deliver more than 25 million products to customers every month. Satyam offers insights into how he helped build the data engineering function at Grofers, how they developed a robust data stack, how they’re turning production databases into live event streams using Change Data Capture, how Grofers’ internal customers consume data, and how the company made adjustments due to the pandemic.

Topics of discussion included:

Satyam moving from a developer to a data engineer (2:43)
Describing Grofers’ data stack and data lake (6:41)
Who is consuming data inside the company and what are some of their common uses specific to Grofers? (12:03)
What are the biggest issues day-to-day as a data engineer? (18:21)
COVID’s impact on business practices and the data stack (21:28)
The big problem of data discoverability and metadata cataloging (27:44)
Completely changing architecture to something that can scale up (33:16)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome back to the Data Stack Show. Today we're going to talk with Satyam, who's a data engineer at Grofers. Grofers is a huge grocery delivery service in India. And we had a really interesting conversation about a lot of the challenges that they faced with the pandemic and the changes that were happening every day and how they had to adapt to that. It'll be interesting to hear about how they had built out a robust data stack and were able to manage through that. I think one of the other interesting things that we'll dig into is how they're turning some of their production databases into live event streams using change data capture,
Starting point is 00:00:47 which is just a really neat component of their stack. What other things are you excited to discuss from a technical standpoint, Kostas? Actually, it was very interesting to hear from him like the different open source components that they are using. I mean, he touches on a couple of different tools that they use internally, like for example, Debezium to recreate streams out of their production databases
Starting point is 00:01:12 and a couple of other tools to manage, especially to manage their data lake and their data warehouses. That's one thing that I believe is super interesting. Another very important and interesting concept that we touched was about what to do with the data after they are delivered. I mean, there are many issues in every organization, and I think that Satyam did a very good job to describe these challenges that they have with the internal customers, like all the product managers, the engineers, the analysts, the marketing teams or the sales teams that need to consume
Starting point is 00:01:52 this data. And I think he's really sharing some excellent insights around that, what are the challenges and what kind of solutions they built internally. And he also touches some aspects of data governance and data quality and how this can be addressed inside the organization and why they are so important. Great. Well, let's dive right in and talk to Satyam about his role at Grofers and dig into their stack. Hi, Satyam. It's very nice to have you here with us today and participate on this episode
Starting point is 00:02:32 of Data Stack Show. Can you give us a quick intro about yourself, your background, and the company, Grofers, that you work at? Hey, Kostas. Thanks for having me here. I'm Satyam, and I lead the consumer data engineering team at Growforce. I've been working in Growforce for a very long time now. So it's almost been six years. And I was actually the third engineer here. And I started working as a mobile engineer back then. I bootstrapped the mobile applications for some time and decided I wanted to do something else.
Starting point is 00:03:14 And so I shifted to data engineering two years back and have been working on data problems since then. About Growfirst, so Growfirst is a low price online supermarket. I would say we are basically an e-commerce platform that's making quality products affordable for Indian customers. And we have our in-house techs which manages kind of right from managing our partner stores that enable the company to run fast and we try to have a lean supply chain. And ultimately, the goal is to have an efficient supply chain to deliver millions of quality products to the consumers every month.
Starting point is 00:03:58 Sounds great. I find very fascinating that you started as a developer and then you decided to move into data engineering. How do you feel about this, by the way? I mean, are you happy with this choice? Also, can you share a little bit more about what made you make this choice and from being a developer, specialize more into anything that has to do with data
Starting point is 00:04:25 and become like a data engineer? So it kind of started from the fact that I wanted to look at my product from all the different angles. And I had spent a good enough time in building that consumer application, but I wanted to see how the users interact with it, what's the data around it.
Starting point is 00:04:45 And that kind of always excited me to look at how we are getting the conversions, how the different metrics are getting tracked. So that was always something that it's there, it's happening, and I'm excited about it. And when the opportunity came that, yes, I can shift from the engineering team and start learning about data,
Starting point is 00:05:03 that's how I kind of got into that team. Yeah, that's a great point, actually. I think one of the characteristics of working with data and becoming a data engineer is that you have the opportunity to pretty much have like a 360 view of what is happening in the company. I mean, you have to work with the product, you have to work with the rest of the business lines inside the company.
Starting point is 00:05:30 So I think this is quite interesting. What do you think? For sure, for sure. And the biggest change for me was that when your stakeholders are the users of your application to the internal users of the organization, that was also one of the biggest shifts that I actually saw. So when you're a mobile engineer, you're working towards making your application better,
Starting point is 00:05:50 you are impacting the users outside your organization. That's a completely different challenge. But once you start building internal tools, you're building for the stakeholders and you also get the feedback much faster. Because they will come to you and say that the data platform, that these are the issues that need to be fixed. Whereas the feedback loop in a typical consumer life cycle is longer, right? So there were completely different challenges in terms of, you know,
Starting point is 00:06:16 how you manage a mobile application versus how you manage an internal data product. Absolutely. Absolutely. I totally agree with that. So moving forward, can you share some more information about the current data stack that you have in Growforce and what's the infrastructure and the applications and
Starting point is 00:06:35 technologies that you are using to manage this data stack? Sure. Primarily, we are based on AWS services and that is why our primary warehouse is Redshift. So all our data ultimately resides in Redshift and ingestion in different tools
Starting point is 00:06:56 happens basically through some of our batch jobs and some of our streaming jobs. So these are separate jobs which are responsible for different kinds of data needs. We primarily use Airflow as our orchestration layer, which helps us in managing these jobs. And we are also working on a Hudi Lake, which is essentially going to become our data lake
Starting point is 00:07:19 for all our source data. We also have some of our Spark jobs. So we also have our Spark cluster running, which is used to process all the different event data and a lot of our ML, AI workflows. So these are primarily our ingestion and warehouse competence. And over the top,
Starting point is 00:07:39 our people basically query these data marts on their red shirt using primarily an open source tool called ReaDash. And we also have a self-serve analytics use case where we use Tableau. Sounds great. So that's a lot of information around the current technologies that you are using. What about the data sources?
Starting point is 00:08:03 I mean, where do you collect data from and how do you work with them? Okay. So primarily because of the nature of our business, it's a transactional business. So mostly our source databases are Postgres and MySQL depending upon the service that it's using it. So we have a microservice architecture where you have different services like cart, orders,
Starting point is 00:08:31 last mile. And because it's an e-commerce company, again, you have a lot of components. You have a delivery app, you have a picker app shopper. So a lot of these services have their independent service and they have their own databases, which are primarily Postgres and MySQL. And what we essentially do is we capture the replication logs from these databases. Like in case of MySQL, it can be a bin log or val logs in Postgres. And we use Debezium, which basically captures those CDC changes and dump it into a Kafka
Starting point is 00:09:04 stream and dump it into a Kafka stream. And dump it into Kafka, actually. And from Kafka, we dump it into a raw location in S3 where we process it and convert it into a Hudi Lake. And that's how our lake gets created. Sounds good. So this is how you're getting, I guess, the data that are generated mainly on your production databases.
Starting point is 00:09:27 Are there any other data sources where you get data from? I mean, cloud applications that you might also pull data from, or event data that you might be capturing on your web properties or the application that you have? I mean, the mobile application? For sure, for sure. So we use Rudder Stack to basically capture one of our data for all our impression needs on our consumer application.
Starting point is 00:10:00 And that's another part of the data that we capture. We also have some vendors through which we collect data and ultimately all of that flows into our lake and in Redshift. So yes, we have other sources like event data. Sounds good. So between the two main data sources that you mentioned, all the data that are coming from your production databases and the event data,
Starting point is 00:10:34 which source generates actually most of the data that you have to work with, or they are on par in terms of like the volume? No, definitely. The volume of our event data is much, much higher. So we capture around five to six billion events every month. Whereas if I talk about our transactional data, we must be generating some terabytes of data every month. So definitely the scale of the event data is on a completely different level. And we definitely have different workflows to manage our event data compared to our normal transactional data.
Starting point is 00:11:10 So as I said, for the transactional data, we are using CDC. And for the event data, because most of the vendors, what they do is they dump your data into your S3 in a raw format. And then we have Spark jobs running, which get them converted into Parquet compressed files, partitioned by date, so that that can be accessed through Redshift Spectrum. And we don't even keep it in Redshift because of the size of the data. So yes, definitely there is a very big gap in terms of the nature of the size and the volume of the data. Yeah, yeah. That's an impressive amount of
Starting point is 00:11:42 data that you have to work with. That's very interesting. So, okay, we talked so far about the current data stack that you have and the different data sources and the volume of the data that you have to work with. So my next question is, what are the, I mean, who is the consumer of this data inside the company? I mean, you mentioned at the beginning about the difference of being a data engineer and who is the stakeholder and who is actually your customer as a data engineer compared to building a product, which I assume when we are talking about this, the customers are actually the people inside the company that they need this data to drive their work.
Starting point is 00:12:27 And so who are these people? And also, if you can share with us some common use cases that you see inside Grofers that are outside the common use case of reporting and BI analytics, that is like the most common use cases and the first use case that every company is trying to build? So definitely. So Groofer is a data-driven company and each of the different aspects of the business
Starting point is 00:12:58 are basically grouped together into these different teams. And I said, like, for example, you have a category team which manages all the different products which are going to be visible, how much is the margin and everything on it, right? So each team has its own purpose
Starting point is 00:13:13 and each team will have its own data analyst which help them reach their target metrics, which help them show where they stand right now. So I would say like, if we have 20 teams inside Grofers which help manage the different aspects of it, then each team becomes my data stakeholder and they use data in some form or the another
Starting point is 00:13:35 to get information around. As you said, like your typical reporting cases where they want to track their L0 metrics and they use Redash and they use Tableau to create queries on the top of the existing data to get those metrics. So these are definitely the users inside our data. And so we have a decentralized team.
Starting point is 00:13:58 So we have a common data engineering team which manages the data product and the data platforms. But in terms of data analysis, these are decentralized teams across all the functions. And in terms of use cases, I would say that for event data, there are a lot of different use cases that people use. For example, when they have to test a rolling out a feature, they want to see how the users are using it. They are running an A-B test,, they want to see how the users are using it. They are running an A-B test,
Starting point is 00:14:27 how they want to see the conversions happening on it. So all of that happens through the event data that is basically added to these different features. And there are other use cases as well. We do a lot of recommendations based on the existing data that we get and the associations of the data, right? So for example, once you visit our, and that's what we call our homepage is the feed
Starting point is 00:14:51 homepage, right? So when, once you visit our feed, the view that you get is a very personalized view and that's basically created based on your past buying and what other people buy, right? So that kind of association helps us create those recommendation engines which help power the different field APIs and the sorting APIs. And I can even talk about one of the other big use case for us is merchandising a product, right?
Starting point is 00:15:20 So we have customers who are coming to our platform from all the different aspects, right aspects and the different companies out there, they want to basically boost their product or they want to sell their product on our website or on our application. them a page where they can showcase their products. And that's what we call it Merchandising. We sell it as a product to the different brands out there where they can utilize it and make it like a sponsored kind of product. So these are some of the use cases that come to my mind. So recommendation, feature testing, merchandising. Yeah, it sounds great. I mean, from what I hear from you, data is pretty much used in everything inside the
Starting point is 00:16:10 company. So it's not just about reporting and business intelligence where you're trying to figure out what happened with your business in the past and how you can make choices for the future. But the whole product actually, I mean, not the whole product, but a big part of the product is also based on using this data to create recommendations or promote some of the products that you sell through Growforce. That's pretty amazing and very, very interesting, I would say.
Starting point is 00:16:39 All right, so moving forward, I mean, we talked so far about the technologies and the data stack that you have and the volume of the data that you have to deal on a daily basis. So how many people are currently supporting the infrastructure that you have and what is the structure of the team that is supporting data engineering inside GrowFers? Sure. So as I said, we have a centralized data engineering inside Grofers? Sure.
Starting point is 00:17:06 So as I said, we have a centralized data engineering team and our entire company or the entire organization is basically broken into these two aspects of, one is the biggest aspect is the supply chain, which actually delivers the product. And one aspect is your consumer team, which I am a part of which basically you know manages how the consumers are placing an order how they are coming onto an application the acquisition
Starting point is 00:17:33 retention part of it right so we have these two big chunks which we call as a consumer and the supply team and we have a data engineering team which sits across right so we currently have five data engineers who manage all these different products, including, as I said, about managing the past services and managing our open source systems, part clusters, everything. And we have a one product guy who helps us create these products, figure out what is the next most useful thing. And in terms of data warehousing
Starting point is 00:18:08 or in terms of data analysts, as I said, we have two people in our team, but most of the data analysts are kind of decentralized into their own teams. Sounds good. So what is the most common issue that you have to deal with as a data engineer right now? I mean, what's the biggest pain or your biggest,
Starting point is 00:18:32 let's say, what you would like to solve in terms of your day-to-day work as a data engineer? Yeah, so as I said, having a decentralized team kind of have a downside also, right? So it becomes very difficult to manage your like standard L0 metrics and there's repetition of metrics, there's repetition of dashboards because it is really difficult to get that communication happen across teams all the time.
Starting point is 00:19:07 So you see a lot of these things happening. Plus, it's also very difficult to enforce best practices because those people, they don't work with you day in, day out. And they have all come from different backgrounds. And it's not necessary that everyone has worked with Redshift and Redshift has its own nuances, how you want to write query. And it's very different from a very different warehouse where they might have already been using some of the other practices.
Starting point is 00:19:35 So to make them understand that Redshift is very different from whatever they have been using in the past and you create platforms or you create abstractions on top of it to ensure that they don't have to know about Redshift. I think that is one of the most challenging things for us right now. Very interesting. So from what I hear from you, one thing is actually dealing with Redshift and how Redshift can be used.
Starting point is 00:20:04 I mean, the data warehouse doesn't have to do necessarily with Redshift. Redshift is what weshift can be used. I mean, the data warehouse doesn't have to do necessarily with Redshift. Redshift is what we are using right now. But from what I understand, one thing is how you can have all these different people interact and follow the best practices interacting with the data warehouse.
Starting point is 00:20:18 That's one thing. And then there's a lot of, let's say, issues that they arise because of, let's say, the governance and the communication between the team as a result of having a decentralized way of doing analytics and working with data inside the organization. That's great. Okay, one of the things that's, I mean, it's very, very common question lately is about COVID and how this has is actually affecting like businesses. And I assume that also in your case, because you are like a B2C company and also in something like quite sensitive and essential service, which has to do with buying products. Did you see COVID having any kind of impact on your business? And mainly, what kind of impact it had on your work, right? Like on the data infrastructure that you're managing,
Starting point is 00:21:20 the models that you're building, and your everyday work with data? Definitely. I think, as you mentioned, all of the businesses, COVID has definitely impacted us. And it is an essential service. So customers were definitely, we were seeing one of our biggest organic growths when the COVID started. But as you know, right, so like the ground operations were impacted massively. There were lockdowns happening everywhere and it was getting very difficult to, you know, serve our customers.
Starting point is 00:21:57 So I would say that the initial days were very tough. And most of the time that we actually spend on initially was changing our business model in terms of obviously the first thing that comes was that how we how do we ensure the security safety of our customer right how do we reduce the touch points for a customer and how are we able to serve all the customers right we want to give those essential services to everyone so that people get their groceries safely from their home. And that kind of pushed us into thinking around how we can better batch products, how we can essentially not worry about the delivery timelines more rather than worry more about how you can, if someone is placing an order today and someone is placing an order tomorrow, but they, you know, go to the same place, we would ideally
Starting point is 00:22:48 want to club them so that, you know, you have less delivery routes and you're able to deliver more orders. And there were a lot of product decisions that were basically taken at that time. Like one, we built a completely contactless delivery feature. We added capabilities in our application around how you can edit your order multiple times, which was not even supported. So once you place an order, it's kind
Starting point is 00:23:12 of finalized. But because we know at time of a pandemic, people forget about things, they want to add something, they realize, oh, I need to add that, right? And if they place another order, it probably might not get clubbed, right right or it becomes difficult for our operations to manage n number of orders so yes we we did a lot of our business changes around
Starting point is 00:23:32 um how we can you know better serve our customers in a way and ensure the safety of our customers we so uh that's that's something that definitely happened and And one of the biggest change for us, and at least for our team, was that there were a lot of new needs around reporting. Because once you change your business model, or once you try to adapt to the situation, you also want to track metrics at a much faster pace. So that you know that whatever things that you're changing, and people were changing a lot of things in a given day, they want to track that, okay, this change caused this, right? So there are a
Starting point is 00:24:09 lot of new real time reporting needs that came with that situation. So that was definitely one of our bigger challenges where we had to get better in our alerting systems and near real time reporting. Yeah, it's very interesting. So you mentioned a couple of challenges that you had with the business side of things. Is there something that you had to change or something that you had to introduce on your data stack because of the impact that COVID had, like maybe having more, like a larger volume of data that you had to operate with? Or as you mentioned earlier, you said that you had very rapidly to create like a massive number of new reports. So is there something that you've learned or something that you improved in terms of either your best practices
Starting point is 00:25:01 or your infrastructure as a result of having to deal with COVID? So, excuse me. So I think one of the challenge we faced was more in terms of the communication internally in the team rather than the infrastructure. Like infrastructure, it was easy to scale because we usually use managed services and we have cloud infrastructure hosted on KX. So to be very honest, for a data engineering team, the issues are not
Starting point is 00:25:31 in that direction. But because this was the first time that we are working in a, you know, working from home kind of scenario for a longer duration, right? So that was definitely one of the biggest challenges for us. So I would say it was more from the communication side of things where if you're managing a team, you have to ensure that the health of the team is always good. And when you have logged on for months, you are, you know, you're cooped up inside a room. So you want to ensure that, you know, that energy gets released. And, you know, we figured out games that we can play in our, you know, typical meetings, when at the end of the day, we are having some online games to release that energy.
Starting point is 00:26:09 So I think that was definitely a bigger challenge for us. There were challenges around infrastructure, but mostly around how you have to scale up services and not increase the cost also massively, right? So if your business is impacted, you also need to figure out how do you save it at other places, because you're losing more on your deliveries. So you have to, you can't just scale it infinitely. So you also have to figure out how you can scale it sustainably. So that I think those are the two bigger challenges for the data engineering team once the pandemic started. Yeah, it's very, That's super interesting to hear
Starting point is 00:26:46 the organizational challenges because of all these changes that happened during the pandemic. Moving forward and staying with challenges for the next question, let's discuss a little bit more about the challenges that you are facing with the technologies that you are using. I mean, previously you mentioned that there are some difficulties in aligning everyone to follow best practices
Starting point is 00:27:19 when they have to work with Redshift, Amazon Redshift. Can you share a little bit more information around that? What are the type of challenges that you are encountering around the data warehouse that you have and how you are dealing with these today? As I said, we have created a platform, and we have democratized in such a way that anyone using you know, the data, using the data can go and create their own data marks,
Starting point is 00:27:57 right, and they can basically go ahead and do a lot of analysis on their own. But that also brings in as I said, a lot of repetition of data and at times people don't know if they're looking at the right data or not. So I would say data discoverability right now is a very big problem. So we have around
Starting point is 00:28:17 2,000 to 3,000 tables in Redshift and people at times they don't know whether this is a production table or it's something that was created on an ad hoc basis. So data discoverability is one of the things that we are working on right now. And we are trying to solve it through, you know, your typical data cataloging, figuring out how you can create a data catalog at Grofos, which can basically help you, you know, give a picture about, okay, this table,
Starting point is 00:28:43 what does this table mean? Who created it? When was it last updated? So a lot of meta information about your tables and not just the fields and the columns, but also about the data inside it is going to help a lot of folks understand
Starting point is 00:28:59 what this table does. So data discoverability is one of the things that I feel is a challenge right now and that's something that we are investing our time and solving it another i would say is that how we are currently managing a real-time pipeline is something is of a big big challenge as i said it was a very make-do solution to you know support our business requirements at that time but we for sure know that it's not going to scale up very well. So what we are doing is because we are getting those CDC changes from our data and what we want is that people are able to join data across databases in real time.
Starting point is 00:29:41 So what we are doing is we are dumping these different CDC streams into one single database and then allowing people to query from that and we keep on pruning the past data. So that's something that we have built up as a make-do real-time pipeline. So you have data changes coming in but it's getting difficult to difficult, it's getting more
Starting point is 00:30:00 difficult to manage primarily because we are dumping it into a post-based kind of database and obviously your OLAP queries are not going to work on OLTP. So that's something that we definitely want to change and we want to move to
Starting point is 00:30:16 Kafka Streams or we want to move to some different system which can definitely scale up with our needs. Very interesting. One more question about the first challenge that you mentioned that you are trying to solve using data cataloging. You mentioned that apart from cataloging
Starting point is 00:30:36 the standards metadata around the table, you want to add some more fields there and some more metadata that can help understand even better what kind of data you are dealing with. Can you share some more fields there and some more metadata that can help understand even better what kind of data you're dealing with. Can you share some more information around that? Yeah. So basically, very basic things like even if you get some of the common values that
Starting point is 00:30:58 are present in a column, or how much of the data is null, or what are the maximum or the minimum values, what are the different values. So all of these, when you start seeing that metadata for a table, you get a very overview level of understanding, okay, what are the different values that are possible in the system? And it really helps. Like if you see that a particular column is null 90% of the time, you either know that, okay, something is not right here. Right. And you can, you know, reach out to the data team and say, this table is not getting populated
Starting point is 00:31:30 properly. Or even for us, we can start setting up alerts on that. Right. And you have some verified tables and you have verified dashboards where you're saying that this is the metric that you're always going to hit. And if you're not hitting that, we get to know. Right now, it's more of a, you know, something that we get alerted from different teams and the data is not coming in,
Starting point is 00:31:50 or the data is not properly populated, right? And the downstream services and the downstream reports are impacted. So we also want to change that. So with data discoverability, if you're building that system, which can get that meta for you, you can also kind of set up alerts over the top of it and you get alerted internally rather than someone else reaching out to you. So it brings in more confidence that the data that you're looking at is right. And you can trust your metrics, right? Because if the data, if you start losing confidence in that, you can never trust your metrics and then the business decisions that you're
Starting point is 00:32:22 taking, it gets more and more difficult to, you know, be sure about what you're doing is actually impacting or actually you know making a positive change yeah that's a great point actually and it's very interesting to see how metadata catalog can also help with quality around data and how you can have mechanism to figure out if something is going wrong or understand if you have to fix something around your data. That's very interesting, super interesting. So moving to the last question of this very, very interesting interview, any interesting projects that you would like to share with us? I mean, it can be something either internal that you're doing at Growfirst or something personal that you do. Anything that excites you, actually, that has, of course, like to do with data and data engineering.
Starting point is 00:33:15 So I can think of something that's, so we have been, as I said in the beginning, right, so we have been working on creating this lake at Growfirsters, which is updated at a good frequency. And I would say that in the start of 2020, most of our tables were getting updated in Redshift using a bad job, right? And I said, we massively rely on those kinds of bad jobs. But as we are growing and as we are seeing the scale of our data, we
Starting point is 00:33:46 know that that's not going to scale up so well. So we started moving to an architecture where we have more of a Kappa architecture and how we are updating our data and how we get that data into our system, we have completely changed it. And that's something that we are working on. So we are using Debezium, which is something which basically, you know, converts these events from different databases of their own format. Like each database has its own format into a standard message,
Starting point is 00:34:16 which can be then used to, you know, create your data lake. So we are using Debezium combined with Kafka to get the raw dumps in our S3 and then running our Spark jobs to basically populate our Hudi Lake, which is something, so Hudi is basically, it gives you absurd capabilities on Hadoop kind of thing. So Uber made it a couple of years back and it's now a part of Apache ecosystem.
Starting point is 00:34:42 So we are making Apache Hudi Lake using that raw data and then essentially populating a redshift. So that's something that we have been working for some time now and I think it's really exciting in how you can create a very dynamic lake using Hudi where your lake
Starting point is 00:34:59 like you're able to absurd the data into a lake and have that asset kind of transactions happening over S3, which was not earlier possible. So we have a much faster refresh of how we are updating our lake compared to
Starting point is 00:35:16 our bad jobs. That's great. I think we have many reasons to get on another interview in the future. I mean, you're building and you have some very interesting projects inside Growforce and it will be very interesting in the future to see how things went with these projects and what new lessons you've learned from that.
Starting point is 00:35:39 So thank you so much, Satyam, for your time today. It was a great pleasure chatting with you and I hope to have the opportunity soon to chat again. Sure, sure. Thanks, Kostas. Thanks for having me here. It was a great conversation and I loved it. So that was our interview with Satyam. I really enjoyed it and I found it very interesting with all the insights that he gave to us. I mean, it's always
Starting point is 00:36:08 amazing to hear someone who has started as a software engineer and ended up from a small company because he started at Grofers when they were just like a couple of folks there building the company and he ended up
Starting point is 00:36:23 pretty much building and running the data engineering function inside the company. So I think it was very interesting to hear how this happened and the experience that he gathered from there. There are a couple of things that, as we said also in the introduction, we touched there. There are many, many other things that Grow as we said also in the introduction, we touched there. There are many, many other things that Growforce is doing in terms of how they utilize and they extract value out of their data
Starting point is 00:36:52 that we haven't touched. Things like, for example, how they do personalization using this data, and also in terms of the organization, how they involve other functions like marketing, for example, to actually drive this personalization, which is quite unique. Because most of the times when we are talking about personalization, we just think of some algorithms doing it. But here we have a very complex scenario where we have also people from different departments involved in this. So, yeah, Eric, I think we will have the opportunity in the future to talk again in a couple of months with Satyam and learn more about what they are doing internally at Growforce in terms of using their data and what kind of interesting technologies they are building.
Starting point is 00:37:39 Yeah, I'm excited. I think, you know, with my background and early career in marketing, I would say that the way that they're using their data at every point in the customer journey, as far as audience is pretty exciting. So that's going into personalization for users of the app, but they're also packaging that to go out and do more sophisticated acquisition of new customers. So we didn't get a chance to talk about that, obviously, in this episode of the show,
Starting point is 00:38:08 but we're going to circle back up with the Grofers team to learn about some more of those use cases and hopefully get a couple of people from those other teams like product and marketing to join us as well. So stay tuned and we'll let you know when that's going to happen.
