Disseminate: The Computer Science Research Podcast - High Impact in Databases with... Ali Dasdan

Starting point is 00:00:00 Hello and welcome to Disseminate the Computer Science Research Podcast. Jack here as usual. The podcast is brought to you by Pometry. Pometry are the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust. Raphtory supports time traveling, multi-layer modeling, and comes out of the box with advanced analytics like community evolution, dynamic scoring, and temporal motif mining. It is blazingly fast, scales to hundreds of millions of edges on your laptop, and connects directly to all your data science tooling, including Pandas, PyG, and Langchain. Go check out what the Pometry guys are doing at www.raphtory.com where you can dive into their tutorial for the new 0.80 release. Awesome. On to today's episode.

Starting point is 00:01:05 So we have another installment of our high-impact series. For new listeners to the show, this series or this type of episode is motivated by a blog post from Ryan Marcus about the most influential papers in databases. And today, amongst other things, we're going to be chatting about MapReduce Merge with Ali Dasdan. This was the number one paper for 2007 and actually was the number eight overall in all database papers, Ali.

Starting point is 00:01:34 So, yeah, very influential work. So some more info on Ali. Ali is the CTO at ZoomInfo, and he's worked at numerous large companies over his career. He's focused on big data systems and science work, and he's built a variety of large-scale data platforms across his career, which I'm sure we'll touch on at various points during the podcast. So welcome to the show, Ali. Thank you, Jack, thanks for inviting me. It's a pleasure. So it's always customary on my podcasts to start off with your story in your own words. So yeah, tell us about your journey and your career so far. Yeah, thanks for that. So it started, I was born in Turkey and finished my undergrad over there in computer science. I came to the US in '94 for my PhD work and did it at the University of Illinois Urbana-Champaign. It was on timing analysis of embedded real-time systems.

Starting point is 00:02:28 So some systems, even in the PhD work, lots of timing aspects to that, including concurrency and all that. So good learning from there. It's mostly algorithmic and graph theory work. After that, I worked at 10 other companies before ending up at ZoomInfo. I worked in different industries,

Starting point is 00:02:50 big companies, as well as small companies. I also worked in the UK for two years at Tesco. Actually, I built a data platform. Yeah, that's the journey so far. Lots of learning, lots of failures. I hope it's along the way. Yeah, yeah. So the Tesco stuff, obviously, that would have been from the UK, so that kind of jumps out immediately straight there. So was that kind of all to do with their Clubcard stuff and all that sort

Starting point is 00:03:15 of kind of analysis? More than that, overall. Okay, so because of family reasons, I had to go to Europe. So I joined Tesco to build and run their data, data platform, and marketing automation team. So Tesco was doing data mining, data platform, data science using a Teradata database at the time when I joined.

Starting point is 00:03:37 So we brought them into the 21st century. We built a 24 petabyte data platform. You know, Spark and Hadoop and Kafka and, you know, anything that you can imagine, including Teradata. So it was a good component of that. So completely revamped the way that Tesco was doing data. You know, for example, they were doing demand forecasting at a certain interval. It's confidential information, but we were able to bring it to a level that they can do it many, many times a day. So it was, I think,

Starting point is 00:04:11 revolutionary for Tesco, and I hear good things about this, so they are still taking advantage. Awesome stuff. Cool. So you mentioned it there briefly, you mentioned MapReduce, right? And that's kind of how I reached out to you originally, to talk about the MapReduce Merge paper. And obviously that was, what, 2007 now? So how many years ago, if I can do some quick math? 17 years ago. So let's cast our minds back 17 years and tell us about MapReduce Merge.

Starting point is 00:04:35 And so I guess, yeah, kind of what is it? The elevator pitch to start off with and some context for the work, I guess. Yeah, yeah, yeah. I think let me give the context first before talking about the paper. So I joined Yahoo Web Search in 2006, and my charter was to start the internal data team for Yahoo Web Search. So Yahoo definitely had their own data platform teams and all that, but there was no specific team for web search to do.

Starting point is 00:05:11 Web search has crawling systems, there's indexers and web crawlers and graph internal. There was a system called WebMap, which was an internal copy of the entire World Wide Web. And each of these were running on thousands of servers in the Yahoo data centers with petabytes of data. And for each one of these, if you up-level it, basically the context is, right, you are

Starting point is 00:05:39 crawling the web, you are trying to serve your users, you want the content to be comprehensive, to be fresh, to be relevant, to be diverse, to be basically overall good. But what does it mean? It's easy to say in English, but how do you measure these things? What are the metrics you're going to use to do that? How do you do all of these at scale

Starting point is 00:05:58 with four petabytes of data, right? For example, we were building a graph in 2006-7 which had 200 billion nodes and 1 trillion edges. And then we were building it, I think, every week. Now, how do you know the graph

Starting point is 00:06:18 that you built one week ago did not mess up with the new version? What does it mean for this graph to be still okay, right? So think of all of these. So the context was to create a team like that to do all of that: to create the metrics, do the necessary data analysis, do all the monitoring, and build the monitoring systems for all of it, as well as some algorithmic work to improve crawling and all that. So my team was tasked with that.

Starting point is 00:06:47 I joined myself first, and there was another person, Hung-chih, who was another author of the paper. And yeah, that was the task. We were doing data analysis day in, day out. And it so happened that almost, I think the same week, the creator of Hadoop,

Starting point is 00:07:04 Doug Cutting, joined Yahoo. We were actually in the same onboarding meeting for Yahoo. Yeah, and then right away, Yahoo started building the Hadoop, right? Instead of, they had their own internal systems after the Google papers were published. Then they hired Doug Cutting. They decided actually to continue with Hadoop. There is no need. So there was a team that was created for that.

Starting point is 00:07:29 And that team happened to be right next to my team. And we started using Hadoop from the beginning. And it was painful, right? API changes almost every time. It was failing from 20 nodes to some hundred nodes. It was going all the way like that. And because of the huge amount of data that we were analyzing, sometimes with custom code later using Hadoop,

Starting point is 00:07:56 doing joins naturally came, right? You got to do joins because it's so fundamental. And my staff had a little bit of database background. Hung-chih had a stronger background in databases. So we came up

Starting point is 00:08:10 actually with different ideas. He came up with the merge idea. I came up with a different idea. We actually have three patent

Starting point is 00:08:16 applications on that, which all became patents and were donated to Hadoop for open sourcing Hadoop.

Starting point is 00:08:23 Sorry, to Apache, for open sourcing Hadoop. Anyway, that was the idea, right? Hey, this MapReduce work on Hadoop, it was painful in the beginning, but it works pretty well. Why don't we actually extend the infrastructure framework

Starting point is 00:08:38 in such a way that we can actually do database processing on Hadoop? So that was the initial idea, which was natural extension of what we were doing. And if you notice, basically, even the paper name reflects that. The original MapReduce paper says simplified data processing on large clusters. Our name is simplified

Starting point is 00:08:56 relational data processing on large clusters. And the funny thing, Jack, is basically Yahoo had a similar system, similar to MapReduce. It was called Dreadnaught, for example, for building the web graph. Actually, porting that application, building of that web graph at that huge scale, like one trillion or so edges, to Hadoop was one of the first applications that actually helped scale Hadoop. But anyway, let me not sidetrack over there.

Starting point is 00:09:26 So the idea was literally do joins on this. Right away, we actually jumped at it. We actually even thought about, hey, this could be a company. We should actually quit Yahoo. We should have done that maybe. We did work very quickly in 2006, actually. In 2006, I had the third author of the paper, Ruey-Lung, as an intern from UCLA. So three of us basically got this quickly written as a paper, submitted to SIGMOD.

Starting point is 00:09:57 And then existing work that we were doing was already some progress on Hadoop. Actually, in 2006 summer, we'd released the first production application on Hadoop, which was analyzing crawler logs. I have a LinkedIn article on that, actually, if you look at my profile. So, anyway, long story short, we were very early users of that. We were doing data analysis.

Starting point is 00:10:17 Joins were natural. And then we realized that, yes, with MapReduce, you can do joins, but a little bit, you know, pulling your ear, holding it from the other side, and there could be a more natural way of doing that one. And the paper came as a result of that. And if you look at the paper, you can also see that using, you know, this extra step, you can structure and support all relational algebra operators so that you can do database processing on Hadoop completely.

Starting point is 00:10:46 If you had a merge, a merge step. So that was the context. That was the reason we came up with that. And yeah, so that's how it ended up with publishing it. And we were so happy that it was published at SIGMOD. There was some interest at that time. We even went to Baidu because the conference was in Beijing.
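Editor's note: the idea described above, adding a merge step after map and reduce so that two separately reduced datasets can be joined, can be sketched in a few lines. This is a minimal in-memory illustration only, not the paper's actual interface and not Hadoop code. The function names (`map_reduce`, `merge`) and the sample data are hypothetical, and the merge is written as a simple sort-merge equi-join on keys, which is just one of the join styles such a step could support.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/reduce pass; return (key, value) pairs sorted by key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Keys are unique after grouping, so sorting never compares the values.
    return sorted((key, reducer(key, values)) for key, values in groups.items())


def merge(left, right, merger):
    """Combine two key-sorted reduced outputs, in the style of a sort-merge join."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            out.append(merger(left_key, left_value, right_value))
            i += 1
            j += 1
    return out


# Hypothetical inputs: crawl events per host, and a host-to-owner table.
crawl_log = [("a.com", 200), ("b.com", 404), ("a.com", 200), ("c.com", 200)]
host_owner = [("a.com", "Alice"), ("b.com", "Bob")]

# Two independent map/reduce lineages, one per dataset.
pages_per_host = map_reduce(crawl_log, lambda r: [(r[0], 1)], lambda k, vs: sum(vs))
owner_per_host = map_reduce(host_owner, lambda r: [(r[0], r[1])], lambda k, vs: vs[0])

# The extra merge step joins the two reduced outputs on their shared key.
joined = merge(pages_per_host, owner_per_host, lambda k, count, owner: (k, owner, count))
print(joined)
```

Without a merge step, the same join has to be squeezed into a single map/reduce pass, typically by tagging each record with its source dataset and separating the two sides again inside the reducer, which is the kind of workaround described above as doing joins the awkward way.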

Starting point is 00:11:04 We went to Baidu headquarters. We actually presented to them. So it was good interest at that time. So we're happy that we got it. Yeah, it sounds a real sort of kind of serendipitous almost. The fact that you, like, kind of that Hadoop was happening

Starting point is 00:11:17 at the same time, kind of as you joined. In the first week, I mean, like, I mean, what are the odds of that, right? I mean, really, it's a sliding door sort of moment. Yeah. That's fascinating. And obviously, we can talk about the impact this this work has

Starting point is 00:11:28 had, but I mean, I guess from day one, right, it's having real-world impact, in that while the paper was being written, it was already being used in a production system, right? And then obviously the interest from companies from day one. But yeah, I guess, what is your opinion on the impact of the paper? Yeah, I think on the impact, probably we ourselves were so busy with existing work that we did not continue literally, for example, expanding Hadoop MapReduce with a merge step. Even though we were working with the Hadoop team,

Starting point is 00:12:02 you know, the focus was mostly on initially to get the basic MapReduce scalable, issues with the, you know, Google file system, sorry, Hadoop file system or NameNode, right? You know, that was the main focus. You know, the focus was not, let's add another step to extend MapReduce. Some more fuel on the fire, right?

Starting point is 00:12:23 Yeah, it's already complicated. Trying to get this thing stable. Let's go and do this other complicated thing as well, right? Exactly. I think the other thing is, many people may not know this, but Hadoop team was trying really hard to get early users.

Starting point is 00:12:37 Later, Hadoop became so famous, but in the beginning, people were a little bit skeptical. Like, why should I actually give up my existing infrastructure to use Hadoop? So since I started around the same time, and we were tasked with all data analysis for web search, it was natural for us to jump at Hadoop because there was no legacy that we had to rely on. But many people were suffering for that. And that was actually the reason that we were almost the first users of Hadoop. So that was the original approach to that. And since we did not take the step to start a company on that,

Starting point is 00:13:11 we just basically published the paper. Paper got lots of citations. I think people were citing it because it was an early work. And it was already donated to, I think, as far as I know, to Apache, the patents. And we did not follow up. Actually, we did not go to the database conferences to champion our paper and saying, we should get the best paper award or anything like that.

Starting point is 00:13:33 So there was no, actually, to be honest, we forgot about that. So that's the impact. So I'm happy that it did get cited a lot because it's probably the earliest paper on doing database processing. It is published work on using Hadoop. And there are, whether it's any of the big companies, if you notice, they have done database processing on Hadoop-like systems.

Starting point is 00:13:57 So probably they're already touching upon the patented work over there. But since it's part of Apache, it was part of that. So no issues around that. So we actually moved on to different companies, different works, so we did not follow up, but that was the impact from our side. But I'm happy to see that people still cite it and use it, and hopefully there are some good ideas.

Starting point is 00:14:20 One day, people will implement some of these. Yeah, yeah. Well, good ideas are timeless, right? So yeah, I'm sure they will. Cool. So you mentioned it there, you all kind of moved on from Yahoo and moved on to new companies and new problems. So that kind of leads nicely into what you're working on at the moment at ZoomInfo. So yeah, tell us, what's the current problem you're working on? Yes. So I am leading engineering at ZoomInfo. So ZoomInfo is in the business-to-business, B2B, space. It's not super well-known because, you know,

Starting point is 00:14:54 you don't have millions and millions of users from that perspective, but it's the leading company in B2B. So we have a platform that enables basically every company has to sell and we make better sellers out of existing sellers. So in a way, basically for you to retain your current customers or find new customers, you need to figure out who's on the market for the product that you are selling. How do you know they're actually in the market?

Starting point is 00:15:27 How eager they are? What kind of things are happening with that company that you might be getting as a potential customer? If it's a startup, for example, whether they got funding or whether they posted that job requirement saying, I am looking for somebody who's going to manage XYZ. And you happen to be the company that is selling XYZ. And you go like, well, I know all of that.

Starting point is 00:15:52 And if you find out about this company, let's say you find out that they are interested in your product, how do you know whom to reach? What titles? What's the contact number, right? How can you send an email? What kind of response you are getting, all of that. So the idea with this one is if you just simplify it, it's almost like a contact company database. But if you go beyond that, what's called go-to-market,

Starting point is 00:16:16 the whole idea is to make sure that your go-to-market platform works well for you to find, acquire, and grow your current business, find new customers and all that. So that's what we are doing. And yeah, that's how I would describe the work. Awesome. We can get into the kind of details and what it looks like on the system level, but I've just had a quick question on,

Starting point is 00:16:39 is this primarily if I'm selling, I go on the platform, and I guess it's pulling in data from loads of different sources and saying, okay, yeah, this company's just been acquired or has some funding or they post these job adverts. Is there also a platform from the other side of like, I can go on there if I want to get sold something or like I'm looking for something. Can I, as the second B in the B2B, I guess,

Starting point is 00:17:01 can I also go on there and say like, I want this sort of stuff and who's going to sell it to me as well? Is it working both directions? You can do searches related to that, with respect to industry, technology, and all that, but it's not intended for that purpose, but that's a good idea.

Starting point is 00:17:17 Maybe we should extend it to that direction. But related to that, on the seller side, Jack, one thing is basically, one is you can go to the platform. You can do a search. You can find the customers you care about. Again, existing customers as well as new customers. But mostly people would be using it for what's called prospecting, you know, to find new customers.

Starting point is 00:17:38 But also, we are actually releasing this thing called Copilot with lots of AI capability, AI-driven capability, so that you don't have to go to the platform all the time. Actually, we will send you what you should be paying attention to, what we call signals. We're going to say, these are the accounts or companies that you care about or industries. Hey, this is what's happening, and here are the recommended actions for you to act on it. Or you want to catch up with some company data. Let's say you have been having lots of conversations with them, one year of data, customer calls, videos, communication back and forth.

Starting point is 00:18:16 How are you going to catch up with that? Let's say you are the new sales guy assigned to it. How do you catch up really fast? So we will help you literally with the summary of what's going on, so you can actually act on that one. Actually, this is going to be released in May, so we are actually extending it towards that direction, not exactly find customers to

Starting point is 00:18:34 buy from, but at least you don't have to come to us all the time. We will come to you and make your life easier. Yeah, that's great. Kind of like onboarding people quicker and giving that sort of digest. This is all things that have happened. So yeah, it leads up to that.

Starting point is 00:18:49 Yeah, cool. So what are the engineering challenges in enabling this vision then? So kind of what are you rubbing up against in performance? Obviously, kind of what you can disclose but obviously not sort of giving away any sort of...

Starting point is 00:19:02 Yeah. So it's, it's lots of data as you would imagine. And this is business data and people rely on it. So it has to be accurate. It has to be comprehensive. So coverage has to be super good as well as accuracy. Because I,

Starting point is 00:19:20 you know, I am, I am, let's say I'm going to reach out to somebody. I want to make sure that phone number is correct, or I am going to contact somebody. This company is really in need of my product rather than, you know, I'm just sending a spam email or anything like that. So, you know, accuracy and coverage is key.

Starting point is 00:19:37 So, we deal with lots of data. The other thing actually similar to closer to your background, Jack, is that when you have data, let's say you have company data from multiple sources, there is no unique key that can enable you to join the data. So you have to run what's called entity resolution.
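Editor's note: as a rough illustration of the entity resolution problem just described, the sketch below matches company records from two sources that share no unique key. It is a toy rule-based example under stated assumptions, not ZoomInfo's method. The record fields (`id`, `name`, `domain`), the list of legal suffixes, and the sample data are all hypothetical. Production systems would typically add blocking to avoid comparing every pair, plus fuzzy or learned matching.

```python
import re

# Hypothetical list of legal-form words to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "corp", "corporation", "co"}


def normalize_name(name):
    """Lowercase, strip punctuation, and drop legal-form words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in LEGAL_SUFFIXES)


def same_company(a, b):
    """Rule 1: identical known domain. Rule 2: identical normalized name."""
    if a.get("domain") and a.get("domain") == b.get("domain"):
        return True
    return normalize_name(a["name"]) == normalize_name(b["name"])


def resolve(source_a, source_b):
    """Return (id_a, id_b) pairs judged to be the same company (all-pairs comparison)."""
    return [(a["id"], b["id"]) for a in source_a for b in source_b if same_company(a, b)]


# Hypothetical records from two data sources with no shared key.
source_a = [
    {"id": "a1", "name": "Acme, Inc.", "domain": "acme.example"},
    {"id": "a2", "name": "Globex Corporation", "domain": None},
]
source_b = [
    {"id": "b1", "name": "ACME Incorporated", "domain": None},
    {"id": "b2", "name": "Globex Corp", "domain": "globex.example"},
    {"id": "b3", "name": "Initech LLC", "domain": "initech.example"},
]

matches = resolve(source_a, source_b)
print(matches)
```

Pairs that the rules cannot decide confidently are the ones that would go to a model or, occasionally, to a human reviewer.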

Starting point is 00:19:55 You have to figure out exactly, you know, this company in my database is the same company in the other database that I'm bringing. So this is one of the challenges. Lots of algorithmic work over there,

Starting point is 00:20:07 some rule-based, some AI-driven, machine learning-driven. Sometimes even humans might actually be resolving some conflicts if there are any. Normally, it's not that many. So that's one of the challenges. Definitely running this at the scale that we are operating at and doing all the tasks that the seller has to do. So there is data cleaning side, combining data into a unified master, let's say, database

Starting point is 00:20:32 for you to act on. Well, you found these companies you care about. Now we've got to send an email or you're going to talk to them. If we have a product that can record those conversations and extract sentiment and lots of analytics and acting on top of that, right? How do you actually manage all of these? If you want to reach out to these people through advertisements, we also have a product where you can actually do that. So all of these different products running on the same platform and making this platform reliable, secure, privacy-sensitive,

Starting point is 00:21:07 performant. All of these are just basic engineering challenges that I've got to deal with day in, day out. But we are on the right path. So it's going really well. That's awesome stuff. So how old is the company? I forgot to ask at the start. I think 17, 18 years old, something like that.

Starting point is 00:21:23 Okay, so yeah. Nice, yeah. Cool. I mean, yeah, let's see how things are going. Good luck with all the sort of features you've got coming out soon. And I guess we might actually touch on this later on, kind of when I ask kind of future trends and directions. You mentioned it a second ago about kind of machine learning and how AI is going to kind of how large language models are fitting into.

Starting point is 00:21:43 So because some of these features you mentioned sound like they could be used with that quite nicely, or dovetail quite nicely with all the advances there. But we can maybe get into that later on in the podcast. Yeah. Cool. So yeah, the next section of the podcast is again having a retrospective, and we can maybe talk about some of the other projects you've worked on across your career, and the ones that you found the most challenging and rewarding. Now, they might not necessarily be the same things, so there's probably two questions, I guess: which ones are the most challenging and which ones are the most rewarding? So yeah, take your pick which ones you want to do first. No, I think they probably overlap.

Starting point is 00:22:26 Yeah, that's good. So I think first, after my PhD, I joined a company called Synopsys. So actually, I was lucky in that I ended up building a new product and/or a new team in every company that I joined. So it was almost like doing startups even in big companies. So that was very rewarding overall. But I would say basically after Synopsys, which is in the chip design software business, then I did Yahoo Web Search.

Starting point is 00:22:56 That was the place I learned lots of distributed systems, big data, machine learning, and all that. Then I moved to eBay. Lots of product building experience. We rewrote eBay's recommendation engine from scratch. But then after that, I joined a startup. It was called Turn, which was in

Starting point is 00:23:16 real-time advertising space. And it was a late-stage startup. Real-time advertising, the way it works is very fast operation, right? So when you go to a webpage, an advertisement on that webpage normally is not there. It has to be found in real-time based on you. Let's say CNN.com, right?

Starting point is 00:23:42 You go over there and CNN says, hey, I need an advertisement. It contacts multiple platforms, almost on a chain, and in the end, everything comes to an ad exchange, which is the market that will determine the pricing of that advertisement and all that. But that exchange does not usually have the ad. It has to ask somebody to give an ad. So somebody has to be on the advertising side. And Turn was one of those companies. They are called demand-side platforms. It's a very high speed operation.

Starting point is 00:24:12 Within a couple of years, we had about 100 petabytes of data. We were getting about over 200 billion events a day processed. Multiple millions of requests per second. And the company, and this had to grow from, you know,

Starting point is 00:24:30 queries per second was, when I joined, it was 50, 60,000. So when I left, it was close to 3 million. So you are growing this fast with, you know, tens of petabytes of data with a very small team, startup, you know, with financial constraints. When I joined, issue after issue, you know, production incidents and all that, dealing with all of these.
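Editor's note: a quick back-of-the-envelope check of the figures quoted above, using only the numbers stated in the conversation. It shows that 200 billion events a day averages out to a little over 2 million events per second, consistent with "multiple millions of requests per second", and that the quoted growth in queries per second is roughly 50 to 60 times. The variable names are illustrative only, and the per-second figure is a daily average, not a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures as quoted in the conversation.
events_per_day = 200_000_000_000
qps_at_start_low, qps_at_start_high = 50_000, 60_000
qps_at_end = 3_000_000

# Average event rate implied by the daily volume (real traffic peaks would be higher).
avg_events_per_second = events_per_day / SECONDS_PER_DAY

# Growth factor in queries per second over the period described.
growth_low = qps_at_end / qps_at_start_high
growth_high = qps_at_end / qps_at_start_low

print(f"average events per second: {avg_events_per_second:,.0f}")
print(f"QPS growth: {growth_low:.0f}x to {growth_high:.0f}x")
```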

Starting point is 00:24:55 And it was also my first CTO job. So I think it was rewarding and challenging at the same time, right? Sometimes the expression is you are driving a race car and it's getting faster and faster, and you are changing the parts of it while it's moving fast. But also you are in the driver's seat. It's not like, you know, you have an excuse. So that was very challenging from that perspective. Unbelievable learning for me,

Starting point is 00:25:19 as well as very rewarding because, first of all, it was a good training ground for me to test what I had learned at Yahoo and eBay and all that. So they came in really handy. At the same time, learning experience over there from leading the team as a CTO and moving so fast with a very small team, hiring these people, right? How do you onboard these? How do you make sure that you deal with incidents in a proper manner? We had to invent many things ourselves in the process. So all of these are combined with that company.

Starting point is 00:25:55 I ended up replicating similar work after that time, but first time, that was the real place that I got challenged, and super rewarding and lots of learning. Awesome. So a follow-on from that question that I like to ask is what you're most proud of. But is that the same? Do those two things correlate? Is that what you're most proud of in your career as well? What I'm most proud of is, I think, probably on the personal side, ending up changing my industry and company multiple times, how fast I was able to learn and, you know, keep my brain open and

Starting point is 00:26:31 come to a level that I can actually contribute. But the biggest thing I'm proud of, I think, is I was able to build great teams and contribute to the careers of so many people. And many of these people, if you go check LinkedIn, are in amazing places. And hopefully I had some role in their success from the time that we worked together. In the end, it's all about people and teams. And that's still rewarding to me. Sometimes I meet with them, right? We talk about those days, reminisce. Yeah, yeah. I don't mean, so that's what, all the

Starting point is 00:27:12 fires you were putting out every day? 10 fires a day. Wow. Yeah, incident after incident. So two things jumped out there. The first one would be that you're proud of, as you've gone through your career, always having this open mind and being able to take on new information and contribute quickly to the new problem you're trying to solve. Was that something you were conscious of and kind of systematically worked at, or is that something that just kind of naturally happened and you became good at? I think probably it came from childhood. I am still a very curious person.

Starting point is 00:27:51 So I guess it helped a lot because being curious and being in this learning mode, somehow keeps your brain open and you are open to absorbing as fast as possible. And the fact that I had to change my company multiple times, moving to completely different industries, meant almost like I was willing to start from scratch. But at the same time, I had to get up to speed super fast. So I had to learn exactly what that industry is about. And I really changed drastically, right?

Starting point is 00:28:28 I was in the chip design software business to web search, to e-commerce, to real-time advertising, right? It's just completely drastic differences. But I think that curiosity, natural curiosity that I had, I guess, ended up helping me to learn fast and contribute. And I still keep it and recommend to everybody around me because that way you don't have, you know,

Starting point is 00:28:53 if you fix your brain or close it, right, I already know this or somehow I am an expert in this one from the beginning, you don't realize sometimes you are missing so much. I always start from a beginner mindset, and curiosity coupled with that, I think, was super, super productive for me. So that's how it came about. Yeah, I really like kind of staying curious and keeping an open mind, because as soon as you shut it off, right,

Starting point is 00:29:19 then you kind of you don't know what you don't know so if you kind of go into it with a kind of closed mind attitude that you think you know it all, then you're not going to learn anything, right? So yeah, stay curious. I guess it's the message. Yeah, I think also there are lots of interesting things to learn about almost everything. So if you just take a step back

Starting point is 00:29:38 and look at even things that look mundane, somehow you go like, what is in it? Or you try to do it the way that everybody does. But if you take a step back and look at from that curious mindset, you sometimes realize, you know, there are different aspects of the problem that you are looking at or the system or whatever. That actually also helped me. If you look at some of my patent applications or patents,

Starting point is 00:30:03 like they're all in different areas. For example, when the pandemic came, everybody was on Zoom. And I realized it has some issues. Actually, I have a patent on improving Zoom. 20 people are joining. How do you actually schedule through these people who should talk first? Maybe somebody is not talking in those meetings. Maybe they should be given priority, right?

Starting point is 00:30:26 So the point is like, why would I? I mean, I was working in Atlassian at the point. It had nothing to do with Zoom. We were just using it. But then if you keep your mind and look at everything with that, you know, almost like childlike perspective and look at how it can be better, that actually helps you, right? You can even innovate things

Starting point is 00:30:46 or invent things along the way. So I would highly recommend it. I love that, Ali. Going around, what are the problems that I have here that could make someone's life better? How can I make this better? And obviously you see everything,

Starting point is 00:30:58 you never know when you're going to have some great idea, right? So that's awesome. And yeah, the next question, kind of off the back of the answer to the previous question, is about team building. I know there are countless books written on this, on the art of team building and how to hire people and all that. But yeah, I guess I want to get what's your secret of building teams

Starting point is 00:31:17 and how you approach that problem and making sure you get the right people to solve the right problem. Yeah. I think it probably starts with realizing that people come first in every company. You know, if your team is not happy, if they are not top performing people, in the end, if it's a public company, for example, your shareholders and customers will also suffer. So it's very important to realize that first. And second thing is genuinely care about people, not only as part of your team, even to the level that in my case, if somebody, let's say, was leaving my team, I was, of course, talking to them to keep them around. But at the same time, I was genuinely interested in finding out where they were planning to go. And can I help them? Whether this company is good for them?

Starting point is 00:32:16 Can I actually, if I have a connection with people over there, that I can help this person to find even a better position? Or maybe even better deal. So I think high level is, if you up level it, it starts with caring about people first and genuinely caring about them. And people will see that one. Now coming back to once you have that fundamental, it comes back to if you are building a team from scratch, it requires what kind of skill sets you need,

Starting point is 00:32:49 where they should be located, how many people you will need, how you should be interviewing them, what does interviewing mean, what kind of data points you should be collecting, how do you make sure that you're going to onboard these people as fast as possible so they can be productive and feel part of the team immediately, hopefully.

Starting point is 00:33:08 How do you keep them engaged going forward? How do you make sure that they feel you care about that and you're not going to say that, but through your actions, you are showing that. How do you make sure that they keep improving with respect to what they do, with respect to their productivity, their careers are getting better. Some of maybe your curiosity or other good things

Starting point is 00:33:30 that if you are doing will be contagious and they're going to be following up in your footsteps. Also, how much you are learning from them, how much you are interested in what they are doing. So in the end, it's almost like a person-to-person relationship, but it happens, of course, in a business setting and there are business goals and all that. But if you run it with that genuine concern, genuine attention to what you are doing and care about people, the rest of the steps, I think, come naturally. And there are multiple other things that you've got to do.

Starting point is 00:34:02 But overall, I think that's how I would describe it. Nice. Yeah, I love that. As you said, the root of it is caring about people, and everything else, not naturally maybe, but it follows from that, right? I guess. Yeah, awesome. Cool. For the next question, I want to dig into your motivation a little bit more and the things that have motivated you across your career. Obviously we're speaking about academic papers and stuff here, so I want to ask if there are any specific papers or research, or people even, that have had an impact on your career. Who they are, what they are. Yeah, I can maybe start with the people. So definitely, I think getting the right advice,

Starting point is 00:34:46 getting the right mentorship at the right point. Sometimes people underestimate, but it is hugely important. So in my own experience, definitely, you know, I come, you know, originally, as I mentioned,

Starting point is 00:35:00 I come from Turkey, you know, very small city, modest backgrounds. We had a math teacher, you know, he believed in us. And, you know, even just a little bit of encouragement and teaching us a couple of tricks helped a lot. I still remember his name. So it was amazing, you know, some getting that kind of attention. And I think after that, definitely, you know, doing undergrad work and grad work, right, advisors along the way.

Starting point is 00:35:26 You notice that basically, I have a theory actually, if you look at many of the famous mathematicians, most of their advisors were also famous. So Riemann's advisor was Gauss. So somehow I think it's almost like they are the masters that you are learning from as an apprentice. Okay, yeah. Yeah, because it's not just like reading the books and learning. There is a certain way of approaching a problem or asking the right questions or how they actually

Starting point is 00:35:57 can make trade-offs or how they can make decisions. So along the way, I think those kinds of people helped me a lot, and I got a little bit from everybody. Definitely those are the people that I would think of at this point that contributed to my growth. The Jedi and the apprentice, right? Yeah, similar sort of thing. You need that kind of learning by osmosis as well. Being around that sort of person, you naturally pick up on a lot of how they approach things, I guess. Yes. Cool. So you mentioned advice there as well. What is the best piece of advice anyone's given you, any of your mentors across the years? I think the best one is, it sounds so mundane

Starting point is 00:36:41 and simple, and it is: just do it. Just start. Yeah, no, it's so true though, right? I mean, you can sit and think about something for ages, "I am going to do this", but well, okay, go and do it then. As soon as you start doing it, that's when the real problems start hitting you in the face, and that's how you're going to start iterating. Yes. Yeah, I think if you look at the millions of books on success and all that, everybody has a different story. But I noticed, and I can claim that I am a student of these things, what are basically the principles or fundamentals that help you to learn, how do you learn to learn,

Starting point is 00:37:20 and I have done lots of my own research around that. So it literally, in the end, just comes down to that simple method. Just start and keep swimming. If it fails, try something else and keep iterating, right? And just keep trying, just keep doing it. As simple as that. Yeah. Like you said, you know, the fact that you start,

Starting point is 00:37:40 you actually see issues as early as possible. And, you know, the fact that you are facing something, it helps you to ask more questions maybe or see the problems early enough. You act on that. And as a result, you learn something new. It's not like, you know, statically, you know,

Starting point is 00:37:57 hitting the same wall or banging your head on the wall all the time. It's just that as you are iterating, it also adapts. You also adapt to what you are seeing. But literally, just start and keep swimming. Yeah, it reminds me, I think I read in some database textbook ages ago, it was about concurrency control, when I was first learning these concepts. I read in the book about two-phase locking, and it was like, now you've read about it, go and implement it. Because until you've actually implemented it and had to think about it and actually done it,

Starting point is 00:38:30 you haven't really maybe properly grasped it all the way. So there's that aspect, I think, as well, just doing something, because then it makes you ask, oh, that's quite inefficient, or I could do this differently, I could do that differently. So yeah, I agree with that principle. Just go and get your boots dirty and get doing it, and then it'll all kind of go. There's actually a good paper from, I think, Google about implementing the Paxos algorithm. Okay. In real life, you know, what

Starting point is 00:38:55 they had to face, and they go like, the algorithm works, it's just that there are so many implementation details that, just like you're saying, unless you implement it, you never know. They're real. Cool. So yeah, I guess related a little bit to hitting these roadblocks when you do something and it doesn't work out is setbacks, and how you've approached those across the years. Setbacks and rejections: what's your approach for that? I think sometimes it's not easy. It can be as simple as a paper rejection, for example, or in real life,

Starting point is 00:39:28 your manager might not like you or get a bad performance review this year. I think, you know, yes, emotions sometimes will come. You might feel angry at that point. But I have seen over and over again that many times, let's say I interviewed with a company and I did not get accepted. And later I realized

Starting point is 00:39:48 that that company actually failed. If I had joined, I would have spent, I don't know, this many years, and I might actually have suffered because of that. There was no benefit that I was going to get in the end.

Starting point is 00:39:59 Or actually you later learned that the company had a very bad culture. So I think there's probably a reason for some of these things. Yes. Emotions will be coming, but try to keep it short maybe. And sometimes there's a wisdom in these things and just definitely learn from it,

Starting point is 00:40:15 right? There must be some actions that you got to take. You cannot just keep blaming the other side. You know, if I get a bad performance review, "my manager doesn't know me" or whatever, but there must be a reason that you go like, well, why am I even surprised with that?

Starting point is 00:40:28 Maybe along the way, we did not discuss these things. Or maybe I was hearing these, but I did not act on these. So maybe I was not hearing or I was not listening. I think it's a good idea to learn from these and act accordingly to get better. But yeah, sometimes in the short term, you hear something and you go like, oh. You know, it's human nature, right? Your instant reaction is always kind of irrational; you react emotionally at first to these things, I guess. And then maybe after you've cooled off a little bit, you think, okay, maybe they were kind of right and I can improve and do this better. Yeah. Yeah, especially if it's a paper, I

Starting point is 00:41:05 mean, you have published a lot too, so sometimes you get the review comments and they rejected your paper, and you go like, "You don't know, you didn't even get the paper, that's not true." But everybody, yeah, there's no way you can respond to it. But there are other places you can submit, right? So these days I don't publish anything in regular conferences. I just send it to arXiv, and if people like it, people will like it. Yeah, that's awesome. Cool. So we mentioned this earlier on, I think, this principle of curiosity, right? And the next one, this is actually my favorite question of all the questions I ever ask anyone, is about the creative process and how you approach idea generation, and if there's a systematic way of doing that. And obviously we've mentioned this curiosity sort of principle

Starting point is 00:41:49 and but then once you kind of have that then selecting what to work on or how to narrow your focus to not kind of bounce around off a million different projects and not actually ever making any tangible progress with any of them so yeah I kind of want to get what your approach is to being creative. Yeah. Like you said, I think it starts again from showing genuine interest in the problem that you're

Starting point is 00:42:15 facing or you're reading a paper. I think the attitude of I already know this is probably not good. The other thing is uh maybe the other attitude that oh it's just there's no way i'm gonna be interested in this or it's not even in my area why should i even worry about that i think all of these literally you are limiting your potential in my opinion so one is just keeping that again curiosity hey seems like

Starting point is 00:42:40 something interesting over here. You know, it's not like I'm just going to use this for my work, but it sounds like an interesting problem, let me just look into that. So I think it just starts from there, and keeping your brain open. Just like you said, I think it's a good idea to get your hands dirty a little bit. If it's a math problem, I don't know, you can try to solve it yourself, or if it's a result that you want to derive yourself, or if it is some programming task or something, algorithm or whatever, I think just start implementing that, see how it goes. There are different ways of doing that.

Starting point is 00:43:12 You can definitely make progress and learn. And along the way, you wouldn't know what's going to come out. Definitely write things. I have noticed that people sometimes keep avoiding writing. If you write what you understand from a problem, writing, in my opinion, is also thinking. It literally opens up. Because the fact that, and you have seen this when you write paper, writing an introduction or abstract is so difficult because it just tells you that probably you have not completely simplified the

Starting point is 00:43:44 problem to a way that you can actually explain to somebody else, right? It also shows probably gaps in your own understanding. So writing is another one that I use. I try to make my assumptions explicit, even if they are sometimes trivial. Even in meetings, when I talk to people, I actually say it like that too. Hey, I'm going to make my assumptions explicit. Sometimes people say, oh, it's like, this is trivial. Why would you even say that? But you don't realize that after some time, even though simple assumptions might not be shared, that actually opens up the discussion and more people can be contributing to that. I think there are lots of techniques of thinking well. So if you look at, for example, in math,

Starting point is 00:44:26 I think from, there's a famous mathematician, Jacobi, I forgot how they pronounce his name, but he has two really good suggestions. One is generalize, right? Moving up an abstraction level sometimes actually simplifies the problem.

Starting point is 00:44:43 This is also one of the techniques that, if you look at George Pólya, he has a book about how to approach problems. It's more about math, but you realize some of those are applicable to generic problem solving too. And the other thing is invert. Sometimes you look at a problem from one perspective, let's say you go forward from A to B,

Starting point is 00:45:02 but if you invert it, sometimes you see the problem from the other side, and that actually could be a new way of understanding. And also you might actually invent something new, and in my experience it has happened a couple of times. Even though it sounds simple, but there are some, I think, uber

Starting point is 00:45:19 techniques with respect to how to explore something. There are lots of heuristics that you can use to, for example, summarize something really fast. What are the main things that I should be looking at? Or if you're reading a paper, even the simple techniques like, you know, how do you read a paper, right? You know, you look at, let's say, a plot in that paper, you know, even just paying attention

Starting point is 00:45:44 to the X and Y axes. Can I actually reason very quickly? Should I just start reading from beginning to the end, or scan it first, look at a couple of important things, maybe abstract and conclusion or whatever, then deep dive into that? And for each sentence over there, sometimes you got to pay attention. At a high level, I would say, you know, curiosity, keeping your brain open. But there are a couple of techniques to think, heuristics or fundamentals or principles. It's a good idea to learn. And you notice that many people actually have been using those.

Starting point is 00:46:19 Some of these are not verbalized or all that, but there are places that we can get some of these. And they, at least in my case, were super useful, and I am conscious of those. Sometimes I explicitly use them to help me. Awesome. Yeah, just going back to "writing is thinking", which you mentioned in the other answer: my supervisor for my PhD said that exact same thing to me. Because obviously, on the math versus English sort of spectrum, I've always leaned more towards numbers than words, shall we say, hence probably why I've ended up down the career path I have chosen. But I always kind of tried to not write, because I just

Starting point is 00:47:00 didn't like it. And then when I was doing my PhD, it was kind of like, you need to do this and iterate, write more and more, because the first thing you write is going to be terrible, but that just shows that your thinking isn't fully crystallized yet. And writing is a way of communicating your ideas, right? So you need to keep iterating, because that helps you explain and understand the problem better and makes you think it through more. So yeah, writing definitely is thinking. It forces you to try and put it into words. You think you know it? Try and put it into words. I bet you don't most of the time, right? So I really, really like that, yeah.

Starting point is 00:47:30 That's true, especially if you have also like formulas or whatever, even like symbols that you are using, right? So it looks too complicated. You got to iterate on that. But going back to the earlier principle that we were discussing, Jack: writing, sometimes you have writer's block, right? It just doesn't feel like you should actually go ahead and write.

Starting point is 00:47:46 I think it comes back to the other recommendation that we were discussing. Just start, even if it's just one sentence, and keep random stuff. And after some time, after some iterations, like you said, you're going to reach a far better state. And actually, it's going to clarify. It's a two-way process.

Starting point is 00:48:02 The fact that you're writing, you're actually making progress, but at the same time, it helps you to clarify your thinking and what you're trying to express and actually improving your thinking. So it's super helpful with that. Yeah, and on the sort of making the assumptions explicit up front

Starting point is 00:48:18 and communicating those kind of helps everyone get on the same page, right? Because we did this exercise once at work where you were back to back, and you had to basically describe a house to somebody, and then they would draw it. And then, based off your description, you'd be like, well, that's not the house I described. But it's those sort of, what's the word I'm looking for, implicit assumptions you're making that could

Starting point is 00:48:41 be completely different to the other person and then you end up just talking past each other almost. Right. So making them clear and getting everyone on the same page, I guess, makes for kind of a better environment to be creative because everyone's sort of on the same page. But yeah, and I'm going to go check out that book, How to Solve It, as well. That sounds very interesting. So, yeah. One more to the reading list. The monotonically increasing reading list. Right. That's the other thing. Yeah. So I can recommend a couple of other books related to that, Jack. One is Richard Hamming. I don't know, you know Hamming distance? That Hamming, that person. Yes, yeah, yeah. He had a couple of really insightful books. I would recommend the art of, I think the main name was the, um,

Starting point is 00:49:21 the art of science and engineering or something like that. I forgot the exact title, but really good book. There is this professor in, I think, Germany, Gigerenzer, I think his last name is. He's talking about heuristics, how to, you know,

Starting point is 00:49:38 be smart about things and, you know, what are the heuristics people are using, for example, to make decisions, to trade off analysis and all that. There are a couple of, you know, it's not super common, but there are books like that.

Starting point is 00:49:48 There's also one I would recommend, the person's name is Rechtin. His last name is Rechtin. Eberhardt Rechtin, R-E-C-H-T-I-N. He has a book about systems architecting. So this is systems architecting. Normally, you know, people are using this term when they build, let's say, space systems or planes or ships and all that, like super, super complex systems. And software is just one part of it. And there are lots of learnings from those books.

Starting point is 00:50:18 And you can also see in those books, in his books, for example, how do you approach a complex system? How do you build it? What are the heuristics you are using? How do you partition the system? What are the criteria you are looking for? And many of these are not well known, to be honest, in the software architecting world.

Starting point is 00:50:39 Normally, definitely, you look at enterprise architecture, software architecture. There are lots of books on that. But I was lucky: somehow I hit upon these books and I got to learn a different aspect of how they approach building big, literally huge systems.

Starting point is 00:50:57 And they also have lots of good heuristics about, again, approaching, not only systems building, but also how to think about solving a very, very complex problem. Nice, yeah. There's two more there as well. I'll put links to them in the show notes and also the listener can go

Starting point is 00:51:13 and find them as well. Yeah, that'd be awesome. Yeah, I appreciate that. Thank you. I've sketched them down in my notes, but my handwriting when I'm writing notes is sometimes kind of unreadable by the end of it. But yeah. Cool. Awesome stuff, Ali. So the next question is what you think about the interaction between academia and industry. I'm guessing you've touched both camps over the years, but primarily from the industry perspective. So yeah, I'd like to get what your take is on that interaction and how it can

Starting point is 00:51:48 be improved. Yeah. So, uh, since like yourself, I did PhD definitely, um, you know, spend lots of years in academia. Uh, I actually wanted to become a professor right away, but somehow I chose a middle path and I joined a research organization within the first company that I worked, which was Synopsys. They had a group called Advanced Technology Group. It was mostly researchers in that specific domain, but at the same time, it was almost like doing academic work in industry.

Starting point is 00:52:24 So you see this in especially big companies. I think no successful company can definitely avoid doing work at the forefront of research. So now who should be doing it? So maybe the typical understanding is that kind of research, open-ended thinking comes from academia and industry just implements these things. But just like we discussed earlier, right, sometimes doing the real work creates more problems for you to solve. And as a result, how are you going to now do that? Are you going to route these problems back to academia to solve or are you going to do it internally? So in a way, this work

Starting point is 00:53:06 has been blended for a while. I think definitely they both need each other. So industry, I'm hoping that will support academia more and more with respect to maybe introducing them to problems that they are facing. Hopefully, financially supporting

Starting point is 00:53:22 the universities and researchers. Definitely through internships, summer sabbaticals or things like that. I see more and more people who are doing, you know, let's say graduate work, or professors, are spending some time in industry. So that blending, I think, is working out pretty well. The other direction is probably not that productive at this point. You don't see many, let's say, people from industry going and

Starting point is 00:53:49 spending, I don't know, one year in academia, just to bring that perspective, the real-world perspective to students and professors, people around them. I think probably there should be a way of maybe doing that too. There's also, in academia,

Starting point is 00:54:05 as you know, publication pressure, right? All that. So industry, you know, you got to find grants and financial support for your students and yourself. I think overall,

Starting point is 00:54:16 basically I would say these are just the two legs of the same unit. Definitely, there has to be a really good working relationship between these two. And interaction should be both ways. It should not be that academia just raises people and we're just going to hire them away.

Starting point is 00:54:34 I know the financial situation in the industry probably is far better, but if you keep hiring all the good professors, very soon you're going to run into issues not having good students, because who is going to educate them? So I think there are lots of people who want to do academic work. So I think the relationship is reasonable, but it could be far better in my opinion, along the lines that I

Starting point is 00:54:56 suggest. Yeah, I mean, personally, one of the reasons why I veered more towards industry after my PhD was what you mentioned there: the incentives in academia are very different, in the sense that it's grants, it's people optimizing their h-index, that sort of attitude, the publication pressure, and it's often restraining. I don't know, it didn't really align with what I wanted to do. And obviously, well, I mean, the financial situation is a lot better in industry as well.

Starting point is 00:55:29 So there's that aspect to it as well. But yeah, I definitely agree. And it would be nice to sort of have that kind of having people from industry kind of going to academia for a year and take those real world problems, because that's another thing that I think sometimes missing from academia is like, it's a lot more rewarding when you're working on projects that have an actual you can see who it's going to affect or how it's going to affect a product or a group of

Starting point is 00:55:52 people, right? So having that kind of scoped initially is a lot nicer as well. So if you have somebody come from industry into academia for a year, that would be nice as well. So yeah, that's a nice, interesting idea. There are lots of real problems, so I think it's going to be a good opportunity also for the people in academia to publish more papers. Yeah, right, exactly, give the h-index a bump, right? So yeah, awesome. Cool. So on to the last question now, Ali. It's about the future and current trends. What are the most exciting things you're observing at the moment, and what do you see as promising directions for the future of this space?

Starting point is 00:56:35 I think the biggest thing, what we are seeing right now, what is all the buzz about? I think AI, and GenAI especially. If you look at my lifetime, it's like the first time when we got the web, the World Wide Web: the fact that you could try it yourself and you could see the benefit right away, it was useful with only you in the picture.

Starting point is 00:57:00 So now, AI, as you know, we have been using for a while. When I was doing advertising or Yahoo Web Search, search results ranking was using machine learning. This was, whatever, almost 15, 20 years ago. So it's not like AI was not in the industry, and many things were automated at huge scale. Most of the advertising was using machine learning

Starting point is 00:57:22 for sending you the best advertisements. There's no way a human can do it millions of times per second. But right now, I think, especially with Gen AI or the level that AI has reached, you can see that this becomes now useful to ordinary people. Literally, you can go ahead and use it yourself and send it a big article, right? It's going to summarize it for you, and most of the time you go like, well, actually, pretty good job. Or it writes a song for you or creates an image, right? In the past, these were very difficult problems. I think it's just amazing how far it has reached. But more and more, I think the question is how much of that can be used in industry, definitely.

Starting point is 00:58:06 And you can see people have, companies have jumped at it already because the benefits are obvious. But there are lots of questions around that, right? For example, where are the places that now you have to use humans for, right? It's a good idea. I think it's impossible to eliminate humans completely from this picture, but what's the best place to take advantage of their input across that, across that, let's say a flow of things that the company has to do, you know,

Starting point is 00:58:33 whatever it generates, for example, or you are producing results, how are we going to verify that? Right. So if I, let's say I'm summarizing a three hour conversation between us, even this podcast,

Starting point is 00:58:46 how am I going to know that the summary was correct? Should I go back and listen to the hour of podcast myself and say, oh yeah, I would have produced that summary? So some of these are difficult problems and have not been solved yet. But I think more and more, we have to definitely find solutions to verifying the results of all this AI deployment,

Starting point is 00:59:09 especially Gen AI deployment. There will be lots of machine-generated data, definitely. I mean, you already know there's some work on the security aspect of it, whether these things are fake or not, or real or not. But I think these will be unavoidable. So you have to pay attention to that and figure out ways of, again, detecting all of these. More and more, you're going to see, I think, AI used in defense and offense, right, in military applications and all that. Again, where are the boundaries? What's the ethical

Starting point is 00:59:41 usage but beyond what's the best usage for the, again, benefit of humanity? There is surveillance aspect of that. You can automate everything now. Again, how do you solve implications of that? How do you make sure that basically this is used for the benefit of everybody, and it's part of the productivity that we need? I don't think to the level of you say you should ban it, right,

Starting point is 01:00:04 because the benefit is so huge that it's going to be unlikely that people will be doing something like that. But at the same time, how can we make it part of our lives and find a really good place for all of us to actually contribute to the overall good? So I think research around all of these, probably many of them are already active. And so finding solutions for these probably is super important

Starting point is 01:00:30 because we ourselves need solutions like that for the products that we are deploying very soon for our customers. Yeah, the one that sort of jumps out to me there is the verification aspect of it. And obviously, it's well established that Gen AI and LLMs have this sort of problem of hallucinating, right, and just throwing out some garbage and being like, yeah, this is the truth. Like, I know for academic

Starting point is 01:00:54 papers, it's really good at generating some citations of papers that sound really, really good, but, like, they just don't exist. The authors don't exist. So yeah, it kind of comes back to that provenance problem of, like, why it's saying what it is, and actually supplying a ground truth of, like, yeah, I'm saying X because of Y, right? It's something I think that's not been solved yet, and it's really important. It's becoming... there are some, you know, reasonable solutions, almost closer to the way that you write papers, right? So you write a paper, you give references to other work. So you see more and more now many of the, what has

Starting point is 01:01:30 been generated is referenced. But at the same time, are you going to go, let's say there are 100 references, are you going to go verify all of these? Oh, yeah. If you use another LLM to do that one, right? So I think, yeah, related to that,

Starting point is 01:01:46 I will also, Jack, mention that I think understanding definitely more and more why they are so good is going to help us to do things better, I think, you know, as humans, right? Maybe we are going to say, you know what,

Starting point is 01:01:55 maybe all the things that we were saying that humans are really good at, actually, maybe it's as simple as that. It's just that we did not know the solution before, but using the fact that we now found an automated solution and found a way to explain that reasoning, maybe the secret sauce was simple. Yeah, it will reveal all, right, when we pull back

Starting point is 01:02:17 the curtains. Yeah, yeah, that's awesome stuff. That's brilliant. Well, I think we can end things there. It's been an absolutely fascinating chat. Thank you very much for taking the time to speak with me today. It's been awesome, and I'm sure the listener will really have enjoyed it as well. So thank you very much. Thanks so much. Great stuff. What an opportunity. Yeah, hopefully it's going to be useful. So thank you. I'm sure it will be. Great stuff. And yeah, we'll see you all next time for some more awesome computer science research.
