Disseminate: The Computer Science Research Podcast - High Impact in Databases with... Ali Dasdan
Episode Date: October 8, 2024
In this High Impact episode we talk to Ali Dasdan, CTO at ZoomInfo. Tune in to hear Ali's story and learn about some of his most impactful work, such as his work on "Map-Reduce-Merge".
The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
Materials mentioned on this episode:
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters (SIGMOD'07)
The Art of Doing Science and Engineering: Learning to Learn, Richard Hamming
How to Solve It, George Polya
Systems Architecting: Creating & Building Complex Systems, Eberhardt Rechtin
You can find Ali on Twitter and LinkedIn.
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. Jack here as usual.
The podcast is brought to you by Pometry. Pometry are the developers behind Raphtory,
the open source temporal graph analytics engine for Python and Rust. Raphtory supports time
traveling, multi-layer modeling, and comes out of the box with advanced analytics like community
evolution, dynamic scoring, and temporal motif mining. It is blazingly fast, scales to hundreds of millions of edges on your laptop,
and connects directly to all your data science tooling, including Pandas, PyG, and LangChain.
Go check out what the Pometry guys are doing at www.raphtory.com where you can dive into their
tutorial for the new 0.80 release. Awesome. On to today's episode.
So we have another installment of our high-impact series.
For new listeners to the show, this series or this type of episode
is motivated by a blog post from Ryan Marcus
about the most influential papers in databases.
And today, amongst other things, we're going to be chatting
about Map-Reduce-Merge with Ali Dasdan.
This was the number one paper for 2007 and actually was the number eight overall in all database papers, Ali.
So, yeah, very influential work. So some more info on Ali.
Ali is the CTO at ZoomInfo. He's worked at numerous large companies over his career, focusing on big data systems and science work, and he's
built a variety of large-scale data platforms across his career, which I'm sure we'll touch
on at various points during the podcast. So welcome to the show, Ali.
Thank you, Jack, thanks for inviting me. It's a pleasure.
So it's always customary on my podcast to start off
with your story in your own words. So yeah, tell us about your journey and your career so far.
Yeah, thanks for that. So I was born in Turkey and
finished my undergrad over there in computer science. I came to the US in '94 for my PhD work
at the University of Illinois Urbana-Champaign. It was on timing analysis of embedded real-time systems.
So there was some systems work even in the PhD,
lots of timing aspects to that,
including concurrency and all that.
So good learning from there.
It was mostly algorithmic and graph theory work.
After that, I worked at 10 other companies
before ending up at ZoomInfo.
I worked in different industries,
big companies, as well as small companies.
I also worked in the UK for two years at Tesco.
Actually, I built a data platform.
Yeah, that's the journey so far.
Lots of learning, lots of failures.
I hope it's along the way.
Yeah, yeah. So the Tesco
stuff, obviously, that would have been from the UK, that kind of jumps out
immediately. So was that all to do with their Clubcard stuff and all that sort
of analysis?
More than that, overall. Because of family reasons, I had to go to
Europe. So I joined Tesco to build and run their data
and data platform and marketing
automation team.
So Tesco was doing data
mining, data platform, data
science using a Teradata database
at the time when I joined.
So we brought them into the 21st century.
We built a 24 petabyte data platform.
You know, Spark and Hadoop and Kafka and, you know, anything that you can imagine, including Teradata.
So it was a good component of that.
So completely revamped the way that Tesco was doing data.
You know, for example, they were doing demand forecasting at a certain interval. It's confidential information, but we were able to bring it to a level where they can do it many, many times a day. So it was, I think,
revolutionary for Tesco, and I hear good things about this, so they are still taking advantage.
Awesome stuff. Cool. So you mentioned it there briefly, you mentioned MapReduce, right? And that's kind
of how I reached out to you originally, to talk about the Map-Reduce-Merge paper.
And obviously that was, what, 2007 now?
So how many years ago, if I can do a quick math?
17 years ago.
So let's cast our minds back 17 years
and tell us about Map-Reduce-Merge.
And so I guess, yeah, what is it?
The elevator pitch to start off with,
and some context for the work, I guess.
Yeah, yeah, yeah.
I think let me give the context first
before talking about
the paper. So I joined Yahoo Web Search in 2006, and my charter was to start the internal data
team for Yahoo Web Search. So Yahoo definitely had their own data platform teams and all that, but there was no specific team to do that for web search.
Web search has crawling systems,
there are indexers and web crawlers and an internal graph.
There was a system called WebMap,
which was an internal copy of the entire World Wide Web.
And each of these were running on
thousands of servers in the Yahoo data centers
with petabytes of data. And for each one of these,
if you up-level it, basically the context is, right, you are
crawling the web, you are trying to serve your users, you want the content to be
comprehensive, to be fresh, to be relevant, to be diverse,
to be basically overall good.
But what does it mean?
It's easy to say in English,
but how do you measure these things?
What are the metrics you're going to use to do that?
How do you do all of these at scale
with four petabytes of data, right?
For example, we were building a graph in 2006-7
which had
200 billion nodes and 1 trillion
edges.
And then
we were building it, I think, every week.
Now, how do you know the graph
that you built one week ago
did not
mess up with the new version?
What does it mean for this graph to be still okay,
right? So think of all of these. So the context was to create a team like that to do all of that:
to create the metrics, do the necessary data analysis, do all the monitoring,
and build the monitoring systems for all of it, as well as some algorithmic work to improve crawling and all that.
So my team was tasked with that.
I joined first myself,
and there was another person,
Hung-chih, who was another author of the paper.
And yeah, that was the task.
We were doing data analysis day in, day out.
And it so happened that almost,
I think the same week,
the creator of Hadoop,
Doug Cutting, joined Yahoo.
We were actually in the same onboarding meeting for Yahoo.
Yeah, and then right away, Yahoo started building Hadoop, right?
They had their own internal systems from after the Google papers were published.
Then they hired Doug Cutting.
They decided actually to continue with Hadoop instead.
There was no need for the internal ones.
So there was a team that was created for that.
And that team happened to be right next to my team.
And we started using Hadoop from the beginning.
And it was painful, right?
API changes almost every time.
It was failing from 20 nodes to some hundred nodes.
It was going all the way like that.
And because of the huge amount of data that we were analyzing,
sometimes with custom code later using Hadoop,
doing joins naturally came, right?
You got to do joins because it's so fundamental.
And I myself had a little bit of database background.
Hung-chih had a stronger background in databases.
So we came up, actually, with different ideas.
He came up with the merge idea. I came up with a different idea.
We actually have three patent applications on that,
which all became patents and were donated to Hadoop,
sorry, to Apache, for open sourcing Hadoop.
Anyway, that was the idea, right?
Hey, this map reduce work on Hadoop,
it was painful in the beginning, but it works pretty well.
Why don't we actually extend the infrastructure framework
in such a way that we can actually do database processing on Hadoop?
So that was the initial idea,
which was a natural extension of what
we were doing. And if you notice, basically, even the paper
name reflects that.
The original MapReduce paper's title says
"Simplified Data Processing on Large
Clusters". Ours is "Simplified
Relational Data Processing on Large Clusters".
And
the funny thing, Jack, is basically Yahoo
had a similar system, similar
to MapReduce.
It was called Dreadnought, for example, for building the web graph.
Actually, porting that application, building of that web graph at that huge scale, like one trillion or so edges to Hadoop, was one of the first applications that actually helped scale Hadoop.
But anyway, let me not sidetrack over there.
So the idea was literally do joins on this.
Right away, we actually jumped at it.
We actually even thought about, hey, this could be a company.
We should actually quit Yahoo.
We should have done that maybe.
We did the work very quickly in 2006, actually.
In 2006, I had the third author of the paper, Ruey-Lung, as an intern from UCLA.
So the three of us basically got this quickly written up as a paper and submitted to SIGMOD.
And the existing work that we were doing was already making some progress on Hadoop.
Actually, in summer 2006, we'd released the first production application on Hadoop,
which was analyzing crawler logs.
I have a LinkedIn article on that, actually,
if you look at my profile.
So, anyway, long story short,
we were very early users of that.
We were doing data analysis.
Joins were natural.
And then we realized that,
yes, with MapReduce, you can do joins,
but it's a little bit, you know,
pulling your ear by holding it from the other side, and there could be a more natural way of doing that.
And the paper came as a result of that. And if you look at the paper, you can also see that
using, you know, this extra step, you can structure and support all relational algebra
operators, so that you can do database processing on Hadoop completely,
if you had a merge step.
So that was the context.
That was the reason we came up with that.
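As a rough illustration of the idea just described, here is a minimal in-memory sketch: two independent MapReduce passes, followed by an extra merge step that joins their outputs on matching keys. This is not the paper's API or Hadoop code. The function names (`map_reduce`, `merge`) and the sample data are hypothetical, and the merge shown is only a simple inner equi-join.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """One MapReduce pass: map records to (key, value) pairs,
    group the values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def merge(left, right, merger):
    """Extra merge step: combine two reduced outputs on matching keys.
    Only keys present on both sides are emitted (an inner equi-join)."""
    return [merger(key, left[key], right[key])
            for key in sorted(left.keys() & right.keys())]

# Hypothetical input datasets.
employees = [("alice", "search"), ("bob", "search"), ("carol", "ads")]
bonuses = [("search", 100), ("ads", 50), ("search", 25), ("maps", 10)]

# Lineage 1: head count per department.
headcount = map_reduce(employees,
                       lambda r: [(r[1], 1)],
                       lambda key, values: sum(values))

# Lineage 2: total bonus per department.
bonus_total = map_reduce(bonuses,
                         lambda r: [(r[0], r[1])],
                         lambda key, values: sum(values))

# Merge: join the two reduced outputs on the department key.
joined = merge(headcount, bonus_total,
               lambda dept, count, total: (dept, count, total))
print(joined)
```

With plain MapReduce the same join would typically need both datasets tagged and pushed through a single map and reduce, which is the "holding your ear from the other side" awkwardness mentioned above.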
And yeah, so that's how we ended up publishing it.
And we were so happy that it was published at SIGMOD.
There was some interest at that time.
We even went to Baidu
because the conference was in Beijing.
We went to Baidu headquarters.
We actually presented to them.
So it was good interest at that time.
So we're happy that we got it.
Yeah, it sounds a real sort of
kind of serendipitous almost.
The fact that you, like,
kind of that Hadoop was happening
at the same time,
kind of as you joined.
In the first week, I mean,
like, I mean, what are the odds of that, right?
I mean, really, it's a sliding door
sort of moment.
Yeah.
That's fascinating. And obviously, we can talk about the impact this work has
had, but I guess from day one, right, it was having real-world impact, in that while the
paper was being written, it was already being used in a production system, right? And then obviously
there was the interest from companies from day one. But yeah, I guess, what is your opinion on
the impact of the paper?
Yeah, I think on the impact, probably we
ourselves were so busy with existing work that we did not continue literally, for example,
expanding Hadoop MapReduce with a merge step.
Even though we were working with the Hadoop team,
you know, the focus was mostly on, initially,
getting the basic MapReduce scalability issues sorted
with the, you know, Google file system,
sorry, Hadoop file system, or the NameNode, right?
You know, that was the main focus.
You know, the focus was not,
let's add another step to extend MapReduce.
Some more fuel on the fire, right?
Yeah, it's already complicated.
Trying to get this thing stable.
Let's go and do this other complicated thing as well, right?
Exactly. I think the other
thing is, many people may not
know this, but Hadoop team
was trying
really hard to get early users.
Later, Hadoop became so famous, but
in the beginning, people were a little bit skeptical.
Like, why should I actually give up
my existing infrastructure to use Hadoop? So since I started around the same time, and we were tasked with
all data analysis for web search, it was natural for us to jump at Hadoop because there was no
legacy that we had to rely on. But many people were suffering for that. And that was actually
the reason that we were almost the first users of Hadoop. So that was the original approach to that.
And since we did not take the step to start a company on that,
we just basically published the paper.
Paper got lots of citations.
I think people were citing it because it was an early work.
And it was already donated to, I think, as far as I know, to Apache, the patents.
And we did not follow up.
Actually, we did not go to the database conferences
to champion our paper and saying,
we should get the best paper award or anything like that.
So there was no, actually, to be honest,
we forgot about that.
So that's the impact.
So I'm happy that it did get cited a lot,
because it's probably the earliest published
work on doing database processing
using Hadoop.
And there are, whether it's any of the big companies, if you notice, they have done database processing on Hadoop-like systems.
So probably they're already touching upon the patented work over there.
But since it's part of Apache, it was part of that.
So no issues around that.
So we actually moved on to different companies,
different works, so we did not follow up,
but that was the impact from our side.
But I'm happy to see that people still cite it and use it,
and hopefully there are some good ideas.
One day, people will implement some of these.
Yeah, yeah. Well, good ideas are timeless, right? So
yeah, I'm sure they will. Cool. So you mentioned it there, you all kind of moved on
from Yahoo and moved on to new companies and new problems. So that leads nicely into
what you're working on at the moment at ZoomInfo. So yeah, tell us, what's the current
problem you're working on?
Yes. So I am leading engineering at ZoomInfo.
So ZoomInfo is in the business-to-business B2B space.
It's not super well-known because, you know,
you don't have millions and millions of users from that perspective,
but it's the leading company in B2B.
So we have a platform that enables that: basically, every company has to sell, and we make
better sellers out of existing sellers. So in a way, basically
for you to retain your current customers
or find new customers, you need
to figure out who's on the market for the product that you are selling.
How do you know they're actually in the market?
How eager they are?
What kind of things are happening with that company
that you might be getting as a potential customer?
If it's a startup, for example, whether they got funding
or whether they posted that job requirement saying,
I am looking for somebody who's going to manage XYZ.
And you happen to be the company that is selling XYZ.
And you go like, well, I know all of that.
And if you find out about this company, let's say you find out that they are interested
in your product, how do you know whom to reach?
What titles?
What's the contact number, right?
How can you send an email?
What kind of response you are getting, all of that. So the idea with this one is
if you just simplify it, it's almost like a
contact company database. But if you go beyond that, what's called go-to-market,
the whole idea is to make sure that your go-to-market platform
works well for you to find, acquire, and grow your current
business, find new customers and all that.
So that's what we are doing.
And yeah, that's how I would describe the work.
Awesome.
We can get into the kind of details and what it looks like on the system level,
but I've just had a quick question on,
is this primarily if I'm selling, I go on the platform,
and I guess it's pulling in data from loads of different sources
and saying, okay, yeah, this company's just been acquired
or has some funding or they post these job adverts.
Is there also a platform from the other side of like,
I can go on there if I want to get sold something
or like I'm looking for something.
Can I, as the second B in the B2B, I guess,
can I also go on there and say like,
I want this sort of stuff and who's going to sell it to me as well?
Is it working both directions?
You can do searches
related to that, with respect to industry,
technology, and all that,
but it's not intended
for that purpose, but that's a good idea.
Maybe we should extend it to that direction.
But related to that,
on the seller side, Jack,
one thing is basically, one is you can go to the platform.
You can do a search.
You can find the customers you care about.
Again, existing customers as well as new customers.
But mostly people would be using it for what's called prospecting, you know, to find new customers.
But also, we are actually releasing this thing called Copilot with lots of AI capability, AI-driven capability,
so that you don't have to go to the platform all the time. Actually, we will send you what you
should be paying attention to, what we call signals. We're going to say, these are the
accounts or companies that you care about or industries. Hey, this is what's happening,
and here are the recommended actions for you to act on it. Or you want to catch up with some company data.
Let's say you have been having lots of conversations with them,
one year of data, customer calls, videos,
communication back and forth.
How are you going to catch up with that?
Let's say you are the new sales guy assigned to it.
How do you catch up really fast?
So we will help you literally with the summary of what's going on, so
you can actually act on that one. Actually, this is going to be released
in May, so we are actually extending
it towards that direction, not exactly
find customers to
buy from, but at least you don't
have to come to us all the time. We will come to you
and make your life easier.
Yeah, that's great. Kind of like onboarding
people quicker and giving
that sort of digest.
This is all things that have happened.
So yeah, it leads up to that.
Yeah, cool.
So what are the engineering challenges
in enabling this vision then?
So kind of what are you rubbing up against
in performance?
Obviously, kind of what you can disclose
but obviously not sort of giving away
any sort of...
Yeah.
So it's lots of data, as you would imagine.
And this is business data and people rely on it.
So it has to be accurate.
It has to be comprehensive.
So coverage has to be super good, as well as accuracy.
Because, you know, let's say I'm going to reach out to somebody.
I want to make sure that phone number is correct,
or I am going to contact somebody.
This company is really in need of my product rather than, you know, I'm just sending a spam email or anything like that.
So, you know, accuracy and coverage is key.
So, we deal with lots of data.
The other thing, actually closer to your background, Jack, is that when you have data,
let's say you have company data
from multiple sources,
there is no unique key
that can enable you to join the data.
So you have to run
what's called entity resolution.
You have to figure out exactly,
you know, this company in my database
is the same company
in the other database
that I'm bringing.
So this is one of the challenges.
Lots of algorithmic work over there,
some rule-based, some AI-driven, machine learning-driven.
Sometimes even humans might actually be resolving some conflicts if there are any.
Normally, it's not that many.
So that's one of the challenges.
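To make the entity resolution problem concrete, here is a minimal rule-based sketch, assuming two company records with no shared key. It is an illustration only and not ZoomInfo's method: the record fields (`name`, `domain`), the suffix list, and the 0.85 threshold are all hypothetical choices, and the fuzzy score uses Python's standard `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited",
                  "corp", "corporation", "co"}

def normalize(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def same_company(a, b, threshold=0.85):
    """Decide whether two records likely describe the same company."""
    # Rule 1: an identical, non-empty web domain is treated as a match.
    if a.get("domain") and a.get("domain") == b.get("domain"):
        return True
    # Rule 2: otherwise fall back to fuzzy similarity of normalized names.
    score = SequenceMatcher(None, normalize(a["name"]),
                            normalize(b["name"])).ratio()
    return score >= threshold

# Hypothetical records from two different sources.
ours = {"name": "Acme Widgets, Inc.", "domain": "acme.example"}
theirs = {"name": "ACME Widgets Incorporated", "domain": None}
other = {"name": "Apex Gadgets LLC", "domain": "apex.example"}

print(same_company(ours, theirs))  # names normalize to the same string
print(same_company(ours, other))   # different company
```

In practice the borderline scores are where learned models or human review, both mentioned above, would come in.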
Definitely running this at the scale
that we are operating at
and doing all the tasks that the seller has to do.
So there is the data cleaning side, combining data into a unified master, let's say, database
for you to act on.
Well, you found these companies you care about.
Now you've got to send an email, or you're going to talk to them.
We have a product that can record those conversations and extract sentiment and lots of analytics, and acting on top of that, right?
How do you actually manage all of these?
If you want to reach out to these people through advertisements, we also have a product where you can actually do that.
So all of these different products are running on the same platform, and making this platform reliable, secure,
privacy-sensitive, and performant, right?
All of these are just basic engineering challenges that I've got to deal with day in, day out.
But we are on the right path.
So it's going
really well.
That's awesome stuff. So how old is the company?
I forgot to ask at the start.
I think 17, 18 years old, something like that.
Okay, so yeah.
Nice, yeah.
Cool.
I mean, yeah, let's see how things are going.
Good luck with all the features you've got coming out soon. And I guess we might actually touch on this later on,
when I ask about future trends and directions.
You mentioned a second ago machine learning
and AI, and how large language models are fitting in,
because some of these features you mentioned sound like they could
dovetail quite nicely with all the advances there. But
we can maybe get into that later on in the podcast. Cool. So yeah, the next
section of the podcast is again a retrospective, and we can maybe talk about some of the other projects you've
worked on across your career and the ones that you found the most
challenging and rewarding. Now, they might not necessarily be the same things, so there are probably
two questions, I guess: which ones are the most challenging and which ones are the most rewarding?
So yeah, take your pick which ones you want to do first.
No, I think they probably overlap.
Yeah, that's good.
So I think first, after my PhD, I joined a company called Synopsys.
So actually, I was lucky in that I ended up building a new product and/or a new team in every company that I joined.
So it was almost like doing startups in even big companies.
So that was very rewarding overall.
But I would say basically after Synopsys,
which is in the chip design software business,
then I did Yahoo Web Search.
That was the place I learned lots of distributed systems,
big data, machine learning, and all that.
Then I moved to eBay. Lots of product building experience.
We rewrote eBay's recommendation
engine from scratch.
But then after that, I joined
a startup.
It was called Turn, which was in
real-time advertising space.
And it
was a late-stage startup.
Real-time advertising, the way
it works is very fast operation, right?
So when you go to a webpage, an advertisement on that webpage normally is not there.
It has to be found in real-time based on you.
Let's say CNN.com, right?
You go over there and CNN says, hey, I need an advertisement. It contacts
multiple platforms, almost in a chain, and in the end everything comes to an ad exchange,
which is the market that will determine the pricing of that advertisement and all that.
But that exchange does not usually have the ad. It has to ask somebody to give an ad, so somebody has
to be on the advertising side. And Turn
was one of those companies.
They are called demand-side platforms.
It's a very high speed operation.
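The flow described above can be sketched roughly as follows: a page requests an ad, the exchange asks each demand-side platform for a bid, and one bid wins. This is a toy illustration under stated assumptions, not how Turn or any real exchange works. The `Bid` type, the two bidder functions, and the "highest price wins" rule are all hypothetical, and real exchanges add pricing rules, timeouts, and much more.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Bid:
    bidder: str
    price: float  # hypothetical price units
    ad: str

def run_auction(request: dict,
                bidders: List[Callable[[dict], Optional[Bid]]]) -> Optional[Bid]:
    """Ask every bidder for a bid on this ad request and return the
    highest-priced one (assumed rule), or None if nobody bids."""
    bids = []
    for bidder in bidders:
        bid = bidder(request)
        if bid is not None:
            bids.append(bid)
    return max(bids, key=lambda b: b.price, default=None)

# Two hypothetical demand-side platforms.
def sports_dsp(request: dict) -> Optional[Bid]:
    # Only bids when the visitor looks interested in sports.
    if "sports" in request["interests"]:
        return Bid("sports_dsp", 2.50, "running-shoes-banner")
    return None

def generic_dsp(request: dict) -> Optional[Bid]:
    # Always bids a low flat price.
    return Bid("generic_dsp", 1.00, "generic-banner")

request = {"page": "cnn.com", "interests": ["sports", "news"]}
winner = run_auction(request, [sports_dsp, generic_dsp])
print(winner)
```

The point of the sketch is that the ad is chosen per request, based on who is visiting, which is why the whole round trip has to be so fast.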
Within a couple of years,
we had about 100 petabytes of data.
We were processing
over 200 billion events
a day.
Multiple millions of requests
per second.
And this had to grow: you know,
queries per second when I joined was 50,000 to 60,000.
So when I left, it was close to 3 million.
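A quick back-of-the-envelope check of the figures just quoted, assuming events are spread evenly over the day (real traffic peaks, so peak rates would be higher) and treating "events" and "requests" as comparable, which the conversation does not state explicitly:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted in the conversation.
events_per_day = 200_000_000_000
qps_at_start_low, qps_at_start_high = 50_000, 60_000
qps_at_end = 3_000_000

# Average event rate if the load were perfectly even across the day.
avg_events_per_second = events_per_day / SECONDS_PER_DAY

# Growth factor in queries per second over the period described.
growth_low = qps_at_end / qps_at_start_high
growth_high = qps_at_end / qps_at_start_low

print(f"average events/second: {avg_events_per_second:,.0f}")
print(f"QPS growth: {growth_low:.0f}x to {growth_high:.0f}x")
```

So 200 billion events a day averages out to a bit over 2.3 million per second, in line with "multiple millions per second", and the queries-per-second growth works out to roughly 50x to 60x.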
So you are growing this fast with, you know,
tens of petabytes of data with a very small team startup, you know, with financial constraints.
When I joined, it was issue after issue,
you know, production incidents
and all that,
dealing with all of these.
And it was also my first CTO job.
So I think it was rewarding
and challenging at the same time, right?
Sometimes the expression is
you are driving a race car
and it's getting faster and faster, and you are changing the parts of it while it's moving fast.
But also you are in the driver's seat. It's not like you have an excuse.
So that was very challenging from that perspective. Unbelievable learning for me,
as well as very rewarding, because first of all, it was a good training ground for me to test what I had learned at Yahoo and eBay and all that.
So they came in really handy.
At the same time, there was the learning experience over there from leading the team as a CTO and moving so fast with a very small team, hiring these people, right?
How do you onboard these?
How do you make sure that you deal with incidents
in a proper manner?
We had to invent many things ourselves in the process.
So all of these are combined with that company.
I ended up replicating similar work after that time,
but the first time, that was the real place
where I got challenged, and it was super rewarding,
with lots of learning.
Awesome. So following on from that, the question I'd like to ask is what you're most proud of.
But is that the same? Do those two things correlate? Is that what you're most proud of in your career as well?
What I'm most proud of is, I think, probably on the personal side: ending up changing my industry and company multiple times,
how fast I was able to learn and, you know, keep my brain open and
come to a level where I can actually contribute. But the biggest thing I'm
proud of is I was able to build great teams and contribute to the careers of so many people.
And many of these people, if you go check LinkedIn, are in amazing places.
And hopefully I had some role in their success from the time that we worked together.
In the end, it's all about people and teams.
And that's still rewarding to me.
Sometimes I meet with them
Right? We talk about those days, reminisce.
Yeah, yeah, I mean, so that's all the
fires you were putting out every day.
Ten fires a day.
Wow.
Yeah, incident after incident.
So two things jumped out there. The first one would be
that you're proud that, as you've gone through your career, you've always had this open mind
and been able to take on new information and contribute quickly to the new problem you're
trying to solve. Was that something you were conscious of and systematically
worked at, or is that something that just naturally happened and
you became good at?
I think probably it came from childhood.
I am still a very curious person.
So I guess it helped a lot because being curious and being in this learning mode,
somehow keeps your brain open and you are open to absorbing as fast as possible.
And the fact that I had to change my company multiple times,
moving to completely different industries,
meant almost like I was willing to start from scratch.
But at the same time, I had to get up to speed super fast.
So I had to learn exactly what that industry is about.
And I really changed drastically, right?
I was in the chip design software business to web search, to e-commerce,
to real-time advertising, right?
It's just completely drastic differences.
But I think that curiosity,
natural curiosity that I had, I guess,
ended up helping me to learn fast and contribute.
And I still keep it and recommend to everybody around me
because that way you don't have, you know,
if you fix your brain or close it, right,
I already know this or somehow I am an expert in this one
from the beginning,
you don't realize sometimes you are missing so much.
I always start from a beginner mindset,
and curiosity coupled with
that, I think, was super, super productive for me. So that's how it came about.
Yeah, I really
like that, staying curious and keeping an open mind, because as soon as you shut it off, right,
then you don't know what you don't know. So if you go into it with a
closed-mind attitude, thinking you know it all,
then you're not going to learn anything, right?
So yeah, stay curious.
I guess it's the message.
Yeah, I think also there are lots of interesting things
to learn about almost everything.
So if you just take a step back
and look at even things that look mundane,
somehow you go like, what is in it?
Or you try to do it the way that everybody does.
But if you take a step back and look at it from that curious mindset,
you sometimes realize, you know, there are different aspects
of the problem that you are looking at or the system or whatever.
That actually also helped me.
If you look at some of my patent applications or patents,
like they're all in different areas.
For example, when the pandemic came, everybody was on Zoom.
And I realized it has some issues.
Actually, I have a patent on improving Zoom.
20 people are joining.
How do you actually schedule through these people who should talk first?
Maybe somebody is not talking in those meetings.
Maybe they should be given priority, right?
So the point is like, why would I?
I mean, I was working at Atlassian at that point.
It had nothing to do with Zoom.
We were just using it.
But then if you keep your mind and look at everything with that,
you know, almost like childlike perspective
and look at how it can be better, that actually helps you, right?
You can even innovate things
or invent things along the way.
So I would highly recommend it.
I love that, Ali.
Going around,
what are the problems that I have here
that could make someone's life better?
How can I make this better?
And obviously you see everything,
you never know when you're going to have
some great idea, right?
So that's awesome.
And yeah, the next question,
off the back of the answer to the previous question, was about team building. I know there are
countless books written on this, on the art of team building and how to hire
people and all that. But yeah, I guess I want to get at what's your secret to building teams,
how you approach that problem, and making sure you get the right people to solve the right problem.
Yeah. I think it probably starts with realizing that people come first in every company.
You know, if your team is not happy, if they are not top performing people,
in the end, if it's a public company, for example,
your shareholders and customers will also suffer.
So it's very important to realize that first.
And second thing is genuinely care about people, not only as part of your team, even to the level that in my case, if somebody, let's say, was leaving my team, I was, of course, talking to them to keep them around. But at the same time, I was genuinely interested in finding out where they were planning to go.
And can I help them?
Whether this company is good for them?
Can I actually, if I have a connection with people over there, that I can help this person to find even a better position?
Or maybe even better deal.
So I think high level is, if you up level it,
it starts with caring about people first and genuinely caring about them.
And people will see that one.
Now coming back to once you have that fundamental,
it comes back to if you are building a team from scratch,
it requires what kind of skill sets you need,
where they should be located,
how many people you will need,
how you should be interviewing them,
what does interviewing mean,
what kind of data points you should be collecting,
how do you make sure that you're going to onboard these people as fast as possible
so they can be productive
and feel part of the team immediately, hopefully.
How do you keep them engaged going forward?
How do you make sure that they feel you care about them,
and you're not going to say that, but through your actions,
you are showing that.
How do you make sure that they keep improving
with respect to what they do, with respect to their productivity,
their careers are getting better.
Some of maybe your curiosity or other good things
that if you are doing will be contagious
and they're going to be following up in your footsteps.
Also, how much you are learning from them,
how much you are interested in what they are doing.
So in the end, it's almost like a person-to-person relationship,
but it happens, of course, in a business setting and there are business goals and all that.
But if you run it with that genuine concern, genuine attention to what you are doing and care about people, the rest of the steps, I think, come naturally.
And there are multiple other things that you've got to do.
But overall, I think that's how I would describe it.
Nice. Yeah, I love that. At the root of it is caring about people, and everything else, maybe not naturally, but follows from that, right? Yeah, awesome. Cool.
For the next question, I want to dig into your motivation a little bit more and the things that have motivated you across your career. Obviously we're speaking about academic papers here, so I want to ask if there are any specific papers, research, or even people that have had an impact on your career. Who they are, what they are.
Yeah. I can maybe start with the people.
so definitely i think getting the right advice,
getting the right mentorship
at the right point.
Sometimes people underestimate,
but it is hugely important.
So in my own experience,
definitely, you know,
I come, you know,
originally, as I mentioned,
I come from Turkey,
you know, very small city,
modest backgrounds.
We had a math teacher, you know, he believed in us.
And, you know, even just a little bit of encouragement and teaching us a couple of tricks helped a lot.
I still remember his name.
So it was amazing, you know, some getting that kind of attention.
And I think after that, definitely, you know, doing undergrad work and grad work, right, advisors along the way.
You notice that basically, I have a theory actually,
if you look at many of the famous mathematicians, most of their
advisors were also famous. So Riemann's advisor was Gauss. So somehow, I think, it's almost like
they are the masters that you are learning from as an
apprentice.
Okay, yeah.
Yeah, because it's not just like reading the books and learning. There is a certain way of approaching a problem, or asking the right questions, or how they make trade-offs, or how they make decisions. So along the way, I think those kinds of people helped me a lot, and I got a little bit from everybody. Definitely those are the people that I would think of at this point that contributed to my growth.
The Jedi and the apprentice, right? Similar sort of thing. You need that kind of osmosis as well. Being around that sort of person, you naturally pick up a lot of the way they approach things, I guess.
Yes.
Cool. Yeah, so you mentioned there about advice as
well. So what is the best piece of advice any of your mentors has given you across the years?
Across the years, I think the best one sounds so mundane and simple, and it is: just do it. Just start.
Yeah, no, it's so true though, right? You can sit and think about something for ages, "I am gonna do this," but, well, okay, go and do it then. As soon as you start doing it, that's when the real problems start hitting you in the face, and that's how you're going to start iterating.
Yes. Yeah, I think if you look at the millions of books on success and all that, everybody has a different story. But I can claim that I am a student of these things: what are the principles or fundamentals that help you to learn, how do you learn to learn? And I have done lots of my own research around that. So it literally, in the end, just comes
down to that simple method. Just start
and keep swimming.
If it kind of
fails, try something else and keep iterating, right?
And kind of just keep trying, just keep doing it.
As simple as that. Yeah. Like you
said, you know, the fact that you start,
you actually see
issues as early as possible.
And, you know, the fact that you are facing something,
it helps you to ask more questions maybe
or see the problems early enough.
You act on that.
And as a result, you learn something new.
It's not like, you know, statically, you know,
hitting the same wall or banging your head on the wall all the time.
It's just that as you are iterating, you also adapt to what you are seeing. But literally, just start and keep swimming.
Yeah, it reminds me, I think I read in some database textbook ages ago, it was about concurrency control, when I was first learning these concepts. I read the chapter about two-phase locking, and it was like, now you've read about it, go and implement it.
Because until you've actually implemented it
and had to think about it and actually done it,
you haven't really maybe properly grasped it all the way.
So there's that aspect, I think, as well, just kind of doing something
because then it kind of makes you kind of ask,
oh, that's kind of quite inefficient or I could do this differently,
I could do that differently.
So yeah, I agree with that principle.
Just go and get your boots dirty and get doing it, and then it'll all kind of go...
Yeah, there's actually a good paper from, I think, Google about implementing the Paxos algorithm...
Okay.
...in real life, what they had to face. And they go like, the algorithm works, it's just that there are so many implementation details that, just like you're saying, unless you implement it, you never know that they're real.
Cool. So yeah, I guess kind of related to hitting these roadblocks when you do something and it doesn't work out is setbacks and rejections, and how you've approached those across the years. What's your approach for that?
I think sometimes it's not easy. It can be as simple as
a paper rejection,
for example, or in real life,
your manager might not like you
or get a bad performance review this year.
I think, you know, yes, emotions sometimes will come.
You might feel angry at that point.
But I have seen over and over again that many times,
let's say I interviewed with a company
and I did not get accepted.
And later I realized
that that company actually failed.
If I had joined,
I would have spent,
I don't know, this many years,
and I might actually have suffered
because of that.
There was no benefit
that I was going to get in the end.
Or actually you later learned
that the company had a very bad culture.
So I think there's probably a reason for some of these things.
Yes.
Emotions will be coming,
but try to keep it short maybe.
And sometimes there's a wisdom in these things and just definitely learn from
it,
right?
There must be some actions that you got to take.
You cannot just keep blaming the other side.
You know,
if I get a bad performance review,
"my manager doesn't know me" or whatever,
but there must be a reason that you go like, well, why am I even surprised with that?
Maybe along the way, we did not discuss these things.
Or maybe I was hearing these, but I did not act on these.
So maybe I was not hearing or I was not listening.
I think it's a good idea to learn from these and act accordingly to get better.
But yeah, sometimes in the short term, you hear something and you go like, oh, you know.
It's human nature, right? Your instant reaction is not always rational; you react emotionally at first to these things, I guess. And then maybe after you've cooled off a little bit, you think, okay, maybe they were kind of right, and I can improve and do this better.
Yeah, especially if it's a paper. I mean, you have published a lot too, so sometimes you get the review comments and they rejected your paper, and you go like, you didn't even get the paper, that's not true. But, yeah, there's no way you can respond to it. There are other places you can submit, right? So these days I don't publish anything in regular conferences. I just send it to arXiv, and if people like it, people will like it.
Yeah, that's awesome. Cool. So we mentioned this earlier on, I think, this principle of curiosity. The next one is actually my favorite question of all the questions I ever ask anyone. It's about the creative process, how you approach idea generation, and whether there's a systematic way of doing that. Obviously we've mentioned this curiosity principle, but once you have that, there's selecting what to work on, or how to narrow your focus so you don't bounce around a million different projects without ever making any tangible progress on any of them. So yeah, I want to get what your approach is to being creative.
Yeah.
Like you said, I think it starts again
from
showing genuine interest in
the problem that you're
facing or you're reading
a paper. I think
the attitude of I already know this
is probably not
good. The other thing is
maybe the other attitude: oh, there's no way I'm going to be interested in this, or it's not even in my area, why should I even worry about that? With all of these, you are literally limiting your potential, in my opinion. So one is just keeping that curiosity again: hey, seems like something interesting over here. It's not like I'm going to use this for my work, but it sounds like an interesting problem, let me just look into that. So I think it just starts from there, and from keeping your brain open. Just like you said, I think it's a good idea to get your hands dirty a little bit. If it's a math problem, you can try to solve it yourself, or if it's a result, you can try to derive it yourself. Or
if it is some programming task or something, algorithm or whatever,
I think just start implementing that, see how it goes.
There are different ways of doing that.
You can definitely make progress and learn.
And along the way, you wouldn't know what's going to come out.
Definitely write things.
I have noticed that people sometimes keep avoiding writing.
If you write what you
understand from a problem, writing, in my opinion, is also thinking. It literally opens up. Because
the fact that, and you have seen this when you write paper, writing an introduction or abstract
is so difficult because it just tells you that probably you have not completely simplified the
problem to a way that you can
actually explain to somebody else, right? It also shows probably gaps in your own understanding.
So writing is another one that I use. I try to make my assumptions explicit, even if they are
sometimes trivial. Even in meetings, when I talk to people, I actually say it like that too. Hey,
I'm going to make my assumptions explicit. Sometimes people say, oh, it's like, this is trivial. Why would you even say that? But you don't realize
that, after some time, even those simple assumptions might not be shared, and making them explicit actually
opens up the discussion and more people can be contributing to that. I think there are lots of
techniques of thinking well. So if you look at, for example, in math,
I think from,
there's a famous mathematician,
Jacobi or Jacobi,
so he's, I forgot how they pronounce his name,
but he has two really good suggestions.
One is generalize, right?
Moving an abstraction level
sometimes actually simplifies the problem.
This is also one of the techniques
that if you look at George Polya,
he has a book about how to approach problems.
It's more about math, but you realize some of those
are applicable to generic problem solving too.
And the other thing is invert.
Sometimes you look at a problem from one perspective,
let's say you go forward from A to B,
but if you invert it, sometimes you see the problem from the other
side, and that actually could be a new
way of understanding.
And also you might actually invent
something new, and in my experience
it has happened a couple of times.
Even though it sounds simple, but
there are some, I think, uber
techniques with respect to
how to explore something. There are
lots of heuristics that you can use to, for example, summarize something really
fast.
What's the main things that I should be looking at?
Or if you're looking at reading a paper, even the simple techniques like, you know, how
do you read a paper, right?
You know, you look at, let's say, a plot in that paper, even just paying attention
to that, and the X and Y axes.
Can I actually reason very quickly?
Should I just start reading from beginning to the end or scan it first, look at a couple of important things, maybe abstract and conclusion or whatever, then deep dive into that.
And for each sentence over there, sometimes you've got to pay attention. At a high level, I would say, you know, curiosity, keeping your brain open.
But there are a couple of techniques to think,
heuristics or fundamentals or principles.
It's a good idea to learn.
And you notice that many people actually have been using those.
Some of these are not verbalized or all that,
but there are places that we can get some of these.
And they, at least in my case, were super useful, and I am conscious of those. Sometimes I explicitly use them to help me.
Awesome. Yeah, just going back to "writing is thinking," which you mentioned in the other answer: my supervisor for my PhD said that exact same thing to me. On the math versus English spectrum, I've always leaned more towards numbers than words, shall we say, hence probably why I've ended up down the career path I've chosen. But I always tried not to write because I just didn't like it. And then when I was doing my PhD, it was like, you need to do this and iterate, write more and more, because the first thing you write is going to be terrible, but that just shows that your thinking isn't fully crystallized yet. Writing is a way of communicating your ideas, so you need to keep iterating, because that helps you explain and understand the problem better and makes you think it through more. So yeah, writing definitely is thinking. It forces you to try and put it into words. You think you know it? Try and put it into words.
I bet you don't most of the time, right?
So I really, really like that, yeah.
That's true, especially if you have also like formulas
or whatever, even like symbols that you are using, right?
So it looks too complicated.
You got to iterate on that.
But going back to the earlier principle
that we were discussing, Jack,
with writing, sometimes you have writer's block, right?
It just doesn't feel like you should actually go ahead and write.
I think it comes back to the other recommendation
that we were discussing.
Just start, even if it's just one sentence,
or even random stuff.
And after some time, after some iterations, like you said,
you're going to reach a far better state.
And actually, it's going to clarify things.
It's a two-way process.
The fact that you're writing, you're actually making progress, but at the same time,
it helps you to clarify your thinking
and what you're trying to express
and actually improving your thinking.
So it's super helpful with that.
Yeah, and making the assumptions explicit up front
and communicating those
kind of helps everyone
get on the same page, right?
Because we did this exercise once at work where you were back to back with somebody, and you had to describe a house to them and they would draw it based off your description. And you'd be like, well, that's not the house I described. It's those, what's the word I'm looking for, implicit assumptions you're making that could be completely different for the other person, and then you end up just talking past each other almost, right?
So making them clear and getting everyone on the same page, I guess, makes for a better environment to be creative, because everyone's on the same page.
But yeah, I'm going to go check out that book, How to Solve It, as well.
That sounds very interesting. So, yeah, one more for the reading list.
The monotonically increasing reading list, right? That's the other thing. Yeah.
So I can recommend a couple of other books related to that, Jack. One is by Richard Hamming. You know Hamming distance? That Hamming.
Yes, yeah, yeah.
He had a couple of really insightful books. I would recommend, I think the main name was
The Art of Science and Engineering or something like that. I forgot the exact title,
but really good book.
There is this professor in,
I think, Germany,
Gigerenzer,
I think his last name is.
He's talking about heuristics,
how to, you know,
be smart about things and, you know,
what are the heuristics people are using,
for example,
to make decisions,
to trade off analysis and all that.
There are a couple of,
you know,
it's not super common, but there are books like that.
There's also one I would recommend; the person's name is Rechtin.
His last name is Rechtin.
Eberhardt Rechtin, R-E-C-H-T-I-N.
He has a book about systems architecting.
So this is systems architecting.
Normally, you know, people are using this term when they build, let's say, space systems or planes or ships and all that, like super, super complex systems.
And software is just one part of it.
And there are lots of learnings from those books.
And you can also see in those books, in his books, for example, how do you approach a complex system?
How do you build it? What are the
heuristics you are using? How do you
partition the system?
What are the criteria you are looking for?
And many of
these are not well known, to be
honest, in the software architecting world.
Normally, definitely, you
look at enterprise architecture, software
architecture. There are lots of books on that.
But I was lucky; I somehow
hit upon these books
and I had to learn a different aspect
of how they approach building big,
literally huge systems.
And they also have lots of good heuristics
about, again, approaching,
not only systems building,
but also how to think about solving a very,
very complex problem.
Nice, yeah. There's two more there as well.
I'll put links to them in the show notes
and also the listener can go
and find them as well. Yeah, that'd be awesome.
Yeah, I appreciate that. Thank you.
I've sketched it down on my notes, but my handwriting when I'm
writing notes is sometimes it's kind of
unreadable by the end of it.
But yeah.
Cool. Awesome stuff, Ali. So the next question is what you think about the interaction between academia and industry. I'm guessing you've touched
both camps over the years, but primarily from the industry perspective. So yeah, I'd like to get what your take is on that interaction and how it can
be improved.
Yeah.
So, like yourself, I did a PhD, and definitely spent lots of years in academia.
I actually wanted to become a professor right away, but somehow I chose a middle path and I joined
a research organization within the first company that I worked at, which was Synopsys.
They had a group called Advanced Technology Group.
It was mostly researchers in that specific domain, but at the same time, it was almost
like doing academic work in industry.
So you see this in especially big companies.
I think no successful company can definitely avoid doing work at the forefront of research.
So now who should be doing it?
So maybe the typical understanding is that kind of research, open-ended thinking comes from academia and industry just implements these things.
But just like we discussed earlier, right, sometimes doing the real work creates more problems for you to solve.
And as a result, how are you going to now do that?
Are you going to route these problems back to academia to solve or are you going to do it internally?
So in a way, this work
has been blended
for a while. I think definitely
they both need each other. So industry,
I'm hoping that will support academia more and more
with respect to
maybe introducing them to
problems that they are facing.
Hopefully, financially supporting
the universities and researchers.
Definitely through internships, summer sabbaticals, or things like that. I see more and more people who are doing, let's say, graduate work, or professors, spending some time in industry. So that blending, I think, is working out pretty well. The other direction is probably not that
productive at this point.
You don't see many, let's say,
people from industry going and
spending, I don't know, one year in academia,
just to bring that perspective,
the real-world perspective to
students and professors,
people around them. I think probably
there should be a way of maybe
doing that too.
There's also, in academia,
as you know,
publication pressure,
right?
All that.
So industry,
you know,
you got to find grants and financial support for your students and yourself.
I think overall,
basically I would say these are just the
two legs of the same unit.
Definitely.
There has to be a really good working relationship between these two.
And interaction should be both ways.
It should not be that academia
just raises people
and we're just going to hire them away.
I know the financial situation
in the industry is probably far better,
but if you keep hiring all the good professors,
very soon you're going to run into issues
not having good students,
because who is going to educate them? So I think there are lots of people
who definitely want to do academic work. So I
think the relationship is reasonable, but it could be far better, in my opinion, along the lines that I
suggested.
Yeah, I mean, personally, one of the reasons why I veered more
towards industry after my
PhD was what you mentioned there: the incentives in academia are very different, in the
sense that it's grants, it's people sort of optimizing their h-index, that
sort of attitude, the publication pressure, and it can be restraining. I don't know,
it didn't really sort of align with kind of what I wanted to do.
And obviously, well, I mean, the financial situation
is a lot better in industry as well.
So there's that aspect to it as well.
But yeah, I definitely agree.
And it would be nice to have people
from industry going to academia for a year
and bringing those real-world problems, because that's another thing
that I think is sometimes missing from academia.
It's a lot more rewarding when you're working on projects where you can actually see who it's going to affect, or how it's going to affect a product or a group of people, right? So having that scoped initially is a lot nicer as well. So if you had somebody come from industry into academia for a year, that would be nice as well. So yeah, that's a nice, interesting idea.
There are lots of real problems, so I think it's going to be a good opportunity also for the people in academia to publish more papers.
Yeah, right, exactly. Give the h-index a bump, right? So yeah, awesome. Cool. So on to the last question now,
Ali. It's about the future and current trends.
And what are the most exciting things you're observing at the moment and the trends and what you see as promising directions for the future of this space?
I think the biggest thing, what we are seeing is right now, what is all the buzz about?
I think AI and GenAI especially. It's,
if you look at my lifetime, first time when we got the
web, World Wide Web, the fact that
you could try it yourself and you could see
the benefit right away,
it was useful with only
you in the picture.
So now,
so AI, as you know, has been, we have been using
when I was doing advertising or Yahoo Web Search,
search results ranking was using machine learning.
This was whatever, like almost 15, 20 years ago.
So it's not like AI was not in the industry
and many things were automated at huge scale.
Most of the advertising was using machine learning
for sending you the best advertisements.
There's no way a human can do it millions of times per second.
But right now, I think, especially with Gen AI or the level that AI has reached, you can see that this becomes now useful to ordinary people.
Literally, you can go ahead and use it yourself and send it a big article, right? It's going to summarize it for you,
and most of the time you go like, well, actually, pretty good job. Or it writes a song for
you or creates an image, right? In the past, these were very difficult problems. I think it's just
amazing how far it has reached. But more and more, I think the question is how much of that can be used
in industry, definitely.
And you can see people have, companies have jumped at it already because the benefits are obvious.
But there are lots of questions around that, right?
For example, where are the places that now you have to use humans for, right?
It's a good idea.
I think it's impossible to eliminate humans completely from this picture, but what's the best place to take advantage of their input across that,
let's say a flow of things that the company has to do,
you know,
whatever it generates,
for example,
or you are producing results,
how are we going to verify that?
Right.
So if I,
let's say I'm summarizing a three hour conversation between us,
even this podcast,
how am I going to know that the summary was correct?
Should I go back and listen to the hour of podcast myself
and say, oh yeah, I would have produced that summary?
So some of these are difficult problems
and have not been solved yet.
But I think more and more,
we have to definitely find solutions
to verifying the results of all this AI deployment,
especially Gen AI deployment.
There will be lots of machine-generated data, definitely.
I mean, you already know there's some work on security aspect of it,
whether these things are fake or not or real or not.
But I think these will be unavoidable.
So you have to pay attention to that and figure out ways of, again, detecting all of these.
More and more, I think, you're going to see AI used in defense and offense, right, in military
applications and all that. Again, where are the boundaries? What's the ethical
usage, but beyond that, what's the best usage for the, again, benefit of humanity?
There is surveillance aspect of that.
You can automate everything now.
Again, how do you solve implications of that?
How do you make sure that basically this is used
for the benefit of everybody,
and it's part of the productivity that we need?
I don't think it's to the level where you say you should ban it, right,
because the benefit is so huge that it's going to be unlikely that people will be
doing something like that.
But at the same time,
how can we make it part of our lives and find a really good place for all of
us to actually contribute to the overall good?
So I think research around all of these,
probably many of them are already active.
And so finding solutions for these probably is super important
because we ourselves need solutions like that
for the products that we are deploying
very soon for our customers.
Yeah, the one that sort of jumps out to me there
is the verification aspects of it.
And obviously, it's well established
that Gen AI and LLMs have this problem of hallucinating, right, and just
throwing out some garbage and being like, yeah, this is the truth. I know for academic
papers, it's really good at generating citations of papers that sound really, really
good, but they just don't exist. The authors don't exist. So yeah, it comes back to
a provenance problem of why it's saying what it is, and actually supplying a ground truth, like, I'm saying X
because of Y, right? It's something that I think has not been solved yet,
and it's becoming really important.
There are some reasonable solutions, almost
closer to the way that you write papers, right? So you write a paper, you give references to other work.
So you see more and more
now many of the, what has
been generated is referenced.
But at the same time,
are you going to go, let's say there are 100 references,
are you going to go verify all of these?
Oh, yeah.
If you use another LLM to do that one, right?
So I think,
yeah, related to that,
I will also, Jack, mention that I think understanding
definitely more and more
why they are so good
is going to help us
to do things better, I think,
you know, as humans, right?
Maybe we are going to say,
you know what,
maybe all the things
that we were saying
that humans are really good at,
actually, maybe it's as simple as that.
It's just that we did not know
the solution before,
but using the fact that we now found an automated solution and found a way to explain
that reasoning, maybe the secret sauce was simple.
Yeah, it will all be revealed, right, when we pull back
the curtains. Yeah, that's awesome stuff. That's brilliant. Well, I think we can end things
there. It's been an absolutely fascinating chat. Thank you very much for taking the time to speak
with me today. It's been awesome, and I'm sure the listener will really have enjoyed it as well. So
thank you very much.
Thanks so much. Great stuff, what an opportunity. Yeah, hopefully it's going to
be useful. So thank you.
I'm sure it will be. Great stuff. And yeah, we'll see you all next time
for some more awesome computer science research.