Disseminate: The Computer Science Research Podcast - Andra Ionescu | Topio: The Geodata Marketplace | #31
Episode Date: April 25, 2023
Summary: The increasing need for data trading across businesses has created a demand for data marketplaces. However, despite the intentions of both data providers and consumers, today's data marketplaces remain mere data catalogs. In this episode, Andra tells us about her vision for marketplaces of the future, which require a set of value-added services, such as advanced search and discovery. She also tells us about her and her team's effort to engineer and develop an open-source modular data market platform to enable both entrepreneurs and researchers to set up and experiment with data marketplaces. Tune in to learn more about Topio, a real-world web platform for trading geospatial data that is currently in a beta phase.
Links: Topio Marketplace | Andra's Homepage | Andra's Twitter
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast.
I'm your host, Jack Wardby.
I'm delighted to say that I'm joined today by Andra Ionescu,
who will be talking about her work on the Topio Marketplace.
And this is work that will feature at ICWE.
And Andra also had a demo on it at EDBT that won the Best Demo Award.
So congratulations for that, Andra.
So yeah, so more about Andra.
She's a PhD student at the Technical University of Delft in the Netherlands.
And her research interests are data marketplaces and data set augmentation.
So Andra, welcome to the show.
Thank you.
Thank you.
I'm happy to be here.
Great.
So let's jump straight in then.
So I've given you a brief introduction there,
but maybe you can tell us a little bit more about yourself
and how you became interested in data management research.
All right.
Hello, everyone.
My name is Andra, and I'm a fourth-year PhD student at TU Delft.
Well, fourth year means that this is my last year of my PhD journey, and I'm close to graduation, hopefully. How did I get into data management? Well, I got into data management during my master's. My master's was in data science, also at TU Delft, and for my master's thesis I had the opportunity to collaborate with Christos Koutras, who was my thesis supervisor back then; now he's actually my colleague.
And we worked together on data integration; we created Valentine, which is a benchmarking tool
for data integration. We did this together with more colleagues, of course.
And this was a challenging experience as a master student.
And I worked on this big project with tools and topics
new to me.
So I loved it.
I really liked it.
And I expressed my intent to pursue a PhD, a PhD trajectory.
And luckily, Asterios Katsifodimos, who is my supervisor now, had the position open.
And this is how I got into data management.
So it's basically thanks to my master's.
Oh, amazing.
It's quite similar to the sort of experience I had.
I had a really nice master's experience that kind of encouraged me to then pursue a PhD.
So that's fantastic.
So today we're going to be talking a little bit about data marketplaces, right?
So maybe for the uninitiated, you can kind of start off, give us some background, and explain to us what they are.
Well, a data marketplace is actually a marketplace, but for data, for datasets. So data is treated as a commodity, and it's traded between providers and consumers. So someone has data that they want to trade, to share with others, in exchange for maybe some money.
And other people are interested in actually getting more data.
And it's not necessarily one kind of consumer. You can take, as an example, us as researchers, because we need data, but also companies, who are working on a million things; they also need data.
So data market platforms trade these datasets and can generate revenue by providing extra services to help both providers and consumers.
Awesome.
Yeah, it's what we've been forever reading about over the past five, ten years. Data is the new oil, right? That's what they're always saying.
So it's only fitting that it has a marketplace to buy and sell the commodity, right?
So yeah, that's really cool.
So maybe you could tell us a little bit more about what existing platforms are there out there
and what are the problems with them, essentially?
All right. Well, there are a lot of platforms out there and
I can answer this question from different angles because you can look
at the landscape and say that there are too many platforms and why not have one
and sell everything on that single platform?
But this can become messy and hard to maintain.
On the other hand, there are a lot of marketplaces which are specialized for different data types or businesses or fields. For example, geospatial data, which is what we developed our platform for.
And there's also the data management perspective.
And we can look at the struggles regarding traditional data management challenges, such as profiling, integration, metadata curation, enrichment, data search and recommendation. And I think some of the platforms that are out there have handled and solved one of these problems, or multiple problems, to some extent or completely.
We don't know because they are businesses in the end.
So in the end, I think it's all about the gain and the benefit.
Are there any sort of names that I'll be aware of in the data marketplace?
I guess the big players maybe have some platforms that they have,
but I'm not really aware of any of them, really.
I mean, I've never used them. I've never looked for them.
So, I mean, that could just be me, right? I'm never going to buy data, nor sell it.
So we have AWS Data Exchange, Datarade.
These are more industry-focused.
Right, yeah.
And for geodata, there's CARTO and HERE.
To some extent, they are data market platforms,
and they do other things as well.
So I don't think there's one that has only one focus.
I see, I see. Okay, so I guess kind of building on that then, can you give us maybe the elevator pitch for Topio? So how do you go about addressing some of these problems you've mentioned? And maybe as well, can you tell us why geospatial data specifically?
Well, we wanted to focus on geospatial data, as open-source data as well, because, well, there are a lot of companies and businesses around geodata, but there's not one place for geodata, also in the context of Europe; they are more focused on global expansion and so on. So I wanted to make it more focused, more specific, also because it's open. So you need to go a bit focused and specific.
and specific. And yeah, I would say that Topio is an instance of an open source market platform.
It's designed with openness and reusability in mind. We have a lot of reusable libraries, which you can use not necessarily to build a marketplace, but also independently.
And I see Topio as a joint effort to build an open source platform.
Awesome. Cool. So when you were going about designing it, obviously you surveyed the existing landscape of data marketplaces. But you also,
I know in your upcoming ICWE paper, you did a survey, right,
where you kind of asked both the people who consume data and those who produce data
to kind of motivate the design and make it usable, like you said a moment ago.
So can you maybe tell us a little bit more about this study and what you found from it?
Sure. So we conducted this survey with the goal of understanding the needs, the requirements,
the preferences of both providers and consumers from very diverse backgrounds.
So we have participants from geography, information technology, marketing,
with different roles in the organization and different business fields.
And to summarize the findings, we observed a high interest in being part of a data market platform and selling data that way.
But there are a lot of challenges.
So the providers have challenges regarding standardization of pricing, of contracts, of payments, fees, commissions, this sort of business thing. And the consumers, on the other hand, expect easy access to data, transparent terms and conditions, transparent costs. So there are a lot of bureaucratic and financial aspects, which are probably an impediment on other platforms that make a business out of it. And aside from these parts,
both consumers and providers want the same things,
which is perfect.
So they want the same data formats.
They want to use the same services.
They want high data quality
and possible templates for licenses, contracts.
So the good part of the survey is that they turned out to be aligned
in terms of requirements, which is perfect, I would say.
Yeah, that's a great find, right?
Because if you find sort of a disconnect between the producers and the consumers,
you then need to do something to sort of realign them, which makes the challenge even harder, right? So it's good that they both want to be on the same page, because it's now just about giving them the platform to do that. So on that, let's dig into Topio a little bit more then. So, given what you found from your study, can you maybe describe the architecture of the marketplace a little bit more, and the various components that make up the marketplace?
All right.
Well, this is going to be like a 30-minute talk.
Go for it.
Yeah, yeah.
Just go for it, yeah.
Probably like the ICWE presentation.
Well, it is good practice.
Enjoy it.
All right. Back to the question. So the platform, as we described it in our paper, has five
major components. Ingestion of the data, search and discovery, recommendation, profiling, and delivery. Of course, around these components,
we also have other components regarding the workflow
of the platform, the legal aspects,
like I mentioned before, the contracts, licensing,
things that we intentionally left out of the paper,
because this is not really our domain.
So we were merely developers for this component.
And of course, there's also the UI part.
That's a different story because you have to make the platform look nice and also be functional for the users.
So I will focus on the backend side, more specifically the research side.
So first the assets are ingested and stored in Topio.
So a data asset is uploaded, it's versioned, it's curated, and then stored.
From there, the asset is directly delivered to consumers
in their preferred format.
So the data asset lifecycle includes publishing, purchasing, and delivery.
Then we developed value-added services, such as data discovery, the recommender system, and the profiler, because we wanted to increase the benefits for the consumers.
So these benefits are twofold.
We want them to better understand the value of an asset
based on the metadata that we compute through the profiling service
because we want them to understand what the data set can be used for,
what is the value of it, and then make an informed purchase. And the second benefit
is also easier access and discovery, personalized recommendations of related and complementary data.
We understand how difficult it is to find data, so having an engine that does that for you,
based on your searches and behavior in the platform, I think that's valuable for the consumers, and of course also for the providers, because they can sell their assets in the end and make them discoverable, and not buried down somewhere in a corner of the platform, so to say.
Awesome, cool. So yeah, there's two things that kind of jump out that I want to dig into in a little bit more detail. How the heck do you go about working out the value of a piece of data? And how do you go about pricing assets?
Pricing is indeed a hot and debated topic, and now especially in data management as well.
We investigated the possibility of making our own pricing strategy
and deriving pricing from selling subsets of data sets, for example, or views. But this became very
challenging. I mean, I think you can do a whole PhD on this topic. So Topio prices datasets in two ways in the end. So the first is pay per dataset, which is the simplest form of pricing. It means that we don't really do the pricing as a platform; we let the providers offer datasets to the consumers for a fixed price, and provide discounts or a price per bundle. They do their own pricing. And the second way is pay per API call on a value-added service. So when consumers read data from a value-added service API, providers can set the price per API call. These calls are logged and then charged on a per-call basis. Similar to cloud platforms, right? You pay for as much as you use. But this is it as far as pricing is concerned from Topio's side. We wanted to let the providers do their own estimation and assessment.
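As a rough illustration of that metering model, here is a minimal sketch in Python. The UsageMeter class, the service name, and the prices are all invented for this example; this is not Topio's actual billing code, just the logged, charged-per-call idea.

```python
# Hypothetical sketch of per-call metering; names and prices are illustrative.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageMeter:
    """Logs value-added-service API calls and charges per call."""
    price_per_call: dict  # service_id -> provider-set price per call
    call_log: defaultdict = field(default_factory=lambda: defaultdict(int))

    def record_call(self, consumer_id: str, service_id: str) -> None:
        # Each read against a value-added-service API is logged...
        self.call_log[(consumer_id, service_id)] += 1

    def invoice(self, consumer_id: str) -> float:
        # ...and billed per call, like cloud pay-as-you-go pricing.
        return sum(
            count * self.price_per_call[service_id]
            for (cid, service_id), count in self.call_log.items()
            if cid == consumer_id
        )


meter = UsageMeter(price_per_call={"geocoding-v1": 0.002})
for _ in range(1500):
    meter.record_call("consumer-42", "geocoding-v1")
print(meter.invoice("consumer-42"))  # ~3.0
```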
Sure, cool.
Obviously, I guess there's a whole,
you hinted there you could do a whole PhD on the pricing of data.
Are there any other sorts of potential pricing strategies you explored initially, before you decided to go, you know what, it's going to simplify proceedings and just make it a lot easier to build this marketplace if we shift the pricing either to the provider, to set the price, or go for like a pay-per-API sort of model?
We didn't actively explore other methods, because for that we would already have needed to deploy the platform, basically, and have users who act normally.
Yeah, it will not work if we just pretend to know what we are doing, because the providers, from our surveys at least, are people who know how to sell data. They sold data before through other means, not necessarily using a data market platform, but they know how this business goes.
Cool. So yeah, let's then dive a little bit more into the discovery, the value-added service there.
So you mentioned a few different value-added services that you've incorporated in.
So let's talk about discovery first.
Can you maybe tell us how you go about achieving that essentially in the marketplace?
Well, to start, imagine you just deployed the platform. You have some datasets, either open datasets that are free to purchase, by the way, we just make them discoverable, or datasets that providers have already uploaded. So we first create our own data representation structure, so to say; we use a graph where we map all the datasets.
And then from there, we have two strategies
of making datasets discoverable.
So first, we have the joinable, unionable case,
where we just look at the directly connected nodes
to one given node.
In the case of the marketplace, that will be the dataset that you're currently inspecting, viewing; so you already found something and you want to find more, to augment it maybe. And there's another component, which is
linking datasets. So let's say you have some assets which are favorites, and you mark them in your favorites list,
and then you are browsing other assets.
So we can look now on how to connect the assets
that you're currently viewing to the assets
that you marked as favorites.
And we are doing this by looking at transitive paths, so traversing the graph basically from
a source to a target.
And we implemented this in a Jupyter Notebook environment, which is also provided by Topio.
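As an aside, the two strategies described here can be sketched in a few lines of Python with networkx. The dataset names and the graph schema below are invented for illustration; Topio's actual discovery service works over its own graph representation.

```python
# Minimal sketch of the two discovery strategies; illustrative only.
import networkx as nx

# Nodes are datasets; an edge means two datasets are joinable/unionable
# (e.g. they share a column or schema, as found by a matcher).
g = nx.Graph()
g.add_edges_from([
    ("bike-lanes-nl", "road-network-nl"),
    ("road-network-nl", "traffic-counts-nl"),
    ("traffic-counts-nl", "air-quality-nl"),
])

# Strategy 1: joinable/unionable datasets are the direct neighbours
# of the asset you are currently inspecting.
viewing = "road-network-nl"
related = list(g.neighbors(viewing))  # ['bike-lanes-nl', 'traffic-counts-nl']

# Strategy 2: link the viewed asset to your favourites via transitive
# paths, i.e. traverse the graph from a source to a target.
favourites = {"air-quality-nl"}
for fav in favourites:
    if nx.has_path(g, viewing, fav):
        print(nx.shortest_path(g, viewing, fav))
        # ['road-network-nl', 'traffic-counts-nl', 'air-quality-nl']
```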
I guess I want to touch a little bit more on something you mentioned earlier about combining datasets and, like, doing views over them. Is that functionality that currently exists, or did you decide not to do that? Is the boundary essentially each asset, like you're not doing any sort of pre-processing in Topio to combine two similar datasets together into one bundle?
Is it sort of very, everything's very siloed?
Is that like, the producer puts it on
and that's what gets sold, essentially?
No, we didn't do anything
in this direction.
I think this would
be the data discovery
limitations, in a way.
Okay. Because we did not explore the full capabilities of working with geodata.
Okay.
And I think this part can, yeah, this can look very nice in the context of geodata,
but it's not yet explored.
Okay.
I see. So it's another PhD's worth of work?
No, I think that you can do this in one quarter of your PhD, so that would be one year.
Yeah, yeah. Cool, awesome. So you said something a second ago as well about when a
consumer is inspecting a data set and they can obviously then follow through to see other data sets
and the discovery sort of engine within Topia will recommend to them,
will show them things that are similar.
But when I'm looking at inspecting a single data set,
what sort of stats does a consumer get presented with?
Like what's the interface there?
Like, what are the things that I would potentially be interested in? And how do you characterize data assets, I guess, is my question.
All right. We have a lot on that.
So first is the metadata that the providers input when they upload the dataset, things particular to the dataset, such as the format,
language, and so on.
Things that a human can input easily without any problem.
Then we have the automated metadata
which comes from our profiling service.
That's something that we developed.
There's also a paper about it from Athena Research Center.
They developed it; BigDataVoyant, it's called.
And it's a profiler specifically for geodata, geospatial data.
And we provide a lot of statistics around the columns, distributions, also small graphs for each column.
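As an illustration only, here is a toy sketch of the kind of per-column statistics such a profiler might compute, using pandas. This is not the BigDataVoyant code; the dataframe and the chosen statistics are assumptions for the example.

```python
# Toy per-column profile of the sort shown on an asset page; illustrative only.
import pandas as pd

df = pd.DataFrame({
    "station": ["A", "B", "C", "A"],
    "pm25": [12.1, 8.4, 15.0, 11.2],
})

profile = {
    col: {
        "dtype": str(df[col].dtype),
        "nulls": int(df[col].isna().sum()),
        "distinct": int(df[col].nunique()),
        # Numeric columns also get distribution stats for the small charts.
        **(
            {"min": float(df[col].min()),
             "max": float(df[col].max()),
             "mean": float(df[col].mean())}
            if pd.api.types.is_numeric_dtype(df[col]) else {}
        ),
    }
    for col in df.columns
}
print(profile)
```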
Then there's the map section where you can actually visualize the data set and see
which area it belongs to. You have different samples. I think there are four samples for each dataset, so you can actually scroll through it and see the values. And I think that's about it.
Okay, nice. So as a consumer, you get presented with a whole wealth of statistics there to make your decision on whether you want to buy said dataset. That's really cool. I guess you mentioned a second ago, again, the implementation of Topio.
You mentioned that this is expressed in terms of a Jupyter notebook,
and that's kind of how a consumer would interact with the marketplace.
Can you maybe tell us a little bit more about the back end?
Like you said, the assets come in.
Where are the assets stored? In the sense of, is there a database in the background there that's storing the information? Or how is everything sort of hosted? I guess it's all in S3 buckets and then you build on top of that? Yeah, how does it look?
Well, unfortunately I don't have many details about that, because my co-authors and the partners from Greece, from Athena, they handled all these technicalities.
And I know that all the services are deployed in Kubernetes.
That I can tell you.
And it was quite complicated at some point.
Or at least in my opinion.
But, I mean, I know that for the recommender system and for the discovery service, they use a graph database; one of them is Neo4j. They also use Postgres for the metadata, but I don't really know about other technical details.
Okay, cool.
Yeah, indexing the metadata and searching for the assets, that's Elasticsearch-based.
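For a flavor of what a graph-backed discovery query might look like, here is a hedged sketch using the Neo4j Python driver. The node label, relationship type, connection details, and dataset id are all invented for illustration; this is not Topio's actual schema, and it assumes a locally running Neo4j instance.

```python
# Hypothetical discovery query over a graph of assets; schema is invented.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

# Find assets reachable from a given dataset within 3 hops,
# i.e. the transitive-path idea described earlier.
query = """
MATCH (a:Dataset {id: $asset_id})-[:RELATED_TO*1..3]-(b:Dataset)
RETURN DISTINCT b.id AS related
"""

with driver.session() as session:
    for record in session.run(query, asset_id="road-network-nl"):
        print(record["related"])

driver.close()
```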
It's kind of a collection of existing solutions that have been deployed together as one whole marketplace, with a wrapper around it, kind of coordinating all these different services and products and data.
So that's the thing,
because we have multiple components,
which are open source.
You have the GitHub repo, and each component, each library, basically describes what it is using.
So in the end, it was just a matter of putting all the services together. That's why we use Kubernetes, because it's easily plug and play, kind of a microservice architecture.
I see, I see. Yeah, awesome, that's really cool. I guess whoever kind of stitched it all together using Kubernetes had some fun trying to get all that to work.
Yeah, fun... I would not describe it as fun.
Cool. Let's talk a little bit more about the usability of this. It's obviously kind of a big design principle in designing this marketplace, and I know in your ICWE paper you have an initial sort of study on the usability and the performance of the marketplace.
So can you share your initial findings on that, please?
Of course. So we used the beta version to assess the data lifecycle, basically, in the platform.
So we measured the time that users spend on publishing and purchasing, because ultimately this is the most important thing for data market platforms: can you publish your assets, and do you have any problems publishing your assets? And then, once you've found something that you want to buy, can you actually buy the assets, and do you have any difficulties in buying assets? So we evaluated novice and expert suppliers. The positive outcome was that most suppliers actually added more metadata. So this means that they understand the need for sharing metadata and making it as explicit as possible, because in the end this makes it very easy for the user to discover and purchase their assets. Another interesting observation is regarding the pricing. So when the suppliers uploaded the data, they also had the option to create services.
And they spent a lot of time on this process because they didn't know how to price the services.
Pricing their data, that was easy peasy, lemon squeezy, but when it was about services, which is basically a new market activity, they needed more time.
So more consideration was needed to allocate the right price for the service.
And one expected finding was that the consumers actually didn't have any problem buying the
asset.
And we are very happy with this because in the end,
we wanted to go for an e-shop experience.
And people nowadays,
I think they are used to finding an item, putting it in a cart, buying it, done.
That's about it.
Yeah, click, click, click, done, right?
Yeah, it's great.
It's deadly, the amount of clothes I buy online because of that; they make it too easy to buy things, right? But anyway, that's really cool. I mean, the findings are obviously very positive and kind of motivate you to continue this line of research. But can you give us some sort of idea in terms of the
sort of volume of data that sort of flows through the marketplace. Do you have any sort of numbers on that?
And maybe how much money, I guess, transfers through Topio at any point?
I don't have this data point.
I know that for the time when I had a demo,
there were hundreds of datasets, some open source, some private.
But in terms of transactions, I don't have these numbers.
Okay, no problem. I guess, though, there must be quite a bit of activity in there for you to get feedback from producers and consumers, right? So I guess that's a good indicator that it's useful, right, if people are using it. So that's really cool. Yeah, so that obviously paints Topio in a very positive light and shows that it's kind of going in the right direction. But are there any sort of things that are probably suboptimal with Topio at the moment? Or what are the general limitations of the marketplace today?
We definitely have limitations. So one thing that I can think of is the discovery service, because I see that it's performing a bit slowly. So when we have even more data or consumers, it will become quite slow. So we need more optimizations in that part. Definitely more research on data versioning,
provenance, watermarking,
even segmentations of the assets, right?
To create different smaller views,
but based on the geo coordinates.
One piece of feedback we actually got at EDBT, and it was a very interesting point that I hadn't thought about before: say you, as a user, have a dataset. And the scenario would be that you can use the marketplace to find
related assets to the one that you have. So that would mean
that you can upload your assets without actually becoming
a consumer. Because for now, we have this workflow where, if you want to do anything in the platform: to sell data, you have to become a provider; to use the notebooks and so on, you have to become a consumer. It's not difficult to do this, but there's an extra step, right, that
we didn't take into account in the beginning. So I would say this is a limitation.
This is a nice segue into my next question: how do we go about addressing these limitations?
And where do we go next in general with Topio? It seems like it's a very big project, right?
There's a lot of different people involved. So I guess what's the big picture sort of view?
And then we can maybe go into your specific next steps.
Well, for now that we made the beta version,
there are still some services in the alpha phase
that are not yet deployed in the platform.
For example, the recommender system is still not deployed because we were waiting for more
users, more data.
With the recommender system, you need activity in the platform in order to actually build
something that is working to your advantage.
But for now, we are taking a break
because it's been a very intense effort.
And yeah, as I said, I'm towards my last year.
So I have other things to worry about as well.
Yeah, that's from my side.
I don't know what my partners have planned
because, as I said, we are in a bit of a holiday mood now.
Shutting down for summer, right?
Yeah.
Awesome.
So there's no sort of plan, initially or in the immediate future, to integrate ChatGPT into the marketplace, right?
Well, this is so new that we, I mean, we couldn't foresee it, right?
Oh yeah, I would enjoy it.
That's cool. Awesome.
Maybe ChatGPT will become
a data market platform.
Maybe so, right? Hey chat, can you
find a data set for me to
buy? You can see it happening, right?
As a sort of software developer, as a kind of data engineer,
how can I go about leveraging the things in your research and Topio?
I guess the answer to that is I should just go and use it, right?
And have a play around with it.
But yeah, I guess bigger question, what impact do you think it can have?
So aside from actually using the platform, I think the impact that we have is that we open-sourced the libraries. So you can have a look at them, you can use the services that we developed. I can think of, for example, the profiler; you can use that for other purposes, or just have a look at how we implemented the entire library.
Maybe there's something there that is
useful for your work, your research.
I can think of also data discovery, augmentation.
There are a lot of components there that can
be used for other scenarios.
So I think the goal was to show that it's possible to have small, well, relatively small components that you can combine to have this instance of an open-source platform.
Amazing.
Just a random thought that popped into my mind whilst we were talking there: so, the producers of the data here, and maybe you might not be aware of this information, but is it sort of individual-level, like me selling my personal data? Or is it more sort of business-level, people saying, hey, we've got this customer dataset that we want to sell? Like, what's the granularity of the datasets there?
I think there's no limits.
Okay.
But thinking about geo data especially, that can be business level
because we're thinking now about personal data or customer data,
but with geo data, you actually have data about your surroundings.
And other companies use this data to actually do research on where to place the next store, for example, whether the infrastructure is good, and so on. So looking a bit further than our own data, our customers' data,
there is a lot of data out there that can be just used for other purposes.
And it's not personal.
It's just about our surroundings.
Yeah, I was just thinking that maybe I could sell my Google Maps history.
And then if I was feeling a bit, needed some, I don't know, some beer money or something, I could say I'll sell all of my tweet data and the locations where I sent these tweets from. So, I don't know. Yeah, I don't think anyone would pay much for it, maybe.
I'm not sure if you're allowed to do that, by the way. Have you checked the terms and conditions?
Actually, good point. The GDPR will protect me, and I'm allowed to have the right to my own data.
I don't know, actually, that's a very interesting question.
Well, you can have your data,
but I'm not sure if you are allowed to sell it,
so to profit from it.
Interesting. That's interesting.
Anyway, yes.
Right.
Where were we?
Yes, so my next question is,
across the time sort of working on this project,
what's maybe the most interesting lesson
that you've learned while working?
I think, so because it's such a big project,
I think the number one lesson would be: plan ahead and double the time that you think it takes, because more than once we rushed to ship something because we underestimated, or because of coding, you know. I mean, I think it works, but it doesn't, and then I have a bug and I'm stuck for a week on a bug, or something like this. And there's the usual saying: it works and I don't know why; it doesn't work and I don't know why. So in software development, there's a lot of uncertainty.
I know you cannot plan for it, but at least take it into account. And besides the time management,
there's also a lot of people management.
Right.
Because you have to communicate your requirements,
your needs, your expectations,
so that others can help you, right?
Because in the end, it's about being a team.
Yeah, yeah, it's funny.
The estimation for software, like how long it'll take to deliver this thing, is basically like a guessing game, right? Because I can say, yeah, I'll have this done in a week or two, and I guarantee you it'll be at least a month. Basically, the rule of thumb is: whatever I say, double it, and that's the minimum, right? And that's even with a good tailwind; it'll probably be longer. But anyway, yeah, that's a really interesting point you raised there. So my next question is, from sort of the initial conception of the idea,
obviously I don't know at what point in the project's lifecycle you joined, but maybe up until the ICWE paper, were there things that you tried along the way that failed that the listener would find interesting?
Oh, definitely, definitely.
So for the data discovery service, for example,
I think we changed the data representation model three times
until we were confident that this is the right way.
And that took a lot of work.
And I also think that at some point we just circled around. But yeah, I mean, that's research, right? You have to try all the options until you're sure that it's the right one. It's very annoying, very time-consuming.
I guess there's a lot of wasted implementation effort there, changing the data representation. Like, you did it once, and then it was like, you want to do it again? Really? And then over and over again.
Yeah, I'm not sure if it's wasted, but it's definitely not something that you want.
Yes, yeah, I agree. It's not wasted, right, because it was part of the journey to get to the end goal. So it was a good thing that you tried it and it didn't work, but I guess at times that can be frustrating, right?
Yes, yes.
Right, so, yes. Obviously most of your research has been within the Topio project. Is there any other research
you've done over the course of your PhD
that the listener would find interesting?
Yes, so I mentioned data discovery multiple times
because this is actually my main focus.
So my research is in DBML, I would say, so databases for machine learning.
I'm working on data augmentation,
on feature discovery to improve machine learning performance.
I'm currently working on my next submission.
I don't want to give too many details,
just that my research is in data augmentation.
Stay tuned.
Maybe we'll do another podcast about it.
You'll have to do another one when it comes out, for sure. Yeah, great stuff. So yeah, kind of moving on from that then: now, this is one of my favorite questions, I love asking this question and seeing what answers I get to it from different people, because it's always different. All right, how do you go about generating ideas and then selecting what to work on?
So what's your creative process?
Oh, OK.
Now I have to go back to the beginning. So I started from something that I knew, and then I did a lot of reading. And I come back with tens of ideas from conferences, but only lately, because I'm the pandemic generation, so I didn't attend anything in my first years.
But because I have too many ideas,
I'm lucky to have my supervisors who kind of tone me down
and make sure that I'm on the right path.
Because every time I want to just go and start
working on a different project
which is shinier and nicer and so on.
So I think I'm very lucky to have in total three supervisors, including my promotor,
and they help tremendously with feedback.
I think I also circle around an idea a lot until I start the development, so the coding,
for example.
But it's still progress.
I mean, it's still movement.
And I also think it's very important to like what you're doing.
Because you can jump on an idea and start working on it, and everybody loves it, but
you don't like it, and then you're going to have a horrible time, right? I think what's very important is to like what you're doing and to have the right team to support you, on a technical level and also on an emotional level. Also, a good mentor helps. So besides your supervision team, a mentor is very good, because he or she can look at the problem from a different perspective, since they're not actually involved in the problem.
Yeah. So how do you go about sourcing a mentor then?
Or was it something that the university, that Delft, has in place to match you with a mentor, or was it something you sought out yourself?
I know that the university has some programs to match you with a mentor, especially in the beginning when you start. But yeah, because of the pandemic, I think I was not matched with a mentor, that I'm sure of. So I found my mentor myself.
Yeah.
I mean, work events, conferences or other events in the country,
the local events.
And, yeah, we just became friends, and that was about it.
Yeah, I think that's a fantastic answer to that question.
I mean, having a mentor is great, right?
And if you find someone, if you find a good mentor, it's invaluable, right?
I think I totally agree with that.
That's a great answer to that question.
We've just got two more questions now.
The first is, what do you think is the biggest challenge in data management research now?
I'm smiling and laughing at the same time.
The biggest challenge, I would say, is reproducibility.
People are reluctant to share resources, data, especially code.
And I think this hinders the progress.
I mean, you know, why should I reinvent the wheel when the wheel is there?
Well, because I don't have access to the wheel,
so I have to reinvent it.
So then there's no way to actually advance it
because we'll just keep on doing the same thing.
You know, there are efforts to improve this, and other communities are doing much better from this perspective. Maybe they have other issues as well, but reproducibility, I think it's that, or sharing the resources, because people have different definitions of reproducibility.
But I think this is one of the biggest challenges in data management.
Yeah, for sure.
And that is something we should as a community strive more for, right?
Totally agree with that, for sure.
And also about data, because many times people, and by people I mean reviewers, want to see experiments on real data, but nobody's sharing anything. How can we actually work on real data?
Yeah, you'd need partnerships, right, with industry, and then you can't share anything, because the businesses don't want to share anything. So it's like a vicious circle. So yeah, we definitely need to be more open, more reproducibility. Completely agree with you on that one, Andra.
Cool, yeah. So, last word now; it's the last question, and it's: what's the one thing
you want the listeners to take away from this podcast episode today?
Well, definitely have a look at Topio.
It's beta.topio.market.
I think I forgot already.
We'll put a link to it in the show notes.
Don't worry about it.
Yeah, we'll put it on all the socials so the listener can go and find it and play around with it. But I think on a general level, let's say, it's very important to be aware that
it's a joint group effort to create something big and impactful. So for the PhD students,
no matter how lonely your PhD trajectory is,
just find your team, attend whatever, everything, anything to find your team.
It will make a very big difference.
And with this opportunity, I'd like to thank my students, my research engineer,
collaborators, industry partners, mentors, supervisors.
You see, it actually takes a village.
That's one takeaway.
It takes a village to graduate.
That is a great message to end it on.
That is awesome.
Well, great.
Yeah, so let's wrap it up there.
Thanks so much, Andra, for coming on the show.
It's been great to talk to you.
And if the listeners are interested in learning more about Andra's work, we'll put links to everything, Topio included, in the show notes.
And if you enjoy listening to the show,
please consider supporting the podcast through buying me a coffee.
It really helps cover all of our hosting costs, et cetera.
So, yeah, please do that if you enjoy the show.
And we'll see you all next time for some more awesome computer science research.