PurePerformance - How CERN analyzed 1 PetaByte per second using K8s with Ricardo Rocha
Episode Date: March 3, 2025

One PetaByte is the equivalent of 11000 4K movies, and CERN's Large Hadron Collider (LHC) generates this every single second. Only a fraction of this data (~1 GB/s) is stored and analyzed using a multicluster batch job dispatcher with Kueue running on Kubernetes. In this episode we have Ricardo Rocha, Platform Engineering Lead at CERN and CNCF Advocate, explaining why after 20 years at CERN he is still excited about the work he and his colleagues are doing. To kick things off we learn about the impact that the CNCF has on the scientific community, and how to best balance an implementation of that scale between "ease of use" and "optimized for throughput". Tune in and learn about custom hardware built 20 years ago and how the advent of the latest chip generation has impacted the evolution of data scientists around the globe.

Links we discussed
Ricardo's LinkedIn: https://www.linkedin.com/in/ricardo-rocha-739aa718/
KubeCon SLC Keynote: https://www.youtube.com/watch?v=xMmskWIlktA&list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&index=5
Kueue CNCF Project: https://kubernetes.io/blog/2022/10/04/introducing-kueue/
Transcript
It's time for Pure Performance!
Get your stopwatches ready! It's time for Pure Performance with Andy Grabner and Brian Wilson.
Welcome everyone to another Pure Performance episode. This is one of the rare moments when you tune in and you don't hear the voice from Brian
Wilson but you hear the voice of Andy Grabner, which means Brian is not there with us today.
But nevertheless, even though I miss him dearly, I have a great guest today that I was fortunate enough to
bump into at the recent KubeCon in Salt Lake City. And without further ado,
Ricardo Rocha, I hope I kind of got the name right, the pronunciation. Welcome so
much to the show and thanks for having time with us.
Yeah, it's a pleasure. Thanks for the invitation.
Hey Ricardo, when I saw you on stage at KubeCon, you had a talk, a keynote called
Multicluster Batch Jobs Dispatching with Kueue. I was fascinated.
First of all, the topic is really cool, but then you also did a live demo on stage.
And that's always very nerve-racking if you know you have like 10,000 people in
the room.
But you did it folks, if you want to watch the recording, the link to the YouTube video
is part of the description of the podcast.
Before Ricardo, I want to go into some of the Kubernetes topics, the performance topics
and the solutions that you've built for CERN.
I would like to learn a little bit more about you.
I looked in your LinkedIn profile. You spent overall about 20 years at CERN, is this right?
This is right. Yeah, it's been around 20 years. I came as a student originally at CERN, so it's been quite a ride. I had the pleasure to change roles quite a bit, but yeah, it's been a while now.
And so I think you had two times almost 10 years.
I mean, you're still at CERN.
What did you do in the middle?
Yeah, so I came to CERN and at some point I thought I would like to experiment a bit
with something different, especially in the industry. And I was kind of fascinated since I was a child with remote places.
And so I moved to New Zealand.
And while I was there, I helped build the first public cloud provider in New Zealand.
That was really a big challenge, a great opportunity and a huge pleasure actually.
I had a great time there.
But eventually I came back to stay closer to my family as well.
Yeah.
Well, I think building a public cloud provider is not something that many can put on their
resume.
So that's quite phenomenal.
What type of service is this? I guess this was about 10-15 years back?
Yeah. So it was early days and it was open source based. So the company is called Catalyst
New Zealand and they wanted to experiment with something different. So they gave us
some freedom to experiment with launching such a service.
It was the first API-based cloud in New Zealand. It was based on open source tools. At the time,
it was OpenStack. We launched it and it was quite popular. It's still there. So if you're
living in New Zealand, it's likely that if you have some cloud services, you're
running some stuff there.
So I still keep in touch and yeah, really an enormous pleasure to work there.
Yeah, cool.
I know we have some listeners from that part of the world, and I also have some friends there. I will make sure that they listen to this once it airs.
I guess open source, that's then also an interesting segue, because besides working for CERN, you're also very active in the CNCF, the Cloud Native Computing Foundation.
Can you tell me a little bit more on what brought you into the CNCF and what you do
there right now?
Yeah, so currently I have a couple of roles in the CNCF. This came out of the work
on containers and Kubernetes. So when I first proposed this internally, I saw that we would need,
it's like, it takes a village. So we would need to engage with the community,
get really our use cases seen,
and collaborate as much as possible.
So I got CERN to join the CNCF, the Cloud Native Computing Foundation, as an end user member.
This introduced me to the community.
I also had been invited for a couple of talks at KubeCons right at the start.
We had some cool use cases, so people are always willing to hear about them.
And in the end, this gradually grew
into becoming more involved, especially
in the technical oversight committee, which
I'm still part of.
And also, I helped build the new end user technical advisory
board, which I'm also part of.
And this is really where we stand.
We are very engaged, but we are there representing end users
and helping as much as we can.
And maybe as I have you on the line on this topic,
can you explain, because some people are confused,
what does end user mean in the CNCF?
Can you briefly explain to me what does qualify,
what is an end user qualified for,
or what does qualify an end user? That's the right way to phrase it. Absolutely, that's a very good question. So this is also
something we struggle with in the technical oversight committee because, well, the main role of the
technical oversight committee is to oversee the maturity of the projects. So when projects
join the CNCF, they are reviewed and when they apply to graduate between the different
maturity levels, which is sandbox, incubation, graduation, they also have to go through some
due diligence, but in particular they have to go through what we used to call end user interviews, and we now call adopter interviews. And this is where end users have a huge impact in the community, which is we will interview
not the vendors, not the project maintainers, but we'll interview the end users, the actual
people using the project and they will explain to us in which level of usage they are.
Is it production, pre-production, just experimentation? And what's their experience with interacting
with the project and the community
around the project?
And this is really, I would say, this is also
kind of my view of the whole community.
But I think this is the core of the CNCF
and the Cloud Native community is the end users.
We are the ones that will eventually
say if the projects are successful or not successful
by adopting them.
So an end user in the CNCF is someone
that is an adopter of the project,
but does not have anything to sell
or doesn't necessarily have a huge engagement in the project,
apart from contributing feedback and testing and improving the project, just by providing feedback as a user.
Yeah, cool. Yeah, I remember that years ago we launched the open source CNCF project Keptn and brought it from Sandbox to Incubation. We also launched OpenFeature a couple of years ago. And I think, as you said, as you mature a project in the CNCF you need to prove not only that the project is stable but that it's adopted. And I think that's exactly what I find great. And also, when you look at the
CNCF landscape there's obviously a lot of sandbox projects,
but once you make it over the next hurdle,
you really know there is a community behind it,
adopters behind it, and it's actively being developed.
So that's great to know.
Exactly, so there's the part of helping the projects
in this journey towards maturity, towards graduation
and helping them build a community and have all the best practices that are needed
for a sustainable project.
And then there's the end user part, which is we provide some sort of certification or
some sort of like badge towards the maturity of a project, which simplifies the work of
an end user when they're choosing the best project for a task.
They get some assurance from this kind of diligence that we do in the TOC towards the
sustainability of projects.
And this is all together, like we work together.
It's not like challenging projects,
it's really helping them out,
building this kind of maturity.
Well, thank you so much for giving this brief explanation.
I wanted to bring it up because I know
the question often comes up,
what is an end user?
And thanks for the detailed explanation.
Now I wanna go back to your line of work and also what brought you to speak at KubeCon last year. What I learned, and I didn't know, is that you are analyzing all of this data,
at least the way I got it from the talk, using Kubernetes as the core platform.
And then you're using Kueue as a way to schedule the jobs because obviously GPUs are very expensive. And then you had a very nice talk where you are explaining how you can best use Kueue to then
schedule all those jobs so that they most efficiently
leverage the underlying resources that are available.
Now before I go into those details, I first would like to ask a couple of questions about how much data we are talking about. If you can give any numbers: what data gets collected on a regular basis?
Yeah, this is one of my favorite topics.
I'm glad to go through it.
So, as you mentioned, the main experiment right now at CERN
is called the Large Hadron Collider, which is a large particle accelerator, the biggest scientific experiment ever built, actually: a 27 kilometer perimeter particle accelerator which is 100 meters underground. And what we do is we accelerate protons to very close to the speed of light
and we make them collide in these special places where we've built very large detectors that act
as sort of like gigantic cameras that will look into what happened in these collisions. Of course these are not like traditional cameras; there are several layers in these detectors.
We collect a lot of data and we try then to store and analyze it. But we actually collect something like one petabyte of data per second on each of these experiments, which is not something we can... Exactly, one petabyte of data per second, okay? Yeah. So this is not something we can store and analyze with current technology. So what we do
is we have these filters that are very close to the detectors. And these filters will filter
the majority of the data very quickly, on the nanosecond, and then we will actually store something like 10 gigabytes per second per experiment that then we have to re-process and analyze. Still this comes to around 100 petabytes
of data every year which we need to store and analyze. So it's a lot of storage, it's a lot of computing capacity to handle all of this, but the main challenge
we have at CERN, and it's not new, it's not new with Kubernetes, is that we always have
to do more with the same budget. So what we're talking about is the experiments will always
push more. So for example, we are doing an upgrade in a couple of years
that's called High Luminosity LHC.
And what this is translating to is 10 times more data.
But we have to analyze, store and analyze
10 times more data with the same budget.
And this is why we are constantly searching
for new technologies, new paradigms
that will allow us to do a lot more with the same resources.
And this is where Kubernetes, Kueue, and all these tools, GPUs, come into the picture.
This is what we are looking into to be able to handle these levels of data in just a couple of years.
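To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python, using only the figures Ricardo mentions (roughly 1 PB/s produced per experiment, roughly 10 GB/s kept after the online filters, and around 100 PB stored per year). The exact rates vary per experiment and per run, so treat this as an illustration rather than official CERN accounting.

```python
# Rough, illustrative arithmetic based only on the figures mentioned in the episode.
PB = 10**15  # petabyte in bytes (decimal)
GB = 10**9   # gigabyte in bytes (decimal)

raw_rate = 1 * PB      # ~1 PB/s produced per experiment at the detectors
stored_rate = 10 * GB  # ~10 GB/s per experiment kept after the online filters

reduction = raw_rate / stored_rate
print(f"The online filters keep roughly 1 byte in {reduction:,.0f}")
# -> about a 100,000x reduction before anything reaches permanent storage

yearly = 100 * PB      # ~100 PB per year is stored and analyzed offline
print(f"Stored per year: about {yearly / PB:.0f} PB")
```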
Hey, I got maybe a little controversial question first, and then I want to go back
to the way you collect the data.
With every layer in every system that you add, you're obviously always losing throughput or efficiency, right? And with every layer, I mean, if you think about it: we have our physical hardware, we have our virtualized operating system, we have layer over layer.
Is Kubernetes the right choice then? Because you have so many layers
and you need to squeeze out the most out of the hardware.
So I think it's, I guess it's always the balance
between convenience and also like efficiency.
So I'm just curious, have you ever thought about implementing it closer to the hardware, rather than having the orchestration layer in the middle?
This is a brilliant question, and I can explain how we do things today and what we are trying to do, and I think it will answer at least part of your question. So up to now, this very quick, nanosecond-level filtering that we are doing for one petabyte per second is actually done with custom hardware. These are electronics we built 20 years ago
that will filter this immediately with very low latency.
And then we will have what we call level one filters
or high level filters that come right
after that will be CPU based.
And this is a CPU farm that is very close to the detector that will then still already
be able to reconstruct some of the events and do event selection a bit slower, but still
on the microsecond or millisecond latency.
Now, what is happening is that technology has evolved to a point where we can actually consider replacing
definitely the CPU farm, but also some parts of the custom electronics part
with things like FPGAs, with GPUs, and this is what we are looking at.
And what you mentioned about convenience is exactly what's happening. What was custom hardware, custom software,
custom deployments in these special farms close to detectors is actually
being replaced with very large Kubernetes clusters for the next round
of upgrades. And in there we are planning to move a large fraction of the event selection to GPUs, in some cases to machine learning and model inference, that are served and managed by Kubernetes because this gives us all the convenience, all the monitoring, all the nice things we know about Kubernetes.
And we got to a point in technology where this is possible.
So this is something that is happening now.
For example, one of the very large experiments, called ATLAS, is already changing in the next run,
we call it, coming in a couple of years, to replacing custom electronics in some parts and then their custom deployment for the
farm with one single Kubernetes cluster with around 5,000 nodes. Yeah, thanks for that. So
that's really interesting. So the progression that you're making goes from the custom hardware, level 0, to CPU-based. And as you said, right, the custom hardware was built 20 years ago?
Something like that, a lot of it, yes. There have been upgrades in between, but the initial design for this hardware was around 20 years ago. It came into operation around 15 years ago.
So yeah.
Yeah, obviously, you know, a lot of things have improved
and changed on the hardware side.
So it makes a lot of sense.
Coming back then, even though you explained it a little bit now: the filtering, and what I call the data pipeline, right? For me, a data pipeline is where the data originates until it ends up where you can then actually store and analyze it. You mentioned that at level zero you're filtering from petabytes to gigabytes per second.
How do you make the decision?
Do you really filter?
Do you aggregate? Do you
have a fear of losing relevant data? I mean, that must be a challenging decision to make.
Yeah, so this is really on the side of the physicist and it's very specific to each detector.
So I could bring a physicist to explain this much better. But in a very quick summary: the majority, the large majority, is really noise. We know it's not useful.
The remaining things from the first level filtering are the ones that then we need to
do some sort of very quick event reconstruction to do selection on the interesting events.
But this is already a very small fraction of the original data.
And this is where we can optimize the most. Of course, some physicists would say that if we could store everything, it would be great, because this is where you might find unknown physics, new things. The rest is kind of tuned to things that we know we want to look for.
But still, the large amount is noise.
But the way it works is that we actually tune the knobs, like we turn the knobs so that
we store based on the budget, on the computing and storage we have.
So if we had 10 times more computing, we would probably turn the knobs a little bit to be less strict. But this is
how we do it. So if we manage to come up with new paradigms, new ways of doing computing,
then the physicists will start having ideas of course.
Yeah, I was going to add, so just to complete, so this is what we call traditionally online
computing, which is the stuff that is really close to the detectors and the data coming
out.
Then we store that and that's what we call raw data.
And this is where we start what we usually call offline computing, which is the reconstruction
of the events from the raw data to see what
actually happened there, and then all the different steps to come up with what we call
analysis objects, which is what the physicists are actually looking at.
And this is where it's more this kind of very high throughput computing, with very large batch farms that are at CERN and in many centers around the world, as well as public clouds, HPC farms, everything we can get hold
of we try to use for this kind of offline high throughput computing.
And this is also then, hopefully I got this right, where you're now leveraging projects like Kueue to schedule those jobs and find the right hosts, nodes, resources that have the right specs, so that you can then run your batch jobs in a fast way.
Yeah, so the big advantage or thing that we look into when we look at Kubernetes is that it became kind of a commodity, something that everyone is sort of exposing.
And this simplifies the access to resources quite a bit.
So around 20 years ago, and this is actually my first project when I came to CERN as a student, we built this grid computing infrastructure, which is a sort of middleware that abstracts different infrastructures around the world, with some interfaces that were sort of common. That would allow us to make use of around 200 centers around the world and expand our computing
capacity from something like 400,000 or half a million CPUs to a million CPUs. This is what we
have been relying on for the last 20 years. Now, all this middleware was built on the pre-cloud era.
We had to actually write all the software ourselves because no one or very few people had big data 20 years ago.
But as it became sort of a commodity,
we have tools like Kubernetes and all the ecosystem around it.
So what we do, what we try to do is simplify this stack
or even replace it completely in some cases
with something that is more standard,
like exposing a Kubernetes API.
So a lot of these sites now have replaced all their stack
with a Kubernetes endpoint,
which means we integrated that into our infrastructure.
But it also means that integrating a hyperscaler, for example, as long as we have a Kubernetes endpoint, becomes much easier; we can integrate those resources into our existing tools.
The motivation is really to get access to things we don't necessarily have on premises.
And the demo I did was especially focusing on GPUs, things like AI ML,
where we don't have necessarily a lot of resources on premises right now.
But we would love to have access to more GPUs, to
specialized accelerators like TPUs or other sorts of specialized hardware
accelerators. And this is where we are exploring the flexibility that Kubernetes
and tools like Kueue offer. For many years we struggled a bit with Kubernetes because it was really designed initially for IT services. So we advocated a lot for scientific computing: the need for batch primitives, things like advanced scheduling, co-scheduling, queuing, priorities. It took a while, but with GenAI, suddenly everyone wants it. So we got a huge investment and we are really benefiting from that.
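To make that a bit more concrete, here is a minimal sketch of what submitting a batch job to a Kueue-managed queue can look like, using the Kubernetes Python client. The namespace, queue name, image, and resource requests are hypothetical placeholders, not CERN's actual setup; the essential parts are the kueue.x-k8s.io/queue-name label and the suspend flag, which let Kueue decide when and where to admit the job based on available quota.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl access to a cluster with Kueue installed

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(
        name="sample-analysis",
        # Hypothetical LocalQueue name; Kueue picks the job up via this label.
        labels={"kueue.x-k8s.io/queue-name": "gpu-queue"},
    ),
    spec=client.V1JobSpec(
        suspend=True,  # created suspended; Kueue unsuspends it once admitted
        completions=1,
        parallelism=1,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        image="registry.example.com/analysis:latest",  # placeholder image
                        command=["python", "run_analysis.py"],          # placeholder command
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "8Gi"},
                            limits={"nvidia.com/gpu": "1"},  # one GPU per worker
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

This assumes a ClusterQueue with matching quota and a LocalQueue named gpu-queue already exist; without them, the job simply stays suspended until Kueue can admit it.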
How do you solve this? So now with Kueue, you're distributing your workloads, right? You're distributing your jobs, but how can you make sure that these jobs have the data local to
them?
Because I think that's the big problem, right?
How do you get the right data?
Are you already, when you're collecting the data, are you then already distributing it
to all these different data centers?
Because in the end, data residency is a big challenge, because otherwise you need to constantly pull the data from some remote location in the world, I guess.
Yeah, so this is one of the benefits
we have from having built this computing grid infrastructure is that we already
had a distributed infrastructure of something like 200 different centers
where we had to deal with these issues. So we built the services that allow us to distribute the data to where the computing capacity is
or might be. So we have these data management systems that allow us to have some sort of subscription system for datasets, where we can define subscriptions for the type of data that should go to the different computing centers, and we do this in advance.
This simplifies the workload scheduling because in many cases we already have the data where
we want to send workloads.
In some cases we don't and you will have to pull and wait a bit. We do have also an advantage, which is we have a pretty extensive network infrastructure
with 100 gigabits, or in some cases,
multi-hundred gigabit connections
between these centers.
And we've extended to public cloud providers as well.
So we have dedicated links to those regions
that we depend on.
So you will feel the latency, but in terms of network capacity and so on, we are actually quite good.
Yeah, yeah, it sounds like you won't have the problem of starting a Netflix movie and having to wait a long time for a data set. It should be okay. Well, do you have SLOs somehow defined on how fast your team needs to kind of provide
this compute to come up with results?
Is there any type of SLO concept like service level objective or whatever KPIs that you
have to say, hey, we need to ensure that this type of data, this amount of data is processed in this amount of time,
because otherwise we will not be able to analyze all this data that comes up in a year.
Yeah, yes, but they are very predictable. So all this is well known in advance. Each experiment has what they call the technical design report, which kind of predicts what
will be the data rates for the next run, which usually will last a couple of years.
So we know very well what data will be produced and the computing capacity required.
Of course, we have some safeguards, some buffers in case of disasters, so that we don't lose the data.
So there are some buffers close to the experiments that
will basically cover for some major issues in the data
centers.
But they are pretty large.
And so we don't really have to deal with this very detailed, up-to-the-minute or up-to-the-second availability.
We know for the large majority, we know how much capacity we need.
Where things get interesting is more for the interactive analysis, for the kind of chaotic
analysis from the physicists, and especially now that everyone got interested or more interested
even in machine learning. There it's a little bit more chaotic and this is where we try
to use as much as possible on-demand capacity and opportunistic capacity because we don't want to procure for peak
capacity because that would be too expensive.
We want to procure for what we know is the nominal capacity we need, and that is predictable,
and then complement that with opportunistic resources and this is where this kind of more
flexible infrastructures come to play.
And then on this ad hoc analysis where data scientists come in, I guess this is also where the convenience of Kubernetes comes in again, because I guess they can write their, and again, I'm not at all informed, so I have no clue how they write their algorithms, but they basically package it up in a container and then they send it over into your queue, and then you take it and deploy it, and then in the end give them the result back.
Is that kind of like the high level view of how this works?
So this has been the case even pre-Kubernetes. Physicists are quite IT knowledgeable. They got used to working with very low-level interfaces, submitting to HPC centers, which in particular has never been super easy. So they know what they're doing. But basically that's it: they wrap their job in some sort of definition that is submitted. The interesting part here is that actually, for the software distribution,
traditionally we actually built our own internal systems of hierarchical caches for efficient software distribution.
And we have a system called CernVM-FS, or CVMFS, which is basically a hierarchy of caches.
And this is the way that we still distribute software today.
It is a read-only file system.
So in some sense, it's kind of similar to containerization,
but we don't have this kind of notion with that system of a single unit of a container
that can be easily shared and packaged.
This is where containers brought some benefits to our ecosystem as well: this notion of reproducibility became clearer. So we are actually in a kind of complementary world where people use containers,
but they will grab some additional software from the default releases from the experiments from this CVMFS system.
So it's a very interesting world, but there are cases where the full software is packaged in containers. And this
is a challenge because these containers are not necessarily how you would expect them
to be. So we have containers with several tens of gigabytes. And yeah, this has been
other parts of the work we've been doing to improve things in the ecosystem for these
workloads.
So we talked about kind of your early days. You started 20 years ago, or you've had 20 years, you know, at CERN in total.
If you look a little bit into the future, if you think about KubeCon in let's say 2030
in five years from now, what do you think will
be something you will be talking about?
What will have changed?
What is your goal of having things optimized?
Yeah, so that's a question we keep asking ourselves also in the technical oversight
committee.
This is always something that we get asked about.
What's next for cloud native?
I think right now the main effort is to make AI
work properly and make cloud native
the best place to run AI.
And there are reasons for that.
The main reason internally,
and I think for other end users as well, is that we made such
a big investment into having an infrastructure that scales and that works well with this
kind of ecosystem.
Doing something totally different for AI would be too expensive, too much of a change, and
there's no real push for that. So it's actually the last
year or year and a half has been all about ensuring that these tools are there.
So I would say that in five years the challenge will be to adapt and that's my
view, it's not CERN's view necessarily, but I think the challenge will be to adapt to the new
types of hardware that we are seeing. We lived in a pretty calm world of CPU expansion with Moore's
law and very clear technology evolution. We had a big change with the advent of GPUs and sort of paradigm shift on how we develop software
and how the software works
and how the machine learning platforms
are using this kind of hardware.
I think what I see is this becoming quite big.
Like if we look at the trend of the size of the GPUs
and the capacity of these GPUs,
the needs for very low latency between the different nodes that host these GPUs.
It's kind of funny, because before I joined CERN we went from very large mainframes, which is how scientific computing was done, to sort of commodity computing and scaling out to multiple nodes.
And we, I don't know, I kind of see a trend
of putting things back together
with very low latency interconnects
and things that do even lower latency than InfiniBand
with things like NVSwitch from Nvidia.
And they start offering these very large hardware accelerators, or, I would almost call them mainframes, where people will actually get slices of compute in these very large computing devices. And this is a challenge, I think mostly for the data centers, because we learned how to manage individual nodes at scale, with a very large number of nodes.
We'll need to learn again how to host mainframe-like computers.
But it's also a challenge for the software because we will stop thinking about nodes quite a bit with Kubernetes
in terms of workloads, but we will need to learn how to share very large pieces of hardware
between users efficiently. So I think that will be the main challenge.
But I would assume this should not be the concern of the developer that builds the workload.
It will be the concern of whoever builds that orchestration layer, right? To make
sure that, like, coming back to the Kueue project, right, you have like 10,000 jobs that need to be executed and then Kueue figures out where to best place them, and whether this is now 10,000
nodes around the world or slices in one big mainframe, like modern mainframe, it should
not concern whoever submits the job.
I think you're right.
So this is one of the advantages of Kubernetes and the separation of concerns that we got
from the declarative APIs of defining the workloads and the actual orchestration
and execution of those workloads.
But we see things like there is a feature
that I think I also mentioned or Marcin mentioned
during the keynote, which is called DRA, Dynamic Resource
Allocation.
This is, I think, a key feature that
is being developed right now, and it will mature
during this year.
But this will be key to actually expose these resources in a very flexible way
and even allow the orchestration and the scheduler to reshape the resources dynamically.
Things like partitioning GPUs, which right now is kind of a manual process that has to be done in advance, have to become more dynamic. The system has to become smart enough to optimize the whole infrastructure, partition it if needed, and assign slices as needed for each workload. I think a lot of it will be in this kind of optimization of resource usage, but hopefully this will have little impact on the upper layers and the end user workloads.
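As an illustration of the "manual, in advance" partitioning Ricardo describes: today an operator typically pre-slices a GPU (for example with NVIDIA MIG) and the device plugin advertises each slice as an extended resource that a pod requests by name. The sketch below, again using the Kubernetes Python client, is hypothetical; the exact resource name (here nvidia.com/mig-1g.5gb, a commonly documented MIG profile) depends entirely on how the cluster's GPUs were partitioned and how the device plugin is configured. The promise of DRA is to replace this kind of fixed, pre-declared slicing with claims that the scheduler can satisfy and reshape dynamically.

```python
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="inference-worker"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="registry.example.com/inference:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # Request one pre-created MIG slice instead of a whole GPU.
                    # The slice must already exist: partitioning was done manually,
                    # in advance, by whoever configured the node.
                    limits={"nvidia.com/mig-1g.5gb": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```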
Ricardo, did I miss anything? Is there something in the end where you say, hey, I would have wished Andy would have asked me about this, or anything you want to say about CERN, about the CNCF, as some closing remarks?
I would say I think I've repeated this in other places but this is something I'm very keen on
repeating as much as possible which is all the community effort and all the ecosystem, every single component that is available in the ecosystem is making a huge impact.
Not only at CERN, definitely at CERN because this is where I see it, but I would say in the whole scientific community.
This is making it way easier for anyone to approach this kind of work without having to have access to very large laboratories or scientific infrastructures.
This is really changing the way we do science, improving things quite a bit.
Even if it's not obvious from the daily work, I always like to express that appreciation for everyone's
contributions. So, whatever you're working on, I hope to meet everyone at KubeCon, to meet everyone in the CNCF, in the different CNCF bodies, and have an opportunity to exchange ideas, but also give my appreciation for everyone's work.
That actually means the next KubeCon is coming up in a couple of weeks. I guess we will see you in London.
Absolutely, looking forward to it. See you everyone there.
Perfect. All right, Ricardo,
thank you so much for the time. I know it's tough to take time out of your busy day and out of
you know not only the work day but also the personal day because we're recording this at
a late hour. Thank you so much for doing this and I'm looking forward to seeing you in London.
Thank you, thank you for the invitation, see you soon.