The Vergecast - Marc Levoy on moving from Google to Adobe and the ethics of computational photography
Episode Date: September 8, 2020
The Verge's Nilay Patel talks with former Google engineer Marc Levoy about his move to Adobe, the state of the smartphone camera, and the future of computational photography.
Transcript
Hey everybody. It's Nilay from the Vergecast. Really fun interview episode this week. Adobe VP and fellow Marc Levoy joins us to talk about smartphone photography, specifically computational photography. You might recognize Marc's name. He was one of the senior scientists and engineers on the Google Pixel camera team that really brought computational photography to the masses, really changed the industry, and changed what we think a smartphone can do when taking a photo.
So computational photography, if you don't know, is when the phone takes a series of images and merges them together to make a final image.
That's how you get those HDR effects.
That's how you get tone mapping.
That's how you get things like refocusing, as on something like the Lytro.
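The core idea of merging a burst can be sketched in a few lines of Python. This is a toy model, not the Pixel's pipeline: it assumes the frames are already perfectly aligned and the scene is static, and the function names are hypothetical. Averaging N frames with independent noise cuts the noise by roughly a factor of √N.

```python
import random
import statistics

def capture_frame(scene, noise_sigma, rng):
    """Simulate one noisy exposure: each pixel is the true value plus Gaussian noise."""
    return [value + rng.gauss(0.0, noise_sigma) for value in scene]

def merge_burst(frames):
    """Average already-aligned frames pixel by pixel (toy assumption: no motion)."""
    return [sum(pixel_values) / len(frames) for pixel_values in zip(*frames)]

rng = random.Random(0)
scene = [100.0] * 10_000  # a flat gray patch, flattened to 1-D
frames = [capture_frame(scene, 10.0, rng) for _ in range(8)]

single = frames[0]
merged = merge_burst(frames)

print(f"single-frame noise: {statistics.pstdev(single):.2f}")
print(f"8-frame merge noise: {statistics.pstdev(merged):.2f}")  # roughly 10 / sqrt(8)
```

Real burst pipelines spend most of their effort on alignment and on rejecting pixels that moved, which this sketch deliberately skips.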
Marc has worked on all of this stuff.
We talked about the jump from Google to Adobe and why he decided to make the switch.
At Adobe, his team is tasked with building a universal camera app that will work across platforms.
We talked about what that might look like, what the next frontier of smartphone photography might be, and how video factors into it.
And of course, if you've ever heard us talk about computational photography on the Vergecast, you know that I asked Marc Levoy what the nature of reality truly is.
Really interesting conversation.
We got all the way into it.
Check it out.
Marc Levoy, you are a VP and fellow at Adobe.
Welcome to the Vergecast.
Thank you for having me.
So, Marc, you and I have talked before at Pixel events.
You were at Google.
I think for our audience, most notably, you worked on the team that developed the Pixel camera and really ushered in computational photography, but you've had a long and varied career.
Now you're at Adobe.
Can you just give people a sense of, I'm looking at your website, it's a long list, but give people a sense of all the kinds of things you've worked on.
Well, I was a professor at Stanford for 25 years.
I've worked on a variety of things.
I worked on cartoon animation, actually, in the 1970s.
I worked on medical imaging, a technique called volume rendering, in the 1980s.
I worked on three-dimensional scanning in the 1990s, culminating in bringing a team of 30 Stanford students with me to Italy to digitize the statues of Michelangelo, a project called the Digital Michelangelo Project. I worked on light field imaging in the 2000s, which means taking many closely spaced images in order to be able to either move around after the fact or refocus after the fact.
And that's led to some commercial companies like Lytro, with a refocusable camera. And then in the 2010s, I've worked on computational photography, a term that I actually coined, or re-coined, or at least popularized in a course I taught at Stanford in 2004.
That's a lot of things.
I want to focus now on photography very specifically.
You made the jump to Adobe.
What are you doing at Adobe?
So Adobe is a unique company.
They are world class in image editing.
And they have more imaging experts than any other company on the planet.
So it's really a unique opportunity.
The initial press announcement reported that I'll be building a universal camera app.
So what does that mean?
So I think of universal in three ways.
First of all, we do want a camera experience that is based on computational photography at the point of capture and that runs on as many platforms as possible.
At Google, the Pixel was just one platform.
Secondly, I'd like it to be universal in the sense that it affects all of Adobe's products.
Adobe has a number of products that do have a camera capture experience as part of them.
They don't yet use computational photography.
And thirdly, I think of universal as trying to appeal to a spectrum of users, all the way from the everyday consumer who just wants a great picture when they press the button, to the creative communicators, and all the way up to the pros. And that's not really a market that I have targeted before, and it's an interesting one. So, universal in all three of those senses.
So I want to talk about the Pixel just a little bit. Obviously the Pixel 2, I think in particular, was a massively disruptive product, and its camera was so much better than everything else.
They've been using kind of the same sensor ever since. I think the leaks say the new camera will use the same sensor, the Sony IMX363, I think. That's one sensor. When you say universal, there's a lot of action in camera sensors beyond just one Sony sensor and one camera. Can you actually get the same look out of all these different sensors? Is that one of the goals? Or is it to enhance every kind of sensor that you might come across?
The mobile sensor industry is fairly mature. It does improve, but the improvements are coming with some diminishing returns over the years. One variable that's of particular interest is the read noise.
As the read noise decreases, you can take pictures in lower and lower light.
And so if Sony or someone else comes up with a sensor that has lower read noise, a lot of people will grab onto it.
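Why read noise matters most in dim light can be seen from a standard simplified noise model. This sketch assumes only photon shot noise (variance equal to the signal) and a fixed per-read noise, ignoring dark current and fixed-pattern noise; the electron counts are illustrative, not measurements of any real sensor.

```python
import math

def snr(signal_electrons, read_noise_electrons):
    """Per-pixel signal-to-noise ratio under a shot-noise-plus-read-noise model."""
    variance = signal_electrons + read_noise_electrons ** 2
    return signal_electrons / math.sqrt(variance)

# Dim to bright scenes, in collected electrons; compare read noise of 3 e- vs 1 e-.
for signal in (10, 100, 10_000):
    print(signal, round(snr(signal, 3.0), 2), round(snr(signal, 1.0), 2))
```

In bright light shot noise dominates and the two sensors are nearly identical; in dim light the lower-read-noise sensor pulls clearly ahead.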
There are other developments that are happening in the sensor world.
There's something called Quad Bayer, which groups the pixels into two-by-two blocks of a single color: a two-by-two block of red, then two-by-two blocks of green, and then blue.
Whether those catch on or not remains to be seen; they have trade-offs.
So there are improvements being made in the sensors, but I'm not sure that they're pivotal.
They're incremental.
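The difference between a conventional Bayer mosaic and the Quad Bayer layout can be shown by printing the color filter over each pixel. This sketch assumes an RGGB tile ordering; sensors use other orderings too, and the function names are hypothetical.

```python
BAYER_TILE = (("R", "G"),
              ("G", "B"))  # the classic RGGB 2x2 tile

def bayer_color(row, col):
    """Color filter over pixel (row, col) on a standard RGGB Bayer sensor."""
    return BAYER_TILE[row % 2][col % 2]

def quad_bayer_color(row, col):
    """Quad Bayer: each Bayer cell is expanded into a 2x2 block of one color."""
    return bayer_color(row // 2, col // 2)

def show(pattern, size=4):
    for r in range(size):
        print(" ".join(pattern(r, c) for c in range(size)))

show(bayer_color)       # R G R G / G B G B / R G R G / G B G B
print()
show(quad_bayer_color)  # R R G G / R R G G / G G B B / G G B B
```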
Yeah, and I wanted to start there with the Pixel and the conversation about sensors because the thing the Pixel showed everybody is, oh, man, we can do a lot with the processors on the phone to improve the quality of the images it generates.
And that seems to have just become kind of a dominant theme across all the major smartphone vendors now. I have not yet seen that kind of technique applied beyond a smartphone. Right. There are some mirrorless cameras that'll do some of that stuff, but the big computational photography idea of, let's take a bunch of frames and then generate a final frame, lives on a smartphone. Can that be expanded beyond the phone?
Yeah, that's a really good question. So when we won DPReview's, I think it was the smartphone camera of the year, no, it was the innovation of the year, the first time, the user forum comments were running 60-40.
Well, this isn't real photography.
They're combining a bunch of different moments.
Okay, okay.
The next year, we also won the Innovative Technology Award from DPReview.
Night Sight had come out, and the user forum comments had flipped.
They were now running 80-20.
Huh, why don't the SLR makers do this?
So the answer is complicated.
I wrote an article, which you can look up online, about 10 years ago, that surveyed the SLR industry and tried to identify why they have not begun to use computational methods. And it has to do somewhat with conservatism and who they believe their market is. One factor I've learned since I wrote that article is that their programmable processors are actually not very powerful.
They have, for example, on Canon, this digital signal processing chip that's basically an ASIC, a special-purpose chip that does just their algorithms.
And it would take them several years to spin a new one to do a different algorithm.
And so the ARM core that is programmable is relatively weak.
And so in terms of programmable, and therefore agile, software development, the mobile phone makers are actually in a better position than the SLR makers.
Also, the SLR sensors are large, so they would have to do a lot more processing.
So do you think you kind of just hit the limit of what you could do with the Pixel platform, and that's why you're expanding to a more universal app? That's the question I'm thinking about. The Pixel was far and away the winner in that one little slice of photography. To some extent, Apple caught up. But building an app that works across every phone, and perhaps more devices, seems like a bigger opportunity. Was that the thing that attracted you and made you jump?
Well, if you look at my career, which I outlined at the beginning here, I get intellectually restless every seven to ten years. And I think it was time to declare victory and move on. There were diminishing returns among these table stakes of high dynamic range imaging and low-light imaging. And it was time to look for a new frontier.
When you say universal across three fronts, is that the new frontier, or is there an imaging new frontier?
There is also an imaging new frontier. I, of course, don't want to tip my hand as to exactly what we might work on at Adobe.
But if you look at the research literature in computer graphics and computer vision, you can see threads for techniques beyond the table stakes that people are trying to accomplish. One example of those would be removing window reflections. There are a lot of papers people have published on removing window reflections. Another thread in the literature is removing harsh shadows and relighting. And there have been some products in that space, maybe not fully successful yet.
So if you look at the research literature, you can see a lot of these threads: removing distracting objects, the bus in front of the Eiffel Tower, the power lines, the trolley wires. And there again, papers have been published in the research literature. No one's really nailed it yet. So I think there's still a lot to do. And I think of those as being beyond table stakes, but features that pass what Larry Page at Google liked to call the toothbrush test: you use it twice a day and it improves your life. In other words, it's a feature that people really want or need.
You know, it's interesting. You started with forum nerds at DPReview saying this isn't a real photo, it's not just a single frame, a moment in time, which is an argument that we have on the Vergecast all of the time, just because it's fun to dive into. But the things you just described are editing features. They're not capture features. Is that how you think of it?
Not necessarily.
Okay.
This is a holy grail.
I don't know if we'll ever accomplish this, but could you imagine that we had a smartphone
where you could tap on a button and the window reflections would disappear?
And then you could proceed to frame your photograph.
So in other words, it happens in real time during composition.
That's before capture.
And it would really make a different kind of a device.
It would give you a superpower that would be useful.
How do you think of that?
I mean, I was just scanning all of your Stanford lectures that are on YouTube.
And you start, you start in the first one with the equation for depth of field,
and you end with, like, what makes a good photo with, like, the artistic nature of photography.
Looking through a viewfinder or smartphone screen, editing reality and then capturing that
is a very different kind of art making.
How do you think about your responsibility to that?
I'd like to empower it and let the artists decide what it's good for.
But, as I said, I think consumers would find these capabilities useful.
If they could press on that bus in front of the Eiffel Tower and it would just disappear, they would know, okay, I don't need to wait for the bus to drive away.
I can take my picture now.
And that is actually affecting their composition and what they do at the point of capture.
I mean, I feel like I could just go down this rabbit hole with you.
I mean, there's a lot to be said for that.
How would you accomplish something like that?
Would you need to know, would you need a base frame of what the Eiffel Tower looks like to re-add it?
Again, if you look at the research literature, there have been a lot of different approaches.
One of them is exactly what you just said.
The entertainment industry would call that a clean plate, a clean background plate.
That's one possible approach.
Another approach, depending on which part of the scene the bus obscures, is to just use your prior knowledge,
a machine learning algorithm's prior knowledge of what is likely to be there and to put plausible things there.
If the bus is too large, of course, that's not going to work well.
But if it's a small object, it might work very well.
And there have been a lot of papers about this so-called in-painting problem.
And Adobe as well has some products in that area.
The content-aware fill is an in-painting algorithm, essentially.
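The simplest flavor of in-painting, filling a hole by diffusing the surrounding pixels into it, fits in a short Python function. This is a toy diffusion method chosen for illustration, not Adobe's Content-Aware Fill, and the function name is hypothetical. Like the prior-knowledge approach described above, it only holds up for small holes.

```python
def inpaint(image, mask, iterations=200):
    """Fill masked pixels by repeatedly averaging their 4-neighbors.

    image: list of rows of floats; mask: same shape, True where the pixel
    is missing (for example, covered by an unwanted object).
    """
    h, w = len(image), len(image[0])
    known = [image[r][c] for r in range(h) for c in range(w) if not mask[r][c]]
    start = sum(known) / len(known)  # initial guess for the hole
    out = [[start if mask[r][c] else image[r][c] for c in range(w)] for r in range(h)]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for r in range(h):
            for c in range(w):
                if not mask[r][c]:
                    continue  # known pixels are never changed
                neighbors = [out[rr][cc]
                             for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                             if 0 <= rr < h and 0 <= cc < w]
                nxt[r][c] = sum(neighbors) / len(neighbors)
        out = nxt
    return out

# A horizontal brightness ramp with a 3x3 hole punched in the middle.
ramp = [[10.0 * c for c in range(5)] for _ in range(5)]
hole = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]
filled = inpaint(ramp, hole)
print([round(v, 2) for v in filled[2]])
```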
Adobe makes Photoshop.
And I feel like Photoshop, as a verb, lends itself to this kind of creation, right?
This kind of after-the-fact reality creation.
When you say universal camera app from Adobe, they already have Photoshop Camera, and there's a camera in Lightroom. Is that where this stuff goes?
Unclear. That's sort of product roadmap and that's unclear. At this point, I plan to
build technology and we'll see where it lands. Of course, I don't want to focus just on what
happens before you take the picture. If you can, for example, store the burst that you've
captured plus any metadata, then you can also edit it after the fact and decide to remove the bus
later. So I think editing can be thought of as happening before capture, during capture, and after capture.
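One way a stored burst could make "remove the bus later" possible is a per-pixel temporal median, which builds a clean background plate from the frames. This sketch assumes the frames are already aligned and that the object covers any given pixel in fewer than half of them; it illustrates the idea and is not Adobe's or Google's method. The function name is hypothetical.

```python
import statistics

def clean_plate(frames):
    """Per-pixel median across aligned burst frames.

    A transient object that covers a pixel in fewer than half the frames
    is voted out, leaving the static background.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[statistics.median(f[r][c] for f in frames) for c in range(w)]
            for r in range(h)]

background = [[100, 100], [100, 100]]
bus = [[0, 100], [0, 100]]  # the bus covers the left column in two frames
frames = [background, bus, bus, background, background]
print(clean_plate(frames))
```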
So to edit afterwards, you probably need to store a number of frames, right? I mean, you brought up Lytro and light fields, right? That entire game was a new file format that contained a variety of data.
Right, and that's what I mean by capturing and storing a burst.
Yeah. So does that imply a new file format? I mean, one of the things that I think
about with all these cameras is at the end of the day, I'm just exporting JPEGs. For better or worse,
That's the world we live in.
And I often think it would be great if I could go back to any of these JPEGs and change them in various ways.
Well, actually, the world is a little bit larger than that.
So on most phones, you can ask for a DNG to be stored as well, a raw format.
And on the Pixel phone, that raw file, that DNG file, was merged from a burst of frames.
So it was a very good DNG file.
It was SLR-ish quality.
And also, on the Pixel, if you took a Portrait mode shot, it stored both the original and the background-defocused version, and it also stored something about the segmentation mask of the person or the depth map of the background. So we're already moving in that direction, where we're storing more.
Do you think, as you envision a universal app, you will have the access you need to the sensors across a variety of platforms to do the kind of photographic computation that you want to do?
That's an excellent question, to which I don't fully know the answer.
Does iOS give you that capability? I'll ask it straight out.
So the history of these APIs is that at Stanford, we developed a prototype called the Frankencamera and published a paper on it. One of my PhD students, Eddy Talvala, moved to Google, and that was the genesis of the Camera2 API. I think once that became known, the other vendors began to follow. Apple followed with a more flexible camera API that could capture bursts and opened it to third-party developers. And so I think
the world is moving in the right direction there. When you think about sort of the state
of cameras now, smartphone cameras in particular, the race has been, you know, Samsung will add a gigantic, I think they're up to a 48-megapixel sensor. They'll do the pixel binning that you described earlier. Apple's just adding lenses left and right with various sensor sizes. There's definitely a hardware race going on. Is that useful to you? Do you think having four lenses and a lidar sensor is going to help you out?
Potentially, yes. Potentially, no. The Pixel 4 last year did add a telephoto lens, and that did help. Google shipped a telephoto lens plus the Super Res Zoom technology that came out of my team. And the two of them worked together very well.
So there's definitely something to be said for more hardware.
A depth sensor could help with a variety of tasks.
So I think hardware is important, but I think what I've shown over the last 10 years is that software is very important.
So I think the two work hand in hand.
Well, I'll tell the audience, before we jumped on, you and I were joking that we're both using standalone cameras as webcams right now.
I think you said you had a Sony A7.
I have an RX100.
Is there some reason that a laptop camera can't achieve the effect that you and I have both gone after with dedicated hardware, like big lenses and big sensors?
Yeah, that's a good question.
So one difference between our use of a fancy camera as a webcam and what a smartphone camera could do is that this is real time.
So video is an entirely different ballgame.
The computational photography that we did at Google on the Pixel was largely in the still photography area.
There were some other teams at Google working on video, but there was less that they could do because they had to do it in real time. And so as a webcam, you can't as easily take a burst for a single frame, and processing it would be very difficult.
And so I think video is one of the next frontiers.
So that kind of leads me right there.
I did want to ask you about video.
When I look at sort of the world
of creative expression on the internet,
What are the kids doing, right?
They are largely using real-time AR filters or video editing software in apps like TikTok, Instagram Reels, and Snapchat, which is actually very good at this.
That's all, you know, when you open a camera on Instagram, they open you to the video camera.
And they'll take a still off that video camera, which drives me crazy because it looks horrible.
But they know that seamless switching between video and photo is more important than one Instagram photo.
I mean, they've said as much to me.
You know, the Pixel as a video device just never got there.
We complain about it all the time.
You look at what Apple's doing.
They are doing a little bit of real-time tone mapping.
Their video looks very clean.
Samsung is obviously racing ahead.
Is there a connection between what you're working on with computational photography
and what we see consumers wanting to do with their video cameras
and what they are demanding to do with their video cameras?
Right.
So the way to move ahead in video is going to depend more on hardware.
And in particular, it's going to depend on hardware accelerators for the computations.
So the CPU, central processing unit, the GPU, the graphics processing unit, DSPs, digital signal processors, neural engines for machine learning networks.
I think that's going to be one of the key battlegrounds going forward.
And so the vendors do need to put hardware, and are already putting hardware, in their devices for accelerating machine learning calculations, and they're providing APIs, programming interfaces, for developers, but that's still sort of nascent.
They're not yet stable.
They're not yet fully performant.
I think that is the next battleground.
And those will be useful for doing computational videography.
I think that is going to be a future battleground.
Is that connected to what you and your team,
your forthcoming team, will be doing at Adobe?
We'll have to see.
How big is this?
I noticed there are a couple of job listings up on LinkedIn. I saw that you tweeted them. How big is this team going to be?
We'll have to see. How big is it now? Is it just one person? Yeah. Yep. It is just me.
My team at Google was about 30 people. That's a nice size for a close-knit engineering team, if not on the larger side for a close-knit engineering team.
Did you work on the hardware at Google,
because now you're fully going to be in the software world, right? You won't have that connection to the
hardware side. How connected were you to the hardware side of Google?
I gave them advice. Whether they listened to it or not would be another question.
Yeah, that sounds about right. When you look across the sweep of smartphone hardware,
is there a particular device or style of device that you're most interested in expanding these
techniques to? Is it, you know, the 96-megapixel sensors we see on some Chinese phones? Is it whatever Apple has in the next iPhone? Is there a place where you think there's yet more to be done?
Because of the diminishing returns due to the
laws of physics, I don't know that the basic sensors are that much of a draw. I don't know that
going to 96 megapixels is a good idea. I mean, the signal to noise ratio will depend on the
size of the sensor. And so it is more or less a question of how big a sensor can you stuff
into the form factor of a mobile camera.
Before the iPhone, smartphones were thicker.
If we could go back to that, if that would be acceptable,
then we could put larger sensors in there.
Nokia experimented with that, but it wasn't commercially successful.
Other than that, I think it's going to be hard to innovate a lot in that space.
I think it will depend more on the accelerators,
how much computation you can do during video or right after photographic
capture. And I think that's going to be a battleground. When you say 96 is a bad idea, right? I mean,
much like we had megahertz wars for a while, we did have a megapixel war for a minute. Then there was,
I think, much more excitingly, an ISO war, where low-light photography and DSLRs got way better,
and then soon that came to smartphones. But we appear to be in some sort of megapixel count war,
again, especially on the Android side. When you say it's not a good idea, what makes it specifically
not a good idea.
As I said, the signal to noise ratio is basically a matter of the total sensor size.
If you want to put 96 megapixels and you can't squeeze a larger sensor physically into the
form factor of the phone, then you have to make the pixels smaller.
And you end up close to the diffraction limit.
And those pixels end up worse.
They are noisier.
So it's just not clear how much advantage you get.
There might be a little bit more headroom there.
Maybe you can do a better job of demosaicing, meaning computing the red, green, and blue at each pixel, if you have more pixels.
But there isn't going to be that much headroom there.
Maybe the spec on the box attracts some consumers, but I think eventually, like the megapixel war on SLRs, it will tone down and people will realize that's not really an advantage.
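The point that total sensor area, not pixel count, sets the signal-to-noise ratio can be made concrete with a toy model: split the light falling on a fixed area among more, smaller pixels, then sum them back into one output pixel. It assumes only shot noise plus a fixed per-read noise and ignores diffraction, fill factor, and demosaicing; the photon count and read noise are illustrative, and the function name is hypothetical.

```python
import math

def binned_snr(photons_per_output_pixel, subpixels, read_noise):
    """SNR of one output pixel formed by summing `subpixels` smaller pixels.

    The same total light is split among the subpixels, so the summed signal
    is unchanged, but every subpixel read adds read_noise**2 of variance.
    """
    signal = photons_per_output_pixel
    variance = signal + subpixels * read_noise ** 2
    return signal / math.sqrt(variance)

light = 50  # photons on one 12 MP-sized pixel area in dim light (illustrative)
for k in (1, 4, 8):  # 12 MP native, 48 MP summed 4:1, 96 MP summed 8:1
    print(k, round(binned_snr(light, k, read_noise=2.0), 2))
```

With zero read noise all three cases come out identical, which is the sense in which the total sensor size, not the pixel count, is what matters; read noise makes the smaller pixels slightly worse.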
Do you think any of the pixel binning or Quad Bayer techniques help?
Because the 48-megapixel cameras still spit out a 12-megapixel photo by default, right?
Do you think those help?
That remains to be seen.
If you have four reds, four greens, and four blues, that makes demosaicing, interpolating the reds, greens, and blues that you don't see, harder.
And so those Quad Bayer sensors have been subject to spatial aliasing artifacts of one kind
or another, zippering along rows or columns.
and whether that can really be adequately solved remains to be seen.
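Demosaicing, estimating the missing color values at each pixel, can be illustrated with the simplest textbook method: bilinear interpolation of green on a standard RGGB Bayer mosaic. Production pipelines use far more sophisticated, edge-aware methods; this sketch, with hypothetical function names, only shows what "interpolating the colors you don't see" means.

```python
def bayer_color(row, col):
    """Color filter over pixel (row, col), assuming an RGGB tile."""
    return (("R", "G"), ("G", "B"))[row % 2][col % 2]

def green_at(mosaic, row, col):
    """Bilinear estimate of green at (row, col) from an RGGB mosaic."""
    if bayer_color(row, col) == "G":
        return mosaic[row][col]  # green was measured directly here
    h, w = len(mosaic), len(mosaic[0])
    # On an RGGB mosaic, every red or blue site's 4-neighbors are green.
    neighbors = [mosaic[r][c]
                 for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1))
                 if 0 <= r < h and 0 <= c < w]
    return sum(neighbors) / len(neighbors)

mosaic = [[10.0 * c for c in range(4)] for _ in range(4)]  # horizontal ramp
print(green_at(mosaic, 1, 1), green_at(mosaic, 2, 2))
```

On a Quad Bayer layout, some pixels sit inside a same-color block with no green neighbor at all, so this simple four-neighbor rule no longer applies directly, which hints at why those sensors are harder to demosaic cleanly.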
One of the things as we review the phones kind of on our very consumer side,
and it's always really interesting to connect sort of how we review things
that consumers use with how you make them, how you build them,
how you think about them, is we have noticed a particular HDR look has emerged on each of the phones.
So Samsung has a very particular, very saturated look.
I can spot a Samsung photo a mile away.
Apple started in one place, went to another place, and they're going to yet a third place, I think. The Pixel has been relatively constant, but it's moved a little closer to where the other folks are, in my
opinion. That is a big artistic decision, right, that's connected to a lot of engineering,
but at some point you have to make a qualitative determination. How are these photos going to look?
You obviously had a huge hand in that. How did you make that determination? You're right that
it's an artistic decision. And my team was instrumental in that.
I looked at a lot of paintings and looked at how painters over the centuries have handled dynamic range.
One of my favorite painters was Caravaggio.
Caravaggio had dark shadows.
I liked that.
That really explains a lot about the pixel, too.
Right.
Last year, we moved a little bit more toward Titian.
Titian has lighter shadows.
It's a constant debate.
It's a constant emerging taste.
and you're right that the phones are different.
It's also true that there is probably some ultimate limit on high dynamic range imaging.
Not necessarily on how high a dynamic range you could capture,
but on how high a dynamic range you can effectively render without the image looking cartoony.
One of my favorite photographic artists is Trey Ratcliff,
and his look is deliberately pushed and cartoony.
That's his style.
But I'm not sure I would want the Trey Ratcliff look with every picture that I took every day with a smartphone.
And so I think that's an important limit.
It's not clear how we get beyond that limit or whether we ever can.
You know, our friend Marques Brownlee, who I'm sure you know, does these challenges every so often.
He asks people to vote in, like, blind Pepsi challenges of smartphone photos.
And I think every time he's done it, it doesn't matter how good the photo is.
The brightest photo always wins, right?
The crappiest photo, but it's the brightest one, and that's the easiest cheat that any camera maker has.
It's just to overexpose it a little bit, and then you'll win on Twitter.
How do you fight that?
I mean, like, literally the Pepsi Challenge, Pepsi won.
This is like ancient history, but Pepsi would win those blind taste tests because Pepsi had more sugar, but then over time people would prefer Coke.
Like, that seems like the same category of the brighter photo wins, even if the overall quality over time is lower.
How do you solve for that in a moment like this?
That was a debate that at Google we had all the time.
At Adobe, I'm hoping to put it more in the hands of the consumer or the creative professional.
Let them decide what the look will be.
But, of course, that was a constant debate, because you're right, brighter would often win in a one-to-one comparison.
One factor that you haven't mentioned that I should add in here is the tuning of the displays on these smartphones.
Most smartphones are a little bit cranked relative to a calibrated, so-called sRGB display.
They're more saturated and more contrasty.
You could argue that that's probably the right thing to do on the small screen.
It would be a terrible thing to do on a large screen.
It would look very cartoony.
But that kind of contributes to what people want to see and to taste,
especially since most photographs are looked at only on the small screen.
Yeah, it's a constant debate, a constant emerging trend, and it will probably change again. We can look at photographs from the '50s and '60s, and partially because of Kodak's choices, but also because of their technology, we could identify a Kodachrome or an Ektachrome picture. And we'll be able to recognize pictures from various decades of digital photography as well.
Is it kind of your vision for a universal app, and I recognize you're a team of one, building a team, with many steps to come, that anybody will download it on any phone, and the image that comes out will look the same no matter the phone?
That remains to be seen. One of the interesting questions sort of hidden under your question is personalization, but also regionalization, and some phone vendors do regionalize their phones, and some do not. At Adobe, I think the preference is to leave the creative decisions in the hands of the photographers, more so than the phone vendors have been doing in their software. It's unknown how that will shake out.
Well, the reason I asked, I mean, again, I said there are many layers underneath it, but assuming you get sort of standardized access to sensor data, would your instinct be to say, here's a relatively neutral thing for the creative to work from that looks the same across every phone?
That's one possible path.
So to dive down a little bit into the way raw images are processed, you can take an approach, just among SLRs, let's say, of putting out the kind of image that that SLR would have put out.
And so Nick Imaging, when the company still existed, did that.
Adobe tries to do its own white balancing and processing and give a fairly uniform look across any SLR.
That's a different decision.
And maybe that would be continued for the different smartphone vendors.
Maybe not.
Remains to be seen.
No one on the Lightroom team will tell me this.
I just don't think they ever will.
but I swear the auto button in Lightroom has been tweaked to look more like the iPhone.
So when you hit it, it generates a more even dynamic range in the highlights and shadows.
I swear they made this change like two years ago.
They will not admit it to me.
But the reason I bring that up is that, for better or worse, the iPhone and the Pixel and now Samsung, that is what photos look like.
They look like high dynamic range.
The vast majority of photos people will capture and see on online platforms, that's what they look like.
To a large extent, that is an expectation that you helped create.
If we go back and look at those Kodachrome photos or Ektachrome photos, they did not look like that.
They could not look like that.
Do you think that is going to have an effect on the shape of photos to come beyond just,
well, now the sky's exposed and so is your face?
Well, it's useful, just from a practical point of view,
to not have highlights blown out and shadows crushed to black.
If you can institute controls at the time of capture,
you could let the photographer make some artistic decisions.
It's okay if this blows out.
It's okay if this gets crushed to black.
I actually want it to look like a silhouette.
And in the Pixel 4, we did that.
We had these brightness and shadow controls.
But to the extent a consumer just wants to press the button and get a photo,
they probably want to avoid blown out highlights and crushed shadows.
And so you're kind of forced into this regime where you do high dynamic range tone mapping,
and then the question just becomes, within that range:
Do you make it high tone?
Do you make it low tone?
Do you make it more like Caravaggio or more like Titian?
And those tastes could change over time.
They could be regionalized.
They could be personalized.
Who knows where that will go?
It's almost like you're asking me,
what will artistic tastes look like in 10 or 20 years?
Sure.
I wouldn't dare.
Yeah, but I wouldn't dare try to answer that.
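Levoy's point about high dynamic range tone mapping, compressing a scene's huge brightness range so highlights are not blown out and shadows are not crushed to black, can be illustrated with a minimal sketch. It assumes linear luminance values as input and uses the textbook global Reinhard operator, L / (1 + L), as a stand-in; it is not the Pixel's or Adobe's actual pipeline, and the function names and sample scene are hypothetical. The `key` parameter is roughly where the "high tone or low tone" taste decision would live.

```python
import math


def reinhard_tone_map(radiance, key=0.18):
    """Compress linear scene luminance of any positive range into [0, 1).

    Global Reinhard operator (a textbook stand-in, not a phone pipeline):
    1. scale so the log-average luminance lands on `key` (the "mood" knob),
    2. compress with L / (1 + L), which approaches 1.0 asymptotically,
       so bright highlights stay distinct instead of clipping.
    """
    eps = 1e-6  # avoid log(0) for pure-black pixels
    log_avg = math.exp(sum(math.log(eps + v) for v in radiance) / len(radiance))
    scaled = [key * v / log_avg for v in radiance]
    return [s / (1.0 + s) for s in scaled]


def clip_to_display(radiance, exposure=1.0):
    """Naive single exposure: scale, then clip everything above 1.0."""
    return [min(1.0, max(0.0, exposure * v)) for v in radiance]


# Hypothetical scene: deep shadow, midtone, bright sky, the sun.
scene = [0.01, 0.5, 20.0, 1000.0]
print(clip_to_display(scene))    # sky and sun both blow out to 1.0
print(reinhard_tone_map(scene))  # all four values stay distinct and below 1.0
```

With the naive single exposure, the two brightest values both clip to 1.0; the tone-mapped version keeps them ordered and distinct. Lowering `key` darkens the whole rendering (closer to Caravaggio than Titian) without reintroducing clipping.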
I mean, to be very blunt about it, the techniques you pioneered and shipped have radically altered the way that people think of what a photo should look like.
You're a good person to ask what might happen the next time around.
Right, but that's like trying to ask the people in, I believe it was the Netherlands, who first developed oil painting in the Renaissance, and it eventually got down to Italy, where they were doing tempera instead, and asking one of those inventors of oil paints, so what are paintings going to look like in 200 years because of your invention?
No idea.
Yeah.
I mean, one of the themes of The Verge, and I know this is in your work too, is that the tools have become so much easier to use.
Computational photography is hard.
Like, I'm sure you will agree.
It was not accessible to a lot of people for a long time.
The first HDR photo I took, I had to set up my DSLR on a tripod and take 15 frames and merge them using some plugin that hardly worked in Photoshop.
Now that's just all happening on the smartphone.
Is that a trend where you can see that the next thing that's hard or difficult or complicated
will get democratized? Maybe that's a better way to ask this question of what will change.
Yeah. Video and real time. What happens in the viewfinder while you're composing your shot?
What specifically with video? I think that's such a rich area now where we're just seeing
so much experimentation because the platforms are doing it at the platform level. And then we are
seeing because there's a closed loop of distribution, inspiration, creation, with users on a platform,
with the app in the same space that they distribute, it just seems to be happening way faster
than anywhere else. What do you see as the next kind of trend there, not creatively,
but in terms of actual capture and edit? Well, so the same table stakes that my team and the other
players in the industry brought to still photography still have not been applied to video. So
things blow out too often on video. Shadows are crushed to black. The white balance is not stable.
All of those things need to be fixed. You don't need to look any further than video conferencing to
see how bad video is. So that all needs to be fixed. And then the question becomes,
beyond these table stakes, what are the creative effects you can do with either short videos or
longer videos that will be interesting or useful to consumers? And I think there's a lot
of games that can be played in that space of synthetic motion blur or fast multiples,
the kinds of effects that you saw being done at the Olympics, the last Olympics. There's just a lot
of playing that can be done in there. One question, of course, is how easy can you make that for
consumers? Video editing has always been a harder hill to climb than photo editing. And so consumers
generally don't edit their videos.
So can you come up with user experiences, user interface paradigms that make that easier?
I think that's really going to be a challenge coming forward.
Scott Belsky was on the podcast last year now, I think.
Scott Belsky is the chief product officer at Adobe, for the audience.
He and I had a conversation about whether Adobe ever feels like Instagram is just ripping
Adobe off, right?
Like inside of Instagram is a tiny version of Photoshop.
Inside of TikTok is a tiny version of Premiere.
And that seems like, I mean, from my perspective, is great, right?
The tools are democratized.
They're being used.
They are teaching an entire generation of kids maybe to use other software.
Scott's answer was very much like, this is great.
It inspires us.
We're all just going to keep competing, which is always, that's what every executive says.
It seems like what you're going to say, too.
What is it that you think Adobe can do in this moment that the consumer platforms cannot?
I am going to give exactly that answer.
I didn't expect.
Well, let me nuance it a little bit.
When I was at Google, we published as we went along.
Clearly, we created fast followers by doing that publishing.
But it had the advantage that it would allow me as an executive to hire PhD superstars
who wanted the international reputation and interaction with a research community that publishing
brought, and the larger impact as well.
And that enabled us to move even faster
because we had such smart and creative people.
And that's no different at Adobe.
Adobe research has a stable of the best imaging experts in the world.
They do publish.
That does mean that others will be able to follow what they do,
but it also means that they can innovate fast
and come up with lots of great innovations.
And I think that's just the way Silicon Valley
is developing. Not universally, Apple does not publish as much, but in many of the companies,
that is true. And so how does a company respond that is being fast followed because its own
researchers or engineers are publishing? They have to run faster and breathe deeper and come up
with better innovations. And I think that's what Adobe is going to have to do as well.
I mean, publishing, right, is one way to communicate, but I'm assuming the
ecosystem of people who can build these products is relatively small. There must be some amount
of conversation. Obviously, there are conferences like SIGGRAPH, although probably all virtual right now.
Inside of the community, where are the tension points of developing computational photography and video? Where are the debates?
So companies will have a natural advantage over academia, for example, because they can capture
more training data. And so a lot of the discussion these days is about the move from
classical algorithms to machine learning for many of these tasks.
And then the question is, what does your training data look like?
And the larger companies can do a better job of capturing training data.
So a lot of the discussion is around training data.
Another point of discussion right now is that computer graphics is getting so good.
You can't tell a real photograph from a computer graphics rendering anymore in most instances.
Could we use computer graphics to train computational photography algorithms?
There are just a whole lot of really, really juicy intellectual questions in there.
Those are the kinds of things that we debate endlessly at conferences.
What would the, I mean, I think I can maybe guess, but what would the challenges or the ethical considerations of using computer graphics to train a computational photography algorithm look like?
Mostly the question of whether you're rendering a set of scenes that are representative of what you'll photograph.
That's the question for most machine learning models: is it sufficiently diverse and representative training data?
I feel like underlying this entire conversation, ever since you said you can take the bus out of the Eiffel Tower, is that that's great in one instance, but it could be applied in many unethical instances.
Do you think that as you create the tools, you have a responsibility to kind of build an ethical framework into them?
I realize that's a – we have like 10 minutes left, so that's like a thorny conversation, but I'm very curious.
I think that has to get layered on top of the technology.
And there are a number of aspects to it.
Bias is one aspect.
Authenticity of the photograph is another.
In some cases, it doesn't matter whether it's authentic.
You're making something creative.
In other cases like photojournalism, it's very important whether it's authentic.
And so those things are important.
They'll get layered on top of the technologies.
The technologies will be developed anyway.
Wait, let me challenge you on that, actually.
Right.
I mean, that is how we have heard technologists for decades now describe the nature of technology.
I would suggest that maybe that approach has had its ups and downs.
And particularly, we are very clearly seeing many of the downsides of saying the technology is neutral now.
Your project is capturing reality in some way.
Have you reconsidered that approach of I'm going to build the neutral platform and other people will make ethical decisions?
No. Science and engineering will move forward anyway, and the ethical use of it has to be layered on top of that.
There are very, very few instances in which you should actually stop the science and engineering.
I could think of a couple like genetic engineering, but I have not seen that in my area.
As more of our communication becomes visual, which is undeniably happening, there are things like deep fakes, or a video of a protest where I've deleted some of the people at capture so you can't even check my work.
Do those things occur to you?
Yeah.
So deep fakes is a good example.
It's important to remember that a lot of the technology that is now useful in deep fakes has been developed and continues to be developed for the special effects industry for movies, where people go to the movies and very much enjoy the things they see there that they know aren't real.
Would you prefer not to have a special effects industry and any of the creative movies that they've made?
I mean, I think Star Wars has been a great addition to our culture.
Those things didn't happen.
There is a lot of fakery going on there.
It's a dual-use technology.
There's a little bit of a gatekeeping effect with the special effects industry, right?
Those things are hard to use.
They're hard to access.
They require skill.
They have to be commercially viable.
As you make all of those tools more democratic, does that change the nature of them?
Right.
But at that point, you can't close the door after the horses have already left the barn.
You can't say, we're not going to allow those techniques to be used on the mobile platform
or for everyday photography.
The techniques were developed, they will be used there.
What society needs to do is to layer on top of that the proper controls and expectations
so that they're not misused.
Let me give a more specific example.
You mentioned regionalization on phones.
When I think of phone cameras and the single bit of regionalization that is most prominent to me, it is face smoothing, right?
Samsung phones will do it.
They do it more aggressively in Korea and other Asian countries.
Here in the United States, there's just a cultural norm that that is bad.
And then you talk to folks like Instagram, they will police the filters that are available on their app because they know that teenagers will enter bad cycles if the filters do communicate the wrong things.
That does seem like a big discussion, right?
Should we just automatically smooth faces?
Where do you think that, not to get too in the weeds, but again, we started with, I want the photos to look like paintings.
And now the question is, what should the painters make, right?
How do you think about those techniques and those tools at a company like Adobe where anything is possible, right?
Where like the goal is to empower the creative as much as possible.
Right.
That's a legitimate question.
And so, as you said, if you look across all the phone vendors, they've taken different
decisions on how much of that to do. And that's a complex decision that includes ethics and taste
and so on. And they may change their minds over the years. That is a question that gets layered
on top of the technology. And it may change. And maybe some kind of trace of authenticity is important,
but for most of the uses where photos get shared on social media, no one cares about the authenticity.
They just care. Is that really what this teenager looks like or not?
These are questions that will have to be answered sort of at the societal level and in the product decisions of each company at each moment in time.
There's no pat answer to that question.
Yeah, it's interesting because, again, the last time Scott was on, he was on to talk about Adobe's Content Authenticity Initiative, which provides a way to check if a photo on Twitter is really from The New York Times, right?
And I think that that makes sense for photojournalists.
It seems like it requires a lot of checking.
Like it will require a lot of checking over time.
But, you know, we just ran a giant package on citizen journalism effectively around the actions of the police.
And it feels like it's just going to come to everyone as more of the tools get democratized.
Right.
Right.
And it would be interesting to see whether it becomes pressure either from the phone vendors or from other software companies to put that kind of content authenticity into all photographs.
I'll be interested to follow that.
I want to just shift gears for one second.
I mean, I could do this conversation in particular for a long time.
But we have only talked about phones.
We have somewhat talked about DSLRs.
But Adobe also makes a suite of powerful desktop software.
They make Photoshop.
Do you see these techniques coming to other types of devices, laptops, tablets?
Do you see those cameras accelerating in the same way?
There are things you can definitely do on a desktop computer that you can never do on a phone,
and vice versa. How do you see those two things converging?
It may. There are a lot of hard intellectual questions to be answered. If, as I said earlier,
we're capturing bursts of frames, we're capturing depth maps, we're capturing metadata,
what do those look like as creative manipulable objects? How are they represented in programs like
Photoshop and Lightroom? How much storage are they taking in your cloud? There are a lot of
technical challenges there. And a lot of just design questions, what does the user experience
or the creative professionals experience look like manipulating those things? Is it too much
complexity or is there a way we can make it straightforward to do? And I look forward to getting
into exactly those design questions. I mean, I asked earlier, but you just said, what does the
manipulable data look like? That really does feel like a new file format or a new container format
that is just going to have to be, okay, here's a re-editable photo.
Yes, I think that's the direction that the entire industry is headed.
And who will want to re-edit them and what kinds of re-edits will be supported
and what that software will look like is going to be very interesting going forward.
Do you think that those edits should travel with the file format?
I think the options should certainly be there.
Not everybody will want it.
If I share a picture on social media, I don't know that I need to share the whole re-editable blob.
Yeah.
But as the creator, I'd like as much flexibility as possible.
You have probably seen image formats come and go and get developed and have their various controversies.
What is the state of the sort of image format debate right now as everyone's moving towards this?
The big companies do not seem great at standardizing anything lately.
So I'm wondering if any company is pushing one direction or we're going to end up with 15 proprietary smartphone formats.
If I tried to make a prediction there, I would just get it wrong.
So I'd rather not make any prediction.
One thing I will say is that it's darn nice that JPEG has lasted as long as it has.
It means that I can go back and look at pictures that I took in the 1980s even and I can still view them.
Unfortunately, the same thing is not true in the video world because of the way the business models and the patent wars and so on developed in the video world.
I cannot look at a video file that I created in the 1990s or even in the early 2000s unless I have some very specialized software.
I'd like to see that settle down.
And we'll have to see what happens with these file formats that enable richer metadata.
Hopefully, we'll come up with file formats that will stand the test of time.
When you think about what that format could look like in your new project, would it just be the burst?
Or would it be a base frame and some additional information from the rest of the frames you capture?
You are asking the $64,000 question.
I don't know the answer to that yet.
That's exactly the right question to ask, though.
Yeah, I mean, it just seems incredible.
Because the next question, I feel like fundamentally I've just been asking
what is the nature of reality over and over again?
The fundamental question is, which is the base frame?
How do you pick that base frame if you're only going to store the metadata from all the
other frames?
Because that decision becomes much higher stakes then.
Wouldn't you like the ability to change the base frame after capture if someone's
eyes were closed or they weren't smiling at that particular moment.
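The base-frame question can be made concrete with a small sketch. It assumes grayscale frames stored as nested lists and a hypothetical per-frame `eyes_open` score coming from some face detector (not shown); the sharpness measure (mean squared gradient), the scoring rule, and all names here are illustrative choices, not how Google or Adobe actually pick a reference frame.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Frame:
    pixels: list[list[float]]  # grayscale values, row-major, at least 2x2
    eyes_open: float = 1.0     # hypothetical 0..1 score from a face detector


def sharpness(pixels: list[list[float]]) -> float:
    """Mean squared horizontal + vertical gradient; blur lowers the score."""
    total, count = 0.0, 0
    for y in range(len(pixels) - 1):
        for x in range(len(pixels[0]) - 1):
            dx = pixels[y][x + 1] - pixels[y][x]
            dy = pixels[y + 1][x] - pixels[y][x]
            total += dx * dx + dy * dy
            count += 1
    return total / count


def pick_base_frame(frames: list[Frame], eye_weight: float = 1.0) -> int:
    """Return the index of the burst frame to treat as the base frame.

    eye_weight = 0 ranks on sharpness alone; eye_weight = 1 multiplies
    sharpness by the eyes-open score, driving a blinking frame's score to zero.
    """
    def score(f: Frame) -> float:
        return sharpness(f.pixels) * (1.0 - eye_weight + eye_weight * f.eyes_open)

    return max(range(len(frames)), key=lambda i: score(frames[i]))


sharp = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
soft = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]
burst = [Frame(soft), Frame(sharp, eyes_open=0.0), Frame([[0.5] * 3] * 3)]
print(pick_base_frame(burst))                  # 0: softer, but eyes open
print(pick_base_frame(burst, eye_weight=0.0))  # 1: sharpest, despite the blink
```

The choice flips with a single weight, which is why keeping more than one frame makes the base-frame decision revisable after capture instead of being locked in at the shutter press.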
Well, yeah, but now you're storing all the frames, right?
Well, that's one possibility.
I feel like you will know the answer better than me.
But that's like, if you have a base frame and someone's eyes are closed and then you
only store the part of the image where the eyes are open, I've come right back to what is
the nature of reality, right?
Yeah.
Or a lot of people have proposed these digital photo montage methods where you either open the eyes
synthetically, or if it's a group shot and someone's eyes were open here and the other person
smiled in this other frame, do you combine parts of different frames? And then you are creating a
moment that never actually happened, but maybe it's still a better photograph. And I think a lot of
that stuff will happen. Yeah. How do you, again, I just, we have circled this. What do you think
the nature of reality is? The nature of reality matters more perhaps for photojournalism
than it does for you and me taking a picture?
Well, so, I mean, that's like my bias, right?
Like, that's what we make.
So just an example, we get requests when we publish photos of protests to blur the faces.
And we say no, right?
I mean, I think other people, for good reasons, accept that request and they do it.
And I think people on social media do it.
We just would never, right?
We have to, that's our responsibility.
So that is definitely my initial frame of thinking.
But you're saying beyond that, maybe it doesn't matter so much.
When I take a picture of a landscape, I don't want it to look like something when I wasn't there, but some people will.
Some people would love to make the sky look clear or starry or something like that.
It is a complete spectrum of what people want that to look like.
Is it clearly an artistic interpretation?
Is it just a little tweak to remove the bus in front of the Eiffel Tower?
Is it exactly what I saw with all the grunge?
Everyone has a different opinion there.
And everyone has a different purpose for their photography.
That's great.
So just to wrap this up, you're obviously at the beginning,
you're obviously thinking about a wide sweep of things from sensor hardware to the ability to re-edit to file formats.
What does the next year look like for you?
What does success look like at a year?
The next year for me is just hiring a team of PhD superstars who would like to publish what they do
and are willing to roll up their sleeves
and build the next generation
of amazing computational photography.
You don't think you'll have product shipped in a year?
Probably not.
Probably not.
It just takes time to build these things.
When I took a leave of absence from Stanford
to move to Google to work on Google Glass in 2011,
Google Glass came out with our photography on it experimentally.
It was at least two years after that.
I don't remember exactly how.
long it was. These things do take time. Yeah. And that's because you want to go beyond what you have
described as table stakes. That's right. You're not just trying to build Gcam for the Samsung phone
from Adobe. That's right. All right. Well, Marc, this was an incredible conversation. Like I said,
I didn't even talk about smartphone displays with you. I feel like I could
just do that for an hour. You're going to have to come back, especially as this project continues,
because I want to hear everything about it. Thank you so much for taking the time.
And thank you for inviting me. All right, my thanks to Marc Levoy for joining us. That was
just a wild conversation. I want to talk to that guy again and again as he begins to build this
app. I think it's going to be really fun. We'll be back on Friday with the chat show; lots of news to
talk about, it's fall hardware season. Dieter's got those big reviews coming up. We'll see you then.
