Tech Brew Ride Home - Tue. 07/30 – A (Mostly) AI Day
Episode Date: July 30, 2024
Perplexity wants to share ad revenue with publishers. But lots of AI companies are continuing to gamble with scraping. Meta’s new Segment Anything 2 model. AI influencers on Instagram. Canva makes an AI acquisition. And in non-AI news, Meta makes a huge settlement with Texas.
Links:
Perplexity is cutting checks to publishers following plagiarism accusations (The Verge)
Websites are Blocking the Wrong AI Scrapers (Because AI Companies Keep Making New Ones) (404Media)
Zuckerberg touts Meta’s latest video vision AI with Nvidia CEO Jensen Huang (TechCrunch)
Instagram creators can now make AI doppelgangers to chat with their followers (Engadget)
Canva acquires Leonardo.ai to boost its generative AI efforts (TechCrunch)
Meta to pay $1.4 billion to settle Texas facial recognition data lawsuit (Reuters)
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco.
Hey, who did this to you?
What happened next turned the story into a political firestorm.
Reports have identified the victim as Bob Lee, the founder of Cash App.
From Bloomberg Podcasts, this is Foundering, the Killing of Bob Lee, beginning April 16.
Welcome to the Techmeme Ride Home for Tuesday, July 30th, 2024. I'm Brian McCullough. Today, Perplexity wants to share ad revenue with publishers, but lots of AI companies are continuing to gamble with scraping. Meta's new Segment Anything 2 model. AI influencers on Instagram. Canva makes an AI acquisition. And in non-AI news, Meta makes a huge settlement with Texas. Here's what you missed today in the world of tech. Man, it's all AI stuff today. Perplexity is launching a
program to share ad revenue with partners such as Time, Der Spiegel, Fortune, and WordPress.com
after weeks of plagiarism accusations. I guess that's one way to do it. Quoting The Verge.
Under this program, when Perplexity features content from these publishers in response to user queries,
the publishers will receive a share of the ad revenue. Publishing partners will also get a
free one-year subscription to Perplexity's Enterprise Pro Tier and access to Perplexity's
developer tools plus insights through ScalePost.ai, a new AI startup that helps secure partnerships
between AI companies and publishers, such as how frequently their articles appear in search queries.
Dmitry Shevelenko, Perplexity's chief business officer, declined to share exact deal terms,
but said that the revenue share is a multi-year agreement with a double-digit percentage,
consistent across all publishers, with especially favorable terms for the initial partners.
Perplexity spokesperson Sara Platnick added that payments are made on a per-source basis, meaning publishers are compensated for each article used in responses.
The program will temporarily provide cash advances on revenue to publishers as Perplexity builds a long-term advertising model.
The advances aren't a licensing fee for content like OpenAI's deals.
It's a much better revenue split than Google, which is zero,
Automattic CEO Matt Mullenweg told me via direct message.
The publishing agreement doesn't cover WordPress.org, but Automattic will be sending payments to direct customers of WordPress.com. The amount? I don't know. Probably small to start because they don't make much revenue now. But if Perplexity is the next Google, which I think it has a chance of being, these numbers could become meaningful, and we're looking to help publishers get paid in every way we can, he said, end quote. This new program comes a month after a Forbes editor found the publication's paywalled reporting plagiarized in Perplexity's new product, Pages,
an AI-powered tool that lets users create a report or article based on prompts.
The AI-generated version of the Forbes story, along with an AI-generated Perplexity podcast of the story,
was then sent to subscribers via a mobile push notification, Forbes reported.
Wired then published an investigation that found Perplexity's AI was, quote,
paraphrasing Wired stories, and at times summarizing stories inaccurately and with minimal attribution.
Forbes has since threatened legal action against Perplexity.
Shevelenko told me the company started work
on this program back in January, well before the blowback, saying the team took inspiration from
X's ad revenue sharing program. Perplexity planned to launch this program last month amid the drama,
but decided to hold off until now, he said. I asked him if this was a well-timed apology tour,
or if it was just a stopgap to prevent lawsuits. Quote, we don't want people saying nasty things
about us more than we don't want to get sued, Shevelenko said, end quote.
Yeah, but you get the sense that other folks are making the strategic calculation
to just go ahead and risk getting sued at this point. For example, some popular sites like
Condé Nast's titles and Reuters.com modified their robots.txt files to block specific Anthropic bots,
but Anthropic has allegedly just made new bots with other names. Other folks are apparently doing this
as well, quoting 404 Media. Hundreds of websites trying to block the AI company Anthropic from scraping
their content are blocking the wrong bots, seemingly because they are copy-pasting
outdated instructions into their robots.txt files, and because companies are constantly launching
new AI crawler bots with different names that will only be blocked if website owners update
their robots.txt. In particular, these sites are blocking two bots no longer used by the company
while unknowingly leaving Anthropic's real and new scraper bot unblocked. This is an example of,
quote, how much of a mess the robots.txt landscape is right now, the anonymous operator of Dark Visitors
told 404 Media. Dark Visitors is a website that tracks the constantly shifting landscape of web crawlers
and scrapers, many of them operated by AI companies, and which helps website owners regularly
update their robots.txt files to prevent specific types of scraping. The site has seen huge
increases in popularity as more people try to block AI from scraping their work. Last week,
repair guide site iFixit said that Anthropic's crawlers had hit its website nearly a million
times in one day. And the coding documentation deployment service Read the Docs published a blog post
saying that various crawlers had hit its servers at a huge scale. One crawler, it said,
accessed 10 terabytes worth of files in a single day and 73 terabytes total in May. This cost us
over $5,000 in bandwidth charges and we had to block the crawler, they wrote. We are asking
all AI companies to be more respectful of the sites they are crawling. They are risking many sites
blocking them for abuse, irrespective of the other copyright and moral issues that are at play in the industry.
The Anthropic finding was published in a paper by the Data Provenance Initiative that more broadly shows the pervasive confusion content creators and website owners face when trying to block AI tools from being trained on their work.
The onus on blocking AI scrapers is put entirely on website owners, and the number of scrapers is constantly increasing.
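The failure mode described here comes down to robots.txt rules being matched per user-agent name. The following is a minimal sketch using Python's standard urllib.robotparser. The bot names OldCrawler and NewCrawler and the example.com URL are hypothetical placeholders, not the actual crawler names from the report.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only a crawler name that is no longer in use.
# "OldCrawler" and "NewCrawler" are hypothetical names for illustration.
ROBOTS_TXT = """\
User-agent: OldCrawler
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    for bot in ("OldCrawler", "NewCrawler"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, bot, url) else "blocked"
        print(f"{bot}: {verdict}")
```

Under these assumptions, the rule written for the old name does nothing once the same operator crawls under a new name, and even then robots.txt only works if the crawler chooses to honor it.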
New scraper bots, often called user agents, are popping up all the time. AI companies sometimes ignore
the stated wishes of website owners, and bots that are seemingly connected to well-known companies
sometimes aren't connected to them at all, end quote. As best as I can tell, the calculation here
is if we scrape to build our model, once we have the model, fine, we'll take what comes. But if you
don't even have a model to begin with, you don't have anything. So scrape first and find out
what happens later, I guess. Meta has released the Segment Anything Model 2 with support for
object segmentation in videos and images. The code and weights are available under an Apache 2.0
License, quoting TechCrunch. Segmentation is the technical term for when a vision model looks at a
picture and picks out the parts. This is a dog. This is a tree behind the dog. Hopefully, and not
this is a tree growing out of a dog. This has been happening for decades, but recently it's gotten
way better and faster, with Segment Anything being a major step forward. Segment Anything 2
is a natural follow-up in that it applies natively to video and not just still images,
though you could, of course, run the first model on every frame of a video individually.
It's not the most efficient workflow. Scientists use this stuff to study like coral reefs and
natural habitats, things like that. But being able to do this in video and have it be zero shot
and tell you what you want, it's pretty cool, Mark Zuckerberg said in a conversation with
NVIDIA CEO Jensen Huang. Processing video is, of course, much more computationally demanding,
and it's a testament to the advances made across the industry in efficiency that SA2 can run without
melting the data center. Of course, it's still a huge model that needs serious hardware to work,
but fast, flexible segmentation was practically impossible even a year ago. The model will,
like the first, be open and free to use, and there's no word of a hosted version,
something these AI companies sometimes offer. But there is a free demo. Naturally, such a model
takes a ton of data to train, and Meta is also releasing a large annotated database of 50,000 videos
that it had created just for this purpose.
In the paper describing SA2,
another database of over 100,000 internally available videos
was also used for training,
and this one is not being made public.
I've asked Meta for more information on what this is
and why it's not being released.
Our guess would be that it's sourced from public Instagram
and Facebook profiles, end quote.
Meta has also rolled out AI Studio in the US,
letting users create and share AI chatbots,
and letting Instagram creators set up chatbots
to answer DM questions and story replies.
Quoting Engadget,
the next time you DM a creator on Instagram,
you might get a reply from their AI.
Meta is starting to roll out its AI Studio,
a set of tools that will allow Instagram creators
to make an AI persona that can answer questions
and chat with their followers and fans on their behalf.
According to Meta, the new creator AIs
are meant to address a long-running issue
for Instagram users with large followings.
It can be nearly impossible for the service's most popular users
to keep up with the flood of messages they receive every day.
Now, though, they'll be able to make an AI that functions as an, quote, extension of themselves, says
Connor Hayes, who is VP of Product for AI Studio at Meta. These creators can actually use the comments that
they've made, the captions that they've made, the transcripts of the reels that they've posted,
as well as any custom instructions or links that they want to provide, so that the AI can answer
on their behalf, Hayes tells Engadget. Mark Zuckerberg has suggested he has big ambitions for such
chatbots. In a recent interview with Bloomberg, he said he expects there will eventually be
hundreds of millions of creator-made AIs on Meta's apps. However, it's unclear if Instagram's
users will be as interested in engaging with AI versions of their favorite creators. Meta previously
experimented with AI chatbots that took on the personalities of celebrities like Snoop Dogg and
Kendall Jenner, but those characters proved to be largely underwhelming. One thing that ended up
being somewhat confusing for people was, am I talking to the celebrity that is embodying this AI,
or am I talking to an AI and they're playing the character?
Meta's Hayes says about the celebrity-branded chatbots.
We think that going in this direction where the public figures can represent themselves
or an AI that's an extension of themselves will be a lot clearer, end quote.
AI Studio isn't just for creators, though.
Meta will also allow any user to create custom AI characters that can chat about specific topics,
make memes, or offer advice.
Like the creator-focused characters, these chatbots will be powered by Meta's new Llama 3.1 model.
Users can share their chatbot creations and track how many people are using them,
though they won't be able to view other users' interactions with them, end quote.
Canva is acquiring AI image generation service Leonardo.ai for an undisclosed amount.
Leonardo.ai launched in December 2022 and has more than 19 million registered users,
quoting TechCrunch. The financial terms of the deal weren't disclosed,
but Canva co-founder and chief product officer Cameron Adams said it's a mix of cash and stock.
All of Leonardo.ai's 120 employees will be joining Canva, including the executive team.
Leonardo will continue to run independently of Canva with a focus on rapid innovation, research
and development now backed by Canva's resources, Adams told TechCrunch.
We'll keep offering all of Leonardo's existing tools and solutions.
This acquisition aims to help Leonardo develop its platform and deepen their user growth with our
investment, including by expanding their API business and investing in foundational model R&D, end quote.
Sydney-based Leonardo.ai, founded in 2022, was originally meant to focus on video game asset creation.
The startup's founders met while working at a video game company.
But then Leonardo.ai's team decided to build out the platform to meet more scenarios like creating and training AI models for image creation across industries such as fashion, advertising and architecture.
Today, Leonardo.ai offers collaboration tools and a private cloud for models, including video generators,
as well as access to APIs that let customers build their own tech infrastructure on top of
Leonardo.ai's platform.
Leonardo.ai differentiates itself from other generative AI art platforms by the amount of control
that it gives users, co-founders Jachin Bhasme, J.J. Fiasson, and Chris Gillis told TechCrunch
in an interview last December. For example, Leonardo.ai's Live Canvas feature enables users to
enter a text prompt and then make a quick sketch of what they want the end result to look like.
As the user sketches, Leonardo.ai creates a photorealistic image based on both text and sketch prompts in real time.
It's unclear how Leonardo.ai trains its in-house generative models like its flagship model Phoenix.
It's an important question to ask about any generative AI service given the legal ramifications of training models on copyrighted content, sans permission.
Leonardo.ai's PR kept it vague when asked for clarification, saying only that the models are trained on licensed, synthetic, and publicly available slash open source data.
Leonardo.ai has over 19 million registered users, and its tools have been used to create more
than a billion images. Leonardo.ai is Canva's eighth acquisition overall, and its second acquisition
this year, coming three months after it bought UK design company Affinity for an estimated
$380 million. Canva also owns presentations startup Zeetings, free stock photography sites
Pixabay and Pexels, and Czech-based product mock-up app Smartmockups, end quote.
Finally, would you believe, non-AI news. Meta has agreed to pay $1.4 billion to settle Texas's lawsuit
accusing Meta of using facial recognition tech to collect biometric data of millions of Texans without consent,
quoting Reuters. The terms of the settlement disclosed on Tuesday marked the largest accord ever by any
single state, according to the lawyers for Texas, whose legal team included the plaintiffs'
firm Keller Postman. The lawsuit, filed in 2022, was the first major case
to be brought under Texas' 2009 biometric privacy law, according to law firms tracking the
litigation. A provision of the law provides damages of up to $25,000 per violation. Texas accused
Facebook of capturing biometric information billions of times from photos and videos that users
uploaded to the social media platform as part of a free, discontinued feature called Tag Suggestions.
A spokesperson for Meta said the company is pleased to resolve the matter and looks forward to,
quote, exploring future opportunities to deepen our business investments in Texas,
including potentially developing data centers.
It has continued to deny any wrongdoing.
Texas and Meta said they reached an accord in May,
weeks before a trial in state court was scheduled to begin.
Meta separately agreed to pay $650 million in 2020
to settle a biometric privacy class action
that was brought under an Illinois privacy law
that is considered one of the nation's most stringent.
The company also denied wrongdoing.
Alphabet's Google separately is fighting a lawsuit by Texas
accusing the company of violating the state's biometric law, end quote.
Nothing more for you today. Talk to you tomorrow.
