Tech Brew Ride Home - Tue. 07/30 – A (Mostly) AI Day

Episode Date: July 30, 2024

Perplexity wants to share ad revenue with publishers. But lots of AI companies are continuing to gamble with scraping. Meta’s new Segment Anything 2 model. AI influencers on Instagram. Canva makes an AI acquisition. And in non-AI news, Meta makes a huge settlement with Texas.
Links:
Perplexity is cutting checks to publishers following plagiarism accusations (The Verge)
Websites are Blocking the Wrong AI Scrapers (Because AI Companies Keep Making New Ones) (404Media)
Zuckerberg touts Meta’s latest video vision AI with Nvidia CEO Jensen Huang (TechCrunch)
Instagram creators can now make AI doppelgangers to chat with their followers (Engadget)
Canva acquires Leonardo.ai to boost its generative AI efforts (TechCrunch)
Meta to pay $1.4 billion to settle Texas facial recognition data lawsuit (Reuters)

Transcript
Starting point is 00:00:00 On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco. Hey, who did this to you? What happened next turned the story into a political firestorm. Reports have identified the victim as Bob Lee, the founder of Cash App. From Bloomberg Podcasts, this is Foundering, the Killing of Bob Lee, beginning April 16. Welcome to the Techmeme Ride Home for Tuesday, July 30th, 2024. I'm Brian McCullough. Today, Perplexity wants to share ad revenue with publishers, but lots of AI companies are continuing to gamble with scraping. Meta's new Segment Anything 2 model. AI influencers on Instagram. Canva makes an AI acquisition. And in non-AI news, Meta makes a huge settlement with Texas. Here's what you missed today in the world of tech. Man, it's all AI stuff today. Perplexity is launching a program to share ad revenue with partners such as Time, Der Spiegel, Fortune, and WordPress.com after weeks of plagiarism accusations. I guess that's one way to do it. Quoting The Verge.
Starting point is 00:01:20 Under this program, when Perplexity features content from these publishers in response to user queries, the publishers will receive a share of the ad revenue. Publishing partners will also get a free one-year subscription to Perplexity's Enterprise Pro tier and access to Perplexity's developer tools, plus insights, such as how frequently their articles appear in search queries, through ScalePost.ai, a new AI startup that helps secure partnerships between AI companies and publishers. Dmitry Shevelenko, Perplexity's chief business officer, declined to share exact deal terms, but said that the revenue share is a multi-year agreement with a double-digit percentage, consistent across all publishers, with especially favorable terms for the initial partners.
Starting point is 00:02:02 Perplexity spokesperson Sara Platnick added that payments are made on a per-source basis, meaning publishers are compensated for each article used in responses. The program will temporarily provide cash advances on revenue to publishers as Perplexity builds a long-term advertising model. The advances aren't a licensing fee for content like OpenAI's deals. It's a much better revenue split than Google, which is zero, Automattic CEO Matt Mullenweg told me via direct message. The publishing agreement doesn't cover WordPress.org, but Automattic will be sending payments to direct customers of WordPress.com. The amount? I don't know. Probably small to start because they don't make much revenue now. But if Perplexity is the next Google, which I think it has a chance of being, these numbers could become meaningful, and we're looking to help publishers get paid in every way we can, he said, end quote. This new program comes a month after a Forbes editor found the publication's paywalled reporting plagiarized in Perplexity's new product, Pages, an AI-powered tool that lets users create a report or article based on prompts. The AI-generated version of the Forbes story, along with an AI-generated Perplexity podcast of the story,
Starting point is 00:03:11 was then sent to subscribers via a mobile push notification, Forbes reported. Wired then published an investigation that found Perplexity's AI was, quote, paraphrasing Wired stories, and at times summarizing stories inaccurately and with minimal attribution. Forbes has since threatened legal action against Perplexity. Shevelenko told me the company started work on this program back in January, well before the blowback, saying the team took inspiration from X's ad revenue sharing program. Perplexity planned to launch this program last month amid the drama, but decided to hold off until now, he said. I asked him if this was a well-timed apology tour,
Starting point is 00:03:47 or if it was just a stopgap to prevent lawsuits. Quote, we don't want people saying nasty things about us more than we don't want to get sued, Shevelenko said, end quote. Yeah, but you get the sense that other folks are making the strategic calculation to just go ahead and risk getting sued at this point. For example, some popular sites like Condé Nast's titles and Reuters.com modified their robots.txt files to block Anthropic-specific bots, but Anthropic has allegedly just made new bots with other names. Other folks are apparently doing this as well. Quoting 404 Media. Hundreds of websites trying to block the AI company Anthropic from scraping their content are blocking the wrong bots, seemingly because they are copy-pasting
Starting point is 00:04:36 outdated instructions into their robots.txt files, and because companies are constantly launching new AI crawler bots with different names that will only be blocked if website owners update their robots.txt. In particular, these sites are blocking two bots no longer used by the company while unknowingly leaving Anthropic's real and new scraper bot unblocked. This is an example of, quote, how much of a mess the robots.txt landscape is right now, the anonymous operator of Dark Visitors told 404 Media. Dark Visitors is a website that tracks the constantly shifting landscape of web crawlers and scrapers, many of them operated by AI companies, and which helps website owners regularly update their robots.txt files to prevent specific types of scraping. The site has seen huge
Starting point is 00:05:21 increases in popularity as more people try to block AI from scraping their work. Last week, repair guide site iFixit said that Anthropic's crawlers had hit its website nearly a million times in one day. And the coding documentation deployment service Read the Docs published a blog post saying that various crawlers had hit its servers at a huge scale. One crawler, it said, accessed 10 terabytes worth of files in a single day and 73 terabytes total in May. This cost us over $5,000 in bandwidth charges and we had to block the crawler, they wrote. We are asking all AI companies to be more respectful of the sites they are crawling. They are risking many sites blocking them for abuse, irrespective of the other copyright and moral issues that are at play in the industry.
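To make the robots.txt problem above concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The bot names (ExampleOldBot, ExampleLegacyBot, ExampleNewBot), the robots.txt contents, and the example.com URL are all hypothetical placeholders, not the real crawler names from the report. The sketch only illustrates the general failure mode: rules written for retired crawler names do nothing to stop a crawler that announces itself under a new name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that only names two retired crawlers.
# These user-agent strings are illustrative assumptions, not real bot names.
ROBOTS_TXT = """\
User-agent: ExampleOldBot
Disallow: /

User-agent: ExampleLegacyBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/article"):
    """Map each user agent to True if robots_txt disallows it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


result = blocked_agents(
    ROBOTS_TXT,
    ["ExampleOldBot", "ExampleLegacyBot", "ExampleNewBot"],
)

for agent, blocked in result.items():
    print(f"{agent}: {'blocked' if blocked else 'NOT blocked'}")
```

Under these assumptions, the two old names come back as blocked and the new name does not, because no rule mentions it. Note also that robots.txt is advisory: this check says what a well-behaved crawler should do, not what any given crawler actually does.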
Starting point is 00:06:05 The Anthropic finding was published in a paper by the Data Provenance Initiative that more broadly shows the pervasive confusion content creators and website owners face when trying to block AI tools from being trained on their work. The onus of blocking AI scrapers is put entirely on website owners, and the number of scrapers is constantly increasing. New scraper bots, often called user agents, are popping up all the time. AI companies sometimes ignore the stated wishes of website owners, and bots that are seemingly connected to well-known companies sometimes aren't connected to them at all, end quote. As best as I can tell, the calculation here is: if we scrape to build our model, once we have the model, fine, we'll take what comes. But if you don't even have a model to begin with, you don't have anything. So scrape first and find out what happens later, I guess. Meta has released the Segment Anything Model 2 with support for
Starting point is 00:07:03 object segmentation in videos and images. The code and weights are available under an Apache 2.0 license. Quoting TechCrunch. Segmentation is the technical term for when a vision model looks at a picture and picks out the parts. This is a dog. This is a tree behind the dog. Hopefully, and not, this is a tree growing out of a dog. This has been happening for decades, but recently it's gotten way better and faster, with Segment Anything being a major step forward. Segment Anything 2 is a natural follow-up in that it applies natively to video and not just still images. Though you could, of course, run the first model on every frame of a video individually, it's not the most efficient workflow. Scientists use this stuff to study like coral reefs and
Starting point is 00:07:45 natural habitats, things like that. But being able to do this in video and have it be zero shot and tell you what you want, it's pretty cool, Mark Zuckerberg said in a conversation with NVIDIA CEO Jensen Huang. Processing video is, of course, much more computationally demanding, and it's a testament to the advances made across the industry in efficiency that SA2 can run without melting the data center. Of course, it's still a huge model that needs serious hardware to work, but fast, flexible segmentation was practically impossible even a year ago. The model will, like the first, be open and free to use, and there's no word of a hosted version, something these AI companies sometimes offer. But there is a free demo. Naturally, such a model
Starting point is 00:08:25 takes a ton of data to train, and Meta is also releasing a large annotated database of 50,000 videos that it had created just for this purpose. In the paper describing SA2, another database of over 100,000 internally available videos was also used for training, and this one is not being made public. I've asked Meta for more information on what this is and why it's not being released.
Starting point is 00:08:46 Our guess would be that it's sourced from public Instagram and Facebook profiles, end quote. Meta has also rolled out AI Studio in the US, letting users create and share AI chatbots, and letting Instagram creators set up chatbots to answer DM questions and story replies. Quoting Engadget. The next time you DM a creator on Instagram,
Starting point is 00:09:12 you might get a reply from their AI. Meta is starting to roll out its AI Studio, a set of tools that will allow Instagram creators to make an AI persona that can answer questions and chat with their followers and fans on their behalf. According to Meta, the new creator AIs are meant to address a long-running issue for Instagram users with large followings.
Starting point is 00:09:30 It can be nearly impossible for the service's most popular users to keep up with the flood of messages they receive every day. Now, though, they'll be able to make an AI that functions as an, quote, extension of themselves, says Connor Hayes, who is VP of Product for AI Studio at Meta. These creators can actually use the comments that they've made, the captions that they've made, the transcripts of the reels that they've posted, as well as any custom instructions or links that they want to provide, so that the AI can answer on their behalf, Hayes tells Engadget. Mark Zuckerberg has suggested he has big ambitions for such chatbots. In a recent interview with Bloomberg, he said he expects there will eventually be
Starting point is 00:10:07 hundreds of millions of creator-made AIs on Meta's apps. However, it's unclear if Instagram's users will be as interested in engaging with AI versions of their favorite creators. Meta previously experimented with AI chatbots that took on the personalities of celebrities like Snoop Dogg and Kendall Jenner, but those characters proved to be largely underwhelming. One thing that ended up being somewhat confusing for people was, am I talking to the celebrity that is embodying this AI, or am I talking to an AI and they're playing the character? Meta's Hayes says about the celebrity-branded chatbots. We think that going in this direction where the public figures can represent themselves
Starting point is 00:10:40 or an AI that's an extension of themselves will be a lot clearer, end quote. AI Studio isn't just for creators, though. Meta will also allow any user to create custom AI characters that can chat about specific topics, make memes, or offer advice. Like the creator-focused characters, these chatbots will be powered by Meta's new Llama 3.1 model. Users can share their chatbot creations and track how many people are using them, though they won't be able to view other users' interactions with them, end quote. Canva is acquiring AI image generation service Leonardo.ai for an undisclosed amount.
Starting point is 00:11:21 Leonardo.ai launched in December 2022 and has more than 19 million registered users. Quoting TechCrunch. The financial terms of the deal weren't disclosed, but Canva co-founder and chief product officer Cameron Adams said it's a mix of cash and stock. All of Leonardo.ai's 120 employees will be joining Canva, including the executive team. Leonardo will continue to run independently of Canva with a focus on rapid innovation, research and development, now backed by Canva's resources, Adams told TechCrunch. We'll keep offering all of Leonardo's existing tools and solutions. This acquisition aims to help Leonardo develop its platform and deepen their user growth with our
Starting point is 00:11:58 investment, including by expanding their API business and investing in foundational model R&D, end quote. Sydney-based Leonardo.ai, founded in 2022, was originally meant to focus on video game asset creation. The startup's founders met while working at a video game company. But then Leonardo.ai's team decided to build out the platform to meet more scenarios, like creating and training AI models for image creation across industries such as fashion, advertising and architecture. Today, Leonardo.ai offers collaboration tools and a private cloud for models, including video generators, as well as access to APIs that let customers build their own tech infrastructure on top of Leonardo.ai's platform. Leonardo.ai differentiates itself from other generative AI art platforms by the amount of control
Starting point is 00:12:42 that it gives users, co-founders Jachin Bhasme, JJ Fiasson and Chris Gillis told TechCrunch in an interview last December. For example, Leonardo.ai's Live Canvas feature enables users to enter a text prompt and then make a quick sketch of what they want the end result to look like. As the user sketches, Leonardo.ai creates a photorealistic image based on both text and sketch prompts in real time. It's unclear how Leonardo.ai trains its in-house generative models like its flagship model Phoenix. It's an important question to ask about any generative AI service given the legal ramifications of training models on copyrighted content, sans permission. Leonardo.ai's PR kept it vague when asked for clarification, saying only that the models are trained on licensed, synthetic, and publicly available slash open source data. Leonardo.ai has over 19 million registered users, and its tools have been used to create more
Starting point is 00:13:32 than a billion images. Leonardo.ai is Canva's eighth acquisition overall, and its second acquisition this year, coming three months after it bought UK design company Affinity for an estimated $380 million. Canva also owns presentations startup Zeetings, free stock photography sites Pixabay and Pexels, and Czech-based product mock-up app Smartmockups, end quote. Finally, would you believe, non-AI news. Meta has agreed to pay $1.4 billion to settle Texas's lawsuit accusing Meta of using facial recognition tech to collect biometric data of millions of Texans without consent. Quoting Reuters. The terms of the settlement disclosed on Tuesday marked the largest accord ever by any single state, according to the lawyers for Texas, whose legal team included the plaintiffs'
Starting point is 00:14:24 firm Keller Postman. The lawsuit, filed in 2022, was the first major case to be brought under Texas' 2009 biometric privacy law, according to law firms tracking the litigation. A provision of the law provides damages of up to $25,000 per violation. Texas accused Facebook of capturing biometric information billions of times from photos and videos that users uploaded to the social media platform as part of a free, discontinued feature called Tag Suggestions. A spokesperson for Meta said the company is pleased to resolve the matter and looks forward to, quote, exploring future opportunities to deepen our business investments in Texas, including potentially developing data centers.
Starting point is 00:15:02 It has continued to deny any wrongdoing. Texas and Meta said they reached an accord in May, weeks before the start of a trial in state court was scheduled to begin. Meta separately agreed to pay $650 million in 2020 to settle a biometric privacy class action that was brought under an Illinois privacy law that is considered one of the nation's most stringent. The company also denied wrongdoing.
Starting point is 00:15:23 Alphabet's Google separately is fighting a lawsuit by Texas accusing the company of violating the state's biometric law, end quote. Nothing more for you today. Talk to you tomorrow.
