Tech Brew Ride Home - Wed. 05/17 – Open Source Vs. Centralized AI, Part II

Starting point is 00:00:04 Welcome to the Techmeme Ride Home for Wednesday, May 17th, 2023. I'm Brian McCullough. Today, if you've been letting some of your Google accounts sit fallow, you'd better take a look into that, because Google is going to start deleting things soon. Why some new top-level domains have people concerned, why tech companies are racing to put generative AI on your phone, and part two of the open source versus centralized AI debate. Here's what you missed today in the world of tech. This is perhaps urgent news for some of you out there. Google has updated its inactivity policy, and going forward, accounts inactive for at least two years will be deleted, except for those with YouTube videos attached. This will start happening in December. So, if you have old Google accounts that, I don't know, have emails or photos or whatnot from some earlier period in your life, and you've just been letting them sit there dormant because you've been assuming it's fine, yeah, you're going to want to make arrangements. Quoting 9to5Google: If a Google account has not been used or signed into for at least two years, Google will delete that personal account and its contents.

Starting point is 00:01:12 In addition to the email address becoming inaccessible, Gmail messages, Calendar events, Drive, Docs, and other Workspace files, as well as Google Photos backups, will be removed. At the moment, Google is not planning to delete accounts with YouTube videos. That would be tricky, as some old abandoned clips might have historical relevance. Google will start deleting inactive accounts in December at the earliest and take a phased approach, starting with what they say are, quote, accounts that were created and never used again. The company says it is, quote, going to roll this out slowly and carefully, end quote.

Starting point is 00:01:47 Before deleting an account, we will send multiple notifications over the months leading up to the deletion to both the account email address and the recovery email, if one has been provided, says Google. Meanwhile, this only applies to free Google accounts and not those managed by a business or school. What keeps an account active? Besides signing in periodically, being logged in and performing basic actions counts as activity. For example, reading or sending an email, like viewing an inactivity alert, using Google Drive, watching a YouTube video, downloading an app on the Google Play Store, using Google Search,

Starting point is 00:02:23 using Sign in with Google to sign into a third-party app or service. Additionally, Google tells us that using a signed-in Android device is considered activity. Google Photos already has a separate two-year sign-in and usage policy to be considered active. Meanwhile, accounts with active Play Store subscriptions, like Google One or third-party apps, are considered active. Today, Google recommends users assign a recovery email, and the company points users toward the Inactive Account Manager to decide what happens to their account and data when it becomes inactive for a period of up to 18 months. Options include sending files to trusted contacts, setting a Gmail auto-responder,

Starting point is 00:03:01 or account deletion. In making this change, Google cites security, as inactive accounts, often with old or reused passwords that may have been compromised, are more likely to be compromised. This also, quote, limits the amount of time Google retains your unused personal information, with this time frame considered to be an industry standard. Unlike other services with different security slash privacy implications, Google will not free up Gmail addresses to reclaim with deletions, end quote. Speaking of Google, cybersecurity researchers and IT admins have been raising concerns over two new Google top-level domains. The domains in question are .zip and .mov, and the concern is that, you know, those look like file extensions, and threat actors could use them for phishing and malware schemes. Quote, earlier this month, Google introduced eight new top-level domains, or TLDs, that could be purchased for hosting websites or

Starting point is 00:04:04 email addresses. The new domains are .dad, .esq, .prof for professor, .phd, .nexus, .foo, and for the topic of our article, the .zip and .mov domain TLDs. While the ZIP and MOV TLDs have been available since 2014, it wasn't until this month that they became generally available, allowing anyone to purchase a domain like bleepingcomputer.zip for a website. However, these domains could be perceived as risky, as the TLDs are also extensions of files commonly shared in forum posts, messages, and online discussions, which will now be automatically converted into URLs by some online platforms or applications.
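To illustrate the auto-linking concern just described, here is a minimal sketch of a naive linkifier. It is not how any particular platform works: the linkify function, the two tiny TLD sets, and the angle-bracket link format are all hypothetical, chosen only to show how a file name turns into a URL once its extension is on the list of known TLDs.

```python
import re

# Hypothetical, deliberately tiny TLD sets for illustration only.
TLDS_BEFORE = {"com", "org", "net"}
TLDS_AFTER = TLDS_BEFORE | {"zip", "mov"}

# Matches a bare "name.suffix" token such as "example.com" or "setup.zip".
TOKEN = re.compile(r"\b([a-z0-9-]+)\.([a-z]{2,})\b", re.IGNORECASE)


def linkify(text, tlds):
    """Wrap any token whose suffix is a known TLD in a link marker."""

    def to_link(match):
        if match.group(2).lower() in tlds:
            return f"<https://{match.group(0)}>"
        return match.group(0)  # Not a known TLD: leave the text alone.

    return TOKEN.sub(to_link, text)


message = "Unzip setup.zip, then open intro.mov"
print(linkify(message, TLDS_BEFORE))  # File names stay plain text.
print(linkify(message, TLDS_AFTER))   # File names now become links.
```

With the first set, the message passes through unchanged. With the second, both file names come back wrapped as https links, which is the behavior the article is worried about.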

Starting point is 00:04:47 Two common file types seen online are ZIP archives and MPEG-4 videos, whose file names end in .zip for ZIP archives or .mov for video files. Therefore, it's very common for people to post instructions containing file names with the .zip and .mov extensions. However, now that they are TLDs, some messaging platforms and social media sites will automatically convert file names with .zip and .mov extensions into URLs. For example, on Twitter, if you send someone instructions on opening a ZIP file and accessing a MOV file, the innocuous file names are converted into a URL, as shown below. When people see URLs in instructions, they commonly think that the URL can be used to download the associated file and may click on the link. For example, linking file names

Starting point is 00:05:35 to downloads is how we usually provide instructions on BleepingComputer in our articles, tutorials, and discussion forums. However, if a threat actor owned a .zip domain with the same name as a linkified file name, a person may mistakenly visit the site and fall for a phishing scam or download malware, thinking the URL is safe because it came from a trusted source. While it's very unlikely that threat actors will register thousands of domains to capture a few victims, you only need one corporate employee to mistakenly install malware for an entire network to be affected. Abuse of these domains is not theoretical, with cyber intel firm Silent Push Labs already discovering what appears to be a phishing page at microsoft-office.zip attempting to steal

Starting point is 00:06:16 Microsoft account credentials. These developments have sparked a debate among developers, security researchers, and IT admins, with some feeling the fears are not warranted and others feeling that the ZIP and MOV TLDs add unnecessary risk to an already risky online environment. Open source developer Matt Holt also requested that the ZIP TLD be removed from Mozilla's Public Suffix List, a list of all public top-level domains to be incorporated in applications and browsers. However, the PSL community quickly explained that while there may be a slight risk associated with these TLDs, they are still valid and should not be removed from the PSL, as it would affect the operation of legitimate sites. Removing existing TLDs from the PSL for this reason would

Starting point is 00:06:57 just be wrong. This list is used for many different reasons, and just because these entries are bad for one very specific use case, they are still needed for almost all others, explained software engineer Felix Fontein, end quote. Now for the day's AI news. It has become news, really, for our purposes when things like this get released. Stability AI took the wrappers off StableStudio, an open-source version of DreamStudio, its commercial interface for the text-to-image Stable Diffusion model. Quoting The Verge: making an open-source version of DreamStudio carries benefits for Stability AI. It allows community developers to improve and experiment with the interface, with the company potentially reaping the rewards conferred by these improvements.

Starting point is 00:07:46 Stability AI stressed community building in its press release, noting how from enabling local-first development to experimenting with a new plugin system, we've tried hard to make things extensible for external developers, end quote. Stability AI has previously leaned hard on its open source approach to create interest in its products. Various versions of Stable Diffusion have been freely available to download and tinker with since the model was publicly released back in August 2022, and last month, the company released a suite of open source large language models collectively called StableLM. Stability AI's founder and CEO Emad Mostaque has been outspoken about the importance of making AI

Starting point is 00:08:24 tools open source in order to increase public trust, claiming that open models will be essential for private data in a Zoom call with the press last month. However, the company's approach sometimes seems to lack direction. For example, StableStudio will be available alongside DreamStudio and potentially compete with it. The company has previously said it plans to generate revenue by creating customized versions of DreamStudio for corporate clients, but it's not clear how successful this strategy has been. Recent reports suggest the firm is burning through cash and note that its most important models, like Stable Diffusion, were built in collaboration with other parties, end quote. We've been talking about the open versus centralized AI debate, but another sort of angle to

Starting point is 00:09:11 that debate is going to some sort of cloud provider to use generative tools versus being able to run such tools locally on your own hardware. So, interesting article from the Financial Times looking at how tech companies are racing to put generative AI natively on mobile devices. Quote, as tech companies rush to embed generative AI into their software and services, they face significantly higher computing costs. The concern has weighed in particular on Google, with Wall Street analysts warning that the company's profit margins could be squeezed if internet search users come to expect AI-generated content in standard search results.

Starting point is 00:09:48 Running generative AI on mobile handsets, rather than through the cloud on servers operated by big tech groups, could answer one of the biggest economic questions raised by the latest tech fad. Google said last week that it had managed to run a version of PaLM 2, its largest large language model, on a Samsung Galaxy handset. Though it did not publicly demonstrate the scaled-down model, called Gecko, the move is the latest sign that a form of AI that has required computing resources only found in a data center is quickly starting to find its way into more places. The shift could make

Starting point is 00:10:19 services such as chatbots far cheaper for companies to run and pave the way for more transformative applications using generative AI. You need to make the AI hybrid, running in both the data center and locally, otherwise it will cost too much money, Cristiano Amon, chief executive of mobile chip company Qualcomm, told the Financial Times. Tapping into the unused processing power on mobile handsets was the best way to spread the cost, he said. Handsets lack the memory, though, to hold large models like the one behind ChatGPT, as well as the processing power required to run them. Generating a response to a query on a device, rather than waiting for a remote data center to produce a result, could reduce the latency, or delay, from using an application. When a user's

Starting point is 00:11:02 personal data is used to refine the generative responses, keeping all the processing on the handset could also enhance privacy. More than anything, generative AI could make it easier to carry out common activities on a smartphone, for instance, when it comes to things that involve producing text. You could embed the AI in every office application. You get an email. It suggests a response, said Amon. You're going to need the ability to run those things locally as well as on the data center, end quote. Rapid advances in some of the underlying models have changed the equation. The biggest and most advanced, such as Google's PaLM 2 and OpenAI's GPT-4, have hogged the headlines. But an explosion of smaller models has made some of the same

Starting point is 00:11:41 capabilities available in less technically demanding ways. These have benefited in part from new techniques for tuning language models, based on a more careful curation of the data sets they are trained on, reducing the amount of information they need to hold. According to Arvind Krishna, chief executive of IBM, most companies that look to use generative AI in their own services will get much of what they need by combining a number of these smaller models. Speaking last week, as IBM announced a technology platform to help its customers tap into generative AI, he said that many would opt to use open source models where the code was more transparent and could be adapted, in part because it would be easier to fine-tune the technology using their own data. Some of the

Starting point is 00:12:18 smaller models have already demonstrated surprising capabilities. They include LLaMA, an open-source language model released by Meta, which is claimed to have matched many of the features of the largest systems. LLaMA comes in various sizes, the smallest of which has only 7 billion parameters, far fewer than the 175 billion of GPT-3, the breakthrough language model OpenAI released in 2020. The number of parameters in GPT-4, released this year, has not been disclosed. A research model based on LLaMA and developed at Stanford University has already been shown running on one of Google's Pixel 6 handsets. As well as their far smaller size, the open source nature of models such as this has also made it easier for researchers

Starting point is 00:12:58 and developers to adapt them for different computing environments. Qualcomm earlier this year showed off what it claimed was the first Android handset running Stable Diffusion's image generation model, which has about one billion parameters. The chipmaker had quantized, or cut down, the model size to run it more easily on a handset without losing any of its accuracy, said Ziad Asghar, a senior vice president at Qualcomm, end quote. And finally today, as promised, here's Ben Thompson's big recent essay on this debate between open-source AI and, let's call it, platform-based AI. He says that Google's recent I/O suggests to him that AI is going to be a sustaining innovation for big technology, not a disruptor. The true fight will be between the major players' centralized models and the open source models.
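As a back-of-the-envelope aside on the parameter counts and quantization mentioned in the Financial Times piece, the sketch below estimates how much memory a model's weights alone would need at different precisions. The 7 billion and 175 billion figures come from the article. The 16, 8, and 4 bit widths are assumptions for illustration, since the article does not say what precision any of these companies used, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts quoted in the article.
MODELS = {"7 billion": 7e9, "175 billion": 175e9}

# Assumed precisions, for illustration only.
BIT_WIDTHS = (16, 8, 4)

for name, params in MODELS.items():
    for bits in BIT_WIDTHS:
        gb = weight_memory_gb(params, bits)
        print(f"{name} parameters at {bits}-bit: about {gb:g} GB")
```

Under these assumptions, the 7 billion parameter model drops from about 14 GB at 16-bit to about 3.5 GB at 4-bit, while the 175 billion parameter model still needs about 87.5 GB at 4-bit, which gives a rough sense of why smaller, quantized models are the ones showing up on handsets.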

Starting point is 00:13:52 Quote, over the past seven years, Google's primary business model innovation has been to cram ever more ads into search, a particularly effective tactic on mobile. And, to be fair, the sort of searches where Google makes the most money, travel, insurance, etc., may not be well suited for chat interfaces anyways. That, though, ought only increase the concern for Google's management that generative AI may, in the specific context of search, represent a disruptive innovation instead of a sustaining one. Disruptive innovation is, at least in the beginning, not as good as what already exists. That's why it is easily dismissed by managers who

Starting point is 00:14:26 can avoid thinking about the business model challenges by correctly telling themselves that their current product is better. The problem, of course, is that the disruptive product gets better, even as the incumbent's product becomes ever more bloated and hard to use, and that certainly sounds a lot like Google Search's current trajectory. I tend to believe that disruptive innovations are actually quite rare, but when they come, they are basically impossible for the incumbent company to respond to. Their business models, shareholders, and most important customers make it impossible for management to respond. If that is true, though, then an incumbent responding is in fact evidence that innovation is actually not disruptive, but sustaining.

Starting point is 00:15:03 To that end, I take this Google I/O as evidence that AI is in fact a sustaining technology for all of big tech, including Google. Moreover, if that is the case, then that is a reason to be less bearish on the search company, because all of the reasons to expect them to have a leadership position, from capabilities to data to infrastructure to a plethora of consumer touchpoints, remain. Open source models running locally might be a big boon to Apple, but they are the truly disruptive threat to centralized companies like Google and OpenAI. I think it is meaningful, though, that Google made clear it views AI as a sustaining innovation and that it intends to fully implement generative AI across its business, including search. Of course, that means there

Starting point is 00:15:44 are battles to come within that context. The aggressiveness and competitiveness we've seen from these large tech companies is a refreshing change from the stasis of the previous decade. At the same time, the fact that all of big tech is on board and, given their supernatural, supranational, I should say, nature, will inevitably be incentivized to be a helpful and engaged partner to regulators all around the world, suggests that the true fight will be between centralized models, which regulators will more easily work with, and open source. In this view, the recently proposed EU regulations for AI and the threatened crackdown on OpenAI via GDPR are simply the first salvo in what may be the defining war of the digital era.

Starting point is 00:16:25 Will centralized and thus controllable entities win, or will there be a flowering on the fringe of open models that truly explore the potential of AI for better or for worse? End quote. This is a long essay. It's expansive. I only dipped in and out to give you a taste. I highly recommend you read the whole thing.

Starting point is 00:16:42 Final link in today's show notes. Beep, beep. Who's got the keys to the Jeep? Vroom. I'm driving to the beach. Top down, loud sounds, see my peeps. Oddly, that's a Zelda reference, although of course it's Missy Elliott. If you're playing the new Zelda game, then you know that, to a surprising degree, Tears of the Kingdom is a truck and cart, and also maybe airship and rocket ship, building sim.

Starting point is 00:17:14 Replace peeps in those lyrics with Koroks, and you get the point. Talk to you tomorrow.
