Tech Brew Ride Home - Thu. 02/15 – We Already Have Gemini 1.5!

Starting point is 00:00:00 On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco. Hey, who did this to you? What happened next turned the story into a political firestorm. Reports have identified the victim as Bob Lee, the founder of Cash App. From Bloomberg Podcasts, this is Foundering: The Killing of Bob Lee, beginning April 16. Welcome to the Techmeme Ride Home for Thursday, February 15th, 2024. I'm Brian McCullough. Today, Google's moving fast: they've already launched Gemini 1.5. An EU court rules breaking encryption violates human rights,

Starting point is 00:00:47 why social media is flooded with posts of users returning their Apple Vision Pros, and more proof that the AI moment is making Nvidia one of the most powerful tech companies in the world. Here's what you missed today in the world of tech. They only rolled out Gemini 1.0 Ultra, what was it, last week? And Gemini itself is barely two months old. But Google today launched Gemini 1.5 to developers and enterprise users, offering support for one million tokens, and says Gemini 1.5 Pro is on par with its Gemini Ultra model. Quoting The Verge: There are a lot of improvements in Gemini 1.5. Gemini 1.5 Pro, the general-purpose model in Google's system, is apparently on par with the high-end Gemini Ultra that the company

Starting point is 00:01:34 only recently launched, and it bested Gemini 1.0 Pro on 87% of benchmark tests. It was made using an increasingly common technique known as mixture of experts, or MoE, which means it only runs part of the overall model when you send in a query rather than processing the whole thing the whole time. That approach should make the model both faster for you to use and more efficient for Google to run. But there's one new thing in Gemini 1.5 that has the whole company, starting with CEO Sundar Pichai, especially excited: Gemini 1.5 has an enormous context window, which means it can handle much larger

Starting point is 00:02:10 queries and look at much more information at once. That window is a whopping 1 million tokens, compared to 128,000 for OpenAI's GPT-4 and 32,000 for the current Gemini Pro. Tokens are a tricky metric to understand, so Pichai makes it simpler: it's about 10 or 11 hours of video, tens of thousands of lines of code. The context window means you can ask the AI bot about all of that content at once. Pichai also says Google's researchers are testing a 10 million token context window. That's, like, the whole series of Game of Thrones all at once. As he's explaining this to me, Pichai notes offhandedly that you can fit the entire Lord of the Rings trilogy into that context window. This seems too specific, so I ask him: this has already happened, hasn't it?

Starting point is 00:02:55 Someone at Google is just checking to see if Gemini spots any continuity errors, trying to understand the complicated lineage of Middle-earth, and seeing if maybe AI can finally make sense of Tom Bombadil. I'm sure it has happened, Pichai says with a laugh, or will happen, one of the two. Pichai also thinks the larger context window will be hugely useful for businesses. This allows use cases where you can add a lot of personal context and information at the moment of the query, he says. Think of it as we have dramatically expanded the query window.
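The mixture-of-experts idea The Verge describes earlier (only part of the model runs per query) is commonly implemented with a router that sends each input to its top-k experts. Below is a toy, pure-Python sketch of that general technique. It is not Gemini's architecture, which the source does not detail; the gate vectors, the four scaling "experts", and k=2 are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate: List[Vector], experts: List[Expert],
                k: int = 2) -> Tuple[Vector, List[int]]:
    """Route x to the top-k experts by gate score and mix their outputs.

    Only the k selected experts are evaluated; the others stay idle.
    Assumes every expert returns a vector the same length as x.
    """
    # One gate score per expert: dot product of x with that expert's gate row.
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    # Pick the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize only over the selected experts.
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # the only expert calls made
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


# Toy usage: hypothetical experts that scale their input and log each call.
calls: List[int] = []


def make_expert(idx: int, scale: float) -> Expert:
    def expert(x: Vector) -> Vector:
        calls.append(idx)
        return [scale * xi for xi in x]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, used = moe_forward([2.0, 1.0], gate, experts, k=2)
print(used, calls)  # [0, 3] [0, 3]: only two of the four experts ran
```

In a real MoE layer the router and experts are neural network layers trained together, and routing typically happens per token; the payoff is the same as in the toy: compute grows with k rather than with the total number of experts.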

Starting point is 00:03:23 He imagines filmmakers might upload their entire movie and ask Gemini what reviewers might say. He sees companies using Gemini to look over masses of financial records. I view it as one of the bigger breakthroughs we have done, he says, end quote. Quoting TechCrunch: Gemini 1.5 Pro can take in 700,000 words, or 30,000 lines of code, 35x the amount Gemini 1.0 Pro can handle. And, the model being multimodal, it's not limited to text: Gemini 1.5 Pro can ingest up to 11 hours of audio or an hour of video in a variety of different languages. To be clear, that's an upper bound. The version of Gemini 1.5 Pro available to most developers,

Starting point is 00:04:01 and customers starting today, in a limited preview, can only process around 100,000 words at once. Google's characterizing the large-data-input Gemini 1.5 Pro as experimental, allowing only developers approved as part of a private preview to pilot it via the company's generative AI dev tool, AI Studio. Several customers using Google's Vertex AI platform also have access to the large-data-input Gemini 1.5 Pro, but not all. During the private preview, Gemini 1.5 Pro with the 1 million token context window will be free

Starting point is 00:04:31 to use, Google says, but the company plans to introduce pricing tiers in the near future that start at the standard 128,000-token context window and scale up to 1 million tokens. I have to imagine the larger context window won't come cheap, and Google didn't allay fears by opting not to reveal pricing during the briefing. If pricing's in line with Anthropic's, it could cost $8 per million prompt tokens and $24 per million generated tokens. But perhaps it'll be lower; stranger things have happened. We'll have to wait and see, end quote. Google also made Gemini 1.0 Pro and Gemini 1.0 Ultra generally available today, adding support for adapter-based tuning in Vertex AI and rolling out new developer tools. A different species of regulatory ruling out of Europe today.
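To put the Gemini context-window and pricing figures from earlier in the segment into numbers, here is a rough estimator. It assumes a words-to-tokens ratio back-derived from TechCrunch's "700,000 words" per 1 million tokens (real tokenizers vary by language and content), and it uses TechCrunch's speculative $8 and $24 per-million-token prices from its Anthropic comparison; Google had not announced pricing. The window labels are hypothetical shorthand.

```python
from typing import Dict, List

# Assumed ratio, back-derived from TechCrunch: ~700,000 words per 1M tokens.
REF_WORDS = 700_000
REF_TOKENS = 1_000_000

# Context windows mentioned in the segment (labels are hypothetical shorthand).
CONTEXT_WINDOWS: Dict[str, int] = {
    "gemini-1.5-pro (private preview)": 1_000_000,
    "standard tier": 128_000,
    "gemini-1.0-pro": 32_000,
}

# Speculative USD prices per million tokens (TechCrunch's Anthropic-based guess).
PROMPT_USD_PER_M = 8.0
OUTPUT_USD_PER_M = 24.0


def estimate_tokens(word_count: int) -> int:
    """Ceiling-convert words to tokens with integer math (no float drift)."""
    return (word_count * REF_TOKENS + REF_WORDS - 1) // REF_WORDS


def windows_that_fit(word_count: int) -> List[str]:
    """Names of the context windows large enough for the estimated prompt."""
    needed = estimate_tokens(word_count)
    return [name for name, size in CONTEXT_WINDOWS.items() if needed <= size]


def estimate_cost_usd(prompt_tokens: int, output_tokens: int) -> float:
    """Cost of one request under the assumed per-million-token prices."""
    return (prompt_tokens * PROMPT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


full_prompt = estimate_tokens(700_000)                    # 1,000,000 tokens
print(windows_that_fit(700_000))                          # only the 1M-token window
print(round(estimate_cost_usd(full_prompt, 1_000), 3))    # 8.024
```

Under those assumptions, filling the full 1-million-token window once and getting a 1,000-token answer would cost about $8.02, almost all of it on the prompt side, which is why the long-context pricing tiers matter.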

Starting point is 00:05:22 The European Court of Human Rights has ruled that backdoors that weaken end-to-end encryption violate human rights law. This was in response to Russia requiring Telegram to decrypt messages beginning back in 2017. Quoting Ars Technica: The international court's decision could potentially disrupt the European Commission's proposed plans to require email and messaging service providers to create backdoors that would allow law enforcement to easily decrypt users' messages. This ruling came after Russia's intelligence agency, the Federal Security Service, or FSB, began requiring Telegram to share users' encrypted messages to deter, quote, terrorism-related activities in 2017, the ECHR's ruling said. A Russian Telegram user alleged that

Starting point is 00:06:07 the FSB's requirement violated his rights to a private life and private communications, as well as all Telegram users' rights. The Telegram user was apparently disturbed, moving to block required disclosures after Telegram refused to comply with an FSB order to decrypt messages of six users suspected of terrorism. According to Telegram, quote, it was technically impossible to provide the authorities with encryption keys associated with specific users, and therefore any disclosure of encryption keys would affect the, quote, privacy of the correspondence of all Telegram users, the ECHR's ruling said. For refusing to comply, Telegram was fined, and one court even ordered the app to be blocked in Russia, while dozens of Telegram users rallied to continue challenging

Starting point is 00:06:48 the order to maintain Telegram service in Russia. Ultimately, users' multiple court challenges failed, sending the case before the ECHR, while Telegram services seemingly remain tenuously available in Russia. In the end, the ECHR concluded that the Telegram user's rights had been violated, partly due to privacy advocates and international reports that corroborated Telegram's position that complying with the FSB's disclosure order would force changes impacting all its users. The confidentiality of communications is an essential element of the right to respect for private life and correspondence, the ECHR's ruling said. Requiring messages to be decrypted by law enforcement, quote, cannot be regarded as necessary in a

Starting point is 00:07:28 democratic society, end quote. Martin Husovec, a law professor who helped to draft EISI's testimony, told Ars that EISI is, quote, obviously pleased that the court has recognized the value of encryption and agreed with us that state-imposed weakening of encryption is a form of indiscriminate surveillance because it affects everyone's privacy, end quote. So the social media cycle has turned over. The socials are full right now of Apple Vision Pro users saying they are returning their devices. Why now? Well, it's because the traditional 14-day return window Apple offers on most of its products is closing. So if you got one just to test out and have any doubts at this

Starting point is 00:08:16 point, this is the time for folks to pull the ripcord. I'm not saying this suggests the AVP is a failed product; you can get a lot of attention online right now by saying anything about the AVP. But I thought this analysis from Cult of Mac was interesting. Quote: It's highly unusual to see hordes of early adopters returning a product from Apple, especially something as heavily hyped as Vision Pro. Without data from Apple, it's impossible to tell the real numbers. There could be a silent majority of people who are keeping their Vision Pros and keeping quiet about it. In fact, Cult of Mac's unscientific poll on X currently shows 55% of respondents plan to keep Vision Pro. However, if 45% of people did plan on returning their headset,

Starting point is 00:08:57 that would amount to a massive number of returns. Apple reportedly sold 180,000 headsets over the Vision Pro launch weekend. And I have definitely seen a lot more posts about returning Vision Pro than posts about keeping it. The main reasons for returning Vision Pro seem to fall into three camps: the headset seems too isolating; Vision Pro is too heavy and/or uncomfortable; and there is no compelling daily use case for the headset. Sebastiaan de With, developer of the highly rated Halide iPhone camera app, posted on X that he plans to return his headset. I'm returning mine, most likely, he wrote. It's cool tech, but it's a lot of money for an indie shop. Personally, I'm considering returning my Vision Pro because I haven't found it good for work. My feelings are well summarized by Quinn Nelson,

Starting point is 00:09:39 a Utah-based YouTube content creator who wrote that he has little desire to use the headset for work. Nelson was working on his computer, trying to get something done, and his Vision Pro was sitting right there, but he had no desire to put it on and work in the headset, preferring to work on his computer. I feel the same way. Even though I can project my computer screen much larger than my Mac's physical screen, Vision Pro is not better for work. Yes, movies and TV shows and immersive experiences are great, but they're not $3,500 worth of great. Plus, these experiences can't be easily shared, end quote. Just my two cents here: I think a lot of people actually ordered Vision Pros with the intention of trying them out and then sending them back. I'll

Starting point is 00:10:21 straight up tell you I considered doing that. But I also think a lot of people might be making the same calculation that a lot of us made when we didn't pull the trigger on buying: this is almost a beta product, so I'll wait for version two or three, once they've worked out more of the kinks. A lot of the social posts I've seen say just that. They liked what they experienced with Vision Pro, just not enough to pay $3,500, but they were sold on the idea of what it does. And once this comes in a cheaper, maybe different, form factor, they're planning on coming back. One more thing to think about: if there are a lot of returns, Apple will refurbish them and sell them as refurbished, right? Discounted? I wonder what sort of

Starting point is 00:11:02 price you could get one of these for. Maybe that's the strategy: wait for the discount bin. From the file of Sam Altman's ambitions, The Information is reporting that OpenAI has been developing a web search product, partly powered by Bing. Quote: It isn't clear whether the search product would be separate from ChatGPT, the chatbot OpenAI runs, which also uses Bing's index of the web to retrieve information to answer some questions. But ChatGPT, which runs in Microsoft's data centers, isn't as fast as Google in answering questions. OpenAI could be looking to speed up the service, which can be slow because it also does tasks like proofreading email drafts and generating poetry or computer code. If OpenAI launches the search service, it would further

Starting point is 00:11:52 heighten its rivalry with Google, which has scrambled to catch up to the startup in conversational AI. OpenAI relied in part on ex-Google employees to launch ChatGPT, and some of these employees previously developed machine learning models used by Google's search engine. The companies continue to compete fiercely for talent. The move to launch a search app comes a year after Microsoft CEO Satya Nadella said his company would, quote, make Google dance by incorporating artificial intelligence from OpenAI into Microsoft's Bing search engine. That partnership has failed to dent Google's search dominance, end quote. Since we're here, though, here are a couple of OpenAI feature updates.

Starting point is 00:12:29 They've started testing a memory feature that is on by default and lets ChatGPT and custom GPTs remember info about users and their previous conversations over time. Quoting The Verge: Memory works in one of two ways. You can tell ChatGPT to remember something specific about you: you always write code in JavaScript, your boss's name is Anna, your kid is allergic to sweet potatoes. Or ChatGPT can simply try to pick up those details

Starting point is 00:12:54 over time, storing information about you as you ask questions and get answers. In either case, the goal is for ChatGPT to feel a little more personal and a little smarter without needing to be reminded every time. Each custom GPT you use will have its own memory, too. OpenAI uses the Books GPT as an example: with memory turned on, it can automatically remember which books you've already read and which genres you like best. There are lots of places in the GPT Store where you can imagine memory being useful, for that matter. The Tutor Me GPT could offer a much better long-term course load once it knows what you know. Kayak could go straight to your favorite airlines and hotels. GymStreak could track your progress over time. In many ways, memory is a feature

Starting point is 00:13:34 ChatGPT desperately needs. It's also a total minefield. OpenAI's strategy here sounds a lot like the way other internet services learn about you: they watch you operate their services, learn what you search for or click on or like or whatever else, and develop a profile of you over time. By default, memory will be turned on, and OpenAI says memories will be used to train its models going forward. Companies using ChatGPT Enterprise and Team won't have their data sent back to the models. For now, memory is just a test, open to a, quote, small portion of users, the company said in its blog post announcing the feature. But it's easy to imagine how quickly this might become a core part of the way we interact with ChatGPT, for better or worse.

Starting point is 00:14:13 The bots are getting smarter, and they're getting to know us really fast, end quote. Stability AI has detailed Stable Cascade, a new image generation model built on the Würstchen architecture, which improves performance and accuracy compared to the SDXL architecture previously used. Quoting VentureBeat: Stability AI has been steadily iterating on its core Stable Diffusion model since 2022. The SDXL 1.0 release in July 2023 marked a new flagship release, which was further accelerated with the SDXL Turbo update in November 2023. Stable Cascade uses a somewhat different architecture than SDXL to generate images that Stability AI researchers hope will be more efficient. The new approach builds on the

Starting point is 00:15:01 Würstchen architecture, which uses a series of innovative techniques to improve performance and accuracy. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process, the Würstchen research abstract states. This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language, and this significantly reduces the computational requirements to achieve state-of-the-art results, end quote. Unlike Stable Diffusion, which uses a single large model, Stable Cascade utilizes a pipeline of three distinct smaller models, referred to as stages A, B, and C. This modular architecture

Starting point is 00:15:42 provides major advantages in training efficiency and customization. The first stage, Stage C, transforms text prompts into a compact 24x24-pixel latent. Stages A and B then decode these latents into full high-resolution images. By separating the text-to-image generation from the image decoding, the initial text-conditional model can be trained and fine-tuned much more efficiently. According to Stability AI, fine-tuning Stage C alone provides a 16x cost reduction compared to fine-tuning an equivalently sized single Stable Diffusion model. There is also the potential for direct preference optimization (DPO) to further improve image quality.
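The three-stage split VentureBeat describes above can be sketched as a data-flow pipeline that tracks only tensor shapes. This is an illustrative stand-in, not Stability AI's code: the stage functions are hypothetical placeholders for what are, in reality, trained neural networks, and apart from the 24x24 Stage C latent, every size below (16 latent channels, a 4x256x256 intermediate, a 1024x1024 RGB output) is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Only the 24x24 Stage C latent comes from the description above;
# every other size here is an assumption for illustration.
LATENT_SIDE = 24
LATENT_CHANNELS = 16            # assumption
INTERMEDIATE_SIDE = 256         # assumption
INTERMEDIATE_CHANNELS = 4       # assumption
IMAGE_SIDE = 1024               # assumption


@dataclass(frozen=True)
class Shape:
    channels: int
    height: int
    width: int

    def numel(self) -> int:
        return self.channels * self.height * self.width


def stage_c(prompt: str) -> Shape:
    """Hypothetical Stage C: text prompt -> compact semantic latent."""
    if not prompt:
        raise ValueError("Stage C needs a text prompt")
    return Shape(LATENT_CHANNELS, LATENT_SIDE, LATENT_SIDE)


def stage_b(semantic: Shape) -> Shape:
    """Hypothetical Stage B: compact latent -> larger intermediate latent."""
    assert semantic.height == LATENT_SIDE
    return Shape(INTERMEDIATE_CHANNELS, INTERMEDIATE_SIDE, INTERMEDIATE_SIDE)


def stage_a(intermediate: Shape) -> Shape:
    """Hypothetical Stage A: intermediate latent -> RGB pixels."""
    assert intermediate.height == INTERMEDIATE_SIDE
    return Shape(3, IMAGE_SIDE, IMAGE_SIDE)


def generate(prompt: str) -> List[Shape]:
    """Run C -> B -> A and return the shape produced at each stage."""
    c = stage_c(prompt)
    b = stage_b(c)
    a = stage_a(b)
    return [c, b, a]


shapes = generate("a corgi in a spacesuit")
print([(s.channels, s.height, s.width) for s in shapes])
# [(16, 24, 24), (4, 256, 256), (3, 1024, 1024)]
spatial_compression = IMAGE_SIDE / LATENT_SIDE
print(round(spatial_compression, 1))  # 42.7 per side
```

Because Stage C only ever emits the small latent, adapting it to new prompts or styles means training on far smaller tensors than a model working at the decoders' resolution, which is the intuition behind the fine-tuning savings; the 16x figure itself is Stability AI's claim and is not derived here.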

Starting point is 00:16:20 in a 2023 interview with Venture Beat, Stability AI founder and CEO Ahmad Mostak, explain that DPO is an alternative approach to reinforcement learning used in models to tune them to human preferences, end quote. And you know, I always tell you to take things like this with a grain of salt, so telling you to do that again. But, Invidia passed Alphabet on February 14th as the third most valuable. U.S. company and the world's fourth with a market cap of around $1.83 trillion. They overtook Amazon in terms of market cap just the day previous, quoting Bloomberg. Shares of Nvidia rose 2.5% on Wednesday, closing with a market capitalization of about $1.83 trillion, and topping the search giant's value of roughly $1.82 trillion, data compiled by Bloomberg's

Starting point is 00:17:13 show. With the gain, the chipmaker has become the world's fourth most valuable company. Saudi Aramco, valued at around $2 trillion, looms as the... next milestone. Invidia's rally has been relentless this year. The stock has climbed about 49% and added some $602 billion in value, boosted by an insatiable demand for its accelerators that powered data centers running complex computational tasks required by AI applications. While other big tech shares have hardly performed badly in 2024, juxtaposed with Nvidia's rally, they appear to be relegated to the slow lane. The other mega-cap tech firms have already announced earnings, and Nvidia is slated to report February 21st.

Starting point is 00:17:50 end quote. Nothing more for you today. Talk to you tomorrow.
