Tech Brew Ride Home - Thu. 02/15 – We Already Have Gemini 1.5!
Episode Date: February 15, 2024
Google’s moving fast. They’ve already launched Gemini 1.5. An EU court rules breaking encryption violates human rights. Why social media is flooded with posts of users returning their Apple Vision... Pros. And more proof that the AI moment is making Nvidia one of the most powerful tech companies in the world.
Links:
Gemini 1.5 is Google’s next-gen AI model — and it’s already almost ready (The Verge)
Google’s new Gemini model can analyze an hour-long video — but few people can use it (TechCrunch)
Backdoors that let cops decrypt messages violate human rights, EU court says (Ars Technica)
People are returning Vision Pro in droves … or are they? (Cult of Mac)
OpenAI Develops Web Search Product in Challenge to Google (The Information)
ChatGPT is getting ‘memory’ to remember who you are and what you like (The Verge)
What comes after Stable Diffusion? Stable Cascade could be Stability AI’s future text-to-image generative AI model (VentureBeat)
Nvidia Overtakes Alphabet, One Day After Eclipsing Amazon (Bloomberg)
Transcript
On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco.
Hey, who did this to you?
What happened next turned the story into a political firestorm.
Reports have identified the victim as Bob Lee, the founder of Cash App.
From Bloomberg Podcasts, this is Foundering, the Killing of Bob Lee, beginning April 16.
Welcome to the Techmeme Ride Home for Thursday, February 15th,
2024. I'm Brian McCullough. Today: Google's moving fast. They've already launched Gemini 1.5.
An EU court rules breaking encryption violates human rights,
why social media is flooded with posts of users returning their Apple Vision Pros,
and more proof that the AI moment is making Nvidia one of the most powerful tech companies in the world.
Here's what you missed today in the world of tech.
They only rolled out Gemini 1, what was it, last week? And Gemini itself
is barely two months old, but Google today launched Gemini 1.5 to developers and enterprise users,
offering support for one million tokens, and says Gemini 1.5 Pro is on par with its Gemini Ultra
model, quoting The Verge. There are a lot of improvements in Gemini 1.5. Gemini 1.5 Pro, the general
purpose model in Google's system, is apparently on par with the high-end Gemini Ultra that the company
only recently launched, and it bested Gemini 1.0 Pro on 87% of benchmark tests.
It was made using an increasingly common technique known as mixture of experts, or MoE,
which means it only runs part of the overall model when you send in a query rather than
processing the whole thing the whole time.
That approach should make the model both faster for you to use and more efficient for Google
to run.
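To make the mixture-of-experts idea concrete, here is a minimal sketch of top-k routing in plain Python. It is only an illustration of the general technique, not Google's implementation: the gate, the toy "experts" (which just scale their input), and names like `moe_forward` and `top_k` are all assumptions made for this example.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical stand-in for a feed-forward sub-network:
    # it simply multiplies every input value by a constant.
    return lambda x: [scale * v for v in x]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Run only the top_k highest-scoring experts for input x."""
    # Gate: one score per expert (dot product of x with that expert's gate row).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Pick the top_k experts; the others are never evaluated for this input.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the chosen experts
        expert_out = experts[i](x)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen

random.seed(0)
NUM_EXPERTS, DIM = 8, 4
experts = [make_expert(scale=i + 1) for i in range(NUM_EXPERTS)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

x = [0.5, -1.0, 2.0, 0.1]
y, used = moe_forward(x, gate_weights, experts, top_k=2)
print(f"ran {len(used)} of {NUM_EXPERTS} experts: {sorted(used)}")
```

With `top_k=2` and eight experts, only a quarter of the expert computation runs per input, which is the source of the speed and efficiency gains the quote describes.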
But there's one new thing in Gemini 1.5 that has the whole company, starting with CEO Sundar Pichai,
especially excited. Gemini 1.5 has an enormous context window, which means it can handle much larger
queries and look at much more information at once. That window is a whopping 1 million tokens compared to
128,000 for OpenAI's GPT-4 and 32,000 for the current Gemini Pro. Tokens are a tricky metric to
understand, so Pichai makes it simpler. It's about 10 or 11 hours of video, tens of thousands of lines of
code. The context window means you can ask the AI bot about all of that content at once.
Pichai also says Google's researchers are testing a 10 million token context window. That's like
the whole series of Game of Thrones all at once. As he's explaining this to me,
Pichai notes offhandedly that you can fit the entire Lord of the Rings trilogy into that
context window. This seems too specific, so I ask him, this has already happened, hasn't it?
Someone in Google is just checking to see if Gemini spots any continuity errors trying to
understand the complicated lineage of Middle Earth and seeing if maybe AI can finally make sense
of Tom Bombadil.
I'm sure it has happened, Pichai says with a laugh, or will happen, one of the two.
Pichai also thinks the larger context window will be hugely useful for businesses.
This allows use cases where you can add a lot of personal context and information at the
moment of the query, he says.
Think of it as we have dramatically expanded the query window.
He imagines filmmakers might upload their entire movie and ask Gemini what reviewers might say.
He sees companies using Gemini to look over masses of financial records.
I view it as one of the bigger breakthroughs we have done, he says, end quote.
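As a rough illustration of what those context window sizes mean in practice, the sketch below estimates whether a document of a given length fits in each window quoted above. The window sizes come from the quote; the words-per-token ratio and the sample document lengths are assumptions for this example, and real tokenizers will give different counts.

```python
import math

# Context window sizes in tokens, as quoted in the passage above.
CONTEXT_WINDOWS = {
    "current Gemini Pro": 32_000,
    "GPT-4": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}

# Assumption: roughly 0.75 English words per token. Actual ratios depend
# on the tokenizer and the text, so treat every result as an estimate.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count):
    # Convert a word count to an estimated token count, rounding up.
    return math.ceil(word_count / WORDS_PER_TOKEN)

def models_that_fit(word_count):
    # Names of the models whose window can hold the whole document at once.
    needed = estimate_tokens(word_count)
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]

# Hypothetical document sizes, chosen only to show the three regimes.
for label, words in [("long memo", 20_000), ("novel", 90_000), ("multi-volume epic", 500_000)]:
    print(f"{label}: ~{estimate_tokens(words):,} tokens, fits in {models_that_fit(words)}")
```

Under these assumptions a 500,000-word text only fits in the 1 million token window, which is the kind of whole-trilogy query Pichai is describing.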
Quoting TechCrunch. Gemini 1.5 Pro can take in 700,000 words or 30,000 lines of code,
35x the amount Gemini 1.0 Pro can handle, and the model being multimodal, it's not limited to text.
Gemini 1.5 Pro can ingest up to 11 hours of audio or an hour of video in a variety of different languages.
To be clear, that's an upper bound.
The version of Gemini 1.5 Pro available to most developers
and customers starting today in a limited preview
can only process around 100,000 words at once.
Google's characterizing the large-data-input Gemini 1.5 Pro as experimental,
allowing only developers approved as part of a private preview
to pilot it via the company's GenAI dev tool AI Studio.
Several customers using Google's Vertex AI platform
also have access to the large-data-input Gemini 1.5 Pro,
but not all. During the private preview, Gemini 1.5 Pro with the 1 million token context window will be free
to use, Google says, but the company plans to introduce pricing tiers in the near future that start
at the standard 128,000 context window and scale up to 1 million tokens. I have to imagine the larger
context window won't come cheap, and Google didn't allay fears by opting not to reveal pricing during
the briefing. If pricing's in line with Anthropic's, it could cost $8 per million prompt
tokens and $24 per million generated tokens, but perhaps it'll be lower. Stranger things have happened,
we'll have to wait and see, end quote. Google also made Gemini 1.0 Pro and Gemini 1.0 Ultra
generally available today, adding support for adapter-based tuning in Vertex and rolling out new
developer tools. A different flavor of regulatory ruling out of Europe today.
The European Court of Human Rights has ruled that backdoors that weaken
end-to-end encryption violate human rights law. This was in response to Russia requiring
Telegram to decrypt messages beginning back in 2017, quoting Ars Technica. The international
Court's decision could potentially disrupt the European Commission's proposed plans to require
email and messaging service providers to create backdoors that would allow law enforcement to
easily decrypt users' messages. This ruling came after Russia's intelligence agency, the Federal Security
Service, or FSS, began requiring Telegram to share users' encrypted messages to deter, quote,
terrorism-related activities in 2017, ECHR's ruling said. A Russian Telegram user alleged that
FSS's requirement violated his rights to a private life and private communications, as well as
all Telegram users' rights. The Telegram user was apparently disturbed, moving to block required
disclosures after Telegram refused to comply with an FSS order to decrypt messages on six users
suspected of terrorism. According to Telegram, quote, it was technically impossible to provide the
authorities with encryption keys associated with specific users, and therefore any disclosure of
encryption keys would affect the, quote, privacy of the correspondence of all Telegram users,
the ECHR's ruling said. For refusing to comply, Telegram was fined, and one court even ordered
the app to be blocked in Russia, while dozens of Telegram users rallied to continue challenging
the order to maintain Telegram service in Russia. Ultimately, users' multiple court
challenges failed, sending the case before the ECHR, while Telegram services seemingly tenuously remain
available in Russia. In the end, the ECHR concluded that the Telegram user's rights had been violated,
partly due to privacy advocates and international reports that corroborated Telegram's position
that complying with the FSB's disclosure would force changes impacting all its users.
The confidentiality of communications is an essential element of the right to respect for
private life and correspondence, the ECHR's ruling said.
Requiring messages to be decrypted by law enforcement, quote, cannot be regarded as necessary in a
democratic society, end quote. Martin Husovec, a law professor who helped to draft EISi's testimony,
told Ars that EISi is, quote, obviously pleased that the court has recognized the value of
encryption and agreed with us that state-imposed weakening of encryption is a form of indiscriminate
surveillance because it affects everyone's privacy, end quote.
So the social media cycle has turned over. The socials
are full right now of Apple Vision Pro users saying they are returning their Apple Vision
Pro devices. Why now? Well, it's because the traditional 14-day return window Apple offers on
most of its products is closing. So if you got one just to test out and have any doubts at this
point, this is the time for folks to pull the ripcord. Not saying this is suggesting the AVP
is a failed product. You can just get a lot of attention these days by saying anything about the
AVP online right now. But I thought this analysis from Cult of Mac was interesting, quote,
it's highly unusual to see hordes of early adopters returning a product from Apple, especially
something as heavily hyped as Vision Pro. Without data from Apple, it's impossible to tell the real
numbers. There could be a silent majority of people who are keeping their Vision Pros and keeping
quiet about it. In fact, Cult of Mac's unscientific poll on X currently shows 55% of respondents
plan to keep Vision Pro. However, if 45% of people did plan on returning their headset,
that would amount to a massive number of returns. Apple reportedly sold 180,000 headsets over the Vision Pro
launch weekend. And I have definitely seen a lot more posts about returning Vision Pro than posts about
keeping it. The main reasons for returning Vision Pro seem to fall into three camps. The headset
seems too isolating. Vision Pro is too heavy and or uncomfortable. There is no compelling daily use
case for the headset. Sebastiaan de With, developer of the highly rated Halide iPhone camera app,
posted on X that he plans to return his headset. I'm returning mine, most likely, he wrote,
it's cool tech, but it's a lot of money for an indie shop. Personally, I'm considering returning
my Vision Pro because I haven't found it good for work. My feelings are well summarized by Quinn Nelson,
a Utah YouTuber content creator who wrote that he has little desire to use the headset for
work. Nelson was working on his computer, trying to get something done, and his Vision Pro was
sitting right there, but he had no desire to put it on to work in the headset, preferring to work
on his computer. I feel the same way, even though I can project my computer screen much larger
than my Mac's physical screen, Vision Pro is not better for work. Yes, movies and TV shows and immersive
experiences are great, but they're not $3,500 worth of great. Plus, these experiences can't be
easily shared, end quote. Just my two cents here, Brian, I think a lot of people actually
ordered Vision Pros with the intention of trying them out and then sending them back. I'll
straight up tell you I considered doing that. But also, I think a lot of people might also be
making the same calculation that a lot of us made when we didn't pull the trigger on buying.
This is a beta product almost. I'll wait for version two or three once they've worked out
more of the kinks. A lot of the social posts I've seen say just that. They say that they
liked what they experienced with Vision Pro, just not enough to pay $3,500, but they were sold on the
idea of what it does. And once this comes in a cheaper, maybe different,
form factor, they're planning on coming back. One more thing to think about. If there are a lot of
returns, Apple will refurbish them and sell them as refurbished, right? Discounted? I wonder what sort of
price you could get one of these for. Maybe that's the strategy. Wait for the discount bin.
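For a sense of scale, the figures in the Cult of Mac quote can be turned into a quick back-of-the-envelope estimate. This is only a sketch: it assumes the unscientific poll's 45% applies to all 180,000 reported launch-weekend units and that every unit sold at the $3,500 base price, neither of which the source claims.

```python
# Figures quoted above; how they are combined is an assumption of this sketch.
LAUNCH_WEEKEND_UNITS = 180_000   # reported launch-weekend sales
POLL_KEEP_PERCENT = 55           # share of poll respondents planning to keep it
BASE_PRICE_USD = 3_500           # assumes every unit sold at the base price

def implied_returns(units, keep_percent):
    # Integer arithmetic avoids floating-point rounding on the percentage.
    return units * (100 - keep_percent) // 100

returned_units = implied_returns(LAUNCH_WEEKEND_UNITS, POLL_KEEP_PERCENT)
refund_value = returned_units * BASE_PRICE_USD

print(f"implied returns: {returned_units:,} headsets")
print(f"implied refund value: ${refund_value:,}")
```

Taken at face value that is 81,000 headsets, or about $283.5 million in refunds. People who answer a poll on X are a self-selected group, so the real return rate could be far from 45% in either direction.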
From the file of Sam Altman's ambitions, The Information is reporting that OpenAI has been
developing a web search product, partly powered by Bing. Quote, it isn't clear whether the search
product would be separate from ChatGPT, the chatbot OpenAI runs, and which also uses Bing's
index of the web to retrieve information to answer some questions. But ChatGPT, which runs in Microsoft's
data centers, isn't as fast as Google in answering questions. OpenAI could be looking to speed
up the service, which can be slow because it also does tasks like proofreading email drafts,
generating poetry or computer code. If OpenAI launches the search service, it would further
heighten its rivalry with Google, which has scrambled to catch up to the
startup in conversational AI. OpenAI relied in part on ex-Google employees to launch ChatGPT,
and some of these employees previously developed machine learning models used by Google's search
engine. The companies continue to compete fiercely for talent. The move to launch a search app
comes a year after Microsoft's CEO Satya Nadella said his company would, quote,
make Google dance by incorporating artificial intelligence from OpenAI into Microsoft's Bing
search engine. That partnership has failed to dent Google's search dominance, end quote.
Since we're here, though, I'll also update you on a bit of OpenAI feature updates.
They've started testing a memory feature that is on by default and lets ChatGPT and custom GPTs
remember info about users and their previous conversations over time, quoting The Verge.
Memory works in one of two ways.
You can tell ChatGPT to remember something specific about you.
You always write code in JavaScript.
Your boss's name is Anna.
Your kid is allergic to sweet potatoes.
Or ChatGPT can simply try to pick up those details
over time, storing information about you as you ask questions and get answers. In either case,
the goal is for ChatGPT to feel a little more personal and a little smarter without needing
to be reminded every time. Each custom GPT you use will have its own memory too. OpenAI uses the
Books GPT as an example. With memory turned on, it can automatically remember which books you've
already read and which genres you like best. There are lots of places in the GPT Store
you can imagine memory might be useful, for that matter. The Tutor Me could offer a much
better long-term course load once it knows what you know. Kayak could go straight to your favorite
airlines and hotels. GymStreak could track your progress over time. In many ways, memory is a feature
ChatGPT desperately needs. It's also a total minefield. OpenAI's strategy here sounds a lot like
the way other internet services learn about you. They watch you operate their services,
learn about what you search for or click on or like or whatever else, and develop a profile of you
over time. By default, memory will be turned on, and OpenAI says memories will be used to train
its models going forward. Companies using ChatGPT Enterprise and Teams won't have their data sent back
to the models. For now, memory is just a test, open to a, quote, small portion of users,
the company said in its blog post announcing the feature, but it's easy to imagine how quickly
this might become a core part of the way we interact with ChatGPT for better or worse.
The bots are getting smarter, and they're getting to know us really fast, end quote.
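As a toy illustration of the behavior described in that quote (explicit "remember this" facts, a separate memory per custom GPT, and an on-by-default switch), here is a minimal sketch. It says nothing about how OpenAI actually builds the feature; `MemoryStore` and all of its methods are hypothetical names invented for this example, and the second mode, where details are picked up automatically from conversation, is left out.

```python
class MemoryStore:
    """Hypothetical per-assistant memory; not OpenAI's design."""

    def __init__(self, enabled=True):
        # On by default, mirroring the behavior described in the quote.
        self.enabled = enabled
        self._facts = {}  # assistant name -> list of remembered facts

    def remember(self, assistant, fact):
        # Explicit mode only: the user asks for a specific fact to be kept.
        if not self.enabled:
            return
        facts = self._facts.setdefault(assistant, [])
        if fact not in facts:
            facts.append(fact)

    def recall(self, assistant):
        # Each assistant sees only its own memories, and none when disabled.
        if not self.enabled:
            return []
        return list(self._facts.get(assistant, []))

    def build_prompt(self, assistant, user_message):
        # Prepend remembered facts so the user need not repeat them.
        facts = self.recall(assistant)
        if not facts:
            return user_message
        header = "Known about the user:\n" + "\n".join(f"- {f}" for f in facts)
        return f"{header}\n\n{user_message}"

store = MemoryStore()
store.remember("ChatGPT", "Always writes code in JavaScript")
store.remember("Books GPT", "Has already read Dune")
print(store.build_prompt("ChatGPT", "Write a function that reverses a string."))
```

Even in this toy version the privacy question from the quote is visible: whatever lands in the store gets replayed into later prompts until memory is switched off.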
Stability AI has detailed Stable Cascade, a new image generation model built on the Würstchen architecture,
which improves performance and accuracy compared to the SDXL architecture previously used,
quoting VentureBeat. Stability AI has been steadily iterating on its core Stable Diffusion model
since 2022. The SDXL 1.0 release in July 2023 marked a new flagship release,
which was further accelerated with the SDXL Turbo update in November 2023.
Stable Cascade uses somewhat of a different architecture than SDXL to generate images that
Stability AI researchers hope will be more efficient. The new approach builds on the
Würstchen architecture, which uses a series of innovative techniques to improve performance and
accuracy. A key contribution of our work is to develop a latent diffusion technique in which we
learn a detailed but extremely compact semantic image representation used to guide the diffusion
process, the Würstchen research abstract states. This highly compressed representation of an
image provides much more detailed guidance compared to latent representations of language, and this
significantly reduces the computational requirements to achieve state-of-the-art results, end quote.
Unlike Stable Diffusion, which uses a single large model, Stable Cascade utilizes a pipeline
of three distinct smaller models referred to as stages A, B, and C. This modular architecture
provides major advantages in training efficiency and customization. The first stage, stage C,
transforms text prompts into a compact 24 by 24 pixel latent.
Stages A and B then decode these latents into full high-resolution images.
By separating the text-to-image generation from the image decoding,
the initial text conditional model can be trained and fine-tuned much more efficiently.
According to Stability AI, fine-tuning Stage C alone provides a 16x cost reduction
compared to fine-tuning an equivalently sized Stable Diffusion model.
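The three-stage layout described above can be sketched as a chain of functions. This is a structural toy only: the 24 by 24 latent comes from the quote, while the 256 intermediate size, the 1024 output size, and the nearest-neighbor upscaling that stands in for the real decoders are assumptions for illustration. No actual diffusion or image generation happens here.

```python
import random

LATENT_SIDE = 24      # from the quote: Stage C emits a 24x24 latent
STAGE_B_SIDE = 256    # assumption: intermediate latent size after Stage B
IMAGE_SIDE = 1024     # assumption: final image size after Stage A

def resize_nearest(grid, side):
    # Nearest-neighbor resize of a square grid; a stand-in for a learned decoder.
    src = len(grid)
    return [[grid[r * src // side][c * src // side] for c in range(side)]
            for r in range(side)]

def stage_c(prompt, side=LATENT_SIDE):
    # Stand-in for the text-conditional model: deterministic noise seeded by the prompt.
    rng = random.Random(prompt)
    return [[rng.random() for _ in range(side)] for _ in range(side)]

def stage_b(latent):
    # Stand-in for the first decoding step (compact latent -> larger latent).
    return resize_nearest(latent, STAGE_B_SIDE)

def stage_a(latent):
    # Stand-in for the final decoding step (larger latent -> full-size image).
    return resize_nearest(latent, IMAGE_SIDE)

def generate(prompt):
    # Text -> compact latent (C), then decode in two steps (B, then A).
    return stage_a(stage_b(stage_c(prompt)))

image = generate("a lighthouse at dusk")
per_side_compression = IMAGE_SIDE / LATENT_SIDE
print(f"{len(image)}x{len(image[0])} image from a {LATENT_SIDE}x{LATENT_SIDE} latent "
      f"(~{per_side_compression:.1f}x per side)")
```

The point of the split is visible in the code: adapting the model to new prompts or styles only touches `stage_c`, which works on the small latent, while the two decoders stay fixed. That is the intuition behind the fine-tuning savings Stability AI cites.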
There is also the potential for direct preference optimization, or DPO, to further improve image quality.
In a 2023 interview with VentureBeat, Stability AI founder and CEO Emad Mostaque
explained that DPO is an alternative approach to reinforcement learning used in models to tune them to human preferences, end quote.
And you know, I always tell you to take things like this with a grain of salt, so telling you to do that again.
But Nvidia passed Alphabet on February 14th as the third most valuable
U.S. company and the world's fourth, with a market cap of around $1.83 trillion. They overtook
Amazon in terms of market cap just the day previous, quoting Bloomberg. Shares of
Nvidia rose 2.5% on Wednesday, closing with a market capitalization of about $1.83 trillion,
and topping the search giant's value of roughly $1.82 trillion, data compiled by Bloomberg
show. With the gain, the chipmaker has become the world's fourth most valuable company.
Saudi Aramco, valued at around $2 trillion, looms as the
next milestone. Nvidia's rally has been relentless this year. The stock has climbed about
49% and added some $602 billion in value, boosted by an insatiable demand for its accelerators
that power data centers running complex computational tasks required by AI applications.
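The quoted figures can be cross-checked with a little arithmetic: a 49% gain ending at about $1.83 trillion implies a starting value, and the difference should land near the $602 billion Bloomberg cites. The sketch below works from the rounded numbers in the quote, so small discrepancies are expected, and the variable names are just for this example.

```python
# Rounded figures from the Bloomberg quote above.
MARKET_CAP_NOW = 1.83e12   # about $1.83 trillion
YTD_GAIN = 0.49            # about 49% year to date
ARAMCO_CAP = 2.0e12        # about $2 trillion

# Work backward from the current value to the implied start-of-year value.
start_of_year_cap = MARKET_CAP_NOW / (1 + YTD_GAIN)
value_added = MARKET_CAP_NOW - start_of_year_cap

# Further rise needed to match Saudi Aramco, holding Aramco's value fixed.
gain_to_match_aramco = ARAMCO_CAP / MARKET_CAP_NOW - 1

print(f"implied start-of-year cap: ${start_of_year_cap / 1e12:.2f} trillion")
print(f"implied value added: ${value_added / 1e9:.0f} billion")
print(f"further gain needed to match Aramco: {gain_to_match_aramco:.1%}")
```

The implied value added comes out at roughly $602 billion, so the percentage gain and the dollar figure in the quote are consistent with each other.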
While other big tech shares have hardly performed badly in 2024, juxtaposed with
Nvidia's rally, they appear to be relegated to the slow lane. The other mega-cap tech firms
have already announced earnings, and Nvidia is slated to report February 21st.
end quote. Nothing for you today. Talk to you tomorrow.
