The AI Daily Brief: Artificial Intelligence News and Analysis - Open Interpreter: How People Are Using Github's Hottest AI Project
Episode Date: September 12, 2023
Open Interpreter is a local, private version of GPT-4 Code Interpreter. It was released last week and quickly became the most active project on GitHub. NLW explores how people are using it so far, and what it says about the future of AI-driven personal computing. Before that on the Brief: Tesla soars 10% after a Morgan Stanley note says their Dojo AI could add $600B to their market cap; prisoners training AI; more authors sue OpenAI and more.
TAKE OUR SURVEY ON EDUCATIONAL AND LEARNING RESOURCE CONTENT: https://bit.ly/aibreakdownsurvey
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're looking at the viral new tool Open Interpreter.
Before that on the Brief, Tesla and Meta both soar in markets thanks to AI excitement.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our newsletter, our Discord, and our YouTube.
Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
We kick off today with a little bit of market analysis focused in on Tesla,
which was up 10% on the day yesterday.
Now, the specific reason for this jump was an analyst note from Morgan Stanley that suggested
that Tesla's Dojo supercomputer could have a huge impact on Tesla's market value.
Specifically, they predicted a $600 billion increase in Tesla's value based on Dojo.
Said Morgan Stanley analyst, Dojo can open up new addressable markets that, quote,
extend well beyond selling vehicles at a fixed price.
If Dojo can help make cars see and react, what other markets could open up?
Think of any device at the edge with a camera that makes real-time decisions based on its visual field.
Now, this perhaps lends a little bit of credence to ARK's Cathie Wood's comments from earlier in the year that, quote, Tesla is the biggest AI play out there.
Alexandros Marinos also wrote an interesting post on Twitter slash X about exactly this.
He writes, let me explain why Tesla is dominating the real-world AI game.
Tesla has taken a drastically different approach to autonomy than
everyone else. They have invested a lot more in compute with Tesla-designed AI chips both on car and in
cloud, making the sensor suite far cheaper, no LiDAR. As a result, they have the full sensor suite in
almost every one of the million cars they have made annotated with human driver input. Nobody else
has anything resembling such a data collection network. As such, they are building a real-world
foundation model that they can then move to their robots, which will kickstart a whole other S-curve,
first in their factories and mines and eventually in a whole host of dangerous and grueling occupations.
What's more, as Tesla makes leaps and bounds, many others will fail to follow and give up.
The gap will widen because Tesla can embody their AI far cheaper than anybody else.
They are already working on a car that will cost them less than 20K to make, probably closer
to 10K. The robot will similarly be extremely low cost. And the more they sell, the more data
they get, the better their foundation model will get, and the more tasks their robots will be
able to do reliably. Remember, OpenAI gave up on robotics because they didn't have enough data.
The combination of training lead and manufacturing lead, which will also mutually amplify with the bot,
makes it nearly impossible for any competitor to come close.
Tesla will license their software to others not because they want to make money, but to prevent regulatory scrutiny.
Already another car manufacturer is negotiating to get the Tesla autopilot in their cars.
Tesla can become both the Android and the Apple of Real World AI.
Now, one last note on markets, Tesla was not the only company to perform well surrounding AI.
After the Wall Street Journal confirmed rumors that had been swirling on Twitter
slash X that Meta was working on a new, more powerful model to rival GPT-4, their stock jumped a little
over 3% yesterday as well. Next up, you'll remember that a few weeks ago, the White House
extracted voluntary pledges from seven big tech companies around responsible AI development.
The initial signatories included companies like OpenAI, Microsoft, and Google. As of today,
a new cohort have joined them bringing the total number of companies that have taken on the voluntary
commitments to 15. The new pledges include AI chip giant, Nvidia, government data-focused
Palantir, Stability AI, Adobe, Cohere, Scale AI, and Salesforce.
Now, there is a lot going on with AI in Washington this week.
Later today, the White House chief of staff plans to meet with several of these companies,
and then tomorrow, Senate Majority Leader Chuck Schumer is hosting a closed-door session
that will feature basically every big tech CEO participating.
On top of that, Senator Richard Blumenthal and Senator Josh Hawley are today expected to
unveil comprehensive AI legislation, and so it seems like we will be spending some time
in the District of Columbia this week.
And as if to put a fine point on why clarity is needed around the AI rules of the road,
yet another class action lawsuit has been filed by writers against an AI platform, in this case
once again OpenAI, for infringing upon their copyright by unlawfully training their AI systems
on their copyrighted books. The plaintiffs in this case include the extremely prominent writer
Michael Chabon. Ultimately, all of these lawsuits are trying to establish precedent and could
be very significant in how the AI field develops depending on their resolution.
Speaking of LLMs and controversy, one of the big questions surrounding LLMs is the role of human
involvement in training the models, specifically the often lower cost global South labor that's
used to help label data, as well as to review potentially controversial or explicit content.
We've talked before about the controversy of a number of Kenyans, who had an extremely
negative experience working with OpenAI, but now Wired has written about a new program in Finland
where prisoners are being used to help train AI. The Wired piece presents this in
surprisingly optimistic terms, but also does note that it may only work as such in a country like
Finland that has an extremely progressive and rehabilitation-focused prison system.
Now, wrapping up with a little bit of a product update today, last year around this time,
tools like Midjourney and Stable Diffusion were only barely becoming usable. Of course, over the
subsequent year, AI image generation tools have made leaps and bounds. Now many are wondering if
text-to-video is in a similar spot to where text-to-image was 12 to 18 months ago, and it seems like
every day we get some exciting new feature showing the future of AI-generated video. This week, it was a set
of new features from Pika Labs, which include a number of new parameters for zooming and panning.
These immediately give creations with Pika Labs a much more cinematic quality, where people can
actually articulate a vision for a shot as though they were a director. The feature is basically
hours old at this point, but there are already dozens of examples of people using it to great
effect on Twitter. Now, this follows a new feature from Pika competitor Runway about a week and a half ago,
which similarly gave users the ability to change the intensity of the motion in their video.
Javi Lopez writes, another step closer to the future of AI and filmmaking,
I truly believe that in the next five to ten years,
we'll be able to create film scenes indistinguishable from reality.
Anyways, friends, that is going to do it for today's AI Breakdown Brief.
Thanks, as always, for listening or watching,
and I will be back soon with the main AI Breakdown.
Welcome back to the AI Breakdown.
Yesterday on Twitter, Alvaro Cintas wrote,
Open Interpreter is going viral. Imagine running in your computer a free open source implementation of Code
Interpreter. It's like having a conversation with your machine, understanding you perfectly. You can tell
it to open the internet browser to perform a task, create and edit photos, videos and PDFs, plot and analyze
large datasets, pretty much anything you can think of. For every coder, researcher, and curious person
out there, this is a game changer. So what we are talking about is Open Interpreter. And as the founder
Killian put it, it's an open source code interpreter that runs locally. It works within the terminal
on your computer and has a ChatGPT-style interface that can interact with your documents and generally
lets you talk to your computer in natural language and actually have it do the things that you want it to do.
Now, there are obviously a couple different trends that this intersects with. First of all, it's hard to
deny that Code Interpreter is the most significant update to ChatGPT since the launch of GPT-4. In fact,
as you've heard on this show and many other places, to many developers, ChatGPT with the integration
of Code Interpreter effectively functions like a GPT-4.5. Giving ChatGPT the ability to write code to solve
problems for itself fundamentally expands the set of things that it can do. On the r/ChatGPT
subreddit, a user posted, Code Interpreter is incredibly overpowered. I just got access to Code Interpreter
and let me tell you it is incredibly OP. I've been playing around analyzing some data sets and it is
insane. With Code Interpreter, everyone is a data scientist, until you reach usage cap just like I did.
As an aside, my favorite comment came from Cyrus who wrote, people ain't going to like this here.
You're supposed to complain about how dumb GPT is now. So, okay, one of the trends and it's right there in the
name is Code Interpreter. Another big trend that this intersects is AI agents. The idea behind
AI agents is to give artificial intelligence platforms the ability to do more things.
In other words, not just to give you answers to questions, but to actually create software,
or build applications, or do data analysis, or generally do things.
We've talked extensively on this show about how excited people got about AutoGPT and all the
projects like it, and why, despite the fact that there was a bit of a cooling off period in
terms of the hype cycle around AI agents, autonomous agents remain one of the biggest focuses
for developers in the entire AI and startup space.
Now, a third trend that this interacts with is privacy and AI that runs locally.
One of the big concerns around tools like ChatGPT and really any conversational
AI platform is that the more customized to you they are or your company, to your information,
to your data, the better they're going to perform. But of course, if they are owned and operated
by a third party, giving them all that access to data and information is a security risk. It's a
privacy risk. From reporting around Apple, this has been one of the hangups that has led them to
not exactly be clear on their AI strategy in the first place. From a report in The Information last
week about how Apple is now spending millions of dollars a day to develop AI, to train its AI,
quote, questions linger over how Apple can incorporate LLMs in its products. The company's leaders
prefer running software on devices, which improves privacy and performance, as opposed to on cloud
servers. But that could be difficult to achieve. Ajax GPT, which is Apple's LLM, has been trained
on more than 200 billion parameters. An LLM with more than 200 billion parameters couldn't
reasonably fit on an iPhone. Okay, so bringing it back to Open Interpreter, you've got the trends
of Code Interpreter itself, the trend of AI agents, and the trend of private, locally running software.
The way that Open Interpreter describes itself is this. Open Interpreter lets LLMs run code
(Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a
ChatGPT-like interface in your terminal by running interpreter after installing. This provides
a natural language interface to your computer's general purpose capabilities. Create and edit
photos, videos, PDFs, etc. Control a Chrome browser to perform research. Plot, clean, and analyze
large datasets, et cetera. Now, almost immediately this took off. Two days after launching,
founder Killian wrote, with 10K stars, 5K in line
for the desktop app and 500 strong in the Discord, Open Interpreter is the number one GitHub repo in the world.
Killian also pointed out some of the project's next steps, which include integrating other open source models and improving the setup documentation.
The question, of course, is whether this is just a novelty and something that people are excited about in terms of what it might be in the future, or whether people are actually finding value now.
So let's look at five use cases of people who, in these very early days, have actually figured out how to do something with Open Interpreter that is contributing to this big hype cycle.
Let's start with something fairly basic.
Josh Wardini writes,
Open source interpreter by Hello Killian is wild.
Wanted data analysis with ChatGPT on a table from my database, but was 1.5 gigs.
ChatGPT only allows 512 megabytes.
Dragged file into console, asked interpreter to remove 70% of rows.
It wrote the pandas code and saved file on my system.
File size reduced in 20 seconds.
Uploaded and started asking questions via ChatGPT.
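For a sense of what that row-dropping step can look like in code, here is a minimal sketch. The tweet says Open Interpreter wrote pandas code, which isn't reproduced in the episode, so this is a stand-in that uses only Python's standard csv module and streams the file rather than loading it all at once. The function name downsample_csv, the keep_fraction and seed parameters, and the file paths are hypothetical, not anything from the original post.

```python
import csv
import random


def downsample_csv(src_path, dst_path, keep_fraction=0.3, seed=0):
    """Copy src_path to dst_path, keeping the header and a random
    subset of roughly keep_fraction of the data rows.

    Hypothetical helper for illustration; returns the number of rows kept.
    """
    rng = random.Random(seed)  # fixed seed so reruns keep the same rows
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # None if the file is empty
        if header is not None:
            writer.writerow(header)
        for row in reader:
            # Keep each row independently with probability keep_fraction,
            # which removes about 70% of rows at the default setting.
            if rng.random() < keep_fraction:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling downsample_csv("table.csv", "table_small.csv") on an exported table would write a file with about 30% of the original rows. Because rows are sampled independently, the result is approximately, not exactly, a 70% reduction.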
You can see in the images that Josh shared the "file is too large" maximum file
size warning from ChatGPT, and then the Python code that Open Interpreter wrote and asked Josh
to confirm that he would like it to run. Another use case, this one shared by Killian himself,
was setting up most Hugging Face models for offline use with a single command. Hugging Face is,
of course, the home of thousands and thousands of open source models, and so simplifying usage
is something that obviously has a lot of application for many developers in this space.
Sangman.eath used Open Interpreter to download, cut, translate, and subtitle a clip from YouTube
of an interview with Sam Altman. The steps it took were to download the YouTube URL,
cut the clip in a specific area from 34:47 to 36:17, transcribe the clip into English,
then translate that English into Japanese, convert the subtitles, and then burn them to the video.
Pietro Schirano, the AI lead at Brex, used Open Interpreter to turn an image into a PNG,
create a website to display that PNG, and then open that website in Chrome.
And then there was Twitter user at Eshwa who wrote,
Today I asked Open Interpreter to make me an audiobook from a website.
The command line was, take this website and make an audiobook out of it.
Deeplearningbook.org.
Open Interpreter responds, to accomplish this task, we will follow these steps.
One, scrape the website to extract the text content.
Two, clean the extracted text to remove any unwanted elements such as HTML tags, JavaScript code, etc.
Three, convert the clean text into speech.
Four, save the speech as an audio file.
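Step two of that plan, cleaning the scraped page, is the easiest one to show in isolation. The sketch below is an assumption about what such a step might look like, not the code Open Interpreter actually produced: it swaps in Python's built-in html.parser in place of Beautiful Soup so that it runs with no extra installs. The names TextExtractor and clean_html are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a script or style element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def clean_html(html):
    """Return the visible text of an HTML string with tags and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The cleaned string is what would then be handed to a text-to-speech library in steps three and four, which are left out here because they depend on packages the episode doesn't name.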
And just to give you a sense of how it actually works, this is a really instructive example.
Open Interpreter continues, let's start with the first step, which is to scrape the website.
We will use Python's Beautiful Soup library for this.
If it's not installed, we will install it using pip.
We will also use the requests library to send HTTP requests.
Let's check if these libraries are installed.
If not, we will install them.
So what you're seeing here is that Open Interpreter understands what tools it needs to accomplish the task
and takes as a first step making sure that those tools are available to it.
So, for example, the Beautiful Soup package was not installed on the machine, and so its first step was to install it.
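That check-before-install behavior is simple to sketch. This is one way it could be done, assuming the standard library's importlib; it is not taken from Open Interpreter itself, and missing_packages is a hypothetical helper name.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]
```

For the audiobook example, missing_packages(["bs4", "requests"]) would list whichever of the two still needs installing. Note that Beautiful Soup is imported as bs4 but installed from pip as beautifulsoup4, so the import name and the package name differ.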
Now, this is pretty amazing for the first few days of this tool being available.
But of course, there are still challenges.
One, as pointed out by commenters on Product Hunt, is that the fact that this runs through
the terminal means that its value is limited to a technologically enfranchised set.
Vikram Aditya writes, this looks very interesting for terminal native users.
However, I would strongly recommend focusing on a GUI for distribution.
People who are comfortable with Terminal are also people who are capable of getting the outputs
one way or the other, even if this tool didn't exist.
Killian Lucas responded, hey Vikram, 100% agreed.
This project is all about converting natural language to machine code with a strong focus on non-programmers,
so I'm totally aligned with your vision. It shouldn't require a terminal or Python installation or anything.
It should be an app. Happy to report that a desktop app with a ChatGPT-like GUI is in development.
Still, even with the valid concerns about interface, overall, there is a lot going on to be very excited about here.
McKay Wrigley tweeted and said, tools like these give us a peek of what AI-powered personal computing will look like.
And I think that's a really good way to describe it. In many ways, Open
Interpreter and the things like it represent a fundamentally new and different way to control your
computer. Instead of clicking around and going where you need to go to do something, you can talk to
the computer in natural language and have it do those things for you. Now, it's quite clear that Open
Interpreter won't be the only implementation of this. In fact, in many ways, the chatbot assistants
that live directly inside different operating systems are likely going to be the way that most
people first experience this type of new interface. Still, for even slightly more sophisticated users,
there is a lot to like in a different open implementation versus one that's going to be tightly controlled by Microsoft or whoever else.
Miguel Lucas sums it up.
Every time I try Open Interpreter, it's like having a fluent conversation with my computer.
No more complex commands.
Speak naturally and it understands.
So that, my friends, is Open Interpreter, a very cool new project and one that I am keeping a close eye on.
Again, this came out just a week ago, so still has a lot of growing left to do.
But if you are interested in the project, go check it out at
openinterpreter.com. That's going to do it for today's AI Breakdown. Until next time, peace.
