The Shintaro Higashi Show - What is ChatGPT?
Episode Date: January 27, 2025
Large language models like ChatGPT are transforming the way we interact with AI. Peter explains its inner workings, how it understands language through probabilities, and its applications across various domains. Shintaro brings relatable scenarios, exploring AI's practical uses, its limitations, and ethical concerns like bias and transparency. They also touch on the future of AI, from prompt engineering to potential advancements like agentic AI and multitask robotics. Whether you're curious about how ChatGPT works or how it might shape the future, this conversation offers engaging insights and practical takeaways.
(00:00:00) Introduction
(00:00:45) What Is ChatGPT?
(00:02:19) Language Models and Probabilities Explained
(00:05:28) Making AI Understandable for Everyone
(00:06:43) ChatGPT's Limitations and Real-World Use Cases
(00:13:17) Ethical Concerns and AI Bias
(00:17:36) What Is Prompt Engineering?
(00:20:58) AI in Specialized Applications
(00:25:21) The Turing Test and AI Sentience
(00:30:31) Full Self-Driving and AI in Robotics
(00:44:33) What's After ChatGPT?
(00:50:50) Closing Thoughts
If you're in business, then you have customer churn. Whether you're building a startup, growing a mom & pop shop, or operating in a Fortune 500 powerhouse, Hakuin.ai measures, predicts, and improves your customer retention. https://hakuin.ai
Transcript
large language model.
Yeah, oh, you know large language model.
Wow, you're so condescending, god damn it.
That was not condescending.
I don't know, some people don't know.
Forget the Cho Bros in Korea,
I'm flying to Detroit tonight.
It's a joke, yeah.
Hello, welcome back to the Shintaro Higashi show
with Peter Yoo.
Today's gonna be so out of character for us.
Well, not out of character for Peter,
who's like a-
Yeah, we've been kinda doing this. Princeton slash tech guy, AI PhD, all that stuff. I'm just a judo bum. But yeah, what is ChatGPT is gonna be the thing, and I'm gonna try to make it useful for the grapplers out there.
Yeah, so you know, you've already asked me this question, kind of. You are already using ChatGPT, right?
I've never asked you, what is ChatGPT? What do you think, I'm a caveman?
No, no, as in like, how do you use it? Like I told you about, like, some prompt engineering or something.
Yeah, you asked questions, yeah.
So I thought-
You made it sound like, oh, Peter, what is this damn ChatGPT on the computers?
Oh, no, I meant more like-
You make it sound like that.
My God.
I know.
I meant more-
My God.
He asked more like, how does it work? What is it? And whatever.
So it happens that ChatGPT is kind of close to my research area.
And I was thinking, I guess since it became a reality, a lot of people are using it, including you, and we were just talking about using it for some stuff. So I thought maybe I can try to demystify it a little bit.
Yeah, let's do it.
Because, I think, you also sent me a video. It's like, oh, you know, coding is dead or something.
Yes, because I saw a video like, oh my god, all this AI stuff is better at coding than software engineers, they're going to be obsolete very soon.
You could code stuff through ChatGPT now.
Which is true, but I can maybe give my more nuanced take on that later.
Yeah, let's do it.
Alright, so what is chat GPT, right?
So just tell me your understanding of it.
Like you say it's like something you can ask questions, right?
It's a large language model.
Yeah. Oh, you know large language model.
Wow, you're so condescending, goddammit.
That was not condescending.
I don't know.
Some people don't know.
Forget the Cho Bros in Korea, I'm flying to Detroit tonight.
It's a joke, yeah.
So yeah, it is a large language model, right? So that's great, because then, what is language, what is a language model, right? I think it starts from that. What language modeling is, basically, is trying to answer the question: can we assign a probability of how likely a given sentence is?
Mmm.
Okay, so you can say something like, I love judo, and, you know, I do judo twice a week or something. That's like a valid sentence. It should get a higher probability.
Yeah.
But as opposed to, I, Shintaro, you love judo, all this random string of words shouldn't get assigned a higher probability, right? They should get assigned a lower probability.
Yeah. So that's what ChatGPT and large language models do at the fundamental level. That's all they do. Basically, it knows how to assess whether a given sentence is likely, whether it sounds natural or not.
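To make that concrete, here is a toy sketch of the idea: a language model gives a higher score to natural-sounding sentences than to scrambled ones. This tiny bigram counter is only an illustration, not how ChatGPT is actually built; the corpus, the smoothing, and the example sentences are all made up.

```python
# Toy bigram language model: score sentences by how likely each word is
# to follow the previous one, based on counts from a tiny made-up corpus.
from collections import Counter

corpus = [
    "i love judo",
    "i do judo twice a week",
    "i love wrestling",
    "you love judo",
]

bigrams, unigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, curr in zip(words, words[1:]):
        bigrams[(prev, curr)] += 1
        unigrams[prev] += 1

def sentence_probability(sentence: str) -> float:
    """Multiply the chance of each word given the word before it."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        # Add-one smoothing so unseen pairs get a small nonzero probability.
        prob *= (bigrams[(prev, curr)] + 1) / (unigrams[prev] + len(unigrams) + 1)
    return prob

print(sentence_probability("i love judo"))      # natural sentence: higher score
print(sentence_probability("judo you i love"))  # scrambled words: lower score
```

A real large language model does the same kind of scoring with a huge neural network trained on billions of documents instead of a handful of sentences.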
Interesting. So isn't that how little kids kind of learn language as well?
Yeah, so in some sense there's some, you know,
analogy to it, like how kids learn language,
just by kind of like repeatedly hearing what natural language sounds like, right?
So basically language models kind of do the same thing,
but they basically read just a bunch of text
generated by humans, or like natural language text.
And it tries to learn...
What do you mean by read though?
Read? So basically, when you're reading a text file or a document with a bunch of words, ultimately in the computer, when you save it, each letter basically gets assigned like a number.
It's binary, you mean?
Yeah, so it's assigned a number ID, like an ID.
But it's not binary?
It is.
And then a number can be represented as binary.
Oh, yeah, yeah, of course.
Eventually, yeah.
So at the high level, each letter gets assigned the binary.
So that's kind of like how computers understand letters,
right?
But large language models, they use what they call tokens.
Basically, it's like a smaller than a word, but bigger than a letter.
It's kind of like a subword almost.
You know, for example, take the word computer.
It might be split into compute, and then er is a separate token.
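A quick sketch of what those tokens look like in practice. This assumes OpenAI's open-source tiktoken library (pip install tiktoken); the exact way a word gets split depends on the tokenizer, so the pieces shown are illustrative.

```python
# Turn text into token IDs and back, to see the sub-word pieces Peter describes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by recent GPT models

text = "I love judo and computers"
token_ids = enc.encode(text)
pieces = [enc.decode([t]) for t in token_ids]

print(token_ids)  # one integer ID per token
print(pieces)     # chunks that are usually smaller than a word, bigger than a letter
```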
I'll tell you, let me interrupt you.
Yeah.
And be completely frank.
Yeah.
And let's keep this because it's authentic.
You know why this is not good?
Because you went from zero to 60.
Like a lot of people who are listening who don't have this background, they'll be completely lost.
And we're talking about what is chat GPT, right?
Right.
You ask the thing questions, and it uses AI and then spits out an answer. Yeah. Right. And then there needs to be another layer above it before you can kind of go into the nitty-gritty of this.
I'd say maybe I could start it this way. So yeah, you ask questions-
You know, you gotta do it. You gotta do ELI5, ELI10, ELI20, and then ELI genius like me, you know.
Okay, so ELI5. Okay, let's go back.
So you just kind of remember how these things read, like computers read text. But okay, let's backtrack. Right now with ChatGPT, you can ask questions and it'll generate answers, right?
You could ask it anything.
Anything. Pretty much anything, it'll say something about it.
How accurate is it?
I mean, it depends. For certain things, if there's a lot of information in the training data, so to speak, like if it read a lot about it, it'll be pretty accurate. But if it's something novel, it's gonna be less accurate. That's kind of the general sense you can have.
All right, so quick question.
Yeah.
Yumi had a flu and fever the other night. Her heart rate was like 160, 170 while she was sleeping.
Oh, wow.
I put that into ChatGPT, and it said you've got to take her to a hospital right away. I get to the hospital, and they're like, kids with a fever? Yeah, all the time, 160, 180. As long as they're drinking fluids, you can wake them up, they're not confused and falling over, it's completely fine. I was like, yeah, but ChatGPT and Google were telling me a different story. They were saying it should never be above a certain number, like 130, 140, and it was way above and beyond that. So essentially ChatGPT was completely wrong about this thing. And I also got multiple opinions. I remember Elliot from the dojo, he's an ER doc now, working in pediatrics. He was like, bro, 180? Completely normal for a kid with a fever, you know.
Yeah, so these things are pretty opaque in how the internals work, but my guess is, you know how people joke about WebMD, that WebMD says everything is cancer? That's probably a conscious decision the WebMD people made, because they'd rather err on that side than under-diagnose, right?
Yeah, true, true, true.
And ChatGPT and Google, all these systems, are probably tuned similarly, because it's better to be more careful than less, so that now you're worried and you actually go see a doctor, and doctors are better equipped. I mean, they're more knowledgeable about the whole situation and able to give you a more accurate answer. But, you know, if ChatGPT said, oh, you don't have to do anything, it's fine, then-
So does ChatGPT make the decision, or does Google make that decision? And all the information on Google, when ChatGPT reads it, is it just kind of regurgitating what it said on Google, or where does it pull the information from?
So, okay, let's kind of separate that, because now the current iterations of large language models can use Google, some of them, but large language models themselves don't. All this knowledge-
Interesting.
Yeah, all this knowledge is just contained in the large language model. Basically, all this knowledge has been converted into numbers, and what you can do is ask questions, and the large language model will use all the knowledge contained in it to generate the answers.
But it's connected to the internet now, the ChatGPT 4o.
In 4o, I think, you can have a-
Is it 4.0 or 4o?
4o. 4o.
And then you can have an option where you can turn it on.
Why is it not 4.0? Because it was ChatGPT 3, and now it's 4. I thought it went ChatGPT 3 to 4.0.
It's OpenAI's naming scheme. I don't know why they named it that. I have no idea, actually.
It's like a weird naming scheme.
All right.
Let me ask you another question.
Yeah.
You know you go on Reddit and say, hey, Reddit, you know, my kid has this and that.
And then there's all these subreddits like, oh, ER medicine, parent, you know, parenting,
whatever it is.
Would it ever go through all that and then make a decision based on that?
And then, because it has a source,
like source, I'm a doctor.
Obviously, it's not verified or anything like that,
but would it go through that,
take that into consideration?
So you're asking if ChatGPT will search Reddit for your specific question.
Yes. No, I know they're not searching it.
But I wonder if it's part of the database.
Oh yeah, so they definitely, I wouldn't be surprised if they scraped Reddit posts as part of their data.
I mean Reddit has a lot of knowledge in it, right?
But would it weight it more because it's coming from an ER doc versus, like-
No, so that's the stuff that's not clear. Even right now, there's no way to tell how ChatGPT will weigh different things. It's not like a conscious being. It basically just treats all the text the same, and it just keeps trying to learn the distribution, like the probability of a sentence, from this gigantic training set.
So everything is like, oh, this is probably right.
Exactly.
That's why it's dangerous.
That's why OpenAI hires a lot of people to actually filter the data, to get the training data as accurate as possible.
That's a very, yeah.
Number one, you know how you published a paper recently?
Yeah.
Does ChatGPT know about it already?
I don't know. I'm not sure how they collect data, but if they continuously collect data...
But should it be live in real time? Like anytime you publish something, shouldn't it go into the thing?
Because training this thing takes so much energy and resource. It takes millions of dollars to train these things once.
So training is not live, basically. That's why people are trying to give ChatGPT and large language models access to Google, because then it can kind of pull in live information.
If there's a team filtering the information, and there is top-down leadership within that company, what's to say that a lot of the stuff we're getting is influenced by Sam Altman?
I'm sure that happens. It's just like how all the social media companies have their own moderation teams, right? And they have their own biases, and you can agree with them or not. But the thing is, these companies don't really want to publish in detail how they collected their data.
So it's hard to tell, but they definitely have moderation teams, data cleaning teams.
Yeah, because that's very important.
These people are going to filter something towards their own biases.
I remember asking about Trump versus Kamala on ChatGPT, and when you mess around with certain prompts, you kind of feel like there's a bias towards the left.
So it's hard to tell if it's because the data it was trained on had that kind of bias, or they put something in front of the GPT, what they call guardrails, to kind of filter through some stuff. It's hard to tell. Or it might just be that people on the internet are generally more liberal or something, and they generate more data. That could be it too. So it's really hard to tell. These are very complex, opaque systems.
Yeah.
So that's why, although ChatGPT seems smart, you have to kind of be careful. You always have to verify if what's generated is accurate. And same with coding right now. You can code, but once you get to more complex tasks, you have to check.
Okay, so level one is like, all right, hey man, how many grams of protein can I absorb in one sitting? Or like, what's the difference between skull crushers, incline versus flat? Like, that's level one question asking, right?
Yeah, it's pretty, you know, they call it encyclopedic knowledge, right? You just kind of have to memorize it. ChatGPT should be very good at it.
But, you know, there's little intricacies like that, because those are two things I actually asked ChatGPT, right? And then it was like, well, you know,
flat works majority of the long head of the triceps.
And then incline works more of the long head of the triceps.
And I was like, bro, that doesn't really help.
And then it was like, yeah, well, the angle of the muscle,
you know, fatigue is like, is the same.
And I'm like, yeah, no shit.
You're on an incline, you know what I mean?
And it wasn't very useful, you know?
Yeah, so that's-
Even though I know the answer to this question already, I wanted to ask it, right? So, all right, that's level one stuff. How can I make it level two, sort of, like a little bit more prompt engineering-ish? Let's start with, what is prompt engineering, even though I know what it is. How would you condescendingly explain it to me multiple times?
So prompt engineering is basically, can we be smart about asking questions to ChatGPT so that we get the right answer? Because it's called a prompt, like how to prompt ChatGPT so that it gives us the right answer. It's becoming kind of an art, and that kind of goes to the core of the limitation of ChatGPT.
Basically, again, ultimately, fundamentally, it just knows how, when you ask a question, to complete that question. And it just happens that its training is in the form of question and answer.
So because ChatGPT knows that a conversation that starts with a question should end with an answer, that's what it is doing. It's basically like, okay, I know how this conversation that started with a question should end, so I'm just going to produce the most likely sequence of words.
Yeah.
So that's what it is.
So ultimately, that's the limitation. That's why right now it's very sensitive to how you ask the question, and then you have to do prompt engineering. And how do I do it? I don't know. I'm not a prompt engineer, and there's a whole research field on how to ask the right question. People have tried different things, but it's more of an art than a science.
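As one concrete sketch of what prompt engineering looks like in code, here is the "give it a role, a context, and an audience" pattern, assuming the official openai Python client (v1-style). The model name, the persona, and the question are illustrative; this is one way to structure a prompt, not a recipe endorsed in the episode.

```python
# Prompt engineering sketch: same question, but wrapped in a role and an audience.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are an equity analyst on a value-investing team. "
    "Reason step by step, then give a final summary that is friendly and "
    "jargon-free, as if explaining it to a college friend who doesn't follow finance."
)
user_prompt = (
    "Based on the most recent 10-K financials, would you consider investing in SoFi? "
    "List the main risks before giving your overall take."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; use whichever model you have access to
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```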
You know, after we had a discussion about what prompt engineering is, you condescendingly told me the different things: there's the context, and this and this, and the "as if I was," and "pretend like I am," you know. And then you can kind of say, like, hey, make the answer very friendly, because I'm explaining something to Shintaro, who's a retard, right? So, something like this, and you play around with it.
Yeah, it really is. I forgot where I was going with this,
but it was actually really amazing.
You wanna give us an example of something like that?
Oh, prompt engineering?
Yeah, oh yeah, so I asked you these things, right?
And then you kind of gave me sort of a guideline,
and then just kind of give an example.
I said, hey, if you were on Warren Buffett's team,
you know, and if you were to analyze a company like SoFi
based on 10K and all this stuff,
and if you look at the financials,
like, would you invest in it?
And then give me an answer,
if I were to explain this to my friend
who I went to college with, who wrestles,
who doesn't really know what he's talking about,
who thinks he knows what he's talking about.
You know, and it gave a really, really good answer.
Wow. It is powerful. And then, yeah, basically, it's kind of read that. If you ask someone online, people probably have the "hey, pretend you're someone" thing, and then it'll probably go through the scenario, right? It's basically read that, and then it's kind of mimicking it. But that's not to say this doesn't
think or doesn't know things, because it does have a lot of knowledge, it really does. And a good way to think about this, this was an explanation used by Ilya Sutskever, who used to be the chief scientist of OpenAI, basically the person who led the effort to develop ChatGPT. And he said,
say you're reading a detective novel, and you hear all these stories about all the characters, their motives and backgrounds. And at the end, the last sentence is like, so now the murderer is blank, right?
Ooh, yeah.
So in order to fill in that blank, to generate the next word, you have to have all the knowledge of the context and the background knowledge and all of that, to get it right, right?
Yeah.
So chat GPT basically is trained that way.
Like it just predicts the next word based on the previous words.
But that's a way to estimate the probability of a sentence. And it's read so much. Its only goal is to be good at predicting the next word, but in order to do that, you actually have to have a lot of knowledge.
Interesting.
So that's why it knows all this,
it knows how to follow your prompt, how to give you an answer.
But at the same time, because its only purpose is to predict the next word, if it doesn't know, sometimes it'll just make shit up.
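Here is a toy version of that "finish the next word" loop. The probability table is invented for the example; a real model computes these probabilities with a neural network conditioned on everything that came before, which is exactly why it can confidently fill in a blank it doesn't actually know.

```python
# Toy next-word generator: repeatedly sample the next word given the last two words.
import random

next_word_probs = {
    ("the", "murderer"): {"is": 0.9, "was": 0.1},
    ("murderer", "is"): {"the": 0.7, "obviously": 0.3},
    ("is", "the"): {"butler": 0.6, "gardener": 0.4},
}

def generate(prompt_words, steps=3):
    words = list(prompt_words)
    for _ in range(steps):
        context = tuple(words[-2:])          # condition on the last two words
        dist = next_word_probs.get(context)
        if dist is None:                     # nothing learned for this context: stop
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights)[0])
    return " ".join(words)

print(generate(["so", "the", "murderer"]))   # e.g. "so the murderer is the butler"
```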
I wonder if it could, all right, so when is it gonna be a thing where, if you're a judo player or a wrestler or whatever it is, you know, suppose you're wrestling Spencer Lee in the finals of the Olympics, and you feed it all these different things, and it tells you what a good strategy is, right? Do you think one day it'll be able to answer that question? Because there are not a lot of people who've written text on this stuff at all.
I'm sure it can. I'm sure you can do it right now. I mean, it may not have the-
Yeah, but how could it do it without having any text around it? No one's writing articles in detail about, like, he leads left, right, and his left hand goes to the post on the head, you know, if he grabs the wrist he goes for an arm drag, or whatever it is. So it doesn't have that information to read.
I'm sure it knows a lot about wrestling
Strategies because people have written articles about it, I'm sure.
So, but it probably wouldn't have... I'm pretty certain that people have not.
But yeah, so if it doesn't, what you could do is provide that information in the context, in the prompt, right? Like, Spencer Lee does this and this, whatever. And then you can kind of have a conversation, because it knows how to reason about it. It probably has a basic knowledge of wrestling and all. But if you want it to be like a net wrestling expert, you probably have to collect some data on wrestling and basically train it further on the wrestling data, if you want to instill that. And a lot of these companies, OpenAI, Anthropic, all these companies, actually offer this type of service to enterprise clients. So, for example, say KBI wants to pay for a ChatGPT that's like a judo expert. You could gather all the data you have around judo and then give it to OpenAI or Anthropic.
What if I want to analyze churn?
Churn, yeah. You could also do that too.
You can kind of, you know, Hakuin AI or Andrew's company, Drew's company.
What a great segue.
I know.
Drew's company could gather all this data about their customer churn and then give it to, partner with, OpenAI or Anthropic or what have you. They'll develop their own ChatGPT in a couple of weeks.
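For readers who want to see what "give them your data" roughly looks like, here is a sketch using the openai Python client: format your own question-and-answer examples as chat transcripts, upload them, and start a fine-tuning job. The file name, the single judo example, and the base model snapshot are all illustrative assumptions, not details from the episode, and Anthropic's process would look different.

```python
# Rough fine-tuning sketch: turn your own domain Q&A into training data
# and hand it to the provider to build a customized model.
import json
from openai import OpenAI

client = OpenAI()

# One training example per line: a question your members actually ask,
# paired with the answer you want your "judo expert" model to give.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a judo coaching assistant."},
        {"role": "user", "content": "How do I set up uchi mata against a stiff-arming opponent?"},
        {"role": "assistant", "content": "Break their posture first: snap down, then enter as they react..."},
    ]},
]

with open("judo_finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

uploaded = client.files.create(file=open("judo_finetune.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-4o-mini-2024-07-18")
print(job.id)  # poll this job; when it finishes you get a custom model name to query
```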
Drew came to Judo, by the way.
Oh, yeah?
Last night.
So do you think one day ChatGPT can read all the Mindbody data and the attendance data, given that I take attendance consistently,
and then say, you know what, it's about time Drew is coming, and then it could sort of monitor Drew's freaking internet activity or something, right?
Yeah.
And then predict when he's coming in next?
So it can kind of do it already in a reasonable sense as long as you give access to the information.
But Google has access to all that information, no?
Yeah, so the limitation right now is that you're kind of going into what they call agentic instead of generative.
The new hype now is agentic AI.
Yes.
You've heard of that term?
Wow, you and your condescension man.
What do I mean?
No, I don't know.
Guys, listen to the tone of the way Peter talks to me and then comment below on the video whether or not he's being condescending.
Maybe it's just my insecurity.
I have no idea.
Do you know what agentic AI is?
I only heard about agentic AI the term only recently.
It's new for me.
And so basically, the idea is, can we give these large language models access to tools? It's actually kind of hard to teach these things how to use tools.
Yeah.
Because, you know, if you want to use tools, you have to be very precise on the commands and whatever. And like I said, ChatGPT, large language models, can mess up on that. Anyway, that's kind of the newer research area, but you can imagine a more agentic ChatGPT that has access to your Mindbody system, that can go and pull the information, process it, and then output some actions. And yeah, of course that's actively being researched on as a product.
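A toy sketch of that agentic loop, with everything faked for illustration: a pretend attendance tool and a pretend model that decides to call it, reads the result, and then answers. Real systems wire this up through the providers' tool-calling APIs; none of the names here come from Mindbody or the episode.

```python
# Fake agent loop: the "model" asks for a tool, gets the result, then answers.
import json

def lookup_attendance(member: str) -> list:
    """Pretend attendance tool. In reality this would call a real API."""
    fake_db = {"Drew": ["2025-01-06", "2025-01-13", "2025-01-20"]}
    return fake_db.get(member, [])

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: if it has no data yet, it emits a tool call."""
    if "TOOL RESULT" not in prompt:
        return json.dumps({"tool": "lookup_attendance", "args": {"member": "Drew"}})
    return "Drew has come every Monday for three weeks, so he will probably show up next Monday."

def run_agent(question: str) -> str:
    reply = fake_llm(question)                        # model decides it needs the tool
    action = json.loads(reply)
    result = lookup_attendance(**action["args"])
    followup = question + f"\nTOOL RESULT: {result}"  # feed the tool output back in
    return fake_llm(followup)                         # model answers using the result

print(run_agent("When is Drew likely to come to judo next?"))
```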
Every AI movie has this Turing test, you know? Like, all right, so if you were having a conversation-
What is the Turing test? Go, for the audience.
It's when you're talking to the thing in sort of a blinded setting. Do you know that you're talking to a machine, or do you feel like, oh, is this a human on the other side?
How was that for you, Peter? God damn, that was so perfect.
Man, you wanted to get me so bad.
No, no, I could say, I know you know.
I know you know.
Yeah, you love that stuff.
I mean, I love the movie Ex Machina, that's why.
That's the only reason why I know that.
Oh, did they have a Turing test scene?
Well, the whole movie was about-
The whole movie is a Turing test.
A Turing test within a Turing test, yeah.
So what's your question about-
Do they do these tests, like, okay, let's see if this will pass the Turing test? Do researchers actually perform Turing tests on this thing?
Yeah, the Turing test was beaten a long time ago.
Oh, wow.
Yeah, before ChatGPT, actually. And that's why, so there are some flaws. The Turing test itself is kind of underspecified. It's a cool thought experiment, but it's not a rigorous scientific test. That said, scientists have performed that test, but it turns out it's very easy to fool a human. You can kind of see how, you know, like all these people scamming other people on the phone. Like when you're on Tinder or something, sometimes I don't know exactly, am I talking to a robot, a prostitute, or a human on the other side? So it's pretty easy to fool humans as a machine.
All right, next question. Ready?
Yeah.
Is it possible for these things to be sentient one day? It's kind of like a Lex Fridman kind of question, right?
So, first of all, for me, I tend to avoid that question, because for me, sentience or consciousness are not precise terms.
So for some people, I think it's valid that you could say ChatGPT is a sentient being. Some people have made that claim. I'm not saying I don't agree.
Let's discuss a definitive term, then. Assign what it means to be sentient.
OK, let's just say the moment. What is your earliest memory? Maybe off the top of your head, quick.
Yeah, I was playing with toy trains at my grandpa's house.
You sure there weren't Barbies?
I didn't have Barbies. I wish I did. I would have loved it.
Playing trains with your grandpa. Yeah. That moment, when do you think a computer can have that? Or is it even possible for that to happen? See, that moment is, okay, let's just say, unconscious to conscious.
You're talking about subjective experience, right? Like your own subjective experience of the world, kind of like that? I mean,
I think this is me making stuff up now, but the moment you have that first memory, now that you recall that thing, there was something that happened at that moment, right? Now all of a sudden, obviously you were sentient before that, but something shifted over.
Right.
Uh-huh.
Can that happen?
So if you're just saying we're just talking about forming memories, I think ChatGPT can definitely do that already. It can remember things about you.
The moment it becomes aware of itself.
I don't even know what that means. What does that mean?
That you're a separate being?
Is that what, I don't know what that means, aware of yourself. What does that mean to you?
I mean, pre you playing with trains with your grandpa,
you were just kind of existing.
So yeah, I think what you mean is that it's not just about forming a memory about the world, but forming a memory about yourself, like how you felt, like this is me, who I am.
Yeah.
I'm sure.
So this is kind of what they call the existence proof, right? We already have things that can do that, like animals, like humans definitely.
I would say my dog definitely has that kind of subjective experience.
Is your dog smarter than me?
I don't think so.
I should write, yeah, that would be crazy.
Yeah.
And in that sense, I think it's possible that a machine can do it. Because at the end of the day, if you take this reductionist view, we are a machine that's made of proteins. So I think it is possible, but I don't know how far away we are from that moment. And for me, it's a fun question to ponder, but I think it's kind of a pointless question.
Yeah, interesting. Do you think you could teach Tesla's robot to do judo one day?
Oh, yeah, I'm sure. But you know what's funny, teaching a physical task is a lot harder than teaching the machine these mental tasks,
like math.
So, I forget the name of the paradox, but in the early days of AI, a lot of people thought it would be harder to teach a machine to play chess than to teach it how to walk. But it turns out it's the opposite. It's a lot harder to teach a robot to walk properly than to teach it how to play chess.
So yeah, and there's a reason why evolution took a lot longer to have animals that can walk and really navigate through the world than developing these cognitive abilities.
Yeah, so judo, teaching the robot to actually do judo with people, is gonna be harder.
Very hard, yeah. Humans are amazing at physical tasks.
Yeah.
It's amazing how, even driving, you know, people say, oh, humans are terrible drivers, so we should have robots drive for us. Actually, if you get into it, driving is a very hard task and humans are exceptional at it.
Yeah. What's the new thing in ChatGPT? They talk about, all right, it leveled up from 3.0 to 3.5 to 4.0, like huge jumps, right? Like logarithmically humongous jumps.
Yeah, exponentially, yeah.
So what makes it such a big jump from one to the next?
Is it computing power or what is it?
Actually, in the research field, we don't think we're making these step-function jumps. A lot of researchers, including me, think we've kind of been stagnating, actually, since ChatGPT. So a lot of people are trying to find the next breakthrough. And then, for example, with 4.0, I think, OpenAI basically gave ChatGPT the ability to reason, as in multi-step reasoning.
So right now, the first iteration of ChatGPT is just kind of giving you gut answers, like whatever comes to your mind first. So that's why a lot of it's wrong.
But what humans do is like, we can kind of stop and kind of think through it before we
answer, right?
We can gather all this, you know, like when you play chess, you know, Magnus Carlsen said,
Magnus Carlsen, no, what's his name?
Magnus Carlsen, right?
Yeah, Magnus Carlsen.
Yeah, he's the same guy.
He says when he looks at the board, he has all these moves, right?
And he spends the next minute or whatever to disprove that these are good moves.
So that's kind of what people are trying to make large language models do now.
So we can generate all these gut answers, but like, can we actually think about these answers and then reason through them to be more accurate?
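One way to approximate that "think before you answer" behavior today is simply prompting for it: ask for the reasoning, ask the model to argue against itself, then ask for the final answer. This sketch assumes the openai client; the wording is one possible recipe, not how OpenAI's reasoning models work internally.

```python
# Prompt the model to reason, self-check, and only then commit to an answer.
from openai import OpenAI

client = OpenAI()

question = ("A gym has 120 members, loses 5% of them each month, and signs up "
            "4 new members a month. Roughly how many members after a year?")

prompt = (
    f"Question: {question}\n"
    "First, work through the problem step by step.\n"
    "Then list two ways your reasoning could be wrong and check each one.\n"
    "Only after that, state your final answer on its own line."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```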
So that's another breakthrough that people are trying to improve on.
Yeah, and so now it's also generating images, ChatGPT, right?
So yeah, that's done with image generators. The image generator is a separate model. ChatGPT can't generate images by itself. They have another model that takes text and then generates images. So what happens is, if you ask ChatGPT to generate an image, it'll take that request and write a prompt for that image generation model, and then once the image is generated, it returns it to you.
So that's why a lot of times it messes up.
It messes up big time, yeah.
Yeah, because it's not doing it,
it's kind of using a tool, kind of like it's an agent,
but the tool's not as good.
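The hand-off Peter describes can be sketched in two calls: the chat model writes a detailed prompt, and a separate image model turns that prompt into a picture. This assumes the openai client; the model names are illustrative.

```python
# Step 1: the language model turns a vague request into a detailed image prompt.
# Step 2: a separate image model actually generates the picture.
from openai import OpenAI

client = OpenAI()

chat = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{"role": "user",
               "content": "Write a one-sentence image prompt for a dramatic judo throw at sunset."}],
)
image_prompt = chat.choices[0].message.content

image = client.images.generate(model="dall-e-3", prompt=image_prompt, size="1024x1024")
print(image.data[0].url)  # link to the generated image
```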
But if it can do that really well, then it can make moving pictures, therefore it can make-
Yeah, they have video generators now, so text to video.
Yeah, I've seen the Will Smith eating spaghetti one. It's gotten really good now, right?
Really good, yeah.
So, wow, and that's what you're doing?
No, I'm more on the video understanding side.
Oh, video to text?
Yeah, video to text, not text to video. I'm more into video to some action or text. They call it video understanding. And what you described, video generation, the approach there is a little different.
Interesting. So, all right, you're applying for jobs now, right? And you applied to some big finance firms, because you're a total cop-out. If someone's listening who might be in a position to hire someone with your experience, what would be your value add to whatever company they're at, based on you being in the AI, ChatGPT field, and your research?
So yeah, so my research focus is on video understanding, specifically what they call
vision language models. They can process videos.
And I specifically focus on applying these VLMs in the context of what they call
embodied AI, so like physical things. So
There are some limitations and technical challenges there. So that's my AI expertise. But at the same time, I worked as a software engineer before, for seven years. So I know how to engineer an actual product that's production ready. So I'm someone who can research new approaches and then take it all the way to production.
But you can't do that by yourself. Surely you need a team of software engineers to build certain pieces.
Of course, yeah.
That's why I've worked on teams and I know how to work
within a team. I know how to lead a team, and yeah, that's a plus too.
So would it just make sense to apply to Google and YouTube and stuff like that? Because that's the kind of field that they're in.
Yeah, of course, but you know, it's tough. I am gonna apply, but it's very competitive.
Super competitive. So what's the use case for you at a bank?
So, if you think about videos, it's a series of frames, video frames, right?
And each frame is what they call noisy data. There are a lot of pixels that are useless for understanding. So the task is, given this long sequence of things, can you extract the right information out of it?
Right?
Give you an example, ready?
Yeah.
If you were to take all the video feed
from an ATM machine, is this person being coerced to take money out of the freaking machine?
Yeah. Or not?
Peter, go ahead, build a freaking product
specifically using your abilities.
Yeah. Can you do it?
I'm sure it's technically possible,
but I think it's probably gonna be hard to get the data,
so I'll probably have to rely a lot on human psychology
and body language.
No, but if you work for, oh, okay, gotcha, gotcha.
It should be a definitive yes, give me this job.
Yes, I'm sure I can come up with a solution and then we can...
Is that something a bank would likely hire somebody for to do or is that like...
They may. It's on the security side. But I've been applying more on the investment strategies side.
Oh my god, man. With all your, with the liberal, "I wanna do good for the world," and you're applying to big banks?
So they can make more money?
Just to see what's out there.
Wow, Peter.
The hypocrisy.
We'll see what happens.
It's not like they might take me.
Like I say, I'm on the video side,
so my expertise would be most fitting
to autonomous vehicles, robotics companies, or just like
straight up video understanding like sports analytics or video generation.
But like I said, there's some crossover that could happen between my expertise and what they need at financial firms.
So we both have Teslas.
What about full self-driving, did you buy it?
No, I don't have it, but I've tried the trial.
Yeah.
Does it work well? How far are we away from it?
I think it depends on our appetite for mistakes made by FSD systems. I think it's one of those cases where, you know, up to 80 or 90 percent is easy, and then, like, the first 90 percent requires 10 percent of the effort and the last 10 percent requires 90 percent of the effort. One of those things like the power law.
Yeah, Pareto.
Yeah, Pareto, or the power law distribution, whatever.
Hey man, you killed it, bro, right away. Wow, okay, nice job.
No, no, I'm impressed, man. You read a lot about economics and business, I know. Little by little, talking to you, like, you know the Diderot effect?
Yeah, yeah, the Diderot effect.
Yeah, the Diderot effect, that's right.
The Diderot effect, yes, I know. Yeah, it's a French guy, yeah.
Oh yeah, that's a funny one, right? That's a good one, yeah. Do you want to kind of succinctly explain it to our listeners?
When did we even talk about it? It's basically like lifestyle creep, right?
Yes.
Yeah.
So the French philosopher Diderot wrote a funny essay about it.
I love that essay.
It's like, let me buy a nice jacket.
Oh, I gotta buy nice jeans.
He got it as a gift, remember?
Yeah, yeah.
You didn't even buy it.
No, that's right.
I got a nice jacket.
I need nice jeans.
Oh, these nice jeans can't go with a regular bag. I gotta get a new belt. Yeah, I gotta buy nice sneakers, you know. And then your closet looks grimy next to your nice stuff, so you gotta get a new closet, and then your living room looks like shit. So yeah, the Diderot effect.
But for me, I think FSD is in a good place. You know, I've used Waymo in San Francisco. It's amazing, it works great. Really, I was so impressed by it.
How do you know there's not a person driving it remotely?
Well, there's no way they could manage or hire enough people to remotely drive all these things.
How many machines are there out there?
I don't know, but you know, it's like a whole fleet.
It's almost like, you know, the whole Uber operation.
No, I believe they can.
There's not 30,000 vehicles out there.
I don't know the number, but I know...
There's gotta be only a couple thousand at most, and you get a couple thousand Uber drivers to drive the thing remotely. I believe it still could be possible, like how the Tesla robots are controlled by humans.
Oh, yeah, that was like the demo thing, but, I mean, they were kind of deceptive and they will probably be sued for that. But also, this remote control communication, the delay and all, it would probably at this point be a lot less safe than actually having the car drive itself, because of the whole delay, and, you know, you're not attached to a cell tower.
So then it'd probably be like Starlink or something, right?
So a little bit more reliable.
I mean, it's like, you know, DARPA can do that because they have like direct link to the drones or something
But Waymo can't afford to do that. It would take an enormous amount of money, based on my understanding. But let's say, you know, let's take Waymo at face value. It's amazing. And I think Tesla, I heard the V13, the newest version, is amazing too.
So I think we're at the point where like,
it can handle most of the cases,
but now it's like, to get to the point of the policy, the road design, our urban design, now it's, what's our appetite for the risk of, you know, human casualty accidents. I think now it's into that area. It's getting into government regulation.
So Waymo is fully self-driving, you think?
Yeah.
And there's not a human intervention on the other side?
There's no driver.
No, no, no, but there's not a remote driver somewhere else controlling the computer with the inputs?
No, I don't think so. Yeah, it's an amazing technology. I encourage the government to come up with the right regulations.
We might have to change our road infrastructure and stuff like that.
Yeah, we'll see how that goes. I think technologically it'll keep improving, but it's gonna be asymptotic. It's never gonna be perfect, that's just impossible. And then it's just how much more effort we are willing to put into this thing.
So what's the next level up from ChatGPT?
So I think that's it, like the agents, you know, the ChatGPT that can use tools. And specific to my research, ChatGPT, these large language models, have this amazing capability where they can do a lot of different things at once, right? It can play chess with you, it can talk about wrestling with you, judo with you, generate images, whatever, right?
Now you can talk to it, and, yeah, a human voice talks back. You can choose what kind of voice, like, you know, smart, concise, man, woman, sharp. You can choose all these little ways to have it talk to you, and you can just go back and forth like a conversation.
It's kind of scary, man. You put a beautiful face on that, I could fall in love with that thing, like the movie Her, right?
I saw that movie.
Yeah, that was an amazing movie. It doesn't even need to have a face, you know. People will fall in love.
It's kind of nuts man because then these things remember things that you've asked it four
months ago that your girlfriend or wife is never gonna remember. Yeah.
You know what I mean?
It makes you feel like, yeah.
You can never, you know, what if your Chat GPT girlfriend
is mad at you and then brings up all this like old stuff?
That's true, that's bad, man. Wow.
Yeah, but anyway, so then.
Do they have Chat GPT API based girlfriends?
I'm sure there are.
They're already out there?
Yeah, that's going to be one of the first use cases people make.
Must be. I don't know, I haven't looked into it, but I'm sure there are.
And so that's another avenue, I guess.
You sure you don't know?
I really don't know. I wish I did.
And so, I think the big thing is what they call the generalization capability, where it can do multiple tasks. That's very sought after in robotics. So for example, Tesla FSD cannot play judo with you, right? But humans can; the same human, one person, can do judo, can drive, and all that. So that's kind of the holy grail now, like, can we make a robot that can do multiple things? And ChatGPT, these large language models, kind of show a path forward towards that lofty goal.
We don't know when it will get there. A lot of people are working on it, and I'm kind of in that field with this video stuff, video understanding. Yeah, that's likely to be the holy grail.
Yeah. Did you make any discoveries?
I made some small discoveries. Research papers don't typically make crazy discoveries.
What are the small discoveries in layman's terms?
So one challenge in making ChatGPT understand videos, that's my research field, is how we can give ChatGPT or large language models the ability to watch long videos. Typically before, people were just focusing on clips that are seconds long, like 8 seconds, 10 seconds. But that's not really useful, right? We want it to be able to watch a whole movie, or just continuously see the world in videos that are minutes and hours long, right? So there are many challenges, but I basically proposed a way to process long videos in an efficient way. And one key point was that we want to separate out the spatial things, where things are, basically, and the temporal information, how these objects move through time. If we separate those out and then give the information back to large language models, they're able to understand long videos a little better.
So for instance, this mic's not moving, that picture is not moving, this TV in the back is not moving. They're fixed, so block it all out and only focus on-
Yeah, so that's kind of like, yeah. We don't know exactly how it works. I'm just designing the architecture so that the spatial information and temporal information flow through different paths. But intuitively, yes, something like that would be happening under the hood.
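A cartoon of that split, with random pixels standing in for a long video: one path summarizes what the scene looks like, the other summarizes how things change over time. This is only meant to illustrate the intuition described here, not the actual architecture in the paper.

```python
# Separate a long clip into a spatial summary and a temporal summary.
import numpy as np

rng = np.random.default_rng(0)
video = rng.random((300, 64, 64, 3))  # 300 frames of 64x64 RGB, a stand-in for a long clip

# Spatial path: what the scene looks like (here, just an average frame).
spatial_summary = video.mean(axis=0)                  # shape (64, 64, 3)

# Temporal path: how things move (here, frame-to-frame change per step).
frame_diffs = np.abs(np.diff(video, axis=0))          # shape (299, 64, 64, 3)
temporal_summary = frame_diffs.mean(axis=(1, 2, 3))   # one motion score per transition

print(spatial_summary.shape, temporal_summary.shape)
# A real system would turn each summary into tokens for the language model,
# which is far cheaper than feeding it every pixel of every frame.
```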
I love how I asked you this question a million times and every single time you're like, you
wouldn't understand it if I told you.
But now you were able to explain it in a way where probably anybody in the world can understand it. Thanks a lot, Peter.
I've gotten better at it. I think my previous research was very niche. Sometimes it's hard for me to explain it to researchers unless they're in that specific field. This is a newer paper that I just worked on over the summer.
Wow.
So that one is a little easier to explain.
How did you come up with that? You just came up with it yourself?
Dude, it was a process.
So this is why scientists are kind of like, you know, artists in a sense.
You just kind of need some inspiration.
You have intuition on what could work and you read what others have done and
then you just try a bunch of stuff out.
Did my teachings in judo have anything to do with any of the inspirations?
Yeah, I mean, some resilience. I actually like what you say about only caring about the things that you can control, because a lot of times in research there are so many factors that are out of your control, so you just have to distill down what you can do. I always try to keep that in mind.
You think this is so long that we should do this episode in two parts? I think we usually do one hour, right?
I mean, we can just put it all in one. It's good.
I would like to cut video stuff into it, so it's kind of more visually engaging, you know. And this is such a cool topic, like, what is ChatGPT. Imagine we rank high on that video and it becomes one of the things. That'd be really cool, right?
Yeah, we'll see how it does. I mean, I do apologize.
Like, it's hard for me, explaining things is very hard.
For us dumb grapplers.
No, not even that. I think in the beginning I just couldn't really adjust to what you were more curious about.
Because I thought maybe my initial thought was that maybe
you'll be more curious about the inner workings of it,
how this actually learns.
But maybe that was not really that useful for me.
I think now that we've spoken about it,
the base level stuff and people kind of get into it,
that could be another episode.
But yeah, AI, really interesting stuff.
Thank you, Peter, for your expertise.
And we'll thank our sponsor.
We already thanked Drew.
Thank you, Drew, Hakuin.ai, for your churn needs.
And Jason and Levan.
Thank you again, our staff and supporters.
Yeah.
Fujisports.com, judotv.com, higashibrand.com.
Yup. Thank you very much guys.
And we'll see you guys in the next episode.
Let us know in the comments if Peter is talking to me in a condescending tone.
You're saying that, man.
I'm half kidding.
Alright. See you guys. Bye. Bye.