ACM ByteCast - Edward Y. Chang - Episode 50
Episode Date: March 20, 2024
In this episode of ACM ByteCast, Rashmi Mohan hosts 2021 ACM Fellow Edward Y. Chang, an Adjunct Professor in the Department of Computer Science at Stanford University. Prior to this role, he was a Director of Google Research and President of HTC Healthcare, among other roles. He is the Founder and CTO of Ally.ai, an organization making groundbreaking moves in the field using Generative AI technologies in various applications, most notably healthcare, sales planning, and corporate finance. He’s an accomplished author of multiple books and highly cited papers whose many awards and recognitions include the Google Innovation Award, IEEE Fellow, Tricorder XPRIZE, and the Presidential Award of Taiwan. Edward is also credited as the inventor of the digital video recorder (DVR), which replaced the traditional tape-based VCR in 1999 and introduced interactive features for streaming videos. Edward, who was born in Taipei, discusses his career, from studying Operations Research at UC Berkeley to graduate work at Stanford University, where his classmates included the co-founders of Google and where his PhD dissertation focused on a video streaming network that became the DVR. Later, at Google, he worked on developing the data-centric approach to machine learning, and led development of parallel versions of commonly used ML algorithms that could handle large datasets, with the goal of improving the ML infrastructure accuracy to power Google’s multiple functions. He also shares his work at HTC in Taipei, which focused on healthcare projects, such as using VR technology to scan a patient’s brain; as well as his current interest, studying AI and consciousness. He talks about the challenges he’s currently facing in developing bleeding edge technologies at Ally.ai and addresses a fundamental question about the role of humans in a future AI landscape.
Transcript
This is ACM ByteCast, a podcast series from the Association for Computing Machinery,
the world's largest educational and scientific computing society.
We talk to researchers, practitioners, and innovators
who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned,
and their own visions for the future of computing.
I am your host, Rashmi Mohan.
If you've been using the popular generative AI tools in the market only to craft perfect
emails or grammar check your documents, you're selling yourself short.
Having been in this field for much longer than the recent popular wave of interest,
our next guest can tell us a thing or two about large-language models and their applications.
Professor Edward Y. Chang has been an adjunct professor in the Department of Computer Science at Stanford University since 2019. Edward was previously a Director at Google Research and President of HTC Healthcare, among many other roles.
He is the founder and CTO of Ally.ai, an organization that is making groundbreaking
moves in the field, using generative AI technologies in various applications,
most notably healthcare, sales planning, and corporate finance. An accomplished author of
multiple books and many highly cited
papers, he has also won numerous awards, including the Google Innovation Award, the coveted Tricorder XPRIZE, and the Presidential Award of Taiwan for his work. He is a Fellow of ACM and IEEE,
recognized for his work and contributions to scalable machine learning and healthcare.
We are so lucky to have the opportunity to speak with him.
Edward, welcome to ACM ByteCast.
Thank you, Rashmi, for your invitation.
It's my great pleasure to join the podcast.
Likewise, we're really excited to speak with you.
I'd love to lead with a simple question that I ask all my guests. Edward, if you could please introduce yourself and talk about what you currently do, as well as give us some insight into what drew you into the field
of computer science. Okay. I was born in Taipei and came to the US to attend college. And I first
attended UC Berkeley, majoring in operations research, which is kind of optimization techniques.
And then I worked in a software company and became intrigued by programming. Then I considered my background insufficient, so I went back to school and joined Stanford to receive my MS and PhD. And the timing was perfect. When I was at Stanford, I was a classmate of Larry and Sergey, the founders of Google. After I spent about seven years at UC Santa Barbara and received my tenure, I joined Google, because they have so many machines and can do parallel processing. And at the time, I worked with Fei-Fei Li to annotate ImageNet. And once we had received so many images, my lab started parallelizing some mission-critical machine learning algorithms, including support vector machines and LDA, and so on and so forth.
And it has been a very exciting journey.
And recently I started to study consciousness, because as Yoshua Bengio mentioned about four or five years ago, current AI pretty much focuses on computation, which is modeling human unconsciousness. To be able to do reasoning and planning, we have to think about how to model human consciousness. And generative AI seems to be getting into the realm of human consciousness.
So this is really a very exciting era.
That's great.
I mean, I think our audience would be super excited to hear about this topic.
I don't think we've covered this in great detail, and it's so relevant in today's day
and age.
But I want to go back a little bit, Edward, to what actually was the biggest driving force for you to pick computer science, even in your undergraduate education?
Were you exposed to it when you were younger?
I think initially I had just received a job, so I needed to start coding.
So that was pretty straightforward.
And I really got so interested in coding because computing is really the foundation of science. In many different sciences, we need to collect a lot of data, analyze the data, and be able to get some insights. So information processing is a big application, which drove me to dive even deeper into computer science methodologies.
Yeah, I mean, I hear you. I think many of us at the time when we probably picked computer science,
early exposure and definitely the excitement of an up-and-coming field that had a lot of jobs
was the motivator to sort of get into it. But it sounds like you found a lot of very interesting areas to delve deeper.
As I was reading about your previous work, one of your very, very early innovations,
I was very surprised to read was you're credited with inventing the DVR,
the digital video recorder, which really transformed the way in which,
you know, we created and saved content.
So I would love to hear more about that phase of your work.
Okay.
When I was in the PhD program at Stanford,
my advisor, Hector Garcia-Molina,
asked me to work on infrastructure very similar to Netflix,
pretty much just a streaming video.
And this was about 1995.
And the bottleneck is not really on the server side. The bottleneck is really the internet to the home, the last mile. At the time, in 1995, we were accessing the internet using telephone lines. So even though I had developed these media server technologies, it was not practical at the time.
So for the second part of my thesis, we said, well, if we really cannot do real-time streaming, can we buffer some data on local devices? And the best device for buffering would be the disk, the hard disk, right? So if we buffer some information on the disk, with some initial latency, we can do streaming at home. And for real-time TV, we can pause the TV and the program is saved on the local disk. Then we come back, resume, and we can fast forward. So that was a very simple idea, and the implementation itself was not extremely hard. And basically we replaced tape with disk to revolutionize VCR technology.
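(To make the time-shifting idea concrete, here is a minimal Python sketch of the buffering scheme just described. It is an illustration only, not the actual thesis implementation; the in-memory deque simply stands in for the hard disk and the frame names are made up.)

```python
from collections import deque

class TimeShiftBuffer:
    """Minimal sketch of DVR-style time shifting: live frames are buffered
    to local storage (here, an in-memory deque standing in for a hard disk)
    so the viewer can pause, resume, and fast-forward behind the live feed."""

    def __init__(self):
        self.buffer = deque()   # frames recorded but not yet watched
        self.paused = False

    def on_live_frame(self, frame):
        # Every incoming broadcast frame is written to the buffer,
        # whether or not the viewer is currently watching.
        self.buffer.append(frame)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def next_frame_to_show(self):
        # While paused, nothing is shown; the backlog simply grows.
        if self.paused or not self.buffer:
            return None
        return self.buffer.popleft()

    def fast_forward(self, n):
        # Skip ahead by dropping up to n buffered frames.
        for _ in range(min(n, len(self.buffer))):
            self.buffer.popleft()

# Tiny usage example with fake frames.
dvr = TimeShiftBuffer()
for i in range(5):
    dvr.on_live_frame(f"frame-{i}")
dvr.pause()
dvr.on_live_frame("frame-5")     # live TV keeps being recorded while paused
dvr.resume()
print(dvr.next_frame_to_show())  # frame-0: playback resumes where we left off
```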
And Professor Pat Hanrahan was also a major force behind this, because I took
his course and he encouraged us to come up with this kind of interesting devices to help
the world.
At the time, multiple companies and participants came to Stanford to see the demo.
And one of the visitors later started a company.
I think people probably still remember TiVo, which was started about two years after I published my paper.
Yeah, no, of course. I certainly remember TiVo and what a revolution that was.
What is also interesting, Edward, is that as somebody who was in academia and, you know, working on your thesis and your PhD program, did they just take your paper and kind of, you know, run with it at that point, and think about how they could productionize it and make it into a product? Were you involved in that process at all?
Did you have to collaborate with companies outside that were trying to make a product out of your
idea? I think unfortunately, at the time we didn't have the sense to file a patent, and I think those founders just took the idea and started the company.
So I really didn't get heavily involved
in the development of the product.
But subsequently, after I joined UC Santa Barbara
as a faculty member,
Sony was very interested in collaborating.
They planned to enhance this VCR, which only supported one TV, to be able to support multiple TVs at home. And we worked on a prototype able to support
up to 20 devices at home.
But because the device is very cheap, today every TV in a family ends up having its own digital recorder. Technology-wise, though, one recorder can actually support 20 TVs. So that's the situation.
Then after that, internet bandwidth became really, really high, increasing faster and faster. So the bandwidth problem got resolved, and the research issue was no longer challenging. So then I switched my focus to machine learning.
Got it.
I was going to ask you, how did that transition happen?
So the next phase of your career was your work that you did at UC Santa Barbara and
then at Google.
Is that right?
Yes.
When I joined Santa Barbara, I said, well, to pursue my tenure, I need to have a very
exciting research topic.
And digital VCR definitely is something in
the past. And I say, well, maybe using machine learning to identify photos or kind of processing
video will be interesting. And after working on the application for some time, then I consider
I really need to get into machine learning because that's the foundation of object detection and
object recognition.
And then I started working on the topic for some time.
As I mentioned, I knew Fei-Fei pretty well. And when I joined Google, I collaborated with Fei-Fei and gave her about $250,000 to sponsor the project.
The reason I joined Google at the time was that it has so much data, right? In a university, you just have no machines, no devices, no resources to process that data. And Google has so many machines, so many CPUs. At the time, we hadn't started using GPUs. So there was MapReduce, and MapReduce ran on so many CPUs at the same time. So I joined Google to start developing parallel machine learning algorithms.
And that was a transition in about 2006. And between 2006 and 2012, my major focus was making those mission-critical machine learning algorithms able to run in parallel on Google infrastructure. Got it. What better place? And at least at that time, the scale of data as
well as the infrastructure that Google had could not be rivaled anywhere else. So was your primary
focus at that point improved performance of the machine learning algorithms? Is that what you
were sort of focused on or was it accuracy? I'm trying to understand what are the sort of key
problems that you were trying to solve? Yeah, my focus was to improve the machine learning infrastructure's accuracy to try to power Google's different applications. And I myself focused on working on the Q&A system. So we needed to do semantic parsing, natural language processing, natural language understanding. So a robust machine learning algorithm would be extremely helpful.
But at the time, this is about 2008, most of the algorithms Google employed used a linear or sublinear algorithm, which means they only want to process the data once. And my colleagues told me, you don't want to use a much more complex algorithm. Like, with the support vector machine, the computational complexity is n squared. And n squared means that if you can process 1 billion training instances in one second with a linear algorithm, you now need to spend 1 billion seconds to process all the data. And you just cannot do that at Google. Google has a lot of data. So when I started to do parallel machine learning on this quadratic machine learning algorithm, my colleagues actually advised me not to do it, because they said, well, you cannot work on something which is very time consuming.
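(To put rough numbers on that scaling argument, here is a small back-of-the-envelope calculation in Python; the throughput of one billion work units per second is an illustrative assumption, not a measured figure.)

```python
# Back-of-the-envelope comparison of linear vs. quadratic training cost.
# Assumption: the machine does about 1e9 "units of work" per second,
# so a linear pass over 1e9 training instances takes roughly one second.
n = 1_000_000_000          # training instances
rate = 1_000_000_000       # work units processed per second (assumed)

linear_seconds = n / rate            # O(n): ~1 second
quadratic_seconds = n * n / rate     # O(n^2): ~1e9 seconds

print(f"linear:    {linear_seconds:.0f} s")
print(f"quadratic: {quadratic_seconds:.2e} s "
      f"(~{quadratic_seconds / 3600 / 24 / 365:.0f} years)")
# Hence the push to parallelize quadratic algorithms such as SVM training
# across many machines rather than abandoning them.
```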
But really, machine learning with big data was a trend. So eventually, AlexNet was extremely successful. Then people said, well, the accuracy is so drastically improved, so now we are willing to put in a lot of money to parallelize our computation to improve accuracy. And GPUs then started to be utilized, and the cost was not as high as using a lot of CPUs, and the entire field of using GPUs to process big data just took off around the year 2014.
And if you look at NVIDIA's stock price, you can see that after ImageNet was published and AlexNet was very successful, about two years after that NVIDIA's stock increased tenfold because of this paradigm shift.
Yeah, no, absolutely.
It's been on a rocket ship since.
But it's amazing that you were literally
at the inception of this whole transformation
and the use of a GPU for processing.
It's pretty fascinating that you were there.
What was the next transition like for you, Edward?
What took you to your next role after Google?
Yeah, for the next role, I considered it was time for me to maybe contribute to my birthplace. So I said, well, maybe I should go back to Taipei and try to educate or maybe mentor the local students, the youngsters, and at the same time, could I improve Taiwan's computing infrastructure? So I went back to Taipei to join HTC.
At the time, HTC was a very good cell phone manufacturing company. I recall HTC was the manufacturer of the Pixel, maybe the Pixel 2 or Pixel 3. Of course, later the competition became extremely tough, HTC was no longer competitive, and we sold our entire cell phone division to Google.
So during the time when I was at HTC, initially, I contributed to these mobile phone applications.
One notable application, which even today the iPhone still doesn't have, was the 360-degree kind of panorama. You just take 20-some photos and you capture the entire sphere in 3D. And the innovation was that we use some sensors on the camera to capture the movement of the cell phone, and then we instruct the user on the direction they need to point to take the next picture. This way, when they take a spherical picture, there won't be holes in the middle, right? Because we know exactly in which direction each photo has been taken, and we know exactly where information still needs to be acquired. So with about 20-some shots directed by the gyroscope and accelerometer, we can capture precisely the entire sphere.
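(A minimal sketch of the guidance idea described here: pre-compute target directions that tile the sphere, then use the phone's orientation sensors to point the user at the nearest direction not yet captured. The angular spacing and shot count below are illustrative assumptions, not HTC's actual parameters.)

```python
import math

def sphere_targets(yaw_steps=8, pitch_angles=(-60, -20, 20, 60)):
    """Target (yaw, pitch) directions, in degrees, that roughly tile the sphere.
    With 8 yaws x 4 pitches plus the two poles this is ~34 shots; the real
    system used roughly 20, so treat these numbers as illustrative only."""
    targets = [(0.0, 90.0), (0.0, -90.0)]          # straight up and straight down
    for pitch in pitch_angles:
        for k in range(yaw_steps):
            targets.append((k * 360.0 / yaw_steps, float(pitch)))
    return targets

def angular_distance(a, b):
    """Great-circle angle (degrees) between two (yaw, pitch) directions."""
    (y1, p1), (y2, p2) = a, b
    y1, p1, y2, p2 = map(math.radians, (y1, p1, y2, p2))
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def next_shot(current_orientation, remaining_targets):
    """Given the gyroscope-estimated orientation, tell the user which
    uncovered direction is closest, so the sphere ends up with no holes."""
    return min(remaining_targets,
               key=lambda t: angular_distance(current_orientation, t))

remaining = sphere_targets()
print(next_shot((10.0, 15.0), remaining))   # suggests (0.0, 20.0) for this pose
```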
And that was extremely well done. And the iPhone does not have that feature today; because we filed a patent, we haven't seen it there.
And then I moved on to work on healthcare. The motivation for healthcare was that Taiwan has a really good healthcare system, and in about 30 years they have collected so many medical records.
So as we already learned at the time,
with big data, right,
you can really improve the accuracy of many things.
And healthcare diagnosis is one application
that I later focused on working on. And that was my major focus at HTC during the second half of my tenure over there.
And then we can probably discuss the IoT devices I worked on at the time. We entered a competition called Tricorder and won second place in the world.
Wow, that's amazing.
I mean, I like how your work around image processing and
machine learning led you to HTC. I mean, and of course, the motivation sort of to do more in Taiwan,
but really led you to improving the quality of the camera and pictures that an HTC phone could take.
But also the transition to healthcare was a very interesting one, driven mostly by what you're
saying,
the record-keeping in the overall healthcare system in Taiwan. That sounds fascinating.
So moving into healthcare, what was the work like when you were, I mean, did you do more healthcare-related work while at HTC? Or by then, had you sort of started to think about doing
things on your own or getting back into academia? Yeah, I started the healthcare project by accident, because at the time a professor at Harvard University wanted to join a competition hosted by XPRIZE. XPRIZE is a foundation that encourages blue-sky innovation. Some well-known competitions they hosted in the early days were on self-driving vehicles, and the latest competition was about sending humans or robots to Mars.
So in about 2010, and they started a project saying, well, you know, there are a lot of
remote areas.
They are lacking medical devices and the doctors cannot do precise diagnosis.
So can you put together a device which is very, very light
in weight? So their constraint to us is like five pounds. And with five pounds, you need to be able
to detect or diagnose about 15 diseases, including HIV, liver problems, and diabetes, and those kinds
of diseases. And the challenge at the time, of course, the weight is a big problem. And the second
challenge is if you want to do those kinds of diagnoses at home, definitely you need to have
some machine learning algorithms and to collect data to do supervised learning, and you can do
classification in a remote area. And once you have done the classification, the data can be sent to
the cloud. And a doctor, let's say, can do a remote diagnosis on a patient in Africa, and once the diagnosis has been completed, the data can be sent to the cloud.
Any doctor in the world
can review the diagnosis
and assess the quality.
So we considered that a very formidable kind of project.
So we started working on that.
And since at HTC,
we are really good in making devices light, right?
Like you make a cell phone very light.
So we have the edge in putting devices together.
And also because of my machine learning background
and this Harvard professor, Professor Peng, was delighted to work with us.
So he kind of built a consortium
with all the hospitals in Taiwan.
So we get a lot of data. And on my side, I focus on machine learning and also device manufacturing.
And at the end, although we won the second prize, we consider that actually we should have won the first prize. And the reason we won the second prize was that the first prize winners came from a family of five. And they used very rudimentary devices and methods; they came up with their tricorder at home on their dinner table, using 3D printers. And we spent so much money and effort. I actually raised a lot of money from the Taiwan government. So at the end, the foundation was maybe saying, well, this is Goliath fighting with David.
So that doesn't make any sense.
So they couldn't give us the first prize.
But anyway, through the whole process we gained a lot of good experience.
And that paved my way back to academia.
We had a good collaboration with UC Berkeley at the time.
And HTC, in about 2020, started working on virtual reality. So using virtual
reality, we can scan a patient's brain. And a surgeon before the surgery can fly into the brain
to see the detailed structures. And you know, the brain surgery or any kind of surgery, a surgeon
wants to remove tumors, at the same time wants to keep the benign tissues intact.
And in the past, without this virtual reality visualization, a surgeon needs to kind of imagine what a 3D structure is like by taking a look at this kind of 2D MRI images.
And oftentimes, they could make some mistakes or suboptimal surgery planning.
And suboptimal means you have a path you go in,
but you have to destroy some benign neurons.
So after surgery, the patient may be malfunctioning in speech
or other functions.
We definitely want to avoid that.
So that was a kind of collaboration with Berkeley
and with the virtual reality.
And Stanford invited me to host a panel.
And eventually I moved to Stanford
to start my adjunct professorship.
Oh, amazing.
I mean, I love the story
that you talked about, David and Goliath.
Congratulations on the Tricorder victory
because it sounds like an amazing innovation.
And I completely hear you, right?
I mean, when you talk about somebody
who has built a very homegrown solution to a problem in comparison to, like, a corporation that does it, I can see how that vote may have swayed,
but it sounds amazing. So going into a little bit more about the device that you built as a part of
the tricorder, I mean, was that used for more commercial usage as well, or is it mostly just
a POC? I think at the time it was a POC, but the challenge to us
to kind of make it commercialized was twofold. One was we have to go through FDA, right? Every
single device we have to go through FDA to get approval. And the second interesting thing is
whenever we have a software update, like say on our cell phone, we can just upload a new version
without any trouble. But in medicine,
you cannot do that. For every new version, even a minor revision, again you have to go to the FDA.
So this entire process is really not scalable. So we worked with FDA to try to have this kind
of regulation changed, but it was very, very slow. So we couldn't commercialize in time.
But at the time, the Gates Foundation, the Bill Gates Foundation, and also some colleagues from India said, well, India's FDA was not as strict. And so we actually transferred the technologies to other entities, and they started working on those devices. I know China has a company called iHealth. They also have this kind of healthcare IoT and try to do diagnosis and a kind of treatment in the rural areas.
Got it.
Yeah, no, and that's great.
I mean, and certainly, you know, I know countries like India obviously need it as well because
there is a lot of rural population and access to healthcare is a challenge.
So I can imagine that this would be widely used and appreciated.
So yeah, it's great that you were able
to find a home for it,
even if the FDA process was cumbersome.
ACM ByteCast is available on Apple Podcasts,
Google Podcasts, Podbean, Spotify, Stitcher, and TuneIn.
If you're enjoying this episode,
please subscribe and leave us a review on your favorite platform. But back to your path, and almost like the next phase of your career, when you were back at Stanford and back in academia,
is that when you started to sort of get back or into looking at artificial intelligence and LLMs?
Yes. In the beginning, we didn't really pay attention to LLMs, right? We worked on natural language processing kinds of technologies, but we always ran into a lot of corner cases, because during semantic parsing or language understanding, even if we have a perfect model, we often encounter exceptions, right? So let's say we have an airline reservation system, but the customer may say something we didn't expect. Then the bot just cannot continue. So we always ran into limitations, until GPT-3 launched last year.
And we said, wow, GPT-3 functions much better. And since at the same time I was working on this consciousness modeling, and GPT-3 was able to do reasoning, to me it was a really remarkable situation.
And then I look at GPT-3 and now GPT-4, and we know a lot of people saying, well,
they still have some limitations, especially the model has some biases because of training data.
Suppose all the training data we input is from CNN instead of Fox News. Of course, the answers would probably tilt to the left, right? So the model has inherent biases because of the training data. That has to be addressed. And the second issue people understand is hallucination. A hallucination can be random or can be a kind of non-logical expression. So how do we mitigate those?
And I think a lot of methods have been devised, like chain of thought and tree of thought. But I think we came up with an interesting breakthrough. We named the platform SocraSynth.
The idea was initially very simple.
We say, well, LLM is so powerful, so knowledgeable.
And also its knowledge representation is polydisciplinary.
So when we import data into LLM training, we don't say this is a physics kind of book and that's biology.
We don't.
When LLM was trained, they have no boundary of knowledge, right?
They just put everything together.
So even today we ask the question, it's about computer science.
LLM doesn't know this is a computer science problem. It just synthesizes the answers to answer
our question. So in this kind of representation without disciplinary boundary or we call
polydisciplinary, it actually can synthesize new knowledge. It can synthesize something we call unknown unknowns. Suppose we agree LLM can
synthesize unknown unknowns. That is a big problem because human's knowledge is limited. If we don't
know, we don't know. How can we even ask questions? So I use an analogy to describe a situation.
Suppose I'm a 10-year-old kid. I go to this Nobel Laureate award ceremony,
and there are 1,000 Nobel Laureates sitting there. If I'm a person asking questions in
front of this panel, this means I can ask any interesting questions and get insightful answers.
So the solution is the following. Humans are not qualified to ask the questions. If we want to get insightful information, we can only be a
moderator. We put a subject matter on the table, then ask Nobel laureates. They can do a debate.
They can do discussion. We are just sitting there to listen. So that's a key insight we obtain.
We say, well, you get hallucination. Yeah, maybe the algorithm may not be robust,
but most of the time we ask the wrong questions or our question, the context we provide to LLM
was not precise enough, so therefore we didn't get good answers. So that was the motivation, and we ended up with four algorithms; together with this kind of debating setting, I think we made really interesting progress.
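(A minimal sketch of the moderator-plus-debaters setup described here. The ask_llm function below is a placeholder stub standing in for calls to real models, and the prompts and round count are assumptions for illustration, not the actual SocraSynth algorithms.)

```python
def ask_llm(agent_name, prompt):
    """Placeholder for a real LLM call (a real system would call two
    different model backends). Here it just returns a canned string so
    the sketch runs end to end."""
    return f"[{agent_name}'s argument in response to: {prompt[:60]}...]"

def debate(subject, agent_a="Agent-A", agent_b="Agent-B", rounds=3):
    """The human acts only as moderator: they put the subject on the table,
    and the two agents argue with each other; the human never injects answers."""
    transcript = []
    last_remark = f"The subject under debate is: {subject}"
    for r in range(rounds):
        for agent in (agent_a, agent_b):
            prompt = (f"Round {r + 1}. Respond to the other side and defend "
                      f"your position on '{subject}'. Previous remark: {last_remark}")
            last_remark = ask_llm(agent, prompt)
            transcript.append((agent, last_remark))
    return transcript

for speaker, remark in debate("most likely diagnosis for the given symptoms"):
    print(f"{speaker}: {remark}")
```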
My gosh, that sounds absolutely fascinating.
I think the part that was particularly interesting to me is when you're talking about the boundaries,
the interdisciplinary work that, you know, there are areas because of the fact that LLMs
are not classifying the content into specific human-defined subjects, if you will, there is
no boundary in terms of how you can bring two concepts together. Versus as humans, we may think
of something, like you said, classified as biology or physics or psychology. But really,
the areas of intersection that we don't expect at all is something that the system that you're
building can uncover for us. It kind of blows my mind. So in terms of SocraSynth itself, Edward, what stage of the product or the idea are you at now?
And what is the role of human beings in this process?
Yeah, it's already kind of ready to be utilized by various applications. In fact,
I'm consulting multiple companies. It has been used in healthcare,
used in sales planning, and also investment banking. To give a very quick example: suppose we want to diagnose a disease, and we can actually get a lot of ground truth from the US CDC. They have this mapping of symptoms to diseases, right? But it's interesting: when we input a set of symptoms into Bard and also GPT at the same time, we say, okay, can you diagnose this patient with the following symptoms?
They actually came up with different answers. And I said, well, you can have a debate, right? Why
you come up with these answers and why your predictions are different from each other.
So they actually provided their justifications.
They also say, well, because the information you provided, the symptoms, they are not sufficient.
So you have to ask more questions.
And finally, they say you have to conduct certain lab tests like blood tests and so on and so forth.
So those were interesting results I obtained, because whatever we obtained from the CDC was called ground truth. But the ground truth actually has mistakes. And this really ties into a very good paper published by Johns Hopkins physicians last year. The analysis was saying that in the US, about 5% of diagnoses are erroneous, or misdiagnoses. And this is a huge problem, not only for liability, but for human health.
And so you cannot take CDC's data as ground truth. There are some errors in there. And I
have done this AI research in healthcare for 10 years. I always treated those data as ground
truth. Now I open my eyes and I say, oh, if I input them to SocraSynth and allow the agents to debate, they will have new answers and tell me the previous diagnosis may be wrong.
And the human role in this situation is zero, just a moderator. Because we are so limited, and if people consider us smart, it will be counterproductive. And Demis, the CEO of DeepMind, gave three examples saying humans should get out of the way.
The first example is AlphaGo.
You know AlphaGo compared to AlphaGo Zero?
AlphaGo Zero discarded all the human experience, and AlphaGo Zero wins over AlphaGo. Another example is AlphaFold, protein folding. AlphaFold 1 uses human heuristics to build a model in the middle. And for AlphaFold 2, DeepMind said, oh, let's get rid of the human heuristics. And the score of AlphaFold 2 is much higher than AlphaFold 1; it's like AlphaFold 1 scored something like 50, and AlphaFold 2 scored 90-something. And the last example is
self-driving vehicles. If you have human knowledge, like putting a map in the middle to try to instruct the driver how to drive, no, it's not going to be very effective, because human sensors and human heuristics always encounter exceptions. So Tesla will say, no, forget about the human in the loop, we just do end-to-end training. So they got a much better, much more effective self-driving algorithm. So in short, humans, because we are limited in knowledge, even when we have one or two PhDs, we cannot compete with an LLM, which has multiple PhDs at the same time synthesized into this kind of polydisciplinary representation.
Therefore, the tenet of SocraSynth is that humans can only be moderators. We'd be in a kind of very passive role. We can evaluate their reasonableness, their logic, whatever, but we had better not contribute our own ideas.
That's kind of very sad, but that's what we have learned so far.
Wow. Okay. So many pieces in there that I want to dig into a little bit more, right?
Let's start with the simplest one: have you found, and I don't know how you would evaluate this, that the precision in terms of the diagnosis is better by using a system such as SocraSynth?
Yes, that's true. So I'll just give a quick example. Let's say we input 14 symptoms; one is a headache, another one is kind of a fever, right? And then GPT says, well, you should ask additional questions, like, do you have those two symptoms happening simultaneously? And that I
have never thought about. And also they say, well, you should ask, is your headache kind of periodic
or only happens once a day or whatever? And is your headache getting better or getting worse?
So all those kind of refined questions, most of the physicians didn't even think of. They just
said, do you have a headache? Do you have a fever? Do you have a runny nose? But the machine even asks for correlation, timing, duration, severity, all those details.
So it's interesting. Really, I mean, during the process, I was impressed. This is just one
example. There are many other examples, like sales planning and investment banking. Investment
banking, suppose you say, oh, I want to invest in this company, I want to buy the stock. And it has this SEC filing. You want to see whether the filing is accurate or not, and based on that, you want to make a judgment whether to invest or not to invest. And we really get a lot of insights from this SocraSynth debate process.
In the monologue kind of Q&A, you may not be able to get good answers, because during a monologue discussion a system like GPT will give you the default biases of the model. In a debate, you force one agent to take the positive position and you force another agent to take the negative position. You are forcing them to be biased according to your will, rather than taking the model's default bias. With one pushed to the positive and one pushed to the negative, we can then very intelligently modulate the debate with contentiousness.
And eventually after they kind of debate for a while,
they reduce their contentiousness,
come up with some compromise conclusions.
And that conclusion can provide much more insights than the default monologue kind of
Q&A session.
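(A small sketch of the stance forcing and contentiousness modulation just described: each agent is pinned to one side, and a contentiousness value that decays each round is folded into the prompt so the debate converges toward a compromise. The starting value and decay factor are made-up numbers, not SocraSynth's actual settings.)

```python
def debate_prompts(subject, rounds=4, start_contentiousness=0.9, decay=0.5):
    """Yield (round, agent, prompt) triples: one agent is forced to argue
    'for', the other 'against', and the contentiousness instruction is
    relaxed each round so the agents drift toward a compromise conclusion."""
    contentiousness = start_contentiousness
    for r in range(1, rounds + 1):
        for agent, stance in (("Agent-A", "for"), ("Agent-B", "against")):
            yield r, agent, (
                f"You argue {stance} the proposition: {subject}. "
                f"Debate with contentiousness {contentiousness:.2f} "
                f"(1.0 = maximally adversarial, 0.0 = fully conciliatory)."
            )
        contentiousness *= decay   # soften the debate every round

for r, agent, prompt in debate_prompts("this SEC filing is materially accurate"):
    print(f"round {r} | {agent} | {prompt}")
```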
Yeah, no, that was going to be my exact next question, which is inherently, it sounds like
a system like this would remove bias or at least reduce bias in a significant way.
And that's kind of what you were just talking about.
Yes, it also reduces hallucination because, you see,
during the debate process, the two agents will keep on arguing with each other.
During the argument, your QA is extremely focused.
The QA pair will be formulated by the agent, not by the human.
When a human forms a question, it can be very fuzzy.
And the second thing is,
when you started to do this debate on and on,
after rounds and rounds,
the contextual information is improved.
The context is getting better and better and better.
So the debater on the two sides
can delve into deeper and deeper, deeper insights.
So therefore, hallucination has no room to exist.
Also, I have a theory. I say, well, do you ever have the same nightmare twice, exactly the same nightmare twice? The answer is probably not, right? So if you don't have the same hallucination twice, it means you will not have the same bad argument twice generated by the LLM. And then after this refutation and debate process, all the hallucinations will have disappeared.
Got it.
It sounds like a perfect system, Edward.
I'm just curious.
What are the risks?
What are the downsides?
What are the challenges that you're facing?
Yeah, right now we are doing evaluation because every round of debate, we want to make sure
the quality is very good.
And luckily, we use the Socratic method. Socratic method has been there for 2,000 years, but strangely, people don't use
it. And we use the Socratic method to evaluate every argument's reasonableness. Here, we say
reasonableness is basically evaluating its logic. And we face a very interesting problem. A lot of
people are saying, no, I want to evaluate whether it's fact or it's truth. And unfortunately,
after doing the research for some time, I actually considered there's no facts in the world. I mean,
the same event happens in somewhere in the world, right? If you look at different newspapers,
the stories, narratives can be different. So there's no way I can know the facts unless I go to the news location to eyewitness
what happens. Even so, maybe I won't be able to see the whole thing. I couldn't understand the
causality, for example. I cannot tell who is right and who is wrong. So then I say the only
thing we can do is to evaluate reasonableness, the argument's quality. I think it turned out to be
reasonable. So we have GPT and Bard, for example, doing the debate. Then we have kind of weaker LLMs that do the evaluation. Because when you do evaluation, you don't need too much knowledge. You just need to make sure the logic is extremely tight. And we published a paper,
we showed the evaluation is quite consistent.
So we are pretty happy about
not only generating content,
also we are able to evaluate
the quality of the content.
Of course, there is much work to be done in the future.
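(A minimal sketch of that evaluation step: a separate, weaker model scores each argument only for reasonableness, that is, logical tightness, not factual truth. The ask_judge stub and the 1-to-10 scale are illustrative assumptions.)

```python
import re

def ask_judge(argument):
    """Placeholder for a call to a smaller 'judge' LLM. A real judge would be
    prompted to rate only the argument's logic, not its factual claims;
    this stub just returns a fixed-format reply so the sketch runs."""
    return "Reasonableness: 7/10. The premises are stated but one inference is weak."

def score_reasonableness(argument):
    """Extract a numeric reasonableness score from the judge's reply."""
    reply = ask_judge(argument)
    match = re.search(r"(\d+)\s*/\s*10", reply)
    return (int(match.group(1)) if match else None), reply

score, critique = score_reasonableness(
    "The symptoms co-occur, therefore disease X is the only possibility."
)
print(score, "-", critique)
```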
Yeah, yeah.
No, it sounds great though.
Is this work that you also
sort of push forward via Ally,
the company that you are the CTO at?
Yes. A company came to look for me because they considered the idea to be intriguing. Actually, it took them some time to really understand the power of the SocraSynth system. There's great potential to apply it to many areas. Got it. I understand. One of the things that I was reading was that
there's a program called Stanford OVAL, I think. Yeah, the program was established by Professor Monica Lam, who was my partner even when I was back at HTC, and she
would visit our company for collaborations. And when I joined Stanford at the beginning,
because Monica was interested in healthcare, so we see a synergy there. So I joined her lab. So yeah, I think right now we are
taking different approaches to address the problem of semantic parsing. So we had the same mission
and we try to address semantic parsing quality, but we are using different methods to address
the challenges. Got it. It's absolutely fascinating work. I'm just curious, what's next
as you sort of make progress? What are the next set of goals that you have?
On the side, I know you asked me about my hobbies. I write a lot of poetry, and I also generate a lot of art and take a lot of photographs. And so I'm actually using GPT, currently with DALL-E, to create some interesting art pieces.
And I just published a poetry book.
And so this multi-agent scenario, right?
We talked about SocraSynth; the agents can debate.
You can have three or four agents and each agent is a persona.
And you can ask them to have a dialogue and you can create a fiction, right?
Create a novel out of this kind of SocraSynth platform. And so I say, well, maybe after my scientific endeavors, when the system becomes more mature, I'll get into writing a novel using the platform. At the same time, when you start working on different applications, you discover additional research challenges that need to be resolved.
And the final grand challenge is,
if we really consider a polydisciplinary kind of representation,
have new insights into the knowledge, if we know LLM may have some unknown unknowns humans cannot tap into,
then my final goal is this: even if I don't know how to ask the questions, will there be a trick by which I can get the LLM to tell me something I don't know, maybe don't even understand? And it can teach me to explore some unknown unknowns. And that will be really remarkable. This may change the world of research.
That's my final dream.
Yeah, it truly is.
I do have to ask though,
I mean, because often
the most common question is,
oh, AI is going to
take over the world
and take over all our jobs.
And, you know,
in a lot of the way
you describe the product,
if, you know,
the human is out of the loop,
how do you see
human involvement
in products
and systems like this in moving
forward? What kind of role would we play or what kind of jobs would we have?
So a human can probably be only the moderator, right? But a moderator, a more skillful moderator
or a more knowledgeable moderator can get much better results and much better insights.
And so humans still need to improve our own knowledge. And hopefully because we can work with LLM,
so our knowledge acquisition can be much more efficient.
So hopefully in the future, we will see some new careers
or new pathways to be able to support our own livings.
In the short term, for example, in data science and computer science,
AI really can do a lot of work to replace human beings.
So in the short term, it can be pessimistic.
But in the long run, there should be some new applications,
some usages, and the human can be employed.
Yeah, for sure. Like you said, areas that we've not thought of combining or bringing together, maybe those will be uncovered through these
systems and give opportunities for us to kind of explore new jobs and new careers and passions.
It's fascinating, can be unnerving at times, but definitely very exciting. It's amazing that the
work that you're doing in this field is going to uncover some of that. But speaking about passions, I spent some time on your YouTube channel, Edward.
I was fascinated with your photography work and you were just talking about it.
I saw the one with the bald eagle and the snake and it was really quite something.
So I was just wondering, what are your other hobbies and passions?
You spoke about poetry, which is quite unique.
I've never heard of a computer science researcher also being interested in poetry. I haven't come across anybody. So what else do you do? Yeah, poetry is really my biggest hobby. And since I was a child, I was extremely interested in literature and philosophy. And
poetry is a very good kind of platform for a busy person to write thoughts, because fiction would
take a long time. And poetry typically is much shorter than fiction. So for me, I think this is a much better kind of medium compared with other media.
Amazingly, GPT can help me write even better poetry, for example by collecting historical facts, and maybe sometimes it can help me have better rhyming, better choices of words. So this is a really kind of interesting situation.
Very cool.
Did I also see that you, did you climb EBC?
Were you at the Everest Base Camp?
Almost two years ago, yes.
That was a very good experience.
And because my oxygen level was deprived, that was when I was studying consciousness
very, very intensively. And when
the oxygen level gets deprived, you are pretty much in an unconscious situation. And that experience
was interesting. Schrodinger once said, well, a lot of situations we are unconscious. Even let's
say our vision, right? We look at a person, we pay attention to some object we are focusing on, but our peripheral vision still is processing data,
but it's under our unconsciousness.
But if our peripheral vision suddenly senses a car driving toward us,
then there will be a quantum jump to elevate that event from our unconsciousness
to consciousness, right?
So just like Einstein has always said,
innovation doesn't come from consciousness.
Innovation comes from your preparation in consciousness, and you push all the information into unconsciousness. And one day, when you are whistling on Everest, from your unconscious suddenly something pops up. And then you say, oh, I realize what my unconsciousness just told me.
So a lot of people have that kind of experience.
You cannot will yourself into innovation. You can will yourself into preparation. And when the time is right, your unconsciousness will tell you innovation has arrived.
That's such a great way for us to sort of bring this interview to an end.
I mean, I love the analogy that you drew there. But for our final bite, Edward, what are you most
excited about in this field of using generative AI and the work that you're doing with SocraSynth and healthcare over the next five years? Yeah, so far, very critically, the LLM has improved my productivity by 10 times. And that means the next five years will be equivalent to 50 years of my previous endeavors. So this means I cannot squander any minute of my day, because it has become 10 times more precious.
And I'm going to move forward to continue polishing SocraSynth. Interestingly enough, I was writing a scraper to scrape information for my investment banking company. And of course, SocraSynth and also ChatGPT helped me write the code very effectively. But then I said, okay, now scrape my SocraSynth site, get all my papers, and GPT, tell me what are the major shortcomings of my current algorithms or systems. And remarkably, GPT gave me three new assignments, new insights, which I will be working on in the next few years.
Wonderful. Well, thank you so much for taking the time to speak with us. It's been a wonderful
conversation, Edward. Thank you for speaking with us at ACM ByteCast.
Thank you. Bye-bye.
Bye-bye.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board.
To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes, please visit our website at learning.acm.org.
That's learning.acm.org.