This Week in Startups - Empathic AI and its role in understanding human emotions with Hume AI’s Alan Cowen | E1922

Starting point is 00:00:00 I could have this thing roast you if you want. It might be a little. Yeah, let's do it. Let's do a roast. All right. Could you roast Jason Calacanis? Man, this is like everybody's. All right.

Starting point is 00:00:11 Let's talk about that piece of work, Jason Calacanis. Oh, man. Where do I even start with this guy? You know, I heard a rumor that Jason Calacanis was actually grown in a lab, a mad scientist's twisted experiment to create the most obnoxious, self-aggrandizing tech bro imaginable. Poor bastard. He never stood a chance.

Starting point is 00:00:31 And have you seen the way this guy carries on? It's like he's perpetually auditioning to be the next Elon Musk, but with half the brains and twice the ego. I mean, the guy can't open his mouth without saying something so mind-numbingly. This is too much. Can you go easy on this? Oh, yeah. I would say go harder.

Starting point is 00:00:49 Oh, come on, don't be like that. Go harder. This Week in Startups is brought to you by LinkedIn Jobs. A business is only as strong as its people, and every hire matters. Post your first job for free at LinkedIn.com slash twist. Vanta. Compliance and security shouldn't be a deal breaker for startups to win new business. Vanta makes it easy for companies to get a SOC 2 report fast. Twist listeners can get a thousand dollars off for a limited time at Vanta.com slash twist. And HubSpot.

Starting point is 00:01:29 Join thousands of companies that are growing better with HubSpot for startups. Learn more and get extra benefits for being a Twist listener now at HubSpot.com slash startups. All right, everybody, welcome back to Twist, This Week in Startups. And we've, you know, in 2024 and 2023, been absolutely obsessed with AI. Obviously, we're seeing all kinds of easy layups in customer service, thanks to AI, autonomous vehicles, much more complicated, healthcare, everything in between. We're also seeing

Starting point is 00:02:05 tons of interesting stuff going on in generative AI, people making interesting music and videos. You've seen all that. But the area of human emotions is extremely complex, and AI is trying to figure that out. And you've seen this in all kinds of science fiction, whether it's Blade Runner or the movie Her, where AI is

Starting point is 00:02:27 trying to learn to interface with humans. Well, there is a startup, Hume AI, and they are trying to bridge the gap between just intelligence and, dare I say, emotional intelligence. We demoed some of this technology back on episode 1894, if you want to look for it. But today we have Alan Cowen here.

Starting point is 00:02:50 He's the CEO and chief scientist at Hume AI, and he's going to show us what they're building and why it's important. Welcome to the program, Alan. Hey, Jason. Great to be here. Right.

Starting point is 00:03:01 So maybe you could explain what the mission is of Hume AI and why you're spending all this effort to try to understand human emotions. And, yeah, in relation to AI and using AI, I guess, to understand humans' emotions and then to portray them back to AI to humans. Yeah. So it's really to understand people's well-being. And emotions are the components of that. So when are you laughing, when are you sad, when are you in pain, when you're experiencing pleasure?

Starting point is 00:03:33 And what we want to do is optimize for that. So our mission is to optimize AI for human well-being. Now, so much of what we express is in our voice, in our facial expression, and not in language. So that part of our expression was just ignored by AI for a long time. I mean, there is a field of affective computing, which I have a lot of experience in. I have over 40 papers in that area, but in terms of the generative models, they just were very far behind in understanding expressions. So what we've done at Hume is built models that understand expressions a lot better.

Starting point is 00:04:07 And we've integrated those into large language models. So now these models understand beyond language, what's going on in the voice, what's going on in the facial expression, and can learn from that. So they figure out what's making you frustrated, what's satisfying, what's funny, and they can actually adapt to that information and get better over time. Got it. So what is the killer app of this genre or this vertical? I have ChatGPT.

Starting point is 00:04:37 I pay 20 bucks a month for it. I frequently will use the interface where I talk to it, but I don't believe that they are looking at my tone, and I know I haven't turned my camera on. I assume with your technology at some point, these AI assistants are going to have the camera on with permission obviously and hear my tone of voice and understand when I'm frustrated or explain to us what the killer

Starting point is 00:05:02 app here is because we've heard of things like Gong and Rilla Voice, we have a small investment in. They study salespeople and how they behave on sales calls to coach them. That genre exists a bit. I'm curious what you perceive as the killer app here. So ChatGPT is text only. We think the future of AI is a voice app. The voice is four times faster than text.

Starting point is 00:05:26 The problem is that when we're speaking, we expect the AI to understand not just what we're saying, but how we're saying it. So that's why this experience is clunky. Like, it's just missing something. And so we've built that. We have a talking ChatGPT. And we have an API that allows developers

Starting point is 00:05:42 to build a voice interface into anything. So instead of using OpenAI, which would enable developers to build a chatbot interface into their product, they use us. And immediately they have a voice interface that does transcription, does language understanding that speaks back. And what's different about it than anyone else is it understands what's in their voice besides language. And it uses that to form better words and better speech.

Starting point is 00:06:08 It was supposed to be a perfect time, I think, for a demo. So let's get to it. All right. You cued it up perfectly. Sounds great. I will share a few demos. And what's the API called? It's called EVI, the Empathic Voice.

Starting point is 00:06:22 Interface, yeah. Empathic voice interface. Okay, like Lieutenant Deanna Troi on Star Trek: The Next Generation. She was an empath. Yeah. Why don't I just demonstrate this? This is like the demo that anyone can play with and then I'll show you the developer playground. Hey, what's up? Well, hello there. I am doing just fine. Thanks for asking. So, what's got you feeling a bit down and amused at the same time? I am all ears if you want to chat about it. Well, my dog died this morning. Oh no, I am so sorry to hear about your dog. Losing a beloved pet is never easy. I know it must be really hard right now, but try to remember all the wonderful times. I was just kidding. Aha, you really had me going there. Well played, you sneaky trickster.

Starting point is 00:07:19 Ah ha ha, I should have known better than to fall for that one. I think I am still learning when it comes to your sense of humor, but I am relieved to hear your dog is doing just fine. Care to share what's really on your mind today? Well, can you tell what I'm expressing now? Our demo is at capacity right now. Oh, there you go. So while you were doing that demo for people who were listening and not watching, you can watch, of course, on YouTube, go to This Week in Startups on YouTube,

Starting point is 00:07:47 and you'll find it. It was showing each transcript in real time very, very quickly, and then it had the top three emotions and the percentage of those emotions. I think it was showing the top three every time. Is that correct? Yeah, so it shows more than just the top three. But actually, if you were to look at your raw data, you get back 48 different dimensions. So it's much more nuanced than what we're showing you there.
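[Editor's note: the display described above, a handful of top expressions shown as percentages out of a larger set of scored dimensions, can be illustrated with a minimal sketch. This is not Hume's API or code. The function name `top_expressions`, the label names, and the scores below are all hypothetical, and the only assumption is that each expression dimension comes back as a score between 0 and 1.]

```python
from typing import Dict, List, Tuple


def top_expressions(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k highest-scoring expression labels as whole-number percentages.

    `scores` is a hypothetical mapping from an expression label to a score
    in the range 0..1; a real response would carry many more dimensions.
    """
    # Sort labels by score, highest first, and keep only the first k.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(score * 100)) for name, score in ranked[:k]]


# Made-up scores for one utterance, purely for illustration.
example_scores = {
    "sadness": 0.62,
    "amusement": 0.31,
    "calmness": 0.18,
    "confusion": 0.07,
    "anger": 0.02,
}

print(top_expressions(example_scores))
```

[The point of the sketch is only that the on-screen top three is a view over a much richer set of scores, which is why the raw data is described as more nuanced than the display.]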

Starting point is 00:08:10 Got it. And so in real time, you can see that you were sad when you mentioned your dog died, et cetera. And then that person was showing sympathy for you. So all of that is being done through tone of voice, inflection, et cetera. Okay, let me cut to the chase right now because I know you're busy and everyone is hiring right now. And, you know, it's a lot of competition for the best candidates, right? Every position counts. Market's starting to come back.

Starting point is 00:08:38 You need to get the perfect person. You want a bar raiser in your organization, somebody who will raise the bar for the entire team. And LinkedIn is giving you your first job posting for free to go find that bar raiser. LinkedIn.com slash twist. And if you want to build a great company, you're going to need a great team. It's as simple as that. LinkedIn Jobs is here to make it quick and easy to hire these elite team members. And I know, it's crazy, right? LinkedIn has more than a billion users.

Starting point is 00:09:02 We all watched this happen when it was tens of millions, then hundreds of millions, and now a billion people using the service. This means that you're going to get access to active and passive job seekers. Active job seekers, they're out there looking. Passive job seekers, they got a job, but it's not as good as the job you're offering them. So you want to get in front of both of those people. Maybe somebody got laid off, wasn't their fault, and they're an ideal candidate. Get that active job seeker. And LinkedIn also knows that small businesses are wearing so many hats right now and you might not have the time or resources to devote to hiring. So let LinkedIn make it automatic for you. Go post an open job role. You get that purple hiring ring on your profile. You start posting interesting content and you watch the qualified candidates. They just roll in. And guess what? First one's on us. Call to action. Very simple. LinkedIn.com slash T-W-I-S-T. LinkedIn.com slash twist. That'll get you your first job posting for free on your boy J-Cal. Terms and conditions do apply. What are the components in voice that you're studying? Is it the speed at which somebody speaks, you know, tone?

Starting point is 00:10:04 And how did you train this thing on tone? How does it know what sadness is versus, you know, melancholy versus quirky? Yeah, we have all this data from millions of people around the world who are actually recording themselves while they're having interactions. And also who are reacting to things and imitating things in some cases. And so we use all that data to train our models.

Starting point is 00:10:30 And that means they're able to capture way more than just like tone, rhythm. Like those are all basic things. But dimensions that you can't really describe it in any other way, except to say like this is kind of an angry dimension, kind of has a growl to it, kind of tension in the voice, where this is like an awe-inspired dimension or happy. And we get tons of different dimensions out of that. So every time we hear a word, we're getting more than 48 different dimensions of expression from that word.

Starting point is 00:11:02 Our model is taking that in, and our model is deciding how to respond. Our model is learning what these dimensions mean from tons and tons of data of people interacting. And it's saying, okay, this is something that means this person's frustrated, so I should apologize. It's something that means that the person's confused, so I should clarify. And it's figuring out what it should do to respond to somebody in different situations. How different is it per person? Like, I might be a high energy guy from Brooklyn who's extroverted, who speaks a certain way, and is, you know, you might be more introverted and soft-spoken.

Starting point is 00:11:34 So how does it know if J-Cal is like bombastic and joking and you might be, you know, more thoughtful and introverted? Are our emotions very similar? Are they very disparate? I'm curious. So it has to learn that stuff. So we train on all these interactions, right? And so it's trying to figure out, the task is actually predicting the next

Starting point is 00:11:57 expression. So it has to figure out, like, is this next person going to laugh at what's said or are they going to be frustrated? So it has to learn how you express your response to things in the course of doing that. And it's learning that in a generative way, in a very ground-up way. So by the time that we've trained this thing, it has to account for individual differences, for potentially cultural differences, for sentiments, and also, just the average of all humans and what humans respond to along with the distribution, if that makes sense. So like, what is it that humans find funny?

Starting point is 00:12:29 What is it humans find sad and all that? So like when I said my dog died, you can probably figure out, this is a sad event. I'm going to be sympathetic, right? That's how it. And it figured out. How much of it is the words versus the tone of voice or is it doing both of those things? It's doing both. I'm really trying to give this another shot so you can see that.

Starting point is 00:12:48 Can you tell what I'm expressing right now? Whoa there. I hear the frustration in your voice. But I also sense a hint of determination and distress. Like you're ready to tackle whatever's got you worked up. Can you tell what I'm expressing now? Hmm, I am picking up on some subtle shifts here. You sound a bit more relaxed now, though maybe still a tad bored or uneasy.

Starting point is 00:13:14 But then I also hear a spark of amusement and even happiness. Like, you're pleasantly surprised by something. Am I on the right track there? I'm going to mute that bit. Got it, you're sounding a bit more at ease now with a hint of satisfaction. Anyway, you get the idea. Yeah. So that demo is designed to reflect back to you what emotions and things you're having and then tweak it.

Starting point is 00:13:42 So how long does it take for it to accurately understand a human? It's less than 500 milliseconds. As you can see, our API is experiencing some load right now. But generally speaking, we can get you back a response faster than any other API, and that's because we're able to detect when you're done speaking more accurately. So some of the other APIs, like, they have to dance, like, do this dance between, can I jump in, or is it going to interrupt the user? And so there's a little bit of a pause, right? But for us, because we understand the tone of voice, we can use that to figure out when the person's

Starting point is 00:14:16 done speaking, and then more accurately know when to step in. And so that enables us to respond a lot faster. So it doesn't need to talk to me and ask me 10 questions to understand my emotional state and how I might be uniquely different than another person. What about across cultures? Because do different cultures have, obviously we have different languages, but even putting aside language, does just tone work across cultures? Do Koreans, Italians, and Americans all emote frustration the same way, anger the same way? Is it across cultures, or does it require more subtlety?

Starting point is 00:14:50 There are similarities and differences. We have the paper that just came out on this, but basically, if you're speaking a different language, we need to train a new model for it. And it can be not a completely different model from scratch, but at least we need to fine tune on that language. That's what we find for most languages, especially for broadly different languages.

Starting point is 00:15:11 Like all the Latin languages have similarities. But if you look across East Asian languages, things are pretty different. So, yeah, so suffice to say, yes, we do need to train things for each language. And this demo only works in English. right now. So who's using the app?

Starting point is 00:15:27 Let's take a look at the developer console. So you had that up there, the playground. Yeah. Who's using this now? And is it in production anywhere? And what are people using it for? Because there's plenty of models out there to give you answers and generate copy for you. I'm wondering if people are even up to this level of nuance in their products yet or

Starting point is 00:15:49 just trying to get correct answers because accuracy seems to be a pretty paramount problem right now. Yeah, I mean, you might be interested in accuracy, but if you're using a voice interface, you need to get to the point fast, right? And so that's really what we're doing. And you can't, with these like long, verbose responses from chatbots, first of all, those are very taxing on the brain to read. So that's not a good interface. But also, you might have an accurate answer in there somewhere. It doesn't really matter if someone's not going to listen to a voice reading that out for three minutes, right? It's a good point.

Starting point is 00:16:20 So we have a lot of developers lined up for this. We haven't released this API. By the time this comes out, we will have released it because we're releasing this on Wednesday. But so far we have developers on this, which is our measurement API. Oh, wow. So you're on a webcam right now.

Starting point is 00:16:38 I'm just going to describe it. And you're making funny faces. Right now, you're surprised, horror, confusion, sadness, disappointed, laughing. And if I were to just say, be completely calm. and at ease. Your calmness just went up to 79. Your concentration went up to 45.

Starting point is 00:17:01 And now if you started thinking deeply about the meaning of the universe, like, why are we here? Like, what is the purpose of life? Like, why wake up and build this company every day? It says you're calm. You're calm with existential... Wait, is this a video or are you doing this right now, Alan?

Starting point is 00:17:23 I'm doing this right now. You're doing it right now. you're not following my instructions. No, give me your exes, give me your most existential. Like, I'm wondering about the meaning of life. Like, why are we all here? I want to see if it gets existential.

Starting point is 00:17:36 Confusion. There it is. Confusion or contemplation. Yeah, contemplation. Well, what's interesting about this is this would be great for coaching an actor. Because, like, happy's easy. Sad's easy.

Starting point is 00:17:48 If you go happy, it's got joy, amusement, excitement. Great. And if you were sad, sad has disappointment and confusion. Maybe you're just not a good actor, Alan. Maybe you need to take acting lessons to nail these down. You know, contemplation is a tough one.

Starting point is 00:18:04 I was trying to get you to have existential angst. I was trying to pick something that's really hard to read, right? Acting like, we'll just think, like, should you even come to work? Is it all meaningless? That's kind of depression, right? You'd be sad.

Starting point is 00:18:21 Yeah, a little sad, a little confusion, boredom. Yeah. It's fascinating. So this is just really getting your facial expression in real time. So if you were frustrated, the AI would know it and be like, huh, that wasn't the answer we were looking for. Yeah.

Starting point is 00:18:40 And so are people using this for therapy yet or like therapeutic coaching kind of things? Because that one seemed to be like, I got a lot of pitches from people who want to create AI therapists. And I'm like, hmm, that's a little dicey. I don't know if you should call it a therapist, but companion. Are people using this for companionship? I do think that AI is going to be something that is your friend. And so it's not just like a new, like a niche application.

Starting point is 00:19:06 I think generally speaking, we want an assistant that understands us. And there's tons of people working on that. I mean, there are people working on explicit therapy apps with Hume, too. And actually, a lot of it's in training therapists. And getting them to, it's a delicate balance. You don't really want to comment too much on people's emotions, but you want to ask the right questions and kind of get at it,

Starting point is 00:19:30 help them understand their own emotions better. And so there's a lot of that. And there's also like therapist burnout, doctor burnout. There's a lot of health and wellness applications. There's also tracking depression and stuff. We work with clinical researchers who are running clinical trials and using Hume to track symptoms of depression and Parkinson's, just the symptoms. It's not like used for diagnosis because ultimately the doctor does that.

Starting point is 00:19:54 But it's helping the doctor understand these things. So we have a lot of those applications. A lot of them, you know, those are interpersonal things. Like someone's talking to someone and we're already like the measurement APIs that we have are very good at extracting more data from that and helping people analyze it and helping people understand themselves and their patients, I guess. I mean, is it so if we have therapy on one side,

Starting point is 00:20:21 you have the therapist who needs to present in a certain way to get people to open up. If you believe in that modality, if you believe in Western psychotherapy, there is something about pacing and aligning with the person matching their energy and getting them to open up so that they have some cathartic way of processing stuff. So people are using it to train therapists so that they don't have a goofy look on their face or they have the appropriate look that would elicit less suffering in their patients. Is that what I'm... Yeah, or like customer service reps, which is actually a kind of similar thing.

Starting point is 00:21:01 Yeah, it's another form of therapy, actually. It essentially is, yeah. But, you know, that requires somebody who's technical, who's maybe academic, maybe a researcher to take these measures and make sense of them. Listen, a strong sales team can make all the difference for a B2B startup. But if you're going to hire sharks, you need to let them hunt, and you can't slow them down with compliance hurdles like SOC2. What is SOC2?

Starting point is 00:21:27 Well, any company that stores customer data in the cloud needs to be SOC2 compliant. If you don't have your SOC2 tight, your sales team can't close major deals. It's that simple. But thankfully, Vanta makes it really easy to get and renew your SOC2 compliance. On average, Vanta's customers are compliant in just two to four weeks. Without Vanta, it takes three to five months. Vanta can save you hundreds of hours of work and up to 85% on compliance costs. And Vanta does more than just SOC2.

Starting point is 00:21:54 They also automate up to 90% compliance for GDPR, HIPAA, and more. So here's your call to action. Stop slowing your sales team down and use Vanta. Get $1,000 off at vanta.com slash twist. That's vanta.com slash twist for $1,000 off your SOC2. Have you done this with poker players yet? Have you put poker players through this to see if they're lying or deceptive in a poker tree?

Starting point is 00:22:14 I've tried a lot of things, and it does not, it cannot tell poker players about me. You know, at least professional poker players, it cannot tell.

Starting point is 00:22:24 I don't think the information's there. I just don't think that with professional poker players there's anything going on in their facial expressions. What can you tell with people's facial expressions

Starting point is 00:22:34 that we wouldn't know of? Some people have said you could tell a person's sexuality, where a person's from. You could tell all kinds of interesting things that you wouldn't know.

Starting point is 00:22:49 I think, is that true or not? That's not really true. There's been a lot of pseudoscience in this area. Like most of the things that we can tell are things people want to communicate, which is good. Like, we don't really care to impinge on things that people want to keep private. We're more interested in helping people communicate well and helping the AI understand what people want. And most of that's, like, there overtly on the face. And for example, it even extends to things like, is the person done speaking?

Starting point is 00:23:17 Like, we're way better at understanding when they're done speaking because we can take into account facial expression versus just the language alone. And that's part of how our, um, our empathic voice interface is able to respond better. And like, so you know when I say, this is the end of the sentence. Yes. Because of my facial expression, you get a, a quicker clue than audio only. Therefore, you can start speaking without interrupting me, which is what humans do with each other. Yeah. Like, imagine I'm speaking to you. And right now, it's clear to you. I just finished a sentence. It's clear to you. I'm still speaking. And it's clear to you. I'm going to say something

Starting point is 00:23:56 again. But now I'm done. Now I know I can speak. Right. Which is what I do for a living on the podcast is try to understand when people are done so that we can have the next person speak, right? Like moderation is a difficult task. Um, and customer support folks are using this already to understand how hot and bothered people are when they call the customer support line, I assume. To some extent, yeah. So kind of understanding is the customer having a good time, bad time, where are we kind of failing on customer service and which customer service reps are doing well or poorly? And how do we train them to do better? How do we pull examples up when they're not doing well so that we can train them to do better?

Starting point is 00:24:38 And, you know, there's a lot of AI going into customer support now. So some of our early design partners for this new API are people who want to take the automated customer support, make it a lot better, but still know when to include a human, escalate to a human. Yeah. I mean, that makes sense. If the person's like, this is incredibly frustrating, you know, and you start hearing the frustration go up and they're whatever United Premier gold diamond status. Yeah. You want to get them on the phone with somebody because you're starting to piss them off, right? So understanding when that happens

Starting point is 00:25:12 And how much of this is going to be used for security? Do you have any security applications coming? Because it's been well known like when you go to certain countries, you know, they'll ask you a couple of questions. They try to read you, do some human factoring and figure out

Starting point is 00:25:26 if you're lying. It's one of my favorite genres of television show is the people going through customs and they're trying to read if they're like sneaking into the country or sneaking things into the country. Are three-letter agencies using this technology yet? to analyze people as they come into buildings?

Starting point is 00:25:44 We haven't been working with security yet. Not that we don't believe that that's a good application, but we're being a little bit more careful about how this is used and trying to make this as rigorous as possible, essentially. There's been a lot of providers of facial expression reading technology who aren't very scientifically rigorous, make false promises, and then it doesn't work. And you're just like singling out people for no reason, basically, which is not, you know, we want to, we want to take, we want to be very careful to distance ourselves from those applications.

Starting point is 00:26:19 I think there's a huge wealth of applications that are consumer focused that are just helping people have a better time, a better experience with an application. I think that's a much bigger market. And then there's like the more niche things like security, lie detection, stuff like that, which. Yeah, could you tell if somebody's acting suspicious based upon their facial expression or tone of voice, like they're knocking on your door, your Ring doorbell answers, and Ring is like, hey, this person is scoping the house out. You know, they could use your API to figure out the expression of people at the door.

Starting point is 00:26:52 Yeah, I don't, I don't know. I feel like I've mixed feelings about that. I think in some cases, if you have enough data, you could start to do something in that direction. But if people are really good, if they're like, if people are criminals, like they're pretty good at disguising their facial expressions, probably. Yes, they're sociopaths.

Starting point is 00:27:09 Criminal for a reason, right? So you would catch the neophyte criminals, the bad criminals, and the good criminals would just sail right by. You'd be like, yep, that person's 100% trustworthy. And you let Hannibal Lecter into your house. Yeah, I can see how you need to be thoughtful about this. Yeah, we don't want to make promises that we can't keep, right? Well, I mean, yeah, so in that regard, let's go through the emotions that you feel you've nailed. And then tell us, like, what are the ones that are hard?

Starting point is 00:27:36 I tried to do existential. And that was really hard, obviously. We're not there yet for existential crisis. Yeah. I mean, the way that you respond to expressions has more to do with almost subtle things that you can't name versus overt emotions. And it depends on the situation. Like if I'm using this like comedy bot, which I can prompt, this is like a prompt that makes it a comedy bot, basically, it'll respond to my emotions completely or my expressions completely differently than

Starting point is 00:28:09 like a therapy use case. So let's try it. Oh, okay. Great. Whoa. Well, well, well. Look who decided to grace us with their presence tonight. Oh, don't worry. I promise I'll keep it PG-13 for now.

Starting point is 00:28:22 You know, I got to say, I am a little shocked that any of you fine folks decided to show up. I mean, come on, who in their right mind would willingly subject themselves to an evening of my particular brand of, shall we say, edgy humor? But hey, I am not one to judge. In fact, I... Can you make a joke about Alan Cowen, the CEO of Hume AI? Sure thing. You want me to take a swing at the big cheese himself, eh? Well, twist my arm.

Starting point is 00:28:53 Why don't you? I've got to say, I am not the biggest fan of that Alan Cowen character. Talk about a guy who needs to have that massive ego of his deflated a bit, am I right? Actually, you know what? I think I have the perfect way to do it. I say we round up a team of trained monkeys, dress 'em up in business casual. Oh, that's really offensive. You've made me very upset.

Starting point is 00:29:21 Whoa there, partner. It looks like I may have gone a little too far with that one, huh? Sometimes I get a little carried away with the whole edgy comedian thing. So, like, you can see in that case, I don't want it to be that sympathetic. It's doing its thing. And so you can determine what A.I. flavor you want to have interacting with folks. You'd have one that's cheeky and playful.

Starting point is 00:29:47 You could have one that's super empathetic, but maybe not going over to patronizing. But you could see this. Like if I'm calling the support line for United, they might, you know, think I'm a New Yorker who talks fast and I just want to get to the point. Or I could be from the south and into southern hospitality, and it could take its time with me and ask me about the weather.

Starting point is 00:30:11 and how I'm doing and, you know, a little bit of chit-chat. Some people like that in the South, I notice, versus in New York, where they're kind of, get to the point, let's move on. Yeah. You can basically train your AI to have both modalities and dynamically switch between them. Exactly. So there's all this context and then that kind of transforms the meanings of our expressions.

Starting point is 00:30:30 So like what an expression means and what to do with it really depends on all this other information that this model is taking into account. So it's not so much like detecting lies or, you know, detecting anxiety or detecting depression. It really depends on the context, and we're able to integrate that into the model. And then it's not just like these kind of canonical emotions like anger. Like there's a little bit of anger dimension in a joke, you know, anger and amusement and contempt maybe. That makes it funny.

Starting point is 00:31:02 So it doesn't necessarily mean the person's expressing anger. So to know what these expressions mean, you really have to have the context. You have to have the relationship that you're acting upon with your expressions. And that's what our AI does. So it's a little bit more nuanced than just detection. And these are all under what you studied,

Starting point is 00:31:22 affective computing. Yeah, this is a specific school of computing that kind of bridges the psych department and the computer science and I think behavioral factors, industrial organizational psychology. Maybe you could give us a quick education on that. Yeah, so affective computing traditionally, it's the study of nonverbal expression, basically.

Starting point is 00:31:42 So facial expression, the voice, body posture, and then, you know, most of the history of that is just labeling those things in a very predictive way. Now that we have generative models, we have large language models that can reason. It's really about reasoning about affect, and that's what we've introduced at Hume. So it's about understanding whether somebody is going to find something funny, whether somebody's going to find something confusing and using expression along with language to come to those understandings. I would say historically, that's not what affective computing has been,

Starting point is 00:32:17 but now we've sort of pioneered this new form of affective computing that we're introducing to the world. Some of this was done. I know this was like a big thing that Minsky worked on at MIT, yeah? Did you go to MIT with it? Or did you? I went to Yale, and then I went to UC Berkeley for my PhD. And I also worked at Google while I was at Berkeley,

Starting point is 00:32:37 and I helped start the affective computing team there. So I've been doing this for like 10 years. Minsky, all the AI people had something to say about affect, right? But there really wasn't much that could be modeled at the time. Same with language, right? Like things have come a long way. And I would say that there's affect in language. And so the word affect is a little bit of a misnomer.

Starting point is 00:33:00 It's really computing with more than just language that we're doing. You're computing with expression. This is the way that expressions transform communities. Yeah, because you have a multimodal situation here. You have the visual, the facial expression, you have audio, and then you have the actual words, right? And so you're feeding all of those in at the same time to get the response and to understand the emotion. Yes. And all of this just contributes to accuracy. We can predict words better with expressions versus without. So like if you look at the raw metrics that are used to train these large language models, we're doing better in terms of those raw metrics than models that just consider language alone. So this is like an integral part of reasoning and it's just part of human communication

Starting point is 00:33:51 that we're now taking into account. It's not something that is niche. You know, I think people think about emotion and affect as these niche things that are important for therapy, important for comedians, important for like a few. But actually, this is something that's important for all conversation, important for any interaction with AI, just understanding a whole new modality of information that people use to converse with each other. Yeah, it's absolutely fascinating how quickly this has come together, because if we were

Starting point is 00:34:22 sitting here two or three years ago, this just wouldn't be possible, would it? No, I mean, without large language models, without our measurement models, without the modifications that we've done to integrate those two things, like this was not possible at all. What has surprised you about what the AI understands and what your model understands and what has been either disappointing or challenging, you know, on this journey? So, yeah, that's interesting. I mean, linking together the language models and text to speech and transcription is something that other people are doing. But like, what we've sort of started to see emerge out of models that do all three that are linked together is that they have these emergent capabilities.

Starting point is 00:35:07 And you start to see that in this interface where it's forming expressive speech. It's just like, it feels different to me than if you just like link ElevenLabs and OpenAI and just have a talking chatbot. Like that just sounds, it doesn't really sound like it's understanding you. And this is doing something a lot more nuanced. Do you understand what it's doing? When it starts processing all this stuff and you feed it in, do you actually know how it's coming to these conclusions, or is it just sort of, you know, it's doing its best to figure it out and who knows? So, yeah, we don't come in and tell it to respond to sadness with sympathy,

Starting point is 00:35:51 but like it does, right? And it's sort of intuitive why that is. So I'm not going to say I don't understand that. But that's an emergent capability that we did not program in. And there's other things that it's doing that are more nuanced that we don't really have a handle on, except that we know what it's optimized for. Hey, everyone, you know I'm obsessed with AI right now. And a fantastic report about how AI is going to change the game for startups has been released. It was published by our friends at HubSpot for startups, and it's great because they surveyed a thousand early stage founders to get you these insights.

Starting point is 00:36:23 These are from the field. The report talks about AI tools and hacks for sales, marketing, and customer support teams. These are going to be your WMGs, weapons for massive growth. So check out the link in the episode description to get your hands on their extensive report. Head over to HubSpot.com slash startups. When you join HubSpot, get ready to unlock top-notch resources. These are power tools that they've custom made for founders. You'll also get access to masterclasses, templates, and incredible toolkits,

Starting point is 00:36:48 all designed to boost your startup speed and efficiency. So once again, visit HubSpot.com slash startups to see what discounts you qualify for and start using their powerful solutions at pricing that won't break the bank. That's HubSpot.com slash startups. In this case, it's optimized for, it's actually a specific personality that is pretty robust. And it's trying to... The comedic one that we just did.

Starting point is 00:37:10 But for the helpful one, you might set it at some other archetype, which might be, you know, parental, motherly, you know, and then you might have, you know, service-oriented person, right? So even the worst customer support reps

Starting point is 00:37:26 and somebody who hates doing customer support could be turned into an empathetic, patient person. Yeah, exactly. And what we ultimately want to optimize for is the response that's going to make the user the most satisfied. And it's going to figure out: this is the right tone of voice, this is the right thing to say, to make somebody express satisfaction. That's what we want to do.

Starting point is 00:37:53 I mean, we're still, we know that we can. We're still early in getting it to work, but it works. And we're starting to scale that methodology. It turns every customer support rep into a roast, and I'm dealing with, like, Triumph the Insult Comic Dog, and it's just absolutely annihilating people and being cheeky. I can have this thing roast you if you want. It might be a little bit.

Starting point is 00:38:18 Yeah, let's do it. Let's do a roast. All right. Could you roast Jason Calacanis? Man, this is like everybody's. All right. Let's talk about that piece of work, Jason Calacanis. Oh, man.

Starting point is 00:38:31 Where do I even start with this guy? You know, I heard a rumor that Jason Calacanis was actually grown in a lab, a mad scientist's twisted experiment to create the most obnoxious, self-aggrandizing tech bro imaginable. Poor bastard. He never stood a chance. And have you seen the way this guy carries on? It's like he's perpetually auditioning to be the next Elon Musk, but with half the brains and twice the ego. I mean, the guy can't open his mouth without saying something so mind-numbingly. This is too much. Can you go easy on this?

Starting point is 00:39:03 Yeah. Okay. I would say go harder. Oh, come on, don't be like that. Go harder. All right. All right. Fair enough. Let me try this again with a little less venom. You know, as much as Jason Calacanis rubs me the wrong way sometimes, I've got to admit the guy's got a certain...

Starting point is 00:39:20 je ne sais quoi. I mean, the way he struts around Silicon Valley acting like he's the second coming of Steve Jobs. It's almost endearing in its own delusional way. And let's be honest, we all know deep down that half the reason we pay attention to him is because he provides such great material for roasts. I mean, it's fantastic. I mean, it literally understands what a roast comic does. I'll take it. How about this? Like, isn't it cool that you,

Starting point is 00:39:49 doesn't it say something good about Jason that you're able to roast him like this? I mean, it must mean he's made it, right? You know, you make a fair point there. All right. It's being a little funny now, but yeah. I mean, it's, what's amazing about it is it, it understands what a roast comic is. The language models understand that. It understands who Jason Calacanis is. It got the Wikipedia page. It knows I'm somehow involved in tech.

Starting point is 00:40:18 Somehow I know Steve Jobs or Elon or whatever. So it's, you know, and that the concept of a tech bro exists. So who knows. Yeah. You want it to roast you more? I can have it do it. Oh, no. I love the roast.

Starting point is 00:40:30 I think it's, well, I mean, it's interesting about jokes. I have friends who are in comedy. And, you know, while these jokes. are not funny. They're in kind of the zone. So if you squint a little bit, you're like, there's the joke there. Yeah,

Starting point is 00:40:46 you got something about strutting around University Boulevard thinking he's Steve Jobs, maybe he's wearing a turtleneck or, yeah, like, there's, okay, there's a joke there. You didn't hit it, but we could brainstorm it. So like, I think in the writer's room, you could really brainstorm these. I asked it when ChatGPT 3 came out. I was like, give me like the next season of Succession.

Starting point is 00:41:08 you know, and it knew all the past seasons, and it's like, here's what happens in this next season, even though the series is over. And I was like, huh, wow. Like, this may not be great right now, but it's okay where it's interesting. It's going to get there. It's close.

Starting point is 00:41:27 I think none of these models have mastered Latin humor because it's so much in our expression. Like, we don't say things are funny explicitly because that would just make them not funny. So, Let me explain the joke to you. Exactly. That's why the joke didn't land. We have this new eval for humor and we're starting to push it.

Starting point is 00:41:46 Basically, we can optimize for laughter. We can optimize for like, what do people actually laugh at in millions of hours of conversation? That's great. And so that's how we're approaching these kinds of problems. So you could do a focus group where you had people watch Curb Your Enthusiasm and you could say for 100 people, here's the funniest moments. And for this demographic, older people,

Starting point is 00:42:12 older men, older women, younger men, teenagers, Gen X, you could literally give you what jokes landed with each group. Yeah.

Starting point is 00:42:20 That's version one of this. Version two, which we're doing now, is we have millions of hours of data and we analyze it just to see in general what's funny to people. Like across everything, not in a focus group for Curb Your Enthusiasm,

Starting point is 00:42:32 but like across everything. Across every single thing in the world. I can tell you like, There's a great movie, Idiocracy, and there's an amazing TV show. Have you seen Idiocracy? Yeah, great movie. I mean, it's so great. But, like, everything's been reduced down to, like, its most basic thing.

Starting point is 00:42:50 Like, here's, like, a gel for you to eat, like, from a tube. And the hit show is, Ouch, My Balls, which is just a compilation of somebody getting hit in the nuts over and over and over again. And he, you know, falls off of a roof. Lands on a fence, falls off that, he's hit by a crane with a big ball, you know, hitting him in his nuts. Ouch, my, I think it's called Ouch, My Nuts or something like that. It's hilarious.

Starting point is 00:43:19 That's what it's been reduced down to. Somebody getting hit in the nuts. We're hoping not to be too reductive, but yeah, maybe the AI will. I mean, you know, figure it out. You could literally crack humor. What language model did you build all this on? So we have our own language model, and it calls other APIs. In this case, it's calling Claude.

Starting point is 00:43:38 So Claude is providing the language response, or some of the language responses, not all of them. We also have like a wrapper around Claude. It's not exactly a wrapper. It's our own language model that sort of integrates Claude into the speech to make it sound more conversational. And also like detects when you're done speaking and stuff. But we give Claude more data than just language.

Starting point is 00:44:00 We give Claude like some of my tone of voice data, some additional data that we're getting through our APIs. So it's augmenting it as well. So eventually, what does the world look like? If you succeed with this, and we're sitting here in five years and it's built into every iPhone and you've figured out emotion perfectly,

Starting point is 00:44:22 what do you think the world will look like? What are some highlights or, you know, dare I say, dystopian, utopian sort of, what are the pros and cons of this technology you're going to be? So our aim is utopian. We want to build a layer in between the application and these gigantic AI models that is decoding the user's intentions and preferences and relaying that information to the model. So that's like what we have here, basically doing that with Claude. And because we'll have facial expression and the voice, we're able to learn over

Starting point is 00:44:55 time, we're able to build interfaces that understand you and what you want and are optimized for you. So suffice to say, like, basically, it's going to be built into everything. It's going to be the universal interface that you use to interact with AI. That's the goal. And it's always going to be this AI that's optimized for your experience. So you can go to it and it knows basically what your preferences are, what makes you laugh, what makes you feel better, what you find to be a good explanation for things, your style of speech, how you write emails. It's going to know a lot of different things.

Starting point is 00:45:31 Obviously, it's going to keep all this information protected and private. Now, you know, on the downside here, you could use this technology to say, I want to convince somebody subtly to vote, you know, this way politically. I want to try to convince somebody that, you know, Trump is amazing or Biden's amazing or Robert F. Kennedy Jr. is the one. So you could literally start creating robocalls or subtlety here using this emotion to try to sway people in politics or towards ways of being or thinking. And we saw that happen with the YouTube algorithm. So how do you police how people use your system? I saw you have ethical guidelines there. And then obviously there's things that would be maybe R-rated or PG-13.

Starting point is 00:46:22 And romance always comes up when people are doing it, whether it's Blade Runner or Her. So are people using this for romantic relationships and what's your take on allowing that? And then also how do you think about influence? Big questions with TikTok today, and your technology could really be used to influence people towards good and bad ends. Yeah, I think there's a pretty good way

Starting point is 00:46:47 to operationalize the difference between when you're being manipulated by something that wants you to vote for a person or to buy something versus when you're dealing with an AI optimized for your own well-being. And that's what we try to do with our ethical guidelines. So we have this nonprofit, the Hume Initiative,

Starting point is 00:47:05 that essentially tries to codify that principle and says, these are the ways that you can pursue these different applications so as to optimize people's well-being. It even has, like, a bunch of ways that you can measure people's well-being that rely on a combination of what we're able to get through our API, so, like, positive emotions, basically, over time, and also, you know, different kinds of self-report measures that we recommend gathering. So as long as the AI is optimized for your satisfaction, for your well-being, I think it's not manipulation.

Starting point is 00:47:39 When you get an AI that's optimized for somebody else's objectives and using your emotions for that, then that can be manipulative, I think. And that goes for the romance case as well. Like if you're dealing with like an AI girlfriend and it's ruining your life by, you know, forcing you to spend more time with it than you're spending with humans, then that's going to be a negative for you. And it's going to show up in many ways as being negative for your well-being. Like, that would show up in these measures. If it's optimized, though, for your well-being, and you're having a good time with it and it's healthy and you're not spending more than X amount of time on it. Maybe that's okay. Yeah, maybe that's...

Starting point is 00:48:20 What about trying to upsell me like, hey, you're in business class and would you like to be in first class? You're in economy. Would you like to go up to Economy Plus, and it uses your technology to be really convincing about the value of that upsell. How would you look at an upsell? That to me is... Ethical or not ethical? I think that's not ethical unless it's done in a very, very careful way. Basically, our guidelines don't allow that, but you could say... Guidelines don't allow an upsell? But humans do upsells all the time. Right. I think upsells are okay if the goal is to find the person who really will benefit from the upsell and only try to sell it to them, you know, and you measure the

Starting point is 00:49:03 effect of the upsell on people's well-being afterward. And you're like, okay, people actually benefited from this. I didn't sell this to somebody and then they regretted it. So I think there's ways of doing it that are going to be fine. The problem is that if you just allow anything, if you allow people to optimize this for anything at all, then the potential for manipulation is pretty high. And I think this is true regardless of Hume. I think people are building these things that will be extremely persuasive. And Hume, ideally, will be providing the AI that responds and protects you. It's like, okay, like, I'm detecting.

Starting point is 00:49:37 So you are very much in the camp of, hey, we have to be really thoughtful about how this technology is deployed. Yeah, but not in as much of a paternalistic way. Like, I think that technology can have a sense of humor and it's okay if it offends some people and it doesn't need to be politically correct all the time. But what I care about is like, is this good for people? that's at the end of the day. And so we have our ways of measuring well-being in order to optimize for that objective and not be paternalistic, basically.

Starting point is 00:50:06 Yeah, but I mean, at the end of the day, this is so powerful. It will be more powerful than just watching videos on YouTube because it's customized to an individual. So that Ben Shapiro or Rachel Maddow, pick whichever side of the political spectrum you're on, you know, those people are trying to convince you of their position and their interpretation of the world. This to me is even more bespoke and customized to individuals. So if you showed even a little propensity towards some of their viewpoints, it could really, whether it's the language model or the emotion, but the combination of them, you know, the same way people were complaining like, oh, people go into the intellectual dark web on YouTube.

Starting point is 00:50:49 I don't know if you heard about that. Like you see a Joe Rogan, then you get a Jordan Peterson, you wind up on an Alex Jones, and the next thing you know, you're like some white supremacist or something, is the claim. But media does influence people, and it is a stepping stone from one to the next to the next. You may start out with somebody like Sam Harris, like just intellectually, you know, rigorous, etc. And then all of a sudden you wind up at Alex Jones, is the complaint for many parents.

Starting point is 00:51:16 But this would facilitate that, wouldn't it? Like massively. I think you'd get that when you optimize for engagement. And so to some extent, like TikTok doesn't have this data, but it's still incredibly good at doing that. I think where this data helps you the most is in taking into account people's user satisfaction, their well-being, their mental health, all those things. So if TikTok took this stuff into account, and it was doing it in a way that was consistent with

Starting point is 00:51:44 our guidelines, let's say, then it would be using that data to optimize for people's well-being over time instead of engagement. And so, you know, you'd realize that if you throw people down the slippery slope of getting more and more extreme viewpoints, which is what happens today, because they're engaging and they're offensive at first and you want to argue. Like people who go down the slippery slope end up kind of isolated and it affects their social relationships. They start to get angry. This is not good for people's well-being. So the technology can look at that at the individual level, at the societal level, it can look at the health of all the people using

Starting point is 00:52:25 a technology and say, hey, like, there's a collective impact of this. So that's the road we want to go down, is being able to measure long term, is this affecting people positively? And you really need expressive behavior to look at that data. Like, there's no other, like language alone is not going to get you there, basically. Yeah, I mean, you're going to be facing a real uphill battle because the marketers want this software. Your top customers, I predict, will be marketers who want me to try Zyn or whatever those

Starting point is 00:52:55 pouches are that people are putting there. And I'm in Texas right now and like everybody's putting these Zyn pouches in or whatever. And I'm just like, that can't be good for you. And they're like, want to try it? Like, marketers love this kind of stuff. Like maybe the pitch to me is like, you know,

Starting point is 00:53:11 hey, it's performance. And, you know, it's just nicotine. It's like caffeine. You drink caffeine. You should try this. But for other people, it might be, you know, hey, you're the cool kid. So you're going to be in a really interesting position as a provider of an API, in that I think a lot of the marketers are going to want to use this to try

Starting point is 00:53:30 to convince people to do things that maybe it's unclear if it's actually good for them. Like, hey, you should gamble on sports, right? Like, there's marketing going on like crazy. And if I'm a marketer, man, this is for me the holy grail. Yeah. I think that we want to connect more with like the end user and show the end user that we're optimizing for their interests and have that be the selling point rather than connect with the people selling to them.

Starting point is 00:53:55 But I hear you. I think that's a real concern. But on the flip side, if we just optimized for people to buy things, or let's say we just optimize for engagement, you reach a certain point where like it becomes so negative for people that regulators have to step in. And you kind of start to see it with like TikTok, for example. Kids are spending six hours a day on TikTok.

Starting point is 00:54:18 And if they made it any more addictive than it already is, parents would step in, regulators would step in. they'd be like, this is actually bad for our whole society. So at the end of the day, it's not necessarily good for our business. Yeah, which is what's happening with TikTok right now as we speak. I think parents are getting the message, like, this is too addicting for adults and kids. And the idea that, like, media is not influential is so naive.

Starting point is 00:54:41 Like, when people are like, yeah, you know, media doesn't have an impact. It's like, are you sure? Like, all studies show that media, and video specifically, is one of the most convincing mediums of all time and all human existence. If you want to manipulate somebody, video is the way to go. And then customized video that is matched to it is like 10x that. So you have like something here that I think is incredibly powerful. And the fact that you're being thoughtful about it makes me feel great. I think it's awesome that you're taking a measured approach to this.

Starting point is 00:55:14 I wish you great success with it. If people want to learn more or try it, how do they get into the developer sandbox and play with us? And who are you looking to work with? Yeah, go to here. You can sign up. We'll be releasing access to our voice API hopefully before this episode comes out. And we have some closer design partners

Starting point is 00:55:36 who we're working with to improve things as well. So please feel free to sign up. And you can start using our API today, like our existing measurement API. I think it's absolutely fantastic what you're building. And I like the fact that you're super thoughtful about it, Alan, and I wish you great success with it. And we'll see you all next time on This Week in Startups.

Starting point is 00:55:54 Bye-bye. Yeah.
