The a16z Show - a16z Podcast: It's Not What You Say, It's How You Say It -- When Language Meets Big Data

Starting point is 00:00:00 Hi, everyone. Welcome to the a16z Podcast. I'm Sonal and I'm here today with Michael, and we are talking to Kieran Snyder, who is the CEO and co-founder of Textio, a company that analyzes job listings to predict how well they're going to perform and can help optimize them to get more qualified, diverse candidates. And interestingly, besides what doesn't work very well in job descriptions, words like synergize, they've been able to figure out what does work well. Language like in tech, people love to talk about hard problems and tough challenges. But it's a lot bigger than just jobs. The ability to understand the words we use and how we use them is pretty important because even though we're completely immersed in a world of tech, where a lot of the conversation is around big data as numbers, a lot of the data that we produce as the output of our work is actually taking place in the form of words. And those words matter.

Starting point is 00:00:53 Sometimes how you say things is more influential than what you're actually saying, right? And it's counterintuitive to any of us who've built products before because you like to think you're leading with a strong vision. Clearly words matter. And another place that that plays out is with hidden biases that are often revealed in words. For example, Kieran examined a number of resumes to see the differences between how women and men describe themselves, as well as performance reviews to see the ways that women and men were described differently. The word abrasive, which has been talked about since then, ended up being used in 17 out of a couple hundred women's reviews and zero times in men's reviews, right? The sort of stereotypical, like aggressive was used in a man's review with an exhortation to be more of it, and in women's reviews as a term of some judgment.

Starting point is 00:01:48 Okay, let's get started. Kieran, welcome. So the reason we actually invited you to the a16z podcast today is because you've been writing a lot of interesting work based on the outcomes of your product where you've been analyzing people's use of language in certain contexts as a way to surface insights. And I think that's really fascinating because I think we have a tendency in our world to focus on big data as if it's just numbers and not other forms of data because you're really describing, I mean, what you describe your work as doing is applying machine learning to text and natural language. So how does that, how did you kind of, how does that work? And then we can talk a little bit more about how you got there. Yeah, so how does it work? I mean, language is just an encoding of

Starting point is 00:02:29 concepts, right? And anything that can be encoded can be measured. And so I was sharing the story the other day. We actually originally started out looking at Kickstarter projects, right? So we started out with this question, could we just look at the text of a Kickstarter project and some of its metadata around the text and predict before it was ever published whether it was going to raise money. And we didn't look at the quality of the idea. We didn't look at whether a celebrity endorsed it. Turns out we got over 90% predictive on minute zero of a project as to whether it was going to hit its fundraising goal based solely on things like, how long is the text, and what kind of fonts are you using, and how many headings do you have? So wait a minute, just to say,

Starting point is 00:03:14 unpack that a little bit. So before the project even went live on Kickstarter, just looking at those features of the text, you were able to predict whether it would be successful or not? Exactly. What were some of the high-level takeaways from that? Yeah, so longer is better where Kickstarter is concerned, kind of counterintuitive.

Starting point is 00:03:35 One thing that broke our hearts, because my co-founder, Jensen Harris and I both have some design background, you would think these cleanly designed projects with this beautiful use of single typography would do best. Not so. You want to look like a ransom note, so you want to mix and match types. You want lots and lots of headings. Oh, my God, that sounds visually painful.

Starting point is 00:03:57 You want images to be front-loaded. Kind of makes sense. But a lot of what we found was not intuitive. And so it demonstrated for us the value of actually measuring, because the whole Kickstarter corpus is out there in the world, right? So you can actually have great training data. You can see how well prior projects have performed. And we saw, hey, we're kind of onto something here,

Starting point is 00:04:19 just looking at the, very painful as a product person, the quality of your idea doesn't matter, just looking at the content aspects we could predict. And how do you account then for all the other sort of outside variables, you know, whether it was at the beginning of the Kickstarter kind of like craze, whether it was a certain time of year for that matter.

Starting point is 00:04:38 A certain type of product even. Yeah, or geography. I mean, how do you know that, in fact, your analysis was correct? I mean, you can look at some of those other factors, right? Because you can see when projects are published. Turns out that doesn't make a big difference. You can see the only things that really move the needle in a very short-term way are, do you have a celebrity endorsing you?

Starting point is 00:04:58 Because that can get you a lot of social media attention. It doesn't make or break you, but it can help quite a bit. And generally, how good you are at your social media strategy can tip the balance a bit. But none of those other factors turned out to be as significant as we expected. The ability to really zero in via just the text. Did that surprise you? I mean, we started off with a hypothesis that it would be that way.

Starting point is 00:05:27 And that, you know, sometimes how you say things is more influential than what you're actually saying, right? And it's counterintuitive to any of us who've built products before because you like to think you're leading with a strong vision. We weren't surprised. We were curious as we started to apply the technology to some other verticals, whether it would extend. You know, our first big area has really been in the area of job listings where we've looked

Starting point is 00:05:55 to see the first real product application, where we've looked at listings now from over 10,000 different companies. We've measured who's applied to which listings. And we do see the content matters. We do see some tailoring by geography. Turns out what works in New York is different than what works in San Francisco. We see a lot of tailoring by industry. So what works to hire in tech is very different than what it looks like to hire a claims adjuster or someone in retail. So you see some differentiation,

Starting point is 00:06:21 but in all cases, depending on how you're slicing and dicing the categories, that text leads. You know, we've looked at real estate a little bit prior to launching our jobs application, and we've seen the same principles apply. So far you've been talking about the form of the text, like the length and the fonts and the design. But like, were there particular words that popped out as well in terms of what people said on those Kickstarter descriptions or anything like that? I'm bringing this up because there's just this recent anecdote in the news that I read about someone saying that you can predict the success or default of loan applications based on

Starting point is 00:06:57 words people use like God or using God a lot will actually mean you'll default. You're more likely to default on your loan, for example. By God, I'll pay you every month, I promise. In Kickstarter, we didn't look at that. We started looking at that for real estate listings and then jobs where we've looked at it quite a bit. So we saw when we were prototyping out the real estate stuff that if you say off-street parking, that really moves the needle for low-income homes. But for high-income homes, in terms of the number of people who will go to your open house and then the eventual sale price of your home, for higher-priced homes, it's actually a negative because why would you want to highlight that it has off-street parking? It's just sort of an expectation. So we saw, you know, vocabulary matter quite a bit. In jobs,

Starting point is 00:07:43 it matters hugely. You know, we've identified at this point over 25,000 unique phrases that move the needle on how many people will apply for your job, what demographics, how qualified they are. Could you share some of that insight with us? Because, you know, the reason I came across your work is because I read an article about how you analyzed performance appraisals and job descriptions for insights about what moves the needle and the differences in how people communicate. What are some of the things? I mean, just because we have a huge audience

Starting point is 00:08:13 that does job descriptions, that needs to hire. Yeah, so there's sort of a set of language that works really well for everybody. These are not surprising on the face of them, but when you look, you see lots of them. So things like, we'd love to hear from you, be really encouraging and positive in your listing,

Starting point is 00:08:32 using the right balance of talking to the job seeker, so your background is in science and you really enjoy roller skating in your free time, and talking about the company, so we stand for this. So the balance between you statements and we statements can matter. You know, language like in tech, people love to talk about hard problems and tough challenges. Curiously, we see patterns change over time. So my favorite example of this is the phrase big data.

Starting point is 00:09:00 So a year and a half ago, if you use the phrase big data in a tech job listing, it was positive. You know, it was seen as compelling and cutting edge. In June of 2015, it's not negative, but it's totally neutral. That's interesting. I wanted to ask because if everybody sort of gloms on to these best practices, how then does the signal versus the noise shift? Exactly. Marketing content, as with any marketing content,

Starting point is 00:09:26 the patterns that work change as they get popular and get adopted. And so one of the reasons we believe software is so interesting as a solution here is that it can kind of keep track at broad scale of what's actually happening right now in the market. So you may have published a job listing that worked really well a year ago. And probably how a lot of your listeners write their job listings is they go back to that one. And then they try to edit it and tweak it a little bit and fix it. That's exactly what happens. Right.

Starting point is 00:09:53 But it actually doesn't necessarily work because the market has changed. And so there's a lot there. Were you ever, I mean, I'm just curious about this. Were you ever able to find or study associations between people's intent and outcomes in job listings? So, for example, one of the things that we've seen happen a lot is that people only become real about what they actually want out of a job description when they actually put words to paper, and words have that power to sort of help discipline what you're looking for. You might not even know what you're looking for until you write it down. Have you ever looked at anything around that or found or heard interesting anecdotes around that, given your work? We have seen that listings tend to perform better when they are originally authored.

Starting point is 00:10:35 So you can see some degradation over time when people patch, you know, I take a little bit from this listing and a little bit from this one and I sort of stitch them together. And it's probably because when you're originally authoring it, you bring that coherent point of view. That's really interesting. So a little bit. It's pretty early for us to have seen that. And we also identify phrases that torpedo your listing. Right. You know, there are, you know, corporate sort of cliches and jargon. So buzzwords, basically.

Starting point is 00:11:03 The biggest, you know, one of the very common, we call it a gateway term that kind of torpedoes your listing is the word synergy. Oh my God. That should torpedo any piece of content. I don't care what it is. But it's a gateway term because when people include synergy, they're also significantly more likely to include, you know, value add and make it pop. Right. Right. Right. Kind of silly. But they're all over the place. And it turns out, every candidate of every demographic group hates them. Yeah. And so there's a lot of opportunity to improve in jobs. In the sort of editorial world, we would call that jargon.

Starting point is 00:11:39 And it sounds like... We also call it jargon. I think we all call it that, right? Jargon is jargon, but no, totally. Actually, it's interesting because with words like that, they're obviously in use because they're useful words. And it's kind of sad because, I mean, synergy at some point was probably a useful word.

Starting point is 00:11:54 So it's kind of interesting because over time with your corpus of data, you'll be able to sort of map how people's language changes. And when you think of dictionaries as like these static instruments for capturing text these days, it is kind of fascinating how language is changing in a way that we're able to track differently now, thanks to online and software. It changes lexicography. Yeah. As a whole discipline, it changes lexicography for sure.

Starting point is 00:12:16 I don't know that you could do it in a static way anymore. Right. I totally agree. The internet has just exploded that. Right. Exactly. Is there, so if big data is kind of neutral now, is there a kind of job type or job description that's the celebrity of the job search world right now?

Starting point is 00:12:32 Yeah, what word is sort of popping out that's really moving the needle for you guys or that you've observed? There are several. Most of your listeners are probably in tech. It varies a lot by industry. So at scale right now. At scale is a very popular phrase. That's popular here too. Yeah, well, it is.

Starting point is 00:12:48 You don't want to do things or use methods that are perceived to be manual or perceived to be limited in some way. So at scale is one that shines. And it started in tech, but it spread to other industries, which is common that we see that. One of my favorite examples, given that we spend a lot of time talking to HR people, is, it turns out workforce analytics is no longer a good phrase to use. You want to use people analytics. So you can get these highly specific, you know, deep in an industry changes that if you're in the industry and you're on the cutting edge, you probably know. But if you're just a startup trying to hire your first analytics person, you probably have no idea. You don't have a deep background in the industry.

Starting point is 00:13:35 Right. Yeah. So you've described it for job listings and real estate. And so this approach, you think, can extend in different directions. You started with Kickstarter. But what is it that it's doing? And how do you, like, it seems a little bit magical. I have to say that. Like, I know that this is a job listing, so therefore it's going to have to do this. But a real estate listing has to do something kind of different. That's a really good question. So, you know, this approach is as powerful as the data set that you have. So if you want to understand a document type, the very first thing you need to do is collect

Starting point is 00:14:09 a lot of examples of the document type. And that means you need the documents and you also need some information about their outcomes. So you are publishing a Kickstarter project. We want to know, did you make money or not? That's signal for us. You're publishing a job listing. We want to know, did you attract a lot of good people? Did you attract only men?

Starting point is 00:14:29 So, you know, for each document type that we take on, the first thing we do is we make sure we build out a great training data set. And then we apply really classical natural language processing techniques. So we look for patterns. And so we say, okay, these are the ones that were successful, where successful is defined as, you know, attracted more applicants than 80% of similar listings, maybe. And then we start looking for the linguistic patterns in the successes, the ones that aren't

Starting point is 00:14:59 as successful, ones that skew in a certain way demographically, and then we play that back. So sort of a key thing for us is that you get that feedback in real time as you're typing. So as you're working on your document before you ever publish it, or ever pay to publish it somewhere, you can make it good. And so the training set is the sort of core of all of that, because without that outcomes data, then it's just someone's opinion. I mean, could you extend that to say, like, look, I want to write a screenplay for a blockbuster? I mean, does it, could you, I mean, people have probably tried this. In fact, a very prominent Bay Area CEO proposed to us a couple months ago that we

Starting point is 00:15:39 start applying this to screenplays. To start writing, to actually start producing content or just analyzing them? Sell it to Hollywood. Oh, wow, that's great. Yeah, so I think any time you're writing content to sell something, this is really interesting technology. And you could be selling your company. You could be selling yourself: you're a job seeker with a resume that you want to have optimized. You could be selling your product in a, you know, an e-commerce setup. You could be marketing yourselves. You could be marketing blast emails. Anytime

Starting point is 00:16:06 you're writing content to get people to take an action, this is really useful technology. Let's talk about where this fits, and let's actually go, let's purposely use some jargon here, and let's talk about where it fits in the tech trends, like where it fits in that space. So it sounds like you're describing big data techniques applied to natural language or machine learning techniques applied to natural language. But natural language processing has been around for over three decades, 30 years. I mean, and in the early days, they didn't have this kind of corpus to train the algorithms on, obviously. So they had to use different kinds of techniques. Like, where does your work fit? And how do you see how it fits in the evolution of natural language? Like, how has it been and where are we now,

Starting point is 00:16:47 kind of? Yeah. I mean, I think in core natural language processing, empirical strategies have always been really important. So when I was a grad student years ago writing a dissertation, collecting data was just a lot more work, right? So I had to go and record people in the field and I had to transcribe things. It feels like ancient now, actually, but I actually finished my PhD 12 years ago. It wasn't that ancient. The fact that the internet has codified everything over the last 15 or 20 years, at least in English and most Western languages, means that you have this ready set of corpora available for you. The tricky part is collecting the text and the outcomes.
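
As a rough illustration of the labeling step described above (not Textio's actual pipeline), the Python sketch below marks a listing as successful when it drew more applicants than 80% of the listings in its group, then compares how often a phrase shows up in successful versus other listings. The field names `text` and `applicants`, the function names, and the sample listings are all hypothetical, and treating the whole list as one group of "similar listings" is a simplifying assumption.

```python
def label_successes(listings, quantile=0.8):
    """Mark a listing successful if it drew more applicants than
    `quantile` of the listings in the (assumed comparable) group."""
    counts = [item["applicants"] for item in listings]
    labeled = []
    for item in listings:
        # Share of listings in the group that this listing strictly outperformed.
        beaten = sum(1 for c in counts if c < item["applicants"])
        labeled.append({**item, "success": beaten / len(counts) >= quantile})
    return labeled


def phrase_rates(labeled, phrases):
    """For each phrase, return (share of successful listings containing it,
    share of the other listings containing it)."""
    groups = {True: [], False: []}
    for item in labeled:
        groups[item["success"]].append(item["text"].lower())
    rates = {}
    for phrase in phrases:
        rates[phrase] = tuple(
            sum(phrase in text for text in groups[flag]) / max(len(groups[flag]), 1)
            for flag in (True, False)
        )
    return rates


# Hypothetical sample data, made up purely for illustration.
sample = [
    {"text": "We'd love to hear from you about hard problems", "applicants": 90},
    {"text": "Leverage synergy and value add", "applicants": 10},
    {"text": "Join a team working at scale", "applicants": 40},
]
sample_rates = phrase_rates(label_successes(sample, quantile=0.5), ["synergy", "at scale"])
```

A phrase whose rate is much higher among the less successful listings is only a candidate for a term that hurts; a real analysis would need far more data and a proper statistical test before drawing that conclusion.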

Starting point is 00:17:31 The outcomes are the part that's hard. Finding the content is easy. So you're describing the difference between just analyzing something and being able to predict something using that text. Exactly. When you analyze something, you can say, oh, cool, this word is really popular now. That's an interesting fact.

Starting point is 00:17:47 It might be valuable to someone to know it, but it's different than saying this word is actually helping your document in some way. What are some other scenarios where you could use sort of this natural language text analysis to predict more interesting things? Yeah, so people are really starting to think broadly about this. We saw a New York City-based company helping people optimize the sale of their New York City apartments recently using the right phrases. We've seen people do things in health care

Starting point is 00:18:19 that I think are really interesting. It's not a known vertical to me, but looking at the kind of notes that doctors take about a patient and predicting that patient's likelihood of having a major insurance incident over the next 12 to 15 months. It's really interesting things in actuarial science. I think anytime people are producing text, which, by the way, in businesses, whatever your business is,

Starting point is 00:18:41 text is actually the thing you produce the most of. Right, I believe that. In any industry. And so people produce a lot of text, and it's meant to describe often what they think is going to happen. And so, I mean, the field of opportunity is pretty big. The techniques you're describing, is it the same underlying technique applied to all different domains? But do you have to also train each corpus on a different domain? Like there's special, like there's inside language in each industry.

Starting point is 00:19:07 Or are there also universals across all of them? That's a really good question. You don't know until you train is the short answer to the question. So we have a set of NLP libraries that look for common attributes of text, and we always start out any new vertical by turning them on the documents and seeing what happens. So things like sentence length, almost always interesting. Things like the density of verbs and adjectives, almost always interesting, document length, almost always interesting. But the specific phrases that matter, what it means to write a job listing is very different than what it means to predict

Starting point is 00:19:46 whether a patient is going to become ill, right? And so the specifics matter. The goals matter. So if it's a document that's intended for broad consumption, it really probably shouldn't be longer than 600, 700 words. If it's a stock prospectus where you're giving a company some information about how their stocks are likely to perform, it's going to be pages and pages. And so, you know, the specific benchmarks that you're looking for

Starting point is 00:20:13 are often very vertical by vertical, but the principles of the kinds of things you look for are pretty similar. In the past, it seemed like only really big companies could do this because they had the type of computing hardware and processing power to pull this off. Like, what's changed that a small startup could do this? AWS is what has changed things, right? I mean, cloud compute at scale and, you know, Google Cloud and Azure. There's a lot of competitors now, but AWS did this for startups, I think.
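
The generic attributes mentioned a moment ago, such as sentence length and document length, are simple to compute. The following Python sketch is only an illustration with hypothetical names; it uses a naive regular-expression sentence splitter, and it leaves out verb and adjective density, which would need a part-of-speech tagger.

```python
import re


def text_features(text):
    """Compute a few generic, domain-independent attributes of a document."""
    # Naive sentence split on runs of ., ! or ?; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Words are runs of letters and apostrophes; numbers are ignored.
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


# Hypothetical example input.
features = text_features("We build tools. You will love it here!")
```

The benchmark applied to each attribute (for example, the 600 to 700 word ceiling mentioned for broad-audience documents) would differ by document type, while the attributes themselves stay the same.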

Starting point is 00:20:39 And I say that not because I worked at Amazon before, but it actually is, like, for our team to set up the server infrastructure that we need is trivial, you know. So I think that that's a thing. And just the fact that there's so much text data encoded on the internet, Google has democratized a lot of access to data, and so that has helped too. That's great. Yeah. Did you guys, I have to ask, did you kind of put any Kickstarter projects up there yourselves just to give it a whirl? We were asked this a lot during our fundraising. We did look at pitch decks, by the way. I will come back to your question.

Starting point is 00:21:22 One of the things that's been fascinating about having the beta out there in the world is the ways people are using it. So, of course, they're using it for job listings. But people are using it for everything. Like, just a couple days ago, I had a material science professor write to me saying, I put all my course syllabi through. And I was like, really? Like, how did that work for you? I can't imagine that that was a good result. And he's like, oh, I threw out all of the job parts.

Starting point is 00:21:42 I just looked at gender bias. That was a component that I needed for what I was doing. So describe, when you say put it through, like, what happens? I understand, like, in my head I have this idea that I'm typing along and, you know, suggestions come flying at me. That's exactly what happens. So there's a website and you paste or type in your content, and as you're typing it's getting annotated and marked up for you with patterns, suggestions, things you might want to change, scores. And you can, in the case of the syllabi, right, you can dial it up or down depending on what you want the outcome to be. So in his case, look, I'm sort of tracking for

Starting point is 00:22:16 gender bias. He was looking for a specific aspect of what we provide. And, of course, the product isn't tuned for what he wants, but he still found that aspect to be applicable to what he was doing. We're seeing people put marketing content through, pitch deck content through. So to your question about did we initiate any Kickstarter campaigns, we didn't because we weren't making a- You guys would be genius at it.

Starting point is 00:22:39 We might be, yes. We've given a lot of advice to people on Kickstarter projects since then. But we didn't because we were making an enterprise product, right? And if we had followed through on a Kickstarter product and then it got funded, then we'd have to build it. Right. So what'd you find? But we helped friends for sure. That's great. So what'd you find out about the pitch decks? Actually, I'm totally intrigued by that, obviously, given who listens to our podcast. I mean, pitch decks, pitch decks are not always highly text oriented, right? So great pitch decks don't include just your text attributes, but there

Starting point is 00:23:12 are certainly things like length of your deck that matter. Slide titles end up mattering quite a bit, people are looking to see a certain style of content. And let's face it, we've all seen any kind of meeting where one person gets hung up on one word in a headline, which always happens too. We didn't go deep on pitch decks, but we looked at as many as we could find

Starting point is 00:23:35 as we were building our own pitch deck in our last round of funding and found some patterns. In the synergy line of questioning, were there words or phrases you should never include in your pitch deck? You know, I don't know. I don't know. I guess there might not even actually be. Yeah, I wonder if there's

Starting point is 00:23:53 I bet there are. We didn't identify them. Synergy, probably. Yeah. So actually, let's talk a little bit more about some of, and maybe we should wrap up on this note, let's talk a little bit more about some of your findings around gender differences.

Starting point is 00:24:05 So you said the material science professor tested his own syllabus, which again, I'm not sure that made sense, like you said, because there wasn't a reference corpus to, I guess. I guess there wasn't, but when you have, you know, tens of thousands of phrases that are lighting up and he's writing for a science STEM student population,

Starting point is 00:24:24 odds are good that there's going to be some lexical overlap. So he found some things there. So describe some of your findings around job descriptions, given that's what your product focuses on right now, in terms of gender differences and what things you picked up on that. Yeah, so prior to us doing this, there was some really strong qualitative research, right? The National Coalition of Women in Technology, the Clayman Institute here at Stanford, they've done some really interesting qualitative work. But the number of phrases that they identified was on the order of a couple hundred. Avoid rock star, avoid ninja. You know, we want to hire more women in technology. The interesting things for us, first of all, we've talked to a lot

Starting point is 00:25:07 of industries outside of tech. And so while in technology, we want to hire more women, when I talk to people who are hiring ICU nurses or elementary school teachers, bias goes the other way. And so it's very important to us that we don't judge. We just forecast and let you make the right choices for your business. Right. Whatever you're optimizing for, given wherever there's a difference or imbalance. So I will say we have validated much of the qualitative research, which is good, that there's some alignment on those points. We have found cases where things are, it's pretty subtle, right? So the difference between fast-paced environment and rapidly moving environment, it's almost head-scratchingly tiny, but statistically, one of them draws many fewer

Starting point is 00:25:55 women to apply. And which one is it, by the way? Are you allowed to tell us? Fast-paced. Oh, okay. Interesting. Fast-paced. So you see sometimes these very fine distinctions between terms that you can only kind of play out statistically. The other thing I would say is that most individual terms aren't that egregious one way or the other. We put a lot of effort into making something visual so that you could see patterns. So if you have one sort of male bias term or female bias term,

Starting point is 00:26:23 you're probably not going to shift your applicant mix that much. But if you have 10, then you're going to see more substantial impact on your applicant set. Interesting. I feel like this kind of validates the natural language approach because in the past, I think in general people tend to put too much stock only on numbers and not on words, like the whole point of what your work is. But the second part of it is that even, you know, I was thinking about when I was back in grad school,

Starting point is 00:26:48 there was a lot of debates between qualitative and quantitative data and what was more valuable. And obviously, at the end of the day, they're both valuable. Exactly. But it's interesting because for the first time, you're really bringing quantitative techniques to something that was traditionally in the qualitative domain, just like conversation analysis. Yes, it's true. I mean, so, you know, I've looked at in some of my prior research, I've looked at some other document types also. As Textio was getting started, the piece that actually really brought us into jobs was some work I did on performance reviews. So I collected hundreds of performance reviews from men and women who work in technology.

Starting point is 00:27:27 They were all voluntarily given, which meant they were all good reviews, which I was betting on, that I was going to be comparing strong performers regardless, because you don't give your review unless it's a good one mostly. And I found really striking demographic differences in how men and women who were getting good reviews were described in the language that was used. Wow, that's kind of interesting. The word abrasive, which has been talked about since then, ended up being used in 17 out of a couple hundred women's reviews

Starting point is 00:28:00 and zero times in men's reviews, right? The sort of stereotypical, like aggressive was used in a man's review with an exhortation to be more of it and in women's reviews as a term of some judgment. And so that was really interesting. I looked more recently at resumes. So I collected 1,100 resumes from men and women in technology, about half of each, and found for men and women who have very similar backgrounds, very systematic differences in how they present themselves in a resume, which is really interesting. How did that difference play out? So men's resumes were shorter.

Starting point is 00:28:39 They were much deeper into detail about what they actually produced and worked on. That's kind of counterintuitive because you would think that shorter means you'd be less detailed. But you're saying that they were shorter but more detailed about specific things versus other things. Women's resumes tended to tell a story. They were written in prose. They didn't use bullets nearly as much. They included executive summaries. They included detailed statements of their

Starting point is 00:29:05 personal interests that were twice as long as what men tended to include. So the women's resumes were stronger on narrative, much lighter on detail. The men's resumes were generally stronger on detail and lighter on narrative. But one of those kinds of resumes gets flagged as positive much more frequently, right? In tech especially, we look really for what did they deliver, how quickly and tersely can they communicate. And so as we started looking at some of these documents, we realized that there were just fascinating opportunities on the job listing front because in these other important business documents, we were seeing demographic differences play out. That's fascinating. Were there any other sort of takeaways you have for people who are job seekers out there who want to optimize a resume based on what you discovered? I mean, the length and the narrative is an interesting point.

Starting point is 00:29:54 I mean, does it matter, by the way, for a particular industry? I know you said you did. I looked at tech and resumes. Okay. I bet some of the same findings apply in other STEM fields too. I bet finance you would see some similar patterns. We do see tech and finance pattern together quite a bit in other document types. That's interesting, by the way, that those two domains.

Starting point is 00:30:16 It's the quant. Okay. We love numbers, we love data, we love rigor. Those are things that we share. Yeah, I don't know. I mean, I'm always very reluctant to tell somebody to change the way they tell their story because I think both of those styles

Starting point is 00:30:33 are needed for companies. You need people who can tell a customer story and you need people who can track fine detail. And so I guess I would prefer to tell the story the way you actually are and find the company that values it rather than change the way you tell your story. You're right, because that is actually key.

Starting point is 00:30:52 That's a really good point. My one question, though, is this whole idea of optimizing language. How far does that go? Because at some point I do think that optimized becomes average or bland or, you know, not to point fingers, but I'm thinking of Demand Media, for example. Like collect all this data on what people like or want and then spew out on the other side something that nobody likes or nobody wants. I love that question. So it turns out that if everybody sounds the same, no one stands out and the good writers continue to find a way to stand out.

Starting point is 00:31:25 And then that changes what works. So the beauty of a learning system, we used the example of big data before, is that if everybody tries to glom on to the same patterns, they're no longer effective. Someone is going to figure out, as with any marketer, someone is going to figure out how to do it better, and they're going to introduce the next pattern for success. So that's how we see it. What are some other ways that people are using what's out there right now that's been surprising to you? So it was initially surprising to us that people were using our tool for anything other than job listings because we trained on job listings. We're a data quant oriented company. That's the promise we made to say this is going to help you with job listings.

Starting point is 00:32:06 As people started trying new things, we realized, oh, there's nothing doing quite what we're doing. So people want to test its limits and see what kinds of content they can put through. Here are some of the crazier things we've seen. So we've seen resumes, which kind of makes sense. We've seen lots and lots of marketing content. We've seen people putting their product descriptions through. A toy company that recently removed gender labels from their children's toys put toy descriptions through to see if they were still flagging any gender language.

Starting point is 00:32:41 Just the ninja toys I expect. Just the ninja toys, right. And, you know, so it's, we're not trained on those documents, but people are continuing to use the system for it because I think there's a hunger for that kind of experience and there's nothing tailored to what they need. And so for us, it has offered some insight into what people need and what we might do to support that. Will we be able, will we ever be able to use your technology to ask questions? Like say, hey, I want to know X, Y, or Z based on all the things you've trained it on? I don't know if you'll be able to ask questions, but we do think that you'll be able to use it to generate simple content. So I think there's something fascinating in the, hey, answer these 10 questions about yourself. Tell me where you'd like to work, and I will make the resume that is most likely to get you a good screening interview. And from there, you're on your own.

Starting point is 00:33:32 You've got to be good, right? We're not going to lie for you. We're going to tell your story, but we're going to tell it in the way that is most likely to get you that job at Google that you want. Or get that loan down the road, if that's what it is. Right. Avoiding the word God. Yeah, we've seen like conference calls for participation coming through and grant proposals. And, you know, again, people are trying to write to get a result.

Starting point is 00:33:53 So we're seeing quite a bit of variation. It's not the majority of the usage, but it's, you know, a few percent. All right. Well, that was Kieran Snyder of Textio and another episode of the A16Z podcast. Thank you, everyone.
