The Data Stack Show - 18: Data Science in Health Insurance with Jason Haupt of Bind
Episode Date: December 31, 2020

This week on The Data Stack Show, Kostas and Eric are joined by Jason Haupt, data science lead at Bind, a no-deductible health insurance company determined to give immediate answers and clear costs before point of care. Jason's unique background, a Ph.D. in particle physics and work at the Large Hadron Collider at CERN, has informed the way he approaches data at Bind.

Highlights from this week's episode include:

Jason's background in particle physics and his path to Bind (2:53)
A cloud-only approach to data and utilizing AWS (9:01)
Focusing on activities that help its members (12:08)
Dealing with 12,000 columns of data from an insurance claim form (17:13)
Rethinking the relationship between marketing and product teams (25:28)
Examining the data pipeline (29:30)
Privacy and security concerns with medical information (35:45)
How experience with the LHC impacted the way he thinks about data (40:06)
Transition from academic work to industry (46:20)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Welcome back to the Data Stack Show.
We hope your holiday season has been wonderful.
We have a very interesting guest today, Jason from a company called Bind.
Jason has a very interesting background, actually coming from the world of physics and specifically
academia.
And hopefully we get to talk to him about some of his work there. And Bind is a fascinating company.
They are doing a lot of interesting work in the healthcare space and bringing price transparency to health insurance, which is fascinating. So I'm extremely excited to meet Jason and learn about his sort of
data science practice at Bind. Kostas, he's such an interesting guy. What do you want to ask him about?
I think the most important aspect of our conversation today is that we are going to talk with a data scientist, and actually a pretty hardcore one, which is great because on our show so far we have mainly had people from data engineering, covering the typical data stack around BI and the standard analytics that a company implements as its first step into becoming data-driven. So today we are going to chat with someone who has a very strong background in data science.
So I'm pretty sure we will have the opportunity to discuss some more advanced, let's say, analytics use cases.
So this is super interesting for me. Another thing that's going to be, I think, a big part of our conversation is around data privacy and how you work with sensitive data in general.
I think Bind is a very good example of a company that has to work with very sensitive data.
And it would be super interesting to see from the data scientist's perspective what privacy means and how at the end you can deliver value
without compromising, let's say, the privacy and the security of the people that are trusting you
with their data. So yeah, I think it's going to be very interesting. Hopefully we will also learn
something around physics. We'll see. But yeah, let's do it. Great. Let's dive in.
We have a really exciting guest, Jason from Bind. Bind is doing some really
interesting things in the healthcare benefit space. So we'll hear about that. But first of all,
welcome to the show, Jason. Thanks for joining us. Yeah, thanks a lot, Eric. I'm very interested in
having a conversation with you guys today. We are too. Well, let's start out. Could you just give us a brief background on yourself
and then just a high-level overview of what Bind as a company is doing in the healthcare space?
Yeah, really good. So I got my PhD in particle physics and worked over at CERN for a long time. I used to always say a petabyte is a small data set, because it was easy to run 20,000 jobs overnight and process the data.
I left that to go into industry and ended up in healthcare; it just kind of happened, locally in the Minneapolis region.
I worked for a provider, a large local provider organization for a while.
So that means hospitals and clinics for a few years.
Built a team until that team got acquired by Health Catalyst, a startup that IPO'd last year out in Salt Lake City. And when that acquisition occurred, I moved to the insurance
world for UnitedHealthcare, did that for a few years, led a team of several hundred, working with a few petabytes of data internally at UnitedHealthcare, building a lot of assets on their benefit services. And all of a sudden, I got a call
one day from a startup in Minneapolis saying they had a new way of doing things. So I listened to the pitch, and yeah, they were right. I felt I wanted to be part of the solution in what Bind is, and actually maybe change some of the fundamental structures that I felt were not right about how health insurance operated.
So, Eric, you want me to get in a little bit and tell you about what Bind is?
That'd be great. I mean, you know, healthcare is such an interesting space, and it seems like Bind is doing some really
So, yeah, an overview of how y'all are trying to change things would be great.
Yeah. One way is to compare to what other people are doing, and the other is to compare to expectations, right? One of my favorite ways of describing what Bind does is taking the consumer approach. And I'll give you a couple of examples. Think about the way healthcare works versus the way you expect consumer interactions to work in your day-to-day life. For instance, let's say you take
your credit card, you decide to stop and get gas, you swipe it, and you drive away. You don't sit there and pray or hope, thinking, I hope that when it appears on my credit card bill in three to four weeks, it was only $50, right? There's no disconnect between
the price you pay and when the transaction occurs.
Similarly, you're not going to fly from where you guys are located to Vegas for the weekend and come back hoping that Delta, United, or whomever you fly with only charges you $500 for the ticket a month later, and not $2,000, right? Those are the type of price swings that we see in healthcare. Could be a
couple hundred, could be a couple thousand. So the fundamental
problem we have here is that the consumer marketplace doesn't exist. So what does my team do? My team has data on tens of millions of Americans in one data set, almost 200 million Americans in
another data set about their experiences, their claims and other experiences with healthcare.
We look for those patterns of how people experience healthcare, both cost efficiency and quality. And it's as simple as this,
we rank everybody, every provider in the space. And then what do we do? Because we're the insurance
company, I put a different price tag on everybody. And what we do then is we expose that price tag
to the members. And guess what? What they see is what they pay.
They can look it up in their app.
If they're not app savvy or website savvy, they can call us, right?
And like, you want an MRI right there?
A hundred bucks.
You want an MRI down there?
$2,000.
And what this does in the end,
it's open access.
So we don't restrict the access, right?
We have a broad nationwide network.
So we're not one of those companies that are out there like, oh, we'll just find the cheapest person. That's the only place
you can go. We think that's horrible. We just price everybody and consumers can make a decision
with their wallets, right? And we incentivize them appropriately. Let's say for back surgery,
this one might be $500 because they only charge $20,000 to the employer. Or this one might be $5,000 because they charge $200,000 to the employer. The employer saves $150,000. You as an individual just saved yourself $4,500. And guess
what? It works. Simple as that. We found these categories where, if you introduce price variations, in some of our products like Bind On Demand there's an activation component where you can activate additional insurance coverage on demand, or Bind Basic, where there are no activations required. Simple as that,
we keep finding these categories where, if you just show people the prices, enough people are going to make decisions and save tens of percent, sometimes 20 percent or more, of the overall health insurance costs for some of our employer groups.
This actually works. And we've been scaling. I can tell you about some of our clients, but just to give you an idea: we are a multiple-X growth company, and 1/1, January 1, is essentially what that growth usually looks like for us.
Are you going to start using the product?
Am I going to start using it? Well, we're not on the individual market yet. So your employer currently has to offer it; we operate technically as a TPA in most of our business.
We're fully insured in, let's say, the state of Florida.
And we will eventually have an individual product that you can get on the marketplaces in various states.
But right now, your employer has to have Bind as an insurance option for you to be able to select it.
For some, Bind is the only option.
For others, Bind is an option amongst a handful of others.
Right, right.
Yeah, actually, my comment was more about your pitching,
which I think you might...
Yeah, that was my sales pitch.
Yeah, you did an amazing job pitching the product
and the business.
So yeah, I think both Eric and I are sold on it already.
Cool.
So Jason, do you want to get into a bit more of the technical detail on how this works? You mentioned the size of the data sets that you're working with, and I think it's pretty clear that a big part of the product itself is based on the analysis you are doing on data. So from a technical perspective, what does this look like? What kind of technologies are you using? And then we can also discuss a little bit more about methodologies and what kind of analysis you are doing on the data.
Yeah. And I mean,
one of the things, compared to working previously at a Fortune 10 company, a very large national company with a lot of data assets, is that you can see what slows them down, right? So Bind has taken a cloud-only approach to how we deal with our data, which allows us to take AWS services as we need them, scale them,
and use them in a way
that just was so hard to do
when you have these merger
and acquisition on-prem solutions
that are just really slow to catch up.
I will say one thing I'll note about the big cos is that they're getting better with their modernization strategies. They're getting to a point where they're more and more cloud-based, and more and more able to scale some of their low-level functions, as well as some of their medium- and high-level functions. That's great for them, but they're on a multi-year journey to have basically modern services that are cloud-based and can actually scale, rather than, oh, that'll take two to three years just to optimize something 10%. But we get to start off with: hey, it's cloud-based. I can click a button in AWS and double or triple my database in minutes, right? Depending on how the
Redshift shards work, or I can go with more of an online solution, right? Most Bind apps take a Java-based backend microservice approach, sitting in a very heavily secured AWS account. My team is more of a Python-based data science team, and that tooling has only really come alive in the last year, year and a half. My team's already over three years old. So a lot of what we had was custom built, and we've gone through one level of modernization there as well, where we're using SageMaker, Step Functions, Lambdas, a bunch of those AWS technologies that are allowing us to build model inference pipelines or
data transformation pipelines. And a lot of our data transformation right now is done in PySpark,
just to give you an example of the type of things that we're doing.
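To make that concrete, here is a minimal sketch of the kind of PySpark transformation job described above. The S3 paths and column names are hypothetical illustrations, not Bind's actual schema:

```python
# A minimal sketch of a PySpark claim-transformation job; paths and column
# names are hypothetical illustrations, not Bind's actual schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-transform").getOrCreate()

# Read claim records that an upstream service has already unpacked to JSON.
claims = spark.read.json("s3://example-bucket/claims/raw/")

# Keep only the fields the models care about and derive a simple feature.
features = (
    claims
    .select("claim_id", "member_id", "billed_amount", "service_date")
    .withColumn("high_cost_flag", (F.col("billed_amount") > 10000).cast("int"))
)

# Land an analytics-ready table for the model inference pipelines downstream.
features.write.mode("overwrite").parquet("s3://example-bucket/claims/features/")
```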
Oh, that's great. So as a data team, let's abstract your work a little bit and talk in terms of inputs and outputs. What's the input that your team gets? My assumption is that it's the raw data you work with, and I'd love to hear a little bit more about what kind of data you're working with and its sources. And what's the output? Is it a model? How does this work, how do you update it, and how do you actually turn the result of your work into a product at the end, something that the end customer can use? I think that's super interesting to hear about, because my experience so far, to be honest, has been more with people who are doing ad hoc analysis or working in the BI space. The kind of work that you are doing, and how you can turn it into a product, is something I find very fascinating, and still something that the industry is trying to figure out, right? There are still tools being built for those needs.
Yeah, that's a really good question.
And this was one of my bigger concerns when I had the large team at the large insurance company. I was worried about taking the flavor of the day in terms of the tool and getting vendor lock, and then a year or a year and a half later a different tool would be better, just a better product offering at that time. I ran into that a few times, right? Having the larger team and running into that problem, it was kind of annoying. So sometimes I just hired a bunch of Java developers and developed a tool that met my needs, rather than trying to find the one vendor I was willing to accept a little bit of lock-in with, trusting that they were going to move in the same direction as me.
So sometimes that's been very successful.
Sometimes it's not.
So what do we actually do here at Bind?
My inputs: medical claims. So we have the plan that we're operating, the hundred-plus thousand members, and then that more than doubles come 1/1, January 1, when we're talking about hundreds of thousands of members on our plan in just a little over a week's time. What we do is we have their medical claims coming in. Then we have their other touchpoints: their Rx claims and other sorts of interactions, right? Eligibility checks, where one of the providers is like, hmm, does this person have
insurance? Person walks into a doctor's office, they'll send in some sort of query that says,
hey, does this person have it? So I think about inputs as signals. Each of these things is a
signal. A claim is a signal. Someone getting a prescription is a signal. A doctor checking things is a signal from an operations perspective. Plus a good percentage of our members log in with
our app, right? They sign up, they log in, they begin to search. All of those become more signals
that I can use about member behavior that I can link into outcomes. Plus then I can go to the
market and buy some of these historical data sets, or partner with other organizations, to get tens of millions, in some cases almost hundreds of millions, of other historical records. Those are other signals that I can use.
My team takes that historical data.
We look for these patterns, and then we implement product based on that pattern. So ranking all providers based on a myriad of algorithmic things, that's something my team does. That gets loaded into the
product and what you see as a price tag for every provider, what you see as a price tag for every
service, right? That's something we deliver. And also in the other sense, we take that historical
data to build models. We can take these patterns and predict what's likely to happen next.
And then we put this into our MarTech stack.
If you're not familiar, that's the marketing technology stack that allows us to fuel our
member engagements or our internal marketing strategy.
So based on something, the next time you log into the app it might say, hey, it looks like you're heading down a surgery path. Are you interested in a free second opinion service? Right.
That's just an example of one of our internal marketing campaigns that are fueled by analytics and services. And I can tell you, I've had research jobs in the past, but I heavily focus on things that are driving value to my members.
In fact, if my team's like, oh, we want to improve this algorithm, I'm going to say, well, let's look at the roadmap.
How is this actually going to help our members?
What's the likelihood that it's going to help our members?
And we try to focus our activities on things that are going to help them.
That's amazing.
And I think Eric will have a couple of questions to ask about the marketing tools and how you work with them. But before we
go there, my last question, data related for me, and then we can return to that, is about,
you said something very, very interesting. You talked about signals, like all these
data points that are coming, they're like actually signals that you combine together.
And the end result is a price point, a price value. What I find extremely fascinating, and it might just be me, is how you start from something that has so many dimensions, all these signals we are talking about. Because I think a bit of a problem with the term signal is that people might tend to think it's something very one-dimensional, but usually these data points are quite complex.
And through all the models that you build, you manage to collapse all this into a numerical value that someone can use. For me, this whole process, this kind of magic that data science and all these algorithms do, is amazing.
But can you share a little bit more about the structure of the signals that you are
working with, how they look like?
You talked about claims, right?
I think most people think of a claim as a document that they have to fill out, right? So what does a claim look like from the perspective of a data scientist? What's the complexity, and what kind of preparation do you have to do on this data in order to turn it into signals that you can then apply all these algorithms to and turn into value at the end?
Yeah, that's a really good question. I'll give you an example of the claim, and I'll give you an example as well based on our member-facing search experience. But let's unpack the claim example first, right?
The claim comes in what's called an X12 EDI format, an electronic data interchange format that's been around for quite some time. This format is very compact and can be unpacked; a couple of times in the past, I've had people write Java parsers to unpack it. When you unpack this into a data warehouse for a professional or an institutional claim, you usually end up, and I'm not joking here, with between 10,000 and 11,000 columns, right? It's a very sparse thing. Not all of those columns are populated; some things are given, but some things you don't know. So the structure of a medical claim, which you think of as this paper form, gets transferred into roughly 12,000 columns of sparsity.
Wow. That's wild for a single claim.
There are so many loops allowed, right? You can have 25 diagnostic codes for every procedure code, and every procedure code can have a different pointer assigned to a description as to why. So if you're in a JSON or XML mindset, that creates these nested loops, which is why it's so compactified.
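A toy illustration of how those nested loops explode into a wide, sparse row when fully denormalized; the field names and loop limits here are simplified stand-ins, not the real X12 837 layout:

```python
# Toy illustration of nested claim loops exploding into a sparse wide row.
# Field names and loop limits are simplified stand-ins, not the X12 837 layout.
claim = {
    "claim_id": "C123",
    "diagnoses": ["E11.9", "I10"],
    "service_lines": [
        {"procedure": "99213", "diagnosis_pointers": [1, 2], "charge": 150.0},
        {"procedure": "80053", "diagnosis_pointers": [1], "charge": 45.0},
    ],
}

def flatten(claim, max_lines=50, max_diags=25):
    """Denormalize the nested loops into one wide, mostly-empty row."""
    row = {"claim_id": claim["claim_id"]}
    diags = claim["diagnoses"]
    for d in range(max_diags):
        row[f"diag_{d + 1}"] = diags[d] if d < len(diags) else None
    for i in range(max_lines):
        lines = claim["service_lines"]
        line = lines[i] if i < len(lines) else {}
        row[f"line{i + 1}_procedure"] = line.get("procedure")
        row[f"line{i + 1}_charge"] = line.get("charge")
        ptrs = line.get("diagnosis_pointers", [])
        for d in range(max_diags):
            row[f"line{i + 1}_diag_ptr_{d + 1}"] = ptrs[d] if d < len(ptrs) else None
    return row

row = flatten(claim)
print(len(row))                                   # well over a thousand columns
print(sum(v is not None for v in row.values()))   # only a handful populated
```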
But if you want to unpack it and denormalize it as much as you can, which I've done at the big co, it becomes big. What we found is that you can create levels of variables from it: people spend time reducing that very unpacked version down to several hundred key variables, and then to several dozen key variables. So I've gone through that activity. It was kind of interesting: when I left for the startup, I had just developed a big product, together with a couple of teams adjacent to me and my own team, a real-time, micro-batch online claims processing system that, as a claim came in the door, issued fraud predictions within minutes. I think it ran every 10 minutes. It was a great architecture. Then I read Uber's Michelangelo architecture. If you haven't read it yet, they've published a couple of articles, in 2018 and again in 2019. And I thought, ah, online-offline, that's very much like the architecture we had built: taking things into a database, unpacking them, creating an online version by taking those top 100 or 200 features, putting them into a feature store, and then building all of your models on that feature store. So yeah. So when I say this is
a signal, it is not one dimensional. It could be 10K to 12K dimensional, but when I'm actually
running my models, I've already limited it down to those couple hundred features or so that are key, especially for things that are run online.
Offline, I can keep a few more, but to be honest, it's so sparse that going beyond a few hundred isn't worth it. So that's an example. And what's interesting, my team even did that here at Bind.
We unpacked that format. We picked out the top 20 or 40 features or variables that mattered, built our models
specifically on those features, and therefore deployed our models specifically on those
features to get, you know, depending on what we were trying to predict, varying degrees
of success, some of which now are impacting our members positively.
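A hedged sketch of that reduction step, scoring every unpacked column and keeping only the handful that carry signal; the file names, the label column, and the use of random-forest importances are illustrative assumptions, not Bind's actual method:

```python
# A sketch of reducing a very wide, sparse claim table to the "top 20 or 40"
# features. File names, the label column, and the use of random-forest
# importances are illustrative assumptions, not Bind's actual method.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

wide = pd.read_parquet("claims_unpacked.parquet")   # hypothetical wide frame
X = wide.drop(columns=["claim_id", "label"]).fillna(0)
y = wide["label"]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importance = pd.Series(forest.feature_importances_, index=X.columns)
top_features = importance.nlargest(40).index.tolist()

# Persist the reduced feature set; models train and serve on this table only.
wide[["claim_id", *top_features]].to_parquet("claims_features.parquet")
```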
And if you're interested, I can tell you about the other space, which is search, right? We have a type-ahead as you're searching. Every time you click and add another
variable, we have metadata about that search. So you can think about it like this: you type in "diabetes," and by the time you get to the S, I have a row in my database for every letter you've typed. And I know what search results existed. I know what a search attempt looks like. I know if you went back and went forward, and what your final search was. So even though some people would say, oh, it's a
signal, what do they search for? Well, I've got metadata stored in my logs for every keystroke you made, which allows me to make sure my search is working effectively, right? People are finding things quickly. They're not misspelling things; it's suggesting "diabetes" for them quickly, right? Those are the type of experiences that we enable
by just looking at all the data.
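To picture the row-per-keystroke logging he describes, here is a minimal sketch; the event fields and the JSONL sink are assumptions, not Bind's actual logging schema:

```python
# A minimal sketch of per-keystroke search logging; the event fields and the
# JSONL sink are assumptions, not Bind's actual logging schema.
import json
import time
import uuid

def log_keystroke(session_id, query_so_far, results, sink):
    """Write one event row for every character typed into the type-ahead."""
    event = {
        "event_id": str(uuid.uuid4()),
        "session_id": session_id,
        "ts_ms": int(time.time() * 1000),
        "query": query_so_far,          # "d", "di", "dia", ... "diabetes"
        "result_count": len(results),
        "top_results": results[:5],
    }
    sink.write(json.dumps(event) + "\n")

# Usage: every keypress in the search box fires one event.
with open("search_events.jsonl", "a") as sink:
    for prefix in ("d", "di", "dia", "diab"):
        log_keystroke("sess-42", prefix, ["diabetes", "diabetic retinopathy"], sink)
```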
Oh, that's amazing. One last technical question before I let Eric ask his questions. Sorry, I'm really getting excited about this stuff.
So you talked about unpacking the format, and it's very sparse, as you said. And you also mentioned Redshift. Can you give us a little more of a technical description of how this unpacking happens? From the JSON or XML document, whatever it is, do you end up with 10,000 columns before you start creating the features? The reason I'm asking is that I know about the limitations of Redshift; for example, you can't have a table with more than 1,600 columns. So I'm very interested to see how you manage this dimensional explosion given the limitations of a data storage system like Redshift.
Yeah. So in my previous role, we unpacked it all. We had everything. We were using HBase at the big co, because of that ability to just hold the entire object, and then we would create HBase tables that were reduced feature sets, right? So that worked fine. But now it doesn't make sense for us to unpack the entire thing, because we already know not every field is valuable, or we can do that at a future state. So we define a schema, let's say on top of a JSON format, and we unpack that schema that we've defined, right? So it's only those variables that we've determined to unpack out of it. And if you want to think about
this from a technical perspective, we are definitely an orchestration organization,
right? Kafka was central to the way we set things up. So we have these engines that go in,
the schema gets unpacked once, gets put into a Kafka topic that anybody that needs to use that
then can use it, right? So there's something
that listens to that topic and then instantiates that unpacking into an analytics-ready database
that I just talked about, right? Other consumers subscribe to that same outcome to actually begin to process the claim, to adjudicate it and determine what the actual price should be, how much the provider is owed, how much the member may or may not owe, et cetera, and how much the employer needs to pay. So we have many microservices that allow these transfers and these processes to occur.
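A sketch of that unpack-once, consume-many pattern using kafka-python; the topic name, broker address, and payload are hypothetical:

```python
# Sketch of the "unpack once, publish to a topic, many consumers" pattern,
# using kafka-python. Topic name, broker, and payload are hypothetical.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The unpacking engine applies the agreed schema once and publishes the result.
unpacked_claim = {"claim_id": "C123", "member_id": "M9", "billed_amount": 150.0}
producer.send("claims.unpacked", unpacked_claim)
producer.flush()

# Any downstream service (analytics loader, adjudication, ...) just listens.
consumer = KafkaConsumer(
    "claims.unpacked",
    bootstrap_servers="broker:9092",
    group_id="analytics-loader",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    claim = message.value  # e.g., load into the analytics-ready database here
```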
That's great. All right, Eric, he's all yours. I know that you have many questions to ask,
but you know me, like I get too excited sometimes.
So go ahead.
I mean, it really is fascinating.
I just love all the unique things that we learn on the show,
like a medical claim producing 11,000 columns.
It's wild.
Jason, I'm interested in the sort of customer experience
aspect of what you discussed as far as the outputs of the data. And I have two questions there. One
is about the interaction between the data science team and the marketing team. And the second is
about just the technical piping that sort of connects your work with the MarTech stack, as you said.
But let's start with the relationship between the data science team and the marketing team or
other people driving customer experiences. And specifically, I'm thinking about even the example
that you gave around providing a customer who opens the app with a recommendation on a free second
opinion. Where does an effort like that originate? Is that coming from marketing
or coming from someone in product? And then depending on where it originates, you know,
how do you work with those teams to sort of produce the output that they need from your work?
Yeah. So from where in an organization is this owned? Let's say that that's been something that
has changed because we're still trying to find the optimal structure. So when this first came out,
there was a product owner of, let's say, if you think about this as a store, you had inventory, you have these SKUs, you have these things that people can
purchase, right?
Things that have price tags.
So providers doing this thing somewhere is a SKU.
So you'd inventory.
If you also thought about it from a retail store concept, then you have merchandising,
right?
How do you arrange the things in the store such that people can see things, right?
You put things at eye level and around the end caps, things you want to highlight to people.
So we had in our product division, we still have an inventory function.
We had a merchandising function and the person who owned merchandising was in charge of
basically figuring out how things get stocked, right?
Where they were from a visual perspective, think about in the app,
you know, how do we highlight things? We've since changed that function. It served us very well, but now we have a member experience function within the business, within our operational business. They are more in charge of that, call it the arrangement of how things sit within the store, right? I want
to stick with that retail construct. Our marketing team's role is making sure that the technologies are there and that our brand makes sense. If you think about it, they own a lot of aspects of it, right? How to develop the front end of that: for instance, the videos that people are going to see, the images that end up on the machines of the potential people that are going to select us, right?
right? So our marketing team is usually focused on selling Bind out of the front door and then
selling Bind to the employees within these organizations, or at least giving them information so they can make the choice for Bind. We love to be in choice environments. In many situations, we don't want to
be the only option. We want people to choose us over their high deductible health plan. We want
to say, you know, one thing I didn't tell you, there's no deductible with Bind, right? There's
no co-insurance. You don't have to hit some number before Bind kicks
in. If this is a hundred dollar MRI, that's all you're going to pay. The hundred dollar MRI.
I've got that on the site, which is awesome.
Yeah, you have Kostas and me excited. We're going to go back to our employers and ask them to check it out.
So we just want to give people that information. For many people: look at the information,
go to our website, type in the things that you care about.
Is it diabetes?
Is it this drug?
Is this better for you?
Right?
So our marketing team focuses heavily on the upfront experience of helping to sell, or at least present, Bind to those HR managers. And then, at the employee level, making sure that during these annual enrollment events, when people are given the option to select Bind or not, they have the information they may need to make a good decision on their own behalf, right? So it's a
great relationship. We've hired some brilliant people that I really enjoy working with. So
I'm really happy with the way we're structured.
Very cool. And jumping over to the technical side, could you explain, and I realize, you know,
this may not, you know, be under your purview from a technical standpoint, or maybe it is, but you talked about sort of, you know, pulling data in and then processing it. How does it go from
the infrastructure that you and your team leverage through to the end user experience,
right? So let's say they open the app and they get a notification. What are the pipelines that
actually drive that experience and how does the data get from you, you know, sort of to the places
where it's going to be activated for the customer?
Yeah, that's a really good question.
And I would say the best way for me to answer that is to go back to the architecture: being all in the AWS space allows us to have some of these integrations be far more streamlined than at some of the on-prem companies, right, that maybe haven't thought about this interactivity or these connections in their original design or use cases.
So, back to the orchestration engine: I can just publish my model output to Kafka on a model topic, right? And then my MarTech stack can listen to that, right? As long as I have some sort of data contract with the marketing team or with the product owner of that stack, they know what that thing means and what the structure of the thing I published is. And sometimes, when
I'm early in and not ready for full production, I might publish it to a database and they'll query
that database, right? That fuels into, let's say, a Segment or whatever tools you guys are familiar with, which is now the MarTech stack that understands how multi-mode interaction occurs, be it phoning people, emailing folks, fax, or in-app notifications. So basically, if you're
talking about just that transaction, that stack can just listen to Kafka and fuel its data stores.
That stack can just query a database and fuel its data stores through configuration. And then
that team that manages the marketing and merchandising function can then configure those campaigns, right, within those tools, based on the data and information that was loaded in. And if you want me to get more technical, I can, but that's the way I like to describe it.
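The data contract he mentions could be as simple as an agreed event shape that the model side validates before publishing; this sketch is illustrative, with hypothetical field, topic, and model names:

```python
# Illustrative sketch of the data contract between the data science team and
# the MarTech stack: agree on an event shape, validate it, then publish.
# Field names, the topic, and the model name are hypothetical.
import json
from kafka import KafkaProducer

CONTRACT = {"member_id": str, "model": str, "score": float, "scored_at": str}

def validate(event: dict) -> dict:
    """Reject any event that violates the agreed contract."""
    for field, expected_type in CONTRACT.items():
        if not isinstance(event.get(field), expected_type):
            raise ValueError(f"contract violation on field {field!r}")
    return event

producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = validate({
    "member_id": "M9",
    "model": "second_opinion_campaign_v1",   # hypothetical model name
    "score": 0.87,
    "scored_at": "2020-12-01T12:00:00Z",
})
producer.send("models.member_scores", event)  # the MarTech stack subscribes here
producer.flush()
```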
No, I mean, that is great. We have the benefit of seeing a lot of these different setups, and the way that you have approached it is very modern and very streamlined. One thing I'm interested in is the development of the MarTech stack. I mean,
it makes absolute sense that you would have a pipeline that the marketing stack can listen to
and then sort of just receive the information they need and then, you know,
route it and do the things they need to do with it.
Were you involved in sort of the architecture of the marketing tech stack as well?
And was that system, from a claim coming in, to going through your pipelines and data science, to publishing it in a way that the marketing tech stack can listen to, something you were involved in? Or had they architected their system separately and you built your Kafka pipeline to suit?
Yeah. So in this instance,
I was aware, but not involved in the choice of technology for the MarTech stack. I was aware
they were doing it. I understood which vendors they were considering, but I was not a key stakeholder in that process. It came down to the data contract, if you want to think about it from that concept: how am I going to get you data? Great. We're a Kafka organization. I can read Kafka. Just put it there, right? From an orchestration standpoint.
So we had this going-in position such that we already had a method of communication, and they could go off with whatever use cases they wanted this to work for, right? To manage marketing campaigns, you really want the ability to manage the app notifications and the email notifications with modern tech, right? You're just not going to build your own Java application for that.
It exists in the marketplace.
So we just had to make sure,
hey, here's information on how to load it in there
with kind of advanced analytic techniques.
So it came down to that data contract.
I feel we did well with that.
Yeah, and it's interesting. We actually wrote a post recently about the history of data
engineering. And one of the points we brought up was that IT and marketing, there's been a schism
between the two groups within a lot of organizations because IT was seen as sort of a limiter,
right?
Like, oh, we don't want to go to IT because it's going to take longer and they're not
going to give us what we need.
And, you know, they're going to say no.
And so it's just really exciting for me, especially coming from the marketing side, to hear about
a partnership that's actually, you know, seems to really be driving better value and better
experiences for the customer.
And I think that's where things are going to go in the future, you know, as companies
really figure out that that creates a competitive advantage.
So really exciting to hear about that, hear about that structure of Bind.
To hit on that a little bit more: when I think about where this technology is going, we've still got a lot of opportunity to enable it even more, right? That's the clincher there. When I think about organizations that are stumbling over themselves to get things in there, I don't think that's our biggest problem,
to be honest. Our biggest problem is making sure that consumers can understand our information
in a way that's valuable to them. It's usually not a technology problem.
That's not our biggest thing.
Understanding the user experience and optimizing to that is, I think,
where you become a consumer-oriented organization.
Like I said, as long as you are upfront
with the technology,
then we can actually focus on what really matters,
creating a consumer experience
that actually works for people.
Sure. The technology gets out of the way and you can focus on the user,
which is the whole point. One question, speaking of IT and the issues that marketing has with it: we can't talk about healthcare data without talking about security and privacy. And insurance and healthcare
are extremely regulated in terms of data privacy and security. So how does that impact your work
as a data scientist? I mean, you're obviously sort of dealing directly with the sensitive data.
I would just love to know the types of things that you deal with on the data science team
related to security and privacy.
Yeah. And the interesting thing about Bind is that it's, I think, the most secure PHI organization I've ever been part of, just to throw that out there compared to the bigger companies I've worked for. When I say that, I mean in the day-to-day operation, right? We take a very strong dev-versus-prod mindset, right? Most people at
Bind have no access to prod. In fact, very few developers do, right? So they must develop their
code on test data, dummy data, implement it, test it, put it into the pipeline. Even the data
scientists need to do this, right? To a point that when we want to deploy code into production,
that's when we get to see it.
And certain variables are covered. So only a few people have access to the PHI itself, right?
It makes it harder to develop when you need to live in the dev/stage/prod or whatever paradigm
and you need to develop that way. Doing it with data science is a little weird,
but we've figured out ways to make it better. But it takes longer to live in that paradigm.
So when I say it's more secure, I just meant it was easier to get full data access at some of the other companies.
But it was really hard for them to get the data off the computers.
Let me put it that way, right?
They'd say, yep, this person has access to 100 million Americans' data, but it's impossible for the data to leave, right? They've
got the machine so locked down. The possibility for breach is very, very, very small, but it was
much easier to be like, yep, this person gets full access because it's part of their job. They need
it. Right. Sure. And does that impact the way that you train models on the data science side?
Like, do you, you know, thinking through test data and your development
flow, how does that look on the team? Yeah, for the most part, things that are PII or PHI,
most of those identifiers are unimportant from a modeling perspective, right? I don't need to
know someone's name. I don't even necessarily need their address or stuff like that, right? Although, because we do live in the age of checking for equity, which we do, I sometimes might take their zip code and link it in with socioeconomic data I might have about the region they live in, to make sure that we have an equitable product, right? And equitable outcomes in terms of how people
experience Bind, right? So those are things that are important to me. But age is an important
variable. But most of the PII can get blinded from a modeling perspective, right? Which is
really nice. The only time I sometimes need to put it back in is if I'm providing output to an operations team that's now going to go do something. So if we have a model that's predicting people that are going
to be high cost in some sort of condition category, that data needs to be plugged into
a clinical ops team that might call them or might try to help that person make good decisions on
their strategy, right? This isn't all just app-based. We operate a product and we sometimes will just call the folks and make sure they have all the information they need
about their benefits to make good decisions. So that happens. So we might have a couple of folks
with kind of like that front end, but we are heavily regulated, heavily locked down. And
we have a very good DevSecOps, a development ops security team
that really makes sure our data is protected.
Very cool.
Yeah, that is very interesting on the modeling side
in terms of the data you actually need to accomplish
what you need to accomplish.
Yeah.
I mean, there's age, but you can group age. You can use age as of January 1, because 66.5 versus 66, trust me, is not a sensitive difference for almost every model I've ever seen in the healthcare space. So just to give you an example, we find that we're able to strip out the PHI from a modeling perspective pretty readily.
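A small sketch of that de-identification idea, dropping direct identifiers and coarsening age while keeping zip for the socioeconomic join; the column names are hypothetical:

```python
# Sketch of de-identifying member data for modeling: drop direct identifiers,
# coarsen age to whole years as of a fixed date, keep zip only for joining
# socioeconomic context. Column names are hypothetical.
import pandas as pd

members = pd.DataFrame({
    "name": ["A. Smith"],
    "address": ["1 Main St"],
    "zip": ["55401"],
    "birth_date": pd.to_datetime(["1954-06-15"]),
    "claim_count": [7],
})

def deidentify(df: pd.DataFrame, as_of: str = "2020-01-01") -> pd.DataFrame:
    out = df.drop(columns=["name", "address"])                 # direct PII gone
    out["age"] = (pd.Timestamp(as_of) - out["birth_date"]).dt.days // 365
    return out.drop(columns=["birth_date"])                    # exact DOB gone

print(deidentify(members))  # zip stays for the equity / socioeconomic join
```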
So Jason, I have a question that's still related to the dimensionality of the data, but I really have to ask you this, because you are probably one of the few people who can answer it. What is more complex in terms of dimensionality: a medical claim or a measurement at the LHC?
From a dimensionality standpoint, I would say measurements at the LHC are probably far more complex. I worked on something called the electromagnetic calorimeter when I was there. It had, if I remember, 60 to 70 thousand, I think it was 64,000, crystals. So just one part, one subdetector, had 64,000 crystals, and the energy measurements were sampled every 25 nanoseconds. So you had to reconstruct the energy profile. Just that, let's say 10 or 15 batches of 60,000, is already telling you you're dealing with more than a million measurements. And then for every crystal,
you reconstruct the energy profile.
Then you run a bunch of higher-level reconstruction.
Was it an electron?
Was it a proton?
Where was it going?
All these higher level things.
And that's just one thing. There's also the hadronic calorimeter, and there's a tracking detector. So in terms of data elements for every single interaction, you're talking about billions. These devices are pretty much the most highly instrumented spaces that exist.
And I only gave you the surface of how instrumented that was. I think when it was
designed in '93, it was designed with more fiber optics than existed or had been laid in the world at the time. I mean, the world laid a lot more fiber during its actual development, but that just gives you an idea of how instrumented it was.
Yeah, I think every time an SRE complains about all the different metrics they have to measure every day, they should have a conversation with you, so they can feel better about the data points they have to keep track of every day.
That's great. Actually, I asked that question because you have a very interesting background, coming to data science from doing your PhD in physics at the LHC. We're always talking about big data and the scale of the data problems the industry is facing every day. You are a person who has probably been involved in one of the most complex data projects humanity has undertaken so far. So can you share a little bit about that? How did it feel coming from the CERN experiments into industry, and what differences did you see? And, I think quite importantly, what lessons did you learn there that you still apply today and find very useful?
Yeah.
So the most interesting thing for me about the scalability question is that most of the time, when I find somebody, be it in industry or not, saying, oh boy, this is just impossible to do, this doesn't scale, I look at it and I'm like, this isn't hard at all. Sometimes they're talking about going from one gig to 10 gigs, and it's because they've chosen a tool that does everything in RAM. And there's a solution for that. None of these are showstoppers. Similar to before: I had a data set that was 900 terabytes, and one that was 1.3 petabytes. I'm talking about 2010. This wasn't that hard. We wrote C++ programs over many years that would put this on the worldwide grid, and the grid might run 10,000 jobs in Torino and another 10,000 jobs in Chicago and kick them back, formatted and unpacked. In eight hours, I'd have 40,000 jobs done. If I made a mistake, I'd run 40,000 more jobs.
So it's kind of funny: almost every time someone has shown me a scale problem in industry, there's already a solution that humans have figured out for other purposes. It's been really kind of funny that when I look at it, they don't think outside of the box. I'm in R, this machine's only got 32 gigs of RAM, I need more than 32 gigs, I just can't do it. And I'm like, well, we could just put this on a bigger machine, at least to solve it for today, or we could use something that doesn't require in-RAM analytics. So I haven't yet come across a problem that I hadn't already seen a solution for. I actually thought MapReduce was backwards when I left, because the way we had done things at CERN is you unpack the data, you do all your analytics at once, and then you repack it. And it was C++, so you could add templated classes. So when MapReduce 2 came out, I was much happier. When Spark came out, I was a lot happier. But I still thought they hadn't matched what the physics community had already done on these larger data sets, though they've definitely made it better since then. And just to kind of hit one more thing on
the research side, I find people that have gone through this rigorous level of research, who have in effect been data scientists for the large research projects, do very well, right? Starting in January, I'll have my third PhD physicist on the team, but I have plenty of other folks, a master's in bio, a master's in behavioral health, who add a lot of statistical rigor to the types of things we do. But in other cases, I've had people from the research world fail, because they just keep going down rabbit holes. They'll spend two weeks on hyperparameter tuning. Knowing when the business is going to get value has been a very tough thing for some folks to learn; basically, perfection is the enemy of good enough.
I love that phrase. And it's amazing to hear that from a person coming from academia, because I also worked in academia for quite a long time, so I can relate to what you're describing. It's great that you understand this distinction. So, going from academia to industry, on a personal and a professional level, how did you choose to do that? And what differences do you see there?
I know there are many people who are going after PhDs and might be thinking about this, so I think it would be great to hear from someone who has done it. Are there things that you regret, or things that turned out much better than expected? What's the overall experience you can share with us?
Yeah, so I would say it's sometimes very hard to leave academia and go into industry.
I know there are a lot of programs out there; there are a few fellowship programs now, right, that try to take people with MDs and PhDs and give them data science or data engineering skills. Those can produce folks who now have some understanding of business value, which is one good thing that can come out of those programs. Some people go get a master's in business analytics from the business schools, and those also produce people who understand business value without needing to be taught it from the get-go. I sort of got lucky with the role I had when I left for industry. It was super busy. I got my PhD, and the thought of doing a postdoc in that field, where the mean postdoc period is over 10 years, moving my family every two years to various institutions around the world, was not enticing. I wanted to dig in, develop my family and my career, and be compensated okay for that. And so I just kind of fell into healthcare.
An opportunity happened, got some experience, and then I dug in, right? I showed up to that
first job with a tie every day, right? I did that in that organization, which mattered.
So when they needed a manager, as people were adding value and one manager moved up to director,
it was easy for them to select me. I'd already been providing that value to the organization.
So for me, it was that focus on the business value that allowed me to get my feet in the door.
It allowed me to continue to move to do the things that I wanted to do.
So I just kept asking not what I find interesting, but what I find interesting that matters, right? One of the first things I did there was build a model that predicted whether people were going to come back to the hospital. Readmission models are still very popular; they were popular in 2011 when I built one.
And then I could meet with the providers discharging people from the hospital. We put the model on a dashboard that refreshed every hour, and they could see these colors, based on my models, saying, oh, this person's got a 20% chance of coming back within 30 days. I worked with them to develop interventions, and they worked to mitigate that. That was cool.
That added value and made me feel good: saving people's lives by providing information to doctors in hospitals, on screens that social workers could pay attention to.
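A toy version of that 2011-era readmission model: logistic regression over a few encounter features, surfacing a 30-day risk score for a discharge dashboard. The features and data here are invented for illustration, not the model he actually built:

```python
# Toy version of a 30-day readmission risk model; features and data are
# invented for illustration, not the model he actually built.
import pandas as pd
from sklearn.linear_model import LogisticRegression

encounters = pd.DataFrame({
    "length_of_stay":   [2, 9, 4, 1, 12, 3],
    "prior_admissions": [0, 3, 1, 0, 4, 1],
    "age":              [54, 78, 66, 41, 82, 59],
    "readmitted_30d":   [0, 1, 0, 0, 1, 0],
})

X = encounters.drop(columns=["readmitted_30d"])
y = encounters["readmitted_30d"]
model = LogisticRegression().fit(X, y)

# At discharge, a dashboard could color-code this probability for care teams.
new_patient = pd.DataFrame(
    {"length_of_stay": [7], "prior_admissions": [2], "age": [70]}
)
print(f"30-day readmission risk: {model.predict_proba(new_patient)[0, 1]:.0%}")
```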
Yeah, makes sense.
I mean, from my experience and my perspective, people usually go after an academic career because there is a passion behind it, right? Going and doing a PhD in particle physics, you need to be passionate about something to do that. The same holds not only in physics but in other disciplines too.
So I think what is quite important, and it's the industry's responsibility to figure out how to do it, is that when you try to attract these people out of academia and into industry, beyond the monetary benefits, you also have to see how these people can get passionate about the problems they are going to be solving.
And that's what I get from what you said about the first problem you solved in healthcare and how it drove your passion for working with data. We also have to remember, and okay, we have a pretty technical audience out there, but most people don't realize that doing physics today is mainly a data science problem. I mean, we saw the first image of a black hole. That happened mainly because people were crunching a lot of data; a big part of that work was finding the right algorithms and the right processes to take the raw signal and turn it into something we can consume and understand as humans. And that's exactly what the LHC is doing.
The finance sector is doing a pretty good job attracting these people. But there's still a lot of talent out there, and if the industry figures out the right ways to do it, there's going to be a lot of value driven from there, without wanting to steal everyone away from academia, of course.
But yeah, that's super, super interesting.
I think we are at the end of our recording, Jason.
I really, really enjoyed it.
I think we can keep chatting for hours.
We have many topics that we didn't even touch.
And I'm really looking forward to having another chat with you in the future.
Yeah, I really appreciate it, you guys. For me, it was really fun to talk about these things. So Eric and Kostas, it was really cool.
Yeah, we really appreciate having you on the show. It's a treat for us any time we get to talk about Kafka and particle physics in the same conversation; not many people have that privilege. So thank you for joining us. We're really excited about the work that you're doing at Bind.
And we'll reach out again in maybe six months or so to see how things are going.
I'd be looking forward to it. Thanks a lot, guys.
Well, that was a fascinating conversation. I think one of the most fascinating things I learned was that when you unpack something like a medical claim, doing something valuable with data that comes in a certain format or a certain size just creates all sorts of interesting
complications. And of course, hearing about the scale of data that Jason's worked with was
fascinating to me. But Kostas, what stuck out to you and what did you learn today?
That's a great point, Eric. I think people have been spoiled by interacting with digital products, and we don't really understand the complexity behind the technology itself. We are also, let's say, a little bit oblivious to how powerful a processing machine the human brain is, right? We consider something like a medical claim to be something we can process quickly, but actually working with it, and representing it in a way that a machine can work with, can become extremely complicated. So it was a great discussion to have, to communicate and help people understand the complexity of the tasks that a data scientist or a data analyst or a data engineer has to go through in order to ensure that value is extracted at the end and delivered to all of us.
So that was great. I really enjoyed discussing the complexity of the data. That's, I think, also a benefit of talking with someone who's a data scientist, because a big part of a data scientist's work is to navigate this complexity and find ways to compress it. And of course, for me, it's always a great pleasure to chat with people who have come from the academic environment into industry, because these people are usually very, very passionate about the things they do. And I think this is something we also experienced today with Jason.
As a person who's also passionate about data, it's always a great pleasure to discuss with someone who shares this passion. And I'm extremely happy that I also managed to learn a few more things about projects like those at CERN, and how humanity is actually pushing forward the state of the art when it comes to data and our understanding of the world in general. So I hope we will have the opportunity to chat with him again in the future. I think we have many more things to discuss.
Yeah, I think it was great. One other thing that was very interesting to me was how seamless it
seems like the relationship is between data science and marketing. And that's pretty unique, you know, even from a technical standpoint. And so, you know,
hats off to Jason and the entire team at Bind for building something pretty special there,
it seems like. And we'll look forward to catching them again on another episode of The Data Stack Show. Thanks for joining us, and we'll catch you next time.