Screaming in the Cloud - The Realities of Working in Data with Emily Gorcenski

Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Welcome to Screaming in the Cloud. I'm Corey Quinn.

Starting point is 00:00:34 My guest today is Emily Gorcenski, who is the data and AI service line lead over at ThoughtWorks. Emily, thank you so much for joining me today. I appreciate it. Thank you for having me. I'm happy to be here. What is it you do exactly? Take it away. Yeah, so I run the data side of our business at ThoughtWorks Germany. That means data engineering work, data platform work, data science work. I'm a data scientist by training. And, you know,

Starting point is 00:01:03 we're a consulting company. So I'm working with clients and trying to help them through the sort of messy landscape that data is these days. Should we be migrating to the cloud with our data? What can we migrate to the cloud with our data? What should we be doing with our data scientists? And how do we make our data analysts' lives easier? So it's a lot of questions like that and trying to figure out the strategy

Starting point is 00:01:26 and all of those things. You might be one of the most perfectly positioned people to ask this question to, because one of the challenges that I've run into consistently and persistently, because I watch a lot of AWS keynotes, is that they always come up with the same talking point, that data is effectively the modern gold,

Starting point is 00:01:45 and data is what unlocks value for your business. And every business agrees, because someone on stage, dressed in what they think is a nice suit, is saying it. And my reaction is, okay, you're trying to sell me something; what's the deal here? And then I check my email, and I discover that Amazon has sent me the same email about the same problem for every region I've deployed things to in AWS,

Starting point is 00:02:04 and, oh, you deployed this to one of the Japanese regions. We're going to send that to you in Japanese as a result. And it's like, okay, the takeaway here is that for a company that says data is important, they have no idea who any of their customers are. How real is "data is important" versus "we charge by the gigabyte, so you should save all of your data and then run expensive things on top of it"? I think data is very important if you know what you're going to do with it and if you have a plan for how to work with it. I think if you look at the history of computing, of technology, if you go back 20 years to maybe the early days of the big data era, right? Everyone was like, oh, we've got big data,

Starting point is 00:02:45 data is going to be big. And for some reason, we never questioned why, like we were thinking that the big in big data meant big as in volume and not big as in big pharma. This sort of revolution never really happened for most companies. Sure, some companies got a lot of value from the sort of data mining

Starting point is 00:03:01 and just gather everything and collect everything. And if you hit it with a big computational hammer, insights will come out and somehow those insights will make you money through magic. The reality is much more prosaic. If you want to make money with data, you have to have a plan for what you're going to do with data. You have to know what you're looking for and you have to know exactly what you're going to get when you look at your data and when you try to answer questions with it. And so when you see somebody like Amazon not being able to correlate the fact that you're the account owner for all of these different accounts

Starting point is 00:03:33 and that the language should be English and all of these things, that's partly an operational problem because it's annoying to try to do joins across multiple tables and multiple regions and all of those things. But it's also partly that nobody has figured out how doing that adds value for them, right? There's a part of it where it's like, this is just professionalism, but there's a part of it where it's also like, whatever, you've got Google Translate, figure it out yourself, we're just going to get through it. I think that as time has evolved from the initial waves of the big data era into the data science era, and now we're in all sorts of different architectures and principles and all of these things, most companies still haven't figured out what to do with data. They're still investing a ton of money to answer the same analytics questions

Starting point is 00:04:25 that they were answering 20 years ago. And for me, I think that's a disappointment in some regards, because we do have better tools now. We can do so many more interesting things if you give people the opportunity. One of the things that always seemed a little odd was back when I wielded root credentials in anger. Anger, of course, being my name for the production environment as opposed to Theory, which is what I call staging because it works in Theory but not in production. I digress. It always felt like I was getting constant pushback from folks of, you can't delete that data. It's incredibly important because one day we're going to find a way to unlock the magic of it. And it's, these are web server logs that are 15 years old,

Starting point is 00:05:07 and 98% of them by volume are load balancer health checks, because it turns out that back in those days, baby seals got more hits than our website did. So that's not really going to add much value. And then, from my perspective at least, given that I tend to live, eat, sleep, breathe cloud these days, AWS did something that was refreshingly customer-obsessed when they came out with Glacier Deep Archive.

Starting point is 00:05:31 Because the economics of that are, if you want to store a petabyte of data with a 12-hour latency on the request, for things like archival logs and whatnot, it's $1,000 a month per petabyte. Which is, okay, you have now hit a price point where it is no longer worth my time to argue with you. We're just not going to delete anything ever again. Problem solved. Then came GDPR, which is neither here nor there, and we actually want to get rid of those things for a variety of excellent legal reasons, and the dance continues. But my argument for getting rid of data because it's super expensive no longer holds water in the way that it once did for anything remotely resembling a reasonable

Starting point is 00:06:11 amount of data. Then again, that's getting reinvented all the time. I used to be, I guess we'll call it, a data minimalist. I don't want to store a bunch of data, mostly because I'm not a data person. I am very bad at thinking in that way. I consider SQL to be the chess of the programming world, and I'm not particularly great at it. And I also am lucky and have an aura. So if I destroy a bunch of stateless web servers, okay, we can all laugh about that. But let's keep me the hell away from the data warehouse if we still want a company tomorrow morning.
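For a rough sense of the arithmetic behind the "$1,000 a month per petabyte" figure quoted above, here is a minimal Python sketch. It assumes a Glacier Deep Archive storage list price of $0.00099 per GB-month (the us-east-1 price at the time of recording; prices vary by region and change over time), and it deliberately ignores request, retrieval, and minimum-storage-duration charges.

```python
# Assumed list price for S3 Glacier Deep Archive storage in us-east-1,
# in USD per GB-month. Check the current S3 pricing page before relying on it.
DEEP_ARCHIVE_USD_PER_GB_MONTH = 0.00099

GB_PER_PB_DECIMAL = 1000 ** 2  # 1 PB = 1,000,000 GB (SI units)
GB_PER_PB_BINARY = 1024 ** 2   # 1 PiB = 1,048,576 GiB (binary units)


def monthly_storage_cost(gigabytes: float,
                         usd_per_gb_month: float = DEEP_ARCHIVE_USD_PER_GB_MONTH) -> float:
    """Storage-only cost for one month; excludes requests and retrievals."""
    return gigabytes * usd_per_gb_month


if __name__ == "__main__":
    for label, gb in (("1 PB (decimal)", GB_PER_PB_DECIMAL),
                      ("1 PiB (binary)", GB_PER_PB_BINARY)):
        print(f"{label}: ${monthly_storage_cost(gb):,.2f} per month")
```

Whichever definition of a petabyte you use, the storage line item lands at roughly $990 to $1,040 a month, which is the price point where arguing about deleting old logs stops being worth anyone's time.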

Starting point is 00:06:42 And that was sort of my experience. And I understand my bias in that direction. But I'm starting to see magic get unlocked. Yeah, I think, you know, you said earlier that there's like this mindset, like data is the new gold or data is the new oil or whatever. And I think it's actually more true that data is the new milk, right? It goes bad if you don't use it, you know, before a certain point in time. And at a certain point in time, it's not going to be very offensive if you just leave it locked in the jug, but as soon as you try to open it,

Starting point is 00:07:09 you're going to have a lot of problems. Data is very cheap to store these days. It's very easy to hold data. It's very expensive to process data. I think that that's where the shift has gone. There's this old DBA legacy of like, don't let the software developers touch the prod database. And they've kind of kept their like arcane witchcraft to themselves. And that mindset has persisted. But now it's sort of shifted into all of these other

Starting point is 00:07:37 architectural patterns that are just abstractions on top of this, don't let the software engineers touch the data store. We have these streaming-first architectures, which are great. They're great for software devs, and they're great for data engineers who like to play with big, powerful technology. They're terrible if you want to answer a question like, how many customers did I have yesterday? These are the things that I think are some of the central challenges, right? A Kappa architecture, you know,

Starting point is 00:08:08 streaming-first architecture is amazing if you want to improve your application developer throughput. And it's amazing if you want to build real-time analytics or streaming analytics into your platform. But it's terrible if you want your data lake to be navigable. It's terrible if you want to find the right data for doing the more complex things. And it becomes very expensive to try to process it. One of the problems I think I have with it is that if I take a look at the

Starting point is 00:08:38 data volumes that I work with in my day-to-day job, I'm dealing with AWS billing data as spit out by the AWS billing system. And there isn't really a big data problem here. If you take a look at some of the larger clients, okay, maybe I'm trying to consume a CSV that's 10 gigabytes. Yes, Excel is going to violently scream itself to death if I try to load it there, and then my computer smells like burning metal all afternoon. But if it fits in RAM, it doesn't really feel like a big data problem on some level. And when I look at the landscape of all the different tools you can use for things like this, they feel more or less like, hmm, I have a loose thread on my shirt. Could you pass me that chainsaw for a second? It just seems like stupendous overkill for anything that I'm working with. Counterpoint, the clients I'm working with have massive data farms, and my default response when

Starting point is 00:09:30 I meet someone who's very good in an area that I don't do a lot of work in is, contrary to what a lot of people apparently do on Twitter, not to assume, oh, I don't know anything about that space, so it must be worthless and they must be dumb. No, that is not the default approach to take to anything, from my perspective. So it's clear there's something very much there that I just don't see slash understand. That is a very roundabout way of saying what could be uncharitably distilled down to: so is your entire career bullshit? But no, it is clearly not. There is value being extracted from this, and it's powerful. I just think the industry as a whole has done a relatively poor job of explaining that value in ways that don't come across as contrived or profoundly disturbing.
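To make the "a 10-gigabyte billing CSV isn't really big data" point concrete: a file like that can be summarized on a single machine by streaming it row by row, with no cluster involved and without holding the whole file in memory. The sketch below uses only Python's standard library. The column names `lineItem/ProductCode` and `lineItem/UnblendedCost` are assumptions modeled on the AWS Cost and Usage Report's legacy column naming; adjust them to whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Assumed column names; change these to match your billing export.
SERVICE_COLUMN = "lineItem/ProductCode"
COST_COLUMN = "lineItem/UnblendedCost"


def cost_by_service(rows: Iterable[Mapping[str, str]],
                    service_col: str = SERVICE_COLUMN,
                    cost_col: str = COST_COLUMN) -> Dict[str, float]:
    """Sum cost per service one row at a time, so memory use stays flat."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        cost = row.get(cost_col) or "0"  # treat blank cost cells as zero
        totals[row[service_col]] += float(cost)
    return dict(totals)


def cost_by_service_from_file(path: str) -> Dict[str, float]:
    """Stream a (possibly multi-gigabyte) CSV from disk instead of loading it whole."""
    with open(path, newline="", encoding="utf-8") as handle:
        return cost_by_service(csv.DictReader(handle))


if __name__ == "__main__":
    sample = io.StringIO(
        "lineItem/ProductCode,lineItem/UnblendedCost\n"
        "AmazonS3,1.25\n"
        "AmazonEC2,10.00\n"
        "AmazonS3,0.75\n"
    )
    print(cost_by_service(csv.DictReader(sample)))
```

Floats keep the sketch short; for reconciling actual invoices, `decimal.Decimal` avoids binary rounding drift. A dataframe library would work just as well here; the point is that nothing in this job calls for a chainsaw.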

Starting point is 00:10:18 Yeah, I think there's a ton of value in doing things right. It gets very complicated to try to explain the nuances of when and how data can actually be useful, right? Oftentimes your historical data really only tells you about what happened in the past. And you can throw some great mathematics at it and try to use it to predict the future in some sense, but it's not necessarily great at what happens when you hit really hard changes, right? For example, when the coronavirus pandemic hit and purchaser and consumer behavior changed overnight, there was no data in the data set

Starting point is 00:10:51 that explained that consumer behavior. And so what you saw is a lot of these things like supply chain issues, which are very heavily data-driven on a normal circumstance. There was nothing in that data that allowed those algorithms to optimize for the reality that we were seeing at that scale. Even if you look at advanced logistics companies, they know what to do when there's a hurricane

Starting point is 00:11:18 coming or when there's been an earthquake or things like that. They have disaster scenarios, but nobody has ever done anything like this at the global scale. What we saw was this hard reset that we're still feeling the repercussions of today. Yes, there were people who couldn't work and we had lockdowns and all of that stuff, but we also felt the impact of the way that we built the systems to work with the data that we need to shuffle around. And so I think that there is value in being able to process these really, really large data sets. But I think that actually there's also a lot of value in being able to solve smaller, simpler

Starting point is 00:11:55 problems, right? Not everything is a big data problem. Not everything requires a ton of data to solve. It's more about the mindset that you use to look at the data, to explore the data and what you're doing with it. And I think the challenge here is that, you know, everyone wants to believe that they have a big data problem because it feels like you have to have a big data problem. All the cool kids are having this kind of problem. You have to have big data to sit at the grownups table. And so what's happened is we've optimized a lot of tools around solving big data problems. And oftentimes these tools are really poor at solving normal data

Starting point is 00:12:32 problems. And there's a lot of money being spent on a lot of overkill engineering in the data space. On some level, it feels like there has been a dramatic misrepresentation of this. I had an article that went out last year where I called machine learning selling pickaxes into a digital gold rush. And someone I know at AWS responded to that in probably the best way possible. She works over in their machine learning group. She sent me a foam Minecraft pickaxe that now hangs on my office wall. And that gets more commentary than anything, including the customized oil painting I have of Billy the Platypus fighting an AWS billing dragon. No, people want to talk about the Minecraft pickaxe. It's amazing. First, where is this creativity in any of the marketing that this department is putting out? But two, it's clearly not accurate. And what it took for me to see that was a couple of things that I built myself.

Starting point is 00:13:28 I built a Twitter thread client that would create Twitter threads back when Twitter was a place that wasn't overrun by some of the worst people in the world and turned into bird chan. But that was great. It would automatically do OCR on images that I uploaded. It would describe the image to you

Starting point is 00:13:43 using Azure's Cognitive Vision API. And that was magic. And now I see things like ChatGPT, and that's magic. But take a look at the way that the cloud companies have been describing the power of machine learning and AI: they wind up getting someone with a doctorate whose first language is math on stage for 45 minutes, just yelling at you in Star Trek technobabble to the point where you have no idea what the hell they're saying. And occasionally other data scientists say, yeah, I think he's just shining everyone on at this point, but yeah, okay. It still becomes unclear. It takes seeing the value of it for it to finally click. People make fun of it, but the

Starting point is 00:14:19 hot dog, not a hot dog app is the kind of valuable breakthrough that suddenly makes this intangible thing very real for people. I think there's a lot of impressive stuff, and ChatGPT is fantastically impressive. I actually used ChatGPT to write a letter to some German government agency to deal with some bureaucracy. It was amazing. It did it. It was grammatically correct.

Starting point is 00:14:42 It got me what I needed, and it saved me a ton of time. I think that these tools are really, really powerful. Now, the thing is, not every company needs to build its own ChatGPT. Maybe they need to integrate it. Maybe there's an application for it somewhere in their landscape of product, in their landscape of services, in the landscape of their internal tooling. And I would be thrilled, actually, to see some of that be brought into reality in the next couple of years. But you also have to remember that ChatGPT is not something that came because we had a really great breakthrough in AI last year or something like that. It stacked upon 40 years of research.

Starting point is 00:15:26 We've gone through three waves of neural networking in that time to get to this point. And it solves one class of problem, which is honestly a fairly narrow class of problem. And so what I see is a lot of companies that have much more mundane problems, but where data can actually still really help them. Like how do you process Cambodian driver's licenses with OCR, right? These are the types of things that if you had a training data set

Starting point is 00:15:54 that was every Cambodian person's driver's license for the last 10 years, you're still not going to get the data volumes that even a day's worth of Amazon's marketplace generates, right? And so you need to be able to solve these problems still with data without resorting to the cudgel that is a big data solution, right? So there's still a niche, a valuable niche, for solving problems with data without having to necessarily resort to, we have to load the entire internet into our stream and throw GPUs at it all day long and spend tens of millions of dollars in

Starting point is 00:16:31 training. I don't know, maybe hundreds of millions, however much ChatGPT just raised. There's an in-between that I think is vastly underserved by what people are talking about these days. There is so much attention being given to this, and it feels almost like there has been a concerted, deliberate effort to talk in circles and distance people from the humanity and the human consequences of what it is that they're doing. When I was younger, in my more reckless years, I was never much of a fan of the idea of government regulation. But now it has become abundantly clear that our industry, regardless of how you want to define industry or describe society, cannot self-regulate when it comes to data

Starting point is 00:17:14 that has the potential to ruin people's lives. I mean, I spent a fair bit of my career working in financial services in a bunch of different ways. And at least in those jobs, it was only money. The scariest thing I ever dealt with from a data perspective was when I did a brief stint at Grindr, because that was the sort of problem where, if that data gets out, people will die. And I have not had to think about things of that level of import before or since, for which I'm eternally grateful. It's only money, which is a weird thing for a guy who fixes cloud bills for a living to say. And if I say that on a client call, it's not going to go

Starting point is 00:17:49 very well. But it's the truth. Money is one of those things that can be fixed. It can be addressed in due course. There are always opportunities there. If someone's just been outed to their friends and family, and they feel their life is now in shambles around them, you can't unring that particular bell. Yeah. And in some countries, it can lead to imprisonment or death. It can lead to death sentences. Yes. It's absolutely not acceptable. There's a lot to say about the ethics of where we are. And I think that as a lot of these high-profile AI tools have come out over the last year, so, you know, Stable Diffusion and ChatGPT and all of this stuff, there's been a lot of conversation that is sort of trying to put some counterbalance on what we're seeing.

Starting point is 00:18:31 And I don't know that it's going to be successful. I think that, you know, I've been speaking about ethics and technology for a long time. And I think that we need to mature and get to the next level of actually addressing the ethical problems in technology. Because it's so far beyond things like, oh, you know, if there's a biased training data set, then the algorithm is biased. Everyone knows that by now. And the people who don't know that don't care. We need to get well beyond where these conversations about ethics and technology are going, because it's a manifold problem. We have issues where the people labeling

Starting point is 00:19:11 this data are paid pennies per hour to deal with some of the most horrific content you've ever seen. I'm somebody who has immersed myself in a lot of horrific content for some of the work that I have done. This is so far beyond what I've had to deal with in my life that I can't even imagine it. You couldn't pay me enough money to do it. And we're paying people in developing nations, you know, a buck thirty-five an hour to do this. And I think you must understand, Emily, that given the standard of living where they are, that is perfectly normal and we wouldn't want to distort local market dynamics. So if they make a buck fifty a day, we are going to be generous gods and pay them a whopping dollar seventy a day. And now we feel good about ourselves. And no,

Starting point is 00:19:58 it's not about exploitation. It's about raising up an emerging market. Another happy horseshit lie that people tell themselves. Yes, it is. And, you know, the industry has built itself on the back of that. It's raised itself up on this type of labor. It's raised itself up on taking text and images without permission of the creators. And, you know, I'm not a lawyer, and I'm not going to play one,

Starting point is 00:20:26 but I do know that derivative use is something that, at least under American law, is something that can be safely done. It would be a bad world if derivative use was not something that we had freely available, I think, on the balance. But our laws, the thing is, our laws don't account for the scale. Our laws about things like fair use, derivative use, are for if you see a picture and you want to take your own interpretation, or if you see an image and you want to make a parody, right? It's a one-to-one thing. You can't make five million parody images based on somebody's art yourself. These laws were

Starting point is 00:21:08 never built for this scale. And so I think that where AI is exploiting society is it's exploiting a set of ethics, a set of laws, and a set of morals that are built around a set of behavior that is designed around normal human interaction scales. You know, one person standing in front of a lecture hall or friends talking with each other or things like that. The world was not meant for a single person to be able to speak to hundreds of thousands of people or to manipulate hundreds of thousands of images per day.

Starting point is 00:21:43 It's actually, I find it terrifying. Like the fact that I, a normal person, have a Twitter following where, if I wanted to, I could get 50 million impressions in a month. This is not a normal thing for a normal human being to have. And so I think that as we build this technology, we have to also say we're changing

Starting point is 00:22:06 the landscape of human ethics by our ability to act at scale. And yes, you're right. Regulation is possibly one way that can help this. But I think that we also need to embed cultural values in how we're using the technology and how we're shaping our businesses to use the technology. It can be used responsibly. I mean, like I said, ChatGPT helped me with a visa issue, sending an email to the immigration office in Berlin. That's a fantastic thing. That's a net positive for me, hopefully for humanity. I wasn't about to pay a lawyer to do it. But where's the balance, right? And it's a complex topic. It is. It absolutely is. There is one last topic

Starting point is 00:22:48 that I would like to talk to you about that's a little less heavy, and I've got to be direct with you, that I'm not trying to be unkind, but you disappointed me because you mentioned to me at one point when I asked how things were going in your AWS universe,

Starting point is 00:23:03 you said, well, aside from the bank heist, reasonably well. And I thought that you were blessed with something I always look for, which is the gift of glorious metaphor. Unfortunately, as I said, you've disappointed me. It was not a metaphor. It was the literal truth. What the hell kind of bank heist could possibly affect an AWS account? This sounds like something out of a movie. Hit me with it. Yeah, you know, I think in the SRE world, we tell people to focus on the high probability,

Starting point is 00:23:32 low impact things, because that's where it's going to really hurt your business. And let the experts deal with the black swan events, because they're pretty unlikely. You know, a normal business doesn't have to worry about terrorists breaking into the Google data center or a gang of thieves breaking into a bank vault. Apparently that is something that I have to worry about, because I have some data in my personal life that I need to protect, like all other people. And I decided, like a reasonable and secure and smart human being who has a little bit of extra spending cash, that I would do the safer thing and take my backup hard drive and my old phones and put them in a safety deposit box at an old private bank that has, you know, a vault that's behind a meter-and-a-half-thick steel door

Starting point is 00:24:15 and has two guards all the time and cameras everywhere. And I said, what is the safest possible thing that you can do to store your backups? Obviously, you put them in a secure storage location, right? And then, you know, I don't use my personal AWS account so much anymore. I have work accounts, I have test accounts. Oh, yeah, honestly the best way to have an AWS account is to have someone else's payment instrument attached to it, because otherwise, oh, God, you're on the hook for that yourself

Starting point is 00:24:42 and nobody wants that. Absolutely. And, you know, creating new email addresses for new trial accounts is really just a pain in the ass. So, you know, I had my phone from five years ago sitting in this bank vault, and I figured that was pretty secure until I got an email from the Berlin Polizei saying there had been a break-in. And I went and I looked at the news, and apparently a gang of thieves had pulled off the most epic heist in recent European history. This is barely in the news. Unless you speak German, you're probably not going to find any news about this. But a gang

Starting point is 00:25:17 of thieves broke into this bank vault and broke open the safety deposit boxes. And it turns out that this vault was also the location where a luxury watch consigner had been storing his watches. So they made off with something like tens of millions of dollars of luxury watches. And then also the phone that had my 2FA for my Amazon account. So the total value of this potential theft was probably somewhere in the $500 million range, if they set up a SageMaker instance on my account, perhaps. This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your engineers are burned out. They're tired from pagers waking them up at 2am for something that could have waited until after their morning

Starting point is 00:26:02 coffee. Ring ring. Who's there? It's Nagios, the original Call of Duty. They're fed up with relying on two or three different monitoring tools that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there is a better way. Observability tools like Honeycomb, and very little else because they do admittedly set the bar, show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business,

Starting point is 00:26:36 great for your engineers, and most importantly, great for your customers. Try free today at honeycomb.io slash screaming in the cloud. That's honeycomb.io slash screaming in the cloud. The really annoying part that you are going to kick yourself over, and I'm not kidding, is that I've looked up the news articles on this event. And it happened something like two or three days after AWS put out the best release of last year's, or any other, re:Invent, past, present, or future,

Starting point is 00:27:13 which is finally allowing multiple MFA devices on root accounts. So finally, we can stop keeping these things in safes. Or you can have two devices. Or you can have multiple people, in COVID times, at remote sites in different parts of the world and still get into the thing. But until then, nope, it's either no MFA

Starting point is 00:27:31 or you have to store it somewhere ridiculous like that and access becomes a freaking problem in the event that the device is lost or in this case, stolen. Yes, I would just beg the thieves, if you're out there, if you're secretly, actually a bunch of cloud engineers who needed to break into a luxury watch consignment

Starting point is 00:27:51 storage vault so that you could pay your cloud bills, please have mercy on my poor AWS account. But also I'll tell you that the credit card attached to it is expired, so you won't have any luck. Yeah, the really sad part is that an expired credit card just means the charge won't go through. They're still going to hold you responsible for it. It's the worst advice I see well-intentioned people giving each other on places like Reddit, where

Starting point is 00:28:15 the other children hang out. And it's, oh, just use a prepaid gift card so it can only charge you so much. It's, yeah, and then you get exploited like someone recently was and start accruing $60,000 a day in Lambda charges in an otherwise idle account. And Amazon will come after you with a straight face after a week, like, yes, we'd like our $360,000, please. We tried to charge the credit card and, wouldn't you know it, it expired. Could you, could you get on that, please? We'd like our money faster if you wouldn't mind. And then you wind up in absolute hell. Now, credit where due. They, in every case I am aware of that is not looking like fraud's close cousin,

Starting point is 00:28:53 they have made it right on some level. But it takes three weeks of back and forth and interminable waiting. And you're sitting there freaking out, especially if you're someone who does not have a spare half million dollars sitting around. Imagine that. You sound poor; have you tried not being that? And I'm firmly convinced it is a matter of time until someone does something truly tragic because they don't understand that it takes forever, but it will go away. From my perspective, there's no bigger problem that AWS needs to fix than surprise lifetime-earnings bills sent to some poor freaking student who is just trying to stand up a website as part of a class. All of the clouds have these missing stairs in them. And it's really easy because they make it...

Starting point is 00:29:38 One of the things that a lot of the cloud providers do is make it really easy for you to spin things up to test them. And they make it really, really hard to find where to shut it all down. Data science is awful at this. As a data scientist, I work with a lot of data science tools. And every cloud has its "spin up your magical data science computing environment" option, so that your data scientists can bang on the data with high-performance compute for a while. It's one click of a button: you type in a couple of things, name your service or whatever, name your resource. You click a couple of buttons and you spin it up.

Starting point is 00:30:14 But behind the scenes, it's setting up a Kubernetes cluster, and it's setting up a storage bucket, and it's setting up some data pipelines, and it's setting up some monitoring stuff, and it's setting up a VM in order to run all of this stuff. The next thing you know, you're burning 100, 200 euros a day just to figure out if you can load a CSV into pandas using a Jupyter Notebook. And when you try to shut it all down, you can't.

Starting point is 00:30:44 You have to figure out, oh, there is a networking thing set up. Well, nobody told me there's a networking thing set up. You know, how do I delete that? You didn't say please. So here you go without it. For me, it's not even the giant bill going from four dollars a month in S3 charges to half a million bucks, because it's pretty obvious from the outside what the hell has been happening there. It's the little stuff. I have been waiting since last summer for a refund on $260 of because-we-said-so SageMaker credits, caused by a change to their billing system, for a 45-minute experiment I had done eight months before that.

Starting point is 00:31:19 Yep. Wild stuff. Wild stuff. And I have no tolerance for people saying, oh, you should just read the pricing page and understand it better. Yeah. Listen, jackhole. I do this for a living. If I can fall victim to it, anyone can, I promise. It's not that I don't know how the billing system works or what to do to avoid unexpected charges. I'm just lucky, because if I hadn't caught it with my systems three days into the month, it would have been a $2,000 surprise. And yeah,

Starting point is 00:31:44 I run a company. I can live with that. I wouldn't be happy, but whatever. It is immaterial compared to, you know, payroll. I think it's kind of a rite of passage, you know, to have the $150 surprise Redshift bill at the end of the month from your personal test account. And it's sad. You know, I think they can do so much better, and that they should. As a tangent, one of the challenges that I see in the data space is that it's so hard to break into data because the tooling is so complex and it requires so much extra knowledge. If you want to become a software developer, you can develop a microservice on your machine. You can build a web app on your machine. You can set up Ruby on Rails or Flask or .NET or whatever you want, and you can do all of that locally. You can learn everything you need to know about React or Terraform or whatever running locally. You can't do that

Starting point is 00:32:37 with data stuff. You can't do that with BigQuery. You can't do that with Redshift. The only way that you can learn this stuff is if you have an account with it set up and you're paying the money to execute on it. And that makes it a really high barrier to entry for anyone trying to get into this space. It makes it really hard to learn, because if you want to learn anything by doing, like many of us in the industry have done, it's going to cost you a ton of money just to f*** around and find out. Yes. And no one likes the find out part of those stories.

Starting point is 00:33:10 Nobody likes the find out part when it comes to your bill. And to tie it back to the data story, it is clearly some form of batch processing, because it tries to be an eight-hour consistency model. Yeah, I assume for everything it's 72. But what that means is that you are significantly far removed from doing a thing and finding out what that thing costs. And that's just the direct charges. There's always the, oh, I'm going to set things up and it isn't going

Starting point is 00:33:34 to screw you over on the bill. You're just planting a beautiful landmine that you're going to stumble blindly into in three months when you do something else without realizing what that means. And the worst part is that it feels like victim-blaming. I mean, this is my problem. I guess this is one of the reasons I'm so down on data even now: I contextualize it in terms of the AWS bill. No one's happy dealing with that. You ever met a happy accountant? You have not. Nope. Nope. Especially when it comes to cloud stuff. Especially these days, when we're all looking to save energy and save money in the cloud. Ideally, save the planet: sustainability and saving money.

Starting point is 00:34:10 All aligned on the axis of "turn that shit off." It's great. We can hope for a brighter tomorrow. I really want to thank you for being so generous with your time. If people want to learn more, where can they find you? Apparently filing police reports after bank heists, which, you know, is a great place to meet people. Yeah, you know, the Landeskriminalamt in Berlin is certainly a place you want to go to get your cloud advice. You can find me, I have a website, it's my name, emilygorcenski.com. You can find me on Twitter, but I don't really post there anymore.

Starting point is 00:34:40 And I'm on Mastodon at some place, because Mastodon is weird and kind of a mess. But if you search me, I'm really not that hard to find. My name is harder to spell, but you'll see it in the podcast description. And we will, of course, put links to all of this in the show notes. Thank you so much for your time. I really appreciate it. Thank you for having me. Emily Gorcenski, data and AI service line lead at ThoughtWorks. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.

Starting point is 00:35:08 If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insipid, insulting comment talking about why data doesn't actually matter at all. And then the comment will disappear into the ether because your podcast platform of choice feels the same way about your crappy comment. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less

Starting point is 00:35:47 horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. This has been a HumblePod production. Stay humble.
