Python Bytes - #324 JSON in My DB?

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 324, recorded February 21st, 2023. I'm Michael Kennedy. And I'm Brian Okken. And I'm Erin Mullaney. And this episode is brought to you by Compiler, a podcast from Red Hat. I'll tell you more about them later. Erin, it's awesome to have you on the show.

Starting point is 00:00:21 Thanks for joining us. Thanks for asking me to be on. Yeah, you bet. Yeah. Why don't you tell folks a bit about yourself before we jump into the topics? Yeah, I'm Erin Mullaney. I've been a web developer since around the year 2000. I currently work at Energy Solutions as a codebase lead on a Django project there,

Starting point is 00:00:41 which means that I write and review a lot of Django and Python code on a day-to-day basis. Energy Solutions, where I work, is an energy consulting company that's mission-driven to protect the environment through different energy things, to be really nonspecific. I specifically work on a Django project that facilitates energy efficiency programs. And energy efficiency is actually a super powerful and cost-effective way to combat climate change. And that's according to the U.S. Department of Energy. Yeah, that's awesome. All the wasted energy and bad insulation and other things like that.

Starting point is 00:01:22 That's really cool. That's good work. Really quickly, before we dive into Brian's item here, how are you feeling about Django and the recent changes? I feel like it's picked up a lot of momentum lately. It's picked up some new features like async stuff. Is that exciting for you and your team? Yeah, for sure.

Starting point is 00:01:37 It's exciting. I am coming from a background where I was actually coding in a different web framework for years and switched over to Django. So I'm just happy to hear that more and more people are downloading it and using it. So yeah, I just want it to stick around, because I like it. Yeah, absolutely. All right, Brian, you want to kick us off here? Sure. So this first one's coming from Brett Cannon. He wrote an article called "Use TOML for .env files?"

Starting point is 00:02:10 And so there's the question mark at the end, and we'll talk about that. I'm not a web developer very much, I mean, I'm getting more so now, but I wasn't really familiar with .env files until just recently. And so one of the great things about this article is it talks about what these are. What these are, often, is you've got settings for

Starting point is 00:02:38 your application. And there's the idea of a 12-factor app design, which I read about many years ago and forgot about. But one of the ideas is you don't want to have too many differences between your development environment and your live environment. And one of the ways you do this is using environment variables to store things like login credentials and all that sort of stuff. And in Python, one of the ways we do that is through .env files, and also through a project called python-dotenv, which is used by Pydantic and a lot of other projects. What this does is allow you to have defaults in there. So in your development environment you might have some silly credentials, but then in your live environment those secrets are actually set by the production server. And so the question really is, what's the format of this?

Starting point is 00:03:38 And I never really thought about it before, and basically the problem is it's not defined. There exists a text file, and it has secrets. Yeah. It's kind of like Bash-ish files or something. It's a format that's not formally specified and improves over time, according to the python-dotenv README. But what does that mean? It kind of means it's your application, so you can define it however you want, right? But maybe we should have some standardization. So Brett was looking further into this. And one of the solutions that Adafruit came up with was, let's not use .env, but actually just do a settings.toml. And it's used for the same thing, to store secrets such as passwords and API keys.

Starting point is 00:04:32 So they're using TOML. And when you just do a normal, simple TOML file, it looks pretty much like any other .env file that people have used. So really, the question that Brett is posing is, can we just standardize on this? Why don't we just standardize .env as TOML format? And I think, why not? Mostly it'll work for everybody already, and then you could do cool things if we did TOML. You could extend it a bit. So in the VS Code base, they're talking about using categories and specific tables. You'd have multiple tables in there instead of just the global one. I think that's a cool idea.

Starting point is 00:05:16 I like the ability to have multiple things like test and maybe dev or like a connection string to a database or something. Yeah. It wouldn't make me sad if it was JSON as well. I know Erin is going to make a cameo for JSON later, but, you know, TOML seems to be winning on these things, and I would be okay with TOML as well. So, Erin, you do web development. Do you use .env files or this sort of a setting?

Starting point is 00:05:43 We use settings. Yeah, we don't use .env files. We do have local settings, but yeah. Cool. I'm not really a Django developer. So is it built into Django to have some solution for this? Yeah, I'd have to, like, I get it running on my machine and then I go and I code. Yeah. So all the OS stuff is not stuff I worry about unless I'm installing a new requirement or something. Yeah, Django does have its way of managing settings that predates this stuff, I believe, as well.

Starting point is 00:06:20 All right. Yeah, that makes sense. Well, Michael, should we switch to Pydantic? I have some crazy news for you. Yeah, let's do it. First, huge, huge congrats over to Samuel Colvin. And I've had him on the show to talk about Pydantic before. Pydantic is one of the more exciting libraries, I think, especially in the API space. But also, Python Bytes itself is powered by Beanie,

Starting point is 00:06:45 the MongoDB ORM or ODM, and that uses Pydantic models for its validation and exchange. Like the things that are mapped to MongoDB are Pydantic classes. So here's the news. Sequoia, one of the biggest VC firms in California, in the world probably, backs open source data validation Pydantic to commercialize with cloud services. That's crazy, huh? Yeah. Wow. We are a long way

Starting point is 00:07:12 from the buy me a coffee, donate PayPal button that you see on various projects in this. And I think it's just a sign of the open source space finding its way to support really successful projects and to support people whose time and energy and contributions to the world would be better spent to create further this this library than say potentially like well how can we get like one percent of one percent increase on ad clicks by using my library or something like that you know working for like companies that don't necessarily contribute so much. So some of the highlights here, you'll notice. When I said we're a long ways from buying me a cup of coffee,

Starting point is 00:07:53 Pydantic Services Incorporated emerges from stealth today with $4.7 million in seed funding. Wow. Yeah. Big coffee. That is a lot of coffee. That's like coffee for life. Some of that fancy kind, you know, the weird, weird variations and stuff.

Starting point is 00:08:10 Yeah. Anyway, it's not just Sequoia, it's Partech, it's Irregular Expressions, it's Zapier co-founder Bryan Helmig, who's also been on Talk Python before, and some other folks, co-founder of Sentry David Cramer, and so on. So let me see, I wrote down some of the highlights of this whole article that I wanted to hit on. First of all, this comes from Mark Little, who was a guest on show 285 and also a friend of mine. So thanks, Mark, for sending that in.

Starting point is 00:08:37 So you might be wondering, okay, well, 4.7 million is amazing. It's a lot of support. It means Pydantic is only going to get better and stronger. But what the heck are you going to get for your 4.7 million? So the idea is that this new commercial entity will incorporate a bunch of tools and services that are powered by and inspired by the Pydantic library. And from what I can tell, its primary goal is to make Pydantic really, really good. Further, there's already this big project for 2.0 for rewriting the core in Rust. That was the last time I had Samuel on the show on Talk Python, to talk about that, which is going to make it a lot faster. But something a little bit akin to a platform as a service, something a little bit like a Heroku, where you can push Python code to production in simple ways, but using the validation and the data exchange and the

Starting point is 00:09:30 understanding that Pydantic has for data as part of this. So the final thing that I'll get y'all's thoughts on is they're going to start with an initial team of six. The first three engineers are based in Montana, Chicago, and Berlin, various places. And so, yeah, I wish all the luck to the Pydantic team and to Samuel and folks. I think this is great. What do you all think? I think this is great. I like the conversion to Rust. That's pretty exciting.

Starting point is 00:09:59 Yeah. How's this sit with you? Does this surprise you? No, it's cool. It's very cool. I mean, I'm just Googling it because I didn't research it ahead of this talk. But yeah, it sounds like it can be used with any Python-based framework. Yeah, it came out of FastAPI.

Starting point is 00:10:20 And it plays many important roles in FastAPI. It's the data validation. It's also the type hints that does the automatic data conversion. But it also drives the Swagger, OpenAPI documentation, and all those things. But it's been used way, way more places, for example, like Beanie, which I mentioned, or SQL Model, and plenty of others. And it's just starting to gain a ton of momentum as a really solid data exchange for Python that's not directly talking to databases. So, yeah, it should be good to see it grow. What does that mean, not directly talking to databases,

Starting point is 00:10:52 meaning it just reads what comes back from the API and validates that? Yeah, it basically will take any JSON, or if you could take a TOML document and you could turn it into a Python dictionary, then you could pass that on and have it validated. So you could say things like, this class has a list, which is a list of orders, and there can be no more than three orders in the list, and they have to be orders, and this thing has to be a number,

Starting point is 00:11:20 and just all that kind of logic gets expressed in the model there. Yeah, sounds nice. Yeah, it's cool. So one, just I guess a random thing. So it's a team of six, first three engineers based in Montana, Chicago, or Berlin. I wonder who's in Montana. And I guess if you had to choose one of three places to live, would you choose Montana, Chicago, or Berlin? Gosh, I could.

Starting point is 00:11:42 I think I'd go with Berlin. I could make a case for Montana or Berlin. They both are awesome in their own separate ways. What's your spare time look like, I guess? I mean, I do love the theaters in Chicago. The theaters

Starting point is 00:11:58 in Chicago are beautiful. I do too. But I'm thinking of motorcycle riding for days in Montana and the cities and all that stuff in Berlin. Erin, where would you live? Man, between those, that's really a hard choice. I moved to North Carolina for shorter winters. So it seems like Chicago would be out for that reason because they have even longer winters.

Starting point is 00:12:16 Montana might really be out. Yeah. So I would need to research what had the shortest winter, but also had really good vegan food. Like Chicago has amazing vegan food, but the winter. I think Berlin's going to be your bet. Yeah. Yeah. All right. Awesome. Well, over to you. What's your first topic? Okay, cool. Yeah. And I just wanted to go back to the topic because I kind of froze on that one. So we are using a YAML file for our local settings, not a TOML file. I haven't actually seen TOML before. I don't really know how different looking it is. But yeah, and settings are kind of

Starting point is 00:12:50 baked into Django for outside of the local environment stuff. Cool. Yeah. So my next, so my topic was, my first topic is JSON fields for performance and thinking about JSON fields in terms of what they are, which is kind of like denormalized data. I'm really interested in the topic of normalization and denormalization and specifically how JSON fields are basically denormalized and mutable data that's probably living in an otherwise normalized database. So I was interested in this topic and I searched to see if I could find it anywhere online. And yeah, so what we're showing here is this was a talk given by David Stokes at PHP UK in 2019 called How Denormalizing Your Data with JSON Can Boost Query Performance. I always miss,

Starting point is 00:13:42 do you guys pronounce it Jason or Jason? And I'm sure you've talked about this before. I guess I hadn't really thought. I say Jason, like on top. Yeah, yeah. But Brian, where do you, like the name Jason? Jason, yeah. It is Jason. It's like the name, according to the creator, and it is Jason. Okay. Creator of Jason. Jason. But I will mispronounce it a lot. And it stands for JavaScript Object Notation. But yeah, I think my Philly comes out because I'm always saying jay saul. And so, yeah, David Stokes gave this talk. He is a technology evangelist, and a lot of the talk was about MySQL as a backend in particular. But the parts of the talk that I found really interesting are the history lesson. And I kind of have it highlighted

Starting point is 00:14:31 here. It starts at around the 2:50 mark, where he talked about how Edgar Codd at IBM developed the idea of relational data because hardware was expensive at the time. So having relational tables and normalized data was a way to not have duplication of data. And normalized data, just a quick example, is like taking an address and breaking it down into parts. So experts had been saying for years at this point, normalizing data is the way to go. You want to normalize your data. And then during this history talk, he mentioned, and then NoSQL came in and shook things up. And after that, SQL added JSON data types, or a mutable data type.

Starting point is 00:15:20 So you don't have to define and normalize your whole database. You can kind of have these mutable fields. So, okay, anyway, the history lesson, I just found that super interesting as a data person. Do you guys find that interesting at all? I do, I do. Yeah, I think that this concept of mutable schema, not mutable data per se, but that the schema itself doesn't have to be as controlled and as strictly guarded by a DBA that goes through some giant process to figure out what you do,

Starting point is 00:15:56 can add a ton of flexibility to the way that you evolve your app, right? So there doesn't necessarily have to be a DBA. It could be like, well, how are we going to schedule the downtime so that we can do the schema migration as we roll out this new feature, right? Like those kinds of things can get challenging. If you roll out the code first and it's some kind of relational thing, you're using SQLAlchemy or something like that,

Starting point is 00:16:23 it's going to crash saying the code doesn't match the database. You roll out the database first, you know, it may no longer match what the code that's running is. And like, there's always this, well, what do I do? And having some of this more mutable schema, in this case, they're talking about MySQL, I believe it's basically the same for Postgres, where you can have columns that are JSON, and then you can, you just say to the database, the schema is JSON, but your code knows, well, it's actually a list of these things with these properties in it. And you want to add a new property? Great, you add a new property, as long as your code can deal with it, super. So I think it's certainly something people should consider. It really adds a lot of flexibility. You don't need necessarily a normalization table,

Starting point is 00:17:03 because you can just put the stuff, you know, in a list, for example. Yeah. And not only flexibility, but also quicker querying. So, yeah. So I really liked starting at around minute 14, which is, this is what I was kind of looking for when I was looking for this topic, so I really liked that he gave this talk about it. He goes over an example of a music store, and you have these items in a music store like guitars, and you don't want to have to add a field every time there's a new guitar feature, right? So you have these JSON fields in your database. And like you said, they're available in lots of different backends. We use Postgres and yeah, we use JSON fields all over the place.
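The music store idea can be sketched in a few lines. This is an illustration only, not code from David Stokes's talk: it uses the standard-library sqlite3 and json modules and stores the JSON as text in a hypothetical attributes column, whereas a real setup like the one described here would more likely use a native JSON column type (for example in Postgres or MySQL, or Django's JSONField) and query inside the document in SQL. Table, column, and attribute names are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Fixed relational columns plus one free-form JSON column (stored as text here).
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, attributes TEXT NOT NULL)"
)


def add_item(name: str, attributes: dict) -> None:
    """Insert an item; any new attribute needs no schema migration."""
    conn.execute(
        "INSERT INTO items (name, attributes) VALUES (?, ?)",
        (name, json.dumps(attributes)),
    )


def load_items() -> list:
    """Return (name, attributes dict) pairs in insertion order."""
    rows = conn.execute("SELECT name, attributes FROM items ORDER BY id").fetchall()
    return [(name, json.loads(raw)) for name, raw in rows]


add_item("Acoustic guitar", {"strings": 6, "body": "dreadnought"})
# A later product has features the first one never had: no ALTER TABLE required.
add_item("Electric guitar", {"strings": 7, "pickups": ["humbucker"], "tremolo": True})

items = load_items()
with_tremolo = [name for name, attrs in items if attrs.get("tremolo")]
print(with_tremolo)
```

The trade-off is that the database no longer enforces the shape of the attributes, so the application code has to know what to expect in each document.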

Starting point is 00:17:51 So he has this really cool diagram where he shows reducing database dives and many-to-many joins, where you're diving from one index into another into another just to get at the data that you can get at the top level if you have it in this JSON field. Right, you don't have to do a multi-way many-to-many join when it's just in there directly, right? Because you have more flexibility, it doesn't have to be tabular. Yeah, yeah. So I found it really cool. We use JSON fields in one of our big Django projects quite a bit. And yeah, our schemas are normalized. But we find it really helpful also for reporting, making reporting really,

Starting point is 00:18:33 really fast because of that database dive that you don't need to do. And also for tracking snapshots of data. So something happened on this date and then the relational record changed, but the JSON gives you the snapshot of what the user did on that date. So that's really useful too. Because if the snapshot doesn't match the current schema, well then how are you going to store it? Like that gets to be a problem, but just JSON is JSON. That's right. Yeah. Yeah. I guess I've taken this kind of to the far extreme in my world. So I'm a huge advocate, but doing, I do almost all my work on MongoDB, which means it's, it's all JSON all the way down. Right. So, but I, I think it's absolutely fabulous way to work. I love it. The operational

Starting point is 00:19:18 side of, of not doing massive migrations all the time. It's really, really good. Yeah. And I'm actually working on a blog, a blog article about it because I couldn't find what I specifically wanted to talk about today. So I'm, I'm writing up a blog article. It's not published. It won't, it'll be published next month. Um, but yeah, I'll share it later with you guys. Yeah. Yeah, please do. And I think that's, I think that's a great, uh, actually a great thing for people to do is just, uh, there's a discussion of something. And if you can't find an article that expresses what you want to express, then write one. That's great.

Starting point is 00:19:51 Yep. Indeed. All right, Brian, how about I tell everyone about our sponsor before we move on? Oh, that's a great idea. Yeah. As I said at the beginning, this episode is brought to you by the Compiler podcast from Red Hat. And just like you out there listening, we're big fans of podcasts, Brian and I, and we're happy to share one of the most highly respected, one from the most highly respected open source companies, Compiler,

Starting point is 00:20:16 original podcast from Red Hat. It brings together a curious team of Red Hatters to simplify tech topics, provide insight for new generation IT professionals. The show covers topics like what are the components of a software stack? Are big mistakes that big of a deal? And do you have to know how to code to contribute and get started in open source? And not always. Depends on how you're trying to contribute.

Starting point is 00:20:38 So Compiler closes the gap between those who are new to technology and those behind the inventions and services shaping our world. They bring together stories and perspectives from the industry and simplify its language, culture, and movements in a way that's fun, informative, and guilt-free. I recently listened to Are We As Productive As We Think? And that episode is really fun. There's a bunch of good advice in there. As a developer, owner of a tech company, and a technologist, these productivity hacks such as time boxing, focusing on one task at a time, and incorporating intentional breaks into your workday all stood out as super relevant.

Starting point is 00:21:16 They suggest that by creating an honest self-image of your productivity habits and being intentional about how you spend your time, you can reduce the overwhelm of multitasking that you have to do and increase your focus and creativity, leading you to be more successful for sure. So learn more about Compiler at pythonbytes.fm/compiler. The link is in your podcast show notes. Thanks to Compiler and Red Hat for keeping this podcast going strong. Awesome.

Starting point is 00:21:42 All right. Yeah. Thanks. Fun show. And tell us, you're gonna take us to school, Brian. Yeah. So Kevin Markham is a friend of the show. Friend of ours.

Starting point is 00:21:51 Ran into him a lot when I was going to conferences more. That's hopefully coming up again. What are those? Conferences. You know, people get together in real life. But so Kevin took a little bit of a break. He used to write a lot. And I guess I hadn't noticed, but there's a break between August of 2021 and now, in February of 2023. So, quite a break. And you know, we all

Starting point is 00:22:19 need that. That's fine. But these articles are great. So a couple new articles that he has, I'm going to pop through a couple of them. How to use f-strings with pandas. So basically, it's a good discussion of f-strings. If you're not comfortable with f-strings already, this is a good intro to why f-strings are great to pop in values. I don't know if it's really that pandas-specific, but one of the things I really loved, I'm going to pop to my favorite part of this article. And I forget to do this, so I'm glad that he points these out. So one of the things is you can,

Starting point is 00:22:55 it's not just taking a value and putting it in brackets so that you can print it, but it's an expression in the brackets. So you can call, like, upper on a name variable so that you can print it in uppercase and not have to do that before you pass it to the f-string. Or you could do things like a little bit of math.

Starting point is 00:23:14 So if you've got, like, his example had days completed, and he did like, you know, 365 minus that divided by to get a percentage. So this is pretty cool. Remember, if the only place you're going to use the value is within the string, you could just do it within the expression. So this is a good one. The part that never really occurred to me to do, that I wanted to highlight, was he had different columns of data within a DataFrame, referencing them with a string index, and then using an f-string to pick the index within a loop.
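The three f-string tricks just described can be sketched as follows. This is not Kevin Markham's code: the variable names, the numbers, and the test_1 through test_3 keys are all hypothetical, and a plain dict stands in for the DataFrame so the sketch runs without pandas. With a DataFrame, the same f-string key would go inside the square brackets to select a column.

```python
name = "kevin"
days_completed = 73

# 1. Any expression can go inside the braces, including a method call.
greeting = f"Hello, {name.upper()}!"

# 2. Math plus a format spec: the value is computed only where it is displayed.
progress = f"{days_completed / 365:.0%} of the year completed"

# 3. An f-string can build the string key used for a lookup inside a loop.
#    A dict stands in for a DataFrame with columns test_1, test_2, test_3.
scores = {"test_1": 88, "test_2": 92, "test_3": 79}
report = [f"Test {n}: {scores[f'test_{n}']}" for n in range(1, 4)]

print(greeting)
print(progress)
print(report)
```

The inner f-string uses single quotes because, before Python 3.12, an f-string could not reuse its own quote character inside the braces.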

Starting point is 00:23:52 And it never occurred to me to use f-strings to generate the index for a string index. This is a cool idea. Yeah, that is wild. I like it. The other article is a fly-through of Jupyter keyboard shortcuts. And I guess I just have to say I'm a huge fan of the rocket emoji. I wonder why. Yeah. But I like that this is not overwhelming. So

Starting point is 00:24:21 especially for people that, I mean, if you use it a lot and you don't know keyboard shortcuts, this would be a good intro. But people like me that just pop in and use it every once in a while for something, these are useful for those people too. It's not an overwhelming list. There's some great stuff like just hitting Escape and Enter to go back and forth between command mode and edit mode, for instance. And then, I'm going to remember this one, A and B to create a cell above or below the current cell. So these are just some really great little Jupyter tricks

Starting point is 00:24:55 to make yourself more productive and not have to touch the mouse as much. So anyway, some good things. I think it's great. I wish actually Jupyter had more hotkeys. There's really a lot more they could do there. But knowing the ones that are there, I think, is pretty excellent. Yeah, for me, I often try to use Vim shortcuts, and it's just not going to work. It's just not going to have it.

Starting point is 00:25:20 Erin, what are your thoughts here? The f-string article was really nice. Yeah, it's hard to find a good f-string article that tells you all these different things you can do. So I was just scanning through it, and we use f-strings quite a bit. And if we have old-format Python strings that are in the code that we're updating in a pull request,

Starting point is 00:25:43 we always ask the developer to please update those old ones to use f-strings as well. They're just so much more readable. As you're going through it, go ahead and fix them. Yeah. Yeah, instead of fixing them all, just go through and fix the ones that you're touching. Does pyupgrade do that? I can't remember. I can tell you that flynt does. Flynt, yeah, that's it. Flynt. So I've taken flynt and run it against large projects that I've done. In the early days, it introduced one bug out of 20,000 lines of code, but it rewrote like a thousand string formats of various versions, and I found it to be really helpful. And that's F-L-Y-N-T. Yeah, F-L-Y-N-T. For the podcast listeners. Exactly. Thank you.

Starting point is 00:26:28 Yeah, so this is really good, too. You know, if you ask people to do that, you could suggest, like, and you could try just running this on your code. Yeah. And just, you know, make sure it doesn't break anything. But it's been pretty stable since the few oddities it hit. Oh, that's cool. We'll check that out. Cool.

Starting point is 00:26:43 Indeed. All right, Brian, you all done with yours? Yeah. And I just did look it up. I think that pyupgrade also does it. Oh, no. Anthony Lister out there in the audience is just trying to egg us on. Single quotes or double quotes with those f-strings? See last episode. Yeah, exactly. That was the whole debate last episode. All right. My next item is BioGPT.

Starting point is 00:27:07 And so we've heard about ChatGPT, and this is similar stuff, but applied to biology. So creating a cat that barks. Exactly. And now make it mutate into a snake. How many generations will this take? Three. All right. So I want to just, as a way to, you know, it's not really easy for me to demo this.

Starting point is 00:27:28 So, like, let me, as a way of motivation, just show you, like, a ChatGPT thing, since you were just asking about it, Brian. Okay. Check this out. Here's a cool program that talks about how you should never write insanely nested code. So for people listening, it says, is this a platypus: if self.is_mammal, then if self.has_fur, then if self.has_beak, and so on and so on. It's nested over and over, so the code starts in the middle, maybe a bit to the right of the screen, and it says return true, right? Like, you shouldn't do that. What should you do? You should write guarding clauses.

Starting point is 00:28:01 So check this out, Brian. If I go over to ChatGPT and say, I'm going to give you a program in Python, I want you to name it arrow, it'll say, sure, arrow sounds like a great name. And I give it this, and it talks about what it does: it checks whether it's a platypus. And I say, rewrite arrow to be less nested using guarding clauses. Certainly, here you go, it says. And what is it, right? Exactly the new pattern that you should have used. Is that insane? What do you think, Brian?
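For readers who cannot see the screen, here is a rough reconstruction of the before and after. It is a sketch, not the code shown on the show or ChatGPT's actual output: the Animal class and the lays_eggs check are assumptions added to make the example self-contained, and only is_mammal, has_fur, and has_beak are mentioned in the episode.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Animal:
    is_mammal: bool
    has_fur: bool
    has_beak: bool
    lays_eggs: bool  # hypothetical fourth check, not named in the episode


def is_platypus_nested(animal: Animal) -> bool:
    """The 'arrow' shape: each check pushes the code further to the right."""
    if animal.is_mammal:
        if animal.has_fur:
            if animal.has_beak:
                if animal.lays_eggs:
                    return True
    return False


def is_platypus_guarded(animal: Animal) -> bool:
    """Guard clauses: bail out early so the happy path stays at one indent level."""
    if not animal.is_mammal:
        return False
    if not animal.has_fur:
        return False
    if not animal.has_beak:
        return False
    if not animal.lays_eggs:
        return False
    return True


# Both versions agree on every combination of the four flags.
all_animals = [Animal(*flags) for flags in product([True, False], repeat=4)]
same_everywhere = all(
    is_platypus_nested(a) == is_platypus_guarded(a) for a in all_animals
)
print(same_everywhere)
```

For a pure chain of boolean checks like this, a single return of the four conditions joined with and would be shorter still. Guard clauses earn their keep when each failed check needs its own handling, such as a distinct error message.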

Starting point is 00:28:32 Erin? I wouldn't write the code like this anyway, but okay. All right. Now, so arrow checks for a platypus. What? Plat? Plat? We'll fix it. Whatever. Oh, here. Hold on.

Starting point is 00:28:49 Platypus. Rewrite it to check for crocodiles. Look at this. So sure, no problem. We're going to write, is it a crocodile? And look, the tests are, is it a reptile? Has scales? Does it have jaws?

Starting point is 00:29:03 Does it have a four-chamber heart? Wow. Is that insane? And all I did is say, I'm going to give you this code, and just start asking questions. So, okay. So impressive, right? So back to BioGPT. Think of what this can do for doctors and nurses and people trying to understand written text of this. So this BioGPT contains an implementation specifically trained for, like, medical analysis. Kind of like ChatGPT is a general analysis tool, this one is specifically for medicine.

Starting point is 00:29:39 Okay. So pretty cool. Apparently it can do PubMedQA tests. I have no idea what that is, but if I was a doctor, I'm sure this is like, how good are you at answering questions? With 81% accuracy, which might sound like, well, that's 19% not good enough. But I bet you doctors don't do it at 100% accuracy either. You know, there's a lot of examples where AI is predicting cancer sooner or better or more accurately. And I bet it's better than like Dr. Google and looking up your symptoms and thinking you have the worst thing. Yeah, yeah, exactly. That's what I was curious about. If it was named like, what am I dying of today?

Starting point is 00:30:20 Will I die GPT? I don't know. It seems grumpy. I don't know. So it comes with different models. It has the BioGPT one, but it also has the large one. And my experience with this stuff is the large models are where it's at. The regular ones are quick, but they're not very accurate. You want to go for the large model. So there's a bunch of different ones, like one fine-tuned for the relation extraction task on KD-DTI,

Starting point is 00:30:47 which is a certain type of data set, or other ones. So you can pick which one it is. And then you just start writing Python code. So you can either use a PyTorch style of programming, or I think down here there's a Hugging Face variant as well. It seems a little bit cleaner, a little bit nicer. So your model is from_pretrained("microsoft/biogpt"). And there's even a thing where you can try it out down here.

Starting point is 00:31:11 There's like a live, yeah, some answering questions, for example. Here you can pull this up and you can ask it questions. For example, this one. Should chest wall irradiation be included after da-da-da-da-da? Yes. It's just yes. I don't know. People can play around with examples. Like I said, I'm not a doctor. I don't really know reasonable things to ask it. But it's a weird world that we live in. And it has lots of positives and lots of negatives, I'm sure, that we're going to come to learn about. But BioGPT, if you're working on analyzing medical texts, check this out. It's from Microsoft.

Starting point is 00:31:45 I think anything that would reduce the amount of time doctors and medical professionals have to spend on the computer is probably good. So if this means they need to enter less things in because it's just like figuring stuff out for them, then that would be really powerful. But if it's just another tool that they have to use on the Internet that makes them not get to be face to face with their patients, then I'm just kind of skeptical of it. Yeah. I feel like you could ask it questions like we gave this person. Oh, here's their symptoms. We gave them this diagnosis. Is that consistent with, you know, historical things? And it could do a lot of comparisons and analysis.

Starting point is 00:32:21 Or do you think this person has this disease instead of just yes or no? Why do you think that, you know, you could have this conversation with it and it may be able to tell you. Yeah. That's really cool. Indeed. All right. Well, I guess I was joking about it a little bit, but I think there's a lot of power there.

Starting point is 00:32:37 I mean, like you said, I don't know if we can get doctors actually seeing people more, but also, you know, maybe a 911 call could, like, if we determine it's not an emergency yet, maybe we could direct the person to the right place faster. I mean, there's lots of places where maybe somebody not with, like, the full degree, but somebody that's still pretty involved with medicine, can utilize this to ask the,

Starting point is 00:33:03 ask better questions and get somebody to somewhere faster. Right. Or even highlight, you know, what were the key takeaways from this visit with the doctor? Yeah. Right.

Starting point is 00:33:12 Yeah. So anyway, it's cool. Yep. All right. One more bit of feedback out there. Will McGugan. Hey,

Starting point is 00:33:18 Will, this is the kind of thing I'd like to see from AI, AI used for not putting artists and copywriters out of business. Yeah, I agree. Amplifying people's good work, not necessarily replacing it. Yeah, we'll see where it goes. All right, Erin, got the last one. Okay, great. So yeah, talking about code mentorship and communicating with new developers. That's my next topic. So Sheena O'Connell gave a talk at DjangoCon last year. I attended that conference, but I missed this talk and watched it online later. And it's about her work at Umuzi training unemployed young people in underserved communities in Africa.

Starting point is 00:33:59 So her company had to quickly build an online learning management system when the pandemic hit in 2020. And they built that LMS in Django, which is why she was giving a talk at DjangoCon. Before then, the learning was all done in person. So anyway, you might think like, that's cool and all, but how can I apply that to me? And I think that this talk is really excellent. I also think, I don't know if you all have ever listened to the Django Chat podcast. They had Sheena on and she talked about her work at Umuzi and she talked about getting learners to review each other and also teaching green

Starting point is 00:34:41 developers how to use GitHub and things like that so they don't, quote, bother their teammates too much once they get into their jobs after they're finished at Umuzi. And she specifically said, the quote I liked was, what sort of thing does a person need to know in order to not annoy their co-workers in the first three months? So I really liked thinking about the learning in that way. And yeah, so something we started doing recently where I work is, we had been doing code reviews. Me and the other codebase lead had been kind of just doing them all ourselves. And our project manager, Matt, suggested a new requirement where two non-codebase leads have to review any pull request before any codebase lead looks at it. So that's something we just implemented and

Starting point is 00:35:32 do either of you have familiarity with pull requests and code reviews in your day-to-day? Yes. Yeah. So I have to say it's really been helpful to us. And I liked Sheena talking about that on the Django Chat podcast. She also mentioned that at Umuzi, the learners review each other, so they both learn how to review code and also review someone else's answer. Because, you know, with Python, there are like a lot of different correct answers, right? So just like reactivating that part of their brain to look back at a previous answer is kind of cool. Yeah. I also think that it's cool that they're learning more than just loops, variables, functions, you know, but how to coexist as a teammate in a software team. Yeah.

Starting point is 00:36:29 Yeah, that's cool. Yeah. Good find there. Yeah. So we're always looking for new ways to like onboard developers. And another cool idea that Sheena had was writing half solutions and leaving gaps for others to fill in the blank. I thought that was kind of cool because when we onboard a new developer to our code base, it can be really rocky. And I kind of thought like, oh, that might be kind of neat

Starting point is 00:36:54 instead of giving them a whole ticket to work on, like half finishing the ticket and like letting them fill in the other blanks is kind of cool. And just one more article that I found about this was on the Caktus blog. I used to work at Caktus as a Django developer there. And so I still follow their blog quite often. And they had this recent blog post from Dimitri Chukin about their new internal mentorship program there where they have three different paths. And one is apprenticeship for folks just starting out as developers. One is for fellowship,

Starting point is 00:37:32 and that's for people who are currently training in one of those coding camps. And then the third one, which is really kind of special, is mentorship for high school students. So I thought that was kind of neat. We're still, where I work, we're still figuring out how to onboard people. I feel like that is one of the hardest things.

Starting point is 00:37:50 Do you both know what I'm talking about? Onboarding is extremely difficult and depends on how much, well, it depends on, like, what skill set you need people to have. I mean, when you have, like, a diverse set of skills, we always face that. So we've got, I need somebody that knows Python well, testing practices well, C++ well, and it'd be great

Starting point is 00:38:11 if they also knew, like, RF measurements and stuff like that. And you just can't find those people. So you have to pick what you want somebody to complement somebody else with and know that you're going to have to help train. Right. You support them in the other areas. Yeah. Yeah. Yeah. Cool. And one of the things you mentioned, like code reviews, we use code reviews a lot for

Starting point is 00:38:35 communication, not necessarily for people to catch what somebody else is doing wrong, but to make sure that everybody understands what the rest of the team is working on. So, especially for long-running things, we have a practice of using draft code reviews. So code reviews in draft, and GitLab won't let you merge it if it says draft in the title. So then people can just keep updating that, and they can get feedback even when the code's not ready yet. So good way to do that.

Starting point is 00:39:06 Yeah. Very cool. Nice. Well, nice find, Erin. All right. Nice. That's all of our items, Brian. You got some extras for us to share?

Starting point is 00:39:14 Anything else you want to throw out there real quick? No, I'm spending most of my extra time getting my talk ready for PyCascades. So PyCascades is coming up soon. Yeah, indeed. Coming up very soon. Excellent. Erin, how about you? Want to throw anything out there?

Starting point is 00:39:25 Yeah. DjangoCon US is in Durham, which is 15 minutes from where I live. So I'm excited. Nice. North Carolina is a fun place to visit. Yes. It's generally warm, although not always warm, but generally warmer than a lot of places. Generally warmer.

Starting point is 00:39:40 And it's in October, so it'll be kind of a nice time of year, probably. Hopefully not boiling hot, but yeah, probably not. Cool. I'll have to try to see if I can get an excuse to get out there. That'd be fun. All right. Excellent. Anything else?

Starting point is 00:39:53 Is that it? How about you? Yeah, I got one. You know I do. All right. So an article came out a few days ago. Security researchers uncover 700 malicious open source packages on NPM and PyPI. This used to be a thing that could even headline.

Starting point is 00:40:08 I think we even headlined it. It was the title of one of our shows, Brian. The news here is not this. The news is that this stuff is just not news anymore. So people, be careful out there. When you pip install stuff, make sure you spell it right. That's generally the worst thing, the typosquatting. So anyway, the fact that this is not realized.

Starting point is 00:40:28 I didn't realize that that's how they were. Oh, that's so smart. They might put a virus in request instead of requests with the plural. You know what I mean? Or if you transpose two letters. And there's some stuff that the PyPA is trying to do to work on that, but it's still tricky. Or standard lib stuff that you don't have to install. It's just there. People squat on that.
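One way to guard against the misspellings described here is to compare a name against packages you already trust before installing it. This is only a sketch: the allow-list `KNOWN_PACKAGES`, the function name, and the 0.8 similarity cutoff are all assumptions for illustration, not anything PyPI provides.

```python
import difflib

# Hypothetical allow-list; a real one would come from your own requirements.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "django", "pydantic"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None.

    An exact match is not suspicious, so it also returns None.
    """
    normalized = name.lower()
    if normalized in known:
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this list, `possible_typosquat("request")` points back at `requests`, which is the signal to stop and recheck the spelling before running `pip install`.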

Starting point is 00:40:49 Right, right. Yeah, and create a package for that. All right. That's not the end of it. Another one. Brian, do you remember I announced, hey, everybody, update your Git. There's a security vulnerability in Git.

Starting point is 00:41:01 This is the first time this has happened in a really long time. I said, make sure you apply Git, or you install 2.39.1 or higher. Well, guess what? 2.39.1 has a vulnerability that's completely different. But if you try to clone from a malicious repository, you're going to be having a bad day. So update your Git again.
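A small sketch for checking whether an installed Git is newer than 2.39.1. The minimum of 2.39.2 is an assumption taken as the next patch release after the one named above; check the Git release notes for the version that applies to your platform. The function names are illustrative.

```python
import re

# Assumed minimum: the first patch release after 2.39.1.
MINIMUM_GIT = (2, 39, 2)


def parse_git_version(output):
    """Extract (major, minor, patch) from `git --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def needs_update(output, minimum=MINIMUM_GIT):
    """True if the reported version is older than `minimum`."""
    return parse_git_version(output) < minimum
```

The input would typically come from `subprocess.run(["git", "--version"], capture_output=True, text=True).stdout`.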

Starting point is 00:41:23 All right. And then also, I'm working on a project now where I needed an ignore file, but the project was originally created in one language and I wanted the ignore file for another, and I was basically going to combine them. So maybe you all know this, but GitHub, when you go to create a new project, you can choose what kind of project it is. Is it C++? Is it Python? Is it Dart? Is it Flutter? And you'll get a different ignore for that. Well, there's actually a repo, github.com/github/gitignore,

Starting point is 00:41:54 and every single language that you could have chosen in that dropdown has its ignore file here. So, for example, the Python one, it's checked into this project. So when you say create a new Python project, what comes out as the ignore is actually this file. So if there's people out there who really need a change to the default behavior of the Python gitignore for projects, you know, you could go do a PR for this. But the way I use it is, I just said I also need one on Flutter. Or, there's not a Flutter one, but there's a Dart one, so I grabbed the one for Dart and piled that in there as well.
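Piling one template on top of another can be sketched as a small merge that skips patterns already present. This is an illustration only: `merge_gitignores` is a hypothetical helper, and it assumes you have already copied the template contents into strings.

```python
def merge_gitignores(*texts):
    """Concatenate .gitignore texts, dropping patterns already seen."""
    seen = set()
    merged = []
    for text in texts:
        for line in text.splitlines():
            is_blank = not line.strip()
            is_comment = line.startswith("#")
            if not is_blank and not is_comment:
                key = line.rstrip()
                if key in seen:
                    continue  # duplicate pattern from an earlier template
                seen.add(key)
            merged.append(line)  # comments and blank lines are kept as-is
    return "\n".join(merged) + "\n"
```

Order matters in a .gitignore when negation patterns (`!`) are involved, so a simple merge like this is safest for templates that only add patterns.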

Starting point is 00:42:28 Or even if you're not using GitHub, you can use this for... Yeah, exactly. It has nothing to do with GitHub. You have access to every version of an ignore file that GitHub thinks is good. Related to that is gitignore.io. This is another one. You can come down here and search

Starting point is 00:42:44 for other stuff. Like, for example, there was no Flutter in the GitHub one. But over here, I can put Flutter. And here's my Flutter one for all the crazy build code generation madness you get. So this is a project by TopTal. But gitignore.io. And you just put it in here.

Starting point is 00:42:59 I'm looking for whatever. And then it'll pull up. Oh, type pytest. See if it'll do... no results found. Oh, sad, sad face. But anyway, if you're looking for ignores for projects, there you go. Those are kind of nice. Cool. Nice. All right, are you all ready for a joke? Yeah. Yes. Brian, I thought about you on this one in particular, so we'll see what you think of it. So this is one. It's a cartoon, and it has a cartoon character looking at two red buttons.

Starting point is 00:43:29 They're both going to do something massive. One has the star, the asterisk character, and one has the ampersand. And there's the person there just sweating it out, like, their finger's in the middle, doesn't know which one to pick. And it says, my C code isn't working, and it involves pointers. What do you think, Brian? I would not hire this person. So the star will dereference the pointer, turning a pointer into one less level of pointing, down to a value, where the ampersand will take a variable and make it a pointer, or if it is a pointer, make it a pointer to a pointer, or even more so.
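For Python folks, the two buttons can be approximated with the standard-library `ctypes` module. This is a loose analogy rather than real C: `ctypes.pointer` plays the role of the ampersand and `.contents` plays the role of the star.

```python
import ctypes

value = ctypes.c_int(42)

ptr = ctypes.pointer(value)       # roughly `int *ptr = &value;` in C
ptr_to_ptr = ctypes.pointer(ptr)  # roughly `int **pp = &ptr;`

through_ptr = ptr.contents.value  # roughly `*ptr`, read before the write below
ptr.contents.value = 7            # roughly `*ptr = 7;`, writes through to `value`
through_both = ptr_to_ptr.contents.contents.value  # roughly `**pp`
```

Each `.contents` removes one level of pointing, and each `ctypes.pointer(...)` adds one, which matches the description above.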

Starting point is 00:44:06 Which one do you press? Oh my gosh. It should be obvious by context. It says a C++. Erin, do you have to do any of this kind of crazy stuff or are you thankfully above and beyond the pointer world? I am. Yeah. Thankfully not.

Starting point is 00:44:21 Yeah. No, no, no C++ in my world. Yeah. All right. Well, that's what I got. I brought that one for you, Brian. It's good. Thanks.

Starting point is 00:44:31 I'll incorporate that as my next interview. You need to change a string. You're given a variable. Which one of these do you push? Nice. All right. Cool. All right.

Starting point is 00:44:44 Well, Erin, it's been great to have you on the show. Thanks for being here. Thanks for having me. Nice to meet you both. Yeah, you bet. And Brian, thanks as always. See you.

Starting point is 00:44:52 Bye. See you all.


Topics covered in this episode: Use TOML for .env files? Pydantic gets serious funding f-strings with pandas and Jupyter keyboard shortcuts BioGPT Extras Joke. See the full show notes for this episode on the website at pythonbytes.fm/324
