Python Bytes - #233 RaaS: Readme as a Service
Episode Date: May 12, 2021. Topics covered in this episode: readme.so, wafer-scale Python, datefinder and dateutil, Cinder (Instagram's performance-oriented fork of CPython), PyCon US 2021, Extras, Joke. See the full show notes for this episode on the website at pythonbytes.fm/233
Transcript
Hello and welcome to Python Bytes, where we deliver news and headlines directly to your earbuds.
This is episode 233, recorded May 12th, 2021, and I'm Brian Okken.
I'm Michael Kennedy.
And I'm Marlene Mhangami.
Well, welcome, Marlene. For people who don't know you, can you introduce who you are?
So I am a Pythonista, of course, and I am based in Harare, Zimbabwe.
I am also really involved with the Python community.
So I'm currently the vice chair of the PSF board of directors.
I've been on the board, I think, for coming up on four years now, which is really exciting.
And it's been really a very cool experience for me.
I'm also a software engineer.
I work right now with the RAPIDS team at NVIDIA
and have just been doing software engineering with them.
I will talk a bit about that later.
But yeah, I'm trying to think what else.
I'm also a very avid reader
and just like doing other things besides software.
So yeah, that's pretty much me.
Cool, that's awesome. You're doing a bunch of cool stuff. I think RAPIDS seems like a really neat project to work on as well, and of course the Python community side is great, so super happy to have you here.
Brian, you know, having a good readme is really important to a project. Wouldn't you say?
Yeah, definitely.
And for some reason, I don't know,
readmes are not difficult to write,
but I freeze up.
It's blank page syndrome.
I think often I've gone through and just copied from some other project.
You want to know what's in there?
Read me.
But I don't think that's the best way to go about it, really,
because sometimes you forget stuff.
So we have a recommendation from Johnny Metz.
It's a tool called readme.so.
And this is like totally fun.
It's just this interactive thing where you get to add stuff.
So we've got the title.
On the left hand side, there's a bunch of sections where you can select
what you want to go into the readme.
And then it shows a preview on the right,
but you can also see the raw markdown.
And then in the middle, there's an editor.
So you can actually just edit the whole thing here.
But really, I don't know if I would edit it all right there.
There's the project title.
What I'd probably do is go through and pick out what sort of things I'd want.
So probably some acknowledgements, if I got some help from somebody, maybe an API reference if it's a library, how to contribute. Oh, badges.
Definitely want badges. And then maybe how to run tests
if you want people to contribute, and if there are other cool projects using it, a used-by section, all
these sorts of things. The editor only shows you one section
at a time, which is nice. But then you've got this whole generated, really nice looking readme
with tables and everything, like, built in, and you can either just copy it or download it and
just run with it. I think this is really great. I'll probably use this in the future.
I really love this. And I'm surprised about the psychological benefit of just showing the little
section with the one heading. So for example,
acknowledgements, you just have hash, hash, acknowledgements. And then the few things,
even though you're editing the whole readme, it seems so much more like, oh, I'm going to just
work on that section. It's really cool. Marlene, what do you think? It's really, really cool. I
like it. I think I'm going to try it out. I have put no effort at all into readmes.
So I think it's something I need to put more effort into.
And this looks like a really good way to do that.
Yeah.
I think that it would also be great to just,
like if you have an existing readme and you want to add some new sections,
you're not quite sure how it should look.
Using this as a jumping-off point just to grab sections of a readme
to add to an existing one, too. Yeah, this is really, really cool.
How do you, can you start with a new one?
Well, sorry, let me take it back. Can I start with an existing one? Can I somehow upload an existing one?
I don't see... so wait, I can go to raw. Hold on.
Oh, you could probably just drop it into raw. Maybe.
Yes, you can drop it into the raw view.
That's it.
Okay, perfect.
You go to raw, which doesn't hide the sections.
It's just pure markdown.
And then you just throw it in there.
Okay.
So you can't edit there?
No, no, but you can flip it back.
I don't think you can edit in raw.
So you can only edit in the editor part.
But yeah, it still looks really, really cool.
I've heard of platform as a service.
I've heard of infrastructure
as a service. I've heard of database
as a service, but I guess now we have readme as a
service. I don't know. You just go to the website.
Exactly.
That's pretty cool.
I'm pretty excited about this. Actually,
I might play around with this
for my next project. I've got some stuff
that may end up on PyPI soon,
and it'd be cool to do it.
All right, so I've got the next item
and it's a bit of a skateboarding dog type of thing.
It's not something I think a lot of us
will take advantage of,
but it's something that is pretty interesting
as we kind of look at how Python is finding its way
into the larger computing space.
Yeah, and oh, Sam Morley out there
in the live stream before we move on
says it'd be really cool
if you could point this at a GitHub repo
and edit the readme directly on your repo.
Yes, absolutely.
That's fantastic.
Yeah, that's a really good idea.
Really good idea.
All right, back to my skateboarding dog.
So there's a company called Cerebras,
and this was sent over to us by Galen Swint, who is a PhD researcher who does high performance computing and stuff. So
in that world, I think this may be a real thing. You look through the article here that
talks about this announcement, and it's like, well, there are these 12 or 15 customers
of this chip. But for those of you watching, or if you check out the article,
there's a woman holding a chip. And normally we think of computer chips as little tiny things.
This is a 12 inch by 12 inch computer chip, or you want to go metric 30 centimeters by 30
centimeters. It is a big, big computer chip. And the idea is we've had small little chips come
along to do special types of processing. We've had GPUs come along and be adapted, I guess,
for things like machine learning, training machine learning models, and so on. This thing
just takes that idea to an entire new level. So for example, I'm always going on and on and raving about my Mac mini, my M1,
where it's a cheap little computer
relative to Apple stuff, I guess.
But it's super fast.
But it has four performance cores
and four efficiency cores.
That's it.
Your GPU, if you've got a really high-end one,
might have 4,000 cores.
This insane little chip here
has 850,000 AI cores on one chip.
Is that insane? What do you think?
I'm curious how they did it. I mean, this is some major advance in wafer technology, because how do you get that big of a chip with no defects in it?
Yeah. And they apparently have 100% efficiency. Well, first of all, one of the ways you do it is you use the TSMC foundry, which seems
to be taking over all these small, high-efficiency types of things.
And so they had a previous one that they've more than doubled the core count for.
And another way to kind of appreciate how much is going on in this chip, go back to
my M1.
It has 0.016 trillion transistors. This has 2.6 trillion.
Or to put it another way, that's 2,600 billion transistors versus 16 billion.
It's more than 160 times more on this chip.
So super, super cool.
And now you may be wondering, all right, all this is interesting and chips are neat.
What is the Python angle?
Like, why would I bother putting this on here?
Because, you know, we don't really talk about chips that much, except for when I go on and
on about my M1.
Here's the deal.
If you scroll down in this article a little bit, you'll see users program this insane machine
transparently in machine learning frameworks, specifically TensorFlow and PyTorch.
Isn't that crazy?
That's really interesting.
Isn't it?
Yeah, I was just thinking about you as I'm going through this because you're working
on the RAPIDS project, which is not the same thing, obviously, but it's kind of in that
space, right?
Yeah, it is.
Have you heard of this before?
No, I haven't heard of this.
This is, yeah, this is really big and I have not heard of it.
I will get into reading a bit more about it after.
Yeah, yeah, for sure. So there's a lot of interesting things. And one of the,
I can't remember where exactly they spoke about it, but they basically say,
what you do is you program in TensorFlow and PyTorch as normal. And then they have this
custom compiler that extracts the
execution graph and rewrites it to scale out to the 850,000 cores, so the developers don't have to
think about how they program against something like this. I don't want to spend too much time
on this, because my next item is super amazing. I want to take the time
to dive into it. But there's another thing that's really interesting just as you look at it: this thing takes an insane amount of power.
For this one chip, you're going to need a four kilowatt power supply, with up to a peak power of
23 kilowatts. When you plug in an electric car at one of the high-speed home chargers, that's seven kilowatts, just to give you a sense.
This is like insane amounts for one chip, right?
You could think of it as a supercomputer.
Like it's one chip.
So anyway.
Our entire lab doesn't draw that much.
So the reason I said it's a skateboarding dog thing is I don't think most of us will be able to ever even interact with one of these, much less buy one.
They're going to be shipping in the later part of this year, and the price is something like three million US dollars plus.
So this is certainly super computer level.
But I do think it like opens the door for really interesting stuff going on in the high performance Python space.
So, yeah, glad that Galen sent it over.
Well, I'm totally going to put 25 bucks into Dogecoin
so that I can afford this later this year.
Oh, speaking of which.
Exactly.
Well, what about, I think maybe you get this
and you create an AI that can more intelligently
mine Dogecoin and then you take over the world.
Just an investment.
Yeah.
Just an investment.
All right.
So speaking of large-scale high-performance computing,
Marlene, take it away.
Sure. I have the next item, which is RAPIDS. And I wanted to speak about this because it's what I've been working on for, I think, yeah, wow, it's been about a year since I have been with NVIDIA,
working as a software engineer there and working specifically on the RAPIDS project.
And so RAPIDS, I think, is really interesting because the goal of RAPIDS,
similar to the last thing Michael just showed us, is to speed up data science.
But this is with GPUs.
So I think it's been really cool to work on the RAPIDS project, and I think it's really
interesting as well because it's open source.
Also, there's a lot of Python involved.
Well, it's not mostly Python.
Actually, there's a lot of C++ and CUDA code in there as well.
But personally, my aim is not to learn CUDA.
It's to try and avoid that as much as possible,
and also avoid as much C++ as possible.
That's a bit more reasonable. But one of the goals of the RAPIDS project is to
allow people who are Pythonistas to work primarily with GPUs and to get those speedups
without having to know any CUDA code or any C++. And so I have been working primarily
on the Python side of things and have really been enjoying it. I work specifically with the cuDF data frame library. And cuDF
is basically a GPU data frame library that mirrors pandas. So if you have a data set and you'd like to do computations on your data set
or do different operations on your data set,
if you can do that with Pandas, you should be able, hopefully,
to do the same thing with cuDF.
But the good thing is that it will probably be faster.
I actually can't definitively say that it will be
faster because um I remember when I first joined the project as well I was I really I'm very
enthusiastic and I really enjoy sort of sharing when I'm learning something new and I remember I
was like going around and speaking and saying that you you know, KUDIF is so much better than Pandas because it's just so much faster.
And then my manager was just like,
you need to stop saying that because it's not true all of the time.
It's true most like some of the time. So for smaller data sets,
it's probably better to stick with Pandas um because yeah there's always this overhead right
like as you scale things out and stuff there's probably like well how do we convert this over
and get it onto the gpu and if if that process takes right half the time of what just doing the
computation you might as well just do the computation right exactly like if you're already
if you're mainly doing right i agree like if you're working with smaller data sets and you are fine with that and
that works for you and your time is not like being wasted a lot then i would say please go ahead and
stick with pandas but if you have if you are working on like larger data sets and the larger
your data sets get the more the difference is going to be
in terms of your speed up. So with very large data sets, QDF is going to take a much shorter time to
do computations and things like that.
Yeah, you actually put a really interesting example in the
show notes here, right? Showing, how many zeros is that? A hundred million items or something like
that?
Yeah, it's a hundred million. I just kind of randomly chose a number
to try it out. I also didn't want to take a number that
was too big, because I didn't want to spend a long time doing it. And I know, for a lot
of data scientists, I think increasingly people are working with larger and larger data
sets, just depending which field you're in.
For the example, I put it in the show notes, and it's on the screen right now.
If you take a pandas data frame and try and calculate the mean, and you take the same cuDF data frame and try and calculate the mean, it will take, I think, I'm trying to look at the notes, it's 105 milliseconds for pandas and
1.83 milliseconds for cuDF, which is just awesome. And that's a smaller-scale
data set compared to some people's. Yeah, just a hundred million. It's just a hundred
million. So it's not a lot. I mean, it depends.
But yeah, I think it's definitely significant
once you get to a certain threshold,
which is pretty cool.
Yeah.
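For anyone who wants to see what that looks like, here's a rough sketch of the comparison from the show notes (assuming a CUDA-capable GPU with cuDF installed via RAPIDS; the column name is made up, and the 105 ms vs 1.83 ms numbers are the ones quoted on the show, so actual timings will vary by hardware):

```python
# Same .mean() call on the CPU (pandas) and on the GPU (cuDF).
import numpy as np
import pandas as pd
import cudf

values = np.random.rand(100_000_000)   # one hundred million floats

pdf = pd.DataFrame({"x": values})      # CPU data frame
gdf = cudf.DataFrame({"x": values})    # GPU data frame, same API

print(pdf["x"].mean())   # ~105 ms on the show's hardware
print(gdf["x"].mean())   # ~1.83 ms on the show's hardware
```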
Over on the RAPIDS site, rapids.ai,
it says it scales out on multiple GPUs.
So seamlessly scale from a GPU workstation
to multi-GPU servers and multi-node clusters,
working with Dask as well.
So Dask is also kind of about scaling pandas and combining those.
That's pretty awesome.
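And since Dask came up, here's a minimal sketch of that same mirrored-API idea with Dask (assuming `pip install "dask[dataframe]"`; the column name and partition count are just for illustration):

```python
# Dask wraps a pandas-style API around partitioned, lazily-evaluated data.
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)   # split into 8 pandas partitions

print(pdf["x"].mean())             # eager pandas computation
print(ddf["x"].mean().compute())   # lazy Dask graph, computed on demand
```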
I actually saw that you have a Dask course out.
I recently saw it.
Yes.
Definitely going to take that.
Yeah, check it out.
That one.
I'm going to dive into Dask a little bit later.
Awesome.
Yeah, yeah.
We put that together with Matthew Rocklin and team over at Coiled.
Yeah, and that's actually free.
So people can just drop in and take that course.
I think maybe I can put it in the show notes at the end.
I think it was just announced.
Let's see.
Very cool.
No, that was last week.
But yeah, this is super cool.
And this one is certainly within normal person's reach.
You get a GPU and you're good to go, right?
Yeah, I think, yeah.
I mean, I'm just using it on my laptop with the GPU.
You can also use it like online.
So there's also a Colab notebook on the RAPIDS site.
I think that you can click
and then you can like kind of experiment
if you just wanted to do it online
or I think you can use any sort of online
GPU that you have access to. So it's very, I think it's trying to make it more accessible,
which is great. Yeah, that's super cool. Yeah, very neat. Well, like I said, I think this is
a cool project to be working on. So thanks for sharing it with us. Brian, is it time for the next one?
This was recommended by a listener,
Ira Horeca, I think.
So he mentioned this,
and it's kind of a rabbit hole.
I spent a whole bunch of time
playing with all this stuff last night.
He recommended Date Finder.
So this is a Python utility
and it kind of is amazing.
So it's a combination of a couple of things.
He pointed us to a Calmcode video, and I'm totally a fan of Calmcode's stuff,
because they kind of go through some of the Python libraries and
a lot of other things, and just have a quick demo of what each does. And I really appreciate
that. Actually, the demo here is better than the readme,
the datefinder readme.
So maybe a pull request is necessary.
But anyway, what datefinder does is, I'm going to scroll down a little bit,
datefinder parses dates, or rather, it finds them.
So you give it a string,
or a list of strings or something,
and it can find where the dates are in there.
So if you've got a sentence or a paragraph
or an entire page that has a whole bunch of dates in it,
it'll find all of them
and then return you a list of dates that it found.
It actually does a whole bunch of things, but that's the default, or the one that we're
talking about: find_dates.
There's a bunch of other less documented features of datefinder, but this is the one that is
demonstrated here, and it's pretty cool.
So what it does is it finds those dates, and then it converts them to datetimes.
So find_dates will find them and convert them to datetimes,
and it does that by passing them off to the dateutil library.
So this is just kind of a really cool demo.
The little video is a good demo of showing how to do this.
I also really kind of liked this way to play with it.
The video just
had a list of strings, and then used a comprehension to call the
function on the whole bunch of strings, and I thought this was just kind of a clever way to play
with a function that transforms things, something like the sketch below.
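Here's a minimal sketch of that pattern (assuming `pip install datefinder`; the sample strings are invented):

```python
# datefinder.find_dates() scans free-form text and yields datetime objects.
import datefinder

lines = [
    "The project kicked off on March 12, 2010.",
    "Next review is scheduled for 2021-05-12 at 3pm.",
]

# the comprehension-over-strings style of playing with a function
found = [list(datefinder.find_dates(line)) for line in lines]
print(found)  # e.g. [[datetime(2010, 3, 12, 0, 0)], [datetime(2021, 5, 12, 15, 0)]]
```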
I would have thought this would be so hard.
Yeah, it's super hard normally, because strptime is picky, right? You've got to go look up the datetime format-code language. So if I put the percent codes for March 12, 2010, but the string forgets the comma,
it won't parse. And all those things are really annoying about reading and
converting strings to dates. And this looks like it just doesn't care. It's nice.
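That strictness is easy to show with just the standard library:

```python
# strptime demands an exact format string; one missing comma breaks it.
from datetime import datetime

print(datetime.strptime("March 12, 2010", "%B %d, %Y"))  # works: 2010-03-12 00:00:00
# datetime.strptime("March 12 2010", "%B %d, %Y")        # raises ValueError
```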
Yeah. And then it's also kind of a nice, clean interface as well.
The limited documentation shows it's just a focused tool, which is nice. And it's interesting that
this is a focused tool that apparently a lot of people need, because according to GitHub, there are 662 projects
using it. So it's used kind of all over the place. Behind the scenes, though, it's taking
the dates that it found, the strings, and passing those to dateutil. So if you want to skip the finding part,
this is also a good library to look at for examples of how to use dateutil
to easily convert dates.
And dateutil is kind of an amazing tool as well.
I told you this was a rabbit hole.
One of the cool things about it is
it doesn't just parse dates,
but you can do relative dates. You can say, like, today plus three weeks or something, and it'll figure that out.
Or you can take two dates and do date math with them really well.
And also, dateutil has amazing time zone support, probably the best in Python.
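Here's a short sketch of those dateutil features, flexible parsing, relative dates, date math, and time zones; the specific dates are just examples:

```python
from datetime import datetime
from dateutil import parser, tz
from dateutil.relativedelta import relativedelta

when = parser.parse("March 12, 2010")                      # forgiving string parsing
in_three_weeks = datetime.now() + relativedelta(weeks=3)   # "today plus three weeks"
gap = relativedelta(in_three_weeks, when)                  # date math between two dates
eastern_now = datetime.now(tz.gettz("US/Eastern"))         # time zone support

print(when, in_three_weeks, gap, eastern_now, sep="\n")
```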
This is pretty cool. I think I was looking through the test code.
The test code for dateutil is kind of a neat mix of unittest and pytest.
Both are good examples of how to do each.
And some of the newer stuff is using pytest with parametrization. But it's good.
Yeah, I like this a lot.
Marlene, what do you think?
Yeah, I like it. I'm not actually working with dates that often, so I'm trying to think of use cases
for myself, other than maybe converting time zones, which is like a nightmare. So maybe.
Oh, you can say that again. Oh my gosh.
Maybe you said that, but it looks like it would be really useful for people that are,
yeah, working with dates a lot.
Yeah, I'm showing some of the examples
from dateutil of how to use it, and basically, I imagine this is one of the reasons why datefinder
is so used, because this is non-trivial even with dateutil. Yeah, that's cool.
Cool, cool.
All right, well, I got the next one.
And this one doesn't exactly come to us from Anthony Shaw,
but I was talking to Anthony about something else.
And he's like, oh, have you heard of this?
Have you heard of Cinder?
And Cinder is pretty awesome.
So Anthony's doing interesting work around Python
and performance at the CPython level,
especially now.
I think he's giving a talk on Pyjion or Piston.
Piston, I believe it is.
I'm not 100% sure.
I might be remembering which one wrong. At PyCon, which, you know, we're going to
talk more about that in just a second as well.
But Cinder is a really interesting fork of CPython from Instagram.
So it's under the Facebook Incubator project,
and I think we've mentioned it before. I definitely have talked about it before, in
other presentations. Instagram has done really interesting things, like disable the
garbage collector, just turn it off 100%, and they got less memory usage, not more memory usage, by
just allowing the cycles to leak, which is insane.
But speaking of insane, this takes it to a whole other level. So they've been doing all these low-level
things inside of CPython. It is based on 3.8. Hopefully some of these ideas can be
brought forward and shared with everyone because there's a lot going on. So let me just cruise down
here. I'll just read the little intro part because it's jam-packed and then I'll go into some of the details.
So it says this is the internal performance-oriented production version of CPython 3.8.
And it contains a number of performance optimizations.
I feel like performance is some sort of theme of this episode. There's eager evaluation of coroutines, a JIT, just-in-time compiler, and an experimental bytecode
compiler that uses type annotations in some incredibly interesting ways to emit type-specialized
bytecode that performs better. So just to give you an example, one of the reasons that math in the
pure Python layer is slower than, say, C++ or C# is that C++ and C# work with just the value. So if
you have the value seven, you might have two or four bytes that represent the value seven. In
Python, you have a PyObject pointer, which is like 28 bytes, pointing out to a thing on the heap
that represents the number seven. And it's a whole lot more work to interact with that, and set the
reference count on that,
and so on, instead of just working with the value seven, right? So one of the things they do is they
use Python type annotations to understand, oh, this is an integer,
this is a long, and so on, and actually convert those to the machine-oriented
numbers, right? So it's just the value itself, instead of a pointer.
And then it will use what's called boxing:
if something outside of this world needs it,
it'll up-level that to a PyObject pointer type thing
and hand it off.
So there's all sorts of stuff like that going on.
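Just to illustrate that idea in plain Python (this is not Cinder's actual API, only the kind of annotated code a type-specializing compiler could exploit):

```python
# The annotations tell a static bytecode compiler that x, y, and total
# can live as raw machine integers instead of boxed PyObject pointers;
# the result only needs to be boxed when it crosses back into dynamic code.
def dot(xs: list[int], ys: list[int]) -> int:
    total: int = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(dot([1, 2, 3], [4, 5, 6]))  # 32
```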
Interestingly, the first question is,
is this supported?
No, not supported.
But there's some interesting things going on here.
And all of this has to be taken with an understanding that it's in a very specific context and that
it may or may not be useful for you.
Brian had pointed out some articles and ideas around that you're not Instagram, you're not
Facebook, you're not Netflix and so on.
Most of the time,
people are building much smaller software with different constraints. So they start out by saying,
look, Instagram uses a multi-process web server architecture where the parent process starts,
performs initialization, and then forks 10 worker processes to handle requests.
This is super common. For example, Talk Python Training literally does exactly this. It uses uWSGI: it starts up and it creates 10 worker processes
to handle like people wanting to take courses. So it's not uncommon in the web, but it's not
how all Python code runs. And so the first optimization they did is they created what
are called immortal instances. The reason they were so focused on the garbage collector and all those sorts of things was when you fork these processes, initially, there's a bunch of memory that can be
shared. And that helps with cache locality that helps with overall memory usage, all sorts of
things. But as soon as something has changed about one of those items, it has to copy a whole page
of memory. And they realized that when an object's
reference count is modified in one of the processes, it has to copy, replicate, and sort of
fork off a bunch of the memory that used to be shared across all those processes. So they created
what they call immortal instances, which don't participate in reference counting or
garbage collection. And that prevents their reference counts from changing,
so they can be shared.
So they can mark like a whole bunch of the startup stuff
as like, just don't even look at this or change it
and don't do reference counting on it.
So in their world, it got things faster,
but it doesn't always.
They said it's actually a little bit slower
in straight-line code,
but in this sort of forked world, it's better.
The next one is shadow bytecode,
which is an inline caching implementation. It applies in certain optimization
cases for generic Python opcodes: it'll observe functions that take a lot of time
and dynamically replace their opcodes with specialized opcodes that it thinks are going to be better.
Another thing it does that's pretty interesting is it will eagerly evaluate coroutines.
So if I say this is an async method, and then in that method, I call await some function call, normal Python is going to create a coroutine.
It's going to schedule it on the async IO event loop, and it's going to get to it.
And that's a lot of overhead.
But maybe that function says inside, the first thing is: in this case, just return the cached
answer; otherwise, go to the
database, await the response, and so on. And what they realized is, if it's going to go through that
first case, it's not actually awaiting anything. So they'll actually execute the awaited thing
up until it actually needs to become async. So it'll effectively
look inside the function and say, is the path we're going on this time going to be async or not? And if the answer is no, it will run it without async, which means it skips all that context switching and all that stuff, which is pretty crazy.
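Here's a hedged sketch of the cached-path pattern being described, written with plain asyncio rather than anything Cinder-specific; the function and cache names are invented:

```python
# If the cached branch is taken, the coroutine body never truly awaits,
# which is exactly what eager evaluation can exploit to skip the event loop.
import asyncio

_cache: dict[int, str] = {}

async def get_user(user_id: int) -> str:
    if user_id in _cache:            # fast path: no real await happens
        return _cache[user_id]
    await asyncio.sleep(0.1)         # stand-in for a database call
    _cache[user_id] = f"user-{user_id}"
    return _cache[user_id]

print(asyncio.run(get_user(1)))      # misses the cache, actually awaits
print(asyncio.run(get_user(1)))      # hits the cache immediately
```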
It also has the Cinder JIT,
which is a method-at-a-time JIT compiler, think C#, Java, maybe even
JavaScript's V8. It's not enabled for every function that is called. Actually, sorry, if it
were, it would make things slow. So you can basically say which functions should be optimized. But they say
it supports almost everything that Python can do. And it has a 1.5 to 4 times speedup on the
Python benchmarks, which is pretty interesting. They also have this thing called strict modules,
which is actually a static analyzer capable of validating top-level code to see if a module
has side effects and can treat it differently if it doesn't. You can have an immutable strict module type that is sort of
a replacement for Python's regular module that behaves and loads differently and so on. And then
the thing I talked about with the numbers, more broadly, is under this category of Static Python. It's an
experimental bytecode compiler that makes use of type annotations to emit better bytecode. And check this out. It can deliver performance similar to
mypyc or Cython. And this thing will go up to seven times faster than regular Python for the
Richards benchmark. And I don't know if the 4x improvement from before is in addition to this,
whether you get 28x or you just get 7x. I don't really know. But there's a lot of things going on here and a lot of different ideas about how this works.
So I'm just scratching the surface on the details,
but I feel like I've gone on and on about it.
That's really interesting.
I saw, I think, is there a talk about it at PyCon?
I think that-
It is coming up, yes.
They're going to give a talk on this at PyCon.
Yeah, it was one of the talks
I was looking forward to listening to.
Yeah, just because I think it's super interesting
to be able to kind of play around with
that they were able to kind of make their own version of Python.
And, I don't know, like you mentioned, Anthony,
and I also know Victor, I think, and someone else,
who are also working on sub-interpreters and different things to make Python faster.
So I'm really curious to see if the core devs or people will also be listening to this talk
and maybe take some ideas from it.
It would be really cool to kind of see.
And I mean, it's always good to get speed ups, even if they, I don't know.
I don't know if it will help like general, like normal Python users,
but I think it's always good to look into.
Yeah, yeah, I agree.
I think some things here are absolutely transferable to regular, general-purpose CPython.
And some of them might not be. For example, the immortal instances,
that might be a thing that just they do,
and it makes sense for their large-scale farm of servers.
But the JIT that takes the type information and does math many, many times faster, everybody would want that.
Like we all work with numbers at some level or another.
Brian?
Well, one of the things I love about the...
I mean, this kind of applies
to all of these sort of speed up projects.
One of the things I love about Python
is just the generalness of it.
You can throw anything at it.
Data structures can hold anything.
But there are times where you really are using a huge array of floats,
or a huge array of integers, or a huge array of some fixed data size. Those are
times where I don't need it to be generic, I just need it to be fast. So
that's the part where I think it'd be interesting to pull into
regular Python. But don't we get that with some of the data science stuff anyway,
like NumPy and stuff?
Yeah, you do, but you can't do generic programming with
it, right? You do sort of matrix-math types of things. And with this one, the answer used to be,
okay, well, this function is slow, this serialization/deserialization section might be slow, so rewrite that in Cython, for example. And
what's really cool about this is you can write regular Python and just put type annotations on it,
and then it goes as fast as Cython. And you don't even have to run a separate compiler,
I believe, in this world, right? Because the JIT just knows that,
and then will, like as you run it,
it'll just compile and run it.
So I think it just sort of makes
some of those ideas closer
and more automatic for most people.
I kind of think I foresee a future
where we have sort of some types that affect runtime.
There's like this tension that I sense
in the Python core people of whether or not
types should be just an afterthought
or whether they should be really part of the runtime.
And I think there are some cases
where having them be part of the runtime
might be a good thing.
Yeah, and this is interesting
because what they do is they
define these static
modules, and then in there
they can treat them differently. I feel like I always
see on Twitter some people
kind of like ranting about how
they don't like that
direction that Python is going in, like the
idea of putting in like annotations
and things like that. I've seen some people
that are not super big fans of that.
I'm not really sure why.
I generally would like to understand,
like I think most people, or not most people,
but I think some people would prefer Python
to maybe remain as it is.
But I do think that there's like,
just having it be a bit faster in a couple of cases
would be helpful.
So I don't know so i don't know
i don't know if it's in that direction i'm with you and one of the things they point out in this
readme uh announcing the project is that you can still do gradual typing so you can in some places
have no types in some places have some types and the thing can convert and just deal with that
automatically and i think that's the the reason that the types are really welcome in Python
is because you can use them if you want,
but you don't have to.
As opposed to places like TypeScript,
which said, well, JavaScript doesn't have types,
so we're going to add this very strict type system.
And if you don't fit it exactly,
we're going to not compile and complain
and it's going to be really not good.
This feels like it continues that forgiving nature of Python
to let you opt into it.
But if you do, it can go faster.
That's the direction I'd like to see.
I personally would like to see types be really a full-fledged feature of Python.
I love that they're optional, but if they're there, let's see how much we can do and improve things with them, right?
A hundred percent.
Yeah. All right. Marlene, you got the last things with them, right? A hundred percent. Yeah.
All right.
Marlene, you got the last one.
I got it on screen for you.
Okay.
Yes.
The last one for today is PyCon US, which I'm very excited about.
It started today, which is really great.
Are both of you attending?
I don't know if you're attending.
Yes, absolutely.
Okay.
Brian, are you attending?
Yes.
Yay.
Yeah, I think it's such a great event in terms of the fact that I know it's PyCon US, but
it is, at the moment, it's the largest Python gathering or largest PyCon on Earth, I think,
which is very cool because it means that you can meet
people from all around the world. I'm really sad that it's not in person,
because, not last year, but the year before that, that's where I
actually met you, Michael, for the first time. I think we were literally at
a table with you and Anthony Shaw and Łukasz Langa.
And it was like, and I was just randomly there.
But it was such a cool discussion.
And I really love the idea of being able to be in a room with people that are like contributing to Python.
So very.
That's my favorite part of PyCon.
It was so nice to meet you as well.
That is actually my favorite part of PyCon: you just happen to end up at a table,
or out for a beer or coffee, with this group of people, and you're like, wow, I got these connections
and this experience that I otherwise wouldn't have. So I'm very much looking forward to coming back in person.
But there's a bunch of great talks coming up.
Exactly. So this year, although it's online, the online platform is very cool,
and there's still lots of great talks to watch. In the show notes, I put down a list of the talks
that I'm excited to watch, but I also want to just put in a word for the things that I will be doing
at PyCon US this year. And the first thing I'm going to be doing
is I'm going to be hosting
the diversity and inclusion work group discussion
along with four other really amazing women
that are part of the diversity and inclusion work group.
I do want to comment here,
because we got some comments about it,
some feedback.
I posted a picture of our group that's going to be having this discussion,
or hosting this panel, and it's all women.
And someone was just like, why is it all women?
How is this diversity?
So I do want to throw it out there.
I just want to throw it out there that we did try.
The work group itself has a good balance of men and women in it.
But then when I asked people if they want to come on the panel,
it was only women that volunteered.
So it's not my fault, and I am aware of that.
That's just general feedback there.
But I think the panel will be really exciting. It's going to be on Saturday on the main stage at 12 p.m. EST, I think.
If you're going to be there, I really would encourage you to attend.
There's going to be question and answer.
And I just think it's such an important thing.
I know that sometimes diversity can seem like a really tiring thing to talk about,
especially recently. I feel like
sometimes people use it as this buzzword, and people can just
tune out when they hear the word diversity. But I really do think it's important, and particularly
now, as Python is growing in popularity. I think a few years ago, it was okay for the nucleus of Python to be based in
the United States or based in Europe, but it's growing so quickly. Python, for I don't know how
many years now, has been the most popular language in the world. And I know even for me, I'm in
Zimbabwe right now, and it's one of the most popular languages here where I live.
And so just providing the group, our main purpose is to figure out how we can support
the PSF to try and serve Pythonistas from around the world better and to connect the
community better and have better representation and different things like that.
So very excited about that one.
That's awesome.
And thanks for your work here.
I definitely agree that we're stronger together, right?
And one thing I would really like to see,
and I think we're getting there, is when people look at Python and programming in general,
but generally the Python space, we have influence over that.
When people look at that world,
I would like them to say,
I can see myself being part of
that. I can see that I could belong there. Right. And if that's not the case, then how do we make
that the case? Exactly. Absolutely. I think exactly that. And I would love to see that happening in
the next few years. I would love to see, you know, one of my things is I'd love to see more like
women core developers and more global
core developers as well, and also people on the board, and different things. And those are all goals
that we're working towards. And obviously, we don't know the perfect way to achieve something or
the perfect way to do things, but it's something that I think is really great and exciting
to work on. So please attend, if you're listening to this, and let me
know if you came from this podcast. It would be fantastic to see you there. Maybe just
comment. Fantastic. And then, oh, another thing that I am doing for PyCon this year as well is I
will be, so there's a lounge area, well, there's a PSF booth, and if
you're going to be there in the morning on Saturday or Friday, I will be
hanging out at the PSF booth. So yeah, if you just want to talk about Python or the PSF or
anything, I will be there. And I will also be hosting the EMEA members meeting. So if you're in Europe, the Middle East, or Africa, there's a members meeting on Saturday.
I think it's at 10 a.m. Central African time.
I'm not sure what time that is in other places.
But I know it's at 10 a.m.
It's on the schedule, right?
It's on the schedule.
Yeah.
We can use the date time thing.
I don't know.
Exactly. Pull up the REPL.. I don't know. Exactly.
Pull up the REPL, throw it into date time.
Exactly.
Please do that.
So I will be hosting that.
And that's going to be in the morning.
And if you would like, even if you're not a member, you can watch it on the PSF YouTube channel.
It's going to be streaming there.
Or you could join.
There's a meetup link that I put in the show notes.
So people could join that way as well.
So yeah, PyCon is going to be really exciting.
And I'm really looking forward to it.
So just encouraging people to come along for sure.
Yeah, it should be fun.
And even though it is super sad that it's not in person, it's not in Pittsburgh this year,
I think in some ways it's more accessible to people around the world, right?
They don't have to travel there.
They can just log in and attend it.
And that's so much less expensive than I flew to the U.S. and I paid $1,000 for a hotel.
So there's a little silver lining, you know, out there in the live stream.
Sam Morley really says, I really wish I could go to PyCon in person.
Adam Parkin there says, me too, maybe in 2022.
I think so.
Finally, Sam also thinks it's great
that we're having this diversity conversation
and paying attention to it.
One of the things I've noticed in 2020,
actually last year also, though,
in both 2020 and 2021,
is we've got all these regional PyCons going on all over the world.
I used to think of like PyCon US as the PyCon
and everything else is regional.
Now I think of PyCon US as a regional conference also.
It's the regional one that's close to the people
that are in the US.
It isn't necessarily better.
I love it. It's great.
For anybody that's hosting one: yes, yours is better. But no, I like all of them, and
I was excited to get to participate and watch videos from all over the world this year. That
was pretty neat. But yeah, I'm on board; I want to get back to regional stuff.
I'd like to see people in person.
I can't wait.
Yeah, I will say for sure,
like even if people are feeling adventurous,
there is a regional conference.
I didn't mention it before
that I am also part of the organizing team for,
which is PyCon Africa.
So if you would like to travel to another PyCon in
a different part of the world, when we're
able to travel and the world gets
back to some form
of free travel,
definitely
recommend also hopping over
to PyCon Africa. I think
like you said, I think PyCon
US is fantastic and
one of the unique things about that is that it's a conference that has been there for so long.
So a lot of people are going to be there.
But there 100% are a lot of great conferences like PyCon Africa, which you should attend if you can.
I think they're really just as exciting.
And there are so many cool things that you get to
experience. Like, for me, whenever I go to the U.S.; before that trip, I'd never been to
Ohio, and I would never have a reason to
think to myself, let me go to America to go to Ohio.
But it was such a good experience for me,
and I really liked it.
And I was really surprised by that.
And so I think it's the same way:
PyCons are a great way as well
to experience new places.
So yeah, definitely keep that in mind.
Well, that wraps up our six.
Anybody got any extra information to share?
Nothing else for me,
other than the fact that if you do want to reach out to me,
you can reach out to me on Twitter.
I'm @marlene_zw there.
I'm also marlene_zw
on GitHub,
I think.
And there's my website:
marlenemhangami.com.
So if you would like to reach out to me there, feel free to.
I'm always happy to chat about Python.
Yeah, nice. Cool. So I've got a couple. One made me really excited: this tweet from GitHub. Is your fork behind? You can now sync with your parent
repo with just a single click. So check this out. If you go to your fork now, next to Contribute, for your PRs and stuff,
there is now a fetch upstream button.
And all you have to do is click it
and then automatically your fork will become in sync
with whatever you forked it from.
It used to be you had to go and check it out,
add an upstream origin and then pull from that,
and then merge that wherever you wanted it to go.
Over here, you just click this button, and boom, it's good to go. So I think this will just lower the bar for people
forking something: they want to get the current version and then make a change to see if they could
contribute back. Here's one fewer step in that process.
Do you have any idea if it stays in sync, or if you have to...?
No, it's a one-time type of thing, I believe. It says there are this many changes we'll pull over, and it basically just automatically does the process at that time.
Nice. But still, pretty nice. I mean, up here: Flask 2.0 is out, and that one was sent in to us from Adam
Parkin. Hey, heads up, this is now actually live. So very cool. Actually, everything from Pallets has been updated. So yeah,
I happen to have done a podcast recording with David Lord, who runs Pallets, and
Phil Jones, who does Quart and contributes back to Pallets as well, about all the stuff coming in
Flask 2.0, all the exciting stuff, and their future plans as well. So yeah, you can watch the live
stream of that, or wait a day or two until the episode is out
and just listen at Talk Python as well.
But yeah, very, very cool.
Yeah.
And then Adam, also in the live stream again,
says this is super sweet: always found it a headache to sync with upstream.
Always find it a headache to sync with upstream.
Yeah, about the GitHub thing.
That's cool.
Cool.
Close it out with a joke.
Well, I got a couple of things I wanted to mention.
Go for it.
Sorry.
I had Brett Cannon on last week on Test & Code, and huge feedback from everybody that it was a great episode.
We talked about packaging.
I'll have Ryan Howard on this week talking about Playwright.
So that'll be fun.
And I wanted to mention a thank you to the 71 patrons that we have on Patreon.
So thank you for supporting the show.
Thanks.
Yeah, thank you, everyone.
How about a joke?
Joke, yes.
Sorry for almost skipping over your extras.
Here, come on.
No worries.
You ready?
Yeah.
So this one, I talked about that crazy giant chip thing.
And we've got Marlene doing RAPIDS.
So I thought maybe some kind of machine learning joke.
Here's a bunch of robots in school, little Android-looking things,
small ones, because they're students, they're kids. And they're in machine learning class,
and there's a big box of dirty data, like a bunch of bits that are kind of gray and
have dirt on them. And the teacher says, Robbie, stop misbehaving, or I will send you back to data cleaning.
Yeah.
That's where they're spending half the day anyway.
Yeah.
They actually spend most of their time there.
That's right.
I don't know who drew them.
Like, one of the robots is looking the wrong way.
I was like, why is it drawn like that?
I don't understand.
Hey, a more concrete, really quick close-out question I see in the live stream
here is, is there a difference between cuDF
and pandas in terms of utilization,
in terms of how you actually use them?
Well, I don't think so. For the most part,
when you're using cuDF, the way it's built is to mirror pandas. So the APIs are really similar.
So ideally, the methods that you would use when you're using pandas are exactly the same
methods that you would use when you're using cuDF. The only difference is, when you're creating a pandas data frame, for example, you would
use pd.DataFrame,
but then with cuDF, you would say cudf.DataFrame.
If you make it into a variable or something like that, then the methods that you're going
to call are going to be totally identical.
It's really easy to transition.
Yeah, that's awesome.
Yeah, and Dask has similar stuff as well, right?
You create a Dask data frame
instead of a Pandas data frame,
but the API looks quite similar.
They're not always 100% compatible,
but most of the mainstream things, right?
Definitely.
So yeah, it's built definitely
to make it as easy as possible
to switch between the two.
So it's very similar, yeah.
Thanks a lot, everybody, for showing up.
Yeah.
Thanks.
Thank you, Marlene.
It's really great to have you here.
No problem.
Thanks for having me.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.