Python Bytes - #233 RaaS: Readme as a Service
Episode Date: May 12, 2021. Topics covered in this episode: readme.so, wafer-scale Python, datefinder and dateutil, Cinder (Instagram's performance-oriented fork of CPython), PyCon US 2021, Extras, Joke. See the full show notes for this episode on the website at pythonbytes.fm/233
Transcript
Hello and welcome to Python Bytes, where we deliver news and headlines directly to your earbuds.
This is episode 233, recorded May 12th, 2021, and I'm Brian Okken.
I'm Michael Kennedy.
And I'm Marlene Mhangami.
Well, welcome, Marlene. For people who don't know you, can you introduce who you are?
So I am a Pythonista, of course, and I am based in Harare, Zimbabwe.
I am also really involved with the Python community.
So I'm currently the vice chair of the PSF board of directors.
I've been on the board, I think, for coming up on four years now, which is really exciting.
And it's been really a very cool experience for me.
I'm also a software engineer.
I work right now with the RAPIDS team at NVIDIA
and have just been doing software engineering with them.
I will talk a bit about that later.
But yeah, I'm trying to think what else.
I'm also a very avid reader
and just like doing other things besides software.
So yeah, that's pretty much me.
Cool, that's awesome. You're doing a bunch of cool stuff. I think RAPIDS seems like a really neat project to work on as well, and of course the Python community side is great, so super happy to have you here.
Brian, you know, having a good readme is really important to a project. Wouldn't you say?
Yeah, definitely.
And for some reason, I don't know,
readmes are not difficult to write,
but I freeze up.
It's blank page syndrome.
I think often I've gone through and just copied from some other project.
You want to know what's in there?
Read me.
But I don't think that's the best way to go about it, really,
because sometimes you forget stuff.
So we have a recommendation from Johnny Metz.
It's a tool called readme.so.
And this is like totally fun.
It's just this interactive thing where you get to add stuff.
So we've got the title.
On the left hand side, there's a bunch of sections where you can select
what you want to go into the readme.
And then it shows a preview on the right,
but you can also see the raw markdown.
And then in the middle, there's an editor.
So you can actually just edit the whole thing here.
But really, I don't know if I would edit it all right there.
There's the project title.
What I'd probably do is go through and pick out what sort of things I'd want.
So probably some acknowledgements, if I got some help from somebody, maybe an API reference if it's a library, how to contribute. Oh, badges.
Definitely want badges. And then maybe how to run tests
if you want people to contribute, and if there are other cool projects using it, a used-by section, all
these sorts of things. The editor only shows you one section
at a time, which is nice. But then you've got this whole generated, really nice looking readme
with tables and everything, like, built in, and you can either just copy it or download it and
just run with it. I think this is really great. I'll probably use this in the future.
I really love this. And I'm surprised about the psychological benefit of just showing the little
section with the one heading. So for example,
acknowledgements, you just have hash, hash, acknowledgements. And then the few things,
even though you're editing the whole readme, it seems so much more like, oh, I'm going to just
work on that section. It's really cool. Marlene, what do you think? It's really, really cool. I
like it. I think I'm going to try it out. I have put no effort at all into readmes.
So I think it's something I need to put more effort into.
And this looks like a really good way to do that.
Yeah.
I think that it would also be great to just,
like if you have an existing readme and you want to add some new sections,
you're not quite sure how it should look.
Using this as a jumping-off point just to grab sections of a readme
to add to an existing one, too. Yeah, this is really, really cool.
How do you, can you start with a new one?
Well, sorry, let me take it back. Can I start with an existing one? Can I somehow upload an existing one?
I don't see... so wait, I can go to raw. Hold on.
Oh, you could probably just drop it into raw. Maybe.
Yes, you can drop it into the raw view.
That's it.
Okay, perfect.
You go to raw, which doesn't hide the sections.
It's just pure markdown.
And then you just throw it in there.
Okay.
So you can't edit there?
No, no, but you can flip it back.
I don't think you can edit in raw.
So you can only edit in the editor part.
But yeah, it still looks really, really cool.
I've heard of platform as a service.
I've heard of infrastructure
as a service. I've heard of database
as a service, but I guess now we have readme as a
service. I don't know. You just go to the website.
Exactly.
That's pretty cool.
I'm pretty excited about this. Actually,
I might play around with this
for my next project. I've got some stuff
that may end up on PyPI soon,
and it'd be cool to do it.
All right, so I've got the next item
and it's a bit of a skateboarding dog type of thing.
It's not something I think a lot of us
will take advantage of,
but it's something that is pretty interesting
as we kind of look at how Python is finding its way
into the larger computing space.
Yeah, and oh, Sam Morley out there
in the live stream before we move on
says it'd be really cool
if you could point this at a GitHub repo
and edit the readme directly on your repo.
Yes, absolutely.
That's fantastic.
Yeah, that's a really good idea.
Really good idea.
All right, back to my skateboarding dog.
So there's a company called Cerebras,
and this was sent over to us by Galen Swint, who is a PhD researcher who does high performance computing and stuff. So
in that world, I think this may be a real thing. You look through the article here that
talks about this announcement, and it's like, well, there are these 12 or 15 customers
of this chip. But for those of you watching, or if you check out the article,
there's a woman holding a chip. And normally we think of computer chips as little tiny things.
This is a 12 inch by 12 inch computer chip, or you want to go metric 30 centimeters by 30
centimeters. It is a big, big computer chip. And the idea is we've had small little chips come
along to do special types of processing. We've had GPUs come along and be adapted, I guess,
for things like machine learning, training machine learning models, and so on. This thing
just takes that idea to an entire new level. So for example, I'm always going on and on and raving about my Mac mini, my M1,
where it's a cheap little computer
relative to Apple stuff, I guess.
But it's super fast.
But it has four performance cores
and four efficiency cores.
That's it.
Your GPU, if you've got a really high-end one,
might have 4,000 cores.
This insane little chip here
has 850,000 AI cores on one chip.
Is that insane? What do you think?
I'm curious how they did it. I mean, this is some major advance in wafer technology, because how do you get that big of a chip with no defects in it?
Yeah. And they apparently have 100% efficiency. Well, first of all, one of the ways you do it is you use the TSMC foundry, which seems
to be taking over all these small, high-efficiency types of things.
And so they had a previous one that they've more than doubled the core count for.
And another way to kind of appreciate how much is going on in this chip, go back to
my M1.
It has 0.016 trillion transistors. This has 2.6 trillion.
Or to put it another way, that's 2,600 billion transistors versus 16 billion.
It's more than 160 times more on this chip.
So super, super cool.
And now you may be wondering, all right, all this is interesting and chips are neat.
What is the Python angle?
Like, why would I bother putting this on here?
Because, you know, we don't really talk about chips that much, except for when I go on and
on about my M1.
Here's the deal.
If you scroll down in this article a little bit, you'll see users program this insane machine
transparently in machine learning frameworks, specifically TensorFlow and PyTorch.
Isn't that crazy?
That's really interesting.
Isn't it?
Yeah, I was just thinking about you as I'm going through this because you're working
on the RAPIDS project, which is not the same thing, obviously, but it's kind of in that
space, right?
Yeah, it is.
Have you heard of this before?
No, I haven't heard of this.
This is, yeah, this is really big and I have not heard of it.
I will get into reading a bit more about it after.
Yeah, yeah, for sure. So there's a lot of interesting things. And one of the,
I can't remember where exactly they spoke about it, but they basically say,
what you do is you program in TensorFlow and PyTorch as normal. And then they have this
custom compiler that extracts the
execution graph and rewrites it to scale out to the 850,000 cores, so the developers don't have to
think about how they program against something like this. I don't want to spend too much time
on this, because my next item is super amazing. I want to take the time
to dive into it. But there's another thing that's really interesting just as you look at it: this thing takes an insane amount of power.
For this one chip, you're going to need a four kilowatt power supply, with up to a peak power of
23 kilowatts. When you plug in an electric car at one of the high-speed home chargers, that's seven kilowatts, just to give you a sense.
This is like insane amounts for one chip, right?
You could think of it as a supercomputer.
Like it's one chip.
So anyway.
Our entire lab doesn't draw that much.
So the reason I said it's a skateboarding dog thing is I don't think most of us will be able to ever even interact with one of these, much less buy one.
They're going to be shipping in the later part of this year, and the price is something like three million US dollars plus.
So this is certainly super computer level.
But I do think it like opens the door for really interesting stuff going on in the high performance Python space.
So, yeah, glad that Galen sent it over.
Well, I'm totally going to put 25 bucks into Dogecoin
so that I can afford this later this year.
Oh, speaking of which.
Exactly.
Well, what about, I think maybe you get this
and you create an AI that can more intelligently
mine Dogecoin and then you take over the world.
Just an investment.
Yeah.
Just an investment.
All right.
So speaking of large-scale high-performance computing,
Marlene, take it away.
Sure. I have the next item, which is RAPIDS. And I wanted to speak about this because it's what I've been working on for, I think, yeah, wow, it's been about a year since I have been with NVIDIA,
working as a software engineer there and working specifically on the RAPIDS project.
And so RAPIDS, I think, is really interesting because the goal of RAPIDS,
similar to the last thing Michael just showed us, is to speed up data science.
But this is with GPUs.
So I think it's been really cool to work on the RAPIDS project, and I think it's really
interesting as well because it's open source.
Also, there's a lot of Python involved.
Well, it's not mostly Python.
Actually, there's a lot of C++ and CUDA code in there as well.
But personally, my aim is not to learn CUDA.
It's to try and avoid that as much as possible,
and also avoid as much C++ as possible.
That's a bit more reasonable. But one of the goals of the RAPIDS project is to
allow people who are Pythonistas to work primarily with GPUs and to get those speedups
without having to know any CUDA code or any C++. And so I have been working primarily
on the Python side of things and have really been enjoying it. I work specifically with the cuDF data frame library. And cuDF
is basically a GPU data frame library that mirrors pandas. So if you have a data set and you'd like to do computations on your data set
or do different operations on your data set,
if you can do that with Pandas, you should be able, hopefully,
to do the same thing with cuDF.
But the good thing is that it will probably be faster.
I actually can't definitively say that it will be
faster because um I remember when I first joined the project as well I was I really I'm very
enthusiastic and I really enjoy sort of sharing when I'm learning something new and I remember I
was like going around and speaking and saying that you you know, KUDIF is so much better than Pandas because it's just so much faster.
And then my manager was just like,
you need to stop saying that because it's not true all of the time.
It's true most like some of the time. So for smaller data sets,
it's probably better to stick with Pandas um because yeah there's always this overhead right
like as you scale things out and stuff there's probably like well how do we convert this over
and get it onto the gpu and if if that process takes right half the time of what just doing the
computation you might as well just do the computation right exactly like if you're already
if you're mainly doing right i agree like if you're working with smaller data sets and you are fine with that and
that works for you and your time is not like being wasted a lot then i would say please go ahead and
stick with pandas but if you have if you are working on like larger data sets and the larger
your data sets get the more the difference is going to be
in terms of your speed up. So with very large data sets, QDF is going to take a much shorter time to
do computations and things like that.
Yeah, you actually put a really interesting example in the
show notes here, right? Showing, how many zeros is that? A hundred million items or something like
that?
Yeah, it's a hundred million. I just kind of randomly chose a number
to try it out. I also didn't want to take a number that
was too big, because I didn't want to spend a long time doing it. And I know, for a lot
of data scientists, I think increasingly people are working with larger and larger data
sets, just depending which field you're in.
For the example, I put it in the show notes, and it's on the screen right now.
If you take a pandas data frame and try and calculate the mean, and you take the same cuDF data frame and try and calculate the mean, it will take, I think, I'm trying to look at the notes, it's 105 milliseconds for pandas and
1.83 milliseconds for cuDF, which is just awesome. And that's a smaller-scale
data set compared to some people's. Yeah, just a hundred million. It's just a hundred
million. So it's not a lot. I mean, it depends.
But yeah, I think it's definitely significant
once you get to a certain threshold,
which is pretty cool.
Yeah.
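For anyone who wants to see what that looks like, here's a rough sketch of the comparison from the show notes (assuming a CUDA-capable GPU with cuDF installed via RAPIDS; the column name is made up, and the 105 ms vs 1.83 ms numbers are the ones quoted on the show, so actual timings will vary by hardware):

```python
# Same .mean() call on the CPU (pandas) and on the GPU (cuDF).
import numpy as np
import pandas as pd
import cudf

values = np.random.rand(100_000_000)   # one hundred million floats

pdf = pd.DataFrame({"x": values})      # CPU data frame
gdf = cudf.DataFrame({"x": values})    # GPU data frame, same API

print(pdf["x"].mean())   # ~105 ms on the show's hardware
print(gdf["x"].mean())   # ~1.83 ms on the show's hardware
```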
Over on the RAPIDS site, rapids.ai,
it says it scales out on multiple GPUs.
So seamlessly scale from a GPU workstation
to multi-GPU servers and multi-node clusters,
working with Dask as well.
So Dask is also kind of about scaling pandas and combining those.
That's pretty awesome.
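And since Dask came up, here's a minimal sketch of that same mirrored-API idea with Dask (assuming `pip install "dask[dataframe]"`; the column name and partition count are just for illustration):

```python
# Dask wraps a pandas-style API around partitioned, lazily-evaluated data.
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)   # split into 8 pandas partitions

print(pdf["x"].mean())             # eager pandas computation
print(ddf["x"].mean().compute())   # lazy Dask graph, computed on demand
```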
I actually saw that you have a Dask course out.
I recently saw it.
Yes.
Definitely going to take that.
Yeah, check it out.
That one.
I'm going to dive into Dask a little bit later.
Awesome.
Yeah, yeah.
We put that together with Matthew Rocklin and team over at Coiled.
Yeah, and that's actually free.
So people can just drop in and take that course.
I think maybe I can put it in the show notes at the end.
I think it was just announced.
Let's see.
Very cool.
No, that was last week.
But yeah, this is super cool.
And this one is certainly within normal person's reach.
You get a GPU and you're good to go, right?
Yeah, I think, yeah.
I mean, I'm just using it on my laptop with the GPU.
You can also use it like online.
So there's also a Colab notebook on the RAPIDS site.
I think that you can click
and then you can like kind of experiment
if you just wanted to do it online
or I think you can use any sort of online
GPU that you have access to. So it's very, I think it's trying to make it more accessible,
which is great. Yeah, that's super cool. Yeah, very neat. Well, like I said, I think this is
a cool project to be working on. So thanks for sharing it with us. Brian, is it time for the next one?
This was recommended by a listener,
Ira Horeca, I think.
So he mentioned this,
and it's kind of a rabbit hole.
I spent a whole bunch of time
playing with all this stuff last night.
He recommended Date Finder.
So this is a Python utility
and it kind of is amazing.
So it's a combination of a couple of things.
He pointed us to a Calmcode video, and I'm totally a fan of Calmcode's stuff,
because they kind of go through some of the Python libraries and
a lot of other things, and just have a quick demo of what each does. And I really appreciate
that. Actually, the demo here is better than the readme,
the datefinder readme.
So maybe a pull request is necessary.
But anyway, what datefinder does is, I'm going to scroll down a little bit,
datefinder parses dates, or rather, it finds them.
So you give it a string,
or a list of strings or something,
and it can find where the dates are in there.
So if you've got a sentence or a paragraph
or an entire page that has a whole bunch of dates in it,
it'll find all of them
and then return you a list of dates that it found.
It actually does a whole bunch of things, but that's the default, or the one that we're
talking about: find_dates.
There's a bunch of other less documented features of datefinder, but this is the one that is
demonstrated here, and it's pretty cool.
So what it does is it finds those dates, and then it converts them to datetimes.
So find_dates will find them and convert them to datetimes,
and it does that by passing them off to the dateutil library.
So this is just kind of a really cool demo.
The little video is a good demo of showing how to do this.
I also really kind of liked this way to play with it.
The video just
had a list of strings, and then used a comprehension to call the
function on the whole bunch of strings, and I thought this was just kind of a clever way to play
with a function that transforms things, something like the sketch below.
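Here's a minimal sketch of that pattern (assuming `pip install datefinder`; the sample strings are invented):

```python
# datefinder.find_dates() scans free-form text and yields datetime objects.
import datefinder

lines = [
    "The project kicked off on March 12, 2010.",
    "Next review is scheduled for 2021-05-12 at 3pm.",
]

# the comprehension-over-strings style of playing with a function
found = [list(datefinder.find_dates(line)) for line in lines]
print(found)  # e.g. [[datetime(2010, 3, 12, 0, 0)], [datetime(2021, 5, 12, 15, 0)]]
```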
I would have thought this would be so hard.
Yeah, it's super hard normally, because strptime is picky, right? You've got to go look up the datetime format-code language. So if I put the percent codes for March 12, 2010, but the string forgets the comma,
it won't parse. And all those things are really annoying about reading and
converting strings to dates. And this looks like it just doesn't care. It's nice.
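That strictness is easy to show with just the standard library:

```python
# strptime demands an exact format string; one missing comma breaks it.
from datetime import datetime

print(datetime.strptime("March 12, 2010", "%B %d, %Y"))  # works: 2010-03-12 00:00:00
# datetime.strptime("March 12 2010", "%B %d, %Y")        # raises ValueError
```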
Yeah. And then it's also kind of a nice, clean interface as well.
The limited documentation shows it's just a focused tool, which is nice. And it's interesting that
this is a focused tool that apparently a lot of people need, because according to GitHub, there are 662 projects
using it. So it's used kind of all over the place. Behind the scenes, though, it's taking
the dates that it found, the strings, and passing those to dateutil. So if you want to skip the finding part,
this is also a good library to look at for examples of how to use dateutil
to easily convert dates.
And dateutil is kind of an amazing tool as well.
I told you this was a rabbit hole.
One of the cool things about it is
it doesn't just parse dates,
but you can do relative dates. You can say, like, today plus three weeks or something, and it'll figure that out.
Or you can take two dates and do date math with them really well.
And also, dateutil has amazing time zone support, probably the best in Python.
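Here's a short sketch of those dateutil features, flexible parsing, relative dates, date math, and time zones; the specific dates are just examples:

```python
from datetime import datetime
from dateutil import parser, tz
from dateutil.relativedelta import relativedelta

when = parser.parse("March 12, 2010")                      # forgiving string parsing
in_three_weeks = datetime.now() + relativedelta(weeks=3)   # "today plus three weeks"
gap = relativedelta(in_three_weeks, when)                  # date math between two dates
eastern_now = datetime.now(tz.gettz("US/Eastern"))         # time zone support

print(when, in_three_weeks, gap, eastern_now, sep="\n")
```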
This is pretty cool. I think I was looking through the test code.
The test code for dateutil is kind of a neat mix of unittest and pytest.
Both are good examples of how to do each.
And some of the newer stuff is using pytest with parametrization. But it's good.
Yeah, I like this a lot.
Marlene, what do you think?
Yeah, I like it. I'm not actually working with dates that often, so I'm trying to think of use cases
for myself, other than maybe converting time zones, which is like a nightmare. So maybe.
Oh, you can say that again. Oh my gosh.
Maybe you said that, but it looks like it would be really useful for people that are,
yeah, working with dates a lot.
Yeah, I'm showing some of the examples
from dateutil of how to use it, and basically, I imagine this is one of the reasons why datefinder
is so used, because this is non-trivial even with dateutil. Yeah, that's cool.
Cool, cool.
All right, well, I got the next one.
And this one doesn't exactly come to us from Anthony Shaw,
but I was talking to Anthony about something else.
And he's like, oh, have you heard of this?
Have you heard of Cinder?
And Cinder is pretty awesome.
So Anthony's doing interesting work around Python
and performance at the CPython level,
especially now.
I think he's giving a talk on Pyjion or Piston.
Piston, I believe it is.
I'm not 100% sure.
I might be remembering which one wrong. At PyCon, which, you know, we're going to
talk more about that in just a second as well.
But Cinder is a really interesting fork of CPython from Instagram.
So it's under the Facebook Incubator project,
and I think we've mentioned it before. I definitely have talked about it before, in
other presentations. Instagram has done really interesting things, like disable the
garbage collector, just turn it off 100%, and they got less memory usage, not more memory usage, by
just allowing the cycles to leak, which is insane.
But speaking of insane, this takes it to a whole other level. So they've been doing all these low-level
things inside of CPython. It is based on 3.8. Hopefully some of these ideas can be
brought forward and shared with everyone because there's a lot going on. So let me just cruise down
here. I'll just read the little intro part because it's jam-packed and then I'll go into some of the details.
So it says this is the internal performance-oriented production version of CPython 3.8.
And it contains a number of performance optimizations.
I feel like performance is some sort of theme of this episode. There's eager evaluation of coroutines, a JIT, just-in-time compiler, and an experimental bytecode
compiler that uses type annotations in some incredibly interesting ways to emit type-specialized
bytecode that performs better. So just to give you an example, one of the reasons that math in the
pure Python layer is slower than, say, C++ or C# is that C++ and C# work with just the value. So if
you have the value seven, you might have two or four bytes that represent the value seven. In
Python, you have a PyObject pointer, which is like 28 bytes, pointing out to a thing on the heap
that represents the number seven. And it's a whole lot more work to interact with that, and set the
reference count on that,
and so on, instead of just working with the value seven, right? So one of the things they do is they
use Python type annotations to understand, oh, this is an integer,
this is a long, and so on, and actually convert those to the machine-oriented
numbers, right? So it's just the value itself, instead of a pointer.
And then it will use what's called boxing:
if something outside of this world needs it,
it'll up-level that to a PyObject pointer type thing
and hand it off.
So there's all sorts of stuff like that going on.
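Just to illustrate that idea in plain Python (this is not Cinder's actual API, only the kind of annotated code a type-specializing compiler could exploit):

```python
# The annotations tell a static bytecode compiler that x, y, and total
# can live as raw machine integers instead of boxed PyObject pointers;
# the result only needs to be boxed when it crosses back into dynamic code.
def dot(xs: list[int], ys: list[int]) -> int:
    total: int = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(dot([1, 2, 3], [4, 5, 6]))  # 32
```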
Interestingly, the first question is,
is this supported?
No, not supported.
But there's some interesting things going on here.
And all of this has to be taken with an understanding that it's in a very specific context and that
it may or may not be useful for you.
Brian had pointed out some articles and ideas around that you're not Instagram, you're not
Facebook, you're not Netflix and so on.
Most of the time,
people are building much smaller software with different constraints. So they start out by saying,
look, Instagram uses a multi-process web server architecture where the parent process starts,
performs initialization, and then forks 10 worker processes to handle requests.
This is super common. For example, Talk Python Training literally does exactly this. It uses uWSGI: it starts up and it creates 10 worker processes
to handle like people wanting to take courses. So it's not uncommon in the web, but it's not
how all Python code runs. And so the first optimization they did is they created what
are called immortal instances. The reason they were so focused on the garbage collector and all those sorts of things was when you fork these processes, initially, there's a bunch of memory that can be
shared. And that helps with cache locality that helps with overall memory usage, all sorts of
things. But as soon as something has changed about one of those items, it has to copy a whole page
of memory. And they realized that when an object's
reference count is modified in one of the processes, it has to copy, replicate, and sort of
fork off a bunch of the memory that used to be shared across all those processes. So they created
what they call immortal instances, which don't participate in reference counting or
garbage collection. And that prevents their reference counts from changing,
so they can be shared.
So they can mark like a whole bunch of the startup stuff
as like, just don't even look at this or change it
and don't do reference counting on it.
So in their world, it got things faster,
but it doesn't always.
They said it's actually a little bit slower
in straight-line code,
but in this sort of forked world, it's better.
The next one is shadow bytecode,
which is an inline caching implementation. It applies in certain optimization
cases for generic Python opcodes: it'll observe functions that take a lot of time
and dynamically replace their opcodes with specialized opcodes that it thinks are going to be better.
Another thing it does that's pretty interesting is it will eagerly evaluate coroutines.
So if I say this is an async method, and then in that method, I call await some function call, normal Python is going to create a coroutine.
It's going to schedule it on the async IO event loop, and it's going to get to it.
And that's a lot of overhead.
But maybe that function says inside, the first thing is: in this case, just return the cached
answer; otherwise, go to the
database, await the response, and so on. And what they realized is, if it's going to go through that
first case, it's not actually awaiting anything. So they'll actually execute the awaited thing
up until it actually needs to become async. So it'll effectively
look inside the function and say, is the path we're going on this time going to be async or not? And if the answer is no, it will run it without async, which means it skips all that context switching and all that stuff, which is pretty crazy.
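Here's a hedged sketch of the cached-path pattern being described, written with plain asyncio rather than anything Cinder-specific; the function and cache names are invented:

```python
# If the cached branch is taken, the coroutine body never truly awaits,
# which is exactly what eager evaluation can exploit to skip the event loop.
import asyncio

_cache: dict[int, str] = {}

async def get_user(user_id: int) -> str:
    if user_id in _cache:            # fast path: no real await happens
        return _cache[user_id]
    await asyncio.sleep(0.1)         # stand-in for a database call
    _cache[user_id] = f"user-{user_id}"
    return _cache[user_id]

print(asyncio.run(get_user(1)))      # misses the cache, actually awaits
print(asyncio.run(get_user(1)))      # hits the cache immediately
```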
It also has the Cinder JIT,
which is a method-at-a-time JIT compiler, think C#, Java, maybe even
JavaScript's V8. It's not enabled for every function that is called. Actually, sorry, if it
were, it would make things slow. So you can basically say which functions should be optimized. But they say
it supports almost everything that Python can do. And it has a 1.5 to 4 times speedup on the
Python benchmarks, which is pretty interesting. They also have this thing called strict modules,
which is actually a static analyzer capable of validating top-level code to see if a module
has side effects and can treat it differently if it doesn't. You can have an immutable strict module type that is sort of
a replacement for Python's regular module that behaves and loads differently and so on. And then
the thing I talked about with the numbers, more broadly, is under this category of Static Python. It's an
experimental bytecode compiler that makes use of type annotations to emit better bytecode. And check this out. It can deliver performance similar to
mypyc or Cython. And this thing will go up to seven times faster than regular Python for the
Richards benchmark. And I don't know if the 4x improvement from before is in addition to this,
whether you get 28x or you just get 7x. I don't really know. But there's a lot of things going on here and a lot of different ideas about how this works.
So I'm just scratching the surface on the details,
but I feel like I've gone on and on about it.
That's really interesting.
I saw, I think, is there a talk about it at PyCon?
I think that-
It is coming up, yes.
They're going to give a talk on this at PyCon.
Yeah, it was one of the talks
I was looking forward to listening to.
Yeah, just because I think it's super interesting
to be able to kind of play around with
that they were able to kind of make their own version of Python.
And, I don't know, like you mentioned, Anthony,
and I also know Victor, I think, and someone else,
who are also working on sub-interpreters and different things to make Python faster.
So I'm really curious to see if the core devs or people will also be listening to this talk
and maybe take some ideas from it.
It would be really cool to kind of see.
And I mean, it's always good to get speed ups, even if they, I don't know.
I don't know if it will help like general, like normal Python users,
but I think it's always good to look into.
Yeah, yeah, I agree.
I think some things here are absolutely transferable to regular, general-purpose CPython.
And some of them might not be. For example, the immortal instances,
that might be a thing that just they do,
and it makes sense for their large-scale farm of servers.
But the JIT that takes the type information and does math many, many times faster, everybody would want that.
Like we all work with numbers at some level or another.
Brian?
Well, one of the things I love about the...
I mean, this kind of applies
to all of these sort of speed up projects.
One of the things I love about Python
is just the generalness of it.
You can throw anything at it.
Data structures can hold anything.
But there are times where you really are using a huge array of floats,
or a huge array of integers, or a huge array of some fixed data size. Those are
times where I don't need it to be generic, I just need it to be fast. So
that's the part where I think it'd be interesting to pull into
regular Python. But don't we get that with some of the data science stuff anyway,
like NumPy and stuff?
Yeah, you do, but you can't do generic programming with
it, right? You do sort of matrix-math types of things. And with this one, the answer used to be,
okay, well, this function is slow, this serialization/deserialization section might be slow, so rewrite that in Cython, for example. And
what's really cool about this is you can write regular Python and just put type annotations on it,
and then it goes as fast as Cython. And you don't even have to run a separate compiler,
I believe, in this world, right? Because the JIT just knows that,
and then will, like as you run it,
it'll just compile and run it.
So I think it just sort of makes
some of those ideas closer
and more automatic for most people.
I kind of think I foresee a future
where we have sort of some types that affect runtime.
There's like this tension that I sense
in the Python core people of whether or not
types should be just an afterthought
or whether they should be really part of the runtime.
And I think there are some cases
where having them be part of the runtime
might be a good thing.
Yeah, and this is interesting
because what they do is they
define these static
modules, and then in there
they can treat them differently. I feel like I always
see on Twitter some people
kind of like ranting about how
they don't like that
direction that Python is going in, like the
idea of putting in like annotations
and things like that. I've seen some people
that are not super big fans of that.
I'm not really sure why.
I generally would like to understand,
like I think most people, or not most people,
but I think some people would prefer Python
to maybe remain as it is.
But I do think that there's like,
just having it be a bit faster in a couple of cases
would be helpful.
So I don't know so i don't know
i don't know if it's in that direction i'm with you and one of the things they point out in this
readme uh announcing the project is that you can still do gradual typing so you can in some places
have no types in some places have some types and the thing can convert and just deal with that
automatically and i think that's the the reason that the types are really welcome in Python
is because you can use them if you want,
but you don't have to.
As opposed to places like TypeScript,
which said, well, JavaScript doesn't have types,
so we're going to add this very strict type system.
And if you don't fit it exactly,
we're going to not compile and complain
and it's going to be really not good.
This feels like it continues that forgiving nature of Python
to let you opt into it.
But if you do, it can go faster.
That's the direction I'd like to see.
I personally would like to see types be really a full-fledged feature of Python.
I love that they're optional, but if they're there, let's see how much we can do and improve things with them, right?
A hundred percent.
Yeah. All right. Marlene, you got the last things with them, right? A hundred percent. Yeah.
All right.
Marlene, you got the last one.
I got it on screen for you.
Okay.
Yes.
The last one for today is PyCon US, which I'm very excited about.
It started today, which is really great.
Are both of you attending?
I don't know if you're attending.
Yes, absolutely.
Okay.
Brian, are you attending?
Yes.
Yay.
Yeah, I think it's such a great event in terms of the fact that I know it's PyCon US, but
it is, at the moment, it's the largest Python gathering or largest PyCon on Earth, I think,
which is very cool because it means that you can meet
people from all around the world. I'm really sad that it's not in person,
because, not last year, but the year before that, that's where I
actually met you, Michael, for the first time. I think we were literally at
a table with you and Anthony Shaw and Łukasz Langa.
And it was like, and I was just randomly there.
But it was such a cool discussion.
And I really love the idea of being able to be in a room with people that are like contributing to Python.
So very.
That's my favorite part of PyCon.
It was so nice to meet you as well.
That is actually my favorite part of PyCon: you just happen to end up at a table,
or out for a beer or coffee, with this group of people, and you're like, wow, I got these connections
and this experience that I otherwise wouldn't have. So I'm very much looking forward to coming back in person.
But there's a bunch of great talks coming up.
Exactly. So this year, although it's online, the online platform is very cool,
and there's still lots of great talks to watch. In the show notes, I put down a list of the talks
that I'm excited to watch, but I also want to just put in a word for the things that I will be doing
at PyCon US this year. And the first thing I'm going to be doing
is I'm going to be hosting
the diversity and inclusion work group discussion
along with four other really amazing women
that are part of the diversity and inclusion work group.
I do want to comment here,
because we got some comments about it,
some feedback.
I posted a picture of our group that's going to be having this discussion,
or hosting this panel, and it's all women.
And someone was just like, why is it all women?
How is this diversity?
So I do want to throw it out there.
I just want to throw it out there that we did try.
The work group itself has a good balance of men and women in it.
But then when I asked people if they want to come on the panel,
it was only women that volunteered.
So it's not my fault, and I am aware of that.
That's just general feedback there.
But I think the panel will be really exciting. It's going to be on Saturday on the main stage at 12 p.m. EST, I think.
If you're going to be there, I really would encourage you to attend.
There's going to be question and answer.
And I just think it's such an important thing.
I know that sometimes diversity can seem like a really tiring thing to talk about,
especially recently. I feel like
sometimes people use it as this buzzword, and people can just
tune out when they hear the word diversity. But I really do think it's important, and particularly
now, as Python is growing in popularity. I think a few years ago, it was okay for the nucleus of Python to be based in
the United States or based in Europe, but it's growing so quickly. Python, for I don't know how
many years now, has been the most popular language in the world. And I know even for me, I'm in
Zimbabwe right now, and it's one of the most popular languages here where I live.
And so just providing the group, our main purpose is to figure out how we can support
the PSF to try and serve Pythonistas from around the world better and to connect the
community better and have better representation and different things like that.
So very excited about that one.
That's awesome.
And thanks for your work here.
I definitely agree that we're stronger together, right?
And one thing I would really like to see,
and I think we're getting there, is when people look at Python and programming in general,
but generally the Python space, we have influence over that.
When people look at that world,
I would like them to say,
I can see myself being part of
that. I can see that I could belong there. Right. And if that's not the case, then how do we make
that the case? Exactly. Absolutely. I think exactly that. And I would love to see that happening in
the next few years. I would love to see, you know, one of my things is I'd love to see more like
women core developers and more global
core developers as well, and also people on the board, and different things. And those are all goals
that we're working towards. And obviously, we don't know the perfect way to achieve something or
the perfect way to do things, but it's something that I think is really great and exciting
to work on. So please attend, if you're listening to this, and let me
know if you came from this podcast. It would be fantastic to see you there. Maybe just
comment. Fantastic. And then, oh, another thing that I am doing for PyCon this year as well is I
will be, so there's a lounge area, well, there's a PSF booth, and if
you're going to be there in the morning on Saturday or Friday, I will be
hanging out at the PSF booth. So yeah, if you just want to talk about Python or the PSF or
anything, I will be there. And I will also be hosting the EMEA members meeting. So if you're in Europe, the Middle East, or Africa, there's a members meeting on Saturday.
I think it's at 10 a.m. Central African time.
I'm not sure what time that is in other places.
But I know it's at 10 a.m.
It's on the schedule, right?
It's on the schedule.
Yeah.
We can use the date time thing.
I don't know.
Exactly. Pull up the REPL.. I don't know. Exactly.
Pull up the REPL, throw it into date time.
Exactly.
Please do that.
So I will be hosting that.
And that's going to be in the morning.
And if you would like, even if you're not a member, you can watch it on the PSF YouTube channel.
It's going to be streaming there.
Or you could join.
There's a meetup link that I put in the show notes.
So people could join that way as well.
So yeah, PyCon is going to be really exciting.
And I'm really looking forward to it.
So just encouraging people to come along for sure.
Yeah, it should be fun.
And even though it is super sad that it's not in person, it's not in Pittsburgh this year,
I think in some ways it's more accessible to people around the world, right?
They don't have to travel there.
They can just log in and attend it.
And that's so much less expensive than I flew to the U.S. and I paid $1,000 for a hotel.
So there's a little silver lining, you know, out there in the live stream.
Sam Morley really says, I really wish I could go to PyCon in person.
Adam Parkin there says, me too, maybe in 2022.
I think so.
Finally, Sam also thinks it's great
that we're having this diversity conversation
and paying attention to it.
One of the things I've noticed in 2020,
actually last year also, though,
in both 2020 and 2021,
is we've got all these regional PyCons going on all over the world.
I used to think of like PyCon US as the PyCon
and everything else is regional.
Now I think of PyCon US as a regional conference also.
It's the regional one that's close to the people
that are in the US.
It isn't necessarily better.
I love it. It's great.
For anybody that's hosting one: yes, yours is better. But no, I like all of them, and
I was excited to get to participate and watch videos from all over the world this year. That
was pretty neat. But yeah, I'm on board; I want to get back to regional stuff.
I'd like to see people in person.
I can't wait.
Yeah, I will say for sure,
like even if people are feeling adventurous,
there is a regional conference.
I didn't mention it before
that I am also part of the organizing team for,
which is PyCon Africa.
So if you would like to travel to another PyCon in
a different part of the world, when we're
able to travel and the world gets
back to some form
of free travel,
definitely
recommend also hopping over
to PyCon Africa. I think
like you said, I think PyCon
US is fantastic and
one of the unique things about that is that it's a conference that has been there for so long.
So a lot of people are going to be there.
But there 100% are a lot of great conferences like PyCon Africa, which you should attend if you can.
I think they're really just as exciting.
And there are so many cool things that you get to
experience. Like, for me, whenever I go to the U.S.; before that trip, I'd never been to
Ohio, and I would never have a reason to
think to myself, let me go to America to go to Ohio.
But it was such a good experience for me,
and I really liked it.
And I was really surprised by that.
And so I think it's the same way:
PyCons are a great way as well
to experience new places.
So yeah, definitely keep that in mind.
Well, that wraps up our six.
Anybody got any extra information to share?
Nothing else for me,
other than the fact that if you do want to reach out to me,
you can reach out to me on Twitter.
I'm @marlene_zw there.
I'm also marlene_zw
on GitHub,
I think.
And there's my website:
marlenemhangami.com.
So if you would like to reach out to me there, feel free to.
I'm always happy to chat about Python.
Yeah, nice. Cool. So I've got a couple. One made me really excited: this tweet from GitHub. Is your fork behind? You can now sync with your parent
repo with just a single click. So check this out. If you go to your fork now, next to Contribute, for your PRs and stuff,
there is now a fetch upstream button.
And all you have to do is click it
and then automatically your fork will become in sync
with whatever you forked it from.
It used to be you had to go and check it out,
add an upstream origin and then pull from that,
and then merge that wherever you wanted it to go.
Over here, you just click this button, and boom, it's good to go. So I think this will just lower the bar for people
forking something: they want to get the current version and then make a change to see if they could
contribute back. Here's one fewer step in that process.
Do you have any idea if it stays in sync, or if you have to...?
No, it's a one-time type of thing, I believe. It says there are this many changes we'll pull over, and it basically just automatically does the process at that time.
Nice. But still, pretty nice. I mean, up here: Flask 2.0 is out, and that one was sent in to us from Adam
Parkin. Hey, heads up, this is now actually live. So very cool. Actually, everything from Pallets has been updated. So yeah,
I happen to have done a podcast recording with David Lord, who runs Pallets, and
Phil Jones, who does Quart and contributes back to Pallets as well, about all the stuff coming in
Flask 2.0, all the exciting stuff, and their future plans as well. So yeah, you can watch the live
stream of that, or wait a day or two until the episode is out
and just listen at Talk Python as well.
But yeah, very, very cool.
Yeah.
And then Adam, also in the live stream again,
says this is super sweet: always found it a headache to sync with upstream.
Always find it a headache to sync with upstream.
Yeah, about the GitHub thing.
That's cool.
Cool.
Close it out with a joke.
Well, I got a couple of things I wanted to mention.
Go for it.
Sorry.
I had Brett Cannon on last week on Test & Code, and huge feedback from everybody that it was a great episode.
We talked about packaging.
I'll have Ryan Howard on this week talking about Playwright.
So that'll be fun.
And I wanted to mention a thank you to the 71 patrons that we have on Patreon.
So thank you for supporting the show.
Thanks.
Yeah, thank you, everyone.
How about a joke?
Joke, yes.
Sorry for almost skipping over your extras.
Here, come on.
No worries.
You ready?
Yeah.
So this one, I talked about that crazy giant chip thing.
And we've got Marlene doing RAPIDS.
So I thought maybe some kind of machine learning joke.
Here's a bunch of robots in school, little Android-looking things,
small ones, because they're students, they're kids. And they're in machine learning class,
and there's a big box of dirty data, like a bunch of bits that are kind of gray and
have dirt on them. And the teacher says, Robbie, stop misbehaving, or I will send you back to data cleaning.
Yeah.
That's where they're spending half the day anyway.
Yeah.
They actually spend most of their time there.
That's right.
I don't know who drew them.
Like, one of the robots is looking the wrong way.
I was like, why is it drawn like that?
I don't understand.
Hey, a more concrete, really quick close-out question I see in the live stream
here is, is there a difference between cuDF
and pandas in terms of utilization,
in terms of how you actually use them?
Well, I don't think so. For the most part,
when you're using cuDF, the way it's built is to mirror pandas. So the APIs are really similar.
So ideally, the methods that you would use when you're using pandas are exactly the same
methods that you would use when you're using cuDF. The only difference is, when you're creating a pandas data frame, for example, you would
use pd.DataFrame,
but then with cuDF, you would say cudf.DataFrame.
If you make it into a variable or something like that, then the methods that you're going
to call are going to be totally identical.
It's really easy to transition.
Yeah, that's awesome.
Yeah, and Dask has similar stuff as well, right?
You create a Dask data frame
instead of a Pandas data frame,
but the API looks quite similar.
They're not always 100% compatible,
but most of the mainstream things, right?
Definitely.
So yeah, it's built definitely
to make it as easy as possible
to switch between the two.
So it's very similar, yeah.
Thanks a lot, everybody, for showing up.
Yeah.
Thanks.
Thank you, Marlene.
It's really great to have you here.
No problem.
Thanks for having me.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.