Python Bytes - #53 Getting started with devpi and Git Virtual FS

Episode Date: November 22, 2017

Topics covered in this episode: Exploring Line Lengths in Python Packages; NumPy: Plan for dropping Python 2.7 support; How to Learn Pandas; Microsoft and GitHub team up to take Git virtual file system to macOS, Linux; Getting started with devpi; Marketing-for-Engineers; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/53

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 53, recorded November 21st, 2017. I'm Michael Kennedy. And I'm Brian Okken. And we've got a ton of cool things we've picked for you today. Well, six plus a few more, but it feels like a lot of good stuff to share with you guys. So I'm looking forward to it. How about you, Brian? I'm really looking forward to it, yeah.
Starting point is 00:00:21 Yeah, definitely. So before we get into that, let's say thank you, Rollbar. If you think there are errors lurking in your app and you want to get notified right away, go to pythonbytes.fm/rollbar and check it out. I'll tell you more about that later. I want to know your philosophy on line length. Are you a strictly-79-or-less sort of person, Brian? I'm trying to do the 79 thing, but it's really short.
Starting point is 00:00:43 So we do, like, 120 in my work group at work. What, do you guys use 44-inch HDTVs for your monitors or what? It's still pretty good to have. I like it a little bit shorter so that you can do side-by-side diffs more easily and stuff. Yeah, for sure. But 79 is really tight. It is tight. How about you, what do you use? I guess I do stick to 79 pretty much. You know, if the editor says, hey, this one's too long, you should reformat it according to PEP 8.
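PEP 8 itself recommends wrapping long lines with Python's implied line continuation inside parentheses, brackets, and braces, so a 79-column limit doesn't have to force short, cryptic names. Here's a minimal sketch of that style; the function and all of its parameter names are hypothetical, made up just for illustration:

```python
def format_order_summary(customer_display_name: str,
                         order_item_count: int,
                         order_subtotal_dollars: float,
                         shipping_cost_dollars: float,
                         sales_tax_dollars: float) -> str:
    # Descriptive names stay; the arithmetic wraps inside parentheses
    # (PEP 8's preferred continuation style) to keep every line short.
    order_total_dollars = (
        order_subtotal_dollars
        + shipping_cost_dollars
        + sales_tax_dollars
    )
    # Adjacent f-string literals are joined by the compiler, so one long
    # message can be split across several short source lines.
    return (
        f"{customer_display_name} ordered {order_item_count} items "
        f"for ${order_total_dollars:.2f} "
        f"(shipping ${shipping_cost_dollars:.2f}, "
        f"tax ${sales_tax_dollars:.2f})"
    )
```

Calling `format_order_summary("Ada", 3, 10.0, 2.5, 0.75)` returns `"Ada ordered 3 items for $13.25 (shipping $2.50, tax $0.75)"`, and no source line comes close to 79 characters.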
Starting point is 00:01:11 I guess I do, but I feel like it tends to pressure you into making bad decisions. For example, if you have an expression involving, say, five variables and a string, like you're formatting a string, it encourages you to make those variables super short and non-descriptive
Starting point is 00:01:29 so they fit within 79. But if they're long and descriptive, that might be 100, right? So I feel like there's this pressure, but I guess I succumb to it anyway. For things that I share on GitHub or something, I try to keep it to 79, but I don't know if it's a good idea or not. Mostly it's because I do testing and stuff, and people will run flake8 over my code and say, hey dude, how come your code doesn't pass cleanly? You failed the build. Yeah. So there's an article from Jake VanderPlas. He's the astronomer guy who did the PyCon keynote. Yeah, he did a keynote there. And I think he also did another talk, but yeah,
Starting point is 00:02:03 it was great. He's up at the University of Washington doing all sorts of cool astronomy stuff. So what does he have to say about line lengths? Because of Twitter's switch from 140 to 280 characters, he was intrigued by the statistics and did an exploration of line lengths in Python packages. He did it as a Jupyter Notebook-style article, so you can follow along with all of his work, mostly looking at NumPy, SciPy, Pandas, scikit-learn, Matplotlib, and Astropy. I didn't know about Astropy,
Starting point is 00:02:38 but that makes sense because I'm not an astronomer. Yeah. How often do you analyze telescope images with machine learning? So far, zero times. It sounds fun though, doesn't it? Yeah. But it's a neat look. I wouldn't have known how to do this right off the bat, but it turns out it's pretty simple to write a little bit of code to import a bunch of modules, check out the line lengths, examine them, plot them, and clean it up.
Starting point is 00:03:06 It's a pretty cool article. And just looking at it, it looks like most of them follow a distribution, a... Normal distribution? It's not exactly normal, but it's... An abnormal distribution? An abnormal... a log-normal distribution. That's it. Oh, wow. Okay. That's a little bit more statistics than I understand, but it's sort of normal, I guess.
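Jake's notebook has its own code, and this is not it. As a rough, stdlib-only sketch of the same idea, assuming you just want to walk a directory of `.py` files, the two hypothetical helpers below collect non-blank line lengths and fit a log-normal distribution to them (the maximum-likelihood fit is simply the mean and population standard deviation of the logged lengths):

```python
import math
import statistics
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Iterator


def line_lengths(root: str) -> Iterator[int]:
    """Yield the length of every non-blank line in the .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            stripped = line.rstrip()  # ignore trailing whitespace
            if stripped:  # blank lines are skipped; ln(0) is undefined anyway
                yield len(stripped)


def summarize(lengths: Iterable[float], limit: int = 79) -> Dict[str, object]:
    """Basic stats plus a log-normal fit of a collection of line lengths."""
    data = list(lengths)
    if not data:
        raise ValueError("no lines to summarize")
    # Log-normal fit: treat ln(length) as normally distributed and take the
    # mean and population std. dev. of the logs (the maximum-likelihood fit).
    logs = [math.log(n) for n in data]
    mu = statistics.mean(logs)
    sigma = statistics.pstdev(logs)
    return {
        "count": len(data),
        "over_limit_fraction": sum(n > limit for n in data) / len(data),
        "lognormal_mu": mu,
        "lognormal_sigma": sigma,
        "lognormal_median": math.exp(mu),  # median of a log-normal is e**mu
        "histogram": Counter(data),  # length -> number of lines
    }
```

Something like `summarize(line_lengths("path/to/your/project"))` would give a first look at your own code; plotting the `histogram` entry against the fitted curve is where a bump near the 79/80 limit would show up.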
Starting point is 00:03:26 But it follows a log-normal distribution, except that there's an artificial bump near the right side, the 80-character side, because many of these packages are trying to hit 80 or less. But there's an argument there that you don't really need the limit, because code naturally fits anyway. It's a cool look at it. I was thinking about using the code from this to take a look at our code at work, to see where our line lengths are. Yeah, that'd be an interesting analysis, to run some PEP 8-style metrics across your organization. Yeah, you know, some enterprising listener out there should build a little package we can all drop in and do cool stuff like that with. Yeah, and at the end of the article he does ask; he's curious where different popular packages fit into the line
Starting point is 00:04:18 length distribution. So that'd be neat, right? And other languages: how does this compare for, say, JavaScript versus C++ versus Python? Things like that. Also interesting to know, but I don't have those answers, so they're open questions for now. So it's a good day. Yet another good day for modern Python, and, you know, the sun continues to set on legacy Python. This time around, it's a package you mentioned just previously: NumPy. Yeah, there's some interesting news with NumPy. So NumPy is dropping support for legacy Python. They say, you know, we know that the Python core developers are dropping support for Python 2 in 2020. The exact day is still an open question; I like that guy who voted for the keynote of PyCon 2020 as the official end date. But who knows what day it is? It hasn't been officially announced. But they say basically
Starting point is 00:05:11 this requirement to continue supporting Python 2 makes it harder and harder to advance NumPy, and so they're going to drop it. I think that's great. I can see that. It's such an important library. And, you know, data science is definitely moving towards Python 3. Their plan: until December 31st, 2018, they're going to support both Python 2 and Python 3 100%. And that's not very far away. What is that, like 41 days? No, that's a year and 41 days.
Starting point is 00:05:39 So there's a little bit of time on that one. Then, starting January 2019, all new features will be Python 3 only. And the year after that, I guess when Python 2 support ends, it probably goes away here as well. It isn't just a spiteful thing. They've got real reasons to do it, because the burden of staying Python 2 compatible has become unreasonable. Yeah, definitely.
Starting point is 00:06:05 It means there are features that aren't in NumPy because it has to work on Python 2. Right. So it's time to say thank you, but goodbye, to Python 2, they say, which I think is great. Speaking of data science, one thing I've tried to learn a lot, but haven't done a great job of, is pandas. Actually, pandas and kind of the whole data science tool chain. It's something I'm curious about, but I'm not sure how to go about it. So I really liked this article from Ted Petrou about how to learn pandas. It's his opinion, of course,
Starting point is 00:06:36 but it seems really reasonable. He recommends reading the documentation and reading about pandas and how it works, but also jumping back and forth and using it for small projects. I guess with any tool, that makes sense. But he gives more details on how to do that, so you can jump back and forth and know what to learn first. Yeah, I think one of the challenges I have learning pandas (I can sort of do a few things with it, but not a lot) is that I don't really have a project to use it on. I just kind of poke at it and go, oh, okay, it does this cool stuff. But I just haven't done data science-y things or financial analysis things. So he talks about things like
Starting point is 00:07:24 here are some Jupyter notebooks, here are some Kaggle kernels and data sets, which come in the form of Jupyter notebooks. So there are some concrete ways to play with it, not just, you know, fire it up and poke at the API. Yeah, or maybe go back to that Jake article and examine your line lengths. Exactly. There's an example.
Starting point is 00:07:43 And one of the things I thought was a nice ending: when you think you know it fairly well, go a little further and start answering questions on Stack Overflow, and measure yourself against the problems other people are running into. I think that's a cool idea.
Starting point is 00:08:01 And the people on Stack Overflow will let you know if you're wrong. Yeah, definitely. It's one of the nice and not nice things about the internet is the best way to find out whether you're right about something is to post the wrong answer. Yeah, people don't really hold back on you too often, do they? Yeah, no, no, you get that right away. Yeah, if you have a thick skin or if you're willing to grow a thick skin, then that's actually a great way to do it. Yeah. Reddit would probably also work too. Also, I'm sure the data science people are similar, but the Python community as
Starting point is 00:08:30 a whole is fairly gentle with people. They'll tell you you're wrong, but they'll be nice about it and probably use more words than you did to explain why you're wrong. Yeah. Maybe they'll have a good explanation of your misunderstanding and you can connect some more dots, right? I depend on that a lot. Nice. All right, before we get to the next one, which is some more social coding stuff, I just want to say thank you to Rollbar. If you have a web application and it's running on the internet, it's probably crashing at some point. And it would be great to know about that.
Starting point is 00:09:04 Like, how often do you go back and read logs? Do you read logs at work very often, Brian? Actually, more than I want to. Yes. I'm in a manager role, so I get to tell other people to do it: here's a problem in the log, go fix that. Yeah. But you don't want to have to depend on reading logs, right? If you could avoid it and just get notifications right away, that'd be awesome. I normally talk about Rollbar in the context of Python, and that's totally fair, but it actually supports 26 languages and frameworks: Python, obviously, with Flask, Django, Pyramid, etc., but also Node and .NET, and it even has a Flash plugin and client-side JavaScript. So totally cool. Whatever you're using, you can use Rollbar. It's awesome. And they have this thing called people tracking. So for example, on
Starting point is 00:09:44 my training site, people are logged in. If there's a crash, I can send a little bit of data telling Rollbar which user had this error. So not only do I know what the error was, I can actually go back and send that person a message saying, I saw you ran into a crash, and here's how I fixed it. Like, whoa, I didn't even tell you what happened. That's kind of creepy, but awesome. So anyway, if you want to be creepy and awesome, check out pythonbytes.fm/rollbar and solve problems before your users even tell you about them. All right. So one of the things that came out recently was an announcement from Microsoft and GitHub. I'm not sure of the exact order in which this came out, but I think it started at Microsoft, and they want to use Git.
Starting point is 00:10:27 Okay. So everybody wants to use Git, because Git is awesome. But the problem is they have some pretty large projects, and it turns out when they tried to use Git, it was basically unusable for some of their projects at Microsoft. So Brian, you're probably thinking Git was built for Linux, and Linux is a huge project, right? Yeah. So what's up with these Microsoft people? They must be doing it wrong. And I actually thought that when I first read this as well, but it turns out if you look at the Linux kernel, it's about 640 megs of data in its Git repository. That's big, right? That's quite big. But if you look at the Visual Studio tools, those are three gigabytes, which is five times bigger than Linux.
Starting point is 00:11:11 And they're trying to use it for that. That was a little sketchy, but then they wanted to use it for Windows. And apparently the repository for Windows is 270 gigabytes, or 421 times larger than Linux. Wow. No wonder it's slower. That's a little bit bigger. And there are 4,000 people committing to it all day as their job, right? So it's got a lot of contention as well. So, as the announcement puts it, Microsoft and GitHub are teaming up on a Git virtual file system. The GitHub part is mostly about making this work on other platforms, macOS and Linux and things like that. What they said is, look, the problem is we literally have, I don't know, thousands, maybe millions of files when we do a checkout. So
Starting point is 00:11:59 when they did a regular Git checkout, it would take 12 hours to clone the repository, three hours to do just a straight checkout of a branch, eight minutes to run git status, and 30 minutes to commit one file. So it was pretty broken. They said the reason it's broken is primarily that there are all these files, and generally you're only working with a small subset of them. So they created a virtual file system that understands Git repositories, and it only checks out a metadata list, like a directory listing. Wow, cool. And then if you interact with it, it creates those files by getting them from the
Starting point is 00:12:34 server on demand. And it doesn't have to be some plugin; it works at the file system level. So if I open up a command prompt or some editor, or I just type GCC and it has to touch 10 files, those will automatically be fetched from Git if they weren't there. Isn't that crazy? It sounds a lot like ClearCase before ClearCase started to suck. Yeah, exactly. So they built this for Windows and had really good success. They said instead of 12 hours to clone it, it takes 90 seconds. Instead of eight minutes to do a git status, it takes three seconds. Instead of 30 minutes to do a git commit, it takes eight seconds. And they've actually been pushing about half of these changes back upstream into Git, and they've been working with the Git developers to make this a general thing, not a Microsoft thing, which I think is pretty noble.
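The real GVFS does this at the file system driver level, so the following is only a toy, pure-Python model of the idea described here, not its actual design: the cheap "checkout" is just a list of paths, and a file's contents are fetched the first time something reads it, then cached. `LazyTree` and the `fetch` callable (a stand-in for the server round trip) are hypothetical names:

```python
from typing import Callable, Dict, Iterable, List, Set


class LazyTree:
    """Toy model of on-demand materialization (not GVFS's real design).

    Only the path listing is known up front; file contents are fetched
    from a 'server' callable the first time each path is read, then cached.
    """

    def __init__(self, paths: Iterable[str], fetch: Callable[[str], bytes]):
        self._listing: Set[str] = set(paths)  # the cheap metadata "checkout"
        self._fetch = fetch                   # stands in for a network call
        self._materialized: Dict[str, bytes] = {}

    def listdir(self) -> List[str]:
        # Listing needs no downloads: it comes from metadata alone.
        return sorted(self._listing)

    def read(self, path: str) -> bytes:
        if path not in self._listing:
            raise FileNotFoundError(path)
        if path not in self._materialized:
            self._materialized[path] = self._fetch(path)  # first touch only
        return self._materialized[path]

    def materialized_count(self) -> int:
        # How many files have actually been downloaded so far.
        return len(self._materialized)
```

With a listing of a million paths, a build that reads ten files only pays for ten fetches, which is the intuition behind the clone and status numbers quoted above.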
Starting point is 00:13:20 That's definitely a new Microsoft, not the old Steve Ballmer Microsoft. Is it just for GitHub, or can we use it with other Git hosts? This is purely for Git. They're pushing this back to the Git developers, not to GitHub. Where GitHub comes into this: maybe they'd have this problem for projects hosted on GitHub, but people are already using those projects on GitHub, so it's probably okay. But they're trying to sell GitHub Enterprise, which is like a box you put in your company to run those things, and these enterprise projects can be huge, like this Windows problem. So GitHub is basically trying to expand this to Linux and macOS so they can make that
Starting point is 00:14:01 part of their enterprise story. That'd be cool. I'd like to have it be part of the GitLab experience as well. That'd be good. Yeah, absolutely. Yeah, so hopefully this makes it back into Git proper. And then the OS support can come from Microsoft and GitHub. That'd be awesome. Yeah, this is pretty cool, actually.
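A quick check of the figures quoted in this segment, assuming 1 GB = 1,000 MB (with 1,024 MB per GB the Windows ratio would come out to 432 rather than about 421):

```python
# Repository sizes as quoted on the show. Assumption: 1 GB = 1000 MB.
LINUX_REPO_MB = 640
VS_TOOLS_REPO_MB = 3 * 1000
WINDOWS_REPO_MB = 270 * 1000

size_vs_linux = {
    "Visual Studio tools": VS_TOOLS_REPO_MB / LINUX_REPO_MB,  # 4.6875
    "Windows": WINDOWS_REPO_MB / LINUX_REPO_MB,  # 421.875
}

# (before, after) times in seconds: plain Git versus GVFS, as quoted.
timings = {
    "clone": (12 * 60 * 60, 90),
    "git status": (8 * 60, 3),
    "commit": (30 * 60, 8),
}
speedups = {op: before / after for op, (before, after) in timings.items()}
# {'clone': 480.0, 'git status': 160.0, 'commit': 225.0}

for op, factor in speedups.items():
    print(f"{op}: {factor:.0f}x faster")
```

So "five times bigger" is a rounding of roughly 4.7, and the quoted timings work out to speedups of a few hundred times for each operation.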
Starting point is 00:14:17 I'll keep an eye on this. Yeah, yeah, we'll see where it goes. But they've already got demos and stuff working for Microsoft Windows. And there's actually a little 10-minute video where they work through this stuff; you can check it out. It's really short. I'll link to that as well. Speaking of downloading stuff from servers and getting your libraries all put together: I don't know if I'm just dense or what, but I've tried multiple times to set up a devpi server for caching PyPI stuff locally. I need this partly for a laptop setup, for while you're on a plane or something, but also behind a firewall,
Starting point is 00:14:53 so my build server doesn't have to go outside the firewall, and stuff like that. I'd like to have a local one. And I ran across this article from Stefan Scherfke, "Getting started with devpi." I haven't actually gone through it yet; I was going to do that this morning, but it looks pretty good. He had basically the same problem: he needed to set up a local server again and couldn't remember how to do it. The documentation is okay, but it still has some issues. So he walks through the whole thing and shows you how to do it for at least one use case,
Starting point is 00:15:28 which is pretty close to what I think most people need, which is mostly mirroring the packages from PyPI that your company actually uses, not everything, just the stuff you're using, and then also being able to store your own local things there. Yeah, that's a great combination. I think the caching bit is really nice. Like you can just point at this thing and it'll just pass through and get the ones from the full PyPI, right?
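As a rough sketch of the "just point at this thing" part: a devpi server listens on port 3141 by default, and its `root/pypi` index is the caching mirror of PyPI, with pip reading from the index's `+simple/` URL. The helper below only builds a pip command line; its name and defaults are hypothetical, and a per-team index (say, a made-up `team/dev` created with `root/pypi` as its base) could be passed in instead so that both your own uploads and cached PyPI packages resolve from one URL:

```python
import sys
from typing import List


def devpi_pip_command(packages: List[str],
                      host: str = "localhost",
                      port: int = 3141,
                      index: str = "root/pypi") -> List[str]:
    """Build a pip command that installs packages through a devpi index."""
    # pip reads the PEP 503 "simple" API that devpi serves under +simple/.
    index_url = f"http://{host}:{port}/{index}/+simple/"
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", index_url,
        # pip ignores plain-HTTP indexes on hosts it doesn't consider
        # secure; --trusted-host opts this particular host in.
        "--trusted-host", host,
        *packages,
    ]
```

Running it would be `subprocess.run(devpi_pip_command(["requests"]), check=True)`; for day-to-day use, the same `index-url` and `trusted-host` settings can live in a pip configuration file instead.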
Starting point is 00:15:53 And then you can tell it to refresh occasionally and stuff. And you can also just push up your own local packages so that you can share your own stuff around. I think that's a really great thing that probably not too many organizations are doing. If you have different teams working on different packages, you can actually publish them to your company through these things, which is pretty awesome. We also have a PyPI whitelist. So that might be really positive
Starting point is 00:16:19 given some of the recent security scares we've had there, right, depending on how paranoid you are. Part of the article talks about user management. For me, I'd probably set things up so my whole local dev team plus the build server can get things, but he locked it down to just the build server, which is an interesting idea as well. Nice.
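This is not devpi's own whitelist mechanism (its documentation covers that); it is a hypothetical helper showing the general allowlist idea: check the package names in a requirements file against an approved list, so a typo'd name stands out before anything gets installed. Names are compared after PEP 503 normalization, and the name-parsing regex is a deliberate simplification of full PEP 508 requirement syntax:

```python
import re
from typing import Iterable, List, Set


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


# Simplified: grab the leading project name of a requirement line.
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def disallowed(requirement_lines: Iterable[str],
               allowlist: Iterable[str]) -> List[str]:
    """Return requirement names that are not on the allowlist."""
    allowed: Set[str] = {normalize(n) for n in allowlist}
    rejected = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME.match(line)
        if match and normalize(match.group(1)) not in allowed:
            rejected.append(match.group(1))
    return rejected
```

For example, checking `["requests>=2.0", "urlib3==1.22"]` against an allowlist of `["requests", "urllib3"]` flags only `urlib3`. For anything serious, the `packaging` library's requirement parser is a sturdier choice than this regex.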
Starting point is 00:16:42 So the last thing I want to cover this week is something I think a lot of developers miss, especially those who are new to building a product or who work at a technical company: the whole marketing side of software, right? The hardest thing about making something successful, whether it's a web app, a regular app, a SaaS thing, or whatever, is not building it. Building it may be challenging, but that is not the hardest thing. The hardest thing is getting people to notice it in a busy world and getting the word out. That's the whole marketing side of things that most of us developers are not super good at. So there's this GitHub repository called Marketing for Engineers.
Starting point is 00:17:26 And it's a curated collection of marketing articles and tools to grow your product. That's nice. Yeah, isn't that cool? So these guys, they created some kind of iOS app and they're like,
Starting point is 00:17:34 it took us almost two years to learn how to market our project. It was painful. So we're trying to help with that. So they said, look, we're going to come up with a bunch of resources that help you solve practical marketing tasks, such as finding better users,
Starting point is 00:17:49 growing your first user base, advertising your product without a budget, all those different things. They have a whole bunch of different areas that, if you're new to this, you can really learn a lot from: how to market on social media, where the right places are, how to leverage Quora, how to leverage Product Hunt, business models, all kinds of stuff. So I thought that might be useful. About 4,000 people have starred it on GitHub, so they probably also thought it was useful. It's a huge list. Yeah, it's massive. Yeah. One of the things on there that I saw, near the top, is doing things that don't scale, and I love that advice. Yeah, I do. I like that as well. Yeah, definitely do things that don't scale.
Starting point is 00:18:28 As I was writing the pytest book, I tried to help out as many people as possible on the Slack channel. A couple of times I even asked people, hey, are you available? Can I just call you on the phone? And I just talked to people about their issues with pytest and with testing. Now, clearly you can't do that on a huge scale, but when you don't have any end users at all yet, it's pretty easy. Yeah, for sure. And that behavior creates super advocates for you. It also lets you realize some of the challenges.
Starting point is 00:18:57 So maybe the final version of your book reflects some of the challenges that one person had, but there may be a thousand or more people who actually have them. They didn't call you; they just read your book, because you'd already covered it, right? I love this, because a lot of us nerds didn't become nerds because we really like talking with people. I used to laugh at the people in business school. Now I'm kind of like, huh, they probably know something, don't they? Yeah. Oh, those guys don't know any calculus. Oh, I see how it's going for them. All right. Anyway.
Starting point is 00:19:26 Awesome. So that's it for this week. Those are tons of fun things. Thanks for sharing them, Brian. You have one more bit of crazy, American-flavored shopping madness around Python for us, right? Yeah, I guess I forget that there are plenty of listeners outside of America. But one of the traditions we have is Black Friday sales, which have spilled over into online things as well.
Starting point is 00:19:49 It usually starts the day after Thanksgiving, but I think we're doing it a little early here. Maybe not. If anybody doesn't know, I wrote a book. I've been talking about it for a year, so you probably do. Python Testing with pytest is published through Pragmatic, and Pragmatic has a book sale going on from the 22nd through December 1st where you get 40% off all eBooks. That is awesome. Yeah. So get in there and get it. The reviews are awesome for that book. Is this a global thing, even though the terminology and date are US-inspired? Can people all over the world come and get it for 40% off, whatever it is? Yeah.
Starting point is 00:20:25 To get the discount, just use coupon code TURKEYSALE2017. Awesome. All right. Well, go and get that book if you've been on the fence. One more thing that just came up. Somebody, actually from the testing Slack channel again, asked me if I could mention PyCon Colombia.
Starting point is 00:20:47 So tickets are available. They're going to have their first PyCon Colombia in Medellín on February 9, 10, and 11, 2018. We'll put a link in, but it's pretty easy to find. So that'll be fun. Yeah, awesome. Check it out if you're down in South America. It could be a good time. Or if you want to go visit there, right?
Starting point is 00:21:06 How about you? Do you have any news to share with us? I have no news. There's no news for me. I'm actually working on some stuff. I don't want to announce it yet, but I've absolutely got some cool things I'm working on. I'm always trying to juggle too much, which is kind of the curse of my personality, but
Starting point is 00:21:21 it's fun. You're doing a lot of cool stuff, though. I can't wait to see. Oh, yeah, thanks. Back on the PyCon Colombia thing, they have a really cool logo. So if anybody's going to that,
Starting point is 00:21:31 if you could snag me a t-shirt, that would be cool. Yeah, order the t-shirt. They come with a logo. Well, thanks for talking to me this year. You bet. Great to chat with you, Brian. And everyone as well, thank you for listening. See you later.
Starting point is 00:21:45 Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy.
Starting point is 00:22:06 Thank you for listening and sharing this podcast with your friends and colleagues.
