Python Bytes - #68 Python notebooks galore!

Episode Date: March 6, 2018

Topics covered in this episode: dumb-pypi; Requests-HTML: HTML Parsing for Humans; A phone number proxy; Notebooks galore part 1: Datalore; bellybutton; Notebooks galore part 2; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/68

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. It's episode 68, recorded February 28th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And we have yet another bundle of amazing stuff to share with you. I'm super excited about the ones I got. How about you, Brian? I'm really excited. I had to kick some out because I had too many things to cover. I think I changed my list four times this week because I'm like, oh, this is a great list.
Starting point is 00:00:24 Oh, no, this one's more important. No, this is even better. It's awesome. Yeah. Yeah, so before we get to it, I just want to say thanks to DigitalOcean for sponsoring this episode. Check them out at do.co slash Python.
Starting point is 00:00:35 Right now, I want to hear about PyPI, but there's something wrong with it. What's up here? Well, so I've had this on the list for a long time, a project called dumb-pypi. So dumb PyPI, however you pronounce it, I don't know. Anyway, it's not really that dumb, though. So you can have your own local repository.
Starting point is 00:00:57 There are a bunch of different ways you can set up your own server so that you can serve your own packages. If you've got a team with proprietary code that you don't want to share with others on the normal PyPI, you can have your own. But you have to have a server running, and a lot of the index generation is tied to the server code. So there's a Flask version and there are various other versions. This one is just a flat-file creator. This package, dumb-pypi, will just take a directory full of wheels or zipped packages and create a directory that you can stick on any server and have it served up as an index. It doesn't do any caching. It doesn't go through to PyPI and grab things that it's missing.
Starting point is 00:01:50 So you have to do that manually yourself. But combine this with what we learned in episode 24, that you can just do pip download and easily download your own files somewhere. With the two combined, I'm using this at work now to create a really simple PyPI server behind our firewall, and I don't have to give it permission to talk to the outside world. It's just a bunch of files. It's actually really cool.
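A rough sketch of the "just a bunch of files" idea, for readers who want to see what a static package index amounts to. This is not dumb-pypi's own code or command line; it is a standard-library illustration that assumes the page layout described in PEP 503 (one `simple/<project>/index.html` page per project, made of plain links). The function names, the `packages` directory, and the `example.invalid` URL are all hypothetical.

```python
import html
import re
from pathlib import Path

ARCHIVE_SUFFIXES = (".whl", ".tar.gz", ".zip")


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_of(filename):
    """Guess the project name from a wheel or sdist filename (simplified)."""
    if filename.endswith(".whl"):
        # Wheel names start with the distribution name, then "-<version>-...".
        return normalize(filename.split("-")[0])
    stem = re.sub(r"\.(tar\.gz|zip)$", "", filename)
    # Sdist names are "<name>-<version>"; the name itself may contain dashes.
    return normalize(stem.rsplit("-", 1)[0])


def write_page(path, links):
    """Write one index page made of plain anchor tags."""
    path.parent.mkdir(parents=True, exist_ok=True)
    body = "\n".join(
        f'<a href="{html.escape(href)}">{html.escape(text)}</a><br>'
        for href, text in links
    )
    path.write_text(f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n")


def build_index(package_dir, output_dir, packages_url):
    """Create simple/<project>/index.html pages linking to files under packages_url."""
    projects = {}
    for path in sorted(Path(package_dir).iterdir()):
        if path.name.endswith(ARCHIVE_SUFFIXES):
            projects.setdefault(project_of(path.name), []).append(path.name)

    simple = Path(output_dir) / "simple"
    base = packages_url.rstrip("/")
    for project, files in projects.items():
        links = [(f"{base}/{name}", name) for name in files]
        write_page(simple / project / "index.html", links)
    # Root page listing every project, linking to the per-project pages.
    write_page(simple / "index.html", [(f"{p}/", p) for p in sorted(projects)])
    return projects
```

Under these assumptions, the workflow Brian describes would be roughly: fill a directory with `pip download -d ./packages <name>`, call `build_index("packages", "site", "https://example.invalid/packages/")`, copy both directories to any static host, and point pip at it with `pip install --index-url <host>/simple/ <name>`.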
Starting point is 00:02:16 So you could even put it up on Amazon S3 or somewhere like that, right? Right. And actually, I think the example on the package's GitHub site is an S3 example. It's super fast and slick, and it doesn't do anything like updates.
Starting point is 00:02:39 You have to rebuild everything yourself. But you can set up a cron job or something to do some of this. Exactly. Just do it at night when nobody's around. Yeah. Just update it daily. How often do these packages change, right?
Starting point is 00:02:52 But, for instance, I've got all of our test code, where we're creating virtual environments and then pulling in test packages and different packages. I don't want that stuff to update all the time. I want it to grab certain versions that I know are there. So something like this is perfect. Yeah, it looks really cool. I think it needs a better name than dumb-pypi. Yeah, it does. Clever-but-doesn't-do-anything PyPI. How about that? No-server server. Serverless PyPI. How about this? Come on. Yeah. Awesome. Okay. So the next thing I want to talk about is something for humans. And if I said it was for humans, who would that mean?
Starting point is 00:03:31 Kenneth. That's right. Kenneth Reitz. So he's got all of his things for humans. He's got Maya, datetimes for humans; Records, SQL for humans; and obviously Requests. So he's out with a new human thing, and this time it's for web scraping. He created this thing called Requests-HTML: HTML Parsing for Humans. When I looked at this, I thought, oh, is this maybe a replacement for Beautiful Soup or something like that, some kind of extension to Requests? But in fact, it actually depends upon Beautiful Soup. What it is, it's a library that puts a
Starting point is 00:04:06 different API on top of combining Requests plus Beautiful Soup plus something called PyQuery, which lets you run jQuery-style CSS selectors. So it does a bunch of cool stuff. Some of the notable features are: it has full JavaScript support,
Starting point is 00:04:22 which I'm taking to mean that it will parse and execute the JavaScript necessary. So if I hit an AngularJS page, instead of just seeing curly brackets everywhere, there's the data that would have gone in there. That's a big deal in web scraping, because if you use straight-up Requests plus Beautiful Soup, you just get the raw markup where those bits would have executed, right? Yeah. Then there are CSS selectors, XPath selectors, and mocked user agents. So it pretends to be a real browser.
Starting point is 00:04:52 So people don't know that you're trying to scrape their sites, which is kind of interesting. It uses connection pooling and cookie persistence. So you can log in and then go do a bunch of stuff at a site, and you can do it without reconnecting every time. So that's pretty cool. Yeah, and it keeps the session open. And that's what people often did anyway, Requests plus Beautiful Soup, so tying it together with one API is great. And actually, I like the idea anyway of somebody saying, hey, these tools are great, but I wish the API was different.
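For a feel of the API being discussed, here is a small, hedged sketch using calls that Requests-HTML documents: `HTMLSession`, `response.html.find`, `response.html.absolute_links`, and `response.html.render()`. The `scrape` and `same_site_links` functions are hypothetical helpers written for this illustration, and the snippet assumes the `requests-html` package is installed when `scrape` is called.

```python
from urllib.parse import urlparse


def same_site_links(links, base_url):
    """Keep only links on the same host as base_url, sorted for stable output."""
    host = urlparse(base_url).netloc
    return sorted(link for link in links if urlparse(link).netloc == host)


def scrape(url, render_js=False):
    """Fetch a page with Requests-HTML and return its title and same-site links."""
    # Imported here so the helper above works without the third-party package.
    from requests_html import HTMLSession

    session = HTMLSession()      # one session keeps cookies and pooled connections
    response = session.get(url)
    if render_js:
        # Executes the page's JavaScript in a headless browser before parsing.
        response.html.render()
    title = response.html.find("title", first=True)  # CSS selector lookup
    return {
        "title": title.text if title is not None else None,
        "links": same_site_links(response.html.absolute_links, url),
    }
```

Calling `scrape("https://example.org/", render_js=True)` would be the variant relevant to the JavaScript-heavy pages Michael mentions. Expect the first `render()` call to be slow, since the library downloads a browser build to run the scripts.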
Starting point is 00:05:25 So just write another package that uses the others and write a better API then. Yeah, it's a little like what Flask did, but for requests and parsing. Kenneth is a great one for that. He's got a good eye for APIs. Yeah, that's for sure. People definitely seem to love his APIs. So I'll leave you with the final sort of tagline here from their website: the Requests experience you know and love, but with magical parsing abilities. That's nice. Yeah, not bad, right? Cool. So what's up with this phony number thing?
Starting point is 00:06:00 You got some more prank calls to make? This was awesome. So Twilio has their Twilio blog, where people can write for them. I think we've talked about it before. They do a pretty cool program where they even give you an editor to help you out with it. And you don't have to do a Twilio project, but this article is a Twilio project. It's a phone number proxy. So imagine a situation where, for instance, you've got, I don't know, a meetup or some temporary event, and you want people to be able to text you because you're not going to be around your computer all the time. You want people to be able to text you, and you want to text back, but you don't want to give out your phone number. Well, this project gives you a little proxy, so that you can set up a server with Flask and Twilio, give out a temporary phone number, and have it be attached to your phone.
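The article's own code is not reproduced in this transcript, so the following is only a sketch of how such a proxy could work, not the author's implementation. It assumes a made-up convention in which the owner replies with `<number>: <message>` to choose the recipient; `route_message`, `create_app`, and the `/sms` path are hypothetical names. The Flask and Twilio calls used (`request.form`, `MessagingResponse.message` with a `to` argument) are part of those libraries.

```python
def route_message(sender, body, owner_number):
    """Decide where a text that arrived at the proxy number should go.

    Returns (recipient, text). Assumed convention: the owner replies with
    "<number>: <message>" to pick who receives the reply.
    """
    if sender != owner_number:
        # Someone texted the public number: forward it to the owner,
        # tagged with the sender so the owner knows whom to answer.
        return owner_number, f"{sender}: {body}"
    recipient, separator, text = body.partition(":")
    if not separator or not recipient.strip():
        # The owner forgot the prefix: send usage help back to the owner.
        return owner_number, "Format: <number>: <message>"
    return recipient.strip(), text.strip()


def create_app(owner_number):
    """Wire the routing rule into a Flask webhook that answers with TwiML."""
    # Imported lazily so route_message can be used and tested on its own.
    from flask import Flask, request
    from twilio.twiml.messaging_response import MessagingResponse

    app = Flask(__name__)

    @app.route("/sms", methods=["POST"])
    def sms():
        # Twilio posts the sender and message text as form fields.
        to, text = route_message(request.form["From"], request.form["Body"], owner_number)
        reply = MessagingResponse()
        reply.message(text, to=to)
        return str(reply), 200, {"Content-Type": "text/xml"}

    return app
```

Keeping the routing rule in a plain function means it can be checked without a Twilio account; only `create_app` needs Flask and the Twilio helper library installed, plus a Twilio number whose messaging webhook points at the `/sms` route.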
Starting point is 00:06:48 And I'm definitely going to have to try this out because it looks fun. Yeah, that looks really, really cool. And I think that program they have is awesome. One of the challenges of getting started blogging is nobody knows about you. You'll put all this effort into writing this thing and you put it out there, and your 10 friends who are willing to follow your tech stuff on Facebook glance at it. Right. And so here's a way to appear on a major blog, highlight what you're doing, and maybe jumpstart your other tech stuff.
Starting point is 00:07:18 Right. Like you could link back to your blog or something like this. Having somebody work with you to polish it up a little bit is a good idea. Often when you just tap your friends for that sort of help, they'll just tell you, oh, it looks great, go ahead and put it up. Yeah, yeah. Very cool. Very cool. But this project is also pretty neat. It does encourage you to use some of the paid part of Twilio, but I think for something like this, it's a good idea. Yeah, very nice. Good article. All right, before we get to the next one, let me just tell you about DigitalOcean. They're doing some really amazing stuff. So the thing I'd like to highlight is they just upgraded all of their things and left the
Starting point is 00:07:56 price the same. And by upgraded, I mean doubled all the stuff, at least. So for example, you go to DigitalOcean and get a Linux server, with all variety of Linux distributions, with four gigs of RAM, two CPUs, and 80 gigs of SSD for $20 a month. That's insane, right? That is a crazy thing. And that used to cost 40 bucks. And they just said, nope, that's now 20 bucks. And it comes with four terabytes of free traffic. If I were to just transfer that over S3, which is nine cents a gigabyte, just that bandwidth would be $368 at S3. That's included in your $20 server. So really, really awesome stuff. Check them out over at do.co slash Python. And, you know, check out what they're doing. Help support the show. Everybody's
Starting point is 00:08:46 getting good stuff. So thanks to DigitalOcean for that. All right, I kind of want to just go on a Jupyter-like notebook rant for a while, Brian, because the news around this stuff is just coming in fast and furious. There are so many things going on with notebooks right now. And this is a world I don't really live in. I'm much more the type to create a Python project, have like 10 related files, and run stuff on the command line or in my editor, and not put it in these cells, because that's just not my world.
Starting point is 00:09:17 But I see how powerful it is for people who are exploring data and being more iterative with their code. And in the last couple of weeks, they've got a lot more options. They've been in the news a lot right now. I'll start with one for this one, and then I'll do another one in the final segment. So for this one, I want to talk about something that's brand new called Datalore. Have you heard of Datalore?
Starting point is 00:09:39 I have not. You've heard of PyCharm, right? So this is like PyCharm in a notebook, online, hosted. So it's from the JetBrains guys. It's just in the cloud. You just go sign up. It has this intelligent editor, just like the JetBrains tools such as IntelliJ and PyCharm have, with all of the cool autocomplete and IntelliSense.
Starting point is 00:10:01 It comes pre-installed with a bunch of stuff that you need, like Matplotlib and so on. It has collaboration, so you can log in and do Google Docs-style work on it together. I don't know how real-time it is. Do you actually see every character going in, do you have to refresh, or does it automatically refresh? I'm not entirely sure of the level of collaboration, but there's some real-time, multiple-people-working-on-the-same-notebook type of collaboration. I've got to check that out. It has integrated version control. So say you're a student or an engineer, but you're not "git push on the command line" type of competent, right? You go there and just say, create me a save point. It basically saves it and tags it so you can get it back, things like that. Oh, that's great. Pretty cool. The JetBrains diff
Starting point is 00:10:49 viewer for version control is really great. So building that in here is cool. Yeah. They've got some really cool stuff. And finally, this might be pretty big for some folks, depending on what you're doing: they have incremental calculations. So if you're doing machine learning and training and all sorts of analysis, and there's a bunch of cells that work together to generate that data, they have actually figured out how to track the dependencies between where that data comes from. And you don't have to rerun the entire thing. If you're changing your model, it only reruns the parts that depend upon something you've changed. Oh, that's awesome. Yeah, that's pretty cool, right? So say your computation takes two minutes, but this little part's really quick because it
Starting point is 00:11:27 uses mostly finished data. That's a really big deal, I think. Yeah. So anyway, Datalore: it seems like it's in beta. I don't know what it costs, if there's a free thing or whatever, but it's a Jupyter notebook-like hosted service from JetBrains, which I thought was pretty cool and worth talking about. Yeah. Neat. Nice. I have no idea how to get started on this next one. I'm just going to say the name: bellybutton. Bellybutton, yes. For personal lint. What's up with this?
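On the incremental calculations Michael describes for Datalore: the transcript does not say how JetBrains implements them, so this is only a generic sketch of the underlying idea, tracking which cells read from which and rerunning just the changed cell and everything downstream of it. The `CellGraph` class and the cell names are hypothetical.

```python
from collections import defaultdict, deque


class CellGraph:
    """Tracks which notebook cells depend on which other cells."""

    def __init__(self):
        self.deps = {}                      # cell -> set of cells it reads from
        self.dependents = defaultdict(set)  # cell -> cells that read from it

    def add_cell(self, name, depends_on=()):
        self.deps[name] = set(depends_on)
        for dep in depends_on:
            self.dependents[dep].add(name)

    def stale_after_change(self, changed):
        """Return the changed cell and all downstream cells, in a valid run order."""
        # Breadth-first walk over dependents collects everything downstream.
        stale = {changed}
        queue = deque([changed])
        while queue:
            for child in self.dependents[queue.popleft()]:
                if child not in stale:
                    stale.add(child)
                    queue.append(child)

        # Depth-first ordering so each stale cell runs after its stale inputs.
        ordered, seen = [], set()

        def visit(cell):
            if cell in seen:
                return
            seen.add(cell)
            for dep in sorted(self.deps[cell]):
                if dep in stale:
                    visit(dep)
            ordered.append(cell)

        for cell in sorted(stale):
            visit(cell)
        return ordered


graph = CellGraph()
graph.add_cell("load")
graph.add_cell("clean", depends_on=["load"])
graph.add_cell("features", depends_on=["clean"])
graph.add_cell("train", depends_on=["features"])
graph.add_cell("plot", depends_on=["train"])
graph.add_cell("summary", depends_on=["clean"])

# Editing the model cell leaves loading, cleaning, and feature cells untouched.
print(graph.stale_after_change("train"))
```

In this toy notebook, changing `train` marks only `train` and `plot` for rerun, which is the "only reruns the parts that depend upon something you've changed" behavior from the discussion.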
Starting point is 00:11:56 So, yeah, I think it's a play on words around linters and where lint usually shows up. So we have things like Pylint and Flake8 and pycodestyle, which used to be called pep8, that I use all the time and love. But there are times where you have extra requirements for your own team or for your own project. And it'd be cool to have something like Pylint, but just with your own rules in it. And that's where bellybutton comes in. So it's a way to create rules around
Starting point is 00:12:31 static analysis or style. And one of the examples that I thought was great was, let's say you've got a library with some functions that your team uses, but you've decided some of them are dumb and you deprecate them. Yeah, or maybe there's a better way to do things. You can add some of these rules to bellybutton to say, hey, this code here, you need to change it this way, and actually give exact examples of how somebody should change it. And I think that's a really cool idea.
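To make the "your own rules" idea concrete, here is a minimal sketch of a project-specific lint rule built on Python's standard `ast` module. It does not use bellybutton or its configuration format, which is not shown in this transcript; `DEPRECATED`, `check`, `old_connect`, and `open_connection` are hypothetical names standing in for a team's own deprecated function and its replacement.

```python
import ast

# Hypothetical team rule: deprecated function name -> what to use instead.
DEPRECATED = {
    "old_connect": "use open_connection() instead",
}


def check(source, filename="<string>"):
    """Return one message per call to a deprecated function, in line order."""
    problems = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both plain calls (old_connect(...)) and attribute calls (db.old_connect(...)).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in DEPRECATED:
            problems.append((node.lineno, f"{name}() is deprecated, {DEPRECATED[name]}"))
    return [f"{filename}:{line}: {message}" for line, message in sorted(problems)]


sample = "conn = old_connect('db')\nother = open_connection('db')\n"
for message in check(sample):
    print(message)
```

A rule like this can run in continuous integration alongside the general-purpose linters, so the "change this code" message comes from the tool and includes the suggested replacement.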
Starting point is 00:13:01 Yeah, awesome, bellybutton. I wanted to bring that up. Yeah, it sounds really cool. These linters are really great. And I typically think of them in the context of continuous integration and sort of team-wide things. But yeah, here's a cool way to sort of make your own overrides and whatnot. Yeah. And anytime you've got a coding style within your team, if you can automate it and take the person out of it and take that out of your code reviews, it helps with team dynamics to just have the computer say,
Starting point is 00:13:28 hey, change this code, instead of having your coworkers keep telling you to change your code. Yeah, that's a really interesting dynamic, isn't it? Like people are willing to take petty nitpicky criticism from robots and automated systems way more than from your manager or whoever. Yeah, and you can just, like, we've already had the discussion about what our style is. This is what it is. I don't want to keep opening up the discussion.
Starting point is 00:13:54 So just, you know, do it. Nice. Manager speak. That's right. Cool. All right. You ready for Notebooks Galore Part 2? Oh, more notebook news.
Starting point is 00:14:03 Yay. Yes. So our friend of the show, Daniel Shorstein, posted something on Reddit, some news that has to do with free hosted notebooks in Azure. This would be pretty
Starting point is 00:14:18 much a direct competitor to Datalore. So they are now supporting Python 3.6 Jupyter notebooks in Azure, and there's a nice conversation over on Reddit about that, and you can go over and read more about it and so on. So basically, if you just drop in on notebooks.azure.com, then off you go. You can go work with it right there, and that's straight-up Jupyter notebooks, I believe. That's pretty cool, right? Free in the cloud, powered by Jupyter. I'm telling you, this is a space that is just so blowing up right now. Yeah, we better pay attention to it
Starting point is 00:14:53 more if people are fighting over it. Exactly. There are big companies fighting over it. So speaking of big companies that want to fight over it, have you heard of Colaboratory? No. A great word, though. It is. So this comes from the research group at Google, colab.research.google.com. And this has been around for a little while, and people have been kind of dissing on it a little bit because it had been just Python 2. However, it now supports not just legacy Python, but modern Python. So that's really cool. And between the time that I took this note to talk to you
Starting point is 00:15:27 about it and today, they have also launched GPU support. So you go to your notebook and you say, I want to do some machine learning, oh yeah, run this TensorFlow training process on a GPU. And you can basically hit Command-Shift-P to make it run on GPU. How insane is that? That's cool. Okay, so that was pretty cool. You ready for some more notebook news? Yes. JupyterLab is ready for users. It is now open. What is JupyterLab? So JupyterLab is something based on Jupyter notebooks, but it's more. So you're going to have to take this with a grain of salt. Probably a lot of people out there know better than I do, but it's like hosted Jupyter notebooks, which is really cool, but it also enables you to use
Starting point is 00:16:17 text editors, terminals, data file viewers, and all sorts of other stuff that's not just in the notebook. So you could SSH in and do stuff behind the scenes, or something to this effect, right? So they've got some cool pictures. It's almost like this crazy web IDE. So you've got your files on the left, you've got your standard notebook with graphs in the middle, and then on the right you might have a map, a couple of JSON files, and a CSV in an Excel-like thing, all in the same window. Okay. Well, that's neat. Yeah.
Starting point is 00:16:51 And you can build extensions and plugins. So that CSV thing is probably a JupyterLab extension. Nice. So yet another really cool thing going on there. And I guess the final piece is a tip that goes back to the very first item in this segment. Daniel said one thing that can happen is, when you log into, say, the Azure notebook, some of their dependencies are a little bit old, like Pandas or Matplotlib or something like that. He shows you how to import pip and then execute pip inside your notebook to force it to upgrade the dependencies in your project. Oh, okay. And it's good that you're going to put the snippet in our,
Starting point is 00:17:29 our notes. Yeah. The snippet is in there, but basically it shows you how to run pip from code to upgrade stuff, which I think is interesting and useful outside of just notebooks. But if you can't remote into the servers and you still want to upgrade stuff, it's pretty helpful. Yeah. Nice. Cool. All right. That's a lot of notebook news. We'll probably have more next week. Probably. Probably. It's really cool, though, to see so much innovation and creativity around this stuff. So there's kind of a paradox-of-choice problem going on. Like, if I wanted to get started, what the heck would I do? But there's a bunch of good options here. Definitely. Awesome. All right. You got anything extra you want to let everyone know about this week? Just that maybe I should spend more time paying attention to Jupyter. But
Starting point is 00:18:12 other than that, no. Yeah. Jupyter is pretty cool. JupyterLab is exciting. Colaboratory is exciting. Notebooks on Azure is exciting. Datalore is exciting. Yeah. I'll have to pay more attention as well. Do you have any news? No news. Well, when this episode goes out, there's a very good chance that I'll be at PyCon Slovakia. And if I am and you hear this, feel free to come say hi. That'd be cool. Neat. Yeah. So I think that's the right timing.
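Going back to Daniel's tip about upgrading packages from inside a notebook: the snippet from the show notes is not reproduced in this transcript, so this is a substitute sketch rather than his code. Instead of importing pip and calling it directly, which pip does not support as a public API, it runs `python -m pip` in a subprocess using the interpreter behind the current kernel. The helper names are hypothetical.

```python
import subprocess
import sys


def pip_upgrade_command(*packages):
    """Build the command that upgrades packages for the running interpreter."""
    # sys.executable points at the Python behind the current kernel, so the
    # upgrade lands in the same environment the notebook imports from.
    return [sys.executable, "-m", "pip", "install", "--upgrade", *packages]


def upgrade(*packages):
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_upgrade_command(*packages))
```

In a notebook cell this would be called as `upgrade("pandas", "matplotlib")`. Packages that were already imported keep their old version in memory, so restarting the kernel afterwards is usually needed.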
Starting point is 00:18:36 I'm pretty sure it will be. I'll try to line it up that way. All right. Well, thanks for getting all this stuff together, Brian. This is great stuff. Yeah, thank you. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes.
Starting point is 00:18:48 That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast
Starting point is 00:19:07 with your friends and colleagues.
