The Changelog: Software Development, Open Source - ANTHOLOGY — The technical bits (Interview)

Episode Date: December 15, 2023

This week we’re taking you to the hallway track of All Things Open 2023 in Raleigh, NC. Today’s episode features: Heikki Linnakangas (Co-founder of Neon and Postgres hacker), Robert Aboukhalil (Bioinformatics software engineer) working on bringing desktop apps to the web with Wasm, and Scott Ford who loves taking a codebase from brown to green at Corgibytes.

Transcript
Starting point is 00:00:00 This week on The Changelog, we're going back to the hallway track of All Things Open 2023 in Raleigh, North Carolina. Today's episode features Heikki Linnakangas, co-founder of Neon and Postgres hacker, talking about, well, Postgres. And of course, Neon. Little side note here, because Neon is actually one of the sponsors of this podcast. We met Neon at All Things Open,
Starting point is 00:00:37 and we pursued them. We wanted to use their stuff, and we asked them, hey, would you be interested in sponsoring us? And they said yes. And it just so happens they're sponsoring this episode. This is not intentional. Just so you know.
Starting point is 00:00:49 Up second is Robert Aboukhalil, bioinformatics software engineer who's working on bringing desktop applications to the web with Wasm. And last up is M. Scott Ford on the state of fixing bugs and what's been going on with his software consultancy called Corgibytes. Of course, a big thank you to our friends and our partners at Fastly and Fly. This podcast got to you fast because Fastly is super fast globally. Check them out at Fastly.com. And our good friends at Fly will help you put your app and your database in 30-plus regions on six continents with no ops. Check them out at fly.io. What's up, friends?
Starting point is 00:01:39 This episode is brought to you by our friends at Neon: on-demand scalability, bottomless storage, and database branching. And I'm here with Nikita Shamgunov, co-founder and CEO of Neon. So Nikita, imagine you are a tour guide. Give me a tour through the world of Neon. So let's look at a modern developer. As people say, never bet against JavaScript. So there's a more than 50% probability this person is writing JavaScript and TypeScript, using React and Next.js, deploying their code on a platform like Vercel, and really caring about design.
Starting point is 00:02:16 So working with Figma, working with a local designer, or maybe starting to work with an AI designer, using a technology Vercel just shipped called v0. And then you've got to store data somewhere. So you go to Neon or use Vercel Postgres, which is powered by Neon, you push a button, and now you're able to write to and read from Neon. And that kind of just works out of the box. The majority of your time you spend crafting your application, crafting the front end, and the database is just kind of there and just kind of works. And you don't think too much about it. And when you run previews, when you run next versions of your software, you can send your collaborators, your other engineers on your team, or your product managers or designers,
Starting point is 00:03:05 a version of your app, the version of the future that you want to debate, that you want to comment on. And it's fully sandboxed, you know, from your front end to back end to the database. That's like a good part of this world. The world is obviously much bigger than just building front end apps. There's also back end apps, there are Python apps, there are Java apps, and all of those things. We're perfecting the world for the world that I just described. And we think that the rest of the world will follow.
Starting point is 00:03:33 And the rest of the world is Java apps, Rust apps, back-end apps, queues, scheduling, AWS Lambda, Kubernetes, containers. Again, the tech world of the back end is just enormous. But I think perfecting this first world that I described will create a standard in developer experience
Starting point is 00:03:53 that the rest of the developer world will just follow. So you have Vercel Postgres, powered by Neon. You've got Neon as an integration to Vercel. You've got Neon out there at neon.tech as self-serve, where anybody could just go and sign up and start up right now. You're optimizing for this new standard, but what's the response been like? What's the community saying? What's the community's response? We are lately onboarding close to 2,500 databases a day. That's more than one database a minute
Starting point is 00:04:20 of somebody in the world coming to Neon either directly or through the help of our partners, and they're able to experience what it feels like to program against a database that looks like a URL, and to program against a database that can support branching and be like a good buddy for you in the software development life cycle. So that's exciting, and while that's exciting, the urgency at Neon is currently unparalleled. There you go. If you want to experience the future, go to neon.tech. On-demand scalability, bottomless storage, database branching, everything you want for the Postgres of the future. Once again, neon.tech. Bye. Let's begin at the beginning. Postgres.
Starting point is 00:05:34 Yes, Postgres. 1986, something like that. Wasn't it forever ago? This was released from Berkeley University in 1995. Okay. I'm not sure how long it was in development at the university before that, several years. I read there's roots back into the 80s, but I could be wrong. It could be.
Starting point is 00:05:50 Either way, that's ancient history, right? That's a long time ago. And yet, it's the darling of most developers today, Postgres. It's become popular. When I started to hack on Postgres, that was not the case. It was not the most popular one. It was not the darling. So what actually happened? I'm not sure what happened. I think Postgres just matured.
Starting point is 00:06:11 So people used to ask the question, like, why Postgres and why not MySQL or something else, right? I don't really hear that anymore. It's like the default now. Do you think it could be somewhat technical and then also somewhat drama-related? Like, there's been a lot of drama in the MySQL space, hasn't there? Like with licensing, open source. Yeah, shifting, just drama behind the scenes that sort of makes it not very community-friendly. Yeah. Postgres is also, you know, very good technically, but I wonder if that's also a reason to say, don't go there. I'm sure it's a factor. Postgres has always had a slightly different community than many other open source projects.
Starting point is 00:06:53 It's truly community-driven and not owned by any single company. Yeah. So that's different. I think that has helped to keep it alive for a long time, because you can't acquire Postgres. That being said, that community is aging. I'm not sure. You may have seen James Governor's recent post on RedMonk about the aging Postgres community and how we actually transition. Like, where do we go from there?
Starting point is 00:07:12 Yeah. I mean, there's always new people coming, but that's right. I mean, the core people who have been at it for a long time are definitely aging. None of us is getting any younger. Right. Can you summarize some of that, Jared? Well, if you look at the core contributors to Postgres, generally speaking they're men in their 50s. They're, you know, in the fourth quarter of their careers, at least. Maybe they would argue that, but, you know, they're
Starting point is 00:07:35 not in the kickoff stage of a career, or halftime. I would argue fourth quarter; maybe they'd say third quarter. Regardless, they're getting on the older end of the spectrum, and they're asking, what happens to the project as those very key players retire, move on, lose interest? It's not dominated by any one person, though, so there's a lot of people working on it. And if you look at the wider ecosystem, there's a lot of extensions and a lot of stuff happening around Postgres, and there's young people there. Yeah. So there's a lot of potential if we can draw them in to become more active on Postgres
Starting point is 00:08:11 itself. Well, Neon, I mean you and your team, I'm not sure of your ages, but there's fresh, we'll call it fresh blood in the ecosystem. Like, here's a brand new startup, a couple of years old, contributing, building extensions, et cetera. For sure. For sure. And putting my community hat on, that's one reason why I'm excited to work for Neon.
Starting point is 00:08:28 I hope I can actually make a difference on that and bring some new blood to the community as well through the company. So you're a Postgres guy before Neon? I've been a Postgres guy since 2006. I've been working full time on Postgres for different companies. Very cool.
Starting point is 00:08:44 How did you know it was the right choice? What were your criteria for choosing? For Postgres, well, I had never really used Postgres. My background is that I was working at a systems integrator and had some free time on my hands. So I've always been a programmer. I've always been doing stuff, and I'm a big fan of the relational model
Starting point is 00:09:01 once I got introduced to SQL and that. So I had some free time. I was on paternity leave with my daughter, and she was a good sleeper. So I was looking around for projects to contribute to, or if there was something in the open source world I could do. So I started to look at databases. I looked at MySQL code. I looked at Postgres.
Starting point is 00:09:19 I think I looked at some others. But Postgres was the one that was easy to read and easy to do. It was a pleasure to read through and understand and learn more. So I stuck with that. One thing we heard yesterday from an All Things Open attendee is that back in June of this year, I believe, on the Postgres mailing list, you proposed, or maybe not proposed, but brought up something that's probably been stirring for a little while. He called it the most significant change to Postgres,
Starting point is 00:09:48 if it lands or if it happens, in a long time. You want to tell us about that? You must be talking about the multi-threading, changing to a multi-threaded architecture. Yes. So yeah, that came up in conversation at PGCon at the end of May with some other hackers. We were talking about some features,
Starting point is 00:10:08 and wouldn't it be easier if we had a multi-threaded architecture? So I ended up summarizing the discussions, because it seems like there's a rough consensus that if we had a multi-threaded architecture, it would be better at this point. But there's a lot of history, of course. It's not an easy change to go from a multi-process architecture to multi-threaded. Can you explain the foundational difference between multi-process and multi-threaded? Right.
Starting point is 00:10:31 So the key difference between multiprocess and multithreaded architecture is that when a new connection comes in, Postgres launches a new process to handle that connection. In a multithreaded architecture, you would only launch a new thread. And the difference between a process and thread is basically that threads all share the same address space in the process, whereas with processes, each process has its own address space. And that makes a difference in how easily you
Starting point is 00:11:00 can share data or data structures between the connections. So a multi-threaded architecture would make it a lot easier to resize things like the buffer cache, and a lot of other caches that are currently not shared across the connections in Postgres would become easier to share. Right. Does that change the CPU utilization as well? It might, yeah.
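To make the address-space point concrete, here is a minimal Python sketch; it is an illustration, not Postgres code. It assumes a POSIX system because it calls `os.fork()` directly, and the function names are hypothetical. A dictionary stands in for a per-connection cache: a write made by a worker thread is visible to the caller, a write made by a forked child process is not, and a child's write only reaches the parent through a memory segment that was explicitly created as shared before the fork.

```python
import mmap
import os
import struct
import threading


def thread_write_is_visible() -> bool:
    """Threads share one address space, so a worker's write is seen by the caller."""
    cache = {"hits": 0}  # ordinary heap object, nothing special needed to share it

    def worker() -> None:
        cache["hits"] += 1

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return cache["hits"] == 1


def child_process_write_is_visible() -> bool:
    """A forked child works on its own copy of the address space (POSIX only)."""
    cache = {"hits": 0}
    pid = os.fork()
    if pid == 0:  # child: this increment only touches the child's copy
        cache["hits"] += 1
        os._exit(0)  # exit immediately, skipping the parent's remaining code
    os.waitpid(pid, 0)
    return cache["hits"] == 1


def child_write_via_shared_segment() -> int:
    """Sharing between processes needs a segment created as shared *before* forking."""
    segment = mmap.mmap(-1, 8)  # anonymous mapping; on Unix the default flag is MAP_SHARED
    pid = os.fork()
    if pid == 0:  # child writes a 64-bit integer into the shared segment
        segment[0:8] = struct.pack("q", 42)
        os._exit(0)
    os.waitpid(pid, 0)
    (value,) = struct.unpack("q", segment[0:8])
    segment.close()
    return value


if __name__ == "__main__":
    print("thread write visible: ", thread_write_is_visible())         # True
    print("process write visible:", child_process_write_is_visible())  # False
    print("shared segment value: ", child_write_via_shared_segment())  # 42
```

The third function hints at the extra plumbing a multi-process server needs for anything it wants to share between connections; with threads, the plain dictionary in the first function would be enough (plus locking, which this sketch leaves out).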
Starting point is 00:11:23 I mean, if I looked at htop, would I just see, like, when Postgres is being pinged, just one line? Or, like, if I had eight cores, all eight cores lit up? Yeah, so multi-threading wouldn't directly do that. Just by switching to multi-threading, we wouldn't get that. Postgres can already utilize multiple cores by launching multiple processes to process one query.
Starting point is 00:11:41 Right. But when parallel query was implemented a few years ago, a lot of the effort actually went into working around the fact that it's a multi-process architecture. So you actually have to build a lot of infrastructure to share the data between the processes,
Starting point is 00:11:56 which would be a lot simpler in multi-threaded architecture. So I think we could probably do more. It would probably speed up the development of parallel query as well, although that would be a separate project to do that. That's another mailing list post. Yeah. So multi-threaded software has specific requirements in order for it to be thread safe, right?
Starting point is 00:12:18 Yeah, sure. That used to be a problem back 20 years ago when this was probably the first time discussed. I think if you look back at the 95 or 96 discussions, and I think I've seen some comments saying, well Postgres is multi-processed now, but maybe we'll switch to multi-threaded later, and that was like 25 years ago.
Starting point is 00:12:37 Right. What was the question? Well, I didn't quite get there, but here it is: if you were assuming multi-process for all these years, these 25 years, and not thinking multi-threaded, I imagine it's not an insignificant change to the software. Oh, sure. Yeah. Oh, right. So thread safety, that used to be a big deal a long time ago. But nowadays, when people are writing software, they would start with a multi-threaded architecture. So that's not really a problem anymore.
Starting point is 00:13:07 Like, all the libraries are thread-safe. There are thread-safe versions of everything. So that was a good argument, or would have been a problem, 20 years ago. Not really a problem anymore. Not really a problem now. But of course, when switching, all the existing code needs to be adapted somehow. Yeah, exactly. So that is a problem.
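One concrete kind of code that has to be adapted: in a process-per-connection server, a module-level global is effectively per-connection state, because every connection has its own copy. Run the same code on threads and that global becomes shared by everyone. The minimal Python sketch below is an illustration, not anything taken from the Postgres proposal; it shows the problem and one common remedy, thread-local storage. `current_user`, the handler names, and `serve` are hypothetical. A barrier makes every thread write before any thread reads, so the outcome is deterministic.

```python
import threading

# Per-connection state kept in a plain global. Fine when each connection is a
# separate process; shared by every connection once connections are threads.
current_user = None

# Thread-local storage: each thread sees its own attributes on this object.
_conn_state = threading.local()


def handle_with_global(user, results, barrier):
    global current_user
    current_user = user           # every thread writes the same variable
    barrier.wait()                # wait until all threads have written
    results[user] = current_user  # all threads now read the last writer's value


def handle_with_thread_local(user, results, barrier):
    _conn_state.current_user = user
    barrier.wait()
    results[user] = _conn_state.current_user  # each thread reads its own value


def serve(handler, users):
    """Run one thread per simulated connection and collect what each one read."""
    results = {}
    barrier = threading.Barrier(len(users))
    threads = [threading.Thread(target=handler, args=(u, results, barrier)) for u in users]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    users = ["alice", "bob", "carol"]
    print(serve(handle_with_global, users))        # every entry shows the same user
    print(serve(handle_with_thread_local, users))  # each user maps to itself
```

Thread-local storage preserves the old per-connection semantics, but it does nothing to find which globals need converting in the first place, and that detection problem is the hard part discussed in this conversation.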
Starting point is 00:13:25 That's a long problem. And that's the hard part of all of this, really. Changing Postgres itself, but also the whole ecosystem to be thread safe. Most of it probably already would be, but how do you know? Like, how do you tell? So that's going to be the hard part in this
Starting point is 00:13:40 to figure out how to detect the cases where something is not thread-safe. I mean, it seems like this feature is an excellent case study in how a large, open source, multi-organization core team introduces an idea, agrees on the idea, like the governance involved,
Starting point is 00:14:04 and then the actual work: who does it, how does it get divvied out, and then how does it actually land and transition? Isn't that a really complicated beast? Yes, it is. How does it work? We'll see how it ends up.
Starting point is 00:14:19 there's no voting system, there's no, it is actually hard to even make decisions like that because it's not well defined. How would you do that? The rough idea is that you try to find consensus and if someone very strongly disagrees, then we work through those disagreements. But yeah, it can be hard to pull off big changes like that. But at the end of the day, like what really, first thing that needs to happen is someone actually needs to do all of the work to show if this is what it was. You got the idea out there.
Starting point is 00:14:49 Are you asking for consensus and then the work? What's the stage of this idea? It's just an idea at the moment. I've spent a few hours, days maybe, thinking about it and writing some very preliminary stuff that, you know, some
Starting point is 00:15:05 small changes that we should make anyway just to clean up the code. But no, there's no real concerted effort yet. Yeah, that's going to be a lot of work. I mean, the first thing to do, and what I wanted to do with the posting in June, was to make sure that I'm not missing something, that I actually understood right that there is consensus that this would be a good thing if we had it, and that there are no strong objections from any of the core people on that. Otherwise, it would be pointless to spend any time on it.
Starting point is 00:15:35 Yeah. But the next step really needs to be to actually start to write some code to do that. I don't know if I'm going to do that, maybe, or maybe I'll have to do it together with the team. Sure. But we'll see. Is that something that would be beneficial for Neon? I imagine it would be.
Starting point is 00:15:49 It would be. And something Neon would be willing to fund the development of? Yeah. I think we, yeah. So it would benefit Neon because we do all the scaling, and that becomes easier in a multi-threaded architecture, because it makes it easier to resize the buffer cache. It makes it easier to share some of those caches. Kind of the same problems that everyone has,
Starting point is 00:16:09 like it would benefit everyone. But yeah, for Neon, that would really help with all the scaling part. Gotcha. When we had Nikita on the show, probably 18 months ago, roughly. Exactly this time last year. Oh, was it?
Starting point is 00:16:22 I think so. Okay, a year ago. He mentioned three or four patches that Neon adds to Postgres to customize for your guys' needs and how they're trying to upstream those. He wasn't sure if that was ever going to happen, but he thought, you know, good chance, but it takes time, et cetera. Any update on upstream contributions from your team? Yeah. So those patches are still out there.
Starting point is 00:16:41 Not much has happened, unfortunately. The biggest patch we have is to do what's called the Storage Manager API in Postgres, which isn't really an API because there hasn't really been any other implementations in the past 20 years. So that patch is still out there to make that more pluggable, but there has been no progress. So with the Postgres community,
Starting point is 00:17:01 and I'm sure other communities have the same problem, it's hard to sometimes get attention to these things. If no one else is really feeling the pain, there isn't much happening. Although on that, there's been a lot of good discussions and some other ideas people could do with those patches and those APIs. But yeah, nothing has been committed yet. The patches are essentially the way it writes to disk. Instead of writing to the disk, it writes distributed?
Starting point is 00:17:26 Yeah, so Neon plugs in at a really low level. So whenever Postgres would read a page, an 8-kilobyte page from disk, like we hook in at that point. So we read it from elsewhere, like from our storage system. So yeah, making that, having an extension point there in Postgres would help to eliminate those patches. That sounds like your competitive advantage, though. Neon's competitive advantage.
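The extension point described here can be pictured as an interface for "give me 8 kB page N of this relation", with the code above it unaware of where the bytes come from. The Python sketch below is a loose illustration under stated assumptions, not Postgres's C Storage Manager API and not Neon's implementation: `PageSource`, `LocalFilePages`, `RemotePageServer`, and `read_block` are hypothetical names, the remote side is a dictionary standing in for a network call, and Postgres's splitting of large relations into segment files is ignored.

```python
import os
import tempfile
from abc import ABC, abstractmethod

BLCKSZ = 8192  # Postgres's default page size: 8 kB


class PageSource(ABC):
    """Hypothetical pluggable hook: return page `block` of `relation`."""

    @abstractmethod
    def read_page(self, relation: str, block: int) -> bytes: ...


class LocalFilePages(PageSource):
    """Vanilla behavior: page N lives at byte offset N * BLCKSZ in a local file."""

    def __init__(self, directory: str):
        self.directory = directory

    def read_page(self, relation: str, block: int) -> bytes:
        with open(os.path.join(self.directory, relation), "rb") as f:
            f.seek(block * BLCKSZ)
            page = f.read(BLCKSZ)
        if len(page) != BLCKSZ:
            raise IOError(f"short read for {relation} block {block}")
        return page


class RemotePageServer(PageSource):
    """Stand-in for a storage service; a dict replaces the network round trip."""

    def __init__(self, pages: dict[tuple[str, int], bytes]):
        self.pages = pages

    def read_page(self, relation: str, block: int) -> bytes:
        return self.pages[(relation, block)]


def read_block(source: PageSource, relation: str, block: int) -> bytes:
    """Code above the hook is identical no matter which source is plugged in."""
    page = source.read_page(relation, block)
    assert len(page) == BLCKSZ
    return page


def demo() -> bool:
    pages = [bytes([i]) * BLCKSZ for i in range(3)]  # three fake pages
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "16384"), "wb") as f:
            f.write(b"".join(pages))
        local = LocalFilePages(d)
        remote = RemotePageServer({("16384", i): p for i, p in enumerate(pages)})
        return all(read_block(local, "16384", i) == read_block(remote, "16384", i)
                   for i in range(3))


if __name__ == "__main__":
    print(demo())  # True: both sources return the same bytes
```

The design point is the narrow interface: as long as callers only ask for fixed-size pages by relation and block number, swapping the local file reader for a remote service does not touch the code above it.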
Starting point is 00:17:48 Like, couldn't, if that patch goes into open source, does that become a threat? Well, it's already out there open source, so anyone can already start using it. That's true. And Neon, you know, lives and dies with Postgres. Right, okay. That's what I was trying to get to. Like, if this can be used by the enemy, let's just say.
Starting point is 00:18:04 Is that a bad thing? You know, I made peace with that thought a long time ago when I started to work on Postgres. It's a liberal license; people can take it and do whatever they like with it. It speaks to the DNA and the outlook of the company, which is why I asked that. Do people see
Starting point is 00:18:19 Neon as a player, a safe player, I don't know, a nice player in the Postgres world, or are you trying to build a proprietary moat? I sure hope people see us as friendly people. Okay, that's a better word, friendly, yes. We want to partner with everyone and we like to make friends. Right.
Starting point is 00:18:37 So you're waiting on those particular patches, who knows. Postgres as a project, you know, you say you live and die with it. It seems like through its history it's had times where it's, quote unquote, fallen behind with features. And other people pop up and say, you know, look at NoSQL, for instance. Look what we can do with JSON, right? And then eventually Postgres was like, well, we added all the JSON things and now we can also do that. What's next in that line? What are you seeing out there
Starting point is 00:19:05 or maybe what you guys are building where it's like Postgres can't do that but people are doing it and now it's going to have to catch up at some point? That's a good question. Putting my knee on the handle on the storage related stuff that we are doing separation of compute and storage. Although that is out there in the open source
Starting point is 00:19:24 so people could take up and run with it. I don't know if that will fully take over the world or if that will stay to be something that we do. We'll see. But there are competitors doing similar architectures as well. Then there's all of the exciting stuff happening with PgVector, for example, a vector service. That's a hot topic. But I think Postgres is actually doing pretty well there.
Starting point is 00:19:46 pgvector is popular, and it keeps improving at its own pace, and that's all good. A similar thing with PostGIS. Postgres is pretty dominant in the GIS world with that. Yeah, good point. Are those things that, when using Neon, are those things that are pre-integrated for you as a user of Neon database?
Starting point is 00:20:09 Or is it like click a box, get pgvector? How does it work with plugins? Yeah, we provide those extensions. You just do CREATE EXTENSION and you get it. So you just have full Postgres access and you're just doing your thing, huh? Yep. Okay. So geo-distributed Postgres around the world.
Starting point is 00:20:27 Let's talk about that. Okay. Can you do that? No, we don't do that at the moment. Okay. We've been thinking of that. We have a lot of good ideas. I know you do.
Starting point is 00:20:36 I remember asking Nikita about that as well. I'd love to hear from your mind. What are some ideas around this? So what you could do, first of all, you can run read-only replicas in different regions. That's kind of the first step, easy step. With Neon, we could also run the storage in different regions and do the replication at a lower level. Okay.
Starting point is 00:20:57 We have no plans for multi-master or multiple-writer systems. There are other projects trying to do that, but that's always a hard problem. And it introduces a whole new set of problems, so we're not going there at the moment. Yeah, you've got to kind of break the CAP theorem to do that. People are claiming it's possible. Is there a real demand for that, or is it just something that people like me
Starting point is 00:21:18 like to talk about and ask about? I don't know. I haven't really seen very... We don't hear a lot of people requesting that, let's put it that way. People talk about it, people ask about it, but not in a serious way. Like, I don't think we've lost any customers
Starting point is 00:21:33 because we don't have it. Given Neon today, what is the current architecture? If you're not geo-distributed, what is the architecture? When you deploy Neon, what are the benefits of using it? Why do people choose Neon? You know, you don't write to disk, you write distributed. How does that actually play out?
Starting point is 00:21:49 What's the architecture? So the core of the architecture is the separation of compute and storage, and then we have a control plane that kind of manages those Postgres instances in VMs, and there's a proxy; there are some moving pieces. But the big differentiator that you get with that architecture is that it's serverless. What we mean by that is that we actually shut down Postgres if you're not using it.
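The separation of compute and storage is what lets the compute side be shut down when idle, and it also underpins the branching and point-in-time reads discussed next. If the storage service keeps page versions tagged with a write-ahead log position (LSN), a "branch" can be just a pointer to its parent plus the LSN it branched at, sharing every older page without copying it. The Python sketch below is a toy under stated assumptions, not Neon's design: `Timeline`, `write`, `read`, and `branch` are hypothetical names, LSNs are plain integers, and whole page images are stored where a real system would replay WAL records.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Hypothetical branch: its own page versions, plus a parent and branch-point LSN."""

    name: str
    parent: "Timeline | None" = None
    branch_lsn: int = 0
    # page_id -> list of (lsn, page_image), kept sorted by lsn
    versions: dict[int, list[tuple[int, bytes]]] = field(default_factory=dict)

    def write(self, page_id: int, lsn: int, image: bytes) -> None:
        history = self.versions.setdefault(page_id, [])
        if history and lsn <= history[-1][0]:
            raise ValueError("LSNs must increase on a timeline")
        history.append((lsn, image))

    def read(self, page_id: int, lsn: int) -> bytes:
        """Return the newest version of the page at or before `lsn`."""
        history = self.versions.get(page_id, [])
        idx = bisect_right([v_lsn for v_lsn, _ in history], lsn)
        if idx:
            return history[idx - 1][1]
        if self.parent is not None:
            # Not modified on this branch yet: read the parent as of the branch point.
            return self.parent.read(page_id, min(lsn, self.branch_lsn))
        raise KeyError(f"page {page_id} does not exist at LSN {lsn}")

    def branch(self, name: str, at_lsn: int) -> "Timeline":
        """Copy-on-write branch: nothing is copied until the child writes."""
        return Timeline(name, parent=self, branch_lsn=at_lsn)


if __name__ == "__main__":
    main = Timeline("main")
    main.write(1, 10, b"v1")
    main.write(1, 20, b"v2")
    dev = main.branch("dev", at_lsn=15)  # branch from main as it was at LSN 15
    print(dev.read(1, 100))   # b'v1'  (inherited from main at LSN 15)
    dev.write(1, 30, b"dev")
    print(dev.read(1, 100))   # b'dev' (the branch's own change)
    print(main.read(1, 100))  # b'v2'  (main is unaffected)
    print(main.read(1, 15))   # b'v1'  (point-in-time read on main)
```

In this toy, discarding a branch is just dropping its `Timeline` object, and a point-in-time query is an ordinary read at an older LSN, which is why no separate backup copy is needed for either.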
Starting point is 00:22:11 So that's really good if you're a developer: you don't need to worry about forgetting to shut it down, in a nutshell. The other thing that the storage system can do is branching. It kind of replaces traditional backups and WAL archiving, so you can do point-in-time queries: you can easily spin up a new Postgres instance against an older point in time and start running queries against that, stuff like that. Branching is something that is kind of unique, and we hear a lot of good things about it. People like that if you're a developer, you can create a branch of your
Starting point is 00:22:44 development database, or even your production database, and do your changes, run your PR against that, and when you're done you can forget about it, or you can refresh it. Right. You said storage system. Is that like a different term that sits above the database? So Neon is the storage system, and then there's the database? Give me an idea what you mean when you say storage system.
Starting point is 00:23:13 and it understands the Postgres write-ahead log format, the transaction log, and parses that. So whenever Postgres needs to read a page, it goes and fetches the page from the storage system instead and there's an interface for that. So that's different from just running Postgres on a remote volume because it actually understands about the Postgres disk format and it can do this branching, it can do the copy and write stuff underneath that. Gotcha. What else is exciting to you right now in the world of Postgres or even
Starting point is 00:23:43 beyond? Well, I mentioned pgvector already. I think that's an exciting thing; people are doing a lot of exciting stuff with that. In the Postgres world, there's stuff happening with asynchronous I/O from colleagues at Microsoft. They're doing work on that. I think that will improve I/O speed, and that's really good for Neon as well, because we've separated the storage. That actually helps us a lot.
Starting point is 00:24:07 So I'm hoping to spend personally some time reviewing those patches to see them go in. Cool. I love it. Yeah, thanks for talking with us, man.
Starting point is 00:24:15 Neon's awesome. Thank you. Appreciate it. Thank you. So I'm here with Ian Withrow, VP of Product Management at Sentry. So Ian, you've got a developer-first application monitoring platform. It shows you what's slow, down to the line of code. That's very developer-friendly, and it's making performance monitoring actionable. What are you all doing that's new,
Starting point is 00:25:06 what's novel there? Traditionally, in errors, the strength of Sentry is that we've taken not just a stream of errors and said, hey, go look at this. All these error codes are flowing in, and we actually look at them, we try to fingerprint them and say, hey, we've actually grouped all these things. And then we give you everything you need within Sentry to go and solve that error and close that out. And that's, I think, driven tons of value for our users. And traditionally, if you look at performance, it's not that thing. It's looking at certain golden signals, setting up lots of alerts, maintaining those alerts, grooming those alerts, and then detecting them. And then maybe you have a war room and you try and look at traces, or maybe
Starting point is 00:25:49 you realize, oh, it's this engineering team that owns it. Maybe they'll look at logs, whatever they have available. Performance is very oriented toward detection and then isolating where the problem may exist. And root-causing is often an exercise left to the user. Good performance products provide a lot of context and details that an experienced engineer or DevOps professional can kind of parse and make sense of, and try to get to a hypothesis of what went wrong. But it's not like that Sentry error experience
Starting point is 00:26:24 where it's like, here's the stack trace, here's all the tags, oh, we see it's this particular segment of code, and Ian did the commit that changed that code, and do you want to file an issue and assign it to Ian? It's not like that crisp, kind of tight
Starting point is 00:26:39 workflow that we have errors. This is breadcrumbs. Right. And we said, hey, maybe there's no reason why we couldn't do this for performance. Let's try. Okay. So you took a swing. You tried. Describe to me how that trial works. If I go to my dashboard now and I enable APM on my application, what are the steps? Largely because we kind of encourage you to go and set up transaction information when you set up Sentry. You probably, as a user, probably don't need to do much. But if you skip that step, you do need to configure to send that data in your SDK.
Starting point is 00:27:12 And what happens is we now start looking at that information. And then when we see what we call a performance issue, we fingerprint that and we put it into your issues feed, which is already where you're looking for error issues. Right, it's not a separate inbox? This is the same inbox. The same inbox, yeah. Now, we obviously give logical filters if you just want to look at those. And for newer users, sometimes we detect, hey, you've probably never seen this before, and because we know we build for the mass market, we do things that bring your attention to it. But it's the same workflow you have for errors today, so you don't have to learn something new to take advantage of these things. So, you asked about the experience. Last fall we did the experiment, the first one, which we called N+1, and honestly, we didn't know how it would go. But people liked it. Like, we
Starting point is 00:28:01 we kind of know people like it when they start tweeting and saying nice things about it. And so, yeah, it got traction. Very cool. So if your team is looking for a developer-first APM tool to use, check out Sentry and use the code CHANGELOG when you sign up. And you're going to get the team plan for free for three months. Make sure you tell them we sent you because they love hearing from our listeners. Check them out at Sentry.io. Again, Sentry.io. Again, Sentry.io.
Starting point is 00:28:28 That's S-E-N-T-R-Y dot I-O. Are we started yet? This is the show, man. All right. We're here with Robert Aboukhalil. Hello, hello. His second appearance on The Changelog. Apparently.
Starting point is 00:28:58 Allegedly. Allegedly. Sorry, that's a better word. Apparently also works. According to you, and with some verifiable memory of mine, we talked to you at OSCON, probably 2018? 2017? 2017 maybe? I would say 2019,
Starting point is 00:29:14 yeah. Okay. And we talked about WebAssembly. We did. Was this in Europe? Was it in Europe? No, no. It was in Portland? It was in Portland. You were there. I went to OSCON London one time by myself. That's what that was.
Starting point is 00:29:29 Okay. Was WebAssembly a thing then? Yeah, it was. Yeah. It was a thing. It must have been because you were into it. Not as much a thing as it is now. Okay, this is sparking a memory, okay?
Starting point is 00:29:39 Isn't he? Yeah. Well, backstory for you, Adam, is he walked by earlier, and we both kind of locked eyes. And I was like, do I know you? And he's like, do I know you or something? And he's like, yeah, I think you had. And I was like, I have no memory of this. I said the same thing.
Starting point is 00:29:51 I'm like, I know this guy. I have a memorable face. Jared and I went to an OSCON together in Austin, I want to say, right? Probably. 2017. Portland in 2018. Portland 2018. That's probably where we met.
Starting point is 00:30:03 And then we haven't been there since because it stopped. Yeah. So that's why I thought the only OSCON we had been to was in Austin. So in my memory, until this moment, have you now inserted one brand new OSCON in my life, which I went to? I definitely went to Portland in 2019 in the summer, for sure. So, yeah, because I took my daughter and my mom to meet with family. And that was OSCON. Maybe it was 2019.
Starting point is 00:30:29 Anyways. Either way. Neither here nor there. History has been painted. Robert was there. He's probably correct and we're probably wrong. He was into WebAssembly. He's into bioinformatics.
Starting point is 00:30:37 Yes, I am. You're still into both of these things. Surprisingly, yes. And I don't know what we talked about then specifically, but one thing that is interesting to me about WebAssembly is how much promise it has, but how little, in my purview, practical use it has beyond tinkers or people with very specific needs. So just curious your perspective on that. Yeah, I think I generally agree with that.
Starting point is 00:31:04 I think people who think that WebAssembly is going to be used everywhere are just wrong. Just wrong? Okay. It's just not what it's meant for. It's a very heavy-duty tool. Like if you have needs for running compute-intensive workloads in the browser, like Figma and Photoshop, Google Earth, or bioinformatics, I should add, all those are great applications for WebAssembly because for the first time you can take code that's not written in JavaScript
Starting point is 00:31:30 and bring it to the browser. But if you're building your typical web application that doesn't have any sort of compute, any sort of processing, audio, visual, then you probably don't need it. That's kind of my view on it. What about these people that are taking it server side? There's a lot of talk about that as well.
Starting point is 00:31:50 I mean, do you dip into that area at all? A little bit. So there is a lot of excitement about that. I don't share that excitement. Okay. Because here's the thing. When you're running WebAssembly in the browser, it lets you do something that was previously impossible.
Starting point is 00:32:11 You just couldn't take a C program and run it in the browser. Yeah. Except maybe asm.js, but that was kind of a precursor. It lets you do things like SIMD. That's also impossible with just JavaScript. But once you leave the browser, you can do whatever you want, so WebAssembly is just one extra alternative to the hundred others you have. So from that angle, there are a few use cases that I think are pretty valuable for WebAssembly on the server.
Starting point is 00:32:39 Maybe you want to extend your application, let's say, with plugins, and you want to let users write whatever code they want, and you want to execute that securely. WebAssembly is a good sandbox for that. But then again, you're not going to reimplement that yourself. You're going to use some other tool that maybe under the hood uses WebAssembly to solve that problem.
Starting point is 00:33:01 Okay. What kind of stuff are you doing? So I'm doing mostly web stuff. So bringing bioinformatics tools to the web for either building applications that analyze data in the browser so that you don't have to figure out bioinformatics dependencies, which are kind of a mess. If you want to keep your data private, it's kind of a local-only type workflow. The other thing I'm really interested in is something I'm talking about tomorrow
Starting point is 00:33:32 is using WebAssembly to power interactive tutorials for command line tools, so that, you know, instead of spinning up a container for every student who logs into your website, which is super expensive, you could run these tools in the browser, give them a similar experience, and it's much, much cheaper for you to host. What should we know about bioinformatics that makes sense to us?
Starting point is 00:34:00 What exactly is bioinformatics? Oh, that's a good place. Can you say that three times fast? Bioinformatics, bioinformatics, bioinformatics. That was not fast enough. There was a pause in there. I'll say it three times slowly. Please explain. So bioinformatics is using computer science and software engineering to analyze biological data. Okay, like DNA? Yes, exactly. So for example, if you're interested in knowing, I don't know, which diseases you might be at risk for,
Starting point is 00:34:30 you could take a blood draw, isolate the DNA, sequence it, figure out what all the letters are, compare those to a reference, figure out what's different there, and check whether that has been associated in the past with some disease or something like that.
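To make the "compare those to a reference" step concrete, here is a deliberately toy Python sketch. It assumes the sample sequence is already aligned to the reference and has the same length, which real pipelines only reach after read alignment; actual variant calling also has to handle insertions, deletions, and sequencing quality. The name `find_differences` is hypothetical and not from any tool mentioned in this conversation.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Toy model: assumes both sequences are already aligned and equal length.
    Positions are 1-based, matching the convention used in VCF files.
    """
    if len(reference) != len(sample):
        raise ValueError("toy model expects pre-aligned, equal-length sequences")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    for pos, ref_base, alt_base in find_differences("ACGTACGTAC", "ACGTTCGTAA"):
        print(f"position {pos}: {ref_base} -> {alt_base}")
```

The "has that been associated with some disease" part would then be a lookup of those positions against annotation databases, which is outside the scope of this sketch.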
Starting point is 00:34:47 Right. And so the process of figuring that out, the algorithms and the software around that, is basically bioinformatics. So what does it take to take these kind of applications that are probably behind a desktop application, right? They're probably written in C or for a desktop environment, and you want to take those kind of applications to the web to essentially open it up where you can just go to any platform,
Starting point is 00:35:12 Linux, Mac, Windows. Is that the reason why? Yeah, yeah. And so, like, one example is I have this website called fastq.bio. So it takes in some data that you get out of an instrument and runs some really quick data analysis to tell you how good of a quality the data is. And, you know, it runs in the browser
Starting point is 00:35:33 because that's just super convenient. People drag and drop their files and they're done. They don't have to figure out how to install it, how to set it up, and all that stuff. So that's one use case. You wouldn't necessarily do super heavy-duty analysis. It's still the browser, so you're kind of limited by what the user has, but it's a nice way to cover a ton of use cases that previously were not covered.
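fastq.bio's internals aren't described here, so the following is only a minimal Python sketch of the kind of quick quality check such a tool might run. It assumes standard four-line FASTQ records with Phred+33 quality encoding (the common Sanger/Illumina 1.8+ convention) and computes each read's mean quality score; `parse_fastq` and `mean_phred` are hypothetical names.

```python
def parse_fastq(text: str):
    """Yield (read_id, sequence, quality) tuples from four-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header[1:], seq, qual


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 (score = ord(char) - 33)."""
    if not quality:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    records = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n"
    for read_id, seq, qual in parse_fastq(records):
        print(read_id, len(seq), mean_phred(qual))
```

Because the whole check is a single pass over the file with no external dependencies, it's the sort of lightweight analysis that fits the "preview in the browser" use case Robert describes.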
Starting point is 00:35:56 And you specialize in the Wasm world in bioinformatics in particular? Like, that's where your usage of Wasm is, in that silo. Yeah, that's right. So I have a tool called BioWasm. BioWasm? Yes. That's pretty cool. Can you say BioWasm for a second?
Starting point is 00:36:15 BioWasm, BioWasm, BioWasm. Much easier. Yeah, that's true. Speaking of, how do you guys pronounce Wasm? Is it Wasm or Wasm? Well, I call it Wasm. Okay. But I'm open to either direction. I don't even understand why I call it Wasm, but I do call it Wasm. It's WebAssembly, so Wasm. Yeah, it's Wasm. I mean, one time I called it Wasm because I wanted to rhyme with awesome, but that was just a means to an end. Right, right, right. But I do call it Wasm.
Starting point is 00:36:45 I'm not sure why. I don't know either. I think we may have been on a podcast with somebody who seemed to be more knowledgeable than we were and called it Wasm, and so we kept going there with him. That's true. Although, it didn't work for Richard Hipp. I mean, I still call it SQLite. He's definitely more knowledgeable than I am about the project.
Starting point is 00:37:02 Yeah. So, yeah, I'll stick with WASM until I'm convinced otherwise. Sounds good to me. Yeah. Now, what do you call it? I call it WASM. And so why do you call it WASM? Because we did.
Starting point is 00:37:14 I don't know. I don't know. Nobody knows. Well, that's the thing. Sometimes the first way you hear it is just how you say it. Right. A weird phenomenon in computer science, and in podcasting or real-life conversation, is that
Starting point is 00:37:27 a lot of times, with a term or an acronym or whatever it is, we'll read it to ourselves for years, but we've never actually had to say it to somebody else. And then you have that moment of, how do I say this? I've been reading it for years, writing it for years,
Starting point is 00:37:43 and it's a weird moment that we all experience. So maybe we just had that with Wasm. Okay. But I'm glad that we're all on the same page. That is good. We have consensus.
Starting point is 00:37:52 Excellent. Although on our show recently, Christina Warren did say, yes, I call it Jif. And then she just continued to talk as if we shouldn't stop the world and discuss. Do you remember that?
Starting point is 00:38:01 Well, she's here. We can get her on the mics again. Christina's here? Yeah, I saw her downstairs. All right. We'll have to get her on the mics. We'll have to get her. Hey, listen. Our listeners, a.k.a. Jared, listened to this part of the show and was upset because we didn't get into the beef about Jif versus Gif.
Starting point is 00:38:14 I was upset in the moment. But she talks too fast, so I just let it go. I thought it was an appropriate amount of speaking cadence, but I will agree. I missed that argument. We had better things to cover, though. We did.
Starting point is 00:38:27 Let's get back to Robert. We also have better things to cover right now. Yeah, we do. We're sidetracked. Okay, so bioinformatics, taking applications that are for the desktop to the web, what kind of applications make the most sense? You mentioned this one where it sort of does data analysis. What does the web need?
Starting point is 00:38:43 What does a user base need of the web that can use these kind of tools and specific to what you know and then just in general for what WASM can actually do? Yeah, so I think for, it's pretty similar across the board. I think for bio, you know, tools that do some sort of preview of an analysis are really useful. Some analyses are just really small too. Like if you're analyzing, let's say the genome of viruses, they're pretty tiny.
Starting point is 00:39:10 So you could actually just run the whole thing in the browser. And so that gives you both the advantages of not having to install the tools and to do it in a privacy conscious way. In terms of more broadly outside bio, because you have audiences that aren't biologists, is that right? That are what? That are not biologists. We haven't surveyed them recently, but I think that's fair. Okay. I would say we got at least
Starting point is 00:39:38 one. Okay. That's good. I guess there are a few categories. If you have a tool that you already have in another language, and you really want to bring it to the web, and you don't want to rewrite it all in JavaScript, I think that's a great use case. Yeah. If you have a slow application that has portions of it that are really heavy JavaScript compute, then in some cases, and this is something that also tends to be overplayed,
Starting point is 00:40:07 it doesn't always happen, but you can get performance improvements by swapping those portions out for WebAssembly. But you can also get worse performance. And yeah, those are a couple of applications that I think are pretty relevant. Describe worse performance. Is it like, because sometimes access is enough and I'll wait
Starting point is 00:40:28 and I can't install it on my system or I can't because literally, I literally can't install the application, but I can browse the web and I can authenticate on the web. Yeah. So one big thing that I've noticed is that when you have a WebAssembly module
Starting point is 00:40:42 and it needs to communicate a lot back and forth with the JavaScript world, that is super expensive. So ideally, your module takes in a small amount of data, does a bunch of stuff, and returns a small amount of data. But if you're constantly returning large chunks, that gets expensive, and that's because WebAssembly only understands numbers. So if you pass in strings, it converts to a number. Pass in an object, it converts to a number.
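In the basic case, exported Wasm functions take and return numeric types such as i32 and f64, so a string has to be encoded to bytes, copied into the module's linear memory, and handed over as a pointer and a length. Here is a minimal Python sketch of that marshalling under a toy model: a `bytearray` stands in for linear memory and a naive bump allocator stands in for the module's real allocator. `ToyLinearMemory`, `pass_string`, and `read_string` are hypothetical names, not part of any real Wasm runtime API.

```python
class ToyLinearMemory:
    """Toy stand-in for a Wasm module's linear memory (a flat byte array)."""

    def __init__(self, size: int = 1024) -> None:
        self.buffer = bytearray(size)
        self.next_free = 0  # naive bump allocator: never frees anything

    def alloc(self, n: int) -> int:
        if self.next_free + n > len(self.buffer):
            raise MemoryError("toy memory exhausted")
        ptr = self.next_free
        self.next_free += n
        return ptr


def pass_string(memory: ToyLinearMemory, text: str) -> tuple[int, int]:
    """Encode text as UTF-8, copy it into memory, and return (ptr, length).

    These two integers are all that would actually cross the boundary.
    """
    data = text.encode("utf-8")
    ptr = memory.alloc(len(data))
    memory.buffer[ptr:ptr + len(data)] = data  # the per-call copy that adds up
    return ptr, len(data)


def read_string(memory: ToyLinearMemory, ptr: int, length: int) -> str:
    """Decode a string back out of memory, as the JavaScript side would."""
    return bytes(memory.buffer[ptr:ptr + length]).decode("utf-8")


if __name__ == "__main__":
    mem = ToyLinearMemory()
    ptr, length = pass_string(mem, "the")
    print(ptr, length, read_string(mem, ptr, length))
```

Every crossing pays for an encode plus a copy in each direction, which is why keeping the data that crosses the boundary small and infrequent, as Robert suggests, tends to matter as much as how fast the code inside the module runs.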
Starting point is 00:41:10 Do you know the conversion by any chance? Like if I said the word the, what number is that? Oh, of course. It's 86,112. No, I'm kidding. That'd be cool if you knew. It would. You could have kept going. We totally
Starting point is 00:41:25 bought it. I would have been spooked. I would have been like, oh my gosh. Well, that's cool. Numbers only. Numbers only. So the translation layer in between is expensive. Yeah. And so that's actually one way in which you can try to optimize the performance is that if you switch off some JavaScript with WebAssembly, you can try to trim that down in order to speed it up. Yeah, makes sense. Back to your current interest of CLI tutorials in the browser. Yeah. Are you giving people full-fledged Linux environments in the browser,
Starting point is 00:41:59 or how does it work? Not yet. So right now, in v1, I have to compile every tool to WebAssembly. And then I have xterm.js, which simulates a console, and I kind of hook those up together. In the future, what I'm going to do is actually swap that out for a full-blown Linux OS in the browser. That's going to be a little slower, but
Starting point is 00:42:27 it's going to be worth it for getting some things on there that are otherwise hard to do just by directly compiling. And the way this works is it uses an open source project called v86. So they wrote essentially a CPU
Starting point is 00:42:44 emulator in Rust, and they compiled that to WebAssembly, and that's kind of how they emulate the whole operating system. It boots up, there's a BIOS, there's everything. It's pretty wild. That'd be kind of cool, man. Can you
Starting point is 00:42:59 emulate any BIOS or just a particular BIOS? I honestly don't know what a BIOS does. Okay. It's a basic input-output system. Except I know how to get there in most cases. Delete, delete, delete. Maybe one of the Fs. It could be an F11.
Starting point is 00:43:16 It could be an F10. Who knows? Just hit all the Fs. You got to watch real fast. Which was a delete? Gosh, I missed it. It's like booted up already. Well, I think of that because if you can emulate those things, you can kind of give something a playground to
Starting point is 00:43:29 configure hardware or to configure a BIOS or whatever it might be to be like, okay, this is how you change the boot order. This is how you set these two NVMe drives to be the boot or to the USB or whatever it might be. Or this is how you set up virtualization in this particular Intel CPU, for example.
Starting point is 00:43:47 Those are the kind of things that you kind of have to have the hardware to learn. Until you have the hardware, you can't learn it. And then you're kind of by yourself. You know what I mean? If you could do it in an environment like that, there could be interactivity because you're emulating it. Yeah. I was mostly thinking, like, once you're logged in, past boot time.
Starting point is 00:44:05 Right. Yeah, this is an interesting use case for it. Yeah, it's a black box. I mean, you go to the forums, you'll find zillions, and I don't mean that like literally zillions, but quite a lot, of people saying, how do you do this with this BIOS or whatever, or AMI, you know, all the BIOS out there. And you got somebody showing screenshots. And that's just so like, that's caveman knocking rocks together trying to make fire. You know, you can have this emulator.
Starting point is 00:44:28 You'd be like, this is how it works. That would be amazing. You know? And you don't have to have the hardware. It's just here in the browser to play with. Yeah. Yeah. So once you're logged in, how leaky is the abstraction right now?
Starting point is 00:44:41 Meaning, like, maybe you know what I mean. I do not, no. Okay. What do you mean by leaky abstraction? I'm just kidding. What I mean is, so for instance, a lot of text editors have Vim mode. Most Vim users will use Vim mode for about 7 to 12 minutes
Starting point is 00:44:57 and be like, this is not Vim. I can see all the places where this is not, clearly not Vim. Leaky abstraction is not the right term. I just overused that term. Yeah, where your emulation ends, maybe we call it the uncanny valley of what you're actually trying to emulate, where it's like, eh, this is not good enough.
Starting point is 00:45:13 Yeah, so if you're using some like SIMD instructions that are too fancy, that won't be supported. Yeah. If you're doing multi-threading, the emulator doesn't really support that, so you'll just have to stick to one thread. Those are kind of big ones. You're also just limited by how much RAM you can use in the browser. And also more realistic limitations, like if you're trying to run some Java program. I tried this recently. It works, but it takes
Starting point is 00:45:46 a few minutes. Yeah, just slow. So, you know, not practical in that case. Right. Kind of the 80-20 rule. Yeah. Okay. How big of a performance hit, boot up time, or load time, we'll just call it that, will it be to switch to this full
Starting point is 00:46:02 Linux environment? And is anybody else doing this currently, like loading Linux completely in the browser? Yeah, so there are projects that are using it. I'm not aware of people building tutorial sites with it, which is a shame because it's a really powerful tool. Most tutorial platforms I'm aware of tend to do the whole, like, we'll spin up a container, shut it down after a while, which is super expensive.
Starting point is 00:46:31 Expensive for them to run for their users? Yeah. And typically what you'll see is they'll start, hey, we have a free tier. They'll be like, hey, maybe just you can use it for a few hours. And then it turns into there's no free tier because we can't support it. Yeah, you can't support it long term. I think about Debian. Debian
Starting point is 00:46:50 just released a new version and I believe the install process changed enough to be talked about. So it'd be cool to emulate for Debian when they launch. Like, here's how the new installation process works. Here's the screens that have changed. If you're doing a unique disk set and this is how you need to do RAID or whatever or
Starting point is 00:47:09 choose this or that or choose ZFS or whatever it might be, then you can emulate it in the browser. This is a great example of that because you can see it before you actually have to install it. Or you can install it, but you have to have the hardware and enough hardware to expend on a tutorial or at least be able to virtualize with, say, Proxmox. But maybe Proxmox can't support the latest Debian, which it can. I'm just saying.
Starting point is 00:47:28 What if there's something there? If you emulate it, you can sort of just, it's marketing in a way. It's almost like here's how it works. Right. And if you don't know how it works, this is how it works. This sounds awesome. You should do these things. I want this.
Starting point is 00:47:41 He's focused on bioinformatics, right? You're teaching specifically those kinds of tutorials. But you're playing with xterm.js, though, right? And your platform goes beyond that, right? You could use this generally. Yeah, you can use this for anything, really. Of course, I am going to add tutorials that are not bio-specific, like Git, grep, sed, awk, all these things that I think would be cool for everybody.
Starting point is 00:48:02 Make the basics. Yeah. Coreutils. So give an example of how these tutorials would work then. Like, let's say I have zero idea of how I would use awk or grep. Yeah, so there's an awk tutorial
Starting point is 00:48:13 right now. You can go to sandbox.bio and click on the awk tutorial. It basically shows you tutorial contents on the left and it shows you some scenarios. Like let's say you want to analyze a tab-separated file and filter out rows that have a number greater than whatever in a column. So you can do these sorts of things.
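The sandbox.bio tutorial itself uses GNU awk, where a filter like this is a one-liner such as `awk -F'\t' '$3 > 10' data.tsv`. Purely as a point of comparison, here is a minimal Python sketch of the same idea: keep only the rows whose value in a given column exceeds a threshold. The column number, threshold, file contents, and the name `filter_rows` are illustrative assumptions, not taken from the tutorial.

```python
import csv
import io


def filter_rows(tsv_text: str, column: int, threshold: float) -> list[list[str]]:
    """Keep rows whose value in `column` (1-based, like awk's $N) exceeds threshold.

    Rows that are too short or whose field isn't numeric are skipped here;
    awk would instead fall back to a string comparison for non-numeric fields.
    """
    kept = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < column:
            continue  # also skips blank lines, which parse as []
        try:
            value = float(row[column - 1])
        except ValueError:
            continue
        if value > threshold:
            kept.append(row)
    return kept


if __name__ == "__main__":
    data = "chr1\t100\t5\nchr1\t200\t42\nchr2\t300\t17\n"
    for row in filter_rows(data, column=3, threshold=10):
        print("\t".join(row))
```

The contrast is the point Robert makes next: awk's pattern-action model packs the splitting, the numeric comparison, and the printing into a single expression.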
Starting point is 00:48:37 Awk, by the way, is a whole programming language, which is amazing. You can launch processes within it. You can write to files. You can, like, it's quite deep, yes. But yeah, so the tutorial has these sorts of examples. And then you have exercises. And so some of them, I admit, are probably too complicated. Like you're doing a bit too much math for awk,
Starting point is 00:49:06 but just to show you how powerful it is. And you're working in, like, an emulated environment that is a terminal with an emulated version of awk. That's right, yeah. It's using GNU awk version, I don't know, 5.something. How do you author these tutorials?
Starting point is 00:49:33 And we kind of bring them into this interactive place. And it kind of brings them to life. Okay. Describe this interactive place. Oh, I just mean like, you know. Is it like the good place? The bad place? It's a this interactive place. Oh, I just mean like, you know... Is it like the good place? The bad place? It's a very good place. It's a very good place.
Starting point is 00:49:52 That could be the sequel. A very good place. There you go. But yeah, so basically we just take the markdown, put it into this sandbox.bio kind of template, and if it uses a tool that I've already compiled to WebAssembly, we can just use it directly. If not, then we have to bang our heads
Starting point is 00:50:12 against the wall, figure that out first, and then put it in. We just had a conversation too. What was that conversation about, Jared? Gosh. Asciinema. Kind of similar to this in a way. I mean, it's not a tutorial, but it's recording what you did, so it's almost a playback, right, in an emulation state. I mean,
Starting point is 00:50:33 if you could rewind and touch and feel and kind of, like, delete, that'd be kind of cool too. It's not quite the same, but it's got similar fidelity. Yeah, the fidelity is there. Like, it's literally an example of what was recorded, and this is probably an example of what could be real life. So they're very similar in that way. What am I trying to say, though? What are you trying to say? Is it embeddings, and using this thing to, like... Is this something where, you said it's sandbox.bio? Yes.
Starting point is 00:51:00 Okay, so that's the URL. Yes, that's for the tutorial website. And so you're using this to show off tutorials you want to show off, right? Correct. And can I author my own tutorials and put them on there or take them and do some, like, how can I, if I believe in what you believe in with this thing and I want to do my own things, I want to show off whatever. Yeah. So we're not yet at the point where we can, you know, have an automated system where you can log in and create tutorials. But typically the way it works is you email me.
Starting point is 00:51:29 Oh, yeah. Hey. No way. Yeah. Okay. Classic collab. Could you fork the repo or something like that? Sure.
Starting point is 00:51:37 Yeah? Yeah. And, like, if you want to just play with having Debian in the browser, you could also look at V86, which is what I'm using to emulate it. And you could run it on your own site or if you want to embed it or all that's possible. Yeah. Well, I was actually thinking about this recently and I just did this with like screenshots. I did a fresh install of, because I've been messing with Ubuntu 2204, or sorry, 2304, and I just did a, you know, I got a redundant OS installation. I've got two disks. I've got a swap. I've got a boot. You know, I've got root and all that stuff like that. And so rather than just choosing one drive,
Starting point is 00:52:17 I want to have the system be fully redundant by having two mirrored drives. And I'd like to show that off in writing, but the only way I could really do it was through screenshots and then write around those screenshots. Now, if I could do a full emulation, it'd be kind of cool to have all of what I already have, but then at the end, or somewhere else, a sidecar that says, here's literally the environment to go and do just that. You've got two disks, so when you get to that part, you can configure these disks, and you can follow my instructions. So rather than having to pull down a VM or Proxmox
Starting point is 00:52:51 or actual hardware that you boot from a USB stick to do the full thing yourself, it gives you access to something that's kind of trivial to some, a redundant OS installation on Linux, but there are a lot of steps in there. There are a lot of steps in there: choosing the partition, adding the partitions, and giving them the paths and stuff like that,
Starting point is 00:53:09 and adding them. It's a mess, really. So I want to do the example through screenshots, but the best version of that really would be an interactive playground they could do and just follow the steps. Yeah. I'd be curious to see if it works with all the configuration of like disks and BIOS and all that combination. I think, well, if I were doing it, it would be the happy path.
Starting point is 00:53:33 You would only have two disks. I mean, sure, you can go with one disk, but that's not why you're here. You're not here to configure one disk. You're here to configure two disks in redundancy. And so it would be the happy path of being able to configure Ubuntu, a new system with two disks with redundancy, and it would walk you through all that stuff. That would be kind of cool because you could literally see what you would see on your screen if you were in your home lab doing this
Starting point is 00:54:01 or in the environment you're in doing this. And to me, that's, that's like empowering. Yeah. Because now every system I want to have this like rock solid, I'm going to use my own tutorial for my future self. Right. This is how you do it, Adam. You know what I mean?
Starting point is 00:54:16 Yeah. I think that would be super powerful use case for that. I'm thinking like Niskraft tutorials, you know, Niskraft. Yeah. Yeah. A website that we all find eventually. Yes. Whenever you're trying to. When you Google anything, Nix,raft tutorials. You know NixCraft? Yeah, yeah. The website that we all find eventually. Yes, whenever you're trying to. Whenever you're trying to.
Starting point is 00:54:27 Anything Nix, Linux. Oh, yes. And so like his, he's got really detailed tutorials, but it would be really cool. And they're step-by-step. Type this, type this, right? It would be really cool if each one had a button that's like launch an emulation, and you can follow the tutorial in an emulator. Yeah, that would be amazing.
Starting point is 00:54:42 That's what I'm talking about. See, you're where I'm at. I am where you are. I described it. I'm connecting the dots. I went the long way around the lake, that would be amazing. That's what I'm talking about. See, you're where I'm at. I am where you are. I described it. I'm connecting the dots. I went the long way around the lake, and he's like, let's just go across the lake. On a boat. On a speedboat.
Starting point is 00:54:51 It's kind of like how we talk to ChatGPT, you know? Yes, that's right. I get straight to the point. Thank you, ChatGPT. Adam has a very cordial conversation with him. Oh, yeah. That is great insight, ChatGPT. Tell me more.
Starting point is 00:55:04 So, like, use cases like that, I think, would be really powerful. How far away are we from that? You should do this, man. Make it a thing. I would love to, but first of all, I know very little about hardware stuff. So, this would need a collaboration of sorts. So if you're listening to this and you can fill in the gaps where Robert has them, email him. If you want to collab, if you want to...
Starting point is 00:55:33 Robert at... Fork. Sandbox.bio. No. That's not his email. Okay. Well, my email's quite long. Robert.abukalil at gmail.com.
Starting point is 00:55:43 Okay, there we go. We'll throw that in the show notes for folks. And the repo lives on GitHub. We'll link that up. Cool. Cool stuff, man. I like it. So much possibility.
Starting point is 00:55:56 Yeah. So much potential. And I believe you can do it. I love it. And you should do it. We should. Let's do it. Thank you for doing all you've done so far.
Starting point is 00:56:02 Let's do it. Wasm. All right. Thanks for talking to us. Yeah, thanks for having me. I'm sure this was better than the first one. I think it was, yes. I'm sure.
Starting point is 00:56:11 Jared's like, I'm sure. We'll see. If it ships, then you'll know if it's good. That's true. The last one never shipped. Listen, that was terrible, too. You should diff it. Maybe I just said the same thing.
Starting point is 00:56:22 I don't remember. Oh, yeah. You could transcript it. Transcript it and diff it. There's an idea. What's up, friends? I'm here with one of our good friends, Firas Aboukadej. Firas is the founder and CEO of Socket.
Starting point is 00:56:46 You can find them at socket.dev. Secure your supply chain, ship with confidence. But Firas, I have a question for you. What's the problem? What security concerns do developers face when consuming open source dependencies? What does Socket do to solve these problems? So the problem that Socket solves is
Starting point is 00:57:03 when a developer is choosing a package, there's so much potential information they could look at, right? I mean, at the end of the day, they're trying to get a job done, right? There's a feature they want to implement. They want to solve a problem. So they go and find a package that looks like it might be a promising solution. Maybe they check to see that it has an open source license, that it has good docs. Maybe they check the number of downloads or GitHub stars. But most developers don't really go beyond that. And if you think about what it means to use a good package, to use a good open source dependency, we care about a lot of other things too, right? We care about who is the maintainer?
Starting point is 00:57:35 Is this thing well maintained? From a security perspective, we care about does this thing have known vulnerabilities? Does it do weird things? Maybe it takes your environment variables and it sends them off to the network, you know, meaning it's going to take your API keys, your tokens. That would be bad. The unfortunate thing is that today, most developers who are choosing packages and going about their day, they're not looking for that type of stuff. It's not really reasonable to expect a developer to go and open up every single one of their dependencies and read every line of code, not to mention that the average NPM package has 79 additional dependencies that it brings in. So you're talking about just, you know, thousands and thousands of lines of code. And so we do that
Starting point is 00:58:14 work for the developer. So we go out and we fully analyze every piece of their dependencies, you know, every one of those lines of code. And we look for strange things. We look for those risks that they're not going to have time to look for. So we'll find, you know, we detect all kinds of attacks and kinds of malware and vulnerabilities in those dependencies. And we bring them to the developer and help them when they're at that moment of choosing a package. Okay, that's good. So what's the install process? What's the getting started? Socket's super easy to get started with. So we're, you know, our whole team is made up of developers. And so it's super developer friendly, we got tired of using security tools that send a ton of alerts, and we're hard to configure and just kind of noisy. And so we built socket to fix all those problems.
Starting point is 00:58:54 So we have all the typical integrations you'd expect: a CLI, a GitHub app, an API, all that good stuff. But most of our users use Socket through the GitHub app. And it's a really fast install. A couple clicks, you get it going, and it monitors all your pull requests. And you can get an accurate and kind of in-depth analysis of all your dependencies. Really high signal-to-noise. You know, it doesn't just cover vulnerabilities. It's actually about the full picture of dependency risk and quality. So we help you make better decisions about dependencies that you're using directly in the pull request workflow, directly where you're spending your time as a developer.
Starting point is 00:59:29 Whether you're managing a small project or a large application with thousands of dependencies, Socket has you covered, and it's pretty simple to use. It's really not a complicated tool. Very cool. The next step is to go to socket.dev and install the GitHub app or book a demo. Either works for us. Again, socket.dev, that's s-o-c-k-e-t dot dev. So we're here with M. Scott Ford. You have a name like a great novelist. Have you ever been told that? No, I have not been told that. M. Scott. Just call you Scott, right?
Starting point is 01:00:17 Yeah, just Scott. What does the M stand for? Is it public? Okay. Yeah, my parents named me Matthew Scott, but never called me Matthew. Huh. They must have decided later they liked the middle name better. Yeah.
Starting point is 01:00:29 So you made a mistake. There's a story there. The middle name is the first name. There's a story there somewhere. Yeah, I don't know that I ever got the full story. Okay. Could be a conspiracy. Yeah.
Starting point is 01:00:40 You and I go way back. Yeah. Years and years. Your wife, Andrea, was a speaker at my conference. Yep. Probably a decade ago. I don't know. Yep.
Starting point is 01:00:52 Listener of the show. I think we communicated. Yeah, I've been listening to the show for quite a long time. I came on your guys' podcast, Legacy Code Rocks. Legacy Code Rocks, yep. Probably a decade ago. Yep. Always good to see you.
Starting point is 01:01:03 I think we've met once or twice before, but good to have you here. Not so- Yeah, I met you at Sustain. I think you recorded me and Andrea for that. Right on. Lots of history. Lots. And you co-own Corgibytes, which is a consultancy. How do you describe yourselves? Yeah, so we focus on kind of modernization and maintenance
Starting point is 01:01:27 and just kind of the joy of making improvements to software systems. And that's, you know, we have a team of people who love making code better. Building out test suites, fixing bugs, paying down technical debt.
Starting point is 01:01:42 Yeah. I was talking with Adam yesterday. I love fixing bugs, paying down technical debt. Yeah. Yeah. Like I was talking with Adam yesterday. Like I love fixing bugs. Like just going through a list of bugs and finding and fixing them. Guess what's available. That's like so much fun. ilovebugs.com. Seriously?
Starting point is 01:01:57 Yeah. It was like $4,200. But yeah, it's available. That's available. That's not quite available. That's affordable. Yeah, it's true. And today's, well, I guess we spent $1,000 on changelog.com.
Starting point is 01:02:07 Okay. That was eight years ago. Yeah, because before it was like thechangelog.com. Yeah. But if you were really passionate about bugs, you would have the domain ilovebugs.com. Somebody's out there holding that, thinking someone's this passionate about bugs. They're going to give me that. This is available on the market.
Starting point is 01:02:22 $4,200. This isn't like a broker. This is available on the market. $4,200. Yeah. It's a premium domain, so they're holding it as a premium-cost domain.
Starting point is 01:02:30 Well, cash is tight these days. Yeah. So Corgibytes has been a longtime business. Yeah. So it was founded in 2008. I had no idea what I was going to do with it.
Starting point is 01:02:41 It was pretty much just a name. And then Andrea came on and we started doing consulting. We did small websites at first and didn't really enjoy that, and I was trying to figure out what it is that I liked doing, and then stumbled onto it: I love fixing code. I love turning a mess into something that looks new. So, like, a brownfield into a greenfield.
Starting point is 01:03:05 Right. That transformation process is something that I genuinely enjoy doing. So building a company around that has been a lot of fun. There's people who like brand new cars, and there's people who like to restore old cars. And those people tend to be different people, you know? Yeah. And some people just love that. Well, and I think, like,
Starting point is 01:03:29 for me, I've sometimes fantasized, if I had enough money and time to do it, I would probably love getting a late 1990s-era car and fixing it up and turning it into an EV. So it's almost like, for me, sometimes it's the bridge of the old and the new. So taking something that's old and breathing new life into it and making it do more than it used to, making it better than it was before. I love it too. I mean, you and I, we found common ground. I did some rescue projects back when I was consulting. I loved it.
Starting point is 01:04:02 I kind of like being the hero. You know, like this is all bad. It's like, well, here comes Jared. He's going to make it better. Yeah. And I think, I think for me, it's less about the hero and more about, you know, there are folks who think it's not possible and it's, it's almost like, it's almost like a challenge and like a hold my beer kind of moment. Like, like, no, this, we can turn this around. You don't have to start over. This can be made better. What's the gnarliest turnaround you've done? Maybe in terms of lines of code or time spent or you thought you weren't going to be able to do it?
Starting point is 01:04:33 Yeah, so there was a system several years ago that was on a cloud server, and they weren't doing a very good job keeping the underlying server up to date. So I wanted to help them move from an infrastructure-as-a-service solution to more of a platform-as-a-service solution because I thought that the organization
Starting point is 01:04:55 would be able to do a better job keeping up with that. And then they wouldn't have to worry about OS-level updates anymore. They could just kind of focus on their code because the OS-level updates were way behind. Eight years behind. They hadn't done any Windows updates on this Windows server for eight years.
Starting point is 01:05:16 That was a challenging transition. It took a lot longer than I thought it would. Ended up crediting the client some time because of that and just kind of recognizing that I thought it was going to go easier crediting the client some time because of that and just kind of recognizing that I thought it was going to go easier than it actually turned out to be. We kept finding services that were running on that server in the background that we didn't know about. One of them we didn't have a source code for. That was fun to grapple with that as a challenge. That was definitely one that was difficult.
Starting point is 01:05:46 Okay. Long-standing business hits against this recent macroeconomic downturn. Yes, it is. And it's gone south, huh? It has been challenging. So we've lost a significant amount of our revenue. Our team is probably about a quarter of the size as it was a year and a half ago. And I've talked with other business owners that have, you know,
Starting point is 01:06:08 companies of similar business model to ours, software services, and there are a lot that have been hit really hard. A lot have gone out of business. Andrea said she had read an article with a, I forget who it was. I could probably find it if you wanted it for show notes, but it had a quote in there that there's an extinction level event for small software companies going on right now. And there's a lot more talent on the market, so from a services perspective, it's a lot
Starting point is 01:06:40 easier for companies to hire full time than it used to be. So I think there's less motivation to work with contractors or stretch your team out that way. I also think it's just a way that organizations have been trying to cut expenses and cut costs. And when you look at a balance sheet and when you look at a profit and loss statement, contractors come out of a different part of that than full-time employees do. So, you know, for your investors, you know, it can look like the organization's doing better if you cut those expenses, you know, kind of further down on the profit statement. So, yeah, I think, you know, all of the economic factors that are going on right now, so inflation, interest rates, two wars, the small, medium-sized bank failures.
Starting point is 01:07:33 I think Silicon Valley Bank really caused a lot of VCs to really pull back some money. I've heard stories of companies that were funded with, say, $30 million, had their funding pulled, and so the business had to shut down, where the investor was just like, the money I've given you I want back, or the money I haven't given you yet, you're not getting. So that's definitely a challenge that's going on right now. So I kind of think of that VC funding almost as like plankton in an ecosystem,
Starting point is 01:08:04 and that dries up and the smaller fish get affected first. And then they're not using services from the bigger fish. And so they start to get affected. So I think there is that ripple effect to the ecosystem. Is that similar to krill? Plankton's like krill? Yeah, maybe plankton's like krill. The little guys, basically.
Starting point is 01:08:23 The smallest of the small that the whales chase. Yeah. And then that dries up and you've got a big whale that's just hungry. And the big whale can go without food for a little while, but it's going to start to affect it, too. Then what does it eat? It's like, oh, man, my krill's gone. I guess it'll just die. We think about this, too.
Starting point is 01:08:41 How has the market shifted in terms of what it perceives as value? Because when you have less, you scrutinize more. You think, well, was that really, did I just spend my money there because we had the money? Yeah, that's right. And we thought it was viable, and so it was viable. And now that we reconsider, because I think in the last three years since the pandemic, we've basically, the whole globe has been reconsidering almost everything. Absolutely.
Starting point is 01:09:08 Right? three years since the pandemic, we've basically, the whole globe has been reconsidering almost everything. Absolutely. Right. And so in a, in a reconsideration of what the value is, do you think that the value of these rehab projects has changed or do you think it's just that there's no money? Uh, I think the value, I think the value has changed. I also think that low code, no code platforms have had a factor as well. You know, it's a lot easier for, it's a lot easier to build something kind of quick and dirty that, you know, might meet your immediate needs, and maybe do that as an experiment for starting over
Starting point is 01:09:38 without having to, like, engage a development team. And, you know, that's a capacity that's great. Like, you know, it will be an enabler for business, and so I think, on the larger economic scale, that's good. And, you know, it does kind of affect the organizations that would have helped build the thing that the low-code, no-code platform is now building instead. Yeah. I do think that for the maintenance side, I predict in the next five years,
Starting point is 01:10:07 kind of within the next five years, you'll have organizations that have really built a lot on top of those low-code, no-code platforms and start to bump up against the constraints and want to start to break out. And so I think there'll be a market for helping organizations move that functionality outside of those platforms
Starting point is 01:10:25 or find ways to extend that functionality maybe through extensions that the vendor provides or things like that where there's custom software that needs to be built there. I do see that as an opportunity. And yeah, that has an effect. And I'm sure AI is having an effect at some point as well. I don't know how to quantify that.
Starting point is 01:10:49 And I imagine it could just be part of like a wait and see on a lot of organizations when they're trying to make hiring decisions or how they're going to grow their team. Maybe they're just waiting to see how productive their teams are going to be and how that productivity might change as they start leveraging AI. You mentioned in our conversation yesterday, which was not on the air, obviously, and to some degree even TMI, but you mentioned this desire to, or essentially the business model's wrong. I'm TLDRing it and you can fill in the gaps. The business model's wrong or it needs to change and you considered products in and around what you already do, but a product that you can buy that has a finite value that's maybe easier to buy even.
Starting point is 01:11:30 Yeah, because there are a lot of problems that we've seen over the years that many teams have been facing. And I do think there's a market for building solutions to help teams solve those problems themselves without having to hire an outside contractor or an outside team. And so there are aspects that I think could be productized. And we've gotten started a little bit on one product. We've been working on it for a couple of years, don't really have, you know, we've got like an alpha demo that we've shown to people. I've gotten some feedback on, we're still kind of working. We're hoping to have a beta out, you know, probably first quarter next year is kind of realistic for having something that people could actually sign up for
Starting point is 01:12:08 and give us better feedback on. That's called Freshly. It's around analyzing dependency freshness and looking at how fresh or out of date software dependencies are, like third-party dependencies, most of them open source dependencies, and really assessing the quality of an application or a project from that perspective. We also wanted to be able to assess the, at multiple levels of the, you mentioned, Adam, that you're not a big fan of supply chain
Starting point is 01:12:41 as a term for the ecosystem. It's generally a pejorative. Yeah. Open source is not a supply chain. It is a commons. Right. It's not a supply chain we just tap into and get. Yeah. It's a negative. Yeah.
Starting point is 01:12:52 If you think about your dependency graph, I think it would be great to evaluate multiple nodes on that dependency graph and not just evaluate your node. So how well are the upstream projects that you're depending on keeping up with the dependencies that they're managing? And so I think that could be some pretty good meta-analysis as well, a way to maybe even measure the health of a project that you're thinking of working with. And the similarity between maintenance, this idea of Freshly, how old are my dependencies, how fresh
Starting point is 01:13:28 are my dependencies, and this aspect of security. Because a lot of maintenance, or even a refresh on a project like you've talked about, is kind of a security burden. Some of these products might be security-esque, that you're talking about? Yeah, and so I think, you know, having out-of-date dependencies, one of the motivations for upgrading them is very much to try to avoid security issues. That's one of the motivations. I think there's also motivation around team productivity. It's a lot easier to work with the latest version of a library than it is an older version, just in terms of finding documentation.
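Scott's idea of scoring upstream nodes in the dependency graph, not just your own project, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Freshly computes anything: each package's direct dependencies are given as (name, release date in use, release date of the latest version), a package's staleness is the sum over its own direct dependencies of the years between the release in use and the latest release, and every package name and date is made up.

```python
from datetime import date
from typing import Dict, List, Tuple

# One edge per direct dependency (hypothetical input format):
# (dependency name, release date of the version in use, release date of the latest version)
Edge = Tuple[str, date, date]
Graph = Dict[str, List[Edge]]


def libyears(used: date, latest: date) -> float:
    """Years between the release in use and the latest release (0.0 if already current)."""
    return max((latest - used).days, 0) / 365.25


def node_staleness(graph: Graph, node: str) -> float:
    """How far behind a single package is on its own direct dependencies."""
    return sum((libyears(used, latest) for _, used, latest in graph.get(node, [])), 0.0)


def upstream_report(graph: Graph, root: str) -> Dict[str, float]:
    """Score the root project and every package reachable from it."""
    report: Dict[str, float] = {}
    stack = [root]
    while stack:
        name = stack.pop()
        if name in report:  # already scored; this also guards against cycles
            continue
        report[name] = node_staleness(graph, name)
        stack.extend(dep for dep, _, _ in graph.get(name, []))
    return report


# Hypothetical example graph: names and dates are invented for illustration.
graph: Graph = {
    "my-app": [
        ("web-framework", date(2023, 1, 1), date(2023, 1, 1)),
        ("http-client", date(2022, 1, 1), date(2023, 1, 1)),
    ],
    "web-framework": [("template-lib", date(2020, 1, 1), date(2023, 1, 1))],
    "http-client": [],
    "template-lib": [],
}

report = upstream_report(graph, "my-app")
for name, score in sorted(report.items()):
    print(f"{name}: {score:.1f} years behind on its own dependencies")
```

In this made-up graph, `my-app` is only about one year behind on its direct dependencies, but the `web-framework` it relies on is about three years behind on its own, which is the kind of upstream signal a single-node score would miss.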
Starting point is 01:14:03 You know, when you go look for the documentation for a project, you're gonna find the latest. The latest version is gonna be easiest to find. It's usually findable, yeah. Yeah, blog posts are gonna usually cover more recent versions than what you're working with, has been my experience. So, but yeah, on the security angle, you know, that I think is a big motivator
Starting point is 01:14:22 to try to avoid some of those security issues. And a lot of people we've put the product in front of to give demos told us that, in addition to just seeing how out of date things are, they do want some perspective on how security plays a factor. So one of the dependency freshness measures that we're using is called libyear, and you can learn more about that at libyear.com. And then I've taken a security approach to that and built what I call a liability index, which computes a similar metric to libyear. Libyear looks at the distance in time between the version that you're using and the latest version. I published the Liability Index at liabilityindex.com.
Starting point is 01:15:05 We haven't implemented a version of it yet. But it looks at the version you're using and the distance to the next version that doesn't have any published vulnerabilities. So if the version you're on has published vulnerabilities, how many years into the future do you have to go in order to find a version that doesn't have any published vulnerabilities? And so I think that could give more of a security-focused approach to that. And maybe even looking at the liability index at different severity levels, like the critical level. Let's make sure you think about Sourcegraph.
Starting point is 01:15:45 Like Sourcegraph is an intelligence platform that helps you understand code. Part of that understanding is like, is my stuff vulnerable or prone to vulnerabilities? And one of the things that we're trying to do that's unique with Freshly is not just capture how things are right now, but capture how they used to be and graphing that over time.
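The two freshness metrics Scott describes can be sketched as small functions. The libyear function follows the definition he gives: time between the release in use and the latest release. The liability function is only a reading of his spoken description, not the definition published at liabilityindex.com; its behavior when no vulnerability-free later release exists yet is an assumption, as are all the names here (`Release`, `has_known_vulns`, `liability_years`). Passing an `as_of` date lets the same pinned version be re-scored at past dates, a simplified stand-in for the historical trend he wants to graph; a real tool would also have to mine which version was pinned at each point in the repository's history.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Release:
    version: str
    released: date
    has_known_vulns: bool = False  # hypothetical flag, e.g. looked up in an advisory database


def _years(start: date, end: date) -> float:
    return max((end - start).days, 0) / 365.25


def _visible(releases: List[Release], as_of: date) -> List[Release]:
    """Releases published on or before `as_of`, oldest first."""
    return sorted((r for r in releases if r.released <= as_of), key=lambda r: r.released)


def _find(releases: List[Release], version: str) -> Release:
    for r in releases:
        if r.version == version:
            return r
    raise ValueError(f"version {version!r} not found")


def libyear(releases: List[Release], in_use: str, as_of: date) -> float:
    """Years between the release in use and the newest release available at `as_of`."""
    visible = _visible(releases, as_of)
    current = _find(visible, in_use)
    return _years(current.released, visible[-1].released)


def liability_years(releases: List[Release], in_use: str, as_of: date) -> float:
    """Liability-index-style distance (a sketch, not the published definition).

    0.0 if the release in use has no known vulnerabilities; otherwise the years forward
    to the first later release with none. ASSUMPTION: if no such release exists yet,
    count up to `as_of`. Vulnerability flags reflect today's knowledge, not `as_of`'s.
    """
    visible = _visible(releases, as_of)
    current = _find(visible, in_use)
    if not current.has_known_vulns:
        return 0.0
    for r in visible:
        if r.released > current.released and not r.has_known_vulns:
            return _years(current.released, r.released)
    return _years(current.released, as_of)


# Hypothetical release history for one dependency that stays pinned at 1.0.0.
releases = [
    Release("1.0.0", date(2020, 1, 1), has_known_vulns=True),
    Release("1.1.0", date(2021, 1, 1), has_known_vulns=True),
    Release("1.2.0", date(2022, 1, 1)),
    Release("2.0.0", date(2023, 1, 1)),
]

# Re-scoring at past dates yields a trend line rather than a single snapshot.
for as_of in (date(2021, 6, 1), date(2022, 6, 1), date(2023, 6, 1)):
    print(as_of,
          round(libyear(releases, "1.0.0", as_of), 2),
          round(liability_years(releases, "1.0.0", as_of), 2))
```

With this invented history, the libyear for the unchanged pin grows from about one year to about three, while the liability distance settles at about two years once 1.2.0 ships without known vulnerabilities.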
Starting point is 01:16:04 So for these metrics that we're collecting and computing, we're mining information from the source code repository, computing what these metrics would have been in the past, and graphing that information. And I think the trend can paint a really interesting picture for leadership, and hopefully get budget for some of these improvement efforts. Something I've seen on a lot of teams is there'll be engineers on the teams who are aware this
Starting point is 01:16:29 is a problem. They want to fix it. They don't like that they're living with the status quo and they feel like their leadership hasn't given them enough flexibility to really go in and solve the problem. They feel like they're told to obsess over features instead and some of these essential maintenance activities get deprioritized. Sure. And you think bubbling that up to somebody with decision-making would help them?
Starting point is 01:16:52 That's my hope: that if leaders, the people who are kind of in control of the priorities and people who are in control of funding, had a better understanding of the problem, I think they would make different choices. I think, in large part, how out of date dependencies are is invisible. It's even invisible to the team a lot of times. They pull in a package, they start using it, and they move on. There's not really much to help them stay up to date and keep aware of that.
Starting point is 01:17:24 I feel like NPM is doing a pretty good job with letting people know when things are out of date, when they do a, you know, an NPM install. You know, NPM outdated is a, you know, a really good tool set for folks and has really good output and, you know, and it, you know, it's easy to read. And I think more package ecosystems are starting to adopt that, that strategy and that approach. My hope is that that helps and kind of increase awareness. I really do think it's interesting to see how well the team has been doing at keeping up
Starting point is 01:17:55 with that churn, and obviously, because of supply chain attacks, again, that's what they're called in the security ecosystem, supply chain attacks. Sorry, Adam. I don't think it's the right term, but it is that term. So I'm cool with it. This is all in conversation because I was talking about WebSocket
Starting point is 01:18:13 and how they secure the open source supply chain. So we were like, I'm like, you get it. So Socket Security you're talking about. Socket. Not WebSocket. Gosh, I'm such a fool. Oh, Socket Security. Socket.
Starting point is 01:18:27 Okay, suck. Anyways. No worries. Strike that. We'll fix it. We'll edit that out, like Matt says. Staying in. Socket.
Starting point is 01:18:37 Thank you for helping me out on that. So supply chain attacks are definitely a big risk, and you can have an upstream library that gets taken over by a nefarious actor. And so staying up with the latest and greatest all the time, like if you're using Dependabot and just merging those in blindly, that might not be the best idea
Starting point is 01:18:57 because you do make yourself vulnerable to some of those vulnerabilities. Totally. You know, at the same time, you don't want to let yourself get months out of date. Right. Where's the balance? Yeah, because with the Equifax breach from 2017,
Starting point is 01:19:14 that was one Apache Struts dependency. On the date that they were attacked, they were out of date by two months for the library that had the patch for that vulnerability. So the two-month window for that project, and that was a very impactful vulnerability. It was a very impactful event. It affected a lot of people. That library was two months stale.
Starting point is 01:19:40 Yes. When you look at that particular vulnerability, I don't know if all the vulnerabilities were patched in that release, but I know that the vulnerability that they were ultimately exploited on was two months out of date. And I think a lot of it is a lot of teams don't make updating things a regular part of their practice. It tends to be really challenging.
Starting point is 01:20:04 It takes a lot of effort to upgrade some of these dependencies, especially if they include breaking changes. A lot of times software systems are really tightly coupled to these dependencies. So upgrading them is really non-trivial. And so I think, you know, kind of going back to like Martin Fowler has a quote where if something is difficult, you need to do it more often. So if software teams got in the habit of updating dependencies more often and kind of doing it as a practice and really,
Starting point is 01:20:31 you know, devoting time or even maybe devoting a team member whose job it is to stay on top of this stuff, then I think, you know, that could really help turn things around and keep projects healthier. But on the other side, the supply chain attacks, like the event-stream one, et cetera, those hit people who don't have their dependencies pinned to a version, and their CI is just going to pull the latest. And so that's the other side, that's too fresh. Yes.
Starting point is 01:20:55 So what is the right balance? It seems like unless you have a known vulnerability, staying one minor release behind is actually a best practice. Yeah. And once there is a known vulnerability, staying one minor release behind is actually a best practice. Yeah. And once there is a known vulnerability, now you've got to get up immediately to the latest. I don't know. That could be a really good strategy. Yeah.
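Jared's rule of thumb can be written as a small policy function: pin exact versions, stay one minor release behind, and move to a fixed release as soon as the pinned one has a known vulnerability. This is a hypothetical sketch, not a recommendation from either speaker or the behavior of any real tool. It assumes plain MAJOR.MINOR.PATCH version strings, reads "one minor release behind" as "the newest patch of the second-newest release line" (which can cross a major-version boundary), jumps to the newest release without a known vulnerability rather than blindly to the latest, and takes the set of vulnerable versions as an input instead of querying any advisory service.

```python
from typing import Iterable, Set, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags in this sketch)."""
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def fmt(v: Version) -> str:
    return "%d.%d.%d" % v


def choose_pin(available: Iterable[str], current: str, vulnerable: Set[str]) -> str:
    """Pick the exact version to pin under a 'one minor release behind' policy.

    - If the pinned version has a known vulnerability, move to the newest release
      without one.
    - Otherwise, target the newest patch of the release line one step behind the
      newest line, but never downgrade the current pin.
    """
    safe = sorted(parse(v) for v in set(available) if v not in vulnerable)
    if not safe:
        return current  # nothing known-safe to move to
    if current in vulnerable:
        return fmt(safe[-1])
    lines = sorted({v[:2] for v in safe})  # distinct (major, minor) release lines
    target_line = lines[-2] if len(lines) > 1 else lines[-1]
    target = max(v for v in safe if v[:2] == target_line)
    return fmt(max(target, parse(current)))


available = ["1.2.0", "1.3.0", "1.3.1", "1.4.0", "1.4.2", "1.5.0"]
print(choose_pin(available, "1.2.0", set()))      # 1.4.2: stay one minor line behind
print(choose_pin(available, "1.2.0", {"1.2.0"}))  # 1.5.0: known vulnerability, move up
print(choose_pin(available, "1.5.0", set()))      # 1.5.0: already newer, no downgrade
```

Pinning exact versions, rather than version ranges, is what keeps CI from silently pulling a freshly published (and possibly hijacked) release; a policy like this then decides deliberately when the pin moves, and the lag it builds in can be tuned to an organization's risk tolerance.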
Starting point is 01:21:13 And, yeah, I think, you know, it also comes down to risk tolerance. Different organizations have different levels of risk tolerance. And there are organizations that aren't interested in staying on the bleeding edge, and I think there is a good argument to be made for "if it's not broke, don't fix it." Just because it's old doesn't mean it's bad, right? But I do think that you do have these productivity impacts and you do have these security impacts when you are working with older libraries and older versions of frameworks. Yeah, well, I mean, hopefully these products will breathe new life into Corgibytes.
Starting point is 01:21:49 Yeah, I think, you know, it'll be a little bit of transformation, you know, kind of like in the cycle of, you know, growth and reinvention and rebirth. And I think, you know, that will be, you know, part of the life cycle. This, you know, we had, when we were focused as a business on, you know, that will be, you know, part of a life cycle. This, you know, we had when we were focused as a business on, you know, building small websites, you know, like building five-page websites, stuff like that. You know, that business model didn't last very long. And, you know, kind of the business went into an incubation period and was reborn out of that. You know, that might be what's about to happen again.
Starting point is 01:22:23 We'll see. You never know. That does make sense. You've got to evolve when change happens. Resilience is change, really, essentially. You've got to change with the change. That's right. A wise man once said.
Starting point is 01:22:35 Was that you? Maybe. Martin Fowler? I don't know. Well, good luck on that change. Yeah. Thanks. Good luck navigating it.
Starting point is 01:22:44 I appreciate that. The product direction, I agree with Jared. It does sound like the way to go. Because if you can give an executive in I don't know what time frame something that is authoritative and finite in terms of there is lack of freshness or you're this far behind best practices or some sort of indicator that says, I'm not hearing it from my developers who, in quotes, whine or complain that I lovingly trust, but really I need this authoritative thing that says, hey, get your stuff together. Yeah, and trying to give engineering teams a way to translate the data that the system is collecting in a way that can be easily consumed by their leadership. So, instead of having a graph with a whole bunch of data on a webpage and then sending,
Starting point is 01:23:36 trying to get your manager to log into that, instead generate a PowerPoint deck or something you can toss into an email and forward to somebody. And in there can be a link to that dashboard, if somebody wants to see the dashboard. Here's our vulnerability score, something like that. Or here's our staleness factor, or freshness factor, or Freshly factor, or whatever it might be.
Starting point is 01:23:56 And that could actually be quite good at marketing too for you. Because then it becomes maybe a race or a competition of sorts with executives or CEO to CEO. Like, hey, what's your freshness factor? And then help even within an organization that might have a portfolio of projects. Are there projects that are doing better than others? And then getting curious about the teams that are doing better. What are they doing differently? And is there knowledge that those teams might have
Starting point is 01:24:27 which might make sense to share with other teams? Yeah. Good plan. Yeah, man. You should do it. Thanks. Working on it. It just takes time.
Starting point is 01:24:34 Building software, it takes time. It takes time. Even with AI's help, right? It still takes time. I can't just snap my fingers and say, "Hey, GitHub Copilot, build this for me." Or, "Hey, AWS CodeWhisperer, build this for me."
Starting point is 01:24:50 You still got to fix those bugs that it spits out at you. That's right. Well, thanks for stopping by, Scott. Yeah, appreciate you letting me chat. You bet. Well, the year is almost done. 2024 is almost here. You know, it's the end of a year.
Starting point is 01:25:11 And it's that time you think, man, what's next, right? What's coming next? What's next for me? What's next for the world? What's next for tech? What's next for software? And all the above. Well, I do know what's next for this podcast. I can say that there is an episode of Friends coming
Starting point is 01:25:25 out momentarily. And then I can also say that next week there is an epic episode. Jared and I are back for State of the Log. Breakmaster Cylinder helped us up our game this year, even more so than in years past, where, well, I'll just save it for the episode. Let's just say you want to check it out. It's the end of the year, but we're going to be back next year. More good stuff, more good things.
Starting point is 01:25:52 And I hope you have a safe holiday. Enjoy your family. And it's also good to say thank you, right? Thank you to you, of course, for listening to this podcast. Thank you to our Plus Plus subscribers. And then thank you to Fastly for supporting us all these years. And then, of course, thank you to Fly.io and our friends over there for supporting us all these years. Our friends at Typesense, our friends at Sentry, our new friends at Neon. And everyone in between.
Starting point is 01:26:26 Thank you. Thank you. And of course to the beat freak in residence, Breakmaster Cylinder. Those beats. Those beats. But hey, we'll see you soon on Friends. We'll see you next week for State of the Log. But that's it.
Starting point is 01:26:43 This show's done. We'll see you next week for State of the Log. But that's it. The show's done. We'll see you very soon. Game on.
