The Changelog: Software Development, Open Source - What exactly is Open Source AI? (Interview)
Episode Date: February 16, 2024. This week we're joined by Stefano Maffulli, the Executive Director of the Open Source Initiative (OSI). They are responsible for representing the idea and the definition of open source globally. Stefano shares the challenges they face as a US-based non-profit with a global impact. We discuss the work Stefano and the OSI are doing to define Open Source AI, and why we need an accepted and shared definition. Of course we also talk about the potential impact if a poorly defined Open Source AI emerges from all their efforts. Note: Stefano was under the weather for this conversation, but powered through because of how important this topic is.
Transcript
Welcome back friends. This is the changelog. I'm Adam Stachowiak and this week we're joined by
Stefano Maffulli, the executive director of the Open Source Initiative, the OSI.
The Open Source Initiative is responsible for representing the idea and the definition of open source globally.
Stefano shares the challenges they face as a U.S.-based organization with a global impact.
We discuss the work Stefano and the Open Source Initiative are doing to define open source AI
and why we need an accepted and shared definition. Of course, we also talk about the potential
impact if a poorly defined open source AI emerges from their efforts. I also want to mention that
Stefano was feeling under the weather for this conversation, but he powered through because of
how important this topic is. A massive thank you to our friends
and our partners at Fly.io,
the home of changelog.com.
It's simple.
Launch apps near users.
They transform containers into micro VMs
that run on their hardware in 30 plus regions
on six continents.
Launch an app for free at fly.io.
What's up, friends? This episode of The Changelog is brought to you by our friends over at Vercel. And I'm here with Lee Robinson, VP of Product. Lee, I know you know the tagline for Vercel, Develop, Preview, Ship, which has been perfect.
But now there's more after the ship process.
You have to worry about security, observability, and other parts of just running an application in production.
What's the story there?
What's beyond shipping for Vercel?
Yeah, you know, when I'm building my side projects or when I'm building my personal site, it often looks like develop, preview, ship. I try out some new features. I try out a new framework.
I'm just hacking around with something on the weekends. Everything looks good. Great. I ship
it. I'm done. But as we talk to more customers, as we've grown as a company, as we've added new
products, there's a lot more to the product portfolio of Vercel nowadays to help go beyond that
experience. So when you're building larger, more complex products, and when you're working with larger teams, you want to have
more features, more functionality. So tangibly, what that means is features like our Vercel
firewall product to help you be safe and to have that layer of security. Features like our logging
and observability tools so that you can understand and observe your application and production,
understand if there's errors, understand if things are running smoothly and get alerted on those.
And also then really an expansion of our integration suite as well, too, because you might already
be using a tool like a data dog or you might already be using a tool at the end of this
software development lifecycle that you want to integrate with to continue to scale and
secure and observe your application. And we try to fit into those as well too. So we've kind of continued to bolster
and improve the last mile of delivery. That sounds amazing. So who's using the
Vercel platform like that? Can you share some names?
Yeah, I'm thrilled that we have some amazing customers like Under Armour, Nintendo,
Washington Post, Zapier, who use Vercel's frontend cloud to not only help scale their infrastructure,
scale their business and their product, but then also enable their team of many developers to be
able to iterate on their products really quickly and take their ideas and build the next great
thing. Very cool. With zero configuration for over 35 frameworks,
Vercel's front-end cloud makes it easy
for any team to deploy their apps.
Today, you can get started with a 14-day free trial
of Vercel Pro or get a customized enterprise demo
from their team.
Visit vercel.com slash changelogpod to get started.
That's V-E-R-C-E-L dot com slash changelogpod. Well, Stefano, it's been a while.
Actually, never, which is a good thing, I suppose.
But now we're here.
Fantastic.
We were at All Things Open recently, and we tried to sync up with you, but we missed the message.
And so we were like, we got to get you on the podcast.
And obviously, you know, this show, The Changelog, was born around open source.
And I kind of find it strange and sad that we've never had anybody from the open source initiative on this podcast.
I'm glad you're here to change that.
So welcome.
Thank you.
Thank you for having me
It's a pleasure
Sorry we missed each other in North Carolina
It was a great event
Oh man, we love All Things Open
We love Todd and their team there
We think All Things Open is the place to be
at the end of the year
If you're a fan of open source
you're an advocate of open source
and just the way that it's permeating all of software, right? Open source has won, and now we're just living in a, hopefully, mostly open source world, right?
Oh, absolutely. Absolutely. I mean, just last week there was an article published that estimated the value of open source software as a whole.
The numbers are incredible.
These researchers from Harvard Business School
went and looked at the value of open source
as it is consumed or produced,
and they put dollar numbers on it.
I envy those people because I don't know how.
I'm not an analyst.
Jared, maybe you're like
a somewhat of an analyst, right?
Like you have an analytical brain
from how I know of you.
Okay.
I don't know how you would
quantify the value of open,
I mean, I know it's quite valuable,
but literally how do you value,
how do you quantify
the value of open source?
Like what do they do?
What are the metrics
they key off of?
Do you know?
They counted lines of code.
They counted the hours.
They estimated the hours
that it would take to rewrite from scratch all the software that is in use. And they used
datasets that are available already with some of those counts. And using those two datasets,
they estimated the value that it would take to replicate all of the open source software that is available.
And they put the numbers around $8.8 trillion.
Wow.
I would actually just say all the dollars, really.
Personally, I would just say all the dollars.
Yeah, well, I mean, it's a huge number.
All the dollars.
Right.
Doesn't every dollar today really depend on open source at some layer?
So really, couldn't it be just all the dollars?
Well, right.
It's an impressive number, and it's really hard to picture how big it is.
I had to go look it up.
So it's three times as much as Microsoft's market cap, and it's larger than the whole of the United States budget. Like, 2023's budget in the United States, that includes the military, that includes Medicare: 6.3 trillion.
Whew.
Yeah.
That's a lot of trillions there.
Right.
More trillions than I've got, Jared, of anything.
Right?
I don't got trillions of anything, really.
Maybe not even in cents.
Can I get a trillion cents?
I don't think so.
You don't keep a bucket?
I don't know.
I almost asked Siri to tell me.
Yeah, go turn those into the bank and see what they'll give you.
Well, that's fun to think about, really.
Well, I hear a number like $8.8 trillion, and I start to think, why don't you round that up to $9?
And then I realize that's like a fifth of a trillion dollars if you're going to round it.
That's a lot of money to round.
That is a nice rounding error in your favor if it was your own dollars.
Right?
Oh, yeah. I wouldn't mind that.
For sure.
Yeah, round it off.
Hand it out to some folks.
Hand it out to some maintainers.
That would be nice.
Yeah.
Well, I don't know if everybody listening to this podcast
will be, I think a lot of them will be,
but in light of recent feedback, Jared,
I don't want to assume that our listenership
is super informed of what the open source initiative is.
I can kind of read from the about page, Stefano, but I'd prefer that you kind of give us a taste of what the OSI is really about.
What is the organization?
It's a 501c3.
It's a public benefit corporation in California.
But what exactly is the open source initiative for all that value we just
talked about?
What is it?
Oh, yeah.
In a nutshell, we are the maintainers of the open source definition.
The open source definition is a 10-point checklist that has been used for 26 years; we celebrated 25 years last year. It's the checklist that has been used to evaluate licenses, that is, the legal documents that come together with software packages, to make sure the software comes with freedoms that can be summarized as four freedoms, which come from the free software definition. That is the freedom to use the software without having to ask for permission,
the freedom to study and to make sure that you know
and do understand what it does
and what it's supposed to be doing and nothing else.
And for that, you need access to the source code.
And then the freedom to modify it and to fix it and increase its capacity, to help yourself, and the freedom to make copies, either for yourself or to help others. Those freedoms were written down in the 80s by the Free Software Foundation, and the Open Source Initiative started
a couple of decades after that,
picking up the principles and spreading them out a little bit in a more practical way.
At a time when a lot of software was being deployed
and powering the internet, basically.
This definition and this list of licenses
gives users and developers clarity about the things that they can do.
Provides that agency and independence and control.
And all of that clarity is what has propelled and generated that huge ecosystem that is worth $8.8 trillion.
So who formed the initiative, and then how did it sustain and continue?
Seems like the definition is pretty set, but like what is the work that goes on
continually?
Yeah, well, the work that goes on continuously, especially recently, is the policy work, the monitoring of policy, and everything that goes around it.
The concept of open source seems to be set,
but it's constantly under threat
because evolution of technology,
changes of business models,
the rise of importance and power of new actors
constantly shifts and tends to push the definition itself
of open source in different directions, the meaning of open source in different directions.
And regulation also tends to introduce hurdles that we need to be aware of.
The organization, what we do, we have three programs. One is called the legal and licenses
program. And that's where we maintain the definition. We review new licenses as they are submitted for approval. And we also keep a database of licensing information for packages because often
developers don't use the right words or miss some pieces. A lot of packages don't have the right data.
And we're maintaining the community that maintains this project called ClearlyDefined.
On the policy front, that's another program, the policy and standards front,
we monitor the activity of standard-setting organizations
and the activity of regulators in the United States and
Europe mostly to make sure that all the new laws and rules and the standards can be implemented
with open source code and the regulation doesn't stop or doesn't block the development of the
distribution of open source software. Then the third program is on advocacy and outreach.
And that's the activities that we do with maintaining the blog, having the communication, running events.
And in this program, we're also hosting the conversations around defining open source AI, which is a need that emerged a couple of years ago and came at us very rapidly, glowing hot. So we were basically forced to start this process, because open source AI is a brand new artifact, a brand new set of activities; it forces us to review the principles to see if they still apply, how they need to be modified, or whether we can apply them to AI systems as a whole.
And we are a charity organization, you mentioned that. So our sponsors are individuals who donate to become members, and they can donate any amount from fifty dollars a year up to what have you, and we have a few hundred of those, almost a thousand.
And then we have corporate sponsors who give us money, also donations to keep this work
going.
It's in their interest to have an independent organization that maintains the definition
and having multiple of these donors, corporate donors, makes the organization stronger.
So we don't depend on any single one of them individually.
So despite the fact that we get money from Google or Amazon or Microsoft and GitHub,
we don't have to swear our allegiances to them.
Do you also defend the license, so far as going to court with people who would misuse it?
No, it hasn't happened, well, I mean, not under my watch. But we do have experts now on our board, and in our circle of licensing experts we do have lawyers who go to court constantly to defend the license, defend trademarks, protect users.
And they're there as like expert witnesses.
Exactly.
And we have provided briefs for courts, opinion pieces for regulators, and responses to requests for information on various legislation.
How challenging is it to be a U.S.-based, U.S.-founded idea, now an organization, that represents and defends this definition
that really, you know, going back to the trillions,
like, I mean, all the money, all the dollars.
Like, it's a world problem.
It's not just a United States problem.
How does this organization operate internationally?
What challenges do you face as a U.S.-based nonprofit, but representative of the idea of open source that really impacts everyone globally?
Yeah, that's a very good question.
In fact, it is challenging.
So I started at the organization only a little over two years ago.
And I'm Italian.
And so I do have connections to Europe and knowledge about Europe.
We do have board members that are based in Europe and other board members in the United States.
And it is actually quite challenging to be involved in these global conversations because now, a little bit like maybe in the late 90s, open source is increasingly getting to the center of geopolitical challenges. Not because of open source per se, but because software exists so incredibly everywhere, and most of the software that exists is open source.
So there have been a lot of challenges with the trade relationships with other actors like Russia and Ukraine, now with the war in Israel and Gaza, and the trade wars between China and the United States.
There are a lot of geopolitical issues
that we are at the center of, and we're finding it really complicated. In fact, we have raised more money to increase our visibility on the policy front. We have right now,
at the moment, we have two people working, one in Europe and one is more focused in the United States.
Both of them are part-time, but we do have budget to hire at least another one, if not two, policy analysts to help us review the incredible amount of legislation that is coming.
And that's just talking about the United States.
I guess even one more layer than that is, I don't know if it's a self-appointed defendership of the term open source. I understand where it came from to some degree, you know, and I wonder how you all handle the responsibility of not so much owning the trademarked term open source, but defending it. So in a way, you kind of own it by defending it, because you have to defend it.
Like it's some version of responsibility, which is maybe a byproduct of ownership, right?
There's a pushback happening out there.
Like there's even a conversation of recent where, you know,
they can't describe their software as open source because the term means something.
And we all agree on that, right?
We understand that.
And I'm not trying to defend that, but how do you operate as an organization that defends
this term?
Yeah, I mean, this is really funny, because we don't have a trademark on the term open source.
We have a soft power, if you want, that is given to us by all the people
who, just like you just said,
recognize
that the term open source is what we have
designed, we have defined.
We maintain the definition
and it's kind of recursive, if you want.
But corporations,
individual developers,
other institutions like
academia, researchers,
they recognize that open source means exactly the list of licenses, those 10 points, or if you want, the four freedoms that are listed.
And we maintain that.
And this has become quite visible also even in courts,
where they do understand that if someone is... Like, there was a recent case involving the company Neo4j, and that litigation is quite complicated and entrenched. I'm not a lawyer, I'm not going to dive into legal things. But the one key takeaway that is easy for me to drop and communicate is that the judge recognized that the value of open source is in the definition that we maintain, and that calling open source something that is not under a license we have approved is false advertising.
And that held up in court.
Oh, yeah.
So is that what you would say to people who are perhaps,
maybe nonchalant isn't the best word, but unimpressed by open source as a definition,
and they think it's stodgy and tight,
and the thing that they're doing is close enough,
and they like the term, they're going to use the term, and they've got open-ish code or source available or business source. Because a lot of people are kind of pushing not just against the definition itself, but against the idea that we need a definition, or like, you guys get to have the definition. What do you say to them?
Yeah, you know, they're self-serving. They try to be self-serving, and they're trying to destroy the commons that way.
Quite visibly, I think that users see through them
and it's not even in their interest,
but you know how it works.
Sometimes corporations' greed goes up to the point where they care only about the next quarter
and who cares about what happens next.
You know, maybe the next CEO will have to take care.
Meanwhile, they're just going to laugh all the way to the bank.
And that is the approach that I see in many of these people who complain or who try to
redefine open source because it doesn't serve their purpose.
What we maintain doesn't fully serve their purpose. So instead of respecting the commons and the shared ideas, they act like bullies and find all sorts of excuses to redefine it. We've seen it happening. I've been in free software and open source most of my
career since I was in my 20s. And I've seen what was happening with the early days with
the proprietary Unix guys that were going around telling us that this Linux thing is never going
to work. You're joking. You're giving away. Then they started to be scared and started saying,
hey, you're giving away your jewels. Why are you doing this? You're depriving us of our
life support.
Families are going to be
begging on the street. I remember having
this conversation with a sales guy from
Moscow.
Microsoft, coming
up with their program in the
early 2000s,
the shared source program, because
they just could not get
their head around the fact that you could make money sharing your source code.
But they were forced by the market to show at least a little bit
of what was happening behind the scenes. They were losing deals.
So we've seen it already. They're going to keep on going like this, but there is plenty of interest, plenty more forces on the other side, to keep the bar straight, to keep going where we're going. Because that clarity is such a powerful, such a powerful instrument, to be able to say, I'm open source, therefore I know what I can do, I know what I cannot do, and have that collaboration straightened up. You know, the legal departments, the compliance departments, the public tenders, they all tend to have very clear and speedy review processes. If instead everyone has a different understanding of what open source means...
yeah, we go back to the brand, right?
I'm in Italy now, and I'm surprised to see a lot of Starbucks stores opening.
And I'm absolutely baffled.
Why is this happening?
This country has plenty of bars; in every quarter there's a cafe with decent coffee.
Why do you need a brand?
Because people have been going around,
traveling the world.
They see the brand, they recognize it,
they know what they can do,
they know what they're going to get, and they go there.
And it's the same with open source. What's up, friends?
This episode is brought to you by our friends at Synadia. Synadia is helping teams take NATS to the next level via a global, multi-cloud, multi-geo, and extensible service fully managed by Synadia.
They take care of all the infrastructure, management, monitoring,
and maintenance for you so you can focus on building exceptional distributed applications.
And I'm here with VP of Product and Engineering, Byron Ruth.
So, Byron, in the NATS versus Kafka conversation, I hear a couple of different things. One I hear out there, I hate Kafka with a passion. That's quoted, by the way, on Hacker News. I hear Kafka is dead, long live Kafka. And then I hear Kafka is the default, but I hate it. So what's the deal with NATS versus Kafka?
Yeah. So Kafka is an interesting one. I've personally followed Kafka for quite some time
ever since the LinkedIn days. And I think what they've done in terms of transitioning the
landscape to event streaming has been wonderful. I think they definitely were sort of first to market for persistent data streaming.
However, over time, as people have adopted it, they were the first to market, they provided a solution, but you don't know what you don't know in terms of you need this solution, you
need this capability, but inevitably there's also all this operational pain and overhead
that people have come to associate with Kafka deployments. Based on our experience and
what users and customers have come to us with, they would say, we are spending a ton of money on a team to maintain our Kafka clusters, or on managed services, or something like that.
The paradigm of how they model topics and how you partition topics and how you scale them is not
really in line with what they fundamentally want to do. And that's where NATS can provide,
as we refer to it, subject-based addressing, which has a much more granular way of addressing
messages, sending messages, subscribing to messages and things like that, which is very
different from what Kafka does.
And the second that we introduced persistence with our JetStream subsystem, as we refer to it, a handful of years ago, we literally had a flood of people saying, can I replace my Kafka deployments with this NATS JetStream alternative? And we've been getting constant inbounds,
constant customers asking, hey, can you enlighten us with what NATS can do? And oh, by the way, here's all these other dependencies like Redis and other things and some of our services-based things that we could potentially migrate and evolve over time by adopting NATS as a technology, as a core technology to people's systems and platforms. So this has been largely organic.
We never from day one, you know, with our persistence layer Jetstream,
the intention was never to say, we're going to go after Kafka.
But because of how we layered the persistence on top of this
really nice pub/sub core NATS foundation, and then we promoted it and we said,
hey, now we have the same, you know, same semantics,
same paradigm with these new primitives that
introduce persistence in terms of streams and consumers.
The floodgate just opened and everyone was frankly coming to us and wanting to simplify
their architecture, reduce costs, operational costs, get all of these other advantages that
NATS has to offer that Kafka, or any of the other similar offerings out there, does not.
So there's someone out there listening to this right now. They're the Kafka
cluster admin, the person in charge of this cluster going down or not. They manage the team,
they feel the pain, all the things. Give a prescription. What should they do?
What we always recommend is that you can go to the NATS website, download the server,
look at the client and model a stream.
There's some guides on doing that.
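For anyone who wants to try that "model a stream" step, here is a minimal sketch using the nats-py Python client, assuming a local NATS server running with JetStream enabled; the stream name, subjects, and durable consumer name are made up for illustration and do not come from the episode or Synadia's guides.

import asyncio
import nats

async def main():
    # Connect to a locally running NATS server (assumed at the default port).
    nc = await nats.connect("nats://127.0.0.1:4222")
    js = nc.jetstream()

    # Model a stream: persist every message published on orders.* subjects.
    await js.add_stream(name="ORDERS", subjects=["orders.>"])

    # Subject-based addressing: publish to a granular, hierarchical subject.
    await js.publish("orders.new.1042", b"order created")

    # A durable consumer picks the message back up from the stream.
    sub = await js.subscribe("orders.new.*", durable="order-worker")
    msg = await sub.next_msg(timeout=5)
    print(msg.subject, msg.data)
    await msg.ack()

    await nc.close()

asyncio.run(main())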
We also have, Synadia provided basically a packet of resources to inform people, because we get, again, so many inbound requests about, how do you compare NATS and Kafka?
And we're like, let's actually just put a thing together that can inform people how to compare and contrast them. So we have a link on the website that we can share and you
can basically go get those set of resources. This includes a very lengthy white paper from
an outside consultant that did performance benchmarks and stuff like that and discuss
basically the different trade-offs that are made. And they also do a total cost of ownership assessment
between organizations running Kafka versus running NATS for comparable workloads.
Well, there you go.
You have a prescription.
Check for a link in the show notes to those resources.
Yesterday's tech is not cutting it.
NATS, powered by the global, multi-cloud, multi-geo, and extensible service that is fully managed by Synadia.
It's the way of the future.
Learn more at synadia.com slash changelog.
That's S-Y-N-A-D-I-A dot com slash changelog.
So last year around this time, Meta released Llama, their large language model, to much fanfare and applause, and they announced it as open source. We know a lot has transpired since then, but at the time, what was your response, even personally or as the executive director of the OSI?
What were you thinking? What were you doing in the wake of that announcement?
Well, we were already looking at open source AI in general.
We were trying to understand what this new world meant and what the impact was on the principles of open source as they apply to the new artifacts
that are being created in AI. And we already had come to the conclusion that open source AI is a
different animal than open source software. There are many, many differences. So we immediately,
two years ago, over two years ago, that was one of the first things that I started was to really push the board and to push the community
to think about AI as a new artifact
that required and deserved also a deep understanding
and a deep analysis to see how we could transport
the benefits of open source software into this world.
The release of Llama 2 kind of cemented that idea.
It is a completely new artifact
because they have released,
sure they have released a lot of information,
a lot of details,
but for example,
we don't know exactly what went into the training data.
And well, Llama 2 also came out with a license that really has a lot of restrictions on use. Having restrictions on use is one of the things that we don't like. I mean, the open source definition says you cannot have any restrictions on use. And, you know, at surface value, the license for Llama 2 seems innocent, right? One of the clauses says, well, you cannot use Llama 2 for commercial applications if you have more than a few million, I don't remember exactly how many, a few million active users, monthly active users.
Okay, you know, maybe that's a fair limitation.
And in my mind, I was like, so what does it mean that the government of India cannot
use it? The government of Italy, maybe, you know, if you want to embed this into... So that's already
an exclusion, and you start having to think about it, you know, think about, yeah, I'm a startup, I'm a small thing. But what happens when you get to the six million users, when all of a sudden you have to lawyer up and completely change your processes? But then there are a couple of other restrictions inside that license that seem even more innocent on the surface.
But when you start diving deeper, like you cannot do anything illegal with it.
Okay.
All right.
So let me say, if I help someone decide whether they can or they should have an abortion,
or if I want to have this tool used in application to help me, I don't know, get refugees out of war zones into another place.
And maybe I'm considered a terrorist organization by the government that is using that.
So am I doing something illegal?
Depends on whose side, you know, who needs to be evaluating that.
These are licensing terms that the Open Source Initiative really doesn't think are useful or valuable, and they should not be part of a license. They should not be part of a contract in general. They need to be dealt with at a separate level.
So that's what I was looking at.
I was like, oh, Llama 2, oh my God.
It's not open source because clearly this licensing thing
would never pass our approval.
And at the same time, we don't even know exactly
what open source means.
Why are you polluting the space?
So I was really upset.
Yeah. So then do you spring into action? Like, what does the OSI do? Because you're the defenders of the definition, and here's a misuse, a huge public misuse. Do you write a blog post? Do you send a letter, you know, from a lawyer? What do you do?
Well, call it luck. We were already into this two-year process of defining open source AI.
So we have, actually, I was already in conversations with Meta to have them join the process and support the process to find the shared definition of open source AI.
And in fact, they're part of this conversation that I'm having with not just
corporations like Google, Microsoft,
GitHub, Amazon, etc.
But also,
we've invited researchers
in academia, creators of AI,
experts of ethics
and philosophy, organizations
that deal with open
in general, open knowledge, open data
like Wikimedia, Creative Commons,
Open Knowledge Foundation, Mozilla Foundation.
And we're talking also with a bunch of experts in ethics, but also organizations like digital rights groups, like the EFF, and other organizations around the world who are, you know, helping in this debate.
Like we had to first go through an exercise
to understand and come to a shared agreement
that AI is a different thing than software.
Then we went through an exercise
to find the shared values
that we want to have represented
and why we want to have the same sort of advantages
that we have for software also ported over to the AI system.
And then we have identified the freedoms that we want to have exercised.
And now we're at the point where we are trying to enumerate, to make the list of components of AI systems, which is not as simple as binary code, compiler, and source code.
So it's not as simple as that.
It's a lot more complicated.
So we're building this list of components for specific systems.
And the idea is by the end of the spring, early summer, to have the equivalent of what we have now
as a checklist for legal documents,
for software,
and have the equivalent for AI systems
and their components
so that we will know,
basically, we have a release candidate
for an open-source AI definition.
Yeah, you mentioned that,
and there's, I think you posted this
eight days ago,
a new draft of the open source AI
definition, version 0.0.5, is available. I'm going to read from, I think, what you might be alluding to, which is this, like, exactly what is open source AI. It's linked up to the HackMD document, and it says: What is open source AI? To be open source, an AI system needs to be available under legal terms that grant the freedoms to: one, use the system for any purpose and without having to ask for permission; two, study how the system works and inspect its components; three, modify the system for any purpose, including to change its output; and four, share the system for others to use, with or without modifications, for any purpose. So those seem to be the four hinges that this,
what is open source AI is hinging upon, at least in its current draft.
Is that pretty accurate considering it's recent eight days ago?
Yeah, those are the four principles that we want to have represented.
Now, the very crucial question, what comes next, is this: if you are familiar with the four freedoms for software, those set by the Free Software Foundation in the late 80s, those freedoms have one little sentence attached to them. The freedom to study and the freedom to modify both say that access to the source code is a precondition for this. That little addition is meant to clarify that if you want to study a system, if you want to modify it, you need to have a way to make modifications to it, and that is the preferred form to make modifications from the human perspective. It's not that you give me a binary and then I have to decompile it or try to figure out from reverse engineering how it works.
Give me the source code.
I need the source code in order to study.
For AI systems, we haven't really found yet a shared understanding or a shared agreement on what one needs to have access to, on what the preferred form to make modifications to an AI system is. That's the exercise that we're running now.
Yeah, that's interesting. The preferred form of modification is really interesting because,
like you said, you don't want to give a binary and expect reverse engineering because
that's possible, right? And that's possible maybe to a small subset. It's not the preferred route to get to Rome.
It's just like, that's not the road I want to go down, right?
I want a different way.
Yeah.
And you want to have a simple way.
So, you know, even some licenses even have a more specific wording
around defining what source code actually means.
Like the GNU GPL is one of those, with, you know, very clear descriptions and prescriptions about what needs to be given to users in order to exercise those freedoms, their freedoms as a user.
So for AI, yeah, it's complicated, because there are a few new things for which there are no court cases yet.
You know, I keep repeating the same story.
When software came out for the first time, started to come out at the labs, research
labs, it started to become a commercial artifact that people could just sell.
There was a conscious decision to apply copyright to it.
It was not a given fact that it was going to be using copyright,
like copyright law.
So that decision was a lucky one, or a well thought out one, I don't know which of the two,
because copyright as a legal system
is very similar across the world.
And building the open source definition,
the free software definition,
the legal documents that go with software
for open source software and free software,
those legal documents built on top of copyright
means that they're very, very similarly applied
pretty much everywhere around the world.
The alternative at the time were conversations
around treating software as an invention
and therefore covered
by patents. Patent law is a whole different mess around the world. They're all different
applications. They have all different terms, much more complicated to deal with. So for AI,
we're pretty much at the same stage, where there are some new artifacts, like the model: after you train a model, that produces weights and parameters, and they go into the model. Those models, honestly, it's not clear what kind of legal frameworks apply to those things.
And we might be at the same time in history where we could have to imagine and think and
maybe suggest and recommend what the best course of action will be, whether it makes sense to treat them as copyrightable
entities, artifacts, or nothing at all, or inventions, or any, you know, some other rights
or exclusive right. And the same goes for the other big conversation that is happening already, but for which I don't have a clear view of where it's going to end: the conversations around the right to data mining, with OpenAI being sued by the New York Times, Stability AI being sued by Getty Images, and GitHub being sued by anonymous plaintiffs, et cetera, et cetera.
A lot of those lawsuits hinge on what's happening.
Why are these powerful corporations going around and crawling the internet, aggregating
all of this information and data that we have provided, uploaded?
We, society, some commercial actors, some non-commercial actors, we have created this wealth of data on the internet, and they're going around taking it and basically making it proprietary, building models that they have for themselves. And on top of that, you can already start seeing, like, oh my God, they're going to be eventually making a lot of money out of the things that we have created.
Or even more scarily, like sometimes I think about this myself, I've been uploading my pictures
for many years without thinking too much. I'm sure that someone has built a database out there of my pictures as I was aging. And now these pictures can be used, could be used, by an evil government or an evil actor to recognize me around the streets at any time.
And I don't have any recourse.
Is that fair?
Is that not fair?
Those are big questions.
And there is no easy or simple answer.
Yeah. So did you enumerate, and I missed it,
or can we enumerate the components that you have decided so far
are part of an AI system?
The code I heard, the training data, et cetera?
Yeah.
There are three main categories, so maybe four.
One is the category of data, one is the category of code, another category is models, and there is a fourth category that covers other things like documentation, for example, instructions on how to use it, or scientific papers. In the data part, some of the components are the training data and the testing data. In the code parts
go the tooling
for the architecture,
the inference code to run
the model. Anything that is written
by a human in general, you can also
have in there the
code to filter
and set up the data
sets and prepare them for the training.
And then in the models, you have the model architecture, the model parameters, including
weights, hyperparameters, and things like that.
There might be intermediate steps during the training.
And the last bit is documentation, how-tos, samples, output.
So there is an initial list
of all of these components
that have been,
we worked with,
or actually the Linux Foundation
worked on creating this list
specifically for generative AI
and large language models.
And we're working with them.
I mean, we're using their list as a backdrop, or as a starting point, to move this conversation forward.
Now, the question that we need to ask
having this list,
and if you go to the draft five,
you will see an empty matrix, basically.
So these components, there are 16 of them, if I remember correctly, or 17.
And then on a row next to them, there is a question, do I need it to run it?
Do I need it to use it?
Do I need it to copy it?
Do I need it to study?
Do I need this component to modify the system?
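As a rough illustration of the matrix Stefano describes in draft 0.0.5, here is a minimal Python sketch; the component names are an abbreviated, hypothetical subset of the list discussed above, not the official 16 or 17, and the filled-in cells are placeholders, not OSI positions.

COMPONENTS = [
    # data
    "training data", "testing data",
    # code
    "data preparation code", "inference code",
    # model
    "model architecture", "model weights and parameters",
    # other
    "documentation",
]
FREEDOMS = ["use", "study", "modify", "share"]

# The draft publishes this as an empty matrix: each cell asks whether the
# component is required to exercise that freedom (None = not yet decided).
matrix = {c: {f: None for f in FREEDOMS} for c in COMPONENTS}

# Filling in the cells is the working groups' exercise, e.g. (placeholders only):
matrix["model weights and parameters"]["use"] = True
matrix["training data"]["study"] = None  # still the open question in this episode

for component, answers in matrix.items():
    print(f"{component:30s} {answers}")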
And we're referring to the system, right?
This is one of the important things: the open source definition refers to the program. And the program is never defined, but we pretty much know what a program is.
AI is, and again, this is a very complicated question.
It looks very simple on surface, but when you start diving a little bit deeper, it becomes complicated because what is an AI system, right?
So we started using the definition that has been becoming quite popular in every regulation around the world.
It's a work done by the Organization for Economic Cooperation and Development, the OECD.
And they have defined AI system in very broad terms.
And this definition is being used in many regulations,
like from the United States Executive Order on AI.
NIST also uses it.
In Europe, the AI Act uses it,
although with a slight, very small, minor variation.
It seems to be quite popular, but there are detractors.
And indeed, it is quite generic too.
Sometimes when you read it carefully, it may even cover a spreadsheet.
It's really bizarre.
So let's say that hypothetically I'm like a medical company
that has been working on a large language model,
and I have proprietary data.
So I have like readings and reports and stuff that we've accumulated over years. And I create
an LLM based on that data that ultimately can answer questions about medicine or whatever.
And I want to open source that. I need to be able to make it so it's usable,
studyable, modifiable, and shareable. And it seems like the training data, even though that's the most proprietary part, and perhaps the most difficult part to actually make available, or sometimes impossible, is necessary not to use, but to study and modify, it seems like.
So if I release the model, the code, all the parameters,
everything we use to build a model,
everything except for the source original data under what you guys are currently working on,
that would not be open source AI, would it?
Honestly, that is a very good case.
An example for why I think we need to carefully reason around what exactly do I need to study?
What kind of access, what sort of access do I need?
Is that the original data set?
Because if it is the original data set, then we are never going to have an open source AI.
Right.
That's where I was getting to.
It's not going to happen.
It's not going to happen. It's not going to happen.
Yeah.
So maybe, and this is my working hypothesis that I threw out there,
maybe what we need is a very good description of what that data is.
Maybe samples, maybe instructions on how to replicate it.
Because, for example, there might be data that is copyrighted.
You might have the right under fair use or under different exclusions of copyright.
You may have the rights to create a copy and create a derivative, like around the training,
but not to redistribute it.
If you redistribute it, then you start infringing.
So I think we need to be carefully thinking about that. And the reason why
I became more and more convinced
that we don't need
the original dataset
is because I've seen
wonderful mixing,
wonderful remixing of models,
even splitting of models
and recombinations of models, creating whole new capabilities,
new AI capabilities without having to retrain a single thing.
So I'm really starting to believe that in machine learning the weights, the weights in the architecture, are not binary code. It's not a binary system, binary code that you have to reverse engineer. If you have sufficiently detailed instructions on how it's been built and what went into it, you might be able to create new systems, reassemble it, study how it works, execute it, modify it.
So the preferred form to make modifications
is not necessarily going through the pipeline
or rebuilding the whole system from scratch,
which for many reasons may be impossible.
I do like the idea of a small subset of the data set
that's anonymized or sanitized in some way, shape, or form that's
like, this is the acceptable
sample amount
required for the study portion or the modification
portion. Yeah.
You know, it could be the schema, for example.
It could be the, you know. Right.
Provide your own data in here if you can,
which you can obviously find other
ways to use artificial intelligence
to generate more data.
So that's a whole thing, right?
But I feel like that's acceptable to me to provide some sort of sampling,
or as you said, the schema.
I think that makes sense to me.
Yeah, the research is going also in this direction with data cards and model cards,
lots of metadata specifications. I do think
that that might be a viable option. I would love to have that. I mean, we'll see in the next few weeks and months how that conversation goes, but I do
believe that that's one way that we can
get out of this
process with a definition
that is not just a theoretical
something beautiful that you put up
in a picture in a museum and nobody can do anything with it.
It needs to be practical.
Like I keep repeating, the open source definition had success because it
enabled something practical and it has success because other people have written
it, other people have decided to use it.
If you keep on insisting from your
pedestal that you
shall do this and that,
you may not find enough of a crowd to follow you.
Right.
If no one's using it, what's the point?
What's the point?
You've lost the thread.
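As a rough sketch of the "describe the data instead of shipping it" idea from the conversation above, here is what a minimal data card might look like; every field name and value is hypothetical, invented for illustration, and not taken from the OSI draft or any published specification.

import json

# Hypothetical data card for a model whose original training data cannot be
# redistributed; fields and values are invented for illustration only.
data_card = {
    "name": "example-clinical-notes",
    "description": "De-identified clinical notes used to fine-tune the model.",
    "size": {"documents": 1_200_000, "approx_tokens": 900_000_000},
    "schema": {
        "note_id": "string",
        "specialty": "string",
        "text": "string, free-form clinical note",
    },
    "collection": "Exported from hospital records, 2015-2022, with identifiers removed.",
    "redistribution": "Not permitted; the dataset itself is proprietary.",
    "replication": "Filtering and preparation code is published so a comparable "
                   "corpus can be assembled from other sources.",
    "sample": [
        {"note_id": "0001", "specialty": "cardiology", "text": "Patient presents with ..."}
    ],
}

print(json.dumps(data_card, indent=2))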
What's up, friends?
I'm here with one of my new friends, Zane Hamilton from CIQ. So Zane, we're coming up on a hard deadline with the CentOS end of life later this year in July, and there are still folks out there considering what their next move should be. Then last year we had a bunch of change around Red Hat Enterprise Linux that makes it, quote, less open source in the eyes of the community, with many saying RHEL is open source, but where is the source, and why can't I download and install it?
Now, Rocky Linux is fully open source and CIQ is a founding support partner
that offers paid support for migration, installation, configuration, training, etc.
But what exactly does an enterprise or a Linux sysadmin get
when they choose the free and open source Rocky Linux and then ultimately the support from CIQ if they need it?
There's a lot going on in the enterprise Linux space today.
There's a lot of end of life of CentOS.
People are making decisions on where to go next.
The standard of what enterprise Linux looks like tomorrow is kind of up in the air.
What CIQ is doing is we're trying to help those people that are going through
these different decisions that they're having to make and how they go about making those decisions.
And that's where our expertise really comes into play. A lot of people who have been through very
complex Linux migrations, be it from the old days of migrating from AIX or Solaris onto Linux,
and even going from version to version, because to be honest, enterprise Linux version to version
has not always been an easy conversion. It hasn't been. And you will hear that from us.
Typically the best idea is to do an in-place upgrade. Not always a real easy thing to do,
but what we've done is we have started looking at and securing a path of how can we actually
go through that? How can we help a customer who's moving from CentOS 7 because of the end of life
in July of this year? What does that migration path look like and how can we help? And that's
where we're looking in ways to help automate from an admin perspective.
If you're working with us, we've been through this.
We can actually go through and build out that new machine
and do a lot of the backend manual work for you.
So that all you really have to do at the end of the day
is validate your applications up
and running in the new space.
And then we automate the switch over.
So we've worked through a lot of that.
There's also the decisions you're making around,
I'm paying a very large bill for something I'm not necessarily getting the most value out of.
I don't want to continue down that path. We can help you make that shift over to an open source
operating system, Rocky Linux, and help drive what's next, help you be involved in a community
and help make sure that that environment you have is stable. It's going to be validated by the actual
vendors that you're using today.
And that's really where we want to be as a partner from not just an end user perspective,
but as an industry perspective.
We are working with a lot of those top tier vendors out there of certifying Rocky, making
sure that it gets pushed back to the RESF, making sure that we can validate that everything
is there and secure that needs to be there and helping you on that journey of moving.
And that's where we, CIQ,
really show our value
on top of an open source operating system
is we have the expertise.
We've done this before.
We're in the trenches with you
and we're defining that path
of how to move forward.
Okay, ops and sysadmin folks out there,
what are you choosing?
CentOS is end of life soon.
You may be using it,
but if you want a support partner
in the trenches with you,
in the open source trenches with you, check out CIQ. They're the founding support partner of
Rocky Linux. They've stood up the RESF, which is the home for open source enterprise software,
the Rocky Enterprise Software Foundation, that is. They've helped to orchestrate the Open ELA,
a collaboration created by and upheld by CIQ, Oracle, and SUSE.
Check out Rocky Linux at RockyLinux.org, the RESF at RESF.org.
And of course, if you need support, check out our friends at CIQ at CIQ.com.
Are there systems out there today that you would rubber stamp and say, like, this is open source AI? I'm thinking of perhaps Mistral, who have a bunch of stuff going on and are committed to open and transparent, but I don't know exactly what that means for them. Have you looked at anything? And do you have, like, things you're comparing against as you build, to make sure that there's a set of things that exist or could exist that are practical?
Not yet. I know that there is, we have an affiliate organization called EleutherAI.
They are a group of researchers.
They recently incorporated as a 501(c)(3) nonprofit in the United States.
And from the very beginning, they've been doing a lot of research in the open, releasing data sets and structure, and then research papers,
models and weights and everything like that.
So I'm really leaning a lot on them
to shine a light on how this can be done.
But I don't want to be too restricted in my mind.
Like they are very open
with an open science and open research mentality.
I think that there is an open AI and open research mentality. I think that there is an open AI,
an open source AI,
that is not as equally open necessarily,
but it can still practically have meaningful impact.
It can generate that positive reinforcement
of innovation, permissionless collaboration, et cetera.
So yes, I lean on EleutherAI, but I'm also very open, and I'm sure that there will be other organizations, other groups, as we go and elaborate more on what we actually need, on what the preferred form to make modifications to an AI system is, that we're going to discover more.
So, no open source AI yet.
So there's no rubber stamp
for anything out there currently.
Well, I mean,
I said I could rubber stamp Pythia and EleutherAI,
but I don't want to say
that that's necessarily
the only thing.
Right, there may be more stuff.
And again,
those are the ones,
the guys that I,
because I know how they work.
Yesterday or the other day, OLMo was released by the Allen Institute.
And that seems to be also quite openly available for models, weights, science behind it, etc.
I haven't looked at their licenses, I haven't looked at it carefully, so I can't really tell.
It might as well be an open source AI system.
I'm trying to get to a definitive answer, really.
Is there or is there not an open source AI out there yet?
You know, I can tell you what is not.
Llama 2 is not.
Open AI is not.
Touche.
A deny list more than a permit list.
Yeah.
So I suppose one of the questions, which maybe is obvious, but I got to ask it, is what is the benefit if I'm building a model and I'm releasing a new AI?
What is the benefit to it being open source, to meet this open source AI definition?
What is the benefit to its originator?
And then obviously to humanity, I kind of get that, but like, what's the benefit? It's pretty easy to kind of clarify that with software, right? We see how that's working because we've got, you know, 30 years of history or more in a lot of cases. We've got a track record there. We don't have a track record here. It's still early pioneer days. What's the benefit?
That is a very good question, and I don't have an answer for it. I mean, I do. I know the benefit for humanity, I know the benefit for the science of it, and those benefits are really what triggered the internet. Like, if software had started to come out of the labs without the definition of free software, without the GPL license, without the BSD license...
I don't think we would have had such a fast evolution of software computer science.
We would not have the internet that we see today.
If everyone had to buy a license for Solaris from Sun, from Oracle, etc. If a data center would have had to go and call Sun Microsystems' or IBM's sales team before you could build a data center, instead of using just boxes and slapping Linux and the Apache Web Server on them,
we would have had a completely
different history of
digital world, the past, I mean, completely different.
So I can see the benefit for society and science.
For some of these corporations, I'm assuming that they have made some of their calculations
on stopping the competition or creating competitive advantages, maybe in pure Silicon Valley approach,
like get more users,
we'll figure out the business model later.
There is some of that going on,
likely, most likely.
But I haven't had that conversation yet
with any of the smart people I know
thinking about the business models behind this
or the possible ways of privatizing
or, I don't know, finding revenue streams and things like that from this open source model.
Do you think that they're becoming commoditized?
If we specifically talk about these large language models,
if we call AI that for now,
recognizing there's an umbrella term
and there's other things that also that represents,
do you think that they are becoming commoditized and will continue to enough so that open source
can keep up with proprietary in terms of quality or even surpass just because of the number
of people releasing things?
And are they, you know, I don't know.
That's why I'm asking honestly, what are your thoughts on it?
Honestly, recently I saw this new system, a text-to-speech system, and they built it, this team of developers from a company called Collabora. They built this system
by splitting a system from OpenAI, another from either Anthropic or, I don't remember exactly, but they split an AI system. They took it and they flipped it, the inputs for outputs,
and they attached another model of their own training
with small data sets,
and they built a brand new thing.
I think, I mean, this is the kind of stuff that is inspiring.
Like at one point, there's going to be,
I'm sure that the quick evolution of this discipline would make it
so that smaller teams with smaller amount of data would be able to create very powerful machines.
And maybe the advantages of these large corporations that are now deploying, delivering,
and distributing openly accessible AI models, maybe in their mind, having optimized hardware,
cloud resources that they can sell.
Maybe that's where they're going.
It's one of their revenue streams
they imagined that they would be coming from.
Yeah, that is exciting.
I did see, I think it was like Codium AI
just recently announced a model that beats DeepMind on code generation,
you know, according to benchmarks that I haven't looked at, as well as Copilot.
And that's from a smaller player.
I'm not sure if that's open or closed or what, but it is kind of pointing towards like,
okay, there's significant competition and, like you said, remixing
and the ability to combine and change
and even in some cases swap out
and take the best results
that we will have a vibrant ecosystem of these things.
And I think open source is the best model
for vibrant ecosystems.
So that rings true with me.
Doesn't mean it's right, but it sounds right.
Yeah.
This is a tough one.
This is really a tough nut to crack, really.
I mean, even at the forums you have,
I believe you're calling it the Deep Dive, right? It's Deep Dive: AI.
And this is the place where you're hoping that many folks can come and organize.
You say it's the global multi-stakeholder effort to define open source AI and that you're bringing
together various organizations and individuals to collaboratively write a new document, which is
what we've been talking about directly and indirectly. Who else is invited to this? How
does this get around? How do people know about this?
Who is invited to the table to define or help define?
Is this an open way to define it?
What is happening?
Who's participating?
At this point, it's now public.
So anyone can really join the forum
and can join me in the bi-weekly town hall meetings. So that part is public and everybody's
welcome to join. We're going to keep on going with public reports and small working groups
with people that we're picking, but only for agility in the collaborations. We're picking people that we know of or that we have been in touch with,
coming from a variety of experiences.
Say we're talking to creators of AI in academia, large corporations, small corporations, startups, lawyers, people who work with regulators, think tanks, and lobbying organizations. We're talking to experts in other fields like ethics and philosophy. We keep on chatting with... We have identified six stakeholder categories, and we're trying to have our representation also geographically distributed: North America, South America, Asia Pacific, Europe,
Africa. Last year, we had conversations with about 80 people, representatives of all these categories, in a private group, just to get things kick-started. And we have had meetings in person, starting in June in San Francisco and in July in Portland, and then other meetings in Bilbao in Europe. We had meetings in person with some of these people going to different conferences. But starting this year, this first half of the year, we're going to be super public. We're going to gather.
We're going to be publishing all the results of the working groups.
And we're going to be taking comments on the forums.
And then we're going to have an in-person meeting with representatives for each of the stakeholder categories to get in a room and produce,
you know, iron out the last pieces of the definition, removing all the comments, and come out of that meeting with a release candidate, something that we feel has endorsement from a dozen different organizations across the world and across experiences. Then we're going to use that, and we're raising funds for it, to have at least four events in different parts of the world between June and the end of October. One of these events is definitely going to be at All Things Open.
We're going to gather more potential endorsements. And as soon as we get to five endorsements
from each of the different categories,
I think we're going to be able to say
this is version one.
We can start working with it
and see where we land.
And maybe next year,
we're going to have,
by that time,
I mean, by October, November,
the board will also have a process
for the maintenance of this definition
because most likely, we're going to have to think about how to maintain it,
how to respond to challenges,
whether they're technological or regulatory challenges,
or just that we missed the mark and we realize later that we'll have to fix it.
Yeah.
Kind of want to backtrack slightly, I guess, as I hear you talk about this and kind of coming to, you know, a blessed version sometime this year based upon certain details. Like when I asked you, and I know this is your response and not so much a corporate response, in terms of what's the benefit of being an open source artificial intelligence, like, what's the benefit of being open source AI?
Like all this effort to define it,
and then what if there's not that many people
who really want to be defined by it?
Like I guess that's an interesting consideration
is that all this effort to define it,
but maybe there is no real benefit,
or the benefit is unclear,
and then folks just... it's almost like saying the definition is a line, right? It's like, well, okay, basically everything is not open source AI and there are very few things that are.
Or at least initially. Maybe as iteration and progress happen, more and more we'll see a benefit, and maybe that benefit will come through more clearly than we can see it now.
Yeah.
I don't want to think about that.
Okay.
I don't want to think about that.
Yeah, no, it's one of those things.
Like if you start any endeavor thinking about failure, you're probably going to fail, right? So it's not one of the outcomes that I see. There is a tremendous amount of pressure, I mean, so it's unlikely that that's going to happen. That's what I want to say.
I have had a lot of pressure
from corporations, regulators,
like the AI Act has a provision in there, text that provides some exclusions to the mandates of the law for open source AI.
There is no definition in there.
So regulators need it.
Large corporations need it.
Researchers need some clarity.
I hear a lot of researchers, and they want data. It doesn't mean that they necessarily want the original data, some of them at least, but they do want to have good data sets.
And that only comes if there is clarity about the boundaries of what they are allowed to do when accumulating data, because data becomes very, very messy very quickly. Privacy law, copyright law, trade secrets, content that's illegal in some countries and not in others. It becomes really, really messy very quickly, and researchers don't have a way to deal with it right now.
They need help.
I agree that you should keep doing it.
I didn't mean to sound like it should be a failure.
Sometimes I think it might be beneficial to think about failure at the beginning, because it's like, well, you've got to consider your exit before you can go in, in a way. I'm not saying you should do that, but I'm glad you are defining it. It does need to be defined. I didn't mean to be necessarily like, what if, but, you know, a lot of your attention is probably spent simply on defining this and working with all the folks, all the stakeholders, all the opinion makers, etc. that are necessary to define what it is.
It's a lot of work.
It's a lot of work.
And you're absolutely right.
This is taking most of my attention.
And yes, I do see a couple of failure modes. Like, we can fail if we're late and if we get it wrong. But for getting it wrong, the fact that it's defined with a version number, I think we can fix it over time, and we really shouldn't be expecting to have it perfect the first time, because the whole landscape is changing too quickly.
And the other one, being late, is also part of the reason why I'm pushing to get something out the door.
Because a lot of pressure exists in the market
to have something.
Everyone is calling their models open source AI,
recognizing that there is value in that term implicitly.
But if there is no clarity, it's going to be diluted very, very, very rapidly.
Before Jared and I got on this call, there was one thing we had a loose discussion about, and then I quickly stopped talking because we have a term, I think it's pretty well known in broadcasting and podcasting, which is: don't waste tape, right? And I didn't want to share my deep sentiment, although I loosely mentioned it to Jared in our pre-call, just kind of 10 minutes before we met up, and it was basically: what is at stake?
I know we talked, you know, just loosely here about failure as an option, what failure is, and whether it's iterative on the version numbers you just mentioned. But is there a bigger concern at stake if the definition that you come up with collectively is not perfectly suited? Like, does the term open source in software, is that term now fractured, because the arbiter of the term open source has not been able to carefully and accurately define open source AI? Like, is there a bigger loss that could happen? And I'm sorry to have to ask that question, but I have to.
Yeah, you don't want me to sleep tonight, huh?
Sorry about that.
I think that's, I mean, I think so far we've been able to win, in quotes, in the public when we push back on the term open source, because it's pretty well accepted, right?
Yeah.
And whether, and I'm going to say this, but whether we like it or not, OSI has been the guardian, so to speak, of that term.
Some say you've taken that right.
I think you've been given that right over decades of trust.
And in some cases, there's some mistrust, and that's not so much me.
It's just out there.
Not everybody's been happy with every decision you come up with,
and that's going to be the case, right?
If you're not making some enemies, you're not doing some things right,
I suppose, in the world, because not everybody's going to like your choices, right?
Right.
But I think, I wonder that.
I personally wonder, if you can't define this well, does the term open source change, or does it become open to change?
There is that risk, I'm aware.
But that's one of the reasons why I'm being extra careful to make sure that everyone's involved, has a voice, and has the chance to voice their opinion.
And all of these opinions are recorded publicly
so we can go back and point at the place
where we made a bad choice and be able to correct or not.
Yeah.
Stefano, real quick, what's the number one place
people should go if they want to get involved?
Like the URL.
Here's how you can be part of that discussion.
Discuss.opensource.org
is where we're going to be having
all our conversations.
Alright, you heard it. That'll be in the show notes.
So if you are interested in this,
even if you just want to listen and be
lurking and watching as it makes progress,
definitely hit that up. If you want your voice heard and you want to help Stefano and his team make this definition awesome and encompassing and successful, I think the more voices the better, and the earlier on the better, so that we can have a
great open source AI definition. Thank you. Thanks Stefano. Appreciate your time. Thank you so much.
Thank you.
It's a big question mark what the future of the open source AI definition will be.
Well, the first draft of the open source AI definition is linked in the show notes. I highly encourage you to check this out. Dig in, learn about what's happening here.
Voice your opinion if you have a strong opinion, but definitely pay attention.
As you can hear from some of the discomfort with the questions we asked about what happens if the open source AI definition falls a little short, or what the ramifications or potential impact might be,
I think we all need to pay close attention
to how this definition evolves and lands.
Links are in the show notes, so check them out.
And again, thank you to Stefano because he did have a cold during this conversation.
And he powered through because he knew this was an important conversation to have here on this podcast and to share with you.
So thank you, Stefano.
Up next on the pod is our friendly turned friend, Jamie Tanna,
coming up on Friends. And next week, it's about making your shell magical with Ellie Huxtable,
talking about Atuin. Check it out at atuin.sh. Okay, once again, a big thank you to our friends
and our partners at fly.io, our friends at typesense.org, and of course, our friends at sentry.io. Use the code changelog to get $100 off the team plan. You can do so at sentry.io. Okay, BMC, those beats are
banging. We have that album out there, Dance Party. I don't know about you, but I've been
dancing a lot more because that album has been on repeat on all my places that I listen to music.
So I've been dancing a lot. Dance Party is out there. Check it out at changelog.com slash beats.
That's it. The show's done. Thank you.