The Changelog: Software Development, Open Source - HashiCorp strikes back (News)
Episode Date: April 8, 2024
HashiCorp sends OpenTofu a nasty-gram in the wake of Matt Asay's infringement claims, Polar is like Patreon but for software creators, a Common Corpus of LLM data is released on Hugging Face & Loki is an open source tool for fact verification.
Transcript
What up nerds, I'm Jared and this is Changelog News for the week of Monday, April 8th, 2024.
Mateus Freyra wrote a very nice comment on Spotify about our undercover generalist episode,
but since approximately zero people read those,
I figured why not give it a boost? He says, amazing episode. This is the kind of content
that makes me open Spotify twice a week and come here to hear from real developers out there making
real stuff. Keep them coming, Changelog. Thanks, Mateus. You keep listening. We'll keep them coming. Okay, let's get into the news.
HashiCorp strikes back.
On April 3rd, Matt Asay published a piece for InfoWorld titled,
OpenTofu May Be Showing Us the Wrong Way to Fork.
In it, he says, quote,
that OpenTofu may have illegally taken HashiCorp's code to keep pace.
At least, it's hard to avoid that conclusion,
perusing OpenTofu's GitHub repositories and comparing them to HashiCorp's.
End quote.
The code in question is a new feature in Terraform 1.7
that also landed in OpenTofu as the fork maintainers work to maintain parity.
Asay claimed, quote,
OpenTofu took this BUSL-licensed HashiCorp code, removed the headers, and tried to instead re-license it under the Mozilla Public License, end quote.
As a beleaguered boxer might say, them's is fightin' words.
So naturally, it prompted many armchair software copyright lawyers to analyze the code in question and determine whether or not it was actually copy pasta.
Smart people have landed on either side of this issue. Dan Lorenc from Chainguard says, quote,
I did my own audit and the samples bear no resemblance despite implementing similar
functionality, which is honestly hard to do in Go, where there are so few ways to do things.
I can't possibly see any validity to this claim. End quote. Meanwhile, Joe Duffy from Pulumi
concluded, quote, there are three major kinds of taint, from worst to least worst. One, copied the
code directly. Two, read the source code and was influenced by it. Three, copied the functionality.
I've seen this game enough to know that if the file, function, and variable names plus non-zero number of
statements match, you've probably got at least level two, if not level one. And that's a problem.
That's pretty clearly true of at least remove_statement.go, end quote. The file remove_statement.go
was one of the files in that new feature. Asay later issued this statement about his article.
Quote, I regret how strongly I expressed myself, force of habit,
but grateful for those who expressed support against mob dogpiling.
To the execs my post put under fire:
they didn't agree with my conclusions, but responded with kindness.
They're the kind of OSS community I want to join. End quote.
A few days later, OpenTofu posted this on their LinkedIn page.
Quote, OpenTofu disagrees with any suggestion that it misappropriated,
mis-sourced, or otherwise misused HashiCorp's BSL code.
Indeed, it seems that HashiCorp may be conflating code that had previously been open-sourced under the MPL
and more recently developed code it published under the BSL. OpenTofu's maintainers have
investigated this matter and intends to issue a written response providing a more detailed
explanation of its position in the coming days. End quote.
I'm excited to read OpenTofu's written response, but I have a feeling it's only going to get
uglier from here.
If I'm running the OpenTofu project, I'd be seriously considering a change in strategy
from feature parity to differentiation from now on.
Polar is a creator platform for developers.
Polar is like Patreon, but tailored to software creators.
Their tagline is, get paid coding on your passion.
And I have to say, that is a compelling proposition.
They're just getting started, but the current suite enables maintainers to offer
exclusive posts and newsletters, access to private GitHub repos, Discord invites,
and what they're calling Sponsorship 2.0,
which is logos on your readmes.
There's a few things that are interesting to me
about this project.
One, zero fixed costs.
Polar takes a 5% rev share plus Stripe fees.
Two, issue funding and reward splitting.
And three, Mitchell Hashimoto has joined on as an advisor. Watch this space,
you know I will.
It's now time for sponsored news.
Rethinking microservices.
In the linked video, Synadia's Jeremy Saenz addresses the current state of overwhelm when building microservice
architectures and how a technology like NATS.io can help solve many of
the current requirements for microservices within a single piece of infra. Here's a sample.
Microservices have gotten really complicated to build and maintain. Not only do you have to build
the application, but you have to figure out an API gateway, a load balancer, you have to figure out
how to deploy it, what data store you want to use, how to do canary management and monitoring and
logging. There's lots and lots of use cases to handle. And I think it's one of the reasons why microservices
have gotten such a bad rap over the years. But what if I told you there was another way to go
about doing it? And just by rethinking some of the core ideas behind microservices, we could
possibly be using very, very few technologies to accomplish the very same task.
Watch the whole thing by following the link in the chapter data and companion newsletter.
And thanks to Synadia for sponsoring Changelog News.
Releasing Common Corpus.
Pierre-Carl Langlais is announcing the release of Common Corpus on Hugging Face.
Quote, Contrary to what most large AI companies claim, the release of Common Corpus aims to show it is possible to train large language models on fully open and reproducible
corpus without using copyright content. This is only an initial part of what we have collected
so far, in part due to the lengthy process of copyright duration verification. In the following
weeks and months, we'll continue
to publish many additional datasets also coming from other open sources, such as OpenData or
OpenScience. End quote. Here is more info about this massive dataset. Common Corpus is the largest
public domain dataset released for training LLMs. Common Corpus includes 500 billion words from a wide diversity of cultural heritage
initiatives. Common Corpus is multilingual and the largest corpus to date in English, French,
Dutch, Spanish, German, and Italian. Common Corpus shows it is possible to train fully open LLMs on
sources without copyright concerns.
Loki is an open source tool for fact verification.
This Python-based tool is designed to automate the process of verifying factuality.
Its list of components helps explain how it does what it does.
First, Decomposer, which breaks down extensive texts into digestible, independent claims,
setting the stage for detailed analysis.
Second, Checkworthy, which assesses each claim's potential significance,
filtering out vague or ambiguous statements to focus on those that truly matter.
Third, Query Generator, which transforms Checkworthy claims into precise queries,
ready to navigate the vast expanse of the internet in search of truth.
Fourth, Evidence Crawler, which ventures into the digital realm,
retrieving relevant evidence that forms the foundation of informed verification.
And finally, Claim Verify, which examines the gathered evidence,
determining the veracity of each claim to uphold the integrity of information.
Put all that together, and what you have is a Python script where you can run something
like python factcheck.py --modal string --input "Loki is the god of mischief",
and Loki will go out there and verify that for you.
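For readers following along in text, the five components described above can be sketched as a small pipeline. This is a toy illustration only, not Loki's actual code or API: every function name, the Verdict class, the word-count heuristic, and the in-memory corpus that stands in for web search are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    claim: str
    evidence: list[str]
    supported: bool


def decompose(text: str) -> list[str]:
    # Stage 1 (hypothetical): split text into sentence-level claims.
    return [s.strip() for s in text.split(".") if s.strip()]


def is_checkworthy(claim: str) -> bool:
    # Stage 2 (hypothetical heuristic): drop short or hedged statements.
    hedged = claim.lower().startswith(("i think", "maybe"))
    return len(claim.split()) >= 4 and not hedged


def generate_queries(claim: str) -> list[str]:
    # Stage 3 (hypothetical): turn a claim into search queries.
    return [claim]


def crawl_evidence(queries: list[str], corpus: list[str]) -> list[str]:
    # Stage 4 (stand-in for web search): substring match against local docs.
    evidence = []
    for query in queries:
        for doc in corpus:
            if query.lower() in doc.lower() and doc not in evidence:
                evidence.append(doc)
    return evidence


def verify_claim(claim: str, evidence: list[str]) -> Verdict:
    # Stage 5 (toy rule): a claim counts as supported if any evidence was found.
    return Verdict(claim=claim, evidence=evidence, supported=bool(evidence))


def fact_check(text: str, corpus: list[str]) -> list[Verdict]:
    # Run all five stages in order for every check-worthy claim.
    results = []
    for claim in decompose(text):
        if not is_checkworthy(claim):
            continue
        queries = generate_queries(claim)
        results.append(verify_claim(claim, crawl_evidence(queries, corpus)))
    return results


if __name__ == "__main__":
    corpus = ["In Norse mythology, Loki is the god of mischief and trickery."]
    text = "Loki is the god of mischief. I think so. The moon is made of cheese."
    for verdict in fact_check(text, corpus):
        print(verdict.claim, "->", verdict.supported)
```

A real tool would presumably swap the string heuristics for model calls and the local corpus for live search results, but the stage boundaries would stay the same.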
That is the news for now, but give this episode's companion newsletter a scan for more stories, including
Cory Doctorow on ditching Google Search for Kagi, the story of a Wi-Fi network that only works when it's raining, and more.
If you don't yet receive the Changelog newsletter, sign up today at changelog.com/news. Yeah, yeah, yeah, yeah, yeah, do it.
We have some great episodes coming
up this week. We interview Scott Chacon
on Wednesday.
And Breakmaster Cylinder returns to help us deconstruct the new Dance Party album on Friday.
It's cacophonous and it's great.
Have a great week.
Leave us a five-star review if you dig it.
And I'll talk to you again real soon.