All of my dedicated cyberstalkers (I’m sure you exist… I see you in my walls…) have by now noticed that the project I ranted incoherently about in the last post, called CircuitDojo, is live! It’s a pretty cool piece of software if I do say so myself.
Most of my work historically has been licensed either GPLv2 or GPLv3 (or, in the case of this blog, the “fuck it it’s not like the scrapers are reading my license” license). Both are widely regarded in the open-source community as being sensible choices – despite being absurdly long and complicated, and nearly impossible for developer-not-lawyer types like me to decipher on our own. This is not inherently a bad thing! The bad guys have small countries’ worth of lawyers, so we have to have precise, specific, ironclad licenses to fight them.
Frankly, I no longer care.
The advent of AI training has shed some light on a problem I suspect has existed for quite a while: companies don’t have to respect GPLvX when they’re building closed-source stuff, because they can’t get caught. These massive models absolutely do have GPL-licensed code in their datasets – but their datasets aren’t open-source. Why is this possible? Maybe because the companies never cared in the first place! The likes of Big Red, Google (not even deserving of a nickname), Faceb00k, Micro$hit, and Scamazon all have vested interests in AI projects continuing to build. All of them want a piece of the pie. OpenAI, Anthropic, and all the enterprise labs like DeepMind and the Copilot team are all violating GPL constantly: they have GPLvX code in their datasets, but they have not open-sourced those datasets.
Maybe this is a more complicated issue than I think. I’m certainly not a legal expert. But it seems to me that that’s… basically the point of GPLvX: to give us not-legal-experts a well-known, common-knowledge way to keep companies from using our open-source code in closed-source applications. It is not working.
If somebody calls them out successfully and brings this to court, guess what will happen? Jensen Huang will just make another large “investment” in X company’s “AI capabilities”, which will in fact fund a very quick and brutal legal fight. If somehow the Free Software Foundation can round up enough legal oomph to defeat the full might of the enshittoverse, they’ll just… negotiate to lose only a few hundred million dollars, get some more money from Nvidia, and continue. There’s no way in hell the big companies will allow us to spear the white whale and force a lab to open-source its dataset, because this would fucking destroy them: all the little critters of the open-source-AI world would be quick to bite and start training their models on high quality filtered datasets. How exactly would OpenAI compete if the RWKV guys trained an open-source model on their GPT5 dataset? Their hardware is technically inferior to Groq’s custom silicon, and their architectures are technically inferior to RWKV/Mamba2/etc.
So GPLv* are impotent against the real bad guys. Who do they actually stop? Certainly not all the shitty shops that don’t matter enough to deserve a lawsuit. (I’d love to call out National Instruments here, but I don’t actually have any evidence or reason to believe they do this – I’ll get you fucking eventually). The only guys this stops are the ones who care: the companies that have a product they can’t afford to open-source for some reason or another, and aren’t willing to violate the law to build it even if they could avoid getting caught. These are good companies. They may not be making open-source software, but they respect open-source enough not to steal from us.
All my GPLv* licensed code was almost certainly stolen to train LLMs. Some of it may have also been stolen for non-AI reasons, although I’m a little more doubtful of that. No licensing can stop this from happening. Micro$hit does not care what license I put on my code on GitHub, and scrapers do not care about the licenses of code on my sorta-private Gitea server. There are plenty of technical solutions: using Anubis, setting up Nepenthes, etc., but these are not legal solutions (and in fact I’m a little worried about Big Shit suing me if I set up a tarpit and corrupt one of their datasets). All of these words I’m writing are probably going to be harvested to train LLMs eventually – through archival sites even if the icky scrapers can’t access this page directly.
So I’m switching to BSD licensing. The main reason is that I’m a bit fatalistic; as described at length above, I do not believe it’s possible to use a legal solution to keep Big Shit from my OSS work. If I can’t keep bad shops from using my code, I’m sure as hell not going to penalize the good shops and potentially give the bad ones a competitive advantage: if companies that steal my work can perform better than companies that are not willing to steal my work, then by licensing my work restrictively, I’ve just disincentivized good companies.
Ultimately, I picked BSD (my minimalism won and I picked 2-clause over 3-clause, which is probably gonna bite me in the ass later but rahhhh 33% smaller) rather than MIT for somewhat sentimental reasons: BSD Unix was one of the first large open-source projects. It’s sort of an homage to the original CSRG nerds who gave a big “fuck you” to AT&T and open-sourced a superior product.
I am fundamentally opposed to the existence of closed-source software. Every piece of closed-source I have to deal with is a nightmare. Corporations do not have the same design ethics as open-source projects, and it fucking shows. But I don’t think my using GPLvX is likely to make any difference, and if Big Shit are going to use my code anyways, I’d rather that include the good companies that would have respected my GPL license.
We’ll see how it goes.
If you’re interested in reading more about the CircuitDojo project, check it out on GitHub! It’s young, but promising. My ECE2020 professor at Georgia Tech seems to like the project, so with any luck, it’ll get approved as a MyDAQ alternative! Muwahahahahaha, National Instruments – your power is great, but not greater than mine!
P.S. if you work at national instruments and are still for some inexplicable reason reading this, please for the love of god fix your mydaq gui. you can use my code if you need help! it’s all bsd licensed
P.P.S. deadly boring math is doing more traffic than when I was actively writing it??? if you have any idea what’s going on please email me???? plupy44@gmail.com