So you’ve developed some statistical software for your research (or your job or for fun). You’d like to publish your software so others can use it, maybe as supplementary material for a paper or maybe just for the sake of open science.
You can send the source code to the journal, post a Zip file on your website, or make your GitHub repository public. That’s easy enough.
But what are people allowed to do with that code? What conditions can you set on its use? What does “open source” mean, anyway?
Before we talk about open-source licenses and legal terms, it’s helpful to know what the basic legal regime is.
In the United States, and the 176 countries that have ratified the Berne Convention, your literary, scientific, and artistic works are protected by copyright law automatically. Any text you write, photographs you take, or, yes, code you write – all of it is automatically protected by copyright law.
Copyright law is intended to support the arts and sciences by making it possible for artists and authors to control the distribution of their work, so they can charge money for it. (This is, at least, the American view; the European view differs.) It stipulates that nobody may translate, publicly perform, broadcast, reproduce, or adapt your work without your permission. You can sell or assign your copyright on a specific work (like a book) to others via contract, meaning they have this control over the work.
(Europe recognizes “moral rights” as well, which are retained by the author regardless of any copyright transfer or contract.)
In the United States, this legal protection extends for your entire life plus 70 years. You do not need to file any paperwork or registration to obtain it. You hold copyright in all creative work you produce automatically. Yes, life plus 70 years is a very long time for your Facebook photos to be legally protected. The law provides ways to punish people who violate copyright, e.g. by imposing fines and injunctions.
Note that copyright is for artistic and creative expression: things like written text, performances, artwork, music and so on. It’s not for ideas; you can copyright your scientific paper so nobody can copy its words without your permission, but you can’t copyright the ideas in the paper to prevent anyone from using your theorems to prove their own conjectures.
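There are some exceptions to copyright law. Not everything qualifies for copyright. For example, data itself can’t be copyrighted in the US unless it involves creativity; mere statements of fact do not qualify.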
For work where copyright does apply, copyright law means that if you release the work to the public but provide no guidance on how they use it – for example, a license granting them permission to copy or distribute it – then, by default, all restrictions apply.
There are exceptions allowing you to make use of copyrighted works even if the creator has not given you permission. In the United States, these exceptions are called fair use, and the law states that “the fair use of a copyrighted work… for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.” Courts weigh four factors to decide what qualifies as fair use: the purpose of the use (including whether it is commercial), the nature of the copyrighted work, the amount of the work being used, and whether the use harms the market for or value of the work.
Despite what certain YouTube uploaders think, you cannot simply upload copyrighted work with a note saying “No copyright infringement is intended”.
After the copyright period is up (70 years after you die), your work enters the public domain. That means there is no legal restriction on its use: anyone can copy it, make movies out of it, sell it, translate it into other languages, or whatever they want. This is intended to benefit society by enriching our culture.
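The “life plus 70 years” arithmetic can be sketched in a few lines. This is only an illustration under stated assumptions: a single individual author, the US life-plus-70 term described above, and the rule that copyright terms run through the end of the calendar year in which they would otherwise expire. It ignores works made for hire, joint works, and older works governed by earlier rules. The function name `public_domain_year` is hypothetical.

```python
def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """Return the first calendar year in which the work is in the public domain.

    Assumes protection lasts through December 31 of death_year + term_years,
    so the work becomes free to use on January 1 of the following year.
    """
    return death_year + term_years + 1


# An author who died in 1950 is protected through the end of 2020.
print(public_domain_year(1950))  # 2021
```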
Trademarks and patents aren’t strictly relevant here, but it’s worth distinguishing them from copyright.
A trademark is a different thing than a copyright. A trademark is a name, brand, logo, or design that identifies a product or brand and distinguishes it from others. The purpose of a trademark is to identify the origin of a product or service – the business, company, or person who controls it – and trademarks exist primarily to protect consumers and inform free markets.
A trademark ensures that if you like a certain company’s products, no other company can sell products under their name without their permission, thus appropriating your trust in the brand and convincing you to buy something you otherwise wouldn’t. Trademarks can be licensed, meaning one company can grant another company permission to use their trademarks (e.g. a movie studio granting permission for Lego to make Lego sets for their movies).
No, you don’t need to use the little ™ or ® symbols any time you refer to a trademarked name. Companies use the symbols basically to give notice that their brands are trademarks; you don’t need to use the symbols when you refer to their names. Don’t crust up your writing with little barnacles of legalese every time you mention PepsiCo®™.
A patent is also different from a copyright. Patents are used to register inventions, and grant you the exclusive right to use and sell that invention for twenty years (roughly). In exchange for that exclusive right, you must file a detailed document (the patent) describing your invention, so the public may benefit from the invention. If you hold a patent you may grant licenses to others to use it (and charge them money for the privilege); after the patent expires, anyone may use the patented idea however they want.
Inventions in the form of software can be patented, but this is an area of some controversy. “Patent troll” companies tried to make money by patenting ordinary things with an extra “but done on a computer”, then suing other companies doing these things on a computer; the Supreme Court later ruled that adding “but done on a computer” to an abstract idea does not make it a valid patentable invention. Software patents are still controversial.
You likely will never need to patent or trademark your work, but you automatically hold copyright in it.
Questions sometimes arise about who, actually, holds the copyright in your work. If you’re a student at CMU and you write an R package, do you own the copyright, or does CMU own it? If you work for a big tech company and write some code, are you allowed to release it or do you need their permission?
CMU has an Intellectual Property Policy for its employees. Essentially, unless CMU paid specifically for your code, specifically hired you to write it, or provided expensive facilities to help you produce it, it’s yours, subject to a few conditions (e.g. “if you make lots of money from it, we get a cut”).
Certain grants may impose requirements, but usually the conditions are things like “the granting agency shall be allowed to use your code”, not “we own your copyright.”
If you’re hired by a company in the United States to a job that involves writing a lot of code or other copyrightable stuff, your work may qualify as “work made for hire”, meaning you were hired specifically to create this work as part of your job. In that case, the company owns the copyright, not you, and you have no special rights to the work.
Alternatively, you may sign a contract with a company specifying who holds the rights. Companies may have you do this even if your work qualifies as work made for hire, just so the rules are very clear. You can also sign contracts transferring your copyright to others – you might do this when publishing in a journal, for example. You’d have to read the contract to know what rights you’re giving up under what terms.
Code qualifies as copyrightable work (usually), so since your code is automatically copyrighted, to release it to the public you need to specify what, exactly, you’d like people to be allowed to do with it. This usually means writing a license: a legally binding set of terms and conditions. If people abide by the conditions, they have your permission to do things that copyright law would otherwise forbid.
You can write your own license terms, but you shouldn’t. Large projects with expensive lawyers have already written plenty of licenses with different types of terms, and you should pick the most suitable one.
A great deal of scientific software – and general purpose software – is open source. “Open source” is sometimes interpreted to mean “the source code is visible”, but it means more than that. The Open Source Definition gives clarity: open source software is available in source code form, can be modified freely by its users, can be freely redistributed in original or modified form, and can be used freely in any field and by any person. We make software open source not just so people can see its source code but so they have the freedom to use the software in many ways, with only minimal restrictions.
When it comes to licensing, you have a continuum of choices.
Copyleft licenses grant broad permissions, like permissive licenses, but also require that any copies or modified versions be distributed under the same terms. Anyone may copy or modify your code, but if they release their version to others, they must grant others the same permission. The GNU General Public License is the most prominent example; Linux, R, and many other prominent projects are licensed under its terms.
Copyleft licenses are intended to preserve the freedom of users to see the source code of the software they use, modify it, and redistribute it, while ensuring the source always stays open and free.
There are a lot of premade licenses. Don’t waste time trying to pick between ISC, Apache, BSD, MIT, LGPL, GPL, CDDL, and a zillion others; use the Choose a License website and move on. Unless your project has particularly odd or special requirements, you’ll be fine with its recommendations.
A notable license is the CRAPL, specifically designed for academic software, including such important terms as “You agree to hold the Author free from shame, embarrassment or ridicule for any hacks, kludges or leaps of faith found within the Program.”
To license your software:
Add a note to your README file indicating the license. You can write something like “Copyright 2019, Your Name Here. Released under the terms of the [license] license.”
Include a LICENSE.txt file containing the full text of the software license. Put this in the root directory, along with the README.
Specific licenses may have other recommendations; the GPL recommends placing a licensing notice in a comment at the top of every file, for example.
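The steps above can be checked mechanically before you publish a repository. The following is a minimal sketch, not a standard tool: the function name `check_license_files` is hypothetical, and it assumes the license text lives in `LICENSE.txt` in the project root and that a README counts as having a license note if it mentions both “copyright” and “license”.

```python
from pathlib import Path


def check_license_files(project_dir):
    """Return a list of problems; an empty list means both checks passed."""
    root = Path(project_dir)
    problems = []

    # The full license text should sit in the root directory.
    license_file = root / "LICENSE.txt"
    if not license_file.is_file():
        problems.append("missing LICENSE.txt in the project root")
    elif not license_file.read_text(encoding="utf-8").strip():
        problems.append("LICENSE.txt is empty")

    # Accept README, README.md, README.txt, and so on.
    readmes = sorted(p for p in root.glob("README*") if p.is_file())
    if not readmes:
        problems.append("missing README in the project root")
    else:
        text = readmes[0].read_text(encoding="utf-8").lower()
        # Crude heuristic (an assumption of this sketch): look for both words.
        if "copyright" not in text or "license" not in text:
            problems.append(f"{readmes[0].name} has no copyright and license note")

    return problems
```

Calling `check_license_files(".")` from the root of a project returns a list of messages you could print or turn into a failing test. It does not check whether the license text itself is appropriate, only that the two files are present.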
Software licenses are written specifically for software, containing lots of legal language about source code and compiled code and executables and so on. But you may also want to release other things: documentation, papers, images, your thesis…
Copyright applies to all these works as well, and software licenses aren’t well-suited for them.
(Remember, data itself can’t be copyrighted in the US, unless it involves creativity. Mere statements of fact do not qualify.)
You could say “All rights reserved”, and readers would only have the right to read the copies you give them. There are again a zillion more permissive licenses you could potentially choose from, or you could write your own. Instead, if you want to grant others permission to reuse and redistribute your work, choose a Creative Commons license. They have a simple license chooser asking two questions: whether to allow adaptations of your work to be shared (with an option to require that others share alike), and whether to allow commercial uses of your work.
Answer those two questions and they direct you to a pre-written license, with instructions on how to use it.
Many people discourage answering “no” to the commercial use question; if you don’t like the idea of other people making money off your work, instead select the option to require others to “share alike”. This way, if a company takes your work and, say, adapts it into a textbook they sell for $150, they’re required to make that adapted version freely available and redistributable.
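How two answers map onto a specific Creative Commons license can be sketched as a small function. This is an illustration, not the chooser’s actual implementation: it assumes the two questions are whether adaptations may be shared (“yes”, “no”, or “share-alike”) and whether commercial use is allowed, and the function name `cc_license` is hypothetical. The license names it returns are the six standard Creative Commons licenses.

```python
def cc_license(adaptations: str, commercial: bool) -> str:
    """Map two chooser-style answers to a Creative Commons license name.

    adaptations: "yes", "no", or "share-alike"
    commercial:  True if commercial use is allowed
    """
    parts = ["BY"]  # every standard CC license requires attribution
    if not commercial:
        parts.append("NC")  # NonCommercial
    if adaptations == "share-alike":
        parts.append("SA")  # adaptations must carry the same license
    elif adaptations == "no":
        parts.append("ND")  # NoDerivatives
    elif adaptations != "yes":
        raise ValueError("adaptations must be 'yes', 'no', or 'share-alike'")
    return "CC " + "-".join(parts)


# The choice recommended above: allow commercial use, require share-alike.
print(cc_license("share-alike", commercial=True))  # CC BY-SA
```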
I advocate for releasing things under open, non-restrictive terms, including your papers; I recommend Boyle’s The Public Domain to understand why you’d want to do this.
Traditional academic journals make their money by selling subscriptions to access papers, so they do not want their papers available under open, non-restrictive terms. Typically, they require authors to sign a copyright transfer agreement granting the publisher the exclusive right to publish the work; authors only retain minimal rights to, say, use the paper in their classes and give copies to friends.
The rise of preprints and open-access journals has challenged this. Most journals now realize that authors will post their preprints on the arXiv (or the equivalent for their field), and copyright transfer agreements allow this. Some also allow authors to post preprints on their own websites or on university databases, though they may require a delay of several months after publication. Check SHERPA/RoMEO to find policies for any journal.
When you submit to arXiv, the default license for your papers is a “non-exclusive license to distribute”, which just means you grant arXiv permission to distribute the article, but retain your copyright and can publish the article elsewhere (like a journal).
Open-access journals, on the other hand, do not charge subscriptions, and release their articles under an open license – often the Creative Commons Attribution license, allowing articles to be redistributed freely as long as they are credited to the original authors. You can also mark your arXiv preprint with the same license.
(Why does a free license benefit articles? Maybe someone wants to scrape your work and do classification with it, or make a website that automatically suggests new papers to readers based on their prior interests. Maybe you want to extract bibliographic data and use it for something. Maybe you want to make a joke website that presents real paragraphs from papers next to ones generated by Markov chains and challenge readers to tell the difference. An open license permits all these uses, and many others, whereas a traditional journal license does not.)