March 1, 1997 Volume 27 Issue 1

Legal Protection for Database Contents

Saturday, March 1, 1997

Author: Pamela Samuelson

Scientists and engineers need to get organized to provide input to policymakers on proposals that may limit the use of data for research purposes.

The scientific and engineering communities have a very substantial stake in the outcome of the current debate over whether database developers should be given intellectual property rights in the contents of databases. Copyright law has always protected creativity in the selection and arrangement of data but has regarded the data themselves as unprotectable elements of a data compilation.

Under recently proposed laws, however, database developers would actually be granted rights in the data themselves. A draft treaty to require worldwide adoption of a new form of legal protection for the contents of databases was considered at the December 1996 diplomatic conference hosted by the World Intellectual Property Organization (WIPO) in Geneva. Although the conference decided to set aside the draft treaty for now, there is every reason to expect that similar proposals will make their way onto national and international legislative agendas in the near future. Scientists and engineers need to get organized to provide input to policymakers about these proposals.

Some might argue that the U.S. R&D community could benefit from database protection, insofar as it might provide a new source of revenue in an era of declining government subsidies for scientific and technological research. The ultimate effect of commercializing data of scientific importance, however, may be to undermine the scientific progress upon which more practical technological advances often depend, as fewer and fewer scientists become able to pay the prices that commercial vendors set. This is the conclusion reached in a new report¹ of the National Research Council (NRC), which documents a number of examples of privatization of data that hampered scientific endeavors.

The presidents of the National Academy of Sciences, National Academy of Engineering, and Institute of Medicine were sufficiently concerned about potential harmful effects arising from the proposed database treaty that they wrote to Secretary of Commerce Mickey Kantor to ask the administration to withdraw its support for it. The presidents found it "especially disconcerting" that the U.S. had supported the treaty "without any debate or analysis of the law's potentially harmful implications for our nation's scientific and technical development." Moreover, they wrote, although the consequences of such a law "appear very grave to those studying these issues, very few individuals at the science agencies or in the academic community appear even to be aware that such changes are about to take place, nor has there been any effort made to solicit their views." Similar outpourings of concern about the database treaty came from a number of other sources, including the President's Science Advisor, John Gibbons. Even prominent information providers, such as Dow Jones & Co. and Bloomberg, L.P., expressed opposition to such a treaty.

However, the governing body of WIPO is already set to convene an extraordinary session in March 1997 to set up a new Committee of Experts to consider proposals for a new treaty on database protection. This committee will almost certainly give some weight to the draft treaty proposed by the previous Committee of Experts. Moreover, proponents of legislation to protect the contents of databases in the U.S. have made clear their intent to renew efforts to achieve their goal in the 105th Congress.

Given these developments, it is well to understand, first, what the rationale for such legislation might be, and second, what problems might exist with respect to current proposals to enact such a law. This article will concentrate on concerns about the overly broad definition of "database" in last year's U.S. legislation and the WIPO treaty as well as on the lack of meaningful guidance about the scope of rights users are supposed to have with respect to "insubstantial parts" of database contents. In addition, the article will argue that since there are no fair-use or other scientific- or educational-use limitations on the rights of database owners, the breadth of the rights that such a law may grant seems far beyond what is needed to avert market failure. Such a law may also unduly restrict businesses' efforts to add value to data when reusing it. The article will conclude that even if a need exists for some additional legal protection for database contents, current proposals need significant refinement before they are suitable for adoption as U.S. or international norms.

Protection Through Copyright
The major premise underlying current proposals for a sui generis (of its own kind) form of legal protection for the contents of databases is that copyright law alone does not adequately protect the interests of database developers. Copyright law does not, for example, protect all databases, only those that evince creativity in the selection and arrangement of the data. For instance, no matter how much money or effort someone might have spent to collect data or maintain a database, an unoriginal compilation (e.g., listings in telephone directories or compilations that might have been generated automatically or in some other methodical way) will not qualify for copyright protection under the laws of most nations, including the United States. In its decision in Feist Publications v. Rural Telephone Service, 499 U.S. 340 (1991), the U.S. Supreme Court rejected the so-called sweat-of-the-brow theory of originality that had previously been used to justify copyright for uncreative data compilations.

In addition, although copyright law protects a database author from those who copy all or a substantial part of his or her creative selection and arrangement of data, it does not protect the data as such. No copyright infringement will generally occur when someone reselects and rearranges data taken from another's database. Indeed, an appropriator of someone else's data may even claim copyright in his or her own original selection and arrangement of those data.

Database firms often use contract as well as copyright law to protect their interests. However, contract law does not generally regulate the conduct of those who have not agreed to a contract's terms. Thus, if a user violates a contract with a database firm by extracting and redistributing a substantial part of the contents of a database, the firm could not rely on contract law to get compensation from those to whom the user redistributed the data, or to reclaim the data from them, since they were not parties to the original contract.

Proponents of the new legislation argue that without additional legal protection, databases will be vulnerable to market-destructive appropriations that, left unchecked, would substantially undermine incentives to invest in the creation and maintenance of databases. Data compilations in electronic form are said to be especially vulnerable because digital technology makes it so trivially easy for users to select and arrange the information they contain. In addition, print compilations can now be scanned so easily, and their contents manipulated once they are in electronic form, that it seems to make no sense to discriminate between electronic and nonelectronic databases. Both are vulnerable to market-destructive appropriations that existing law does not remedy adequately.

Industry Viability
Proponents further reason that, since the database industry has become one of the most commercially important information industries and since there is a substantial public interest in the availability of high-quality data compilations, the law should provide protection adequate to ensure the viability of the industry, not only in countries that already have strong database industries but also on an international scale, to promote international trade in data from databases.

The idea for sui generis legislation to protect the contents of databases originated from the European Union (EU). As part of a 10-year effort to update their intellectual property laws to meet the challenges posed by digital technologies, governing bodies of the EU issued a directive in March 1996 requiring member states to pass by January 1998 legislation to protect database developers against unauthorized extractions and reuses of substantial parts of database contents. EU representatives also proposed an international treaty to establish such a legal regime worldwide. Even before the EU made this proposal, it had been clear that the Europeans intended to export their database protection concept to other nations. All drafts of the database directive contained a reciprocity clause under which databases of foreign nationals would not be protected by the new EU law unless the nations in which the databases were created adopted an equivalent law. This was of especial concern to U.S. database companies, because they held a leading position in the European database market and because the United States had no comparable law.

In response to the EU proposal, representatives of the U.S. government submitted their own proposal for a database protection treaty in spring 1996. Although the U.S. proposal was similar in many ways to the EU plan, it was different in some important respects. In the eyes of some U.S. database companies, the U.S. proposal would have "fixed" some deficiencies in the European proposal. It would, for example, have lengthened the duration of protection from 15 to 25 years; given database owners the right to control uses as well as reuses of their data; required nations to give the same degree of protection to the databases of foreign nationals as they granted to their own citizens; and permitted users to contract away their rights to take insubstantial parts of database contents, an action the EU directive had forbidden. Shortly after U.S. representatives submitted their proposal to the WIPO Committee of Experts, Rep. Carlos J. Moorhead (R-Calif.) introduced very similar legislation in Congress, the Database Investment and Intellectual Property Antipiracy Act of 1996 (H.R. 3531). Although no hearings were ever held on this bill, new legislation of this sort will almost certainly be introduced in the 105th U.S. Congress.

The draft treaty recommended by the WIPO Committee of Experts in August of 1996 adopted all of the rules on which the U.S. and European proposals agreed. For example, developers of databases would automatically qualify for legal protection against unauthorized extractions and reuses of all or substantial parts of their databases, so long as they had made a substantial investment in the collection, assembly, verification, organization, or presentation of the database contents. The substantiality of the investment would be judged not just in quantitative but also in qualitative terms. Rights would last a substantial period of time. (The WIPO draft treaty left to the diplomatic conference the choice between 15 and 25 years of protection.) If a database developer made additional substantial investments, for example in updating the database or otherwise maintaining it, this would give rise to additional terms of protection. This essentially meant that protection of the contents of a database would be perpetual unless the database developer abandoned upkeep of the database.

Lawful users of databases would generally be free to take insubstantial parts of databases. Others would also be free to develop databases having the same content, as long as they did so by means of their own independent labor and not by extracting and reusing substantial parts of other people's databases. Claiming protection under the database law would not prejudice a developer's right to claim protection under other laws, such as copyright or trade secrecy law.

In all respects in which the U.S. and EU proposals differed (other than on the duration of protection), the WIPO draft treaty more closely resembled the U.S. than the European proposal.

We all know what a database is, right? Or do we? The U.S. database proposal to WIPO, like H.R. 3531, contained a broad--perhaps too broad--definition of "database," calling it "a collection, assembly or compilation of works, data, information, or other materials arranged in a systematic or methodical way." Like the European directive, the U.S. proposal excludes computer programs from the scope of the proposed law, but the U.S. proposal makes explicit what may be implicit in the European directive: that insofar as computer programs incorporate any database components, the database components can be protected by the database law.

It is easy to imagine the developer of a program wanting to protect a data set embodied in a program, arguing that these data constitute a database component of a program that can be protected by the database law. (Microsoft, for example, might seek protection for interface specifications.) Currently, the developer cannot expect to get copyright protection for data or interfaces in a program, because data are unprotectable as facts and interfaces are an external factor constraining design choices of subsequent programmers, which means that they lack sufficient expressive content to be protectable by copyright.

What Would Be Protected?
But if the developer has made a substantial investment in developing data in a program, such as interface specifications, might it not claim rights in them under the database law? Are not such specifications "a collection . . . of . . . information . . . arranged in a systematic or methodical way"? Would not an extraction of data from a program be an appropriation of all or a substantial part of that database component of the program? There is nothing in the U.S. proposal that would limit the scope of database rights for components that have become standards or otherwise constrain subsequent design choices.

What would such a law mean for copyrighted works more generally? Are the facts and theories contained in a historical work a "database" under the U.S. legislation? What about a book about a scientific theory and data published in support of it? If someone extracts and reuses the component parts of these works in a subsequent work, will he or she be liable to the author of the first work under the database law? Traditional principles of copyright law have generally regarded the appropriation of facts and theories from a preexisting work as noninfringing, because facts as such are not "expressive" and reuse of facts and theories has long been thought to promote the principal purpose of copyright, which is to advance knowledge. The purpose of the database law, however, is to protect investments, not to promote knowledge. Which purpose should prevail when the two purposes are in conflict? The EU directive and other database proposals do not even recognize that such a conflict might exist.

Before their adoption as international norms, the database proposals need more clarification about what will and won't be considered a database and how to mediate tensions between the law's desire to protect investments in database development and to promote other important social goals such as advancing knowledge.

The European directive explicitly grants lawful users of publicly accessible databases rights to extract and reuse "insubstantial parts" of the databases' contents. Other proposals seem to contemplate such a user right as well, but they generally do not give a precise definition to this term or otherwise provide guidance on what criteria should be used to determine insubstantiality. The term naturally suggests that the quantity taken will matter, but the directive and its U.S. counterparts also contemplate that substantiality will be judged in qualitative terms. In addition, all of the database proposals indicate that no taking would be considered insubstantial if it conflicts with a normal exploitation of the database or unreasonably prejudices the legitimate interests of the maker of the database.

"Substantial Content"
H.R. 3531 would have outlawed the extraction and use of qualitatively or quantitatively substantial portions of database contents

  • in a product or service that directly or indirectly competed in any market with the database from which it was extracted;
  • in a product or service that directly or indirectly competed in any market in which the database owner has a demonstrable interest or expectation in licensing or otherwise using or reusing the database;
  • in a product or service for customers who might otherwise reasonably be expected to be customers for the database; or
  • by or for multiple persons within an organization or entity in lieu of the authorized additional use or reuse (by license, purchase, or otherwise) of copies of the database by or for such persons.

This list is said to be illustrative rather than exhaustive, yet it is so broad that it would seem to cover almost any appropriation of data from a database. Thus, if a database developer decided that all of the contents of its databases were qualitatively substantial and it was willing to charge for extraction and reuse of any part of these databases, it would appear that lawful users would have no right to extract or use any part of the database without further payment for it.

The breadth of the database developer's potential rights under current database proposals is troublesome partly because there appears to be little attempt to craft limitations on database owner rights for such purposes as scientific research or education. In the EU, this issue was hotly debated while the database directive was under consideration. Those who proposed the directive thought initially that the directive protected user rights adequately, since users would have rights to take and reuse insubstantial parts of a database for any purpose and since these rights could not be contracted away. However, the European Parliament pushed strongly for broader rights for those who might extract data for various noncommercial purposes, such as research and education, and the final directive contains a provision that enables member states to adopt one or more of three exceptions to the sui generis right, if they wish. These would allow lawful users to make substantial extractions from publicly accessible databases

  • from nonelectronic databases, for private purposes;
  • for the purpose of illustration for teaching or scientific research, as long as the source is indicated, but only to the extent that is justified by the noncommercial purpose to be achieved; and
  • for purposes of public security or an administrative or judicial procedure.

These exceptions are so narrow that it would appear that one could not extract or use data from a protected database for the purpose of scientific investigation, but only to illustrate conclusions already achieved. Nevertheless, they are in stark contrast to last year's U.S. legislation and the WIPO database treaty, which would have provided no fair use, private use, educational, scientific, research or any other public policy exceptions to, or limitations on, the scope of database owner rights to control uses or appropriations of database contents. While the draft treaty seemed to contemplate that nations could limit the rights of database owners, it restricted such limitations to those that would not unreasonably conflict with the normal exploitation of the database or unreasonably prejudice the legitimate interests of the rightsholder. Given the criteria for determining which takings are "substantial," it is fair to interpret the draft treaty as effectively forbidding exceptions to allow the extraction and use of substantial parts of databases for scientific research purposes. Even the EU's narrow limitations for purposes of scientific illustration might not satisfy the draft WIPO database treaty provision.

The Enlightenment values that underlie existing copyright law support the notion that people should refrain from copying someone else's creative selection and arrangement of data, but they should generally be able to appropriate data and reuse them in different ways. However, because electronic information is so easy to extract and manipulate, there may be some need for a law to protect against appropriations of data that most people would agree are unfair, even if such takings do not infringe a copyright.

Take the recent case, ProCD v. Zeidenberg, 86 F.3d 1447 (7th Cir. 1996). Zeidenberg had acquired a copy of a CD-ROM published by ProCD containing many thousands of telephone directory listings. Zeidenberg ignored the shrinkwrap license that came with the CD, which purported to restrict his use of the data to home use, and loaded all of the listings from the CD onto his web site, which he intended to commercialize. Zeidenberg claimed that there was no copyright protection for the data compilation and that the shrinkwrap license was unenforceable. The appellate court accepted Zeidenberg's argument about copyright but decided that the shrinkwrap license was enforceable.

Little Room for Second Comers
Leaving aside the question of whether a shrinkwrap license should be enforced in such a case, the ProCD case seems to involve the kind of data appropriation that probably should be regulated by the law. If Zeidenberg could upload all of this digital information to his web site, it would substantially undermine incentives for companies like ProCD to expend resources--said to be $10 million in this case--compiling useful data, keeping it accurate, and making it available in mass-marketed products. Here is someone who appropriated the whole of someone else's database, added nothing new, and intended to exploit the data commercially in a closely proximate market.

If the United States already had a new law that protected databases against unauthorized extractions and reuses of substantial parts of their contents, it would have remedied the wrong in the ProCD case without overstretching contract law to reach the same result. As applied to this kind of case, the basic concept of the EU directive seems to be sound.

But even if there is the germ of a good idea in the database directive, it and similar proposals may grant more extensive rights than are really needed or desirable. ProCD doesn't really need 15, let alone 25, years of protection for the listings in its directory. A far shorter term would undoubtedly provide it with adequate lead time in which to recoup its investment.

Current proposals for a database law also leave too little room for second comers to develop follow-on products and services. No matter how much the appropriating second comer has invested and no matter how different the new product or service is from the originator's database, current proposals seem to forbid unauthorized extractions of more than an insubstantial part of database contents. A better approach might be to go back to concepts embodied in the first draft of the database directive, which would have regulated "unfair" extractions and reuses of more than insubstantial parts of databases, not "unauthorized" extractions and reuses.

Among the factors that might be taken into account in determining unfairness are the nature of the data (e.g., whether it has scientific importance); the second comer's purpose in taking it (e.g., to study it or to exploit it commercially); what the second comer ultimately does with the data (e.g., makes a substantial investment in it or adds value to it); what kind of product the second comer brings to market; and how proximate or different the markets are in which the two products operate.

Unless a database protection law takes these kinds of factors into account, there may eventually be too little competition and follow-on innovation in the information marketplace. A law that concentrates only on how much was taken and whether the part that was taken has any commercial value also may inadvertently impede the scientific research upon which commercialization of practical applications ultimately depends. Extraction and reuse of data are core activities within scientific and engineering communities. It is disturbing that the database law has thus far been given its contours by government bureaucrats, intellectual property lawyers, and the few publishers and information industry groups who knew what was afoot. Such a law has not been "vetted" by any scientific groups, let alone other research organizations or educational institutions, even though a law based on current proposals will have a substantial impact on all of them.

Ordinarily, one might expect governments to represent the public interest when initiatives of this sort arise. One factor that has impeded such consideration is that many governments, especially in the EU, have begun looking to the data they generate and compile as new sources of revenue in an era of substantial government spending cutbacks. The draft WIPO treaty would have left this issue to individual countries to resolve.

Also of concern is the idea of an intellectual property law that protects expenditures rather than innovation. All other existing intellectual property laws require at least some creativity, innovation, or novelty before a person can claim rights to control exploitations of information products. A law that focuses on expenditures may induce wasteful expenditures of resources (or inflated estimates of how much it cost to compile data).

To prevent monopolistic practices harmful to competition and to public access to information, further consideration needs to be given to regulating the conduct of developers of databases whose contents cannot be replicated by independent effort. The first two drafts of the European directive would have required sole-source and government-generated database developers to license extractions and reuses of substantial parts of their databases on fair and nondiscriminatory terms. The final directive abandoned this approach, although the EU will be studying database practices for a period of years after the directive goes into effect to see if the licensing requirement provisions should be restored. Neither the U.S. legislation nor the WIPO draft treaty addressed these issues.

This article has identified a number of reasons to be concerned about current proposals to create a new form of legal protection for the contents of databases. There is as much reason to be concerned about the potential for overprotection of databases as there is to be concerned about underprotection. Because the scientific and engineering communities have an important stake in the outcome of this debate, they should become familiar with these proposals and let their concerns be known to policymakers. Failure to do so may result in the adoption of legal norms that, with respect to database rights, have adverse effects on science, engineering, and education. In time, society as a whole may suffer from not taking these concerns into account.

1 Bits of Power--Issues in Global Access to Scientific Data, a product of the NRC Committee on Issues in the Transborder Flow of Scientific Data (U.S. National Commission for CODATA, Commission on Physical Sciences, Mathematics, and Applications), was slated to be released publicly in mid-March.
About the Author: Pamela Samuelson is a professor of information management and of law at the University of California at Berkeley. A version of this paper appeared first in Vol. 39, No. 12 of Communications of the ACM. Copyright 1996 Association for Computing Machinery. Reprinted by permission.