from OnEarth, Fall 2009
E. O. Wilson has a dream. In 2003, in an essay in the scholarly journal Trends in Ecology & Evolution, the eminent Harvard biologist sketched out his vision for what he called “a single-portal electronic encyclopedia of life.” This encyclopedia — a Web site, essentially — would grant each of the documented 1.8 million species on earth its own page featuring a detailed summary of everything known about it: its scientific name, habitat, and geographic range and distribution; what it eats and is eaten by; and where it fits on the evolutionary tree of life. There would also be hyperlinks to genetic databases and other pertinent information. It would be freely accessible to everyone everywhere, scientist and layman alike. In a speech at the 2007 TED conference, an annual mixer of creative and scientific minds, Wilson likened the encyclopedia to “a biological moon shot” in its ambition and imperative.
He’d hit the right note for the right crowd at the right time. Work had already started on the encyclopedia, and financial support was beginning to flow. The John D. and Catherine T. MacArthur Foundation and the Alfred P. Sloan Foundation together provided $12.5 million in seed money, with the prospect of additional funds dependent on the program’s progress. Several major scientific organizations signed on with personnel, logistical support, and financing. (Wilson, though not directly involved in the encyclopedia, operates as its avuncular totem and most prominent booster.) All told, the project could consume as much as $100 million in its first decade. Meanwhile, the software developer Adobe Systems, known for its popular creative applications like Photoshop and Flash, volunteered to develop a cutting-edge user interface for the encyclopedia, something to shake the cobwebs off the old “tree of life” metaphor and reimagine it for the twenty-first century. Silicon Valley discovered nature, and it was good.
In February 2008 the Encyclopedia of Life saw first light at eol.org. With Web pages for 30,000 species, mostly fish and amphibians, it was but a shadow of its promised future self. Yet with its appealing layout and media visibility, the site proved instantly popular, crashing five hours — and nearly two million page hits — after launch. The encyclopedia has grown rapidly since, adding staff and expertise, forming new partnerships with libraries and biodiversity databases, expanding the pool of organisms it can represent online, and adding interactive components like the opportunity for users to post comments and tags or upload photographs via Flickr. It’s not yet the Google of biology, but it’s one step closer. “We showed proof of concept,” says David Patterson, a microbiologist at the Woods Hole Marine Biological Laboratory who helped develop the encyclopedia’s software engine. “The rest is just a chore.”
BIOLOGY HAS BEEN CREEPING online for some time now. The Wikipedia model is especially contagious, and numerous subject-specific Wikis, such as GeneWiki and WikiPathways, have sprung up to help scientists share information, interact, and collaborate. For veteran biodiversity scientists accustomed to the dour theme of species loss and the loneliness of an underfunded profession, the excitement and attention surrounding the encyclopedia are a particularly refreshing change. “I’ve never been involved in anything in my entire life that has generated so much enthusiasm,” says Jesse Ausubel, an environmental scientist at Rockefeller University and the founding chairman of the Encyclopedia of Life’s original steering committee. “The reaction has been overwhelmingly positive. The biggest problem has been managing expectations. People don’t want to wait. This should have existed yesterday; they want it tomorrow.”
Some, however, are skeptical. “In my 25 years of databasing and bioinformatics, I’ve seen so many projects like this come and go that it’s hard to get excited about the next one,” says Barbara Thiers, director of the New York Botanical Garden’s Herbarium, a prized collection of several million plant specimens from around the world. And with so many species in peril, is $100 million really best spent by giving every one of them … a Web page? Meanwhile, the 1.8 million known species are a small fraction of all those yet to be discovered, catalogued, and named. How would the encyclopedia help them?
Patterson sympathizes. As a leading expert on marine microbes — a vast, poorly known, and terribly uncharismatic category of organisms — he appreciates the plight of the world’s species and of the unheralded scientists who track them. Yes, he says, the encyclopedia will be cool, comprehensive, and visually dazzling. But the real promise is hidden under the hood. The encyclopedia’s true gift, he contends, is in its potential to pool and sift biological data from everywhere, in a manner that will change the quantity and quality of what scientists — and casual viewers — can learn about life on earth and the manner in which they do so. Never mind evolution; this is revolution.
PATTERSON WORKS OUT OF a two-story clapboard building in the postcard village of Woods Hole, Massachusetts, not far from the ferry landing to Martha’s Vineyard. His colleagues call him Paddy and often greet him with some variation on “Hey, dude,” even though Patterson is in his late fifties. In dress he is thoughtfully disheveled: rumpled button-down shirt, khakis, sandals over socks. In conversation he reveals an Irish accent that’s been loosely combed by his years in English and Australian academia. When I enter, he swivels away from a pair of flat-screen monitors on his desk. “Hey, man,” he says, and motions to an empty chair.
He flips open his laptop. He wants to share a PowerPoint presentation that he’s been giving lately at professional meetings to introduce and explain the Encyclopedia of Life to working biologists. While he struggles to open the right file — “A pox on the world,” he says with a laugh — my attention wanders to what looks like a small Japanese stone garden on his desk, the sort of thing a person might soothingly rake during international phone calls about database incompatibilities. In fact it’s the litter box for J. B., a green iguana that roams Patterson’s office and occasionally peeks out from behind a monitor. He arrived some months earlier as the subject of a photo shoot for the encyclopedia’s splash page and stuck around. “He’s our conscience,” Patterson says. “With the encyclopedia being so computer based, it’s important to maintain the reminder that what we’re really trying to do is about biology.”
Patterson runs the encyclopedia’s biodiversity informatics team. Informatics is the emerging, and increasingly central, art and science of computer-data management. Nowadays the major sciences are awash in raw data. Earth-observing satellites beam back innumerable details about the planet’s workings, from the shifting area of Arctic sea ice to the respiration rates of Pacific plankton. NASA telescopes fill data banks with images — infrared to ultraviolet — of near and deep space. Lately the hippest theories in science involve not how to interpret all this data but how best to mine, manage, massage, and visualize it.
Biodiversity informatics is a newcomer to the party; in many ways the Encyclopedia of Life marks its official coming out. Of the tens of thousands of biologists around the world, only some 6,000 are taxonomists, trained to identify new species and confirm the identity of existing ones. They are experts in spiders, experts in sea worms, experts in fungi. Theirs is tedious work: peering through microscopes day after day, counting tiny hairs on tiny stems or tiny legs to distinguish one organism from another. The rise of ecology brought a wider appreciation of biological systems and the ties that bind them. Yet a knowledge of the particulars — how to identify individual species — is as critical as ever. “You can’t do much about preserving biodiversity if you don’t know what you’re looking at,” Thiers says.
Patterson calls taxonomists “custodians of knowledge.” The sum of their tremendous wisdom, he notes, is mostly squirreled away in eccentric custodial closets: published in hard-to-find journals, crammed into personal libraries, pinned in specimen drawers in back rooms of museums, locked away in graying heads. These databases are unique, essential, and almost entirely off the electronic grid. “There’s mine right there,” he says, pointing to two wide filing cabinets against one wall of his office. The rest of the wall is occupied by shelves thick with books suitable for a microbiology library (The Mycetozoans, The Trichomycetes, The Biology of Amoeba, The Cellular Slime Molds) or, perhaps, for an old episode of Star Trek. The most accessible book has a familiar yellow binding: Iguanas for Dummies.
The project and promise of the Encyclopedia of Life is to pry all of this information from its various closets and make it universally accessible online. Open-source computing, meet open-source biology. “Taxonomists tend to be possessive,” Patterson says. “They hold what they find. But it’s not a model that will work going into the future. The challenge is to shift taxonomy out of its parochial format into one that’s considerably larger than the sum of its parts.”
AS A KID GROWING UP in Belfast, Patterson had a running debate with his older brother, Samuel, now a prominent mathematician in Germany. Math, his brother argued, is the underpinning of the universe, the logical framework from which all other knowledge emerges. Young David disagreed. He reasoned that mathematics is a product of the human brain, which evolved imperfectly through natural selection, so it cannot be internally consistent. “Then he’d sit on me — that’s how every argument ended — and I’d go off and stare at the garden pond.”
Patterson bought a microscope to study the pond more closely and developed what would become a lifelong fascination with microbes. A few years ago he created Microscope, a communal Web site where microbe wonks could share photos and descriptions of their favorite organisms, and which opened his eyes to the encyclopedia’s possibilities. All the while, the old debate gnawed at him. The so-called hard sciences have coalesced around fundamental entities: physics has atoms, particles, and formulas; chemistry has 117 periodic elements (and counting), which are readily described by their physical properties. The closest thing biology has to atoms, Patterson says, are species: loosely defined, hotly debated groupings that number in the millions. With the encyclopedia, Patterson believes he’s found a unifying code.
It’s tempting to think of the encyclopedia as a larger version of Microscope: a single, Wikipedia-style database of everything known about life on earth. In fact, it’s really an automated index, more Google than Wiki. The user sees a collection of Web pages describing the world’s organisms in a standardized format: physical appearance, size, geographic range and distribution, and the like. Backstage, computer algorithms trawl the Internet and online databases, grab pertinent bits of data, and aggregate them for the viewer — on demand, in real time, and seamlessly.
Developing that software was Patterson’s task, and it’s largely complete. He found that every organism can be described by a series of three-part statements, or “triples”: its scientific name, a feature (shape, color, geographical region, and so on), and the character of that feature (for example, round, blue, Indonesia). Each of those elements can then be assigned a numerical value. A certain species of freshwater algae, for instance, might be given three tags: for its name (Gymnodium hiemale, denoted by the number 00631956), a feature (cell shape, or 88158), and the state of that feature (ovoid, or 0007). In the cyber environment, that species becomes known as 00631956:88158:0007 — a forbidding number, to be sure, but one that the encyclopedia bots can readily nab, reshuffle, and build upon.
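To make the triple idea concrete, here is a minimal Python sketch. It assumes the colon-separated layout and the illustrative codes quoted above; the `Triple` class, the three lookup tables, and the function names are hypothetical and are not the encyclopedia's actual software. The codes are kept as strings so that leading zeros survive.

```python
from typing import NamedTuple

# Hypothetical lookup tables; the codes are the illustrative ones from the text.
NAMES = {"00631956": "Gymnodium hiemale"}
FEATURES = {"88158": "cell shape"}
STATES = {"0007": "ovoid"}


class Triple(NamedTuple):
    """One three-part statement: name, feature, state of that feature."""
    name: str
    feature: str
    state: str


def parse_triple(text: str) -> Triple:
    """Split a colon-separated string such as '00631956:88158:0007'."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated codes, got {text!r}")
    return Triple(*parts)


def encode(triple: Triple) -> str:
    """Join the three codes back into the colon-separated form."""
    return ":".join(triple)


def describe(triple: Triple) -> str:
    """Turn the numeric codes back into a readable statement."""
    return (
        f"{NAMES[triple.name]}: "
        f"{FEATURES[triple.feature]} = {STATES[triple.state]}"
    )


t = parse_triple("00631956:88158:0007")
print(describe(t))  # Gymnodium hiemale: cell shape = ovoid
```

Because each statement is just three codes, a program can group statements by name, by feature, or by state without knowing anything about the database they came from.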
The approach is standard in engines like Google Maps. For Patterson it offers a basic language through which separate biological databases can be made to talk to one another and represents a way to extract data in a format that is finer and more flexible than the one in which it was first entered. “The encyclopedia is a demonstration that there’s a particular logic that will work for all organisms,” he says. The shoptalk around his office involves “mashups” and “data objects” and “atoms of information,” as when Patterson says, “Taxonomy should become no longer a listing of species but a style of managing atoms of information.”
Unlike Google, however, the Encyclopedia of Life is starved for material; although the number of species pages has now reached 170,000, there is yet relatively little to be mashed up. The initial version was filled mainly with fish and amphibians because it drew largely from two of the few databases that already existed, FishBase and AmphibiaWeb, both painstakingly assembled by outside natural-history groups. Accordingly, one whole branch of the encyclopedia project is working to create new databases and make existing ones digitally accessible.
The encyclopedia is divided into five groups. One, managed out of Harvard, handles education and outreach to the general public. A second, the Biodiversity Synthesis Group, based at the Field Museum in Chicago, organizes workshops for taxonomists to hash out classification issues and consider which species the encyclopedia should next focus its efforts on. A third unit, at the Smithsonian, recruits new taxonomists from around the world to help authenticate the species information that ends up on the EOL Web site. The fourth unit, the Scanning and Digitization group, also at the Smithsonian, is responsible for forming relationships with data partners and adding new content to the encyclopedia, a key role as it seeks to expand its representation of life on earth. This group is led by the Biodiversity Heritage Library, a consortium of 10 natural history and botanical libraries, which is scanning and digitizing everything published about biodiversity before 1923, about 500 million pages of literature. (So far it has put nine million pages online, searchable by the EOL bots and available to anyone free of charge.) Many of the plant specimens in the New York Botanical Garden’s vast collection have been digitally photographed; these too will be tapped by the encyclopedia. The encyclopedia has also formed partnerships with the Tree of Life project, a longstanding effort to map the evolutionary heritage of the world’s species, and the Global Biodiversity Information Facility, a nexus of information on museum specimens and field observations.
The fifth and best-financed group is Patterson’s biodiversity informatics unit. To help bring more species information into the encyclopedia, the unit recently developed a do-it-yourself database kit called LifeDesk to distribute to taxonomists. If you’re an expert on, say, the mollusks of the Pacific Northwest, you (or your grad students) can readily input everything you’ve gathered and published about the species you know — life history, identifying traits, images — in a standardized format. At your invitation, other naturalists — or even a high school teacher or a park ranger with a passion for identifying, say, ferns or mushrooms or scarab beetles — can pool their knowledge with yours. A push of a button opens your database to the encyclopedia’s data grabbers.
It’s worth noting that the content on the encyclopedia Web site cannot be directly altered. Nobody will be randomly redefining an aardvark as “one ugly animal” or a “medium-size inflatable banana,” the sort of shenanigans that occur regularly on Wikipedia. Instead, edits can be made only in the databases where the information actually resides. This data is then sucked into the encyclopedia, where Web pages for individual species or groups are further “curated” by selected taxonomists, who monitor incoming data and moderate any disputes. (Lay visitors can also add their own species information, but it will be marked in yellow as unvetted.) LifeDesk is just one more way of ensuring that whatever information appears in the encyclopedia has been approved by a professional biologist. Between LifeDesk and Scratchpad, a similar project run by the Natural History Museum in London that is popular in Europe, Patterson hopes to have 50 percent of the world’s taxonomists online in the next two to three years, all pooling their data for the encyclopedia to browse.
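The flow described above, in which pages are assembled from source databases and lay contributions are flagged rather than hidden, might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Record` class, its `vetted` flag, the `assemble_page` function, and the sample entries are all hypothetical, not the encyclopedia's real data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One piece of species information held in a source database."""
    species: str
    text: str
    source: str
    vetted: bool  # hypothetical flag: True once a curator has approved it


def assemble_page(species: str, records: list[Record]) -> list[str]:
    """Build read-only page lines; unvetted material is labeled, not hidden."""
    lines = []
    for record in records:
        if record.species != species:
            continue
        label = "vetted" if record.vetted else "UNVETTED"
        lines.append(f"[{label}] {record.text} (source: {record.source})")
    return lines


# Illustrative entries only; neither is taken from the encyclopedia.
records = [
    Record("aardvark", "Nocturnal; feeds mainly on ants and termites.",
           "curated database", True),
    Record("aardvark", "Seen digging near a termite mound.",
           "visitor comment", False),
]
for line in assemble_page("aardvark", records):
    print(line)
```

The records are frozen, so the page-building step cannot alter them; in this sketch, as in the article's description, a correction has to be made where the record lives.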
Once the encyclopedia has grown to include perhaps half a million species — which might take three to five years — it will dramatically broaden what scientists can learn about life on earth. “The value of the encyclopedia as a whole is as a macroscope, to look across the big picture of hundreds of thousands of species,” says Jesse Ausubel of Rockefeller University. A scientist might compare the life spans of hundreds of species across taxa and habitats to see what patterns emerge — the sort of study that currently is too complicated and expensive to conduct. Another might employ tags (a recently added feature of the encyclopedia) to establish which organisms eat, and are eaten by, others, thereby beginning to assemble a robust picture of food webs. Like Google Maps, the Encyclopedia of Life potentially offers a thick data platform on which other layers can be rapidly built.
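As a rough illustration of the food-web idea, the sketch below turns a handful of "eats" tags into a two-way lookup. The tag format, the function name `build_food_web`, and the species pairs are assumptions made up for this example; the article does not say how the encyclopedia stores its tags.

```python
from collections import defaultdict

# Hypothetical (predator, prey) tags, invented purely for illustration.
EATS_TAGS = [
    ("heron", "frog"),
    ("heron", "minnow"),
    ("frog", "cricket"),
]


def build_food_web(tags):
    """Index the tags in both directions: who eats whom, and who is eaten by whom."""
    eats = defaultdict(set)
    eaten_by = defaultdict(set)
    for predator, prey in tags:
        eats[predator].add(prey)
        eaten_by[prey].add(predator)
    return eats, eaten_by


eats, eaten_by = build_food_web(EATS_TAGS)
print(sorted(eats["heron"]))     # ['frog', 'minnow']
print(sorted(eaten_by["frog"]))  # ['heron']
```

With hundreds of thousands of species tagged this way, the same two dictionaries would let a researcher walk a food chain in either direction.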
It’s a new approach to biodiversity research, driven less by theory and more by one’s ability to imaginatively sift and slice data. Ausubel is a big fan of baseball, which in recent years has been transformed by sabermetrics, the analysis of nontraditional player statistics across the leagues. (The term is derived from the acronym SABR, for the Society for American Baseball Research.) A casual glance at the midseason stats, for instance, reveals that the rookie position player with the best “VORP,” a useful rating that gauges his value, in runs, over a potential replacement player, was Casey McGehee of the Milwaukee Brewers. Teams like the Boston Red Sox have succeeded in part by mastering the analysis of such data. The amassed data in the encyclopedia may well offer similar surprises. “I expect there will be lots of unexpected discoveries,” Ausubel says. “All kinds of curious things could turn up.”
THE ENCYCLOPEDIA OF LIFE REPRESENTS a race against time. Our planet is experiencing rapid environmental change. Numerous species, from rare Hawaiian caterpillars to Arctic polar bears, may well be extinct by the time the encyclopedia achieves its goal of indexing the world’s 1.8 million known species — never mind those still undocumented, which may number in the hundreds of millions. As E. O. Wilson put it during his address at the TED conference, “Our knowledge of biodiversity is so incomplete that we are at risk of losing a great deal of it before it is ever discovered.”
By bringing reliable information into one place online, the encyclopedia promises to hasten and democratize the process of identifying new and existing species. First-world libraries, where the bulk of knowledge about biodiversity resides, will become more accessible to naturalists in less developed countries, where most of the world’s biodiversity actually lives. Biologists can put data about newly discovered species online almost immediately, instead of waiting months or years for it to appear in obscure journals. “Taxonomists are completely swamped by their inability to share their research and get specimens,” says David Shorthouse, a young ecologist and spider expert on Patterson’s staff. “This is an opportunity to accelerate taxonomy beyond our wildest dreams.”
Looming behind the biodiversity crisis, meanwhile, is an equally pressing if far less recognized concern, what one might call the biodiversity-scientist crisis. Taxonomists are the librarians of life; without them, nature’s volumes are meaningless. But taxonomy is dwindling and its members aging, as universities and museums cut financing for this unglamorous yet essential science. “The sad thing is, just as the Encyclopedia of Life has come along, the number of people who supply biodiversity information is very small,” says Barbara Thiers of the New York Botanical Garden. “People aren’t being trained to identify individual organisms anymore. If the encyclopedia can change that, that would be wonderful.”
Patterson is betting that it will, in part because there is no alternative. “Taxonomy is at a break point,” he says. Sure, he can see how it might seem more fruitful to put the encyclopedia money into the pockets of taxonomists. But do the math: $25 million (four years’ worth of financing from the Sloan and MacArthur foundations) divided by 6,000 scientists would give each one “about enough to get you on an airplane and home — and then we’re right back where we began.” The encyclopedia offers a less direct but surer path to long-term recovery. “We have given relevance to what taxonomists are doing, and out of relevance should come increased funding and attention,” he says.
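Spelled out, the division Patterson describes comes to a little over $4,000 apiece, using only the two figures given above ($25 million and 6,000 taxonomists):

```latex
\[
  \frac{\$25{,}000{,}000}{6{,}000\ \text{taxonomists}}
  \approx \$4{,}167\ \text{per taxonomist}
\]
```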
Half the trick going forward involves generating buzz: convincing the Facebook generation that counting tiny hairs on tiny stems or tiny insect legs is a doubly exciting career path now that you can link all your data to a global network of tiny-hair counters. That may be the easy part. Another obstacle is financial. Universities and museums must be persuaded that hiring taxonomists is a doubly worthwhile investment, now that their contributions are published online and for free. The old-school approach, of publishing taxonomic discoveries in journals that virtually nobody in the world could access or read, at least gave department chairs and financing committees something tangible to track and count toward your tenure.
The larger challenge, as the journal Science recently put it, is in “converting scientists from data hoarders to data sharers.” The issue is everywhere in academia now, as Googlization pushes researchers to go online and open-source. It’s part of what convinced Patterson to join the encyclopedia project in 2007; he felt that only someone with his senior standing in taxonomy could afford to take the leap. “If I were younger, this could be an absolute killer of career,” he says from his swivel chair. J. B. the iguana is peeking out from behind a monitor again and is giving Patterson the inscrutable eye. “Out of this comes no papers, no academic appointments. I’m treating this as my last job in professional life.”
In fact, it’s not. Under the leadership of Patterson and many other project scientists, the encyclopedia has done well; in August it received two more years of grant money from the Sloan and MacArthur foundations, worth another $12.5 million, a reassuring if not unexpected thumbs-up. The renewal brings some changes, however, including a new manager of the biodiversity informatics group. “EOL is moving from proof of concept into industrial-strength implementation,” Breen Byrne, the project’s publicity officer, wrote in an e-mail. Patterson’s replacement, who has yet to be hired, will have “additional expertise in implementing large-scale portals and databases.” Patterson, meanwhile, will move up, or perhaps aside, into the self-titled role of senior taxonomist, a still vague advisory position that likely will involve pulling other taxonomists and their content into the EOL fold.
Once a tiny-hair counter, always a tiny-hair counter, it seems. Thanks in part to the Encyclopedia of Life, that fate is far brighter than it has ever been.