The Opportunities and Challenges of Citizen Science as a Tool for Ecological Research
Abstract
This chapter discusses the opportunities and challenges of citizen science as a tool for undertaking ecological research. Before assessing the potential for large-scale citizen science to advance our understanding of ecological systems, the chapter considers the types of ecological research questions for which the scale (extent and resolution) of data from citizen science is particularly suitable. It then provides examples that illustrate how citizen science data can elucidate some of the processes relating to ecology, such as the underlying patterns of an organism's distribution and abundance as well as its life history and behavior. It also outlines research considerations that must be taken into account when designing (or continuing) citizen science projects.
In this chapter we explore the potential for large-scale citizen science to advance our understanding of ecological systems. We will (1) note the types of ecological research questions for which the scale (extent and resolution) of data from citizen science is particularly suitable, and (2) highlight research considerations that need to be taken into account when designing (or continuing) a citizen science project. We hope this chapter will encourage ecologists to see novel ways to make use of citizen science to examine large-scale patterns, effects of spatial variation on the processes they study, and higher levels of biological organization.
Within the term citizen science, science refers to the process of investigation rather than to the other meaning of science as a body of accumulated knowledge. The scientific process is often idealized as being fairly linear: researchers define a question based on prior knowledge or conjecture, make observations, form testable hypotheses, and conduct studies designed to potentially falsify these hypotheses (Popper 1959), leading to strong inference and publication of the results. The only criterion for scientific (as opposed to nonscientific) investigation is that the hypotheses be falsifiable. Investigation is thus informed by theoretical models and carried out through observational tests of predictions or, when possible, experimentation.
The collection and publication of natural history observations provide a base of knowledge from which initial questions and testable hypotheses can be formed. In many fields of science, avenues of research have progressed to the point where investigations based on collection of nonexperimental observations, such as natural history observations, are perceived as not rigorous enough to push frontiers. Our belief is that this is not true for many ecological questions, specifically because patterns and processes can vary among locations or through time: intensive local studies are not sufficient to understand all ecological processes. One unique aspect of citizen science is that it facilitates the gathering of natural history observations at enormous spatial and temporal scales. Thus, we argue that if used properly, large-scale citizen science data can open new avenues in ecological research. Indeed, even deciphering patterns from basic natural history data collected at large spatial and temporal scales often requires novel and complex analytical techniques (see Chapter 8).
Large-scale patterns built from citizen science methods can be employed strategically to advance ecological research in two ways: (1) by providing aggregations of observations that reveal patterns at larger scales and/or higher levels of organization, thereby generating new lines of inquiry into the processes that might account for newly discerned patterns; and (2) by allowing researchers to test mechanistic hypotheses that predict large-scale, or higher-level, patterns. These two options mirror the dual ways that citizen science schemes are formed, either with protocols designed to collect data to address a specific research question or, more commonly, with protocols that are designed robustly within the broader goal of monitoring without specific a priori hypotheses.
Higher-Level Phenomena and the Utility of Citizen Science
Often the creation of a new tool is responsible for opening up a new field of investigation. For example, tools to color band birds expanded the field of behavioral ecology and molecular tools expanded the field of evolutionary ecology. Many of these new tools allow scientists to examine mechanisms at lower levels of biological organization. As explained above, citizen science creates new research opportunities by increasing the geographic scale over which observations are gathered to gain insights into processes and mechanisms at work at higher levels of biological organization. Citizen science is especially useful when it not only provides data collected over large spatial or temporal extents but also provides data collected at fine resolution to address underlying processes. Logistically, citizen science works best when data can be accurately and cost-effectively obtained with direct public involvement in (at least) the data-collection process and with data validation tools in place (Chapter 3).
The endeavor of research via the citizen science “tool” can encompass the use of data that have already been collected by a project, use of an existing network of participants to collect new data, or development of an entirely new project. In the next section, we provide examples to illustrate the potential of citizen science for ecological research by describing a suite of questions that require observations across a large geographic scale or during extended periods.
Opportunities for Research
Processes Underlying Patterns of Distribution and Abundance
Ecology is the study of the distribution and abundance of organisms (Andrewartha and Birch 1954; Krebs 2009) and their interactions with the environment. Citizen science data have been used extensively to describe species’ distributions (Gaston 2003). The processes underlying these patterns can also be elucidated using citizen science data, as the following examples illustrate.
Identifying Habitat Associations
As noted in Chapter 7, some forms of data that are gathered over large scales, such as remote-sensing data, can be easily collected without the need for citizen scientists. In most cases, however, data on the presence of species at a particular site can be collected only by human observers (Kelling et al. 2009). Therefore, a marriage between citizen science and remote-sensing data is often required to answer questions about the relationship between environmental variables and the distribution, abundance, phenology, or behavior of animals (or plants). For purposes of conservation and management, habitat associations can be determined with this combination of data (e.g., Brotons et al. 2004; Kéry et al. 2005; Chapter 9). Macroecology (Brown and Maurer 1989) is another facet of ecology where this marriage is fruitful.
Assessing the Impacts of Environmental Change
The accumulation of observations over long time scales has great utility for tracking changes in abundance. This is best exemplified by the use of citizen science data from the North American Breeding Bird Survey, which has served as a key ingredient for conservation planning in North America, both for early identification of declining species and as the basis of detailed regional plans for conservation action (Rich et al. 2004). Citizen science data have also contributed significantly to our understanding of the effects of climate change on the ranges of species and their reproductive ecology. Thomas and Lennon (1999) used data from breeding bird atlas schemes in Great Britain to show that the breeding ranges of birds were expanding northward, and Hitch and Leberg (2007) found similar results in North America based on data from the Breeding Bird Survey. Citizen science data have shown similar northward shifts for other taxa, such as dragonflies in Britain (Hickling et al. 2005). It is important to note that none of the data used to provide these insights into the biological impacts of climate change came from schemes designed with protocols for that purpose. These examples illustrate that researchers can repurpose long-term data to address a myriad of questions, or as Dhondt (2007) put it, “They provided answers before the questions were asked.”
Urban ecology is a specialized approach to understanding impacts of environmental change and how urban environmental practices might play a role in conservation. Although researchers may typically view remote areas as the most difficult to access for data collection, making observations around residences also can be logistically challenging. Engaging residents in data collection via citizen science allows easy sampling of private property, which usually takes up the majority of land in urban environments. For example, in Project Squirrel, participants count Fox and Gray Squirrels around their residences (van der Merwe et al. 2005). Based on Project Squirrel data in the Chicago area, Fox Squirrels were found to be more abundant than Gray Squirrels in areas with single-family homes and in association with elms and maples, as well as in areas with cats and dogs. In contrast, Gray Squirrels were associated with oaks and pines, multifamily homes, high-rises, and areas with fewer cats and dogs (van der Merwe et al. 2005). In Neighborhood Nestwatch, administered by the Smithsonian Migratory Bird Research Center, researchers coordinate their work around residences with participants at those residences, thus facilitating access and increasing observer effort by involving the people who are more frequently present at the sites, who then continue to make observations.
Dispersal, Migration, and Movement
Although migratory movements of animals are challenging to track, the multiple eyes and ears of citizen scientists can make tracking large-scale movements feasible. For example, the Vanessa Migration Project (www.public.iastate.edu/~mariposa/Vanessaproject.htm), organized in partnership with Iowa State University Geographic Information Systems Facility, Iowa Nature Mapping, and the Iowa Environmental Mesonet, tracks movement and outbreaks of butterflies. Bird migration is traceable through eBird and networks of migration monitoring stations such as the Canadian Migration Monitoring Network (www.bsc-eoc.org/national/cmmn.html) and the North American Hawk Migration Association (www.hmana.org). Any project that collects data on marked individuals or provides data on temporal changes in distribution and abundance within a year can provide potentially new information on animal movement.
Disease-causing organisms are invading species that can have particularly important influences on ecosystems, but the emergence of new diseases is challenging to track. One of the best-studied emerging pathogens in wild animal populations is the bacterial pathogen Mycoplasma gallisepticum, which first appeared causing disease in House Finches in the mid-1990s in eastern North America. Dhondt et al. (2005) tracked the epidemic spread of mycoplasmal conjunctivitis in House Finches through a citizen science program, the House Finch Disease Survey (HFDS) administered by the Cornell Lab of Ornithology. Participants in the HFDS collected and reported data that were used to index disease prevalence (Altizer et al. 2004) and its impact on host abundance (Hochachka and Dhondt 2000). The HFDS was able to start within 9 months of the first reports of disease, because the majority of initial participants in the survey were recruited from the ongoing Project FeederWatch (Chapter 2). In this case, a citizen science project designed for monitoring winter birds served as the springboard for another project with much more specific goals, showing the potential ability of long-term data-collection programs to quickly gather supplemental information when unique circumstances arise. Similarly, existing monitoring projects were harnessed to obtain information critical to making predictions for how and where avian flu might reach the UK (see Chapter 10).
Geographic and Temporal Trends in Life History and Behavior
In addition to understanding processes underlying changes in distribution and abundance, citizen science can be used to better understand facets of life histories of organisms; for example, citizen science data can be used to examine differences in details such as demographics, traits, and phenology. Again, the strength of using citizen science to investigate these facets of life histories is the ability to gather ecological data over a larger area than would otherwise be feasible, in order to identify and understand variation in patterns across space or through time.
For example, advancements in laying dates varied among populations of tits (Parus sp.) across Europe (Visser 2003). At northern sites that have not experienced warming, and in habitats at southern sites where phenology of food sources had not changed, Great and Blue Tits did not show changes in laying dates. At intermediate latitudes, populations that historically lacked second clutches showed advances in laying dates, while populations with second clutches showed a reduction in the frequency of second clutches with a smaller advance in laying date (Visser 2003). Relatively few studies of this type exist, so much of our understanding of large-scale and long-term variation in life histories has come from citizen science data. Below are examples of past uses of citizen science data, or suggestions for fields in which the use of citizen science data could provide more detailed information on large-scale variation in life history traits.
Further understanding of geographic variation in reproductive rates will almost certainly require the use of citizen science data. Initial work in this field (Lack 1947; Moreau 1944) described gradients in clutch sizes between tropical and temperate birds, across seasons, between islands and continents, across continents, and with changes in elevation. These early efforts to examine geographic patterns in life history traits were coarse, with a few exceptions (Johnston 1954), comparing tropical (equatorial) and temperate (higher latitude) birds (Lack 1947; Moreau 1944; Skutch 1949), sometimes with entire countries comprising a single data point (Lack 1943). Nevertheless, the results from such studies provided the foundation for development of life history theory in ornithology (Ricklefs 2000). Although the resolution of these large-scale studies has improved over time as more locations have been added (Young 1994), citizen science projects can still provide additional data to deepen our understanding of geographic variation in reproductive strategies.
Data from citizen science projects have already contributed to our knowledge in this realm. For example, Peakall (1970) presented an analysis, similar to that of Lack (1943; Figure 6.1), of data on clutch size of Eastern Bluebirds using over 8000 records from Cornell’s first citizen science project, the Nest Record Card scheme. He was able to show, averaging clutch sizes within states, that Eastern Bluebird peak clutch size varied geographically (slightly higher in the center of the range at the height of the breeding season), but without an obvious latitudinal trend (Figure 6.2). The reason for this apparent lack of latitudinal trend was finally elucidated when Dhondt et al. (2002) examined the data further, and found that at southern latitudes, clutch size was low at the start of the season, which began earlier than at central and northern locales (Figure 6.3). In all areas, clutch size decreased from late April through July. These temporal trends masked a small but clear latitudinal increase in clutch size to the north.
Such complex variation in reproductive patterns over broad geographic areas likely can be detected only using citizen science data, and so far we have just scratched the surface in using such data to examine patterns such as seasonal trends across latitudes in clutch size (Cooper, Hochachka, Butcher, et al. 2005; Dhondt et al. 2002), reproductive effort (Cooper, Hochachka, and Dhondt 2005), and hatching failure (Cooper et al. 2006).
Seasonal timing of life events is important to measure as temporal shifts can provide sensitive indicators of response to environmental conditions. Citizen science data collection is well suited for collecting the information to study phenological patterns. Changes in phenology are predicted in a diversity of taxa as a response to climate change, and several have been examined using citizen science data. Data from butterfly monitoring schemes in Britain have shown a tendency toward earlier first appearances in the late twentieth century (Roy and Sparks 2000). Timing of birds’ breeding seasons has also been found to change; for example, Dunn and Winkler (1999) used nest record card data to discover that Tree Swallows had advanced their laying dates by up to 9 days in response to changing climate. Also of concern are changes in plant phenology. Project BudBurst (Chapter 2) monitors geographical and temporal variation in plant phenology as a gateway project to a more intensive phenological monitoring scheme launched by the National Phenology Network in 2010.
Geographic and Temporal Trends in Behavioral Traits
While citizen science projects that involve counts or recording presence/absence may be easiest to implement because they tap into what many hobbyists already do, citizen scientists can also be asked to provide more detailed data using specific protocols for assessing behavior, such as the breeding evidence codes used in breeding bird atlases, or using specialized instruments to collect additional types of data that extend beyond simple observation. Both direct and indirect observation can be used to study behavior. At the Cornell Lab of Ornithology, participants in Project PigeonWatch follow a protocol and gather data on mate choice behavior in pigeons across the globe to examine the potential role of mate choice in maintaining variation in color morphs. We have also created projects in which citizen scientists observe and classify behaviors recorded on camera using the CamClickr project’s drag-and-drop method of sorting still images by behavior or state (Voss and Cooper 2010).
We have successfully worked with project participants to record behaviors without observing the birds at all; participants placed temperature-recording data loggers into nests, where the temperature records were then used to infer incubation rhythms of the birds (Cooper and Mills 2005).
All of the examples presented above show that ecologists can study diverse ecological patterns and processes over large areas and long time periods. Citizen science data, however, clearly should be viewed as complementary to data from other sources if the goal is to build a full understanding of ecological processes (Dhondt et al. 2005). They can also be used at different stages in the progress of a research program; Chapter 8 discusses this in detail, talking about the use of citizen science data to generate hypotheses for subsequent testing in an early, exploratory phase of research (see also Dickinson et al. 2010).
Regardless of their specific uses, citizen science data need to come from studies designed to gather relevant data that are appropriate for rigorous analysis. Design constraints and challenges are common to any research protocol, but our experience has shown us that there are several challenges that are particular to, or strongly manifested in, citizen science projects.
Rare Observations and Occurrences
Rare events and events that are rarely detected can be very important but are difficult to study. The cumulative efforts of dispersed networks of citizen scientists can contribute an army’s worth of person-hours of work, making the collection of information on rare events tractable. Examples of such information or events include predation events (witnessing predation), mortality events (discovering dead individuals), and sightings of rare or elusive species.
Observations and Experiments Involving Rare Events
A key example of the power of citizen science to study rare ecological phenomena is with invasive species or disappearing species. Rare species can be located by project participants, as has been seen with The Lost Ladybug Project (www.lostladybug.org), administered at Cornell University, in which digital images of various species of ladybugs are received from thousands of locations across the United States (Losey et al. 2007). Through this project, two children in Virginia reported a nine-spotted ladybug, which researchers had not documented in 14 years (Losey et al. 2007), and a 6-year-old in Oregon was able to report with photos the presence of numerous individuals of native ladybugs. Subsequently, professional researchers visited the ladybug population and collected individuals for captive breeding and research.
Death is another ecological phenomenon that is often difficult to study because dead animals are rarely encountered. Citizen science surveys of beaches such as the Coastal Observation and Seabird Survey Team (COASST) (Parrish et al. 2007) can provide information on timing of mortalities of oceangoing bird species. Mortality of birds around human residences also can be documented using projects such as American Bird Conservancy’s PredatorWatch and the Cornell Lab of Ornithology’s My Yard Counts (2006–2007), and Project FeederWatch data have been used to estimate annual national mortality as a result of window strikes (Dunn 1993).
Mortality due to rare environmental events also can be inferred by examining data from ongoing citizen science projects. Following the 2003 heat wave in Europe, researchers used data from the French Breeding Bird Survey to show that death or reproductive failure rates for different bird species were related to their predicted resilience to extreme temperatures (Jiguet et al. 2006). Because temperature anomalies are more important than absolute temperatures in influencing populations, these insights about extreme temperatures would not have been possible without data collected over a long enough time frame to include the uncommon heat wave.
Other hard-to-observe events, such as the use of calcium sources by birds, also can be documented by citizen science projects (Dhondt and Hochachka 2001). Project participants provided sources of calcium for birds and made observations of the species that used supplemental calcium during their nesting seasons. The results showed that between-species variation in the use of supplemental calcium was closely related to diet.
Conducting ecological research using citizen science tools, whether collecting data from a new project or using data already collected, will need to overcome obstacles inherent in data gathered via volunteer schemes. Researchers need to address challenges from several angles—during project design, through project-to-participant communications, and by the use of appropriate statistical analytical methods. In general, working with citizen science data requires more sophisticated statistical and data management skills than do many other types of ecological research.
Observer Variability and Detection Probability
Participants contributing data to citizen science projects vary in age, experience, skill, training, willingness to be trained, and other attributes that influence data accuracy (Dickinson et al. 2010). This challenge can be addressed with protocols that minimize required skill levels, provide adequate training, and maximize standardization of data collection to increase consistency across observers. Protocols designed to encourage repeated observations at individual sites are valuable for two reasons. First, all observers, professional or not (Kendall et al. 1996), differ in their abilities; similarly, observation locations may differ from each other in consistent ways that cannot be readily quantified. Protocols requiring repeated observation allow these observer- and site-related differences to be accounted for during analyses, removing the intersite and interobserver biases. Second, failure to detect something indicates that either this thing (e.g., bird species) was not present or, alternatively, it was present but not detected. For example, a silent, motionless bird can be nearly impossible to detect but is nevertheless using the habitat at a location. Formal methods of analysis now exist for analyzing data from systematic, repeated counts in order to separate out probabilities of true absence from lack of detection (MacKenzie et al. 2006). At least two projects at the Cornell Lab of Ornithology (Celebrate Urban Birds and My Yard Counts) were explicitly designed to allow estimation of detectability, and other less rigorously repeated counts may also prove amenable to analysis using these same methods.
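The logic of separating true absence from failed detection can be illustrated with a minimal sketch of a basic occupancy likelihood. This is an illustration under simplifying assumptions, not the analysis used in any of the projects named above: it assumes one occupancy probability (`psi`) and one per-visit detection probability (`p`) shared by all sites and observers, it replaces a proper optimizer with a coarse grid search, and the records and function names are hypothetical.

```python
import math

def site_likelihood(detections, visits, psi, p):
    """Likelihood of one site's repeated-visit record.

    psi: probability that the species occupies the site (assumed constant).
    p:   probability of detecting it on one visit, given that it is present.
    """
    # Probability of this record if the species is present. The binomial
    # coefficient is omitted because it does not depend on psi or p.
    if_present = psi * p ** detections * (1 - p) ** (visits - detections)
    if detections > 0:
        return if_present
    # Never detected: present but missed on every visit, or truly absent.
    return if_present + (1 - psi)

def log_likelihood(records, psi, p):
    """Sum of log-likelihoods over all sites."""
    return sum(math.log(site_likelihood(d, k, psi, p)) for d, k in records)

def fit_by_grid(records, steps=99):
    """Coarse grid search for the (psi, p) pair with the highest likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda pair: log_likelihood(records, pair[0], pair[1]))

# Hypothetical records: (number of visits with a detection, number of visits).
records = [(0, 3)] * 30 + [(1, 3)] * 30 + [(2, 3)] * 30 + [(3, 3)] * 10

# Naive occupancy treats every site without a detection as a true absence.
naive_occupancy = sum(1 for d, _ in records if d > 0) / len(records)
psi_hat, p_hat = fit_by_grid(records)
```

With these made-up records the naive estimate is 0.7, whereas the fitted occupancy is higher because some of the 30 sites without detections are attributed to missed detections rather than to absence. Real analyses would normally let `psi` and `p` vary with site and observer covariates.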
Distribution of Data
We perceive that the greatest issue with citizen science data gathering, relative to more conventional means of gathering scientific data, is the extent to which self-selection by participants affects the data-collection process. In most citizen science projects, participants have ultimate control over where they collect data, when they collect data, and the effort they expend in data collection. We concur with the conclusions of Schmeller et al. (2009) in believing that this, rather than the inherent abilities and training of participants, is why results based on citizen science data are deemed to be more biased than results based on data collected by professionals (Engel and Voshell 2002; Genet and Sargent 2003).
When participants choose the locations at which they make observations, their chosen locations may not accurately represent the local availability of habitat of different types and may not be distributed evenly across the region of interest. Even though the density of data from across the United States and Canada is roughly proportional to human population density, meaning there tend to be more data where more people live, highly urbanized sites tend to be underrepresented in the data (e.g., in eBird and Project FeederWatch). Some citizen science projects use a stratified random sampling design, for example the North American Breeding Bird Survey, the North American Amphibian Monitoring Program and several bird monitoring projects in the UK (see Chapter 12), but this is rare. With less rigorous designs, gridding the region of interest can be helpful. Most breeding bird atlas projects attempt to sample within all grids, whereas other projects specify a subset of grid squares (Kéry et al. 2005). When random spatial distribution of observations cannot be imposed via project design, analysis and interpretation need to take the nonrandom distribution of sampling points into account.
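As a rough illustration of gridding, the sketch below assigns observation points to grid squares, counts observations per square to expose uneven coverage, and keeps one observation per square as a crude way of evening out effort. The coordinates, function names, and the use of latitude-longitude squares are assumptions made for illustration; squares defined in degrees are not equal in area, and atlas projects typically define their grids in a projected coordinate system.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_degrees=1.0):
    """Return the (row, column) index of the grid square containing a point."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))

def effort_per_cell(points, cell_degrees=1.0):
    """Count observations per grid square to expose uneven coverage."""
    return Counter(grid_cell(lat, lon, cell_degrees) for lat, lon in points)

def one_per_cell(points, cell_degrees=1.0):
    """Keep the first observation in each square to even out sampling effort."""
    kept = {}
    for lat, lon in points:
        kept.setdefault(grid_cell(lat, lon, cell_degrees), (lat, lon))
    return list(kept.values())

# Hypothetical coordinates: three clustered points and one isolated point.
points = [(42.44, -76.50), (42.48, -76.45), (42.90, -76.10), (44.20, -73.30)]
coverage = effort_per_cell(points)
thinned = one_per_cell(points)
```

Thinning discards data, so it is only one option; the per-square counts can instead be used as a measure of effort in the analysis.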
The Value of Nothing
Participants in citizen science projects can perceive their observations as unimportant, and fail to provide data, if they do not observe something “interesting.” The cases with which we routinely deal involve absences: not observing one or any species, or not observing events such as presence of diseased birds. To a researcher these observations are extremely important, and our approach to this challenge has been to address it on multiple fronts. First, in project design, mechanisms can be created to turn these “nonevent” observations into something that is reportable, and their importance can be communicated. The mechanisms include providing places on data-collection forms for observers to report that they made observations but saw no birds. The House Finch Disease Survey questionnaire (a computer-scannable form) was designed so that participants had only “yes” answers (rather than yes or no answers). The first question for each day was “Have you watched your feeder?” The following questions were “Have you seen House Finches?” and “Have you seen diseased House Finches?” Therefore if participants reported that they had observed their feeder but did not report that they had seen House Finches, we knew the species was not observed. This made it possible to show that in certain parts of the range, House Finch abundance declined through winter when disease prevalence was high, while this was not the case in warmer areas (Dhondt et al. 1998).
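The reasoning behind the yes-only form can be written out as a small decision rule. The sketch below is a hypothetical rendering of that logic, not the survey's actual data-processing code; the function name and the returned labels are assumptions.

```python
def interpret_day(watched, saw_finches=False, saw_diseased=False):
    """Interpret one day's tick boxes from a yes-only form."""
    if not watched:
        # No box ticked for watching the feeder: nothing can be inferred.
        return "no observation"
    if saw_diseased:
        return "diseased finches seen"
    if saw_finches:
        return "finches seen, no disease reported"
    # Feeder was watched but no finch box was ticked: an inferred non-detection.
    return "finches not observed"

# A day on which the feeder was watched but no other box was ticked.
example = interpret_day(watched=True)
```

The key design choice is that a day with the feeder watched and nothing else ticked becomes usable data, whereas a day with no boxes ticked stays missing.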
It is possible to require participants to provide information that they might otherwise assume is not essential, such as indicating whether an electronic checklist consists of all the bird species that they identified (eBird). This latter information allows data analysts to infer counts of zero birds observed for all species present on the checklist for which no counts were reported. Second, through project implementation, communications with participants can explain and repeatedly emphasize the importance of reporting what is seen and not seen (Dhondt 1997).
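A minimal sketch of this inference follows. The checklist structure and the field names (`all_species_reported`, `counts`) are hypothetical and are not meant to reproduce the eBird data format.

```python
def zero_fill(checklist, species_of_interest):
    """Return a count for every species of interest, or None where unknown."""
    counts = checklist["counts"]
    if checklist["all_species_reported"]:
        # Complete checklist: an unlisted species is inferred to be a zero count.
        return {sp: counts.get(sp, 0) for sp in species_of_interest}
    # Incomplete checklist: an unlisted species may simply not have been recorded.
    return {sp: counts.get(sp) for sp in species_of_interest}

species = ["House Finch", "Blue Jay"]
complete = {"all_species_reported": True, "counts": {"Blue Jay": 2}}
partial = {"all_species_reported": False, "counts": {"Blue Jay": 2}}

filled = zero_fill(complete, species)
unfilled = zero_fill(partial, species)
```

Only the complete checklist yields a zero for House Finch; the partial checklist leaves that count unknown, so it cannot be used as evidence of absence.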
Third, this issue can be dealt with in analysis as well. For example, Dhondt et al. (1998) compared conjunctivitis prevalence in House Finches reported by participants who regularly submitted data forms in the months prior to reporting conjunctivitis with data from those participants who never reported House Finches until they reported conjunctivitis. The latter participants were presumably failing to report the absences of disease and their data overestimated disease prevalence (Dhondt et al. 1998).
Several other chapters in this book (Chapters 3, 7, 8, and 9) discuss methods needed to appropriately manage and analyze the types of large-scale data that can come from citizen science projects. The underlying challenge is for researchers to acquire the skills needed to understand and use the methods that are discussed in these other chapters.
Given the magnitude of data that can be collected from large-scale citizen science projects, researchers need to have strong skills in data management and manipulation. In our own work, we routinely use data files that are too large for spreadsheet programs to handle. For data such as these, researchers will need to learn to use database systems (e.g., MySQL, Microsoft Access).
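As a small illustration of the kind of database query involved, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is substituted here only because it needs no separate installation; it is not one of the systems named above, and the table layout, column names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical observation rows: (site, year, species, number of birds).
rows = [
    ("A", 1995, "House Finch", 4),
    ("B", 1995, "House Finch", 2),
    ("A", 1996, "House Finch", 1),
    ("A", 1996, "Blue Jay", 3),
]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE observations "
    "(site_id TEXT, year INTEGER, species TEXT, n_birds INTEGER)"
)
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", rows)

# An index keeps per-species queries fast as the table grows.
conn.execute("CREATE INDEX idx_species_year ON observations (species, year)")

# Number of reporting sites and total birds per year for one species.
yearly = conn.execute(
    "SELECT year, COUNT(DISTINCT site_id), SUM(n_birds) "
    "FROM observations WHERE species = ? GROUP BY year ORDER BY year",
    ("House Finch",),
).fetchall()
conn.close()
```

The same query pattern applies to an on-disk database holding millions of rows, where the summary is computed by the database rather than by loading every record into memory.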
We have described the uses of citizen science data in conjunction with other sources of data across large areas, such as data derived from satellite imagery. All of these data are available as geographic information system (GIS) files, and joining their data with the participant-collected data requires knowledge of the specialized software that makes use of GIS data files.
Once the data are collated and organized for analysis, the methods used for analysis can also differ from those for which ecological researchers are typically trained. Specifically, the need to use methods to account for nonindependence of data from multiple nearby locations (Chapter 7) can require relatively specialized knowledge. Regarding the exploratory data-analysis techniques described in Chapter 8, these are just starting to be used by ecologists, and training in their use would possibly require researchers to develop contacts with colleagues in the field of computer sciences rather than statistical sciences from which the “standard” analysis methods have come.
All of this means that to work effectively and efficiently with large citizen science data sets, ecologists will likely need to develop collaborations, formal or informal, with colleagues in disciplines in which ecologists have received little formal training or previous exposure.
Based on Thomas Kuhn’s Structure of Scientific Revolutions, we can ask, at which phase(s) of science might citizen science have the most effect? Are there new fields that can be opened where the topics are in the “pre-science” phase, that is, prior to a central paradigm? Might citizen science move an existing area of pre-science to the “normal science” phase, that is, enlarging a central paradigm? Is it possible that citizen science will create “revolutionary science,” that is, bring up enough anomalies to some current paradigm that it will create a new paradigm that subsumes the old results as well as new anomalous results into one new framework? Given the ability of citizen science to detect large-scale patterns of systems, it is possible for it to advance science in all these ways.
In this chapter, we used examples primarily from our own work to illustrate opportunities for using the methodology of citizen science. Citizen science projects not only contribute to understanding of the questions that initially motivated them but also make available a network of participants who can subsequently be tapped to collect novel data in diverse subfields of ecology, including reproductive tactics, macroecology, disease ecology, evolutionary ecology, and conservation biology.
Many of the insights in this chapter arose from the daily collaborations among staff at the Cornell Lab of Ornithology, particularly Paul Allen, Rick Bonney, and Tina Phillips. Some citizen science projects and associated research described in this chapter were funded by NSF DEB 009445 and NSF EF 062705.