Ecological networks are a useful representation of ecological systems in which species or organisms interact (Heleno et al. 2014; Delmas et al. 2018). In addition to using the established mathematical framework of graph theory to describe the structure of species interactions, network ecology has related the structural and ecological properties of networks (Proulx, Promislow, and Phillips 2005; Poulin 2010). Networks often make it possible to link disconnected scales in ecology (Guimarães 2020), and in particular are powerful tools to bridge data on populations to ecosystem properties (Loreau 2010; Jordano and Bascompte 2013; Gonzalez et al. 2020). Recently, the interest in the dynamics of ecological networks across large temporal scales (Baiser et al. 2019; Tylianakis and Morris 2017), and along environmental gradients (Welti and Joern 2015; Pellissier et al. 2017; Trøjelsgaard and Olesen 2016), has increased. As ecosystems are changing rapidly, networks are at risk of undergoing rapid and catastrophic changes to their structure: for example by invasion leading to a collapse (Magrach et al. 2017; Strong and Leroux 2014), or by a “rewiring” of interactions among existing species (Hui and Richardson 2019; Guiden et al. 2019; Bartley et al. 2019). Simulation studies suggest that knowing the structure of the extant network, i.e. being able to map all interactions between species, is not sufficient to predict the effects of external changes (Thompson and Gonzalez 2017); indeed, data on species occurrences and traits, as well as on current and projected local climate, are also required.
This change in scope, from describing ecological networks as local, static objects, to dynamical ones that vary across space and time, has prompted several methodological efforts. First, tools to study spatial, temporal, and spatio-temporal variation of ecological networks in relationship to environmental gradients have been developed and continuously expanded (Poisot et al. 2012, 2017; Poisot, Stouffer, and Gravel 2015). Second, there has been an improvement in large-scale data-collection, through increased adoption of molecular biology tools (Eitzinger et al. 2019; Evans et al. 2016; Makiola et al. 2019) and crowd-sourcing of data collection (Bahlai and Landis 2016; Roy et al. 2016; Pocock et al. 2015). Finally, there has been a surge in the development of tools for inferring species interactions (Morales-Castilla et al. 2015; Dallas, Park, and Drake 2017) based on limited but complementary data on network properties (Stock et al. 2017), species traits (Gravel et al. 2013; Desjardins-Proulx et al. 2017; Brousseau, Gravel, and Tanya Handa 2017; Bartomeus et al. 2016), and environmental conditions (Gravel et al. 2018). These latter approaches tend to perform well in data-poor environments (Beauchesne et al. 2016), and can be combined through ensemble modeling or model averaging to generate more robust predictions (Pomeranz et al. 2018; Becker et al. 2020). The task of inferring interactions is particularly important because ecological networks are difficult to adequately sample in nature (Jordano 2016a, 2016b; Banašek-Richter, Cattin, and Bersier 2004; Chacoff et al. 2012; Gibson et al. 2011). The common goal of these efforts is to facilitate the prediction of network structure, particularly over space (Poisot, Gravel, et al. 2016; Albouy et al. 2019) and into the future (Albouy et al. 2014), to appraise the response of that structure to possible environmental changes.
These disparate methodological efforts share another important trait: their continued success at predicting network structure depends both on state-of-the-art data management, and on the availability of data that are representative of the area we seek to model. Novel quantitative tools demand a higher volume of network data; novel collection techniques demand powerful data repositories; novel inference tools demand easier integration between different types of data, including but not limited to: interactions, species traits, taxonomy, occurrences, and local bioclimatic conditions. Macroecological studies of networks have demonstrated the importance of integrating network structure with past and current climate data (Dalsgaard et al. 2013; Schleuning et al. 2014; Martín-González et al. 2015), and that even when considering large scale gradients, similar types of interactions can behave in similar ways, in that they respond to the same drivers (Zanata et al. 2017). That being said, network-based measures of community structure often bring complementary information when compared to other sources of data (like abundance; Dalsgaard et al. 2017).
In short, advancing the science of ecological networks requires us not only to increase the volume of available data, but also to pair these data with ecologically relevant metadata. Such data should also be made available in a way that facilitates programmatic interaction (i.e. where the data are processed automatically and without the need for manual curation) so that they can be used by reproducible data analysis pipelines. Poisot, Baiser, et al. (2016) introduced mangal.io as the first step in this direction. In the years since the tool was originally published, we have continued to develop the data representation and the amount and richness of metadata, and have digitized and standardized as many biotic interaction data as we could find. The second major release of this database contains over 1300 networks and 120000 interactions across close to 7000 taxa, and represents what is, to the best of our knowledge, the most complete collection of species interactions available.
Here we ask if the current Mangal database is fit for global-scale synthesis research into ecological networks. A recent study by Cameron et al. (2019) suggests that food webs are unevenly documented globally, but focused on metadata as opposed to actual datasets. Here, we conclude that interactions over most of the planet’s surface are poorly described, despite an increasing amount of available data, due to temporal and spatial biases in data collection and digitization. In particular, Africa, South America, and most of Asia have very sparse coverage. This suggests that synthesis efforts on the worldwide structure or properties of ecological networks will be weaker within these areas. To improve this situation, we should digitize available network information and prioritize sampling towards data-poor locations.
Global trends in ecological networks description
Network coverage is accelerating but spatially aggregated

The earliest recorded ecological networks date back to the late nineteenth century, with a strong increase in the rate of collection around the 1980s (fig. 1). Although the volume of available networks has increased over time, the sampling of these networks in space has been uneven. In fig. 2, we show that globally, network collection is biased towards the Northern hemisphere, and that different types of interactions have been sampled in different places. As such, it is very difficult to find a spatial area of sufficiently large size in which we have networks of predation, parasitism, and mutualism. The inter-tropical zone is particularly data-poor, either because data producers from the global South correctly perceive massive re-use of their data by Western world scientists as a form of scientific neo-colonialism (as advanced by Mauthner and Parry 2013), thereby providing a powerful incentive against their publication, or because ecological networks are subject to the same data deficit that is affecting all fields of ecology in the tropics (Collen et al. 2008). As Bruna (2010) identified almost ten years ago, improved data deposition requires an infrastructure to ensure they can be repurposed for future research, which we argue is provided by mangal.io for ecological interactions.

Network size did not increase over time
In fig. 3, we report the changes in the number of nodes (usually species, sometimes functional or trophic groupings) in ecological networks over time. Interestingly, even though the field of network ecology itself is growing (Borrett, Moody, and Edelmann 2014), the overwhelming majority of networks collected to date remain under a hundred species. This is most likely explained, not by the fact that ecological networks are necessarily small, but by the immense effort required to assemble these datasets (Jordano 2016b). Indeed, Jordano (2016a) emphasizes that the correct empirical description of ecological networks requires extensive field work in addition to a profound knowledge of the natural history of the system. These multiple constraints contribute to keeping network size small, and might not be indicative of low data quality.

Different interaction types have been studied in different biomes
Whittaker (1962) suggested that natural communities can be partitioned across biomes, largely defined as a function of their relative precipitation and temperature. For all networks for which the latitude and longitude were known, we extracted the values of temperature (BioClim1, yearly average) and precipitation (BioClim12, total annual) from the WorldClim 2 data at a resolution of 10 arc minutes (Fick and Hijmans 2017). Using these, we can plot every network on the map of biomes drawn by Whittaker (1962) (note that because the frontiers between biomes are not based on any empirical or systematic process, they have been omitted from this analysis). In fig. 4, we show that even though networks capture the overall diversity of precipitation and temperature, types of networks have been studied in sub-sections of the biome space only. Specifically, parasitism networks have been studied in colder and drier climates; mutualism networks in wetter climates; predation networks display less of a bias. Interestingly, some combinations of temperature and precipitation that are abundant on Earth (darker shading) are not represented in our network dataset, which suggests that we lack knowledge of some widespread biomes.
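As an illustration of the extraction step, the following is a minimal sketch of how a sampling location can be mapped to a cell of a raster with a resolution of 10 arc minutes. It is not the code used for this manuscript: the function name `cell_index` is hypothetical, and the sketch assumes a global grid whose upper-left corner sits at 90°N, 180°W, with rows running from north to south and columns from west to east. The layout of the actual raster files should be checked before relying on it.

```python
import math


def cell_index(lat, lon, resolution_arcmin=10.0):
    """Return the (row, col) of the raster cell containing a point.

    Assumption: the grid covers the whole globe, starts at (90 N, 180 W),
    and has square cells of `resolution_arcmin` arc minutes.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    cells_per_degree = 60.0 / resolution_arcmin
    n_rows = int(round(180 * cells_per_degree))
    n_cols = int(round(360 * cells_per_degree))
    # Points lying exactly on the southern or eastern edge are assigned
    # to the last row or column rather than falling outside the grid.
    row = min(int(math.floor((90.0 - lat) * cells_per_degree)), n_rows - 1)
    col = min(int(math.floor((lon + 180.0) * cells_per_degree)), n_cols - 1)
    return row, col
```

Given a temperature layer and a precipitation layer stored as nested lists (or arrays) with this layout, the two values for a network would then be read at the returned row and column.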

To scale this analysis up to the 19 BioClim variables in Fick and Hijmans (2017), we extracted the position of every network in the bioclimatic space, scaled the variables so that they have a mean of 0 and unit variance, and conducted a principal component analysis on the scaled bioclimatic variables. In fig. 5, we projected the sampling locations in the resulting subspace formed by the first two principal components, which capture well over 75% of the total variance in the 19 bioclimatic variables. This ordination has a number of interesting properties. First, the different types of networks occupy different environmental combinations, which largely matches the results of fig. 4. Second, the space is more sparsely sampled by networks that contain either mostly predatory or mostly mutualistic interactions: although they do cover a larger part of the space, the distance between them is much greater than for parasitism networks.
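The scaling and ordination steps can be sketched as follows. This is an illustrative example under stated assumptions, not the pipeline used for the manuscript: the function names are hypothetical, the variance uses the n − 1 denominator, and only the first principal component is extracted (by power iteration) to keep the example short, whereas the analysis above uses the first two components. The sketch assumes that no variable is constant, since a constant variable cannot be scaled to unit variance.

```python
import math


def standardize(rows):
    """Scale every column to a mean of 0 and unit variance (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(rows):
    """Covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of `cov`, found by power iteration."""
    p = len(cov)
    # Arbitrary, non-uniform starting vector (an assumption of this sketch).
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cv[i] for i in range(p))
    return v, eigenvalue


def explained_share(eigenvalue, cov):
    """Share of the total variance carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Each row of the input would hold the bioclimatic values of one network; projecting a standardized row on the returned vector gives the coordinate of that network on the first axis.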

In fig. 6, we measure the Euclidean distance to the centroid of the space for every network. Mutualistic interactions tend to have values that are higher than predation, which are themselves mostly higher than parasitism. This suggests a potential global bias: as the growth of digitized ecological networks was largely driven by parasitic interactions (fig. 1), the environments in which they have been sampled have become over-represented.
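The distance measure described above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that the centroid is the mean position of all networks in the ordination space; if the centroid is defined differently (for example as the origin of the principal component space), only the `centroid` function would change.

```python
import math


def centroid(points):
    """Mean position of a set of points, coordinate by coordinate."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def distances_to_centroid(points):
    """Euclidean distance between every point and the centroid of all points."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]
```

Grouping the resulting distances by interaction type gives the kind of comparison reported in fig. 6.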

Some locations on Earth have no climate analog
In fig. 7, we represent the environmental distance between every pixel covered by BioClim data, and the three networks that were sampled in the closest environmental conditions (this amounts to a k-nearest-neighbors search with k = 3). In short, higher distances correspond to pixels on Earth for which no climate analog network exists, whereas the darker areas are well described. It should be noted that the three types of interactions studied here (mutualism, parasitism, predation) have regions with no analogs in different locations. In other words, it is not that we are systematically excluding some areas, but rather that some types of interactions are more studied in specific environments. This shows how the lack of global coverage identified in fig. 4, for example, can cascade up to the global scale. These maps serve as an interesting measure of the extent to which spatial predictions can be trusted: any extrapolation of network structure in an area devoid of analogs should be taken with much greater caution than an extrapolation in an area with many similar networks.
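The nearest-neighbor measure can be illustrated with the sketch below. It makes two assumptions that are not stated in the text: distances are Euclidean in the scaled bioclimatic space, and the distances to the three closest networks are summarized by their mean. The function name `analog_distance` is hypothetical.

```python
import math


def analog_distance(pixel, networks, k=3):
    """Mean Euclidean distance from `pixel` to its k closest networks.

    `pixel` and every element of `networks` are positions in the same
    (scaled) environmental space. Using the mean as the summary is an
    assumption of this sketch.
    """
    if len(networks) < k:
        raise ValueError("need at least k networks")
    nearest = sorted(math.dist(pixel, n) for n in networks)[:k]
    return sum(nearest) / k
```

Applied to every pixel, and separately for each interaction type, this yields one map per type; high values flag areas for which no sampled network comes from comparable environmental conditions.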

Conclusions
For what purpose are global ecological network data fit?
What can we achieve with our current knowledge of ecological networks? The overview presented here shows a large and detailed dataset, compiled from almost every major biome on Earth. It also displays our failure as a community to include some of the most threatened and valuable habitats in our work. Gaps in any dataset create uncertainty when making predictions or suggesting causal relationships. This uncertainty must be measured by users of these data, especially when predicting over the “gaps” in space or climate that we have identified. We are not making any explicit recommendations for synthesis workflows. Rather we argue that this needs to be a collective process, a collaboration between data collectors (who understand the deficiencies of these data) and data analysts (who understand the needs and assumptions of network methods).
One line of research that we feel can confidently be pursued lies in extrapolating the structure of ecological networks over gradients, not at the level of species and their interactions, but at that of the community. Mora et al. (2018) revealed that all food webs are built upon the same structural backbone, which is in part due to strong evolutionary constraints on the establishment of species interactions (Dalla Riva and Stouffer 2015); in other words, most networks are expected to be variations on a shared theme, and this greatly facilitates the task of predicting the overarching structure. Finally, this approach to prediction, which neglects the composition of networks, is justified by the observation that network structure tends to be maintained at very large spatial scales even in the presence of strong compositional turnover (Dallas and Poisot 2017; Kemp et al. 2017). In short, the invariance of some network properties allows examining how “ecological networks” change, as abstract objects, over time and space. One thing that the current state of the data does not always allow is to examine how a specific group of species (i.e. when taxonomic turnover becomes important) would react, in its interactions, to environmental gradients. This is an important research question, and we think that spatially replicated sampling of networks in the future would help with generating adequate data to address it in a synthetic way.
Can we predict the future of ecological networks under climate change?
Perhaps unsurprisingly, most of our knowledge on ecological networks is derived from data that were collected after the 1990s (fig. 1). This means that we have worryingly little information on ecological networks before the acceleration of the climate crisis, and therefore lack a robust baseline. Dalsgaard et al. (2013) provide strong evidence that the extant shape of ecological networks emerged in part in response to historical trends in climate change. The lack of reference data before the acceleration of the effects of climate change is of particular concern, as we may be deriving intuitions on ecological network structure and assembly rules from networks that are in the midst of important ecological disturbances. Although there is some research on the response of co-occurrence and indirect interactions to climate change (Araújo et al. 2011; Losapio and Schöb 2017), these are a far cry from actual direct interactions; similarly, the data on “paleo-foodwebs,” i.e. from deep evolutionary time (Muscente et al. 2018; Yeakel et al. 2014; Nenzén, Montoya, and Varela 2014) represent the effect of more progressive change, and may not adequately inform us about the future of ecological networks under severe climate change. However, though we lack baselines against which to measure the present, as a community we are in a position to provide one for the future. Climate change will continue to have important impacts on species distributions and interactions for at least the next century. The Mangal database provides a structure to organize and share network data, creating a baseline for future attempts to monitor and adapt to biodiversity change.
Possibly more concerning is the fact that the spatial distribution of sampled networks shows a clear bias towards the Western world, specifically Western Europe and the Atlantic coasts of the USA and Canada (fig. 2). This problem can be somewhat circumvented by working on networks sampled in places that are close analogs of those without direct information (almost all of Africa, most of South America, a large part of Asia). However, fig. 7 suggests that this approach will rapidly be limited: the diversity of bioclimatic combinations on Earth leaves us with some areas lacking suitable analogs. These regions are expected to bear the worst of the socio-economic (e.g. Indonesia) or ecological (e.g. polar regions) consequences of climate change. Cameron et al. (2019) reached a similar conclusion by focusing on food webs, and our analysis suggests that this worrying trend is, in fact, one that is shared by almost all types of interactions. All things considered, our current knowledge about the structure of ecological networks at the global scale leaves us under-prepared to predict their response to a warming world. From the limited available evidence, we can assume that ecosystem services supported by species interactions will be disrupted (Giannini et al. 2017), in part because the mismatch between interacting species will increase (Damien and Tougeron 2019) alongside the climatic debt accumulated within interactions (Devictor et al. 2012).
Active development and data contribution
This is an open-source project: all data and all code supporting this manuscript are available on the Mangal project GitHub organization, and the figures presented in this manuscript are themselves packaged as a self-contained analysis which can be run at any time. We hope that the success of this project will encourage similar efforts within other parts of the ecological community. We also hope that this project will encourage the recognition of the contribution that software creators make to ecological research.
One possible avenue for synthesis work, including the contribution of new data to Mangal, is the use of these published data to supplement and extend existing ecological network data. This “semi-private” ecological synthesis could begin with new data collected by authors – for example, a host-parasite network of lake fish in Africa, or a pollination network of hummingbirds in Brazil. Authors could then extend their analyses by including a comparison to analogous data made public in Mangal. Upon the publication of the research paper, the original data could be uploaded to Mangal. This enables the reproducibility of this particular published paper. Even more powerfully, it allows us to build a future of dynamic ecological analyses, wherein analyses are automatically re-done as more data get added. This would allow a sort of continuous assessment of proposed ecological relationships in network structure. This cycle of data discovery and reuse is an example of the Data Life Cycle (Michener 2015) and represents one way to practice ecological synthesis.
The idea of continuously updated analyses is very promising. Following the template laid out by White et al. (2019) and Yenni et al. (2019), it is feasible to update a series of canonical analyses any time the database grows, to produce a living, automated synthesis of ecological networks knowledge. To this end, the Mangal database has been integrated with EcologicalNetworks.jl (Poisot, Belisle, et al. 2019), which allows the development of flexible network analysis pipelines. One immediate target would be to borrow the methodology from Carlson et al. (2019), and provide an estimate of the sampling effort required to accurately describe combinations of interaction types and bioclimatic conditions at various places on Earth, to provide recommendations on sampling effort allocation. Tightening the integration between infrastructure, data, and models contributes to building the capacity of our field to bring about iterative near-term forecasting of ecological network structure (Dietze et al. 2018).
What problems would more data solve?
As the amount of empirical evidence grows, so too should our understanding of existing relationships between network properties, between network properties and space, and the interpretation to be drawn from them. But what information would the structure of the food web from a pond bring to our understanding of the plant-pollinator interactions around it? Or to a food web in another pond a few kilometers from here? In short, will we get a lot more insights by accumulating data? Before answering this question (in the affirmative), it matters to recognize that, as Hortal et al. (2015) pointed out, biotic interactions are a core part of biodiversity; the Eltonian shortfall, manifested in our lack of widespread data about them, is as much of an impediment to our mission as ecologists as the absence of data on phylogeny or species occurrence would be. As a conclusion to this article, we would like to frame the aggregation of data on species interaction networks in standardized databases as both a requirement justified by fundamental science, and as an opportunity to conduct novel experiments on the prediction of ecological networks. In fact, re-analysis of the raw food web data contained in mangal.io recently allowed MacDonald, Banville, and Poisot (2020) to develop a novel model of food web structure, which outperforms previous propositions for the relationship between species richness and link number.
First, we need to collect data on species interactions following their measurement in situ because there is mounting evidence that they cannot reliably be inferred from observing the two species in co-occurrence; this has been shown through experimental and modeling approaches (Barner et al. 2018; Thurman et al. 2019). A recent synthesis by Blanchet, Cazelles, and Gravel (2020) also reveals that the assumption that co-occurrence will inform our knowledge of species interactions is wholly unsupported by the corpus of ecological theories. With the mounting amount of information on species distribution, and initiatives like GBIF storing over a billion occurrence records, inferring interactions this way was tempting; sadly, it appears unfeasible, leaving the curation of interaction data as the justifiable decision moving forward.
Second, we should collect data on species interactions following their measurement in situ, because this will enable the development of a new generation of general models. Initial guidelines by Morales-Castilla et al. (2015) have led to an increase in the development and application of forecasting methods (reviewed in the introduction of this manuscript), and it is now clear that coupling data on species interactions, occurrences, traits (Schleuning et al. 2020), and phylogeny is going to lead to powerful predictive models of community structure. While knowing the structure of the food web of two ponds a few kilometers apart is not going to qualitatively change our understanding of food webs as a whole, the accumulation of data about different interactions in multiple environments will allow us to hunt for generalities, and identify rules that govern the assemblage of ecological networks.
Third, we should focus on digitizing, or collecting, time series of network structure. Networks are known to vary over short (Trøjelsgaard and Olesen 2016), long (Burkle, Marlin, and Knight 2013), and very long (Nenzén, Montoya, and Varela 2014) periods of time, and having the ability to track changes of a network through time will provide important answers as to the suitability of a single, discrete sampling timepoint to serve as a reference state for the history of the entire network. This is of particular relevance as we now have both population time-series for various community assemblages (Dornelas et al. 2018), and the quantitative tools to analyse time-series of complex interactions (Ovaskainen et al. 2017). As of now, very few networks are proper temporal re-samplings of a single site, and this limits our ability to understand how networks change in nature.
In conclusion, by accumulating more data, we will increase the overlap between different databases (phylogeny, genetics, occurrences, functional traits), which will contribute to the unification of our knowledge of biodiversity, a task which is currently hampered by disconnectedness between data describing different aspects of community structure and composition (Poisot, Bruneau, et al. 2019). The work of predicting species interactions would be streamlined by both (i) establishing and using a standardized database for species interactions with contextual metadata, and (ii) ensuring the compatibility of this database with other sources, through the use of established species identifiers. The mangal data specification (and database) solves both issues, and we are confident that through sustained data deposition, it will contribute to our ability to predict the structure of ecological networks.
Data and code availability:
All code is available openly at https://github.com/PoisotLab/MangalSamplingStatus, and the data can be retrieved from mangal.io and the BioClim database using the specified files. Also, weekly updated pages presenting the analyses reported in this manuscript, including the data files, are available at https://poisotlab.github.io/MangalSamplingStatus/.