Abstract
Studying microbial communities in extreme environments, such as the frigid and arid soils of the Antarctic Dry Valleys, offers a unique opportunity to understand how microbial life can persist under harsh environmental conditions. To develop a comprehensive view of these resilient ecosystems, the taxonomic and functional diversity of the viruses present in these communities must also be characterized. Viruses play important but poorly understood roles in microbial ecosystems, regulating microbial populations and nutrient cycling while also promoting horizontal gene transfer. Despite viruses' potential to influence the ecological structure of soil microbial communities, much remains unknown about their full significance because of the technical challenges of studying them in soil. To access information about the viral ecology of our microbial communities, we used metagenomics to identify the genetic material present in our samples and reconstruct viral populations, or viral operational taxonomic units (vOTUs): groups of clustered viral sequences that share a specified threshold of genetic similarity and can function as taxonomic units. We analyzed the metagenomes from a set of soil samples taken as a depth series from Taylor Valley and uncovered 23 different dsDNA vOTUs. By studying the sequences of these computationally reconstructed viral genomes, we can make predictions about how sampling depth affects viral diversity measures, which microbial hosts the viruses might be infecting, and community-level functional diversity.
Research Objectives & Questions
Questions:
- What is the taxonomic diversity present in our sample? In other words, what viral species are represented by our vOTUs?
- What is the alpha diversity of each sample? In other words, how many viral taxa are present at each sample depth, and how evenly are their relative abundances distributed?
- What is the beta diversity of the viral community? In other words, how does viral community composition differ from one sample depth to another?
- What microbial hosts could the viruses we identified be infecting? What information can we infer about the microbial community that lives in these soil samples, and do those predictions agree with the results of a microbial focused metagenomic study of our samples?
- What viral genes can be found in our viral genomes? How could these genes be significant for metabolism and viral propagation in our community?
- How could factors from the various stages of our metagenomics analysis bias our results? What are the limitations to our analysis? What is the quality of our initial data, and the level of confidence in our results?
Primary Objectives:
- Determine the alpha and beta diversity, community structure, and community membership of the viral community represented by our data.
- Construct a principal coordinates analysis (PCoA) plot to visualize beta diversity.
- Construct a comprehensive normalized relative abundance table showing the abundance of each vOTU at each depth. Use this table to create a heat map of the relative abundance of each vOTU at each depth, and box-and-whisker plots of Shannon diversity, relative abundances between samples, and Bray-Curtis dissimilarities.
- Perform functional/gene annotation on our viral genomes. Evaluate functional diversity within each sample and across samples, and the functional genomic potential of the identified viruses.
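As a rough illustration of the two diversity measures named in these objectives, the sketch below computes a Shannon index for one sample and a Bray-Curtis dissimilarity between two samples. It is a minimal example using only the Python standard library, not the project's actual analysis code; the function names and the abundance profiles for the two depths are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    proportions = [x / total for x in abundances if x > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("both samples must list the same vOTUs in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical relative abundances of three vOTUs at two depths (illustrative only).
shallow = [0.5, 0.5, 0.0]
deep = [0.25, 0.25, 0.5]

h_shallow = shannon_index(shallow)   # two equally abundant vOTUs -> ln(2)
bc = bray_curtis(shallow, deep)      # 0 = identical profiles, 1 = no shared vOTUs
```

A matrix of pairwise Bray-Curtis values computed this way is the usual input to a PCoA ordination.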
Secondary Objectives:
- Use statistical tests to assess how the viral ecology we observe might depend on differing environmental characteristics across the soil depths.
- Study the viral sequences for auxiliary metabolic genes (AMGs), host-derived genes carried in viral genomes, and connect identified viruses to possible host microbes, in order to form hypotheses about how the viruses in these samples may be interacting with their microbial hosts.
Background
Viruses are the most abundant biological entities on the planet and play a salient role in the lives of soil microbial communities. However, because of the immense challenges associated with isolating and identifying viruses from soil samples, we have only just begun to realize the viral diversity present in these ecosystems and their significance for microbial life. This is because soil is one of the most complex environments to study due to its highly heterogeneous nature, consisting of a variable mixture of inorganic matter and organic biomass, which makes it very difficult to isolate virions or viral DNA.
Metagenomics has proved to be an attractive method for accessing information about the ecological composition of soil microbial communities. Instead of requiring wet-lab work to isolate the physical microbes from a sample (losing many community members in the process), a metagenomics approach extracts all the DNA directly from an environment and uses next-generation whole genome sequencing (WGS) to capture a snapshot of the genetic information present in the sample. From that sequence data, one can then reconstruct the genomes of the individuals present in the sample. Soil environments in particular are extremely rich in microbial life, and because of metagenomics' ability to survey large and diverse populations, it has played a significant role in advancing the classification of the diverse virosphere. That said, biases can be introduced at every stage of the metagenomics workflow, so while metagenomic studies enable progress, their results carry considerable uncertainty (Trubl et al., 2020).
Methodology
This research builds on a project in the Johnson Lab at Georgetown that aims to study the relationship between taxonomic and functional diversity of soil microbial communities in the Antarctic Dry Valleys. We started with pre-assembled contiguous sequences (contigs) from the depth series samples and have been moving this data through a series of bioinformatics analyses and data processing steps. First, we used VirSorter2 and geNomad to identify viral sequences among the starting contigs. We then combined the results from those two programs using a clustering technique that removes redundant sequences, yielding our set of vOTUs. This dataset of vOTUs can then be functionally annotated and taxonomically classified. We also plan to associate the vOTUs with their possible microbial hosts.
To estimate the relative abundance of each vOTU, we need to align the original sequencing reads from each sample to the vOTU sequences in a process called read mapping (or read alignment). By doing this, we can determine how many times each vOTU is covered by the reads, which gives us an estimate of that virus's abundance in the sample. Once we know the relative abundance of each vOTU at each sample depth, we can then use a variety of R packages to do our ecological analyses.
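The step from mapped reads to relative abundance can be sketched as follows. This is a minimal, hypothetical example and not the pipeline's actual code: it assumes mapped-read counts per vOTU are already available for one sample, divides each count by vOTU length (so that longer genomes are not favored simply because they recruit more reads), and then rescales so the sample sums to 1. The function name, vOTU labels, counts, and lengths are all invented for illustration.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Length-normalize mapped-read counts, then scale so the sample sums to 1."""
    # Reads per kilobase of vOTU sequence.
    per_kb = {votu: read_counts[votu] / (lengths_bp[votu] / 1000.0)
              for votu in read_counts}
    total = sum(per_kb.values())
    if total == 0:
        # No reads mapped to any vOTU in this sample.
        return {votu: 0.0 for votu in per_kb}
    return {votu: value / total for votu, value in per_kb.items()}


# Hypothetical mapped-read counts and vOTU lengths for a single sample.
counts = {"vOTU_1": 300, "vOTU_2": 100, "vOTU_3": 0}
lengths = {"vOTU_1": 30000, "vOTU_2": 10000, "vOTU_3": 20000}

abundance = relative_abundance(counts, lengths)
```

In this toy sample vOTU_1 recruits three times as many reads as vOTU_2, but it is also three times longer, so the two end up with equal normalized abundance. Repeating the calculation for every depth gives the columns of a relative abundance table.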
References:
Trubl, G., Hyman, P., Roux, S., & Abedon, S. T. (2020). Coming-of-age characterization of soil viruses: a user’s guide to virus isolation, detection within metagenomes, and viromics. Soil Systems, 4(2), 23.