Computational Analysis of Deletions in the Rhesus Macaque Genome by Recombination of Alu Elements

Isabella R. Childers and Miriam K. Konkel

Abstract

Alu elements are short, non-autonomous mobile elements that are specific to primate genomes. Alu recombination-mediated deletions (ARMDs) remove genomic sequence through recombination between Alu elements. In this project, two Old World monkey species, rhesus macaque and baboon, were compared to find ARMD events in the rhesus macaque genome. Because Alu mobilization is high in both species, we hypothesized that the ARMD rate may have increased as well. We wrote Python scripts to extract and join the flanking sequences upstream and downstream of rhesus macaque Alu elements, and then aligned the joined flanking sequences to the baboon genome using a local installation of BLAT. Putative hits from BLAT were divided into four categories: ambiguous, sequences with gaps, shared, and candidate ARMD. Candidate ARMD and ambiguous events were further examined by aligning the original rhesus macaque Alu element with the baboon sequence (the hit) found between the shared flanking regions. Of 1034 loci examined from a sample of an AluSx1 dataset, 99 contained gaps, 9 were classified as ambiguous, 905 were classified as shared, and 21 were classified as candidate ARMD events, of which at least 2 were identified as likely ARMD events.

Figure A. Alu element

    An Alu element consists of a left and a right monomer separated by an A-rich center, and it ends with an oligo(dA)-rich tail (Cordaux & Batzer, 2009). The left monomer includes an A box and a B box, which together serve as an internal RNA polymerase III promoter (Cordaux & Batzer, 2009). TSD stands for target site duplication, which results from an Alu insertion (Konkel et al., 2010).

Introduction

  • The Alu element, a primate-specific mobile element, is a non-long terminal repeat (non-LTR) retrotransposon that uses the machinery of its longer counterpart, L1, to copy and paste itself throughout the genome (Cordaux & Batzer, 2009). Mobilization of these elements is known to alter the genome through insertions (insertional mutagenesis), insertion-mediated deletions, and non-allelic recombination, among other mechanisms (Cordaux & Batzer, 2009). One such alteration is the Alu recombination-mediated deletion (ARMD) (Han et al., 2007; Sen et al., 2006).
  • Comparisons of the human and chimpanzee genomes have previously identified 492 human-specific and 663 chimpanzee-specific ARMD events (Han et al., 2007; Sen et al., 2006). These findings showed that ARMD events contribute to the genomic diversity between humans and chimpanzees.
  • In this project, ARMD events were identified in the rhesus macaque genome by comparing rhesus macaque with baboon. Previous studies have found high Alu mobilization in both genomes: 192,889 full-length AluY elements in baboon and 199,894 in rhesus macaque, compared to 112,768 in human (Rogers et al., 2019; Tang & Liang, 2019). We therefore hypothesized that, given the large number of Alu elements present, ARMD events may also be more frequent.

Figure B. ARMD Event

Materials and Methods
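
The flank extraction step described in the abstract (take the sequence upstream and downstream of a rhesus macaque Alu element and join the two pieces, leaving the Alu itself out) can be sketched as below. This is a minimal illustration, not the project's actual script: the function name `join_flanks`, the 0-based half-open coordinate convention, and the `flank_len` parameter are all assumptions, and the flank length used in the project is not stated here.

```python
def join_flanks(chrom_seq, alu_start, alu_end, flank_len):
    """Join the upstream and downstream flanks of an Alu element.

    Assumptions (hypothetical, not taken from the project scripts):
    - chrom_seq is the chromosome sequence as a plain string
    - alu_start/alu_end are 0-based, half-open coordinates (BED style)
    - flank_len is the number of bases taken on each side
    """
    # Clamp at the chromosome start so a negative index cannot wrap around.
    upstream_start = max(0, alu_start - flank_len)
    upstream = chrom_seq[upstream_start:alu_start]
    # Slicing past the chromosome end simply returns a shorter flank.
    downstream = chrom_seq[alu_end:alu_end + flank_len]
    # The Alu itself (alu_start..alu_end) is deliberately left out.
    return upstream + downstream


# Toy example: the "Alu" is the run of C's in the middle.
toy_chromosome = "AAAAACCCCCGGGGG"
joined = join_flanks(toy_chromosome, alu_start=5, alu_end=10, flank_len=3)
print(joined)
```

If the baboon genome carries the same Alu element, the joined query should align with a gap roughly the size of the Alu; if baboon carries additional sequence at that position, the alignment spans a longer stretch of the baboon genome.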

Results

  • One of the central goals of this project was to set up a pipeline to identify ARMD events. A key step was to separate the BLAT outputs by score and span, because these two values can give an initial indication of an ARMD event.
    • A score cutoff of 800 was chosen because a lower score would indicate that the flanking region was less likely to be shared between rhesus macaque and baboon, while a higher cutoff would miss ARMD events.
    • The span cutoff required the span to be greater than the score plus 350 base pairs, because a larger span could indicate genomic material that is present in the baboon but not in the rhesus macaque. The 350 base pairs represent the typical size of an Alu element.
  • A test dataset of AluSx1 elements was run through the pipeline, and 1034 loci were analyzed. Of these 1034 loci, 99 contained gaps, 9 were classified as ambiguous, 905 were classified as shared, and 21 were classified as candidate ARMD events.
    • Many shared loci looked like the figure used in the methods section. However, some loci had one or more baboon lineage-specific insertions (e.g., Alu or L1) between the shared flanking regions or within the flanking regions. This gives the alignment a large span, and the insertion itself goes undetected by the filter in the Python script.
  • After further analysis, 2 of the 21 candidate ARMD events were identified as likely ARMD events.
    • One candidate ARMD event was found on chromosome 1 of the baboon genome at approximately 210827924-210829545. In the rhesus macaque genome, the AluSx1 element was found on chromosome 1 at approximately 7147872-7148165. The two Alu elements involved in the recombination were both AluSx1.
    • The other candidate ARMD event was found on chromosome 1 of the baboon genome at approximately 140091964-140093812. In the rhesus macaque genome, the AluSx1 element was found on chromosome 1 at approximately 79248697-79249021. The two Alu elements involved in the recombination were AluSx3 and AluSq.
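
The score and span filter described above can be sketched as follows. The two cutoffs (800 and score plus 350 base pairs) come from the text; everything else is an assumption for illustration. In particular, how the project scripts obtained the score and the span is not stated, so this sketch assumes tab-separated PSL output from BLAT, a score computed with the formula UCSC documents for web BLAT DNA alignments, and a span equal to the target (baboon) end minus the target start. Treating a score of exactly 800 as passing is also an assumption. The function names are hypothetical.

```python
SCORE_CUTOFF = 800  # from the text: lower scores suggest flanks are not shared
ALU_LENGTH = 350    # from the text: typical size of an Alu element, in bp


def parse_psl_line(line):
    """Extract score and span from one tab-separated PSL alignment line.

    Assumption: score follows the web BLAT formula for DNA
    (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert),
    and span is the aligned extent on the target (baboon) genome.
    """
    fields = line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
    t_start, t_end = int(fields[15]), int(fields[16])
    score = matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert
    return {
        "query": fields[9],    # joined rhesus macaque flanks
        "target": fields[13],  # baboon chromosome or scaffold
        "score": score,
        "span": t_end - t_start,
    }


def passes_armd_filter(score, span):
    """True when a hit meets both cutoffs and merits closer inspection."""
    return score >= SCORE_CUTOFF and span > score + ALU_LENGTH


# Hypothetical PSL line: 950 matches, 10 mismatches, one insert on each side,
# aligned to baboon positions 1000-2600 (span 1600).
example_line = "\t".join([
    "950", "10", "0", "0", "1", "0", "1", "600", "+",
    "locus1", "1000", "0", "1000", "chr1", "200000", "1000", "2600",
    "2", "480,480,", "0,520,", "1000,2120,",
])
hit = parse_psl_line(example_line)
print(hit["score"], hit["span"], passes_armd_filter(hit["score"], hit["span"]))
```

A hit that passes this filter is only a candidate. As noted above, a baboon lineage-specific insertion also produces a large span, so each candidate still needs to be checked by aligning the rhesus macaque Alu element against the baboon sequence between the shared flanks.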

Conclusions

  • Overall, a pipeline was created to identify ARMD events in the rhesus macaque genome. The test dataset of AluSx1 elements provides evidence of ARMD events in the rhesus macaque genome, which suggests that more ARMD events have occurred across the genome. As more hits from the AluSx1 dataset, and later from other Alu subfamilies, are examined, these events may give a better understanding of the role of ARMD in the rhesus macaque genome. Han et al. (2007) and Sen et al. (2006) both noted that ARMD events could offset the genomic expansion caused by Alu insertion in the chimpanzee and human genomes. Since Alu mobilization is high in the rhesus macaque genome (Rogers et al., 2019), ARMD events can be predicted to counteract the genomic expansion caused by Alu insertion in the rhesus macaque genome as well.
    • In addition, Han et al. (2007) found more ARMD events involving two Alu elements from different Alu subfamilies than events involving two Alu elements from the same subfamily. In this project, both likely ARMD events involved Alu elements from the AluS family: one between two AluSx1 elements and one between an AluSx3 and an AluSq element. Now that the pipeline is established, we will perform a whole-genome analysis that includes close to full-length Alu elements from different subfamilies to determine whether the findings of Han et al. (2007) and Sen et al. (2006) also hold in the rhesus macaque.

References

Bhagwat, M., Young, L., & Robison, R. R. (2012). Using BLAT to find sequence similarity in closely related genomes. Current Protocols in Bioinformatics, Chapter 10, Unit10.8. https://doi.org/10.1002/0471250953.bi1008s37 

Cordaux, R., & Batzer, M. A. (2009). The impact of retrotransposons on human genome evolution. Nature Reviews Genetics, 10, 691-703. https://doi.org/10.1038/nrg2640

Han, K., Lee, J., Meyer, T. J., Wang, J., Sen, S. K., Srikanta, D., Liang, P., & Batzer, M. A. (2007). Alu recombination-mediated structural deletions in the chimpanzee genome. PLoS Genetics, 3(10), 1939-1949. https://doi.org/10.1371/journal.pgen.0030184

Karolchik, D., Hinrichs, A. S., Furey, T. S., Roskin, K. M., Sugnet, C. W., Haussler, D., & Kent, W. J. (2004). The UCSC table browser data retrieval tool. Nucleic Acids Research, 32(Database issue), D493-D496. https://doi.org/10.1093/nar/gkh103

Kent, W. J. (2002). BLAT—the BLAST-like alignment tool. Genome Research, 12(4), 656-664. https://doi.org/10.1101/gr.229202

Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., & Haussler, D. (2002). The human genome browser at UCSC. Genome Research, 12(6), 996-1006. https://doi.org/10.1101/gr.229102

Konkel, M. K., Walker, J. A., & Batzer, M. A. (2010). LINEs and SINEs of primate evolution. Evolutionary Anthropology, 19(6), 236-249. https://doi.org/10.1002/evan.20283

Rogers, J., Raveendran, M., Harris, R. A., Mailund, T., Leppälä, K., Athanasiadis, G., Schierup, M. H., Cheng, J., Munch, K., Walker, J. A., Konkel, M. K., Jordan, V., Steely, C. J., Beckstrom, T. O., Bergey, C., Burrell, A., Schrempf, D., Noll, A., Kothe, M., . . . Baboon Genome Analysis Consortium. (2019). The comparative genomics and complex population history of Papio baboons. Science Advances, 5(1), eaau6947. https://doi.org/10.1126/sciadv.aau6947

Sen, S. K., Han, K., Wang, J., Lee, J., Wang, H., Callinan, P. A., Dyer, M., Cordaux, R., Liang, P., & Batzer, M. A. (2006). Human genomic deletions mediated by recombination between Alu elements. The American Journal of Human Genetics, 79(1), 41-53. https://doi.org/10.1086/504600

Tang, W., & Liang, P. (2019). Comparative genomics analysis reveals high levels of differential retrotransposition among primates from the Hominidae and the Cercopithecidae families. Genome Biology and Evolution, 11(11), 3309-3325. https://doi.org/10.1093/gbe/evz234