Defense Date
7-26-2024
Graduation Date
Fall 12-20-2024
Availability
Immediate Access
Submission Type
thesis
Degree Name
MS
Department
Computational Mathematics
School
School of Science and Engineering
Committee Chair
Lauren Sugden
Committee Member
John Fleming
Keywords
hidden markov model, population genetics, positive selection, viterbi, python
Abstract
By Scott McCallum, August 2024
Identifying adaptive mutations in genetic data is challenging because such events are rare and because signatures of selection are intertwined with the footprints of other evolutionary forces that shape our genomes. Even when a larger region appears to be under selection, genomic sites linked to an adaptive mutation carry similar statistical signals and can therefore obscure the identification of the actual adaptive mutation. The new method described here uses a Hidden Markov Model that classifies genomic sites as neutral, linked, or sweep (adaptive mutation). The model is general and can be scaled to an arbitrary number of classes. Site-specific selection statistics computed from simulated genetic data are taken as input, and site probabilities and classifications are the resulting outputs. The Viterbi algorithm is used to identify the most likely path through all classes along the sequence. A stochastic backtrace method allows for the identification of multiple possible paths. These methods, in combination with enforcing sweep events, help to identify regions under selection and allow for better localization of adaptive mutations.
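The decoding steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes a three-state model (neutral, linked, sweep), a single selection statistic per site with Gaussian emissions, and made-up transition and initial probabilities. The function names (`viterbi`, `stochastic_backtrace`, `gaussian_log_emissions`) and all parameter values are hypothetical. The stochastic backtrace shown here is the standard forward-filtering, backward-sampling scheme, which may differ from the variant used in the thesis, and the enforcement of sweep events is not modeled.

```python
import math
import random

STATES = ["neutral", "linked", "sweep"]  # class labels taken from the abstract


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_emissions(stats, means, sd=1.0):
    """log P(statistic | state) under assumed (hypothetical) Gaussian emissions."""
    norm = math.log(sd * math.sqrt(2 * math.pi))
    return [[-0.5 * ((x - mu) / sd) ** 2 - norm for mu in means] for x in stats]


def viterbi(log_emit, log_trans, log_init):
    """Return the single most likely state path (list of state indices)."""
    n_sites, n_states = len(log_emit), len(log_init)
    score = [[-math.inf] * n_states for _ in range(n_sites)]
    back = [[0] * n_states for _ in range(n_sites)]
    for k in range(n_states):
        score[0][k] = log_init[k] + log_emit[0][k]
    for t in range(1, n_sites):
        for k in range(n_states):
            # Best predecessor state for landing in state k at site t.
            j = max(range(n_states), key=lambda s: score[t - 1][s] + log_trans[s][k])
            back[t][k] = j
            score[t][k] = score[t - 1][j] + log_trans[j][k] + log_emit[t][k]
    path = [max(range(n_states), key=lambda k: score[-1][k])]
    for t in range(n_sites - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def forward(log_emit, log_trans, log_init):
    """Forward log-probabilities f[t][k] = log P(obs[0..t], state_t = k)."""
    n_states = len(log_init)
    f = [[log_init[k] + log_emit[0][k] for k in range(n_states)]]
    for t in range(1, len(log_emit)):
        prev = f[-1]
        f.append([
            log_emit[t][k]
            + logsumexp([prev[j] + log_trans[j][k] for j in range(n_states)])
            for k in range(n_states)
        ])
    return f


def _draw(log_weights, rng):
    """Sample an index with probability proportional to exp(log_weights)."""
    total = logsumexp(log_weights)
    weights = [math.exp(w - total) for w in log_weights]
    return rng.choices(range(len(log_weights)), weights=weights)[0]


def stochastic_backtrace(log_emit, log_trans, log_init, rng):
    """Sample one state path from the posterior distribution over paths."""
    f = forward(log_emit, log_trans, log_init)
    n_states = len(log_init)
    path = [_draw(f[-1], rng)]
    for t in range(len(f) - 1, 0, -1):
        k = path[-1]
        # P(state_{t-1} = j | state_t = k, obs) is proportional to f[t-1][j] * trans[j][k].
        path.append(_draw([f[t - 1][j] + log_trans[j][k] for j in range(n_states)], rng))
    return path[::-1]


if __name__ == "__main__":
    # Illustrative inputs only: one made-up statistic per site.
    stats = [0.1, -0.3, 1.8, 2.2, 4.1, 2.0, 1.7, 0.2]
    log_emit = gaussian_log_emissions(stats, means=[0.0, 2.0, 4.0])
    trans = [[0.98, 0.019, 0.001], [0.05, 0.90, 0.05], [0.01, 0.98, 0.01]]
    log_trans = [[math.log(p) for p in row] for row in trans]
    log_init = [math.log(p) for p in (0.98, 0.019, 0.001)]
    print([STATES[k] for k in viterbi(log_emit, log_trans, log_init)])
    rng = random.Random(0)
    for _ in range(3):
        sampled = stochastic_backtrace(log_emit, log_trans, log_init, rng)
        print([STATES[k] for k in sampled])
```

Working in log space avoids numerical underflow on long sequences. Repeated calls to `stochastic_backtrace` return alternative paths in proportion to their posterior probability, which is one way to examine multiple plausible placements of a sweep site instead of relying on the single Viterbi path.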
Language
English
Recommended Citation
McCallum, S. (2024). Hidden Markov Model for Identifying Local Variants in Human Genomes using Simulated Data (Master's thesis, Duquesne University). Retrieved from https://dsc.duq.edu/etd/2290
Included in
Computational Biology Commons, Computational Engineering Commons, Evolution Commons, Genetics Commons, Population Biology Commons