KDD Papers

Estimation of recent ancestral origins of individuals on a large scale

Ross E Curtis (AncestryDNA); Ahna R Girshick (AncestryDNA)


The last ten years have seen exponential growth in direct-to-consumer genomics tests. One popular feature of these tests is the report of a distant ancestral inference profile: a breakdown of the regions of the world where the test-taker's ancestors may have lived. While current methods and products generally focus on the more distant past (e.g., thousands of years ago), we have recently demonstrated that network analysis tools such as community detection can identify more recent ancestry. However, running community detection on a large network with potentially millions of nodes is not feasible in a live production environment, where hundreds or thousands of new genotypes need to be processed every day. In this study, we describe a classification method that leverages network features to assign individuals to communities in a large network, where each community corresponds to a recent ancestral origin. We will be launching a version of this research as a new product feature at AncestryDNA.
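
To make the idea of assigning a new individual from network features concrete, the following is a minimal sketch and not the method described in the paper. It assumes that communities were already detected offline, that a new individual arrives with weighted edges to already-labeled individuals, and that the only feature is the share of edge weight going to each community. The function names, the `min_share` threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def community_features(edges, labels):
    """Return the share of a new individual's edge weight per community.

    edges:  dict of neighbor id -> edge weight (hypothetical similarity score)
    labels: dict of neighbor id -> community name for already-labeled nodes
    """
    totals = defaultdict(float)
    for neighbor, weight in edges.items():
        community = labels.get(neighbor)
        if community is not None:  # skip neighbors without a community label
            totals[community] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {c: w / grand_total for c, w in totals.items()}


def assign_community(edges, labels, min_share=0.5):
    """Pick the community with the largest share, or None if below min_share.

    This threshold rule is an illustrative stand-in for a trained classifier.
    """
    features = community_features(edges, labels)
    if not features:
        return None
    best = max(features, key=features.get)
    return best if features[best] >= min_share else None


# Toy data (hypothetical): three labeled individuals and one unlabeled ("d").
labels = {"a": "C1", "b": "C1", "c": "C2"}
new_edges = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 5.0}
print(assign_community(new_edges, labels))
```

Because each assignment only touches the new individual's own edges, the cost per genotype is independent of the total network size, which is the property that makes this kind of approach attractive for daily processing.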