Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, and computational protein structure prediction and design, as well as whole-cell models able to generate huge volumes of data. Although applying data-centric analyses in these contexts is appealing and increasingly simple, it comes with potential risks.
Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed 'rare', even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency.
Awareness and management of ethical issues in data science are becoming crucial skills for data scientists. Discussion of contemporary issues in collaborative and interdisciplinary spaces is an engaging way to allow data-science work to be influenced by those with expertise in sociological fields and so improve the ability of data scientists to think critically about the ethics of their work. However, opportunities to do so are limited.