Deep Neural Networks Really Are the Right Way for a Biostatistician to Analyze Biological Data or: How I Learned to Stop Worrying and Love the DNN

September 16, 2024

12:00 pm to 1:00 pm

French Family Science Center 4233

Event sponsored by:

Computational Biology and Bioinformatics (CBB)

Biostatistics and Bioinformatics

Duke Center for Genomic and Computational Biology (GCB)

Precision Genomics Collaboratory

School of Medicine (SOM)

Contact:

Monica Franklin

Speaker:

David Page

I admit the title intentionally overstates the case. But many (most?) high-throughput biology datasets are based on aggregates, where aggregation occurs during either the experiment or data post-processing. As a result, any node in a graphical model of the data (e.g., Bayes net, dynamic Bayes net, Markov net, point process, or CRF) really is an aggregate of many idealized single-measurement nodes, so the real model can be viewed as a high-dimensional tree-structured graphical model. We prove that such models correspond to neural networks, and also that every neural network can be viewed as such a model. Based on this theoretical result, we discuss potential applications, including causal neural networks and the potential for a future "foundation model" for health. We also use examples from clinical data (such as EHRs) in addition to biological data.

CBB Monday Seminar Series