djones3@fredhutch.org Fred Hutchinson Cancer Center | Seattle, WA
I am a Staff Scientist at the Fred Hutchinson Cancer Center, where I focus on machine learning problems involving spatial transcriptomics. I have a Ph.D. in computer science from the University of Washington, where I worked on RNA-Seq statistics.
I write a lot of Rust, Julia, and Python, bake croissants, read history and literary fiction, pet cats, etc.
Fast and accurate cell segmentation for spatial transcriptomics that borrows methods from cell simulation.
An information-theoretic approach to quantifying the degree of spatial organization in spatial transcriptomics (or other spatial omics) data. The method computes a spatial information score for each gene to identify spatially varying expression patterns.
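To make the idea concrete, here is a minimal sketch in Rust of one way such a score could be computed. The assumptions are ours, not the method's: each cell's expression of a gene has already been discretized into `k` levels, and lists of nearby and distant cell pairs are given (for example, from a spatial neighbor graph). The score is the Jensen-Shannon divergence, in bits, between the empirical joint distribution of levels over nearby pairs and the same distribution over distant pairs. All function names are hypothetical, and the published estimator may differ in binning, smoothing, and normalization.

```rust
/// Kullback-Leibler divergence D(p || q) in bits.
/// Assumes q > 0 wherever p > 0 (true when q is the JS mixture of p).
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|&(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).log2())
        .sum()
}

/// Jensen-Shannon divergence in bits; bounded in [0, 1].
fn js_divergence(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q).map(|(a, b)| 0.5 * (a + b)).collect();
    0.5 * kl_divergence(p, &m) + 0.5 * kl_divergence(q, &m)
}

/// Hypothetical helper: empirical joint distribution of (level_i, level_j)
/// over a non-empty list of cell pairs, with `k` discrete expression levels.
fn pair_distribution(levels: &[usize], pairs: &[(usize, usize)], k: usize) -> Vec<f64> {
    let mut counts = vec![0.0; k * k];
    for &(i, j) in pairs {
        counts[levels[i] * k + levels[j]] += 1.0;
    }
    let total: f64 = counts.iter().sum();
    counts.iter().map(|c| c / total).collect()
}

/// Hypothetical score: JSD between the nearby-pair and distant-pair distributions.
fn spatial_information(
    levels: &[usize],
    near: &[(usize, usize)],
    far: &[(usize, usize)],
    k: usize,
) -> f64 {
    let p_near = pair_distribution(levels, near, k);
    let p_far = pair_distribution(levels, far, k);
    js_divergence(&p_near, &p_far)
}

fn main() {
    // Six cells on a line: the gene is "on" (1) in the left half, "off" (0) in the right.
    let levels = [1, 1, 1, 0, 0, 0];
    // Nearby pairs: adjacent cells. Distant pairs: cells three positions apart.
    let near: Vec<(usize, usize)> = (0..5).map(|i| (i, i + 1)).collect();
    let far: Vec<(usize, usize)> = (0..3).map(|i| (i, i + 3)).collect();
    let score = spatial_information(&levels, &near, &far, 2);
    println!("spatial information: {score:.3} bits"); // prints "spatial information: 0.610 bits"
}
```

A gene expressed in one contiguous region gives nearby pairs that mostly agree and distant pairs that often disagree, so the two distributions diverge and the score moves toward its upper bound of 1 bit; a spatially random pattern makes the two distributions similar and pushes the score toward 0.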
A very fast interval tree data structure for overlap queries of static integer intervals, with genomic intervals in mind. Implemented in Rust, COITrees uses a van Emde Boas layout for improved cache locality and features SIMD optimizations through AVX2 and Neon instructions when available.
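The core overlap query can be sketched in Rust with a simpler memory layout than COITrees uses. This is a hypothetical, minimal static augmented interval tree: intervals are sorted by start, and the sorted array itself is treated as an implicit balanced binary search tree (the midpoint of each subarray is its root), with each node storing the largest end coordinate in its subtree so that whole subtrees can be skipped. It deliberately omits the van Emde Boas layout and the AVX2/Neon SIMD paths described above, and the type and method names are illustrative, not the `coitrees` crate API.

```rust
/// A closed integer interval [first, last] with an attached value.
#[derive(Debug, Clone)]
struct Interval<T> {
    first: i32,
    last: i32,
    value: T,
}

/// Static augmented interval tree stored implicitly in an array sorted by `first`.
/// The root of subarray [lo, hi) is its midpoint; `max_last[mid]` holds the
/// largest `last` anywhere in that subarray.
struct StaticIntervalTree<T> {
    nodes: Vec<Interval<T>>,
    max_last: Vec<i32>,
}

impl<T> StaticIntervalTree<T> {
    fn new(mut nodes: Vec<Interval<T>>) -> Self {
        nodes.sort_by_key(|iv| iv.first);
        let mut max_last = vec![i32::MIN; nodes.len()];
        Self::build(&nodes, &mut max_last, 0, nodes.len());
        StaticIntervalTree { nodes, max_last }
    }

    // Fills `max_last` for subtree [lo, hi) and returns that subtree's maximum `last`.
    fn build(nodes: &[Interval<T>], max_last: &mut [i32], lo: usize, hi: usize) -> i32 {
        if lo >= hi {
            return i32::MIN;
        }
        let mid = lo + (hi - lo) / 2;
        let m = nodes[mid]
            .last
            .max(Self::build(nodes, max_last, lo, mid))
            .max(Self::build(nodes, max_last, mid + 1, hi));
        max_last[mid] = m;
        m
    }

    /// Calls `visit` on every interval overlapping the closed query [first, last].
    fn query<F: FnMut(&Interval<T>)>(&self, first: i32, last: i32, mut visit: F) {
        self.query_rec(0, self.nodes.len(), first, last, &mut visit);
    }

    fn query_rec<F: FnMut(&Interval<T>)>(&self, lo: usize, hi: usize, first: i32, last: i32, visit: &mut F) {
        if lo >= hi {
            return;
        }
        let mid = lo + (hi - lo) / 2;
        // Nothing in this subtree ends at or after the query start: prune it.
        if self.max_last[mid] < first {
            return;
        }
        self.query_rec(lo, mid, first, last, visit);
        let node = &self.nodes[mid];
        // Sorted by start: if this node starts after the query ends,
        // so does everything in its right subtree.
        if node.first > last {
            return;
        }
        if node.last >= first {
            visit(node);
        }
        self.query_rec(mid + 1, hi, first, last, visit);
    }
}

fn main() {
    let tree = StaticIntervalTree::new(vec![
        Interval { first: 10, last: 20, value: "a" },
        Interval { first: 15, last: 40, value: "b" },
        Interval { first: 50, last: 60, value: "c" },
        Interval { first: 5, last: 8, value: "d" },
    ]);
    let mut hits = Vec::new();
    tree.query(18, 52, |iv| hits.push(iv.value));
    hits.sort();
    println!("{hits:?}"); // prints ["a", "b", "c"]
}
```

This in-order layout keeps construction to a sort plus one linear pass, but a query jumps between distant parts of the array as it descends. A van Emde Boas layout, which stores each small subtree in a contiguous block of memory, is the kind of reordering COITrees uses to improve cache locality.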
A plotting and data visualization system for Julia based on the Grammar of Graphics. Gadfly creates publication-quality graphics with an intuitive, consistent interface and tight integration with DataFrames.jl. Features include interactive zooming and panning, SVG/PNG/PDF/PS output formats, and support for a wide range of plot types.
Single-cell spatial transcriptomics promises a highly detailed view of a cell's transcriptional state and microenvironment, yet inaccurate cell segmentation can render this data murky. We adopt methods from ab initio cell simulation to rapidly infer morphologically plausible cell boundaries that preserve cell type heterogeneity.
Read Paper
A key step in spatial transcriptomics analysis is identifying genes with spatially varying expression patterns. We adopt an information-theoretic perspective on this problem, equating the degree of spatial coherence with the Jensen-Shannon divergence between pairs of nearby cells and pairs of distant cells.
Read Paper
We propose a new method for approximating the likelihood function of a sparse mixture model for RNA-Seq analysis, using a technique called the Pólya tree transformation. This approximation retains most of the benefits of full probabilistic models at a fraction of the computational cost, leading to more accurate detection of differential transcript expression and transcript coexpression.
Read Paper
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. Using a novel de novo assembly algorithm with a probabilistic data structure that dramatically reduces memory requirements, we developed the first assembly-based compressor, reducing datasets to less than 15% of their original size.
Read Paper
We present a new method to measure and correct for sequence bias in RNA-Seq experiments using a simple graphical model. Our model does not rely on existing gene annotations, and model selection is performed automatically, making it applicable with few assumptions and effectively decreasing bias while increasing uniformity in read coverage.
Read Paper