Biology-first modelling: why?
When we build models, we could start with biology. Not with elegant equations or tidy abstractions, but with the messy, fascinating mechanisms that govern life. Like most biologists, I care about how things actually work: how genes move, how fungi spread, how interactions scale up. We might want to understand how multiple mechanisms interact, not just individually, but when they are part of a greater system. And then… we might see what emerges when they do.
I’ve long struggled to put this modelling strategy into words, and to explain why it is useful. But at the risk of sounding a little like everyone’s least favourite person in the world, I think we can call it a biology-first modelling approach. The idea is simple: before imposing mathematical or computational abstractions, the model is shaped by knowledge of biologically relevant mechanisms and structures. In my field of evolutionary microbiology, examples are the deletion and duplication of genes on structured genomes, local reproduction of bacterial cells whose resource uptake gives rise to emergent patterns, and chemical rules that result in complex microbial communities with metabolic cross-feeding. Instead of translating biology into the language of math, I start by asking what’s happening biologically, and then figure out how to capture that faithfully. And if I fail to capture it faithfully, I learn something else: I do not (yet) understand the system.
“What I cannot build, I do not understand.”
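To make the first example above a little more concrete, here is a minimal sketch of what "building it" can look like for gene deletion and duplication on a structured genome. It is an illustration only, not the code behind any particular model: the function names (`mutate_genome`, `simulate_lineage`), the per-gene rates, and the choice to place each duplicate right next to the original are all assumptions made for this sketch.

```python
import random


def mutate_genome(genome, rng, p_dup=0.05, p_del=0.05):
    """Return a mutated copy of an ordered genome (a list of gene labels).

    Each gene is independently deleted with probability p_del or duplicated
    in tandem with probability p_dup (illustrative rates, not measured ones).
    """
    new_genome = []
    for gene in genome:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the gene is lost from the genome
        new_genome.append(gene)
        if r < p_del + p_dup:
            new_genome.append(gene)  # duplication: copy sits next to the original
    return new_genome


def simulate_lineage(genome, generations, seed=0, **rates):
    """Follow a single lineage and record its genome in every generation."""
    rng = random.Random(seed)
    history = [list(genome)]
    for _ in range(generations):
        history.append(mutate_genome(history[-1], rng, **rates))
    return history


if __name__ == "__main__":
    ancestor = ["A", "B", "C", "D", "E"]
    history = simulate_lineage(ancestor, generations=50)
    print("ancestor:  ", "".join(history[0]))
    print("descendant:", "".join(history[-1]))
```

Because the genome is kept as an explicit, ordered list rather than a single number such as "genome size", gene order and copy number remain available as model output, and can be inspected in much the same way one would inspect real sequence data.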
I’m not opposed to mathematics. Far from it: math is essential in every domain of science. But I don’t want it to dictate the biology. Adapting biological systems to fit elegant or well-known mathematical structures (e.g., Lotka-Volterra, graphs, classic ODEs), resulting in clean and tractable concepts like equilibria, symmetry, and uniformity, can be a useful shortcut. But it can also obscure the very thing I’m interested in: the unexpected nature of life.
Artist’s impression of Tiktaalik, the intermediate form between fish and amphibians.
A biology-first model leaves room for surprise. It doesn’t force tidy outcomes. It allows complexity, feedback, noise. It invites the kinds of patterns you actually see in biological data: heterogeneity, rough edges, unexpected correlations. It can also generate biologically meaningful data in the form of DNA sequences, networks, and phylogenetic trees. Such models can be guided by bioinformatic data or, vice versa, make predictions about bioinformatic data. In that way, they may produce predictions about things that are already in the data. This is not very different from how Tiktaalik, the intermediate form between fish and amphibians, was not only predicted: we also knew exactly where to look for it.
To me, this approach isn’t about rejecting simplicity. Simplicity is one of the most satisfying outcomes in science, but it has to be earned. You can’t find it by forcing systems into neat shapes; you find it by paying close attention to nature’s mess. It’s easy to forget now, but the elegance of evolution by natural selection (Darwin and Wallace’s great insight) only emerged after years of grappling with the strange, muddy, and often seemingly contradictory details of biology. Their breakthrough wasn’t clean at the outset. It was carved out of complexity. They stayed close to the biology, even when the resulting mess made things harder to analyse. So if our goal is to understand living systems, we shouldn’t be afraid of a little mess. If we were to pave the jungle, we wouldn’t be studying the jungle. For that, we need to put on some boots.
Too long; didn’t read:
I think biology-first modelling deserves more attention. It’s not the only way, but it’s one of the few that can capture the unruly, generative, and surprising nature of life. If we want models that don’t just describe biology but reveal it, we need to start where life does: with the biology itself.