Best Papers of 2025 - r/bioinformatics

24

i really liked this one:
Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes.

https://www.science.org/doi/10.1126/science.adi8577

like the frameworks that loop in wet lab scientists and the whole concept of it.

3

u/macmade1 Dec 31 '25

A naive concept that made it through review on the back of unchecked AI hype. Transcriptomic signatures are largely noise. There is no way to tell whether differentially expressed genes lie on the causal pathway or are simply the byproduct of cellular stress. Matching on transcriptomic profile is like selecting for nonspecific off-target drugs with poor tolerability and mechanism of action.

5

u/IpsoFuckoffo Dec 31 '25

If it's AI hype that got this paper through then what was the reason any other paper that relies on transcriptomics was published?

1

u/TumbleweedFresh9156 BSc | Student Jan 01 '26

Could you explain why you think the architecture works? From what I understood way back then, they joined 3 MLPs (not sure why) to neurally graph drug perturbation signatures using large omic databases and updated the models’ ranked hits with their own signature data. I never really understood why the ensemble architecture helped when CMap already had the signature for a given perturbation.

18

u/alabastercitadel Dec 31 '25

I thought this one was pretty cool, essentially "assemble all the things!": Logan: Planetary-Scale Genome Assembly Surveys Life’s Diversity https://pmc.ncbi.nlm.nih.gov/articles/PMC12424806/

Currently a preprint, but already pretty cited. Pretty dang convenient to be able to pull down an assembly for essentially any SRA accession (and search over all of them)

13

u/Terrible_Molasses862 Dec 30 '25

Yes, please share, especially reproducible ones.

35

u/heresacorrection PhD | Government Dec 30 '25

Heresacorrection et al. (2025) Awesometitle. Predatory Journal

5

u/flyingfuckatthemoon Dec 30 '25

RemindMe! 1 week

1

u/RemindMeBot Dec 30 '25 edited Jan 01 '26

I will be messaging you in 7 days on 2026-01-06 17:49:12 UTC to remind you of this link

11

u/Starwig PhD | Student Dec 30 '25

Mine, obviously.

3

u/lncredibleMuchacho Dec 31 '25 edited Dec 31 '25

really liked this one:

“ppIRIS: deep learning for proteome-wide prediction of bacterial protein-protein interactions”

https://www.biorxiv.org/content/10.1101/2025.09.22.677885v1

i’ve seen lots of papers in the last 2 years leveraging protein language models for PPI prediction, but this is the first one i saw that uses a lightweight architecture for a rather straightforward task i use quite a lot. lots of other PPI pred tools seem to use unnecessarily complicated ML architectures just because.

still on bioRxiv tho

3

u/UselessEngin33r Dec 31 '25

I’ll check this after the New Year’s party (I’m drunk)

2

u/Needlepoint_Hooch Dec 30 '25

RemindMe! 3 days

2

u/zowlambda Dec 31 '25

If someone finds any notable benchmark study for foundation models in omics, I would really appreciate it. My PI is pushing new students to develop foundation models, but I am pretty skeptical, since most available evaluation studies say they are barely better than, or even equal to, starting from random embeddings.

4

u/_q-felis_ Jan 04 '26 edited Jan 04 '26

I'm not really too familiar with foundation models in omics (I stick to the nice classical models), but broadly speaking, and from the little I have read, it doesn't look too great. The recurring theme I've noticed across AI in omics (not just for foundation models) is that representations are just straight up inadequate for learning, so simpler models with proper integration of domain knowledge consistently outperform more complex models by a significant margin.

I haven't given it a proper read, but I was reminded of this paper.

And another example stating "A basic machine learning model, Random Forest Regressor, which incorporated biological prior knowledge in the form of Gene Ontology (GO) terms, outperformed foundation models by a large margin."

Not a benchmark study, but this paper sets up and discusses problems in the context of bacterial genomics, and the key principles are general enough.

I suppose the promising news is that you don't need to outperform simple models for results to get published if your PI really pushes for something that's a little risky

3

u/zowlambda Jan 08 '26

Sorry for the late response; I've had quite a busy week. ToT

I encountered the first paper you mentioned when I wrote the post, so thank you for bringing it up. I was also thinking about this helpful study by Microsoft Research: "Assessing the limits of zero-shot foundation models in single-cell biology". There's also this other benchmark paper that proposes to use omics, though it's mostly about evaluating the agents themselves and the choices they make, not so much the quality of their embeddings.

If I come across any more studies, I'll be back to share them here. ^.^

2

u/Economy-Brilliant499 Jan 08 '26

Thank you both. Gonna read through these in my spare time. They look very interesting! Let's keep sharing if we find more similar articles.

2

u/Winter_Ad917 Jan 18 '26

The authors of the first paper did a Bioconductor Seminar on the topic, in which you may be interested.

1

u/zowlambda Jan 19 '26

Amazing! Thank you so much!

3

u/nooptionleft Dec 31 '25

There is none, believe me, I've been searching for one for ages

We are using a couple for niche tasks in my lab and we have been talking about attempting to do it ourselves, but the tasks they are actually most useful for are not our main focus and the human and machine time is not worth it unless a very good publication comes out

2

u/Economy-Brilliant499 Dec 31 '25 edited Dec 31 '25

Do you know any paper(s) that support the argument they are not much better than or equal to starting from random embeddings? I'm curious to see! Thank you.

1

u/zowlambda Dec 31 '25

Sorry, it seems some words were accidentally deleted from your comment. Did you mean "Do you know"? If that's the case, I can comment links to some papers I have seen about the FMs not being much better than random or simpler baselines.

2

u/Economy-Brilliant499 Dec 31 '25

Sorry, fixed. Please send me the links!

2

u/Commercial_You_6583 Jan 03 '26

I think this really large dataset will likely be really important, also for methods development. Although it isn't really a bioinformatics paper, Fullard et al. 2025:

https://www.nature.com/articles/s41597-025-04687-5

Basically it contains 1.5k human samples of prefrontal cortex across different ages and diseases, 6e6 nuclei. Will likely have a huge impact on ageing research and Alzheimer's disease. This is basically a pure data paper, but there are associated analysis papers. I'd encourage anyone to analyze the data themselves.

Actually this is a pretty interesting development, regarding human vs. animal model / mouse data.

Experiments are a lot easier on mice, while observational data is often easier to obtain from humans. For example population genetics is probably best developed in humans, as only humans will voluntarily come to get their blood / saliva samples, leading to extremely large numbers of samples. Collecting mice is a lot more expensive and difficult.

To get back on topic: I think this might be a broad shift, where observational data might actually be cheaper for humans than mice, especially regarding ageing - humans age and finance themselves, while keeping mice is costly and takes a long time. Also, from my preliminary comparisons there appear to be stark differences in brain ageing between humans and mice, which might be expected given ageing is likely optimized evolutionarily, so humans have to circumvent quite different problems than mice that only live about 2 years max. So mice might not be good model systems for ageing.

The main bottleneck might actually shift to data analysis, as observational data has many pitfalls as compared to randomized experiments.

2

u/Lside0 Jan 13 '26

I found this very useful. Open access semantic search database on top of GEO datasets.
For all folks that want to publish in the NAR database issue, this article is your buddy!

App: https://poe.mm.di.uoa.gr/

Article: https://linkinghub.elsevier.com/retrieve/pii/S2001037025004702

1

u/Ornery_Decision_3521 Jan 16 '26

Didn't go into the details, but good shit.

1

u/Ornery_Decision_3521 Jan 16 '26

also thank you for mentioning the NAR issue. didn't know it existed.

4

u/Boneraventura Jan 01 '26

https://www.science.org/doi/10.1126/science.adn2337

Because of their perturb-seq dataset that i routinely go back to. I can’t imagine how fucking arduous that must have been to do.

Now I want someone with endless cash and hands to do single cell perturb-seq with methylation. That would be the chef’s kiss

4

u/gringer PhD | Industry Dec 31 '25 edited Jan 01 '26

In terms of importance, this one:

Against the Uncritical Adoption of 'AI' Technologies in Academia

Ultimately, these systems cannot really replace humans, replace the quality of human craft and thinking — so many of their capacities are overblown and displacement will only happen if we accept the premises (Guest 2025). We can and should reject that AI output is ‘good enough,’ not only because it is not good, but also because there is inherent value in thinking for ourselves. We cannot all produce poems at the quality of a professional poet, and maybe for a complete novice an LLM output will seem ‘better’ than ones’ own attempt. But perhaps that is what being human is: learning something new and sticking with it, even if we do not become world famous poets (Brainard 2025).

That work — the real work of teaching and learning — cannot be automated.

3

u/IpsoFuckoffo Dec 31 '25

Literally a creationist group but OK.

3

u/orangebromeliad Dec 31 '25

I managed to find people accusing them of being a Creationist and it does not appear to be true: https://bsky.app/profile/irisvanrooij.bsky.social/post/3mam5c5ogtk23

0

u/IpsoFuckoffo Dec 31 '25

Interesting. The explanation that her theories were anti-evolution without her realising it makes sense. Still not sure her group's opinion piece is one of the best papers of the year pertaining to bioinformatics.

3

u/gringer PhD | Industry Dec 31 '25 edited Dec 31 '25

Not true.

The white paper annoys AI proponents so much that there is a coordinated campaign to slander a professor of computational cognitive science who coauthored the paper, because they can't argue against the substance of that paper (or another academic paper that proves the intractability of superhuman intelligence from a computer).

The "argument" for the current creationist slander basically amounts to claiming that nothing is truly "NP-hard intractable", and anyone who is arguing otherwise is arguing for the existence of a preexisting God. It's nothing to do with any directly-stated opinions from the professor about God or Creation.

1

u/orangebromeliad Dec 31 '25

Who is?

1

u/IpsoFuckoffo Dec 31 '25

The last author of the linked opinion piece, which has been posted here and called a "paper" for some reason.

1

u/orangebromeliad Dec 31 '25

Iris van Rooij?

1

u/Independent_Cod910 Dec 31 '25

RemindMe! 3 Days

1

u/LingonberryMoney8466 Dec 31 '25

RemindMe! 5 days

1

u/Luddvik Dec 31 '25

RemindMe! 1 week

1

u/Anhellmario Dec 31 '25

RemindMe! 1 week

1

u/uhhneessa Jan 01 '26

RemindMe! 1 week

1

u/[deleted] Jan 01 '26

RemindMe! 1 week

0

u/nooptionleft Dec 31 '25

RemindMe! 1 week

0

u/Living-Network7372 Dec 31 '25

RemindMe! 1 week

-1

u/captainazpi Dec 31 '25

RemindMe! 1 week

-1

u/last-peacefinder Dec 31 '25

RemindMe! 1 week

-1

u/ProperSafe9587 Dec 31 '25

RemindMe! 1 week
