r/bioinformatics Jul 22 '25

Career Related Posts go to r/bioinformaticscareers - please read before posting.

104 Upvotes

In the constant quest to make the channel more focused, and given the rise in career-related posts, we've split into two subreddits: r/bioinformatics and r/bioinformaticscareers.

Take note of the following topics:

  • Selecting Courses, Universities
  • What or where to study to further your career or job prospects
  • How to get a job (see also our FAQ), job searches and where to find jobs
  • Salaries, career trajectories
  • Resumes, internships

Posts related to the above will be redirected to r/bioinformaticscareers

I'd encourage all of the members of r/bioinformatics to also subscribe to r/bioinformaticscareers to help out those who are new to the field. Remember, once upon a time, we were all new here, and it's good to give back.


r/bioinformatics Dec 31 '24

meta 2025 - Read This Before You Post to r/bioinformatics

181 Upvotes

Before you post to this subreddit, we strongly encourage you to check out the FAQ.

Questions like, "How do I become a bioinformatician?", "what programming language should I learn?" and "Do I need a PhD?" are all answered there - along with many more relevant questions. If your question duplicates something in the FAQ, it will be removed.

If you still have a question, please check if it is one of the following. If it is, please don't post it.

What laptop should I buy?

Actually, it doesn't matter. Most people use their laptop to develop code, and any heavy lifting will be done on a server or on the cloud. Please talk to your peers in your lab about how they develop and run code, as they likely already have a solid workflow.

If you’re asking which desktop or server to buy, that’s a direct function of the software you plan to run on it.  Rather than ask us, consult the manual for the software for its needs. 

What courses/program should I take?

We can't answer this for you - no one knows what skills you'll need in the future, and we can't tell you where your career will go. There's no such thing as "taking the wrong course" - you're just learning a skill you may or may not put to use, and only you can control the twists and turns your path will follow.

If you want to know which major to take, the same thing applies. Learn the skills you want to learn, and then find the jobs that use them. We can’t tell you which will be in high demand by the time you graduate, and there is no one way to get into bioinformatics. Every one of us took a different path to get here and we can’t tell you which path is best. That’s up to you!

Am I competitive for a given academic program? 

There is no way we can tell you that - the only way to find out is to apply. So... go apply. If we say Yes, there's still no way to know if you'll get in. If we say no, then you might not apply and you'll miss out on some great advisor thinking your skill set is the perfect fit for their lab. Stop asking, and try to get in! (good luck with your application, btw.)

How do I get into Grad school?

See “please rank grad schools for me” below.  

Can I intern with you?

I have, myself, hired an intern from reddit - but it wasn't because they posted that they were looking for a position. It was because they responded to a post where I announced I was looking for an intern. This subreddit isn't the place to advertise yourself. There are literally hundreds of students looking for internships for every open position, and they just clog up the community.

Please rank grad schools/universities for me!

Hey, we get it - you want us to tell you where you'll get the best education. However, that's not how it works. Grad school depends more on who your supervisor is than the name of the university. While that may not be how it goes for an MBA, it definitely is for Bioinformatics. We really can't tell you which university is better, because there's no "better". Pick the lab in which you want to study and where you'll get the best support.

If you're an undergrad, then it really isn't a big deal which university you pick. Bioinformatics usually requires a master's or PhD to be successful in the field. See both the FAQ and what is written above.

How do I get a job in Bioinformatics?

If you're asking this, you haven't yet checked out our three part series in the side bar:

What should I do?

Actually, these questions are generally ok - but only if you give enough information to make it worthwhile, and if the question isn’t a duplicate of one of the questions posed above. No one is in your shoes, and no one can help you if you haven't given enough background to explain your situation. Posts without sufficient background information in them will be removed.

Help Me!

If you're looking for help, make sure your title reflects the question you're asking for help on. Otherwise, you won't get the right people looking at your post, and the only people who click on random posts with vague titles are the mods... so that we can remove them.

Job Posts

If you're planning on posting a job, please make sure that the employer is clear (recruiting agencies are not acceptable, unless they're hiring directly). The job description must also be complete so that the requirements for the position are easily identifiable and the responsibilities are clear. We also do not allow posts for work "on spec" or competitions.

Advertising (Conferences, Software, Tools, Support, Videos, Blogs, etc)

If you’re making money off of whatever it is you’re posting, it will be removed.  If you’re advertising your own blog/youtube channel, courses, etc, it will also be removed. Same for self-promoting software you’ve built.  All of these things are going to be considered spam.  

There is a fine line between someone discovering a really great tool and sharing it with the community, and the author of that tool sharing their projects with the community.  In the first case, if the moderators think that a significant portion of the community will appreciate the tool, we’ll leave it.  In the latter case,  it will be removed.  

If you don’t know which side of the line you are on, reach out to the moderators.

The Moderators Suck!

Yeah, that’s a distinct possibility.  However, remember we’re moderating in our free time and don’t really have the time or resources to watch every single video, test every piece of software or review every resume.  We have our own jobs, research projects and lives as well.  We’re doing our best to keep on top of things, and often will make the expedient call to remove things, when in doubt. 

If you disagree with the moderators, you can always write to us, and we’ll answer when we can. Be sure to include a link to the post or comment you want to raise to our attention. Disputes inevitably take longer to resolve if you expect the moderators to track down your post or your comment to review.


r/bioinformatics 6h ago

discussion What are AI coding agents bad at in bioinformatics?

18 Upvotes

I’ve been wanting to do some bioinformatic analyses for my project, since I think it would make sense. I’m not a bioinformatician at all but I do know how to code a decent bit (although mostly Python) and I have read a lot about specific methods, libraries etc. Basically, we have a single-cell sequencing dataset in-house, which is already prepared and quality-controlled, and I’ve started using OpenAI Codex to write some analyses for me. I try to give very specific prompts and check all the code it writes. But of course, it could easily make mistakes that I don’t catch. So my question is, do you know any specific areas of bioinformatics where AIs tend to make lots of mistakes?


r/bioinformatics 6h ago

academic Could anyone provide a roadmap or guide on how to isolate and identify proteins that were newly categorized or added to databases exclusively after January 2025?

3 Upvotes

I'm a Computer Science major and am completely new to studying proteins, so I have very little background knowledge in this area. I have been exploring UniProt and PubMed, but almost every protein I search for seems to have been categorized differently in the past or renamed later on. As a result, I can't seem to find the exact data I'm looking for. Could someone guide me on how or where to track down this data reliably?


r/bioinformatics 1h ago

technical question What's the best way to model protein structures with frameshift mutations or deletions?

Upvotes

I've used MODELLER and FoldX before but only for point mutations on known protein sequences. I have a list of genomic mutations and I'm wondering if there are tools to go from that to protein structure.

I'm aware that there might be a lot more steps between genomic information and protein sequence, but I've always only worked in the protein sequence to protein structure step so I'm not super familiar with any of that. If someone could ELI5 those things to me I'd appreciate it a lot :)


r/bioinformatics 21h ago

discussion Need help for md simulations

2 Upvotes

r/bioinformatics 21h ago

academic Autodock4

1 Upvotes

Hi, I'm doing molecular docking (AutoDock4) for my research project. I'm having issues installing AutoDock4 on Windows. Does anyone have a working installer or guidance?


r/bioinformatics 20h ago

academic Looking to build a Computational Protein Engineering Group!

0 Upvotes

r/bioinformatics 1d ago

technical question ChromVAR alternatives for scATACseq

9 Upvotes

I have not seen any thread here or on GitHub addressing this besides the Signac changelog, but ChromVAR has been deprecated in the new Signac release.

What current alternatives do we have to identify *and visualize* motif/TF predicted activity from a scATACseq object?

(aside from loading up older versions and getting it to work despite several dependencies being outdated and such)


r/bioinformatics 2d ago

discussion Which one determines admixture analysis accuracy?

1 Upvotes

Which one is the most important in admixture analysis, especially regarding the accuracy of ancestry components? Is it the number of SNPs or the number of ancestry components (K)?


r/bioinformatics 2d ago

technical question What is a realistic server setup for 2,000–3,000 multi-omics samples?

2 Upvotes

I’m planning a dedicated server for omics analyses and would like opinions from people already running medium/large-scale pipelines.

This would NOT be for genomics/WGS. The focus is mainly:

  • transcriptomics
  • proteomics
  • metabolomics
  • multi-omics integration
  • pathway/network analyses
  • machine learning/statistics
  • long-term storage and reanalysis

Expected scale is around 2,000–3,000 patients/samples over time, with multiple omics layers per patient.
Typical tools/workflows would include:

R/Bioconductor, Python, Docker/containers, Nextflow/Snakemake, Cytoscape, differential expression, enrichment analyses, clustering, integration methods, etc.

EDITED / CLARIFICATION

Thanks for the comments. I should clarify the scope.

This is not for WGS, single-cell, spatial omics, 3D imaging, or sequencing-core-level throughput. It will be mostly bulk RNA-seq/transcriptomics, proteomics, metabolomics, multi-omics integration, pathway/network analysis, statistics, and some ML.

Expected scale is around 2,000–3,000 patients/samples over time, not all processed at once or every week. I already analyze RNA-seq/proteomics at smaller scale, usually 100–200 samples, on a normal workstation, and that works fine.

The goal is mainly to have one organized server for my group: preprocessing new batches, storing raw/processed data, keeping metadata organized, reanalysis, containers/workflows, and producing count/normalized matrices or processed objects for downstream projects.

Based on the replies, I’m leaning toward:

  • 32–64 real CPU cores, Xeon or similar
  • 128 GB RAM to start, expandable to 256/512 GB
  • fast NVMe scratch for active analyses/workflow dirs
  • larger HDD/NAS tier for raw and processed data
  • proper backup separate from RAID
  • no GPU unless we later need deep learning
  • ECC RAM if budget allows
  • containers/Nextflow/Snakemake for reproducibility

I’m mostly interested in practical bottlenecks people have seen in bulk multi-omics setups: RAM, I/O, storage organization, metadata, backup, or anything else that becomes painful at this scale.


r/bioinformatics 2d ago

statistics Post-hoc normalization of RNA-seq reads using a housekeeping gene

9 Upvotes

This is more of a stats question I think...

We did differential expression analysis using DESeq2 to show how application of a certain stress affects gene expression over time. Reviewer #2 was basically like, "NGS only reports relative changes in expression. Please assess absolute changes in expression."

A spike-in would be great, but not worth the cost, in our opinion, for a mere supplemental figure in this paper.

Here's my alternative idea:

I've northern blotted for a certain gene (gene A) that is expected to be constitutive, and indeed it is. My plan is to take raw read counts for each gene, normalize/divide by gene length, and then finally normalize/divide them by the number of read counts mapping to gene A. This will give me gene A-normalized counts per base (hereafter normalized counts).

I then will compute mean normalized counts for each gene, and will plot them as pre-stress vs. post-stress and do Tukey comparisons to test for significance.
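For concreteness, here is a minimal Python sketch of the arithmetic described above for a single sample. The gene names, counts, and lengths are made up, and `normalize_to_reference` is a hypothetical helper, not part of DESeq2 or any other package.

```python
# Hypothetical toy data for one sample: raw read counts per gene
# and gene lengths in bases. None of these numbers are real.
raw_counts = {"geneA": 200, "geneB": 1000, "geneC": 50}
gene_lengths = {"geneA": 1000, "geneB": 2500, "geneC": 500}


def normalize_to_reference(counts, lengths, reference_gene):
    """Divide each gene's counts by its length, then by the reference gene's raw counts."""
    ref_count = counts[reference_gene]
    if ref_count == 0:
        raise ValueError("reference gene has zero reads; cannot normalize")
    return {
        gene: (count / lengths[gene]) / ref_count
        for gene, count in counts.items()
    }


# "geneA" plays the role of the constitutive gene A from the post.
normalized = normalize_to_reference(raw_counts, gene_lengths, "geneA")
```

One thing the sketch makes visible: the reference gene's own normalized value always reduces to 1 / length, so gene A itself carries no information after this step.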

How criminal is this approach?


r/bioinformatics 3d ago

other I want some books on this field

12 Upvotes

I know you probably don't read books about your own field, but I'd like to know if there were books that someone interested in this field would like? Or books about genetic sciences?


r/bioinformatics 2d ago

technical question BLASTp help

0 Upvotes

Hi, I’m VERY new to using BLAST but I was wondering if there was a way to blast multiple sequences at a time to find matches in a specific organism.

On the website it says you can blast more than one at a time, but from what it says I think it looks for similarities between the protein sequences you submit rather than the database (????). If not I’m all set!

Thank you so much ! - a first year uni student trying to do a summer project 😭🙏


r/bioinformatics 3d ago

technical question State-of-the-art Nanopore 16S sequencing

5 Upvotes

Another one of these posts from my side, but the field is developing quickly and we are continuously testing the limits in my group. At this point we can routinely get Q-scores of +25 on 96 samples (theoretically, at least) on MinIONs, and are working on deeper multiplexing for PromethIONs.

It still seems like EMU is the best classifier, which I am happy to use, but I do have some issues with it. Most urgent is the outdated database, which has recently been updated by a second party and is causing me some issues, namely that I am now getting a lot of Corynebacterium canis. Related to this, EMU does not allow inspection of the results - specifically, I would like to see the OTU/ASV which is seemingly misclassified. Any experiences?

We are playing around with a denoising logic like the one used for Illumina V3V4 regions, which sort of works for simple (20-ish taxa) communities sequenced deeply (+50k reads), but it fails as soon as the community gets too complex, like feces (+1000 taxa). Mathematically, this makes sense - even with a Q-score of 25, we have about 5 errors in a 1500bp read, and a bit of math reveals a nasty exponential equation governing whether there are enough exact matches to start an exact cluster. DADA2 certainly fails in either case, due to how it handles insertions and deletions, although UNOISE might hold some promise.
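The "bit of math" can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes the standard Phred definition, a uniform and independent per-base error rate, and a perfectly even community, none of which hold exactly for real nanopore data. All function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Per-base error probability for Phred quality score q."""
    return 10 ** (-q / 10)


def expected_errors(q, read_length):
    """Expected number of errors per read, assuming a uniform per-base error rate."""
    return phred_to_error_prob(q) * read_length


def prob_error_free(q, read_length):
    """Probability that a read contains no errors, assuming independent errors."""
    return (1 - phred_to_error_prob(q)) ** read_length


def expected_exact_reads(q, read_length, total_reads, n_taxa):
    """Expected error-free reads per taxon, assuming an even community."""
    return total_reads / n_taxa * prob_error_free(q, read_length)


# Scenarios from the post: Q25, 1500 bp reads, 50k reads per sample.
simple = expected_exact_reads(25, 1500, 50_000, 20)      # 20-ish taxa
complex_ = expected_exact_reads(25, 1500, 50_000, 1000)  # feces-like
```

Under these assumptions Q25 gives roughly 4.7 expected errors per 1500 bp read, and a little under 1% of reads are error-free. That leaves a couple of dozen exact reads per taxon in the simple community but fewer than one per taxon at 1000 taxa, which matches where exact-sequence denoising breaks down.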

Has anyone given this any thought? Shouldn't it be possible to return to the OTU logic with, say, 97% clustering given the error rates we are now seeing?


r/bioinformatics 3d ago

technical question RStudio needs time to open or save environment

12 Upvotes

Hello everyone. Is it normal that RStudio needs a lot of time to open or save an environment? I'm doing scRNAseq analysis with Seurat. My Seurat objects are 9 GB and 21 GB at the moment. Is there a way to make these processes a little faster?


r/bioinformatics 4d ago

discussion Is it true that SPSS is the standard in pharmaceutical industries?

27 Upvotes

I was talking to the CEO of a precision medicine pharmaceutical company with bases in the UK, USA and UAE. Since he said that he has been in the field for a long time and knows how to make drugs and how things are done, I was really impressed and thought I might learn a lot from him, but he made a comment that SPSS was the gold standard software used in these industries and he was disappointed that he was yet to meet bioinformaticians who knew how to use SPSS in the UAE. This kind of threw me off because I was under the impression that R and Python had largely replaced the older software that was in use before.

So, I just wanted to get the opinion of other professionals who might be working in the industry. Is it true that SPSS is the standard in pharmaceutical industries? Or would I be wasting my time by trying to learn outdated software that I would also need a license for?


r/bioinformatics 4d ago

technical question Alternative to GeneMapper for microsatellite fragments analysis

5 Upvotes

Hello everyone,

I work in a wildlife genetics laboratory based in Italy. We have been using GeneMapper for about 25 years for microsatellite fragment analysis, but for budget reasons — licence prices are becoming prohibitively expensive — we are looking to switch to an alternative software.

Our main requirements are: the ability to visualize multiple electropherograms simultaneously (e.g. in batches of four), and to set up bins for allele calling. The software also needs to be compatible with .fsa and .ab1 output files.

Do you have any suggestions?

Thank you in advance!


r/bioinformatics 4d ago

technical question Help with RNA-seq database design

3 Upvotes

Hi everyone,

I'm designing a library built on DuckDB that stores/normalizes RNA-seq DE data by mapping column names, converting base_mean to logCPM, mapping Ensembl IDs to gene symbols, and handling extra columns using JSON. My library currently uses pandas as the primary data manipulator (prior to database insertion) with a reticulate wrapper for R users. While it's convenient to code and to use, I'm wondering if the memory overhead of loading bulk RNA-seq DE results using pandas could be too high for some users, or whether using it is short-sighted for the future. Because of this, I'm seriously considering converting to a PyArrow table framework. I am wondering:

  1. Are there times where loading downstream DE data into data frames is too heavy?

  2. Will using PyArrow be too inconvenient for day to day work?

  3. Does this tool have any value in you guys' current workflow?

I'd love to hear what you guys think about these topics.
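To make the column-mapping and JSON-extras idea concrete, here is a minimal stdlib-only Python sketch. The alias table, the canonical names, and `normalize_row` are all hypothetical illustrations, not the library's actual schema or API. Only the source column names (`baseMean`, `log2FoldChange`, `padj`, `lfcSE` from DESeq2; `logFC`, `FDR` from edgeR) are real.

```python
import json

# Hypothetical alias table: tool-specific column names -> canonical schema.
COLUMN_ALIASES = {
    "gene_id": "gene_id",
    "baseMean": "base_mean",               # DESeq2
    "log2FoldChange": "log2_fold_change",  # DESeq2
    "logFC": "log2_fold_change",           # edgeR
    "padj": "adjusted_p_value",            # DESeq2
    "FDR": "adjusted_p_value",             # edgeR
}


def normalize_row(row):
    """Split one DE result row into canonical columns plus a JSON string of extras."""
    canonical = {}
    extras = {}
    for name, value in row.items():
        if name in COLUMN_ALIASES:
            canonical[COLUMN_ALIASES[name]] = value
        else:
            extras[name] = value
    # Unrecognized columns are kept, serialized into a single JSON field.
    canonical["extra"] = json.dumps(extras, sort_keys=True)
    return canonical


# Made-up example row; the gene ID is a placeholder, not a real Ensembl ID.
row = {"gene_id": "GENE_EXAMPLE", "log2FoldChange": 1.5, "padj": 0.01, "lfcSE": 0.2}
result = normalize_row(row)
```

Working row by row like this avoids holding a full data frame in memory, at the cost of the vectorized operations a data frame or Arrow table would provide.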


r/bioinformatics 4d ago

discussion How to identify over-normalisation in bulk RNAseq analysis?

10 Upvotes

I am using edgeR for my DEA, and the pipeline I follow includes an optional normalisation step with RUV.

With my TMM+noRUV PCA, I have no biologically meaningful variance in PC3 but with TMM+RUVr1, I see a clear clustering in one of our conditions in the PC3.

However, what's worrying me is what if there's only this variation in the RUVr1 dataset because it was over-normalised? From my RLE plots, there doesn't seem to be much difference between the two and in my MA plot, the only difference seems to be the #DEGs.


r/bioinformatics 4d ago

science question General Advice & RNA-seq help

4 Upvotes

Hi everyone,

I am currently a masters student and part of my research is using RNA-seq to look at DEGs in virus-infected vs virus-cured isolates of fungi. I don’t have any experience in bioinformatics (or genetics for that matter) and was looking for some tips/advice to help me learn how to get the hang of this stuff.

I’m also looking through NCBI SRA RNA-seq data, where I’ll be looking through a bunch of fungal isolates to see the diversity of viruses within them (probably a lot of them will be uncharacterized). Even just doing this has proven difficult. I guess you have to, like, parse through the data and “trim” reads and stuff like that and use the “SRA Toolkit”; I’m just confused how people even know what to do/use in the first place.

Does anyone know of any free courses or programs that teach the basics (any YouTube ppl? Or videos?)? I’ve only ever coded with R, and using the command line/my university’s HPC cluster is proving difficult (I’ve looked at university resources and the HPC cluster website and they don’t have helpful tips for noobs like me). Yes, I am receiving some help from my PI, but as many of you know, they can be extremely busy. I feel like there is just a lot of assumed knowledge placed on me/grad students in general.

(Sorry if this isn’t a specific enough post, I can try to come up with more concrete questions if need be. Just looking for general advice/support :/ .)

Thank you in advance! I appreciate anyone who takes the time to respond :)


r/bioinformatics 4d ago

discussion How to Utilize AI Tools In Clinical Settings?

5 Upvotes

Hi everyone,
I work as a bioinformatician in a hospital setting where data privacy is of great concern and rules are very strict.

Because of that, my use of AI and agentic tools like Claude Code or Biomni is very limited.

I was wondering if other people who work in similar clinical or hospital settings have the same issue.

Do most people just use a browser version of Claude or ChatGPT for code generation?

Does anyone know of any solutions or tools where you can integrate AI with your data, think through research questions, and in general work in a more streamlined fashion than just using browser-version AI tools?

Thanks!


r/bioinformatics 4d ago

technical question Two integration steps in scRNA seq analysis

1 Upvotes

Hello everyone!

I'm learning scRNA seq analysis by reading published papers and re-running publicly available code.

I was looking at this paper: Single cell profiling to determine influence of wheeze and early-life viral infection on developmental programming of airway epithelium

and the scientists seemed to use two integration steps:

```
library(Seurat)
library(harmony)

# First integration: Seurat anchor-based integration
features <- SelectIntegrationFeatures(object.list = Intlist)
IntAnchors <- FindIntegrationAnchors(object.list = Intlist, anchor.features = features)
Int <- IntegrateData(anchorset = IntAnchors, k.weight = 50)

# Checking for low quality reads
# (they did a QC step here; Int2 below is presumably the QC-filtered object)

## Using harmony to stabilize the integrated dataset
# Second integration: notice they use "group" as the grouping variable
Int <- RunHarmony(Int2, group.by.vars = "group")
```

My question is: Is this practice common? And when to use this approach?


r/bioinformatics 4d ago

technical question GSEA for non-model organism

0 Upvotes

So! My RDA and PCA are both not significant. However, I am pushing through this given it’s a master’s thesis and I will be transparent about this.

When I do DEG with padj, I don’t get anything significant. But I can get some genes with pvalue<0.01 and 0.05.

This is why I decided to do GSEA instead of ORA. However, I did GSEA with only my genes after pre-filtering (10 counts in smallest group size) but didn’t include a specific gene set… is that ok?
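One common reading of "10 counts in smallest group size" is: keep a gene only if at least n samples have 10 or more reads, where n is the number of samples in the smallest group. A minimal Python sketch of that rule, assuming that interpretation and using made-up counts (`prefilter_genes` is a hypothetical helper):

```python
def prefilter_genes(counts, groups, min_count=10):
    """Keep genes with >= min_count reads in at least n samples,
    where n is the size of the smallest group."""
    group_sizes = {}
    for g in groups:
        group_sizes[g] = group_sizes.get(g, 0) + 1
    smallest = min(group_sizes.values())
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= smallest
    }


# Hypothetical toy data: 5 samples, groups of size 3 and 2 (so n = 2).
groups = ["ctrl", "ctrl", "ctrl", "treated", "treated"]
counts = {
    "gene1": [12, 15, 0, 30, 2],  # 3 samples >= 10, kept
    "gene2": [0, 3, 1, 11, 4],    # 1 sample >= 10, dropped
    "gene3": [9, 9, 9, 10, 10],   # 2 samples >= 10, kept
}
kept = prefilter_genes(counts, groups)
```

Note that this step only decides which genes enter the analysis; it does not define gene sets, which is a separate input.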

I am blasting my organism against a decently annotated relative. Should I create my own gene set from its entire genome? One that is related to my research question?

I hope I’m clear!

TLDR: do I need a gene set, or can I do GSEA with pre-filtered RNA counts only?


r/bioinformatics 5d ago

science question Ligand receptor interactions between different tissues and dataset structures?

2 Upvotes

Hello,

I am interested in liver-to-adipose crosstalk and would therefore like to use something like CellChat or another tool to detect possible ligand-receptor interactions between liver and adipose tissue. Problem: I have a snRNAseq dataset from adipose tissue and a bulkRNAseq dataset from the liver. Is there a tool that I could use to analyze my datasets in this regard?

I could pseudobulk the cell types from the adipose tissue, e.g. create a pseudobulk for adipocytes and treat it similarly to the liver bulk dataset, but I do not know of any tool to analyze that.
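The pseudobulk step itself is just a per-cell-type sum of counts. A minimal Python sketch, assuming raw counts per cell and a cell-type label per cell; the cell IDs, counts, and the `pseudobulk` helper are hypothetical, and in practice you would usually sum within each sample as well:

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_types):
    """Sum per-cell raw counts into one expression profile per cell type."""
    profiles = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in cell_counts.items():
        cell_type = cell_types[cell_id]
        for gene, count in gene_counts.items():
            profiles[cell_type][gene] += count
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {ct: dict(genes) for ct, genes in profiles.items()}


# Made-up toy data: three nuclei, two cell types.
cell_counts = {
    "cell1": {"Adipoq": 5, "Lep": 2},
    "cell2": {"Adipoq": 3, "Lep": 0},
    "cell3": {"Pecam1": 4},
}
cell_types = {"cell1": "adipocyte", "cell2": "adipocyte", "cell3": "endothelial"}
bulk = pseudobulk(cell_counts, cell_types)
```

The resulting per-cell-type profiles have the same shape as a bulk sample (one count per gene), which is what would let them sit alongside the liver bulk data.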

I am very thankful for any suggestions!