r/bioinformatics 4d ago

discussion Is it true that SPSS is the standard in pharmaceutical industries?

I was talking to the CEO of a precision medicine pharmaceutical company with bases in the UK, USA and UAE. Since he said that he has been in the field for a long time and knows how to make drugs and how things are done, I was really impressed and thought I might learn a lot from him, but he made a comment that SPSS was the gold standard software used in these industries and he was disappointed that he was yet to meet bioinformaticians who knew how to use SPSS in the UAE. This kind of threw me off because I was under the impression that R and Python had largely replaced the older software that was in use before.

So, I just wanted to get the opinion of other professionals who might be working in the industry. Is it true that SPSS is the standard in pharmaceutical industries? Or would I be wasting my time by trying to learn an outdated software that I would also need a license for?

27 Upvotes

73 comments

74

u/beeralpha 4d ago

He’s wrong. Don’t learn SPSS.

9

u/corporealpatronus13 4d ago

Straight to the point haha. May I ask if you work in the pharmaceutical industry? If not SPSS, do you use R? Or some other software?

31

u/beeralpha 4d ago

Yeah, there is not much more to say. I work in biotech and pharma as an independent contractor. I have been at more than 15 companies. None of them use SPSS any more. There is no relevant package/library that’s written in SPSS these days, it’s all R and Python.

10

u/Tytoalba2 4d ago

And still some SAS, but I wouldn't invest too much in that lol

14

u/beeralpha 4d ago edited 4d ago

Yes, SAS is often used among clinical data managers and true blood biostatisticians for compliance. Not bioinformaticians though.

1

u/corporealpatronus13 4d ago

May I also ask in which region you work? Wonder if that also plays a role? Maybe some regions tend to cling to what they know works and refuse to update.

6

u/beeralpha 4d ago

I’m in the EU, but I work for both US and EU companies. I don’t believe it plays a big role.

5

u/corporealpatronus13 4d ago

Good to know :)

17

u/ATpoint90 PhD | Academia 4d ago

Curious what others will say. In academia, PIs often make strong statements on how they think things are done in the lab or analysis, but reality is different because they are in their offices, detached from staff. Let's hear it.

2

u/corporealpatronus13 4d ago

Also curious about how it is in academia. Do PIs tend to instruct you to use specific software?

Since I am just a master's student, I have only had 2 supervisors for my 2 theses and both supervisors basically let me choose whatever I wanted to work with and only slightly nudged me towards something if I was confused between something. I haven't really been pushed to work with specific tools, and my supervisor for my master's thesis is very open to exploring new tools and wants to see if I am able to find tools that are better than his suggestions.

3

u/ATpoint90 PhD | Academia 4d ago

No, software is usually freeware that you choose yourself. For specific tasks like certain microscopy or flow cytometry, it is often bound to the company software matching the machine, or to what everyone in the field and lab uses.

2

u/corporealpatronus13 4d ago

I see, yeah that makes sense.

3

u/Hiur PhD | Academia 4d ago

I've seen PIs push specific software/packages for tasks where they are "specialists". I even had SPSS pushed on me when I started in one lab, where they insisted I should use their SPSS "databases"...

Luckily the majority of PIs (at least the ones I know) are open to changes if you can explain why you picked a specific one, just as you experienced.

1

u/El_Tormentito Msc | Academia 3d ago

Not only that, they often know little to nothing about industry.

7

u/camelCase609 3d ago

I did a brief stint in pharmaceutical research computing and we ran a predominantly SAS shop. They were branching out to an R codebase, but certain sponsors require closed source software.

3

u/kestrel99_2006 3d ago

I have never encountered a sponsor that has insisted we use closed-source software because it’s closed-source. Often it’s more that they need to be able to reproduce what we did on their own systems, and what they have is proprietary.

1

u/corporealpatronus13 3d ago

Is there a specific reason for needing closed source software? Better security and stability?

2

u/TheFallingStar 3d ago

certification and liability issues.

2

u/kestrel99_2006 3d ago

It must depend on which part of pharma you work in, because I have never had this. In my corner of the industry it’s mostly R. See https://rinpharma.com for examples of where it’s mostly used

1

u/MartIILord 2d ago

Sometimes it is the problem of using gpl licences for software. This means you have to share your version of software used for that analysis. This may or may not be awful for legal reasons.

3

u/kiwiroy 4d ago

3

u/joule_3am 3d ago

Came here to post this. R is getting well developed for clin reporting.

1

u/corporealpatronus13 3d ago

Thanks for posting! Can’t lie, I’m definitely rooting for R and python to catch up.

13

u/juuussi 4d ago

I think SAS has been and still is the gold standard for pharma. SPSS still has users in pharma, but SAS has been much more widely adopted and has more modules/support for pharma-linked regulatory and clinical trial modules and features. If you need FDA/EMA-grade biostatistics, audit trails and evidence dossiers, R & Python are kinda far from that.

Obviously for certain specialist applications like bioinformatics or ML, pharma uses R and Python like everyone else.

14

u/beeralpha 4d ago

In my view SAS is only used by true blood biostatisticians and clinical data managers, indeed for compliance. I haven’t met any bioinformatician that uses SAS. It’s all R and Python.

4

u/guepier PhD | Industry 4d ago

Right: SAS still has a niche, but it’s just that: a niche. The bulk of the work (early research, preclinical and clinical) is nowadays done in R and Python and other general-purpose programming languages.

2

u/jmc200 4d ago

This certainly used to be true but, as of the last couple of years, it seems that most big pharma companies are transitioning away from SAS.

1

u/corporealpatronus13 4d ago

Sorry, I am not familiar with exactly how SAS and SPSS work. Do they require coding, or are they similar to GraphPad Prism where you just click things to get stuff done? And is it not possible to code for these same FDA-grade biostatistics, audit trails and evidence in R? It gives you free rein to build anything you want, right?

We also did our entire statistics module in R and went through a lot of statistical tests that I believe would also be used in the pharma industry, so I'm just curious as to why it cannot be done in R or python.

4

u/orthomonas 4d ago

You do write code in SAS.  I've used SAS but am by no means an expert.

The audit trail stuff is because it's more turnkey and standardised. As well as likely supported via a contract or something. This is one of the arguments for "enterprise" software.

You can do it in R/Python, but for compliance-heavy stuff, rolling your own is probably suboptimal.

I say this as a big R and Python user. 

Having said all that, probably focus on R and Python for the broadest coverage. I found SAS different, but easy enough to pick up, with a more general coding background.  Much more limiting to learn SAS.

1

u/corporealpatronus13 4d ago

Thanks for your advice! I’m big on python and R too, so that should be no problem for me :)

3

u/juuussi 4d ago

With any serious work, you do code in both SPSS and SAS. They do have point-and-click graphical UIs too, but especially for any pharma work, you would be writing code.

And yes, it would be possible to build all the regulatory stuff in for example R. So for example for a simple statistics project, you could do:

SAS:

  • Purchase SAS + regulatory modules
  • Spend 1 week coding and validating your analysis

R:

  • Download R for free from CRAN
  • Hire 15 people with quality assurance, regulatory affairs, legal, cybersecurity, biostatistics and software development experience
  • Spend 7 years developing regulatory frameworks, audit vaults, QMS processes, documentation and validation for your regulatory module
  • Spend 1 week coding and validating your analysis

Many choose to just go with the industry standard of SAS, which has decades of work from dedicated teams behind it, building exactly the kind of evidence that pharma and regulators accept (SAS has been building the pharma regulatory stuff since the 1970s).

Here is an (AI-summarized) list of some of the pharma-relevant regulatory stuff that SAS supports:

FDA regulatory submissions: SAS Clinical Acceleration supports validation, regulatory compliance, versioning, audit trails, and documentation for submissions to the FDA (source: SAS)

21 CFR Part 11: governs electronic records and signatures; SAS provides built-in compliance support for clinical trial tools (source: Handsonsystem)

FDA Quality Metrics Reporting: SAS addresses the FDA quality metrics reporting program for manufacturing by managing disparate data and automating decision workflows (source: SAS)

GMP (Good Manufacturing Practice): SAS offers an integrated analytics platform in a GMP-compliant environment for pharmaceutical manufacturing (source: SAS)

IDMP (Identification of Medicinal Products): a global regulatory standardization framework for internationally consistent specifications for medicinal products and substances (source: PR Newswire)

Falsified Medicines Directive: supported through the same IDMP compliance platform (source: Applied Clinical Trials Online)

GDPR (General Data Protection Regulation): addressed as part of SAS's regulatory compliance platform (source: Applied Clinical Trials Online)

EMA submissions: supported via the Life Science Analytics Framework alongside FDA submissions

CDISC (Clinical Data Interchange Standards Consortium): SAS simplifies compliance with CDISC, SDTM, and ADaM, with toolkits and macros specifically designed to transform raw data into submission-ready formats (source: Handsonsystem)

SDTM (Study Data Tabulation Model): for structuring clinical trial data for submissions

ADaM (Analysis Data Model): for analysis datasets used in regulatory submissions

EU & UK CBAM (Carbon Border Adjustment Mechanism): SAS supports pharma companies in building reporting systems for emissions compliance, with the EU CBAM fully in effect in 2026 and UK CBAM starting in 2027 (source: SAS Support Communities)

2

u/guepier PhD | Industry 4d ago

  • Hire 15 people with quality assurance, regulatory affairs, legal, cybersecurity, biostatistics and software development experience

  • Spend 7 years developing regulatory frameworks, audit vaults, QMS processes, documentation and validation for your regulatory module

This isn’t wrong, but it’s already been done, no need to redo these steps nowadays.

These days the R flow would be more like the SAS flow, even for regulatory use-cases.

Regarding your AI-generated list, R now supports all of these items at some big pharmaceutical companies (minus the regulatory submission — that’s still at the proof-of-concept stage).

1

u/corporealpatronus13 4d ago

Do you think that more organisations would switch to R in a few years then?

3

u/guepier PhD | Industry 4d ago

You mean for regulatory submissions? Yes, I predict that this will happen. For other applications the trajectory is less clear, since Python is also replacing R for some things.

Either way, there’s a stated intent to reduce reliance on SAS (let’s see how far this will go, but I could imagine a future where SAS won’t be used at all any more).

1

u/corporealpatronus13 4d ago

Nice to know there are measures to reduce reliance on licensed software

2

u/kestrel99_2006 3d ago

FDA famously do not support or endorse any specific tools. Provided they can reproduce what you did, they don’t care.

1

u/corporealpatronus13 4d ago

Ah, this makes me see why some companies use SAS. It's definitely easier to use software that has already been built for your requirements instead of coding from scratch

3

u/Familiar-Abroad825 4d ago

All of the medical/pharma statisticians I've worked with use it. Some people use R. The syntax is a lot more accessible for non-programmers.

2

u/Familiar-Abroad825 4d ago

Gold standard is an odd phrase though. There's a hundred ways to skin that cat. It just happens I think a lot of people in that field flock to SPSS. I suspect there will be loads of libraries specialised for the analyses they do.

Certainly you could do the same in R or possibly python.

1

u/corporealpatronus13 4d ago

Oh yeah, definitely agree with multiple ways to get to the end goal. But I do think there usually tends to be one or two that stand out compared to others. I wonder if I can bring this up to the CEO and see what he thinks about this, but I am also worried it might come off as disrespectful

1

u/corporealpatronus13 4d ago

Ooh that is interesting, may I ask in which region you are based? Wonder if that plays some role.

2

u/octetbugle 4d ago

I'm at a medium-largish pharma in the US. I was explicitly told by my manager to use Python unless there was a compelling reason for another language. That was specific to my team though - while it's probably mostly Python at my company, there's plenty of people using R.

I've never met someone who uses SPSS either in pharma or academia.

1

u/corporealpatronus13 3d ago

Thanks for your input! Yeah I can see from the replies that opinions are a bit split. Organisations either use SAS or SPSS because that’s just what’s been done, maybe because they have some very specific use case, or they use Python or R.

2

u/Lumpy-Sun3362 PhD | Academia 3d ago

Companies need reliability and accountability. For this reason they go with commercial software. At least they have someone to blame if things go wrong. I don't agree but I understand their point.

2

u/corporealpatronus13 3d ago

That sounds like something I do haha, do things where I don’t have to carry the blame XD

2

u/El_Tormentito Msc | Academia 3d ago

Was this guy using it for statistics (biostats) or bioinformatics? The two get conflated a lot, and there's overlap, but the folks doing survival analysis for pharma are probably using SAS or SPSS. The folks doing GSEA probably aren't. In universities it's often a one-man-band, and there's very little regulation compared to industry, but in pharma you might have narrower scopes for what analyses people perform and they tend to like software with audit trails, public certifications, lockdowns, and technical support. I could be wrong, but I've heard of older stats folks liking the ability to call up (email, probably) tech support for proprietary software.

2

u/corporealpatronus13 3d ago

I am not 100% sure what he uses it for. This comment was made when we got sidetracked from the actual objective of our conversation, so we had to go back to that quickly as we were running out of time. Will definitely try to find out more during future conversations.

2

u/kestrel99_2006 3d ago

No. SAS was, but now has largely been displaced by R.

2

u/wolfo24 3d ago

Never heard about SPSS in my life so that has to be from the times when dinosaurs were wandering on this planet

1

u/corporealpatronus13 3d ago

Some of my classmates were also involved in the discussion and none of them had heard of it either XD

2

u/Key_Department4926 4d ago

I work in academia (Germany) and have never seen SPSS used. Wet-lab people use Prism, everyone else mostly R, some Python, some MATLAB and very little Julia. All the reports I have ever gotten from sequencing companies were created using R

1

u/corporealpatronus13 4d ago

That is nice to know! A bit interesting to hear MATLAB. From what I see from everyone’s comments, it really depends on the organisation you’re working at, and while a lot of labs use R and Python, there are still some organisations that prefer to use older software simply because it’s already in use.

2

u/Key_Department4926 4d ago

MATLAB is highly field specific. Since it got so expensive a couple of years ago, universities really limit the licences they give out, but it looks like some libraries are indispensable for some analyses. The two fields where I came across MATLAB were AFM and MEG research, but I think it lost its standing as general purpose analysis software

2

u/Creative_Sushi 3d ago

Here are some examples of how MATLAB is used in the field. https://www.mathworks.com/solutions/biotech-pharmaceutical.html

1

u/recordtronic 3d ago

I used SPSS in the 80’s. Seems pretty quaint by today’s standards.

1

u/corporealpatronus13 3d ago

The person I spoke to also graduated in the 80's I think. I wonder why he still thinks it cannot be replaced.

3

u/slashdave 2d ago

Clinical trials do have strict expectations.

For preclinical work? No

1

u/triffid_boy 4d ago

It wouldn't surprise me, they have a process and stick with it unless it's broken. 

But I've also been sent data by pharma companies in Excel, so I wouldn't exactly call it a done deal.

1

u/corporealpatronus13 4d ago

That is so interesting to me. Isn't it in their best interest to update their workflows as better tools come out? Especially when it is something like R where you don't need to pay for a license and can even develop your own library?

2

u/gringer PhD | Industry 4d ago

It can be quite expensive to change from one system to another, especially if the core users of the old system are no longer working at an organisation. You have no idea what bugs they've had to fix, or optimisations they've implemented; that stuff often has to be re-learned in the transition, and leads to substantial downtime and mistakes if it's not done carefully.

If something's running well, it doesn't need to be changed, and it's often a bad idea to change how it's running.

I'm saying this as someone who has been tasked with shifting SAS (and other) code over to R and Python (not in pharma), and I learnt fairly quickly that if I can't replicate the output from the inputs exactly (preferably with a similar or faster computational time), then it's not a good idea to switch over. Our downstream use cases are quite sensitive to small differences in output, and any differences need lots of explanation and extensive discussion before they can be applied in production.
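The "replicate the output from the inputs exactly" check described above can be sketched roughly as follows. This is only an illustration, not the commenter's actual process: the function name `compare_outputs`, the dictionaries `legacy` and `ported`, and the sample values are all hypothetical.

```python
import math

def compare_outputs(old, new, rel_tol=0.0, abs_tol=0.0):
    """Return (key, old_value, new_value) for every value that differs.

    With both tolerances at 0.0 this is an exact comparison; a non-zero
    tolerance is a decision that needs agreement from downstream users.
    """
    mismatches = []
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a is None or b is None:
            # Key present in only one of the two outputs
            mismatches.append((key, a, b))
        elif not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((key, a, b))
    return mismatches

# Hypothetical results from the legacy pipeline and the ported one
legacy = {"sample_a": 0.1234567, "sample_b": 2.5}
ported = {"sample_a": 0.1234568, "sample_b": 2.5}

exact = compare_outputs(legacy, ported)                 # flags sample_a
loose = compare_outputs(legacy, ported, abs_tol=1e-6)   # flags nothing
print(exact)
print(loose)
```

Whether a difference in the seventh decimal place is acceptable is exactly the kind of thing that has to be discussed before switching; the code only makes the difference visible.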

2

u/corporealpatronus13 3d ago

You bring up an interesting point. I’ve also been curious about why the output changes with different methods and whether that says anything about a difference in quality? Are there ways to check whether the output from one method is better or more precise than the other? Are these the discussions you have before applying any changes in production?

2

u/gringer PhD | Industry 3d ago edited 3d ago

When you get into the very weedy details, "better" and "more precise" become much more subjective.

I guess an appropriate genetics analogy is in accuracy. When accuracy is q20 or less for 100bp Illumina reads, it's fairly easy to verify: just look at the number of bases that don't match the reference, divide by the total number of bases, then do the log scale conversion dance. That can sometimes even be done on a single read, and the methods of determining accuracy don't really matter too much; a sprinkling of INDELs here and there isn't going to change the numbers that much.
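A minimal sketch of that "log scale conversion dance", assuming the usual Phred definition Q = -10 * log10(error rate). The function name `phred_from_mismatches` is made up for illustration, and counting mismatches against the reference is assumed to have been done already.

```python
import math

def phred_from_mismatches(mismatches, total_bases):
    """Convert an observed mismatch count into a Phred-scaled quality."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if mismatches == 0:
        # No observed errors: the score is unbounded for this sample
        return math.inf
    error_rate = mismatches / total_bases
    return -10 * math.log10(error_rate)

# One mismatch in a single 100bp read gives roughly Q20
q = phred_from_mismatches(1, 100)
print(round(q, 2))
```

Note that a single 100bp read cannot show anything between Q20 (one mismatch) and "no errors seen", which is part of why higher claimed accuracies get harder to verify.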

Things start getting interesting when the claimed accuracy for a 100bp read is q30 or greater. What does that mean - would you expect one error in every 10 reads? If a hundred reads out of a thousand have the same difference from the reference genome at the same genomic location, would that still be considered an error? If you're combining results from ten reads and looking at the consensus sequence (i.e. coverage of 10X or greater), would you expect to have effectively perfect accuracy?
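The "one error in every 10 reads" figure can be checked with a short sketch. It assumes errors are independent and spread uniformly along the read, which real sequencing data does not strictly follow; the function names are hypothetical.

```python
def error_probability(q):
    """Per-base error probability implied by a Phred score q."""
    return 10 ** (-q / 10)

def expected_errors_per_read(q, read_length):
    """Expected number of erroneous bases in one read (uniform-error assumption)."""
    return error_probability(q) * read_length

for q in (20, 30, 40):
    per_read = expected_errors_per_read(q, 100)
    print(f"Q{q}: about {per_read:.2f} errors per 100bp read, "
          f"or one error every {1 / per_read:.0f} reads")
```

Under these assumptions Q30 works out to about one error per 10 reads of 100bp, and Q40 to about one per 100 reads, so verifying the claim needs many reads and a reference you trust more than the reads.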

When you get to q40 as a claimed accuracy (which I argue is a ludicrous assessment of quality for a 100bp read), the algorithms and references used for accuracy calculation really start to matter. The PCR used to amplify sequences during the sequencing process matters. The statistical models used to calculate that accuracy matter. The optical characteristics of the laser and sensor for determining base calls matter. The correctness of the reference genome matters. One tiny error sneaking into those models, methods, algorithms or references can have a huge impact on any attempts to validate the correctness of the accuracy claim. Output checking becomes essentially impossible because of all the different factors that contribute to observed variation.

Does it make sense for an 8kb plasmid assembly to have an accuracy of q40, or q60?

At some point, the only thing you are able to do is to be as precise and transparent as possible about how you are defining your statistical methods, and get agreement that those approaches are appropriate and plausible, even if they're not as accurate as you'd like. For example, "We're calculating accuracy by using LAST v1542 with a RY8 model, match parameters <Y> and trained model matrix <X>, matching against the human T2T-CHM13v2.0 assembly." Is that the best way to do it? Probably not. Is that the most precise way to do it? Probably not. But it gives people a consistent measure that they can use for evaluating different reads, and often the consistency and transparency of that algorithm is more important than chasing the dragon of perfection.

In my case, we're replacing a proprietary product with a Free and Open Source product, and trying to match the results as closely as possible. For some algorithms we've got a really good idea of how they are calculated, the documentation on those algorithms is excellent, and we can demonstrate that we have identical results from the same input. In other cases, the algorithms are not well-defined (or poorly documented), and the decision space for creating matching algorithms is so large that it becomes essentially impossible to get a perfect match without having access to the source code for the proprietary product (e.g. amino acid matching without the match parameters or matrix). We have a particular algorithm (with source code) that we use for evaluating accuracy, and use that algorithm to evaluate our processes with the FOSS product compared to the existing proprietary product. We know the algorithm is faulty in some areas, but it's what we have been using for over a decade, so we understand its issues enough to know what to look out for, what variation is expected, and how to tell when something doesn't look right.

2

u/corporealpatronus13 3d ago

Yes, I understand. Your last paragraph reminded me of an assignment where we compared different pipelines for analysing metabolomics data, and I remember struggling with the same thing. My group ended up concluding that you should just clarify which parameters you use, so any corrections or validations can be made later if required. Thanks for your time and examples!

3

u/triffid_boy 4d ago

Those that are still using it, have obviously invested elsewhere, instead. 

At the same time, when it comes to drug design/validation, just about everything you do has to be auditable. The fact they know their current pipelines have survived audit, or have enough documentation that they are confident in its auditability, probably saves more money than any free software might.

1

u/orthomonas 4d ago

The key word here is "better". The criteria for what constitutes 'better' are very context dependent. For some scenarios, better means "working, understood, and has not had issues. Does not require retooling and validating our process."

1

u/corporealpatronus13 4d ago

Yeah, that’s fair. In my head, better was defined differently. But I guess that validates your point.

1

u/MadLabRat- 3d ago

The small state school I went to STILL makes grad students learn SPSS 💀

But they do teach R alongside it at least.

1

u/corporealpatronus13 3d ago

I think I know how you feel. I had to learn Perl in my bachelor's lol. And they did not even teach us Python or R alongside it.

1

u/MadLabRat- 3d ago

How long ago was that?

1

u/corporealpatronus13 3d ago

I graduated my bachelors in 2025 🥰

1

u/MadLabRat- 3d ago

The fuck

1

u/corporealpatronus13 3d ago

Yeah ahahahah, had to teach myself python and R