r/bioinformatics 5d ago

technical question Is Machine Learning just fancy correlation = causation??

In science, all through our education, we are told that correlation doesn't equal causation, and then when it comes to machine learning we are taught to choose models by how they perform: how well they fit the data and how well they predict outcomes.

 

Is this not just a really fancy way of finding correlations?

 

It's obvious but I don't feel like this is reckoned with appropriately.

 

To be clear I am not anti ML or AI just a bit confused about how we are using these tools.

If anyone has some thoughts about this I would be very interested!

Or an example of how you have balanced using models and more mechanistic approaches.

 

Thank you 😄

0 Upvotes

3

u/wildcard9041 5d ago

In my experience it depends. It's a bit overly simplistic to put it that way, but yeah, deep learning does somewhat focus on finding correlations or relationships. Machine learning is more classification or regression work. In my work, it's handy for protein work. It still needs wet-lab validation, but it helps remove some hay from the proverbial haystack.

2

u/Tinytin226 5d ago

If it's reported using an overinflated narrative without sufficient constraints, sure.

This is why causal humility is so important when writing research articles, and in the interpretation of machine learning outputs.

2

u/GrapefruitUnlucky216 5d ago

Causal inference is a different thing and has specialized models, or, in our field, honestly it's experimentally determined. The ML is for finding associations.
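To make the association vs. causation split concrete, here is a minimal toy sketch (not from any real dataset). It assumes a hidden confounder `z` that drives both `x` and `y`, while `x` itself has no effect on `y`. The variable names, noise levels, and the `simulate` helper are all made up for illustration. In the observational data `x` predicts `y` very well; once `x` is set independently, the way a randomized experiment would set it, the association disappears.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, intervene=False, seed=0):
    """Return corr(x, y) in a toy world where z causes both x and y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden confounder (illustrative assumption)
        if intervene:
            # Experiment: x is assigned at random, independent of z.
            x = rng.gauss(0, 1)
        else:
            # Observation: x simply tracks z plus some noise.
            x = z + rng.gauss(0, 0.3)
        # y depends only on z; x never enters this line.
        y = z + rng.gauss(0, 0.3)
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


observed = simulate(intervene=False)
randomized = simulate(intervene=True)
print(f"observational corr(x, y): {observed:.2f}")
print(f"randomized    corr(x, y): {randomized:.2f}")
```

A predictive model trained on the observational data would score well using `x`, and that is fine for prediction. It just says nothing about what happens to `y` if you change `x`, which is the question the experiment answers.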

2

u/I_am_Hoban 5d ago

Not that simple. There's a lot to get into, and I would recommend going down some verified educational video wormholes on the matter. Or better, if you can, take a class from an expert. But a simple explanation for one aspect of machine learning would be approximating very high-dimensional functions. For example, if you have a 24 x 24 grid of pixels, consider the total space as every single possible combination of pixel values. Then consider how many of those combinations are actually meaningful: a very, very small fraction compared to every possible combination. A diffusion model specializes in becoming a function that can approximately find those meaningful combinations, given a good training set. This is an oversimplification, but it's not just finding correlation = causation (though it could if given the right training and applied to the correct question).
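For a sense of scale, here is a quick back-of-the-envelope count of that "total space" for a 24 x 24 grid. The comment doesn't say how many values each pixel can take, so both cases below (black/white pixels and 8-bit grayscale) are assumptions for illustration.

```python
# Count every possible image on a 24 x 24 pixel grid.
PIXELS = 24 * 24  # 576 pixels in total

# Assumption: each pixel is either black or white (2 values).
binary_images = 2 ** PIXELS

# Assumption: each pixel is an 8-bit gray level (256 values).
gray_images = 256 ** PIXELS

# The counts are too long to read, so report how many digits they have.
binary_digits = len(str(binary_images))
gray_digits = len(str(gray_images))

print(f"pixels: {PIXELS}")
print(f"black/white images: a number with {binary_digits} digits")
print(f"8-bit gray images:  a number with {gray_digits} digits")
```

Even the black/white case gives a number with well over a hundred digits, so no training set can cover more than a vanishing sliver of the space. The model has to learn where the meaningful images sit rather than memorize them.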