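- Correlation: measures the strength and direction of the relationship between two variables
- Ranges from -1 to +1: +1 is a perfect positive linear relationship, -1 a perfect negative one, and 0 means no linear relationship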
- Works well with linearly related data
- Does not work well with non-linear data: a strong curved relationship can still give a correlation near 0
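- Always visualize the data (e.g., with a scatter plot) to see the shape of the relationship
- Sometimes transforming the data (e.g., taking logarithms) makes a non-linear relationship linear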
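A minimal sketch of the Pearson correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), in plain Python. The function name `pearson_r` and the sample data are illustrative assumptions. It shows the ±1 extremes, a U-shaped relationship where r is 0 even though y is fully determined by x, and a log transform that turns an exponential relationship into a linear one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # perfect positive line -> 1.0
print(pearson_r(xs, [-3 * x for x in xs]))     # perfect negative line -> -1.0

# Non-linear: y is fully determined by x, yet r is 0
sym = [-2, -1, 0, 1, 2]
print(pearson_r(sym, [x ** 2 for x in sym]))   # -> 0.0

# Transformation: exponential growth is not linear, but its logarithm is
growth = [10 ** x for x in xs]
print(pearson_r(xs, growth))                           # noticeably below 1
print(pearson_r(xs, [math.log10(y) for y in growth]))  # 1 (up to rounding)
```

For the exponential data the raw r comes out at about 0.76, while the log-transformed data gives r = 1, which is why plotting first and transforming when needed matters.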
- Correlation does not imply causation. A correlation may look causal because of a confounding variable.
- Confounding variable (also known as a lurking variable): a third variable that influences both of the variables being compared
- Example: drinking coffee and lung cancer are highly correlated, but that does not
mean drinking coffee causes lung cancer. The third variable at play
is smoking: people who drink coffee are more likely to smoke,
and smoking, not coffee, is the main cause of lung cancer.
- A confounder also blurs the observed probabilities: e.g., P(lung cancer | coffee drinker) looks high only because many coffee drinkers smoke
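To make the coffee example concrete, here is a small sketch with entirely made-up probabilities (the numbers and names such as `p_cancer_given_coffee` are hypothetical, not real epidemiological data). In this model smoking influences both coffee drinking and lung cancer, while coffee has no effect on cancer at all. Exact fractions show the confounder at work: overall, coffee drinkers look much more likely to get cancer, but within each smoking group the probability is the same with or without coffee.

```python
from fractions import Fraction

# Hypothetical, made-up probabilities for illustration only.
P_SMOKER = Fraction(3, 10)
P_COFFEE_GIVEN_SMOKER = {True: Fraction(8, 10), False: Fraction(4, 10)}
P_CANCER_GIVEN_SMOKER = {True: Fraction(15, 100), False: Fraction(1, 100)}  # smoking only

def joint(smoker, coffee, cancer):
    """P(smoker, coffee, cancer) when smoking drives both coffee drinking and cancer."""
    p_s = P_SMOKER if smoker else 1 - P_SMOKER
    p_c = P_COFFEE_GIVEN_SMOKER[smoker]
    p_c = p_c if coffee else 1 - p_c
    p_k = P_CANCER_GIVEN_SMOKER[smoker]
    p_k = p_k if cancer else 1 - p_k
    return p_s * p_c * p_k

def p_cancer_given_coffee(coffee, smoker=None):
    """P(cancer | coffee); if smoker is given, compute it within that stratum only."""
    strata = [True, False] if smoker is None else [smoker]
    num = sum(joint(s, coffee, True) for s in strata)
    den = sum(joint(s, coffee, k) for s in strata for k in (True, False))
    return num / den

# Ignoring smoking: coffee drinkers appear about 2.7x more likely to get cancer
print(p_cancer_given_coffee(True), p_cancer_given_coffee(False))  # 97/1300 11/400

# Holding smoking fixed: the association disappears
for s in (True, False):
    print(s, p_cancer_given_coffee(True, s), p_cancer_given_coffee(False, s))
```

The loop prints `True 3/20 3/20` and `False 1/100 1/100`: once the confounder is held fixed (stratification), coffee makes no difference.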
- To determine whether a correlation reflects causation, we need a controlled experiment (e.g., A/B testing)
- Controlled experiment: randomly divide the participants into two groups, give the treatment to only one group, and compare the outcomes
- Gold standard for grouping (helps remove bias and confounding):
1. Pure random assignment
2. Placebo: the control group receives a treatment that has no effect, without knowing it
3. Double-blinding: a placebo is used, and the people giving the treatment also do not know who receives the placebo
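Pure random assignment can be sketched with Python's standard `random` module. The helper name `random_assignment` and the participant IDs are hypothetical; the seed is there only so the split can be reproduced. Shuffling and then cutting the list in half gives every participant the same chance of landing in either group, which spreads potential confounders evenly between the groups on average. Placebos and blinding are procedural and are not modeled here.

```python
import random

def random_assignment(participants, seed=None):
    """Randomly split participants into (treatment, control) groups of near-equal size."""
    rng = random.Random(seed)       # dedicated generator; a seed makes the split reproducible
    shuffled = list(participants)   # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical participant IDs
treatment, control = random_assignment(participants, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```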
- Study types:
1. Longitudinal: participants are observed over a period of time. Slow, but less prone to confounding, since each participant can be compared with themselves over time.
2. Cross-sectional: participants are observed at a single point in time (a snapshot). Fast, but more prone to confounding variables.
- Example: what is the effect of an advertisement on the number of products purchased?
- Treatment: the advertisement
- Treatment group: sees the advertisement
- Control group: does not see the advertisement
- Response: number of products purchased
- Note: make sure both groups have similar characteristics (age, gender, etc.), since any difference between them may act as a confounder
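The advertisement example can be sketched as a comparison of group averages. The records below are made-up illustrative numbers, and `group_mean` is a hypothetical helper. The estimated effect is the difference in mean purchases between the treatment and control groups, and the age comparison is a simple balance check reflecting the note that both groups should have similar characteristics.

```python
from statistics import mean

# Hypothetical, made-up records: group label, age, products purchased
records = [
    {"group": "treatment", "age": 34, "purchases": 3},
    {"group": "control", "age": 29, "purchases": 1},
    {"group": "treatment", "age": 41, "purchases": 2},
    {"group": "control", "age": 45, "purchases": 2},
    {"group": "treatment", "age": 27, "purchases": 4},
    {"group": "control", "age": 38, "purchases": 2},
    {"group": "treatment", "age": 50, "purchases": 3},
    {"group": "control", "age": 40, "purchases": 1},
]

def group_mean(records, group, field):
    """Mean of one numeric field over the records belonging to one group."""
    return mean(r[field] for r in records if r["group"] == group)

# Balance check: before treatment, the groups should look alike (here, on age)
age_gap = group_mean(records, "treatment", "age") - group_mean(records, "control", "age")

# Estimated effect of the advertisement: difference in mean response
effect = group_mean(records, "treatment", "purchases") - group_mean(records, "control", "purchases")

print(f"age gap: {age_gap}, estimated effect: {effect} products per person")
```

Here both groups have the same mean age, so age is unlikely to explain the difference in purchases. In a real study the group labels would come from random assignment, and a sample this small could easily show a difference by chance.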