Used to determine the relationship between two categorical variables.
It tests for association between the two variables and tells us how likely
the observed association is due to chance.
The test assumes (null hypothesis) that the variables are independent.
If the independence model does not fit the data, that is evidence that the variables are dependent.
Calculation steps:
- Build a contingency table of observed counts from the dataset.
- For each cell of the contingency table, compute (row total * column total) / grand total.
- This produces the table of expected counts.
- Calculate the chi-square statistic: SUM((observed - expected)^2 / expected).
- Determine the p-value for that chi-square statistic, using df = (rows - 1) * (columns - 1).
- If p < 0.05, the variables are not independent (reject the null).
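The steps above can be sketched in plain Python (the 2x2 table is hypothetical example data; for df = 1 the right-tail p-value can be computed with the standard identity P(chi2_1 > x) = erfc(sqrt(x/2))):

```python
import math

# Observed contingency table (hypothetical example): rows = variable A,
# columns = variable B
observed = [[30, 10],
            [20, 40]]

rows, cols = len(observed), len(observed[0])
row_totals = [sum(r) for r in observed]
col_totals = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]
grand_total = sum(row_totals)

# Expected count per cell: (row total * column total) / grand total
expected = [[row_totals[i] * col_totals[j] / grand_total
             for j in range(cols)] for i in range(rows)]

# Chi-square statistic: SUM((observed - expected)^2 / expected)
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(rows) for j in range(cols))

# Degrees of freedom = (rows - 1) * (columns - 1); here df = 1, so
# P(chi2_1 > x) = erfc(sqrt(x / 2)) gives the right-tail p-value
df = (rows - 1) * (cols - 1)
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.5f}")
```

Here the expected table is [[20, 20], [30, 30]], the statistic is about 16.67, and p is far below 0.05, so we would reject the null of independence for this example data.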
###########################################################################
- Chi-square test of independence
- Tests whether the differences in proportions across the categories of two variables are statistically significant
- Indicates whether the two categorical variables are statistically independent of each other
- Statistical independence: the proportion of successes in the response variable is the same across all categories of the explanatory variable
- p-value < significance level means the variables are not independent; instead, they are associated
- No direction, tail, or alternative argument, since the statistic is squared (always a right-tailed test)
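In practice the whole test is one call to scipy's `chi2_contingency` (the 2x2 table below is hypothetical example data, e.g. group vs. outcome):

```python
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = groups, columns = success/failure
observed = [[25, 25],   # group A
            [40, 10]]   # group B

# Returns the statistic, p-value, degrees of freedom, and expected table.
# Note there is no 'alternative' argument: the test is always right-tailed.
stat, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
if p < 0.05:
    print("Reject null: the variables are associated")
```

By default `chi2_contingency` applies Yates' continuity correction on 2x2 tables, so the statistic differs slightly from the raw hand calculation; pass `correction=False` to match it.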
- Chi-square goodness of fit
- Checks whether hypothesised proportions are a good fit for the sample proportions
- Used to compare the proportions of a categorical variable's distribution across different datasets
- If p-value < significance level, we can say that the hypothesised proportions are not a good fit for the sample proportions
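A goodness-of-fit sketch with scipy's `chisquare` (the die-roll counts and the fair-die hypothesis are hypothetical example data):

```python
from scipy.stats import chisquare

# Hypothetical observed counts for die faces 1..6
observed = [18, 22, 16, 25, 16, 23]
n = sum(observed)                         # 120 rolls in total

# Hypothesised proportions (a fair die) converted to expected counts
hypothesised = [1 / 6] * 6
expected = [prop * n for prop in hypothesised]   # 20 per face

stat, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
# Large p-value: no evidence the hypothesised proportions are a poor fit
```

For this example the statistic is 3.7 on 5 degrees of freedom and p is well above 0.05, so the fair-die proportions are not rejected.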