STAM101 :: Lecture 11 :: Attributes
                  
				
Contingency table – 2 × 2 contingency table – Test for independence of attributes – Test for goodness of fit of Mendelian ratio
Test based on χ²-distribution
For attributes we cannot employ parametric tests such as the F and t tests; instead we apply the χ² test. It is used when we want to test whether a set of observed values agrees with the values expected on the basis of some theory or hypothesis. The χ² statistic provides a measure of agreement between such observed and expected frequencies.
Applications of the χ² test
The χ² test has a number of applications. It is used to
- Test the independence of attributes
- Test the goodness of fit
- Test the homogeneity of variances
- Test the homogeneity of correlation coefficients
- Test the equality of several proportions.

In genetics it is applied to detect linkage.
χ² test for goodness of fit
A very powerful test for the significance of the discrepancy between theory and experiment was given by Prof. Karl Pearson in 1900; it is known as the “chi-square test of goodness of fit”.
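If Oi (i = 1, 2, …, n) is a set of observed (experimental) frequencies and Ei (i = 1, 2, …, n) is the corresponding set of expected (theoretical or hypothetical) frequencies, then

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

follows a χ² distribution with n - 1 d.f. In the χ² test only the one-tailed (upper-tail) test is used: the null hypothesis is rejected when the calculated χ² exceeds the table value.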
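The formula above translates directly into a few lines of Python. This is a minimal sketch, assuming that no parameters of the theoretical distribution were estimated from the data (so d.f. = n - 1, as stated above); the function name `chi_square_gof` is our own, and the frequencies are those of Example 1 in this lecture.

```python
def chi_square_gof(observed, expected):
    """Return Pearson's chi-square statistic and its d.f. (n - 1).

    Assumes no parameters were estimated from the data.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O_i - E_i)^2 / E_i over the n classes
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Frequencies from Example 1 (yeast cells per haemocytometer square)
observed = [103, 143, 98, 42, 8, 6]
expected = [106, 141, 93, 41, 14, 5]
stat, df = chi_square_gof(observed, expected)
print(f"chi2 = {stat:.4f} with {df} d.f.")  # chi2 = 3.1779 with 5 d.f.
```

The calculated value is then compared with the table value of χ² for the same d.f.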
Example
In plant genetics, we may wish to test whether observed segregation ratios deviate significantly from the Mendelian ratios. In such situations we test the agreement between the observed and theoretical frequencies; such a test is called a test of goodness of fit.
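For a segregation ratio, the expected frequencies are the total count split in proportion to the ratio. The sketch below assumes a 9:3:3:1 dihybrid ratio and uses illustrative counts that are not from this lecture; the helper names are hypothetical, and 7.815 is the standard table value of χ² for 3 d.f. at the 5% level.

```python
def mendelian_expected(ratio, total):
    """Expected class frequencies for a segregation ratio, e.g. (9, 3, 3, 1)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square(observed, expected):
    """Pearson's chi-square: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative F2 counts for a dihybrid cross (not data from this lecture)
observed = [315, 108, 101, 32]
expected = mendelian_expected((9, 3, 3, 1), sum(observed))
stat = chi_square(observed, expected)
df = len(observed) - 1  # no parameters estimated from the data
TABLE_5PCT_3DF = 7.815  # table value of chi-square, 3 d.f., 5% level
print(f"chi2 = {stat:.3f}, d.f. = {df}, "
      f"{'accept' if stat < TABLE_5PCT_3DF else 'reject'} H0 at 5%")
```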
Conditions for the validity of the χ² test
The χ² test is an approximate test, valid for large values of n. For the validity of the χ² test of goodness of fit between theory and experiment, the following conditions must be satisfied.
1. The sample observations should be independent.
2. Constraints on the cell frequencies, if any, should be linear.
   Example: $\sum O_i = \sum E_i$.
3. N, the total frequency, should be reasonably large, say greater than 50.
4. No theoretical cell frequency should be less than 5. If any theoretical cell frequency is less than 5, then for the application of the χ² test it is pooled with the preceding or succeeding frequency so that the pooled frequency is at least 5, and the degrees of freedom are finally adjusted for those lost in pooling.
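Condition 4 can be applied mechanically. The sketch below is one possible rule, assuming classes are in their natural order: a small class is pooled with the succeeding one(s), and a small group left at the end is pooled with the preceding class. The function name and the example frequencies are hypothetical.

```python
def pool_small_classes(observed, expected, min_expected=5.0):
    """Pool adjacent classes so that no expected frequency is below min_expected.

    Classes are scanned in order; a small class is pooled with the succeeding
    one(s). A small group left over at the end is pooled with the preceding
    class. Returns the pooled observed and expected lists.
    """
    pooled_obs, pooled_exp = [], []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:  # this (pooled) class is large enough
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_o or acc_e:  # leftover small tail
        if pooled_exp:
            pooled_obs[-1] += acc_o
            pooled_exp[-1] += acc_e
        else:  # every class together is still small; nothing more to pool
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
    return pooled_obs, pooled_exp


# Hypothetical frequencies whose last two expected values are below 5
obs, exp = pool_small_classes([10, 20, 3, 1], [12, 18, 3.5, 0.5])
df = len(exp) - 1  # 4 classes became 2, so 2 d.f. were lost in pooling
print(obs, exp, df)  # [10.0, 24.0] [12.0, 22.0] 1
```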
Example 1
Yeast cells were counted in the squares of a haemocytometer. The observed frequencies of squares containing 0, 1, …, 5 cells are compared below with the theoretical (expected) frequencies. Does the experimental result support the theory?
No. of yeast cells in the square | Observed frequency | Expected frequency
0 | 103 | 106
1 | 143 | 141
2 | 98 | 93
3 | 42 | 41
4 | 8 | 14
5 | 6 | 5
Solution
H0: The experimental results support the theory.
H1: The experimental results do not support the theory.
Level of significance = 5%
Test statistic:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Oi | Ei | Oi - Ei | (Oi - Ei)² | (Oi - Ei)²/Ei
103 | 106 | -3 | 9 | 0.0849
143 | 141 | 2 | 4 | 0.0284
98 | 93 | 5 | 25 | 0.2688
42 | 41 | 1 | 1 | 0.0244
8 | 14 | -6 | 36 | 2.5714
6 | 5 | 1 | 1 | 0.2000
400 | 400 | | | 3.1779
$\chi^2 = 3.1779$
Table value of χ² for (6 - 1) = 5 d.f. at 5% level of significance = 11.070
Inference
$\chi^2 < \chi^2_{\text{tab}}$
We accept the null hypothesis,
i.e., there is good correspondence between theory and experiment.
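Instead of reading a table value, one can compute the upper-tail probability (p-value) of the calculated χ² and reject H0 when it falls below the chosen level. This is a sketch using only the Python standard library, assuming the d.f. is a positive integer so that closed-form expressions for the χ² upper tail apply; the function name is hypothetical, and with a statistics package installed a library routine would normally be used instead.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df d.f. >= x) for a positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{i=1}^{(df-1)/2} x^(i-1) / (1 * 3 * ... * (2i - 1))
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


p = chi2_upper_tail(3.1779, 5)  # Example 1
print(f"p = {p:.3f}; reject H0 at 5%: {p < 0.05}")
# Sanity check against the table value 11.070 (5 d.f., 5% level)
print(f"{chi2_upper_tail(11.070, 5):.3f}")  # about 0.050
```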
χ² test for independence of attributes
At times we may consider two characteristics or attributes simultaneously. Our interest is then to test the association between these two attributes.
Example: An entomologist may be interested in the effectiveness of different concentrations of a chemical in killing insects. The concentrations of the chemical form one attribute, and the state of the insects (killed or not killed) forms another. The result of this experiment can be arranged in the form of a contingency table. In general, one attribute may be divided into m classes A1, A2, …, Am and the other into n classes B1, B2, …, Bn. The contingency table will then have m × n cells and is termed an m × n contingency table.
B \ A | A1 | A2 | … | Aj | … | Am | Row total
B1 | O11 | O12 | … | O1j | … | O1m | r1
B2 | O21 | O22 | … | O2j | … | O2m | r2
⋮
Bi | Oi1 | Oi2 | … | Oij | … | Oim | ri
⋮
Bn | On1 | On2 | … | Onj | … | Onm | rn
Column total | c1 | c2 | … | cj | … | cm | N
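where the Oij are the observed frequencies.
The expected frequency corresponding to Oij is calculated as

$$E_{ij} = \frac{r_i \, c_j}{N}$$

and χ² is computed as

$$\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where
Oij – observed frequencies
Eij – expected frequencies
n = number of rows
m = number of columns
N = total number of observations
It can be verified that $\sum_{j} E_{ij} = r_i$ and $\sum_{i} E_{ij} = c_j$, so the expected frequencies have the same row and column totals as the observed frequencies.
This χ² follows a χ² distribution with (n - 1)(m - 1) d.f.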
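The steps above (marginal totals, expected frequencies E_ij = r_i c_j / N, the double sum and (n - 1)(m - 1) d.f.) can be sketched in plain Python as follows. The function name is our own, and the data are those of Example 2 in this lecture.

```python
def contingency_chi_square(table):
    """Chi-square test of independence for an n x m table of observed counts.

    `table` is a list of n rows, each with m counts. Returns the statistic,
    the degrees of freedom (n - 1)(m - 1) and the table of expected
    frequencies E_ij = r_i * c_j / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Example 2: rows = severe, moderate, mild; columns = blood groups O, A, B, AB
severity = [
    [51, 40, 10, 9],
    [105, 103, 25, 17],
    [384, 527, 125, 104],
]
stat, df, expected = contingency_chi_square(severity)
print(f"chi2 = {stat:.4f} with {df} d.f.")
print([[round(e, 1) for e in row] for row in expected])
```

The statistic comes out slightly larger than the hand-computed 12.2347 because the expected frequencies are not rounded to one decimal place; it stays below the table value 12.59 for 6 d.f. at 5%, so the conclusion is unchanged.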
2 × 2 contingency table
When the numbers of rows and columns are both equal to 2, the table is termed a 2 × 2 contingency table. It has the following form:
 | B1 | B2 | Row total
A1 | a | b | a + b = r1
A2 | c | d | c + d = r2
Column total | a + c = c1 | b + d = c2 | a + b + c + d = n
where a, b, c and d are the cell frequencies, c1 and c2 are the column totals, r1 and r2 are the row totals and n is the total number of observations.
For a 2 × 2 contingency table, χ² can be found directly using the shortcut formula

$$\chi^2 = \frac{n\,(ad - bc)^2}{r_1 \, r_2 \, c_1 \, c_2} = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$

The d.f. associated with χ² is (2 - 1)(2 - 1) = 1.
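A quick way to convince oneself that the shortcut is just an algebraic rearrangement of Σ(O - E)²/E is to compute both on the same table. This sketch uses the data of Example 3 in this lecture; both function names are our own.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for the 2 x 2 table [[a, b], [c, d]] (1 d.f.)."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def chi_square_from_expected(table):
    """General sum of (O - E)^2 / E with E = row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n
            total += (o - e) ** 2 / e
    return total


# Example 3: chemically treated (118 germinated, 22 not) vs untreated (120, 40)
shortcut = chi_square_2x2(118, 22, 120, 40)
general = chi_square_from_expected([[118, 22], [120, 40]])
print(f"shortcut = {shortcut:.4f}, general = {general:.4f}")
```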
Yates' correction for continuity
If any cell frequency is less than 5, we apply Yates' correction for continuity, which adjusts for approximating the discrete cell frequencies by the continuous χ² distribution. The correction is made by adding 0.5 to the smallest cell frequency and adjusting the other cell frequencies so that the row and column totals remain the same. Suppose the first cell frequency (a) is the one to be corrected; the contingency table then becomes:
 | B1 | B2 | Row total
A1 | a + 0.5 | b - 0.5 | a + b = r1
A2 | c - 0.5 | d + 0.5 | c + d = r2
Column total | a + c = c1 | b + d = c2 | n = a + b + c + d
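Then use the χ² statistic

$$\chi^2 = \frac{n\left[(a + 0.5)(d + 0.5) - (b - 0.5)(c - 0.5)\right]^2}{r_1 \, r_2 \, c_1 \, c_2}$$

The d.f. associated with χ² is (2 - 1)(2 - 1) = 1.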
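Textbooks and software usually write the Yates-corrected statistic directly in terms of the original cells as n(|ad - bc| - n/2)² / (r1 r2 c1 c2). The sketch below uses that form and checks it against the adjusted table of Example 4 in this lecture. Clamping the corrected difference at zero, so the correction cannot overshoot, is our own design choice, and the function names are hypothetical.

```python
def chi_square_2x2_yates(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]] (1 d.f.).

    Standard form: n * (|ad - bc| - n/2)^2 / (r1 * r2 * c1 * c2).
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    # Clamp at zero so the correction can never overshoot past independence
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / (r1 * r2 * c1 * c2)


def chi_square_2x2(a, b, c, d):
    """Uncorrected shortcut formula n(ad - bc)^2 / (r1 r2 c1 c2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Example 4: treated 16 set / 9 not set, control 4 set / 21 not set
corrected = chi_square_2x2_yates(16, 9, 4, 21)
# Same value from the adjusted table of Example 4 (0.5 added to the cell 4)
adjusted = chi_square_2x2(15.5, 9.5, 4.5, 20.5)
print(f"{corrected:.4f} {adjusted:.4f}")  # 10.0833 10.0833
```

Without the correction the shortcut formula gives 12.0 for the same table, so here the correction lowers the statistic by about 2.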
Example 2
The severity of a disease and blood group were studied in a research project. The findings are given in the following table, known as an m × n contingency table. Are the severity of the condition and blood group associated?
Severity of a disease classified by blood group in 1500 patients
Condition \ Blood group | O | A | B | AB | Total
Severe | 51 | 40 | 10 | 9 | 110
Moderate | 105 | 103 | 25 | 17 | 250
Mild | 384 | 527 | 125 | 104 | 1140
Total | 540 | 670 | 160 | 130 | 1500
Solution
H0: The severity of the disease is not associated with blood group.
H1: The severity of the disease is associated with blood group.
Calculation of expected frequencies, Eij = ri cj / N (for example, Severe and group O: 110 × 540 / 1500 = 39.6):
Condition \ Blood group | O | A | B | AB | Total
Severe | 39.6 | 49.1 | 11.7 | 9.5 | 110
Moderate | 90.0 | 111.7 | 26.7 | 21.7 | 250
Mild | 410.4 | 509.2 | 121.6 | 98.8 | 1140
Total | 540 | 670 | 160 | 130 | 1500
Test statistic:

$$\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

The d.f. associated with χ² is (3 - 1)(4 - 1) = 6.
Calculations
Oi | Ei | Oi - Ei | (Oi - Ei)² | (Oi - Ei)²/Ei
51 | 39.6 | 11.4 | 129.96 | 3.2818
40 | 49.1 | -9.1 | 82.81 | 1.6866
10 | 11.7 | -1.7 | 2.89 | 0.2470
9 | 9.5 | -0.5 | 0.25 | 0.0263
105 | 90.0 | 15 | 225.00 | 2.5000
103 | 111.7 | -8.7 | 75.69 | 0.6776
25 | 26.7 | -1.7 | 2.89 | 0.1082
17 | 21.7 | -4.7 | 22.09 | 1.0180
384 | 410.4 | -26.4 | 696.96 | 1.6982
527 | 509.2 | 17.8 | 316.84 | 0.6222
125 | 121.6 | 3.4 | 11.56 | 0.0951
104 | 98.8 | 5.2 | 27.04 | 0.2737
Total | | | | 12.2347
$\chi^2 = 12.2347$
Table value of χ² for 6 d.f. at 5% level of significance = 12.59
Inference
$\chi^2 < \chi^2_{\text{tab}}$
We accept the null hypothesis.
The severity of the disease has no association with blood group.
Example 3
To determine the possible effect of a chemical treatment on the rate of germination of cotton seeds, a pot culture experiment was conducted. The results are given below.
Chemical treatment and germination of cotton seeds
 | Germinated | Not germinated | Total
Chemically treated | 118 | 22 | 140
Untreated | 120 | 40 | 160
Total | 238 | 62 | 300
Does the chemical treatment improve the germination rate of cotton seeds?
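Solution
H0: The chemical treatment does not improve the germination rate of cotton seeds.
H1: The chemical treatment improves the germination rate of cotton seeds.
Level of significance = 1%
Test statistic

$$\chi^2 = \frac{n\,(ad - bc)^2}{r_1 \, r_2 \, c_1 \, c_2} = \frac{300\,(118 \times 40 - 22 \times 120)^2}{140 \times 160 \times 238 \times 62} = \frac{300 \times 2080^2}{330{,}534{,}400} = 3.93$$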
Table value of χ² for 1 d.f. at 1% level of significance = 6.635
Inference
$\chi^2 < \chi^2_{\text{tab}}$
We accept the null hypothesis.
The chemical treatment does not improve the germination rate of cotton seeds significantly.
Example 4
In an experiment on the effect of a growth regulator on fruit setting in muskmelon, the following results were obtained. Test whether fruit setting in muskmelon and the application of the growth regulator are independent at the 1% level.
 | Fruit set | Fruit not set | Total
Treated | 16 | 9 | 25
Control | 4 | 21 | 25
Total | 20 | 30 | 50
Solution
H0: Fruit setting in muskmelon does not depend on the application of growth regulator.
H1: Fruit setting in muskmelon depends on the application of growth regulator.
Level of significance = 1%
Since one cell frequency (4) is less than 5, we apply Yates' correction. After Yates' correction we have
 | Fruit set | Fruit not set | Total
Treated | 15.5 | 9.5 | 25
Control | 4.5 | 20.5 | 25
Total | 20 | 30 | 50
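Test statistic

$$\chi^2 = \frac{n\,(ad - bc)^2}{r_1 \, r_2 \, c_1 \, c_2} = \frac{50\,(15.5 \times 20.5 - 9.5 \times 4.5)^2}{25 \times 25 \times 20 \times 30} = \frac{50 \times 275^2}{375{,}000} = 10.08$$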
Table value of χ² for 1 d.f. at 1% level of significance = 6.635
Inference
$\chi^2 > \chi^2_{\text{tab}}$
We reject the null hypothesis.
Fruit setting in muskmelon is influenced by the growth regulator: application of the growth regulator increases fruit setting in muskmelon.
			    