7.1 Chi-Squared Test of a Contingency Table

 

The chi-squared test of a contingency table is used to determine whether there is enough evidence to infer that two qualitative variables are related, and to infer that differences exist among two or more populations of a qualitative variable. Completing both objectives entails classifying items according to two different criteria. The following example shows how this is done.

 

Example (1)

One of the issues that came up in a recent national election (and is likely to arise in many future elections) is how to deal with a sluggish economy. Specifically, should governments cut spending, raise taxes, inflate the economy (by printing more money), or do none of the above and let the deficit rise? As with most other issues, politicians need to know which parts of the electorate support each option. Suppose that a random sample of 1,000 people was asked which option they support and their political affiliations. The possible responses to the question about political affiliation were Democrat, Republican, and Independent. The responses were summarized in a cross-classification table, shown below. Do these results allow us to conclude that political affiliation affects support for the economic options?

 

 

                                POLITICAL AFFILIATION
ECONOMIC OPTIONS        Democrat    Republican    Independent
Cut spending               101         282             61
Raise taxes                 38          67             25
Inflate the economy        131          88             31
Let deficit increase        61          90             25
 

Solution:

One way to solve the problem is to consider the contingency table. The variables are economic option and political affiliation. Both are qualitative. The values of economic option are "cut spending," "raise taxes," "inflate the economy," and "let deficit increase." The values of political affiliation are "Democrat," "Republican," and "Independent." The problem objective is to analyze the relationship between the two variables. Specifically, we want to know whether one variable affects the other.

 

Another way of addressing the problem is to determine whether differences exist among Democrats, Republicans, and Independents. In other words, we treat each political group as a separate population. Each population has four possible values, represented by the four economic options. (We can also answer the question by treating the economic options as populations and the political affiliations as the values of the random variable.) Here the problem objective is to compare three populations.

As you will shortly discover, both objectives lead to the same test. Consequently, we can address both objectives at the same time.

 

The null hypothesis will specify that there is no relationship between the two variables. We state this in the following way.

 

H0: The two variables are independent.

 

The alternative hypothesis specifies that one variable affects the other, which is expressed as

 

HA: The two variables are dependent.

 

If the null hypothesis is true, political affiliation and economic option are independent of one another. This means that whether someone is a Democrat, Republican, or Independent does not affect his or her economic choice. Consequently, there is no difference among Democrats, Republicans, and Independents in their support for the four economic options. If the alternative hypothesis is true, political affiliation does affect which economic option is preferred. Thus, there are differences among the three political groups.

 

The test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where k is the number of cells in the contingency table, $o_i$ is the observed frequency in cell i, and $e_i$ is the corresponding expected frequency. The null hypothesis for the chi-squared test of a contingency table states only that the two variables are independent. However, we need the cell probabilities in order to compute the expected values ($e_i$), which in turn permit us to calculate the value of the test statistic. (The entries in the contingency table are the observed values, $o_i$.) The question immediately arises: where do we get the probabilities? The answer is that they come from the data after we assume that the null hypothesis is true.

 

If we consider each political affiliation to be a separate population, each column of the contingency table represents an experiment with four cells. If the null hypothesis is true, the three experiments should produce similar proportions in each cell. We can estimate the cell probabilities by calculating the total in each row and dividing by the sample size. Thus,

P(cut spending) ≈ 444/1,000 = 0.444

P(raise taxes) ≈ 130/1,000 = 0.130

P(inflate economy) ≈ 250/1,000 = 0.250

P(let deficit increase) ≈ 176/1,000 = 0.176

We can calculate the expected values for each cell in the three political groups by multiplying these probabilities by the total number of people in each political group. By adding down each column, we find that 331 people identified themselves as Democrats, 527 as Republicans, and 142 as Independents.

 

Expected Values of the Economic Options of Democrats

ECONOMIC OPTION         EXPECTED VALUE
Cut spending            331 × 0.444 = 146.96
Raise taxes             331 × 0.130 = 43.03
Inflate economy         331 × 0.250 = 82.75
Let deficit increase    331 × 0.176 = 58.26

 

Expected Values of the Economic Options of Republicans

ECONOMIC OPTION         EXPECTED VALUE
Cut spending            527 × 0.444 = 233.99
Raise taxes             527 × 0.130 = 68.51
Inflate economy         527 × 0.250 = 131.75
Let deficit increase    527 × 0.176 = 92.75

 

Expected Values of the Economic Options of Independents

ECONOMIC OPTION         EXPECTED VALUE
Cut spending            142 × 0.444 = 63.05
Raise taxes             142 × 0.130 = 18.46
Inflate economy         142 × 0.250 = 35.50
Let deficit increase    142 × 0.176 = 24.99

 

 

Notice that the expected values are computed by multiplying the column total by the row total and dividing by the sample size.
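The rule just stated can be sketched in a few lines of Python. This is only an illustration: the function name expected_frequencies is hypothetical, and the observed counts are those of the election example, with the Independent column taken from the contingency table that lists the expected values in parentheses.

```python
# Observed counts from the election example. Rows are the economic options;
# columns are Democrat, Republican, Independent.
observed = [
    [101, 282, 61],  # cut spending
    [38, 67, 25],    # raise taxes
    [131, 88, 31],   # inflate the economy
    [61, 90, 25],    # let deficit increase
]


def expected_frequencies(table):
    """Return (row total * column total) / sample size for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    # Each printed row lists the expected counts for one economic option.
    print([round(e, 2) for e in row])
```

The first printed row should match the "cut spending" expected values 146.96, 233.99, and 63.05 shown in the text.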


7.2 Expected Frequencies for a Contingency Table

 

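The expected frequency of the cell in column j and row i is

$$e_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{sample size}}$$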

 

The expected cell frequencies are shown in parentheses in the table below. The expected cell frequencies should satisfy the rule of five; that is, each expected frequency should be at least 5.

 

Contingency Table of Example (1)

                                POLITICAL AFFILIATION
ECONOMIC OPTIONS        DEMOCRAT        REPUBLICAN      INDEPENDENT
Cut spending            101 (146.96)    282 (233.99)    61 (63.05)
Raise taxes              38 (43.03)      67 (68.51)     25 (18.46)
Inflate economy         131 (82.75)      88 (131.75)    31 (35.50)
Let deficit increase     61 (58.26)      90 (92.75)     25 (24.99)
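As a quick check of the rule of five, the expected values in parentheses can be scanned for any cell below 5. The sketch below assumes the common reading of the rule (every expected frequency at least 5); the helper name satisfies_rule_of_five is hypothetical.

```python
# Expected frequencies copied from the table (rows: economic options,
# columns: Democrat, Republican, Independent).
expected = [
    [146.96, 233.99, 63.05],
    [43.03, 68.51, 18.46],
    [82.75, 131.75, 35.50],
    [58.26, 92.75, 24.99],
]


def satisfies_rule_of_five(expected_table, minimum=5):
    """True when every expected cell frequency is at least `minimum`."""
    return all(e >= minimum for row in expected_table for e in row)


smallest = min(e for row in expected for e in row)
print(smallest, satisfies_rule_of_five(expected))
```

The smallest expected frequency here is 18.46, so the rule is comfortably satisfied.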

 

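We can now calculate the value of the test statistic. It is

$$\chi^2 = \sum_{i=1}^{12} \frac{(o_i - e_i)^2}{e_i} = \frac{(101 - 146.96)^2}{146.96} + \frac{(38 - 43.03)^2}{43.03} + \cdots + \frac{(25 - 24.99)^2}{24.99} = 70.675$$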

Notice that we continue to use a single subscript in the formula of the test statistic when we should use two subscripts, one for the rows and one for the columns. We feel that it is clear that for each cell, we need to calculate the squared difference between the observed and expected frequencies divided by the expected frequency. We don't believe that the satisfaction of using the mathematically correct notation would overcome the unnecessary complication.
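The cell-by-cell calculation described above can be sketched as follows. The function name chi_squared_statistic is hypothetical; the code computes each expected frequency from the row and column totals and then sums the squared differences divided by the expected frequencies.

```python
# Observed counts from the election example (rows: economic options,
# columns: Democrat, Republican, Independent).
observed = [
    [101, 282, 61],
    [38, 67, 25],
    [131, 88, 31],
    [61, 90, 25],
]


def chi_squared_statistic(table):
    """Sum (o - e)^2 / e over all cells, with e = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            statistic += (o - e) ** 2 / e
    return statistic


chi2 = chi_squared_statistic(observed)
print(round(chi2, 3))  # approximately 70.675, the value used in the text
```

Although the loop uses two indices, it visits every cell exactly once, which is all the single-subscript formula asks for.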

Rejection Region

To determine the rejection region, we need to know the number of degrees of freedom associated with this chi-squared statistic. The number of degrees of freedom for a contingency table with r rows and c columns is

d.f. = (r - 1)(c - 1)

For Example (1), the number of degrees of freedom is

 

d.f. = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6

 

If we use a 5% significance level, the rejection region is

$$\chi^2 > \chi^2_{\alpha,\,\text{d.f.}} = \chi^2_{.05,\,6} = 12.5916$$

Because $\chi^2$ = 70.675, we reject the null hypothesis and conclude that there is evidence of a relationship between political affiliation and support for the economic options. It follows that the three political affiliations differ in their support for the four economic options. We can see from the data that Republicans favor cutting spending, whereas Democrats prefer to inflate the economy.
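The decision can also be checked numerically. The sketch below is an illustration under stated assumptions: the critical value 12.5916 is taken from a standard chi-squared table for a 5% significance level and 6 degrees of freedom, and the helper chi2_sf_even_df (a hypothetical name) uses a closed-form upper-tail probability that holds only when the number of degrees of freedom is even.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Valid only for even df = 2m, where the tail has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


rows, cols = 4, 3
df = (rows - 1) * (cols - 1)  # (4 - 1)(3 - 1) = 6
chi2 = 70.675                 # test statistic reported in the text
critical_value = 12.5916      # table value for alpha = .05, d.f. = 6

reject = chi2 > critical_value
p_value = chi2_sf_even_df(chi2, df)
print(df, reject, p_value < 0.05)
```

The p-value is far below 0.05, which agrees with the rejection-region decision.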

 

Example (2)

The operations manager of a company that manufactures shirts wants to determine whether there are differences in the quality of workmanship among the three shifts. She randomly selects 600 recently made shirts and inspects them. Each shirt is classified as either perfect or flawed, and the shift that produced it is also recorded. The accompanying table summarizes the number of shirts that fell into each cell. Do these data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?

 

Contingency Table Classifying Shirts

 

 

                          SHIFT
SHIRT CONDITION      1       2       3
Perfect            240     191     139
Flawed              10       9      11

 

Solution:

The problem objective is to compare three populations (the shirts produced by the three shifts). The data are qualitative because each shirt is classified as either perfect or flawed. This problem objective/data type combination indicates that the statistical procedure to be employed is the chi-squared test of a contingency table. The null and alternative hypotheses are as follows.

                        H0: The two variables are independent.

                        HA: The two variables are dependent.

 

Test statistic:

$$\chi^2 = \sum_{i=1}^{6} \frac{(o_i - e_i)^2}{e_i}$$

 

We calculated the row and column totals and used them to determine the expected values. For example, the expected number of perfect shirts produced in shift 1 is

$$e = \frac{250 \times 570}{600} = 237.5$$

The remaining expected values are computed in a like manner. The observed and expected values are shown in the table below.

 

                                   SHIFT
SHIRT CONDITION      1              2              3             TOTAL
Perfect          240 (237.5)    191 (190.0)    139 (142.5)        570
Flawed            10 (12.5)       9 (10.0)      11 (7.5)           30
TOTAL            250            200            150                600

 

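The value of the test statistic is

$$\chi^2 = \frac{(240 - 237.5)^2}{237.5} + \frac{(10 - 12.5)^2}{12.5} + \frac{(191 - 190.0)^2}{190.0} + \frac{(9 - 10.0)^2}{10.0} + \frac{(139 - 142.5)^2}{142.5} + \frac{(11 - 7.5)^2}{7.5} \approx 2.35$$

With d.f. = (2 - 1)(3 - 1) = 2 and a 5% significance level, the rejection region is $\chi^2 > \chi^2_{.05,\,2} = 5.99$.

Conclusion: Because 2.35 is less than 5.99, do not reject the null hypothesis. There is not enough evidence to infer that quality differs among the three shifts.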
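The whole test for the shirt data can be reproduced with a short sketch. It is illustrative only and relies on one special case: with 2 degrees of freedom the upper-tail probability of the chi-squared distribution is exp(-x/2), so the critical value is -2 ln(alpha) and no statistical tables are needed.

```python
import math

# Observed counts: rows are shirt condition, columns are shifts 1, 2, 3.
observed = [
    [240, 191, 139],  # perfect
    [10, 9, 11],      # flawed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (o - e)^2 / e over the six cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2
alpha = 0.05
# Special case d.f. = 2: P(X > x) = exp(-x/2), so the critical value is -2 ln(alpha).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

print(round(chi2, 3), round(critical_value, 3), chi2 > critical_value)
```

The statistic falls well short of the critical value, so the null hypothesis is not rejected.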

We can measure how strong the relationship between the two variables is by using the contingency coefficient (CCC), which plays a role similar to that of a correlation coefficient.
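The text does not give a formula for this coefficient. One common definition is Pearson's contingency coefficient, C = sqrt(χ² / (χ² + n)), where n is the sample size. Assuming that is the measure intended here, a minimal sketch for the election example looks like this (the function name contingency_coefficient is hypothetical):

```python
import math


def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient: sqrt(chi2 / (chi2 + n)).

    Assumption: this is the measure the text calls the contingency
    coefficient; the text itself does not state the formula.
    """
    return math.sqrt(chi2 / (chi2 + n))


# Election example: chi-squared statistic 70.675 from a sample of 1,000.
c_election = contingency_coefficient(70.675, 1000)
print(round(c_election, 3))
```

Under this definition the coefficient is 0 when the statistic is 0 and is always less than 1, so larger values indicate a stronger association.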