Statistical Analysis

B

Bimodal distribution: A distribution that has two modes.

Box-and-whisker plot: A plot that shows the center, spread, and skewness of a data set by drawing a box and two whiskers using the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.
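As an illustration, the quantities a box-and-whisker plot is drawn from could be computed as in the sketch below. It assumes the common convention that the inner fences lie 1.5 times the interquartile range below the first quartile and above the third quartile (this glossary does not state the multiplier), and it takes the quartiles as medians of the lower and upper halves of the ranked data. The function name box_plot_summary and the sample values are hypothetical.

```python
from statistics import median

def box_plot_summary(values):
    """Return the values needed to draw a box-and-whisker plot (needs 2+ values)."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    # Quartiles as medians of the lower and upper halves of the ranked data;
    # when n is odd, the middle value is left out of both halves.
    lower, upper = ranked[:half], ranked[n - half:]
    q1, q2, q3 = median(lower), median(ranked), median(upper)
    iqr = q3 - q1
    # Assumed convention: inner fences at 1.5 * IQR beyond the quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ranked if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value inside the fences
        "whisker_high": max(inside),   # largest value inside the fences
        "outliers": [v for v in ranked if v < lower_fence or v > upper_fence],
    }

summary = box_plot_summary([2, 4, 5, 7, 8, 9, 11, 12, 40])
print(summary)
```

With these made-up values the whiskers end at 2 and 12, and 40 falls outside the upper inner fence.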

C

Census: A survey that includes all members of the population.

Continuous variable: A (quantitative) variable that can assume any numerical value over a certain interval or intervals.

Coefficient of variation: A measure of relative variability that expresses standard deviation as a percentage of the mean.

Coefficient of determination: A measure that gives the proportion (or percentage) of the total variation in a dependent variable that is explained by a given independent variable.

D

Data or data set: Collection of observations or measurements on a variable.

Descriptive statistics: Collection of methods that are used for organizing, displaying, and describing data using tables, graphs, and summary measures.

Discrete variable: A (quantitative) variable whose values are countable.

Degrees of freedom for a simple linear regression model: Sample size minus 2, that is, n - 2.

Dependent variable: The variable to be predicted or explained.

Deterministic model: A model in which the independent variable determines the dependent variable exactly. Such a model gives an exact relationship between two variables.

E

Element or member: A specific subject or object included in a sample or population.

Empirical rule: For a specific bell-shaped distribution, about 68% of the observations fall in the interval μ - σ to μ + σ, about 95% fall in the interval μ - 2σ to μ + 2σ, and about 99.7% fall in the interval μ - 3σ to μ + 3σ, where μ is the mean and σ is the standard deviation.
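The percentages in the empirical rule can be checked against the normal distribution, which is the bell-shaped model assumed here. The sketch below computes the share of a normal distribution that lies within k standard deviations of the mean for k = 1, 2, 3. The function name share_within is hypothetical.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The three proportions come out at roughly 0.683, 0.954, and 0.997, which the rule rounds to 68%, 95%, and 99.7%.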

Estimated or predicted value of y: The value of the dependent variable, denoted by ŷ, that is calculated for a given value of x using the estimated regression model.

F

First quartile: The value in a ranked data set such that about 25% of the measurements are smaller than this value and about 75% are larger. It is the median of the values that are smaller than the median of the whole data set.

I

Inferential statistics: Collection of methods that help make decisions about a population based on sample results.

Interval scale: Data that can be ranked and for which we can find the difference between two values are said to have an interval scale.

Interquartile range: The difference between the third and the first quartiles.

Independent or explanatory variable: The variable included in a model to explain the variation in the dependent variable.

L

Least squares estimates of A and B: The values of a and b that are calculated by using the sample data.

Least squares method: The method used to fit a regression line through a scatter diagram such that the error sum of squares is minimum.

Least squares regression line: A regression line obtained by using the least squares method.
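A minimal sketch of the least squares method for one independent variable follows. It assumes the standard closed-form solution, b = SSxy / SSxx and a = (mean of y) - b * (mean of x), which this glossary does not spell out. The function name least_squares_line and the data are made up for illustration.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # y-intercept
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(f"y_hat = {a:.1f} + {b:.1f} x")
```

For this sample the fitted line is approximately y_hat = 2.2 + 0.6 x. Any other choice of a and b would give a larger error sum of squares on the same data.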

Linear correlation coefficient: A measure of the strength of the linear relationship between two variables.
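One common way to compute the linear correlation coefficient is r = SSxy / sqrt(SSxx * SSyy). The sketch below assumes that formula (it is not stated in this glossary); the function name linear_correlation and the data are hypothetical.

```python
from math import sqrt

def linear_correlation(x, y):
    """Return the linear correlation coefficient r, a value between -1 and 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / sqrt(ss_xx * ss_yy)

r = linear_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

A value near 1 or -1 indicates a strong linear relationship, and the sign of r matches the sign of the slope of the regression line.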

Linear regression model: A regression model that gives a straight line relationship between two variables.

M

Multiple regression model: A regression model that contains two or more independent variables.

Measures of dispersion: Measures that give the spread of a distribution. The range, variance, standard deviation, and coefficient of variation are four such measures.
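The four measures named above can be computed with Python's standard statistics module, as in this sketch. It assumes the data are a sample, so the variance divides by n - 1; the function name dispersion_measures and the values are hypothetical.

```python
from statistics import mean, stdev, variance

def dispersion_measures(values):
    """Return the range, sample variance, sample standard deviation, and CV (%)."""
    data_range = max(values) - min(values)
    var = variance(values)            # sample variance (divides by n - 1)
    sd = stdev(values)                # positive square root of the sample variance
    cv = sd / mean(values) * 100      # standard deviation as a percentage of the mean
    return {"range": data_range, "variance": var, "std_dev": sd, "cv_percent": cv}

print(dispersion_measures([1, 2, 3, 4, 5]))
```

For population data, the same module provides pvariance and pstdev, which divide by n instead.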

Measures of position: Measures that determine the position of a single value in relation to other values in a data set. Quartiles, percentiles, and percentile rank are examples of measures of position.

Median: The value of the middle term in a ranked data set. The median divides a ranked data set into two equal parts.

Mode: A value (or values) that occurs with highest frequency in a data set.

Multimodal distribution: A distribution that has more than one mode. A bimodal distribution is the special case with exactly two modes.

Mean: A measure of central tendency calculated by dividing the sum of all values by the number of values in the data set.

Measures of central tendency: Measures that describe the center of a distribution. The mean, median, and mode are three of the measures of central tendency.
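As a short illustration, the three measures can be obtained from Python's standard statistics module. The function name central_tendency and the data are made up; multimode is used so that data sets with more than one mode return all of them.

```python
from statistics import mean, median, multimode

def central_tendency(values):
    """Return the mean, the median, and the list of modes of a data set."""
    return {
        "mean": mean(values),        # sum of all values divided by their count
        "median": median(values),    # middle term of the ranked data
        "modes": multimode(values),  # value(s) with the highest frequency
    }

print(central_tendency([3, 5, 5, 6, 8, 9, 13]))
print(central_tendency([1, 2, 2, 3, 3, 4])["modes"])
```

The first data set has a single mode (5); the second has two modes (2 and 3), so it is bimodal.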

N

Negative relationship between two variables: The value of the slope in the regression line and the correlation coefficient between two variables are both negative.

Nominal scale: Data that are divided into different categories that are used for identification purposes only are said to have a nominal scale.

Nonlinear (simple) regression model: A regression model that does not give a straight line relationship between two variables.

O

Outliers or extreme values: Values that are very small or very large relative to the majority of the values in a data set.

Observation or measurement: The value of a variable for an element.

Ordinal scale: Data that can be divided into different categories that can be ranked are said to have an ordinal scale.

P

Parameter: A summary measure calculated for population data.

Percentile rank: The percentile rank of a value gives the percentage of values in the data set that are smaller than this value.
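The definition above translates directly into a count, as in this sketch. The function name percentile_rank and the scores are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of values in the data set that are smaller than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

scores = [10, 20, 30, 40, 50]
print(percentile_rank(scores, 40))
```

Here three of the five scores are smaller than 40, so its percentile rank is 60%.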

Percentiles: Ninety-nine values that divide a ranked data set into 100 equal parts.

Population or target population: The collection of all elements whose characteristics are being studied.

Population parameters for a simple regression model: The values of A and B for the regression model y = A + Bx + ε that are obtained by using population data.

Positive relationship between two variables: The value of the slope in the regression line and the correlation coefficient between two variables are both positive.

Prediction interval: The confidence interval for a particular value of y for a given value of x.

Probabilistic or statistical model: A model in which the independent variable does not determine the dependent variable exactly.

Q

Quartiles: Three summary measures that divide a ranked data set into four equal parts.

Qualitative or categorical data: Data generated by a qualitative variable.

Qualitative or categorical variable: A variable that cannot assume numerical values but is classified into two or more categories.

Quantitative data: Data generated by a quantitative variable.

Quantitative variable: A variable that can be measured numerically.

R

Range: A measure of spread obtained by taking the difference between the largest and the smallest values in a data set.

Random sample: A sample drawn in such a way that each element of the population has some chance of being included in the sample.

Ratio scale: Data that can be ranked and for which all arithmetic operations can be performed are said to have a ratio scale.

Representative sample: A sample that contains the characteristics of the corresponding population.

Random error term (ε): The difference between the actual and predicted values of y.

S

Scatter diagram or scattergram: A plot of the paired observations of x and y.

Simple linear regression: A regression model with one dependent and one independent variable that assumes a straight line relationship.

Slope: The coefficient of x in a regression model that gives the change in y for a change of one unit in x.

SSE (error sum of squares): The sum of the squared differences between the actual and predicted values of y. It is that portion of the SST that is not explained by the regression model.

SSR (regression sum of squares): That portion of the SST that is explained by the regression model.

SST (total sum of squares): The sum of the squared differences between the actual y values and ȳ, the mean of y.
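The three sums of squares can be illustrated with a short sketch. It takes the estimates a and b as given and uses made-up data. The function name sums_of_squares is hypothetical. SST is measured around the mean of y, and the identity SST = SSR + SSE holds when a and b are the least squares estimates.

```python
def sums_of_squares(x, y, a, b):
    """Return (SST, SSR, SSE) for the fitted line y_hat = a + b * x."""
    y_bar = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in y
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left unexplained
    return sst, ssr, sse

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# a = 2.2 and b = 0.6 are the least squares estimates for this sample.
sst, ssr, sse = sums_of_squares(x, y, a=2.2, b=0.6)
print(round(sst, 4), round(ssr, 4), round(sse, 4))
print(round(ssr / sst, 4))   # coefficient of determination
```

For this sample SST is 6, of which about 3.6 is explained by the regression (SSR) and about 2.4 is not (SSE), so the coefficient of determination is about 0.6.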

Standard deviation of errors: A measure of spread for the random errors.

Second quartile: Middle or second of the three quartiles that divide a ranked data set into four equal parts. About 50% of the values in the data set are smaller and about 50% are larger than the second quartile. The second quartile is the same as the median.

Sample: A portion of the population of interest.

Sample survey: A survey that includes elements of a sample.

Statistics: Collection of methods that are used to collect, analyze, present, and interpret data and to make decisions.

Survey: Collecting data on the elements of a population or sample.

Standard deviation: A measure of spread that is given by the positive square root of the variance.

Statistic: A summary measure calculated for sample data.

T

Third quartile: Third of the three quartiles that divide a ranked data set into four equal parts. About 75% of the values in a data set are smaller than the value of the third quartile and about 25% are larger. It is the median of the values that are greater than the median of the whole data set.

U

Unimodal distribution: A distribution that has only one mode.

V

Variable: A characteristic under study or investigation that assumes different values for different elements.

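Variance: A measure of spread based on the squared deviations of the values from the mean. Its positive square root is the standard deviation.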

Y

Y-Intercept: The point at which the regression line intersects the vertical axis on which the dependent variable is marked. It is the value of y when x is zero.