Saturday, October 13, 2012

Simple Regression and Correlation Analysis


            Statisticians frequently must estimate how one variable is related to or affected by another. A firm may need to determine how its sales are related to the gross national product, or how its total production costs are related to its output rate. To estimate such relationships, statisticians use regression and correlation analysis.

           One reason regression is important, particularly in business and economics applications, is that it can be used to forecast variables. Almost all companies and government institutions regularly forecast variables such as product demand, interest rates, inflation rates, prices of raw materials, and labor costs.

           The technique involves developing a mathematical equation that describes the relationship between the variable to be forecast and the variables that the statistician believes are related to it. The variable to be forecast is called the dependent variable and is denoted by Y, while the related variables are called the independent variables and are denoted by X.

Regression Analysis 

          Regression analysis describes the way in which one variable is related to another. It derives an equation that can be used to estimate the unknown value of one variable on the basis of the known value of the other. For example, suppose that a hosiery mill is scheduled to produce four tons of output next month and wants to estimate what its cost will be. In this case, although the mill's output is known, its cost is unknown.

         Regression analysis can be used to estimate the cost on the basis of the known output. It can also be used to estimate the level of capital expenditure required to establish a plant with a certain capacity: in the case of the hosiery mill, if the plant's capacity were known, regression analysis could be used to predict the firm's level of expenditure.

Suppose that the firm collects such data for a sample of nine months, with the results shown in Table I below:

Table I
Output X (tons)    Production cost Y (thousands of dollars)
1                  2
2                  3
4                  4
8                  7
6                  6
5                  5
8                  8
9                  8
7                  6
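The hand calculations that follow rely on a handful of column sums. A short Python sketch that computes them from Table I (the variable names are our own, not from the post):

```python
# Table I data: monthly output X (tons) and production cost Y (thousands of dollars)
output = [1, 2, 4, 8, 6, 5, 8, 9, 7]
cost = [2, 3, 4, 7, 6, 5, 8, 8, 6]

n = len(output)                                        # number of months sampled
sum_x = sum(output)                                    # sum of X
sum_y = sum(cost)                                      # sum of Y
sum_xy = sum(x * y for x, y in zip(output, cost))      # sum of X*Y
sum_x2 = sum(x * x for x in output)                    # sum of X squared
sum_y2 = sum(y * y for y in cost)                      # sum of Y squared

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # prints: 9 50 49 319 340 303
```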


There are two ways to solve this problem. The first, a graphical approach, gives a rough estimate; the second, which uses the regression formula, gives a precise estimate. Both are illustrated below by estimating Y when X is 12.



The first method employs the scatter diagram. In this diagram, the known variable X, the monthly output rate, is plotted along the horizontal axis and is called the independent variable. The unknown variable Y, the monthly cost, is plotted along the vertical axis and is called the dependent variable.

After plotting the nine (X, Y) points, the next step is to draw the trend line: a straight line drawn so that it follows the general direction of the points and passes as close to them as possible.

Let us now estimate the value of Y when X is 12 by using the trend line. Because the largest observed output is 9 tons, the trend line must first be extended beyond the plotted points. Then draw a vertical line from X = 12 on the horizontal axis up to the trend line. From the point where this vertical line meets the trend line, draw a horizontal line across to the Y axis. The point where this horizontal line crosses the Y axis is our rough estimate of Y when X = 12.



Note that this estimate of Y will vary slightly depending on how accurately the trend line was drawn. In our example, the estimate from the graphical method is near 11.
                            


The second method makes use of the equation of the least squares regression line, or LSRL for short:

y = a + bx

where:
y = dependent variable
x = independent variable
a = y-intercept
b = slope of the line
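The values of a and b below are stated without their derivation. The standard least-squares formulas, with the Table I sums (n = 9, Σx = 50, Σy = 49, Σxy = 319, Σx² = 340) substituted, reproduce them:

```latex
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{9(319) - (50)(49)}{9(340) - 50^2}
  = \frac{421}{560} \approx 0.752

a = \frac{\sum y - b\sum x}{n}
  = \frac{49 - \frac{421}{560}(50)}{9}
  \approx 1.268
```

Keeping b as the fraction 421/560 until the last step avoids rounding drift in a.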


a = 1.268
b = 0.752

y = a + bx
If x = 12:
y = 1.268 + 0.752(12)
y = 10.292, the estimated production cost (in thousands of dollars) when the output is 12 tons
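A minimal Python sketch of the same least-squares computation, using only the standard library. The function name least_squares_line is our own choice, not something from the post. It fits the line to Table I and evaluates it at X = 12:

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = (sum_y - b * sum_x) / n                # y-intercept
    return a, b


# Table I data: output in tons, cost in thousands of dollars
output = [1, 2, 4, 8, 6, 5, 8, 9, 7]
cost = [2, 3, 4, 7, 6, 5, 8, 8, 6]

a, b = least_squares_line(output, cost)
estimate = a + b * 12                          # predicted cost at X = 12 tons
print(round(a, 3), round(b, 3), round(estimate, 2))  # prints: 1.268 0.752 10.29
```

Because the code keeps full precision for a and b, its estimate (about 10.289) differs slightly from 1.268 + 0.752 × 12 = 10.292, but both round to 10.29 thousand dollars.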


Correlation

 If two variables are related in such a way that the points of a scatter diagram tend to fall along a straight line, we say that there is an association between the variables and that they are linearly correlated. The most common measure of the strength of this association is the Pearson correlation coefficient, denoted by r and given by the formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of (x, y) pairs.
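The laboratory data for the problem below are not reproduced in this post, so this hedged Python sketch applies the Pearson formula to the hosiery-mill data of Table I instead. The function name pearson_r is our own, and the value it gives for Table I (about 0.985) is our own computation, not a figure from the post:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed with the raw-sum formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Table I data (output in tons, cost in thousands of dollars)
output = [1, 2, 4, 8, 6, 5, 8, 9, 7]
cost = [2, 3, 4, 7, 6, 5, 8, 8, 6]

r = pearson_r(output, cost)
print(round(r, 3))  # rounded to three decimals; prints 0.985
```

On Python 3.10 or later, the standard library's statistics.correlation(output, cost) should return the same value.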

Problem:

A laboratory wishes to study the relationship between the dose of a growth stimulant and weight gain in laboratory animals. Seven animals of the same sex, age, and size are selected and randomly assigned to one of seven dosage levels of the growth stimulant.

Plot the data in a scatter diagram and calculate the correlation coefficient between the two variables.

Rounding Rule: Round the value of r to three decimal places.
Thus we have r = 0.973, which indicates a very high positive correlation.

Interpretation of r:
The value of r always lies between -1 and +1. The closer it is to either -1 or +1, the stronger the linear relationship between X and Y. If r = 0, then X and Y are not linearly related.



The figures below illustrate three scatter diagrams of data with various values of r. If r is positive, there is a positive linear relationship between the two variables, meaning that as one variable increases, the other also tends to increase. If r is negative, there is a negative linear relationship, meaning that as one variable increases, the other tends to decrease.
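A small Python sketch with made-up data (the lists rising, falling and u_shape are hypothetical) shows how the sign of r follows the direction of the relationship, and that r = 0 only rules out a linear relationship: in the U-shaped case Y is completely determined by X, since Y = (X - 3)², yet r is 0.

```python
import math


def pearson_r(xs, ys):
    # Pearson's r written with deviations from the means
    # (algebraically equivalent to the raw-sum formula)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # Y increases exactly linearly with X
falling = [10, 8, 6, 4, 2]  # Y decreases exactly linearly with X
u_shape = [4, 1, 0, 1, 4]   # Y = (X - 3)**2: related, but not linearly

for label, ys in [("rising", rising), ("falling", falling), ("u_shape", u_shape)]:
    print(label, round(pearson_r(x, ys), 3))
# prints: rising 1.0 / falling -1.0 / u_shape 0.0 (one per line)
```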

According to Garrett's interpretation of the coefficient of correlation:


An r of 0 denotes no correlation;
An r from ±0.01 to ±0.20 denotes an indifferent or negligible relationship;
An r from ±0.21 to ±0.40 denotes low correlation, present but slight;
An r from ±0.41 to ±0.70 denotes a substantial or marked relationship;
An r from ±0.71 to ±0.99 denotes high to very high correlation;
An r of ±1 denotes perfect correlation.
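The scale can be encoded as a small lookup. This is a Python sketch; the function name garrett_label is hypothetical, and because the published ranges leave small gaps (for example between 0.20 and 0.21), this version assumes each upper bound is inclusive and the next band starts just above it:

```python
def garrett_label(r):
    """Describe r using Garrett's scale (hypothetical helper).

    Assumption: upper bounds are inclusive, so 0.205 counts as "low".
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the scale depends only on the magnitude of r
    if size == 0:
        return "no correlation"
    if size <= 0.20:
        return "indifferent or negligible relationship"
    if size <= 0.40:
        return "low correlation, present but slight"
    if size <= 0.70:
        return "substantial or marked relationship"
    if size < 1:
        return "high to very high correlation"
    return "perfect correlation"


print(garrett_label(0.973))  # the laboratory example
```

With the laboratory result r = 0.973, the sketch returns "high to very high correlation".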




By: Tito Nuevacobita Jr. 
III-GOLD


