Wednesday, November 11, 2015

Regression and Regression Constants....

Regression: It is prediction, that is, deriving an equation for predicting one variable from another.
Algebraically, error of prediction = (Y - ^Y), where Y is the obtained value and ^Y is the predicted value.

Least squares linear regression: In a cause and effect relationship, the independent variable (X) is the cause, and the dependent variable (Y) is the effect. Least squares linear regression is a method for predicting the value of a dependent variable Y, based on the value of an independent variable X.

The Least Squares Regression Line: It is the series of predicted points, arranged in a straight line, against which the accuracy of prediction is judged. Linear regression finds the straight line, called the least squares regression line, that best represents the observations in a bivariate data set. If Y is the dependent variable and X is the independent variable, the regression line can be written as:
Y = b X + a
In this linear equation, ‘b’ is the slope (regression coefficient) and ‘a’ is the Y-intercept of the regression line.

Line of Best Fit (Least Squares Method): This method minimizes the sum of the squared differences (squared deviations) between Y and ^Y. The differences arise because more than one obtained Y value can occur for the same X value, while the line gives only one predicted value. It is a more accurate way of finding the line of best fit than judging by eye. A line of best fit is a straight line that is the best approximation of the given set of data. It is used to study the nature of the relation between two variables.
A line of best fit can be roughly determined using an eyeball method by drawing a straight line on a scatterplot so that the number of points above the line and below the line is about equal (and the line passes through as many points as possible).
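To make "minimizes the squared difference" concrete, here is a small sketch in Python. The data are made up for illustration, and the helper name `sse` is my own. The values 0.6 and 2.2 are the least squares slope and intercept for this particular data set; nudging either one away from those values makes the sum of squared errors larger.

```python
def sse(xs, ys, b, a):
    """Sum of squared errors of prediction, sum of (Y - ^Y)^2, for the line Y = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line for this data: Y = 0.6X + 2.2
best = sse(xs, ys, 0.6, 2.2)

# Other candidate lines, e.g. lines one might draw by eye.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.3), (0.0, -0.3), (0.1, -0.3)]
others = [sse(xs, ys, 0.6 + db, 2.2 + da) for db, da in nudges]

print("SSE of least squares line:", round(best, 3))
print("Smallest SSE among the other lines:", round(min(others), 3))
```

Every alternative line has a larger sum of squared errors than the least squares line, which is the sense in which that line is the "best" fit.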

Steps to find the equation of line of best fit:
1.      Calculate the mean of the x-values and the mean of the y-values.
2.      Find the slope of the line of best fit: b = rxy (SDy / SDx).
3.      Compute the Y-intercept of the line by using the formula a = (mean of Y) - b (mean of X).
4.      Write equation of the line (Y = bX + a)
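The four steps can be sketched in Python as follows. This is a minimal illustration, assuming made-up data and a hypothetical function name `line_of_best_fit`. The slope is computed as the sum of cross-products of deviations divided by the sum of squared X deviations, which is algebraically the same as rxy (SDy / SDx).

```python
from statistics import mean

def line_of_best_fit(xs, ys):
    """Return (b, a) for the least squares line Y = bX + a."""
    # Step 1: means of the x-values and the y-values.
    x_bar, y_bar = mean(xs), mean(ys)
    # Step 2: slope = sum of cross-products of deviations / sum of squared X deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Step 3: Y-intercept from a = mean(Y) - b * mean(X).
    a = y_bar - b * x_bar
    # Step 4: the equation of the line is Y = bX + a.
    return b, a

# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = line_of_best_fit(xs, ys)
print(f"Y = {b:.2f}X + {a:.2f}")
```

For this data the means are 3 and 4, so the fitted line works out to Y = 0.60X + 2.20.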

Properties of Regression Line: When the regression parameters (a & b) are defined by the equation above, the regression line has the following properties:
  1. The difference between the obtained and the predicted value (Y - ^Y) is called an error of prediction, or residual. The line that minimizes the sum of the squared differences between Y and ^Y is known as the least squares regression line, and the approach is called least squares regression.
  2. Two important measures of the size of an effect in regression are r² and r.
  3. The regression line passes through the mean of the X values (x̄) and through the mean of the Y values (ȳ); in other words, it passes through the centroid of the data.
  4. The regression constant (a) = Y-intercept of the regression line.
  5. To use the regression equation technique described in the text, we must have a logical pairing of the scores on the two variables and a linear relationship between them.

The intersection of the two means falls on the regression line.
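The centroid property can be checked numerically. The sketch below uses made-up data and my own variable names. It also shows a related fact that the notes do not state explicitly: for a line fitted with an intercept, the residuals (Y - ^Y) add up to zero, which is one reason the squared residuals are minimized rather than the raw ones.

```python
from statistics import mean

# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx           # slope
a = y_bar - b * x_bar   # Y-intercept

# Predicted Y at the mean of X: should equal the mean of Y.
at_centroid = b * x_bar + a

# Residuals (errors of prediction) for every observation.
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]

print("Predicted Y at mean of X:", at_centroid, "| mean of Y:", y_bar)
print("Sum of residuals:", round(sum(residuals), 10))
```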

Regression coefficient (b, slope, unstandardized): The amount of change in Y for a one-unit change in X, or the rate at which Y changes with change in X. The larger the absolute value of the regression coefficient, the steeper the slope. The coefficient ‘b’ is a measure of how strongly the predictor variable influences the criterion (outcome) variable.
byx = rxy (SDy / SDx), when Y is the outcome and X is the predictor
bxy = rxy (SDx / SDy), when X is the outcome and Y is the predictor
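As a rough illustration of the two formulas, the sketch below computes r and both slopes for a made-up data set. The helper `pearson_r` is my own, written out by hand rather than taken from a library.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
b_yx = r * stdev(ys) / stdev(xs)   # Y is the outcome, X is the predictor
b_xy = r * stdev(xs) / stdev(ys)   # X is the outcome, Y is the predictor

print(f"r = {r:.4f}, byx = {b_yx:.2f}, bxy = {b_xy:.2f}")
```

The two slopes differ unless the two standard deviations are equal, because predicting Y from X and predicting X from Y are different regression lines.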

The standardized (beta) coefficient, unlike the raw slope ‘b’, is measured in units of standard deviation. For example, a beta value of 0.5 indicates that a change of one standard deviation in the predictor variable will result in a change of 0.5 standard deviations in the outcome (criterion) variable.

1.     ‘b’ = 1.08 means that when X increases by 1 point, the predicted outcome increases by 1.08 points.
2.     Multiplying the two slopes byx and bxy leaves the square of the correlation coefficient, because the SD ratios cancel: byx × bxy = rxy (SDy / SDx) × rxy (SDx / SDy) = r². It gives the proportion of variance the two variables share, that is, the proportion of variance in Y that can be predicted from X. It is therefore useful to know the correlation coefficient when predicting outcomes.
3.     If the b coefficient is positive, the relationship of the predictor variable with the dependent variable is positive (e.g., the greater the IQ the better the grade point average), and if the b coefficient is negative, the relationship is negative (e.g., the lower the class size the better the average test scores).
4.     If the b coefficient is equal to 0, there is no linear relationship between the variables.
5.     ‘b’ can take any value, but it always has the same sign as r (when b is +, r is +, and when b is -, r is -; b = + with r = -, or vice versa, is not possible).
6.     Many lines may have the same slope (b), but distinct lines with the same slope cannot also have the same intercept ‘a’ (among lines with a given slope, ‘a’ uniquely identifies a line).
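Points 2 and 5 can be checked on a small example. The sketch below uses made-up data with a negative relationship; the variable names are my own. It confirms that the product of the two slopes equals r² and that each slope carries the same sign as r.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up illustrative data with a negative relationship (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [6, 4, 5, 2, 3]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)            # correlation coefficient
b_yx = r * stdev(ys) / stdev(xs)     # slope for predicting Y from X
b_xy = r * stdev(xs) / stdev(ys)     # slope for predicting X from Y

print(f"r = {r:.2f}, byx = {b_yx:.2f}, bxy = {b_xy:.2f}")
print(f"byx * bxy = {b_yx * b_xy:.2f}, r squared = {r ** 2:.2f}")
```

Here r is negative and so are both slopes, while their product (r²) is positive, as it must be for a proportion of variance.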

Beta coefficient (β, standardized regression coefficient): It is the change in Y, in standard deviation units, for a one standard deviation change in X. It is the slope of the regression line when both the X and Y variables are converted to standardized z-scores. Thus, the higher the absolute beta value, the greater the impact of the predictor variable on the criterion variable.
1.     When we have only one predictor variable in our model,
Beta (β) = rxy.
2.     When we have more than one predictor variable, we cannot compare the contribution of each predictor variable by simply comparing the correlation coefficients. The beta coefficients allow us to make such comparisons and to assess the strength of the relationship between each predictor variable and the criterion variable.
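For the one-predictor case, the claim that beta equals rxy can be verified directly. The sketch below is a minimal illustration with made-up data; the helpers `z_scores` and `slope` are my own. It converts both variables to z-scores, fits the slope, and compares it with r computed as the average cross-product of z-scores (dividing by n - 1 to match the sample standard deviation).

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

zx, zy = z_scores(xs), z_scores(ys)
beta = slope(zx, zy)                                     # slope on standardized scores
r = sum(p * q for p, q in zip(zx, zy)) / (len(xs) - 1)   # correlation from z-scores
b = slope(xs, ys)                                        # raw (unstandardized) slope

print(f"b = {b:.2f}, beta = {beta:.4f}, r = {r:.4f}")
```

The raw slope depends on the units of X and Y, while beta does not, which is what makes beta comparable across predictors measured on different scales.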

Interpreting regression constants:
The regression coefficient (b) is the average change (an increase or a decrease, depending on whether b is +ve or -ve) in the outcome variable (Y) for a 1-unit change in the predictor variable (X). The slope (b coefficient) is a measure of how strongly each predictor variable influences the criterion (outcome) variable. The higher the beta value, the greater the impact of the predictor variable on the outcome variable. The relationship is positive if the b coefficient is positive, and negative if it is negative.

Regression constant ‘a’ (Y-intercept): It anchors our line.
The constant term ‘a’ (regression constant) is the value at which the fitted line (line of best fit) crosses the y-axis, that is, the predicted value of Y when X = 0. It is used as a ‘correction factor’ when using particular values of the x's to predict y. If we don’t include the constant, the regression line is forced to go through the origin, which means the predicted outcome must be zero when all of the predictors are zero.
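The effect of dropping the constant can be seen in a short sketch. The data are made up and the names are my own. For a line forced through the origin, the least squares slope is the sum of XY divided by the sum of X squared; the sketch compares that line with the usual line that includes ‘a’.

```python
from statistics import mean

# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sse(b, a):
    """Sum of squared errors of prediction for the line Y = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# With the constant: the usual least squares line.
x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Without the constant: least squares line forced through the origin (a = 0).
b_origin = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

sse_with = sse(b, a)
sse_origin = sse(b_origin, 0.0)

print(f"With constant:    Y = {b:.2f}X + {a:.2f}, SSE = {sse_with:.2f}")
print(f"Through origin:   Y = {b_origin:.2f}X, SSE = {sse_origin:.2f}")
```

Without ‘a’, the slope has to change to compensate, and for this data the errors of prediction are noticeably larger.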


These notes are written by S C Joshi during EPSY 635 Course, Fall 2015, Texas A&M University. Acknowledgements to Dr. Bob Hall, Professor, EPSY, Texas A&M University for his assistance in understanding these terms during the course  
