Regression: It is prediction, the derivation of an equation for predicting one variable from the other. Algebraically, error of prediction = (Y − Ŷ), where Ŷ denotes the predicted value of Y.
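As a minimal sketch in Python (with made-up observed and predicted values, purely for illustration), the errors of prediction are the observed scores minus the predicted scores:

```python
# Errors of prediction: observed Y minus predicted Y-hat
Y = [10, 14, 9, 17]        # observed scores (hypothetical)
Y_hat = [11, 13, 10, 15]   # predicted scores (hypothetical)

errors = [y - yh for y, yh in zip(Y, Y_hat)]
print(errors)  # [-1, 1, -1, 2]
```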
Least squares linear regression: In a cause-and-effect relationship, the independent variable (X) is the cause, and the dependent variable (Y) is the effect. Least squares linear regression is a method for predicting the value of a dependent variable Y based on the value of an independent variable X.
The Least Squares Regression Line: Linear regression finds the straight line, called the least squares regression line, that best represents the observations in a bivariate data set. Suppose Y is a dependent variable and X is an independent variable; the regression line can be written as:
Y = bX + a
In this linear equation, ‘b’ is the slope (the regression coefficient) and ‘a’ is the Y-intercept of the regression line.
Line of Best Fit (Least Square Method): This method minimizes the sum of the squared differences (squared deviations) between Y and Ŷ, summed over all observations (since more than one observed Y value may occur for a single X value). It is a more accurate way of finding the line of best fit than fitting by eye. A line of best fit is a straight line that is the best approximation of the given set of data. It is used to study the nature of the relation between two variables.
A line of best fit can be roughly determined using an eyeball method: draw a straight line on a scatterplot so that the number of points above the line and below the line is about equal (and the line passes through as many points as possible).
Steps to find the equation of the line of best fit:
1. Calculate the mean of the x-values and the mean of the y-values.
2. Find the slope (b) of the line of best fit.
3. Compute the Y-intercept of the line using the formula a = ȳ − b·x̄.
4. Write the equation of the line (Y = bX + a).
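A minimal sketch in Python that follows these four steps (the data are made up for illustration; the slope uses the standard least squares formula b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²):

```python
# Least squares line of best fit, following the four steps above
xs = [1, 2, 3, 4, 5]   # hypothetical predictor values
ys = [2, 4, 5, 4, 6]   # hypothetical outcome values

# Step 1: means of the x-values and the y-values
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Step 2: slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
den = sum((x - x_bar) ** 2 for x in xs)
b = num / den

# Step 3: intercept a = y_bar - b * x_bar
a = y_bar - b * x_bar

# Step 4: the equation of the line
print(f"Y = {b:.2f}X + {a:.2f}")  # Y = 0.80X + 1.80
```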
Properties of the Regression Line: When the regression parameters (a and b) are defined by the equations above, the regression line has the following properties:
- The difference between the obtained and predicted value (Y − Ŷ) is called an error of prediction, or residual. The line that minimizes the sum of the squared differences between Y and Ŷ is known as the least squares regression line, and the approach is called least squares regression.
- Two important measures of the size of an effect in regression are r² and r.
- The regression line passes through the mean of the X values (x̄) and through the mean of the Y values (ȳ); that is, it passes through the centroid of the data.
- The regression constant (a) is the Y-intercept of the regression line.
- To use the regression equation technique described in the text, we must have a logical pairing of the scores on the two variables and a linear relationship between them.
- The intersection of the two means (x̄, ȳ) falls on the regression line.
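A short numeric check of two of these properties, reusing the hypothetical data and the fitted line Y = 0.80X + 1.80 from the sketch above:

```python
# Verify: residuals sum to zero, and the line passes through the centroid
xs = [1, 2, 3, 4, 5]   # same hypothetical data as above
ys = [2, 4, 5, 4, 6]
b, a = 0.80, 1.80

# Residuals (errors of prediction) sum to zero
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
print(abs(sum(residuals)) < 1e-9)  # True

# The line passes through the centroid (x_bar, y_bar)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(abs((b * x_bar + a) - y_bar) < 1e-9)  # True: 0.8*3 + 1.8 = 4.2
```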
Regression coefficient (b, slope, nonstandardized): The amount of change in Y for a one-unit change in X, or the rate at which Y changes with a change in X. The larger the value (size) of the regression coefficient, the steeper the slope. It is b that measures how strongly each predictor variable influences the criterion (outcome) variable.
byx = rxy (SDy / SDx), when Y is the outcome and X is the predictor; and
bxy = rxy (SDx / SDy), when X is the outcome and Y is the predictor.
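A brief check of the first formula in Python (numpy used for convenience; the data are the same hypothetical values as above):

```python
import numpy as np

# Check: b_yx = r_xy * (SD_y / SD_x) matches the directly fitted slope
xs = np.array([1, 2, 3, 4, 5])   # same hypothetical data as above
ys = np.array([2, 4, 5, 4, 6])

r = np.corrcoef(xs, ys)[0, 1]          # Pearson correlation r_xy
b_from_r = r * (ys.std() / xs.std())   # slope via the formula above
b_direct = np.polyfit(xs, ys, 1)[0]    # slope via a least squares fit

print(round(b_from_r, 4), round(b_direct, 4))  # 0.8 0.8
```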
Note that b is measured in the raw units of the variables (units of Y per unit of X). It is the standardized beta coefficient (described below) that is measured in units of standard deviation; for example, a beta value of 2.5 indicates that a change of one standard deviation in the predictor variable will result in a change of 2.5 standard deviations in the outcome (criterion) variable.
1. b = 1.08 means that when X increases by 1 point, the outcome increases by 1.08 points.
2. On multiplying the two slopes byx and bxy, we are left with the square of the correlation coefficient (r²), which tells the percentage of variance the two variables share and hence the accuracy of predicting Y (see the sketch after this list). It is therefore helpful to know the correlation coefficient when predicting outcomes.
3. If the b coefficient is positive, the relationship of the predictor variable with the dependent variable is positive (e.g., the greater the IQ, the better the grade point average); if the b coefficient is negative, the relationship is negative (e.g., the smaller the class size, the better the average test scores).
4. If the b coefficient is equal to 0, there is no linear relationship between the variables.
5. The magnitude of b can be anything, but its sign always matches that of r (when b = +, r = +, and when b = −, r = −; b = + with r = −, or vice versa, is not possible).
6. Many lines may have the same slope (b), but parallel lines are distinguished by their intercepts; together, a and b uniquely identify a line.
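A quick numeric check of point 2, reusing the same hypothetical data: the product of the two slopes equals r²:

```python
import numpy as np

# Check: b_yx * b_xy = r^2
xs = np.array([1, 2, 3, 4, 5])   # same hypothetical data as above
ys = np.array([2, 4, 5, 4, 6])

r = np.corrcoef(xs, ys)[0, 1]
b_yx = r * (ys.std() / xs.std())
b_xy = r * (xs.std() / ys.std())

print(round(b_yx * b_xy, 4), round(r ** 2, 4))  # 0.7273 0.7273
```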
Beta coefficient (beta, standardized regression coefficient): It is the change in Y, in standard deviation units, for a one standard deviation change in X. It is the slope of the regression line when both the X and Y variables have been converted to standardized z-scores. Thus, the higher the beta value, the greater the impact of the predictor variable on the criterion variable.
1. When we have only one predictor variable in our model, beta = rxy (see the sketch after this list).
2. When we have more than one predictor variable, we cannot compare the contribution of each predictor variable by simply comparing the correlation coefficients. The beta coefficients allow us to make such comparisons and to assess the strength of the relationship between each predictor variable and the criterion variable.
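A minimal sketch of point 1, again with the same hypothetical data: when both variables are converted to z-scores, the fitted slope equals rxy:

```python
import numpy as np

# With one predictor, the standardized slope (beta) equals r_xy
xs = np.array([1, 2, 3, 4, 5])   # same hypothetical data as above
ys = np.array([2, 4, 5, 4, 6])

zx = (xs - xs.mean()) / xs.std()   # z-scores of X
zy = (ys - ys.mean()) / ys.std()   # z-scores of Y

beta = np.polyfit(zx, zy, 1)[0]    # slope of the standardized regression
r = np.corrcoef(xs, ys)[0, 1]

print(round(beta, 4), round(r, 4))  # 0.8528 0.8528
```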
Interpreting regression constants:
The regression coefficient (b) is the average change (increase or decrease, depending on whether b is positive or negative) in the outcome variable (Y) for a 1-unit change in the predictor variable (X). The slope (b coefficient) is a measure of how strongly each predictor variable influences the criterion (outcome) variable: the higher the value, the greater the impact of the predictor variable on the outcome variable. The relationship is positive if the b coefficient is positive, and negative if it is negative.
Regression constant ‘a’, the Y-intercept: It anchors our line. The constant term a (the regression constant) is the value at which the fitted line (line of best fit) crosses the y-axis. It is used as a ‘correction factor’ when using particular values of the x's to predict y. If we don't include the constant, the regression line is forced to go through the origin, which means the predicted outcome must be zero when all of the predictors are zero.
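A brief illustration of what omitting the constant does, using numpy with the same hypothetical data: the fit without an intercept is forced through the origin, and the slope changes:

```python
import numpy as np

# Effect of omitting the regression constant 'a'
xs = np.array([1, 2, 3, 4, 5], dtype=float)   # same hypothetical data
ys = np.array([2, 4, 5, 4, 6], dtype=float)

# With a constant: ordinary least squares line Y = bX + a
b, a = np.polyfit(xs, ys, 1)
print(f"with constant:  Y = {b:.2f}X + {a:.2f}")  # Y = 0.80X + 1.80

# Without a constant: line forced through the origin (a = 0)
b0 = np.linalg.lstsq(xs.reshape(-1, 1), ys, rcond=None)[0][0]
print(f"through origin: Y = {b0:.2f}X")           # Y = 1.29X
```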