The Posts of the Great Statisticians: October 2012

Sunday, October 28, 2012

Tuesday, October 23, 2012

Farewell Advanced Statistics!

For almost four months we wrestled with the mind-bending problems of Advanced Statistics, but it really pays off when you look at your report card and see that the subject that was your lowest grade last year is now even higher than your other subjects.

Though it feels very good that Stat is now done with us, there is still a little loneliness, because we will miss this hard subject. We will miss the frequent use of the scientific calculator, which had been our greatest friend in Advanced Statistics and our help in every problem that we solved. We will also miss waiting for the photocopies of our handouts from the leader of the day, the times we had to collect or pay money for the handouts, and the times we had to text or call a classmate because we had no idea what to do with an item in the assignments and exercises. And of course we will miss our Advanced Statistics teacher, Ms. Kristin Macatigos, who will be married very soon.

Everything in this world has its beginning and its end. We are bidding goodbye to Advanced Statistics, yet the lessons learned will be applied in our everyday lives, and the memories formed during our time with it will be cherished!

Posted by: Kent Spencer Manalo Mendez

Friday, October 19, 2012

STATISTICS- All the Lessons for 2nd Grading

Second Grading is finally over. We just finished the Departmental Test last October 17-18, 2012. We found it very difficult, especially in STATISTICS, but luckily we all finished it. The test has 50 items, and we took almost two hours to answer it. I found finding the critical value very confusing because it is hard to identify which statistical test should be used.

The Departmental Test included the following topics (all the topics for 2nd Grading):

* Z-test for Proportion

* Chi-square test for a Variance or Standard Deviation

* Testing the Difference between Two Means (Large Samples)

* Testing the Difference between Variances

* Testing the Difference between Two Means (Small Samples)

* Testing the Difference between Proportions

* Correlation and Regression

* Test for Goodness of Fit

* Test Using Contingency Tables

* Analysis of Variance

Even though the different statistical tests have different formulas, they all follow the same hypothesis-testing procedure: stating the hypotheses, finding the critical value, computing the test value, making the decision, and summarizing the results. For me, hypothesis testing is the most important tool for statisticians.

By: Dustin Joshua A. Esquia III-Gold

Wednesday, October 17, 2012

Tuesday, October 16, 2012

Random Things- Oct. 15, 2012

Check this out....

By: Carl Joel E. Palma III- Gold

Testing the difference between two variances- Sept. 7, 2012

According to what I've learned...

Characteristics of F-distribution

-The values of F cannot be negative
-The distribution is positively skewed
-The mean value of F is approximately equal to 1
-The F distribution is a family of curves based on the degrees of freedom of the variance in the numerator and the degrees of freedom of the variance in the denominator.

Formula for F-test

F = s₁² / s₂², where s₁² is the larger of the two sample variances. The degrees of freedom are n₁ - 1 for the numerator and n₂ - 1 for the denominator.

Use of F-test
1. The larger variance should always be designated as variance sub 1 and placed in the numerator of the formula.
2. For a two-tailed test, the alpha must be divided by 2 and the critical value placed on the right side of the F curve.
3. If the standard deviations instead of the variances are given in the problem, they must be squared for the F-test formula.
4. When the degrees of freedom cannot be found in Table H, the closest value on the smaller side should be used.

Assumptions for testing difference between two variances
1. The populations from which the samples were obtained must be normally distributed.
2. The samples must be independent of each other.
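
The rules above can be sketched in a few lines of Python. This is only an illustration: the function name `f_test_value` and the sample numbers are made up for the example, and the critical value still has to be looked up in the F table.

```python
def f_test_value(s1, n1, s2, n2):
    """Return (F, d.f.N, d.f.D) for two sample standard deviations.

    The larger variance always goes in the numerator, so F is never below 1.
    """
    var1, var2 = s1 ** 2, s2 ** 2  # standard deviations must be squared first
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


# Hypothetical samples: s = 4 with n = 21, and s = 6 with n = 16
f_value, df_num, df_den = f_test_value(4, 21, 6, 16)
print(round(f_value, 2), df_num, df_den)  # 2.25 15 20
```

Because the second sample has the larger variance, it is moved to the numerator and its n - 1 becomes the numerator degrees of freedom.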

That's all...
By: Carl Joel E. Palma III-Gold

Test for Goodness of fit- Sept. 12, 2012

Chi square goodness of fit...
-used in testing to see whether a frequency distribution fits a specific pattern
- Hypotheses:
H_o: .... show no preference ....

H₁: .... show preference....

- Degrees of Freedom

d.f. = number of categories - 1

-Formula for the Chi-Square Goodness of Fit Test

χ² = Σ (O - E)² / E, where O is the observed frequency and E is the expected frequency

-Assumptions for Chi Square Goodness of Fit

   1. The data are obtained from a random sample.

   2. The expected frequency for each category must be 5 or more.

-Procedure

   1. State the hypotheses and identify the claim.
   2. Find the critical value.
   3. Compute the test value (the sum of the (O - E)² / E values).
   4. Make the decision.
   5. Summarize the results.
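
Step 3 of the procedure can be sketched in Python as follows. The counts are hypothetical, the function name `goodness_of_fit` is invented for this example, and the null hypothesis is assumed to be "no preference", so every category gets the same expected frequency.

```python
def goodness_of_fit(observed, expected=None):
    """Return (chi-square test value, degrees of freedom)."""
    k = len(observed)
    if expected is None:
        # "No preference" null hypothesis: every category is equally likely
        expected = [sum(observed) / k] * k
    if min(expected) < 5:
        raise ValueError("each expected frequency must be 5 or more")
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, k - 1  # d.f. = number of categories - 1


# Hypothetical counts: 100 people choosing among 5 categories
chi_square, df = goodness_of_fit([32, 28, 16, 14, 10])
print(round(chi_square, 2), df)  # 18.0 4
```

With d.f. = 4 and α = 0.05 the table critical value is 9.488, so a test value of 18.0 would lead to rejecting the null hypothesis for these made-up counts.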

Oh well, that's all that I can remember...

By: Carl Joel E. Palma III-Gold

Testing the Difference Between Two Means (Small Samples) and Proportion- Sept. 10, 2012

Testing the difference between two means: small independent samples
Formulas:

Testing the difference between proportions:
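
For reference, the three tests named above are usually written as follows in introductory textbooks. The notation here is an assumption (sample means X̄, sample variances s², sample proportions p̂, pooled proportion p̄) and may not match the class handout exactly.

```latex
% Two means, small independent samples, variances assumed unequal
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

% Two means, small independent samples, variances assumed equal
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}
         {\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
          \,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}

% Two proportions
z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}
         {\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}
```

Under the null hypothesis the terms μ₁ - μ₂ and p₁ - p₂ are taken to be zero.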

That's all...

By: Carl Joel E. Palma III-Gold

Correlation and Regression- Sept. 28, 2012

According to our teacher:

Terms in Correlation and Regression

Correlation: A statistical method used to determine whether a relationship between variables exists.
Regression: A statistical method used to describe the nature of the relationship between variables.
Scatter Plot: A graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
Correlation Coefficient: Used to measure the strength of the relationship between two variables.

Types of Relationships
1. Simple relationship- there are only two variables under study.
a. Positive relationship- exists when both variables increase or decrease at the same time.
b. Negative relationship- exists when one variable increases as the other decreases.
2. Multiple relationship- many variables are under study.

Independent variable- is the variable in regression that can be controlled or manipulated
Dependent variable- is the variable in regression that cannot be controlled or manipulated

That's all...
By: Carl Joel E. Palma III- Gold

Test for Homogeneity of Proportions- Oct. 2, 2012

Test for Homogeneity of proportions
- it is used to determine whether the proportions for a variable are equal when several samples are selected from different populations

Hypotheses:

H_o: p₁=p₂=p₃

H₁: at least one proportion is different from the others

Possible decisions:

Do not reject H_o: it can be assumed that the proportions are equal and that the differences in them are due to chance.

Reject H_o: it can be assumed that the proportions are not equal.

By: Carl Joel E. Palma III- Gold

Contingency Tables- Oct. 1, 2012

Our lesson in Contingency tables:

This lesson is pretty hard though...

As what I've learned...

- Test for independence of variables is used to determine whether two variables are independent or related to each other when a sample is selected

First- State the hypotheses
ex.

H_o: The opinion about the procedure is independent...

H₁: The opinion about the procedure is dependent...

Second- To get the degrees of freedom:
(R-1)(C-1)
ex.
If there are three rows and two columns
(3-1)(2-1)= 2

Third- Find the expected value of each cell
E = (row total)(column total) / grand total
Fourth- Find the test value
using this formula
χ² = Σ (O - E)² / E

Fifth- Find the critical value
using d.f. = 2 and α = 0.05, C.V. = 5.991

Sixth- Make the decision.
Seventh- Make the summary.

By: Carl Joel E. Palma III- Gold

Scheffe and Tukey Tests- Oct. 12, 2012

Scheffe Test

-Used when the decision is to reject the null hypothesis

-You need to compare the means two at a time using all possible combinations

Formula:

Tukey Test

- According to our teacher, this test can be used after the ANOVA has been completed to make pairwise comparisons between means when the groups have the same sample size.

Formula:
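
In standard textbook notation, the two test statistics are usually written as below. The symbols are assumptions that follow common usage and may differ from the class handout: s_W² is the within-group variance from the ANOVA, k is the number of groups, n is the sample size of a group, and C.V. is the critical value already found for the ANOVA F-test.

```latex
% Scheffe test for a pair of means i and j
F_S = \frac{(\bar{X}_i - \bar{X}_j)^2}
           {s_W^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)},
\qquad F' = (k - 1)\,(\text{C.V.})

% Tukey test (all groups have the same sample size n)
q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{s_W^2 / n}}
```

For the Scheffe test, a pair of means differs significantly when F_S is greater than F'. For the Tukey test, the absolute value of q is compared with a critical value from the Tukey table.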

That's all...

By: Carl Joel E. Palma III- Gold

Monday, October 15, 2012

October 9,2012-Lesson about ANOVA (Analysis of Variance)

Next week is the schedule for our second departmental test, so we are now moving fast through our lessons so that we can cover the whole of Advanced Statistics.

Now we’re having the lesson about Analysis of Variance (ANOVA). This lesson will be useful for our Advanced Research next school year, so we should understand it well. Here are some key points for this lesson:

*The z-test and t-test should not be used when three or more means are compared; instead, the F-test is used to compare three or more means. (ANOVA)

*When the null hypothesis is rejected, the most commonly used tests for finding where the difference lies are the Scheffe test and the Tukey test.

*Between-Group Variance (S_B²) or Mean Square Between Groups (MS_B) – the first estimate; it involves finding the variance of the group means.

*Within-Group Variance (S_W²) or Mean Square Within Groups (MS_W) – the second estimate; it is made by computing the variance using all the data and is not affected by differences in the means.

*One-Way Analysis of Variance – used to compare three or more means when only one variable is involved.

*Two-Way Analysis of Variance – ANOVA that involves two variables.

*No difference in the means: the between-group variance estimate will be approximately equal to the within-group variance estimate; the F test value will be approximately equal to 1; the null hypothesis will not be rejected.

*Means differ significantly: the between-group variance will be much larger than the within-group variance; the F test value will be significantly greater than 1; the null hypothesis will be rejected.

*k=number of groups

*N=sum of the sample sizes for groups

*SS_B =sum of squares between groups

*SS_W =sum of squares within groups (sum of squares for the error)

*MS_B =Between-group variance (S_B²) = SS_B/(k-1)

*MS_W =Within-group (error) variance (S_W²) = SS_W/(N-k)

*Scheffe Test – one must compare the means two at a time, using all possible combinations of means.

*Tukey Test can also be used after the analysis has been completed to make pairwise comparisons between means when the groups have the same sample size. The symbol for the Tukey test is q.

Analysis of Variance includes a wide range of symbols and ideas and needs a lot of understanding. Our knowledge of ANOVA will be used in our studies when we are in 4th year.

Posted by: Kent Spencer Manalo Mendez

Chi-Square and Tests of Contingency Tables

Hypothesis tests may be performed on contingency tables in order to decide whether or not effects are present. Effects in a contingency table are defined as relationships between the row and column variables; that is, the levels of the row variable are differentially distributed over the levels of the column variable. Significance in this hypothesis test means that interpretation of the cell frequencies is warranted. Non-significance means that any differences in cell frequencies could be explained by chance.

Test for Independence

One of the most frequent uses of χ² is for testing the null hypothesis that two criteria of classification, when applied to a population of subjects (or objects), are independent. Two criteria of classification are said to be independent if the distribution of one criterion in no way depends on the distribution of the other. If the two criteria of classification are not independent, there is an association between them.

When data are arranged in table form for the chi-square independence test, the table is called a contingency table. The table is made up of R rows and C columns.

Degrees of freedom = (R-1)(C-1)

Expected value = (row total)(column total) / grand total

Example:

Problem 1. A market research firm wishes to know whether it can conclude that, for adults in a certain city, the brand of car driven is associated with the driver's area of residence. A random sample of 500 adult drivers is interviewed to determine what brand of car they drive and in what area they live. Test the claim at α=0.05.

The table shows the results:

Step 1.

Ho: The brand of car and the area of residence are independent of each other.

H1: The brand of car and the area of residence are dependent on each other. (claim)

Step 2.

d.f. = (R-1)(C-1) = (2)(2) = 4

C.V. = 9.488

Step 3. Using the formula:

χ² = Σ (O - E)² / E = 19.82

Step 4.

Reject the null hypothesis, since 19.82 > 9.488.

There is enough evidence to support the claim.

__________________________________________________________________________________

The chi-square test of significance is useful as a tool to determine whether or not it is worth the researcher's effort to interpret a contingency table. A significant result of this test means that the cells of a contingency table should be interpreted. A non-significant test means that no effects were discovered and chance could explain the observed differences in the cells. In this case, an interpretation of the cell frequencies is not useful.

__________________________________________________________________________________

By Tito Nuevacobita Jr.

III-GOLD

Saturday, October 13, 2012

Simple Regression and Correlation Analysis

Statisticians frequently must estimate how one variable is related to or affected by another variable. A firm may need to determine how its sales are related to the gross national product, or how its total production costs are related to its output rate. To estimate such relationships, they use regression and correlation analysis.

One reason regression is important, particularly in business and economics applications, is that it can be used to forecast variables. Almost all companies and government institutions frequently forecast variables such as product demand, interest rates, inflation rates, prices of raw materials, and labor costs.

The technique involves developing a mathematical equation that describes the relationship between the variable to be forecast and the variables that the statistician believes are related to it. The variable to be forecast is called the dependent variable and is denoted by Y, while the related variables are called the independent variables and are denoted by X.

Regression Analysis

Regression analysis describes the way in which one variable is related to another. Regression analysis derives an equation which can be used to estimate the unknown value of one variable on the basis of the known value of the other variable. For example, suppose that a hosiery mill is scheduled to produce four tons of output next month and wants to estimate what its cost will be. In this case, although the mill's output is known, its cost is unknown.

Regression analysis can be used to estimate the value of the cost on the basis of the known value of output. Regression analysis can also be used to estimate the level of capital expenditures required to establish a plant with a certain capacity. In the case of the hosiery mill, if the plant’s capacity were known, regression analysis could be used to predict the firm’s level of expenditure.

Suppose that the firm collects such data for a sample of nine months, the results being shown in Table I below:

Output (tons)	Production cost (thousands of dollars)
1	2
2	3
4	4
8	7
6	6
5	5
8	8
9	8
7	6

There are two ways of solving this problem. The first method uses a graphical approach and provides a rough estimate. The second method uses the regression formula and gives a computed estimate of Y when X is 12.

The first method employs the scatter diagram. In this diagram, the known variable X (the monthly output rate) is plotted along the horizontal axis and is called the independent variable. The unknown variable Y (the monthly cost) is plotted along the vertical axis and is called the dependent variable.

After plotting the points corresponding to the nine pairs of X and Y, the next step is to draw the trend line. This line is drawn so that it approximates the general direction of the plotted points and passes as close to them as possible.

Let us now estimate the value of Y when X is 12 by using the trend line. First we draw a vertical line through X=12 until it cuts the trend line. At the point where this vertical line and the trend line intersect, we draw a horizontal line until it cuts the Y axis. The point on the Y axis through which this horizontal line passes is our rough estimate of Y when X=12.

Note that this estimate of Y will vary slightly depending on how accurately the trend line was drawn. In our example, the estimate using the graphical method is near 11.

The second method makes use of the equation of the Least Squares Regression Line or LSRL for short.

y =a + b x

Where:

y =dependent variable

x =independent variable

a =y-intercept

b =slope of the line

For the data in the table:

a = 1.268

b = 0.752

y = a + bx

If x = 12:

y = 1.268 + 0.752(12)

y = 10.29, the estimated cost (in thousands of dollars) when the output is 12 tons
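
The values of a and b can be checked with a short Python sketch. It uses the standard textbook least squares formulas, b = [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²] and a = [Σy - b(Σx)] / n, which are not written out in the lesson itself; the function name `least_squares` is made up for this example.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return a, b


# The nine months of data from the hosiery mill table
output = [1, 2, 4, 8, 6, 5, 8, 9, 7]  # tons
cost = [2, 3, 4, 7, 6, 5, 8, 8, 6]    # thousands of dollars

a, b = least_squares(output, cost)
estimate = a + b * 12
print(round(a, 3), round(b, 3), round(estimate, 2))  # 1.268 0.752 10.29
```

Note that X = 12 lies outside the range of the observed outputs (1 to 9 tons), so the estimate assumes the same straight-line trend continues beyond the data.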

Correlation

If two variables are related in such a way that the points of a scatter diagram tend to fall in a straight line, then we say that there is an association between the variables and that they are linearly correlated. The most common measure of the strength of the association between the variables is the Pearson correlation coefficient, denoted by r, which is given by the formula:

r = [n(Σxy) - (Σx)(Σy)] / √{[n(Σx²) - (Σx)²][n(Σy²) - (Σy)²]}

Problem:

A laboratory wishes to study the relationship between the dose of a growth stimulant and weight gain for laboratory animals. Seven animals of the same sex, age, and size are selected and randomly assigned to one of seven dosage levels of the growth stimulant.

Plot the data in the scatter diagram and calculate the correlation between the two variables.

Rounding Rule: Round the value of r to three decimal places.

Thus we have r=0.973, which is a very high positive correlation.

Interpretation of r:

The value of r will always be between -1 and +1. The closer it is to either -1 or +1, the stronger the linear relationship between X and Y. If r=0, then X and Y are not linearly related.

The figures below illustrate three scatter diagrams for various values of r. If r is positive, then there is a positive linear relationship between the two variables, meaning that as one variable increases the other variable also increases. If r is negative, then there is a negative linear relationship, meaning that as one variable increases the other variable decreases.

According to Garrett’s interpretation of coefficient of correlation:

An r=0 denotes no correlation;

An r from ±0.01 to ±0.20 denotes an indifferent or negligible relationship;

An r from ±0.21 to ±0.40 denotes low correlation, present but slight;

An r from ±0.41 to ±0.70 denotes a substantial or marked relationship;

An r from ±0.71 to ±0.99 denotes high to very high correlation;

An r=±1.00 denotes perfect correlation.
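
As a rough sketch, the Pearson formula and the interpretation scale can be combined in Python. The growth stimulant data are not reproduced in this post, so the example reuses the hosiery mill output and cost figures from the regression section instead. The function names `pearson_r` and `garrett_label` are invented for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the raw-sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def garrett_label(r):
    """Map r to a verbal description, ignoring the sign."""
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size <= 0.20:
        return "indifferent or negligible relationship"
    if size <= 0.40:
        return "low correlation, present but slight"
    if size <= 0.70:
        return "substantial or marked relationship"
    if size < 1:
        return "high to very high correlation"
    return "perfect correlation"


output = [1, 2, 4, 8, 6, 5, 8, 9, 7]  # tons
cost = [2, 3, 4, 7, 6, 5, 8, 8, 6]    # thousands of dollars

r = pearson_r(output, cost)
print(round(r, 3), garrett_label(r))  # 0.985 high to very high correlation
```

The sign of r tells the direction of the relationship, while the label depends only on its size, so r = -0.973 and r = 0.973 would receive the same description.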

By: Tito Nuevacobita Jr.

III-GOLD

Friday, October 12, 2012

Statistics in poetry

I come into class

feeling real clever

for today's test

I'm as ready as ever.

As I pull out my paper

And ready my pen .

My teacher looks at me with confidence

Knowing I can

She writes on the board ,

The problems to solve

They look real easy

no trouble at all

Solving for mean and variance

And two columns of summation

I finish it all

And let out a loud "phew"

I hand my paper

Which my teacher accepts with a smile

I leave the room happy

knowing it was all worthwhile.

BY: JOREY MARK A. MILLAMENA
OCTOBER 10,2012

ANOVA- October 11, 2012

This topic is our last one.
Even though it is easy, it is also time-consuming work.
You need to:

1st. State the hypotheses and identify the claim.
2nd. Find the critical value.
3rd. Compute the test value: get the mean and variance of each sample, and find the grand mean.
4th. Find the between-group variance and the within-group variance.
5th. Find the F-test value.
6th. Make the decision.
7th. Summarize the result.
8th. Construct the summary table for ANOVA.

There are also tests like the Scheffe test and the Tukey test that you need to do if your decision is to reject the null hypothesis.
The Scheffe test will be long if you have many groups, because every pair of means must be compared.
It is a great topic to learn because we can use it in our research proposals.

By: Carl Joel E. Palma III- Gold

Analysis of Variance

-used to compare three or more means when only one variable is involved.

-used instead of the t-test because all the means are compared simultaneously

With the F-test, two different estimates of the population variance are made:

1. Between-Group variance- involves finding the variance of the means

2. Within-Group variance- made by computing the variance using all the data and is not affected by the differences in the means.

In the analysis of variance, you also follow the hypothesis-testing steps.

1. State the hypotheses

H₀: μ₁ = μ₂ = μ₃ = ... = μₖ

H₁: at least one mean is different from the others

2. Find the Critical Value

d.f.N = k-1, where k is the number of groups

d.f.D = N-k, where N is the sum of the sample sizes of the groups

3. Compute the test value

a. Find the mean and variance of each sample.

b. Find the grand mean (the mean of all the values in the samples), which is obtained by adding all the data values and dividing by the total number of values, N.

c. Find the between-group variance.

d. Find the within-group variance.

e. Find the F-test value, which is obtained by dividing the between-group variance by the within-group variance.

4. Make the decision

5. Summarize the result
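
Step 3 can be sketched in Python as shown below. The three groups are hypothetical numbers chosen so that the arithmetic is easy to follow by hand, and the function name `one_way_anova` is invented for this example. The between-group variance is computed as SS_B/(k-1) and the within-group variance as SS_W/(N-k).

```python
def one_way_anova(groups):
    """Return (F, d.f.N, d.f.D) for a list of samples."""
    k = len(groups)                                # number of groups
    n_total = sum(len(g) for g in groups)          # N, sum of the sample sizes
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups: spread of the group means
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Sum of squares within groups: spread of the data around each group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )

    ms_between = ss_between / (k - 1)      # between-group variance
    ms_within = ss_within / (n_total - k)  # within-group variance
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical data: three groups of three values each
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value, df_num, df_den = one_way_anova(groups)
print(round(f_value, 2), df_num, df_den)  # 12.0 2 6
```

Here the group means are 5, 7, and 9 and the grand mean is 7, so SS_B = 24, SS_W = 6, and F = 12/1 = 12. The F-test value is then compared with the critical value from the F table for d.f.N = 2 and d.f.D = 6.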

Note: If the null hypothesis is rejected, that is, there is a significant difference, you can find where the difference lies by performing the Scheffe or Tukey test.

Scheffe test

-one must compare the means two at a time, using all possible combinations of means

Tukey test

-used only if the groups have the same sample size

By:DUSTIN JOSHUA A. ESQUIA III-GOLD

Wednesday, October 10, 2012

A GUITAR OF STATISTICS

Statistics is not just a subject. The way I treat it is not ordinary. Here is a little of how I treat it:

This subject gives me a thrill

That seems always I feel

Every moment with this

Was not just an ease

With science it fizz

That you can't easily dismiss

Like any of the criminal case

It deserves a proper time and place.

Treatment deals with what you feel; it can be your opinion or something else.

BY: JOREY MARK A. MILLAMENA
OCTOBER 10,2012

Monday, October 8, 2012

Test Using Contingency Tables

Test for homogeneity of proportions

-is used to determine whether the proportions for a variable are equal when several samples are selected from different populations.

Example:
A researcher selected a sample of 150 senior citizens from three different barangays (Brgy.) and asked each senior citizen, "Do you avail of the discount for senior citizens?" The data are shown in the table, with the expected values in parentheses. At α=0.05, test the claim that the proportion of senior citizens who avail of the discount is the same in all barangays.

	Brgy.1	Brgy.2	Brgy.3	Total
Yes	20(17.33)	10(17.33)	22(17.33)	52
No	30(32.67)	40(32.67)	28(32.67)	98
Total	50	50	50	150

State the Hypothesis
*Ho: p1=p2=p3 (claim)
H1: at least one proportion is different from the others

Find the Critical Value
*d.f=(r-1)(c-1)
d.f=(2-1)(3-1)=2, C.V=5.991

Compute the Test Value.
*Summation of (Observed value - Expected value)² / Expected value
=7.30
Note: (To solve for an expected value, multiply the row total by the column total and divide by the grand total. For example, E1,1=(52 x 50)/150=17.33. Repeat the step for every cell until you reach the expected value of the last row and last column.)

Make the decision:
*Reject the null hypothesis since 7.30>5.991

Summarize the results
*There is enough evidence to reject the claim
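
The test value can be checked with a small Python sketch. The function name `chi_square_table` is made up for this example, and the grand total is taken as the sum of the six observed cells (20 + 10 + 22 + 30 + 40 + 28 = 150), which is also the sum of the row totals and of the column totals.

```python
def chi_square_table(observed):
    """Return (chi-square test value, d.f.) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value = (row total)(column total) / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            chi_square += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1)
    return chi_square, df


# Rows: Yes, No. Columns: Brgy.1, Brgy.2, Brgy.3
observed = [[20, 10, 22],
            [30, 40, 28]]

chi_square, df = chi_square_table(observed)
print(round(chi_square, 2), df)  # 7.3 2
```

Since 7.30 is greater than the critical value 5.991 at d.f. = 2 and α = 0.05, the null hypothesis is rejected.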

by: Dustin Joshua Esquia III-Gold