Introduction to Statistics
- Homework:
-
If attendance drops while a growing number of homework
assignments show up in the grader's box during class time,
assignments will no longer be posted. Come to class to find
out what is due and when.
-
Scores
- Institutional Review Board:
- R/S-Plus Code:
- The following can be "source"d into R or
"restore"d into S-Plus.
- Data Files:
- NCSS "Sample" data in Excel and SPSS .SAV formats, as well as a zipped R workspace.
- Height-Weight data from NCSS in Excel format and as a zipped R workspace.
- R/S-Plus "Singer"
data in Excel format.
- Peripheral Vision data in
Excel and
CSV
formats. The CSV file can be read into R using
vision=read.csv("http://newton.uor.edu/facultyfolder/jim_bentley/downloads/math111/vision.csv")
- "Hospital
Survival" data in Excel format and as a
Zip
file.
- Anscombe's Data:
Excel
file with data and graphs. NCSS data only in
.S0
and .S1
files (you will need both).
- El-Far'ah Surface Survey Data:
Excel
file containing data. The first two rows of data should be
discarded since the data from the two circles were mixed when a cat spilled one
box in which the sherds were drying into the other.
- Phone Satisfaction:
Excel
file with data and graphs. NCSS data only in
.S0
and .S1
files (you will need both).
- Sun Spots:
Excel
file with data and graphs. NCSS data only in
.S0
and .S1
files (you will need both).
- Cauchy: Random values from Cauchy and normal
distributions. See if you can determine which columns are
from which distribution. Try histograms and normal
probability plots. In
Excel
format, or in NCSS .S0
and .S1 files
(you will need both).
- Six Points:
Six hypothetical points. In
Excel
format, or in NCSS
.S0
and .S1 files
(you will need both).
- Linear-Quadratic:
Data to test your ability to see
trends in the residuals. In
Excel
and SPSS formats, or in NCSS
.S0
and .S1 files
(you will need both).
- Heteroscedasticity:
Data to test your ability to see
trends in the residuals. In
Excel
and SPSS formats, or in NCSS
.S0
and .S1 files
(you will need both).
- Outliers and Influential Points:
Data to test your ability to see
trends in the residuals. In
Excel
and SPSS formats, or in NCSS
.S0
and .S1 files
(you will need both).
- Explanatory vs Response:
Data to help you see what
happens when the explanatory and response variables are swapped
in a linear regression. In
Excel
and SPSS formats, or in NCSS
.S0
and .S1 files
(you will need both).
- Racquetball
Data: Data in
Excel,
SAS, and NCSS .S0
and .S1
files (you will need both).
- Boxer
Data: Data in
SAS, and NCSS .S0
and .S1
files (you will need both).
A description of the data is contained in this
PDF file.
The data is also packaged as a
Zip
file.
- Titanic data in an
Excel
file.
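The Cauchy exercise above suggests histograms and normal probability plots. A minimal sketch in R, assuming simulated columns rather than the course file (the names `x_norm`, `x_cauchy`, and `tail_ratio` are hypothetical and do not come from the Excel or NCSS files):

```r
# Simulated stand-ins for two columns of the Cauchy data set (assumption:
# the real file's column names and sample sizes are not known here).
set.seed(111)
x_norm   <- rnorm(200)    # standard normal draws
x_cauchy <- rcauchy(200)  # standard Cauchy draws (very heavy tails)

# Histograms and normal probability plots, side by side.
op <- par(mfrow = c(2, 2))
hist(x_norm,   main = "Column A: histogram")
hist(x_cauchy, main = "Column B: histogram")
qqnorm(x_norm,   main = "Column A: normal probability plot"); qqline(x_norm)
qqnorm(x_cauchy, main = "Column B: normal probability plot"); qqline(x_cauchy)
par(op)

# A rough numerical companion to the plots: how far the most extreme value
# sits from the median, measured in interquartile ranges.
tail_ratio <- function(x) max(abs(x - median(x))) / IQR(x)
tail_ratio(x_norm)
tail_ratio(x_cauchy)
```

Points that peel sharply away from the reference line at both ends of a normal probability plot are the usual sign of tails heavier than the normal's.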
- Handouts: The following were handed out in
class.
-
Sampling
Distributions of Means: Output from SPSS showing
histograms of the sampling distributions of the means of the
Weight data from the NCSS Sample data set.
-
Testing Statistical
Hypotheses: A short discourse on the basics of testing
statistical hypotheses. Makes reference to the penny flop
experiment.
-
Testing the
Population Mean: A short, canned example of testing the
population mean.
- Testing if Mean Weight is 150 lbs: Output from
NCSS,
SPSS, and
SAS
showing how to test if the population mean of the weight data in
the SAMPLE data set is 150 lbs. All three include
t-tests. SPSS and SAS show how to use regression to get
the same result.
- Testing if Group Means are Equal: Output from
NCSS
showing how to test if the population means of the weight data in
the SAMPLE data are equal for groups 1 and 2. Includes
t-test and how to use regression to get
the same result.
- Testing if Group Means are Equal Correcting for Height: Output from
NCSS
showing how to test if the population means of the weight data
in the SAMPLE data are equal for groups 1 and 2.
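The handouts above note that a t-test and a regression give the same result. A minimal sketch of that equivalence in R, assuming simulated weights and group labels in place of the SAMPLE data (the variable names are hypothetical):

```r
# Hypothetical weights and group labels standing in for the SAMPLE data set
# (assumption: these values are simulated, not the course data).
set.seed(150)
weight <- rnorm(20, mean = 155, sd = 20)
group  <- factor(rep(1:2, each = 10))

# One-sample t-test of H0: mean weight = 150 lbs.
t_one <- t.test(weight, mu = 150)

# Same test via regression: fit an intercept-only model to (weight - 150);
# the intercept's t statistic tests whether the mean differs from 150.
fit_one   <- lm(I(weight - 150) ~ 1)
t_reg_one <- coef(summary(fit_one))["(Intercept)", "t value"]

# Two-sample (pooled-variance) t-test of equal group means.
t_two <- t.test(weight ~ group, var.equal = TRUE)

# Same test via regression on the group indicator; the slope's t statistic
# matches the pooled t statistic up to sign.
fit_two   <- lm(weight ~ group)
t_reg_two <- coef(summary(fit_two))["group2", "t value"]

c(t_test = unname(t_one$statistic), regression = t_reg_one)
c(t_test = unname(t_two$statistic), regression = t_reg_two)
```

The sign flips in the two-group case because `t.test` reports group 1 minus group 2, while the regression slope is group 2 minus group 1.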
- Demos: The following demonstrations were used in
class.
- Penny Flop
Calculator:
Excel file to help look at the power of a test as demonstrated
in testing the "fairness" of coins.
- Java Applets:
- Pass the Oinkers:
A fun way to get a feel for
probabilities and expected values.
Play
the game here or download
everything you need to play it on your Java enabled machine.
-
Histogram - check the effect of bin size on Old Faithful data
(Webster West - S. Carolina).
-
N=111
Exams Histogram - apply Webster's sliding bin size to our
class example.
-
Calculating normal probabilities - just click or slide the
boundaries to find the probability of the shaded region - or try
the more
Accurate Normal Calculator where you enter the endpoints
numerically. Both applets are demos from Gary McClelland's
Seeing Statistics
project.
-
Guessing Correlations - a neat "game" to show the relationship
between correlations and scatterplots. Part of the CUWU
Statistical Program at Illinois-Champaign-Urbana.
-
Drawing a Regression Line by "Eye" - Click the "Begin" box to
bring up a scatterplot. Use your mouse to draw a line on the
scatterplot. The MSE (i.e. the "average" squared error) is
computed. Check the minimum MSE and see how close you can get.
Click the box to see the least squares line. You can also guess
and check the correlation. This applet is part of the Rice
Virtual Lab in Statistics. (Note: Netscape 4.06 or better is
required for Java 1.1)
-
Influence in Regression - see the effects of adding an outlier
to a least squares line. (Webster West - S. Carolina).
-
Normal Approximation to Count data - see how the distribution
of counts (binomial) in a sample relates to the sample size (n) and
proportion (p). Note: The sample proportion is just count/n so
this helps see how well the normal curve fits the sampling
distribution of a sample proportion. Try it for n=12 and p=0.1,
0.2, 0.4, 0.5, 0.6, 0.8, and 0.9. (Rice Virtual Lab in Statistics)
-
Confidence Interval Simulator - (CUWU Statistical Program)
You will first need to define your "population" by specifying the
values and their probabilities.
- For a quick demo, choose the "Die" option and a number of
sides. Click on "Accept Box" to see the population model.
- To simulate CI's for a proportion, choose the "Coin" option
and specify the population p. Again click on the "Accept Box" to
see the population model.
- Click on the "Confidence Intervals" button, then specify n,
CI level and the number of intervals to simulate.
-
Confidence Interval Simulator - (Rice Virtual Lab in Statistics).
Simulates samples of size n=10, 15, or 20 from a population with
mean 50 and std. dev. 10. Window shows confidence intervals for
100 samples, highlighting those that miss 50 at a 95% or 99%
level.
-
Sampling
Distribution and Confidence Interval Simulator:
Link to
the Garfield, delMas, and Chance web page "Tools for
Teaching and Assessing Statistical Inference" which
contains a sampling distribution/CI simulator.
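The confidence interval simulators above can be imitated in a few lines of R. This is a sketch in the spirit of the Rice applet (mean 50, std. dev. 10, 100 samples), assuming a normal population and t-based intervals; the applet's own method is not documented here, and `simulate_ci` is a hypothetical helper name:

```r
# Simulate confidence intervals for a mean: population mean 50,
# standard deviation 10, 100 samples of size n.
set.seed(50)
simulate_ci <- function(n = 10, reps = 100, mu = 50, sigma = 10, level = 0.95) {
  t(replicate(reps, {
    x  <- rnorm(n, mean = mu, sd = sigma)  # assumption: normal population
    ci <- t.test(x, conf.level = level)$conf.int
    c(lower = ci[1], upper = ci[2])
  }))
}

ci95   <- simulate_ci(n = 10, level = 0.95)
misses <- ci95[, "lower"] > 50 | ci95[, "upper"] < 50
sum(misses)    # number of the 100 intervals that miss 50
mean(!misses)  # observed coverage
```

Rerunning with `level = 0.99`, or with n = 15 or 20, shows how the number of misses and the interval widths change.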
- Data collection: The use of the following will be
explained in class.
- Racquetball toss: Excel
spreadsheets for data collection.
Because we aren't supposed to hurt the walls of the "new" building, I am making a
copy of a previous class's data available as an Excel file.
- Penny Flop Outcome Calculator: Excel spreadsheet for
calculating probabilities associated with the penny flop
experiment.
- Subliminal Math: Data from Moore for looking at the
effect of a positive subliminal message upon performance on a
math exam. In Excel
and SPSS formats, or in NCSS
.S0
and .S1 files
(you will need both).
-
Randomized
response: PDF of slides discussing the implementation of
randomized response sampling.
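For a feel of how randomized response recovers a proportion, here is a minimal simulation in R. It assumes Warner's design, in which a private random device sends each respondent either to the direct question or to its complement; the slides may use a different scheme, and every value below is hypothetical:

```r
# Warner-style randomized response (assumption: the slides may use a
# different design, such as a forced-response or unrelated-question scheme).
set.seed(2024)
n       <- 1000  # hypothetical number of respondents
p       <- 0.7   # chance the device selects the direct question (must not be 0.5)
pi_true <- 0.2   # hypothetical true proportion with the sensitive trait

has_trait <- rbinom(n, 1, pi_true) == 1
direct_q  <- rbinom(n, 1, p) == 1   # TRUE: asked "Do you have the trait?"
says_yes  <- ifelse(direct_q, has_trait, !has_trait)

# P(yes) = p*pi + (1 - p)*(1 - pi), so solve for pi.
lambda_hat <- mean(says_yes)
pi_hat     <- (lambda_hat + p - 1) / (2 * p - 1)
se_hat     <- sqrt(lambda_hat * (1 - lambda_hat) / n) / abs(2 * p - 1)
c(estimate = pi_hat, std_error = se_hat)
```

The interviewer never learns which question an individual answered, yet the overall proportion is still estimable. The price is a larger standard error than a direct survey of the same size.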
- Utilities:
- R
is a freeware "statistical package". It has been
compiled to run under Windows, Mac OS X, and Linux --
other versions are probably around. R is actually
a vectorized, object oriented programming language with
a large library of statistical functions already written
for it. It is used in a number of graduate
programs and companies. R is known for its
flexibility and its presentation quality graphics.
Not-guaranteed-to-be-most-recent versions for
Windows
and Mac OS X
(install tcltk before R, see below)
as well as the
Windows installation notes are available locally by
clicking on the appropriate version. If possible,
you should download the executables directly from the
CRAN.
Some documentation can be found
here,
here, and
here.
You will probably want to install and load the "lattice"
and "Rcmdr" packages. Use the RunOnce.R
file in this
zip file to install the packages.
- If you are running Mac OS X, before
installing R (and Rcmdr) you need to have tcltk installed.
This package can be found on the CRAN under
R for
Mac OS X - Development Tools and Libraries
- Save
RunOnce.zip
to disk
- Extract RunOnce.R from RunOnce.zip
- Start R
- Click on File-Source R Code
- Select RunOnce.R
- Select a CRAN Mirror in the USA
- Click OK
- At the command prompt, ">", enter q()
- When asked if you want to save the
workspace, click on YES
- More hints for Mac people.
Rcmdr will import SPSS files in SAV format. Moore, however,
has these saved as POR files, so search for *.POR
and select the appropriate file.
Documents showing examples of
data entry,
numerical descriptives,
and graphics
are available from these links. Help with
GLM,
Logistic
Regression, and
CART is also available.
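The package installation steps listed earlier can also be done from the R prompt. The sketch below is a hypothetical stand-in for RunOnce.R, whose actual contents are not shown on this page; it assumes only that the goal is to install "lattice" and "Rcmdr", and `missing_pkgs` is an illustrative helper name:

```r
# Hypothetical stand-in for RunOnce.R (assumption: the actual script's
# contents are not shown on this page).
pkgs <- c("lattice", "Rcmdr")

# Which of the requested packages are not yet installed?
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_pkgs(pkgs)
if (length(to_install) > 0 && interactive()) {
  # Prompts for a CRAN mirror if none has been set in this session.
  install.packages(to_install)
}

# Once installed, load a package with library(), e.g. library(lattice).
```

The `interactive()` guard keeps the script from attempting a download when it is run unattended.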
- The very much not free version of R is S-Plus,
which has a somewhat nicer front end and about a $2000
street price. Academics get a discount, and
students can get a
free version. For those who care, S came first
(Bell Labs), then S-Plus showed up (work at UW and then
Insightful), and then the S people came back with R.
- NCSS
(Number Cruncher Statistical Software) is not free.
However, there is a trial version that works for 7 days. NCSS is available on
campus.
- SAS
(Statistical Analysis System) is used by many Fortune
500 companies. It contains both analytical and
data management tools. However, its graphics are
weak. It is also very expensive -- particularly to
small liberal arts colleges. A working knowledge
of this package can definitely get you a job.
-
Randomizer:
Web based randomizer for selecting samples from your
population. I'm not sure of the quality of their random
number generator, so use this at your own risk.
-
Surface survey