Introduction to Statistics
- Homework:
Available via
WileyPLUS. Please submit your PDFs
into Moodle.
- Institutional Review Board:
- R/S-Plus Code:
- The following can be
"source"d into R or "restore"d into
S-Plus.
- Data Files:
- Textbook Data
- NCSS "Sample" data
in Excel
and SPSS
.SAV formats as well as a
zipped R workspace.
- Height-Weight data
from NCSS in Excel
format and as a
zipped R workspace.
- R/S-Plus "Singer"
data in Excel format.
- Peripheral Vision
data in Excel
and CSV
formats. The CSV file can be read
into R using
vision=read.csv("http://newton.uor.edu/facultyfolder/jim_bentley/downloads/math111/vision.csv")
- "Hospital
Survival" data in Excel format and
as a Zip
file.
- Anscombe's Data: Excel
file with data and graphs.
NCSS data only in .S0
and .S1
files (you will need both).
- El-Far'ah Surface Survey
Data: Excel
file containing data. The first two
rows of data should be discarded since the
data from the two circles were mixed when
a cat spilled one box in which the sherds
were drying into the other.
- Phone Satisfaction: Excel
file with data and graphs.
NCSS data only in .S0
and .S1
files (you will need both).
- Sun Spots: Excel
file with data and graphs.
NCSS data only in .S0
and .S1
files (you will need both).
- Cauchy: Random values from Cauchy and
normal distributions. See if you can
determine which columns are from which
distribution. Try histograms and
normal probability plots. In Excel
format, or in NCSS .S0
and .S1
files (you will need both).
- Six Points:
Six hypothetical
points. In Excel
format, or in NCSS .S0
and .S1
files (you will need both).
- Linear-Quadratic:
Data to test your ability to
see trends in the residuals. In Excel
and SPSS
formats, or in NCSS .S0
and .S1
files (you will need both).
- Heteroscedasticity: Data
to test your ability to see trends in the
residuals. In Excel
and SPSS
formats, or in NCSS .S0
and .S1
files (you will need both).
- Outliers and Influential
Points: Data
to test your ability to see trends in the
residuals. In Excel
and SPSS
formats, or in NCSS .S0
and .S1
files (you will need both).
- Explanatory vs Response:
Data to help
you see what happens when the explanatory
and response variables are swapped in a
linear regression. In Excel
and SPSS
formats, or in NCSS .S0
and .S1
files (you will need both).
- Racquetball Data:
Data in Excel,
SAS,
and NCSS .S0
and .S1
files (you will need both).
- Boxer Data: Data
in SAS,
and NCSS .S0
and .S1
files (you will need both). A
description of the data is contained in
this PDF
file. The data is also packaged as a
Zip
file.
- Titanic data in an Excel
file.
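For the Cauchy exercise above, the following is a minimal R sketch of the suggested histograms and normal probability plots. It uses simulated values rather than the course spreadsheet; the seed, the sample size of 100, and the variable names are assumptions made only for illustration.

```r
# Simulated stand-ins for the spreadsheet columns (NOT the course data).
set.seed(111)                 # arbitrary seed so the plots are reproducible
n <- 100                      # assumed column length
normal_col <- rnorm(n)        # standard normal values
cauchy_col <- rcauchy(n)      # standard Cauchy values (very heavy tails)

# Histograms on top, normal probability (Q-Q) plots underneath.
old_par <- par(mfrow = c(2, 2))
hist(normal_col, main = "Normal: histogram")
hist(cauchy_col, main = "Cauchy: histogram")
qqnorm(normal_col, main = "Normal: normal probability plot")
qqline(normal_col)
qqnorm(cauchy_col, main = "Cauchy: normal probability plot")
qqline(cauchy_col)
par(old_par)                  # restore the previous plot layout
```

Points that bend sharply away from the reference line at both ends of a normal probability plot suggest heavier tails than the normal distribution.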
- Handouts: The
following were handed out in class.
- Demos: The following
demonstrations were used in class.
- Penny Flop
Calculator: Excel
file to help look at the power of a
test as demonstrated in testing the
"fairness" of coins.
- Java Applets:
- Pass the
Oinkers: A fun way to get a
feel for probabilities and
expected values. Play
the game here or download
everything you need to play it
on your Java enabled machine.
-
Histogram - check the effect
of bin size on Old Faithful
data (Webster West - S.
Carolina).
-
N=111 Exams
Histogram - apply Webster's
sliding bin size to our class
example.
-
Calculating
normal probabilities - just click or
slide the boundaries to find
the probability of the shaded
region - or try the more Accurate
Normal Calculator where
you enter the endpoints
numerically. Both
applets are demos from Gary
McClelland's Seeing
Statistics project.
-
Guessing
Correlations - a neat "game" to
show the relationship between
correlations and
scatterplots. Part of the
CUWU Statistical Program at
Illinois-Champaign-Urbana.
-
Drawing a
Regression Line by "Eye" - Click the "Begin"
box to bring up a
scatterplot. Use your
mouse to draw a line on the
scatterplot. The MSE
is computed (i.e. the
"average" squared
error). Check the
minimum MSE and see how close
you can get. Click the
box to see the least squares
line. You can also guess
and check the
correlation. This applet
is part of the Rice Virtual
Lab in Statistics.
(Note: Netscape 4.06 or
better is required for Java
1.1)
-
Influence in
Regression - see the effects of
adding an outlier to a least
squares line. (Webster West -
S. Carolina).
-
Normal
Approximation to Count data - see how the
distribution of
counts (binomial) in a sample
relates to the sample size (n)
and proportion (p).
Note: The sample
proportion is just count/n so
this helps see how well the
normal curve fits the sampling
distribution of a sample
proportion. Try it for
n=12 and p=0.1, 0.2, 0.4, 0.5,
0.6, 0.8, and 0.9. (Rice
Virtual Lab in Statistics)
-
Confidence
Interval Simulator - (CUWU
Statistical Program) You will first
need to define your
"population" by specifying the
values and their
probabilities.
- For a quick
demo, choose the "Die"
option and a number of
sides. Click on "Accept
Box" to see the population
model.
- To simulate
CIs for a proportion,
choose the "Coin" option
and specify the population
p. Again click on the
"Accept Box" to see the
population model.
- Click on
the "Confidence Intervals"
button, then specify n, CI
level and the number of
intervals to simulate.
-
Confidence
Interval Simulator - (Rice
Virtual Lab in Statistics). Simulates
samples of size n=10, 15, or
20 from a population with mean
50 and std. dev. 10.
Window shows confidence
intervals for 100 samples,
highlighting those that miss
50 at a 95% or 99% level.
-
Sampling
Distribution and Confidence
Interval Simulator: Link to the Garfield,
delMas, and Chance web page
"Tools for Teaching and
Assessing Statistical
Inference" which contains a
sampling distribution/CI
simulator.
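The confidence interval applets above can be imitated in R. The sketch below follows the settings described for the Rice simulator (100 samples from a population with mean 50 and standard deviation 10, with n = 10 and a 95% level). It assumes t-based intervals and a normal population; the applet's own method may differ, and the seed and variable names are illustrative only.

```r
set.seed(2024)        # arbitrary seed for reproducibility
n_samples <- 100      # number of intervals to simulate
n         <- 10       # sample size (the applet offers 10, 15, or 20)
mu        <- 50       # population mean
sigma     <- 10       # population standard deviation
level     <- 0.95     # confidence level

# t-based confidence interval for one sample (an assumption, see above).
ci_for_sample <- function(x, level) {
  t.test(x, conf.level = level)$conf.int
}

# Draw the samples and compute one interval per sample.
samples   <- replicate(n_samples, rnorm(n, mean = mu, sd = sigma),
                       simplify = FALSE)
intervals <- t(sapply(samples, ci_for_sample, level = level))

# An interval "misses" when the true mean lies outside it.
misses <- intervals[, 1] > mu | intervals[, 2] < mu
cat("Intervals missing", mu, ":", sum(misses), "of", n_samples, "\n")
```

Rerunning with a different seed or with `level <- 0.99` shows how the number of misses varies from run to run.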
- Data collection: The use
of the following will be explained
in class.
- Racquetball
toss: Excel
spreadsheets for data
collection.
Because we
aren't supposed to hurt the
walls of the "new" building, I
am making a copy of a previous
class's data as an Excel
file.
- Penny
Flop Outcome Calculator:
Excel
spreadsheet for calculating
probabilities associated with
the penny flop experiment.
- Subliminal
Math: Data from Moore for
looking at the effect of a
positive subliminal message
upon performance on a math
exam. In Excel
and SPSS
formats, or in NCSS .S0
and .S1
files (you will need
both).
- Randomized
response: PDF of
slides discussing the
implementation of randomized
response sampling.
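As a rough R counterpart to the Penny Flop calculators listed above, the sketch below computes the power of a test of coin "fairness". The details of the classroom experiment are not given here, so the setup is an assumption: n independent flips, a two-sided exact binomial test of p = 0.5, and a 5% significance level. The function name `coin_test_power` is hypothetical.

```r
# Power of a two-sided exact binomial test of H0: p = 0.5.
# n      = number of flips (assumed design)
# p_true = the coin's actual probability of heads
# alpha  = significance level
coin_test_power <- function(n, p_true, alpha = 0.05) {
  k <- 0:n
  # p-value under H0 for every possible number of heads
  p_values <- sapply(k, function(x) binom.test(x, n, p = 0.5)$p.value)
  reject <- p_values <= alpha          # counts that lead to rejecting H0
  # Probability of landing in the rejection region when p = p_true
  sum(dbinom(k[reject], n, p_true))
}

# Example: 20 flips of coins with different true probabilities of heads.
sapply(c(0.5, 0.6, 0.7, 0.8), function(p) coin_test_power(20, p))
```

With `p_true = 0.5` the result is the actual size of the test, which cannot exceed `alpha`; the power grows as the coin becomes less fair.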
- Utilities:
- R
is a freeware "statistical
package". It has been
compiled to run under
Windows, Mac OS X, and Linux
-- other versions are
probably around. R is
actually a vectorized,
object oriented programming
language with a large
library of statistical
functions already written
for it. It is used in
a number of graduate
programs and
companies. R is known
for its flexibility and its
presentation quality
graphics. Not-guaranteed-to-be-most-recent
versions for Windows
and Mac
OS X (install tcltk
before R, see below) as well
as the Windows
installation notes are
available locally by
clicking on the appropriate
version. If possible,
you should download the
executables directly from
the CRAN.
Some documentation can be
found here,
here,
and here.
You will
probably want to install
and load the "lattice" and
"Rcmdr" packages.
Use the RunOnce.R file in
this
zip file to install
the packages.
- If
you are running Mac
OS X, before
installing R (and
Rcmdr) you need to
have tcltk
installed.
This package can be
found on the CRAN
under R
for Mac OS X -
Development Tools and
Libraries
- Save RunOnce.zip to disk
- Extract
RunOnce.R from
RunOnce.zip
- Start R
- Click on
File-Source R Code
- Select
RunOnce.R
- Select a
CRAN Mirror in the USA
- Click OK
- At the
command prompt, ">",
enter q()
- When
asked if you want to
save the workspace,
click on YES
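The contents of RunOnce.R are not reproduced on this page. The sketch below is a hypothetical stand-in showing one way the "lattice" and "Rcmdr" packages could be installed by hand; the function names are made up for illustration and the real script may work differently.

```r
# Hypothetical stand-in for RunOnce.R; the actual file may differ.
course_packages <- c("lattice", "Rcmdr")

# Return the packages in 'pkgs' that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Install only what is missing; R asks for a CRAN mirror if none is set.
install_course_packages <- function(pkgs = course_packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Uncomment to install, then load the packages:
# install_course_packages()
# library(lattice)
# library(Rcmdr)
```

Nothing is installed until `install_course_packages()` is called, so the block can be sourced safely to check what is missing first.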
- More
hints for Mac
people. Rcmdr will
import SPSS files in SAV
format. Moore has
these saved as POR
files. Just search
for *.POR and select the
appropriate file.
Documents
showing examples of data
entry,
numerical descriptives,
and graphics
are available from these
links. Help with GLM, Logistic
Regression, and CART is also available.
- The very much
not free relative of R is S-Plus,
which has a somewhat nicer
front end and about a $2000
street price.
Academics get a discount,
and students can get a free
version. For
those who care, S came first
(Bell Labs), then S-Plus
showed up (work at UW and
then Insightful), and then
R arrived as a free
implementation of the S
language (started by Ihaka
and Gentleman at the
University of Auckland).
- NCSS
(Number Cruncher Statistical
Software) is not free.
However, there is a trial
version that works for 7
days. NCSS is
available on campus.
- SAS
(Statistical Analysis
System) is used by many
Fortune 500 companies.
It contains both analytical
and data management
tools. However, its
graphics are weak. It
is also very expensive --
particularly to small
liberal arts colleges.
A working knowledge of this
package can definitely get
you a job.
-
Randomizer: Web based
randomizer for selecting
samples from your
population. I'm not
sure of the quality of their
random number generator, so
use this at your own risk.
-
Surface
survey
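As an alternative to the web-based Randomizer listed above, R can select a sample itself. This is a minimal sketch under stated assumptions: the population of 200 ID numbers, the sample size of 10, and the seed are all hypothetical.

```r
set.seed(111)                  # arbitrary seed so the draw can be repeated
population <- 1:200            # hypothetical ID numbers for the population
# Simple random sample of 10 IDs, drawn without replacement.
chosen <- sample(population, size = 10)
sort(chosen)                   # list the selected IDs in order
```

Recording the seed along with the selected IDs lets someone else reproduce the same draw.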