Introduction to Statistics

  • Homework:
    • If attendance drops while more and more homework assignments appear in the grader's box during class time, assignments will no longer be posted.  Come to class to find out what is due and when.
    • Scores
  • Institutional Review Board:
  • R/S-Plus Code:
  • Data Files:
    • Textbook Data
    • NCSS "Sample" data in Excel and SPSS .SAV formats as well as a zipped R workspace.
    • Height-Weight data from NCSS in Excel format and as a zipped R workspace.
    • R/S-Plus "Singer" data in Excel format.
    • Peripheral Vision data in Excel and CSV formats.  The CSV file can be read into R using vision=read.csv("http://newton.uor.edu/facultyfolder/jim_bentley/downloads/math111/vision.csv")
    • "Hospital Survival" data in Excel format and as a Zip file.
    • Anscombe's Data: Excel file with data and graphs.  NCSS data only in .S0 and .S1 files (you will need both).
    • El-Far'ah Surface Survey Data: Excel file containing data.  Discard the first two rows of data: the sherds from those two circles were mixed when a cat knocked one box of drying sherds into the other.
    • Phone Satisfaction: Excel file with data and graphs.  NCSS data only in .S0 and .S1 files (you will need both).
    • Sun Spots: Excel file with data and graphs.  NCSS data only in .S0 and .S1 files (you will need both).
    • Cauchy: Random values from Cauchy and normal distributions.  See if you can determine which columns are from which distribution.  Try histograms and normal probability plots.  In Excel format, or in NCSS .S0 and .S1 files (you will need both).
    • Six Points: Six hypothetical points.  In Excel format, or in NCSS .S0 and .S1 files (you will need both).
    • Linear-Quadratic: Data to test your ability to see trends in the residuals.  In Excel and SPSS formats, or in NCSS .S0 and .S1 files (you will need both).
    • Heteroscedasticity: Data to test your ability to see trends in the residuals.  In Excel and SPSS formats, or in NCSS .S0 and .S1 files (you will need both).
    • Outliers and Influential Points: Data to test your ability to see trends in the residuals.  In Excel and SPSS formats, or in NCSS .S0 and .S1 files (you will need both).
    • Explanatory vs Response: Data to help you see what happens when the explanatory and response variables are swapped in a linear regression.  In Excel and SPSS formats, or in NCSS .S0 and .S1 files (you will need both).
    •  Racquetball Data:  Data in Excel, SAS, and NCSS .S0 and .S1 files (you will need both).
    • Boxer Data:  Data in SAS and NCSS .S0 and .S1 files (you will need both).  A description of the data is contained in this PDF file.  The data is also packaged as a Zip file.
    • Titanic data in an Excel file.
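Several of the files above are meant for fitting least squares lines, including Six Points, Linear-Quadratic, and Explanatory vs Response. The Python sketch below (standard library only) shows what happens when the explanatory and response variables are swapped. It fits both regressions to six made-up points, which are not the values in the Six Points file. The helper names `least_squares` and `correlation` are hypothetical. The two fitted slopes are not reciprocals of each other; their product equals r^2.

```python
# Fit y on x and x on y for six made-up points (NOT the course's Six Points data).

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line predicting y from x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 8]

a_yx, b_yx = least_squares(x, y)  # response y, explanatory x
a_xy, b_xy = least_squares(y, x)  # roles swapped: response x, explanatory y
r = correlation(x, y)

print(f"y on x: y-hat = {a_yx:.3f} + {b_yx:.3f} x")
print(f"x on y: x-hat = {a_xy:.3f} + {b_xy:.3f} y")
print(f"product of slopes = {b_yx * b_xy:.4f}, r^2 = {r ** 2:.4f}")
```

Unless r = 1 or r = -1, regressing x on y gives a different line from regressing y on x and solving for x.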
  • Handouts:  The following were handed out in class. 

  • Demos: The following demonstrations were used in class.
    • Penny Flop Calculator: Excel file to help look at the power of a test as demonstrated in testing the "fairness" of coins.
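The power calculation behind the Penny Flop demo can be sketched with exact binomial probabilities. The setup below is an assumption, not a copy of the spreadsheet:

- a one-sided test of H0: p = 0.5 against p > 0.5, where p is the probability of heads;
- a hypothetical n = 50 flops;
- alpha = 0.05.

The code finds the smallest cutoff c with P(X >= c | p = 0.5) <= 0.05. It then reports P(X >= c) for several values of p; for p above 0.5, that probability is the power. The function names are hypothetical.

```python
from math import comb


def upper_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_value(n, alpha=0.05, p0=0.5):
    """Smallest cutoff c with P(X >= c | p = p0) <= alpha (one-sided test)."""
    for c in range(n + 1):
        if upper_tail(n, p0, c) <= alpha:
            return c
    return n + 1  # only reached if even c = n cannot meet alpha


n = 50  # hypothetical number of penny flops
c = critical_value(n)
print(f"reject H0: p = 0.5 when heads >= {c}")
for p in (0.5, 0.6, 0.7, 0.8):
    # At p = 0.5 this is the actual significance level; otherwise it is the power.
    print(f"p = {p:.1f}: P(reject) = {upper_tail(n, p, c):.3f}")
```

Because the number of heads is discrete, the actual significance level P(X >= c | p = 0.5) comes out somewhat below 0.05.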
  • Java Applets:
    • Pass the Oinkers:  A fun way to get a feel for probabilities and expected values.  Play the game here or download everything you need to play it on your Java-enabled machine.
    • Histogram - check the effect of bin size on Old Faithful data (Webster West - S. Carolina).
    • N=111 Exams Histogram - apply Webster's sliding bin size to our class example.
    • Calculating normal probabilities - just click or slide the boundaries to find the probability of the shaded region - or try the more Accurate Normal Calculator where you enter the endpoints numerically.  Both applets are demos from Gary McClelland's Seeing Statistics project.
    • Guessing Correlations - a neat "game" to show the relationship between correlations and scatterplots.  Part of the CUWU Statistical Program at the University of Illinois at Urbana-Champaign.
    • Drawing a Regression Line by "Eye" - Click the "Begin" box to bring up a scatterplot.  Use your mouse to draw a line on the scatterplot.  The mean squared error (MSE), i.e. the "average" squared error, is computed.  Check the minimum MSE and see how close you can get.  Click the box to see the least squares line.  You can also guess and check the correlation.  This applet is part of the Rice Virtual Lab in Statistics.  (Note:  Netscape 4.06 or better is required for Java 1.1.)
    • Influence in Regression - see the effects of adding an outlier to a least squares line. (Webster West - S. Carolina).
    • Normal Approximation to Count data - see how the distribution of counts (binomial) in a sample relates to the sample size (n) and proportion (p).  Note:  The sample proportion is just count/n, so this helps show how well the normal curve fits the sampling distribution of a sample proportion.  Try it for n=12 and p=0.1, 0.2, 0.4, 0.5, 0.6, 0.8, and 0.9. (Rice Virtual Lab in Statistics)
    • Confidence Interval Simulator - (CUWU Statistical Program)  You will first need to define your "population" by specifying the values and their probabilities.
      • For a quick demo, choose the "Die" option and a number of sides. Click on "Accept Box" to see the population model.
      • To simulate CIs for a proportion, choose the "Coin" option and specify the population p. Again click on the "Accept Box" to see the population model.
      • Click on the "Confidence Intervals" button, then specify n, CI level and the number of intervals to simulate.
    • Confidence Interval Simulator - (Rice Virtual Lab in Statistics).  Simulates samples of size n=10, 15, or 20 from a population with mean 50 and std. dev. 10.  Window shows confidence intervals for 100 samples, highlighting those that miss 50 at a 95% or 99% level.
    • Sampling Distribution and Confidence Interval Simulator: Link to the Garfield, delMas, and Chance web page "Tools for Teaching and Assessing Statistical Inference" which contains a sampling distribution/CI simulator. 
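The Rice confidence interval simulator's setup can be mimicked in a few lines: samples of size 10, 15, or 20 from a population with mean 50 and standard deviation 10. This Python sketch makes three assumptions:

- the population is normal;
- each interval is the usual t interval, xbar ± t* s / sqrt(n);
- the t critical values are hard-coded from a t table.

The name `simulate_coverage` is hypothetical. Over many simulated samples, about 95% of the intervals should contain 50.

```python
import random

# 95% two-sided t critical values t*(df = n - 1), taken from a t table.
T_STAR_95 = {10: 2.262, 15: 2.145, 20: 2.093}


def simulate_coverage(n, reps=5000, mu=50.0, sigma=10.0, seed=1):
    """Fraction of simulated 95% t intervals that contain the true mean mu."""
    rng = random.Random(seed)
    t_star = T_STAR_95[n]
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # normal population assumed
        xbar = sum(sample) / n
        # sample standard deviation with the n - 1 divisor
        s = (sum((v - xbar) ** 2 for v in sample) / (n - 1)) ** 0.5
        margin = t_star * s / n ** 0.5
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


for n in (10, 15, 20):
    print(f"n = {n}: coverage = {simulate_coverage(n):.3f}")
```

Each individual interval either contains 50 or it does not. The 95% describes the long-run fraction of intervals that do.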
  • Data collection: The use of the following will be explained in class. 
    •  Racquetball toss: Excel spreadsheets for data collection.
    • Because we aren't supposed to damage the walls of the "new" building, I am providing a previous class's data as an Excel file.
    •  Penny Flop Outcome Calculator: Excel spreadsheet for calculating probabilities associated with the penny flop experiment.
    • Subliminal Math: Data from Moore for looking at the effect of a positive subliminal message upon performance on a math exam. In Excel  and SPSS formats, or in NCSS .S0 and .S1 files (you will need both). 
    •  Randomized response: PDF of slides discussing the implementation of randomized response sampling.
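The exact randomized response scheme in the slides is not reproduced here. The Python sketch below assumes one common coin-flip version:

- each respondent privately flips a fair coin;
- on heads, they answer the sensitive question truthfully;
- on tails, they simply say "yes".

Then P(yes) = 0.5p + 0.5, so p can be estimated by 2 * (proportion of yes answers) - 1. The true proportion 0.2, the sample size, and the function names are hypothetical.

```python
import random


def simulate_yes_count(true_p, n, rng):
    """Count "yes" answers under the assumed coin-flip design:
    heads -> answer the sensitive question truthfully, tails -> say "yes"."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_p
        heads = rng.random() < 0.5
        if not heads or has_trait:
            yes += 1
    return yes


def estimate_p(yes, n):
    """Solve P(yes) = 0.5 * p + 0.5 for p, using the observed proportion of yes answers."""
    return 2 * yes / n - 1


rng = random.Random(111)
n = 4000       # hypothetical number of respondents
true_p = 0.2   # hypothetical true proportion with the sensitive trait
yes = simulate_yes_count(true_p, n, rng)
print(f"{yes} yes answers out of {n}; estimated p = {estimate_p(yes, n):.3f}")
```

A single "yes" reveals nothing definite about a respondent. The price is a noisier estimate, which with small samples can even fall below 0.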
  • Utilities:
    • R is a freeware "statistical package".  It has been compiled to run under Windows, Mac OS X, and Linux -- other versions are probably around.  R is actually a vectorized, object-oriented programming language with a large library of statistical functions already written for it.  It is used in a number of graduate programs and companies.  R is known for its flexibility and its presentation-quality graphics.  Not-guaranteed-to-be-most-recent versions for Windows and Mac OS X (install tcltk before R, see below) as well as the Windows installation notes are available locally by clicking on the appropriate version.  If possible, you should download the executables directly from CRAN.  Some documentation can be found here, here, and here.

      You will probably want to install and load the "lattice" and "Rcmdr" packages.  Use the RunOnce.R file in this zip file to install the packages.

      • If you are running Mac OS X, before installing R (and Rcmdr) you need to have tcltk installed.  This package can be found on CRAN under R for Mac OS X - Development Tools and Libraries.
      • Save RunOnce.zip to disk
      • Extract RunOnce.R from RunOnce.zip
      • Start R
      • Click on File-Source R Code
      • Select RunOnce.R
      • Select a CRAN Mirror in the USA
      • Click OK
      • At the command prompt, ">", enter q()
      • When asked if you want to save the workspace, click on YES 
      • More hints for Mac people.  Rcmdr's SPSS import is set up for SAV files, but Moore's data sets are saved as POR (SPSS portable) files.  Just search for *.POR and select the appropriate file.

      Documents showing examples of data entry, numerical descriptives, and graphics are available from these links.  Help with GLM, Logistic Regression, and CART is also available.

    • The very much not free relative of R is S-Plus, which has a somewhat nicer front end and a street price of about $2000.  Academics get a discount, and students can get a free version.  For those who care, S came first (Bell Labs), then S-Plus showed up (work at UW and then Insightful), and then R appeared as a free implementation of the S language (Ross Ihaka and Robert Gentleman at the University of Auckland).
    • NCSS (Number Cruncher Statistical Software) is not free.  However, there is a trial version that works for 7 days.  NCSS is available on campus.
    • SAS (Statistical Analysis System) is used by many Fortune 500 companies.  It contains both analytical and data management tools.  However, its graphics are weak.  It is also very expensive -- particularly to small liberal arts colleges.  A working knowledge of this package can definitely get you a job.
    • Randomizer: Web based randomizer for selecting samples from your population.  I'm not sure of the quality of their random number generator, so use this at your own risk.
    • Surface survey
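As an offline alternative to the web Randomizer, a simple random sample can be drawn with Python's `random.sample`. It selects k distinct items from a sequence (sampling without replacement), so every subset of size k is equally likely. The sampling frame (units numbered 1 to 111), the seed, and the helper name are hypothetical. Python's generator, the Mersenne Twister, is fine for classroom sampling but is not meant for cryptographic use.

```python
import random


def simple_random_sample(frame, k, seed=None):
    """Return k distinct units from the frame; every subset of size k is equally likely."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return rng.sample(frame, k)


frame = list(range(1, 112))  # hypothetical frame: units numbered 1 to 111
chosen = sorted(simple_random_sample(frame, 10, seed=2019))
print(chosen)
```

Recording the seed lets someone else reproduce, and check, exactly which units were selected.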

       

This site was last updated 01/14/19