Chapter 1 Terminology : Exploring Data
Fathom Software
 Toolbar
 Formula Editor
 Data :
 Collection
 Case or Cases (i.e. individuals)
 Attribute (i.e. variable)
 Value
 Viewing the Data:
 Inspection Window
 Table (or Case Table)
 Graph (or Plot) : bar, pie, histogram, dotplot, stemplot
 Summary Table or Chart
Data & Distributions : shape, center, spread

univariate data ("attributes" in Fathom software)
 quantitative variable
 categorical (or qualitative) variable

SHAPE of distribution
 symmetric
 uniform (or rectangular) distribution
 normal distribution
 skewness
 left skewed distribution
 right skewed distribution
 other
 bimodal distribution

CENTER [summary statistic: single value measurement or computation]
 mean
 median
 mode (most useful for categorical or multimodal data, e.g. bimodal)

SPREAD [summary statistic: single or multiple valued measurement or computation]
 deviation (or residual)
 for means : standard deviation (s) or variance (s^{2})
 for medians : quartiles (ranges)
 5-number summary : min, Q1, Q2 (median), Q3, max
 interquartile range (IQR)
Plots
 boxplot (and modified boxplot) : outliers : values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
 symmetric versus skewed (right or left)
 cumulative frequency graph (or "Ogive")
 time plot
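The 5-number summary and the 1.5*IQR outlier rule listed above can be sketched in a few lines. This is a minimal illustration, assuming the common "median of each half" rule for quartiles (other quartile conventions give slightly different values); the function names and the data are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers_by_iqr(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

sample = [1, 3, 4, 6, 7, 8, 10, 12, 30]   # made-up data
summary = five_number_summary(sample)
flagged = outliers_by_iqr(sample)
print(summary, flagged)
```

A modified boxplot would draw the flagged values as separate points and end the whiskers at the most extreme values inside the fences.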
Linear Transformations : effects on mean, spread and change of units
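As a hedged illustration of a linear transformation (a change of units), the sketch below converts made-up Celsius readings to Fahrenheit with new = a + b*old. The mean is shifted and rescaled (a + b*mean), while the standard deviation is only rescaled (|b|*SD). The data and variable names are assumptions for the example.

```python
from statistics import mean, stdev

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]    # made-up readings
a, b = 32.0, 1.8                             # Fahrenheit = 32 + 1.8 * Celsius
fahrenheit = [a + b * c for c in celsius]

mean_c, sd_c = mean(celsius), stdev(celsius)
mean_f, sd_f = mean(fahrenheit), stdev(fahrenheit)

# The mean follows the full transformation; the spread ignores the shift a.
print(mean_f, a + b * mean_c)
print(sd_f, abs(b) * sd_c)
```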
Context Clues
 statistical use of a "count" :
 Mathematics : Generally, a "count" is a whole number (0, 1, 2, 3, ...). However, it may be a real-valued number if the item being "counted" is grouped in equal-sized units, such as 12 bottles of soda in a case.
 English : each numeric value must have units such as "oranges" or "individuals" or "cases of soda"
 Examples : 20 oranges, 17 people, 22.5 cases of soda
 statistical use of a "proportion" :
 Concept : part / total (i.e. percentage)
 Mathematics : real-valued ratio between 0 and 1 (inclusive)
 English : same units divided by same units (i.e. unitless quantity)
 Example : In a class of 25 students with 10 girls and 15 boys, the proportion of girls in the classroom is 10/25 (i.e. 40%). NOTE: Technically, the units are "students per student" but this does not need to be specified.
 statistical use of a "mean" :
 Concept : the (arithmetic) average value of a set of numbers
 Mathematics : ( Σ x_{i}) / n = sum of numbers / count of numbers
 English : units divided by a count (i.e. the mean has the same units as the numbers themselves)
 Example : In a class of 5 students, the points on a given quiz are 43, 47, 50, 40, and 44. The mean is (43+47+50+40+44) / 5 = 44.8 points/student. NOTE: The units are "points per student" and must be specified.
Chapter 3 Terminology : Relationships
Chapters 1 & 2 : One Variable : univariate data : shape > center > spread
Chapter 3 : Two Variables : bivariate data : shape > trend > strength > variability

Key Idea
 one variable : Distribution
 two variables : Relationship (association) : x = explanatory variable = independent variable; y = response variable = dependent variable
Plots/Graphs
 one variable : dot plot
 two variables : scatterplot
Shape
 one variable : normal, uniform, or skewed; symmetric; clusters, gaps, and outliers
 two variables : linear or curved; constant strength; clusters, gaps, outliers (in one variable), influential point (either or both variables)
Ideal Shape
 one variable : normal
 two variables : linear (oval/ellipse) : positive or negative direction
Measure of Center
 one variable : mean, median
 two variables : regression line (LSRL); influential point
Measure of Spread
 one variable : standard deviation, interquartile range
 two variables : correlation, r

bivariate data (from Chapter 3): shape > trend > strength > variability
 plausible explanation : causation, common response, or confounding
 lurking variable
 residual plots
 outliers

scatter plots : shape > trend > strength > variability
 data's shape : linear, curved, or none
 y = a_{1} + b_{1}*x [equivalent to algebra equation of line : y = mx + b]
 shape's trend : positive slope, negative slope, or none
 b_{1} : measure of slope
 trend's strength : strong trend (tight cluster), moderate trend (some clustering), or weak trend (no cluster)
 correlation : a measure of a trend's strength
 strength's variability : uniform (homoscedastic) or fan-shaped (heteroscedastic)

summary line

 Least Squares Regression Line (also called LSRL)
 Line of Best Fit (best guess or LSRL)
 Regression Line (LSRL)
 Trend Line (LSRL)
 Fitted Line (best guess)
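A least squares regression line y = a_{1} + b_{1}*x can be computed from summary statistics: the slope is b_{1} = r*(s_{y}/s_{x}) and the line passes through the point of means, so a_{1} = ȳ - b_{1}*x̄. The sketch below is a minimal illustration with made-up data; the variable names are assumptions, and it also computes the residuals, the sum of squared errors (SSE), and r^{2}.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]          # made-up explanatory values
y = [2, 4, 5, 4, 5]          # made-up response values
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation r: sum of products of z-scores, divided by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

b1 = r * s_y / s_x           # slope
a1 = y_bar - b1 * x_bar      # y-intercept

predicted = [a1 + b1 * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]   # observed - predicted
sse = sum(e ** 2 for e in residuals)
r_squared = r ** 2

print(a1, b1, r, sse, r_squared)
```

For a least squares line the residuals sum to zero, which is a quick sanity check on any hand computation.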

statistical (math or calculator) symbols and terms
 randomNormal ( mean, SD )
 least squares regression line ("regression line" or just LSRL) : y = a_{1} + b_{1}*x
 explanatory variable or predictor variable, x
 response variable or observed variable, y
 slope, b_{1} = r * (s_{y} / s_{x})
 y-intercept, a_{1}
 predicted value, ŷ
 interpolation, extrapolation
 influential point (outlier)
 residual = y - ŷ = observed y - predicted y
 sum of squared errors (SSE)
 r : correlation coefficient
 r^{2} : coefficient of determination
 "correlation does not imply causation" due to lurking variable
Chapter 4 Terminology : Two-Variable Data
transformations
 linear : y = a + bt
 increase is fixed amount from previous values
 exponential growth model : y = ab^{t}
 increase is fixed percentage from previous values
 plot "log y against t" to see linear pattern
 power transformations : y = at^{b}
 use "log y against log t" graph to see linear pattern
 square, p = 2, y = t^{2}
 reciprocal, p = -1, y = 1/t
 reciprocal square root, p = -1/2, y = t^{-1/2}
 logarithm, p = 0, y = log(t)
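To see why plotting log y against t straightens exponential growth, note that y = ab^{t} gives log y = log a + t*log b, which is a line in t. The sketch below is a minimal illustration with made-up, noise-free data (a = 3, b = 2); the helper `least_squares` is this example's own function, not a library call.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

t = list(range(6))
y = [3 * 2 ** ti for ti in t]            # exponential growth: y = 3 * 2^t

log_y = [math.log10(v) for v in y]       # transform the response only
intercept, slope = least_squares(t, log_y)

# Undo the logarithm to recover the exponential model's constants.
a_hat = 10 ** intercept
b_hat = 10 ** slope
print(a_hat, b_hat)
```

For a power model y = at^{b}, the same idea applies after taking logs of both variables, since log y = log a + b*log t.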
correlation and regression interpretation
 "correlation does not imply causation" due to lurking variable
 using LSRL for interpolation and extrapolation
 causation, common response, confounding
 categorical data (two-way tables and bar graphs)
 for counts and percentages within categories
 marginal distributions (single categorical variable)
 conditional distributions
 Simpson's Paradox (lurking variable as a precondition)
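Simpson's Paradox can be illustrated with a small two-way table split by a lurking variable. The counts below are entirely made up for the sketch: treatment A has the higher success rate within each group (the conditional distributions), yet treatment B has the higher rate when the groups are combined (the marginal distribution).

```python
from fractions import Fraction

# counts[(treatment, group)] = (successes, total); hypothetical numbers
counts = {
    ("A", "mild"): (8, 10),
    ("B", "mild"): (70, 100),
    ("A", "severe"): (20, 100),
    ("B", "severe"): (1, 10),
}

def conditional_rate(treatment, group):
    """Success rate for one treatment within one group."""
    successes, total = counts[(treatment, group)]
    return Fraction(successes, total)

def marginal_rate(treatment):
    """Success rate for one treatment with the groups combined."""
    successes = sum(s for (t, _), (s, _) in counts.items() if t == treatment)
    total = sum(n for (t, _), (_, n) in counts.items() if t == treatment)
    return Fraction(successes, total)

for group in ("mild", "severe"):
    print(group, conditional_rate("A", group), conditional_rate("B", group))
print("combined", marginal_rate("A"), marginal_rate("B"))
```

The reversal happens because the treatments were applied to the two groups in very different proportions, so the group acts as a lurking variable.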
Chapter 5 Terminology : Producing & Exploring Data
 units (and population size)
 population census versus sample of population
 population parameter versus sample statistic

samples : unbiased representation of population [random and independent]
 simple random sample (SRS) : each possible sample of a given size has an equal probability of being selected; can be chosen with a table of random digits
 stratified random sample : groups (strata) of similar individuals, SRS within each stratum
 cluster sample
 two-stage (or multistage) cluster sample
 systematic sample with random start
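Three of the sampling designs above can be sketched with Python's standard `random` module standing in for a table of random digits. The population of 100 numbered units, the two strata, and the sample sizes are all assumptions made for the illustration.

```python
import random

rng = random.Random(1)                    # fixed seed so the sketch is repeatable
population = list(range(1, 101))          # hypothetical unit IDs 1..100

# Simple random sample: every set of 10 units is equally likely.
srs = rng.sample(population, 10)

# Stratified random sample: an SRS of 5 units within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

# Systematic sample with random start: every k-th unit after a random start.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

print(srs)
print(stratified)
print(systematic)
```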

sample bias
 selection bias
 size bias
 volunteer bias
 convenience bias
 judgement bias
 undercoverage : certain groups left out of process of choosing sample
 nonresponse bias
 questionnaire bias
 wording or language bias
 incorrect response bias

Experiment vs Observational Study
 designs of experiments :
 experimental units > subjects > treatment (factor vs placebo) > observed response
 1. control; 2. randomize; 3. replicate
 blind experiment, double-blind experiment
 completely randomized design
 randomized matched pairs design (2 treatments)
 randomized block design
 variables in experiments
 explanatory variable or factor (if categorical)
 response variable
 lurking variable
 confounding variable
 variability
 between-treatment
 within-treatment
 statistically significant
 lack of realism

Simulation of Experiments : imitation of chance behavior based on an accurate model of the experiment
 Describe the experiment or state problem
 State assumptions
 Assign digits to represent outcomes (random number table)
 Simulate many repetitions
 State conclusions
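The simulation steps above can be sketched as follows. The problem is hypothetical: each trial succeeds with probability 0.3, trials are independent, and the question is the chance of at least one success in 3 trials. Random digits 0-2 represent a success and 3-9 a failure, with a seeded generator playing the role of the random number table.

```python
import random

rng = random.Random(2024)     # fixed seed so the sketch is repeatable
REPETITIONS = 20000

def one_repetition():
    """Simulate 3 trials; digits 0-2 mean success (probability 0.3)."""
    digits = [rng.randrange(10) for _ in range(3)]
    return any(d <= 2 for d in digits)

hits = sum(one_repetition() for _ in range(REPETITIONS))
estimate = hits / REPETITIONS

# Conclusion: compare with the exact value 1 - P(no successes).
exact = 1 - 0.7 ** 3
print(estimate, exact)
```

With many repetitions the estimate settles near the exact value, which is the Law of Large Numbers at work.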
Chapter 6 Terminology : Probability : a study of randomness
 event
 table of random digits
 Venn diagrams

Probability
 event, P(A)
 complement of event, P(A^{c})
 distribution
 model
 mutually exclusive [disjoint categories]
 conditional events
 independent events
 "of at least one"
 variance & standard deviation

Sampling
 space
 with Replacement
 without Replacement

Mathematical
 Law of Large Numbers
 Fundamental Counting Principle
 Terminology with Probabilities
 Legal Values : 0 ≤ P(A) ≤ 1 for any Event A
 Total Probability = 1 : P(all sample space) = 1 : sum total probability of all sample space is 1.00 (100%)
 Complement of Event : P(A^{c}) = 1 - P(A)
 P("of at least one") = 1 - P("exactly none")
 Mutually Exclusive Events [i.e. Disjoint] : P(A and B) = 0
 Independent Events : P(A|B) = P(A); also P(A and B) = P(A)*P(B)
 Conditional Events : P(A|B) = P(A and B) / P(B), often rewritten as P(A and B) = P(A|B) * P(B)
 Addition Rules (aka Union) ["or"]
 Full Rule : P(A or B) = P(A) + P(B) - P(A and B)
 Simplified Rule for Mutually Exclusive Events [Disjoint] : P(A or B) = P(A) + P(B)
 Multiplication Rules (aka Intersection) ["and"]
 Full Rule : P(A and B) = P(A) * P(B|A), which can also be rewritten as P(A and B) = P(B) * P(A|B)
 Simplified Rule for Independent Events : P(A and B) = P(A)*P(B)
 Bayes' Rule
 Tree Diagrams
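The probability rules above can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die, with event A = "even" and event B = "greater than 3"; the events and the helper `prob` are choices made for the illustration. Exact fractions avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))           # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}                             # even
B = {4, 5, 6}                             # greater than 3

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Bayes' Rule: P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * prob(B) / prob(A)

# Independence check: does P(A and B) equal P(A) * P(B)?
independent = prob(A & B) == prob(A) * prob(B)

# "At least one" six in 4 rolls = 1 - P(no sixes)
p_at_least_one_six = 1 - Fraction(5, 6) ** 4

print(p_a_or_b, p_a_given_b, p_b_given_a, independent, p_at_least_one_six)
```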
Chapter 7 Terminology : Random Variables : numeric outcome of a random phenomenon
 Random Variable, X
 Law of Large Numbers
 Discrete (countable set of values for X, typically integers)
 P(X=a) : probability of the random variable being a fixed value; height of a probability histogram bar
 Continuous (any real value for X)
 P(X<a) : probability of the random variable being less than (or equal to) fixed value; area under the curve from left side to a [reading left to right]
 P(X>a) : probability of the random variable being greater than (or equal to) fixed value; area under the curve from a to right side [reading left to right]
 P(a<X<b) : probability of the random variable being between (or equal to) two fixed values; area under the curve between a and b.
 Distributions
 Probability Distributions for a Random Variable, X
 mean, μ_{x} : sometimes misleadingly called the Expected Value, E(X)
 standard deviation, σ_{x}
 variance, σ_{x}^{2} : correlation for independent random variables = 0
 linear combinations of random variables : μ_{x+y} = μ_{x} + μ_{y}; σ_{x+y}^{2} = σ_{x}^{2} + σ_{y}^{2} (the variance rule requires X and Y to be independent)
 from Collected Data
 using Known Data Frequencies that model Your Situation
 simulation using Random selection from known data
 from Theory
 assumptions + Basic Mathematical Principles
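The mean and variance of a discrete random variable, and the rules for sums of independent random variables, can be verified by direct enumeration. The sketch below assumes X and Y are two independent rolls of a fair die; the helper names `rv_mean` and `rv_var` are made up for the example.

```python
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}   # P(X = face)

def rv_mean(dist):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(x * p for x, p in dist.items())

def rv_var(dist):
    """Variance: sum of (value - mean)^2 * probability."""
    mu = rv_mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Build the distribution of X + Y for two independent dice.
total = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    total[x + y] = total.get(x + y, 0) + px * py

print(rv_mean(die), rv_var(die))
print(rv_mean(total), rv_var(total))
```

The means add for any pair of random variables; the variances add here only because the two rolls are independent.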
Chapter 8 Terminology : Binomial and Geometric Distributions
 Binomial Distributions : Discrete Random Variable with only 2 categories (i.e. "success" or "failure")
 n = fixed number of independent trials, must be known
 p = probability of success on any one trial, must be the same for each trial, must be known
 1 - p = q = probability of failure on any one trial
 P(X = k) = _{n}C_{k} p^{k} (1 - p)^{n - k}
 P(X > k) = 1 - P(X ≤ k)
 probability distribution function at X = "number of successes":
 binompdf(number of trials, probability of success, number of successes)
 cumulative (i.e. sum of) probability distribution function for 0 ≤ X ≤ "number of successes" = total area of the probability histogram bars
 binomcdf(number of trials, probability of success, number of successes)
 Statistics of a Binomial Distribution
 μ = np
 σ = √[np(1 - p)]
 Normal approximation to Binomial Distribution ~ BINS (binomial, independent, number of trials is fixed, success probability is known)
 N(np, √[np(1 - p)]) : used only if np ≥ 10 and n(1 - p) ≥ 10
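The binomial formulas above can be sketched in plain Python. The functions below are this example's own definitions, named after the calculator functions and taking the same argument order; n = 40, p = 0.5, and the cutoff 22 are arbitrary illustration values. The last lines compare an exact cumulative probability with the Normal approximation N(np, √[np(1 - p)]).

```python
from math import comb, sqrt
from statistics import NormalDist

def binompdf(n, p, k):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomcdf(n, p, k):
    """P(X <= k): sum of binompdf over 0..k."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

n, p = 40, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))

# The approximation is only recommended when both counts are at least 10.
conditions_met = n * p >= 10 and n * (1 - p) >= 10

exact = binomcdf(n, p, 22)
approx = NormalDist(mu, sigma).cdf(22)
p_more_than_22 = 1 - exact                # P(X > k) = 1 - P(X <= k)
print(conditions_met, exact, approx, p_more_than_22)
```

The two cumulative values are close but not identical, since the Normal curve is a continuous approximation to a discrete histogram.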
 Geometric Distributions : Discrete Random Variable with only 2 categories (i.e. "success" or "failure")
 n = number of independent trials (not fixed); the random variable X counts the trials needed to get the first "success"
 p = probability of success on any one trial, must be the same for each trial, must be known
 1 - p = q = probability of failure on any one trial
 P(X = n) = (1 - p)^{n - 1} p
 P(X > n) = 1 - P(X ≤ n) = (1 - p)^{n}
 probability distribution function at X = "trial number of the first success":
 geometpdf(probability of success, trial number of the first success)
 cumulative (i.e. sum of) probability distribution function for 1 ≤ X ≤ "trial number of the first success"
 geometcdf(probability of success, trial number of the first success)
 Statistics of a Geometric Distribution
 μ = 1/p
 σ = √(1 - p) / p
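The geometric formulas above can be sketched the same way. The two helper functions are this example's own definitions with calculator-style names, and p = 0.2 is an arbitrary illustration value. X is the trial on which the first success occurs.

```python
from math import sqrt

def geometpdf(p, n):
    """P(X = n): n - 1 failures followed by the first success."""
    return (1 - p) ** (n - 1) * p

def geometcdf(p, n):
    """P(X <= n) = 1 - P(first n trials are all failures)."""
    return 1 - (1 - p) ** n

p = 0.2
first_success_on_third = geometpdf(p, 3)
success_within_three = geometcdf(p, 3)
more_than_three_trials = 1 - success_within_three   # P(X > n) = (1 - p)^n

mean_trials = 1 / p
sd_trials = sqrt(1 - p) / p
print(first_success_on_third, success_within_three, mean_trials, sd_trials)
```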
Chapter 2 Terminology : Normal Distributions

probability density curve of population (all possible) data
 total area under pd curve = 1
 visually determine : median (equal area point), mean (balance point)
 skewed : mean further toward tail than median
 percentages calculated from probability density curve
 normal probability curve (i.e. bell curve) from population data
 Generally :
 μ : population mean [also the formula to calculate]
 σ : population standard deviation, [also the formula to calculate]
 N(μ, σ) : Normal (bell) curve [aka Probability Density Curve]
 Standardized :
 z-score as a standard point on normal curve : z = (x - μ) / σ [also the formula to calculate as well as recentering and rescaling]
 N(0,1) is the Standard Normal curve (i.e. Z-curve)
 68-95-99.7 rule
 Determine normality from plot of histogram, stemplot and/or boxplot

statistical (math or calculator) symbols and terms from sample data
 x̄ (x-bar) : sample mean as an estimate for μ_{x} [also the formula to calculate]
 s : sample standard deviation as an estimate for σ_{x} [also the formula to calculate]
 normalcdf(left bound, right bound, mean, standard deviation)
 x = invNorm(area to the left, mean, SD) [returns a z-score when mean = 0 and SD = 1]
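The Normal-curve calculations above can be reproduced with `statistics.NormalDist` from the Python standard library. The wrapper functions below are this sketch's own helpers, named to resemble the calculator functions; the example values (mean 100, SD 15) are made up.

```python
from statistics import NormalDist

def normalcdf(left, right, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between left and right."""
    dist = NormalDist(mean, sd)
    return dist.cdf(right) - dist.cdf(left)

def inv_norm(area, mean=0.0, sd=1.0):
    """Value with the given area to its left under the N(mean, sd) curve."""
    return NormalDist(mean, sd).inv_cdf(area)

def z_score(x, mean, sd):
    """Standardize: recenter by the mean, rescale by the SD."""
    return (x - mean) / sd

# 68-95-99.7 rule: areas within 1, 2, and 3 standard deviations of the mean.
within = [normalcdf(-k, k) for k in (1, 2, 3)]

z = z_score(130, 100, 15)                 # hypothetical x = 130 on N(100, 15)
area_below = normalcdf(float("-inf"), 130, 100, 15)
cutoff = inv_norm(0.975)                  # z with 97.5% of the area to its left
print(within, z, area_below, cutoff)
```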