Unit 2: Introduction to Experimental Design
Lecture Notes (Written by Nathan)

I. Sample Survey

The population in a statistical study is the entire group of individuals we want information about or want to draw a conclusion about. Any numerical value that describes the population is a parameter. A
census collects data from every individual in the population. In the real world, time and cost considerations usually make it impossible to analyze an entire population, so its “truth” is essentially unknowable.

A sample is a subset of the population that is actually examined and used to represent the population. Any numerical value that comes from a sample is called a statistic. The idea is to use statistics to draw conclusions about unknown parameters. While a population parameter is a fixed quantity, statistics vary depending on the particular sample chosen.

Sample Survey: The distinction between population and sample is basic to statistics. To draw conclusions about the large population, we need to be confident that the sample we have chosen represents that population fairly.

II. Bias

Sampling methods that, by their nature, tend to over- or underemphasize some characteristics
of the population are said to be biased. Conclusions based on samples drawn with biased methods are inherently flawed. Bias is not just bad luck in one sample. It is the result of a bad study design that will consistently miss the truth about the population in the same way.

Choosing individuals from
the population who are easy to reach results in a convenience sample.

Many survey designs suffer from under-coverage, in which some portion of the population is not sampled at all, or has a smaller representation in the sample than it has in the population.

Response bias refers to anything in the survey design that influences the responses.
o It is a type of cognitive bias that can affect the results of a statistical survey (e.g., respondents answer questions in the way they think the questioner wants them to answer rather than according to their true beliefs).
o Such circumstances lead to
a nonrandom deviation of the answers from their true value. Because this deviation takes, on average, the same direction among respondents, it creates a systematic error of the measure, or bias.
o These biases are most prevalent in the types of studies and research that involve participant self-report,
such as structured interviews or surveys. As a result, response biases can have a large impact on the validity of the questionnaire or survey to which the participant is responding.

Wording bias: Non-neutral or poorly worded questions may lead to answers that are very unrepresentative of the population.

Voluntary response bias: Samples based on individuals who offer to participate, usually by responding to a general invitation, typically overrepresent people with strong opinions, thus leading to bias.

Non-response bias occurs when an individual chosen for the sample can't be contacted or refuses to participate.
o Non-response to surveys often exceeds 50%, even with careful planning and several follow-up calls.
o If the people who respond differ from those who don't, in a way that is related to the response, bias results.
o Short, easily understood surveys generally have higher response rates.

The main technique to avoid bias is to incorporate randomness into the selection process. Randomizing protects us from the influences of all the features of our population by making sure that, on average, the sample looks like the rest of the population.

The larger the sample, the better the results, but what is critical is the sample size, not the percentage or fraction of the population. A random sample of size 500 from a population of size 100,000 is essentially as representative as a random sample of size 500 from a population of size 1,000,000.

III. Sampling

1. Random Sampling

Random sampling involves using a chance process to determine which members of a population are included in the sample.

A simple random sample (SRS) of size n is chosen in such a way that every possible group of n individuals in the population has an equal chance to be selected as the sample.
o An
SRS gives every possible sample of the desired size an equal chance to be chosen. It also gives each member of the population an equal chance to be included in the sample.
o An SRS is the standard against which we measure other sampling methods, and the sampling method on which the theory of working
with sampled data is based.

Choosing an SRS with Technology
Step 1: Label. Give each individual in the population a distinct numerical label from 1 to N.
Step 2: Randomize. Use a random number generator (RNG) to obtain n different integers from 1 to N. The individuals carrying those labels form the sample.

Samples drawn at random generally differ from one another. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability or sampling error.
o Different samples give different sample statistics, all of which are estimates of a population parameter.
o Sampling error relates
to natural variation between samples, can never be eliminated, can be described using probability, and is generally smaller if the sample size is larger.

2. Stratified Sampling

In stratified sampling, the population is first divided into homogeneous (meaning similar, if not identical, in their features) groups
(known as strata); an SRS is then taken from each stratum, and these SRSs are combined to form the full sample.
o Choose the strata based on facts known before the sample is taken. For example, in a study of sleep habits on school nights, the population of students in a large university might be divided into freshman, sophomore, junior, and senior strata.
o Samples taken within a stratum vary less, so the resulting estimates can be more precise. This reduced sampling variability is the most important benefit of stratifying.

Stratified samples give useful information about each stratum.

Stratified random sampling works best when the individuals within each stratum are similar with respect to what is being measured and when there are large differences between strata.

We could further do proportional sampling, where the sizes of the random samples from each stratum depend on the proportion of the
total population represented by the stratum.

3. Cluster Sampling

In cluster sampling, the population is divided into representative, heterogeneous groups, known as clusters. We then randomly select one or more clusters to be the sample. Some statisticians take an SRS from each selected cluster rather than including all members of the cluster.

Clusters are internally heterogeneous, each resembling the overall population. Each cluster should be similar to every other cluster.

Cluster samples are often used for practical reasons, since they save time and cost.

Cluster sampling works best when the clusters look just
like the population but on a smaller scale.

4. Multistage Sampling

Multistage sampling refers to a procedure involving two or more steps, each of which could involve any of the various sampling techniques, such as an SRS. The Gallup organization, for example, often follows a procedure in which nationwide locations are randomly selected, then neighborhoods are randomly selected in each of these locations, and finally households are randomly selected in each of these neighborhoods.

IV. Experiments

1. Observational Study (OS)

An observational study observes individuals and measures variables of interest but does not influence the responses. In an OS, subjects choose their own actions and researchers observe what they do.

A sample survey is an observational study in which we draw conclusions about an entire population by examining an appropriately chosen sample.

2. Experiment

An experiment deliberately imposes some “treatment” on individuals to measure their responses. A treatment is a specific condition applied to the individuals in an experiment.

An experiment is performed on objects called experimental units; if the units are people, they are called subjects.

The experimental units or subjects are typically divided into two groups: treatment and control.
Treatment Group: the group that receives a treatment.
Control Group: the group that receives no treatment or an old, established treatment.

A control group must be used for comparison. For example, if you are testing a treatment for a sprained ankle, you must have a group that gets no treatment because sprained ankles naturally get better over time. You need to show that the treatment group gets better faster than the control group.

Placebo Group: a control group that receives a placebo (fake drug) in experiments involving medicines. The placebo effect refers to the fact that many people respond to any kind of “perceived” treatment. The purpose of the placebo is to separate genuine treatment effects from possible subject responses due to simply being part of an experiment.

Experiments involve explanatory variables, called factors, which are believed to have an effect on response variables, which measure the outcomes of an experiment. The choices you have for each factor are known as levels. The different factor-level combinations are treatments.

In an experiment, we study the effects of the specific treatments we are interested in, while trying to control for the effects of other variables.

The purpose of an experiment is to determine whether the treatment causes a change in the response. The experiment compares the responses in the treatment group to the responses in the control group. Some experimental designs don't include a control group. That
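
As a small illustration of the earlier statement that the different factor-level combinations are the treatments, the sketch below lists every combination for a made-up experiment. The factor names and levels (fertilizer, watering) and the helper list_treatments are hypothetical and chosen only for illustration; they do not come from these notes.

```python
from itertools import product

# Hypothetical factors and their levels (illustration only, not from the notes).
factors = {
    "fertilizer": ["none", "low", "high"],
    "watering": ["daily", "weekly"],
}


def list_treatments(factors):
    """Return every factor-level combination as a dict mapping factor -> level."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    # product() yields one tuple per combination, taking one level from each factor.
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


treatments = list_treatments(factors)
for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```

With 3 levels of one factor and 2 levels of the other, the sketch lists 3 × 2 = 6 treatments. In general, the number of treatments is the product of the numbers of levels of the factors.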