Lévy's continuity theorem

From formulasearchengine
Revision as of 01:59, 20 November 2013 by en>Frietjes
Jump to navigation Jump to search

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined based on the expense of data collection, and the need to have sufficient statistical power. In complicated studies there may be several different sample sizes involved in the study: for example, in a survey sampling involving stratified sampling there would be different sample sizes for each population. In a census, data are collected on the entire population, hence the sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

Sample sizes may be chosen in several different ways:

  • expedience - For example, include those items readily available or convenient to collect. A choice of small sample sizes, though sometimes necessary, can result in wide confidence intervals or risks of errors in statistical hypothesis testing.
  • using a target variance for an estimate to be derived from the sample eventually obtained
  • using a target for the power of a statistical test to be applied once the sample is collected.

How samples are collected is discussed in sampling (statistics) and survey data collection.

Introduction

Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more accurate estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.

In some situations, the increase in accuracy for larger sample sizes is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or if the data follow a heavy-tailed distribution.

Sample sizes are judged based on the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish to have the 95% confidence interval be less than 0.06 units wide. Alternatively, sample size may be assessed based on the power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference in the support levels of 0.04 units.

Estimating proportions and means

A relatively simple situation is estimation of a proportion. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.

The estimator of a proportion is p^=X/n, where X is the number of 'positive' observations (e.g. the number of people out of the n sampled people who are at least 65 years old). When the observations are independent, this estimator has a (scaled) binomial distribution (and is also the sample mean of data from a Bernoulli distribution). The maximum variance of this distribution is 0.25*n, which occurs when the true parameter is p = 0.5. In practice, since p is unknown, the maximum variance is often used for sample size assessments.

For sufficiently large n, the distribution of p^ will be closely approximated by a normal distribution.[1] Using this approximation, it can be shown that around 95% of this distribution's probability lies within 2 standard deviations of the mean. Using the Wald method for the binomial distribution, an interval of the form

(p^20.25/n,p^+20.25/n)

will form a 95% confidence interval for the true proportion. If this interval needs to be no more than W units wide, the equation

40.25/n=W

can be solved for n, yielding[2][3] n = 4/W2 = 1/B2 where B is the error bound on the estimate, i.e., the estimate is usually given as within ± B. So, for B = 10% one requires n = 100, for B = 5% one needs n = 400, for B = 3% the requirement approximates to n = 1000, while for B = 1% a sample size of n = 10000 is required. These numbers are quoted often in news reports of opinion polls and other sample surveys.

Estimation of means

A proportion is a special case of a mean. When estimating the population mean using an independent and identically distributed (iid) sample of size n, where each data value has variance σ2, the standard error of the sample mean is:

σ/n.

This expression describes quantitatively how the estimate becomes more precise as the sample size increases. Using the central limit theorem to justify approximating the sample mean with a normal distribution yields an approximate 95% confidence interval of the form

(x¯2σ/n,x¯+2σ/n).

If we wish to have a confidence interval that is W units in width, we would solve

4σ/n=W

for n, yielding the sample size n = 16σ2/W2.

For example, if we are interested in estimating the amount by which a drug lowers a subject's blood pressure with a confidence interval that is six units wide, and we know that the standard deviation of blood pressure in the population is 15, then the required sample size is 100.

Required sample sizes for hypothesis tests <Estimating sample sizes>...</Estimating sample sizes>

A common problem faced by the statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate α. As follows, this can be estimated by pre-determined tables for certain values, by Mead's resource equation, or, more generally, by the cumulative distribution function:

By tables

[4]
 
Power
Cohen's d
0.2 0.5 0.8
0.25 84 14 6
0.50 193 32 13
0.60 246 40 16
0.70 310 50 20
0.80 393 64 26
0.90 526 85 34
0.95 651 105 42
0.99 920 148 58

The table shown at right can be used in a two-sample t-test to estimate the sample sizes of an experimental group and a control group that are of equal size, that is, the total number of individuals in the trial is twice that of the number given, and the desired significance level is 0.05.[4] The parameters used are:

Mead's resource equation

Mead's resource equation is often used for estimating sample sizes of laboratory animals, as well as in many other laboratory experiments. It may not be as accurate as using other methods in estimating sample size, but gives a hint of what is the appropriate sample size where parameters such as expected standard deviations or expected differences in values between groups are unknown or very hard to estimate.[5]

All the parameters in the equation are in fact the degrees of freedom of the number of their concepts, and hence, their numbers are subtracted by 1 before insertion into the equation.

The equation is:[5]

E=NBT,

where:

  • N is the total number of individuals or units in the study (minus 1)
  • B is the blocking component, representing environmental effects allowed for in the design (minus 1)
  • T is the treatment component, corresponding to the number of treatment groups (including control group) being used, or the number of questions being asked (minus 1)
  • E is the degrees of freedom of the error component, and should be somewhere between 10 and 20.

For example, if a study using laboratory animals is planned with four treatment groups (T=3), with eight animals per group, making 32 animals total (N=31), without any further stratification (B=0), then E would equal 28, which is above the cutoff of 20, indicating that sample size may be a bit too large, and six animals per group might be more appropriate.[6]

By cumulative distribution function

Let Xi, i = 1, 2, ..., n be independent observations taken from a normal distribution with unknown mean μ and known variance σ2. Let us consider two hypotheses, a null hypothesis:

H0:μ=0

and an alternative hypothesis:

Ha:μ=μ*

for some 'smallest significant difference' μ* >0. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject H0 with a probability of at least 1-β when Ha is true (i.e. a power of 1-β), and (2) reject H0 with probability α when H0 is true, then we need the following:

If zα is the upper α percentage point of the standard normal distribution, then

Pr(x¯>zασ/n|H0 true)=α

and so

'Reject H0 if our sample average (x¯) is more than zασ/n'

is a decision rule which satisfies (2). (Note, this is a 1-tailed test)

Now we wish for this to happen with a probability at least 1-β when Ha is true. In this case, our sample average will come from a Normal distribution with mean μ*. Therefore we require

Pr(x¯>zασ/n|Ha true)1β

Through careful manipulation, this can be shown to happen when

n(zαΦ1(1β)μ*/σ)2

where Φ is the normal cumulative distribution function.

Stratified sample size

With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are H such sub-samples (from H different strata) then each of them will have a sample size nh, h = 1, 2, ..., H. These nh must conform to the rule that n1 + n2 + ... + nH = n (i.e. that the total sample size is given by the sum of the sub-sample sizes). Selecting these nh optimally can be done in various ways, using (for example) Neyman's optimal allocation.

There are many reasons to use stratified sampling:[7] to decrease variances of sample estimates, to use partly non-random methods, or to study strata individually. A useful, partly non-random method would be to sample individuals where easily accessible, but, where not, sample clusters to save travel costs.[8]

In general, for H strata, a weighted sample mean is

x¯w=h=1HWhx¯h,

with

Var(x¯w)=h=1HWh2Var(x¯h).[9]

The weights, Wh, frequently, but not always, represent the proportions of the population elements in the strata, and Wh=Nh/N. For a fixed sample size, that is N=Nh,

Var(x¯w)=h=1HWh2Varh(1nh1Nh),[10]

which can be made a minimum if the sampling rate within each stratum is made proportional to the standard deviation within each stratum: nh/Nh=kSh, where Sh=Varh and k is a constant such that nh=n.

An "optimum allocation" is reached when the sampling rates within the strata are made directly proportional to the standard deviations within the strata and inversely proportional to the square root of the sampling cost per element within the strata, Ch:

nhNh=KShCh,[11]

where K is a constant such that nh=n, or, more generally, when

nh=KWhShCh.[12]

Too-large sample size problem

A too large sample size may lead to the rejection of a null hypothesis even if the actual effect is so small, that it does not have practical importance. See Null_hypothesis#Too-large_sample_size_problem.

Software of sample size calculations

See Software for Power and Sample Size Calculations.

See also

Sportspersons Hyslop from Nicolet, usually spends time with pastimes for example martial arts, property developers condominium in singapore singapore and hot rods. Maintains a trip site and has lots to write about after touring Gulf of Porto: Calanche of Piana.

Notes

43 year old Petroleum Engineer Harry from Deep River, usually spends time with hobbies and interests like renting movies, property developers in singapore new condominium and vehicle racing. Constantly enjoys going to destinations like Camino Real de Tierra Adentro.

References

  • One of the biggest reasons investing in a Singapore new launch is an effective things is as a result of it is doable to be lent massive quantities of money at very low interest rates that you should utilize to purchase it. Then, if property values continue to go up, then you'll get a really high return on funding (ROI). Simply make sure you purchase one of the higher properties, reminiscent of the ones at Fernvale the Riverbank or any Singapore landed property Get Earnings by means of Renting

    In its statement, the singapore property listing - website link, government claimed that the majority citizens buying their first residence won't be hurt by the new measures. Some concessions can even be prolonged to chose teams of consumers, similar to married couples with a minimum of one Singaporean partner who are purchasing their second property so long as they intend to promote their first residential property. Lower the LTV limit on housing loans granted by monetary establishments regulated by MAS from 70% to 60% for property purchasers who are individuals with a number of outstanding housing loans on the time of the brand new housing purchase. Singapore Property Measures - 30 August 2010 The most popular seek for the number of bedrooms in Singapore is 4, followed by 2 and three. Lush Acres EC @ Sengkang

    Discover out more about real estate funding in the area, together with info on international funding incentives and property possession. Many Singaporeans have been investing in property across the causeway in recent years, attracted by comparatively low prices. However, those who need to exit their investments quickly are likely to face significant challenges when trying to sell their property – and could finally be stuck with a property they can't sell. Career improvement programmes, in-house valuation, auctions and administrative help, venture advertising and marketing, skilled talks and traisning are continuously planned for the sales associates to help them obtain better outcomes for his or her shoppers while at Knight Frank Singapore. No change Present Rules

    Extending the tax exemption would help. The exemption, which may be as a lot as $2 million per family, covers individuals who negotiate a principal reduction on their existing mortgage, sell their house short (i.e., for lower than the excellent loans), or take part in a foreclosure course of. An extension of theexemption would seem like a common-sense means to assist stabilize the housing market, but the political turmoil around the fiscal-cliff negotiations means widespread sense could not win out. Home Minority Chief Nancy Pelosi (D-Calif.) believes that the mortgage relief provision will be on the table during the grand-cut price talks, in response to communications director Nadeam Elshami. Buying or promoting of blue mild bulbs is unlawful.

    A vendor's stamp duty has been launched on industrial property for the primary time, at rates ranging from 5 per cent to 15 per cent. The Authorities might be trying to reassure the market that they aren't in opposition to foreigners and PRs investing in Singapore's property market. They imposed these measures because of extenuating components available in the market." The sale of new dual-key EC models will even be restricted to multi-generational households only. The models have two separate entrances, permitting grandparents, for example, to dwell separately. The vendor's stamp obligation takes effect right this moment and applies to industrial property and plots which might be offered inside three years of the date of buy. JLL named Best Performing Property Brand for second year running

    The data offered is for normal info purposes only and isn't supposed to be personalised investment or monetary advice. Motley Fool Singapore contributor Stanley Lim would not personal shares in any corporations talked about. Singapore private home costs increased by 1.eight% within the fourth quarter of 2012, up from 0.6% within the earlier quarter. Resale prices of government-built HDB residences which are usually bought by Singaporeans, elevated by 2.5%, quarter on quarter, the quickest acquire in five quarters. And industrial property, prices are actually double the levels of three years ago. No withholding tax in the event you sell your property. All your local information regarding vital HDB policies, condominium launches, land growth, commercial property and more

    There are various methods to go about discovering the precise property. Some local newspapers (together with the Straits Instances ) have categorised property sections and many local property brokers have websites. Now there are some specifics to consider when buying a 'new launch' rental. Intended use of the unit Every sale begins with 10 p.c low cost for finish of season sale; changes to 20 % discount storewide; follows by additional reduction of fiftyand ends with last discount of 70 % or extra. Typically there is even a warehouse sale or transferring out sale with huge mark-down of costs for stock clearance. Deborah Regulation from Expat Realtor shares her property market update, plus prime rental residences and houses at the moment available to lease Esparina EC @ Sengkang
  • 20 year-old Real Estate Agent Rusty from Saint-Paul, has hobbies and interests which includes monopoly, property developers in singapore and poker. Will soon undertake a contiki trip that may include going to the Lower Valley of the Omo.

    My blog: http://www.primaboinca.com/view_profile.php?userid=5889534

Further reading

External links

Template:Statistics

de:Zufallsstichprobe#Stichprobenumfang

  1. NIST/SEMATECH, "7.2.4.2. Sample sizes required", e-Handbook of Statistical Methods.
  2. "Large Sample Estimation of a Population Proportion"
  3. "Confidence Interval for a Proportion"
  4. 4.0 4.1 Chapter 13, page 215, in: 20 year-old Real Estate Agent Rusty from Saint-Paul, has hobbies and interests which includes monopoly, property developers in singapore and poker. Will soon undertake a contiki trip that may include going to the Lower Valley of the Omo.

    My blog: http://www.primaboinca.com/view_profile.php?userid=5889534
  5. 5.0 5.1 20 year-old Real Estate Agent Rusty from Saint-Paul, has hobbies and interests which includes monopoly, property developers in singapore and poker. Will soon undertake a contiki trip that may include going to the Lower Valley of the Omo.

    My blog: http://www.primaboinca.com/view_profile.php?userid=5889534 online Page 29
  6. Isogenic.info > Resource equation by Michael FW Festing. Updated Sept. 2006
  7. Kish (1965, Section 3.1)
  8. Kish (1965), p.148.
  9. Kish (1965), p.78.
  10. Kish (1965), p.81.
  11. Kish (1965), p.93.
  12. Kish (1965), p.94.