Glossary of Epidemiological Terms
 
 

Accuracy (syn. validity): the ability of a diagnostic test to produce correct test results. Measures of diagnostic accuracy include sensitivity and specificity.

Apparent Prevalence (AP): the probability that a randomly selected unit of analysis has a positive test result.  The apparent prevalence can be expressed using the prevalence (p), sensitivity (h), and specificity (q) as:

AP = hp + (1 - q) (1 - p).
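
As a quick illustration of the formula above, the following minimal sketch computes the apparent prevalence in Python. The function name and the numeric inputs are hypothetical and chosen only for illustration; the symbols p, h, and q follow the definitions given in this entry.

```python
def apparent_prevalence(p, h, q):
    """AP = h*p + (1 - q)*(1 - p), with p = prevalence,
    h = sensitivity, and q = specificity."""
    for name, value in (("p", p), ("h", h), ("q", q)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # True positives among the infected plus false positives among the non-infected.
    return h * p + (1.0 - q) * (1.0 - p)


# Hypothetical inputs: 10% prevalence, 90% sensitivity, 95% specificity.
ap = apparent_prevalence(p=0.10, h=0.90, q=0.95)
print(round(ap, 4))
```

With these inputs the apparent prevalence (0.135) exceeds the true prevalence (0.10) because false positives among the large non-infected group outnumber the false negatives among the infected.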


Bayesian Statistics: The process by which prior uncertainty about a quantity or quantities is formally described and, through the application of Bayes' Theorem, updated following consideration of observed data.  Statistics is primarily concerned with the analysis of data, either to assist in the appreciation of some underlying mechanism, or to facilitate effective decisions. In both cases, some uncertainty exists and the statistician's tasks are both to reduce this uncertainty and to explain it clearly. One way of looking at statistics stems from the appreciation that all uncertainty must be described by probability: that probability is the only sensible language for a logic that deals with all degrees of uncertainty, and not just with the extremes of truth and falsity. This is called Bayesian Statistics.

Conditional Independence: two tests are conditionally independent when the sensitivity (or specificity) of the second test (T2) does not depend on whether results of the first test (T1) are positive or negative among infected (or non-infected) individuals.  For example, if a test with sensitivity = 0.90 is used to test a population of 100 infected animals, we would expect 10 animals to yield false negative test results.  If a second test with sensitivity = 0.80 is used to test the 10 animals that initially tested negative and the 2 tests are conditionally independent, then 8 of the 10 animals would be expected to test positive on the second test.  Hence, the sensitivity of the second test is 0.80 regardless of the results of the first test, i.e.

Pr (T2 + | T1 +, infected) = Pr (T2 + | T1 -, infected)  = Pr (T2 + | infected) =  0.80. 

Similarly if T2 were performed first, conditional independence means that 

Pr (T1 + | T2 +, infected) = Pr (T1 + | T2 -, infected) = Pr (T1 + | infected) = 0.90.

We note that if either of the tests is perfectly sensitive (specific), then the test sensitivities (specificities) are conditionally independent by definition.  The terms “dependence” and “correlation” are used interchangeably by some authors, but the former term is preferable when binary tests are used.
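
The worked example in this entry can be checked numerically. The sketch below assumes conditional independence, so each joint probability among infected individuals is simply the product of the two marginal probabilities; the function name and the dictionary layout are hypothetical.

```python
def joint_probs_infected(se1, se2):
    """Joint outcome probabilities (T1, T2) among infected individuals,
    assuming the two tests are conditionally independent."""
    return {
        ("+", "+"): se1 * se2,
        ("+", "-"): se1 * (1.0 - se2),
        ("-", "+"): (1.0 - se1) * se2,
        ("-", "-"): (1.0 - se1) * (1.0 - se2),
    }


probs = joint_probs_infected(se1=0.90, se2=0.80)

# Expected counts in the 100 infected animals from the example.
n_infected = 100
expected = {cell: n_infected * pr for cell, pr in probs.items()}

# Sensitivity of T2 recovered separately within each T1 stratum.
se2_given_t1_pos = probs[("+", "+")] / (probs[("+", "+")] + probs[("+", "-")])
se2_given_t1_neg = probs[("-", "+")] / (probs[("-", "+")] + probs[("-", "-")])
print(round(expected[("-", "+")], 6), round(se2_given_t1_pos, 6), round(se2_given_t1_neg, 6))
```

Both stratum-specific sensitivities equal 0.80, and 8 of the 10 animals missed by T1 are expected to be detected by T2, matching the text.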


Confounding: distortion of the estimated effect of an independent variable on the outcome of interest by another variable (a confounder) that is associated with both.  For example, if using logistic regression to model the presence of a disease, Y, as a function of some covariate of interest, X, then a confounding variable, W, would be associated with both X and Y.  The estimated regression coefficients, b, would be biased if W were not included in the regression model along with X.  Thus, if W were excluded, one would say that the estimated effect of X on Y was confounded by W.

Credible Interval (syn. "posterior probability interval" or "credibility interval"): the calculated interval that has a specified probability of containing a parameter of interest (such as a regression coefficient, or hazard ratio, for example), given the observed data.  For example, if one obtained a 95% central credible interval for some parameter, say, sensitivity, of (0.85, 0.96) with a mode of 0.92, then we would conclude that the most likely value of sensitivity was 0.92 and that we were 95% certain that the true value of sensitivity was between 0.85 and 0.96.

Diagnostic test: a test that directly measures a sign, substance, response, or tissue change that is either an absolute or reasonable surrogate predictor of a disease or disease agent.  Frequently, diagnostic test results are either continuous (such as an optical density reading using an ELISA), ordered (such as serum neutralization titers), or dichotomous (such as whether or not a precipitate is present on an AGID).  By convention, diagnostic tests based on continuous or ordered results are frequently dichotomized for decision-making purposes.  Their accuracy is commonly measured by sensitivity and specificity.

False Negative: an individual that is truly positive for a disease, but which a diagnostic test classifies as disease-free.

False Positive: an individual that is truly disease-free, but which a diagnostic test classifies as positive for a disease.

Gibbs Sampler:  A Markov chain Monte Carlo (MCMC) method for approximating complex posterior distributions in Bayesian analyses, in which each parameter is sampled in turn from its full conditional distribution given the current values of the other parameters and the data.  For instance, for a posterior distribution of interest, say, p(a,b,g|data), a Gibbs sampler would be run by selecting initial values of the parameters, say a(1), b(1), and g(1), and then iteratively sampling:

                     (1)  a(i+1)|b(i),g(i),data from the full conditional for a,
                     (2)  b(i+1)|a(i+1),g(i),data from the full conditional for b, and
                     (3)  g(i+1)|a(i+1),b(i+1),data from the full conditional for g, and then continuing with
                     (4)  a(i+2)|b(i+1),g(i+1),data from the full conditional for a,
                     etc.,

for i = 1, 2, ... , MC, where MC is the Monte Carlo sample size.  Inferences about the parameters  a, b, and g, are then based on the numerical approximations of their respective posterior distributions.
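
The iteration described above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not a posterior from any real analysis: the target is a standard bivariate normal distribution with correlation rho, chosen because its two full conditionals are known normal distributions. The function names, the burn-in length, and the seed are all hypothetical.

```python
import random
import statistics


def gibbs_bivariate_normal(rho, n_draws, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Toy target (an assumption for illustration): the full conditionals are
    a | b ~ Normal(rho * b, 1 - rho**2) and b | a ~ Normal(rho * a, 1 - rho**2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    a, b = 0.0, 0.0  # initial values
    draws = []
    for i in range(burn_in + n_draws):
        a = rng.gauss(rho * b, cond_sd)  # a(i+1) | b(i)
        b = rng.gauss(rho * a, cond_sd)  # b(i+1) | a(i+1)
        if i >= burn_in:  # discard the early draws
            draws.append((a, b))
    return draws


def sample_correlation(xs, ys):
    """Pearson correlation computed from the retained draws."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.6, n_draws=20000)
a_vals = [d[0] for d in draws]
b_vals = [d[1] for d in draws]
mean_a = statistics.fmean(a_vals)
corr_ab = sample_correlation(a_vals, b_vals)
print(round(mean_a, 2), round(corr_ab, 2))
```

Summaries of the retained draws (here the mean of a and the correlation between a and b) serve as the numerical approximations on which inferences would be based; they should land near 0 and 0.6 for this toy target.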


Gold Standard: A diagnostic test that has perfect sensitivity and perfect specificity.

Incidence Proportion (IP): The proportion of healthy individuals that develop some disease of interest during a defined period of time.  For example, if at time = 0, a sample of n individuals is completely disease free, but by time = 1, y individuals have experienced a disease "incident," then the IP of that disease is equal to y/n during the time interval from 0 to 1.  Note that specification of the time interval associated with the fraction y/n is an important component of the definition of the incidence proportion.  Note also that in some epidemiology texts and journal articles, the term "cumulative incidence" is used in place of incidence proportion.

Likelihood: The joint probability (or probability density) of the observed data, viewed as a function of the unknown parameters, θ, of a model; it is written L(θ).  In statistics and epidemiology, the term "likelihood" has a very specific meaning and should not be used interchangeably with terms that refer to the probability or frequency of events.  The method of maximum likelihood finds those values of θ that maximize L(θ), often using iterative maxima-seeking algorithms.
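
To make the idea of maximizing a likelihood concrete, here is a minimal sketch for a single binomial parameter, named `theta` in the code. The data (12 positives among 80 individuals) are hypothetical, and a crude grid search stands in for the iterative maxima-seeking algorithms used in practice.

```python
import math


def binomial_log_likelihood(theta, y, n):
    """Log-likelihood of theta for y positives among n sampled individuals."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return (
        math.log(math.comb(n, y))
        + y * math.log(theta)
        + (n - y) * math.log(1.0 - theta)
    )


y, n = 12, 80  # hypothetical data
grid = [i / 1000 for i in range(1, 1000)]  # candidate values 0.001 ... 0.999
theta_hat = max(grid, key=lambda t: binomial_log_likelihood(t, y, n))
print(theta_hat, y / n)
```

For the binomial model the maximizing value coincides with the observed proportion y/n, which the grid search recovers.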

Logistic Regression: A class of generalized linear models in which a dichotomous outcome is modeled as a function of regression coefficients and covariates using a logit link.  The logistic regression model, like the probit and complementary log-log regression models, is a natural choice when modeling probabilities, such as prevalence and incidence proportions.  Moreover, simple functions of the coefficients, such as odds ratios, are obtained using logistic regression.  These quantities have epidemiologic meaning under a variety of conditions, such as the rare disease assumption.  More formally, the logistic regression model is written as follows:

logit(p) = x'b,  y ~ binomial(n, p)

where p is a probability of interest, usually prevalence or an incidence proportion in epidemiologic studies, x is a vector of covariates, where the first covariate is a 1, b is the vector of regression coefficients, y is a vector containing the number of individuals with a particular covariate pattern that are disease positive, and n is a vector containing the number of individuals with a particular covariate pattern.  Note that since logit(p) = ln[ p / (1-p) ], it follows that p = exp(x'b) / [ 1+exp(x'b) ].  Note also that the odds ratio for a particular variable, say, x_i, is given by exp(b_i).
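
The two closing remarks of this entry can be verified with a short sketch: the probability is recovered from x'b by inverting the logit, and the odds ratio for a one-unit change in a covariate equals the exponentiated coefficient. The coefficient values below are hypothetical, not estimates from any data set.

```python
import math


def predicted_probability(x, b):
    """p = exp(x'b) / [1 + exp(x'b)] for covariates x and coefficients b."""
    if len(x) != len(b):
        raise ValueError("x and b must have the same length")
    linear_predictor = sum(x_i * b_i for x_i, b_i in zip(x, b))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds(p):
    return p / (1.0 - p)


b = [-2.0, 0.7]  # hypothetical intercept and exposure coefficient
p_unexposed = predicted_probability([1, 0], b)  # first covariate is always 1
p_exposed = predicted_probability([1, 1], b)
odds_ratio = odds(p_exposed) / odds(p_unexposed)
print(round(p_unexposed, 4), round(p_exposed, 4), round(odds_ratio, 4))
```

The ratio of the two odds matches exp(0.7), the exponentiated exposure coefficient.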


Logit Link:  A function that transforms a probability, with support on (0,1), to a quantity with support on (-∞, ∞).  The link function is applied to the probability of success for some outcome modeled using logistic regression.  For some probability, p, the logit link is defined as:

logit(p) = ln[ p / (1-p) ].
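
A minimal sketch of the logit and its inverse follows. The function names are hypothetical; the inverse is written in two branches only to avoid floating-point overflow for large negative arguments.

```python
import math


def logit(p):
    """ln[p / (1 - p)], defined for probabilities strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map any real number back to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for p in (0.05, 0.5, 0.95):
    z = logit(p)
    print(p, round(z, 4), round(inverse_logit(z), 4))
```

The loop illustrates that p = 0.5 maps to 0, that probabilities equidistant from 0.5 map to values of equal magnitude and opposite sign, and that the inverse recovers the original probability.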


Posterior Distribution:  A concept associated with Bayesian statistics. The posterior distribution is the distribution of some parameter of interest given the observed data and the prior distribution for that parameter.  It is a result of applying the observed data to the prior distribution using Bayes' theorem.
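
As one concrete case of this updating, consider a prevalence with a beta prior and binomial data. This conjugate pairing is an assumption chosen for illustration because the posterior is then available in closed form: a Beta(a, b) prior combined with y positives among n individuals gives a Beta(a + y, b + n - y) posterior. The prior parameters and data below are hypothetical.

```python
def beta_binomial_update(a_prior, b_prior, y, n):
    """Posterior Beta parameters after observing y positives among n."""
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    return a_prior + y, b_prior + (n - y)


def beta_mean(a, b):
    return a / (a + b)


a_prior, b_prior = 2, 8  # hypothetical prior with mean 0.20
y, n = 12, 80            # hypothetical data: 12 positives among 80
a_post, b_post = beta_binomial_update(a_prior, b_prior, y, n)
posterior_mean = beta_mean(a_post, b_post)
print(a_post, b_post, round(posterior_mean, 4))
```

The posterior mean lies between the prior mean (0.20) and the observed proportion (0.15), closer to the latter because the data carry more information than this prior.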

Predictive Value Positive (PVP): the probability that an individual is truly positive for a disease, given that a dichotomous test returns a positive result.  For example, if Y denotes a test result (0 = negative, 1 = positive) and Z denotes the true status of an individual (again, 0 = negative, 1 = positive), then:

PVP = Pr(Z=1|Y=1).

Note that PVP is dependent both on the diagnostic test characteristics (sensitivity and specificity), and prevalence.  Letting p = prevalence, h = sensitivity, and q = specificity, PVP is given by:

PVP = ph / [ph + (1-p)(1-q)] .
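
The dependence of PVP on prevalence is easy to see numerically. The sketch below evaluates the formula above at three hypothetical prevalences while holding sensitivity and specificity fixed; the function name and all input values are illustrative only.

```python
def predictive_value_positive(p, h, q):
    """PVP = p*h / [p*h + (1 - p)*(1 - q)], with p = prevalence,
    h = sensitivity, and q = specificity."""
    true_pos = p * h
    false_pos = (1.0 - p) * (1.0 - q)
    if true_pos + false_pos == 0.0:
        raise ValueError("PVP is undefined when no positive results can occur")
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (h = 0.90, q = 0.95) at three prevalences.
pvp_by_prevalence = {
    p: predictive_value_positive(p, h=0.90, q=0.95) for p in (0.01, 0.10, 0.50)
}
for p, pvp in pvp_by_prevalence.items():
    print(p, round(pvp, 3))
```

For a fixed test, PVP rises with prevalence: at 1% prevalence most positive results are false positives, whereas at 50% prevalence nearly all are true positives.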


Predictive Value Negative (PVN): the probability that an individual is truly disease negative, given that a dichotomous test returns a negative result.  For example, if Y denotes a test result (0 = negative, 1 = positive) and Z denotes the true status of an individual (again, 0 = negative, 1 = positive), then:

PVN = Pr(Z=0|Y=0).

Note that PVN is dependent both on the diagnostic test characteristics (sensitivity and specificity), and prevalence.  Letting p = prevalence, h = sensitivity, and q = specificity, PVN is given by:

PVN = q(1-p) / [q(1-p) + (1-h)p] .
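
One way to check the PVN formula is to compare it with expected counts in a hypothetical population. The sketch below does this for 1000 individuals; the function name, population size, and test characteristics are assumptions made for illustration.

```python
def predictive_value_negative(p, h, q):
    """PVN = q*(1 - p) / [q*(1 - p) + (1 - h)*p], with p = prevalence,
    h = sensitivity, and q = specificity."""
    true_neg = q * (1.0 - p)
    false_neg = (1.0 - h) * p
    if true_neg + false_neg == 0.0:
        raise ValueError("PVN is undefined when no negative results can occur")
    return true_neg / (true_neg + false_neg)


p, h, q = 0.10, 0.90, 0.95  # hypothetical prevalence, sensitivity, specificity
pvn = predictive_value_negative(p, h, q)

# Cross-check with expected counts in a hypothetical population of 1000.
n = 1000
expected_true_neg = n * (1.0 - p) * q   # non-infected that test negative
expected_false_neg = n * p * (1.0 - h)  # infected that test negative
pvn_from_counts = expected_true_neg / (expected_true_neg + expected_false_neg)
print(round(pvn, 4), round(pvn_from_counts, 4))
```

Both routes give the same value: of the roughly 865 individuals expected to test negative, about 855 are truly disease negative.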


Prevalence: the fraction of a sample of individuals that has some disease of interest at a particular point in time.  Prevalence is frequently denoted by the Greek letter π, written p in the formulas of this glossary.  For example, if n individuals are sampled at a given time, and y individuals are classified as positive for the disease in question, then the prevalence of that disease at that point in time is simply y/n.

Prior: a probability distribution that reflects previous experimental data, expert opinion, or both, and provides the basis for a Bayesian statistical model.  When appropriately combined with the observed data, the prior is "updated" to give the "posterior distribution," which is used to make inferences and draw conclusions.

Sensitivity: the probability that a dichotomous test yields a positive result, given that the true status of the individual tested is positive for the disease.  For example, if Y denotes a test result (0 = negative, 1 = positive) and Z denotes the true status of an individual (again, 0 = negative, 1 = positive), then:

Sensitivity = Pr(Y=1|Z=1). 


Specificity: the probability that a dichotomous test returns a negative result, given that the true status of the individual tested is negative for the disease.  For example, if Y denotes a test result (0 = negative, 1 = positive) and Z denotes the true status of an individual (again, 0 = negative, 1 = positive), then:

Specificity = Pr(Y=0|Z=0).
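
When true status is known, sensitivity and specificity are commonly estimated as simple proportions from the four cells of a 2 x 2 table. The sketch below assumes such a table is available; the function name and the counts are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Estimate sensitivity and specificity from 2 x 2 table counts.

    tp, fn: truly positive individuals testing positive / negative
    tn, fp: truly negative individuals testing negative / positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one truly positive and one truly negative individual")
    sensitivity = tp / (tp + fn)  # estimate of Pr(Y=1 | Z=1)
    specificity = tn / (tn + fp)  # estimate of Pr(Y=0 | Z=0)
    return sensitivity, specificity


# Hypothetical counts: 100 truly positive and 200 truly negative individuals.
se, sp = sensitivity_specificity(tp=90, fn=10, tn=190, fp=10)
print(se, sp)
```

Each estimate conditions on true status: sensitivity uses only the truly positive individuals, and specificity uses only the truly negative ones.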


True Negative: a negative test result for an individual that is truly negative for a particular disease.

True Positive: a positive test result for an individual that is truly positive for a particular disease.

 
 
 
 

 
 

last revised: 3/13/03