Lecture notes module 1

Predicting Food Quality

Lecture notes for Models, errors and uncertainty (Module 1)

Tiny van Boekel, PDQ

As explained in the introduction, models are essential in predicting food quality in order to get a grip on it. This module takes a closer look at the properties of mathematical models and their statistical analysis, because it is necessary to understand their characteristics so that we can exploit them maximally for our purpose.

Structure of mathematical models

The basic question to address first is about the nature of mathematical models. They should reflect a quantitative relationship between variables. We can make a distinction between the independent variable (the x-value, e.g., time, pH, temperature), the one that we can set ourselves, and the dependent variable (the y-value, e.g., concentration, rate of a reaction), the one that we observe as the response of the system that we are studying upon a change in the independent variable. The simplest model is of the type:

$y = a + bx$ (Eqn 1)

This model represents a linear relationship: see Figure 1.

Figure 1. Graphical representation of model (1), y = a + bx, for a = 8, b = 3.

This model is in fact the equation for a straight line, with intercept a and slope b; a and b are the so-called parameters. Obviously, a model can take many more forms than just this simple equation. Other types are:

$y = a \exp(bx)$ (Eqn 2)

This model displays an exponential relationship: see Figure 2.

Figure 2. Graphical representation of model (2), y = a exp(bx), for a = 2, b = 0.3 (A) and a = 10, b = -0.3 (B).

Another model is shown in Figure 3, a hyperbola:

Eqn 3

Figure 3. Graphical representation of model (3) for a = 3, b = 1.

Yet another model is a polynomial of order 3, displayed in Figure 4:

Eqn 4

Figure 4. Graphical representation of model (4), for a = 2, b = 3, c = 4.

So, typically, models contain the independent variable (the x-value) and the dependent variable (the y-value) and, next to that, the parameters (a, b and c in the above examples). It is the parameters that we are after: they contain specific information about the underlying process that we attempt to describe with our model. To generalize, an equation can be written as:

Eqn 5

in which the three symbols represent, respectively, the parameters (a, b, c in the above examples), the independent variable (the x-value), and the dependent or response variable (the y-value). Equation (5) is the notation for a general model that you often find in statistical literature.

How can we find a suitable model for our problems? As stated before, a model should reflect a quantitative relation between an independent and a response variable. We are very much dependent on experimental measurements; there is no general theory that predicts how relations should
be in foods. (This is not to say that theory is not important; it definitely is, as we shall also see in this course, but the available theory is not sufficient to predict quantitative relations.)

Therefore, the first thing to do, always, is to simply plot the experimental data and see what the relation looks like. This will give an important clue, for instance whether the relation is linear or non-linear, whether it looks like a hyperbola or a parabola, etc. The mathematical models we need in food-science problems usually describe changes in time and/or space, in the form of:
- Algebraic equations (such as the ones displayed in equations (1) to (4))
- Differential equations
- Partial differential equations

Deterministic and stochastic models

If you look closely at such equations, you will notice that for fixed values of the parameters the same output y-value will always be found for the same input x-value. That makes them so-called deterministic models: there is no uncertainty involved, and the output values are exactly determined by the input values. However, as mentioned before, we need to estimate parameters from experiments, and experimental values are always uncertain to some extent. It is essential to be able to estimate this experimental uncertainty; fortunately, we can do that from repeated experiments.

Errors and errors

So, the models that we are going to use are based upon experimental observations. Experimental observations always contain inexplicable and unavoidable error, and therefore the models based upon these experiments will also be uncertain; since we want to use models to predict, our predictions will contain a certain error too! This is the very reason why we insist so much on discussing errors: they are always there and we need to characterize them quantitatively.

At this stage it is important to pay attention to two concepts in relation to measurements: accuracy and precision. Accuracy is about how close measurements are to the real value. This is a philosophically interesting statement because we do not know the real value; we are, after all, trying to estimate it! Nevertheless, there is a real, true value that we want to approximate as closely as possible; if there is a constant, systematic deviation between the actual measurement and the (unknown) true value, this is called a systematic error. Accuracy can only
be obtained by carefully calibrating instruments, solutions, weighing measurements, etc., using materials with a known composition. This is the full responsibility of the researcher who does the measurement. Statistics cannot correct for systematic errors!

The other concept is precision: how close together are the measured values, or how strongly are they dispersed? This dispersion is caused by random errors, which occur by chance and are uncontrolled. You can have an accurate but imprecise measurement, but also a precise but inaccurate measurement. Figure 5 gives an impression of the possibilities. The statistical methods that we discuss in this course are about precision; we assume that accurate results have been obtained or reported in literature, i.e., without systematic error. We also assume that random errors are measured and reported; they are easily identified by doing repetitions. Unfortunately, many literature sources do not report these errors very clearly, which may be considered a capital sin in science!

Figure 5. A schematic representation of accuracy and precision.

We can make a further division in the nature of random errors, namely homoscedastic and heteroscedastic errors: see Figure 6.
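The distinction between accuracy and precision made earlier can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true value", the size of the systematic bias and the spread of the random error are invented numbers, and the random errors are assumed to be normally distributed. It suggests that averaging many repetitions reduces the influence of random error on the mean but leaves a systematic offset untouched.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical true value (unknown in a real experiment)


def simulate(bias, spread, n, seed):
    """Return n measurements: true value + systematic bias + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [TRUE_VALUE + bias + rng.gauss(0.0, spread) for _ in range(n)]


# Four combinations of systematic error (bias) and random error (spread);
# all numbers are illustrative assumptions.
cases = {
    "accurate, precise": simulate(bias=0.0, spread=0.5, n=200, seed=1),
    "accurate, imprecise": simulate(bias=0.0, spread=5.0, n=200, seed=2),
    "inaccurate, precise": simulate(bias=8.0, spread=0.5, n=200, seed=3),
    "inaccurate, imprecise": simulate(bias=8.0, spread=5.0, n=200, seed=4),
}

summary = {}
for label, values in cases.items():
    offset = statistics.mean(values) - TRUE_VALUE  # reflects (in)accuracy
    sd = statistics.stdev(values)                  # reflects (im)precision
    summary[label] = (offset, sd)
    print(f"{label:22s} offset of mean = {offset:6.2f}   sd = {sd:5.2f}")
```

In the two "inaccurate" cases the mean stays offset from the true value no matter how many repetitions are simulated, which is the sense in which statistics cannot correct for systematic errors.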
Figure 6. Schematic representation of homoscedastic (A) and heteroscedastic errors (B, C).

Homoscedastic errors are errors that do not differ with the x-values; in other words, the errors are approximately constant and independent of the independent variable x. Heteroscedastic errors do depend on the x-value. Figure 6B shows the case in which the errors in y increase with increasing x. It could, in principle, also be the other way around, as shown in Figure 6C. It is very instructive just to plot results, so that it is immediately obvious whether the errors are homoscedastic or not. The importance of knowing this becomes clear when we discuss regression techniques in a moment.

How to estimate errors?

An important consequence of experimental uncertainty is that errors are carried over to the parameters. So, we are in need of models that are able to express this uncertainty. That brings us into the area of stochastic models. Equation (5) needs to be extended with an error term:

Eqn 6

The added term represents the "error" (it is not an error in the sense that something is wrong; it represents the uncertainty we are faced with). How can we put a number to this error term? Quite simply from experiments (after having excluded systematic errors, see above): repeating an experiment (not just the measurement but the whole treatment!) in exactly the same way shows the uncontrollable variation, and by doing that a few times the mean and standard deviation can be calculated (preferably more than two times, because a standard deviation based on two measurements is quite unreliable). Experimental uncertainty typically decreases when more repetitions are done (see below).

Next to experimental uncertainty there is also biological variation: foods are natural materials and their composition varies; two apples that come from the same tree will nevertheless differ. This is unavoidable and we cannot reduce it, but we can characterize it via statistics! This leads to well-known quantities such as the mean, standard deviation, standard error and confidence intervals. Let us show some examples. Suppose a food chemist in Lab A determined the calcium content of the same milk in five repetitions (n = 5), while another food chemist did the same thing in Lab B on the same sample of milk but now with ten repetitions (n = 10):

Sample no    Calcium content (mg/100 g milk)    Calcium content (mg/100 g milk)
             Lab A                              Lab B
1            117.7                              121.4
2            119.5                              114.6
3            121.3                              116.1
4            117.6                              120.9
5            110.2                              109.3
6                                               116.1
7                                               117.3
8                                               119.8
9                                               116.6
10                                              123.5

Table 1. Determinations of the calcium content of milk in two different labs.

Note that this experimental setup gives information about the imprecision of measuring. It does not give information about biological variation, because the same batch of milk was analysed. For information about biological variability of calcium in milk, one would have to analyse samples from different batches; the variability observed in such a case would be the sum of the contributions of the biological variability and the experimental
uncertainty. For the present example, the sample mean is:

$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ (Eqn 7)

This results in 117.3 mg/100 g milk for Lab A and 117.6 mg/100 g milk for Lab B. Note that the means are not exactly the same, but quite close.

The sample variance v is:

$v = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ (Eqn 8)

This results in v = 17.8 for Lab A and 16.6 for Lab B.

The sample variance is actually the sum of squares divided by the degrees of freedom; one degree of freedom is lost because the mean is estimated from the same data.
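The calculation of the sample mean and sample variance for Table 1 can be sketched in a few lines of Python. The sketch assumes that the table is read as five values for Lab A and ten for Lab B, and that the variance is the sum of squared deviations divided by n - 1; the function and variable names are chosen here for illustration only. The means should reproduce the values in the text, and the variances should come out close to the quoted ones, with small differences in the last digit possible because of rounding.

```python
import math
import statistics

# Calcium content (mg/100 g milk), as read from Table 1
lab_a = [117.7, 119.5, 121.3, 117.6, 110.2]
lab_b = [121.4, 114.6, 116.1, 120.9, 109.3,
         116.1, 117.3, 119.8, 116.6, 123.5]


def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sample_mean(values)
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    degrees_of_freedom = len(values) - 1  # one lost to estimating the mean
    return sum_of_squares / degrees_of_freedom


results = {}
for name, values in (("Lab A", lab_a), ("Lab B", lab_b)):
    mean = sample_mean(values)
    var = sample_variance(values)
    # Cross-check against the standard library (also uses n - 1)
    assert math.isclose(var, statistics.variance(values), rel_tol=1e-9)
    results[name] = (mean, var, math.sqrt(var))
    print(f"{name}: n = {len(values)}, mean = {mean:.1f}, "
          f"variance = {var:.1f}, sd = {math.sqrt(var):.1f}")
```

The standard deviation printed in the last column is simply the square root of the variance, which brings the measure of spread back to the units of the measurement (mg/100 g milk).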