Validating input-output transformations
The model is viewed as an input-output transformation for these tests. The validation test consists of comparing the outputs of the system under consideration with the model outputs for the same set of input conditions. Data recorded while observing the system must be available to perform this test. The model output of primary interest should be used as the measure of performance. For example, if the system under consideration is a fast food outlet where the input to the model is customer arrival time and the output measure of performance is average customer time in line, then the actual arrival times and times spent in line for customers at the outlet would be recorded. The model would be run using the actual arrival times, and the model's average time in line would be compared with the actual average time spent in line using one or more tests.
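As a rough illustration of the fast food example, the sketch below drives a single-server, first-come-first-served queue with recorded arrival times and compares the model's average time in line with the observed average. The queue logic, the function name `average_wait`, and every number are assumptions made for illustration; none of them come from a real system.

```python
def average_wait(arrival_times, service_times):
    """Average time in line for a single-server FIFO queue.

    arrival_times: recorded customer arrival times, sorted ascending.
    service_times: service durations supplied by the model (hypothetical here).
    """
    server_free_at = 0.0
    total_wait = 0.0
    for arrival, service in zip(arrival_times, service_times):
        start = max(arrival, server_free_at)  # wait only if the server is busy
        total_wait += start - arrival
        server_free_at = start + service
    return total_wait / len(arrival_times)


# Hypothetical recorded system data: arrival times and observed times in line.
recorded_arrivals = [0.0, 1.0, 2.0, 6.0]
recorded_waits = [0.0, 1.5, 2.0, 0.0]
# Hypothetical service times generated by the model.
model_service_times = [2.0, 2.0, 2.0, 1.0]

model_avg = average_wait(recorded_arrivals, model_service_times)
system_avg = sum(recorded_waits) / len(recorded_waits)
print(f"model average time in line:  {model_avg:.3f}")
print(f"system average time in line: {system_avg:.3f}")
```

A single run yields only one model average; the statistical tests in this section need several statistically independent runs.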
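Statistical hypothesis testing using the t-test can be used as a basis to accept the model as valid or reject it as invalid.
The hypothesis to be tested is
$H_0$: the model measure of performance = the system measure of performance
against
$H_1$: the model measure of performance ≠ the system measure of performance.
The test is performed for a given sample size and significance level α. To perform the test, n statistically independent runs of the model are performed and an average or expected value, E(Y), is produced for the variable of interest. Then the test statistic $t_0$ is calculated from n, E(Y), the sample standard deviation S of the model outputs, and the observed value for the system, $\mu_0$:
$t_0 = \frac{E(Y) - \mu_0}{S / \sqrt{n}}$
and the critical value $t_{\alpha/2,\,n-1}$ for α and n − 1 degrees of freedom is determined.
If $|t_0| > t_{\alpha/2,\,n-1}$,
reject $H_0$; the model needs adjustment.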
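The test can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard one-sample t statistic $t_0 = (E(Y) - \mu_0)/(S/\sqrt{n})$; the function name `t_test_model` and the data are hypothetical, and the critical value is passed in from a t-table rather than computed.

```python
import math
import statistics


def t_test_model(model_outputs, mu0, t_crit):
    """Two-sided one-sample t-test of model outputs against a system value.

    model_outputs: results of n independent model runs.
    mu0: observed value for the system.
    t_crit: critical value t_{alpha/2, n-1}, looked up in a t-table.
    Returns (t0, reject_h0).
    """
    n = len(model_outputs)
    mean_y = statistics.mean(model_outputs)
    s = statistics.stdev(model_outputs)  # sample standard deviation (n - 1)
    t0 = (mean_y - mu0) / (s / math.sqrt(n))
    return t0, abs(t0) > t_crit


# Hypothetical data: six model runs and an observed system average.
outputs = [2.79, 1.12, 2.24, 3.45, 3.13, 2.38]
mu0 = 4.3
T_CRIT = 2.571  # t_{0.025, 5} from a standard t-table (alpha = 0.05, n = 6)

t0, reject = t_test_model(outputs, mu0, T_CRIT)
print(f"t0 = {t0:.2f}, reject H0: {reject}")
```

With these illustrative numbers the statistic lies far outside the critical value, so the null hypothesis would be rejected and the model revisited.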
Two types of error can occur in hypothesis testing: rejecting a valid model, called a type I error or “model builder's risk”, and accepting an invalid model, called a type II error, β, or “model user's risk”. The significance level α is equal to the probability of a type I error. If α is small, then rejecting the null hypothesis is a strong conclusion. For example, if α = 0.05 and the null hypothesis is rejected, there is only a 0.05 probability of rejecting a model that is valid. Decreasing the probability of a type II error is very important. The probability of correctly detecting an invalid model is 1 − β. The probability of a type II error depends on the sample size and on the actual difference between the sample value and the observed value. Increasing the sample size decreases the risk of a type II error.
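The effect of sample size on β can be illustrated numerically. The sketch below is an approximation only: it assumes a known standard deviation and uses the normal distribution in place of the t-distribution, and the function name `type_ii_error` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(delta, sigma, n, alpha=0.05):
    """Approximate beta for a two-sided test (normal approximation).

    delta: true difference between the model mean and the system value.
    sigma: standard deviation of the model output, assumed known.
    n: number of independent model runs.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma
    # Probability that the test statistic stays inside the acceptance region.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


for n in (5, 10, 20, 40):
    beta = type_ii_error(delta=0.5, sigma=1.0, n=n)
    print(f"n = {n:2d}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

For a fixed true difference, β shrinks as n grows; when the difference is zero the same expression reduces to 1 − α.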
A statistical technique in which the amount of model accuracy is specified as a range has recently been developed. The technique uses hypothesis testing to accept a model if the difference between a model's variable of interest and a system's variable of interest is within a specified range of accuracy.[6] One requirement is that both the system data and the model data be approximately Normally Independent and Identically Distributed (NIID). The t-test statistic is used in this technique. If the model mean is $\mu_m$ and the system mean is $\mu_s$, then the difference between the model and the system is $D = \mu_m - \mu_s$. The hypothesis to be tested is whether D is within the acceptable range of accuracy. Let L = the lower limit for accuracy and U = the upper limit for accuracy. Then
$H_0$: $L \le D \le U$
against
$H_1$: $D < L$ or $D > U$
is to be tested.
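One simple way to check such a range hypothesis is sketched below. It is not the cited technique itself but a stand-in that assumes equal variances, uses a pooled two-sample t statistic, and rejects the null hypothesis only when the estimated difference falls significantly outside [L, U]. The function name `range_accuracy_test`, the data, and the limits are all hypothetical.

```python
import math
import statistics


def range_accuracy_test(model_data, system_data, lower, upper, t_crit):
    """Test H0: lower <= D <= upper, where D = model mean - system mean.

    t_crit: one-sided critical value t_{alpha, n_m + n_s - 2} from a t-table.
    Returns (d_hat, standard_error, reject_h0).
    """
    n_m, n_s = len(model_data), len(system_data)
    d_hat = statistics.mean(model_data) - statistics.mean(system_data)
    # Pooled variance: assumes both samples share a common variance.
    pooled_var = (
        (n_m - 1) * statistics.variance(model_data)
        + (n_s - 1) * statistics.variance(system_data)
    ) / (n_m + n_s - 2)
    se = math.sqrt(pooled_var * (1 / n_m + 1 / n_s))
    reject = d_hat < lower - t_crit * se or d_hat > upper + t_crit * se
    return d_hat, se, reject


# Hypothetical model and system observations.
model = [2.79, 1.12, 2.24, 3.45, 3.13, 2.38]
system = [4.1, 4.6, 3.9, 4.5, 4.2, 4.5]
T_CRIT = 1.812  # t_{0.05, 10} from a standard t-table

d_hat, se, reject = range_accuracy_test(model, system, -1.0, 1.0, T_CRIT)
print(f"D estimate = {d_hat:.3f}, SE = {se:.3f}, reject H0: {reject}")
```

Widening the acceptable range, for example to [-2, 2], leaves the same estimate inside the limits and the null hypothesis is no longer rejected.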
The operating characteristic (OC) curve gives the probability that the null hypothesis is accepted as a function of the true difference D. The OC curve therefore characterizes the probabilities of both type I and type II errors. Risk curves for the model builder's risk and the model user's risk can be developed from the OC curves. Comparing risk curves for a fixed sample size makes the trade-off between the model builder's risk and the model user's risk easy to see. If the model builder's risk, the model user's risk, and the upper and lower limits for the accuracy range are all specified, the required sample size can be calculated.
Confidence intervals can be used to evaluate whether a model is “close enough” to a system for some variable of interest. The difference between the model value, μ, and the known system value, $\mu_0$, is checked to see if it is less than a value small enough for the model to be valid with respect to that variable of interest. This value is denoted by the symbol ε. To perform the test, n statistically independent runs of the model are performed and a mean or expected value, E(Y), which estimates μ, is produced for the simulation output variable of interest Y, together with a standard deviation S. A confidence level, 100(1 − α)%, is selected. An interval, [a, b], is constructed by
$a = E(Y) - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$ and $b = E(Y) + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$,
where
$t_{\alpha/2,\,n-1}$ is the critical value of the t-distribution for the given level of significance and n − 1 degrees of freedom.
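The interval and a decision based on it can be sketched as follows. The decision rule is an assumption, following a commonly described convention: the model is close enough if even the worst-case distance from the interval to the system value is at most ε, it needs calibration if even the best-case distance exceeds ε, and otherwise more runs are needed. The function name `confidence_interval_check` and the data are hypothetical, and the critical value is taken from a t-table.

```python
import math
import statistics


def confidence_interval_check(model_outputs, mu0, epsilon, t_crit):
    """Build [a, b] around the model mean and compare it with mu0.

    t_crit: critical value t_{alpha/2, n-1}, looked up in a t-table.
    Returns ((a, b), verdict).
    """
    n = len(model_outputs)
    mean_y = statistics.mean(model_outputs)
    half_width = t_crit * statistics.stdev(model_outputs) / math.sqrt(n)
    a, b = mean_y - half_width, mean_y + half_width

    worst = max(abs(a - mu0), abs(b - mu0))
    best = 0.0 if a <= mu0 <= b else min(abs(a - mu0), abs(b - mu0))
    if worst <= epsilon:
        verdict = "close enough"
    elif best > epsilon:
        verdict = "needs calibration"
    else:
        verdict = "inconclusive"  # more model runs are needed
    return (a, b), verdict


# Hypothetical data: six model runs, system value 4.3, tolerance 1.0.
outputs = [2.79, 1.12, 2.24, 3.45, 3.13, 2.38]
(a, b), verdict = confidence_interval_check(outputs, 4.3, 1.0, 2.571)
print(f"[a, b] = [{a:.3f}, {b:.3f}] -> {verdict}")
```

With these illustrative numbers the interval lies partly within ε of the system value and partly outside it, so no conclusion can be drawn without additional runs.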
If the statistical assumptions cannot be met or there is insufficient data for the system, graphical comparisons of the model outputs with the system outputs can be used to make subjective decisions; however, other objective tests are preferable.