Math 286 Notes
Spring 2002
Simulation Exercise: Customers enter a bank with one teller and form a queue if the teller is busy. When the teller finishes serving a customer, the person who has been waiting the longest steps up to be served. One (long) day, the following times are observed for 13 customers:
Arrival Times (minutes): 12,31,63,95,99,154,198,221,304,346,411,455,537
Service Times (minutes): 40,32,55,48,18,50,47,18,28,54,40,72,12
(a) Determine the departure times for each of these 13 customers. What is the average time spent in the bank?
(b) Repeat (a) in the case where there are two tellers and the person first in line goes to the first available teller.
Review. Sample Space, Events, Axioms of Probability, Conditional Probability, Independence, Random Variables, pmf, pdf, cdf, Joint Distributions, Expectation, Variance, Covariance
Discrete Distributions: Binomial(n,p), Poisson(λ)
Continuous Distributions: Uniform(a,b), Normal(μ, σ^{2}), Exponential(λ)
Poisson Process – characterization. (Nonhomogeneous Poisson Process)
Binomial → Poisson → Exponential → Gamma
· Binomial(n,p) can be approximated by Poisson(np) when n is large and p is small
· If the number of arrivals is Poisson(λ), the interarrival times are Exponential(λ).
· The sum of n independent Exponential(λ) variables is Gamma(n, λ)
Markov’s Inequality, Chebyshev’s Inequality, Weak Law of Large Numbers
Central Limit Theorem
Conditional Expectation, Conditional Variance
Random Numbers – flipping coins, rolling dice, shuffling cards
Pseudorandom numbers
Multiplicative congruential method: x_{0} = seed; x_{n} = ax_{n−1} modulo m
Example: 32-bit word machine, m = 2^{31} − 1 and a = 7^{5}.
Mixed congruential generator: x_{n} = (ax_{n−1} + c) modulo m
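A minimal TRUE BASIC sketch (in the style of the programs later in these notes) of the multiplicative generator with m = 2^{31} − 1 and a = 7^{5}; the seed 12345 and the five iterations are arbitrary choices for illustration, and the remainder is computed with INT, assuming the arithmetic carries enough precision (a·x stays below 2^{53}).
10 PRINT "MULTIPLICATIVE CONGRUENTIAL GENERATOR"
20 LET M = 2^31 - 1
30 LET A = 7^5
40 LET X = 12345              ! seed (hypothetical choice; any 1 <= x0 < m works)
50 FOR J = 1 TO 5
60 LET X = A*X - M*INT(A*X/M) ! a*x modulo m
70 PRINT X/M                  ! pseudorandom number in (0,1)
80 NEXT J
90 END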
Calculators and Computers can generate random numbers for you.
Evaluating Integrals. If U is Uniform(0,1), the value of E[g(U)] is ∫_{0}^{1} g(x) dx.
By the law of large numbers, E[g(U)] is approximated by the average [g(U_{1})+g(U_{2})+…+g(U_{k})]/k as k gets large.
Integrals over other intervals can be handled by a change of variable. Moreover, the same methods can be applied to approximate multidimensional integrals.
Example. Estimating π/4 = ∫_{0}^{1} (1 − x^{2})^{1/2} dx.
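A short TRUE BASIC sketch of this estimate, averaging g(U) = (1 − U^{2})^{1/2} over K random numbers (the prompt text and variable names are my own choices):
10 PRINT "MONTE CARLO ESTIMATE OF PI/4"
20 PRINT "ENTER THE NUMBER OF SAMPLES"
30 INPUT K
40 RANDOMIZE
50 LET S = 0
60 FOR J = 1 TO K
70 LET U = RND
80 LET S = S + SQR(1 - U*U)   ! g(U) = (1 - U^2)^(1/2)
90 NEXT J
100 PRINT "ESTIMATE OF PI/4 IS", S/K
110 END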
Inverse Transform Method:
Given pmf for discrete rv X: P{X=x_{j}} = p(x_{j})= p_{j} , j = 0,1,2,…
Generate a random number U (from Uniform(0,1))
Set X = x_{0} if U < p_{0}
X = x_{j} if p_{0} + … + p_{j−1} ≤ U < p_{0} + … + p_{j}
A graph of the cdf provides a useful representation of this method.
Show it gives the proper pmf for X
Algorithm: Generate a random number U
If U < p_{0} , set X = x_{0} and stop
If U < p_{0} + p_{1} , set X = x_{1} and stop
If U < p_{0} + p_{1} + p_{2}, set X = x_{2} and stop
etc.
It is most efficient to work with the p_{i}'s in decreasing order. Why?
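A TRUE BASIC sketch of the algorithm for a hypothetical pmf p(1) = .35, p(2) = .30, p(3) = .20, p(4) = .15 (already listed in decreasing order):
10 PRINT "DISCRETE INVERSE TRANSFORM"
20 ! hypothetical pmf: p(1)=.35, p(2)=.30, p(3)=.20, p(4)=.15 (decreasing order)
30 RANDOMIZE
40 LET U = RND
50 LET X = 1
60 IF U < .35 THEN GOTO 120
70 LET X = 2
80 IF U < .65 THEN GOTO 120
90 LET X = 3
100 IF U < .85 THEN GOTO 120
110 LET X = 4
120 PRINT "X =", X
130 END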
Examples.
1. Generating a random permutation: Use Int[kU] + 1 to pick a position at random from {1,…,k} (see the sketch below)
Generating random subsets is important in tests of medical treatments
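A TRUE BASIC sketch of one standard way to do this with Int[kU] + 1: start with 1,…,N in order and, for k = N down to 2, swap position k with a position chosen at random from {1,…,k}. The DIM bound of 100 is an assumed limit on N.
10 PRINT "RANDOM PERMUTATION OF 1,...,N"
20 PRINT "ENTER N"
30 INPUT N
40 DIM P(100)                 ! assumes N <= 100
50 RANDOMIZE
60 FOR J = 1 TO N
70 LET P(J) = J
80 NEXT J
90 FOR K = N TO 2 STEP -1
100 LET U = RND
110 LET I = INT(K*U) + 1      ! random position in {1,...,K}
120 LET T = P(K)
130 LET P(K) = P(I)
140 LET P(I) = T
150 NEXT K
160 FOR J = 1 TO N
170 PRINT P(J);
180 NEXT J
190 PRINT
200 END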
2. Geometric random variables. P{X=i} = pq^{i−1}, i = 1, 2, …, where q = 1 − p
Note that if F is the cdf, F(j) = 1 − q^{j}. Why? Thus, we can
Generate a random number U
Set X equal to the value of j for which 1 − q^{j−1} ≤ U < 1 − q^{j}
Equivalently, X = Int[log(1 − U)/log(q)] + 1, or X = Int[log(U)/log(q)] + 1 (since 1 − U is also Uniform(0,1))
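A TRUE BASIC sketch using the closed form X = Int[log(U)/log(q)] + 1, assuming 0 < p < 1 (the input prompt is mine; LOG is the natural log):
10 PRINT "GEOMETRIC RANDOM VARIABLE"
20 PRINT "ENTER p"
30 INPUT P
40 RANDOMIZE
50 LET U = RND
60 LET X = INT(LOG(U)/LOG(1 - P)) + 1   ! q = 1 - p
70 PRINT "X =", X
80 END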
3. Generating Poisson(λ)
1. Generate a random number U
2. i = 0, p = e^{−λ}, F = p
3. If U < F, set X = i and stop
4. p = λp/(i+1), F = F + p, i = i + 1
5. Go to step 3
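A direct TRUE BASIC translation of these steps (L plays the role of the mean λ; the input prompt is mine):
10 PRINT "POISSON RANDOM VARIABLE WITH MEAN L"
20 PRINT "ENTER L"
30 INPUT L
40 RANDOMIZE
50 LET U = RND
60 LET I = 0
70 LET P = EXP(-L)
80 LET F = P
90 IF U < F THEN GOTO 140
100 LET P = L*P/(I + 1)
110 LET F = F + P
120 LET I = I + 1
130 GOTO 90
140 PRINT "X =", I
150 END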
4. Generating Binomial(n,p)
1. Generate a random number U
2. c = p/(1−p), i = 0, pr = (1−p)^{n}, F = pr
3. If U < F, set X = i and stop
4. pr = c(n−i)pr/(i+1), F = F + pr, i = i + 1
5. Go to step 3
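A TRUE BASIC translation of these steps, assuming 0 < p < 1 and writing PR for the running probability so it is not confused with the parameter p:
10 PRINT "BINOMIAL(N,P) RANDOM VARIABLE"
20 PRINT "ENTER N AND P"
30 INPUT N, P
40 RANDOMIZE
50 LET U = RND
60 LET C = P/(1 - P)
70 LET I = 0
80 LET PR = (1 - P)^N
90 LET F = PR
100 IF U < F THEN GOTO 150
110 LET PR = C*(N - I)*PR/(I + 1)
120 LET F = F + PR
130 LET I = I + 1
140 GOTO 100
150 PRINT "X =", I
160 END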
Acceptance – Rejection Method.
Given simulations of Y with P{Y = j} = q_{j}, j = 0,1,2,...
Produce simulations of X with P{X = j} = p_{j}, j = 0,1,2,...
Let c be a constant such that p_{j}/q_{j} ≤ c for all j such that p_{j} > 0.
1. Simulate Y
2. Generate a random number U
3. If U < p_{Y}/(cq_{Y}), set X = Y and stop
4. Return to step 1
Theorem. For the above algorithm, P{X = j} = p_{j}. The number of iterations needed to obtain X is geometric with mean c.
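A TRUE BASIC sketch for a hypothetical target pmf p(1) = .10, p(2) = .15, p(3) = .45, p(4) = .30, simulating Y uniform on {1,2,3,4} (so q_{j} = 1/4 and c = .45/.25 = 1.8):
10 PRINT "ACCEPTANCE-REJECTION FOR A PMF ON {1,2,3,4}"
20 DIM P(4)
30 LET P(1) = .10
40 LET P(2) = .15
50 LET P(3) = .45
60 LET P(4) = .30
70 LET C = 1.8                    ! c = max p(j)/q(j) = .45/(1/4)
80 RANDOMIZE
90 LET Y = INT(4*RND) + 1         ! simulate Y uniform on {1,2,3,4}
100 LET U = RND
110 IF U < P(Y)/(C*0.25) THEN GOTO 130
120 GOTO 90
130 PRINT "X =", Y
140 END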
Suppose X has cdf F = aF_{1} + (1 − a)F_{2} (a mixture), where F_{i} is the cdf of X_{i} and 0 ≤ a ≤ 1.
To generate a value of X:
1. Generate a random number U
2. If U < a, generate X_{1} and set X = X_{1}
3. Otherwise, generate X_{2} and set X = X_{2}.
This method can be generalized to arbitrary finite mixtures F = Σ a_{j}F_{j} with nonnegative coefficients a_{j} that sum to 1.
The Inverse Transform Algorithm. Let U be Uniform(0,1). For any continuous distribution function F, the random variable X = F^{−1}(U) has cdf F.
Examples. (a) Exponential (b) Poisson from Exponential (c) Gamma from Exponential
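For example (a), F(x) = 1 − e^{−λx} gives F^{−1}(U) = −ln(1 − U)/λ, and 1 − U may be replaced by U. A TRUE BASIC sketch (L plays the role of λ; the prompt is mine):
10 PRINT "EXPONENTIAL RANDOM VARIABLE WITH RATE L"
20 PRINT "ENTER L"
30 INPUT L
40 RANDOMIZE
50 LET U = RND
60 LET X = -LOG(U)/L
70 PRINT "X =", X
80 END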
Rejection Method. Given a method for generating Y with pdf g(y), generate X with pdf f(x). Let c be a constant such that f(y)/g(y) ≤ c for all y.
1. Generate Y having density g
2. Generate a random number U
3. If U ≤ f(Y)/cg(Y), set X = Y and stop.
4. Return to step 1
Theorem. (a) The random variable X generated by the rejection method has pdf f.
(b) The number of iterations of the algorithm that are needed is a geometric random variable with mean c.
Examples. (a) f(x) = 20x(1 − x)^{3}, 0 < x < 1; g(x) = 1 for 0 < x < 1.
(b) Gamma from Exponential
(c) Normal from Exponential
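For example (a), f is maximized at x = 1/4, so we may take c = f(1/4) = 135/64 ≈ 2.11. A TRUE BASIC sketch:
10 PRINT "REJECTION METHOD FOR f(x) = 20x(1-x)^3 ON (0,1)"
20 LET C = 135/64                 ! c = max f(x) = f(1/4), with g(x) = 1
30 RANDOMIZE
40 LET Y = RND                    ! Y has density g
50 LET U = RND
60 IF U <= 20*Y*(1 - Y)^3/C THEN GOTO 80
70 GOTO 40
80 PRINT "X =", Y
90 END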
The Polar Method (for generating a pair of independent standard normals).
1. Generate random numbers U_{1} and U_{2}
2. Set V_{1} = 2U_{1} − 1, V_{2} = 2U_{2} − 1, and S = V_{1}^{2} + V_{2}^{2}
3. If S > 1, return to Step 1
4. Set X = ((−2 ln S)/S)^{1/2}V_{1} and Y = ((−2 ln S)/S)^{1/2}V_{2}
Poisson Process. Generating arrivals between 0 and T.
1. t = 0, I = 0
2. Generate a random number U
3. t = t − (ln U)/λ. If t > T, stop
4. I = I+1, S(I) = t
5. Go to step 2
Nonhomogeneous Poisson Process with intensity λ(t). (Thinning: choose a constant λ with λ(t) ≤ λ for all 0 ≤ t ≤ T.)
1. t = 0, I = 0
2. Generate a random number U
3. t = t − (ln U)/λ. If t > T, stop
4. Generate a random number U
5. If U ≤ λ(t)/λ, set I = I+1, S(I) = t
6. Go to step 2
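A TRUE BASIC sketch of the thinning algorithm for the worksheet intensity below (λ(t) = 2 + t on [0,1) and 4 − t on [1,2], so the bound λ = 3 and horizon T = 2 are natural choices):
10 PRINT "THINNING METHOD, LAMBDA(T) = 2+T ON [0,1), 4-T ON [1,2]"
20 RANDOMIZE
30 LET TMAX = 2                   ! horizon T
40 LET LAM = 3                    ! constant bound on lambda(t)
50 LET T = 0
60 LET I = 0
70 LET U = RND
80 LET T = T - LOG(U)/LAM
90 IF T > TMAX THEN GOTO 170
100 LET U = RND
110 LET R = 4 - T                 ! lambda(t) on [1,2]
120 IF T < 1 THEN LET R = 2 + T   ! lambda(t) on [0,1)
130 IF U > R/LAM THEN GOTO 70     ! reject this candidate arrival
140 LET I = I + 1
150 PRINT "ARRIVAL", I, "AT TIME", T
160 GOTO 70
170 PRINT "TOTAL ARRIVALS:", I
180 END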
Chapter 6. Discrete Event Simulation
Single Server Queueing System
Worksheet.
T = 2
Arrivals: Nonhomogeneous Poisson, λ(t) = 2 + t for 0 ≤ t < 1; λ(t) = 4 − t for 1 ≤ t ≤ 2.
Service times: pdf g(x) = 4e^{−4x}(4x), x > 0.
Random Number Table
U_{1}:             .85   .61   .09   .77   .70   .22
−(1/3)ln(U_{2}):   .40   .27   .43   .51   .33   .65
−(1/4)ln(U_{3}):   .12   .25   .52   .18   .08   .14
Event Time t | n, N_{A}, N_{D} | t_{A}, t_{D}           | A(i), D(i)
0            |                 | t_{A} = .67, t_{D} = ∞ |
(Remaining rows are left blank to be filled in.)
Two Servers in Series.
Time variable: t
System State: (n_{1},n_{2}), n_{i} = # in ith queueing system, i = 1,2
Counters: N_{A} = number of arrivals by time t
N_{D} = number of departures by time t
Output: A_{1}(k) = arrival time of customer k
A_{2}(k) = arrival time of customer k at server 2
D(k) = departure time of customer k
Two Parallel Servers
Inventory Model:
Example                                           Abstract Model
You sell widgets for $10 each                     r = price per unit
Customers arrive according to Poisson(4)          λ = Poisson arrival rate
Customer order is Uniform on {1,2,...,20}^{*}     G(x) = cdf for size of individual order
If inventory < 20, restock to a total of 50       (s,S) = (20,50)
Restocking takes 0.3 hrs                          L = time until restock is delivered
Storage costs $.25 per unit per hour              h = unit storage cost per unit time
System State: (x,y), where x = # units in inventory, y = # units on order
Counters: C = total restocking costs, H = total inventory costs, R = revenue (all at time t)
Events: t_{0} = arrival time of next customer; t_{1} = time of next restock delivery
^{*} If a customer requests more than the inventory in stock, he buys the entire inventory and leaves (no waiting).
Worksheet. Assume you start with 30 units in stock and use the following random table.
U_{1}:           .85   .61   .29   .47   .70   .22   .12   .50   .24
−.25 ln U_{2}:   .12   .24   .52   .18   .08   .14   .71   .22   .31
Event time | SS      | C   | H    | R   | t_{0} | t_{1}
.12        | (12,38) | 0   | 0.90 | 180 | .36   | .42
.36        | (0,50)  | 0   | 1.62 | 300 | .88   | .42
.42        | (38,12) | 240 | 1.62 | 300 | .88   | .66
(Remaining rows are left blank to be filled in.)
TRUE BASIC WORKSHEET
10 PRINT "THIS PROGRAM SUMS THE HARMONIC SERIES"
20 PRINT "TO A SPECIFIED NUMBER OF TERMS"
30 S = 0
40 PRINT "ENTER THE NUMBER OF TERMS"
50 INPUT N
60 FOR J = 1 TO N
70 S = S + 1/J
80 NEXT J
90 PRINT "THE SUM IS", S
100 END
Program for #12 on page 43.
10 PRINT "Enter number of trials"
20 INPUT K
30 E = 0
40 RANDOMIZE TIMER
50 FOR I = 1 TO K
60 N = 0
70 S = 0
80 U = RND
90 N = N + 1
100 S = S + U
110 IF S < 1 THEN GOTO 80
120 E = E + N
130 NEXT I
140 PRINT "Average N"
150 PRINT E/K
160 END
10 PRINT "Average number of coin flips needed to obtain both a head and a tail"
20 PRINT "Enter the number of simulations desired"
25 INPUT K
27 RANDOMIZE
30 J = 0
32 S = 0
35 X = 0
40 Y = 0
50 N = 0
70 U = RND
80 I = INT(2*U)
90 IF I = 0 THEN X = 1
100 IF I = 1 THEN Y = 1
110 N = N + 1
120 IF X*Y = 0 THEN GOTO 70
130 S = S + N
135 J = J + 1
140 IF J < K THEN GOTO 35
150 AVG = S/K
160 PRINT AVG
10 PRINT "THIS PROGRAM SOLVES NUMBER 22 ON PAGE 84"
20 LET t = 0
30 LET N = 0
40 RANDOMIZE
50 LET U = RND
60 LET t = t - 0.2*LOG(U)
70 IF t > 1 THEN GOTO 110
80 LET U = RND
90 LET N = N + INT(21*U) + 20
100 GOTO 50
110 PRINT "NUMBER OF FANS IS", N
120 END
10 PRINT "THIS PROGRAM SOLVES NUMBER 17 ON PAGE 84"
20 RANDOMIZE
30 DIM U(2)
40 LET U(1) = 2*RND - 1
50 LET U(2) = 2*RND - 1
60 LET S = U(1)^2 + U(2)^2
70 IF S > 1 THEN GOTO 40
80 DIM X(2)
90 FOR J = 1 TO 2
100 LET X(J) = SQR(-2*LOG(S)/S)*U(J)
110 NEXT J
120 PRINT "X(1),X(2) =", X(1), X(2)
130 END
EXERCISE: A single die is rolled until all six numbers appear. Use simulation to estimate the expected number of rolls.
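One possible TRUE BASIC sketch for this exercise (the theoretical answer is 6(1 + 1/2 + … + 1/6) = 14.7, so the printed average should be close to that):
10 PRINT "ROLL A DIE UNTIL ALL SIX FACES APPEAR"
20 PRINT "ENTER THE NUMBER OF SIMULATIONS"
30 INPUT K
40 RANDOMIZE
50 DIM C(6)
60 LET S = 0
70 FOR J = 1 TO K
80 FOR I = 1 TO 6
90 LET C(I) = 0                   ! C(I) = 1 once face I has been seen
100 NEXT I
110 LET M = 0                     ! number of distinct faces so far
120 LET N = 0                     ! number of rolls so far
130 LET R = INT(6*RND) + 1
140 LET N = N + 1
150 IF C(R) = 1 THEN GOTO 180
160 LET C(R) = 1
170 LET M = M + 1
180 IF M < 6 THEN GOTO 130
190 LET S = S + N
200 NEXT J
210 PRINT "AVERAGE NUMBER OF ROLLS", S/K
220 END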
Chapter 7. Statistical Analysis of Simulated Data
Simulation produces values X_{1}, X_{2}, … iid with mean θ and variance σ^{2}.
Sample Mean: m_{k} = (X_{1} + … + X_{k})/k
Sample Variance: S_{k}^{2} = [(X_{1} − m_{k})^{2} + … + (X_{k} − m_{k})^{2}]/(k − 1)
Proposition: E[S_{k}^{2}] = σ^{2}
Since E[(m_{k} − θ)^{2}] = σ^{2}/k, m_{k} is a "good" estimator of θ when σ^{2}/k is small.
Two measures of accuracy:
Chebyshev: P{|m_{k} − θ| > cσ/√k} ≤ 1/c^{2}
Central Limit Theorem: P{|m_{k} − θ| > cσ/√k} ≈ 2[1 − Φ(c)]
When to stop generating new data.
1. Choose an acceptable value d for the standard deviation of m_{k}.
2. Generate at least 30 values
3. Continue to generate values, stopping when S_{k}/√k < d.
4. Estimate θ by m_{k}.
If the X_{i}'s come from a Bernoulli (0–1) distribution, we can replace S_{k} by [m_{k}(1 − m_{k})]^{1/2} as an estimator of σ.
Recursive Calculation of m_{k} and S_{k}^{2} .
m_{k+1} = (km_{k} + X_{k+1})/(k+1)
S_{k+1}^{2} = S_{k}^{2}(k − 1)/k + X_{k+1}^{2}/k − (1 + 1/k)m_{k+1}^{2} + m_{k}^{2}
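A TRUE BASIC sketch that applies both recursions to K Uniform(0,1) values (the running mean and variance should approach 1/2 and 1/12):
10 PRINT "RUNNING MEAN AND VARIANCE VIA THE RECURSIONS"
20 PRINT "ENTER THE NUMBER OF UNIFORM(0,1) VALUES"
30 INPUT K
40 RANDOMIZE
50 LET M = RND                    ! m_1 = X_1
60 LET S2 = 0                     ! S_1^2 = 0
70 FOR J = 1 TO K - 1
80 LET X = RND                    ! X_{J+1}
90 LET M1 = (J*M + X)/(J + 1)     ! m_{J+1}
100 LET S2 = S2*(J - 1)/J + X^2/J - (1 + 1/J)*M1^2 + M^2
110 LET M = M1
120 NEXT J
130 PRINT "SAMPLE MEAN", M
140 PRINT "SAMPLE VARIANCE", S2
150 END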
Examples.
1. Simulate the expected number of rolls of two dice needed to obtain all possible sums 2–12. We want to be 95% certain that our estimate differs from the true value by at most 1. Generate simulated values until 1.96 S_{k}/√k < 1.
2. Simulate the probability of a randomly thrown Buffon needle hitting a grid line. We want a 90% confidence interval of length at most 0.2. Keep throwing the needle until 1.645[m_{k}(1 − m_{k})]^{1/2}/√k < 0.1.
Bootstrapping Technique for Estimating Mean Square Errors
X_{1},…,X_{n} iid with distribution F (cdf)
The problem is to estimate θ(F) (some parameter of the distribution)
Estimator g(X_{1},…,X_{n})
Mean Square Error
MSE(F) = E_{F}[(g(X_{1},…,X_{n}) − θ(F))^{2}]
Example: θ(F) = E[X_{i}] and g(X_{1},…,X_{n}) = (X_{1} + … + X_{n})/n
Assume X_{i} = x_{i}, for 1 ≤ i ≤ n.
Define F_{e}(x) = (number of i such that x_{i} ≤ x)/n (empirical distribution)
By the strong law of large numbers, F_{e}(x) → F(x) as n → ∞, with probability 1.
Thus, θ(F_{e}) should be close to θ(F) and
MSE(F_{e}) = E_{Fe}[(g(X_{1},…,X_{n}) − θ(F_{e}))^{2}] = bootstrap approximation to MSE(F)
Examples.
1. θ(F) = E[X_{i}] and g(X_{1},…,X_{n}) = (X_{1} + … + X_{n})/n = m_{n}
Then MSE(F_{e}) = Σ(x_{i} − m_{n})^{2}/n^{2}, which is close to S^{2}/n = Σ(x_{i} − m_{n})^{2}/[n(n − 1)]
2. In a single server queueing system, we want to estimate the average amount of time spent in the system = q(F). Use g(X_{1},...,X_{n}) = sum of waiting times/number of customers.
If the data values x_{i} are distinct, the empirical distribution F_{e} puts equal weight 1/n on each point. The value of θ(F_{e}) is usually easy to calculate, but MSE(F_{e}) requires n^{n} calculations. Therefore we estimate MSE(F_{e}) using simulation.
Example. In-class single server queueing system. Arrivals were Poisson with parameter 1/8 (average interarrival time 8 seconds) and service times (in seconds) were uniform on [0,20]. We obtained the following data for time in system, W, with 10 arrivals and then with 3 arrivals.
i:       1    2    3    4    5    6     7    8    9    10    Mean
x_{i}:   3    5    9    11   15   17.5  21   29   30   37    17.75
y_{i}:   3    5    10                                        6
Parameter θ(F):
  (i) Mean θ (10 customers)    (ii) Variance σ^{2} (3 customers)
Estimator g(X_{1},...,X_{n}):
  (i) (X_{1}+...+X_{10})/10  (= 17.75)    (ii) S^{2} = [(Y_{1} − 6)^{2} + (Y_{2} − 6)^{2} + (Y_{3} − 6)^{2}]/2
MSE(F) = E_{F}[g(X_{1},...,X_{n}) − θ(F)]^{2}:
  (i) E_{F}[(X_{1}+...+X_{10})/10 − θ(F)]^{2}    (ii) E_{F}[[(Y_{1} − 6)^{2} + (Y_{2} − 6)^{2} + (Y_{3} − 6)^{2}]/2 − σ^{2}]^{2}
MSE(F_{e}) = E_{Fe}[g(X_{1},...,X_{n}) − θ(F_{e})]^{2}:
  (i) E_{Fe}[(X_{1}+...+X_{10})/10 − 17.75]^{2}  (= 11.68)    (ii) E_{Fe}[[(Y_{1} − 6)^{2} + (Y_{2} − 6)^{2} + (Y_{3} − 6)^{2}]/2 − 26/3]^{2}
Note that for the 10-customer example we can calculate MSE(F_{e}) = Var_{Fe}((X_{1}+...+X_{10})/10) = (1/100)[(x_{1} − 17.75)^{2} + ... + (x_{10} − 17.75)^{2}] = 11.68. An exact calculation in the 3-customer example would involve 3^{3} = 27 cases (in general, n^{n} cases as noted above). When this number is prohibitive, we can estimate MSE(F_{e}) by simulation. In the 3-customer example, we might simulate 10 values of M = [[(Y_{1} − 6)^{2} + (Y_{2} − 6)^{2} + (Y_{3} − 6)^{2}]/2 − 26/3]^{2} and take the average. For example, to simulate one value of M, we choose a number at random from {1,2,3} three times, with repetitions allowed. Say we obtain the sample 1, 2, 1. From our table above, this corresponds to Y_{1} = 3, Y_{2} = 5, Y_{3} = 3. The corresponding value of M is M_{1} = [[(3 − 6)^{2} + (5 − 6)^{2} + (3 − 6)^{2}]/2 − 26/3]^{2} = (5/6)^{2}.
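A TRUE BASIC sketch of this bootstrap estimate for the 3-customer example, averaging K simulated values of M (K is chosen by the user; the notes above suggest 10):
10 PRINT "BOOTSTRAP ESTIMATE OF MSE(Fe), 3-CUSTOMER EXAMPLE"
20 PRINT "ENTER THE NUMBER OF BOOTSTRAP SAMPLES"
30 INPUT K
40 RANDOMIZE
50 DIM Y(3)
60 LET Y(1) = 3
70 LET Y(2) = 5
80 LET Y(3) = 10
90 LET TOTAL = 0
100 FOR J = 1 TO K
110 LET S2 = 0
120 FOR I = 1 TO 3
130 LET R = INT(3*RND) + 1        ! index chosen at random from {1,2,3}
140 LET S2 = S2 + (Y(R) - 6)^2
150 NEXT I
160 LET M = (S2/2 - 26/3)^2       ! one simulated value of M
170 LET TOTAL = TOTAL + M
180 NEXT J
190 PRINT "ESTIMATED MSE(Fe)", TOTAL/K
200 END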
Aggregate Loss Models. The standard model is:
S = X_{1} + X_{2} + … + X_{N}, where N, X_{1}, X_{2}, … are independent and the X_{j}'s are identically distributed. In this case, we may assume the losses are indexed in the order in which they occur.
T_{j} = time of jth loss (occurred, reported, paid)
If timing of losses is important, we need to know the joint distribution of
(T_{1},T_{2},…,X_{1},X_{2},…)
These assumptions can fail when we need to account for the time value of money or for coverage modifications.
Example (time value of money). S = present value of all payments made on losses occurring in the next year on a policy issued today.
T_{j} = C_{j} + L_{j} = time of event + time from occurrence to payment.
C_{j} and L_{j} are independent and L_{j}’s are independent of each other.
C_{j} − C_{j−1} are independent, exponential with mean 0.2 years
X_{j} = amount paid at T_{j} on the loss occurring at time C_{j}.
X_{j} and C_{j } are independent; X_{j} and L_{j} are positively correlated.
V_{t} = value invested today to accumulate to 1 in t years, independent of X_{j},C_{j} and L_{j}.
Example (Outofpocket maximum). Deductible d on individual losses. Maximum of u on total paid by policy holder for the year. Model insurer’s aggregate payments.
X_{j} = amount of jth loss
W_{j} = X_{j} ∧ d = min(X_{j}, d) = amount paid by policyholder
Y_{j} = X_{j} − W_{j} = amount paid by insurer
R = W_{1} + W_{2} + … + W_{N} = total paid by policyholder without maximum u
R_{u} = R ∧ u = min(R, u) = amount actually paid by policyholder.
S = X_{1} + X_{2} + … + X_{N}
T = S – R_{u} = aggregate amount paid by insurer.
S and R_{u} are based on the same iid losses, but they are not independent, so their individual distributions can't be combined to obtain a distribution for T.
Simulation Approach.
1. Build a model for S as a function of r.v.’s X,Y,Z,…
2. Generate values for the r.v.’s to get a value of S
3. Repeat many times to approximate cdf F(x) for S with F_{e}(x).
Two concrete examples.
1. C_{j} − C_{j−1} is exponential with mean 0.2
L_{j} is Uniform(0,1)
X_{j} = amount paid at T_{j} = L_{j}
V_{t} = (1.1)^{−t}
−0.2 ln(U):   .24   .14   .49   .14
U:            .75   .19   .26   .96
Assuming we only count payments made during one year,
S = .75(1.1)^{−.99} + .19(1.1)^{−.57} ≈ .86
Generate 9 other simulations of S: .50, .65, .68, .71, .92, 1.04, 1.26, 1.28, 1.32
F_{e} is the distribution that assigns probability 1/10 to each of our 10 sample points.
Estimate F(.90) so that you are 95% sure you are within 1% of the true value.
F_{e}(.90) = 5/10
For larger samples, the estimator is P_{n}/n, where P_{n} is the number of sample values ≤ .9.
This is a binomial proportion with mean F(.9) and variance F(.9)[1 – F(.9)]/n.
We want 0.95 = P(.99F(.9) < P_{n}/n < 1.01F(.9)).
Subtract the mean and divide by the standard deviation to get
0.95 ≈ P( −.01F(.9)√n/√(F(.9)[1 − F(.9)]) < Z < .01F(.9)√n/√(F(.9)[1 − F(.9)]) ),
where Z is standard normal, so we need .01F(.9)√n/√(F(.9)[1 − F(.9)]) ≥ 1.96,
where 1.96 is the 97.5th percentile for the standard normal distribution. The inequality can be manipulated to obtain
n ≥ (1.96/.01)^{2}[1 − F(.9)]/F(.9) = 38,416·[1 − F(.9)]/F(.9).
The simulation can be run until this condition is met, estimating [1 − F(.9)]/F(.9) by (n − P_{n})/P_{n}. Note that we expect P_{n}/n to be around ½, so we expect (n − P_{n})/P_{n} to be around 1. With this estimate, n ≥ 38,416.
2. Claims on an insurance policy occur with an interarrival time that is exponential with mean ¼ year. The claim amount is uniform on the interval (0,2). A deductible amount of 0.5 applies to each claim, but the maximum paid by the policy holder is 1.5 per year. Use the following table to answer the questions below.
−ln(U_{1}):   .165   .120   .681   1.32   2.16
U_{2}:        .26    .74    .06    .33
Sims:         1.5    .90    .75    1.4    1.0    .92    1.2    1.28    1.05
(a) Simulate the total claims for one year. (1.16)
(b) Use the 9 additional simulations in row 3 to calculate a sample variance. (.0678)
(c) Estimate the number of additional simulations needed to give you 95% confidence that the sample mean is within 1% of the true value. (1319)
Exercise. Claims arrive at an insurance company according to a nonhomogeneous Poisson process with intensity function λ(t) that increases linearly from 0 to 6 over the interval [0,1/2] and then decreases linearly back to 0 at t = 1 (year). The amounts of the claims are independent and uniform on the interval [10,20]. The time between the arrival of a claim and payment of the claim is 2 weeks if the claim is less than 15 and 4 weeks otherwise. The insurance company pays a maximum of 50 for the claims that arrive in any one year. If the interest rate is 8% (discount factor 1/1.08), use the random samples in the table below to generate the present value of payments on claims incurred in a single year. You should first use the thinning method for generating the nonhomogeneous Poisson process.
−(1/6)ln(U_{1}):   .06   .29   .09   .11   .23   .01   .10   .10   .27
U_{2}:             .77   .38   .59   .12   .33   .34   .28   .38   .62   .60   .40   .29   .28
Math 286/366 Review for Final
Spring 2001
Pseudorandom numbers: x_{n} = ax_{n−1} + b modulo m
Inverse Transform: discrete; exponential; Poisson from exponential
Rejection Method: X has pdf f(x), Y has pdf g(y) – easy to sample, f(y)/g(y) ≤ c.
1. Generate sample y from Y and a random number u
2. If u ≤ f(y)/cg(y), accept X = y.
Sample mean, sample variance; confidence interval for the mean: m_{k} ± z_{α/2}S_{k}/√k
Bernoulli trials: estimate the standard error S_{k}/√k with [m_{k}(1 − m_{k})/k]^{1/2}
Recursion formulas: m_{k+1} = (km_{k} + X_{k+1})/(k+1); S_{k+1}^{2} = S_{k}^{2}(k − 1)/k + X_{k+1}^{2}/k − (1 + 1/k)m_{k+1}^{2} + m_{k}^{2}
Bootstrap: MSE(F) = E_{F}[g(X_{1},…,X_{n}) − θ(F)]^{2}
F_{e} = empirical distribution from sample
(note: sample variance ≠ variance of empirical distribution)
MSE(F) ≈ MSE(F_{e}), which can be estimated by simulation
Goodness of Fit
Discrete distributions: use Chi-square
Continuous distributions: use Kolmogorov–Smirnov
Missing parameters: estimate from sample data
Two sample rank sum statistic
Multisample rank sum statistic (Kruskal–Wallis)
Testing for a nonhomogeneous Poisson process
Loss Models
Standard model: S = X_{1} + X_{2} + … + X_{N}; N, X_{1},X_{2},… independent, X_{1},X_{2},… iid
Complications: coverage modifications, time considerations
1. Build a model for S as a function of random variables
2. Sample random variables to obtain a value for S
3. Repeat many times to approximate F_{S} with empirical distribution.
Two examples