Math 286 Notes

Spring 2002

 

Simulation Exercise:  Customers enter a bank with one teller and form a queue if the teller is busy. When the teller finishes serving a customer, the person who has been waiting the longest steps up to be served. One (long) day, the following times are observed for 13 customers:

Arrival Times (minutes): 12,31,63,95,99,154,198,221,304,346,411,455,537

Service Times (minutes): 40,32,55,48,18,50,47,18,28,54,40,72,12

 

(a)    Determine the departure times for each of these 13 customers. What is the average time spent in the bank?

(b)   Repeat (a) in the case where there are two tellers and the person first in line goes to the first available teller.
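The bookkeeping in part (a) reduces to one recursion: customer i begins service at max(arrival time, previous departure time). A minimal Python sketch of this check (Python rather than the course's True BASIC, purely for convenience):

```python
# Part (a): single teller, first-come-first-served queue.
arrivals = [12, 31, 63, 95, 99, 154, 198, 221, 304, 346, 411, 455, 537]
services = [40, 32, 55, 48, 18, 50, 47, 18, 28, 54, 40, 72, 12]

departures = []
prev_dep = 0
for a, s in zip(arrivals, services):
    start = max(a, prev_dep)      # wait if the teller is still busy
    prev_dep = start + s
    departures.append(prev_dep)

time_in_bank = [d - a for d, a in zip(departures, arrivals)]
avg = sum(time_in_bank) / len(time_in_bank)
print(departures)
print(avg)
```

Part (b) needs only a small change: track the two tellers' next-free times and assign each customer to the earlier one.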

 

Review. Sample Space, Events, Axioms of Probability, Conditional Probability, Independence, Random Variables, pmf, pdf, cdf, Joint Distributions, Expectation, Variance, Covariance.

 

Discrete Distributions: Binomial(n,p), Poisson(λ)

 

Continuous Distributions:  Uniform(a,b), Normal(μ,σ^2), Exponential(λ)

 

Poisson Process – characterization.  (Nonhomogeneous Poisson Process)

 

Binomial-Poisson-Exponential-Gamma

·        Binomial(n,p) can be approximated by Poisson(np) when n is large and p is small

·        If the Number of Arrivals is Poisson(λ), the Interarrival Times are Exponential(λ).

·        The sum of n independent Exponential(λ) variables is Gamma(n, λ)

 

Markov’s Inequality, Chebyshev’s Inequality, Weak Law of Large Numbers

 

Central Limit Theorem

 

Conditional Expectation, Conditional Variance

 

Random Numbers – flipping coins, rolling dice, shuffling cards

 

Pseudorandom numbers

            Multiplicative congruential method:  x0 = seed;   xn = axn-1 modulo m

            Example:  32 bit word machine,  m = 2^31 - 1  and a = 7^5.

 

            Mixed congruential generator: xn = (axn-1 + c) modulo m

 

Calculators and Computers can generate random numbers for you.

 

Evaluating Integrals.  If U is Uniform(0,1), the value of E[g(U)] is

            E[g(U)] = ∫0^1 g(x) dx

By the law of large numbers, E[g(U)] is approximated by the average [g(U1)+g(U2)+…+g(Uk)]/k as k gets large, where U1, U2, … are independent Uniform(0,1) random numbers.

 


Integrals over other intervals can be handled by a change of variable. Moreover, the same methods can be applied to approximate multi-dimensional integrals.

 


Example.  Estimating π/4 = ∫0^1 √(1 - x^2) dx  (the area of a quarter circle of radius 1).
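This estimate can be run directly: with g(x) = √(1 - x^2), the average of g over uniform samples approaches ∫0^1 g(x) dx = π/4. A minimal Python sketch:

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible
k = 100_000
estimate = sum(math.sqrt(1 - random.random() ** 2) for _ in range(k)) / k
print(estimate)  # close to pi/4 = 0.785...
```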

 
 

 

 

 


Chapter 4.  Generating Discrete Random Variables

 

Inverse Transform Method: 

Given pmf  for discrete rv X: P{X=xj} = p(xj)= pj , j = 0,1,2,…

Generate a random number U (from Uniform(0,1))

Set       X = x0  if  U < p0

X = xj  if  p0 + … + pj-1 ≤ U < p0 + … + pj

A graph of the cdf provides a useful representation of this method.

            Show it gives the proper pmf for X

 

Algorithm:  Generate a random number U

            If U < p0 , set X = x0 and stop

            If U < p0 + p1 , set X = x1 and stop

            If U < p0 + p1 + p2, set X = x2 and stop

            etc.

 

It is most efficient to work with the pi's in decreasing order. Why?
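A Python sketch of the algorithm above (the pmf here is illustrative); checking the largest probabilities first is exactly what makes decreasing order efficient:

```python
import random

def inverse_transform(values, probs, u):
    """Return values[j] where the cumulative sum of probs first exceeds u."""
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return x
    return values[-1]  # guard against floating-point round-off

# Listing the largest p first means fewer comparisons on average.
values, probs = ['a', 'b', 'c'], [0.5, 0.3, 0.2]
print(inverse_transform(values, probs, random.random()))
```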

 

Examples. 

1.      Generating a random permutation:  Use Int[kU] + 1

Generating random subsets is important in tests of medical treatments

2.      Geometric random variables.  P{X=i} = pq^(i-1), i = 1,2,…, where q = 1 - p

Note that if F is the cdf, F(j) = 1 - q^j.  Why? Thus, we can

      Generate a random number U

      Set X equal to the value of j for which 1 - q^(j-1) ≤ U < 1 - q^j

      Equivalently, X = Int[log(1-U)/log(q)] + 1, or X = Int[log(U)/log(q)] + 1  (since 1 - U is also Uniform(0,1))
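The closed form works because 1 - q^(j-1) ≤ U < 1 - q^j is equivalent to j - 1 ≤ log(1-U)/log(q) < j. A sketch:

```python
import math

def geometric(p, u):
    """Inverse transform for Geometric(p): smallest j with 1 - (1-p)**j > u."""
    return int(math.log(1 - u) / math.log(1 - p)) + 1

# Spot checks against the defining inequality with p = 0.5 (q = 0.5)
print(geometric(0.5, 0.3), geometric(0.5, 0.6))
```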

3.      Generating Poisson(λ)

1.      Generate a random number U

2.      i = 0,  p = e^(-λ),  F = p

3.      If U < F, set X = i  and stop

4.      p = λp/(i+1),  F = F + p,  i = i + 1

5.      Go to step 3
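A Python sketch of steps 1-5; the update p ← λp/(i+1) is just P{X=i+1} = λP{X=i}/(i+1), so no factorials are recomputed:

```python
import math

def poisson(lam, u):
    """Inverse transform for Poisson(lam) driven by a single uniform u."""
    i = 0
    p = math.exp(-lam)  # P{X = 0}
    F = p
    while u >= F:              # step 3 fails: advance to the next mass point
        p = lam * p / (i + 1)  # P{X = i+1} from P{X = i}
        F += p
        i += 1
    return i

print(poisson(2.0, 0.5))  # -> 2
```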

 

4.      Generating Binomial(n,p)

1.      Generate a random number U

2.      c = p/(1-p);  i = 0;  pr = (1-p)^n;  F = pr

3.      If U < F, set X = i and stop

4.      pr = [c(n-i)/(i+1)]pr,  F = F + pr,  i = i + 1

5.      Go to step 3
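The binomial pmf satisfies P{X=i+1} = [(n-i)/(i+1)]·[p/(1-p)]·P{X=i}, which gives the analogous sketch:

```python
def binomial(n, p, u):
    """Inverse transform for Binomial(n, p) driven by a single uniform u."""
    c = p / (1 - p)
    i = 0
    pr = (1 - p) ** n  # P{X = 0}
    F = pr
    while u >= F:
        pr = c * (n - i) / (i + 1) * pr  # P{X = i+1} from P{X = i}
        F += pr
        i += 1
    return i

print(binomial(2, 0.5, 0.9))  # pmf is (.25, .5, .25), so u = .9 lands on X = 2
```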

 

Acceptance – Rejection Method.

            Given simulations of  Y with P{Y = j} = qj : j = 0,1,2,...

            Produce simulations of X with P{X = j} = pj : j = 0,1,2,...

Let c be a constant such that pj/qj ≤ c  for all j such that pj > 0.

 

1.      Simulate Y

2.      Generate a random number U

3.      If U < pY/(cqY), set X = Y and stop

4.      Return to step 1

 

Theorem.  For the above algorithm, P{X = j} = pj. The number of iterations needed to obtain X is geometric with mean c.
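A sketch of the acceptance-rejection loop; the target pmf p and the uniform proposal q below are illustrative, giving c = max pj/qj = 1.5:

```python
import random

def accept_reject(p, q, sample_y, c):
    """Simulate X with pmf p from samples of Y with pmf q and bound c >= p[j]/q[j]."""
    while True:
        y = sample_y()                 # step 1: simulate Y
        u = random.random()            # step 2
        if u < p[y] / (c * q[y]):      # step 3: accept
            return y

random.seed(0)
p = [0.3, 0.5, 0.2]                        # target pmf (illustrative)
q = [1 / 3, 1 / 3, 1 / 3]                  # proposal: uniform on {0, 1, 2}
c = max(pj / qj for pj, qj in zip(p, q))   # = 1.5
draws = [accept_reject(p, q, lambda: random.randrange(3), c) for _ in range(20_000)]
print(draws.count(1) / len(draws))         # near p[1] = 0.5
```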

 

Composition Approach

            Suppose X has cdf F = aF1 + (1-a)F2, where Fi is the cdf of Xi and 0 ≤ a ≤ 1.

To generate a value of X:

1.      Generate a random number U

2.      If U < a, generate X1 and set X = X1

3.      Otherwise, generate X2 and set X = X2.

 

This method can be generalized to arbitrary finite mixtures  F = Σ ajFj  with nonnegative coefficients that sum to 1.

 

Chapter 5: Generating Continuous Random Variables

 

The Inverse Transform Algorithm.  Let U be Uniform(0,1). For any continuous distribution function F, the random variable X = F^(-1)(U) has cdf F.

 

Examples. (a) Exponential   (b) Poisson from Exponential  (c) Gamma from Exponential

 

Rejection Method.  Given a method for generating Y with pdf g(y), generate X with pdf f(x).  Let c be a constant such that  f(y)/g(y) ≤ c for all y.

1.      Generate Y having density g

2.      Generate a random number U

3.      If U ≤ f(Y)/(cg(Y)), set X = Y and stop.

4.      Return to step 1

 

Theorem. (a) The random variable X generated by the rejection method has pdf f.

(b)     The number of iterations of the algorithm that are needed is a geometric random variable with mean c.

 

Examples.  (a)  f(x) = 20x(1-x)^3,  0 < x < 1;  g(x) = 1 for 0 < x < 1.

(b) Gamma from Exponential

(c)     Normal from Exponential
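For example (a), f is maximized at x = 1/4, so c = f(1/4) = 135/64 ≈ 2.11. A Python sketch:

```python
import random

def sample_f():
    """Rejection method for f(x) = 20x(1-x)**3 on (0,1) with g uniform."""
    c = 135 / 64  # max of f, attained at x = 1/4
    while True:
        y = random.random()                  # Y ~ g, uniform on (0,1)
        u = random.random()
        if u <= 20 * y * (1 - y) ** 3 / c:   # accept with prob f(y)/(c*g(y))
            return y

random.seed(0)
sample = [sample_f() for _ in range(20_000)]
mean = sum(sample) / len(sample)
print(mean)  # f is the Beta(2,4) density, so the mean should be near 1/3
```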

 

The Polar Method (generates a pair of independent standard normals).

1.      Generate random numbers U1 and U2

2.      Set V1 = 2U1 - 1,  V2 = 2U2 - 1,  and  S = V1^2 + V2^2

3.      If S > 1, return to Step 1

4.      Set  X = ((-2logS)/S)^(1/2)V1  and  Y = ((-2logS)/S)^(1/2)V2

Then X and Y are independent Normal(0,1).
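The four steps as a Python sketch:

```python
import math
import random

def polar_pair():
    """One pass of the polar method: a pair of independent N(0,1) variates."""
    while True:
        v1 = 2 * random.random() - 1        # step 2
        v2 = 2 * random.random() - 1
        s = v1 * v1 + v2 * v2
        if 0 < s <= 1:                      # step 3: reject points outside the unit disk
            factor = math.sqrt(-2 * math.log(s) / s)
            return factor * v1, factor * v2  # step 4

random.seed(0)
values = [v for _ in range(10_000) for v in polar_pair()]
mean = sum(values) / len(values)
print(mean)  # sample mean, near 0
```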

 
 

 

 

 


Poisson Process.  Generating arrivals between 0 and T.

1.      t = 0, I = 0

2.      Generate a random number U

3.      t = t - (lnU)/λ.  If t > T, stop

4.      I = I+1, S(I) = t

5.      Go to step 2
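The same steps in Python; step 3 adds an Exponential(λ) interarrival time via -ln(U)/λ:

```python
import math
import random

def poisson_arrivals(lam, T):
    """Arrival times S(1), S(2), ... of a rate-lam Poisson process on (0, T]."""
    times = []
    t = 0.0
    while True:
        t -= math.log(random.random()) / lam  # add an Exponential(lam) gap
        if t > T:
            return times
        times.append(t)

random.seed(0)
arrivals = poisson_arrivals(2.0, 10.0)
print(len(arrivals))  # the count is Poisson with mean lam*T = 20
```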

 

Nonhomogeneous Poisson Process with intensity λ(t) (thinning).  Choose λ so that λ(t) ≤ λ for all t ≤ T.

1.      t = 0, I = 0

2.      Generate a random number U

3.      t = t - (lnU)/λ.  If t > T, stop

4.      Generate a random number U

5.      If U ≤ λ(t)/λ, set I = I+1, S(I) = t

6.      Go to step 2
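A sketch of thinning using the worksheet intensity below (λ(t) = 2+t then 4-t on [0,2], so λ = 3 is a valid bound):

```python
import math
import random

def lam(t):
    """Illustrative intensity (from the worksheet): 2+t on [0,1), 4-t on [1,2]."""
    return 2 + t if t < 1 else 4 - t

def thinned_arrivals(T=2.0, lam_max=3.0):
    """Thinning: candidates at rate lam_max, each kept with prob lam(t)/lam_max."""
    times = []
    t = 0.0
    while True:
        t -= math.log(random.random()) / lam_max
        if t > T:
            return times
        if random.random() <= lam(t) / lam_max:
            times.append(t)

random.seed(0)
print(thinned_arrivals())
```

The expected number kept is ∫0^2 λ(t) dt = 5.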

 

Chapter 6. Discrete Event Simulation

 

Single Server Queueing System

 

Worksheet.

T = 2

Arrivals: Non-homogeneous Poisson  λ(t) = 2 + t,  0 ≤ t < 1;  λ(t) = 4 - t,  1 ≤ t ≤ 2.

Service times:  pdf  g(x) = 4e^(-4x)(4x) = 16xe^(-4x).

 

Random Number Table

U1:             .85    .61    .09    .77    .70    .22
(-1/3)ln(U2):   .40    .27    .43    .51    .33    .65
(-1/4)ln(U3):   .12    .25    .52    .18    .08    .14

 

 

Event Time t     n, NA, ND     tA, tD                A(i), D(i)
0                              tA = .67,  tD = ∞

(remaining worksheet rows to be filled in)

Two Servers in Series.

Time variable:  t

System State: (n1,n2),  ni = # in i-th queueing system, i = 1,2

Counters:  NA = number of arrivals by time t

                  ND = number of departures by time t

Output:  A1(k) = arrival time of customer k

              A2(k) = arrival time of customer k at server 2

               D(k) = departure time of customer k

 

Two Parallel Servers

 

Inventory Model:

 

Example                                        Abstract Model

You sell widgets for $10 each                  r = price per unit
Customers arrive according to Poisson(4)       λ = Poisson arrival rate
Customer order is Uniform on {1,2,...,20}*     G(x) = cdf for size of individual order
If inventory < 20, restock to total of 50      (s,S) = (20,50)
Restocking takes 0.3 hrs                       L = time until restock is delivered
Storage costs $.25 per unit per hour           h = unit storage cost per unit time

 

 

System State: (x,y),  where x = # units in inventory, y = # units on order

Counters:  C = total restocking costs, H = total inventory costs, R = revenue (all at time t)

Events:  t0 = arrival time of next customer;  t1 = time of next restock delivery

* If a customer requests more than the inventory in stock, he buys the entire inventory and leaves (no waiting).

 

Worksheet.  Assume you start with 30 units in stock and use the following random table.

 

U1:         .85    .61    .29    .47    .70    .22    .12    .50    .24
-.25lnU2:   .12    .24    .52    .18    .08    .14    .71    .22    .31


Event time    SS         C      H       R      t0     t1
.12           (12,38)    0      0.90    180    .36    .42
.36           (0,50)     0      1.62    300    .88    .42
.42           (38,12)    240    1.62    300    .88    .66

(remaining worksheet rows to be filled in)


TRUE BASIC WORKSHEET

 

 

10 PRINT "THIS PROGRAM SUMS THE HARMONIC SERIES"

20 PRINT "TO A SPECIFIED NUMBER OF TERMS"

30 S = 0

40 PRINT "ENTER THE NUMBER OF TERMS"

50 INPUT N

60 FOR J = 1 TO N

70 S = S + 1/J

80 NEXT J

90 PRINT "THE SUM IS", S

100 END

 

Program for #12 on page 43.

 

10 PRINT "Enter number of trials"

20 INPUT K

30 E = 0

40 RANDOMIZE TIMER

50 FOR I = 1 TO K

60 N = 0

70 S = 0

80 U = RND

90 N = N + 1

100 S = S + U

110 IF S < 1 THEN GOTO 80

120 E = E + N

130 NEXT

140 PRINT "Average N"

150 PRINT E/K

160 END

 

10 PRINT "Average number of coin flips needed to obtain both a head and a tail"

20 PRINT "Enter the number of simulations desired"

25 INPUT K

30 J = 0

32 S = 0

34 RANDOMIZE

35 X = 0

40 Y = 0

50 N = 0

70 U = RND

80 I = INT(2*U)

90 IF I = 0 THEN X = 1

100 IF I = 1 THEN Y = 1

110 N = N + 1

120 IF X*Y = 0 THEN GOTO 70

130 S = S + N

135 J = J + 1

140 IF J < K THEN GOTO 35

150 AVG = S/K

160 PRINT AVG

170 END

 

 


10 PRINT "THIS PROGRAM SOLVES NUMBER 22 ON PAGE 84"

20 LET t = 0

30 LET N = 0

40 RANDOMIZE

50 LET U = RND

60 LET t = t - 0.2*LOG(U)

70 IF t > 1 THEN GOTO 110

80 LET U = RND

90 LET N = N + INT(21*U) + 20

100 GOTO 50

110 PRINT "NUMBER OF FANS IS", N

120 END

 

10 PRINT "THIS PROGRAM SOLVES NUMBER 17 ON PAGE 84"

20 RANDOMIZE

30 DIM U(2)

40 LET U(1) = 2*(RND) - 1

50 LET U(2) = 2*(RND) - 1

60 LET S = U(1)^2 + U(2)^2

70 IF S > 1 THEN GOTO 40

80 DIM X(2)

90 FOR J = 1 TO 2

100 LET X(J) = SQR(-2*(LOG(S))/S)*U(J)

110 NEXT J

120 PRINT "X(1),X(2) =", X(1), X(2)

130 END

 

EXERCISE:  A single die is rolled until all six numbers appear. Use simulation to estimate the expected number of rolls.
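A Python sketch of this simulation; the known exact answer, 6(1 + 1/2 + … + 1/6) = 14.7, is a useful check:

```python
import random

def rolls_until_all_six():
    """Roll a fair die until every face 1..6 has appeared; return the roll count."""
    seen = set()
    rolls = 0
    while len(seen) < 6:
        seen.add(random.randint(1, 6))
        rolls += 1
    return rolls

random.seed(0)
k = 20_000
avg = sum(rolls_until_all_six() for _ in range(k)) / k
print(avg)  # near the exact value 6*(1 + 1/2 + ... + 1/6) = 14.7
```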

 

 


Chapter 7. Statistical Analysis of Simulated Data

 

Simulation produces values X1, X2, … iid with mean θ and variance σ^2.

Sample Mean: mk = (X1 + … + Xk)/k

Sample Variance: Sk^2 = [(X1 - mk)^2 + … + (Xk - mk)^2]/(k-1)

 

Proposition: E[Sk^2] = σ^2

 

Since E[(mk - θ)^2] = σ^2/k,  mk is a “good” estimator of θ when σ^2/k is small.

Two measures of accuracy:

            Chebyshev: P{|mk - θ| > cσ/√k} ≤ 1/c^2

            Central Limit Theorem:  P{|mk - θ| > cσ/√k} ≈ 2[1 - Φ(c)]

 

When to stop generating new data.

1.      Choose an acceptable value d for the standard deviation of mk.

2.       Generate at least 30 values

3.      Continue to generate values, stopping when Sk/√k < d.

4.      Estimate q by mk.

 

If the Xi's are Bernoulli (0-1 valued), we can replace Sk by [mk(1-mk)]^(1/2) as an estimator of σ.

 

Recursive Calculation of  mk and Sk^2.

mk+1 = (kmk + Xk+1)/(k+1)

Sk+1^2 = [(k-1)/k]Sk^2 + Xk+1^2/k - (1 + 1/k)mk+1^2 + mk^2
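A quick Python check of these recursions against the direct formulas (data values illustrative):

```python
import statistics

data = [3.0, 5.0, 9.0, 11.0, 15.0]   # illustrative sample
m = data[0]   # m_1
s2 = 0.0      # S_1^2 taken as 0
for k in range(1, len(data)):        # k = current sample size
    x = data[k]                      # X_{k+1}
    m_new = (k * m + x) / (k + 1)
    s2 = s2 * (k - 1) / k + x * x / k - (1 + 1 / k) * m_new ** 2 + m * m
    m = m_new
print(m, s2)  # matches the direct sample mean and sample variance
```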

 

Examples. 

1.      Simulate the expected number of rolls of two dice needed to obtain all possible sums 2-12. We want to be 95% certain that our estimate differs from the true value by at most 1. Generate simulated values until 1.96 Sk/√k < 1.

2.      Simulate the probability of a randomly thrown Buffon needle hitting a grid line. We want a 90% confidence interval of length at most l = 0.2.  Keep throwing the needle until 1.645[mk(1-mk)]^(1/2)/√k < 0.1

 

Bootstrapping Technique for Estimating Mean Square Errors

X1,…,Xn iid with distribution F (cdf)

The problem is to estimate θ(F)  (some parameter of the distribution)

Estimator g(X1,…,Xn)

Mean Square Error

            MSE(F) = EF[g(X1,…,Xn) - θ(F)]^2

            Example: θ(F) = E[Xi] and g(X1,…,Xn) = (X1 + … + Xn)/n

Assume Xi = xi, for 1 ≤ i ≤ n  (the observed values).

Define Fe(x) = (number of i such that xi ≤ x)/n  (empirical distribution)

 

By the strong law of large numbers, Fe(x) → F(x) as n → ∞, with probability 1.

Thus, θ(Fe) should be close to θ(F) and

            MSE(Fe) = EFe[g(X1,…,Xn) - θ(Fe)]^2 = bootstrap approximation to MSE(F)

 

Examples.

1.      θ(F) = E[Xi] and g(X1,…,Xn) = (X1 + … + Xn)/n = mn

Then MSE(Fe) = Σ(xi - mn)^2/n^2, which is close to S^2/n = Σ(xi - mn)^2/n(n-1)

 

2.      In a single server queueing system, we want to estimate the average amount of time spent in the system = θ(F). Use g(X1,...,Xn) = sum of waiting times/number of customers.

 

If the data values Xi are distinct, then the empirical distribution Fe puts equal weight 1/n on each point. The value of θ(Fe) is usually easy to calculate, but MSE(Fe) requires n^n calculations. Therefore we estimate MSE(Fe) using simulation.

 

Example. In-class single server queueing system. Arrivals were Poisson with parameter 1/8 (average interarrival time 8 seconds) and service times (in seconds) were uniform on [0,20]. We obtained the following data for time in system, W, with 10 arrivals and then with 3 arrivals.

 

i:      1     2     3     4     5     6      7     8     9     10    Mean
xi:     3     5     9     11    15    17.5   21    29    30    37    17.75
yi:     3     5     10                                               6

 

Parameter θ(F):                        Mean θ  (10 customers)           |  Variance σ^2  (3 customers)
Estimator g(X1,...,Xn):                (X1+...+X10)/10   (17.75)        |  S^2 = [(Y1-6)^2 + (Y2-6)^2 + (Y3-6)^2]/2
MSE(F) = EF[g(X1,...,Xn) - θ(F)]^2:    EF[(X1+...+X10)/10 - θ(F)]^2     |  EF[[(Y1-6)^2 + (Y2-6)^2 + (Y3-6)^2]/2 - σ^2]^2
MSE(Fe) = EFe[g(X1,...,Xn) - θ(Fe)]^2: EFe[(X1+...+X10)/10 - 17.75]^2  (11.68)  |  EFe[[(Y1-6)^2 + (Y2-6)^2 + (Y3-6)^2]/2 - 26/3]^2

 

Note that for the 10 customer example we can calculate MSE(Fe) = Var[(X1+...+X10)/10] = (1/100)[(x1 - 17.75)^2 + ... + (x10 - 17.75)^2] = 11.68.  An exact calculation in the 3 customer example would involve 3^3 = 27 cases (in general, n^n cases as noted above). When this number is prohibitive, we can estimate MSE(Fe) by simulation. In the 3 customer example, we might simulate 10 values of M = [[(Y1-6)^2 + (Y2-6)^2 + (Y3-6)^2]/2 - 26/3]^2 and take the average. For example, to simulate one value of M, we can choose a number at random from {1,2,3} three times, with repetitions allowed. Say we obtain the sample 1,2,1. From our table above, this corresponds to Y1 = 3,  Y2 = 5,  Y3 = 3. The corresponding value of M is M1 = [[(3-6)^2 + (5-6)^2 + (3-6)^2]/2 - 26/3]^2 = (5/6)^2.
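For the 3 customer example the 27 equally likely resamples are few enough to enumerate directly, which both checks the worked value M1 and gives MSE(Fe) exactly. A Python sketch (the estimator, as in the notes, measures deviations from the observed mean 6):

```python
from itertools import product

y = [3.0, 5.0, 10.0]
theta_fe = sum((v - 6) ** 2 for v in y) / 3      # variance of Fe = 26/3

def g(sample):
    """Estimator from the notes: [(Y1-6)^2 + (Y2-6)^2 + (Y3-6)^2]/2."""
    return sum((v - 6) ** 2 for v in sample) / 2

# Exact bootstrap MSE: average over all 3^3 = 27 equally likely resamples
mse = sum((g(s) - theta_fe) ** 2 for s in product(y, repeat=3)) / 27

m1 = (g((3.0, 5.0, 3.0)) - theta_fe) ** 2        # the sample 1,2,1 from the text
print(m1, mse)  # m1 = (5/6)^2
```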

 


Aggregate Loss Models.  The standard model is:

            S = X1 + X2 + … + XN  where  N, X1, X2, … are independent and the Xj's are identically distributed. In this case, we can assume the losses are indexed in order.

            Tj = time of j-th loss (occurred, reported, paid)

If timing of losses is important, we need to know the joint distribution of

            (T1,T2,…,X1,X2,…)

 

Assumptions can fail in accounting for time (value of money) and coverage modifications.

 

Example (time value of money). S = present value of all payments made on losses occurring in the next year on a policy issued today.

 

Tj = Cj + Lj = time of event + time from occurrence to payment.

Cj and Lj are independent and Lj’s are independent of each other.

Cj – Cj-1 are independent, exponential with mean 0.2 years

Xj = amount paid at Tj on the loss occurring at time Cj. 

Xj and Cj  are independent; Xj and Lj are positively correlated.

Vt = value invested today to accumulate to 1 in t years, independent of Xj,Cj and Lj.


 

Example (Out-of-pocket maximum). Deductible d on individual losses. Maximum of u on total paid by policy holder for the year. Model insurer’s aggregate payments.

 

Xj = amount of j-th loss

Wj = Xj ∧ d = min(Xj, d) = amount paid by policyholder

Yj = Xj – Wj = amount paid by insurer

R = W1 + W2 + …  + WN = total paid by policyholder without maximum u

Ru = R ∧ u = min(R, u) = amount actually paid by policyholder.

S = X1 + X2 + … + XN

T = S – Ru = aggregate amount paid by insurer.

 


S and Ru are based on iid losses, but they are not independent of each other, so their individual distributions can't be combined to obtain a distribution for T.

 

Simulation Approach.

1.      Build a model for S as a function of r.v.’s X,Y,Z,…

2.      Generate values for the r.v.’s to get a value of S

3.      Repeat many times to approximate cdf F(x) for S with Fe(x).

 

Two concrete examples.

1.      Cj – Cj-1 is Exponential(0.2)

Lj is Uniform(0,1)

Xj = amount paid at Tj = Lj

Vt = (1.1)-t

-0.2ln(U):   .24    .14    .49    .14
U:           .75    .19    .26    .96

 

 

Assuming we only count payments made during one year,

S = .75(1.1)^(-.99) + .19(1.1)^(-.57) = .85

 

Generate 9 other simulations of S:  .50, .65, .68, .71, .92, 1.04, 1.26, 1.28, 1.32

Fe is the distribution that assigns probability 1/10 to each of our 10 sample points.

 

Estimate F(.90) so that you are 95% sure you are within 1% of the true value.

Fe(.90) = 5/10

For larger samples, the estimator is Pn/n, where Pn is the number of sample values ≤ .9.

This is a binomial proportion with mean F(.9) and variance F(.9)[1 – F(.9)]/n.

 

We want 0.95 = P(.99F(.9) < Pn/n < 1.01F(.9))

Subtract the mean and divide by the standard deviation to get

            0.95 ≈ P( -.01F(.9)√n/[F(.9)(1 - F(.9))]^(1/2) < Z < .01F(.9)√n/[F(.9)(1 - F(.9))]^(1/2) ),

where Z is standard normal, so we need

            .01F(.9)√n/[F(.9)(1 - F(.9))]^(1/2) ≥ 1.96,

where 1.96 is the 97.5-th percentile for the standard normal distribution. The inequality can be manipulated to obtain

            n ≥ (1.96/.01)^2 [1 - F(.9)]/F(.9),  or, replacing F(.9) by Pn/n,  n ≥ 38,416 (n - Pn)/Pn

The simulation can be run until this condition is met. Note that we expect Pn/n to be around 1/2, so we expect (n - Pn)/Pn to be around 1. With this estimate, n ≥ 38,416.

 


2.      Claims on an insurance policy occur with an interarrival time that is exponential with mean ¼ year. The claim amount is uniform on the interval (0,2). A deductible amount of 0.5 applies to each claim, but the maximum paid by the policy holder is 1.5 per year. Use the following table to answer the questions below.

-ln(U):   .165    .120    .681    1.32    2.16
U2:       .26     .74     .06     .33

Sims:     1.5    .90    .75    1.4    1.0    .92    1.2    1.28    1.05

(a)    Simulate the total claims for one year.  (1.16)

(b)   Use the 9 additional simulations in row 3 to calculate a sample variance. (.0678)

(c)    Estimate the number of additional simulations needed to give you 95% confidence that the sample mean is within 1% of the true value. (1319)

 

Exercise. Claims arrive at an insurance company according to a non-homogeneous Poisson process with intensity function l(t) that linearly increases from 0 to 6 over the interval [0,1/2] and then decreases linearly back to 0 at t = 1  (year). The amounts of the claims are independent and uniform on the interval [10,20]. The time between the arrival of a claim and payment of the claim is 2 weeks if the claim is less than 15 and 4 weeks otherwise. The insurance company pays a maximum of 50 for the claims that arrive in any one year. If the interest rate is 8% (discount factor 1/1.08), use the random samples in the table below to generate the present value of payments on claims incurred in a single year. You should first use the thinning method for generating non-homogeneous Poisson.

 

-(1/6)ln(U1):   .06    .29    .09    .11    .23    .01    .10    .10    .27
U2:             .77    .38    .59    .12    .33    .34    .28    .38    .62    .60    .40    .29    .28

 

 


Math 286/366

Review for Final

Spring, 2001

 

Pseudorandom numbers:  xn = axn-1 + b  (modulo m)

 

Inverse Transform:  discrete, exponential-Poisson

 

Rejection Method:  X has pdf f(x), Y has pdf g(y) - easy to sample, f(y)/g(y) ≤ c.

1.      Generate sample y from Y and a random number u

2.      If u ≤ f(y)/cg(y), accept X = y.

 

Sample mean, sample variance: Confidence interval for mean:  X̄k ± zα/2 Sk/√k

Bernoulli trials: estimate the standard deviation of X̄k with [X̄k(1 - X̄k)/k]^(1/2)

 

            Recursion formulas:

                        mk+1 = (kmk + Xk+1)/(k+1)

                        Sk+1^2 = [(k-1)/k]Sk^2 + Xk+1^2/k - (1 + 1/k)mk+1^2 + mk^2

Bootstrap:  MSE(F) = EF[g(X1,…,Xn) - θ(F)]^2

            Fe = empirical distribution from sample  

(note: sample variance ≠ variance of empirical distribution)

MSE(F) ≈ MSE(Fe), which can be estimated by simulation

 

Goodness of Fit

            Discrete distributions, use Chi-square

            Continuous distributions, use Kolmogorov-Smirnov

            Missing parameters: estimate from sample data

            Two sample rank sum statistic

            Multisample rank sum statistic (Kruskal-Wallis)

            Testing for a nonhomogeneous Poisson process

 

Loss Models

            Standard model: S = X1 + X2 + … + XN;  N, X1,X2,… independent, X1,X2,… iid

            Complications:  coverage modifications, time considerations

1.      Build a model for S as a function of random variables

2.      Sample random variables to obtain a value for S

3.      Repeat many times to approximate FS with empirical distribution.

Two examples