Saturday, May 22, 2010

More on Monte-Carlo simulation: If we know the equation.

There are lots of examples of using Monte-Carlo analysis in situations when we have the relationship between the independent variables (factors) and the dependent variable(s).

For example, let's consider a performance-risk-analysis problem for a given multiprocessor system, and suppose we know that this system is described by the universal scalability model derived by Dr. Neil Gunther. (The latter is a very reasonable assumption, considering that the USM is known to be perhaps the most valid representation of multi-processor capacity. Suppose, further, that we have been able to get enough throughput measurements to derive the parameters (σ and k) for this system's model:

C(p)   =  p

1 + α (p − 1) + β p (p − 1) 
(1)

(Neil, thank you for the HTML !)

In this formula, C is the throughput, or capacity of the system; α is the contention component, and β is the coherency component of the system in question. The p is the number of processors.

We are asked to predict whether the system will be able to handle the workload given a probability distribution of p = the number of processors operational on this system at any given moment in time (it may just as well be switches, routers, controllers, etc.).

In mathematical terms, we are asked to determine the distribution of the system's throughput (capacity) if the number of processors follows a certain probability distribution.


(This is inserted on July 5, 2002:)In addition, when we are calculating the model parameters  α and β, we can only do so with some degree of uncertainty.  That implies a distribution of possible values for the parameters, with the calculated numbers being, hopefully, the mean and mode of these distributions.  In the best-case scenario, we can assume a normal distribution with a standard deviation of zero, in which case the only probabilistic term is the number of processors in operation.
More in a blog due next week (hopefully)
(End of July 5, 2002 insertion)

We have the formula and the distribution for the p; we vary the p by rolling the dice out of this distribution and calculate the values for the system's capacity, C(p). The more times we roll the dice, the more accurately the data will resemble the distribution of the capacity for this system.

Parenthetically: generally speaking, when applying the Monte-Carlo or any other analytical technique, care must be taken that the distribution of p should be reasonably within the range on which the parameters were determined. In the case of USM, however,this comment may be considered of low relevance.

We then run statistical tests to determine whether the C(p) that we calculated will meet the demands for the system's capacity.

What tests to run, and how to interpret the results, is the subject of another post.

9 comments:

  1. I have a passing familiarity with this equation, but not in the context of MC. I never thought about p-processors as a probability distribution for the purposes of CaP.

    So, what does C(p) look like when you do the MC? What are appropriate choices of dsns for p? Can you show us some examples?

    ReplyDelete
  2. Neil, thank you for the comment.

    I wish I had the data to do that. It would be correct to say that this post was "just a hunch"

    Re. distribution type, the nice thing about MC is that, as long as it is defined - as long as you can build a histogram, it does not have to be one of the named distributions.

    ReplyDelete
  3. I'm a bit confused. Why not choose a variate pdf for 'p,' run the MC and see what happens? Basically, the MC just generates sample paths. Right?

    ReplyDelete
  4. True. I just didn't want to post unrealistic distributions. But you are right; it doesn't matter for illustration purposes. Will do over the weekend.

    ReplyDelete
  5. Sounds good.

    I'm trying to put my Amdahl/Gauss blog post together too. I think there might an interesting connection with MC sims, which I wouldn't even have contemplated without reading your post. I'll be interested to see what you think.

    BTW, you still have some Ns in the USL formula. :)

    ReplyDelete
  6. New hint.

    Set β = 0, which makes C(p) identical to the simpler case of Amdahl's law (i.e., contention only), choose a uniform dsn for p-processors and see what happens.

    ReplyDelete
  7. Neil, thank you for the encouragement. I think it's a very sound way of taking capacity analysis to a new level. I'm setting up an R script to do these simulations.

    I've corrected the USL formula - thank you! - and added some more thoughts there: we really should be looking at a 3-factor simulation: we cannot assume that the Alpha and Beta are deterministic. I'll post some more this coming week.

    Please let me know what you think.

    Thank you!

    ReplyDelete
  8. Stop on by and check out "Prime Parallels for Load Balancing" or "Mr. Amdahl meets Mr. Gauss." :)

    Hope you and yours had a good 4th.

    ReplyDelete
  9. Neil, that's a very interesting and sound idea (I'm talking about your post). Please see my comment there.

    ReplyDelete