Tuesday, May 25, 2010

More on Monte-Carlo simulation: if we don't know the equation

In my previous post, I discussed the beauty of using Monte-Carlo simulation when we can describe the world by a formula.  Sadly, more often than not, we can't.  The beauty of Monte-Carlo analysis is its suitability for exactly such cases.  If we can describe the world as a set of nodes and links (a.k.a. states and events), and if we know the probability distributions of the states, or the timing and magnitude of the events, then to find the most likely outcome we do not need an analytical model: the node-and-link grid already contains all the solutions.  Some of those solutions can be extracted as an equation or a set of simultaneous equations (again); others require a parsing algorithm (akin to the one used for parsing an N-ary tree).

Any node-and-link model involves activities (akin to decision making) at nodes connected by links.  In some cases, the links are physical; e.g., in a traffic simulation, highways are represented by links, and nodes correspond to intersections, stores, towns, etc.  Traffic density, delays, link traversal times, and link and node capacities are the independent variables for the "provider" nodes and links and the dependent variables for the "receiver" nodes and links.

In other cases, the links are logical, belonging to the "IF...THEN...ELSE" category.  The metrics of interest here are the probability distributions of the decisions made at each node, along with other parameters such as retention time at the node, link delay, etc.
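To make the logical-link case concrete, here is a minimal sketch in Python of sampling a single decision node; the link names and selection probabilities are invented for the example:

```python
import random
from collections import Counter

# Hypothetical decision node: each outgoing link carries a selection
# probability, standing in for an IF...THEN...ELSE branch.
LINKS = {"approve": 0.7, "escalate": 0.2, "reject": 0.1}

def choose_link(links):
    """Pick one outgoing link according to its selection frequency."""
    r = random.random()
    cumulative = 0.0
    for name, p in links.items():
        cumulative += p
        if r < cumulative:
            return name
    return name  # guard against floating-point round-off

# Sampling the node many times recovers the decision distribution.
print(Counter(choose_link(LINKS) for _ in range(10_000)))
```

Running it prints link counts close to the 70/20/10 split we fed in; the sampling does the work an analytical model would otherwise have to do.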


A Closer Look at the Node-and-Link Model Structure from the Monte-Carlo Standpoint

A fragment of a traffic network represented by nodes and links is shown in Figure 1 (left).  "Vehicles" (anything from cars to messages) traverse the nodes via links, each node and link having its own intrinsic properties, such as retention time, capacity, throughput, reliability, etc.


[Figure 1: a fragment of a traffic network as nodes and links (left), and the same grid as a decision network, with retention-time histograms at the nodes and link-selection frequencies at the decision links (right).]

A traffic network model can just as well simulate a decision grid by introducing a virtual “crawler” that parses the network.  By sending multiple crawlers through the network and simulating each crawler's retention times and decisions at each node, we build up the total distribution of the network latency.

The diagram on the right illustrates the concept.  Nodes are decision-making points where the crawlers are detained for time periods described by known distributions (see the histograms).  At decision nodes (e.g., N2_2 in this diagram), each outgoing link is associated with a given frequency of selection (the binary histogram next to it).
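As a sketch of the crawler idea (not the actual diagram), the Python below pushes many crawlers through a three-node path; every distribution and parameter here, and every node name other than N2_2, is an assumption made for the example:

```python
import random

def run_crawler():
    """Send one crawler down a three-node path; return its total latency."""
    latency = random.gauss(5.0, 1.0)        # node N1: retention ~ Normal(5, 1)
    latency += random.expovariate(1 / 3.0)  # node N2_2: retention ~ Exp(mean 3)
    # Binary decision at N2_2: the fast link 70% of the time, the slow link 30%.
    latency += 2.0 if random.random() < 0.7 else 8.0
    latency += random.gauss(4.0, 0.5)       # node N3: retention ~ Normal(4, 0.5)
    return latency

# Many crawlers build the empirical distribution of the total network latency.
samples = sorted(run_crawler() for _ in range(100_000))
print("median latency:", samples[len(samples) // 2])
print("95th percentile:", samples[int(len(samples) * 0.95)])
```

The sorted sample array stands in for the latency histogram: any percentile of interest can be read straight off it.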


The beauty of Monte-Carlo simulation is that it can do fast-time modeling: simulating time faster than it passes in the real-world system that the model represents.

Time handling can be implemented in a number of ways, the most popular of which are:

1.  Real-time: when running human-in-the-loop simulations, with actual humans making some of the decisions, and in psychology and social-science research, real-time Monte-Carlo simulation is irreplaceable, as it allows the subjects to make decisions at the actual tempo of the world they are used to.  It is also very popular in the game business, with SimCity and Earthquake among the earliest adopters of this approach, in which the user is in control of the simulated world.

Interactive online games are the next step in the evolution of real-time Monte-Carlo simulation: the users are in control, and all the hundreds (thousands?) of users, by virtue of playing the game, become the simulation.  The downside of using an interactive-game environment in research is that the number of users varies, which will likely make any data collected from such a setup too noisy to be useful.

2.  Time-scale: the simulated time delays are actually waited out, on a scale, by the modeling computer.  This method allows building very high-fidelity event-and-process models; however, it may not be efficient for modeling situations that take a long time to complete without much happening.  For example, simulating a day of operations of a police station in a quiet neighborhood of Beverly Hills, CA would be too boring to analyze and would take far too long for any analyst's liking, even at a 1:24 scale, which lets a day of simulated time pass in one hour of the simulation's runtime.  Another drawback of time-scale simulations is their dependency on the simulator running uninterrupted: a stoppage of the simulation software or hardware will cause a disconnect between simulated time and real time, which developers of time-scale models need to account for in their disaster-recovery and exception-handling practices.
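To illustrate the scaled-waiting idea, here is a minimal sketch in Python; the SCALE factor, the event list, and the delays are all invented for the example:

```python
import time

SCALE = 1 / 24   # 1:24 time scale: 24 seconds of simulated time per real second
# Invented event list: (simulated delay in seconds, description).
EVENTS = [
    (60.0, "patrol car leaves the station"),
    (120.0, "noise complaint comes in"),
]

sim_clock = 0.0
for delay, what in EVENTS:
    time.sleep(delay * SCALE)   # the simulated delay is actually waited out, scaled
    sim_clock += delay
    print(f"[simulated t = {sim_clock:.0f} s] {what}")
```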

3.  Fast-time: this is yet another paradigm change with regard to what the nodes and links represent (see the illustration below).

[Illustration: agents tagged with the timestamps of their next events, ordered by a master controller's event queue.]

Each crawler (a.k.a. agent) knows the timestamp of its next event.  The event may be any change of the crawler's current state (spatial location, compass heading, number of friends on Facebook, job status, citizenship, oil pressure, speed, funding status, etc.).

A master controller keeps the simulation time and a collection of agents sorted in the order of their next event times.  All agents act independently or interact with each other via the rules defined for them in the simulation program or configuration files.  Each agent exposes only the functionality and fields that other agents need to be aware of.  In addition, each agent may keep a list of agents that have to be updated along with it, following the Observer design pattern.
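A minimal sketch of such a master controller follows, assuming the sorted collection is a priority queue keyed on next-event time; the class and function names (Agent, next_delay, on_event, simulate) are hypothetical:

```python
import heapq
import random

class Agent:
    """A crawler that schedules its own next event and notifies observers."""

    def __init__(self, name):
        self.name = name
        self.observers = []  # Observer pattern: objects implementing notify(agent, now)

    def next_delay(self):
        # Assumed retention-time distribution; a real model would vary it per node.
        return random.expovariate(1.0)

    def on_event(self, now):
        for obs in self.observers:
            obs.notify(self, now)

def simulate(agents, horizon):
    """Master controller: pop the earliest event, advance the clock, reschedule."""
    queue = [(agent.next_delay(), i, agent) for i, agent in enumerate(agents)]
    heapq.heapify(queue)
    clock = 0.0
    while queue:
        when, i, agent = heapq.heappop(queue)
        if when > horizon:
            break          # horizon is in simulated time, not wall-clock time
        clock = when
        agent.on_event(clock)
        heapq.heappush(queue, (clock + agent.next_delay(), i, agent))
    return clock

print(simulate([Agent(f"A{i}") for i in range(3)], horizon=100.0))
```

Note that the clock jumps straight from one event timestamp to the next instead of waiting the delays out, which is exactly why fast-time simulation outruns both the real-time and the time-scale approaches.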

The inherent flexibility of this kind of simulation, together with its ability to run faster than real time while at the same time making more productive use of the hardware than the time-scale or real-time approaches do, makes it the key to efficient discrete-event simulation and the paradigm of choice for many analyses.
