Saturday, April 18, 2015

Discovering Patterns in Irregular Behavior: Part 1


Mathematical description of irregular behavior is the holy grail of statistical analysis, akin to the game of croquet that Alice reluctantly played with the Queen of Hearts, where the wickets (card soldiers) move from their posts to “fix” the game, the mallets (flamingos) look up at the player just as the ball (hedgehog) is about to be hit, and the hedgehogs start rolling down the court before they are struck with the flamingo’s head.  (If you have not read or watched Lewis Carroll’s classic, please refer to a brief description of the game here.)  A lot of uncertainty and ambiguity, just like in the world of data.  And if you add the political aspect of the game (Alice is playing with a Queen, after all, even if an imaginary one!), it makes an excellent allegory of what happens in data science nowadays.  Could Alice have won the game?  Should she have?

Setting the Scene

Problem Statement

The rules of croquet are here.  There are nine wickets and two stakes, placed in a certain pattern (topology), and each side needs to score fourteen wickets (going around the court and scoring one point per wicket).  Each time a wicket or stake is scored, the player gets one bonus shot for the ball that scored it.  The game is well structured, and the only uncertainty comes from the skills of the players and the strategy and tactics they employ.

Mathematically speaking

We have a game G(N, p, b), with p players, b balls per player, and N wickets.  If blocking the other players’ balls were prohibited, then each player’s goal would be to maximize the number of wickets they score while keeping their “tank” of bonus shots sufficiently full, given the number and topology of the wickets and stakes:

Each turn: P: max(W | B, Topo),
where P = player ID, W = number of wickets and stakes scored, B = number of bonus shots the player has, and Topo = the topology (layout) of the court (wickets, stakes, and balls).

Perhaps a better strategy would be to minimize the number of turns the player needs to finish the game given the number of bonus shots he/she has and the topology of the game:

P: min(NTurns | B, Topo)

However, this is the trivial scenario, the non-interactive game.  In the interactive game, each player has more than one ball and is allowed to use any of them to block the opponent’s progress toward his or her goal by hitting the opponent’s ball(s).  The rules encourage the players to hit any opponent’s ball: while scoring a wicket is worth one bonus shot, hitting an opponent’s ball is worth two, improving the player’s chances of finishing the game before the opponent.  This changes the previous expression for the winning strategy to:

P: min(NTurns | B, Topo, Pr(other)),
where P = player ID, B = number of bonus shots the player has, Topo = topology of the game, and Pr(other) = probability distribution of actions by the other participants of the game (e.g., opponent).
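
To make that expression a bit more tangible, here is a minimal Python sketch of the decision rule.  Everything in it (the two candidate shots, the numbers in Pr(other), the way a turn changes the court) is an assumption invented purely for illustration; the point is only to show the player choosing the shot that minimizes the expected number of remaining turns, averaged over what the opponent might do:

# Toy illustration (not a full game engine) of P: min(NTurns | B, Topo, Pr(other)).
# The court topology is collapsed into a single number: how many wickets the
# player still has to score; the transition rules below are invented for the sketch.

PR_OTHER = {"block": 0.4, "advance": 0.6}   # assumed Pr(other): what the opponent does next

def remaining_turns(wickets_left, shot, opponent_action):
    """Crude estimate of turns left after one exchange of shots."""
    scored = 2 if shot == "aggressive" else 1                                     # aggressive shots may score more...
    setback = 1 if (shot == "aggressive" and opponent_action == "block") else 0   # ...but are easier to block
    return max(wickets_left - scored + setback, 0)

def choose_shot(wickets_left):
    """Pick the shot that minimizes the expected number of remaining turns."""
    best_shot, best_expected = None, float("inf")
    for shot in ("aggressive", "safe"):
        expected = sum(p * remaining_turns(wickets_left, shot, action)
                       for action, p in PR_OTHER.items())
        if expected < best_expected:
            best_shot, best_expected = shot, expected
    return best_shot, best_expected

print(choose_shot(wickets_left=14))   # ('aggressive', 12.4) under these made-up numbers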

Uncertainty

Without going into the details of game theory, suffice it to say that the opponent’s behavior adds some uncertainty to the player’s strategy and tactics.  This ambiguity, however, is predictable: each player knows what the end goal of the game is, and, assuming the players keep track of each other’s bonus shots, they can predict what the opponent will do with decent confidence and build their own strategy and execution on that prediction.  It’s all about Pr(other).

“Curiouser and Curiouser”

These are the words immortalized by Alice when she remarked on the amount of ambiguity she was facing in Wonderland.  While grammatically incorrect, they do describe her point of view on what happened to her, especially at the Royal Game of Croquet.

The simple game with known rules changed ever so slightly, adding just one more source of uncertainty, and Alice no longer knew what to expect: her knowledge of Pr(other) was gone.

Very quickly, she observed that the Queen’s mallet (flamingo) was solid and predictable, while the ‘wickets’ were moving to where the ball was rolling.  But whenever it was her turn to hit the ball, nothing worked: her mallet-flamingo wanted to talk her out of it and ducked (pardon the pun!) just before hitting the ball, the ball would not budge, and the wickets were moving away from the ball.

Thomas Bayes meets Charles Lutwidge Dodgson

It is a well-known secret that Lewis Carroll = Lutwidge Charles, and that Dodgson = Dodo, the mysterious bird who, along with the Cheshire Cat, helps Alice figure things out in Wonderland.  So what did Bayes and Dodgson have in common?

For starters, they were both clerics: the former a Presbyterian minister, the latter an Anglican deacon.  They were both mathematicians specializing in logic.  Bayes became known for the theorem connecting unknown causes with observed effects and for the whole methodology that statistics uses today, while Dodgson, about a century later, became famous for popularizing complicated mathematical concepts and making them accessible even to a little child like Alice, even at the cost of becoming Lewis Carroll and Dodo (and maybe the Cheshire Cat too? After all, he was born in the county of Cheshire).

Bayes’ Theorem

The “classical” (traditional, frequentist, Fisherian) statistics was brilliantly formalized by Gauss, Gosset (who published under the pseudonym “Student”, hence the Student’s t-test), Fisher, Pearson, and so many others before and after them, and put to use for figuring out confidence intervals, correlations, required sample sizes, the probability of making a mistake, and similarities between two distributions.  This side of statistics has been written and sealed into our brains since college.  Mathematical descriptions of relationships between explanatory and dependent variables, statistical process control, mathematical forecasting, and predictive analytics applied to continuous variables in general are all derived from the theoretical foundations created by these giants.

They made beer produced in England survive the sea voyage to India; they made possible hundredfold improvements in crop yields; they allowed scientists to figure out the laws governing the behavior of gases, liquids, and solids, to predict the rates of chemical reactions, to mathematically describe complex physical phenomena, to extract signal from noise, and to accomplish many other feats that would have been impossible without them.

At the same time, the Bayesian approach, with its unassuming (pun intended!) attitude toward solving data-analytical problems, was the “Ugly Duckling” of this beautiful picture of the world, until it turned into a beautiful swan, not without the help of statisticians who were more interested in solving problems than in admiring the edifice of Applied Mathematics.  Alan Turing, George Box, David Cox, and other brilliant mathematicians realized early enough that mathematics did not need to go into politics.  The gap between frequentist and Bayesian statistics was bridged.
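
Since the theorem itself is never written out above, here it is in its simplest form, P(H | D) = P(D | H) · P(H) / P(D), together with a minimal Python sketch.  The scenario and every probability in it are made up purely to show the mechanics, using Alice’s problem of estimating Pr(other) as the excuse:

# Bayes' theorem: P(H | D) = P(D | H) * P(H) / P(D)
# Toy question: the opponent just played a blocking shot; how strongly should
# we now believe she follows a "blocker" strategy?  All numbers are invented.

prior = {"blocker": 0.5, "runner": 0.5}                 # P(H): belief before the shot
likelihood_of_block = {"blocker": 0.8, "runner": 0.2}   # P(D | H): chance of seeing a block

# P(D): total probability of observing a blocking shot
evidence = sum(prior[h] * likelihood_of_block[h] for h in prior)

# P(H | D): updated belief after the observed block
posterior = {h: prior[h] * likelihood_of_block[h] / evidence for h in prior}
print(posterior)   # {'blocker': 0.8, 'runner': 0.2}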

Data Science: Engineering? Machine Learning? Statistics?

In 1931, Goedel demonstrated mathematically that any consistent, non-trivial system of axioms (one rich enough to express arithmetic) is incomplete.  We compensate for this incompleteness by imposing confidence intervals and developing statistical methods to calculate these intervals for a given probability of making a mistake.

In the 1940s, scientists at the Los Alamos laboratory needed to model processes they had only a vague idea about.  They did have the mathematical models, but such models did not account for the incompleteness of our knowledge about the world they were peeking into.  The Monte-Carlo simulation was born.

The idea was simple: when we have the mathematical model and the distributions of the input variables are known, we can draw many random combinations of the input values from those distributions (or slice each input distribution into many percentiles and take all combinations) and apply the mathematical model to each combination.  The resulting output variables will have a distribution whose parameters can then very easily be evaluated.
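
A minimal Python sketch of that idea, with a deliberately trivial stand-in model (the area of a rectangle with uncertain sides) and made-up input distributions, just to show the mechanics of propagating uncertainty through a model:

import random
import statistics

def model(length, width):
    return length * width            # stand-in for the real mathematical model

random.seed(42)                      # reproducible toy run
outputs = []
for _ in range(100_000):
    length = random.gauss(10.0, 1.0)   # input 1: assumed Normal(10, 1)
    width = random.gauss(5.0, 0.5)     # input 2: assumed Normal(5, 0.5)
    outputs.append(model(length, width))

# The output now has its own distribution, whose parameters are easy to estimate
print(statistics.mean(outputs), statistics.stdev(outputs))   # roughly 50 and 7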

The first electronic numerical calculating machine, ENIAC, was completed by the mid-1940s, and its computational speed made Monte-Carlo calculations feasible.  Nicholas Metropolis, Stanislaw Ulam, and John von Neumann were the founding fathers of the modern Monte-Carlo method and of the practical numerical approach to data analysis and modeling.

The amazing advances in computing power and computer engineering have enabled an unprecedented fractalization and parallelization of mathematical tasks.  Models that only twenty years ago were rejected simply because they were too computationally intensive are gaining a foothold and leading to the discovery of new knowledge.  We are no longer confined to the assumption that the data follow a Gaussian distribution.  We are not even confined to any distributional assumption at all.

That, combined with Alan Turing’s definition of artificial intelligence, has enabled “machine learning” to step from the realms created by the minds of Isaac Asimov and Arthur C. Clarke into the world of philosophical and ethical discussions, psychology, and finally, Knowledge Discovery and Data Mining, or KDD.

For an idea to gain acceptance, it has to be demonstrated.

Dr. Frankenstein chose the wrong time and the wrong place to demonstrate his creation to the world.

Starting in 2004, the DARPA Grand Challenge and other DARPA-sponsored competitions of artificial-intelligence devices have been conducted, each highlighting different aspects of the usability of such devices.  The common denominator among the competing teams is their use of machine learning to analyze the situation in front of them in near real time and to make decisions independently of their human handlers.

Hopefully, the time and place are now more fruitful than they were for Frankenstein.

(To be continued...)

