Monday, May 17, 2010

How it all started: Part 1 / 4 (for the remaining 3 parts, click Comments)

Allow me to introduce myself: AlexG. I am originally from Moscow, Russia, and have been living in California (the Bay Area) for the last 19 years. I have a family, kids, and a great interest in life and the many different things it offers.

I received an excellent graduate-level education in Applied Math, Control System Engineering, and Operations Research, among other things. That education carried me to a professional engineer (PE) license and into Semiconductor Manufacturing, Six Sigma, and real Data Analytics, creating software for Data Mining, Fast-Time Simulations, and Statistical Forecasting.

Along the way, I generated papers and patent applications in every domain I touched, from waste treatment to data mining.

I joined EG&G Amorphous Silicon (which is now Perkin-Elmer Optoelectronics) in 1995 to be a Process Engineer in a semiconductor production startup environment. A couple of years later, I was picked by the management for Six Sigma Black Belt training, and upon completion of the course, became the local expert in data-driven process improvement.

I was involved in a number of DMAIC projects ("Define, Measure, Analyze, Improve, Control" - the Six Sigma mantra, aka GEMS - "Gather, Examine, Modify, Sustain" - the GE Medical Systems' Six Sigma mantra), aimed at improving the consistency and quality of the critical-to-quality (CTQ) process parameters that drive the bottom line.

In that capacity, I realized two very important things:

1. That it is impossible to be successful in data-driven process improvement without using specialized software for data analytics.

2. That it is very expensive and silly to do experiments on live production lines. The cost of this may offset the benefit of the knowledge and control over the CTQ parameters, driving the ROI into the ground.

These two facts compelled me to ...

(To read on, click the Comments link)

3 comments:

  1. ...start moving into the area of developing software for data analysis and simulation

    How it all started: Part 2 / 4



    I wanted to try my hand at "real" software development, applied to data simulation, modeling, and interpretation. I wanted to create software that would not just simplify the analysts' job, but become a sidekick in their work.

    I joined Analytical Technologies Applications Corporation (ATAC) to work on software for air traffic simulation for airports, enroute sectors, and the military. Most of the work was directed at airspace capacity planning. From a traffic standpoint, airspace is modeled as a network of nodes and links, and in this sense it is no different from the WWW, logistics networks, or storage area networks; just as one can analyze the Web, one can analyze air traffic operations.
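    To make the node-and-link picture concrete, here is a minimal, purely illustrative sketch (the airport names, sector name, and capacities are my own placeholders, not values from any ATAC model) of an airspace represented as a directed graph of nodes and capacity-limited links:

    ```python
    # Minimal sketch of an airspace as a network of nodes (airports/fixes/sectors)
    # and directed links (route segments). All names and capacities are illustrative.
    from collections import defaultdict

    class AirspaceNetwork:
        def __init__(self):
            # adjacency map: node -> {downstream node: capacity in aircraft per hour}
            self.links = defaultdict(dict)

        def add_link(self, src, dst, capacity_per_hour):
            self.links[src][dst] = capacity_per_hour

        def successors(self, node):
            return self.links[node]

    net = AirspaceNetwork()
    net.add_link("SFO", "SECTOR_33", 40)   # departure fix feeding an enroute sector
    net.add_link("SECTOR_33", "OAK", 30)   # enroute sector feeding an arrival fix

    for nxt, cap in net.successors("SFO").items():
        print(f"SFO -> {nxt}: {cap} aircraft/hour")
    ```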

    After some time at ATAC, we received funding for a NASA project aimed at evaluating human performance under stress. The goal was to determine how much stress was acceptable for air traffic controllers and pilots, and at what levels of stress their performance would become dangerous to the hundreds of thousands of people in the air at any moment in time.

    I became the lead engineer and project architect on that project. We created a configurable, fast-time, agent-based simulation system for air traffic with a "human in the loop", meaning that a separate human-performance model operated as one of the agents in the simulation, interacting with the other agents and making the decisions that real air traffic controllers make day in, day out.
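    The overall shape of such a loop can be sketched very roughly as follows. This is a toy outline under my own assumptions - the workload model, the thresholds, and the agent behavior are invented for illustration and are not the actual ATAC system:

    ```python
    # Toy sketch of a fast-time, agent-based loop where a human-performance model
    # runs as just another agent. Everything here is illustrative.
    import random

    class Aircraft:
        def __init__(self, ident):
            self.ident = ident
            self.needs_clearance = False

        def step(self, t):
            # an aircraft occasionally requests a clearance
            self.needs_clearance = random.random() < 0.3

    class ControllerModel:
        """Simplified human-performance agent: workload drives a stress signal."""
        def __init__(self):
            self.workload = 0.0

        def step(self, t, aircraft):
            requests = [a for a in aircraft if a.needs_clearance]
            # exponentially smoothed workload driven by pending requests
            self.workload = 0.8 * self.workload + 0.2 * len(requests)
            for a in requests:
                a.needs_clearance = False  # clearance issued
            return self.workload

    def run(minutes=60):
        fleet = [Aircraft(f"AC{i}") for i in range(10)]
        controller = ControllerModel()
        for t in range(minutes):          # fast-time: one iteration per simulated minute
            for a in fleet:
                a.step(t)
            workload = controller.step(t, fleet)
            if workload > 2.5:            # illustrative stress threshold
                print(f"t={t}: controller workload high ({workload:.2f})")

    if __name__ == "__main__":
        run()
    ```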

    Once the system became operational, we validated it against real data for one of the large airports in the US, statistically comparing the output of the system with the actual data collected by an air traffic performance data analysis and reporting system - another service created and supported by ATAC. Having confirmed the validity of the model, we used it on a variety of scenarios in a robust (meaning multiple runs at each set of conditions) full factorial experiment design (meaning each factor was varied at two levels, and we were interested in the effects of the factors and their interactions), aimed at finding the critical stress levels at which an air traffic controller's judgment becomes impaired.
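    For reference, a two-level full factorial design with replicates is easy to enumerate. The factor names and the replicate count below are made-up placeholders, not the factors of the actual study:

    ```python
    # Sketch: two-level full factorial design with replicates.
    # Factor names and the number of replicates are illustrative only.
    from itertools import product

    factors = {
        "traffic_density": ("low", "high"),
        "weather": ("clear", "degraded"),
        "stress_level": ("baseline", "elevated"),
    }
    replicates = 3  # "robust": multiple runs at each combination of conditions

    design = [
        dict(zip(factors, combo), replicate=r + 1)
        for combo in product(*factors.values())
        for r in range(replicates)
    ]

    print(len(design), "runs")   # 2^3 combinations x 3 replicates = 24 runs
    print(design[0])
    ```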

    Results were later used by the FAA to modify the ATC protocols to increase ...

    (to read on, follow the next comment)

  2. ... traffic throughput while at the same time improving aviation safety for in-flight and approach scenarios.

    How it All Started: Part 3 / 4


    At Analytical Technologies (ATAC), developers create analytical software that is primarily used by ATAC's own analysts and developers. It was a very good experience and a great opportunity to create something that other people could use in their work, and it whetted my appetite to work on a commercial product for data mining.

    In May 2005, I joined MonoSphere, Inc., a startup involved in storage capacity planning. A brilliant idea: collect data from the user's storage networks on how many MB were used each day on every LUN of every disk (the supply side), map these data to the demand side (applications, filers, project-based file systems), and once a sufficient amount of data has been accumulated - bingo! - roll out a forecast of how much storage this particular user will really need a week, a month, or a year from now.

    My job was to create a wrapper component for our product, Storage Horizon, to integrate the third-party ForecastPro forecasting engine by BFS, Inc. The engine (a C++ DLL) receives the historical data and scenario parameters (level, trend, seasonality, outliers) and produces a forecast, along with a dozen or more parameters characterizing the quality of the model and the forecast. The wrapper was to identify the scenario characteristics, launch the forecasting engine, and pick up and interpret the engine's output. This work involved some novel approaches and methodologies, and I became the primary inventor on two patent applications. I also wrote and presented a paper on data pattern analysis at the Computer Measurement Group (CMG) International Annual Conference in Reno, NV, in 2006.

    That done, I created intelligent software that selects, based on the characteristics of the input data, the model that the engine should use with each data set, as well as a way to run a competition among different models and different forecasting engines on the same scenario and pick the best one.
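    The ForecastPro interface itself is proprietary, so I won't reproduce it here, but the competition idea can be sketched generically: fit several candidate models to the history, score each on a holdout, and keep the winner. The toy models and the error metric below are my own stand-ins, not the engines actually used:

    ```python
    # Generic sketch of a forecasting "model competition". The candidate models and
    # the error metric are placeholders, not the actual engines used at MonoSphere.
    def naive_last(history, horizon):
        return [history[-1]] * horizon

    def mean_model(history, horizon):
        return [sum(history) / len(history)] * horizon

    def trend_model(history, horizon):
        slope = (history[-1] - history[0]) / (len(history) - 1)
        return [history[-1] + slope * (h + 1) for h in range(horizon)]

    def mae(actual, forecast):
        return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

    def compete(history, holdout=4):
        train, test = history[:-holdout], history[-holdout:]
        candidates = {"naive": naive_last, "mean": mean_model, "trend": trend_model}
        scores = {name: mae(test, f(train, holdout)) for name, f in candidates.items()}
        best = min(scores, key=scores.get)
        return best, scores

    usage = [100, 104, 110, 113, 120, 124, 131, 135]   # e.g. daily GB used on a LUN
    print(compete(usage))
    ```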

    Now that the software was smart enough to run on its own, and even to interpret its outputs, we went ahead and implemented automatic bulk forecasting for tens of thousands of scenarios. We used a "lazy" approach, and the novel method for determining which scenarios needed to be forecasted in a given run and which needed to wait (or could wait) became the ...


    (to read on, follow the next comment)

  3. subject of the third patent application filed for MonoSphere on which I was an inventor.

    How it all started: Part 4 / 4

    Now, if we have a few tens of thousands of forecasts, we want to know how trustworthy they are. There are standard methods for answering this question, all based on data splitting: forecast the later part of the series from the earlier part's historical data, then compare the later part's actual data with its forecasted values. Where to split the data and what criteria to use in the comparison are great questions that are best left for a separate discussion.

    For now I'll just say that I created a data-mining application that allowed us to do just that: forecast accuracy benchmarking. All the user had to do was press a button, and all the split-data forecasting and the subsequent data analysis were done automatically, returning a single number - an "index of believability".
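    The mechanics behind such a benchmark can be sketched as follows. The 70/30 split, the MAPE metric, and the 0-100 scaling are illustrative assumptions of mine, not MonoSphere's actual formulas:

    ```python
    # Sketch of split-data forecast benchmarking reduced to a single score.
    # The 70/30 split, MAPE metric, and 0-100 scaling are illustrative assumptions.
    def mape(actual, forecast):
        return 100.0 * sum(
            abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0
        ) / len(actual)

    def believability_index(history, forecaster, split=0.7):
        cut = int(len(history) * split)
        train, test = history[:cut], history[cut:]
        forecast = forecaster(train, len(test))          # forecast the "later part"
        error = mape(test, forecast)                     # compare actual vs. forecast
        return max(0.0, 100.0 - error)                   # higher = more believable

    def naive_forecaster(train, horizon):                # stand-in forecasting engine
        return [train[-1]] * horizon

    usage = [100, 104, 110, 113, 120, 124, 131, 135, 140, 144]
    print(round(believability_index(usage, naive_forecaster), 1))
    ```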

    This number can then be stored in the database, and statistical process control (SPC) can be applied to monitor the quality of the forecasts.
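    A minimal version of that monitoring step might look like the sketch below: an individuals control chart with 3-sigma limits estimated from the average moving range (the scores themselves are made up):

    ```python
    # Sketch: individuals (X) control chart over stored believability indices.
    # The data are invented; limits use the standard moving-range estimate (MR-bar/d2).
    def control_limits(values):
        mean = sum(values) / len(values)
        moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
        mr_bar = sum(moving_ranges) / len(moving_ranges)
        sigma = mr_bar / 1.128          # d2 constant for subgroups of size 2
        return mean - 3 * sigma, mean, mean + 3 * sigma

    scores = [94.1, 92.8, 93.5, 95.0, 93.9, 94.4, 93.2, 60.0]  # one index per run
    lcl, center, ucl = control_limits(scores)
    for i, s in enumerate(scores):
        flag = "OUT OF CONTROL" if not (lcl <= s <= ucl) else ""
        print(f"run {i}: {s:5.1f}  {flag}")
    ```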

    The system also allowed users to conduct a fast, intelligent drill-down to the problematic scenarios, giving them the option to ignore those scenarios, to fix them, or to treat them as normal for future forecasts and future benchmarking.

    Sadly, all good things have to come to an end. MonoSphere collapsed under the weight of the 2008 banking-system collapse. Fortunately, Quest Software acquired MonoSphere's assets, and for a while we continued developing MonoSphere's old product (Storage Horizon).

    But again, all things have to come to an end, and now we are being refocused onto another product within Quest. The new product will not be as heavy on analytics and data mining, but analytics is my specialization, and I don't want to lose my edge.

    So I am looking for a change, an opportunity to fully apply my skills, know-how, background, and experience - and at the same time to learn and grow professionally.
