NASA SISM
Intelligent Systems Project
Intelligent Data Understanding
Research Task


Automated Knowledge Discovery from Simulators

NASA Jet Propulsion Laboratory

Becky Castano (JPL/MLS)
Bill Merline (SwRI)



Abstract


Simulators offer a rich exploratory environment, but can produce massive amounts of data. This research task will develop interactive and automated techniques for exploring physical models via multiple simulation runs.


Task Description


Objective:

Traditional data mining extracts knowledge from static data sets. Simulators offer a richer exploratory environment, because new data can be generated to verify patterns and test theories. New tools are needed for deciding which simulations to run and how to transform the output -- possibly terabytes of data -- into knowledge. This research task will develop automated or semi-automated techniques for intelligently sampling a parameter space to best clarify a physical model and its behavior. This application of active, closed-loop machine learning can reduce the hundreds or thousands of computationally intensive numerical simulations required to investigate science problems, helping experimenters maximize knowledge return from their simulation studies. The central challenge is to develop a control technique that learns from simulation results -- via an output metric, and with allowance for potentially noisy or chaotic simulations -- and selects the next simulation to run so as to maximize the information likely to be gained. Related challenges include landscape characterization (determining which conditions lead to a given behavior), model identification, simulator control, and detection and use of trigger events (e.g., to allow backtracking during a simulation or to enable object-centered indexing). The driving application for this research task will be a simulation of asteroid collisions to determine the conditions that lead to satellite formation.
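The closed-loop idea described above can be illustrated with a minimal sketch. This is not the task's actual method (the Accomplishments section mentions an SVM-based approach); it substitutes a simple 1-nearest-neighbor margin heuristic so the example needs only the Python standard library. The function `toy_simulator`, the two-parameter space (speed, angle), the disc-shaped "satellite forms" region, and the seed points are all hypothetical stand-ins chosen for illustration.

```python
import math

def toy_simulator(speed, angle):
    """Hypothetical stand-in for one expensive simulation run.
    Returns True if a 'satellite' forms (assumed: inside a disc of radius 0.3)."""
    return (speed - 0.5) ** 2 + (angle - 0.5) ** 2 < 0.09

def margin(point, labeled):
    """Gap between the distances to the nearest positive and nearest negative run.
    A small gap means the point lies near the current 1-NN decision boundary."""
    d_pos = min(math.dist(point, p) for p, y in labeled if y)
    d_neg = min(math.dist(point, p) for p, y in labeled if not y)
    return abs(d_pos - d_neg)

def active_learning(candidates, seed_points, budget):
    """Run the seeds, then repeatedly run the candidate with the smallest margin."""
    # The seeds must contain at least one positive and one negative outcome,
    # otherwise margin() has nothing to compare against.
    labeled = [(p, toy_simulator(*p)) for p in seed_points]
    pool = [p for p in candidates if p not in seed_points]
    for _ in range(budget):
        nxt = min(pool, key=lambda p: margin(p, labeled))  # most ambiguous point
        pool.remove(nxt)
        labeled.append((nxt, toy_simulator(*nxt)))  # "run" the simulation
    return labeled

# Candidate parameter settings on a 21 x 21 grid over the unit square.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
seeds = [(0.5, 0.5), (0.0, 0.0)]  # assumed: one known positive, one known negative
runs = active_learning(grid, seeds, budget=25)
```

Each iteration spends its one simulation on the parameter setting the current model is least sure about, which is the sense in which the loop tries to maximize the information gained per run. A noisy or chaotic simulator would need repeated runs or a noise-tolerant learner, which this sketch does not attempt.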


Applications:

Numerical and particle simulation studies; earth science (core/mantle, climate, atmospheric, and ocean dynamics); space science (stellar dynamics, solar wind, galaxy and planet formation); engineering design (aerodynamics, structures, propulsion, failure analysis).


NASA Benefit:

Simulators play a fundamental role in investigations by scientists and engineers across NASA, DOE, DOD, FAA, industry, and academia. In many cases, they enable studies that would otherwise be infeasible or impossible. Science examples include studies of the Earth's core and mantle dynamics; climate prediction; atmospheric and ocean dynamics; fluid flows in microgravity environments; dynamics of the interiors of stars; interaction of the solar wind with the Earth; galaxy and planetary system formation; artificial life; and neuronal models. Engineering examples include aerodynamics and flight research; propulsion systems; behavior of flexible structures; and failure analysis. Much of the work in high-performance computing has focused on producing larger, more accurate simulations. The complementary problem of deciding which simulations to run and how to transform the output into knowledge has largely been neglected, despite a large potential payoff. (The simulation process may take months, but the human analysis phase may take years.) This research task will develop machine learning techniques for making efficient use of large-scale numerical simulators. Active learning can greatly reduce the number of simulation runs that must be done, by helping to predict which parts of an input parameter space need to be explored. Investigators will focus on particle simulations, a domain of interest to NASA for studying the origins of the planets and the long-term dynamical behavior of comets, Kuiper Belt objects, and other Solar System bodies.


Keywords:

exploratory model analysis, knowledge discovery, numerical simulator control, particle simulation


Images:

PI slides.



Research Plan


Prior Technology:

Pre-specified parameter sets, or manual analysis of simulator output and scheduling of new runs.


FY04 Milestone:

Quantify improvement; add another simulator; allow early stopping.



Progress


FY04 Quadchart Slide:

IDU_NRA_Castano_SimCtl.ppt.


May 02 Report:

SimCtl02b.pdf.


Accomplishments:

Quantitatively compared existing active learning algorithms on benchmark data sets; developed a new, high-speed SVM active learning method with automatic model selection (tuned especially for simulators); carried out a particle-simulator asteroid collision study (characterizing initial conditions that result in satellite formation); quantified the contribution of active learning.


Preliminary Results:

Obtained preliminary results on the "asteroid satellites" simulator problem. (See May 02 slides for example result graphics.) The results include a uniform-sampling baseline for comparison with the intelligent sampling approach.
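A comparison of this kind is often expressed as a learning curve: classifier accuracy as a function of the number of simulation runs. The sketch below shows one way such a curve could be computed for a uniform-sampling baseline. It is not the evaluation used in the May 02 report; `toy_simulator`, the 1-nearest-neighbor classifier, the grids, and the budget are all hypothetical choices made so the example runs with the standard library alone.

```python
import math
import random

def toy_simulator(params):
    """Hypothetical stand-in for one expensive run; True means a satellite formed."""
    speed, angle = params
    return (speed - 0.5) ** 2 + (angle - 0.5) ** 2 < 0.09

def predict(point, labeled):
    """1-nearest-neighbor prediction from the runs completed so far."""
    return min(labeled, key=lambda item: math.dist(point, item[0]))[1]

def accuracy(labeled, test_points):
    """Fraction of test points whose predicted outcome matches the simulator."""
    hits = sum(predict(p, labeled) == toy_simulator(p) for p in test_points)
    return hits / len(test_points)

def uniform_select(pool, labeled, rng):
    """Baseline strategy: ignore past results and pick a candidate at random."""
    return rng.choice(pool)

def learning_curve(select, pool, test_points, budget, rng):
    """Accuracy after each run, for any strategy with the select() signature."""
    pool = list(pool)  # copy so the caller's list is left untouched
    labeled, curve = [], []
    for _ in range(budget):
        nxt = select(pool, labeled, rng)
        pool.remove(nxt)
        labeled.append((nxt, toy_simulator(nxt)))
        curve.append(accuracy(labeled, test_points))
    return curve

# Candidates on a 21 x 21 grid; test points on an offset grid between them.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
test_points = [(i / 20 + 0.025, j / 20 + 0.025) for i in range(20) for j in range(20)]
curve = learning_curve(uniform_select, grid, test_points, budget=30, rng=random.Random(0))
```

Passing a different `select` function with the same signature (for example, one that prefers candidates near the current decision boundary) yields a second curve, and the gap between the two curves at a given number of runs is one way to quantify what intelligent sampling contributes.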


Papers:

DeCoste, "Anytime Interval-Valued Outputs for Kernel Machines: Fast Support Vector Machine Classification via Distance Geometry," Proc. Int. Conf. on Machine Learning (ICML-02), Jul 02.

Mjolsness and DeCoste, "Machine Learning for Science: State of the Art and Future Prospects," Science, Volume 293, pp. 2051-2055, 14 Sep 01.



For More Information


Related Web Pages:

Research group page.


Contacts:

Rebecca Castano (PI), JPL Machine Learning Systems Group.
William J. Merline (Co-I), Southwest Research Institute.




Responsible NASA Official: Joseph C. Coughlan.
Program Support: Kenneth I. Laws. / Updated: 29-Nov-2004
Mail Stop 269-3, NASA Ames Research Center, Moffett Field, CA 94035-1000
