Training ML models

Once you have processed your grid of MESA models and obtained a 2D table containing the input parameters (features) and the targets that you want to predict, you can train an ML model to make population predictions.

Data structure

NNaPS requires tabular training data without NaN values. Internally, a pandas DataFrame is used to store the training data. Training data can be read from a csv file (using the pd.read_csv function) or be provided as a DataFrame directly to the predictor constructor. The variables / features are expected to be numerical, while the targets can be either numerical (continuous or discrete) or categorical. An example dataset included in the package is the ‘tests/BesanconGalactic_summary.txt’ dataset:

import pandas as pd

data = pd.read_csv('tests/BesanconGalactic_summary.txt')

print(data.head())

M1     Pinit       qinit     FeHinit    Pfinal      qfinal    product  binary_type
0.744  134.470005  1.095729  -0.912521  294.031588  0.608444  He-WD    single-lined
0.813  225.000014  2.524845  -0.806781  153.634007  1.031585  He-WD    single-lined
0.876  111.550009  2.190000  -0.918768  104.970587  0.912802  He-WD    single-lined
0.890  512.700045  2.386059  -0.878982  394.729424  1.396449  HB       single-lined
0.893  102.630007  1.485857  -0.731017  228.613065  0.640067  He-WD    double-lined

The features or input variables here are M1, Pinit, qinit and FeHinit.

The targets that we want to predict are Pfinal, qfinal, product and binary_type. The first two are continuous numerical targets, while the last two are categorical variables which we will have to convert.
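Before handing the table to a predictor, it can be useful to confirm that it meets the requirements described above. The following is a minimal sketch using only pandas, with the five rows of the example table typed in by hand so that it runs without the data file. The variable names (features, regressors, classifiers, clean, n_dropped, non_numeric) are chosen for illustration and are not part of the NNaPS API.

```python
import pandas as pd

# The five rows shown in the example table, entered by hand
data = pd.DataFrame({
    "M1": [0.744, 0.813, 0.876, 0.890, 0.893],
    "Pinit": [134.470005, 225.000014, 111.550009, 512.700045, 102.630007],
    "qinit": [1.095729, 2.524845, 2.190000, 2.386059, 1.485857],
    "FeHinit": [-0.912521, -0.806781, -0.918768, -0.878982, -0.731017],
    "Pfinal": [294.031588, 153.634007, 104.970587, 394.729424, 228.613065],
    "qfinal": [0.608444, 1.031585, 0.912802, 1.396449, 0.640067],
    "product": ["He-WD", "He-WD", "He-WD", "HB", "He-WD"],
    "binary_type": ["single-lined"] * 4 + ["double-lined"],
})

features = ["M1", "Pinit", "qinit", "FeHinit"]
regressors = ["Pfinal", "qfinal"]
classifiers = ["product", "binary_type"]

# Drop every row that has a NaN in any column used for training
clean = data.dropna(subset=features + regressors + classifiers)
n_dropped = len(data) - len(clean)

# Features are expected to be numerical: list any that are not
non_numeric = [c for c in features
               if not pd.api.types.is_numeric_dtype(clean[c])]

print(f"dropped {n_dropped} rows, non-numeric features: {non_numeric}")
```

For the real dataset, data would instead come from pd.read_csv as shown earlier, and the same checks apply unchanged.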

Model setup

Now that we have our data, we can set up the model that we want to use. There are two predictors included in NNaPS: XGBoost and fully connected neural networks (NNs). XGBoost is an efficient gradient-boosted decision tree method. The fully connected NNs are implemented using Keras and TensorFlow. More info about these two types of models can be found in Predictors and setup.

Here we will use an NN predictor. There are two ways of providing the required setup to the predictor: you can provide a YAML file detailing the setup, or provide the setup as a dictionary.

from nnaps import predictors

# use a setup from file
predictor = predictors.FCPredictor(setup_file='setup.yaml')

# use a setup dictionary
predictor = predictors.FCPredictor(setup=setup_dictionary)

The training data can be provided as a pandas DataFrame directly to the constructor, or you can provide the filepath in the setup (either in the dictionary or in the setup file). If you want to provide the data directly, use the data keyword.

The simplest setup consists of the features, the targets and, optionally, the filepath to the training data:

datafile: 'tests/BesanconGalactic_summary.txt'
features:
   - M1
   - qinit
   - Pinit
   - FeHinit
regressors:
   - Pfinal
   - qfinal
classifiers:
   - product
   - binary_type
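For the dictionary route, a reasonable guess is that the dictionary mirrors the structure of the YAML file one to one. The sketch below writes the setup above as a plain Python dictionary under that assumption; the name setup_dictionary matches the earlier constructor example, and the exact set of accepted keys should be checked against Predictors and setup.

```python
# Assumption: the setup dictionary uses the same keys as the YAML file
setup_dictionary = {
    "datafile": "tests/BesanconGalactic_summary.txt",
    "features": ["M1", "qinit", "Pinit", "FeHinit"],
    "regressors": ["Pfinal", "qfinal"],
    "classifiers": ["product", "binary_type"],
}

# With NNaPS installed, this would be passed to the constructor as:
# predictor = predictors.FCPredictor(setup=setup_dictionary)
```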

You can set up preprocessors for both the features and the targets, but this is not necessary. If you don’t provide them, NNaPS will add defaults where necessary. For more info on the available options, see Predictors and setup.
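To give an idea of what a preprocessor for a categorical target does, the sketch below one-hot encodes the product column of the example table with pandas and then decodes it again. This only illustrates the general technique; it is not a description of the preprocessors that NNaPS applies internally, which are documented in Predictors and setup.

```python
import pandas as pd

# The 'product' column of the five example rows
product = pd.Series(["He-WD", "He-WD", "He-WD", "HB", "He-WD"], name="product")

# One-hot encoding: one indicator column per class
one_hot = pd.get_dummies(product)
print(one_hot)

# Decoding: pick the class whose indicator column is set
decoded = one_hot.astype(int).idxmax(axis=1)
print(decoded.tolist())
```

A neural network classifier typically predicts one value per indicator column, so the decoding step is what turns its output back into a class label.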

Model training and predicting

Training the model is as simple as calling the fit method, with any arguments specific to the model you have chosen. Targets for new data can then be predicted with the predict method:

predictor.fit(epochs=100)

predictions = predictor.predict(new_data)
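The new_data object used above is not defined in this section. A plausible form, assumed here rather than taken from the NNaPS documentation, is a pandas DataFrame holding one column per feature named in the setup. The sketch below builds such a table from the feature values of the first two example rows.

```python
import pandas as pd

features = ["M1", "Pinit", "qinit", "FeHinit"]

# Feature values of the first two rows of the example table
new_data = pd.DataFrame({
    "M1": [0.744, 0.813],
    "Pinit": [134.470005, 225.000014],
    "qinit": [1.095729, 2.524845],
    "FeHinit": [-0.912521, -0.806781],
})

# Make sure the columns match the features the model was set up with
missing = [c for c in features if c not in new_data.columns]
new_data = new_data[features]

# With a trained predictor (assumed to accept a DataFrame):
# predictions = predictor.predict(new_data)
```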

Checking the results

After training the model you will want to check the training process and the final score of the model.