mafese.wrapper package¶

mafese.wrapper.recursive module¶

class mafese.wrapper.recursive.RecursiveSelector(problem='classification', estimator='knn', estimator_paras=None, n_features=3, step=1, verbose=0, importance_getter='auto')[source]¶

Bases: mafese.selector.Selector

Defines a RecursiveSelector class that holds the recursive feature selection methods for feature selection problems.

Parameters

problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
estimator (str or Estimator instance (from scikit-learn or custom)) –
If estimator is a str, the currently supported values are:
- svm: support vector machine with kernel = ‘linear’
- rf: random forest
- adaboost: AdaBoost
- xgb: Gradient Boosting
- tree: Extra Trees
If estimator is an Estimator instance, make sure its fit method provides information about feature importance (e.g. coef_, feature_importances_).
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the best parameters for the selected estimator are used.
n_features (int or float, default=3) – The number of features to select. If None, half of the features are selected. If integer, the parameter is the absolute number of features to select. If float between 0 and 1, it is the fraction of features to select.
step (int or float, default=1) – If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.
verbose (int, default=0) – Controls verbosity of output.
importance_getter (str or callable, default='auto') –
If ‘auto’, uses the feature importance through either the coef_ or the feature_importances_ attribute of the estimator.

Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in the case of TransformedTargetRegressor, or named_steps.clf.feature_importances_ in the case of a sklearn.pipeline.Pipeline whose last step is named clf.

If callable, overrides the default feature importance getter. The callable is passed with the fitted estimator and it should return importance for each feature.
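To make the three forms of importance_getter concrete, here is a minimal sketch of how such a value could be resolved. The helper resolve_importance and the stand-in objects are hypothetical (they are not part of mafese or scikit-learn); only the use of operator.attrgetter for dotted paths comes from the description above.

```python
from operator import attrgetter
from types import SimpleNamespace


def resolve_importance(estimator, getter="auto"):
    """Illustrative helper (hypothetical, not mafese API)."""
    if callable(getter):
        # A callable receives the fitted estimator and returns the scores.
        return list(getter(estimator))
    if getter == "auto":
        # Try the two conventional attribute names in turn.
        for name in ("coef_", "feature_importances_"):
            if hasattr(estimator, name):
                return list(getattr(estimator, name))
        raise AttributeError("estimator exposes neither coef_ nor feature_importances_")
    # Dotted path such as "named_steps.clf.feature_importances_".
    return list(attrgetter(getter)(estimator))


# Stand-in objects that only mimic the attribute layout of fitted estimators.
tree_like = SimpleNamespace(feature_importances_=[0.5, 0.1, 0.4])
pipeline_like = SimpleNamespace(named_steps=SimpleNamespace(clf=tree_like))

auto_scores = resolve_importance(tree_like)
path_scores = resolve_importance(pipeline_like, "named_steps.clf.feature_importances_")
custom_scores = resolve_importance(
    tree_like, lambda est: [abs(v) for v in est.feature_importances_]
)
```

All three calls return the same list here; the point is only that a dotted string walks nested attributes, while a callable bypasses attribute lookup entirely.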

Examples

The following example shows how to retrieve the most informative features with the RecursiveSelector FS method.

>>> import pandas as pd
>>> from mafese.wrapper.recursive import RecursiveSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1]     # Assumes the last column is the label column
>>> # define mafese feature selection method
>>> feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)
>>> # find all relevant features
>>> feat_selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(feat_selector.selected_feature_masks)
array([ True, True, True, False, False, True, False, False, False, True])
>>> print(feat_selector.selected_feature_solution)
array([ 1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
>>> # check the index of selected features
>>> print(feat_selector.selected_feature_indexes)
array([ 0, 1, 2, 5, 9])
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = feat_selector.transform(X)
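The three attributes printed above describe the same selection in different encodings. The following sketch, using plain Python lists rather than NumPy arrays, shows how they relate; filter_columns is a hypothetical stand-in for what transform() does to the columns of X.

```python
# Boolean mask as printed in the example above (10 features, 5 selected).
mask = [True, True, True, False, False, True, False, False, False, True]

# The 0/1 "solution" vector carries the same information as integers.
solution = [int(flag) for flag in mask]

# The indexes are the positions where the mask is True.
indexes = [i for i, flag in enumerate(mask) if flag]


def filter_columns(rows, selected):
    """Keep only the selected columns of a row-major table."""
    return [[row[i] for i in selected] for row in rows]


X = [list(range(10)), list(range(10, 20))]  # two toy samples, ten features
X_filtered = filter_columns(X, indexes)
```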

SUPPORT = ['svm', 'rf', 'adaboost', 'xgb', 'tree']¶

fit(X, y=None)[source]¶

Learn the features to select from X.

Parameters

X (array-like of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
y (array-like of shape (n_samples,), default=None) – Target values. This parameter may be ignored for unsupervised learning.

Returns

self – Returns the instance itself.

Return type

object
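As a rough illustration of what recursive elimination involves, the sketch below repeatedly drops the least important features until the requested number remains. It is not the mafese implementation: resolve_count and recursive_elimination are hypothetical helpers, and the rounding rules for fractional n_features and step are assumptions modelled on the parameter descriptions above.

```python
def resolve_count(value, total, default):
    """Turn an int / float-fraction / None setting into an absolute count."""
    if value is None:
        return default
    if isinstance(value, float) and 0.0 < value < 1.0:
        # Assumed rounding: fraction of the total, rounded down, at least 1.
        return max(1, int(value * total))
    return int(value)


def recursive_elimination(importance_fn, n_total, n_features=None, step=1):
    """Drop the least important features until n_features remain.

    importance_fn(active) must return one score per active feature index.
    """
    target = resolve_count(n_features, n_total, default=n_total // 2)
    per_round = resolve_count(step, n_total, default=1)
    active = list(range(n_total))
    while len(active) > target:
        scores = importance_fn(active)
        # Never remove more than needed to reach the target.
        n_drop = min(per_round, len(active) - target)
        ranked = sorted(range(len(active)), key=lambda pos: scores[pos])
        to_drop = set(ranked[:n_drop])
        active = [idx for pos, idx in enumerate(active) if pos not in to_drop]
    return active


# Toy importance: a fixed weight per feature, standing in for a refitted model.
weights = [0.9, 0.05, 0.4, 0.01, 0.7, 0.3]


def toy_importance(active):
    return [weights[i] for i in active]


selected = recursive_elimination(toy_importance, 6, n_features=3, step=2)
half = recursive_elimination(toy_importance, 6, n_features=None, step=0.5)
```

In a real run the importance function would refit the estimator on the remaining features in every round, so the ranking can change between rounds; the fixed weights here keep the example deterministic.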

mafese.wrapper.sequential module¶

class mafese.wrapper.sequential.SequentialSelector(problem='classification', estimator='knn', estimator_paras=None, n_features=3, direction='forward', tol=None, scoring=None, cv=5, n_jobs=None)[source]¶

Bases: mafese.selector.Selector

Defines a SequentialSelector class that holds the forward and backward feature selection methods for feature selection problems.

Parameters

problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
estimator (str or Estimator instance (from scikit-learn or custom)) –
If estimator is a str, the currently supported values are:
- knn: k-nearest neighbors
- svm: support vector machine
- rf: random forest
- adaboost: AdaBoost
- xgb: Gradient Boosting
- tree: Extra Trees
- ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance, make sure its fit method provides information about feature importance (e.g. coef_, feature_importances_).
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters for the selected estimator are used.
n_features (int or float, default=3) – The number of features to select. If None, half of the features are selected. If integer, the parameter is the absolute number of features to select. If float between 0 and 1, it is the fraction of features to select.
direction ({'forward', 'backward'}, default='forward') – Whether to perform forward selection or backward selection.
tol (float, default=None) – If the score does not increase by at least tol between two consecutive feature additions or removals, stop adding or removing. tol can be negative when removing features with direction="backward", which can be useful for reducing the number of features at the cost of a small decrease in the score. tol is enabled only when n_features is "auto".
scoring (str or callable, default=None) – A single str (see the scikit-learn scoring parameter documentation) or a callable used to evaluate the predictions on the test set. Note that a custom scorer should return a single value. If None, the estimator’s score method is used.
cv (int, cross-validation generator or an iterable, default=5) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 5-fold cross validation,
- integer, to specify the number of folds in a (Stratified)KFold,
- CV splitter,
- An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.
n_jobs (int, default=None) – Number of jobs to run in parallel. When evaluating a new feature to add or remove, the cross-validation procedure is parallel over the folds. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
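The note on cv says that unshuffled splitters give the same splits on every call. A minimal pure-Python sketch of unshuffled K-fold index generation illustrates why; kfold_indices is a hypothetical helper, not the scikit-learn splitter, and it ignores stratification. Its output also has the shape of the fourth accepted cv input, an iterable of (train, test) index pairs.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


first_run = list(kfold_indices(7, n_splits=3))
second_run = list(kfold_indices(7, n_splits=3))
test_folds = [test for _, test in first_run]
```

Because no randomness is involved, first_run and second_run are identical, so scores computed on them are comparable across candidate feature subsets.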

Examples

The following example shows how to retrieve the most informative features with the Sequential-based (forward, backward) FS method.

>>> import pandas as pd
>>> from mafese.wrapper.sequential import SequentialSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1]     # Assumes the last column is the label column
>>> # define mafese feature selection method
>>> feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=5, direction="forward")
>>> # find all relevant features
>>> feat_selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(feat_selector.selected_feature_masks)
array([ True, True, True, False, False, True, False, False, False, True])
>>> print(feat_selector.selected_feature_solution)
array([ 1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
>>> # check the index of selected features
>>> print(feat_selector.selected_feature_indexes)
array([ 0, 1, 2, 5, 9])
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = feat_selector.transform(X)

SUPPORT = ['knn', 'svm', 'rf', 'adaboost', 'xgb', 'tree', 'ann']¶

fit(X, y=None)[source]¶

Learn the features to select from X.

Parameters

X (array-like of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
y (array-like of shape (n_samples,), default=None) – Target values. This parameter may be ignored for unsupervised learning.

Returns

self – Returns the instance itself.

Return type

object
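For intuition, greedy forward selection with an optional tol stopping rule can be sketched as follows. This is a simplified illustration, not the mafese or scikit-learn implementation: forward_selection and the toy score function are hypothetical, and treating n_features="auto" as "keep adding until the gain drops below tol" is an assumption based on the tol description above.

```python
def forward_selection(score_fn, n_total, n_features="auto", tol=None):
    """Greedily add the feature that most improves score_fn(subset)."""
    use_tol = n_features == "auto" and tol is not None
    limit = n_total if n_features == "auto" else n_features
    selected = []
    best_score = float("-inf")
    while len(selected) < limit:
        candidates = [i for i in range(n_total) if i not in selected]
        # Score every one-feature extension of the current subset.
        scored = [(score_fn(selected + [i]), i) for i in candidates]
        new_score, new_feature = max(scored)
        if use_tol and selected and new_score - best_score < tol:
            break  # improvement too small, stop early
        selected.append(new_feature)
        best_score = new_score
    return sorted(selected)


# Toy score standing in for a cross-validated metric: per-feature gains add
# up, with a small penalty per selected feature.
gains = {0: 0.30, 1: 0.02, 2: 0.25, 3: 0.01}


def toy_score(subset):
    return sum(gains[i] for i in subset) - 0.015 * len(subset)


fixed = forward_selection(toy_score, 4, n_features=3)
early = forward_selection(toy_score, 4, n_features="auto", tol=0.05)
```

With a fixed n_features the loop always runs to the requested size; with "auto" and tol it stops once the third feature would add less than 0.05 to the score. Backward selection follows the same pattern, starting from all features and removing one per round.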

mafese.wrapper.mha module¶

class mafese.wrapper.mha.MhaSelector(problem='classification', obj_name=None, estimator='knn', estimator_paras=None, optimizer='BaseGA', optimizer_paras=None, mode='single', n_workers=None, termination=None, seed=None, verbose=True)[source]¶

Bases: mafese.selector.Selector

Defines a MhaSelector class that holds the metaheuristic-based feature selection methods for feature selection problems.

Parameters

problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
obj_name (None or str, default=None) –
The name of the objective for the problem. The default depends on whether the problem is classification or regression.
- If problem is classification, None will be replaced by AS (Accuracy score).
- If problem is regression, None will be replaced by MSE (Mean squared error).
estimator (str or Estimator instance (from scikit-learn or custom)) –
If estimator is a str, the currently supported values are:
- knn: k-nearest neighbors
- svm: support vector machine
- rf: random forest
- adaboost: AdaBoost
- xgb: Gradient Boosting
- tree: Extra Trees
- ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance, make sure that it has fit and predict methods.
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters for the selected estimator are used.
optimizer (str or instance of Optimizer class (from Mealpy library), default = "BaseGA") – The metaheuristic algorithm used to solve the feature selection problem. For the list of currently supported algorithms, see https://github.com/thieu1995/mealpy. If a custom optimizer is passed, make sure it is an instance of the Optimizer class.
optimizer_paras (None or dict of parameters, default=None) – The parameters for the optimizer object. If None, the default parameters of the optimizer are used (defined in https://github.com/thieu1995/mealpy). If a dict is passed, make sure it has at least the epoch and pop_size parameters.
mode (str, default = 'single') –
The mode used by the Optimizer from the Mealpy library. Parallel: ‘process’, ‘thread’; sequential: ‘swarm’, ‘single’.
- ‘process’: parallel mode in which multiple cores run the tasks
- ‘thread’: parallel mode in which multiple threads run the tasks
- ‘swarm’: sequential mode that has no effect on the updating phase of other agents
- ‘single’: sequential mode that affects the updating phase of other agents (default)
n_workers (int or None, default = None) – The number of workers (cores or threads) used by the Optimizer. Only takes effect in the parallel modes.
termination (dict or None, default = None) – The termination dictionary or an instance of the Termination class, used by the Optimizer from the Mealpy library.
verbose (bool, default = True) – Controls verbosity of output.
seed (int or None) – Random seed for reproducibility.
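The defaulting rule for obj_name and the requirement on optimizer_paras can be sketched as follows. Both helpers are hypothetical and only mirror the descriptions above; the objective table is a small excerpt of the SUPPORT dictionary listed for this class, in which each objective name maps to 'max' or 'min'.

```python
# Small excerpt of the SUPPORT table listed for MhaSelector.
OBJECTIVES = {
    "classification": {"AS": "max", "F1S": "max", "HL": "min"},
    "regression": {"MSE": "min", "RMSE": "min", "R2": "max"},
}
DEFAULT_OBJECTIVE = {"classification": "AS", "regression": "MSE"}


def resolve_objective(problem, obj_name=None):
    """Return (name, direction) for a problem type; hypothetical helper."""
    name = DEFAULT_OBJECTIVE[problem] if obj_name is None else obj_name
    if name not in OBJECTIVES[problem]:
        raise ValueError(f"{name!r} is not a supported {problem} objective")
    return name, OBJECTIVES[problem][name]


def check_optimizer_paras(paras):
    """Ensure a user-supplied dict carries the two required keys."""
    if paras is None:
        return None  # fall back to the optimizer's own defaults
    missing = {"epoch", "pop_size"} - set(paras)
    if missing:
        raise ValueError(f"optimizer_paras is missing: {sorted(missing)}")
    return dict(paras)


default_cls = resolve_objective("classification")
default_reg = resolve_objective("regression")
paras = check_optimizer_paras({"epoch": 50, "pop_size": 20})
```

The epoch and pop_size values are arbitrary illustration values, not recommended settings.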

Examples

The following example shows how to retrieve the most informative features with the MhaSelector FS method.

>>> import pandas as pd
>>> from mafese.wrapper.mha import MhaSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1]     # Assumes the last column is the label column
>>> # define mafese feature selection method
>>> selector = MhaSelector(problem="classification", obj_name="AS", estimator="rf", optimizer="BaseGA")
>>> # find all relevant features (the sample output below has 5 selected)
>>> selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(selector.selected_feature_masks)
array([ True, True, True, False, False, True, False, False, False, True])
>>> print(selector.selected_feature_solution)
array([ 1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
>>> # check the index of selected features
>>> print(selector.selected_feature_indexes)
array([ 0, 1, 2, 5, 9])
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = selector.transform(X)

SUPPORT = {'classification_objective': {'AS': 'max', 'BSL': 'min', 'CEL': 'min', 'CKS': 'max', 'F1S': 'max', 'F2S': 'max', 'FBS': 'max', 'GINI': 'min', 'GMS': 'max', 'HL': 'min', 'HS': 'max', 'JSI': 'max', 'KLDL': 'min', 'LS': 'max', 'MCC': 'max', 'NPV': 'max', 'PS': 'max', 'ROC-AUC': 'max', 'RS': 'max', 'SS': 'max'}, 'estimator': ['knn', 'svm', 'rf', 'adaboost', 'xgb', 'tree', 'ann'], 'optimizer': ['OriginalABC', 'OriginalACOR', 'AugmentedAEO', 'EnhancedAEO', 'ImprovedAEO', 'ModifiedAEO', 'OriginalAEO', 'MGTO', 'OriginalAGTO', 'DevALO', 'OriginalALO', 'OriginalAO', 'OriginalAOA', 'IARO', 'LARO', 'OriginalARO', 'OriginalASO', 'OriginalAVOA', 'OriginalArchOA', 'AdaptiveBA', 'DevBA', 'OriginalBA', 'DevBBO', 'OriginalBBO', 'OriginalBBOA', 'OriginalBES', 'ABFO', 'OriginalBFO', 'OriginalBMO', 'DevBRO', 'OriginalBRO', 'OriginalBSA', 'ImprovedBSO', 'OriginalBSO', 'CleverBookBeesA', 'OriginalBeesA', 'ProbBeesA', 'OriginalCA', 'OriginalCDO', 'OriginalCEM', 'OriginalCGO', 'DevCHIO', 'OriginalCHIO', 'OriginalCOA', 'OCRO', 'OriginalCRO', 'OriginalCSA', 'OriginalCSO', 'OriginalCircleSA', 'OriginalCoatiOA', 'JADE', 'OriginalDE', 'SADE', 'SAP_DE', 'DevDMOA', 'OriginalDMOA', 'OriginalDO', 'DevEFO', 'OriginalEFO', 'OriginalEHO', 'AdaptiveEO', 'ModifiedEO', 'OriginalEO', 'OriginalEOA', 'LevyEP', 'OriginalEP', 'CMA_ES', 'LevyES', 'OriginalES', 'Simple_CMA_ES', 'OriginalESOA', 'OriginalEVO', 'OriginalFA', 'DevFBIO', 'OriginalFBIO', 'OriginalFFA', 'OriginalFFO', 'OriginalFLA', 'DevFOA', 'OriginalFOA', 'WhaleFOA', 'DevFOX', 'OriginalFOX', 'OriginalFPA', 'BaseGA', 'EliteMultiGA', 'EliteSingleGA', 'MultiGA', 'SingleGA', 'OriginalGBO', 'DevGCO', 'OriginalGCO', 'OriginalGJO', 'OriginalGOA', 'DevGSKA', 'OriginalGSKA', 'Matlab101GTO', 'Matlab102GTO', 'OriginalGTO', 'GWO_WOA', 'IGWO', 'OriginalGWO', 'RW_GWO', 'OriginalHBA', 'OriginalHBO', 'OriginalHC', 'SwarmHC', 'OriginalHCO', 'OriginalHGS', 'OriginalHGSO', 'OriginalHHO', 'DevHS', 'OriginalHS', 'OriginalICA', 'OriginalINFO', 'OriginalIWO', 
'DevJA', 'LevyJA', 'OriginalJA', 'DevLCO', 'ImprovedLCO', 'OriginalLCO', 'OriginalMA', 'OriginalMFO', 'OriginalMGO', 'OriginalMPA', 'OriginalMRFO', 'WMQIMRFO', 'OriginalMSA', 'DevMVO', 'OriginalMVO', 'OriginalNGO', 'ImprovedNMRA', 'OriginalNMRA', 'OriginalNRO', 'OriginalOOA', 'OriginalPFA', 'OriginalPOA', 'AIW_PSO', 'CL_PSO', 'C_PSO', 'HPSO_TVAC', 'LDW_PSO', 'OriginalPSO', 'P_PSO', 'OriginalPSS', 'DevQSA', 'ImprovedQSA', 'LevyQSA', 'OppoQSA', 'OriginalQSA', 'OriginalRIME', 'OriginalRUN', 'GaussianSA', 'OriginalSA', 'SwarmSA', 'DevSARO', 'OriginalSARO', 'DevSBO', 'OriginalSBO', 'DevSCA', 'OriginalSCA', 'QleSCA', 'OriginalSCSO', 'ImprovedSFO', 'OriginalSFO', 'L_SHADE', 'OriginalSHADE', 'OriginalSHIO', 'OriginalSHO', 'ImprovedSLO', 'ModifiedSLO', 'OriginalSLO', 'DevSMA', 'OriginalSMA', 'DevSOA', 'OriginalSOA', 'OriginalSOS', 'DevSPBO', 'OriginalSPBO', 'OriginalSRSR', 'DevSSA', 'OriginalSSA', 'OriginalSSDO', 'OriginalSSO', 'OriginalSSpiderA', 'OriginalSSpiderO', 'OriginalSTO', 'OriginalSeaHO', 'OriginalServalOA', 'OriginalTDO', 'DevTLO', 'ImprovedTLO', 'OriginalTLO', 'OriginalTOA', 'DevTPO', 'OriginalTS', 'OriginalTSA', 'OriginalTSO', 'EnhancedTWO', 'LevyTWO', 'OppoTWO', 'OriginalTWO', 'DevVCS', 'OriginalVCS', 'OriginalWCA', 'OriginalWDO', 'OriginalWHO', 'HI_WOA', 'OriginalWOA', 'OriginalWaOA', 'OriginalWarSO', 'OriginalZOA'], 'regression_objective': {'A10': 'max', 'A20': 'max', 'A30': 'max', 'ACOD': 'max', 'APCC': 'max', 'AR': 'max', 'AR2': 'max', 'CI': 'max', 'COD': 'max', 'COR': 'max', 'COV': 'max', 'CRM': 'min', 'DRV': 'min', 'EC': 'max', 'EVS': 'max', 'GINI': 'min', 'GINI_WIKI': 'min', 'JSD': 'min', 'KGE': 'max', 'MAAPE': 'min', 'MAE': 'min', 'MAPE': 'min', 'MASE': 'min', 'ME': 'min', 'MRB': 'min', 'MRE': 'min', 'MSE': 'min', 'MSLE': 'min', 'MedAE': 'min', 'NNSE': 'max', 'NRMSE': 'min', 'NSE': 'max', 'OI': 'max', 'PCC': 'max', 'PCD': 'max', 'R': 'max', 'R2': 'max', 'R2S': 'max', 'RAE': 'min', 'RMSE': 'min', 'RSE': 'min', 'RSQ': 'max', 'SMAPE': 'min', 'VAF': 'max', 
'WI': 'max'}, 'transfer_func': ['vstf_01', 'vstf_02', 'vstf_03', 'vstf_04', 'sstf_01', 'sstf_02', 'sstf_03', 'sstf_04']}¶

fit(X, y=None, test_size=0.2, fit_weights=(0.9, 0.1), transfer_func='vstf_01', fs_problem=None)[source]¶

Fit the MhaSelector to the data, performing feature selection based on the specified parameters.

Parameters

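X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values.
test_size (float, default=0.2) – The proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
fit_weights (tuple, default=(0.9, 0.1)) – The first weight is for the objective value and the second weight is for the number of features.
transfer_func (str or callable, default="vstf_01") – The transfer function used to convert a solution from float to integer. The supported names are listed under 'transfer_func' in SUPPORT.
fs_problem (None, callable, or FeatureSelectionProblem, default=None)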
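The default fit_weights=(0.9, 0.1) pairs one weight with the objective value and one with the number of features. One plausible way to combine them is a weighted sum, sketched below. The exact formula mafese uses is not stated here, so weighted_fitness is a hypothetical helper that assumes a minimised, error-type objective and uses the fraction of selected features as the second term.

```python
def weighted_fitness(objective_value, mask, fit_weights=(0.9, 0.1)):
    """Combine an error-type objective with the fraction of selected features."""
    w_obj, w_feat = fit_weights
    feature_ratio = sum(mask) / len(mask)
    return w_obj * objective_value + w_feat * feature_ratio


# Same error, different subset sizes: the smaller subset gets the lower
# (better) fitness under this assumed formulation.
small = weighted_fitness(0.10, [1, 0, 0, 1, 0])  # 2 of 5 features
large = weighted_fitness(0.10, [1, 1, 1, 1, 1])  # all 5 features
```

Under this reading, raising the second weight pushes the search toward smaller feature subsets at the expense of the objective.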

fit_transform(X, y=None, test_size=0.2, fit_weights=(0.9, 0.1), transfer_func='vstf_01', fs_problem=None)[source]¶: Fit the MhaSelector to the data and transform it by selecting the features.

get_best_information()[source]¶: Get the best information from the optimizer after fitting.

transform(X)[source]¶: Transform the input data X by selecting the features based on the fitted model.

class mafese.wrapper.mha.MultiMhaSelector(problem='classification', obj_name=None, estimator='knn', estimator_paras=None, list_optimizers=('BaseGA',), list_optimizer_paras=None, mode='single', n_workers=None, termination=None, seed=None, verbose=True)[source]¶

Bases: mafese.selector.Selector

A class for Multi Metaheuristic-based Feature Selection (MultiMhaSelector) methods.

SUPPORT¶

A dictionary containing supported estimators, transfer functions, objectives, and optimizers.

Type: dict

obj_name¶

The name of the objective metric for the problem. Defaults to “AS” for classification and “MSE” for regression.

Type: str or None

estimator¶

The machine learning model used for feature selection. Can be a string or an object with fit and predict methods.

Type: str or object

estimator_paras¶

Parameters for the estimator. If None, default parameters are used.

Type: dict or None

list_optimizers¶

A list of metaheuristic algorithms used for optimization.

Type: list or tuple

list_optimizer_paras¶

A list of dictionaries containing parameters for each optimizer. If None, default parameters are used.

Type: list or None

mode¶

The mode of optimization. Options: ‘single’, ‘swarm’, ‘process’, ‘thread’.

Type: str

n_workers¶

Number of workers for parallel optimization. Only applicable for parallel modes.

Type: int or None

termination¶

Termination criteria for the optimization process.

Type: dict or None

seed¶

Random seed for reproducibility.

Type: int or None

verbose¶

Controls verbosity of output.

Type: bool

_set_estimator(estimator, paras)[source]¶: Configures the estimator based on the input type.

_set_optimizers(list_optimizers, list_paras)[source]¶: Configures the optimizers based on the input type.

_set_metric(metric_name, list_supported_metrics)[source]¶: Validates and sets the objective metric.

fit(X, y, test_size, n_trials, n_jobs, fit_weights, transfer_func, save_path, save_results, fs_problem)[source]¶: Fits the feature selection model to the data using multiple optimizers.

transform(X, trial, model, all_models)[source]¶: Transforms the input data to include only selected features.

fit_transform(X, y, test_size, n_trials, n_jobs, fit_weights, transfer_func, save_path, save_results, fs_problem)[source]¶: Fits the model and transforms the input data.

evaluate(estimator, estimator_paras, data, metrics, save_path, verbose)[source]¶: Evaluates the selected features using the specified estimator and metrics.

export_boxplot_figures(xlabel, ylabel, title, show_legend, show_mean_only, exts)[source]¶: Exports boxplot figures comparing models.

export_convergence_figures(xlabel, ylabel, title, exts)[source]¶: Exports convergence figures for each trial.

Examples

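>>> import pandas as pd
>>> from mafese.wrapper.mha import MultiMhaSelector
>>> # load dataset (assumes the last column is the label column)
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1]
>>> selector = MultiMhaSelector(problem="classification", obj_name="AS",
...                             estimator="knn", list_optimizers=["BaseGA", "OriginalWOA"])
>>> # run every optimizer for 3 trials, using 2 processes
>>> selector.fit(X, y, n_trials=3, n_jobs=2)
>>> X_selected = selector.transform(X, trial=1, model="BaseGA")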

SUPPORT¶: The same dictionary as mafese.wrapper.mha.MhaSelector.SUPPORT, listed in full above.

evaluate(estimator=None, estimator_paras=None, data=None, metrics=None, save_path='history', verbose=False)[source]¶

Evaluate the new dataset. The estimator is re-trained on the training set, and the metrics for both the training and testing sets are returned.

Parameters

estimator (str or Estimator instance (from scikit-learn or custom)) –
If estimator is a str, the currently supported values are:
- knn: k-nearest neighbors
- svm: support vector machine
- rf: random forest
- adaboost: AdaBoost
- xgb: Gradient Boosting
- tree: Extra Trees
- ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance, make sure that it has fit and predict methods.
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters for the selected estimator are used.
data (Data) – An instance of the Data class. It must contain both a training set and a testing set.
metrics (tuple or list, default = None) – The metrics to compute, which depend on whether the problem is regression or classification. The supported metrics are listed at: https://github.com/thieu1995/permetrics
save_path (str, default="history") – The path to save the file
verbose (bool, default=False) – Print the results to console or not.

Returns

metrics_results – The metrics for both training and testing set.

Return type

dict

export_boxplot_figures(xlabel='Model', ylabel='Global best fitness value', title='Boxplot of comparison models', show_legend=True, show_mean_only=False, exts=('.png', '.pdf'))[source]¶

Export boxplot figures of the best fitness values for each model across trials.

Parameters

xlabel (str, default="Model") –
ylabel (str, default="Global best fitness value") –
title (str, default="Boxplot of comparison models") –
show_legend (bool, default=True) –
show_mean_only (bool, default=False) –
exts (tuple, default=(".png", ".pdf")) –

export_convergence_figures(xlabel='Epoch', ylabel='Fitness value', title='Convergence chart of comparison models', exts=('.png', '.pdf'))[source]¶

Export convergence figures for each trial and model.

Parameters

xlabel (str, default="Epoch") –
ylabel (str, default="Fitness value") –
title (str, default="Convergence chart of comparison models") –
exts (tuple, default=(".png", ".pdf")) –

fit(X, y=None, test_size=0.2, n_trials=2, n_jobs=None, fit_weights=(0.9, 0.1), transfer_func='vstf_01', save_path='history', save_results=True, fs_problem=None)[source]¶

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values.
test_size (float, default=0.2) – The proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
n_trials (int, default=2) – The number of trials to run for each optimizer. Each trial will use a different random seed.
n_jobs (int or None, default=None) – The number of processes used to speed up the computation (<=1 or None: sequential, >=2: parallel).
fit_weights (list, tuple or np.ndarray, default = (0.9, 0.1)) – The first weight is for objective value and the second weight is for the number of features
transfer_func (str or callable function, default="vstf_01") –
The transfer function used to convert a solution from float to integer. Currently supported:
- v-shape transfer function: “vstf_01”, “vstf_02”, “vstf_03”, “vstf_04”
- s-shape transfer function: “sstf_01”, “sstf_02”, “sstf_03”, “sstf_04”
If a callable is passed, make sure it returns list/tuple/np.ndarray values.
save_path (str, default="history") – The path to the folder that holds the results.
save_results (bool, default=True) – Whether to save the global best fitness and the loss (convergence/fitness) over the generations to a csv file.
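To illustrate the idea behind transfer_func, the sketch below maps a real-valued solution to a 0/1 feature mask with one generic V-shaped and one generic S-shaped function. Which exact formulas the names vstf_01 to sstf_04 refer to is not stated here, so |tanh(x)| and the logistic sigmoid are stand-ins, and the probabilistic binarize rule is an assumption.

```python
import math
import random


def v_shaped(x):
    """A generic V-shaped transfer function: |tanh(x)|, symmetric around 0."""
    return abs(math.tanh(x))


def s_shaped(x):
    """A generic S-shaped (logistic sigmoid) transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, transfer, rng):
    """Turn a real-valued position into a 0/1 feature mask.

    Each component is selected with probability transfer(component).
    """
    return [1 if rng.random() < transfer(x) else 0 for x in position]


rng = random.Random(0)  # seeded so repeated runs give the same mask
position = [-3.0, 0.0, 0.2, 4.0]
probs_v = [v_shaped(x) for x in position]
probs_s = [s_shaped(x) for x in position]
mask = binarize(position, s_shaped, rng)
```

A custom callable passed as transfer_func would play the role of v_shaped or s_shaped here, taking the solution values and returning values of the same length.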

fit_transform(X, y=None, test_size=0.2, n_trials=2, n_jobs=None, fit_weights=(0.9, 0.1), transfer_func='vstf_01', save_path='history', save_results=True, fs_problem=None)[source]¶: Fit the MultiMhaSelector to the data and transform it by selecting the features.

transform(X, trial=1, model='BaseGA', all_models=False)[source]¶

Transform the input data X by selecting the features based on the fitted model.

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) –
trial (int, default=1) –
model (str, default="BaseGA") –
all_models (bool, default=False) –