mafese.wrapper package
mafese.wrapper.recursive module
- class mafese.wrapper.recursive.RecursiveSelector(problem='classification', estimator='knn', estimator_paras=None, n_features=3, step=1, verbose=0, importance_getter='auto')
Bases: mafese.selector.Selector
Defines the RecursiveSelector class, which holds all recursive feature selection methods for feature selection problems.
- Parameters
problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
estimator (str or Estimator instance (from scikit-learn or custom)) –
- If estimator is a str, the following are currently supported:
svm: support vector machine with kernel = ‘linear’
rf: random forest
adaboost: AdaBoost
xgb: Gradient Boosting
tree: Extra Trees
If estimator is an Estimator instance: make sure it has a fit method that provides information about feature importance (e.g. coef_, feature_importances_).
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the best parameters for the selected estimator are used.
n_features (int or float, default=3) – The number of features to select. If None, half of the features are selected. If integer, the parameter is the absolute number of features to select. If float between 0 and 1, it is the fraction of features to select.
step (int or float, default=1) – If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.
verbose (int, default=0) – Controls verbosity of output.
importance_getter (str or callable, default='auto') –
If ‘auto’, uses the feature importance from either the coef_ or the feature_importances_ attribute of the estimator.
Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of sklearn.pipeline.Pipeline with its last step named clf.
If callable, overrides the default feature importance getter. The callable is passed the fitted estimator and should return the importance of each feature.
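As a rough illustration of the string and callable forms of importance_getter, the sketch below resolves a dotted attribute path with operator.attrgetter, as the description above states. The SimpleNamespace objects and the get_importances helper are hypothetical stand-ins for a fitted TransformedTargetRegressor and a fitted Pipeline; no real estimator is trained, and this is not mafese's own code.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical stand-ins for fitted estimators (not real scikit-learn objects).
fake_ttr = SimpleNamespace(regressor_=SimpleNamespace(coef_=[0.4, -1.2, 0.0]))
fake_pipeline = SimpleNamespace(
    named_steps=SimpleNamespace(
        clf=SimpleNamespace(feature_importances_=[0.7, 0.2, 0.1])
    )
)


def get_importances(fitted, getter):
    """Return per-feature importances from a dotted path string or a callable."""
    if callable(getter):
        # A callable receives the fitted estimator and returns the importances.
        return list(getter(fitted))
    # A string is treated as an attribute path and resolved with attrgetter.
    return list(attrgetter(getter)(fitted))


print(get_importances(fake_ttr, "regressor_.coef_"))
print(get_importances(fake_pipeline, "named_steps.clf.feature_importances_"))
print(get_importances(fake_ttr, lambda est: [abs(c) for c in est.regressor_.coef_]))
```

The same dotted-path idea extends to any nesting depth, which is why a path such as named_steps.clf.feature_importances_ can reach the last step of a pipeline.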
Examples
The following example shows how to retrieve the most informative features in the RecursiveSelector FS method
>>> import pandas as pd
>>> from mafese.wrapper.recursive import RecursiveSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1] # Assumption that the last column is label column
>>> # define mafese feature selection method
>>> feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)
>>> # find all relevant features
>>> feat_selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(feat_selector.selected_feature_masks)
array([ True, True, True, False, False, True, False, False, False, True])
>>> print(feat_selector.selected_feature_solution)
array([ 1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
>>> # check the index of selected features
>>> print(feat_selector.selected_feature_indexes)
array([ 0, 1, 2, 5, 9])
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = feat_selector.transform(X)
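To make the interaction between n_features and step more concrete, here is a minimal sketch of how many features remain after each elimination round. It assumes the same resolution rules as scikit-learn's RFE, which the parameter descriptions above mirror; whether mafese applies exactly these rules internally is an assumption, and elimination_schedule is a hypothetical helper, not part of mafese.

```python
def elimination_schedule(total, n_features=None, step=1):
    """List the number of features left after each recursive elimination round."""
    # Resolve the target count: None -> half, float -> fraction, int -> absolute.
    if n_features is None:
        target = total // 2
    elif isinstance(n_features, float):
        target = int(total * n_features)
    else:
        target = n_features

    # Resolve the step: a fraction is rounded down, but at least one feature is removed.
    if 0.0 < step < 1.0:
        per_round = int(max(1, step * total))
    else:
        per_round = int(step)

    remaining = total
    schedule = [remaining]
    while remaining > target:
        # Never remove more features than needed to reach the target.
        remaining -= min(per_round, remaining - target)
        schedule.append(remaining)
    return schedule


print(elimination_schedule(10, n_features=5, step=1))       # [10, 9, 8, 7, 6, 5]
print(elimination_schedule(10, n_features=0.5, step=0.25))  # [10, 8, 6, 5]
print(elimination_schedule(10, n_features=None, step=3))    # [10, 7, 5]
```

A larger step trades precision for speed: fewer refits of the estimator, but importances are re-ranked less often.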
- SUPPORT = ['svm', 'rf', 'adaboost', 'xgb', 'tree']
- fit(X, y=None)
Learn the features to select from X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
y (array-like of shape (n_samples,), default=None) – Target values. This parameter may be ignored for unsupervised learning.
- Returns
self – Returns the instance itself.
- Return type
object
mafese.wrapper.sequential module
- class mafese.wrapper.sequential.SequentialSelector(problem='classification', estimator='knn', estimator_paras=None, n_features=3, direction='forward', tol=None, scoring=None, cv=5, n_jobs=None)
Bases: mafese.selector.Selector
Defines the SequentialSelector class, which holds all forward and backward feature selection methods for feature selection problems.
- Parameters
problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
estimator (str or Estimator instance (from scikit-learn or custom)) –
- If estimator is a str, the following are currently supported:
knn: k-nearest neighbors
svm: support vector machine
rf: random forest
adaboost: AdaBoost
xgb: Gradient Boosting
tree: Extra Trees
ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance: make sure it has a fit method that provides information about feature importance (e.g. coef_, feature_importances_).
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters for the selected estimator are used.
n_features (int or float, default=3) – The number of features to select. If None, half of the features are selected. If integer, the parameter is the absolute number of features to select. If float between 0 and 1, it is the fraction of features to select.
direction ({'forward', 'backward'}, default='forward') – Whether to perform forward selection or backward selection.
tol (float, default=None) – If the score is not incremented by at least tol between two consecutive feature additions or removals, stop adding or removing. tol can be negative when removing features using direction="backward". This can be useful to reduce the number of features at the cost of a small decrease in the score. tol is enabled only when n_features is “auto”.
scoring (str or callable, default=None) – A single str (see scoring_parameter) or a callable to evaluate the predictions on the test set. NOTE that when using a custom scorer, it should return a single value. If None, the estimator’s score method is used.
cv (int, cross-validation generator or an iterable, default=5) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross validation,
integer, to specify the number of folds in a (Stratified)KFold,
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.
n_jobs (int, default=None) – Number of jobs to run in parallel. When evaluating a new feature to add or remove, the cross-validation procedure is parallel over the folds. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
Examples
The following example shows how to retrieve the most informative features in the Sequential-based (forward, backward) FS method
>>> import pandas as pd
>>> from mafese.wrapper.sequential import SequentialSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1] # Assumption that the last column is label column
>>> # define mafese feature selection method
>>> feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=5, direction="forward")
>>> # find all relevant features
>>> feat_selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(feat_selector.selected_feature_masks)
array([ True, True, True, False, False, True, False, False, False, True])
>>> print(feat_selector.selected_feature_solution)
array([ 1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
>>> # check the index of selected features
>>> print(feat_selector.selected_feature_indexes)
array([ 0, 1, 2, 5, 9])
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = feat_selector.transform(X)
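The difference between direction='forward' and direction='backward' can be sketched with a greedy loop. This is a simplified illustration, not mafese's implementation: the hypothetical toy_score function stands in for the cross-validated estimator score, and the tol stopping rule is left out.

```python
def sequential_select(n_total, n_select, score, direction="forward"):
    """Greedy sequential selection over feature indexes 0..n_total-1."""
    if direction == "forward":
        current = set()                  # start empty and add features
    elif direction == "backward":
        current = set(range(n_total))    # start full and remove features
    else:
        raise ValueError("direction must be 'forward' or 'backward'")

    while len(current) != n_select:
        if direction == "forward":
            candidates = [f for f in range(n_total) if f not in current]
            # Add the feature whose inclusion gives the best score.
            best = max(candidates, key=lambda f: score(current | {f}))
            current.add(best)
        else:
            candidates = sorted(current)
            # Remove the feature whose exclusion gives the best score.
            best = max(candidates, key=lambda f: score(current - {f}))
            current.remove(best)
    return sorted(current)


# Hypothetical per-feature usefulness; a real run would use a CV score instead.
WEIGHTS = [0.1, 0.9, 0.3, 0.7]


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset)


print(sequential_select(4, 2, toy_score, direction="forward"))   # [1, 3]
print(sequential_select(4, 2, toy_score, direction="backward"))  # [1, 3]
```

With an additive toy score both directions agree; with a real estimator they can select different subsets, and forward selection needs fewer fits when n_features is small relative to the total.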
- SUPPORT = ['knn', 'svm', 'rf', 'adaboost', 'xgb', 'tree', 'ann']
- fit(X, y=None)
Learn the features to select from X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
y (array-like of shape (n_samples,), default=None) – Target values. This parameter may be ignored for unsupervised learning.
- Returns
self – Returns the instance itself.
- Return type
object
mafese.wrapper.mha module
- class mafese.wrapper.mha.MhaSelector(problem='classification', obj_name=None, estimator='knn', estimator_paras=None, optimizer='BaseGA', optimizer_paras=None, mode='single', n_workers=None, termination=None, seed=None, verbose=True)
Bases: mafese.selector.Selector
Defines the MhaSelector class, which holds all metaheuristic-based feature selection methods for feature selection problems.
- Parameters
problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
obj_name (None or str, default=None) –
The name of the objective for the problem. Valid names depend on whether the problem is classification or regression.
If problem is classification, None will be replaced by AS (Accuracy score).
If problem is regression, None will be replaced by MSE (Mean squared error).
estimator (str or Estimator instance (from scikit-learn or custom)) –
- If estimator is a str, the following are currently supported:
knn: k-nearest neighbors
svm: support vector machine
rf: random forest
adaboost: AdaBoost
xgb: Gradient Boosting
tree: Extra Trees
ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance: make sure that it has fit and predict methods.
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters for the selected estimator are used.
optimizer (str or instance of Optimizer class (from Mealpy library), default = "BaseGA") – The metaheuristic algorithm used to solve the feature selection problem. For the current list of supported optimizers, see https://github.com/thieu1995/mealpy. If a custom optimizer is passed, make sure it is an instance of the Optimizer class.
optimizer_paras (None or dict of parameter, default=None) – The parameters for the optimizer object. If None, the default parameters of the optimizer are used (defined in https://github.com/thieu1995/mealpy). If a dict is passed, make sure it has at least the epoch and pop_size parameters.
mode (str, default = 'single') –
The mode used by the Optimizer from the Mealpy library. Parallel: ‘process’, ‘thread’; Sequential: ‘swarm’, ‘single’.
‘process’: The parallel mode in which multiple cores run the tasks
‘thread’: The parallel mode in which multiple threads run the tasks
‘swarm’: The sequential mode that has no effect on the updating phase of other agents
‘single’: The sequential mode that affects the updating phase of other agents (default)
n_workers (int or None, default = None) – The number of workers (cores or threads) used by the Optimizer (takes effect only in parallel modes).
termination (dict or None, default = None) – The termination dictionary or an instance of the Termination class. It is passed to the Optimizer from the Mealpy library.
verbose (bool, default = True) – Controls verbosity of output.
seed (int or None) – Random seed for reproducibility.
Examples
The following example shows how to retrieve the most informative features in the MhaSelector FS method
>>> import pandas as pd
>>> from mafese.wrapper.mha import MhaSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1] # Assumption that the last column is label column
>>> # define mafese feature selection method
>>> selector = MhaSelector(problem="classification", obj_name="AS", estimator="rf", optimizer="BaseGA")
>>> # find all relevant features - 5 features should be selected
>>> selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(selector.selected_feature_masks)
array([ True, True, True, False, False, True, False, False, False, True])
>>> print(selector.selected_feature_solution)
array([ 1, 1, 1, 0, 0, 1, 0, 0, 0, 1])
>>> # check the index of selected features
>>> print(selector.selected_feature_indexes)
array([ 0, 1, 2, 5, 9])
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = selector.transform(X)
- SUPPORT = {'classification_objective': {'AS': 'max', 'BSL': 'min', 'CEL': 'min', 'CKS': 'max', 'F1S': 'max', 'F2S': 'max', 'FBS': 'max', 'GINI': 'min', 'GMS': 'max', 'HL': 'min', 'HS': 'max', 'JSI': 'max', 'KLDL': 'min', 'LS': 'max', 'MCC': 'max', 'NPV': 'max', 'PS': 'max', 'ROC-AUC': 'max', 'RS': 'max', 'SS': 'max'}, 'estimator': ['knn', 'svm', 'rf', 'adaboost', 'xgb', 'tree', 'ann'], 'optimizer': ['OriginalABC', 'OriginalACOR', 'AugmentedAEO', 'EnhancedAEO', 'ImprovedAEO', 'ModifiedAEO', 'OriginalAEO', 'MGTO', 'OriginalAGTO', 'DevALO', 'OriginalALO', 'OriginalAO', 'OriginalAOA', 'IARO', 'LARO', 'OriginalARO', 'OriginalASO', 'OriginalAVOA', 'OriginalArchOA', 'AdaptiveBA', 'DevBA', 'OriginalBA', 'DevBBO', 'OriginalBBO', 'OriginalBBOA', 'OriginalBES', 'ABFO', 'OriginalBFO', 'OriginalBMO', 'DevBRO', 'OriginalBRO', 'OriginalBSA', 'ImprovedBSO', 'OriginalBSO', 'CleverBookBeesA', 'OriginalBeesA', 'ProbBeesA', 'OriginalCA', 'OriginalCDO', 'OriginalCEM', 'OriginalCGO', 'DevCHIO', 'OriginalCHIO', 'OriginalCOA', 'OCRO', 'OriginalCRO', 'OriginalCSA', 'OriginalCSO', 'OriginalCircleSA', 'OriginalCoatiOA', 'JADE', 'OriginalDE', 'SADE', 'SAP_DE', 'DevDMOA', 'OriginalDMOA', 'OriginalDO', 'DevEFO', 'OriginalEFO', 'OriginalEHO', 'AdaptiveEO', 'ModifiedEO', 'OriginalEO', 'OriginalEOA', 'LevyEP', 'OriginalEP', 'CMA_ES', 'LevyES', 'OriginalES', 'Simple_CMA_ES', 'OriginalESOA', 'OriginalEVO', 'OriginalFA', 'DevFBIO', 'OriginalFBIO', 'OriginalFFA', 'OriginalFFO', 'OriginalFLA', 'DevFOA', 'OriginalFOA', 'WhaleFOA', 'DevFOX', 'OriginalFOX', 'OriginalFPA', 'BaseGA', 'EliteMultiGA', 'EliteSingleGA', 'MultiGA', 'SingleGA', 'OriginalGBO', 'DevGCO', 'OriginalGCO', 'OriginalGJO', 'OriginalGOA', 'DevGSKA', 'OriginalGSKA', 'Matlab101GTO', 'Matlab102GTO', 'OriginalGTO', 'GWO_WOA', 'IGWO', 'OriginalGWO', 'RW_GWO', 'OriginalHBA', 'OriginalHBO', 'OriginalHC', 'SwarmHC', 'OriginalHCO', 'OriginalHGS', 'OriginalHGSO', 'OriginalHHO', 'DevHS', 'OriginalHS', 'OriginalICA', 'OriginalINFO', 'OriginalIWO', 
'DevJA', 'LevyJA', 'OriginalJA', 'DevLCO', 'ImprovedLCO', 'OriginalLCO', 'OriginalMA', 'OriginalMFO', 'OriginalMGO', 'OriginalMPA', 'OriginalMRFO', 'WMQIMRFO', 'OriginalMSA', 'DevMVO', 'OriginalMVO', 'OriginalNGO', 'ImprovedNMRA', 'OriginalNMRA', 'OriginalNRO', 'OriginalOOA', 'OriginalPFA', 'OriginalPOA', 'AIW_PSO', 'CL_PSO', 'C_PSO', 'HPSO_TVAC', 'LDW_PSO', 'OriginalPSO', 'P_PSO', 'OriginalPSS', 'DevQSA', 'ImprovedQSA', 'LevyQSA', 'OppoQSA', 'OriginalQSA', 'OriginalRIME', 'OriginalRUN', 'GaussianSA', 'OriginalSA', 'SwarmSA', 'DevSARO', 'OriginalSARO', 'DevSBO', 'OriginalSBO', 'DevSCA', 'OriginalSCA', 'QleSCA', 'OriginalSCSO', 'ImprovedSFO', 'OriginalSFO', 'L_SHADE', 'OriginalSHADE', 'OriginalSHIO', 'OriginalSHO', 'ImprovedSLO', 'ModifiedSLO', 'OriginalSLO', 'DevSMA', 'OriginalSMA', 'DevSOA', 'OriginalSOA', 'OriginalSOS', 'DevSPBO', 'OriginalSPBO', 'OriginalSRSR', 'DevSSA', 'OriginalSSA', 'OriginalSSDO', 'OriginalSSO', 'OriginalSSpiderA', 'OriginalSSpiderO', 'OriginalSTO', 'OriginalSeaHO', 'OriginalServalOA', 'OriginalTDO', 'DevTLO', 'ImprovedTLO', 'OriginalTLO', 'OriginalTOA', 'DevTPO', 'OriginalTS', 'OriginalTSA', 'OriginalTSO', 'EnhancedTWO', 'LevyTWO', 'OppoTWO', 'OriginalTWO', 'DevVCS', 'OriginalVCS', 'OriginalWCA', 'OriginalWDO', 'OriginalWHO', 'HI_WOA', 'OriginalWOA', 'OriginalWaOA', 'OriginalWarSO', 'OriginalZOA'], 'regression_objective': {'A10': 'max', 'A20': 'max', 'A30': 'max', 'ACOD': 'max', 'APCC': 'max', 'AR': 'max', 'AR2': 'max', 'CI': 'max', 'COD': 'max', 'COR': 'max', 'COV': 'max', 'CRM': 'min', 'DRV': 'min', 'EC': 'max', 'EVS': 'max', 'GINI': 'min', 'GINI_WIKI': 'min', 'JSD': 'min', 'KGE': 'max', 'MAAPE': 'min', 'MAE': 'min', 'MAPE': 'min', 'MASE': 'min', 'ME': 'min', 'MRB': 'min', 'MRE': 'min', 'MSE': 'min', 'MSLE': 'min', 'MedAE': 'min', 'NNSE': 'max', 'NRMSE': 'min', 'NSE': 'max', 'OI': 'max', 'PCC': 'max', 'PCD': 'max', 'R': 'max', 'R2': 'max', 'R2S': 'max', 'RAE': 'min', 'RMSE': 'min', 'RSE': 'min', 'RSQ': 'max', 'SMAPE': 'min', 'VAF': 'max', 
'WI': 'max'}, 'transfer_func': ['vstf_01', 'vstf_02', 'vstf_03', 'vstf_04', 'sstf_01', 'sstf_02', 'sstf_03', 'sstf_04']}
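The relationship between problem, obj_name, and the SUPPORT dictionary can be sketched as follows. The excerpt copies a few entries from SUPPORT above, and the defaults follow the obj_name description (AS for classification, MSE for regression). The resolve_objective helper and the DEFAULT_OBJECTIVE table are hypothetical names introduced for illustration; the library's own lookup may differ.

```python
# A few entries copied from the SUPPORT dictionary above.
SUPPORT_EXCERPT = {
    "classification_objective": {"AS": "max", "F1S": "max", "HL": "min"},
    "regression_objective": {"MSE": "min", "RMSE": "min", "R2": "max"},
}

# Defaults stated in the obj_name parameter description.
DEFAULT_OBJECTIVE = {"classification": "AS", "regression": "MSE"}


def resolve_objective(problem, obj_name=None):
    """Return (objective name, optimization direction) for a problem type."""
    if problem not in DEFAULT_OBJECTIVE:
        raise ValueError(f"Unsupported problem: {problem!r}")
    name = DEFAULT_OBJECTIVE[problem] if obj_name is None else obj_name
    table = SUPPORT_EXCERPT[f"{problem}_objective"]
    if name not in table:
        raise ValueError(f"{name!r} is not a supported {problem} objective")
    return name, table[name]


print(resolve_objective("classification"))    # ('AS', 'max')
print(resolve_objective("regression"))        # ('MSE', 'min')
print(resolve_objective("regression", "R2"))  # ('R2', 'max')
```

The 'max' or 'min' value tells the optimizer whether a larger or a smaller objective value is better, so an accuracy-style metric and an error-style metric can share the same search loop.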
- fit(X, y=None, test_size=0.2, fit_weights=(0.9, 0.1), transfer_func='vstf_01', fs_problem=None)
Fit the MhaSelector to the data, performing feature selection based on the specified parameters.
- Parameters
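X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values.
test_size (float, default=0.2) – The proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
fit_weights (tuple, default=(0.9, 0.1)) – The first weight is for the objective value and the second weight is for the number of features.
transfer_func (str or callable, default="vstf_01") – The transfer function used to convert a solution from float to integer. Supported names are listed under 'transfer_func' in SUPPORT.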
fs_problem (None, callable, or FeatureSelectionProblem, default=None) –
- class mafese.wrapper.mha.MultiMhaSelector(problem='classification', obj_name=None, estimator='knn', estimator_paras=None, list_optimizers=('BaseGA',), list_optimizer_paras=None, mode='single', n_workers=None, termination=None, seed=None, verbose=True)
Bases: mafese.selector.Selector
A class for Multi Metaheuristic-based Feature Selection (MultiMhaSelector) methods.
- SUPPORT
A dictionary containing supported estimators, transfer functions, objectives, and optimizers.
- Type
dict
- obj_name
The name of the objective metric for the problem. Defaults to “AS” for classification and “MSE” for regression.
- Type
str or None
- estimator
The machine learning model used for feature selection. Can be a string or an object with fit and predict methods.
- Type
str or object
- estimator_paras
Parameters for the estimator. If None, default parameters are used.
- Type
dict or None
- list_optimizers
A list of metaheuristic algorithms used for optimization.
- Type
list or tuple
- list_optimizer_paras
A list of dictionaries containing parameters for each optimizer. If None, default parameters are used.
- Type
list or None
- mode
The mode of optimization. Options: ‘single’, ‘swarm’, ‘process’, ‘thread’.
- Type
str
- n_workers
Number of workers for parallel optimization. Only applicable for parallel modes.
- Type
int or None
- termination
Termination criteria for the optimization process.
- Type
dict or None
- seed
Random seed for reproducibility.
- Type
int or None
- verbose
Controls verbosity of output.
- Type
bool
- _set_optimizers(list_optimizers, list_paras)
Configures the optimizers based on the input type.
- fit(X, y, test_size, n_trials, n_jobs, fit_weights, transfer_func, save_path, save_results, fs_problem)
Fits the feature selection model to the data using multiple optimizers.
- transform(X, trial, model, all_models)
Transforms the input data to include only selected features.
- fit_transform(X, y, test_size, n_trials, n_jobs, fit_weights, transfer_func, save_path, save_results, fs_problem)
Fits the model and transforms the input data.
- evaluate(estimator, estimator_paras, data, metrics, save_path, verbose)
Evaluates the selected features using the specified estimator and metrics.
- export_boxplot_figures(xlabel, ylabel, title, show_legend, show_mean_only, exts)
Exports boxplot figures comparing models.
- export_convergence_figures(xlabel, ylabel, title, exts)
Exports convergence figures for each trial.
Examples
>>> from mafese.wrapper.mha import MultiMhaSelector
>>> selector = MultiMhaSelector(problem="classification", obj_name="AS",
...                             estimator="knn", list_optimizers=["BaseGA", "OriginalWOA"])
>>> selector.fit(X, y, n_trials=3, n_jobs=2)
>>> X_selected = selector.transform(X, trial=1, model="BaseGA")
- SUPPORT = {'classification_objective': {'AS': 'max', 'BSL': 'min', 'CEL': 'min', 'CKS': 'max', 'F1S': 'max', 'F2S': 'max', 'FBS': 'max', 'GINI': 'min', 'GMS': 'max', 'HL': 'min', 'HS': 'max', 'JSI': 'max', 'KLDL': 'min', 'LS': 'max', 'MCC': 'max', 'NPV': 'max', 'PS': 'max', 'ROC-AUC': 'max', 'RS': 'max', 'SS': 'max'}, 'estimator': ['knn', 'svm', 'rf', 'adaboost', 'xgb', 'tree', 'ann'], 'optimizer': ['OriginalABC', 'OriginalACOR', 'AugmentedAEO', 'EnhancedAEO', 'ImprovedAEO', 'ModifiedAEO', 'OriginalAEO', 'MGTO', 'OriginalAGTO', 'DevALO', 'OriginalALO', 'OriginalAO', 'OriginalAOA', 'IARO', 'LARO', 'OriginalARO', 'OriginalASO', 'OriginalAVOA', 'OriginalArchOA', 'AdaptiveBA', 'DevBA', 'OriginalBA', 'DevBBO', 'OriginalBBO', 'OriginalBBOA', 'OriginalBES', 'ABFO', 'OriginalBFO', 'OriginalBMO', 'DevBRO', 'OriginalBRO', 'OriginalBSA', 'ImprovedBSO', 'OriginalBSO', 'CleverBookBeesA', 'OriginalBeesA', 'ProbBeesA', 'OriginalCA', 'OriginalCDO', 'OriginalCEM', 'OriginalCGO', 'DevCHIO', 'OriginalCHIO', 'OriginalCOA', 'OCRO', 'OriginalCRO', 'OriginalCSA', 'OriginalCSO', 'OriginalCircleSA', 'OriginalCoatiOA', 'JADE', 'OriginalDE', 'SADE', 'SAP_DE', 'DevDMOA', 'OriginalDMOA', 'OriginalDO', 'DevEFO', 'OriginalEFO', 'OriginalEHO', 'AdaptiveEO', 'ModifiedEO', 'OriginalEO', 'OriginalEOA', 'LevyEP', 'OriginalEP', 'CMA_ES', 'LevyES', 'OriginalES', 'Simple_CMA_ES', 'OriginalESOA', 'OriginalEVO', 'OriginalFA', 'DevFBIO', 'OriginalFBIO', 'OriginalFFA', 'OriginalFFO', 'OriginalFLA', 'DevFOA', 'OriginalFOA', 'WhaleFOA', 'DevFOX', 'OriginalFOX', 'OriginalFPA', 'BaseGA', 'EliteMultiGA', 'EliteSingleGA', 'MultiGA', 'SingleGA', 'OriginalGBO', 'DevGCO', 'OriginalGCO', 'OriginalGJO', 'OriginalGOA', 'DevGSKA', 'OriginalGSKA', 'Matlab101GTO', 'Matlab102GTO', 'OriginalGTO', 'GWO_WOA', 'IGWO', 'OriginalGWO', 'RW_GWO', 'OriginalHBA', 'OriginalHBO', 'OriginalHC', 'SwarmHC', 'OriginalHCO', 'OriginalHGS', 'OriginalHGSO', 'OriginalHHO', 'DevHS', 'OriginalHS', 'OriginalICA', 'OriginalINFO', 'OriginalIWO', 
'DevJA', 'LevyJA', 'OriginalJA', 'DevLCO', 'ImprovedLCO', 'OriginalLCO', 'OriginalMA', 'OriginalMFO', 'OriginalMGO', 'OriginalMPA', 'OriginalMRFO', 'WMQIMRFO', 'OriginalMSA', 'DevMVO', 'OriginalMVO', 'OriginalNGO', 'ImprovedNMRA', 'OriginalNMRA', 'OriginalNRO', 'OriginalOOA', 'OriginalPFA', 'OriginalPOA', 'AIW_PSO', 'CL_PSO', 'C_PSO', 'HPSO_TVAC', 'LDW_PSO', 'OriginalPSO', 'P_PSO', 'OriginalPSS', 'DevQSA', 'ImprovedQSA', 'LevyQSA', 'OppoQSA', 'OriginalQSA', 'OriginalRIME', 'OriginalRUN', 'GaussianSA', 'OriginalSA', 'SwarmSA', 'DevSARO', 'OriginalSARO', 'DevSBO', 'OriginalSBO', 'DevSCA', 'OriginalSCA', 'QleSCA', 'OriginalSCSO', 'ImprovedSFO', 'OriginalSFO', 'L_SHADE', 'OriginalSHADE', 'OriginalSHIO', 'OriginalSHO', 'ImprovedSLO', 'ModifiedSLO', 'OriginalSLO', 'DevSMA', 'OriginalSMA', 'DevSOA', 'OriginalSOA', 'OriginalSOS', 'DevSPBO', 'OriginalSPBO', 'OriginalSRSR', 'DevSSA', 'OriginalSSA', 'OriginalSSDO', 'OriginalSSO', 'OriginalSSpiderA', 'OriginalSSpiderO', 'OriginalSTO', 'OriginalSeaHO', 'OriginalServalOA', 'OriginalTDO', 'DevTLO', 'ImprovedTLO', 'OriginalTLO', 'OriginalTOA', 'DevTPO', 'OriginalTS', 'OriginalTSA', 'OriginalTSO', 'EnhancedTWO', 'LevyTWO', 'OppoTWO', 'OriginalTWO', 'DevVCS', 'OriginalVCS', 'OriginalWCA', 'OriginalWDO', 'OriginalWHO', 'HI_WOA', 'OriginalWOA', 'OriginalWaOA', 'OriginalWarSO', 'OriginalZOA'], 'regression_objective': {'A10': 'max', 'A20': 'max', 'A30': 'max', 'ACOD': 'max', 'APCC': 'max', 'AR': 'max', 'AR2': 'max', 'CI': 'max', 'COD': 'max', 'COR': 'max', 'COV': 'max', 'CRM': 'min', 'DRV': 'min', 'EC': 'max', 'EVS': 'max', 'GINI': 'min', 'GINI_WIKI': 'min', 'JSD': 'min', 'KGE': 'max', 'MAAPE': 'min', 'MAE': 'min', 'MAPE': 'min', 'MASE': 'min', 'ME': 'min', 'MRB': 'min', 'MRE': 'min', 'MSE': 'min', 'MSLE': 'min', 'MedAE': 'min', 'NNSE': 'max', 'NRMSE': 'min', 'NSE': 'max', 'OI': 'max', 'PCC': 'max', 'PCD': 'max', 'R': 'max', 'R2': 'max', 'R2S': 'max', 'RAE': 'min', 'RMSE': 'min', 'RSE': 'min', 'RSQ': 'max', 'SMAPE': 'min', 'VAF': 'max', 
'WI': 'max'}, 'transfer_func': ['vstf_01', 'vstf_02', 'vstf_03', 'vstf_04', 'sstf_01', 'sstf_02', 'sstf_03', 'sstf_04']}
- evaluate(estimator=None, estimator_paras=None, data=None, metrics=None, save_path='history', verbose=False)
Evaluate the new dataset. The estimator is re-trained on the training set, and the metrics for both the training and testing sets are returned.
- Parameters
estimator (str or Estimator instance (from scikit-learn or custom)) –
- If estimator is a str, the following are currently supported:
knn: k-nearest neighbors
svm: support vector machine
rf: random forest
adaboost: AdaBoost
xgb: Gradient Boosting
tree: Extra Trees
ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance: make sure that it has fit and predict methods.
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters for the selected estimator are used.
data (Data) – An instance of the Data class. It must have training and testing sets.
metrics (tuple, list, default = None) – Depends on whether you are tackling regression or classification. The supported metrics can be found at: https://github.com/thieu1995/permetrics
save_path (str, default="history") – The path to save the file
verbose (bool, default=False) – Print the results to console or not.
- Returns
metrics_results – The metrics for both the training and testing sets.
- Return type
dict
- export_boxplot_figures(xlabel='Model', ylabel='Global best fitness value', title='Boxplot of comparison models', show_legend=True, show_mean_only=False, exts=('.png', '.pdf'))
Export boxplot figures of the best fitness values for each model across trials.
- Parameters
xlabel (str, default="Model") –
ylabel (str, default="Global best fitness value") –
title (str, default="Boxplot of comparison models") –
show_legend (bool, default=True) –
show_mean_only (bool, default=False) –
exts (tuple, default=(".png", ".pdf")) –
- export_convergence_figures(xlabel='Epoch', ylabel='Fitness value', title='Convergence chart of comparison models', exts=('.png', '.pdf'))
Export convergence figures for each trial and model.
- Parameters
xlabel (str, default="Epoch") –
ylabel (str, default="Fitness value") –
title (str, default="Convergence chart of comparison models") –
exts (tuple, default=(".png", ".pdf")) –
- fit(X, y=None, test_size=0.2, n_trials=2, n_jobs=None, fit_weights=(0.9, 0.1), transfer_func='vstf_01', save_path='history', save_results=True, fs_problem=None)
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values.
test_size (float, default=0.2) – The proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
n_trials (int, default=2) – The number of trials to run for each optimizer. Each trial will use a different random seed.
n_jobs (int or None, default=None) – Number of processes used to speed up the computation (<=1 or None: sequential, >=2: parallel).
fit_weights (list, tuple or np.ndarray, default = (0.9, 0.1)) – The first weight is for objective value and the second weight is for the number of features
transfer_func (str or callable function, default="vstf_01") –
- The transfer function used to convert a solution from float to integer. Currently supported:
v-shape transfer function: “vstf_01”, “vstf_02”, “vstf_03”, “vstf_04”
s-shape transfer function: “sstf_01”, “sstf_02”, “sstf_03”, “sstf_04”
If a callable is passed, make sure it returns a list/tuple/np.ndarray of values.
save_path (str, default="history") – The path to the folder that holds the results.
save_results (bool, default=True) – Whether to save the global best fitness and loss (convergence/fitness) over generations to a csv file.
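The roles of transfer_func and fit_weights can be illustrated with a small sketch. Everything here is an assumption made for illustration: the exact formulas behind “vstf_01” or “sstf_01” are not given in this reference, so generic v-shaped (|tanh(x)|) and s-shaped (sigmoid) functions are used, a fixed 0.5 threshold replaces the random comparison a metaheuristic would normally make, and the weighted sum is one common way to combine an error value with the share of selected features.

```python
import math


def v_shape(x):
    """Generic v-shaped transfer function (assumed form, not mafese's vstf_01)."""
    return abs(math.tanh(x))


def s_shape(x):
    """Generic s-shaped transfer function (assumed form, not mafese's sstf_01)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, transfer, threshold=0.5):
    """Map a real-valued solution to a 0/1 feature mask (deterministic variant)."""
    return [1 if transfer(x) >= threshold else 0 for x in position]


def weighted_fitness(error, mask, weights=(0.9, 0.1)):
    """Combine an error to minimize with the fraction of selected features."""
    feature_ratio = sum(mask) / len(mask)
    return weights[0] * error + weights[1] * feature_ratio


position = [-2.0, 0.1, 1.5]
print(binarize(position, v_shape))  # [1, 0, 1]
print(binarize(position, s_shape))  # [0, 1, 1]
print(weighted_fitness(0.2, [1, 0, 1, 0]))
```

A v-shaped function treats large negative and large positive values alike, while an s-shaped function selects a feature only for large positive values; the second weight penalizes masks that keep many features.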
- fit_transform(X, y=None, test_size=0.2, n_trials=2, n_jobs=None, fit_weights=(0.9, 0.1), transfer_func='vstf_01', save_path='history', save_results=True, fs_problem=None)
Fit the MultiMhaSelector to the data and transform it by selecting the features.
- transform(X, trial=1, model='BaseGA', all_models=False)
Transform the input data X by selecting the features based on the fitted model.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) –
trial (int, default=1) –
model (str, default="BaseGA") –
all_models (bool, default=False) –