MAFESE Library

mafese.selector

class mafese.selector.Selector(problem='classification')

Bases: abc.ABC

Defines the abstract base class for feature selectors.

SUPPORTED_CLASSIFICATION_METRICS = ['PS', 'NPV', 'RS', 'AS', 'F1S', 'F2S', 'FBS', 'SS', 'MCC', 'HS', 'CKS', 'JSI', 'GMS', 'ROC-AUC', 'LS', 'GINI', 'CEL', 'HL', 'KLDL', 'BSL']

SUPPORTED_ESTIMATORS = ['knn', 'svm', 'rf', 'adaboost', 'xgb', 'tree', 'ann']

SUPPORTED_PROBLEMS = ['classification', 'regression']

SUPPORTED_REGRESSION_METRICS = ['EVS', 'ME', 'MAE', 'MSE', 'RMSE', 'MSLE', 'MedAE', 'MRE', 'MRB', 'MAPE', 'SMAPE', 'MAAPE', 'MASE', 'NSE', 'NNSE', 'WI', 'R', 'PCC', 'AR', 'APCC', 'R2S', 'RSQ', 'R2', 'COD', 'AR2', 'ACOD', 'CI', 'DRV', 'KGE', 'GINI', 'GINI_WIKI', 'PCD', 'JSD', 'VAF', 'RAE', 'A10', 'A20', 'A30', 'NRMSE', 'RSE', 'COV', 'COR', 'EC', 'OI', 'CRM']

evaluate(estimator=None, estimator_paras=None, data=None, metrics=None)

Evaluate the new dataset. The estimator is re-trained on the training set, and the metrics for both the training and testing sets are returned.

Parameters

estimator (str or Estimator instance (from scikit-learn or custom)) –
If estimator is a str, the currently supported values are:
- knn: k-nearest neighbors
- svm: support vector machine
- rf: random forest
- adaboost: AdaBoost
- xgb: Gradient Boosting
- tree: Extra Trees
- ann: Artificial Neural Network (Multi-Layer Perceptron)
If estimator is an Estimator instance, it must provide fit and predict methods.
estimator_paras (None or dict, default = None) – The parameters of the estimator; see the official scikit-learn documentation for the selected estimator. If None, the default parameters of the selected estimator are used.
data (Data) – An instance of the Data class. It must contain both a training set and a testing set.
metrics (tuple, list, default = None) – The metrics to compute, depending on whether the problem is regression or classification. The supported metrics are listed at: https://github.com/thieu1995/permetrics

Returns

metrics_results – The metrics for both training and testing set.

Return type

dict
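
The estimator argument accepts either one of the names in SUPPORTED_ESTIMATORS or any object that exposes fit and predict. The sketch below illustrates that rule in plain Python. Both check_estimator_arg and MeanRegressor are hypothetical names written for this illustration; they are not part of MAFESE, and the library's own validation may differ.

```python
SUPPORTED_ESTIMATORS = ["knn", "svm", "rf", "adaboost", "xgb", "tree", "ann"]


def check_estimator_arg(estimator):
    """Hypothetical helper: return the argument if it is acceptable, else raise."""
    if isinstance(estimator, str):
        if estimator not in SUPPORTED_ESTIMATORS:
            raise ValueError(f"Unsupported estimator name: {estimator!r}")
        return estimator
    # Duck typing: a custom estimator only needs callable fit and predict.
    for method in ("fit", "predict"):
        if not callable(getattr(estimator, method, None)):
            raise TypeError(f"Estimator is missing a callable {method}() method")
    return estimator


class MeanRegressor:
    """Toy custom estimator that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]
```

An object such as MeanRegressor() passes the check because it provides both methods, whereas an unknown string or an object without predict is rejected.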

fit(X, y=None)

Learn the features to select from X.

Parameters

X (array-like of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
y (array-like of shape (n_samples,), default=None) – Target values. This parameter may be ignored for unsupervised learning.

Returns

self – Returns the instance itself.

Return type

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

name = 'Feature Selector'

transform(X)

Reduce X to the selected features.

Parameters: X (array of shape [n_samples, n_features]) – The input samples.
Returns: X_r – The input samples with only the selected features.
Return type: array of shape [n_samples, n_selected_features]
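
The fit, transform, and fit_transform methods follow the usual fit-then-reduce contract: fit learns which columns to keep, transform applies that choice, and fit_transform chains the two. The sketch below shows the contract with a toy selector in plain Python. TopVarianceSelector and its variance criterion are assumptions made for illustration only; it is not a MAFESE class and does not subclass Selector.

```python
from statistics import pvariance


class TopVarianceSelector:
    """Illustrative selector (not a MAFESE class): keeps the k highest-variance columns."""

    def __init__(self, k=2):
        self.k = k
        self.selected_indexes_ = None

    def fit(self, X, y=None):
        n_features = len(X[0])
        # Score every column; y is ignored because this criterion is unsupervised.
        scores = [pvariance([row[j] for row in X]) for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        self.selected_indexes_ = sorted(ranked[: self.k])
        return self  # returning self allows call chaining

    def transform(self, X):
        if self.selected_indexes_ is None:
            raise RuntimeError("Call fit() before transform().")
        return [[row[j] for j in self.selected_indexes_] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = [[1.0, 10.0, 5.0], [1.0, 20.0, 6.0], [1.0, 30.0, 4.0]]
X_new = TopVarianceSelector(k=2).fit_transform(X)
print(X_new)  # the constant first column is dropped
```

Because fit returns the instance itself, fit_transform(X) and fit(X).transform(X) produce the same result.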

mafese.filter

class mafese.filter.FilterSelector(problem='classification', method='ANOVA', n_features=3, n_neighbors=5, n_bins=10, normalized=True)

Bases: mafese.selector.Selector

Defines the FilterSelector class, which holds all filter methods for feature selection problems.

Parameters

problem (str, default = "classification") – The problem you are trying to solve (or type of dataset), “classification” or “regression”
method (str, default = "ANOVA") –
If problem = “classification”, method can be one of the following values:
- “CHI”: Chi-Squared statistic
- “ANOVA”: ANOVA F-score
- “MI”: Mutual information
- “KENDALL”: Kendall Tau correlation
- “SPEARMAN”: Spearman’s Rho correlation
- “POINT”: Point-biserial correlation
- “RELIEF”: Original Relief method
- “RELIEF-F”: Weighted average Relief based on the frequency of each class
- “VLS-RELIEF-F”: Very Large Scale ReliefF
If problem = “regression”, method can be one of the following values:
- “PEARSON”: Pearson correlation
- “ANOVA”: ANOVA F-score
- “MI”: Mutual information
- “KENDALL”: Kendall Tau correlation
- “SPEARMAN”: Spearman’s Rho correlation
- “POINT”: Point-biserial correlation
- “RELIEF”: Original Relief method
- “RELIEF-F”: Weighted average Relief based on the frequency of each class
- “VLS-RELIEF-F”: Very Large Scale ReliefF
n_features (int or float, default=3) – If integer, the parameter is the absolute number of features to select. If float between 0 and 1, it is the fraction of features to select.
n_neighbors (int, default=5, Optional) – Number of neighbors to use for computing feature importance scores of Relief-based family
n_bins (int, default=10, Optional) – Number of bins to use for discretizing the target variable of Relief-based family in regression problems.
normalized (bool, default=True, Optional) – Normalize feature importance scores by the number of instances in the dataset
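
Since n_features accepts either an absolute count or a fraction, it has to be resolved to an integer once the number of columns in X is known. The sketch below shows one way to do that. resolve_n_features is a hypothetical helper, and its rounding rule (truncate, but keep at least one feature) and range checks are assumptions; MAFESE may round or validate differently.

```python
def resolve_n_features(n_features, n_total):
    """Hypothetical helper: turn an int count or a float fraction into a feature count."""
    if isinstance(n_features, bool):
        raise TypeError("n_features must be an int or a float, not a bool")
    if isinstance(n_features, int):
        if not 1 <= n_features <= n_total:
            raise ValueError(f"n_features must be in [1, {n_total}]")
        return n_features
    if isinstance(n_features, float):
        if not 0.0 < n_features < 1.0:
            raise ValueError("a float n_features must lie strictly between 0 and 1")
        # Assumption: truncate, but always keep at least one feature.
        return max(1, int(n_features * n_total))
    raise TypeError("n_features must be an int or a float")


print(resolve_n_features(3, 10))    # absolute count
print(resolve_n_features(0.5, 10))  # fraction of the 10 available features
```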

n_features

The number of selected features.

Type: int

supported_methods

Maps each supported method name (key) to the function that implements it (value).

Type: dict

method_name

The name of the filter method that will be used.

Type: str

Examples

The following example shows how to retrieve the most informative features with the FilterSelector method.

>>> import pandas as pd
>>> from mafese.filter import FilterSelector
>>> # load dataset
>>> dataset = pd.read_csv('your_path/dataset.csv', index_col=0).values
>>> X, y = dataset[:, 0:-1], dataset[:, -1]     # Assumes the last column holds the labels
>>> # define mafese feature selection method
>>> feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)
>>> # find all relevant features
>>> feat_selector.fit(X, y)
>>> # check selected features - True (or 1) is selected, False (or 0) is not selected
>>> print(feat_selector.selected_feature_masks)
[ True  True  True False False  True False False False  True]
>>> print(feat_selector.selected_feature_solution)
[1 1 1 0 0 1 0 0 0 1]
>>> # check the index of selected features
>>> print(feat_selector.selected_feature_indexes)
[0 1 2 5 9]
>>> # call transform() on X to filter it down to selected features
>>> X_filtered = feat_selector.transform(X)
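
The three attributes printed in the example describe the same selection in different forms. The plain-Python sketch below makes the relationship explicit, using the mask values from the example output. select_columns is a hypothetical stand-in for what transform does conceptually, not the library's implementation.

```python
# Boolean mask copied from the example output above.
mask = [True, True, True, False, False, True, False, False, False, True]

# The 0/1 "solution" vector is the same mask cast to integers.
solution = [int(flag) for flag in mask]

# The index list holds the positions where the mask is True.
indexes = [i for i, flag in enumerate(mask) if flag]


def select_columns(rows, indexes):
    """Keep only the listed column positions of each row."""
    return [[row[i] for i in indexes] for row in rows]


rows = [list(range(10)), list(range(10, 20))]
filtered = select_columns(rows, indexes)
print(solution)
print(indexes)
print(filtered)
```

Each row of the filtered data keeps exactly as many columns as there are True entries in the mask, here five.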

SUPPORT = {'classification': {'ANOVA': 'f_classification_func', 'CHI': 'chi2_func', 'KENDALL': 'kendall_func', 'MI': 'mutual_info_classif', 'POINT': 'point_func', 'RELIEF': 'relief_func', 'RELIEF-F': 'relief_f_func', 'SPEARMAN': 'spearman_func', 'VLS-RELIEF-F': 'vls_relief_f_func'}, 'regression': {'ANOVA': 'f_regression_func', 'KENDALL': 'kendall_func', 'MI': 'mutual_info_regression', 'PEARSON': 'r_regression', 'POINT': 'point_func', 'RELIEF': 'relief_func', 'RELIEF-F': 'relief_f_func', 'SPEARMAN': 'spearman_func', 'VLS-RELIEF-F': 'vls_relief_f_func'}}
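
The SUPPORT table shows that the valid method names depend on the problem type: "CHI" is listed only for classification and "PEARSON" only for regression. The sketch below looks up a problem/method pair in a copy of that table. resolve_method is a hypothetical helper, and the exact-match, case-sensitive lookup is an assumption; FilterSelector may normalize or validate names differently.

```python
# Copy of the SUPPORT table shown above (values are function names as strings).
SUPPORT = {
    "classification": {
        "ANOVA": "f_classification_func", "CHI": "chi2_func",
        "KENDALL": "kendall_func", "MI": "mutual_info_classif",
        "POINT": "point_func", "RELIEF": "relief_func",
        "RELIEF-F": "relief_f_func", "SPEARMAN": "spearman_func",
        "VLS-RELIEF-F": "vls_relief_f_func",
    },
    "regression": {
        "ANOVA": "f_regression_func", "KENDALL": "kendall_func",
        "MI": "mutual_info_regression", "PEARSON": "r_regression",
        "POINT": "point_func", "RELIEF": "relief_func",
        "RELIEF-F": "relief_f_func", "SPEARMAN": "spearman_func",
        "VLS-RELIEF-F": "vls_relief_f_func",
    },
}


def resolve_method(problem, method):
    """Hypothetical helper: return the scoring-function name for a problem/method pair."""
    if problem not in SUPPORT:
        raise ValueError(f"Unknown problem {problem!r}; choose from {sorted(SUPPORT)}")
    methods = SUPPORT[problem]
    if method not in methods:
        raise ValueError(
            f"Method {method!r} is not available for {problem}; "
            f"choose from {sorted(methods)}"
        )
    return methods[method]


print(resolve_method("classification", "CHI"))
print(resolve_method("regression", "PEARSON"))
```

Checking the pair up front gives a clear error for combinations such as problem='regression' with method='CHI'.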

fit(X, y=None)

Learn the features to select from X.

Parameters

X (array-like of shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
y (array-like of shape (n_samples,), default=None) – Target values. This parameter may be ignored for unsupervised learning.

Returns

self – Returns the instance itself.

Return type

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

transform(X)

Reduce X to the selected features.

Parameters: X (array of shape [n_samples, n_features]) – The input samples.
Returns: X_r – The input samples with only the selected features.
Return type: array of shape [n_samples, n_selected_features]
