mafese.utils package¶
mafese.utils.correlation module¶
Refs: 1. https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions 2. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
- mafese.utils.correlation.relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]¶
Performs Relief-F feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.
- Parameters
X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str) – The problem type of the dataset, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – If True, normalize the feature importance scores by the number of instances in the dataset.
- Returns
importance score – Vector of feature importance scores, with shape (n_features,).
- Return type
np.ndarray
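The weighting scheme behind Relief-F can be sketched in plain NumPy. This is an illustrative re-implementation of the algorithm for the classification case, not mafese's internal code; the function name and the Manhattan-distance choice are assumptions made for the sketch:

```python
import numpy as np

def relief_f_scores(X, y, n_neighbors=5):
    """Illustrative Relief-F: score each feature by how well it separates
    near hits (same class) from near misses (other classes)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    # Scale per-feature differences to [0, 1] so features are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n_samples
    weights = np.zeros(n_features)
    for i in range(n_samples):
        dists = np.abs(X - X[i]).sum(axis=1)  # Manhattan distance to all rows
        dists[i] = np.inf                     # exclude the instance itself
        for cls, prior in zip(classes, priors):
            idx = np.where(y == cls)[0]
            idx = idx[np.argsort(dists[idx])][:n_neighbors]
            diff = (np.abs(X[idx] - X[i]) / span).mean(axis=0)
            if cls == y[i]:
                weights -= diff               # penalize spread among hits
            else:
                # weight misses by the prior probability of their class
                weights += prior / (1 - priors[classes == y[i]][0]) * diff
    return weights / n_samples
```

An informative feature (one whose value differs between classes but not within a class) receives a large positive score, while a noise feature scores near zero.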
- mafese.utils.correlation.relief_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]¶
Performs Relief feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.
- Parameters
X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str) – The problem type of the dataset, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – If True, normalize the feature importance scores by the number of instances in the dataset.
- Returns
importance score – Vector of feature importance scores, with shape (n_features,).
- Return type
np.ndarray
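The update rule of the original Relief algorithm, which underlies this function, can be sketched for binary classification as follows. This is an illustrative sketch under the assumption of a single nearest hit and miss per sampled instance, not mafese's implementation:

```python
import numpy as np

def relief_scores(X, y, n_iters=None, rng=None):
    """Illustrative classic Relief (binary classification): for sampled
    instances, reward features that differ from the nearest miss and
    penalize features that differ from the nearest hit.
    Assumes every class has at least two samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    n_iters = n_iters or n_samples
    rng = rng or np.random.default_rng(0)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    weights = np.zeros(n_features)
    for _ in range(n_iters):
        i = rng.integers(n_samples)
        dists = np.abs(X - X[i]).sum(axis=1)
        dists[i] = np.inf                        # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class row
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class row
        weights += (np.abs(X[miss] - X[i]) - np.abs(X[hit] - X[i])) / span
    return weights / n_iters
```

Relief-F generalizes this scheme to k neighbors and multiple classes, which is why both functions expose the same n_neighbors parameter.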
- mafese.utils.correlation.select_bests(importance_scores=None, n_features=3)[source]¶
Select features according to the k highest scores or a percentile of the highest scores.
- Parameters
importance_scores (array-like of shape (n_features,)) – Scores of features.
n_features (int or float, default=3) –
Number of selected features.
If float, it should be in the range (0, 1), representing the percentile of the highest scores.
If int, it should be in the range (1, N-1), where N is the total number of features in your dataset.
- Returns
mask – Boolean mask of shape (n_features,) indicating which features are selected.
- Return type
np.ndarray
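The int/float handling described above can be sketched as a boolean-mask selection. This is an illustrative sketch, not mafese's exact implementation:

```python
import numpy as np

def select_bests(importance_scores, n_features=3):
    """Return a boolean mask marking the top-scoring features.

    n_features: int -> number of features to keep;
                float in (0, 1) -> fraction (percentile) of features to keep.
    """
    scores = np.asarray(importance_scores)
    if isinstance(n_features, float):
        if not 0 < n_features < 1:
            raise ValueError("float n_features must be in (0, 1)")
        n_features = max(1, int(round(n_features * scores.size)))
    top = np.argsort(scores)[::-1][:n_features]  # indices of the highest scores
    mask = np.zeros(scores.size, dtype=bool)
    mask[top] = True
    return mask
```

For example, with scores [0.1, 0.9, 0.5, 0.3] and n_features=2, the mask selects the second and third features; passing 0.5 as a float keeps the same top half.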
- mafese.utils.correlation.vls_relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]¶
Performs Very Large Scale ReliefF feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.
- Parameters
X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str) – The problem type of the dataset, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – If True, normalize the feature importance scores by the number of instances in the dataset.
- Returns
importance score – Vector of feature importance scores, with shape (n_features,).
- Return type
np.ndarray
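The "very large scale" idea is to run a base scorer on random feature subsets and aggregate the per-feature results, which keeps the neighbor search tractable on wide datasets. The sketch below is illustrative only: the function name, the pluggable score_func, and aggregation by maximum are assumptions, not mafese's internal design:

```python
import numpy as np

def vls_scores(X, y, score_func, n_subsets=10, subset_size=None, rng=None):
    """Illustrative VLS scheme: score random feature subsets with a base
    scorer (e.g. a ReliefF implementation) and keep, for each feature,
    the best score it achieved in any subset containing it."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    subset_size = subset_size or max(2, n_features // 2)
    rng = rng or np.random.default_rng(0)
    best = np.full(n_features, -np.inf)
    for _ in range(n_subsets):
        cols = rng.choice(n_features, size=subset_size, replace=False)
        sub_scores = score_func(X[:, cols], y)   # score only this subset
        best[cols] = np.maximum(best[cols], sub_scores)
    # Features never sampled keep -inf; fall back to the worst seen score.
    best[np.isinf(best)] = best[np.isfinite(best)].min()
    return best
```

Any per-feature scorer can be plugged in as score_func; in mafese's case that role is played by the ReliefF scoring above.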
mafese.utils.data_loader module¶
- class mafese.utils.data_loader.Data(X, y)[source]¶
Bases:
object
A simple container holding the features and labels of a dataset.
- Parameters
X (np.ndarray) – The features of your data
y (np.ndarray) – The labels of your data
mafese.utils.encoder module¶
- class mafese.utils.encoder.LabelEncoder[source]¶
Bases:
object
Encode categorical features as integer labels.
- fit_transform(y)[source]¶
Fit label encoder and return encoded labels.
- Parameters
y (array-like of shape (n_samples,)) – Target values.
- Returns
y – Encoded labels.
- Return type
array-like of shape (n_samples,)
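The behavior of fit_transform can be emulated with np.unique. This is a minimal sketch of the behavior, not mafese's code, and the classes_ attribute and inverse_transform method are illustrative additions:

```python
import numpy as np

class LabelEncoder:
    """Minimal label encoder: map each distinct value to an integer."""

    def fit_transform(self, y):
        # np.unique returns the sorted distinct classes and, with
        # return_inverse=True, each sample's index into that sorted array.
        self.classes_, encoded = np.unique(y, return_inverse=True)
        return encoded

    def inverse_transform(self, y_encoded):
        # Map integer codes back to the original class labels.
        return self.classes_[np.asarray(y_encoded)]
```

Because classes are sorted, encoding ["cat", "dog", "cat", "bird"] yields [1, 2, 1, 0] with classes_ equal to ["bird", "cat", "dog"].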
mafese.utils.estimator module¶
mafese.utils.mealpy_util module¶
- class mafese.utils.mealpy_util.FeatureSelectionProblem(lb, ub, minmax, data=None, estimator=None, transfer_func=None, obj_name=None, metric_class=None, fit_weights=(0.9, 0.1), fit_sign=1, obj_paras=None, name='Feature Selection Problem', **kwargs)[source]¶
Bases:
mealpy.utils.problem.Problem
- amend_position(position=None, lb=None, ub=None)[source]¶
The goal is to transform the solution into the right format corresponding to the problem. For example, with discrete problems, floating-point numbers must be converted to integers to ensure the solution is in the correct format.
- Parameters
position – vector position (location) of the solution.
lb – list of lower bound values
ub – list of upper bound values
- Returns
Amended position (the solution converted to the correct format for the problem)
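For a wrapper feature-selection problem, amending a continuous position typically means keeping it within bounds and mapping it to a binary selection mask. The sketch below is illustrative: the sigmoid transfer function and the at-least-one-feature repair step are common choices assumed here, not necessarily what FeatureSelectionProblem does internally:

```python
import numpy as np

def amend_position(position, lb, ub):
    """Clip a continuous position into [lb, ub], then map it to a
    binary feature mask via a sigmoid transfer function."""
    position = np.clip(np.asarray(position, dtype=float), lb, ub)
    probs = 1.0 / (1.0 + np.exp(-position))   # sigmoid transfer to (0, 1)
    mask = (probs > 0.5).astype(int)
    if mask.sum() == 0:
        # Repair: a valid solution must select at least one feature.
        mask[np.argmax(probs)] = 1
    return mask
```

This is where the transfer_func parameter of FeatureSelectionProblem would plug in: swapping the sigmoid for another transfer function changes how continuous positions are binarized.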