mafese.utils package

mafese.utils.correlation module

Refs:
  1. https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions
  2. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection

mafese.utils.correlation.chi2_func(X, y)[source]
mafese.utils.correlation.f_classification_func(X, y)[source]
mafese.utils.correlation.f_regression_func(X, y, center=True, force_finite=True)[source]
mafese.utils.correlation.kendall_func(X, y)[source]
mafese.utils.correlation.point_func(X, y)[source]
mafese.utils.correlation.relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Relief-F feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str) – The type of problem, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Whether to normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray
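The core Relief idea can be illustrated with a minimal, self-contained sketch. This is not the mafese implementation: it uses a single nearest hit and miss for brevity, while relief_f_func averages over n_neighbors and supports regression via binning.

```python
import numpy as np

def relief_f_sketch(X, y):
    """Illustrative Relief-style scoring, not the mafese implementation.

    A feature's weight grows when it separates a sample from its nearest
    miss (different class) and shrinks when it separates the sample from
    its nearest hit (same class)."""
    n_samples, n_features = X.shape
    span = X.max(axis=0) - X.min(axis=0)   # per-feature range for scaling
    span[span == 0] = 1.0                  # avoid division by zero
    weights = np.zeros(n_features)
    for i in range(n_samples):
        dists = np.abs(X - X[i]).sum(axis=1)   # Manhattan distances
        dists[i] = np.inf                      # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(~same, dists, np.inf))
        weights += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return weights / n_samples

# Feature 0 separates the two classes; feature 1 is constant noise.
X = np.array([[0.0, 5.0], [0.1, 5.0], [1.0, 5.0], [1.1, 5.0]])
y = np.array([0, 0, 1, 1])
scores = relief_f_sketch(X, y)   # scores[0] > scores[1]
```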

mafese.utils.correlation.relief_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Relief feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str) – The type of problem, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Whether to normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.correlation.select_bests(importance_scores=None, n_features=3)[source]

Select features according to the k highest scores or percentile of the highest scores.

Parameters
  • importance_scores (array-like of shape (n_features,)) – Scores of features.

  • n_features (int, float. default=3) –

    Number of selected features.

    • If float, it should be in the range (0, 1), representing the fraction (percentile) of the highest scores to select.

    • If int, it should be in the range (1, N-1), where N is the total number of features in your dataset.

Returns

mask – Boolean mask of the selected features.

Return type

np.ndarray
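The int/float behavior described above can be sketched as follows. This is a simplified stand-in for select_bests; the exact rounding mafese uses for the float case is an assumption.

```python
import numpy as np

def select_bests_sketch(importance_scores, n_features=3):
    """Return a boolean mask over features, True for the ones kept."""
    scores = np.asarray(importance_scores)
    if isinstance(n_features, float):
        # Interpreted as the fraction of features to keep (assumption)
        k = max(1, int(round(n_features * scores.size)))
    else:
        k = n_features
    mask = np.zeros(scores.size, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True   # indices of the k highest scores
    return mask

scores = np.array([0.1, 0.9, 0.5, 0.3])
mask = select_bests_sketch(scores, n_features=2)   # → [False, True, True, False]
```

A float argument of 0.5 on the same four scores keeps 2 features and yields the same mask.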

mafese.utils.correlation.spearman_func(X, y)[source]
mafese.utils.correlation.vls_relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Very Large Scale ReliefF feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str) – The type of problem, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Whether to normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.data_loader module

class mafese.utils.data_loader.Data(X, y)[source]

Bases: object

The structure of our supported Data class

Parameters
  • X (np.ndarray) – The features of your data

  • y (np.ndarray) – The labels of your data

set_train_test(X_train=None, y_train=None, X_test=None, y_test=None)[source]

Function used to set your own X_train, y_train, X_test, and y_test in case you don’t want to use our split function.

Parameters
  • X_train (np.ndarray) –

  • y_train (np.ndarray) –

  • X_test (np.ndarray) –

  • y_test (np.ndarray) –

split_train_test(test_size=0.2, train_size=None, random_state=41, shuffle=True, stratify=None, inplace=True)[source]

A wrapper around the train_test_split function from the scikit-learn library.
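The essential behavior (shuffle indices, then slice) can be shown without scikit-learn; the default random_state=41 below mirrors the signature above, but the slicing details are an illustrative assumption.

```python
import numpy as np

def split_train_test_sketch(X, y, test_size=0.2, random_state=41, shuffle=True):
    """Minimal stand-in for Data.split_train_test: shuffle indices, slice."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(random_state).shuffle(idx)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = split_train_test_sketch(X, y, test_size=0.2)
```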

mafese.utils.data_loader.get_dataset(dataset_name)[source]

Helper function to retrieve the data

Parameters

dataset_name (str) – Name of the dataset

Returns

data – An instance of the Data class that holds the X and y variables.

Return type

Data

mafese.utils.encoder module

class mafese.utils.encoder.LabelEncoder[source]

Bases: object

Encode categorical features as integer labels.

fit(y)[source]

Fit label encoder to a given set of labels.

Parameters

y (array-like) – Labels to encode.

fit_transform(y)[source]

Fit label encoder and return encoded labels.

Parameters

y (array-like of shape (n_samples,)) – Target values.

Returns

y – Encoded labels.

Return type

array-like of shape (n_samples,)

inverse_transform(y)[source]

Transform encoded integer labels back to the original labels.

Parameters

y (array-like) – Encoded integer labels.

Returns

original_labels – Original labels.

Return type

array-like

transform(y)[source]

Transform labels to encoded integer labels.

Parameters

y (array-like) – Labels to encode.

Returns

encoded_labels – Encoded integer labels.

Return type

array-like
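The four methods above can be mirrored with a small np.unique-based sketch; mafese's internal storage is an assumption, only the documented behavior is reproduced.

```python
import numpy as np

class LabelEncoderSketch:
    """Encode categorical labels as integers 0..n_classes-1."""

    def fit(self, y):
        self.classes_ = np.unique(y)   # sorted unique labels
        return self

    def transform(self, y):
        # Position of each label within the sorted classes_ array
        return np.searchsorted(self.classes_, y)

    def fit_transform(self, y):
        return self.fit(y).transform(y)

    def inverse_transform(self, y):
        return self.classes_[np.asarray(y)]

enc = LabelEncoderSketch()
codes = enc.fit_transform(np.array(["cat", "dog", "cat", "bird"]))   # → [1, 2, 1, 0]
```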

mafese.utils.estimator module

mafese.utils.estimator.get_general_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_lasso_based_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_recursive_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_tree_based_estimator(problem, name, paras=None)[source]

mafese.utils.mealpy_util module

class mafese.utils.mealpy_util.FeatureSelectionProblem(lb, ub, minmax, data=None, estimator=None, transfer_func=None, obj_name=None, metric_class=None, fit_weights=(0.9, 0.1), fit_sign=1, obj_paras=None, name='Feature Selection Problem', **kwargs)[source]

Bases: mealpy.utils.problem.Problem

amend_position(position=None, lb=None, ub=None)[source]

The goal is to transform the solution into the right format corresponding to the problem. For example, with discrete problems, floating-point numbers must be converted to integers to ensure the solution is in the correct format.

Parameters
  • position – vector position (location) of the solution.

  • lb – list of lower bound values

  • ub – list of upper bound values

Returns

Amended position (the solution converted to the correct format)

fit_func(solution)[source]

Fitness function

Parameters

solution (numpy.ndarray) – The candidate solution.

Returns

Fitness value of the solution.

Return type

float
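The fit_weights=(0.9, 0.1) default suggests the fitness blends a model metric with the fraction of selected features. The exact formula is not shown in this reference, so the following is only a plausible sketch; the name fitness_sketch and the error/ratio combination are assumptions, not the mafese source.

```python
import numpy as np

def fitness_sketch(solution, error, fit_weights=(0.9, 0.1)):
    """Hypothetical wrapper-selection fitness: weighted sum of the model's
    error and the fraction of features kept (formula assumed, not taken
    from the mafese source)."""
    mask = np.asarray(solution) > 0.5      # threshold to a binary selection
    ratio = mask.sum() / mask.size         # fewer features -> lower penalty
    w_err, w_ratio = fit_weights
    return w_err * error + w_ratio * ratio

value = fitness_sketch(np.array([1, 0, 1, 0]), error=0.2)   # 0.9*0.2 + 0.1*0.5 = 0.23
```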

mafese.utils.transfer module

mafese.utils.transfer.sstf_01(x)[source]
mafese.utils.transfer.sstf_02(x)[source]
mafese.utils.transfer.sstf_03(x)[source]
mafese.utils.transfer.sstf_04(x)[source]
mafese.utils.transfer.vstf_01(x)[source]
mafese.utils.transfer.vstf_02(x)[source]
mafese.utils.transfer.vstf_03(x)[source]
mafese.utils.transfer.vstf_04(x)[source]
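Transfer functions map a continuous position component to a probability of selecting the corresponding feature: S-shaped (sstf_*) variants are sigmoid-like, while V-shaped (vstf_*) variants are symmetric around zero. The exact formulas behind the eight functions above are assumptions; two common choices from the transfer-function literature:

```python
import numpy as np

def sstf_sketch(x):
    """A typical S-shaped transfer function: the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def vstf_sketch(x):
    """A typical V-shaped transfer function: |tanh(x)|."""
    return np.abs(np.tanh(x))

# A binary feature mask is then sampled against the transfer value:
rng = np.random.default_rng(0)
position = np.array([-2.0, 0.0, 2.0])
mask = rng.random(position.size) < sstf_sketch(position)
```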

mafese.utils.validator module

mafese.utils.validator.check_bool(name: str, value: bool, bound=(True, False))[source]
mafese.utils.validator.check_float(name: str, value: int, bound=None)[source]
mafese.utils.validator.check_int(name: str, value: int, bound=None)[source]
mafese.utils.validator.check_str(name: str, value: str, bound=None)[source]
mafese.utils.validator.check_tuple_float(name: str, values: tuple, bounds=None)[source]
mafese.utils.validator.check_tuple_int(name: str, values: tuple, bounds=None)[source]
mafese.utils.validator.is_in_bound(value, bound)[source]
mafese.utils.validator.is_str_in_list(value: str, my_list: list)[source]
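The validators above share a pattern: check the type, then check an optional bound. A sketch of that pattern in the spirit of check_int (the exact error handling in mafese is an assumption):

```python
def check_int_sketch(name: str, value, bound=None):
    """Validate that value is an integer-valued number, optionally inside
    an inclusive (low, high) bound; raise ValueError otherwise."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name} must be a number, got {type(value).__name__}")
    value = int(value)
    if bound is not None and not (bound[0] <= value <= bound[1]):
        raise ValueError(f"{name} must be in {bound}, got {value}")
    return value

n_neighbors = check_int_sketch("n_neighbors", 5, bound=(1, 100))   # → 5
```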