mafese.utils package¶

mafese.utils.correlation module¶

Refs:
1. https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions
2. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection

mafese.utils.correlation.chi2_func(X, y)[source]¶

mafese.utils.correlation.f_classification_func(X, y)[source]¶

mafese.utils.correlation.f_regression_func(X, y, center=True, force_finite=True)[source]¶

mafese.utils.correlation.kendall_func(X, y)[source]¶

mafese.utils.correlation.point_func(X, y)[source]¶

mafese.utils.correlation.relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]¶

Performs Relief-F feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters

X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str, default='classification') – The problem type, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.correlation.relief_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]¶

Performs Relief feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters

X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str, default='classification') – The problem type, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray
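As a rough illustration of how Relief-style scores arise, the following is a minimal pure-Python sketch of the textbook Relief update for classification. It is not the library's implementation: the name relief_scores is hypothetical, it uses a single nearest hit and nearest miss per instance (rather than n_neighbors), and it assumes numeric features scaled by their range. Dividing by the number of instances loosely mirrors the normalized option described above.

```python
def relief_scores(X, y):
    """Textbook Relief with one nearest hit and one nearest miss per instance."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale each difference into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    scores = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbor available
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ across classes,
            # penalize features that differ within a class.
            scores[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return scores


# Feature 0 separates the two classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.6]]
y = [0, 0, 0, 1, 1, 1]
scores = relief_scores(X, y)
```

In this toy dataset the informative feature receives a higher score than the noisy one, which is the property the importance vector is meant to capture.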

mafese.utils.correlation.select_bests(importance_scores=None, n_features=3)[source]¶

Select features according to the k highest scores or percentile of the highest scores.

Parameters

importance_scores (array-like of shape (n_features,)) – Scores of features.
n_features (int or float, default=3) – Number of selected features.
- If float, it should be in the range (0, 1) and represents the percentile of the highest scores.
- If int, it should be in the range (1, N-1), where N is the total number of features in your dataset.

Returns

mask – Mask of the top features to select.
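The selection rule described above can be sketched in plain Python. The helper name top_k_mask is hypothetical, and two details are assumptions rather than documented behavior: a float n_features is converted to a count by rounding up, and the int range is treated as inclusive of 1 and N-1.

```python
import math


def top_k_mask(importance_scores, n_features=3):
    """Return a list of booleans marking the highest-scoring features."""
    total = len(importance_scores)
    if isinstance(n_features, float):
        if not 0 < n_features < 1:
            raise ValueError("a float n_features must lie in (0, 1)")
        k = math.ceil(n_features * total)  # assumption: round up
    else:
        if not 1 <= n_features <= total - 1:
            raise ValueError("an int n_features must lie in [1, N-1]")
        k = n_features
    # Feature indices ordered by descending score (stable for ties).
    ranked = sorted(range(total), key=lambda i: importance_scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(total)]


example_scores = [0.1, 0.7, 0.3, 0.9, 0.2]
mask_by_count = top_k_mask(example_scores, 2)
mask_by_fraction = top_k_mask(example_scores, 0.5)
```

Here mask_by_count marks the two best features (indices 1 and 3), while mask_by_fraction marks three features because half of five is rounded up.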

mafese.utils.correlation.spearman_func(X, y)[source]¶

mafese.utils.correlation.vls_relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]¶

Performs Very Large Scale ReliefF feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters

X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str, default='classification') – The problem type, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.data_loader module¶

class mafese.utils.data_loader.Data(X=None, y=None, name='Unknown')[source]¶

Bases: object

The structure of our supported Data class

Parameters

X (np.ndarray) – The features of your data
y (np.ndarray) – The labels of your data

SUPPORT = {'scaler': ['standard', 'minmax', 'max-abs', 'log1p', 'loge', 'sqrt', 'sinh-arc-sinh', 'robust', 'box-cox', 'yeo-johnson']}¶

static encode_label(y)[source]¶

static scale(X, scaling_methods=('standard',), list_dict_paras=None)[source]¶

set_train_test(X_train=None, y_train=None, X_test=None, y_test=None)[source]¶

Use this function to set your own X_train, y_train, X_test, and y_test if you don’t want to use the built-in split function.

Parameters

X_train (np.ndarray) – The training features.
y_train (np.ndarray) – The training labels.
X_test (np.ndarray) – The testing features.
y_test (np.ndarray) – The testing labels.

split_train_test(test_size=0.2, train_size=None, random_state=41, shuffle=True, stratify=None, inplace=True)[source]¶

A wrapper around the train_test_split function in the scikit-learn library.
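To make the roles of test_size, random_state, and shuffle concrete, here is a minimal stdlib-only sketch of a shuffled hold-out split. The function simple_split is hypothetical and is neither this method nor scikit-learn's implementation; it ignores train_size and stratify, and it assumes the test count is obtained by rounding test_size * n_samples up.

```python
import math
import random


def simple_split(X, y, test_size=0.2, random_state=41, shuffle=True):
    """Split X and y into train and test parts using the same row indices."""
    n = len(X)
    idx = list(range(n))
    if shuffle:
        # A seeded generator makes the shuffle reproducible.
        random.Random(random_state).shuffle(idx)
    n_test = math.ceil(test_size * n)  # assumption: round up
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(data, ids):
        return [data[i] for i in ids]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


X_demo = [[i] for i in range(10)]
y_demo = list(range(10))
X_train, X_test, y_train, y_test = simple_split(X_demo, y_demo)
```

Calling simple_split twice with the same random_state yields the same partition, which is the main reason to fix the seed.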

mafese.utils.data_loader.get_dataset(dataset_name)[source]¶

Helper function to retrieve the data

Parameters: dataset_name (str) – Name of the dataset
Returns: data – An instance of the Data class that holds the X and y variables.
Return type: Data

mafese.utils.encoder module¶

class mafese.utils.encoder.LabelEncoder[source]¶

Bases: object

Encode categorical features as integer labels.

fit(y)[source]¶

Fit label encoder to a given set of labels.

Parameters: y (array-like) – Labels to encode.

fit_transform(y)[source]¶

Fit label encoder and return encoded labels.

Parameters: y (array-like of shape (n_samples,)) – Target values.
Returns: y – Encoded labels.
Return type: array-like of shape (n_samples,)

inverse_transform(y)[source]¶

Transform integer labels to original labels.

Parameters: y (array-like) – Encoded integer labels.
Returns: original_labels – Original labels.
Return type: array-like

transform(y)[source]¶

Transform labels to encoded integer labels.

Parameters: y (array-like) – Labels to encode.
Returns: encoded_labels – Encoded integer labels.
Return type: array-like
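The fit, transform, and inverse_transform round trip can be illustrated with a small stand-in class. TinyLabelEncoder is hypothetical and only sketches the idea; in particular, assigning integer codes in sorted order of the unique labels is an assumption, not a documented property of this encoder.

```python
class TinyLabelEncoder:
    """Minimal label encoder: maps each distinct label to an integer code."""

    def fit(self, y):
        # Assumption: codes follow the sorted order of the unique labels.
        self.classes_ = sorted(set(y))
        self._index = {label: code for code, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        return [self._index[label] for label in y]

    def fit_transform(self, y):
        return self.fit(y).transform(y)

    def inverse_transform(self, y):
        return [self.classes_[code] for code in y]


labels = ["cat", "dog", "cat", "bird"]
encoder = TinyLabelEncoder()
codes = encoder.fit_transform(labels)
restored = encoder.inverse_transform(codes)
```

With these labels, codes is [1, 2, 1, 0] and restored equals the original list, which is the property a label encoder is expected to preserve.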

class mafese.utils.encoder.ObjectiveScaler(obj_name='sigmoid', ohe_scaler=None)[source]¶

Bases: object

Label scaler for classification problems (binary and multi-class).

inverse_transform(data)[source]¶

transform(data)[source]¶

mafese.utils.estimator module¶

mafese.utils.estimator.get_general_estimator(problem, name, paras=None)[source]¶

mafese.utils.estimator.get_lasso_based_estimator(problem, name, paras=None)[source]¶

mafese.utils.estimator.get_recursive_estimator(problem, name, paras=None)[source]¶

mafese.utils.estimator.get_tree_based_estimator(problem, name, paras=None)[source]¶

mafese.utils.mealpy_util module¶

class mafese.utils.mealpy_util.FeatureSelectionProblem(bounds=None, minmax=None, data=None, estimator=None, metric_class=None, obj_name=None, obj_paras=None, fit_weights=(0.9, 0.1), fit_sign=None, **kwargs)[source]¶

Bases: mealpy.utils.problem.Problem

A class to define a feature selection optimization problem.

data¶

An object containing training and testing datasets.

Type: object

estimator¶

A machine learning model with fit and predict methods.

Type: object

metric_class¶

A class used to evaluate the performance of the model.

Type: object

obj_name¶

The name of the objective metric to optimize.

Type: str

obj_paras¶

Parameters for the objective metric.

Type: dict

fit_weights¶

Weights for combining the objective value and feature selection ratio.

Type: tuple

fit_sign¶

Sign to determine the direction of optimization (e.g., 1 for maximization, -1 for minimization).

Type: int

obj_func(solution)[source]¶

Computes the fitness, objective value, and number of selected features for a given solution.

Parameters: solution (array-like) – The solution representing selected features.
Returns: A list containing the fitness value, objective value, and number of selected features.
Return type: list
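One plausible way to combine an objective value with the feature selection ratio is a weighted sum, sketched below. The function weighted_fitness and its formula are assumptions for illustration only: the source does not state how fit_weights and fit_sign enter the fitness, so here the sign is simply applied to the ratio term and the caller chooses it.

```python
def weighted_fitness(obj_value, selected_mask, fit_weights, fit_sign):
    """Return [fitness, objective value, number of selected features]."""
    n_selected = sum(1 for flag in selected_mask if flag)
    ratio = n_selected / len(selected_mask)
    # Assumed formula: weighted objective plus a signed, weighted feature ratio.
    fitness = fit_weights[0] * obj_value + fit_weights[1] * fit_sign * ratio
    return [fitness, obj_value, n_selected]


# Objective value 0.8 with 2 of 4 features selected; the ratio term is subtracted.
result = weighted_fitness(0.8, [1, 0, 1, 0], fit_weights=(0.9, 0.1), fit_sign=-1)
```

With the default-style weights (0.9, 0.1), the objective dominates the fitness and the feature ratio acts as a small tie-breaking term.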

mafese.utils.transfer module¶

mafese.utils.transfer.sstf_01(x)[source]¶

mafese.utils.transfer.sstf_02(x)[source]¶

mafese.utils.transfer.sstf_03(x)[source]¶

mafese.utils.transfer.sstf_04(x)[source]¶

mafese.utils.transfer.vstf_01(x)[source]¶

mafese.utils.transfer.vstf_02(x)[source]¶

mafese.utils.transfer.vstf_03(x)[source]¶

mafese.utils.transfer.vstf_04(x)[source]¶

mafese.utils.validator module¶

mafese.utils.validator.check_bool(name: str, value: bool, bound=(True, False))[source]¶

Checks if a value is a boolean and optionally verifies it matches a specified bound.

Parameters

name (str) – The name of the variable being checked.
value (bool) – The value to check.
bound (tuple, optional) – A tuple of allowed boolean values.

Returns

The validated boolean value.

Return type

bool

Raises

ValueError – If the value is not a boolean or not in the bound (if provided).

mafese.utils.validator.check_float(name: str, value: None, bound=None)[source]¶

Checks if a value is a float and optionally verifies it falls within a specified bound.

Parameters

name (str) – The name of the variable being checked.
value (int or float) – The value to check.
bound (tuple, optional) – A tuple representing the lower and upper bound (inclusive).

Returns

The validated float value.

Return type

float

Raises

ValueError – If the value is not a float or falls outside the bound (if provided).

mafese.utils.validator.check_int(name: str, value: None, bound=None)[source]¶

Checks if a value is an integer and optionally verifies it falls within a specified bound.

Parameters

name (str) – The name of the variable being checked.
value (int or float) – The value to check.
bound (tuple, optional) – A tuple representing the lower and upper bound (inclusive).

Returns

The validated integer value.

Return type

int

Raises

ValueError – If the value is not an integer or falls outside the bound (if provided).

mafese.utils.validator.check_str(name: str, value: str, bound=None)[source]¶

Checks if a value is a string and optionally verifies it exists within a provided list.

Parameters

name (str) – The name of the variable being checked.
value (str) – The value to check.
bound (list, optional) – A list of allowed string values.

Returns

The validated string value.

Return type

str

Raises

ValueError – If the value is not a string or not found in the bound list (if provided).

mafese.utils.validator.check_tuple_float(name: str, values: tuple, bounds=None)[source]¶

Checks if a tuple contains only floats or integers and optionally verifies they fall within specified bounds.

Parameters

name (str) – The name of the variable being checked.
values (tuple) – The tuple of values to check.
bounds (list of tuples, optional) – A list of tuples representing lower and upper bounds for each value.

Returns

The validated tuple of floats.

Return type

tuple

Raises

ValueError – If the values are not all floats or integers or do not fall within the specified bounds.
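A validator of this kind can be sketched as follows. The name validate_tuple_float is hypothetical, and several details are assumptions: bounds are treated as inclusive, booleans are rejected even though they are technically integers in Python, and the values are returned converted to float.

```python
def validate_tuple_float(name, values, bounds=None):
    """Check that values holds only numbers, optionally within per-value bounds."""
    is_numeric = isinstance(values, (tuple, list)) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not is_numeric:
        raise ValueError(f"'{name}' must be a tuple of ints or floats.")
    if bounds is not None:
        if len(bounds) != len(values):
            raise ValueError(f"'{name}' needs one (low, high) pair per value.")
        for v, (low, high) in zip(values, bounds):
            if not low <= v <= high:  # assumption: inclusive bounds
                raise ValueError(f"'{name}' value {v} is outside [{low}, {high}].")
    return tuple(float(v) for v in values)


weights = validate_tuple_float("fit_weights", (0.9, 0.1), bounds=[(0, 1), (0, 1)])
```

Raising ValueError with the variable name in the message makes it easier to trace which argument failed validation.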

mafese.utils.validator.check_tuple_int(name: str, values: None, bounds=None)[source]¶

Checks if a tuple contains only integers and optionally verifies they fall within specified bounds.

Parameters

name (str) – The name of the variable being checked.
values (tuple) – The tuple of values to check.
bounds (list of tuples, optional) – A list of tuples representing lower and upper bounds for each value.

Returns

The validated tuple of integers.

Return type

tuple

Raises

ValueError – If the values are not all integers or do not fall within the specified bounds.

mafese.utils.validator.is_in_bound(value, bound)[source]¶

Checks if a value falls within a specified numerical bound.

Parameters

value (float) – The value to check.
bound (tuple or list) – The lower and upper bound (inclusive when given as a list).

Returns

True if the value is within the bound, False otherwise.

Return type

bool

Raises

ValueError – If the bound is not a tuple or list.
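The distinction between tuple and list bounds can be sketched like this. The helper in_bound is hypothetical; treating a list as an inclusive interval follows the parameter description, while treating a tuple as an exclusive interval is an assumption.

```python
def in_bound(value, bound):
    """Return True if value lies within bound; list is inclusive, tuple exclusive."""
    if isinstance(bound, list):
        return bound[0] <= value <= bound[1]
    if isinstance(bound, tuple):
        return bound[0] < value < bound[1]  # assumption: exclusive endpoints
    raise ValueError("bound must be a tuple or a list")


on_edge_list = in_bound(1, [0, 1])
on_edge_tuple = in_bound(1, (0, 1))
inside_tuple = in_bound(0.5, (0, 1))
```

Under these assumptions a value on the edge of the interval passes for a list bound and fails for a tuple bound.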

mafese.utils.validator.is_str_in_list(value: str, my_list: list)[source]¶

Checks if a string value exists within a provided list.

Parameters

value (str) – The string value to check.
my_list (list) – The list of possible values.

Returns

True if the value is in the list, False otherwise.

Return type

bool