mafese.utils package

mafese.utils.correlation module

Refs: 1. https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions 2. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection

mafese.utils.correlation.chi2_func(X, y)[source]
mafese.utils.correlation.f_classification_func(X, y)[source]
mafese.utils.correlation.f_regression_func(X, y, center=True, force_finite=True)[source]
mafese.utils.correlation.kendall_func(X, y)[source]
mafese.utils.correlation.point_func(X, y)[source]
mafese.utils.correlation.relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Relief-F feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str, default='classification') – The problem type of the dataset, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray
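
The n_bins parameter turns a continuous target into class labels before scoring. The following is a minimal sketch of one way to do that, assuming equal-width bins; the binning rule actually used by relief_f_func is not stated here, and discretize_equal_width is a hypothetical helper, not part of mafese.

```python
def discretize_equal_width(y, n_bins=10):
    """Map each continuous target value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(y), max(y)
    if hi == lo:
        # A constant target falls into a single bin.
        return [0 for _ in y]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in y]


y = [0.0, 2.5, 5.0, 7.5, 10.0]
labels = discretize_equal_width(y, n_bins=4)
print(labels)
```

The resulting integer labels can then be treated like class labels by a classification-style scoring routine.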

mafese.utils.correlation.relief_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Relief feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str, default='classification') – The problem type of the dataset, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray
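
To illustrate the idea behind Relief scoring, here is a minimal pure-Python sketch of the classic algorithm with a single nearest hit and a single nearest miss per instance. It is not the implementation behind relief_func (which also exposes n_neighbors and n_bins); relief_scores and its helpers are hypothetical names, and scaling differences by the feature range is an assumption of this sketch.

```python
def relief_scores(X, y):
    """Classic Relief weights for numeric features and class labels.

    For every instance, a feature gains weight if it differs on the nearest
    instance of another class (miss) and loses weight if it differs on the
    nearest instance of the same class (hit).
    """
    n_samples, n_features = len(X), len(X[0])

    # Feature ranges scale each per-feature difference into [0, 1].
    spans = []
    for j in range(n_features):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(n_features))

    weights = [0.0] * n_features
    for i in range(n_samples):
        hits = [k for k in range(n_samples) if k != i and y[k] == y[i]]
        misses = [k for k in range(n_samples) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(n_features):
            # Average the update over the number of instances.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n_samples
    return weights


# Feature 0 separates the two classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
scores = relief_scores(X, y)
print(scores)
```

On this toy data the informative feature receives the higher score, which is the behaviour a Relief-style importance vector is meant to capture.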

mafese.utils.correlation.select_bests(importance_scores=None, n_features=3)[source]

Select features according to the k highest scores or percentile of the highest scores.

Parameters
  • importance_scores (array-like of shape (n_features,)) – Scores of features.

  • n_features (int or float, default=3) –

    Number of selected features.

    • If float, it should be in the range (0, 1). It represents the percentile of the highest scores.

    • If int, it should be in the range (1, N-1), where N is the total number of features in your dataset.

Returns

mask – Mask marking the top features to select.
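
As an illustration of selecting the k highest scores, the sketch below builds a boolean mask in plain Python. It covers only the integer case and does not reproduce select_bests itself; top_k_mask is a hypothetical helper, and how ties or float values of n_features are handled by the library is not assumed here.

```python
def top_k_mask(scores, k):
    """Return a list of booleans that is True for the k highest scores."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of features")
    # Rank feature indices by descending score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(scores))]


scores = [0.10, 0.80, 0.05, 0.60, 0.30]
mask = top_k_mask(scores, k=3)
print(mask)  # [False, True, False, True, True]
```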

mafese.utils.correlation.spearman_func(X, y)[source]
mafese.utils.correlation.vls_relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Very Large Scale ReliefF feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str, default='classification') – The problem type of the dataset, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.data_loader module

class mafese.utils.data_loader.Data(X=None, y=None, name='Unknown')[source]

Bases: object

The structure of our supported Data class

Parameters
  • X (np.ndarray) – The features of your data

  • y (np.ndarray) – The labels of your data

SUPPORT = {'scaler': ['standard', 'minmax', 'max-abs', 'log1p', 'loge', 'sqrt', 'sinh-arc-sinh', 'robust', 'box-cox', 'yeo-johnson']}
static encode_label(y)[source]
static scale(X, scaling_methods=('standard',), list_dict_paras=None)[source]
set_train_test(X_train=None, y_train=None, X_test=None, y_test=None)[source]

Use this function to set your own X_train, y_train, X_test, and y_test if you don’t want to use the built-in split function.

Parameters
  • X_train (np.ndarray) –

  • y_train (np.ndarray) –

  • X_test (np.ndarray) –

  • y_test (np.ndarray) –

split_train_test(test_size=0.2, train_size=None, random_state=41, shuffle=True, stratify=None, inplace=True)[source]

A wrapper around the train_test_split function in the scikit-learn library.

mafese.utils.data_loader.get_dataset(dataset_name)[source]

Helper function to retrieve the data

Parameters

dataset_name (str) – Name of the dataset

Returns

data – An instance of the Data class that holds the X and y variables.

Return type

Data

mafese.utils.encoder module

class mafese.utils.encoder.LabelEncoder[source]

Bases: object

Encode categorical features as integer labels.

fit(y)[source]

Fit label encoder to a given set of labels.

Parameters

y (array-like) – Labels to encode.

fit_transform(y)[source]

Fit label encoder and return encoded labels.

Parameters

y (array-like of shape (n_samples,)) – Target values.

Returns

y – Encoded labels.

Return type

array-like of shape (n_samples,)

inverse_transform(y)[source]

Transform integer labels to original labels.

Parameters

y (array-like) – Encoded integer labels.

Returns

original_labels – Original labels.

Return type

array-like

transform(y)[source]

Transform labels to encoded integer labels.

Parameters

y (array-like) – Labels to encode.

Returns

encoded_labels – Encoded integer labels.

Return type

array-like
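
The fit / transform / inverse_transform round trip described above can be sketched as follows. This is a minimal stand-in written for illustration, not the mafese class: SimpleLabelEncoder and its attributes are hypothetical, and sorting the unique labels to fix the integer order is an assumption.

```python
class SimpleLabelEncoder:
    """Map hashable, sortable labels to integers 0..n_classes-1 and back."""

    def fit(self, y):
        # Sorted unique labels give a deterministic label -> integer mapping.
        self.classes_ = sorted(set(y))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        # Raises KeyError for a label that was not seen during fit.
        return [self._index[label] for label in y]

    def fit_transform(self, y):
        return self.fit(y).transform(y)

    def inverse_transform(self, y):
        return [self.classes_[i] for i in y]


labels = ["cat", "dog", "cat", "bird"]
encoder = SimpleLabelEncoder()
encoded = encoder.fit_transform(labels)
decoded = encoder.inverse_transform(encoded)
print(encoded)  # [1, 2, 1, 0]
print(decoded)  # ['cat', 'dog', 'cat', 'bird']
```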

class mafese.utils.encoder.ObjectiveScaler(obj_name='sigmoid', ohe_scaler=None)[source]

Bases: object

Label scaler for classification problems (binary and multi-class).

inverse_transform(data)[source]
transform(data)[source]

mafese.utils.estimator module

mafese.utils.estimator.get_general_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_lasso_based_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_recursive_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_tree_based_estimator(problem, name, paras=None)[source]

mafese.utils.mealpy_util module

class mafese.utils.mealpy_util.FeatureSelectionProblem(bounds=None, minmax=None, data=None, estimator=None, metric_class=None, obj_name=None, obj_paras=None, fit_weights=(0.9, 0.1), fit_sign=None, **kwargs)[source]

Bases: mealpy.utils.problem.Problem

A class to define a feature selection optimization problem.

data

An object containing training and testing datasets.

Type

object

estimator

A machine learning model with fit and predict methods.

Type

object

metric_class

A class used to evaluate the performance of the model.

Type

object

obj_name

The name of the objective metric to optimize.

Type

str

obj_paras

Parameters for the objective metric.

Type

dict

fit_weights

Weights for combining the objective value and feature selection ratio.

Type

tuple

fit_sign

Sign to determine the direction of optimization (e.g., 1 for maximization, -1 for minimization).

Type

int

obj_func(solution)[source]

Computes the fitness, objective value, and number of selected features for a given solution.

Parameters

solution (array-like) – The solution representing selected features.

Returns

A list containing the fitness value, objective value, and number of selected features.

Return type

list
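
The attributes fit_weights and fit_sign suggest that the fitness blends the objective value with the fraction of selected features. The sketch below shows one plausible way to combine them; the exact formula used by obj_func is an assumption here, and weighted_fitness is a hypothetical helper that takes an already computed objective value instead of training an estimator.

```python
def weighted_fitness(obj_value, mask, fit_weights=(0.9, 0.1), fit_sign=1):
    """Combine an objective value with the ratio of selected features.

    Returns [fitness, objective value, number of selected features],
    mirroring the list described for obj_func.
    """
    n_selected = sum(1 for flag in mask if flag)
    ratio = n_selected / len(mask)
    # Assumed form: weighted objective plus a signed, weighted feature ratio.
    fitness = fit_weights[0] * obj_value + fit_weights[1] * fit_sign * ratio
    return [fitness, obj_value, n_selected]


# Two of four features selected, objective value 0.8.
result = weighted_fitness(0.8, [1, 0, 1, 0], fit_sign=-1)
print(result)
```

With these inputs the feature ratio is 0.5, so the second term shifts the fitness by 0.05 in the direction given by fit_sign.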

mafese.utils.transfer module

mafese.utils.transfer.sstf_01(x)[source]
mafese.utils.transfer.sstf_02(x)[source]
mafese.utils.transfer.sstf_03(x)[source]
mafese.utils.transfer.sstf_04(x)[source]
mafese.utils.transfer.vstf_01(x)[source]
mafese.utils.transfer.vstf_02(x)[source]
mafese.utils.transfer.vstf_03(x)[source]
mafese.utils.transfer.vstf_04(x)[source]

mafese.utils.validator module

mafese.utils.validator.check_bool(name: str, value: bool, bound=(True, False))[source]

Checks if a value is a boolean and optionally verifies it matches a specified bound.

Parameters
  • name (str) – The name of the variable being checked.

  • value (bool) – The value to check.

  • bound (tuple, optional) – A tuple of allowed boolean values.

Returns

The validated boolean value.

Return type

bool

Raises

ValueError – If the value is not a boolean or not in the bound (if provided).

mafese.utils.validator.check_float(name: str, value: None, bound=None)[source]

Checks if a value is a float and optionally verifies it falls within a specified bound.

Parameters
  • name (str) – The name of the variable being checked.

  • value (int or float) – The value to check.

  • bound (tuple, optional) – A tuple representing the lower and upper bound (inclusive).

Returns

The validated float value.

Return type

float

Raises

ValueError – If the value is not a float or falls outside the bound (if provided).

mafese.utils.validator.check_int(name: str, value: None, bound=None)[source]

Checks if a value is an integer and optionally verifies it falls within a specified bound.

Parameters
  • name (str) – The name of the variable being checked.

  • value (int or float) – The value to check.

  • bound (tuple, optional) – A tuple representing the lower and upper bound (inclusive).

Returns

The validated integer value.

Return type

int

Raises

ValueError – If the value is not an integer or falls outside the bound (if provided).
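
A validator of this kind can be sketched in a few lines. The version below is illustrative only: check_int_sketch is a hypothetical name, and accepting integer-valued floats such as 5.0 while rejecting booleans is an assumption, not documented behaviour of check_int.

```python
def check_int_sketch(name, value, bound=None):
    """Return value as an int, or raise ValueError with a readable message."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"'{name}' must be an integer, got {value!r}.")
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"'{name}' must be an integer, got {value!r}.")
    value = int(value)
    # The bound is treated as inclusive on both ends.
    if bound is not None and not (bound[0] <= value <= bound[1]):
        raise ValueError(f"'{name}' must be in the range [{bound[0]}, {bound[1]}].")
    return value


n_neighbors = check_int_sketch("n_neighbors", 5, bound=(1, 100))
print(n_neighbors)  # 5
```

Raising early with the parameter name in the message keeps configuration errors close to where they were introduced.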

mafese.utils.validator.check_str(name: str, value: str, bound=None)[source]

Checks if a value is a string and optionally verifies it exists within a provided list.

Parameters
  • name (str) – The name of the variable being checked.

  • value (str) – The value to check.

  • bound (list, optional) – A list of allowed string values.

Returns

The validated string value.

Return type

str

Raises

ValueError – If the value is not a string or not found in the bound list (if provided).

mafese.utils.validator.check_tuple_float(name: str, values: tuple, bounds=None)[source]

Checks if a tuple contains only floats or integers and optionally verifies they fall within specified bounds.

Parameters
  • name (str) – The name of the variable being checked.

  • values (tuple) – The tuple of values to check.

  • bounds (list of tuples, optional) – A list of tuples representing lower and upper bounds for each value.

Returns

The validated tuple of floats.

Return type

tuple

Raises

ValueError – If the values are not all floats or integers or do not fall within the specified bounds.

mafese.utils.validator.check_tuple_int(name: str, values: None, bounds=None)[source]

Checks if a tuple contains only integers and optionally verifies they fall within specified bounds.

Parameters
  • name (str) – The name of the variable being checked.

  • values (tuple) – The tuple of values to check.

  • bounds (list of tuples, optional) – A list of tuples representing lower and upper bounds for each value.

Returns

The validated tuple of integers.

Return type

tuple

Raises

ValueError – If the values are not all integers or do not fall within the specified bounds.

mafese.utils.validator.is_in_bound(value, bound)[source]

Checks if a value falls within a specified numerical bound.

Parameters
  • value (float) – The value to check.

  • bound (tuple or list) – The lower and upper bound; the endpoints are inclusive when the bound is given as a list.

Returns

True if the value is within the bound, False otherwise.

Return type

bool

Raises

ValueError – If the bound is not a tuple or list.
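
The distinction between tuple and list bounds can be sketched as below. The list case (inclusive endpoints) follows the description above; treating a tuple as an open interval with excluded endpoints is an assumption of this sketch, and is_in_bound_sketch is a hypothetical name.

```python
import operator


def is_in_bound_sketch(value, bound):
    """Check lower <= value <= upper (list) or lower < value < upper (tuple)."""
    if isinstance(bound, list):
        compare = operator.le  # list: endpoints included
    elif isinstance(bound, tuple):
        compare = operator.lt  # tuple: endpoints excluded (assumption)
    else:
        raise ValueError("bound must be a tuple or a list.")
    lower, upper = bound
    return compare(lower, value) and compare(value, upper)


print(is_in_bound_sketch(1, [1, 5]))  # True
print(is_in_bound_sketch(1, (1, 5)))  # False
print(is_in_bound_sketch(3, (1, 5)))  # True
```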

mafese.utils.validator.is_str_in_list(value: str, my_list: list)[source]

Checks if a string value exists within a provided list.

Parameters
  • value (str) – The string value to check.

  • my_list (list) – The list of possible values.

Returns

True if the value is in the list, False otherwise.

Return type

bool