mafese.utils package

mafese.utils.correlation module

Refs:
  1. https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions
  2. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection

mafese.utils.correlation.chi2_func(X, y)[source]
mafese.utils.correlation.f_classification_func(X, y)[source]
mafese.utils.correlation.f_regression_func(X, y, center=True, force_finite=True)[source]
mafese.utils.correlation.kendall_func(X, y)[source]
mafese.utils.correlation.point_func(X, y)[source]
mafese.utils.correlation.relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Relief-F feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str) – The type of problem, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Whether to normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray
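The core Relief idea can be illustrated with a minimal, self-contained sketch. This is not the mafese implementation: it uses a single nearest hit and miss for brevity, while relief_f_func averages over n_neighbors and supports regression via binning.

```python
import numpy as np

def relief_f_sketch(X, y):
    """Illustrative Relief-style scoring, not the mafese implementation.

    A feature's weight grows when it separates a sample from its nearest
    miss (different class) and shrinks when it separates the sample from
    its nearest hit (same class)."""
    n_samples, n_features = X.shape
    span = X.max(axis=0) - X.min(axis=0)   # per-feature range for scaling
    span[span == 0] = 1.0                  # avoid division by zero
    weights = np.zeros(n_features)
    for i in range(n_samples):
        dists = np.abs(X - X[i]).sum(axis=1)   # Manhattan distances
        dists[i] = np.inf                      # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(~same, dists, np.inf))
        weights += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return weights / n_samples

# Feature 0 separates the two classes; feature 1 is constant noise.
X = np.array([[0.0, 5.0], [0.1, 5.0], [1.0, 5.0], [1.1, 5.0]])
y = np.array([0, 0, 1, 1])
scores = relief_f_sketch(X, y)   # scores[0] > scores[1]
```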

mafese.utils.correlation.relief_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Relief feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str) – The type of problem, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Whether to normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.correlation.select_bests(importance_scores=None, n_features=3)[source]

Select features according to the k highest scores or percentile of the highest scores.

Parameters
  • importance_scores (array-like of shape (n_features,)) – Scores of features.

  • n_features (int, float. default=3) –

    Number of selected features.

    • If float, it should be in the range (0, 1), representing the fraction (percentile) of the highest scores to select.

    • If int, it should be in the range (1, N-1), where N is the total number of features in your dataset.

Returns

mask – Boolean mask of the selected features.

Return type

np.ndarray
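The int/float behavior described above can be sketched as follows. This is a simplified stand-in for select_bests; the exact rounding mafese uses for the float case is an assumption.

```python
import numpy as np

def select_bests_sketch(importance_scores, n_features=3):
    """Return a boolean mask over features, True for the ones kept."""
    scores = np.asarray(importance_scores)
    if isinstance(n_features, float):
        # Interpreted as the fraction of features to keep (assumption)
        k = max(1, int(round(n_features * scores.size)))
    else:
        k = n_features
    mask = np.zeros(scores.size, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True   # indices of the k highest scores
    return mask

scores = np.array([0.1, 0.9, 0.5, 0.3])
mask = select_bests_sketch(scores, n_features=2)   # → [False, True, True, False]
```

A float argument of 0.5 on the same four scores keeps 2 features and yields the same mask.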

mafese.utils.correlation.spearman_func(X, y)[source]
mafese.utils.correlation.vls_relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)[source]

Performs Very Large Scale ReliefF feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.

Parameters
  • X (numpy array) – Input dataset of shape (n_samples, n_features).

  • y (numpy array) – Target variable of shape (n_samples,).

  • n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.

  • n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.

  • problem (str) – The type of problem, either regression or classification. If regression, the target variable is discretized into n_bins classes.

  • normalized (bool, default=True) – Whether to normalize feature importance scores by the number of instances in the dataset.

Returns

importance score – Vector of feature importance scores, with shape (n_features,).

Return type

np.ndarray

mafese.utils.data_loader module

class mafese.utils.data_loader.Data(X, y)[source]

Bases: object

The structure of our supported Data class

Parameters
  • X (np.ndarray) – The features of your data

  • y (np.ndarray) – The labels of your data

set_train_test(X_train=None, y_train=None, X_test=None, y_test=None)[source]

Function used to set your own X_train, y_train, X_test, and y_test in case you don’t want to use our split function.

Parameters
  • X_train (np.ndarray) –

  • y_train (np.ndarray) –

  • X_test (np.ndarray) –

  • y_test (np.ndarray) –

split_train_test(test_size=0.2, train_size=None, random_state=41, shuffle=True, stratify=None, inplace=True)[source]

A wrapper around the train_test_split function from the scikit-learn library.
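The essential behavior (shuffle indices, then slice) can be shown without scikit-learn; the default random_state=41 below mirrors the signature above, but the slicing details are an illustrative assumption.

```python
import numpy as np

def split_train_test_sketch(X, y, test_size=0.2, random_state=41, shuffle=True):
    """Minimal stand-in for Data.split_train_test: shuffle indices, slice."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(random_state).shuffle(idx)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = split_train_test_sketch(X, y, test_size=0.2)
```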

mafese.utils.data_loader.get_dataset(dataset_name)[source]

Helper function to retrieve the data

Parameters

dataset_name (str) – Name of the dataset

Returns

data – An instance of the Data class that holds the X and y variables.

Return type

Data

mafese.utils.encoder module

class mafese.utils.encoder.LabelEncoder[source]

Bases: object

Encode categorical features as integer labels.

fit(y)[source]

Fit label encoder to a given set of labels.

Parameters

y (array-like) – Labels to encode.

fit_transform(y)[source]

Fit label encoder and return encoded labels.

Parameters

y (array-like of shape (n_samples,)) – Target values.

Returns

y – Encoded labels.

Return type

array-like of shape (n_samples,)

inverse_transform(y)[source]

Transform encoded integer labels back to the original labels.

Parameters

y (array-like) – Encoded integer labels.

Returns

original_labels – Original labels.

Return type

array-like

transform(y)[source]

Transform labels to encoded integer labels.

Parameters

y (array-like) – Labels to encode.

Returns

encoded_labels – Encoded integer labels.

Return type

array-like
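The four methods above can be mirrored with a small np.unique-based sketch; mafese's internal storage is an assumption, only the documented behavior is reproduced.

```python
import numpy as np

class LabelEncoderSketch:
    """Encode categorical labels as integers 0..n_classes-1."""

    def fit(self, y):
        self.classes_ = np.unique(y)   # sorted unique labels
        return self

    def transform(self, y):
        # Position of each label within the sorted classes_ array
        return np.searchsorted(self.classes_, y)

    def fit_transform(self, y):
        return self.fit(y).transform(y)

    def inverse_transform(self, y):
        return self.classes_[np.asarray(y)]

enc = LabelEncoderSketch()
codes = enc.fit_transform(np.array(["cat", "dog", "cat", "bird"]))   # → [1, 2, 1, 0]
```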

mafese.utils.estimator module

mafese.utils.estimator.get_general_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_lasso_based_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_recursive_estimator(problem, name, paras=None)[source]
mafese.utils.estimator.get_tree_based_estimator(problem, name, paras=None)[source]

mafese.utils.mealpy_util module

class mafese.utils.mealpy_util.FeatureSelectionProblem(lb, ub, minmax, data=None, estimator=None, transfer_func=None, obj_name=None, metric_class=None, fit_weights=(0.9, 0.1), fit_sign=1, obj_paras=None, name='Feature Selection Problem', **kwargs)[source]

Bases: mealpy.utils.problem.Problem

amend_position(position=None, lb=None, ub=None)[source]

The goal is to transform the solution into the right format corresponding to the problem. For example, with discrete problems, floating-point numbers must be converted to integers to ensure the solution is in the correct format.

Parameters
  • position – vector position (location) of the solution.

  • lb – list of lower bound values

  • ub – list of upper bound values

Returns

Amended position (the solution converted to the correct format)

fit_func(solution)[source]

Fitness function

Parameters

solution (numpy.ndarray) – The candidate solution.

Returns

Fitness value of the solution.

Return type

float
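The fit_weights=(0.9, 0.1) default suggests the fitness blends a model metric with the fraction of selected features. The exact formula is not shown in this reference, so the following is only a plausible sketch; the name fitness_sketch and the error/ratio combination are assumptions, not the mafese source.

```python
import numpy as np

def fitness_sketch(solution, error, fit_weights=(0.9, 0.1)):
    """Hypothetical wrapper-selection fitness: weighted sum of the model's
    error and the fraction of features kept (formula assumed, not taken
    from the mafese source)."""
    mask = np.asarray(solution) > 0.5      # threshold to a binary selection
    ratio = mask.sum() / mask.size         # fewer features -> lower penalty
    w_err, w_ratio = fit_weights
    return w_err * error + w_ratio * ratio

value = fitness_sketch(np.array([1, 0, 1, 0]), error=0.2)   # 0.9*0.2 + 0.1*0.5 = 0.23
```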

mafese.utils.transfer module

mafese.utils.transfer.sstf_01(x)[source]
mafese.utils.transfer.sstf_02(x)[source]
mafese.utils.transfer.sstf_03(x)[source]
mafese.utils.transfer.sstf_04(x)[source]
mafese.utils.transfer.vstf_01(x)[source]
mafese.utils.transfer.vstf_02(x)[source]
mafese.utils.transfer.vstf_03(x)[source]
mafese.utils.transfer.vstf_04(x)[source]
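Transfer functions map a continuous position component to a probability of selecting the corresponding feature: S-shaped (sstf_*) variants are sigmoid-like, while V-shaped (vstf_*) variants are symmetric around zero. The exact formulas behind the eight functions above are assumptions; two common choices from the transfer-function literature:

```python
import numpy as np

def sstf_sketch(x):
    """A typical S-shaped transfer function: the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def vstf_sketch(x):
    """A typical V-shaped transfer function: |tanh(x)|."""
    return np.abs(np.tanh(x))

# A binary feature mask is then sampled against the transfer value:
rng = np.random.default_rng(0)
position = np.array([-2.0, 0.0, 2.0])
mask = rng.random(position.size) < sstf_sketch(position)
```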

mafese.utils.validator module

mafese.utils.validator.check_bool(name: str, value: bool, bound=(True, False))[source]
mafese.utils.validator.check_float(name: str, value: int, bound=None)[source]
mafese.utils.validator.check_int(name: str, value: int, bound=None)[source]
mafese.utils.validator.check_str(name: str, value: str, bound=None)[source]
mafese.utils.validator.check_tuple_float(name: str, values: tuple, bounds=None)[source]
mafese.utils.validator.check_tuple_int(name: str, values: tuple, bounds=None)[source]
mafese.utils.validator.is_in_bound(value, bound)[source]
mafese.utils.validator.is_str_in_list(value: str, my_list: list)[source]
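The validators above share a pattern: check the type, then check an optional bound. A sketch of that pattern in the spirit of check_int (the exact error handling in mafese is an assumption):

```python
def check_int_sketch(name: str, value, bound=None):
    """Validate that value is an integer-valued number, optionally inside
    an inclusive (low, high) bound; raise ValueError otherwise."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name} must be a number, got {type(value).__name__}")
    value = int(value)
    if bound is not None and not (bound[0] <= value <= bound[1]):
        raise ValueError(f"{name} must be in {bound}, got {value}")
    return value

n_neighbors = check_int_sketch("n_neighbors", 5, bound=(1, 100))   # → 5
```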