mafese.utils package
mafese.utils.correlation module
Refs:
1. https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions
2. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
- mafese.utils.correlation.relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)
Performs Relief-F feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.
- Parameters
X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str, default='classification') – Type of problem, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.
- Returns
importance score – Vector of feature importance scores, with shape (n_features,).
- Return type
np.ndarray
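The n_bins parameter above implies that a continuous target is turned into class labels before scoring. The block below is a minimal sketch of that step, assuming equal-width bins; the function name discretize_target is hypothetical, and the binning strategy the library actually uses is not stated on this page.

```python
import numpy as np

def discretize_target(y, n_bins=10):
    """Map a continuous target to integer labels 0..n_bins-1 (equal-width bins)."""
    y = np.asarray(y, dtype=float)
    # Keep only the interior cut points between the minimum and maximum.
    edges = np.linspace(y.min(), y.max(), n_bins + 1)[1:-1]
    # np.digitize returns, for each value, the index of the bin it falls into.
    return np.digitize(y, edges)

y = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
labels = discretize_target(y, n_bins=2)
```

With labels in hand, a regression problem can be scored with the same neighbor-based logic used for classification.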
- mafese.utils.correlation.relief_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)
Performs Relief feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.
- Parameters
X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str, default='classification') – Type of problem, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.
- Returns
importance score – Vector of feature importance scores, with shape (n_features,).
- Return type
np.ndarray
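To make the idea behind Relief-style scoring concrete, here is a minimal sketch of the classic scheme: a feature gains weight when it differs between a sample and its nearest neighbors of another class, and loses weight when it differs between a sample and its nearest neighbors of the same class. This is not the mafese implementation. The names relief_scores and _nearest are hypothetical, and the min-max scaling and Manhattan distance are assumptions made for the example.

```python
import numpy as np

def _nearest(dist, mask, k):
    """Indices of the k closest samples among those where mask is True."""
    masked = np.where(mask, dist, np.inf)
    idx = np.argsort(masked)[:k]
    # Drop placeholders picked up when fewer than k candidates exist.
    return idx[np.isfinite(masked[idx])]

def relief_scores(X, y, n_neighbors=1):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    scores = np.zeros(n_features)
    for i in range(n_samples):
        diff = np.abs(Xs - Xs[i])      # per-feature difference to every sample
        dist = diff.sum(axis=1)        # Manhattan distance to every sample
        dist[i] = np.inf               # never pick the sample itself
        same = y == y[i]
        hits = _nearest(dist, same, n_neighbors)
        misses = _nearest(dist, ~same, n_neighbors)
        if hits.size:
            scores -= diff[hits].mean(axis=0)    # penalize differences to hits
        if misses.size:
            scores += diff[misses].mean(axis=0)  # reward differences to misses
    # Normalize by the number of instances.
    return scores / n_samples

# Feature 0 separates the two classes; feature 1 does not.
X_demo = np.array([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
y_demo = np.array([0, 0, 1, 1])
scores = relief_scores(X_demo, y_demo)
```

In this toy dataset the first feature receives the higher score, since it is the one that changes across class boundaries.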
- mafese.utils.correlation.select_bests(importance_scores=None, n_features=3)
Select features according to the k highest scores or percentile of the highest scores.
- Parameters
importance_scores (array-like of shape (n_features,)) – Scores of features.
n_features (int or float, default=3) –
Number of selected features.
If float, it should be in the range (0, 1) and represents the percentile of the highest scores.
If int, it should be in the range (1, N-1), where N is the total number of features in your dataset.
- Returns
mask – Mask marking the top features to select.
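The selection rule described above can be sketched as follows. This is an illustration only: top_k_mask is a hypothetical name, the boolean mask return value is an assumption, and so is the choice to convert a fraction into a count by truncation with a minimum of one feature.

```python
import numpy as np

def top_k_mask(importance_scores, n_features=3):
    scores = np.asarray(importance_scores, dtype=float)
    total = scores.shape[0]
    if isinstance(n_features, float):
        if not 0 < n_features < 1:
            raise ValueError("A float n_features must lie in (0, 1).")
        # Assumption: a fraction always keeps at least one feature.
        k = max(1, int(n_features * total))
    else:
        if not 1 <= n_features <= total - 1:
            raise ValueError("An int n_features must lie between 1 and N-1.")
        k = int(n_features)
    mask = np.zeros(total, dtype=bool)
    # Stable sort keeps tie-breaking deterministic; take the k largest scores.
    mask[np.argsort(scores, kind="stable")[-k:]] = True
    return mask

scores = np.array([0.10, 0.80, 0.30, 0.60, 0.05])
mask_int = top_k_mask(scores, n_features=2)
mask_frac = top_k_mask(scores, n_features=0.5)
```

A boolean mask can be applied directly to the columns of a feature matrix, for example X[:, mask_int].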
- mafese.utils.correlation.vls_relief_f_func(X, y, n_neighbors=5, n_bins=10, problem='classification', normalized=True, **kwargs)
Performs Very Large Scale ReliefF feature selection on the input dataset X and target variable y. Returns a vector of feature importance scores.
- Parameters
X (numpy array) – Input dataset of shape (n_samples, n_features).
y (numpy array) – Target variable of shape (n_samples,).
n_neighbors (int, default=5) – Number of neighbors to use for computing feature importance scores.
n_bins (int, default=10) – Number of bins to use for discretizing the target variable in regression problems.
problem (str, default='classification') – Type of problem, either 'regression' or 'classification'. If 'regression', the target variable is discretized into n_bins classes.
normalized (bool, default=True) – Normalize feature importance scores by the number of instances in the dataset.
- Returns
importance score – Vector of feature importance scores, with shape (n_features,).
- Return type
np.ndarray
mafese.utils.data_loader module
- class mafese.utils.data_loader.Data(X=None, y=None, name='Unknown')
Bases: object
The structure of our supported Data class.
- Parameters
X (np.ndarray) – The features of your data
y (np.ndarray) – The labels of your data
- SUPPORT = {'scaler': ['standard', 'minmax', 'max-abs', 'log1p', 'loge', 'sqrt', 'sinh-arc-sinh', 'robust', 'box-cox', 'yeo-johnson']}
mafese.utils.encoder module
- class mafese.utils.encoder.LabelEncoder
Bases: object
Encode categorical features as integer labels.
- fit_transform(y)
Fit label encoder and return encoded labels.
- Parameters
y (array-like of shape (n_samples,)) – Target values.
- Returns
y – Encoded labels.
- Return type
array-like of shape (n_samples,)
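As an illustration of what label encoding produces, the sketch below maps each distinct value to an integer using NumPy. It is not the class documented here: encode_labels is a hypothetical helper, and numbering the classes in sorted order is an assumption of this example.

```python
import numpy as np

def encode_labels(y):
    """Return the sorted unique classes and the integer code of each entry."""
    # return_inverse gives, for every entry, its index in the sorted classes.
    classes, encoded = np.unique(np.asarray(y).ravel(), return_inverse=True)
    return classes, encoded

classes, encoded = encode_labels(["cat", "dog", "cat", "bird"])
```

Keeping the classes array makes the mapping reversible, since classes[encoded] recovers the original labels.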
mafese.utils.estimator module
mafese.utils.mealpy_util module
- class mafese.utils.mealpy_util.FeatureSelectionProblem(bounds=None, minmax=None, data=None, estimator=None, metric_class=None, obj_name=None, obj_paras=None, fit_weights=(0.9, 0.1), fit_sign=None, **kwargs)
Bases: mealpy.utils.problem.Problem
A class to define a feature selection optimization problem.
- data
An object containing training and testing datasets.
- Type
object
- estimator
A machine learning model with fit and predict methods.
- Type
object
- metric_class
A class used to evaluate the performance of the model.
- Type
object
- obj_name
The name of the objective metric to optimize.
- Type
str
- obj_paras
Parameters for the objective metric.
- Type
dict
- fit_weights
Weights for combining the objective value and feature selection ratio.
- Type
tuple
- fit_sign
Sign to determine the direction of optimization (e.g., 1 for maximization, -1 for minimization).
- Type
int
- obj_func(solution)
Computes the fitness, objective value, and number of selected features for a given solution.
- Parameters
solution (array-like) – The solution representing selected features.
- Returns
A list containing the fitness value, objective value, and number of selected features.
- Return type
list
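One plausible way to combine an objective value with the share of selected features is sketched below. The formula is an assumption, not taken from the library: weighted_fitness and its maximize flag are hypothetical, and the only detail borrowed from this page is the default weight pair (0.9, 0.1) and the three-item return list.

```python
def weighted_fitness(obj_value, n_selected, n_total,
                     fit_weights=(0.9, 0.1), maximize=True):
    """Blend a model metric with the fraction of features kept."""
    ratio = n_selected / n_total
    # Fewer features should always look better: subtract the ratio when the
    # metric is maximized, add it when the metric is minimized.
    penalty = -ratio if maximize else ratio
    fitness = fit_weights[0] * obj_value + fit_weights[1] * penalty
    return [fitness, obj_value, n_selected]

# Accuracy of 0.8 using 2 of 10 features.
result_max = weighted_fitness(0.8, n_selected=2, n_total=10)
# Error of 0.2 using 2 of 10 features.
result_min = weighted_fitness(0.2, n_selected=2, n_total=10, maximize=False)
```

The weights set the trade-off: with (0.9, 0.1), model quality dominates and the feature count acts as a mild tie-breaker.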
mafese.utils.transfer module
mafese.utils.validator module
- mafese.utils.validator.check_bool(name: str, value: bool, bound=(True, False))
Checks if a value is a boolean and optionally verifies it matches a specified bound.
- Parameters
name (str) – The name of the variable being checked.
value (bool) – The value to check.
bound (tuple, optional) – A tuple of allowed boolean values.
- Returns
The validated boolean value.
- Return type
bool
- Raises
ValueError – If the value is not a boolean or not in the bound (if provided).
- mafese.utils.validator.check_float(name: str, value: None, bound=None)
Checks if a value is a float and optionally verifies it falls within a specified bound.
- Parameters
name (str) – The name of the variable being checked.
value (int or float) – The value to check.
bound (tuple, optional) – A tuple representing the lower and upper bound (inclusive).
- Returns
The validated float value.
- Return type
float
- Raises
ValueError – If the value is not a float or falls outside the bound (if provided).
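The validate-then-return pattern shared by these checkers can be sketched as below. This is a stand-alone illustration under the documented contract (accept an int or float, apply an optional inclusive bound, raise ValueError otherwise); validate_float is a hypothetical name, the error messages are made up for the example, and rejecting booleans is an assumption.

```python
def validate_float(name, value, bound=None):
    # bool is a subclass of int, so reject it explicitly (assumption).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"'{name}' must be a float, got {type(value).__name__}.")
    value = float(value)
    # The bound is inclusive on both ends, as described for this checker.
    if bound is not None and not (bound[0] <= value <= bound[1]):
        raise ValueError(
            f"'{name}' must lie in [{bound[0]}, {bound[1]}], got {value}."
        )
    return value

rate = validate_float("rate", 1, bound=(0, 5))
```

Returning the converted value lets a caller validate and normalize an argument in a single assignment.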
- mafese.utils.validator.check_int(name: str, value: None, bound=None)
Checks if a value is an integer and optionally verifies it falls within a specified bound.
- Parameters
name (str) – The name of the variable being checked.
value (int or float) – The value to check.
bound (tuple, optional) – A tuple representing the lower and upper bound (inclusive).
- Returns
The validated integer value.
- Return type
int
- Raises
ValueError – If the value is not an integer or falls outside the bound (if provided).
- mafese.utils.validator.check_str(name: str, value: str, bound=None)
Checks if a value is a string and optionally verifies it exists within a provided list.
- Parameters
name (str) – The name of the variable being checked.
value (str) – The value to check.
bound (list, optional) – A list of allowed string values.
- Returns
The validated string value.
- Return type
str
- Raises
ValueError – If the value is not a string or not found in the bound list (if provided).
- mafese.utils.validator.check_tuple_float(name: str, values: tuple, bounds=None)
Checks if a tuple contains only floats or integers and optionally verifies they fall within specified bounds.
- Parameters
name (str) – The name of the variable being checked.
values (tuple) – The tuple of values to check.
bounds (list of tuples, optional) – A list of tuples representing lower and upper bounds for each value.
- Returns
The validated tuple of floats.
- Return type
tuple
- Raises
ValueError – If the values are not all floats or integers or do not fall within the specified bounds.
- mafese.utils.validator.check_tuple_int(name: str, values: None, bounds=None)
Checks if a tuple contains only integers and optionally verifies they fall within specified bounds.
- Parameters
name (str) – The name of the variable being checked.
values (tuple) – The tuple of values to check.
bounds (list of tuples, optional) – A list of tuples representing lower and upper bounds for each value.
- Returns
The validated tuple of integers.
- Return type
tuple
- Raises
ValueError – If the values are not all integers or do not fall within the specified bounds.
- mafese.utils.validator.is_in_bound(value, bound)
Checks if a value falls within a specified numerical bound.
- Parameters
value (float) – The value to check.
bound (tuple or list) – The lower and upper bound; the endpoints are inclusive when bound is a list.
- Returns
True if the value is within the bound, False otherwise.
- Return type
bool
- Raises
ValueError – If the bound is not a tuple or list.
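The note that the bound is inclusive for lists suggests that the container type selects the comparison. The sketch below reads it that way: a list includes its endpoints and a tuple excludes them. That reading of the tuple case is an assumption, and in_bound is a hypothetical name rather than the library function.

```python
import operator

def in_bound(value, bound):
    if isinstance(bound, list):
        compare = operator.le   # list: endpoints are included
    elif isinstance(bound, tuple):
        compare = operator.lt   # assumption: tuple endpoints are excluded
    else:
        raise ValueError("bound must be a tuple or a list.")
    lower, upper = bound
    return compare(lower, value) and compare(value, upper)

inside_list = in_bound(1, [1, 5])
inside_tuple = in_bound(1, (1, 5))
```

This mirrors interval notation, where square brackets denote a closed interval and parentheses an open one.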
- mafese.utils.validator.is_str_in_list(value: str, my_list: list)
Checks if a string value exists within a provided list.
- Parameters
value (str) – The string value to check.
my_list (list) – The list of possible values.
- Returns
True if the value is in the list, False otherwise.
- Return type
bool