thoth.utils
ScikitModel
Bases: Protocol
A protocol for compatible scikit-learn models
See the scikit-learn documentation for details.
Source code in thoth/utils.py (lines 17-34)
fit(X, y)
Fit the model to a dataset
Source code in thoth/utils.py (lines 24-30)
predict(X)
Predict with the model on a dataset
Source code in thoth/utils.py (lines 32-34)
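As an illustration of what such a protocol implies, here is a minimal sketch, assuming the protocol requires only `fit(X, y)` and `predict(X)` as listed above. `ScikitModelSketch`, `MajorityClassifier` and `fit_then_predict` are hypothetical names used for illustration and are not part of thoth; the `Any` annotations and the `runtime_checkable` decorator are also assumptions, since the original signatures are not shown here.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # assumption: only added so isinstance() can be used below
class ScikitModelSketch(Protocol):
    """Hypothetical stand-in for a fit/predict protocol."""

    def fit(self, X: Any, y: Any) -> Any:
        ...

    def predict(self, X: Any) -> Any:
        ...


class MajorityClassifier:
    """Toy model (not part of thoth): always predicts the most common label."""

    def fit(self, X, y):
        labels = list(y)
        # Remember the label that occurred most often during training.
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        # Return the remembered label once per input sample.
        return [self.majority_ for _ in X]


def fit_then_predict(model: ScikitModelSketch, X, y):
    """Accepts any object that structurally matches the protocol."""
    model.fit(X, y)
    return model.predict(X)
```

Because protocols are checked structurally, `MajorityClassifier` satisfies the sketch without inheriting from it. Any scikit-learn estimator exposing `fit` and `predict` would match in the same way.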
get_metrics(clf, x, y)
Evaluate the performance of a scikit-learn predictor on a given dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clf | ScikitModel | The trained classifier to evaluate | required |
x | pd.DataFrame | The input data | required |
y | pd.Series | Labels for each sample in the input data | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | A DataFrame containing the precision, recall and F1 scores. Macro averaging is used for multiclass datasets, and micro averaging is used for binary classification. |
Source code in thoth/utils.py (lines 90-114)
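To make the difference between the two averaging modes concrete, the following is a standard-library-only sketch of macro and micro averaged precision, recall and F1. It is not the implementation of `get_metrics`; `precision_recall_f1` is a hypothetical helper. In practice scikit-learn's `precision_recall_fscore_support` with its `average` argument computes the same quantities.

```python
from collections import Counter


def precision_recall_f1(y_true, y_pred, average):
    """Hypothetical helper: macro or micro averaged precision, recall, F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def scores(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    if average == "micro":
        # Pool the counts over all classes, then compute each score once.
        return scores(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    if average == "macro":
        # Compute each score per class, then take the unweighted mean.
        per_class = [scores(tp[c], fp[c], fn[c]) for c in labels]
        return tuple(sum(col) / len(labels) for col in zip(*per_class))
    raise ValueError(f"unsupported average: {average!r}")


y_true = ["a", "a", "a", "b", "c"]
y_pred = ["a", "a", "b", "b", "c"]
macro = precision_recall_f1(y_true, y_pred, "macro")
micro = precision_recall_f1(y_true, y_pred, "micro")
```

Macro averaging weights every class equally regardless of its size, whereas micro averaging weights every sample equally, so the two can diverge on imbalanced data.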
load_process_data(dataset_name)
Load and format a dataset based on its name
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_name | str | The name of the dataset to load and process | required |
Returns:
Type | Description |
---|---|
Tuple[dict, pd.DataFrame] | A tuple of the dataset metadata dict and the data |
Source code in thoth/utils.py (lines 61-87)
train_model(model, params, train_x, train_y)
Initialise and train a given scikit-learn model with the provided parameters and data
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Type[ScikitModelT] | The model class to instantiate | required |
params | dict | A dictionary of parameter_name: value pairs | required |
train_x | pd.DataFrame | The training data, of shape (n_samples, n_features) | required |
train_y | pd.Series | The training labels, of shape (n_samples,) | required |
Returns:
Name | Type | Description |
---|---|---|
model | ScikitModelT | The trained model |
Source code in thoth/utils.py (lines 40-58)
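The signature suggests a common pattern: take a model class, instantiate it with the parameter dictionary, fit it and return the fitted instance. The sketch below shows that pattern under the assumption that `params` is unpacked as keyword arguments to the constructor; the actual body of `train_model` is not shown here and may differ. `train_model_sketch`, `FitPredict` and `ConstantClassifier` are hypothetical names used for illustration.

```python
from typing import Any, Protocol, Type, TypeVar


class FitPredict(Protocol):
    """Hypothetical minimal protocol: anything with fit and predict."""

    def fit(self, X: Any, y: Any) -> Any:
        ...

    def predict(self, X: Any) -> Any:
        ...


# Bound TypeVar so the return type matches the class that was passed in.
ModelT = TypeVar("ModelT", bound=FitPredict)


def train_model_sketch(model: Type[ModelT], params: dict, train_x, train_y) -> ModelT:
    """Instantiate `model` with `params` (assumed to be kwargs), then fit it."""
    clf = model(**params)
    clf.fit(train_x, train_y)
    return clf


class ConstantClassifier:
    """Toy model (not part of thoth): always predicts a fixed value."""

    def __init__(self, constant=0):
        self.constant = constant

    def fit(self, X, y):
        self.n_seen_ = len(X)  # record how many samples were seen
        return self

    def predict(self, X):
        return [self.constant for _ in X]


trained = train_model_sketch(ConstantClassifier, {"constant": 1}, [[0], [1]], [1, 1])
```

Passing the class rather than an instance lets the caller vary `params` across runs, and the bound type variable tells a type checker that `trained` is a `ConstantClassifier` rather than a generic model.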