```
import numpy as np


def perf_metrics_2X2(yobs, yhat):
    """
    Returns the sensitivity, specificity, positive predictive value, and
    negative predictive value of a 2X2 table.

    where:
    0 = negative case
    1 = positive case

    Parameters
    ----------
    yobs : array of positive and negative ``observed`` cases
    yhat : array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP / (TP+FP)
    neg_pred_val = TN / (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    yobs = np.asarray(yobs)
    yhat = np.asarray(yhat)
    # count each cell of the 2X2 table with element-wise boolean masks
    TP = np.sum((yobs == 1) & (yhat == 1))  # observed positive, predicted positive
    TN = np.sum((yobs == 0) & (yhat == 0))  # observed negative, predicted negative
    FP = np.sum((yobs == 0) & (yhat == 1))  # observed negative, predicted positive
    FN = np.sum((yobs == 1) & (yhat == 0))  # observed positive, predicted negative
    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)
    pos_pred_val = TP / (TP + FP)
    neg_pred_val = TN / (TN + FN)
    return sensitivity, specificity, pos_pred_val, neg_pred_val
```

```
import pandas as pd

y = np.array([0, 1, 1, 1, 0, 1, 0, 0, 1, 0])
y_hat = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 0])
metrics = perf_metrics_2X2(y, y_hat)
# the labels follow the order of the returned tuple
print(pd.DataFrame(dict(Metric=['Sensitivity', 'Specificity', 'PPV', 'NPV'],
                        Performance=np.round(metrics, 3))))
        Metric  Performance
0  Sensitivity        0.800
1  Specificity        0.600
2          PPV        0.667
3          NPV        0.750
```
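
The formulas in the docstring are easier to check when the full 2X2 table is laid out. The sketch below is one way to do that with `np.bincount`. The helper name `table_2x2` is hypothetical and not part of the original function, and it assumes that both arrays contain only the labels 0 and 1.

```python
import numpy as np


def table_2x2(yobs, yhat):
    """Hypothetical helper: rows are observed (0, 1), columns are predicted (0, 1)."""
    yobs = np.asarray(yobs, dtype=int)
    yhat = np.asarray(yhat, dtype=int)
    # encode each (observed, predicted) pair as a code 0..3, then count each code
    codes = 2 * yobs + yhat
    return np.bincount(codes, minlength=4).reshape(2, 2)


y = np.array([0, 1, 1, 1, 0, 1, 0, 0, 1, 0])
y_hat = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 0])

table = table_2x2(y, y_hat)
(TN, FP), (FN, TP) = table  # unpack the table row by row

print(table)
print('sensitivity =', TP / (TP + FN))
print('specificity =', TN / (TN + FP))
```

For the arrays above the table is `[[3, 2], [1, 4]]`, so TP = 4, FN = 1, TN = 3, and FP = 2, which gives a sensitivity of 0.8 and a specificity of 0.6.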

Running `Jupyter` and `pip` becomes painful if you don't have admin privileges on your computer. One solution is to type the entire path to `pip` and/or `Jupyter` every time, but that is time-consuming and inefficient. A way around this is to create an alias, so that the terminal interprets a short command as if you had entered the entire path to `pip` and/or `Jupyter`.

These are the locations of `Python`, `pip`, and `jupyter` on my computer:

```
> ~\AppData\Local\Continuum\Anaconda3\python.exe
> ~\AppData\Local\Continuum\Anaconda3\Scripts\pip.exe
> ~\AppData\Local\Continuum\Anaconda3\Scripts\jupyter-notebook.exe
```

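First, create a PowerShell profile file by typing the following in your PowerShell terminal (skip this step if you already have a profile, because `-Force` replaces an existing file):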

```
> New-Item -Type file -Force $profile
```

On my computer, the profile file is created at the following location:

```
> \Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
```

Then open this file in a text editor and add the following lines to define the aliases:

```
Set-Alias py "~\AppData\Local\Continuum\Anaconda3\python.exe"
Set-Alias pip "~\AppData\Local\Continuum\Anaconda3\Scripts\pip.exe"
Set-Alias jup "~\AppData\Local\Continuum\Anaconda3\Scripts\jupyter-notebook.exe"
```

Remember that the paths above are specific to my computer; you should update them according to the file structure on your own computer.

Once you open a new PowerShell session, the aliases are available. For example, type the following to launch a Jupyter notebook from any directory:

```
> cd Documents
> jup
```
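
If you are not sure where your own installation lives, Python can report it. The following is a minimal sketch that prints `Set-Alias` lines for the interpreter running it. The helper name `alias_lines` is hypothetical, and it assumes the Anaconda layout shown above, where `pip.exe` and `jupyter-notebook.exe` sit in a `Scripts` folder next to `python.exe`; check that the files actually exist before pasting the output into your profile.

```python
import sys
from pathlib import PureWindowsPath


def alias_lines(python_exe):
    """Hypothetical helper: build Set-Alias lines from the path to python.exe.

    Assumes pip.exe and jupyter-notebook.exe are in a `Scripts` folder
    located next to python.exe (the layout shown in this post).
    """
    python_path = PureWindowsPath(python_exe)
    scripts = python_path.parent / "Scripts"
    targets = {
        "py": python_path,
        "pip": scripts / "pip.exe",
        "jup": scripts / "jupyter-notebook.exe",
    }
    return [f'Set-Alias {name} "{path}"' for name, path in targets.items()]


# sys.executable is the full path of the interpreter running this script
for line in alias_lines(sys.executable):
    print(line)
```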

In previous posts I described how to perform non-linear curve fitting in Python and Julia. At their core, non-linear and linear curve fitting (or regression) are optimization problems in which we find the parameters that minimize an objective function. The entire field of mathematical optimization is concerned with finding the most efficient and accurate methods to minimize such functions.

On the other hand, the current standard for finding the optimal values of the parameters of machine learning algorithms is to perform a `random search` or a `grid search` throughout the space of possible values that such parameters can take. These approaches have several limitations:

- They are not computationally efficient for large data sets.
- The parameters tested at each step are not informed in any way by the results of the previous steps.
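
To make the second limitation concrete, here is a minimal sketch of both strategies on a toy one-dimensional problem. The objective `loss` and its minimum at 0.3 are purely illustrative assumptions, not something taken from a real model.

```python
import numpy as np


def loss(x):
    """Toy objective with its minimum at x = 0.3 (purely illustrative)."""
    return (x - 0.3) ** 2


# grid search: the candidates are fixed before any evaluation takes place
grid = np.linspace(0.0, 1.0, 11)
best_grid = grid[np.argmin(loss(grid))]

# random search: the candidates are drawn independently of earlier results
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=11)
best_random = samples[np.argmin(loss(samples))]

print('grid search   best x:', best_grid)
print('random search best x:', best_random)
```

In both cases the full list of candidates could be written down before the first evaluation, so nothing learned from one evaluation is used to choose the next one.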

However, implementing optimization-driven approaches for `scikit-learn` is not a trivial matter. Thankfully, James Bergstra and other brave souls have created `hyperopt`, a Python library for optimizing over awkward search spaces with real-valued, discrete, and conditional dimensions, which makes it ideal for tuning hyperparameters with `scikit-learn`.

In order to tune the parameters of a `scikit-learn` estimator, `hyperopt` needs the following:

1. Data

2. The objective function to be minimized

3. The search space from which to sample the parameters

4. The algorithm to be used for the minimization of the objective function, and the number of times the optimization should be run

```
# modules
from sklearn.metrics import mean_absolute_error as mae
from sklearn.metrics import make_scorer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from hyperopt import hp, fmin, tpe
from hyperopt.pyll import scope
import numpy as np

# register the estimator so that hyperopt can build it from sampled parameters
scope.define(GradientBoostingRegressor)


def train_GradientBoostingRegressor(Xdata, Ydata, loss='squared_error', alpha=0.50, cv=2, n_steps=10):
    """
    Trains a Gradient Boosting Regressor using Bayesian optimization

    Parameters
    ----------
    Xdata: numpy array of size KxN composed of floats and/or integers
    Ydata: numpy array of size K (1D array) of floats
    loss: loss function to be optimized ('squared_error' was named 'ls'
          in scikit-learn versions before 1.0)
    alpha: quantile for the quantile and Huber losses; float > 0.0 and < 1.0
    cv: number of folds for the K-fold cross-validation used during training
    n_steps: number of times the `hyperopt` minimizer will run to find the optimal parameters

    Returns
    -------
    Regressor : a scikit-learn object with the trained Gradient Boosting Regressor
    y_test : observed values of the held-out test set
    yhat : predicted values for the held-out test set
    """
    # split data
    X_train, X_test, y_train, y_test = train_test_split(Xdata, Ydata, test_size=.33, random_state=42)

    # create an objective function: mean cross-validated MAE (lower is better)
    def objective_function_regression(estimator):
        mae_array = cross_val_score(estimator, X_train, y_train, cv=cv, n_jobs=-1,
                                    scoring=make_scorer(mae))
        return mae_array.mean()

    # search space (hp.randint starts at 0, so 1 is added when building the estimator)
    n_estimators = hp.randint('n_estimators', 1000)
    learning_rate = hp.loguniform('learning_rate', -3, 1)
    max_depth = hp.randint('max_depth', 10)
    max_features = hp.randint('max_features', X_train.shape[1])
    min_samples_leaf = hp.randint('min_samples_leaf', 10)
    criterion = hp.choice('criterion', ['friedman_mse'])

    # model / estimator to be optimized; it is the only option, so its probability is 1.0
    est0 = (1.0, scope.GradientBoostingRegressor(loss=loss,
                                                 alpha=alpha,
                                                 n_estimators=n_estimators + 1,
                                                 learning_rate=learning_rate,
                                                 max_depth=max_depth + 1,
                                                 max_features=max_features + 1,
                                                 min_samples_leaf=min_samples_leaf + 1,
                                                 criterion=criterion,
                                                 random_state=101))

    # search space passed to hyperopt
    search_space_regression = hp.pchoice('estimator', [est0])
    print('--' * 20)
    print('Finding optimal parameters')

    # perform the optimization
    best = fmin(fn=objective_function_regression,
                space=search_space_regression,
                algo=tpe.suggest,
                max_evals=n_steps,  # the number of iterations
                verbose=0)

    # fmin returns the raw sampled values, so the +1 offsets used in the
    # search space are applied again before building the final estimator
    Regressor = GradientBoostingRegressor(loss=loss, alpha=alpha,
                                          learning_rate=float(best['learning_rate']),
                                          max_depth=int(best['max_depth']) + 1,
                                          max_features=int(best['max_features']) + 1,
                                          min_samples_leaf=int(best['min_samples_leaf']) + 1,
                                          n_estimators=int(best['n_estimators']) + 1,
                                          random_state=101)
    # fit
    Regressor.fit(X_train, y_train)

    # evaluate on the held-out test set (error is in the units of the target)
    yhat = Regressor.predict(X_test)
    median_abs_error = np.round(np.median(np.abs(yhat - y_test)), 2)
    print('The median absolute error for the test set is: {}'.format(median_abs_error))
    return Regressor, y_test, yhat
```

Now, we can use the Boston housing data set to test our *beautiful* code:

```
# note: load_boston was removed in scikit-learn 1.2, so this example needs an older version
from sklearn.datasets import load_boston

D = load_boston()
R1, ytest1, yhat1 = train_GradientBoostingRegressor(D.data,
                                                    D.target,
                                                    loss='quantile',
                                                    alpha=0.50,
                                                    n_steps=50)
```