class: center, middle, inverse, title-slide

# Introduction to Machine Learning

## Dataprep and Classification

###
### May 2022

---
class: middle, center, inverse

# Dataprep Part I

---
# Contents

- Categorical predictors
- 1:1 transformations
- 1:n transformations
- Logistic regression
- Confusion matrix
- Classification metrics
- ROC curve
- Multiple cutoffs

---
## Categorical Predictors

### Predictor with only 2 categories

Does the average credit card balance differ between men and women?

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

$$ y_i = \beta_0 + \beta_1x_i \space\space\space\space\space\space \text{where}\space\space\space\space\space\space x_i = \Bigg\\{\begin{array}{ll}1&\text{if car }i\text{ is }\texttt{manual}\\\\ 0&\text{if car }i\text{ is } \texttt{automatic}\end{array} $$

.footnote[
See [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf), page 84 (Predictors with Only Two Levels).
]

---
## Categorical Predictors

### Predictor with 3 or more categories

.pull-left[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]
.pull-right[
Example: linear model

`$$y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}$$`

where

`\(x_{1i} = \Bigg \{ \begin{array}{ll} 1 & \text{if }\texttt{Multi_Family}\\0&\text{otherwise}\end{array}\)`

`\(x_{2i} = \Bigg \{ \begin{array}{ll} 1 & \text{if }\texttt{Residential}\\0&\text{otherwise}\end{array}\)`
]

---
## Categorical Predictors

### Predictor with 3 or more categories

Also called "one-hot encoding", "dummy variables" or "indicators".
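Below is a minimal base-R sketch of this encoding. It is not the deck's own code. `houses` is a hypothetical toy data frame with the same `type` values as the table on this slide, plus a made-up response `y`.

- `model.matrix()` builds the intercept and one 0/1 column per non-reference level, which gives the columns shown in the table.
- In a recipe, `step_dummy()` creates the same indicators without the intercept, named like `type_Multi_Family`.
- The `lm()` fit shows that each category's prediction is `β0`, `β0 + β1` or `β0 + β2`. With only the indicators as predictors, each of these is the category mean.

```r
# Hypothetical toy data: same `type` values as the table, made-up response y
houses <- data.frame(
  type = factor(c("Residential", "Multi_Family", "Condo",
                  "Residential", "Condo", "Multi_Family")),
  y = c(300, 250, 150, 320, 170, 230)
)

# Intercept + one 0/1 indicator per non-reference level;
# the first level ("Condo", alphabetical order) is the reference category
X <- model.matrix(~ type, data = houses)
print(X)

# With only the indicators as predictors, the prediction for each
# category is beta0, beta0 + beta1 or beta0 + beta2
b <- coef(lm(y ~ type, data = houses))
pred_by_type <- c(
  Condo        = unname(b["(Intercept)"]),
  Multi_Family = unname(b["(Intercept)"] + b["typeMulti_Family"]),
  Residential  = unname(b["(Intercept)"] + b["typeResidential"])
)
print(pred_by_type)
print(tapply(houses$y, houses$type, mean))  # the category means: same values
```

`Condo` is the reference category only because it comes first alphabetically. `relevel(houses$type, ref = "Residential")` would make another level the baseline.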
<table> <thead> <tr> <th style="text-align:left;"> type </th> <th style="text-align:right;"> (Intercept) </th> <th style="text-align:right;"> typeMulti_Family </th> <th style="text-align:right;"> typeResidential </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Residential </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Multi_Family </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Condo </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Residential </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Condo </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Multi_Family </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> </tbody> </table>

steps: `step_dummy()`

---
## Categorical Predictors

### Predictor with 3 or more categories

The predictions for each category are:

`\(\hat{y}_{i} = \left\{ \begin{array}{ll} \beta_0 & \text{if }\texttt{Condo}\\ \beta_0 + \beta_1&\text{if } \texttt{Multi_Family}\\ \beta_0 + \beta_2&\text{if } \texttt{Residential}\end{array}\right.\)`

---
## Nonlinear Transformations of Predictors

### Example: log

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---
## Nonlinear Transformations of Predictors

### Example: log

.pull-left[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-6-1.png"
style="display: block; margin: auto;" />
]
.pull-right[
True relationship between `x` and `y`: `\(y = 10 + 0.5\log(x)\)`

Proposed models:

1) `y ~ x`

2) `y ~ log(x)`
]

Other common transformations: square root, Box-Cox.

steps: `step_log()`, `step_BoxCox()`, `step_sqrt()`

---
## Nonlinear Transformations of Predictors

### Example: Polynomial Regression

.pull-left[
True relationship: `\(y = 500 + 0.4(x-10)^3\)`
]

Proposed model: `\(y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3\)`

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

Other common expansions: b-splines, natural splines.

steps: `step_poly()`, `step_bs()`, `step_ns()`

---
## Nonlinear Transformations of Predictors

### Example: Polynomial Regression

.pull-left[
<table> <thead> <tr> <th style="text-align:right;"> y </th> <th style="text-align:right;"> idade </th> <th style="text-align:right;"> idade2 </th> <th style="text-align:right;"> idade3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 456.5 </td> <td style="text-align:right;"> 5.3 </td> <td style="text-align:right;"> 28.2 </td> <td style="text-align:right;"> 149.7 </td> </tr> <tr> <td style="text-align:right;"> 492.5 </td> <td style="text-align:right;"> 7.4 </td> <td style="text-align:right;"> 55.4 </td> <td style="text-align:right;"> 412.2 </td> </tr> <tr> <td style="text-align:right;"> 548.4 </td> <td style="text-align:right;"> 11.5 </td> <td style="text-align:right;"> 131.3 </td> <td style="text-align:right;"> 1503.9 </td> </tr> <tr> <td style="text-align:right;"> 758.7 </td> <td style="text-align:right;"> 18.2 </td> <td style="text-align:right;"> 329.9 </td> <td style="text-align:right;"> 5993.0 </td> </tr> <tr> <td style="text-align:right;"> 444.7 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 16.3 </td> <td style="text-align:right;"> 65.6 </td> </tr> <tr> <td style="text-align:right;"> 748.3 </td> <td style="text-align:right;"> 18.0 </td>
<td style="text-align:right;"> 322.8 </td> <td style="text-align:right;"> 5800.8 </td> </tr> <tr> <td style="text-align:right;"> 820.5 </td> <td style="text-align:right;"> 18.9 </td> <td style="text-align:right;"> 357.0 </td> <td style="text-align:right;"> 6744.3 </td> </tr> <tr> <td style="text-align:right;"> 517.0 </td> <td style="text-align:right;"> 13.2 </td> <td style="text-align:right;"> 174.7 </td> <td style="text-align:right;"> 2308.3 </td> </tr> </tbody> </table>
]
.pull-right[
Other common expansions: b-splines, natural splines.

steps: `step_poly()`, `step_bs()`, `step_ns()`
]

---
## Interactions

Interaction between two explanatory variables: `species` and `bill_length_mm`

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-9-1.png" height="330" style="display: block; margin: auto;" />

---
## Interactions

Proposed model (math): let `y = flipper_length_mm` and `x = bill_length_mm`,

`$$\small \begin{array}{l} y = \beta_0 + \beta_1x\end{array}$$`

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-10-1.png" height="260" style="display: block; margin: auto;" />

Proposed model (in R): `flipper_length_mm ~ bill_length_mm`

---
## Interactions

Proposed model (math): let `y = flipper_length_mm` and `x = bill_length_mm`,

`$$\small \begin{array}{l} y = \beta_0 + \beta_1x + \beta_2I_{Chinstrap} + \beta_3I_{Gentoo}\end{array}$$`

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-11-1.png" height="260" style="display: block; margin: auto;" />

Proposed model (in R): `flipper_length_mm ~ bill_length_mm + species`

---
## Interactions

Proposed model (math): let `y = flipper_length_mm` and `x = bill_length_mm`,

`$$\small \begin{array}{l} y = \beta_0 + \beta_1x + \beta_2I_{Chinstrap} + \beta_3I_{Gentoo} + \beta_4\color{red}{xI_{Chinstrap}} + \beta_5\color{red}{xI_{Gentoo}}\end{array}$$`

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-12-1.png" height="260" style="display: block; margin: auto;" />

Proposed model (in R):
`step_interact(~bill_length_mm:starts_with("species_"))`, applied after `step_dummy(species)`.

---
class: middle, center

## Example 04

---
## Other references

- Recommended transformations for each model: https://www.tmwr.org/pre-proc-table.html
- List of `recipes` steps: https://recipes.tidymodels.org/reference/index.html
- `embed`: for predictors with many categories: https://embed.tidymodels.org/
- Text: for columns that contain text: https://github.com/tidymodels/textrecipes
- Time series: https://business-science.github.io/timetk/reference/index.html#section-feature-engineering-operations-recipe-steps-

---
exclude: false

---
class: middle, center, inverse

# Classification

---
# Logistic Regression

.pull-left[
### For `\(Y \in \{0, 1\}\)` (binary)

$$ \log\left\(\frac{p}{1-p}\right\) = \beta_0 + \beta_1x $$

Or, equivalently:

$$ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1x)}} $$

```r
### In R:
library(tidymodels) # attaches parsnip (logistic_reg) and the %>% pipe
# for classification, `spam` must be a factor column of dt_spam
logistic_reg() %>%
  fit(spam ~ exclamacoes, data = dt_spam)
```
]
.pull-right[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-14-1.png" width="400" style="display: block; margin: auto;" />

.footnote[
See [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf), page 131 (Logistic Regression).
]
]

---
# Logistic Regression

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---
# Logistic Regression
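Below is a minimal base-R sketch of the two equivalent forms of the model. It uses simulated data.

- `qlogis()` computes the log-odds `log(p / (1 - p))`, and `plogis()` computes the logistic function `1 / (1 + exp(-eta))`.
- `glm(family = binomial)` fits the model by maximum likelihood. It is also the default `"glm"` engine behind `logistic_reg()`.
- The "true" coefficients (`-2` and `0.6`) and the Poisson-distributed counts are assumptions chosen only for illustration.
- `exclamacoes` simply mirrors the predictor name used in the slide's example.

```r
set.seed(42)

# The two forms are inverses: log(p / (1 - p)) <-> 1 / (1 + exp(-eta))
p <- 0.3
log_odds <- qlogis(p)         # log(0.3 / 0.7)
p_back   <- plogis(log_odds)  # 1 / (1 + exp(-log_odds)), back to 0.3

# Simulated data under an assumed "true" model (illustrative values)
n     <- 5000
beta0 <- -2
beta1 <- 0.6
exclamacoes <- rpois(n, lambda = 3)  # hypothetical counts of "!" per e-mail
spam <- rbinom(n, size = 1, prob = plogis(beta0 + beta1 * exclamacoes))
dt_sim <- data.frame(spam, exclamacoes)

# Logistic regression fitted by maximum likelihood
fit <- glm(spam ~ exclamacoes, family = binomial, data = dt_sim)
b <- coef(fit)
print(b)  # estimates should be close to the assumed -2 and 0.6

# type = "response" applies the logistic function to beta0 + beta1 * x
p_hat  <- predict(fit, type = "response")
manual <- plogis(b[[1]] + b[[2]] * dt_sim$exclamacoes)
print(max(abs(p_hat - manual)))  # essentially zero
```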
---
# Decision Tree
---
# Logistic Regression - Cost

The **metric** that logistic regression uses as its **cost function** is called *log-loss* (or *binary cross-entropy*):

`$$D = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$`

For each row of the dataset, this becomes:

.pull-left[
`$$D_i = \begin{cases} \\ -\log(\hat{y}_i) & \text{if} \space y_i = 1 \\\\\\ -\log(1-\hat{y}_i) & \text{if} \space y_i = 0 \\ \!\end{cases}$$`
]
.pull-right[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-18-1.png" height="280" style="display: block; margin: auto;" />
]

---
# Logistic Regression - Regularization

The **metric** that logistic regression uses as its **cost function** is called *log-loss* (or *binary cross-entropy*):

`$$D = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$`

Regularization works as in linear regression: a penalty on the coefficients is added to the cost. Below is the L1 (LASSO) penalty.

`$$D_{\text{regularized}} = D + \color{red}{\lambda}\sum_{j = 1}^{p}|\beta_j|$$`

**PS1:** If `\(\log\left(\frac{\hat{p}_i}{1-\hat{p}_i}\right) = \beta_0 + \beta_1x\)`, then `\(\hat{p}_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1x)}}\)`.

---
# Logistic Regression - Predictions

The "final product" is a vector of estimated probabilities.
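Below is a minimal base-R sketch that ties the cost, regularization and prediction slides together. All values are made up:

- `y` holds the observed classes (1 = Spam).
- `p_hat` holds the estimated probabilities.
- `beta` holds hypothetical coefficients.

The sketch turns probabilities into classes with a 50% cutoff and computes the per-row cost `D_i` and the log-loss `D`. It then adds the L1 penalty `lambda * sum(|beta_j|)`, leaving the intercept unpenalized as in the formula, where the sum starts at `j = 1`.

```r
# Hypothetical observed classes (1 = Spam, 0 = Not Spam) and estimated probabilities
y     <- c(1, 1, 1, 0, 0)
p_hat <- c(0.80, 0.45, 0.95, 0.10, 0.30)

# From probabilities to classes: cutoff at 50%
pred_class <- ifelse(p_hat > 0.5, "Spam", "Not Spam")

# Per-row cost: -log(p_hat) if y = 1, -log(1 - p_hat) if y = 0
d_i <- ifelse(y == 1, -log(p_hat), -log(1 - p_hat))

# Log-loss (binary cross-entropy): the mean of the per-row costs
log_loss <- -mean(y * log(p_hat) + (1 - y) * log(1 - p_hat))

# L1 (LASSO) penalty; beta0 (the intercept) is not penalized
beta   <- c(beta0 = -1.2, beta1 = 0.8, beta2 = -0.3)  # hypothetical coefficients
lambda <- 0.1
log_loss_reg <- log_loss + lambda * sum(abs(beta[-1]))

print(pred_class)
print(round(d_i, 3))
print(c(log_loss = log_loss, regularized = log_loss_reg))
```

A confident wrong prediction (for example, `p_hat = 0.05` when `y = 1`) costs `-log(0.05) ≈ 3`. A hesitant one like `0.45` costs only about `0.8`.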
.pull-left[
<table> <thead> <tr> <th style="text-align:right;background-color: white !important;text-align: center;"> excl. points </th> <th style="text-align:left;background-color: white !important;text-align: center;"> observed class </th> <th style="text-align:right;background-color: white !important;text-align: center;"> prob </th> <th style="text-align:left;background-color: white !important;text-align: center;"> predicted class </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;background-color: white !important;text-align: center;"> 167 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> 0.79 </td> <td style="text-align:left;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> Spam </td> </tr> <tr> <td style="text-align:right;background-color: white !important;text-align: center;"> 129 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> 0.45 </td> <td style="text-align:left;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> Not Spam </td> </tr> <tr> <td style="text-align:right;background-color: white !important;text-align: center;"> 299 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> 1.00 </td> <td style="text-align:left;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> Spam </td> </tr> <tr> <td style="text-align:right;background-color: white !important;text-align: center;"> 270
</td> <td style="text-align:left;background-color: white !important;text-align: center;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> 1.00 </td> <td style="text-align:left;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> Spam </td> </tr> <tr> <td style="text-align:right;background-color: white !important;text-align: center;"> 187 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> 0.89 </td> <td style="text-align:left;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> Spam </td> </tr> <tr> <td style="text-align:right;background-color: white !important;text-align: center;"> 85 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> 0.12 </td> <td style="text-align:left;background-color: white !important;text-align: center;font-weight: bold;color: purple !important;"> Not Spam </td> </tr> </tbody> </table>
]
.pull-right[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
]

---
# Confusion Matrix

.pull-left[
<table class="table table-bordered" style="font-size: 20px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Neg </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Pos </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Neg </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> TN </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> FN </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Pos </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> FP </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> TP </td> </tr> </tbody> </table> <br/> <table class="table table-bordered" style="font-size: 20px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: red !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">p > 50%</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Not Spam </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Spam </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 410 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 73 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 52 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 465 </td> </tr> </tbody> </table>
]
.pull-right[
$$ \begin{array}{lcc} \mbox{accuracy} & = & \frac{TP + TN}{TP + TN + FP + FN}\\\\ & & \\\\ \mbox{precision} & = & \frac{TP}{TP + FP}\\\\ & & \\\\ \mbox{recall/TPR} & = & \frac{TP}{TP + FN} \\\\ & & \\\\ \mbox{F1 score} & =& \frac{2}{1/\mbox{precision} + 1/\mbox{recall}}\\\\ & & \\\\ \mbox{FPR} & = & \frac{FP}{FP + TN} \end{array} $$
]

---
# Cutoff (Threshold)

.pull-left[
<table class="table table-bordered" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: red !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">p > 10%</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr>
<th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Not Spam </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Spam </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 267 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 8 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 195 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 530 </td> </tr> </tbody> </table> <table class="table table-bordered" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: red !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">p > 25%</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Not Spam </th> <th
style="text-align:right;background-color: white !important;text-align: center;"> Spam </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 332 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 28 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 130 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 510 </td> </tr> </tbody> </table> <table class="table table-bordered" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: red !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">p > 50%</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Not Spam </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Spam </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width:
3in; font-weight: bold;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 410 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 73 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 52 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 465 </td> </tr> </tbody> </table>
]
.pull-right[
<table class="table table-bordered" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: red !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">p > 75%</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Not Spam </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Spam </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 443 </td> <td style="text-align:right;background-color:
white !important;text-align: center;width: 2in; "> 112 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 19 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 426 </td> </tr> </tbody> </table> <table class="table table-bordered" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: red !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">p > 90%</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Not Spam </th> <th style="text-align:right;background-color: white !important;text-align: center;"> Spam </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Not Spam </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 456 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 171 </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Spam </td> <td
style="text-align:right;background-color: white !important;text-align: center;width: 3in; "> 6 </td> <td style="text-align:right;background-color: white !important;text-align: center;width: 2in; "> 367 </td> </tr> </tbody> </table>
]

---
# ROC Curve

.pull-left[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

[An introduction to ROC analysis](https://people.inf.elte.hu/kiss/11dwhdm/roc.pdf)
]
.pull-right[
<br/>
<table class="table table-bordered" style="font-size: 20px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Neg </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Pos </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Neg </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> TN </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> FN </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Pos </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> FP </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> TP </td> </tr> </tbody> </table>
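The TPR and FPR on this slide can be computed from the counts in the cutoff tables of the previous slide, with TN, FN, FP and TP read off each table. Below is a minimal base-R sketch of that calculation:

- It computes the metrics for the 50% cutoff.
- It computes one (FPR, TPR) pair per cutoff, giving five points on the ROC curve.
- It joins those points with (0, 0) and (1, 1) and applies the trapezoidal rule.

The trapezoidal result is only a coarse approximation of the AUC, because the real curve uses every possible cutoff.

```r
# Counts read off the confusion matrices on the cutoff slide
cm <- data.frame(
  cutoff = c(0.10, 0.25, 0.50, 0.75, 0.90),
  TN = c(267, 332, 410, 443, 456),
  FN = c(  8,  28,  73, 112, 171),
  FP = c(195, 130,  52,  19,   6),
  TP = c(530, 510, 465, 426, 367)
)

# Metrics for the 50% cutoff
m <- cm[cm$cutoff == 0.50, ]
accuracy  <- (m$TP + m$TN) / (m$TP + m$TN + m$FP + m$FN)
precision <- m$TP / (m$TP + m$FP)
recall    <- m$TP / (m$TP + m$FN)  # = TPR
f1        <- 2 / (1 / precision + 1 / recall)
print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, f1 = f1), 3))

# One ROC point (FPR, TPR) per cutoff: raising the cutoff lowers both rates
cm$TPR <- cm$TP / (cm$TP + cm$FN)
cm$FPR <- cm$FP / (cm$FP + cm$TN)
print(cm[, c("cutoff", "FPR", "TPR")])

# Coarse AUC: trapezoidal rule over the 5 points plus (0, 0) and (1, 1),
# ordered by increasing FPR
fpr <- c(0, rev(cm$FPR), 1)
tpr <- c(0, rev(cm$TPR), 1)
auc_approx <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc_approx)
```

Every table has 462 observed "Not Spam" and 538 observed "Spam". Only the split between the predicted rows changes with the cutoff.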
$$ \begin{array}{lcc} \mbox{TPR} & = & \frac{TP}{TP + FN} \\\\ & & \\\\ \mbox{FPR} & = & \frac{FP}{FP + TN} \end{array} $$
]

---
# ROC Curve - AUC Metric

.pull-left[
<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

[An introduction to ROC analysis](https://people.inf.elte.hu/kiss/11dwhdm/roc.pdf)
]
.pull-right[
<br/>
<table class="table table-bordered" style="font-size: 20px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Neg </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Pos </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Neg </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> TN </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> FN </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> Pos </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> FP </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> TP </td> </tr> </tbody> </table>

$$ \mbox{AUC} = \mbox{Area Under the ROC Curve} $$
]

**PS:** In practice, AUC ranges from 0.5 (no better than random guessing) to 1.0 (perfect separation). What would an AUC of zero mean?

---
# ROC Curve - Playground

<a href = "http://arogozhnikov.github.io/2015/10/05/roc-curve.html"> <img src="static/img/roc_curve.gif" style=" display: block; margin-left: auto; margin-right: auto;" /> </a>

---
# Multiple Cutoffs

.pull-left[
Risk by segment

<table class="table table-bordered" style="font-size: 20px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: white !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Observed</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:left;background-color: white !important;text-align: center;"> Predicted </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Neg </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Pos </th> <th style="text-align:left;background-color: white !important;text-align: center;"> N </th> <th style="text-align:left;background-color: white !important;text-align: center;"> Risk </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> A (up to 0.19) </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> 90 </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> 11 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 101 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 11% </td> </tr> <tr> <td
style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> B (up to 0.44) </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> 60 </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> 40 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 100 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 40% </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> C (up to 0.62) </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> 39 </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> 60 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 99 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 60% </td> </tr> <tr> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; font-weight: bold;"> D (0.62 or more) </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 3in; "> 20 </td> <td style="text-align:left;background-color: white !important;text-align: center;width: 2in; "> 80 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 100 </td> <td style="text-align:left;background-color: white !important;text-align: center;"> 80% </td> </tr> </tbody> </table>
]
.pull-right[
We can use the `score` however we like, for example to split the base into risk segments:

```r
library(dplyr)
# `dados` holds the predicted probabilities in a `score` column;
# case_when() checks the conditions in order and keeps the first match
dados %>%
  mutate(
    segmento = case_when(
      score < 0.19 ~ "A",
      score < 0.44 ~ "B",
      score < 0.62 ~ "C",
      score >= 0.62 ~ "D"))
```
]

<img src="02-intro-classificacao_files/figure-html/unnamed-chunk-32-1.png" width="800" style="display: block; margin: auto;" />

---
class: middle, center

## Example 05

---
class: middle, center

## Exercise 02