Regressão Linear

class: center, middle, inverse, title-slide

# Regressão Linear
## Teoria e Prática
### <img src = 'https://d33wubrfki0l68.cloudfront.net/9b0699f18268059bdd2e5c21538a29eade7cbd2b/67e5c/img/logo/cursor1-5.png' width = '20%'>
### 2020-03-14

---

## Curso-R

---

## Linha do tempo

---

## Agenda

- O que é e quando usar
- Parâmetro vs estimador
- Teste de Hipóteses e valor-p
- Interpretação dos parâmetros
- Desempenho: EQM e EPR
- Outliers
- Regressão Linear Múltipla
- Preditores Categóricos
- Transformações Não Lineares dos Preditores
- Interações
- Multicolinearidade
- Fazendo Predições
- <p style="color:red;margin-top:0px;margin-bottom:0px;">Interpretabilidade da Predição</p>
- <p style="color:red;margin-top:0px;margin-bottom:0px;">Sobreajuste (overfitting)</p>
- <p style="color:red;margin-top:0px;margin-bottom:0px;">Regularização</p>
- <p style="color:red;margin-top:0px;margin-bottom:0px;">LASSO e Seleção de Preditores</p>
- <p style="color:red;margin-top:0px;margin-bottom:0px;">Validação Cruzada</p>
- <p style="color:red;margin-top:0px;margin-bottom:0px;">Limitações da Regressão Linear</p>
- Exercício para casa

---

## Referência

---

## O que é e quando usar

.pull-left[

### Regressão Linear Simples

$$
y = \beta_0 + \beta_1x
$$

### Exemplo:

$$
dist = \beta_0 + \beta_1speed
$$

]

.pull-right[

![](index_files/figure-html/unnamed-chunk-1-1.png)

]

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 61 (Simple Linear Regression).
]

---

## O que é e quando usar

.pull-left[

### Regressão Linear Simples

$$
y = \beta_0 + \beta_1x
$$

### Exemplo:

$$
dist = \beta_0 + \beta_1speed
$$

]

.pull-right[

![](index_files/figure-html/unnamed-chunk-2-1.png)

]

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 61 (Simple Linear Regression).
]

---

## O que é e quando usar

.pull-left[

### Regressão Linear Simples

$$
y = \beta_0 + \beta_1x
$$

### Exemplo:

$$
dist = \beta_0 + \beta_1speed
$$

]

.pull-right[

![](index_files/figure-html/unnamed-chunk-3-1.png)

]

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 61 (Simple Linear Regression).
]

---

## O que é e quando usar

.pull-left[

### Regressão Linear Simples

$$
y = \beta_0 + \beta_1x
$$

]

.pull-right[

]

```r
# ajuste de uma regressão linear simples no R
*melhor_reta <- lm(dist ~ speed, data = cars)
melhor_reta
```

```
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932
```

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 61 (Simple Linear Regression).
]

---

## "Melhor Reta" segundo o quê?

Queremos a reta que **erre menos**.

Uma medida de erro: **E**rro **Q**uadrático **M**édio.

$$
EQM = \frac{1}{N}\sum(y_i - \hat{y_i})^2
$$

![](index_files/figure-html/unnamed-chunk-5-1.png)

---

## "Melhor Reta" segundo o quê?

Queremos a reta que **erre menos**.

Uma medida de erro: **E**rro **Q**uadrático **M**édio.

$$
EQM = \frac{1}{N}\sum(y_i - \hat{y_i})^2
$$

Ou seja, nosso **objetivo** é

## Encontrar `$\hat{\beta}_0$` e `$\hat{\beta}_1$` que nos retorne o ~menor~ EQM.

... que é o mesmo que dizer "encontrar a melhor reta que explique os dados".

OBS: o EQM é a nossa **Função de Custo**.

```r
# lembrete: exercício 1 do script!
```

---

## Qual o valor ótimo para `$\beta_0$` e `$\beta_1$`?

No nosso exemplo, a nossa **HIPÓTESE** é de que

$$
dist = \beta_0 + \beta_1speed
$$

Então podemos escrever o Erro Quadrático Médio como

$$
EQM = \frac{1}{N}\sum(y_i - \hat{y_i})^2 = \frac{1}{N}\sum(y_i -  \color{red}{(\hat{\beta}_0 + \hat{\beta}_1speed)})^2 
$$

Com ajuda do Cálculo é possível mostrar que os valores ótimos para `$\beta_0$` e `$\beta_1$` são

`$\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}$`

`$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$`

Já que vieram do EQM, eles são chamados de **Estimadores de Mínimos Quadrados**.

```r
# lembrete: exercício 2 do script!
```

---

## Depois de estimar...

$$
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1x
$$

### Exemplo:

$$
\hat{dist} = \hat{\beta}_0 + \hat{\beta}_1speed
$$

Colocamos um `$\hat{}$` em cima dos termos para representar "estimativas". Ou seja, `$\hat{y}_i$` é uma estimativa de `$y_i$`.

No nosso exemplo,

- `$\hat{\beta}_0$` é uma estimativa de `$\beta_0$` e vale `-17.579`.
- `$\hat{\beta}_1$` é uma estimativa de `$\beta_1$` e vale `3.932`.
- `$\hat{dist}$` é uma estimativa de `$dist$` e vale `-17.579 + 3.932 x speed`.

```r
# Exercício: se speed for 15 m/h, quanto que 
# seria a distância dist esperada?
```

---

## Teste de Hipóteses e valor-p

Exemplo: relação entre População Urbana e Assassinatos.

.pull-left[

![](index_files/figure-html/unnamed-chunk-9-1.png)

Modelo proposto:

`$$y = \beta_0 + \beta_1 x$$`

]

.pull-right[

Hipótese do pesquisador: 
> "Assassinatos não estão relacionados com a proporção de população urbana de uma cidade."

Tradução da hipótese em termos matemáticos:

$$
H_0: \beta_1 = 0 \space\space\space\space\space vs \space\space\space\space H_a: \beta_1 \neq 0
$$
Se a hipótese for verdade, então o `$\beta_1$` deveria ser zero. Porém, os dados disseram que `$\hat{\beta}_1 = 0.02$`.

### 0.02 é diferente de 0.00?

]

---

## 0.02 é diferente de 0.00?

### Saída do R

```
## 
## Call:
## lm(formula = Murder ~ UrbanPop, data = USArrests)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.537 -3.736 -0.779  3.332  9.728 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  6.41594    2.90669   2.207   0.0321 *
*## UrbanPop     0.02093    0.04333   0.483   0.6312  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.39 on 48 degrees of freedom
## Multiple R-squared:  0.00484,	Adjusted R-squared:  -0.01589 
## F-statistic: 0.2335 on 1 and 48 DF,  p-value: 0.6312
```

---

## 0.02 é diferente de 0.00?

Conceito importante: Os estimadores ( `$\hat{\beta}_0$` e `$\hat{\beta}_1$` no nosso caso) têm distribuições de probabilidade.

### Simulação de 1000 retas (ajustadas com dados diferentes).

![distrib_params](img/combined_gif.gif)

---

## 0.02 é diferente de 0.00?

Conceito importante: Os estimadores ( `$\hat{\beta}_0$` e `$\hat{\beta}_1$` no nosso caso) têm distribuições de probabilidade.

### A Teoria Assintótica nos fornece o seguinte resultado:

.pull-left[

`$t = \frac{\hat{\beta_1} - \beta_1}{\hat{\sigma}_{\beta_1}} \overset{\text{a}}{\sim}  t(N - 2)$`

#### Em que

`$\hat{\sigma}_{\beta_1} = \sqrt{\frac{EQM}{\sum(x_i - \bar{x})^2}}$`

Usamos essas distribuições assintóticas para testar as hipóteses.

]

.pull-right[

]

---

## 0.02 é diferente de 0.00?

Conceito importante: Os estimadores ( `$\hat{\beta}_0$` e `$\hat{\beta}_1$` no nosso caso) têm distribuições de probabilidade.

### A Teoria Assintótica nos fornece o seguinte resultado:

.pull-left[

`$t = \frac{\hat{\beta_1} - \beta_1}{\hat{\sigma}_{\beta_1}} \overset{\text{a}}{\sim}  t(N - 2)$`

#### Em que

`$\hat{\sigma}_{\beta_1} = \sqrt{\frac{EQM}{\sum(x_i - \bar{x})^2}}$`

Usamos essas distribuições assintóticas para testar as hipóteses.

]

.pull-right[

No nosso exemplo, a hipótese é `$H_0: \beta_1 = 0$`, então

`$$t = \frac{0.02 - 0}{0.04} = 0.48$$`

![distrib_params](img/teste_t.png)

]

---

## 0.02 é diferente de 0.00?

Então agora podemos tomar decisão! Se a estimativa cair muito distante da distribuição t da hipótese 0, decidimos por **rejeitá-la**. Caso contrário, decidimos por **aceitá-la** como verdade.

![testes_t_estrelas](img/testes_t_estrelas.png)

```
## NO R:
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  6.41594    2.90669   2.207   0.0321 *
## UrbanPop     0.02093    0.04333   0.483   0.6312  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

## Interpretação dos parâmetros

.pull-left[

![](index_files/figure-html/unnamed-chunk-12-1.png)

$$
y = \color{darkgblue}{\beta_0} + \color{darkgreen}{\beta_1}x
$$

]

.pull-right[

### Interpretações matemáticas

`$\color{darkgblue}{\beta_0}$` é o lugar em que a reta cruza o eixo Y.

`$\color{darkgreen}{\beta_1}$` é a derivada de Y em relação ao X. É quanto Y varia quando X varia em 1 unidade.

### Interpretações estatísticas

`$\color{darkgblue}{\beta_0}$` é a distância percorrida esperada quando o carro está parado (X = 0).

`$\color{darkgreen}{\beta_1}$` é o efeito médio na distância por variar 1 ml/h na velocidade do carro.

]

---

## Teste de Hipóteses e valor-p

### Exercício 3 do script:

No R, use a função `summary(melhor_reta)` (ver slide 9) para decidir se `speed` está associado com `dist`. Descubra o valor-p associado.

lembrete: o banco de dados se chama `cars`.

### Exercício 4 do script:

Interprete o parâmetro `$\beta_1$`.

---

## O modelo está bom?

### EQM e EPR

ERP significa *E*rro *P*adrão dos *R*esíduos e é definido como

$$
EPR = \frac{\sum(y_i - \hat{y_i})^2}{N - 2} = \frac{SQR}{N - 2}
$$
O **2** no denominador decorre do fato de termos **2 parâmetros** para estimar no modelo.

- Se `$y_i = \hat{y}_i \space\space\space \rightarrow \color{green}{EPR = 0 \downarrow}$`
- Se `$y_i >> \hat{y}_i \rightarrow \color{red}{EPR = alto \uparrow}$`
- Se `$y_i << \hat{y}_i \rightarrow \color{red}{EPR = alto \uparrow}$`

Problema: Como sabemos se o EPR é grande ou pequeno?

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 68 (Assessing the Accuracy of the Model).

]

---

## O modelo está bom?

### R-quadrado ( `$R^2$` )

$$
R^2 = 1 - \frac{\sum(y_i - \color{salmon}{\hat{y_i}})^2}{\sum(y_i - \color{royalblue}{\bar{y}})^2} = 1 - \frac{\color{salmon}{SQR}}{\color{royalblue}{SQT}}
$$

.pull-left[

]

.pull-right[

`$R^2 \approx 1 \rightarrow \color{salmon}{SQR} << \color{royalblue}{SQT}$`.
`$R^2 \approx 0 \rightarrow \color{salmon}{reta} \text{ em cima da } \color{royalblue}{reta}$`.

Problema do `$R^2$` é que ele sempre aumenta conforme novos preditores vão sendo incluídos.

]

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 68 (Assessing the Accuracy of the Model).
]

---

## O modelo está bom?

### R-quadrado ajustado

$$
R^2 = 1 - \frac{\color{salmon}{SQR}}{\color{royalblue}{SQT}}\frac{\color{royalblue}{N-1}}{\color{salmon}{N-p}}
$$

Em que `$p$` é o número de parâmetros do modelo (no caso da regressão linear simples, `$p = 2$`).

```r
# lembrete: exercícios 5 e 6 do script!
```

---

## Outliers

Resíduo = `$y_i - \hat{y}_i$` (observado - esperado).

```r
# lembrete: faça o exercício 7 do script
```

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 96 (Outliers).
]

---

## Regressão Linear Múltipla

### Regressão Linear Simples

$$
y = \beta_0 + \beta_1x
$$

### Regressão Linear Múltipla

$$
y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_px_p
$$

```r
# ajuste de uma regressão linear múltipla no R
*modelo_boston <- lm(medv ~ lstat + age, data = Boston)
summary(modelo_boston)
#             Estimate Std.Error t value Pr(>|t|)    
# (Intercept) 33.22    0.73      45.4    < 2e-16 ***
# lstat       -1.03    0.04     -21.4    < 2e-16 ***
# age          0.03    0.01       2.8    0.00491 ** 
```

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 71 (Multiple Linear Regression).
]

---

## Regressão Linear Múltipla

### Exemplo: Plano em vez de reta

Modelo: `lm(mpg ~ disp + wt, data = mtcars)`

.footnote[
Fonte: [sthda.com/impressive-package-for-3d](http://www.sthda.com/english/wiki/impressive-package-for-3d-and-4d-graph-r-software-and-data-visualization)
]

---

## Preditores Categóricos

### Preditor com apenas 2 categorias

Saldo médio no cartão de crédito é diferente entre homens e mulheres?

![](index_files/figure-html/unnamed-chunk-19-1.png)

```r
summary(lm(Balance ~ Gender, data = Credit))
# Coefficients:
#              Estimate  Std.Error  t value Pr(>|t|)    
# (Intercept)    509.80  33.13      15.389   <2e-16 ***
# GenderFemale    19.73  46.05       0.429    0.669   
```

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 84 (Predictors with Only Two Levels).
]

---

## Preditores Categóricos

### Preditor com apenas 2 categorias

Saldo médio no cartão de crédito é diferente entre homens e mulheres?

![](index_files/figure-html/unnamed-chunk-21-1.png)

$$
y_i = \beta_0 + \beta_1x_i \space\space\space\space\space\space \text{em que}\space\space\space\space\space\space x_i = \Bigg\\{\begin{array}{ll}1&\text{se a i-ésima pessoa for }\texttt{female}\\\\
0&\text{se a i-ésima pessoa for } \texttt{male}\end{array}
$$

```r
# lembrete: exercícios 8 e 9 do script!
```

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 84 (Predictors with Only Two Levels).
]

---

## Preditores Categóricos

### Preditor com 3 ou mais categorias

.pull-left[

![](index_files/figure-html/unnamed-chunk-23-1.png)

```r
summary(lm(Balance ~ Ethnicity, data = Credit))
#                    Estimate  Std.Error t value Pr(>|t|)    
# (Intercept)          531.00  46.32     11.464   <2e-16 ***
# EthnicityAsian       -18.69  65.02     -0.287    0.774    
# EthnicityCaucasian   -12.50  56.68     -0.221    0.826  
```

]

.pull-right[

Modelo

`$$y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}$$`

Em que

`$x_{1i} = \Bigg \{ \begin{array}{ll} 1 & \text{se for }\texttt{Asian}\\0&\text{caso contrário}\end{array}$`

`$x_{2i} = \Bigg \{ \begin{array}{ll} 1 & \text{se for }\texttt{Caucasian}\\0&\text{caso contrário}\end{array}$`

]

---

## Preditores Categóricos

### Preditor com 3 ou mais categorias

"One hot enconding" ou "Dummies" ou "Indicadores".

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Ethnicity </th>
   <th style="text-align:right;"> (Intercept) </th>
   <th style="text-align:right;"> EthnicityAsian </th>
   <th style="text-align:right;"> EthnicityCaucasian </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Caucasian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Caucasian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Caucasian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> African American </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asian </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

```r
# lembrete: exercício 10 do script!
```

---

## Preditores Categóricos

### Preditor com 3 ou mais categorias

Interpretação dos parâmetros:

`$y_{i} = \left\{ \begin{array}{ll} \beta_0 & \text{se for }\texttt{Afro American}\\ \beta_0 + \beta_1&\text{se for } \texttt{Asian}\\ \beta_0 + \beta_2&\text{se for } \texttt{Caucasian}\end{array}\right.$`

```r
# interprete cada um dos três parâmetros individualmente.
# lembrete: exercícios 10 e 11 do script!
```

---

## Transformações Não Lineares dos Preditores

### Exemplo: log

.pull-left[
Modelo real: `$y = 10 + 0.5log(x)$` 
]

.pull-right[
Modelo proposto: `$\small y = \beta_0 + \beta_1log(x)$` 
]

Outras transformações comuns: raíz quadrada, polinômios, Box-Cox, ...

```r
# lembrete: exercício 12 do script!
```

---

## Transformações Não Lineares dos Preditores

### Exemplo: Regressão Polinomial

.pull-left[
Modelo real: `$y = 500 + 0.4(x-10)^3$` 
]

.pull-right[
Modelo proposto: `$y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3$` 
]

![](index_files/figure-html/unnamed-chunk-31-1.png)

```r
# lembrete: exercício 13 do script!
```

---

## Interações

Modelo proposto (Matemático): Seja `Sepal.Width` o `$y$` e `Sepal.Length` o `$x$`,

`$$\small \begin{array}{l} y = \beta_0 + \beta_1x + \beta_2I_{versicolor} + \beta_3I_{virginica} + \beta_4\color{red}{xI_{versicolor}} + \beta_5\color{red}{xI_{virginica}}\end{array}$$`

Modelo proposto (em R): `Sepal.Width ~ Sepal.Length * Species`

```r
# lembrete: exercícios 14 ao 17 do script!
```

---

## Multicolinearidade

.pull-left[

![](index_files/figure-html/unnamed-chunk-35-1.png)

]

.pull-right[

Modelo 1: sem colineares

<table class="table" style="font-size: 13px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:left;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> -173.41 </td>
   <td style="text-align:left;"> <span style="     color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;text-align: r;">43.83</span> </td>
   <td style="text-align:right;"> -3.96 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Limit </td>
   <td style="text-align:right;"> 0.17 </td>
   <td style="text-align:left;"> <span style="     color: white !important;padding-right: 4px; padding-left: 4px; background-color: darkred !important;text-align: r;">0.01</span> </td>
   <td style="text-align:right;"> 34.50 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Age </td>
   <td style="text-align:right;"> -2.29 </td>
   <td style="text-align:left;"> <span style="     color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;text-align: r;">0.67</span> </td>
   <td style="text-align:right;"> -3.41 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

Modelo 2: com colineares

<table class="table" style="font-size: 13px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:left;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> -377.54 </td>
   <td style="text-align:left;"> <span style="     color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;text-align: r;">45.25</span> </td>
   <td style="text-align:right;"> -8.34 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Limit </td>
   <td style="text-align:right;"> 0.02 </td>
   <td style="text-align:left;"> <span style="     color: white !important;padding-right: 4px; padding-left: 4px; background-color: darkred !important;text-align: r;">0.06</span> </td>
   <td style="text-align:right;"> 0.38 </td>
   <td style="text-align:right;"> 0.70 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Rating </td>
   <td style="text-align:right;"> 2.20 </td>
   <td style="text-align:left;"> <span style="     color: black !important;padding-right: 4px; padding-left: 4px; background-color: white !important;text-align: r;">0.95</span> </td>
   <td style="text-align:right;"> 2.31 </td>
   <td style="text-align:right;"> 0.02 </td>
  </tr>
</tbody>
</table>

]

Problema: Instabilidade numérica, desvios padrão inflados e interpretação comprometida.

Soluções: eliminar uma das variáveis muito correlacionadas ou Consultar o VIF (Variance Inflation Factor)

---

## Multicolinearidade

### VIF (Variance Inflation Factor)

Detecta preditores que são combinações lineares de outros preditores.

**Procedimento:** Para cada preditor `$X_j$`,

1) Ajusta regressão linear com as demais: `lm(X_j ~ X_1 + ... + X_p)`.

2) Calcula-se o R-quadrado dessa regressão e aplica a fórmula abaixo

`$$\small VIF(\hat{\beta}_j) = \frac{1}{1 - R^2_{X_j|X_{-j}}}$$`

3) Remova o preditor se VIF maior que 5 (regra de bolso).

```r
# lembrete: exercícios 18 do script!
```

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 101.
]

---

## Fazendo Predições

### Modelo

```r
modelo_iris <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris)
```

Suponha que dados novos chegaram:

```r
dados_novos <- tibble(Sepal.Length = 10, Species = "setosa")
```

### Utilizando a função `augment()` do pacote `broom`

```r
augment(modelo_iris, newdata = dados_novos)
##   Sepal.Length Species .fitted .se.fit
## 1           10 setosa     7.42   0.553
```

O valor estimado de `Sepal.Width` foi de `7.42` +/- `0.55`.

---

## Fazendo Predições

### Modelo

```r
modelo_iris <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris)
```

Suponha que dados novos chegaram:

```r
dados_novos <- data.frame(Sepal.Length = 10, Species = "setosa")
```

### Utilizando a função `predict()`

```r
dados_novos %>% 
  mutate(S.W.Est = predict(modelo_iris, newdata = dados_novos))
##  Sepal.Length Species S.W.Est
##            10 setosa     7.42
```

---

## Interpretabilidade da Predição

### LIME: Local Interpretable Model-agnostic Explanations

```r
# fazendo lm com caret::train() pq o lime soh aceita caret
modelo_iris <- train(Sepal.Width ~ Sepal.Length * Species, data = iris, method = "lm")
explicador <- lime(iris, modelo_iris)
explicacoes <- lime::explain(dados_novos, explicador, n_features = 2)
plot_features(explicacoes) 
```

.pull-left[

```r
# lembrete: exercícios 19 
# do script!
```

]

.pull-right[

![](index_files/figure-html/unnamed-chunk-48-1.png)

]

.footnote[
Ver [LIME for R](https://github.com/thomasp85/lime) página 101.
]

---

## (Opcional) Abordagem Probabilística

Do ponto de vista probabilístico, modela-se o problema como uma amostra de N indivíduos, todos independentes entre si e com distribuição Normal.

$$
Y_i|x_i \sim N(\mu_i, \sigma^2), \space i = 1, \dots, N
$$

E então, supõem que a média de `$Y$` dado o valor de `$x$` seja linear:

$$
\mu = E[Y|x] = \beta_0 + \beta_1x
$$

Assim, gostaríamos de achar `$\beta_0$` e `$\beta_1$` que fizessem dessa amostra a mais verossímil possível.

Daí entra o conceito de verossimilhança, que é a probabilidade conjunta dos dados acontecerem:

$$
P(Y_1, \dots, Y_N|x) \overset{\text{indep}}{=} P(Y_1|x_1)P(Y_2|x_2)\dots P(Y_N|x_N)
$$

continua...

---

## (Opcional) Abordagem Probabilística

Se tirarmos o `$logarítmo$` dessa probabilidade conjunta, teremos:

$$
logP(Y_1, \dots, Y_N|x) \overset{\text{indep}}{=} logP(Y_i|x_1) + log P(Y_2|x_2) +\dots +logP(Y_N|x_N)
$$

Que podemos escrever de forma mais sussinta usando um somatório:

`$$\ell = logP(Y_1, \dots, Y_N|x) = \sum_{i = 1}^{N}logP(Y_i|x_i)$$`

Essa expressão que chamamos de `$\ell$` é conhecida como log-verossimilhança (log-likelihood no inglês).

continua...

---

## (Opcional) Abordagem Probabilística

Já que assumimos que `$Y_i|x_i$` segue uma distribuição `$N(\beta_0 + \beta_1x_i, \sigma^2)$`, temos que:

$$
\ell =  \sum_{i = 1}^{N}log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(y_i - \mu_i)^2 \right] \right)
$$

Que depois de simplificar (e deixando as constantes de fora), fica

`$$\ell = -\frac{1}{N}\sum(y_i - \mu_i)^2 = -\frac{1}{N}\sum(y_i -  \color{red}{(\beta_0 + \beta_1speed)})^2 = -EQM$$`

Ou seja, maximizar a verossimilhança é equivalente a minimizar o EQM como vínhamos fazendo.

---

## Questões importantes

Questões que usualmente estamos interessados quando ajustamos uma regressão linear.

- Pelo menos um dos preditores  `$X1, X2,\dots,X_p$` é útil para prever/explicar?

- Todos os preditores são úteis ou apenas um subconjunto deles que é?

- O quão bem o modelo se ajusta aos dados?

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 75 (Some Important Questions).
]

---

## Sobreajuste (overfitting)

Modelo real é de **grau 3**

![distrib_params](img/overfiting_sem_teste.gif)

---

## Sobreajuste (overfitting)

Modelo real é de **grau 3**

![distrib_params](img/overfiting_com_teste.gif)

---

## Sobreajuste (overfitting)

Modelo real é de **grau 3**

![scatter_eqm](img/overfiting_scatter_eqm.gif)

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 61 (Simple Linear Regression).
]

---

## Regularização

Relembrando o nossa **função de custo** EQM.

`$$EQM = \frac{1}{N}\sum(y_i - \hat{y_i})^2 = \frac{1}{N}\sum(y_i -  \color{red}{(\hat{\beta}_0 + \hat{\beta}_1x_{1i} + \dots + \hat{\beta}_px_{pi})})^2$$`

Regularizar é "não deixar os `$\beta's$` soltos demais".

`$$EQM_{regularizado} = EQM + \color{red}{\lambda}\sum_{j = 1}^{p}|\beta_j|$$`

Ou seja, **penalizamos** a função de custo se os `$\beta's$` forem muito grandes.

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 203 (Linear Model Selection and Regularization).
]

---

## LASSO e Seleção de Preditores

Conforme aumentamos o `$\color{red}{\lambda}$`, forçamos os `$\beta's$` a serem cada vez menores.

![scatter_eqm](img/lasso_lambda.png)

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 219 (The LASSO).
]

---

## LASSO e Seleção de Preditores

Conforme aumentamos o `$\lambda$`, forçamos os `$\beta's$` a serem cada vez menores.

![scatter_eqm](img/betas.png)

.footnote[
Ver [ISL](https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf) página 219 (The LASSO).
]

---

## Validação Cruzada

```
## #  5-fold cross-validation 
## # A tibble: 5 x 6
##   splits          id    n_treino n_teste regressao    eqm_teste
## * <named list>    <chr>    <dbl>   <dbl> <named list>     <dbl>
## 1 <split [40/10]> Fold1       40      10 <lm>              11.9
## 2 <split [40/10]> Fold2       40      10 <lm>              15.0
## 3 <split [40/10]> Fold3       40      10 <lm>              18.2
## 4 <split [40/10]> Fold4       40      10 <lm>              21.0
## 5 <split [40/10]> Fold5       40      10 <lm>              11.0
```

```
## [1] 15.426
```

ERRO DE VALIDAÇÃO CRUZADA: `$$EQM_{cv} = \frac{1}{10}\sum_{i=1}^{10}EQM_{Fold_i} = 14,671$$`

---

## Validação Cruzada

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> fold </th>
   <th style="text-align:right;"> speed </th>
   <th style="text-align:right;"> dist </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: #F8766D !important;"> fold 1 </td>
   <td style="text-align:right;background-color: #F8766D !important;"> 4 </td>
   <td style="text-align:right;background-color: #F8766D !important;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #F8766D !important;"> fold 1 </td>
   <td style="text-align:right;background-color: #F8766D !important;"> 4 </td>
   <td style="text-align:right;background-color: #F8766D !important;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #D39200 !important;"> fold 2 </td>
   <td style="text-align:right;background-color: #D39200 !important;"> 7 </td>
   <td style="text-align:right;background-color: #D39200 !important;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #D39200 !important;"> fold 2 </td>
   <td style="text-align:right;background-color: #D39200 !important;"> 7 </td>
   <td style="text-align:right;background-color: #D39200 !important;"> 22 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #93AA00 !important;"> fold 3 </td>
   <td style="text-align:right;background-color: #93AA00 !important;"> 8 </td>
   <td style="text-align:right;background-color: #93AA00 !important;"> 16 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #93AA00 !important;"> fold 3 </td>
   <td style="text-align:right;background-color: #93AA00 !important;"> 9 </td>
   <td style="text-align:right;background-color: #93AA00 !important;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #00BA38 !important;"> fold 4 </td>
   <td style="text-align:right;background-color: #00BA38 !important;"> 10 </td>
   <td style="text-align:right;background-color: #00BA38 !important;"> 18 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #00BA38 !important;"> fold 4 </td>
   <td style="text-align:right;background-color: #00BA38 !important;"> 10 </td>
   <td style="text-align:right;background-color: #00BA38 !important;"> 26 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #00C19F !important;"> fold 5 </td>
   <td style="text-align:right;background-color: #00C19F !important;"> 10 </td>
   <td style="text-align:right;background-color: #00C19F !important;"> 34 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #00C19F !important;"> fold 5 </td>
   <td style="text-align:right;background-color: #00C19F !important;"> 11 </td>
   <td style="text-align:right;background-color: #00C19F !important;"> 17 </td>
  </tr>
</tbody>
</table>

---

## Limitações da Regressão Linear

- Variável resposta Não Normal
- Variável resposta Positiva
- Variável resposta Categórica
- Relação funcional não linear entre X e Y
- Muitas interações para testar entre as preditoras