---
title: "Poisson Pseudo-Maximum Likelihood (PPML) Model with Cluster-Robust Standard Errors"
output: rmarkdown::html_vignette
bibliography: "references.bib"
vignette: >
  %\VignetteIndexEntry{Poisson Pseudo-Maximum Likelihood (PPML) Model with Cluster-Robust Standard Errors}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

We will estimate a Poisson Pseudo-Maximum Likelihood (PPML) model using the
data available in this package in order to replicate the PPML results
from Table 3 in @yotov2016advanced.

This requires including exporter-time and importer-time fixed effects and
clustering the standard errors by exporter-importer pair.

The PPML specification corresponds to:
\begin{align}
X_{ij,t} =& \:\exp\left[\beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} +\right.\\
\text{ }& \:\left.\beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \pi_{i,t} + \chi_{j,t}\right] \times \varepsilon_{ij,t}.
\end{align}

We use `dplyr` to obtain the log of the distance. This model excludes
domestic flows, so we also need to subset the data with `dplyr`.
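
The two preparation steps can be sketched as follows. This is only an
illustration on a small made-up data frame: the object `raw` and its columns
`exporter`, `importer`, `dist` and `trade` are assumptions for the example and
are not taken from the package data.

```r
library(dplyr)

# Hypothetical raw bilateral trade data (names and values are made up)
raw <- data.frame(
  exporter = c("ARG", "ARG", "BRA", "BRA"),
  importer = c("ARG", "BRA", "ARG", "BRA"),
  dist     = c(100, 2400, 2400, 150),
  trade    = c(500, 40, 35, 900)
)

prepared <- raw %>%
  filter(exporter != importer) %>% # drop domestic flows
  mutate(log_dist = log(dist))     # log of the bilateral distance

prepared
```

Rows where the exporter and the importer coincide are dropped before the
distance is transformed, so only international flows remain.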

Required packages:

```r
library(capybara)
```

We use the `fepoisson()` function to obtain the estimated coefficients,
adding the fixed effects to the formula as `| exp_year + imp_year`.

Model estimation:

```r
fit <- fepoisson(
  trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year,
  data = trade_panel
)

summary(fit)
```

```
Formula: trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year

Family: Poisson

Estimates:

|          | Estimate | Std. Error | z value    | Pr(>|z|)   |
|----------|----------|------------|------------|------------|
| log_dist |  -0.8216 |     0.0004 | -2194.0448 | 0.0000 *** |
| cntg     |   0.4155 |     0.0009 |   476.0613 | 0.0000 *** |
| lang     |   0.2499 |     0.0008 |   296.8884 | 0.0000 *** |
| clny     |  -0.2054 |     0.0010 |  -206.3476 | 0.0000 *** |
| rta      |   0.1907 |     0.0010 |   191.0964 | 0.0000 *** |

Significance codes: *** 99.9%; ** 99%; * 95%; . 90%

Pseudo R-squared: 0.587 

Number of observations: Full 28152; Missing 0; Perfect classification 0 

Number of Fisher Scoring iterations: 11
```

The coefficients are almost identical to those in Table 3 of
@yotov2016advanced, which were obtained with Stata. The small differences are
attributable to the different fitting algorithms used by each software.
Capybara uses the demeaning algorithm proposed by @stammann2018fast.
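
To see what the fixed effects represent, the same kind of model can be written
as a plain Poisson GLM with one dummy variable per exporter-year and
importer-year. The sketch below uses base R on simulated data. All names and
values (`toy`, the country labels, the true coefficient of -0.8) are
assumptions for illustration. It is not how Capybara fits the model, since
demeaning avoids building the dummy columns explicitly.

```r
set.seed(123)

# Simulated toy panel: 6 hypothetical countries, 3 years, no domestic flows
countries <- c("A", "B", "C", "D", "E", "F")
pairs <- expand.grid(
  exporter = countries, importer = countries,
  stringsAsFactors = FALSE
)
pairs <- pairs[pairs$exporter != pairs$importer, ]
pairs$log_dist <- log(runif(nrow(pairs), 100, 5000)) # constant within a pair

# Cross join of pairs and years (merge() with no common columns)
toy <- merge(pairs, data.frame(year = 2000:2002))

# Exporter-year and importer-year identifiers
toy$exp_year <- paste0(toy$exporter, toy$year)
toy$imp_year <- paste0(toy$importer, toy$year)

# Trade flows drawn from a Poisson model with a distance elasticity of -0.8
toy$trade <- rpois(nrow(toy), lambda = exp(10 - 0.8 * toy$log_dist))

# Poisson GLM with explicit dummies; redundant dummies are reported as NA
fit_glm <- glm(
  trade ~ log_dist + factor(exp_year) + factor(imp_year),
  family = poisson(),
  data = toy
)

b_dist <- unname(coef(fit_glm)["log_dist"])
b_dist
```

With many countries and years the dummy approach becomes slow and memory
hungry, which is the motivation for absorbing the fixed effects instead.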

To cluster the standard errors by exporter-importer pair, we add the cluster
variable as a third part of the formula (`| pair`) and call `summary()` with
`type = "clustered"`:

```r
fit <- fepoisson(
  trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year | pair,
  data = trade_panel
)

summary(fit, type = "clustered")
```

```
Formula: trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year | 
    pair

Family: Poisson

Estimates:

|          | Estimate | Std. Error | z value | Pr(>|z|)   |
|----------|----------|------------|---------|------------|
| log_dist |  -0.8216 |     0.1567 | -5.2437 | 0.0000 *** |
| cntg     |   0.4155 |     0.4568 |  0.9097 | 0.3630     |
| lang     |   0.2499 |     0.3997 |  0.6252 | 0.5319     |
| clny     |  -0.2054 |     0.3287 | -0.6250 | 0.5320     |
| rta      |   0.1907 |     0.7657 |  0.2491 | 0.8033     |

Significance codes: *** 99.9%; ** 99%; * 95%; . 90%

Pseudo R-squared: 0.587 

Number of observations: Full 28152; Missing 0; Perfect classification 0 

Number of Fisher Scoring iterations: 11
```

The result is similar, and the numerical difference comes from the
variance-covariance matrix estimation method. Capybara's clustering
algorithm is based on @cameron2011robust.
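
The idea behind one-way cluster-robust standard errors can be sketched with a
"sandwich" computed by hand in base R. This is a simplified illustration on
simulated data, without fixed effects and without any finite-sample
correction, so it should not be read as Capybara's implementation. The objects
`toy` and `fit_toy` and all simulated values are assumptions for the example.

```r
set.seed(42)

# Simulated data: 50 hypothetical pairs, each observed in 4 periods
n_pairs <- 50
toy <- data.frame(pair = rep(seq_len(n_pairs), each = 4))
toy$log_dist <- rep(rnorm(n_pairs, mean = 7, sd = 1), each = 4)
toy$trade <- rpois(nrow(toy), lambda = exp(8 - 0.8 * toy$log_dist))

fit_toy <- glm(trade ~ log_dist, family = poisson(), data = toy)

X  <- model.matrix(fit_toy)
mu <- fitted(fit_toy)

# Observation-level score contributions: x_i * (y_i - mu_i)
score <- X * (toy$trade - mu)

# Bread: inverse of X' diag(mu) X
bread <- solve(crossprod(X, mu * X))

# Meat: sum over clusters of the outer product of the summed scores
score_by_pair <- rowsum(score, toy$pair)
meat <- crossprod(score_by_pair)

vcov_cl <- bread %*% meat %*% bread
se_cl <- sqrt(diag(vcov_cl))
se_cl
```

Summing the scores within each pair before taking outer products is what
allows for arbitrary correlation among observations of the same pair.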

# References