\documentclass[a4paper]{article}
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{vmargin}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{lmodern}
\usepackage{fix-cm}
\usepackage{hyperref}
\usepackage{color}

\newenvironment{knitrout}{}{}
\newenvironment{Sinput}{\begin{alltt}}{\end{alltt}}
\newenvironment{Soutput}{\begin{alltt}}{\end{alltt}}
\newenvironment{Scode}{\begin{alltt}}{\end{alltt}}

\DeclareMathAlphabet{\mathbbold}{U}{bbold}{m}{n}

\title{\texttt{bdsm}: Bayesian dynamic systems modelling. \\
Bayesian model averaging for dynamic panels \\
with weakly exogenous regressors}

\author{%
  Mateusz Wyszyński$^{1, 2}$\and
  Krzysztof Beck$^{1}$ \and
  Marcin Dubel$^{1}$
}

\date{}

\begin{document}
\SweaveOpts{concordance=TRUE}

<<setup, include=FALSE, echo=FALSE>>=
knitr::opts_chunk$set(
  fig.width = 5,
  fig.height = 3,
  dev = "pdf",
  dpi = 72
)
@

\footnotetext[1]{Lazarski University, $^{2}$University of Warsaw}
%\setpapersize{A4}
% \VignetteIndexEntry{bdsm: Bayesian dynamic systems modelling}
\maketitle
\abstract{This manuscript introduces the \verb+bdsm+ package,
which enables Bayesian model averaging
for dynamic panels with weakly exogenous regressors
— a methodology developed by \citet{Moral+2016}.
The package allows researchers to simultaneously address
model uncertainty and reverse causality.
The manuscript includes a hands-on tutorial
accessible to users unfamiliar with this approach.
In addition to calculating the model space and providing key BMA statistics,
the package offers flexible options for specifying model priors,
or including a dilution prior that accounts for multicollinearity.
It also provides graphical tools
for visualizing prior and posterior model probabilities,
as well as functions for plotting histograms
and kernel densities of the estimated coefficients.
Furthermore, the package enables researchers to compute jointness measures
and perform Bayesian model selection
to examine the most probable models based on posterior model probabilities.}


\tableofcontents

\clearpage
\section{Introduction}

Since the seminal works of \citet{Leamer+1978,Leamer+1981,Leamer+1983,Leamer+1985}, there has been an increased focus on reporting the fragility of regression estimates.
\citet{Leamer+1983} proposed Extreme Bounds Analysis (EBA) as a remedy for addressing the sensitivity of empirical research findings\footnote{\citet{Hlavac+2016} developed an R package for EBA.}.
In economics, growth regressions \citep{Barro+1991} became a central focus of research on economic growth during the 1990s.
However, the credibility of these results was challenged when \citet{Levine+1992} applied EBA to cross-country economic growth data.
The authors found that investment as a share of GDP was the only variable robust to changes in model specification.
In response, EBA was criticized for being too stringent\footnote{\citet{Granger+1990} proposed a less restrictive variant of EBA.}, leading to the proposal of alternative approaches \citep{Sala+1997}.

Bayesian model averaging (BMA) emerged as a preferred method during a period when studies of economic growth advanced alongside methodological innovations \citep{Fernandez+2001,Fernandez+2001b,Sala+2004,Eicher+2007,Ley+2012,Moser+2014,Fernandez+2001,Fernandez+2001b,Arin+2019}.
As a result, BMA became a widely used technique for assessing the robustness of regressors in economics\footnote{For a detailed review of BMA applications in economics, see \citet{Moral+2015,Steel+2020}.} (e.g., \citet{Liu+2009,Ductor+2016,Figini+2017,Beck+2022,DAndrea+2022,Horvath+2024}), as well as in other fields (e.g., \citet{Sloughter+2013,Baran+2015,Aller+2021,Guliyev+2024,Payne+2024,Beck+2025}).
Moreover, the growing interest in BMA was fueled by the availability of R packages such as \verb+BMA+ \citep{Raftery+2005}, \verb+BAS+ \citep{clyde2011bayesian}, and \verb+BMS+ \citep{Feldkircher+2015}, along with the \verb+gretl+ BMA package developed by \citet{Blazejowski+2015}.

The primary issue with the Bayesian model averaging in the aforementioned studies was its reliance on the assumption of exogenous regressors.
In many contexts, particularly in economics, this premise is unsuitable.
Instead, the assumption of endogenous variables within a simultaneous equations framework is more fitting.
Consequently, a new line of research relaxed the assumption of exogenous regressors \citet{Lenkoski+2014,Leon+2015,Mirestean+2016,Moral+2016,Chen+2018}.
However, these methods have not found their way into mainstream research.
The code to implement them is only available upon request from the authors and is provided exclusively for \verb+MATLAB+ and \verb+GAUSS+.

The \verb+bdsm+ package was developed to address this gap.
It offers tools for performing Bayesian model averaging on dynamic panels with weakly exogenous regressors.
As a result, it enables researchers to address both model uncertainty and reverse causality.
The core of the code is based on the methodological approach developed by \citet{Moral+2012,Moral+2013,Moral+2016}.
While the main aspects of the method are described in the manuscript, interested readers should refer to the original articles for further details.
In addition to the key features developed by \citet{Moral+2016}, the \verb+bdsm+ package offers a wide range of additional functionalities.
The package enables users to employ flexible model prior options, along with a dilution prior, which helps account for multicollinearity.
The \verb+bdsm+ package provides users with graphical options for plotting prior and posterior model probabilities across model sizes and the model space.
Additionally, users can utilize Bayesian model selection to thoroughly examine the best models based on posterior model probability.
The package calculates jointness measures developed by \citet{Doppelhofer+2009,Ley+2007,Hofmarcher+2018}.
Finally, it offers users the option to plot histograms or kernel densities of the estimated coefficients for the examined regressors.

The remainder of the manuscript is structured as follows.
Section \ref{ms_bma} describes the dynamic panel setup considered by \citet{Moral+2013} and outlines the Bayesian model averaging approach used in the package.
Data preparation is detailed in Section \ref{data}, while Section \ref{model_space} addresses the estimation of the model space.
Section \ref{using_bma} provides an overview of the \verb+bdsm+ functions related to performing Bayesian model averaging, calculating jointness measures, and presenting the estimation results.
The details of the model prior choices are described in Section \ref{priors}.
Finally, Section \ref{sum} offers some concluding remarks.


\section{Model setup and Bayesian model averaging}\label{ms_bma}
This section outlines the model setup, describes the approach to Bayesian model averaging implemented in the package, summarizes the main BMA statistics, and discusses model priors and jointness measures.

\subsection{Model setup}
\noindent \citet{Moral+2016} considers the following model specification:
\begin{equation}\label{eq1}
    y_{it}=\alpha y_{it-1}+\beta x_{it}+\eta_{i}+\zeta_{t}+v_{it}
\end{equation}
\noindent where $y_{it}$ is the dependent variable, $i$ $(=1,...,N)$ indexes entity (ex. country), $t$ $(=1,...,T)$ indexes time, $x_{it}$ is a matrix of growth determinants, $\beta$ is a parameter vector, $\eta_{i}$ is an entity specific fixed effect, $\zeta_{t}$ is a period-specific shock and $v_{it}$ is a shock to the dependent variable.
To address the issue of reverse causality the model is build on the assumption of weak exogeneity, that can be formalized as
\begin{equation}\label{eq2}
    \mathbb{E}(v_{i,t}|y^{t-1}_{t},x^{t}_{i},\eta_{i})=0
\end{equation}
\noindent where $y^{t-1}_{t}=(y_{i,0},...,y_{i,t-1})'$ and $x^t_{i}=(x_{i,0},...,x_{i,t})'$.
Accordingly, weak exogeneity implies that the current values of the regressors, lagged dependent variable, and fixed effects are uncorrelated with the current shocks, while they are all allowed to be correlated with each other at the same time.
On the assumption of weakly exogenous regressors, \citet{Moral+2013} augmented equation (\ref{eq1}) with additional reduced-form equations capturing the unrestricted feedback process:
\begin{equation}\label{eq3}
x_{it}=\gamma_{t0}y_{i0}+...+\gamma_{tt-1}y_{it-1}+\Lambda_{t1}x_{i1}+...+\Lambda_{tt-1}x_{it-1}+c_{t}\eta_{i}+\vartheta_{it}
\end{equation}
\noindent where $t=2,\dots ,T;$ $c_{t}$ is the $k\times 1$ vector of parameters.
For $h<t$, $\gamma_{th}$ is a $k\times 1$ vector $(y_{th}^{1},\dots,y_{th}^{k})'$  $h=0,\dots,T-1$; $\Lambda_{th}$ is a $k\times k$ matrix of parameters, and $\vartheta_{it}$ is a $k\times 1$ of prediction errors. The initial observations are defined with
\begin{equation}\label{eq4}
    y_{i0}=c_{0}\eta_{i}+\upsilon_{it}
\end{equation}
\begin{equation}\label{eq5}
    x_{i1}=\gamma_{10}y_{i0}+c_{1}\eta_{i}+\vartheta_{it}
\end{equation}
\noindent where $c_{0}$ is a scalar, $c_{1}$ and $\gamma_{10}$ are $k\times 1$ vectors and $\eta_{i}$ are the individual effects. The mean vector and the covariance matrix of the joint distribution of the initial observations and the individual effects are unrestricted.\footnote{The method outperforms the Arellano--Bond estimator \citep{Moral+2019}.}
\indent For the model setup given in equations (\ref{eq1}) and (\ref{eq3}-\ref{eq5}), \citet{Moral+2013} derived the log-likelihood function:
\begin{equation}\label{eq6}
    \log f(data|\theta) \propto \frac{N}{2}\log\det(B^{-1}D\Sigma D'B'^{-1})-\frac{1}{2}\sum_{i=1}^{N}\{R'_{i}(B^{-1}D\Sigma D'B'^{-1})^{-1}R_{i}\}
\end{equation}
\noindent where $\theta$ denotes parameters to be estimated, $R_{i}=(y_{io},x'_{i1},y_{i1},\dots,x'_{iT},y_{iT})'$ are vectors of observed variables, and $\Sigma=diag[\sigma^{2}_{\eta},\sigma^{2}_{\upsilon_{0}},\Sigma_{\vartheta_{1}},\sigma^{2}_{\upsilon_{1}},...,\Sigma_{\vartheta_{T}},\sigma^{2}_{\upsilon_{T}}]$ is the block-diagonal variance-covariance matrix.
Matrix B is given by:
\begin{equation}\label{eq7}
B=\begin{bmatrix}
 1&0&0&0&0&\dotsc& 0&0&0\\
 -\gamma_{10}&I_{k}&0&0&0&\dotsc&0&0&0\\
 -\alpha&-\beta'&1&0&0&\dotsc&0&0&0\\
 -\gamma_{20}&-\Lambda_{21}&-\gamma_{21}&I_{k}&0&\dotsc&0&0&0\\
0&0&-\alpha&-\beta'&1&\dotsc&\vdots&\vdots&\vdots\\
\vdots&\vdots&\vdots&\vdots&\vdots&\ddots&0&0&0\\
 -\gamma_{T0}&-\Lambda_{T1}&-\gamma_{T1}&-\Lambda_{T2}&-\gamma_{T2}&\dotsc&-\gamma_{TT-1}&I_{k}&0\\
 0&0&0&0&0&\dotsc&-\alpha&-\beta'&1\\
\end{bmatrix}
\end{equation}
and matrix D is given by:
\begin{equation}\label{eq8}
D=\begin{bmatrix}
(c_{0}&c'_{1}&1&c'_{2}&1&\dotsc&c'_{T}&1)' & I_{T(k+1) + 1}
 \end{bmatrix}.
\end{equation}

\indent The model setup in equations (\ref{eq1}) and (\ref{eq3}-\ref{eq5}) requires that in addition to the parameters of interest $\alpha$ and $\beta$, the parameters $\gamma_{ij}$ and $\Lambda_{km}$ need to be estimated.
To make the optimization of likelihood computationally feasible, \citet{Moral+2013} developed Simultaneous Equations Model (SEM) setup where the parameters of non-central interest are incorporated in the variance-covariance matrix.
In the SEM setup, the model is defined by $1 + (T - 1)k + T$ equations:
\begin{equation}\label{sem-eqs}
  \begin{cases}
    \eta_i = \phi_i y_{i0} + x'_{i1} \phi_1 + \epsilon_i &  \\
    x_{it} = \pi_{t0} y_{i0} + \pi_{t1} x_{i1} + \pi^w_t x_{i1} + \xi_{it}, & t = 2, ..., T \\
    y_{it} = \alpha y_{it-1} + x'_{it} \beta + \phi_0 y_{i0} + x'_{i1} \phi_1 + w'_i \delta + \epsilon_i + v_{it}, & t = 1, ..., T
  \end{cases}
\end{equation}
This setup can be rewritten in a matrix form:
\begin{equation}\label{eq9}
    B R_i = C z_i + U_i,
\end{equation}
\noindent where:
\begin{equation}\label{eq10}
    z_i = [y_{i0}, x'_{i1}, w'_i]'
\end{equation}
is the vector of strictly exogenous variables,
\begin{equation}\label{eq11}
    R_i = [y_{i1}, y_{i2}, ..., y_{iT}, x'_{i2}, x'_{i3}, ..., x'_{iT}]',
\end{equation}
\begin{equation}\label{eq12}
    U_i = [\epsilon_i + v_{i1}, \epsilon_i + v_{i2}, ..., \epsilon_i + v_{iT}, \xi'_{i2}, \xi'_{i3}, ..., \xi'_{iT}]'
\end{equation}
\noindent and matrices $B$ and $C$ contain coefficients $\alpha$, $\beta$, $\phi_0$, $\phi_1$.
Since these matrices are not connected to the error, we simply note that they are defined in such a way that the equation (\ref{eq9}) is equivalent to the SEM setup.
The main difference of the SEM setup is that equations for $x_{it}$ now depend only on $y_{i0}$ and $x_{i1}$ and not on $y_{is}$ and $x_{is}$ for other periods $s$.\\
\indent Following \citet{Moral+2013}, we can then define the likelihood function as:
\begin{equation}\label{eq13}
    L \propto - \frac{N}{2} \log \det \Omega(\theta) - \frac{1}{2} tr \{ \Omega(\theta)^{-1} (R - Z \Pi(\theta))' (R - Z \Pi(\theta)) \}
\end{equation}
where $R$ and $Z$ are matrices containing vectors $R_i$ and $z_i$ respectively and:
\begin{equation}\label{eq14}
    \Pi(\theta) = B^{-1} C
\end{equation}
\begin{equation}\label{eq15}
    U^*_i(\theta) = B^{-1} U_i
\end{equation}
\begin{equation}\label{eq16}
    \Omega(\theta) = Var(U^*_i) = B^{-1} \cdot Var(U_i) \cdot  B'^{-1} = B^{-1} \Sigma B'^{-1}
\end{equation}
It is possible to find analytical solution for MLE for some of the parameters.
Then the formula for the likelihood function can be simplified to:
\begin{equation}\label{eq17}
    L(\theta) \propto - \frac{N}{2} \log \det \Sigma_{11} - \frac{1}{2} tr \{ \Sigma_{11}^{-1} U_1' U_1 \} - \frac{N}{2} \log \det (\frac{H}{N})
\end{equation}
\normalsize
\noindent where $U_1$ is a matrix of errors connected only to dependent variables, $\Sigma_{11}$ is a part of the $\Sigma$ matrix:
\begin{equation}\label{eq18}
\Sigma=var(U_{i})=var\left(\frac{U_{i1}}{U_{i2}}\right)=\begin{bmatrix}
            \Sigma_{11}&\Sigma_{12}\\
            \Sigma_{21}&\Sigma_{22}\\
            \end{bmatrix}.
\end{equation}
\noindent and $H = (R_2 + U_1 F_{12})' Q (R_2 + U_1 F_{12})$ with $R_2$ being a matrix of regressor vectors $[x'_{i2}, x'_{i3}, ..., x'_{iT}]$ and $F_{12} = - \Sigma_{11}^{-1} \Sigma_{12}$.

\subsection{Bayesian model averaging}\label{bma}
\noindent Given the likelihood function in (\ref{eq17}), henceforth denoted as $L(\text{data}|\theta_{i}, M_{i})$ for a specific model $i$, it is possible to utilize Bayesian model averaging\footnote{For an introduction to BMA see \citet{Raftery+1995,Raftery+1997,Kass+1995b,Doppelhofer+2009,amini2011bayesian,Beck+2017,Fragoso+2018}.} (BMA).
To achieve that, we first estimate all possible variants of equation:
\begin{equation}\label{eq19}
   Y=f(X,\theta,v)
\end{equation}
\noindent where $Y$ is a vector of dependent variable, $X$ is a matrix of potential determinants, $\theta$ is a parameter vector, and $v$ is a stochastic term.
All the variants include a lagged dependent variable; therefore, with $K$ regressors, there are $2^{K}$ possible models that can be estimated.
Each of these models can be assigned a posterior model probability; however, the marginal (integrated) likelihood, $L(\text{data}|M_{i})$, must first be computed.
\citet{Moral+2012} utilizes approach of developed by \citet{Raftery+1995} and \citet{Sala+2004} based on the Bayesian information criterion (BIC) approximation.\\
\indent The Bayes factor for models $M_{i}$ and $M_{i}$, $B_{ij}=\frac{L(\text{data}|M_{i})}{L(\text{data}|M_{j})}$, can be approximated using Schwartz criterion:
\begin{equation}\label{eq20}
S=\log L(\text{data} \mid\hat{\theta_{i}},M_{i})-\log L(\text{data}|\hat{\theta_{j}},M_{j})-\frac{k_{i}-k_{j}}{2}\log (N)
\end{equation}
where $L(\text{data}|\hat{\theta}_{i}, M_{i})$ and $L(\text{data}|\hat{\theta}_{j}, M_{j})$ are the maximum likelihood values for models $i$ and $j$, respectively.
The terms $k_{i}$ and $k_{j}$ denote the number of regressors in models $i$ and $j$.
Bayesian information criterion is given by:
\begin{equation}\label{eq21}
    BIC=-2S=-2\log B_{ij}.
\end{equation}
Given null model $M_{0}$
\begin{equation}\label{eq22}
B_{ij}=\frac{L(y|M_{i})}{L(y|M_{j})}=\frac{\frac{L(y|M_{i})}{L(y|M_{0})}}{\frac{L(y|M_{j})}{L(y|M_{0})}}=\frac{B_{i0}}{B_{j0}}=\frac{B_{0j}}{B_{0i}}
\end{equation}
and
\begin{equation}\label{eq23}
2\log B_{ij}=2[\log B_{0j} - \log B_{0i}]=BIC_{j}-BIC_{i}.
\end{equation}
The posterior model probability (PMP) of model $j$ given the data is
\begin{equation}\label{eq24}
    \mathbb{P}(M_{j}|y)=\frac{L(\text{data}|M_{j})\mathbb{P}(M_{j})}{\sum_{i=1}^{2^K}L(\text{data}|M_{i})\mathbb{P}(M_{i})}
\end{equation}
where $\mathbb{P}(M_{j})$ denotes prior model probability.
In other words, the PMP represents the share of model $j$ in the total posterior probability mass.
Combining, equations (\ref{eq21}-\ref{eq24}) we get:
\begin{equation}\label{eq25}
\fontsize{9pt}{12pt} \selectfont
\begin{split}
    \mathbb{P}(M_{j}|y)=\frac{L(\text{data}|M_{j})\mathbb{P}(M_{j})}{\sum_{i=1}^{2^K}L(\text{data}|M_{i})\mathbb{P}(M_{i})}
    =\frac{\frac{L(\text{data}|M_{j})}{L(\text{data}|M_{0})}
    L(\text{data}|M_{0})\mathbb{P}(M_{j})}{\sum_{i=1}^{2^K}\frac{L(\text{data}|M_{i})}{L(\text{data}|M_{0})}
    L(\text{data}|M_{0})\mathbb{P}(M_{i})}\\
    =\frac{B_{j0}L(\text{data}|M_{0})\mathbb{P}(M_{j})}{\sum_{i=1}^{2^K}B_{i0}
    L(\text{data}|M_{0})\mathbb{P}(M_{i})}
    =\frac{L(\text{data}|M_{0})B_{j0}\mathbb{P}(M_{j})}{L(\text{data}|M_{0})\sum_{i=1}^{2^K}B_{i0}\mathbb{P}(M_{i})}
    =\frac{B_{j0}\mathbb{P}(M_{j})}{\sum_{i=1}^{2^K}B_{i0}\mathbb{P}(M_{i})}
\end{split}
\end{equation}
\normalsize
Finally, using the result that
\begin{equation}\label{eq26}
 B_{j0}=\exp{(-\frac{1}{2}BIC_{j})}
\end{equation}
we can calculate posterior model probability as
\begin{equation}\label{eq27}
    \mathbb{P}(M_{j}|\text{data})=\frac{\exp{(-\frac{1}{2}BIC_{j})}\mathbb{P}(M_{j})}{\sum_{i=1}^{2^K}\exp{(-\frac{1}{2}BIC_{i})} \mathbb{P}(M_{i})}.
\end{equation}

\subsection{BMA statistics}
\noindent With PMPs, we can calculate useful BMA statistics.
Let's denote by $\pi_k$ the random variable which is equal to one if the $k^{th}$ regressor should be considered as the determinant of the dependent variable.
The posterior inclusion probability (PIP) for the regressor is given by:
\begin{equation}
\mathbb{P}(\pi_k = 1 |\text{data}) = \sum_{j=1}^{2^K} \mathbbold{1} (k^{th} \text{ regressor is in model } M_{j}) \cdot \mathbb{P}(M_{j}|\text{data})
\end{equation}
where the indicator function $\mathbbold{1}$ is equal to one if the regressor is part of the model $M_j$ and zero otherwise.
In other words, the PIP tells us how likely it is that the given regressor has impact on the variable of interest.

Another interesting statistic is the posterior mean (PM) of a given parameter $\beta$.
Let's denote by $\pi_{\beta}$ the random variable which is equal to one if the given parameter is present in the model, and zero otherwise.
The posterior mean of $\beta$ is given by:

\begin{equation}\label{pm}
\mathbb{E}(\beta|\text{data})=\sum_{j=1}^{2^K}\widehat{\beta}_{j} \cdot \mathbb{P}(M_{j}, \pi_{\beta} = 1 |\text{data})
\end{equation}

\noindent where $\widehat{\beta}_{j}$ is the value of the coefficient $\beta$ in model $j$.
It tells us what is the mean (or expected) value for the parameter taking into account all considered models.
Note that if $\beta$ is not present in the given model $j$ we can assign any value to $\widehat{\beta}_{j}$,
because the probability $\mathbb{P}(M_{j}, \pi_{\beta} = 1 |\text{data})$ wil be zero anyway.

The posterior variance of the parameter $\beta$ is equal to:

\begin{equation}
\begin{split}
Var(\beta|\text{data}) =\ &\sum_{j=1}^{2^K}Var(\beta_{j}|\text{data},M_{j}) \cdot \mathbb{P}(M_{j}, \pi_{\beta} = 1 |\text{data})\\[1ex]
&+\sum_{j=1}^{2^K}\Bigl[\widehat{\beta}_{j}-\mathbb{E}(\beta|\text{data})\Bigr]^{2} \cdot \mathbb{P}(M_{j}, \pi_{\beta} = 1 |\text{data})
\end{split}
\end{equation}

\noindent where $Var(\beta_{j}|\text{data},M_{j})$ denotes the conditional variance of the coefficient $\beta$ in model $M_{j}$
(in other words assuming that the model $M_j$ is the true model).
Posterior standard deviation (PSD) of $\beta$ is then defined as the square root of the variance:

\begin{equation}\label{psd}
SD(\beta | \text{data}) = \sqrt{ Var(\beta | \text{data}) }
\end{equation}

\indent Alternatively, one might be interested in the values of the mean and variance on the condition of inclusion of a given parameter,
i.e. assuming that it is definitely a part of the model.
Note that this is usually determined by the presence of a related regressor.
The conditional posterior mean (PMcon) for a parameter $\beta$ is given by:
\begin{equation}
\mathbb{E}(\beta | \pi_{\beta}=1,\text{data})=\frac{\mathbb{E}(\beta|\text{data})}{\mathbb{P}(\pi_{\beta} = 1|\text{data})}.
\end{equation}
Similarly, the conditional variance is:

\begin{equation}
    Var(\beta|\pi_{\beta}=1,\text{data})=\frac{Var(\beta|\text{data})+\mathbb{E}(\beta|\text{data})^2}{\mathbb{P}(\pi_{\beta}=1|\text{data})}-\mathbb{E}(\beta|\pi_{\beta}=1,\text{data})^2
\end{equation}

\noindent and so the conditional standard deviation (PSDcon) is:

\begin{equation}
SD(\beta | \pi_{k}=1,\text{data}) = \sqrt{ Var(\beta | \pi_{k}=1,\text{data}) }
\end{equation}

\indent The BMA statistics allow the assessment of the robustness of the examined regressors.
\citet{Raftery+1995}, classifies a variable as weak, positive, strong, and very strong when the posterior inclusion probability (PIP) is between 0.5 and 0.75, between 0.75 and 0.95, between 0.95 and 0.99, and above 0.99, respectively.
\citet{Raftery+1995} also refers to the variable as robust when the absolute value of the ratio of posterior mean (PM) to posterior standard deviation (PSD) is above 1, indicating that the regressor improves the power of the regression.
\citet{Masanjala+2008} propose a more stringent criterion, where they require the statistic to be higher than 1.3, while \citet{Sala+2004} argue for 2, corresponding to $90\%$ and $95\%$, respectively.

\subsection{Model priors and jointness}\label{prior_joint}
\noindent To perform BMA one needs to specify prior model probability\footnote{For a thorough discussion of model priors see \citet{Sala+2004,Ley+2009,George+2010,Eicher+2011}.}.
The package offers two main options.
The first is binomial model prior \citep{Sala+2004}:

\begin{equation}
    \mathbb{P}(M_{j})=(\frac{EMS}{K})^{k_{j}}(1-\frac{EMS}{K})^{K-k_{j}}
\end{equation}

where $EMS$ is the expected model size and $k_{j}$ is a number of regressors in model $j$.
If $EMS = \frac{K}{2}$, the binomial model prior simplifies to a uniform model prior with $\mathbb{P}(M_{j}) = \frac{1}{2^K}$ for every $j$, meaning that all models are assumed to have equal probabilities.
The second is binomial-beta model prior \citet{Ley+2009} given by:

\begin{equation}
    \mathbb{P}(M_{j}) \propto \Gamma(1+k_{j}) \cdot \Gamma(\frac{K-EMS}{EMS}+K-k_{j}).
\end{equation}

\noindent where $\Gamma$ is the gamma function.
In the context of the binomial-beta prior $EMS = \frac{K}{2}$ corresponds to equal probabilities on model sizes.

\indent In order to account for potential multicolinearity between regressors one can use dilution prior introduced by \citet{George+2010}.
The dilution prior involves augmenting the model prior (binomial or binomial-beta) with a function that accounts for multicollinearity:

\begin{equation}
    \mathbb{P}_{D}(M_{j}) \propto \mathbb{P}(M_{j})|COR_{j}|^{\omega}
\end{equation}

where $\mathbb{P}_{D}(M_{j})$ is the diluted model prior, $|COR_{j}|$ is the determinant of the correlation matrix of regressors in model $j$,
and $\omega$ is the dilution parameter.
The lower the correlation between regressors,
the closer $|COR_{j}|$ is to one, resulting in a smaller degree of dilution.

\indent To determine whether regressors are substitutes or complements,
various authors have developed jointness measures\footnote{
To learn more about jointness measures, we recommend reading \citet{Doppelhofer+2009, Ley+2007, Hofmarcher+2018} in that order.
}.
Assuming two different covariates $a$ and $b$,
let $\mathbb{P}(a\cap b)$ be the posterior probability of the inclusion of both variables,
$\mathbb{P}(\overline{a}\cap \overline{b})$ the posterior probability of the exclusion of both variables,
$\mathbb{P}(\overline{a}\cap b)$ and $\mathbb{P}(a\cap \overline{b})$ denote the posterior probability of including each variable separately.
The first measure of jointness is simply $\mathbb{P}(a\cap b)$.
However, this measure ignores much of the information about the relationships between the regressors.
\citet{Doppelhofer+2009} measure is defined as:

\begin{equation}
    J_{DW}=\text{log}\left[\frac{\mathbb{P}(a\cap b) \cdot \mathbb{P}(\overline{a}\cap \overline{b})}{\mathbb{P}(\overline{a}\cap b) \cdot \mathbb{P}(a\cap \overline{b})}\right].
\end{equation}

If $J_{DW} < -2$, $-2 < J_{DW} < -1$, $-1 < J_{DW} < 1$, $1 < J_{DW} < 2$, and $J_{DW} > 2$, the authors classify the regressors as strong substitutes, significant substitutes, not significantly related, significant complements, and strong complements, respectively.
Jointness measure proposed by \citet{Ley+2007} is given by:

\begin{equation}
    J_{LS}=\frac{\mathbb{P}(a\cap b)}{\mathbb{P}(\overline{a}\cap b)+\mathbb{P}(a\cap \overline{b})}.
\end{equation}

The measure takes values in the range $[0, \infty)$, with higher values indicating a stronger complementary relationship.
Finally, \citet{Hofmarcher+2018} measure of jointness is:

\begin{equation}
    J_{HCGHM}=\frac{(\mathbb{P}(a\cap b)+\rho) \cdot \mathbb{P}(\overline{a}\cap \overline{b})+\rho)-(\mathbb{P}(\overline{a}\cap b)+\rho) \cdot \mathbb{P}(a\cap \overline{b})+\rho)}{(\mathbb{P}(a\cap b)+\rho) \cdot \mathbb{P}(\overline{a}\cap \overline{b})+\rho)+(\mathbb{P}(\overline{a}\cap b)+\rho) \cdot \mathbb{P}(a\cap \overline{b})+\rho)+\rho}.
\end{equation}

\citet{Hofmarcher+2018} advocate the use of the \citet{Jeffreys+1946} prior, which results in $\rho=\frac{1}{2}$.
The measure takes values from -1 to 1, where values close to -1 indicate substitutes, and those close to 1 complements.

\section{Data preparation}\label{data}

This section demonstrates how to prepare the data for estimation.
The first step involves installing the package and subsequently loading it into the R session.

<<eval=FALSE>>=
install.packages("bdsm")
@

<<>>=
library(bdsm)
@

% This part below will be changed when CRAN version is up to date
Throughout the manuscript,
we use the data from \citet{Moral+2016} on the determinants of economic growth.
The package includes the data
along with a detailed description of all variables.

<<>>=
economic_growth[1:12,1:10]
@

Since it is common for researchers to store their data in an alternative format:

<<>>=
original_economic_growth[1:12,1:10]
@

\noindent where there is an already existing column
with the lagged dependent variable,
we provide the \verb+join_lagged_col+ function
to transform the data into the desired format.
The user needs to specify the dependent variable column (\verb+col+),
the lagged dependent variable column (\verb+col_lagged+),
the column identifying the cross-sections (\verb+entity_col+),
the column with the time index (\verb+timestamp_col+),
and the change in the number of time units from period to period
(\verb+timestep+).

<<>>=
economic_growth <- join_lagged_col(
  df            = original_economic_growth,
  col           = gdp,
  col_lagged    = lag_gdp,
  timestamp_col = year,
  entity_col    = country,
  timestep      = 10
)
@

Once the data is in the correct format,
the user can perform further data preparation
using the \verb+feature_standardization+ function.
It allows to perform demeaning (entity/time effects)
or scaling (standardization) as needed.
Often there are columns to which the transformation should not be applied.
These can be specified with the  \verb+excluded_cols+.
It is also possible to group elements of the data frame with respect to a given
column with the \verb+gropu_by_col+.
Finally, with the \verb+scale+ parameter we can decide whether we want to apply
both demeaning and scaling or just demeaning.

For example, we can first standardize all features:

<<>>=
data_standardized_features <- feature_standardization(
    df            = economic_growth,
    excluded_cols = c(country, year, gdp)
  )
@

\noindent and then apply cross-sectional demeaning (fixed time effects):

<<>>=
data_prepared <- feature_standardization(
    df            = data_standardized_features,
    group_by_col  = year,
    excluded_cols = country,
    scale         = FALSE
  )
@

\noindent Note that the example below is the data preparation scheme which was
used in \citet{Moral+2016}.
There is no need to apply panel demeaning (entity fixed effects)
in this framework as can be seen in \autoref{eq12}.\footnote{
In theory the results should be the same with entity fixed effects.
However, because we use numerical methods some discrepancies might occur}.


\section{Estimation of the model space}\label{model_space}

To perform a BMA analysis, we need values of the parameters
as well as various statistics for each considered model
as explained in \autoref{ms_bma}.
We refer to the object that consolidates both the parameters and the statistics
as the \textbf{model space}.
The core function of the package, \verb+optim_model_space+,
is used to estimate the optimal model space using numerical optimization:

<<eval=FALSE>>=
full_model_space <- optim_model_space(
  df            = data_prepared,
  dep_var_col   = gdp,
  timestamp_col = year,
  entity_col    = country,
  init_value    = 0.5
)
@

\noindent The function returns a list with two named arguments
which are explained in the subsections below.
A progress bar is displayed to easily track the ongoing computation.

Since the MLEs for the parameters are found through numerical optimization,
more advanced users can use the \verb+control+ parameter
to control the way the optimization is performed.
We refer to the function manual for more details
and \verb+stats+ package for more details.


\subsection{Model space parameters}

The first element of the list contains the estimated MLEs of the parameters
for each considered model.
Each column represents a single model, and rows correspond to the parameters.
For example, to display the first 10 parameters for 5 models we can call:

<<>>=
full_model_space$params[1:10, 1:5]
@

\noindent \verb+NA+ value means that the corresponding parameter is not present
in the given model.

\subsection{Model space statistics}
The second element of the list provides:
\begin{itemize}
  \item the values of the likelihood function at the estimated MLE (first row),
  \item the Bayesian information criterion (second row),
  \item and the standard errors for the parameters describing linear relations,
  i.e. the beta parameters from \autoref{sem-eqs},
  for each model (remaining rows).
\end{itemize}

<<>>=
full_model_space$stats[, 1:5]
@

\noindent Again, each column represents a single considered model.

Two types of standard errors are provided,
both derived from the Hessian of the maximized log-likelihood function.
The first type consists of the regular standard errors,
calculated using the inverse of the observed information matrix:

\begin{equation}
I(\hat{\theta}) = -\frac{\partial^2 l(\hat{\theta})}{\partial \theta \partial \theta'}
\end{equation}

\noindent where $\hat{\theta}$ are the estimated MLE parameters,
$I(\hat{\theta})$ is the information matrix
and $l(\hat{\theta}) = \log L(\hat{\theta})$
is the natural logarithm of the likelihood function.
The variance covariance matrix is given by:

\begin{equation}
Var(\hat{\theta}) = I(\hat{\theta})^{-1},
\end{equation}

\noindent and the standard errors by

\begin{equation}
\text{SE}(\hat{\theta}) = \sqrt{\text{diag}(Var(\hat{\theta}))}.
\end{equation}

\noindent where the square root is obviously applied separately
to each coordinate of the vector with diagonal values.
The second type are the robust standard errors
or heteroscedasticity consistent standard errors.
To understand how they work,
we first have to rewrite the equation \autoref{eq17} in a form
which will display the contribution of each entity on the likelihood value.
First note that:

\begin{equation}
    L(\theta) \propto - \frac{1}{2} tr \{ \Sigma_{11}^{-1} U_1' U_1 \} - \sum_{i=1}^{N} \frac{1}{2} \log \det \Sigma_{11} - \frac{1}{2} \log \det (\frac{H}{N})
\end{equation}

\noindent Now,
because of the cyclic property of the trace we can rewrite the first term as:

\begin{equation}
    - \frac{1}{2} tr \{ \Sigma_{11}^{-1} U_1' U_1 \} =
    - \frac{1}{2} tr \{ U_1 \Sigma_{11}^{-1} U_1' \} =
    - \frac{1}{2} \sum_{i=1}^N u_i \Sigma_{11}^{-1} u_i'
\end{equation}

\noindent where $u_i$ is a row vector
corresponding to the data relating to the single entity $i$.
Hence, the entire likelihood function can be rewritten
as a sum of contributions from each entity:

\begin{equation}
    L(\theta) \propto \sum_{i=1}^{N} -\frac{1}{2} (\log \det \Sigma_{11} + \log \det (\frac{H}{N}) + u_i \Sigma_{11}^{-1} u_i')
\end{equation}

\noindent From there we can see that the contribution of a single entity $i$ is:

\begin{equation}
 l_i(\theta) \propto -\frac{1}{2} (\log \det \Sigma_{11} + \log \det (\frac{H}{N}) + u_i \Sigma_{11}^{-1} u_i').
\end{equation}

\noindent Now if we consider a multivariate function $\mathbf{l}(\theta)$
of all such contributions (with single contributions as its coordinates),
we can find it's gradient at the MLE:
$G(\hat{\theta}) = \frac{\mathbf{l}(\hat{\theta})}{\partial \theta}$.
Then the robust variance is:

\begin{equation}
Var_{R}(\hat{\theta}) = I(\hat{\theta})^{-1} \cdot G'(\hat{\theta}) G(\hat{\theta}) \cdot I(\hat{\theta})^{-1}
\end{equation}

\noindent and the robust standard errors are given by:

\begin{equation}
\text{SE}_{R}(\hat{\theta}) = \sqrt{\text{diag}(\hat{V}_{R}(\hat{\theta}))}.
\end{equation}

\noindent where the square root is again applied to each coordinate separately.


\subsection{Parallelization}

The \verb+optim_model_space+ function
is the most computationally intensive part of the package.
Therefore, the function provides an option for parallel computing.
If the user's data contains only a few regressors, the sufficient option is

<<eval=FALSE>>=
model_space <- optim_model_space(
  df            = data_prepared,
  dep_var_col   = gdp,
  timestamp_col = year,
  entity_col    = country,
  init_value    = 0.5
)
@

\noindent However, for larger datasets,
it is better to take advantage of parallel computing.
Then the numerical optimization used to find MLEs
can be computed on separate threads for each model.
To do this, first load the \verb+parallel+ package and set up a cluster.

<<eval=FALSE>>=
library(parallel)
# Here we try to use all available cores on the system.
# You might want to lower the number of cores depending on your needs.
cores <- detectCores()
cl <- makeCluster(cores)
setDefaultCluster(cl)
@

\noindent Then the user just needs to provide this cluster to the function:

<<eval=FALSE>>=
model_space <- optim_model_space(
  df            = data_prepared,
  dep_var_col   = gdp,
  timestamp_col = year,
  entity_col    = country,
  init_value    = 0.5,
  cl            = cl
)
@


\subsection{Precomputed model spaces}

Even with parallelization, \verb+optim_model_space+ call may be time-consuming.
Hence, for users who want to quickly start practicing using the \verb+bdsm+,
we provide already computed model space objects included with the package:

\begin{itemize}
  \item \verb+full_model_space+ is the model space built with the entire data
  used by \citet{Moral+2016},
  \item \verb+small_model_space+ is a smaller model space built with only a
  subset of regressors.
\end{itemize}


\section{Performing Bayesian model averaging}\label{using_bma}
% TODO: HERE WE NEED TO ADD SOME TEXT

\subsection{Bayesian model averaging: The \texttt{bma} function}

The \verb+bma+ function enables users to perform Bayesian model averaging using
the object obtained with the \verb+optim_model_space+ function.
The \verb+round+ parameter specifies the decimal place
to which the BMA statistics should be rounded in the results.

<<>>=
bma_results <- bma(full_model_space, df = data_prepared, round = 3)
@

The \verb+bma+ function returns a list containing 16 elements.
However, most of these elements are only required for other functions.
The main objects of interest are the two tables with the BMA statistics.
The results obtained with binomial model prior are first on the list.

<<>>=
bma_results[[1]]
@

PIP denotes the posterior inclusion probability, PM denotes the posterior mean, PSD denotes the posterior standard deviation, and PSDR denotes the posterior standard deviation calculated using robust standard errors.
These are the four main results of BMA with respect to the assessment of individual regressors.
PMcon, PSDcon, and PSDRcon denote the posterior mean, posterior standard deviation, and posterior standard deviation based on robust standard errors, respectively, conditional on the inclusion of the variable.
Users should base their interpretation of the results on conditional BMA statistics
only when they believe that certain regressors must be included.
Finally, for a given parameter we can consider all models that include this parameter,
and check if it has a positive or negative value.
$\%(+)$ denotes the percentage of models with positive value for a given parameter across all models that include that parameter.
A value of $\%(+)$ equal to $0\%$ or $100\%$ indicates coefficient sign stability.

The PIP for all the regressors shows that none of them can be considered very strong according to the classification by \citet{Raftery+1995}.
This also applies to the population variable (\verb+pop+), which has a PIP of 0.990 due solely to approximation.
These results are corroborated by the ratios of PM to PSD and PSDR.
In particular, for the absolute value of the PM to PSDR ratio, only the population variable exceeds 1.3, while investment (\verb+ish+) and the democracy index (\verb+polity+) are above 1.
This finding led \citet{Moral+2016} to emphasize the fragility of economic growth determinants.
The only variable that can be considered robust across all metrics is the lagged GDP (\verb+gdp_lag+).
However, the results change when using the binomial-beta model prior, which is included as the second object in the \verb+bma+ list.

<<>>=
bma_results[[2]]
@

In the case of the binomial-beta model prior, the PIPs for all the regressors increase.
Population is classified as very strong, while all other regressors are classified as strong or positive according to posterior inclusion probabilities.
There are also considerable changes in the PM to PSD and PSD\_R ratios.
The absolute value of the PM to PSD ratio exceeds two for investment and the democracy index, and is above 1.3 for population, investment price (\verb+ipr+), trade openness (\verb+open+), and life expectancy (\verb+lnlex+).
However, these results are less pronounced when using robust standard errors, with only population, trade openness, and the democracy index remaining above 1.3.
Consequently, the results are not robust with respect to the choice of prior model specification.
The reasons behind these differences will become clear once other functionalities of the package are explored.

The last object in the list is a table containing the prior and posterior expected model sizes for the binomial and binomial-beta model priors.
Importantly, these numbers reflect only the number of regressors in a model and do not include the lagged dependent variable, which is present in every model by construction.

<<>>=
bma_results[[16]]
@

The results show that, after observing the data, the researcher should expect around seven and eight and a half regressors in the model under the binomial and binomial-beta model priors, respectively.
These numbers may seem high; however, they are driven by relatively substantial PIPs.
This illustrates the importance of focusing on both posterior inclusion probabilities and the ratios of posterior mean to posterior standard deviation when assessing the robustness of the regressors.

\subsection{Prior and posterior model probabilities}

The \verb+model_pmp+ function allows the user to compare prior and posterior model probabilities over the entire model space in the form of a graph.
The models are ranked from the one with the highest to the one with the lowest posterior model probability.
The function returns a list with three objects:
\begin{enumerate}
    \item a graph for the binomial model prior;
    \item a graph for the binomial-beta model prior;
    \item a combined graph for both binomial and binomial-beta model priors.
\end{enumerate}
The user can retrieve each graph separately from the list; however, the function automatically displays a combined graph.

<<fig=TRUE>>=
for_models <- model_pmp(bma_results)
@

The graphs demonstrate that most of the posterior probability mass is concentrated within just a couple of models.
To view the results for only the best models, the user can use the \verb+top+ parameter.

<<fig=TRUE>>=
for_models <- model_pmp(bma_results, top = 10)
@

The last graph for the binomial-beta prior is particularly illuminating in terms of explaining the very high values of posterior inclusion probabilities.
Almost 70\% of the posterior probability mass is concentrated in just one model; therefore, variables included in this model will have very high PIP values.
The model in question will be identified after implementing \verb+model_sizes+ (and \verb+best_models+, which is covered in \autoref{best-models}).
Nevertheless, the results from the graph suggest that the best model is the one that includes all the regressors or none
(because the prior value is around $\frac{1}{9}$ on the plot).

The \verb+model_sizes+ function displays prior and posterior model probabilities on a graph for models of different sizes.
The graphs exclude the lagged dependent variable; therefore, the model with zero regressors still includes the lagged dependent variable.
Similarly to the \verb+model_pmp+ function is returns a list with three objects:
\begin{enumerate}
    \item a graph for the binomial model prior;
    \item a graph for the binomial-beta model prior;
    \item a combined graph for both binomial and binomial-beta model priors.
\end{enumerate}
Again, the user can retrieve each graph separately from the list; however, the function automatically displays a combined graph.

<<fig=TRUE>>=
size_graphs <- model_sizes(bma_results)
@

The graph in panel b) again explains why PIPs are so high in the case of the binomial-beta model prior.
The model with all the regressors accounts for almost 70\% of the total posterior probability mass,
while the remaining portion is concentrated on models with a high number of regressors.
In contrast, the posterior probability mass for the binomial model prior is centered around models with seven regressors.
This graph clearly illustrates the impact of changes in the model prior on posterior probabilities.

\subsection{Selecting the best models}\label{best-models}

The \verb+best_models+ function allows the user to view a chosen number of the best models in terms of posterior model probability.
The function returns a list containing nine objects:
\begin{enumerate}
    \item An inclusion table stored as an array object;
    \item A table with estimation results using regular standard errors, stored as an array object;
    \item A table with estimation results using robust standard errors, stored as an array object;
    \item An inclusion table stored as a knitr object;
    \item A table with estimation results using regular standard errors, stored as a knitr object;
    \item A table with estimation results using robust standard errors, stored as a knitr object;
    \item An inclusion table stored as a gTree object;
    \item A table with estimation results using regular standard errors, stored as a gTree object;
    \item A table with estimation results using robust standard errors, stored as a gTree object;
\end{enumerate}
The parameters \verb+estimate+ and \verb+robust+ pertain only to the results that will be automatically displayed after running the function.
The parameter \verb+criterion+ determines whether the models should be ranked according to posterior model probabilities calculated using the binomial (1) or binomial-beta (2) model prior.
To obtain the inclusion array for the 10 best models ranked with the binomial model prior, the user needs to run:

<<>>=
best_8_models <- best_models(bma_results, criterion = 1, best = 8)
best_8_models[[1]]
@

1 indicates the presence of a given regressor in a model, while the last row displays the posterior model probability of that model.
To obtain a knitr table with estimation output with regular standard errors for best 3 models ranked with binomial-beta model prior, the user needs to run:

<<>>=
best_3_models <- best_models(bma_results, criterion = 2, best = 3)
best_3_models[[5]]
@

The comparison of the last two tables further highlights the importance of the model prior.
The best model under the binomial model prior accounts for around 9\% of the posterior probability mass,
while the best model under the binomial-beta model prior accounts for over 69\%.
Finally, to obtain a gTree table with estimation output using robust standard errors for the top 3 models ranked by the binomial-beta model prior,
the user needs to run:

<<fig=TRUE>>=
best_3_models <- best_models(bma_results, criterion = 2, best = 3)
best_3_models[[9]]
@

The comparison of the last two tables and the estimation outputs with regular and robust standard errors demonstrates
how the results change when switching between these two variance estimators.

\subsection{Calculating jointness measures}

Within the BMA framework, it is possible to establish the nature of the relationship between pairs of examined regressors using the jointness measures.
This can be accomplished using the \verb+jointness+ function.
The latest jointness measure, introduced by \citet{Hofmarcher+2018}, has been shown to outperform older alternatives developed by \citet{Ley+2007} and \citet{Doppelhofer+2009}\footnote{See section \ref{prior_joint} for the interpretations of jointness measures.}.
Therefore, the \citet{Hofmarcher+2018} measure is the default option in the \verb+jointness+ function.

<<>>=
jointness(bma_results)
@

\noindent Above the main diagonal the user can find the results for the binomial model prior, and below the results for the binomial-beta model prior.
All the values in the table are positive, indicating complementary relationships between the regressors.
Notably, the values for the binomial-beta prior are substantially higher than those for the binomial prior.
This result is not surprising, as the model with all the regressors accounts for almost $70\%$ of the total posterior probability mass.

To obtain the results for the \citet{Ley+2007} measure, the user should run:

<<>>=
jointness(bma_results, measure = "LS")
@

\noindent The values corroborate the results obtained using the \citet{Hofmarcher+2018} measure.
All the regressors exhibit complementary relationships, which are visibly stronger under the binomial-beta model prior.

However, the \citet{Doppelhofer+2009} measure yields a slightly different outcome:

<<>>=
jointness(bma_results, measure = "DW")
@

\noindent In this case, some pairs of regressors have negative values of the jointness measure under the binomial model prior;
however, these values are very close to zero, indicating unrelated variables.
Once again, the values for the binomial-beta model prior are higher, demonstrating how the results are influenced by the choice of model prior.

\subsection{Visualizing model coefficients}

The \verb+coef_hist+ function allows the user to plot the distribution of estimated coefficients.
It returns a list containing a number of objects equal to the number of regressors plus one.
The first object in the list is a graph of the coefficients for the lagged dependent variable, while the remaining objects are graphs of the coefficients for the other regressors.
The graph for the lagged dependent variable collects coefficients from the entire model space, whereas the graphs for the other regressors only collect coefficients from the models that include the given regressor (half of the model space).

There are two main options for visualizing the coefficient distributions.
The first option uses a histogram.
The \verb+coef_hist+ function provides the user with options for controlling the bin widths of the histogram (\verb+BW+, \verb+binW+, \verb+BN+, and \verb+num+).
The default is \verb+BW = FD+, which selects bin widths using the Freedman-Diaconis method.
<<fig=TRUE>>=
coef_plots <- coef_hist(bma_results)
coef_plots[[1]]
@

The second option allows the user to plot kernel densities.
<<fig=TRUE>>=
coef_plots2 <- coef_hist(bma_results, kernel = 1)
coef_plots2[[1]]
@

The choice of appropriate plotting options is left to the user's preferences regarding the style of presentation and the size of the model space.
<<fig=TRUE>>=
library(gridExtra)
grid.arrange(coef_plots[[1]], coef_plots[[2]], coef_plots2[[1]],
             coef_plots2[[2]], nrow = 2, ncol = 2)
@

\section{Changes in model priors}\label{priors}

This section provides a more detailed description of the available model prior options.
Subsection \ref{ems} discusses the consequences of changes in the expected model size, while subsection \ref{dil} describes the dilution prior.

\subsection{Changing expected model size}\label{ems}

The \verb+bma+ function calculates BMA statistics using both the binomial and binomial-beta model priors.
By default, the \verb+bma+ function sets the expected model size (EMS) to $K/2$, where $K$ denotes the total number of regressors.
The binomial model prior with $EMS = K/2$ leads to a uniform model prior, assigning equal probabilities to all models.
In contrast, the binomial-beta model prior with $EMS = K/2$ assumes equal probabilities across all model sizes.
However, the user can modify the prior model specification by changing the \verb+EMS+ parameter.

First, consider the consequence of concentrating prior probability mass on small models by setting $EMS = 2$.

<<>>=
bma_results2 <- bma(full_model_space, df = data_prepared, round = 3, EMS = 2)
@

\noindent Before turning to the main BMA results, let us focus on the changes in the posterior probability mass with respect to model sizes.

<<>>=
bma_results2[[16]]
@

\noindent The results show that decreasing the prior expected model size led to a considerable decline in the posterior expected model size.
The consequences of this change in the prior expected model size are best illustrated using the prior and posterior probability mass over model sizes.

<<fig=TRUE>>=
size_graphs2 <- model_sizes(bma_results2)
@

For both the binomial and binomial-beta model priors, the prior probability mass is more concentrated on small model sizes.
However, for the binomial model prior, the center of the posterior probability mass shifted to medium-sized models, while it remained on large models for the binomial-beta model prior.
Nevertheless, the posterior model probability for the model with all regressors decreased from nearly 0.7 for $EMS = 4.5$ to less than 0.3.
There are also substantial changes in the distribution of the posterior probability mass over the model space.
<<fig=TRUE>>=
model_graphs2 <- model_pmp(bma_results2)
@

Both panels of the graph show that the prior and posterior model probabilities have substantially decoupled from each other.
This strongly indicates that the prior and the data are suggesting vastly different model choices.
The tall blue spike represents the model with no regressors.
The main BMA posterior statistic for the binomial model prior also experienced a significant change.
<<>>=
bma_results2[[1]]
@

\noindent Posterior inclusion probabilities drop considerably for all the regressors, except for population, which remains almost unchanged.
Interestingly, the ratios for all variables declined, with population being the exception.
The ratio for population remains above two for regular standard errors and 1.7 for robust standard errors.
This outcome indicates that population performs relatively better in smaller models.
The results for binomial-beta model prior are  given below.
<<>>=
bma_results2[[2]]
@

\noindent The change in PIPs is again significant, though not as pronounced as in the case of the binomial model prior.
Changes in the ratios are relatively small and irregular for both regular and robust standard errors.
The most pronounced change is the drop in the value of the ratios for the democracy index (\verb+polity+), indicating that this regressor performs better in larger models.

It is also very instructive to examine the jointness measures calculated under the new prior specification.
<<>>=
jointness(bma_results2, measure = "HCGHM", rho = 0.5, round = 3)
@
On the one hand, the results obtained with the binomial-beta model prior did not change in any significant manner.
On the other hand, the results obtained with the binomial model prior changed substantially.
The measure indicates that population is a substitute for both the investment price (\verb+ipr+) and the democracy index, as well as, to a lesser extent, secondary education (\verb+sed+) and population growth (\verb+pgrw+).

Next, to consider the consequences of concentrating prior probability mass on large models, \verb+EMS+ was set to eight.
<<>>=
bma_results8 <- bma(full_model_space, df = data_prepared, round = 3, EMS = 8)
bma_results8[[16]]
@

\noindent The posterior model size increased for the binomial prior; however, it remained almost unchanged for the binomial-beta model prior.
The most interesting aspect is the new graphs of prior and posterior probability mass over the model sizes.
<<fig=TRUE>>=
size_graphs8 <- model_sizes(bma_results8)
@

In both cases, the posterior probability mass has concentrated near the models with all the regressors.
However, in the case of the binomial-beta model prior, the model with all the regressors captures most of the posterior probability mass (almost $96\%$).
This conclusion is further supported by the graphs of posterior model probability across the entire model space.
<<fig=TRUE>>=
model_graphs8 <- model_pmp(bma_results8)
@

Panel (a) demonstrates that the change in the expected model size led to a substantial increase in the posterior model probability for the model with all regressors under the binomial model prior.
It now accounts for over $70\%$ of the total posterior probability mass.
The increase in the expected model size also influenced the main BMA statistics.
<<>>=
bma_results8[[1]]
@

\noindent The PIPs increased considerably.
Population is classified as very strong, while the other regressors are classified as strong or positive.
Interestingly, all the ratios have improved as well, except for population.
The change in the results for the binomial-beta model prior is less pronounced.
<<>>=
bma_results8[[2]]
@

\noindent With the increase in expected model size, population is classified as very strong,
and all the other regressors are classified as strong in terms of the posterior inclusion probability criterion.
Similarly to the case of the binomial prior, all the ratios increased except for population.

Again, it is instructive to examine the jointness measures.
<<>>=
jointness(bma_results8, measure = "HCGHM", rho = 0.5, round = 3)
@

\noindent The values of the measures show that all the regressors exhibit a very strong complementary relationship.
This outcome, once again, underscores the importance of carefully considering the prior when interpreting jointness measures.

\subsection{Dilution prior}\label{dil}
One of the main issues associated with identifying robust regressors is multicollinearity.
Some regressors may approximate the same underlying factor influencing the dependent variable.
Multicollinearity may result from the absence of observable variables associated with a specific theory or from a theory failing to provide a unique candidate for a regressor.
Moreover, some regressors may share a common determinant.
Although \citet{Moral+2013, Moral+2016} addressed this issue to some extent, researchers have another option to mitigate multicollinearity: the dilution prior proposed by \citet{George+2010} which was described in detail in \autoref{prior_joint}.

To apply the dilution prior, the user must set \verb+dilution = 1+ in the \verb+bma+ function.
The user can also manipulate the dilution parameter $\omega$.
The default option is \verb+dil.Par = 0.5+, as recommended by \citet{George+2010}.
<<>>=
bma_results_dil <- bma(
  model_space = full_model_space,
  df          = data_prepared,
  round       = 3,
  dilution    = 1
  )
@

\noindent The effect of implementing the dilution prior is well depicted by the distribution of prior probability mass over the model sizes.

<<fig=TRUE>>=
size_graphs_dil <- model_sizes(bma_results_dil)
@

The change in the prior distribution is more visible for the binomial-beta model prior.
In panel b, the prior probability mass has decreased for larger models and increased for smaller models.
However, this change is not uniform, as models characterized by the highest degree of multicollinearity are subject to the greatest penalty in terms of prior probability mass.

Before moving to the BMA statistics, it is instructive to examine the change in the \verb+dil.Par+ parameter.
<<fig=TRUE>>=
bma_results_dil01 <- bma(
  model_space = full_model_space,
  df          = data_prepared,
  round       = 3,
  dilution    = 1,
  dil.Par     = 0.1
)
size_graphs_dil01 <- model_sizes(bma_results_dil01)
@

\noindent As we can see, decreasing the value of $\omega$ diminishes the impact of dilution on the model prior.
Conversely, raising the \verb+dil.Par+ parameter increases the degree of dilution.
<<fig=TRUE>>=
bma_results_dil2 <- bma(
  model_space = full_model_space,
  df          = data_prepared,
  round       = 3,
  dilution    = 1,
  dil.Par     = 2
)
size_graphs_dil2 <- model_sizes(bma_results_dil2)
@

\noindent An especially strong impact can be seen for the binomial-beta prior.

However, even after giving such priority to the penalty for multicollinearity,
the main BMA statistics remain stable.
<<>>=
bma_results_dil2[[2]]
@
Hence, we see that \citet{Moral+2016}’s claim about the fragility of growth regressors withstands the test of various manipulations in the model prior.

\section{Concluding remarks}\label{sum}

This manuscript introduces the \verb+bdsm+ package, which enables Bayesian model averaging for dynamic panels with weakly exogenous regressors — a methodology developed by \citet{Moral+2012,Moral+2013,Moral+2016}.
This package allows researchers to simultaneously address model uncertainty and reverse causality and is the only \verb+R+ package offering these capabilities.
It provides flexible options for specifying model priors, including dilution prior that accounts for multicollinearity.
The package also includes graphical tools for visualizing prior and posterior model probabilities across model space and model sizes, as well as functions for plotting histograms and kernel densities of the estimated coefficients.
Additionally, it allows researchers to compute jointness measures introduced by \citet{Doppelhofer+2009,Ley+2007,Hofmarcher+2018} to assess whether pairs of regressors act as substitutes or complements.
Users can also perform Bayesian model selection to examine in detail the most probable models based on posterior model probability.

The manuscript outlines the methodological approach, while the detailed explanation can be found in \citet{Moral+2012,Moral+2013,Moral+2016}.
Users unfamiliar with this approach can easily learn to apply it through the hands-on tutorial provided in the manuscript.
The package’s functionalities are illustrated using the original dataset from \citet{Moral+2016} in the context of analyzing the determinants of economic growth.
The results of the examination illustrate that fragility of growth determinants is a persistent feature of the data, confirming \citet{Moral+2016} claims.
The various empirical exercises underscore two important aspects of any BMA analysis.
First, the results should always be validated through extensive changes in prior specifications.
Second, the robustness of the regressors must be evaluated using both
posterior inclusion probabilities and the ratios of the posterior mean to the posterior standard deviation,
as these measures can often lead to differing conclusions.


\addcontentsline{toc}{section}{References}
\bibliography{references}
\bibliographystyle{apalike}

\end{document}