The `stats` implementation of `rWishart` is in `C` and is very fast. Often, however, we do not want a sample from the Wishart distribution itself, but rather its inverse, the Cholesky decomposition of a Wishart sample, or even the inverse of that Cholesky decomposition. Funnily enough (if you have a weird sense of humor), when you inspect the source code for the `rWishart` function (R Core Team (2017)), it generates the Cholesky decomposition and then multiplies it out. Meanwhile, drawing from `rWishart` and then inverting or computing a Cholesky decomposition in R is comparatively slow.

This suggests some obvious efficiencies: perhaps, if we would rather have the Cholesky decomposition of the Wishart random matrix, we could tell the function to stop right there.

```
library('CholWishart')
set.seed(20180220)
A <- stats::rWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
B <- rInvWishart(1,10,.2*diag(4))[,,1]
set.seed(20180220)
C <- rCholWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
D <- rInvCholWishart(1,10,.2*diag(4))[,,1]
```

Suppose \(X_i \sim MVN(0, \Sigma)\), \(i = 1, 2, \ldots, n\), are independent \(p\)-variate normal random variables, written as \(1 \times p\) row vectors, with \(n > p-1\). Then \(S = \sum_{i=1}^n X_i^T X_i\), called the “scatter matrix”, is almost surely positive definite if \(\Sigma\) is positive definite. The random variable \(S\) is said to be distributed as a Wishart random variable, \(S \sim W_p(n, \Sigma)\); see Gupta and Nagar (1999). The definition can be extended to non-integer \(n\) as well.

How does `rWishart(n, df, Sigma)` work (supposing `Sigma` is a \(p \times p\) matrix)? First, it generates the Cholesky factor of a draw from a Wishart distribution with \(\Sigma = \mathbf{I}_p\) and \(\nu\) = `df` degrees of freedom. This is done as follows: for the \(i^{th}\) element of the main diagonal, draw from \(\sqrt{\chi_{\nu-i+1}^2}\); for each entry above the diagonal, draw from an independent \(N(0,1)\). This matrix is then multiplied by the Cholesky decomposition of the provided `Sigma` to obtain the Cholesky factor of the desired sample from the Wishart distribution (this construction is due to Bartlett and is also known as the Bartlett decomposition; see Anderson (1984)). The `rWishart` function multiplies this out. Therefore, if the Cholesky decomposition is desired, one only needs to stop there.
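The construction just described can be sketched in a few lines of base R. This is only an illustration of the idea, not the `C` code used by `stats` or by this package; the function name `bartlett_chol_sketch` is hypothetical, and the diagonal entries use \(\texttt{df} - i + 1\) degrees of freedom.

```r
# Hypothetical helper (not part of stats or CholWishart): one draw of the
# upper-triangular Cholesky factor of a W_p(df, Sigma) random matrix.
bartlett_chol_sketch <- function(df, Sigma) {
  p <- nrow(Sigma)
  Z <- matrix(0, p, p)
  # i-th diagonal entry: square root of a chi-squared draw with df - i + 1 d.f.
  diag(Z) <- sqrt(rchisq(p, df = df - seq_len(p) + 1))
  # entries strictly above the diagonal: independent standard normals
  Z[upper.tri(Z)] <- rnorm(p * (p - 1) / 2)
  # multiply by the upper-triangular Cholesky factor of Sigma
  Z %*% chol(Sigma)
}

set.seed(1)
R <- bartlett_chol_sketch(df = 10, Sigma = 5 * diag(4))
W <- crossprod(R)  # t(R) %*% R: multiplying the factor out gives the Wishart draw
```

Stopping at `R` gives the Cholesky factor directly; `crossprod(R)` is the extra step needed only when the full Wishart matrix is wanted.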

If \(X \sim \textrm{W}_p(\nu,\Sigma)\), then we define the Inverse Wishart as \(X^{-1} = Y \sim \textrm{IW}_p(\nu , \Sigma^{-1})\). There are other parameterizations of the distribution, mostly coming down to different ways of writing the \(\nu\) parameter; be aware of this when using any package drawing from the Inverse Wishart distribution (see Dawid (1981) for an alternative; this presentation follows Gupta and Nagar (1999)). The Inverse Wishart comes up directly in Bayesian statistics. We are also interested in its Cholesky decomposition, as it is required in the generation of the matrix variate \(t\)-distribution. In this package, a draw is generated by taking the covariance matrix, inverting it, computing the Cholesky decomposition of the inverted covariance matrix, drawing the Cholesky factor of a Wishart matrix using that, and then inverting the result from its Cholesky factor (finding \(\Psi^{-1}\) given the Cholesky factorization of \(\Psi\) is relatively fast). The result can then be converted into the Cholesky factor of the Inverse Wishart if that is what is desired. This would be slow to do in R, but in C it is not so bad.
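The route described above can be imitated in base R. The following is only a sketch of the idea, not the package's `C` code: it calls `stats::rWishart` and then `chol`, so it multiplies out and re-factors where a dedicated implementation would not need to. The object names (`Psi`, `W`, `R`, `Y`, `R_inv`) are chosen here for illustration.

```r
set.seed(20180220)
Psi   <- 0.2 * diag(4)                      # Inverse Wishart scale parameter
Sigma <- solve(Psi)                         # invert the covariance matrix
W     <- stats::rWishart(1, 10, Sigma)[, , 1]  # Wishart draw with the inverted scale
R     <- chol(W)                            # upper-triangular Cholesky factor of W
Y     <- chol2inv(R)                        # W^{-1}, computed from the factor
R_inv <- chol(Y)                            # Cholesky factor of the Inverse Wishart draw
```

Here `chol2inv` is the step that makes inversion cheap once a Cholesky factor is available.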

Here is what happens when the matrices `A`, `B`, `C`, and `D` generated in the first code block are multiplied together:

```
A %*% B
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -2.775558e-17 -1.387779e-16 -5.551115e-17
## [2,] -4.718448e-16 1.000000e+00 1.387779e-17 0.000000e+00
## [3,] -1.249001e-16 1.387779e-17 1.000000e+00 -5.551115e-17
## [4,] 1.110223e-16 -1.110223e-16 0.000000e+00 1.000000e+00
crossprod(C) %*% crossprod(D) # note: we do not expect C = D^-1, we expect this!
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -2.775558e-17 -2.081668e-16 -1.110223e-16
## [2,] -4.718448e-16 1.000000e+00 1.387779e-16 0.000000e+00
## [3,] -1.249001e-16 1.110223e-16 1.000000e+00 -1.110223e-16
## [4,] 1.110223e-16 -8.326673e-17 -8.326673e-17 1.000000e+00
crossprod(D) %*% A
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -4.718448e-16 -1.249001e-16 1.110223e-16
## [2,] -2.775558e-17 1.000000e+00 1.110223e-16 -8.326673e-17
## [3,] -2.081668e-16 1.387779e-16 1.000000e+00 -8.326673e-17
## [4,] -1.110223e-16 0.000000e+00 -1.110223e-16 1.000000e+00
crossprod(C) %*% B
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -2.775558e-17 -1.387779e-16 -5.551115e-17
## [2,] -4.718448e-16 1.000000e+00 1.387779e-17 0.000000e+00
## [3,] -1.249001e-16 1.387779e-17 1.000000e+00 -5.551115e-17
## [4,] 1.110223e-16 -1.110223e-16 0.000000e+00 1.000000e+00
```

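The off-diagonal entries are not exactly zero because of floating-point roundoff error; each product is the identity matrix up to errors on the order of \(10^{-16}\).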

Suppose, instead of the above definition of the Wishart, we have \(n \leq p-1\). Then the scatter matrix defined above is singular (it has rank at most \(n\)) and so is not positive definite. Its distribution is called the pseudo Wishart distribution. If we then take the Moore-Penrose pseudo-inverse of such a matrix, the result follows the generalized inverse Wishart distribution.

```
A <- rPseudoWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
A
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.9553698 0.3134085 -0.8487259 -0.5174803 0.7487007
## [2,] 0.3134085 2.3856728 -0.6106572 1.4471373 1.5976931
## [3,] -0.8487259 -0.6106572 0.5838199 -0.7509533 -0.7863404
## [4,] -0.5174803 1.4471373 -0.7509533 4.8520003 1.6696925
## [5,] 0.7487007 1.5976931 -0.7863404 1.6696925 1.4396817
qr(A)$rank
## [1] 3
B <- rGenInvWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
B
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.33422500 0.189387794 0.07086651 -0.052283855 -0.04241885
## [2,] 0.18938779 0.155647000 -0.02450637 0.001357582 -0.07549583
## [3,] 0.07086651 -0.024506366 0.10489215 -0.052325490 0.03515182
## [4,] -0.05228386 0.001357582 -0.05232549 0.028056042 -0.02793470
## [5,] -0.04241885 -0.075495830 0.03515182 -0.027934699 0.24217199
qr(B)$rank
## [1] 3
```

Note that the rank of both of these matrices is less than the dimension.
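The rank deficiency can also be seen without the package by building a scatter matrix directly from fewer observations than dimensions. This is a base R sketch under the assumption \(\Sigma = \mathbf{I}_p\); it does not reproduce how `rPseudoWishart` or `rGenInvWishart` are implemented, and the pseudo-inverse is computed here from an eigendecomposition.

```r
set.seed(42)
n <- 3
p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # rows are independent N(0, I_p) draws
S <- crossprod(X)                              # p x p scatter matrix with n < p
qr(S)$rank                                     # numerical rank of S

# Moore-Penrose pseudo-inverse: invert only the n nonzero eigenvalues.
# eigen() returns eigenvalues in decreasing order, so these come first.
e      <- eigen(S, symmetric = TRUE)
keep   <- seq_len(n)
V      <- e$vectors[, keep]
S_pinv <- V %*% diag(1 / e$values[keep]) %*% t(V)
```

Like the package output above, both `S` and `S_pinv` are \(p \times p\) but carry only \(n\) nonzero eigenvalues.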

This package also has functions for density computations with the Wishart distribution. Densities are only defined for positive-definite input matrices and \(\nu\) parameters larger than the dimension \(p\).

The return value is on the `log` scale, but it can be specified otherwise.

```
dWishart(diag(3), df = 5, 5*diag(3))
## [1] -19.45038
dInvWishart(diag(3), df = 5, .2*diag(3))
## [1] -19.45038
```

Note that, in general, these will not agree even if their covariance matrix parameters are inverses of each other. They agree here partly because the input matrix, the identity, is its own inverse and has determinant \(1\).
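To see why the two values above coincide, a short change-of-variables sketch may help. It assumes the parameterization used in this document, writes \(f_{\mathrm{W}}\) and \(f_{\mathrm{IW}}\) for the two densities (notation introduced here only for illustration), and uses the standard Jacobian of matrix inversion on symmetric \(p \times p\) matrices:

\[
f_{\mathrm{IW}}(Y \mid \nu, \Sigma^{-1}) = f_{\mathrm{W}}(Y^{-1} \mid \nu, \Sigma)\, |\det Y|^{-(p+1)} .
\]

At \(Y = \mathbf{I}_p\) we have \(Y^{-1} = \mathbf{I}_p\) and \(\det Y = 1\), so the Jacobian factor is \(1\) and the two densities are equal. For other inputs both the argument and the factor change, and the two values generally differ.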

The density functions can take 3-D array input indexed on the third dimension and will output a vector of densities.

The multivariate gamma (\(\Gamma_p\)) and digamma (\(\psi_p\)) functions are extensions of the univariate gamma (\(\Gamma\)) and digamma (\(\psi\)) functions (Mardia, Bibby, and Kent (1982)). They are useful in calculating the densities above and come up in other distributions as well. The digamma function is the first derivative of the logarithm of the gamma function. When the dimension \(p = 1\), they coincide with the usual definitions of the digamma and gamma functions.

The multivariate gamma also comes in a logarithmic form (`lmvgamma`).
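For reference, the usual definitions are \(\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(a + (1-j)/2\right)\) and \(\psi_p(a) = \sum_{j=1}^{p} \psi\left(a + (1-j)/2\right)\). A minimal base R sketch of both follows; the names `lmvgamma_sketch` and `mvdigamma_sketch` are hypothetical and are not the package's functions, whose argument conventions may differ.

```r
# Hypothetical helpers, written in base R for illustration only.
# Log of the multivariate gamma function of dimension p at a scalar x.
lmvgamma_sketch <- function(x, p) {
  j <- seq_len(p)
  p * (p - 1) / 4 * log(pi) + sum(lgamma(x + (1 - j) / 2))
}

# Multivariate digamma: derivative of the log multivariate gamma in x.
mvdigamma_sketch <- function(x, p) {
  j <- seq_len(p)
  sum(digamma(x + (1 - j) / 2))
}

# With p = 1 these reduce to the univariate functions.
lmvgamma_sketch(2.5, 1) - lgamma(2.5)
mvdigamma_sketch(2.5, 1) - digamma(2.5)
```

Working on the log scale avoids overflow for large arguments, which is why the density code benefits from a logarithmic form.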

Anderson, T. W. 1984. *An Introduction to Multivariate
Statistical Analysis*. Wiley.

Dawid, A. P. 1981. “Some Matrix-Variate Distribution Theory:
Notational Considerations and a Bayesian Application.”
*Biometrika* 68 (1): 265–74. http://www.jstor.org/stable/2335827.

Gupta, A. K., and D. K. Nagar. 1999. *Matrix Variate
Distributions*. Monographs and Surveys in Pure and Applied
Mathematics. Taylor & Francis. https://books.google.com/books?id=PQOYnT7P1loC.

Mardia, K. V., J. M. Bibby, and J. T. Kent. 1982. *Multivariate
Analysis*. Probability and Mathematical Statistics. Acad. Press. https://books.google.com/books?id=1nLonQEACAAJ.

R Core Team. 2017. *R: A Language and Environment for Statistical
Computing*. Vienna, Austria: R Foundation for Statistical Computing.
https://www.R-project.org/.