# Creating a Q-matrix to use in PopGenHelpR

Source: `vignettes/articles/PopGenHelpR_createQmatrix.Rmd`

## Purpose

To generate a q-matrix of ancestry coefficients for use in the `PopGenHelpR` functions `Ancestry_barchart` and `Piechart_map`.

## What is a Q-matrix?

A q-matrix is a matrix with as many rows as individuals and as many columns as genetic clusters. Each cell holds an ancestry coefficient (also known as a cluster assignment), which is the contribution of a genetic cluster to a particular individual. Q-matrices are commonly used in population genomics to evaluate gene flow between populations (e.g., admixture) or species (e.g., introgression). ADMIXTURE (Alexander et al., 2009) and sNMF (Frichot et al., 2014) are commonly used programs to estimate the number of genetic clusters in a data set and to generate ancestry bar charts from the resulting q-matrices.
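To make the definition concrete, here is a minimal sketch of a toy q-matrix with three individuals and two clusters. The object name `toy_q` and all values are invented for illustration. It also checks a property you would expect of ancestry coefficients from ADMIXTURE or sNMF: the coefficients for each individual sum to 1.

```r
# Toy q-matrix: 3 individuals (rows) x 2 genetic clusters (columns)
# All values are invented for illustration only
toy_q <- matrix(c(0.90, 0.10,
                  0.50, 0.50,
                  0.05, 0.95),
                nrow = 3, byrow = TRUE)

# Each row holds one individual's ancestry coefficients,
# so each row should sum to 1 (allowing for rounding error)
row_totals <- rowSums(toy_q)
rows_ok <- all(abs(row_totals - 1) < 1e-6)
print(rows_ok)
```

A row such as `0.50, 0.50` would describe an individual with equal contributions from both clusters.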

Let’s generate some q-matrices now that you know what they are! We will create q-matrices using each of the programs mentioned above.

### sNMF

We will start with sNMF because it is implemented in the R package LEA (Frichot & François, 2015). After running sNMF (see this tutorial if you need help), you just need to use the `Q` function.

```
# Assume a sNMF project named sNMFobject, run with K ancestral populations (genetic clusters),
# whose best run is run 1 (the run with the lowest cross-entropy)
library(LEA)
Qmat <- Q(sNMFobject, K = K, run = 1)
```

All you need to do now is append the sample names to the q-matrix as the first column (you can do this with `cbind` or in any text editor). Then you can use it in `PopGenHelpR`. Note that the order of the rows in the q-matrix must be the same as the order of the samples you are appending.
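Before binding names to a q-matrix, a couple of quick checks can catch obvious mismatches. The sketch below uses invented objects (`Qmat_example`, `sample_names`); it cannot prove the order is right, only that there is exactly one unique name per row.

```r
# Hypothetical inputs: a q-matrix and a vector of sample names that is
# assumed to be in the same order as the genotype file given to sNMF
Qmat_example <- matrix(c(0.7, 0.3,
                         0.2, 0.8),
                       nrow = 2, byrow = TRUE)
sample_names <- c("Ind_A", "Ind_B")

# There must be exactly one name per row of the q-matrix
stopifnot(length(sample_names) == nrow(Qmat_example))
# Names act as a key, so they should not be duplicated
stopifnot(anyDuplicated(sample_names) == 0)
```

If either check fails, `stopifnot` stops with an error, which is a cue to revisit the list of sample names before going further.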

#### Example of formatting the q-matrix for `PopGenHelpR`

Here we show you how to format a q-matrix generated with the `Q` function from LEA for use in `PopGenHelpR`.

First, we will create a matrix like the one we may expect from LEA. We also need to create fake sample names. **Please note that this is only a toy example and is not real data.**

```
# Create fake matrix
Qmat <- t(matrix(data = c(0.25, 0.4, 0.35), nrow = 3, ncol = 3))
Fake_inds <- c("FS_1", "FS_2", "FS_3")
```

Cool! We have data, so can we use it in `PopGenHelpR`? No, because `Ancestry_barchart` and `Piechart_map` need a data.frame or CSV; these functions also need the first column to be the individual names. `PopGenHelpR` uses the individual names as a key to link the q-matrix data with populations and coordinates.

Let’s add the individual names!

```
# Add the names
Qmat_wnames <- cbind(Fake_inds, Qmat)
```

So can we use `Qmat_wnames` now? No, because `Qmat_wnames` is still a matrix. Let’s also see what `cbind` did to our numeric data. Notice that `cbind` made everything a character; we need the cluster contributions (columns 2 through 4 here) to be numeric. We will fix this by converting the matrix to a data.frame and then using the `sapply` function.

```
# Check the structure of the Qmat_wnames
str(Qmat_wnames)
#> chr [1:3, 1:4] "FS_1" "FS_2" "FS_3" "0.25" "0.25" "0.25" "0.4" "0.4" "0.4" ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr [1:4] "Fake_inds" "" "" ""
Qmat_df <- as.data.frame(Qmat_wnames)
Qmat_df[2:4] <- sapply(Qmat_df[2:4], as.numeric)
# Check again
str(Qmat_df)
#> 'data.frame': 3 obs. of 4 variables:
#> $ Fake_inds: chr "FS_1" "FS_2" "FS_3"
#> $ V2 : num 0.25 0.25 0.25
#> $ V3 : num 0.4 0.4 0.4
#> $ V4 : num 0.35 0.35 0.35
```

Notice that our cluster contribution columns are now numeric and that our `Qmat_df` object is a data.frame.

Now we can use it in `PopGenHelpR` with a population assignment file/data.frame to generate figures.
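As an alternative to `cbind` followed by `sapply`, the data.frame can be built in one step, since `data.frame()` keeps each column's own type. The sketch below recreates the toy objects so it runs on its own. The object name `Qmat_df2`, the column names, and the output file name are illustrative choices, not requirements stated in this article.

```r
# Recreate the toy objects so this block runs on its own
Qmat <- t(matrix(data = c(0.25, 0.4, 0.35), nrow = 3, ncol = 3))
Fake_inds <- c("FS_1", "FS_2", "FS_3")

# data.frame() keeps the names as character and the coefficients as numeric
Qmat_df2 <- data.frame(Ind = Fake_inds, Qmat, stringsAsFactors = FALSE)

# Illustrative column names for the three clusters
colnames(Qmat_df2)[2:4] <- paste0("Cluster", 1:3)
str(Qmat_df2)

# Optionally save as a CSV (written to a temporary folder here)
out_file <- file.path(tempdir(), "Qmat_toy.csv")
write.csv(Qmat_df2, out_file, row.names = FALSE)
```

Setting `row.names = FALSE` keeps the individual names as the first column of the CSV instead of adding a column of row numbers in front of them.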

### ADMIXTURE

ADMIXTURE is a little more complex because it is not associated with an R package, but it is nice because it gives us the q-matrix automatically. See this tutorial for more details.

For example, running `admixture input.bed 5` (where `input.bed` is a placeholder for your own file) tells ADMIXTURE to use a bed file as input and run the analysis with a K value of 5.

This will output a file with the .Q extension, which contains ancestry coefficients for each individual (our q-matrix).
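The .Q file has no header and no sample names, so the names still need to be added before use in `PopGenHelpR`. Below is a minimal sketch of one way to do this in R, assuming the sample names are taken from the second column of the PLINK .fam file that accompanies the bed file, and that the .Q rows follow the same order as the .fam file. The toy file names and values are invented so that the block runs on its own.

```r
# Write toy ADMIXTURE-style files to a temporary folder.
# File names and values are invented; real files come from ADMIXTURE and PLINK.
tmp <- tempdir()
q_file <- file.path(tmp, "toy.3.Q")
fam_file <- file.path(tmp, "toy.fam")
writeLines(c("0.25 0.40 0.35", "0.10 0.80 0.10"), q_file)
writeLines(c("Pop1 FS_1 0 0 0 -9", "Pop1 FS_2 0 0 0 -9"), fam_file)

# Read the q-matrix: one row per individual, one column per cluster
Q_admix <- read.table(q_file, header = FALSE)

# Read the .fam file; column 2 holds the individual IDs
fam <- read.table(fam_file, header = FALSE, stringsAsFactors = FALSE)

# The two files should describe the same number of individuals
stopifnot(nrow(fam) == nrow(Q_admix))

# Individual names first, then the cluster contributions
Qmat_admix <- data.frame(Ind = fam[[2]], Q_admix, stringsAsFactors = FALSE)
str(Qmat_admix)
```

Because `read.table` reads the coefficients as numeric, no conversion step is needed here.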

## Questions???

Please email Keaka Farleigh (farleik@miamioh.edu) if you need help generating a q-matrix or with anything else.

## References

Alexander, D. H., Novembre, J., & Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19(9), 1655-1664.

Frichot, E., & François, O. (2015). LEA: An R package for landscape and ecological association studies. Methods in Ecology and Evolution, 6(8), 925-929.

Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G., & François, O. (2014). Fast and efficient estimation of individual ancestry coefficients. Genetics, 196(4), 973-983.