Library DoubleqpcR

library(DoubleqpcR)

Data Handling

To import the data the following can be used. The input.cq, input.raw, input.raw.melt, input.raw.melt.usedOnly are necessary and will be used globally! But they can be created without these functions, of course.

The example data used here is a dilution series and therefore the samples are numeric concentrations of the target genotype. Some methods for the regression will expect, that the sample column are indeed numbers! Sample 100 means a 100% target genome concentration (This can be confusing, because it is not the actual concentration).

read.cqTable(csv = "rps4_8fach_cq.csv") # mistake in name on purpose to demonstrate
## Warning in read.cqTable(csv = "rps4_8fach_cq.csv"): Table does not colnames
## correct! First three needs to be 'type','sample','well'. Renaming, but not
## checking anything! Please manually check.
read.fluorescenceTable(csv = "rps4_8fach.csv")

Cq value determination

To add more Cq values from fluorescence curves any method can be used. DoubleqpcR has two implemented methods and a wrapper for the qpcR package. Both use the first and second derivative maxima to determine the Cq value. (FD = First Derivative = TP = Tourning Point) and (SD = Second Derivative = exponential slope)

if(calculateEverything){
# Calculate new Cq values (needs fluorescence table!)
calc.Cq(method = "TP", cq.new = "TP")
calc.Cq(method = "SD", cq.new = "SD")

calc.Cq.qpcR(method = "TP", cq.new = "TP.qpcR")
calc.Cq.qpcR(method = "SD", cq.new = "SD.qpcR")
}

Get the Cq values

This step is optional and more for internal function use. But if the Cq values (with outlier removal) for a sample are needed in a data format, one can use the following.

myCqValues <- table.Cq(sample = 8.8, target = "bitter", CqType = c("Regression"), outliers = TRUE, alpha = 0.05, format = "data")

Plot the fluorescence curves

To change the visuals you can overwrite the ggplot theme settings and details. This is commented in the plot for all curves.

p <- plot.curve(sample = "all")
p

# p + theme_tq() + scale_color_tq() + theme(legend.position = "none")

An Example:

We choose Dixon outlier test here, as we have only 4 measurements per individual subsample (or no outlier removal). One can later choose outlier detection again when combining the subsamples.

# Here is the command in simpler form:
plot.curve(sample = "81.4.a", target = "bitter")  # sample can be also "81.4". then no percentage is shown.

table.Cq(sample = "81.4.a", target = "bitter", CqType = c("Regression", "Autocalculated"), outliers = TRUE, outliers.method = "Dixon", alpha = 0.05, format = "kable")
## [1] "6.89 in 6.89 7.35 7.44 7.39 is an outlier."
Cq-values: 81.4.a bitter
Regression.bitter Regression.sweet diff.Regression Autocalculated.bitter Autocalculated.sweet diff.Autocalculated
1 8.920 8.620 10.400
2 7.350 9.070 8.560 10.650
3 7.440 9.020 8.610 10.850
4 7.390 8.880 8.510 10.790
mean 7.393 8.973 -1.579 8.575 10.672 -2.098
sd 0.045 0.088 0.133 0.051 0.200 0.251
# plot all samples with Headings, ideal for tabsets.
if(calculateEverything){
for (sampleX in unique(input.cq$sample)) {
   cat('\n\n#### Sample `', sampleX, '`\n\n')
   plot(plot.curve(sample = sampleX, target = "bitter"))
   cat('\n\n')
   show(table.Cq(sample = sampleX, target = "bitter", CqType = c("Regression","Autocalculated","TP.qpcR","SD.qpcR","SD","TP"), outliers = TRUE, alpha = 0.05, format = "kable"))
}
}

Combine data for further analysis

At this point a new list of dataframes (data.cq) will be created via make.data.cq() which looses the connection to the fluorescence table. The steps above can be repeated and added to data.cq via the option add = TRUE. This procedure allows to combine multiple experiments. Cq value types that are wanted to use needs to be selected.

If all values are already present in the input.cq like it is the case in this example we only need to make the data.cq list.

make.Cq.data(add = FALSE, target = "bitter", CqType = c("Regression", "Autocalculated"), outliers = TRUE, outliers.method = "Dixon", alpha = 0.05, silent = TRUE)

In this example we have seperated two experiments with an indicator “a” and “b”. It is now time to combine the two individual subsamples to one.

combineSubsamples(delimiter = ".", outliers = TRUE, outliers.method = "Grubbs")
## [1] "11.09 in 12.35 12.01 11.09 11.85 11.86 11.83 11.92 11.97 is an outlier."
## [1] "12.35 in 12.35 12.01 NA 11.85 11.86 11.83 11.92 11.97 is an outlier."
## [1] "7.88 in NA 7.35 7.44 7.39 7.88 7.44 7.25 7.33 is an outlier."
## [1] "9.47 in 8.62 8.56 8.61 8.51 9.47 8.3 8.75 8.8 is an outlier."
## [1] "8.91 in 8.25 8.47 8.39 8.41 8.26 8.24 8.54 8.91 is an outlier."
## [1] "8.71 in 8.28 8.19 8.71 8.18 8.15 8.06 7.98 8.33 is an outlier."
## [1] "12.37 in 11.8 11.82 12.12 12.37 11.9 11.97 11.94 11.97 is an outlier."
## [1] "4.84 in 5.36 5.67 4.84 5.33 5.59 5.73 5.8 5.63 is an outlier."
## [1] "5.41 in 6.6 6.9 5.41 6.25 6.67 6.72 6.69 6.6 is an outlier."
## [1] "6.25 in 6.6 6.9 NA 6.25 6.67 6.72 6.69 6.6 is an outlier."
## [1] "6.9 in 6.6 6.9 NA NA 6.67 6.72 6.69 6.6 is an outlier."

Summary of Cq values

A summary of the cq values can be given via a data.cq.sum data frame. Samples which are names numeric can be filtered.

Cq.data.mean(CqType = "Regression", onlyNumeric = TRUE)
data.Cq.sum
##    sample mean.Cq.target sd.Cq.target mean.Cq.offtarget sd.Cq.offtarget
## 1     100       5.655000   0.05182388         11.108571      0.06067085
## 2      93       7.320000   0.03023716         10.911250      0.13357047
## 3      88       7.243750   0.23114235          9.940000      0.14985707
## 4      81       7.366667   0.07284687          9.162500      0.25262903
## 5      72       7.483750   0.53401277          8.313750      0.22815643
## 6      46       7.062857   0.10703804          7.080000      0.09856108
## 7      22       8.977500   0.20748494          7.181250      0.18703991
## 8      14       9.991250   0.12170661          7.188750      0.15514394
## 9       8      10.705714   0.22059444          6.925714      0.08734169
## 10      4      11.931429   0.10745985          7.042500      0.20919915
## 11      0      13.107500   0.23699005          5.587143      0.17895198

Plot Cq values

Here all samples that are in the data.cq list will be plotted for one Cq type.

plot.Cq(CqType = "Regression", onlyNumeric = TRUE)

Here all samples that are in the data.cq list will be plotted for one Cq type against the log of the concentration. Therefore the sample name has to be a numeric value which corresponds to the target concentration.

plot <- plot.Cq.efficiency(CqType = "Regression")

plot + theme_tq() + scale_fill_tq() + scale_colour_tq()

Delta Cq Values

To get the delta Cq values the function delta.Cq.data() can be used. It has different methods to calculate the differences…

Here we use all combinations of Cq values from one sample.

delta.Cq.data(CqType = "Regression", method = "c", onlyNumeric = TRUE)
head(delta.Cq)
##   sample delta.Cq
## 1      0    -7.47
## 2      0    -7.16
## 4      0    -7.50
## 5      0    -7.24
## 6      0    -7.10
## 7      0    -7.03

Plot delta Cq values

The difference Plot for the target genotype with the other genotype and vice versa.

plot <- plot.delta.Cq.differences() 
plot + 
  theme_bw() +
  annotate("label", x = 40, y = 6, label = "bitter") +
  annotate("label", x = 40, y = -7, label = "sweet")

A typical box plot representation of delta Cq values. This is of cause only for demonstration purposes as these plots can be made in various ways from the data with standard visualisation methods. (Obviously this makes no sense with “delta.Cq.data(method=”mean”)“.

plot <- plot.delta.Cq.box(points = TRUE) 
plot + theme_bw()

Regression analysis (polynomial)

At this point all samples need to be numeric and represent a percentage!

All samples should be between 0 % and 100 %. Samples that are not numeric and between 0 to 100 will be skipped.

An overview can be plotted for different Cq types from data.Cq.

plot.delta.Cq.overview(CqType = c("Regression", "Autocalculated"))

Model generation

A model (linear, poly3, …) can be used to describe the values from DMAS-qPCR. The R package investR is used for prediction and plotting. And the package caret for cross validation.

regression.delta.Cq(CqType = "Regression", fit = "poly3", cv = TRUE, cvComplete = TRUE)
## [1] "CROSS VALIDATION:"
## Linear Regression 
## 
## 641 samples
##   1 predictor
## 
## No pre-processing
## Resampling: Repeated Train/Test Splits Estimated (25 reps, 50%) 
## Summary of sample sizes: 322, 322, 322, 322, 322, 322, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.4195236  0.9875655  0.3144013
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

## 
## =====
## Leave one proportion completely out - mean cross validation error [percentage points]:
## [1] 1.988061

Model analysis

The model can be analysed by standard methods:

# Model Summary:
summary(model.delta.Cq)
## 
## Call:
## lm(formula = func, data = modelData.delta.Cq)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.36349 -0.18907 -0.01748  0.24451  1.06536 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -0.50917    0.01635  -31.14   <2e-16 ***
## poly(sample, 3, raw = FALSE)1 90.58956    0.41398  218.83   <2e-16 ***
## poly(sample, 3, raw = FALSE)2 -6.60595    0.41398  -15.96   <2e-16 ***
## poly(sample, 3, raw = FALSE)3 24.26637    0.41398   58.62   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.414 on 637 degrees of freedom
## Multiple R-squared:  0.9878, Adjusted R-squared:  0.9877 
## F-statistic: 1.719e+04 on 3 and 637 DF,  p-value: < 2.2e-16
# Plot the residuals:
plot(summary(model.delta.Cq)$residuals, main = "Residuals for fit", ylab = "residuals")

Model prediction

Values can be predicted with investr package. The mean.response should be set to TRUE. But one could argue that the intrinsic bias from the combinatoric data for model creation is not fit for a mean response and a individual response should be considered (mean.response = FALSE). This leads to enormous uncertainty around delta CQ of 0.

predictionValue <- 0

res <- invest(model.delta.Cq, y0 = predictionValue, interval = "inversion", level = 0.95, mean.response = FALSE)

plotFit(model.delta.Cq, interval = "both",
         level = 0.95, shade = TRUE,
         col.pred = "whitesmoke",
         col.conf = "skyblue",
         extend.range = TRUE,
         xlab = "proportion of target genotype",
         ylab = bquote(Delta*Cq))
abline(h = predictionValue, v = c(res$lower, res$estimate, res$upper), lty = 2, lwd = 0.5, col = "salmon")

For the prediction y = 0 the model predicts a response of 41.8601932 with upper 71.8788959 and lower 27.8077061 interval.

Regression analysis (Linearisation)

At this point all samples need to be numeric and represent a percentage!

All samples should be between 0 % and 100 %. Samples that are not numeric and between 0 to 100 will be skipped.

An overview can be plotted for different Cq types from data.Cq.

plot.delta.Cq.overview(CqType = c("Regression", "Autocalculated"), linSqrtTrans = TRUE, xlab = "shifted square root transformation of proportion")

Model generation

The shifted square root transformation to linearise the data can be used with the parameter linSqrtTrans = TRUE

regression.delta.Cq(CqType = "Regression", linSqrtTrans = TRUE, fit = "linear", cv = TRUE, cvComplete = TRUE)
## [1] "CROSS VALIDATION:"
## Linear Regression 
## 
## 641 samples
##   1 predictor
## 
## No pre-processing
## Resampling: Repeated Train/Test Splits Estimated (25 reps, 50%) 
## Summary of sample sizes: 322, 322, 322, 322, 322, 322, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.5380403  0.9795654  0.4280503
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

## 
## =====
## Leave one proportion completely out - mean cross validation error [percentage points]:
## [1] 3.596187

Model analysis

The model can be analysed by standard methods:

# Model Summary:
summary(model.delta.Cq)
## 
## Call:
## lm(formula = func, data = modelData.delta.Cq)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.70624 -0.31956  0.03975  0.42038  1.16038 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.352907   0.021236  -16.62   <2e-16 ***
## sample       0.868448   0.004983  174.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5372 on 639 degrees of freedom
## Multiple R-squared:  0.9794, Adjusted R-squared:  0.9794 
## F-statistic: 3.037e+04 on 1 and 639 DF,  p-value: < 2.2e-16

Model prediction

Values can be predicted with investr package. The mean.response should be set to TRUE. But one could argue that the intrinsic bias from the combinatoric data for model creation is not fit for a mean response and a individual response should be considered (mean.response = FALSE). This leads to enormous uncertainty around delta CQ of 0.

predictionValue <- 0

res <- invest(model.delta.Cq, y0 = predictionValue, interval = "inversion", level = 0.95, mean.response = FALSE)

plotFit(model.delta.Cq, interval = "both",
         level = 0.95, shade = TRUE,
         col.pred = "whitesmoke",
         col.conf = "skyblue",
         extend.range = TRUE,
         xlab = "proportion of target genotype",
         ylab = bquote(Delta*Cq))
abline(h = predictionValue, v = c(res$lower, res$estimate, res$upper), lty = 2, lwd = 0.5, col = "salmon")

For the prediction y = 0 the model predicts a response of 55.5817329 with upper 70.3085017 and lower 39.2111252 interval.

Example

We look at an example from the study where we examined bitter and sweet almond samples. Here, two datasets from sweet M0 marzipan and from self-made bitter marzipan are used as an example. We use the regression analysis from before. The data in this part has no concentration and we do not look at the fluorescence data.

Import

Let use first import the bitter marzipan data. The .csv table is manually saved from the .xlxs excel supporting data file. We can see that there are two errors. First the column names are not as described by the package which are automatically renamed and we can ignore this error. Secondly a warning about the genotype is presented. This is from the NTC measurement and we can delete this row.

Next we can add this data to the data.cq dataframe. Actually we make a new one, because we do not want to add the data to the regression data we did before.

read.cqTable(csv = "DMASqPCR_bitter_marzipan.csv")
## Warning in read.cqTable(csv = "DMASqPCR_bitter_marzipan.csv"): Table does not
## colnames correct! First three needs to be 'type','sample','well'. Renaming, but
## not checking anything! Please manually check.
## Warning in read.cqTable(csv = "DMASqPCR_bitter_marzipan.csv"): There are not two
## genotypes present in the data. This may cause massive trouble! Maybe misspelled?
input.cq <- input.cq[1:(nrow(input.cq)-1),]

make.Cq.data(add = FALSE, target = "bitter", CqType = "Regression")
## [1] "Sample debittered 1 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 2 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 3 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 4 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 5 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 6 :"
## [1] "Cq type Regression processing:"

Additionally we can think about removing the “debittered 6” experiment as only values for one genotype are present. But here we just leave them be.

Now we add the values from the sweet Marzipan experiment. Here we have a problem with the sample names as they are called “Marzipan1” and so on. we have to manually change it to “Marzipan 1” to be coherent with the other naming…

read.cqTable(csv = "DMASqPCR_sweet_marzipan.csv")
## Warning in read.cqTable(csv = "DMASqPCR_sweet_marzipan.csv"): Table does not
## colnames correct! First three needs to be 'type','sample','well'. Renaming, but
## not checking anything! Please manually check.
input.cq$sample <- sub("([0-9])"," \\1",input.cq$sample)

make.Cq.data(add = TRUE, target = "bitter", CqType = "Regression")
## [1] "Sample Marzipan 1 :"
## [1] "Cq type Regression processing:"
## [1] "Sample Marzipan 2 :"
## [1] "Cq type Regression processing:"
## [1] "Sample Marzipan 3 :"
## [1] "Cq type Regression processing:"

Now we can combine our subsamples and have two entries in the data.Cq!

combineSubsamples(delimiter = " ", useMeans = TRUE)
data.Cq
## $debittered
## $debittered$Regression
##   Regression.bitter Regression.sweet
## 1          18.07667         18.83333
## 2          18.92667         20.02667
## 3          18.12000         19.26000
## 4          19.19333         19.97500
## 5          18.67000         19.27667
## 6          20.55667              NaN
## 
## 
## $Marzipan
## $Marzipan$Regression
##   Regression.bitter Regression.sweet
## 1           20.3275          14.9675
## 2           22.2100          15.7650
## 3           21.4750          14.7950

Note that in this case we have searched for outliers two times. When we make or add the data.Cq in one subsample and when combining all subsamples. If the means are used, then the means will be compared for outliers! So in this case it makes sense. If we work with useMeans = FALSE than the outlier test in the make.Cq.data() function should be omitted!

Plot delta Cq values with prediction

Again the typical box plot representation of delta Cq values but now we include the predictions from the previously generated regression model.

First we have to generate the delta Cq values (this will overwrite the delta.Cq dataframe from regression analysis) As the model in the environment we have to tell the prediction to reTransform the values. We also plot the P value from a t.test for the two boxplots.

delta.Cq.data(method = "combinatorial")
plot.delta.Cq.box(useModel = TRUE, reTransform = TRUE, predictionRange = seq(-6,2,2), pVal = TRUE)

This should be an example for possible Boxplot analysis. These functions do not have the goal to be complete or to cover all possibilities.

Happy R-coding have a nice day!