library(DoubleqpcR)
To import the data the following can be used. The
input.cq
, input.raw
,
input.raw.melt
, input.raw.melt.usedOnly
are
necessary and will be used globally! But they can be created without
these functions, of course.
The example data used here is a dilution series and therefore the samples are numeric concentrations of the target genotype. Some methods for the regression will expect, that the sample column are indeed numbers! Sample 100 means a 100% target genome concentration (This can be confusing, because it is not the actual concentration).
read.cqTable(csv = "rps4_8fach_cq.csv") # mistake in name on purpose to demonstrate
## Warning in read.cqTable(csv = "rps4_8fach_cq.csv"): Table does not colnames
## correct! First three needs to be 'type','sample','well'. Renaming, but not
## checking anything! Please manually check.
read.fluorescenceTable(csv = "rps4_8fach.csv")
To add more Cq values from fluorescence curves any method can be used. DoubleqpcR has two implemented methods and a wrapper for the qpcR package. Both use the first and second derivative maxima to determine the Cq value. (FD = First Derivative = TP = Tourning Point) and (SD = Second Derivative = exponential slope)
if(calculateEverything){
# Calculate new Cq values (needs fluorescence table!)
calc.Cq(method = "TP", cq.new = "TP")
calc.Cq(method = "SD", cq.new = "SD")
calc.Cq.qpcR(method = "TP", cq.new = "TP.qpcR")
calc.Cq.qpcR(method = "SD", cq.new = "SD.qpcR")
}
This step is optional and more for internal function use. But if the Cq values (with outlier removal) for a sample are needed in a data format, one can use the following.
<- table.Cq(sample = 8.8, target = "bitter", CqType = c("Regression"), outliers = TRUE, alpha = 0.05, format = "data") myCqValues
To change the visuals you can overwrite the ggplot theme settings and details. This is commented in the plot for all curves.
<- plot.curve(sample = "all")
p p
# p + theme_tq() + scale_color_tq() + theme(legend.position = "none")
An Example:
We choose Dixon outlier test here, as we have only 4 measurements per individual subsample (or no outlier removal). One can later choose outlier detection again when combining the subsamples.
# Here is the command in simpler form:
plot.curve(sample = "81.4.a", target = "bitter") # sample can be also "81.4". then no percentage is shown.
table.Cq(sample = "81.4.a", target = "bitter", CqType = c("Regression", "Autocalculated"), outliers = TRUE, outliers.method = "Dixon", alpha = 0.05, format = "kable")
## [1] "6.89 in 6.89 7.35 7.44 7.39 is an outlier."
Regression.bitter | Regression.sweet | diff.Regression | Autocalculated.bitter | Autocalculated.sweet | diff.Autocalculated | |
---|---|---|---|---|---|---|
1 | 8.920 | 8.620 | 10.400 | |||
2 | 7.350 | 9.070 | 8.560 | 10.650 | ||
3 | 7.440 | 9.020 | 8.610 | 10.850 | ||
4 | 7.390 | 8.880 | 8.510 | 10.790 | ||
mean | 7.393 | 8.973 | -1.579 | 8.575 | 10.672 | -2.098 |
sd | 0.045 | 0.088 | 0.133 | 0.051 | 0.200 | 0.251 |
# plot all samples with Headings, ideal for tabsets.
if(calculateEverything){
for (sampleX in unique(input.cq$sample)) {
cat('\n\n#### Sample `', sampleX, '`\n\n')
plot(plot.curve(sample = sampleX, target = "bitter"))
cat('\n\n')
show(table.Cq(sample = sampleX, target = "bitter", CqType = c("Regression","Autocalculated","TP.qpcR","SD.qpcR","SD","TP"), outliers = TRUE, alpha = 0.05, format = "kable"))
} }
At this point a new list of dataframes (data.cq) will be created via make.data.cq() which looses the connection to the fluorescence table. The steps above can be repeated and added to data.cq via the option add = TRUE. This procedure allows to combine multiple experiments. Cq value types that are wanted to use needs to be selected.
If all values are already present in the input.cq like it is the case in this example we only need to make the data.cq list.
make.Cq.data(add = FALSE, target = "bitter", CqType = c("Regression", "Autocalculated"), outliers = TRUE, outliers.method = "Dixon", alpha = 0.05, silent = TRUE)
In this example we have seperated two experiments with an indicator “a” and “b”. It is now time to combine the two individual subsamples to one.
combineSubsamples(delimiter = ".", outliers = TRUE, outliers.method = "Grubbs")
## [1] "11.09 in 12.35 12.01 11.09 11.85 11.86 11.83 11.92 11.97 is an outlier."
## [1] "12.35 in 12.35 12.01 NA 11.85 11.86 11.83 11.92 11.97 is an outlier."
## [1] "7.88 in NA 7.35 7.44 7.39 7.88 7.44 7.25 7.33 is an outlier."
## [1] "9.47 in 8.62 8.56 8.61 8.51 9.47 8.3 8.75 8.8 is an outlier."
## [1] "8.91 in 8.25 8.47 8.39 8.41 8.26 8.24 8.54 8.91 is an outlier."
## [1] "8.71 in 8.28 8.19 8.71 8.18 8.15 8.06 7.98 8.33 is an outlier."
## [1] "12.37 in 11.8 11.82 12.12 12.37 11.9 11.97 11.94 11.97 is an outlier."
## [1] "4.84 in 5.36 5.67 4.84 5.33 5.59 5.73 5.8 5.63 is an outlier."
## [1] "5.41 in 6.6 6.9 5.41 6.25 6.67 6.72 6.69 6.6 is an outlier."
## [1] "6.25 in 6.6 6.9 NA 6.25 6.67 6.72 6.69 6.6 is an outlier."
## [1] "6.9 in 6.6 6.9 NA NA 6.67 6.72 6.69 6.6 is an outlier."
A summary of the cq values can be given via a data.cq.sum data frame. Samples which are names numeric can be filtered.
Cq.data.mean(CqType = "Regression", onlyNumeric = TRUE)
data.Cq.sum
## sample mean.Cq.target sd.Cq.target mean.Cq.offtarget sd.Cq.offtarget
## 1 100 5.655000 0.05182388 11.108571 0.06067085
## 2 93 7.320000 0.03023716 10.911250 0.13357047
## 3 88 7.243750 0.23114235 9.940000 0.14985707
## 4 81 7.366667 0.07284687 9.162500 0.25262903
## 5 72 7.483750 0.53401277 8.313750 0.22815643
## 6 46 7.062857 0.10703804 7.080000 0.09856108
## 7 22 8.977500 0.20748494 7.181250 0.18703991
## 8 14 9.991250 0.12170661 7.188750 0.15514394
## 9 8 10.705714 0.22059444 6.925714 0.08734169
## 10 4 11.931429 0.10745985 7.042500 0.20919915
## 11 0 13.107500 0.23699005 5.587143 0.17895198
Here all samples that are in the data.cq list will be plotted for one Cq type.
plot.Cq(CqType = "Regression", onlyNumeric = TRUE)
Here all samples that are in the data.cq list will be plotted for one Cq type against the log of the concentration. Therefore the sample name has to be a numeric value which corresponds to the target concentration.
<- plot.Cq.efficiency(CqType = "Regression")
plot
+ theme_tq() + scale_fill_tq() + scale_colour_tq() plot
To get the delta Cq values the function delta.Cq.data() can be used. It has different methods to calculate the differences…
Here we use all combinations of Cq values from one sample.
delta.Cq.data(CqType = "Regression", method = "c", onlyNumeric = TRUE)
head(delta.Cq)
## sample delta.Cq
## 1 0 -7.47
## 2 0 -7.16
## 4 0 -7.50
## 5 0 -7.24
## 6 0 -7.10
## 7 0 -7.03
The difference Plot for the target genotype with the other genotype and vice versa.
<- plot.delta.Cq.differences()
plot +
plot theme_bw() +
annotate("label", x = 40, y = 6, label = "bitter") +
annotate("label", x = 40, y = -7, label = "sweet")
A typical box plot representation of delta Cq values. This is of cause only for demonstration purposes as these plots can be made in various ways from the data with standard visualisation methods. (Obviously this makes no sense with “delta.Cq.data(method=”mean”)“.
<- plot.delta.Cq.box(points = TRUE)
plot + theme_bw() plot
At this point all samples need to be numeric and represent a percentage!
All samples should be between 0 % and 100 %. Samples that are not numeric and between 0 to 100 will be skipped.
An overview can be plotted for different Cq types from data.Cq.
plot.delta.Cq.overview(CqType = c("Regression", "Autocalculated"))
A model (linear, poly3, …) can be used to describe the values from DMAS-qPCR. The R package investR is used for prediction and plotting. And the package caret for cross validation.
regression.delta.Cq(CqType = "Regression", fit = "poly3", cv = TRUE, cvComplete = TRUE)
## [1] "CROSS VALIDATION:"
## Linear Regression
##
## 641 samples
## 1 predictor
##
## No pre-processing
## Resampling: Repeated Train/Test Splits Estimated (25 reps, 50%)
## Summary of sample sizes: 322, 322, 322, 322, 322, 322, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.4195236 0.9875655 0.3144013
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
##
## =====
## Leave one proportion completely out - mean cross validation error [percentage points]:
## [1] 1.988061
The model can be analysed by standard methods:
# Model Summary:
summary(model.delta.Cq)
##
## Call:
## lm(formula = func, data = modelData.delta.Cq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.36349 -0.18907 -0.01748 0.24451 1.06536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.50917 0.01635 -31.14 <2e-16 ***
## poly(sample, 3, raw = FALSE)1 90.58956 0.41398 218.83 <2e-16 ***
## poly(sample, 3, raw = FALSE)2 -6.60595 0.41398 -15.96 <2e-16 ***
## poly(sample, 3, raw = FALSE)3 24.26637 0.41398 58.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.414 on 637 degrees of freedom
## Multiple R-squared: 0.9878, Adjusted R-squared: 0.9877
## F-statistic: 1.719e+04 on 3 and 637 DF, p-value: < 2.2e-16
# Plot the residuals:
plot(summary(model.delta.Cq)$residuals, main = "Residuals for fit", ylab = "residuals")
Values can be predicted with investr package. The mean.response should be set to TRUE. But one could argue that the intrinsic bias from the combinatoric data for model creation is not fit for a mean response and a individual response should be considered (mean.response = FALSE). This leads to enormous uncertainty around delta CQ of 0.
<- 0
predictionValue
<- invest(model.delta.Cq, y0 = predictionValue, interval = "inversion", level = 0.95, mean.response = FALSE)
res
plotFit(model.delta.Cq, interval = "both",
level = 0.95, shade = TRUE,
col.pred = "whitesmoke",
col.conf = "skyblue",
extend.range = TRUE,
xlab = "proportion of target genotype",
ylab = bquote(Delta*Cq))
abline(h = predictionValue, v = c(res$lower, res$estimate, res$upper), lty = 2, lwd = 0.5, col = "salmon")
For the prediction y = 0 the model predicts a response of 41.8601932 with upper 71.8788959 and lower 27.8077061 interval.
At this point all samples need to be numeric and represent a percentage!
All samples should be between 0 % and 100 %. Samples that are not numeric and between 0 to 100 will be skipped.
An overview can be plotted for different Cq types from data.Cq.
plot.delta.Cq.overview(CqType = c("Regression", "Autocalculated"), linSqrtTrans = TRUE, xlab = "shifted square root transformation of proportion")
The shifted square root transformation to linearise the data can be
used with the parameter linSqrtTrans = TRUE
regression.delta.Cq(CqType = "Regression", linSqrtTrans = TRUE, fit = "linear", cv = TRUE, cvComplete = TRUE)
## [1] "CROSS VALIDATION:"
## Linear Regression
##
## 641 samples
## 1 predictor
##
## No pre-processing
## Resampling: Repeated Train/Test Splits Estimated (25 reps, 50%)
## Summary of sample sizes: 322, 322, 322, 322, 322, 322, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.5380403 0.9795654 0.4280503
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
##
## =====
## Leave one proportion completely out - mean cross validation error [percentage points]:
## [1] 3.596187
The model can be analysed by standard methods:
# Model Summary:
summary(model.delta.Cq)
##
## Call:
## lm(formula = func, data = modelData.delta.Cq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.70624 -0.31956 0.03975 0.42038 1.16038
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.352907 0.021236 -16.62 <2e-16 ***
## sample 0.868448 0.004983 174.28 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5372 on 639 degrees of freedom
## Multiple R-squared: 0.9794, Adjusted R-squared: 0.9794
## F-statistic: 3.037e+04 on 1 and 639 DF, p-value: < 2.2e-16
Values can be predicted with investr package. The mean.response should be set to TRUE. But one could argue that the intrinsic bias from the combinatoric data for model creation is not fit for a mean response and a individual response should be considered (mean.response = FALSE). This leads to enormous uncertainty around delta CQ of 0.
<- 0
predictionValue
<- invest(model.delta.Cq, y0 = predictionValue, interval = "inversion", level = 0.95, mean.response = FALSE)
res
plotFit(model.delta.Cq, interval = "both",
level = 0.95, shade = TRUE,
col.pred = "whitesmoke",
col.conf = "skyblue",
extend.range = TRUE,
xlab = "proportion of target genotype",
ylab = bquote(Delta*Cq))
abline(h = predictionValue, v = c(res$lower, res$estimate, res$upper), lty = 2, lwd = 0.5, col = "salmon")
For the prediction y = 0 the model predicts a response of 55.5817329 with upper 70.3085017 and lower 39.2111252 interval.
We look at an example from the study where we examined bitter and sweet almond samples. Here, two datasets from sweet M0 marzipan and from self-made bitter marzipan are used as an example. We use the regression analysis from before. The data in this part has no concentration and we do not look at the fluorescence data.
Let use first import the bitter marzipan data. The .csv table is manually saved from the .xlxs excel supporting data file. We can see that there are two errors. First the column names are not as described by the package which are automatically renamed and we can ignore this error. Secondly a warning about the genotype is presented. This is from the NTC measurement and we can delete this row.
Next we can add this data to the data.cq dataframe. Actually we make a new one, because we do not want to add the data to the regression data we did before.
read.cqTable(csv = "DMASqPCR_bitter_marzipan.csv")
## Warning in read.cqTable(csv = "DMASqPCR_bitter_marzipan.csv"): Table does not
## colnames correct! First three needs to be 'type','sample','well'. Renaming, but
## not checking anything! Please manually check.
## Warning in read.cqTable(csv = "DMASqPCR_bitter_marzipan.csv"): There are not two
## genotypes present in the data. This may cause massive trouble! Maybe misspelled?
<- input.cq[1:(nrow(input.cq)-1),]
input.cq
make.Cq.data(add = FALSE, target = "bitter", CqType = "Regression")
## [1] "Sample debittered 1 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 2 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 3 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 4 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 5 :"
## [1] "Cq type Regression processing:"
## [1] "Sample debittered 6 :"
## [1] "Cq type Regression processing:"
Additionally we can think about removing the “debittered 6” experiment as only values for one genotype are present. But here we just leave them be.
Now we add the values from the sweet Marzipan experiment. Here we have a problem with the sample names as they are called “Marzipan1” and so on. we have to manually change it to “Marzipan 1” to be coherent with the other naming…
read.cqTable(csv = "DMASqPCR_sweet_marzipan.csv")
## Warning in read.cqTable(csv = "DMASqPCR_sweet_marzipan.csv"): Table does not
## colnames correct! First three needs to be 'type','sample','well'. Renaming, but
## not checking anything! Please manually check.
$sample <- sub("([0-9])"," \\1",input.cq$sample)
input.cq
make.Cq.data(add = TRUE, target = "bitter", CqType = "Regression")
## [1] "Sample Marzipan 1 :"
## [1] "Cq type Regression processing:"
## [1] "Sample Marzipan 2 :"
## [1] "Cq type Regression processing:"
## [1] "Sample Marzipan 3 :"
## [1] "Cq type Regression processing:"
Now we can combine our subsamples and have two entries in the data.Cq!
combineSubsamples(delimiter = " ", useMeans = TRUE)
data.Cq
## $debittered
## $debittered$Regression
## Regression.bitter Regression.sweet
## 1 18.07667 18.83333
## 2 18.92667 20.02667
## 3 18.12000 19.26000
## 4 19.19333 19.97500
## 5 18.67000 19.27667
## 6 20.55667 NaN
##
##
## $Marzipan
## $Marzipan$Regression
## Regression.bitter Regression.sweet
## 1 20.3275 14.9675
## 2 22.2100 15.7650
## 3 21.4750 14.7950
Note that in this case we have searched for outliers two times. When
we make or add the data.Cq in one subsample and when combining all
subsamples. If the means are used, then the means will be compared for
outliers! So in this case it makes sense. If we work with
useMeans = FALSE
than the outlier test in the
make.Cq.data()
function should be omitted!
Again the typical box plot representation of delta Cq values but now we include the predictions from the previously generated regression model.
First we have to generate the delta Cq values (this will overwrite the delta.Cq dataframe from regression analysis) As the model in the environment we have to tell the prediction to reTransform the values. We also plot the P value from a t.test for the two boxplots.
delta.Cq.data(method = "combinatorial")
plot.delta.Cq.box(useModel = TRUE, reTransform = TRUE, predictionRange = seq(-6,2,2), pVal = TRUE)
This should be an example for possible Boxplot analysis. These functions do not have the goal to be complete or to cover all possibilities.
Happy R-coding have a nice day!