guvenmathbio: May 2020

Saturday, May 23, 2020

FPKM vs read counts of RNA-seq data

"A quick example of the technical aspect:
Assume a 1,000 bp transcript. Experiment 1 is 5,000,000 total reads and this transcript received 5 hits. This calculates out to an FPKM of 1.0. But that FPKM is based on only 5 hits, which is entirely unreliable. Experiment 2 has 100,000,000 total reads and this transcript has 100 hits. This also calculates out to an FPKM of 1.0; however, this FPKM is much more reliable as it's based on 100 hits, which is a more stable count level. The variance due to aligner error and count methods might only vary that count value by 5%, whereas the count of 5 could vary by 80% or more."
Source of the quote: http://seqanswers.com/forums/showthread.php?t=30269
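
The arithmetic in the quote can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, "total reads" is treated as the total number of mapped fragments, and the Poisson relative error (1/sqrt(count)) is my own assumption to illustrate why 5 hits are noisier than 100. It is not where the 5% and 80% figures in the quote come from.

```python
import math


def fpkm(hits, transcript_length_bp, total_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    # hits / (length in kb) / (reads in millions) == hits * 1e9 / (length * reads)
    return hits * 1e9 / (transcript_length_bp * total_reads)


def poisson_relative_error(hits):
    """Relative standard deviation of a count under a Poisson model (assumption)."""
    return 1.0 / math.sqrt(hits)


# Experiment 1: 5 hits out of 5,000,000 reads on a 1,000 bp transcript.
exp1 = fpkm(hits=5, transcript_length_bp=1_000, total_reads=5_000_000)
# Experiment 2: 100 hits out of 100,000,000 reads on the same transcript.
exp2 = fpkm(hits=100, transcript_length_bp=1_000, total_reads=100_000_000)

print("FPKM experiment 1:", exp1)
print("FPKM experiment 2:", exp2)
print("Relative error with 5 hits:  ", round(poisson_relative_error(5), 3))
print("Relative error with 100 hits:", round(poisson_relative_error(100), 3))
```

Both experiments give the same FPKM, but the underlying counts differ by a factor of 20, which is why the raw read counts are worth keeping next to the normalized values.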

Friday, May 22, 2020

NMF for dummies

http://www.billconnelly.net/?p=534

Estimating the NMF rank: how to choose r <= min(m, n)

This is from the documentation of the NMF package for R: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.3637&rep=rep1&type=pdf

"Several approaches have been proposed to choose the optimal value of r. For example, [Brunet et al., 2004] proposed to take the first value of r for which the cophenetic coefficient starts decreasing, [Hutchins et al., 2008] suggested to choose the first value where the RSS curve presents an inflection point, and [Frigyesi and Höglund, 2008] considered the smallest value at which the decrease in the RSS is lower than the decrease of the RSS obtained from random data."
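
The last criterion in the quote (compare the drop in RSS with the drop on random data) can be sketched in Python with NumPy. This is only an illustration under my own assumptions, not the R NMF package's implementation: the NMF is fitted with plain multiplicative updates, "random data" is produced by shuffling the values within each column, the "decrease at r" is measured going from r to r+1, and all function names are hypothetical.

```python
import numpy as np


def nmf_rss(V, r, n_iter=500, seed=0):
    """Fit V ~ W @ H by multiplicative updates and return the residual sum of squares."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return float(np.sum((V - W @ H) ** 2))


def shuffle_within_columns(V, seed=0):
    """Destroy the row structure by permuting the values of each column independently."""
    rng = np.random.default_rng(seed)
    return np.column_stack([rng.permutation(col) for col in V.T])


def choose_rank(V, ranks, seed=0):
    """Smallest r whose RSS drop (r -> r+1) is lower than the drop on shuffled data."""
    V_rand = shuffle_within_columns(V, seed)
    rss = {r: nmf_rss(V, r, seed=seed) for r in ranks}
    rss_rand = {r: nmf_rss(V_rand, r, seed=seed) for r in ranks}
    for r, r_next in zip(ranks, ranks[1:]):
        drop = rss[r] - rss[r_next]
        drop_rand = rss_rand[r] - rss_rand[r_next]
        if drop < drop_rand:
            return r, rss
    return ranks[-1], rss


# Toy nonnegative data with two clear blocks (an exact rank-2 structure).
rng = np.random.default_rng(1)
V = np.zeros((20, 15))
V[:10, :8] = np.outer(rng.random(10) + 0.5, rng.random(8) + 0.5)
V[10:, 8:] = np.outer(rng.random(10) + 0.5, rng.random(7) + 0.5)

ranks = [1, 2, 3, 4]
best_r, rss = choose_rank(V, ranks)
print("chosen rank:", best_r)
print("RSS per rank:", rss)
```

In practice one would average the RSS over several random starts for each r, since a single NMF run depends on its initialization.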

another useful link:
https://stackoverflow.com/questions/17199575/explain-extractfeatures-from-the-nmf-package-in-r

Wednesday, May 13, 2020

PCA vs NMF

1) PCA and NMF optimize for different objectives.

2) PCA finds a new, lower-dimensional subspace that retains as much of the variance of the data as possible; the axes of that subspace act as new features. It is a dimension reduction method.

3) NMF finds nonnegative features of the given data. However, one should be careful: NMF is very sensitive to initialization, so it will not necessarily find the same features on every run.

4) The output of NMF can be viewed as a smaller version of the original dataset, so one does not have to work with the full-size dataset.

5) Most of the time NMF is more useful because of its interpretability. The key is that all of the features learned via NMF are additive; that is, every point in the transformed space can be constructed by adding together nonnegative features. (http://dx.doi.org/10.1109/IJCNN.2004.1381038)
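
The points above can be seen on a small toy matrix. The following is a rough sketch with NumPy, assuming PCA via an SVD of the centered data and NMF via simple multiplicative updates; the function names and the random toy data are made up for illustration and do not come from the linked paper.

```python
import numpy as np


def pca_components(X, k):
    """Top-k principal directions from the SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]


def nmf(X, k, n_iter=200, seed=0):
    """Factor X ~ W @ H with nonnegative W, H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 0.1
    H = rng.random((k, X.shape[1])) + 0.1
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


rng = np.random.default_rng(0)
X = rng.random((30, 6))  # nonnegative toy data: 30 samples, 6 features

pcs = pca_components(X, k=2)
W_a, H_a = nmf(X, k=2, seed=0)
W_b, H_b = nmf(X, k=2, seed=1)

# PCA directions mix signs, so they are not additive parts of the data.
print("PCA components contain negative entries:", bool((pcs < 0).any()))
# NMF factors stay nonnegative, so each sample is a sum of nonnegative parts.
print("NMF factors are nonnegative:", bool((W_a >= 0).all() and (H_a >= 0).all()))
# Two random starts are compared to show the dependence on initialization.
print("Same H from two random starts:", bool(np.allclose(H_a, H_b)))
# W is the reduced representation: 30 samples described by 2 features.
print("Reduced shape:", W_a.shape, "vs original", X.shape)
```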