This is from the NMF package for R: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.3637&rep=rep1&type=pdf
"Several approaches have been proposed to choose the optimal value of r. For example,
[Brunet et al., 2004] proposed to take the first value of r for which the cophenetic coefficient
starts decreasing, [Hutchins et al., 2008] suggested to choose the first value where the RSS curve
presents an inflection point, and [Frigyesi and Höglund, 2008] considered the smallest value at
which the decrease in the RSS is lower than the decrease of the RSS obtained from random
data."
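The first and third rules in the quote can be sketched in a few lines of Python, assuming the cophenetic coefficients and RSS values have already been computed for each candidate r (for example with the NMF package). The function names and the numbers are made up for illustration; they are not results from any real dataset. I read "at r" as the change when going from r to the next rank, which is one possible interpretation of the quote.

```python
def brunet_rank(ranks, cophenetic):
    """First r after which the cophenetic coefficient starts decreasing."""
    for i in range(len(ranks) - 1):
        if cophenetic[i + 1] < cophenetic[i]:
            return ranks[i]
    return None  # the coefficient never decreases


def frigyesi_rank(ranks, rss, rss_random):
    """Smallest r at which the RSS drops less than the RSS of random data."""
    for i in range(len(ranks) - 1):
        drop = rss[i] - rss[i + 1]                       # decrease on the real data
        drop_random = rss_random[i] - rss_random[i + 1]  # decrease on random data
        if drop < drop_random:
            return ranks[i]
    return None  # the real data always improves faster than random data


# Toy values (hypothetical, only to show how the rules are applied)
ranks = [2, 3, 4, 5, 6]
cophenetic = [0.98, 0.99, 0.95, 0.90, 0.88]
rss = [100, 70, 50, 45, 42]
rss_random = [100, 92, 84, 76, 68]

print(brunet_rank(ranks, cophenetic))
print(frigyesi_rank(ranks, rss, rss_random))
```

The inflection-point rule of Hutchins et al. is not covered here, since it depends on how the inflection of the RSS curve is estimated.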
Another useful link:
https://stackoverflow.com/questions/17199575/explain-extractfeatures-from-the-nmf-package-in-r
My name is Emine Guven. I am an applied mathematician and study quantitative biology. My interests are cellular aging, VEGF receptor clustering, and mathematical modeling of biological systems, with a broad focus on data analysis and simulations. This site is reserved as a notebook to keep my studies fresh and open to my students and collaborators.
Friday, May 22, 2020
Wednesday, May 13, 2020
PCA vs NMF
1) PCA and NMF optimize for different objectives.
2) PCA finds a new lower-dimensional subspace that keeps as much of the variance of the data as possible; the directions of that subspace become the new features. It is a dimension reduction method.
3) NMF finds nonnegative features of the given data. However, one should be careful, because NMF is very sensitive to initialization and hence won't find the same features every time.
4) The output of NMF can be viewed as a smaller version of the original dataset, so one does not have to work with the bigger dataset.
5) NMF is often more useful because of its interpretability. The key is that all of the features learned via NMF are additive; that is, every point in the transformed space can be constructed by adding together nonnegative features. (http://dx.doi.org/10.1109/IJCNN.2004.1381038)
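The points above can be checked on a small example. This is a minimal sketch with NumPy on random toy data, not a reference implementation: PCA is done through an SVD of the centered data, and NMF uses the standard multiplicative update rules with a random start. The function names, the seeds, and the data are my own choices for illustration.

```python
import numpy as np


def pca(X, k):
    """Project centered data onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]


def nmf(X, k, seed, n_iter=500, eps=1e-9):
    """Factor nonnegative X into W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))  # random nonnegative start
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


X = np.random.default_rng(0).random((20, 6))  # toy nonnegative data

scores, components = pca(X, 2)
W1, H1 = nmf(X, 2, seed=1)
W2, H2 = nmf(X, 2, seed=2)

print("PCA components have negative entries:", bool(components.min() < 0))
print("NMF factors are nonnegative:", bool(W1.min() >= 0 and H1.min() >= 0))
print("Same NMF features for both seeds:", bool(np.allclose(H1, H2)))
```

Running NMF with two different seeds is a quick way to see the sensitivity to initialization mentioned in point 3; in practice several random starts are usually compared.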
Tuesday, August 6, 2019
Mixture Models
Gompertz Mixture Model
Nice tutorial: http://documentation.statsoft.com/STATISTICAHelp.aspx?path=Glossary/GlossaryTwo/C/CensoringCensoredObservations
Monday, August 5, 2019
Python-machine learning auto data
Python, pandas and numpy installation
The following solves the problem (it adds the Python Scripts folder to PATH, then installs pandas with pip):
C:\> setx PATH "%PATH%;C:\<path\to\python\folder>\Scripts"
C:\> pip install pandas
It took me a while to find the correct URL:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
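Once pandas is installed, the file can be read straight from that URL. A minimal sketch: the file has no header row and marks missing values with "?", so the column names are passed in by hand. The names below follow the attribute list in the dataset's imports-85.names description; the function name load_autos is my own.

```python
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

# Column names taken from the dataset description (imports-85.names)
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
    "price",
]


def load_autos(source=URL):
    """Read the auto data; '?' entries become NaN."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
```

Calling df = load_autos() downloads the data, and df.head() shows the first rows. A local copy works too: load_autos("imports-85.data").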
Thursday, February 14, 2019
BLAST-Basic Local Alignment Search Tool
Standalone BLAST on Linux machine:
BLAST detects regions of local similarities between sequences.
- How to create your own database and search for the desired sequence?
- The first way is wget on the command line:
$ wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
If the download is interrupted, -c continues it from where it stopped:
$ wget -c ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
- The second way is the NCBI FTP website.
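After the download, the remaining steps are to unpack the file, build a database, and run the search. Here is a sketch in Python that only wraps the BLAST+ command-line tools makeblastdb and blastp; it assumes BLAST+ is installed and on PATH. The file names (nr.fasta, mydb, query.fasta, results.tsv) and the e-value cutoff are placeholders of mine, not part of the original note.

```python
import gzip
import shutil
import subprocess


def decompress(gz_path, out_path):
    """Unpack a .gz file (e.g. nr.gz) into a plain FASTA file."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def makeblastdb_cmd(fasta, db_name, dbtype="prot"):
    """Command that builds a BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", db_name]


def blastp_cmd(query, db_name, out_file, evalue=1e-5):
    """Command that searches a protein query against the database."""
    return ["blastp", "-query", query, "-db", db_name,
            "-out", out_file, "-evalue", str(evalue), "-outfmt", "6"]


def run(cmd):
    """Run a command and stop with an error if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    decompress("nr.gz", "nr.fasta")
    run(makeblastdb_cmd("nr.fasta", "mydb"))
    run(blastp_cmd("query.fasta", "mydb", "results.tsv"))
```

nr is a protein database, hence -dbtype prot and blastp; for nucleotide sequences the corresponding choices would be -dbtype nucl and blastn. Output format 6 is the tab-separated table.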
Thursday, January 3, 2019
Notes on Drummond et al., "Why highly expressed proteins evolve slowly"
Some stuff:
30 years ago, Zuckerkandl proposed that a protein's sequence will evolve at a rate primarily
determined by the proportion of its sites involved in specific
functions (or "functional density").
However, the effect of functional density, and how to measure which residues are involved in protein function, remained unclear.
Wednesday, January 2, 2019
gene duplication in bacteria
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2787491/pdf/1745-6150-4-46.pdf
Nice article.