Friday, May 22, 2020

estimating NMF rank: how to choose r <= min(m,n)

This is from the documentation of the NMF package for R: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.3637&rep=rep1&type=pdf

"Several approaches have been proposed to choose the optimal value of r. For example, [Brunet et al., 2004] proposed to take the first value of r for which the cophenetic coefficient starts decreasing, [Hutchins et al., 2008] suggested to choose the first value where the RSS curve presents an inflection point, and [Frigyesi and H¨oglund, 2008] considered the smallest value at which the decrease in the RSS is lower than the decrease of the RSS obtained from random data. "


another useful link:
https://stackoverflow.com/questions/17199575/explain-extractfeatures-from-the-nmf-package-in-r

Wednesday, May 13, 2020

PCA vs NMF

1) PCA and NMF optimize different objectives, so they give different results.
2) PCA finds a new orthogonal subspace that retains as much of the variance of the data as possible; the projections onto it are the new features. It is a dimension reduction method.
3) NMF finds nonnegative features of the given data. However, one should be careful: NMF is very sensitive to initialization, and hence won't find the same features every time.
4) The output of NMF can be viewed as a smaller version of the original dataset, so one does not have to work with the bigger dataset.
5) NMF is often the more useful of the two because of interpretability. The key is that all of the features learned via NMF are additive; that is, every point in the transformed space can be constructed by adding together nonnegative features. (http://dx.doi.org/10.1109/IJCNN.2004.1381038)
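A small Python sketch of the points above, using only numpy. Everything in it is illustrative: the toy data is random, PCA is done through an SVD of the centered data, and NMF uses simple multiplicative updates rather than any particular library's solver.

```python
import numpy as np


def pca(X, k):
    """Top-k principal components via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]   # new features, shape (n_samples, k)
    components = Vt[:k]         # orthonormal directions, shape (k, n_features)
    return scores, components


def nmf(X, k, n_iter=300, seed=0):
    """Rank-k NMF, X ~ W @ H with W, H >= 0 (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, d)) + 1e-3
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy nonnegative data (made up): 50 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.random((50, 3)) @ rng.random((3, 8))

scores, components = pca(X, 2)
W_a, H_a = nmf(X, 2, seed=1)  # same data, two different initializations
W_b, H_b = nmf(X, 2, seed=2)

print("PCA components contain negative entries:", bool((components < 0).any()))
print("NMF factors are nonnegative:", bool(W_a.min() >= 0 and H_a.min() >= 0))
print("NMF features identical across seeds:", bool(np.allclose(H_a, H_b)))
```

The PCA directions are orthonormal, which forces mixed signs, while each row of X is rebuilt in NMF as a nonnegative combination of the rows of H. Running NMF with two seeds is the quick way to see the initialization sensitivity from point 3.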

Monday, August 5, 2019

Python machine learning: auto data

Python, pandas and numpy installation
Adding the Python Scripts folder to PATH solves the problem (pip not being recognized on the command line):

C:\> setx PATH "%PATH%;C:\<path\to\python\folder>\Scripts"
C:\> pip install pandas
It took me a while to find the correct URL for the data:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
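A minimal sketch of loading that file with pandas. The file has no header row and uses "?" for missing values; the column names below are my assumption, taken from the attribute list that accompanies the dataset, and load_autos is just a helper name I made up. The sample row is invented so the snippet runs without network access.

```python
import io

import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

# Assumed column names (the data file itself carries no header).
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
    "price",
]


def load_autos(source=URL):
    """Read the header-less autos file; '?' is treated as a missing value."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# One invented row in the same layout, for an offline check.
SAMPLE = (
    "1,?,examplemake,gas,std,four,sedan,fwd,front,95.0,170.0,65.0,52.0,"
    "2300,ohc,four,110,mpfi,3.1,3.3,9.0,90,5200,25,31,9000\n"
)

df = load_autos(io.StringIO(SAMPLE))
print(df.dtypes)
print(df.isna().sum().sum(), "missing value(s) in the sample row")

# With network access, the real data would be read with:
# df = load_autos()
```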

Thursday, February 14, 2019

BLAST-Basic Local Alignment Search Tool

Standalone BLAST on Linux machine:

BLAST detects regions of local similarities between sequences.

  • How to create your own database and search for the desired sequence?
  1. The first way is wget on the command line:
      $ wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
      $ wget -c ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
     (-c means continue a partially finished download)
  2. NCBI FTP website
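Once a FASTA file is on disk, building the database and searching it could look roughly like the Python sketch below. It assumes the BLAST+ command-line tools (makeblastdb, blastp) are installed and that the downloaded file has been decompressed. All file and database names here are hypothetical placeholders, and the helper functions are mine. The snippet only builds and prints the commands; it does not run them.

```python
import shlex


def makeblastdb_cmd(fasta, dbtype="prot", out="my_db"):
    """Command that formats a FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]


def blast_cmd(query, db, out, program="blastp", evalue=1e-5):
    """Command that searches `query` against `db`, tabular output (outfmt 6)."""
    return [
        program,
        "-query", query,
        "-db", db,
        "-out", out,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


# Hypothetical file names; nr holds protein sequences, hence dbtype "prot".
build = makeblastdb_cmd("my_sequences.fasta", dbtype="prot", out="my_db")
search = blast_cmd("query.fasta", db="my_db", out="hits.tsv")

for cmd in (build, search):
    print(shlex.join(cmd))

# To actually run them (requires BLAST+ on PATH):
# import subprocess
# subprocess.run(build, check=True)
# subprocess.run(search, check=True)
```

For a nucleotide database the same pattern would use dbtype "nucl" and blastn instead of blastp.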

Thursday, January 3, 2019

notes on Drummond et al. "Why highly expressed proteins evolve slowly" paper

Some stuff:


30 years ago, Zuckerkandl proposed that a protein’s sequence will evolve at a rate primarily determined by the proportion of its sites involved in specific functions (or ‘‘functional density’’).
However, the effect of functional density, and how to measure which residues are involved in a protein's function, remained unclear.


Wednesday, January 2, 2019

gene duplication in bacteria

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2787491/pdf/1745-6150-4-46.pdf

Nice article.