Wednesday, May 13, 2020

PCA vs NMF

1) PCA and NMF optimize for different objectives.
2) PCA finds a new orthogonal subspace whose axes (the principal components) capture as much of the variance of the data as possible. Each component is a new feature built as a linear combination of the original ones. It is a dimensionality reduction method.
3) NMF factorizes nonnegative data into nonnegative features. However, one should be careful because NMF is sensitive to initialization, and hence won't necessarily find the same features on every run.
4) The output of NMF can be viewed as a smaller representation of the original dataset, so that one does not have to work with the full-size data.
5) NMF is often more useful when interpretability matters. The key is that all of the features learned via NMF are additive; that is, every point in the transformed space can be constructed by adding together nonnegative features. (http://dx.doi.org/10.1109/IJCNN.2004.1381038)
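
To make the contrast above concrete, here is a minimal sketch using only numpy. It assumes a small random nonnegative matrix as stand-in data, computes PCA through an SVD of the centered data, and runs NMF with plain multiplicative updates (one common NMF algorithm, not necessarily the one a given library uses). The function names `pca` and `nmf`, the data, and the seeds are all made up for illustration.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]              # orthonormal directions of largest variance
    return Xc @ components.T, components

def nmf(X, k, seed, n_iter=200, eps=1e-9):
    """Approximate X by W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))  # random nonnegative start: the source of run-to-run differences
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
X = rng.random((50, 6))              # hypothetical nonnegative data: 50 samples, 6 features

scores, components = pca(X, k=2)
W1, H1 = nmf(X, k=2, seed=1)
W2, H2 = nmf(X, k=2, seed=2)

print("PCA components contain negative weights:", bool((components < 0).any()))
print("NMF factors are nonnegative:", bool((W1 >= 0).all() and (H1 >= 0).all()))
print("Same NMF features from two different seeds:", bool(np.allclose(H1, H2)))
```

The PCA components mix positive and negative weights, so a sample is rebuilt by adding and subtracting components. The NMF factors stay nonnegative, so a sample is rebuilt by addition only, which is what makes the parts easier to read.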

Monday, August 5, 2019

Python-machine learning auto data

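Python, pandas and numpy installation:
Adding the Python Scripts folder to PATH and then installing pandas with pip solves the problem (pip installs numpy as a pandas dependency).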

C:\> setx PATH "%PATH%;C:\<path\to\python\folder>\Scripts"
C:\> pip install pandas
It took me a while to find the correct URL for the dataset:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
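
A minimal sketch of loading this file with pandas could look like the following. It assumes the file has no header row and marks missing values with "?". The helper name `load_autos` and the two-row sample are made up for illustration; the sample is not real data from the file and only mimics its comma-separated shape, so the sketch can run without a network connection.

```python
import io
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

def load_autos(source=URL):
    """Read the autos data from a URL, path, or file-like object (hypothetical helper)."""
    # Assumption: no header row, and "?" stands for a missing value.
    return pd.read_csv(source, header=None, na_values="?")

# Made-up sample rows in the same comma-separated style (not taken from the dataset).
sample = io.StringIO("3,?,brand-a,gas\n1,164,brand-b,gas\n")
df = load_autos(sample)

print(df.shape)                # rows and columns that were read
print(df.isna().sum().sum())   # number of "?" entries converted to NaN
```

Calling `load_autos()` with no argument would read from the URL above instead of the sample.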

Thursday, February 14, 2019

BLAST-Basic Local Alignment Search Tool

Standalone BLAST on Linux machine:

BLAST detects regions of local similarities between sequences.

  • How to create your own database and search for the desired sequence?
  1. The first way is wget on the command line:
      $ wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
      $ wget -c ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
     (-c means continue, i.e. resume a partially downloaded file)
  2. The second way is to download the file from the NCBI ftp website.
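
The remaining steps (unpack the FASTA file, build a database, search it) might be sketched as below. This only builds and prints the command lines and does not run anything. It assumes the BLAST+ tools `makeblastdb` and `blastp` are installed, and that a protein search is wanted because nr is a protein collection. The helper name `blast_commands`, the database name `my_db`, and the file names `query.fasta` and `hits.tsv` are placeholders, not names from the original notes.

```python
import shlex

def blast_commands(fasta_gz="nr.gz", db_name="my_db", query="query.fasta"):
    """Return the shell commands for download, unpack, database build and search."""
    fasta = fasta_gz[:-3] if fasta_gz.endswith(".gz") else fasta_gz
    return [
        # -c resumes an interrupted download
        ["wget", "-c", "ftp://ftp.ncbi.nih.gov/blast/db/FASTA/" + fasta_gz],
        ["gunzip", fasta_gz],
        # build a protein database from the FASTA file
        ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name],
        # search the database; -outfmt 6 writes tab-separated hits
        ["blastp", "-query", query, "-db", db_name, "-out", "hits.tsv", "-outfmt", "6"],
    ]

for cmd in blast_commands():
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually execute the step.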

Thursday, January 3, 2019

notes on Drummond et al "Why highly expressed proteins evolve slowly" paper

Some stuff:


30 years ago, Zuckerkandl proposed that a protein’s sequence will evolve at a rate primarily determined by the proportion of its sites involved in specific functions (or ‘‘functional density’’).
However, the effect of functional density on evolutionary rate, and how to measure which residues are involved in protein function, remained unclear.


Wednesday, January 2, 2019

gene duplication in bacteria

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2787491/pdf/1745-6150-4-46.pdf

Nice article.

Tuesday, November 6, 2018

SQL

Adapted from codecademy.com SQL course:

  • JOIN will combine rows from different tables if the join condition is true.
  • LEFT JOIN will return every row in the left table, and if the join condition is not met, NULL values are used to fill in the columns from the right table.
  • Primary key is a column that serves as a unique identifier for the rows in the table.
  • Foreign key is a column that contains the primary key of another table.
  • CROSS JOIN lets us combine all rows of one table with all rows of another table.
  • UNION stacks one dataset on top of another.
  • WITH allows us to define one or more temporary tables that can be used in the final query.
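
The statements above can be tried out with Python's built-in sqlite3 module. This is a small sketch on two made-up tables, `customers` and `orders`, which are not from the course; `orders.customer_id` is a foreign key pointing at the primary key `customers.id`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),  -- foreign key
    item TEXT
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen');
""")

def q(sql):
    """Run a query and return all rows as a list of tuples."""
    return con.execute(sql).fetchall()

# JOIN: only customers that have a matching order
inner = q("SELECT c.name, o.item FROM customers c "
          "JOIN orders o ON o.customer_id = c.id ORDER BY o.id")

# LEFT JOIN: every customer; NULL (None in Python) where no order matches
left = q("SELECT c.name, o.item FROM customers c "
         "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id")

# CROSS JOIN: every customer paired with every order
cross = q("SELECT c.name, o.item FROM customers c CROSS JOIN orders o")

# UNION: one result set stacked on top of another (duplicates removed)
union = q("SELECT name FROM customers UNION SELECT item FROM orders")

# WITH: a temporary table used by the final query
with_ = q("WITH counts AS (SELECT customer_id, COUNT(*) AS n "
          "FROM orders GROUP BY customer_id) "
          "SELECT c.name, counts.n FROM customers c "
          "JOIN counts ON counts.customer_id = c.id")

print(inner)
print(left)
print(len(cross), len(union))
print(with_)
```

Grace has no orders, so she is missing from the JOIN result but appears in the LEFT JOIN result with an empty item.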