Friday, January 19, 2024

Probabilistic Machine Learning

 Stochastic Variational Inference

https://www.it.uu.se/research/systems_and_control/education/2018/pml/lectures/

Such a good source for probabilistic ML.

Integration: p(D) = ∫ p(D | w) p(w) dw

Optimization: ŵ = argmax_w p(D | w), i.e., pick the single parameter value w that maximizes the likelihood.
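To make the two routes concrete, here is a toy sketch (my own illustration, not from the lecture notes): a coin-flip model where w = P(heads) with a uniform prior. The integral p(D) is computed numerically, the optimum over a grid.

import numpy as np
from scipy.integrate import quad
from scipy.stats import binom

k, n = 7, 10   # toy data D: 7 heads out of 10 flips

# Integration: marginal likelihood p(D) = INT[ p(D|w) p(w) dw ], uniform prior p(w) = 1
p_D, _ = quad(lambda w: binom.pmf(k, n, w) * 1.0, 0.0, 1.0)

# Optimization: maximum likelihood estimate w_hat = argmax_w p(D|w)
w_grid = np.linspace(1e-6, 1 - 1e-6, 10_000)
w_hat = w_grid[np.argmax(binom.pmf(k, n, w_grid))]

print(f"p(D)  = {p_D:.4f}")    # 1/(n+1) = 0.0909 for a uniform prior
print(f"w_hat = {w_hat:.3f}")  # k/n = 0.7

For a uniform prior the marginal likelihood of any k out of n flips comes out to 1/(n+1), while the MLE is simply k/n.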


The three cornerstones: 

1. (Data) The observed data becomes useful when we have extracted knowledge from it. 

2. (Mathematical model) A mathematical model is a compact representation of the data that in precise mathematical form captures the key properties of the underlying situation. 

3. (Learning algorithm) Used to compute the unknown variables from the observed data using the model.


Key probabilistic objects (notation: D - measured data and w - unknown model variables): 

The full probabilistic model (joint distribution of all known and unknown variables present in the model) is given by

p(D, w) = p(D | w) p(w), where p(D | w) is the data distribution and p(w) the prior.

 In the Bayesian setting learning amounts to computing the posterior distribution 

p(w | D) = p(D | w) p(w) / p(D), where p(D | w) is the likelihood, p(w) the prior, and p(D) the marginal likelihood (evidence).
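For the toy coin model above, the posterior is available in closed form thanks to conjugacy (a hedged sketch; the Beta/Bernoulli pairing is my own choice of example): with a Beta(a, b) prior, the posterior is Beta(a + k, b + n − k).

from scipy.stats import beta

a, b = 1.0, 1.0   # uniform prior Beta(1, 1)
k, n = 7, 10      # observed data: 7 heads in 10 flips

posterior = beta(a + k, b + n - k)   # p(w | D) in closed form
print(posterior.mean())              # posterior mean (a + k)/(a + b + n) = 8/12
print(posterior.interval(0.95))      # 95% credible interval for w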


1) p(x, z) = p(x | z) p(z), where p(x | z) is the likelihood and p(z) the prior.


2) The marginal likelihood, p(x) = ∫ p(x, z) dz, is intractable for many models of interest, which is what motivates approximate inference (see the sketch after the list below).


Approximate inference methods split into two families:

1) Stochastic: Markov chain Monte Carlo (MCMC), sequential Monte Carlo (SMC), stochastic variational inference (SVI)

2) Deterministic: variational inference, expectation propagation
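To see what the stochastic family looks like in practice, here is a minimal stochastic variational inference sketch (my own toy illustration, not from the course material): a conjugate Gaussian model where the exact posterior is known, fit by Monte Carlo gradient ascent on the ELBO via the reparameterization trick.

import numpy as np

# model  p(x, z) = N(x | z, 1) * N(z | 0, 1), one observed x;
# family q(z) = N(mu, sigma^2), reparameterized as z = mu + sigma * eps, eps ~ N(0, 1)
rng = np.random.default_rng(0)
x = 2.0
mu, log_sigma = 0.0, 0.0          # variational parameters
lr, n_steps, n_mc = 0.05, 2000, 8

for _ in range(n_steps):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_mc)
    z = mu + sigma * eps                       # samples from q(z)
    # d/dz [log p(x|z) + log p(z)] = (x - z) - z for unit-variance Gaussians
    dz = (x - z) - z
    grad_mu = dz.mean()                        # pathwise gradient wrt mu
    # entropy term 0.5*log(2*pi*e*sigma^2) contributes +1 wrt log_sigma
    grad_log_sigma = (dz * eps * sigma).mean() + 1.0
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma) ** 2)   # should land near 1.0 and 0.5

The exact posterior here is N(x/2, 1/2), so the printed values should be close to 1.0 and 0.5. The useful point: the ELBO only needs the joint p(x, z), so this same recipe works even when p(x) itself is intractable.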





Tuesday, November 28, 2023

Physical Modeling with Python

 https://physicalmodelingwithpython.blogspot.com/

Such a good blog.

 Entropy, Relative Entropy, Cross Entropy

https://www.iitg.ac.in/cseweb/osint/slides/Anasua_Entropy.pdf

It basically comes from the real analysis of measure theory and goes beyond information theory (Shannon entropy):

https://en.wikipedia.org/wiki/Entropy_(information_theory)#Definition


The lower the probability of an event, the larger its surprisal (−log p); entropy is the average surprisal. E.g., the entropy of a fair six-sided die is log2(6) ≈ 2.585 bits.
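A quick numeric check of the die example (plain NumPy; the loaded-die comparison is my own addition):

import numpy as np

def shannon_entropy(p, base=2):
    """H(p) = -sum_i p_i * log(p_i), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * (np.log(p) / np.log(base))).sum()

fair_die = np.full(6, 1 / 6)
print(shannon_entropy(fair_die))   # log2(6) ≈ 2.585 bits
print(np.log2(6))                  # same value

loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
print(shannon_entropy(loaded_die)) # < log2(6): less uniform, lower entropy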


Tuesday, June 28, 2022

Machine Learning Notes with R studio

 There are 4 types of machine learning algorithms:


1. Supervised (labeled data; classification ---> categorical outputs, regression ---> numerical values)

2. Unsupervised (unlabeled data; the goal is to detect structure in the input; see the sketch after this list):

i.clustering

ii. dimension reduction (NMF, PCA, ...)

3. Semi-supervised (unsupervised techniques first, then supervised methods on the same input)

4. Reinforcement learning (a method that uses feedback from operating in a real or synthetic environment)
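A quick sketch of item 2 (clustering plus dimension reduction). I'm using Python/scikit-learn here even though these notes are for RStudio; the workflow is the same:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# unlabeled data: two well-separated blobs in 5 dimensions
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # i. clustering
X2 = PCA(n_components=2).fit_transform(X)                                # ii. dimension reduction

print(labels[:5], labels[-5:])   # the two blobs get different cluster ids
print(X2.shape)                  # (100, 2)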

Friday, April 29, 2022

Staying Fresh and Up to Date. One more project: the best-rank decision preprint is out on bioRxiv

It has been too many months since I last posted here. Stay fresh, notebook.

Here you go, one more project:

https://www.biorxiv.org/content/10.1101/2022.04.14.488288v1.article-metrics

Saturday, May 23, 2020

FPKM vs read counts of RNA-seq data


"A quick example of the technical aspect: 
assume a 1,000 bp transcript. experiment 1 is 5,000,000 total reads and this transcript received 5 bhits. This calculates out to an FPKM of 1.0. But that FPKM is based on only 5 hits which is entirely unreliable. experiment 2 has 100,000,000 total reads and this transcript has 100 hits. This also calculates out to an FPKM of 1.0 however this FPKM is much more reliable as it's based on 100 hits which is a more stable count level. the variance due to aligner error and count methods might only vary that count value by 5% whereas the count of 5 could vary by 80% or more."
Source of the quoted information: http://seqanswers.com/forums/showthread.php?t=30269
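The arithmetic behind the quote, as a small sketch (the formula is the standard FPKM definition; the function is my own):

# FPKM = reads mapped to the transcript /
#        (transcript length in kilobases * total mapped reads in millions)
def fpkm(reads_on_transcript, transcript_length_bp, total_reads):
    kb = transcript_length_bp / 1_000
    millions = total_reads / 1_000_000
    return reads_on_transcript / (kb * millions)

print(fpkm(5, 1_000, 5_000_000))      # experiment 1 -> 1.0, from only 5 hits
print(fpkm(100, 1_000, 100_000_000))  # experiment 2 -> 1.0, far more reliable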