<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Visualising and exploring multivariate datasets using singular value decomposition and self organising maps</title>
	<atom:link href="http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/</link>
	<description></description>
	<pubDate>Fri, 21 Nov 2008 20:53:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: shane</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-33529</link>
		<dc:creator>shane</dc:creator>
		<pubDate>Mon, 10 Nov 2008 03:13:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-33529</guid>
		<description>Nice tutorial, thanks!

PS. 
   Sim.x = 6
should be
   som.x = 6</description>
		<content:encoded><![CDATA[<p>Nice tutorial, thanks!</p>
<p>PS.<br />
   Sim.x = 6<br />
should be<br />
   som.x = 6</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: paul</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-19980</link>
		<dc:creator>paul</dc:creator>
		<pubDate>Mon, 16 Jun 2008 14:51:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-19980</guid>
		<description>Hi, I'm a math guy getting into bioinformatics. While the biology perplexes me, the math isn't so bad. Anyway, fyi, principal components and SVD are basically the same thing. The main difference between the two methods is mean centering: PCA does this whereas SVD does not. Also, the output of PCA programs is usually the new coordinates for the data, whereas the output of SVD is three matrices that can be used to get the new coordinates. Every PCA program really does SVD with just some extra steps.</description>
		<content:encoded><![CDATA[<p>Hi, I&#8217;m a math guy getting into bioinformatics. While the biology perplexes me, the math isn&#8217;t so bad. Anyway, fyi, principal components and SVD are basically the same thing. The main difference between the two methods is mean centering: PCA does this whereas SVD does not. Also, the output of PCA programs is usually the new coordinates for the data, whereas the output of SVD is three matrices that can be used to get the new coordinates. Every PCA program really does SVD with just some extra steps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Singular Value Decomposition &#171; Mstobe&#8217;s Weblog</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-17096</link>
		<dc:creator>Singular Value Decomposition &#171; Mstobe&#8217;s Weblog</dc:creator>
		<pubDate>Fri, 16 May 2008 23:15:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-17096</guid>
		<description>[...] 16, 2008 at 11:15 pm (Uncategorized)  Today I popped into a quite interesting discussion from Mike Barton about Singular Value Decomposition in biology, which is what I&#8217;m working on [...]</description>
		<content:encoded><![CDATA[<p>[...] 16, 2008 at 11:15 pm (Uncategorized)  Today I popped into a quite interesting discussion from Mike Barton about Singular Value Decomposition in biology, which is what I&#8217;m working on [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chris</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-12730</link>
		<dc:creator>chris</dc:creator>
		<pubDate>Tue, 11 Mar 2008 21:43:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-12730</guid>
		<description>Great round up Mike - very useful!

re: transformation. As a side-note, if multiplying subsets of U and V, you should also multiply by the corresponding elements of D (the singular values).</description>
		<content:encoded><![CDATA[<p>Great round up Mike - very useful!</p>
<p>re: transformation. As a side-note, if multiplying subsets of U and V, you should also multiply by the corresponding elements of D (the singular values).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hanif Khalak</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-12432</link>
		<dc:creator>Hanif Khalak</dc:creator>
		<pubDate>Fri, 07 Mar 2008 23:37:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-12432</guid>
		<description>Very nice post and blog in general - thanks for the tutorial.  I'm interested in using matrix factorization as well, to integrate "multi-omics" datasets into a correlation model.  I found a nice MATLAB example here:  http://www.stanford.edu/~boyd/cvx/examples/html/nonneg_matrix_fact.html

As for R code for NMF, I haven't found anything either, but did see this exchange on R-help:

http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg88593.html

BTW, the original "normalize" line should work fine if you prepend the line "library(som)".</description>
		<content:encoded><![CDATA[<p>Very nice post and blog in general - thanks for the tutorial.  I&#8217;m interested in using matrix factorization as well, to integrate &#8220;multi-omics&#8221; datasets into a correlation model.  I found a nice MATLAB example here:  <a href="http://www.stanford.edu/~boyd/cvx/examples/html/nonneg_matrix_fact.html" rel="nofollow">http://www.stanford.edu/~boyd/cvx/examples/html/nonneg_matrix_fact.html</a></p>
<p>As for R code for NMF, I haven&#8217;t found anything either, but did see this exchange on R-help:</p>
<p><a href="http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg88593.html" rel="nofollow">http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg88593.html</a></p>
<p>BTW, the original &#8220;normalize&#8221; line should work fine if you prepend the line &#8220;library(som)&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-7256</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Thu, 13 Dec 2007 19:11:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-7256</guid>
		<description>Hi Glenn,

Thanks for finding this bug. This script worked when I wrote it a few months back. I can't find the normalize function now, which is really weird. I was in Spain when I wrote this, but that doesn't really explain much.

You can get similar row-normalising functionality by transposing the data, normalising the data by columns, then transposing back. This is quite awkward but is the first solution that came to mind.

crabs.x &lt;- as.matrix(t(scale(t(crabs[,4:8]))))</description>
		<content:encoded><![CDATA[<p>Hi Glenn,</p>
<p>Thanks for finding this bug. This script worked when I wrote it a few months back. I can&#8217;t find the normalize function now, which is really weird. I was in Spain when I wrote this, but that doesn&#8217;t really explain much.</p>
<p>You can get similar row-normalising functionality by transposing the data, normalising the data by columns, then transposing back. This is quite awkward but is the first solution that came to mind.</p>
<p>crabs.x &lt;- as.matrix(t(scale(t(crabs[,4:8]))))</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Glenn Hammonds</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-6939</link>
		<dc:creator>Glenn Hammonds</dc:creator>
		<pubDate>Sun, 09 Dec 2007 02:20:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-6939</guid>
		<description>I hope you don't mind a bug request ...

&#62; crabs.x &#60;- as.matrix(normalize(crabs[,4:8]))
Error in as.matrix(normalize(crabs[, 4:8])) : 
  could not find function "normalize"

any hints?  Are you assuming a standard package I don't have?</description>
		<content:encoded><![CDATA[<p>I hope you don&#8217;t mind a bug request &#8230;</p>
<p>&gt; crabs.x &lt;- as.matrix(normalize(crabs[,4:8]))<br />
Error in as.matrix(normalize(crabs[, 4:8])) :<br />
  could not find function &#8220;normalize&#8221;</p>
<p>any hints?  Are you assuming a standard package I don&#8217;t have?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bioinformatics Zen &#187; Deriving biological meaning from principle components analysis</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1795</link>
		<dc:creator>Bioinformatics Zen &#187; Deriving biological meaning from principle components analysis</dc:creator>
		<pubDate>Wed, 01 Aug 2007 15:56:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1795</guid>
		<description>[...] made an excellent point on my previous post - principle components analysis is great and all that, but the results have [...]</description>
		<content:encoded><![CDATA[<p>[...] made an excellent point on my previous post - principle components analysis is great and all that, but the results have [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Animesh Sharma</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1611</link>
		<dc:creator>Animesh Sharma</dc:creator>
		<pubDate>Thu, 19 Jul 2007 18:45:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1611</guid>
		<description>Had no idea about "non-negative matrix factorisation". I have been looking for such a thing; let me try that and see how it goes :)
That Tamayo et al. paper is a classic one. I could not understand anything when I first read it, but now it is slowly making sense to me. They have used a non variant criterion (with their own definition) for thresholding. Further, the expression thresholding does bring the dimension down to 3 digits. I feel everyone should read it (even those who are averse to gene expression analysis papers), good stuff for sure, thanks.</description>
		<content:encoded><![CDATA[<p>Had no idea about &#8220;non-negative matrix factorisation&#8221;. I have been looking for such a thing; let me try that and see how it goes <img src='http://www.bioinformaticszen.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
That Tamayo et al. paper is a classic one. I could not understand anything when I first read it, but now it is slowly making sense to me. They have used a non variant criterion (with their own definition) for thresholding. Further, the expression thresholding does bring the dimension down to 3 digits. I feel everyone should read it (even those who are averse to gene expression analysis papers), good stuff for sure, thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1596</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Wed, 18 Jul 2007 10:55:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1596</guid>
		<description>Hi Animesh,
I am in a similar situation: multivariate analysis techniques are important, but the mechanics are difficult for me to understand since I have a biological rather than mathematical background. I'll answer both your points with the limited knowledge I have though.

As to whether principal components have no biological value, I would say yes and no. Without any interpretation, they are indeed meaningless abstractions that produce no additional knowledge about your data. However, as I have tried to show in the above example, with a little more intuition you can begin to interpret what the components mean. In the crab data, a combination of the first and second components can be used to determine which species the crab is. But as you say, this is an arbitrary mathematical factorisation of the five morphological characteristics, which on its own is useless.

There is a technique called non-negative matrix factorisation, where the produced components have a more relevant meaning to the original data. This technique isn't available for R, but is for Matlab, so if you have access it might be worth a try.

Secondly, I don't know how to do this with PCA, but it is possible to return to the original data from the SVD factored data. If the first two components contain 80% of the variance, then I believe you can select the first two columns of the u matrix and the first two rows of the transposed v matrix and matrix multiply these together to get your original data back, containing 80% of the original variation. I think this could be a useful method for filtering noise from data.

As for SOMs, again my knowledge is limited, but I would suggest increasing the size of the grid. It might be worth reading &lt;a href="http://www.pnas.org/cgi/content/abstract/96/6/2907" rel="nofollow"&gt;this paper&lt;/a&gt; which is from the developers of the SOM package, where they applied it to gene expression data.</description>
		<content:encoded><![CDATA[<p>Hi Animesh,<br />
I am in a similar situation: multivariate analysis techniques are important, but the mechanics are difficult for me to understand since I have a biological rather than mathematical background. I&#8217;ll answer both your points with the limited knowledge I have though.</p>
<p>As to whether principal components have no biological value, I would say yes and no. Without any interpretation, they are indeed meaningless abstractions that produce no additional knowledge about your data. However, as I have tried to show in the above example, with a little more intuition you can begin to interpret what the components mean. In the crab data, a combination of the first and second components can be used to determine which species the crab is. But as you say, this is an arbitrary mathematical factorisation of the five morphological characteristics, which on its own is useless.</p>
<p>There is a technique called non-negative matrix factorisation, where the produced components have a more relevant meaning to the original data. This technique isn&#8217;t available for R, but is for Matlab, so if you have access it might be worth a try.</p>
<p>Secondly, I don&#8217;t know how to do this with PCA, but it is possible to return to the original data from the SVD factored data. If the first two components contain 80% of the variance, then I believe you can select the first two columns of the u matrix and the first two rows of the transposed v matrix and matrix multiply these together to get your original data back, containing 80% of the original variation. I think this could be a useful method for filtering noise from data.</p>
<p>As for SOMs, again my knowledge is limited, but I would suggest increasing the size of the grid. It might be worth reading <a href="http://www.pnas.org/cgi/content/abstract/96/6/2907" rel="nofollow">this paper</a> which is from the developers of the SOM package, where they applied it to gene expression data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Animesh</title>
		<link>http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1593</link>
		<dc:creator>Animesh</dc:creator>
		<pubDate>Wed, 18 Jul 2007 07:05:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.bioinformaticszen.com/2007/07/exploring-multivariate-data-using-svd-and-som/#comment-1593</guid>
		<description>Great post, thanks :)
When I was reading about PCA and SVD, I also got confused. My Prof. jokingly said "Well PCA is what physicists call SVD"!
Essentially when PCA is calculated using the covariance matrix, it is directly proportional to SVD. There is a nice tutorial at http://arxiv.org/ftp/physics/papers/0208/0208101.pdf  [PDF] which demonstrates this difference.
Problems I face using such dimensionality reduction techniques:
1. The components cannot have a biological meaning attached to them, e.g. if we use this for analyzing gene expression data we move from gene space to this component space, and the whole meaning attached to gene expression is lost in a way.
2. Once you pick the top few vectors which capture say &#62; 80% of the variance, there is no way to go back and retrieve the original data. So gene information is lost, and thus this is not a good way to do feature selection.
Regarding SOFM, unless we use a good feature selector initially [pick good discriminating genes], the feature space is so huge for gene expression that SOFM literally cries.
I am not an expert in this field, but this is what I have felt after reading and using the material available on this. Would love to read more in case you have some pointers which address above problems.</description>
		<content:encoded><![CDATA[<p>Great post, thanks <img src='http://www.bioinformaticszen.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
When I was reading about PCA and SVD, I also got confused. My Prof. jokingly said &#8220;Well PCA is what physicists call SVD&#8221;!<br />
Essentially when PCA is calculated using the covariance matrix, it is directly proportional to SVD. There is a nice tutorial at <a href="http://arxiv.org/ftp/physics/papers/0208/0208101.pdf" rel="nofollow">http://arxiv.org/ftp/physics/papers/0208/0208101.pdf</a>  [PDF] which demonstrates this difference.<br />
Problems I face using such dimensionality reduction techniques:<br />
1. The components cannot have a biological meaning attached to them, e.g. if we use this for analyzing gene expression data we move from gene space to this component space, and the whole meaning attached to gene expression is lost in a way.<br />
2. Once you pick the top few vectors which capture say &gt; 80% of the variance, there is no way to go back and retrieve the original data. So gene information is lost, and thus this is not a good way to do feature selection.<br />
Regarding SOFM, unless we use a good feature selector initially [pick good discriminating genes], the feature space is so huge for gene expression that SOFM literally cries.<br />
I am not an expert in this field, but this is what I have felt after reading and using the material available on this. Would love to read more in case you have some pointers which address above problems.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

