Talk:Singular value decomposition: Difference between revisions
Jitse Niesen (talk | contribs) archive, move "new" section on complexity down and reply |
→Netflix: new section |
||
Line 138: | Line 138: | ||
:: However, [http://campar.in.tum.de/twiki/pub/Chair/TeachingWs05ComputerVision/3DCV_svd_000.pdf another PDF] makes it look like it is <math>O(n^2k + nk^2+ k^3)</math>. [[User:BenFrantzDale|—Ben FrantzDale]] ([[User talk:BenFrantzDale|talk]]) 14:41, 13 November 2008 (UTC) |
:: However, [http://campar.in.tum.de/twiki/pub/Chair/TeachingWs05ComputerVision/3DCV_svd_000.pdf another PDF] makes it look like it is <math>O(n^2k + nk^2+ k^3)</math>. [[User:BenFrantzDale|—Ben FrantzDale]] ([[User talk:BenFrantzDale|talk]]) 14:41, 13 November 2008 (UTC) |
||
:::I added a bit on computations and complexity to the algorithm. The different complexities that BenFrantzDale mentions are probably due to what you compute. Computing only the singular values is cheaper then computing the singular values and the left and right singular vectors (i.e., the matrices ''U'' and ''V''). -- [[User:Jitse Niesen|Jitse Niesen]] ([[User talk:Jitse Niesen|talk]]) 16:37, 15 December 2008 (UTC) |
:::I added a bit on computations and complexity to the algorithm. The different complexities that BenFrantzDale mentions are probably due to what you compute. Computing only the singular values is cheaper then computing the singular values and the left and right singular vectors (i.e., the matrices ''U'' and ''V''). -- [[User:Jitse Niesen|Jitse Niesen]] ([[User talk:Jitse Niesen|talk]]) 16:37, 15 December 2008 (UTC) |
||
== Netflix == |
|||
There is an |
|||
[http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?_r=1&partner=permalink&exprod=permalink&pagewanted=all article] in the New York Times Magazine of Nov 23, 2008. It mentions that the SVD decomposition is the main tool for generating recommendations such as those in [[Netflix]], but does not give details. Perhaps somebody who understands this stuff could write a few paragraphs about this application? |
|||
-- [[Special:Contributions/194.24.158.8|194.24.158.8]] ([[User talk:194.24.158.8|talk]]) 14:48, 28 December 2008 (UTC) |
Revision as of 14:48, 28 December 2008
Mathematics Unassessed Mid‑priority | ||||||||||
|
This article may be too technical for most readers to understand. |
|
|
Dimensions of matrices incorrect in statement of theorem
I think it says m-by-m when it should say m-by-n, and there may be other problems. Currently this article says:
Suppose M is an m-by-n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form
where U is an m-by-m unitary matrix over K, the matrix Σ is m-by-n with nonnegative numbers on the diagonal and zeros off the diagonal, and V* denotes the conjugate transpose of V, an n-by-n unitary matrix over K.
That seems definitely wrong to me. For example, the matrix which has nonzero diagonal elements should be a square matrix -- otherwise it makes no sense. Compare the treatment in Press et al., Numerical Recipes in C, p. 59:
Any M x N matrix A whose number of rows M is greater than or equal to its number of columns N, can be written as the product of an M x N column-orthogonal matrix U, an N x N diagonal matrix W with positive or zero elements (the singular values), and the transpose of an N x N orthogonal matrix V.
And it gives the equation like .
I think I see how to fix it. I think the three matrices are in the same order in the two formulations and M corresponds to m and N corresponds to n, so all I have to do is fix it to say m-by-n for U and n-by-n for sigma, to match up with Press et al.'s formulation. --Coppertwig (talk) 03:35, 25 December 2007 (UTC)
- they are essentially the same, with trivial difference in the proofs. i've reverted your change for consistency of article. Mct mht (talk) 04:08, 25 December 2007 (UTC)
- You're right. I was wrong. Thanks for watching and fixing it. My edit had a rectangular matrix being described as "unitary" with a link to a page about square matrices, which made no sense. I suppose I vaguely remember now having previously seen such a thing as the diagonal of a rectangular matrix after all; anyway, that's consistent with the proof beneath it, as you say, and with the page diagonal matrix. I inserted a few words to emphasize/clarify that it's talking about the diagonal of a (potentially) non-square matrix. Does this proof work regardless of whether m > n or m < n? Press specifies M > N. (of course also possible m=n). --Coppertwig (talk) 13:21, 25 December 2007 (UTC)
- yes, the proofs for M > N, M < N, and M = N are almost verbatim the same, only difference is whether U is square, V is square, or both are square. for example, suppose Σ has more rows then columns. if you chop off the zero part to make Σ square, then U becomes rectangular (and column-orthonormal) but V remains square. this would be the formulation you brought up above. Mct mht (talk) 05:02, 29 December 2007 (UTC)
the spectral theorem
I have removed the following text from the teaser.
"The spectral theorem says that normal matrices, which by definition must be square, can be unitarily diagonalized using a basis of eigenvectors. The SVD can be seen as a generalization of the spectral theorem to arbitrary, not necessarily square, matrices."
It would be very difficult to give a mathematically correct interpretation of the above. The spectral theorem says that a normal matrix can be written as UDU^(inverse). The SVD says that an arbitrary matrix can be written as UDV, where V has absolutely nothing to do with U.
Phrenophobia (talk) 09:06, 10 February 2008 (UTC)
- There was actually a mathematically correct version of it later in the article, as long as one is flexible about the meaning of "generalization". For hermitian positive semi-definite matrices the SVD and unitary diagonalization are the same, and in general the eigenvalue decomposition of the HPSD matrix MM^* is very closely related to the SVD of M. However, I think contrasting the decompositions as you do here is also important, so I included that as well ("V is unrelated to U"). JackSchmidt (talk) 16:23, 10 February 2008 (UTC)
- Another way to phrase the idea mathematically and correctly: "The spectral theorem says that every normal endomorphism of a (complex with hermitian p.d. inner product) vector space is diagonal up to a rotation of the coordinate system. The SVD shows that every homomorphism between distinct (complex with hermitian p.d. inner product) vector spaces is diagonal up to rotations of their coordinate systems." This is nice because it says that once you have a nice enough geometry on the vector spaces, every nice enough operator is just diagonal, it just scales the coordinates. I believe this is sometimes phrased as every operator takes circles to ellipses. The reason the two ideas of eigenvalue decomposition and SVD are not really related as "generalization", but rather are more "analogous", is that there is a very large difference between an endomorphism of one vector space, and a homomorphism between two vector spaces ("U is unrelated to V"). I'm not sure if such a geometric version should be added, but perhaps the geometric meaning section could do with some improvement (maybe a picture). JackSchmidt (talk) 16:35, 10 February 2008 (UTC)
- it's not really correct to say U and V are unrelated. Mct mht (talk) 00:12, 11 February 2008 (UTC)
- Right. However, I find it hard to find a better formulation. I went for "U and V are not related except through M". I also made some more changes to the text that Jack added.
- I'm not so fond of the sentence "The eigenvalue decomposition only applies to square matrices, but gives an extremely tidy decomposition, while the SVD applies to any matrix, but gives a looser decomposition." What do you want to say here? In what sense is the SVD looser?
- I agree with Jack I think the article can be improved by combining this with the geometric section, add a picture there, and perhaps even moving it higher up because it gives a nice explanation about what the SVD is and why we're interested in it. I quite like the description in Cleve Moler's "Professor SVD" (which says that this article is "quite good", so well done!). -- Jitse Niesen (talk) 12:02, 11 February 2008 (UTC)
- I very much like "not related except through M". They are base changes for different vector spaces, and those vector spaces are only related through M. Thanks very much for the better phrasing.
- The new changes introduced a subtle math error. The SVD is not simply a diagonalization, but a positive semi-definite diagonalization, and so the only matrices for which the eigenvalue decomposition and the singular value decomposition are the same are the hermitian positive semi-definite matrices. The new text brings in non-unitarily diagonalizable matrices, which strains the analogy between the two decompositions, but I could not decide if the strain was good for the article or no.
- The troublesome sentence can be removed, *especially* if a geometric version is added. In case someone wants to salvage it: The idea is that the eigenvalue decomposition takes n^2 entries and returns n^2+n invariants, but the singular value decomposition of a square matrix takes n^2 entries and returns 2n^2+n invariants, so it has a much larger space to choose from, yet it describes nearly the same set of matrices. I think the assignment of these decompositions is not continuous in the matrix entries (though I could be wrong, I think there is an exercise somewhere to just give the "formula" for the SVD of a small general matrix; but I think [1,0;0,e] as e varies over a neighborhood of 0 shows discontinuity), so viewing it via dimensions is probably flawed. The informal sense in which it is looser, is simply that the SVD needs two base changes to do what the eigenvalue decomposition does with one. I'll read Prof SVD's stuff. JackSchmidt (talk) 17:52, 11 February 2008 (UTC)
- Sorry for all the minor edits. The map from SVDs to matrices has a small kernel, the map from eigenvalue decomps to matrices has an even smaller kernel, the topological dimension of the space of all SVDs is much larger (even if one were to penalize it for the kernel) than the dimension of the eigenvalue decompositions, yet their images are both dense in the set of all matrices, so the eigenvalue decomposition does "more with less". I'll stress again the informal sense, since it is so much easier to explain: the SVD needs two base changes to do what the eigenvalue decomposition does with one. JackSchmidt (talk) 18:07, 11 February 2008 (UTC)
- Thanks for catching my error. However, I don't agree with the rest of your post. Are you taking into account that the U and V in the SVD have to be unitary? That goes a long way towards resolving the difference. I don't feel like counting dimensions, but it does seem to be roughly similar. -- Jitse Niesen (talk) 18:49, 11 February 2008 (UTC)
(←) I think User:KYN has now fixed up this section nicely (thanks!), but at some point we should add the eigshow picture from the Prof SVD article (also from Trefethen and Bau and just plain old matlab). I am happy letting the "looser" argument go, but I think I did take into account the unitary part, as the article Unitary group says, "The unitary group U(n) is a real Lie group of dimension n^2." However, I may have made a real/complex dimensional error, which would also go a long way towards fixing the problem, so you may be right. The huge problem with the looser argument is that it is comparing apples to oranges (not just real/complex, but which matrices can be represented, which can be represented uniquely after replacing some things with quotient topological spaces, and the fundamental difference: SVD is for homomorphisms and eigendecomposition is for endomorphisms). To be clear, this is why I removed the sentence from the article way back, and only presented the argument to be salvaged. I don't actually hold that it is true, merely that I suspect someone could make a true and interesting statement from it (after finding the right conditions on the two decompositions to make them comparable). JackSchmidt (talk) 22:16, 11 February 2008 (UTC)
- Yup real/complex error. I think they have roughly the same dimension. Thanks for catching it, otherwise I'd probably have continued in that mistaken impression for years! If you do happen to salvage the argument somehow, let me know (even on talk page if it is not worth inclusion in an article). JackSchmidt (talk) 22:23, 11 February 2008 (UTC)
SVD on Scroll Matrix
Is this known to be trivial or should it be cited in this Article?
(* complete dictionary of words, one per row, number of rows is (alpha^word), using words of length "word" and an alphabet of "alpha" number of characters *)
alpha=4;word=3;dict=.;
dict=Partition[Flatten[Tuples[Reverse[IdentityMatrix[alpha]],word]],(alpha*word)]
PseudoInverse[dict]==((Transpose[dict])*((alpha)^-(word -1)))-((word - 1)/(alpha^word)/word)
True
There is nothing special for alpha=4;word=3 - The method works for all cases where word < alpha, but that is a more verbose loop, and the Mathematica PseudoInverse function takes too long to calculate for values of alpha and word that are large, whereas the transpose side of the equation is fast.
100TWdoug (talk) 05:54, 25 March 2008 (UTC)
If there are no objections, I will cite this to http://www.youvan.com as verbose Mathematica code, unless another editor wants to use formal notation. This particular example of alpha = 4; word = 3 happens to cover the biological genetic code, where the alpha(bet) is the 4 nucleotides (A, C, G, T) and the word(length) is the nucleotide triplet (e.g., 3) for the codon.50MWdoug (talk) 00:36, 1 April 2008 (UTC)
- I'm not sure I understand it correctly, but it seems you have a formula for the pseudoinverse of a particular matrix (the variable
dict
in your code). That has little to do with the SVD. - Even if you have a formula for the SVD of your matrix dict, I doubt it belongs in the article. The SVD of many matrices can be calculated explicitly, but I think only important ones should be mentioned. I don't know what the matrix dict looks like (I can't read Mathematica code very well), or how widely used it is. Finally, we have a no-original-research policy. Personal websites are generally not deemed reliable sources, and www.youvan.com does seem like one.
- Sorry about this, but Wikipedia is meant to be an encyclopaedia. It's not a collection of all possible facts. -- Jitse Niesen (talk) 11:03, 1 April 2008 (UTC)
"PseudoInverse[dict]" is a Mathematica function that runs SVD on over-determined or under-determined matrices. (Wolfram could just as well labeled this function "SVD"). So, your comment: "The SVD of many matrices can be calculated explicitly" is very interesting if these matrices are nontrivial. How might we determine whether the right side of the equation, involving Tuples, is either trivial are already known? Thank you for looking at this. 50MWdoug (talk) 01:01, 2 April 2008 (UTC)
Linear Algebraic Proof?
The section "Calculating the SVD" refers to, and provides a dead link to, a "linear algebraic proof." Was this removed? If so, that's unfortunate, because it would be nice to provide a proof that doesn't rely on heavy machinery such as the spectral theorem or Lagrange multipliers. The book by Trefethen and Bau, mentioned in the same section, provides a straightforward inductive linear algebraic proof that requires only a simple metric space compactness argument (not explicitly given in in the book). Jbunniii (talk) 03:51, 5 April 2008 (UTC)
- gotta be kidding me. what you call "heavy machinery" is not at all. whatever "simple metric space compactness argument" you are referring to probably resembles the variational proof given in article: M*M is self-adjoint, so characterize its eigenvalues variationally, using compactness, and you're done.
- as for the deadlink, i think someone who didn't know what a proof is changed the section name from "linear algebraic proof".
- problem with this particular article is that some people edit/comment without knowing what they're talking about. as a result, we have an article that's not as organized as it can be and repeats the obvious unnecessarily. Mct mht (talk) 09:37, 5 April 2008 (UTC)
- Trefethen and Bau treat the SVD (way) before they treat eigenvalues, so in that context the spectral theorem is heavy machinery. Their proof is indeed variational (they start by defining the first singular vector as the one that maximizes ||Mv|| over the unit sphere) but it also avoids Lagrange multipliers. I think it's quite a nice proof, but I don't want to add a third proof which is similar to the proofs that are already in the article. -- Jitse Niesen (talk) 15:27, 7 April 2008 (UTC)
References to isometries in SVD w/ spectral theorm section
Reading this spectral derivation of the SVD I'm guessing that it would be fairly readable to somebody with first year linear algebra. The exception is the injection of the use of the term isometry:
"We see that this is almost the desired result, except that U1 and V1 are not unitary in general. U1 is a partial isometry (U1U*1 = = I ) while V1 is an isometry (V*1V1 = I ). To finish the argument, one simply has to "fill out" these matrices to obtain unitaries."
Reading the wikipedia page on partial isometry doesn't clarify things too much. In that article there's references to:
Hilbert spaces, functional analysis, linear map, final and initial subspaces, isometric map.
It's all very abstract, much more so than ought to be required to read this section (I was looking for an explaination of SVD for nothing more than real matrices, so having to understand all of this stuff first would require a major digression).
I'm still figuring all this stuff out for myself so I'm not sure how one would rewrite this in a clear fashion (intuition says this would involve only references to projection and not isometries). Perhaps somebody in the know already understands how this could be made clearer, and could do so.
Peeter.joot (talk) 15:42, 13 April 2008 (UTC)
- The article just means that U1 and V1 are not square matrices, so are not technically unitary. However, they can be obviously or arbitrarily extended to unitary matrices by adding on some more orthonormal columns. Basically you can ignore the sentence using the word isometry. JackSchmidt (talk) 15:52, 13 April 2008 (UTC)
Confusing to use Capital Sigma as both variable and as summation
Young readers will be confused by the use of capital sigma for two different purposes, as the sign of summation, and as the diagonal matrix of eigenvalues. When I learned this stuff 50 years ago they used lambda for the values. 171.66.187.197 (talk) 13:13, 1 July 2008 (UTC)
- I believe the notation used in the article is common practice. can not be used, because singular values are not to be confused with eigenvalues. Also, and are easy to tell apart, especially since they are rarely used together. This only happens in section Separable models, because it uses capital sigma to denote singular values, which is inconsistent with the remaining article. I'm going to fix that now. --Drizzd (talk) 16:15, 2 July 2008 (UTC)
Superscript '+' (plus) symbol meaning is overloaded in a confusing way
Overall, I think the article on singular value decomposition is very clear and a great help in learning about this topic. In the expression defining the pseudoinverse, it says:
++*, where + is the is the transpose of with every nonzero entry replaced by its reciprocal.
However, the superscript '+' (plus) symbol attached to the variable is not meant to modify in the same way as . I would suggest a different superscript. -- Richard C Yeh, 17 September 2008 —Preceding unsigned comment added by 204.124.117.40 (talk) 19:46, 17 September 2008 (UTC)
OK, I see that the same '+' symbol is used throughout the Moore-Penrose pseudoinverse article. I take back my comment, since the pseudoinverse of would be equal to +. —Preceding unsigned comment added by 204.124.117.40 (talk) 19:50, 17 September 2008 (UTC)
Computational complexity
I cannot find anything about the computational complexity. Did I miss it? If I did not, it is quite important, so could someone who knows that stuff include it in the article? Vplagnol 18:13, 8 February 2007 (UTC)
- I would like to see this as well. The article sketches out one algorithm: "If the matrix has more rows than columns a QR decomposition is first performed. The factor R is then reduced to a bidiagonal matrix. The desired singular values and vectors are then found by performing a bidiagonal QR iteration...". That makes it sound like the algorithm is . There is some discussion of the computational complexity of QR on QR_algorithm#The_practical_QR_algorithm, but the rest is fairly sparse. —Ben FrantzDale (talk) 15:06, 12 November 2008 (UTC)
- According to this PDF, if A is an matrix, then SVD is . It goes on to say that "Use of adaptive eigenspace computation when a new object is added to the set, whose SVD we already know, reduces the computational complexity to ." citing S. Chandrasekaran, B.S. Manjunath, Y.F. Wang, J. Winkeler and H.Zhang. An eigenspace update algorithm for image analysis. CVGIP, 1997.
- However, another PDF makes it look like it is . —Ben FrantzDale (talk) 14:41, 13 November 2008 (UTC)
- I added a bit on computations and complexity to the algorithm. The different complexities that BenFrantzDale mentions are probably due to what you compute. Computing only the singular values is cheaper then computing the singular values and the left and right singular vectors (i.e., the matrices U and V). -- Jitse Niesen (talk) 16:37, 15 December 2008 (UTC)
Netflix
There is an article in the New York Times Magazine of Nov 23, 2008. It mentions that the SVD decomposition is the main tool for generating recommendations such as those in Netflix, but does not give details. Perhaps somebody who understands this stuff could write a few paragraphs about this application? -- 194.24.158.8 (talk) 14:48, 28 December 2008 (UTC)