Talk:Chain rule

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Donvinzk (talk | contribs) at 11:34, 4 June 2011 (Multivariate chain rule). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

WikiProject Mathematics: Start-class, Mid-priority
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as Mid-priority on the project's priority scale.

Initial comments

This page needs a proof and more rigorous maths.


Situations for the multivariable case should be added: if u is a function of x and y, and both x and y are functions of t, then du/dt = (∂u/∂x)(dx/dt) + (∂u/∂y)(dy/dt).

Example 1

I don't see how Ex 1 justifies the claim "This calculation is a typical chain rule application." The calculation doesn't seem to involve differentiation, which I would argue is the typical chain rule application. --flatfish89 (talk) 18:24, 24 April 2010 (UTC)[reply]

New proof

I've replaced the current proof with one I saw somewhere. It relies on nothing but the definition of a derivative and the concept of limits. I believe it is quite rigorous and more formal than the previous one, but if there are any flaws with it feel free to point them out or even restore the old one.

Also, if you think it is too long or verbose then correct it.

Someone42 13:43, 25 May 2005 (UTC)[reply]

Isn't that really just the same proof, though? Written out longer. I don't think it is different from the point of view of rigour. Charles Matthews 14:06, 25 May 2005 (UTC)[reply]
I just reverted the thing. Someone42, this is not a math paper, not a math exam, not a math book. This is a general purpose encyclopedia.
I appreciate the many hours you put in that proof, I appreciate your knowledge of mathematics, and your desire for rigor.
However, nobody will read that proof, even mathematicians will gloss over it.
According to Wikipedia:WikiProject Mathematics/Proofs, proofs are discouraged on Wikipedia to start with, and long formal proofs especially. This has been discussed over and over again, and this seems to be the general view over here.
If anything, Wikipedia needs less rigor, not more. This is an encyclopedia for the general public and among its audience mathematicians are less than 10% if not less. Oleg Alexandrov 15:25, 25 May 2005 (UTC)[reply]
Oh well, in retrospect I think I did go a little overboard. Someone42 09:25, 26 May 2005 (UTC)[reply]
I can't really agree about 'rigour'. Anything less than a rigorous proof is just an argument/derivation/bootstrapping. However that is not really the issue here. And I agree with Oleg that the need for proofs is not so high. Charles Matthews 15:29, 25 May 2005 (UTC)[reply]
OK, I did not mean that rigorous proofs are bad overall. I am a mathematician too, and I hope I know the value of proofs. I was trying to say that on Wikipedia we can get away with something less than full-blown proofs. So I agree that a proof is not a proof unless it is rigorous. Oleg Alexandrov 22:37, 25 May 2005 (UTC)[reply]
On the other hand, who but a mathematician (or at least a maths student) would want to know about something as particular as the chain rule? I think Wikipedia is a good maths textbook and dumbing it down so the general public can understand won't add any value... 203.97.255.167 08:37, 22 May 2006 (UTC) (edit: having looked at that proof though, I have to agree that it was a bit overboard and didn't really add any rigour) 203.97.255.167 08:39, 22 May 2006 (UTC)[reply]

In addition, this proof relies on f(g(x+deltax))-f(g(x))=f(g'(x)), which it does not. f(g'(x))=f(g(x+deltax)-g(x)). Additionally, the chain rule is not f'(g(x))=f'(g(x))g'(x), as dividing by f'(g(x)) would give you 1=g'(x), which isn't always true. The chain rule is f'(g(x))=f(g'(x))g'(x) —Preceding unsigned comment added by 66.169.198.79 (talk) 06:08, 3 October 2009 (UTC)[reply]
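
For readers weighing the two candidate formulas in the comment above, a quick numerical check may help. The sketch below compares a finite-difference derivative of f(g(x)) against the usual statement (f ∘ g)′(x) = f′(g(x))g′(x) and against the proposed f(g′(x))g′(x). The functions f = sin and g(x) = x^2, the sample point, and all names are illustrative assumptions, not taken from the article.

```python
import math

# Illustrative choices (assumptions, not from the article):
# f(u) = sin(u), g(x) = x^2.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.3
composite = lambda x: f(g(x))

numeric = numerical_derivative(composite, x0)   # derivative of f(g(x)) at x0
standard = f_prime(g(x0)) * g_prime(x0)         # f'(g(x)) * g'(x)
alternative = f(g_prime(x0)) * g_prime(x0)      # f(g'(x)) * g'(x)

print("finite difference:", numeric)
print("f'(g(x)) g'(x):   ", standard)
print("f(g'(x)) g'(x):   ", alternative)
```

With these choices the finite-difference value agrees with f′(g(x))g′(x) and not with f(g′(x))g′(x). Note that the left-hand side of the rule is the derivative of the composite, (f ∘ g)′(x), not f′(g(x)), which is why dividing through does not force g′(x) = 1.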

Basic idea

I think the basic idea behind the chain rule is getting swamped in a sea of details and special cases. The basic idea of the chain rule is pretty easy to explain in words:

The best linear approximation of the composition is the composition of the best linear approximations.

Every special case of the chain rule more or less expresses this fact in various situations, with a wide variety of abstraction or concreteness. But this is the basic idea, and it would be good if it were made a bit more prominent. Revolver 01:43, 9 October 2005 (UTC)[reply]

That is a very nice description of the chain rule. I'm not sure I dare edit "one of the 500 most frequently viewed mathematics articles," but I'd like to see the above sentence appearing prominently in the article. David Bulger (talk) 03:56, 5 July 2010 (UTC)[reply]
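
The one-sentence description above can be made concrete with a small sketch. It builds the best linear approximation of g at a and of f at g(a), composes them, and checks that the slope of the composite line is f′(g(a))g′(a) and that the line approximates f ∘ g to first order. The functions f = exp and g(x) = x^3, the point a = 0.5, and all names are assumptions chosen only for illustration.

```python
import math

# Illustrative choices (assumptions): g(x) = x^3, f(u) = exp(u), a = 0.5.
def g(x):
    return x ** 3

def g_prime(x):
    return 3 * x ** 2

def f(u):
    return math.exp(u)

def f_prime(u):
    return math.exp(u)

a = 0.5

def linear_g(x):
    """Best linear approximation of g at a."""
    return g(a) + g_prime(a) * (x - a)

def linear_f(u):
    """Best linear approximation of f at g(a)."""
    return f(g(a)) + f_prime(g(a)) * (u - g(a))

def composed_linear(x):
    """Composition of the two linear approximations."""
    return linear_f(linear_g(x))

# Slope of the composed line, read off over a unit step.
slope = composed_linear(a + 1.0) - composed_linear(a)
chain_rule_slope = f_prime(g(a)) * g_prime(a)

def relative_error(dx):
    """Error of the composed line against f(g(x)), divided by |dx|."""
    return abs(f(g(a + dx)) - composed_linear(a + dx)) / abs(dx)

print(slope, chain_rule_slope)
print(relative_error(1e-2), relative_error(1e-4))
```

The relative error shrinking with dx is what "best linear approximation" means here: the composed line differs from f ∘ g by a term that vanishes faster than x − a.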

Mix-up??

If I'm not completely mistaken, the definition mixes up f and . Also, x isn't some kind of magic symbol, so doesn't make sense either. It says that the composition of f and g (which is a function) is equal to a certain value of that function, namely that at x. I think the whole thing should read

In algebraic terms, the chain rule (of one variable) states: Given function f that is differentiable at g(x) and a function g that is differentiable at x, then the composite (or shorter: ) is differentiable at x and

Or leave out h completely and just write:

--K. Sperling (talk) 00:37, 12 November 2005 (UTC)[reply]

Chain rule for several variables

I would like to see an extensive expansion of the chain rule for several variables. Thanks, Silly rabbit 06:00, 15 November 2005 (UTC)[reply]

What kind of information would you like to see added? I might try to do this. --Monguin61 03:56, 10 December 2005 (UTC)[reply]

Derivative of composite function

Since the "composition of two functions" is technically a "composite function", would its derivative be called a "composite derivative"? Also, does the secondary (inside) derivative have a special name (something like "harmonic derivative"——though I think that term means something else)?  ~Kaimbridge~20:15, 3 February 2006 (UTC)[reply]

I did not hear this terminology before. Oleg Alexandrov (talk) 00:27, 4 February 2006 (UTC)[reply]
What, "composite function"? I believe that is the legitimate term. See Thesaurus.Maths, PinkMonkey, ThinkQuest. I just wanted to know if its derivative had a special name, and if the inside derivative had a special name. ~Kaimbridge~14:48, 4 February 2006 (UTC)[reply]
I never heard of composite derivative or harmonic derivative. Oleg Alexandrov (talk) 17:30, 4 February 2006 (UTC)[reply]
No, the only term that I know is valid is "composite function". I'm asking if the derivative of a composite function would be referred to as a "composite derivative" (and, if not, does it have a special, unique name——other than just "derivative of a composite function"). Likewise, does the inside derivative (e.g., for ) have a special name——"harmonic derivative" is just some reasonable sounding possibility that I used as an example, not that I'm in any way implying that that is what it is actually called. P=) ~Kaimbridge~18:12, 4 February 2006 (UTC)[reply]
That's what I am trying to say. As far as I know, the answer to your questions is "no". I never heard of "composite derivative" or "harmonic derivative". They don't have any special name, either that or any other one, and I don't know why anybody would ever want those things to have any special name. Oleg Alexandrov (talk) 19:41, 4 February 2006 (UTC)[reply]

Role of eta in proof

I have no formal training in maths so forgive me if I sound naive. Near the bottom of the proof it says "Observe that as and ." Would I be correct to think that "" shows that the "error" (right word?) involved goes to zero as delta goes to zero? 202.180.83.6 03:52, 16 February 2006 (UTC)[reply]

Examples 1 and 2

The primes in Examples 1 and 2 (where it says f'(x) = ) are very difficult to see. They look exactly like f(x). This may cause a lot of confusion. Is there any way to make the primes in f'(x) stand out?

Exceptions

I think that the exceptions to the rule should be mentioned. For instance, , as would be suggested by the power rule. The problem is that sqrt{x+a} is just sqrt{x} shifted back, but x+25 still differentiates to 1. This means that according to the power rule, that constant that's added to x has no effect on it, even though the derivative should be . Which doesn't follow directly from the Chain Rule. He Who Is 23:33, 3 June 2006 (UTC)[reply]

Umm... you are missing the whole point of the chain rule? If and , then --Spoon! 11:15, 31 August 2006 (UTC)[reply]
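
The formulas in this exchange did not survive, so the following worked equation is only a reconstruction of what the chain rule gives for the square-root example under discussion, assuming the outer function is f(u) = sqrt(u) and the inner function is g(x) = x + 25. It is not necessarily the exact decomposition used in the reply above.

```latex
\[
f(u) = \sqrt{u}, \qquad g(x) = x + 25, \qquad
f'(u) = \frac{1}{2\sqrt{u}}, \qquad g'(x) = 1,
\]
\[
\frac{d}{dx}\sqrt{x + 25}
  = f'(g(x))\, g'(x)
  = \frac{1}{2\sqrt{x + 25}} \cdot 1
  = \frac{1}{2\sqrt{x + 25}}.
\]
```

The added constant changes the result because f′ is evaluated at g(x) = x + 25 rather than at x, even though g′(x) = 1.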

Chain rule in Probability

All that calculus gives me a headache! I have no idea if and/or how they relate, but perhaps the chain rule pertaining to probability theory deserves a place somewhere on this page? You know, the P(A1 n A2 ... n An) = P(A1)P(A2|A1)P(A3|A1 n A2)...P(An|n n-1/i=1 Ai) thing, sorry about the crummy representation. 218.165.75.221 10:04, 30 September 2006 (UTC) M.H.[reply]

Uh, how about no. 69.215.17.209 14:41, 22 April 2007 (UTC)[reply]

Personally, I would like to see some detail about the Chain Rule for probability theory. I have been looking around the internet, and have not been able to find a discussion of it (ideally a step-by-step example or a detailed proof). So, it would be a good thing if Wikipedia included something on it. Should the Chain Rule for probability theory be included on this page, the page for probability theory, or its own page [ex. Chain Rule (Probability Theory)]? SteelSoul (talk) 17:50, 2 February 2009 (UTC)[reply]

I am adding this page. ---- CharlesGillingham (talk) 21:41, 17 September 2009 (UTC)[reply]
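
For reference, the identity sketched in plain text at the start of this thread can be typeset as below. This is the usual statement of the chain rule for probabilities, assuming every conditioning event has positive probability. For k = 1 the empty intersection is read as the whole sample space, so the first factor is simply P(A_1).

```latex
\[
P\!\left(\bigcap_{k=1}^{n} A_k\right)
  = \prod_{k=1}^{n} P\!\left(A_k \,\middle|\, \bigcap_{j=1}^{k-1} A_j\right)
  = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots
    P(A_n \mid A_1 \cap \cdots \cap A_{n-1}).
\]
```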

(f(g(x)))' is incorrect and confusing

The statement (f o g)'(x) = (f(g(x)))' is incorrect and fundamentally misunderstands the prime (f') notation. The prime is a transformation from functions to functions; as such it should be applied before the variable x is evaluated, as on the LHS but not as on the RHS. 69.215.17.209 14:44, 22 April 2007 (UTC)[reply]

I see that this edit was reverted, with the argument that this notation is common enough to be included. I understand that people may sometimes use it (I've never seen it myself, and I challenge anyone to produce examples from a common calculus text), but it is not standard and pedagogically very confusing. It is ambiguous whether the constant f(x) or the function f is being differentiated. I do not think that such poor notation should be perpetuated in an encyclopedia, without evidence that it is at least commonly used. --69.212.231.101 03:52, 26 July 2007 (UTC)[reply]

Since the chain rule is a fairly basic concept in calculus and most people at that level haven't taken analysis would it be appropriate to add a note explaining the meaning of the composition operator? —Preceding unsigned comment added by 76.199.5.236 (talk) 06:47, 13 January 2008 (UTC)[reply]
Note: The objection to f(g(x))' refers to an earlier version of the article, and is no longer current. Silly rabbit (talk) 13:51, 13 January 2008 (UTC)[reply]
I agree that, in an article about such an elementary concept in calculus, it is inappropriate to use composition operator. ---- CharlesGillingham (talk) 00:21, 16 September 2009 (UTC)[reply]

Examples

The examples could use some more description, depending on whether we're shooting for "definition" or "instructional detail." Substituting U for X^2+1 makes the plug 'n' chug easier, but it's not strictly necessary. Any objections to expanding the current examples to include various applications of the chain rule and a detailed description of how and why substitutions are valid?--Legomancer (talk) 22:45, 8 September 2009 (UTC)[reply]

Comment from WPM

For any coordinate (real valued function) y on a line (e.g. the real line) and any point p, denote by dy_p the equivalence class of y − y(p)1 (where 1 is the constant function, with value 1) modulo functions vanishing at p to higher order. If y = f(x) (i.e. y = f ∘ x for some other coordinate x on the same line and some f: R → R) then the definition of differentiability of f at x(p) ensures that dy_p = f'(x(p)) dx_p because f(x) − f(x(p))1 differs from f'(x(p))(x − x(p)1) by a function vanishing at p to higher order.

The chain rule is an immediate consequence. If u = g(y) then, omitting evaluations/subscripts at p, du = g'(y)dy= g'(f(x))f'(x) dx.

Most arguments formalize this basic idea without discussing the conceptual meaning. Geometry guy 00:57, 2 November 2010 (UTC)[reply]

The original proof that was in the article is close to being a direct formalization of this. Unfortunately, it looks like it had gotten a bit over-edited, since it included a version of the proof for several variables that was not really consistent with this philosophy. I've put back in what I believe is an easier to read version of the old proof. I wasn't sure whether and how to address the case of several variables: really the article should have a proper statement of the chain rule of a function between two (finite dimensional) Euclidean spaces. Sławomir Biały (talk) 11:22, 3 November 2010 (UTC)[reply]

Correctness of first proof?

I was revisiting the first proof, and I've come to the conclusion that I don't think it's correct; at least, not in spirit. I was trying to rewrite it from scratch (which is my normal style), and the best I could do was as follows:

One proof of the chain rule begins with the definition of the derivative: (f ∘ g)′(a) = lim_{x→a} [f(g(x)) - f(g(a))] / (x - a).
Assume for the moment that g(x) does not equal g(a) for any x near a. Then the previous expression is equal to: lim_{x→a} ([f(g(x)) - f(g(a))] / [g(x) - g(a)]) · ([g(x) - g(a)] / (x - a)).
When g oscillates near a, then it might happen that no matter how close one gets to a, there is always an even closer x such that g(x) equals g(a). For example, this happens for g(x) = x^2 sin(1/x) (taking g(0) = 0) near the point a = 0. To work around this, we introduce a function Q as follows: Q(y) = [f(y) - f(g(a))] / [y - g(a)] if y ≠ g(a), and Q(y) = f′(g(a)) if y = g(a).
Q is defined wherever f is. Furthermore, because f is differentiable at g(a) by assumption, Q is a continuous function at g(a). Now we consider the function: Q(g(x)) · [g(x) - g(a)] / (x - a).
Whenever g(x) is not equal to g(a), this product is clearly equal to the difference quotient for f ∘ g at a because the factors of g(x) - g(a) cancel. When g(x) equals g(a), THEN A MIRACLE OCCURS and this product is still equal to the difference quotient. Hence we can compute the derivative of f ∘ g at a by computing the limit as x goes to a of the above function. This limit exists because the above function is a product and the limit as x goes to a of each of its factors exists. Furthermore, because Q is continuous, the limit of the first factor equals f′(g(a)), and by definition the limit of the second factor equals g′(a). This proves the chain rule.

The problem is that when g(x) equals g(a) and x is not a, the miracle doesn't occur; the value of the product is f′(g(a)) times zero, which is zero. If we were to take the limit instead of evaluating, then the miracle would occur, but I then don't know how to prove that the limit computes what we want. If we could split up the product, then the miracle would occur, but then we need to show that the limit of the product exists, and I don't know how to prove that directly. The standard proofs get around this by explicitly measuring error terms; when we approach things that way, we never see the zero product, hence the miracle occurs. The whole reason we have this proof, though, is because it avoids error terms, and if we have to introduce them to make this work then there's no point in keeping this proof. So I'm stuck; I don't see how to fill this gap. In fact, as far as I can tell, since the product is zero this proof is just wrong.

The article presently seems to ignore this difficulty. It glosses over it by introducing Q only at the end and ignoring the need for a miracle. But as far as I can tell, it has exactly the same problem. Am I missing something? Or what? Ozob (talk) 04:32, 11 December 2010 (UTC)[reply]

The proof can be made valid, but as your experience shows, only at a considerable cost in complexity and comprehensibility. There used to be a single proof, which had been there since 2004, and which I corrected on 11 November 2008. This remained there until 31 October 2010, when it was replaced by the current first proof. On 2 November 2010 a form of the original proof was added back as "second proof", and on 16 November I repeated the correction to it which I'd made two years previously. The only possible advantage to the first proof is that it's somehow more intuitive; the trouble is, to make it sound it is necessary to complicate it so much that this advantage vanishes. The solution now is to delete the "first proof" and then we can all go round the loop again. SamuelTheGhost (talk) 23:38, 11 December 2010 (UTC)[reply]
OK, I figured it out! When I need a miracle to happen, the difference quotient is equal to zero (which in retrospect is obvious), so the miracle happens!
I'm putting a (revised version) of the above proof into the article as the first proof. Ozob (talk) 20:30, 12 December 2010 (UTC)[reply]
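
The resolution above (when g(x) = g(a) with x ≠ a, the product and the difference quotient are both zero) can be checked on a concrete example. The sketch below assumes f = exp and a = 0, and swaps in a sawtooth-based g for the sine example so that the points where g(x) = g(a) are hit exactly in floating point. Every name and function choice is an illustrative assumption, not part of the article's proof.

```python
import math

a = 0.0

def g(x):
    # Assumed stand-in for x^2 sin(1/x): x^2 times a sawtooth of 1/x.
    # It returns to g(0) = 0 at every x = 1/n, and |g(x)| <= x^2 / 2,
    # so g is differentiable at 0 with g'(0) = 0.
    if x == 0:
        return 0.0
    t = 1.0 / x
    return x * x * (t - round(t))

def f(y):
    return math.exp(y)

def f_prime(y):
    return math.exp(y)

def Q(y):
    # Difference quotient of f at g(a), extended by f'(g(a)) at y = g(a).
    if y == g(a):
        return f_prime(g(a))
    return (f(y) - f(g(a))) / (y - g(a))

def product(x):
    # Q(g(x)) * (g(x) - g(a)) / (x - a), the function used in the proof.
    return Q(g(x)) * (g(x) - g(a)) / (x - a)

def difference_quotient(x):
    # Difference quotient of the composite f(g(x)) at a.
    return (f(g(x)) - f(g(a))) / (x - a)

# x = 0.25 is a point with g(x) == g(a) but x != a: both sides are zero.
print(g(0.25), product(0.25), difference_quotient(0.25))
# x = 0.3 is a generic point: the factors of g(x) - g(a) cancel.
print(product(0.3), difference_quotient(0.3))
```

At x = 0.25 the first factor is f′(g(a)) and the second is zero, and the difference quotient is also zero because f(g(x)) = f(g(a)); so the two functions agree there as well as at generic points.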

Error in 'First example'

In the example "Suppose that a skydiver ..." the formula g(t) = 4000 − 9.8t^2 should be replaced with g(t) = 4000 − ½·9.8t^2, shouldn't it? 2.36.204.64 (talk) 22:21, 19 January 2011 (UTC)[reply]
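
The arithmetic behind this question can be sketched as follows, assuming the example models free fall from rest at 4000 m with constant acceleration 9.8 m/s^2 and no air resistance. Under that assumption the distance fallen after t seconds is ½·9.8·t^2. The names below are hypothetical and chosen only for illustration.

```python
GRAVITY = 9.8            # m/s^2, assumed constant
INITIAL_HEIGHT = 4000.0  # m, starting height in the example

def height(t):
    """Height after t seconds of free fall from rest: 4000 - (1/2) * 9.8 * t^2."""
    return INITIAL_HEIGHT - 0.5 * GRAVITY * t ** 2

def velocity(t):
    """Derivative of height with respect to t: -9.8 * t."""
    return -GRAVITY * t

def height_without_half(t):
    """The formula as quoted in the question, without the factor 1/2."""
    return INITIAL_HEIGHT - GRAVITY * t ** 2

print(height(10.0))               # about 3510 m
print(velocity(10.0))             # about -98 m/s
print(height_without_half(10.0))  # about 3020 m
```

The values 3510 and −98 match the numbers used in the suggested rewording in the next section, while 3020 matches the sentence quoted there from the article.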

Suggested changes to 'First example'

1) clarify 2nd bullet from "...rate of change in atmospheric pressure at height..." to ...rate of change in atmospheric pressure w.r.t. h, at height...

2) clarify 4th bullet from "...rate of change in atmospheric pressure t seconds after..." to ...rate of change in atmospheric pressure w.r.t. t, t seconds after...

3) the bottom paragraph that starts "It is not true..." is misleading and includes an error. I would end it with the sentence "This need not have anything to do with the buoyant force ten seconds after the skydiver's jump." and start a new paragraph just below that states the following:

It is true that (f o g)'(t) = f'(h) * g'(t). To find the buoyant force w.r.t. t ten seconds after his jump, we must evaluate g(10), his height ten seconds after he jumps, and substitute the result into f'(h). g(10) is 3510 meters above sea level, so the true buoyant force w.r.t. t ten seconds after the jump is (proportional to) f'(3510) * g'(10) = 7.133 * -98 = -699.

This example demonstrates the Chain Rule as the product of two rates. The last sentence that states "g(10) is 3020 meters above sea level, so the true buoyant force ten seconds after the jump is (proportional to) f'(3020)." is erroneous. To use the Chain Rule you need to multiply f'(g(t)) by g'(t). —Preceding unsigned comment added by 69.117.93.37 (talk) 04:41, 31 January 2011 (UTC)[reply]

I've changed the article. Ozob (talk) 12:34, 31 January 2011 (UTC)[reply]

Evaluation

(Copied from WT:WPM. Ozob (talk) 02:07, 2 March 2011 (UTC))[reply]

The article titled chain rule currently says:

The chain rule is frequently expressed in Leibniz notation. Suppose that u = g(x) and y = f(u). Then the chain rule is dy/dx|_{x=c} = dy/du|_{u=g(c)} · du/dx|_{x=c}.
This is often abbreviated as dy/dx = (dy/du)(du/dx).
However, this formula does not specify where each of these derivatives is to be evaluated, which is necessary to make a complete and correct statement of the theorem.

Does this last form really fail to "specify where each of these derivatives is to be evaluated"? It seems to me that the first form above clutters things in such a way as to interfere with understanding, and that the second, read correctly, doesn't really fail to do anything that should be done.

Opinions? Michael Hardy (talk) 23:04, 1 March 2011 (UTC)[reply]

I'm with you on this one. The sentence isn't really Wikipedia-appropriate, anyway -- at best that's textbook language. CRGreathouse (t | c) 01:47, 2 March 2011 (UTC)[reply]
Well, since I'm the one who wrote that sentence, I think I should defend it. But I'm going to do so on Talk:Chain rule, not here. Ozob (talk) 01:49, 2 March 2011 (UTC)[reply]
OK, here's my defense. Yes, that last form really does fail to specify where the derivatives are to be evaluated. That's obvious because it leaves the evaluations out. I think your real objection is: Does anyone really need to specify where the derivatives are to be evaluated, or is it always safe to leave them out and let them be implicitly understood? I'm going to proceed assuming that this is your real objection.
I don't think it is. For a student first learning about the chain rule, the relationships between y, x, u, f, and g will not all be clear. While we don't intend the article to be a textbook treatment of the subject, we should target a very low-level audience—which includes students learning about the chain rule for the first time. Because of that I don't think we can assume that our audience will be able to infer anything about where the derivatives should be evaluated. In particular, I'm worried that they won't be able to guess that dy/du should be evaluated at g(c). I think if you were to ask most students, you'd probably get back nonsense, like saying that it should be evaluated at x or at u. I think it is much better for the article to spell out all the details. I admit that the article already does this when it gives the formula f′(g(x))g′(x); but I think that it is still a good idea to give a full and correct statement in the Leibniz notation, too.
I'm not particularly tied to those words, though. If they sound overly textbook-ish, then they ought to be changed. Maybe it would be good to just leave out that last part and stop after the second displayed equation? Ozob (talk) 02:07, 2 March 2011 (UTC)[reply]

If y = g(u) and u = f(x), then the point at which to evaluate dy/du is u and the point at which to evaluate du/dx is x. That seems obvious. The extra notation will be confusing. Michael Hardy (talk) 04:32, 2 March 2011 (UTC)[reply]

Both formulas should be included. The short one has its merits as a good mnemonic device, and the former is indispensable for properly understanding the formula, as Ozob suggested. Tkuvho (talk) 05:55, 2 March 2011 (UTC)[reply]
My objection is that u and x are variables and, if you're being careful, it doesn't make sense to evaluate anything at them. The short form of the Leibniz notation chain rule is the equivalent of the statement (fg)′ = fg′. If you were to teach this to most students, they'd believe that (fg)′(x) = f′(x)g′(x), which is wrong. It's wrong because f′ is a function of u and should be evaluated at u = g(c), just like the first displayed equation shows. Someone with experience can deduce the right place to evaluate f′ by looking at its domain, but I don't think our target audience can. Ozob (talk) 11:58, 2 March 2011 (UTC)[reply]

I disagree: it does make sense to evaluate a function at a variable.

There you have evaluation of a function at the variable x and evaluation of a function at the variable u. Are students really going to mistakenly assume I mean the pointwise product of ƒ and g if I write it that way? I don't think so.

And if I write

is that not also "evaluation of a function at a variable"? Michael Hardy (talk) 20:14, 2 March 2011 (UTC)[reply]

Well, I don't really want to discuss the meaning of the word "evaluation", but I disagree (or at least think the point is arguable). Regardless, the analog of (fg)′(x) = f′(u)g′(x) in Leibniz notation would be:
or maybe
As I said above, dy/dx = (dy/du)(du/dx) is analogous to (fg)′ = fg′, and I don't think either of those are clear to the novice. Ozob (talk) 02:06, 3 March 2011 (UTC)[reply]

To write

is at best redundant. That u is where it's evaluated is inherent in the meaning of the Leibniz notation. It's hard to see how anyone could mistakenly think otherwise. That's why this whole thing about evaluation is pointless. Michael Hardy (talk) 18:24, 3 March 2011 (UTC)[reply]

The telegraphic formula dy/dx = dy/du * du/dx is a good mnemonic device but it is too abbreviated to be self-explanatory. The fact that students frequently make the mistake of evaluating the first factor (dy/du) at the wrong point is amply illustrated in this very page, which contains a detailed discussion of such a typical error in the context of a physics example. I agree that there is a problem with writing , but the problem is not that it is redundant, but that it is too telegraphic: it should be u=g(c) or something. Tkuvho (talk) 18:46, 3 March 2011 (UTC)[reply]
I suspect I know far more about frequent errors in calculus than does anyone else writing here, and I am not aware that that is a frequent mistake, and find it implausible. Please specify where I can find that error in a physics example. Writing amounts to refusing to use the Leibniz notation at all. Michael Hardy (talk) 18:57, 5 March 2011 (UTC)[reply]
Myself, I don't think I ever claimed it was a frequent mistake. But that does not mean it is not a source of confusion. I remember being confused by the shape of the chain rule when I first learned calculus: I wondered why it was so asymmetric, with f and g playing such apparently different roles.
I do not want to be pedantic and insist that everyone put evaluation bars everywhere at every use of the chain rule. But I do want the chain rule correctly and fully stated in Leibniz notation at least once, and that's impossible without evaluation bars. Ozob (talk) 20:43, 5 March 2011 (UTC)[reply]
I agree. Hardy's deletion of this material does not reflect a consensus here and should be reverted. Tkuvho (talk) 07:46, 6 March 2011 (UTC)[reply]
I've put it back in but reorganized it somewhat. I'll be back for more later...... Michael Hardy (talk) 17:58, 6 March 2011 (UTC)[reply]
Thanks. Tkuvho (talk) 12:57, 7 March 2011 (UTC)[reply]
I have edited the section a little. I have a specific objection to the sentence beginning "as always": While that is usually what's done, it is entirely possible to evaluate these functions at some other value. It may even be useful. So I've taken that sentence out. I am pretty happy with how the section is now. Ozob (talk) 22:54, 7 March 2011 (UTC)[reply]

Multivariate chain rule

I might misunderstand the notations, but why is the multivariate chain rule written as:

Why is there a composition on the RHS, and not a product of derivatives, as in the univariate case? Is it to be understood as a matrix operation, in which case composition corresponds to a product, and in that case, shouldn't this be explicitly signaled? Donvinzk (talk) 11:34, 4 June 2011 (UTC)[reply]
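
One common reading, offered here as an illustration and not as a statement of the article's intended notation: the derivative at a point is a linear map, and composing linear maps corresponds to multiplying their Jacobian matrices. The sketch below checks numerically that the Jacobian of f ∘ g at a point equals the matrix product of the Jacobian of f at g(point) and the Jacobian of g at the point. The maps f and g, the sample point, and the helper names are all assumptions made for this example.

```python
import math

# Illustrative maps R^2 -> R^2 (assumptions, not from the article).
def g(x, y):
    return (x * y, x + y)

def f(u, v):
    return (math.sin(u), u * v)

def jacobian_g(x, y):
    # Rows are components of g, columns are partial derivatives in x and y.
    return [[y, x],
            [1.0, 1.0]]

def jacobian_f(u, v):
    return [[math.cos(u), 0.0],
            [v, u]]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numerical_jacobian(func, point, h=1e-6):
    """Central-difference Jacobian of func at point."""
    columns = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(*plus), func(*minus)
        columns.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    # columns[j][i] holds d f_i / d x_j; transpose into rows.
    return [[columns[j][i] for j in range(len(point))]
            for i in range(len(columns[0]))]

point = (0.7, -1.2)
composite = lambda x, y: f(*g(x, y))

product = matmul(jacobian_f(*g(*point)), jacobian_g(*point))
numeric = numerical_jacobian(composite, point)

for row_product, row_numeric in zip(product, numeric):
    print(row_product, row_numeric)
```

In one variable the Jacobians are 1×1 matrices, so the matrix product reduces to the ordinary product f′(g(x))g′(x).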