
Talk:Generative pre-trained transformer

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Estadje (talk | contribs) at 21:42, 13 November 2023 (Removed note about broken links, because I just removed the broken links). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


GPT as a generic term

FWIW, I asked GPT-3.5 the following question and was answered as follows:

"Do you think GPT is a generic term for a kind of AI?

Yes, 'GPT' (Generative Pre-trained Transformer) is a specific type of language model developed by OpenAI, but the term is often used as a general descriptor for a type of AI that can generate human-like language output. The GPT architecture has been highly influential in the field of natural language processing and has been used to power a wide range of language-based applications, including chatbots, language translation tools, and content creation tools. Other companies and researchers have also developed similar language models based on the GPT architecture or other approaches, so the term "GPT" has come to represent a broader category of AI language models that share similar features and capabilities." <emphasis added> Estadje (talk) 12:31, 2 April 2023 (UTC)[reply]

FWIW, I asked GPT-3.5... It's really worth nothing at all.
But in any case, I've actually been thinking of generalizing this article a bit myself lately, because you're right about the fact that non-OpenAI researchers use it plenty too. Nevertheless, I was very tempted to revert your edit because it's hardly an improvement as it is. Saying things like "is a kind of artificial intelligence" as a definition doesn't mean anything and points to a lack of understanding of the underlying terminology. I'll keep it as it is for the time being anyway, though, and I'll revisit/clean-up the page when I have more time later. PopoDameron ⁠talk 20:19, 2 April 2023 (UTC)[reply]
Fair point about that direct quote not meaning much...but hey, that's why I only put it in the talk page. :) As for my inserted colloquial language at the beginning of the article, I realize that by itself it's not the best description (hence I also left the better description there also...), but I was aiming for a concise combination of phrases there to help give a broader audience at least a general sense of it. And I do feel like at least some of my other tweaks were a net positive, in any case, but I'm sure there's much room to improve upon them...
As for the portion that you did remove, that's fair due to the source...but I still think it would be great if this article could have a simple breakdown of what the G, P, & T each mean. I'll try to find a better source that explains it as clearly for a broader audience. Estadje (talk) 21:00, 2 April 2023 (UTC)[reply]

Focus of this article

I think we need to have a good discussion about what this article should look like because there are some pretty big problems with it now, and it's starting to attract a lot of attention. The article used to be specifically about OpenAI's family of GPT models, but as Estadje has pointed out, the term GPT is now in common use even outside of OpenAI-developed models.

I think it would make some sense to pivot the article's focus to GPTs in general, as Estadje has begun to do, but the problem there is that we will begin to have a very heavy overlap with large language model. With the exception of BERT, every single LLM in the list on that article can be classified as a GPT. And as it stands, a lot of the content in large language model is relevant to generative pre-trained transformer.

So we need to make the choice of having this article be either about OpenAI's GPT family in particular, or about all models that are 'generative', 'pre-trained' and 'transformers'. If we go for the general option, how do we reconcile it with large language model given that most LLMs are GPTs? At the very least, I think only one of them should contain a list section.

Pinging potentially interested editors to the discussion (Colin M, Artem.G, InfiniteNexus, Gmelli, DFlhb) PopoDameron ⁠talk 01:03, 3 April 2023 (UTC)[reply]

  • This makes sense. Various subject matter areas often involve articles with heavy overlap (often for valid reasons...), but ideally they're managed/coordinated in a manner that puts some reasonable bounds on the inevitable redundancy. In this case, I agree that there should only be a "list" in one of the two (though I've been fleshing out the list on this one, for the time being).
As for the broader question of whether to make the article about OpenAI's GPT series instead of GPT in general, I'd submit that that could inadvertently perpetuate the notion that GPT remains unique to OpenAI. Estadje (talk) 03:07, 3 April 2023 (UTC)[reply]
  • I think this is a bad idea and would cause confusion and massive overlap. It should clearly remain focused on OpenAI's models.
"Generative pre-trained transformer" is an OpenAI trademark; it's not a generic term, and it's not used either by primary or secondary sources to describe any other models (not even GPT-J, which is just described as a "transformer-based large language model"). I don't care about trademark infringement, but lumping them together is clearly WP:OR that isn't supported by secondary sources. It would be like lumping in ReactOS and Microsoft Windows into the same article. The GPT derivatives created by others should get their own "derivatives" section, not be treated as the "real thing", and only notable ones should be listed.
This article should contain a table of OpenAI's models; then WP:SUMMARY-style sections on each, then sections on how OpenAI trains them, or OpenAI's approach to AI safety, or whatever, then a prose-style section on derivatives (based on reputable secondary sources only).
I also really hope people don't start "asking" ChatGPT what we should put in our articles and treating it as an authority on anything; that's a colossal waste of time for everyone involved. DFlhb (talk) 06:09, 3 April 2023 (UTC)[reply]
I'm inclined to agree. I was initially starting to think that the article should potentially be generalized because there are tons of LLMs now that use the 'GPT' name (see GPT-Neo, Cerebras-GPT, BioGPT (Microsoft), etc...), but I agree that a better solution would be to mention such works in a "derivatives" section and keep the main focus on OpenAI. The term was coined by OpenAI, and even though it is being used by some non-OpenAI companies/institutions, it's cleaner to keep them separated. PopoDameron ⁠talk 06:40, 3 April 2023 (UTC)[reply]
I realize this decision will be made by more experienced editors around here, but I have to say that I think it's very unlikely that OpenAI's application to trademark the term would be granted. The simple fact is that at least a significant portion of its usage has become generic (and there's the nature of the 3 component terms...).
If this article remains about OpenAI's series of GPT technologies, I'd suggest 1) that its title should reflect that particularization (which would avoid actively perpetuating OpenAI's view that it's a trademark), and 2) any derivatives section should at least acknowledge the emerging general usage of the term in common parlance.
(And as for my kicking this off with a quote of ChatGPT, that was just a tongue-in-cheek way of paraphrasing what I would've said on my own anyway. I mean, won't do it again, but come on...) Estadje (talk) 11:07, 3 April 2023 (UTC)[reply]
You're right about my last paragraph; it was too harsh. DFlhb (talk) 11:11, 3 April 2023 (UTC)[reply]
1) that its title should reflect that particularization (which would avoid actively perpetuating OpenAI's view that it's a trademark) I think this is an interesting idea. Did you have any ideas for an alternative title? OpenAI language models? If we did this, there would still be the question of what should happen to the current title. I would think it should still redirect to this article (maybe with a dab hatnote linking to Large language model?). Colin M (talk) 19:34, 4 April 2023 (UTC)[reply]
IMO that'd be less recognizable. There's zero issue with "legitimizing" trademarks/brands, and I think it's more encyclopedic to avoid diluting them. Kleenex for example is just focused on that brand, not on the generic concept. "GPT" is the recognized name of a distinct product family, regardless of the trademark status. DFlhb (talk) 19:59, 4 April 2023 (UTC)[reply]
I think a reasonable middle-ground seems to have emerged now, with the majority of the article essentially focusing on OpenAI's GPTs and then having a section focusing on other/derivative GPTs (also, that section now has a couple secondary sources reflecting the fact that the term is at least sometimes used more broadly now...). Estadje (talk) 20:38, 4 April 2023 (UTC)[reply]
I don't really understand the scope of the "Other (derivative) models" section. Why should it not include all of the entries from List of large language models? They're all a) generative, b) pre-trained, and c) transformers. Are the inclusion criteria for this section narrower than that? Colin M (talk) 20:44, 4 April 2023 (UTC)[reply]
I think some folks thought it should just be 'major' ones (at least in the context of this article...), but I don't really feel strongly about how high that 'bar' should be... Estadje (talk) 20:48, 4 April 2023 (UTC)[reply]
I would say that the four models currently listed are actually pretty marginal compared to others which are omitted (e.g. BERT (language model), PaLM, and LLaMA). I think the section should just be removed, since it doesn't seem to have any coherent scope. Colin M (talk) 20:53, 4 April 2023 (UTC)[reply]
My idea was to include as derivatives only models that are actual derivatives, i.e. retrained versions of GPT-2 (which is open-source). But to be clear, that was intended as a compromise; I think those belong at GPT-2. DFlhb (talk) 20:56, 4 April 2023 (UTC)[reply]
Perhaps having the "derivatives" section be prose-based might work better than a list...but one way or another, having some kind of other/derivatives/non-OpenAI section is important in light of the ample emerging broader uses of the term (some in secondary sources...). Estadje (talk) 21:00, 4 April 2023 (UTC)[reply]
I support deleting the second table and just having something like a 'see also' link to the list section in large language model. PopoDameron ⁠talk 03:29, 5 April 2023 (UTC)[reply]
Done. Estadje (talk) 04:03, 5 April 2023 (UTC)[reply]
I basically agree with DFlhb. The alternative would result in an article that would be way too duplicative of Large language model (regardless of whether we kept the title "Generative pre-trained transformer", or called it something else like "Large transformer language models" or "Transformer-based foundation language models" or whatever). Colin M (talk) 15:55, 3 April 2023 (UTC)[reply]
Yeah, if it's "Transformer-based foundation language models," it would really be a 100% overlap because all (major) LLMs are transformer based. PopoDameron ⁠talk 16:34, 3 April 2023 (UTC)[reply]
  • The article currently makes it seem like GPTs were invented by OpenAI. If that is correct, then the article should remain focused on OpenAI's GPTs. If GPTs were not actually invented by OpenAI, then the article's lead should be amended and its scope expanded. InfiniteNexus (talk) 17:30, 3 April 2023 (UTC)[reply]
    That's because I just restored the version before Estadje's edits, when the article was focused on OpenAI. The term 'GPT' was indeed coined by OpenAI, but the technology GPTs use, namely transformers, was invented by Google. PopoDameron ⁠talk 17:34, 3 April 2023 (UTC)[reply]
    I'd note that something can be initially invented and/or coined by one party, and then become expanded through emerging usage to signify something broader. This is notably more common with inherently generic terms or term components (which quite arguably applies w/ the G, P, & T)
    Whether or not that has happened with "GPT" is not currently a matter of consensus (although OpenAI appears to be asserting its view that GPT should be regarded as their trademark...). Estadje (talk) 18:01, 3 April 2023 (UTC)[reply]

Article expansion (adding of background)

Hey, a few weeks ago we had a conversation about the huge Background section in the GPT-2 article. I believe it belongs here, not there. Please see Talk:GPT-2#Background_section. Also pinging JPxG. Artem.G (talk) 16:15, 12 April 2023 (UTC)[reply]

I agree that most of that would make more sense being here. PopoDameron ⁠talk 16:48, 12 April 2023 (UTC)[reply]
I disagree. Virtually none of the content in that section is specific to OpenAI's language models, in that it's equally relevant to any other LLM. But most of it is so far removed from contemporary models that it would be too detailed even for Large language model. The whole virtue of Wikipedia is that it's a hyperlinked encyclopedia. If someone encounters the term neural network in this article, they can follow the link to the artificial neural network article and read more about the topic there. We don't need 5 paragraphs in this article explaining the fundamentals of neural networks and the history of their development. (Imagine the consequences if we did that for every article about any kind of technology based on neural nets!) Colin M (talk) 19:01, 12 April 2023 (UTC)[reply]
This is true. I actually thought we were on the LLM talk page at first, but even for that, this is probably a bit too much background. If anything, maybe it can be merged into multiple articles (that is, if it isn't copied). PopoDameron ⁠talk 19:19, 12 April 2023 (UTC)[reply]
As I wrote in the linked discussion, I think this section should be completely removed, but because I was reverted I started these discussions. I think the Background section makes no sense in either the GPT or the LLM article, but belongs in the history of AI, NLP, or some history of deep learning article. But putting it into GPT-2 makes even less sense to me, because by that logic the same section should be copied into every article like GPT-3, GPT-4, PaLM, LLaMA, etc. Artem.G (talk) 19:35, 12 April 2023 (UTC)[reply]
I think there's rough consensus to remove at this point. As far as I can see, at least 4 editors have agreed that the background section should be removed or drastically reduced in size (me, you, Buidhe, and CommanderWaterford), and only one editor who wants to keep it (the author). As a courtesy, you could copy the content to a userspace page so it can easily be accessed by JPxG or anyone else who wants to adapt it into other articles. But I don't think you should feel compelled to merge the content into another article as a precondition for removing it from GPT-2. Colin M (talk) 20:29, 12 April 2023 (UTC)[reply]

This section should be removed from GPT-2, yes. The main reason it exists there is that, at the time it was written, this article did not exist, nor did others in the series. The reason it exists at all is that, frankly, there are a large number of articles (History of artificial intelligence, Timeline of artificial intelligence, Progress in artificial intelligence, History of artificial neural networks, and probably others), many of which are poorly written or incomplete, and none of which serve to explain to the reader "how GPT works". If there is a more appropriate article for this to go in, it should go there, but I do not think it is a good idea to plop this into an article like History of artificial intelligence; for example, said article has sections titled "1.1.2 Alchemical means of artificial intelligence" and "5.1 The rise of expert systems". jp×g 01:56, 14 April 2023 (UTC)[reply]

I must admit I've never read the history of AI in full before, and now when I read it I admit it's strange and outdated (and it's a GA somehow!). But two other articles are much worse, the timeline is just a list of everything ever called AI. It doesn't make any sense to include Ancient Greece there, but here it is. And the progress of AI is partly devoted to games, though it's only one task... IMO all these articles should be merged and rewritten, but I don't know who'd like to do it (definitely not me, I tried to start llm article several times before someone actually created it.) Artem.G (talk) 07:31, 14 April 2023 (UTC)[reply]
I don't think there's any single article the contents of the section can be plopped into in its entirety, but it may be worth taking out and adapting different pieces to improve different articles like History of artificial intelligence, Computational linguistics, Transformer (machine learning), etc. The fact that some of these articles are "poorly written or incomplete" is all the more reason to try to improve them! Colin M (talk) 15:02, 14 April 2023 (UTC)[reply]

Merge with [Chat?]GPT

Proposal to merge with Generative Pre-trained Transformer

OpenAI has filed a trademark for GPT. There will probably be no contenders, as they invented the term. [1]

GPT is more akin to a commercial name than a genuine taxon. Large Language Model seems to be an accepted taxon in Wikipedia, the GPT page, and the scientific community.

As it stands, the current taxonomy (as defined in the leading paragraphs and their platonian definitions) proposes that: ChatGPT is an instance of GPT, and GPT in turn is an instance of a Large Language Model. My proposition is that we cut the middle man and just classify ChatGPT as an LLM, and make GPT and ChatGPT synonymous. You will find that the only instances of GPT algorithms are GPT1, GPT2, GPT3 and GPT4, and that these were all called ChatGPT.

This proposal would provide a clearer understanding for both students and laymen by reducing the number of clicks they have to make to understand what ChatGPT really is. Or seen from another perspective, given the same amount of investment, readers will reach one level deeper than before, prior to either becoming satisfied with their current level of knowledge or becoming frustrated at not understanding.

For a clear example, a layman with the patience to click on two links may read the first paragraphs of ChatGPT, then GPT, then LLM and find the term Neural Network before becoming frustrated and giving up. While under the new structure they would read the first paragraph of ChatGPT, then that of LLM, then click on Neural Network, and they will find an understandable first paragraph, with no further topical terms.

Some may argue that this may constitute original research, but this field does not have any consensus or even proposed taxonomy. And the taxonomy is described not as a separate statement that can be removed for lack of consensus, but as a necessary element of an article, its definition. And since there is no consensus or proposed taxonomy that could source the definition, there is no way to avoid this. I do not recommend obsessing over finding sources for this statement; rather, I would recommend interpreting the existing sources, which will probably have enough information to define the concept. I always find it annoying when too many sources are used for an encyclopedic noise.

Thank you for coming to my Ted Talk. At this point, you probably don't need to do anything, really; this is one of those "speak now or be silent forever" scenarios at weddings, but more bureaucratic.

That said, if you see someone opposing this merge, a brief comment expressing your support would be appreciated. Please avoid excess discussion to avoid bikeshedding and filibustering.

Similarly, if you are interested in helping with the merge once an approving sentiment is proven, just let me know briefly. We can coordinate the specific details of the merger LATER. (what bits from which of both articles to use).

Regards, Tom TZubiri (talk) 03:16, 25 April 2023 (UTC)[reply]

Lead paragraphs seem to be already as desired. We just need to merge the two articles.
The page Generative pre-trained transformer is not really reachable, and it's much shorter, so it seems this problem was effectively culled a long time ago, now we just need to delete that. I'll specify the direction of the merge with the mergeto template, I may even move this to the other page since it's so small, and not even put the mergefrom notice on this page since the addition would be so small. TZubiri (talk) 03:23, 25 April 2023 (UTC)[reply]
-----
I disagree. For one thing, OpenAI's trademark application that you cite is still quite uncertain as to whether the USPTO would ultimately approve it or not. (For example, if I understand correctly, "BloombergGPT" isn't even based on OpenAI's technology...but rather the more descriptive/generic sense of the term.) At the very least it seems reasonably disputable, such that at least some people will in good faith regard it more broadly...
Multiple cited references (including secondary sources) in the current initial portion of this article appear to support the proposition that generative pre-trained transformers are a "type" of LLM (albeit one that OpenAI first invented and indeed remains the most prominent provider of...).
Other sources cited indicate that GPT and LLM are sometimes used synonymously, as most LLMs do now tend to have the G, the P, and the T. But in any case, at this point at least, there seems to be enough uncertainty and differential usage that eliminating this article outright would seem unduly rash.
Perhaps for the time being, there should be a section that acknowledges the pending trademark issue, and then see how that evolves...?
Another factor to consider is that even to the extent this article remains 'primarily' focused on OpenAI's GPT systems, that is a much broader universe than just "ChatGPT" -- and it's nice to have a place that addresses the overview/context, history, foundation models, etc.
Estadje (talk) 04:48, 25 April 2023 (UTC)[reply]

Summary of proposal from the previous section

TL;DR Very small page, criticized as without clear focus, probably because it's hard to distinguish from ChatGPT. Merge to ChatGPT if there's any data then delete.--TZubiri (talk) 03:27, 25 April 2023 (UTC)[reply]
I doubt that proposal would gain much support (for multiple likely reasons...), but if you think of other possible ways to improve clarity for more users I'd always be curious... I like the general idea of being mindful of how broader/non-specialized audiences may use/understand terms that had started out as more technical - and sometimes such usages can indeed have an impact on evolving formal definitions - but at any given point current accuracy has to be the priority, and the current general approach is fairly well sourced overall. Estadje (talk) 18:21, 28 April 2023 (UTC)[reply]

Training cost of GPT-n series

GPT-1: There is absolutely no information about what they meant by "1 month on 8 GPUs". I suspect it's V100 GPU, but it has 100 TFLOP/sec, which would give 2e21 FLOP, which is way too high.

GPT-2: The only mention from official source is "tens of petaflop/s-day", so about 1e21 FLOP...

GPT-3: They had to use damn words like "several thousand petaflop/s-days of compute". I actually did a screenshot of the damn Figure 2.2 and measured the histogram bar precisely. It came out to be 0.89 of the full (log-scale) figure height, so that corresponds to 10000^0.89 = 3630 PF/s-day, which is almost exactly the 3.1e23 FLOP from my other reference.

GPT-4: > Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

Thanks "Open"AI. pony in a strange land (talk) 00:52, 2 May 2023 (UTC)[reply]
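
The unit conversions in the estimates above can be reproduced with a short sketch. It is only an illustration of the arithmetic in this comment, not an official figure: the 30-day month, the use of 10 as the low end of "tens", and the assumption that the log-scale axis of Figure 2.2 runs from 1 to 10000 PF/s-days are all assumptions, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400
# One petaflop/s-day: 1e15 FLOP per second sustained for one day.
PFLOP_S_DAY = 1e15 * SECONDS_PER_DAY  # 8.64e19 FLOP


def gpu_time_to_flop(n_gpus, days, flop_per_s_per_gpu):
    """Total FLOP if every GPU runs at the given rate for the whole period."""
    return n_gpus * days * SECONDS_PER_DAY * flop_per_s_per_gpu


def pfs_days_to_flop(pfs_days):
    """Convert petaflop/s-days to FLOP."""
    return pfs_days * PFLOP_S_DAY


def read_log_axis(fraction, axis_max, axis_min=1.0):
    """Value at a given fraction of the height of a log-scaled axis."""
    return axis_min * (axis_max / axis_min) ** fraction


# GPT-1: "1 month on 8 GPUs", assuming 30 days and the rated 100 TFLOP/s.
gpt1_flop = gpu_time_to_flop(n_gpus=8, days=30, flop_per_s_per_gpu=100e12)

# GPT-2: "tens of petaflop/s-day", taking 10 as the low end (assumption).
gpt2_flop_low = pfs_days_to_flop(10)

# GPT-3: bar measured at 0.89 of an axis assumed to span 1 to 10000 PF/s-days.
gpt3_pfs_days = read_log_axis(0.89, axis_max=10_000)
gpt3_flop = pfs_days_to_flop(gpt3_pfs_days)

print(f"GPT-1 (rated peak): {gpt1_flop:.2e} FLOP")
print(f"GPT-2 (low end):    {gpt2_flop_low:.2e} FLOP")
print(f"GPT-3:              {gpt3_pfs_days:.0f} PF/s-days = {gpt3_flop:.2e} FLOP")
```

The GPT-1 figure here uses the rated peak throughput, so it should be read as an upper bound rather than an estimate of the compute actually used.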

You can calculate the cost of training an LLM (in FLOP) using the formula Cost = 6 * [# param] * [# tokens]. While this doesn’t allow you to calculate wall-clock time or GPU-hours (as those also depend on the efficiency of the code being run and the type of GPU), it is typically the way that AI papers report compute usage. See for example the Pythia, BLOOM, GPT-NeoX, and OPT papers, among others. The derivation of this formula (which is an approximation but a very good one) can be found in Kaplan et al. (2020) “Scaling Laws for Neural Language Models.”
The reason you’re getting much too high a number for GPT-1 is that you can’t actually get 100 TFLOP/s on a V100, just like you can’t actually get 300-whatever TFLOP/s on an A100. Typically a skilled HPC engineer with highly optimized code can achieve in practice about half of the number a GPU is officially rated for. Indeed, getting more than 40 TFLOP/s per V100 or 160 TFLOP/s per A100 requires substantial skill, and achieving more than 55 and 180 respectively is more or less impossible with publicly known techniques. Stellaathena (talk) 13:42, 1 July 2023 (UTC)[reply]
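
As a rough illustration of the formula and the utilization point above, here is a minimal sketch. The function names are made up for this example, and the GPT-3 inputs (175 billion parameters, 300 billion training tokens) are the widely reported figures, not numbers stated in this discussion; the 30-day month is likewise an assumption.

```python
SECONDS_PER_DAY = 86_400


def training_flop(n_params, n_tokens):
    """Approximate training compute: Cost = 6 * [# param] * [# tokens]."""
    return 6 * n_params * n_tokens


def wall_clock_days(total_flop, n_gpus, achieved_flop_per_s_per_gpu):
    """Days needed at a given *achieved* (not rated) per-GPU throughput."""
    return total_flop / (n_gpus * achieved_flop_per_s_per_gpu * SECONDS_PER_DAY)


# GPT-3, using the widely reported 175e9 parameters and 300e9 tokens.
gpt3_flop = training_flop(n_params=175e9, n_tokens=300e9)

# GPT-1's "1 month on 8 GPUs", assuming 30 days on V100s:
# rated 100 TFLOP/s versus the roughly 40 TFLOP/s achievable in practice.
gpu_seconds = 8 * 30 * SECONDS_PER_DAY
gpt1_flop_rated = gpu_seconds * 100e12
gpt1_flop_achieved = gpu_seconds * 40e12

print(f"GPT-3 via 6*N*D:          {gpt3_flop:.2e} FLOP")
print(f"GPT-1 at rated peak:      {gpt1_flop_rated:.2e} FLOP")
print(f"GPT-1 at ~40 TFLOP/s/GPU: {gpt1_flop_achieved:.2e} FLOP")
```

The 6 * N * D result for GPT-3 lands close to the 3.1e23 FLOP read off the figure earlier in this thread, while the achieved-throughput estimate for GPT-1 is 2.5 times lower than the rated-peak one.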