Talk:Generative pre-trained transformer
GPT as a generic term
FWIW, I asked GPT-3.5 the following question and was answered as follows:
"Do you think GPT is a generic term for a kind of AI?
Yes, 'GPT' (Generative Pre-trained Transformer) is a specific type of language model developed by OpenAI, but the term is often used as a general descriptor for a type of AI that can generate human-like language output. The GPT architecture has been highly influential in the field of natural language processing and has been used to power a wide range of language-based applications, including chatbots, language translation tools, and content creation tools. Other companies and researchers have also developed similar language models based on the GPT architecture or other approaches, so the term "GPT" has come to represent a broader category of AI language models that share similar features and capabilities." <emphasis added> Estadje (talk) 12:31, 2 April 2023 (UTC)
FWIW, I asked GPT-3.5...
It's really worth nothing at all. But in any case, I've actually been thinking of generalizing this article a bit myself lately, because you're right about the fact that non-OpenAI researchers use it plenty too. Nevertheless, I was very tempted to revert your edit because it's hardly an improvement as it is. Saying things like "is a kind of artificial intelligence" as a definition doesn't mean anything and points to a lack of understanding of the underlying terminology. I'll keep it as it is for the time being anyway, though, and I'll revisit/clean up the page when I have more time later. PopoDameron talk 20:19, 2 April 2023 (UTC)
- Fair point about that direct quote not meaning much...but hey, that's why I only put it in the talk page. :) As for my inserted colloquial language at the beginning of the article, I realize that by itself it's not the best description (hence I also left the better description there...), but I was aiming for a concise combination of phrases to help give a broader audience at least a general sense of it. And I do feel like at least some of my other tweaks were a net positive, in any case, but I'm sure there's much room to improve upon them...
- As for the portion that you did remove, that's fair due to the source...but I still think it would be great if this article could have a simple breakdown of what the G, P, & T each mean. I'll try to find a better source that explains it as clearly for a broader audience. Estadje (talk) 21:00, 2 April 2023 (UTC)
Focus of this article
I think we need to have a good discussion about what this article should look like because there are some pretty big problems with it now, and it's starting to attract a lot of attention. The article used to be specifically about OpenAI's family of GPT models, but as Estadje has pointed out, the term GPT is now in common use even outside of OpenAI-developed models.
I think it would make some sense to pivot the article's focus to GPTs in general, as Estadje has begun to do, but the problem there is that we will begin to have a very heavy overlap with large language model. With the exception of BERT, every single LLM in the list on that article can be classified as a GPT. And as it stands, a lot of the content in large language model is relevant to generative pre-trained transformer.
So we need to make the choice of having this article be either about OpenAI's GPT family in particular, or about all models that are 'generative', 'pre-trained' and 'transformers'. If we go for the general option, how do we reconcile it with large language model given that most LLMs are GPTs? At the very least, I think only one of them should contain a list section.
Pinging potentially interested editors to the discussion (Colin M, Artem.G, InfiniteNexus, Gmelli, DFlhb). PopoDameron talk 01:03, 3 April 2023 (UTC)
- This makes sense. Various subject matter areas often involve articles with heavy overlap (often for valid reasons...), but ideally they're managed/coordinated in a manner that puts some reasonable bounds on the inevitable redundancy. In this case, I agree that there should only be a "list" in one of the two (though I've been fleshing out the list on this one, for the time being).
- As for the broader question of whether to make the article about OpenAI's GPT series instead of GPT in general, I'd submit that that could inadvertently perpetuate the notion that GPT remains unique to OpenAI. Estadje (talk) 03:07, 3 April 2023 (UTC)
- I think this is a bad idea and would cause confusion and massive overlap. It should clearly remain focused on OpenAI's models.
- "Generative pre-trained transformer" is an OpenAI trademark; it's not a generic term, and it's not used either by primary or secondary sources to describe any other models (not even GPT-J, which is just described as a "transformer-based large language model"). I don't care about trademark infringement, but lumping them together is clearly WP:OR that isn't supported by secondary sources. It would be like lumping in ReactOS and Microsoft Windows into the same article. The GPT derivatives created by others should get their own "derivatives" section, not be treated as the "real thing", and only notable ones should be listed.
- This article should contain a table of OpenAI's models; then WP:SUMMARY-style sections on each; then sections on how OpenAI trains them, OpenAI's approach to AI safety, or whatever; then a prose-style section on derivatives (based on reputable secondary sources only).
I also really hope people don't start "asking" ChatGPT what we should put in our articles and treating it as an authority on anything; that's a colossal waste of time for everyone involved. DFlhb (talk) 06:09, 3 April 2023 (UTC)
- I'm inclined to agree. I was initially starting to think that the article should potentially be generalized because there are tons of LLMs now that use the 'GPT' name (see GPT-Neo, Cerebras-GPT, BioGPT (Microsoft), etc...), but I agree that a better solution would be to mention such works in a "derivatives" section and keep the main focus on OpenAI. The term was coined by OpenAI, and even though it is being used by some non-OpenAI companies/institutions, it's cleaner to keep them separated. PopoDameron talk 06:40, 3 April 2023 (UTC)
- I realize this decision will be made by more experienced editors around here, but I have to say that I think it's very unlikely that OpenAI's application to trademark the term would be granted. The simple fact is that at least a significant portion of its usage has become generic (and there's the nature of the 3 component terms...).
- If this article remains about OpenAI's series of GPT technologies, I'd suggest 1) that its title should reflect that particularization (which would avoid actively perpetuating OpenAI's view that it's a trademark), and 2) that any derivatives section should at least acknowledge the emerging general usage of the term in common parlance.
- (And as for my kicking this off with a quote from ChatGPT, that was just a tongue-in-cheek way of paraphrasing what I would've said on my own anyway. I mean, I won't do it again, but come on...) Estadje (talk) 11:07, 3 April 2023 (UTC)
- You're right about my last paragraph; it was too harsh. DFlhb (talk) 11:11, 3 April 2023 (UTC)
1) that its title should reflect that particularization (which would avoid actively perpetuating OpenAI's view that it's a trademark)
I think this is an interesting idea. Did you have any ideas for an alternative title? OpenAI language models? If we did this, there would still be the question of what should happen to the current title. I would think it should still redirect to this article (maybe with a dab hatnote linking to Large language model?). Colin M (talk) 19:34, 4 April 2023 (UTC)
- IMO that'd be less recognizable. There's zero issue with "legitimizing" trademarks/brands, and I think it's more encyclopedic to avoid diluting them. Kleenex for example is just focused on that brand, not on the generic concept. "GPT" is the recognized name of a distinct product family, regardless of the trademark status. DFlhb (talk) 19:59, 4 April 2023 (UTC)
- I think a reasonable middle ground seems to have emerged now, with the majority of the article essentially focusing on OpenAI's GPTs and then having a section focusing on other/derivative GPTs (also, that section now has a couple of secondary sources reflecting the fact that the term is at least sometimes used more broadly now...). Estadje (talk) 20:38, 4 April 2023 (UTC)
- I don't really understand the scope of the "Other (derivative) models" section. Why should it not include all of the entries from List of large language models? They're all a) generative b) pre-trained and c) transformers. Is the inclusion criteria for this section more narrow than that? Colin M (talk) 20:44, 4 April 2023 (UTC)
- I think some folks thought it should just be 'major' ones (at least in the context of this article...), but I don't really feel strongly about how high that 'bar' should be... Estadje (talk) 20:48, 4 April 2023 (UTC)
- I would say that the four models currently listed are actually pretty marginal compared to others which are omitted (e.g. BERT (language model), PaLM, and LLaMA). I think the section should just be removed, since it doesn't seem to have any coherent scope. Colin M (talk) 20:53, 4 April 2023 (UTC)
- My idea was to include as derivatives only models that are actual derivatives, i.e. retrained versions of GPT-2 (which is open-source). But to be clear, that was intended as a compromise; I think those belong at GPT-2. DFlhb (talk) 20:56, 4 April 2023 (UTC)
- Perhaps having the "derivatives" section be prose-based might work better than a list...but one way or another, having some kind of other/derivatives/non-OpenAI section is important in light of the term's many emerging broader uses (some in secondary sources...). Estadje (talk) 21:00, 4 April 2023 (UTC)
- I support deleting the second table and just having something like a 'see also' link to the list section in large language model. PopoDameron talk 03:29, 5 April 2023 (UTC)
- Done. Estadje (talk) 04:03, 5 April 2023 (UTC)
- I basically agree with DFlhb. The alternative would result in an article that would be way too duplicative of Large language model (regardless of whether we kept the title "Generative pre-trained transformer", or called it something else like "Large transformer language models" or "Transformer-based foundation language models" or whatever). Colin M (talk) 15:55, 3 April 2023 (UTC)
- Yeah, if it's "Transformer-based foundation language models," it would really be a 100% overlap, because all (major) LLMs are transformer-based. PopoDameron talk 16:34, 3 April 2023 (UTC)
- The article currently makes it seem like GPTs were invented by OpenAI. If that is correct, then the article should remain focused on OpenAI's GPTs. If GPTs were not actually invented by OpenAI, then the article's lead should be amended and its scope expanded. InfiniteNexus (talk) 17:30, 3 April 2023 (UTC)
- That's because I just restored the version before Estadje's edits, when the article was focused on OpenAI. The term 'GPT' was indeed coined by OpenAI, but the technology GPTs use, namely transformers, was invented by Google. PopoDameron talk 17:34, 3 April 2023 (UTC)
- I'd note that something can be initially invented and/or coined by one party, and then become expanded through emerging usage to signify something broader. This is notably more common with inherently generic terms or term components (which quite arguably applies with the G, P, & T).
- Whether or not that has happened with "GPT" is not currently a matter of consensus (although OpenAI appears to be asserting its view that GPT should be regarded as their trademark...). Estadje (talk) 18:01, 3 April 2023 (UTC)
Article expansion (adding of background)
Hey, a few weeks ago we had a conversation about the huge Background section in the GPT-2 article. I believe it belongs here, not there. Please see Talk:GPT-2#Background_section. Also pinging JPxG. Artem.G (talk) 16:15, 12 April 2023 (UTC)
- I agree that most of that would make more sense being here. PopoDameron talk 16:48, 12 April 2023 (UTC)
- I disagree. Virtually none of the content in that section is specific to OpenAI's language models, in that it's equally relevant to any other LLM. But most of it is so far removed from contemporary models that it would be too detailed even for Large language model. The whole virtue of Wikipedia is that it's a hyperlinked encyclopedia. If someone encounters the term neural network in this article, they can follow the link to the artificial neural network article and read more about the topic there. We don't need 5 paragraphs in this article explaining the fundamentals of neural networks and the history of their development. (Imagine the consequences if we did that for every article about any kind of technology based on neural nets!) Colin M (talk) 19:01, 12 April 2023 (UTC)
- This is true. I actually thought we were on the LLM talk page at first, but even for that, this is probably a bit too much background. If anything, maybe it can be merged into multiple articles (that is, if it isn't copied). PopoDameron talk 19:19, 12 April 2023 (UTC)
- As I wrote in the linked discussion, I think this section should be completely removed, but because I was reverted I started these discussions. I think the Background section makes no sense in either the GPT or LLM articles, but belongs in the history of AI, NLP, or history of deep learning articles. But putting it into GPT-2 makes even less sense to me, because by that logic the same section should be copied into every article like GPT-3, GPT-4, PaLM, LLaMA, etc. Artem.G (talk) 19:35, 12 April 2023 (UTC)
- I think there's rough consensus to remove at this point. As far as I can see, at least 4 editors have agreed that the background section should be removed or drastically reduced in size (me, you, Buidhe, and CommanderWaterford), and only one editor who wants to keep it (the author). As a courtesy, you could copy the content to a userspace page so it can easily be accessed by JPxG or anyone else who wants to adapt it into other articles. But I don't think you should feel compelled to merge the content into another article as a precondition for removing it from GPT-2. Colin M (talk) 20:29, 12 April 2023 (UTC)
This section should be removed from GPT-2, yes. The main reason it exists there is because, at the time it was written there, this article did not exist, nor did others in the series. The reason it exists at all is that, frankly, there are a large number of articles (History of artificial intelligence, Timeline of artificial intelligence, Progress in artificial intelligence, History of artificial neural networks, and probably others), of which many are poorly written or incomplete, and none of which serve to explain to the reader "how GPT works". If there is a more appropriate article for this to go in, it should go there, but I do not think it is a good idea to plop this into an article like History of artificial intelligence; for example, said article has sections titled "1.1.2 Alchemical means of artificial intelligence" and "5.1 The rise of expert systems". jp×g 01:56, 14 April 2023 (UTC)
- I must admit I've never read the history of AI in full before, and now when I read it I admit it's strange and outdated (and it's a GA somehow!). But two other articles are much worse; the timeline is just a list of everything ever called AI. It doesn't make any sense to include Ancient Greece there, but here it is. And the progress of AI is partly devoted to games, though it's only one task... IMO all these articles should be merged and rewritten, but I don't know who'd like to do it (definitely not me; I tried to start the LLM article several times before someone actually created it). Artem.G (talk) 07:31, 14 April 2023 (UTC)
- I don't think there's any single article the contents of the section can be plopped into in its entirety, but it may be worth taking out and adapting different pieces to improve different articles like History of artificial intelligence, Computational linguistics, Transformer (machine learning), etc. The fact that some of these articles are "poorly written or incomplete" is all the more reason to try to improve them! Colin M (talk) 15:02, 14 April 2023 (UTC)