Generative pre-trained transformer

Generative pre-trained transformers (GPT) are a type of large language model (LLM)^[1]^[2]^[3] and a prominent framework for generative artificial intelligence.^[4]^[5] The first GPT was introduced in 2018 by the American artificial intelligence (AI) company OpenAI.^[6] GPT models are artificial neural networks that are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content.^[2]^[3] As of 2023, most LLMs have these characteristics^[7] and are sometimes referred to broadly as GPTs.^[8]

OpenAI has released highly influential GPT foundation models that have been sequentially numbered to comprise its "GPT-n" series.^[9] Each of these was significantly more capable than its predecessor, due to increased size (number of trainable parameters) and training. The most recent of these, GPT-4, was released in March 2023. Such models have been the basis for OpenAI's more task-specific GPT systems, including models fine-tuned for instruction following, which in turn power the ChatGPT chatbot service.^[1]

The term "GPT" is also used in the names and descriptions of such models developed by others. For example, other GPT foundation models include a series of models created by EleutherAI,^[10] and recently seven models created by Cerebras.^[11] Also, companies in different industries have developed task-specific GPTs in their respective fields, such as Salesforce's "EinsteinGPT" (for CRM)^[12] and Bloomberg's "BloombergGPT" (for finance).^[13]

History

Initial developments

Generative pretraining (GP) was a long-established concept in machine learning applications,^[14]^[15] but the transformer architecture was not available until 2017 when it was invented by employees at Google.^[16] That development led to the emergence of large language models like BERT in 2018^[17] and XLNet in 2019,^[18] which were pre-trained transformers (PT) but not designed to be generative (they were "encoder-only").^[19] Also around that time, in 2018, OpenAI published its article entitled "Improving Language Understanding by Generative Pre-Training," in which it introduced the first generative pre-trained transformer (GPT) system.^[20]

Prior to transformer-based architectures, the best-performing neural NLP (natural language processing) models commonly employed supervised learning from large amounts of manually-labeled data. The reliance on supervised learning limited their use on datasets that were not well-annotated, and also made it prohibitively expensive and time-consuming to train extremely large language models.^[20]

OpenAI employed a semi-supervised approach to build a large-scale generative system, and was the first to do so with a transformer model. The approach involved two stages: an unsupervised generative "pretraining" stage to set initial parameters using a language modeling objective, and a supervised discriminative "fine-tuning" stage to adapt these parameters to a target task.^[20]
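The two stages can be written as a pair of objectives. The following is a sketch that follows the notation generally associated with the cited paper; the symbols are introduced here for illustration and do not appear elsewhere in this article. Here U is an unlabelled corpus of tokens u_i, k is the size of the context window, Θ denotes the model parameters, C is a labelled dataset whose examples consist of input tokens x^1, ..., x^m with a label y, and λ is a weighting factor for keeping the language modeling term as an auxiliary objective during fine-tuning.

```latex
\begin{align}
L_1(\mathcal{U}) &= \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)
  && \text{(pretraining: language modeling)} \\
L_2(\mathcal{C}) &= \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)
  && \text{(fine-tuning: target task)} \\
L_3(\mathcal{C}) &= L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
  && \text{(fine-tuning with auxiliary objective)}
\end{align}
```

Pretraining maximizes L_1 over unlabelled text; fine-tuning then maximizes L_2 (or the combined L_3) over the labelled data for the target task.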

Later developments

Regarding more recent GPT foundation models, OpenAI published its first version of GPT-3 in July of 2020. There were three models, with 1B, 6.7B, and 175B parameters, respectively named babbage, curie, and davinci (giving the initials B, C, and D).

In July 2021, Codex was published. Codex was built by fine-tuning a 12B-parameter GPT model (distinct from the three GPT-3 models above) on code from GitHub.^[21] The model is named code-cushman-001.

In March 2022, OpenAI published its work on instruction fine-tuning of some GPT-3 models, two of which were named davinci-instruct-beta (175B) and text-davinci-001.^[22] OpenAI also started beta testing code-davinci-002.^[23]

text-davinci-002 was instruction fine-tuned from code-davinci-002.

text-davinci-003 and ChatGPT, both released in November 2022, are both descended from text-davinci-002 through reinforcement learning from human feedback. text-davinci-003 is trained for accomplishing tasks, and ChatGPT is trained for conversations.^[24]^[25]

Foundational models

A foundational model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks.^[26]

Thus far, the most notable GPT foundation models have been from OpenAI's GPT-n series. The most recent of these is GPT-4, for which OpenAI declined to publish the size or training details (citing "the competitive landscape and the safety implications of large-scale models").^[27]

OpenAI's "GPT-n" series
Model	Architecture	Parameter count	Training data	Release date	Training cost
GPT-1	12-layer, 12-headed Transformer decoder (no encoder), followed by linear-softmax.	117 million	BookCorpus:^[28] 4.5 GB of text, from 7000 unpublished books of various genres.	June 11, 2018^[6]	"1 month on 8 GPUs",^[6] or 1.7e19 FLOP.^[29]
GPT-2	GPT-1, but with modified normalization	1.5 billion	WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit.	February 14, 2019 (initial/limited version) and November 5, 2019 (full version)^[30]	"tens of petaflop/s-day",^[31] or 1.5e21 FLOP.^[29]
GPT-3	GPT-2, but with modification to allow larger scaling	175 billion^[32]	499 billion tokens consisting of CommonCrawl (570 GB), WebText, English Wikipedia, and two books corpora (Books1 and Books2).	May 28, 2020^[31]	3640 petaflop/s-day (Table D.1^[31]), or 3.1e23 FLOP.^[29]
GPT-3.5	Undisclosed	175 billion^[32]	Undisclosed	March 15, 2022	Undisclosed
GPT-4	Also trained with both text prediction and RLHF; accepts both text and images as input. Further details are not public.^[27]	Undisclosed	Undisclosed	March 14, 2023	Undisclosed. Estimated 2.1e25 FLOP.^[29]
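The training cost column mixes two units, petaflop/s-days and total FLOP. The following minimal sketch shows the arithmetic relating them, assuming that one petaflop/s-day means 10^15 floating-point operations per second sustained for one day (86,400 seconds). The function and constant names are illustrative only.

```python
# Assumption: 1 petaflop/s-day = 1e15 FLOP per second, sustained for 24 hours.
FLOP_PER_SECOND_PER_PETAFLOP = 1e15
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def petaflop_s_days_to_flop(pfs_days: float) -> float:
    """Convert a compute budget in petaflop/s-days to total FLOP."""
    return pfs_days * FLOP_PER_SECOND_PER_PETAFLOP * SECONDS_PER_DAY


def flop_to_petaflop_s_days(flop: float) -> float:
    """Convert total FLOP back to petaflop/s-days."""
    return flop / (FLOP_PER_SECOND_PER_PETAFLOP * SECONDS_PER_DAY)


# GPT-3 row: 3640 petaflop/s-days is roughly 3.1e23 FLOP.
gpt3_flop = petaflop_s_days_to_flop(3640)

# GPT-2 row: 1.5e21 FLOP is on the order of "tens of petaflop/s-day".
gpt2_pfs_days = flop_to_petaflop_s_days(1.5e21)

print(f"GPT-3: {gpt3_flop:.2e} FLOP")
print(f"GPT-2: {gpt2_pfs_days:.1f} petaflop/s-days")
```

Under this assumption the two figures given in each row agree to within rounding.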

Other such models include Google's PaLM, a broad foundation model that has been compared to GPT-3 and has recently been made available to developers via an API,^[33]^[34] and Together's GPT-JT, which has been reported as the closest-performing open-source alternative to GPT-3 (and is derived from earlier open-source GPTs).^[35] Meta AI (formerly Facebook) also has a generative transformer-based foundational large language model, known as LLaMA.^[36]

Foundational GPTs can also employ modalities other than text, for input and/or output. GPT-4 is a multi-modal LLM that is capable of processing text and image input (though its output is limited to text).^[37] Regarding multimodal output, some generative transformer-based models are used for text-to-image technologies such as diffusion^[38] and parallel decoding.^[39] Such kinds of models can serve as visual foundation models (VFMs) for developing downstream systems that can work with images.^[40]

Task-specific models

A foundational GPT model can be further adapted to produce more targeted systems directed to specific tasks and/or subject-matter domains. Methods for such adaptation can include additional fine-tuning (beyond that done for the foundation model) as well as certain forms of prompt engineering.^[41]

An important example of this is fine-tuning models to follow instructions, which is a fairly broad task but still more targeted than that of a foundation model. In January 2022, OpenAI introduced "InstructGPT", a series of models fine-tuned to follow instructions by applying a combination of supervised training and reinforcement learning from human feedback (RLHF) to base GPT-3 language models.^[42]^[43] Its advantages over the bare foundational models included higher accuracy, less negative/toxic sentiment, and generally better alignment with user needs. Hence, OpenAI began using this as the basis for its API service offerings.^[44] Other instruction-tuned models have been released by others, including a fully open version.^[45]^[46]
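One ingredient commonly described for RLHF pipelines is a reward model trained on pairwise human comparisons between two candidate responses. The sketch below illustrates such a pairwise preference loss under that assumption. It is not taken from this article's sources verbatim, and the function name and scalar inputs are hypothetical.

```python
import math


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward scores for two responses to the same prompt.
agrees_with_humans = pairwise_preference_loss(2.0, 0.0)
disagrees_with_humans = pairwise_preference_loss(0.0, 2.0)
print(agrees_with_humans < disagrees_with_humans)
```

In a full pipeline the reward scores would come from a trained model rather than fixed numbers, and the learned reward would then guide the reinforcement learning stage.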

Another (related) kind of task-specific model is the chatbot, which engages in human-like conversation. In November 2022, OpenAI launched ChatGPT, an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT.^[47] OpenAI trained this model using RLHF, with human AI trainers providing conversations in which they played both the user and the AI, and mixed this new dialogue dataset with the InstructGPT dataset to obtain a conversational format suitable for a chatbot. Other major chatbots currently include Microsoft's Bing Chat, which uses OpenAI's GPT-4 (as part of a broader close collaboration between OpenAI and Microsoft),^[48] and Google's competing chatbot Bard (initially based on their LaMDA family of conversation-trained language models, with plans to switch to PaLM).^[49]

Yet another kind of task that a GPT can be used for is the meta-task of generating its own instructions, such as developing a series of prompts for 'itself' in order to accomplish a more general goal given by a human user.^[50] This is known as an AI agent, and more specifically a recursive one because it uses results from its previous self-instructions to help form its subsequent prompts; the first major example of this was Auto-GPT (which uses OpenAI's GPT models), and others have since been developed as well.^[51]
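The recursive pattern described above can be sketched as a loop in which each new prompt is built from the results of earlier ones. This is an illustrative sketch only, not the design of Auto-GPT or any other particular system: `run_agent` is a hypothetical helper, and `toy_model` is a deterministic stand-in for a call to a real language model.

```python
from typing import Callable, List


def run_agent(goal: str, model: Callable[[str], str], max_steps: int = 5) -> List[str]:
    """Repeatedly prompt `model`, feeding earlier results into later prompts."""
    history: List[str] = []
    prompt = f"Goal: {goal}\nPropose the first step."
    for _ in range(max_steps):
        result = model(prompt)
        history.append(result)
        if result.strip().upper().startswith("DONE"):
            break  # the model signalled that the goal is complete
        # Build the next prompt from the goal plus all previous results.
        prompt = (
            f"Goal: {goal}\nPrevious results:\n"
            + "\n".join(history)
            + "\nPropose the next step."
        )
    return history


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model: emits two steps, then DONE."""
    completed = prompt.count("Step ")
    if completed >= 2:
        return "DONE"
    return f"Step {completed + 1}: placeholder action"


steps = run_agent("plan a trip", toy_model)
print(steps)
```

The `max_steps` limit is a design choice in this sketch to guarantee termination even if the model never signals completion.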

Multimodality

Generative transformer-based systems can also be targeted to tasks involving modalities beyond text.

For example, Microsoft’s “Visual ChatGPT” combines ChatGPT with visual foundation models (VFMs) to enable input or output comprising images as well as text.^[52] Also, advances in text-to-speech technology offer powerful tools for audio content creation when used in conjunction with foundational GPT language models.^[53]

Domain-specificity

GPT systems can be directed toward particular fields or domains. Some reported examples of such models and apps are as follows:

EinsteinGPT - for sales and marketing domains, to aid with customer relationship management (uses GPT-3.5)^[54]
BloombergGPT - for the financial domain, to aid with financial news and information (uses "freely available" AI methods, combined with their proprietary data)^[55]
Khanmigo - described as a version of GPT for tutoring in the education domain; it aids students using Khan Academy by guiding them through their studies without directly providing answers (powered by GPT-4)^[56]^[57]
SlackGPT - for the Slack instant-messaging service, to aid with navigating and summarizing discussions on it (uses OpenAI's API)^[58]
BioGPT - for the biomedical domain, to aid with biomedical literature text generation and mining (uses GPT-2)^[59]

Sometimes domain-specificity is accomplished via software plug-ins or add-ons. For example, several different companies have developed particular plugins that interact directly with OpenAI's ChatGPT interface,^[60]^[61] and Google Workspace has available add-ons such as “GPT for Sheets and Docs”—which is reported to aid use of spreadsheet functionality in Google Sheets.^[62]^[63]

Brand issues

OpenAI, which created the first generative pre-trained transformer (GPT) in 2018, has recently asserted that “GPT” should be regarded as a brand of OpenAI.^[64] In April 2023, OpenAI revised the brand guidelines in its terms of service to indicate that other businesses using its API to run their artificial intelligence (AI) services would no longer be able to include “GPT” in such names or branding.^[65] In May 2023, OpenAI engaged a brand management service to notify its API customers of this policy, although these notifications stopped short of making overt legal claims (such as allegations of trademark infringement or demands to cease and desist).^[64]

Relatedly, OpenAI has applied to the United States Patent and Trademark Office (USPTO) to seek domestic trademark registration for the term “GPT” in the field of AI.^[64] OpenAI sought to expedite handling of its application, but the USPTO declined that request in April 2023.^[66] To get the trademark approved, OpenAI would need to establish that the term is actually “distinctive” to their specific offerings rather than widely understood as a broader technical term for the kind of technology. Some media reports suggest that OpenAI may be able to do so based indirectly on the fame of its GPT-based chatbot product, ChatGPT,^[66]^[67] for which OpenAI has separately sought trademark protection (which it has sought to enforce more strongly).^[68] Other reports indicate that exclusivity for the bare term “GPT” seems unlikely to be granted,^[64]^[69] as it is used frequently to refer simply to AI systems that involve generative pre-trained transformers.^[3]^[70]^[71] If exclusive rights in the term “GPT” itself were to be granted, then everyone else using it in the name or branding of their related offerings would need to stop unless they have permission.^[69] Even if that were to occur, the trademark doctrine of descriptive fair use could still preserve some room to continue non-brand-related usage.^[72]

Selected bibliography

This section lists the main official publications from OpenAI and Microsoft on their GPT models.

GPT-1: report,^[6] GitHub release.^[73]

GPT-2: blog announcement,^[74] report on its decision of "staged release",^[75] GitHub release.^[76]

GPT-3: report.^[31] No code has been released on GitHub or in any other form for GPT-3 or later models.

InstructGPT: blog announcement,^[42] report.^[43]

ChatGPT: blog announcement (no report).^[47]

GPT-4: blog announcement,^[77] reports,^[78]^[79] model card.^[80]

References

^ ^a ^b Haddad, Mohammed. "How does GPT-4 work and how can you start using it in ChatGPT?". www.aljazeera.com.
^ ^a ^b "Generative AI: a game-changer society needs to be ready for". World Economic Forum.
^ ^a ^b ^c "The A to Z of Artificial Intelligence". Time. April 13, 2023.
^ Hu, Luhui (November 15, 2022). "Generative AI and Future". Medium.
^ "CSDL | IEEE Computer Society". www.computer.org.
^ ^a ^b ^c ^d "Improving language understanding with unsupervised learning". openai.com. June 11, 2018. Archived from the original on 2023-03-18. Retrieved 2023-03-18.
^ Toews, Rob. "The Next Generation Of Large Language Models". Forbes.
^ Mckendrick, Joe (March 13, 2023). "Most Jobs Soon To Be 'Influenced' By Artificial Intelligence, Research Out Of OpenAI And University Of Pennsylvania Suggests". Forbes.
^ "GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared". MUO. April 11, 2023.
^ Alford, Anthony (July 13, 2021). "EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J". InfoQ.
^ "News" (Press release).
^ Morrison, Ryan (7 March 2023). "Salesforce launches EinsteinGPT built with OpenAI technology". Tech Monitor.
^ "The ChatGPT of Finance is Here, Bloomberg is Combining AI and Fintech". Forbes.
^ Hinton (et-al), Geoffrey (October 15, 2012). "Deep neural networks for acoustic modeling in speech recognition" (PDF). IEEE Signal Processing Magazine. doi:10.1109/MSP.2012.2205597. S2CID 206485943.
^ "A tutorial survey of architectures, algorithms, and applications for deep learning". APSIPA Transactions on Signal and Information Processing. Cambridge.org. 2014-01-22. doi:10.1017/atsip.2013.9. S2CID 9928823. Retrieved 2023-05-21.
^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (December 5, 2017). "Attention Is All You Need". arXiv:1706.03762.
^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (May 24, 2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
^ Yang (et-al), Zhilin (2019). "XLNet" (PDF). Proceedings from NeurIPS 2019.
^ Naik, Amit Raja (September 23, 2021). "Google Introduces New Architecture To Reduce Cost Of Transformers". Analytics India Magazine.
^ ^a ^b ^c Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
^ Chen, Mark; Tworek, Jerry; Jun, Heewoo; Yuan, Qiming; Ponde de Oliveira Pinto, Henrique; Kaplan, Jared; Edwards, Harri; Burda, Yuri; Joseph, Nicholas; Brockman, Greg; Ray, Alex; Puri, Raul; Krueger, Gretchen; Petrov, Michael; Khlaaf, Heidy (2021-07-01). "Evaluating Large Language Models Trained on Code".
^ Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (2022-12-06). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744.
^ "New GPT-3 capabilities: Edit & insert". openai.com. Retrieved 2023-06-24.
^ Fu, Yao; Peng, Hao; Khot, Tushar (2022). "How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources". Yao Fu’s Notion.
^ "Model index for researchers". OpenAI API. Archived from the original on 23 Jun 2023. Retrieved 2023-06-23.
^ "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI.
^ ^a ^b OpenAI (2023). "GPT-4 Technical Report" (PDF). Archived (PDF) from the original on 2023-03-14. Retrieved 2023-03-16.
^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. IEEE International Conference on Computer Vision (ICCV) 2015. pp. 19–27. arXiv:1506.06724. Archived from the original on 2023-02-05. Retrieved 2023-02-07.
^ ^a ^b ^c ^d "ML input trends visualization". Epoch. Retrieved 2023-05-02.
^ Vincent, James (November 7, 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge.
^ ^a ^b ^c ^d Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4.
^ ^a ^b Ver Meer, Dave (June 1, 2023). "ChatGPT Statistics". NamePepper. Retrieved 2023-06-09.
^ Vincent, James (March 14, 2023). "Google opens up its AI language model PaLM to challenge OpenAI and GPT-3". The Verge.
^ "Google Opens Access to PaLM Language Model".
^ Iyer, Aparna (November 30, 2022). "Meet GPT-JT, the Closest Open Source Alternative to GPT-3". Analytics India Magazine.
^ "Meta Debuts AI Language Model, But It's Only for Researchers". PCMAG.
^ Islam, Arham (March 27, 2023). "Multimodal Language Models: The Future of Artificial Intelligence (AI)".
^ Islam, Arham (November 14, 2022). "How Do DALL·E 2, Stable Diffusion, and Midjourney Work?".
^ Saha, Shritama (January 4, 2023). "Google Launches Muse, A New Text-to-Image Transformer Model". Analytics India Magazine.
^ Wu (et-al), Chenfei (March 8, 2023). "Visual ChatGPT". arXiv:2303.04671 [cs.CV].
^ Bommasani (et-al), Rishi (July 12, 2022). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258 [cs.LG].
^ ^a ^b "Aligning language models to follow instructions". openai.com. Archived from the original on 23 March 2023. Retrieved 23 March 2023.
^ ^a ^b Ouyang, Long; Wu, Jeff; Jiang, Xu; et al. (4 March 2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
^ Ramnani, Meeta (January 28, 2022). "OpenAI dumps its own GPT-3 for something called InstructGPT, and for right reason". Analytics India Magazine.
^ "Stanford CRFM". crfm.stanford.edu.
^ "Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM". Databricks. April 12, 2023.
^ ^a ^b "Introducing ChatGPT". openai.com. Archived from the original on 2023-03-16. Retrieved 2023-03-16.
^ Wiggers, Kyle (May 4, 2023). "Microsoft doubles down on AI with new Bing features".
^ "ChatGPT vs. Bing vs. Google Bard: Which AI Is the Most Helpful?". CNET.
^ "Auto-GPT, BabyAGI, and AgentGPT: How to use AI agents". Mashable. April 19, 2023.
^ Marr, Bernard. "Auto-GPT May Be The Strong AI Tool That Surpasses ChatGPT". Forbes.
^ "Microsoft Open-Sources Multimodal Chatbot Visual ChatGPT". InfoQ.
^ Edwards, Benj (January 9, 2023). "Microsoft's new AI can simulate anyone's voice with 3 seconds of audio". Ars Technica.
^ Morrison, Ryan (March 7, 2023). "Salesforce launches EinsteinGPT built with OpenAI technology".
^ Leswing, Kif (April 13, 2023). "Bloomberg plans to integrate GPT-style A.I. into its terminal". CNBC.
^ "Learning nonprofit Khan Academy is piloting a version of GPT called Khanmigo". Fast Company. May 4, 2023. Retrieved May 22, 2023.
^ "Khan Academy Pilots GPT-4 Powered Tool Khanmigo for Teachers -". THE Journal.
^ Hachman, Mark (May 4, 2023). "Slack GPT will bring AI chatbots to your conversations". PCWorld.
^ Luo (et-al), Renqian (April 3, 2023). "BioGPT: Generative pre-trained transformer for biomedical text generation and mining". Briefings in Bioinformatics. 23 (6). arXiv:2210.10341. doi:10.1093/bib/bbac409. PMID 36156661.
^ "Know about ChatGPT's 13 best plugins, designed to improve your overall user experience – Latest Digital Transformation Trends | Cloud News | Wire19". May 5, 2023.
^ "ChatGPT plugins". openai.com.
^ "How to Use ChatGPT on Google Sheets With GPT for Sheets and Docs". MUO. March 12, 2023.
^ Asay, Matt (February 27, 2023). "Embrace and extend Excel for AI data prep". InfoWorld.
^ ^a ^b ^c ^d Hicks, William (May 10, 2023). "ChatGPT creator OpenAI is asking startups to remove 'GPT' from their names". www.bizjournals.com. Retrieved 2023-05-21.
^ OpenAI (April 24, 2023). "Brand Guidelines". Retrieved 21 May 2023.
^ ^a ^b Heah, Alexa (April 26, 2023). "OpenAI Unsuccessful At Speeding Up Its Attempt To Trademark 'GPT'". DesignTAXI. Retrieved May 21, 2023.
^ "OpenAI Wants to Trademark 'GPT' Amid Rise of AI Chatbots". Tech Times. April 25, 2023. Retrieved 2023-05-21.
^ "OpenAI files a UDRP case against the current owner of ChatGPT.com". Retrieved 2023-05-21.
^ ^a ^b Demcak, Tramatm-Igor (2023-04-26). "OpenAI's Battle for Brand Protection: Can GPT be trademarked?". Lexology. Archived from the original on May 5, 2023. Retrieved 2023-05-22.
^ Lawton, George (April 20, 2023). "ChatGPT vs. GPT: How are they different? | TechTarget". Enterprise AI. Archived from the original on May 9, 2023. Retrieved 2023-05-21.
^ Robb, Drew (2023-04-12). "GPT-4 vs. ChatGPT: AI Chatbot Comparison". eWEEK. Retrieved 2023-05-21.
^ Rheintgen, Husch Blackwell LLP-Kathleen A. (2013-08-16). "Branding 101: trademark descriptive fair use". Lexology. Retrieved 2023-05-21.
^ finetune-transformer-lm, OpenAI, June 11, 2018, retrieved 2023-05-01
^ "GPT-2: 1.5B release". openai.com. Retrieved 2023-05-01.
^ Solaiman, Irene; Brundage, Miles; Clark, Jack; Askell, Amanda; Herbert-Voss, Ariel; Wu, Jeff; Radford, Alec; Krueger, Gretchen; Kim, Jong Wook; Kreps, Sarah; McCain, Miles; Newhouse, Alex; Blazakis, Jason; McGuffie, Kris; Wang, Jasmine (2019-11-12). "Release Strategies and the Social Impacts of Language Models". arXiv:1908.09203 [cs.CL].
^ gpt-2, OpenAI, 2023-05-01, retrieved 2023-05-01
^ "GPT-4". openai.com. Retrieved 2023-05-01.
^ OpenAI (2023-03-27). "GPT-4 Technical Report". arXiv:2303.08774 [cs.CL].
^ Bubeck, Sébastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (2023-04-13). "Sparks of Artificial General Intelligence: Early experiments with GPT-4". arXiv:2303.12712 [cs.CL].
^ GPT-4 System Card, OpenAI, March 23 2023 (Accessed May 22 2023).

[:1-1] Haddad, Mohammed. "How does GPT-4 work and how can you start using it in ChatGPT?". www.aljazeera.com.

[:0-2] "Generative AI: a game-changer society needs to be ready for". World Economic Forum.

[:4-3] "The A to Z of Artificial Intelligence". Time. April 13, 2023.

[4] Hu, Luhui (November 15, 2022). "Generative AI and Future". Medium.

[5] "CSDL | IEEE Computer Society". www.computer.org.

[gpt1-6] "Improving language understanding with unsupervised learning". openai.com. June 11, 2018. Archived from the original on 2023-03-18. Retrieved 2023-03-18.

[7] Toews, Rob. "The Next Generation Of Large Language Models". Forbes.

[8] Mckendrick, Joe (March 13, 2023). "Most Jobs Soon To Be 'Influenced' By Artificial Intelligence, Research Out Of OpenAI And University Of Pennsylvania Suggests". Forbes.

[9] "GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared". MUO. April 11, 2023.

[10] Alford, Anthony (July 13, 2021). "EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J". InfoQ.

[11] "News" (Press release).

[12] Morrison, Ryan (7 March 2023). "Salesforce launches EinsteinGPT built with OpenAI technology". Tech Monitor.

[13] "The ChatGPT of Finance is Here, Bloomberg is Combining AI and Fintech". Forbes.

[14] Hinton (et-al), Geoffrey (October 15, 2012). "Deep neural networks for acoustic modeling in speech recognition" (PDF). IEEE Signal Processing Magazine. Digital Object Identifier 10.1109/MSP.2012.2205597. doi:10.1109/MSP.2012.2205597. S2CID 206485943.

[15] "A tutorial survey of architectures, algorithms, and applications for deep learning | APSIPA Transactions on Signal and Information Processing | Cambridge Core". Cambridge.org. 2014-01-22. doi:10.1017/atsip.2013.9. S2CID 9928823. Retrieved 2023-05-21. {{cite journal}}: Cite journal requires |journal= (help)

[16] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (December 5, 2017). "Attention Is All You Need". arXiv:1706.03762. {{cite journal}}: Cite journal requires |journal= (help)

[17] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (May 24, 2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2. {{cite journal}}: Cite journal requires |journal= (help)

[18] Yang (et-al), Zhilin (2019). "XLNet" (PDF). Proceedings from NeurIPS 2019.

[19] Naik, Amit Raja (September 23, 2021). "Google Introduces New Architecture To Reduce Cost Of Transformers". Analytics India Magazine.

[gpt1paper-20] Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.

[21] Chen, Mark; Tworek, Jerry; Jun, Heewoo; Yuan, Qiming; Ponde de Oliveira Pinto, Henrique; Kaplan, Jared; Edwards, Harri; Burda, Yuri; Joseph, Nicholas; Brockman, Greg; Ray, Alex; Puri, Raul; Krueger, Gretchen; Petrov, Michael; Khlaaf, Heidy (2021-07-01). "Evaluating Large Language Models Trained on Code". {{cite journal}}: Cite journal requires |journal= (help)

[22] Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (2022-12-06). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744.

[23] "New GPT-3 capabilities: Edit & insert". openai.com. Retrieved 2023-06-24.

[fu2022-24] Fu, Yao; Peng, Hao; Khot, Tushar (2022). "How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources". Yao Fu’s Notion.

[25] "Model index for researchers". OpenAI API. Archived from the original on 23 Jun 2023. Retrieved 2023-06-23.

[26] "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI.

[gpt4-report-27] OpenAI (2023). "GPT-4 Technical Report" (PDF). Archived (PDF) from the original on 2023-03-14. Retrieved 2023-03-16.

[28] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. IEEE International Conference on Computer Vision (ICCV) 2015. pp. 19–27. arXiv:1506.06724. Archived from the original on 2023-02-05. Retrieved 2023-02-07.

[:3-29] "ML input trends visualization". Epoch. Retrieved 2023-05-02.

[30] Vincent, James (November 7, 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge.

[:2-31] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4. {{cite journal}}: Cite journal requires |journal= (help)

[:8-32] Ver Meer, Dave (June 1, 2023). "ChatGPT Statistics". NamePepper. Retrieved 2023-06-09.

[33] Vincent, James (March 14, 2023). "Google opens up its AI language model PaLM to challenge OpenAI and GPT-3". The Verge.

[34] "Google Opens Access to PaLM Language Model".

[35] Iyer, Aparna (November 30, 2022). "Meet GPT-JT, the Closest Open Source Alternative to GPT-3". Analytics India Magazine.

[36] "Meta Debuts AI Language Model, But It's Only for Researchers". PCMAG.

[37] Islam, Arham (March 27, 2023). "Multimodal Language Models: The Future of Artificial Intelligence (AI)".

[38] Islam, Arham (November 14, 2022). "How Do DALL·E 2, Stable Diffusion, and Midjourney Work?".

[39] Saha, Shritama (January 4, 2023). "Google Launches Muse, A New Text-to-Image Transformer Model". Analytics India Magazine.

[40] Wu (et-al), Chenfei (March 8, 2023). "Visual ChatGPT". arXiv:2303.04671 [cs.CV].

[41] Bommasani (et-al), Rishi (July 12, 2022). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258 [cs.LG].

[instructgpt-blog-42] "Aligning language models to follow instructions". openai.com. Archived from the original on 23 March 2023. Retrieved 23 March 2023.

[instructgpt-paper-43] Ouyang, Long; Wu, Jeff; Jiang, Xu; et al. (4 March 2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155. {{cite journal}}: Cite journal requires |journal= (help)

[44] Ramnani, Meeta (January 28, 2022). "OpenAI dumps its own GPT-3 for something called InstructGPT, and for right reason". Analytics India Magazine.

[45] "Stanford CRFM". crfm.stanford.edu.

[46] "Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM". Databricks. April 12, 2023.

[chatgpt-blog-47] "Introducing ChatGPT". openai.com. Archived from the original on 2023-03-16. Retrieved 2023-03-16.

[48] Wiggers, Kyle (May 4, 2023). "Microsoft doubles down on AI with new Bing features".

[49] "ChatGPT vs. Bing vs. Google Bard: Which AI Is the Most Helpful?". CNET.

[50] "Auto-GPT, BabyAGI, and AgentGPT: How to use AI agents". Mashable. April 19, 2023.

[51] Marr, Bernard. "Auto-GPT May Be The Strong AI Tool That Surpasses ChatGPT". Forbes.

[52] "Microsoft Open-Sources Multimodal Chatbot Visual ChatGPT". InfoQ.

[53] Edwards, Benj (January 9, 2023). "Microsoft's new AI can simulate anyone's voice with 3 seconds of audio". Ars Technica.

[54] Morrison, Ryan (March 7, 2023). "Salesforce launches EinsteinGPT built with OpenAI technology".

[55] Leswing, Kif (April 13, 2023). "Bloomberg plans to integrate GPT-style A.I. into its terminal". CNBC.

[56] "Learning nonprofit Khan Academy is piloting a version of GPT called Khanmigo". Fast Company. May 4, 2023. Retrieved May 22, 2023.

[57] "Khan Academy Pilots GPT-4 Powered Tool Khanmigo for Teachers". THE Journal.

[58] Hachman, Mark (May 4, 2023). "Slack GPT will bring AI chatbots to your conversations". PCWorld.

[59] Luo, Renqian; et al. (April 3, 2023). "BioGPT: Generative pre-trained transformer for biomedical text generation and mining". Briefings in Bioinformatics. 23 (6). arXiv:2210.10341. doi:10.1093/bib/bbac409. PMID 36156661.

[60] "Know about ChatGPT's 13 best plugins, designed to improve your overall user experience". Wire19. May 5, 2023.

[61] "ChatGPT plugins". openai.com.

[62] "How to Use ChatGPT on Google Sheets With GPT for Sheets and Docs". MUO. March 12, 2023.

[63] Asay, Matt (February 27, 2023). "Embrace and extend Excel for AI data prep". InfoWorld.

[64] Hicks, William (May 10, 2023). "ChatGPT creator OpenAI is asking startups to remove 'GPT' from their names". www.bizjournals.com. Retrieved 2023-05-21.

[65] OpenAI (April 24, 2023). "Brand Guidelines". Retrieved 21 May 2023.

[66] Heah, Alexa (April 26, 2023). "OpenAI Unsuccessful At Speeding Up Its Attempt To Trademark 'GPT'". DesignTAXI. Retrieved May 21, 2023.

[67] "OpenAI Wants to Trademark 'GPT' Amid Rise of AI Chatbots". Tech Times. 2023-04-25. Retrieved 2023-05-21.

[68] "OpenAI files a UDRP case against the current owner of ChatGPT.com". Retrieved 2023-05-21.

[69] Demcak, Igor (Tramatm) (2023-04-26). "OpenAI's Battle for Brand Protection: Can GPT be trademarked?". Lexology. Archived from the original on May 5, 2023. Retrieved 2023-05-22.

[70] Lawton, George (April 20, 2023). "ChatGPT vs. GPT: How are they different?". TechTarget Enterprise AI. Archived from the original on May 9, 2023. Retrieved 2023-05-21.

[71] Robb, Drew (2023-04-12). "GPT-4 vs. ChatGPT: AI Chatbot Comparison". eWEEK. Retrieved 2023-05-21.

[72] Rheintgen, Kathleen A. (Husch Blackwell LLP) (2013-08-16). "Branding 101: trademark descriptive fair use". Lexology. Retrieved 2023-05-21.

[73] finetune-transformer-lm, OpenAI, June 11, 2018, retrieved 2023-05-01

[74] "GPT-2: 1.5B release". openai.com. Retrieved 2023-05-01.

[75] Solaiman, Irene; Brundage, Miles; Clark, Jack; Askell, Amanda; Herbert-Voss, Ariel; Wu, Jeff; Radford, Alec; Krueger, Gretchen; Kim, Jong Wook; Kreps, Sarah; McCain, Miles; Newhouse, Alex; Blazakis, Jason; McGuffie, Kris; Wang, Jasmine (2019-11-12). "Release Strategies and the Social Impacts of Language Models". arXiv:1908.09203 [cs.CL].

[76] gpt-2, OpenAI, 2023-05-01, retrieved 2023-05-01

[77] "GPT-4". openai.com. Retrieved 2023-05-01.

[78] OpenAI (2023-03-27). "GPT-4 Technical Report". arXiv:2303.08774 [cs.CL].

[79] Bubeck, Sébastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (2023-04-13). "Sparks of Artificial General Intelligence: Early experiments with GPT-4". arXiv:2303.12712 [cs.CL].

[80] OpenAI (March 23, 2023). "GPT-4 System Card". Retrieved May 22, 2023.

