GitHub Copilot: Difference between revisions
m Added a single line noting that Copilot has used GPT-4 as base model since November 2023, cited Github blog |
Restored revision 1244006298 by Citation bot (talk): Rm spam |
||
(21 intermediate revisions by 16 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Artificial intelligence tool}} |
{{Short description|Artificial intelligence tool}} |
||
{{Dist|Microsoft Copilot}} |
|||
{{Use dmy dates|date=April 2022}} |
{{Use dmy dates|date=April 2022}} |
||
{{infobox software |
{{infobox software |
||
Line 10: | Line 11: | ||
| latest release version = 1.7.4421 |
| latest release version = 1.7.4421 |
||
}} |
}} |
||
'''GitHub Copilot''' is a [[code completion]] |
'''GitHub Copilot''' is a [[code completion]] and [[automatic programming]] tool developed by [[GitHub]] and [[OpenAI]] that assists users of [[Visual Studio Code]], [[Visual Studio]], [[Neovim]], and [[JetBrains]] [[integrated development environment]]s (IDEs) by [[autocomplete|autocompleting]] code.<ref name=":0">{{cite web |last1=Gershgorn |first1=Dave |date=29 June 2021 |title=GitHub and OpenAI launch a new AI tool that generates its own code |url=https://www.theverge.com/2021/6/29/22555777/github-openai-ai-tool-autocomplete-code |access-date=6 July 2021 |website=[[The Verge]] |language=en-US}}</ref> Currently available by subscription to individual developers and to businesses, the [[generative artificial intelligence]] software was first announced by GitHub on 29 June 2021, and works best for users coding in [[Python (programming language)|Python]], [[JavaScript]], [[TypeScript]], [[Ruby (programming language)|Ruby]], and [[Go (programming language)|Go]].<ref name=":2">{{Cite web |title=GitHub Copilot · Your AI pair programmer |url=https://copilot.github.com/ |access-date=7 April 2022 |website=GitHub Copilot |language=en-US}}</ref> In March 2023 GitHub announced plans for "Copilot X", which will incorporate a [[chatbot]] based on [[GPT-4]], as well as support for voice commands, into Copilot.<ref>{{cite web |title=GitHub Copilot gets a new ChatGPT-like assistant to help developers write and fix code |url=https://www.theverge.com/2023/3/22/23651456/github-copilot-x-gpt-4-code-chat-voice-support |website=The Verge |date=22 March 2023 |access-date=5 September 2023}}</ref> |
||
== History == |
== History == |
||
On June 29, 2021, GitHub announced GitHub Copilot for technical preview in the Visual Studio Code development environment.<ref name=":0" /><ref>{{Cite web |date=29 June 2021 |title=Introducing GitHub Copilot: your AI pair programmer |url=https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/ |access-date=7 April 2022 |website=The GitHub Blog |language=en-US}}</ref> GitHub Copilot was released as a [[Plug-in (computing)|plugin]] on the JetBrains marketplace on October 29, 2021.<ref>{{Cite web |title=GitHub Copilot - IntelliJ IDEs Plugin {{!}} Marketplace |url=https://plugins.jetbrains.com/plugin/17718-github-copilot/versions/stable |access-date=7 April 2022 |website=JetBrains Marketplace}}</ref> October 27, 2021, GitHub released the GitHub Copilot Neovim plugin as a public repository.<ref>{{Citation |title=Copilot.vim |date=7 April 2022 |url=https://github.com/github/copilot.vim |publisher=GitHub |access-date=7 April 2022}}</ref> GitHub announced Copilot's availability for the Visual Studio 2022 IDE on March 29, 2022.<ref>{{Cite web |date=29 March 2022 |title=GitHub Copilot now available for Visual Studio 2022 |url=https://github.blog/2022-03-29-github-copilot-now-available-for-visual-studio-2022/ |access-date=7 April 2022 |website=The GitHub Blog |language=en-US}}</ref> On June 21, 2022, GitHub announced that Copilot was out of "technical preview", and is available as a subscription-based service for individual developers.<ref>{{Cite web |date=21 June 2022 |title=GitHub Copilot is generally available to all developers |url=https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers/ |access-date=21 June 2022 |website=The GitHub Blog |language=en-US}}</ref> |
On June 29, 2021, GitHub announced GitHub Copilot for technical preview in the Visual Studio Code development environment.<ref name=":0" /><ref>{{Cite web |date=29 June 2021 |title=Introducing GitHub Copilot: your AI pair programmer |url=https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/ |access-date=7 April 2022 |website=The GitHub Blog |language=en-US}}</ref> GitHub Copilot was released as a [[Plug-in (computing)|plugin]] on the JetBrains marketplace on October 29, 2021.<ref>{{Cite web |title=GitHub Copilot - IntelliJ IDEs Plugin {{!}} Marketplace |url=https://plugins.jetbrains.com/plugin/17718-github-copilot/versions/stable |access-date=7 April 2022 |website=JetBrains Marketplace}}</ref> October 27, 2021, GitHub released the GitHub Copilot Neovim plugin as a public repository.<ref>{{Citation |title=Copilot.vim |date=7 April 2022 |url=https://github.com/github/copilot.vim |publisher=GitHub |access-date=7 April 2022}}</ref> GitHub announced Copilot's availability for the Visual Studio 2022 IDE on March 29, 2022.<ref>{{Cite web |date=29 March 2022 |title=GitHub Copilot now available for Visual Studio 2022 |url=https://github.blog/2022-03-29-github-copilot-now-available-for-visual-studio-2022/ |access-date=7 April 2022 |website=The GitHub Blog |language=en-US}}</ref> On June 21, 2022, GitHub announced that Copilot was out of "technical preview", and is available as a subscription-based service for individual developers.<ref>{{Cite web |date=21 June 2022 |title=GitHub Copilot is generally available to all developers |url=https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers/ |access-date=21 June 2022 |website=The GitHub Blog |language=en-US}}</ref> |
||
GitHub Copilot is the evolution of the 'Bing Code Search' plugin for Visual Studio 2013, which was a Microsoft Research project released in February 2014.<ref>{{Cite web |last=Lardinois |first=Frederic |date=2014-02-17 |title=Microsoft Launches Smart Visual Studio Add-On For Code Snippet Search |url=https://techcrunch.com/2014/02/17/microsoft-launches-smart-visual-studio-add-on-for-code-snippet-search/ |access-date=2023-09-05 |website=TechCrunch |language=en-US}}</ref> This plugin integrated with various sources, including MSDN and |
GitHub Copilot is the evolution of the 'Bing Code Search' plugin for Visual Studio 2013, which was a Microsoft Research project released in February 2014.<ref>{{Cite web |last=Lardinois |first=Frederic |date=2014-02-17 |title=Microsoft Launches Smart Visual Studio Add-On For Code Snippet Search |url=https://techcrunch.com/2014/02/17/microsoft-launches-smart-visual-studio-add-on-for-code-snippet-search/ |access-date=2023-09-05 |website=TechCrunch |language=en-US}}</ref> This plugin integrated with various sources, including MSDN and Stack Overflow, to provide high-quality contextually relevant code snippets in response to natural language queries.<ref>{{Cite web |date=2014-02-11 |title=Bing Code Search |url=https://www.microsoft.com/en-us/research/video/bing-code-search/ |access-date=2023-09-05 |website=Microsoft Research |language=en-US}}</ref> |
||
==Features== |
==Features== |
||
When provided with a programming problem in [[natural language]], Copilot is capable of generating solution code.<ref name=":1">{{Cite book |last1=Finnie-Ansley |first1=James |last2=Denny |first2=Paul |last3=Becker |first3=Brett A. |last4=Luxton-Reilly |first4=Andrew |last5=Prather |first5=James |title=Australasian Computing Education Conference |chapter=The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming |date=14 February 2022 |series=ACE '22 |language=en-US |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=10–19 |doi=10.1145/3511861.3511863 |isbn=978-1-4503-9643-1 |s2cid=246681316 |doi-access=free}}</ref> It is also able to describe input code in [[English language|English]] and translate code between programming languages.<ref name=":1" /> |
When provided with a programming problem in [[natural language]], Copilot is capable of generating solution code.<ref name=":1">{{Cite book |last1=Finnie-Ansley |first1=James |last2=Denny |first2=Paul |last3=Becker |first3=Brett A. |last4=Luxton-Reilly |first4=Andrew |last5=Prather |first5=James |title=Australasian Computing Education Conference |chapter=The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming |date=14 February 2022 |series=ACE '22 |language=en-US |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=10–19 |doi=10.1145/3511861.3511863 |isbn=978-1-4503-9643-1 |s2cid=246681316 |doi-access=free}}</ref> It is also able to describe input code in [[English language|English]] and translate code between programming languages.<ref name=":1" /> |
||
According to its website, GitHub Copilot includes assistive features for programmers, such as the conversion of [[Comment (computer programming)|code comments]] to runnable code, and autocomplete for chunks of code, repetitive sections of code, and entire [[Method (computer programming)|methods]] and/or [[Functions (programming)|functions]].<ref name=":2" |
According to its website, GitHub Copilot includes assistive features for programmers, such as the conversion of [[Comment (computer programming)|code comments]] to runnable code, and autocomplete for chunks of code, repetitive sections of code, and entire [[Method (computer programming)|methods]] and/or [[Functions (programming)|functions]].<ref name=":2"/><ref>{{Cite journal |last1=Sobania |first1=Dominik |last2=Schweim |first2=Dirk |last3=Rothlauf |first3=Franz |date=2022 |title=A Comprehensive Survey on Program Synthesis with Evolutionary Algorithms |url=https://ieeexplore.ieee.org/document/9743417 |journal=IEEE Transactions on Evolutionary Computation |volume=27 |pages=82–97 |doi=10.1109/TEVC.2022.3162324 |s2cid=247721793 |issn=1941-0026}}</ref> GitHub reports that Copilot’s autocomplete feature is accurate roughly half of the time; with some Python function header code, for example, Copilot correctly autocompleted the rest of the function body code 43% of the time on the first try and 57% of the time after ten attempts.<ref name=":2" /> |
||
GitHub states that Copilot’s features allow programmers to navigate unfamiliar coding [[Software framework|frameworks]] and languages by reducing the amount of time users spend reading [[documentation]].<ref name=":2" /> |
GitHub states that Copilot’s features allow programmers to navigate unfamiliar coding [[Software framework|frameworks]] and languages by reducing the amount of time users spend reading [[documentation]].<ref name=":2" /> |
||
Line 27: | Line 28: | ||
GitHub Copilot was initially powered by the [[OpenAI Codex]],<ref>{{Cite web |last=Krill |first=Paul |date=12 August 2021 |title=OpenAI offers API for GitHub Copilot AI model |url=https://www.infoworld.com/article/3629469/openai-offers-api-for-github-copilot-ai-model.html |access-date=7 April 2022 |website=InfoWorld |language=en}}</ref> which is a modified, production version of the [[GPT-3|Generative Pre-trained Transformer 3]] (GPT-3), a language model using [[Deep learning|deep-learning]] to produce human-like text.<ref>{{Cite web |date=3 June 2020 |title=OpenAI Releases GPT-3, The Largest Model So Far |url=https://analyticsindiamag.com/open-ai-gpt-3-language-model/ |access-date=7 April 2022 |website=Analytics India Magazine |language=en-US}}</ref> The Codex model is additionally trained on gigabytes of source code in a dozen programming languages. |
GitHub Copilot was initially powered by the [[OpenAI Codex]],<ref>{{Cite web |last=Krill |first=Paul |date=12 August 2021 |title=OpenAI offers API for GitHub Copilot AI model |url=https://www.infoworld.com/article/3629469/openai-offers-api-for-github-copilot-ai-model.html |access-date=7 April 2022 |website=InfoWorld |language=en}}</ref> which is a modified, production version of the [[GPT-3|Generative Pre-trained Transformer 3]] (GPT-3), a language model using [[Deep learning|deep-learning]] to produce human-like text.<ref>{{Cite web |date=3 June 2020 |title=OpenAI Releases GPT-3, The Largest Model So Far |url=https://analyticsindiamag.com/open-ai-gpt-3-language-model/ |access-date=7 April 2022 |website=Analytics India Magazine |language=en-US}}</ref> The Codex model is additionally trained on gigabytes of source code in a dozen programming languages. |
||
Copilot’s OpenAI Codex is trained on a selection of the English language, public GitHub repositories, and other publicly available source code.<ref name=":2" |
Copilot’s OpenAI Codex is trained on a selection of the English language, public GitHub repositories, and other publicly available source code.<ref name=":2"/> This includes a filtered dataset of 159 [[gigabyte]]s of Python code sourced from 54 million public GitHub repositories.<ref>{{Cite web |title=OpenAI Announces 12 Billion Parameter Code-Generation AI Codex |url=https://www.infoq.com/news/2021/08/openai-codex/ |access-date=7 April 2022 |website=InfoQ |language=en}}</ref> |
||
OpenAI’s GPT-3 is licensed exclusively to [[Microsoft]], GitHub’s [[Parent Company|parent company]].<ref>{{Cite web |title=OpenAI is giving Microsoft exclusive access to its GPT-3 language model |url=https://www.technologyreview.com/2020/09/23/1008729/openai-is-giving-microsoft-exclusive-access-to-its-gpt-3-language-model/ |access-date=7 April 2022 |website=MIT Technology Review |language=en}}</ref> |
OpenAI’s GPT-3 is licensed exclusively to [[Microsoft]], GitHub’s [[Parent Company|parent company]].<ref>{{Cite web |title=OpenAI is giving Microsoft exclusive access to its GPT-3 language model |url=https://www.technologyreview.com/2020/09/23/1008729/openai-is-giving-microsoft-exclusive-access-to-its-gpt-3-language-model/ |access-date=7 April 2022 |website=MIT Technology Review |language=en}}</ref> |
||
In November 2023, Copilot was updated to use OpenAI's [[GPT-4]] model |
In November 2023, Copilot Chat was updated to use OpenAI's [[GPT-4]] model.<ref>{{cite web | url=https://github.blog/changelog/2023-11-30-github-copilot-november-30th-update/ | title=GitHub Copilot – November 30th Update · GitHub Changelog | date=30 November 2023 }}</ref> |
||
== Reception == |
== Reception == |
||
Since Copilot's release, there have been concerns with its security and educational impact, as well as licensing controversy surrounding the code it produces.<ref name="Verge legal" /><ref name=":1" /><ref name=":4">{{cite arXiv |last1=Pearce |first1=Hammond |last2=Ahmad |first2=Baleegh |last3=Tan |first3=Benjamin |last4=Dolan-Gavitt |first4=Brendan |last5=Karri |first5=Ramesh |date=16 December 2021 |title=Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions |class=cs.CR |eprint=2108.09293 }}</ref> |
Since Copilot's release, there have been concerns with its security and educational impact, as well as licensing controversy surrounding the code it produces.<ref name="Verge legal" /><ref name=":1" /><ref name=":4">{{cite arXiv |last1=Pearce |first1=Hammond |last2=Ahmad |first2=Baleegh |last3=Tan |first3=Benjamin |last4=Dolan-Gavitt |first4=Brendan |last5=Karri |first5=Ramesh |date=16 December 2021 |title=Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions |class=cs.CR |eprint=2108.09293 }}</ref> |
||
Line 39: | Line 38: | ||
=== Licensing controversy === |
=== Licensing controversy === |
||
While GitHub CEO Nat Friedman stated in June 2021 that "training ML systems on public data is [[fair use]]",<ref>{{Cite tweet |user=natfriedman|number=1409914420579344385|author=Nat Friedman|title=In general: (1) training ML systems on public data is fair use|access-date=2023-02-23 |website=Twitter |language=en|archive-url=https://web.archive.org/web/20210630043243/https://twitter.com/natfriedman/status/1409914420579344385|archive-date=2021-06-30}}</ref> a [[class-action lawsuit]] filed in November 2022 called this "pure speculation", asserting that "no Court has considered the question of |
While GitHub CEO [[Nat Friedman]] stated in June 2021 that "training ML systems on public data is [[fair use]]",<ref>{{Cite tweet |user=natfriedman|number=1409914420579344385|author=Nat Friedman|title=In general: (1) training ML systems on public data is fair use|access-date=2023-02-23 |website=Twitter |language=en|archive-url=https://web.archive.org/web/20210630043243/https://twitter.com/natfriedman/status/1409914420579344385|archive-date=2021-06-30}}</ref> a [[class-action lawsuit]] filed in November 2022 called this "pure speculation", asserting that "no Court has considered the question of |
||
whether 'training ML systems on public data is fair use.'"<ref name="class action suit">{{cite web |last1=Butterick |first1=Matthew |title=GitHub Copilot litigation |url=https://githubcopilotlitigation.com/ |website=githubcopilotlitigation.com |publisher=Joseph Saveri Law Firm |access-date=12 February 2023 |date=November 3, 2022|archive-url=https://web.archive.org/web/20221103204107/https://githubcopilotlitigation.com/pdf/1-0-github_complaint.pdf|archive-date=2022-11-03}}</ref> The lawsuit from Joseph Saveri Law Firm, LLP challenges the legality of Copilot on several claims, ranging from breach of contract with GitHub's users, to breach of privacy under the [[California Consumer Privacy Act|CCPA]] for sharing [[Personal data|PII]].<ref name="Verge class action">{{Cite web |last=Vincent |first=James |date=2022-11-08 |title=The lawsuit that could rewrite the rules of AI copyright |url=https://www.theverge.com/2022/11/8/23446821/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data |access-date=2022-12-07 |website=The Verge |language=en-US}}</ref><ref name="class action suit"/> |
whether 'training ML systems on public data is fair use.'"<ref name="class action suit">{{cite web |last1=Butterick |first1=Matthew |title=GitHub Copilot litigation |url=https://githubcopilotlitigation.com/ |website=githubcopilotlitigation.com |publisher=Joseph Saveri Law Firm |access-date=12 February 2023 |date=November 3, 2022|archive-url=https://web.archive.org/web/20221103204107/https://githubcopilotlitigation.com/pdf/1-0-github_complaint.pdf|archive-date=2022-11-03}}</ref> The lawsuit from Joseph Saveri Law Firm, LLP challenges the legality of Copilot on several claims, ranging from breach of contract with GitHub's users, to breach of privacy under the [[California Consumer Privacy Act|CCPA]] for sharing [[Personal data|PII]].<ref name="Verge class action">{{Cite web |last=Vincent |first=James |date=2022-11-08 |title=The lawsuit that could rewrite the rules of AI copyright |url=https://www.theverge.com/2022/11/8/23446821/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data |access-date=2022-12-07 |website=The Verge |language=en-US}}</ref><ref name="class action suit"/> |
||
Line 48: | Line 47: | ||
=== Privacy concerns === |
=== Privacy concerns === |
||
The Copilot service is cloud-based and requires continuous communication with the GitHub Copilot servers.<ref>{{cite web |title=GitHub Copilot - Your AI pair programmer |url=https://github.com/features/copilot/#faq-privacy |website=GitHub |access-date=18 October 2022}}</ref> This opaque architecture has fueled concerns over telemetry and data mining of individual keystrokes.<ref>{{cite web |title=CoPilot: Privacy & DataMining |url=https://github.com/community/community/discussions/7263 |website=GitHub |access-date=18 October 2022}}</ref><ref>{{cite web |last=Stallman|first=Richard|author-link=Richard Stallman|title=Who does that server really serve?|url=https://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html |website=gnu.org |access-date=18 Oct 2022}}</ref> |
The Copilot service is [[cloud-based]] and requires continuous communication with the GitHub Copilot servers.<ref>{{cite web |title=GitHub Copilot - Your AI pair programmer |url=https://github.com/features/copilot/#faq-privacy |website=GitHub |access-date=18 October 2022}}</ref> This opaque architecture has fueled concerns over [[telemetry]] and data mining of individual keystrokes.<ref>{{cite web |title=CoPilot: Privacy & DataMining |url=https://github.com/community/community/discussions/7263 |website=GitHub |access-date=18 October 2022}}</ref><ref>{{cite web |last=Stallman|first=Richard|author-link=Richard Stallman|title=Who does that server really serve?|url=https://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html |website=gnu.org |access-date=18 Oct 2022}}</ref> |
||
=== Security concerns with direct use of model output without oversight or testing === |
=== Security concerns with direct use of model output without oversight or testing === |
||
Line 54: | Line 53: | ||
=== Education concerns === |
=== Education concerns === |
||
A February 2022 paper released by the [[Association for Computing Machinery]] evaluates the impact Codex, the technology used by |
A February 2022 paper released by the [[Association for Computing Machinery]] evaluates the impact Codex, the technology used by GitHub Copilot, may have on the education of novice programmers.<ref name=":1" /> The study utilizes assessment questions from an introductory programming class at the [[University of Auckland]] and compares Codex’s responses with student performance.<ref name=":1" /> Researchers found that Codex, on average, performed better than most students; however, its performance decreased on questions that limited what features could be used in the solution (e.g., [[Conditional (computer programming)|conditionals]], [[Collection (abstract data type)|collections]], and [[For loop|loops]]).<ref name=":1" /> Given this type of problem, "only two of [Codex’s] 10 solutions produced the correct output, but both [...] violated [the] constraint." The paper concludes that Codex may be useful in providing a variety of solutions to learners, but may also lead to over-reliance and plagiarism.<ref name=":1" /> |
||
== See also == |
== See also == |
||
{{Div col|colwidth= |
{{Div col|colwidth=15em|content= |
||
* [[ |
* [[ChatGPT]] |
||
* [[Code completion]] |
* [[Code completion]] |
||
* [[ChatGPT]] |
|||
* [[Generative AI]] |
* [[Generative AI]] |
||
* [[Devin AI]] |
* [[Devin AI]] |
||
* [[Microsoft Copilot]] |
|||
}} |
|||
==References== |
==References== |
Latest revision as of 10:07, 28 October 2024
Developer(s) | GitHub, OpenAI |
---|---|
Initial release | October 2021 |
Stable release | 1.7.4421 |
Operating system | Microsoft Windows, Linux, macOS, Web |
Website | copilot.github.com |
GitHub Copilot is a code completion and automatic programming tool developed by GitHub and OpenAI that assists users of Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs) by autocompleting code.[1] Currently available by subscription to individual developers and to businesses, the generative artificial intelligence software was first announced by GitHub on 29 June 2021, and works best for users coding in Python, JavaScript, TypeScript, Ruby, and Go.[2] In March 2023 GitHub announced plans for "Copilot X", which will incorporate a chatbot based on GPT-4, as well as support for voice commands, into Copilot.[3]
History
On June 29, 2021, GitHub announced GitHub Copilot for technical preview in the Visual Studio Code development environment.[1][4] GitHub Copilot was released as a plugin on the JetBrains marketplace on October 29, 2021.[5] On October 27, 2021, GitHub released the GitHub Copilot Neovim plugin as a public repository.[6] GitHub announced Copilot's availability for the Visual Studio 2022 IDE on March 29, 2022.[7] On June 21, 2022, GitHub announced that Copilot was out of "technical preview" and was available as a subscription-based service for individual developers.[8]
GitHub Copilot is the evolution of the 'Bing Code Search' plugin for Visual Studio 2013, which was a Microsoft Research project released in February 2014.[9] This plugin integrated with various sources, including MSDN and Stack Overflow, to provide high-quality contextually relevant code snippets in response to natural language queries.[10]
Features
When provided with a programming problem in natural language, Copilot is capable of generating solution code.[11] It is also able to describe input code in English and translate code between programming languages.[11]
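As a hypothetical illustration of this kind of interaction, the sketch below pairs a natural-language problem statement, written as a comment, with the sort of solution a completion tool might propose. It was hand-written for this example and is not actual Copilot output; the problem and the function name `most_common_words` are assumptions.

```python
# Prompt (natural language): return the n most common words in a text,
# ignoring case and punctuation.
import re
from collections import Counter


def most_common_words(text, n):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase the text and keep only runs of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(n)


print(most_common_words("The cat and the hat. The end!", 1))
```

A suggestion of this kind still has to be reviewed and tested by the programmer before it is relied on.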
According to its website, GitHub Copilot includes assistive features for programmers, such as the conversion of code comments to runnable code, and autocomplete for chunks of code, repetitive sections of code, and entire methods and/or functions.[2][12] GitHub reports that Copilot’s autocomplete feature is accurate roughly half of the time; with some Python function header code, for example, Copilot correctly autocompleted the rest of the function body code 43% of the time on the first try and 57% of the time after ten attempts.[2]
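The two figures can be read as success rates under different attempt budgets. The sketch below shows one way such rates could be tallied, assuming each problem comes with an ordered list of pass/fail outcomes for its attempts. The data and the function name `solved_within` are made up for illustration and do not reproduce GitHub's evaluation.

```python
def solved_within(results, k):
    """Fraction of problems with at least one passing attempt among the first k."""
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Hypothetical outcomes: one list per problem, True means the attempt passed.
results = [
    [True, False, True],
    [False, False, True],
    [False, False, False],
    [False, True, False],
]

print(solved_within(results, 1))  # share solved on the first try
print(solved_within(results, 3))  # share solved within three attempts
```

Under this definition the rate can only stay the same or rise as the attempt budget grows, which is consistent with the reported first-try figure being lower than the ten-attempt figure.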
GitHub states that Copilot’s features allow programmers to navigate unfamiliar coding frameworks and languages by reducing the amount of time users spend reading documentation.[2]
Implementation
GitHub Copilot was initially powered by OpenAI Codex,[13] which is a modified, production version of the Generative Pre-trained Transformer 3 (GPT-3), a language model using deep learning to produce human-like text.[14] The Codex model is additionally trained on gigabytes of source code in a dozen programming languages.
Copilot’s OpenAI Codex is trained on a selection of the English language, public GitHub repositories, and other publicly available source code.[2] This includes a filtered dataset of 159 gigabytes of Python code sourced from 54 million public GitHub repositories.[15]
OpenAI’s GPT-3 is licensed exclusively to Microsoft, GitHub’s parent company.[16]
In November 2023, Copilot Chat was updated to use OpenAI's GPT-4 model.[17]
Reception
Since Copilot's release, there have been concerns about its security and educational impact, as well as controversy over the licensing of the code it produces.[18][11][19]
Licensing controversy
While GitHub CEO Nat Friedman stated in June 2021 that "training ML systems on public data is fair use",[20] a class-action lawsuit filed in November 2022 called this "pure speculation", asserting that "no Court has considered the question of whether 'training ML systems on public data is fair use.'"[21] The lawsuit from Joseph Saveri Law Firm, LLP challenges the legality of Copilot on several claims, ranging from breach of contract with GitHub's users, to breach of privacy under the CCPA for sharing PII.[22][21]
GitHub admits that a small proportion of the tool's output may be copied verbatim, which has led to fears that the output code is insufficiently transformative to be classified as fair use and may infringe on the copyright of the original owner.[18] In June 2022, the Software Freedom Conservancy announced it would end all uses of GitHub in its own projects,[23] accusing Copilot of ignoring code licenses used in training data.[24] In a customer-support message, GitHub stated that "training machine learning models on publicly available data is considered fair use across the machine learning community",[21] but the class action lawsuit called this "false" and additionally noted that "regardless of this concept's level of acceptance in 'the machine learning community,' under Federal law, it is illegal".[21]
FSF white papers
On July 28, 2021, the Free Software Foundation (FSF) published a funded call for white papers on philosophical and legal questions around Copilot.[25] Donald Robertson, the Licensing and Compliance Manager of the FSF, stated that "Copilot raises many [...] questions which require deeper examination."[25] On February 24, 2022, the FSF announced that it had received 22 papers on the subject and, using an anonymous review process, had chosen five papers to highlight.[26]
Privacy concerns
The Copilot service is cloud-based and requires continuous communication with the GitHub Copilot servers.[27] This opaque architecture has fueled concerns over telemetry and data mining of individual keystrokes.[28][29]
Security concerns with direct use of model output without oversight or testing
A paper accepted for publication in the IEEE Symposium on Security and Privacy in 2022 assessed the security of code generated by Copilot against MITRE’s top 25 code weakness enumerations (e.g., cross-site scripting, path traversal) across 89 different scenarios and 1,689 programs.[19] This was done along the axes of diversity of weaknesses (its ability to respond to scenarios that may lead to various code weaknesses), diversity of prompts (its ability to respond to the same code weakness with subtle variation), and diversity of domains (its ability to generate register transfer level hardware specifications in Verilog).[19] The study found that across these axes in multiple languages, 39.33% of top suggestions and 40.73% of total suggestions led to code vulnerabilities. The authors also found that small, non-semantic changes made to code (i.e., changes to comments) could affect code safety.[19]
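To make one of the weakness classes named above concrete, the sketch below contrasts a file lookup that is open to path traversal with a checked version. It is a hand-written illustration, not code from the study and not Copilot output; the function names and the base directory are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_unchecked(base_dir, user_path):
    """Weak: joins user input to the base directory without validation."""
    return (Path(base_dir) / user_path).resolve()


def resolve_checked(base_dir, user_path):
    """Safer: rejects any path that resolves outside the base directory."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path}")
    return target


# Hypothetical upload directory used only for this example.
base = "/srv/app/uploads"
print(resolve_unchecked(base, "../../etc/passwd"))  # lands outside the base directory
print(resolve_checked(base, "report.txt"))          # stays inside the base directory
```

An automated suggestion resembling the first function would run without error, which is why the study's emphasis on oversight and testing of generated code matters.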
Education concerns
A February 2022 paper released by the Association for Computing Machinery evaluated the impact that Codex, the technology used by GitHub Copilot, may have on the education of novice programmers.[11] The study used assessment questions from an introductory programming class at the University of Auckland and compared Codex’s responses with student performance.[11] Researchers found that Codex, on average, performed better than most students; however, its performance decreased on questions that limited what features could be used in the solution (e.g., conditionals, collections, and loops).[11] Given this type of problem, "only two of [Codex’s] 10 solutions produced the correct output, but both [...] violated [the] constraint." The paper concluded that Codex may be useful in providing a variety of solutions to learners, but may also lead to over-reliance and plagiarism.[11]
See also
References
1. Gershgorn, Dave (29 June 2021). "GitHub and OpenAI launch a new AI tool that generates its own code". The Verge. Retrieved 6 July 2021.
2. "GitHub Copilot · Your AI pair programmer". GitHub Copilot. Retrieved 7 April 2022.
3. "GitHub Copilot gets a new ChatGPT-like assistant to help developers write and fix code". The Verge. 22 March 2023. Retrieved 5 September 2023.
4. "Introducing GitHub Copilot: your AI pair programmer". The GitHub Blog. 29 June 2021. Retrieved 7 April 2022.
5. "GitHub Copilot - IntelliJ IDEs Plugin | Marketplace". JetBrains Marketplace. Retrieved 7 April 2022.
6. Copilot.vim, GitHub, 7 April 2022, retrieved 7 April 2022
7. "GitHub Copilot now available for Visual Studio 2022". The GitHub Blog. 29 March 2022. Retrieved 7 April 2022.
8. "GitHub Copilot is generally available to all developers". The GitHub Blog. 21 June 2022. Retrieved 21 June 2022.
9. Lardinois, Frederic (17 February 2014). "Microsoft Launches Smart Visual Studio Add-On For Code Snippet Search". TechCrunch. Retrieved 5 September 2023.
10. "Bing Code Search". Microsoft Research. 11 February 2014. Retrieved 5 September 2023.
11. Finnie-Ansley, James; Denny, Paul; Becker, Brett A.; Luxton-Reilly, Andrew; Prather, James (14 February 2022). "The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming". Australasian Computing Education Conference. ACE '22. New York, NY, USA: Association for Computing Machinery. pp. 10–19. doi:10.1145/3511861.3511863. ISBN 978-1-4503-9643-1. S2CID 246681316.
12. Sobania, Dominik; Schweim, Dirk; Rothlauf, Franz (2022). "A Comprehensive Survey on Program Synthesis with Evolutionary Algorithms". IEEE Transactions on Evolutionary Computation. 27: 82–97. doi:10.1109/TEVC.2022.3162324. ISSN 1941-0026. S2CID 247721793.
13. Krill, Paul (12 August 2021). "OpenAI offers API for GitHub Copilot AI model". InfoWorld. Retrieved 7 April 2022.
14. "OpenAI Releases GPT-3, The Largest Model So Far". Analytics India Magazine. 3 June 2020. Retrieved 7 April 2022.
15. "OpenAI Announces 12 Billion Parameter Code-Generation AI Codex". InfoQ. Retrieved 7 April 2022.
16. "OpenAI is giving Microsoft exclusive access to its GPT-3 language model". MIT Technology Review. Retrieved 7 April 2022.
17. "GitHub Copilot – November 30th Update · GitHub Changelog". 30 November 2023.
18. "GitHub's automatic coding tool rests on untested legal ground". The Verge. 7 July 2021. Retrieved 11 July 2021.
19. Pearce, Hammond; Ahmad, Baleegh; Tan, Benjamin; Dolan-Gavitt, Brendan; Karri, Ramesh (16 December 2021). "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions". arXiv:2108.09293 [cs.CR].
20. Nat Friedman [@natfriedman] (29 June 2021). "In general: (1) training ML systems on public data is fair use" (Tweet). Archived from the original on 30 June 2021. Retrieved 23 February 2023 – via Twitter.
21. Butterick, Matthew (3 November 2022). "GitHub Copilot litigation" (PDF). githubcopilotlitigation.com. Joseph Saveri Law Firm. Archived from the original on 3 November 2022. Retrieved 12 February 2023.
22. Vincent, James (8 November 2022). "The lawsuit that could rewrite the rules of AI copyright". The Verge. Retrieved 7 December 2022.
23. "Give Up GitHub: The Time Has Come!". Software Freedom Conservancy. Retrieved 8 September 2022.
24. "If Software is My Copilot, Who Programmed My Software?". Software Freedom Conservancy. Retrieved 8 September 2022.
25. "FSF-funded call for white papers on philosophical and legal questions around Copilot". Free Software Foundation. 28 July 2021. Retrieved 11 August 2021.
26. "Publication of the FSF-funded white papers on questions around Copilot". Free Software Foundation. 24 February 2022.
27. "GitHub Copilot - Your AI pair programmer". GitHub. Retrieved 18 October 2022.
28. "CoPilot: Privacy & DataMining". GitHub. Retrieved 18 October 2022.
29. Stallman, Richard. "Who does that server really serve?". gnu.org. Retrieved 18 October 2022.