Wikipedia talk:Contributor copyright investigations/Darius Dhlomo: Difference between revisions
Revision as of 06:30, 17 September 2010
Presumptive removal?
Should any prose that smells of copyvio be presumptively removed? I have already found one definite and three possibles in a fairly small sample size and I think that with the potential scale of the problem presumptive removal would speed things up a little bit. Boissière (talk) 21:56, 4 September 2010 (UTC)
- Yes, they should be presumptively removed. With the massive scale of this one there's really no other way to handle it, particularly since all of the articles currently listed are the ones they actually created. VernoWhitney (talk) 23:25, 4 September 2010 (UTC)
- Can we look at Darius's edits by size of the edit instead? As I stated in the opening, the shorter articles (below 2.5KB creation size) he's created are practically a green light for original work. From a legal perspective, no one will bother contesting a couple of sentences describing basic, key information on a subject. Also, I would guess that the copyright problems will lie solely in biographies and not the likes of X at Games...etc. Sillyfolkboy (talk) (edits)Join WikiProject Athletics! 00:47, 5 September 2010 (UTC)
- Yeah, I got it in my head that it would be easier to split out created articles from the other articles they've edited but not created, which is why it ended up like this. I'm running it through my bot right now so tomorrow I should be able to update the pages with created articles sorted by edit size and then other edited articles also sorted by edit size. VernoWhitney (talk) 03:11, 5 September 2010 (UTC)
Need help?
I just saw this report on ANI and thought I'd see if you'd like some help. I've never gotten involved here so I'm unsure as to how this works, procedurally-speaking. Should I claim an article in the list somehow? I'm guessing the x graphics means no copyright issues found. What happens if I do find something plagiarized? How does it get tagged, and is there somewhere else that would be reported? Sorry for so many questions, but I want to make sure I'm going about it properly before I jump right in, so I don't end up creating even more work for someone. — e. ripley\talk 04:36, 5 September 2010 (UTC)
- Yes, {{n}} means no copyvio found. {{y}} means there's a problem or at least a likely problem. If you find something that looks to be a problem, whether or not you can find a source, you can a) remove the copyvio yourself on the spot or b) replace the page with {{subst:copyvio|1=source}} and follow the instructions on the generated page that tell you how to list it on the Wikipedia:Copyright problems daily subpage for others to follow up on. VernoWhitney (talk) 12:40, 5 September 2010 (UTC)
- And what does the red X that some editors have been using indicate? DGG ( talk ) 00:18, 9 September 2010 (UTC)
- {{n}} generates a red X, so it means no copyvio found. {{y}} generates a green check mark, which means there's a problem. VernoWhitney (talk) 00:27, 9 September 2010 (UTC)
- A red X means there is no problem, but a green check mark means that there is? That is a very confusing convention. Tim Pierce (talk) 15:48, 14 September 2010 (UTC)
Refining approach
This evening I have been trying to develop an API program which would take the wikitext of a suspect article and try to count up the amount of prose in it. It does this by dividing the article into sections and counting the words in each section. A section is principally either a normal section between two headings or a cell in a table. The program then reports the largest section. This way an article consisting mainly of tables should return a low value. Here is what it produces for Articles 61 through 80 (I chose this because this has a reported but not yet cleaned copyvio in Athletics at the 1980 Summer Olympics – Men's 3000 metre steeplechase).
- Cycling at the 1972 Summer Olympics – Men's individual road race - Max words in a section = 190
- National champions Javelin (men) - Max words in a section = 115
- Athletics at the 1992 Summer Olympics – Men's 800 metres - Max words in a section = 34
- Estonia national football team 1996 - Max words in a section = 59
- 1999–2000 in Dutch football - Max words in a section = 102
- 2009 Vuelta a Colombia - Max words in a section = 589
- 1987 Race Walking Year Ranking - Max words in a section = 47
- 2008 Women's Pan-American Volleyball Cup Squads - Max words in a section = 27
- 2004 UCI Road World Championships – Men's road race - Max words in a section = 40
- European Sprint Swimming Championships 1994 - Max words in a section = 46
- National Marathon champions (men) - Max words in a section = 103
- Athletics at the 1980 Summer Olympics – Men's 3000 metre steeplechase - Max words in a section = 212
- European Sprint Swimming Championships 1992 - Max words in a section = 49
- Water polo at the 1988 Summer Olympics - Max words in a section = 112
- Cycling at the 1992 Summer Olympics – Men's individual road race - Max words in a section = 152
- Hockey at the 1999 Pan American Games - Max words in a section = 119
- Squash at the 2007 Pan American Games - Max words in a section = 54
- Athletics at the 1992 Summer Olympics – Men's 1500 metres - Max words in a section = 41
- European Sprint Swimming Championships 1993 - Max words in a section = 104
- Swimming at the 1995 Pan American Games - Max words in a section = 33
The program needs refinement - in 2009 Vuelta a Colombia it is being fooled by the list of teams near the end - I need to work out how to spot that. You can see that the copyvio article mentioned has a word count of 212. Is this an approach worth pursuing further? Boissière (talk) 22:51, 5 September 2010 (UTC)
- Definitely worth doing as I imagine the large amount of biographies will be the difficult task to tackle. This will narrow them down immensely because so many of Darius's created biographies are just one or two sentences followed by tables. SFB/talk 20:59, 8 September 2010 (UTC)
- Thanks for the feedback. I have held off from this for a bit due to all the hoo-ha related to this CCI as well as the problems with the program mentioned above. However I have tweaked the program a bit to separately give the size of the lead and the maximum size of the other sections of an article. The results of scanning the first 333 articles are given here. I am spurred on by the probability that the articles are going to be blanked which will cause me a few problems as the program simply reads the latest version. Boissière (talk) 11:36, 10 September 2010 (UTC)
- Is anything going on with this? I've had the same idea, so if someone else is doing it, that's great. I was thinking of just spotting anything with over 15 or so consecutive words. 67.119.14.196 (talk) 06:29, 17 September 2010 (UTC)
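For anyone wanting to try the idea in this thread, here is a minimal sketch of the section-counting approach described above. It is not Boissière's actual program, which is not shown on this page. The function names (`strip_markup`, `table_cells`, `prose_sizes`, `needs_review`), the regular expressions, and the default threshold are all assumptions made for illustration, and the markup handling is deliberately crude (no nested templates or nested tables).

```python
import re

# Heading lines such as "== Career ==" (assumed pattern, illustrative only).
HEADING = re.compile(r"^={2,}.*?={2,}[ \t]*$", re.MULTILINE)
# A wikitable from "{|" to "|}" (nested tables are not handled).
TABLE = re.compile(r"\{\|.*?\|\}", re.DOTALL)


def strip_markup(text):
    """Crudely remove templates, refs, link syntax and bold/italic quotes."""
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)              # non-nested templates
    text = re.sub(r"<ref[^>]*?/>", " ", text)                # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", " ", text, flags=re.DOTALL)
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # keep link label
    return re.sub(r"'{2,}", "", text)


def table_cells(table):
    """Return the text of each cell in one table, one string per cell."""
    cells = []
    for line in table.splitlines():
        line = line.strip()
        if not line or line.startswith(("{|", "|}", "|-", "|+")):
            continue  # table open/close, row separators, captions
        if line[0] in "|!":
            for cell in re.split(r"\|\||!!", line[1:]):
                # Drop any 'style="..." |' attribute prefix on the cell.
                cells.append(cell.rsplit("|", 1)[-1])
        else:
            cells.append(line)  # continuation text inside a cell
    return cells


def prose_sizes(wikitext):
    """Return (lead word count, largest word count among the other sections).

    Each section is measured by its largest chunk, where a chunk is either
    the section's text outside tables or a single table cell.
    """
    counts = []
    for section in HEADING.split(strip_markup(wikitext)):
        cells = [c for t in TABLE.findall(section) for c in table_cells(t)]
        prose = TABLE.sub("\n", section)
        counts.append(max(len(chunk.split()) for chunk in [prose, *cells]))
    return counts[0], max(counts[1:], default=0)


def needs_review(wikitext, threshold=15):
    """Flag an article whose largest chunk exceeds `threshold` words."""
    return max(prose_sizes(wikitext)) > threshold


sample = """'''John Doe''' (born 1970) is a [[Netherlands|Dutch]] runner.

== Career ==
He won the race in 1992 after a long sprint finish.{{citation needed}}

== Results ==
{| class="wikitable"
! Year !! Event
|-
| 1992 || [[800 metres]]
|}
"""

print(prose_sizes(sample))
```

Because every table cell is counted separately, an article that is mostly results tables scores low even when the tables are long. The default threshold of 15 only echoes the "15 or so consecutive words" figure floated above; a list of team names inside one section, as in 2009 Vuelta a Colombia, would still be counted as prose by this sketch.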
Copyright question
I know that results and statistics themselves aren't copyrightable, but is there anything copyrightable in the specific format and wording in which they are presented? I ask in relation to this comment I made on the user's Talk page. -- Boing! said Zebedee (talk) 17:35, 11 September 2010 (UTC)
- Tables are copyrightable to the extent that they are creative. If the content itself is not creative, a table is only copyrightable in the United States if it is creative in presentation or in the selection of facts. This is the reason why in Feist v. Rural copying a phone book was not found to be copyright infringement, because the information was presented alphabetically and included the same details that others would include. --Moonriddengirl (talk) 18:09, 11 September 2010 (UTC)
- Oh, that particular source for tables has been discussed at the ANI thread, and we think it's okay. --Moonriddengirl (talk) 18:10, 11 September 2010 (UTC)
- Ah, that's great, thanks -- Boing! said Zebedee (talk) 18:18, 11 September 2010 (UTC)
Translations
A certain amount of this copyvio may have spilled over to other language versions through translations. Will a list of confirmed copyvio be kept here, and are there any ideas about how this particular problem could be checked and handled? --Sir48 (talk) 13:58, 14 September 2010 (UTC)
- The pages here listing the user's contributions and any action taken to remove them will be kept (and archived when the case is closed). Hut 8.5 14:41, 14 September 2010 (UTC)
Copyvio?
This revision has been marked by a user [1] as copyvio on here [2]. Need explanation on why this is so. 121.120.214.122 (talk) 15:42, 14 September 2010 (UTC)