Wikipedia talk:Contributor copyright investigations/Darius Dhlomo: Difference between revisions
Revision as of 01:38, 21 September 2010
Presumptive removal?
Should any prose that smells of copyvio be presumptively removed? I have already found one definite and three possibles in a fairly small sample size and I think that with the potential scale of the problem presumptive removal would speed things up a little bit. Boissière (talk) 21:56, 4 September 2010 (UTC)
- Yes, they should be presumptively removed. With the massive scale of this one there's really no other way to handle it, particularly since all of the articles currently listed are the ones they actually created. VernoWhitney (talk) 23:25, 4 September 2010 (UTC)
- Can we look at Darius's edits by size of the edit instead? As I stated in the opening, the shorter articles (below 2.5KB creation size) he's created are practically a green light for original work. From a legal perspective, no one will bother contesting a couple of sentences describing basic, key information on a subject. Also, I would guess that the copyright problems will lie solely in biographies and not the likes of X at Games...etc. Sillyfolkboy (talk) (edits)Join WikiProject Athletics! 00:47, 5 September 2010 (UTC)
- Yeah, I got it in my head that it would be easier to split out created articles from the other articles they've edited but not created, which is why it ended up like this. I'm running it through my bot right now so tomorrow I should be able to update the pages with created articles sorted by edit size and then other edited articles also sorted by edit size. VernoWhitney (talk) 03:11, 5 September 2010 (UTC)
Need help?
I just saw this report on ANI and thought I'd see if you'd like some help. I've never gotten involved here so I'm unsure as to how this works, procedurally-speaking. Should I claim an article in the list somehow? I'm guessing the x graphics means no copyright issues found. What happens if I do find something plagiarized? How does it get tagged, and is there somewhere else that would be reported? Sorry for so many questions, but I want to make sure I'm going about it properly before I jump right in, so I don't end up creating even more work for someone. — e. ripley\talk 04:36, 5 September 2010 (UTC)
- Yes, {{n}} means no copyvio found. {{y}} means there's a problem or at least a likely problem. If you find something that looks to be a problem, whether or not you can find a source, you can a) remove the copyvio yourself on the spot or b) replace the page with {{subst:copyvio|1=source}} and follow the instructions on the generated page that tell you how to list it on the Wikipedia:Copyright problems daily subpage for others to follow up on. VernoWhitney (talk) 12:40, 5 September 2010 (UTC)
- And what does the red X that some editors have been using indicate? DGG ( talk ) 00:18, 9 September 2010 (UTC)
- {{n}} generates a red X, so it means no copyvio found. {{y}} generates a green check mark, which means there's a problem. VernoWhitney (talk) 00:27, 9 September 2010 (UTC)
- A red X means there is no problem, but a green check mark means that there is? That is a very confusing convention. Tim Pierce (talk) 15:48, 14 September 2010 (UTC)
- Sorry. :) For us, it's always been more of a "y" means problem, "n" means no problem. The images may be confusing, but the letters are pretty intuitive. Wish we could use a similar one-letter scheme that's more visually connected. :/ --Moonriddengirl (talk) 16:11, 20 September 2010 (UTC)
- The convention always made sense to me because unlike most other places where a green check means "good" or the like, here we're actually looking for expected problems and a check means something like "yes, I found something". At least that's how I see it. VernoWhitney (talk) 16:24, 20 September 2010 (UTC)
In Cleanup instructions you note that All contributors with no history of copyright problems are welcome to contribute to clean up. I had in the past an issue related to copyright problems mainly due to misunderstandings, which was finally cleared. Would I be allowed to help here, or not? Rentzepopoulos (talk) 13:01, 20 September 2010 (UTC)
- Hi. I remember that situation. It's understandable to be confused about our ability to include non-commercial material, but there were some issues with close paraphrasing in your initial rewrites. The request that only contributors with no history of copyright problems help is designed to protect both the project and the contributors, since a misunderstanding of this can make them contributory to infringement if they restore copyrighted material. Does it remain only the one instance? If so, I should think you'd be very welcome to mark those uncomplicated situations where the contributor did not add text, but only uncreative information. These are particularly likely to turn up in those articles to which he contributed, but did not create. --Moonriddengirl (talk) 16:11, 20 September 2010 (UTC)
Refining approach
This evening I have been trying to develop an API program which would take the wikitext of a suspect article and try to count up the amount of prose in it. It does this by dividing the article into sections and counting the words in each section. A section is principally either a normal section between two headings or a cell in a table. The program then reports the largest section. This way an article consisting mainly of tables should return a low value. Here is what it produces for Articles 61 through 80 (I chose this because this has a reported but not yet cleaned copyvio in Athletics at the 1980 Summer Olympics – Men's 3000 metre steeplechase).
- Cycling at the 1972 Summer Olympics – Men's individual road race - Max words in a section = 190
- National champions Javelin (men) - Max words in a section = 115
- Athletics at the 1992 Summer Olympics – Men's 800 metres - Max words in a section = 34
- Estonia national football team 1996 - Max words in a section = 59
- 1999–2000 in Dutch football - Max words in a section = 102
- 2009 Vuelta a Colombia - Max words in a section = 589
- 1987 Race Walking Year Ranking - Max words in a section = 47
- 2008 Women's Pan-American Volleyball Cup Squads - Max words in a section = 27
- 2004 UCI Road World Championships – Men's road race - Max words in a section = 40
- European Sprint Swimming Championships 1994 - Max words in a section = 46
- National Marathon champions (men) - Max words in a section = 103
- Athletics at the 1980 Summer Olympics – Men's 3000 metre steeplechase - Max words in a section = 212
- European Sprint Swimming Championships 1992 - Max words in a section = 49
- Water polo at the 1988 Summer Olympics - Max words in a section = 112
- Cycling at the 1992 Summer Olympics – Men's individual road race - Max words in a section = 152
- Hockey at the 1999 Pan American Games - Max words in a section = 119
- Squash at the 2007 Pan American Games - Max words in a section = 54
- Athletics at the 1992 Summer Olympics – Men's 1500 metres - Max words in a section = 41
- European Sprint Swimming Championships 1993 - Max words in a section = 104
- Swimming at the 1995 Pan American Games - Max words in a section = 33
The program needs refinement - in 2009 Vuelta a Colombia it is being fooled by the list of teams near the end - I need to work out how to spot that. You can see that the copyvio article mentioned has a word count of 212. Is this an approach worth pursuing further? Boissière (talk) 22:51, 5 September 2010 (UTC)
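The approach described above (split the article into sections and table cells, count the words in each, report the largest) can be sketched roughly as follows. This is only a minimal illustration in Python, not Boissière's actual program: the function names, the regular expressions and the sample wikitext are all made up for the example, and it works on a wikitext string that has already been fetched rather than calling the API. It ignores nested tables, templates and cell attributes, so real articles would need more care.

```python
import re

# Assumed, simplified patterns for this sketch (not a full wikitext parser).
HEADING = re.compile(r"^={2,}.*?={2,}[ \t]*$", re.MULTILINE)   # == Heading ==
TABLE = re.compile(r"\{\|.*?\|\}", re.DOTALL)                  # {| ... |}, non-nested
CELL_SEP = re.compile(r"\|\||!!|\n[ \t]*[|!][-+}]?")           # ||, !!, or a new row/cell line


def count_words(text):
    """Count whitespace-separated tokens that contain at least one letter."""
    return sum(1 for token in text.split() if any(ch.isalpha() for ch in token))


def split_sections(wikitext):
    """Return the prose between headings, plus every table cell as its own section."""
    sections = []
    for block in HEADING.split(wikitext):
        for table in TABLE.findall(block):
            sections.extend(CELL_SEP.split(table))
        # What remains of the block once its tables are taken out.
        sections.append(TABLE.sub(" ", block))
    return sections


def max_section_words(wikitext):
    """Word count of the largest section; low for articles that are mostly tables."""
    return max((count_words(s) for s in split_sections(wikitext)), default=0)


# Invented sample text, purely for illustration.
SAMPLE = (
    "Lead sentence about a race.\n"
    "== Summary ==\n"
    "The race was held on a hilly course and the winner attacked on the final climb.\n"
    "== Results ==\n"
    '{| class="wikitable"\n'
    "! Rank !! Rider !! Time\n"
    "|-\n"
    "| 1 || Rider A || 4:14:37\n"
    "|}\n"
)

if __name__ == "__main__":
    print("Max words in a section =", max_section_words(SAMPLE))
```

Treating each cell separately is what keeps a results table from adding up to one large "section"; a list of team names written as plain prose would still be counted in full, which matches the problem noted for 2009 Vuelta a Colombia.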
- Definitely worth doing as I imagine the large amount of biographies will be the difficult task to tackle. This will narrow them down immensely because so many of Darius's created biographies are just one or two sentences followed by tables. SFB/talk 20:59, 8 September 2010 (UTC)
- Thanks for the feedback. I have held off from this for a bit due to all the hoo-ha related to this CCI as well as the problems with the program mentioned above. However I have tweaked the program a bit to separately give the size of the lead and the maximum size of the other sections of an article. The results of scanning the first 333 articles are given here. I am spurred on by the probability that the articles are going to be blanked which will cause me a few problems as the program simply reads the latest version. Boissière (talk) 11:36, 10 September 2010 (UTC)
- Is anything going on with this? I've had the same idea, so if someone else is doing it, that's great. I was thinking of just spotting anything with over 15 or so consecutive words. Reading the latest revision isn't so good. It's preferable to read all the revisions and figure out which words were added by DD's edits. 67.119.14.196 (talk) 06:29, 17 September 2010 (UTC)
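The idea raised in this thread of comparing revisions and flagging long runs of newly added words could be sketched along these lines. This is a hedged illustration only, not anyone's existing tool: the function name `long_added_runs` is hypothetical, the 15-word threshold is simply the figure suggested above, and the two revisions are assumed to be available as plain strings (fetching them, and choosing pairs where the newer revision is one of DD's edits, is left out). It uses the standard-library `difflib.SequenceMatcher` on word lists.

```python
from difflib import SequenceMatcher


def long_added_runs(old_text, new_text, min_words=15):
    """Return runs of at least min_words consecutive words that the newer
    revision adds (or substitutes) relative to the older one."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    runs = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" are the opcodes that introduce new words.
        if tag in ("insert", "replace") and j2 - j1 >= min_words:
            runs.append(" ".join(new_words[j1:j2]))
    return runs
```

Calling it on each (previous revision, DD's revision) pair would give a list of candidate passages to check by hand; short additions such as a corrected date or a new table row fall under the threshold and are skipped.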
Copyright question
I know that results and statistics themselves aren't copyrightable, but is there anything copyrightable in the specific format and wording in which they are presented? I ask in relation to this comment I made on the user's Talk page. -- Boing! said Zebedee (talk) 17:35, 11 September 2010 (UTC)
- Tables are copyrightable to the extent that they are creative. If the content itself is uncreative, a table is only copyrightable in the United States if it is creative in presentation or in the selection of facts. This is why, in Feist v. Rural, copying a phone book was found not to be copyright infringement: the information was presented alphabetically and included the same details that others would include. --Moonriddengirl (talk) 18:09, 11 September 2010 (UTC)
- Oh, that particular source for tables has been discussed at the ANI thread, and we think it's okay. --Moonriddengirl (talk) 18:10, 11 September 2010 (UTC)
- Ah, that's great, thanks -- Boing! said Zebedee (talk) 18:18, 11 September 2010 (UTC)
Translations
A certain amount of this copyvio may have spilled over to other language versions through translations. Will a list of confirmed copyvio be kept here, and are there any ideas about how this particular problem could be checked and handled? --Sir48 (talk) 13:58, 14 September 2010 (UTC)
- The pages here listing the user's contributions and any action taken to remove them will be kept (and archived when the case is closed). Hut 8.5 14:41, 14 September 2010 (UTC)
Copyvio?
This revision has been marked by a user [1] as copyvio on here [2]. Need explanation on why this is so. 121.120.214.122 (talk) 15:42, 14 September 2010 (UTC)
I think I'm onto something
I've been focusing my efforts on Darius's biographies and I've come to believe that DD was using a bot of some sort to create articles. I think this for several reasons. One is that many of them have this "fill in the blank" quality. They almost always include the athlete's gender and the phrase "his [or her] native country". The articles' spelling tends to be consistent with that of their sources. If the source uses the British way of spelling things (e.g. metres instead of meters) then so does Darius. If it's an American source then he uses the American spelling.--*Kat* (talk) 01:26, 21 September 2010 (UTC)
- You're not the first to have observed the former. (There's a lot of discussion of this on the main cleanup page (q.v.).) The point about the spelling is interesting, though. Does this apply to the 1-paragraph stubs that people are opining cannot be copyright violations? Please point to an example. Uncle G (talk) 01:38, 21 September 2010 (UTC)