Wikipedia:Bots/Requests for approval/Uncle G's major work 'bot
There's a possibility that Uncle G's major work 'bot (talk · contribs) might wake up to do some major work. In this case it is the mass blanking of roughly ten thousand articles, to help address Wikipedia:Contributor copyright investigations/Darius Dhlomo.
Things are currently still at the discussion stage. Here's the background reading:
- Main discussion page
- Wikipedia:Administrators' noticeboard/Incidents/CCI
- List of articles to be touched
- As listed in earlier revisions of Wikipedia:Contributor copyright investigations/Darius Dhlomo and Wikipedia:Contributor copyright investigations/Darius Dhlomo 2, the list as supplied by VernoWhitney (talk · contribs). The current list is the articles touched by Darius Dhlomo. The original list was the articles created by xem.
- Notice that the 'bot will blank each article with
- User:Moonriddengirl/CCIdf (probably to be renamed Wikipedia:Contributor copyright investigations/Darius Dhlomo/Article notice)
- Full task explanation, linked to by the 'bot's edit summaries
- Wikipedia:Contributor copyright investigations/Darius Dhlomo/Task explanation (piped to something like "What is this 'bot doing?")
- Further information for editors, linked to from the blanking notice
- Wikipedia:Contributor copyright investigations/Darius Dhlomo/How to help
Points discussed and to be discussed:
- I'm in favour of the template being outside of Template: namespace and in the project namespace, so that Wikipedia mirrors don't mirror the notice. But there are arguments to the contrary. Please discuss at the main discussion page.
- If this goes ahead, I'm going to be using the same rate limits and whatnot that I used when moving VFD to AFD.
- The 'bot doesn't have the flag. One could argue that ten thousand or so article blankings will light up a lot of watchlists. But drawing people's attention to a copyright problem with their watched articles is partly the desired outcome.
As I said, things are at the discussion stage. But with this sort of major work I want many people to be forewarned about this. There are currently big unsubtle notices on the Village Pump, the Content Noticeboard, the Administrators' Noticeboard, the 'Bot owners' Noticeboard, and the Centralized Discussion template. Feel free to notify anyone else that you think this misses.
I've tested the 'bot on Ted Morgan (boxer) (which was an uncontroversial article to test on, since it is a definite copyright violation of this biography). You can see the edit here. That's what's going to happen, and that's what it's going to look like. I might tweak the edit summary text a bit.
Operator: Uncle G (talk) 16:11, 8 September 2010 (UTC)
Discussion
If you have an opinion on the task, or a better way to do it, please contribute to Wikipedia:Administrators' noticeboard/Incidents/CCI. That's where everyone else is having the discussion. They won't be paying much attention here. ☺ Uncle G (talk) 16:11, 8 September 2010 (UTC)
I have a few unrelated technical questions:
- Is this going to be done with code successfully used in the past, or is this new code? If the latter, I'd want to throw a few test pages at it just to double-check that things won't blow up.
- What exactly are the proposed rate limits? If your bot can handle maxlag, I could certainly support a WP:IAR of the limits in WP:BOTPOL in favor of "as fast as maxlag allows" for this particular task. If you would want to do that, it should of course be discussed at Wikipedia:Administrators' noticeboard/Incidents/CCI.
- Note that even if the bot has the bot flag, it is now possible for the bot to not flag its individual edits as bot. Even when not applied to edits, the bot flag still gives some advantages to the bot account that may be useful. Can your bot do this? If you want to test it, you can ask for the flag at WP:BN and do some edits in a userspace sandbox.
Anomie⚔ 17:22, 8 September 2010 (UTC)
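For comparison with the "as fast as maxlag allows" suggestion, here is a minimal Python sketch of a maxlag-aware request loop. It assumes api.php's generic `maxlag` parameter, which makes the server refuse a request with the error code `maxlag` (and a `Retry-After` header) while database replication lag exceeds the given number of seconds. The `send` transport, the function names and the retry cap are hypothetical, and this is not the 'bot's actual code.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: takes a dict of api.php parameters and returns
# (parsed JSON body, response headers). Not part of any real library.
Send = Callable[[dict], tuple[dict, dict]]


def maxlag_wait_seconds(body: dict, headers: dict, default_wait: int = 5) -> Optional[int]:
    """Seconds to wait before retrying, or None if the server did not report maxlag."""
    if body.get("error", {}).get("code") != "maxlag":
        return None
    try:
        return max(0, int(headers.get("Retry-After", default_wait)))
    except ValueError:
        return default_wait


def post_with_maxlag(send: Send, params: dict, maxlag: int = 5,
                     max_attempts: int = 10,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Send one request, backing off while replication lag exceeds `maxlag` seconds."""
    params = dict(params, maxlag=maxlag)  # copy; the caller's dict is not modified
    for _ in range(max_attempts):
        body, headers = send(params)
        wait = maxlag_wait_seconds(body, headers)
        if wait is None:
            return body   # success, or some error other than maxlag
        sleep(wait)       # servers are lagged: wait, then retry the same request
    raise RuntimeError(f"still lagged after {max_attempts} attempts")
```

The edit rate then adapts to server load instead of being a fixed number chosen in advance.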
Heh! This is code so old that it predates both api.php and maxlag. (Successful past use includes raking various sandboxes, archiving my talk page, and of course moving that VFD mountain.) Right at the moment I'm checking it through and updating it to use api.php where appropriate (and where necessary: index.php functionality has changed since I last used some of the programs). If you think that I wasn't going to throw a few test pages at it before throwing ten thousand pages at it live … ☺
Since the 'bot tools predate it, there's no maxlag. (There's no automated retry logic at all. If an edit fails, it fails.) My very simplistic approach to rate limiting was a hardwired delay of a fixed number of seconds between each operation. If you go back to 2005 in the contributions history, you'll see the delay in operation fairly clearly.
As for the flag: That's a discussion for other people, really. It doesn't affect the operation of any of the things that this 'bot will be doing. There are no queries involved, for instance. (I didn't even have a query-making tool until just recently, when I wrote one to perform an external link query for the GeoCities cleanup discussion.) Uncle G (talk) 22:31, 8 September 2010 (UTC)
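The fixed-delay approach amounts to something like this minimal Python sketch. The actual delay was not stated, so it is a parameter; `edit` is a hypothetical callable standing in for whatever performs one edit and reports success; and none of this is the 'bot's real code.

```python
import time
from typing import Callable, Iterable


def run_fixed_delay(titles: Iterable[str], edit: Callable[[str], bool],
                    delay_seconds: float,
                    sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Edit each title in turn with a hardwired pause between operations.

    No maxlag and no retries: a failed edit is recorded and the run moves on.
    Returns the titles whose edits failed.
    """
    failed = []
    for i, title in enumerate(titles):
        if i:
            sleep(delay_seconds)   # fixed gap between consecutive operations
        if not edit(title):
            failed.append(title)   # "If an edit fails, it fails."
    return failed
```

Returning the failures at least leaves a list that a human can re-run or check by hand.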
- Let's try some more interesting tests. Please feed it the list at [1]. The idea, of course, is to see whether the bot will blindly follow or blank redirects or recreate deleted pages should some of the later entries in the real list have been messed with by uninvolved humans by the time the bot gets to them. Anomie⚔ 12:18, 9 September 2010 (UTC)
- Redirects aren't really going to be an issue in the first place. The article list to be used contains only real articles with content. I'll try the second test, though. The 'bot's been told to pass the "nocreate" parameter, so I'm interested in seeing whether that actually works. Uncle G (talk) 12:32, 9 September 2010 (UTC)
- There's interesting! It doesn't. I wonder why. It's doing what's in the doco. I'm going to have a play. Uncle G (talk) 12:44, 9 September 2010 (UTC)
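For reference, here is a minimal Python sketch of an api.php `action=edit` request that blanks a page with a notice while passing `nocreate`. It assumes the documented behaviour that `nocreate` is a boolean flag (its presence alone means true), that a refused edit comes back with the `missingtitle` error code, and that the edit token is best sent as the last parameter. The helper names are hypothetical, and the sketch does not explain why the 'bot's own `nocreate` test failed.

```python
from urllib.parse import urlencode


def build_blanking_edit(title: str, notice: str, summary: str, token: str) -> dict:
    """Parameters for one action=edit request that replaces the page text with
    `notice` but asks the server not to create the page if it is missing."""
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": notice,      # replaces the whole page text (blanking plus notice)
        "summary": summary,
        "nocreate": "1",     # boolean flag: presence means "true"
        "token": token,      # edit token, obtained separately; kept last
    }


def is_missing_page(response: dict) -> bool:
    """True if the server refused the edit because the page does not exist."""
    return response.get("error", {}).get("code") == "missingtitle"
```

Encoding with `urlencode(build_blanking_edit(...))` keeps the dictionary's order, so the token ends up last in the POST body.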