Wikipedia:Bots/Requests for approval/Archivedotisbot: Difference between revisions

Revision as of 04:35, 19 May 2014

Archivedotisbot

Operator: Kww (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:29, Saturday May 10, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP (based on Chartbot's existing framework)

Source code available:

Function overview: Remove all archival links to archive.is (and its alias, archive.today~~, which was put in place to bypass the blacklist~~)

Links to relevant discussions (where appropriate): WP:Archive.is RFC, MediaWiki talk:Spam-blacklist/archives/December 2013#archive.is, Wikipedia:Administrators' noticeboard/Archive261#Archive.is headache

Edit period(s): One time run, with cleanups for any entries that got missed.

Estimated number of pages affected:

Exclusion compliant (Yes/No):

Already has a bot flag (Yes/No):

Function details:

Remove "archiveurl=" and "archivedate=" parameters whenever the archiveurl points at archive.is or archive.today.
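The parameter removal described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the operator's PHP code; the function names, the regular expression, and the host list are assumptions made for illustration. It only handles flat templates (no nested templates or piped wikilinks inside parameters), which a real run would need to treat more carefully.

```python
import re
from urllib.parse import urlparse

# Hosts named in the request; treated here as an assumption about matching rules.
BLOCKED_HOSTS = {"archive.is", "archive.today"}

# Matches a template with no nested braces. Nested templates are NOT handled.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def points_at_blocked_host(url: str) -> bool:
    """Return True if the URL's host is archive.is/archive.today or a subdomain."""
    host = (urlparse(url.strip()).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)


def strip_archive_params(template: str) -> str:
    """Drop archiveurl= and archivedate= when archiveurl points at a blocked host."""
    parts = template[2:-2].split("|")
    archive_url = None
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip() == "archiveurl":
            archive_url = value
    if archive_url is None or not points_at_blocked_host(archive_url):
        return template  # nothing to do; leave other archives untouched
    kept = [parts[0]]
    for part in parts[1:]:
        name, sep, _ = part.partition("=")
        if sep and name.strip() in ("archiveurl", "archivedate"):
            continue
        kept.append(part)
    return "{{" + "|".join(kept) + "}}"


def clean_wikitext(text: str) -> str:
    """Apply the removal to every flat template found in the page text."""
    return TEMPLATE_RE.sub(lambda m: strip_archive_params(m.group(0)), text)
```

Checking the parsed host rather than searching for the substring "archive.is" avoids touching citations whose original URL merely mentions the domain in its path or query string.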

Amended description in response to comments below.

The bot cannot implement the RFC result and keep links to archive.is. However, to help prevent deadlinking issues, the bot will take two steps:

- When removing a link from an article, the bot will add a talk page notice of the form "Archive item nnnn from archive.today, used to support <url>, has been removed from this article".
- A centralised list of all removals will be maintained at User:Archivedotisbot/Removal list.

—Kww(talk) 16:52, 16 May 2014 (UTC)[reply]
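The two mitigation steps in the amended description could be sketched as follows. This is a Python illustration, not the bot's actual code: the `Removal` record, the helper names, and the layout of a removal-list entry are all hypothetical, and "nnnn" is assumed to be the path component of the archive URL. Only the wording of the talk page notice is taken from the description above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Removal:
    """One removed archive link (hypothetical record layout)."""
    article: str       # title of the article the link was removed from
    archive_url: str   # the archive.is / archive.today URL that was removed
    source_url: str    # the original URL the archive was used to support


def archive_item_id(archive_url: str) -> str:
    """Assume the archive item id ("nnnn") is the URL path without slashes."""
    return urlparse(archive_url).path.strip("/")


def talk_notice(removal: Removal) -> str:
    """Build the talk page notice using the wording from the request."""
    return (
        f"Archive item {archive_item_id(removal.archive_url)} from archive.today, "
        f"used to support {removal.source_url}, has been removed from this article"
    )


def removal_list_line(removal: Removal) -> str:
    """Build one entry for the central removal list (entry format is an assumption).

    The item id is recorded instead of the full archive URL, on the assumption
    that saving a blacklisted URL on-wiki would itself be rejected.
    """
    return (
        f"* [[{removal.article}]]: item {archive_item_id(removal.archive_url)}, "
        f"supported {removal.source_url}"
    )
```

Recording the source URL next to the item id is what would let editors later look for a replacement archive by hand.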

Discussion

Comment There is no direct connection between the existence of the links and the blacklisting of the archive.is site. Most of the archive links were put there in good faith. As archive.is performs a unique function, the proposer will need to demonstrate the links themselves are actually in violation of policy, and that any given archive is replaceable – meaning the bot ought to be capable of replacing the links with one on another archive site, particularly where the original referring url has gone dead. Non-replacement will lead to diminution of verifiability of citations used. -- Ohc ^¡digame! 01:20, 12 May 2014 (UTC)[reply]

Leaving the links in place wouldn't correspond to the RFC consensus, and having the links in place while the site is blacklisted makes for a painful editing experience.—Kww(talk) 01:31, 12 May 2014 (UTC)[reply]
Blacklisting does not distinguish good-faith edits. Welcome to the alternate universe of the MediaWiki talk:Spam-whitelist. Will the bot honor the whitelist? If so, we should get some links whitelisted before trial so that functionality may be tested. See MediaWiki talk:Spam-whitelist/Archives/2014/03#archive.is/T5OAy. This should be done before the bot runs, to avoid any discontinuity of referencing, as the whitelist approval process can take months to come to consensus. – Wbm1058 (talk) 01:58, 12 May 2014 (UTC)[reply]

Does anybody keep track of all the archive links they place? I can guess but I can never be sure. If a bot is approved, removals of potentially valid and irreplaceable (in some cases) links will be the default scenario unless all editors who consciously used the site come forward with their full list. I fear that even if I whitelisted all the articles I made substantial contributions to, that list would be incomplete. Then, some links I placed will inevitably get picked off by the bot. -- Ohc ^¡digame! 04:30, 12 May 2014 (UTC)[reply]

I have to reject the timing and implication of this request at this time on a couple of key grounds. Archive.today was not made to bypass the filter. There is no evidence that Archive.is operated the Wiki Archive script/bot. The actual situation was resolved by blocks, not the filter - the filter was by-passable for a long time. Kww made a non-neutral RFC that hinged on perceived use as ads, malware and other forms of attack - without any evidence nor any realization of any of these "bad things" would ever or be likely to occur. Frankly, the RFC was not even closed by an admin and it was that person, @Hobit:, that bought into the malware spiel and found Archive.is "guilty" without any evidence presented. Also, this is six months later, if that's not enough reason to give pause - I'll file for a community RFC or ArbCase on removing the Archive.is filter all the quicker. Back in October 2013, I'd have deferred to the opinion then, but not when thousands of Gamespot refs cannot be used because of Archive.org and Webcite's limitations and Kww seems deaf to the verifiability issues. Those who build content and maintain content pages need Archive.is to reduce linkrot from the most unstable resources like GameSpot. ChrisGualtieri (talk) 04:49, 12 May 2014 (UTC)[reply]

I will simply point out that your arguments were raised and rejected at a scrupulously neutral RFC that was widely advertised for months.—Kww(talk) 05:00, 12 May 2014 (UTC)[reply]

False, I wasn't even a part of the RFC. Also, the malware and illegal aspect were repeatedly pushed without evidence. ChrisGualtieri (talk) 16:29, 12 May 2014 (UTC)[reply]

I didn't say that you had participated: I said that your arguments had been presented. The framing of the RFC statement was scrupulously neutral. Arguments were not neutral, but such is the nature of arguments.—Kww(talk) 16:50, 12 May 2014 (UTC)[reply]

Can you prove, with firm evidence, that archive.today was created to "bypass the blacklist"? That domain has existed for months, and during this time, an attacker could have spilled a mess all over Wikipedia, but this has not occurred. Currently, archive.is does not exist (just try typing in the URL), it redirects to archive.today which is the current location of the site. A website may change domains due to any number of legitimate reasons, ranging from problems with the domain name provider, to breaking ccTLD rules. --benlisquare_T•C•E 06:04, 12 May 2014 (UTC)[reply]

Struck the language expressing cause and effect, and simply note that archive.is and archive.today are the same site.—Kww(talk) 06:13, 12 May 2014 (UTC)[reply]

I did close the RfC and am not an admin. I closed the discussion based upon the contributions to that RfC. There was no "Guilty" reading. Rather it was the sense of the participants that archive.is links should be removed because there was a concern that unethical means (unapproved bot, what looked like a bot network, etc.) were used to add those links. I think my close made it really really clear that I was hopeful we could find a way forward that let us use those links. If you (@ChrisGualtieri:) or anyone else would like to start a new RfC to see if consensus has changed, I'd certainly not object. But I do think I properly read the consensus of the RfC and that consensus wasn't irrational. On topic, I think the bot request should be approved--though if someone were to start a new RfC, I'd put that approval on hold until the RfC finished. Hobit (talk) 18:05, 12 May 2014 (UTC)[reply]
Comment. An unapproved(?) bot is already doing archive.is removal/replace: [1] 77.227.74.183 (talk) 06:18, 13 May 2014 (UTC)[reply]

I'm not a bot, so that is completely uncalled for. Werieth (talk) 10:16, 13 May 2014 (UTC)[reply]

When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.

I see you work 24/7 and insert a high number of unreviewed links like a bot ([2], [3] in Barcelona).

I call you a bot. 90.163.54.9 (talk) 13:03, 13 May 2014 (UTC)[reply]

I don't read Chinese, and it looks like a valid archive. Not sure what the issue is. Comparing http://www.szfao.gov.cn/ygwl/yxyc/ycgy/201101/t20110120_1631663.htm and its archived version http://www.webcitation.org/684VviYTN, the only difference I'm seeing is that it's missing a few images; otherwise it's the same article. Werieth (talk) 13:11, 13 May 2014 (UTC)[reply]

The first page has only a frame and is missing its content; the second has only a server error message. No human would insert such links. I also noticed that you inserted many links to archived copies of YouTube video pages, which is nonsense.

You should submit a bot approval request (like this one) and perform a test run before running your bot at mass scale.

Only the fact that you remove archive.is links in the same transaction prevents editors from undoing your edits. Otherwise most of your edits would be reverted. 90.163.54.9 (talk) 13:14, 13 May 2014 (UTC)[reply]

Not sure what you're looking at, but http://www.webcitation.org/684VviYTN looks almost identical to http://www.szfao.gov.cn/ygwl/yxyc/ycgy/201101/t20110120_1631663.htm. The only two differences I see are that the archive is missing the top banner and the QR code at the bottom. As I said, I'm not a bot and thus don't need to file for approval. Werieth (talk) 13:21, 13 May 2014 (UTC)[reply]

Forget 684VviYTN, it was my copy-paste error, which I promptly fixed. There are 2 other examples above. 90.163.54.9 (talk) 13:24, 13 May 2014 (UTC)[reply]

taking a look at http://www.apb.es/wps/portal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_hgz2DDIFNLYwMLfzcDAyNjQy9vLwNTV38LM_1wkA6zeH_nIEcnJ0NHAwNfUxegCh8XA2-nUCMDdzOIvAEO4Gig7-eRn5uqX5CdneboqKgIAAeNRE8!/dl2/d1/L2dJQSEvUUt3QS9ZQnB3LzZfU0lTMVI1OTMwOE9GMDAyMzFKS0owNUVPODY!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/ExtranetAnglesLib/El%20Port%20de%20Barcelona/el+port/historia+del+port/cami+cap+el+futur/ vs http://web.archive.org/web/20131113091734/http://www.apb.es/wps/portal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_hgz2DDIFNLYwMLfzcDAyNjQy9vLwNTV38LM_1wkA6zeH_nIEcnJ0NHAwNfUxegCh8XA2-nUCMDdzOIvAEO4Gig7-eRn5uqX5CdneboqKgIAAeNRE8!/dl2/d1/L2dJQSEvUUt3QS9ZQnB3LzZfU0lTMVI1OTMwOE9GMDAyMzFKS0owNUVPODY!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/ExtranetAnglesLib/El%20Port%20de%20Barcelona/el+port/historia+del+port/cami+cap+el+futur/ it looks like a snapshot of how the webpage looked when it was archived and the page is dynamic. There is one part of the page that appears to be dynamically generated via JavaScript that appears partially broken in the archive but most of the page content persists and is better than not having any of the content if the source goes dead. Instead of complaining about my link recovery work why dont you do something productive? Werieth (talk) 13:36, 13 May 2014 (UTC)[reply]

Productive would be to undo your changes and discuss in public the algorithms of your bot, but that is impossible because you intentionally choose pages with at least one archive.is link and thus abuse the archive.is filter, making your unapproved bot changes irreversible. Also, you label those changes "replace/remove archive.is" although 90% of the changes you made are unrelated to archive.is. 90.163.54.9 (talk) 15:11, 13 May 2014 (UTC)[reply]

Oppose, RFC was non-neutral, biased, and not widely advertised despite Kww's claims; this is obvious from the number of editors who say they had no knowledge of the discussion while clearly being opposed to its outcome. DWB / Are you a bad enough dude to GA Review The Joker? 08:02, 13 May 2014 (UTC)[reply]

DWB:It's hard to give much weight to an argument based on a falsehood. It was placed in the centralized discussion template on Sept 13 and not removed until Oct 31. That you personally missed a discussion doesn't invalidate a discussion. The framing of the question was scrupulously neutral.—Kww(talk) 14:44, 13 May 2014 (UTC)[reply]

And yet so many users say they were not aware of it, of course we are all lying. The RFC was based on the premise "Archive.is does what it is meant to but I am going to accuse it of things I cannot prove, and also one user is adding a lot of their links so we should block it". It was not advertised nor neutral. DWB / Are you a bad enough dude to GA Review The Joker? 19:56, 13 May 2014 (UTC)[reply]

It was advertised in the standard places for 45 days. The RFC question presented three alternatives, the first of which was to leave existing links in place, the second of which was to restore all the links to archive.is that had already been removed, and the third (which gained consensus) was to remove them all. That's about as neutral as you can get, and more widely advertised than normal. That your side did not prevail doesn't mean a discussion is flawed, it simply means that it reached a conclusion that you disagree with.—Kww(talk) 20:04, 13 May 2014 (UTC)[reply]

Question I assume the bot, if approved, will replace, not remove the archives? Will it properly include an edit summary? Thank-you. Prhartcom (talk) 14:12, 13 May 2014 (UTC)[reply]
- Good point. I'm not sure there _are_ replacements, though following the AN thread, it looks like a new archive tool is potentially available. So it certainly should be replacing them where possible. In addition, an edit summary which explains what's going on (ideally with a link to a more detailed explanation) should be required (and trivial I'd think). Hobit (talk) 14:42, 13 May 2014 (UTC)[reply]

User:Kww, what is your answer to this? Prhartcom (talk) 17:28, 13 May 2014 (UTC)[reply]

Easy enough to build a centrally accessible list of what was removed and where it pointed. Finding good replacements is not readily automated.—Kww(talk) 17:35, 13 May 2014 (UTC)[reply]

Agreed that it would be a difficult job and that the bot may have to be semi-automated run by an operator making human decisions as it runs in order to get the archives accurately replaced. Obviously you don't want to simply delete archives and have people mad at you, you want to replace them, achieving the archive.is purge goals as well as link rot prevention goals. I wish you the best of luck with it. Prhartcom (talk) 17:45, 13 May 2014 (UTC)[reply]
Oppose Kww, From your comments in this discussion it doesn't sound like you are interested in the goal of preventing link rot, but only your goal of purging archive.is, so I cannot let you proceed with damaging Wikipedia. Replace not remove. Prhartcom (talk) 21:26, 14 May 2014 (UTC)[reply]

*Oppose Per DWB. Duke Olav Otterson of Bornholm (talk) 15:13, 13 May 2014 (UTC)Blocked as sock.—Kww(talk) 15:45, 13 May 2014 (UTC)[reply]

- Sock of who? Has that user posted here? If not the socking is not abusive and the !vote stands. All the best: Rich Farmbrough, 13:13, 14 May 2014 (UTC).
  - It's an undisclosed alternative account, and is not permitted to participate in community discussions. The block has been upheld by another admin.—Kww(talk) 18:47, 14 May 2014 (UTC)[reply]

Statement: I will make a general comment to whoever closes this thing: this should not be a forum for people that did not prevail at an RFC to attempt to undermine the result. That isn't what a BRFA is about. The RFC had a conclusion, and I am requesting approval to run a bot to implement that conclusion.—Kww(talk) 15:39, 13 May 2014 (UTC)[reply]

Oppose To quote Hawkeye "Do not use a bot to remove links. Per Wikipedia:Archive.is RFC: the removal of Archive.is links be done with care and clear explanation. " Moreover the blocking of The Duke by Kww makes me doubt that Kww is in a good place to run a bot on such a contentious issue. Further the recent discussion was hardly consensual for removing the links, the more time that goes past, the less likely is it that archive.is is abusive as claimed. All the best: Rich Farmbrough, 13:13, 14 May 2014 (UTC).
Oppose I also endorse Rich Farmbrough's view that the RFC was quite clear. The solution is to manually re-find the URLs (or to hunt down replacements at other mirrors) for the ones that have been corrupted by the archiving service (much the same way that I did List of doping cases in sport and its newly minted subchildren). Removing the parameters outright violates the consensus established at the RFC, and individual editors editing 24/7 to replace these suggests a form of automation and not manually fishing the appropriate archives. Hasteur (talk) 17:18, 14 May 2014 (UTC)[reply]

Rich, Hasteur: the language in the RFC closing is quite clear:"There is a clear consensus for a complete removal of all Archive.is links.". Hawkeye's opinion distorts the RFC closing statement, and does not reflect the actual content of the RFC. The care called for is explicit: "To those removing Archive.is from articles, please be sure to make very clear A) why the community made this decision and B) what alternatives are available to them to deal with rotlink." Not replacement. Not exhaustive searching for alternatives. Again, the purpose of an BRFA is not to provide people that disagree with an RFC an alternate venue to restate their opposition.—Kww(talk) 18:47, 14 May 2014 (UTC)[reply]

Kww you might want to check the RFC again and check your prejudice at the door. I did support removal, but controlled removal to where we don't instantly deadlink the reference by bulk removing archive.is. I'm not attempting to overturn the previous consensus, I am only saying that botting this is not endorsed. Hasteur (talk) 19:04, 14 May 2014 (UTC)[reply]

It doesn't matter what opinion either of us expressed in the RFC, Hasteur. That's not what the closing statement says. It says that the consensus is to remove them all, and there was no consensus for the level of research that you are demanding prior to removal.—Kww(talk) 21:03, 14 May 2014 (UTC)[reply]

I haven't been following the entire discussion about this issue, is there a "tl;dr" somewhere? Is this bot task planning to remove all archive.is links, with the goal that enwp will stop linking to that site as a whole? Or is this just a "cleanup" run to remove all the spammed links. It would be nice if rather than removing, we could convert them to IA links, but that will probably just be a dream ;) Legoktm (talk) 08:20, 15 May 2014 (UTC)[reply]
- @Legoktm: It's in the functional details (Remove "archiveurl=" and "archivedate=" parameters whenever the archiveurl points at archive.is or archive.today.). It means that if the base url is gone, we instantly deadlink the reference. I observe that it's transcended basic disputes and upholding the consensus and gone to the level of "Cutting off the nose to spite the face" tactics to obliterate links to the offending website. Hasteur (talk) 12:51, 15 May 2014 (UTC)[reply]
- TLDR version for Legoktm: the sole intent of the bot is to remove every reference to archive.is from English Wikipedia. That was the consensus at WP:Archive.is RFC, so that's what the proposed bot would do. Once proposed, editors that did not prevail at the RFC have taken this opportunity to oppose the bot, many of them presenting distorted versions of the RFC close to support their position. If you look above at Hobit's position, you will see that the closer of the RFC agrees that the bot implements the consensus of the RFC. I maintain that that is the sole criteria by which this BRFA should be judged, and all the conversation above is completely irrelevant to the discussion. The question being asked is "does the bot implement the RFC?" not "does the commenter agree that links to archive.is should be removed?"—Kww(talk) 15:08, 15 May 2014 (UTC)[reply]
  - You are correct, though it is perfectly reasonable for those who opposed the removal by any means, to be against the removal by bot, even if they would have supported bot-removal for some other hypothetical links whose removal they supported. Otherwise a system of regression is in place which allows a tyranny of the minority, namely the minority that asks the questions. All the best: Rich Farmbrough, 20:06, 15 May 2014 (UTC).
    - The point is that the people that oppose the bot based on "I don't think the RFC should have generated the result that it did" should have their !votes discarded by whomever closes this thing. There are venues to discuss such things, and BRFA isn't one of them.—Kww(talk) 20:37, 15 May 2014 (UTC)[reply]
      - Kww, please don't strong-arm the process like this by trying to use *fD nomenclature like !vote. There are 2 people (myself and Rich Farmbrough) who oppose the bot for completely separate reasons besides "I don't think the RFC should have generated the result that it did". I am asking that the bot be rejected on the grounds that the deadlinking you propose is more disruptive than a controlled replacement of the links. Hasteur (talk) 20:51, 15 May 2014 (UTC)[reply]
        I'm not strongarming the process at all, Hasteur. The RFC result did not call for leaving links in place when replacements could not be found. It did not call for diligent searching for replacements prior to removal. It called for complete removal of links. The alternative you are attempting to hold the bot to did not gain consensus. I'm quite willing to entertain enhancements such as creating a centralized list of removed links or leaving talk page notices indicating what links have been removed, but I'm not willing to entertain leaving the links in place: that would run counter to the RFC result.—Kww(talk) 21:34, 15 May 2014 (UTC)[reply]
        
        Indeed, and that is perfectly legitimate, if frustrating way to !vote. If it were not, for example, we could get the following situation.
        Scenario 1: kww asks for a BRFA to remove .is links. Vote Yes, 26 % No, 74%. (say 25% think the links are good, 24% think that all bots are evil and 25% think it should be done manually)
        
        Scenario 2: RFC - passes 75%, BRFA, passes 51%.
        
        Clearly this process would be anti-consensus. Equally clearly, by extending the process with sufficient stages, and suitably worded alternatives any conclusion could be reached.
        
        All the best: Rich Farmbrough, 11:02, 16 May 2014 (UTC).

Kww doesn't seem to understand the opposition has nothing to do with "not prevailing" - like this is a trial and we are obliged by some "law" to abide by it. No part of the RFC was neutral or balanced - despite Kww's assertions otherwise. People in the first RFC were under the impression it was all done by a bot - it did not balance the contributions of other editors or even discuss that fact in its opening. The arguments and its closing were highly ambiguous, but it's been more than SIX months and much has transpired in that time. I read the RFC as calling for removal of the bot-added links, not all of them, and the close (supervote or not) did not establish a blacklist - but a blacklist was made and the bot-added links were not purged as was the expected result. Now we are calling for the complete removal of the entire website based on allegations, malware fears, and the acts of a single user, all while knowing there are no actual issues with the additions, the website or the content displayed itself. And just to top it off, as if it wasn't enough, all in the name of a flawed non-admin closed RFC that took more than six months and a much larger discussion to provoke this attempt to complete an expanded and derived reading as if the last six months (and the blacklist not functioning) never happened. Though it seems consensus can change and it has. ChrisGualtieri (talk) 17:30, 18 May 2014 (UTC)[reply]

First, your reading of the RFC is irrelevant: it was closed, and the closure was never overturned (or even challenged, for that matter). Second, if you believe that the formulation of the RFC was non-neutral, can you at least indicate what part of the original framing was non-neutral? I certainly cast my opinion in one of the discussion sections, but the framing of the circumstances was scrupulously neutral.—Kww(talk) 19:26, 18 May 2014 (UTC)[reply]
- One correction, it was certainly challenged. I think both on my talk page and either AN or ANI. In any case, ChrisGualtieri it seems wise to open a new RfC if you feel the last one was defective (either in Kww's wording or my close) or because CCC. I'm not sure why you haven't done that if you feel there were so many problems. As the closer, I've made it pretty clear I'm comfortable with a new RfC. Heck I'd even be happy to work with you on neutral wording or whatever else might be helpful. I think I closed it correctly and I don't think it was ambiguous--if some part was please let me know and I'll clarify. But I think enough time has passed that a CCC argument is a perfectly good reason to start a new RfC on the topic--I'd not be surprised if you were correct and consensus has changed. Hobit (talk) 22:35, 18 May 2014 (UTC)[reply]
  - We are moving in that direction. Let's talk on your page about an issue or two before a new RFC is made. ChrisGualtieri (talk) 04:05, 19 May 2014 (UTC)[reply]
    - Certainly you could spare a moment to actually identify a specific item in the old RFC that would support your accusations of it being biased. Or is it easier to disrupt this discussion by simply making accusations without supporting them?—Kww(talk) 04:35, 19 May 2014 (UTC)[reply]

@@ Line 116: / Line 116: @@
 :**One correction, it was certainly challenged.  I think both on my talk page and either AN or ANI. In any case, [[User:ChrisGualtieri|ChrisGualtieri]] it seems wise to open a new RfC if you feel the last one was defective (other in Kww's wording or my close) or because CCC.  I'm not sure why you haven't done that if you feel there were so many problems.  As the closer, I've made it pretty clear I'm comfortable with a new RfC. Heck I'd even be happy to work with you on neutral wording or whatever else might be helpful.  I think I closed it correctly and I don't think it was ambiguous--if some part was please let me know and I'll clarify.  But I think enough time has passed that a CCC argument is a perfectly good reason to start a new RfC on the topic--I'd not be suprised if you were correct and consensus has changed.  [[User:Hobit|Hobit]] ([[User talk:Hobit|talk]]) 22:35, 18 May 2014 (UTC)
 :*** We are moving in that direction. Let's talk on your page about an issue or two before a new RFC is made. [[User:ChrisGualtieri|ChrisGualtieri]] ([[User talk:ChrisGualtieri|talk]]) 04:05, 19 May 2014 (UTC)
+:****Certainly you could spare a moment to actually identify a specific item in the old RFC that would support your accusations of it being biased. Or is it easier to disrupt this discussion by simply making accusations without supporting them?&mdash;[[User:Kww|Kww]]([[User talk:Kww|talk]]) 04:35, 19 May 2014 (UTC)