User talk:Monkbot/task 16: remove replace deprecated dead-url params

From Wikipedia, the free encyclopedia

Undecided

My bot WP:WAYBACKMEDIC and Cyberpower678's User:InternetArchiveBot are together responsible for about 80% of all archives added to Enwiki, and they also continually check and fix existing archives. Both bots add |dead-url= when it is missing (soon to be |url-status=). Hopefully we can be on the same page with Monkbot to avoid conflict. I think having it there is a good idea, even if it defaults to 'live'. There are a couple of reasons, but mainly it gets users into the habit of associating all three parameters with archives, because many forget or neglect to add the third even when the link is dead. It also makes it easy to switch to dead status without having to remember and/or add the parameter name. Eventually all links die, and the argument will be needed. -- GreenC 00:52, 2 June 2019 (UTC)

@Trappist the monk: I don't see why you marked anything "undecided" when the source code is ready and you've requested a bot run. For this run, I suppose you plan to keep it as is. I don't, however, see the purpose of having this run change or remove any parameters other than dead-url and its alias. Contrary to that (step #3), you seem to go to great lengths not to remove an empty url-status.
I haven't seen the discussion on why dead-url was deprecated at all, but I do think url-status should be left out in most cases (meaning: not decided). Setting it to "live" needs to be questioned anyway, and setting it to "dead" when the link is not dead seems unconstructive. That is, when no archive-url is given, url-status=live is assumed; when archive-url is given, url-status=dead is assumed. Having the possibility to override is useful, but overrides should be rare.
@GreenC: With my suggestion, the argument will not be needed if the archive URL is only set for dead links. This minimizes manual work and reduces the need to remember the parameter name. I could also have suggested url-status=archived as a better name for the override, using url-status=dead only for links that are dead but not archived (or keeping it for archived links too). That would give more information, so that a bot could monitor overrides separately from links already found dead. JAGulin (talk) 18:05, 3 September 2019 (UTC)
I'm not clear what you mean by 'undecided'. That word does not appear anywhere in User:Monkbot/task 16: remove replace deprecated dead-url params. Similarly, 'this run' is not found on that page, so perhaps you are quoting me from someplace else? I intentionally leave |url-status= in response to Editor GreenC's comment at the top of this thread.
In Module:Citation/CS1/Configuration, the meta parameter UrlStatus is set to 'dead'. Before today's change, it was DeadURL set to 'yes'. This is the only parameter that is defaulted in this way, and it means that when editors do not set |url-status= to anything (omitted or empty), the module will assume that the presence of |archive-url= with a value means that the value in |url= is dead. |url-status= is ignored by the module when |archive-url= is empty or omitted. Preemptive archive links are permitted and may, in fact, be encouraged.
Trappist the monk (talk) 18:36, 3 September 2019 (UTC)
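As a rough illustration of the default described above, the following Python sketch models how a citation's effective status might be resolved. It is not Module:Citation/CS1 code, which is Lua. The function name `effective_url_status` is hypothetical, and returning `None` simply stands for "status not applicable".

```python
from typing import Optional


def effective_url_status(archive_url: Optional[str], url_status: Optional[str]) -> Optional[str]:
    """Hypothetical model of the |url-status= default; not the module's actual Lua code."""
    # |url-status= is ignored when |archive-url= is omitted (None) or empty.
    if archive_url is None or not archive_url.strip():
        return None
    # With an archive present, an omitted or empty |url-status= falls back to 'dead'.
    if url_status is None or not url_status.strip():
        return "dead"
    # Otherwise the editor's explicit value (for example 'live') is used.
    return url_status.strip()
```

Under this model, a preemptive archive needs an explicit |url-status=live. If the parameter is omitted or empty, the original URL is treated as dead.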
If an archive URL exists, best practice is to have all three arguments (archive-url, archive-date and url-status). |archive-url= can exist regardless of whether |url= is live or dead, so nothing can be assumed there ("the argument will not be needed if archive url is only set for dead links"). Likewise, URLs die organically, and having |url-status= in the cite makes it easier for anyone to change it to dead, rather than having to remember the argument name or even know it exists. We can assume all links die in time and will need it (if |archive-url= is set). -- GreenC 19:01, 3 September 2019 (UTC)
@GreenC: Is that "best practice" your opinion, or is it documented somewhere? I think we should not add useless manual work. You quoted my if-statement, but you are talking about the else-path I left out. In that case the parameter is needed, but that doesn't mean it should be introduced when it is not needed. It is true that |archive-url= can exist regardless, but anyone manually adding it would do so either because the link is dead or because they like doing extra work. In the latter case, they should then add either "live" or "archived" to signal which kind of template override they want. Ttm just confirmed that with no status parameter the template assumes status=dead, and I agree that's a case where the parameter is required.
When the link dies, someone should set archive-url but leave the status unset. If a preemptive archive has already been added, the parameter should also be present, so my suggestion seems to give what you ask for.
@Trappist the monk: The wording was "Still to be decided: dead is the default." I assumed that was what this thread was about. On the other hand, I see nothing here asking you to keep empty parameters. What is the wording?
"This run" was a reference to Wikipedia:Bots/Requests for approval/Monkbot 16, or to anything running User:Monkbot/task 16: remove replace deprecated dead-url params. You mention "ancillary tasks", but I think changes like this should stay focused on the main task. deleted_count doesn't count that work, so it isn't mentioned in the change logs.
Since this is not so much about your code as about the template's parameters, could you please point me to the discussion where url-status was defined? JAGulin (talk) 19:39, 3 September 2019 (UTC)
Still to be decided ... is a leftover from before the time that I agreed to retain |url-status= and |url-status=dead (not really needed because this is the default case when |url-status= is omitted). That paragraph can go away.
So "this run" is not a quote from me but your own words? The ancillary task (there is only one) was an extension of the deletion of empty |url-status= (before the decision to retain empty |url-status= was taken). Because I was deleting empty |url-status=, deleting other empty parameters in that same template was (is) a simple task; the ancillary task does not delete empty parameters from templates that were not modified to fix |dead-url=. If needs must, it can go away, though keeping it is harmless.
Trappist the monk (talk) 20:01, 3 September 2019 (UTC)[reply]
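The scope of the ancillary task, as described above, can be sketched in Python. This is a hypothetical illustration, not Monkbot's code. It assumes parameters arrive as already-split `name=value` strings, and `drop_empty_params` and `KEEP_WHEN_EMPTY` are invented names. Empty parameters are removed only from a template that is already being modified to fix |dead-url=, and an empty |url-status= is retained.

```python
from typing import List

# Hypothetical: parameters kept even when empty (per the decision to retain |url-status=).
KEEP_WHEN_EMPTY = {"url-status"}


def drop_empty_params(params: List[str], template_was_modified: bool) -> List[str]:
    """params are 'name=value' strings for one template, without the leading '|'."""
    if not template_was_modified:
        return params  # templates not touched for |dead-url= keep their empty parameters
    kept = []
    for p in params:
        name, sep, value = p.partition("=")
        # A named parameter whose value is empty or whitespace only.
        is_empty_named = sep == "=" and value.strip() == ""
        if is_empty_named and name.strip() not in KEEP_WHEN_EMPTY:
            continue
        kept.append(p)  # non-empty, positional, or protected parameters survive
    return kept
```

For example, `["url=//example.com", "url-status=", "archive-date=", "title=T"]` from a modified template becomes `["url=//example.com", "url-status=", "title=T"]`.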
@GreenC: {{para|url-status}} displays an empty parameter. I assumed you meant "proper value", not "always empty", in your initial post. Did I misunderstand? Do you see value in keeping an empty url-status? JAGulin (talk) 19:55, 3 September 2019 (UTC)
@GreenC: I disagree. The url-status parameter should be set only in the rare cases where the default is not appropriate. Don't fill the edit box with cruft. Don't fill my watchlist with bot edits doing nothing useful. Users should flag a url as dead by applying the archive-url and archive-date parameters only. --Srleffler (talk) 02:29, 12 September 2019 (UTC)
That is the opposite of how it works; the default is the same as url-status=no should that param be missing. Nobody is filling edit boxes; it sounds like a misunderstanding of what the discussion is about. -- GreenC 03:00, 12 September 2019 (UTC)
Clearly you are confused. The default state is |url-status=dead.
Trappist the monk (talk) 03:04, 12 September 2019 (UTC)
Guess so. Late-night posting. Anyway, after checking the code, I was mistaken: the bots do not add the param as a sole action when it is missing, only when doing something else, such as adding, changing, or deleting the archive. Filling it in as a sole action would be cosmetic. But since most citations contain all three arguments (most archives were created or edited by bot), deleting the argument would take a lot of edits. From that perspective, keeping it makes sense. -- GreenC 04:52, 12 September 2019 (UTC)

Test cases

I don't see any unit tests, and those should be the first step for a regex like this. Even for currently incorrect template usage, the bot's outcome should be predictable and known to the task reviewers. Please add tests where the following is part of the template parameters:

deadurls=no
dead-url=nose
dead-url=truely
deadurl==yes
deadurl=http://true
deadurl=|archive-url=yes/no
deadurl=y|dead-url=false|archive-url=yes/no

Cheers, JAGulin (talk) 19:39, 3 September 2019 (UTC)
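Here is a minimal sketch, in Python, of what tests for the cases above could look like. The `convert` function is a hypothetical stand-in, not Monkbot's code, and it rests on several assumptions. Parameters are split on '|', and each is split on its first '='. Only the names deadurl and dead-url are recognized. The accepted values and their url-status equivalents are also assumed: yes/y/true become dead, no becomes live, unfit and usurped are kept, and an empty value stays empty. Anything ambiguous returns `None` for human review.

```python
import re
from typing import Optional

# Assumption: the only recognized aliases are 'deadurl' and 'dead-url'.
DEAD_URL_NAME = re.compile(r"dead-?url")
# Assumption: accepted |dead-url= values and their |url-status= equivalents.
VALUE_MAP = {"yes": "dead", "y": "dead", "true": "dead", "no": "live",
             "unfit": "unfit", "usurped": "usurped", "": ""}


def convert(param_text: str) -> Optional[str]:
    """Rewrite one template's parameter text (without the leading '|').

    Returns the input unchanged when there is nothing to do, or None when
    the template should be left for a human to review.
    """
    chunks = param_text.split("|")  # naive: ignores nested templates and links
    names = [chunk.partition("=")[0].strip() for chunk in chunks]
    hits = [i for i, name in enumerate(names) if DEAD_URL_NAME.fullmatch(name)]
    if not hits:
        return param_text  # e.g. 'deadurls=no' is some other parameter
    if len(hits) > 1:
        return None  # duplicate aliases are ambiguous
    i = hits[0]
    value = chunks[i].partition("=")[2].strip()  # split on the first '=' only
    new_value = VALUE_MAP.get(value.lower())
    if new_value is None:
        return None  # unrecognized values such as 'nose' or '=yes'
    chunks[i] = "url-status=" + new_value
    return "|".join(chunks)


# Expected outcomes for the requested cases, under the assumptions above.
CASES = {
    "deadurls=no": "deadurls=no",
    "dead-url=nose": None,
    "dead-url=truely": None,
    "deadurl==yes": None,
    "deadurl=http://true": None,
    "deadurl=|archive-url=yes/no": "url-status=|archive-url=yes/no",
    "deadurl=y|dead-url=false|archive-url=yes/no": None,
}

for given, expected in CASES.items():
    assert convert(given) == expected, (given, convert(given))
```

Tests like these pin down whether the bot skips, flags, or rewrites each malformed case.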

deadurl=|dead-url=no|no-archive-url=
url=example.com/deadurl/404.html|dead-url=yes
url=//example.com/?deadurl=no|dead-url=yes
url=//example.com/page{!}name|dead-url=yes
dead-url=yes|url=//example.com/page{!}name|other={\{1}}|leftover=true
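Several of these cases contain deadurl= inside a URL value. The Python sketch below contrasts a naive pattern with one anchored to a parameter separator. Both patterns are illustrative assumptions, not the bot's actual regex, and only the parameter name is renamed; value mapping is left out.

```python
import re

# Naive: finds 'deadurl=' anywhere, including inside a URL's query string.
NAIVE = re.compile(r"dead-?url\s*=")
# Anchored: the name must open the parameter text or directly follow a '|'
# separator (optionally after whitespace). Still naive about nested templates.
ANCHORED = re.compile(r"(^|\|)(\s*)dead-?url(\s*)=")


def rename_naive(text: str) -> str:
    return NAIVE.sub("url-status=", text)


def rename_anchored(text: str) -> str:
    # Keep the separator and surrounding whitespace; swap only the name.
    return ANCHORED.sub(r"\1\2url-status\3=", text)


sample = "url=//example.com/?deadurl=no|dead-url=yes"
print(rename_naive(sample))     # url=//example.com/?url-status=no|url-status=yes
print(rename_anchored(sample))  # url=//example.com/?deadurl=no|url-status=yes
```

The anchored form still matches a |deadurl= that belongs to a template nested inside a parameter value. A real implementation would need to parse template nesting rather than rely on a single regex.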

What if it is within "<code></code>" tags? You only edit articles, right? In that case it shouldn't be a problem.

This comment from the code seems to exclude an empty parameter, a case that may not have been tested (when it is the only cite template on the page):

// use Wikisearch: insource:/\| *dead\-?url *= *[^\|\}]/

-- JAGulin (talk) 20:55, 3 September 2019 (UTC)

The search insource:/\| *dead\-?url *= *[^\|\}]/ finds about 850 cases. Modified to insource:/\| *dead\-?url *= *[^\|\}]*/ (extra "*" at the end), it finds about 46,000 cases. It matches |dead-url=| with no space between the = and the |. The reason is that [^\|\}] says "there must be something other than one of these characters", and since there is nothing there but one of those characters, it doesn't match. The "*" makes it optional. Alternatively, insource:/\| *dead\-?url *=/ finds over 54,000 articles. -- GreenC 03:20, 13 October 2019 (UTC)
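The effect of the trailing "*" can be checked offline. This Python sketch uses `re.search` as a stand-in for CirrusSearch's insource:/.../ regex. That is only an approximation, since the two engines' regex dialects differ, and the sample citations are made up. In this approximation, B and C accept the same inputs because `[^\|\}]*` may match zero characters. The different counts reported above come from the live search and are not reproduced here.

```python
import re

# Python approximations of the three insource:/.../ searches discussed above.
PATTERNS = {
    "A": re.compile(r"\| *dead\-?url *= *[^\|\}]"),   # original: requires a value character
    "B": re.compile(r"\| *dead\-?url *= *[^\|\}]*"),  # trailing '*': value optional
    "C": re.compile(r"\| *dead\-?url *="),            # parameter name only
}

# Made-up sample citations.
SAMPLES = [
    "{{cite web |url=//example.com |dead-url=yes}}",
    "{{cite web |url=//example.com |dead-url=|archive-url=//archive.example}}",
    "{{cite web |url=//example.com |deadurl=}}",
]


def matches(text: str) -> dict:
    """Report which of the three patterns find a match anywhere in text."""
    return {label: bool(p.search(text)) for label, p in PATTERNS.items()}


for s in SAMPLES:
    print(matches(s), s)
```

Only the first sample, which has a non-empty value, is found by pattern A. The empty |dead-url=| and |deadurl=}} forms are found only by B and C.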