Wikipedia:Link rot/URL change requests/Archives/2021/July

From Wikipedia, the free encyclopedia


Hello. It seems that Seed (magazine)'s old URL has been acquired by somebody, and the old references containing seedmagazine.com are being redirected to a new outfit, presumably chasing clicks. This redirects readers in unexpected ways and prevents normal link-rot bots from recognising that the domain is dead. A Google search for site:wikipedia.org "seedmagazine.com" suggests there are 130-ish mentions remaining, with 30-ish on English Wikipedia.
I'm sure that somebody who's better versed in the arcane language of Wikipedia in-text search could come up with more usable numbers.

Could a bot writer consider executing the following:

  • Add archive URL where it exists and is missing
  • Flip all instances of |url-status=live --> |url-status=dead (if there are any)
  • Wrap the external links in nowiki tags to discourage clicking, so readers aren't redirected unexpectedly
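For anyone sketching such a bot, the three steps above might look roughly like this in Python. This is a naive wikitext sketch, not how WaybackMedic or IABot actually works: the regexes, the cite-template matching, and wrapping only square-bracket links in nowiki are all simplifying assumptions.

```python
import re

def mark_usurped(wikitext: str) -> str:
    """Rough sketch of the requested edits for a usurped domain:
    flip |url-status=live to dead in citations that mention the
    domain, and neutralize square-bracket external links with
    <nowiki> tags. Hypothetical helper, not a real bot's code."""
    # Step 1/2: flip url-status inside cite templates naming the domain.
    def flip(match: "re.Match") -> str:
        template = match.group(0)
        if "seedmagazine.com" in template:
            template = template.replace("|url-status=live", "|url-status=dead")
        return template

    wikitext = re.sub(r"\{\{[Cc]ite [^{}]*\}\}", flip, wikitext)

    # Step 3: wrap square-bracket external links so they no longer render
    # as clickable links (bare links would need a separate pass).
    wikitext = re.sub(
        r"\[(https?://(?:www\.)?seedmagazine\.com[^\s\]]*)( [^\]]*)?\]",
        lambda m: "<nowiki>" + m.group(1) + "</nowiki>" + (m.group(2) or ""),
        wikitext,
    )
    return wikitext
```

Adding the archive URL (the first bullet) is omitted here because it requires an external lookup; see the Wayback availability sketch further down this page.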

Thanks. Melmann 18:58, 24 June 2021 (UTC)

This is, in the parlance, a "usurped" domain (or "hijacked"). What I can do for these is blacklist each URL in the IABot database along with an archive URL; this will cause the bot to always treat it as dead and use the archive even if it pings alive, across 80+ wiki languages. On enwiki only, it can add |url-status=usurped to any in a CS1|2 template. For bare and square links it will try to convert to an archive URL. If no archive is available, it will require manual attention. -- GreenC 21:34, 24 June 2021 (UTC)

Results done. Links were in about 130 articles on enwiki. Also manually converted a dozen {{webarchive}} to cite templates with |unfit=, and cleaned up the Seed (magazine) article. Updated the IABot db, etc. -- GreenC 03:37, 27 June 2021 (UTC)

I did not know there was an unfit parameter. Should really read the documentation more carefully. Thank you for your efforts, much appreciated. Melmann 21:51, 29 June 2021 (UTC)

www.mod.go.jp only allows https access now

Without a fix, each URL gets an error page and then, after 10 seconds, a redirect to the home page (in Japanese) rather than the (for the most part) English page relevant to the topic.

I fixed one article and found there are probably 100-200 articles affected.

Is there a kind soul with a bot who could programmatically change the following? It seems a lot to do by hand.

http://www.mod.go.jp/* to https://www.mod.go.jp/* ? Alex Sims (talk) 05:00, 2 July 2021 (UTC)
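The requested rewrite is a one-line substitution; a minimal Python sketch follows (the real bot presumably also verifies that each https URL actually resolves before editing):

```python
import re

def upgrade_mod_go_jp(wikitext: str) -> str:
    """Rewrite http://www.mod.go.jp/* links to https://, as requested
    above. Only the scheme changes; path and query are left alone."""
    return re.sub(r"http://(www\.mod\.go\.jp)", r"https://\1", wikitext)
```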

Referring to Bender235 and Bender the Bot . -- GreenC 06:19, 2 July 2021 (UTC)
Will do. --bender235 (talk) 12:56, 2 July 2021 (UTC)

Convert 200.57.183.69 to info.guadalajara2011.org.mx

Example. Only convert when there is an archive URL available for the version at info.guadalajara2011.org.mx; otherwise add {{dead link}}. Update the IABot database for the IP URL with the archive URL at info.guadalajara2011.org.mx, and mark the "domain" (IP) blacklisted. -- GreenC 19:38, 4 July 2021 (UTC)
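A rough Python sketch of the conversion logic described above, using the public Wayback Machine availability API. The API endpoint is real, but `best_archive` and `migrate` are illustrative assumptions, not IABot's actual code:

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

def best_archive(url: str) -> Optional[str]:
    """Ask the Wayback Machine availability API for the closest
    snapshot of `url`; return its archive URL, or None if none exists."""
    api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(api) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None

def migrate(ip_url: str,
            lookup: Callable[[str], Optional[str]] = best_archive) -> str:
    """Convert a 200.57.183.69 link to its info.guadalajara2011.org.mx
    equivalent only when an archive of that version exists; otherwise
    keep the original URL and tag it {{dead link}}."""
    domain_url = ip_url.replace("200.57.183.69", "info.guadalajara2011.org.mx")
    archive = lookup(domain_url)
    if archive is not None:
        return archive
    return ip_url + " {{dead link}}"
```

The `lookup` parameter is injected so the decision logic can be tested without network access.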

Results Converted 401 links and added 726 {{dead link}}. Example. -- GreenC 15:25, 12 July 2021 (UTC)

columbiabusinesstimes.com

I found several articles with broken links to columbiabusinesstimes.com. Can these links be automatically archived? Jarble (talk) 14:52, 5 July 2021 (UTC)

32 articles. Set dead in IABot and created queue to process. -- GreenC 17:04, 12 July 2021 (UTC)

www.ctv.com

I found at least 100 broken links to this site: can they be automatically archived? Jarble (talk) 14:55, 5 July 2021 (UTC)

The site has links in over 3,000 articles (not just /servlet). Many appear to be soft 404s that are difficult to detect due to JavaScript. There are working links too (example). -- GreenC 06:22, 13 July 2021 (UTC)
Looks like most links were moved to a new domain, ctvnews.ca, sometime before 2016. For example this became this. No redirects and no patterns: everything in the new URL is different. Nevertheless I'm finding a way to make some of them live again. -- GreenC 02:17, 14 July 2021 (UTC)

Results Every link in the ctv.ca domain on enwiki was processed.

  • moved 2,048 links to ctvnews.ca
  • converted 679 ctv.com to archive URLs
  • added 346 {{dead link}} tags
  • many other misc fixes

In case anyone wants to replicate this on other wikis in the future: This was difficult. It is hard to detect 404 status, and hard to find redirect URLs. The redirects to ctvnews.ca, when not found in the header, can sometimes be found by searching for the ctv.ca URL at Wayback with a timestamp of 20160101 (i.e. the site had working redirects at one time but then deleted them; some redirect URLs (not pages) were luckily saved in the Wayback Machine). Since the site uses JS and does not emit a proper 404, the method I used is the w3m utility in -dump mode: if the output is fewer than 43 lines, it is probably a 404. -- GreenC 01:31, 15 July 2021 (UTC)
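The line-count heuristic described above can be sketched in Python like this. The 43-line threshold comes from the comment above and is specific to this site; `render_with_w3m` assumes the w3m utility is installed:

```python
import subprocess

def looks_like_soft_404(rendered_text: str, threshold: int = 43) -> bool:
    """The heuristic above: a JS-heavy page that renders to fewer than
    ~43 lines of plain text is probably a soft 404."""
    return len(rendered_text.splitlines()) < threshold

def render_with_w3m(url: str) -> str:
    """Dump a page to plain text, mirroring `w3m -dump URL`.
    Requires the w3m browser to be installed on the system."""
    result = subprocess.run(
        ["w3m", "-dump", url], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Other JS-heavy sites would need their own calibrated threshold; a genuine article page typically dumps far more text than an error shell.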

Moved from WP:BOTREQ

theinquirer.net is gone; what we have linked now redirects to a somewhat generic trustedreviews.com URI.

We should be looking to "url-status"-neuter any citations linked to theinquirer.net, and to kill any "External links". Thanks if someone is set up to do that. — billinghurst sDrewth 07:16, 8 July 2021 (UTC)

You should probably post this at WP:URLREQ instead. * Pppery * it has begun... 12:07, 8 July 2021 (UTC)

This is done. It toggled around 500 to |url-status=unfit or from bare URL to archive URL. I manually converted about 6 bare URLs without archives to cite templates, and 2 dozen {{webarchive}} to cite web. Changed the domain status to Blacklisted in the IABot database. -- GreenC 22:04, 9 July 2021 (UTC)

Living Books please help!

I wonder if you could help me with some linkrot that has crept into my article Living Books? I have put a lot of effort into putting this mammoth article together, though admittedly citations are not my strong suit. Many of these sources have now fallen to linkrot. Your assistance would be invaluable. :)--Coin945 (talk) 13:28, 13 July 2021 (UTC)

Go to the History tab; on the top row is a "Fix dead links" link to run InternetArchiveBot. Run it a few times over the next 6 weeks, say 3-4 times, because it delays adding archives until it gets a dead-link result multiple times (unless it already knows the link is dead in its database). My bot WaybackMedic is more targeted to certain domains with known problems, or links that already have a {{dead link}} tag. -- GreenC 16:30, 13 July 2021 (UTC)

Moved from WP:RSN

Hi all, World Gazetteer [1] or [2] is used as a reference for city population sizes on a lot of pages, like List of countries by largest and second largest cities, List of highest cities, List of cities in Ghana and many more (wikipedia search). Links to World Gazetteer don't work, and many archived links on the Wayback Machine don't work either; a message "Sorry, no offline reader allowed. You can use the download function." is returned. A message on Talk:List_of_countries_by_largest_and_second_largest_cities#World_Gazetteer_as_source indicates that the links have not worked since at least 31 July 2019. Some archived links do work though, for example at List of Nigerian states by area. Maybe www.citypopulation.de can be used as an alternative source.

Because there are a lot of pages with links to World Gazetteer, I am asking here how to proceed. Difool (talk) 09:10, 9 July 2021 (UTC)

@Difool: Post the above at WP:URLREQ and someone will help you. Headbomb {t · c · p · b} 18:13, 10 July 2021 (UTC)

OK. Will get to it. -- GreenC 20:21, 11 July 2021 (UTC)

@Difool: It is done. Any problems let me know. -- GreenC 21:28, 16 July 2021 (UTC)

Results

  • Added 514 new working archive URLs. Example
  • Removed 440 archive URLs due to the "Sorry, no offline reader allowed" soft-404. These had no other archive providers available; the ones that did are in the 514 group.
@GreenC: Wow, thanks a lot! The only problem I saw is that an archived "bevoelkerungsstatistik.de" can give a German soft-404: "Leider kein Offline-Reader erlaubt. Dafür gibt es die Download-Funktion.". For example, this one [3]. Difool (talk) 03:29, 17 July 2021 (UTC)
lol oh man, should have guessed. Will see if I can find these. You are welcome. This was actually a new thing for the program: finding soft-404s in archive URLs pre-existing on-wiki has never come up before. -- GreenC 03:51, 17 July 2021 (UTC)
Found 9, fixed manually. -- GreenC 04:21, 17 July 2021 (UTC)

Reuters (again)

Moved from User_talk:GreenC#Reuters

Hi! It seems like Reuters has changed their URL structure a bit, from e.g. https://www.reuters.com/article/2013/12/05/us-sweden-spying-idUSBRE9B40AG20131205 to https://www.reuters.com/article/us-sweden-spying-idUSBRE9B40AG20131205 (removing the date). The old URLs are dead, and IABot is archiving them as such. Could your bot perhaps "update" the URLs instead to the new ones, if the new ones are 200 and the old ones are 404? Jonatan Svensson Glad (talk) 20:50, 1 July 2021 (UTC)
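The URL change described here is a simple pattern rewrite; a minimal Python sketch follows (in practice the bot would also verify the new URL returns 200 and the old one 404 before swapping, as requested):

```python
import re

# Old dated form: .../article/YYYY/MM/DD/slug ; new form: .../article/slug
DATED = re.compile(r"(reuters\.com/article)/\d{4}/\d{2}/\d{2}/")

def strip_reuters_date(url: str) -> str:
    """Rewrite an old dated Reuters article URL to the new dateless
    form. Already-dateless URLs pass through unchanged."""
    return DATED.sub(r"\1/", url)
```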

See e.g. Special:Diff/1031471416&oldid=1022265762. Jonatan Svensson Glad (talk) 20:51, 1 July 2021 (UTC)
Josve05a Yes, will do. But only for enwiki. And it will take probably 3-4 days before edits. Estimate these are in around 10,000 articles. -- GreenC 01:50, 2 July 2021 (UTC)
Great, thanks! Jonatan Svensson Glad (talk) 19:00, 2 July 2021 (UTC)
Josve05a, giving a status update [and move of thread]. The first run is uploading diffs right now, incorporating the great discovery you made about the dates, plus fixing 404s and soft-404s, and repairing sub-domain issues first noted by Nemo_bis here. It processed 18,290 articles and found changes in 16,506. There are an additional 31,074 articles with Reuters links in a second set. The bot is beginning to process those. -- GreenC 04:22, 5 July 2021 (UTC)

This is done for now. -- GreenC 20:20, 11 July 2021 (UTC)

@GreenC: Great work! Jonatan Svensson Glad (talk) 21:33, 18 July 2021 (UTC)