Wikipedia:Link rot/URL change requests: Difference between revisions
:I just tried two links [https://ghostarchive.org/archive/20220110/https://www.telegraph.co.uk/sport/football/teams/england/7823072/England-v-USA-1950-World-Cup-win-over-the-Three-Lions-lives-long-in-the-memory.html] from [[England national football team]] and [https://ghostarchive.org/varchive/youtube/20211221/XAJEXUNmP5M] from [[YouTube]] and they both seemed to work? [[User:GrapesRock|GrapesRock]] ([[User talk:GrapesRock|talk]]) 03:14, 20 July 2024 (UTC) |
::::Same, GhostArchive works fine for me... don't know why GreenC is/was having trouble. [[User:Nexnot|N''ex'']] [[Special:Random|🌐]] [[WP:Signpost|📰]]<sup>[[User talk:Nexnot| leave a message]]</sup> 04:05, 20 July 2024 (UTC) |
Revision as of 04:05, 20 July 2024
This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These include InternetArchiveBot and WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.
finlex.fi
This section is pinned and will not be automatically archived. |
Finlex.fi URLs aren't dead but for some reason InternetArchiveBot keeps adding archived URLs for them. This was brought up at meta:User_talk:InternetArchiveBot#Finlex.fi_URLs_aren't_dead a month ago: Bot's edits: [1], [2], [3]. Some URLs it tagged as dead but are actually working: [4], [5], [6].
Those finlex.fi URLs that now have both a working URL and an archive URL should be tagged with the |url-status=live tag, and could someone try to tell IABot that Finlex is live? Thanks. 2001:14BA:9C94:9A00:E866:DADA:1085:E3D9 (talk) 09:28, 17 March 2024 (UTC)
- Just noticed that this same issue is being discussed at fi.wikipedia: fi:Wikipedia:Kahvihuone_(tekniikka)#Botti_hakee_arkistosta_kumottuja_lakeja 2001:14BA:9C94:9A00:E866:DADA:1085:E3D9 (talk) 09:41, 17 March 2024 (UTC)
- The site has an "Are you human?" check box (CloudFlare), which causes the bot to think it's a dead site. I logged into iabot.org and changed the domain to "Subscription" status; that will cause the bot to avoid this domain, so it won't set live or dead. My bot WaybackMedic has capabilities to bypass CloudFlare. I can try to process this domain and see what happens. My bot also has a "make live" feature, i.e. it converts a citation from dead to live state. Unfortunately my bot only works on English Wikipedia. I'll let you know what happens. -- GreenC 15:13, 17 March 2024 (UTC)
- Unfortunately, this site has maximum security enabled and none of my tools can get through. It started happening in late January 2024. I don't know what to do because no bot is able to determine if a link is live or dead, and no archive service such as WaybackMachine is able to archive a page. Only humans can get through, and they need to solve a captcha. It might be worthwhile waiting to see if they relax security in the future, since this is a recent development. -- GreenC 00:40, 19 March 2024 (UTC)
- @GreenC: Before this section gets archived and if it's easy/fast to check, can you check if this is still the case, i.e. that the site still has the maximum security enabled and no tool/bot can get through? Thank you. 85.76.109.152 (talk) 06:21, 2 June 2024 (UTC)
- When going to [7] it still asks "Are you human?" with the CloudFlare security tag at the bottom. This is a feature of the CloudFlare service that clients have the option to enable; it's the highest level of security. I'm not aware of a tool that can bypass it. What I will do is set a reminder in 6 months to check again and post the results here. I use W-Ping, which posts a reminder in the watchlist at whatever time in the future with a custom message. -- GreenC 16:06, 2 June 2024 (UTC)
RateTheRef.net
The website RateTheRef.net seems to have been usurped by a Thai gambling site. I don't know how many pages this affects, or whether the old content has been archived, but I figured someone ought to be told. DavidKVT (talk) 21:21, 18 March 2024 (UTC)
- Done User:DavidKVT: Thank you. Added to the JUDI list for a batch job later: Special:Diff/1207703597/1214769148 -- GreenC 01:26, 21 March 2024 (UTC)
cinestaan.com
It looks like the site is dead: I cannot find it on Google search, and an article returns error 503. Check this out too. Kailash29792 (talk) 11:10, 23 March 2024 (UTC)
- 2,243 pages. Offline since December 2023. I can do this. -- GreenC 14:16, 23 March 2024 (UTC)
- Done Checked 2,243 pages. Edited 2,206 pages. Added 2,371 archive URLs all WaybackMachine. Added 312 {{dead link}} tags. Added 255 |url-status=dead for existing archive URLs previously set live. Updated IABot database so changes will propagate to 300+ other wiki language sites. -- GreenC 16:54, 24 March 2024 (UTC)
Bumping thread. -- GreenC 19:27, 4 June 2024 (UTC)
cfa-www.harvard.edu
URLs of form cfa-www.harvard.edu/iauc can be converted to cbat.eps.harvard.edu/iauc
- http://cfa-www.harvard.edu/iauc/08500/08524.html -->
- http://cbat.eps.harvard.edu/iauc/08500/08524.html
-- GreenC 13:55, 5 April 2024 (UTC)
- Done - converted 63 links: Example Special:Diff/1209696306/1218342515. All edits: [8] -- GreenC 04:24, 11 April 2024 (UTC)
Tag: FABLE-0424
archive.thisislancashire.co.uk
Conversion:
- http://archive.thisislancashire.co.uk/1998/5/8/801697.html -->
- https://www.lancashiretelegraph.co.uk/archive/1998/5/8/801697.html/ (include trailing slash)
-- GreenC 14:50, 5 April 2024 (UTC)
- Not done - too many false positives (about 50%). Requires manual checks for each link (approx. 160). Contact me if interested in doing this work; I can provide the data. -- GreenC 13:06, 11 April 2024 (UTC)
Tag: FABLE-0424
herbaria4.herb.berkeley.edu
Conversion:
- http://herbaria4.herb.berkeley.edu/eflora_display.php?tid=21820 -->
- https://ucjeps.berkeley.edu/eflora/eflora_display.php?tid=21820
-- GreenC 14:59, 5 April 2024 (UTC)
- Done - converted 232 links: Example Special:Diff/1165643139/1218392247. All edits: [9] -- GreenC 13:22, 11 April 2024 (UTC)
Tag: FABLE-0424
fallingrain.com
Conversion:
- http://www.fallingrain.com/world/PK/3/Toru.html -->
- https://www.fallingrain.com/world/PK/03/Toru.html
1,318 pages -- GreenC 04:19, 6 April 2024 (UTC)
- Done - converted 1,204 links: Example Special:Diff/1216003034/1218402425. All edits: [10] -- GreenC 14:36, 11 April 2024 (UTC)
Tag: FABLE-0424
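For illustration only, the fallingrain.com conversion above can be sketched in Python. This is a minimal sketch, not the WaybackMedic code: it assumes the rule is "switch to https and zero-pad a one-digit region code", which is inferred from the single example given, and the function name `convert_fallingrain` is hypothetical.

```python
import re


def convert_fallingrain(url: str) -> str:
    """Upgrade to https and zero-pad a one-digit region code (assumed rule)."""
    url = re.sub(r"^http://", "https://", url)
    # /world/<CC>/<digit>/ becomes /world/<CC>/0<digit>/ ; two-digit codes are left alone
    return re.sub(r"(/world/[A-Z]{2}/)(\d)(/)", r"\g<1>0\g<2>\g<3>", url)


print(convert_fallingrain("http://www.fallingrain.com/world/PK/3/Toru.html"))
```

Any URL produced this way would still need a live check before a citation is changed.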
ilmbwww.gov.bc.ca
Conversion:
- http://(wlap|srm|ilmb)www.gov.bc.ca/bcgn-bin/bcg10?name=5586 -->
- https://apps.gov.bc.ca/pub/bcgnws/names/5586.html
73 pages -- GreenC 04:25, 6 April 2024 (UTC)
- Done - converted 60 links: Example Special:Diff/1179972920/1218415470. All edits: [11] -- GreenC 15:59, 11 April 2024 (UTC)
Tag: FABLE-0424
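As a hedged illustration, the (wlap|srm|ilmb) pattern above maps naturally onto a regular expression. The sketch below is an assumption about how the rule could be coded, not the bot's implementation; `convert_bcgn` is a hypothetical name, and the sketch assumes the `name` value is always numeric, as in the example.

```python
import re
from typing import Optional

# Matches the three legacy hostnames listed in the request.
BCGN_OLD = re.compile(
    r"^https?://(?:wlap|srm|ilmb)www\.gov\.bc\.ca/bcgn-bin/bcg10\?name=(\d+)$"
)


def convert_bcgn(url: str) -> Optional[str]:
    """Return the apps.gov.bc.ca equivalent, or None if the URL does not match."""
    match = BCGN_OLD.match(url)
    if match is None:
        return None
    return f"https://apps.gov.bc.ca/pub/bcgnws/names/{match.group(1)}.html"


print(convert_bcgn("http://ilmbwww.gov.bc.ca/bcgn-bin/bcg10?name=5586"))
```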
quinzaine-realisateurs.com
Conversion:
- http://www.quinzaine-realisateurs.com/qz_an/1998/ -->
- http://www.quinzaine-cineastes.fr/fr/edition/1998
66 pages -- GreenC 04:41, 6 April 2024 (UTC)
- Done - converted 49 links: Example Special:Diff/1112134327/1218507144. All edits: [12] -- GreenC 03:31, 12 April 2024 (UTC)
Tag: FABLE-0424
sherdog.com
Conversion:
- http://www.sherdog.com/news/press%20releases/Cage-Warriors-Announce-Line-Up-10246 -->
- https://www.sherdog.com/news/pressreleases/Cage-Warriors-Announce-LineUp-10246
22 pages -- GreenC 14:01, 6 April 2024 (UTC)
Done - converted 24 links. Example Special:Diff/1193444370/1218565151. All edits: [13] -- GreenC 13:43, 12 April 2024 (UTC)
Tag: FABLE-0424
organismnames.com
Many links are marked dead, but are actually live. Reprocess and reset.
-- GreenC 14:21, 6 April 2024 (UTC)
Done - converted 68 citations to live status. Example Special:Diff/1190213071/1218577608. All edits: [14] -- GreenC 15:14, 12 April 2024 (UTC)
Tag: FABLE-0424
fchd.info
- Convert all to https
- If URL contains a long-dash convert to short dash eg.
-- GreenC 14:36, 6 April 2024 (UTC)
- Done - converted 2,455 links to https. 329 switched from dead to live (Special:Diff/1212622277/1218602461 & Special:Diff/1174652764/1218602000). Fix 5 with long-dash error: Special:Diff/1193226855/1218601104. All edits: [15] -- GreenC 20:49, 12 April 2024 (UTC)
Tag: FABLE-0424
beta.latimes.com
Conversion:
- http://beta.latimes.com/world/africa/la-fg-zimbabwe-arrest-american-20171103-story.html -->
- https://www.latimes.com/world/africa/la-fg-zimbabwe-arrest-american-20171103-story.html
96 pages -- GreenC 15:16, 6 April 2024 (UTC)
- Done - Converted 101 links. Removed 47 {{dead link}}. Switched 12 |url-status=dead to live. All edits: [16] -- GreenC 01:31, 13 April 2024 (UTC)
Tag: FABLE-0424
archive.ilmb.gov.bc.ca
Conversion:
- http://archive.ilmb.gov.bc.ca/bcgn-bin/bcg10?name=1141
- https://apps.gov.bc.ca/pub/bcgnws/names/1141.html
71 pages -- GreenC 17:14, 6 April 2024 (UTC)
- Done - Converted 46 links. Removed 6 {{dead link}} templates. Added 22 {{dead link}}. Switched 11 |url-status=dead to live. All edits: [17] -- GreenC 01:55, 13 April 2024 (UTC)
Tag: FABLE-0424
www.hrc.org/blog
Conversion:
- https://www.hrc.org/blog/hrc-endorses-u.s.-rep.-colin-allred-and-state-rep.-julie-johnson -->
- https://www.hrc.org/news/hrc-endorses-u-s-rep-colin-allred-and-state-rep-julie-johnson
The "/news" could also be "/press-releases/". The "." convert to "-"
- Done - Checked 258 pages and edited 168 pages. Converted 418 links. Switched 24 |url-status=live to dead. Added 54 archive URLs (50 Wayback). -- GreenC 19:32, 13 April 2024 (UTC)
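A minimal Python sketch of the /blog rule above, for illustration only. It generates both candidate URLs ("/news" and "/press-releases") so that each can be tested. Collapsing the doubled dashes that result from replacing "." with "-" is an assumption inferred from the example (u.s.-rep. becomes u-s-rep), and `hrc_candidates` is a hypothetical name.

```python
import re
from typing import List
from urllib.parse import urlsplit, urlunsplit


def hrc_candidates(url: str) -> List[str]:
    """Return candidate replacement URLs for an hrc.org /blog/ link."""
    parts = urlsplit(url)
    prefix = "/blog/"
    if not parts.path.startswith(prefix):
        return []
    slug = parts.path[len(prefix):]
    # "." becomes "-", then runs of dashes are collapsed (assumed from the example)
    slug = re.sub(r"-{2,}", "-", slug.replace(".", "-"))
    return [
        urlunsplit((parts.scheme, parts.netloc, f"/{section}/{slug}", "", ""))
        for section in ("news", "press-releases")
    ]


for candidate in hrc_candidates(
    "https://www.hrc.org/blog/hrc-endorses-u.s.-rep.-colin-allred-and-state-rep.-julie-johnson"
):
    print(candidate)
```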
Conversion:
- https://www.hrc.org/press/hrc-endorses-kyrsten-sinema-for-u.s.-senate
- https://www.hrc.org/press-releases/hrc-endorses-kyrsten-sinema-for-u.s.-senate
-- GreenC 17:27, 6 April 2024 (UTC)
- Done - Converted 12 links manually. -- GreenC 18:16, 13 April 2024 (UTC)
Tag: FABLE-0424
arrs.run
Conversion:
Add "/MaraRank/", "https" and remove "www"
51 pages -- GreenC 17:33, 6 April 2024 (UTC)
- Done - Checked 36 pages and edited 36 pages. Converted 36 links. Removed 23 {{dead link}} templates. -- GreenC 20:30, 13 April 2024 (UTC)
Tag: FABLE-0424
algerie360.com/sport
Conversion:
- https://www.algerie360.com/sport/division-1-division-2/hemani-lache-laso-pour-le-csc/
- https://www.algerie360.com/hemani-lache-laso-pour-le-csc/
Remove everything in path but last element.
21 pages -- GreenC 17:42, 6 April 2024 (UTC)
- Done - Checked 19 pages and edited 16 pages. Converted 14 links. Removed 3 {{dead link}} templates. Added 1 {{dead link}}. Switched 9 |url-status=dead to live. Added 2 archive URLs (2 Wayback). -- GreenC 01:51, 14 April 2024 (UTC)
Tag: FABLE-0424
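The "remove everything in path but last element" rule for algerie360.com can be sketched as below. This is an illustrative sketch only; `keep_last_path_element` is a hypothetical name, and keeping the trailing slash follows the example rather than any stated requirement.

```python
from urllib.parse import urlsplit, urlunsplit


def keep_last_path_element(url: str) -> str:
    """Drop every path segment except the last one."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments:
        return url  # nothing to shorten
    new_path = f"/{segments[-1]}/"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


print(keep_last_path_element(
    "https://www.algerie360.com/sport/division-1-division-2/hemani-lache-laso-pour-le-csc/"
))
```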
soccerbase.com
Conversion (players):
- http://www.soccerbase.com/players_details.sd?playerid=63162
- https://www.soccerbase.com/players/player.sd?player_id=63162
- Done - Checked 791 pages and edited 785 pages. Converted 1345 links. Removed 14 {{dead link}} templates. Switched 342 |url-status=dead to live. Switched 1 |url-status=live to dead. Added 16 archive URLs (6 Wayback).
Conversion (managers):
- http://www.soccerbase.com/managers2.sd?managerid=891
- http://www.soccerbase.com/managers/manager.sd?manager_id=891
- Done - Checked 162 pages and edited 160 pages. Converted 449 links. Switched 167 |url-status=dead to live.
Conversion (referees):
- http://www.soccerbase.com/refs2.sd?refid=1042
- http://www.soccerbase.com/referees/referee.sd?referee_id=1042
- Done - Checked 60 pages and edited 60 pages. Converted 65 links. Removed 1 {{dead link}} templates. Added 2 archive URLs (0 Wayback).
Conversion (teams):
- http://www.soccerbase.com/teams2.sd?teamid=2493
- https://www.soccerbase.com/teams/team.sd?team_id=2493
- Done - Checked 86 pages and edited 86 pages. Converted 95 links. Switched 10 |url-status=dead to live. Added 7 archive URLs (0 Wayback).
-- GreenC 16:11, 14 April 2024 (UTC)
Tag: FABLE-0424
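The four soccerbase.com conversions above share one shape (old script name and ID parameter map to a new path and parameter), so they can be expressed as a small rule table. The sketch below is illustrative only; the target templates, including their http/https schemes, are copied from the examples as listed, and `convert_soccerbase` is a hypothetical name.

```python
import re
from typing import Optional

# (pattern on the old URL, template for the new URL), copied from the listed examples
SOCCERBASE_RULES = [
    (r"/players_details\.sd\?playerid=(\d+)$",
     "https://www.soccerbase.com/players/player.sd?player_id={}"),
    (r"/managers2\.sd\?managerid=(\d+)$",
     "http://www.soccerbase.com/managers/manager.sd?manager_id={}"),
    (r"/refs2\.sd\?refid=(\d+)$",
     "http://www.soccerbase.com/referees/referee.sd?referee_id={}"),
    (r"/teams2\.sd\?teamid=(\d+)$",
     "https://www.soccerbase.com/teams/team.sd?team_id={}"),
]


def convert_soccerbase(url: str) -> Optional[str]:
    """Return the new-style URL for the first matching rule, else None."""
    for pattern, template in SOCCERBASE_RULES:
        match = re.search(pattern, url)
        if match:
            return template.format(match.group(1))
    return None


print(convert_soccerbase("http://www.soccerbase.com/players_details.sd?playerid=63162"))
```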
boxingscene.com
Conversion:
- https://www.boxingscene.com/%20/arum-fury-wilder-happen-even-2021-then-joshua-whyte--150822
- https://www.boxingscene.com/arum-fury-wilder-happen-even-2021-then-joshua-whyte--150822
73 pages -- GreenC 18:47, 6 April 2024 (UTC)
- Done - Checked 72 pages and edited 72 pages. Converted 93 links. Removed 7 {{dead link}} templates. Added 2 archive URLs (2 Wayback). -- GreenC 16:57, 14 April 2024 (UTC)
Tag: FABLE-0424
nzfootball.co.nz
Conversion:
- https://www.nzfootball.co.nz/newsarticle/77966?newsfeedId=569432
- https://www.nzfootball.co.nz/newsarticle/77966
220 pages -- GreenC 18:53, 6 April 2024 (UTC)
- Not done - nothing to do. Links have same content. -- GreenC 17:02, 14 April 2024 (UTC)
Tag: FABLE-0424
wnbl.com.au
Conversion:
- (old): http://wnbl.com.au/todhunter-re-signs-rangers/
- (new): https://wnbl.basketball/blog/news/todhunter-re-signs-rangers/
- (old) http://wnbl.com.au/bendigo-spirit-welcome-back-special-k/
- (new) https://wnbl.basketball/blog/news/bendigo-spirit-welcome-back-special-k/
If path does not contain "/" or "?" or "&" or "#" .. test replacement URL at wnbl.basketball/blog/news
Conversion:
- http://wnbl.com.au/bendigo_news/spirit-reaches-sky/
- https://wnbl.basketball/bendigo/news/spirit-reaches-sky/
"/bendigo_news/" --> "/bendigo/news/"
- Done - Checked 310 pages and edited 185 pages. Converted 403 links. Removed 29 {{dead link}} templates. Added 2 {{dead link}}. Switched 29 |url-status=dead to live. Switched 1 |url-status=live to dead. Added 151 archive URLs (145 Wayback). -- GreenC 15:14, 15 April 2024 (UTC)
Tag: FABLE-0424
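The two wnbl.com.au rules above can be sketched as a function that proposes a candidate URL. This is an illustration under stated assumptions: `wnbl_candidate` is a hypothetical name, the "path" in the first rule is taken to mean the slug after stripping the leading and trailing slashes, and the returned candidate would still need to be tested live, as the request says.

```python
from typing import Optional
from urllib.parse import urlsplit


def wnbl_candidate(url: str) -> Optional[str]:
    """Propose a wnbl.basketball URL for an old wnbl.com.au link, or None."""
    parts = urlsplit(url)
    if not parts.netloc.lower().endswith("wnbl.com.au"):
        return None
    old_prefix = "/bendigo_news/"
    if parts.path.startswith(old_prefix):
        # "/bendigo_news/" --> "/bendigo/news/"
        return "https://wnbl.basketball/bendigo/news/" + parts.path[len(old_prefix):]
    slug = parts.path.strip("/")
    # Single-segment slug with no "/", "?", "&" or "#"
    if slug and "/" not in slug and "&" not in slug and not parts.query and not parts.fragment:
        return f"https://wnbl.basketball/blog/news/{slug}/"
    return None


print(wnbl_candidate("http://wnbl.com.au/todhunter-re-signs-rangers/"))
print(wnbl_candidate("http://wnbl.com.au/bendigo_news/spirit-reaches-sky/"))
```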
unpo.org
Conversion:
6 pages -- GreenC 21:51, 6 April 2024 (UTC)
- Done (manually) -- GreenC 01:35, 8 April 2024 (UTC)
Tag: FABLE-0424
nonleaguescotland.org.uk
Conversion:
-- GreenC 22:48, 6 April 2024 (UTC)
- Done - Checked 96 pages and edited 82 pages. Converted 591 links. Added 1 {{dead link}}. Switched 95 |url-status=dead to live. Added 30 archive URLs (30 Wayback). -- GreenC 17:16, 15 April 2024 (UTC)
Tag: FABLE-0424
mediapost.com
Conversion:
- http://www.mediapost.com/publications/index.cfm?fa=Articles.showArticle&art_aid=80921
- https://www.mediapost.com/publications/article/80921/
-- GreenC 03:32, 7 April 2024 (UTC)
- Done - Checked 22 pages and edited 21 pages. Converted 20 links. Removed 3 {{dead link}} templates. Switched 15 |url-status=dead to live. -- GreenC 19:18, 15 April 2024 (UTC)
Tag: FABLE-0424
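In the mediapost.com conversion above, the article ID moves from the `art_aid` query parameter into the path. A minimal sketch of that step, for illustration only (`convert_mediapost` is a hypothetical name, and requiring `fa=Articles.showArticle` is an assumption based on the one example):

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def convert_mediapost(url: str) -> Optional[str]:
    """Move the art_aid query value into the new /publications/article/<id>/ path."""
    parts = urlsplit(url)
    if not parts.path.endswith("/publications/index.cfm"):
        return None
    query = parse_qs(parts.query)
    if query.get("fa") != ["Articles.showArticle"]:
        return None
    ids = query.get("art_aid", [])
    if len(ids) != 1 or not ids[0].isdigit():
        return None
    return f"https://www.mediapost.com/publications/article/{ids[0]}/"


print(convert_mediapost(
    "http://www.mediapost.com/publications/index.cfm?fa=Articles.showArticle&art_aid=80921"
))
```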
thehill.com
Convert from http to https. Some http are 404 but https version is 200.
-- GreenC 03:42, 7 April 2024 (UTC)
- Done - Checked 3,465 pages and edited 3,344 pages. Converted 7,679 links. Removed 1 {{dead link}} templates. Added 9 {{dead link}}. Switched 105 |url-status=dead to live. Switched 27 |url-status=live to dead. Added 347 archive URLs (254 Wayback). -- GreenC 14:24, 16 April 2024 (UTC)
Tag: FABLE-0424
rugbyleagueproject.org
Conversion:
- http://www.rugbyleagueproject.org/competitions/NSWRL_1945.html
- http://www.rugbyleagueproject.org/seasons/NSWRFL_1945/summary.html
-- GreenC 14:53, 7 April 2024 (UTC)
- Done - Checked 23 pages and edited 24 pages. Converted 20 links. Removed 2 {{dead link}} templates. Switched 1 |url-status=dead to live. -- GreenC 15:45, 16 April 2024 (UTC)
Tag: FABLE-0424
projects.militarytimes.com
Conversion:
- http://projects.militarytimes.com/citations-medals-awards/recipient.php?recipientid=1068
- https://valor.militarytimes.com/hero/1068
-- GreenC 14:59, 7 April 2024 (UTC)
- Done - Checked 570 pages and edited 568 pages. Converted 647 links. Removed 3 {{dead link}} templates. Switched 449 |url-status=dead to live. Added 5 archive URLs (0 Wayback). -- GreenC 16:50, 16 April 2024 (UTC)
Tag: FABLE-0424
bundesliga.com
Conversion:
- https://www.bundesliga.com/en/bundesliga/news/noblsp-dfb-cup-final-live-blog-bayern-muenchen-borussia-dortmund.jsp
- https://www.bundesliga.com/en/news/Bundesliga/noblsp-dfb-cup-final-live-blog-bayern-muenchen-borussia-dortmund.jsp
-- GreenC 15:12, 7 April 2024 (UTC)
- Done - Checked 515 pages and edited 116 pages. Converted 136 links. Removed 3 {{dead link}} templates. Added 0 {{dead link}}. Switched 7 |url-status=dead to live. Switched 0 |url-status=live to dead. Added 7 archive URLs (2 Wayback). -- GreenC 21:21, 16 April 2024 (UTC)
Tag: FABLE-0424
plus.lesoir.be
Conversion:
- http://plus.lesoir.be/90745/article/2017-04-20/agression-de-deux-policiers-schaerbeek-hicham-diop-sera-juge-en-correctionnelle
- https://www.lesoir.be/90745/article/2017-04-20/agression-de-deux-policiers-schaerbeek-hicham-diop-sera-juge-en-correctionnelle
-- GreenC 15:50, 7 April 2024 (UTC)
- Done - Checked 107 pages and edited 106 pages. Converted 119 links. Removed 1 {{dead link}} templates. Switched 3 |url-status=dead to live. -- GreenC 01:28, 17 April 2024 (UTC)
Tag: FABLE-0424
247sports.com
Conversion:
- https://247sports.com/nfl/dallas-cowboys/Bolt/The-Dallas-Cowboys-2018-regular-season-schedule-117463461
- https://247sports.com/nfl/dallas-cowboys/Article/Dallas-Cowboys-2018-regular-season-schedule-released-117463461
Follow redirects.
-- GreenC 16:19, 7 April 2024 (UTC)
- Done - Checked 4,914 pages and edited 941 pages. Converted 787 links. Removed 22 {{dead link}} templates. Added 152 {{dead link}}. Switched 17 |url-status=dead to live. Switched 17 |url-status=live to dead. Added 191 archive URLs (143 Wayback). -- GreenC 15:23, 18 April 2024 (UTC)
Tag: FABLE-0424
ytfc.net
Conversion:
- http://www.ytfc.net/news/article/2016-17/hedges-loan-cut-short-3545577.aspx
- https://www.ytfc.net/hedges-loan-cut-short/
-- GreenC 17:09, 7 April 2024 (UTC)
- Done - Checked 71 pages and edited 62 pages. Converted 150 links. Removed 1 {{dead link}} templates. Added 0 {{dead link}}. Switched 41 |url-status=dead to live. Switched 8 |url-status=live to dead. Added 71 archive URLs (66 Wayback). -- GreenC 21:12, 18 April 2024 (UTC)
Tag: FABLE-0424
uslpdl.com
Conversions:
- http://www.uslpdl.com/news_article/show/759968?referrer_id=2313812
- https://www.uslleaguetwo.com/news_article/show/759968
-- GreenC 20:24, 7 April 2024 (UTC)
- Done - Checked 45 pages and edited 45 pages. Converted 52 links. Removed 3 {{dead link}} templates. Switched 23 |url-status=dead to live. -- GreenC 00:57, 19 April 2024 (UTC)
Tag: FABLE-0424
geoelections.free.fr
Conversion:
- http://geoelections.free.fr/USA/elec_comtes/1892bidw
- http://geoelections.free.fr/USA/elec_comtes/1892bidw.htm
516 pages (of which 390 already have .htm)
-- GreenC 20:29, 7 April 2024 (UTC)
- Done - Checked 506 pages and edited 144 pages. Converted 135 links. Removed 12 {{dead link}} templates. Switched 2 |url-status=live to dead. Added 31 archive URLs (31 Wayback). -- GreenC 01:25, 19 April 2024 (UTC)
Tag: FABLE-0424
m.pitchfork.com
Conversion:
- http://m.pitchfork.com/news/63742-kanye-west-says-new-album-coming-this-summer/
- https://pitchfork.com/news/63742-kanye-west-says-new-album-coming-this-summer/
-- GreenC 21:21, 7 April 2024 (UTC)
- Done - Checked 31 pages and edited 30 pages. Converted 31 links. Removed 4 {{dead link}} templates. Switched 11 |url-status=dead to live. -- GreenC 01:38, 19 April 2024 (UTC)
Tag: FABLE-0424
sundayobserver.lk
Conversion:
- http://www.sundayobserver.lk/2009/07/05/mag04.asp
- http://archives.sundayobserver.lk/2009/07/05/mag04.asp
-- GreenC 23:35, 7 April 2024 (UTC)
- Done - Checked 1,582 pages and edited 1,571 pages. Converted 2,262 links. Removed 2 {{dead link}} templates. Added 4 {{dead link}}. Switched 758 |url-status=dead to live. Switched 23 |url-status=live to dead. Added 88 archive URLs (70 Wayback). -- GreenC 23:14, 19 April 2024 (UTC)
Tag: FABLE-0424
nation.com.pk
Conversion:
- http://www.nation.com.pk/pakistan-news-newspaper-daily-english-online/Regional/Karachi/31-Dec-2009/Karachi-blast-mastermind-was-arrested-10-days-before-Ashura
- https://www.nation.com.pk/31-Dec-2009/karachi-blast-mastermind-was-arrested-10-days-before-ashura
-- GreenC 23:38, 7 April 2024 (UTC)
- Done - Checked 411 pages and edited 406 pages. Converted 515 links. Removed 19 {{dead link}} templates. Added 3 {{dead link}}. Switched 349 |url-status=dead to live. Switched 0 |url-status=live to dead. Added 27 archive URLs (10 Wayback). -- GreenC 04:23, 20 April 2024 (UTC)
Tag: FABLE-0424
goldbook.iupac.org
Conversion:
-- GreenC 00:42, 8 April 2024 (UTC)
- Done - Checked 19 pages and edited 19 pages. Converted 19 links. Removed 3 {{dead link}} templates. -- GreenC 16:24, 20 April 2024 (UTC)
Tag: FABLE-0424
timesofindia.com
Redirects to timesofindia.indiatimes.com .. site needs general work for 404s, soft-404s, https, conversion m.timesofindia.com and so on.
-- GreenC 00:51, 8 April 2024 (UTC)
- Done - Checked 7,198 pages and edited 6,471 pages. Converted 10,180 links. Removed 88 {{dead link}} templates. Added 240 {{dead link}}. Switched 269 |url-status=dead to live. Switched 95 |url-status=live to dead. Added 818 archive URLs (745 Wayback). -- GreenC 17:46, 21 April 2024 (UTC)
Tag: FABLE-0424
timesofindia.indiatimes.com
Correction: change the above converted URLs containing /amp_whatever/ to /whatever/ eg.
- https://timesofindia.indiatimes.com/entertainment/hindi/bollywood/news/actress-ridhima-pandit-has-special-plans-for-her-birthday/amp_articleshow/69903610.cms
- https://timesofindia.indiatimes.com/entertainment/hindi/bollywood/news/actress-ridhima-pandit-has-special-plans-for-her-birthday/articleshow/69903610.cms
amp_articleshow amp_videoshow amp_etphotostory amp_movie_review amp_ottmoviereview amp_seasonreview amp_movieshow amp_photostory amp_etarticleshow amp_ifsccode amp_recipeshow amp_article.show amp_liveblog amp_articleshow?from=mdr amp_seriesreview
- Done - Checked 2,505 pages and edited 2,344 pages. Converted 5,550 links. Removed 3 {{dead link}} templates. Added 35 {{dead link}}. Switched 150 |url-status=dead to live. Switched 35 |url-status=live to dead. Added 57 archive URLs (54 Wayback). -- GreenC 01:36, 22 April 2024 (UTC)
Tag: FABLE-0424
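The /amp_whatever/ to /whatever/ correction above amounts to stripping the "amp_" prefix from one path segment. A hedged sketch follows; `strip_amp_prefix` is a hypothetical name, and the character class is an assumption chosen to cover the listed variants such as amp_article.show and amp_movie_review.

```python
import re


def strip_amp_prefix(url: str) -> str:
    """Rewrite /amp_<name> to /<name> where <name> ends at "/", "?" or end of URL."""
    return re.sub(r"/amp_([A-Za-z_.]+)(?=[/?]|$)", r"/\1", url)


print(strip_amp_prefix(
    "https://timesofindia.indiatimes.com/entertainment/hindi/bollywood/news/"
    "actress-ridhima-pandit-has-special-plans-for-her-birthday/amp_articleshow/69903610.cms"
))
```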
m.timesofindia.com
Convert cases like:
{{cite web |url= http://m.timesofindia.com/world/south-asia/Nepal-earthquake-death-toll-rises-to-8413/articleshow/47187088.cms |title= Nepal earthquake death toll rises to 8,413 |date= 7 May 2015 |website= The Times of India |access-date= 9 May 2015 |url-status=dead |archive-url= https://web.archive.org/web/20150510085521/http://m.timesofindia.com/world/south-asia/Nepal-earthquake-death-toll-rises-to-8413/articleshow/47187088.cms |archive-date= 10 May 2015 |df= dmy-all}}
To:
{{cite web |url= http://timesofindia.indiatimes.com/world/south-asia/Nepal-earthquake-death-toll-rises-to-8413/articleshow/47187088.cms |title= Nepal earthquake death toll rises to 8,413 |date= 7 May 2015 |website= The Times of India |access-date= 9 May 2015 |url-status=dead |archive-url= http://web.archive.org/web/20150509190440/http://timesofindia.indiatimes.com/world/south-asia/Nepal-earthquake-death-toll-rises-to-8413/articleshow/47187088.cms |archive-date= 9 May 2015 |df= dmy-all}}
1,183 pages -- GreenC 01:49, 22 April 2024 (UTC)
- Done - Checked 1,185 pages and edited 120 pages. Converted 136 links. Removed 54 {{dead link}} templates. Switched 75 |url-status=dead to live. -- GreenC 13:01, 22 April 2024 (UTC)
mtoi-pass2 further adjustments:
- Done - Checked 1,075 pages and edited 1,015 pages. -- GreenC 23:26, 24 April 2024 (UTC)
mtoi-pass3 further adjustments:
- Done - Checked 109 pages and edited 41 pages (not tagged). -- GreenC 01:14, 25 April 2024 (UTC)
Tag: FABLE-0424
economictimes.com
Same as above..
- https://www.economictimes.com/news/politics-and-nation/dilip-ghosh-makes-u-turn-says-not-in-favour-of-division-of-bengal/amp_articleshow/85587719.cms
- https://economictimes.indiatimes.com/news/politics-and-nation/dilip-ghosh-makes-u-turn-says-not-in-favour-of-division-of-bengal/articleshow/85587719.cms
- Be mindful of /amp_whatever/
-- GreenC 21:49, 8 April 2024 (UTC)
- Done - Checked 1,249 pages and edited 1,163 pages. Converted 1,331 links. Removed 11 {{dead link}} templates. Added 23 {{dead link}}. Switched 35 |url-status=dead to live. Switched 1 |url-status=live to dead. Added 13 archive URLs (8 Wayback). GreenC 16:22, 22 April 2024 (UTC)
Tag: FABLE-0424
m.economictimes.com
Same scenario as m.timesofindia.com above. 335 pages -- GreenC 17:20, 22 April 2024 (UTC)
- Done -- Checked 328 pages and edited 27 pages. Converted 30 links. Removed 8 {{dead link}} templates. Switched 13 |url-status=dead to live. -- GreenC 18:35, 22 April 2024 (UTC)
met pass2 further adjustments
- Done - Checked 328 pages and edited 207 pages.
met pass3
- Done - Checked 132 pages and edited 101 pages.
Tag: FABLE-0424
rugby15.co.za
Conversion:
- http://www.rugby15.co.za/2015/07/steval-pumas-announce-new-contracts/
- https://www.rugby15.co.za/steval-pumas-announce-new-contracts/
-- GreenC 01:01, 8 April 2024 (UTC)
- Done - Checked 119 pages and edited 119 pages. Converted 136 links. Removed 2 {{dead link}} templates. Added 2 {{dead link}}. Switched 101 |url-status=dead to live. Switched 0 |url-status=live to dead. Added 7 archive URLs (1 Wayback). -- GreenC 04:16, 25 April 2024 (UTC)
Tag: FABLE-0424
sportskindle.com
- https://www.sportskindle.com/2020/10/14/neufc-kwesi-appiah-signs-contract/
- http://sportskindle.com/neufc-kwesi-appiah-signs-contract/
-- GreenC 22:28, 8 April 2024 (UTC)
- Done - Checked 4 pages and edited 4 pages. Converted 4 links. Removed 3 {{dead link}} templates. -- GreenC 04:01, 25 April 2024 (UTC)
Tag: FABLE-0424
ssl.ofdb.de
Conversion:
-- GreenC 23:18, 8 April 2024 (UTC)
- Done - Checked 20 pages and edited 20 pages. Converted 18 links. Added 1 {{dead link}}. Added 2 archive URLs (2 Wayback). -- GreenC 17:21, 25 April 2024 (UTC)
Tag: FABLE-0424
in.rbth.com
Conversion:
- https://in.rbth.com/articles/2011/08/22/brahmos_sets_the_gold_standard_for_russian-indian_defence_projects_12899
- https://www.rbth.com/articles/2011/08/22/brahmos_sets_the_gold_standard_for_russian-indian_defence_projects_12899
--GreenC 00:22, 9 April 2024 (UTC)
- Done - Checked 88 pages and edited 83 pages. Converted 93 links. Removed 9 {{dead link}} templates. Added 4 {{dead link}}. Switched 32 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 6 archive URLs (4 Wayback). -- GreenC 18:10, 25 April 2024 (UTC)
Tag: FABLE-0424
beta.nydailynews.com
Conversion:
- http://beta.nydailynews.com/news/politics/nys-reform-party-executive-committee-split-gov-candidate-article-1.3948595
- https://www.nydailynews.com/news/politics/nys-reform-party-executive-committee-split-gov-candidate-article-1.3948595
17 pages GreenC 00:27, 9 April 2024 (UTC)
- Done - Checked 17 pages and edited 12 pages. Converted 11 links. Removed 4 {{dead link}} templates. -- GreenC 02:59, 26 April 2024 (UTC)
Tag: FABLE-0424
www.yfmghana.com
Conversion:
- https://www.yfmghana.com/2018/07/26/full-list-of-winners-jd-nightlife-awards-2018/
- https://yfmghana.com/full-list-of-winners-jd-nightlife-awards-2018/
-- GreenC 00:32, 9 April 2024 (UTC)
- Done - Checked 43 pages and edited 43 pages. Converted 53 links. Removed 7 {{dead link}} templates. Switched 11 |url-status=dead to live. -- GreenC 04:07, 26 April 2024 (UTC)
Tag: FABLE-0424
FABLE0424
Test run of the WP:FABLE system. Permanently dead links have been identified by FABLE as having moved to a different URL. Changes manually verified beforehand. Changes committed to wiki by WP:WAYBACKMEDIC. Please report errors. -- GreenC 14:49, 10 April 2024 (UTC)
- There was an error in about 42 pages. They are reverted. If you find any not reverted please let me know. -- GreenC 19:58, 10 April 2024 (UTC)
- Done - edited about 600 pages and converted about 700 links to live status. -- GreenC 04:13, 26 April 2024 (UTC)
Tag: FABLE-0424
Network World
https://www.networkworld.com/article/2881467/application-security/secure-islands-protects-files-with-embedded-classification-encryption-and-usage-rights.html is dead. https://www.networkworld.com/article/2881467/secure-islands-protects-files-with-embedded-classification-encryption-and-usage-rights.html (deleting "/application-security") works and redirects to the new URL. Probably others of this format. * Pppery * it has begun... 15:13, 11 April 2024 (UTC)
- It looks like when there is something between the number and the last path element, this can signify a problem, for example: http://www.networkworld.com/article/2220304/opensource-subnet/say-what--gnu-emacs-violates-the-gpl.html --> https://www.networkworld.com/article/2220304/opensource-subnet-say-what--gnu-emacs-violates-the-gpl.html .. in this case /opensource-subnet/ is made part of the last path element, in other cases it is deleted entirely. I can check for it.
- 385 pages. -- GreenC 19:23, 13 April 2024 (UTC)
This site is not well maintained. For example https://www.networkworld.com/article/2159917/malwarebytes-offers-enterprise-anti-malware-detection--prevention.html redirects to https://www.networkworld.com/article/971208/idc-enterprises-still-moving-workloads-back-from-the-cloud.html .. completely different. I'll try to verify that redirects are accurate by checking that the last word before .html is the same in both URLs, e.g. in this case prevention.html does not equal cloud.html -- GreenC 19:11, 26 April 2024 (UTC)
Done - Checked 381 pages and edited 373 pages. Converted 412 links. Removed 0 {{dead link}} templates. Added 2 {{dead link}}. Switched 16 |url-status=dead to live. Switched 16 |url-status=live to dead. Added 124 archive URLs (116 Wayback). -- GreenC 20:04, 26 April 2024 (UTC)
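The redirect sanity check described above for Network World (compare the last word before .html in the source and destination URLs) could look roughly like this. It is an illustrative sketch only; `last_word` and `redirect_looks_valid` are hypothetical names, and treating a "word" as a run of letters and digits is an assumption.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def last_word(url: str) -> Optional[str]:
    """Return the final alphanumeric word before ".html" in the URL path."""
    match = re.search(r"([A-Za-z0-9]+)\.html$", urlsplit(url).path)
    return match.group(1).lower() if match else None


def redirect_looks_valid(source_url: str, target_url: str) -> bool:
    """True when both URLs end in the same word before .html."""
    source_word = last_word(source_url)
    return source_word is not None and source_word == last_word(target_url)


print(redirect_looks_valid(
    "https://www.networkworld.com/article/2159917/malwarebytes-offers-enterprise-anti-malware-detection--prevention.html",
    "https://www.networkworld.com/article/971208/idc-enterprises-still-moving-workloads-back-from-the-cloud.html",
))  # False: "prevention" != "cloud"
```

A matching last word is only a heuristic; it reduces but does not rule out wrong-page redirects.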
rfidjournal.com
A potentially nasty usurped URL case I found: http://www.rfidjournal.com/article/articleview/9632/1/1 currently points to https://www.rfidjournal.com/gs1-releases-guidelines-for-rfid-based-electronic-article-surveillance, an article about GS1 guides, however per the Wayback Machine it previously pointed to https://web.archive.org/web/20180711120951/http://www.rfidjournal.com/articles/view?9632, an article about ScholarChip, which was the intended citation. Not sure there's anything that can be done about it here, but noting it for the record. * Pppery * it has begun... 15:55, 11 April 2024 (UTC)
- What I can do is process the entire domain, log the source and redirect links, and look for patterns of repeating redirects. Sometimes that will surface soft404s like this. BTW they have some tight rate limiting as a freemium method, not sure how my bot will perform. -- GreenC 19:02, 13 April 2024 (UTC)
- 96 pages -- GreenC 19:28, 13 April 2024 (UTC)
Done -- Because of the freemium limitations, I converted everything to archive URLs. Links that don't have archive URLs are left alone. Also the site has a problem with redirects that go to the wrong page, as noted by Pppery; the archive URLs will help. Checked 94 pages and edited 84 pages. Added 120 archive URLs (111 Wayback). -- GreenC 03:21, 27 April 2024 (UTC)
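The approach mentioned above for rfidjournal.com (log each source URL with its redirect destination, then look for destinations that repeat) can be sketched as follows. The function name, the threshold and the example.com data are illustrative assumptions, not taken from the bot.

```python
from collections import Counter


def suspicious_redirect_targets(redirect_log: dict, min_sources: int = 3) -> dict:
    """Return destinations that several different source URLs redirect to.

    redirect_log maps each source URL to the final URL it resolved to.
    A destination shared by many unrelated sources is a soft-404 candidate.
    """
    counts = Counter(
        target for source, target in redirect_log.items() if source != target
    )
    return {target: n for target, n in counts.items() if n >= min_sources}
```

Flagged destinations still need a manual look, since a site may legitimately merge several old pages into one.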
juf.org
Many links might be soft-404 redirects to the home page. -- GreenC 18:47, 13 April 2024 (UTC)
- Done - Checked 144 pages and edited 87 pages. Converted 54 links. Switched 1 |url-status=dead to live. Switched 3 |url-status=live to dead. Added 52 archive URLs (49 Wayback). -- GreenC 18:33, 27 April 2024 (UTC)
dinamalar.com
The site was apparently revamped, and many old links, even those published as recently as 2022 are no longer available. --Kailash29792 (talk) 13:37, 17 April 2024 (UTC)
1,681 pages. -- GreenC 18:40, 27 April 2024 (UTC)
- Given the URL https://m.dinamalar.com/cinema_detail.php?id=110128 one can generate an intermediary URL https://redirect.dinamalar.com/redirect_to_slug.php?id=110128 which redirects to the destination URL https://www.dinamalar.com/news/tamil-nadu-district-news-madurai/news/110128 .. why don't they do this automatically? I'll see how many work with this method. -- GreenC 19:02, 27 April 2024 (UTC)
- Unfortunately this system is giving false positives, redirecting to unrelated articles. For example https://m.dinamalar.com/cinema_detail-amp.php?id=95130 generates https://redirect.dinamalar.com/redirect_to_slug.php?id=95130 which goes to https://www.dinamalar.com/news/puducherry/news/95130 which is completely different content from the original https://web.archive.org/web/20221121213650/https://m.dinamalar.com/cinema_detail-amp.php?id=95130 -- GreenC 20:58, 27 April 2024 (UTC)
Done - Checked 1,686 pages and edited 1,304 pages. Moved 1,153 links to a new URL. Removed 2 {{dead link}} templates. Added 13 {{dead link}}. Switched 43 |url-status=dead to live. Switched 111 |url-status=live to dead. Added 218 archive URLs (216 Wayback). Changed 439 citation metadata. -- GreenC 22:28, 28 April 2024 (UTC)
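The intermediary-URL step described above for dinamalar.com can be sketched like this. The endpoint is the one quoted in the thread; the function name is hypothetical, and the sketch only builds the URL without fetching it.

```python
from urllib.parse import parse_qs, urlsplit

# Endpoint quoted in the discussion above.
REDIRECT_ENDPOINT = "https://redirect.dinamalar.com/redirect_to_slug.php"


def intermediary_url(old_url: str):
    """Build the redirect_to_slug URL from the numeric id= parameter, or return None."""
    ids = parse_qs(urlsplit(old_url).query).get("id", [])
    if not ids or not ids[0].isdigit():
        return None
    return f"{REDIRECT_ENDPOINT}?id={ids[0]}"
```

As noted in the thread, the redirect sometimes lands on unrelated articles, so the destination would still need to be compared against an archived copy of the original page.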
mio.to
The site is online, but many old links like this don't work anymore. Kailash29792 (talk) 12:23, 21 April 2024 (UTC)
Done - Checked 489 pages and edited 355 pages. Moved 20 links to a new URL. Removed 0 {{dead link}} templates. Added 49 {{dead link}}. Switched 1 |url-status=dead to live. Switched 74 |url-status=live to dead. Added 185 archive URLs (185 Wayback). Changed 208 citation metadata fields. -- GreenC 05:26, 29 April 2024 (UTC)
wikispot.org interwiki
The entire WikiSpot: interwiki is dead (around 250 uses). Sometimes the content can be recovered at localwiki (e.g. Woodland, California: wikispot:woodland:Museums -> https://localwiki.org/woodland/Museums). Other times that's also a 404 and the content is just gone. * Pppery * it has begun... 04:04, 23 April 2024 (UTC)
- Done - Checked 35 pages and edited 35 pages. Converted 37 interwiki links to wikispot.org. Moved 17 wikispot links to localwiki. Added 17 {{dead link}}. Added 3 archive URLs (3 Wayback). -- GreenC 17:11, 29 April 2024 (UTC)
wikispot.org pass2 63 pages
- Done - Checked 63 pages and edited 20 pages. Added 14 {{dead link}}. Switched 2 |url-status=live to dead. Added 19 archive URLs (18 Wayback). Changed 1 citation metadata fields. -- GreenC 18:13, 29 April 2024 (UTC)
- Note: many of the archive.org links to wikispot.org appear to be soft-404 redirects to the home page, or some other useless place on the old website. My bot has trouble detecting these as there is no redirect in the headers. Probably all of the wikispot.org URLs should be checked manually and if there is no viable alternative I recommend nuking the citation entirely as unverifiable because placing a dead link tag will result in bots re-adding the useless archive URL. -- GreenC 18:24, 29 April 2024 (UTC)
- I suspect there's some date (per http://wikispot.org/2015_Shutdown_Notice.html probably circa April 2015) when the site started redirecting to the home page, and all archives after that date are useless. * Pppery * it has begun... 19:18, 30 April 2024 (UTC)
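The wikispot-to-localwiki mapping from the request above (wikispot:woodland:Museums -> https://localwiki.org/woodland/Museums) can be sketched as below. The pattern is inferred from that single example, so it is an assumption that other wikis and page names follow the same layout; the function name is hypothetical.

```python
LOCALWIKI_BASE = "https://localwiki.org"


def wikispot_to_localwiki(interwiki: str):
    """Map 'wikispot:<wiki>:<page>' to a candidate localwiki URL, or return None."""
    parts = interwiki.split(":", 2)
    if len(parts) != 3:
        return None
    prefix, wiki, page = parts
    if prefix.lower() != "wikispot" or not wiki or not page:
        return None
    return f"{LOCALWIKI_BASE}/{wiki}/{page}"
```

The result is only a candidate: as the request says, the localwiki page is sometimes a 404 as well, so each URL has to be checked before use.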
cdn.ampproject.org
Should be converted to regular URLs.
https://ahvalnews-com.cdn.ampproject.org/c/s/ahvalnews.com/fetullah-gulen/cia-collaborated-gulen-lobbyist?amp
- http://ahvalnews.com/fetullah-gulen/cia-collaborated-gulen-lobbyist
197 pages -- GreenC 15:45, 23 April 2024 (UTC)
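A conversion like the one requested above for cdn.ampproject.org could be sketched as follows. It assumes the cache path has the form /c/[s/]host/path, with the optional s segment marking an HTTPS origin, and it drops the bare amp query parameter. The thread's example lists the http form of the regular URL, so the choice of scheme here is an assumption, and the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def deamp(url: str):
    """Convert an AMP cache URL to the publisher URL, or return None if it does not fit."""
    parts = urlsplit(url)
    if not parts.netloc.endswith(".cdn.ampproject.org"):
        return None
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 2 or segments[0] != "c":
        return None
    rest = segments[1:]
    scheme = "http"
    if rest[0] == "s":  # assumed to mean the origin is served over HTTPS
        scheme = "https"
        rest = rest[1:]
    if not rest or not rest[0]:
        return None
    host, path = rest[0], "/".join(rest[1:])
    # Keep every query parameter except the AMP marker.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "amp"]
    query = urlencode(kept)
    return f"{scheme}://{host}/{path}" + (f"?{query}" if query else "")
```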
symantec.com
All URLs starting with http://www.symantec.com/security_response/writeup.jsp? seem to be soft 404. 97 pages. * Pppery * it has begun... 17:00, 27 April 2024 (UTC)
- I processed every symantec link as the site is mostly soft404, I found 11 varieties.
- Done - Checked 384 pages and edited 319 pages. Moved 120 links to a new URL. Added 8 {{dead link}}. Switched 3 |url-status=dead to live. Switched 59 |url-status=live to dead. Added 351 archive URLs (330 Wayback). Changed 69 citation metadata fields. -- GreenC 22:35, 29 April 2024 (UTC)
wikisophia.org
Entire wikisophia.org site is dead, as well as the wikisophia: interwiki (which is soon going to point to a static page at m:Interwiki map/discontinued#Wikisophia). No replacement known. * Pppery * it has begun... 22:49, 28 April 2024 (UTC)
wikisophia interwiki
- Done Checked 15 pages and edited 15 pages. Converted 14 interwikis. Added 13 {{dead link}}. Added 1 archive URL. -- GreenC 19:43, 29 April 2024 (UTC)
- The above also includes all wikisophia.org links. -- GreenC 19:56, 29 April 2024 (UTC)
koreatimes.co.kr
We seem to have some 3k articles with url=http://www.koreatimes.co.kr. The website loads fine over HTTPS for me, it should be upgraded. Nemo 04:30, 29 April 2024 (UTC)
- Done - Checked 5,445 pages and edited 3,573 pages. Moved 5,983 links to a new URL. Removed 3 {{dead link}} templates. Added 15 {{dead link}}. Switched 662 |url-status=dead to live. Switched 25 |url-status=live to dead. Added 327 archive URLs (213 Wayback). Changed 92 citation metadata fields. -- GreenC 16:31, 1 May 2024 (UTC)
A new feature for this move can be seen at Special:Diff/1221731335/1221749231 .. the URL redirects with a client-side mechanism (JavaScript), so it was not possible to use page headers, which only return status 200. I developed a headless browser script to retrieve the JS redirect. The script is a CLI utility, in case anyone would like a copy. It requires Node and Puppeteer. -- GreenC 20:55, 1 May 2024 (UTC)
wikilivres.org
Another dead interwiki: the entire site https://wikilivres.org/ is soft 404 of the "redirect to the homepage" variety, as well as the "wikilivres:" and "BiblioWiki:" interwikis that point to it.
I also noticed while investigating this that the wikilivres.ca domain appears to have been usurped, with it originally being a wiki similar to wikisource, and now being a spammy blog. But do note that https://wikilivres.ru/ (with its own wikilivresru: interwiki) is still up. * Pppery * it has begun... 19:24, 30 April 2024 (UTC)
- 59 pages for interwiki and .org. I'll add wikilivres.ca to WP:JUDI (40 pages). -- GreenC 21:02, 1 May 2024 (UTC)
- Some pages inexplicably work eg [18] -- GreenC 12:59, 14 May 2024 (UTC)
- Done - Checked 58 pages and edited 52 pages. Converted 62 interwiki. Added 38 {{dead link}}. Added 4 archive URLs (2 Wayback). Changed 3 citation metadata fields. -- GreenC 13:06, 14 May 2024 (UTC)
wikinvest.com
Yet another dead interwiki: wikinvest:/https://wikinvest.com. See m:Talk:Interwiki map/Archives/2018#Discontinue Wikinvest. * Pppery * it has begun... 19:28, 30 April 2024 (UTC)
- Done - Checked 141 pages and edited 105 pages. Converted 91 interwiki. Added 12 {{dead link}}. Switched 4 |url-status=live to dead. Added 129 archive URLs (129 Wayback). Changed 11 citation metadata fields. -- GreenC 18:45, 14 May 2024 (UTC)
gutenberg.org
Entire path https://gutenberg.org/wiki/* is dead. About 40 pages. Also has an interwiki at gutenbergwiki: but it doesn't seem to be used. See m:Interwiki map/discontinued#Gutenbergwiki * Pppery * it has begun... 19:30, 30 April 2024 (UTC)
- Done - Checked 39 pages and edited 37 pages. Switched 10 |url-status=live to dead. Added 36 archive URLs (36 Wayback). -- GreenC 21:11, 14 May 2024 (UTC)
bigten.org
Hello. The links to articles on the Big Ten Conference are broken as their URLs have changed. For instance, this 2018 article is now here. The string at the end seems to be a unique ID, so I can't predict what the new URL is without searching through the website. Not sure if it's more useful to: 1) use the archived copies where possible, then convert the other ones to the new URLs, or 2) convert all to the new URLs. Almost 2,000 possible broken links. Thanks! MrLinkinPark333 (talk) 03:40, 2 May 2024 (UTC)
Hi User:MrLinkinPark333: Unless there is an undocumented API like the one that exists for Wikipedia:Link_rot/URL_change_requests#dinamalar.com that translates old to new, I don't see much option but to convert to archive URLs. You could also contact them to see if they have plans to add redirects. If they ever do, I can go back and unwind the archive URLs and replace them with the new URLs. -- GreenC 21:22, 14 May 2024 (UTC)
- Done - Checked 1,326 pages and edited 866 pages. Moved 65 links to a new URL. Added 99 {{dead link}}. Switched 56 |url-status=live to dead. Added 1,966 archive URLs (1,945 Wayback). Changed 745 citation metadata fields.
webcitation.org
Expand URLs to longform. Fix http->https. Fix |archive-date= offsets due to relative time-zone differences. Unpack archive.org doubles (they won't work correctly). Note: this work was made possible by a discovery in how to access the WebCite API, which normally gives the appearance of being down/inaccessible due to SSL misconfiguration on server-side. I don't know how long this hack will work, but I am updating the links while it's working. -- GreenC 14:36, 2 May 2024 (UTC)
- Done - Converted about 11,000 links to other providers. Converted about 1,300 links from short to long form and other misc fixes. Includes 100s of templates. There are still many WebCitation.org URLs remaining unfortunately. -- GreenC 02:44, 14 May 2024 (UTC)
freeuk.com
Some (but not all) pages/subdomains of freeuk.com currently redirect to [19]. It's not clear to me whether this is more of a small-scale link rot issue or one that affects multiple pages, so listing here out of an abundance of caution. All the best, —a smart kitten[meow] 15:35, 11 May 2024 (UTC)
- I'll check it out, thanks. The domain is in 313 pages. -- GreenC 16:01, 11 May 2024 (UTC)
- Done - Checked 319 pages and edited 98 pages. Added 5 {{dead link}}. Switched 2 |url-status=live to dead. Added 113 archive URLs (109 Wayback). -- GreenC 20:19, 15 May 2024 (UTC)
symetratour.com
Hello. The Symetra Tour has been renamed to The Epson Tour. Their links have subsequently been moved. Here is the new format:
Some links cannot be converted, such as [20] this link, because the event is no longer held. Other links like this one need the word symetra changed to epson in order to work, like this to that. I fixed some already. 91 links under http and 95 under https currently to fix. Thanks! MrLinkinPark333 (talk) 01:42, 12 May 2024 (UTC)
- 116 pages. -- GreenC 20:42, 3 June 2024 (UTC)
- Done - Checked 116 pages and edited 116 pages. Moved 109 links to a new URL. Switched 1 |url-status=dead to live. Added 78 archive URLs (77 Wayback). -- GreenC 22:42, 3 June 2024 (UTC)
ECI - Election Commission of India
The ECI has changed links for a lot of election results on their site. e.g. [21] to [22]. -MPGuy2824 (talk) 11:43, 14 May 2024 (UTC)
- 4,700 pages -- GreenC 20:40, 3 June 2024 (UTC)
- User:MPGuy2824: The "old." links are not working https://old.eci.gov.in/assembly-election/ae-2021-tamilnadu/ although they were, it exists at Wayback [23] .. hopefully a temporary outage. I'll recheck in a week or ping me if you see it change before then. -- GreenC 23:01, 3 June 2024 (UTC)
- It looks like geofencing, as the link works for me (in India). Let's wait a week as you suggest. -MPGuy2824 (talk) 05:49, 4 June 2024 (UTC)
- There is a new BRFA at Wikipedia:Bots/Requests for approval/BaranBOT 2. – DreamRimmer (talk) 13:00, 8 June 2024 (UTC)
iaboterr
Fixing about 800 pages that have an error by IABot adding duplicate archives and incorrect url-status -- GreenC 04:32, 16 May 2024 (UTC)
- Done - Checked 806 pages and edited 748 pages. -- GreenC 06:36, 16 May 2024 (UTC)
Found about 200 pages more in Category:CS1 errors: redundant parameter, and removing duplicate |access-date=. -- GreenC 16:47, 16 May 2024 (UTC)
- Done - Checked 209 pages and edited 168 pages -- GreenC 18:07, 16 May 2024 (UTC)
britannica.co.kr
This was brought to my attention through Special:Diff/1224405115. The following hostname should be marked as dead and set to the archived urls given that they are no longer serving any content and being redirected to the company's corp site, or simply dead:
- *.britannica.co.kr
– robertsky (talk) 10:56, 18 May 2024 (UTC)
- 53 pages. -- GreenC 20:44, 3 June 2024 (UTC)
- Done - Checked 53 pages and edited 20 pages. Added 4 {{dead link}}. Added 16 archive URLs (8 Wayback). -- GreenC 00:46, 4 June 2024 (UTC)
South Asia Analysis Group
www.southasiaanalysis.org
- domain has been usurped. not sure if it's used anywhere other than Major non-NATO ally (where I already fixed the cite template). thanks, Kdroo (talk) 22:14, 23 May 2024 (UTC)
- Done - added to WP:JUDI for later processing: Special:Diff/1225704304/1225804735 -- GreenC 20:44, 26 May 2024 (UTC)
nfl.com
Hello. I found that URLs under the http://www.nfl.com/news/story/ format are either broken or redirect to a new URL:
- URLs with only numbers are broken, and might have an archived copy.
- URLs with a numbers and letters string might redirect to the new URL. This redirect works
- Some URLs with a number/letters string don't work and need converting with the article name in the URL: This URL should go here
- Some URLs with numbers/letters and article name might redirect to new URLs: This is now here.
9000+ links under http and 100+ links under https Thanks! MrLinkinPark333 (talk) 20:22, 26 May 2024 (UTC)
- Done - Checked 3,402 pages and edited 3,236 pages. Moved 6,863 links to a new URL. Added 32 {{dead link}}. Switched 72 |url-status=dead to live. Switched 241 |url-status=live to dead. Added 1,043 archive URLs (1,007 Wayback). Changed 888 citation metadata fields. -- GreenC 18:32, 4 June 2024 (UTC)
donjohnsonbigband.com
This domain seems to have been usurped: in 2020, it was still a normal band site https://web.archive.org/web/20201202185840/http://www.donjohnsonbigband.com/[usurped] vs since 2021 it's "DJ Son Band - Rock Music Review" https://web.archive.org/web/20211115153056/https://www.donjohnsonbigband.com/[usurped].
New official URL for the band is https://www.donjohnsonbigband.fi/ TuukkaH (talk) 22:10, 28 May 2024 (UTC)
- Done Amused the usurpers interpreted "donjohnson" as "DJ Son" ie. Don John Son. Or maybe a computer algorithm, stupid AI. Well, I added it to WP:JUDI for future processing: Special:Diff/1225804735/1226183097 and the URL is in one article, Support de Microphones, which I sortafixed: Special:Diff/1171308429/1226183692 -- GreenC 01:35, 29 May 2024 (UTC)
deccanchronicle.com
Deccan Chronicle: Many 2010s articles like this are dead. Kailash29792 (talk) 05:02, 2 June 2024 (UTC)
- Done - Checked 8,059 pages and edited 3,532 pages. Moved 3,217 links to a new URL. Added 81 {{dead link}}. Switched 334 |url-status=dead to live. Switched 208 |url-status=live to dead. Added 742 archive URLs (694 Wayback). Changed 1,018 citation metadata fields. -- GreenC 22:35, 5 June 2024 (UTC)
cnlbr.org
Old path of "www.cnlbr.org/Portals/.../pagename" moved to "irp.cdn-website.com/33d0c3d0/files/uploaded/pagename"
-- BX (talk) 20:58, 2 June 2024 (UTC)
- BX, can you clarify. For example, old URL http://www.cnlbr.org/Portals/0/Hero/Herbert_Rap_Dixon.pdf goes to ? -- GreenC 21:00, 3 June 2024 (UTC)
- @GreenC: The old path after "Portals/" varied, however the new path has no variables. So for your example, the new path is https://irp.cdn-website.com/33d0c3d0/files/uploaded/Herbert_Rap_Dixon.pdf It's basically just cutting the last "pagename" from the old path and pasting it to the new prefix, if that makes sense. Rgdrs. --BX (talk) 04:03, 4 June 2024 (UTC)
- Got it, didn't realize "33d0c3d0" is a static string. 138 pages. -- GreenC 19:13, 4 June 2024 (UTC)
- User:BX: There were edge cases in about 30 URLs. Needed to convert "%20%20" to "%20". And in some, changing ".pdf" to "-2020.pdf" - After those changes, I was able to convert all to live links. I made metadata changes eg, changing |publisher=cnlbr.org to |publisher=Center for Negro League Baseball Research, because we're supposed to use names vs. domains. Anything that was previously marked dead and had an archive URL, I changed the primary URL to the live version and set |url-status=live and kept the original archive URL. -- GreenC 01:37, 7 June 2024 (UTC)
- Done Checked 140 pages and edited 140 pages. Moved 321 links to a new URL. Switched 17 |url-status=dead to live. Changed 185 citation metadata fields. -- GreenC 01:37, 7 June 2024 (UTC)
- Wow, thanks User:GreenC. The work you and your bot do is invaluable to keeping this place working. Thanks again! Rgrds. --BX (talk) 04:04, 7 June 2024 (UTC)
- Thank you! Your appreciation helps to keep this going. -- GreenC 14:59, 7 June 2024 (UTC)
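The cnlbr.org move described above (take the last "pagename" of the old Portals path and append it to the static new prefix, collapsing "%20%20" to "%20") can be sketched as below. The function name is hypothetical, and the ".pdf" to "-2020.pdf" edge case is left out because the thread does not say which files need it.

```python
from urllib.parse import urlsplit

# Static prefix given in the discussion above.
NEW_PREFIX = "https://irp.cdn-website.com/33d0c3d0/files/uploaded/"


def migrate_cnlbr_url(old_url: str):
    """Return the candidate new URL for an old cnlbr.org Portals link, or None."""
    parts = urlsplit(old_url)
    if not parts.netloc.lower().endswith("cnlbr.org") or "/Portals/" not in parts.path:
        return None
    filename = parts.path.rsplit("/", 1)[-1]
    if not filename:
        return None
    # Edge case noted in the thread: doubled encoded spaces become single ones.
    while "%20%20" in filename:
        filename = filename.replace("%20%20", "%20")
    return NEW_PREFIX + filename
```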
google.com/hostednews
Soft-404s and 404s. 5,300 pages. -- GreenC 20:35, 3 June 2024 (UTC)
- Done - Checked 5,322 pages and edited 4,351 pages. Converted 1 templates. Removed 2 {{dead link}} templates. Added 1,739 {{dead link}}. Switched 707 |url-status=live to dead. Added 3,633 archive URLs (2,179 Wayback). Changed 176 citation metadata fields. -- GreenC 15:42, 7 June 2024 (UTC)
cinestaan.com makelive
Mysteriously the site is back and working, per this. Maybe the dead links can be reassessed? Kailash29792 (talk) 04:14, 4 June 2024 (UTC)
- Previous: Wikipedia:Link_rot/URL_change_requests#cinestaan.com -- GreenC 19:28, 4 June 2024 (UTC)
- I changed the domain status from "Permadead" to "Permalive" in iabot.org --- for the moment the bot won't convert links to dead automatically. For Enwiki, Medic has a "makelive" function which I could apply to any link responding with status 200. -- GreenC 19:37, 4 June 2024 (UTC)
- It checked every link, any that are 200 it converted to a live link. -- GreenC 01:37, 8 June 2024 (UTC)
Done - Checked 2,242 pages and edited 2,033 pages. Moved 2,360 links to a new URL. Removed 153 {{dead link}} templates. Added 29 {{dead link}}. Switched 1,943 |url-status=dead to live. Added 6 archive URLs (6 Wayback). Changed 65 citation metadata fields.
google.com/patents
2,700 pages. -- GreenC 20:51, 7 June 2024 (UTC)
- The way GreenC bot is handling these by replacing them with half-broken archive.today links is problematic. The bot's activity on this should be paused, the changes made so far should be reverted, and someone should write a bot/script which properly fixes the URLs to current working versions. –jacobolus (t) 16:49, 8 June 2024 (UTC)
- My reply here Special:Diff/1227943344/1227944480. I agree that it's a good idea to switch archived URLs to live URLs, and my bot can do that. But I need to know what the live URL is. And you're not providing information on how to figure that out. Currently, the bot is repairing a completely broken non-functioning URL with an archive URL. I understand the archive URL is incomplete, but at least better than a completely dead URL. If there is a way to determine the live URL, I can replace the archive URL with the live URL. -- GreenC 17:01, 8 June 2024 (UTC)
- Looks like the Patent ID is in the title of the archive.today page eg. for [24]:
<title>Patent US417831 - ARTISTS EASEL - Google Patents</title>
from which can be generated https://patents.google.com/patent/US417831A .. although I am unclear about "A", how to determine. -- GreenC 17:50, 8 June 2024 (UTC)
- Early on Google made up a new identifier for every patent. More recently they have sensibly figured out how to use the patent number itself. I think the A is optional; it's just the form of URL that turned up when I did a search for a couple of these specific patents. You can see how https://patents.google.com/patent/US640792A and https://patents.google.com/patent/US640792 give the same result. –jacobolus (t) 21:48, 8 June 2024 (UTC)
- The archive roll back is done. Edited 731 articles and 1,468 citations. Example: Special:Diff/1227937185/1228131980 and Special:Diff/1227937170/1228131968. There are 64 links with no patent number; the list is available here Wikipedia:Link rot/Cases/Googlepatents in case you or anyone wants to research. Optionally update that page with the patent numbers and I'll update wiki via bot. -- GreenC 16:54, 9 June 2024 (UTC)
- The linkrot ones can probably be figured out by scraping a wayback page. E.g. the first one is here, from which we can find patent number 2612994, so the current google patent link would be https://patents.google.com/patent/US2612994. –jacobolus (t) 17:00, 9 June 2024 (UTC)
- Done - Checked 2,702 pages and edited 2,502 pages. Moved 2,891 links to a new URL. Removed 1 {{dead link}} templates. Added 256 {{dead link}}. Switched 3 |url-status=dead to live. Switched 11 |url-status=live to dead. Added 1,539 archive URLs (222 Wayback). Changed 2 citation metadata fields. (NOTE: these stats are outdated due to the archive roll back in a later pass, which removed 1,468 archive URLs) -- GreenC 16:54, 9 June 2024 (UTC)
- Thanks! –jacobolus (t) 16:58, 9 June 2024 (UTC)
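The title-based recovery discussed above for google.com/patents (read the patent number from the archived page's title and build a patents.google.com URL) can be sketched as follows. The regular expression is an assumption based on the single title quoted in the thread, and it omits the trailing "A", which the thread describes as optional; the function name is hypothetical.

```python
import re

# Matches titles such as: <title>Patent US417831 - ARTISTS EASEL - Google Patents</title>
TITLE_RE = re.compile(r"<title>\s*Patent\s+(US\d+)\b", re.IGNORECASE)


def patent_url_from_title(html: str):
    """Return a patents.google.com URL built from the page title, or None if no match."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return f"https://patents.google.com/patent/{match.group(1).upper()}"
```

Pages whose title carries no patent number, like the 64 listed at Wikipedia:Link rot/Cases/Googlepatents, would return None and need manual research.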
google.com/culturalinstitute
666 pages -- GreenC 20:56, 7 June 2024 (UTC)
- Done - Checked 675 pages and edited 670 pages. Moved 489 links to a new URL. Added 63 {{dead link}}. Switched 1 |url-status=dead to live. Switched 3 |url-status=live to dead. Added 166 archive URLs (163 Wayback). -- GreenC 00:50, 10 June 2024 (UTC)
google.com/finance
562 pages -- GreenC 20:59, 7 June 2024 (UTC)
- Done - Checked 562 pages and edited 513 pages. Converted 1 templates. Moved 365 links to a new URL. Added 40 {{dead link}}. Switched 7 |url-status=dead to live. Switched 6 |url-status=live to dead. Added 150 archive URLs (127 Wayback). Changed 1 citation metadata fields. -- GreenC 04:01, 10 June 2024 (UTC)
google.com/doodles
900 pages -- GreenC 21:40, 7 June 2024 (UTC)
- Done - Checked 907 pages and edited 904 pages. Moved 960 links to a new URL. Added 4 {{dead link}}. Switched 9 |url-status=dead to live. Added 21 archive URLs (10 Wayback). -- GreenC 04:20, 11 June 2024 (UTC)
chennaionline.com
The site is working properly per this, but there are still many pre-2020 dead links like this. Kailash29792 (talk) 09:52, 12 June 2024 (UTC)
- Done - Checked 893 pages and edited 147 pages. Moved 8 links to a new URL. Added 19 {{dead link}}. Switched 49 |url-status=live to dead. Added 80 archive URLs (70 Wayback). Changed 24 citation metadata fields. -- GreenC 18:30, 13 June 2024 (UTC)
angelfire.com
This book, Hook, James; Franck, Dave; Austin, Steve (1982). An Aid to Collecting Selected Council Shoulder Patches with Valuation. has within it a link. Yes, I know it's from angelfire. Can I ask that: www
- User:Evrik - if you don't mind, I'll use this request to process the entire angelfire.com domain which needs to be done anyway, checking for link rot. It will include a trap for https://www.angelfire.com/tx6/patch/cspbook.html to replace with https://scouttrader.org/csiguidebook.shtml .. it is in about 90 pages.
- 4,794 pages -- GreenC 15:59, 13 June 2024 (UTC)
- It may be http and not https --evrik (talk) 16:14, 13 June 2024 (UTC)
- You know, the citation is for a book authored by Hook, Franck and Austin (1982). The source link is for a book published by Ellis, Jones and Austin (2003). The replacement link is for a book published by Austin and Keasey (2013). There are many editions and authors. Maybe more. If someone is citing the 1982 edition on page 52, and we change the link to the 2013 edition, it will be a wrong page number. I think this needs to be done with more care - or a consensus discussion. I don't want to be in the position of being asked to undo the changes, which is time consuming. -- GreenC 01:50, 15 June 2024 (UTC)
- Done - Checked 4,890 pages and edited 4,317 pages. Moved 4,750 links to a new URL. Removed 1 {{dead link}} templates. Added 58 {{dead link}}. Switched 109 |url-status=dead to live. Switched 13 |url-status=live to dead. Added 379 archive URLs (339 Wayback). Changed 521 citation metadata fields. -- GreenC 01:50, 15 June 2024 (UTC)
mhc-macris.net
I didn't check very many of the pages here (first 2 out of 3,685) [25], but the current links are dead, and changing "Details.aspx?" to "details?" fixes them (with them redirecting to a version with a lower case ID). Another improvement is using https instead of http.
For instance in College of the Holy Cross, which as of right now citation 28's url is "http://mhc-macris.net/Details.aspx?MhcId=WOR.K" which just redirects to the mhc-macris home page, but if it's changed to "https://mhc-macris.net/details?MhcId=WOR.K" it redirects to "https://mhc-macris.net/details?mhcid=wor.k" which has the desired content.
There are another handful of URLs needed to be changed sprinkled amongst this search, but some of them are archive links. GrapesRock (talk) 20:52, 19 June 2024 (UTC)
- OK. I'll check every link, in case there are any other soft-404 issues. 3,836 pages. thanks. -- GreenC 23:54, 19 June 2024 (UTC)
- Question: GrapesRock, I have run into a problem. The site is using a bot blocker system that I don't recognize and have tried various methods to get around unsuccessfully. The only way is a "blind move" ie. changing the URL without verifying the new URL exists and/or works. This is potentially dangerous because sites frequently do not migrate every URL to the new scheme. Another method is to treat every URL containing "Details.aspx" as a dead link, and add an archive URL. It depends on how reliable the archive.org links are (they may have the same problem saving pages due to bot blocker) vs. how consistent the site was in migrating to the new scheme. If you want to manually spot check to see which of these methods looks better that would be helpful in deciding which course to take. I also emailed the site admins on the off chance they are willing to temporarily whitelist my IP. -- GreenC 17:48, 20 June 2024 (UTC)
- I looked at the top 10 pages with Details.aspx in them, and every one where I changed it to "details" redirected to the correct page.
- As for the archive, all the archives I've seen before July 7, 2022 are okay, but on that date and after it is inconsistently marked
- List of National Historic Landmarks in Massachusetts
- http://mhc-macris.net/Details.aspx?MhcId=WSP.211 where the corresponding archive with one capture (on July 7, 2022), https://web.archive.org/web/20220707050042/http://mhc-macris.net/Details.aspx?MhcId=WSP.211 showed okay (no 404s or redirects reported), but didn't display anything
- http://mhc-macris.net/Details.aspx?MhcId=SAL.1126 where the July 7, 2022 showed as okay, but didn't ever load the content https://web.archive.org/web/20220707051738/http://mhc-macris.net/Details.aspx?MhcId=SAL.1126. Earlier archives such as https://web.archive.org/web/20150513134101/http://mhc-macris.net/Details.aspx?MhcId=SAL.1126 work fine
- etc.
- Holyoke, Massachusetts
- This archive for 2023, shows both the archives as blue even though they're 301 redirects.
- So, probably if the only archive that exists is from July 7, 2022 or later, it shouldn't be used, but it should be safe to add any archive links from before then.
- This attributes section seems to indicate that the mhcid does not change since it's uniquely assigned. Also, the MACRIS home page says "Each historic property or area in the MACRIS database will have an MHC ID assigned to it. The MHC ID in Search Results is linked to a Details screen". This makes me think it's safe to change the URL as long as it's of the format "mhc-macris.net/Details.aspx?MhcId=[THE ID]" since the ID will still be in the MACRIS database and will be linked to a details screen GrapesRock (talk) 18:41, 20 June 2024 (UTC)
- Alright I have it programmed for blind search-replace. I'll wait till Friday, and see if they respond about the IP. In the meantime I can start on the other one below. -- GreenC 22:26, 20 June 2024 (UTC)
- Soft-redirect rule:
newurl = "https://mhc-macris.net/Details.aspx?MhcId=WOR.K"
if newurl ~ "mhc-macris[.]net/Details[.]aspx[?]MhcId[=]":
  subs("Details.aspx?", "details?", newurl)
  if match(newurl, "(?i)[?]MhcId=[^$]*[^$]*", d) > 0:
    subs(d, tolowerAscii(d), newurl)
newurl == "https://mhc-macris.net/details?mhcid=wor.k"
- Done - Checked 3,720 pages and edited 3,716 pages. Moved 4,419 links to a new URL. Removed 36 {{dead link}} templates. Switched 39 |url-status=dead to live. Added 3 archive URLs (2 Wayback). Changed 22 citation metadata fields. -- GreenC 16:36, 22 June 2024 (UTC)
- @GrapesRock @GreenC FWIW, I created {{MACRIS}} a few years back after they changed their URL scheme for the second time in a short period. That allows all MACRIS links using the template to be updated with a single edit to the template. I'm not sure if it's worth mass conversion, but wanted to make you both aware. Pi.1415926535 (talk) 20:14, 22 June 2024 (UTC)
- Those templates are OK; the problem is that they don't account for the percentage of links that were not migrated to the new URL scheme. They assume all or nothing, which in practice is rarely the case: some links get left behind and become dead URLs. By using standard citation templates, bots like this can check the links and add archives or {{dead link}} tags on a per-URL basis. Otherwise the bot would need to be specially programmed for the custom template, and there are thousands of custom templates, making it impractical. In this case, the site is bot protected so it really is all or nothing, so the template (for now) is not a problem. -- GreenC 04:40, 23 June 2024 (UTC)
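For anyone who wants to reproduce the MACRIS soft-redirect rule outside the bot, here is a minimal Python sketch of the same two steps (swap "Details.aspx?" for "details?", then lowercase the query string). The function name is made up for illustration and this is not the bot's actual code; like the blind search-replace, it does not verify that the new URL is live.

```python
import re


def macris_soft_redirect(url: str) -> str:
    """Rewrite an old-style MACRIS details URL to the lowercase scheme.

    Hypothetical helper: mirrors the soft-redirect rule quoted in the thread.
    """
    if not re.search(r"mhc-macris[.]net/Details[.]aspx[?]MhcId=", url):
        return url  # not an old-style details URL, leave it untouched
    url = url.replace("Details.aspx?", "details?", 1)
    # Lowercase the whole query string (parameter name and ID).
    return re.sub(r"(?i)[?]mhcid=.*$", lambda m: m.group(0).lower(), url)
```

Calling it on the example from the rule, "https://mhc-macris.net/Details.aspx?MhcId=WOR.K", yields the "details?mhcid=wor.k" form shown there.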
atlantaintownpaper.com
This source has been moved to roughdraftatlanta.com. For instance in George Floyd protests, currently there is the url https://atlantaintownpaper.com/2020/05/mayor-police-chief-denounce-anarchists-and-terrorists-who-destroyed-city-curfew-begins-at-9-p-m/ and the source has been moved to https://roughdraftatlanta.com/2020/05/30/mayor-police-chief-denounce-anarchists-and-terrorists-who-destroyed-city-curfew-begins-at-9-p-m/ (and there's no redirect).
Both
- https://roughdraftatlanta.com/atlantaintownpaper/2020/05/mayor-police-chief-denounce-anarchists-and-terrorists-who-destroyed-city-curfew-begins-at-9-p-m/
- https://roughdraftatlanta.com/2020/05/mayor-police-chief-denounce-anarchists-and-terrorists-who-destroyed-city-curfew-begins-at-9-p-m/
redirect to the proper site. GrapesRock (talk) 15:22, 20 June 2024 (UTC)
- OK. This is a "Soft-redirect", where a page exists at a new URL but a redirect is missing (versus a soft-404, where the redirect exists but goes to a wrong page). I can fix Soft-redirects, when there is foreknowledge like you helpfully discovered. It also has a "Redirect" element so Soft-redirect --> Redirect --> Destination. 78 pages. -- GreenC 16:28, 20 June 2024 (UTC)
- Done - Checked 78 pages and edited 75 pages. Moved 79 links to a new URL. Added 1 {{dead link}}. Switched 13 |url-status=dead to live. Added 3 archive URLs (3 Wayback). -- GreenC 02:38, 21 June 2024 (UTC)
- Soft-redirect rule:
subs("atlantaintownpaper.com", "roughdraftatlanta.com", newurl)
Ooh, cool! Thanks for the explanation on a piece of terminology, it's always fun to learn new words/concepts (and of course thanks for moving all the stuff). GrapesRock (talk) 14:03, 21 June 2024 (UTC)
- I made a glossary at WP:LINKROT#Glossary of terminology; it can get complicated. -- GreenC 15:51, 22 June 2024 (UTC)
clatl.com
Redirects to creativeloafing.com and soft-404s - 379 pages -- GreenC 14:38, 21 June 2024 (UTC)
- Done - Checked 379 pages and edited 365 pages. Moved 389 links to a new URL. Added 21 {{dead link}}. Switched 37 |url-status=dead to live. Switched 8 |url-status=live to dead. Added 108 archive URLs (84 Wayback). Changed 63 citation metadata fields. -- GreenC 15:47, 21 June 2024 (UTC)
- Soft-404 rule by URL: If a redirect contains:
(?i)(page[+]not[+]found|page%20not%20found)
- Soft-404 rule by page title: If a page title contains:
(?i)^[ ]*search([ ]*[|][ ]*Creative Loafing)?[ ]*$
- Soft-404 rule by page content: If a page contains:
Content is needed
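The three clatl.com soft-404 rules above (by redirect URL, by page title, by page content) can be combined into a single check. The following is a rough Python sketch under the assumption that the caller has already fetched the page and knows the final URL, title, and body text; the function and constant names are invented here and are not taken from the bot.

```python
import re

# Patterns and marker copied from the rules listed in this section.
REDIRECT_URL_RULE = re.compile(r"(?i)(page[+]not[+]found|page%20not%20found)")
TITLE_RULE = re.compile(r"(?i)^[ ]*search([ ]*[|][ ]*Creative Loafing)?[ ]*$")
CONTENT_MARKER = "Content is needed"


def is_soft_404(final_url: str, title: str, body: str) -> bool:
    """Return True if any of the three soft-404 rules fires.

    Hypothetical helper: inputs are the URL after redirects, the page title,
    and the page text, all obtained by the caller.
    """
    return bool(
        REDIRECT_URL_RULE.search(final_url)
        or TITLE_RULE.search(title)
        or CONTENT_MARKER in body
    )
```

Keeping the three tests independent means a page is flagged as soon as one signal appears, which matches how the rules are listed: each is sufficient on its own.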
stat.kg
The URL of the National Statistical Committee of the Kyrgyz Republic changed from stat.kg to stat.gov.kg; everything else stayed the same. The old links lead to a 404, e.g. in Chaek. MarcelloIV (talk) 11:04, 24 June 2024 (UTC)
- Done Checked 780 pages and edited 769 pages. Moved 800 links to a new URL. Added 1 {{dead link}}. Switched 7 |url-status=dead to live. Added 3 archive URLs (3 Wayback). Changed 19 citation metadata fields.
- Soft-redirect rule:
subs("stat.kg", "stat.gov.kg", newurl)
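A plain substring swap like the rule above works for a one-off run, but outside the bot it can be safer to rewrite only the host part, so that a URL already on stat.gov.kg is not touched a second time. This is a small Python sketch of that idea; the helper name and its defaults are assumptions for illustration, and it drops any user:password part of the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def move_host(url: str, old_host: str = "stat.kg", new_host: str = "stat.gov.kg") -> str:
    """Swap old_host for new_host in the hostname only (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == old_host:
        new = new_host
    elif host.endswith("." + old_host):
        # Keep any subdomain prefix such as "www."
        new = host[: -len(old_host)] + new_host
    else:
        return url  # different site, or already moved
    netloc = new if parts.port is None else f"{new}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Because the path and query are passed through unchanged, this matches the report that only the domain changed.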
mtv.com
All mtv.com/news links have broken according to https://variety.com/2024/digital/news/mtv-news-website-archives-pulled-offline-1236047163/. Looks like we have several thousand references. --Nintendofan885T&Cs apply 22:52, 24 June 2024 (UTC)
(edit conflict) Variety is reporting that 20 years of MTV News archives have been pulled. A few I've tested seem to support that:
- https://www.mtv.com/news/3r5xfl/dungeons-and-dragons-arena-of-war
- https://www.mtv.com/news/olxhhg/joe-manganiello-dungeons-dragons-movie
- https://www.mtv.com/news/bhktbo/dungeons-and-dragons-online-character-creator-official-released
Thanks! Sariel Xilo (talk) 23:09, 24 June 2024 (UTC)
FYI looks like mtv.com is marked as permalive on IABot so it will ignore the links on other wikis (as GreenC bot only processes enwiki) --Nintendofan885T&Cs apply 23:23, 24 June 2024 (UTC)
- WaybackMedic can edit the IABot database, changing target links to permadead, which then propagate to the other wikis, via IABot. I will also process all MTV links on enwiki as normal, and see what other soft-404 rules might be discovered, which can also be applied to the IABot database. -- GreenC 01:30, 25 June 2024 (UTC)
- 20,263 pages mtv.com/*
- Soft-404 Rules: When a redirect matches the regex, and original URL does not match. Example for rule #1: [26] redirects to [27]
([.]|[/])mtv[.]com([.](tw|br|au))?(/(music|#))?/news/?
([.]|[/])mtv[.]com([.](tw|br|au))?/?$
([.]|[/])paramountshop[.]com/?$
([.]|[/])mtv[.]com([.](tw|br|au))?/[?]xrs=PPM-18-10caf1c/?$
([.]|[/])mtv[.]com([.](tw|br|au))?/category/ffftn1/style/?$
([.]|[/])paramountplus[.]com/brands/mtv/#ftag=PPM-18-10caf1c/?$
([.]|[/])mtvema[.]com(/en-us)?/?$
([.]|[/])mtv[.]com([.](tw|br|au))?/videos/?$
- For Enwiki: Checked 19,690 pages and edited 18,113 pages. Moved 319 links to a new URL. Removed 2 {{dead link}} templates. Added 787 {{dead link}}. Switched 11,106 |url-status=live to dead. Added 27,134 archive URLs (25,337 Wayback).
- More work to do, about 700 pages missed due to Cirrus vs. SQL search incongruities. -- GreenC 01:10, 28 June 2024 (UTC)
- Done. Checked 643 pages and edited 192 pages. Moved 92 links to a new URL. Added 14 {{dead link}}. Switched 4 |url-status=dead to live. Switched 51 |url-status=live to dead. Added 54 archive URLs (44 Wayback). -- GreenC 15:54, 1 July 2024 (UTC)
- For IABot database updates: est. 3-4 days -- GreenC 00:23, 28 June 2024 (UTC)
- Done. Checked about 51,000 unique links. Over 98% are hard or soft dead. Uploaded the information to IABot, which will be changing the links across 300+ wikis. -- GreenC 15:57, 1 July 2024 (UTC)
- Done with mtv.com -- GreenC 15:58, 1 July 2024 (UTC)
- @GreenC: Can you also check the following sites found through the certificate (don't match the above regex) that have seemingly also killed their news sections and have similar redirect-to-homepage deadlinks looking at Special:LinkSearch?
- mtv.pl (mtv.pl/newsy)
- mtv.co.uk (mtv.co.uk/news)
Thanks! --Nintendofan885T&Cs apply 19:21, 1 July 2024 (UTC)
- User:Nintendofan885, thanks. New sections below. -- GreenC 14:17, 2 July 2024 (UTC)
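The mtv.com soft-404 rules listed earlier in this section ("when a redirect matches the regex, and original URL does not match") can be sketched in Python as below. The patterns are copied verbatim from the list; the function name is made up, and reading the rule as "same pattern tested against both URLs" is an interpretation, not something confirmed by the bot's source.

```python
import re

# Patterns copied verbatim from the soft-404 rules in this section.
MTV_SOFT_404_PATTERNS = [re.compile(p) for p in (
    r"([.]|[/])mtv[.]com([.](tw|br|au))?(/(music|#))?/news/?",
    r"([.]|[/])mtv[.]com([.](tw|br|au))?/?$",
    r"([.]|[/])paramountshop[.]com/?$",
    r"([.]|[/])mtv[.]com([.](tw|br|au))?/[?]xrs=PPM-18-10caf1c/?$",
    r"([.]|[/])mtv[.]com([.](tw|br|au))?/category/ffftn1/style/?$",
    r"([.]|[/])paramountplus[.]com/brands/mtv/#ftag=PPM-18-10caf1c/?$",
    r"([.]|[/])mtvema[.]com(/en-us)?/?$",
    r"([.]|[/])mtv[.]com([.](tw|br|au))?/videos/?$",
)]


def is_mtv_soft_404(original_url: str, redirect_url: str) -> bool:
    """True if the redirect target matches a rule that the original does not.

    Hypothetical helper: the second condition keeps links that legitimately
    pointed at a landing page from being flagged.
    """
    return any(
        p.search(redirect_url) and not p.search(original_url)
        for p in MTV_SOFT_404_PATTERNS
    )
```

For example, an article URL under mtv.com/news/ that now redirects to the bare home page is flagged by the second pattern, while a citation that always pointed at the home page is not.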
apps.ehsni.gov.uk
Looks like we have a soft-redirect from http://apps.ehsni.gov.uk/ambit/Details.aspx?MonID=8572 to https://apps.communities-ni.gov.uk/NISMR-PUBLIC/Details.aspx?MonID=8572. Checking a smattering of links from List of castles in Ireland this seems to redirect to the proper place consistently (i.e. the few links I've checked, changing "http://apps.ehsni.gov.uk/ambit" to "https://apps.communities-ni.gov.uk/NISMR-PUBLIC" has worked). GrapesRock (talk) 17:49, 25 June 2024 (UTC)
- Hi User:GrapesRock: Looks like these exist on 4 pages. Can you repair them? It will be a lot easier than programming a fix. -- GreenC 16:16, 1 July 2024 (UTC)
- Yup, done. For the future, is there any value for posterity in adding posts here for links that only have a smattering of pages or should I just fix 'em? GrapesRock (talk) 16:50, 1 July 2024 (UTC)
- It's hard to say because it depends what work is involved in making the fix. I've seen cases where 5 pages can take a long time to figure out manually and are better done by bot. To set up the bot, compile, generate a list of target pages, run the bot, check for errors, upload diffs .. it's like 10 or 15 minutes for a small run. If you can do it faster than that manually, go for it. But even for simple cases, if it's more than around 20 pages don't hesitate to ask for bot help. -- GreenC 18:36, 1 July 2024 (UTC)
kp.by
Looks like there's a soft-redirect from kp.by to kp.ru links, such as "https://www.kp.by/daily/27084/4156223/" in Victory Day (9 May) being dead, but "https://www.kp.ru/daily/27084/4156223/" working GrapesRock (talk) 18:26, 25 June 2024 (UTC)
- 117 pages -- GreenC 16:22, 1 July 2024 (UTC)
- Done - Checked 121 pages and edited 116 pages. Moved 149 links to a new URL. Removed 33 {{dead link}} templates. Added 2 {{dead link}}. Switched 10 |url-status=dead to live. Added 7 archive URLs (4 Wayback). -- GreenC 19:38, 1 July 2024 (UTC)
dailynews.gov.bw
URLs of the form "http://www.dailynews.gov.bw/news-details.php?nid=23359" that I've checked soft-redirect to "https://dailynews.gov.bw/news-detail/23359" (example from Gladys Olebile Masire) GrapesRock (talk) 20:58, 25 June 2024 (UTC)
newurl = <url>
subs("http://", "https://", newurl)
# normal site
# http://www.dailynews.gov.bw/news-details.php?nid=23359
# https://dailynews.gov.bw/news-detail/23359
if newurl ~ "www[.]dailynews[.]gov[.]bw/news-details[.]php[?]nid=":
  subs("www.dailynews.gov.bw", "dailynews.gov.bw", newurl)
  subs("/news-details.php?nid=", "/news-detail/", newurl)
# mobile site
# http://www.dailynews.gov.bw/mobile/news-details.php?nid=10829&flag=
# https://dailynews.gov.bw/news-detail/10829
if newurl ~ "www[.]dailynews[.]gov[.]bw/mobile/news-details[.]php[?]nid=":
  subs("www.dailynews.gov.bw", "dailynews.gov.bw", newurl)
  subs("/mobile/news-details.php?nid=", "/news-detail/", newurl)
  subs("&flag=", "", newurl)
- Soft-404: if redirect URL is
https://dailynews.gov.bw/page-not-found
- Done - Checked 91 pages and edited 71 pages. Moved 111 links to a new URL. Removed 20 {{dead link}} templates. Added 6 {{dead link}}. Switched 22 |url-status=dead to live. Added 9 archive URLs (7 Wayback). -- GreenC 21:58, 1 July 2024 (UTC)
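The dailynews.gov.bw rules above translate fairly directly to Python. This is only a sketch of the same substitutions plus the page-not-found check; the function names are invented for illustration, and a real run would still need to fetch the new URL and confirm it is live.

```python
OLD_HOST = "www.dailynews.gov.bw"
NEW_HOST = "dailynews.gov.bw"
SOFT_404_URL = "https://dailynews.gov.bw/page-not-found"


def dailynews_soft_redirect(url: str) -> str:
    """Apply the normal-site and mobile-site rewrites (hypothetical helper)."""
    url = url.replace("http://", "https://", 1)
    # Check the mobile form first; its path is the longer of the two.
    for old_path, is_mobile in (
        ("/mobile/news-details.php?nid=", True),
        ("/news-details.php?nid=", False),
    ):
        if OLD_HOST + old_path in url:
            url = url.replace(OLD_HOST, NEW_HOST, 1)
            url = url.replace(old_path, "/news-detail/", 1)
            if is_mobile:
                url = url.replace("&flag=", "", 1)
            break
    return url


def is_dailynews_soft_404(final_url: str) -> bool:
    """True if the site redirected to its page-not-found URL."""
    return final_url == SOFT_404_URL
```

Both example URLs quoted in the rule comments map to the "/news-detail/<id>" form shown there.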
blackcountryhistory.org
From the page 1980, "http://blackcountryhistory.org/collections/getrecord/GB149_P_915/" soft redirects to "https://www.blackcountryhistory.org/collections/getrecord/GB149_P_915/" (adding a "www." to the start fixes the dead link, and might as well upgrade security). GrapesRock (talk) 17:33, 26 June 2024 (UTC)
- 123 pages -- GreenC 22:52, 1 July 2024 (UTC)
- User:GrapesRock - the website has CloudFlare protection at maximum level ("Click box if you are human"). My bot is unable to verify if the new URL is working. Given the low number of links, and simplicity of the move (adding "www" to the domain), I will go ahead and do a "blind move" ie. without verifying. -- GreenC 23:21, 1 July 2024 (UTC)
- Done - Checked 130 pages and edited 106 pages. Moved 133 links to a new URL. Removed 1 {{dead link}} templates. Switched 28 |url-status=dead to live. Added 5 archive URLs (2 Wayback). -- GreenC 23:38, 1 July 2024 (UTC)
cmt.com
Country Music Television, sister company of MTV. Paramount cost cutting: https://www.savingcountrymusic.com/cmt-mtvs-eradication-of-editorial-content-is-a-catastrophe/
6,278 pages GreenC 02:17, 27 June 2024 (UTC)
- Enwiki - Checked 6,244 pages and edited 3,242 pages. Moved 47 links to a new URL. Removed 2 {{dead link}} templates. Added 310 {{dead link}}. Switched 1 |url-status=dead to live. Switched 616 |url-status=live to dead. Added 5,635 archive URLs (5,327 Wayback). Changed 220 citation metadata fields. -- GreenC 14:05, 3 July 2024 (UTC)
- IABot DB - Checked and updated 6,831 URLs.
Done -- GreenC 22:44, 3 July 2024 (UTC)
cc.com
Comedy Central. More Paramount cost cutting: https://www.hollywoodreporter.com/business/business-news/comedy-central-website-daily-show-clips-wiped-out-1235933345/amp/ -- GreenC 15:03, 27 June 2024 (UTC)
909 pages -- GreenC 15:11, 27 June 2024 (UTC)
Soft-404 when a link redirects to one of these:
([.]|[/])cc[.]com/[?]xrs=PPM-18-10caf1[chd]/?$
([.]|[/])paramountplus[.]com/shows/tosh-0/#ftag=PPM-18-10caf1d/?$
([.]|[/])paramountplus[.]com/shows/the-daily-show/#ftag=PPM-18-10caf1d/?$
([.]|[/])paramountplus[.]com/brands/comedy-central/#ftag=PPM-18-10caf1d/?$
([.]|[/])southpark[.]cc[.]com/seasons/south-park/?$
([.]|[/])cc[.]com/fan-hub/the-daily-show[?]xrs=PPM-18-10caf1d/?$
([.]|[/])southpark[.]cc[.]com/?$
([.]|[/])southpark[.]cc[.]com/wiki/Main/?$
Results:
- Enwiki - Checked 918 pages and edited 782 pages. Moved 135 links to a new URL. Removed 2 {{dead link}} templates. Added 121 {{dead link}}. Switched 18 |url-status=dead to live. Switched 136 |url-status=live to dead. Added 656 archive URLs (638 Wayback). Changed 131 citation metadata fields.
- IABot DB Checked 2,338 links of which modified 1,675. Changes will propagate to 300+ wikis via IABot.
Done -- GreenC 19:03, 4 July 2024 (UTC)
tvland.com
TV Land. More Paramount cost cutting: https://www.hollywoodreporter.com/business/business-news/comedy-central-website-daily-show-clips-wiped-out-1235933345/amp/ -- GreenC 15:08, 27 June 2024 (UTC)
100 pages -- GreenC 15:12, 27 June 2024 (UTC)
Soft-404 when a link redirects to:
paramountplus[.]com/browse/#ftag=PPM-18-10caf1i/?$
Results:
- Enwiki: Checked 106 pages and edited 42 pages. Moved 17 links to a new URL. Added 1 {{dead link}}. Switched 2 |url-status=live to dead. Added 22 archive URLs (20 Wayback). Changed 8 citation metadata fields.
- IABot DB: Checked 184 links and modified 172
Done -- GreenC 00:41, 5 July 2024 (UTC)
zdnet.com
"it went through major changes that broke many of its old links" - https://www.msn.com/en-us/news/technology/the-internet-s-memory-is-under-threat/ar-BB1oRToN
5,100 pages -- GreenC 19:01, 27 June 2024 (UTC)
Soft-404s are any redirects that match this:
zdnet[.]com/topic/(virtualization|microsoft|enterprise-software|apple|networking|education|storage|computing|hardware|social-media|government|smartphones|innovation|open-source|security|smb|collaboration|google|big-data)/?$
Bot results:
- Enwiki: Checked 5,148 pages and edited 2,442 pages. Moved 2,325 links to a new URL. Removed 2 {{dead link}} templates. Added 52 {{dead link}}. Switched 32 |url-status=dead to live. Switched 116 |url-status=live to dead. Added 639 archive URLs (554 Wayback). Changed 403 citation metadata fields.
- IABot DB: Checked 9,883 links and modified 2,629. Changes will propagate to 300+ wikis.
Done -- GreenC 02:24, 6 July 2024 (UTC)
mtv.pl
Paramount site per #mtv.com: mtv.pl (mtv.pl/newsy) 35 pages -- GreenC 14:14, 2 July 2024 (UTC)
Done - Checked 35 pages and edited 19 pages. Added 3 {{dead link}}. Added 39 archive URLs (39 Wayback). Changed 2 citation metadata fields. -- GreenC 02:46, 6 July 2024 (UTC)
mtv.co.uk
Paramount site per #mtv.com: mtv.co.uk (mtv.co.uk/news) 1,700 pages -- GreenC 14:15, 2 July 2024 (UTC)
Bot results:
- Enwiki - Checked 1,741 pages and edited 1,503 pages. Moved 942 links to a new URL. Removed 1 {{dead link}} templates. Added 22 {{dead link}}. Switched 61 |url-status=dead to live. Switched 228 |url-status=live to dead. Added 755 archive URLs (688 Wayback). Changed 280 citation metadata fields.
- IABot DB - Checked 2,616 unique links and fixed 1,550 which will update on 300+ wikis
Done -- GreenC 21:59, 6 July 2024 (UTC)
catholicnewsagency.com
This page "http://www.catholicnewsagency.com/new.php?n=15190" soft-redirects to "http://www.catholicnewsagency.com/news/15190" which redirects to "https://www.catholicnewsagency.com/news/15190/cardinal-rouco-opens-cause-of-canonization-for-spanish-couple" (example from Opus Dei) GrapesRock (talk) 18:10, 2 July 2024 (UTC)
GrapesRock: In Charles of Sezze I found: http://www.catholicnewsagency.com/saint.php?n=416 .. do you think it soft-redirects? Also http://www.catholicnewsagency.com/document.php?n=147 in Maria Gabriella Sagheddu, and https://www.catholicnewsagency.com/martyrology_entry.php?n=596 in List of Catholic saints, and http://www.catholicnewsagency.com/resource.php?n=409 in Natural marriage, and http://www.catholicnewsagency.com/column.php?n=1360 in Mark Templeton (trombonist). There are probably more. -- GreenC 22:22, 6 July 2024 (UTC)
- The saint one seems to exist here https://www.catholicnewsagency.com/saint/st-charles-of-sezze-416 , so there's still that 416. I don't see an algorithmic way to figure out what belongs between saint/ and 416
- Can't find a newer location for http://www.catholicnewsagency.com/document.php?n=147
- Ditto on the martyrology link
- Ditto on the resource link
- The column link soft-redirects to https://www.catholicnewsagency.com/column/51359
- Seems like adding 49999 to the number works, at least in the example above and http://www.catholicnewsagency.com/column.php?n=2930 (from The Fault in our Stars) soft-redirecting to http://www.catholicnewsagency.com/column/52929
- GrapesRock (talk) 23:05, 6 July 2024 (UTC)
- Thanks. -- GreenC 01:28, 7 July 2024 (UTC)
Bot results:
- Enwiki - Checked 2,004 pages and edited 1,352 pages. Moved 1,754 links to a new URL. Removed 9 {{dead link}} templates. Added 5 {{dead link}}. Switched 70 |url-status=dead to live. Switched 5 |url-status=live to dead. Added 101 archive URLs (90 Wayback). Changed 75 citation metadata fields. -- GreenC 01:28, 7 July 2024 (UTC)
- IABot DB - Checked 3,445 unique links and changed 432 which will propagate across 300+ wikis.
Done -- GreenC 14:13, 7 July 2024 (UTC)
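The two predictable catholicnewsagency.com patterns found above (new.php?n=N to /news/N, and column.php?n=N to /column/N+49999) can be sketched in Python as follows. The 49999 offset rests on just the two column examples in this thread, so treat it as an observation rather than a documented scheme; the function name is made up. The saint, document, martyrology, and resource forms are deliberately left alone since no algorithmic mapping was found for them.

```python
import re
from typing import Optional

# Offset observed in the two column examples in this thread (an assumption
# beyond those two data points).
COLUMN_ID_OFFSET = 49999

_OLD_URL = re.compile(
    r"(https?://www[.]catholicnewsagency[.]com)/(new|column)[.]php[?]n=(\d+)"
)


def cna_soft_redirect(url: str) -> Optional[str]:
    """Return the predicted new URL, or None if no rule applies."""
    m = _OLD_URL.fullmatch(url)
    if m is None:
        return None
    base, kind, n = m.group(1), m.group(2), int(m.group(3))
    if kind == "new":
        return f"{base}/news/{n}"
    return f"{base}/column/{n + COLUMN_ID_OFFSET}"
```

The result is the intermediate URL; the site itself then redirects /news/N to the full slug, so the final address should still be taken from the live response.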
ew.com
Hello, Old URLs for Entertainment Weekly that mainly consist of numbers don't work. These links can be sorted into multiple categories:
- 1) Links that already have an archived copy in the article: this link at Don't Tell Me (Avril Lavigne song) is here.
- 2) Links that can be moved over to a new URL: this should go to that for Buckshot_LeFonque_(album). This has /ew/ removed, while adding date and title in the URL.
- 3) Links at a new URL but not in the date/title format: this is now here for Heavy Competition.
- 4) Links that can't be moved over to a new URL: this link at Fast Times at Barrington High needs an archived copy.
Since the new URLs don't always match the date/title format, I would like the broken links to be focused on first. Then, if any of the archived links have a working URL in the date/title format, they could be converted over. Any links that don't have a matching date/title format can keep the archived URL because I can't predict the new URL.
- http://ew.com/ew/ 18
- https://ew.com/ew/ 188
- http://www.ew.com/ew/ 6000
- https://www.ew.com/ew/ 3154
These numbers include ones that are already fixed, such as the above link at Don't Tell Me. Thanks! MrLinkinPark333 (talk) 18:54, 4 July 2024 (UTC)
For #2 this is one of those dead redirects that eventually leads to the answer. It's a multi-step process:
- Convert http://www.ew.com/ew/article/0,,302934,00.html to https://redir.ew.com/ew/article/0,,302934,00.html
- Run this which finds a redirect saved in the Wayback Machine:
wget -q -O- 'https://web.archive.org/cdx/search/cdx?url=https://redir.ew.com/ew/article/0,,302934,00.html&MatchType=prefix' | awk -v u="https://redir.ew.com/ew/article/0,,302934,00.html" '/text\/html 30[12]/{a[++i]=$2}END{print "https://web.archive.org/web/" a[i] "/" u}'
- With the answer from #2 run this:
curl -ILs 'https://web.archive.org/web/20160606115016/https://redir.ew.com/ew/article/0,,302934,00.html' | /usr/local/bin/awk '/^[ ]*[Ll]ocation:/{sub("^[ ]*[Ll]ocation:[ ]*https?://web[.]archive[.]org/web/[0-9]{14}id_/", "", $0); a[++i]=$0}END{print a[i]}'
- Which produces: https://web.archive.org/web/20151030104409/http://www.ew.com/article/1994/07/15/buckshot-lefonque .. from which extract the answer: http://www.ew.com/article/1994/07/15/buckshot-lefonque
For #3 same as #2, although in this example it leads to a soft-404.
For #4 same as #2. There are over 100,000 URLs in the wayback machine to redir.ew.com so there should be a good chance of finding, though not for this example. -- GreenC 22:27, 7 July 2024 (UTC)
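For reference, the non-network parts of the multi-step process above can be sketched in Python: building the redir.ew.com URL, picking the last 301/302 capture out of CDX output, and stripping the Wayback prefix from the final location. The function names are invented here, the CDX text is assumed to be in the default space-separated layout (urlkey, timestamp, original, mimetype, statuscode, ...), and fetching the CDX listing and following the redirect are left to the caller, as the wget and curl commands do.

```python
import re
from typing import Optional


def to_redir_url(url: str) -> str:
    """Step 1: point an old ew.com URL at the redir.ew.com host."""
    return re.sub(r"^https?://(www[.])?ew[.]com/", "https://redir.ew.com/", url)


def last_redirect_capture(cdx_text: str, redir_url: str) -> Optional[str]:
    """Step 2: build the Wayback URL of the last text/html 301/302 capture.

    Mirrors the awk filter: match on mimetype and status, keep field 2
    (the timestamp), and use the last one seen.
    """
    timestamps = [
        fields[1]
        for fields in (line.split() for line in cdx_text.splitlines())
        if len(fields) >= 5
        and fields[3] == "text/html"
        and fields[4] in ("301", "302")
    ]
    if not timestamps:
        return None
    return f"https://web.archive.org/web/{timestamps[-1]}/{redir_url}"


def strip_wayback_prefix(wayback_url: str) -> str:
    """Step 4: recover the original URL from a Wayback Machine URL."""
    return re.sub(
        r"^https?://web[.]archive[.]org/web/[0-9]{14}(id_)?/", "", wayback_url
    )
```

Splitting the work this way keeps the slow I/O steps separate from the string handling, which makes the latter easy to check on its own.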
- If that's easier than extracting the title/date, that could work. Didn't know this would be a complex change. Maybe more URLs could be fixed than I thought, even the ones that already have archived copies! MrLinkinPark333 (talk) 22:39, 7 July 2024 (UTC)
- Well, like in https://ew.com/ew/article/0,,302934,00.html there is no title/date in the URL string, and the page itself is dead, so the only way is search the Wayback Machine for old redirects. Unless you have another idea. It does appear to be working pretty well so far getting a lot, it's only slow because of the multiple I/O steps, and large number of links to process. -- GreenC 00:55, 8 July 2024 (UTC)
- I mean title/date in the citation would be added to the URL string like at buckshot. Then again, there's not always dates nor does the title always match the URL like for Heavy Competition. Searching https://web.archive.org/web/*/http://www.ew.com/article/2009/04/17/* for Heavy Competition doesn't give an archived updated URL. MrLinkinPark333 (talk) 20:37, 8 July 2024 (UTC)
- Ah yeah that method will be very hit or miss because it depends on the title in Wiki matching exactly the title in the URL. It worked for a two-word title, but I suspect for longer titles the URL drops words and punctuation. Maybe with a lot of experimenting. It also depends on the date being available. For now, I am doing the above old-redirects-in-wayback method which is getting a lot of positive results. -- GreenC 15:13, 9 July 2024 (UTC)
- Yesterday there was a multirack hardware outage at archive.org that may take some time to fully recover, some of the services I need for this work are running slow or intermittent. -- GreenC 14:37, 8 July 2024 (UTC)
- MrLinkinPark333, I need to scale back the number of parallel processes to 2 because it's slow and tying up my rate-limited slots at the archive. This works but it will be running a long time. This way I can move on to other work. It will remain in "working" mode for a while, could be weeks not sure. Diff uploads will be in intermittent batches. -- GreenC 15:13, 9 July 2024 (UTC)
- No worries. It is a big request after all. MrLinkinPark333 (talk) 20:49, 9 July 2024 (UTC)
Bot results:
- NOTE: List of URL discoveries for future reference, or in case anyone needs it.
Enwiki: Done in 5 chunks, total was 8,673 pages
- Checked 1,000 pages and edited 804 pages. Moved 454 links to a new URL. Removed 1 {{dead link}} templates. Added 4 {{dead link}}. Switched 164 |url-status=dead to live. Switched 97 |url-status=live to dead. Added 359 archive URLs (290 Wayback). Changed 106 citation metadata fields.
- Checked 1,000 pages and edited 910 pages. Moved 604 links to a new URL. Removed 1 {{dead link}} templates. Added 5 {{dead link}}. Switched 286 |url-status=dead to live. Switched 78 |url-status=live to dead. Added 333 archive URLs (274 Wayback). Changed 113 citation metadata fields.
- Checked 2,220 pages and edited 1,761 pages. Moved 1,172 links to a new URL. Removed 2 {{dead link}} templates. Switched 457 |url-status=dead to live. Switched 170 |url-status=live to dead. Added 724 archive URLs (581 Wayback). Changed 180 citation metadata fields.
- Checked 2,220 pages and edited 1,731 pages. Moved 1,126 links to a new URL. Removed 1 {{dead link}} templates. Added 3 {{dead link}}. Switched 450 |url-status=dead to live. Switched 145 |url-status=live to dead. Added 736 archive URLs (657 Wayback). Changed 184 citation metadata fields.
- Checked 2,233 pages and edited 1,734 pages. Moved 1,187 links to a new URL. Removed 4 {{dead link}} templates. Added 5 {{dead link}}. Switched 490 |url-status=dead to live. Switched 167 |url-status=live to dead. Added 724 archive URLs (619 Wayback). Changed 198 citation metadata fields.
IABot DB: Checked 14,591 unique links and changed 13,886 which will propagate across 300+ wikis via IABot
Done -- GreenC 05:27, 15 July 2024 (UTC)
pac-12.com
Hello. Pacific 12 Conference links are not working. Some can be moved to a new URL while others can't:
- /article/ link that has a new URL: this is now here - /article/ is swapped to /news/, and .aspx is added to the end, for John Ross (American football).
- If the day number starts with a 0, the 0 gets removed as well, like this is now here for N'Keal Harry.
- However, this does not always work. Changing this to that gives a 404 for 2018–19 Pac-12 Conference men's basketball season. The article is also missing from their archives.
Miscellaneous links that do not work that are not under the /article/ format include /content/, and /event/.
Thanks! MrLinkinPark333 (talk) 19:17, 5 July 2024 (UTC)
- Enwiki: Checked 879 pages and edited 825 pages. Moved 736 links to a new URL (per soft-redirect rules given above). Added 72 {{dead link}}. Switched 9 |url-status=dead to live. Switched 45 |url-status=live to dead. Added 1,261 archive URLs (1,235 Wayback). Changed 135 citation metadata fields.
- IABot DB: Checked 1,193 unique links and updated 890. Will propagate across 300+ wikis via IABot.
-- GreenC 04:35, 10 July 2024 (UTC)
- Looking through the remaining links, I see the third link above now redirects here. The date was wrong in the old URL. Therefore, I think this could be revisited later to see if any more links have been moved. I don't think it's worth going through them again now as I've only found that one so far today. MrLinkinPark333 (talk) 22:56, 10 July 2024 (UTC)
- Hmm, that exposes a logic flaw in my bot. If there is a soft-redirect defined, it checks the status and when it fails, it assumes the URL is dead. In this case, the soft-redirect failed but a hard redirect existed. It adds overhead to check again so I never do; I am presuming the original link must be dead. As always, can't presume. The question is whether the overhead is worth it: yes for this site, maybe not for others. It's worth retrying a set of articles to see what happens. -- GreenC 01:04, 11 July 2024 (UTC)
- The link wasn't working on Friday but it is today. This makes sense as when I asked Pac-12 about the broken links on Friday, they said not all of the links were redirecting while they're reworking the site. MrLinkinPark333 (talk) 01:40, 11 July 2024 (UTC)
- ω Awaiting site to be reworked -- GreenC 23:34, 11 July 2024 (UTC)
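The logic flaw discussed above (a failed soft-redirect being treated as proof that the original link is dead) could be avoided with a fallback re-check of the original URL. This is only a Python sketch of that idea with made-up names: `soft_redirect` and `is_live` are hypothetical callables supplied by the caller, and nothing here reflects how the bot is actually structured.

```python
from typing import Callable, Optional


def resolve_link(
    url: str,
    soft_redirect: Callable[[str], str],
    is_live: Callable[[str], bool],
) -> Optional[str]:
    """Pick a working URL, or None if both candidates fail.

    1. Try the soft-redirect candidate.
    2. If that fails, re-check the original URL (which may have gained a
       hard redirect) instead of assuming it is dead.
    """
    candidate = soft_redirect(url)
    if candidate != url and is_live(candidate):
        return candidate
    if is_live(url):
        return url
    return None
```

The second `is_live` call is the extra overhead mentioned in the thread, so it would make sense to enable it only for sites known to be mid-migration.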
tampabay.com
Both /blogs/soundcheck and /blogs/the-buzz-florida-politics extensions seem to be removable from the link to achieve a soft-redirect.
Examples:
- http://www.tampabay.com/blogs/soundcheck/xxxtentacion-announces-free-show-at-the-orpheum-in-tampa-this-saturday/2335772 soft-redirects to https://www.tampabay.com/xxxtentacion-announces-free-show-at-the-orpheum-in-tampa-this-saturday/2335772/ (from XXXTentacion)
- http://www.tampabay.com/blogs/the-buzz-florida-politics/rubio-comes-out-in-support-of-medical-marijuana-but-not-ballot/2190709 soft-redirects to https://www.tampabay.com/rubio-comes-out-in-support-of-medical-marijuana-but-not-ballot/2190709/ (from Marco Rubio)
GrapesRock (talk) 00:58, 7 July 2024 (UTC)
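As a sketch of the rule described above, the following Python drops either blog prefix from the path, upgrades to https, and adds the trailing slash seen in the two examples. The function name is invented for illustration, only the www.tampabay.com host from the examples is handled, and the result is a candidate that still needs a live check.

```python
import re

# The two blog path prefixes named in this request.
BLOG_PREFIXES = ("/blogs/soundcheck", "/blogs/the-buzz-florida-politics")


def tampabay_soft_redirect(url: str) -> str:
    """Return the candidate new URL, or the input if no rule applies."""
    m = re.fullmatch(r"https?://www[.]tampabay[.]com(/.*)", url)
    if m is None:
        return url
    path = m.group(1)
    for prefix in BLOG_PREFIXES:
        if path.startswith(prefix + "/"):
            path = path[len(prefix):]
            if not path.endswith("/"):
                path += "/"
            return "https://www.tampabay.com" + path
    return url
```

Both example URLs above map to the targets given for XXXTentacion and Marco Rubio.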
6,800 pages -- GreenC 16:58, 10 July 2024 (UTC)
Bot results:
- Enwiki: Checked 6,846 pages and edited 2,329 pages. Moved 2,533 links to a new URL. Removed 3 {{dead link}} templates. Added 83 {{dead link}}. Switched 214 |url-status=dead to live. Switched 84 |url-status=live to dead. Added 944 archive URLs (751 Wayback). Changed 164 citation metadata fields.
- IABot DB: Checked 9,980 unique links and updated 3,366 which will propagate across 300+ wikis via IABot
Started a thread about this at User talk:GreenC bot#Tampabay.com. ▶ I am Grorp ◀ 00:43, 12 July 2024 (UTC)
- Done -- GreenC 04:35, 12 July 2024 (UTC)
bet.com
Black Entertainment Television (Paramount) - 2,100 pages -- GreenC 02:01, 7 July 2024 (UTC)
Soft-404 redirect rules:
bet-awards(/(nominees|performers|photos))?/?$
soul-train-awards(/nominees)?/?$
hip-hop-awards(/(nominees|photos|videos))?/?$
shows(/soul-train-awards)?/?$
vertical(/(jo1ilh/celebrity|o2fii9/news))?/?$
topic(/betexperience)?/?$
Bot results:
- Enwiki: Checked 2,143 pages and edited 1,522 pages. Moved 1,190 links to a new URL. Added 57 {{dead link}}. Switched 36 |url-status=dead to live. Switched 71 |url-status=live to dead. Added 620 archive URLs (584 Wayback). Changed 263 citation metadata fields.
- IABot DB: Checked 3,087 unique links and changed 1,164 which will propagate across 300+ wikis via IABot
Done -- GreenC 04:22, 13 July 2024 (UTC)
abclocal.go.com
Soft-redirects that I found:
- kfsn -> abc30.com
- kgo -> abc7news.com
- /story?section=news/local/los_angeles -> abc7news.com
- kabc -> abc7.com
- kgo -> abc7.com
- wabc -> abc7ny.com
- wls -> abc7chicago.com
- wtvd -> abc11.com
- ktrk -> abc13.com
- http://abclocal.go.com/ktrk/story?section=news/local&id=8700136 soft-redirects to https://abc13.com/archive/8700136/ (from BP)
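The mapping above could be sketched roughly as follows. This assumes every affiliate uses the same `/archive/<id>/` pattern seen in the ktrk example, which is not confirmed for all of them. The name `candidate_url` is hypothetical, and kgo (listed twice above) is assumed to map to abc7news.com. The result is only a candidate and would still need a live check before replacing anything.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Affiliate call sign -> new domain, from the list above.
# Assumption: kgo appears twice in the list; abc7news.com is used here.
AFFILIATE_DOMAINS = {
    "kfsn": "abc30.com",
    "kgo": "abc7news.com",
    "kabc": "abc7.com",
    "wabc": "abc7ny.com",
    "wls": "abc7chicago.com",
    "wtvd": "abc11.com",
    "ktrk": "abc13.com",
}


def candidate_url(old_url: str) -> Optional[str]:
    """Build a candidate new URL from an abclocal.go.com story URL.

    Returns None when the affiliate is unknown or no numeric id= is present.
    """
    parts = urlparse(old_url)
    if parts.hostname != "abclocal.go.com":
        return None
    # First path segment is the affiliate call sign, e.g. /ktrk/story
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    domain = AFFILIATE_DOMAINS.get(segments[0].lower())
    story_ids = parse_qs(parts.query).get("id", [])
    if domain is None or not story_ids or not story_ids[0].isdigit():
        return None
    return f"https://{domain}/archive/{story_ids[0]}/"
```

For the BP example, `candidate_url("http://abclocal.go.com/ktrk/story?section=news/local&id=8700136")` gives `https://abc13.com/archive/8700136/`.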
These ABC affiliates don't seem to work: wpvi, wjrt, wtvg
I'm sure that there are other ones that didn't appear in the first two pages of searching abclocal.go.com (and I can test them if you link them). I found the correct domain by searching for the text directly after .com/ (such as ktrk) on the internet and selecting the corresponding ABC affiliate (and found the Elton John one by searching the title given in the WP article).
(and a question: is this page just for moving *dead* links? I think all pages with espn.go.com in the domain now are either dead or redirect normally) GrapesRock (talk) 16:27, 7 July 2024 (UTC)
- Looks like some wpvi links actually do redirect, such as http://abclocal.go.com/wpvi/story?section=news/politics&id=6038619 to https://6abc.com/archive/6038619/ (from Dana Redd).
- Some, however, do not, such as http://abclocal.go.com/wpvi/story?section=entertainment&id=4498224 (from Elton John). GrapesRock (talk) 18:33, 7 July 2024 (UTC)
- To answer your question, the bot can/will also move redirects that it encounters. Assuming they pass any soft-404 rules. -- GreenC 19:24, 7 July 2024 (UTC)
- GrapesRock, good discovery. The bot checked all abclocal.go.com links on enwiki; it didn't find any additional affiliates. It attempted the soft-redirects per the rules above. If not live, it added an archive URL. One thing I did not check: it's possible that a new soft-redirect URL that now returns 404 previously worked and thus has an archive URL, but the ones I manually checked either don't exist or end up soft-404 (redirect to home page), so I didn't look for those. I did check wjrt (abc12.com) and wtvg (13abc.com) for any working soft-redirects. -- GreenC 21:52, 13 July 2024 (UTC)
Bot results:
- Enwiki: Checked 1,857 pages and edited 1,422 pages. Moved 1,389 links to a new URL. Removed 13 {{dead link}} templates. Added 74 {{dead link}}. Switched 650 |url-status=dead to live. Switched 18 |url-status=live to dead. Added 301 archive URLs (184 Wayback). Changed 30 citation metadata fields.
- IABot DB: Checked 3,938 unique links and updated 3,889 which will propagate to 300+ wikis via IABot
- Done -- GreenC 02:32, 14 July 2024 (UTC)
espn.go.com
The pages redirect to the espn.com domain per https://www.niemanlab.org/2016/08/espn-com-has-finally-replaced-espn-go-com-and-a-tweet-about-google-seo-may-be-part-of-why/ (though not all do).
28,843 pages GrapesRock (talk) 20:52, 7 July 2024 (UTC)
Soft-404 rules: over 100 - contact me for the list.
Bot results:
- Enwiki: being done in segments of 3k to 8k pages
- (#1 to 5000) Checked 5,000 pages and edited 4,715 pages. Moved 11,901 links to a new URL. Removed 7 {{dead link}} templates. Added 115 {{dead link}}. Switched 275 |url-status=dead to live. Switched 149 |url-status=live to dead. Added 1,219 archive URLs (887 Wayback). Changed 411 citation metadata fields.
- (#13001 to 16000) Checked 3,000 pages and edited 2,830 pages. Moved 6,940 links to a new URL. Removed 3 {{dead link}} templates. Added 50 {{dead link}}. Switched 166 |url-status=dead to live. Switched 63 |url-status=live to dead. Added 660 archive URLs (415 Wayback). Changed 260 citation metadata fields.
- (#6001 to 13000) Checked 8,000 pages and edited 7,684 pages. Moved 17,056 links to a new URL. Removed 12 {{dead link}} templates. Added 289 {{dead link}}. Switched 465 |url-status=dead to live. Switched 549 |url-status=live to dead. Added 3,661 archive URLs (2,989 Wayback). Changed 596 citation metadata fields.
- (#1 to 5000 + #13001 to 16000) - reprocessed with new code updates: Checked 8,000 pages and edited 1,923 pages. Moved 2 links to a new URL. Switched 16 |url-status=dead to live. Switched 529 |url-status=live to dead. Added 1,791 archive URLs (1,791 Wayback). Changed 546 citation metadata fields.
- (#16001 to 22000) Checked 6,000 pages and edited 5,614 pages. Moved 12,433 links to a new URL. Removed 5 {{dead link}} templates. Added 134 {{dead link}}. Switched 311 |url-status=dead to live. Switched 326 |url-status=live to dead. Added 2,610 archive URLs (2,017 Wayback). Changed 543 citation metadata fields.
Still working; days more to go. -- GreenC 03:41, 17 July 2024 (UTC)
social.techcrunch.com
There are server errors on pages in this domain, and removing "social" from the domain fixes them. Based on the archives of the two links below, they've had this server error since at least October 2023.
Examples (from Instagram):
- https://social.techcrunch.com/2021/05/04/instagram-adds-a-captions-option-for-stories-and-soon-reels/
- https://social.techcrunch.com/2020/11/14/this-week-in-apps-conservative-apps-surge-instagram-redesigned-tiktok-gets-ghosted/
1,818 pages GrapesRock (talk) 21:45, 7 July 2024 (UTC)
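A minimal sketch of the rewrite described above, assuming the only change needed is dropping the `social.` subdomain and that the rest of the URL carries over unchanged. The function name `drop_social_subdomain` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_social_subdomain(url: str) -> str:
    """Rewrite social.techcrunch.com to techcrunch.com; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "social.techcrunch.com":
        return url
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit(parts._replace(netloc="techcrunch.com"))
```

Applied to the first example, this yields https://techcrunch.com/2021/05/04/instagram-adds-a-captions-option-for-stories-and-soon-reels/.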
The Old New Thing
There are ~100 pages with URLs like http://blogs.msdn.com/b/oldnewthing/archive/2004/09/02/224672.aspx. These are dead but can be fixed by extracting the date from the URL and rewriting it as https://devblogs.microsoft.com/oldnewthing/20040902-00 ("-00" is static), which points to a list of blog posts on that day. Often there's only one, in which case it's unambiguous, but occasionally there's two or more and you need to disambiguate them somehow. Comparing the citation title with the article title is one possibility. (The destination URL for that instance is https://devblogs.microsoft.com/oldnewthing/20040902-00/?p=37983 with a numeric ID that can't be found any way I know of). * Pppery * it has begun... 23:02, 8 July 2024 (UTC)
- User:Pppery, I will do this, but the disambiguation based on |title=, probably not. Trying to read and match citation information breaks the model of the bot: it would require a special parser, and it can be difficult and error-prone. It's probably not that many pages (a percentage of 100 pages), so it could be done manually by someone faster and more accurately. I could probably parse the HTML to see when there are multiple blog items and log those URLs so we know which articles and URLs need checking. -- GreenC 05:16, 14 July 2024 (UTC)
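The date-based rewrite described in this thread could be sketched as below. It produces only the per-day listing URL with the static "-00" suffix and makes no attempt at the title-based disambiguation. The name `day_listing_url` is hypothetical, and the pattern assumes all old URLs follow the `/archive/YYYY/MM/DD/<id>.aspx` shape of the example.

```python
import re
from typing import Optional

# Old MSDN blog URL shape, per the example in this thread.
_OLD_URL = re.compile(
    r"^https?://blogs\.msdn\.com/b/oldnewthing/archive/"
    r"(\d{4})/(\d{2})/(\d{2})/\d+\.aspx",
    re.IGNORECASE,
)


def day_listing_url(old_url: str) -> Optional[str]:
    """Map an old blogs.msdn.com URL to the devblogs per-day listing URL."""
    m = _OLD_URL.match(old_url)
    if m is None:
        return None
    year, month, day = m.groups()
    return f"https://devblogs.microsoft.com/oldnewthing/{year}{month}{day}-00"
```

For the example URL this returns https://devblogs.microsoft.com/oldnewthing/20040902-00; days with more than one post would still need checking by hand.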
billboard.com/bbcom/
Hello. Billboard URLs with /bbcom/ are broken. I found this redirects to that for Because You Left. However, the rest of the links I tried do not redirect. The numbers in the URLs also don't match, so it's not a simple find and replace.
- HTTPS - 300.
- HTTP - ~5600.
- HTTP without www - ~220
- HTTPS without www - 25
These include links not in mainspace and ones that are already archived. Thanks! MrLinkinPark333 (talk) 23:47, 14 July 2024 (UTC)
- I checked WaybackMachine for Ghost redirects, like with ew.com, but no ghosts. I guess the solution will be to archive (and any working redirects). -- GreenC 05:18, 15 July 2024 (UTC)
pqasb.pqarchiver.com
Hello again. There are a ton of pqarchiver links broken. These articles can be found at ProQuest but fall into 2 categories:
- 1. URLs with /doc/ can be converted into ProQuest URLs. this is now here for Here, My Dear. Some URLs have a different source type, such as converting this goes here for 1916 Michigan Agricultural Aggies football team and ends in Historical Newspapers. Template:ProQuest can help with this. ProQuest 565997669 points to this and will redirect to the right link for the Aggies article.
- 2. URLs that don't have /doc/ can't be converted, as the number in the new URL doesn't match. For example, this is now here for 1974–75 Buffalo Sabres season. These need regular archives, as I can't predict what the number will be in the new URL.
Please note that not all of these links are in articlespace. Also, some of these already have archived copies. Since it's so much, I don't mind if only the /doc/ URLs are focussed on and the other URLS dealt with later. Thanks again! MrLinkinPark333 (talk) 01:55, 17 July 2024 (UTC)
IEEE log in 404s
Most of these (search link) are broken and can be replaced.
E.g.
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=933500&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F2%2F20203%2F00933500
can be replaced with
https://ieeexplore.ieee.org/document/933500/
as long as the first link is 404 and the second link resolves as 200.
The proper ID is written in the string arnumber=933500
Jonatan Svensson Glad (talk) 01:33, 19 July 2024 (UTC)
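A minimal sketch of the replacement described above, assuming the `arnumber` query parameter is always present and numeric on these login URLs. The function name `ieee_document_url` is hypothetical. The sketch only builds the candidate URL; checking that the old link returns 404 and the new one resolves as 200 is left to the caller.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def ieee_document_url(login_url: str) -> Optional[str]:
    """Build the /document/<arnumber>/ URL from an IEEE Xplore login.jsp link."""
    parts = urlparse(login_url)
    if parts.hostname != "ieeexplore.ieee.org" or parts.path != "/xpl/login.jsp":
        return None
    # The proper ID is the value of arnumber= in the query string.
    numbers = parse_qs(parts.query).get("arnumber", [])
    if not numbers or not numbers[0].isdigit():
        return None
    return f"https://ieeexplore.ieee.org/document/{numbers[0]}/"
```

For the example login URL above, this returns https://ieeexplore.ieee.org/document/933500/.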
goo.gl
Google has announced it will no longer support goo.gl shortened links after 25 August 2025. We have quite a lot of these in use on the Wiki at present. It may be necessary to look at a project/bot to replace them with the lengthy URLs. Stifle (talk) 07:52, 19 July 2024 (UTC)
- Google URL Shortener is about the service. https://developers.googleblog.com/en/google-url-shortener-links-will-no-longer-be-available/ also says: "Starting August 23, 2024, goo.gl links will start displaying an interstitial page for a percentage of existing links notifying your users that the link will no longer be supported after August 25th, 2025 prior to navigating to the original target page." PrimeHunter (talk) 08:56, 19 July 2024 (UTC)
- Thank you. I'll triage this after ESPN, above, which will be a couple more days at least. URL shortening was supposedly disallowed on Enwiki. The interstitial page could interfere with the bot, so I'll start migrating sooner rather than later. -- GreenC 15:32, 19 July 2024 (UTC)
- meta:Spam blacklist says \bgoo\.gl\b(?!/maps\b).*, allowing goo.gl/maps. I guess the rationale was that it can only redirect to Google pages which aren't blacklisted so it cannot be used to bypass a blacklist entry, and it was probably assumed that Google would keep it working as long as they keep the target online. I think there was a time where Google itself gave goo.gl/maps links without the user asking for url shortening, so it would have been annoying if goo.gl/maps was blacklisted. PrimeHunter (talk) 20:34, 19 July 2024 (UTC)
- insource:goo.gl insource:/([.]|\/)goo[.]gl/ = 4,000+ pages.
- Over 90% are to goo.gl/maps, and about 350 to images.google.com where they are incorrectly/uselessly in the |image= field of infoboxen. -- GreenC 17:40, 19 July 2024 (UTC)
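To illustrate why the blacklist pattern quoted in this thread lets goo.gl/maps through, here is a small sketch using Python's `re` module. It assumes the rule is searched anywhere within the URL, which may differ from how the blacklist extension actually applies it. The helper name `is_blacklisted` and the short-link IDs in the usage note are made up for illustration.

```python
import re

# Pattern as quoted from meta:Spam blacklist in this thread.
BLACKLIST_RULE = re.compile(r"\bgoo\.gl\b(?!/maps\b).*")


def is_blacklisted(url: str) -> bool:
    """Return True if the URL would be caught by the goo.gl rule.

    The negative lookahead (?!/maps\b) rejects a match when "goo.gl" is
    immediately followed by "/maps" as a whole path word.
    """
    return BLACKLIST_RULE.search(url) is not None
```

Under that assumption, a hypothetical `https://goo.gl/abc123` is caught, while `https://goo.gl/maps/XyZ` is not.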
ghostarchive.org
Web archive provider GhostArchive is dead as of July 19. insource:ghostarchive insource:/ghostarchive[.]org/ = 66,000 pages to be converted/deleted. -- GreenC 21:20, 19 July 2024 (UTC)
- I just tried two links [28] from England national football team and [29] from YouTube and they both seemed to work? GrapesRock (talk) 03:14, 20 July 2024 (UTC)
- Same, GhostArchive works fine for me... don't know why GreenC is/was having trouble. Nex 🌐 📰 leave a message 04:05, 20 July 2024 (UTC)