
Wikipedia:Link rot/URL change requests/Archives/2024/June

From Wikipedia, the free encyclopedia


RateTheRef.net

The website RateTheRef.net seems to have been usurped by a Thai gambling site. I don't know how many pages this affects, or whether the old content has been archived, but I figured someone ought to be told. DavidKVT (talk) 21:21, 18 March 2024 (UTC)

 Done User:DavidKVT: Thank you. Added to the JUDI list for a batch job later: Special:Diff/1207703597/1214769148 -- GreenC 01:26, 21 March 2024 (UTC)

symetratour.com

Hello. The Symetra Tour has been renamed to The Epson Tour. Their links have been subsequently moved. Here is the new format:

Some links cannot be converted, such as [1] this link, because the event is no longer held. Other links, like this one, need the word symetra changed to epson in order to work, like this to that. I fixed some already. There are currently 91 links under http and 95 under https to fix. Thanks! MrLinkinPark333 (talk) 01:42, 12 May 2024 (UTC)

116 pages. -- GreenC 20:42, 3 June 2024 (UTC)
 Done - Checked 116 pages and edited 116 pages. Moved 109 links to a new URL. Switched 1 |url-status=dead to live. Added 78 archive URLs (77 Wayback).
-- GreenC 22:42, 3 June 2024 (UTC)

britannica.co.kr

This was brought to my attention through Special:Diff/1224405115. The following hostnames should be marked as dead and set to the archived URLs, given that they no longer serve any content and are either redirected to the company's corporate site or simply dead:

  • *.britannica.co.kr

– robertsky (talk) 10:56, 18 May 2024 (UTC)

53 pages. -- GreenC 20:44, 3 June 2024 (UTC)
 Done - Checked 53 pages and edited 20 pages. Added 4 {{dead link}}. Added 16 archive URLs (8 Wayback). -- GreenC 00:46, 4 June 2024 (UTC)

cinestaan.com

It looks like the site is dead, as I cannot find it on Google search, and an article returns error 503. Check this out too. Kailash29792 (talk) 11:10, 23 March 2024 (UTC)

2,243 pages. Offline since December 2023. I can do this. -- GreenC 14:16, 23 March 2024 (UTC)
 Done Checked 2,243 pages. Edited 2,206 pages. Added 2,371 archive URLs all WaybackMachine. Added 312 {{dead link}} tags. Added 255 |url-status=dead for existing archive URLs previously set live. Updated IABot database so changes will propagate to 300+ other wiki language sites. -- GreenC 16:54, 24 March 2024 (UTC)

Bumping thread. -- GreenC 19:27, 4 June 2024 (UTC)

nfl.com

Hello. I found that URLs under the http://www.nfl.com/news/story/ format are either broken or redirect to a new URL:

  • URLs with only numbers are broken, and might have an archived copy.
  • URLs with a string of numbers and letters might redirect to the new URL. This redirect works.
  • Some URLs with a number/letters string don't work and need converting with the article name in the URL: This URL should go here
  • Some URLs with numbers/letters and article name might redirect to new URLs: This is now here.

9000+ links under http and 100+ links under https. Thanks! MrLinkinPark333 (talk) 20:22, 26 May 2024 (UTC)

3,400 pages. -- GreenC 20:46, 3 June 2024 (UTC)
 Done - Checked 3,402 pages and edited 3,236 pages. Moved 6,863 links to a new URL. Added 32 {{dead link}}. Switched 72 |url-status=dead to live. Switched 241 |url-status=live to dead. Added 1,043 archive URLs (1,007 Wayback). Changed 888 citation metadata fields. -- GreenC 18:32, 4 June 2024 (UTC)

deccanchronicle.com

Deccan Chronicle: Many 2010s articles like this are dead. Kailash29792 (talk) 05:02, 2 June 2024 (UTC)

8,000 pages -- GreenC 20:51, 3 June 2024 (UTC)
 Done - Checked 8,059 pages and edited 3,532 pages. Moved 3,217 links to a new URL. Added 81 {{dead link}}. Switched 334 |url-status=dead to live. Switched 208 |url-status=live to dead. Added 742 archive URLs (694 Wayback). Changed 1,018 citation metadata fields. -- GreenC 22:35, 5 June 2024 (UTC)

cnlbr.org

Old path of "www.cnlbr.org/Portals/.../pagename" moved to "irp.cdn-website.com/33d0c3d0/files/uploaded/pagename"

-- BX (talk) 20:58, 2 June 2024 (UTC)

BX, can you clarify? For example, what does the old URL http://www.cnlbr.org/Portals/0/Hero/Herbert_Rap_Dixon.pdf go to? -- GreenC 21:00, 3 June 2024 (UTC)
@GreenC: The old path after "Portals/" varied, however the new path has no variables. So for your example, the new path is https://irp.cdn-website.com/33d0c3d0/files/uploaded/Herbert_Rap_Dixon.pdf It's basically just cutting the last "pagename" from the old path and pasting it onto the new prefix, if that makes sense. Rgdrs. --BX (talk) 04:03, 4 June 2024 (UTC)
Got it, didn't realize "33d0c3d0" is a static string. 138 pages. -- GreenC 19:13, 4 June 2024 (UTC)
User:BX: There were edge cases in about 30 URLs. Some needed "%20%20" converted to "%20", and some needed ".pdf" changed to "-2020.pdf". After those changes, I was able to convert all of them to live links. I also made metadata changes, e.g. changing |publisher=cnlbr.org to |publisher=Center for Negro League Baseball Research, because citations are supposed to use names rather than domains. For anything that was previously marked dead and had an archive URL, I changed the primary URL to the live version, set |url-status=live, and kept the original archive URL. -- GreenC 01:37, 7 June 2024 (UTC)
 Done Checked 140 pages and edited 140 pages. Moved 321 links to a new URL. Switched 17 |url-status=dead to live. Changed 185 citation metadata fields. -- GreenC 01:37, 7 June 2024 (UTC)
Wow, thanks User:GreenC. The work you and your bot do is invaluable to keeping this place working. Thanks again! Rgrds. --BX (talk) 04:04, 7 June 2024 (UTC)
Thank you! Your appreciation helps to keep this going. -- GreenC 14:59, 7 June 2024 (UTC)
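
The cnlbr.org path move described in this thread can be sketched in Python as below. This is only an illustration under stated assumptions, not the bot's actual code: the function name `convert_cnlbr` is hypothetical, the only edge case handled is the "%20%20" one mentioned above, and the ".pdf" to "-2020.pdf" cases are left out because the thread does not say which files need them.

```python
from urllib.parse import urlsplit

# Static prefix confirmed in the thread ("33d0c3d0" does not vary).
NEW_PREFIX = "https://irp.cdn-website.com/33d0c3d0/files/uploaded/"


def convert_cnlbr(old_url: str) -> str:
    """Map an old cnlbr.org "Portals" URL onto the new static prefix.

    Hypothetical helper: URLs that do not use the old scheme are
    returned unchanged.
    """
    path = urlsplit(old_url).path
    if "/Portals/" not in path:
        return old_url
    # Keep only the last path segment (the "pagename").
    pagename = path.rsplit("/", 1)[-1]
    # Edge case from the thread: "%20%20" becomes "%20".
    pagename = pagename.replace("%20%20", "%20")
    return NEW_PREFIX + pagename
```

Since some pages were not migrated one-to-one, a result from a helper like this would still need a live check before being written into a citation.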

google.com/hostednews

Soft-404s and 404s. 5,300 pages. -- GreenC 20:35, 3 June 2024 (UTC)

 Done - Checked 5,322 pages and edited 4,351 pages. Converted 1 templates. Removed 2 {{dead link}} templates. Added 1,739 {{dead link}}. Switched 707 |url-status=live to dead. Added 3,633 archive URLs (2,179 Wayback). Changed 176 citation metadata fields. -- GreenC 15:42, 7 June 2024 (UTC)

ECI - Election Commission of India

The ECI has changed links for a lot of election results on their site. e.g. [2] to [3]. -MPGuy2824 (talk) 11:43, 14 May 2024 (UTC)

4,700 pages -- GreenC 20:40, 3 June 2024 (UTC)
Question? User:MPGuy2824: The "old." links are not working, e.g. https://old.eci.gov.in/assembly-election/ae-2021-tamilnadu/, although they were previously. It exists at Wayback [4], so hopefully this is a temporary outage. I'll recheck in a week, or ping me if you see it change before then. -- GreenC 23:01, 3 June 2024 (UTC)
It looks like geofencing, as the link works for me (in India). Let's wait a week as you suggest. -MPGuy2824 (talk) 05:49, 4 June 2024 (UTC)
There is a new BRFA at Wikipedia:Bots/Requests for approval/BaranBOT 2. – DreamRimmer (talk) 13:00, 8 June 2024 (UTC)

cinestaan.com makelive

Mysteriously the site is back and working, per this. Maybe the dead links can be reassessed? Kailash29792 (talk) 04:14, 4 June 2024 (UTC)

Previous: Wikipedia:Link_rot/URL_change_requests#cinestaan.com -- GreenC 19:28, 4 June 2024 (UTC)
I changed the domain status from "Permadead" to "Permalive" in iabot.org --- for the moment the bot won't convert links to dead automatically. For Enwiki, Medic has a "makelive" function which I could apply to any link responding with status 200. -- GreenC 19:37, 4 June 2024 (UTC)
It checked every link and converted any that returned status 200 to a live link. -- GreenC 01:37, 8 June 2024 (UTC)

 Done - Checked 2,242 pages and edited 2,033 pages. Moved 2,360 links to a new URL. Removed 153 {{dead link}} templates. Added 29 {{dead link}}. Switched 1,943 |url-status=dead to live. Added 6 archive URLs (6 Wayback). Changed 65 citation metadata fields.
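
The "makelive" idea described above (flip a citation back to live when the link answers with status 200) might look roughly like the following Python sketch. It is not Medic's implementation, which is not shown in the thread; `make_live` is a hypothetical name, and the HTTP status is passed in as an argument so the example stays offline.

```python
import re

# Matches the |url-status= parameter only when it is currently "dead".
DEAD_STATUS = re.compile(r"(\|\s*url-status\s*=\s*)dead\b")


def make_live(citation: str, http_status: int) -> str:
    """Flip |url-status=dead to live, but only for a 200 response.

    Hypothetical helper: `citation` is the wikitext of one citation
    template and `http_status` is the status code already obtained
    for its primary URL.
    """
    if http_status != 200:
        return citation
    return DEAD_STATUS.sub(r"\g<1>live", citation)
```

A status of 200 alone does not rule out a soft-404, so a real run would presumably combine this with the other checks the bot performs.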

google.com/patents

2,700 pages. -- GreenC 20:51, 7 June 2024 (UTC)

The way GreenC bot is handling these by replacing them with half-broken archive.today links is problematic. The bot's activity on this should be paused, the changes made so far should be reverted, and someone should write a bot/script which properly fixes the URLs to current working versions. –jacobolus (t) 16:49, 8 June 2024 (UTC)
My reply is here: Special:Diff/1227943344/1227944480. I agree that it's a good idea to switch archived URLs to live URLs, and my bot can do that. But I need to know what the live URL is, and you're not providing information on how to figure that out. Currently, the bot is repairing a completely broken, non-functioning URL with an archive URL. I understand the archive URL is incomplete, but it is at least better than a completely dead URL. If there is a way to determine the live URL, I can replace the archive URL with the live URL. -- GreenC 17:01, 8 June 2024 (UTC)
Looks like the Patent ID is in the title of the archive.today page, e.g. for [5]: <title>Patent US417831 - ARTISTS EASEL - Google Patents</title>, from which https://patents.google.com/patent/US417831A can be generated, although I am unclear how to determine the "A" suffix. -- GreenC 17:50, 8 June 2024 (UTC)
Early on Google made up a new identifier for every patent. More recently they have sensibly figured out how to use the patent number itself. I think the A is optional; it's just the form of URL that turned up when I did a search for a couple of these specific patents. You can see how https://patents.google.com/patent/US640792A and https://patents.google.com/patent/US640792 give the same result. –jacobolus (t) 21:48, 8 June 2024 (UTC)
The archive roll back is done. Edited 731 articles and 1,468 citations. Example: Special:Diff/1227937185/1228131980 and Special:Diff/1227937170/1228131968. There are 64 links with no patent number; the list is available at Wikipedia:Link rot/Cases/Googlepatents in case you or anyone else wants to research them. Optionally, update that page with the patent numbers and I'll update the wiki via bot. -- GreenC 16:54, 9 June 2024 (UTC)
The linkrot ones can probably be figured out by scraping a wayback page. E.g. the first one is here, from which we can find patent number 2612994, so the current google patent link would be https://patents.google.com/patent/US2612994. –jacobolus (t) 17:00, 9 June 2024 (UTC)
 Done - Checked 2,702 pages and edited 2,502 pages. Moved 2,891 links to a new URL. Removed 1 {{dead link}} templates. Added 256 {{dead link}}. Switched 3 |url-status=dead to live. Switched 11 |url-status=live to dead. Added 1,539 archive URLs (222 Wayback). Changed 2 citation metadata fields. (NOTE: these stats are outdated due to the archive roll back in a later pass, which removed 1,468 archive URLs) -- GreenC 16:54, 9 June 2024 (UTC)
Thanks! –jacobolus (t) 16:58, 9 June 2024 (UTC)
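
The title-based recovery discussed in this thread can be sketched in Python as follows. This is an illustration only: `patent_url_from_title` is a hypothetical name, the regular expression covers just the simple "two letters plus digits" form seen in the example title, and the "A" suffix is omitted because it was reported above to be optional.

```python
import re
from typing import Optional

# Simple form seen in the thread's example title: "Patent US417831 - ...".
TITLE_RE = re.compile(r"Patent\s+([A-Z]{2}\d+)\b")


def patent_url_from_title(title: str) -> Optional[str]:
    """Build a patents.google.com URL from an archived page title.

    Hypothetical helper: returns None when no patent number in the
    assumed format is found, so the link can be queued for manual
    research instead.
    """
    match = TITLE_RE.search(title)
    if match is None:
        return None
    return "https://patents.google.com/patent/" + match.group(1)
```

Titles that carry other number formats would fall through to None here and end up on a manual list such as the one mentioned above.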

google.com/culturalinstitute

666 pages -- GreenC 20:56, 7 June 2024 (UTC)

 Done - Checked 675 pages and edited 670 pages. Moved 489 links to a new URL. Added 63 {{dead link}}. Switched 1 |url-status=dead to live. Switched 3 |url-status=live to dead. Added 166 archive URLs (163 Wayback). -- GreenC 00:50, 10 June 2024 (UTC)

google.com/finance

562 pages -- GreenC 20:59, 7 June 2024 (UTC)

 Done - Checked 562 pages and edited 513 pages. Converted 1 templates. Moved 365 links to a new URL. Added 40 {{dead link}}. Switched 7 |url-status=dead to live. Switched 6 |url-status=live to dead. Added 150 archive URLs (127 Wayback). Changed 1 citation metadata fields. -- GreenC 04:01, 10 June 2024 (UTC)

google.com/doodles

900 pages -- GreenC 21:40, 7 June 2024 (UTC)

 Done - Checked 907 pages and edited 904 pages. Moved 960 links to a new URL. Added 4 {{dead link}}. Switched 9 |url-status=dead to live. Added 21 archive URLs (10 Wayback). -- GreenC 04:20, 11 June 2024 (UTC)

chennaionline.com

The site is working properly per this, but there are still many pre-2020 dead links like this. Kailash29792 (talk) 09:52, 12 June 2024 (UTC)

893 pages -- GreenC 16:06, 13 June 2024 (UTC)
 Done - Checked 893 pages and edited 147 pages. Moved 8 links to a new URL. Added 19 {{dead link}}. Switched 49 |url-status=live to dead. Added 80 archive URLs (70 Wayback). Changed 24 citation metadata fields. -- GreenC 18:30, 13 June 2024 (UTC)

angelfire.com

This book, Hook, James; Franck, Dave; Austin, Steve (1982). An Aid to Collecting Selected Council Shoulder Patches with Valuation. has within it a link. Yes, I know it's from angelfire. Can I ask that: www.angelfire.com/tx6/patch/cspbook.html be swapped with scouttrader.org/csiguidebook.shtml? Thank you. --evrik (talk) 14:29, 13 June 2024 (UTC)

User:Evrik - if you don't mind, I'll use this request to process the entire angelfire.com domain which needs to be done anyway, checking for link rot. It will include a trap for https://www.angelfire.com/tx6/patch/cspbook.html to replace with https://scouttrader.org/csiguidebook.shtml .. it is in about 90 pages.
4,794 pages -- GreenC 15:59, 13 June 2024 (UTC)
It may be http and not https --evrik (talk) 16:14, 13 June 2024 (UTC)
You know, the citation is for a book authored by Hook, Franck and Austin (1982). The source link is for a book published by Ellis, Jones and Austin (2003). The replacement link is for a book published by Austin and Keasey (2013). There are many editions and authors, maybe more. If someone is citing page 52 of the 1982 edition and we change the link to the 2013 edition, the page number will be wrong. I think this needs to be done with more care, or with a consensus discussion. I don't want to be in the position of being asked to undo the changes, which is time consuming. -- GreenC 01:50, 15 June 2024 (UTC)
 Done - Checked 4,890 pages and edited 4,317 pages. Moved 4,750 links to a new URL. Removed 1 {{dead link}} templates. Added 58 {{dead link}}. Switched 109 |url-status=dead to live. Switched 13 |url-status=live to dead. Added 379 archive URLs (339 Wayback). Changed 521 citation metadata fields. -- GreenC 01:50, 15 June 2024 (UTC)

mhc-macris.net

I didn't check very many of the pages here (first 2 out of 3,685) [6], but the current links are dead, and changing "Details.aspx?" to "details?" fixes them (with them redirecting to a version with a lower case ID). Another improvement is using https instead of http.

For instance, in College of the Holy Cross, citation 28's URL is currently "http://mhc-macris.net/Details.aspx?MhcId=WOR.K", which just redirects to the mhc-macris home page. If it's changed to "https://mhc-macris.net/details?MhcId=WOR.K", it redirects to "https://mhc-macris.net/details?mhcid=wor.k", which has the desired content.

There is another handful of URLs that need to be changed sprinkled amongst this search, but some of them are archive links. GrapesRock (talk) 20:52, 19 June 2024 (UTC)

OK. I'll check every link, in case there are any other soft-404 issues. 3,836 pages. thanks. -- GreenC 23:54, 19 June 2024 (UTC)
 Question: GrapesRock, I have run into a problem. The site is using a bot blocker system that I don't recognize, and I have tried various methods to get around it without success. The only way is a "blind move", i.e. changing the URL without verifying that the new URL exists and/or works. This is potentially dangerous because sites frequently do not migrate every URL to the new scheme. Another method is to treat every URL containing "Details.aspx" as a dead link and add an archive URL. It depends on how reliable the archive.org links are (they may have the same problem saving pages due to the bot blocker) vs. how consistent the site was in migrating to the new scheme. If you want to manually spot-check to see which of these methods looks better, that would be helpful in deciding which course to take. I also emailed the site admins on the off chance they are willing to temporarily whitelist my IP. -- GreenC 17:48, 20 June 2024 (UTC)
I looked at the top 10 pages with Details.aspx in them, and in every one where I changed it to "details", it redirected to the correct page.
As for the archive, all the archives I've seen from before July 7, 2022 are okay, but on that date and after it is inconsistently marked.
So, if the only archive that exists is from July 7, 2022 or later, it probably shouldn't be used, but it should be safe to add any archive links from before then.
This attributes section seems to indicate that the mhcid does not change, since it's uniquely assigned. Also, the MACRIS home page says "Each historic property or area in the MACRIS database will have an MHC ID assigned to it. The MHC ID in Search Results is linked to a Details screen". This makes me think it's safe to change the URL as long as it's of the format "mhc-macris.net/Details.aspx?MhcId=[THE ID]", since the ID will still be in the MACRIS database and will be linked to a details screen. GrapesRock (talk) 18:41, 20 June 2024 (UTC)
Alright, I have it programmed for blind search-replace. I'll wait till Friday and see if they respond about the IP. In the meantime I can start on the other one below. -- GreenC 22:26, 20 June 2024 (UTC)
        
         newurl = "https://mhc-macris.net/Details.aspx?MhcId=WOR.K"
         if newurl ~ "mhc-macris[.]net/Details[.]aspx[?]MhcId[=]":
           subs("Details.aspx?", "details?", newurl)
           if match(newurl, "(?i)[?]MhcId=[^$]*[^$]*", d) > 0:
             subs(d, tolowerAscii(d), newurl)
         newurl == "https://mhc-macris.net/details?mhcid=wor.k"
 Done - Checked 3,720 pages and edited 3,716 pages. Moved 4,419 links to a new URL. Removed 36 {{dead link}} templates. Switched 39 |url-status=dead to live. Added 3 archive URLs (2 Wayback). Changed 22 citation metadata fields. -- GreenC 16:36, 22 June 2024 (UTC)
@GrapesRock@GreenC FWIW, I created {{MACRIS}} a few years back after they changed their url scheme for the second time in a short period. That allows all MACRIS links using the template to be updated with a single edit to the template. I'm not sure if it's worth mass conversion, but wanted to make you both aware. Pi.1415926535 (talk) 20:14, 22 June 2024 (UTC)
Those templates are OK; the problem is they don't account for the percentage of links that were not migrated to the new URL scheme. A template assumes all or nothing, which in practice is rarely the case: some links get left behind and become dead URLs. By using standard citation templates, bots like this can check the links and add archives or {{dead link}} tags on a per-URL basis. Otherwise the bot would need to be specially programmed for the custom template, and there are thousands of custom templates, making it impractical. In this case, the site is bot protected so it really is all or nothing, so the template (for now) is not a problem. -- GreenC 04:40, 23 June 2024 (UTC)
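
For readers who do not use the bot's own helper functions, the blind search-replace for mhc-macris.net discussed in this thread can be restated in plain Python. This is a sketch, not the bot's code: `convert_macris` is a hypothetical name, and it assumes URLs of exactly the form "mhc-macris.net/Details.aspx?MhcId=[THE ID]" as described above, upgrading http to https as suggested in the request.

```python
import re

# Old scheme as described in the request; anything else is left alone.
OLD_RE = re.compile(
    r"^https?://mhc-macris\.net/Details\.aspx\?MhcId=(.+)$",
    re.IGNORECASE,
)


def convert_macris(old_url: str) -> str:
    """Rewrite an old MACRIS details URL to the new lower-case scheme.

    Hypothetical helper: this is a blind rewrite, it does not check
    that the new URL actually resolves.
    """
    match = OLD_RE.match(old_url)
    if match is None:
        return old_url
    return "https://mhc-macris.net/details?mhcid=" + match.group(1).lower()
```

Because the site blocks automated checks, a result from this kind of rewrite is only as reliable as the manual spot checks reported in the thread.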

atlantaintownpaper.com

This source has been moved to roughdraftatlanta.com. For instance in George Floyd protests, currently there is the url https://atlantaintownpaper.com/2020/05/mayor-police-chief-denounce-anarchists-and-terrorists-who-destroyed-city-curfew-begins-at-9-p-m/ and the source has been moved to https://roughdraftatlanta.com/2020/05/30/mayor-police-chief-denounce-anarchists-and-terrorists-who-destroyed-city-curfew-begins-at-9-p-m/ (and there's no redirect).

Both

Redirect to the proper site GrapesRock (talk) 15:22, 20 June 2024 (UTC)

OK. This is a "Soft-redirect", where a page exists at a new URL but a redirect is missing (versus a soft-404, where the redirect exists but goes to a wrong page). I can fix Soft-redirects, when there is foreknowledge like you helpfully discovered. It also has a "Redirect" element so Soft-redirect --> Redirect --> Destination. 78 pages. -- GreenC 16:28, 20 June 2024 (UTC)
 Done - Checked 78 pages and edited 75 pages. Moved 79 links to a new URL. Added 1 {{dead link}}. Switched 13 |url-status=dead to live. Added 3 archive URLs (3 Wayback). -- GreenC 02:38, 21 June 2024 (UTC)
  • Soft-redirect rule: subs("atlantaintownpaper.com", "roughdraftatlanta.com", newurl)

Ooh, cool! Thanks for the explanation on a piece of terminology, it's always fun to learn new words/concepts (and of course thanks for moving all the stuff). GrapesRock (talk) 14:03, 21 June 2024 (UTC)

I made a glossary at WP:LINKROT#Glossary of terminology; it can get complicated. -- GreenC 15:51, 22 June 2024 (UTC)
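
The soft-redirect rule for atlantaintownpaper.com quoted in this thread amounts to a host swap, which can be sketched in Python as below. The function name `soft_redirect` is hypothetical. Note that the sketch only swaps the host; as the thread appears to describe, the site's own redirect is then relied on to reach the final dated URL.

```python
OLD_HOST = "atlantaintownpaper.com"
NEW_HOST = "roughdraftatlanta.com"


def soft_redirect(url: str) -> str:
    """Swap the old host for the new one, leaving the path as-is.

    Hypothetical helper mirroring the one-line rule in the thread;
    only the first occurrence is replaced so a path that happens to
    contain the old host name is not altered.
    """
    return url.replace(OLD_HOST, NEW_HOST, 1)
```

A real run would still follow the resulting URL and confirm it lands on the article rather than on a generic page.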

clatl.com

Redirects to creativeloafing.com and soft-404s - 379 pages -- GreenC 14:38, 21 June 2024 (UTC)

 Done - Checked 379 pages and edited 365 pages. Moved 389 links to a new URL. Added 21 {{dead link}}. Switched 37 |url-status=dead to live. Switched 8 |url-status=live to dead. Added 108 archive URLs (84 Wayback). Changed 63 citation metadata fields. -- GreenC 15:47, 21 June 2024 (UTC)
  • Soft-404 rule by URL: If a redirect contains: (?i)(page[+]not[+]found|page%20not%20found)
  • Soft-404 rule by page title: If a page title contains: (?i)^[ ]*search([ ]*[|][ ]*Creative Loafing)?[ ]*$
  • Soft-404 rule by page content: If a page contains: Content is needed
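
The three clatl.com soft-404 rules listed above can be combined into one check, sketched here in Python. The regular expressions and the content marker are copied from the rules; the function name `is_soft_404` and its three parameters are assumptions made for illustration, and fetching the page is left out.

```python
import re

# Rule by URL: the redirect target mentions "page not found".
REDIRECT_RE = re.compile(r"(?i)(page[+]not[+]found|page%20not%20found)")
# Rule by page title: a bare search page, with or without the site name.
TITLE_RE = re.compile(r"(?i)^[ ]*search([ ]*[|][ ]*Creative Loafing)?[ ]*$")
# Rule by page content: placeholder text on an empty page.
CONTENT_MARKER = "Content is needed"


def is_soft_404(final_url: str, title: str, body: str) -> bool:
    """Return True if any of the three soft-404 rules matches.

    Hypothetical helper: `final_url` is the URL after redirects,
    `title` is the page's <title> text and `body` is the page text.
    """
    return bool(
        REDIRECT_RE.search(final_url)
        or TITLE_RE.search(title)
        or CONTENT_MARKER in body
    )
```

Any single rule is enough to flag the link, which matches the way the rules are listed as independent conditions.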

stat.kg

The URL of the National Statistical Committee of the Kyrgyz Republic changed from stat.kg to stat.gov.kg; everything else stayed the same. The links lead to 404, e.g. in Chaek. MarcelloIV (talk) 11:04, 24 June 2024 (UTC)

780 pages -- GreenC 01:35, 25 June 2024 (UTC)
 Done Checked 780 pages and edited 769 pages. Moved 800 links to a new URL. Added 1 {{dead link}}. Switched 7 |url-status=dead to live. Added 3 archive URLs (3 Wayback). Changed 19 citation metadata fields.