Wikipedia:Link rot/URL change requests/Archives/2020/February
This is an archive of past discussions on Wikipedia:Link rot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
kodak-worldREMOVETHIS.com now hosts malware
Kodak-worldREMOVETHIS.com was changed to www.officialkodakblack.com but URLs within the site do not necessarily map cleanly. The old URL now hosts malware (Signpost coverage, "Beware of malware", screen shot from Kaspersky).
Please change http://kodak-world.REMOVETHIScom/?page_id=24 (Biography of Kodak Black) to https://web.archive.org/web/20170103124913/http://kodak-world.com?page_id=24 and change the main URL where it appears by itself (such as in "Official web site" links) to www.officialkodakblack.com. Change any other uses to a non-recent/non-poisoned version on https://web.archive.org or a similar archive site, or on www.officialkodakblack.com if the page exists there, and flag the rest for manual handling.
I found only a few instances of this in a manual sweep of Kodak Black articles in 14 languages so this task may already be complete. ru:Kodak Black, uk:Kodak Black, and fr:Kodak Black are now clean. However, we do need to scan the entire project for other instances of the poisoned web site. Previous discussion which pointed me here is at Wikipedia:Village_pump_(technical)#Should we be checking for links to the Shlayer trojan horse?(permalink). davidwr/(talk)/(contribs) 15:13, 31 January 2020 (UTC)
- @Davidwr: It exists in one article. This page is for requests needing custom bot (programming) help, like 100s or 1000s of links. -- GreenC 16:19, 31 January 2020 (UTC)
- Thanks GreenC. I don't know how I missed the English version. In any case, is there an easy way to request that the entire wikimedia/wikipedia space, across all languages and projects, be scanned for this URL? More generally, is there an easy way to do a wikimedia/wikipedia-wide scan of URLs that are currently "toxic"? davidwr/(talk)/(contribs) 17:05, 31 January 2020 (UTC)
- As for scanning all 300+ language wikis, this Google search has some results, though it is missing Enwiki so may not be complete. It would be a good question for Village Pump Tech, as there might be a tool for searching across all languages. -- GreenC 21:56, 31 January 2020 (UTC)
U.S. Census Bureau domain factfinder.census.gov shutting down on 31 March 2020
The domain factfinder.census.gov will be taken offline on 31 March 2020.
As per https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml:
- Most data previously released on AFF are now being released on the U.S. Census Bureau's new dissemination platform, data.census.gov. For more information about the transition from American FactFinder to data.census.gov, see Transition From AFF. Included on this page are information on historic AFF data, documentation on updating AFF links, and resource materials, including tutorials, webinars, and how-tos on using data.census.gov. If you have questions or comments, please email: cedsci.feedback@census.gov.
There are over 4,600 Wikipedia articles directly referencing this domain, as well as several templates that reference the domain. However, there are over 40,000 Wikipedia articles that use these templates. — Preceding unsigned comment added by Fabrickator (talk • contribs)
- This is hugely important, but also hugely complex, plus most links are hidden inside custom templates that have to be parsed. There are two ways to approach it: 1) unwind the templates by converting them to {{cite web}}, treat them as dead links, and add an archive URL, or 2) find the corresponding new URL at data.census.gov. The problem with technique #1 is the FactFinder site uses web 2.0-type features that the Wayback Machine has trouble archiving, so it won't be much help. Archive.today does better, but most of the links are not saved there. #2 is the ideal solution, but mapping URLs between the old and new sites looks very complicated. There are two documents (ominously, two 20-page "deep linking guides"), one for the old site and one for the new - the trick is to learn how to map between them and write software that can do it. -- GreenC 20:47, 8 February 2020 (UTC)
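For the "treat as dead and archive" half of approach #1, a bot pass could first ask the Wayback Machine whether any snapshot exists for each FactFinder URL before deciding between |archive-url= and {{dead link}}. A minimal sketch using the public Wayback availability API (the helper name and example URL are illustrative, not the actual bot code):

import requests

def closest_snapshot(url, timestamp="20200101"):
    """Return the closest Wayback Machine snapshot URL for `url`, or None if unarchived."""
    api = "https://archive.org/wayback/available"
    r = requests.get(api, params={"url": url, "timestamp": timestamp}, timeout=30)
    r.raise_for_status()
    closest = r.json().get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

# Example: check one FactFinder page for an archived copy.
print(closest_snapshot("https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml"))

Given how poorly FactFinder's dynamic pages archive, many of these lookups will likely return nothing useful, which is what makes the data.census.gov mapping (approach #2) the preferable long-term fix.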
Discussion moved to WP:USCENSUS -- GreenC 03:31, 12 February 2020 (UTC)
- Second shortcut WP:USCENSUSLINKS created; USCENSUS is a confusing name for a shortcut. Will discuss on Wikipedia talk:US Census Migration shortly. davidwr/(talk)/(contribs) 19:16, 12 February 2020 (UTC)
rpc.ift.org.mx
Technical and legal authorizations from the Mexican Federal Telecommunications Institute's Registro Público de Concesiones (RPC) are cited in hundreds of articles about Mexican broadcasting. There are 1,290 citations from the domain rpc.ift.org.mx which hosts the PDF documents.
On January 31, 2020, the RPC changed to begin serving HTTPS only. In addition, a "v" was added to the path, so URLs that were formerly
http://rpc.ift.org.mx/rpc/pdfs/96255_181211120729_7489.pdf
changed to
https://rpc.ift.org.mx/vrpc/pdfs/96255_181211120729_7489.pdf
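The rewrite itself is mechanical; a minimal sketch of the transformation, assuming every old link follows the /rpc/ path pattern shown above (the function name is illustrative):

import re

def fix_rpc_url(url):
    """Rewrite http(s)://rpc.ift.org.mx/rpc/... to https://rpc.ift.org.mx/vrpc/..."""
    return re.sub(r"^https?://rpc\.ift\.org\.mx/rpc/",
                  "https://rpc.ift.org.mx/vrpc/", url)

print(fix_rpc_url("http://rpc.ift.org.mx/rpc/pdfs/96255_181211120729_7489.pdf"))
# -> https://rpc.ift.org.mx/vrpc/pdfs/96255_181211120729_7489.pdf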
This will particularly be needed for Mexican radio and TV articles, as well as the lists that use them on eswiki (such as es:Anexo:Estaciones de radio en el estado de Michoacán). I am doing some high-link-count articles, like Imagen Televisión, manually. Raymie (t • c) 02:13, 9 February 2020 (UTC)
- I've done the above, so we've gone from 1,290 links out to 560 that need repair. Raymie (t • c) 03:27, 9 February 2020 (UTC)
- @Raymie: done in 430 articles. -- GreenC 05:03, 12 February 2020 (UTC)
- Thank you GreenC for carrying out this continually important work for the project. Raymie (t • c) 06:01, 12 February 2020 (UTC)
Thank you, @Raymie:. Comments like that help keep me going. In case you want to pursue it further, there are 57 articles on eswiki with the links (listed). My bot doesn't have permissions there. Or we could make a bot request at [1], but I don't speak Spanish (well). -- GreenC 15:28, 12 February 2020 (UTC)
blackwell-synergy.com and gaylesbiantimes.com
These previously-reputable domains were semi-recently replaced with spam and other nasty content. Blackwell-synergy.com has already been marked as dead in IABot, but I do not believe gaylesbiantimes.com has been. Both need to have |url_status=usurped set, as they are not fit to be linked to. --AntiCompositeNumber (talk) 04:59, 13 November 2019 (UTC)
- Yup, usurped. Blackwell has a lot of links too. I've set GLT to Blacklisted in IABot for now until I can start on this project. -- GreenC 14:34, 13 November 2019 (UTC)
- 4453 globally at the moment, if you're curious. And that's after the global cleanup effort. --AntiCompositeNumber (talk) 15:39, 13 November 2019 (UTC)
- @AntiCompositeNumber: What do you suggest we do with the Blackwell links: 1. try to convert them to doi.org URLs, or 2. treat them as dead links, set to "usurped" and add an archive if available? Or Step 1 and, if that fails, Step 2? For Step 2, there is the possibility that no archive can be found and the link exists outside a CS1|2 template, in which case the bot would normally add a {{dead link}}, but the spam link would then still be clickable. There was talk about creating a new template called {{usurped}} where these free-floating usurped links could be embedded so they don't display, but nothing has happened. -- GreenC 16:46, 13 November 2019 (UTC)
- @GreenC: The best option is to convert Blackwell links in citation templates to |doi= and convert bare links to {{DOI}}. When that can't be done (say, because they're used in a labeled link, or because that would take a lot of development effort), doi.org links are the best option for an automated fix. If there's no valid DOI and no valid archive, tagging dead and moving on is the best option at the moment. Where we go from there would depend on how many are unfixable. If it's less than ~100, humans can review the links and take appropriate action. --AntiCompositeNumber (talk) 17:05, 13 November 2019 (UTC)
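Pulling the DOI straight out of the old URL should work for most links, because Blackwell embedded the DOI in the path. A minimal sketch of that extraction - the /doi/abs|full|pdf/ path layout is an assumption about how most blackwell-synergy.com links were structured, so real citations should be spot-checked:

import re

def blackwell_doi(url):
    """Try to pull a DOI out of a blackwell-synergy.com URL; return None if not found."""
    m = re.search(
        r"blackwell-synergy\.com/doi/(?:abs/|full/|pdf/|pdfplus/)?(10\.\d{4,9}/[^?#\s]+)",
        url)
    return m.group(1) if m else None

# Example (path layout assumed, not taken from a live citation):
print(blackwell_doi("http://www.blackwell-synergy.com/doi/abs/10.1111/j.1467-8330.1974.tb00606.x"))
# -> 10.1111/j.1467-8330.1974.tb00606.x

A citation with a recovered DOI can then drop the usurped |url= in favor of |doi=, and bare links can become {{DOI}} as suggested above.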
@AntiCompositeNumber: the bot ran for Blackwell and it basically eliminated the domain from mainspace, replacing the URL with |doi= or doi.org (examples: [2][3][4]). It can't detect {{doi}}, so there are a few duplicates ([5]), and in a few cases cite templates ended up with both a doi.org URL and |doi=. It edited about 550 pages. The spam filters won't allow addition of new archive URLs, and for one reason or another the bot couldn't handle everything; these remaining pages have a Blackwell domain that needs manual attention:
I'll take a look at GLT next. -- GreenC 16:41, 23 November 2019 (UTC)
- @GreenC: Thanks. I've manually fixed those articles. --AntiCompositeNumber (talk) 21:04, 8 December 2019 (UTC)
@AntiCompositeNumber: - GayLesbianTimes.com is only in 76 mainspace articles, so I set them manually - either with |url-status=usurped or, for square and bare links that have a {{webarchive}}, by moving the archive URL into the square/bare link (example). Those without an archive URL had to be deleted and replaced with a non-URL citation. There are still links in non-mainspace; maybe they should just be blanked with a quick search-and-replace script unless someone wants to fix them manually. It's not possible to add new archive URLs because of a blacklist filter. -- GreenC 01:38, 14 February 2020 (UTC)
comicbookdb.com shutting down on 16 December 2019
Web site comicbookdb.com has announced that it is shutting down as of 16 December 2019.
English-language Wikipedia has about 4,500 articles which include links to comicbookdb.com (mostly using the "comicbookdb" template).
- Most of these pages appear to be available on the Wayback machine.
- Some pages on comicbookdb.com are restricted, meaning you have to get a login (which is currently easy to do), but these pages will not be available on the Wayback Machine.
- The web site has a number of directory pages (archive of home page at http://web.archive.org/web/20191119005613/http://comicbookdb.com/), such as the list of creators at http://comicbookdb.com/browse.php?search=Creator, but the content of these archived pages does not seem to ever render (e.g. http://web.archive.org/web/20170322201042/http://comicbookdb.com/browse.php?search=Creator).
Fabrickator (talk) 17:53, 20 November 2019 (UTC)
- Looks like someone added a "one-size-fits-all" archive to the template. Hard to know how many actually fit; better than nothing. Ideally a bot would convert the templates to {{cite web}} with |archive-url= so the bots can search for custom-fit archives on a per-link basis. -- GreenC 04:43, 14 February 2020 (UTC)
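Such a conversion could be done with a wikitext parser rather than raw regexes. A rough sketch with mwparserfromhell - the {{comicbookdb}} parameter names (type=, id=, title=) and the reconstructed comicbookdb.com URL layout used here are hypothetical and would need to be checked against the real template before any bot run:

import mwparserfromhell as mwp

def convert_comicbookdb(wikitext):
    """Replace {{comicbookdb}} calls with {{cite web}} stubs so archive bots
    can later add a per-link |archive-url=.  Parameter names are hypothetical."""
    code = mwp.parse(wikitext)
    for tpl in code.filter_templates(
            matches=lambda t: str(t.name).strip().lower() == "comicbookdb"):
        dbtype = str(tpl.get("type").value).strip() if tpl.has("type") else "title"
        dbid = str(tpl.get("id").value).strip() if tpl.has("id") else ""
        title = str(tpl.get("title").value).strip() if tpl.has("title") else dbid
        url = "http://comicbookdb.com/%s.php?ID=%s" % (dbtype, dbid)  # assumed URL layout
        cite = ("{{cite web |url=" + url + " |title=" + title +
                " |website=Comic Book DB |url-status=dead}}")
        code.replace(tpl, cite)
    return str(code)

Once the links are in {{cite web}} form, the usual archive bots can look for a snapshot of each individual URL instead of relying on the one-size-fits-all archive in the template.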
springerlink.com
Since a month or two ago, springerlink.com has stopped working. Now all 3500 links from articles are a 404 like this, served by a supposed "UltraDNS client redirection service" with "Copyright © 2001-2008 NeuStar".
The good news is that a request to the Internet Archive can reveal the current location, for instance [6] redirects to [7] (and then [8], which can be ignored). Because the new URLs contain the DOI, they can then be translated into a more permanent doi.org URL. Nemo 08:17, 6 February 2020 (UTC)
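The second half of that translation - pulling the DOI out of the new link.springer.com location and building the doi.org URL - is straightforward. A minimal sketch (the path prefixes matched here are assumptions based on the examples in this thread, not a complete list):

import re

def springer_to_doi(new_url):
    """Extract the DOI embedded in a link.springer.com URL and return the
    equivalent https://doi.org/ URL, or None if no DOI is found."""
    m = re.search(
        r"link\.springer\.com/(?:article|chapter|book|referenceworkentry)/(10\.\d{4,9}/[^?#\s]+)",
        new_url)
    return "https://doi.org/" + m.group(1) if m else None

print(springer_to_doi("https://link.springer.com/article/10.1007/s12132-009-9048-y"))
# -> https://doi.org/10.1007/s12132-009-9048-y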
- Worth a shot to see what archive.org returns; if something comes back, make the change. The hardest part will be "Springer <whatever>" text that can appear in the title, work, and publisher fields, and in square brackets or free-floating text inside/outside a ref. Will start in on this next. -- GreenC 05:13, 12 February 2020 (UTC)
- Nemo following the example the URL is https://doi.org/10.1007%2Fs12132-009-9048-y which redirects to link.springer.com .. it looks like they replaced springerlink.com with link.springer.com .. I'll leave the metadata stuff alone since it ends up at Springer anyway, just replace the springerlink.com URLs to doi.org where possible. -- GreenC 15:47, 12 February 2020 (UTC)
- Yes, changing the URL should be enough. One could replace springerlink.com + whatever with link.springer.com + DOI, but while we're at it better use the doi.org resolver so we don't have to do this again in 5 or 10 years from now. Nemo 19:25, 12 February 2020 (UTC)
- OK, after some testing, it seems adding a doi.org URL is redundant when an existing |doi= has the same DOI, so in those cases the net effect will be deletion of the |url= field (or |chapter-url= or wherever it appears). -- GreenC 21:10, 12 February 2020 (UTC)
- That's fine! Citation bot can then easily finish the job. (Let me know if you're interested in running it yourself on those pages and you can use tips on how to do so.) Nemo 21:24, 12 February 2020 (UTC)
- Done (I hope). Saved about 4,071 links. This includes deletions when the |doi= already exists, another 1,000 archive URL additions when no DOI could be found, archive URL removals when a doi.org could be found, and [dead link] tags added when no archive or DOI was discovered. Operations covered CS1|2 templates, square and bare links, in Mainspace, File:, Wikipedia: and Template:. -- GreenC 21:25, 13 February 2020 (UTC)
- Thank you! I think it might also be worth stripping the archive.today rewrites that I see surviving, for instance https://archive.today/20130202224654/http://www.springerlink.com/content/q134n458307w0125, which could be more usefully updated. Some annoying variants like [9] [10] survive: is that because they're not in templates? In templates, the springerlink.com URL may be removed if a DOI is present (but this part could be handled by Citation bot if your bot can't). A few hundred links for ISSN and ISBN codes, like [11] or [12], are less than useful too. Nemo 09:38, 14 February 2020 (UTC)
- Yes, if a link is not in a template then the bot doesn't have much option but to archive it, because the other option is to delete the URL, and that can't be done safely since it could create smoking craters. The "Minskey moment" diff looks like an oversight in the code, but you are right that Citation bot should pick those up in time. The ISSN and ISBN links are hard to judge without seeing them in context, so I can't say why they were kept. -- GreenC 04:30, 18 February 2020 (UTC)
- The ISSN are usually ancient batch additions which serve no purpose whatsoever because there's usually another link to the current homepage, plus there's always a link via ISSN or (for articles) other identifiers. Some were links to an RSS function which no longer exists. I've removed them now (some remain in Wikidata, hopefully will be taken care of). Nemo 08:06, 18 February 2020 (UTC)
Request for change of (soon to be) broken links to LPSN
(thread moved from WP:BOTREQ by GreenC)
The old LPSN website at http://www.bacterio.net is frequently linked to from Wikipedia. Many of these links target LPSN entries for species. Because all species belong to a genus, and because LPSN uses one HTML page per genus name, links to LPSN species names are links to anchors within the LPSN page for the corresponding genus name. For instance, on https://en.wikipedia.org/wiki/Acetobacter_aceti we find the link http://www.bacterio.net/acetobacter.html#aceti to the old LPSN page.
As part of an agreement between the old LPSN maintainer, Aidan C. Parte, and the Leibniz Institute DSMZ, LPSN has been taken over by DSMZ to ensure long-term maintenance (see also announcement here). In the course of this takeover, a new website was created. In contrast to the old LPSN website, the new LPSN website at https://lpsn.dsmz.de (currently https://lpsn-dev.dsmz.de) uses individual pages for species names. We will employ the following mapping:
(1) the domain http://www.bacterio.net is permanently redirected to https://lpsn.dsmz.de;
(2) the page address acetobacter.html is mapped to genus/acetobacter, which is the page for the genus Acetobacter on the new LPSN website.
This means, however, that http://www.bacterio.net/acetobacter.html#aceti is mapped to https://lpsn.dsmz.de/genus/acetobacter and not to https://lpsn.dsmz.de/species/acetobacter-aceti, which is the page for the species on the new LPSN website, as it should be. The reason for this limitation is that the anchor aceti is not even transferred by the browser and thus cannot be processed by the website. While https://lpsn.dsmz.de/genus/acetobacter does contain links leading to https://lpsn.dsmz.de/species/acetobacter-aceti, it would be more convenient for the user if http://www.bacterio.net/acetobacter.html#aceti were rewritten to a link that leads directly to https://lpsn.dsmz.de/species/acetobacter-aceti.
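In code, the requested rewrite might look like this minimal sketch (the function name is illustrative; note that not every bacterio.net page is a genus page, so any generated URL should be verified with a header check before the link is replaced):

import re

def lpsn_new_url(old_url):
    """Map an old bacterio.net link to the new LPSN site:
       genus.html          -> https://lpsn.dsmz.de/genus/<genus>
       genus.html#species  -> https://lpsn.dsmz.de/species/<genus>-<species>"""
    m = re.search(r"bacterio\.net/([a-z0-9-]+)\.html(?:#([a-z0-9-]+))?", old_url.lower())
    if not m:
        return None
    genus, species = m.group(1), m.group(2)
    if species:
        return "https://lpsn.dsmz.de/species/%s-%s" % (genus, species)
    return "https://lpsn.dsmz.de/genus/%s" % genus

print(lpsn_new_url("http://www.bacterio.net/acetobacter.html#aceti"))
# -> https://lpsn.dsmz.de/species/acetobacter-aceti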
As LPSN URLs are stored in Wikidata (LPSN), this change should be a doable task with the help of a bot. Therefore we are kindly asking for help to modify all Wikipedia links to LPSN species pages accordingly, as described above. Tobias1984: you did a great job in the past, helping us with BacDive: is there a chance that you could help us again with this issue? --L.C.Reimer
@L.C.Reimer: I can help with this but wanted to get the request moved to the right place. -- GreenC 03:27, 14 February 2020 (UTC)
- L.C.Reimer -- When would https://lpsn.dsmz.de be ready for the change? Seeing about 13,000 links. -- GreenC 04:18, 14 February 2020 (UTC)
@GreenC: We would appreciate your help very much. We will launch the new site and activate the redirect beginning next week. I will give here a note, when it is done.--L.C.Reimer
- This is a very useful and thoughtful request for URL update, but I'd like to note that it ought to be possible for the target website to redirect the requests based on the fragment, if you use JavaScript. MediaWiki for instance rewrites some of its URLs when you're redirected. Nemo 09:43, 14 February 2020 (UTC)
- Nemo, thank you for the hint. We just discussed this solution, but it would mean another redirect, and we already have two redirects; we believe this would negatively affect SEO. However, clean links are preferable, and I hope that with GreenC's help we are able to clean these up and maintain them. So, we have just launched the new site and the redirects are now active. This means we could start with the bot. @GreenC:: should we perhaps discuss the details directly?--L.C.Reimer
- L.C.Reimer, on closer look there are two types of links on Wikipedia. For example, in Yersinia aldovae there are two links to bacterio.net: one in the "External links" section, which is a normal URL directly in the page, and one in the bottom graphic labeled "Taxon identifiers", which is the template {{taxonbar}} that pulls the URL from Wikidata. I am able to fix the first type, but not the second. For Wikidata requests you could try [13]. The other problem is that my processes only update English Wikipedia (and Commons), and since there are about 300 language wikis it is a challenge to make Wikipedia-wide changes: each language wiki is its own organization where permissions and tools customized for that language are needed, e.g. ar.wikipedia.org requires tools customized for the Arabic language and permission from the Arabic community to make these changes with a bot. I would suggest, if you are able, creating and maintaining redirects. Nevertheless, if you would like to convert the in-wiki links on Enwiki, I can do that. -- GreenC 23:23, 18 February 2020 (UTC)
- On Enwiki, there are 6,487 links in 6,386 articles that might be converted. The rest are imported from Wikidata via templates like {{taxonbar}}. -- GreenC 00:53, 19 February 2020 (UTC)
- GreenC Thank you for the explanations. We would be happy if you could convert the links on Enwiki. We will deal with the links in Wikidata separately, as we want to make sure we have clean URLs for future entries anyway. Regarding all the other language wikis, we will take a closer look at what we can do.--L.C.Reimer
L.C.Reimer, a couple new issues.
- 1. In this list, there are some links that 404: http://www.bacterio.net/a/acetoanaerobium.html has an extra "/a/" in the path (there are also "/m/" and other letters), and some links have a leading "-" like http://www.bacterio.net/-number.html. I guess for now the bot will verify the new URL is working with a header check before making the change, and otherwise leave the link as-is; these look like low-volume exceptions.
- For "/a/" it seems that simply removing it works; so
http://www.bacterio.net/a/acetoanaerobium.html
-->http://www.bacterio.net/acetoanaerobium.html
-->https://lpsn.dsmz.de/genus/acetoanaerobium
. -- GreenC 20:01, 19 February 2020 (UTC)
- For "/a/" it seems that simply removing it works; so
- 2. There are links that redirect to an "/order/" page, for example http://www.bacterio.net/bacillales.html --> https://lpsn.dsmz.de/order/bacillales. The only way to determine this is by looking at the headers for http://www.bacterio.net/bacillales.html, which look like:
HTTP/1.1 301 Moved Permanently
Date: Wed, 19 Feb 2020 18:32:22 GMT
Server: Apache
Location: https://lpsn.dsmz.de/bacillales.html
Content-Length: 244
Content-Type: text/html; charset=iso-8859-1
Via: 1.1 varnish (Varnish/6.3), 1.1 varnish (Varnish/6.3)
X-Cache-Hits: 0
X-Cache: MISS
Age: 0
Connection: keep-alive

HTTP/1.1 301 Moved Permanently
Date: Wed, 19 Feb 2020 18:32:23 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1c mod_fcgid/2.3.9
X-Powered-By: PHP/7.3.5
Location: /order/bacillales
Content-Length: 0
Content-Type: text/html; charset=UTF-8

HTTP/1.1 200 OK
Date: Wed, 19 Feb 2020 18:32:23 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1c mod_fcgid/2.3.9
X-Powered-By: PHP/7.3.5
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
The second Location: line contains /order/bacillales, which is appended to the domain name found in the first Location: line. There are probably other paths besides /order/ we don't know about yet. -- GreenC 19:44, 19 February 2020 (UTC)
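Rather than parsing the raw headers by hand, the check can let an HTTP client follow the whole redirect chain and report the final landing URL and status code. A minimal sketch (the function name is illustrative):

import requests

def resolved_lpsn_url(old_url):
    """Follow the bacterio.net -> lpsn.dsmz.de redirect chain and return the
    final URL (the same information as the Location: headers above), or None
    if the chain does not end in a 200."""
    r = requests.head(old_url, allow_redirects=True, timeout=30)
    return r.url if r.status_code == 200 else None

print(resolved_lpsn_url("http://www.bacterio.net/bacillales.html"))
# -> https://lpsn.dsmz.de/order/bacillales (at the time of writing)

The path segment of the resolved URL (/genus/, /species/, /order/, etc.) then tells the bot what kind of page it landed on, and a non-200 result flags the link for the 404 list mentioned above.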
Results
@L.C.Reimer: The bot has completed. It converted 11,355 links in 5,718 articles (the previous link count of 6,487 is incorrect.) All links were tested as working (header status code 200). Some typical diffs:
It was unable to convert 1,240 links because the new URL doesn't work (header status 404). Can provide a list of those if you want, most of them appear to be related to Streptomyces. -- GreenC 02:29, 20 February 2020 (UTC)
www.bacterio.cict.fr
Found these: [18] -- GreenC 14:47, 20 February 2020 (UTC)
- It converted 371 links in 343 articles. Examples: [19][20]. It was unable to convert 260 links, a list of these available on request. -- GreenC 15:32, 20 February 2020 (UTC)