Wikipedia:Link rot/URL change requests/Archives/2023/September
This is an archive of past discussions on Wikipedia:Link rot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
Kailash29792
support this url Chaya20 (talk) 01:04, 1 September 2023 (UTC)
- /* Kailash29792 */ Reply Chaya20 (talk) 01:05, 1 September 2023 (UTC)
Many old links don't redirect to their new versions. Like this doesn't take us to this. Kailash29792 (talk) 09:28, 29 August 2023 (UTC)
- Is the correct new URL this? I found it by going to the Wayback Machine for the old link, and found an old redirect there going to this new link. What happens is the website creates a redirect at first, then fails to maintain it, and over time the old page turns into a 404. The Wayback Machine has a record of the deleted redirect, but retrieving it is another matter. I'm investigating. GreenC 15:28, 29 August 2023 (UTC)
- This retrieves the redirect via the Wayback Machine:
curl -ILs 'http://web.archive.org/web/2id_/https://www.filmcompanion.in/madhumati-dil-tadap-tadap-ke-keh-raha-hai-song-inspired-by-18-century' | awk '/^[ ]*[Ll]ocation:/{sub("^[ ]*[Ll]ocation:[ ]*https?://web[.]archive[.]org/web/[0-9]{14}id_/", "", $0); a[++i]=$0}END{print a[i]}'
- Output: https://www.filmcompanion.in/music/madhumatis-dil-tadap-tadap-ke-kah-raha-was-inspired-by-an-18th-century-song/
- -- GreenC 00:58, 30 August 2023 (UTC)
- Yes. Perhaps this was an erroneous title that the site later fixed, although Wayback had already archived it by then. Kailash29792 (talk) 08:30, 30 August 2023 (UTC)
User:Kailash29792: it has been processed. Example. Note in this example the source URL https://www.filmcompanion.in/mami-2018-soni-director-ivan-ayr-interview is a hard 404 with no redirect info. I was able to find an old now-deleted redirect in the Wayback Machine which pointed to a working page [1]. This is nifty and took a while to figure out. Now I know how to do it, and can reapply it to other domains in the future, that have deleted redirects. -- GreenC 15:59, 2 September 2023 (UTC)
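The one-liner quoted above can be wrapped in a small reusable function. This is a minimal sketch, not GreenC's actual tooling: the function names `last_original_location` and `wayback_redirect` are hypothetical, and it uses `sed -E` instead of awk so that the `{14}` interval works on awk builds without interval support. It also strips carriage returns, which HTTP header lines normally carry.

```shell
#!/bin/sh
# Read HTTP response headers on stdin (as printed by `curl -ILs`) and print the
# target of the last Location header, with any Wayback Machine prefix of the
# form https://web.archive.org/web/<14-digit timestamp>id_/ removed.
last_original_location() {
    tr -d '\r' |
    sed -n -E 's#^[[:space:]]*[Ll]ocation:[[:space:]]*##p' |
    sed -E 's#^https?://web\.archive\.org/web/[0-9]{14}id_/##' |
    tail -n 1
}

# Hypothetical wrapper: request the archived copy of URL $1 through the same
# web/2id_/ endpoint used in the thread, follow redirects, and print the last hop.
wayback_redirect() {
    curl -ILs "http://web.archive.org/web/2id_/$1" | last_original_location
}
```

Calling `wayback_redirect` with the old filmcompanion.in URL should behave like the command in the thread, assuming the archived redirect is still available. An empty result means no Location header was seen.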
Purging all mainspace links to fmg.ac/Projects/MedLands
Hi, I'd like to request that all mainspace links to fmg.ac/Projects/MedLands be removed, because consensus was reached it is an unreliable source, but it's impractical to manually remove all 1,300+ links. More explanation is at Wikipedia:Bot requests#Erasing all links to fmg.ac/Projects/MedLands from mainspace. Thanks! Nederlandse Leeuw (talk) 21:56, 2 June 2023 (UTC)
- User:Nederlandse Leeuw, is this to remove links within citations (keeping the rest of the citation); or, remove complete citations including the surrounding ref tags? -- GreenC 00:05, 3 June 2023 (UTC)
- The latter. Nederlandse Leeuw (talk) 03:12, 3 June 2023 (UTC)
- "Terminate with extreme prejudice". I'll start on it soon. -- GreenC 03:49, 3 June 2023 (UTC)
- Thanks! Nederlandse Leeuw (talk) 04:33, 3 June 2023 (UTC)
- I'll probably be able to get most of them automated (with programming) but there will be some that can't be automated, which I hope you or someone else can manually remove to finish it with more refined work. Wiki has endless ways to do things, I can't program for every possibility, or sometimes it can't be done safely. -- GreenC 13:36, 3 June 2023 (UTC)
- Please ping me once the automated process has finished, I'm happy to help with the final clean up. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 20:04, 3 June 2023 (UTC)
ActivelyDisinterested and Nederlandse Leeuw: The user User:Roelof Hendrickx is systematically reverting the work to remove the fmg.ac references. IMO the correct action is to remove the text the references were citing, not to restore an unreliable source. I edited about 300 of 1000 pages. I won't do anything further until this is resolved. Good luck. -- GreenC 00:46, 5 June 2023 (UTC)
- They are also edit warring now, I can't continue while there is a dispute. Suggest either convince the editor, or an RfC/RSN resolution to settle the issue. Let me know when you are ready to continue! -- GreenC 00:55, 5 June 2023 (UTC)
- Of course I am reverting the edits you have made. Those edits are demolishing articles, deleting valuable references and footnotes and external links. Roelof Hendrickx (talk) 01:15, 5 June 2023 (UTC)
- Keeping unreliable sources is, arguably, "demolishing" those articles. You have to make the argument why not, respond to the points made in the discussion at Wikipedia:Reliable_sources/Noticeboard/Archive_405#fmg.ac_(Foundation_for_Medieval_Genealogy). -- GreenC 03:22, 5 June 2023 (UTC)
- Under the header External links there are no sources, just external links. So they shouldn't be deleted, that's censoring. Furthermore the bot removed references but retained the texts. So texts with sources have become unsourced now, does that make Wikipedia better? And finally, a bot should be tested before being used. Which it wasn't as it sometimes only removed part of the link in footnotes and left a mess of the texts. Roelof Hendrickx (talk) 08:52, 5 June 2023 (UTC)
- I cannot reply in an archived discussion. A discussion for which the users that have used the website as source for information were not invited. Then it's easy to reach "consensus". But a so-called consensus between persons with the same biased POV is worthless imho. Roelof Hendrickx (talk) 08:57, 5 June 2023 (UTC)
- @Roelof Hendrickx you can always unarchive the discussion and raise the points you have made. – robertsky (talk) 09:03, 5 June 2023 (UTC)
- I don't know how, and I don't care anymore. This isn't the first problem I have with the way it works here. I have enough of it all. I'm leaving and will not contribute anymore. So go on with destroying articles with bots that don't work well and keep on censoring external links. You have my blessing. Roelof Hendrickx (talk) 09:09, 5 June 2023 (UTC)
- The editor has retired with the message
I will not accept any responsibility for the reliability of the information in the articles I have edited when those articles have been altered by another user of Wikipedia
, which seems to fundamentally misunderstand how Wikipedia works, and previously edit warred with citation bot trying to stop it from replacing curly apostrophes per MOS:CURLY. If they had wished to discuss the matter I would have suggested halting, but as they don't I believe this should continue. As to the question of external links the argument against MedLands is that its interpretation of primary sources leaves a lot to be desired, to put it mildly, and per WP:ELNO #2 we're better off without these links. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 10:11, 5 June 2023 (UTC)
- No, I now know exactly how Wikipedia works. That's the reason why I decided to retire, because the way Wikipedia works has nothing to do with being an encyclopedia. An encyclopedia doesn't use censorship and it doesn't use a manual of style that forces users to make texts less readable for readers. Neither does it delete references while retaining the text, nor does it use bots that cripple texts.
- As for Medieval Lands, it's a source just as reliable as other secondary sources. It has its errors, mistakes and typos, just as other sources. All sources should be used with caution. One has to know who's the author, what's the objective of the source, where does the information come from, and are the sources known. When a user writes or edits an article using multiple sources including Medieval Lands, and shows that he/she is interpreting the sources in a scientific way, imho one should try to talk to that user first before deleting information and references. I have used Medieval Lands for articles about members of the House of Nassau, and that part of that website is more reliable than Europäische Stammtafeln and also more reliable than Wikipedia.
- I regret ever starting contributing to Wikipedia, it has proven to be a complete waste of my time and energy. Roelof Hendrickx (talk) 10:56, 5 June 2023 (UTC)
- Maybe you could have taken part in the multiple discussions over the last decade that have shown the many issues with MedLands, and that such issues go unresolved. Those have included that if MedLands uses sources that are reliable then use those instead of MedLands. If you wish to continue the discussion I suggest opening a thread at WP:RSN. As with curly apostrophes, if you disagree with community standards the solution is to open discussions about them, not edit war. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:54, 5 June 2023 (UTC)
- I started editing articles here 2 years ago. I was not aware of any discussion on Medieval Lands until the changes by the bot. As I mentioned above, as a user who used this website to edit articles, I would have thought it would have been neat if I had been invited to the discussion. New users don't automatically become familiar with ongoing discussions. But by now, I am no longer interested in continuing the discussion. I am done with Wikipedia. Roelof Hendrickx (talk) 14:13, 5 June 2023 (UTC)
- Hi everyone, I hadn't been active on Wikipedia today until now, so I had some catching up to do. I am quite perplexed by the reactions and responses made by Roelof Hendrickx towards the virtually unanimous consensus that fmg.ac/Projects/MedLands
- The claim that these are valuable references and footnotes and external links had already been rejected by community consensus. Deleting unreliable sources has nothing to do with WP:CENSOR:
Content will be removed if it is judged to violate Wikipedia's policies
, in this case WP:RS.
- Hendrickx: texts with sources have become unsourced now, does that make Wikipedia better? I have argued (successfully) at RSN that I'd rather move from unreliably-sourced to unsourced statements; readers will treat the latter with more skepticism, and editors will be more motivated to fix the problem. The statements aren't necessarily untrue just because Cawley made them; we just want to give the opportunity to editors to find reliable sources supporting the same statements rather than continuing to rely on a source that has been known for over 11 years to be repeatedly unreliable. Because: Every day Cawley stays up across c. 700 articles (576 through the template, 123 outside the template minus the c. 10 that I manually purged already), we are potentially misleading more readers into a false sense of security that certain claims made by Cawley are somewhat reliable, and giving the impression that only a "better source (is) needed" to what Cawley has already "proven". (..) I don't want future readers to be misled.
- Hendrickx has also shown a (regrettable) unwillingness to appeal the process and seek a solution within our policies and guidelines, even when offered to do so by reopening the RSN, or other options. As a relatively new user, he may indeed not have been familiar with the policies and guidelines which we have, let alone ongoing discussions about particular sources (which have been had many times in the RSN archives, as I also found it). But he does have an obligation to find out or check those that are or may be relevant to the kind of edits he wants to do. Since his first edit on 9 June 2021, he has had plenty of opportunity to read the relevant policies and guidelines, ask questions about anything he didn't understand, or question any rule which did not make sense to him. (Which is entirely possible, as rules are changed, modified and refined all the time. Wikipedia wasn't developed overnight, and its policies and guidelines aren't set in stone, although obviously some policies and guidelines are more strongly accepted and important than others).
- If he hasn't read or understood WP:RS, WP:CENSOR or other relevant policies and guidelines before making edits, that has been at his own risk, and he only has his own carelessness to blame. Especially if his edits have been so incredibly dependent on this one single - and as it turns out very unreliable - source, that seeing it purged leads him to lose all willingness to participate in the project anymore. Again, here the community cannot be held responsible for the poor editing decisions Hendrickx has made about how sustainable those edits might be in the long term if they are based on an unreliable source. Every editor knows, or at least should know, that Wikipedia isn't a free-for-all, and one cannot ignore existing rules at one's pleasure. Especially the edit-warring is something he should know (WP:EW) would not help with what he wanted to do. I'm afraid there is nothing more that we can do for Hendrickx. We've given him all the chances, but if he retires on his own accord after having misunderstood how Wikipedia works, that's all we can do. Nederlandse Leeuw (talk) 14:39, 5 June 2023 (UTC)
- Every chance? Yeah right, keep on dreaming. Indeed, I retire on my own accord. And that's due to Wikipedia's editors, not to me. Roelof Hendrickx (talk) 17:42, 5 June 2023 (UTC)
- Unfortunately it's not possible to invite people to such discussions, because it's not possible to determine which editors to invite. Discussions have shown a myriad of issues with MedLands; I can understand this is annoying as it lays out a lot of details in a very simple manner. The only suggestion I have is using it for study, and then going from there to the sources it uses (much in the same way that Wikipedia is used by many people). With all that said, unless you are willing to come to WP:RSN and convince editors this is a reliable source there's little more to say. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 15:23, 5 June 2023 (UTC)
- Well said. Nederlandse Leeuw (talk) 16:58, 5 June 2023 (UTC)
- Not possible to invite people to such discussions, but it is possible to find the articles where the said website is mentioned. Right. It says enough about willingness to new users. Roelof Hendrickx (talk) 17:43, 5 June 2023 (UTC)
- But then there are still dozens, if not hundreds, of editors for some articles, many if not most of whom have no interest. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 18:27, 5 June 2023 (UTC)
- As a separate comment I don't believe that saying the bot was making a mess is a fair description. GreenC was doing this work only because they were asked to, and had been checking their work as it went along. As with WP:NOTVANDAL it's important to not describe edits you believe are mistaken in that way. They may have been removing links you believe are useful, but that is something to discuss with myself, Nederlandse Leeuw, and the other editors on RSN, rather than a reason to denigrate an editor who was only editing in good faith. It's important to remember to assume good faith and that other editors are only acting to help improve the project. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 15:32, 5 June 2023 (UTC)
- I fully agree. Nothing GreenC or their bot did was inappropriate. And, since Hendrickx has indicated that he is no longer interested in explaining or otherwise contributing to Wikipedia at all, the bot may now resume the process. Cheers, Nederlandse Leeuw (talk) 17:01, 5 June 2023 (UTC)
- So this edit isn't crippling text? Thanks for clarifying that. And you wonder why I have retired? Roelof Hendrickx (talk) 17:49, 5 June 2023 (UTC)
- @Roelof Hendrickx The edit in question was already corrected 46 minutes later, so your complaint is frivolous. If you are really retiring, I advise you to Wikipedia:Leave gracefully:
If you choose to leave the project, do so in a graceful and dignified fashion. It is not necessary to secure the last word, and it is not fair to put other editors in the difficult position of having to assist you withdraw from the project while you attempt to do so.
Nederlandse Leeuw (talk) 18:24, 5 June 2023 (UTC)
- No, it was corrected before by me. The edit you refer to again has crippling of text, including the removal of a header and an entire paragraph. I leave in the fashion I have been treated here, not just in this debate but also in four earlier encounters. Roelof Hendrickx (talk) 22:38, 5 June 2023 (UTC)
- You leave with making little sense, and making no actual argument for your points. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 22:42, 5 June 2023 (UTC)
- On the contrary, it's you that's not making actual arguments for your points. The removal of a header and an entire paragraph is not making a point? Says enough doesn't it? Roelof Hendrickx (talk) 22:44, 5 June 2023 (UTC)
- I'm not going to continue this discussion with you. If mistakes were made they would have been corrected, your continued aspersions and nitpicking change nothing. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 22:47, 5 June 2023 (UTC)
- The edits are done in batches and then checked, again this is neither malicious in any way nor an effect of incompetence. It was a temporary issue that was corrected a short time later. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 18:32, 5 June 2023 (UTC)
- This kind of work deleting entire citations is complex and inherently error prone. The mere existence of errors means nothing. What is important is how many errors, and what was done to correct them. The answer: a 7.3% error rate, i.e. 73 out of 1000 articles will have an error (based on the results of the first 300 edits). And every edit is manually reviewed and corrected. There are many ways to do automated editing on Wikipedia; a manual review of every edit is acceptable. A 100% correct fully automatic bot is very difficult to make, and that kind of labor is not warranted for so few articles. I'm not going to spend days programming just so 73 articles don't have a temporary error that I can manually review and fix in a few minutes. The nature of this work follows the 80/20 Rule which is to say the first 80% is trivial to fix, the next 20% is hard. The last 5% is the hardest of all, taking as long to program for as the first 95% was, so I don't bother with those edge cases and instead do them manually since it's only 73 pages. -- GreenC 21:35, 5 June 2023 (UTC)
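The projection in the comment above is simple proportional scaling, sketched here. The function name `project_errors` is hypothetical. The count of 22 errors in the first 300 edits is an assumption inferred from the stated 7.3% rate; the thread gives only the rate and the sample size, not the raw count.

```shell
#!/bin/sh
# Given errors observed ($1) in a sample of edits ($2), print the error rate in
# percent and the number of errors projected over a total number of pages ($3).
project_errors() {
    awk -v e="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.1f %.0f\n", 100 * e / n, t * e / n }'
}

# Usage with the assumed count: project_errors 22 300 1000
```

With those assumed inputs the function prints `7.3 73`, matching the figures quoted in the comment.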
- I know it's a waste of time, but still I give you this advice if you wanna retain users. Invite them to the discussion, and invite them to think about changing text they wrote. If I had not been completely surprised by the bot edits, and proper explanations had been given, I would've been willing to think with you and help you out on the texts I wrote. It's just a little investment of time that doesn't scare new and inexperienced users away. Roelof Hendrickx (talk) 22:42, 5 June 2023 (UTC)
- You've been here long enough that you should understand at least some of how Wikipedia works. What you have shown here is that you are unwilling to actually discuss issues, or allow anyone to dare touch "your" articles (something I note you have already been warned about). -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 22:45, 5 June 2023 (UTC)
- Well said. To which I might add that WP:OWNERSHIP of articles is not a thing. Nederlandse Leeuw (talk) 22:52, 5 June 2023 (UTC)
I know it's a waste of time. Then do yourself and us all a favour, stop responding, Wikipedia:Leave gracefully, and go do something else that makes you happy, please. Nederlandse Leeuw (talk) 22:50, 5 June 2023 (UTC)
- Corrected by me, and then crippled again. Roelof Hendrickx (talk) 22:38, 5 June 2023 (UTC)
Please just leave, Roelof. There's no point in continuing to complain and being hostile to us if you're WP:NOTHERE to build an encyclopedia anymore anyway, and still refuse to learn how our policies and guidelines work. Go do something else that makes you happy. We'll take it from here, thanks. Nederlandse Leeuw (talk) 22:44, 5 June 2023 (UTC)
- GreenC, Nederlandse Leeuw I suggest we pause this for a week. That will allow Roelof Hendrickx time to make any changes they desire or to formulate an argument to WP:RSN on MedLands reliability if they want to. Failing any other objections we would then start back up again. Sorry to jerk you around GreenC, but this seems a better idea than continuing the current conversation. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 10:48, 6 June 2023 (UTC)
- I second AD's suggestion. Nederlandse Leeuw (talk) 11:25, 6 June 2023 (UTC)
- This edit suggests an implicit agreement: Special:Diff/1158585285/1158736035 .. if they are making active changes no rush. Completed through page 400 of about 1000. -- GreenC 14:48, 6 June 2023 (UTC)
- I agree, but giving them some time to manually correct any articles they wish to doesn't hurt anything. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 20:39, 6 June 2023 (UTC)
- No problem, ping me when you are ready. I may be in other jobs but everything is set up now. -- GreenC 20:54, 6 June 2023 (UTC)
- Will do, thanks GreenC. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 20:59, 6 June 2023 (UTC)
- Is it really necessary to remove from External links? I wasn't under the impression RS applied to those. Srnec (talk) 21:13, 6 June 2023 (UTC)
- I believe it comes under WP:ELNO #2. MedLands presents dubious interpretation of primary documents as fact. Something that has been mentioned in the RSN threads. Linking will present details that are not in Wikipedia's articles, as historians have rejected them, and leave readers wondering why. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 21:36, 6 June 2023 (UTC)
- Ok it's been seven days without any reply. GreenC apologies for jerking you about, could you complete the work when you have time. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:51, 13 June 2023 (UTC)
- OK. If the bot edits a page where Roelof has an active interest (no pun intended!), recommend they revert the bot, then edit to remove the unreliable sources and content, as they want to handle it, for that page. -- GreenC 14:41, 13 June 2023 (UTC)
- No offence taken. Although I haven't checked all the articles for which texts I'm responsible, in the sense that I wrote the texts, I believe that the articles I still have to check, haven't references or footnotes that contain links to Medieval Lands. So, I doubt I will have to revert the bot edits in those articles. I might do some additional changes, but I doubt that those changes will have anything to do with the bot edits.
- In the articles I have checked and changed the bot won't find Medieval Lands anymore. Roelof Hendrickx (talk) 17:09, 13 June 2023 (UTC)
- @Roelof Hendrickx I would like to thank you for the efforts you have undertaken in recent days to improve the articles in question. (For instance, this edit; I'm pleasantly surprised that there is a journal about genealogy and heraldry called De Nederlandsche Leeuw – almost identical to my username 'Nederlandse Leeuw' – which can be used as a WP:RS for genealogical information in articles such as Adelaide of Vianden). I would also like to apologise for some remarks I made to you on 5 June that were a little harsh; I should have dealt with my frustration more constructively. You have shown in the past 9 days that you are WP:HERE to help make Wikipedia better, and willing to follow the policies and guidelines in order to do so. I'm very glad with that, and I look forward to working with you if our paths should ever cross again. Happy editing! Nederlandse Leeuw (talk) 07:28, 14 June 2023 (UTC)
- @Nederlandse Leeuw First of all my apologies for my late reply. I unfortunately was occupied with private matters until now. Thanks very much for your message, much appreciated. And I immediately admit that I also made comments that were not civilised at all. I think emotions got the better of me, which shouldn't have happened. I'm only human, and sometimes I make that mistake. My apologies for that too, also to @ActivelyDisinterested and @GreenC. I hope for forgiveness and hope we can continue as if it didn't happen. Or as we Dutch say "zand erover".
- As for the magazine, I'm surprised that you didn't know it. I really thought you had named yourself after it! Or did you name yourself after the order of chivalry? Roelof Hendrickx (talk) 17:32, 16 June 2023 (UTC)
- @Roelof Hendrickx Much appreciated! It might not seem like it, but on this side of the screen, there is also a human being with his own flaws and limitations who doesn't always do things right. As for my nickname, it's not really named after anything in particular, except maybe the Dutch Republic Lion in heraldry, or the Leo Belgicus in cartography (those maps looked pretty cool). But I don't take any of them very seriously, and the nickname has no connections with organisations, publications, or orders of chivalry. Nederlandse Leeuw (talk) 17:43, 16 June 2023 (UTC)
- @Nederlandse Leeuw Now that we both gracefully admit that we're only human with flaws and limitations and that we could admit that we both are able to make mistakes, I'm sure it will work out fine in the future between us. I'm really glad that we cleared it!
- Good to know that you didn't name yourself after anything in particular. It taught me again not to assume knowing something I cannot know for sure. Roelof Hendrickx (talk) 18:02, 16 June 2023 (UTC)
- @Roelof Hendrickx Likewise! I assumed you would be Flemish because of your last name's spelling until you said "as we Dutch say". Happy editing! Nederlandse Leeuw (talk) 18:05, 16 June 2023 (UTC)
- @ActivelyDisinterested and Nederlandse Leeuw: All refs removed (1,042 pages).[2] .. recommend submitting a request to WP:BLACKLIST otherwise users will re-add over time since they show on Google and might appear reliable. -- GreenC 01:04, 14 June 2023 (UTC)
- @GreenC Thanks a lot for your work! Nederlandse Leeuw (talk) 07:08, 14 June 2023 (UTC)
- It's already being re-added: Special:Diff/1160003284/1160040573 -- GreenC 13:36, 14 June 2023 (UTC)
- It will happen, as with other unreliable and misleading sources they keep coming back. Thanks for your work. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 13:43, 14 June 2023 (UTC)
- I have no problem with this website being deleted as a source. It is not an RSN because it is self-published, but as far as I can see most of the bot edits are deletions from external links sections? I don't see any rationale anywhere for mass removals from external links sections. Is that the intention?--Andrew Lancaster (talk) 05:29, 14 June 2023 (UTC)
- Yes, per WP:EXT: "Some acceptable external links include those that contain further research that is accurate". We have already established repeatedly at RSN that Cawley is frequently inaccurate, and Cawley admitted himself that the early part of his detailed Rurikid genealogy may be of little factual significance but is reproduced by way of interest. This is just one of many examples where Cawley shows that he values presenting things that interest him (WP:IJUSTLIKEIT) over being a reliable source (WP:RS). Some of his sources are also unverifiable, such as the 7 sources which are 'private emails' with certain authors, violating WP:V. We should not continue to host known inaccurate/unreliable or unverifiable sources in external links sections per WP:ELNO (as ActivelyDisinterested pointed out above): "Any site that misleads the reader by use of factually inaccurate material or unverifiable research, except to a limited extent in articles about the viewpoints that the site is presenting." Cawley is not presenting 'viewpoints' other than his own hobbyist interest in genealogy, and presenting Cawley's non-expert viewpoint is WP:UNDUE. Cheers, Nederlandse Leeuw (talk) 07:07, 14 June 2023 (UTC)
RuPaul
I noticed on the page for RuPaul that citation № [102], concerning his spouse Georges LeBar, links to a site known as georgeslebar.com, which when clicked leads to nothing but an open domain. Sincerely, MetricPin. MetricPin (talk) 03:03, 28 July 2023 (UTC)
- Done, I removed that citation and replaced it with a more appropriate WP:RS for the citation. Phuzion (talk) 17:07, 5 September 2023 (UTC)
www.fiba.com
The www.fiba.com page can no longer be opened. Unfortunately, it is used in thousands of articles. Sometimes renaming to https://www.fiba.basketball helps (for example here), but in most cases the site is dead. Can something be done about it? Maiō T. (talk) 19:25, 1 September 2023 (UTC)
- Fiba.com has at least 12,000 mainspace links, in a few thousand articles. Most of them are archive.fiba.com and they are working. OTOH for www.fiba.com there is an https error. Example says "SEC_ERROR_EXPIRED_CERTIFICATE". They need to pay to renew their certificate. They probably will. Do you know how long it's been down? -- GreenC 04:39, 2 September 2023 (UTC)
- I don't know. Some two or three weeks ago I clicked on a www.fiba.com link and was redirected to www.fiba.basketball, so it was relatively fine back then. Now these redirects don't work anymore. Maybe it would be a good idea to wait a few days. Maiō T. (talk) 10:03, 2 September 2023 (UTC)
- The Wayback Machine has info. It looks like on August 18, 2017, the home page www.fiba.com began redirecting to www.fiba.basketball (it may have been August 17, but there were no snapshots for that day). This could explain why the SSL cert expired: they abandoned the domain a long time ago and no longer maintain it. Now, if we look at a deep link: https://www.fiba.com/pages/eng/fa/statistics/p/sid/2271/_/1950_European_Championship_for_Women/player-leaders.html and change the www.fiba.com to archive.fiba.com .. https://archive.fiba.com/pages/eng/fa/statistics/p/sid/2271/_/1950_European_Championship_for_Women/player-leaders.html .. it works! So the plan is to try either archive.fiba.com or www.fiba.basketball. If they don't work, add an archive URL. I can do this. -- GreenC 15:50, 2 September 2023 (UTC)
- Wow GreenC, you're a genius! Thank you in advance! Maiō T. (talk) 18:18, 2 September 2023 (UTC)
Update what was done today/tonight:
- It ran in 3,024 articles and processed an unknown number of links, estimated at around 20,000
- It ran in two passes, the first for archive.fiba.com and second for www.fiba.basketball. Example pass 1 and pass 2. It's messy but works out: combined diff.
- Generally for every link "www.fiba.com" the result will be one of: 1. link remains with a {{dead link}}; 2. link remains converted to an archive URL; 3. link is converted to archive.fiba.com; 4. link converted to www.fiba.basketball which might be a new URL due to redirects. For 3 & 4, pre-existing archive URLs are either removed (bare and square links), or converted to url-status=live
- Many of the diffs are dense with changes eg. 2016 in sports. If you see any problems, let me know, I can go back and try to repair it. It's easier looking at the final combined diff. There are so many permutations of link changes, and so many links in a page, it can create some spectacular diffs.
- Some links don't do anything eg. https://www.fiba.basketball/calendar is only good for future dates, but is often cited for back dates. There are 27 links. They fail Verification and technically the entire cite should be deleted.
- Template: and File: were processed (100 total).
- There are other types of FIBA links like "alha.www.fiba.com" which were not processed. More work should be done to discover these edge cases, and what to do with them.
-- GreenC 06:06, 4 September 2023 (UTC)
- Thank you very much! It's amazing what you have done. Maiō T. (talk) 10:15, 4 September 2023 (UTC)
- Found another 70 domains *.fiba.com, of which 68 are dead and replaced with archive URLs Example. -- GreenC 04:48, 5 September 2023 (UTC)
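The host swap described in this thread (try archive.fiba.com first, then www.fiba.basketball, and fall back to an archive URL if neither works) could be sketched roughly as below. This is only an illustration in Python, not the bot's actual code: the function name `candidate_urls` is hypothetical, and the step that checks whether each candidate really resolves is left out.

```python
from urllib.parse import urlsplit, urlunsplit

# Replacement hosts named in the discussion, in the order they were tried.
REPLACEMENT_HOSTS = ["archive.fiba.com", "www.fiba.basketball"]

def candidate_urls(url):
    """Return replacement URLs to try for a www.fiba.com link (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.hostname != "www.fiba.com":
        # Other *.fiba.com hosts are edge cases and are not handled here.
        return []
    return [
        urlunsplit(("https", host, parts.path, parts.query, parts.fragment))
        for host in REPLACEMENT_HOSTS
    ]
```

Each candidate would still need to be verified as live before it replaces the original link; if none works, the link gets an archive URL or a {{dead link}} tag as listed above.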
Andhimazhai
Many old links such as this don't redirect to the new versions like this. Kailash29792 (talk) 09:24, 11 September 2023 (UTC)
- User:Kailash29792: This must have happened recently because the WaybackMachine has a snapshot at the old URL in January [3] and the new URL in May [4]. There is no redirect info in headers or the WaybackMachine. The options would be to convert them to archive URLs, or wait and hope they add redirects eventually. If the latter happens, my bot is capable of rolling back archive URLs and adding the new URL. I suggest adding archive URLs for now. What do you think? -- GreenC 14:28, 16 September 2023 (UTC)
- Even better might be to tag existing links as dead. Am travelling right now, so I can't add the archive links. Kailash29792 (talk) 14:43, 16 September 2023 (UTC)
- OK I'll add archive URLs. I don't know if they are all dead so will check them individually. -- GreenC 15:10, 16 September 2023 (UTC)
- It's done. 21 pages edited; they were all soft-404s pointing back to the home page. -- GreenC 18:18, 16 September 2023 (UTC)
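A soft-404 of the kind mentioned here, where a deep link silently lands on the site's home page, can be flagged with a simple comparison of the requested URL and the URL reached after redirects. The sketch below is a rough Python heuristic under stated assumptions, not the bot's logic: `is_soft_404` is a hypothetical name, `final_url` is assumed to come from whatever HTTP client followed the redirects, and example.com is a placeholder domain.

```python
from urllib.parse import urlsplit

def is_soft_404(requested_url, final_url):
    """Heuristic: a deep link that ends up on the same site's home page."""
    req = urlsplit(requested_url)
    fin = urlsplit(final_url)
    # The original link pointed somewhere below the site root.
    requested_deep = req.path.strip("/") != "" or req.query != ""
    # The response came from the bare home page.
    landed_home = fin.path.strip("/") == "" and fin.query == ""
    # Assumption: only same-host redirects are treated as soft-404s.
    same_site = req.hostname == fin.hostname
    return requested_deep and landed_home and same_site
```

For example, `is_soft_404("https://example.com/news/123", "https://example.com/")` returns `True`, while a redirect to another deep page returns `False`. Sites that serve an error page with a 200 status at the original URL would need a different check.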
emertainmentmonthly.com
This domain now leads to a spam site. Referenced on many pages. https://en.m.wikipedia.org/wiki/Special:LinkSearch?target=Http%3A%2F%2Femertainmentmonthly.com
Looks like these references should be updated to point to https://emertainmentmonthly.org/ ?
I was able to find reference #4 from Bernard Cornwell on the new site (below). URL and content seem to match the archive.org record, aside from the extension of course.
Baunno (talk) 12:23, 16 September 2023 (UTC)
- Ok thank you for the report. Converting to .org and checking the new link works otherwise converting to archive URL. I will work on this. There are 61 pages. -- GreenC 14:33, 16 September 2023 (UTC)
It's done. If there was an archive URL there, it left it in place but flipped the status to live, Example. A couple didn't convert because the new URL is a dead link. -- GreenC 18:36, 16 September 2023 (UTC)
- Thank you! Baunno (talk) 15:56, 17 September 2023 (UTC)
USA Basketball
Hello. I was wondering if the broken URLs for USA Basketball (usab.com) could be fixed. These include:
- link at Ashley Houts. It can't be replaced with one of the links here as the event is no longer held.
- Multiple articles at United States women's national under-19 basketball team such as this link. It's no longer there in the news section.
I was also wondering if the archive.usab.com and usabasketball.com could be checked as they're broken as well. These two domains might have already been checked for archives, but I'd like to double check. Altogether, these total up to 3,000+ links.
Thank you! MrLinkinPark333 (talk) 17:31, 16 September 2023 (UTC)
- Yes I see usab.com has been excluded from InternetArchiveBot. Same for about half of usabasketball.com .. so they are not being maintained. I'll go through them; it will take some time. Everything I spot checked is dead, with long timeout response. Between this and FIBA above, I wonder what is happening in the basketball world. -- GreenC 18:53, 16 September 2023 (UTC)
MrLinkinPark333: The bot checked each URL in all sub-domains for usab.com and usabasketball.com -- it edited about 900 pages including Template: and File:, added about 1,200 new archive lines, flipped about 400 |url-status=live to dead. It also updated the IABot database (for each URL) so the results will propagate to 100s of other wikis (Example). -- GreenC 05:17, 17 September 2023 (UTC)
- Happened to notice http://basketball.teamusa.org is also soft-404ing. 4 pages only. But teamusa.org has over 3,500 pages and spot checking there are many dead pages. I'll process this as a separate project in a new section. -- GreenC 05:39, 17 September 2023 (UTC)
- Thank you for the quick response! MrLinkinPark333 (talk) 15:21, 17 September 2023 (UTC)
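For readers unfamiliar with the `|url-status=` flip mentioned in this thread, the text change itself is small. The following Python sketch shows one way it might be done on the wikitext of a single citation; it is an assumption-laden illustration rather than the bot's code, `mark_dead` is a hypothetical name, and deciding whether the URL is actually dead happens elsewhere.

```python
import re

# Matches "|url-status=live", tolerating spaces around the pipe and equals sign.
_LIVE = re.compile(r"(\|\s*url-status\s*=\s*)live\b")

def mark_dead(citation):
    """Flip url-status from live to dead in one citation's wikitext."""
    return _LIVE.sub(r"\g<1>dead", citation)
```

Applied to `{{cite web |url=https://example.com/a |url-status=live}}` (a placeholder citation), this returns the same text with `|url-status=dead`. Citations without the parameter are returned unchanged.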
teamusa.org
Over 3,500 pages with many unfixed dead links of various types. -- GreenC 05:44, 17 September 2023 (UTC)
- Edited 3,231 pages. Fixed over 4,000 links. Most were soft-404s. -- GreenC 00:33, 19 September 2023 (UTC)
Blacklist healthlinedotcom
According to RfC Wikipedia:Reliable_sources/Noticeboard#Healthline:_deprecate_or_blacklist? the domain is to be blacklisted, and discussions were to remove all citations containing the domain. Request for help by bot due to scale made by User:Zefr and User:David Gerard. Every edit by the bot will be manually reviewed. Some errors and subsequent corrections are expected. The domain is in 840 pages. -- GreenC 18:52, 7 July 2023 (UTC)
- If the bot can leave a comment tagging that it was Healthline (if that's possible), that would be very helpful afterwards - David Gerard (talk) 19:17, 7 July 2023 (UTC)
- In the edit summary, or the citation needed template? For the former it looks like Special:Diff/1163511203/1164069714 .. for the latter I formatted it as "citation needed|date= July 2023" .. note the date + the space after the = .. I figured that would be sufficient to disambiguate it from other uses of the template on the page. It's kind of cryptic but works. Search on "=<space>July 2023". If you prefer something more explicit I could add "|reason=Healthline". -- GreenC 20:53, 7 July 2023 (UTC)
- On more thought, a longer reason is probably a good idea. How about "|reason=WP:healthlinedotcom" which should make it obvious why the cite needed tag exists and why it was done, without cluttering the page too much. It could also redirect here instead. -- GreenC 20:59, 7 July 2023 (UTC)
- that would be ideal, "reason=WP:healthlinedotcom" is a searchable flag that the claim itself really needs human inspection. thank you! - David Gerard (talk) 23:05, 7 July 2023 (UTC)
- I have worked on a few from GreenC's initial bot work, but can already see this is going to be a long, tedious process of a) reviewing/editing content of the existing passage, b) finding suitable MEDRS-quality sources for what is often soft content (where healthline thrived on Wikipedia), c) leaving the 'cn' in place because there are no good sources readily identified, and d) fighting an edit war with healthline diehards, such as here.
- GreenC - it might be best to nuke the healthline sources all at once, and I'll work on your list a few at a time. There are many other matters calling. Zefr (talk) 23:38, 7 July 2023 (UTC)
- 602 articles containing WP:healthlinedotcom -- GreenC 14:51, 8 July 2023 (UTC)
- All gone. I submitted a Blacklist request to SBL: MediaWiki_talk:Spam-blacklist#healthline.com. @Zefr and David Gerard: -- GreenC 14:46, 8 July 2023 (UTC)
- thanks, both of you :-) - David Gerard (talk) 15:57, 8 July 2023 (UTC)
- GreenC - checking on the total number nuked, you first had 602 today, and 6 more in the "all gone" result. Yesterday, when I checked, there were 850 remaining. I know you have edited 20-30 yourself (with thanks) and I have done a dozen, but where might the other ~ 200 be? Zefr (talk) 16:24, 8 July 2023 (UTC)
- Not all the citations removed had a [citation needed] tag added, because they were adjacent to other citations, they don't show up in the All Gone search. Or things like this Special:Diff/1163225910/1164248686. And the first 30 didn't use the WP:healthlinedotcom reason. The number of pages edited by the bot is 823, the rest were manual edits. -- GreenC 16:27, 8 July 2023 (UTC)
According to Healthline they also own Medical News Today (548 pages), DiabetesMine (11), and MediLexicon (160). They also own Greatist and Psych Central. -- GreenC 16:55, 8 July 2023 (UTC)
- Each is a spam site giving opportunities to both advertise and link to other spam published under Red Ventures (parent of healthline and medical news today et al., discussed in the RSN evaluation on healthline), and described as "intent-based media — a term for specialist sites that attract people who are already looking to spend money in a particular area (travel, tech, health) and guide them to their purchases, while taking a cut." In other words, having these Red Venture sites as sources on Wikipedia enables further spam-spread, nonsense promotion, and commercialization. They should all be blacklisted.
- David Gerard - as the RFC on healthline discussed this, would admin allow a fast-track to blacklisting for all Red Ventures sites? Zefr (talk) 17:18, 8 July 2023 (UTC)
- I'm not totally convinced https://greatist.com is a spam site. Their policy page is pretty good. They were purchased by Red, started out independently. The others I have not looked at. Maybe it's best to filter them through RSN first. Some may be OK, others not, or prior to Red's purchase etc. I'd be more comfortable with more eyes on this before we delete references and ban them entirely. -- GreenC 20:57, 8 July 2023 (UTC)
- Caution is ok. We have enough to do to purge healthline! The greatist.com is the same MO as the others: non-expert author + non-expert "medical reviewer" + right display promotion of other Red Venture articles + prominent subscribe bar + "best list", commercial promotion and spam within the article, as for CBD = BS. Should not be on Wikipedia, but there are only 35 insource hits. Zefr (talk) 21:37, 8 July 2023 (UTC)
- The Greatist policy page explicitly says that they're run by Healthline Media, so they should be blacklisted as well. JoelleJay (talk) 23:24, 8 July 2023 (UTC)
- GreenC, David Gerard and JoelleJay - removal of Healthline source notices is complete.
- As noted in the discussion above, Healthline Media's brands are pervasive in public health-related content and continue to be cited on Wikipedia (MedicalNewsToday - the largest - is in 901 articles). All Healthline Media brands have advertising pitches to 3rd parties and promotion of the parent company, indicating the concerns that led to blacklisting Healthline remain in its other brands. Thoughts on blacklisting other Healthline brands? Zefr (talk) 23:20, 20 September 2023 (UTC)
- "Healthline" is generally considered a reliable source; most articles on that website are based on other studies and research. It has covered far more topics than most other websites that are generally considered reliable health websites. Some articles on that website are actually unreliable, but it should still be considered a generally reliable source here. It is definitely not a spam site. It is relatively better than its other branches like "Psych Central", "Medical news today", "Greatist" etc. I think it should be considered a generally reliable or at least not unreliable source. In fact most other health websites that are generally considered reliable are worse than it. I think this deserves more consideration. Polarbear678 (talk) 15:00, 9 July 2023 (UTC)
- I have no opinion either way, this isn't the right forum to resolve that question. The RfC Wikipedia:Reliable_sources/Noticeboard#Healthline:_deprecate_or_blacklist? was open for over a month and is closed now. The references have been removed, it would be very difficult to restore them. Very difficult to reverse the RfC results. -- GreenC 16:21, 9 July 2023 (UTC)
Call sign history for U.S. radio stations
Any URL starting with http://licensing.fcc.gov/cgi-bin/ws.exe/prod/cdbs/pubacc/prod/call_hist.pl? needs to be changed to https://licensing.fcc.gov/cgi-bin/ws.exe/prod/cdbs/pubacc/prod/call_hist.pl? — Vchimpanzee • talk • contributions • 15:28, 2 September 2023 (UTC)
- Frankly, any url starting with http://licensing.fcc.gov should be converted to https://licensing.fcc.gov, as all links to the HTTP version of that domain return a 403. According to my quick research, we're looking at about 3,400 instances of HTTP links to that domain. Phuzion (talk) 18:06, 5 September 2023 (UTC)
- Yes I'll take care of it and check https etc. for any other problems like redirects and soft-404s. Each URL will be verified as working rather than assuming they all work; they rarely all do after a migration. -- GreenC 21:28, 5 September 2023 (UTC)
- Thanks to both of you. Yes, I should have realized the problem might be more extensive.— Vchimpanzee • talk • contributions • 16:56, 14 September 2023 (UTC)
- User:Vchimpanzee - the FCC fixed the 403 error as the http link now redirects to https. We could in theory change all fcc.gov subdomains to https but it's probably redundant. Below is a list of all domains in use on Wikipedia. -- GreenC 14:13, 16 September 2023 (UTC)
- As long as it works. Sorry I didn't see this.— Vchimpanzee • talk • contributions • 16:01, 26 September 2023 (UTC)
Historic Hansard
Back in ~2018, the content of hansard.millbanksystems.com (digitised copies of Hansard for the UK Parliament) was transferred to an official site at api.parliament.uk/historic-hansard. The old site remained online, however, and continued to be pretty widely used as references - there are about 7800 links to it. (The new site has around 13k, 10k of which are via templates, changed back in 2018.)
The old site has finally gone offline, possibly forever, and so it's probably a good time to finally change all of these over. I believe the URL patterns are very simple - hansard.millbanksystems.com/... becomes api.parliament.uk/historic-hansard/... - which hopefully will make it straightforward. Andrew Gray (talk) 19:52, 19 September 2023 (UTC)
- OK. I'll check each one to make sure it exists at the new site; it's common for admins to miss some during migrations. If it doesn't exist it will add an archive URL. I'll also check for soft-404s (redirects to home pages etc). The 10k in templates is scary because there is no easy way to check them for link rot without special code for parsing the template. But that's a general problem with the thousands of custom URL templates, which create huge link rot problems over time. -- GreenC 20:09, 19 September 2023 (UTC)
- @GreenC amazing, thank you! I'm in touch with the maintainers at Parliament so happy to poke them about any pages which don't exist on the new site, if you do spot any.
- To clarify, the 10k in templates have been switched to the new site for about five years now - I don't think there have been any reported issues arising from it. Andrew Gray (talk) 21:10, 19 September 2023 (UTC)
- We'll see how clean the migration is to api.parliament.uk - if there are enough errors it's a signal there might be other problems. Templates can hide natural entropy. But some sites can surprise and are well maintained. If we can help them find problems via your contacts all the better. -- GreenC 22:20, 19 September 2023 (UTC)
- I discovered these two pages have the same content:
{{cite Hansard |jurisdiction= |title=HC Deb 19 April 1989 vol 151 cc171-3W |url=https://api.parliament.uk/historic-hansard/written-answers/1989/apr/19/mv-perintis |house=House of Commons |date=19 April 1989 |column=171-3W |volume=151}}
==> "HC Deb 19 April 1989 vol 151 cc171-3W". Parliamentary Debates (Hansard). Vol. 151. House of Commons. 19 April 1989. col. 171-3W.
{{Ukhansard|house=HC |date=19 April 1989 |vol=151 |cc=171-3W}}
==> HC Deb, 19 April 1989 vol 151 cc171-3W
- The second one (Ukhansard) doesn't produce the correct URL (ie. [5]) but it does go to the correct day/volume. The first one (cite Hansard) is built manually if you already know the correct URL. I'm not sure which is better/preferred. The second one is a problem due to the wrong URL output, but at least it can automate creating a URL that is nearby. It all creates complications due to the permutations: square link, cite web, or cite journal for the two websites. Plus the two special templates. 8 possibilities. -- GreenC 02:43, 22 September 2023 (UTC)
- As I understand it, these are two different services giving the same content - the one at api.parliament.uk/historic-hansard/ which is the plain digitised version up to 2005, and the one at hansard.parliament.uk which is the same content cleared up and imported into the modern system where it sits alongside the new material. There's some subtle differences (eg the new one doesn't link people/legislation) but basically they're equally useful. I guess for the moment we can link out to either without needing to standardise.
- For {{ukhansard}} - hmm. This is a bit of an odd one and honestly I don't think I'd encountered it before! (It's only used a few dozen times). It's the formally correct style of citation, I think, but the links going to "something reasonably close" is a bit confusing. I'll have a think about this one.
- Thanks so much for all the cleanup! Andrew Gray (talk) 23:45, 23 September 2023 (UTC)
- I think it's done, for this phase anyway. The URLs are boiler-plate code but the metadata changes required discovery and customizations. Here are some diffs that highlight the features of what was done: [6], [7], [8], [9], [10]. Whatever is left, search, looks like mostly archive URLs. I saw very few dead links, the site is admirably in a great condition, not typical for governments. I think if any more work is done it would be converting square links to templates, and converting cite webs to cite hansard, but those are more complex jobs. -- GreenC 01:24, 24 September 2023 (UTC)
- @GreenC Amazing. It sounds like you've pretty much got everything! Thanks again for all your hard work here - I was speaking to the Parliament data people yesterday and they were very impressed by it, I said I would pass their thanks on :-) Andrew Gray (talk) 20:43, 26 September 2023 (UTC)
- You are welcome. Glad to help! You can thank them for maintaining such a high degree of working links, not easy at scale. Many sites, like cia.gov World Factbook, don't even try. They recently changed to a new URL scheme. The day of the switchover, all the old links immediately went 404, no redirects, and no map for what the new link might be. -- GreenC 01:53, 27 September 2023 (UTC)
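The URL pattern Andrew Gray describes at the top of this section (hansard.millbanksystems.com/... becomes api.parliament.uk/historic-hansard/...) amounts to a host change plus a path prefix. A minimal Python sketch of that mapping, assuming the rest of the path carries over unchanged as described, might look like this; `rewrite_hansard` is a hypothetical name and this is not the code the bot ran.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "hansard.millbanksystems.com"
NEW_HOST = "api.parliament.uk"
NEW_PREFIX = "/historic-hansard"

def rewrite_hansard(url):
    """Map an old Millbank Systems URL onto the historic-hansard site."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url  # leave unrelated links untouched
    # Assumption: the old path is kept as-is under the new prefix.
    return urlunsplit(
        ("https", NEW_HOST, NEW_PREFIX + parts.path, parts.query, parts.fragment)
    )
```

As noted in the thread, a rewritten URL is only a candidate: each one still has to be checked against the new site, with an archive URL added where the page does not exist.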
etcanada.com
It was announced today that the entertainment news series Entertainment Tonight Canada will be cancelled after next week, and apparently the website is disappearing with it — but as a person who edits principally in the film, television and music areas, I've cited a lot of its content (and I mean a lot a lot) in the past several years, so the links are going to need to be archived for salvage purposes.
It's obviously not a task I want to grind through all by myself if I don't absolutely have to, so I wanted to ask if it's possible to automate checking for all Wikipedia articles that feature links to the https://etcanada.com/ domain, and ensuring that there's an archived copy added to the citation if there isn't already one present yet? Thanks. Bearcat (talk) 19:43, 27 September 2023 (UTC)
- Anyone? This is one week away from becoming a massive emergency. Bearcat (talk) 16:12, 28 September 2023 (UTC)
- @Bearcat, I believe you could run IA Bot on it. Try going to https://iabot.wmcloud.org/index.php?page=manageurldomain. (It doesn't allow me to use it because I'm not an admin.) — Qwerfjkltalk 17:40, 28 September 2023 (UTC)
- IABot probably won't get them all for various reasons. I can run WaybackMedic and force all the links as dead which will then look up and add archive links. If there is no archive link and the link is otherwise live, depending how many there are, I can try to do something for Wayback to SPN (Save Page Now). I've been exceptionally busy lately but will have time to look at this today since it is time-sensitive. About 1,600 pages. -- GreenC 18:43, 28 September 2023 (UTC)
- At NLwiki I run a script that checks whether the URLs in an article have been archived. If not, it performs a save request to Archive.org. The resulting archive URL is added to the article, but I can turn this off, so a BRFA is not required. Wikiwerner (talk) 19:36, 28 September 2023 (UTC)
I have marked etcanada.com and related subdomains as permadead in the InternetArchiveBot interface and have started the process of adding archive links. Harej (talk) 20:09, 28 September 2023 (UTC)
- I'm running too :) IABot can't do discovery of new archive.today links etc.. when there is no Wayback links available. It is usually missing Wikipedia pages it hasn't seen yet. And has parsing trouble seeing all links. That's why I run it through Medic. Once done it then updates the IABot database with the new links. -- GreenC 20:29, 28 September 2023 (UTC)
- Done. Medic edited 1,530 pages. It added 1,523 archive URLs. It flipped |url-status=live to dead in 333 citations. I'm aware of two {{dead link}}'s in Donny Osmond and I Touch Myself. It overlapped editing with IABot in about 30 pages, and made corrections IABot missed in 2 or 3 of those articles. -- GreenC 01:17, 29 September 2023 (UTC)
- Okay, thanks. I'll try to do a run-through later to see if I can catch anything that's still problematic (either because it got missed or because the bot wasn't able to find an archived copy at all), but I appreciate the assistance. Since I don't work with those tools that often, I was struggling to understand how to do what I needed or find documentation to help, so I'm most appreciative that people stepped in to help out while I was flailing. Bearcat (talk) 14:02, 29 September 2023 (UTC)
- Ok, let me know what you find it missed. I forgot to check File:, Template: and Module: space. Found three more links there. -- GreenC 16:43, 29 September 2023 (UTC)