Wikipedia:Bots/Requests for approval/Lonjers french region rename bot
Operator: Lonjers (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 03:36, Thursday, January 7, 2016 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: https://github.com/utilitarianexe/wiki_france_region_rename
Function overview: Renames regions in infoboxes for French department, commune, and arrondissement articles, to reflect France's consolidation of its regions as of 1 January 2016
Links to relevant discussions (where appropriate):https://en.wikipedia.org/wiki/Wikipedia:Bot_requests#New_French_regions_on_1_January https://en.wikipedia.org/wiki/User_talk:AHeneen#help_with_info_box_renaming https://en.wikipedia.org/wiki/Module_talk:Wikidata#Suggested_test_case:_New_French_Regions
Edit period(s): one time
Estimated number of pages affected: 30,000
Exclusion compliant (Yes/well will add the code tonight):
Already has a bot flag (Yes/No):
Function details: Fairly simple find-and-replace task. The bot first gets the list of all French department, commune, and arrondissement articles. It then searches for the appropriate infobox on each article in the list and reads from it the region the article is currently mapped to. That region is then mapped to the new region it belongs to. Because the regions are simply being consolidated, not rearranged, the new region name can simply replace the old one. An example of an already corrected article (these get skipped automatically by the bot) is https://en.wikipedia.org/wiki/Ard%C3%A8che; one that is not fixed yet is https://en.wikipedia.org/wiki/Lot_%28department%29. It was suggested to use Wikidata properties instead of a simple name replacement. This is a good idea but would require much more work, and many of the pages that need correcting do not even have Wikidata items yet. I plan on working on this too, but I think the simple name replacement should be done first. There may be cases where my regexes don't properly match the infoboxes of some articles. I plan on first running the program in a mode that checks for any of these cases, since it is not possible to do this manually. If any problems are found I will fix the code before I make any edits. But even this check requires looking at many articles, so I think I need approval first. This is my first time trying this, so please be patient.
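The replacement step described above could look roughly like the sketch below. It is not the code from the linked repository: the mapping is deliberately partial, and the parameter layout (`|region = ...` on its own line), the function name `rename_region`, and the sample text in the usage note are assumptions made for illustration.

```python
import re

# Partial, illustrative mapping of pre-2016 regions to the merged regions.
# A real run would need every merged region listed here.
OLD_TO_NEW = {
    "Rhône-Alpes": "Auvergne-Rhône-Alpes",
    "Auvergne": "Auvergne-Rhône-Alpes",
    "Midi-Pyrénées": "Languedoc-Roussillon-Midi-Pyrénées",
}

# Assumes the parameter sits on its own line, e.g. "|region = Rhône-Alpes".
REGION_PARAM = re.compile(
    r"^([ \t]*\|[ \t]*region[ \t]*=[ \t]*)(.*?)[ \t]*$", re.MULTILINE
)


def rename_region(wikitext):
    """Return wikitext with an old region name swapped for its new region.

    Values that are not exact keys of OLD_TO_NEW (including values that
    already carry a new name) are left untouched.
    """
    def swap(match):
        new_name = OLD_TO_NEW.get(match.group(2))
        if new_name is None:
            return match.group(0)  # unknown or already updated: skip
        return match.group(1) + new_name

    return REGION_PARAM.sub(swap, wikitext)
```

Because an already updated value is not a key in the mapping, running the function twice leaves the text unchanged, which matches the "auto skipped" behaviour described above. A dry run could simply compare `rename_region(text)` with `text` and log the pages that would change.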
Discussion
Hi, User:AHeneen brought your proposal to my attention. You do not need to change the regions in the commune infoboxes because that is done automatically, using the INSEE code. See for instance Largentière: the infobox contains the line "|region = Rhône-Alpes", but this is ignored because the infobox uses the first two numbers of the INSEE code ("07132") to determine it is in the (new) region Auvergne-Rhône-Alpes. I have already updated the regions in all the relevant department, arrondissement and canton articles (infobox and article text). What still needs to be done is change the regions in the article text for the communes. Maybe a bot can help there. But be careful, because not all references to an old region should be changed, for instance Alsace may refer to the traditional region, not the former administrative region. Markussep Talk 08:36, 13 January 2016 (UTC)
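The INSEE-based lookup described in this comment can be illustrated with a small Python sketch. The real logic lives in the infobox template, not in Python; the table below is a two-entry excerpt, and the function name `region_from_insee` is hypothetical.

```python
# Illustrative excerpt only: department code -> region name as of January 2016.
DEPARTMENT_TO_REGION = {
    "07": "Auvergne-Rhône-Alpes",                # Ardèche
    "46": "Languedoc-Roussillon-Midi-Pyrénées",  # Lot
}


def region_from_insee(insee):
    """Look up the region from the first two characters of a commune INSEE code.

    Simplification: overseas departments use a three-character prefix and are
    not handled here. Returns None for prefixes missing from the table.
    """
    code = insee.strip()
    if len(code) != 5:
        raise ValueError("expected a five-character INSEE code, got %r" % insee)
    return DEPARTMENT_TO_REGION.get(code[:2])
```

For the Largentière example, `region_from_insee("07132")` reads the prefix `07` and returns the new region regardless of what the `|region =` line says.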
- Context-sensitive changes are very tricky for bots. We can try to come up with a restrictive replacement, maybe, but (and I'm sorry to say this) a manually-assisted AWB job might be the way to go here. Needs a closer look to see how these articles are structured. — Earwig talk 08:44, 13 January 2016 (UTC)
- Thanks for the responses. Yikes, that must have taken some time to edit all those manually. But better in the end because you fixed it in the article text too. I was avoiding that because of the context problems. User:The Earwig I think you could still edit the text in all the commune articles if you restricted it to just the line at the top of the articles. Nearly all of them have the form "in the *** region" at the top of the article, and you could just skip the ones that don't exactly match that in the first sentence. I do agree though that changing it anywhere else would require manual checking. Let me know if you think that is a good idea to try and I will modify my code for that task. Kinda just trying to find a way to still use the code that I wrote. But if you don't think it is a good idea that is OK too; it was still good practice. Lonjers (talk) 21:03, 13 January 2016 (UTC)
- There's an important lesson to be learned here about work being spent on bots that later turn out to be unnecessary. It happens, although for your first task it's a bit unfortunate. We can give what you are proposing a shot. I am thinking of some additional conditions, like skipping articles that already include the new region name (which have likely been migrated already) or have "was" in the same sentence as the region-to-replace, but it's still tricky to get right. You might want to start by going through the relevant articles and building a list of which ones the bot would definitely change, so we can get a sense of the number of edits and do some spot-checks. — Earwig talk 22:29, 13 January 2016 (UTC)
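The restrictive replacement discussed in the last two comments could be prototyped along these lines. This is only a sketch: it works on plain text rather than wikitext with links, the sentence splitter is naive, and the function names are hypothetical. Anything that does not match exactly is skipped rather than guessed at.

```python
import re


def first_sentence(text):
    """Naive first-sentence split: up to the first period followed by whitespace."""
    match = re.search(r"(.+?\.)(\s|$)", text, re.DOTALL)
    return match.group(1) if match else text


def propose_edit(article_text, old_region, new_region):
    """Return the edited text, or None if the article should be skipped."""
    # Already mentions the new name: probably migrated by hand.
    if new_region in article_text:
        return None
    sentence = first_sentence(article_text)
    # "was" suggests the sentence refers to a former or historical region.
    if re.search(r"\bwas\b", sentence):
        return None
    phrase = "in the %s region" % old_region
    # Require exactly one exact match in the first sentence.
    if sentence.count(phrase) != 1:
        return None
    return article_text.replace(phrase, "in the %s region" % new_region, 1)
```

Running this over the candidate articles without saving anything, and listing only the pages where it returns a new text, would give the edit count and the spot-check list suggested above.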
- Did some checking and actually not very many of the articles match a standard template. In general it seems most of the articles don't even include the region in the text. The French Wikipedia versions of the articles usually do, but those seem to be updated already. I guess we should close this request for now. Still looking for little programming tasks to do on Wikipedia if you have any suggestions. Lonjers (talk) 23:19, 16 January 2016 (UTC)
- I did not know that the template ignored the Region parameter and used the INSEE number. Sorry for your wasted effort @Lonjers:. Changing the article links within the prose is a huge task and cannot be easily done with a bot, as mentioned above. Also, the region names are only temporary for a few months. The regional governments must choose a new name by 1 July and the national government then has until October to recognize or reject the new region names. Except for Normandy, all of the new region names in the prose of the articles must be changed again when the official name is approved. AHeneen (talk) 03:32, 14 January 2016 (UTC)
- No worries Lonjers (talk) 23:19, 16 January 2016 (UTC)
@Lonjers: Following comments above, do you wish to proceed with this BRFA in any manner? — HELLKNOWZ ▎TALK 15:55, 17 January 2016 (UTC)
- Let's wait to see how the discussion below with Rich goes. I probably do not want to proceed. But I will send you an update when I know for sure. Lonjers (talk) 22:44, 19 January 2016 (UTC)
- @Hellknowz: So I think my plan now is to use this to just remove the unused region name parameter from the articles. Should be simple to update the code to work like this. Let me know if you think this is a good idea. Sorry for being so long getting back to you. Lonjers (talk) 22:37, 25 January 2016 (UTC)
- Let me just say that the region hack is just that: a hack. Once the new names are finalised updating the infoboxen would be a good idea.
- Hmmm, can you explain why it would be a better solution to use a region name explicitly? Seemingly the template editors chose to do it this way for a good reason, as the template code looks pretty intense. I would be happy to use this to remove the now unused region parameters. But if there is a good reason, let me know and I will try to contact the people who made the template so we can change it to use the explicit region. Lonjers (talk) 22:44, 19 January 2016 (UTC)
- Let's suppose, for example, that someone, wittingly or unwittingly changes the INSEE. A random editor seeing the wrong region would be at a loss to fix it. A better solution might be to calculate the region and compare it with the given region, adding the article to a hidden tracking category if they don't match. It would also be better to encapsulate the region calculation in a reusable manner, such as {{French region name from INSEE code}}. All the best: Rich Farmbrough, 22:45, 20 January 2016 (UTC).
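The compare-and-track idea in this comment would normally be implemented in the template itself; the Python sketch below only illustrates the logic. The category name, the function name, and the one-entry lookup table are all hypothetical.

```python
# Illustrative excerpt: department code -> region name.
DEPARTMENT_TO_REGION = {"07": "Auvergne-Rhône-Alpes"}  # Ardèche

# Hypothetical name for the hidden tracking category.
TRACKING_CATEGORY = "Category:French commune infoboxes with mismatched region"


def tracking_category(insee, given_region):
    """Return the tracking category if the given region contradicts the INSEE code.

    Returns None when the two agree, when no region was given, or when the
    department prefix is not in the lookup table.
    """
    computed = DEPARTMENT_TO_REGION.get(insee.strip()[:2])
    given = given_region.strip()
    if computed is None or not given:
        return None
    return None if computed == given else TRACKING_CATEGORY
```

With this approach an accidental change to either the INSEE code or the region parameter surfaces in the category instead of silently changing what the infobox displays.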
- It certainly confused me how the template works. I think that removing the region param from the current pages would be a good first step. And then changing the template to include the new template you mentioned would make things much cleaner. That template could then be used in other places as well. Lonjers (talk) 22:37, 25 January 2016 (UTC)
- I also think it would be a perfect pilot task to fix the "Centre" to "Centre-Val de Loire" (in the French commune infobox) now, so maybe continue this BRFA on that basis?
- All the best: Rich Farmbrough, 21:20, 17 January 2016 (UTC).
- So it is actually not obvious to me how "Centre" gets in there instead of "Centre-Val de Loire". It is not something in the markup of each commune article in the region. It is somehow being generated incorrectly by the template. I am going to look into fixing this in the template. Or, if we do decide to explicitly add the region to each page's markup, the template should pull it from there, I think. @Rich Farmbrough: Lonjers (talk) 22:44, 19 January 2016 (UTC)
- The infobox should show "Centre-Val de Loire", not Centre. It does for the communes I checked, except the caption of the detailed map, that's corrected now (may take some time / null-edit to show). For the new regions, these maps are not available yet, and maybe we should change them for department maps instead (when available). Indeed the parameter fields "department" and "region" in the infoboxes are not used anymore, so they can be removed (but don't have to). Markussep Talk 09:19, 20 January 2016 (UTC)
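If the task is narrowed to removing the unused "region" and "department" fields discussed above, the core of it might be sketched as follows. This assumes each parameter sits on its own line and that the pattern is applied only to the infobox portion of the page; the function name is hypothetical.

```python
import re

# Parameters reported above as no longer read by the infobox.
UNUSED_PARAMS = ("region", "department")

# Matches a whole "|param = value" line, including its trailing newline.
PARAM_LINE = re.compile(
    r"^[ \t]*\|[ \t]*(?:%s)[ \t]*=.*\n?" % "|".join(UNUSED_PARAMS),
    re.MULTILINE,
)


def strip_unused_params(infobox_text):
    """Delete the unused parameter lines and leave everything else intact."""
    return PARAM_LINE.sub("", infobox_text)
```

A parameter that shares a line with other parameters would need a proper wikitext parser rather than a line-based regex, so such pages are better logged for manual review.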