
Wikipedia:Bots/Requests for approval/Lonjers french region rename bot

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Lonjers (talk | contribs) at 22:44, 19 January 2016 (Discussion). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Operator: Lonjers

Time filed: 03:36, Thursday, January 7, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: https://github.com/utilitarianexe/wiki_france_region_rename

Function overview: Renames regions in the infoboxes of French department, commune, and arrondissement articles to reflect France consolidating its regions as of January 1, 2016

Links to relevant discussions (where appropriate): https://en.wikipedia.org/wiki/Wikipedia:Bot_requests#New_French_regions_on_1_January https://en.wikipedia.org/wiki/User_talk:AHeneen#help_with_info_box_renaming https://en.wikipedia.org/wiki/Module_talk:Wikidata#Suggested_test_case:_New_French_Regions

Edit period(s): one time

Estimated number of pages affected: 30,000

Exclusion compliant (Yes/No): Yes (well, I will add the code tonight)

Already has a bot flag (Yes/No):

Function details: A fairly simple find-and-replace task. The bot first gets the list of all French department, commune, and arrondissement articles. It then searches each article in the list for the appropriate infobox and finds the region the article is currently mapped to. That region is then mapped to the new region it belongs to. Because the regions are simply being consolidated, not rearranged, the new region name can simply replace the old one. An example of an already corrected article (these are skipped automatically by the bot) is https://en.wikipedia.org/wiki/Ard%C3%A8che; one that is not fixed yet is https://en.wikipedia.org/wiki/Lot_%28department%29. It was suggested to use Wikidata properties instead of a simple name replacement. This is a good idea but would require much more work, and many of the pages that need to be corrected do not even have Wikidata items yet. I plan to work on this too, but I think the simple name replacement should be done first. There may be cases where my regexes don't properly match the infoboxes of some articles. I plan to run the program first in a mode that checks for these cases, because there are far too many articles to check manually. If any problems are found, I will fix the code before making any edits. Even this check requires looking at many articles, so I think I need approval first. This is my first time trying this, so please be patient.
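A minimal sketch of the infobox find-and-replace step described above, in Python. It is not the code in the linked repository. Several parts are assumptions made for illustration: the parameter name `region`, the one-parameter-per-line layout, the hypothetical helper `rename_region`, and the partial `REGION_MAP` table (a real run would need every merged region). Only exact parameter values found in the table are replaced, so articles that already carry a merged name are left alone.

```python
import re
from typing import Tuple

# Hypothetical, partial map of pre-2016 regions to the provisional merged
# names; a real run would need an entry for every merged region.
REGION_MAP = {
    "Auvergne": "Auvergne-Rhône-Alpes",
    "Rhône-Alpes": "Auvergne-Rhône-Alpes",
    "Languedoc-Roussillon": "Languedoc-Roussillon-Midi-Pyrénées",
    "Midi-Pyrénées": "Languedoc-Roussillon-Midi-Pyrénées",
}

# Assumes the infobox parameter is named "region" and sits on its own line,
# e.g. "| region = Rhône-Alpes" or "| region = [[Rhône-Alpes]]".
REGION_PARAM = re.compile(
    r"^(\|[ \t]*region[ \t]*=[ \t]*)([^|\n}]*?)([ \t]*)$", re.MULTILINE
)


def rename_region(wikitext: str) -> Tuple[str, bool]:
    """Return (new_wikitext, changed). Unknown or already-updated values are kept."""
    changed = False

    def replace(match: "re.Match[str]") -> str:
        nonlocal changed
        value = match.group(2).strip()
        linked = value.startswith("[[") and value.endswith("]]")
        key = value[2:-2] if linked else value
        new = REGION_MAP.get(key)
        if new is None:
            return match.group(0)  # not an old region name: leave the line untouched
        changed = True
        new_value = f"[[{new}]]" if linked else new
        return match.group(1) + new_value + match.group(3)

    return REGION_PARAM.sub(replace, wikitext), changed
```

Matching the whole parameter value, rather than doing a plain substring replacement, matters here. A substring replacement of `Rhône-Alpes` would turn the already-updated `Auvergne-Rhône-Alpes` into `Auvergne-Auvergne-Rhône-Alpes`. The returned `changed` flag could also drive the check-only mode: run over every article without saving, and log the ones where nothing matched.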

Discussion

Hi, User:AHeneen brought your proposal to my attention. You do not need to change the regions in the commune infoboxes because that is done automatically, using the INSEE code. See for instance Largentière: the infobox contains the line "|region = Rhône-Alpes", but this is ignored because the infobox uses the first two numbers of the INSEE code ("07132") to determine it is in the (new) region Auvergne-Rhône-Alpes. I have already updated the regions in all the relevant department, arrondissement and canton articles (infobox and article text). What still needs to be done is to change the regions in the article text for the communes. Maybe a bot can help there. But be careful, because not all references to an old region should be changed; for instance, Alsace may refer to the traditional region, not the former administrative region. Markussep Talk 08:36, 13 January 2016 (UTC)
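The INSEE-based lookup described above can be sketched as follows. This only illustrates the idea and is not the template's actual code. The helper names and the two-entry department table are hypothetical. Metropolitan departments take the first two characters of the five-character INSEE commune code, including Corsica's `2A` and `2B`. Overseas departments use a three-digit prefix, such as `971` for Guadeloupe.

```python
from typing import Optional

# Hypothetical, partial department -> region table using the
# provisional 2016 region names.
DEPT_TO_REGION = {
    "07": "Auvergne-Rhône-Alpes",  # Ardèche
    "46": "Languedoc-Roussillon-Midi-Pyrénées",  # Lot
}


def department_code(insee: str) -> str:
    """Department part of a five-character INSEE commune code."""
    # Overseas departments (e.g. 971, Guadeloupe) use three digits;
    # metropolitan ones, including Corsica's 2A and 2B, use two characters.
    if insee.startswith("97"):
        return insee[:3]
    return insee[:2]


def region_from_insee(insee: str) -> Optional[str]:
    """Region for a commune, or None if its department is not in the table."""
    return DEPT_TO_REGION.get(department_code(insee))


print(region_from_insee("07132"))  # Largentière: prints Auvergne-Rhône-Alpes
```

Because the region is derived from the code, the stale `|region =` value in the article markup has no effect on what the infobox displays.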

Context-sensitive changes are very tricky for bots. We can try to come up with a restrictive replacement, maybe, but (and I'm sorry to say this) a manually assisted AWB job might be the way to go here. It needs a closer look to see how these articles are structured. — Earwig talk 08:44, 13 January 2016 (UTC)
Thanks for the responses. Yikes, editing all those manually must have taken some time. But it is better in the end, because you fixed the article text too; I was avoiding that because of the context problems. User:The Earwig, I think you could still edit the text in all the commune articles if you restricted the bot to the opening line. Nearly all of them say "in the *** region" at the top of the article, and the bot could just skip any article whose first sentence doesn't exactly match that form. I do agree, though, that changing it anywhere else would require manual checking. Let me know if you think that is worth trying and I will modify my code for that task. I'm kind of just trying to find a way to still use the code I wrote, but if you don't think it is a good idea, that is OK too; it was still good practice. Lonjers (talk) 21:03, 13 January 2016 (UTC)
There's an important lesson to be learned here about work being spent on bots that later turn out to be unnecessary. It happens, although for your first task it's a bit unfortunate. We can give what you are proposing a shot. I am thinking of some additional conditions, like skipping articles that already include the new region name (which have likely been migrated already) or have "was" in the same sentence as the region-to-replace, but it's still tricky to get right. You might want to start by going through the relevant articles and building a list of which ones the bot would definitely change, so we can get a sense of the number of edits and do some spot-checks. — Earwig talk 22:29, 13 January 2016 (UTC)
I did some checking, and actually not very many of the articles match a standard template. In general, most of the articles don't even seem to include the region in the text. The French Wikipedia versions of the articles usually do, but those seem to be updated already. I guess we should close this request for now. I'm still looking for little programming tasks to do on Wikipedia, if you have any suggestions. Lonjers (talk) 23:19, 16 January 2016 (UTC)
I did not know that the template ignored the region parameter and used the INSEE number. Sorry for your wasted effort, @Lonjers:. Changing the links within the article prose is a huge task and cannot easily be done with a bot, as mentioned above. Also, the region names are only temporary, for a few months. The regional governments must choose a new name by 1 July, and the national government then has until October to recognize or reject the new region names. Except for Normandy, all of the new region names in the prose of the articles will have to be changed again once the official names are approved. AHeneen (talk) 03:32, 14 January 2016 (UTC)
No worries. Lonjers (talk) 23:19, 16 January 2016 (UTC)
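Here is a sketch of the restrictive lead-sentence replacement discussed above. It combines the proposed conditions. Only the exact phrase "in the X region" in the first sentence is rewritten. An article is skipped if "was" appears in that sentence, or if a new region name already appears anywhere in the text. `dry_run` builds the list of titles the bot would change, so edits can be counted and spot-checked before any live run. The function names and the partial `REGION_MAP` are hypothetical, not code from the linked repository. The sentence split at the first ". " is naive and would misfire on abbreviations such as "St.".

```python
import re
from typing import Dict, List, Optional

# Hypothetical, partial old -> new region map (provisional 2016 names).
REGION_MAP = {
    "Rhône-Alpes": "Auvergne-Rhône-Alpes",
    "Midi-Pyrénées": "Languedoc-Roussillon-Midi-Pyrénées",
}


def first_sentence(text: str) -> str:
    """Naive split: text up to and including the first '. '."""
    end = text.find(". ")
    return text if end == -1 else text[: end + 1]


def propose_lead_fix(text: str) -> Optional[str]:
    """Rewritten article text, or None if the bot should skip the article."""
    sentence = first_sentence(text)
    if re.search(r"\bwas\b", sentence):
        return None  # may describe the past, so leave it for a human
    if any(new in text for new in REGION_MAP.values()):
        return None  # probably migrated already
    for old, new in REGION_MAP.items():
        name = re.escape(old)
        # Only the exact forms "in the Old region" and "in the [[Old]] region".
        match = re.search(r"in the (\[\[" + name + r"\]\]|" + name + r") region", sentence)
        if match:
            replacement = f"[[{new}]]" if match.group(1).startswith("[[") else new
            fixed = sentence[: match.start(1)] + replacement + sentence[match.end(1):]
            return fixed + text[len(sentence):]
    return None  # no exact match in the first sentence: skip


def dry_run(articles: Dict[str, str]) -> List[str]:
    """Titles the bot would edit, for spot-checks before any live run."""
    return [title for title, text in articles.items() if propose_lead_fix(text) is not None]
```

Every unclear case returns `None`, which errs toward skipping. That matches the concern that context-sensitive prose is better left to manual review.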

@Lonjers: Following comments above, do you wish to proceed with this BRFA in any manner? —  HELLKNOWZ  ▎TALK 15:55, 17 January 2016 (UTC)

Let's wait to see how the discussion below with Rich goes. I probably do not want to proceed, but I will send you an update when I know for sure. Lonjers (talk) 22:44, 19 January 2016 (UTC)
  • Let me just say that the region hack is just that: a hack. Once the new names are finalised, updating the infoboxen would be a good idea.
Hmm, can you explain why it would be better to use a region name explicitly? The template editors seem to have chosen this approach for a good reason, as the template code looks pretty intense. I would be happy to use the bot to remove the now-unused region parameters. But if there is a good reason, let me know and I will try to contact the people who made the template so we can change it to use the explicit region. Lonjers (talk) 22:44, 19 January 2016 (UTC)
  • I also think it would be a perfect pilot task to fix the "Centre" to "Centre-Val de Loire" (in the French commune infobox) now, so maybe continue this BRFA on that basis?
All the best: Rich Farmbrough, 21:20, 17 January 2016 (UTC).
So it is actually not obvious to me how "Centre" gets in there instead of "Centre-Val de Loire". It is not something in the markup of each commune article in the region; it is somehow being generated incorrectly by the template. I am going to look into fixing this in the template. Or, if we do decide to explicitly add the region to each page's markup, I think the template should pull it from there. @Rich Farmbrough: Lonjers (talk) 22:44, 19 January 2016 (UTC)