User talk:Beland


This is an old revision of this page, as edited by A beautiful mind~enwiki (talk | contribs) at 18:32, 23 February 2006 (Perl coding). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This page has a backlog that requires the attention of its owner, who is working for a startup and a bit short on time at the moment.

/Praise /Notable

Feel free to leave a note on this page in the usual manner. When I reply, I usually move your comments onto your talk page, along with my reply, to keep the entire conversation together. I only keep stuff on this page if it requires further action from me, just to keep things tidy. Other memorable comments are moved to one of the above subpages. Feel free to copy or move the conversation back if you need to reply on the same topic, or just post your reply; I'll know what you mean, and I can always check the page history if memory fails me. -- Beland 22:47, 4 September 2005 (UTC)[reply]

I'm wondering why Pearle created this category. I had thought User:D6 created it, since I only noticed it after D6 had added Columbus Township, St. Clair County, Michigan to the category, which was otherwise empty. Anyhow, township categories are a bad thing, IMO, with the possible exception of some large charter townships that might have enough going on in them to warrant a category, like some cities have. olderwiser 03:37, Nov 17, 2004 (UTC)

It looks like a link from Columbus, Michigan to "Columbus_Township, St._Clair_County, Michigan" was interpreted as being a link to a county with the name "Columbus Township, St. Clair", so Columbus, Michigan was added to it. Then there was a second pass to add all Michigan county categories to "Category:Michigan counties", which by necessity creates a lot of new categories (because many only had articles assigned to them, not any intro text), so no warning was triggered.
I checked, and this is the only township category Pearle has created. I agree that they are unnecessary, as I wrote in the WikiProject Cities proposal. I will nominate this category for deletion, and make a note to modify the matching logic to avoid this problem in the future. Thanks for checking up on it. -- Beland 04:29, 17 Nov 2004 (UTC)

Stub sorting bot

I have put a proposal for a bot to help stub sorting at Wikipedia:Bot requests#Stub_sorting_bot. The task seems well suited to Pearle. Susvolans (pigs can fly) 17:14, 6 Jun 2005 (UTC)

Transwiki backlog

Hey Beland! I know what you mean. The transwiki backlog pains me every time I see it. Especially because I know from experience that I could spend all my time on Wikipedia transwikiing, but without a bot, it would grow faster than I could do the work (and it's tedious work either way). Unfortunately, my bot, McBot, was broken by the last software upgrade. So I don't have that. However, just yesterday I got Cryptic's transwiki script working and will be using that in the future (though currently I'm swamped with moving to college. My computer is in the mail as we speak). Though it's not as automated as the old bot, it is ultimately a better solution in that everyone could get it and do a little bit of the whole backlog. In case you didn't notice, I'm trying to sell it to you, too... :) So the short answer is I'm still working on them, though at a slower rate and with less time available at the moment, so we should spread the word and get others on it too if we can. Although now that I think about it, the transwiki log backlog is just as horrendous. Humph. Guess I'm not much help to you and I'm starting to ramble. But before I go, since I don't remember having the pleasure of talking to you much before, I just wanted to say that this Beland character shows up on my watchlist like every five minutes cleaning up the Wikipedia and keeping everything in order and tidy. You're a true WikiGnome (that's a good thing of course), just like I consider myself. So thanks on behalf of all of us. Dmcdevit·t 05:56, August 7, 2005 (UTC)


Graphics tutorials in the project namespace tagged for copying to Wikibooks

Please discuss the disposition of these articles at Wikipedia talk:Graphics in two modes/move. Uncle G 14:13:57, 2005-09-05 (UTC)

Request for addition to Pearle Bot

I notice that you are doing a lot with categories with Pearle Bot, please could I ask you to add to it title sorting feature for List categories - e.g. changing [[category:Rail transport related lists]] on articles starting List of blah to [[category:Rail transport related lists|blah]]. I requested this at Wikipedia:Bot requests#Title sorting on Category:Lists back in March, but nothing has come of it. Thryduulf 13:27, 8 September 2005 (UTC)[reply]
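A minimal Python sketch of the sort-key change requested above, for illustration only. The function name `add_list_sort_key` and the regular expression are assumptions made for this example; they are not taken from Pearle's source.

```python
import re

# Hypothetical helper (not part of Pearle): add a sort key to a category link
# on a "List of ..." page when that link has no sort key yet.
def add_list_sort_key(title: str, wikitext: str, category: str) -> str:
    prefix = "List of "
    if not title.startswith(prefix):
        return wikitext  # only "List of ..." pages are touched
    sort_key = title[len(prefix):]
    # Match [[category:Name]] with no "|sortkey" part; the namespace is
    # matched case-insensitively and its original spelling is preserved.
    pattern = re.compile(
        r"\[\[\s*(category)\s*:\s*" + re.escape(category) + r"\s*\]\]",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f"[[{m.group(1)}:{category}|{sort_key}]]", wikitext
    )
```

Links that already carry a sort key are left alone, so a run over pages that editors have sorted by hand should not overwrite their choices.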

Question about categories

(Replied on User talk:Carcharoth; see also Category talk:Natural hazards)

Hi there. I recently got interested in categories, and I spent some time reading this page [1] and the discussions on the talk pages, especially this one [2] and found this project page [3] and also a page about this category map [4]. All very interesting stuff, and I definitely want to get involved with categories a lot more. Part of the reason I am putting this on your talk page is that I found myself agreeing with a lot of what you said in those discussions. What I was hoping for was some feedback on what I did recently on some categories. In particular, the way I organised the Category:Natural hazards (I think I may have gone too far, especially with the awkward subcategory of Category:Biological hazards, plus I had to work with an existing awkward confusion between Natural disaster and Natural hazard). I also have ideas for similar organising of the Category:Disasters (which I haven't got round to doing yet), plus I also created and populated Category:Wildfires. Before going any further, I wanted to check that I'm not doing anything too wrong, so I'm putting this comment on the Talk pages of several people I saw participating in the discussions I read. I'm not too sure yet how these talk pages work either, so comments on the appropriate page might work better. Plus, is there an easier way to do this categorisation? It is sometimes a bit laborious! Carcharoth 19:51, 8 September 2005 (UTC)[reply]

Dump-based conversion

So I have a list of 11,610 articles that contain HTML entities (&foo; and &#XXX;) and URL-encoded characters (%NN) in links to other articles. Converting these to native Unicode characters would be convenient for me in the way that I analyze offline database dumps. I think links are also the most confusing place for weird characters to be for editors. Would you be able and willing to make use of this list to feed to Curpsbot-unicodify? I also have a list of 3,000 or so articles with links that have double, leading, or trailing spaces. Does the bot fix these cases as well?

In the long run, it seems like it would be useful to feed the bot a list of articles that need to be fixed, rather than trawling various categories for candidates. If you would like me to provide such lists (or some scripts to produce them from database dumps), let me know.

Thanks!

Beland 01:52, 18 September 2005 (UTC)
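For illustration, a rough Python sketch of the conversion described above: HTML entities and %NN escapes are decoded inside link targets only. This is not Curpsbot-unicodify's code; the function name and regular expression are assumptions, and the percent-escapes are assumed to be UTF-8.

```python
import html
import re
from urllib.parse import unquote

# Target of an internal link: the text after "[[" up to the first "|" or "]".
LINK_TARGET = re.compile(r"\[\[([^\[\]|]+)")

def unicodify_link_targets(wikitext: str) -> str:
    """Decode entities and percent-escapes in link targets, leaving labels alone."""
    def fix(match):
        target = match.group(1)
        target = html.unescape(target)  # &eacute; or &#233; becomes the character
        target = unquote(target)        # %C3%A9 becomes the character (UTF-8 assumed)
        return "[[" + target
    return LINK_TARGET.sub(fix, wikitext)
```

A real run would also need to reject results that decode to characters with special meaning in links, such as `|`, `[` or `]`; the sketch does not check for that.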

Sure, it would be useful to have that list, and I could run the bot over it. One problem, though, I don't use e-mail, so perhaps the easiest thing to do would be to dump it into the Wikipedia:Sandbox and then give me a link to the revision in question.
The bot doesn't currently fix leading, trailing, or double spaces in links (or double underscores), but could easily be modified to do so.
You are right that it would make more sense to use database dumps as a way of generating targets for the bot rather than trawling categories. I'm pretty sure I looked at a page once that had links to database dumps, perhaps you can point me to it. What are your scripts written in? -- Curps 05:13, 18 September 2005 (UTC)[reply]
Well, to avoid non-ASCII character conversion problems, I had Pearle upload the files directly. I'll delete them when they are no longer needed. You can edit User:Pearle/for-curps to get the weird-character list, and User:Pearle/for-curps2 to get the extra-spaces list. Database dumps are found at: http://download.wikimedia.org/wikipedia/en/
Further information on dumps is at Wikipedia:Database download. My scripts are written in Perl, though it's pretty trivial to write something that will search for a certain string in raw wikitext. As long as you don't mind downloading and storing a gigabyte or two.-- Beland 05:52, 18 September 2005 (UTC)[reply]
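As an illustration of how little is needed to search raw wikitext, here is a small Python sketch (the scripts mentioned above are in Perl and are not reproduced here). It assumes an XML dump in which each page has a `<title>` element on its own line, and the function name `pages_matching` is invented for this example.

```python
import re
from typing import Iterable, Iterator

TITLE = re.compile(r"<title>(.*?)</title>")

def pages_matching(lines: Iterable[str], needle: "re.Pattern[str]") -> Iterator[str]:
    """Yield each page title once if any line of that page matches needle."""
    current = None      # title of the page being read
    reported = False    # whether the current page was already yielded
    for line in lines:
        m = TITLE.search(line)
        if m:
            current, reported = m.group(1), False
            continue
        if current is not None and not reported and needle.search(line):
            reported = True
            yield current
```

Because it reads line by line, it can be pointed at an open file handle for a multi-gigabyte dump without loading it into memory, for example with a pattern such as `re.compile(r"\[\[[^\]|]*  ")` to find double spaces in link targets.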
The bot has now completed the list in for-curps2, but I avoided removing leading and trailing blanks in many cases because of the possibility of unexpected user-visible changes (for example:
text[[ link]]
which actually occurred in one article). Some of the Template:Infobox* templates had to be reverted because underscores can occur as the names of parameters, and inside templates these can occur within [[ ]]... I'll have to give some thought to that, either avoid processing templates entirely, or perhaps just avoid doing underscores within [[ {{{ }}} ]] -- Curps 01:41, 24 September 2005 (UTC)[reply]
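The cautious behaviour described above might look roughly like the following Python sketch. It is only an assumption about one way to do it, not the bot's code: runs of spaces inside a link target are collapsed, leading and trailing blanks are kept because removing them can change what readers see, and any target containing braces is skipped so template parameters are never touched.

```python
import re

# Target of an internal link: the text after "[[" up to the first "|" or "]".
LINK_TARGET = re.compile(r"\[\[([^\[\]|]+)")

def collapse_inner_spaces(wikitext: str) -> str:
    def fix(match):
        target = match.group(1)
        core = target.strip(" ")
        # Skip empty targets and anything that looks like a template parameter.
        if not core or "{" in target or "}" in target:
            return match.group(0)
        lead_len = len(target) - len(target.lstrip(" "))
        trail_len = len(target) - len(target.rstrip(" "))
        lead = target[:lead_len]
        trail = target[len(target) - trail_len:]
        # Collapse only interior runs of two or more spaces.
        core = re.sub(r" {2,}", " ", core)
        return "[[" + lead + core + trail
    return LINK_TARGET.sub(fix, wikitext)
```

Underscores are deliberately left out of the sketch, since they are the case that caused the infobox reverts.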


Citation issues

You may be interested in reference/citation content/format issues in Talk:Global cooling#Citation format poll (see preceding discussion) and Wikipedia:Requests for comment/SEWilco#Response. (SEWilco 06:00, 30 November 2005 (UTC))[reply]

And if you are interested, please be aware that it is rather more than this: it's a conduct dispute about SEW's behaviour; see the entirety of the RFC, and of course Wikipedia:Requests for arbitration/Climate change dispute 2/Evidence. William M. Connolley 09:31, 30 November 2005 (UTC).

Help portal

Looks promising; a definite improvement to the existing page. Keep up the good work! Dan100 (Talk) 13:24, 13 December 2005 (UTC)


admin accessible tasklist for continuously running Pearle?

Hi - K1Bond007 and I are babysitting WP:CFD while Kbdank71 and Who are temporarily unavailable. It has become painfully apparent that the cleanup tasks should really be automated. Who's been the primary cleanup dude for a while (doing it manually before he started using Whobot - which I think you're aware of). In any event, category renames can now be done semi-automatically by turning a renamed category into a categoryredirect whose contents are then recatted by AllyUnion's NekoDaemon. This is not too much different than using Pearle or Whobot, but NekoDaemon runs every hour or so (automatically). K1Bond007 and I have had a little chat about this, and it seems like it might be subject to some vandal misuse. I've suggested to AllyUnion that NekoDaemon also respond to something similar to categoryredirect for deleted categories, and he's a little resistant (preferring some sort of community consensus - perhaps related to the vandal misuse issue). What would you think about having a protected page with a tasklist for an essentially continuously running (alright, periodically running) Pearle? I'm not sure what server AllyUnion runs his bots on (could be one of the Wikipedia servers, but I really don't know), but I suspect he'd be amenable to creating an "at" job to run a copy of Pearle on a protected task list. Does this seem like a reasonable idea to you? -- Rick Block (talk) 05:14, 24 December 2005 (UTC)[reply]

Hmm. Actually, Pearle is already running on a daily cron job from my laptop, to refile things from Category:Wikipedia cleanup and update Template:Opentask. But these are commands I've specified manually. There are some security implications when people are able to give arbitrary text as commands for Pearle, especially given that her source code is published. I would probably be comfortable doing that after reviewing the code for security loopholes accessible via the command pathway. Though it might be a good idea to run Pearle as a separate user account (on the local Unix box) in case she is compromised. I suppose with a protected page, you'd be able to trust most people able to give input...but. Running such a bot under its own Wikipedia account (separate from Pearle) is also probably a good idea, since it will likely generate complaints about the categories being deleted and renamed from people unaware of the WP:CFD process, and the admins who do cleanup on WP:CFD should probably be the ones to deal with that.
Having people put a "kill me" template on a category they want to de-populate seems considerably more secure; you just need to recognize that it exists, rather than processing it as a command. Perhaps also slightly more reliable - there's little possibility of some kind of parse error causing havoc. The concern about vandal abuse is legitimate - or people accidentally confusing {{cfd}} with the "kill me" tag. But I don't see how it's really much different than the potential for abuse with redirects. It's also not that hard to undo any damage that a vandal does, given that we already have bots to recategorize articles en masse, and the changes would be automatically logged under the user contributions of the bot. One thing that Pearle does do is check to see if a category to be moved was properly tagged {{cfr}}, {{cfd}}, etc. She also now has history-parsing code, which we could recycle to see if the tag was added at least seven days ago, giving a warning message if it was not. That right there would probably stop most vandals. We should probably also be checking Category:Categories for deletion for categories that are not on WP:CFD anyway; that would close the next loophole. The only thing left would be categories that have been tagged, discussed, and closed as "keep" being tagged by someone who didn't agree with the decision. But that could easily happen now by someone editing the "delete me" section of WP:CFD in bad faith, so, feh.
I will need to publish some updates to Pearle before anyone clones her, so let me know if anyone wants to go ahead and do that. I don't know how much more I can help, because it really is starting to get to be crunch time for my move, and it sounds like my new job is going to start right away in January. Though I seem to have a vague recollection that NekoDaemon is implemented in Python, and I have no idea what mechanism consensus is going to settle on here. -- Beland 06:53, 24 December 2005 (UTC)
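The seven-day check suggested above could be sketched as follows in Python. This is an illustration under assumptions, not Pearle's history-parsing code: the revision history is assumed to be available as (timestamp, wikitext) pairs, oldest first, and the function name and tag prefixes are made up for the example.

```python
from datetime import datetime, timedelta

def tagged_long_enough(revisions, tag_prefixes, now, min_age=timedelta(days=7)):
    """Return True if a deletion/rename tag has been continuously present
    for at least min_age.

    revisions: list of (timestamp, wikitext) pairs, oldest first (assumed format).
    tag_prefixes: lowercase prefixes such as "{{cfd" or "{{cfr".
    """
    tagged_since = None
    for timestamp, text in revisions:
        lowered = text.lower()
        if any(prefix in lowered for prefix in tag_prefixes):
            if tagged_since is None:
                tagged_since = timestamp  # first revision of the current tagged run
        else:
            tagged_since = None           # tag removed; restart the clock
    return tagged_since is not None and now - tagged_since >= min_age
```

Restarting the clock whenever the tag disappears means a tag that was removed after a "keep" decision and re-added later counts only from the re-add, which is the case the warning is meant to catch.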
It's implemented using the pywikipedia framework, and so, yes, it is implemented in Python. I'm ashamed to admit that NekoDaemon has a gaping security hole regarding vandal abuse when it comes down to the {{categoryredirect}} template. I only asked about community consensus to get a better idea of what would be the best course of action for handling the situation, to prevent vandalism abuse that I may encode into NekoDaemon. Oh, and the server I happen to use is one of the Department of Computer Science at University of California, Riverside's servers. --AllyUnion (talk) 12:48, 25 December 2005 (UTC)
Hmm... hmm... I'd be inclined to use Special:Listusers/sysop unioned with Special:Listusers/bureaucrat as the basis for an approval list. --AllyUnion (talk) 12:55, 25 December 2005 (UTC)
I wrote a draft proposal for an automated cfd cleanup process, see Wikipedia:Categories for deletion/cleanup. Before making this too public I'm inviting comments from a few selected stakeholders. Please comment. -- Rick Block (talk) 17:11, 1 January 2006 (UTC)

Image:Skandinavism.jpg

Hi Beland. You've posted a question regarding the Image:Skandinavism.jpg. I've posted a reply on Image talk:Skandinavism.jpg. Merry Christmas. --Valentinian 21:58, 25 December 2005 (UTC)

USCOTW

We have revived the USCOTW. I am having trouble updating the template. It appears to pull from Wikipedia:U.S. Wikipedians' notice board/USCOTW/current, which I have updated. The update appeared on the WP:USCOTW page but not on the template page nor on Wikipedia:U.S. Wikipedians' notice board, and possibly not on other pages which use the template. I saved an edit to the main template with no changes and that seems to have fixed the problem (for now?). I saw that you made this current setup (with the article name in a separate template) back in September, and I am hoping you can help with this. Will the main template need to be resaved (with no changes) every time? Thanks! Cmadler 14:07, 3 January 2006 (UTC)


Help page

I've noticed that you are knowledgeable about help pages and the links to them. I've just done a major overhaul of the help page, and was wondering if you would go over it and check it for completeness. I'd really appreciate it. Thanks. --Go for it! 00:03, 13 January 2006 (UTC)[reply]

I am currently looking for help (a Perl script) to parse and separate red and blue links for various lists at Missing encyclopedic articles. I have made a request at Wikipedia:Computer_help_desk#New_cases, but it has languished for over a week. There was a script developed by Avar that does some of what we need, but it only works on a few of the many lists we are currently working on. Specifically I'm looking for something that would separate and sort a list like this:

  1. Link 1 External search for link 1 Comment about link 1 with link to another article]
    Nested comment about link
    Second nested comment
  • Wrongly nested comment
  1. Link2 Notice space has been removed
  2. Link3] Malformed links with [malformed link
  1. Link4 Renumbering because of space
  2. Link 5
  3. Link 6

into a list like this

Red links

  1. Link2 Notice space has been removed
  2. Link3] Malformed links with [malformed link
  3. Link4 Renumbering because of space

Blue links

  1. Link 1 External search for link 1 Comment about link 1 with link to another article]
    Nested comment about link
    Second nested comment
    Wrongly Fixed nested comment
  2. Link 5
  3. Link 6

This is a worst case example. The current script works well, but it evaluates link by link, not line by line and so [comments] about the link would be removed.

See separting reds and blues for more comments.

Your help would be greatly appreciated. --Reflex Reaction (talk)• 22:00, 17 January 2006 (UTC)
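To make the line-by-line idea concrete, here is a hedged Python sketch (the request was for Perl; this only shows the logic). It assumes the list is wikitext in which top-level items start with a single `#`, that every other non-blank line belongs to the item before it, and that the caller supplies the set of titles that exist. The names `split_red_blue` and `existing_titles` are invented for the example.

```python
import re

FIRST_LINK = re.compile(r"\[\[([^\[\]|]+)")  # first well-formed link target
TOP_LEVEL = re.compile(r"^#(?![#:*])")        # "#" not followed by #, : or *

def split_red_blue(lines, existing_titles):
    """Group lines into entries, then sort whole entries into red and blue."""
    entries = []
    for line in lines:
        if not line.strip():
            continue                      # blank lines only break numbering
        if TOP_LEVEL.match(line) or not entries:
            entries.append([line])        # start a new entry
        else:
            entries[-1].append(line)      # nested or wrongly nested comment
    red, blue = [], []
    for entry in entries:
        m = FIRST_LINK.search(entry[0])
        title = m.group(1).strip() if m else None
        # Entries with no parsable link stay with the red links for manual review.
        (blue if title in existing_titles else red).append(entry)
    return red, blue
```

Because each entry is moved as a unit, comments and nested lines travel with their link instead of being dropped, which is the failure described for the link-by-link script.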

History of Science

Please consider joining the History of Science WikiProject.--ragesoss 00:49, 28 January 2006 (UTC)

Good work

Good work on the Jihad draft, Beland. My only concern is the examples section, which I don't think is necessary. Events like that are already mentioned in several Wikipedia articles about terrorism. But it's a nice cleanup job, and minor edits can be made later. And I agree with you that the DOJ definition is not necessary. Thanks --a.n.o.n.y.m t 20:12, 5 December 2005 (UTC)

COTW Project

You voted for Male and Female, this week's Collaborations of the week. Please come and help them become featured-standard articles. -- King of Hearts | (talk) 23:08, 30 January 2006 (UTC)

Hey, I don't know if you're still watching this article or not, but could you take a look at it, if you have a chance some time, and let me know what more you think it needs? I'm probably going to submit it to peer review soon, but wanted to get a few more eyes on it before I do so. Thanks, MC MasterChef :: Leave a tip 00:00, 15 November 2005 (UTC)[reply]

KTVX

You have received this message because you have edited a Salt Lake City media article in the past. We have recently had an edit war regarding the wording and inclusion of a paragraph on the KTVX article. In hopes of resolving this I have put together an informal survey. If you are interested, please stop by Talk:KTVX and add a vote. Thanks, A 09:12, 7 December 2005 (UTC)[reply]

Good grief. Replied. -- Beland 02:56, 8 December 2005 (UTC)

History of Foo subcats

  • Make sure Wikipedia:Category clearly explains how to add and remove articles from categories, and how to create, delete, and rename categories.

Hi - I was taking a look at Wikipedia:Most referenced articles and it said to ask you if you'd like something extra done. I've seen you're very busy with other things, so this can go bottom of your to-do pile, but I was wondering if you could do a new list for Jan '06, excluding US Census data... the latter skew results (IMO) unfairly.

Sorry to be a pain, Deano (Talk) 23:12, 3 January 2006 (UTC)

  • I think the thing to do to avoid a million very-similar-but-slightly-different lists is to sort the list by topic: United States, Countries of the world, Cities, Political parties, Decades, Dates of the calendar year, etc. I'm pretty sure I can do this automatically by looking at what categories the articles are in. I will give this a try sometime, though probably not soonish, since I don't have much free time right now. Thanks for the suggestion, Beland 07:38, 12 February 2006 (UTC)[reply]
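One possible shape for the automatic topic sort mentioned above, as a Python sketch. Everything here is an assumption for illustration: the category-to-topic table would have to be built by hand, and the function name is made up.

```python
from collections import defaultdict

def group_by_topic(article_categories, topic_of_category, default="Other"):
    """Group articles under the topic of the first category that has one.

    article_categories: dict mapping article title -> list of category names.
    topic_of_category: hand-built dict mapping category name -> topic heading.
    """
    groups = defaultdict(list)
    for article, categories in article_categories.items():
        topic = next(
            (topic_of_category[c] for c in categories if c in topic_of_category),
            default,
        )
        groups[topic].append(article)
    return dict(groups)
```

Articles whose categories are not in the table fall into the default group, which gives a worklist for extending the table.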

Yeah I think you're probably right. If/when you do get round to doing something like that, please drop me a line. Cheers, Deano (Talk) 14:35, 12 February 2006 (UTC)

Perl programming

Hi. I'm a Perl coder and I got directed towards Wikipedia bots so that I could use my knowledge in this area. I've looked at Pearle and I think I could improve on it considerably. I think I could cut down the source code to around a fifth of its size and also make it faster. If there are any new feature requests, maybe I could take care of those too? I realise you're busy; that is why I'm offering some coding help. A beautiful mind 18:26, 22 February 2006 (UTC)

Nifty! I'm not sure Pearle is in much need of a speedup. There's a bot speed limit that prevents more than one edit every ten seconds. The part that is computationally intensive, OPENTASK_UPDATE, runs as a cron job, so I don't really notice how long it takes. I think most of the category-cleanup work is being done by other bots now (some of which are clones). If you want to work on it anyway, let me know, and I will publish some changes I've made since the last source code upload.
On the other hand, I have a collection of scripts that analyze Wikipedia database dumps which can take days to run, and which does have a number of outstanding feature requests. It also needs a bunch of cleanup and it would be nifty if it could run in a more automated fashion. It takes maybe 6-7 GB of hard drive space and at least 700MB of physical RAM to run. If you have insufficient resources to run it at home, you might get an account on the meta:Toolserver (which is unfortunately located in Germany and runs Solaris). There are many reports that various groups use to find and fix certain types of problems with articles - you can see some examples on Template:Active Wiki Fixup Projects. Keeping these reports up to date definitely helps make the editors that use them more productive. If you want to work on that, we should probably find some sort of CVS or Subversion repository - it would be nice if one were running on the Toolserver, but Sourceforge might also work.
Wikipedia:Bot requests is also in need of a lot of attention. There are many requests that the comments on the page indicate should not be filled, and these need to be archived. There are some excellent ideas that just need a programmer to implement them, and there are some that may or may not be a good idea, and need some discussion. I'm sure many people would be happy if some of the good requests there were filled. Feel free to borrow the wiki-editing bits from Pearle if you happen to want to build any new Perl-based bots; the code is not copyrighted. Also be sure to request permission on Wikipedia talk:Bots if you intend to actually use a bot to edit Wikipedia.
Let me know if you need any help getting started with whichever project you think is most worthy of your time. Thanks for your helpful interest! -- Beland 14:11, 23 February 2006 (UTC)[reply]
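For anyone building a new bot, the ten-second limit mentioned above can be respected with a small throttle. The following Python sketch is only an illustration (Pearle itself is Perl, and this is not her code); the class name is invented, and the clock and sleep functions are parameters so the class can be tested without waiting.

```python
import time

class EditThrottle:
    """Block until at least min_interval seconds have passed since the last edit."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_edit = None  # time of the previous edit, if any

    def wait(self):
        now = self.clock()
        if self.last_edit is not None:
            remaining = self.min_interval - (now - self.last_edit)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_edit = now
```

Calling `wait()` immediately before each save keeps the edit rate under the limit no matter how quickly the rest of the bot runs.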
I'd like to take a look at those scripts that analyze database dumps; I'm quite good at optimizing. Since I don't have a machine at home with that amount of RAM, I have started the application process for the Toolserver. I'll also take a look at bot requests, but I need to get more familiar with Wikipedia before I can understand the depths of Wikimedia-related topics. Since Pearle is working fine for your needs, I think I'll try to concentrate my efforts elsewhere first. A beautiful mind 18:32, 23 February 2006 (UTC)


Follow up