
Wikipedia:Bot requests/Archive 57


MySQL expert needed

The Wikipedia 1.0 project needs someone with experience working with large (en.wikipedia-size) MySQL databases. If interested, contact me or the WP 1.0 Editorial Team.— Wolfgang42 (talk) 02:19, 20 October 2013 (UTC)

Um, what specifically do you need help with? You'll probably get more offers of help if people know exactly what you need. Legoktm (talk) 02:24, 20 October 2013 (UTC)
Writing various queries to pull information out of the database. The problem is that the en.wikipedia database is very large, and I don't know enough about MySQL to be able to work with a dataset where queries can take weeks to complete. — Wolfgang42 (talk) 18:02, 20 October 2013 (UTC)
If queries take that long, something is wrong with the database configuration. More often than not you need to add and/or use indexes. If you provide the table structure and index list, queries shouldn't take more than a few hours at most. Werieth (talk) 18:08, 20 October 2013 (UTC)
Can you paste the queries? If you're running on Labs, there are different databases like revision_userindex which would be faster if you need an index on user. Legoktm (talk) 21:10, 20 October 2013 (UTC)
The query (being run on the Labs database) is:
SELECT page_title,
    IF ( rd_from = page_id,
        rd_title,
    /*ELSE*/IF (pl_from = page_id,
        pl_title,
    /*ELSE*/
        NULL -- Can't happen, due to WHERE clause below
    ))
FROM page, redirect, pagelinks
WHERE (rd_from = page_id OR pl_from = page_id)
    AND page_is_redirect = 1
    AND page_namespace = 0 /* main */
ORDER BY page_id ASC;

Wolfgang42 (talk) 22:53, 23 October 2013 (UTC)

The results from the second column seem odd, mixing varbinary & int data, and the OR in the WHERE clause doesn't help with the performance. What exactly are you wanting to get from the database? -- WOSlinker (talk) 23:39, 23 October 2013 (UTC)
You're right—I pasted an older version of the code; I've fixed it to be the title both times. (My mistake for not checking that I had the latest copy in version control.) This query is a direct translation of an agglomeration of perl, bash, and C code which was parsing the SQL dumps directly. What it's trying to do is find redirect targets by looking in the redirect table, and falling back to the pagelinks table if that fails.
I would suspect that the 3-way join isn't helping performance any either, but unfortunately it seems to be needed. If there's a better way to do this, I'd love to see it. — Wolfgang42 (talk) 02:30, 24 October 2013 (UTC)

Try this and see if it works any better. -- WOSlinker (talk) 06:00, 24 October 2013 (UTC)

SELECT page_title, COALESCE(rd_title, pl_title)
FROM page
LEFT JOIN redirect ON page_id = rd_from
LEFT JOIN pagelinks ON page_id = pl_from
WHERE page_is_redirect = 1
    AND page_namespace = 0 /* main */
ORDER BY page_id ASC;
Putting the EXPLAIN keyword in front of the query will return the execution plan, indexes used, etc. --Bamyers99 (talk) 19:40, 24 October 2013 (UTC)
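For anyone who wants to check the plan from a script rather than the SQL prompt, here is a rough sketch of running EXPLAIN on the rewritten query with Python and pymysql; the replica host, database name and credentials file below are placeholders, not verified Labs settings.

import pymysql

# Rough sketch only: the host, database and credentials file are
# placeholders, not verified Labs settings.
QUERY = """
EXPLAIN SELECT page_title, COALESCE(rd_title, pl_title)
FROM page
LEFT JOIN redirect ON page_id = rd_from
LEFT JOIN pagelinks ON page_id = pl_from
WHERE page_is_redirect = 1
  AND page_namespace = 0
ORDER BY page_id ASC
"""

conn = pymysql.connect(read_default_file="~/.my.cnf",
                       host="enwiki.labsdb",   # placeholder replica host
                       database="enwiki_p")
with conn.cursor() as cur:
    cur.execute(QUERY)
    for row in cur.fetchall():  # one row of the plan per table in the join
        print(row)
conn.close()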

Request for a bot for WikiProject Military history article reviews per quarter

G'day, WPMILHIST has a quarterly awards system for editors that complete reviews (positive or negative) of articles that fall within the WikiProject. So far we (the project coordinators) have done this tallying manually, which is pretty labour-intensive. We have recently included GA reviews, and are having difficulty identifying negative GA reviews using the standard tools. We were wondering if someone could build a bot that could tally all FA, FL, A-Class, Peer and GA reviews of articles that fall within WikiProject Military history? In terms of frequency, we usually tally the points and hand out awards in the first week after the end of each quarter (first weeks in January, April, July and October), but it would be useful functionality to be able to run the bot as needed if that is possible. Regards, Peacemaker67 (send... over) 23:36, 13 October 2013 (UTC)

If this comes up, lemme know, 'kay? Maybe some of the other projects might be able to use it as well. :) John Carter (talk) 23:43, 13 October 2013 (UTC)
Hi, could someone clarify if I am in the wrong place (ie is this not a bot thing)? Thanks, Peacemaker67 (send... over) 03:09, 18 October 2013 (UTC)
G'day all, is this something a bot could do? Peacemaker67 (send... over) 19:59, 25 October 2013 (UTC)

New REFBot - feedback on user talkpages

20:11, 17 October 2013 > message A reference problem

05:19, 18 October 2013 > message A reference problem

11:19, 22 October 2013 > message A reference problem Two replies

  • 11:28, 22 October 2013 Not me Squire!
  • 11:28, 22 October 2013 OOPS, was me -fixed! (This editor is a Senior Editor II and is entitled to display this Rhodium Editor Star.)

The discussion is at Wikipedia talk:WikiProject Australian Roads/Shields, but in summary, there are sets of images transferred from Commons to here as {{PD-ineligible-USonly}}. The user that moved the files (downloaded them from Commons, then uploaded them here) wants to remove his involvement due to potential legal issues in Australia. Under existing policy, revdel, oversight, and office actions are not appropriate. It was suggested that a bot could upload the same files under a different name and nominate the old ones for deletion per WP:CSD#F1. - Evad37 (talk) 06:42, 26 October 2013 (UTC)

Marking talk pages of Vital articles

Can someone make a bot to mark all talk pages of Vital articles (all levels) with {{VA}}, and fill out its parameters (level, class, topic) if possible? It should also remove such templates from non-VAs.

Ideally this should run on a regular basis, but even a one-off run would be very helpful. -- Ypnypn (talk) 18:48, 28 October 2013 (UTC)

Bot to tag "PROD Survivors" and "Recreated Articles"

In this first paragraph, I will summarize my request: It would be good if someone could please create a bot which tags articles which were PRODded but survived (I shall call these "Survivors"). And/or which tags articles which were PROD-deleted then recreated (I shall call these "Recreated Articles"). You may tag them with {{old prod full}}. You may leave all the template's parameters blank, or you may fill some in.

Rationale: Such tags warn us not to re-add another PROD tag. They also make it more obvious to us that perhaps we should consider nominating the page for WP:AfD.

Here are some things you could do, but which I don't recommend: You could download a database dump with history, parse it, and look for Survivors. But such a dump is over ten terabytes of XML once uncompressed.[1] You could download the dump of all logs, download the dump of all page titles, parse the two, and look for Recreated Articles. User:Tim1357 tried parsing a dump,[2] but he didn't succeed: the matter is still on the latest revision of his to-do list. I suspect it may not be worth attempting either of these difficult tasks.

Here is what I do recommend: It would be worthwhile to create a bot to watch Category:Proposed deletion and tag future Survivors. And to watch for new pages and tag Recreated Articles. User:Abductive suggests some optional refinements.[3]

It would be good if someone could please start writing a bot to do either or both of these tasks. It would be even better if they could provide us with a link to their code-in-progress. User:Kingpin13 and User:ThaddeusB have expressed interest,[4] but nobody seems to have actually written any code to do these tasks on the live Wikipedia.

User:Rockfang started tagging Survivors in 2008 using AWB (the wrong tool for the job) but later stopped. S/he wrote that s/he "got distracted".

AnomieBOT already does one related task. If an article is AfDed, then recreated, AnomieBOT's NewArticleAFDTagger task puts {{old AfD multi}} on that article's talk page. The task's open-source[5] code is here. Maybe you could build on it, and maybe you could even ask Anomie to run it for you. Dear User:Anomie: Do you know if you or any bot ever tagged the pages which were recreated in the years before you wrote your bot?

Cheers, —Unforgettableid (talk) 04:32, 16 October 2013 (UTC)

For the record, I'm a "he". :) Rockfang (talk) 05:17, 16 October 2013 (UTC)
I do not know of anyone who went back and tagged all articles that had ever been deleted through AfD.
I considered the recreated-after-prod tagging at one point. But the task would require keeping a list of every article that was ever prodded and then deleted without the prod tag being removed, which I didn't think was worthwhile. The AfD tagging is easier, since the bot can just look for Wikipedia:Articles for deletion/{{PAGENAME}}. Anomie 11:45, 16 October 2013 (UTC)
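For comparison, the AfD check described above boils down to a single page-existence test; a minimal sketch with pywikibot (an illustration only, not AnomieBOT's actual code):

import pywikibot

site = pywikibot.Site("en", "wikipedia")

def had_afd(title):
    """Return True if an AfD discussion page exists for this title."""
    afd = pywikibot.Page(site, "Wikipedia:Articles for deletion/" + title)
    return afd.exists()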
I have investigated and found that probably somewhere between 95% and 100% of PROD-deleted articles have the all-caps string "PROD" somewhere in their deletion logs. So, detecting Recreated Articles would be easier than you think. :) Cheers, —Unforgettableid (talk) 19:22, 16 October 2013 (UTC)
"somewhere between 95% and 100%"? Which is it? Anomie 21:04, 16 October 2013 (UTC)
Out of the couple dozen PROD-deleted articles I checked, each and every one had the string somewhere in their deletion logs. But my sample size was so small that I cannot claim with certainty that 100% of PROD-deleted articles have it in their logs. —Unforgettableid (talk) 00:40, 18 October 2013 (UTC)
Whether the number is 95%, 100%, or somewhere in between, searching for the string is quite easy and quite effective. ISTM it's the best way to identify Recreated Articles. Dear all: what do you think? —Unforgettableid (talk) 06:43, 4 November 2013 (UTC)
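A rough sketch of that deletion-log check, querying the API directly (the workflow is assumed for illustration; it is not an existing bot):

import requests

API = "https://en.wikipedia.org/w/api.php"

def prod_deleted_before(title):
    """Return True if any deletion log entry for the title mentions PROD."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "delete",
        "letitle": title,
        "lelimit": "max",
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    return any("PROD" in (entry.get("comment") or "")
               for entry in data["query"]["logevents"])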
I think the best way to handle it is to get the All articles proposed for deletion category periodically. If an article was in one iteration and not the next, it was either deleted or the tag was removed. --ThaddeusB (talk) 17:55, 4 November 2013 (UTC)
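That category-diffing approach could be sketched roughly as follows (pywikibot assumed, snapshot file name arbitrary; this is not ThaddeusB's actual implementation):

import json
import pywikibot

site = pywikibot.Site("en", "wikipedia")
SNAPSHOT = "prod_snapshot.json"  # arbitrary local file kept between runs

def run_once():
    cat = pywikibot.Category(site, "Category:All articles proposed for deletion")
    now = {page.title() for page in cat.articles()}
    try:
        with open(SNAPSHOT) as f:
            previous = set(json.load(f))
    except FileNotFoundError:
        previous = set()
    for title in previous - now:
        if pywikibot.Page(site, title).exists():
            print("PROD survivor:", title)   # candidate for {{old prod full}}
        else:
            print("PROD-deleted:", title)    # remember it, to catch recreation later
    with open(SNAPSHOT, "w") as f:
        json.dump(sorted(now), f)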
I have long intended to make a bot to tag PROD survivors... is anyone else planning on programming this? If not, I can try to get started on it next week. --ThaddeusB (talk) 19:25, 18 October 2013 (UTC)
Dear ThaddeusB: If you do end up writing such a bot, please do let us know. :) Cheers, —Unforgettableid (talk) 06:33, 4 November 2013 (UTC)
If so, please name it CattlePROD Bot! Headbomb {talk / contribs / physics / books} 17:34, 4 November 2013 (UTC)
Since no one else seems interested, I will try to get to this by the end of the week. --ThaddeusB (talk) 17:55, 4 November 2013 (UTC)

Redirects in templates after page moves

Per WP:BRINT, redirects are undesirable in templates. Currently after a page move, bots (bless their hearts) sweep up all of the broken or double redirects etc., but the links in templates are left untouched. For instance, a page was moved from here to here in January but the accompanying template was not updated until today. Is it possible for a bot to fix redirects on templates that are on a page that is moved? Rgrds. --64.85.216.235 (talk) 05:51, 4 November 2013 (UTC)

Possible order of priority here. Fixing the redirect may not be needed if the template is not actually on the page being moved. For example, if the Professional Fraternity Association changed its name to something else, it would leave a redirect in Template:Kappa Kappa Psi, since that has a link to the PFA, but it wouldn't need to be fixed as urgently since the PFA page doesn't include the Kappa Kappa Psi template. — Preceding unsigned comment added by Naraht (talkcontribs) 17:47, 4 November 2013 (UTC)
Yes, to reiterate, a bot that can fix redirects on the templates that are currently on the page that is moved is the priority. Templates that link to the article but are not on the moved page are not a priority. Rgrds. (Dynamic IP, will change when I log off.) --64.85.216.79 (talk) 20:55, 4 November 2013 (UTC)

Star Wars Bot needed?

Bot for Star Wars articles needed, maybe? Might help monitor changes. 20-13-rila (talk) 11:19, 5 November 2013 (UTC)

This is not a proper bot task request that can be implemented, especially without detail. What changes is it supposed to monitor? —  HELLKNOWZ  ▎TALK 11:17, 5 November 2013 (UTC)
I was thinking that it might help with the Star Wars WikiProject, which I am a member of. I am not sure if it is needed, which is why I would like to discuss it. 20-13-rila (talk) 11:35, 5 November 2013 (UTC)
May I suggest you discuss this with the project first and come up with a concrete proposal of what task(s) can be done and how. Without further detail, I doubt you will find many interested parties on this page (which is for requesting specific tasks). —  HELLKNOWZ  ▎TALK 11:37, 5 November 2013 (UTC)
Thank you, Rila 20-13-rila (talk) 09:32, 6 November 2013 (UTC)

We have WP:FANMP (a list of FAs yet to appear on the main page) and WP:WBFAN (a list of FAs and former FAs by nominator). Can someone think of a way to produce a hybrid for me, i.e. a list of FAs yet to appear on the main page by nominator? BencherliteTalk 20:30, 5 November 2013 (UTC)

Shutdown of blogs.amd.com

It seems that the articles have been moved to http://community.amd.com and http://developer.amd.com. I think all links to http://blogs.amd.com should be marked with {{dead link}} at least. Please fix them semi-automatically if you can. --4th-otaku (talk) 12:35, 4 November 2013 (UTC)

That's fine, I'll do them all quickly enough tomorrow. Rcsprinter (orate) @ 00:09, 6 November 2013 (UTC)
I can't seem to find any articles which link to blogs.amd.com. Rcsprinter (message) @
[6] There's not a whole lot. —  HELLKNOWZ  ▎TALK 17:12, 8 November 2013 (UTC)
Good find; it'll have to be tomorrow I run the thing. Rcsprinter (post) @ 22:00, 8 November 2013 (UTC)

Help needed tracking recent changes to medical content

User:Femto Bot used to populate Wikipedia:WikiProject Medicine/Recent changes, which in turn updated Special:RecentChangesLinked/Wikipedia:WikiProject Medicine/Recent changes. I think that's how it worked. It reported all changes to pages with {{WPMED}} on their talk page. Anyway, it was an awesome tool for patrolling some of Wikipedia's most sensitive content. But since Rich Farmborough was banned from bot work, it has stopped working - it only reports recent changes to pages beginning with "A".

This tool aims to do the same thing but it's slow and often times out, and when it works it's running a couple of days behind.

There was also Tim1357's tool, but his account has expired from the Toolserver.

I was wondering if somebody here would be able to provide WP:MED with something to replace these? With something like this a handful of experienced medical editors can effectively patrol all of Wikipedia's medical content. Without it, there's no telling what's happening. --Anthonyhcole (talk · contribs · email) 17:58, 4 November 2013 (UTC)

It appears that the source code for the bot is not available. I see you have attempted to contact him; if you are successful in getting the code, I will take it over and can run it if needed. However, I fear he will not be able to run it himself, due to ArbCom. --Mdann52talk to me! 13:23, 5 November 2013 (UTC)
Should be fairly trivial to write something like this up. Werieth (talk) 13:40, 5 November 2013 (UTC)
1. See VPT for current development.
2. I have asked for a module solution in Lua (the answer was negative for full automation)
3. I am putting a fresh page up manually right now.
4. I moved the RELC page to Wikipedia:WikiProject Medicine/List of pages/Articles for future automation and expansion (and because the old page name was incorrect).
Will be back later on. -DePiep (talk) 14:29, 5 November 2013 (UTC)
Yes. (Just curious: my page is 795k (28,391 articles); yours, without bullets, is 871k - does it use another source category? I started AWB for this, checking four cats deep.)
Now, the MED people are served and I have little time today & tomorrow, so I'll pick it up later. In short, this is my concept for the bot action:
  • A project editor can put a notice (template) on the project page. The template would be called something like "{{RELC list: please bot make some RELC lists for this project}}". Parameters are set for: |RELC list1 namespace1=Article [space] + Talk [space], |RELC list2 namespace2=Template + template talk, |other parameters like 1x/month=. The template is invisible, just like what User:MiszaBot/config does on talk pages to archive.
  • The bot sees the request and writes the list on a dedicated "RELC list" page (in its own section: say ==Pages==. The bot is not the only one that writes on that page).
  • Systematic page names are built like:
Wikipedia:WikiProject Medicine/List of pages (our top page)
Wikipedia:WikiProject Medicine/List of pages/Articles
Wikipedia:WikiProject Medicine/List of pages/Articles + Talks 0-9 A-M
Wikipedia:WikiProject Medicine/List of pages/Articles + Talks N-Z
Wikipedia:WikiProject Medicine/List of pages/Articles + Talks
Wikipedia:WikiProject Medicine/List of pages/Templates
Wikipedia:WikiProject Medicine/List of pages/Templates + Template talks
Wikipedia:WikiProject Medicine/List of pages/Non-articles
Wikipedia:WikiProject Medicine/List of pages/Non-articles + non-articles talks
The naming suggestion is: first, use namespace names in the plural; readers see this at the top of the RELC special page, so a natural page name is valuable. We also need codes for those "all non-articles" and "A-M" requests.
  • A template for the project page, now {{RELC list}}, will use these page definitions too (so we must agree on the names and other protocols), and produce the special links on a project page (as {{RELC list}} does for WP:MED now).
  • There are also other templates like {{RELC list/Listpage header}}
  • FYI, I built such a set, with list pages maintained manually, for WP:ELEMENTS at {{WikiProject Elements page lists}}.
  • Trick: the page should contain its own name, so the RELC reader sees: "page was updated on ...".
  • Trick: necessary off-topic pages, like the header template, would appear in the RELC view after edits (disturbing the view because they are not themselves on topic). I created and used a redirect, which does not change and so does not appear in the special view.
  • Will go writing on the WT:MED page now.
See you Thursday. -DePiep (talk) 16:50, 5 November 2013 (UTC)
I'm pretty sure that I can generate a list based on any criteria you need. User:Werieth/sandbox didn't use a category; rather, it used a list of all pages that had {{WikiProject Medicine}}. Defining how you want the lists generated should be doable; we would just need to define a template setup similar to User:MiszaBot/config. The important factor for getting this going is to clearly and simply define things. Break it down to the very basics of what you are looking for; don't factor in how something is done, just what you want done, and leave the how for me. Werieth (talk) 17:52, 5 November 2013 (UTC)
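For what it's worth, the list-generation step described above could be sketched like this with pywikibot; the banner and target page names follow the naming used in this thread, but the formatting and edit summary are assumptions, not a working bot:

import pywikibot

site = pywikibot.Site("en", "wikipedia")

def build_med_list():
    banner = pywikibot.Page(site, "Template:WikiProject Medicine")
    # The banner sits on talk pages (namespace 1); list the subject pages.
    titles = sorted(
        page.toggleTalkPage().title()
        for page in banner.embeddedin(namespaces=[1])
    )
    target = pywikibot.Page(
        site, "Wikipedia:WikiProject Medicine/List of pages/Articles")
    target.text = "\n".join("[[%s]]" % t for t in titles)
    target.save(summary="Updating page list for Related changes patrol")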
If you're asking what functionality WP:MED needs, I was very happy with Special:RecentChangesLinked/Wikipedia:WikiProject Medicine/Recent changes in terms of speed and features. --Anthonyhcole (talk · contribs · email) 18:44, 5 November 2013 (UTC)
re Anthonyhcole: The page you mention had its last update in 2012. That could be solved, of course, by updating it today. But there is also this: it was 35k in size, which means it listed only a small part of all the WP:MED pages. See this old version of that page. How small? Today the 'updated' page (named Wikipedia:WikiProject Medicine/List of pages/Articles; a big page) has 28,391 MED articles listed and is 700k. That means your page only listed 35/700 = 5%, or about 1,400 pages. It did not serve its purpose; one never saw that B-cell chronic lymphocytic leukemia (the first "B" page) was edited. A different check: today I tested the RELC workings with only the MED articles starting with "A" (2,500 pages, a 70k page). So the old page did not even have the "A" entries complete. It was missing 95% of its target. How was that a good feature?
About speed: opening the Special page to show the edits (WP:MED Articles - Related changes), the special page we want, is acceptably fast (for me). Anyway, we should not "improve" it by leaving MED articles out at random, should we? It is opening the big list page itself that is slow (700k). That is why I advise readers to leave that page alone and let only the RELC Special page read it (which is fast) to produce the desired overview.
If I am missing something, or mistaking your point, please tell me. User experiences (good and bad) are best reported at WT:MED. -DePiep (talk) 19:23, 5 November 2013 (UTC)
FYI, the tool has been renamed from "RELC list" to Page reports. -DePiep (talk) 21:51, 8 November 2013 (UTC)

The "OLAC" (Open Language Archives Community) website has consistently helpful pages about resources for the languages of the world, especially the endangered and lesser-taught languages. The OLAC pages use a URL which ends with a three-letter code from the ISO 639-3 language code list, which is found in our language articles infobox. Each OLAC page has a nice descriptive title at the top, such as OLAC resources in and about the Aguaruna language.

Rather than adding several thousand OLAC page links to the External links sections of language articles by hand, couldn't we just write a bot to do this?

I know some languages have multiple language codes in their Wikipedia infobox, due to multiple dialects or language variants. Even if the bot didn't add links for languages with multiple codes, it would still be a big time-saver!

What do you think? Djembayz (talk)
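As a rough illustration of what such a bot would add to each article (the language-archives.org URL pattern is an assumption and would need checking against the actual OLAC site; the helper name is hypothetical):

# Hypothetical helper; the URL pattern below is assumed, not confirmed.
def olac_external_link(iso_639_3_code, language_name):
    url = "http://www.language-archives.org/language/" + iso_639_3_code
    return "* [%s OLAC resources in and about the %s language]" % (url, language_name)

# e.g. olac_external_link("agr", "Aguaruna") for the example mentioned above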

If you look at ǂKx'ao-ǁ'ae you'll see that it already has a language infobox, and that includes a link off to an external site. Why not modify the infobox to add the language-archives.org link? Josh Parris 00:10, 9 November 2013 (UTC)

Without getting too deep into tinfoil territory, encryption is one of many essential steps to ensure readers' privacy. Since October 24, 2013, the Internet Archive has used HTTP Secure (https://) by default [8]. Just this week they updated their server software so it can handle TLS 1.2, the latest version. It is safe to say they encourage their visitors to access their site over an encrypted connection.

In my opinion, Wikipedia should support this effort and switch all outgoing links to the Internet Archive to HTTPS. According to Alexa, Wikipedia currently ranks fourth among upstream sites to archive.org [9]. {{Wayback}} was already updated in that regard, but most of the links to the Wayback Machine are implemented in one of the many citation templates as encouraged at WP:WBM. I started to fix a lot of those links manually, before realizing it would be a perfect job for a bot.

The Wayback Machine links have a common scheme, e.g. https://web.archive.org/web/20020930123525/http://www.wikipedia.org/. So the task is this: find http://web.archive.org/web/ throughout the article namespace and replace with https://web.archive.org/web/. That's it. --bender235 (talk) 20:51, 8 November 2013 (UTC)
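For reference, a minimal sketch of that replacement in Python (just the text transformation, not a full or approved bot task):

import re

WAYBACK = re.compile(r"http://web\.archive\.org/web/")

def fix_wayback_links(wikitext):
    """Switch Wayback Machine links from http to https."""
    return WAYBACK.sub("https://web.archive.org/web/", wikitext)

print(fix_wayback_links(
    "http://web.archive.org/web/20020930123525/http://www.wikipedia.org/"))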

See WP:NOTBROKEN and WP:COSMETICBOT Werieth (talk) 21:08, 8 November 2013 (UTC)
This is not a cosmetic change. It's not like switching http://archive.org/ to http://web.archive.org/, which would indeed change nothing. But switching to https changes the transport mechanism, from unencrypted to encrypted. Even though it looks simple, it has significant consequences. --bender235 (talk) 21:17, 8 November 2013 (UTC)
In this case, changing the transport protocol doesn't make much of a difference. No data other than the page contents (which can easily be retrieved via both secure and non-secure methods) is being transmitted, so the risk from intercepted data is nil. All it would do is generate a false sense of security. If you really think it should be done, you might look into a Lua replacement module that can be plugged into the citation templates. Werieth (talk) 21:27, 8 November 2013 (UTC)
Lua is a most clueful suggestion. Josh Parris 23:51, 8 November 2013 (UTC)
I agree with Werieth, this is a huge number of edits (over 160,000 in mainspace) for something that's not broken. If we really want to do this, it would be better as something like a low-priority (only done in combination with more significant changes) change in another tool like AWB. Mr.Z-man 21:22, 8 November 2013 (UTC)
Okay, I'll do that. --bender235 (talk) 23:30, 8 November 2013 (UTC)
I think you've misunderstood. I said it should be done only in combination with more substantial changes. I certainly wasn't saying to go and make 160,000 edits with AWB. Mr.Z-man 00:34, 9 November 2013 (UTC)
I won't do that. I just added it to the typo-fixing scan I do regularly anyway. --bender235 (talk) 00:36, 9 November 2013 (UTC)
Note to everyone: I started a discussion on this over at Village Pump. --bender235 (talk) 10:40, 9 November 2013 (UTC)

Adding ISCO (international), SOC (US) and NOC (Canada) job codes to professions infoboxes

Would it be possible to import those three standardized codes into the professions infoboxes? --Teolemon

Importing CNP Codes (Quebec and Canada)
Importing International Standard Classification of Occupations Codes (International)
  • International Standard Classification of Occupations (en)
  • Standard International Classification code for jobs. "ISCO is a tool for organizing jobs into a clearly defined set of groups according to the tasks and duties undertaken in the job."
  • XLS Structure of those codes: http://www.ilo.org/public/english/bureau/stat/isco/index.htm
  • Value for Librarian is: 2622 (Librarians and related information professionals)
Importing SOC Codes (US)

The individual occupation items don't yet have any SOC codes associated with them, but they are in broad occupation categories on enwiki, which should make them easier to match:

Here's the list of SOC codes for matching with the existing items.

— Preceding unsigned comment added by 2A01:E35:2EA8:950:5BF:1AF3:3374:F3D0 (talkcontribs)