User talk:John of Reading
This is John of Reading's talk page, where you can send him messages and comments.
Archives: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
Auto-archiving period: 21 days
Making a dent
Hi, hopefully I'm making a dent in the number of articles with typos. How many articles do you think have typos? Sun Creator(talk) 23:46, 5 November 2019 (UTC)
- @Sun Creator: That's a difficult one! There are some numbers at Wikipedia:Typo Team/moss#New statistics, but that approach does not attempt to count multi-word typos such as "to became". Overall, I think we're making progress; it's getting much harder to find common typos. Some years ago, each time I downloaded a fresh database dump I'd find hundreds of "the the" errors; now I find about 30. -- John of Reading (talk) 07:13, 6 November 2019 (UTC)
- An AWB typo scan through a random list of 1000 articles gives matches in about 120 articles, or 12%. Scaling that to the current EN article count of 6,938,158 gives about 720,000 articles to be done, quite a bit more than the statistics above. Sun Creator(talk) 16:48, 16 November 2019 (UTC)
- What fraction of these matches are just the quote-straightening style rule? -- John of Reading (talk) 17:37, 16 November 2019 (UTC)
- About 100,000 or so according to this, although I checked a small sample and found that 50% are quote-straightening ’s → 's. That does not include the similar rules: isn´t → isn't, haven´t → haven't, they’re → they're, don’t → don't, etc. Sun Creator(talk) 20:19, 16 November 2019 (UTC)
- I downloaded a copy of the Wikipedia text of 20191121, then used the database scanner to search for ’s (the incorrect apostrophe type), and it quickly hit a 30K article limit. Based on the % done, the total number of articles is nearer 500K than 100K. I'm reluctant to continue correcting them in bulk: although I've not had any complaints, some edits do get reverted as "unnecessary". Sun Creator(talk) 18:56, 23 November 2019 (UTC)
- @Sun Creator: The 30K limit is configurable on the "Searching" tab of the database scanner, though if the true figure is 500K it will probably run out of memory somewhere and crash. If the quote straightening rules get moved into the general fixes (phab:T231012) then it is possible that a bot might get approved to do those. -- John of Reading (talk) 19:46, 23 November 2019 (UTC)
- Scanned again with only main/article space. RAM was easy, requiring less than 4 GB. 571440 matched articles. Some would be in parts of an article that are off limits to AWB/T, such as within references and quotes, but it is still a lot. Sun Creator(talk) 20:47, 23 November 2019 (UTC)
(?<=\w)[´ˈ׳᾿‘’′Ꞌꞌ`;]s\b(?<!'\w[´ˈ׳᾿‘’′Ꞌꞌ`;]s|&[#\w]{1,99};s)
Now scanned with the "'s" regex rule above and matched 558380 main/article space articles. Sun Creator(talk) 21:56, 23 November 2019 (UTC)
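As a rough illustration of what a scan like this does, here is a minimal Python sketch. It is not the AWB rule itself: Python's `re` module does not accept the variable-width lookbehind that the rule above uses for its exceptions, so the pattern below keeps only the core match (a word character, a non-straight apostrophe, then "s") and covers fewer apostrophe variants. The names `CURLY_POSSESSIVE`, `straighten_possessive` and `count_matching_articles` are made up for the example.

```python
import re

# Simplified, assumed pattern: a word character, then one of a few
# non-straight apostrophe characters, then "s" at a word boundary.
# The exceptions in the real rule's trailing lookbehind are NOT reproduced.
CURLY_POSSESSIVE = re.compile(r"(?<=\w)[´‘’′`]s\b")


def straighten_possessive(text: str) -> str:
    """Replace the matched apostrophe variants with a straight 's."""
    return CURLY_POSSESSIVE.sub("'s", text)


def count_matching_articles(articles) -> int:
    """Count how many article texts contain at least one match."""
    return sum(1 for text in articles if CURLY_POSSESSIVE.search(text))


if __name__ == "__main__":
    print(straighten_possessive("John’s book"))
    print(count_matching_articles(["Anna’s cat", "nothing to fix"]))
```

Because the exceptions are missing, a sketch like this would over-count compared with the real rule, and it makes no attempt to skip references or quotations.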
- John, I know you keep lists of typos in some way, so I've pasted a list of 662 typos that I checked and corrected in November. It might be useful to check these typos again after giving them a few years to attract occurrences in articles. Sun Creator(talk) 11:05, 26 November 2019 (UTC)
- @Sun Creator: OK, I've copied most of those to my "to do" list. If I continue with my current editing pattern, that means I'll look at them once sometime in the next two years (James 4:13–16) -- John of Reading (talk) 07:25, 27 November 2019 (UTC)
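The scaling earlier in this thread (about 120 matches in a random sample of 1000 articles) can be sketched as below. This is only an illustration: the function name `extrapolate` and the normal-approximation interval are additions for the example, not something AWB reports. The article count is left as a parameter, since 12% of 6,938,158 comes to about 833,000, whereas the 720,000 quoted corresponds to a count of roughly 6 million.

```python
import math


def extrapolate(sample_hits: int, sample_size: int, total_articles: int, z: float = 1.96):
    """Scale a sample match rate up to the whole wiki.

    Returns (point_estimate, low, high), where low/high come from a simple
    normal-approximation interval on the sample proportion (an assumption
    added for this sketch).
    """
    p = sample_hits / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return tuple(round(x * total_articles) for x in (p, low, high))


if __name__ == "__main__":
    # 120 matches in 1000 articles, scaled to two possible article counts.
    print(extrapolate(120, 1000, 6_000_000))
    print(extrapolate(120, 1000, 6_938_158))
```

With a sample of only 1000 articles the interval is fairly wide, which is one reason the full database scan gives a more reliable figure than the extrapolation.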
ArbCom 2019 election voter message
Vandalism on Sambandam
Semi-protection: High level of IP vandalism. (Revert and protect)
- Aravindddd (talk · contribs · deleted contribs · nuke contribs · logs · filter log · block user · block log) Clearly this sock-puppet account was created only to vandalize this article. The user is here to disrupt the article, not to contribute, and is repeatedly disrupting the original work Sambandam. — Preceding unsigned comment added by 122.167.192.67 (talk) 04:44, 20 November 2019 (UTC)
- Please use the article talk page to discuss disagreements over the text of the article. I am not an administrator here, and cannot protect the article or block any editor. -- John of Reading (talk) 07:27, 20 November 2019 (UTC)
Blackadder Clan
John, sorry if this is the wrong thread for compliments and gratitude, but thanks for your incredible work on the Blackadder clan’s esteemed Scottish heritage and members. I’m doing some extensive research for a dear friend and descendant here in the USA, and am flabbergasted by his apparently genetic passion for his beliefs, undeniable fortitude, philosophical Christian warrior/adventurer nature, and passionate defiance against injustice imposed by authority, especially in the area of freedom of speech.
Now I just need to find the ‘Sailor Blackadder’ as my friend is a proud Navy veteran. I’ll keep reading!
Your work is wonderful, a gift to history and all future generations, and he will be greatly inspired by his heritage, because of your gift. Organized, well-researched & cited & shared knowledge is truly priceless. We are infinitely grateful to you. Thank you.
Sorry again if this is the wrong forum. Hugs from here... Suzanne. Swarden8 (talk) 12:14, 20 November 2019 (UTC)
- @Swarden8: Pages like this "user talk page" are exactly the right place to leave messages for particular editors. Thank you, indeed, for your good wishes - but I think you have thanked the wrong editor! I have edited several articles about Scottish clans and clan members, but only to fix a few spellings and grammar errors. If you look through the "page history" of these articles you will be able to find the names of the editors who did the research and contributed most of the content. -- John of Reading (talk) 12:34, 20 November 2019 (UTC)
- Ah, I see. I will be sure to thank them as well. Thanks so much for the pointer! Please be well! Swarden8 (talk) 13:58, 20 November 2019 (UTC)
Disambiguation link notification for November 21
An automated process has detected that when you recently edited Open-fields doctrine, you added a link pointing to the disambiguation page Osing (check to confirm | fix with Dab solver).
(Opt-out instructions.) --DPL bot (talk) 07:29, 21 November 2019 (UTC)
- Fixed -- John of Reading (talk) 07:44, 21 November 2019 (UTC)
A to An false positive list
I've now added to WP:AWB/T most of the false positives that you provided many moons ago. I didn't do EBU because I think people would say "E-B-U", nor AU$, because that seems to be an Australian dollar, so I'm not sure how that can be a false positive. I notice you are using some 'A to an' regex in find-and-replace with w-links; for example, here you made the replacement a Independent → an Independent. Very good indeed. Have you got the regex working on more complex ones, where the second part of the w-link matches but not the first, like 1 and 2? If not, here is the regex that I'm currently using. You can copy and paste it into the AWB 'Find and replace' option, although the XML display doesn't work for the double quote yet, even though it is sometimes required. Notice it got used here and here. I recently noticed in your userboxes that you are asm-5! A rare thing these days. I can just about read asm (although slowly), but never got to write anything beyond the basics. Regards, Sun Creator(talk) 01:14, 23 November 2019 (UTC)
- @Sun Creator: Good morning!
- I included EBU as a false positive because an editor writing "a EBU" might be expecting readers to pronounce it as "a European Broadcasting Union". But I have no objection to changing that to "an EBU".
- The problem with AU$ is that "a AU$7 purchase" is probably meant to be read out as "a 7 dollar purchase", so needs "a", whereas "AU$8" would need "an".
- I have two regular expressions for A to An, one for wikilinks and one for numbers. I don't currently have one for plain text, as that's covered by the rules in WP:AWB/T.
\b(a)\b(?<!\]\]a)(?<!\b(?:class|division|double|grade|group|homage|homenaje|jr\.|junior|list|model|serie|single|triple|type|vitamin)\s+[´’'‘`]?a)(?<!(?:\-|\&|[a-zi]['’‘]|a\.k\.|U\.S\.)[Aa])(\s+\[\[([^\[\]\|]+\|)?(?:a(?!aa)|e(?!(u|we))|i|o(?!ax|bra|cho|d\b|f\b|ggi|kol[íi]e?\b|mr|nce|ne(\b|[a-fhj-qs-z0-9]|r[a-np-z])|rfu\b|opa|rasului|ra[s?]ului|ui)|u(?=(?-i:[a-z]))(?!ga[ln]|k|na(\b|n|r)|nes|ni([^m]|mo|\b)|[rst][aeiou]|vula))[^\[\]\|]*\]\]\w*)
\b(a)(?=\s+(?:11|18|8))(?<!\]\]a)(?<!\b(?:Büyükçakır|jusqu|Sana|Shi)[´’'‘`]a)(?<!\b(?:autoroute|bundesautobahn\s+\d+|Bundesstraße\s+\d+)\|a)(?<!\b(?:a\.k)\.a)(?<!\b(?:F|N)/A)(?<!\b(?:Applied\s+Physics|autobahn|Bantam|Bundesautobahn|Canzona|chega|chlorophyllide|Chromatogr\.?|Chromatography|circular|class\.|Cod\.|Concerto|Crucifixus|Curlew|Divertimento|Divizia|esquina|Galaxy\s+Tab|harmonica|Junior|Magnificats?|Messa|Messe|Miserere|Missa|NZR|Pater\s+Noster|Physics\s+Letters|Phys(?:ical)?\.?\s+Rev(?:iew)?\.?|preludio|Q\s*&|Royal\s+Society[\"\',]*|Sci\s+Series|Section|Série|Te\s+Deum|uitată|y|\d\d|\d\d\d\d)\s+a)(?<!\b(?:id|pages?|volume|type_strain)\s*=\s*a)(?!\s+1[18]\d\d(?<!00)(?:\s+| |-)(?:acre|cc|ft|ton)\b)(\s+\d+)(?!\d)(?<!\b1[18]\d(?:\d\d\d)*)(?!\s+(?:anni|años|ans|autobahn|autoroute|Buckhurst|de|del|et|éves|Hornet|Interceptor|la|las|le|los|millones|voces|voci|y)\b)(cc|hp|K|km|m|mAh|mhz|mm|nd|nm|rd|s(?<=\b\d\d\d0s)|sq|st|th)?\b(?![ \(\)\.\,\;\-\'\"\+\&\w\d]*\.(?i:(?:gif|jpe?g|ogg|ogv|pdf|png|svg|tiff?|webm))\b)([\d\.\,%]*)(?<!\b(?<!trans-)title\d*\s*=[^\|\{\}]{0,255})
- These still occasionally give false positives, of course, and I developed them so long ago that I'd have trouble working out what all the exceptions are trying to catch.
- Quote marks are partially handled by [´’'‘`] at various points, but I see that I've failed to allow for a " double quote mark and for bold/italic markup.
- The left hand side of a piped link is skipped by ([^\[\]\|]+\|)? - this works as a "Find and Replace" rule, but will never work as a WP:AWB/T rule because AWB removes wikilinks from the text before running the typo-fixing rules.
- I have rules for "An to A", both in plain text and with wikilinks. They still have many false positives, since "an" turns up a lot in foreign-language text, and every correction needs careful review, since maybe 20% of the time the "an" needs to be corrected to "and" not "a".
- "asm-5", yes, but that was a few decades ago now... -- John of Reading (talk) 07:35, 23 November 2019 (UTC)
- Thank you! Always fascinating to me, to look at new code. Those rules deal mostly in exceptions when numbers are in the text. I think you have another rule that did this edit, as neither of those posted above seems to apply. The A to An rule I'm working on deals with letters rather than numbers. There are still more exceptions and even new words to add, for example a Xbox -> an Xbox, which is not currently dealt with. Sun Creator(talk) 22:05, 24 November 2019 (UTC)
- The first rule here did make the Stephen Donnelly edit - I should have mentioned that these rules, like most of my rules, have "case sensitive" left unticked. -- John of Reading (talk) 22:16, 24 November 2019 (UTC)
- Thanks, that explains it. I was incorrectly assuming compatibility with WP:AWB/T. Sun Creator(talk) 10:49, 25 November 2019 (UTC)
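For anyone following the piped-link point in this thread, here is a heavily simplified Python sketch of the idea. It is not either of the rules posted above: those are .NET regexes whose variable-width lookbehinds Python's `re` module rejects, and none of their exceptions are reproduced here. The pattern only treats labels starting with a, e, i or o as vowel sounds, and the names `A_BEFORE_LINK` and `fix_a_to_an` are invented for the example. Matching is case-insensitive, in line with "case sensitive" being left unticked.

```python
import re

# "a", whitespace, then a wikilink. The optional group skips the left-hand
# side of a piped link, so the visible label decides whether "an" is needed.
A_BEFORE_LINK = re.compile(
    r"\b(a)(\s+\[\[(?:[^\[\]|]+\|)?[aeio][^\[\]|]*\]\])",
    re.IGNORECASE,
)


def fix_a_to_an(text: str) -> str:
    """Change "a" to "an" before a wikilink whose label starts with a, e, i or o."""

    def repl(match: re.Match) -> str:
        # Preserve the capitalisation of the original article.
        article = "An" if match.group(1) == "A" else "an"
        return article + match.group(2)

    return A_BEFORE_LINK.sub(repl, text)


if __name__ == "__main__":
    print(fix_a_to_an("He stood as a [[Independent politician|Independent]] candidate."))
    print(fix_a_to_an("She bought a [[Apple Inc.|company]] share."))
```

In the second call the link target starts with a vowel but the label does not, so the text is left alone. Without the long exception lists, a sketch like this would still produce false positives such as "a [[one-off]]", so every change would need review.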
Google Code-In 2019 is coming - please mentor some documentation tasks!
Hello,
Google Code-In, a Google-organized contest in which the Wikimedia Foundation participates, starts in a few weeks. This contest is about bringing high school students into the world of open source. I'm sending you this message because you recently edited a documentation page at the English Wikipedia.
I would like to ask you to take part in Google Code-In as a mentor. That would mean preparing at least one task (it can be documentation related, or something else - the other categories are Code, Design, Quality Assurance and Outreach) for the participants, and helping the student to complete it. Please sign up at the contest page and send us your Google account address to google-code-in-admins@lists.wikimedia.org, so we can invite you in!
From my own experience, Google Code-In can be fun, you can make several new friends, attract new people to your wiki and make them part of your community.
If you have any questions, please let us know at google-code-in-admins@lists.wikimedia.org.
Thank you!
--User:Martin Urbanec (talk) 21:58, 23 November 2019 (UTC)
soi
soi~~