Module talk:Lang-zh: Difference between revisions
m Archiving 1 discussion(s) to Module talk:Lang-zh/Archive 5) (bot |
|||
(38 intermediate revisions by 9 users not shown) | |||
Line 1: | Line 1: | ||
{{Permanently protected}} |
|||
{{WPBS| |
{{WPBS| |
||
{{WikiProject Writing systems}} |
{{WikiProject Writing systems}} |
||
Line 21: | Line 22: | ||
__TOC__ |
__TOC__ |
||
== Commas within literal glosses == |
|||
== Template-protected edit request on 14 October 2023 == |
|||
What should we do if there needs to be a comma within a literal translation? I noticed this on [[Yi Jian Mei (song)]], where the quotes should be placed around the whole comma-separated phrase, not individually around each side of the comma. [[User:Pacificboy|pacificboy]] ([[User talk:Pacificboy|talk]]) 03:56, 11 July 2024 (UTC) |
|||
{{edit template-protected|Module:Lang-zh|answered=yes}} |
|||
Diff with my sandbox implementation: |
|||
{{diffsandbox|Module:Lang-zh|rev1=1117460852|rev2=1180117223}} |
|||
:My assumption when adding this feature was that if one needed to add a comma, it should probably be treated as a proper translation, not a gloss. It turns out I never use this formatting, so I could very plausibly disable it. [[User:Remsense|<span style="border-radius:2px 0 0 2px;padding:3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:1px 3px;color:#000">诉</span>]] 05:49, 11 July 2024 (UTC) |
|||
This does three major things, things that address real perennial limitations with the {{tlx|zh}} template: |
|||
::Ah, that makes sense! I’ll convert it to a translation. Thanks. [[User:Pacificboy|pacificboy]] ([[User talk:Pacificboy|talk]]) 02:45, 12 July 2024 (UTC) |
|||
* It adds a new {{para|out}} parameter, allowing one to select one of the terms to place before the rest, which are then put in brackets, an extremely common presentation format when writing Chinese text inline in paragraphs and tables. |
|||
* It now uses double quotes for the {{para|tr}} parameter, for "full translations" instead of glosses, as is prescribed in [[MOS:FOREIGN]] and [[MOS:ZH]]. |
|||
* It enables the use of multiple, correctly quoted glosses (literal translations, {{para|l}}), delineated by commas. |
|||
== Template-protected edit request on 17 August 2024 == |
|||
Examples: |
|||
:{{tlx|Lang-zh/sandbox|c{{=}}我|p{{=}}wǒ|l{{=}}I, me|j{{=}}ngo<nowiki><sup>5</sup></nowiki>|out{{=}}j}} |
|||
:↓ |
|||
:{{Lang-zh/sandbox|c=我|p=wǒ|j=ngo<sup>5</sup>|l=I, me|out=j}}⸻{{para|out|j}} puts the value for {{para|j}} outside the brackets |
|||
:{{hr}} |
|||
:{{tlx|Lang-zh/sandbox|s{{=}}中华|t{{=}}中華|p{{=}}Zhōnghuá|l{{=}}China|out{{=}}p|labels{{=}}no}} |
|||
:↓ |
|||
:{{Lang-zh/sandbox|s=中华|t=中華|p=Zhōnghuá|l=China|out=p|labels=no}} |
|||
:{{hr}} |
|||
:{{tlx|Lang-zh/sandbox|s{{=}}她刚刚离开了|l{{=}}she's just left|out{{=}}l}} |
|||
:↓ |
|||
:{{Lang-zh/sandbox|s=她刚刚离开了|l=she's just left|out=l}} |
|||
:{{hr}} |
|||
:{{tlx|Lang-zh/sandbox|s{{=}}电脑|t{{=}}電腦|tr{{=}}computer|p{{=}}diànnǎo|l{{=}}electric brain|out{{=}}tr|labels{{=}}no}} |
|||
:↓ |
|||
:{{Lang-zh/sandbox|s=电脑|t=電腦|tr=computer|p=diànnǎo|l=electric brain|out=tr|labels=no}}⸻double quotes now used for {{para|tr}} |
|||
{{hr}} |
|||
I was apprehensive about patching the template, but I'm pretty sure I haven't broken anything, I'm sure someone more experienced than I will take a look-see before it gets merged. :) [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 17:20, 14 October 2023 (UTC) |
|||
:[[User:Remsense|Remsense]], would you be so kind in adding testcases for this change at [[Template:Lang-zh/testcases]]? [[User:SWinxy|SWinxy]] ([[User talk:SWinxy|talk]]) 20:36, 14 October 2023 (UTC) |
|||
::Certainly! I'll try to cover all the edge cases I can think of. [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 20:37, 14 October 2023 (UTC) |
|||
::I've gone ahead and added some! if you'd like me to be more thorough/any test cases you need to see, let me know. (Oh, and I found a bug in the process.) [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 21:17, 14 October 2023 (UTC) |
|||
::I fixed the (very obvious!) bug, and added the ability to put both simplified and traditional outside the parentheses. hopefully everything is good now! [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 00:37, 18 October 2023 (UTC) |
|||
: I have had a look at the testcases, and it's unclear from them what the use case for this is, what articles it would be used in. I mean when you have something like |
|||
{{Edit template-protected|Module:Lang-zh|answered=yes}} |
|||
:: "computer" (电脑; 電腦; diànnǎo; 'electric brain') |
|||
I propose the following changes to add [[Tâi-uân Lô-má-jī Phing-im Hong-àn|Tâi-lô]] romanization support. Of course, POJ covers 95% of Hokkien/Minnan use cases (hence why I have added the "tailo" IANA subtag) but it could still be useful for Taiwanese-specific pages. Additions and modifications below: |
|||
: It would be normal to use the template for everything inside the bracket, and normal text for "computer". The ones where that wouldn't work such as: |
|||
<syntaxhighlight lang="diff"> |
|||
:: pinyin: Cài Yīngwén (Chinese: 蔡英文) |
|||
--- Module:Lang-zh |
|||
+++ Module:Lang-zh |
|||
@@ after line 29 @@ local labels = { |
|||
: I can't see a use for. You would never put pinyin like that in body text, with the Chinese characters in brackets. Generally in an article both the Chinese and Romanisations should go in brackets so as not to upset the flow of the English text, with the Chinese first. When the Chinese is being discussed, such as in an article on the language, this template isn't really suited for it.--[[Special:Contributions/2A04:4A43:90AF:FAB6:29E3:5D64:C243:2A6D|2A04:4A43:90AF:FAB6:29E3:5D64:C243:2A6D]] ([[User talk:2A04:4A43:90AF:FAB6:29E3:5D64:C243:2A6D|talk]]) 02:08, 15 October 2023 (UTC) |
|||
["sl"] = "Sidney Lau", |
|||
::You are able to put any field outside the brackets: gloss, characters, or romanization. There are situations where any of the three are appropriate, usually depending on whether the orthography, semantics, or phonetics are the focus of the prose. I've worked with all of them editing Chinese-related articles. The Tsai Ing-wen example was copied from above, I wouldn't actually write a personal name in such a way in an article. The cases were meant to demonstrate that all the features are working properly. |
|||
["poj"] = "Pe̍h-ōe-jī", |
|||
::Here is a tweaked excerpt from [[Chinese characters]] where a pinyin-first example is the best-flowing, in my opinion. |
|||
+ ["tl"] = "Tâi-lô", |
|||
::: The barrier between pronunciation and meaning is never total, however: in the Chinese system, phonetic characters may be deliberately chosen as to create certain connotations. This regularly happens for corporate brand names: for example, '[[Coca-Cola]]' is translated phonetically as {{Lang-zh/sandbox|labels=no|s=可口可乐|t=可口可樂|p=Kěkǒu Kělè|out=p}}, with the characters selected so as to possess an additional meaning of 'delicious and enjoyable'. A more literal translation would be 'the mouth can be happy', though the phrase is technically grammatically sound. |
|||
["zhu"] = "Zhuyin Fuhao", |
|||
:: Also, I think it's worth putting non-diacritical pinyin in {{para|lang}} tags when you can, because a screenreader will still pronounce it better that way than a selected English voice attempting to read it aloud, e.g. {{tlx|zh|labels{{=}}no|c{{=}}蔡英文|p{{=}}Cai Yingwen|out{{=}}p}} is better for screenreaders than just {{code|Cai Yingwen <nowiki>({{zh|labels=no|c=蔡英文}}</nowiki>)}}. |
|||
["l"] = "lit.", |
|||
:: [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 02:13, 15 October 2023 (UTC) |
|||
::: {{done}} [[User:Pppery|* Pppery *]] [[User talk:Pppery|<sub style="color:#800000">it has begun...</sub>]] 00:01, 27 October 2023 (UTC) |
|||
@@ after line 46 @@ local wlinks = { |
|||
::::[[User:Pppery|Pppery]], thank you so much!<span id="Remsense:1698365581970:Module_talkFTTCLNLang-zh" class="FTTCmt"> — [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 00:13, 27 October 2023 (UTC)</span> |
|||
["poj"] = "Pe̍h-ōe-jī", |
|||
::::[[User:Pppery|Pppery]], Actually, I made several revisions to the module from the initial one, making initial fixes. Could you merge the newest revision of the sandbox?<span id="Remsense:1698366611655:Module_talkFTTCLNLang-zh" class="FTTCmt"> — [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 00:30, 27 October 2023 (UTC)</span> |
|||
+ ["tl"] = "Tâi-uân Lô-má-jī Phing-im Hong-àn", |
|||
::::: {{done}} Those too. [[User:Pppery|* Pppery *]] [[User talk:Pppery|<sub style="color:#800000">it has begun...</sub>]] 00:44, 27 October 2023 (UTC) |
|||
{{re|Pppery|Remsense}} Was it after these edits that the lead of [[Prunus kansuensis]] started to look so bold? [[Special:Contributions/77.223.109.164|77.223.109.164]] ([[User talk:77.223.109.164|talk]]) 17:09, 19 December 2023 (UTC) |
|||
@@ after line 63 @@ local ISOlang = { |
|||
["poj"] = "nan-Latn", |
|||
+ ["tl"] = "nan-Latn-tailo", |
|||
@@ after line 74 @@ local italic = { |
|||
:Yes. I thought I tested adequately for this. It's not a bug that should've been introduced, but also I think there's never a real reason to put bold text inside this template, so it's good to identify when it's happening at least. [[User:Remsense|<span style="border-radius:2px 0 0 2px;padding:3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:1px 3px;color:#000">留</span>]] 20:48, 19 December 2023 (UTC) |
|||
["poj"] = true, |
|||
+ ["tl"] = true, |
|||
@@ at line 136 @@ |
|||
== Preventing problems with italic and literals == |
|||
- local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "zhu", "l", "tr"} |
|||
+ local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "tl", "zhu", "l", "tr"} |
|||
@@ after line 150 @@ if (poj1) then |
|||
Hi! I noticed a problem when someone wants to use literals with italic next to a bold word. For example the line below...<br> |
|||
orderlist[4] = "poj" |
|||
<code><nowiki>'''Pak Mong''' ({{zh|t=白芒|l=white ''[[miscanthus]]''}})</nowiki> </code><br> |
|||
- orderlist[5] = "p" |
|||
will produce something like...<br> |
|||
- orderlist[6] = "tp" |
|||
'''Pak Mong''' ([[Traditional Chinese characters|Chinese]]: 白芒; [[Literal translation|lit.]] 'white ''[[miscanthus]]''<nowiki/>') |
|||
- orderlist[7] = "w" |
|||
<br>instead of...<br> |
|||
- orderlist[8] = "j" |
|||
'''Pak Mong''' ([[Traditional Chinese characters|Chinese]]: 白芒; [[Literal translation|lit.]] 'white [[miscanthus|''miscanthus'']]') |
|||
- orderlist[9] = "cy" |
|||
<br> |
|||
- orderlist[10] = "sl" |
|||
In order to solve the problem we could just change the line 193 from... |
|||
+ orderlist[5] = "tl" |
|||
val = "'" .. val .. "'" |
|||
+ orderlist[6] = "p" |
|||
to... |
|||
+ orderlist[7] = "tp" |
|||
val = "<nowiki><nowiki></nowiki>'</nowiki>" .. val .. "<nowiki><nowiki></nowiki>'</nowiki>" |
|||
+ orderlist[8] = "w" |
|||
What do you think? ({{ping|Underwaterbuffalo}}) -- [[User:Basilicofresco|<span style="font:small-caps 1em Verdana;color:green">Basilicofresco</span>]] ([[User talk:Basilicofresco|msg]]) 17:16, 19 October 2023 (UTC) |
|||
+ orderlist[9] = "j" |
|||
:Thank you for raising the issue encountered by [[User:FrescoBot|FrescoBot]] at [[Pak Mong]] article. I don't have any expertise with modules. As long as the fix solves the issue and does not create others, I am all in favor! [[User:Underwaterbuffalo|Underwaterbuffalo]] ([[User talk:Underwaterbuffalo|talk]]) 09:34, 21 October 2023 (UTC) |
|||
+ orderlist[10] = "cy" |
|||
:Yes, the single quotes used to mark literal glosses should not be interpreted as wiki markup. [[User talk:Kanguole|Kanguole]] 09:42, 21 October 2023 (UTC) |
|||
+ orderlist[11] = "sl" |
|||
:The pending revision I have to the module currently produces an output of |
|||
end |
|||
::{{lang-zh/sandbox|t=白芒|l=white ''[[miscanthus]]''}}, which i'm noticing isn't entirely correct. i'll fix it asap. |
|||
</syntaxhighlight> |
|||
:[[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 21:37, 21 October 2023 (UTC) |
|||
[[User:MSG17|MSG17]] ([[User talk:MSG17|talk]]) 15:53, 17 August 2024 (UTC) |
|||
::And there we are! Properly escaping quotes now has |
|||
:::<code><nowiki>'''Pak Mong''' ({{Lang-zh/sandbox|t=白芒|l=white ''[[miscanthus]]''}})</nowiki></code> |
|||
:::output |
|||
:::'''Pak Mong''' ({{Lang-zh/sandbox|t=白芒|l=white ''[[miscanthus]]''}}) |
|||
::[[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 17:13, 23 October 2023 (UTC) |
|||
:@[[User:MSG17|MSG17]]: This sounds reasonable, and would be helpful on pages such as [[Penang Hokkien]] where both POJ and TL are used in the article text. @[[User:Pppery|Pppery]] or @[[User:Jonesey95|Jonesey95]], would you be able to help here? [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 13:03, 19 September 2024 (UTC) |
|||
== Template-protected edit request == |
|||
::I'll take a look at this ASAP, thank you for your improvements! <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff"> ‥ </span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 13:06, 19 September 2024 (UTC) |
|||
:{{done}}<!-- Template:ETp --> <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff"> ‥ </span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 13:48, 19 September 2024 (UTC) |
|||
== Further romanization discussion == |
|||
Allow specifying {{mono|t}} variant: {{mono|zh-Hant-HK}} vs {{mono|zh-Hant-TW}}. See [[:File:Source_Han_Sans_Version_Difference.svg|this image]] for an example of {{lang|zh-HK|返}} vs {{lang|zh-TW|返}}. [[User:Northern Moonlight|<span style="font-family:system-ui,Inter,-apple-system,sans-serif;background-color:#f3f3fe;padding:2px 5px;border-radius:3px;white-space:nowrap">NM</span>]] 02:45, 20 November 2023 (UTC) |
|||
Coming off of my request to add Tâi-lô, what other romanization systems should be added to the template? I feel like [[Pha̍k-fa-sṳ]] and [[Wugniu]] could be helpful. I don't see any IANA Latn subtags for other Sinitic languages, however. [[User:MSG17|MSG17]] ([[User talk:MSG17|talk]]) 15:53, 17 August 2024 (UTC) |
|||
== Trailing bold in l= not being removed == |
|||
:I will write a patch for this ASAP. Here's the question: how should it work? I am thinking just additional parameters {{para|tw}} and {{para|th}}. But how the module presently works, if only one character field is specified, it just gets tagged as {{code|zh}}. But it should be trivial to tweak the code so that the more specific language tag is used when specifying only {{para|t}} or {{para|s}}, et al.[[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 02:54, 20 November 2023 (UTC) |
|||
::I don’t think there’s a need to modify the logic for <code>|t=</code> and <code>|s=</code>. Sometimes the region is really irrelevant that you only want to specify the script (for example, in a page that talks about the history of the writing system). |
|||
::We should also avoid ambiguous abbreviations and just go with a slightly longer format like <code>t_hk</code> and <code>s_sg</code>. |
|||
::[[User:Northern Moonlight|<span style="font-family:system-ui,Inter,-apple-system,sans-serif;background-color:#f3f3fe;padding:2px 5px;border-radius:3px;white-space:nowrap">NM</span>]] 05:08, 26 November 2023 (UTC) |
|||
:::fair—it's really difficult, re: abbreviations, because after the 10th time entering it in an article, you might wish concision was favored over disambiguation, but yeah. I still haven't started on this, but I'll keep this in mind when I get to it shortly. [[User:Remsense|<span style="border-radius:3px 0 0 3px;padding:4px 3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:2px;color:#000">聊</span>]] 05:26, 26 November 2023 (UTC) |
|||
In <syntaxhighlight>{{zh|t=竹子林站|j=Zuk1 Zi2 Lam4 Zaam6|l = '''Bamboo Forest station'''}}</syntaxhighlight>, the opening bold markup is properly removed, but the trailing bold markup is not removed. It looks like the regular expression at <syntaxhighlight>term = string.gsub(term, "^([ \"']*)(.*)([ \"']*)$", "%2")</syntaxhighlight> needs some adjustment to the middle wildcard search. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 13:23, 16 September 2024 (UTC) |
|||
== Template-protected edit request on 5 February 2024 == |
|||
:{{ping|Jonesey95}} This is because the * operator is greedy, so .* matches everything else in the string. Changing .* to .*? would make it lazy, so that the final term catches all trailing characters. In other words, change the line of code to: <syntaxhighlight>term = string.gsub(term, "^([ \"']*)(.*?)([ \"']*)$", "%2")</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 13:51, 16 September 2024 (UTC) |
|||
{{edit template-protected|Module:Lang-zh|answered=no}} |
|||
::Thanks! That fixed the problem at [[Zhuzilin station]] and probably other pages. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 17:26, 16 September 2024 (UTC) |
|||
This may be politically inclined, but I am highly unsure if [[Tongyong Pinyin]] should be our generic <code>zh-Latn</code>, as the mainland [[Pinyin]] wins by [[Demographics of China|a very large margin in users]]. |
|||
::Thank you for fixing my shoddy regex, by the way. <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff"> ‥ </span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 13:05, 19 September 2024 (UTC) |
|||
:::{{re|Jonesey95|Remsense}} On further reflection, this doesn't work as intended. I had thought the string was a regex, but it is in fact a Lua pattern, which is slightly different. The Lua equivalent of *? is - which would give: <syntaxhighlight>term = string.gsub(term, "^([ \"']*)(.-)([ \"']*)$", "%2")</syntaxhighlight> Writing .*? in Lua (as I suggested above) actually means greedily matching all characters (.*) followed by a single question mark (? can also be an operator, but Lua pattern operators can't be nested so in this context it is interpreted as a literal). So actually the new pattern usually doesn't make a substitution, unless there is a question mark. This means it usually fails, e.g. where there are multiple glosses separated by commas and spaces, the spaces are not stripped. However, looking at what the pattern match applies to, I'm not completely sure I understand why the quotes should be stripped in the first place (is there a set of testcases to check against?). At [[Zhuzilin station]], the current code makes no substitution, and so it keeps the bold formatting, presumably as intended. The old code meant that the bold formatting was stripped at the beginning and not the end, so the rest of the article became bold (which was a bad and confusing error). Correcting .*? to .- as above would strip both, making it impossible to add bold formatting. Is the intention to catch cases where an editor unnecessarily adds quotes to the gloss? Is this a common problem? If so, is removing the ability to add bold and italic formatting a fair price to pay? |
|||
I propose the following changes: |
|||
::: If we want to strip one quote mark but no more (so that we catch editors manually adding quotes, but allow formatting), pattern matching is a bit more complicated. I think it would be easiest to separate the stripping of whitespace and quotes. When stripping one single quote, we need to check that there isn't more than one, but we also need to allow the string to contain an apostrophe (so we can't just use [^']- in the middle) and a gloss could potentially be a single character (so we can't just use [^'].-[^'] in the middle). So it seems easiest to strip the leading and trailing quotes separately. This gives three lines (I've also removed two sets of brackets that were capturing substrings that weren't used): <syntaxhighlight>term = string.gsub(term, "^ *(.-) *$", "%1") |
|||
term = string.gsub(term, "^[\"']?([^\"'].-)$", "%1") |
|||
<syntaxhighlight lang="diff"> |
|||
term = string.gsub(term, "^(.-[^\"'])[\"']?$", "%1")</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 15:43, 24 September 2024 (UTC) |
|||
--- Module:Lang-zh |
|||
::::I think it's fine to strip all quote marks, in any quantity. That was the original intent of the code, and I don't see any complaints on this page. Adding bold to text is probably against [[WP:MOS]], and adding italics should be done with a parameter. People can use {{tag|b}} and {{tag|i}} tags if they insist on them. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 15:51, 24 September 2024 (UTC) |
|||
+++ Module:Lang-zh |
|||
:::::Okay. I had taken your comment about fixing the Zhuzilin station article to mean that keeping the bold markup was intended, but I can see why it could be discouraged. I've also just found [[Template:Lang-zh/testcases]] (I had only looked under Module:Lang-zh before), and I don't see any testcases for stripping markup. So, if stripping markup is the desired functionality, the .- version above would work. I think it would make sense to document this, since there are three different kinds of thing being stripped: whitespace, markup, and quotes (double quotes aren't markup). It could be documented either on [[Template:Lang-zh/doc]] or directly as a code comment next to the line we're discussing, e.g. "remove trailing and leading spaces, quotes, and bold/italic markup". [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 20:39, 24 September 2024 (UTC) |
|||
@@ -54,8 +54,8 @@ local ISOlang = { |
|||
:::::Currently, this stripping only applies to literal glosses and not translations, but they should reasonably be treated the same. So, fixing the pattern, matching all whitespace (not just spaces), expanding the comments, and applying the same to the translation, I suggest changing lines 236-247 to the following:<syntaxhighlight> elseif (part == "l") then |
|||
["c"] = "zh", |
|||
local terms = "" |
|||
-- put individual, potentially comma-separated glosses in single quotes |
|||
["s"] = "zh-Hans", |
|||
-- (first strip leading and trailing whitespace and quotes, including bold/italic markup) |
|||
- ["p"] = "zh-Latn-pinyin", |
|||
for term in val:gmatch("[^;,]+") do |
|||
- ["tp"] = "zh-Latn", |
|||
term = string.gsub(term, "^([%s\"']*)(.-)([%s\"']*)$", "%2") |
|||
+ ["p"] = "zh-Latn", |
|||
terms = terms .. "'" .. term .. "', " |
|||
+ ["tp"] = "zh-Latn-tongyong", |
|||
end |
|||
["w"] = "zh-Latn-wadegile", |
|||
val = string.sub(terms, 1, -3) |
|||
["j"] = "yue-Latn-jyutping", |
|||
elseif (part == "tr") then |
|||
["cy"] = "yue-Latn", |
|||
-- put translations in double quotes |
|||
</syntaxhighlight> |
|||
-- (first strip leading and trailing spaces and quotes, including bold/italic markup) |
|||
val = string.gsub(val, "^([%s\"']*)(.-)([%s\"']*)$", "%2") |
|||
Alternatively, if Pinyin is not a suitable generic <code>zh-Latn</code>, we should still add variant tags for Tongyong Pinyin: |
|||
 val = '"' .. val .. '"' |
|||
end</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 09:31, 25 September 2024 (UTC) |
|||
<syntaxhighlight lang="diff"> |
|||
:::::{{re|Jonesey95|Remsense}} What do you think? Are you happy with the above suggestion? |
|||
--- Module:Lang-zh |
|||
{{edit template-protected|Module:Lang-zh|answered=yes}} |
|||
:::::Also, instead of directly using a Lua string pattern, it might be more readable and maintainable to use an existing function for stripping leading and trailing characters, namely mw.text.trim:<syntaxhighlight> elseif (part == "l") then |
|||
@@ -55,7 +55,7 @@ local ISOlang = { |
|||
local terms = "" |
|||
-- put individual, potentially comma-separated glosses in single quotes |
|||
["s"] = "zh-Hans", |
|||
-- (first strip leading and trailing whitespace and quotes, including bold/italic markup) |
|||
["p"] = "zh-Latn-pinyin", |
|||
for term in val:gmatch("[^;,]+") do |
|||
- ["tp"] = "zh-Latn", |
|||
term = mw.text.trim(term, "%s\"'") |
|||
+ ["tp"] = "zh-Latn-tongyong", |
|||
terms = terms .. "'" .. term .. "', " |
|||
["w"] = "zh-Latn-wadegile", |
|||
end |
|||
["j"] = "yue-Latn-jyutping", |
|||
val = string.sub(terms, 1, -3) |
|||
["cy"] = "yue-Latn", |
|||
elseif (part == "tr") then |
|||
</syntaxhighlight> [[User:NasssaNser|NasssaNser]]<sub>[[User talk:NasssaNser|talk]]</sub> 11:01, 5 February 2024 (UTC) |
|||
-- put translations in double quotes |
|||
-- (first strip leading and trailing spaces and quotes, including bold/italic markup) |
|||
:This seems sensible to me – hanyu pinyin is more widely used than tongyong pinyin, and indeed [[MOS:PINYIN]] mentions that hanyu pinyin is the default romanization we use. What would the effect of this change be from the reader's perspective? —[[User:Mx. Granger|Mx. Granger]] ([[User talk:Mx. Granger|talk]] '''·''' [[Special:Contributions/Mx. Granger|contribs]]) 15:30, 10 February 2024 (UTC) |
|||
val = mw.text.trim(val, "%s\"'") |
|||
::Like the rest of the language metadata, it adds specificity for screenreaders, as well as other possible presentations of articles. [[User:Remsense|<span style="border-radius:2px 0 0 2px;padding:3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:1px 3px;color:#000">诉</span>]] 23:21, 10 February 2024 (UTC) |
|||
 val = '"' .. val .. '"' |
|||
:{{agree}}– I double-checked, and IANA added Tongyong as a [https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry valid language subtag variant] in 2020. [[User:Remsense|<span style="border-radius:2px 0 0 2px;padding:3px;background:#1E816F;color:#fff">'''Remsense'''</span>]][[User talk:Remsense|<span lang="zh" style="border:1px solid #1E816F;border-radius:0 2px 2px 0;padding:1px 3px;color:#000">诉</span>]] 23:20, 10 February 2024 (UTC) |
|||
end</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 09:02, 27 September 2024 (UTC) |
|||
::::::{{re|Jonesey95|Remsense}} pinging again. The current code inserts quotes incorrectly, e.g. on the following pages you can see an opening quote followed by a space: [[Gun (staff)]], [[Indonesian slang]], [[Ping On]]. The code above would fix this. [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 14:23, 8 October 2024 (UTC) |
|||
::::::: Reping {{u|Remsense}}. Could you please look at this? I think you're one of the only template editors who has enough understanding of Chinese orthography to understand what is being changed here and why. [[User:Pppery|* Pppery *]] [[User talk:Pppery|<sub style="color:#800000">it has begun...</sub>]] 17:19, 16 November 2024 (UTC) |
|||
::::::::Will have this looked at ASAP. Huge apologies for letting it slip through the cracks for months. <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff"> ‥ </span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 23:39, 16 November 2024 (UTC) |
|||
:::::::::{{done}}{{snd}}so sorry for the delay, again. <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff"> ‥ </span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 23:37, 18 November 2024 (UTC) |
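The pattern behaviour discussed in the "Trailing bold in l= not being removed" thread can be checked in a plain Lua interpreter. The following is a minimal sketch rather than the module's code: the variable names are illustrative, and only the three patterns are taken from the discussion. It assumes standard Lua string patterns; <code>mw.text.trim</code>, which was also suggested, is only available under Scribunto and so is not used here.

```lua
-- Sample gloss taken from the thread (the Zhuzilin station example).
local sample = "'''Bamboo Forest station'''"

-- 1. Greedy middle capture: (.*) also consumes the trailing quotes,
--    so only the leading bold markup is removed.
local greedy = sample:gsub("^([ \"']*)(.*)([ \"']*)$", "%2")

-- 2. ".*?" is not lazy in a Lua pattern: it means "anything, then a literal ?".
--    The sample contains no question mark, so the pattern does not match
--    and the string is returned unchanged.
local question = sample:gsub("^([ \"']*)(.*?)([ \"']*)$", "%2")

-- 3. "-" is Lua's lazy quantifier, so the trailing class can claim the quotes.
local lazy = sample:gsub("^([ \"']*)(.-)([ \"']*)$", "%2")

print(greedy)    --> Bamboo Forest station'''
print(question)  --> '''Bamboo Forest station'''
print(lazy)      --> Bamboo Forest station
```

Case 1 reproduces the reported bug (the unmatched closing markup is left behind), case 2 shows why the interim fix simply stopped stripping anything, and case 3 is the behaviour the final proposal aims for.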
Latest revision as of 18:30, 15 December 2024
Module:Lang-zh is permanently protected from editing because it is a heavily used or highly visible module. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit.
This module does not require a rating on Wikipedia's content assessment scale. It is of interest to several WikiProjects, including WikiProject Writing systems.
To help centralize discussions and keep related topics together, Template talk:Lang-zh and Template talk:Lang-zh/doc redirect here.
This page has archives. Sections older than 180 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present.
Commas within literal glosses
What should we do if there needs to be a comma within a literal translation? I noticed this on Yi Jian Mei (song), where the quotes should be placed around the whole comma-separated phrase, not individually around each side of the comma. pacificboy (talk) 03:56, 11 July 2024 (UTC)
- My assumption when adding this feature was that if one needed to add a comma, it should probably be treated as a proper translation, not a gloss. It turns out I never use this formatting, so I could very plausibly disable it. Remsense诉 05:49, 11 July 2024 (UTC)
- Ah, that makes sense! I’ll convert it to a translation. Thanks. pacificboy (talk) 02:45, 12 July 2024 (UTC)
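For illustration, the splitting behaviour behind this question can be sketched in plain Lua. This is a minimal sketch under stated assumptions, not the module's implementation: quote_glosses is a hypothetical name, and the loop is modelled on the val:gmatch("[^;,]+") fragment quoted in the later edit requests on this page. It shows why a comma inside |l= yields separately quoted glosses.

```lua
-- Hypothetical helper modelled on the gloss loop quoted on this page;
-- the name quote_glosses is an assumption, not part of Module:Lang-zh.
local function quote_glosses(val)
    local terms = {}
    -- Split on commas and semicolons, as in val:gmatch("[^;,]+").
    for term in val:gmatch("[^;,]+") do
        -- Strip leading and trailing whitespace and quote characters.
        term = term:gsub("^[%s\"']*(.-)[%s\"']*$", "%1")
        -- Wrap each gloss in single quotes.
        terms[#terms + 1] = "'" .. term .. "'"
    end
    return table.concat(terms, ", ")
end

print(quote_glosses("I, me"))            --> 'I', 'me'
print(quote_glosses("she's just left"))  --> 'she's just left'
```

Because every comma is treated as a separator, a phrase that needs an internal comma cannot be passed as a single gloss in this scheme, which is why moving such text to the translation parameter was suggested.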
Template-protected edit request on 17 August 2024
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
I propose the following changes to add Tâi-lô romanization support. Of course, POJ covers 95% of Hokkien/Minnan use cases (hence why I have added the "tailo" IANA subtag) but it could still be useful for Taiwanese-specific pages. Additions and modifications below:
--- Module:Lang-zh
+++ Module:Lang-zh
@@ after line 29 @@ local labels = {
["sl"] = "Sidney Lau",
["poj"] = "Pe̍h-ōe-jī",
+ ["tl"] = "Tâi-lô",
["zhu"] = "Zhuyin Fuhao",
["l"] = "lit.",
@@ after line 46 @@ local wlinks = {
["poj"] = "Pe̍h-ōe-jī",
+ ["tl"] = "Tâi-uân Lô-má-jī Phing-im Hong-àn",
@@ after line 63 @@ local ISOlang = {
["poj"] = "nan-Latn",
+ ["tl"] = "nan-Latn-tailo",
@@ after line 74 @@ local italic = {
["poj"] = true,
+ ["tl"] = true,
@@ at line 136 @@
- local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "zhu", "l", "tr"}
+ local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "tl", "zhu", "l", "tr"}
@@ after line 150 @@ if (poj1) then
orderlist[4] = "poj"
- orderlist[5] = "p"
- orderlist[6] = "tp"
- orderlist[7] = "w"
- orderlist[8] = "j"
- orderlist[9] = "cy"
- orderlist[10] = "sl"
+ orderlist[5] = "tl"
+ orderlist[6] = "p"
+ orderlist[7] = "tp"
+ orderlist[8] = "w"
+ orderlist[9] = "j"
+ orderlist[10] = "cy"
+ orderlist[11] = "sl"
end
MSG17 (talk) 15:53, 17 August 2024 (UTC)
- @MSG17: This sounds reasonable, and would be helpful on pages such as Penang Hokkien where both POJ and TL are used in the article text. @Pppery or @Jonesey95, would you be able to help here? Freelance Intellectual (talk) 13:03, 19 September 2024 (UTC)
- I'll take a look at this ASAP, thank you for your improvements! Remsense ‥ 论 13:06, 19 September 2024 (UTC)
- Done Remsense ‥ 论 13:48, 19 September 2024 (UTC)
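The reordering in the diff above can be checked in isolation. The following is a minimal sketch in plain Lua: the function name `build_order` and its `poj_first` argument are hypothetical stand-ins for the module's `poj1` branch, and only the list contents are taken from the diff.

```lua
-- Hypothetical wrapper around the orderlist logic shown in the diff.
local function build_order(poj_first)
    -- Default display order after Tâi-lô ("tl") is added.
    local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "tl", "zhu", "l", "tr"}
    if poj_first then
        -- Put both Hokkien romanizations first and shift the others down by one slot.
        orderlist[4] = "poj"
        orderlist[5] = "tl"
        orderlist[6] = "p"
        orderlist[7] = "tp"
        orderlist[8] = "w"
        orderlist[9] = "j"
        orderlist[10] = "cy"
        orderlist[11] = "sl"
    end
    return orderlist
end

print(table.concat(build_order(false), " ")) --> c s t p tp w j cy sl poj tl zhu l tr
print(table.concat(build_order(true), " "))  --> c s t poj tl p tp w j cy sl zhu l tr
```

Because the list now has 14 entries, the reassignment has to run through index 11; stopping at index 10 as in the old code would leave "sl" out and "tl" listed twice.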
Further romanization discussion
Coming off of my request to add Tâi-lô, what other romanization systems should be added to the template? I feel like Pha̍k-fa-sṳ and Wugniu could be helpful. I don't see any IANA latn subtags for other Sinitic languages, however. MSG17 (talk) 15:53, 17 August 2024 (UTC)
Trailing bold in l= not being removed
In {{zh|t=竹子林站|j=Zuk1 Zi2 Lam4 Zaam6|l = '''Bamboo Forest station'''}}, the opening bold markup is properly removed, but the trailing bold markup is not removed. It looks like the regular expression at
term = string.gsub(term, "^([ \"']*)(.*)([ \"']*)$", "%2")
needs some adjustment to the middle wildcard search. – Jonesey95 (talk) 13:23, 16 September 2024 (UTC)
- @Jonesey95: This is because the * operator is greedy, so .* matches everything else in the string. Changing .* to .*? would make it lazy, so that the final term catches all trailing characters. In other words, change the line of code to: Freelance Intellectual (talk) 13:51, 16 September 2024 (UTC)
term = string.gsub(term, "^([ \"']*)(.*?)([ \"']*)$", "%2")
- Thanks! That fixed the problem at Zhuzilin station and probably other pages. – Jonesey95 (talk) 17:26, 16 September 2024 (UTC)
- Thank you for fixing my shoddy regex, by the way. Remsense ‥ 论 13:05, 19 September 2024 (UTC)
- @Jonesey95 and Remsense: On further reflection, this doesn't work as intended. I had thought the string was a regex, but it is in fact a Lua pattern, which is slightly different. The Lua equivalent of *? is - which would give:
term = string.gsub(term, "^([ \"']*)(.-)([ \"']*)$", "%2")
- Writing .*? in Lua (as I suggested above) actually means greedily matching all characters (.*) followed by a single question mark (? can also be an operator, but Lua pattern operators can't be nested so in this context it is interpreted as a literal). So actually the new pattern usually doesn't make a substitution, unless there is a question mark. This means it usually fails, e.g. where there are multiple glosses separated by commas and spaces, the spaces are not stripped. However, looking at what the pattern match applies to, I'm not completely sure I understand why the quotes should be stripped in the first place (is there a set of testcases to check against?). At Zhuzilin station, the current code makes no substitution, and so it keeps the bold formatting, presumably as intended. The old code meant that the bold formatting was stripped at the beginning and not the end, so the rest of the article became bold (which was a bad and confusing error). Correcting .*? to .- as above would strip both, making it impossible to add bold formatting. Is the intention to catch cases where an editor unnecessarily adds quotes to the gloss? Is this a common problem? If so, is removing the ability to add bold and italic formatting a fair price to pay?
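The behaviour of the three patterns discussed in this thread can be compared outside the module. The following is a minimal sketch in plain Lua using the gloss from the Zhuzilin station example; the helper name `strip` is hypothetical and not part of Module:Lang-zh.

```lua
-- Hypothetical helper: apply one candidate pattern and keep capture 2.
local function strip(term, pattern)
    -- The outer parentheses drop gsub's second return value (the match count).
    return (string.gsub(term, pattern, "%2"))
end

local term = "'''Bamboo Forest station'''"

-- Original pattern: the greedy .* swallows the trailing quotes.
local greedy = strip(term, "^([ \"']*)(.*)([ \"']*)$")
-- Regex-style .*? : in a Lua pattern the ? here is a literal question mark,
-- so the pattern does not match and the string comes back unchanged.
local literal_q = strip(term, "^([ \"']*)(.*?)([ \"']*)$")
-- Lua's lazy quantifier "-" leaves the trailing quotes for the last group.
local lazy = strip(term, "^([ \"']*)(.-)([ \"']*)$")

print(greedy)    --> Bamboo Forest station'''
print(literal_q) --> '''Bamboo Forest station'''
print(lazy)      --> Bamboo Forest station
```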
- If we want to strip one quote mark but no more (so that we catch editors manually adding quotes, but allow formatting), pattern matching is a bit more complicated. I think it would be easiest to separate the stripping of whitespace and quotes. When stripping one single quote, we need to check that there isn't more than one, but we also need to allow the string to contain an apostrophe (so we can't just use [^']- in the middle) and a gloss could potentially be a single character (so we can't just use [^'].-[^'] in the middle). So it seems easiest to strip the leading and trailing quotes separately. This gives three lines (I've also removed two sets of brackets that were capturing substrings that weren't used): Freelance Intellectual (talk) 15:43, 24 September 2024 (UTC)
term = string.gsub(term, "^ *(.-) *$", "%1")
term = string.gsub(term, "^[\"']?([^\"'].-)$", "%1")
term = string.gsub(term, "^(.-[^\"'])[\"']?$", "%1")
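To see how the three substitutions behave together, here is a minimal sketch in plain Lua. The wrapper name `strip_single_quotes` is hypothetical; the three patterns are the ones proposed in the comment above.

```lua
-- Hypothetical wrapper around the three proposed substitutions.
local function strip_single_quotes(term)
    -- 1. Trim leading and trailing spaces.
    term = string.gsub(term, "^ *(.-) *$", "%1")
    -- 2. Drop one leading quote, but only if the next character is not a quote.
    term = string.gsub(term, "^[\"']?([^\"'].-)$", "%1")
    -- 3. Drop one trailing quote, but only if the previous character is not a quote.
    term = string.gsub(term, "^(.-[^\"'])[\"']?$", "%1")
    return term
end

print(strip_single_quotes(" 'forest' "))                  --> forest
print(strip_single_quotes("it's"))                        --> it's
print(strip_single_quotes("x"))                           --> x
print(strip_single_quotes("'''Bamboo Forest station'''")) --> '''Bamboo Forest station'''
```

A manually added pair of quotes is removed, while an internal apostrophe, a one-character gloss, and doubled or tripled quote markup are left untouched.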
- I think it's fine to strip all quote marks, in any quantity. That was the original intent of the code, and I don't see any complaints on this page. Adding bold to text is probably against WP:MOS, and adding italics should be done with a parameter. People can use <b>...</b> and <i>...</i> tags if they insist on them. – Jonesey95 (talk) 15:51, 24 September 2024 (UTC)
- Okay. I had taken your comment about fixing the Zhuzilin station article to mean that keeping the bold markup was intended, but I can see why it could be discouraged. I've also just found Template:Lang-zh/testcases (I had only looked under Module:Lang-zh before), and I don't see any testcases for stripping markup. So, if stripping markup is the desired functionality, the .- version above would work. I think it would make sense to document this, since there are three different kinds of thing being stripped: whitespace, markup, and quotes (double quotes aren't markup). It could be documented either on Template:Lang-zh/doc or directly as a code comment next to the line we're discussing, e.g. "remove trailing and leading spaces, quotes, and bold/italic markup". Freelance Intellectual (talk) 20:39, 24 September 2024 (UTC)
- Currently, this stripping only applies to literal glosses and not translations, but they should reasonably be treated the same. So, fixing the pattern, matching all whitespace (not just spaces), expanding the comments, and applying the same to the translation, I suggest changing lines 236-247 to the following: Freelance Intellectual (talk) 09:31, 25 September 2024 (UTC)
elseif (part == "l") then
    local terms = ""
    -- put individual, potentially comma-separated glosses in single quotes
    -- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
    for term in val:gmatch("[^;,]+") do
        term = string.gsub(term, "^([%s\"']*)(.-)([%s\"']*)$", "%2")
        terms = terms .. "'" .. term .. "', "
    end
    val = string.sub(terms, 1, -3)
elseif (part == "tr") then
    -- put translations in double quotes
    -- (first strip leading and trailing spaces and quotes, including bold/italic markup)
    val = string.gsub(val, "^([%s\"']*)(.-)([%s\"']*)$", "%2")
    val = "\"" .. val .. "\""
end
- @Jonesey95 and Remsense: What do you think? Are you happy with the above suggestion?
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
- Also, instead of directly using a Lua string pattern, it might be more readable and maintainable to use an existing function for stripping leading and trailing characters, namely mw.text.trim: Freelance Intellectual (talk) 09:02, 27 September 2024 (UTC)
elseif (part == "l") then
    local terms = ""
    -- put individual, potentially comma-separated glosses in single quotes
    -- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
    for term in val:gmatch("[^;,]+") do
        term = mw.text.trim(term, "%s\"'")
        terms = terms .. "'" .. term .. "', "
    end
    val = string.sub(terms, 1, -3)
elseif (part == "tr") then
    -- put translations in double quotes
    -- (first strip leading and trailing spaces and quotes, including bold/italic markup)
    val = mw.text.trim(val, "%s\"'")
    val = "\"" .. val .. "\""
end
- @Jonesey95 and Remsense: pinging again. The current code inserts quotes incorrectly, e.g. on the following pages you can see an opening quote followed by a space: Gun (staff), Indonesian slang, Ping On. The code above would fix this. Freelance Intellectual (talk) 14:23, 8 October 2024 (UTC)
- Reping Remsense. Could you please look at this? I think you're one of the only template editors who has enough understanding of Chinese orthography to understand what is being changed here and why. * Pppery * it has begun... 17:19, 16 November 2024 (UTC)
- Will have this looked at ASAP. Huge apologies for letting it slip through the cracks for months. Remsense ‥ 论 23:39, 16 November 2024 (UTC)
- Done – so sorry for the delay, again. Remsense ‥ 论 23:37, 18 November 2024 (UTC)
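The "opening quote followed by a space" symptom mentioned above, and the effect of trimming each gloss, can be reproduced in plain Lua. This is a minimal sketch under stated assumptions: `mw.text.trim` only exists inside Scribunto, so `trim_chars` below is a hypothetical stand-in assumed to behave the same way for ASCII input, and `quote_glosses` is a hypothetical name for the gloss-quoting loop.

```lua
-- Hypothetical stand-in for mw.text.trim(s, charset); charset is placed
-- inside a Lua character class, so "%s" covers all whitespace.
local function trim_chars(s, charset)
    charset = charset or "\t\r\n\f "
    return (s:gsub("^[" .. charset .. "]*(.-)[" .. charset .. "]*$", "%1"))
end

-- Split on commas/semicolons and wrap each gloss in single quotes.
local function quote_glosses(val, strip)
    local parts = {}
    for term in val:gmatch("[^;,]+") do
        parts[#parts + 1] = "'" .. strip(term) .. "'"
    end
    return table.concat(parts, ", ")
end

-- No stripping at all, which is what a non-matching pattern amounts to.
local function no_strip(term) return term end
-- Strip whitespace and quotes, as in the proposed code.
local function strip_all(term) return trim_chars(term, "%s\"'") end

print(quote_glosses("tree, wood", no_strip))  --> 'tree', ' wood'
print(quote_glosses("tree, wood", strip_all)) --> 'tree', 'wood'
print(quote_glosses("'''Bamboo Forest station'''", strip_all)) --> 'Bamboo Forest station'
```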