This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
TOML is in an awkward place regarding notability. I cannot find any reliable secondary sources which refer to it, but there are many secondary sources available which are not considered reliable, and it is mentioned in the documentation for the pieces of software which use TOML. The format is only semi-formally specified since it changes so frequently, therefore there are no RFCs or standards documents to cite.
On the other hand, TOML is used as a configuration file format for the package management tools pip (Python) and Cargo (Rust), which are the de facto standards for their respective programming languages.
Especially given the growing popularity of TOML, I don't think it is beneficial to remove this page. Alternatively, it could be merged with the Configuration file page after restructuring it.
There was a maintenance template suggesting WP:PRODUCT was relevant, but TOML is more or less a community project, and I'm not sure TOML belongs on the page for Tom Preston-Werner, though it did make me consider the possibility of merging with Configuration file. I have changed the use of the maintenance template so that it refers to the general notability guidelines.
Shouldn't that row from the table be removed entirely? "Easy implementation" seems like a completely arbitrary standard. While you could use something like spec length to compare different formats, including such a measure as a legitimate comparison point seems unfounded in evidence or even discussion by anyone other than the editor who added that row in. Rdelfin (talk) 21:01, 30 May 2021 (UTC)[reply]
User NotEnoughWikiContributors collapsed this section (according to RedPint) stating "due to its length" without providing a summary. Neither has a user page. This is a talk page, not an article. Many of the ideas mentioned in the article show significant bias! One person's 'right' is another's "not-left", i.e. some negative comments can be considered positive features.
JSON human readable?? Talk about TOML as "syntactically noisy"
TOML too many features??
" square brackets for arrays even though square brackets are already reserved for table names; "
If this section is collapsed a summary should be included. Better yet bias in the article should be removed.
I'm super new to editing wikipedia articles and I'm not experienced or knowledgeable about language design, so I'm merely talking about a pattern I feel I noticed. Also I haven't done full-on fact checking yet so please correct me if I'm wrong.
I feel that some criticisms against TOML in the "Criticism" section are somewhat biased and that the bias should be mentioned:
See also PEP 518
StrictYAML's criticism on TOML seems to:
Denote being strongly typed as a bad thing for TOML. However, being strongly typed or weakly typed are not inherently bad characteristics.
State syntax typing to be bad. Again, syntax typing is not inherently bad.
Criticize TOML as a serialization language and not as a configuration language like TOML was meant to be.
Libconfini seems to also criticize TOML for lack of INI compatibility, even though TOML was never meant to provide backwards-compatibility for INI files. Libconfini also seems to denote verbose syntax and the necessity of using quotes for strings as "bad", which I disagree with, as verbose syntax can reduce unnecessary errors that come from not specifying a value as a string. For example in YAML:
# Should be a string.
First Name: Christopher
Last Name: Null
I agree with StrictYAML and libconfini, so be warned of my bias. StrictYAML criticizes the fact that TOML uses syntax typing, in general, not just for strings. The critique from libconfini is quite long (it contains 20 points), and it does not have much to do with TOML's verbosity. Like StrictYAML, libconfini criticizes the fact that TOML defines data types via syntax. Beware that data types can exist also in INI files, but are not determined by the syntax (in INI "YES" and YES can both be booleans, 15 and "15" can both be numbers, and so on). But really the points that libconfini lists are many, and you should be more precise with what you disagree with.
Null would be interpreted as a null value instead of a string. If strings were wrapped around it, then it would have been clear to the parser that "Null" is a string. That is the main difference between a serialization format and a configuration format. In a serialization format you don't know beforehand what is expected, while in a configuration format what is expected is well known and is not up to you. Take JSON as a good example of a serialization format. There you definitely want to have the possibility to make a clear distinction between
{"First Name":"Christopher","Last Name":null}
and
{"First Name":"Christopher","Last Name":"Null"}
In a configuration file instead there is nothing you can write more than
First Name=Christopher
Last Name=Null
You might wish you had a way to force Null to be parsed as null instead of "Null", but it is not up to you to decide what data type Last Name is, and the application will anyway have the last word. As libconfini says, a mismatching data type in TOML is either sanitized (and in this case TOML behaves like INI), or discarded (and so your wish to force Null to be parsed as null instead of "Null" is not met). In configuration files you have no ways to force anything. Unless you have any concrete example where INI risks to be ambiguous, libconfini does have a point. --RedPint (talk) 05:32, 12 October 2021 (UTC)[reply]
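To make the distinction above concrete, here is a minimal Python sketch using only the standard library (`json` and `configparser`). The `[person]` section header is an assumption added for illustration, because `configparser` requires one; it is not part of the example above.

```python
import configparser
import json

# JSON is a serialization format: the syntax itself separates null from "Null".
as_null = json.loads('{"First Name": "Christopher", "Last Name": null}')
as_text = json.loads('{"First Name": "Christopher", "Last Name": "Null"}')

# INI carries no type in its syntax: the parser returns the raw text and
# the application decides what it means.
# The [person] header is hypothetical; configparser needs a section.
ini = configparser.ConfigParser()
ini.read_string("[person]\nFirst Name = Christopher\nLast Name = Null\n")
last_name = ini["person"]["Last Name"]

print(as_null["Last Name"], repr(as_text["Last Name"]), repr(last_name))
```

In the INI case the value arrives as the string `Null`; whether that means "no last name" or a person called Null is up to the application.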
libconfini argues that since the application will just decide the data types and there are multiple data types in TOML, it is unnecessary to have multiple data types. However TOML was designed to have values easily parsed to various data structures, so verbose syntax is naturally going to accomplish this goal the best, due to the reduction of ambiguity. If the only data types were tables, strings, and arrays, it would be more difficult to easily parse TOML files.
For example: If an application were to use an INI format, since INI files only use the string data type, the application would need to manually parse each individual value to convert the data, which would make parsing values to data structures more difficult, even if they were to use an INI parser.
I don't understand this objection. A TOML parser will parse strings too (a plain text file contains only strings), but will have to deal with the additional task of having to deduce a type from the syntax and check if this matches with what the application requires, while an INI parser will go straight to the type requested by the application. How is TOML easier to parse? As for the ambiguity, I would need an example where INI is ambiguous. --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
The application could just treat it as lowercase. Is this a problem?
Yes. A TOML parser is not designed to do that. For example, a TOML parser will likely have a lookup() function for searching for a key. If you are looking for a key named TOML and you use toml it will just not find it. --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
I don't really understand libconfini's argument here, as they seem to imply that requiring enclosed quotes for keys in Unicode is bad (because it introduces unnecessary complexity), without specifying the reason as to why.
A TOML parser could potentially interpret an unquoted unicode key as if it had ASCII characters (say a unicode character producing an ASCII space or tab), which would result in having to parse the whole TOML file as if it was full Unicode.
Requiring keys to be quoted to have unicode reduces ambiguities that the application would have to deal with.
You keep using the word "ambiguity" without explaining it. What kind of ambiguity can an application deal with because of Unicode characters? --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
What I mean is that a TOML parser (that only parses ASCII) could potentially misinterpret a Unicode character as a different character (like a newline or space). Sure TOML parsers could add unicode support to parse unquoted unicode key names, but that unnecessarily increases the complexity of TOML parsers. - NotEnoughWikiContributors (talk) 23:25, 13 October 2021 (UTC)[reply]
A TOML parser needs to support non-ASCII characters (UTF-8), the TOML specification requires it. But after being able to read Unicode characters, a TOML parser is also required to throw an error if Unicode characters and spaces are out of quotes in key names. The reason? Unknown. --RedPint (talk) 00:51, 14 October 2021 (UTC)[reply]
There are valid use cases for having time data types in TOML. Not only do date, time, and date-times data types make parsing date/time values easier, they also make the value clear to the reader that it is a value describing date or time, which also accomplishes TOML's goal of being obvious.
Sure. As libconfini says, "why not doing that for a path? Or a username? Or an email address? Or a regular expression? …Or a continent name? These have all a more constraining semantics than dates". Most applications need those more often than dates – OK, maybe not continent names... So, why exactly dates? --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
I can't find Tom's reason for having date/times in TOML, so I'm going to try and derive the reason.
I think date times are in TOML for mostly readability reasons (as TOML is meant to be easily readable). After all to a reader TOML date(times) are expressed without quotes, so it would (ideally) express to the reader that date(times) are a numerical value, not a string value.
Side note: I think the expression of TOML's design goal of being "easy to read due to obvious semantics" is vague. TOML could be trying to be easily readable for applications instead of humans, or it could be attempting to be easily readable by only humans or both. I think that goal should be elaborated upon as to not stir confusion. - NotEnoughWikiContributors (talk) 20:21, 13 October 2021 (UTC)[reply]
You can ask them. The way I read it is "easy for humans". The easiest thing to read for machines are binary files that map exactly the applications' memory (often these files are called .dat files).
There's no reason to make a custom data type for (file) paths, usernames, email addresses, regular expression, or continent names, because they can easily be expressed as strings. - NotEnoughWikiContributors (talk) 20:21, 13 October 2021 (UTC)[reply]
Same goes for dates. If you want arguments in support, TOML never explained why dates, and no configuration format ever felt the necessity to create a "date type". Honestly it looks like the date type is just randomly there. --RedPint (talk) 21:20, 13 October 2021 (UTC)[reply]
> Strings are a problem for the same reason we prefer strongly typed `int64_t`, `double`, `bool` in the rest of the TOML spec: Typing adds value. If I wanted to mess around with strings and the "can only parse later but not when reading the file" approach I could stick with JSON or a dozen other (inferior) config formats. That TOML can parse (and hence _validate_) datetime objects is a core strength of TOML. Removing datetimes would greatly reduce the usefulness of TOML.
> [I already did 20+ minutes ago. So let me repeat more explicitly:] We have (large, complex) parameter estimation configurations which change through time and have before/after/during date(time) boundaries. Certain setups are valid, then change into other setups. I turned to TOML (and currently work on R support for it) because date(times) are native for TOML. - NotEnoughWikiContributors (talk) 19:32, 15 October 2021 (UTC)[reply]
Thanks for the research! My comments:
"Typing adds value" Unless you use something similar to XML Schema (W3C), which does add value – and TOML does not use anything similar to that – typing adds a further obstacle and discourages using comments as sorts of schema substitutes (things like "Please use a number between 51 and 109 for this key", and so on, which is what I really love about INI files – P.S. How does TOML deal with numbers that must remain between 51 and 109?).
I agree that XML is not human-friendly. I wasn't proposing using XML, I was saying that the only way to ensure proper validation is to schematize things (it doesn't have to be via XML, it just needs to be a schema; even an INI file can have a schema written in INI, and in that case INI will properly support data types, but still avoiding syntax typing – e.g. where a number is expected you can write indifferently 15 or "15", but not "Wikipedia"; INI data types will always be content-based and not syntax-based, more or less like what happens in Bash) --RedPint (talk) 05:10, 16 October 2021 (UTC)[reply]
"P.S. How does TOML deal with numbers that must remain between 51 and 109?"
It doesn't, as there's no need for the language to deal with min-maxes; just save it as an inline table with two integers (one as the minimum, the other as the maximum), and let the application handle that. Same also goes for INI files, except there is no explicit type, so the application has to manually translate the value. - NotEnoughWikiContributors (talk) 03:30, 16 October 2021 (UTC)[reply]
I made a very specific example on purpose, but I could have made the example of unsigned numbers, which are a common data type in programming languages. How does TOML force a number to be non-negative? What I am saying is that the language that is truly able to express data types is INI, because the application is truly allowed to establish its data types freely. With TOML instead data types need to be filtered by the official TOML syntax, which becomes counter-intuitive in front of custom data types, small sets of enumeration labels, continent names, and so on. --RedPint (talk) 05:10, 16 October 2021 (UTC)[reply]
"TOML can parse (and hence _validate_) datetime objects is a core strength" That is crazy. Validating email addresses must be the next revolution in configuration files then. But when we will reach the point of validating continent names as primitive data types it will be the ultimate configuration format.
It is crazy to define something as trivial as dates as "a core strength of TOML". That makes the language look rather poor. Validating email addresses is more sophisticated than validating dates; I suppose that if I create a configuration format that validates email addresses I easily beat TOML. Plus, email addresses are way way more often necessary than dates. Also treating dates differently makes the fact that all other custom data types must be represented by quoted strings look even uglier. --RedPint (talk) 01:02, 17 October 2021 (UTC)[reply]
"Certain setups are valid, then change into other setups. I turned to TOML (and currently work on R support for it) because date(times) are native for TOML" The guy talks about a very specific problem. I am sure somewhere in the world there is a person who would love continent names to be native data types so that they can validate their configuration files without having to write code. Validating dates is a trivial task for every application, it literally requires two-three lines of code using standard tools. --RedPint (talk) 21:09, 15 October 2021 (UTC)[reply]
I think they are talking about having to translate between time formats, which I agree with. When there is no one preexisting data type for time, developers just make their own format, similar to how people will make their own ideal standard format, which can result in difficulty translating time formats between multiple programs (just look at a bunch of foreign JSON, YAML, and XML files that store time as an example). It is not easy at all to maintain compatibility for most proprietary formats, and it is practically impossible to maintain full compatibility with each time format. I can easily imagine INI files having compatibility issues, since INI dialects can vary widely across lots of programs.
E.g. one program stores date-times as DD/MM/YYYY-SS:MM:HH, a second stores date-times as YYYY/MM/DD:HH:MM:SS, and another stores MM/YYYY/DD:SS:HH:MM. If one of these programs were to use an INI parser, the application would have to manually translate a value into the time format it uses, potentially with the program failing due to a correct time value being stored in the wrong format. Granted, these time standards are nonsensical, but they are to convey my point: it is not easy to translate times.
There is no standard way of writing dates (or better, there are several standard ways), so the best possible choice is to leave freedom (after all, why should a configuration format prefer a format above another? what is the gain?). The strptime() Unix function can deal with the most common formats (Unix is the standard here, so that function is findable under other names also on MS Windows). --RedPint (talk) 05:10, 16 October 2021 (UTC)[reply]
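A sketch of the "leave freedom and let the application parse" approach, using Python's `datetime.strptime`, which takes format codes similar to the C function mentioned above. The list of accepted formats and the `parse_date` function are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical list of formats an application might decide to accept.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(text: str) -> datetime:
    """Try each accepted format in order and return the first match."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


print(parse_date("16/10/2021"))
```

Note that a value such as 03/04/2021 is read day-first here only because of the formats chosen; another application could read it month-first, which is the translation problem raised earlier in this thread.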
This discussion takes a lot of space, so I'm going to collapse it. Since this discussion is getting pretty long, if you want to converse with me any further on this topic, bring it to my talk page. - NotEnoughWikiContributors (talk) 23:37, 16 October 2021 (UTC)[reply]
This is the only point that I agree with. Empty key names should be illegal, because there is no reason for a config file to have empty key names, since it is harder to identify information in empty keys.
TOML was not designed/meant to be compatible with INI. Even if it was, there are a large amount of various INI dialects, so it would be difficult to make a TOML language that is also compatible with INI files.
It did not even try. Such a critique makes perfect sense from the point of view of an INI parser written to deal with different dialects. --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
Exactly, so it did not even try. That would not be a sin if only TOML was not so similar to INI, almost a copy. As for the "more complicated specification", reading the documentation of libconfini does not look more complicated to me than reading that of TOML (and with the exception of arrays of tables/sections, libconfini supports in INI all the hierarchy and data types that are supported by TOML – while the latter does not support many features that libconfini supports, like disabled nodes, implicit keys, relative section paths, and so on). --RedPint (talk) 21:20, 13 October 2021 (UTC)[reply]
libconfini assumes that configuration files are meant to be able to be human-editable. This is not always the case, as there are configuration files that are not meant to be easily edited by humans.
Immediate and obvious are two different terms. Obvious is to be easily discovered, while Immediate is in a short time. It is impossible for a language to be immediately understood upon reading due to the nature of learning an unknown language, but if the language is obvious in its syntax and easily readable (which is what TOML aims to be), the reader could potentially understand it more easily.
libconfini gives a fallacious emotional argument that TOML is worse than INI because TOML was made out of Tom Preston-Werner's (hereby shortenable to Tom) dislike for INI's unquoted strings.
Libconfini criticizes the fact that disliking something aesthetically is not enough of a reason for creating a language that makes life more complicated. As for the fallacy, I don't understand where you see it. --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
That is not what Tom did though. He created the TOML format due to a lack of a standard configuration file format. Sure making another file format standard could make parsing files more complicated, but was that ever a problem when other file formats arose (like JSON and YAML)? - NotEnoughWikiContributors (talk) 19:25, 13 October 2021 (UTC)[reply]
"I hate how unquoted strings look, so I'll make a language!"
"I hate unquoted strings because they are inherently ambiguous and they go against TOML's design goals."
The two statements above have different meanings. The former states that the person made a language because of an aesthetic they didn't like, but the latter states that the person made a language because there was no configuration file format standard and that unquoted strings go against TOML's design goals. That is why I consider libconfini's quote to be fallacious - NotEnoughWikiContributors (talk) 19:25, 13 October 2021 (UTC)[reply]
Tom Preston-Werner literally says "TOML came about precisely because I can't stand unquoted strings in things like YAML and INI.". So a fallacy would be reporting the opposite. After saying that he created TOML because he "can't stand" unquoted strings, he adds that unquoted strings are ambiguous. On the other hand libconfini illustrates how quoting things instead can create ambiguity, showing the example of continent = Europe, where quoting "Europe" will make you believe that it is a free string, while according to it it is a sort of enumeration label out of only five possible continents, and in configuration files most "strings" are actually like Europe. --RedPint (talk) 21:20, 13 October 2021 (UTC)[reply]
I looked over the quote and realized that I conflated Tom's reasoning for creating TOML (disliking unquoted strings) with how unquoted strings are ambiguous, so sorry about that. Even still I don't really think that the birth place of a standard matters all that much when criticizing TOML. - NotEnoughWikiContributors (talk) 23:22, 13 October 2021 (UTC)[reply]
With the continent = Europe example, while programs will technically parse that value in as an enumeration label, the program will still take the value as a string, so it should be denoted that the value is a string; otherwise someone could erroneously think that Europe is a custom data type, when it is in practice a string that is parsed by the program as a custom data type. - NotEnoughWikiContributors (talk) 23:22, 13 October 2021 (UTC)[reply]
While in the file, everything is a string – this is simply the definition of plain text files. When both an INI parser and a TOML parser read either Europe or 1999 they simply tokenize these two strings, but they are still strings. At this point though an INI parser simply waits the application before assigning any data type. In the case of 1999 the INI parser will likely have a number parser included, so it can pass it to the application directly as a number (if that is what the application asks for); in the case of Europe instead, being a custom data type (yes, it is a data type), it will pass it verbatim (as the original string) to the application, and the application will have its own "continent data type parser". As for the TOML parser instead, it will automatically convert the 1999 string into a number and pass it to the application as such no matter what, while Europe will generate an error no matter what (unknown data type). --RedPint (talk) 00:51, 14 October 2021 (UTC)[reply]
Point 18: Performance
Sure TOML would typically be slower to parse due to the complexity of the language, but in practice this point is moot because:
TOML is meant to be a minimal configuration file format (as in it is for small configuration files). If a TOML file is large enough to take a long time to parse, then at that point TOML would be misused here, because it is not designed for large configuration formats.
Configuration files rarely need to be large enough to cause slowdowns with the parser.
As a language, TOML is not minimal (of the four formats listed in § Comparison to other formats it is probably the most bloated one). If a bloated language is designed for small files something is probably not right (you would normally want a bloated language for a bloated file, not the other way around – and even there, you will probably try to avoid bloated languages at all). As for the size, configuration files of large projects can be really huge. --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
The homepage's description of how TOML was meant to be minimal is vague, due to the grammar. Tom could have meant minimal as in small configuration files, minimal as in small language-bloat, or minimal as in quick to parse. However I wouldn't blame you if it turns out that TOML was designed with a different meaning of 'minimal' than first implied, or if I misinterpreted the goal. I currently interpret 'minimal configuration file format' to be small configuration files.
TOML is not meant or designed for large configuration files; only minimal configuration files, so trying to use TOML for large configuration files is like trying to use a hammer to insert a screw; Sure you could technically insert a screw with a hammer, but it would be more effective to use a screwdriver.
> You might wish you had a way to force Null to be parsed as null instead of "Null", but it is not up to you to decide what data type Last Name is, and the application will anyway have the last word. As libconfini says, a mismatching data type in TOML is either sanitized (and in this case TOML behaves like INI), or discarded (and so your wish to force Null to be parsed as null instead of "Null" is not met). In configuration files you have no ways to force anything. Unless you have any concrete example where INI risks to be ambiguous, libconfini does have a point.
If TOML was designed/meant to be human-editable, you would have a point. However TOML is only meant to be an easy-to-read and easy-to-parse format for minimal configuration files; it was never meant to be easily editable, because configuration files generally do not need/are not intended for manual edits. - NotEnoughWikiContributors (talk) 06:52, 13 October 2021 (UTC)[reply]
You repeat that TOML was not designed to be human-editable, although such a statement would require a source. The reason of having a configuration file is only that of letting it be editable by a human. If you don't need your configuration to be editable you would use a binary format, which does not need parsing at all and maps directly into the program's memory.
Last but not least, this discussion cannot be about the configuration languages that some Wikipedia editors like the most. It should be about the TOML article. As it is now, it only contains a § Criticism section in the end. Since critiques of TOML exist, it seems fair to me. --RedPint (talk) 08:24, 13 October 2021 (UTC)[reply]
I brought the idea that "the flaws in critiques against TOML should be mentioned" up because the Effects of pornography wikipedia page brings up that there are a fair amount of issues with the current research on the effects of pornography (bias, lack of substantial sample size, and narrow sample types which does not account for other types of romantic relationships). So I got the idea that I think these points should at least be mentioned in the wikipedia page, so as to not be biased. - NotEnoughWikiContributors (talk) 20:21, 13 October 2021 (UTC)[reply]
If you can document the opposite points of view, yes, why not? You cannot amend a critique, but you can show opposite points of views. Consider though that this page in particular suffers the fact that it relies too much on primary sources, as the banner in the lead shows. So, secondary sources would be needed. --RedPint (talk) 21:20, 13 October 2021 (UTC)[reply]
I think the comparison section is heavily biased against TOML. It doesn't mention aspects that are important to configuration files that happen to be in TOML's favor such as support for date, time, and date/time values (an issue with JSON) and the ability to safely load external data (an issue with XML and YAML, as they both load other documents by default).
The "Human Readable" section should be removed completely. The term "human readable" just means that you don't need to decode some binary representation in order to understand it. That includes all of those languages. (Also, singling out XML is just plain weird.)
In the "Syntax Typing" column, YAML is problematic. It decides if something is a string or boolean based on a magic list of words, not the syntax, sometimes.
In "Allows Comments", it should be noted that while JSON doesn't formally support them, many parsers do.
CUE isn't even popular enough to have a page of its own. So why is it in the chart?
In short, the whole comparison is a mess and probably should be deleted. If a comparison is desired, it should be started over from scratch on its own dedicated page. — Preceding unsigned comment added by 148.64.20.31 (talk) 02:06, 23 September 2022 (UTC)[reply]
How about something like the percentage of "noise" characters (where keywords and values are considered length =1)
For example:
JSON:
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 27,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
Evaluation :
{
"f": "J", "l": "S",
"i": t,
"a": 27,
"a": {
"s": "2",
"c": "N",
"s": "N",
"p": "1"
},
text: 17
quotes in keywords: 18
quotes in value strings: 12
quotes in numeric entries: 0
commas separating entries where spaces would do: 8
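The tallies above can be reproduced mechanically. This is only a sketch in Python: it counts over the JSON fragment exactly as quoted (which is cut off after the address object), and the regular expressions are an assumption that works for this fragment, not a general JSON tokenizer.

```python
import re

# The fragment as quoted above; it is incomplete, so it is treated as text.
fragment = '''{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 27,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},'''

# A key is a quoted string directly followed by a colon.
keys = re.findall(r'"[^"]*"\s*:', fragment)
# A value is whatever follows a colon, unless it opens a nested object.
values = re.findall(r':\s*("[^"]*"|[^\s,{}"]+)', fragment)

text_tokens = len(keys) + len(values)
key_quotes = 2 * len(keys)
value_quotes = fragment.count('"') - key_quotes
commas = fragment.count(",")

print(text_tokens, key_quotes, value_quotes, commas)
```

Braces and colons are left out here because the tallies above do not count them; a complete "noise" measure would need to state how they are treated.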