Talk:Magic number (programming)
Magic numbers in text files
Does anybody know whether it is possible to distinguish text files from binary files by using the magic number?
- Well, yes and no: at a very basic level, no, because there is technically no difference between a binary file and a text file; all files are stored as binary data, and if you interpret them as ASCII (or Unicode, or whatever) you can display them as text. A magic number is simply part of the data of a file, not an external property, so it is only a judgement of what the file looks like, not a definite sign of what it is. OTOH, the file(1) command (which would get much greater mention were I to rewrite this article, which I might) uses a set of tests powerful enough that it will tell you if a file is pure ASCII (i.e. it has no bytes that would be non-displayable if interpreted as ASCII), and even, I think, what language it is likely to be (possibly based on the relative frequency of different letters; I'm not sure). So while this is pushing the definition of a magic number somewhat, tools based on the concept can indeed differentiate "text files" from other binary data. - IMSoP 15:48, 20 Apr 2004 (UTC)
- Source code is 'text', so is HTML. Text with accents in it is 'text' too. It really depends on what you mean by 'text', but text files generally do not have a magic number Elektron 11:02, 2004 May 6 (UTC)
- Indeed, but as I say, it depends on what you mean by "magic number" as well - utilities like file(1) basically just check magic numbers, but can also make judgements like "is there anything in this file that would be crazy if interpreted as text". As such, HTML files are a subset of text files; JPEG files, however, aren't - they contain things that you couldn't possibly interpret as meaningful ASCII. - IMSoP 21:48, 6 May 2004 (UTC)
- Unicode text files (UTF-8 less often than others) typically include magic numbers known as byte order marks. -- intgr 13:29, 14 November 2006 (UTC)
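The file(1) behaviour described in this thread can be approximated with a simple heuristic. This is only an illustrative sketch of the idea, not file(1)'s actual test suite:

```python
def looks_like_text(data: bytes) -> bool:
    """Rough heuristic in the spirit of file(1): a file 'looks like'
    ASCII text if every byte is printable ASCII or common whitespace."""
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}  # printable + tab/LF/CR
    return all(b in text_bytes for b in data)

print(looks_like_text(b"#!/bin/sh\necho hello\n"))  # a shell script is plain ASCII: True
print(looks_like_text(b"\x89PNG\r\n\x1a\n"))        # PNG header has control bytes: False
```

As the thread notes, this is a judgement about what the data looks like, not a definite property of the file.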
Magic strings and stuff
Should we include magic strings (such as "$1$" used to identify an MD5 password)? Is the MD5 init data (0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, which is really 0123456789abcdeffedcba9876543210 as four little-endian longs) a magic number? Elektron 11:02, 2004 May 6 (UTC)
It's arguable that you can consider a string to be a number of sorts (Consider shebang as a weak example). How is "$1$" used? How is the init data used? Is that set of longs always used in md5? (I'm not familiar with it). I would say yes, though... Dysprosia 11:10, 6 May 2004 (UTC)
- A computer looks at everything as a sequence of 1's and 0's. How those are interpreted is up to you (as a user or programmer). If you want to see a sequence of 1's and 0's as a piece of music, a picture, a text string, or simply a number, that's up to you. In the case of magic numbers, well you need to provide some kind of number. So how about (for instance) picking a number that happens to have the same bits as a string? It's easier to remember. :-) Kim Bruning 18:37, 6 May 2004 (UTC)
- Well, in some cases that isn't quite the causal order of things, but yes. I mean, "<html>" and "<?xml>" could both be used as magic numbers, but they were strings first, with meaning to a text-based interpreter system. Still, it comes to the same thing - a string can be seen as a number, a number can be seen as a string. - IMSoP 21:51, 6 May 2004 (UTC)
- I wouldn't call <HTML> 'magic', since you can use <hTmL>, <html >, and its location isn't fixed in the file (you can prefix it with a <!DOCTYPE>). I also doubt that <?xml (3c3f786d6c) requires case-sensitivity or that it occurs at the very beginning of the file, and could equally be in UTF-16 (feff 003c 003f 0078 006d 006c). Of course, FEFF (or FFFE if you order the bytes wrong) can probably be considered a magic number for UTF-16, and efbbbf for UTF-8. Elektron 04:37, 2004 May 8 (UTC)
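The byte order marks mentioned above can be sniffed like any other magic number. A minimal sketch, checking only the three BOMs discussed in this thread (a real sniffer would also need to consider UTF-32, whose little-endian BOM FF FE 00 00 begins with the UTF-16-LE one):

```python
def sniff_bom(data: bytes):
    """Return a guessed encoding from a leading byte order mark, or None.
    FE FF vs FF FE is exactly the endianness distinction noted above."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    return None

print(sniff_bom(b"\xff\xfe<\x00?\x00"))  # UTF-16-LE "<?"
```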
- The set of longs is used in MD5_Init so that it has some bits turned on at the beginning. Its choice is arbitrary, and would be like calling the standard CRC32 polynomial a 'magic number'. It fits the description we've given (a number chosen for a specific purpose), but doesn't fit the common use of magic numbers (to mark data). "$1$" is used in what OpenSSL calls "the MD5 based BSD password algorithm 1" (I don't think it's been formally named, and the function is just crypt_md5). Such hashed passwords look like "$1$salt$hash", as opposed to the UNIX crypt() which looks like "cDr5vRCSFWdnM" (two characters of salt, and then the hash). There's also the 'new'-style crypt, which starts with an underscore, and isn't widely supported. Elektron 04:37, 2004 May 8 (UTC)
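Elektron's claim about the MD5 initialisation vector can be checked directly: packing the four words as little-endian 32-bit integers does yield the ascending-then-descending nibble pattern.

```python
import struct

# The four MD5 initialisation words (A, B, C, D) from RFC 1321.
A, B, C, D = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

# Packed as little-endian 32-bit words, the bytes count up 0..f and back down.
packed = struct.pack("<4I", A, B, C, D)
print(packed.hex())  # 0123456789abcdeffedcba9876543210
```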
Move here from article:
0xBADC0DE
- (This must have been used somewhere? It's perfect..)
Kim Bruning 10:55, 27 Jun 2004 (UTC)
- One of my former colleagues used
0xC0FFEE
as a magic number in company-internal tools. — JIP | Talk 06:33, 15 Apr 2005 (UTC)
Anti-pattern
This has the category "Anti-patterns" and is linked from Anti-pattern but no justification is provided in the article as to what's wrong with it. --Random|832 01:09, 2004 Dec 16 (UTC)
- I presume this refers to the "Magic numbers in code" section, which begins
- The term magic number also refers to the bad programming practice of using numbers directly in source code without explanation.
- That section goes on to explain why the practice is a bad idea. I've fixed Anti-pattern to link to that heading for now, but in general I still wonder if there's some reorganisation to be done here (i.e. this page split, and some of the parts potentially merged with things elsewhere). - IMSoP 11:39, 16 Dec 2004 (UTC)
- Sounds like hard code to me. --Astronouth7303
- Are there any plans to split this page then? Maybe Magic number (file formats), Magic number (debuggers), and Magic number (antipattern)? Ojw 18:12, 12 August 2005 (UTC)
- I would hesitate to split this page due to the small size of the resulting articles. Deco 22:19, 12 August 2005 (UTC)
- As a programmer, the concepts of hardcoding numbers into programs, designing debuggers, and designing file formats seem like totally different subjects to me. Ojw 22:29, 12 August 2005 (UTC)
- That's nothing compared to pages like fragmentation that discuss uses of the term in several totally different fields. If the sections grow to the point where this article is getting too large, then I think some kind of split is appropriate. Deco 21:25, 1 September 2005 (UTC)
Shebang = 0x2321 or 0x2123
After reading this article I fooled around with hexdump, dumping the first bytes from various files on my filesystem. But when dumping some shell scripts, I found that '#!' was 0x2123, and not 0x2321 as the article says. So I "corrected" the article. But the Shebang article says 0x2321 too, and googling around doesn't make me any wiser. So I'm a bit confused now: which one is correct? -- RoceKiller 13:23, 14 Apr 2005 (UTC)
The line about shebangs (#!) in Unix shell script files was recently edited from 0x2321 to 0x2123. Actually, either is equally valid, as this is a question of endianness. Little-endian machines like x86 boxes use 0x2123, but big-endian machines like Sun SPARC computers (yes Virginia, there are Unixen for those too) use 0x2321. — JIP | Talk 13:24, 14 Apr 2005 (UTC)
Hexdump is silly, and it thinks that you care about things that are 'word'-aligned, from the days when words were 16 bits. Most GUI hex editors display the hexdump in byte order, though they group the bytes into shorts. Most hex editors also show a side-by-side local-8-bit-character-set text representation (I'll get around to coding a better hexdump sometime...). Where byte order matters, big endian should be preferred, since it extends easily to, say, 3-byte magics (like C0FFEE), and it's the order the bytes appear in the file. This both confuses fewer people and makes misaligned magics easier to spot. --Elektron 22:44, 2005 May 30 (UTC)
hexdump is not silly, and the order on both little and big endian machines should be exactly the same. I am sitting on a little endian machine (Fedora Core running on an AMD processor). The hex code for '#' is 0x23, and the one for '!' is 0x21. Endianness (if there is such a word) only applies to Integer and Float / Double data types. If you have a string of characters, they are just a string of characters. Are you sure the file you were looking at didn't have them as !# (first person)? The correct sequence is 0x2321. From the magic file (no copyright in file except for description strings) here is a sample shell descriptor:
0 string/b #!\ /bin/sh Bourne shell script text executable
/usr/share/file/magic for some nix machines, /etc/file/magic for others, look for shell --hhhobbit 20:57, 16 November 2006 (UTC)
- The correct SEQUENCE is 0x23 0x21; interpreting that sequence as a single 16-bit number will result in either 0x2321 or 0x2123 depending on the endianness it is interpreted with. Plugwash 14:43, 16 January 2007 (UTC)
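Plugwash's point is easy to demonstrate: the file always contains the byte 0x23 followed by 0x21, and endianness only enters when those two bytes are loaded into a 16-bit integer. A sketch:

```python
import struct

shebang = b"#!"  # the two bytes as they appear in the file: 0x23, 0x21
print([hex(b) for b in shebang])  # ['0x23', '0x21'] in file order

be, = struct.unpack(">H", shebang)  # read as a big-endian 16-bit word
le, = struct.unpack("<H", shebang)  # read as a little-endian 16-bit word
print(hex(be), hex(le))             # 0x2321 0x2123
```

This is why both 0x2321 and 0x2123 appear in references: they describe the same two bytes under different word interpretations.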
Magic constants and variables
Some people who remember only that "literals in code are generally bad" seem to think that simply putting their magic number into a variable or constant (especially a global one, in languages that support them) named "FIVE" or "INT_SEVEN", or even a self-describing text string (accessDenied = "Access Denied." is one of the more useful examples), is a good solution to the problem.
I think we should include at least one paragraph explaining that this approach is the same shit with different icing -- the point of avoiding magic numbers in code is to put them into more descriptive variables named not for their content (or content type) but for their purpose. LITTLE_PIGGIES_COUNT is okay, INT_FOUR isn't. Especially when the value of INT_FOUR might later be changed (there IS production code out there where constants like EIGHT have been set to 16 -- I'm not kidding). -- Ashmodai 07:10, 3 April 2006 (UTC)
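A sketch of the distinction Ashmodai is drawing, using the names from the comment above; the value 5 and the helper function are purely illustrative:

```python
# Bad: the name restates the value, so nothing is explained, and
# changing the value later makes the name a lie (cf. EIGHT = 16).
INT_FOUR = 4

# Good: the name states the purpose, not the value.
LITTLE_PIGGIES_COUNT = 5  # one per little piggy in the nursery rhyme

def piggies_staying_home(piggies_gone_to_market: int) -> int:
    # The purpose-named constant makes the intent of the arithmetic clear.
    return LITTLE_PIGGIES_COUNT - piggies_gone_to_market

print(piggies_staying_home(1))  # 4
```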
Non-magic numbers
This article should mention that sometimes a number isn't magic. These are usually limited to 0, 1, -1, and sometimes 2. I say this because I've seen well-intentioned code that looks like:
const double zero = 0.0; ... double x = zero;
which is useless, since it adds a layer of indirection but not a layer of abstraction. —Ben FrantzDale 15:01, 1 May 2006 (UTC)
Constants in program source and magic numbers
Are all hard-coded constants really considered magic numbers? I always thought that the term "magic number" was limited to usages where arbitrary numbers were used as uniquely distinguishing identifiers. The section on coding style seems well-intentioned, but out of place to me. Using the deck of cards analogy, a coding example that showed 1 = spades, 2 = clubs, etc. would seem more applicable than an example showing a constant for the number of cards in the deck. Andrwsc 17:27, 10 May 2006 (UTC)
- I've always understood it to mean all literal numbers that are made more readable by symbolisation. PhiTower 15:09, 25 March 2007 (UTC)
I think it's somewhat misleading to describe the only acceptable literal numbers as 0 and 1. This totally ignores the real issue, which is the programmer symbolising a literal number if and only if it makes the code more understandable/manageable. In this way 0 and 1 should be symbolised sometimes. On the other side, it totally ignores the presence of 2 in numerous algorithms that involve doubling or halving such as binary search, reversing an array, etc. Symbolising the 2 there would be silly. PhiTower 15:09, 25 March 2007 (UTC)
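PhiTower's point about 2 in halving algorithms, sketched below: the literal 2 in binary search is structural to the algorithm, and a named constant for it (say, a hypothetical HALVING_FACTOR) would obscure rather than clarify.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # halving: this 2 is not a magic number
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```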
I dare to oppose "most programmers would concede that the use of 0 (zero) and 1 are the only two allowable...". In fact, most old-school programmers use a lot of magic numbers for very good purposes. The people who fire the "antipattern shotgun" at everything are not always right. Sure, people do abuse magic numbers, like they abuse anything... but that does not make them wrong. Magic numbers can for example be used to give constants a meaning much like an "ordinary" enum does, but still visible in the binary code and the debugger. In most respects, a magic number has no disadvantages over any other number (the one notable exception is the switch() statement, because a compiler can implement switch() more efficiently using a jump table if contiguous numbers (such as from an enum) are used). As an example of a magic number being used in a file format (or data block), the Microsoft byte-order-mark is a good example of a sensible application. If you ever have to deal with documents coming in different encodings, BOM makes your life both as programmer and as user a lot happier; I so wish the Unicode Consortium had thought of something similar from the beginning. —Preceding unsigned comment added by 91.35.167.238 (talk) 12:06, 15 October 2007 (UTC)
In the spirit of being bold I've rewritten the entire section on acceptable use of magic numbers. I've added some more examples of common usages of magic numbers in code (drawn from my own programming experiences). I've mentioned the 0 and 1 as True/False, but added a note about the macro definitions in stdlib.h (or cstdlib in the C++ world). Likewise for null pointer use of 0. 205.200.229.174 10:30, 16 November 2007 (UTC)
- I thought 205.200.229.174's re-write was very good and have added some more wikilinks, some text formatting and some info from outside of the C/C++ world too for good measure. I wasn't sure about the section heading any more either, so I've tried to improve the tone of that as well. ---- Nigelj (talk) 18:57, 16 November 2007 (UTC)
- I made the edits by 205.200.229.174, but apparently wasn't logged in at the time. I've made a few further changes (mostly stylistic things, and a couple of typos). I think this section is much better-rounded now than it was before. Ve4cib (talk) 23:37, 17 November 2007 (UTC)
Magic numbers in protocols
I have added the request for expansion template, as this section is virtually empty. I was hoping to see a discussion about things like RFC1700 assigned numbers and their equivalent for other protocols, etc. Andrwsc 17:27, 10 May 2006 (UTC)
Notability
I don't consider all the numbers listed under "Magic debug values" to be notable enough to be worth mentioning here. I'd like to remove all the numbers with no mention about where they are used, as well as the ones where the use mentioned is not notable (I don't consider the string some random person has been using as MAC address or chat nick to be notable). Kasperd 09:47, 13 January 2007 (UTC)
- Agreed - be bold! --Nigelj 13:08, 14 January 2007 (UTC)
nintendo magic number
someone please append it;) http://oopsilon.com/The-Smallest-NDS-File Xchmelmilos (talk) 02:26, 12 April 2008 (UTC)
NPOV: magic constants
While I agree that magic constants are generally bad, it's not an NPOV statement to put in an encyclopedia article. Wikipedia is not a coding style handbook. I think that that section should be changed to reflect a neutral point of view. CapitalSasha ~ talk 04:29, 20 July 2008 (UTC)
- I agree.--Avl (talk) 18:20, 9 October 2008 (UTC)
It's hard to be totally neutral when something is widely acknowledged to be negative. See the Wikipedia section on Spaghetti code, for instance. Or you may want to peruse the entries in the "Anti-pattern" category. That said, I'm gonna rewrite the intro to be a little less prescriptive, and I'll also add refs. I am not the original author. Leemeng (talk) 04:12, 10 December 2008 (UTC)
Accepted use of magic numbers
Does anyone else think this section is slightly strange?
It first says that magic numbers are acceptable in some contexts. Then it says such acceptance is subjective. Yet it goes on with rather detailed prescriptions: "It should be noted that while multiplying or dividing by 2 is acceptable, multiplying or dividing by other values (such as 3, 4, 5, ...) is not, and such values should be defined as named constants."
If the acceptance of magic numbers is subjective, should wikipedia really be giving such detailed advice?
I just think this section reads like a textbook, with some weasel wording ("While such acceptance is subjective"). And I happen to think that what it describes is not generally accepted truth. But it may just be me... --Avl (talk) 18:19, 9 October 2008 (UTC)
- Agreed. Wikipedia is not prescriptive and this "detail" is not sourced. The problem such recommendations run afoul of (and this one does too) is that they fail to take context into account.
- The most common reason why 2 is acceptable, for example, is because it's the base of the binary system, which is natural to computers. But if I'm developing a library that uses base 3 calculations extensively, 3 enjoys the same status, and it would be unnatural to declare a constant for it. This is especially true if the algorithms themselves rely on calculations being done in base 3 (i.e. you cannot declare a constant BASE = 3 and then modify this constant afterwards without the algorithms breaking).
- I've deleted the specific statement you mentioned, as it added nothing and was demonstrably disputable. The rest of the section, while still in need of citation, doesn't run contrary to common usage as far as I can tell. 82.95.254.249 (talk) 13:17, 22 November 2008 (UTC)
Bytes vs Words
I replaced most of the instances of word values (e.g., 0x474946383961) with byte sequences (47 49 46 38 39 61), because the file formats discussed specify a certain byte sequence, not a certain integer word value. This distinction is especially important in light of byte endianness, since most of the magic number sequences are the same regardless of the underlying hardware reading the files. Where endianness is an issue for a given magic number (e.g., the TIFF file format), it is mentioned as such. | Loadmaster (talk) 16:30, 5 December 2008 (UTC)
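The byte-sequence view advocated here also keeps checking code portable: comparing the leading bytes directly sidesteps endianness entirely. A sketch using the GIF89a magic from this section:

```python
GIF89A_MAGIC = bytes.fromhex("474946383961")  # the bytes 47 49 46 38 39 61, i.e. b"GIF89a"

def is_gif89a(data: bytes) -> bool:
    # Compare the first six bytes in file order; no integer interpretation
    # is involved, so the test behaves identically on big- and
    # little-endian hardware.
    return data[:6] == GIF89A_MAGIC

print(GIF89A_MAGIC)              # b'GIF89a'
print(is_gif89a(b"GIF89a\x00"))  # True
```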