Talk:Mersenne Twister


Reference needed

For the technically interested, somebody should give a reference to the original publication.

—Preceding unsigned comment added by 134.130.15.1 (talk) 18:42, 22 March 2002

Done.
—Preceding unsigned comment added by 213.253.40.49 (talk) 01:40, 23 March 2002

equidistributed

What does "equidistributed in 623 dimensions" mean? AxelBoldt, Saturday, March 30, 2002

Well... It means that if you take the outputs and use them to pick points in 623-dimensional space, you get a statistically uniform distribution in the unit hypercube (assuming you pick between 0 and 1). I would have thought that was obvious? If not, perhaps some parenthetical clarification is indicated?
—Preceding unsigned comment added by The Ostrich (talkcontribs) 10:02, 29 May 2002
Perhaps better: that all bit sequences of length up to 623 appear equally often in the output. Acanon 23:30, 1 Sep 2004 (UTC)

No. It is *not* merely "all sequences of up to 623 output *bits*". It's the much stronger "all sequences of up to 623 output *values*, where each value is 32 bits".

(In contrast, a 65-bit LFSR is equidistributed only for sequences of up to 2 output 32-bit values. The third 32-bit output value of that LFSR is heavily biased, and the fourth value can always be predicted from the previous three with 100% accuracy.)

--DavidCary 8 July 2005 23:46 (UTC)
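
A minimal sketch, in C, of the geometric reading above: consecutive 32-bit outputs are grouped into one k-dimensional point in the unit hypercube. next_u32() is a hypothetical stand-in for any 32-bit generator output function (e.g. MT19937's output). Roughly speaking, k-distribution to 32-bit accuracy means that, over the generator's full period, every such point occurs equally often, apart from the all-zero point occurring once less.

 #include <stdint.h>
 
 /* Illustrative only: group k consecutive 32-bit outputs into a point
    in the k-dimensional unit hypercube. */
 void next_point(uint32_t (*next_u32)(void), double point[], int k)
 {
     for (int j = 0; j < k; j++)
         point[j] = next_u32() / 4294967296.0;   /* scale each output to [0, 1) */
 }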

"twist" is for long period, "temper" is for equidistribution.
but I can't say it well in English.
61.211.103.62 12:04, 5 February 2006 (UTC) M.Saito[reply]

description

Perhaps someone should include a description of the algorithm?

—Preceding unsigned comment added by 141.150.119.240 (talk) 16:34, 27 September 2004

Mistake in code?

Within the 'k'-indexed 'for' loop, the pseudocode uses 'i' instead of 'k'.

—Preceding unsigned comment added by 195.92.40.49 (talk) 08:58, 13 March 2006

Also 'index' / 'i' in the final function.
I've corrected both of these.
—Preceding unsigned comment added by El10t (talkcontribs) 17:20, 13 March 2006

cryptography and the MT

I added a sentence about why the MT can't be used for cryptography — observing the sequence of iterates for long enough allows one to predict or independently calculate all future iterates. I don't know if this is sufficient. Lunch 01:23, 16 March 2006 (UTC)[reply]

I do not understand what that means exactly. Does this prediction require knowing that the MT is used (and no other PRNG)? How is the observation actually done, and how must the numbers be analysed? And why doesn't this contradict the equidistribution? --SiriusB 14:56, 14 May 2007 (UTC)[reply]
There's nothing odd about predictable but still equidistributed. As a very simple example, consider the sequence 1, 2, ..., 500, 1, 2, ..., which repeats indefinitely. This is an equidistributed sequence (each value from 1-500 appears the same number of times) but if I were to tell you the current number in the sequence, you would immediately be able to give the next number in the sequence. —Lowellian (reply) 20:58, 22 May 2007 (UTC)[reply]
OK, but this is far from being a random-like distribution. To put it another way: "random-like" and "predictable" should contradict each other. But the random-like distribution of the MT is said to be among the best of all, so how can a random-like distribution still be predictable? Or is it only predictable if one knows that the MT is used (and how it works, of course), so that the predictability reduces to finding the correct seed? --SiriusB 10:57, 23 May 2007 (UTC)[reply]
First of all, "random-like" and "predictable" are not contradictory. A good pseudorandom number generator seems "random-like," but by definition of its pseudorandomness, is also predictable if you understand the generating algorithm and have the seed. If you observe about 600-something values of MT, it is possible to determine where in the MT sequence you are, and from there determine future values. —Lowellian (reply) 03:10, 27 May 2007 (UTC)[reply]

Changed seeding algorithm!

Was: MT[i] := last_32bits_of(69069 * MT[i-1])

Is now: MT[i] := last_32bits_of((69069 * MT[i-1]) + 1)

There are 2^19937 possible states for the MT19937 generator and only one of them is unacceptable: all zeros. The old LCG will itself produce an endless run of zeros if the seed is 0. Moreover, I believe it is a mangled version of a VAX generator. See [1]. (Both seeding variants are sketched in code below.)

—Preceding unsigned comment added by 86.138.249.106 (talk) 17:23, 27 June 2006
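
A minimal sketch of the two seeding variants above, assuming a 624-word MT19937 state array (the function names are made up for illustration): the old multiplicative LCG leaves the state all zero for seed 0, while the +1 variant does not.

 #include <stdint.h>
 #define N 624
 
 /* Old variant: a pure multiplicative LCG.  A zero seed leaves every word
    zero, which is the one state MT19937 must never start from. */
 void seed_old(uint32_t MT[N], uint32_t seed)
 {
     MT[0] = seed;
     for (int i = 1; i < N; i++)
         MT[i] = 69069u * MT[i - 1];        /* last 32 bits kept automatically */
 }
 
 /* New variant: adding 1 makes the LCG escape zero, so even seed = 0
    yields a usable (non-zero) state. */
 void seed_new(uint32_t MT[N], uint32_t seed)
 {
     MT[0] = seed;
     for (int i = 1; i < N; i++)
         MT[i] = 69069u * MT[i - 1] + 1u;
 }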

Seeding

The C implementation of MT takes a 32-bit value as a seed. Does anyone know how to choose seeds that will generate sequences that don't overlap (for, say, 2^64 values) in the 2^19937-1 long full sequence? Or can one be "reasonably sure" that all those 2^32 seeds generate disjoint sequences? Any results published on this? Andras 84.3.64.82 09:54, 12 April 2006 (UTC)[reply]

Yes, there are positive results from Matsumoto & Nishimura themselves (see web site):
Makoto Matsumoto and Takuji Nishimura, "Dynamic Creation of Pseudorandom Number Generators", Monte Carlo and Quasi-Monte Carlo Methods 1998, Springer, 2000, pp. 56-69.
—Preceding unsigned comment added by 81.208.61.201 (talk) 08:17, 27 March 2007

The "twist" and equidistribution

The article says: "The "twist" is a transformation which assures equidistribution of the generated numbers in 623 dimensions (linear congruential generators can at best manage reasonable distribution in 5 dimensions)" I think this is probably wrong. The equidistribution in 623 dimensions is also attained by a linear congruential generator provided it is as big as Mersenne Twister, and that it is based on a primitive polynomial. In fact all the widely discussed good properties of Mersenne Twister are shared by a such a construction. There are a few good things the "twist" achieves but they are very subtle (perhaps too much so for an encyclopedia article). 203.164.223.124 02:49, 28 April 2006 (UTC)[reply]

Speed

The article says: "It is faster than all but the most statistically unsound generators." Not really. It is quite fast. It is difficult to make a generator this good that runs faster. But a lagged fibonacci generator with sufficiently large lags can run quite a lot faster, and has good enough properties for many purposes. A lagged fibonacci generator with 4 taps (instead of the usual 2) is likely at least as fast as MT and behaves quite well. The construction used in SPRNG with two lagged fibonnaci generators combined with right shift and XOR is also very fast and very good. None of these have the proveable nice properties of the twister, but in practice i would not call them "most statistically unsound".

BTW the free source code distributed by the original inventors of MT is kind of acceptably quick, but could be a lot faster if optimised, especially for a particular machine. But even when optimised it will be a lot slower than lagged Fibonacci. 203.164.223.124 02:49, 28 April 2006 (UTC)[reply]
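
For readers who have not met the construction, a minimal additive lagged Fibonacci sketch using the classic lags 24 and 55 (the four-tap variants mentioned above just combine more taps). This only illustrates the structure being compared against MT, not any particular library's generator.

 #include <stdint.h>
 
 /* Additive lagged Fibonacci generator x[n] = x[n-24] + x[n-55] mod 2^32.
    The 55-word state must be seeded beforehand with not-all-even values. */
 static uint32_t lf_state[55];
 static int lf_i = 0;    /* index of x[n-55], the oldest word  */
 static int lf_j = 31;   /* index of x[n-24]  (55 - 24 = 31)   */
 
 uint32_t lf_next(void)
 {
     uint32_t x = lf_state[lf_i] + lf_state[lf_j];   /* wraps mod 2^32 */
     lf_state[lf_i] = x;                             /* overwrite the oldest word */
     lf_i = (lf_i + 1) % 55;
     lf_j = (lf_j + 1) % 55;
     return x;
 }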

My thoughts: 1) Several variations are faster, and this is even noted in the article.

2) "most statistically unsound generators" is extremely ambiguous. Does this refer to the most statistically unsound generators in common use? Are we comparing it to the 17 most unsound generators in common use, or the 2 most unsound? Are we including conceivable generators that have never been used?

3) Even if true and unambiguous, this is [peacock language] and a violation of Wikipedia standards. A better statement would be "it runs in O(log N) time" (or whatever time it actually runs in). You could even compare this to the time of other commonly used generators. —Preceding unsigned comment added by Bvbellomo (talkcontribs) 21:21, 13 November 2007 (UTC)[reply]

equidistribution

The provable equidistribution in 623 dimensions may be a red herring. It only applies when totalled over the whole period of the generator, and the period is so huge that no real application will ever come close to exhausting it. It is actually possible to achieve the same guarantee for some really bad generators.

Here's one: take an unsigned binary number with 623*32 = 19936 bits. Initialise it to some value as the seed. Return successive 32-bit sections of the number as pseudo-random return values. When the end of the number is reached, increment the number and start back at the first 32-bit section (a code sketch of this follows below). Over the whole period, this has equidistribution in 623 dimensions. Over any data set you would use in practice, it sucks.

Mersenne Twister seems to be very well behaved in practice, but this has very little to do with the theoretical guarantee. 203.164.223.124 02:49, 28 April 2006 (UTC)[reply]
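
A sketch, in C, of the deliberately bad counter generator described above; the names and word layout are just for illustration.

 #include <stdint.h>
 #define WORDS 623                     /* 623 * 32 = 19936 bits of state */
 
 static uint32_t counter[WORDS];       /* seeded elsewhere with some value */
 static int pos = 0;                   /* which 32-bit section to return next */
 
 uint32_t bad_next(void)
 {
     uint32_t out = counter[pos++];
     if (pos == WORDS) {               /* end of the number reached:       */
         pos = 0;                      /* start over at the first section  */
         for (int i = 0; i < WORDS; i++)
             if (++counter[i] != 0)    /* increment the big number, with carry */
                 break;
     }
     return out;
 }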

What you described is equidistributed in exactly 623 dimensions; from what I understand, this algorithm has proven equidistribution in up to 623 dimensions, a much stronger property which isn't achieved by your algorithm. --192.198.152.98 15:27, 13 May 2007 (UTC)[reply]

Revision Update

There is a new revision (2002/1/26) by the original authors of MT. We need to update this article to reflect the changes: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c

—Preceding unsigned comment added by 210.19.133.3 (talk) 03:15, 5 October 2006

(Yes, please. The previous initialization, the one that the article mentions, had severe problems; see the explanation (Google's cache) and the new code (Google's cache). The function to revise is init_genrand().) --pgimeno 13:23, 2 January 2007 (UTC)[reply]

Also, we need more explanation of the code. I'm a little unclear about a particular for-loop used in this new revision:

k = (N>key_length ? N : key_length);
for (; k; k--) { ... }

What does this mean? k is initialised to either N or key_length before the loop, and the for-loop's first (initialisation) parameter is left empty, but the second parameter is also k. From the code, it looks as if he's decrementing k, but with the same initial and end conditions. The for-loop is the same as

for(k;k;k--) {...}

I can't make any sense of this. Can anyone explain? Thanks.

—Preceding unsigned comment added by 210.19.133.3 (talk) 03:15, 5 October 2006

It's equivalent to:
k = max(N, key_length);
while(k != 0) {
   ...
   k--;
}
--64.235.97.125 19:16, 16 October 2006 (UTC)[reply]
That's correct. The second parameter in "for(k;k;k--)" is treated as a Boolean and will return true as long as it is not zero. —Lowellian (reply) 21:02, 22 May 2007 (UTC)[reply]

I don't understand...

I'm somewhat puzzled by the "applications" paragraph. What does it mean that 624 observations allow one to predict the future iterates? Are there any known issues with some seeds? (Please try to be simple, I'm not really into these things.)
MaxDZ8 talk 14:32, 26 October 2006 (UTC)[reply]

The tempering function is bijective, so an inverse function exists. So we can rebuild the internal state vectors from 624 outputs, and then run the MT algorithm to get the next values in the sequence. 133.41.131.128 01:11, 14 March 2007 (UTC)[reply]
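
A minimal sketch of that inversion in C, using the MT19937 tempering constants shown in the code elsewhere on this page (illustrative only, not the authors' reference code). Applied to 624 consecutive outputs, it recovers the 624 internal state words.

 #include <stdint.h>
 
 /* Invert the MT19937 tempering  y ^= y>>11;  y ^= (y<<7)&0x9d2c5680;
    y ^= (y<<15)&0xefc60000;  y ^= y>>18;  (steps undone in reverse order). */
 uint32_t untemper(uint32_t y)
 {
     y ^= y >> 18;                        /* self-inverse: shift >= 16         */
     y ^= (y << 15) & 0xefc60000u;        /* also self-inverse for this mask   */
 
     uint32_t t = y;                      /* undo  y ^= (y << 7) & 0x9d2c5680  */
     for (int i = 0; i < 4; i++)          /* each pass fixes 7 more low bits   */
         t = y ^ ((t << 7) & 0x9d2c5680u);
     y = t;
 
     t = y;                               /* undo  y ^= y >> 11                */
     for (int i = 0; i < 2; i++)          /* each pass fixes 11 more high bits */
         t = y ^ (t >> 11);
     y = t;
 
     return y;
 }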

I believe I got it. Thank you.
MaxDZ8 talk 06:59, 14 March 2007 (UTC)[reply]

Pseudo code

MT generates 624 numbers at once, but that is only for speed. Pseudocode should show the algorithm clearly, without considering speed. I don't know the grammar of pseudocode, so I show it in C:

   unsigned long genrand_int32(void)
   {
       unsigned long y;
       static unsigned long mag01[2]={0x0UL, MATRIX_A};
       /* mag01[x] = x * MATRIX_A  for x=0,1 */
       mti = mti % N;
       y = (mt[mti] & UPPER_MASK) | (mt[(mti + 1) % N] & LOWER_MASK);
       mt[mti] = mt[(mti + M) % N] ^ (y >> 1) ^ mag01[y & 0x1UL];
       y = mt[mti];
       mti++;
       /* Tempering */
       y ^= (y >> 11);
       y ^= (y << 7) & 0x9d2c5680UL;
       y ^= (y << 15) & 0xefc60000UL;
       y ^= (y >> 18);
       return y;
   }

—The preceding unsigned comment was added by 133.41.131.128 (talk) 04:32, 27 March 2007 (UTC).[reply]

There is no grammar for pseudo-code; it's just supposed to be readable by all human programmers, but not by a computer program. It should look like Pascal, but you're free to use things like shift 5 bits to the left(), even though this is clearly a syntax error in Pascal or C. 88.170.208.244 (talk) 11:44, 12 August 2008 (UTC)[reply]

Period

When the article talks about the period, it states that

"in practice, there is little reason to use larger ones, as most applications do not require 2^19937 unique combinations".

This seems like misinformation. On the Pseudorandom Number Generator page, it says that 2^19937 combinations is

"likely far more than the number of computations which can be performed within the entire future existence of the universe)".

I think this statement should be rephrased.

142.162.90.179 19:09, 10 April 2007 (UTC)[reply]

Which statement do you think should be rephrased, the former or the latter? I see no problem with the former statement. It is perfectly true that most applications do not require 2^19937 unique combinations. If it is the latter statement you see a problem with, you should discuss it on Talk:Pseudorandom number generator, not here, since that is the page on which the statement appears. —Lowellian (reply) 20:47, 22 May 2007 (UTC)[reply]
I suggest saying:

"This period exceeds the need of any practical application." Cuddlyable3 10:39, 27 June 2007 (UTC)[reply]

Not at all. The period of MT makes it unsuitable for many practical applications. For example, the Python function random.shuffle is limited by the period of MT. [2], [3] Shabda 09:26, 9 August 2007 (UTC)[reply]
That's not a practical limitation. It's easier to see that with randomly generated text than with permutations: most texts can never come out of a PRNG, because the set of texts is too big (even for fairly short texts). But you'll never notice unless you're patient enough to wait for monkeys to write a Shakespeare play, which might take you a few universe lifetimes, and then cry foul when the allegedly random monkeys start repeating themselves from the beginning (at which point you notice that they were actually just robots using a PRNG). All a bigger period would do is put off that point, and with an infinite (i.e. no) period the point would never come, but if there's nothing else wrong with the PRNG then this is not much of a limitation. It's similar to secure hashes, which are used to, e.g., give "unique" 256-bit tags to arbitrary bit strings, even though it's plain as day to everyone that the tags can't possibly be unique (by the pigeonhole principle). None of it matters in practice: if this is the only problem with a secure hash, then the day you produce a counterexample we'll be busy dealing with the flying pig pest problem. -- Coffee2theorems 00:01, 10 August 2007 (UTC)[reply]

Unlike Blum Blum Shub, the algorithm in its native form is not suitable for cryptography. Observing a sufficient number of iterates (624 in the case of MT19937) allows one to predict all future iterates. Combining the Mersenne twister's outputs with a hash function solves this problem, but slows down generation.

I think the last sentence in the quote above is overly broad and vague. Will combining the Mersenne twister's outputs with any hash function make it suitable for cryptography? For what types of cryptographic applications will it be suitable? Are there certain cryptographic applications for which the combination (Mersenne twister + hash function) would be easy to attack? How should the Mersenne twister be combined with the hash function?
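
One possible reading of "combining with a hash function", sketched below under assumptions: hash blocks of raw output before exposing them, so that an observer never sees the 624 iterates needed to reconstruct the state. mt_next() and hash256() are hypothetical placeholders rather than a specific library API, and nothing here answers which cryptographic applications the result would actually be adequate for.

 #include <stdint.h>
 #include <stddef.h>
 
 /* Hypothetical interfaces: mt_next() returns one raw 32-bit MT output,
    hash256() is any cryptographic hash producing a 32-byte digest. */
 extern uint32_t mt_next(void);
 extern void hash256(const uint8_t *in, size_t len, uint8_t out[32]);
 
 /* Emit 32 whitened bytes: hash a full block of 624 raw outputs, so the
    values an observer sees are digests rather than the iterates themselves. */
 void whitened_block(uint8_t out[32])
 {
     uint8_t buf[624 * 4];
     for (int i = 0; i < 624; i++) {
         uint32_t y = mt_next();
         buf[4*i + 0] = (uint8_t)(y);
         buf[4*i + 1] = (uint8_t)(y >> 8);
         buf[4*i + 2] = (uint8_t)(y >> 16);
         buf[4*i + 3] = (uint8_t)(y >> 24);
     }
     hash256(buf, sizeof buf, out);
 }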

Dubious advantages

The article claims as an advantage that "3. It is faster than all but the most statistically unsound generators." Well, the GSL timing data at http://www.csse.uwa.edu.au/programming/gsl-1.0/gsl-ref_16.html show at least two faster generators, neither of which has shown any statistical defect (or I haven't unearthed a reference to that effect). So can someone please document the claim? If not, it'll be removed in a week.

According to Jackel (2002), Monte Carlo Methods in Finance, the Mersenne Twister "is in fact faster than most other reliable pseudo-random number generators" and "no slower than any of the other pseudo-random generators" (p. 75) (perhaps a small exaggeration in a casual discussion). Measure for Measure 03:39, 11 November 2007 (UTC)[reply]

I'll also note that the advantage given as "4. It passes numerous tests for statistical randomness, including the stringent Diehard tests." is somewhat tautological. Most currently used generators pass "numerous" tests as well, even the Diehard ones. With that in mind, (4) is like saying that the advantage of a particular kind of car is that it "demonstrably moves you from one place to another". When they all share the same advantage, what is the advantage?

There is also a lack of disadvantages here. MT19937 has a very large amount of state compared to many other generators, and there is no efficient "stepping" ability (issues noted by l'Ecuyer). mdf 19:22, 14 September 2007 (UTC)[reply]

The extensively used program Matlab has a number of built-in random number generators. The default one has a period of 2^1492, and as far as I know it is reliable and would have been tested extensively because of the extensive use of Matlab in simulation work requiring random numbers. Matlab (v7) also has the option to use MT as the random number generator. In testing it, I found that the MT was about 30% slower than the default random number generator. Since the Matlab built-in generator has a period long enough for most (well ... probably ANY) purposes, there doesn't seem to be any real advantage (apart from curiosity value) in using the MT option. Indeed, Matlab only offers the MT option for the rand() function that gives uniform deviates - the MT option is not available for the randn() function, which generates Gaussian deviates. Alan1507 12:33, 15 September 2007 (UTC)[reply]

My timing tests agree with yours in R2007a (7.4) but the MathWorks people think there's enough to MT that it's the MATLAB default for RAND as of this version. Still not available in RANDN. Patrick O'Leary 17:38, 4 October 2007 (UTC)[reply]
Before you start removing stuff, let's discuss this a bit more. I gather from your timing table that the fastest three are, in order, taus, gfsr4 and mt19937. I take it mt19937 is this one. My question: do taus and gfsr4 pass all the same tests that mt19937 passes? If so, which tests have you applied? Did you apply any tests that some algorithms pass, but others fail? Can you provide a reliable source to back up your claims? 82.139.85.143 04:13, 14 October 2007 (UTC)[reply]
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf reports tests for a number of other generators of this ilk. See table I, starting at about page 28. A few of these are faster than MT19937, and all report comparable results. Take care to note that the cut-off for failure is extremely small in these tests. (To paraphrase Linus Torvalds: anyone who thinks a generator that fails a statistical test at the 1.0e-10 level is 'statistically unsound' may well be smoking crack ;-/). Which is to say that the comments from Jackel, quoted by Measure for Measure above, are a better reflection of reality. mdf (talk) 19:05, 19 November 2007 (UTC)[reply]
To mdf: What exactly do you mean with "no efficient 'stepping' ability"?--SiriusB 09:56, 14 November 2007 (UTC)[reply]
To jump ahead so-and-so many states, without actually having to compute every intermediate state. For F2-linear generators (like MT19937), this amounts to a simple matrix multiply, and the matrix can be computed quickly in a small amount of space, even for extremely large steps. L'Ecuyer mentioned, in a few of his many papers, that the huge state of MT19937 makes the stepping issue hopeless (consider 19937-sized matrices and O(n^3) algorithms). However, when trying to relocate them to answer your question, I immediately came across Efficient Jump Ahead for F2-linear Random Number Generators, in which Haramoto, Matsumoto, Nishimura and L'Ecuyer effectively solve the MT19937 stepping problem. mdf (talk) 19:05, 19 November 2007 (UTC)[reply]
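
A toy sketch of the jump-ahead idea for a hypothetical F2-linear generator with a 64-bit state (names made up for illustration): one step is s -> As over GF(2), so jumping k steps means applying A^k, computed by repeated squaring. For MT19937 the corresponding matrices are roughly 19937 x 19937, which is where the O(n^3) cost mentioned above bites.

 #include <stdint.h>
 
 /* State is a 64-bit vector s; one step is s -> A*s over GF(2).
    A is stored row-wise, each row a 64-bit mask. */
 typedef struct { uint64_t row[64]; } mat64;
 
 /* r = A * v over GF(2): bit i of r is the parity of (row i AND v). */
 static uint64_t mat_vec(const mat64 *A, uint64_t v)
 {
     uint64_t r = 0;
     for (int i = 0; i < 64; i++)
         r |= (uint64_t)__builtin_parityll(A->row[i] & v) << i;  /* GCC/Clang builtin */
     return r;
 }
 
 /* C = A * B over GF(2): row i of C is the XOR of the rows of B selected
    by the set bits of row i of A.  (Safe when C aliases A or B.) */
 static void mat_mul(mat64 *C, const mat64 *A, const mat64 *B)
 {
     mat64 T;
     for (int i = 0; i < 64; i++) {
         uint64_t acc = 0, a = A->row[i];
         for (int j = 0; j < 64; j++)
             if ((a >> j) & 1)
                 acc ^= B->row[j];
         T.row[i] = acc;
     }
     *C = T;
 }
 
 /* Jump ahead k steps: compute A^k by repeated squaring, then apply it. */
 uint64_t jump_ahead(const mat64 *A, uint64_t state, uint64_t k)
 {
     mat64 P = *A, R;
     for (int i = 0; i < 64; i++)
         R.row[i] = (uint64_t)1 << i;     /* R starts as the identity */
     while (k) {
         if (k & 1)
             mat_mul(&R, &P, &R);         /* powers of A commute, so order is fine */
         mat_mul(&P, &P, &P);
         k >>= 1;
     }
     return mat_vec(&R, state);
 }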

I should have read the discussion before making changes; I had no idea this was such a live issue. Please feel free to revert if I have misunderstood. However, as a nonspecialist, I'd suggest "fast" might be a sufficient comment for this article? Servalo (talk) 00:32, 29 November 2007 (UTC)[reply]

Examples of users

Here is a start on a list of examples of users of the Mersenne twister. When it gets to be a decent size it can go in the article. Kaldosh 09:32, 25 September 2007 (UTC)[reply]

Pseudocode

This is the current pseudocode in the article:

 // Create a length 624 array to store the state of the generator
 var int[0..623] MT
 var int y
 // Initialise the generator from a seed
 function initialiseGenerator ( 32-bit int seed ) {
     MT[0] := seed
     for i from 1 to 623 { // loop over each other element
         MT[i] := last_32bits_of(1812433253 * (MT[i-1] bitwise_xor right_shift_by_30_bits(MT[i-1])) + i)
     }
 }

 // Generate an array of 624 untempered numbers
 function generateNumbers() {
     for i from 0 to 623 {
         y := 32nd_bit_of(MT[i]) + last_31bits_of(MT[(i+1)%624])
         if y even {
             MT[i] := MT[(i + 397) % 624] bitwise xor (right_shift_by_1_bit(y))
         } else if y odd {
             MT[i] := MT[(i + 397) % 624] bitwise_xor (right_shift_by_1_bit(y)) bitwise_xor (2567483615) // 0x9908b0df
         }
     }
 }
 
 // Extract a tempered pseudorandom number based on the i-th value
 // generateNumbers() will have to be called again once the array of 624 numbers is exhausted
 function extractNumber(int i) {
     y := MT[i]
     y := y bitwise_xor (right_shift_by_11_bits(y))
     y := y bitwise_xor (left_shift_by_7_bits(y) bitwise_and (2636928640)) // 0x9d2c5680
     y := y bitwise_xor (left_shift_by_15_bits(y) bitwise_and (4022730752)) // 0xefc60000
     y := y bitwise_xor (right_shift_by_18_bits(y))
     return y
 }

The definition of pseudocode is: a compact and informal high-level description of a computer programming algorithm that uses the structural conventions of programming languages, but omits detailed subroutines, variable declarations or language-specific syntax. The programming language is augmented with natural language descriptions of the details, where convenient, or with compact mathematical notation. Well, this code is indeed not language-specific, but it is neither compact nor informal. It is much bigger and less readable than the equivalents in most languages (spelling out "right_shift_by_11_bits" and then assuming % for modulo without explanation?). Since the majority of users of such code are probably going to be C/C++ or FORTRAN users, I propose that it is converted into working C code:

// Create a length 624 array to store the state of the generator
// Most numbers are unsigned 32-bit integers
unsigned long int MT[624];
unsigned long int y;
int current_idx; // points to the current MT output number

// Generate an array of 624 [0..623] untempered numbers
void MTgenerateNumbers() {
  int i;
  for (i = 0; i <= 623; ++i) {
    y = ((MT[i] & 0x80000000UL) >> 31) // take 32nd bit
      + ( MT[(i+1) % 624] & 0x7fffffffUL); // AND with last 31 bits
    if (y & 1)
      // y odd
      MT[i] = MT[(i + 397) % 624] ^ (y >> 1) ^ 2567483615ul; // 0x9908b0df; ul indicates unsigned long in C
    else
      // y even
      MT[i] = MT[(i + 397) % 624] ^ (y >> 1);
  }
  current_idx = 0;
}

// Procedure to initialise the generator from a 32-bit seed
void MTinitialise(long unsigned int seed) {
  int i;
  MT[0] = seed;
  for (i = 1; i <= 623; ++i) // loop over remaining elements
    // & is a binary AND, ^ is a binary XOR, and >> a bit shift
    MT[i] = 0xffffffff & (1812433253 * (MT[i-1] ^ (MT[i-1] >> 30)) + i);
  MTgenerateNumbers();
}

 
// Extract the next tempered pseudorandom number
unsigned long int MTrandom() {
  // call generateNumbers() again if the array of 624 numbers is exhausted
  if (current_idx >= 624)
    MTgenerateNumbers();
  y = MT[current_idx];
  y = y ^ (y >> 11);
  y = y ^ ((y << 7) & 2636928640ul); // 0x9d2c5680
  y = y ^ ((y << 15) & 4022730752ul); // 0xefc60000
  y = y ^ (y >> 18);
  ++current_idx; // increment index
  return y;
}

Testing it with seed=1 gives this for the first few numbers:

  0 2773488264
  1 3990704388
  2 3757302560
  3 1411641254
  4 2593950992
  5  307370599
  6 2147185445

If that is not correct, then the pseudocode might be ambiguous. For example, it is not clear to me whether 32nd_bit_of(MT[i]) means (MT[i] >> 31) or (MT[i] & 0x80000000). Han-Kwang (t) 01:07, 21 November 2007 (UTC)[reply]

Ambiguity is bad, and should be fixed. I have no problem with working C/C++ code, though this may encourage optimization, and complaints that "Wikipedia sucks because it doesn't compile with BlazerSnort++ 6.1!". Test vectors are indeed a good idea, but they bring in "original research" and "howto" issues. If that is moot, then it would be good to give the above, plus the values for states 624 - 630. mdf (talk) 02:20, 21 November 2007 (UTC)[reply]
It seems that there are variations on the initialization routine; e.g. this reference implementation produces different numbers when seeded with 1. I don't want to put in a piece of compilable C code unless I'm really sure that it is correct. A few comments above, there is a C snippet which might be better as an example. Unfortunately, it is incomplete: what are MATRIX_A, UPPER_MASK, LOWER_MASK, N? How is the mt array initialized? Han-Kwang (t) 19:20, 25 November 2007 (UTC)[reply]
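
For reference: as far as I can tell, those symbols in the authors' mt19937ar.c are just the MT19937 parameters that already appear in the pseudocode on this page, defined along these lines, and mt[] is filled by init_genrand(seed), which uses the same 1812433253-based recurrence as initialiseGenerator() above.

 #define N 624
 #define M 397
 #define MATRIX_A   0x9908b0dfUL  /* constant vector a */
 #define UPPER_MASK 0x80000000UL  /* most significant w-r bits */
 #define LOWER_MASK 0x7fffffffUL  /* least significant r bits */
 
 static unsigned long mt[N];      /* the array for the state vector */
 static int mti = N + 1;          /* mti == N+1 means mt[] is not initialised */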

All right. While I feel that a link per programming language is OK, I also think that we should get some sort of order or hierarchy into that list. I suggest alphabetical order. Opinions? --Gulliveig (talk) 14:22, 25 November 2007 (UTC)[reply]

What is CR?

"The Mersenne twister is a pseudorandom number generator linked to CR developed in 1997 by Makoto Matsumoto" What is "CR"?

What is WELL?

The section about SFMT compares it to WELL in terms of speed... I found some information on http://www.iro.umontreal.ca/~panneton/WELLRNG.html - in the PDF it says that WELL stands for a family of "Well Equidistributed Long-period Linear" random number generators. Also, it is not clear to me why it is relevant in the discussion of SFMT, and not mentioned earlier. Are WELL generators significant enough to be mentioned at all? --Cynebeald (talk) 08:23, 17 January 2008 (UTC)[reply]

Moreover, what does "v-bit accuracy" mean? The reader cannot be expected to guess (or even know) what a v-bit is and how it relates to random number generators. --SiriusB (talk) 14:16, 31 March 2008 (UTC)[reply]