Talk:Radix sort

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 79.97.217.134 (talk) at 11:01, 7 January 2010 (Clever Programmers). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Possible errors

quicksort is not O(n log n), it's O(n^2); I fixed this

- Quicksort using a random pivot can be shown to be Θ(n*log n), since instead of one decision tree for the program there exist many. As n is made arbitrarily large, the probability of picking a tree that matches the sortedness of the data set so perfectly that the pivot is always "worst case" (and thus yields an n^2 run) becomes increasingly small. Since algorithmic analysis looks at the limit as n goes to infinity, we can state that this probability goes to 0 and thus that quicksort is indeed Θ(n*log n).

Nonsense! You must state in advance whether you want to discuss/consider/analyze average-case, worst-case, or whatever behaviour!! BTW: when talking about Radix-sort one should also mention its generalization, Hash-sort, which on average is asymptotically the fastest known sorting method, O(n), because it uses knowledge about the expected distribution of the keys. With a well-chosen hash function its run time is C*n + o(n). Enlightening example: sort N uniformly and independently distributed random numbers in the half-open interval [0,1), or, even more instructive, sort a permutation of the first N natural numbers! The latter can be done optimally because of sufficient knowledge of the expected data.
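For readers unfamiliar with the idea, here is a rough sketch of the distribution-based O(n) sort being described, under the stated assumption of keys drawn uniformly from [0,1) (a generic bucket-sort illustration; the function name and details are mine, not taken from any particular hash-sort paper):

#include <stdlib.h>

/* Sort n doubles assumed to be uniformly distributed in [0,1).
 * Each key is mapped to one of n buckets by its value, so on average each
 * bucket holds O(1) items and the whole sort takes O(n) expected time. */
void bucket_sort_uniform(double *a, size_t n)
{
    typedef struct node { double key; struct node *next; } node;
    node *pool = malloc(n * sizeof *pool);
    node **bucket = calloc(n, sizeof *bucket);
    if (!pool || !bucket) { free(pool); free(bucket); return; }

    for (size_t i = 0; i < n; i++) {            /* distribute keys into buckets */
        size_t b = (size_t)(a[i] * (double)n);  /* key in [0,1) -> bucket index */
        if (b >= n) b = n - 1;                  /* guard against rounding */
        pool[i].key = a[i];
        pool[i].next = bucket[b];
        bucket[b] = &pool[i];
    }

    size_t out = 0;
    for (size_t b = 0; b < n; b++)                   /* read buckets back in order, */
        for (node *p = bucket[b]; p; p = p->next) {  /* insertion-sorting as we go  */
            size_t j = out++;
            while (j > 0 && a[j - 1] > p->key) { a[j] = a[j - 1]; j--; }
            a[j] = p->key;
        }

    free(pool);
    free(bucket);
}

Sorting a permutation of the first N natural numbers is the extreme case: each value indexes its own bucket exactly, so no per-bucket cleanup is needed at all.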

This character seems to be off-base on a number of counts. Worst case is the conventional way of discussing algorithms.~JMK

std::sort is based on quicksort and heapsort, and is O(n log(n)). Stop knocking quicksort; yes, it's really O(n*n), but std::sort fixes that problem, and for real problems even raw quicksort is theta(n log(n)).

You are incorrect. Not all std implementations use introsort, and in fact introsort is a rather sophisticated algorithm that is still not widely known. Further, quicksort doesn't generally degrade all the way to n*n, but it can often perform somewhat worse than n log(n). I love quicksort dearly, but in many cases, perhaps most, a well-implemented radix sort is strictly better time-wise, and strictly worse space-wise. ~JMK

Hashsort is O(2k(n))[1], which is not O(n) for large k. When you compare realistic sorting algorithms that involve radix or hash-based sorting, you must assume both large n and large k. Bucketsort obviously is O(n) when log(n) > k, but it's impractical for k >> log(n), which is a major set of problems (some would argue most problems). Hashsort is slower than std::sort for large k with collisions. Now, if you would like a comparison of hashsort vs. Spreadsort or some other practical general algorithm, I'd be happy to make one. Cuberoot31 00:07, 2 December 2006 (UTC)[reply]

This is also wrong. Radix can be implemented with histograms to make it much faster than your imagined "worst" case in practice and in theory. Radix settles snugly at around 8 for its K even with crude optimizations. This is both worst and best case, and is regarding a floating point implementation. Please pay more attention to external research. ~JMK

Radixsorting with histograms is O(n(k/C)), where C is usually around 8. This is quite fast for small k (say 32), but as k grows, the Radixsort becomes slower. std::sort has used introsort for years[2], and you might be surprised at how well it compares with Radixsort, when you add in the runtime overhead of memory allocation and random access that you can't avoid with a radix sort. In my testing, std::sort is roughly 3 times as fast as quicksort, depending on the compiler and OS.Cuberoot31 (talk) 21:41, 20 May 2008 (UTC)[reply]
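For readers wondering what "radix sorting with histograms" looks like in code, here is a rough sketch of the counting-pass approach (a generic illustration with my own function name, assuming unsigned keys and CHAR_BIT == 8; it is not the exact code that was benchmarked above):

#include <stdlib.h>
#include <string.h>

/* LSD radix sort of unsigned ints using one histogram (counting) pass per byte.
 * Each pass is O(n + 256); for 32-bit keys there are 4 passes, so k/C = 4 here. */
void radix_sort_histogram(unsigned *a, size_t n)
{
    unsigned *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL || n < 2) { free(tmp); return; }

    for (unsigned shift = 0; shift < sizeof(unsigned) * 8; shift += 8) {
        size_t count[256] = {0};
        size_t pos[256];
        size_t running = 0;

        for (size_t i = 0; i < n; i++)                 /* histogram this byte */
            count[(a[i] >> shift) & 0xFF]++;

        for (int d = 0; d < 256; d++) {                /* prefix sums -> start offsets */
            pos[d] = running;
            running += count[d];
        }

        for (size_t i = 0; i < n; i++)                 /* stable scatter into tmp */
            tmp[pos[(a[i] >> shift) & 0xFF]++] = a[i];

        memcpy(a, tmp, n * sizeof *a);                 /* copy back for the next pass */
    }
    free(tmp);
}

The histogram replaces the per-item linked-list bookkeeping of a plain bucket sort, which is where most of the constant-factor savings come from.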

Huh! This test only shows there is no advantage over quicksort, really, because a constant multiplier of 3 means std::sort=O(quicksort) Alex 10:52, 19 Aug 2009 (UTC) —Preceding unsigned comment added by 193.9.13.137 (talk)


Does the sample implementation given depend on the endianness used by the computer? I'm not quite sure what exactly the >> operator does. --AxelBoldt

The value of x >> y is floor(x / 2^y), provided x and y are non-negative, and y is not too big. So it doesn't depend on endianness in this case. But if x is negative, then the result is implementation-defined, so this is a problem. Another problem with the code in the article is that it assumes that CHAR_BIT is 8. --Zundark, 2001 Dec 11
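A small demonstration of that point (assuming any conforming C compiler):

#include <stdio.h>

int main(void)
{
    unsigned x = 1000;
    /* >> operates on the value of x, not on its stored bytes, so the result
       is the same on big-endian and little-endian machines. */
    printf("%u\n", x >> 3);            /* 1000 >> 3 == floor(1000 / 8) == 125 */
    printf("%x\n", 0x12345678u >> 8);  /* prints 123456 */
    return 0;
}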

So maybe we should make them unsigned ints to take care of the sign issue? Is there a clean way to deal with the CHAR_BIT thingy?

I think we should probably have an implementation that deals with keys that are (arbitrary length) strings, since that seems to be the more interesting case. --AxelBoldt

I changed the int array to unsigned. The CHAR_BIT thing is more problematic, so I just added a note pointing it out. On most compilers for desktop computers CHAR_BIT is 8, so this isn't too bad a restriction. --Zundark, 2001 Dec 11
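For what it's worth, the restriction could be softened a little by computing the number of 8-bit passes from CHAR_BIT instead of hard-coding 4 (just a sketch of the idea; the shift amounts and any final partial byte would still need handling in the sort itself):

#include <limits.h>

/* Number of 8-bit digits needed to cover an unsigned int, rounding up,
 * without assuming CHAR_BIT == 8.  On typical desktop machines this is
 * simply sizeof(unsigned), i.e. 4 passes for a 32-bit unsigned. */
enum { RADIX_PASSES = (sizeof(unsigned) * CHAR_BIT + 7) / 8 };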


Having looked at it more closely, I see that it has more serious problems than those mentioned above, so I've moved it here until it gets fixed. --Zundark, 2001 Dec 11


Sample radix sort of an array of integers

This sample implementation is written in the C programming language.

#include <string.h>   /* for memset() */

/*
 * This implementation sorts one byte at a time, so 32-bit integers get sorted in 4 passes.
 * It uses a simplified bucket sort to sort on each byte, which requires O(n) extra memory.
 * It assumes that CHAR_BIT is 8, and it requires C99 for the variable-length array.
 */

struct listnode                               /* a simple linked list data structure */
{                                             /* used for the bucket sort */
   unsigned val;
   struct listnode * next;
};

void sort(unsigned * array, int length)
{
   int i, j, k;
   unsigned char key;
   struct listnode nodes[length];             /* an array of preallocated listnodes */
   struct listnode * buckets[0x100];          /* the buckets for the bucket sort */

   memset(buckets, 0, 0x100 * sizeof(struct listnode *)); /* start with every bucket empty */

   for(i = 0; i < (int)sizeof(unsigned); i++) /* iterate over the bytes in an unsigned int */
   {
      for(j = 0; j < length; j++)             /* iterate over the array */
      {
         key = (unsigned char)((array[j] >> (i * 8)) & 0xFF); /* grab the i-th byte */

         nodes[j].val = array[j];             /* push the value onto the front of its bucket */
         nodes[j].next = buckets[key];        /* (front insertion plus the reversed readout */
         buckets[key] = &(nodes[j]);          /* below keeps the sort stable) */
      }

      j = length - 1;
      for(k = 0xFF; k >= 0; k--)              /* loop over the buckets, highest value first; */
      {                                       /* an int counter is used because an unsigned char */
         while(buckets[k] != 0)               /* can never be negative, so "key >= 0" would never end */
         {
            array[j] = buckets[k]->val;       /* pull the values out of the buckets */
            buckets[k] = buckets[k]->next;    /* and put them back in the array, emptying */
            j--;                              /* the buckets for the next pass */
         }
      }
   }
}


I have written FORTRAN examples of both the recursive forward radix sort and the nonrecursive reverse radix sort. Both examples sort pointers to fixed length character string arrays. Should I add these examples to the article, or are non-C examples considered obsolete ? StuRat 09:47, 5 August 2005 (UTC)[reply]

- Don't see why they would be! Not everyone uses C after all


Is there anyone who could write a Pascal implementation of radix sort?

Style Improvement Request

In the section "Incremental Trie-based Radix sort", there is a sentence with this as its subject: "The sorted lists of other non-incremental radix sorting algorithms that sort all of the keys in a batch". I hope its obvious that this needs to be improved. I would do it myself, but I have no idea what it's trying to say. Can someone please attend to this? Thanks. Danielx 16:15, 13 August 2006 (UTC)[reply]

OK. I revised that sentence. -Ed Lee 24.12.11.31 03:04, 15 October 2006 (UTC)[reply]


Most of the article was already separated into a discussion of least significant digit radix sorts versus most significant digit radix sorts. One of the earlier contributors was probably only familiar with least significant digit radix sorts, which is why the article originally started off sounding as if least significant digit radix sorts were the only kind that existed. I've more explicitly separated the discussion of least significant digit and most significant digit radix sorts, reformatted the queue example to be easier to read, and made some other changes.

I think that the mention of, "In practice, O(n) time is only observed when the number of items is much greater than the size of the key space," is in reference to the observed execution time of an LSD radix sort processing fixed-length keys relative to the execution time of an O(n*log(n)) time sorting algorithm, because it might not be obvious from just looking at the asymptotic notation that a radix sort might actually be slower than a comparison-based sorting algorithm for a sufficiently small amount of input data. So, I added some discussion of that.

I've communicated with people who insist on defining the length of a linear quantity of data as n*log(n), meaning n keys with log(n) bits per key, so that the time complexity of a radix sort would end up looking like O(n*log(n)), and the time complexity of some comparison-based sorting algorithms would end up looking like O(n*log(n)*log(n*log(n))), but that notational convention is going to be confusing, so I did not want to go into a discussion of notational preferences.

Someone else who has relevant expertise in the area of in-place radix sorting will have to contribute on that topic, since I have not studied that topic and frankly can't see how that would work without allocating additional memory beyond the original memory required to store the input data. The computer would have to perform a lot of exchanges to do a radix sort in place, or perhaps I am misunderstanding what, "in place," means in this context.

No one is omniscient, so I encourage you to contribute what truths you know to the Wikipedia.

-Ed Lee 24.12.11.31 03:41, 15 October 2006 (UTC)[reply]


I've worked with a derivative algorithm, Spreadsort, for some time. Some fixes: Radixsort is O(n*k), or more precisely O(n*(k/s)*2^s) for an MSD Radixsort, where s is the size of the chunk processed at a time, in bits. The main sorting page for that is wrong, and I'm going to fix it. By contrast, comparisons can be quicker than O(k); they are often theta(1). They are only O(k) if usually done between items with very similar keys. Another issue with the assumption that a comparison is just as slow as checking small pieces of a key multiple times is that comparisons usually run sequentially along a single strip of memory (which is fast), versus looking at pieces of larger chunks in an array multiple times (which is slower). The biggest performance problem with MSD radixsort is the overhead of processing nearly empty bins (say you have 2 items in a bin, and they match in all but the last bit; you could be processing 256 bins each iteration k/s times instead of doing 1 comparison), and the memory overhead of creating bins. LSD's big problem is with long keys, especially strings. All the memory allocation/deallocation is really slow relative to other operations, on top of having to sort items of different lengths separately.

There is nothing fundamental stopping an "in-place" MSD radixsort, as stability is not required; the memory overhead of such an implementation is proportional to 2^s. Spreadsort can be thought of as a modified MSD radixsort that switches to comparison-based sorting where radixsort is slow and has cut the problem down. Cuberoot31 08:32, 9 November 2006 (UTC)[reply]

I have thought about it a little more, and MSD radix sort doesn't have much going for it when k > log(n). When k < log(n) it can be in-place and thus use less memory than LSD, but MSD's best performance when n < 2^k requires data that splits evenly into a single item per bin. Otherwise, if you have 2 items per bin, and they only differ on the last bit, you will recurse through all 2^s bins multiple times until you hit the end of the radix, where a comparison-based algorithm would do a single comparison on those 2 and be done. It is important not to forget the overhead of creating, filling, and deleting bins (or whole copies of arrays) in analyzing the relative performance of radix sorts. I have found in practice that memory operations are much slower than all but the slowest comparisons (string < being one slow exception) on a system that has been up long enough to fragment its memory, so it is important to keep this in mind when designing and analyzing radix-based sorts. This is a major reason why radix-based sorts aren't more popular. If you want to use MSD radix sort for its ability to run in parallel, why not use MSD for the first (or first few) iterations to split up the problem into smaller bins, then share those bins out across multiple systems, and have them use LSD radix sort (or Spreadsort)? MSD radix sort has poor performance on small bins, which you will eventually get if you recurse long enough, so much so that I consider it an impractical algorithm for general application on its own. If I'm missing something, I'd welcome some more comment. I believe that Spreadsort solves this weakness in MSD radix sort, which is one of the reasons why I developed it. I would welcome some real performance data comparing MSD radix sort versus LSD, std::sort, or Spreadsort on a single processor where n << 2^k. If I don't get a reply soon (a week), it sounds like this issue has been resolved, and I would like to take down the dispute tag. Cuberoot31 16:49, 10 November 2006 (UTC)[reply]
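For concreteness, here is roughly what the array-based MSD variant under discussion looks like with s = 8 (a simplified sketch with my own names, using a scratch buffer rather than working in place, and assuming CHAR_BIT == 8; the per-level loop over all 256 bins is exactly the small-bin overhead described above):

#include <stdlib.h>
#include <string.h>

/* Recursive MSD radix sort on unsigned ints, 8 bits per level.
 * Each level histograms the current byte, scatters the keys into bins in a
 * scratch buffer, copies them back, and then recurses into every bin. */
static void msd_sort_rec(unsigned *a, unsigned *tmp, size_t n, int shift)
{
    if (n < 2 || shift < 0)
        return;

    size_t count[256] = {0};
    size_t start[257];

    for (size_t i = 0; i < n; i++)                   /* histogram the current byte */
        count[(a[i] >> shift) & 0xFF]++;

    start[0] = 0;
    for (int d = 0; d < 256; d++)                    /* bin boundaries via prefix sums */
        start[d + 1] = start[d] + count[d];

    size_t pos[256];
    memcpy(pos, start, sizeof pos);
    for (size_t i = 0; i < n; i++)                   /* scatter keys into their bins */
        tmp[pos[(a[i] >> shift) & 0xFF]++] = a[i];
    memcpy(a, tmp, n * sizeof *a);

    for (int d = 0; d < 256; d++)                    /* recurse on every bin, even tiny ones */
        msd_sort_rec(a + start[d], tmp + start[d], count[d], shift - 8);
}

void msd_radix_sort(unsigned *a, size_t n)
{
    unsigned *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return;
    msd_sort_rec(a, tmp, n, (int)(sizeof(unsigned) * 8) - 8);
    free(tmp);
}

A hybrid in the Spreadsort/burstsort spirit would replace the recursive call with a comparison-based sort once count[d] falls below some threshold.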


Cuberoot31's discussion of MSD radix sort seems limited to an array-based MSD radix sort. A trie-based radix sort is a kind of MSD radix sort, and it performs best in terms of lowest memory usage and fastest execution speed when the keys are redundant and sparsely distributed in the key space. The recursive function calls of a more traditional array-based MSD radix sort analogously trace the paths of a trie in stack space in a depth-first traversal. I wrote BRADSORT, which is a kind of trie-based radix sort, and a link to its source code is in the reference section of the Wikipedia radix sort article. You can compile BRADSORT with a C compiler and compare execution times with different quantities and distributions of input data.

Regarding Rspeer's change on 04:53, 2 December 2006, where Rspeer comments, "take out misinformation about big-O notation. You never assume constant memory," I disagree. In the analysis of the time complexity of sorting algorithms, you certainly do assume constant-bounded memory or else you lose all connection to physical reality, as memory accesses no longer happen in constant-bounded time as the size of the memory grows due to the speed of light limitation.

-Ed Lee 24.12.11.31 06:05, 7 December 2006 (UTC)[reply]


I agree, the trie-based sorting does improve the worst-case memory usage and runtime for an MSD radix sort. So what do you claim as the worst, average, and best-case performance of your trie-based MSD radix sort? How does it handle the problem of (many groups of) a few items with only slightly different key values of the form: xxxxxxxxxxxxx xxxxxxxxxxxxy xxxxxyxxxxxxx xxxxxxxxxyxxx xxxxxxxxxxyxx in an efficient manner? I don't see how any radix sort can be faster than theta(nk), and in terms of worst-case, a trie-based sort can still have significant overhead due to storing possible next steps that may not be filled. As noted in the link, this can be six times the size of the input data; I'm not sure why it couldn't be higher, but I'd have to delve more into your implementation to see it. So radix-based sorts are good when the key value is short. For long keys, where k >> log(n), comparison-based sorting has the advantage of being able to look along a key linearly through memory (which is fast) as opposed to in a fragmented fashion, piece-by-piece as radix sorts do, and comparison-based sorts can be done without significant additional memory, which can be very slow to allocate and deallocate relative to comparisons. The disadvantage of conventional comparison-based algorithms in this situation is that they still need log(n) comparisons, and if the keys don't differ until near the end, they need to look (linearly) through k bits each comparison. I'll do some runtime comparisons when I get a chance; I believe trie-based radixsorting can have excellent performance under some conditions.Cuberoot31 20:47, 7 December 2006 (UTC)[reply]


For a trie-based radix sort, the amount of time spent processing each unit length of input data has a constant upper bound, so the time complexity is O(N), where N is the total length of the input data, whether the strings are fixed in length or variable in length.

BRADSORT v1.50 deals with duplicate strings by storing one instance of a string and a count of the number of times that each string appears to indicate each string's frequency. So, if there are duplicates of a string in the input data which occupy more memory space than the amount of memory required to store a count and one instance of the string, then data compression occurs.

For unique strings, BRADSORT executes the most quickly and uses the least amount of memory space for representing strings as paths in the trie when the unique prefixes are short, such as:

axxxxxxx...
bxxxxxxxx...
cxxxxxxxx...
dxxxxxxxxx...

It does not matter how long the individual strings are if the unique prefixes are relatively short, because the trie only needs to represent the unique prefixes of the strings as paths in the trie in order to distinguish the strings from each other. The worst-case trie space usage occurs when the entire string or almost the entire string defines a unique string prefix, such as:

xxxxxxxxxa
xxxxxxxxxb
xxxxxxxxxc
xxxxxxxxxd

The worst-case space usage of BRADSORT and the longest sorting time occurs in this situation, when strings are identical to each other except for the very last bit or last few bits. The memory space required to store the trie structure can be greater than 6 times the size of the input data. What the exact amount of space is in the worst case depends on the C compiler implementation and how much space it allocates for a node structure. For BRADSORT v1.50, each node structure below the root node corresponds to one bit of a string. For 2 strings that are nearly identical and overlap as paths in the trie except for the very last bit, the worst-case space required to store the trie structure would be approximately:

sizeof(node)*total_number_of_bits_in_strings/number_of_strings +
sizeof(node)*number_of_strings

As I mentioned in the radix sort article, trie-based radix sorts can be used for incremental sorting. If you already have a sorted list and need to insert one new string into it, then you might have to re-sort the entire list with a batch sorting algorithm, which can require more time than inserting a new string into a trie.

Other programs use different trie structures and different methods for minimizing space usage. For example, the people who wrote, "burstsort," avoid trie branching to some extent in order to increase the locality of data references for faster execution and to minimize space usage.

-Ed Lee 24.12.11.31 01:51, 8 December 2006 (UTC)[reply]


I agree that tries are useful for reusable data structures, and burstsort is another such algorithm I've been aware of for some time; I read up on it when looking at algorithms with some similarity to Spreadsort (I'm curious as to how burstsort and BRADSORT compare in performance). Many sorting algorithms have a trie equivalent, but for many problems an array is more appropriate; it depends on the application. In particular, rebuilding key values can add a large enough overhead to readout from the trie that lookups on a trie can be slower than even a binary search! (depending on the trie, data type, and whether people wish to look at the key, vs a value stored depending on the key) I'd like to clarify on sorting performance: your O(N) is equivalent to O(nk). Your s for the performance summaries I put on the sorting algorithm page sounds like it is 2, so your worst-case is still basically O(nk). What is different is that since BRADSORT (and other conventional trie approaches such as burstsort) is allocating and deallocating memory, it has a significant number of memory operations, which are generally much slower than processing operations on a per-operation basis (sometimes requiring off-processor communication). Also, O(nlog(n)) comparison-based approaches are really O(nlog(n)) comparisons and slightly less swaps, where swaps can be constant time if pointers are used, and the comparisons are O(k), but comparison operations reading linearly through memory are very fast on conventional modern computers. So it's really O(nk) slow operations for radix sorting versus O(nlog(n)) times (1 slow operation + 1 usually fast O(k) operation) for comparison-based sorting. I found this was true in developing Spreadsort, where comparisons were faster than memory allocation and swaps, which I needed to optimize aggressively. std::sort is a good basis for performance comparison because it is quite fast on modern processors.Cuberoot31 15:17, 9 December 2006 (UTC)[reply]


Earlier versions of BRADSORT did rebuild key values from paths in the trie, because the keys were entirely represented as paths in the trie, but BRADSORT v1.50 stores keys separately from the trie structure. Each individual key in BRADSORT v1.50 occupies contiguous memory space to represent the key in its entirety, so no rebuilding of a key is necessary. The trie structure in BRADSORT v1.50 acts more as a hierarchical index to the keys rather than as a complete representation of the keys. Reading out all of the keys with BRADSORT v1.50 is as fast as traversing a sorted linked list, because BRADSORT v1.50 uses a circularly threaded trie.

I don't think that we have a material difference of opinion on what notation to use for expressing the time complexity. If k has a constant upper bound in O(nk), then the expression can also be written as O(n) by the definition of O(). I prefer the more concise notation. We can theorize all that we want about the performance of sorting algorithms, but the bottom line is the performance that is achieved in the real world on real-world applications.

-Ed Lee 24.12.11.31 20:34, 9 December 2006 (UTC)[reply]


I do not accept the assumption that k has a constant upper bound; strings can be of infinite length and I've seen real-world sorting problems with keys over 1kB in size (compression is usually appropriate in such cases, but that's a side issue). I think it is appropriate to compare worst-case performance for large n AND large k, otherwise the relative performance of radix-based sorting and comparison-based sorting is difficult to understand. Let's try to keep the article accurate with only the assumptions necessary for a reasonable discussion, and in terms of real-world performance, radix-based algorithms do poorly relative to comparison-based algorithms for large k.Cuberoot31 04:26, 10 December 2006 (UTC)[reply]


There is no real computer system with an infinite amount of memory capable of storing a string of infinite length. Debating about which sorting algorithm is better in a situation where there are infinitely long keys is purely academic. If you define the length of the input data to be nk, the number of keys times the length of a key, then it is unsurprising that a radix sort will take O(nk) time to complete. An optimal comparison-based sorting algorithm will take O(nk*log(n)) time to complete with the same data.

I do not accept the generalization that radix-based algorithms perform, "poorly," relative to comparison-based algorithms for large key lengths, k. For incremental sorting, a trie-based radix sort can be faster than a batch sorting algorithm. For sorting many duplicate keys, a trie-based radix sort which represents multiple instances of a key as a count and one instance of the key can use less memory due to the resulting data compression and be faster than other sorting algorithms due to a greater locality of data references and less need or no need for virtual memory which is paged from a hard drive. The same conditions under which a comparison can be done in constant time benefit a trie-based radix sort, when the unique key prefixes are short relative to the lengths of the keys. The same conditions under which a comparison of two strings takes the longest time, when the two strings are identical or nearly identical up to the last bit, also make a trie-based radix sort take the longest time to process the data. Comparison-based sorting algorithms can be faster than radix sorting algorithms when a sufficiently small number of keys are processed. It is conceivable that a hybrid radix/comparison sort may be faster than a pure radix sort when the input data is partitioned into small enough groups by the radix sorting portion that can then be processed by a comparison-based sorting algorithm. -Ed Lee 24.12.11.31 03:39, 11 December 2006 (UTC)[reply]


It is correct that comparison-based algorithms will take O(nk*log(n)) worst-case time because of the time for comparisons, but since the comparisons run linearly through memory, a processor can run through even long k quickly, enough to make up for the log(n) for even large n (billions of items), while O(nk) separate operations require random accesses, which are very slow in comparison, and even worse, most radix-based algorithms require allocating and deleting memory, which is horribly slow (multiple orders of magnitude) relative to running linearly through k bytes in order. If modern processors weren't pipelined, this might be different, but pipelining is now a standard optimization in hardware design. Tries are basically equivalent to heaps in terms of being able to incrementally sort; this is a useful capability, I agree, but it does not differentiate radix-based sorting from comparison-based sorting. To attain compression, you need overhead bytes, and for some types of data, there will be an expansion. As far as I know, this is a capability only of radix-based algorithms, and when applied it makes them specialized in their utility to problems that can be compressed; the overhead of compression is prohibitive otherwise. So this is a useful capability for a subset of problems, but not all problems. Spreadsort is an example of a hybrid algorithm, as is the more specialized burstsort. These approaches are significantly faster than Quicksort alone, but when compared to std::sort on random data with large k, it becomes a much closer competition (in my tests Spreadsort was still faster than std::sort by 20-30%). I recommend comparing BRADSORT to std::sort in such a situation. The difficult part is creating a large n, large k test case. One conceivable way to do this is to write 100MB of random bytes to a file (0-255, use rand() % 256), and count every 0 as an end of string. Do not allow writing two zeros in a row, to prevent getting a significant number of null strings. Keep track of both runtime and memory usage. I will try to run this test myself when I next get sufficient time. Cuberoot31 20:08, 11 December 2006 (UTC)[reply]
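A small generator along the lines described (my own sketch of that test setup, not an agreed benchmark):

#include <stdio.h>
#include <stdlib.h>

/* Write roughly 100 MB of random bytes to stdout, where each 0 byte marks the
 * end of a string.  Two zeros in a row are suppressed so that no empty
 * (null) strings are produced. */
int main(void)
{
    const size_t total = 100u * 1024u * 1024u;
    int prev = 1;                         /* pretend the previous byte was non-zero */

    for (size_t i = 0; i < total; i++) {
        int b = rand() % 256;
        if (b == 0 && prev == 0)
            b = 1 + rand() % 255;         /* avoid two terminators in a row */
        putchar(b);
        prev = b;
    }
    return 0;
}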


With the complicated and varied cache strategies employed by different microprocessors, it's hard to tell in advance how efficient or inefficient random memory accesses are going to be.

Regarding deletions, BRADSORT does not perform any deletions of nodes until the algorithm is finished, although incomplete deletion functions are in the source code. I never got around to updating the deletion functions for the new threaded trie structure in BRADSORT v1.50.

For your test of BRADSORT, you might have to increase the size of the input buffer to be large enough to hold the longest input string that you want:

/* The maximum length, in bytes, of an input string.  You can modify this. */
#define MAXSLEN (255)
char s [MAXSLEN+1];

BRADSORT v1.50 defaults to using strings that are terminated by the '\n' character, ASCII value 10. I recommend that you put a '\n' character in s[MAXSLEN] in the main() function in case the input buffer contains no newline characters. Also, change the "MAXSLEN+1" expression in the while() to "MAXSLEN":

s[MAXSLEN] = '\n';

while (NULL != fgets (s, MAXSLEN, stdin))
        if ((incomplete=insert10 ((uchar *)s))==1) break;

A more practical application of BRADSORT would be to sort words or count the number of times that each word occurs in some file, such as:

tr -cs [a-z][A-Z] [\012*] <textfile.txt | bradsort f >outputfile.txt

     or for the bash shell:

tr -cs [a-z][A-Z] [$'\n'*] <textfile.txt | bradsort f >outputfile.txt

The tr program is available on Unix and Linux systems. This particular example does not deal with words that contain apostrophes or with hyphenated words. What it should do is replace all non-alphabetic characters in a text file with the '\n' (newline, line feed) character and then squeeze consecutive instances of the '\n' character into one, so that words are separated from each other on separate lines. The output then gets piped to BRADSORT with the "f" option, which will display the frequency of each word next to each word in the output file. The frequency is represented as a count of the number of times that each word appears in the input.

-Ed Lee 24.12.11.31 02:58, 12 December 2006 (UTC)[reply]


Constant memory

Ed, I really don't believe it's useful to assume constant memory when talking about the asymptotic behavior of an algorithm. That would make every algorithm run in O(1) space and O(1) time, if you think it through. If you disagree, point me to an algorithms textbook that assumes constant memory.

The only thing close to that that I can think of is something you may have touched on: sometimes you do assume each memory access happens in constant time, just to simplify things, even though it's not true. rspeer / ɹəədsɹ 16:17, 9 December 2006 (UTC)[reply]


O() notation was originally used in the study of pure math, where infinite domain intervals are commonly considered, but O() is still useful in the study of the relative performance of algorithms within a finite interval. No definition of O() that I have seen restricts its use to infinite intervals, just sufficiently large intervals:

Suppose f and g are two real-valued functions defined on real numbers. We write f = O(g) when there are constants c and N such that f(n) ≤ c·g(n) for all n ≥ N .... It means that when n is large enough, some constant multiple of g(n) is an upper bound on the size of f(n).

(Christopher J. Van Wyk, Data Structures and C Programs, p. 35, ISBN 0-201-16116-8.)

There is no problem with using O() notation in O(1) space and O(1) time. You just don't abstract the notation down to that level where all time complexities and space complexities on real computers can be expressed as O(1). Allowing a computer's memory to be infinite in size, as I previously stated, loses touch with physical reality, and then O() is no longer as useful as an estimate of how much physical time an algorithm will require to complete.

-Ed Lee 24.12.11.31 19:50, 9 December 2006 (UTC)[reply]

Slashdot sending editors here

I love Slashdot. History of the Article page so far in 2007: edits on Jan 15,24,26 and Feb 17th. Then there's a Slashdot story (http://it.slashdot.org/it/07/02/25/1848200.shtml) about a supposedly "new" sort algorithm which, in reality, appears to be the good old Radix sort and we get (so far) four edits today. (Feb 25th). Thomas Dzubin Talk 22:46, 25 February 2007 (UTC)[reply]

Example?

The LSD example provided seems rather poor, in that it is unnecessary to sort by the 1s place to achieve correct ordering. Perhaps someone could revise the example to make 1s place ordering a necessary component?

PS I realize the 1s place ordering is necessary for the radix sort. But the example itself doesn't need this ordering, so it might make the radix sort seem a bit less useful than it really is. My two cents.168.39.166.127 19:04, 27 February 2007 (UTC)[reply]
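For instance (just a suggestion, not taken from the article), an input like 45, 42, 75, 72 would make the 1s pass do visible work:

   sort by 1s digit:  42, 72, 45, 75
   sort by 10s digit: 42, 45, 72, 75

Skipping the 1s pass and sorting only by the 10s digit would leave 45 ahead of 42 (and 75 ahead of 72), so the first pass is clearly necessary.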

Please Compare MSD and LSD in detail

I find that most textbooks do not have a detailed description of MSD. I hope someone could analyse it using Big-O notation, and compare it with LSD. MSD is not necessarily done recursively. Some textbooks say we could use Insertion Sort to sort each bin. I would also like to see the time complexity analysis of this method. THANK YOU! (I am a newbie on algorithms) Visame (talk) 23:12, 18 June 2008 (UTC)[reply]

I see that it's being done, although I'm not a big fan of Big-O notation when I know how to spell it out in terms of maximum, minimum, and constant. When you use insertion sort to sort each bin, rather than a recursion, then it's called bucket sort.

AFAIK, there are likely to be implementations of MSD and LSD that work out to pretty much the same thing, but MSD is more easily understood in a recursive manner, so it's probably more common. I suspect that LSD implementations dominate external disk-based implementations, though. BrewJay (talk) 09:37, 25 July 2008 (UTC)[reply]

Merge from bucket sort.

Graphics from bucket sort might be useful, but the mention of insertion sort in it reminds me of a horrible sorting method I devised in high school. (Index keys by the number of keys they are superior to). I'm not sure that inferior methods like bubble sort are not instructive or useful in a pinch or a variation. BrewJay (talk) 09:20, 25 July 2008 (UTC)[reply]

Clever Programmers

People who try to be clever, and end up making the code unreadable to others, should leave it alone. Replacing the merge function with a call to sum() was really gratuitous. Yes, it works, but only for obscure reasons. There is nothing that would make a normal reader think of sum(), which normally sums a list and returns a scalar, as a way of appending a bunch of lists together.

The goal of Wikipedia should be to explain, not to show off some clever and completely opaque trick.

I agree with the change if it makes the article clearer to you. But don't think the sum() definition is accidental or some advanced stuff; it's fairly trivial in terms of the definition of the list type. You should learn more about recursion and functional programming if that simple example looks obscure to you. Diego (talk) 10:47, 25 November 2009 (UTC)[reply]
Your condescension is unnecessary and, frankly, offensive. I understand recursion and functional programming quite well, having taught programming for nearly 30 years. My point, which you brush aside, is that there is nothing in the use of sum() that would lead the average reader of Wikipedia to understand that you use it to concatenate lists. Again, clarity is the most important aspect. We strive to instruct the reader, so we sacrifice the more esoteric so that the novice may understand.
I didn't mean to be condescending, it's just that I failed to see how you are skilled in FP yet find basic functions on lists obscure. I would have understood had you argued about the functional vs imperative nature of the example, as I agree that skill in functional programming is less common.
But your opinion stating that viewing list concatenation as addition is "gratuitous" (when it's the very theoretical definition of list concatenation), and that "sum() normally sums a list and returns a scalar", when addition can be and usually is applied to every data type with a successor (or 'cons') operator, is what made you look like a novice in my eyes. Diego (talk) 20:35, 26 November 2009 (UTC)[reply]
Diego, not every reader of the page is going to see "sum" and instantly think of list concatenation, regardless of the theoretical foundations of it. Those of us who edit articles like this can of course understand what it means, but the readers don't, necessarily. In writing for an audience that does not consist solely of yourself, you have to be willing to write below your own level of expertise. This is an important component of teaching, and is emphatically not a sign that the teacher is a "novice" and needs to learn more. rspεεr (talk) 07:27, 27 November 2009 (UTC)[reply]

To rspεεr: I concur; that's why I agreed to the change all along. I was trying to understand the rationale for the change, in order to clarify whether it's really a wise move to replace a one-liner with a more complex three-line loop (with a '+=' assignment-addition operator, no less). There's also the danger of over-thinking when trying to put yourself in the shoes of others. Will ALL readers with little expertise find "for i in x: l += i" clearer than "sum(x, [])"? What about someone with a mathematical background, instead of a programming one? It bothers me when someone assumes that putting code in imperative style makes it automatically clearer to everybody.

In particular, this is an implementation detail and the purpose of the 'merge' method is already stated in the comment (concatenate the lists back in order for the next step), so is this additional complexity really helping to better understand the whole program? Will that not miss the forest for the trees? What's wrong with a bit of idiomatic Python, and why do you think it's less clear than a spelled-out loop? Those are my concerns. Diego (talk) 11:05, 27 November 2009 (UTC)[reply]


A particular problem with sum() here is that it is, precisely as Diego says, idiomatic Python. The topic of this wikipage is radix sort and any particular language used for the example should be used as generically as possible. The Python should not be too Pythonesque. People come here to learn about radix sorting, not Python, and the python code should be transparent enough that a budding programmer without experience of Python can follow the code to a reasonable degree. (On this score the += is far more helpful as it exists in many languages.) However, as the algorithm dates back to the fifties, or earlier, and as its formal efficiency is independent of any specific implementation, I strongly urge that a PSEUDOCODE version be included in the page. Not only would this conform to the high standard of most of the other sorting articles, but it would make the article more accessible to more readers. For instance, the Python code is more accessible if it appears alongside pseudocode.