Miller Shuffle Algorithm
12/7/2023 7:52 pm
Everyone loves 'shuffle' on their music player. I never really thought about how it works under the hood to randomly play through a playlist without repeats, but this guy has, and he has improved on it.
From the linked instructables page:
With the case of an MP3 player, or any play-list shuffle, one might, and apparently some do (even on big-name electronics and streaming services), simply use an operation like songIndex=random(NumOfSongs). This use of a PRNG will give you good mathematical randomness in what is played, but you will get many premature repetitions of song selection. With a population of 1000 to choose from you will get ~37% repeats out of 1000 plays. From the start of a listening session, on average a song will repeat within ~26 songs. The % of repeats is the same regardless of the selection population size.
The accepted goal of a “Shuffle” algorithm is herein defined as providing means to reorder a range of items in a random-like manner, where each item is referenced once, and only once, when going through the range (# of items). So for 1-52 (think card deck) you get 52 #s in a pseudo-random order, or similarly for a song list of 1000s. Re-shuffling gives a very different mix.
The Fisher-Yates (aka Knuth) algorithm has been a solution that fixes this unwanted repetition: the 1000 songs play in a 'random' order without repeating. The issue this algorithm comes with is the added burden of an array in RAM of two bytes per song (for up to 65,000 songs, 128 KB of RAM is needed) dedicated to the shuffled indexes for as long as further items from the shuffle are wanted (so as not to give repeats).
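To get a feel for those repeat numbers, here's a quick sketch (mine, not from the article) of the naive approach: draw 1000 songs purely at random from a 1000-song library and count how many draws are something already played. It usually lands right around that ~37% figure.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_SONGS 1000

int main(void) {
    char played[NUM_SONGS] = {0};             /* which songs have come up already */
    int repeats = 0;

    srand((unsigned)time(NULL));
    for (int i = 0; i < NUM_SONGS; i++) {
        int songIndex = rand() % NUM_SONGS;   /* the naive songIndex=random(NumOfSongs) */
        if (played[songIndex]) repeats++;     /* we've already heard this one */
        played[songIndex] = 1;
    }
    printf("%d repeats out of %d plays\n", repeats, NUM_SONGS);   /* typically ~370 */
    return 0;
}
```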
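And for contrast, a bare-bones Fisher-Yates sketch (again mine, just to illustrate the memory point): the whole array of shuffled indexes has to be built up front and kept in RAM for the entire session, which is exactly the burden described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_SONGS 10   /* small here; picture 65,000 and this array is ~128 KB */

/* Classic Fisher-Yates: pre-shuffle an array of song indexes.
 * The whole array (2 bytes per song with 16-bit indexes) has to stay
 * in RAM for the entire session so that nothing repeats. */
static void fisher_yates(unsigned short *indexes, int n) {
    for (int i = 0; i < n; i++) indexes[i] = (unsigned short)i;
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);              /* pick from the not-yet-fixed part */
        unsigned short tmp = indexes[i];
        indexes[i] = indexes[j];
        indexes[j] = tmp;
    }
}

int main(void) {
    unsigned short order[NUM_SONGS];
    srand((unsigned)time(NULL));
    fisher_yates(order, NUM_SONGS);
    for (int i = 0; i < NUM_SONGS; i++) printf("%u ", order[i]);
    printf("\n");
    return 0;
}
```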
After reading it and trying to follow the code (written in C, I believe), I also found a link to a GitHub repo with more iterations on the algorithm. A few excerpts:
The way the algorithm works its magic is by utilizing multiple curated computations which are ‘symmetrical’, in that the range of values which go in are the same values which come out albeit in a different order. Conceptually each computation {e.g. si=(i+K) mod N } stirs or scatters about the values within its pot (aka: range 0 to N-1) in a different way such that the combined result is a well randomized shuffle of the values within the range. This is achieved without the processing of intermediate “candidates” which are redundant or out of range values (unlike with the use of a PRNG or LFSR) which would cause a geometrically increasing inefficiency, due to the overhead of retries.
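To make the "symmetrical computations" idea concrete, here's a toy example I put together. It is emphatically not the real Miller Shuffle, which uses more (and better curated) steps, but it shows how composing operations that each map the range 0..N-1 back onto itself gives a scrambled order where every value still shows up exactly once.

```c
#include <stdio.h>

/* greatest common divisor, used to pick a multiplier coprime to n */
static unsigned int gcd(unsigned int a, unsigned int b) {
    while (b) { unsigned int t = a % b; a = b; b = t; }
    return a;
}

/* Toy stand-in for the idea (NOT the actual Miller Shuffle): every step
 * maps 0..n-1 onto itself, so the composition is a permutation and each
 * value comes out exactly once, just in a different order. */
unsigned int toy_shuffle(unsigned int i, unsigned int seed, unsigned int n) {
    unsigned int k = seed % n + 1;             /* rotation amount            */
    unsigned int m = (seed / 3) % n + 2;       /* candidate multiplier       */
    while (gcd(m, n) != 1) m++;                /* must be coprime to n       */

    unsigned int si = (i + k) % n;             /* step 1: rotate the range   */
    si = (si * m) % n;                         /* step 2: multiplicative mix */
    si = (n - 1) - si;                         /* step 3: mirror the range   */
    return si;
}

int main(void) {
    unsigned int n = 10, seed = 123;
    for (unsigned int i = 0; i < n; i++)
        printf("%u ", toy_shuffle(i, seed, n));   /* each of 0..9, exactly once */
    printf("\n");
    return 0;
}
```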
So basically you can query the algorithm, providing the size of the "deck" of things, your current location in the shuffle, and the seed info (for randomness), and it can tell you what is next in the shuffle without actually having to move things around.
For example, let's say we had a deck with five cards in it: [a, b, c, d, e]. We just tell the algorithm three things:
- What our current position in the play order is (basically a play counter)
- A "shuffle ID" which is the seed for the randomizing
- How many entries are in the deck
So if we're just starting, we're effectively at position -1, meaning nothing has played yet, so the next counter value is 0. We give it the random shuffle ID of "123" and lastly we tell it there are 5 entries. It then calculates that the first thing to play is, say, 4th in the queue. Then when it is time to play the next song the algorithm is fed the next counter value "1", along with "123" and "5", and returns, say, 2, etc.
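In code, the usage pattern looks roughly like the sketch below. The MillerShuffle() body here is only a placeholder so the example runs; the real implementation (and its exact signature) lives in the linked repo.

```c
#include <stdio.h>

/* Placeholder with the same three inputs described above (play counter,
 * shuffle ID, list size). It just rotates the range so the example compiles
 * and runs; swap in the real implementation from the linked repo. */
unsigned int MillerShuffle(unsigned int index, unsigned int shuffleID,
                           unsigned int listSize) {
    return (index + shuffleID) % listSize;     /* placeholder permutation */
}

int main(void) {
    unsigned int numSongs = 5, shuffleID = 123;
    /* Walk the whole playlist: only the counter i, the shuffleID and the
     * list size are needed -- no shuffled index array is stored anywhere. */
    for (unsigned int i = 0; i < numSongs; i++) {
        unsigned int song = MillerShuffle(i, shuffleID, numSongs);
        printf("play song #%u\n", song);
    }
    return 0;
}
```

Note what isn't there: no shuffled array sitting in memory, just a counter, a seed, and a list size.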
The most common other way of handling playlist randomization takes the indexes for each item in the deck and shuffles them. So it might create a separate list of the deck positions, [2,5,1,3,4], which you then have to keep in memory. For desktop computers, obviously that is not a problem. But what if you're making a tiny device with a simple board like an Arduino and you have memory limitations? This is a big step forward for that.
Will it change the world? Doubtful. But still interesting to learn about.