Free Forum - Some "fast shuffle" stats

Roger Harris

Posts count: 84

Topics count: 3

Jan 30, 2006, 12:53:37 AM

Some "fast shuffle" stats

Just a quick follow-up and progress-to-date on the discussion some of us were having concerning a technique for speeding up sims by not shuffling each shoe. The following sims were run with a modified version of ET Fan's PowerSim which initializes 250 arrays of dealing sequences and uses each once before reshuffling the shoe. (The first array is initialized with the numbers 1 through the total number of cards, and each successive sequence array is a copy of the previous one, which is then shuffled using the same random-swapping technique used for shuffling the shoe. These arrays then determine the sequence in which cards are dealt from the shoe, rather than using 1, 2, 3,... each time. After dealing from each of the 250 sequences once, the shoe itself is shuffled and the process repeated.)
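
For anyone who wants to experiment, here is a minimal Python sketch of the scheme described above. It is not the PowerSim code: the function names are made up for illustration, the indices are 0-based rather than 1-based, and I'm assuming the "random-swapping" shuffle is an ordinary Fisher-Yates swap loop.

```python
import random

def swap_shuffle(cards, rng):
    """Shuffle a list in place by random swaps (Fisher-Yates; an assumption)."""
    for i in range(len(cards) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        cards[i], cards[j] = cards[j], cards[i]

def build_sequences(num_cards, num_sequences, rng):
    """First sequence is 0..num_cards-1; each later one is a shuffled copy of the previous."""
    sequences = [list(range(num_cards))]
    while len(sequences) < num_sequences:
        nxt = sequences[-1][:]
        swap_shuffle(nxt, rng)
        sequences.append(nxt)
    return sequences

def deal_orders(shoe, sequences, rng, cycles):
    """Yield the order in which cards leave the shoe, one list per dealt shoe."""
    for _ in range(cycles):
        swap_shuffle(shoe, rng)      # one real shuffle per cycle
        for seq in sequences:        # then the same shoe is read through each sequence once
            yield [shoe[pos] for pos in seq]

rng = random.Random(12345)           # arbitrary seed, for repeatability only
sequences = build_sequences(52, 250, rng)
shoe = list(range(52))               # card identities; 0..51 stands in for a real deck
orders = list(deal_orders(shoe, sequences, rng, cycles=2))
print(len(orders), "shoes dealt from 2 real shuffles")
```

The saving comes from doing one full shuffle per 250 shoes; each of the other 249 shoes costs only an indexed read through a precomputed sequence.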

As I previously reported, this technique cuts the PowerSim execution times by 40% to 45%. The question is, does the technique produce useful results?

Each of the following summarized result lines represents 5 billion rounds simmed, but the stats were collected each one million rounds, so there were 5000 sets of data points in each sim. Stats were collected for the IBA, the per-hand variance, the number of doubles, splits, player BJs, and dealer BJs. The purpose of analyzing 5000 one-million-round results was to compare the means and standard deviations of the 5000 samples to determine if the standard shuffle and the "fast shuffle" had similar statistical characteristics. (For example, if the standard deviations were higher using the fast shuffle, that would indicate that the true significance of the samples was less than expected, even if the means happened to be similar.) Sims were run for 1-, 2-, 6- and 8-deck games using the same rules and strategy files (75% pen, S17 DAS DOA, using generic strategy, except for the 2-deck game, for which ET Fan had a custom strategy for H17 DAS so I used that). In the chart, PS is the original PowerSim and PSx2 is the fast-shuffle version:

 

              ------ IBA -------   -- VARIANCE ---   -- DOUBLES ---   -- SPLITS ---   - PLAYER BJ -   - DEALER BJ -
                 Mean    StdDev     Mean   StdDev      Mean  StdDev    Mean  StdDev    Mean  StdDev    Mean  StdDev
              --------- --------   ------- -------   -------- -----   ------- -----   ------- -----   ------- -----
1-DECK   PS   -0.001956 0.001132   1.31111 0.00117   102146.8 302.2   20717.1 152.8   47944.4 199.0   47945.2 208.5
         PSx2 -0.001962 0.001132   1.31109 0.00115   102136.2 296.2   20719.4 149.6   47942.7 204.3   47947.8 203.8

2-DECKS  PS   -0.004509 0.001149   1.35176 0.00129   112250.9 313.2   25867.0 177.9   47620.6 204.8   47617.5 205.2
         PSx2 -0.004569 0.001145   1.35173 0.00125   112239.5 306.7   25868.6 174.1   47623.1 202.9   47622.4 200.1

6-DECKS  PS   -0.004252 0.001129   1.33212 0.00128   103974.1 300.4   27864.9 182.3   47426.0 202.7   47424.8 204.2
         PSx2 -0.004257 0.001139   1.33213 0.00129   103971.3 297.3   27869.3 184.7   47426.2 203.5   47423.7 201.6

8-DECKS  PS   -0.004508 0.001145   1.33329 0.00129   104065.3 300.3   28239.6 184.3   47400.7 202.6   47407.3 202.2
         PSx2 -0.004465 0.001137   1.33330 0.00131   104064.3 300.4   28241.3 185.1   47405.5 202.2   47407.4 204.3
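
One rough cross-check on the StdDev column: if the rounds within a one-million-round sample were independent, the StdDev of the sample IBA should be close to sqrt(variance / 1,000,000). The sketch below just does that arithmetic on the chart values. It assumes the per-hand variance is expressed in units of the initial bet per round, which is my reading of the stats and not something I've verified against the PowerSim source.

```python
import math

ROUNDS_PER_SAMPLE = 1_000_000

# (label, per-hand variance, observed StdDev of IBA), copied from the chart above
ROWS = [
    ("1-deck PS",   1.31111, 0.001132),
    ("1-deck PSx2", 1.31109, 0.001132),
    ("2-deck PS",   1.35176, 0.001149),
    ("2-deck PSx2", 1.35173, 0.001145),
    ("6-deck PS",   1.33212, 0.001129),
    ("6-deck PSx2", 1.33213, 0.001139),
    ("8-deck PS",   1.33329, 0.001145),
    ("8-deck PSx2", 1.33330, 0.001137),
]

def expected_sd_of_mean(variance, n):
    """StdDev of the mean of n independent rounds with the given per-round variance."""
    return math.sqrt(variance / n)

for label, variance, observed in ROWS:
    expected = expected_sd_of_mean(variance, ROUNDS_PER_SAMPLE)
    print(f"{label:12s} expected {expected:.6f}  observed {observed:.6f}  "
          f"ratio {observed / expected:.3f}")
```

For the 1-deck PS row this gives an expected StdDev of about 0.001145 against the observed 0.001132. A ratio well above 1 for the fast-shuffle rows would be the sign of lost significance discussed above.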

In general, I believe the results were pretty good; i.e., given a third set of data from one sim or the other, I think it would be hard to tell which one produced it. (But I do intend to try that experiment.)

Another test of the results is to see if the 5000 data sets seem to follow a normal distribution. If so, then about 68% of the results should fall within one StdDev of the mean and about 95% should fall within two StdDevs. Each of the data sets easily meets that criterion when compared to its own mean and StdDev, but what if we take the means and StdDevs of the original PowerSim as truly representative of the population, and check to see if the 5000 fast shuffle samples also meet that criterion? The following chart shows what fraction of the PowerSimX2 samples fall within 1 and 2 StdDevs of the PowerSim means:

 

                IBA    DBLS   SPLS   P-BJ   D-BJ
                -----  -----  -----  -----  -----
1-DECK   1 SD   0.687  0.693  0.688  0.668  0.696
         2 SD   0.954  0.956  0.956  0.948  0.959

2-DECKS  1 SD   0.689  0.694  0.692  0.685  0.694
         2 SD   0.955  0.960  0.961  0.957  0.961

6-DECKS  1 SD   0.684  0.694  0.682  0.680  0.688
         2 SD   0.952  0.954  0.951  0.954  0.960

8-DECKS  1 SD   0.680  0.683  0.686  0.680  0.677
         2 SD   0.958  0.953  0.951  0.955  0.954
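
The coverage figures in the chart above can be computed along these lines. This is only a sketch: `coverage` is a name I made up, and the demo data is synthetic normal noise using the 1-deck PS IBA mean and StdDev, not the actual sim output.

```python
import random

def coverage(samples, ref_mean, ref_sd):
    """Fraction of samples within 1 and within 2 reference StdDevs of the reference mean."""
    n = len(samples)
    within_1 = sum(abs(x - ref_mean) <= ref_sd for x in samples) / n
    within_2 = sum(abs(x - ref_mean) <= 2 * ref_sd for x in samples) / n
    return within_1, within_2

# Synthetic stand-in for 5000 one-million-round IBA results (NOT real sim data).
rng = random.Random(2006)
ref_mean, ref_sd = -0.001956, 0.001132
fake_samples = [rng.gauss(ref_mean, ref_sd) for _ in range(5000)]

w1, w2 = coverage(fake_samples, ref_mean, ref_sd)
print(f"within 1 SD: {w1:.3f}   within 2 SD: {w2:.3f}")
```

With 5000 samples, sampling noise alone moves the 1 SD figure by roughly 0.007 and the 2 SD figure by roughly 0.003, so differences of that size between rows shouldn't be read as meaningful.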

Again, I believe the results are pretty good, but I'm completely open to criticism of the analysis and suggestions for further tests.

Norm Wattenberger

Posts count: 478

Topics count: 31

Jan 30, 2006, 11:18:00 AM

What do you expect to see?

Inadequate randomness and in particular inadequate period or repeated sequences can cause the results you are observing to go up, down or remain the same. In particular, standard deviation numbers stabilize very quickly and therefore are not likely to be affected by anything related to period or repeats. Plug in a terrible RNG and you are likely to see the same sort of numbers. But, you will never know what effect it has on a future simulation.

Sun Runner

Posts count: 503

Topics count: 7

Jan 30, 2006, 12:44:54 PM

Boorinnng. :) (nt)

Roger Harris

Posts count: 84

Topics count: 3

Jan 30, 2006, 12:51:52 PM

Forgot to mention....

... that I'm setting the RNG seed to a known number (depending on the number of decks) to create the dealing sequences, before seeding (from current time) for the run. That means that, for any particular number of decks, the sequences are always the same. The seeds I'm using were determined by using a program to identify candidate seeds by counting the total number of times each card in the shoe was dealt in the complete 250-sequence cycle. The program computed the standard deviation of those counts and selected as candidates the seeds that produced lower standard deviations than typical 250-shoe deals with shuffling every shoe. I then took several of the ones with the lowest SDs into Excel and graphed them: first, in their shoe-ordinal position, to look for seeds that produced peaks and valleys that visually appeared to be well distributed through the deck, then in sorted order, to verify that the tops of the bar graph produced smooth S-curves. (In fact, they look much better than typical 250-shoe runs with shuffling each shoe, but of course those vary each time, so in the long run they smooth out.) Then, using the ones that looked promising, I ran many 100-million- and 500-million-round sims, and for each number of decks, selected the seeds that seemed to produce the most consistent results.
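
The seed-screening step can be sketched roughly as follows. This is an illustration, not the program I actually used: the function names are hypothetical, Python's built-in `random.shuffle` stands in for the PowerSim shuffle, and it assumes a fixed number of cards dealt per shoe (39 of 52 for 75% pen), whereas in a real sim the number of cards dealt varies from shoe to shoe.

```python
import random
import statistics

def deal_counts(seed, num_cards, num_sequences, cards_dealt):
    """Count how often each shoe position is dealt over one full cycle of sequences."""
    rng = random.Random(seed)
    counts = [0] * num_cards
    seq = list(range(num_cards))          # first sequence: 0, 1, 2, ...
    for _ in range(num_sequences):
        for pos in seq[:cards_dealt]:     # only cards ahead of the cut card get dealt
            counts[pos] += 1
        seq = seq[:]                      # next sequence: shuffled copy of this one
        rng.shuffle(seq)
    return counts

def deal_count_sd(seed, num_cards, num_sequences, cards_dealt):
    """StdDev of the per-position deal counts; lower means more even sampling."""
    return statistics.pstdev(deal_counts(seed, num_cards, num_sequences, cards_dealt))

def rank_seeds(seeds, num_cards, num_sequences, cards_dealt):
    """Return candidate seeds ordered from lowest to highest deal-count StdDev."""
    return sorted(seeds, key=lambda s: deal_count_sd(s, num_cards, num_sequences, cards_dealt))

best = rank_seeds(range(200), num_cards=52, num_sequences=250, cards_dealt=39)[:5]
for seed in best:
    print(seed, round(deal_count_sd(seed, 52, 250, 39), 3))
```

The graphing and the follow-up 100-million-round sims are not covered here; this only produces the shortlist of candidates.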

Since the sequences are the same for each of the 5000 million-round samples that I summarized in my posting, I believe that if there were any anomalies such as the type you are concerned about, they would have indeed produced higher standard deviations in those 5000 samples. But in fact, the standard deviations are generally about the same or lower than those produced by the original PowerSim.

I believe that you would be correct that systematic anomalies could occur occasionally and randomly if I created the dealing sequences randomly each time. That's exactly why I decided that that isn't a good idea, even if many runs looked good. The fact that I'm using the same sequences each time means that your "terrible RNG" analogy doesn't quite hold -- these sequences will perform the same each time -- and as I said before, I don't believe it's really necessary for the sequences to be truly random in each sim, since the subject shoes are.

I'm sure that there are more rigorous statistical tests that could be performed to ensure that the seeds I'm using are free of such defects, which is why I'm asking for advice on further testing. If there are any such defects, I am still of the opinion that (many) sequences can be found that don't have those defects.

Norm Wattenberger

Posts count: 478

Topics count: 31

Jan 30, 2006, 1:03:41 PM

That's worse

You seem to be attempting to create uniform randomness. That's what people tried in the early days of RNGs. They generated pretty 2D histograms and thought that meant randomness. Marsaglia rotated them in 3 dimensions and showed clear planes. Hence his line "random numbers fall mainly in the planes." You can't pick sets of random-looking numbers. That's specifically nonrandom. Decades of research by mathematicians has gone into generating random numbers without tricks. This wheel does not need reinvention.

Roger Harris

Posts count: 84

Topics count: 3

Jan 30, 2006, 1:32:05 PM

Sorry, but I don't think you are looking at it correctly

The shoes themselves are shuffled "randomly," subject to the quality of the PRNG. It seems to me that that's a completely separate issue. I am definitely not attempting to reinvent random number generation; I'm exploring the true meaning of a random permutation of a set, which I don't see as being the same thing at all. What I'm getting at is that, assuming that the shuffled shoe is a reasonable approximation of a random permutation (good enough to give useful results in a BJ sim, at least), then there's nothing magic about the sequence 1, 2, 3... in that shoe. What I'm claiming is that, assuming that the shoe itself is a random permutation, then there are many unique, uncorrelated sequences that also produce usefully accurate approximations of random permutations.

I'm perfectly willing to put the hypothesis to the test, but I don't seem to be getting a lot of help with that. You seem to have dodged my main point: If this technique can be shown to produce consistent results with low standard deviations across a large number of samples, and the same sequences will be used each time so systematic anomalies cannot mysteriously arise in future runs, then why does this technique not produce useful results?

Norm Wattenberger

Posts count: 478

Topics count: 31

Jan 30, 2006, 2:05:10 PM

It ain't random

You are diluting the effect of the RNG and you are attempting to force randomness. The results must be nonrandom. You are also forcing shuffle-to-shuffle similarities and excluding some very nonrandom-looking sets. But, such sets must exist. You are looking for good distributions at the cost of randomness. Surely this will give you results that "look" random. But they aren't.

Tricks like this damage the good work that was put into creating a good random number stream in the first place. That's really all I have to say on the subject as this is getting quite repetitive. :)

Roger Harris

Posts count: 84

Topics count: 3

Jan 30, 2006, 3:06:00 PM

Well, I have to say...

... I'm a bit surprised at the weakness of your argument. I'm definitely not "excluding some very nonrandom looking sets," because they must exist in the shuffled shoe when viewed through the sequence 1, 2, 3... or any other sequence, if the shoe itself is a reasonable approximation of a random permutation. That seems to me to be a key point: If what you are implying were logically correct, then it seems to me it wouldn't be valid to use the sequence 1, 2, 3... on every shuffled shoe.

anonimuss

Posts count: 377

Topics count: 20

Jan 30, 2006, 3:52:05 PM

Uhhh..

SW already covered that years ago in PB.

Roger Harris

Posts count: 84

Topics count: 3

Jan 30, 2006, 4:34:45 PM

And...?

Do you happen to recall any of the specifics?

Roger Harris

Posts count: 84

Topics count: 3

Jan 31, 2006, 4:31:32 AM

Proof of concept

It occurred to me tonight that there is an easy way to test the basic validity of this technique. The problem with analyzing even a single-deck game is that the number of permutations of 52 cards is staggeringly large. But if the concept is valid, then the technique should work with a much smaller, easily analyzed deck, and conversely if the technique is invalid, then flaws should show up in a much smaller deck, too (perhaps even more so?).

I ran some tests using a deck of 9 cards, numbered 1 through 9. There are only 362880 permutations of the 9-card deck. From each shuffle, I dealt 4 cards, and there are only 3024 permutations of 4-card draws. I then ran sims of 100,000 trials (i.e. fewer than the total number of deck permutations, more than the total number of hand permutations), then taking the 4 cards as a 4-digit number, I counted the number of times each possible hand was drawn. For the first set of runs, I shuffled the deck after each hand, and for the second set of runs I created a set of 18 sequences and reshuffled the deck only after using each sequence once. (I created the sequences using the same technique I described of using an initial seed that produced a low standard deviation in the number of times each card was drawn in an 18-round cycle.)
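
A rough Python version of this small-deck experiment is below, for anyone who wants to repeat it. It is a sketch under stated assumptions: the function names are invented, Python's `random.shuffle` replaces the PowerSim shuffle, and the sequence seed (1) is arbitrary, not one screened for a low standard deviation.

```python
import random
import statistics
from collections import Counter
from itertools import permutations

DECK = list(range(1, 10))                         # cards numbered 1 through 9
HAND_SIZE = 4
ALL_HANDS = list(permutations(DECK, HAND_SIZE))   # every ordered 4-card draw

def build_sequences(num_cards, num_sequences, seed):
    """Identity sequence first, then each one a shuffled copy of the previous."""
    rng = random.Random(seed)
    seqs = [list(range(num_cards))]
    while len(seqs) < num_sequences:
        nxt = seqs[-1][:]
        rng.shuffle(nxt)
        seqs.append(nxt)
    return seqs

def standard_counts(trials, rng):
    """Shuffle before every hand and tally each 4-card hand."""
    deck, counts = DECK[:], Counter()
    for _ in range(trials):
        rng.shuffle(deck)
        counts[tuple(deck[:HAND_SIZE])] += 1
    return counts

def fast_counts(trials, sequences, rng):
    """Shuffle once per cycle, then read one hand through each sequence."""
    deck, counts, dealt = DECK[:], Counter(), 0
    while dealt < trials:
        rng.shuffle(deck)
        for seq in sequences:
            if dealt == trials:
                break
            counts[tuple(deck[pos] for pos in seq[:HAND_SIZE])] += 1
            dealt += 1
    return counts

def summarize(counts):
    """Min, max and StdDev of the per-hand counts, including hands never drawn."""
    per_hand = [counts[h] for h in ALL_HANDS]
    return min(per_hand), max(per_hand), statistics.pstdev(per_hand)

rng = random.Random(2006)
std = standard_counts(100_000, rng)
fast = fast_counts(100_000, build_sequences(9, 18, seed=1), rng)
print("hands possible:", len(ALL_HANDS))
print("shuffle every hand:", summarize(std))
print("18-sequence cycle :", summarize(fast))
```

With 100,000 trials spread over 3024 hands, each hand is expected about 33 times, so the comparison is between two distributions of counts centered near 33.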

I'm not sure how many tests I should run to be convincing, but for ALL of the tests I've run so far, regardless of the shuffling method, the range of the results, the standard deviations, and the graphs are virtually identical. I simply don't see any indication of the non-randomness that you presume.

Norm Wattenberger

Posts count: 478

Topics count: 31

Jan 31, 2006, 9:25:27 AM

Interesting; but meaningless

An extremely poor RNG could give you the same results. Looking at results is not a 'proof of concept.' Unless you run every possible sim and look at every possible stat that could ever be generated. It's like saying you won using a Martingale, so progressions work.

Roger Harris

Posts count: 84

Topics count: 3

Jan 31, 2006, 8:35:46 PM

Yes, I think it's interesting, and not totally meaningless

(... at least, I happen to think it's somewhat less boorinnng than whether the A,8 index ought to be rounded up or down, but to each his own I guess ;-) Anyway, thanks for hanging with me for at least one more post.)

"An extremely poor RNG could give you the same results."

Yes, indeed one could -- but not just any old "extremely poor RNG." We can say specifically that any RNG that causes the shuffle algorithm to produce normally-distributed 4-card frequencies would be expected to give you the same kind of results, whatever other flaws it might have, because that's all that particular sim was intended to test. There was no test for anything like sequential correlations or cyclic patterns or "rotating in three dimensions." But consider this: Suppose this was a sim for a game in which you bet a fixed amount each hand and won a fixed amount for certain 4-card hands. The frequencies of the 4-card hands would be the only thing that matters. Why should you want to use a Mersenne Twister RNG for this particular application if an "extremely poor RNG" existed that gave equivalent results in considerably less time? Granted, that's an overly simplified example, since BJ is much more complicated than such a game, but that principle, in a nutshell, is why I don't think it's necessary to require a BJ shuffle algorithm to perform as a general-purpose RNG in its own right. What reason is there for worrying about flaws in the pseudo-randomness that don't make any detectable difference in the results?

"Looking at results is not a 'proof of concept.'"

Well sure, it's impossible to "prove" anything with statistics, but that's not exactly what I meant to imply. What I think the test "proves" is that the technique has a statistical behavior with respect to 4-card frequencies that appears to be indistinguishable from shuffling every shoe. That's good to know, because there would be no point in going further if it didn't, so it's not meaningless.

As for sequential correlations or cyclic repetition, I think the question remains: what kind of flaws would actually cause poor or unreliable results in a BJ sim, and how would you know whether or not they exist in any particular sim, regardless of how it was implemented?

One easily stated requirement would be that you don't want any non-random hand-to-hand correlations within the same shoe -- not if you're going to use the sim to test any kind of betting strategies (counting or progressions). In all the cases where the TC is a certain number, you want to have confidence that the player advantage on the next hand is what it "ought" to be, or the sim simply isn't useful.

In that light, look at what's happening in my algorithm: I have 250 sets of sequences which have been shuffled using exactly the same method that would be used to shuffle the shoe itself. So, for the first 250 shoes at least, I think you would have to agree that there is absolutely no difference between my algorithm and shuffling the shoe each time.

So much for any hand-to-hand correlations within a shoe, or any shoe-to-shoe correlations or cyclic patterns within 250 shoes (and that applies to any cycle if that iteration is considered in isolation from the others). Any correlations or patterns that would make a difference in a BJ sim would be a minimum of 250 shoes apart.

But the shoe is shuffled before starting another 250-shoe cycle. The 251st shoe deals the cards in exactly the same order as the 1st shoe and the 252nd is the same as the 2nd, etc., but since the shoe has been shuffled, that fact alone is not of any concern. (In the standard algorithm, every shoe deals cards in exactly the same order as the 1st shoe.)

So, it appears to me (and I'm perfectly willing to be proven wrong about this) that the sole remaining area for potential problems is that repeatedly using the same 250 sequences on the same shoe without reshuffling can over- and under-sample certain cards in each shoe if the sequences are not "well distributed" through the shoe (at least, not when compared to what would be produced randomly, which is not an even distribution). And the problem with that is, since we know that different cards have different effects on the player advantage, over- and under-sampling will skew the results within any given 250-shoe cycle. Since the particular cards affected change each cycle, sometimes it would be in the player's favor and sometimes not, so in the long run it should average out to produce a final result somewhere near the true mean. But the problem would be, as ET Fan pointed out (and as I discovered existed in my original method of using an incrementing offset instead of random sequences), that the sim would really only have the statistical significance of a smaller sample size. And that's where the standard deviations of many shorter-term sims come in: any loss of significance should be reflected in the SD.

You seem to believe that there could be problems that would not show up in those standard deviations for IBA and frequencies of certain hands, but which could suddenly pop up in the very next sim to ruin the results. I just don't see how that could suddenly happen out of the blue if the sequences are the same each cycle, and there are no short-term characteristics within any cycle that are different from shuffling each shoe, and the only things that will be different in the next sim are the permutations of the shoe each cycle.

But perhaps there are more meaningful statistics than SD for measuring such things? (That's a rhetorical question, since I presume that if you knew of one, you would have mentioned it by now.) I attempted a chi-square analysis on the two 5000-sample sims, but that didn't seem to be sensitive enough to say more than that there is a "high" probability that both sets of data are random.

No need to reply if you really feel you are pointlessly repeating yourself.
