Charlie Harvey

Of Birthday Problems, Haskell and Floating Point Precision

Learning Haskell by working my way through the Real World Haskell book and the huge variety of online resources recently has reignited my curiosity about “proper” computer science. The sort with maths. I’ve picked up most of my maths as a side effect of hacking code, so I have been slowly working my way through “Sets, Logic and Maths for Computing” by David Makison, which is a reasonably thorough undergraduate-level introduction biased more towards set theory than some texts. I think that it has been worth doing.

my qorkings out for the birthday problem solution

In the chapter on probability and combinatorics a classic question is posed, which has for various reasons, rather occupied my mind for the last couple of days.

There are 25 people in a room. What is the probability that at least two are born on the same dayof the year (but not necessarily the same year)? For simplicity, assume that each year has 365 days, and that the people’s birthdays are randomly distributed through the year (both assumptions being, of course, rough approximations)

Now there are a number of solutions, but most of them require some big maths. Which is to say numbers like 36525, dividing factorials. As it turns out Haskell, whilst being able to handle mega big Integers, isn’t quite as nice when you want arbitrary precision floats. Of which more anon. But first I need to explain how I solved the problem. There are other solutions — see the Birthday Problem entry for example — but the one I got to seems to be the standard way of doing it.

How to solve the problem

The text gives three extremely helpful hints that help you get to a solution.

  1. Be careful with your choice of sample space
  2. First find the probability that no 2 distinct people have the same birthday
  3. You will need a pocket calculator

Well my pocket calculator can’t handle 36525. Maybe I need a better one. Enter GHCi, which said$ ghci GHCi, version 7.4.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude> 365^25 11410944981823451546774580833569544217039269564449787139892578125 Nice.

So, back to hint 1, thinking about the size of my sample space. Well, the sample space has to be based on the 365 days of the year. The order is significant and we allow ourselves to repeat days. Which means that we use permutations with repetition allowed, the formula for which is just nk. In our case, that means our old pal 36525. Which is just saying 365*365*365… and so on until we’ve multiplied 365 by itself 25 times.

The second hint suggests that we look for the set of all people with different birthdays. Again, the order is significant but this time repetitions are very much not allowed — remember that we are looking for distinct birthdays. So, this is a case of permutations. If you write this all down mathematically you get something like

Image, used under creative commons, from wikipedia

If you are still confused at this point, have a quick look at this youtube video, from "Maths made almost bearable" which explains things quite nicely.

Haskell time

To get to this point took me a while. I’m not a mathemetician, but I did get there in the end. However, the Haskell implementation turned out to be harded than I had expected. Possibly because I was trying to make a one liner. But mostly because I struggled to find a way to divide large Integers together.

I started by looking for the built in factorial function. Which it turned out wasn’t built in. There is, however, a product function, which multiplies together the elements of a list. Combining that with the fact that one can generate a list from a range (1..5) gave me a way to write fac (as I decided to call my factorial function) nicelylet fac n = product [1..n]

Now that I could do factorials, I could implement a permutation fuction. Easy enough to do like thislet perm n' = fac 365 `div` fac (365-n');The n' becomes relevant later, but it is just an Haskellish name for a variable that is somehow related to n, rather than calling it n2 or something as one might in another language. We take the factorial of 365 and divide it by the factorial of 365 minus another integer. To solve the question n' would be 25. This is probably a case of bad naming, as mathemeticians would usually refer to what I am calling n' as k and to 365 as n. If this bothers you, you should change it. Here is a test run anyhow.Prelude>let fac n = product [1..n] Prelude> let perm n' = fac 365 `div` fac (365-n'); Prelude> perm 2 132860 Prelude> perm 25 4921543948648615193038527019122032412846804592445950918656000000 Prelude>

Now that I had those bits, what I wanted to say wasbd n = 100 - 100 * perm n / 365^nI multiplied the probability of all the birthdays being distinct by 100 to get a percentage. Then I took that percentage off 100 to find out the percentage probability that all the birthdays are not distinct. That is at least 2 people share a birthday.

However, dividing big integers like that can, it seems, overflow boundaries of the Double that Haskell comes with. I had a dig about for ways round this. I would have thought it would be widely documented, but as it turns out that was not the case. The HMPFR library came up during my search, but the solution I went with in the end was somewhere in a mammoth thread on Haskell Café entitled about integer and float operations. Who knew? I implemented a function that does division by first constructing a Rational, then converting to floating point using fromRational. Note that I had to import Data.Ratio to get the % operator.x /. y = > fromRational (x % y)

This mostly does as you would expect. Mostly. And usually for big numbers, which is what we care about.Prelude> import Data.Ratio Prelude Data.Ratio> let x /. y = fromRational (x % y) Prelude Data.Ratio> fac 777 /. fac 778 1.2853470437017994e-3 Prelude Data.Ratio> fac 777 /. fac 771 2.158351619436432e17

I now had all the components I needed to make a nice little one liner to solve the birthday problem for whatever number I wanted.Prelude Data.Ratio> let bd n = 100.0 - 100.0 * perm n /. (365^n) where perm n' = fac 365 `div` fac (365-n'); fac x = product [1..x]; x /. y = fromRational (x % y) Prelude Data.Ratio> bd 25 56.86997039694639 At this stage I could answer the original question and say that the probability that at least 2 of the 25 people in the room were born on the same day of the year is 56.87 percent (ish). Which may or may not surprise you.

The implementation could stand to be a little more efficient. I do a lot of work in the fac function. I can improve performance by changing how I calculate the permutation so that I need to calculate fewer factorials. How? Well fac 365 / fac 365 - n is the same as fac 365 - (n - 1) and I was able to take advantage of this fact like thisPrelude> fac 365 `div` fac 362 48228180 Prelude> product [363..365] 48228180That made me happy, because I could avoid a fair bit of redundant calculation. You can observe the effect of redundant calculation when the figures are an order of magnitude bigger. You could do the following.Prelude> fac 36500 `div` fac 36300 1664794314530650901156157427362126079623718218858338320926253293194738029255822028880804606888391620622216346830718850489163512882032403154323479420940484945108180108150817321610145580902741060457892438267562280504387599793310194282194436244175253336388945446577578317455324514816275134418826320949533614256936277001924393246993319975850825529842665801562179315307492480192204683979570635723886832740975026116663456784309654170151889459039390520911997827868496881384821437627268818497913193773276823414991643772715776993626808085451928663301632257485173243595486303454386605026994637467439070225888511938467908860108173806269061218085426994717143254299411207555930798536689399855217030247250194784241041929395588519421076007220176675600893925794424367074488857667687578194574907273296310755718552967233481418597816809213004129158878860590102578587316264240572006400000000000000000000000000000000000000000000000000 Prelude> product [36301..36500] 1664794314530650901156157427362126079623718218858338320926253293194738029255822028880804606888391620622216346830718850489163512882032403154323479420940484945108180108150817321610145580902741060457892438267562280504387599793310194282194436244175253336388945446577578317455324514816275134418826320949533614256936277001924393246993319975850825529842665801562179315307492480192204683979570635723886832740975026116663456784309654170151889459039390520911997827868496881384821437627268818497913193773276823414991643772715776993626808085451928663301632257485173243595486303454386605026994637467439070225888511938467908860108173806269061218085426994717143254299411207555930798536689399855217030247250194784241041929395588519421076007220176675600893925794424367074488857667687578194574907273296310755718552967233481418597816809213004129158878860590102578587316264240572006400000000000000000000000000000000000000000000000000 The same result. But I can count to 9 whilst the first calculation runs, but not even get to 1 when the second does. Charlies like massive speed improvements! Even if, in this case, it is barely noticeable when I call the bd function.

Here is the improved and more efficient codePrelude> let bd n = 100.0 - 100.0 * perm n /. (365^n) where perm n' = product [(366-n')..365]; x /. y = fromRational (x % y) Prelude> bd 25 56.86997039694639There is no doubt still plenty of room for improvement, but this code solves the problem as required.

Another approach

It took a fair while to get such a tiny bit of code to work, but the Wikipedia entry had another, to my mind more elegant (though less accurate) method to approximate the probability.

The probability of any two people not having the same birthday is 364/365. In a room containing n people, there are C(n, 2)=n(n-1)/2 pairs of people, i.e. C(n, 2) events. The probability of no two people sharing the same birthday can be approximated by assuming that these events are independent and hence by multiplying their probability together. In short 364/365 can be multiplied by itself C(n, 2) times.

Implementing that as a one liner starts with the C(n,2) part. That is maths speak for combinations, meaning that order is not significant and repetition is not allowed. The formula I know is n! / k!(n-k)! More of those factorials. I wrote a little combine function, which, unlike the perm function implements combinations generallyPrelude Data.Ratio> let combine n' k = round $ fac n' / (fac k * fac (n'-k) ) Prelude Data.Ratio> combine 10 6 210 As before, however, I needed to guard against overflows by using the /. division trick. Otherwise I could end up withPrelude Data.Ratio> combine 180 6 -269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824 Prelude Data.Ratio> combine 190 6 -269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824 Prelude Data.Ratio>Not what I wanted to see. Compare and contrast withPrelude Data.Ratio> let combine n' k = round $ fac n' /. (fac k * fac (n'-k) ) Prelude Data.Ratio> combine 190 6 60334683255 Prelude Data.Ratio> combine 180 6 43424719800 Prelude Data.Ratio> combine 170 6 30663442810 Prelude Data.Ratio>

With the division issue once again resolved, I was in a position to implement the exponent approximation method of calculating birthday probabilities thus.Prelude Data.Ratio> let bd' n = 100 - 100 * (364/365) ^ (combine n 2) where combine n' k = floor $ fac n' / (fac k * fac (n'-k) ); x /. y = fromRational x % y Prelude Data.Ratio> bd' 25 56.09077642340239 Prelude Data.Ratio> -- compared to Prelude Data.Ratio> bd 25 56.86997039694639 Prelude Data.Ratio>

Conclusion

So, there you have it. Two answers to the question, and I also learned one way to handle arbitrary precision floating point division in Haskell. It is probably time to move on to the next chapter. But before I finally leave the birthday problem behind, here is one way to get the whole problem completely muddled. You can read a great explanation of why Carson got so mixed up in the New York times blog.

Clip courtesy of Cornell University


Comments

  • Be respectful. You may want to read the comment guidelines before posting.
  • You can use Markdown syntax to format your comments. You can only use level 5 and 6 headings.
  • You can add class="your language" to code blocks to help highlight.js highlight them correctly.

Privacy note: This form will forward your IP address, user agent and referrer to the Akismet, StopForumSpam and Botscout spam filtering services. I don’t log these details. Those services will. I do log everything you type into the form. Full privacy statement.