Easy Probability and Clojure - 1 - The South Park Socks Drawer

Need help with your custom Clojure software? I'm open to (selected) contract work.

October 31, 2018

Please share: Twitter.

These books fund my work! Please check them out.

Introduction to the Easy Probability and Clojure series

Lots of programmers would like to learn to analyze data or "do AI", and the understanding of probability is the fundamental skill. There is a lot of assumed knowledge before you can do anything on your own, rather than blindly following image classification tutorials for the 17th time. It takes time to grow it, though.

The pat shouldn't be overwhelming, though, at least its beginning, or you'd quickly drop out. It should be a relaxing activity after work, rather than the thing that shows off and convinces you that you'd never make it. An easy start.

This is the first part in a series of Clojure & Probability tutorials, which will be solving interesting probability puzzles from the classic book 50 Challenging Problems in Probability. It's a collection of exercises similar to The Little Schemer, but for probability. You may prefer to also read some probability prose; see my recommendation.

This series does not require any fancy high performance library (such as Neanderthal, ClojureCUDA, or Bayadera). Even Clojure beginners can follow it and sharpen up their Clojure skills while learning probability. An easy first step to kickstart you towards more advanced topics.

The Sock Drawer

I am giving a talk at the Code Mesh 2018 conference next week. I noticed long ago that many functional programmers wear cool socks and I bought a few pairs for special occasions. In one of my drawers, I keep South Park themed ones. I do not wear them regularly, but only when it's on air, and only at programming conferences.

I am not even sure how many pairs I have. I only remember vaguely that I bought a few with Randy Marsh and some with Butters on them. It's a hard-headed drawer; it does not let me open it and take a peek. When I randomly draw a pair, the probability that I get a matching pair of Randy's is $\frac{1}{2}$.

I wonder if I'll have enough socks for my trip to London, so I would like to know a) how small can the number of socks in that drawer be? b) How small can it be if I know that I've never lost a Butters sock without losing its pair? Should I buy more socks?

Solution

The book 50 Challenging Problems in Probability with Solutions by Frederick Mosteller, which I've taken this problem from, also discusses the solutions. This nice little book costs $6.34, but if it's inconvenient for you to buy, the Google Books preview that I linked to, offers most pages for free reading, including page 15, which contains the solution to this problem, The Sock Drawer. Therefore, I won't use many words to describe it; help yourself. It's better to devote time to discuss some nice beginner-friendly code that implements it.

The probability of the first sock that I draw being Randy is $\frac{r}{r+b}$, and the second $\frac{r-1}{r+b-1}$, since there is one sock less left in the drawer, and that one is mister Marsh. The probability of both events occuring is the product of these two, and I know it's $\frac{1}{2}$:

Mosteller describes it nicely with this equality

$P(R,R) = \frac{r}{r + b} \times \frac{r - 1}{r + b - 1} = \frac{1}{2}$

and then tests it for a particular combination of $r=5$ and $b=2$.

This arithmetic is simple to code and fast to execute on the computer, so I wrote a function to do that for me for any combination of Randy and Butters socks.

(defn probability-matching-randy [r b]
  (if (<= 2 r)
    (if (<= 0 b)
      (* (/ r (+ r b))
         (/ (- r 1) (+ r b -1)))
      1)
    0))

Try it with 5 and 2, and we get the same result as in the book: 5 and 2 is not a combination that leads to probability $\frac{1}{2}$.

(probability-matching-randy 5 2)

10/21

Brute force

The naive solution is to try as many combinations as we can construct, and accept those that lead to the desired result. Clojure's list comprehensions can do that for me. For each combination of numbers that for gives me, I calculate the probability of a matching pair of Randy socks, and when that probability is $\frac{1}{2}$ I keep the combination.

(for [r (range 100)
      b (range 100)
      :let [p (probability-matching-randy r b)]
      :when (= 1/2 p)]
  [r b])

3	1
15	6
85	35

Apparently, I either have 3 Randy and 1 Butters socks (and an unmatched pair lost somewhere on travel), or 15 Randy socks and 3 pairs of Butters, or a whole lotta socks in that drawer.

But, remember that I am pretty sure that I've never lost a Butters sock! I have to check only combinations with even numbers of $b$ 's! I could have changed the :when clause to (and (even? b) (= 1/2 p)) but in that case, probability-matching-randy would have been called twice as often. It's not an issue in this trivial program which is not optimized anyway, but I'll try to introduce some good practices for high performance code on the way, so here it is: don't call expensive computations when you don't have to (probability-matching-randy is rather cheap, while for and the sequences are wasteful, but we'll use some imagination here to pretend that probability-matching-randy is a beast).

(cons ["Randy" "Butters"]
      (take 1 (filter identity
                      (for [r (range 100)
                            b (range 100)]
                        (when (even? b)
                          (let [p (probability-matching-randy r b)]
                            (when (= 1/2 p)
                              [r b])))))))

Randy	Butters
15	6

21 socks. Cool.

We have had to try all combinations to discover the ones that match the condition. As soon as the problems start being a little bit more complex, we'd have lots of combinations to check. Real-world problems grow beyond what computers can scale rather quickly with naive brute force search. Machine learning, data analysis, and related fields rely on finding clever ways to avoid trying out the majority of possible combinations, and only trying those that have some chance of being relevant.

A little help from analysis

Toy examples are good to showcase simple shortcuts.

It would be nice if we could do some arithmetic operation on $\frac{r}{r + b} \times \frac{r - 1}{r + b - 1} = \frac{1}{2}$ to get an easy expression such as $r = \alpha b + \beta$. Try yourself and see how it goes. I couldn't get anywhere.

However, Mosteller cleverly sees that, while we can not establish an easy equality, there is a tidy inequality that can help us: $\frac{r}{r + b} > \frac{r - 1}{r + b - 1}$, for $b>0$.

He plugs it in, and gets ${(\frac{r}{r + b})}^2 > \frac{1}{2} > {(\frac{r - 1}{r + b - 1})}^2$, which can be worked with to give us $(\sqrt{2} + 1)b + 1 > r > (\sqrt{2} + 1) b$. Now we have a lower and upper bound of possible $r$'s when we decide on $b$ 's, and only need to test these, or, in this case, only one:

(defn smallest-randy [b]
  (if (<= 0 b)
    (long (Math/ceil (* (inc (Math/sqrt 2)) b)))
    0))

(smallest-randy 1)

What's left is to try a range of $b$ s and check whether the corresponding $r$ gives a match:

(cons ["Butters" "Randy" "P(Randy, Randy)" "1/2?"]
      (take 3 (map (fn [b]
                     (let [candidate-r (smallest-randy b)
                           pr-2-r (probability-matching-randy candidate-r b)]
                       [b candidate-r pr-2-r (= 1/2 pr-2-r)]))
                   (range 2 1000 2))))

Butters	Randy	P(Randy, Randy)	1/2?
2	5	10/21	false
4	10	45/91	false
6	15	1/2	true

I should probably not buy more socks. I have at least 7 matching pairs of Randy's, after all.

To help Randy socks stay in matching pairs just like Butters, please pledge donations on Patreon to support my work on Clojure high performance, data analysis, and ML libraries. If you'd rather like that I keep losing odd Randy socks but still continue writing this series at a good pace, then consider pledging donations for my Patreon campaign. If you prefer that I stop goofing around with South Park socks and use Bernoulli urns with this kind of problems, supporting my work on Patreon is a slam dunk.

And, since it's Halloween today, I created a special tier on Patreon for 3 Halloween Backers, which is available only today!

What could go wrong?

Nice and easy, right? The greatest help in discovering the minimal number of socks was that I knew the probability $\frac{1}{2}$. In a real world problem, part of the problem would be to estimate that from previous data.

Next, we wouldn't know what's inside the drawer. Maybe there are other socks inside, but there may be lions and tigers, too. Additionally, who cares about socks? We would probably have to estimate a more subtle thing, such as the expected value of second-hand socks. That changes over time. Not only that, but the contents of the drawer would change over time. And nobody would tell you, but things outside the drawer would also have influence on the drawer. The drawer would influence them, too.

Hey, I do not think that kind of problem would be easy to tackle with naive solutions. We would probably need lots of higher level math, and then specialized algorithms to enable that math to be ~~calculated~~ estimated in a finite amount of time.

Luckily, we will keep with these nice little puzzles and take a humble path to our understanding of probability, at least in this series.