An Urn Experiment: Part 1

Print This Post

An urn contains r red and g green marbles. A marble is drawn, one after the other and without replacement until a red marble is drawn. We want to model the length of this experiment, i.e. the random variable is the number of drawn marbles. At first glance, it may seem as though the geometric distribution works, however, since the marbles are drawn without replacement, the trials are not independent and the probability of choosing a red marble changes (increases) with every draw of a green marble. In this post (part 1) as well as the next (part 2), we will model this random variable by simulating the experiment many times using some of the built-in functions in Mathematica and drawing a histogram. In part 3, we will give a probabilistic argument to get the the probability function (PDF) of the random variable. In that post, we will validate our methods by superimposing a graph of the PDF with those of the histogram. Hopefully they will be close!

The main function in our code is called RandomChoice. This function can take on one or more arguments. See the documentation for more details. We will use the version of this function which takes one argument; namely, a list of the objects. The output of this function will be one of these objects and is pseudorandom.

Let’s say you want to choose an objects from the list objects={1,2,a,b,c}. Define this objects and apply RandomChoice as follows:

objects = {1, 2, a, b, c};

RandomChoice[objects]

These two lines of code will produce an output of 1, 2, a, b, or c. Evaluating the second line multiple times gives different outputs.

For our problem, we want to set up an “urn” object that has a certain number of red and a certain number of green marbles. Let’s say 3 red and 4 green. Since we want to simulate this experiment in the most general way, we don’t want to define the urn objects by writing:

urn={"R","R","R","G","G","G","G"}

Although this would be perfectly fine, if we decide to change the number of red or green marbles, then we would have to manually change this list to reflect the new urn.

The Table command produces a list of objects indexed by one or more parameters. It is a very versatile function. We will use it to generate a list of identical elements. We will generate a list of “R”‘s and a list of “G”‘s:

reds = Table["R", {3}]

greens = Table["G",{4}]

Now we will create the urn by joining these two lists using the Join function:

urn = Join[reds, greens]

Now let’s draw a marble and save the result using the name draw:

draw = RandomChoice[urn]

The value of draw will either be “R” or “G”. We want to check which one it is. We can do this by using the same infix operator, ===.

draw === "G"

This code will output True or False, depending on whether draw has a value of “R” or “G” respectively.

We have most of the ingredients to simulate this experiment: Building the urn, drawing a marble, and checking whether or not the marble is red. The next thing we will do is to repeat these steps until we draw a red marble. Programmatically, this means we will build the urn, draw a marble, and check if it is red, and repeat until draw === “G” returns False. Another way to say this is to say we will repeat the three steps while  draw === “G” returns True. We will do this with the While function. This function takes two arguments. The first is an boolean expression, an expression that returns true or false. The second expression is called the body and this is the code that will be executed repeatedly and until the first boolean expression returns false. Here is what our code will look like (Mathematica ignores everything between (* and *). We use this to make the code human readable. In programming these are all comments. Commenting your code is extremely important!

rv = 0;

(*build urn*)

(*draw*)

rv++;

While[draw === "R",

(*build*)

(*draw*)

rv++

];

rv

This code contains one thing we haven’t talked about yet, the random variable! We need to keep track of the value of the random variable each time we draw a marble. We set it up to have an initial value of zero rv = 0 and every time we draw a marble we increment its value rv++. The semi-colons in the code suppress the output. Putting rv at the end of the code returns the value.

Okay, we are now ready to put this all together.

rv = 0;
(*build urn*)
reds = Table["R", {3}];
greens = Table["G", {4}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
rv++;
While[draw === "G",
(*build*)
reds = Table["R", {3}];
greens = Table["G", {4 – rv}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
(*draw*)
rv++

];
rv

Copy/Paste this code in Mathematica. The result will be either 1, 2, 3, 4, or 5. The best case scenario is choosing a red marble on the first draw, this corresponds to the random variable having a value of 1. The worst case scenario is choosing all the (4) green marbles in which case the fifth draw will definitely be red, this corresponds to the random variable having a value of 5.

Can you explain why the following line of code is there?

greens = Table["G", {4 – rv}];

In the next post, we will build a custom function that will simulate this experiment many times and return the values of the random variable from each simulation.

Disclaimer: There is a much more efficient way to write the code for this experiment. However, as this is targeted to readers who are new to Mathematica and perhaps even to programming in general, we will be satisfied with this version for now. Readers who are more experienced with Mathematica and/or programming in general are encouraged to write faster and more efficient code. This definitely comes in handy if we want to simulate such an experiment a large number of times, say 1 million times.

Leave a Reply

Your email address will not be published. Required fields are marked *