An Urn Experiment: Part 2

In the last post we developed some code to simulate the experiment of choosing marbles from an urn containing r red and g green marbles. We choose one marble at a time and without replacement until we see a red marble. In this post, we will build a function called simUrn that will repeat this experiment many times and return all the values of the random variable. Recall that the random variable is the number of marbles we had to draw before seeing a red marble.

Here is the code we have so far:

rv = 0;
(*build urn*)
reds = Table["R", {3}];
greens = Table["G", {4}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
rv++;
While[draw === "G",
(*build*)
reds = Table["R", {3}];
greens = Table["G", {4 – rv}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
(*draw*)
rv++

];
rv

We will be using a Module so that we can localize all the variables. We also need to introduce a new variable called rvList that will store the value of the random variable after each simulation. Finally, we will use a counter variable called counter that will repeat the experiment until the desired number of simulations is achieved.

Here are two templates for Module:

ModuleFunctionTemplates

Both template require two arguments. The first argument is a list containing the local variables and the second argument contains the main body of the function. In the second template, we can initialize the values of the local variables. The code we developed last time (with some minor adjustments) will be the main body of the Module. We need to add a While loop to repeat the experiment until a specified number of times and after each simulation we want to store the value of the random variable.

WhileFunctionTemplate.png

Let’s say we want to simulate the experiment 1000 times, our “test” will be counter < 1000. Our body will be mostly the code above that actually does the simulation.

Here is a high-level view of the Module code:

Module[{counter = 0, rv, reds, greens, urn, draw, rvList={}},]

While[counter < 1000,

(*simulate the experiment*)

(*store the value of the random variable*)

];

(*return the list containing the random variable values*)
rvList
]

Notice that two of the local variables are initialized (counter and rvList) while the others aren’t. It is quite often that the two templates of Module are used together.

Let’s now talk about how to build functions. There are two ways to extend the Wolfram Language. One way is to Set a symbol with an expression or a number: x=RandomReal[]. Another way is to use SetDelayed: x:=RandomReal[]. The former method evaluates the right hand side of the equal sign and assigns the result to x. The latter version only evaluates the right hand side when the symbol x is called. We want the latter version.

Finally our function simUrn needs to be customizable. We want to be able to specify a number of parameters (arguments). Number of reds, number of greens, and the number of simulations. Let’s call these values r, g, counterMax respectively. We will define our function as follows:

simUrn[r_,g_,counterMax_]:=Module[…]

Ok, so let’s put all the code together.

simUrn[r_,g_,counterMax_]:=Module[{counter = 0, rv, reds, greens, urn, draw, rvList={}},

While[counter < 1000,

(*simulate the experiment*)
rv = 0;
(*build urn*)
reds = Table["R", {3}];
greens = Table["G", {4}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
rv++;
While[draw === "G",
(*build*)
reds = Table["R", {3}];
greens = Table["G", {4 – rv}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
(*draw*)
rv++
]
(*store the value of the random variable*)
AppendTo[rvList,rv]
];
(*return the list containing the random variable values*)
rvList
]

We are almost done. Now we need to make sure we pass on the arguments (number of reds, number of greens, etc) to the code. Currently, the code will run with 3 red and 4 green marbles and will simulate the experiment 1000 times. Here is the adjusted code.

simUrn[r_, g_, counterMax_] :=
Module[{counter = 0, rv, reds, greens, urn, draw, rvList = {}},
While[counter < counterMax,(*simulate the experiment*)rv = 0;
(*build urn*)reds = Table["R", {r}];
greens = Table["G", {g}];
urn = Join[reds, greens];
(*draw*)draw = RandomChoice[urn];
rv++;
While[draw === "G",
(*build*)reds = Table["R", {r}];
greens = Table["G", {g – rv}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
rv++]
(*store the value of the random variable*)
AppendTo[rvList, rv];
counter++;
];
(*return the list containing the random variable values*)
rvList]

Let’s test this code by running the function simUrn[3, 4, 100]. My output is shown below. You output will contain different numbers.

{2,2,2,3,3,2,4,2,3,1,1,3,1,1,1,4,1,4,2,1,4,3,1,3,2,4,2,1,2,4,1,2,2,2,3,1,2,2,4,3,2,3,3,1,3,3,1,3,2,1,3,2,1,1,2,1,1,3,2,3,1,1,1,2,4,3,1,1,3,2,3,3,4,4,4,1,5,4,5,5,1,3,1,2,1,1,1,1,1,4,3,3,1,1,2,4,1,4,4,3}

We will now wrap the code inside Histogram to a get visual of the distribution. Here are some templates on how to use this built-in function.
HistogramFunctionTemplates.png

We will use the last version, use Automatic as the bin specification “bspec” and “Probability” as the height specification “hSpec”. Instead of copy-pasting the list that we got, we can just put the code used to generate the list of values. Here is the code that will simulate the experiment 1000 times and show a histogram showing the probabilities associated with each value of the random variable.

Histogram[simUrn[3, 4, 1000],Automatic, "Probability"]

Histogram1.png

An Urn Experiment: Part 1

Print This Post

An urn contains r red and g green marbles. A marble is drawn, one after the other and without replacement until a red marble is drawn. We want to model the length of this experiment, i.e. the random variable is the number of drawn marbles. At first glance, it may seem as though the geometric distribution works, however, since the marbles are drawn without replacement, the trials are not independent and the probability of choosing a red marble changes (increases) with every draw of a green marble. In this post (part 1) as well as the next (part 2), we will model this random variable by simulating the experiment many times using some of the built-in functions in Mathematica and drawing a histogram. In part 3, we will give a probabilistic argument to get the the probability function (PDF) of the random variable. In that post, we will validate our methods by superimposing a graph of the PDF with those of the histogram. Hopefully they will be close!

The main function in our code is called RandomChoice. This function can take on one or more arguments. See the documentation for more details. We will use the version of this function which takes one argument; namely, a list of the objects. The output of this function will be one of these objects and is pseudorandom.

Let’s say you want to choose an objects from the list objects={1,2,a,b,c}. Define this objects and apply RandomChoice as follows:

objects = {1, 2, a, b, c};

RandomChoice[objects]

These two lines of code will produce an output of 1, 2, a, b, or c. Evaluating the second line multiple times gives different outputs.

For our problem, we want to set up an “urn” object that has a certain number of red and a certain number of green marbles. Let’s say 3 red and 4 green. Since we want to simulate this experiment in the most general way, we don’t want to define the urn objects by writing:

urn={"R","R","R","G","G","G","G"}

Although this would be perfectly fine, if we decide to change the number of red or green marbles, then we would have to manually change this list to reflect the new urn.

The Table command produces a list of objects indexed by one or more parameters. It is a very versatile function. We will use it to generate a list of identical elements. We will generate a list of “R”‘s and a list of “G”‘s:

reds = Table["R", {3}]

greens = Table["G",{4}]

Now we will create the urn by joining these two lists using the Join function:

urn = Join[reds, greens]

Now let’s draw a marble and save the result using the name draw:

draw = RandomChoice[urn]

The value of draw will either be “R” or “G”. We want to check which one it is. We can do this by using the same infix operator, ===.

draw === "G"

This code will output True or False, depending on whether draw has a value of “R” or “G” respectively.

We have most of the ingredients to simulate this experiment: Building the urn, drawing a marble, and checking whether or not the marble is red. The next thing we will do is to repeat these steps until we draw a red marble. Programmatically, this means we will build the urn, draw a marble, and check if it is red, and repeat until draw === “G” returns False. Another way to say this is to say we will repeat the three steps while  draw === “G” returns True. We will do this with the While function. This function takes two arguments. The first is an boolean expression, an expression that returns true or false. The second expression is called the body and this is the code that will be executed repeatedly and until the first boolean expression returns false. Here is what our code will look like (Mathematica ignores everything between (* and *). We use this to make the code human readable. In programming these are all comments. Commenting your code is extremely important!

rv = 0;

(*build urn*)

(*draw*)

rv++;

While[draw === "R",

(*build*)

(*draw*)

rv++

];

rv

This code contains one thing we haven’t talked about yet, the random variable! We need to keep track of the value of the random variable each time we draw a marble. We set it up to have an initial value of zero rv = 0 and every time we draw a marble we increment its value rv++. The semi-colons in the code suppress the output. Putting rv at the end of the code returns the value.

Okay, we are now ready to put this all together.

rv = 0;
(*build urn*)
reds = Table["R", {3}];
greens = Table["G", {4}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
rv++;
While[draw === "G",
(*build*)
reds = Table["R", {3}];
greens = Table["G", {4 – rv}];
urn = Join[reds, greens];
(*draw*)
draw = RandomChoice[urn];
(*draw*)
rv++

];
rv

Copy/Paste this code in Mathematica. The result will be either 1, 2, 3, 4, or 5. The best case scenario is choosing a red marble on the first draw, this corresponds to the random variable having a value of 1. The worst case scenario is choosing all the (4) green marbles in which case the fifth draw will definitely be red, this corresponds to the random variable having a value of 5.

Can you explain why the following line of code is there?

greens = Table["G", {4 – rv}];

In the next post, we will build a custom function that will simulate this experiment many times and return the values of the random variable from each simulation.

Disclaimer: There is a much more efficient way to write the code for this experiment. However, as this is targeted to readers who are new to Mathematica and perhaps even to programming in general, we will be satisfied with this version for now. Readers who are more experienced with Mathematica and/or programming in general are encouraged to write faster and more efficient code. This definitely comes in handy if we want to simulate such an experiment a large number of times, say 1 million times.