Choice between two options.
Option $i$, for example ``go swimming''.
Has systematic utility (that we compute): $V_i$.
Assume that (for whatever reason) there is also a random component $\epsilon_i$, so that the total utility is $U_i = V_i + \epsilon_i$. Choice is made according to $U_i$.
Possible interpretations:
Person making the choice is not deterministic.
Person making the choice is deterministic, but there are additional criteria (for example ``was swimming yesterday'') which are not included.
If they were included, then there would be no $\epsilon_i$ in this interpretation.
Now let us assume there are two options, $i=1$ (``go swimming'') and $i=2$ (``stay home'').
We assume that the option with the larger utility is selected (cf. Fig. 29.1): option 1 is selected if
\[ U_1 > U_2 , \quad \text{i.e.} \quad V_1 + \epsilon_1 > V_2 + \epsilon_2 . \]
Assume that $V_1$, $V_2$ are linear in contributions:
\[ V_1 = \beta_{1,1} \, x_{1,1} + \beta_{1,2} \, x_{1,2} + \ldots \tag{29.4} \]
\[ V_2 = \beta_{2,1} \, x_{2,1} + \beta_{2,2} \, x_{2,2} + \ldots \tag{29.5} \]
In principle, the $x_{i,j}$ can be arbitrary functions. In practice, they are usually simple transformations of basic variables, e.g. time, or distance, or distance squared.
A result from discrete choice modeling often looks like this:
\[ V_{car} = \beta_T \, T_{car} + \beta_C \, C_{car} , \tag{29.6} \]
\[ V_{bus} = \beta_T \, T_{bus} + \beta_C \, C_{bus} , \tag{29.7} \]
\[ \beta_T = -0.1/\text{min} , \quad \beta_C = -0.012/\text{cent} , \tag{29.8} \]
where $T$ and $C$ are the travel time and the cost of the respective mode.
(Compare the departure time example; but here there are only two options.)
For example: Time with car 10 min; with bus 20 min. Cost with car 200 cents; with bus 100 cents. Then
\[ V_{car} = -0.1 \cdot 10 - 0.012 \cdot 200 = -3.4 , \tag{29.9} \]
\[ V_{bus} = -0.1 \cdot 20 - 0.012 \cdot 100 = -3.2 . \tag{29.10} \]
The probabilities to select car/bus (see later) will be something like
\[ p_{car} = \frac{e^{V_{car}}}{e^{V_{car}} + e^{V_{bus}}} \approx 0.45 , \tag{29.11} \]
\[ p_{bus} = \frac{e^{V_{bus}}}{e^{V_{car}} + e^{V_{bus}}} \approx 0.55 . \tag{29.12} \]
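To make the arithmetic concrete, here is a minimal C sketch (not part of the original text) that reproduces Eqs. (29.9)--(29.12); the logit formula it uses is only derived later in this section.
\begin{verbatim}
/* Worked check of Eqs. (29.9)-(29.12): linear utilities and binary logit.
   Compile with: cc example.c -lm */
#include <stdio.h>
#include <math.h>

int main(void) {
    double beta_time = -0.1;    /* utility per minute */
    double beta_cost = -0.012;  /* utility per cent   */

    double V_car = beta_time * 10.0 + beta_cost * 200.0;  /* = -3.4 */
    double V_bus = beta_time * 20.0 + beta_cost * 100.0;  /* = -3.2 */

    double p_car = exp(V_car) / (exp(V_car) + exp(V_bus));
    printf("V_car=%.2f V_bus=%.2f p_car=%.2f p_bus=%.2f\n",
           V_car, V_bus, p_car, 1.0 - p_car);
    /* prints V_car=-3.40 V_bus=-3.20 p_car=0.45 p_bus=0.55 */
    return 0;
}
\end{verbatim}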
A more complete specification, including socio-demographic attributes, could look like this:
Contribution to $V_{car}$ | Contribution to $V_{bus}$ | Coefficient
1 (constant) | 0 | -1.4
time with car [min] | time with bus [min] | -0.1
cost with car [cent] | cost with bus [cent] | -0.012
1 if female | 0 | 0.6
1 if ( unmarried OR spouse cannot drive OR travels to work with spouse ) | 0 | -0.2
1 if ( married AND spouse is working AND spouse drives to work independently ) | 0 | 1.2
Meanings:
If person is female, utility of car is increased.
If person is unmarried OR if spouse cannot drive OR if person travels to work with spouse, then utility of car is decreased.
Etc.
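To illustrate how the table is read, here is a hedged C sketch: each row contributes (coefficient $\times$ car-column entry) to $V_{car}$ and (coefficient $\times$ bus-column entry) to $V_{bus}$. The traveler's attributes are invented for illustration.
\begin{verbatim}
/* Sketch: evaluating the utility table for one hypothetical traveler. */
#include <stdio.h>

int main(void) {
    /* made-up traveler: female, married, spouse drives to work
       independently; times/costs as in the example above */
    double T_car = 10.0, T_bus = 20.0;    /* min  */
    double C_car = 200.0, C_bus = 100.0;  /* cent */
    int female = 1, dummy1 = 0, dummy2 = 1;

    double V_car = -1.4 * 1.0            /* car constant      */
                 - 0.1   * T_car
                 - 0.012 * C_car
                 + 0.6   * female
                 - 0.2   * dummy1
                 + 1.2   * dummy2;
    double V_bus = -0.1 * T_bus - 0.012 * C_bus;

    printf("V_car=%.2f V_bus=%.2f\n", V_car, V_bus);
    /* prints V_car=-3.00 V_bus=-3.20 */
    return 0;
}
\end{verbatim}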
From this point on, progress is made by making assumptions about the statistical distributions of the noise parameters $\epsilon_i$. Different assumptions will lead to different models.
Before looking into some specific forms, it makes sense to quickly recall probability distributions and generating functions.
A probability density function $f(x)$ essentially gives the probability that a certain value $x$ occurs. For example, the Gaussian probability density function is
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(x-\mu)^2 / (2\sigma^2)} . \tag{29.13} \]
The generating function (i.e., the cumulative distribution function) $F(x)$ is the integral of the probability density function. That is
\[ F(x) = \int_{-\infty}^{x} f(y) \, dy . \]
The generating function can be used to compute the probability that the selected value is smaller than some given value $x$. Rather obviously, one has
\[ P(X < x) = F(x) . \]
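As a quick numerical sanity check of this relation (illustration only, standard Gaussian assumed), the following C sketch integrates the density with the trapezoid rule and compares against the erf-based closed form of the Gaussian generating function.
\begin{verbatim}
/* Check that the generating function is the integral of the density. */
#include <stdio.h>
#include <math.h>

double gauss_pdf(double t) {
    const double PI = acos(-1.0);
    return exp(-0.5 * t * t) / sqrt(2.0 * PI);
}

int main(void) {
    double x = 1.0, a = -10.0;   /* lower cutoff approximates -infinity */
    int n = 110000;
    double h = (x - a) / n, sum = 0.0;
    for (int i = 0; i < n; i++) {
        double t = a + i * h;
        sum += 0.5 * (gauss_pdf(t) + gauss_pdf(t + h)) * h;  /* trapezoid */
    }
    printf("integral of density: %.6f, closed form: %.6f\n",
           sum, 0.5 * (1.0 + erf(x / sqrt(2.0))));
    /* both approximately 0.841345 */
    return 0;
}
\end{verbatim}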
Recall: We have
\[ U_1 = V_1 + \epsilon_1 , \qquad U_2 = V_2 + \epsilon_2 . \tag{29.14} \]
We are now looking for mathematical forms of $P_1$, the probability to select option 1.
Assume that $\epsilon_1$ and $\epsilon_2$ are Gaussian distributed.
Gaussian distributions have the property that sums/differences of Gaussian distributed variables are still Gaussian distributed. In consequence,
\[ \eta := \epsilon_2 - \epsilon_1 \]
is Gaussian distributed, for example (with mean zero and ``width'' $\sigma$):
\[ f(\eta) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\eta^2 / (2\sigma^2)} . \tag{29.15} \]
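A small Monte Carlo sketch in C (illustration only; Box-Muller sampling, $\sigma = 1$) checks at least the width of this difference distribution: two independent Gaussians of width $\sigma$ give a difference of width $\sqrt{2}\,\sigma$.
\begin{verbatim}
/* Empirical width of eta = eps2 - eps1 for Gaussian eps1, eps2. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double gauss(double sigma) {
    /* Box-Muller transform */
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * acos(-1.0) * u2);
}

int main(void) {
    srand(42);
    double sigma = 1.0, sum = 0.0, sumsq = 0.0;
    int n = 1000000;
    for (int i = 0; i < n; i++) {
        double eta = gauss(sigma) - gauss(sigma);  /* eps2 - eps1 */
        sum += eta;
        sumsq += eta * eta;
    }
    double mean = sum / n;
    printf("mean %.3f  width %.3f  (expect 0 and %.3f)\n",
           mean, sqrt(sumsq / n - mean * mean), sqrt(2.0) * sigma);
    return 0;
}
\end{verbatim}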
Now we need
\[ P_1 = P(U_1 > U_2) = P(\eta < V_1 - V_2) , \]
where $\eta = \epsilon_2 - \epsilon_1$, and we know that $\eta$ is normally distributed. As an equation:
\[ P_1 = \int_{-\infty}^{V_1 - V_2} f(\eta) \, d\eta . \tag{29.16} \]
The solution of this needs the so-called error function, sometimes denoted by erf, and available as double erf(double x) under Linux. Before the age of electronic computers, the error function was inconvenient to use, which is why the main theoretical development followed a different path, described in the following.
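With a computer, Eq. (29.16) is a one-liner; the following C sketch evaluates it via the library erf (link with -lm). The value $\sigma = 1$ is an arbitrary illustration choice.
\begin{verbatim}
/* Binary probit sketch: P1 = P(eta < V1 - V2), eta ~ Normal(0, sigma^2). */
#include <stdio.h>
#include <math.h>

double probit_p1(double V1, double V2, double sigma) {
    /* Gaussian generating function evaluated at V1 - V2, Eq. (29.16) */
    return 0.5 * (1.0 + erf((V1 - V2) / (sigma * sqrt(2.0))));
}

int main(void) {
    /* car/bus utilities from Eqs. (29.9)-(29.10) */
    printf("P_car = %.3f\n", probit_p1(-3.4, -3.2, 1.0));
    /* prints P_car = 0.421 -- close to, but not identical with,
       the logit value of about 0.45 */
    return 0;
}
\end{verbatim}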
An important piece of knowledge is what happens when random variables are combined. For example, the sum of two Gaussian-distributed random variables is again Gaussian-distributed.
As preparation, learn about the so-called Gumbel distribution:
Generating function:
\[ F(x) = \exp\!\big( -e^{-\lambda (x - \mu)} \big) . \tag{29.17} \]
Probability density function:
\[ f(x) = \lambda \, e^{-\lambda (x - \mu)} \, \exp\!\big( -e^{-\lambda (x - \mu)} \big) . \tag{29.18} \]
Location of maximum: $\mu$ (location parameter).
Variance: $\pi^2 / (6 \lambda^2)$ ($1/\lambda$ width parameter).
(Remember: Sum of two Gaussian rnd variables $\to$ new Gaussian rnd variable whose mean and variance are the sums of the original means and variances.)
For Gumbel:
If $\epsilon_1$ and $\epsilon_2$ are independent Gumbel distributed with the same $\lambda$, then
\[ \max(\epsilon_1, \epsilon_2) \]
is also Gumbel-distributed with the same $\lambda$ and a new $\mu$ of
\[ \mu = \frac{1}{\lambda} \, \ln\!\big( e^{\lambda \mu_1} + e^{\lambda \mu_2} \big) . \tag{29.19} \]
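This max-stability property can be checked by simulation. The following C sketch (illustration only; $\mu_1$, $\mu_2$, $\lambda$ are arbitrary values) samples two Gumbels by inverse-CDF sampling and compares the empirical mean of the max with the mean $\mu_{new} + \gamma/\lambda$ of the predicted Gumbel distribution.
\begin{verbatim}
/* Monte Carlo check of Eq. (29.19). */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double gumbel(double mu, double lambda) {
    /* inverse-CDF sampling from F(x) = exp(-exp(-lambda*(x-mu))) */
    double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* u in (0,1) */
    return mu - log(-log(u)) / lambda;
}

int main(void) {
    srand(42);
    double mu1 = 0.0, mu2 = 1.0, lambda = 1.0, sum = 0.0;
    const double GAMMA = 0.5772156649;  /* Euler-Mascheroni constant */
    int n = 1000000;
    for (int i = 0; i < n; i++) {
        double e1 = gumbel(mu1, lambda), e2 = gumbel(mu2, lambda);
        sum += (e1 > e2) ? e1 : e2;
    }
    double mu_new = log(exp(lambda * mu1) + exp(lambda * mu2)) / lambda;
    /* mean of Gumbel(mu, lambda) is mu + GAMMA/lambda */
    printf("empirical mean of max: %.3f, predicted: %.3f\n",
           sum / n, mu_new + GAMMA / lambda);
    return 0;
}
\end{verbatim}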
If $\epsilon_1$ and $\epsilon_2$ are independent Gumbel distributed with the same $\mu$ and $\lambda$, then
\[ \eta := \epsilon_2 - \epsilon_1 \]
is logistically distributed (see below) with generating function
\[ F(\eta) = \frac{1}{1 + e^{-\lambda \eta}} . \tag{29.20} \]
Generating function:
\[ F(x) = \frac{1}{1 + e^{-\lambda x}} . \]
Note that
\[ \frac{1}{1 + e^{-\lambda x}} = \frac{e^{\lambda x}}{e^{\lambda x} + 1} . \tag{29.21} \]
Probability density function: The logistic probability density function looks somewhat similar to the Gaussian probability density function (Fig. 29.3); $1/\lambda$ is the width parameter.
Coming back to binary choice, one now assumes that $\epsilon_1$ and $\epsilon_2$ are Gumbel distributed, meaning that $\eta = \epsilon_2 - \epsilon_1$ is logistically distributed.
Again, find
\[ P_1 = P(\eta < V_1 - V_2) . \]
This is
\[ P_1 = \frac{1}{1 + e^{-\lambda (V_1 - V_2)}} . \tag{29.22} \]
If we re-translate this into our original variables, using Eq. (29.21), we obtain
\[ P_1 = \frac{e^{\lambda V_1}}{e^{\lambda V_1} + e^{\lambda V_2}} . \]
This is similar to what we have seen in the departure time choice (except that here there are only two options; for departure time choice we had many).
Note that the noise parameter $\lambda$ comes from the width parameter of the logistic distribution. Large noise $\Rightarrow$ small $\lambda$ ($\lambda$ plays the role of an inverse temperature) $\Rightarrow$ choice more random.
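The following C sketch (illustration only; utilities from Eqs. (29.9)--(29.10)) shows this temperature effect numerically.
\begin{verbatim}
/* Effect of the inverse temperature lambda in the binary logit,
   Eq. (29.22): small lambda pushes p1 toward 1/2 (random choice),
   large lambda lets the better option dominate. */
#include <stdio.h>
#include <math.h>

double logit_p1(double V1, double V2, double lambda) {
    return 1.0 / (1.0 + exp(-lambda * (V1 - V2)));
}

int main(void) {
    double lambdas[] = {0.1, 1.0, 10.0};
    for (int i = 0; i < 3; i++)
        printf("lambda=%5.1f  p_car=%.3f\n",
               lambdas[i], logit_p1(-3.4, -3.2, lambdas[i]));
    /* lambda=0.1 -> 0.495, lambda=1 -> 0.450, lambda=10 -> 0.119 */
    return 0;
}
\end{verbatim}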