Binary choice

$=$ choice between two options.

Systematic vs random component of utility

Option $A$, for example ``go swimming''.

Has systematic utility (that we compute): $V_A$.

Assume that (for whatever reason) there is also a random component: \(
U_A = V_A + \epsilon_A \ .
\) Choice is made according to $U_A$.

Possible interpretations:

Person making the choice is not deterministic.

Person making the choice is deterministic, but there are additional criteria (for example ``was swimming yesterday'') which are not included.

If they were included, then there would be no $\epsilon_A$ in this interpretation.

Choice based on random utilities

Now let us assume there are two options, $A$ (``go swimming'') and $B$ (``stay home'').

We assume that the option with the larger utility is selected (cf. Fig. 29.1):

\begin{displaymath}
Pr(A) = Pr( U_A > U_B) = Pr( V_A + \epsilon_A > V_B + \epsilon_B)
\end{displaymath} (29.2)



\begin{displaymath}
= Pr( \epsilon_B - \epsilon_A < V_A - V_B )
\end{displaymath} (29.3)

Figure: Two random distributions, centered around $\left\langle U_A\right\rangle =3$ and $\left\langle U_B\right\rangle =9$. Normally, option B will win because it has the higher utility, but there is a finite probability that $U_B$ comes out really low and $U_A$ really high, in which case A wins.
\includegraphics[width=0.8\hsize]{overlap-gpl.eps}
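This can be checked by simulation. A minimal Python sketch, assuming Gaussian noise (the width $\sigma = 2$ is an arbitrary choice) and the mean utilities from the figure:

\begin{verbatim}
import random

# Monte Carlo check of Eqs. (29.2)/(29.3): draw the random utilities and
# count how often option A wins.  The noise width sigma = 2 is an
# arbitrary assumption; <U_A> = 3 and <U_B> = 9 are from Fig. 29.1.
V_A, V_B, sigma = 3.0, 9.0, 2.0
n, wins_A = 100000, 0
for _ in range(n):
    U_A = V_A + random.gauss(0.0, sigma)
    U_B = V_B + random.gauss(0.0, sigma)
    if U_A > U_B:
        wins_A += 1
print("Pr(A) ~", wins_A / n)   # small, but not zero
\end{verbatim}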

Linear decomposition of systematic part of utility

Assume that $V_A$, $V_B$ are linear in contributions:

\begin{displaymath}
V_A = \beta_1 \, x_{A,1} + \beta_2 \, x_{A,2} + ...
= \vec\beta \cdot \vec x_{A}
\end{displaymath} (29.4)

and similarly
\begin{displaymath}
V_B = ... = \vec\beta \cdot \vec x_{B} \ .
\end{displaymath} (29.5)

In principle, the $x_{X,i}$ can be arbitrary functions. In practice, they are usually simple transformations of basic variables, e.g. time, or distance, or distance squared.
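As a minimal sketch of Eq. (29.4): the systematic utility is just a dot product. The coefficient and attribute values below are illustrative only:

\begin{verbatim}
# Systematic utility as a linear combination (Eq. 29.4); the coefficient
# and attribute values are illustrative only.
beta = [-0.1, -0.012]    # e.g. coefficients for time and cost
x_A  = [10.0, 200.0]     # attribute values of option A
V_A  = sum(b * x for b, x in zip(beta, x_A))
print(V_A)               # -0.1*10 - 0.012*200 = -3.4
\end{verbatim}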

Simple example

A result from discrete choice modeling often looks like this:

\begin{displaymath}
\begin{tabular}{\vert c\vert c\vert c\vert}
\hline
Car & Bus & Coeff \\
\hline
1 & 0 & -1.4 \\
time with car[min] & time with bus[min] & -0.1 \\
cost with car[cent] & cost with bus[cent] & -0.012 \\
\hline
\end{tabular}\end{displaymath} (29.6)

Interpretation: Systematic utility with car is
\begin{displaymath}
V_{car} = -1.4 - \frac{0.1}{min} \times \hbox{time w/ car}
- \frac{0.012}{cents} \times \hbox{cost w/ car} \ ;
\end{displaymath} (29.7)

systematic utility with bus is
\begin{displaymath}
V_{bus} = 0 - \frac{0.1}{min} \times \hbox{time w/ bus}
- \frac{0.012}{cents} \times \hbox{cost w/ bus} \ .
\end{displaymath} (29.8)

(Compare the departure time example; here, however, there are only two options.)

For example: Time with car 10min; with bus 20min. Cost with car 200cents; with bus 100cents. Then

\begin{displaymath}
V_{car} = -1.4 - 1 - 2.4 = - 4.8 \ ;
\end{displaymath} (29.9)


\begin{displaymath}
V_{bus} = 0 - 2 - 1.2 = - 3.2 \ .
\end{displaymath} (29.10)

The probabilities to select car/bus (see later) will be something like

\begin{displaymath}
P_{car} = \frac{e^{V_{car}}}{e^{V_{car}} + e^{V_{bus}}} \ .
\end{displaymath} (29.11)


\begin{displaymath}
P_{bus} = \frac{e^{V_{bus}}}{e^{V_{car}} + e^{V_{bus}}} \ .
\end{displaymath} (29.12)
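The arithmetic of Eqs. (29.9)-(29.12) can be reproduced in a few lines of Python:

\begin{verbatim}
import math

# Reproduce the worked example, Eqs. (29.7)-(29.12).
t_car, t_bus = 10.0, 20.0     # travel times [min]
c_car, c_bus = 200.0, 100.0   # costs [cent]

V_car = -1.4 - 0.1 * t_car - 0.012 * c_car   # = -4.8
V_bus =  0.0 - 0.1 * t_bus - 0.012 * c_bus   # = -3.2

denom = math.exp(V_car) + math.exp(V_bus)
print(math.exp(V_car) / denom)   # P_car ~ 0.17
print(math.exp(V_bus) / denom)   # P_bus ~ 0.83
\end{verbatim}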

2nd example

\begin{displaymath}
\begin{tabular}{\vert c\vert c\vert c\vert}
\hline
Car & Bus & Coeff \\
\hline
1 & 0 & -1.4 \\
time with car[min] & time with bus[min] & -0.1 \\
cost with car[cent] & cost with bus[cent] & -0.012 \\
1 if female & 0 & 0.6 \\
1 if (unmarried OR spouse cannot drive OR travels to work w/ spouse) & 0 & -0.2 \\
1 if (married AND spouse is working AND spouse drives to work indep'y) & 0 & 1.2 \\
\hline
\end{tabular}\end{displaymath}

Meanings:

If person is female, utility of car is increased.

If person is unmarried OR if spouse cannot drive OR if person travels to work with spouse, then utility of car is decreased.

Etc.
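A minimal sketch of how such dummy variables enter the car utility; all attribute values of the hypothetical person below are invented for illustration:

\begin{verbatim}
# V_car for one hypothetical person, using the coefficients of the
# second example; all attribute values are invented for illustration.
t_car, c_car  = 10.0, 200.0   # [min], [cent]
female        = 1             # 1 if female
constrained   = 0             # 1 if unmarried OR spouse cannot drive OR
                              #   travels to work w/ spouse
spouse_drives = 1             # 1 if married AND spouse working AND
                              #   spouse drives to work independently

V_car = (-1.4 - 0.1 * t_car - 0.012 * c_car
         + 0.6 * female - 0.2 * constrained + 1.2 * spouse_drives)
print(V_car)                  # -4.8 + 0.6 + 1.2 = -3.0
\end{verbatim}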

Probability distributions, generating functions, etc.

From this point on, progress is made by making assumptions about the statistical distributions of the noise parameters $\epsilon_i$. Different assumptions will lead to different models.

Before looking into some specific forms, it makes sense to quickly recall probability distributions and generating functions.

A probability density function essentially gives the probability that a certain option is selected. For example, the Gaussian probability density function

\begin{displaymath}
f(x)
= \frac{1}{\sqrt{2\pi} \, \sigma}
\, \exp\left( - \frac{1}{2} \, \left(\frac{x}{\sigma}\right)^2 \right)
\end{displaymath} (29.13)

gives the probability that option $x$ is selected. More precisely, one would have to say that

\begin{displaymath}
\int_x^{x+\Delta x} f(x') \, dx'
\end{displaymath}

is the probability that anything between $x$ and $x + \Delta x$ is selected.

The generating function $F(x)$ is the integral of the probability density function. That is

\begin{displaymath}
f(x) = F'(x) \ .
\end{displaymath}

In some cases, the generating function is simpler than the probability density function.

The generating function can be used to compute the probability that the selected value is smaller than some given value $X$. Rather obviously, one has

\begin{displaymath}
Pr(x < X) = \int_{-\infty}^X f(x') \, dx'
= F(X) - F(-\infty) \ .
\end{displaymath}
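A quick numerical check of this relation for the Gaussian of Eq. (29.13), comparing a Riemann sum over $f$ with the closed form via the error function (both $\sigma$ and $X$ are arbitrary choices):

\begin{verbatim}
import math

# Numerical check of Pr(x < X) = F(X) - F(-inf) for the Gaussian of
# Eq. (29.13).  sigma and X are arbitrary choices.
sigma, X = 1.0, 0.5

def f(x):
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

dx, lo = 1e-3, -10.0          # -10 stands in for -infinity
riemann = sum(f(lo + i * dx) * dx for i in range(int((X - lo) / dx)))
closed  = 0.5 * (1.0 + math.erf(X / (sigma * math.sqrt(2.0))))
print(riemann, closed)        # both ~ 0.6915
\end{verbatim}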

Binary probit (randomness is Gaussian)

Recall: We have

\begin{displaymath}
Pr(A) = Pr( U_A > U_B) = Pr( \epsilon_B - \epsilon_A < V_A - V_B ) \ .
\end{displaymath} (29.14)

We are now looking for mathematical forms of $Pr(A)$.

Assume that $\epsilon_A$ and $\epsilon_B$ are Gaussian distributed.

Gaussian distributions have the property that sums/differences of Gaussian distributed variables are still Gaussian distributed. In consequence, $\epsilon := \epsilon_B - \epsilon_A$ is Gaussian distributed, for example (with mean zero and ``width'' $\sigma$):

\begin{displaymath}
f(\epsilon)
= \frac{1}{\sqrt{2\pi} \, \sigma}
\, \exp\left( - \frac{1}{2} \, \left(\frac{\epsilon}{\sigma}\right)^2 \right) \ .
\end{displaymath} (29.15)

See Fig. 29.2 (top).

Figure: TOP: Gaussian distribution. BOTTOM: Error function ``erf'', giving the probability that a random variable is larger than $x$.
\includegraphics[height=0.4\hsize,width=0.6\hsize]{gz/gauss.eps.gz} \includegraphics[height=0.4\hsize,width=0.6\hsize]{erf-gpl.eps}

Now we need $Pr( \epsilon < C )$, where $C := V_A - V_B$, and we know that $\epsilon$ is normally distributed. As equation:

\begin{displaymath}
Pr( \epsilon < C ) = \frac{1}{\sqrt{2\pi} \, \sigma}
\, \int_{-\infty}^{C} d\epsilon
\, \exp\left( - \frac{1}{2} \, \left(\frac{\epsilon}{\sigma}\right)^2 \right) \ .
\end{displaymath} (29.16)

See Fig. 29.2 (bottom).

The solution of this needs the so-called error function, sometimes denoted by erf (available, for example, as double erf(double x) in the C math library under Linux). Before the age of electronic computers, the error function was inconvenient to use, which is why the main theoretical development followed a different path, described in the following.
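A minimal binary probit sketch using erf from Python's math module; $\sigma = 1$ is an assumption, and the utilities are taken from the car/bus example above:

\begin{verbatim}
import math

# Binary probit sketch: eps = eps_B - eps_A is Gaussian with width
# sigma, so Pr(A) = Pr(eps < V_A - V_B) is a Gaussian CDF, expressible
# via erf.  sigma = 1 is an assumption; the utilities are taken from
# the car/bus example above.
def probit_prob_A(V_A, V_B, sigma=1.0):
    C = V_A - V_B
    return 0.5 * (1.0 + math.erf(C / (sigma * math.sqrt(2.0))))

print(probit_prob_A(-4.8, -3.2))   # ~ 0.055: car is rarely chosen
\end{verbatim}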

An important piece of knowledge is what happens when random variables are combined. For example, the sum of two Gaussian-distributed random variables is again Gaussian-distributed.

Gumbel distribution

As preparation, learn about the so-called Gumbel distribution:

Generating function

\begin{displaymath}
F(\epsilon) = \exp[ - e^{ - \mu \, (\epsilon - \eta) } ] \ .
\end{displaymath} (29.17)

Probability density function

\begin{displaymath}
f(\epsilon) = F'(\epsilon) = \mu \, e^{- \mu \, (\epsilon - \eta)}
\, \exp[ - e^{- \mu \, (\epsilon - \eta)} ] \ .
\end{displaymath} (29.18)

Location of maximum: $\eta$ (location parameter).

Variance: $\frac{\pi^2}{6 \mu^2} \sim \frac{1}{\mu^2}$ ($\mu =$ width parameter).
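The Gumbel distribution is easy to sample by inverting Eq. (29.17); a minimal sketch that also checks the variance formula (the parameter values are arbitrary):

\begin{verbatim}
import math, random

# Sample from the Gumbel distribution by inverting Eq. (29.17):
# u = F(eps)  =>  eps = eta - ln(-ln u) / mu.  Parameter values are
# arbitrary; the printout checks the variance formula.
def gumbel(eta, mu):
    return eta - math.log(-math.log(random.random())) / mu

eta, mu, n = 2.0, 1.5, 200000
xs = [gumbel(eta, mu) for _ in range(n)]
mean = sum(xs) / n
var  = sum((x - mean) ** 2 for x in xs) / n
print(var, math.pi ** 2 / (6.0 * mu ** 2))   # should roughly agree
\end{verbatim}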

Combination of Gumbel-distributed variables

(Remember: The sum of two independent Gaussian random variables $\leadsto$ new Gaussian random variable, whose mean and variance are the sums of the individual means and variances.)

For Gumbel:

If $\epsilon_1$ and $\epsilon_2$ indep Gumbel with same $\mu$, then $\max(\epsilon_1,\epsilon_2)$ also Gumbel-distributed with the same $\mu$ and a new $\eta$ of

\begin{displaymath}
\mu^{-1} \, \ln[ e^{\mu \eta_1} + e^{\mu \eta_2} ] \ .
\end{displaymath} (29.19)

If $\epsilon_1$ and $\epsilon_2$ indep Gumbel with same $\mu$, then $\epsilon = \epsilon_1 - \epsilon_2$ is logistically distributed (see below) with generating function

\begin{displaymath}
F(\epsilon) = \frac{1}{1 + e^{\mu \, (\eta_2 - \eta_1 - \epsilon)}} \ .
\end{displaymath} (29.20)
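Both combination rules can be verified by simulation; a sketch with arbitrary $\eta_1$, $\eta_2$, $\mu$, reusing the inverse-transform sampler from above:

\begin{verbatim}
import math, random

# Monte Carlo check of both combination rules; eta_1, eta_2, mu are
# arbitrary.  Reuses the inverse-transform sampler from above.
def gumbel(eta, mu):
    return eta - math.log(-math.log(random.random())) / mu

eta1, eta2, mu, n = 1.0, 2.0, 1.5, 200000
eta_max = math.log(math.exp(mu * eta1) + math.exp(mu * eta2)) / mu  # Eq. (29.19)

hits_max = 0   # count max(e1,e2) < eta_max; expect F(eta_max) = exp(-1)
hits_dif = 0   # count e1 - e2 < 0; expect Eq. (29.20) at eps = 0
for _ in range(n):
    e1, e2 = gumbel(eta1, mu), gumbel(eta2, mu)
    hits_max += max(e1, e2) < eta_max
    hits_dif += (e1 - e2) < 0.0
print(hits_max / n, math.exp(-1.0))
print(hits_dif / n, 1.0 / (1.0 + math.exp(mu * (eta2 - eta1))))
\end{verbatim}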

Logistic distribution

Generating function: \(
F(\epsilon) = \frac{1}{1 + e^{-\mu \, \epsilon}} \ .
\) Note that

\begin{displaymath}
F(-\infty) = \frac{1}{1 + e^{\infty}}
= \frac{1}{\infty} = 0 \ ; \ \
F(+\infty) = \frac{1}{1+e^{-\infty}} = 1 \ ,
\end{displaymath} (29.21)

as it should be for a generating function.

Probability density function: \(
f(\epsilon) = \frac{\mu \, e^{-\mu \, \epsilon}}{(1+e^{-\mu \, \epsilon})^2} \ .
\) The logistic probability density function looks somewhat similar to the Gaussian probability density function (Fig. 29.3). $\mu$ is the width parameter.

Figure 29.3: Logistic distribution vs. Gaussian distribution, TOP: linear y-axis, BOTTOM: logarithmic y-axis. The logistic distribution is more pointed at its maximum, but has fatter tails (i.e. towards small/large $x$).
\includegraphics[width=0.6\hsize]{logistic-vs-gauss-gpl.eps} \includegraphics[width=0.6\hsize]{logistic-vs-gauss-logscale-gpl.eps}

Binary logit (randomness is Gumbel distributed)

Coming back to binary choice, one now assumes that $\epsilon_A$ and $\epsilon_B$ are Gumbel distributed, meaning that $\epsilon =
\epsilon_B - \epsilon_A$ is logistically distributed.

Again, find $Pr( \epsilon < C )$. This is

\begin{displaymath}
\int_{-\infty}^C \, f(\epsilon) \, d\epsilon
= F(C) - F(-\infty)
= \frac{1}{1 + e^{-\mu \, C}} \ .
\end{displaymath} (29.22)

If we re-translate this into our original variables, we obtain \(
Pr(A) = \frac{1}{1 + e^{-\mu \, V_A + \mu \, V_B}}
= \frac{e^{\mu \, V_A}}{e^{\mu \, V_A} + e^{\mu \, V_B}} \ .
\)

This is similar to what we have seen in the departure time choice (except that here there are only two options; for departure time choice we had many).

Note that the noise parameter $\mu$ comes from the width parameter of the logistic distribution. Large noise $=$ small $\mu$ ($=$ small inverse temperature) $=$ choice more random.
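A small sketch of this last point, using the car/bus utilities from the earlier example:

\begin{verbatim}
import math

# Effect of mu on the binary logit choice probability, using the
# car/bus utilities from the earlier example.
def logit_prob_A(V_A, V_B, mu):
    return 1.0 / (1.0 + math.exp(-mu * (V_A - V_B)))

for mu in (0.1, 1.0, 10.0):
    print(mu, logit_prob_A(-4.8, -3.2, mu))
# mu -> 0: Pr -> 1/2 (pure noise); mu large: the better option (bus)
# wins almost surely.
\end{verbatim}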

