# An Exercise in Equity and Efficiency of a Lottery

As an Associate Fellow with the Office of Evaluation Sciences, I’ve had the opportunity to collaborate on a series of projects linked to the distribution and impact of pandemic relief for small businesses. One such project that some brilliant researchers on our team spearheaded centered on the distributional consequences on different financial award disbursement mechanisms.

To do this, one might anticipate that these researchers did a standard
regression analysis to assess how business, and business owner,
characteristics led to disparities in access. However, this approach is
limited in one key respect: *endogeneity.* It is impossible to know
whether unobserved factors (factors that just couldn’t be measured
because of limited availability of data) were the true cause of observed
disparities.

So, the researchers did something that a number of other scholars have
done in addressing similar questions related to a government’s or
organization’s disbursement of financial awards. They *simulated* how
different disbursement mechanisms affected the distribution of awards
and summarized any systematic disparities in awards received that
emerged.

I won’t summarize all their findings here, but I encourage anyone
interested to read the full
report.
Instead, I want to use this post as an opportunity to flesh out a much
more stylized and abstract exercise in equity that not only speaks to
the question of award disbursement, but also to the effect disbursement
mechanisms have on awardee outcomes: namely, *survival*, which is linked
more generally to the issue of *efficiency*.

This issue is something that is difficult to tease out in practice. We might imagine how certain disparities in awards received affect outcomes, but measurement poses a severe impediment.

This is a situation where mathematical modeling can help. By developing a stylized model we can imagine counterfactuals that can’t be observed in an empirical setting.

Below, I introduce such a model, describe alternative lottery mechanisms, and illustrate how to do this with some R code.

## A Counterfactual Model of Small Business Survival

I begin by introducing the model. Suppose a population of *N* small
businesses indexed *i* = 1, …, *N*. And, suppose there is some
organization (government or private, it doesn’t matter for this
exercise) that has *M* awards it has to decide how to allocate among the
*N* businesses.

Each business has a set of counterfactual outcomes, namely, their
likelihood of survival with and without an award. If *i* receives an
award, its probability of survival is *π*_{i}^{a},
while its probability of survival is
*π*_{i}^{n} ≤ *π*_{i}^{a}
without an award. We say *i*’s revealed probability of survival given
whether it does or does not receive an award, denoted
*A*_{i} ∈ {0, 1}, is

*π*_{i} = *A*_{i}*π*_{i}^{a} + (1 − *A*_{i})*π*_{i}^{n}.

Now, the organization tasked with distributing awards has a problem. The
number of possible awardees exceeds the number of available awards
(*M* < *N*). Because of this scarcity, some businesses will receive
an award while others will have to be excluded. Clearly, it would be
best if all businesses receive an award since each business does at
least as well or better with an award than without.

The question is, *how to choose who does and doesn’t get an award?*

At the heart of this question is the tension between *equity* and
*efficiency*. On the one hand, it is desirable that all businesses have
an equal opportunity to receive a financial award. On the other hand,
some businesses may benefit more from receiving an award than others.
Thus, an equitable lottery may not produce the best net social benefit.

This is a weighty question, and I make no pretense to know what the morally right decision to make is. I also don’t think this is an appropriate end of mathematical modeling.

Rather, I think this exercise is best suited to clarifying the trade-offs between equity and efficiency.

In particular, consider four set-ups. First, suppose the organization
giving out awards decides to implement a lottery. However, in designing
this lottery, it may choose whether to institute a *fair* lottery where
each business has an equal likelihood of getting an award, or else the
organization may choose to implement one of three *weighted* lotteries
where businesses receive differential likelihoods of being awarded
depending on some set of criteria. For one lottery, the criterion is
based on how much a business benefits from an
award:*π*_{i}^{a} − *π*_{i}^{n}.
The next assigns weights based on the anticipated likelihood of survival
with an ward: *π*_{i}^{a}. The final weighted
lottery is based on the likelihood of business failure without an award:
1 − *π*_{i}^{n}.

In reality, it’s unlikely that the organization disbursing awards has perfect information about these survival probabilities. But, for this example we’ll pretend as if it does, if only to make stark the distributional and survival impact of the alternative lotteries.

For the fair lottery, each business’s probability of receiving an award is simply:

*α*_{i}^{fair} = *M*/*N*.

This is because there are *M* available awards and *N* total businesses.
Say *N* = 100 and *M* = 5. If the lottery were to be repeated
indefinitely, each business would get an award in 0.05 proportion of
lotteries.

For each of the weighted lotteries, weights are assigned that are proportional to the criteria defined above. Respectively:

- selection on improvement (SOI):
*ω*_{i}∝*π*_{i}^{a}−*π*_{i}^{n}; - selection on survival (SOS):
*ω*_{i}∝*π*_{i}^{a}; - selection on failure (SOF):
*ω*_{i}∝ 1 −*π*_{i}^{n}.

For each set of weights, the probability of an award is then defined as

*α*_{i} = *M*(*ω*_{i}/*Σ*_{i}*ω*).

Now, there is a slight technical problem with the above. This
formulation is simply the weighted average of awards received per
business in successive lotteries. While these are constructed such that
no more than *M* awards will ever be disbursed, it is possible in theory
for some businesses to receive more than one award. We could take a
number of approaches to putting a ceiling on the weights to avoid this
happening, but in the simulation to follow this isn’t an issue.[1]

Now, with these lotteries defined, in the next section I illustrate the
implications of these alternative approaches. I do so with a special eye
towards not only efficiency vs. equity, but also disproportional
benefits to *underserved* vs. *non-underserved* businesses. The latter
are not explicit categories in the model but are nonetheless real-world
characteristics of businesses that we might imagine (and that do) affect
survival with and without financial support. Minority, women, veteran,
and disabled owned businesses have been highlighted as particularly
susceptible to negative shocks induced by the Covid-19 pandemic. For
this reason, ensuring such businesses have access to pandemic relief has
been a concern for both researchers and policymakers. While the highly
stylized model presented here does not explicitly factor in such
back-ground characteristics, we can easily imagine that businesses with
a lower probability of survival without a loan are analogous to
underserved businesses. By summarizing how different lotteries
systematically benefit or neglect these businesses we can gain
intuitions for how underserved status might be associated with
disproportionate inclusion or exclusion from financial support in
applied settings.

## A Simulation

With the model defined, let’s take it for spin. We can easily program a simulation in R to see how different lotteries affect both small business survival and the distributional consequences thereof.

I’ll need the following packages:

```
library(seerrr)
library(estimatr)
library(tidyverse)
library(kableExtra)
```

Next, to make life easier, I define a helper function that returns the expected survival of a business given its probability of getting an award and its probability of survival with and without said award.

```
# Define a function to report expected survival:
p <- function(a, pa, pn) {
a * pa + (1 - a) * pn
}
```

I now define the data-generating process. To keep things simple, suppose
*N* = 100 and *M* = 15. Further, assume that probability of survival
without financing follows a uniform distribution

*π*_{i}^{n} ∼ 𝒰(0, 1),

and that survival with a loan is

*π*_{i}^{a} ∼ min [*π*_{i}^{n}+𝒰(0,1),1].

We can specify this in R as follows. Using `simulate`

from the `seerrr`

package, we’ll choose iterate the data-generating process 100 times (we
could do more if we wanted):

```
sim <- simulate(
R = 100, # No. of iterations
N = 100, # No. of businesses
M = 15, # No. of awards
# Prob. of survival without an award
pn = runif(N),
# Prob. of survival with an award
pa = sapply(pn + runif(N),
function(x) min(x, 1)),
# Probabilities under alternative lotteries
a_fair = M / N,
a_soi = (pa - pn) / sum(pa - pn) * M,
a_sos = pa / sum(pa) * M,
a_sof = (1 - pn) / sum(1 - pn) * M,
# Expected outcomes
p_fair = p(a_fair, pa, pn),
p_soi = p(a_soi, pa, pn),
p_sos = p(a_sos, pa, pn),
p_sof = p(a_sof, pa, pn)
) %>% bind_rows()
```

The above produces 100 different datasets simulated from the data generating process and then stacks them on top of each other.

First, let’s explore the results with respect to the distributional consequences of alternative lotteries. To do this, I do some data cleaning and reshaping:

```
sim %>%
select(contains("a_"), pn) %>%
pivot_longer(
!pn, names_to = "lotto", names_prefix = "a_"
) %>%
mutate(
lotto = toupper(lotto),
sim = rep(1:100, each = n() / 100)
) -> plot_smry
```

I then plot for each lottery the probability of receiving an award over the probability of survival without an award. I’ve also layered on LOESS lines grouped by iteration.

```
ggplot(plot_smry) +
aes(
x = pn,
y = value,
group = sim
) +
geom_point(
col = "grey",
alpha = 0.1
) +
stat_smooth(
geom = "line",
alpha = 0.2
) +
facet_wrap(
~ lotto
) +
theme_bw() +
labs(
x = "Survival without Support",
y = "Pr(Award)",
caption = "Results after 100 Simulations"
)
```

As we would expect, the fair lottery gives each business the same probability of receiving financing regardless of its characteristics (in this case, survival probability without an award). We do, however, see disparities across the other weighted lotteries.

The lottery that selects on survival with an ward (SOS) disproportionately benefits businesses that also happen to be better off without an award. Conversely, the lottery that weights more toward businesses that are likely to fail without an award (SOF), unsurprisingly, disproportionately benefits businesses with a lower probability of survival without an award. Meanwhile, the lottery that gives greatest weight to businesses that would most benefit from an award (SOI) also advantages businesses with a lower probability of survival without an award, but the results are less stark than with SOF.

The fact that SOS clearly disadvantages businesses that are worse off without an award is a point worth returning to. As the OES report highlights, in practice many lenders took a similar approach in disbursing financing. It was believed that weighting on survival with an award was the most efficient approach given the limited resources available to organizations. This approach, unfortunately, came at the expense of underserved businesses.

*Was this intuition correct? Was weighting in favor of businesses deemed
most likely to survive with an award the most efficient approach?*

To see how the different lotteries affected survival, I first clean the data as follows:

```
sim %>%
select(sim, ID, contains("p_")) %>%
pivot_longer(
cols = starts_with("p_"),
names_to = "Lottery",
names_prefix = "p_"
) %>%
mutate(
clust_var = paste0(ID, sim, sep = "_")
) -> out_df
```

The above puts the data in a convenient shape for regression analysis. In particular, we can estimate how individual businesses did under each of the weighted lotteries relative to the fair lottery. Do do this I estimate a regression model where the outcome is survival probability under different lotteries, the predictors of interest are indicators for each of the weighted lotteries, and where we control for business ID and simulation iteration to ensure estimates reflect within business, within simulation differences. Finally, standard errors are clustered by business-simulation.

```
diff_by_lotto <-
lm_robust(
value ~ Lottery,
fixed_effects = ~ ID + sim,
clusters = clust_var,
data = out_df,
se_type = "stata"
)
```

The results can be summarized as in the below table:

```
tidy(diff_by_lotto) %>%
select(
term, estimate, conf.low, conf.high
) %>%
mutate_if(
is.numeric,
~ round(.x * 100, 3)
) %>%
mutate(
CI = paste0("[", conf.low,", ", conf.high, "]"),
term = str_remove(term, "Lottery") %>%
toupper()
) %>%
select(-conf.low, -conf.high) %>%
kable("markdown",
col.names = c("Lotto", "Estimate", "95% CI"))
```

Lotto | Estimate | 95% CI |
---|---|---|

SOF | 1.211 | [1.146, 1.275] |

SOI | 2.447 | [2.34, 2.554] |

SOS | 0.261 | [0.238, 0.285] |

The point estimates reflect the percentage point increase in the likelihood of survival under one of the weighted lotteries relative to the fair lottery. As it turns out, each of the weighted lotteries is more efficient than the fair lottery. This partly confirms the intuition that weighting in favor of businesses that are most likely to survive with an award is more efficient than a fair, unweighted lottery.

However, among the weighted lotteries SOS yields the least improvement. In fact, the best strategy is SOI followed by SOF. What is interesting about these results is not only that each of those lotteries is more efficient than favoring businesses that are most likely to survive with financing, but also that each is weighted in favor of underserved businesses.

Let me repeat, *weighting in favor of businesses that are most likely to
survive with a loan* **is not** *the most efficient solution*. On
average, business survival is most improved by lotteries that also
disproportionately benefit disadvantaged businesses.

That’s an interesting finding, and one that probably deserves more study than what I can muster in a single blog post.

[1] It’s always good to save a little something for another day I suppose.