As a young researcher focused on biological and data sciences, I want to use my skills in data analysis to advocate for an urgent and underappreciated cause: animal advocacy. My commitment to veganism began when I went vegetarian my freshman year of college; it was the first time in my life that I had given any thought to the food I put on my plate. Though I made the change for environmental reasons, it began a larger shift in my worldview: while I had always been unsettled by animal cruelty, this was when I started to think critically about food.
Within a handful of months, I was vegan. I have come to view animal rights as an intersectional issue. As more animals suffer on factory farms, the industry consolidates to a handful of large companies. In turn, these companies have the power to influence policy. This industrial complex has caused not only more animal suffering, but also dangerous work conditions, pollution, public health crises, and food insecurity, all of which disproportionately impact poor communities. I believe that effective animal advocacy is not separate from these issues; addressing them in tandem will generate a strong foundation for sincere advocacy.
I have only begun my advocacy work, but believe I am poised to do more. As an undergraduate, I conducted independent research on animal agriculture. As I learned skills in data analysis and hypothesis testing, I turned them towards animal advocacy. I have highlighted my work on my website, including the finding that industrial animal agriculture disproportionately pollutes poor communities. I also joined a housing co-op that cooks communal vegan meals nightly. I have found that in my social circles, the simple act of cooking a vegan meal with friends can go a very long way. I believe that by combining my professional aspirations as a data scientist with my vegan convictions and lifestyle, I can help advance this important cause.
I estimate that 1002.84 ± 11.09 purchases will be made in the fifth region.
To reach this estimate, I assume that the relationships between advertising spending (for both TV and radio) and product purchases are linear. I make this assumption because there are not enough data to reliably model alternative distributions. Of the two predictors, radio yields the best-fitting linear model (R^2 = 0.9262). By using a linear model I also assume that the market has not reached advertising saturation: each advertising dollar grants the product additional exposure. Lastly, I assume that there is no interaction between TV and radio spending.
Under these assumptions, I modeled TV spending, radio spending, and both types of spending at once as predictors of sales using linear regressions; a multiple regression was used for the combined model. The best-fitting model was used for the final estimate.
It seems that the linear model for radio alone is the strongest predictor of sales, with the greatest statistical significance (p = 0.037) and the lowest standard error (11.09). Using radio advertising spending to model product sales gives an expected 1002.84 ± 11.09 sales. As it is unlikely that TV advertising has zero impact on sales, this estimate is almost certainly flawed. With more data, a model capable of reliably estimating the effects of both variables could provide a more accurate sales estimate.
# bbmle provides AICtab() for comparing models by AIC
library(bbmle)
## Loading required package: stats4
# Make vectors from data
dollarsTV <- c(2,5,6,8)
dollarsRadio <- c(3,4,9,12)
purchasesCount <- c(271,440,735,787)
# Make linear models between spending and purchases
m.radio <- lm(purchasesCount~dollarsRadio) #radio
m.TV <- lm(purchasesCount~dollarsTV) #TV
m.both <- lm(purchasesCount ~ (dollarsTV + dollarsRadio)) #both
# Create statistical summaries of these relationships
stats.radio <- summary(m.radio)
stats.TV <- summary(m.TV)
stats.both <- summary(m.both)
# Lowest standard error - radio
radioError <- stats.radio$coefficients[2,2]
radioError
## [1] 11.09093
# Based on p-values, R2, and standard error, radio alone is the best fit.
# Compare models to estimate weights
m.compare <- AICtab(m.both, m.TV, m.radio, weights = TRUE)
# Estimate sales based on $10K TV and $15K radio spending
# y = mx + b, x = 15
estimate <- coef(m.radio)[2]*15 + coef(m.radio)[1]
estimate
## dollarsRadio
## 1002.843
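For reference, the p-value and R^2 quoted above are standard fields of the model summary and can be extracted directly:
# Slope p-value and R^2 for the radio-only model
stats.radio$coefficients[2,4] # p-value for the radio slope, ~0.0376
stats.radio$r.squared # R^2, ~0.9262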
The p-value is 0.401.
I tested the hypothesis that more than 20% of customers regularly bring their own tote bags using an upper tail test of population proportion. I used the sample size (4), sample proportion (0.25), and hypothesized value (0.2) to calculate a Z test statistic, which I then used to compute this p-value from the upper tail of the normal distribution.
# Set values
n <- 4 #sample size
pbar <- 1/n # sample proportion
p0 <- .2 # hypothesized value
# Get Z statistic, then p-value
z <- (pbar-p0)/sqrt(p0*(1-p0)/n) # test statistic
pvalue <- pnorm(z, lower.tail=FALSE) #pvalue
pvalue
## [1] 0.4012937
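As a sanity check that is not part of the analysis above: with only four customers, the normal approximation behind the Z test is rough, and an exact binomial test of the same hypothesis gives a considerably larger p-value, though the conclusion (no evidence for the claim) is unchanged:
# Exact upper tail binomial test: 1 of 4 customers, null proportion 0.2
binom.test(1, 4, p = 0.2, alternative = "greater")$p.value # 1 - 0.8^4 = 0.5904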
I believe that the growing human population is not a meaningful concern in addressing climate change, because the populations that are growing are not the ones driving climate change. While population growth is associated with an increase in global carbon emissions, energy sources are becoming more renewable, and innovations in agricultural science have allowed more people to be fed than ever. Furthermore, the wealthiest percentiles of the human population are responsible for a disproportionate share of CO2 emissions, as shown in “Extreme Carbon Inequality,” a 2015 Oxfam report. Given that the poorest 50% of the population contributes 10% of global CO2 emissions, and that the poorest countries have the most rapidly growing populations, it seems unconvincing that population growth poses as large a climate threat as the consumption of the top 10% of the global population, who contribute roughly 49% of global emissions.
Additionally, attempts at population control have historically targeted vulnerable populations. Rather than focusing climate change efforts on castigating poorer populations, we should aim to change the practices and policies of wealthy countries and industries.
A second thing I believe is that, as a general rule, addressing social issues with punitive measures rather than support systems largely perpetuates inequality. The policies that led to mass incarceration have sharpened divides of race and wealth in the United States. The lack of social services for individuals with mental illnesses likewise leads them to be disproportionately incarcerated, and similar criminalization policies have contributed to the opioid crisis that so strongly impacts Philadelphia. Turning over a new leaf, Philadelphia’s Mayor and DA are pursuing the opening of a Safe Injection Facility that will almost certainly save lives and mitigate harm.
The statistical power of your experiment is 55.47%.
In order to reach this value, I first determined the significance threshold. I did this by calculating a p-value using an upper tail test of population proportion in which the null hypothesis is that the coin is fair, for an experiment where 8 of 10 coin flips return heads. I then calculated power using this p-value as the significance threshold. The Cohen’s h effect size was calculated from the hypothesized proportion of 0.8 and the null proportion of 0.5.
# Set values
n <- 10 #sample size
pbar <- 8/n # sample proportion
p0 <- .5 # hypothesized value
z <- (pbar-p0)/sqrt(p0*(1-p0)/n) # test statistic
pvalue <- pnorm(z, lower.tail=FALSE) # alpha value
pvalue
## [1] 0.02888979
library(pwr)
# Calculate power using effect size, significance level, n, and an upper tail hypothesis
test <- pwr.p.test(h = ES.h(p1 = 0.8, p2 = 0.5),
                   sig.level = pvalue,
                   n = n,
                   alternative = "greater")
test$power
## [1] 0.5547069
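For reference, ES.h computes Cohen’s h, the arcsine-transformed difference between two proportions, so the effect size used above can be reproduced by hand:
# Cohen's h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))
2*asin(sqrt(0.8)) - 2*asin(sqrt(0.5)) # ~0.6435, matching ES.h(p1 = 0.8, p2 = 0.5)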
Using an upper tail test of population proportion in which the null hypothesis is that there is a 75% chance that any given customer will make a purchase, I calculated 0.28 as the p-value for this experiment.
While the probability of seeing this result if the store owner is incorrect is 0.28, the probability that the store owner is correct cannot be directly evaluated. Hypothesis testing is limited to determining the probability that a result would be seen if the null hypothesis (that there is up to a 75% chance that any given customer will make a purchase) were true, so the probability that the store owner’s claim is correct cannot be determined.
pbar <- 1 # sample proportion
p0 <- .75 # hypothesized value
n <- 1 # sample size
z <- (pbar-p0)/sqrt(p0*(1-p0)/n) # test statistic
pvalue <- pnorm(z, lower.tail=FALSE)
pvalue
## [1] 0.2818514
In my independent research and my motivation for farm animal advocacy, I make a big assumption: that people will care more about animal welfare if the prevalence of factory farming is addressed as an intersectional issue. To examine whether this approach is effective, I would like to test the hypothesis that a leaflet that advocates for animal welfare intersectionally is more effective at decreasing animal product consumption than no leaflet.
A leaflet would be created that discusses animal agriculture with respect to the inequalities exacerbated by the industry: the health of workers, the health of poorer communities, the prevalence of food insecurity, environmental racism, and the increasing hardships of independent farmers.
Participants would be recruited outside of dining halls at universities. After completing a food frequency questionnaire, they would be assigned to either the treatment group, which receives the leaflet, or the control group, which does not. Two months later, they would be contacted through the email address provided in the questionnaire and asked to complete the same questionnaire. Social desirability bias would be mitigated by not reminding participants of the leaflet in the second questionnaire.
A power analysis can determine the sample size necessary to detect the smallest effect size of interest, which should be small in the case of exploratory research such as this. I suggest using an alpha of 0.05, a Cohen’s h of 0.2, and 90% power. This analysis suggests a sample size of 215 per group, or 430 total, at follow-up. Assuming a conservatively estimated 24% email survey response rate, a total of 1,792 participants should be recruited at intake, 896 in each group.
This study would be pre-registered, and all data and materials would be made available online. Limitations include the self-reporting and recall biases inherent to the unvalidated food frequency questionnaire, as well as any remaining social desirability bias.
test <- pwr.p.test(h = .2,
                   sig.level = 0.05,
                   power = 0.90,
                   alternative = "greater")
test$n
## [1] 214.0962
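The intake numbers above follow from rounding the per-group sample size up and adjusting for the assumed 24% response rate:
# Adjust the follow-up sample size for a 24% email response rate
perGroup <- ceiling(test$n) # 215 per group at follow-up
intakePerGroup <- ceiling(perGroup / 0.24) # 896 per group at intake
2 * intakePerGroup # 1792 participants total at intake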
I would ask the researcher for the expected effect size used in the power analysis, as well as their justification for that estimate. If the expected effect size was greater than the actual effect size of the difference between the interventions, the study likely did not have enough subjects to detect a genuine relationship even though one could very well exist. If the research explores unstudied relationships, I would hope that a small effect size (like d = 0.2) was used to estimate power. On the other hand, if one treatment needs to be demonstrated to be a certain degree more effective than the other (e.g., if one treatment is more expensive and a company needs to show it is financially worthwhile), this should also be reflected in a power estimate conducted with a higher effect size.
While the p-value of 0.14 does not cross the defined threshold for statistical significance here and indicates that the results may be due to chance, it does not mean that one intervention is not preferable to the other. If one of the two interventions must be carried out, if the expected effect size was justified, if the two interventions have equal costs, and if one intervention showed possibly stronger efficacy, that intervention might as well be the one picked. If the effect size was not justified, the study should be reconducted entirely. If the researcher does not have the resources to conduct a new experiment, the existing results should be examined using a post-hoc power analysis. If the study is reconducted, the first completed study should also be reported.
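As a sketch of such a post-hoc power check (the group sizes and effect size here are hypothetical, not taken from the study in question), pwr can report how much power the comparison actually had:
# Hypothetical post-hoc power: two groups of 50 and a true effect of d = 0.2
library(pwr)
pwr.t.test(n = 50, d = 0.2, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")$power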
The importance of specifying a smallest effect size of interest is well described in the following paper:
Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701–710. https://doi.org/10.1002/ejsp.2023
The expected number of women in the largest group is ~15.075, resimulated below:
trial <- function(){
  # 100 people (50 women, 50 men) are randomly divided into 4 groups of 25
  # Returns the number of women in the group containing the most women
  women <- 50
  men <- 50
  # Create a randomly ordered sample of the women and men
  # Women are coded as 1 so that they can be summed easily
  random <- sample(c(rep(1, women), rep(0, men)))
  # Arrange into a matrix with 4 columns (one 25-person group per column),
  # count the women in each group with colSums, and take the largest count
  max(colSums(matrix(random, ncol=4, byrow=TRUE)))
}
# Conduct n simulations and average to estimate the expected largest-group count
n <- 50000 # arbitrarily specified
largestGroup <- mean(replicate(n, trial()))
largestGroup
## [1] 15.05892