---
title: "Statistical Inference and Power Analysis for Direct and Spillover Effects"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{interference_vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette shows how to use the functions for statistical inference and power analysis of direct and spillover effects in two-stage randomized experiments, using the JD data set as a motivating example.

## Study Design
In 2007, the ministry in charge of employment in France launched a public employment integration service contract for young graduates seeking employment. A randomized experiment was conducted to evaluate this job placement assistance program, and the methods in this package can be used to analyze the resulting data. The following examples focus on two specific outcomes: a fixed-term contract of six months or more (LTFC) and a permanent contract (PC).

## Data
The data set is a subset of the original JD data set and includes the following variables:

`anonale`: local employment agency

`tempsc_av`: full-time work (at time of assignment)

`assigned`: 1 if the individual is assigned to treatment, 0 otherwise

`pct0`: share of the local population treated

`cdi`: binary variable for whether the individual works on a permanent contract, 8 months after the assignment

`cdd6m`: binary variable for whether the individual works on a CDD (fixed-term contract) of more than 6 months, 8 months after the assignment

`emploidur`: binary variable for whether the individual works on a permanent contract or a fixed-term contract of more than 6 months, 8 months after the assignment

`tempsc`: binary variable for whether the individual works full time, 8 months after the assignment

`salaire`: individual's salary in Euros.

## Overview
The relevant functions for this analysis are the following:

1. `ZSRE`: returns a list of `Z`, the vector of the desired binary treatment assignment variable.

2. `YSRE`: returns a list of `Y`, the vector of the outcomes for a desired variable of interest.

3. `CalAPO`: returns a list of point estimates and variances for the average potential outcomes, unit-level direct effect, marginal direct effect, and unit-level spillover effect.

4. `Test2SRE`: returns the rejection region for the desired test. This function takes in the data, the effect type (i.e. direct effect, marginal direct effect, or spillover effect) and outputs the rejection region at the desired significance level.

5. `calpara`: returns a list of the estimated within-cluster variance, between-cluster variance, intra-class correlation coefficient, and average of the assignment vector, all of which are needed by the `Calsamplesize` function.


6. `Calsamplesize`: returns a list of the necessary total number of clusters in order to achieve a given power level at a given significance level for the three types of effects.

## Functions
First, load the `RCT2` package and the relevant data set.
```{r}
library(RCT2)
data(jd)
```


### CalAPO
To compute point estimates and variances for the effects of interest, use the `CalAPO` function. `CalAPO` takes a single data frame with four columns: `Z`, the individual treatment assignment; `A`, the treatment assignment mechanism, which depends on the study design; `Y`, the outcome of interest; and `id`, the cluster identifier. In this experiment, there are three treatment assignment mechanisms, with treatment probabilities of 25%, 50%, and 75%, respectively.

The example below builds this data frame for the long-term fixed contract outcome (`cdd6m`) and passes it to `CalAPO`. The estimated average potential outcomes for long-term fixed contracts are given by `Y.hat`. As stated in the paper, the output also contains the estimated direct effects under the three treatment mechanisms (`ADE.est`), the estimated marginal direct effect (`MDE.est`), and the estimated spillover effects (`ASE.est`), together with the estimated covariance matrices for the average potential outcomes, the direct effects, the marginal direct effect, and the spillover effects.

```{r}
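# Z = individual treatment assignment, A = assignment mechanism,
# Y = outcome (LTFC), id = cluster (local employment agency)
data_LTFC <- data.frame(jd$assigned, jd$pct0, jd$cdd6m, jd$anonale)
colnames(data_LTFC) <- c("Z", "A", "Y", "id")

# estimate once and reuse the result instead of calling CalAPO twice
apo_LTFC <- CalAPO(data_LTFC)
print(apo_LTFC)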
```

Similarly, we can run this on the permanent contracts.

```{r}
data_perm <- data.frame(jd$assigned, jd$pct0, jd$cdi, jd$anonale)
colnames(data_perm) <- c("Z", "A", "Y", "id")
CalAPO(data_perm)
```
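
To build intuition for what the direct-effect estimates represent, the following minimal sketch simulates a small two-stage design and computes a naive difference in means within each assignment mechanism. It uses only base R. The simulated data, the outcome model, and all `sim_` names are illustrative assumptions, and the estimator shown is a simplification rather than the one implemented in `CalAPO`, which should be used for actual analysis.

```{r}
# Illustrative sketch only: simulated data and a naive estimator,
# not the estimator implemented in CalAPO.
set.seed(1)

sim_J <- 60                        # number of clusters (hypothetical)
sim_n <- 50                        # individuals per cluster (hypothetical)
sim_probs <- c(0.25, 0.50, 0.75)   # treated share under each mechanism

# Stage 1: assign each cluster to one of the three assignment mechanisms.
sim_mech <- sample(rep(1:3, length.out = sim_J))

# Stage 2: within each cluster, treat the share implied by its mechanism.
sim_data <- do.call(rbind, lapply(seq_len(sim_J), function(j) {
  n_treated <- round(sim_n * sim_probs[sim_mech[j]])
  Z <- sample(rep(c(1, 0), c(n_treated, sim_n - n_treated)))
  # Hypothetical outcome model: baseline rate 0.2, direct effect 0.1
  Y <- rbinom(sim_n, 1, 0.2 + 0.1 * Z)
  data.frame(Z = Z, A = sim_mech[j], Y = Y, id = j)
}))

# Difference in means inside each cluster, then averaged by mechanism.
sim_cluster_diff <- sapply(split(sim_data, sim_data$id), function(d) {
  mean(d$Y[d$Z == 1]) - mean(d$Y[d$Z == 0])
})
sim_direct <- tapply(sim_cluster_diff, sim_mech, mean)
print(round(sim_direct, 3))
```

The simulated data frame has the same four columns (`Z`, `A`, `Y`, `id`) as the input built above, so it can also serve as a small test input when experimenting with the package functions.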


### Test2SRE
We can also perform hypothesis tests on these data with the `Test2SRE` function. `Test2SRE` takes the same data frame as before (with columns `Z`, `A`, `Y`, and `id`) and an additional argument, `effect`, which specifies the effect to test: `"ADE"` for the direct effect, `"MDE"` for the marginal direct effect, or `"ASE"` for the spillover effect. The function returns `TRUE` if the null hypothesis should be rejected and `FALSE` otherwise. The default significance level is 0.05, which can be changed through the `alpha` argument.

```{r}
Test2SRE(data_LTFC, effect="MDE", alpha=0.05)
```
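
As a rough illustration of how a scalar point estimate and its variance translate into a reject or do-not-reject decision, the sketch below applies a generic two-sided Wald-type z-test. The helper `sketch_wald_reject` and the numbers passed to it are hypothetical, and the test statistic actually used by `Test2SRE` may differ.

```{r}
# Illustrative sketch only: a generic two-sided Wald-type z-test.
# sketch_wald_reject is a hypothetical helper, not part of RCT2.
sketch_wald_reject <- function(estimate, variance, alpha = 0.05) {
  z_stat <- estimate / sqrt(variance)   # standardized estimate
  crit <- qnorm(1 - alpha / 2)          # two-sided critical value
  abs(z_stat) > crit                    # TRUE means reject H0: effect = 0
}

# Made-up inputs: estimate 0.04 with variance 0.0002
sketch_wald_reject(0.04, 0.0002)

# A smaller made-up estimate with the same variance
sketch_wald_reject(0.01, 0.0002)
```

Lowering `alpha` raises the critical value, so the same estimate needs a smaller variance to lead to rejection.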

### calpara and Calsamplesize
Lastly, we can calculate the sample size needed to achieve a given power at a given significance level. First, call the `calpara` function to estimate the parameters required for the sample size calculation, including the within-cluster and between-cluster variances and the intra-class correlation coefficient. The effect size and the assignment mechanism must also be specified based on the study design. In the sample size code later in this section, `mu` is the effect size and `qa` is the vector of probabilities of being assigned to each of the three assignment mechanisms.

```{r}
# calculate variances for permanent contract
var.perm <- calpara(data_perm)

# calculate variances for long term fixed contract
var.LTFC <- calpara(data_LTFC)
```

The elements of the output of `calpara` can be accessed as below. For example, to retrieve the total variance of the potential outcomes for the permanent contracts and long-term fixed contracts, the following code can be run:


```{r}
sigma.perm <- var.perm$sigma.tot
sigma.LTFC <- var.LTFC$sigma.tot
print(sigma.perm)
```
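
For readers unfamiliar with these quantities, the sketch below estimates within-cluster variance, between-cluster variance, and the intra-class correlation coefficient on simulated, balanced clusters using a standard one-way variance decomposition. All `icc_` names and the simulated values are assumptions made for illustration, and this is not necessarily the computation performed by `calpara`.

```{r}
# Illustrative sketch only: one-way variance components on simulated,
# balanced clusters. Not the computation performed by calpara.
set.seed(2)
icc_J <- 200                                  # number of clusters (hypothetical)
icc_n <- 20                                   # cluster size (hypothetical)
icc_id <- rep(seq_len(icc_J), each = icc_n)
icc_u <- rnorm(icc_J, sd = 1)                 # cluster effects, variance 1
icc_y <- icc_u[icc_id] + rnorm(icc_J * icc_n, sd = 2)   # noise variance 4

icc_means <- tapply(icc_y, icc_id, mean)

# Within-cluster variance: average of the per-cluster sample variances.
icc_within <- mean(tapply(icc_y, icc_id, var))

# Between-cluster variance: variance of the cluster means minus the part
# due to within-cluster noise, truncated at zero.
icc_between <- max(var(icc_means) - icc_within / icc_n, 0)

# Intra-class correlation: share of total variance due to clusters.
icc_hat <- icc_between / (icc_between + icc_within)

round(c(within = icc_within, between = icc_between, icc = icc_hat), 3)
```

In this simulation the true values are 4, 1, and 0.2, so the estimates should land close to those numbers.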


Then, we specify the effect size and use the `Calsamplesize` function to calculate the required sample sizes for the permanent contract and the LTFC outcomes. The defaults are `alpha = 0.05` (significance level) and `beta = 0.2` (type II error rate, which corresponds to a power of 0.8).

```{r}
# effect size and assignment mechanism probabilities
mu <- 0.03
qa <- rep(1/3, 3)

# calculate sample size for the permanent contract
print("Permanent Contract:")
print(Calsamplesize(data_perm, mu, qa, alpha = 0.05, beta = 0.2))

# calculate sample size for the long term fixed contract
print("Long Term Fixed Contract:")
print(Calsamplesize(data_LTFC, mu, qa, alpha = 0.05, beta = 0.2))
```

The results give the total number of clusters of average size `n.avg` needed to detect the specified alternative at the given power and significance level.
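
To show how effect size, variance, intra-class correlation, and cluster size interact in a calculation of this kind, the sketch below implements a textbook formula for the number of clusters per arm in a simple two-arm cluster-randomized comparison of means. The helper `sketch_clusters_per_arm` and its inputs are hypothetical, and this is a different, simpler setting than the two-stage design handled by `Calsamplesize`, so the numbers are not comparable.

```{r}
# Illustrative sketch only: textbook sample size for a two-arm
# cluster-randomized comparison of means. Not the formula used by Calsamplesize.
sketch_clusters_per_arm <- function(effect, sigma2, rho, n_avg,
                                    alpha = 0.05, beta = 0.2) {
  # beta is the type II error rate, so the target power is 1 - beta
  z_sum <- qnorm(1 - alpha / 2) + qnorm(1 - beta)
  # variance inflation caused by correlation within clusters
  design_effect <- 1 + (n_avg - 1) * rho
  ceiling(2 * z_sum^2 * sigma2 * design_effect / (n_avg * effect^2))
}

# Made-up inputs: effect 0.03, outcome variance 0.2, ICC 0.01, 50 units per cluster
sketch_clusters_per_arm(effect = 0.03, sigma2 = 0.2, rho = 0.01, n_avg = 50)
```

Under this formula, a larger intra-class correlation or a smaller effect size increases the number of clusters required, which is why the variance parameters need to be estimated before the sample size calculation.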