This function generates potential outcomes under the simulation settings described in Parast et al. (2024). It returns the potential outcomes $$P = (Y_1, S_1, Y_0, S_0)$$ and the observed outcomes $$P_{observed} = (Y, S)$$ determined by a random treatment assignment \(Z\), where \(Y\) denotes the primary outcome and \(S\) the surrogate.

Usage

DGP_no_X(
  n,
  p,
  mu_star = NULL,
  Sigma_star = NULL,
  model = c("Gaussian", "misspecified")
)

Arguments

n

Integer. Total sample size.

p

Numeric. Probability of being assigned to the treatment group (Z=1).

mu_star

Numeric vector. The mean vector for \(P\). Required if model = "Gaussian".

Sigma_star

Matrix. The covariance matrix for \(P\). Required if model = "Gaussian".

model

Character. The type of data generation: "Gaussian" or "misspecified".

Value

A list containing:

  • Z: Treatment assignment vector.

  • n1: Number of treated units.

  • n0: Number of control units.

  • P: Full matrix of potential outcomes.

  • P_observed: Observed outcomes \((Y, S)\) corresponding to the assigned treatment \(Z\).

  • P_unobserved: Counterfactual outcomes under the opposite treatment.
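One way to read the relationship between P, Z, and P_observed is the usual consistency relation from the potential-outcomes framework. It is stated here as an interpretation of the descriptions above, not as a quotation from the package source:

$$Y = Z\,Y_1 + (1 - Z)\,Y_0, \qquad S = Z\,S_1 + (1 - Z)\,S_0.$$

Under the same reading, P_unobserved would hold the complementary pair \(\big((1 - Z)\,Y_1 + Z\,Y_0,\; (1 - Z)\,S_1 + Z\,S_0\big)\).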

This function is useful for generating synthetic data to test or explore the method, for instance to verify the behavior of BSET_no_X under known simulation settings.

Details

The function supports two types of data-generating processes:

  • Gaussian model (model = "Gaussian"): Potential outcomes are drawn from a multivariate normal distribution: $$P \sim \mathcal{N}_{4}(\mu^{*}, \Sigma^{*}),$$ where \(\mu^{*}\) and \(\Sigma^{*}\) are supplied through mu_star and Sigma_star.

  • Non-linear model (model = "misspecified"): Potential outcomes for the surrogate are generated from a non-Gaussian distribution, and potential outcomes for the primary outcome are generated as a non-linear function of the surrogate plus non-Gaussian noise.
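The Gaussian case can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name sketch_gaussian_dgp is hypothetical, the column names are chosen for readability, and independent Bernoulli(p) assignment is an assumption (the package may assign treatment differently, for example by fixing the number of treated units).

```r
# Hypothetical helper, not part of the package: a base-R sketch of a
# Gaussian data-generating process with random treatment assignment.
sketch_gaussian_dgp <- function(n, p, mu_star, Sigma_star) {
  # Draw n rows from N_4(mu_star, Sigma_star) via the Cholesky factor:
  # if E has i.i.d. N(0, 1) entries and t(R) %*% R = Sigma_star, then each
  # row of E %*% R has covariance Sigma_star.
  R <- chol(Sigma_star)
  E <- matrix(rnorm(n * 4), nrow = n, ncol = 4)
  P <- sweep(E %*% R, 2, mu_star, "+")
  colnames(P) <- c("Y1", "S1", "Y0", "S0")

  # Assumption: independent Bernoulli(p) assignment for each unit.
  Z <- rbinom(n, size = 1, prob = p)

  # Treated units reveal (Y1, S1); control units reveal (Y0, S0).
  P_observed <- cbind(
    Y = ifelse(Z == 1, P[, "Y1"], P[, "Y0"]),
    S = ifelse(Z == 1, P[, "S1"], P[, "S0"])
  )
  # The counterfactual pair is whatever was not revealed.
  P_unobserved <- cbind(
    Y = ifelse(Z == 1, P[, "Y0"], P[, "Y1"]),
    S = ifelse(Z == 1, P[, "S0"], P[, "S1"])
  )

  list(Z = Z, n1 = sum(Z), n0 = sum(1 - Z), P = P,
       P_observed = P_observed, P_unobserved = P_unobserved)
}

set.seed(1)
sim <- sketch_gaussian_dgp(
  n = 200, p = 0.5,
  mu_star = c(6, 6, 2.5, 2.5),
  Sigma_star = kronecker(diag(2), matrix(c(3, 3, 3, 3.1), 2, 2))
)
str(sim, max.level = 1)
```

The Cholesky route keeps the sketch free of dependencies outside base R; MASS::mvrnorm would be a reasonable alternative for the multivariate normal draw.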

References

Parast L, Cai T, Tian L (2024). “A rank-based approach to evaluate a surrogate marker in a small sample setting.” Biometrics, 80(1), ujad035.

Examples

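# Fix the seed so the simulated data are reproducible.
set.seed(123)
data <- DGP_no_X(
  n = 100,                       # total sample size
  p = 0.5,                       # probability of assignment to treatment
  mu_star = c(6, 6, 2.5, 2.5),   # means of (Y_1, S_1, Y_0, S_0)
  # Block-diagonal covariance: one 2 x 2 block per treatment arm
  Sigma_star = kronecker(diag(2), matrix(c(3, 3, 3, 3.1), 2, 2)),
  model = "Gaussian"
)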
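Before passing a custom Sigma_star, it can help to confirm that it is a valid covariance matrix. The check below is a small base-R sketch using the matrix from the example above. It assumes the rows and columns follow the ordering \(P = (Y_1, S_1, Y_0, S_0)\); the variable names block and rho are illustrative only.

```r
# Covariance matrix from the example above.
block <- matrix(c(3, 3, 3, 3.1), 2, 2)
Sigma_star <- kronecker(diag(2), block)

# A valid covariance matrix is symmetric with strictly positive eigenvalues.
is_sym <- isSymmetric(Sigma_star)
eig <- eigen(Sigma_star, symmetric = TRUE, only.values = TRUE)$values
is_pd <- min(eig) > 0

# Within-arm correlation between primary outcome and surrogate
# implied by each 2 x 2 block.
rho <- cov2cor(block)[1, 2]

# The off-diagonal blocks are zero, so under this ordering the treated and
# control potential outcomes are uncorrelated.
cross_block <- Sigma_star[1:2, 3:4]

print(c(symmetric = is_sym, positive_definite = is_pd))
print(rho)
```

With this choice each arm has a strongly correlated outcome and surrogate (rho is about 0.98), while the two arms share no covariance.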