Overview

Sample selection bias arises when the observed sample is not a random draw from the population. For example:

  • Only firms above a revenue threshold are surveyed.
  • Only farms that adopted a technology are observed using it.
  • Participation in a programme is voluntary, so only volunteers are observed.

If selection into the sample is correlated with firm efficiency, ignoring it leads to biased frontier estimates. sfaselectioncross implements the two-step ML estimator of Greene (2010), as provided by sfaR (Dakpo et al. 2022). The first step fits a probit selection equation; the second step estimates the frontier on the selected observations while correcting for selection, in the spirit of the Heckman (1979) correction.
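For reference, the model behind this estimator can be sketched as follows. The notation follows the usual presentation of Greene (2010); the symbols α, β, w_i, U_i and ρ are chosen here for illustration and are not taken from the smfa documentation.

```latex
\begin{aligned}
d_i &= \mathbf{1}\left[\alpha^\top z_i + w_i > 0\right], & w_i &\sim N(0, 1),\\
y_i &= \beta^\top x_i + v_i - u_i, & &\text{observed only if } d_i = 1,\\
u_i &= \left|\sigma_u U_i\right|, & U_i &\sim N(0, 1),\\
(w_i, v_i) &\sim N_2\!\left[(0, 0),\ (1,\ \rho\,\sigma_v,\ \sigma_v^2)\right].
\end{aligned}
```

Selection matters for the frontier only when ρ ≠ 0. This ρ is the quantity reported as rho under "Selection bias parameter" in the model summaries.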

The selection model requires:

  • A binary selection indicator d (1 = selected/observed, 0 = not selected).
  • A selection equation formula selectionF specifying which variables drive selection. At least one variable must appear in selectionF but not in the main frontier formula.
  • Only selected observations (d == 1) participate in the frontier and receive efficiency estimates. Efficiency for non-selected observations is NA.
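The exclusion requirement can be checked before fitting. The helper below is a minimal sketch in base R; excluded_vars is a hypothetical name introduced here for illustration and is not part of smfa or sfaR.

```r
# Hypothetical helper (not part of smfa or sfaR): list the selection-equation
# regressors that do not appear in the frontier formula.
excluded_vars <- function(formula, selectionF) {
  frontier_vars  <- all.vars(formula[[3]])     # right-hand side of the frontier
  selection_vars <- all.vars(selectionF[[3]])  # right-hand side of the selection equation
  setdiff(selection_vars, frontier_vars)
}

excl <- excluded_vars(log(y) ~ log(x1) + log(x2), d ~ z1 + z2)
excl
#> [1] "z1" "z2"

# At least one excluded variable is needed for the exclusion restriction
stopifnot(length(excl) >= 1)
```

An empty result would mean that every selection regressor also enters the frontier, so the exclusion restriction is not met.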

Data Preparation (Simulated Example)

We simulate data following the approach in the sfaR documentation:

library(smfa)
#> Loading required package: sfaR
#>            ****           *******  
#>           /**/           /**////** 
#>   ****** ******  ******  /**   /** 
#>  **//// ///**/  //////** /*******  
#> //*****   /**    ******* /**///**  
#>  /////**  /**   **////** /**  //** 
#>  ******   /**  //********/**   //**
#> //////    //    //////// //     //    version 1.0.1
#> 
#> * Please cite the 'sfaR' package as:
#>   Dakpo KH., Desjeux Y., Henningsen A., and Latruffe L. (2024). sfaR: Stochastic Frontier Analysis Using R. R package version 1.0.1.
#> 
#> See also: citation("sfaR")
#> 
#> * For any questions, suggestions, or comments on the 'sfaR' package, you can contact directly the authors or visit:  https://github.com/hdakpo/sfaR/issues
#>                         .d888         
#>                        d88P"          
#>                        888            
#> .d8888b  88888b.d88b.  888888 8888b.  
#> 88K      888 "888 "88b 888       "88b 
#>  Y8888b. 888  888  888 888   .d888888  
#>      X88 888  888  888 888   888  888 
#>  88888P' 888  888  888 888   "Y888888 
#>                           version 1.0.0
#> 
#> * Please cite the 'smfa' package as:
#> Owili, S. O. (2026). smfa: Stochastic Metafrontier Analysis. R package version 1.0.0.
#> 
#> See also: citation("smfa")
#> 
#> * For any questions, suggestions, or comments on the 'smfa' package, you can contact the authors directly or visit:
#>   https://github.com/SulmanOlieko/smfa/issues

N <- 500; set.seed(12345)
z1 <- rnorm(N); z2 <- rnorm(N)
v1 <- rnorm(N); v2 <- rnorm(N)
g  <- rnorm(N)
e1 <- v1
e2 <- 0.7071 * (v1 + v2)
ds <- z1 + z2 + e1
d  <- ifelse(ds > 0, 1, 0)        # 1 = selected into the sample
group <- ifelse(g > 0, 1, 0)      # two technology groups
u  <- abs(rnorm(N))
x1 <- abs(rnorm(N)); x2 <- abs(rnorm(N))
y  <- abs(x1 + x2 + e2 - u)
dat <- as.data.frame(cbind(y = y, x1 = x1, x2 = x2,
                            z1 = z1, z2 = z2, d = d, group = group))

# About 50% of observations are selected
table(dat$d)
#> 
#>   0   1 
#> 237 263
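The simulation builds selection bias in by construction: the selection error e1 and the frontier noise e2 share the component v1. The sketch below checks this on a separate, larger draw (the seed and sample size are arbitrary choices made here, so the numbers will not match the data set above).

```r
set.seed(1)                 # arbitrary seed, independent of the example above
N  <- 5000                  # larger than the example, to reduce sampling noise
v1 <- rnorm(N); v2 <- rnorm(N)
e1 <- v1                    # selection-equation error
e2 <- 0.7071 * (v1 + v2)    # frontier noise, sharing v1 with e1

# In theory Var(e2) = 2 * 0.7071^2 (about 1) and Cor(e1, e2) = 0.7071
design_check <- c(var_e2 = var(e2), cor_e1_e2 = cor(e1, e2))
design_check
```

The sample values should be close to 1 and 0.71, which means the data-generating process has a true selection correlation of roughly 0.7.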

Method 1: sfaselectioncross + LP Metafrontier

meta_sel_lp <- smfa(
  formula    = log(y) ~ log(x1) + log(x2),
  selectionF = d ~ z1 + z2,      # selection equation: d is the binary indicator
  data       = dat,
  group      = "group",
  S          = 1L,
  udist      = "hnormal",
  groupType  = "sfaselectioncross",
  modelType  = "greene10",        # Greene (2010) two-step ML correction
  lType      = "kronrod",         # integration method for the selection likelihood
  Nsub       = 20,               # number of sub-intervals for numerical integration
  uBound     = Inf,
  method     = "bfgs",
  itermax    = 2000,
  metaMethod = "lp"
)
#> First step probit model...
#> Second step Frontier model...
#> First step probit model...
#> Second step Frontier model...
summary(meta_sel_lp)
#> ============================================================ 
#> Stochastic Metafrontier Analysis
#> Metafrontier method: Linear Programming (LP) Metafrontier 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Group approach     : Sample Selection Stochastic Frontier Analysis 
#> Group estimator    : sfaselectioncross 
#> Group optim solver : BFGS maximization 
#> Groups ( 2 ): 0, 1 
#> Total observations : 500 
#> Distribution       : hnormal 
#> ============================================================ 
#> 
#> ------------------------------------------------------------ 
#> Group: 0 (N = 252)  Log-likelihood: -225.33119
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          67 
#> Log likelihood value:                                                 -225.33119 
#> Log likelihood gradient norm:                                        5.77769e-07 
#> Estimation based on:                             N =  131 of 252 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  462.7 AIC/N  =  3.532 
#>                                                    BIC  =  479.9 BIC/N  =  3.663 
#>                                                    HQIC =  469.7 HQIC/N =  3.585 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.01981 
#>            Sigma(v)           =                                          0.01981 
#>            Sigma-squared(u)   =                                          2.94034 
#>            Sigma(u)           =                                          2.94034 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.72051 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.99331 
#> Lambda = sigma(u)/sigma(v)    =                                         12.18326 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.98180 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.36817 
#> Average efficiency E[exp(-ui)] =                                         0.37581 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.36885    0.11843 11.5584 < 2.2e-16 ***
#> log(x1)            0.17123    0.06350  2.6963  0.007011 ** 
#> log(x2)            0.07768    0.05238  1.4829  0.138102    
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     1.07853    0.10204  10.569 < 2.2e-16 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)    
#> Zv_(Intercept)     -3.9216     1.1265 -3.4813 0.000499 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.27413    1.03629  0.2645   0.7914
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Group: 1 (N = 248)  Log-likelihood: -197.67108
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          68 
#> Log likelihood value:                                                 -197.67108 
#> Log likelihood gradient norm:                                        2.70510e-06 
#> Estimation based on:                             N =  132 of 248 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  407.3 AIC/N  =  3.086 
#>                                                    BIC  =  424.6 BIC/N  =  3.217 
#>                                                    HQIC =  414.4 HQIC/N =  3.139 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.03365 
#>            Sigma(v)           =                                          0.03365 
#>            Sigma-squared(u)   =                                          1.84490 
#>            Sigma(u)           =                                          1.84490 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.37060 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.98209 
#> Lambda = sigma(u)/sigma(v)    =                                          7.40483 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.95221 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.08374 
#> Average efficiency E[exp(-ui)] =                                         0.43864 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.22609    0.09780 12.5368 < 2.2e-16 ***
#> log(x1)            0.14445    0.03802  3.7994  0.000145 ***
#> log(x2)            0.11056    0.03775  2.9290  0.003401 ** 
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     0.61243    0.12841  4.7694 1.847e-06 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zv_(Intercept)    -3.39184    0.69675 -4.8681 1.127e-06 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.78787    0.60643  1.2992   0.1939
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Metafrontier Coefficients (lp):
#>   (LP: deterministic envelope - no estimated parameters)
#> 
#> ------------------------------------------------------------ 
#> Efficiency Statistics (group means):
#> ------------------------------------------------------------ 
#>   N_obs N_valid TE_group_BC TE_group_JLMS TE_meta_BC TE_meta_JLMS  MTR_BC
#> 0   252     131     0.39953       0.39621    0.39953      0.39621 1.00000
#> 1   248     132     0.43254       0.42840    0.37567      0.37207 0.86741
#>   MTR_JLMS
#> 0  1.00000
#> 1  0.86741
#> 
#> Overall:
#> TE_group_BC=0.4160  TE_group_JLMS=0.4123
#> TE_meta_BC=0.3876   TE_meta_JLMS=0.3841
#> MTR_BC=0.9337     MTR_JLMS=0.9337
#> ------------------------------------------------------------ 
#> Total Log-likelihood: -423.0023 
#> AIC: 870.0045   BIC: 920.5798   HQIC: 889.8502 
#> ------------------------------------------------------------ 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14
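The "Overall" figures in the efficiency table appear to be simple, unweighted averages of the two group means. This is inferred from the printed numbers only, not from the smfa documentation. A short check using the Bias-Corrected (BC) columns copied from the LP summary:

```r
# Group means copied from the "Efficiency Statistics" table of the LP summary
group_means <- data.frame(
  group       = c("0", "1"),
  TE_group_BC = c(0.39953, 0.43254),
  TE_meta_BC  = c(0.39953, 0.37567),
  MTR_BC      = c(1.00000, 0.86741)
)

# Unweighted average across the two groups, rounded as in the printed output
overall <- round(colMeans(group_means[, -1]), 4)
overall
```

The rounded averages agree with the printed values TE_group_BC=0.4160, TE_meta_BC=0.3876 and MTR_BC=0.9337. A mean weighted by N_valid would give a slightly different MTR.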

Note: The selectionF argument is required when groupType = "sfaselectioncross". Its left-hand side must be the binary selection variable (here d). At least one regressor in the selection equation should not appear in the main frontier formula (the exclusion restriction used for identification).
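To see what the selection equation describes, the first step can be imitated with a plain probit in base R. This is only a sketch on freshly simulated data with an arbitrary seed. It is not the code that sfaR runs internally, and it fits one pooled probit rather than one per group.

```r
set.seed(1)                                # arbitrary seed for this sketch
N  <- 500
z1 <- rnorm(N); z2 <- rnorm(N)
d  <- as.integer(z1 + z2 + rnorm(N) > 0)   # same selection rule as the simulated example

# Probit of the selection indicator on the selection regressors
probit <- glm(d ~ z1 + z2, family = binomial(link = "probit"))
coef(probit)

# Fitted probit index for each observation
index <- predict(probit, type = "link")
summary(index)
```

Because the selection rule uses unit coefficients and a standard normal error, the estimated slopes on z1 and z2 should both be close to 1.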

Method 2: sfaselectioncross + QP Metafrontier

meta_sel_qp <- smfa(
  formula    = log(y) ~ log(x1) + log(x2),
  selectionF = d ~ z1 + z2,
  data       = dat,
  group      = "group",
  S          = 1L,
  udist      = "hnormal",
  groupType  = "sfaselectioncross",
  modelType  = "greene10",
  lType      = "kronrod",
  Nsub       = 20,
  uBound     = Inf,
  method     = "bfgs",
  itermax    = 2000,
  metaMethod = "qp"
)
#> First step probit model...
#> Second step Frontier model...
#> First step probit model...
#> Second step Frontier model...
summary(meta_sel_qp)
#> ============================================================ 
#> Stochastic Metafrontier Analysis
#> Metafrontier method: Quadratic Programming (QP) Metafrontier 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Group approach     : Sample Selection Stochastic Frontier Analysis 
#> Group estimator    : sfaselectioncross 
#> Group optim solver : BFGS maximization 
#> Groups ( 2 ): 0, 1 
#> Total observations : 500 
#> Distribution       : hnormal 
#> ============================================================ 
#> 
#> ------------------------------------------------------------ 
#> Group: 0 (N = 252)  Log-likelihood: -225.33119
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          67 
#> Log likelihood value:                                                 -225.33119 
#> Log likelihood gradient norm:                                        5.77769e-07 
#> Estimation based on:                             N =  131 of 252 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  462.7 AIC/N  =  3.532 
#>                                                    BIC  =  479.9 BIC/N  =  3.663 
#>                                                    HQIC =  469.7 HQIC/N =  3.585 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.01981 
#>            Sigma(v)           =                                          0.01981 
#>            Sigma-squared(u)   =                                          2.94034 
#>            Sigma(u)           =                                          2.94034 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.72051 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.99331 
#> Lambda = sigma(u)/sigma(v)    =                                         12.18326 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.98180 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.36817 
#> Average efficiency E[exp(-ui)] =                                         0.37581 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.36885    0.11843 11.5584 < 2.2e-16 ***
#> log(x1)            0.17123    0.06350  2.6963  0.007011 ** 
#> log(x2)            0.07768    0.05238  1.4829  0.138102    
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     1.07853    0.10204  10.569 < 2.2e-16 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)    
#> Zv_(Intercept)     -3.9216     1.1265 -3.4813 0.000499 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.27413    1.03629  0.2645   0.7914
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Group: 1 (N = 248)  Log-likelihood: -197.67108
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          68 
#> Log likelihood value:                                                 -197.67108 
#> Log likelihood gradient norm:                                        2.70510e-06 
#> Estimation based on:                             N =  132 of 248 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  407.3 AIC/N  =  3.086 
#>                                                    BIC  =  424.6 BIC/N  =  3.217 
#>                                                    HQIC =  414.4 HQIC/N =  3.139 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.03365 
#>            Sigma(v)           =                                          0.03365 
#>            Sigma-squared(u)   =                                          1.84490 
#>            Sigma(u)           =                                          1.84490 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.37060 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.98209 
#> Lambda = sigma(u)/sigma(v)    =                                          7.40483 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.95221 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.08374 
#> Average efficiency E[exp(-ui)] =                                         0.43864 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.22609    0.09780 12.5368 < 2.2e-16 ***
#> log(x1)            0.14445    0.03802  3.7994  0.000145 ***
#> log(x2)            0.11056    0.03775  2.9290  0.003401 ** 
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     0.61243    0.12841  4.7694 1.847e-06 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zv_(Intercept)    -3.39184    0.69675 -4.8681 1.127e-06 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.78787    0.60643  1.2992   0.1939
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Metafrontier Coefficients (qp):
#>               Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept) 1.36736720 0.00042531 3215.00 < 2.2e-16 ***
#> log(x1)     0.16870720 0.00027176  620.79 < 2.2e-16 ***
#> log(x2)     0.07759335 0.00030299  256.09 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------ 
#> Efficiency Statistics (group means):
#> ------------------------------------------------------------ 
#>   N_obs N_valid TE_group_BC TE_group_JLMS TE_meta_BC TE_meta_JLMS  MTR_BC
#> 0   252     131     0.39953       0.39621    0.39914      0.39582 0.99892
#> 1   248     132     0.43254       0.42840    0.37556      0.37196 0.86709
#>   MTR_JLMS
#> 0  0.99892
#> 1  0.86709
#> 
#> Overall:
#> TE_group_BC=0.4160  TE_group_JLMS=0.4123
#> TE_meta_BC=0.3874   TE_meta_JLMS=0.3839
#> MTR_BC=0.9330     MTR_JLMS=0.9330
#> ------------------------------------------------------------ 
#> Total Log-likelihood: -423.0023 
#> AIC: 876.0045   BIC: 939.2237   HQIC: 900.8116 
#> ------------------------------------------------------------ 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14
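The LP and QP summaries report the same total log-likelihood but different information criteria. The arithmetic below reproduces the printed values using the standard AIC, BIC and HQIC formulas. It assumes 12 parameters for LP (K = 6 per group) and 15 for QP (three extra metafrontier coefficients), with n = 500. These counts are inferred from the output, not taken from the package documentation.

```r
loglik <- -423.0023   # total log-likelihood printed in both summaries
n_obs  <- 500         # total observations
k_lp   <- 2 * 6       # assumed: K = 6 per group, two groups
k_qp   <- k_lp + 3    # assumed: plus three QP metafrontier coefficients

# Standard definitions of the three criteria
info_criteria <- function(loglik, k, n) {
  c(AIC  = -2 * loglik + 2 * k,
    BIC  = -2 * loglik + k * log(n),
    HQIC = -2 * loglik + 2 * k * log(log(n)))
}

ic <- rbind(lp = info_criteria(loglik, k_lp, n_obs),
            qp = info_criteria(loglik, k_qp, n_obs))
round(ic, 4)
```

Under these assumptions, the QP criteria are higher only because of the penalty on the three additional coefficients. The group fits themselves are identical.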

Method 3: sfaselectioncross + SFA (Huang)

meta_sel_huang <- smfa(
  formula     = log(y) ~ log(x1) + log(x2),
  selectionF  = d ~ z1 + z2,
  data        = dat,
  group       = "group",
  S           = 1L,
  udist       = "hnormal",
  groupType   = "sfaselectioncross",
  modelType   = "greene10",
  lType       = "kronrod",
  Nsub        = 100,
  uBound      = Inf,
  method      = "bfgs",
  itermax     = 2000,
  metaMethod  = "sfa",
  sfaApproach = "huang"
)
#> First step probit model...
#> Second step Frontier model...
#> First step probit model...
#> Second step Frontier model...
summary(meta_sel_huang)
#> ============================================================ 
#> Stochastic Metafrontier Analysis
#> Metafrontier method: SFA Metafrontier [Huang et al. (2014), two-stage] 
#> Stochastic Production/Profit Frontier, e = v - u 
#> SFA approach       : huang 
#> Group approach     : Sample Selection Stochastic Frontier Analysis 
#> Group estimator    : sfaselectioncross 
#> Group optim solver : BFGS maximization 
#> Groups ( 2 ): 0, 1 
#> Total observations : 500 
#> Distribution       : hnormal 
#> ============================================================ 
#> 
#> ------------------------------------------------------------ 
#> Group: 0 (N = 252)  Log-likelihood: -225.33119
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          67 
#> Log likelihood value:                                                 -225.33119 
#> Log likelihood gradient norm:                                        5.77769e-07 
#> Estimation based on:                             N =  131 of 252 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  462.7 AIC/N  =  3.532 
#>                                                    BIC  =  479.9 BIC/N  =  3.663 
#>                                                    HQIC =  469.7 HQIC/N =  3.585 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.01981 
#>            Sigma(v)           =                                          0.01981 
#>            Sigma-squared(u)   =                                          2.94034 
#>            Sigma(u)           =                                          2.94034 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.72051 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.99331 
#> Lambda = sigma(u)/sigma(v)    =                                         12.18326 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.98180 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.36817 
#> Average efficiency E[exp(-ui)] =                                         0.37581 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.36885    0.11843 11.5584 < 2.2e-16 ***
#> log(x1)            0.17123    0.06350  2.6963  0.007011 ** 
#> log(x2)            0.07768    0.05238  1.4829  0.138102    
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     1.07853    0.10204  10.569 < 2.2e-16 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)    
#> Zv_(Intercept)     -3.9216     1.1265 -3.4813 0.000499 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.27413    1.03629  0.2645   0.7914
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Group: 1 (N = 248)  Log-likelihood: -197.67108
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          68 
#> Log likelihood value:                                                 -197.67108 
#> Log likelihood gradient norm:                                        2.70510e-06 
#> Estimation based on:                             N =  132 of 248 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  407.3 AIC/N  =  3.086 
#>                                                    BIC  =  424.6 BIC/N  =  3.217 
#>                                                    HQIC =  414.4 HQIC/N =  3.139 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.03365 
#>            Sigma(v)           =                                          0.03365 
#>            Sigma-squared(u)   =                                          1.84490 
#>            Sigma(u)           =                                          1.84490 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.37060 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.98209 
#> Lambda = sigma(u)/sigma(v)    =                                          7.40483 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.95221 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.08374 
#> Average efficiency E[exp(-ui)] =                                         0.43864 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.22609    0.09780 12.5368 < 2.2e-16 ***
#> log(x1)            0.14445    0.03802  3.7994  0.000145 ***
#> log(x2)            0.11056    0.03775  2.9290  0.003401 ** 
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     0.61243    0.12841  4.7694 1.847e-06 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zv_(Intercept)    -3.39184    0.69675 -4.8681 1.127e-06 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.78787    0.60643  1.2992   0.1939
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Metafrontier Coefficients (sfa):
#> Meta-optim solver  : BFGS maximization 
#>              Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept) 1.2984590  0.2752514  4.7174 2.389e-06 ***
#> log(x1)     0.1557503  0.0038319 40.6458 < 2.2e-16 ***
#> log(x2)     0.0921453  0.0042741 21.5589 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>   Meta-frontier model details:
#> -------------------------------------------------------------------------------- 
#> Normal-Half Normal SF Model 
#> Dependent Variable:                                          group_fitted_values 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                         608 
#> Log likelihood value:                                                  305.40441 
#> Log likelihood gradient norm:                                        2.74614e-05 
#> Estimation based on:                                         N =  263 and K =  5 
#> Inf. Cr:                                         AIC  =  -600.8 AIC/N  =  -2.284 
#>                                                  BIC  =  -582.9 BIC/N  =  -2.217 
#>                                                  HQIC =  -593.6 HQIC/N =  -2.257 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.00573 
#>            Sigma(v)           =                                          0.00573 
#>            Sigma-squared(u)   =                                          0.00002 
#>            Sigma(u)           =                                          0.00002 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          0.07583 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.00295 
#> Lambda = sigma(u)/sigma(v)    =                                          0.05438 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.00107 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         0.00329 
#> Average efficiency E[exp(-ui)] =                                         0.99672 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> -----[ Tests vs. No Inefficiency ]-----
#> Likelihood Ratio Test of Inefficiency
#> Deg. freedom for inefficiency model                                            1 
#> Log Likelihood for OLS Log(H0) =                                       305.40442 
#> LR statistic:  
#> Chisq = 2*[LogL(H0)-LogL(H1)]  =                                        -0.00000 
#> Kodde-Palm C*:       95%: 2.70554                                   99%: 5.41189 
#> Coelli (1995) skewness test on OLS residuals
#> M3T: z                         =                                        -0.06529 
#> M3T: p.value                   =                                         0.94794 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.29846    0.27525  4.7174 2.389e-06 ***
#> .X2                0.15575    0.00383 40.6458 < 2.2e-16 ***
#> .X3                0.09215    0.00427 21.5589 < 2.2e-16 ***
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> Zu_(Intercept)     -10.985    167.527 -0.0656   0.9477
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zv_(Intercept)    -5.16142    0.20002 -25.805 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> Log likelihood status: successful convergence  
#> 
#> ------------------------------------------------------------ 
#> Efficiency Statistics (group means):
#> ------------------------------------------------------------ 
#>   N_obs N_valid TE_group_BC TE_group_JLMS TE_meta_BC TE_meta_JLMS  MTR_BC
#> 0   252     131     0.39953       0.39621    0.39825      0.39494 0.99680
#> 1   248     132     0.43254       0.42840    0.43109      0.42696 0.99665
#>   MTR_JLMS
#> 0  0.99680
#> 1  0.99664
#> 
#> Overall:
#> TE_group_BC=0.4160  TE_group_JLMS=0.4123
#> TE_meta_BC=0.4147   TE_meta_JLMS=0.4109
#> MTR_BC=0.9967     MTR_JLMS=0.9967
#> ------------------------------------------------------------ 
#> Total Log-likelihood: -117.5979 
#> AIC: 269.1957   BIC: 340.844   HQIC: 297.3104 
#> ------------------------------------------------------------ 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14
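
The totals at the bottom of this summary appear to combine the two group fits with the metafrontier fit. The sketch below, written under that assumption, re-derives them from the log-likelihoods and parameter counts printed in the summary, using the textbook AIC, BIC and HQIC formulas. The object names are illustrative and are not part of smfa.

```r
# Log-likelihoods and parameter counts (K) copied from the summary output
ll <- c(group0 = -225.33119, group1 = -197.67108, meta = 305.40441)
k  <- c(group0 = 6, group1 = 6, meta = 5)

# Assumption: the pooled criteria use the "Total observations" count (500)
n_total <- 500

total_ll <- sum(ll)
total_k  <- sum(k)

aic  <- -2 * total_ll + 2 * total_k
bic  <- -2 * total_ll + total_k * log(n_total)
hqic <- -2 * total_ll + 2 * total_k * log(log(n_total))

round(c(logLik = total_ll, AIC = aic, BIC = bic, HQIC = hqic), 4)
```

Under these assumptions the results agree with the reported totals (-117.5979, 269.1957, 340.844 and 297.3104) up to rounding of the inputs.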

Method 4: sfaselectioncross + SFA (O’Donnell)

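meta_sel_odonnell <- smfa(
  formula     = log(y) ~ log(x1) + log(x2),  # frontier equation
  selectionF  = d ~ z1 + z2,                 # probit selection equation
  data        = dat,
  group       = "group",
  S           = 1L,                          # production/profit frontier (e = v - u)
  udist       = "hnormal",                   # half-normal inefficiency
  groupType   = "sfaselectioncross",
  modelType   = "greene10",                  # Greene (2010) selection model
  lType       = "kronrod",                   # Gauss-Kronrod quadrature
  Nsub        = 100,
  uBound      = Inf,
  method      = "bfgs",
  itermax     = 2000,
  metaMethod  = "sfa",
  sfaApproach = "ordonnell"                  # O'Donnell et al. (2008) envelope
)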
#> First step probit model...
#> Second step Frontier model...
#> First step probit model...
#> Second step Frontier model...
#> Warning: The residuals of the OLS are right-skewed. This may indicate the absence of inefficiency or
#>   model misspecification or sample 'bad luck'
summary(meta_sel_odonnell)
#> Warning: 263 MTR value(s) > 1 detected in O'Donnell SFA approach. This
#> typically occurs when the second-stage SFA estimates near-zero inefficiency
#> (sigma_u -> 0), causing TE_meta ~= 1 and MTR = TE_meta/TE_group > 1. Consider
#> using metaMethod='lp' or sfaApproach='huang' instead.
#> ============================================================ 
#> Stochastic Metafrontier Analysis
#> Metafrontier method: SFA Metafrontier [O'Donnell et al. (2008), envelope] 
#> Stochastic Production/Profit Frontier, e = v - u 
#> SFA approach       : ordonnell 
#> Group approach     : Sample Selection Stochastic Frontier Analysis 
#> Group estimator    : sfaselectioncross 
#> Group optim solver : BFGS maximization 
#> Groups ( 2 ): 0, 1 
#> Total observations : 500 
#> Distribution       : hnormal 
#> ============================================================ 
#> 
#> ------------------------------------------------------------ 
#> Group: 0 (N = 252)  Log-likelihood: -225.33119
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          67 
#> Log likelihood value:                                                 -225.33119 
#> Log likelihood gradient norm:                                        5.77769e-07 
#> Estimation based on:                             N =  131 of 252 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  462.7 AIC/N  =  3.532 
#>                                                    BIC  =  479.9 BIC/N  =  3.663 
#>                                                    HQIC =  469.7 HQIC/N =  3.585 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.01981 
#>            Sigma(v)           =                                          0.01981 
#>            Sigma-squared(u)   =                                          2.94034 
#>            Sigma(u)           =                                          2.94034 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.72051 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.99331 
#> Lambda = sigma(u)/sigma(v)    =                                         12.18326 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.98180 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.36817 
#> Average efficiency E[exp(-ui)] =                                         0.37581 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.36885    0.11843 11.5584 < 2.2e-16 ***
#> log(x1)            0.17123    0.06350  2.6963  0.007011 ** 
#> log(x2)            0.07768    0.05238  1.4829  0.138102    
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     1.07853    0.10204  10.569 < 2.2e-16 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)    
#> Zv_(Intercept)     -3.9216     1.1265 -3.4813 0.000499 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.27413    1.03629  0.2645   0.7914
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Group: 1 (N = 248)  Log-likelihood: -197.67108
#> ------------------------------------------------------------ 
#> -------------------------------------------------------------------------------- 
#> Sample Selection Correction Stochastic Frontier Model 
#> Dependent Variable:                                                       log(y) 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                          68 
#> Log likelihood value:                                                 -197.67108 
#> Log likelihood gradient norm:                                        2.70510e-06 
#> Estimation based on:                             N =  132 of 248 obs. and K =  6 
#> Inf. Cr:                                           AIC  =  407.3 AIC/N  =  3.086 
#>                                                    BIC  =  424.6 BIC/N  =  3.217 
#>                                                    HQIC =  414.4 HQIC/N =  3.139 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.03365 
#>            Sigma(v)           =                                          0.03365 
#>            Sigma-squared(u)   =                                          1.84490 
#>            Sigma(u)           =                                          1.84490 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          1.37060 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.98209 
#> Lambda = sigma(u)/sigma(v)    =                                          7.40483 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.95221 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         1.08374 
#> Average efficiency E[exp(-ui)] =                                         0.43864 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> Estimator is 2 step Maximum Likelihood 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.22609    0.09780 12.5368 < 2.2e-16 ***
#> log(x1)            0.14445    0.03802  3.7994  0.000145 ***
#> log(x2)            0.11056    0.03775  2.9290  0.003401 ** 
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zu_(Intercept)     0.61243    0.12841  4.7694 1.847e-06 ***
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zv_(Intercept)    -3.39184    0.69675 -4.8681 1.127e-06 ***
#> -------------------------------------------------------------------------------- 
#>                             Selection bias parameter 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> rho                0.78787    0.60643  1.2992   0.1939
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> 
#> ------------------------------------------------------------ 
#> Metafrontier Coefficients (sfa):
#> Meta-optim solver  : BFGS maximization 
#>               Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept) 1.36739085 0.00198573  688.61 < 2.2e-16 ***
#> log(x1)     0.16870720 0.00027021  624.36 < 2.2e-16 ***
#> log(x2)     0.07759335 0.00030126  257.57 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>   Meta-frontier model details:
#> -------------------------------------------------------------------------------- 
#> Normal-Half Normal SF Model 
#> Dependent Variable:                                                  lp_envelope 
#> Log likelihood solver:                                         BFGS maximization 
#> Log likelihood iter:                                                         333 
#> Log likelihood value:                                                 1002.79269 
#> Log likelihood gradient norm:                                        3.27261e-03 
#> Estimation based on:                                         N =  263 and K =  5 
#> Inf. Cr:                                        AIC  =  -1995.6 AIC/N  =  -7.588 
#>                                                 BIC  =  -1977.7 BIC/N  =  -7.520 
#>                                                 HQIC =  -1988.4 HQIC/N =  -7.560 
#> -------------------------------------------------------------------------------- 
#> Variances: Sigma-squared(v)   =                                          0.00003 
#>            Sigma(v)           =                                          0.00003 
#>            Sigma-squared(u)   =                                          0.00000 
#>            Sigma(u)           =                                          0.00000 
#> Sigma = Sqrt[(s^2(u)+s^2(v))] =                                          0.00534 
#> Gamma = sigma(u)^2/sigma^2    =                                          0.00003 
#> Lambda = sigma(u)/sigma(v)    =                                          0.00555 
#> Var[u]/{Var[u]+Var[v]}        =                                          0.00001 
#> -------------------------------------------------------------------------------- 
#> Average inefficiency E[ui]     =                                         0.00002 
#> Average efficiency E[exp(-ui)] =                                         0.99998 
#> -------------------------------------------------------------------------------- 
#> Stochastic Production/Profit Frontier, e = v - u 
#> -----[ Tests vs. No Inefficiency ]-----
#> Likelihood Ratio Test of Inefficiency
#> Deg. freedom for inefficiency model                                            1 
#> Log Likelihood for OLS Log(H0) =                                      1002.79271 
#> LR statistic:  
#> Chisq = 2*[LogL(H0)-LogL(H1)]  =                                        -0.00003 
#> Kodde-Palm C*:       95%: 2.70554                                   99%: 5.41189 
#> Coelli (1995) skewness test on OLS residuals
#> M3T: z                         =                                        68.20600 
#> M3T: p.value                   =                                         0.00000 
#> Final maximum likelihood estimates 
#> -------------------------------------------------------------------------------- 
#>                          Deterministic Component of SFA 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> (Intercept)        1.36739    0.00199  688.61 < 2.2e-16 ***
#> .X2                0.16871    0.00027  624.36 < 2.2e-16 ***
#> .X3                0.07759    0.00030  257.57 < 2.2e-16 ***
#> -------------------------------------------------------------------------------- 
#>                   Parameter in variance of u (one-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value Pr(>|z|)
#> Zu_(Intercept)     -20.852    164.024 -0.1271   0.8988
#> -------------------------------------------------------------------------------- 
#>                  Parameters in variance of v (two-sided error) 
#> -------------------------------------------------------------------------------- 
#>                Coefficient Std. Error z value  Pr(>|z|)    
#> Zv_(Intercept)   -10.46369    0.08722 -119.97 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> -------------------------------------------------------------------------------- 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14 
#> Log likelihood status: successful convergence  
#> --------------------------------------------------------------------------------  
#> Log likelihood status: successful convergence  
#> 
#> ------------------------------------------------------------ 
#> Efficiency Statistics (group means):
#> ------------------------------------------------------------ 
#>   N_obs N_valid TE_group_BC TE_group_JLMS TE_meta_BC TE_meta_JLMS   MTR_BC
#> 0   252     131     0.39953       0.39621    0.99998      0.99998 16.55549
#> 1   248     132     0.43254       0.42840    0.99998      0.99998  5.29814
#>   MTR_JLMS
#> 0 16.71157
#> 1  5.35380
#> 
#> Overall:
#> TE_group_BC=0.4160  TE_group_JLMS=0.4123
#> TE_meta_BC=1.0000   TE_meta_JLMS=1.0000
#> MTR_BC=10.9268     MTR_JLMS=11.0327
#> ------------------------------------------------------------ 
#> Total Log-likelihood: 579.7904 
#> AIC: -1125.581   BIC: -1053.933   HQIC: -1097.466 
#> ------------------------------------------------------------ 
#> Model was estimated on : Apr Fri 24, 2026 at 15:14
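
The warning at the top of this summary states that MTR = TE_meta / TE_group exceeds 1 once the second-stage SFA finds almost no inefficiency. A minimal sketch of that arithmetic follows. The TE_group values are made up for illustration; only the TE_meta value of 0.99998 is taken from the output above.

```r
te_meta  <- 0.99998                          # TE_meta_BC reported in the summary
te_group <- c(0.05, 0.20, 0.40, 0.60, 0.90)  # hypothetical group efficiencies

mtr <- te_meta / te_group    # ratio quoted in the warning message

all(mtr > 1)                 # every ratio exceeds 1 because te_group < te_meta
mean(mtr)                    # mean of the ratios
te_meta / mean(te_group)     # ratio of the means, which is smaller
```

Because 1 / TE_group is convex, a few very inefficient observations pull the average ratio well above the ratio of the averages. That is consistent with group-mean MTR values such as 16.56 appearing alongside a mean TE_group of about 0.40.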

Interpreting the Selection Correction

The first-stage probit model estimates the probability of selection. The second-stage frontier model adds one key parameter, rho: the correlation between the error of the selection equation and the noise term of the frontier equation.

# The rho parameter appears in the summary output in this form
# (example values):
# ----------------------------------------------------------------
#              Selection bias parameter
# ----------------------------------------------------------------
#           Coefficient Std. Error z value  Pr(>|z|)
# rho          0.89550    0.28696  3.1207  0.001804 **

# A significant rho indicates that selection bias is present and
# that the correction matters.

rho value        Interpretation
---------------  --------------------------------------------------------------------
≈ 0, p > 0.05    No significant selection bias; standard SFA may be sufficient
> 0, p < 0.05    Positive selection: efficient firms are more likely to be selected
< 0, p < 0.05    Negative selection: inefficient firms are more likely to be selected
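
To relate these rules of thumb to the group fits reported earlier, the sketch below recomputes the z statistic and two-sided p-value for rho from the coefficients and standard errors printed in the summaries. It assumes the usual Wald z-test against a standard normal, which reproduces the printed values. The object names are illustrative.

```r
# Coefficients and standard errors of rho copied from the group summaries
rho <- c(group0 = 0.27413, group1 = 0.78787)
se  <- c(group0 = 1.03629, group1 = 0.60643)

z <- rho / se              # Wald z statistic
p <- 2 * pnorm(-abs(z))    # two-sided p-value under N(0, 1)

round(cbind(rho, se, z, p), 4)
p < 0.05                   # is rho significant at the 5% level?
```

Neither p-value falls below 0.05, so both groups fall in the first row of the table: these fits show no statistically significant selection bias.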

Extracting Efficiencies

Only selected observations (those with d == 1) receive efficiency estimates:

eff_sel <- efficiencies(meta_sel_lp)

# Non-selected observations have NA efficiencies
sum(is.na(eff_sel$TE_group_BC))   # count of non-selected obs
#> [1] 237

# Subset for selected observations in group 1
sel_grp1 <- eff_sel[eff_sel$group == 1 & !is.na(eff_sel$TE_group_BC), ]
summary(sel_grp1[, c("TE_group_BC", "TE_meta_BC", "MTR_BC")])
#>   TE_group_BC        TE_meta_BC          MTR_BC      
#>  Min.   :0.01365   Min.   :0.01189   Min.   :0.7502  
#>  1st Qu.:0.21013   1st Qu.:0.18598   1st Qu.:0.8450  
#>  Median :0.40784   Median :0.34783   Median :0.8686  
#>  Mean   :0.43254   Mean   :0.37567   Mean   :0.8674  
#>  3rd Qu.:0.64468   3rd Qu.:0.57565   3rd Qu.:0.8873  
#>  Max.   :0.93622   Max.   :0.86071   Max.   :1.0000
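
The same NA-aware pattern extends to per-group summaries. The sketch below uses a small mock data frame that only mimics the column names used above (group, TE_group_BC, TE_meta_BC); the values are hypothetical and it does not call smfa.

```r
# Mock stand-in for the efficiencies() result; NA marks non-selected rows
eff_mock <- data.frame(
  group       = c(0, 0, 0, 1, 1, 1),
  TE_group_BC = c(0.40, NA, 0.60, 0.50, 0.70, NA),
  TE_meta_BC  = c(0.36, NA, 0.51, 0.45, 0.56, NA)
)

# Selected observations are those with a non-missing efficiency
selected <- !is.na(eff_mock$TE_group_BC)

# Number of selected observations and mean group efficiency, by group
n_selected <- tapply(selected, eff_mock$group, sum)
mean_te    <- tapply(eff_mock$TE_group_BC, eff_mock$group, mean, na.rm = TRUE)

cbind(n_selected, mean_te)
```

Passing na.rm = TRUE keeps the non-selected rows from turning each group mean into NA.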