                                                              Journal of Econometrics 145 (2008) 121–133
Panel data methods for fractional response variables with an application to test pass rates

Leslie E. Papke, Jeffrey M. Wooldridge∗
Department of Economics, Michigan State University, East Lansing, MI 48824-1038, United States

Article info

Article history: Available online 19 June 2008
JEL classification: C23; C25
Keywords: Fractional response; Panel data; Unobserved effects; Probit; Partial effects

Abstract

We revisit the effects of spending on student performance using data from the state of Michigan. In addition to exploiting a dramatic change in funding in the mid-1990s and subsequent nonsmooth changes, we propose nonlinear panel data models that recognize the bounded nature of the pass rate. Importantly, we show how to estimate average partial effects, which can be compared across many different models (linear and nonlinear) under different assumptions and estimated using many different methods. We find that spending has nontrivial and statistically significant effects, although the diminishing effect is not especially pronounced.

© 2008 Elsevier B.V. All rights reserved.

Executive summary

Determining the effects of school inputs on student performance in the United States is an important policy issue. Discussions of increased funding for K-12 education, as well as the implications for equalized funding across schools, rely on benefits measured in terms of student performance. In many states, including Michigan, success is measured – and reported widely in the press – in terms of pass rates on statewide standardized tests. Because pass rates, when measured as a proportion, are necessarily bounded between zero and one, standard linear models may not provide an accurate picture of the effects of spending on pass rates throughout the entire distribution of spending. In particular, if pass rates depend on spending, the relationship must be bounded – otherwise, pass rates are eventually predicted to be greater than one.

Some of the most convincing studies examining the link between student achievement and spending have used panel data, particularly when the time periods straddle a policy change that induces (arguably) exogenous variation in spending. Yet standard linear panel data models are not well suited to pass rates because it is difficult to impose a positive yet bounded effect of spending on pass rates. In this paper, we extend our earlier work on fractional response models for cross section data to panel data. We allow unobserved time-constant district effects – which capture historical differences among districts – to be systematically related to district spending. In some specifications, we also allow spending to be correlated with time-varying unobserved inputs, such as the average quality of the students in a particular grade, or parental effort.

Using data on a fourth-grade math test for Michigan from 1992 through 2001, which includes significant changes in funding that resulted from Proposal A, we use a probit functional form for the mean response to impose a bounded effect of spending on pass rates. Given a 10% increase in four-year averaged spending, the estimated average effect on the pass rate varies from about three to six percentage points, with the higher estimate occurring when spending is allowed to be correlated with unobserved time-varying inputs. In the latter case, as spending varies from the 5th percentile to the 95th percentile, the estimated effect on the pass rate falls by roughly three percentage points, a nontrivial but not overwhelming change. The estimate for the linear model lies between the marginal effects at the extreme values of spending. Therefore, the linear approximation does a good job in estimating the average effect of spending on pass rates, even though it misses some of the nonlinear effects at more extreme spending levels.

∗ Corresponding author. Tel.: +1 517 353 5972; fax: +1 517 432 1068.
E-mail address: wooldri1@msu.edu (J.M. Wooldridge).
0304-4076/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2008.05.009

1. Introduction

In 1994, voters in Michigan passed Proposal A, which led to major changes in the way K-12 education is financed. The system went from one largely based on local property tax revenues to funding at the state level, supported primarily by an increase in the sales tax rate. One consequence of this change is that the lowest spending districts were provided with a foundation
allowance significantly above their previous per-student funding. As described in Papke (2005), the change in funding resulted in a natural experiment that can be used to more precisely estimate the effects of per-student spending on student performance.

Papke (2005) used building-level panel data, for 1993 through 1998, and found nontrivial effects of spending on the pass rate on a statewide fourth-grade math test. One potential drawback of Papke's analysis is that she used linear functional forms in her fixed effects and instrumental variables fixed effects analyses, which ignore the bounded nature of a pass rate (either a percentage or a proportion). Papke did split the sample into districts that initially were performing below the median and those performing above the median, and found very different effects. But such sample splitting is necessarily arbitrary and begs the question as to whether linear functional forms adequately capture the diminishing effects of spending at already high levels of spending.

Empirical studies attempting to explain fractional responses have proliferated in recent years. Just a few examples of fractional responses include pension plan participation rates, industry market shares, television ratings, fraction of land area allocated to agriculture, and test pass rates. Researchers have begun to take seriously the functional form issues that arise with a fractional response: a linear functional form for the conditional mean might miss important nonlinearities. Further, the traditional solution of using the log-odds transformation obviously fails when we observe responses at the corners, zero and one. Just as importantly, even in cases where the variable is strictly inside the unit interval, we cannot recover the expected value of the fractional response from a linear model for the log-odds ratio unless we make strong independence assumptions.

In Papke and Wooldridge (1996), we proposed direct models for the conditional mean of the fractional response that keep the predicted values in the unit interval. We applied the method of quasi-maximum likelihood estimation (QMLE) to obtain robust estimators of the conditional mean parameters with satisfactory efficiency properties. The most common of those methods, where the mean function takes the logistic form, has since been applied in numerous empirical studies, including Hausman and Leonard (1997), Liu et al. (1999), and Wagner (2001). (In a private communication shortly after the publication of Papke and Wooldridge (1996), in which he kindly provided Stata® code, John Mullahy dubbed the method of quasi-MLE with a logistic mean function "fractional logit", or "flogit" for short.)

Hausman and Leonard (1997) applied fractional logit to panel data on television ratings of National Basketball Association games to estimate the effects of superstars on telecast ratings. In using pooled QMLE with panel data, the only extra complication is in ensuring that the standard errors are robust to arbitrary serial correlation (in addition to misspecification of the conditional variance). But a more substantive issue arises with panel data and a nonlinear response function: How can we account for unobserved heterogeneity that is possibly correlated with the explanatory variables?

Wagner (2003) analyzes a large panel data set of firms to explain the export-sales ratio as a function of firm size. Wagner explicitly includes firm-specific intercepts in the fractional logit model, a strategy suggested by Hardin and Hilbe (2007) when one observes the entire population (as in Wagner's case, because he observes all firms in an industry). Generally, while including dummies for each cross section observation allows unobserved heterogeneity to enter in a flexible way, it suffers from an incidental parameters problem under random sampling when $T$ (the number of time periods) is small and $N$ (the number of cross-sectional observations) is large. In particular, with fixed $T$, the estimators of the fixed effects are inconsistent as $N \to \infty$, and this inconsistency transmits itself to the coefficients on the common slope coefficients. The statistical properties of parameter estimates and partial effects of so-called "fixed effects fractional logit" are largely unknown with small $T$. (Hausman and Leonard (1997) include team "fixed effects" in their analysis, but these parameters can be estimated with precision because Hausman and Leonard have many telecasts per team. Therefore, there is no incidental parameters problem in the Hausman and Leonard setup.)

In this paper we extend our earlier work and show how to specify, and estimate, fractional response models for panel data with a large cross-sectional dimension and relatively few time periods. We explicitly allow for time-constant unobserved effects that can be correlated with explanatory variables. We cover two cases. The first is when, conditional on an unobserved effect, the explanatory variables are strictly exogenous. We then relax the strict exogeneity assumption when instrumental variables are available.

Rather than treating the unobserved effects as parameters to estimate, we employ the Mundlak (1978) and Chamberlain (1980) device of modeling the distribution of the unobserved effect conditional on the strictly exogenous variables. To accommodate this approach, we exploit features of the normal distribution. Therefore, unlike in our early work, where we focused mainly on the logistic response function, here we use a probit response function. In binary response contexts, the choice between the logistic and probit conditional mean functions for the structural expectation is largely a matter of taste, although it has long been recognized that, for handling endogenous explanatory variables, the probit mean function has some distinct advantages. We further exploit those advantages for panel data models in this paper. As we will see, the probit response function results in very simple estimation methods. While our focus is on fractional responses, our methods apply to the binary response case with a continuous endogenous explanatory variable and unobserved heterogeneity.

An important feature of our work is that we provide simple estimates of the partial effects averaged across the population – sometimes called the "average partial effects" (APEs) or "population averaged effects". These turn out to be identified under no assumptions on the serial dependence in the response variable, and the suspected endogenous explanatory variable is allowed to arbitrarily correlate with unobserved shocks in other time periods.

The rest of the paper is organized as follows. Section 2 introduces the model and assumptions for the case of strictly exogenous explanatory variables, and shows how to identify the APEs. Section 3 discusses estimation methods, including pooled QMLE and an extension of the generalized estimating equation (GEE) approach. Section 4 relaxes the strict exogeneity assumption, and shows how control function methods can be combined with the Mundlak–Chamberlain device to produce consistent estimators. Section 5 applies the new methods to estimate the effects of spending on math test pass rates for Michigan, and Section 6 summarizes the policy implications of our work.

2. Models and quantities of interest for strictly exogenous explanatory variables

We assume that a random sample in the cross section is available, and that we have available $T$ observations, $t = 1, \ldots, T$, for each random draw $i$. For cross-sectional observation $i$ and time period $t$, the response variable is $y_{it}$, $0 \le y_{it} \le 1$, where outcomes at the endpoints, zero and one, are allowed. (In fact, $y_{it}$ could be a binary response.) For a set of explanatory variables $x_{it}$, a $1 \times K$ vector, we assume

$$E(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}\beta + c_i), \quad t = 1, \ldots, T. \tag{2.1}$$
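
To see why an index form such as (2.1) respects the unit interval while a linear mean does not, the following Python sketch evaluates the probit mean over a grid of a single covariate and compares it with a straight line tangent to it at zero. It takes Φ to be the standard normal cdf. The values β = 0.8 and c = −0.2, the grid, and all function names are illustrative assumptions, not estimates or code from the paper.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def conditional_mean(x, beta, c):
    """E(y | x, c) = Phi(x * beta + c) for a scalar covariate, as in (2.1)."""
    return std_normal_cdf(x * beta + c)

# Hypothetical parameter values, chosen only for illustration.
BETA = 0.8
C = -0.2

x_grid = [-5.0 + k for k in range(11)]  # -5, -4, ..., 5
probit_mean = [conditional_mean(x, BETA, C) for x in x_grid]

# Linear approximation tangent to the probit mean at x = 0.
intercept = std_normal_cdf(C)
slope = BETA * std_normal_pdf(C)
linear_mean = [intercept + slope * x for x in x_grid]

for x, p, lin in zip(x_grid, probit_mean, linear_mean):
    print(f"x = {x:5.1f}   probit mean = {p:.4f}   linear mean = {lin:.4f}")
```

On this grid the probit mean stays strictly between zero and one, whereas the tangent line leaves the unit interval at both ends. That is the boundedness problem a linear functional form runs into at extreme covariate values.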

In (2.1), $\Phi(\cdot)$ is the standard normal cumulative distribution function (cdf). Assumption (2.1) is a convenient functional form assumption. Specifically, the conditional expectation is assumed to be of the index form, where the unobserved effect, $c_i$, appears additively inside the standard normal cdf, $\Phi(\cdot)$.

The use of $\Phi$ in (2.1) deserves comment. In Papke and Wooldridge (1996), we allowed a general function $G(\cdot)$ in place of $\Phi(\cdot)$ but then, for our application to pension plan participation rates, we focused on the logistic function, $\Lambda(z) \equiv \exp(z)/[1 + \exp(z)]$. As we proceed it will be clear that using $\Lambda$ in place of $\Phi$ in (2.1) causes no conceptual or theoretical difficulties; rather, the probit function leads to computationally simple estimators in the presence of unobserved heterogeneity or endogenous explanatory variables. Therefore, in this paper we restrict our attention to Eq. (2.1).

Because $\Phi$ is strictly monotonic, the elements of $\beta$ give the directions of the partial effects. For example, dropping the observation index $i$, if $x_{tj}$ is continuous, then

$$\frac{\partial E(y_t \mid x_t, c)}{\partial x_{tj}} = \beta_j \phi(x_t\beta + c). \tag{2.2}$$

For discrete changes in one or more of the explanatory variables, we compute

$$\Phi(x_t^{(1)}\beta + c) - \Phi(x_t^{(0)}\beta + c), \tag{2.3}$$

where $x_t^{(0)}$ and $x_t^{(1)}$ are two different values of the covariates.

Eqs. (2.2) and (2.3) reveal that the partial effects depend on the level of covariates and the unobserved heterogeneity. Because $x_t$ is observed, we have a pretty good idea about interesting values to plug in. Or, we can always average the partial effects across the sample $\{x_{it} : i = 1, \ldots, N\}$ on $x_t$. But $c$ is not observed. A popular measure of the importance of the observed covariates is to average the partial effects across the distribution of $c$, to obtain the average partial effects. For example, in the continuous case, the APE with respect to $x_{tj}$, evaluated at $x_t$, is

$$E_c[\beta_j \phi(x_t\beta + c)] = \beta_j E_c[\phi(x_t\beta + c)], \tag{2.4}$$

which depends on $x_t$ (and, of course, $\beta$) but not on $c$. Similarly, we get APEs for discrete changes by averaging (2.3) across the distribution of $c$.

Without further assumptions, neither $\beta$ nor the APEs are known to be identified. In this section, we add two assumptions to (2.1). The first concerns the exogeneity of $\{x_{it} : t = 1, \ldots, T\}$. We assume that, conditional on $c_i$, $\{x_{it} : t = 1, \ldots, T\}$ is strictly exogenous:

$$E(y_{it} \mid x_i, c_i) = E(y_{it} \mid x_{it}, c_i), \quad t = 1, \ldots, T, \tag{2.5}$$

where $x_i \equiv (x_{i1}, \ldots, x_{iT})$ is the set of covariates in all time periods. Assumption (2.5) is common in unobserved effects panel data models, but it rules out lagged dependent variables in $x_{it}$, as well as other explanatory variables that may react to past changes in $y_{it}$. Plus, it rules out traditional simultaneity and correlation between time-varying omitted variables and the covariates.

We also need to restrict the distribution of $c_i$ given $x_i$ in some way. While semiparametric methods are possible, in this paper we propose a conditional normality assumption, as in Chamberlain (1980):

$$c_i \mid (x_{i1}, x_{i2}, \ldots, x_{iT}) \sim \mathrm{Normal}(\psi + \bar{x}_i\xi, \sigma_a^2), \tag{2.6}$$

where $\bar{x}_i \equiv T^{-1}\sum_{t=1}^{T} x_{it}$ is the $1 \times K$ vector of time averages. As we will see, (2.6) leads to straightforward estimation of the parameters $\beta_j$ up to a common scale factor, as well as consistent estimators of the APEs. Adding nonlinear functions in $\bar{x}_i$ in the conditional mean, such as squares and cross products, is straightforward. It is convenient to assume only the time average appears in $D(c_i \mid x_i)$ as a way of conserving on degrees-of-freedom. But an unrestricted Chamberlain (1980) device, where we allow each $x_{it}$ to have a separate vector of coefficients, is also possible.

Another way to relax (2.6) would be to allow for heteroskedasticity, with a convenient specification being $\mathrm{Var}(c_i \mid x_i) = \mathrm{Var}(c_i \mid \bar{x}_i) = \sigma_a^2 \exp(\bar{x}_i\lambda)$. As shown in Wooldridge (2002, Problem 15.18), the APEs are still identified, and a similar argument works here as well. The normality assumption can be relaxed, too, at the cost of computational simplicity. More generally, one might simply assume $D(c_i \mid x_i) = D(c_i \mid \bar{x}_i)$, which, combined with assumption (2.5), implies that $E(y_{it} \mid x_i) = E(y_{it} \mid x_{it}, \bar{x}_i)$. Then, we could use flexible models for the latter expectation. In this paper, we focus on (2.6) because it leads to especially straightforward estimation and allows simple comparison with linear models.

For some purposes, it is useful to write $c_i = \psi + \bar{x}_i\xi + a_i$, where $a_i \mid x_i \sim \mathrm{Normal}(0, \sigma_a^2)$. (Note that $\sigma_a^2 = \mathrm{Var}(c_i \mid x_i)$, the conditional variance of $c_i$.) Naturally, if we include time-period dummies in $x_{it}$, as is usually desirable, we do not include the time averages of these in $\bar{x}_i$.

Assumptions (2.1), (2.5) and (2.6) impose no additional distributional assumptions on $D(y_{it} \mid x_i, c_i)$, and they place no restrictions on the serial dependence in $\{y_{it}\}$ across time. Nevertheless, the elements of $\beta$ are easily shown to be identified up to a positive scale factor, and the APEs are identified. A simple way to see this is to write

$$E(y_{it} \mid x_i, a_i) = \Phi(\psi + x_{it}\beta + \bar{x}_i\xi + a_i) \tag{2.7}$$

and so

$$\begin{aligned} E(y_{it} \mid x_i) &= E[\Phi(\psi + x_{it}\beta + \bar{x}_i\xi + a_i) \mid x_i] \\ &= \Phi[(\psi + x_{it}\beta + \bar{x}_i\xi)/(1 + \sigma_a^2)^{1/2}] \end{aligned} \tag{2.8}$$

or

$$E(y_{it} \mid x_i) \equiv \Phi(\psi_a + x_{it}\beta_a + \bar{x}_i\xi_a), \tag{2.9}$$

where the subscript $a$ denotes division of the original coefficient by $(1 + \sigma_a^2)^{1/2}$. The second equality in (2.8) follows from a well-known mixing property of the normal distribution. (See, for example, Wooldridge (2002, Section 15.8.2) in the case of binary response; the argument is essentially the same.) Because we observe a random sample on $(y_{it}, x_{it}, \bar{x}_i)$, (2.9) implies that the scaled coefficients, $\psi_a$, $\beta_a$, and $\xi_a$ are identified, provided there are no perfect linear relationships among the elements of $x_{it}$ and that there is some time variation in all elements of $x_{it}$. (The latter requirement ensures that $x_{it}$ and $\bar{x}_i$ are not perfectly collinear for all $t$.) In addition, it follows from the same arguments in Wooldridge (2002, Section 15.8.2) that the average partial effects can be obtained by differentiating or differencing

$$E_{\bar{x}_i}[\Phi(\psi_a + x_t\beta_a + \bar{x}_i\xi_a)], \tag{2.10}$$

with respect to the elements of $x_t$. But, by the law of large numbers, (2.10) is consistently estimated by

$$N^{-1}\sum_{i=1}^{N} \Phi(\psi_a + x_t\beta_a + \bar{x}_i\xi_a). \tag{2.11}$$

Therefore, given consistent estimators of the scaled parameters, we can plug them into (2.11) and consistently estimate the APEs.

Before turning to estimation strategies, it is important to understand why we do not replace (2.1) with the logistic function $E(y_{it} \mid x_{it}, c_i) = \Lambda(x_{it}\beta + c_i)$ and try to eliminate $c_i$ by using conditional logit estimation (often called fixed effects logit). As discussed by Wooldridge (2002, Section 15.8.3), the logit conditional MLE is not known to be consistent unless the response variable is binary and, in addition to the strict exogeneity assumption (2.5), the $y_{it}$, $t = 1, \ldots, T$, are independent conditional on $(x_i, c_i)$. Therefore, even if the $y_{it}$ were binary
                    124                                                                                         L.E. Papke, J.M. Wooldridge / Journal of Econometrics 145 (2008) 121–133
responses, we would not necessarily want to use conditional logit to estimate $\beta$ because serial dependence is often an issue even after accounting for $c_i$. Plus, even if we could estimate $\beta$, we would not be able to estimate the average partial effects, or partial effects at interesting values of $c$. For all of these reasons, we follow the approach of specifying $D(c_i|x_i)$ and exploiting normality.

3. Estimation methods under strict exogeneity

Given (2.9), there are many consistent estimators of the scaled parameters. For simplicity, define $w_{it} \equiv (1, x_{it}, \bar{x}_i)$, a $1 \times (1 + 2K)$ vector, and let $\theta \equiv (\psi_a, \beta_a', \xi_a')'$. One simple estimator of $\theta$ is the pooled nonlinear least squares (PNLS) estimator with regression function $\Phi(w_{it}\theta)$. The PNLS estimator, while consistent and $\sqrt{N}$-asymptotically normal (with fixed $T$), is almost certainly inefficient, for two reasons. First, $\mathrm{Var}(y_{it}|x_i)$ is probably not homoskedastic because of the fractional nature of $y_{it}$. One possible alternative is to model $\mathrm{Var}(y_{it}|x_i)$ and then to use weighted least squares. In some cases – see Papke and Wooldridge (1996) for the cross-sectional case – the conditional variance can be shown to be

$$\mathrm{Var}(y_{it}|x_i) = \tau^2\,\Phi(w_{it}\theta)\,[1 - \Phi(w_{it}\theta)], \tag{3.1}$$

where $0 < \tau^2 \le 1$. Under (2.9) and (3.1), a natural estimator of $\theta$ is a pooled weighted nonlinear least squares (PWNLS) estimator, where PNLS would be used in the first stage to estimate the weights. But an even simpler estimator avoids the two-step estimation and is asymptotically equivalent to PWNLS: the pooled Bernoulli quasi-MLE (QMLE), which is obtained by maximizing the pooled probit log-likelihood. We call this the "pooled fractional probit" (PFP) estimator. The PFP estimator is trivial to obtain in econometrics packages that support standard probit estimation – provided, that is, the program allows for nonbinary response variables. The explanatory variables are specified as $(1, x_{it}, \bar{x}_i)$. Typically, a generalized linear models command is available, as in Stata®. In applying the Bernoulli QMLE, one needs to adjust the standard errors and test statistics to allow for arbitrary serial dependence across $t$. The standard errors that are robust to violations of (3.1) but assume serial independence are likely to be off substantially; most of the time, they would tend to be too small. Typically, standard errors and test statistics computed to be robust to serial dependence are also robust to arbitrary violations of (3.1), as they should be. (The "cluster" option in Stata® is a good example.)

A test of independence between the unobserved effect and $x_i$ is easily obtained as a test of $H_0: \xi_a = 0$. Naturally, it is best to make this test fully robust to serial correlation and a misspecified conditional variance.

Because the PFP estimator ignores the serial dependence in the joint distribution $D(y_{i1}, \ldots, y_{iT}|x_i)$ – which is likely to be substantial even after conditioning on $x_i$ – it can be inefficient compared with estimation methods that exploit the serial dependence. Yet modeling $D(y_{i1}, \ldots, y_{iT}|x_i)$ and applying maximum likelihood methods, while possible, is hardly trivial, especially for fractional responses that can have outcomes at the endpoints. Aside from computational difficulties, a full maximum likelihood estimator would produce nonrobust estimators of the parameters of the conditional mean and the APEs. In other words, if our model for $D(y_{i1}, \ldots, y_{iT}|x_i)$ is misspecified but $E(y_{it}|x_i)$ is correctly specified, the MLE will be inconsistent for the conditional mean parameters and resulting APEs. (Loudermilk (2007) uses a two-limit Tobit model in the case where a lagged dependent variable is included among the regressors. In such cases, a full joint distributional assumption is very difficult to relax. The two-limit Tobit model is ill-suited for our application because, although our response variable is bounded from below by zero, there are no observations at zero.) Our goal is to obtain consistent estimators under assumptions (2.1), (2.5) and (2.6) only. Nevertheless, we can gain some efficiency by exploiting serial dependence in a robust way.

Multivariate weighted nonlinear least squares (MWNLS) is ideally suited for estimating conditional means for panel data with strictly exogenous regressors in the presence of serial correlation and heteroskedasticity. What we require is a parametric model of $\mathrm{Var}(y_i|x_i)$, where $y_i$ is the $T \times 1$ vector of responses. The model in (3.1) is sensible for the conditional variances, but obtaining the covariances $\mathrm{Cov}(y_{it}, y_{ir}|x_i)$ is difficult, if not impossible, even if $\mathrm{Var}(y_i|x_i, c_i)$ has a fairly simple form (such as being diagonal). Therefore, rather than attempting to find $\mathrm{Var}(y_i|x_i)$, we use a "working" version of this variance, which we expect to be misspecified for $\mathrm{Var}(y_i|x_i)$. This is the approach underlying the generalized estimating equation (GEE) literature when applied to panel data, as described in Liang and Zeger (1986). In the current context we apply this approach after having modeled $D(c_i|x_i)$ to arrive at the conditional mean in (2.9).

It is important to understand that the GEE and MWNLS are asymptotically equivalent whenever they use the same estimates of the matrix $\mathrm{Var}(y_i|x_i)$. In other words, GEE is quite familiar to economists once we allow the model of $\mathrm{Var}(y_i|x_i)$ to be misspecified. To this end, let $V(x_i, \gamma)$ be a $T \times T$ positive definite matrix, which depends on a vector of parameters, $\gamma$, and on the entire history of the explanatory variables. Let $m(x_i, \theta)$ denote the conditional mean function for the vector $y_i$. Because we assume the mean function is correctly specified, let $\theta_o$ denote the value such that $E(y_i|x_i) = m(x_i, \theta_o)$. In order to apply MWNLS, we need to estimate the variance parameters. However, because this variance matrix is not assumed to be correctly specified, we simply assume that the estimator $\hat{\gamma}$ converges, at the standard $\sqrt{N}$ rate, to some value, say $\gamma^*$. In other words, $\gamma^*$ is defined as the probability limit of $\hat{\gamma}$ (which exists quite generally) and then we assume additionally that $\sqrt{N}(\hat{\gamma} - \gamma^*)$ is bounded in probability. This holds in regular parametric settings.

Because (3.1) is a sensible variance assumption, we follow the GEE literature and specify a "working correlation matrix". The most convenient working correlation structures, and those that are programmed in popular software packages, assume correlations that are not a function of $x_i$. For our purposes, we focus on a particular correlation matrix that is well-suited for panel data applications with small $T$. In the GEE literature, it is called an "exchangeable" correlation pattern, where we act as if the standardized errors have a constant correlation. To be precise, define, for each $i$, the errors as

$$u_{it} \equiv y_{it} - E(y_{it}|x_i) = y_{it} - m_t(x_i, \theta_o), \quad t = 1, \ldots, T, \tag{3.2}$$

where, in our application, $m_t(x_i, \theta) = \Phi(w_{it}\theta) = \Phi(\psi_a + x_{it}\beta_a + \bar{x}_i\xi_a)$. Generally, especially if $y_{it}$ is not an unbounded, continuous variable, the conditional correlations, $\mathrm{Corr}(u_{it}, u_{is}|x_i)$, are a function of $x_i$. Even if they were not a function of $x_i$, they would generally depend on $(t, s)$. A simple "working" assumption is that the correlations do not depend on $x_i$ and, in fact, are the same for all $(t, s)$ pairs. In the context of a linear model, this working assumption is identical to the standard assumption on the correlation matrix in a so-called random effects model; see, for example, Wooldridge (2002, Chapter 10).

If we believe the variance assumption (3.1), it makes sense to define standardized errors as

$$e_{it} \equiv u_{it} \Big/ \sqrt{\Phi(w_{it}\theta_o)\,[1 - \Phi(w_{it}\theta_o)]}; \tag{3.3}$$

under (3.1), $\mathrm{Var}(e_{it}|x_i) = \tau^2$. Then, the exchangeability assumption is that the pairwise correlations between pairs of standardized errors are constant, say $\rho$. Remember, this is a working assumption that leads to an estimated variance matrix to
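
As a rough illustration of the quantities defined in (3.1) to (3.3), the sketch below computes the conditional means $\Phi(\psi_a + x_{it}\beta_a + \bar{x}_i\xi_a)$, the standardized errors, and an exchangeable working correlation matrix for a single cross-sectional unit. It is not the authors' code. All function names are hypothetical, the parameter values are made up, and the working variance form $\tau^2 D^{1/2} R(\rho) D^{1/2}$ is assumed here as the usual GEE construction rather than taken from the text above. Only the Python standard library is used.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def time_average(x_i):
    """Time average xbar_i of one unit's T x K regressor history."""
    T = len(x_i)
    K = len(x_i[0])
    return [sum(row[k] for row in x_i) / T for k in range(K)]


def conditional_means(x_i, psi, beta, xi):
    """m_t(x_i, theta) = Phi(psi + x_it beta + xbar_i xi), t = 1, ..., T."""
    xbar = time_average(x_i)
    avg_part = sum(c * v for c, v in zip(xi, xbar))
    return [
        norm_cdf(psi + sum(b * v for b, v in zip(beta, row)) + avg_part)
        for row in x_i
    ]


def standardized_errors(y_i, means):
    """e_it = (y_it - m_t) / sqrt(m_t (1 - m_t)), as in (3.3)."""
    return [(y - m) / math.sqrt(m * (1.0 - m)) for y, m in zip(y_i, means)]


def exchangeable_corr(T, rho):
    """T x T working correlation matrix with constant off-diagonal rho."""
    return [[1.0 if t == s else rho for s in range(T)] for t in range(T)]


def working_variance(means, rho, tau2=1.0):
    """Assumed GEE-style working variance: tau2 * D^(1/2) R(rho) D^(1/2),
    where D = diag(m_t (1 - m_t)) follows the variance function in (3.1)."""
    T = len(means)
    sd = [math.sqrt(m * (1.0 - m)) for m in means]
    R = exchangeable_corr(T, rho)
    return [[tau2 * sd[t] * R[t][s] * sd[s] for s in range(T)] for t in range(T)]


if __name__ == "__main__":
    # Made-up data for one unit: T = 3 periods, K = 2 regressors.
    x_i = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    y_i = [0.75, 0.50, 0.25]
    # Made-up parameter values; in practice these would be estimates.
    means = conditional_means(x_i, psi=0.1, beta=[0.2, -0.1], xi=[0.05, 0.0])
    print(standardized_errors(y_i, means))
    print(working_variance(means, rho=0.5))
```

In an actual application the parameters would come from a first-stage estimator such as the pooled fractional probit, and $\rho$ would be estimated from the standardized residuals; neither step is shown here.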