Journal of Econometrics 145 (2008) 121–133
Contents lists available at ScienceDirect
Journal of Econometrics
journal homepage: www.elsevier.com/locate/jeconom

Panel data methods for fractional response variables with an application to test pass rates

Leslie E. Papke, Jeffrey M. Wooldridge∗
Department of Economics, Michigan State University, East Lansing, MI 48824-1038, United States

∗ Corresponding author. Tel.: +1 517 353 5972; fax: +1 517 432 1068. E-mail address: wooldri1@msu.edu (J.M. Wooldridge).

Article info
Article history: Available online 19 June 2008
JEL classification: C23, C25
Keywords: Fractional response; Panel data; Unobserved effects; Probit; Partial effects

Abstract
We revisit the effects of spending on student performance using data from the state of Michigan. In addition to exploiting a dramatic change in funding in the mid-1990s and subsequent nonsmooth changes, we propose nonlinear panel data models that recognize the bounded nature of the pass rate. Importantly, we show how to estimate average partial effects, which can be compared across many different models (linear and nonlinear) under different assumptions and estimated using many different methods. We find that spending has nontrivial and statistically significant effects, although the diminishing effect is not especially pronounced.
© 2008 Elsevier B.V. All rights reserved.
0304-4076/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2008.05.009

Executive summary

Determining the effects of school inputs on student performance in the United States is an important policy issue. Discussions of increased funding for K-12 education, as well as the implications for equalized funding across schools, rely on benefits measured in terms of student performance. In many states, including Michigan, success is measured – and reported widely in the press – in terms of pass rates on statewide standardized tests. Because pass rates, when measured as a proportion, are necessarily bounded between zero and one, standard linear models may not provide an accurate picture of the effects of spending on pass rates throughout the entire distribution of spending. In particular, if pass rates depend on spending, the relationship must be bounded – otherwise, pass rates are eventually predicted to be greater than one.

Some of the most convincing studies examining the link between student achievement and spending have used panel data, particularly when the time periods straddle a policy change that induces (arguably) exogenous variation in spending. Yet standard linear panel data models are not well suited to pass rates because it is difficult to impose a positive yet bounded effect of spending on pass rates. In this paper, we extend our earlier work on fractional response models for cross section data to panel data. We allow unobserved time-constant district effects – which capture historical differences among districts – to be systematically related to district spending. In some specifications, we also allow spending to be correlated with time-varying unobserved inputs, such as the average quality of the students in a particular grade, or parental effort.

Using data on a fourth-grade math test for Michigan from 1992 through 2001, which includes significant changes in funding that resulted from Proposal A, we use a probit functional form for the mean response to impose a bounded effect of spending on pass rates. Given a 10% increase in four-year averaged spending, the estimated average effect on the pass rate varies from about three to six percentage points, with the higher estimate occurring when spending is allowed to be correlated with unobserved time-varying inputs. In the latter case, as spending varies from the 5th percentile to the 95th percentile, the estimated effect on the pass rate falls by roughly three percentage points – a nontrivial but not overwhelming change. The estimate for the linear model lies between the marginal effects at the extreme values of spending. Therefore, the linear approximation does a good job in estimating the average effect of spending on pass rates, even though it misses some of the nonlinear effects at more extreme spending levels.

1. Introduction

In 1994, voters in Michigan passed Proposal A, which led to major changes in the way K-12 education is financed. The system went from one largely based on local property tax revenues to funding at the state level, supported primarily by an increase in the sales tax rate. One consequence of this change is that the lowest spending districts were provided with a foundation allowance significantly above their previous per-student funding. As described in Papke (2005), the change in funding resulted in a natural experiment that can be used to more precisely estimate the effects of per-student spending on student performance.

Papke (2005) used building-level panel data, for 1993 through 1998, and found nontrivial effects of spending on the pass rate on a statewide fourth-grade math test. One potential drawback of Papke's analysis is that she used linear functional forms in her fixed effects and instrumental variables fixed effects analyses, which ignore the bounded nature of a pass rate (either a percentage or a proportion). Papke did split the sample into districts that initially were performing below the median and those performing above the median, and found very different effects. But such sample splitting is necessarily arbitrary and raises the question of whether linear functional forms adequately capture the diminishing effects of spending at already high levels of spending.

Empirical studies attempting to explain fractional responses have proliferated in recent years. Just a few examples of fractional responses include pension plan participation rates, industry market shares, television ratings, fraction of land area allocated to agriculture, and test pass rates. Researchers have begun to take seriously the functional form issues that arise with a fractional response: a linear functional form for the conditional mean might miss important nonlinearities. Further, the traditional solution of using the log-odds transformation obviously fails when we observe responses at the corners, zero and one. Just as importantly, even in cases where the variable is strictly inside the unit interval, we cannot recover the expected value of the fractional response from a linear model for the log-odds ratio unless we make strong independence assumptions.

In Papke and Wooldridge (1996), we proposed direct models for the conditional mean of the fractional response that keep the predicted values in the unit interval. We applied the method of quasi-maximum likelihood estimation (QMLE) to obtain robust estimators of the conditional mean parameters with satisfactory efficiency properties. The most common of those methods, where the mean function takes the logistic form, has since been applied in numerous empirical studies, including Hausman and Leonard (1997), Liu et al. (1999), and Wagner (2001). (In a private communication shortly after the publication of Papke and Wooldridge (1996), in which he kindly provided Stata® code, John Mullahy dubbed the method of quasi-MLE with a logistic mean function "fractional logit", or "flogit" for short.)

Hausman and Leonard (1997) applied fractional logit to panel data on television ratings of National Basketball Association games to estimate the effects of superstars on telecast ratings. In using pooled QMLE with panel data, the only extra complication is in ensuring that the standard errors are robust to arbitrary serial correlation (in addition to misspecification of the conditional variance). But a more substantive issue arises with panel data and a nonlinear response function: How can we account for unobserved heterogeneity that is possibly correlated with the explanatory variables?

Wagner (2003) analyzes a large panel data set of firms to explain the export-sales ratio as a function of firm size. Wagner explicitly includes firm-specific intercepts in the fractional logit model, a strategy suggested by Hardin and Hilbe (2007) when one observes the entire population (as in Wagner's case, because he observes all firms in an industry). Generally, while including dummies for each cross section observation allows unobserved heterogeneity to enter in a flexible way, it suffers from an incidental parameters problem under random sampling when $T$ (the number of time periods) is small and $N$ (the number of cross-sectional observations) is large. In particular, with fixed $T$, the estimators of the fixed effects are inconsistent as $N \to \infty$, and this inconsistency transmits itself to the estimators of the common slope coefficients. The statistical properties of parameter estimates and partial effects of so-called "fixed effects fractional logit" are largely unknown with small $T$. (Hausman and Leonard (1997) include team "fixed effects" in their analysis, but these parameters can be estimated with precision because Hausman and Leonard have many telecasts per team. Therefore, there is no incidental parameters problem in the Hausman and Leonard setup.)

In this paper we extend our earlier work and show how to specify, and estimate, fractional response models for panel data with a large cross-sectional dimension and relatively few time periods. We explicitly allow for time-constant unobserved effects that can be correlated with explanatory variables. We cover two cases. The first is when, conditional on an unobserved effect, the explanatory variables are strictly exogenous. We then relax the strict exogeneity assumption when instrumental variables are available.

Rather than treating the unobserved effects as parameters to estimate, we employ the Mundlak (1978) and Chamberlain (1980) device of modeling the distribution of the unobserved effect conditional on the strictly exogenous variables. To accommodate this approach, we exploit features of the normal distribution. Therefore, unlike in our earlier work, where we focused mainly on the logistic response function, here we use a probit response function. In binary response contexts, the choice between the logistic and probit conditional mean functions for the structural expectation is largely a matter of taste, although it has long been recognized that, for handling endogenous explanatory variables, the probit mean function has some distinct advantages. We further exploit those advantages for panel data models in this paper. As we will see, the probit response function results in very simple estimation methods. While our focus is on fractional responses, our methods apply to the binary response case with a continuous endogenous explanatory variable and unobserved heterogeneity.

An important feature of our work is that we provide simple estimates of the partial effects averaged across the population – sometimes called the "average partial effects" (APEs) or "population averaged effects". These turn out to be identified under no assumptions on the serial dependence in the response variable, and the suspected endogenous explanatory variable is allowed to arbitrarily correlate with unobserved shocks in other time periods.

The rest of the paper is organized as follows. Section 2 introduces the model and assumptions for the case of strictly exogenous explanatory variables, and shows how to identify the APEs. Section 3 discusses estimation methods, including pooled QMLE and an extension of the generalized estimating equation (GEE) approach. Section 4 relaxes the strict exogeneity assumption, and shows how control function methods can be combined with the Mundlak–Chamberlain device to produce consistent estimators. Section 5 applies the new methods to estimate the effects of spending on math test pass rates for Michigan, and Section 6 summarizes the policy implications of our work.

2. Models and quantities of interest for strictly exogenous explanatory variables

We assume that a random sample in the cross section is available, and that we have available $T$ observations, $t = 1, \ldots, T$, for each random draw $i$. For cross-sectional observation $i$ and time period $t$, the response variable is $y_{it}$, $0 \le y_{it} \le 1$, where outcomes at the endpoints, zero and one, are allowed. (In fact, $y_{it}$ could be a binary response.) For a set of explanatory variables $x_{it}$, a $1 \times K$ vector, we assume

$$E(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}\beta + c_i), \quad t = 1, \ldots, T, \qquad (2.1)$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function (cdf). Assumption (2.1) is a convenient functional form assumption. Specifically, the conditional expectation is assumed to be of the index form, where the unobserved effect, $c_i$, appears additively inside the standard normal cdf, $\Phi(\cdot)$.

The use of $\Phi$ in (2.1) deserves comment. In Papke and Wooldridge (1996), we allowed a general function $G(\cdot)$ in place of $\Phi(\cdot)$ but then, for our application to pension plan participation rates, we focused on the logistic function, $\Lambda(z) \equiv \exp(z)/[1 + \exp(z)]$. As we proceed it will be clear that using $\Lambda$ in place of $\Phi$ in (2.1) causes no conceptual or theoretical difficulties; rather, the probit function leads to computationally simple estimators in the presence of unobserved heterogeneity or endogenous explanatory variables. Therefore, in this paper we restrict our attention to Eq. (2.1).

Because $\Phi$ is strictly monotonic, the elements of $\beta$ give the directions of the partial effects. For example, dropping the observation index $i$, if $x_{tj}$ is continuous, then

$$\frac{\partial E(y_t \mid x_t, c)}{\partial x_{tj}} = \beta_j \phi(x_t\beta + c). \qquad (2.2)$$

For discrete changes in one or more of the explanatory variables, we compute

$$\Phi(x_t^{(1)}\beta + c) - \Phi(x_t^{(0)}\beta + c), \qquad (2.3)$$

where $x_t^{(0)}$ and $x_t^{(1)}$ are two different values of the covariates.

Eqs. (2.2) and (2.3) reveal that the partial effects depend on the level of covariates and the unobserved heterogeneity. Because $x_t$ is observed, we have a pretty good idea about interesting values to plug in. Or, we can always average the partial effects across the sample $\{x_{it} : i = 1, \ldots, N\}$ on $x_t$. But $c$ is not observed. A popular measure of the importance of the observed covariates is to average the partial effects across the distribution of $c$, to obtain the average partial effects. For example, in the continuous case, the APE with respect to $x_{tj}$, evaluated at $x_t$, is

$$E_c[\beta_j \phi(x_t\beta + c)] = \beta_j E_c[\phi(x_t\beta + c)], \qquad (2.4)$$

which depends on $x_t$ (and, of course, $\beta$) but not on $c$. Similarly, we get APEs for discrete changes by averaging (2.3) across the distribution of $c$.

Without further assumptions, neither $\beta$ nor the APEs are known to be identified. In this section, we add two assumptions to (2.1). The first concerns the exogeneity of $\{x_{it} : t = 1, \ldots, T\}$. We assume that, conditional on $c_i$, $\{x_{it} : t = 1, \ldots, T\}$ is strictly exogenous:

$$E(y_{it} \mid x_i, c_i) = E(y_{it} \mid x_{it}, c_i), \quad t = 1, \ldots, T, \qquad (2.5)$$

where $x_i \equiv (x_{i1}, \ldots, x_{iT})$ is the set of covariates in all time periods. Assumption (2.5) is common in unobserved effects panel data models, but it rules out lagged dependent variables in $x_{it}$, as well as other explanatory variables that may react to past changes in $y_{it}$. Plus, it rules out traditional simultaneity and correlation between time-varying omitted variables and the covariates.

We also need to restrict the distribution of $c_i$ given $x_i$ in some way. While semiparametric methods are possible, in this paper we propose a conditional normality assumption, as in Chamberlain (1980):

$$c_i \mid (x_{i1}, x_{i2}, \ldots, x_{iT}) \sim \mathrm{Normal}(\psi + \bar{x}_i\xi, \sigma_a^2), \qquad (2.6)$$

where $\bar{x}_i \equiv T^{-1}\sum_{t=1}^{T} x_{it}$ is the $1 \times K$ vector of time averages. As we will see, (2.6) leads to straightforward estimation of the parameters $\beta_j$ up to a common scale factor, as well as consistent estimators of the APEs. Adding nonlinear functions in $\bar{x}_i$ in the conditional mean, such as squares and cross products, is straightforward. It is convenient to assume only the time average appears in $D(c_i \mid x_i)$ as a way of conserving on degrees-of-freedom. But an unrestricted Chamberlain (1980) device, where we allow each $x_{it}$ to have a separate vector of coefficients, is also possible.

Another way to relax (2.6) would be to allow for heteroskedasticity, with a convenient specification being $\mathrm{Var}(c_i \mid x_i) = \mathrm{Var}(c_i \mid \bar{x}_i) = \sigma_a^2 \exp(\bar{x}_i\lambda)$. As shown in Wooldridge (2002, Problem 15.18), the APEs are still identified, and a similar argument works here as well. The normality assumption can be relaxed, too, at the cost of computational simplicity. More generally, one might simply assume $D(c_i \mid x_i) = D(c_i \mid \bar{x}_i)$, which, combined with assumption (2.5), implies that $E(y_{it} \mid x_i) = E(y_{it} \mid x_{it}, \bar{x}_i)$. Then, we could use flexible models for the latter expectation. In this paper, we focus on (2.6) because it leads to especially straightforward estimation and allows simple comparison with linear models.

For some purposes, it is useful to write $c_i = \psi + \bar{x}_i\xi + a_i$, where $a_i \mid x_i \sim \mathrm{Normal}(0, \sigma_a^2)$. (Note that $\sigma_a^2 = \mathrm{Var}(c_i \mid x_i)$, the conditional variance of $c_i$.) Naturally, if we include time-period dummies in $x_{it}$, as is usually desirable, we do not include the time averages of these in $\bar{x}_i$.

Assumptions (2.1), (2.5) and (2.6) impose no additional distributional assumptions on $D(y_{it} \mid x_i, c_i)$, and they place no restrictions on the serial dependence in $\{y_{it}\}$ across time. Nevertheless, the elements of $\beta$ are easily shown to be identified up to a positive scale factor, and the APEs are identified. A simple way to see this is to write

$$E(y_{it} \mid x_i, a_i) = \Phi(\psi + x_{it}\beta + \bar{x}_i\xi + a_i) \qquad (2.7)$$

and so

$$E(y_{it} \mid x_i) = E[\Phi(\psi + x_{it}\beta + \bar{x}_i\xi + a_i) \mid x_i] = \Phi\left[(\psi + x_{it}\beta + \bar{x}_i\xi)/(1 + \sigma_a^2)^{1/2}\right] \qquad (2.8)$$

or

$$E(y_{it} \mid x_i) \equiv \Phi(\psi_a + x_{it}\beta_a + \bar{x}_i\xi_a), \qquad (2.9)$$

where the subscript $a$ denotes division of the original coefficient by $(1 + \sigma_a^2)^{1/2}$. The second equality in (2.8) follows from a well-known mixing property of the normal distribution. (See, for example, Wooldridge (2002, Section 15.8.2) in the case of binary response; the argument is essentially the same.) Because we observe a random sample on $(y_{it}, x_{it}, \bar{x}_i)$, (2.9) implies that the scaled coefficients, $\psi_a$, $\beta_a$, and $\xi_a$, are identified, provided there are no perfect linear relationships among the elements of $x_{it}$ and that there is some time variation in all elements of $x_{it}$. (The latter requirement ensures that $x_{it}$ and $\bar{x}_i$ are not perfectly collinear for all $t$.) In addition, it follows from the same arguments in Wooldridge (2002, Section 15.8.2) that the average partial effects can be obtained by differentiating or differencing

$$E_{\bar{x}_i}\left[\Phi(\psi_a + x_t\beta_a + \bar{x}_i\xi_a)\right] \qquad (2.10)$$

with respect to the elements of $x_t$. But, by the law of large numbers, (2.10) is consistently estimated by

$$N^{-1}\sum_{i=1}^{N}\Phi(\psi_a + x_t\beta_a + \bar{x}_i\xi_a). \qquad (2.11)$$

Therefore, given consistent estimators of the scaled parameters, we can plug them into (2.11) and consistently estimate the APEs.

Before turning to estimation strategies, it is important to understand why we do not replace (2.1) with the logistic function $E(y_{it} \mid x_{it}, c_i) = \Lambda(x_{it}\beta + c_i)$ and try to eliminate $c_i$ by using conditional logit estimation (often called fixed effects logit). As discussed by Wooldridge (2002, Section 15.8.3), the logit conditional MLE is not known to be consistent unless the response variable is binary and, in addition to the strict exogeneity assumption (2.5), the $y_{it}$, $t = 1, \ldots, T$, are independent conditional on $(x_i, c_i)$. Therefore, even if the $y_{it}$ were binary responses, we would not necessarily want to use conditional logit to estimate $\beta$ because serial dependence is often an issue even after accounting for $c_i$. Plus, even if we could estimate $\beta$, we would not be able to estimate the average partial effects, or partial effects at interesting values of $c$. For all of these reasons, we follow the approach of specifying $D(c_i \mid x_i)$ and exploiting normality.

3. Estimation methods under strict exogeneity

Given (2.9), there are many consistent estimators of the scaled parameters. For simplicity, define $w_{it} \equiv (1, x_{it}, \bar{x}_i)$, a $1 \times (1 + 2K)$ vector, and let $\theta \equiv (\psi_a, \beta_a', \xi_a')'$. One simple estimator of $\theta$ is the pooled nonlinear least squares (PNLS) estimator with regression function $\Phi(w_{it}\theta)$. The PNLS estimator, while consistent and $\sqrt{N}$-asymptotically normal (with fixed $T$), is almost certainly inefficient, for two reasons. First, $\mathrm{Var}(y_{it} \mid x_i)$ is probably not homoskedastic because of the fractional nature of $y_{it}$. One possible alternative is to model $\mathrm{Var}(y_{it} \mid x_i)$ and then to use weighted least squares. In some cases – see Papke and Wooldridge (1996) for the cross-sectional case – the conditional variance can be shown to be

$$\mathrm{Var}(y_{it} \mid x_i) = \tau^2\,\Phi(w_{it}\theta)[1 - \Phi(w_{it}\theta)], \qquad (3.1)$$

where $0 < \tau^2 \le 1$. Under (2.9) and (3.1), a natural estimator of $\theta$ is a pooled weighted nonlinear least squares (PWNLS) estimator, where PNLS would be used in the first stage to estimate the weights. But an even simpler estimator avoids the two-step estimation and is asymptotically equivalent to PWNLS: the pooled Bernoulli quasi-MLE (QMLE), which is obtained by maximizing the pooled probit log-likelihood. We call this the "pooled fractional probit" (PFP) estimator. The PFP estimator is trivial to obtain in econometrics packages that support standard probit estimation – provided, that is, the program allows for nonbinary response variables. The explanatory variables are specified as $(1, x_{it}, \bar{x}_i)$. Typically, a generalized linear models command is available, as in Stata®. In applying the Bernoulli QMLE, one needs to adjust the standard errors and test statistics to allow for arbitrary serial dependence across $t$. The standard errors that are robust to violations of (3.1) but assume serial independence are likely to be off substantially; most of the time, they would tend to be too small. Typically, standard errors and test statistics computed to be robust to serial dependence are also robust to arbitrary violations of (3.1), as they should be. (The "cluster" option in Stata® is a good example.)

A test of independence between the unobserved effect and $x_i$ is easily obtained as a test of $H_0: \xi_a = 0$. Naturally, it is best to make this test fully robust to serial correlation and a misspecified conditional variance.

Because the PFP estimator ignores the serial dependence in the joint distribution $D(y_{i1}, \ldots, y_{iT} \mid x_i)$ – which is likely to be substantial even after conditioning on $x_i$ – it can be inefficient compared with estimation methods that exploit the serial dependence. Yet modeling $D(y_{i1}, \ldots, y_{iT} \mid x_i)$ and applying maximum likelihood methods, while possible, is hardly trivial, especially for fractional responses that can have outcomes at the endpoints. Aside from computational difficulties, a full maximum likelihood estimator would produce nonrobust estimators of the parameters of the conditional mean and the APEs. In other words, if our model for $D(y_{i1}, \ldots, y_{iT} \mid x_i)$ is misspecified but $E(y_{it} \mid x_i)$ is correctly specified, the MLE will be inconsistent for the conditional mean parameters and resulting APEs. (Loudermilk (2007) uses a two-limit Tobit model in the case where a lagged dependent variable is included among the regressors. In such cases, a full joint distributional assumption is very difficult to relax. The two-limit Tobit model is ill-suited for our application because, although our response variable is bounded from below by zero, there are no observations at zero.) Our goal is to obtain consistent estimators under assumptions (2.1), (2.5) and (2.6) only. Nevertheless, we can gain some efficiency by exploiting serial dependence in a robust way.

Multivariate weighted nonlinear least squares (MWNLS) is ideally suited for estimating conditional means for panel data with strictly exogenous regressors in the presence of serial correlation and heteroskedasticity. What we require is a parametric model of $\mathrm{Var}(y_i \mid x_i)$, where $y_i$ is the $T \times 1$ vector of responses. The model in (3.1) is sensible for the conditional variances, but obtaining the covariances $\mathrm{Cov}(y_{it}, y_{ir} \mid x_i)$ is difficult, if not impossible, even if $\mathrm{Var}(y_i \mid x_i, c_i)$ has a fairly simple form (such as being diagonal). Therefore, rather than attempting to find $\mathrm{Var}(y_i \mid x_i)$, we use a "working" version of this variance, which we expect to be misspecified for $\mathrm{Var}(y_i \mid x_i)$. This is the approach underlying the generalized estimating equation (GEE) literature when applied to panel data, as described in Liang and Zeger (1986). In the current context we apply this approach after having modeled $D(c_i \mid x_i)$ to arrive at the conditional mean in (2.9).

It is important to understand that GEE and MWNLS are asymptotically equivalent whenever they use the same estimates of the matrix $\mathrm{Var}(y_i \mid x_i)$. In other words, GEE is quite familiar to economists once we allow the model of $\mathrm{Var}(y_i \mid x_i)$ to be misspecified. To this end, let $V(x_i, \gamma)$ be a $T \times T$ positive definite matrix, which depends on a vector of parameters, $\gamma$, and on the entire history of the explanatory variables. Let $m(x_i, \theta)$ denote the conditional mean function for the vector $y_i$. Because we assume the mean function is correctly specified, let $\theta_o$ denote the value such that $E(y_i \mid x_i) = m(x_i, \theta_o)$. In order to apply MWNLS, we need to estimate the variance parameters. However, because this variance matrix is not assumed to be correctly specified, we simply assume that the estimator $\hat{\gamma}$ converges, at the standard $\sqrt{N}$ rate, to some value, say $\gamma^*$. In other words, $\gamma^*$ is defined as the probability limit of $\hat{\gamma}$ (which exists quite generally) and then we assume additionally that $\sqrt{N}(\hat{\gamma} - \gamma^*)$ is bounded in probability. This holds in regular parametric settings.

Because (3.1) is a sensible variance assumption, we follow the GEE literature and specify a "working correlation matrix". The most convenient working correlation structures, and those that are programmed in popular software packages, assume correlations that are not a function of $x_i$. For our purposes, we focus on a particular correlation matrix that is well-suited for panel data applications with small $T$. In the GEE literature, it is called an "exchangeable" correlation pattern, where we act as if the standardized errors have a constant correlation. To be precise, define, for each $i$, the errors as

$$u_{it} \equiv y_{it} - E(y_{it} \mid x_i) = y_{it} - m_t(x_i, \theta_o), \quad t = 1, \ldots, T, \qquad (3.2)$$

where, in our application, $m_t(x_i, \theta) = \Phi(w_{it}\theta) = \Phi(\psi_a + x_{it}\beta_a + \bar{x}_i\xi_a)$. Generally, especially if $y_{it}$ is not an unbounded, continuous variable, the conditional correlations, $\mathrm{Corr}(u_{it}, u_{is} \mid x_i)$, are a function of $x_i$. Even if they were not a function of $x_i$, they would generally depend on $(t, s)$. A simple "working" assumption is that the correlations do not depend on $x_i$ and, in fact, are the same for all $(t, s)$ pairs. In the context of a linear model, this working assumption is identical to the standard assumption on the correlation matrix in a so-called random effects model; see, for example, Wooldridge (2002, Chapter 10).

If we believe the variance assumption (3.1), it makes sense to define standardized errors as

$$e_{it} \equiv u_{it} \Big/ \sqrt{\Phi(w_{it}\theta_o)[1 - \Phi(w_{it}\theta_o)]}; \qquad (3.3)$$

under (3.1), $\mathrm{Var}(e_{it} \mid x_i) = \tau^2$. Then, the exchangeability assumption is that the pairwise correlations between pairs of standardized errors are constant, say $\rho$. Remember, this is a working assumption that leads to an estimated variance matrix to