ABSTRACT. This paper investigates forecasting accuracy of four
different hedonic approaches, when vacant urban land prices are
predicted in local markets. The investigated hedonic approaches are: 1)
ordinary least squares estimation, 2) robust MM-estimation, 3)
structural time series estimation and 4) robust local regression.
Post-sample predictive testing indicated that more accurate predictions
are obtained if the unorthodox methods of this paper are used instead of
the conventional least squares estimation. In particular, the predictive
unbiasedness can be improved significantly when using the unconventional
hedonic methods of the study. The paper also studied the structure of
urban land prices. The most important attribute variables in explaining
land prices were permitted building volume, house price index, northing
and easting. The influence of the parcel size variable and the various
indicator variables on land prices was much weaker.
KEYWORDS: Land price; Hedonic model; Prediction; Robustness;
Flexibility
SUMMARY
This paper examines how accurately four different hedonic methods
predict the prices of vacant land parcels in local urban markets. The
hedonic methods examined are: 1) the least squares method, 2)
MM-estimation, 3) structural time series estimation and 4) local
regression analysis. Post-sample predictive testing showed that more
accurate predictions are obtained with the unconventional methods
presented in this paper than with the usual least squares method. Using
the unconventional hedonic methods of the study can considerably improve
the unbiasedness of predictions. The paper also examined the structure
of urban land prices. Among the attribute variables, the most important
in explaining land prices were permitted building size, the house price
index and parcel location. The parcel size variable and the various
indicator variables had a much weaker influence on land prices.
1. INTRODUCTION
Hedonic methods are often advocated in complex land valuation
assignments because they can reduce systematic valuation error
objectively and can produce, validly and reliably, the quality
adjustments required by the differentiated nature of individual land
parcels. However, the use of hedonic models is plagued with some
fundamental problems imposing serious threats to their empirical
adequacy. These fundamental dilemmas include: (1) the temporal
variability of land prices, (2) the spatial variability of land prices,
(3) the model specification dilemma and (4) outlying and influential
observations.
When investigating the temporal dimension of land prices it is
important to understand that the behaviour of land prices is generally
nonstationary. This is a typical characteristic of many economic time
series, which means that the data-generating process that produces the
observables is itself transient in time. The effect of time is also
multidimensional: the price trend, the price cycle, seasonal variation
and random variation can often legitimately be separated from each
other. Traditionally, when modelling temporal land price movements, the
effect of time has been reduced to the variation of a cost-of-living
index or a house price index, which is then used as an explanatory
variable in a hedonic regression. The indicator variable technique
(i.e. yearly time dummy variables) has also been a very popular approach
to analysing the temporal dimension of land prices. These approaches are
problematic mainly because they can estimate the influence of time only
coarsely in practice. Structural time series models, on the other hand,
usually provide a more accurate description of temporal movements.
The spatial variation of land prices can be divided into spatial
heterogeneity and spatial dependence. Spatial heterogeneity implies that
functional forms and parameters vary with location and are not
homogeneous throughout the data set, whereas spatial dependence implies
that the variation is a function of distance. The spatial dependence
problem can usually be solved by including location or distance
variables in a hedonic regression as explanatory variables. The
spatial heterogeneity problem is usually harder: one natural
solution would be to narrow the analysis to reasonably small
submarkets, which homogenises the data. In practice, however, this
is typically not feasible because too few observations remain for
hedonic modelling. Adaptive modelling techniques, such as local
regression, usually provide a better solution because they adapt to
location and thus address spatial heterogeneity explicitly.
The model specification dilemma can be approached in three different
ways: (1) parametrically, (2) semiparametrically and (3)
nonparametrically. Parametric modelling is the classical approach in the
hedonic modelling of land prices; it is theory-laden because
pre-specified functional forms are used in the analysis. Nonparametric
techniques, on the other hand, are data-driven, very flexible tools, and
semiparametric techniques combine features of the parametric and
nonparametric approaches. The exact research problem determines which
approach should be used. Generally, nonparametric methods are useful
when associations between variables are complex (i.e. highly nonlinear)
and theoretically unknown. Parametric models apply well to a less
complex setting where valid prior knowledge exists about the
model's functional form. Irrespective of the chosen approach, the
model specification dilemma involves the choice of the hedonic
model's functional form, the selection of relevant study variables
and an error distribution assumption. It should also be noted that the
result depends on the chosen scale, which is, however, often implicit.
Parametric models, which represent the data modelling culture
(Breiman, 2001), have formed the conventional dogma of hedonic pricing
methods in land price studies, where prespecified global models are
estimated by means of ordinary least squares or some modification
thereof. The benefits of parametric approaches undeniably include
simplicity, interpretability, parsimony and a comprehensive statistical
theory. The fundamental obstacle to the general use of parametric
models, however, is their inflexibility, i.e. their inability to learn
the genuine structure of the hedonic relationship from the evidence in
decision-making settings where theoretically unknown nonlinearity is
expected. This is
the typical case when the effects of variables representing location and
time are considered (McMillen and Thorsnes, 2003). The conventional
result is that even the best parametric model tends to impose
restrictions that substantially reduce the explanatory and predictive
power of the hedonic equation (Pace, 1993 and 1995; Anglin and Gencay,
1996; inter alia). Unless the theory-laden parametric model coincides
with the data-generating process, profound mis-specification errors may
result, seriously threatening the empirical validity of the model.
Semiparametric and nonparametric approaches are representative of
the algorithmic modelling culture (Breiman, 2001), which emphasises
learning the complex structure from the available facts and adapting
to the features underlying the data. Semiparametric estimators are, more
precisely, an intermediate strategy between theory-laden and data-driven
estimators, with a restricted learning ability: they can approximate
functions only within some prespecified classes. Their practical
relevance lies mainly in balancing the dual goals of low specification
error and high efficiency (Pace, 1995; Anglin and Gencay, 1996) and in
enhancing the interpretability of results. Nonparametric estimators are
by nature highly flexible and thus capable of approximating very general
classes of functions (e.g. smooth functions, square integrable
functions) without requiring any restrictive, unwarranted
prespecification of the functional form of the mean response function
(or any specific error distribution assumption). This makes
nonparametric estimators powerful data-driven tools, albeit highly
sensitive to the problem of undersmoothing or overfitting if local
estimation is implemented carelessly.
Outlying and influential observations are very common in land
value studies. They may be genuine, faultless values generated under
untypical conditions, or they may contain errors (such as recording
and measurement errors, or observations from the wrong population).
Traditional hedonic modelling techniques, especially ordinary
least squares, are sensitive to outlying observations; even a
single outlier can drastically change the results and misguide the
inferences. In fact, a single sufficiently deviating data point can
cause the least squares estimator to break down and generate results
that are utterly unreliable and uninformative. Robust methods such as
MM-estimation, in contrast, are not sensitive to outliers or
influential observations and can therefore tolerate a certain amount
of bad observations without the estimator breaking down and
producing completely useless results.
2. THE RESEARCH PROBLEM
In this study four different hedonic modelling approaches are
compared empirically when urban land prices are modelled in the
local market of Espoo, Finland. The fits are analysed and post-sample
predictions are calculated across the different modelling schemes. The
main research question is: "Which approach produces the most accurate
post-sample predictions with the given vacant urban land price
data?" Forecasting accuracy is perhaps the most important
operational criterion, as it determines much of the utility of the
corresponding hedonic model. Generally, for any valuation method to have
a sufficient degree of validity, it must produce an accurate prediction
of the most probable market price of a land parcel.
The four different hedonic approaches that are investigated in this
paper consist of:
1) Ordinary least squares estimation.
2) Robust MM-estimation.
3) Structural time series estimation.
4) Robust local regression.
Ordinary least squares estimation and robust MM-estimation
represent parametric hedonic methods, structural time series estimation
is a semiparametric hedonic method and robust local regression is a
nonparametric hedonic method.
Post-sample predictions are analysed using six different predictive
accuracy indicators, which are: 1) mean prediction error, 2) mean
absolute percentage error, 3) mean absolute error, 4) root mean squared
error, 5) correlation coefficient and 6) gravity.
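The first five indicators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prediction error is taken as prediction minus realised price (the sign convention is not stated in this section), the function name `prediction_accuracy` and the three example prices are hypothetical, and the sixth indicator (gravity) is omitted because its definition is not given here.

```python
import math

def prediction_accuracy(actual, predicted):
    """Return post-sample accuracy indicators for paired price lists."""
    n = len(actual)
    # Assumed sign convention: error = prediction - realised price.
    errors = [p - a for a, p in zip(actual, predicted)]
    me = sum(errors) / n                                   # mean prediction error
    mae = sum(abs(e) for e in errors) / n                  # mean absolute error
    mape = 100.0 * sum(abs(e) / abs(a)
                       for e, a in zip(errors, actual)) / n  # in per cent
    rmse = math.sqrt(sum(e * e for e in errors) / n)       # root mean squared error
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p)                  # Pearson correlation
    return {"ME": me, "MAPE": mape, "MAE": mae, "RMSE": rmse, "corr": corr}

# Hypothetical realised and predicted prices for three parcels.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 420.0]
print(prediction_accuracy(actual, predicted))
```

A mean prediction error close to zero indicates predictive unbiasedness, whereas the absolute and squared measures summarise the typical size of the errors.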
3. PREVIOUS RELATED RESEARCH
Most of the hedonic modelling studies in land markets have been
based on the ordinary least squares estimation, yet some nonparametric
and semiparametric estimation techniques have also been applied.
However, none of the hedonic studies has been focused on the issue of
land price prediction, which is the main focus in this paper.
Shimizu and Nishimura (2007) used ordinary least squares to estimate
hedonic price equations of commercial and residential land prices in
Tokyo for a 25-year period (from 1975 to 1999) and investigated possible
structural changes in these price equations. They found that the price
structure differed significantly among locations, reflecting differences
in supplier pricing and end-user preferences. They also found
significant structural changes in the underlying price structure,
identifying pre-bubble, bubble and post-bubble periods.
Colwell and Munneke (2003) examined urban land prices within a
nonparametric framework using piecewise parabolic regression with
specific interest in the land price gradient with respect to distance
from the inner city. They concluded that the piecewise parabolic
regression is an amazingly flexible technique, which can be used to
represent very complex land value functions.
Clapp et al. (2001) estimated a hedonic price index equation to
determine the value of land under residential structures in Fairfax
county, Virginia at various points in time over the 1975 to 1992 time
frame. A set of three simultaneous equations explained land value
together with changes in population density and the percentage working
at home. The method of estimation was ordinary least squares. They
stressed the importance of dealing with the double simultaneity issue
and found that the land-value surface has changed dramatically over
time.
Lin and Evans (2000) investigated the relationship between the
price of land and the size of the plot when plots were small. They used
land price data from the city of Taipei, Taiwan. They found that the
price of land per unit of area increases with lot size.
Colwell and Munneke (1999) investigated the spatial dimension of the
concavity in the relationship between total price and parcel size. Their
dataset consisted of sales of vacant residential, commercial and
industrial land in Cook County, Illinois during the time period of 1986
to 1993. They used ordinary least squares and found that concavity is
higher in the rest of Cook County than in the CBD for all three land-use
types.
Thorsnes and McMillen (1998) used a semiparametric estimator to
analyse the relationship between land values and parcel size in the
Portland, Oregon, metropolitan area. The value-size relationship was
estimated nonparametrically and a simple log-linear parametric
relationship was assumed for the rest of the model. They found that
ordinary least squares and semiparametric estimates imply similar
results: There was a concave value-size relationship meaning that
subdivision costs cause large parcels to trade at a discount.
Colwell (1998) investigated a nonparametric method, a piecewise
parabolic regression analysis, for estimating spatial land price
functions in the Chicago CBD. The independent variables were barycentric
coordinates that uniquely described the location of observations in
space. Colwell found that this nonparametric method goes a long way toward
solving the problem of the spatial correlation of residuals, which
affects most hedonic models.
Atack and Margo (1998) used ordinary least squares to estimate a
simple monocentric urban model of the price of vacant land in Manhattan
over the period 1835 to 1900. They found that vacant land in
Manhattan was price elastic with respect to distance from the CBD in
1845 but became price inelastic in the post-Civil War period.
Colwell and Munneke (1997) studied the structure of urban land
prices in Chicago using data from the sales of commercial, residential
and industrial land during the time period of 1986 to 1992. The method
of estimation was ordinary least squares (with multinomial logit
estimation to control for possible sample selection bias). They found
evidence that land prices are non-linear in nature and that land prices
are concave in parcel size.
McMillen (1996) analysed locally weighted regression in modelling
land prices in Chicago using two different data sets from 1836 to 1990.
Two parametric models were estimated: a simple monocentric model and a
more flexible spatial expansion model. These fits were compared to local
linear regression estimates, which locally estimated the spatial
expansion model. McMillen also demonstrated that local regression is
useful both for prediction and for testing hypotheses in land
markets. McMillen summarised: "Locally weighted regression is a
useful tool for spatial modelling. Nonlinearity is handled directly and
simply".
4. HEDONIC METHODS OF THE STUDY
4.1. Ordinary least squares estimation
Ordinary least squares estimation is by far the most applied
hedonic method in practice. This is a parametric estimator where the
form of hedonic function is specified before seeing the data. The only
aspects that are determined from the data are the hedonic prices of
different attribute variables.
The conventional hedonic regression approach that is based on
ordinary least squares is an appropriate modelling context, strictly
speaking, if the interest solely focuses on the cross-sectional
variation of the hedonic prices and if the problem due to spatial
heterogeneity can adequately be addressed. When temporal aspects are
analysed with the ordinary least squares estimator, several problems are
encountered. According to Schwann (1998), the core problem in local
markets is the lack of sufficient degrees of freedom, since estimation
involves an extensive set of time-indexed dummy variables, at least one
for each time period, along with the other regressors. Even if the
locality of the markets poses no dilemma, the major weakness of these
methods remains: parameter values in one period do not affect the
values of parameters in other periods (Francke and Vos, 2004).
Some nonlinear features can be accounted for by the ordinary least
squares estimator, e.g. by using the double-log model specification.
However, many nonlinear features are in any case omitted if the orthodox
least squares estimator is applied. As a result, ordinary least
squares estimation produces only a coarse description of the actual
dependencies between the regressand and the regressors. Whether this
approximation is satisfactory in practice depends largely on the
predictive accuracy of the estimated hedonic model.
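As an illustration of the double-log specification, the following minimal sketch fits log(price) = a + b log(x) by ordinary least squares with a single regressor, so that b is a constant elasticity. The function name `fit_double_log`, the single explanatory variable and the synthetic data are assumptions made for the example; the model of the study contains several attribute variables.

```python
import math

def fit_double_log(x, price):
    """OLS fit of log(price) = a + b * log(x); b is the price elasticity."""
    lx = [math.log(v) for v in x]
    lp = [math.log(v) for v in price]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_p = sum(lp) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_p) for u, v in zip(lx, lp))
    b = sxy / sxx              # slope = elasticity of price w.r.t. x
    a = mean_p - b * mean_x    # intercept on the log scale
    return a, b

# Hypothetical parcels whose price follows 50 * volume**0.8 exactly.
volume = [100.0, 200.0, 400.0, 800.0]
price = [50.0 * v ** 0.8 for v in volume]
a, b = fit_double_log(volume, price)
print("elasticity:", b, "scale factor:", math.exp(a))
```

The log transformation makes a power-law relationship linear in the parameters, which is the kind of nonlinearity that the least squares estimator can still handle.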
In the OLS model the time element is captured by a house price index
measure. The main idea is that the temporal variability of land prices
can be reduced to the variability of that index. Time-indexed dummy
variables were not used, on the one hand, because of high collinearity
between the different indicators and, on the other hand, because the
house price index variable tends to produce a better approximation to
temporal movements than simple time indicators.
4.2. Robust MM-estimation
The aim of robust statistics is to investigate the behaviour of
estimators, when the basic modelling assumptions (linearity, normality,
independence, etc.) are not exactly valid but are at most approximations
to reality. To put it slightly differently, the basic aims of robust
statistics are (Hampel et al., 1986, p. 11):
* To describe a structure best fitting the bulk of the data.
* To identify deviating data points (outliers) or deviating
substructures for further treatment.
* To identify and give a warning about highly influential data
points (leverage points).
* To deal with unspecified serial correlations, or more generally,
with deviations from the assumed correlation structures.
In practice the approximate nature of hedonic models is largely the
result of the occurrence of gross errors, the empirical character of the
models and the only partial validity of theoretical modelling assumptions.
In general, the hedonic model can be considered as robust if
* It is reasonably unbiased and efficient.
* Small deviations from the hedonic model assumptions will not
substantially impair the performance of the hedonic model.
* Somewhat larger deviations will not invalidate the hedonic model
completely.
In this study a very fault-tolerant and computationally intensive
method, three-stage MM-estimation, is analysed in the hedonic
modelling of land prices. This estimator is parametric in nature, i.e.
the model structure is fixed in advance. In the first stage of
MM-estimation a regression estimate is calculated that is consistent
and has a high breakdown point but is not necessarily efficient. In
the second stage the scale of the errors is estimated from the
residuals of the first stage. In the third stage the M-estimate of the
hedonic prices is calculated. The breakdown point of the MM-estimator
is the highest possible: up to 50% of the data can be corrupted before
the estimator provides useless results.
The computational algorithm used to derive the hedonic prices is a
variant of iteratively reweighted least squares, as applied in
M-estimation. An iterative solution is needed because the weights depend
on the residuals, the residuals depend on the estimated hedonic prices
and the hedonic prices depend on the weights. Let us assume that we have
an initial estimate of the hedonic prices, $\hat{\beta}_0$, and its
deviation measure $s_2$. Let us define the weights:
$$w_i(\beta) = \frac{\psi_1\left(r_i(\beta)/s\right)}{r_i(\beta)/s}, \tag{1}$$
where $\psi_1$ is the double-weighted objective function, $r_i(\beta)$
are the residuals and $s$ is a measure of scale. Then let us define:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (2)
and
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (3)
where $-g(\beta)$ is the gradient of the residual sum of squares. Now
the applied iterative formula for the derivation of hedonic prices can
be written:
$$\beta_{j+1} = \beta_j + \frac{1}{2^k}\,\Delta(\beta_j), \tag{4}$$
where $\Delta(\beta) = M^{-1}(\beta)\,g(\beta)$. The integer $k$
is chosen so that the left side of the inequality:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)
is minimised and $0 < \delta < 1$.
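The reweighting idea can be illustrated with a deliberately simplified sketch for a straight-line model. Several points are assumptions rather than the estimator of the study: Tukey's bisquare function with tuning constant 4.685 is used for the weights, the scale is fixed at the normalised median absolute residual of the initial fit, a Theil-Sen line stands in for the high-breakdown first-stage estimate, and plain reweighting replaces the step-halving rule of equations (4) and (5). All function names and the data are hypothetical.

```python
import statistics

def bisquare_weight(u, c=4.685):
    """Weight psi(u)/u for Tukey's bisquare: (1 - (u/c)^2)^2 inside (-c, c), else 0."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2

def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def theil_sen(x, y):
    """Robust initial line: median of pairwise slopes (stand-in for stage one)."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    b = statistics.median(slopes)
    a = statistics.median(yi - b * xi for xi, yi in zip(x, y))
    return a, b

def irls_m_step(x, y, a0, b0, iterations=20):
    """Iteratively reweighted least squares with the scale held fixed."""
    resid0 = [yi - a0 - b0 * xi for xi, yi in zip(x, y)]
    s = 1.4826 * statistics.median(abs(r) for r in resid0)  # robust scale
    a, b = a0, b0
    for _ in range(iterations):
        # Weights depend on residuals, residuals depend on (a, b): iterate.
        w = [bisquare_weight((yi - a - b * xi) / s) for xi, yi in zip(x, y)]
        a, b = weighted_line(x, y, w)
    return a, b

# Hypothetical data: y = 2 + 3x plus small noise, with one gross outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [2.0 + 3.0 * xi + e for xi, e in zip(x, noise)]
y[9] = 100.0

ols_a, ols_b = weighted_line(x, y, [1.0] * len(x))  # ordinary least squares
a0, b0 = theil_sen(x, y)
rob_a, rob_b = irls_m_step(x, y, a0, b0)
print("OLS slope:", ols_b, "robust slope:", rob_b)
```

In this example the single outlier pulls the least squares slope far from 3, whereas the reweighted fit gives the outlier zero weight and stays close to the line followed by the bulk of the data.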
In the MM-estimation the time element is likewise captured by the house
price index measure.
4.3. Structural time series models
The variation of observed land prices is a combination of
cross-sectional and time-series variations (Schulz, 2003, p. 58).
Besides the spatial characteristics, the selling date is an important
attribute in explaining the evolution of market prices through time.
Time itself is not directly observable, i.e. it is a latent variable.
What we can observe are the different states that occur in a predefined
submarket and the changes that they cause in prices in that market area
(Francke and Vos, 2004).
The time-series or temporal variation is a result of changing
market conditions, which are driven by, among others, changes in
consumers' preferences, investors' expectations and
technological advances. The temporal variation can be understood as
representing that part of price variation that is more or less common to
all parcels of land in the same submarket (Schulz and Werwatz, 2004). An
empirical model of land prices has to recognize these two different, yet
closely related sources of variations.
Given the special characteristics of land markets, one natural
solution to the dual problem of hedonic modelling caused by
spatio-temporal variation is to combine the flexibility of a time series
model with the interpretability of a regression. This is the
underlying rationale in the structural time series approach: the
observations are directly made up of trend, cycle, seasonal, and
regression components plus error. In essence, structural time series
models can be thought of as regression models in which explanatory
variables are functions of time and the parameters are time-varying
(Harvey, 1997).
Structural time series methods can also be understood as
semiparametric estimators that combine many of the benefits of
parametric and nonparametric estimators; temporal variability of land
prices is estimated in a nonparametric fashion, which permits the effect
of time to be linear, convex and concave in different regions, whereas
the hedonic prices of attribute variables are estimated in a parametric
manner.
When considering the determination of hedonic prices in land
markets and, specifically the temporal dimension, there are several
benefits in using the structural time series approach and the associated
state space form as compared to the Box-Jenkins ARIMA methodology. These
include (Harvey and Shephard, 1993; Harvey, 1997; Durbin and Koopman,
2002, p. 51-53):
* Structural analysis of the problem. Different components that
make up the series, including the regression elements, are modelled
explicitly when, in contrast, the Box-Jenkins approach is a sort of
"black box". A structural model provides not only the
forecasts of the series but also presents a set of stylised facts. Also
a structural model can be handled within a unified statistical framework
that produces optimal estimates with well-defined properties.
* Management of nonstationarity. In a structural model
nonstationarity can be handled conveniently by unobserved components
without the need to difference any variables. By comparison, in the
Box-Jenkins approach stationarity is assumed, and nonstationary
components of the series are usually eliminated by differencing the
variables, which results in a potential loss of valuable long-term
information. Furthermore, the standard unobserved component models are
simple, yet effective, leading to parsimonious representations for the
systems.
* Generality. Multivariate observations can easily be handled with
structural models, which cover as special cases a wide range of
econometric models (including all ARIMA models). Explanatory variables
can be introduced into the model structure and the associated regression
coefficients (hedonic prices) can be permitted to vary stochastically
over time if needed. Different kinds of intervention variables can be
specified and lagged values of dependent as well as explanatory
variables can be incorporated into a model. Missing observations and
varying dimensionality of observations are issues that are
straightforward to deal with in structural models.
In this study the local level model, or random walk plus noise
model, is used to capture the underlying trend in the series. The local
level model is the simplest, yet effective, structural trend model. It
regards an observation on land price $p_t$ at time $t$ as being
made up of an underlying level $\mu_t$ and an irregular disturbance
$\varepsilon_t$ (Koopman et al., 1999; Durbin and Koopman, 2002, pp.
44-45):
$$p_t = \mu_t + \varepsilon_t, \qquad \{\varepsilon_t\} \sim \mathrm{NID}(0, \sigma_\varepsilon^2),$$
$$\mu_t = \mu_{t-1} + \eta_t, \qquad \{\eta_t\} \sim \mathrm{NID}(0, \sigma_\eta^2). \tag{6}$$
The underlying level $\mu_t$ is not directly observable. It is
generated by a random walk, i.e. the level term in the current period is
equal to the level term in the previous period plus a level disturbance
term $\eta_t$. The effect of $\eta_t$ is to allow the level of
the trend to shift up and down. It is generally assumed that the level
and irregular disturbances are mutually independent and independent of
$\mu_0$. The signal-to-noise ratio $q = \sigma_\eta^2 /
\sigma_\varepsilon^2$ plays a vital role in determining how
observations should be weighted for prediction and smoothing. Basically,
the higher q is, the greater is the discounting of past observations.
The reduced form of the local level model is ARIMA(0,1,1) with certain
restrictions on the parameter space.
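The role of the signal-to-noise ratio can be made concrete with a small sketch. It relies on a standard property of the local level model that is not derived in this paper: once the filter has reached its steady state, the level estimate is an exponentially weighted average of past observations, and the weight on the newest observation depends only on q. The function names are hypothetical.

```python
import math

def steady_state_gain(q):
    """Steady-state filter gain of the local level model for ratio q > 0."""
    # x is the steady-state predicted level variance in units of the
    # irregular variance; it is the positive root of x^2 - q*x - q = 0.
    x = (q + math.sqrt(q * q + 4.0 * q)) / 2.0
    return x / (x + 1.0)

def observation_weights(q, n_lags=5):
    """Weights on p_t, p_{t-1}, ... in the steady-state level estimate."""
    g = steady_state_gain(q)
    return [g * (1.0 - g) ** j for j in range(n_lags)]

for q in (0.01, 1.0, 100.0):
    print(q, [round(w, 3) for w in observation_weights(q)])
```

With a small q the weights decay slowly, so many past prices contribute to the estimated level; with a large q almost all weight falls on the most recent observation.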
Cycles are characteristic of many economic time series as the economy
goes from boom to recession and back again. These can be modelled in
different ways, but in this study cycles are presented as a
mixture of sine and cosine waves with two parameters $\theta_1$ and
$\theta_2$. If $\psi_t$ is a cyclical function of time with
frequency $\lambda_c$ that is measured in radians, then (Harvey and
Shephard, 1993):
$$\psi_t = \theta_1 \cos \lambda_c t + \theta_2 \sin \lambda_c t, \tag{7}$$
where the period of the cycle is $2\pi/\lambda_c$,
$\sqrt{\theta_1^2 + \theta_2^2}$ is the amplitude and
$\tan^{-1}(\theta_2/\theta_1)$ is the phase. A stochastic cycle can be
constructed recursively:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (8)
where $\kappa_t$ and $\kappa'_t$ are mutually
uncorrelated with a common variance $\sigma_\kappa^2$, and $\rho \in
[0,1]$ is a damping factor. Stationary models correspond to
situations where $\rho$ is strictly less than one. A first-order
autoregressive process is an important limiting case of a stochastic
cycle when the frequency $\lambda_c$ is equal to 0 or $\pi$.
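The deterministic cycle of equation (7) and its period, amplitude and phase can be checked numerically with a short sketch. The function names and parameter values are hypothetical, and `math.atan2` is used instead of a plain inverse tangent so that the phase falls in the correct quadrant; the stochastic cycle of equation (8) is not reproduced here.

```python
import math

def cycle_value(t, theta1, theta2, lam):
    """Deterministic cycle: theta1*cos(lam*t) + theta2*sin(lam*t)."""
    return theta1 * math.cos(lam * t) + theta2 * math.sin(lam * t)

def cycle_summary(theta1, theta2, lam):
    """Return (period, amplitude, phase) of the cycle."""
    period = 2.0 * math.pi / lam
    amplitude = math.hypot(theta1, theta2)   # sqrt(theta1^2 + theta2^2)
    phase = math.atan2(theta2, theta1)       # quadrant-aware arctangent
    return period, amplitude, phase

# Hypothetical parameters: a cycle that repeats every eight periods.
theta1, theta2, lam = 3.0, 4.0, math.pi / 4.0
period, amplitude, phase = cycle_summary(theta1, theta2, lam)
print(period, amplitude, phase)

# The same cycle written as a single shifted cosine wave.
for t in range(4):
    print(cycle_value(t, theta1, theta2, lam),
          amplitude * math.cos(lam * t - phase))
```

The two printed columns agree, which shows that the two-parameter mixture is a single cosine wave with the stated amplitude and phase.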
The calculations of unobserved components and hedonic prices are
done by using the Kalman filtering and smoothing recursions. These can
be expressed as:
$$\hat{\alpha}_{t|t-1} = \Pi_t \hat{\alpha}_{t-1} + W_t \beta, \tag{9}$$
$$S_{t|t-1} = \Pi_t S_{t-1} \Pi_t' + R_t Q_t R_t', \tag{10}$$
$$v_t = p_t - Z_t \hat{\alpha}_{t|t-1} - X_t \beta, \tag{11}$$
$$F_t = Z_t S_{t|t-1} Z_t' + H_t, \tag{12}$$
$$\hat{\alpha}_t = \hat{\alpha}_{t|t-1} + S_{t|t-1} Z_t' F_t^{-1} \left(p_t - Z_t \hat{\alpha}_{t|t-1} - X_t \beta\right), \tag{13}$$
$$S_t = S_{t|t-1} - S_{t|t-1} Z_t' F_t^{-1} Z_t S_{t|t-1}, \tag{14}$$
where $p_t$ is an $N \times 1$ vector of observed land prices at time
$t$, $\alpha_t$ is an $m \times 1$ state vector and $\beta$ is a
$(p + k) \times 1$ vector of unknown regression coefficients that are
assumed to be constant (3). $Z_t$ is a non-stochastic $N \times m$
matrix of cycle and trend components, $X_t$ is a non-stochastic
$N \times (p + k)$ matrix of observations on explanatory variables and
$\varepsilon_t$ is an $N \times 1$ vector of serially uncorrelated
measurement errors with zero mean and covariance matrix $H_t$, i.e.
$E(\varepsilon_t) = 0$ and $\mathrm{Var}(\varepsilon_t) = H_t$.
$\Pi_t$ is an $m \times m$ state transition matrix, $W_t$ is an
$m \times (p + k)$ matrix, $R_t$ is an $m \times g$ matrix and $\eta_t$
is a $g \times 1$ vector of serially uncorrelated error terms with mean
zero and covariance matrix $Q_t$, i.e. $E(\eta_t) = 0$ and
$\mathrm{Var}(\eta_t) = Q_t$. The estimator of $p_t$ is
$\hat{p}_{t|t-1} = Z_t \hat{\alpha}_{t|t-1} + X_t \beta$. The $v_t$ are
one-step-ahead prediction errors, called innovations, which represent
the part of $p_t$ that cannot be predicted from the past. $F_t$ is the
conditional variance of the prediction error.
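For intuition, the filtering recursions (9)-(14) reduce to a few scalar operations in the local level model, where the state is the level alone, the transition and measurement coefficients equal one and there are no regressors. The sketch below covers only this special case. A very large initial variance is assumed as a crude stand-in for the treatment of the non-stationary initial state, and the function name and data are hypothetical.

```python
def local_level_filter(prices, var_eps, var_eta, level0=0.0, var0=1e7):
    """Filtered levels for p_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t."""
    level, s = level0, var0
    filtered = []
    for p in prices:
        level_pred = level            # eq. (9): transition coefficient is 1
        s_pred = s + var_eta          # eq. (10): add level disturbance variance
        v = p - level_pred            # eq. (11): innovation
        f = s_pred + var_eps          # eq. (12): innovation variance
        gain = s_pred / f
        level = level_pred + gain * v         # eq. (13): updated level
        s = s_pred - s_pred * s_pred / f      # eq. (14): updated variance
        filtered.append(level)
    return filtered

# Hypothetical price series with a jump in the last period.
series = [0.0, 0.0, 0.0, 0.0, 10.0]
print(local_level_filter(series, var_eps=1.0, var_eta=10.0)[-1])
print(local_level_filter(series, var_eps=1.0, var_eta=0.01)[-1])
```

With a large level variance the filtered level follows the jump almost completely, whereas with a small one the jump is treated mostly as noise.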
The basic Kalman filtering and smoothing recursions, which are
described in the formulas 9-14, are supplemented in this research by a
set of complementary vector and matrix recursions, because
non-stationary components and fixed regression effects are present. This
is called an augmented Kalman filter (Koopman et al., 1999; Durbin and
Koopman, 2002, p. 115-120), which is described by the equations:
$$V_t = -Z_t A_{t|t-1} - X_t B, \tag{15}$$
$$A_{t+1|t} = \Pi_t A_{t|t-1} + W_t B + K_t V_t, \tag{16}$$
$$(m_t, M_t) = (m_{t-1}, M_{t-1}) + V_t' F_t^{-1} (v_t, V_t), \tag{17}$$
with $A_{1|0} = W_0 B$, where $B = (B_x, B_i)$ is a square selection
matrix of zeros and ones and the subscripts $x$ and $i$ refer to
regression and initial effects, respectively. The number of
columns of $V_t$ and $A_{t+1|t}$ is the same as in the matrix $B$.
$K_t = \Pi_t S_{t|t-1} Z_t' F_t^{-1}$ is the so-called Kalman gain. The
one-step-ahead prediction of the state vector and the associated mean
square error matrix are given by:
$$\hat{\alpha}^*_{t|t-1} = \hat{\alpha}_{t|t-1} + A_{t|t-1} M_{t-1}^{-1} m_{t-1}, \tag{18}$$
$$S^*_{t|t-1} = S_{t|t-1} + A_{t|t-1} M_{t-1}^{-1} A_{t|t-1}'. \tag{19}$$
The one-step-ahead prediction errors and the associated mean square
error matrix are given by:
$$v^*_t = v_t + V_t M_{t-1}^{-1} m_{t-1}, \tag{20}$$
$$F^*_t = F_t + V_t M_{t-1}^{-1} V_t'. \tag{21}$$
The matrix inversions for [M.sub.t] can be evaluated in a manner
similar to recursive regressions (de Jong, 1991).
4.4. Robust local regression (4)
Much of the aim of applied hedonic analysis is to produce a
reasonable approximation to the generally unknown mean response
function. The primary implication of the theoretical literature
concerning hedonic prices in real estate markets is that hedonic
relationships are expected to be highly nonlinear because the
locational uniqueness of properties induces spatial heterogeneity of
regression surfaces (Wallace, 1996; McMillen and Thorsnes, 2003) that
cannot, in general, be specified a priori (Anglin and Gencay, 1996).
Nonlinearity indicates locally changing degrees of curvature in the
hedonic function with non-constant characteristic values.
Nonlinearity is a fundamental feature that characterises processes
in the real estate markets, imposing serious threats to the empirical
validity of the hedonic models that are predominantly used in current
practice. The complex question of validity underlying hedonic model
specification can be divided into three subproblems that involve
determining (Pace, 1993 and 1995; Wallace, 1996):
(1) the relevant set of response and attribute variables;
(2) the appropriate functional form between these variables;
(3) the adequate error distribution for inference.
Economic theory and past experience usually provide useful a priori
information on which variables should enter the model structure, which
substantially reduces the threat of omitted variable bias. Phenomena in
real estate markets are, however, strongly dependent on the particular
submarket, time period and property type and, as a consequence, the
selection of a proper set of dependent and conditioning variables is
partially an empirical question, too.
Economic theory or previous experience rarely provides any
specific, valid guidance on the choice of an appropriate functional form
of the hedonic model (Pace, 1993 and 1995; Anglin and Gencay, 1996;
Gencay and Yang, 1996). A prespecified functional form is, however, the
fundamental assumption underlying the use of theory-laden parametric
models; a poor choice imposes artificial structure on data and
significantly invalidates results of the subsequent analysis (5). In
contrast, nonparametric techniques are data-driven, flexible approaches
that can learn much of the genuine structure from available facts and,
therefore, allow greatly reduced attention to the question of which
functional form ought to be used.
Local regression techniques can significantly reduce the
mis-specification error by letting the data determine the appropriate
functional relationship between the response and a set of attributes.
Locally weighted regression adapts locally to changing curvature in the
hedonic surface by giving more weight to nearby observations (McMillen,
1996) and, therefore, can account for complex nonlinear patterns. The
local adaptation property, which is achieved by parametric localization
(Cleveland and Loader, 1996), makes it a highly attractive tool for
estimating spatially non-homogeneous hedonic functions. Furthermore,
any specific assumption underlying the error distribution can be relaxed
and, in most cases, derived directly from the evidence e.g. by
resampling techniques.
Data on land prices are imperfect, which generates difficult
problems for conventional parametric approaches. In particular, extreme
points and influential or outlying observations, which might represent
erroneous data or otherwise reflect unusual market conditions such as
non-arm's length transactions, can seriously undermine the
performance of a parametric estimator. The results of locally weighted
regression can be robustified in a straightforward manner by a scheme
that is a variant of M-estimation. This simple robustification
typically offers enough protection against unusual or aberrant
observations.
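The idea of down-weighting aberrant observations can be sketched as follows. The text does not specify the weight function, so the bisquare weights and the median absolute residual scale used here are assumptions, borrowed from the common lowess-style robustness iteration; the function name and the tuning constant `c` are hypothetical.

```python
import numpy as np

def bisquare_robustness_weights(residuals, c=6.0):
    """Return weights in [0, 1] that shrink towards 0 for large residuals.

    Assumption: bisquare weight function with scale equal to the median
    absolute residual; the study's exact scheme is not stated.
    """
    r = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(r))      # robust scale estimate
    if s == 0.0:
        return np.ones_like(r)    # perfect fit: nothing to down-weight
    u = r / (c * s)
    w = (1.0 - u**2) ** 2
    w[np.abs(u) >= 1.0] = 0.0     # extreme residuals get zero weight
    return w
```

In a robust local fit these weights would multiply the kernel weights, and the fit and the weights would be recomputed for a few iterations.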
The local regression problem can be formalized by using locally
weighted least squares (e.g. Ruppert and Wand, 1994; Loader, 2004):
Minimize [n.summation over (i=1)] [W.sub.H]([x.sub.i] - x)
[([p.sub.i] - <[theta], F([x.sub.i] - x)>).sup.2], (22)
where [theta] is the d + 1 vector of unknown coefficients and F(x)
is a vector of basis polynomials. [W.sub.H] is a multivariate weight
function and [H.sup.1/2] is a bandwidth matrix. The local least squares
estimate of the unknown regression function f(x) is then (6):
[??](x) = [e'.sub.1] [(X'WX).sup.-1] X'Wp. (23)
For local cubic regression [e.sub.1] is a {1 + d + 1/2 d(d + 1) +
1/6 d(d + 1)(d + 2)} x 1 vector having 1 in the first entry and all
other entries 0. For the local quadratic and linear models the dimension
of [e.sub.1] is, respectively, {1 + d + 1/2 d(d + 1)} x 1 and {1 + d} x 1.
p = [[p.sub.1], ..., [p.sub.n]]' is a vector of observed land
prices and the data matrix X for the local cubic model is:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (24)
where [x.sub.(i)] = ([x.sub.i] - x) and the vec-operator stacks the
columns of ([x.sub.i] - x) ([x.sub.i] - x)', each below the
previous, but with entries above main diagonal omitted; [cross product]
is the Kronecker (tensor) product. The local linear and quadratic models
use only the first two and three columns, respectively, of the data
matrix. The weight matrix is composed of W = diag {[W.sub.H] ([x.sub.1]
- x), ..., [W.sub.H]([x.sub.n] - x)} [equivalent to] diag
{[w.sub.1](x),... ,[w.sub.n](x)}.
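Equation (23) can be illustrated for the local linear case with a short sketch. It assumes a spherical bandwidth (H proportional to the identity matrix, with scalar `h`) and a tricube weight function; neither choice is stated in the text, and the function names are hypothetical. X'WX is assumed to be non-singular, as in note (6).

```python
import numpy as np

def tricube(u):
    """Tricube kernel on [0, 1]; zero weight beyond the bandwidth."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def local_linear_fit(X, p, x0, h):
    """Local linear estimate of f(x0) = e1' (X'WX)^{-1} X'Wp.

    Assumptions: Euclidean distance, scalar bandwidth h, tricube weights.
    """
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    diff = X - x0                                # rows are x_i - x
    w = tricube(np.linalg.norm(diff, axis=1) / h)
    design = np.column_stack([np.ones(len(p)), diff])
    WX = design * w[:, None]                     # W X without forming diag(W)
    theta = np.linalg.solve(design.T @ WX, WX.T @ p)
    return theta[0]                              # e1' picks the intercept
```

Because the design matrix is centred at the fitting point, the intercept of the local regression is the fitted value itself; on exactly linear data the estimate reproduces the true surface for any bandwidth that covers enough points.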
The computational method for estimating the fit at points of
evaluation is based on a damped Newton-Raphson algorithm (Loader, 1999,
p. 209-211; Loader, 2004):
[[??].sub.k+1] = [[??].sub.k] + 1/[2.sup.j] [J.sup.-1] X'Wq.
(25)
[[??].sub.k] is an estimate of the parameter vector [theta] at the
k + 1 iteration. Here j is selected to be the smallest non-negative
integer that results in an increase of the local log-likelihood at every
step, and q is the usual score function. Furthermore, the Jacobian
matrix can be expressed as J = [D.sup.1/2] P' [SIGMA] P [D.sup.1/2],
where D is the matrix of diagonal elements of J = X'WVX, with P and
[SIGMA] representing, respectively, the eigenvectors and eigenvalues of
[D.sup.-1/2] J [D.sup.-1/2]. V is the diagonal observed information matrix.
Since the sample sizes and the dimensions of attribute spaces are small
in this study, direct evaluation of fit is potentially feasible
(although not applied) using the recursion of (25), which would mean
that separate weighted least squares regression is estimated for each
data point, with more influence given to nearby observations.
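The step-halving logic of recursion (25) can be sketched generically. The function below takes user-supplied callables for the local log-likelihood, the score vector (X'Wq in the notation above) and the Jacobian J; these callables and the function name `damped_newton` are hypothetical, and the sketch makes no claim to reproduce the implementation in Loader (1999).

```python
import numpy as np

def damped_newton(loglik, score, jacobian, theta0,
                  max_iter=50, tol=1e-10, max_halvings=30):
    """Maximise loglik by Newton steps that are halved until they improve it."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jacobian(theta), score(theta))
        current = loglik(theta)
        for j in range(max_halvings):
            candidate = theta + step / 2**j   # smallest j that increases loglik
            if loglik(candidate) > current:
                break
        else:
            return theta                      # no improving step: stop here
        if np.max(np.abs(candidate - theta)) < tol:
            return candidate
        theta = candidate
    return theta
```

For a Gaussian local likelihood the log-likelihood is quadratic, so the first full step (j = 0) already reaches the weighted least squares solution; damping matters for non-quadratic likelihoods.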
Direct evaluation is, however, computationally infeasible for
larger data sets (7) and for increased dimensions of attribute space; a
general algorithm is needed to perform the selection of evaluation
points, where the fit is subsequently estimated. Tree-based structures
are popular, in particular the k-d trees of Friedman, Bentley and
Finkel (1977) or growing adaptive trees (see, inter alia, Loader, 1999,
p. 212-218). Different interpolation schemes (e.g. blending functions)
can be used to define the fit elsewhere.
The time element is captured in the local regression through the
house price index measure.
5. SAMPLE DATA
The sample data of this study involve observations on urban
residential land prices and the associated characteristics in the
municipality of Espoo, a highly polycentric city (8) that lies inside
the Helsinki metropolitan area and has circa 225 000 inhabitants. Its
population is the second largest among the cities of Finland, and the
city has grown rapidly in its recent history. The study period is from
January 1985 to December 2007, with a total of 3149 observations that
constitute a judgement sample and cover phases of upward and rapid
downward movements of land prices. During that period the Finnish
economy experienced a great depression, which also had a major
influence on land prices. The observations from the last year (78 in
total) are held back for post-sample predictive testing, a choice that
is somewhat arbitrary and mainly dictated by practical valuation
concerns. Table 1 documents some standard sample statistics for the
study variables.
The total sales price is chosen as the dependent variable (instead
of the unit price) after some empirical experimentation: the
goodness-of-fit statistics are much better when the total sales price
is explained by the attribute variables. The unit of total sales price
is [euro]. The permitted building volume and parcel size variables are
expressed in square metres. Northing and easting represent co-ordinates
in the Finnish KKJ-system. The house price index variable is a
quality-adjusted measure of house prices in the Helsinki metropolitan
area and it is unitless. There are also seven indicator variables in the
data set that deserve mention. The presence of a shore indicator
receives a value of one if the land parcel borders a water system and
zero otherwise. The presence of a shore NA indicator receives a value of
one if it is not known whether the land parcel borders a water system
and zero otherwise; there were 327 such observations. The housing block
indicator receives a value of one if the intended use of the land in
the detailed plan is a multistorey apartment block and zero otherwise
(9). The single-family house indicator receives a value of one if the
intended use of the land in the detailed plan is single-family houses
and zero otherwise. The row house indicator receives a value of one if
the intended use of the land in the detailed plan is row houses and
zero otherwise. The municipality indicator receives a value of one if
the buyer of the land is a municipality and zero otherwise. Finally,
the private person indicator receives a value of one if the buyer of
the land is a private person and zero otherwise.
6. RESULTS OF HEDONIC MODEL ESTIMATION
6.1. Ordinary least squares estimation
Table 2 summarises the results of the ordinary least squares
estimation, in which a double-log model specification is used (i.e. all
quantitative variables are log-transformed). The standard
goodness-of-fit statistics, the coefficient of determination and the
standard error of regression, indicate that the fit is quite good. In
particular, the coefficient of determination is over 0.70, which is a
commonly used target in land valuation. Furthermore, the standard error
of regression is below 0.40, indicating that the internal precision is
acceptable. Statistically, the four most significant attribute
variables are, in order: house price index (t-value 55.53), permitted
building volume (t-value 42.01), easting (t-value 27.53) and northing
(t-value -23.89). Furthermore, six indicator variables and the parcel
size variable are statistically significant in the hedonic model.
However, these remaining seven attribute variables explain much less of
the observed variability in land prices than the four most significant
attribute variables. All explanatory variables are plausible in sign
and magnitude. Overall, 74 outliers were dropped from the final hedonic
model (10).
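The double-log specification and the outlier rule of note (10) can be sketched as follows. The function names are hypothetical, the residuals are standardised simply by the standard error of regression, and the 3.5 threshold is applied to the absolute value of the standardised residual; the last two points are assumptions, since the text does not spell them out.

```python
import numpy as np

def fit_double_log_ols(price, quantitative, indicators):
    """OLS of log(price) on log(quantitative attributes) and 0/1 indicators."""
    y = np.log(price)
    X = np.column_stack([np.ones(len(y)), np.log(quantitative), indicators])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    s = np.sqrt(resid @ resid / dof)   # standard error of regression
    return beta, resid / s, s

def flag_outliers(std_resid, threshold=3.5):
    """Assumed rule: |standardised residual| above the threshold."""
    return np.abs(std_resid) > threshold
```

In a double-log model the coefficients of the quantitative variables are elasticities, so a coefficient of 0.766 on permitted building volume (Table 2) means that a 1% increase in volume is associated with roughly a 0.77% increase in price, other attributes held fixed.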
6.2. Robust MM-estimation
Table 3 summarises the results of the robust MM-estimation, in
which a double-log model specification is used. The standard error of
regression indicates a slightly better fit than in the case of
ordinary least squares (11). All explanatory variables are plausible in
sign and magnitude. The standard error of regression is below 0.40,
indicating that the internal precision is acceptable. Statistically,
the four most significant attribute variables are, in order: easting
(the t-value has now increased to 143.62), house price index (t-value
56.26), permitted building volume (t-value 46.19) and northing (t-value
-32.89). Furthermore, six indicator variables and the parcel size
variable are statistically significant in the hedonic model. However,
these remaining seven attribute variables explain much less of the
observed variability in land prices than the four most significant
attribute variables. No outliers were dropped from this hedonic model.
Instead, the influence of aberrant observations was down-weighted by
using a specific weight function.
6.3. Structural time series models
Table 4 summarises the results of the structural time series
estimation, in which a double-log model specification is used for the
regression effects. The standard goodness-of-fit statistics, the
coefficient of determination and the standard error of regression,
indicate that the fit is quite good. In particular, the coefficient of
determination is 0.90 and the standard error of regression is below
0.34, indicating that the internal precision is acceptable (12).
Statistically, the three most significant attribute variables are, in
order: permitted building volume (t-value 36.96), easting (t-value
28.07) and northing (t-value -24.79). The statistical significance of
the house price index variable is now substantially reduced (the
t-value is only 6.61). The reason is that the unobserved components
already reveal much of the same information as the house price index
variable. Furthermore, six indicator variables and the parcel size
variable are statistically significant in the hedonic model. However,
these remaining seven attribute variables explain much less of the
observed variability in land prices than the three most significant
attribute variables. All explanatory variables are plausible in sign
and magnitude.
The structural time series model also uses unobserved components to
account for the temporal variability in the dependent variable. In Table
4 there are three different unobserved components in the model
structure: the level term (which is the dynamic version of the constant
term), one cycle term (with two components) and a first-order
autoregressive (AR(1)) process. The data analysed contained many
outlying observations in terms of an unusually high value of the
standardised residual (13). Instead of removing an outlier, its effect
was statistically measured by an impulse intervention variable and the
influence was subsequently included as part of the overall model
specification, resulting in no loss of price information. In the final
hedonic model there are 73 impulse intervention variables.
6.4. Robust local regression
Table 5 summarises the results of the robust local regression, in
which a local double-log model specification is used. To avoid the
curse of dimensionality, only the six most significant variables from
the ordinary least squares estimation were included in the final model
of local regression (14). The overall in-sample fit is better than in
the former approaches. The coefficient of determination is 0.93 and the
standard error of regression is 0.29. Because local regression is a
nonparametric method, there are no coefficient estimates that can be
reported (15). No outliers were dropped from this hedonic model.
Instead, the influence of aberrant observations was down-weighted by
the use of M-estimation.
7. MEASURES OF PREDICTIVE ACCURACY
Predictive accuracy is perhaps the single most important
operational criterion in evaluating the performance of a chosen hedonic
model. The success of a hedonic model-based forecast depends on (see
Hendry, 1997):
(1) the existence of structure;
(2) whether such structure is informative about the future;
(3) the proposed method capturing the structure;
(4) the exclusion of irregularities that swamp the structure.
Aspects (1)-(2) are characteristics of the economic system and
aspects (3)-(4) of the chosen forecasting method. When structure is
understood as a systematic relation between the entity to be forecast
and the available information, the conditions in (1)-(4) are sufficient
for forecastability.
There are numerous different indicators for post-sample predictive
assessment of hedonic models (e.g. Case et al., 2004) and the relative
ranking of the performance of various models varies according to the
applied accuracy measure. Mean prediction error is evaluated in this
study by the arithmetic average prediction error, which measures the
predictive unbiasedness of the hedonic model. Two measures of the
strength of the association between predictions and observed
out-of-sample land prices are reported. First, the usual correlation
coefficient is calculated, which is a useful measure of statistical
relation in the case of normally distributed variables and when the
focus is on the co-variation of variables. The major problem with the
classical correlation measure in land valuation studies is its strong
dependence on the normality assumption. This assumption is typically
violated by aberrant error terms, whose effect is squared in the
denominator; this, in turn, tends to produce highly similar standard
deviations across different model alternatives. Secondly, the gravity
measure (see McMillen, 2001) is reported, which is not strongly
dependent on any particular distributional assumptions. Generally, the
gravity seems to be a viable measure of strength of association (16).
Root mean squared error (RMSE) is the most commonly used measure of
success of numeric prediction; it gauges the reliability or
variability of predictions. This statistic is very sensitive to outlying
observations and tends to exaggerate the variance of prediction errors
for model choices in which the prediction error is larger than in the
others (which is typical in land price studies). Mean absolute error
(MAE) is generally a more appropriate indicator of predictive
variability, and is especially suitable in cases of outlying prediction
errors. A widely used measure of predictive variability is the mean
absolute percentage error (MAPE) (see e.g. Makridakis and Hibon, 2000)
which, however, has some
problems of asymmetry and instability, when the data are small.
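The accuracy measures described above, except gravity, can be computed as in the following sketch. The sign convention (prediction minus realisation) and the function name are assumptions. Gravity is omitted because its area weights, described in note (16), are not specified in enough detail to reproduce.

```python
import numpy as np

def prediction_accuracy(actual, predicted):
    """Post-sample accuracy measures for paired realisations and predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual   # assumed sign convention
    return {
        "mean_error": err.mean(),                          # unbiasedness
        "rmse": np.sqrt(np.mean(err**2)),                  # outlier-sensitive
        "mae": np.mean(np.abs(err)),                       # more robust
        "mape": 100.0 * np.mean(np.abs(err / actual)),     # in percent
        "correlation": np.corrcoef(actual, predicted)[0, 1],
    }
```

Since RMSE squares each error, a single large miss moves it far more than it moves MAE, which is the sensitivity discussed in the paragraph above.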
8. FORECASTING ACCURACY OF DIFFERENT HEDONIC APPROACHES
Table 6 summarises the post-sample prediction statistics for the
four different approaches (ordinary least squares, MM-estimation,
structural time series and local regression). Six different measures of
predictive accuracy are reported. First of all, the mean prediction
error, which measures the predictive unbiasedness, is significantly
reduced when MM-estimation, structural time series or local regression
is used instead of the orthodox ordinary least squares. Relative to
ordinary least squares, the mean prediction error is 81% smaller when
MM-estimation is used, 64% smaller when structural time series is used
and 59% smaller when local regression is used. It therefore seems that
predictive validity can be significantly improved when unorthodox
methods (MM-estimation, structural time series, local regression) are
applied. The mean prediction error is smallest when MM-estimation is
used.
MAPE is a useful predictive measure and, in practice, usually the
measure of primary interest. Here all approaches produce an error that
is only slightly over 2%. The unorthodox approaches produce a smaller
MAPE than the orthodox approach: structural time series gives a 7%
smaller MAPE, MM-estimation a 5.5% smaller MAPE and local regression a
4.3% smaller MAPE than ordinary least squares. MAPE is smallest when
structural time series is applied.
The unorthodox approaches all give the same RMSE of 0.18, whereas
ordinary least squares produces an RMSE of 0.20. This means that the
unorthodox approaches produce an RMSE that is 10% smaller than the one
obtained by ordinary least squares. MAE is a more robust counterpart of
RMSE and is usually a more reliable indicator. Again, MAE is highest
when ordinary least squares is used: structural time series generates a
9.7% smaller MAE, and MM-estimation and local regression a 6.5% smaller
MAE, when compared to ordinary least squares.
Correlation coefficients are very similar between the approaches;
the only exception is the value for structural time series, which
is 1% higher than in the other cases. When gravity is used there are
more differences between the approaches: the highest association is
obtained when structural time series is used and the lowest when
ordinary least squares is used. Specifically, structural time series
produces 9.5% higher gravity, MM-estimation 8.2% higher gravity and
local regression 6.5% higher gravity, when compared to ordinary least
squares estimation.
9. CONCLUSIONS
This paper has investigated the structure of urban residential land
prices and, specifically, the predictive accuracy of four different
hedonic approaches when land prices are predicted in a local market.
The hedonic approaches applied in this study are: 1) ordinary least
squares estimation, 2) robust MM-estimation, 3) structural time series
estimation and 4) robust local regression. Ordinary least squares
and robust MM-estimation are parametric methods, structural time series
estimation is a semi-parametric method and robust local regression is a
nonparametric method.
Post-sample predictive assessment indicated that more precise
predictions are obtained if the unorthodox methods of this study are
used instead of the conventional least squares estimation. In
particular, the predictive unbiasedness can be significantly improved
by moving from the orthodox least squares estimation to robust
MM-estimation, structural time series estimation or robust local
regression. All six forecasting indicators were better (or at least
equal) in the case of the non-standard hedonic methods. Among the four
hedonic approaches, the best overall post-sample predictions are
produced by the structural time series estimation.
The hedonic estimation revealed that there are four separate
attribute variables that have an overriding effect on land prices. These
independent variables are: permitted building volume, house price index,
northing and easting. The influence of the parcel size variable and the
different indicator variables on land prices was much weaker.
Received 23 May 2008; accepted 9 July 2008
REFERENCES
Anglin, P. M. and Gencay, R. (1996) Semiparametric estimation of a
hedonic price function, Journal of Applied Econometrics, 11(6), pp.
633-648.
Atack, J. and Margo, R.A. (1998) Location, location, location! The
price gradient for vacant urban land: New York, 1835 to 1900, Journal of
Real Estate Finance and Economics, 16(2), pp. 151-172.
Breiman, L. (2001) Statistical modeling: The two cultures,
Statistical Science, 16(3), pp. 199-231.
Case, B., Clapp, J., Dubin, R. and Rodriquez, M. (2004) Modelling
spatial and temporal house price patterns: A comparison of four models,
Journal of Real Estate Finance and Economics, 29(2), pp. 167-191.
Clapp, J. M., Rodriquez, M. and Pace, R. K. (2001) Residential land
values and the decentralization of jobs, Journal of Real Estate Finance
and Economics, 22(1), pp. 43-61.
Cleveland, W. S. and Loader, C. R. (1996) Smoothing by local
regression: Principles and methods. In: Härdle, W. and Schimek, M. G.
(eds.) Statistical Theory and Computational Aspects of Smoothing,
Physica-Verlag.
Colwell, P.F. (1998) A primer on piecewise parabolic multiple
regression analysis via estimations of Chicago CBD land prices, Journal
of Real Estate Finance and Economics, 17(1), pp. 87-97.
Colwell, P. F. and Munneke, H. J. (1997) The structure of urban
land prices, Journal of Urban Economics, 41(3), pp. 321-336.
Colwell, P. F. and Munneke, H. J. (1999) Land prices and land
assembly in the CBD, Journal of Real Estate Finance and Economics,
18(2), pp. 163-180.
Colwell, P. F. and Munneke, H. J. (2003) Estimating a price surface
for vacant land in urban area, Land Economics, 79(1), pp. 15-28.
Durbin, J. and Koopman, S. J. (2002) Time Series Analysis by State
Space Methods. Oxford Statistical Science Series #24, Oxford University
Press.
Francke, M. K. and Vos, G. A. (2004) The hierarchical trend model
for property valuation and local price indices, Journal of Real Estate
Finance and Economics, 28(2/3), pp. 179-208.
Friedman, J. H., Bentley, J. L. and Finkel, R. A. (1977) An
algorithm for finding best matches in logarithmic expected time, ACM
Transactions on Mathematical Software, 3(3), pp. 209-226.
Gencay, R. and Yang, X. (1996) A forecast comparison of residential
housing prices by parametric versus semiparametric conditional mean
estimators, Economics Letters, 52(2), pp. 129-135.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A.
(1986) Robust Statistics: The Approach Based on Influence Functions,
Wiley Series in Probability and Mathematical Statistics.
Hannonen, M. (2005) On the recursive estimation of hedonic prices
of land, Nordic Journal of Surveying and Real Estate Research, 2(2), pp.
30-56.
Harvey, A. C. (1997) Trends, cycles and auto-regressions, Economic
Journal, 107 (January), pp. 192-201.
Harvey, A. C. and Shephard, N. (1993) Structural Time Series
Models. In: Maddala, G. S., Rao, C. R. and Vinod, H. D. (eds.) Handbook
of Statistics, Vol. 11, Elsevier Science Publishers B. V.
Hendry, D.F. (1997) The econometrics of macroeconomic forecasting,
Economic Journal, 107 (September), pp. 1330-1357.
de Jong, P. (1991) The diffusive Kalman filter, Annals of
Statistics, 19(2), pp. 1073-1083.
Koopman, S. J., Harvey, A. C., Doornik, J. A. and Shephard, N.
(1999) Structural Time Series Analysis, Modelling and Prediction using
Stamp, Timberlake Consultants, London.
Lin, T. and Evans, A. W. (2000) The relationship between the price
of land and size of plot when plots are small, Land Economics, 76(3),
pp. 386-394.
Loader, C. (1999) Local Regression and Likelihood. Springer Series
in Statistics and Computing, Springer-Verlag.
Loader, C. (2004) Smoothing: Local Regression Techniques. In: Gentle,
J., Härdle, W. and Mori, Y. (eds.) Handbook of Computational Statistics,
Springer-Verlag.
McMillen, D. P. (1996) One hundred fifty years of land values in
Chicago: A nonparametric approach, Journal of Urban Economics, 40(1),
pp. 100-124.
McMillen, D. P. (2001) Nonparametric employment subcenter
identification, Journal of Urban Economics, 50(3), pp. 448-473.
McMillen, D. P. and Thorsnes, P. (2003) The aroma of Tacoma:
Time-varying average derivatives and the effect of a superfund site on
house prices, Journal of Business and Economic Statistics, 21(2), pp.
237-246.
Pace, R. K. (1993) Nonparametric methods with applications to
hedonic models, Journal of Real Estate Finance and Economics, 7(3), pp.
185-204.
Pace, R. K. (1995) Parametric, semiparametric, and nonparametric
estimation of characteristic values within mass assessment and hedonic
pricing models, Journal of Real Estate Finance and Economics, 11(3), pp.
195-217.
Rousseeuw, P. J. and Yohai, V. J. (1984) Robust Regression by Means
of S-estimators. In: Franke, J., Härdle, W. and Martin, D. (eds.) Robust
and Nonlinear Time Series, Lecture Notes in Statistics, Springer-Verlag,
26, pp. 256-272.
Ruppert, D. and Wand, M. P. (1994) Multivariate Locally Weighted
Least Squares Regression, Annals of Statistics, 22(3), pp. 1346-1370.
Shimizu, C. and Nishimura, G. N. (2007) Pricing structure in Tokyo
metropolitan land markets and its structural changes: Pre-bubble,
bubble, post-bubble, Journal of Real Estate Finance and Economics,
35(4), pp. 475-496.
Schulz, M.A.R. (2003) Valuation of Properties and Economic Models
of Real Estate Markets. Dissertation, Berlin: Humboldt University
Berlin.
Schulz, R. and Werwatz, A. (2004) A state space model for Berlin
house prices: Estimation and economic interpretation, Journal of Real
Estate Finance and Economics, 28(1), pp. 37-57.
Schwann, G. M. (1998) A real estate price index for thin markets,
Journal of Real Estate Finance and Economics, 16(3), pp. 269-287.
Thorsnes, P. and McMillen, D. P. (1998) Land value and parcel size:
A semiparametric analysis, Journal of Real Estate Finance and Economics,
17(3), pp. 233-244.
Wallace, N. E. (1996) Hedonic-based price indexes for housing:
Theory, estimation and index construction, FRBSF Economic Review, No. 3,
pp. 34-48.
Marko HANNONEN
Institute of Real Estate Studies, Department of Surveying, Helsinki
University of Technology, Espoo, P.O. Box 1200, FIN-02015 HUT, Finland
E-mail: marko.hannonen@pp.inet.fi; Tel: +358 05 596 6065; Telefax.
+358 9 465 077
Notes
(1) This section reviews the hedonic price studies in land markets,
which are presented in major scientific journals since 1995. Major
findings, the data and the modelling methods are documented.
(2) In the study these are obtained by using S-estimation
(Rousseeuw and Yohai, 1984).
(3) Regression coefficients can be time-varying, but this is the
representation used in the empirical section of the study. Here p and k
denote the number of quantitative and qualitative explanatory variables,
respectively.
(4) Local regression means locally weighted regression, in which
local polynomial functions are used in estimating the regression
surface.
(5) In the parametric modelling context, a common solution to the
problem of selecting an appropriate functional form is to consider a set
of parametric functions with the objective of finding a model structure
that matches the evidence in most measurable respects. However, there is
no clear evidence that this practice will be successful in avoiding
functional form mis-specification (Anglin and Gencay, 1996; Hannonen,
2005). Specification searches can be highly time-consuming and the
intrinsic power of these specification tests is somewhat questionable.
(6) Assuming, as usual, that X' WX is non-singular.
(7) Also visualisation of regression surfaces, variance functions
etc. demands that a separate, reduced number of fitting points are
selected.
(8) Because of this polycentric nature, numerous distance measures
to various subcenters would be needed. From the hedonic modelling
viewpoint, using several distance measures creates problems of
multicollinearity. As a solution, no distance measures are used; the
co-ordinates describing location are used instead.
(9) The intended use of all sites in this study is housing, so there
are no non-residential types of land use.
(10) Outliers are considered here as those observations whose
standardised residual is larger than 3.5. This is a typical value in the
Finnish practice when hedonic based land analysis is conducted.
(11) The coefficient of determination statistic cannot be
calculated in the standard manner and thus it is not reported in the
case of MM-estimation.
(12) In fact, the standard error of regression statistic is now the
same as in the case of MM-estimation.
(13) Intervention variables are used for those observations whose
standardised residual is larger than 3.5.
(14) In other words, only those attribute variables from the
ordinary least squares estimation with global double-log specification
whose t-values in absolute terms were higher than five were included in
the local regression.
(15) To be more specific, nonparametric estimators are, in fact,
"over-parametric" in the sense that they generate an infinite
number of hedonic prices for each attribute depending on the values of
that attribute. In practice, for a particular characteristic a single
representative hedonic price is often of direct interest and,
consequently, some kind of average derivative is usually needed.
However, there are problems in average derivative estimation, so that
in this study no average hedonic prices are calculated.
(16) Here gravity is calculated so that the weighted (by the area)
inner product of predictions and realisations is divided by the
L2-distance between predictions and realisations.
Table 1. Sample statistics of study variables
Variable                          Arithmetic mean  Minimum     Maximum      Std. deviation
Total sales price                 191738.49        631.00      19450000.00  647188.81
Permitted building volume         736.23           30.00       54322.40     1750.24
Parcel size                       2286.93          300.00      113000.00    4800.9
Northing                          6677724.65       6668110.00  6781565.00   5954.26
Easting                           2540238.28       2528820.00  3475399.47   17009.52
House price index                 204.2            116.40      350.10       57.78
Presence of a shore indicator     NA               0           1            NA
Presence of a shore NA indicator  NA               0           1            NA
Housing block indicator           NA               0           1            NA
Single-family house indicator     NA               0           1            NA
Row house indicator               NA               0           1            NA
Municipality indicator            NA               0           1            NA
Private person indicator          NA               0           1            NA
Table 2. Fit and hedonic prices from the ordinary least squares
regression

Variable                          Coefficient  Standard error  t-value  p-value
Constant                          1658.300     152.110         10.90    0.0000
Permitted building volume         0.766        0.0182          42.01    0.0000
Parcel size                       0.115        0.0191          6.04     0.0000
Northing                          -260.586     10.909          -23.89   0.0000
Easting                           165.110      5.998           27.53    0.0000
House price index                 1.455        0.0262          55.53    0.0000
Presence of a shore indicator     -0.768       0.143           -5.38    0.0000
Single-family house indicator     0.0807       0.0269          3.00     0.0027
Row house indicator               0.138        0.0404          3.42     0.0006
Municipality indicator            -0.295       0.0671          -4.40    0.0000
Private person indicator          -0.0619      0.0181          -3.42    0.0006
Presence of a shore NA indicator  0.0848       0.0235          3.61     0.0003

Coefficient of determination: 0.88
Standard error of regression: 0.37
Number of outliers: 74
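The selection rule of note 14 can be illustrated with the t-values reported in Table 2. The sketch below is only an illustration: the dictionary layout, the function name and the constant name are hypothetical, and only the threshold of five and the t-values come from the paper.

```python
# Absolute t-value threshold stated in note 14.
T_THRESHOLD = 5.0

# t-values as reported in Table 2 (ordinary least squares regression).
ols_t_values = {
    "Constant": 10.90,
    "Permitted building volume": 42.01,
    "Parcel size": 6.04,
    "Northing": -23.89,
    "Easting": 27.53,
    "House price index": 55.53,
    "Presence of a shore indicator": -5.38,
    "Single-family house indicator": 3.00,
    "Row house indicator": 3.42,
    "Municipality indicator": -4.40,
    "Private person indicator": -3.42,
    "Presence of a shore NA indicator": 3.61,
}


def select_attributes(t_values, threshold=T_THRESHOLD):
    """Keep the variables whose absolute t-value exceeds the threshold."""
    return [name for name, t in t_values.items() if abs(t) > threshold]


selected = select_attributes(ols_t_values)
print(selected)
```

Applied to these values, the rule retains the constant, permitted building volume, parcel size, northing, easting, house price index and the presence of a shore indicator, which are the variables listed for the local regression in Table 5.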
Table 3. Fit and hedonic prices from the MM-estimation

Variable                          Coefficient  Standard error  t-value  p-value
Constant                          1589.955     123.018         12.93    NA
Permitted building volume         0.774        0.0168          46.19    NA
Parcel size                       0.109        0.0174          6.25     NA
Northing                          -257.274     7.823           -32.89   NA
Easting                           166.216      1.157           143.62   NA
House price index                 1.453        0.0258          56.26    NA
Presence of a shore indicator     -0.867       0.133           -6.51    NA
Single-family house indicator     0.0803       0.0258          3.11     NA
Row house indicator               0.182        0.0395          4.61     NA
Municipality indicator            -0.313       0.0672          -4.66    NA
Private person indicator          -0.0666      0.0177          -3.77    NA
Presence of a shore NA indicator  0.0665       0.0233          2.85     NA

Coefficient of determination: NA
Standard error of regression: 0.34
Table 4. Fit and hedonic prices from the structural time series
estimation

Variable                          Coefficient  Standard error  t-value  p-value
Level                             1704.400     144.250         11.82    0.0000
Cycle (comp. #1)                  0.0178       0.126           NA       NA
Cycle (comp. #2)                  -0.00430     0.141           NA       NA
AR(1)                             0.0339       0.143           NA       NA
Permitted building volume         0.7050       0.0190          36.96    0.0000
Parcel size                       0.1790       0.0195          9.19     0.0000
Northing                          -256.740     10.358          -24.79   0.0000
Easting                           158.11       5.632           28.07    0.0000
House price index                 0.927        0.140           6.61     0.0000
Presence of a shore indicator     -1.101       0.146           -7.54    0.0000
Housing block indicator           0.168        0.0378          4.45     0.0000
Single-family house indicator     0.123        0.0268          4.59     0.0000
Row house indicator               0.178        0.0398          4.46     0.0000
Municipality indicator            -0.358       0.0630          -5.68    0.0000
Private person indicator          -0.0811      0.0171          -4.74    0.0000

Coefficient of determination: 0.90
Standard error of regression: 0.34
Number of interventions: 72
Table 5. Fit of the local regression

Variable                          Coefficient  Standard error  t-value  p-value
Constant                          NA           NA              NA       NA
Permitted building volume         NA           NA              NA       NA
Parcel size                       NA           NA              NA       NA
Northing                          NA           NA              NA       NA
Easting                           NA           NA              NA       NA
House price index                 NA           NA              NA       NA
Presence of a shore indicator     NA           NA              NA       NA

Coefficient of determination: 0.93
Standard error of regression: 0.29
Table 6. Post-sample prediction statistics for different approaches

                                Modelling approach
Measure of predictive accuracy  Ordinary least  MM-         Structural   Local
                                squares         estimation  time series  regression
Mean prediction error           0.16            0.031       -0.057       0.066
MAPE                            2.56            2.42        2.38         2.45
RMSE                            0.2             0.18        0.18         0.18
MAE                             0.31            0.29        0.28         0.29
Correlation                     0.86            0.86        0.87         0.86
Gravity                         2714            2936        2971         2891
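As a rough illustration of the first five measures in Table 6, the sketch below computes their textbook versions from paired predictions and realisations. The function name, the sign convention for the error (prediction minus realisation) and the exact formulas are assumptions, since the paper's own definitions may differ. Gravity is left out because it additionally requires parcel areas.

```python
import math


def prediction_statistics(predictions, realisations):
    """Textbook accuracy measures for paired predictions and realisations."""
    n = len(predictions)
    if n == 0 or n != len(realisations):
        raise ValueError("inputs must be non-empty and of equal length")

    # Assumed sign convention: error = prediction - realisation.
    errors = [p - r for p, r in zip(predictions, realisations)]

    mean_error = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE in per cent; realisations must be non-zero.
    mape = 100.0 * sum(abs(e / r) for e, r in zip(errors, realisations)) / n

    # Pearson correlation between predictions and realisations.
    mean_p = sum(predictions) / n
    mean_r = sum(realisations) / n
    cov = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(predictions, realisations))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_r = sum((r - mean_r) ** 2 for r in realisations)
    correlation = cov / math.sqrt(var_p * var_r)

    return {
        "mean_error": mean_error,
        "mape": mape,
        "rmse": rmse,
        "mae": mae,
        "correlation": correlation,
    }


# Toy values, not data from the study.
stats = prediction_statistics([2.0, 4.0, 6.0], [1.0, 5.0, 6.0])
```

A mean prediction error close to zero indicates unbiased predictions, whereas MAPE, RMSE and MAE summarise the size of the errors regardless of their sign.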
COPYRIGHT 2008 Vilnius Gediminas Technical
University Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.