_{1}

^{*}

The yield map is generated by fitting the yield surface shape of yield monitor data mainly using paraboloid cones on floating neighborhoods. Each yield map value is determined by the fit of such a cone on an elliptical neighborhood that is wider across the harvest tracks than it is along them. The coefficients of regression for modeling the paraboloid cones and the scale parameter are estimated using robust weighted M-estimators where the weights decrease quadratically from 1 in the middle to zero at the border of the selected neighborhood. The robust way of estimating the model parameters supersedes a procedure for detecting outliers. For a given neighborhood shape, this yield mapping method is implemented by the Fortran program paraboloidmapping.exe, which can be downloaded from the web. The size of the selected neighborhood is considered appropriate if the variance of the yield map values equals the variance of the true yields, which is the difference between the variance of the raw yield data and the error variance of the yield monitor. It is estimated using a robust variogram on data that have not had the trend removed.

The yield mapping method I extensively describe here follows Bachmaier [

The yield map values determined by my method does not depend on the removal of outliers, but from fitting the yield surface of the raw data using paraboloid cones on floating neighborhoods. By choosing these neighborhoods across the tracks wider than along them, the yield map can be adapted better to changes in yield along the tracks than across them. This is an advantage over other methods that often do not smooth sufficiently across the tracks. The parameters for modeling a paraboloid cone are estimated by robust weighted M-estimators (cf. Hampel et al. [

The example in this paper uses a data set where the yields have been converted to dry matter yields. Time and header status were not measured. Therefore for every measurement the distance to preceding harvest paths was calculated. A distance that is considerably smaller than the cutting width of the combine indicates that the measurement cannot have come from a full swath, so it might be incorrect. The distance to preceding paths also downweights measurements because of end-path delays if the combine driver did not lift the header after harvesting a path. The GPS-points of the following low yield measurements are usually close to other harvest paths around the boundary of the field. I applied a smooth method of weighting to reduce the effect of such dubious measurements.

A paraboloid cone (

Fitting paraboloid cones is only meaningful if one assumes that the true unknown yield surface is smooth. From a mathematical point of view, it should be a twice continuously differentiable function of the Gauss-Krüger coordinates. Fields where any rectangles were cut out, as often occurs in agricultural trials, are to be excluded from consideration.

But what is the true yield at a point of the field? Imagine a point as a 1 cm 1 cm square. One could say that if there is a wheat ear at this point, the yield (in Mg ha^{-1}) is tremendously large (as the area of 1 cm^{2} is so small), whereas the yield is zero if there is no ear. Such a yield map would have a huge microscale variation, and that is not desirable. It is sensible that the true yield denotes the

true average yield of a rectangle around this point, the rectangle to be harvested for measuring its yield with the monitoring process. Thus, the true yield surface is continuous. It has no microscale variation because points close to each other refer to almost the same area.

Every continuously differentiable function defined on a two-dimensional area can be approximated by a skewed plane if it is confined to a small environment. A paraboloid cone, however, approximates a twice continuously differentiable function better as it can also take account of hilltops or depressions. This allows a neighborhood that is considerably larger. Paraboloid cones arise from approximating smooth functions by a two-dimensional Taylor series that includes only linear and quadratic terms.

The two-dimensional quadratic model for fitting paraboloid cones is

where z_{i} is the measured yield at the i^{th} site. For reasons of numeric stability, the coordinates x_{i} and y_{i} should be the mean adjusted Gauss-Krüger coordinates

where and are the means of the Gauss-Krüger coordinates and over all points used for fitting the cone. The random variables in (1) denote the errors, which should be around zero.

In extrapolation or where there are few valid measurements around a point, the yield value should be obtained using a skewed plane rather than a paraboloid cone because the latter can lead to exaggerated values. A skewed plane is modeled by

Such a plane is horizontal if.

The regression coefficients are not estimated by classical least-squares because this method is not robust against outliers, which often occur in raw yield data. Instead, M-estimates are used.

To understand the concept of an M-estimate, consider first the simplest case of estimating a location parameter. A property of the mean is that the sum of deviations from it is zero:. This is demonstrated in

This effect can be avoided by bounding the bisecting line, so that the influence of any outlier is also bounded. This is illustrated in

The influence of the outlier is reduced when solving

. The solution of this equation, , which is the intersection of the ψ-function and the abszissa, is called an M-estimate.

Since the dispersion of the data is usually unknown, an M-estimate should be scale-invariant, which is so when a scale estimate is incorporated into the equation to be solved. The M-estimate for the location parameter is then defined by. The scale estimate can also be determined as an M-estimate for scale if one simultaneously solves an additional equation for. This will be done in the following, where M-estimates are expanded to a regression model.

The objective is to estimate simultaneously the regression coefficients and the scale parameter in (1) and (3) by robust weighted M-estimators. Thus, equations have to be solved, where is the number of regression coefficients (if a skewed plane is modeled, if a paraboloid cone is modeled). The solutions, , and are the M-estimates for the unknown parameters and. Following Huber [

(4)

where the three equations of the right column in (4) are omitted for skewed planes (), and the residuals result as:

(5)

the weights will be defined in (23). The number

(Satterthwaite [

results in. Thus, data weighted according to lead to the same variance reduction as unweighted data. If all weights were equal,. The effective number is greater the more the weights equal each other. It is never greater than. Data points with a low weight, , such as those close to the border of the selected neighborhood, barely enlarge, unless there are no data points close to the point to be mapped.

The weighted M-estimates used for the regression and

coefficients, , are based on the -function in (8) and the -function in (9). Illustrated are these functions in

(8)

and is defined as

where

(10)

ist the expected value of for an distributed random variable, thus providing the scale parameter for the standard normal distribution.

To obtain the regression coefficients, , and the scale estimate, the equation system in (4) is solved in the following three successive steps:

The starting value for the column vector of regression coefficients is obtained with the method of weighted least-squares:

where the -matrix

is the design matrix whose i^{th} line () consists of the line vector.

(13)

which is based on the mean adjusted Gauss-Krüger coordinates x_{i} and y_{i} in (2). The weight matrix W = diag(w) is a diagonal matrix, i.e. and for.

The starting value for the scale solution is

with according to (7)., which has been defined in (6), should be considerably greater than. Since is based on squared deviations, it is subject to outliers, so that the robustly defined might be overestimated. However, large starting values for scale should be preferred as they counteract the danger of scale implosion (Huber [

The classical starting values obtained in the first step, in (11) and in (14), serve as the results of the iteration in the second step.

To avoid solutions that could abduct us from the bulk of the data, redescending M-estimates are not yet applied in the second step when running the iteration procedure, which will be described soon. Instead, the -function in (8) is made monotone by replacing its redescending parts by horizontal lines:

The-function remains unchanged, as it is already monotone in.

Finally, the same iteration procedure is run again, using the results of the second step as starting values (iteration) and applying the redescending -function in (8) instead of the monotone one in (15). The results of the last iteration, and, are the numerical solutions, and, of the equation system in (4).

The iteration procedure follows Huber [

It suffices to describe the (m + 1)^{th} iteration on the basis of the results of the m^{th} iteration ():

Based on the vector of regression coefficients of the m^{th} iteration, the residuals are computed first:

Then the -estimate of iteration is calculated:

(17)

with according to (7). Now, the residuals are used to compute the so-called Winsorized residuals:

where in (15) if the iteration refers to the second step, and in (8) if it concerns the third step.

Using the vector of these Winsorized residuals, , the difference vector

is calculated by weighted least-squares and used to compute the new vector of coefficients of regression:

According to Huber [

The iteration procedure ends if

for a small, e.g., in the second and in the third step; is the j ^{th} diagonal element of.

The most important reason for omitting yield monitor data concerns start-path delays (see e.g. Thylén et al. [^{th} measured value is downweighted by a factor of 0.5 because of uncertainty as to whether this value is correct. The other values are then weighted fully in the context of delays. The size of depends on the yield monitor. For the Claas Agrocom Quantimeter, which was used in the trial described here, proved adequate.

Another reason for downweighting values is if the GPS points of the current swath are close to a preceding neighboring swath. Such a small distance might arise from GPS errors or inaccuracies, but it can also be the consequence of a smaller current swath width. If the minimum distance to the preceding harvest tracks, , is small, it is less likely that the measured yield comes from a full swath width. At present, a raw yield value is weighted fully (weight 1) only if, whereby sw means the “full” swath width, which is somewhat smaller than the combine’s cutting width, cw, when the combine is steered by a man. If, the raw yield value gets weight zero. The weights in between are obtained by linear interpolation. This is illustrated in

In Bachmaier [

, it comes close to the function in

The global weight, , is the product of the single weight factors. For example, if the measurement is the value of a harvest track, it is downweighted by a factor of 0.5, as mentioned above, and if its minimal distance from neighboring tracks, , is, it results in a weight factor of 0.75 (

Other sources of error that result in dubious measurements, e.g. unusual values for moisture or speed, could be dealt with similarly. The greater the likelihood that the measurement is erroneous, the lower its weight is.

To estimate the regression coefficients for the paraboloid cone or the plane at the point, all points within an elliptical neighborhood see

where denotes the radius of the neighborhood across the tracks and is a distance measure between and on the basis of an elliptical metric, which will be defined in (24). The weights are local because they depend on the distance from the fixed point, and they change if this point changes.

The global weights, , do not depend on. The final weights, , of the yield measurements at the points within the neighborhood for determining the regression coefficients in (4) are the products of the global and local weights, so they also depend on:

In Bachmaier [

Raw yield data often show surface textures along the harvest tracks that remain when the data are amended. In Bachmaier and Auernhammer [

The butterfly neighborhood I used in Bachmaier [

A single swath of wrong values can also distort a yield estimate if its influence on the paraboloid regression is

too large. To avoid this, the neighborhood must be chosen large enough, and the tracks in the mid should have nearly the same sum of weights. This is a further reason why the chosen local weight function defined on an elliptical neighborhood downweights more slowly than linearly.

The ‘elliptical metric’, , is based on the ratio, , of the ‘radius’ (half diameter) of the ellipse across the harvest track and the ‘radius’ along it:

The components of the difference vector

along and across the harvest tracks, and, can be calculated from neighboring GPS points on the same harvest tracks as follows:

(25)

The main advantage of modeling paraboloid cones over planes relates to yield peaks or depressions see

The choice of model can be based on the ratio of the sum of weights within a narrower neighborhood around to the sum of weights within the entire neighborhood. A small ratio indicates that there are only a few valid measurements close to, so the desired fit is determined mainly by extrapolation. The paraboloid cone should be used only if this ratio exceeds a certain value.

The considerations that lead to the choice of an elliptical neighborhood also depend on a sufficiently large share of valid measurements within the nearer environment. If it is too small, a circular neighborhood should be used.

To ensure that the ratio changes smoothly, a function for a degree of membership in the narrower environment is used, which is 1 at the point. Likewise to the local weight function in (22), it decreases quadratically to zero at the boundary of the neighborhood because the effect of a paraboloid extrapolation can also increase quadratically with the distance. This leads to the definition of the following ratio:

The sums relate to the weights of all points within an initial elliptic neighborhood of. An increase of its size according to the rule in (30) does not involve a new computation of.

To form an idea of the limits at which to model paraboloid yield surfaces or at which to use an elliptical neighborhood, the ratio of the expected values of the nominator and the denominator ofis helpful. Under the assumption of continuously and uniformly dispersed data points with global weight, this ratio does not depend on the radii of the ellipse, so it can be calculated for a circle with radius 1:

Since monitor yield data often show surface textures along the harvest tracks, an elliptical neighborhood that is wider across the tracks than it is along them is preferred. The determination of an adequate radius proportion, , with statistical methods is not afforded in this article. Yield data obtained with the Claas Agrocom Quantimeter suggest that could be a good choice. However, if there are few values in the nearer environment of, so that in (26) is small, the neighborhood should not be shortened along the tracks, and a circular neighborhood, i.e. a local radius ratio of, should be used. Yet to determine this, an initial neighborhood is already necessary. It is based on the `full' radius ratio, , and on an initial radius, a proper value of which can be found using the method in Section 3. The initial radius along the tracks is then given by. Based on this neighborhood, an initial measure, in (26), is computed to determine the local radius ratio. For it is assigned its maximum value, , because 0.6 is already close to the ratio, , of expected values in (27). A circular neighborhood, i.e., is only used if, and for the local radius ratio, , is defined as a linear interpolation between 1 and. This is summarized in the following definition:

Further computations of are based on a neighborhood that already respects this local radius ratio,. When deciding on the model, the radius of the ellipse remains unchanged, but the radius along the tracks is adapted to it: .This can enlarge the neighborhood, but it rarely occurs, unless corners of a field are harvested. The elliptical neighborhood with these radii, and, also serves as the initial neighborhood in (30), where the final neighborhood size is determined. Its shape, however, i.e., its radius ratio, , remains unchanged, until the next point is mapped.

Paraboloid cones are preferred, therefore, the limit at which to model it should be less than the ratio in (27). The transition from planes to paraboloid cones should be smooth to obtain a continuous yield map. In Bachmaier [

The measure, which it refers to, is based on the elliptical neighborhood with radiiand.

Modeling only a horizontal plane (p = 1) for very small has proved inadequate because areas with only a few measurements occur mainly at corners of a field, where the yield usually tends to become lower. Instead, the neighborhood size is increased until reaches a minimum value of 0.30, so that there are enough data to fit an appropriate skewed plane.

The neighborhood should also be enlarged if the effective number of data points, , is less than a minimum effective number,. This often applies where the combine enters the harvest track because the corresponding measurements have a global weight of zero. The rule for increasing the neighborhood's radii is as follows:

The proportion, , of to, as well as the choice of model is no longer affected by this procedure, as these decisions have already been made on the basis of and.

If the variance of the true unknown yields of the field were known, the yield map could be considered optimal if its variance equaled that of the true yields. If it is less than the variance of the true yields, the map is too smooth, if it is greater, the map is too detailed. The variance of the true yields is unknown. It needs to be estimated in a way that does not depend on how the yield map has been generated.

The variance of the measured yields comprises the variance of the true yields and the error variance of the yield monitor. Hence, the required variance of the true unknown yields results in

The error variance, , can be estimated by the nugget component of a variogram. The reason is that there is no microscale variation in the yields because, as defined in the introduction, any yield at a field point refers to the average yield on a rectangle to be harvested around it, so all the nugget effect arising in an empirical variogram reduces to the error variance, , of the yield monitor (Cressie [

The empirical variance of measured values can be given by the following two equivalent formualae; the latter is less well known.

is the number of summands in the latter formula, which expresses the variance as half the average of the squared differences between all pairs of values. Squared differences of yields usually increase if the spatial distance between the data points increases. This is expressed by a variogram, which gives the variance as a function of the separation distance, , which is also called the lag. Empirical variances, , depicted in a variogram are restricted to pairs of values that are separated by a given lag, so they can be called variances of the yield data points at given separation distance (Bachmaier and Backes [

(33)

This writing is as usual as misleading. One must be aware that the index in this sum is not a counter of certain logged locations (mean adjusted GPS points), but a counter of pairs of such locations. A single location can occur in many pairs, so that, for example, the third logged location point, which was denoted by in Section 2, could now be indexed by

. The index counts all pairs of those locations that are separated by a vector whose length, , is approximately equal to the target length, i.e., where is the class width of distances used for computing a single variogram value (m in

The total variance of the measured yields in (32) is split into variances at given separation distance in (33), which in turn can be used to obtain the total variance as a weighted mean of them:

where correspond to the target distances in (33). They are the midpoints of the different classes, whereas are the averages of all distances within class, i.e.

the averages of all with. An analogue of the formula in (34) is needed to compute the total variance robustly.

The variogram estimator in (33) is based on squared differences among data, so it is sensitive to outlying yield data points. A single outlying datum can distort the variogram estimates since it occurs in several paired comparisons over many or all lag intervals (Lark [

Robust variogram estimates have been introduced by several authors (Cressie and Hawkins [

(35)

The robust variances, , at a target separation distance, , are then defined by the solution of

withas in

To compute the robust variogram for the yield monitor data according to (36), it was necessary to omit summands related to pairs that are close to each other on the same harvest track. This ensures near independence of the remaining pairs. Pairs where either or has a global weight of zero were canceled also. Such a robust variogram is shown in

The desired classical variance of the true unknown yields is now, analogous to (31), computed as the following difference of robust variances:

The robust error variance at separation distance zero,

which is the nugget component of the robust variogram, is determined by a weighted cubic extrapolation (Bachmaier [

Analogously to (34), the robust variance of all measured values, is estimated as a weighted mean of all values depicted in the robust variogram:

Thus, the variance estimation in (37) depends only on robust variogram values. They are barely affected by outliers, so they underestimate the theoretical classical variogram values a little, as these also include the variance of the yield monitor, which might be large because of the outliers. But since the underestimation of the robust variances affects all variances in the variogram similarly, the difference in (37), which gives the desired variance of the true yields, is barely affected. Therefore, this estimation method worked well, as shown in the Monte Carlo results in Bachmaier [

In 2001 the winter wheat harvest of Bandstauden field in Zeilitzheim (Germany) was investigated. Using the measurements for moisture, the raw yield data measured by the Claas Agrocom Quantimeter with a data logging frequency of 0.2 Hz were converted to dry matter yields. Zero yields were omitted as were those with a missing value for moisture. Values with technical errors were assigned the global weight zero, which meant they were discarded also.

Figures 10(a), (b) shows the yield monitor data in Mg ha^{-1} and their global weights, , respectively. Weights less than 1 arise from a small deviation from the preceding harvest path or from values at the beginning of a swath. Many neighboring swathes were harvested in the same direction, therefore, regions with invalid values are larger than usual.

The larger the neighborhood size, the smoother the yield map is and the less is its variance. With a fine -textured yield map in

An adequate neighborhood size is obtained if the variance of the yield map equals that of the true yields. The variance estimation of the true yields is described from (36) to (39). It is based on the robust variogram in

According to (37), the variance of the true yields is estimated by

where the robust error variance, , equals the nugget component, , which was obtained as the intercept on the ordinate of the extrapolated robust variogram in

Since the yield map is required to be as smooth as the true yields,

with a minimum of five times the swath width) and was increased until the minimum effective number of data reached. This number needs to be adapted to the neighborhood size. It should approximately correspond to the that which is expected if most measurements of the neighborhood are valid. It depends on the data logging frequency, on the swath width, and, of course, on the neighborhood size.

The yield mapping method proposed in this article is based mainly on modeling paraboloid cones on moving elliptical neighborhoods. The method of determining the parameters of the model is robust, so that the detection of outliers was not necessary. Besides, the method refers to weights that decrease, like a paraboloid cone opening downwards, to zero at the neighborhood’s border. This corresponds to a smooth transition from being fully considered to being not considered at all, so that a larger neighborhood is necessary, as the values close to the border do not have full weight. A robust variogram, computed independently of the yield mapping method, served to estimate the variance of the true unknown yield values. This provided a measure of how smooth the yield map would be if all values had been measured correctly. For an elliptical neighborhood with at least, (i.e.) and a minimum effective number of the estimated yield map had the same variance as the true unknown yield map.

In 2004, an experiment was done on a part of the Lamprechtsfeld in Thalhausen (Germany) where measured reference values for yield data from two yield monitors were obtained; Data Vision Flowcontrol and AgLeader. The results based on butterfly neighborhoods suggest that, if one wants a yield map whose sum of squared deviations from the true unknown values is minimized, the neighborhood size should be greater than that of

The method proposed in the present paper cannot be used to optimize the neighborhood’s shape, but it gets along with yield monitor data only. It could already be seen from

Currently, the proposed method is applied to a single yield monitor only (Claas Agrocom Quantimeter) with a data logging frequency of 0.2 Hz and one crop. It could also be applied to harvests from other crops or from combines equipped with other cutting widths and other yield monitors. This might lead to different results, however, the yield mapping rule proposed here would produce yield maps that are not too smooth, not too fine-textured and not subject to large outliers. The reason is that it requires at least a radius of and it does not depend on the swath width, sw, because the radii are expressed in multiples of it. This also guarantees robustness against exclusively erroneous measurements on one or two harvest tracks within the neighborhood. The method requires a minimum effective number of data,. A yield monitor with a higher frequency, such as the AgLeader monitor, provides more data, so would be reached within a smaller neighborhood. But these data are usually less accurate, so the neighborhood size should not decrease. What should be adapted to the yield monitor is the number, ,of zero -weights at the beginning of harvest tracks. It should increase with the frequency of the system and be adapted to the intensity of its smoothing algorithm. In the Fortran program paraboloidmapping (Bachmaier [

I thank Margaret A. Oliver, University of Reading, for her helpful and detailed remarks on this research and Hermann Auernhammer, Technische Universität München-Weihenstephan, for supporting this work.