Like many of you, I have downloaded the new dataset and have started to dive into it.
Having worked with the 2007 dataset, I found a lot of changes.
One of the biggest changes was reading the csv files in MATLAB. Using csvread or xlsread is not the best choice.
What I did not expect was that each file has a different number of columns, so my first foray into the data led to big errors when I assumed that the NEE or GPP data from each site were in the same column.
After some ‘Sturm und Drang’, Gilberto recommended using readtable. This was a great suggestion: I now had the labels linked with the columns of data.
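As a rough sketch of what that looks like (the file name and variable names below are placeholders, not the exact names in your download):

    % Read one site's csv; readtable keeps the header labels attached to the data.
    T = readtable('FLX_US-Xxx_FLUXNET2015_FULLSET_DD.csv');

    % Columns are now addressed by name rather than by position, so it no longer
    % matters that another site's file has a different number of columns.
    nee  = T.NEE_VUT_REF;     % check T.Properties.VariableNames for what is actually present
    time = T.TIMESTAMP;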
Another aspect of the new files is the huge number of met and flux output variables. One file had 22 columns of temperature. There are also a large number of NEE, GPP and Reco variables, based on different gap-filling and u-star methods, plus information on confidence intervals. This is a good statistical advance for the network, though it may not be of interest to the general user, so one has to decide which variables are most appropriate for the task at hand.
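A quick way to take stock of which versions of a variable a file actually carries is to list its column names (again just a sketch; the NEE prefix is only an example):

    % List the NEE-related columns in the table read above.
    vars    = T.Properties.VariableNames;
    neeVars = vars(startsWith(vars, 'NEE'));   % the different gap-filling / u-star variants
    disp(neeVars')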
So for those wanting to work with a smaller and more concise dataset, there is the SUBSET product.
I also recommend downloading the metadata files. I did not do that the first time around, but by doing so I could link annual sums with plant functional types, get the longitude and latitude of each site, and so on.
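In practice this is just a table join on the site identifier. A minimal sketch, with hypothetical file and column names:

    % Link annual sums with site metadata (plant functional type, latitude, longitude).
    annual = readtable('annual_sums.csv');     % one row per site-year, with a SITE_ID column
    meta   = readtable('site_metadata.csv');   % SITE_ID, IGBP class, LAT, LON, ...
    joined = innerjoin(annual, meta, 'Keys', 'SITE_ID');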
Working with annual sums, I tend to lobby for variables that are summed rather than averaged. Latent heat exchange is a good example. It is currently in units of W m-2. For the annual sums I would prefer GJ m-2 y-1 or MJ m-2 y-1. Average energy fluxes are a bit ambiguous; they have different uses and interpretations depending on whether they are averaged over the daylight periods, when they are non-zero, or over 24 hours.
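For reference, the conversion is simple arithmetic, assuming the reported value is a 24-hour mean flux:

    % Convert an annual-mean latent heat flux (W m-2, 24-hour average) to an annual sum.
    % 1 W = 1 J s-1, so multiply by the number of seconds in a year.
    LE_mean   = 50;                           % example annual-mean LE, W m-2
    secPerYr  = 365.25 * 24 * 3600;           % ~3.16e7 s
    LE_annual = LE_mean * secPerYr / 1e9;     % GJ m-2 y-1, about 1.6 for this example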
We continue to lobby for site teams to upload the supporting biomass and ancillary data. While the templates may seem overwhelming, they can be parsed and simplified for certain variables. For example, if one has a time series of leaf area index, create a spreadsheet that has the header information on the site and investigators, and then add a column with the leaf area data and the corresponding dates. Quite easy.
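As a toy example of how little is required (the dates, values and file name below are made up):

    % Assemble a leaf area index time series as a table and write it to a spreadsheet;
    % the header information on the site and investigators can go on a separate sheet.
    dates = datetime({'2015-05-01'; '2015-06-15'; '2015-07-30'});
    LAI   = [1.2; 2.8; 3.5];
    writetable(table(dates, LAI, 'VariableNames', {'Date', 'LAI'}), 'US-Xxx_LAI.xlsx');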
Finally, the first release is missing many meteorological variables, such as wind direction, friction velocity, net radiation, soil heat flux, PAR, and reflected PAR and shortwave. These will be added to the next release. I am also lobbying for information on the standard deviations of u, v and w so we can compute flux footprints. We also need to carry diffuse radiation for sites that make that measurement. We are also working with the data team to make the BADM data (biological and ancillary metadata) more accessible. There is a lot of information submitted to the database on such properties as functional group, measurement height, species, soil properties, leaf area index, etc., that will be critical for interpreting the flux data and for running model simulations. Being able to access and use these data remains a high priority as the system evolves.
In closing, we urge you, the user community, to give us feedback (constructive criticism) if you detect issues as you use the data, so we can make the next products better. With many eyes on the data, problems can get fixed and the datasets can become better and more useful to the user community.