Unexpected behaviour with order of columns in IntercomparisonMetrics #220

Open

s-scherrer opened this issue Mar 12, 2021 · 2 comments

@s-scherrer (Collaborator)

When setting up a test for the new PairwiseIntercomparisonMetrics (#219) I stumbled on some (for me) unexpected behaviour.

I used a test case with 4 datasets, and apparently the order of the datasets changed, leading to wrong results (see example below).
The issue is caused by the order of the dataset names passed to IntercomparisonMetrics. It works when dataset_names passed to IntercomparisonMetrics comes from get_dataset_names using a DataManager. If one just passes the names of the datasets in a different order, the results are wrong. However, in the former case the wrong column (c1) is used as reference.

I believe the problem is that pytesmo automatically converts a dataset dictionary to a DataManager

self.data_manager = DataManager(datasets, spatial_ref, period)

and from then on only uses the order derived from it.

Based on this order, perform_validation then renames the columns of the DataFrame to ref, k1, k2, and k3

f = lambda x: "k{}".format(x) if x > 0 else 'ref'

but if the order of dataset_names was different, the columns are not in the order they should be (see the sketch below).
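
To make this concrete, here is a rough sketch (not actual pytesmo code) of how a purely position-based rename, combined with a differently ordered dataset_names list, attaches the same placeholder to different datasets:

# Sketch only: the rename in perform_validation is purely position based.
f = lambda x: "k{}".format(x) if x > 0 else 'ref'

# order assumed here for the DataManager (reference first)
dm_order = ['refname', 'c1name', 'c2name', 'c3name']
print({name: f(i) for i, name in enumerate(dm_order)})
# {'refname': 'ref', 'c1name': 'k1', 'c2name': 'k2', 'c3name': 'k3'}

# a dataset_names list in a different order implies a different mapping
other_order = ['c1name', 'c2name', 'c3name', 'refname']
print({name: f(i) for i, name in enumerate(other_order)})
# {'c1name': 'ref', 'c2name': 'k1', 'c3name': 'k2', 'refname': 'k3'}

# -> the same placeholder ('ref', 'k1', ...) refers to different datasets,
#    so metrics get attributed to the wrong dataset pairs.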

I think this should probably be fixed in Validation, since only Validation knows which datasets are the reference and which are not.

@wpreimes, any thoughts on this?

from datetime import datetime
import numpy as np
import pandas as pd

from pytesmo.validation_framework.validation import Validation
from pytesmo.validation_framework.metric_calculators import IntercomparisonMetrics, get_dataset_names
from pytesmo.validation_framework.data_manager import DataManager
# test data
startdate = datetime(2000, 1, 1)
enddate = datetime(2000, 12, 31)
dt_index = pd.date_range(start=startdate, end=enddate, freq='D')

names = ['ref', 'c1', 'c2', 'c3']
# always 0.5
df = pd.DataFrame(index=dt_index, data={
                  name: np.repeat(0.5, dt_index.size) for name in names})
df['c1'] += 0.2  # some positive bias
df['c2'] -= 0.2  # some negative bias
df['c3'] -= 0.3  # some more negative bias
df
ref c1 c2 c3
2000-01-01 0.5 0.7 0.3 0.2
2000-01-02 0.5 0.7 0.3 0.2
2000-01-03 0.5 0.7 0.3 0.2
2000-01-04 0.5 0.7 0.3 0.2
2000-01-05 0.5 0.7 0.3 0.2
... ... ... ... ...
2000-12-27 0.5 0.7 0.3 0.2
2000-12-28 0.5 0.7 0.3 0.2
2000-12-29 0.5 0.7 0.3 0.2
2000-12-30 0.5 0.7 0.3 0.2
2000-12-31 0.5 0.7 0.3 0.2

366 rows × 4 columns

Bias between c1 and ref is 0.2.
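
For reference, the expected value can be checked directly with pandas (without going through pytesmo):

# expected bias of c1 with respect to ref: mean of the difference
print((df["c1"] - df["ref"]).mean())  # 0.2 (up to floating point rounding)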

# set up datasets for validation
class DummyReader:
    def __init__(self, df, name):
        self.name = name
        self.data = pd.DataFrame(df[name])

    def read_ts(self, *args, **kwargs):
        return self.data
    
    def __repr__(self):
        return f"Reader for {self.name} with columns {list(self.data.columns)}"

datasets = {
    key+"name": {"columns": [key], "class": DummyReader(df, key)}
    for key in df
}
datasets
{'refname': {'columns': ['ref'], 'class': Reader for ref with columns ['ref']},
 'c1name': {'columns': ['c1'], 'class': Reader for c1 with columns ['c1']},
 'c2name': {'columns': ['c2'], 'class': Reader for c2 with columns ['c2']},
 'c3name': {'columns': ['c3'], 'class': Reader for c3 with columns ['c3']}}
# run validation
metrics = IntercomparisonMetrics(dataset_names=list(datasets.keys()))
val = Validation(
    datasets,
    "refname",
    scaling=None,
    metrics_calculators={(4, 4): metrics.calc_metrics}
)
results = val.calc(1, 1, 1)
/home/samuel/.local/lib/python3.8/site-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())
/home/samuel/miniconda3/envs/pytesmo/lib/python3.8/site-packages/numpy/lib/function_base.py:2642: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/home/samuel/projects/QA4SM/pytesmo/src/pytesmo/scaling.py:250: RuntimeWarning: invalid value encountered in true_divide
  return ((src - np.mean(src)) / np.std(src)) * np.std(ref) + np.mean(ref)
results[list(results.keys())[0]]["BIAS_between_refname_and_c1name"]
array([0.4], dtype=float32)

The bias should be 0.2 here.

When using get_dataset_names instead of list(datasets.keys()) it works correctly:

datamanager = DataManager(datasets, ref_name="refname")
ds_names = get_dataset_names(datamanager.reference_name, datamanager.datasets, n=4)
ds_names, list(datasets.keys())
(['c1name', 'c2name', 'c3name', 'refname'],
 ['refname', 'c1name', 'c2name', 'c3name'])
metrics = IntercomparisonMetrics(dataset_names=ds_names)
val = Validation(
    #datamanager,
    datasets,
    "refname",
    temporal_ref="refname",
    scaling=None,
    metrics_calculators={(4, 4): metrics.calc_metrics}
)
results = val.calc(1, 1, 1)
/home/samuel/.local/lib/python3.8/site-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())
/home/samuel/miniconda3/envs/pytesmo/lib/python3.8/site-packages/numpy/lib/function_base.py:2642: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/home/samuel/projects/QA4SM/pytesmo/src/pytesmo/scaling.py:250: RuntimeWarning: invalid value encountered in true_divide
  return ((src - np.mean(src)) / np.std(src)) * np.std(ref) + np.mean(ref)
results[list(results.keys())[0]]["BIAS_between_c1name_and_refname"]
array([0.2], dtype=float32)
@s-scherrer (Collaborator, Author)

I introduced a new keyword argument rename_cols to Validation.calc and Validation.perform_validation that can be used to disable the renaming. This will be included in PR #219 and partially fixes the issue above. However, I still think the renaming is a hacky and fragile construct, and we should probably find a way to remove it in the future.
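
For illustration, this is how it would be used (a sketch, assuming rename_cols is simply passed through Validation.calc, with val being the Validation object from the example in the original post):

# keep the original dataset names instead of renaming to ref, k1, k2, ...
results = val.calc(1, 1, 1, rename_cols=False)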

During testing I stumbled upon a related problem: if I use a metrics calculator with (n, 2) in Validation, e.g.:

import numpy as np
import pandas as pd

# PairwiseIntercomparisonMetrics comes from PR #219; the import path is assumed
from pytesmo.validation_framework.metric_calculators import PairwiseIntercomparisonMetrics
from pytesmo.validation_framework.validation import Validation

# DummyReader is the helper class from the example in the original post
dr = pd.date_range("2000", "2020", freq="D")
n = len(dr)
x = np.ones(n) * 2
df = pd.DataFrame(
    {
        "reference": x,
        "zplus2": x + 2,
        "plus1": x + 1,
    },
    index=dr
)

datasets = {
    key + "_name": {"columns": [key], "class": DummyReader(df, key)}
    for key in df
}

val = Validation(
    datasets,
    "reference_name",
    scaling=None,  # doesn't work with the constant test data
    temporal_matcher=None,
    metrics_calculators={
        (3, 2): PairwiseIntercomparisonMetrics().calc_metrics
    }
)
results = val.calc(1, 1, 1)

Then the results dictionary has these keys:

(('plus1_name', 'plus1'), ('reference_name', 'reference')), 
(('plus1_name', 'plus1'), ('zplus2_name', 'zplus2')), 
(('reference_name', 'reference'), ('zplus2_name', 'zplus2'))

that is, the tuples are sorted alphabetically.

When using the new CombinedTemporalMatcher that I implemented (#219), which only returns combinations including the reference, the keys are

(('plus1_name', 'plus1'), ('reference_name', 'reference')), 
(('reference_name', 'reference'), ('zplus2_name', 'zplus2'))

In one case the reference comes first, in the other it comes second, which is pretty unintuitive for a user. Here I would expect the keys to be consistently (other, ref) or consistently (ref, other), not to depend on the names of the datasets. The temporal matcher returns the matched pairs with keys (other, ref), which is what I was expecting to get.

I know that pytesmo by default builds all combinations, and in that case a general rule is harder. But I think pytesmo should respect the order given by the temporal matcher. Furthermore, since dictionary order is now reliable (an implementation detail since CPython 3.6, guaranteed by the language since Python 3.7), the default matcher should probably also respect the order of the datasets in the DataManager/datasets dictionary, if it doesn't do so already (see the sketch below).
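
A minimal sketch of what respecting the dictionary order would mean for the pair keys (plain itertools, not pytesmo code):

from itertools import combinations

# insertion order of the datasets dict from the example above
dataset_order = ["reference_name", "zplus2_name", "plus1_name"]

# combinations() preserves input order, so the reference would always be
# the first element of each pair as long as it is listed first:
print(list(combinations(dataset_order, 2)))
# [('reference_name', 'zplus2_name'), ('reference_name', 'plus1_name'),
#  ('zplus2_name', 'plus1_name')]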

@s-scherrer (Collaborator, Author)

Addendum: this reordering is especially a problem when calculating bias, because the sign then differs between the two cases, and simply using tuple(sorted(key)) is not enough to interpret the result correctly (see the illustration below).
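
A minimal illustration of the sign issue, using the constant test data from the (n, 2) example above:

import numpy as np

ref = np.full(10, 2.0)    # "reference" column (x)
other = np.full(10, 3.0)  # "plus1" column (x + 1)

# bias is antisymmetric in the pair order, so the sign flips:
print(np.mean(other - ref))  #  1.0
print(np.mean(ref - other))  # -1.0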

So I think this needs to be fixed before we incorporate the new PairwiseIntercomparisonMetrics.

@wpreimes do you have an idea how far-reaching those changes would be? Is this order relied upon a lot within pytesmo?

s-scherrer added a commit to s-scherrer/pytesmo that referenced this issue Apr 23, 2021
wpreimes pushed a commit that referenced this issue Apr 23, 2021
* temporal matching for multiple frames

* don't convert timezone naive indices

* combined matcher for dictionary

* added option to use all with combined_dropna

* comparison of new and old matching implementation

This adds a test to check whether the new matching in
BasicTemporalMatching works as the old one in case there are no
duplicates in the reference frame.

It replaced the matcher in the other tests by a dummy that still uses
the old matching implementation.
The following commit will remove the dummy and set the new expected
values.

* changed tests for ascat_ismn to use new matching

* fixup for previous commit

* added PairwiseIntercomparisonMetrics

* fixed test for PairwiseIntercomparisonMetrics

* statistics based metrics calculator

* Added tests for PairwiseIntercomparisonMetrics

One test still fails due to difference in scaling before calculating
ubRMSD.

* Added CIs to PairwiseIntercomparisonMetrics

* made bootstrapped CI tests less sensitive

* made bootstrapped CI tests less sensitive

* added missing import

* make_combined_temporal_matcher now also works with k>2

* Fixed some issues

- bug in validation.get_data_for_result_tuple
- no segfault when running tests anymore

* better confidence intervals for tcol

* added tests for tripletmetrics

* added imports

* run tests all at the same time

* removed imports from validation_framework init

* triple metrics only

* removed xarray test

* fixed test

* fixed test

* TripleIntercomparisonMetrics -> TripleCollocationMetrics

* use percentile method by default; analytical CIs for mse_bias

* better test for #220