log on action and prob for off-policy evaluation #43
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##            master      #43      +/-   ##
===========================================
- Coverage    99.90%   99.90%   -0.01%
===========================================
  Files           55       56       +1
  Lines         7455     7366      -89
===========================================
- Hits          7448     7359      -89
  Misses           7        7
else:
    ope_reward = [ sum(p*float(R.eval(a)) for p,a in zip(P,A)) for P,A,R in zip(on_probs,log_actions,log_rewards) ]
    on_action, on_prob = zip(*[sample_actions(actions, probs) for actions, probs in zip(log_actions, on_probs)])
else:
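For context, the new expression amounts to scoring the learner by the expectation of the logged reward function under the learner's own pmf, and then drawing an on-policy action from that pmf. A minimal standalone sketch with toy names and data (not the coba internals):

```python
import random

def expected_reward(on_probs, actions, reward_fn):
    # Expectation of the reward function under the learner's pmf:
    # sum over the discrete actions of P(action) * reward(action).
    return sum(p * float(reward_fn(a)) for p, a in zip(on_probs, actions))

def sample_action(actions, probs, rng=random):
    # Draw one action from the learner's pmf and return it with its probability.
    i = rng.choices(range(len(actions)), weights=probs, k=1)[0]
    return actions[i], probs[i]

actions  = ['a', 'b', 'c']
on_probs = [0.2, 0.5, 0.3]
rewards  = {'a': 0.0, 'b': 1.0, 'c': 0.5}

print(expected_reward(on_probs, actions, rewards.get))  # 0.2*0 + 0.5*1 + 0.3*0.5 = 0.65
print(sample_action(actions, on_probs))                 # e.g. ('b', 0.5)
```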
How to do this for continuous actions?
For continuous actions we just need to call on_action,on_prob = predict(log_context, log_actions)[:2]
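As a sketch of that suggestion (the helper name is hypothetical, and the (action, prob, ...) return shape is taken from the comment above rather than checked against the library):

```python
def on_policy_action_and_prob(learner, log_context, log_actions):
    # For continuous action spaces there is no pmf to enumerate, so take the
    # action and its probability/density straight from the learner's prediction
    # and ignore any trailing info payload.
    on_action, on_prob = learner.predict(log_context, log_actions)[:2]
    return on_action, on_prob
```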
Maybe on line 246? I don't think we need separate processing for batched and non-batched. Man, I hate all this batched logic. It's all here for the neural network stuff we do, where backpropagation with mini-batches can give huge gains in computation time.
I tried to add support for continuous actions but am struggling to make some tests pass; see below.
if record_context: out['context'] = log_context
if record_actions: out['actions'] = log_actions
if record_rewards: out['rewards'] = log_rewards

out.update({k: interaction[k] for k in interaction.keys()-OffPolicyEvaluator.IMPLICIT_EXCLUDE})

if record_ope_loss: out['ope_loss'] = get_ope_loss(learner)
if record_ope_loss: out['ope_loss'] = get_ope_loss(learner) if not batched else [get_ope_loss(learner)] * len(log_context)
Make OPE loss work for batched evaluation
I = [self._get_pmf_index(p) for p in pred]
A = [ a[i] for a,i in zip(actions,I) ]
P = [ p[i] for p,i in zip(pred,I) ]
A, P = list(map(list, zip(*[sample_actions(a, p, self._rng) for a, p in zip(actions, pred)])))
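One possible shape for the sample_actions helper referenced here, assuming it draws an index from the pmf with the supplied rng and returns the chosen (action, probability) pair; the actual signature in the PR may differ:

```python
import random

def sample_actions(actions, probs, rng=random):
    # Walk the cumulative distribution and return the (action, probability)
    # pair at the first index whose cumulative mass exceeds a uniform draw.
    cum, draw = 0.0, rng.random()
    for action, prob in zip(actions, probs):
        cum += prob
        if draw <= cum:
            return action, prob
    return actions[-1], probs[-1]  # guard against floating-point drift
```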
Could remove the list(map(list, ...)) if it were okay to return a tuple instead of a list. Also, I can't seem to re-run the tests.
Totally agree. Are these the only tests you're having problems with? I've tried to bump down a lot of tests over time.
Sync upstream
raise Exception()
return 0.5

def predict(self, context, actions):
Struggling to make this test pass. The processing thinks it's of AX format and then always fills in Nones for the probability. I haven't worked with continuous actions before, and I'm not quite sure about all the different formats and the SafeLearner. Any advice, @mrucker?
The main change is for the off-policy evaluator to log the action and probability of the learning model rather than those of the logged data (which are identical for all models and not very useful when trying to compare different learners).
Introduced a helper function for sampling actions and used it in some other code places to avoid redundancy.
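A rough illustration of that change (hypothetical names and interfaces, not the evaluator's actual code): the evaluator records what the learner would have done on each logged interaction rather than echoing the logged behaviour policy.

```python
import random

def evaluate_on_policy(learner, logged_interactions, rng=random):
    # For each logged interaction, ask the learner for its pmf over the logged
    # actions, record the learner's own (on-policy) action and probability, and
    # score the learner by the expected logged reward under that pmf.
    for context, actions, reward_fn in logged_interactions:
        on_probs = learner.predict(context, actions)           # learner's pmf over the actions
        i        = rng.choices(range(len(actions)), weights=on_probs, k=1)[0]
        yield {
            'action'     : actions[i],       # learner's action, not the logged one
            'probability': on_probs[i],      # learner's probability, not the logged one
            'reward'     : sum(p * float(reward_fn(a)) for p, a in zip(on_probs, actions)),
        }
```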