Fixed some typos in the documentation.
mrucker committed Feb 13, 2024
1 parent 2a39e3a commit 27a3c52
Showing 3 changed files with 91 additions and 88 deletions.
46 changes: 25 additions & 21 deletions doc/source/getting_started.rst
@@ -15,33 +15,35 @@ Coba can be installed via pip.
Coba has no hard dependencies, but it does have optional dependencies for certain functionality.

Coba will let you know if a dependency is needed, so there is no need to install things upfront.

The examples contained in the documentation use the following optional dependencies.

.. code-block:: bash
-$ pip install matplotlib pandas scipy numpy vowpalwabbit
+$ pip install matplotlib pandas scipy numpy vowpalwabbit cloudpickle
-About Contextual Bandits
-~~~~~~~~~~~~~~~~~~~~~~~~
+Contextual Bandits
+~~~~~~~~~~~~~~~~~~

A contextual bandit (sometimes called a contextual multi-armed bandit) is an abstract game where players
repeatedly interact with a "contextual bandit". In each interaction the contextual bandit presents the
-player with a context and a choice of actions. The player must then choose to play one action from the
-set of presented actions.
+player with a context and a choice of actions.
+
+The player must choose an action to play from the set of presented actions. If the player chooses well
+the contextual bandit gives a large reward. If the player chooses poorly the contextual bandit gives a
+small reward. The player only observes the reward for the action they chose.

-If the player chooses well the contextual bandit gives a large reward. If the player chooses poorly the
-contextual bandit gives a small reward. The player only observes the reward for the action they choose.
The player's goal is to earn as much reward as possible. To succeed players need to learn what actions
give large rewards in what contexts. This game is of interest to researchers because it is amenable to
mathematical analysis while also being applicable to many real world problems.
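
To make the game concrete, here is a tiny self-contained sketch of the interaction loop (the bandit here is a made-up toy, not part of Coba):

.. code-block:: python

   import random

   # A toy bandit: the context is a coin flip and each action's expected
   # reward depends on that context.
   means = {0: [0.1, 0.8, 0.3], 1: [0.7, 0.2, 0.4]}

   total = 0.0
   for _ in range(100):
       context = random.choice([0, 1])   # the bandit presents a context...
       actions = [0, 1, 2]               # ...and a choice of actions
       action  = random.choice(actions)  # the player plays one action
       total  += random.gauss(means[context][action], 0.1)
       # rewards for the unplayed actions are never revealed

   print(total)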

-About Contextual Bandit Learners
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Contextual Bandit Learners
+~~~~~~~~~~~~~~~~~~~~~~~~~~

Contextual bandit learners are machine learning algorithms that have been designed to play contextual bandits.
-They should be used to solve problems with partial feedback. Partial feedback is common in the real world where
-we can observe the result of what we chose to do but don't know what would have happened had we done something
-else.
+They are particularly good at solving problems with partial feedback. That is, problems where we can observe
+the consequence of an action but don't entirely know what would have happened had we done something else.
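
As a hedged illustration, here is a bare-bones tabular epsilon-greedy learner; the predict/learn interface shown is an assumption for illustration, and real learners are far more sophisticated:

.. code-block:: python

   import random
   from collections import defaultdict

   class EpsilonGreedyLearner:
       """A bare-bones contextual bandit learner (tabular epsilon-greedy)."""

       def __init__(self, epsilon=0.1):
           self.epsilon = epsilon
           self.total = defaultdict(float)  # summed reward per (context, action)
           self.count = defaultdict(int)    # times each (context, action) was played

       def predict(self, context, actions):
           if random.random() < self.epsilon:
               return random.choice(actions)                          # explore
           return max(actions, key=lambda a: self._mean(context, a))  # exploit

       def learn(self, context, action, reward):
           # Partial feedback: only the action that was played is ever updated.
           self.total[(context, action)] += reward
           self.count[(context, action)] += 1

       def _mean(self, context, action):
           n = self.count[(context, action)]
           return self.total[(context, action)] / n if n else 0.0

   # Usage: choose, observe only the chosen action's reward, update.
   learner = EpsilonGreedyLearner()
   action  = learner.predict(context=0, actions=[0, 1, 2])
   learner.learn(context=0, action=action, reward=1.0)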

About Coba
~~~~~~~~~~
@@ -51,11 +53,11 @@ coba supports two use cases.

One, evaluating a contextual bandit learner on *many* contextual bandits. This is useful for algorithm researchers who want
to broadly evaluate a contextual bandit learner's capabilities. Coba achieves this by creating contextual bandits from the
-many real-world supervised datasets hosted on openml.org.
+hundreds of real-world supervised datasets hosted on openml.org.

Two, evaluating *many* contextual bandit learners on a particular contextual bandit. This is useful for application
-researchers who are trying to solve a specific use-case. Coba achives this by providing robust implementations of
-well-known algorithms behind a common interface.
+researchers who are trying to solve a specific use-case. Coba achieves this by providing robust implementations of
+well-known algorithms behind a common interface.
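
A rough sketch of both use cases, using the API that appears in the notebooks (the dataset ids, the learner choices, and the ``+`` concatenation of environments are illustrative assumptions):

.. code-block:: python

   import coba as cb

   # Use case one: one learner evaluated across many environments.
   envs = cb.Environments.from_openml(data_id=150).take(1000) \
        + cb.Environments.from_openml(data_id=180).take(1000)
   cb.Experiment(envs, cb.VowpalEpsilonLearner()).run().plot_learners()

   # Use case two: many learners compared on one environment.
   env      = cb.Environments.from_openml(data_id=150).take(1000)
   learners = [cb.RandomLearner(), cb.VowpalEpsilonLearner()]
   cb.Experiment(env, learners).run().plot_learners()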

Key Concepts
~~~~~~~~~~~~
@@ -68,15 +70,17 @@ Key Concepts
3. Learner -- A player in a contextual bandit game.
4. Evaluator -- A method to evaluate how well a Learner plays an Environment.
5. Experiment -- A collection of Environments, Learners, and Evaluators.
-6. Result -- Data generated when Learners were evaluated playing an Environment.
+6. Result -- Data generated when an Experiment evaluates Learners in an Environment.

-Knowing these concepts can help you find help and perform advanced experiments.

-The core concepts help in finding more information about Coba. For example, all built-in learners can be
-found at :ref:`coba-learners`. Help with creating environments can be found at :ref:`coba-environments`. The
-types of evaluation that coba supports out of the box can be found at :ref:`coba-evaluators`. The various ways
-an experiment can be configured is described at :ref:`coba-experiments`. And details regarding analysis
-functionality can be found at :ref:`coba-results`.
+Knowing the key concepts makes it easier to find help and information about Coba.
+For example, notebooks for each of the six key concepts are available under Basic Examples on the left.
+All built-in learners can be found at :ref:`coba-learners`.
+All environments and their filters can be found at :ref:`coba-environments`.
+The types of evaluators that coba supports out of the box can be found at :ref:`coba-evaluators`.
+The various ways an experiment can be configured are described at :ref:`coba-experiments`.
+And details regarding analysis functionality can be found at :ref:`coba-results`.
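
A hedged sketch tying the key concepts together (the synthetic environment mirrors the one used in the notebooks; exact defaults may differ):

.. code-block:: python

   import coba as cb

   envs     = cb.Environments.from_linear_synthetic(1000)      # Environments
   learners = [cb.RandomLearner(), cb.VowpalEpsilonLearner()]  # Learners
   exp      = cb.Experiment(envs, learners)                    # Experiment (pairs each
                                                               # Learner and Environment
                                                               # with a default Evaluator)
   result   = exp.run()                                        # Result
   result.plot_learners()                                      # analysis of the Result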

Next Steps
~~~~~~~~~~
80 changes: 36 additions & 44 deletions doc/source/notebooks/First_Algorithm.ipynb

Large diffs are not rendered by default.

53 changes: 30 additions & 23 deletions doc/source/notebooks/First_Application.ipynb
@@ -7,19 +7,19 @@
"source": [
"# First Application\n",
"\n",
"This notebook introduces basic Coba functionality for contextual bandit application research.\n",
"This notebook introduces basic Coba functionality for contextual bandit application research. That is, evaluating how learners perform in specific environments.\n",
"\n",
"## Creating Environments\n",
"\n",
"The primary means of performing this type of research is to create environments representing the desired use-case.\n",
"The primary means of performing application research in coba is to create coba environments that emulate the desired application.\n",
"\n",
"This is often done with domain specific datasets. Here we discuss three types of datasets that can be used.\n",
"The easiest way to do this is with domain specific datasets. Here we discuss three types of datasets that can be used.\n",
"\n",
"### 1. Supervised Datasets\n",
"\n",
"Perhaps the easiest way to get started is with domain relevant supervised datasets.\n",
"\n",
"These kind of datasets can be generated in lab based validation studies which provide ground truth data that may not be available at deployment time. Using these kinds of datasets can give one a sense of whether this problem is even feasibly solvable by contextual bandit learners. To ingest this data into a coba environment prepping your labeled data into the `X` and `Y` variable shown below. After that simply call `cb.Environments.from_supervised`."
"These kind of datasets can be generated in lab based validation studies that provide ground truth data that may not be available at deployment time. While these datasets may be unrealistic in a deployment scenario they can still indicate the feasiblility of using contextual bandit learners. Use `cb.Environments.from_supervised`, as shown below, to ingest labeled features as a coba environment."
]
},
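
A minimal hedged sketch of that ingestion with toy data; the `X`/`Y` shapes and the exact signature are assumptions to verify against the notebook's own (collapsed) example:

.. code-block:: python

   import coba as cb

   X = [[1, 2], [3, 4], [5, 6]]   # feature vectors (contexts)
   Y = ['a', 'b', 'a']            # labels (become the best-rewarded actions)

   envs = cb.Environments.from_supervised(X, Y)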
{
@@ -52,14 +52,14 @@
"source": [
"### 2. Explicit Datasets\n",
"\n",
"If you have a better understanding of your domain you might want to create an environment from scratch.\n",
"If you have a good model or understanding of your domain you might want to create an environment from scratch.\n",
"\n",
"This can be done by using Pandas dataframes to define a context, actions, and rewards for each interaction."
"This can be done by implementing the environments interface (This could also be done using dataframes with appropriate column names)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "c2140211",
"metadata": {},
"outputs": [
@@ -77,21 +77,25 @@
" 'rewards': DiscreteReward([['c', 'f'], [0.3, 5]])}]"
]
},
"execution_count": 5,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import coba as cb\n",
"\n",
"context = [[1,2,3,4], [4,5,6,7], [7,8,9,0]]\n",
"actions = [['a','b'], ['a','c'], ['c','f']]\n",
"rewards = [[1.2,0.1], [0.5,0.0], [0.3,5.0]]\n",
"class MyEnvironment:\n",
"\n",
"df = pd.DataFrame({'context':context, 'actions':actions, 'rewards':rewards})\n",
" def read(self):\n",
" context = [[1,2,3,4], [4,5,6,7], [7,8,9,0]]\n",
" actions = [['a','b'], ['a','c'], ['c','f']]\n",
" rewards = [[1.2,0.1], [0.5,0.0], [0.3,5.0]]\n",
"\n",
"list(cb.Environments.from_dataframe(df)[0].read())"
" for C,A,R in zip(context,actions,rewards):\n",
" yield {'context':C, 'actions': A, 'rewards': R }\n",
"\n",
"list(cb.Environments.from_custom(MyEnvironment())[0].read())"
]
},
{
@@ -103,9 +107,9 @@
"\n",
"If you are working with an already deployed system there's a good chance you will have logged bandit data.\n",
"\n",
"Coba has many tools to work with this kind of data once it has also been ingested.\n",
"Coba has many tools to work with this kind of data once it has been ingested.\n",
"\n",
"The easiest way to ingest this data is to used dataframes as well. The only difference from above are the columns."
"The easiest way to ingest this data is to use dataframes."
]
},
{
@@ -128,6 +132,7 @@
}
],
"source": [
"import coba as cb\n",
"import pandas as pd\n",
"\n",
"context = [[1,2,3,4], [4,5,6,7], [7,8,9,0]]\n",
@@ -147,20 +152,20 @@
"source": [
"## Pre-processing Environments\n",
"\n",
"Once you've created the environments you'd like to apply contextual bandit learners on it's time to pre-process. Coba provides several convenience methods to prep your data for learning. In fact, you can even run experiments to see which pre-processing steps most improve CB learner performance. Below we demonstrate a few of these on synthetic data but you'd be working with the environments created following the steps above. "
"Once you've created an environment you'd like to apply contextual bandit learners on it's time to pre-process. Coba provides several convenience methods to prep your data for learning. In fact, you can even run experiments to see which pre-processing steps most improve CB learner performance. Below we demonstrate a few of these on synthetic data but you'd be working with an environment created using one of the methods described at the top of this notebook. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 3,
"id": "ecb0ecdf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. LinearSynth(A=5,c=5,a=5,R=['a', 'xa'],seed=1) | Scale('shift': 'med', 'scale': 'iqr', 'scale_using': 100) | BatchSafe(Finalize())\n"
"1. LinearSynth(A=5,c=5,a=5,R=['a', 'xa'],seed=1) | Sort('sort_keys': '*') | BatchSafe(Finalize())\n"
]
}
],
@@ -182,11 +187,11 @@
"cb.Environments.from_linear_synthetic(100).noise(context=(0,.1),reward=(0,1))\n",
"\n",
"#cycle can be used to check your robustness to non-stationarity. Cycle will wrap\n",
"#reward values to the right one space. This means optimal actions wrap to the right.\n",
"#reward values to the right one space, which cycles the optimal action right as well.\n",
"cb.Environments.from_linear_synthetic(100).cycle(after=100)\n",
"\n",
"#sort can be used to check your robustness to domain-shift. Sorting on context\n",
"#values creates an environment context distributions is a function of time step\n",
"#creates an environment where context distribution is a function of time step\n",
"cb.Environments.from_linear_synthetic(100).sort()"
]
},
@@ -199,7 +204,7 @@
"\n",
"Once your environment has been prepped you are ready to run an experiment. \n",
"\n",
"For this example we're going to use an openml dataset but everything we do below could be done with your environment used in place.\n",
"For this example we're going to use an openml dataset, but everything we do below could be done with an environment you created.\n",
"\n",
"### Feature Scaling Experiment"
]
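
A hedged sketch of such an experiment, reusing the `scale` filter shown in the pre-processing output above (the dataset id and the ``+`` concatenation of environments are illustrative assumptions):

.. code-block:: python

   import coba as cb

   # Does median/IQR feature scaling help a learner on this dataset?
   raw    = cb.Environments.from_openml(data_id=150).take(1000)
   scaled = raw.scale(shift='med', scale='iqr')

   cb.Experiment(raw + scaled, cb.VowpalEpsilonLearner()).run().plot_learners()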
@@ -244,7 +249,9 @@
"\n",
"Coba doesn't offer any out of the box methods like random search or halving.\n",
"\n",
"Instead we rely primarily on experiment level paralellization to make explicit searches computationally viable."
"Instead we rely primarily on experiment level paralellization to make explicit searches computationally viable.\n",
"\n",
"The legend in coba plots will always be sorted according to learner performance to make it easier to pick out the best."
]
},
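
A hedged sketch of an explicit search run in parallel (the `epsilon` parameter and the `processes` argument to `run` are assumptions to verify against the API docs):

.. code-block:: python

   import coba as cb

   # An explicit grid over epsilon, made tractable by process parallelism.
   learners = [cb.VowpalEpsilonLearner(epsilon=e) for e in [0.01, 0.05, 0.1, 0.2]]
   envs     = cb.Environments.from_linear_synthetic(1000)

   cb.Experiment(envs, learners).run(processes=8).plot_learners()

Because the plot legend is sorted by performance, the best epsilon appears first.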
{
