
Switch argparse to config.py / fiddle / argparse-dataclass #21

Open
euanong opened this issue Jan 21, 2024 · 4 comments

@euanong
Collaborator

euanong commented Jan 21, 2024

No description provided.

@euanong euanong self-assigned this Jan 21, 2024
@JasonGross
Owner

JasonGross commented Jan 21, 2024

Dump of my thoughts:

  1. I've heard someone suggest Hydra + Fire
  2. argparse-dataclass looks like it's missing support for nested dataclasses, right? Maybe there's a way to kludge nested dataclass support? (A rough sketch of one possible kludge follows this list.)
  3. I was not able to make heads or tails of fiddle from looking at it for two minutes; I'll take a deeper look later
  4. click looks cool, but it seems mostly geared around functions rather than dataclasses?
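
For (2), the kludge I'm imagining would look roughly like the sketch below, written against plain argparse + dataclasses rather than argparse-dataclass's own API (so none of this is argparse-dataclass's interface, and all names are illustrative): flatten nested fields into dotted option names, then rebuild the nested structure after parsing.

```python
# Sketch of kludging nested dataclass support with plain argparse:
# flatten nested fields into dotted options, rebuild after parsing.
import argparse
import dataclasses
from dataclasses import dataclass, is_dataclass


@dataclass
class OptimizerConfig:
    lr: float = 1e-3
    weight_decay: float = 0.0


@dataclass
class TrainConfig:
    epochs: int = 10
    optimizer: OptimizerConfig = dataclasses.field(default_factory=OptimizerConfig)


def add_dataclass_args(parser: argparse.ArgumentParser, cls, prefix: str = "") -> None:
    """Register one --prefix.name option per leaf field, recursing into nested dataclasses."""
    for f in dataclasses.fields(cls):
        name = f"{prefix}{f.name}"
        if is_dataclass(f.type):
            add_dataclass_args(parser, f.type, prefix=f"{name}.")
        else:
            default = f.default if f.default is not dataclasses.MISSING else None
            parser.add_argument(f"--{name}", type=f.type, default=default)


def build_dataclass(cls, flat: dict, prefix: str = ""):
    """Reconstruct a (possibly nested) dataclass instance from the flat parsed options."""
    kwargs = {}
    for f in dataclasses.fields(cls):
        name = f"{prefix}{f.name}"
        if is_dataclass(f.type):
            kwargs[f.name] = build_dataclass(f.type, flat, prefix=f"{name}.")
        else:
            kwargs[f.name] = flat[name]
    return cls(**kwargs)


parser = argparse.ArgumentParser()
add_dataclass_args(parser, TrainConfig)
args = parser.parse_args(["--optimizer.lr", "3e-4"])
config = build_dataclass(TrainConfig, vars(args))
print(config)  # TrainConfig(epochs=10, optimizer=OptimizerConfig(lr=0.0003, weight_decay=0.0))
```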

@JasonGross
Owner

Looking a bit more at fiddle, I guess the essential design question here is which way we want the arrows to point. Right now, model configuration and running feel a bit spaghetti-like, I think, because the arrows don't all point the same way:
The top-level model drivers all invoke train_or_load_model with a subclass of the config object defined in the train_or_load_model file.

I think right now train_or_load_model is doing too many things (a rough sketch of a possible factoring follows this list):

  1. it is constructing wandb information from model config (this should be factored into a separate function)
  2. it is constructing disk path information from model config (this should also be factored)
  3. it tries loading the model from disk, falling back to wandb
  4. it constructs training arguments from model config & wandb info (this should also be factored)
  5. it runs the training loop
  6. it saves the model to disk & wandb
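
Concretely, the factoring could look something like the sketch below; all names are placeholders and stubs, not the current code, just the shape I have in mind.

```python
# Rough sketch of the factoring: each numbered responsibility above becomes
# its own function, and train_or_load_model is a thin orchestrator.
from dataclasses import dataclass


@dataclass
class WandbInfo:
    project: str
    run_name: str


def wandb_info_from_config(config) -> WandbInfo:       # (1)
    return WandbInfo(project="placeholder-project", run_name=str(config))


def model_path_from_config(config) -> str:             # (2)
    return f"models/{config}.pt"


def try_load_model(config):                            # (3)
    path = model_path_from_config(config)
    ...  # try disk at `path`, then wandb; return None if neither has the model


def training_args_from_config(config, wandb_info):     # (4)
    ...


def train(config, training_args):                      # (5)
    ...


def save_model(model, config, wandb_info):             # (6)
    ...


def train_or_load_model(config):
    """Thin orchestrator over the factored pieces above."""
    model = try_load_model(config)
    if model is None:
        wandb_info = wandb_info_from_config(config)
        training_args = training_args_from_config(config, wandb_info)
        model = train(config, training_args)
        save_model(model, config, wandb_info)
    return model
```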

@JasonGross
Owner

I think the relevant design constraints are:

  1. it's nice for consumers of experiments to be able to import something from wherever the config is saved and make a single call that fetches the model, training it if it doesn't exist (a sketch of what this could look like follows this list)
  2. experiments are varied, and should have control over how to configure model setup and training
  3. (HookedTransformer) model architecture is currently uniform and should be de-duplicated across all experiments that involve a single HookedTransformer model
  4. logging is uniform; experimental setup should not have to think about wandb, disk, etc.
  5. the model configuration should be serializable (for logging) and reproducible
  6. we should be able to define various configurations of an experiment we care about either in Python (or YAML, I guess) or from the command line
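
To make (1), (5), and (6) concrete, here's roughly the consumer-side picture I have in mind; every module, class, and function name below is hypothetical, not existing code.

```python
# Sketch of constraints 1, 5, 6 from the consumer's point of view.
import dataclasses


# --- hypothetical experiments/my_experiment/config.py -----------------------
@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    n_ctx: int = 2
    seed: int = 123


EXPERIMENT_CONFIG = ExperimentConfig()  # a named configuration defined in Python (constraint 6)


# --- hypothetical consumer code ----------------------------------------------
def train_or_load_model(config):
    """Single entry point (constraint 1), stubbed here; the real one checks disk/wandb before training."""
    return f"<model for {config}>"


model = train_or_load_model(EXPERIMENT_CONFIG)  # trains only if not already saved

# The config is a plain dataclass, so it serializes for logging and is
# reproducible from the serialized form (constraint 5).
print(dataclasses.asdict(EXPERIMENT_CONFIG))
```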

@JasonGross
Owner

I am thinking that plausibly we want to invert the control flow: rather than having a unified config object class across all experiments, we define wrappers of useful common functionality and merge configs for various functions with fiddle? A rough sketch of what that could look like is below.
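
In this sketch, only fdl.Config / fdl.build are actual fiddle API; the wrapper function and config classes are hypothetical stand-ins for the shared functionality.

```python
# Inverted arrangement: each experiment builds a fdl.Config tree out of shared
# wrapper functions, instead of subclassing one big config class.
import dataclasses

import fiddle as fdl


@dataclasses.dataclass
class ModelConfig:          # shared across experiments (constraint 3)
    d_model: int = 64
    n_layers: int = 1


@dataclasses.dataclass
class TrainingConfig:
    epochs: int = 100
    lr: float = 1e-3


def run_experiment(model: ModelConfig, training: TrainingConfig):
    """Hypothetical shared wrapper that would handle wandb/disk uniformly (constraint 4)."""
    print(f"training {model} with {training}")


# An individual experiment just describes its overrides as a config tree...
experiment_cfg = fdl.Config(
    run_experiment,
    model=fdl.Config(ModelConfig, n_layers=2),
    training=fdl.Config(TrainingConfig, lr=3e-4),
)

# ...which stays inspectable and serializable until we actually build and run it.
fdl.build(experiment_cfg)
```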
