
Step/trial storage and LLM generation strategy management #15

Open
doxav opened this issue Nov 12, 2024 · 10 comments

@doxav
Contributor

doxav commented Nov 12, 2024

I have two feature requests related to optimization tracking and strategy management in Trace:

  1. Persistent Storage of Optimization Steps
    How can I store each optimization step (including parameters, gradients, and results) in a way that’s similar to how Optuna organizes trials in a study (e.g., using SQLite, PostgreSQL, MySQL, Redis, etc.)?

    • This would enable analysis of parameter-result correlations, support for experiment reproducibility, distributed optimization, and fault tolerance.
    • Is there a recommended way to achieve this currently, or would this require new functionality?
  2. Management and Evolution of Generation Strategies
    What is the best approach for implementing and managing different generation strategies (e.g., genetic algorithms, evolutionary strategies...) within Trace?

    • Additionally, is there support or guidance on implementing a self-improving strategy, such as the Self-Taught Optimizer (STOP) approach? (See the STOP paper.)
    • Such a feature would enable more sophisticated optimization workflows by adjusting and evolving strategies automatically.
@chinganc
Collaborator

Hi @doxav

These are great suggestions.

  1. Currently we suggest saving the parameter nodes directly. All nodes in Trace support serialization (i.e. pickle and deepcopy) if the base object they wrap supports it. After backward is called, a used parameter node (i.e. a node set to trainable) also contains the propagated feedback in its feedback attribute, so saving the nodes keeps all the information. For a trainable function, its parameter can be obtained from its parameter attribute and saved too.
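
A minimal sketch of that workflow (assuming a trainable node whose wrapped value is picklable, and that backward has already propagated feedback to it):

import pickle

from opto import trace

x = trace.node("initial prompt", trainable=True)  # a parameter node
# ... run the traced workflow and call backward with some feedback ...
# x.feedback now holds the propagated feedback; pickling x preserves
# both the current value and that feedback
with open("step_0.pkl", "wb") as f:
    pickle.dump(x, f)

with open("step_0.pkl", "rb") as f:
    x_restored = pickle.load(f)
print(x_restored.data)  # the wrapped value survives the round trip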

Would this meet your need? We're happy to implement new features to make Trace more convenient for you. Just let us know.

  2. This can be implemented as an Optimizer class or outside of one. The former would support some bookkeeping such as memory or additional validation, but it may have some limitations due to the PyTorch-like design of the optimizer usage. For best flexibility, I would suggest writing the search algorithm outside, using Trace optimizers as subroutines.

  3. Thanks for suggesting STOP. That should be easy to implement with Trace. If you're interested in contributing, we're happy to work with you. Alternatively, we will write an example STOP implementation later on.

@doxav
Contributor Author

doxav commented Nov 13, 2024

Persistent Storage of Optimization Steps

  • In my experience, typical step-by-step optimization stores the parameters and the newly measured performance in order to identify top configurations, guide exploration/exploitation, and compare parameter influence/sensitivity.
    In our case, the parameters can be the new values of trainable nodes/function parameters. Storing the feedback, as well as the diff between the previous and new parameters, also seems required for later analysis.
    The new performance should be measured (utility); does that require running a new loss measure (feedback)?
  • Why store each trial after backward rather than after step()?

Generation-based strategies

  • I still need to better understand how the utility function should be defined and measured in the Optimizer class to align properly with STOP and ensure a correct implementation.
  • The self-reinforcement of STOP could be executed within a single step (which would be easier for self-refining a step), but this approach seems computationally heavy due to extensive inference. Alternatively, it could follow the current step-by-step flow, but I don't yet see a clear and stable approach. For example, in OptoPrime, this would involve evolving the _step method, though this approach seems to introduce considerable instability.

@allenanie
Collaborator

I can help with the STOP implementation as well -- it looks like a great way to expand Trace's capability!

@chinganc
Collaborator

chinganc commented Nov 13, 2024

@doxav Some clarification on logging.

... The new performance should be measured (utility); does that require running a new loss measure (feedback)?

  • Why store each trial after backward rather than after step()?

Sorry, maybe I wasn't clear. The feedback at a node is computed by backward and removed when zero_feedback is called. So in the regular usage of

optimizer.zero_feedback() # feedback is cleared
optimizer.backward(target, feedback) # feedback is computed
optimizer.step() # feedback persists

saving the nodes, e.g. by deepcopying them after step, would save both the latest value and the feedback that led to it. By tracking these, you should be able to do the analysis you described.
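
For instance, a minimal sketch of per-step logging along these lines (assuming the optimizer exposes its parameter nodes via its parameters attribute, as used elsewhere in this thread):

import copy

num_steps = 20
history = []  # one snapshot per optimization step
for step in range(num_steps):
    # ... run the traced workflow to obtain target and feedback ...
    optimizer.zero_feedback()
    optimizer.backward(target, feedback)
    optimizer.step()
    # deepcopy after step: each snapshot keeps both the updated value
    # and the feedback that led to it
    history.append({p.name: copy.deepcopy(p) for p in optimizer.parameters})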

@doxav
Contributor Author

doxav commented Nov 14, 2024

Just to let you know that I won’t be able to respond until Wednesday. If you have ideas on the points below in the meantime, we could build it progressively.

For the STOP algorithm:

  • It can either self-refine within a single step or guide the refinement across steps. I think a simpler first approach is to run it within a step (in any case we need to keep track of candidates to avoid repetition).
  • It would improve both the generated candidate solution at each step and the candidate generator (the meta-optimizer).
  • We need to properly design the utility / meta-utility function; I think you are best placed to design it.

For the trials/attempts logging:

  • It is separate from the STOP design, but STOP needs to keep track of past attempts, so the logging could be designed to also serve this purpose.
  • I am still not clear how Trace/OptoPrime manages long explorations of the parameter space and the associated results without it (I understand feedback and gradients are taken into consideration as long as zero_feedback is not called).

@allenanie
Collaborator

Hi @doxav, you can save the parameters of the model. We actually do this here: https://github.com/microsoft/Trace/blob/main/examples/bbh/run_prompt_bigbench_trace.py

If, after an update, the model performs worse than before, we can reload the parameters from the previous step.

In the example above, we reload if there is an error in the code.

The schema looks something like:

val_perfs = {}  # maps checkpoint path -> validation performance
model = SomeTracedModel()
for step in range(20):
    correctness = True
    try:
        model.forward()
    except trace.ExecutionError as e:
        correctness = False
        # roll back to the best checkpoint seen so far
        if len(val_perfs) > 0:
            best_checkpoint = max(val_perfs, key=val_perfs.get)
            model.load(best_checkpoint)

    checkpoint_name = f"{save_dir}/{task_name}/epoch_{epoch}_step_{step}.pkl"
    if correctness:
        # evaluate on val examples and checkpoint the model
        val_perf, _ = evaluate(model, val_examples)
        val_perfs[checkpoint_name] = val_perf
        model.save(checkpoint_name)
This type of optimization loop is very common in PyTorch. We can definitely provide some specialized trainer that can do this under the hood.
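
A hypothetical sketch of what such a trainer could look like (not an existing Trace API; evaluate and the model's save/load methods are assumed from the example above):

class CheckpointingTrainer:
    """Checkpoints the model after good steps and rolls back on execution errors."""

    def __init__(self, model, save_dir):
        self.model = model
        self.save_dir = save_dir
        self.val_perfs = {}  # checkpoint path -> validation performance

    def run_step(self, step, val_examples):
        try:
            self.model.forward()
        except trace.ExecutionError:
            # roll back to the best checkpoint seen so far
            if self.val_perfs:
                self.model.load(max(self.val_perfs, key=self.val_perfs.get))
            return
        checkpoint = f"{self.save_dir}/step_{step}.pkl"
        val_perf, _ = evaluate(self.model, val_examples)
        self.val_perfs[checkpoint] = val_perf
        self.model.save(checkpoint)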

@allenanie
Collaborator

I am still not clear how Trace/OptoPrime manages long explorations of the parameter space and the associated results without it (I understand feedback and gradients are taken into consideration as long as zero_feedback is not called).

I realize you might be asking a different question -- OptoPrime has self.memory, which presents past variables and their feedback in the prompt context:

Below are some variables and their feedbacks you received in the past:
...
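
If I read the current code right, this memory is enabled via a constructor argument (treat the exact argument name as an assumption):

# memory_size controls how many past (variable, feedback) pairs are
# replayed into the prompt; 0 disables the memory
optimizer = OptoPrime(model.parameters(), memory_size=5)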

@doxav
Contributor Author

doxav commented Nov 18, 2024

Hi,

I finally had the time to look into it this afternoon.

As STOP generates several candidates at each iteration, the optimizer must be able to select the best candidate among them via a utility function (e.g. a loss or task-specific metric), or, when no metric is available, via a comparison between candidates.

STOP also requires averaging the utility of all the candidates generated by the optimizer, to estimate whether it is an improved optimizer; this too could be done via comparison if no metric is available.

The comparison trick would cover any case, by using an LLM when no utility function is provided/available.

To implement this simply, we could:

  • Option A: wrap/decorate construct_prompt in an improver that manages the generation of several candidates and selects the best; this improver function could then be evolved iteratively (or in parallel) to find a better improver.
  • Option B: evolve construct_prompt itself to generate several candidates and select the best; the new construct_prompt could then be evolved iteratively (or in parallel) to find a better improver.

I like Option A because I think it could easily apply to other Optimizers.

# Option A idea of implementation

    # Hypothetical imports (exact module paths may differ across Trace versions)
    from opto.trace import bundle
    from opto.optimizers import OptoPrime
    from opto.optimizers.optimizer import Optimizer

    class OptoPrimeImprover(Optimizer):

        def __init__(self, base_optimizer, num_candidates=3, utility_function=None, preference_function=None, **kwargs):
            super().__init__(base_optimizer.parameters, **kwargs)
            self.base_optimizer = base_optimizer
            self.num_candidates = num_candidates
            self.utility_function = utility_function  # User-provided utility function
            self.preference_function = preference_function  # User-provided preference function

        @bundle(trainable=True)
        def improver(self, summary, mask=None, *args, **kwargs):
            """Improver function that generates multiple candidates and selects the best."""
            candidates = []
            # Generate multiple candidate prompts
            for _ in range(self.num_candidates):
                # Generate a candidate prompt using the base optimizer's construct_prompt
                system_prompt, user_prompt = self.base_optimizer.construct_prompt(summary, mask, *args, **kwargs)
                candidates.append((system_prompt, user_prompt))
            # Evaluate candidates and select the best
            if self.utility_function:
                # Evaluate each candidate individually
                best_score = float('-inf')
                best_prompt = None
                for system_prompt, user_prompt in candidates:
                    score = self.utility_function(system_prompt, user_prompt)
                    if score > best_score:
                        best_score = score
                        best_prompt = (system_prompt, user_prompt)
            elif self.preference_function:
                # Use preference function to select the best candidate among all
                best_prompt = self.preference_function(candidates)
            else:
                # Default behavior if no utility or preference function is provided
                best_prompt = candidates[0]  # Select the first candidate
            return best_prompt

        def construct_prompt(self, summary, mask=None, *args, **kwargs):
            # Use the improver function to generate and select the best prompt
            return self.improver(summary, mask, *args, **kwargs)
        # Other methods (e.g., _step, call_llm) remain unchanged or inherit from base_optimizer

    # Usage example
    base_optimizer = OptoPrime(parameters, config_list=config_list)

    # Option 1: Define a utility function (e.g., evaluating prompt quality)
    def my_utility_function(system_prompt, user_prompt):
        score = ...  # compute a task-specific score for this candidate
        return score

    # Option 2: Define a preference function (e.g., using an LLM to compare candidates)
    def my_preference_function(candidates):
        # Implement comparison logic to select the best candidate
        best_candidate = max(candidates, key=lambda x: len(x[1]))
        return best_candidate

    # Create an OptoPrimeImprover using the base optimizer and a utility function
    optimizer = OptoPrimeImprover(base_optimizer=base_optimizer, num_candidates=5,
        utility_function=my_utility_function)  # OR use preference_function=my_preference_function
# Option B idea of implementation

    from typing import Callable, List, Tuple

    class OptoPrimeEnhanced(OptoPrime):

        def __init__(self, parameters, config_list=None, num_candidates=3,
                     utility_function: Callable[[str, str], float] = None,
                     preference_function: Callable[[List[Tuple[str, str]]], Tuple[str, str]] = None, **kwargs):
            super().__init__(parameters, config_list=config_list, **kwargs)
            self.num_candidates = num_candidates
            self.utility_function = utility_function
            self.preference_function = preference_function

        @bundle(trainable=True)
        def construct_prompt(self, summary, mask=None, *args, **kwargs) -> Tuple[str, str]:
            candidates = []
           
            for _ in range(self.num_candidates):
                # Generate a candidate prompt
                system_prompt, user_prompt = super().construct_prompt(summary, mask, *args, **kwargs)
                candidates.append((system_prompt, user_prompt))
           
            # Evaluate and select the best candidate
            if self.utility_function:
                # Use utility function to score candidates
                best_candidate = max(candidates, key=lambda pair: self.utility_function(pair[0], pair[1]))
            elif self.preference_function:
                # Use preference function to select the best candidate
                best_candidate = self.preference_function(candidates)
            else:
                # Default to the first candidate if no functions are provided
                best_candidate = candidates[0]
           
            return best_candidate

@allenanie
Collaborator

Based on my understanding of STOP, I think option A as you described is the closest to the original paper's intention.

Do you want to do a fork & new branch and make a pull request after? It doesn't have to be fully working, and we can iterate.

@doxav
Contributor Author

doxav commented Nov 21, 2024

OK, I'll try to do that this Monday.
