
Test / promptfoo Issues #872

Open
pl-shernandez opened this issue Nov 18, 2024 · 7 comments

@pl-shernandez commented Nov 18, 2024

Issue

I'm having some issues getting the test functionality to succeed, so I stripped it down to a very simple script. The failure happens whether I use npx genaiscript test add, right-click the script and choose "Run Tests", or run via the VS Code Test Explorer. My teammate is experiencing the same issue.

Script

I made this barebones script, which works just fine when run directly, and I know that the model is deployed.

genaiscript: add
prompting azure:gpt-4o (~245 tokens)

1 + 1 equals 2.

genaiscript: success

add.genaiscript.mjs

script({
  title: 'Simple Math Test',
  description: 'Validates that the model correctly calculates 1+1.',
  group: 'Basic Tests',
  temperature: 0,  
  model: 'azure:gpt-4o',
  maxTokens: 10, 
  tests: [
    {
      files: [],  
      rubrics: ['output correctly calculates 1+1 as 2'],
      facts: [`The model should return "2".`],
      asserts: [
        {
          type: 'equals',
          value: '2', 
        },
      ],
    },
  ],
});

$`What is 1 + 1?`;  

Errors

The errors I receive in the terminal when running via right-clicking the script or the Test Explorer:

❌ add Command failed with exit code 1: npx --yes '[email protected]' eval --config .genaiscript/tests/add.promptfoo.yaml --max-concurrency 1 --no-progress-bar --cache --verbose --output .genaiscript/tests/add.promptfoo.res.json http://127.0.0.1:15500/eval?evalId=eval-gtY-2024-11-18T19%3A59%3A59

and in the PromptFoo UI:
[screenshot of the promptfoo UI error]

Similarly, if I run npx genaiscript test add:
[screenshot of the terminal error]

Troubleshooting

I tried adding additional variables for the default models in my .env, but that didn't help:
AZURE_OPENAI_API_KEY=hidden
AZURE_OPENAI_API_ENDPOINT=hidden
GENAISCRIPT_DEFAULT_MODEL="azure:gpt-4o"
GENAISCRIPT_DEFAULT_SMALL_MODEL="azure:gpt-4o"
GENAISCRIPT_DEFAULT_VISION_MODEL="azure:gpt-4o"

Tried adding promptfoo as a dependency
npm install -d promptfoo@latest

Tried uninstalling and reinstalling the extension

Setup

genaiscript": "^1.76.0
vscode: Version: 1.95.3 (Universal)
mac os: 14.6.1

Going Further

Prior to this I tried to run some more complex prompts with tests, using an augmented version of the poem example, but without success: I received unterminated JSON errors even with 4096 tokens set. I felt I should ask for help on this simpler case first.

@pelikhan (Member)

I suspect we leak some logging into process.stdout that breaks the JSON output.

I will investigate.
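To illustrate the suspected failure mode (an illustrative sketch only, not GenAIScript source): if a stray log line is written to the same stdout stream that carries the JSON result, the consumer's JSON.parse fails with exactly the kind of unterminated/unexpected-token error reported above.

// pollution-sketch.mjs — illustrative only
// A child process is expected to emit *only* JSON on stdout.
// If a log line leaks onto the same stream first, the parent can no longer parse it.
const clean = JSON.stringify({ answer: "2" });
const polluted = "prompting azure:gpt-4o (~245 tokens)\n" + clean; // leaked log + JSON

try {
  JSON.parse(polluted); // throws: the leading log line is not valid JSON
} catch (err) {
  console.error("parse failed:", err.message);
}

console.log(JSON.parse(clean).answer); // "2" once stdout carries only the JSON payload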

@pelikhan (Member)

(The formatting in promptfoo is unfortunate, but there is no way to customize how it is rendered in the CLI.)

@pl-shernandez (Author)

Troubleshooting Experiment with Ollama

I tried another experiment this morning using Ollama locally: I updated add.genaiscript.mjs to use model: 'ollama:llama3.2' inside the script({...}) options.

The script runs and succeeds with 🦙 locally serving the model. All of the test-running approaches, though, now produce this message:
"Error: OpenAI API key is not set. Set the OPENAI_API_KEY environment variable or add apiKeyto the provider config.\n\nError: OpenAI API key is not set. Set the OPENAI_API_KEY environment variable or addapiKey to the provider config.\n at OpenAiChatCompletionProvider.callApi

Experimented with setting my .env file to a mix of

GENAISCRIPT_DEFAULT_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_SMALL_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_VISION_MODEL="ollama:llama3.2"

and adding placeholders, without success, though my understanding is that these aren't needed for Ollama:

OPENAI_API_KEY=
OPENAI_API_ENDPOINT=
GENAISCRIPT_DEFAULT_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_SMALL_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_VISION_MODEL="ollama:llama3.2"

promptfoo really seems dead set on that OPENAI_API_KEY.

@pelikhan (Member)

The issue is that rubrics or facts are LLM-as-judge types of assertions, and they require further configuration of the LLM grader. This is not well supported by genaiscript yet (oops). The promptfoo docs at https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/llm-rubric/#overriding-the-llm-grader have more info.
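For reference, the override described in those docs looks roughly like this when placed in the promptfoo config (a minimal sketch; the field names follow promptfoo's model-graded docs, and the Ollama grader model is just an example, not something genaiscript generates today):

# excerpt of a promptfoo config, e.g. .genaiscript/tests/add.promptfoo.yaml
defaultTest:
  options:
    # route rubric/facts grading to a local model instead of the default
    # OpenAI grader that insists on OPENAI_API_KEY
    provider: ollama:chat:llama3.2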

To debug the promptfoo issues further, you can take a look at the *.yml files dropped under .genaiscript/tests/. These are promptfoo configurations that are run directly by the promptfoo CLI, so all the docs at https://www.promptfoo.dev/docs/getting-started/ apply, since GenAIScript merely launches the promptfoo CLI on those files. This is usually how I debug promptfoo.
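For example, something along these lines, using the paths from this issue (the view subcommand, if available in your promptfoo version, opens the web viewer on the latest results):

# run the generated config directly with the promptfoo CLI
npx --yes promptfoo@latest eval --config .genaiscript/tests/add.promptfoo.yaml --verbose --no-progress-bar
# browse the results in the promptfoo web UI
npx --yes promptfoo@latest view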

@pelikhan (Member)

I need to add an escape hatch to allow configuring the promptfoo grader.

@pelikhan (Member)

@pl-shernandez 1.76.2 fixes the JSON parsing issue, which was caused by "pollution" of process.stdout when we pipe the JSON out. The issue of properly configuring promptfoo for rubric testing is still somewhat relevant. For example, I haven't seen how promptfoo handles running Azure with Microsoft Entra yet.

Maybe it's time to remove this dependency and run it all in genaiscript.

@pl-shernandez (Author) commented Nov 21, 2024

@pelikhan Spent some time working with this today to understand it better. I was able to call promptfoo directly via npx and supply an override grader as suggested, while using the YAML that genaiscript creates. I had to do two things when calling it directly to satisfy promptfoo:

1. Add these environment variables to .env for promptfoo's Azure configuration:

AZURE_DEPLOYMENT_NAME= 
AZURE_API_KEY= 
AZURE_API_HOST= 

2. Pass the --grader flag, remove the --cache flag, and use the auto-generated YAML config with no modifications:
npx --yes '[email protected]' eval --grader azureopenai:chat:gpt-4o-mini --config .genaiscript/tests/add.promptfoo.yaml --max-concurrency 1 --no-progress-bar --verbose --output .genaiscript/tests/add.promptfoo.res.json

For reference, this is the command it runs when right-click > Run Tests or the Test Explorer is used (the one I modified above):
npx --yes '[email protected]' eval --config .genaiscript/tests/add.promptfoo.yaml --max-concurrency 1 --no-progress-bar --cache --verbose --output .genaiscript/tests/add.promptfoo.res.json
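As an alternative to passing --grader on every manual run, the override from the promptfoo docs linked above could presumably also be written into the YAML itself (illustrative sketch; the provider id mirrors the value used in the --grader flag, and the file may be regenerated by genaiscript, so this only helps for manual promptfoo runs):

# .genaiscript/tests/add.promptfoo.yaml (excerpt, illustrative)
defaultTest:
  options:
    provider: azureopenai:chat:gpt-4o-mini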
