Infinite repetitions and invalid JSON - Outlines with MLX #1131

Open
ea167 opened this issue Sep 5, 2024 · 1 comment · May be fixed by #1134

ea167 commented Sep 5, 2024

Describe the issue as clearly as possible:

On certain prompts, the LLM can spiral into an infinite loop, emitting the same item repeatedly until it is stopped by the max_tokens parameter.

In that case, the truncated JSON is invalid, so generation fails with an exception instead of returning any result.

Llama.cpp and MLX-LM both expose parameters that penalize repetition and thus prevent it. While Outlines accepts additional parameters to pass through to Llama.cpp, it does not for MLX-LM, so such prompts fail. A hedged sketch of calling MLX-LM directly with a repetition penalty is shown below the attached prompt.

long_42k_llm_prompt.md
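For illustration, here is a minimal sketch of calling MLX-LM directly with a repetition penalty, assuming the installed mlx-lm version (0.18.1 here) accepts a repetition_penalty keyword on generate(); this is exactly the kind of argument that Outlines currently has no way to forward to MLX-LM:

# Minimal sketch, not part of the repro below.
# Assumes this mlx-lm version accepts a repetition_penalty keyword on generate().
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")

with open("long_42k_llm_prompt.md") as f:
    long_prompt = f.read()

answer = generate(
    model,
    tokenizer,
    prompt=long_prompt,
    max_tokens=1000,
    repetition_penalty=1.2,  # values > 1.0 discount already-generated tokens
)
print(answer)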

Steps/code to reproduce the bug:

RESULTS_JSON_SCHEMA = """{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
 "results": {
  "type": "array",
  "items": {
   "type": "string"
  }
 }
},
"required": ["results"],
"additionalProperties": false
}"""
 
 
from outlines import models, generate, samplers
import json

# Long prompt attached to this issue (long_42k_llm_prompt.md)
with open("long_42k_llm_prompt.md") as f:
    long_42k_llm_prompt = f.read()

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
sampler = samplers.multinomial(top_p=0.1)
generator = generate.json(model, RESULTS_JSON_SCHEMA, sampler)

json_answer = generator(long_42k_llm_prompt, max_tokens=1000)
print(json.dumps(json_answer, indent=4))
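For context, the desired call would look something like the sketch below; the repetition_penalty keyword here is purely hypothetical and only illustrates the kind of pass-through that PR #1134 is meant to enable (the PR defines the actual mechanism):

# Hypothetical sketch: repetition_penalty is NOT a current Outlines kwarg
# for MLX-LM models; it only illustrates the desired pass-through.
json_answer = generator(
    long_42k_llm_prompt,
    max_tokens=1000,
    repetition_penalty=1.2,
)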

Expected result:

A list without endless repetition at the end.

When running MLX-LM directly, we get an infinite loop, stopped only by max_tokens:

python -m mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --prompt "$(< ~/Downloads/long_42k_llm_prompt.md)" --max-tokens 5000

...
687. **Methodist Hospital**
688. **Methodist Hospital**
689. **Methodist Hospital**
690. **Methodist Hospital**
691. **Methodist Hospital**
692. **Methodist Hospital**
693. **Methodist Hospital**
694. **Methodist Hospital**
695. **Methodist Hospital**
696. **Methodist Hospital**
697. **Methodist Hospital**

==========
Prompt: 11380 tokens, 432.382 tokens-per-sec
Generation: 5000 tokens, 26.872 tokens-per-sec
Peak memory: 6.891 GB

Error message:

No response

Outlines/Python version information:

Version information

0.0.47.dev69+g72377db
Python 3.12.4
mlx==0.17.2
mlx-lm==0.18.1

Context for the issue:

No response

ea167 commented Sep 6, 2024

I created PR #1134 to fix this problem.

Please review and merge it.

rlouf added the JSON label Sep 9, 2024