Language Model Inference Script

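This script runs chat-style inference with a language model using Hugging Face Transformers. Messages can come from a JSON file, from a single command-line message, or from interactive input, and a system message can optionally be prepended from a text file.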

Requirements

Python 3.8+
torch
transformers

Install dependencies:

pip install torch transformers

Usage

Basic Example

Run inference with a model and a JSON file containing messages:

python inference.py --model_name <model_path_or_name> --message_file messages.json

Interactive Chat Mode

You can chat interactively with the model. After each response, you can enter a new message (including an empty message) and continue the conversation. Type exit or quit to end the session.

Start interactive mode with no initial message:

python inference.py --model_name <model_path_or_name>

If you start with --message or --message_file, the script prints the model's reply and then prompts you to continue the conversation interactively.
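The interactive behaviour described above (every input, including an empty one, becomes a user turn; exit or quit ends the session) can be sketched as the loop below. This is a minimal illustration under stated assumptions, not the code in inference.py: the names chat_loop, generate and read_input are hypothetical, generate stands in for the real model call, and treating end of input (Ctrl-D) like exit is an assumption.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

# Commands that end the session, as described in this README.
EXIT_COMMANDS = {"exit", "quit"}


def chat_loop(
    messages: List[Message],
    generate: Callable[[List[Message]], str],
    read_input: Callable[[str], str] = input,
) -> List[Message]:
    """Alternate user turns and model replies until the user types exit or quit.

    `generate` is a hypothetical stand-in for the model call: it receives the
    whole conversation so far and returns the assistant's reply text.
    """
    while True:
        try:
            user_text = read_input("You: ")
        except EOFError:
            break  # assumption: end of input (e.g. Ctrl-D) behaves like exit
        if user_text.strip() in EXIT_COMMANDS:
            break
        # Empty input is not skipped: it is appended and sent like any other turn.
        messages.append({"role": "user", "content": user_text})
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        print(f"Assistant: {reply}")
    return messages
```

Passing the full message list to generate on every turn is what lets the model see earlier turns; the list grows by one user and one assistant message per exchange.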

With a System Message

To prepend a system message from a text file:

python inference.py --model_name <model_path_or_name> --message_file messages.json --system_message_file system.txt

With a Single Message

You can also provide a single message directly:

python inference.py --model_name <model_path_or_name> --message "Hello, how are you?"

Additional Options

--torch_dtype: Set torch dtype (e.g., auto, float16, bfloat16, float32)
--max_new_tokens: Maximum number of new tokens to generate per reply (default: 1000)
--device: Device to run the model on (e.g., cpu, cuda:0, mps)
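For reference, the command-line interface documented in this README could be declared with argparse roughly as follows. This is a sketch, not the parser in inference.py: the flag names come from this README and the --max_new_tokens default of 1000 is stated above, but making --model_name required and leaving --torch_dtype and --device unset by default are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Flag names are taken from this README; required/default choices other
    # than --max_new_tokens=1000 are assumptions, not read from inference.py.
    parser = argparse.ArgumentParser(
        description="Chat-style inference with a Hugging Face model"
    )
    parser.add_argument("--model_name", required=True, help="Model path or Hub name")
    parser.add_argument("--message_file", help="JSON file containing a list of messages")
    parser.add_argument("--message", help="A single message to send")
    parser.add_argument(
        "--system_message_file", help="Plain-text file prepended as a system message"
    )
    parser.add_argument(
        "--torch_dtype", default=None, help="e.g. auto, float16, bfloat16, float32"
    )
    parser.add_argument("--max_new_tokens", type=int, default=1000)
    parser.add_argument("--device", default=None, help="e.g. cpu, cuda:0, mps")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:], as on the real command line.
    return build_parser().parse_args(argv)
```

Usage: parse_args(["--model_name", "Qwen/Qwen3-1.7B", "--device", "cpu"]) yields a namespace whose max_new_tokens is 1000 unless overridden.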

Message File Format

The --message_file should be a JSON file containing a list of message objects, each with a "role" and a "content" field, e.g.:

[
  {"role": "user", "content": "Hello!"},
  {"role": "assistant", "content": "Hi! How can I help you?"}
]
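A file in this format can be read and sanity-checked with the standard library before it is handed to the model. The sketch below is illustrative only: load_messages is a hypothetical helper, and restricting roles to system, user and assistant is an assumption based on the roles used in this README, not a check that inference.py is known to perform.

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumption: only these roles are expected in a message file.
ALLOWED_ROLES = {"system", "user", "assistant"}


def load_messages(path: str) -> List[Dict[str, str]]:
    """Read a --message_file and check it is a list of role/content objects."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("message file must contain a JSON list")
    for i, msg in enumerate(data):
        if not isinstance(msg, dict) or not all(k in msg for k in ("role", "content")):
            raise ValueError(f"message {i} needs 'role' and 'content' keys")
        if msg["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unexpected role {msg['role']!r}")
    return data
```

Failing early with a clear ValueError is easier to debug than a malformed prompt reaching the chat template.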

Notes

You can send empty messages (just press Enter) in interactive mode; they are appended to the conversation and sent to the model like any other message.

System Message File

The --system_message_file should be a plain text file. Its content will be prepended as a system message, e.g.:

You are a helpful assistant.
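Prepending the system message amounts to placing a {"role": "system", ...} entry at the front of the message list. The sketch below shows one way to do that; prepend_system_message is a hypothetical helper, and stripping surrounding whitespace from the file is an assumption rather than documented behaviour of inference.py.

```python
from pathlib import Path
from typing import Dict, List, Optional


def prepend_system_message(
    messages: List[Dict[str, str]], system_file: Optional[str]
) -> List[Dict[str, str]]:
    """Return a new list with the system-file contents as the first message."""
    if system_file is None:
        return list(messages)  # no --system_message_file given: copy unchanged
    # Assumption: leading/trailing whitespace (e.g. a final newline) is dropped.
    system_text = Path(system_file).read_text(encoding="utf-8").strip()
    return [{"role": "system", "content": system_text}] + list(messages)
```

In a Transformers-based script, a list like this is typically rendered into a prompt with tokenizer.apply_chat_template(messages, add_generation_prompt=True) before generation; whether inference.py passes additional options to that call is not stated here.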

Example

python inference.py --model_name Qwen/Qwen3-1.7B --message_file messages.json --system_message_file system.txt --device cpu

Interactive Example

python inference.py --model_name Qwen/Qwen3-1.7B

License

MIT
