Merge changes from randaller/llama-chat #4

Open
Honigmelone wants to merge 1 commit into randaller:main from Honigmelone:main
Conversation

@Honigmelone

Hey,

I noticed the default prompt in example-chat.py was quite different from your two repos. I have merged some more recent changes from https://github.com/randaller/llama-chat to get the interactive chat working in the CPU-only version.

I have not merged the model and the tokenizer yet. You might want to consider building on this and merging them as well to obtain two consistent repositories.

@randaller
Owner

@Honigmelone this will break all other examples; llama-chat is now the primary repo, and this repo is deprecated.

@Honigmelone
Author

I see. Is it somehow possible to run llama-chat in CPU-only mode, or has this functionality been dropped?

@alaestor

alaestor commented Mar 19, 2023

I haven't a clue what I'm doing and am just quickly messing around, but regarding llama-chat/llama/model.py: I changed use_gpu in def forward to False, and then changed all occurrences of .cuda() to .cpu() in Transformer's and Attention's inits. It just sorta... worked. Kind of. I assume it's tailored for GPU use because it's slow as heck on CPU (going from llama-cpu @ 1 it/s to the bodged llama-chat's 6~8 s/it with 7B on my 7950X).
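For what it's worth, a less bodgy version of the same idea is to pick the device once instead of editing every .cuda() call by hand. This is only a sketch of the pattern, not the actual llama-chat code; pick_device and use_gpu here are illustrative names, not identifiers from the repo.

```python
# Hypothetical sketch: centralize device selection instead of hard-coding
# .cuda() / .cpu() throughout the model code.
import torch


def pick_device(use_gpu: bool = True) -> torch.device:
    """Return the CUDA device when requested and available, else the CPU."""
    if use_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device(use_gpu=False)

# Instead of:  x = torch.zeros(4).cuda()
# create the tensor on the chosen device directly:
x = torch.zeros(4, device=device)
```

Modules can then be moved with `model.to(device)` in one place, so switching between GPU and CPU runs doesn't require touching the Transformer and Attention internals at all.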

Hopefully proper CPU support will come to the main repo some day... For now I guess I'll just base my own personal experiments on this deprecated repo, or Frankenstein myself some hybrid of the two.
