

Thanks for the edit. That's a very intriguing idea: a second LLM running in the background that maintains a summary of the conversation plus the static context could improve performance a lot. I don't know whether anyone has implemented it, or how one would DIY it with Kobold/Ollama, but a rough sketch is below. I think it's a great idea for code assistants too, if you're doing a long coding session.
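
In case it's useful, here's a minimal Python sketch of the idea against Ollama's HTTP API. The default endpoint on port 11434 and the `/api/generate` route are Ollama's actual API; the model names `llama3` and `phi3`, the turn thresholds, and the summarization prompt are just placeholder assumptions:

```python
import requests  # pip install requests

OLLAMA = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def ask(model: str, prompt: str) -> str:
    """One non-streaming completion from a local Ollama model."""
    r = requests.post(OLLAMA,
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=120)
    r.raise_for_status()
    return r.json()["response"]

STATIC_CONTEXT = "System prompt, project notes, character card, etc."
summary = ""   # rolling summary maintained by the background model
history = []   # recent raw turns, kept verbatim

def chat(user_msg: str) -> str:
    global summary
    # The main model only ever sees: static context + summary + last few
    # turns, so the prompt stays short no matter how long the session runs.
    recent = "\n".join(history[-6:])
    prompt = (f"Context:\n{STATIC_CONTEXT}\n\n"
              f"Summary of earlier conversation:\n{summary}\n\n"
              f"{recent}\nUser: {user_msg}\nAssistant:")
    reply = ask("llama3", prompt)  # main model; name is a placeholder
    history += [f"User: {user_msg}", f"Assistant: {reply}"]
    # Once enough turns pile up, have the second (smaller) model fold them
    # into the rolling summary and drop the raw text.
    if len(history) > 8:
        summary = ask("phi3",  # background summarizer; name is a placeholder
                      "Update this summary with the new turns, keeping key facts:\n\n"
                      f"Summary so far:\n{summary}\n\nNew turns:\n" + "\n".join(history))
        history.clear()
    return reply
```

Here the summarization runs inline after each reply for simplicity; in practice you'd push it to a background thread so it never blocks the user-facing call.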
Yes, I just wanted to suggest checking that the correct ports are open. I.e.: is port 443 open for NGINX on Unraid? Is NGINX forwarding traffic to the correct backend port? And if the backend is the one handling HTTPS, is it configured to accept traffic for your specific domain (or for all domains)?
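
If it helps, here's a rough Python diagnostic covering those three checks. Run it from a machine outside your LAN if possible, and note that the hostname is a placeholder for your own domain:

```python
import socket
import ssl
import urllib.error
import urllib.request

HOST = "app.yourdomain.com"  # placeholder; replace with your real domain

# 1. Is port 443 reachable at all? (From outside your LAN, otherwise you
#    may only be testing your router's NAT loopback.)
with socket.create_connection((HOST, 443), timeout=5) as sock:
    print("port 443 is open")
    # 2. Does NGINX complete a TLS handshake with a cert matching the domain?
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        print("TLS handshake OK:", tls.version())

# 3. Does NGINX actually reach the backend? A 502/504 here usually means
#    proxy_pass points at the wrong port or the backend container is down.
try:
    with urllib.request.urlopen(f"https://{HOST}/", timeout=10) as resp:
        print("HTTP status:", resp.status)
except urllib.error.HTTPError as e:
    print("NGINX answered with HTTP", e.code)
```

If step 1 fails you're looking at port forwarding/firewall on Unraid; if step 2 fails it's the cert or server_name in NGINX; if step 3 returns 502/504 it's the proxy_pass target or the backend itself.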