Instead of using Character.AI, which will send all my private conversations to governments, I found this solution. Any thoughts on this? 😅

  • Naz@sh.itjust.works
    2 days ago

    If you are using CPU only, you need to look at very small models or the 2-bit quants.
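
    One common way to do that, for example, is llama-cpp-python with a 2-bit (Q2_K) GGUF file. A minimal CPU-only sketch, where the model path and thread count are placeholders you would adjust:

    ```python
    # Minimal CPU-only sketch using llama-cpp-python (pip install llama-cpp-python).
    # The model path is a placeholder; any small Q2_K-quantized GGUF file works here.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/tiny-model.Q2_K.gguf",  # 2-bit quant keeps RAM use manageable
        n_ctx=2048,      # modest context window to limit memory
        n_threads=8,     # roughly match your physical CPU cores
    )

    out = llm("Write a short greeting.", max_tokens=64)
    print(out["choices"][0]["text"])
    ```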

    Everything will be extremely slow otherwise:

    GPU:

    Loaded Power: 465W

    Speed: 18.5 tokens/second

    CPU:

    Loaded Power: 115W

    Speed: 1.60 tokens/second

    Per watt, the GPU delivers close to 3 times the tokens per second of the CPU.
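
    That ratio comes straight from the numbers above; a quick check (figures copied from this comment):

    ```python
    # Tokens per second per watt, i.e. tokens per joule, from the quoted figures.
    gpu_tps, gpu_watts = 18.5, 465
    cpu_tps, cpu_watts = 1.60, 115

    gpu_eff = gpu_tps / gpu_watts   # ~0.0398 tokens per joule
    cpu_eff = cpu_tps / cpu_watts   # ~0.0139 tokens per joule

    print(f"GPU: {gpu_eff:.4f} tok/J, CPU: {cpu_eff:.4f} tok/J, "
          f"ratio: {gpu_eff / cpu_eff:.1f}x")   # ratio is about 2.9x, i.e. close to 3x
    ```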