• brucethemoose@lemmy.world

    For local LLMs, this is an issue because it breaks your prompt cache and slows things down, unless you have a dedicated tiny model to “categorize” the text… which few people have really worked on.
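
    To make the cache issue concrete, here’s a toy sketch (not any particular runtime’s actual code): the prompt cache is effectively a prefix cache over tokens, so inserting a category tag near the front invalidates almost everything after it.

    ```python
    # Toy sketch of prefix (KV) caching -- not any runtime's real code.
    # Only the longest shared token prefix between the cached prompt and
    # the new prompt can be reused; everything after the first difference
    # must be recomputed.
    def reusable_prefix_len(cached_tokens: list[str], new_tokens: list[str]) -> int:
        n = 0
        for a, b in zip(cached_tokens, new_tokens):
            if a != b:
                break
            n += 1
        return n

    cached = ["<sys>", "You", "are", "helpful", ".", "User:", "hi"]
    # Prepending a category tag shifts every later token out of the cache:
    tagged = ["<sys>", "[category:chat]", "You", "are", "helpful", ".", "User:", "hi"]
    print(reusable_prefix_len(cached, tagged))  # 1 -> nearly a full recompute
    ```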

    I don’t think the corporate APIs or UIs even do this. You’re not wrong; it just isn’t done, for some reason.

    It could be that the trainers don’t realize it’s an issue. For instance, “0.5–0.7” is the recommended temperature range for DeepSeek R1, but I find that going much lower, or slightly higher, works far better, depending on the category and the other sampling parameters.
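
    As a sketch of what per-category sampling could look like (assuming a local OpenAI-compatible server such as llama.cpp’s llama-server; the categories, keywords, and temperature values are made up for illustration, not official recommendations): since temperature is a per-request sampler setting, you can vary it by category without touching the prompt or the cache. The keyword classifier is just a stand-in for the tiny categorizer model mentioned above.

    ```python
    # Sketch: vary temperature per prompt category via sampling parameters,
    # which are per-request and don't touch the cached prompt prefix.
    # Assumes a local OpenAI-compatible server (e.g. llama.cpp's llama-server
    # on localhost:8080); categories, keywords, and temperatures below are
    # illustrative assumptions, not official recommendations.
    import requests

    TEMPS = {"code": 0.2, "math": 0.3, "creative": 0.9, "general": 0.6}

    def categorize(prompt: str) -> str:
        # Stand-in for a tiny categorizer model -- crude keyword heuristics.
        p = prompt.lower()
        if any(k in p for k in ("def ", "function", "bug", "compile")):
            return "code"
        if any(k in p for k in ("prove", "solve", "equation", "integral")):
            return "math"
        if any(k in p for k in ("story", "poem", "imagine")):
            return "creative"
        return "general"

    def ask(prompt: str) -> str:
        r = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={
                "model": "local",  # most local servers ignore or alias this
                "messages": [{"role": "user", "content": prompt}],
                "temperature": TEMPS[categorize(prompt)],
            },
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]

    print(ask("Solve the equation x^2 - 4 = 0."))
    ```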