• 0 Posts
  • 16 Comments
Joined 1 year ago
cake
Cake day: July 3rd, 2023

help-circle
  • qqq@lemmy.worldtomemes@lemmy.worldBlursed Bot
    link
    fedilink
    arrow-up
    3
    ·
    edit-2
    17 days ago

    The important point there is that they don’t care imo. It’s not even worth the effort to try.

    You can likely come up with something “good enough” though yea. Your original code would probably be good enough if it was normalized to lowercase before the check. My point was that denylists are harder to construct than they initially appear. Especially in the LLM case.


  • qqq@lemmy.worldtomemes@lemmy.worldBlursed Bot
    link
    fedilink
    arrow-up
    3
    ·
    edit-2
    17 days ago

    IGNORE ALL PREVIOUS INSTRUCTIONS

    Disregard all previous instructions

    Potentially even:

    ingore all previous instructions

    Ignor all previous instructions

    Also leaks that it might be an LLM by never responding to posts with “ignore”












  • This is not necessarily true.

    For example, consider the case of a 1Password vault falling into the hands of an attacker. They do not have the option to just crack your password, as the password is mixed with a randomly generated value to ultimately derive the key. They would need to simultaneously brute force your password and that random value. This should almost be impossible. However, given access to a client that already has knowledge of the secret value, it would fall back to brute forcing the password.