
A lot of people don't seem to understand what context is, how to manage it, how to use it well, or why it's better not to fill it up completely. This is a simplified guide to help you get the basics down.

Listen, if I can wrap my stoopid head mostly around this, then you can too! 💜

What Is Context?

When you talk to a bot, it doesn't actually "remember" things. Instead, it works by looking at a chunk of text called the context. The context is everything the chatbot can see right now in order to decide what to say next. That chunk of text lives inside something called the context window. You can think of the context window as the chatbot's active working memory, whereas the Nexus is more like a diary of important things it should remember when it needs to.

It's not the full conversation history forever; it's just the part of the conversation that still fits inside the window at that moment. Imagine only being able to see the last 10 pages of a novel you're reading. On DJ (DreamJourney), the chat context window is around 10,000 tokens.

What Takes Up Space?

What's important is that the context window isn't only your messages and the chatbot's replies. It also includes:

All of that takes up space in the same 10k window when the bot has to formulate a reply.

What Happens When It Fills Up?

Once the conversation gets long enough, older parts of the chat fall out of the context window to make room for new text. When that happens, the chatbot literally cannot see those earlier messages anymore. From its point of view, they no longer exist — so it can't use them, even if they were important.

If something happened before that 10k window, hasn't been referenced since, and isn't in your nexus, it's gone. So when a chatbot "forgets" something, it hasn't lost it in some deeper sense; the information has simply been pushed outside the active context window. The chatbot can only respond based on what's currently inside that window (recent messages and the instruction block), not on anything that came before it.
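To make "falling out of the window" concrete, here's a minimal Python sketch of a simple sliding window. The token counts, the `INSTRUCTIONS` size, and the `visible_messages` helper are all hypothetical; real platforms count tokens with a tokenizer and may trim history in their own way. It just walks back from the newest message and keeps whatever still fits.

```python
# Hypothetical numbers: real platforms count tokens with a tokenizer
# and may trim history differently; this only shows the basic idea.
WINDOW = 10_000       # approximate DJ chat context window, per the guide
INSTRUCTIONS = 2_000  # assumed size of bot guts + persona + system prompt


def visible_messages(history: list[tuple[str, int]]) -> list[str]:
    """Return the messages that still fit in the window, oldest first.

    Starts at the newest message and walks backwards, keeping messages
    until the next (older) one no longer fits next to the instructions.
    """
    budget = WINDOW - INSTRUCTIONS
    kept = []
    for name, tokens in reversed(history):
        if tokens > budget:
            break  # this message, and everything older, falls out of view
        budget -= tokens
        kept.append(name)
    return list(reversed(kept))


history = [
    ("intro", 1_500),
    ("you: msg 1", 1_200),
    ("bot: reply 1", 2_000),
    ("you: msg 2", 1_300),
    ("bot: reply 2", 2_000),
    ("you: msg 3", 1_500),
]
print(visible_messages(history))
# ['you: msg 1', 'bot: reply 1', 'you: msg 2', 'bot: reply 2', 'you: msg 3']
```

The intro no longer fits, so from the bot's point of view it simply doesn't exist anymore.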

Example:

| Stage | Token count | What's happening |
|---|---|---|
| Starting context | 1,500 | Already loaded |
| Your first message | 1,200 | You speak |
| Instructions | 2,000 | Bot guts, persona, system prompt, etc. |
| Bot's output | 2,000 | Bot responds |
| Your next message | 1,300 | You speak again |
| Instructions (again) | 2,000 | Same as before |

Total: 10,000 tokens. WINDOW IS FULL!

At this point, the oldest tokens, starting with your 1,200-token first message, get pushed out to make space for the new information.
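Here's the example table as a quick tally, a minimal Python sketch that uses only the numbers from the table. The `overflow` helper is a hypothetical illustration of the idea that once the window is full, every new token forces an old token out.

```python
WINDOW = 10_000  # DJ's approximate chat context window

# (stage, tokens) pairs copied from the example table
stages = [
    ("Starting context", 1_500),
    ("Your first message", 1_200),
    ("Instructions", 2_000),
    ("Bot's output", 2_000),
    ("Your next message", 1_300),
    ("Instructions (again)", 2_000),
]


def overflow(current_total: int, incoming: int, window: int = WINDOW) -> int:
    """Hypothetical helper: how many old tokens must go to fit `incoming`."""
    return max(0, current_total + incoming - window)


total = sum(tokens for _, tokens in stages)
print(f"Total: {total:,} / {WINDOW:,} tokens")  # Total: 10,000 / 10,000 tokens
print(overflow(total, 1_200))                   # 1200
```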

Why Have Token Limits?

Imagine bots that were 2k tokens, system prompts that were 1k, lorebooks with no limits at 5k, and thinking prompts that take 2k before they're even filled in, plus requests for 4k or 5k tokens of actual message output. That's already 10k before a single reply is written, so the whole context window would get eaten up on EVERY single message. We don't want that!
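Running the numbers from that scenario makes the problem obvious. This is a small Python sketch using the sizes from the paragraph above; the dictionary keys are just labels, and a negative result means there's no room left for any chat history at all.

```python
WINDOW = 10_000  # DJ's approximate chat context window, per the guide

# The "no limits" scenario from the paragraph above, in tokens.
fixed_parts = {
    "bot": 2_000,
    "system prompt": 1_000,
    "lorebooks": 5_000,
    "thinking prompt (before it's filled)": 2_000,
}
requested_output = 4_000  # low end of the 4k/5k outputs mentioned above

fixed = sum(fixed_parts.values())
room_for_chat = WINDOW - fixed - requested_output
print(fixed, room_for_chat)  # 10000 -4000
```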

Current Limits:

Important: The 3,000-token output cap is separate from the context window. The context window is what gets trimmed, with old tokens pushed out to make room for new ones. The output cap limits a single generation, counting everything from the start of the message (including thinking, if you're using Athena) to the end.
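Since thinking counts toward the same cap, every token spent thinking is a token the visible reply can't use. A minimal Python sketch, assuming thinking and reply share one 3,000-token budget as described above (`reply_budget` is a hypothetical helper name):

```python
OUTPUT_CAP = 3_000  # tokens per generation, per the guide


def reply_budget(thinking_tokens: int, cap: int = OUTPUT_CAP) -> int:
    """Tokens left for the visible reply after thinking uses its share of the cap."""
    return max(0, cap - thinking_tokens)


print(reply_budget(0))      # 3000
print(reply_budget(1_200))  # 1800
```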

Why Is There an Output Cap?

3000 tokens is a lot. That's roughly 12,000–14,000 characters depending on the tokenizer. Asking the bot to generate 5000–6000 tokens of content puts a lot of strain on resources and can slow models down, degrade quality, and cause instability like failed generations.
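The tokens-to-characters conversion is just multiplication by an average. This Python sketch assumes roughly 4.0 to 4.7 characters per token for English prose, which lands near the range quoted above; the real ratio depends on the tokenizer and the text, and `approx_chars` is a hypothetical helper.

```python
OUTPUT_CAP = 3_000  # tokens


def approx_chars(tokens: int, chars_per_token: float) -> int:
    """Rough character count for a token budget (English prose, not exact)."""
    return round(tokens * chars_per_token)


# Assumption: ~4.0 to ~4.7 characters per token. Other tokenizers and
# languages can differ quite a bit.
low, high = approx_chars(OUTPUT_CAP, 4.0), approx_chars(OUTPUT_CAP, 4.7)
print(f"~{low:,} to ~{high:,} characters")  # ~12,000 to ~14,100 characters
```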

The short version: Shorter limits produce faster, cleaner interactions that feel better to most users. Long responses are slow to generate, increase latency, take longer to stream, and fail more often. When responses get too long, model performance degrades—they ramble, contradict themselves, or drift away from the original instructions.

Athena/Nyx Thinking

This is why I personally always delete my thinking prompt and don't bother with things like trackers. I would rather fill the context window with things that actually happened in the roleplay and rely less on the nexus, since short-term memory works well when you give it room. On other sites, short-term memory is sometimes all you have. If you neglect the context window completely, you're throwing away a useful tool and putting the entire burden of your RP's memory on the nexus.

Simple Example (6k window, messages only):

With thinking prompts kept:
- 1k thinking + 1k plot
- 1k thinking + 1k plot
- 1k thinking + 1k plot
(Only 3 turns before window is full)

With thinking prompts deleted:
- 1k plot
- 1k plot
- 1k plot
- 1k plot
- 1k plot
- 1k plot
(6 turns of actual RP before window is full!)
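The same comparison as a tiny Python sketch: divide the window by what each turn costs. The 6k case uses the example's numbers; the 8k case assumes the ~10k window minus about 2k of instructions, which is an estimate, not a measured value.

```python
def turns_that_fit(window: int, plot: int, thinking: int = 0) -> int:
    """Full turns that fit when every turn costs plot + thinking tokens."""
    return window // (plot + thinking)


print(turns_that_fit(6_000, 1_000, thinking=1_000))  # 3
print(turns_that_fit(6_000, 1_000))                  # 6
# Assumed: ~10k window minus ~2k of instructions leaves about 8k for messages.
print(turns_that_fit(8_000, 1_000, thinking=1_000))  # 4
print(turns_that_fit(8_000, 1_000))                  # 8
```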

Do Thinking Prompts Fall Off?

Yup! They get pushed out of the context window after a few messages, just like everything else; there's no special mechanism for it. Because they make each output larger (especially if you have a massive thinking prompt), they push older content off the edge faster. If you have something like a 2,000-token thinking output plus a 2,000-token text output, each bot message costs 4,000 tokens, so your last two bot messages alone could take up most of your context window.

Do I Have to Remove It?

No! But if it's really big, it's much more likely to mess things up (forget older events, make more mistakes about small things). If your prompt is smaller, it won't make a massive difference. But consider: if your thinking is huge, maybe look for other places to cut down. Make lorebooks shorter and simpler, cut down on your persona, cut down on your system prompt. Something has to give! 💚

(Nebula and Aster can both help you easily remove your thinking blocks with a tap of a button!)

Context Window Does More Than Remember

The LLM basically studies its window not just for the instructions or what has happened, but also for how it's worded. That's why the first message controls so much of the roleplay's writing style and tone. It's also where the LLM picks up its patterns!

Examples:

Pro Tip: You can change the POV, message length, tense, or pronouns by shifting what's in your context. If the bot's intro is in first person and you don't like it, just write in your preferred POV for a few messages. Once the context holds more of your preferred style, the bot will stick to it. And once the old intro falls out of the context, it's gone!

Calculating Your Context Usage

Here's a rough way to estimate how many tokens you're using before messages even come into play:

  1. Grab all the text from: bot guts (persona, details, author note), persona, system prompt, and lorebooks (estimate 100-1500 tokens depending on optimization)
  2. Put them into a tokenizer like OpenAI's tokenizer
  3. Add maybe 200–300 for other small things like chat history

That's roughly the little package of tokens you send to the model every time you send a reply.
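If you'd rather ballpark it without a tokenizer, here's a minimal Python sketch of the steps above. It assumes about 4 characters per token (a common rough rule for English) and uses 250 as the midpoint of the "200-300 for other small things" step; the placeholder strings stand in for a hypothetical setup. For real counts, use an actual tokenizer such as OpenAI's (or the `tiktoken` Python package), keeping in mind the models behind DJ may split text differently.

```python
import math

# Assumptions: ~4 characters per token, and 250 tokens of extra overhead.
CHARS_PER_TOKEN = 4
OVERHEAD = 250     # midpoint of the guide's "200-300 for other small things"
WINDOW = 10_000    # DJ's approximate chat context window


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fixed_cost(pieces: dict[str, str]) -> int:
    """Rough tokens sent every turn before any chat messages are added."""
    return sum(estimate_tokens(text) for text in pieces.values()) + OVERHEAD


# Placeholder text standing in for a hypothetical setup; paste your own.
pieces = {
    "bot guts": "a" * 6_000,
    "persona": "b" * 1_200,
    "system prompt": "c" * 2_000,
    "lorebooks": "d" * 2_400,
}
cost = fixed_cost(pieces)
print(cost, WINDOW - cost)  # 3150 6850
```

Whatever is left over after the fixed cost is the space your actual roleplay gets to live in.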


Unofficial fan site. All characters, bots, and guides are fan-made and not affiliated with DreamJourney AI, Janitor AI, or any other platform. DreamJourney AI and Janitor AI are 18+ platforms — please make sure you meet the age requirements.

Please take care of yourself when engaging with AI roleplay. Some chatbots deal with darker themes — use your own discretion, and close the chat or find another character if things start to bother you. It is always okay to step away. 💚

🌻 © Sunflower Seeds  ·  Fan-made & unofficial