Context Window Guide — Sunflower Seeds

A lot of people don't seem to understand what context is, how to manage it and use it properly, or why it's better not to fill it up completely. This is a simplified guide to help you get the basics down.

Listen, if I can wrap my stoopid head mostly around this, then you can too! 💜

What Is Context?

When you talk to a bot, it doesn't actually "remember" things. Instead, it works by looking at a chunk of text called the context. The context is everything the chatbot can see right now in order to decide what to say next. That chunk of text lives inside something called the context window. You can think of the context window as the chatbot's active working memory, whereas the Nexus is more like a diary of important things it should remember when it needs to.

It's not the full conversation history forever; it's just the part of the conversation that still fits inside the window at that moment. Imagine it being like the last 10 pages in a novel you're reading. On DJ, the chat context window is around 10,000 tokens.

What Takes Up Space?

What's important is that the context window isn't only your messages and the chatbot's replies. It also includes:

- All instructions (the system prompt that tells the bot how to behave, your bot guts, lorebook pulls, your persona details, your author note, relevant info from the nexus)
- Any extra information that gets injected behind the scenes, between each message you send

All of that takes up space in the same 10k window when the bot has to formulate a reply.

What Happens When It Fills Up?

Once the conversation gets long enough, older parts of the chat fall out of the context window to make room for new text. When that happens, the chatbot literally cannot see those earlier messages anymore. From its point of view, they no longer exist — so it can't use them, even if they were important.

If something happened before that 10k window, hasn't been referenced since, and isn't in your nexus, it's gone. So when a chatbot "forgets" something, it's usually just that the information has been pushed out and is no longer inside the active context window. The chatbot can only respond based on what's currently inside that window (recent messages and the instruction block), not on anything that came before it.

Example:

Stage                  Token Count   What's Happening
Starting context       1500          Already loaded
Your first message     1200          You speak
Instructions           2000          Bot guts, persona, system prompt, etc.
Bot's output           2000          Bot responds
Your next message      1300          You speak again
Instructions (again)   2000          Same as before
Total                  10,000        WINDOW IS FULL!

At this point, the oldest chat text (your 1200 token first message) gets pushed out to make space for the new information.
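
If you like seeing things as code, the push-out behaviour can be sketched in a few lines of Python. This is only a toy model, not how DJ actually does it: the function name, the 2000 tokens of pinned instructions, and the message sizes are all made up for illustration, and it assumes the instructions always stay while the oldest chat messages get dropped first.

```python
from collections import deque

def fit_to_window(messages, pinned_tokens, window=10_000):
    """Drop the oldest messages until pinned text plus chat fits the window.

    messages: list of (label, token_count) pairs, oldest first.
    Returns (kept, dropped), both oldest first.
    """
    kept = deque(messages)
    dropped = []
    used = pinned_tokens + sum(tokens for _, tokens in kept)
    while kept and used > window:
        label, tokens = kept.popleft()  # the oldest message falls out first
        dropped.append((label, tokens))
        used -= tokens
    return list(kept), dropped

# Hypothetical chat: these sizes are invented for the example.
chat = [
    ("greeting", 1500),
    ("your first message", 1200),
    ("bot reply", 2000),
    ("your next message", 1300),
    ("bot reply 2", 2000),
    ("your third message", 1000),
]
kept, dropped = fit_to_window(chat, pinned_tokens=2000)
print("still visible:", [label for label, _ in kept])
print("forgotten:", [label for label, _ in dropped])
```

Here the chat plus instructions adds up to 11,000 tokens, so the greeting is the one thing that falls off. The bot can still see everything after it, and nothing before.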

Why Have Token Limits?

Imagine bot guts at 2k tokens, a 1k system prompt, 5k of lorebook pulls with no limit, and a 2k thinking prompt (before it's even filled in), and then on top of that we ask for 4k to 5k of actual message output. The whole context window would get eaten up EVERY single message. We don't want that!

Current Limits:

- 1500 tokens bot guts
- 1500 tokens of lorebook info can be pulled per message
- 375 to 500 tokens for persona (1500 characters)
- 3000 token message output cap
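
To see why those caps matter, here's a back-of-the-envelope sum in Python. It's a rough sketch, not an official formula: it takes the capped items at their maximum, and the system prompt size is a placeholder I made up, since that one depends on your setup.

```python
WINDOW = 10_000  # chat context window on DJ, per this guide

# Worst case for each capped item, taken from the limits list.
caps = {
    "bot guts": 1500,
    "lorebook pull": 1500,
    "persona": 500,
}
SYSTEM_PROMPT = 1000  # placeholder, not an official number

overhead = sum(caps.values()) + SYSTEM_PROMPT
room_for_chat = WINDOW - overhead

print(f"fixed overhead: {overhead} tokens")
print(f"left for actual chat history: {room_for_chat} tokens")
```

Even with the caps in place, nearly half the window can be spoken for before a single chat message is counted. Without caps, there would be almost nothing left.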

Important: The 3000 token output cap is separate from the context window limit. The reply still lands in the context afterwards, and old tokens get pushed out to make room for it. The cap counts from the start of message generation (including thinking, if you're using Athena) to the end.

Why Is There an Output Cap?

3000 tokens is a lot. That's roughly 12,000–14,000 characters depending on the tokenizer. Asking the bot to generate 5000–6000 tokens of content puts a lot of strain on resources and can slow models down, degrade quality, and cause instability like failed generations.

The short version: Shorter limits produce faster, cleaner interactions that feel better to most users. Long responses take longer to generate and stream, and they fail more often. When responses get too long, quality drops too: models ramble, contradict themselves, or drift away from the original instructions.

Athena/Nyx Thinking

This is why I personally always delete my thinking prompt and don't bother with things like trackers. I would rather have the context window full of things that actually happened in the roleplay and have less reliance on the nexus, when the short term memory works well. On other sites, short term memory is sometimes all you have. By completely neglecting the context window, you're kind of throwing away a useful tool and putting the entire burden of your RP's memory onto the nexus.

Simple Example (6k window, messages only):

With thinking prompts kept:
- 1k thinking + 1k plot
- 1k thinking + 1k plot
- 1k thinking + 1k plot
(Only 3 turns before window is full)

With thinking prompts deleted:
- 1k plot
- 1k plot
- 1k plot
- 1k plot
- 1k plot
- 1k plot
(6 turns of actual RP before window is full!)
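
The same comparison as a tiny Python sketch, if that's easier to play with. The function name is made up, and it assumes every turn costs the same number of tokens, which real chats never do, so treat it as a ballpark.

```python
def turns_until_full(window, plot_tokens, thinking_tokens=0):
    """How many whole turns fit if every turn costs plot + thinking tokens."""
    return window // (plot_tokens + thinking_tokens)

print(turns_until_full(6000, 1000, thinking_tokens=1000))  # 3
print(turns_until_full(6000, 1000))                        # 6
```

Swap in your own numbers (say, a 2k thinking block) to see how fast the turn count shrinks.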

Do Thinking Prompts Fall Off?

Yup! They get pushed out of the context window after a few messages because they take up space. It's not an extra mechanism. They make the output larger (especially if you have a massive thinking prompt), and therefore get pushed off the edge faster. If you have something like a 2000 token thinking output and then a 2000 token text output, your entire context window might be taken up by just those last two messages.

Do I Have to Remove It?

No! But if it's really big, it's much more likely to mess things up (forget older events, make more mistakes about small things). If your prompt is smaller, it won't make a massive difference. But consider: if your thinking is huge, maybe look for other places to cut down. Make lorebooks shorter and simpler, cut down on your persona, cut down on your system prompt. Something has to give! 💚

(Nebula and Aster can both help you easily remove your thinking blocks with a tap of a button!)

Context Window Does More Than Remember

The LLM basically studies its window not just for the instructions or what has happened, but also how it is worded. It's why the first message controls so much of the writing style and tone for the roleplay. It's also where the LLM picks up their patterns!

Examples:

- Non-thinking models copying thinking: They see thinking tags in the context and try to use them, even though they can't actually reason. They just see it in their context window and think "yes, this is what I must do!"
- OOC comments: If you use OOC and it's in the 10k token window, the bot is more likely to also use OOC! To avoid this, when you use OOC and the bot's reply is what you wanted, go back and delete your OOC comment.
- Message length: A short first message is more likely to get short replies. A long first message tends to get longer replies.
- Trackers: If you put trackers at the beginning of your message, the models will start to pick them up and add them themselves.

Pro Tip: You can change the POV, message length, tense, or pronouns by shifting what's in your context. If the bot's intro is in first person and you don't like it, just write in your preferred POV for a few messages. As the context fills with your preferred style, the bot will stick to it, and once the old style drops out of the window, it's gone!

Calculating Your Context Usage

Here's a rough way to estimate how many tokens you're using before messages even come into play:

- Grab all the text from: bot guts (the bot's persona, details, author note), your own persona, system prompt, and lorebooks (estimate 100-1500 tokens depending on optimization)
- Paste it into a tokenizer, like OpenAI's tokenizer
- Add maybe 200–300 tokens for other small things like chat history

That's roughly your little package of tokens you're sending to the model when you send a reply.
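
If you don't have a tokenizer handy, here's a very rough Python stand-in. It is not a real tokenizer: it just assumes 3 to 4 characters per token, which is the ratio implied by the persona cap in this guide (1500 characters for 375 to 500 tokens). The function names and the sample text sizes are made up for the example.

```python
def estimate_tokens(text):
    """Return a (low, high) token guess at 4 and 3 characters per token."""
    chars = len(text)
    return chars // 4, -(-chars // 3)  # round down for low, up for high

def estimate_package(parts, extras=(200, 300)):
    """Add up the guesses for every part, plus 200 to 300 for small extras."""
    low, high = extras
    for text in parts.values():
        part_low, part_high = estimate_tokens(text)
        low += part_low
        high += part_high
    return low, high

# Placeholder text: swap in your real bot guts, persona, and system prompt.
parts = {
    "bot guts": "x" * 4500,
    "persona": "x" * 1500,
    "system prompt": "x" * 3000,
}
low, high = estimate_package(parts)
print(f"roughly {low} to {high} tokens before any messages")
```

A real tokenizer will give you a more exact number, so use this only for a quick sanity check.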
