Chat Tab
Starting a Chat

When you open the Chat tab, the chat screen is displayed. To start a chat, you first need to select the model you want to use for the chat.

To select a model, click or tap the model switch button in the top right corner. This will display a list of models on the currently selected Ollama server, and you can select the model you want to use (to switch the selected server, please refer to the Server Tab documentation).
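For reference, the model list shown here comes from the selected Ollama server. Below is a minimal sketch, not the app's actual code, of fetching that list yourself via Ollama's standard GET /api/tags endpoint; the server address is a placeholder for the server selected in the Server tab.

```swift
import Foundation

// Minimal sketch: fetch the models that an Ollama server reports via GET /api/tags.
struct TagsResponse: Decodable {
    struct Model: Decodable { let name: String }
    let models: [Model]
}

func listModels(on server: URL) async throws -> [String] {
    let url = server.appendingPathComponent("api/tags")
    let (data, _) = try await URLSession.shared.data(from: url)
    return try JSONDecoder().decode(TagsResponse.self, from: data).models.map(\.name)
}

// Example (placeholder address): try await listModels(on: URL(string: "http://localhost:11434")!)
```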

Once a model is selected, you can enter a message. Type the text you want to send to the AI model and click or tap the Send button.
To insert a line break on macOS, press ⇧ (Shift) + ↩︎ (Return) simultaneously.

When you send a message, a chat request is sent to the Ollama server, and the model loading process will begin if necessary.
Loading may take time depending on the model type and the type of storage where the model is saved on the Ollama server.
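
Under the hood, sending a message corresponds to a request to Ollama's POST /api/chat endpoint. The sketch below shows what such a request looks like; it is not the app's actual code, and the server URL and model name are placeholders.

```swift
import Foundation

// Minimal sketch: send one user message to POST /api/chat and read the reply.
struct ChatMessage: Codable { let role: String; let content: String }
struct ChatRequest: Encodable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}
struct ChatResponse: Decodable { let message: ChatMessage }

func sendChat(to server: URL, model: String, prompt: String) async throws -> String {
    var request = URLRequest(url: server.appendingPathComponent("api/chat"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: model,
                    messages: [ChatMessage(role: "user", content: prompt)],
                    stream: false)   // false = wait for a single complete JSON response
    )
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(ChatResponse.self, from: data).message.content
}
```
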
Tip
The API timeout is set to 30 seconds by default, so a timeout error may occur if model loading takes a long time.
If you know that model loading will take time, I recommend increasing the API timeout duration in Settings or setting it to unlimited.
For instructions on how to set the API timeout duration, please refer to the Settings documentation.
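
As an illustration only (how the app applies its timeout setting is not shown here), a longer or effectively unlimited request timeout in Swift looks like this:

```swift
import Foundation

// Illustration: give requests more time so slow model loads don't trigger a timeout error.
let configuration = URLSessionConfiguration.default
configuration.timeoutIntervalForRequest = 300          // seconds
// configuration.timeoutIntervalForRequest = .infinity // effectively "unlimited"
let session = URLSession(configuration: configuration)
```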

After a while, you will receive a response from the AI.
Once the response is fully completed, you can perform operations on the message. The possible operations are as follows:
- Your Messages
  - Copy
    - You can copy the message as Markdown-formatted text.
  - Edit
    - You can edit the sent message and resend it. Only the last sent message can be edited.
- AI Messages
  - Switch to Previous Revision
    - You can switch to the revision from before the regeneration. This is displayed only if there are two or more revisions.
  - Switch to Next Revision
    - You can switch from the previous revision back to the next revision. This is displayed only if there are two or more revisions.
  - Retry
    - You can regenerate the AI's response. Only the last message can be regenerated.
  - Copy
    - You can copy the message as Markdown-formatted text.
  - Share
    - You can share the generated text.
Starting a New Chat
To clear the chat history and start a new chat, click the New Chat button in the top right corner or press ⌥ (Option) + ⌘ (Command) + N simultaneously.
Changing Chat Settings

By opening the Inspector, you can customize the chat settings.
To open the Inspector, click or tap the sidebar toggle button in the top right corner.
From the Inspector, you can customize the following settings:
- Chat Settings
  - Stream Response
    - Toggles whether the AI's message is received incrementally as it is generated (see the streaming sketch after this list). If turned off, you will not receive a response until the final answer is ready, so I recommend setting the API timeout to unlimited. For instructions on how to set the API timeout duration, please refer to the Settings documentation.
  - Keep Alive
    - Selects how long the model stays loaded in the server's memory.
  - Thinking
    - Toggles whether the model performs its thinking (reasoning) step. This is only configurable for models that list "Thinking" in their "Model Capabilities." To check model capabilities, please refer to the Model Tab documentation.
  - System Prompt
    - You can set a system prompt for the AI model.
- Custom Settings
  - Enable Custom Settings
    - Toggles whether the custom settings below are applied.
  - Seed
    - You can specify a seed value to make generation reproducible.
  - Temperature
    - Specifies the model's temperature, between 0.0 and 2.0. Lowering the temperature makes the output more consistent and focused, while raising it makes it more creative (not all models follow this setting, and extreme values may produce incorrect output).
  - Context Window
    - Specifies the number of tokens the model can load at once, between 512 and the model's context length. To check the model's context length, please refer to the Model Tab documentation.
  - Repeat Last N
    - Sets how far back the model looks to prevent repetition.
  - Repeat Penalty
    - Sets how strongly repetitions are penalized.
  - Top-k
    - Reduces the probability of generating nonsense. A higher value like 100 will give more diverse answers, while a lower value like 10 will give more stable answers.
  - Top-p
    - Works together with Top-k. A higher value like 0.95 will lead to more diverse text, while a lower value like 0.5 will generate more focused and stable text.
  - Min-p
    - An alternative to Top-p that aims to balance quality and variety. It discourages low-quality output by excluding tokens whose probability, relative to the most likely token, falls below the threshold p.
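
As referenced under Stream Response above, here is a minimal sketch, not the app's actual code, of how a client can consume Ollama's streamed /api/chat output: with streaming enabled, the server sends one JSON object per line, each carrying a small piece of the assistant's message.

```swift
import Foundation

// Minimal sketch: read a streamed /api/chat response line by line.
struct StreamChunk: Decodable {
    struct Message: Decodable { let content: String }
    let message: Message
    let done: Bool
}

func streamChat(_ request: URLRequest) async throws {
    let (bytes, _) = try await URLSession.shared.bytes(for: request)
    for try await line in bytes.lines {
        let chunk = try JSONDecoder().decode(StreamChunk.self, from: Data(line.utf8))
        print(chunk.message.content, terminator: "")   // in the app, each piece would be appended to the chat view
        if chunk.done { break }
    }
}
```
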
The settings configured here will be reflected from the next message you send onward.
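
For reference, here is a sketch of how these Inspector settings could map onto the fields of an Ollama POST /api/chat request body. The property names match the JSON keys used by the Ollama API, but the mapping itself is an assumption about the app's behavior, not its actual code.

```swift
import Foundation

// Assumed mapping between the Inspector settings and Ollama /api/chat request fields.
struct ChatOptions: Encodable {
    var seed: Int?               // Seed
    var temperature: Double?     // Temperature (0.0 – 2.0)
    var num_ctx: Int?            // Context Window (tokens)
    var repeat_last_n: Int?      // Repeat Last N
    var repeat_penalty: Double?  // Repeat Penalty
    var top_k: Int?              // Top-k
    var top_p: Double?           // Top-p
    var min_p: Double?           // Min-p
}

struct ChatRequestBody: Encodable {
    var model: String
    var messages: [[String: String]] // includes a {"role": "system", ...} entry when a System Prompt is set
    var stream: Bool                 // Stream Response
    var keep_alive: String?          // Keep Alive, e.g. "5m", or "-1" to keep the model loaded
    var think: Bool?                 // Thinking (only for models with that capability)
    var options: ChatOptions?        // sent only when Enable Custom Settings is on
}
```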