Mocolamma

Chat Tab

Starting a Chat

Chat Tab

When you open the Chat tab, a screen like this is displayed. To start a chat, you first need to select the model you want to use for the chat.

Select Model

To select a model, click or tap the model switch button in the top right corner. This will display a list of models on the currently selected Ollama server, and you can select the model you want to use (to switch the selected server, please refer to the Server Tab documentation).

Enter Message

Once a model is selected, you can enter a message. Type the text you want to send to the AI model and click or tap the Send button.
To insert a line break on macOS, press ⇧ (Shift) + ↩︎ (Return) simultaneously.

Model Loading

When you send a message, a chat request is sent to the Ollama server, and the model is loaded first if it is not already in memory.
Loading may take a while depending on the model and the type of storage the model is saved on at the Ollama server.
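As a rough sketch of what happens under the hood, the chat request follows the public Ollama REST API (POST to `/api/chat`); the model name below is just a placeholder, and Mocolamma's actual request may include additional fields:

```python
import json

def build_chat_request(model, user_text, stream=True):
    """Build the JSON body for a POST to http://<server>:11434/api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }

# "llama3.2" is a placeholder model name, not one Mocolamma requires.
body = build_chat_request("llama3.2", "Hello!")
print(json.dumps(body))
```

If the requested model is not yet loaded, the server loads it before producing the first token of the response, which is where the waiting time described above comes from.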

Tip
The API timeout is set to 30 seconds by default, so a timeout error may occur if model loading takes a long time.
If you know that model loading will take time, I recommend increasing the API timeout duration in Settings or setting it to unlimited.
For instructions on how to set the API timeout duration, please refer to the Settings documentation.
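Conceptually, the timeout setting maps to a client-side socket timeout; this small sketch (my own illustration, not Mocolamma's code) shows how "unlimited" corresponds to no timeout at all, with the 30-second figure matching the default mentioned above:

```python
def effective_timeout(setting):
    """Map a timeout setting to a socket timeout in seconds.

    None disables the client-side timeout entirely, which is what an
    "unlimited" setting corresponds to; 30 mirrors the default.
    """
    return None if setting == "unlimited" else float(setting)

print(effective_timeout(30))           # 30.0
print(effective_timeout("unlimited"))  # None
```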

Message Operations

After a while, you will receive a response from the AI.
Once the response is fully completed, you can perform operations on the message. The possible operations are as follows:

  • Your Messages
    • Copy
      • You can copy the message as Markdown-formatted text.
    • Edit
      • You can edit the sent message and resend it. Only the last sent message can be edited.
  • AI Messages
    • Switch to Previous Revision
      • You can switch to the revision before the regeneration. This is displayed only if there are two or more revisions.
    • Switch to Next Revision
      • You can switch from the previous revision to the next revision. This is displayed only if there are two or more revisions.
    • Retry
      • You can regenerate the AI's response. Only the last message can be regenerated.
    • Copy
      • You can copy the message as Markdown-formatted text.

Information
To prevent performance degradation, Markdown text is processed line by line while the AI's response is being returned (during stream response).
Therefore, the display may appear corrupted temporarily, but it should quickly change to the correct display.
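During a stream response, the Ollama API returns one JSON object per line, each carrying a fragment of the answer. A minimal sketch of how a client can accumulate fragments, assuming the standard `/api/chat` response shape (this is an illustration, not Mocolamma's implementation):

```python
import json

def accumulate_stream(lines):
    """Join the content fragments from a streamed /api/chat response."""
    text = ""
    for line in lines:
        chunk = json.loads(line)
        text += chunk.get("message", {}).get("content", "")
        # A renderer re-processing `text` as Markdown at this point can see
        # partial constructs (e.g. an unclosed ** or code fence), which is
        # why the display may look corrupted for a moment.
        if chunk.get("done"):
            break
    return text

sample = [
    '{"message": {"content": "**Hel"}, "done": false}',
    '{"message": {"content": "lo**"}, "done": true}',
]
print(accumulate_stream(sample))  # **Hello**
```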

Starting a New Chat

To clear the chat history and start a new chat, click the New Chat button in the top right corner or press ⌥ (Option) + ⌘ (Command) + N simultaneously.

Changing Chat Settings

Chat Inspector

By opening the Inspector, you can customize the chat settings.
To open the Inspector, click or tap the sidebar toggle button in the top right corner.

From the Inspector, you can customize the following settings:

  • Chat Settings
    • Stream Response
      • Toggles whether to receive the AI's response incrementally as it is generated. If turned off, nothing is displayed until the complete answer is returned, so I recommend setting the API timeout to unlimited. For instructions on how to set the API timeout duration, please refer to the Settings documentation.
    • Thinking
      • Toggles whether to perform inference when using a model that supports Thinking. This is only configurable for models that support "Thinking" in their "Model Capabilities." To check model capabilities, please refer to the Model Tab documentation.
    • System Prompt
      • You can set a system prompt for the AI model.
  • Custom Settings
    • Enable Custom Settings
      • Toggles whether to enable custom settings for the configurations below.
    • Temperature
      • Specifies the model's temperature. It can be set between 0.0 and 2.0. Lowering the temperature makes the output more deterministic and focused, while raising it makes it more creative (not all models follow this setting, and extreme values may produce incorrect output).
    • Context Window
      • Specifies the number of tokens the model can load at once. It can be set between 512 and the model's context length. To check the model's context length, please refer to the Model Tab documentation.

The settings configured here will be reflected from the next message you send onward.
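For reference, these Inspector settings correspond to fields of the Ollama `/api/chat` request: the system prompt becomes a `system` message, Thinking maps to the `think` field, and Temperature and Context Window go into `options` as `temperature` and `num_ctx`. A hedged sketch under those assumptions (the clamping ranges mirror the limits described above; this is not Mocolamma's actual code):

```python
import json

def apply_chat_settings(body, system_prompt=None, thinking=None,
                        temperature=None, context_window=None):
    """Fold Inspector-style settings into an /api/chat request body."""
    if system_prompt:
        body["messages"].insert(0, {"role": "system", "content": system_prompt})
    if thinking is not None:
        body["think"] = thinking  # only honored by models with the capability
    options = {}
    if temperature is not None:
        options["temperature"] = min(max(temperature, 0.0), 2.0)  # 0.0–2.0
    if context_window is not None:
        options["num_ctx"] = max(context_window, 512)  # lower bound of 512
    if options:
        body["options"] = options
    return body

body = {"model": "llama3.2", "messages": [{"role": "user", "content": "Hi"}]}
print(json.dumps(apply_chat_settings(body, temperature=0.7, context_window=4096)))
```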
