API Reference

Chat Completions

POST https://api.platform.preferredai.jp/v1/chat/completions

This endpoint generates responses from generative AI models based on the provided chat messages.

Currently, only the PLaMo Prime model is supported.

| Model ID | Context Length | Maximum Output Tokens | Reasoning Capability |
| --- | --- | --- | --- |
| plamo-2.2-prime | 32,768 tokens | 4,096 tokens | Not supported |
| plamo-3.0-prime-beta | 65,536 tokens | 20,000 tokens | Supported |

Earlier PLaMo Prime models have been upgraded to plamo-2.2-prime as of 2026/01/28.

plamo-3.0-prime-beta is available as a separate model, but it is currently offered only to enterprise customers in the monitoring program. If you are interested, please submit an inquiry/application via this form.

Sample Usage

For curl, set your PLaMo API key in an environment variable named PLAMO_API_KEY, and for Python libraries, set it in an environment variable named OPENAI_API_KEY.

Request

```bash
$ curl \
    -H "Authorization: Bearer ${PLAMO_API_KEY}" \
    -H "Content-Type: application/json" \
    "https://api.platform.preferredai.jp/v1/chat/completions" \
    -d @- << EOF
{
  "messages": [
    {
      "role": "system",
      "content": "You are a travel advisor."
    },
    {
      "role": "user",
      "content": "Please recommend an optimal sightseeing route in Kanazawa from morning until evening."
    }
  ],
  "model": "plamo-2.2-prime"
}
EOF
```
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.platform.preferredai.jp/v1",
    model="plamo-2.2-prime",
    streaming=True,
    # other parameters...
)

messages = [
    {"role": "system", "content": "You are a travel advisor."},
    {"role": "user", "content": "Please recommend an optimal sightseeing route in Kanazawa from morning until evening."},
]

for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.platform.preferredai.jp/v1",
    # other parameters...
)

completion = client.chat.completions.create(
    model="plamo-2.2-prime",
    messages=[
        {"role": "system", "content": "You are a travel advisor."},
        {"role": "user", "content": "Please recommend an optimal sightseeing route in Kanazawa from morning until evening."},
    ],
    stream=True,
)

for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Response

```json
{
    "id": "chat-eb6da3a371c546c8a8c4629794328c5b",
    "object": "chat.completion",
    "created": 1733220118,
    "model": "plamo-2.2-prime",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Here is a recommended sightseeing route in Kanazawa from morning until evening:...",
                "tool_calls": []
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "usage": {
        "prompt_tokens": 169,
        "total_tokens": 394,
        "completion_tokens": 225
    },
    "prompt_logprobs": null
}
```
```text
data: {"id":"chat-e05656fa4721494a93d54410881f00c8","object":"chat.completion.chunk","created":1734345308,"model":"plamo-2.2-prime","choices":[{"index":0,"delta":{"role":"assistant"},"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":143,"total_tokens":143,"completion_tokens":0}}

data: {"id":"chat-e05656fa4721494a93d54410881f00c8","object":"chat.completion.chunk","created":1734345308,"model":"plamo-2.2-prime","choices":[{"index":0,"delta":{"content":"PL"},"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":143,"total_tokens":144,"completion_tokens":1}}

data: {"id":"chat-e05656fa4721494a93d54410881f00c8","object":"chat.completion.chunk","created":1734345308,"model":"plamo-2.2-prime","choices":[{"index":0,"delta":{"content":"a"},"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":143,"total_tokens":145,"completion_tokens":2}}

data: {"id":"chat-e05656fa4721494a93d54410881f00c8","object":"chat.completion.chunk","created":1734345308,"model":"plamo-2.2-prime","choices":[{"index":0,"delta":{"content":"Mo"},"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":143,"total_tokens":146,"completion_tokens":3}}

data: {"id":"chat-e05656fa4721494a93d54410881f00c8","object":"chat.completion.chunk","created":1734345308,"model":"plamo-2.2-prime","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":143,"total_tokens":148,"completion_tokens":5}}

data: {"id":"chat-e05656fa4721494a93d54410881f00c8","object":"chat.completion.chunk","created":1734345308,"model":"plamo-2.2-prime","choices":[],"usage":{"prompt_tokens":143,"total_tokens":148,"completion_tokens":5}}

data: [DONE]
```

For the /chat/completions endpoint, models with reasoning capability return both reasoning and reasoning_content. Non-streaming responses include them in each choice.message, while streaming responses (when stream: true) include them incrementally in each chunk's choice.delta.
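Streamed chunks can be reassembled client-side. The sketch below is a minimal example (not part of the API) that parses `data:` lines from the event stream and accumulates delta.content and delta.reasoning_content:

```python
import json

def accumulate_stream(sse_lines):
    """Reassemble content and reasoning_content from chat.completion.chunk
    server-sent events."""
    content, reasoning = [], []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # content / reasoning_content may be absent, e.g. in role-only deltas
            if delta.get("content"):
                content.append(delta["content"])
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
    return "".join(content), "".join(reasoning)

# Abbreviated chunks mirroring the streaming example above
sse = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"PL"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"a"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Mo"}}]}',
    "data: [DONE]",
]
print(accumulate_stream(sse))  # → ('PLaMo', '')
```

A production client would read these lines incrementally from the HTTP response instead of from a list.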

Request Parameters

Please include the following parameters in the request body in JSON format.

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| model | string | required | — | Specifies the ID of the model to use. |
| messages | array of MessageObject | required | — | Sets the list of chat messages. Each message must include a role and content. |
| frequency_penalty | float | optional | 0.0 | Controls the frequency of token repetition. Valid values range from -2.0 to 2.0. |
| max_tokens | integer or null | optional | 4096 | Stops generation once the specified number of tokens has been produced. Depending on the generated content, generation may terminate earlier. Cannot exceed the model's maximum output token limit. |
| n | integer | optional | 1 | Specifies the number of completions to generate in a single request. Valid values are 1 or 2. |
| presence_penalty | float | optional | 0.0 | Applies a penalty to tokens that have already appeared in the output. Valid values range from -2.0 to 2.0. |
| seed | integer or null | optional | null | Specifies the random seed for token generation. Depending on internal state, results may differ even with the same seed value. |
| safety_identifier | string or null | optional | null | An identifier for user identification, which the PLaMo API may use to detect anomalies or other issues. Since it may be persisted on the server, do not include personal information or sensitive data. |
| stop | string, array of strings, or null | optional | null | Stops generation when any of the specified words is encountered. |
| stream | boolean | optional | false | Set to true to use streaming responses. |
| stream_options | StreamOptionObject or null | optional | null | Specifies options when streaming is enabled. See StreamOptionObject below for details. |
| temperature | float | optional | 0.3 | Specifies the sampling temperature. Valid values range from 0 to 2.0. |
| tools | array of ChatCompletionToolsObject | optional | null | Defines the functions to be used for Function Calling. |
| tool_choice | ChatCompletionToolChoiceObject, "none", "auto", or "required" | optional | null | Specifies the function to be invoked. |
| top_p | float | optional | 1.0 | Controls how much of the token probability mass is considered during sampling. Valid values range from 0 to 1.0. |
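As an illustration of the tools and tool_choice parameters, the sketch below assembles a request body for Function Calling. The get_weather function and its schema are hypothetical examples, not part of the API:

```python
import json

def build_tool_request(model, messages, tools, tool_choice="auto"):
    """Assemble a /chat/completions request body that enables Function Calling."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "tool_choice": tool_choice,
    }

# Hypothetical tool definition following ChatCompletionToolsObject:
weather_tool = {
    "type": "function",  # must be 'function'
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        # arguments are described in JSON Schema
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_request(
    "plamo-2.2-prime",
    [{"role": "user", "content": "What's the weather in Kanazawa?"}],
    [weather_tool],
)
print(json.dumps(body, indent=2))
```

This dictionary would be sent as the JSON request body, for example via the OpenAI client or curl as shown above.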

Response (when not using streaming)

| Field | Type | Description |
| --- | --- | --- |
| id | string | A unique ID assigned to each message. |
| choices | array of ChoiceObject | Returns chat-style generation results. |
| created | integer | Unix timestamp of when the generation was created. |
| model | string | Returns the model ID. |
| object | string | Always returns 'chat.completion'. |
| usage | UsageObject | Returns an object containing usage statistics. |

Response Format (When Streaming is Used)

The response is returned using server-sent events and concludes with a data: [DONE] event.

| Field | Type | Description |
| --- | --- | --- |
| id | string | A unique ID assigned to each message. |
| choices | array of StreamChoiceObject | Returns chat-style generation results. |
| created | integer | Unix timestamp of when the generation was completed. |
| model | string | Returns the model ID. |
| object | string | Always returns 'chat.completion.chunk'. |
| usage | UsageObject | Returns an object containing usage statistics. |

Important Note about max_tokens

Currently, the API returns an error when input tokens + max_tokens exceeds the model's context length. The error is raised while the input is processed, even if the actual output would have contained fewer tokens than the context length allows.

In particular, if max_tokens is not explicitly set, it defaults to 4096, so an error can occur even when the input alone is within the context limit. In such cases, explicitly set max_tokens so that input tokens + max_tokens does not exceed the model's context length.
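Based on the rule above, one way to pick a safe max_tokens value is to derive it from the prompt's token count (e.g. as reported by the Tokenize API) and the model's limits. A minimal sketch:

```python
def safe_max_tokens(prompt_tokens, context_length, max_output_tokens):
    """Largest max_tokens value that keeps prompt_tokens + max_tokens within
    the model's context length, capped at the model's maximum output tokens."""
    remaining = context_length - prompt_tokens
    return max(0, min(max_output_tokens, remaining))

# plamo-2.2-prime: 32,768-token context, 4,096 maximum output tokens
print(safe_max_tokens(30000, 32768, 4096))  # → 2768
print(safe_max_tokens(10000, 32768, 4096))  # → 4096
```

A result of 0 means the prompt itself already fills the context window and must be shortened.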

Tokenize API

POST https://api.platform.preferredai.jp/v1/tokenize

Converts a given string into tokens and returns both the token sequence and the total number of tokens.

Note that the Chat Completions endpoint converts the input into a chat format, automatically inserting various prompts. The token count returned here therefore may not exactly match the number of tokens actually processed for the final output.

Also, since PLaMo uses the same tokenizer as this publicly available model, you can integrate equivalent functionality directly into your own system instead of calling the API.

Example Usage

Request

```bash
$ curl \
    -H "Authorization: Bearer ${PLAMO_API_KEY}" \
    -H "Content-Type: application/json" \
    "https://api.platform.preferredai.jp/v1/tokenize" \
    -d '{"prompt": "Hello", "model": "plamo-2.2-prime"}'
```

Response

```json
{
    "count": 4,
    "max_model_len": 16384,
    "tokens": [
        1,
        1,
        35913,
        37607
    ]
}
```

Request Parameters

Specify the following parameters in JSON format within the request body.

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| model | string | yes | — | Specifies the ID of the model to use. |
| prompt | string | Either messages or prompt must be specified | — | Specifies the string whose tokens you want to count. |
| messages | array of MessageObject | Either messages or prompt must be specified | — | Specifies the messages whose tokens you want to count, in the same format as chat messages used by the Chat Completions endpoint. |
| add_special_tokens | boolean | no | true | Specifies whether to add special tokens such as BOS. |
| add_generation_prompt | boolean | no | true | Specifies whether to apply internal generation prompts during tokenization. Only valid when messages is specified. |

Response

| Field | Type | Description |
| --- | --- | --- |
| count | integer | Total number of tokens. |
| max_model_len | integer | Maximum number of tokens the model can process. |
| tokens | array of integers | The actual token sequence. |
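As an illustration, the sketch below builds a tokenize request using only Python's standard library. The helper name is hypothetical; it simply assembles the URL, headers, and body described above:

```python
import json
import os
import urllib.request

TOKENIZE_URL = "https://api.platform.preferredai.jp/v1/tokenize"

def build_tokenize_request(model, prompt=None, messages=None,
                           add_special_tokens=True):
    """Build an urllib Request for the tokenize endpoint. Exactly one of
    prompt or messages must be given, per the parameter table above."""
    if (prompt is None) == (messages is None):
        raise ValueError("specify exactly one of prompt or messages")
    body = {"model": model, "add_special_tokens": add_special_tokens}
    if prompt is not None:
        body["prompt"] = prompt
    else:
        body["messages"] = messages
    return urllib.request.Request(
        TOKENIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('PLAMO_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_tokenize_request("plamo-2.2-prime", prompt="Hello")
print(json.loads(req.data))
# urllib.request.urlopen(req) would send the request and return the
# JSON response shown above (count, max_model_len, tokens).
```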

Models API

Retrieves information about the available models.

Example

Request

For curl, set your PLaMo API key in the PLAMO_API_KEY environment variable, and for Python libraries, set it in the OPENAI_API_KEY environment variable.

```bash
$ curl \
    -H "Authorization: Bearer ${PLAMO_API_KEY}" \
    -H "Content-Type: application/json" \
    "https://api.platform.preferredai.jp/v1/models"
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.platform.preferredai.jp/v1",
)

models = client.models.list()
print(models)

model_name = models.data[0].id
model = client.models.retrieve(model_name)
print(model)
```

Response

```json
{
    "data": [
        {
            "id": "plamo-2.2-prime",
            "created": 1732978800,
            "object": "model",
            "owned_by": "system"
        }
    ],
    "object": "list"
}
```

Index Models

GET https://api.platform.preferredai.jp/v1/models

Retrieves a list of available models.

Response

| Field | Type | Description |
| --- | --- | --- |
| object | string | Always contains the value 'list'. |
| data | array of ModelObject | Returns an array of model objects. |

Get Model Details

GET https://api.platform.preferredai.jp/v1/models/{model}

Fetches detailed information about a specified model.

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Specifies the ID of the model to retrieve. |

Response

Returns a ModelObject.

Object List

MessageObject

Represents a chat message, whose role is one of system, user, or assistant.

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| content | string | yes | — | Specifies the content of the message. |
| role | string | yes | — | Indicates the role of the message; must be one of 'system', 'user', or 'assistant'. |
| name | string | no | — | A name used for further identification when the same role appears multiple times. |
| reasoning | string | no | — | Sets the reasoning provided in the response. |
| reasoning_content | string | no | — | Sets the reasoning_content provided in the response. |

StreamOptionObject

Specifies options when using streaming responses.

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| include_usage | boolean | no | — | Specifies whether to return usage statistics just before the streaming response completes. |

ChatCompletionToolsObject

Defines the functions to be used with the Function Calling feature.

| Field | Type | Description |
| --- | --- | --- |
| type | string | Must be set to 'function'. |
| function | ChatCompletionToolsFunctionObject | Defines the actual function to be used with Function Calling. |

ChatCompletionToolsFunctionObject

Defines the actual function implementation.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the function. |
| description | string | A description of the function. |
| parameters | string | Definition of the arguments that can be passed to the function, expressed in JSON Schema. For details on JSON Schema, see: https://json-schema.org/understanding-json-schema |

ChatCompletionToolChoiceObject

Specifies which function to use.

| Field | Type | Description |
| --- | --- | --- |
| type | string | Must be set to 'function'. |
| function | ChatCompletionToolChoiceFunctionObject | Specifies the function to use. |

ChatCompletionToolChoiceFunctionObject

Specifies the function details.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Specifies the name of a function defined in the 'tools' parameter. |

ChoiceObject

Represents the result of generating a response in chat format.

| Field | Type | Description |
| --- | --- | --- |
| index | integer | Returns the index of the object when multiple ChoiceObjects are returned. |
| finish_reason | 'stop' or 'length' | Returns 'stop' if a specified stop word was encountered or generation concluded naturally; returns 'length' if the maximum token limit was exceeded. |
| logprobs | null | Currently not supported. |
| message | ChatMessageObject | Returns the content of the generated message. |
| stop_reason | integer, string, or null | Currently not supported. |

ChatMessageObject

Represents the content of a generated message.

| Field | Type | Description |
| --- | --- | --- |
| content | string | Returns the message content. |
| reasoning | string or null | Returns the model's reasoning process. This field may be present only for models supporting reasoning. |
| reasoning_content | string or null | Contains the same content as reasoning. |
| role | string | Returns the role of the message, which is one of 'system', 'user', or 'assistant'. |
| tool_calls | array of ChatMessageToolCallsObject | Information about Function Calling generated by the model. |

ChatMessageToolCallsObject

Indicates the result of a function call.

| Field | Type | Description |
| --- | --- | --- |
| id | string | A unique identifier. |
| type | string | Must be 'function'. |
| function | ChatMessageToolCallsFunctionObject | Information about the function called via Function Calling. |

ChatMessageToolCallsFunctionObject

Information about the called Function.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Indicates the name of the function. |
| arguments | string | The function arguments, specified in JSON format. |
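Since arguments is a JSON-encoded string, it must be decoded before use. A minimal sketch, using a hypothetical get_weather tool call as sample data:

```python
import json

def parse_tool_call(tool_call):
    """Return (function name, decoded arguments) from a tool call object."""
    fn = tool_call["function"]
    return fn["name"], json.loads(fn["arguments"])

# Hypothetical tool call in the shape described above
sample = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"Kanazawa\"}",
    },
}

name, args = parse_tool_call(sample)
print(name, args)  # → get_weather {'city': 'Kanazawa'}
```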

StreamChoiceObject

Represents the result of generated chat messages in a stream format.

| Field | Type | Description |
| --- | --- | --- |
| index | integer | Returns the index of the object when multiple StreamChoiceObject instances exist. |
| finish_reason | 'stop' or 'length' | Returns 'stop' if the specified word is encountered or generation ends naturally, and 'length' if the maximum generated token count is exceeded. |
| logprobs | null | Currently not supported. |
| delta | ChatMessageDeltaObject | Returns the content of the additionally generated message. |
| stop_reason | integer, string, or null | Currently not supported. |

ChatMessageDeltaObject

Represents the content of a message in a stream format.

| Field | Type | Description |
| --- | --- | --- |
| content | string | Returns the content of the message. This field may be absent, for example when only a role assignment is emitted. |
| reasoning | string or null | Returns the model's reasoning process. This field may be present only for models that support reasoning. |
| reasoning_content | string or null | Contains the same content as reasoning. |
| role | string | Returns the role of the message, one of 'system', 'user', or 'assistant'. Set only when a new role is assigned. |

UsageObject

Indicates token generation results and related information.

| Field | Type | Description |
| --- | --- | --- |
| prompt_tokens | integer | Returns the number of input tokens. |
| completion_tokens | integer | Returns the number of generated tokens. |
| total_tokens | integer | Returns the total of input and generated tokens. |

ModelObject

Indicates information about a model.

| Field | Type | Description |
| --- | --- | --- |
| id | string | A string that identifies the model. It can be used as an API parameter. |
| created | integer | The time when the model was created. |
| object | string | The type of this object. Currently always 'model'. |
| owned_by | string | The owner of the model. Currently always 'system'. |

Old API Endpoints

The API endpoints under https://platform.preferredai.jp/api/completion/v1 have been migrated. The new endpoint is https://api.platform.preferredai.jp/v1.

The old API endpoints remain available but are deprecated. Please use the new API endpoints.