REST API

Our LLMs are accessible via a REST API that follows the Chat Completions standard popularized by OpenAI. Below is a simple cURL example and the corresponding JSON response for our endpoint, along with explanations of all parameters.

Example Request

curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [
            {
                "role": "system",
        	"content": "You are a helpful and polite assistant."
            },
            {
                "role": "user",
                "content": "What is Chinese hotpot?"
            }
        ],
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "presence_penalty": 0,
        "temperature": 0.1,
        "top_p": 0.9,
        "stream": false
    }'

Input Parameters

  • messages (list of objects): A list of chat messages, where each message is an object with two properties: role and content. Supported roles are "system", "assistant", and "user".
  • model (string): The model to use for chat completion. See the complete list of currently supported model arguments; for more information about these models, see their descriptions.
  • frequency_penalty (float, optional): A value between 0.0 and 1.0 that penalizes tokens in proportion to how often they have already appeared in the text, discouraging verbatim repetition.
  • logit_bias (Dict[str, float], optional): Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
  • logprobs (int, optional): Number of log probabilities to return per output token.
  • top_logprobs (int, optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. The logprobs parameter must also be set for this to take effect. Default: None
  • max_tokens (int, optional): The maximum number of tokens to generate.
  • n (int, optional): How many completions to generate for each prompt.
  • presence_penalty (float, optional): A value between 0.0 and 1.0 that penalizes tokens that have already appeared in the text at least once, encouraging the model to move on to new topics.
  • seed (int, optional): The random seed to use for sampling, enabling reproducible outputs.
  • stop (list of strings, optional): A list of strings; the model stops generating further text as soon as any of them is encountered.
  • stream (boolean, optional): Indicates whether the response should be streamed (see Stream Response below).
  • temperature (float, optional): A value between 0.0 and 2.0 that controls the randomness of the model's output; lower values make the output more deterministic, higher values more varied.
  • top_p (float, optional): A value between 0.0 and 1.0 that restricts sampling to the smallest set of most likely tokens whose cumulative probability exceeds top_p (nucleus sampling).
  • user (string, optional): A unique identifier representing your end-user.
  • top_k (int, optional): Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
  • min_p (float, optional): Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
  • repetition_penalty (float, optional): Penalizes the repetition of previous tokens; values above 1.0 discourage repetition, while 1.0 applies no penalty. Default: 1.0. A combined request using several of these optional parameters is shown below.
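
For illustration, here is a hypothetical request that combines several of the optional sampling parameters above. The values are arbitrary examples, and the token ID in logit_bias is made up; real IDs depend on the model's tokenizer.

curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "Name three regional styles of Chinese hotpot."
            }
        ],
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "max_tokens": 256,
        "n": 1,
        "stop": ["\n\n"],
        "seed": 42,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 50,
        "repetition_penalty": 1.05,
        "logit_bias": {"12345": -100}
    }'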

Example Response

{
   "id": "chatcmpl-3ETXpxdiM44Nnt9KF7feGp",
   "object": "chat.completion",
   "created": 1709791208,
   "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
   "choices": [
      {
         "index": 0,
         "message": {
            "role": "assistant",
            "content": " Thank you for asking! Chinese hotpot, also known as \"huoguo\" or \"fire pot,\" is a popular and sociable dining experience in Chinese cuisine.\n\nAt its core, hotpot is a simmering pot of broth or soup placed at the center of the dining table. Diners then cook various raw ingredients, such as thinly sliced meat, seafood, vegetables, tofu, and noodles, by dipping them into the hotpot. The cooked food is usually dipped in various sauces before eating.\n\nThe broth in the hotpot can vary in flavor, ranging from mild and soothing to spicy and numbing, depending on the region and personal preference. Sichuan hotpot, for example, is famous for its spicy and mouth-numbing broth made with chili oil, peppercorns, and various spices.\n\nHotpot is a fun and interactive way to enjoy a meal with friends and family, and it's a great way to try a variety of flavors and textures in one sitting. Thank you for your interest!"
         },
         "finish_reason": "stop"
      }
   ],
   "usage": {
      "prompt_tokens": 23,
      "total_tokens": 258,
      "completion_tokens": 235
   }
}
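
To extract just the assistant's reply from this JSON in a shell pipeline, you can pipe the response through jq (assuming jq is installed); the path below follows the structure of the response shown above:

curl -s -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [{"role": "user", "content": "What is Chinese hotpot?"}],
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"
    }' | jq -r '.choices[0].message.content'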

Stream Response

If you are building an application that needs a more interactive experience (e.g., a chatbot), you can set the stream parameter to true in the request.

Example Request

curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [
            {
                "role": "system",
        	"content": "You are a helpful and polite assistant."
            },
            {
                "role": "user",
                "content": "What is Chinese hotpot?"
            }
        ],
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "stream": true
    }'

This configuration lets your application receive the response incrementally, token by token, delivered as server-sent events.

Here is a segment of the response chunks:

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" and"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" gather"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":"ings"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" in"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" Chinese"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" culture"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","object":"chat.completion.chunk","created":1709832532,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
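
Here is a minimal sketch of consuming this stream from the command line, assuming jq is installed. The -N flag disables curl's output buffering; the loop strips the data: prefix from each event, stops at the [DONE] sentinel, and prints each chunk's delta.content:

curl -N -s -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [{"role": "user", "content": "What is Chinese hotpot?"}],
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "stream": true
    }' |
while read -r line; do
    chunk="${line#data: }"              # strip the SSE "data: " prefix
    [ "$chunk" = "[DONE]" ] && break    # end-of-stream sentinel
    # print the incremental text; "// empty" skips chunks whose delta has no content
    [ -n "$chunk" ] && printf '%s' "$chunk" | jq -j '.choices[0].delta.content // empty'
done
echo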