REST API

Our LLMs are accessible through a REST API that follows the Chat Completions format popularized by OpenAI. Below is a simple cURL example and the JSON response from our endpoint, along with an explanation of each parameter.

Example Request

curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful and polite assistant."
            },
            {
                "role": "user",
                "content": "What is Chinese hotpot?"
            }
        ],
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "presence_penalty": 0,
        "temperature": 0.1,
        "top_p": 0.9,
        "stream": false
    }'

Input Parameters

  • messages (list of objects): A list of chat messages, where each message is an object with properties: role and content. Supported roles are “system”, “assistant”, and “user”.
  • model (string): The model to be used for chat completion. Here is the complete list of presently supported model arguments. For more information regarding these models, see this description.
  • temperature (float, optional): A value between 0.0 and 2.0 that controls the randomness of the model’s output.
  • top_p (float, optional): A value between 0.0 and 1.0 that controls nucleus sampling: the model samples only from the most likely tokens whose cumulative probability reaches top_p.
  • n (int, optional): How many completions to generate for each prompt.
  • max_tokens (int, optional): The maximum number of tokens to generate.
  • stop (list of strings, optional): A list of strings; the model stops generating text as soon as it produces any of them.
  • stream (boolean, optional): Indicates whether the response should be streamed.
  • frequency_penalty (float, optional): A value between 0.0 and 1.0 that controls how much the model penalizes generating repetitive responses.
  • presence_penalty (float, optional): A value between 0.0 and 1.0 that controls how much the model penalizes tokens that have already appeared in the text so far, which encourages it to introduce new words and topics.
  • user (string, optional): A unique identifier representing your end-user.
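As a rough illustration of how these parameters fit together, the following Python sketch builds a request body and checks the ranges listed above before sending it. The helper names build_payload and send are hypothetical and not part of any official client; only the endpoint URL, headers, and parameter names come from the cURL example and the list above.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.hyperbolic.xyz/v1/chat/completions"


def build_payload(model, messages, **options):
    """Build a chat completion request body (hypothetical helper).

    Checks the roles and numeric ranges stated in the parameter list.
    """
    allowed_roles = {"system", "assistant", "user"}
    for message in messages:
        if message.get("role") not in allowed_roles:
            raise ValueError(f"unsupported role: {message.get('role')!r}")

    # Ranges as documented in the parameter list above.
    ranges = {
        "temperature": (0.0, 2.0),
        "top_p": (0.0, 1.0),
        "frequency_penalty": (0.0, 1.0),
        "presence_penalty": (0.0, 1.0),
    }
    for name, (low, high) in ranges.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    return {"model": model, "messages": messages, **options}


def send(payload, api_key):
    """POST a non-streaming request and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as send(build_payload("meta-llama/Meta-Llama-3-70B-Instruct", messages, temperature=0.1, top_p=0.9), api_key) would then mirror the cURL request above, with api_key read from the HYPERBOLIC_API_KEY environment variable.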

Example Response

{
   "id": "chatcmpl-3ETXpxdiM44Nnt9KF7feGp",
   "object": "chat.completion",
   "created": 1709791208,
   "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
   "choices": [
      {
         "index": 0,
         "message": {
            "role": "assistant",
            "content": " Thank you for asking! Chinese hotpot, also known as \"huoguo\" or \"fire pot,\" is a popular and sociable dining experience in Chinese cuisine.\n\nAt its core, hotpot is a simmering pot of broth or soup placed at the center of the dining table. Diners then cook various raw ingredients, such as thinly sliced meat, seafood, vegetables, tofu, and noodles, by dipping them into the hotpot. The cooked food is usually dipped in various sauces before eating.\n\nThe broth in the hotpot can vary in flavor, ranging from mild and soothing to spicy and numbing, depending on the region and personal preference. Sichuan hotpot, for example, is famous for its spicy and mouth-numbing broth made with chili oil, peppercorns, and various spices.\n\nHotpot is a fun and interactive way to enjoy a meal with friends and family, and it's a great way to try a variety of flavors and textures in one sitting. Thank you for your interest!"
         },
         "finish_reason": "stop"
      }
   ],
   "usage": {
      "prompt_tokens": 23,
      "total_tokens": 258,
      "completion_tokens": 235
   }
}
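To show how the fields of this response might be consumed, here is a minimal Python sketch that pulls out the assistant's reply, the finish reason, and the token usage. The function name summarize_completion is hypothetical, and the sample dictionary is a trimmed copy of the response above with the message content shortened for brevity.

```python
def summarize_completion(response):
    """Return (text, finish_reason, usage) for the first choice (hypothetical helper)."""
    choice = response["choices"][0]
    # The example reply starts with a space, so strip surrounding whitespace.
    text = choice["message"]["content"].strip()
    usage = response.get("usage", {})
    return text, choice["finish_reason"], usage


# Trimmed copy of the example response; the content is shortened here.
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Thank you for asking!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 23, "total_tokens": 258, "completion_tokens": 235},
}

text, finish_reason, usage = summarize_completion(example)
print(text)
print(finish_reason, usage["total_tokens"])
```

In the example, total_tokens is the sum of prompt_tokens and completion_tokens (23 + 235 = 258).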

Stream Response

If your application needs a more interactive experience (e.g., a chatbot), set the stream parameter to true in the request.

Example request:

curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $HYPERBOLIC_API_KEY" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful and polite assistant."
            },
            {
                "role": "user",
                "content": "What is Chinese hotpot?"
            }
        ],
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "stream": true
    }'

With streaming enabled, your application receives the response token by token: each chunk arrives on its own "data:" line, and the stream ends with "data: [DONE]".

Here is an excerpt from the end of a streamed response:

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" and"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" gather"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":"ings"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" in"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" Chinese"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":" culture"},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"chatcmpl-R8Fbh8UyspGjhXNdKqAQtT","object":"chat.completion.chunk","created":1709832532,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
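The chunks above can be reassembled on the client side. The following Python sketch reads "data:" lines, concatenates each delta's content, and stops at the [DONE] marker. The function name collect_stream is hypothetical, and the sketch assumes a single completion per request (n = 1); with several choices, the deltas would need to be grouped by index.

```python
import json


def collect_stream(lines):
    """Join the delta contents of streamed chunks into one string (hypothetical helper).

    `lines` is any iterable of str or bytes lines, such as an HTTP response body.
    """
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines between chunks
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# A few chunks in the same shape as the excerpt above, with fields trimmed.
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":" Chinese"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" culture"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(sample))
```

Because an HTTP response object from urllib.request can be iterated line by line, the same function could be passed the response of a request made with "stream": true.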