NAI Create API Endpoint
- Taylor Norris

- 2 days ago
- 3 min read

Nutanix Enterprise AI (NAI) turns your AI infrastructure into a production-ready platform for serving models at scale. One of the most powerful features of the platform is the ability to create inference endpoints—secure, scalable API interfaces that allow your applications to communicate with your deployed large language models (LLMs).
Whether you're building chatbots, copilots, or search assistants, inference endpoints are the bridge between your LLMs and the applications that depend on them.
In this guide, we’ll walk through how to create an inference endpoint in Nutanix Enterprise AI, what each configuration setting means, and why these endpoints are essential for powering modern AI workflows.
Start by logging in to NAI and clicking on the "Models" tab. You need an imported model with an Active status that is available for deployment.
For detailed instructions on deploying a model, please reference my previous blog post: Powering Your Private AI: Seamlessly Importing LLMs from Hugging Face with Nutanix Enterprise AI.

The next step in the Nutanix Enterprise AI workflow is to create an endpoint. After successfully importing the LLM, you can deploy it to an AI inference endpoint for real-time applications. This inference endpoint accepts requests and sends back responses, enabling your AI applications to communicate with the model.
By streamlining the LLM import process and integrating closely with Hugging Face, Nutanix Enterprise AI ensures that you can rapidly bring state-of-the-art models into your secure, managed enterprise environment.
To create an API endpoint, click the "Endpoints" tab in the left hand navigation menu

Click "Create New Endpoint" (note: Your screen will look like this if it is the first time you create an endpoint)

(note: If you already have existing endpoints, you will see them here. In this case, click "Create Endpoint")
Next, give your API Endpoint a name.
The name can include only lowercase letters, numbers, or '-'. It must start with a letter and end with either a letter or a number.

I will call this one "llama-3-2-1b".
Select the imported model you want to associate with this endpoint. I will uncheck the box for use gpu as my system doesn't have a GPU, but you would leave this checked if your system does. Next, click "Create a New API Key"

Give the key a name.
Click Create

Click "Copy Key" and store this information somewhere safe (like on a notepad or in a password manager).
Click "Close"

Click "Create" to create the API Key.

On the Endpoints tab, you will notice your new endpoint change status from "Pending" to "Processing" to "Active".



Once the endpoint status is active, click on the "Endpoint Name" hyperlink to test your endpoint.

Then click on the "Test" button.

Here you can use the built in sample request or do a custom request.

If I click on the built in request, the output from the inference will appear in the right hand pane.

Alternatively, you could click the "Custom Request" radio button and do a use case specific test.

When you are satisfied with your test, click "Done".
Click the "back arrow" to navigate back to the endpoint dashboard landing page.

The last thing we will review is how to view the code for a sample api request.
Click "View Sample Request"


The endpoints can be accessed using the URL and the API Key. Share the sample code with the developers to access the endpoints and integrate endpoints into their applications.
Summary
Inference endpoints are the critical link that turn large language models into usable, production-ready services. With Nutanix Enterprise AI, creating these endpoints is a streamlined, secure process that enables real-time interaction between your applications and deployed LLMs. From selecting an imported model and configuring GPU usage to generating API keys, testing requests, and sharing sample code with developers, NAI provides everything needed to operationalize AI quickly and safely. By simplifying endpoint creation and tightly integrating with model sources like Hugging Face, Nutanix Enterprise AI empowers teams to move from experimentation to enterprise-grade AI applications with confidence.
Additional Links




Comments