24th April 2025

CogVLM, a robust open-source Large Multimodal Model (LMM), offers strong capabilities for tasks like Visual Question Answering (VQA), Optical Character Recognition (OCR), and zero-shot object detection.

In this guide, I'll walk you through deploying a CogVLM Inference Server with 4-bit quantization on Amazon Web Services (AWS). Let's get started.

Setup EC2 Instance

This section is useful even for those experienced with EC2. It will help you understand the hardware and software requirements for a CogVLM Inference Server.

To begin, search for EC2 in the AWS console, then under 'Instances' click the 'Launch Instances' button and fill out the form according to the specifications below. If you prefer to script the launch, see the sketch after the list.

  • GPU Memory: The 4-bit quantized CogVLM model requires 11 GB of GPU memory. Opt for an NVIDIA T4 GPU, typically available in AWS g4dn instances. You might need to request an AWS quota increase to access these instances.
  • CUDA and Software Requirements: Ensure your machine has at least CUDA 11.7 and Docker with NVIDIA GPU support. Choosing an OS image like 'Deep Learning AMI GPU PyTorch' simplifies the process.
  • Network: For this setup, allow incoming SSH and HTTP traffic so you can access the machine securely and reach the web app.
  • Keys: Create and securely store an SSH key pair for accessing your machine.
  • Storage: Allocate around 50 GB for the Docker image and CogVLM model weights, plus a little extra space as a buffer.
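
If you would rather script the instance launch than fill out the console form, the boto3 sketch below shows the rough shape of the call. The AMI ID, key pair name, region, and root device name are placeholders you would need to adjust for your own account.

# minimal boto3 sketch for launching a g4dn instance (assumes AWS credentials are already configured)
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust region as needed

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder: Deep Learning AMI GPU PyTorch ID for your region
    InstanceType="g4dn.xlarge",        # single NVIDIA T4 GPU (16 GB)
    KeyName="my-ssh-key",              # placeholder: your SSH key pair name
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/sda1",  # root device name depends on the AMI
            "Ebs": {"VolumeSize": 50, "VolumeType": "gp3"},  # ~50 GB for the Docker image and weights
        }
    ],
)
print(response["Instances"][0]["InstanceId"])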

Setup Inference Server

Once logged in via SSH using your locally stored key, proceed with the following steps:

  • Verify the CUDA version, GPU accessibility, and the Docker and Python installations.
# verify GPU accessibility and CUDA version
nvidia-smi

# verify Docker installation
docker --version
nvidia-docker --version

# verify Python installation
python --version
  • Install the required Python packages and start the Inference Server.
# install required python packages
pip install inference==0.9.7rc2 inference-cli==0.9.7rc2

# start inference server
inference server start

This step involves downloading a large Docker image (roughly 11 GB) to run CogVLM, which might take a few minutes.

  • Run docker ps to make sure the server is running. You should see a roboflow/roboflow-inference-server-gpu:latest container running in the background. You can also confirm the server responds over HTTP, as shown in the snippet below.
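
A quick sanity check from Python, assuming the server is listening on its default port 9001 (adjust the port if you started it differently):

# check that the Inference Server is reachable (assumption: default port 9001)
import requests

response = requests.get("http://localhost:9001")
print(response.status_code)  # expect HTTP 200 once the container is up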

Run Inference

To test CogVLM inference, use the client script available on GitHub:

  • Clone the repository and set up the environment.
# clone cog-vlm-client repository
git clone https://github.com/roboflow/cog-vlm-client.git
cd cog-vlm-client

# set up python environment and activate it [optional]
python3 -m venv venv
source venv/bin/activate

# install required dependencies
pip install -r requirements.txt

# download example data [optional]
./setup.sh
  • Acquire your Roboflow API key and export it as an environment variable to authenticate with the Inference Server.
export ROBOFLOW_API_KEY="xSI558nrSshjby8Y4WMb"
  • Run the Gradio app and query images.
python app.py

The Gradio app will generate a unique link that you can use to query your CogVLM model from any computer or phone. If you would rather call the server directly instead of going through the Gradio UI, see the sketch further below.

Example OCR image query
Example VQA image query
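
As a rough illustration of querying the server without the Gradio app, the request below posts a base64-encoded image to the server's CogVLM endpoint. The endpoint path and payload fields here are assumptions on my part; check the Inference documentation for the exact request schema.

# hypothetical direct HTTP query to the Inference Server (endpoint path and payload schema are assumptions)
import base64
import os

import requests

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "api_key": os.environ["ROBOFLOW_API_KEY"],
    "model_id": "cogvlm",
    "image": {"type": "base64", "value": image_b64},
    "prompt": "Read all the text in this image.",
}

response = requests.post("http://localhost:9001/llm/cogvlm", json=payload)
print(response.json())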

Note: The first request to the server might take several minutes as it loads the model weights into GPU memory. You can monitor this process using docker system df and nvidia-smi. Subsequent requests shouldn't take longer than a dozen or so seconds.

docker system df output after loading the Inference Server image and CogVLM weights
nvidia-smi output after loading CogVLM weights into memory

Conclusions

CogVLM is a versatile and powerful LMM, adept at handling a range of computer vision tasks. In many cases, it can successfully replace GPT-4V while giving you more control. Visit the Inference documentation to learn how to deploy CogVLM as well as other computer vision models.
