
Twelve Labs Earns $50 Million Series A to Build the Future of Multimodal AI and Video Understanding

2024-06-05 · 3 min read


Twelve Labs, the video understanding company, raised $50 million in Series A funding to fuel the ongoing development of its industry-leading foundation models dedicated to all aspects of video. The round was co-led by new investor New Enterprise Associates (NEA) and NVentures, NVIDIA’s venture capital arm, which recently participated in Twelve Labs’ strategic round. Previous investors, including Index Ventures, Radical Ventures, WndrCo, and Korea Investment Partners, also joined the round. In addition to R&D, funds will be used to nearly double headcount.

Twelve Labs has integrated a number of NVIDIA technologies and services within its platform, including NVIDIA H100 Tensor Core and NVIDIA L40S GPUs, as well as inference frameworks such as NVIDIA Triton Inference Server and NVIDIA TensorRT. Twelve Labs is also exploring product and research collaborations with NVIDIA to bring best-in-class multimodal foundation models and enabling frameworks to market.

Twelve Labs was created specifically for multimodal video understanding. Its newly released Marengo 2.6, a state-of-the-art multimodal embedding model, is unlike anything currently available to companies. Marengo 2.6 offers a pioneering approach to multimodal representation tasks, extending beyond video to image and audio and performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more. This model represents a significant leap in video understanding technology, enabling more intuitive and comprehensive search capabilities across various media types.
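For intuition on what any-to-any search means in practice: once every modality is embedded into one shared vector space, retrieval reduces to nearest-neighbor lookup. The minimal Python sketch below illustrates the idea with random stand-in vectors; the 1024-dimensional size and asset names are assumptions for illustration, not Twelve Labs' actual output format.

```python
import numpy as np

# Stand-in embeddings in one shared multimodal space; the dimensions and
# asset names are illustrative assumptions, not Twelve Labs' output.
rng = np.random.default_rng(0)
corpus = {
    "clip_001.mp4": rng.random(1024),    # video embedding
    "photo_17.jpg": rng.random(1024),    # image embedding
    "podcast_03.wav": rng.random(1024),  # audio embedding
}
query = rng.random(1024)  # e.g. the embedding of the text "a goal celebration"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Any-to-any search is nearest-neighbor lookup in the shared space: rank
# every asset, whatever its modality, by similarity to the query vector.
for asset, vec in sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True):
    print(asset, round(cosine(query, vec), 3))
```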

Twelve Labs also opened the beta of Pegasus-1, which sets a new standard in video-language modeling. Pegasus-1 is designed to understand and articulate complex video content, transforming how we interact with and analyze multimedia. It can process and generate language from video input with exceptional accuracy and detail. To get there, the Twelve Labs team drastically reduced the model’s size from 80 billion parameters to 17 billion, with three components trained jointly: a video encoder, a video-language alignment model, and a language decoder. Twelve Labs will release additional flagship Pegasus models in the coming months for organizations that can support larger models.
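As a rough illustration of that three-component layout, here is a minimal PyTorch-style sketch: the dimensions, layer counts, and interfaces are assumptions chosen for clarity, not Pegasus-1's actual architecture.

```python
import torch
import torch.nn as nn

class VideoLanguageModel(nn.Module):
    """Toy sketch of the three jointly trained components the article names."""

    def __init__(self, video_dim=768, text_dim=512, vocab_size=32000):
        super().__init__()
        # 1) Video encoder: contextualizes per-frame features into video tokens.
        self.video_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=video_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # 2) Video-language alignment model: projects video tokens into the
        #    language decoder's embedding space (a single linear map here).
        self.alignment = nn.Linear(video_dim, text_dim)
        # 3) Language decoder: generates text conditioned on aligned video tokens.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=text_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.token_embed = nn.Embedding(vocab_size, text_dim)
        self.lm_head = nn.Linear(text_dim, vocab_size)

    def forward(self, frame_feats, text_ids):
        video_tokens = self.alignment(self.video_encoder(frame_feats))
        hidden = self.decoder(tgt=self.token_embed(text_ids), memory=video_tokens)
        return self.lm_head(hidden)  # next-token logits over the vocabulary

model = VideoLanguageModel()
logits = model(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # torch.Size([1, 8, 32000])
```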

Twelve Labs introduced its Embeddings API, which gives users direct access to the raw multimodal embeddings that power the existing Video Search API and Classify API. This first-of-its-kind API supports all data modalities (image, text, audio, and video), turning data into vectors in a single shared space without relying on siloed solutions for each modality.
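A hypothetical usage sketch of such an endpoint follows; the URL, parameter names, and response shape are placeholders for illustration and not Twelve Labs' published API surface.

```python
import requests

# Hypothetical multimodal embeddings call. The endpoint, field names, and
# response format below are assumptions, not Twelve Labs' documented API.
API_KEY = "your-api-key"
BASE = "https://api.example.com/v1/embeddings"  # placeholder endpoint

def embed(modality: str, payload: dict) -> list[float]:
    resp = requests.post(
        BASE,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"modality": modality, **payload},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]  # one vector in the shared space

# Because every modality lands in the same space, one call pattern covers
# text, image, audio, and video inputs alike.
text_vec = embed("text", {"text": "sunset over a harbor"})
video_vec = embed("video", {"video_url": "https://example.com/clip.mp4"})
```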

Its new Embeddings API is powered by Twelve Labs’ video foundation model and inference infrastructure, which are fundamentally different from approaches that process images one by one and stitch them together. By providing native support for multimodality in a single API, Twelve Labs can handle the large volume of assets its models need to understand while keeping latency low.
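To make that distinction concrete, the toy sketch below contrasts the frame-stitching pattern with a single video-native call; both embedding functions are random stand-ins named hypothetically, not real model calls.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_image(frame):        # stand-in for a per-frame image model
    return rng.random(1024)

def embed_video_native(clip):  # stand-in for a video-native model
    return rng.random(1024)    # one call, temporal context preserved

frames = rng.random((16, 224, 224, 3))  # 16 sampled frames from one clip

# Stitched: N separate image-model calls plus a pooling heuristic that
# discards temporal structure.
stitched = np.mean([embed_image(f) for f in frames], axis=0)

# Native: a single call over the whole clip, which is what allows one API
# to cover large asset volumes at lower per-asset latency.
native = embed_video_native(frames)
```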

“Through our work, particularly our perceptual-reasoning research, we are solving the problems associated with multimodal AI. We seek to become the semantic encoder for all future AI agents that need to understand the world as humans do,” said Jae Lee, co-founder and CEO of Twelve Labs. “With our Series A funding, we can invest in further research and development, hire aggressively across all roles, and extend our reach, continuing to build partnerships with the most innovative, forward-thinking companies in existence to eliminate the boundaries of video understanding.”

Since debuting its platform, Twelve Labs has attracted 30,000 users who rely on its APIs for tasks such as semantic video search and summarization across notable organizations in sports, media and entertainment, advertising, automotive, and security. In doing so, the company has begun establishing deep industry partnerships and integrations with companies like Vidispine, EMAM, Blackbird, and more.
