Vertex AI Streaming Ingestion allows for real-time AI
Many machine learning (ML) use cases, such as fraud detection, ad targeting, and recommendation engines, require near-real-time predictions. These predictions depend on access to the latest data, and even a few seconds of delay can make a big difference in performance. However, it is not easy to build infrastructure that supports both high-throughput updates and low-latency retrieval.
Vertex AI Matching Engine and Feature Store now support real-time Streaming Ingestion. With Matching Engine, a fully managed vector database for vector similarity search, items in an index are updated continuously and the changes are reflected immediately in similarity search results. With Feature Store, you can use Streaming Ingestion to retrieve the most recent feature values and to extract real-time data for training.
Digits has taken advantage of Vertex AI Matching Engine Streaming Ingestion to power its product, Boost, a tool that saves accountants time and automates manual quality control. Thanks to Streaming Ingestion, Digits Boost can now provide analysis and features in real time. Before Matching Engine, transactions were classified on a 24-hour batch schedule.
However, with Matching Engine Streaming Ingestion, Digits can perform near-real-time incremental indexing operations such as inserting, updating, or deleting embeddings in an existing index, which has sped up its process. “Now we can provide immediate feedback to our customers and handle more transactions more quickly,” said Hannes Hapke, a machine learning engineer at Digits.
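To make the insert, update, and delete semantics concrete, here is a minimal in-memory sketch of a vector index that accepts streaming mutations and answers cosine-similarity queries against whatever the index holds at query time. It is a toy stand-in for illustration only, assuming exact brute-force search; it is not the Matching Engine API (the managed service uses approximate nearest-neighbor search and is called through the Vertex AI SDK or REST API), and the `StreamingIndex` class and its methods are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class StreamingIndex:
    """Hypothetical toy index: every mutation is visible to the very next query."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: Sequence[float]) -> None:
        # Insert a new item, or overwrite the embedding of an existing one.
        self._vectors[item_id] = list(vector)

    def remove(self, item_id: str) -> None:
        # Delete an item; IDs that are not in the index are ignored.
        self._vectors.pop(item_id, None)

    def query(self, vector: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        # Assumption: exact brute-force search over all stored vectors.
        scored = [(item_id, _cosine(vector, v)) for item_id, v in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = StreamingIndex()
    index.upsert("txn-1", [1.0, 0.0])
    index.upsert("txn-2", [0.0, 1.0])
    print(index.query([1.0, 0.1], k=1))  # txn-1 is the closest match
    index.upsert("txn-1", [0.0, 1.0])    # update an existing embedding
    index.remove("txn-2")                # delete an item
    print(index.query([1.0, 0.1], k=5))  # only txn-1 remains
```

The design point is that there is no rebuild step: each `upsert` or `remove` changes the data the next `query` scans, which is the behavior a 24-hour batch schedule cannot provide.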
This blog post explains how these new features improve predictions and enable near-real-time use cases such as recommendations, content personalization, and cybersecurity monitoring.
Streaming Ingestion enables real-time AI
Organizations are realizing the potential business benefits of predictive models that use up-to-date information, and more AI applications are being developed. Here are some examples.
Real-time recommendations for a marketplace: Mercari has added Streaming Ingestion to its existing Matching Engine product recommendations. This creates a real-time marketplace where users can browse products that match their interests and see results update as soon as new products are listed. It feels like shopping at a farmers’ market early in the morning, when fresh food arrives while you shop.
By combining Matching Engine’s filtering capabilities with Streaming Ingestion, Mercari can determine whether an item should appear in search results based on tags such as “online/offline” and “instock/nostock”.
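Tag-based filtering of this kind can be pictured with the sketch below. It assumes each item carries tags grouped by namespace (for example `status` and `stock`), a query lists the allowed tags per namespace, and an item passes only if it matches in every namespace. This is loosely modeled on Matching Engine’s namespace-based filtering, but the data layout and the `passes_filter` and `filter_results` functions are hypothetical, and here the filter is applied after ranking for simplicity rather than as part of the query.

```python
from typing import Dict, List, Set

# Hypothetical layout: an item's tags, grouped by namespace.
Tags = Dict[str, Set[str]]


def passes_filter(item_tags: Tags, allow: Dict[str, Set[str]]) -> bool:
    """True if, for every namespace in `allow`, the item carries an allowed tag.

    Namespaces are combined with AND; tags inside one namespace with OR.
    """
    return all(item_tags.get(ns, set()) & allowed for ns, allowed in allow.items())


def filter_results(ranked_ids: List[str], catalog: Dict[str, Tags],
                   allow: Dict[str, Set[str]]) -> List[str]:
    # Preserve the similarity ranking and drop items that fail the filter.
    return [i for i in ranked_ids if passes_filter(catalog.get(i, {}), allow)]


if __name__ == "__main__":
    catalog = {
        "shoe-1": {"status": {"online"}, "stock": {"instock"}},
        "shoe-2": {"status": {"offline"}, "stock": {"instock"}},
        "shoe-3": {"status": {"online"}, "stock": {"nostock"}},
    }
    allow = {"status": {"online"}, "stock": {"instock"}}
    print(filter_results(["shoe-2", "shoe-1", "shoe-3"], catalog, allow))  # ['shoe-1']
    # A streamed tag update (the item sells out) takes effect on the next query.
    catalog["shoe-1"]["stock"] = {"nostock"}
    print(filter_results(["shoe-2", "shoe-1", "shoe-3"], catalog, allow))  # []
```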
Large-scale personalized content streaming: You can create pub/sub channels for any stream of content that is represented as feature vectors, and then select the most relevant content for each subscriber based on their interests.
Because Matching Engine is highly scalable (it can process millions of queries per second), you can support millions of online subscribers for content streaming and serve a wide range of dynamically changing topics. Matching Engine’s filtering capabilities also let you control which content is delivered by assigning tags such as “explicit” or “spam” to each object.
Feature Store can serve as a central repository for storing and serving the feature vectors of your content in near real time.
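One way to picture per-subscriber routing: treat each subscriber’s interest vector as an indexed item and each new piece of content as a query, then deliver the content to every subscriber whose similarity clears a threshold, unless the content carries a blocked tag. The sketch below does this with brute-force cosine similarity over a small dictionary; the 0.8 threshold, the `DENIED_TAGS` set, and the `route` function are hypothetical choices for illustration, and at the scale described above the similarity search itself would be delegated to Matching Engine.

```python
import math
from typing import Dict, List, Sequence, Set

# Assumption: tags that block delivery to every subscriber.
DENIED_TAGS: Set[str] = {"explicit", "spam"}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(content_vec: Sequence[float], content_tags: Set[str],
          subscribers: Dict[str, Sequence[float]], threshold: float = 0.8) -> List[str]:
    """Return the IDs of subscribers whose interests are close to the new content."""
    if content_tags & DENIED_TAGS:
        return []  # blocked content is never delivered
    return sorted(
        sub_id for sub_id, interest in subscribers.items()
        if cosine(content_vec, interest) >= threshold
    )


if __name__ == "__main__":
    subscribers = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.9, 0.1]}
    print(route([1.0, 0.0], {"tech"}, subscribers))  # ['alice', 'carol']
    print(route([1.0, 0.0], {"spam"}, subscribers))  # []
```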
Monitoring: Content streaming can also be used to monitor events and signals from IT infrastructure, IoT devices, or manufacturing production lines. For example, you can extract signals from millions of sensors and devices and turn them into feature vectors.
Matching Engine then lets you update, in near real time, lists such as “top 100 devices with defective signals” or “top 100 sensor events with outliers”.
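The near-real-time bookkeeping behind a “top 100” list can be sketched as a small streaming leaderboard. This sketch plainly swaps in a simple z-score outlier rule instead of vector similarity search: each reading updates that device’s rolling window, the window’s largest absolute z-score becomes the device’s score, and `heapq.nlargest` returns the current top N. The window size, the scoring rule, and the `OutlierBoard` class are assumptions for illustration; the approach described above instead works with feature vectors in Matching Engine.

```python
import heapq
import statistics
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


def outlier_score(readings: List[float]) -> float:
    """Largest absolute z-score in the window (0.0 if there is no variation)."""
    if len(readings) < 2:
        return 0.0
    mean = statistics.fmean(readings)
    stdev = statistics.pstdev(readings)
    if stdev == 0:
        return 0.0
    return max(abs(x - mean) / stdev for x in readings)


class OutlierBoard:
    """Hypothetical near-real-time "top N devices with outliers" leaderboard."""

    def __init__(self, window: int = 20) -> None:
        # Keep only the most recent `window` readings per device.
        self._readings: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
        self._scores: Dict[str, float] = {}

    def ingest(self, device_id: str, value: float) -> None:
        # Each streamed reading immediately refreshes that device's score.
        readings = self._readings[device_id]
        readings.append(value)
        self._scores[device_id] = outlier_score(list(readings))

    def top(self, n: int = 100) -> List[Tuple[str, float]]:
        # heapq.nlargest avoids fully sorting all devices when n is small.
        return heapq.nlargest(n, self._scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    board = OutlierBoard(window=5)
    for value in [10, 10, 10, 10, 50]:
        board.ingest("sensor-a", value)
    for value in [10, 11, 10, 11, 10]:
        board.ingest("sensor-b", value)
    print(board.top(1))  # [('sensor-a', 2.0)]
```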
Spam detection: If you are looking for security threat signatures or spam activity patterns, Matching Engine can instantly identify potential attacks across millions of monitoring points. Threat identification that relies on batch processing can involve significant delays, which leaves the company more vulnerable. With real-time data, your models can detect threats and spam more quickly.
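A minimal way to express signature matching: embed each incoming event, compare it with a set of known threat-signature vectors, and raise an alert when the similarity exceeds a threshold. The signature names, the 0.9 threshold, and `match_signatures` are hypothetical, and the brute-force loop stands in for the low-latency similarity query that Matching Engine would run over a large signature index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_signatures(event_vec: Sequence[float],
                     signatures: Dict[str, Sequence[float]],
                     threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return known signatures similar to the event, most similar first."""
    hits = [(name, cosine(event_vec, sig)) for name, sig in signatures.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Hypothetical signature vectors; real ones would come from an embedding model.
    signatures = {"credential-stuffing": [1.0, 0.0, 0.0], "spam-burst": [0.0, 1.0, 0.0]}
    for event_id, vec in [("evt-1", [0.95, 0.05, 0.0]), ("evt-2", [0.0, 0.0, 1.0])]:
        hits = match_signatures(vec, signatures)
        if hits:
            print(event_id, "matches", hits[0][0])
```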
Implementing streaming use cases
Let’s look closer at some of these use cases.
Real-time retail recommendations
Mercari created a feature extraction pipeline using Streaming Ingestion.
Initiate: The feature extraction pipeline is defined with Vertex AI Pipelines and is invoked periodically by Cloud Scheduler or Cloud Functions.
Get item information: The pipeline issues a query to retrieve the latest item data from BigQuery.
Extract feature vectors: The pipeline runs predictions on the data with a word2vec model to extract feature vectors.
Update index: The pipeline calls the Matching Engine APIs to add the feature vectors to the vector index. The vectors are also saved to Bigtable, which may be replaced by Feature Store in the future.
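The pipeline steps above can be sketched as plain Python functions. Everything external is stubbed: `fetch_latest_items` stands in for the BigQuery query, the tiny `WORD_VECTORS` table stands in for a trained word2vec model (averaging word vectors is one common way to embed an item title, assumed here rather than taken from Mercari’s pipeline), and two dictionaries stand in for the Matching Engine index and Bigtable. All of these names are hypothetical, and the scheduling and Vertex AI Pipelines wiring are omitted.

```python
from typing import Dict, List

# Hypothetical word vectors standing in for a trained word2vec model.
WORD_VECTORS: Dict[str, List[float]] = {
    "vintage": [0.9, 0.1, 0.0],
    "camera": [0.1, 0.9, 0.0],
    "lens": [0.2, 0.8, 0.1],
    "jacket": [0.0, 0.1, 0.9],
}
DIM = 3


def fetch_latest_items() -> List[Dict[str, str]]:
    # Stand-in for the BigQuery query that returns recently listed items.
    return [
        {"item_id": "m-001", "title": "vintage camera"},
        {"item_id": "m-002", "title": "camera lens"},
    ]


def embed(title: str) -> List[float]:
    # Average the vectors of known words; unknown words are skipped.
    vectors = [WORD_VECTORS[w] for w in title.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def run_pipeline_once(index: Dict[str, List[float]],
                      backup: Dict[str, List[float]]) -> int:
    items = fetch_latest_items()               # 1. get item information
    for item in items:
        vector = embed(item["title"])          # 2. extract feature vector
        index[item["item_id"]] = vector        # 3a. upsert into the vector index
        backup[item["item_id"]] = vector       # 3b. persist (Bigtable in the source)
    return len(items)


if __name__ == "__main__":
    index: Dict[str, List[float]] = {}
    backup: Dict[str, List[float]] = {}
    print(run_pipeline_once(index, backup), "items indexed")
```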
“We were pleasantly surprised by the extremely short latency for index updates when we tested Matching Engine Streaming Ingestion,” said Nogami Wakana, a software engineer at Souzoh (a Mercari group company), adding that the team would like to add the functionality to its production service as soon as it becomes generally available (GA).
This architecture design is also applicable to retail businesses that require real-time product recommendations.
Ad targeting
Real-time features, item matching, and up-to-date information are key to ad recommender systems. Let’s look at how Vertex AI can help you build an ad targeting system.
First, generate a list of candidates from the ad corpus. This is challenging because relevant candidates must be generated within milliseconds. Vertex AI Matching Engine can generate relevant candidates by performing low-latency vector similarity matching, and Streaming Ingestion keeps your index up to date.
Next, rerank the candidates with a machine learning model to make sure the ads are in the right order. To ensure that the model uses the most recent data, you can use Feature Store Streaming Ingestion to import the latest features, and use online serving to serve feature values at low latency, which improves precision.
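The reranking step can be sketched as follows, assuming a hypothetical linear model with three hand-set weights and an online feature store modeled as a dictionary that Streaming Ingestion keeps current. The point is that features are looked up at request time, so a freshly ingested value changes the order on the very next request; the feature layout, weights, and function names are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical weights for the features [click_signal, relevance, remaining_budget].
WEIGHTS = (0.6, 0.3, 0.1)


def score(features: Sequence[float]) -> float:
    # Linear model: weighted sum of the ad's current feature values.
    return sum(w * f for w, f in zip(WEIGHTS, features))


def rerank(candidates: List[str], online_features: Dict[str, List[float]]) -> List[str]:
    # Read the latest features at request time; drop candidates with no features yet.
    scored = [(ad, score(online_features[ad])) for ad in candidates if ad in online_features]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [ad for ad, _ in scored]


if __name__ == "__main__":
    features = {"ad-a": [0.2, 0.5, 0.9], "ad-b": [0.4, 0.1, 0.1]}
    print(rerank(["ad-b", "ad-a", "ad-c"], features))  # ['ad-a', 'ad-b']
    features["ad-b"] = [0.8, 0.1, 0.1]  # a streamed feature update
    print(rerank(["ad-b", "ad-a", "ad-c"], features))  # ['ad-b', 'ad-a']
```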
After reranking the ad candidates, you can apply final optimizations. You can implement the optimization step using a