The Good Tech Companies - How Vector Search Cracks the Code on Contract Analytics
Episode Date: December 9, 2024
This story was originally published on HackerNoon at: https://hackernoon.com/how-vector-search-cracks-the-code-on-contract-analytics. Learn how vector search helped buil...d a powerful way to identify recurring payments in the financial sector. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #vector-search, #software-development, #app-development, #what-is-vector-search, #wealthapi, #wealthapi-implementation, #transaction-search-engine, #good-company, and more. This story was written by: @datastax. Learn more about this writer by checking @datastax's about page, and for more stories, please visit hackernoon.com. A look at the application architecture of wealthAPI, a data analytics provider for the financial sector that has created a highly accurate way to identify recurring payment entries.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
How Vector Search Cracks the Code on Contract Analytics, by DataStax
A look at the application architecture of wealthAPI, a data analytics provider for
the financial sector that has created a highly accurate way to identify recurring payment entries.
At wealthAPI, we've always believed financial analytics should be smarter and faster,
especially when identifying recurring payments hidden in transaction data.
We've built a solution that transforms raw transaction data into actionable insights
by leveraging AI. Our system uses vector embeddings to group transactions into recurring
payment patterns, ensuring accuracy even when recurring payment entries contain subtle wording
differences.
From subscriptions to insurance payments, our platform delivers reliable results while maintaining the speed and scalability financial companies need. Here, we'll show how we designed
our architecture to solve these challenges, from data ingestion and vector embeddings to
clustering transactions into meaningful groups. We'll also explore how AI powers advanced features
like semantic search, allowing users to find and analyze financial data effortlessly.
What the application does
wealthAPI tackles a common yet challenging problem for financial companies:
identifying recurring payments, such as subscriptions, in bank transaction histories.
Traditional methods struggled to scale and often relied on exact matches, missing subtle differences (e.g., "Spotify" vs. "Spotify AB"). wealthAPI addresses
this problem with an AI-driven approach that delivers accuracy and speed. At the heart of
this solution lies DataStax Astra DB, a database platform purpose-built for modern, scalable, and AI-integrated workflows.
Architecture
wealthAPI's system takes raw bank transactions, processes them into embeddings,
and groups them into recurring payment patterns, all powered by Astra DB's vector similarity search
capabilities. The architecture ensures scalability and responsiveness at each stage, even under high data volumes. Here's a simplified
flow of the process:
1. Data ingestion. When bank transactions are received, the wealthAPI backend publishes them on a message queue for asynchronous processing.
2. Embedding creation. Each transaction (e.g., "Spotify -10 EUR 22.10.24") is transformed into a numeric vector (e.g., [0.12, 0.65, 0.78, 0.23]) using Astra DB's vectorize feature.
3. Vector storage and search in Astra DB. The embeddings are stored in Astra DB, where fast vector similarity searches allow the system to find and cluster similar transactions.
4. Regularity analysis. The clusters are analyzed to identify recurring payments, categorizing them as contracts like "Spotify (music service, monthly)" or "Health insurance (health, yearly)".
Astra DB ensures the
entire process is scalable and responsive, even with high volumes of data. The process also adheres
to strict data security measures to ensure that end users and their transactions remain anonymous
and protected from external access.
Technical implementation: clustering transactions into contracts
Grouping transactions has always
been a core challenge. Previous tools depended on exact matches (e.g., vendor name or payment
amount), which often failed to capture variations and were slow to scale. At wealthAPI, we previously tried
searching for patterns among millions of transactions with traditional databases,
which was both slow and prone to errors. Even small variations in transaction details broke the clustering logic.
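To illustrate why exact matching breaks down, the following minimal Python sketch compares exact-match grouping with similarity-based grouping. It is not wealthAPI's implementation: `toy_embed` (character trigram counts), `group_similar`, and the 0.6 threshold are hypothetical stand-ins for a real embedding model and a vector database's similarity search.

```python
import math
from collections import Counter


def toy_embed(text: str, n: int = 3) -> Counter:
    """Hypothetical stand-in for an embedding model: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_exact(labels):
    """Exact-match grouping: any difference in wording creates a new group."""
    groups = {}
    for label in labels:
        groups.setdefault(label, []).append(label)
    return list(groups.values())


def group_similar(labels, threshold=0.6):
    """Greedy grouping: join the first group whose seed vector is similar enough.

    The threshold is an assumed value chosen for this toy data only.
    """
    groups = []  # each entry: (seed_vector, [labels])
    for label in labels:
        vec = toy_embed(label)
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(label)
                break
        else:
            groups.append((vec, [label]))
    return [members for _, members in groups]


labels = ["Spotify AB", "Spotify", "Health Insurance Co", "Spotify AB"]
exact_groups = group_exact(labels)
similar_groups = group_similar(labels)
print(exact_groups)
print(similar_groups)
```

With these inputs, exact matching splits the Spotify payments across two groups, while the similarity-based version keeps them together and leaves the insurance payment in a group of its own.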
Because we're using Astra DB, we can store embeddings and efficiently search for similar
transactions, even with minor variations in details. Here's an example: a payment labeled
"Spotify AB" for 10 euros on one day and "Spotify" for 10 euros the next is
correctly grouped as the same recurring payment.
Handling large data volumes
With thousands of transactions processed daily, wealthAPI required a database that could scale seamlessly while
maintaining speed and accuracy. Astra DB's foundation is Apache Cassandra, so it's built
for scalability. It also integrates with AI workflows,
enabling wealthAPI to maintain fast queries without compromising precision.
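Once similar transactions are grouped, the regularity analysis step from the architecture overview could look roughly like the sketch below. It assumes a cluster is just a list of booking dates and labels its rhythm from the median gap between payments. The function name `classify_cadence`, the reference intervals, and the five-day tolerance are illustrative assumptions, not details from wealthAPI.

```python
from datetime import date
from statistics import median


def classify_cadence(dates, tolerance_days=5):
    """Label a cluster's payment rhythm from the median gap between dates."""
    ordered = sorted(dates)
    if len(ordered) < 3:
        return "unknown"  # too few payments to call the pattern recurring
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    # Assumed reference intervals in days; a real system would likely use more.
    for name, days in (("weekly", 7), ("monthly", 30), ("yearly", 365)):
        if abs(typical - days) <= tolerance_days:
            return name
    return "irregular"


# Hypothetical clusters, each reduced to its booking dates.
spotify_dates = [date(2024, 8, 22), date(2024, 9, 23), date(2024, 10, 22)]
insurance_dates = [date(2022, 1, 3), date(2023, 1, 2), date(2024, 1, 2)]
one_off_dates = [date(2024, 3, 1), date(2024, 3, 4), date(2024, 7, 19)]

print(classify_cadence(spotify_dates))
print(classify_cadence(insurance_dates))
print(classify_cadence(one_off_dates))
```

Using the median rather than the mean keeps a single late or early booking from changing the label, which matters because bank booking dates often shift by a day or two around weekends.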
Transaction search engine
Because embeddings capture the underlying meaning of transactions,
wealthAPI can also implement a search feature. Users can type a keyword like
"health" to retrieve all health-related transactions without relying on predefined tags or categories. The system generates an embedding from the user query and
runs a simple similarity search using Astra DB. Its vector search capability makes this kind of
semantic search fast and accurate. A user typing "health", for example, will see all payments for
health-related services, like insurance or gym memberships,
even if the vendor names differ.
Wrapping up
wealthAPI's use of Astra DB demonstrates how advanced database technology can drive innovation in financial analytics. From precise transaction
clustering to enabling a cutting-edge semantic search engine, Astra DB's vector search and
scalability empower wealthAPI to deliver faster, smarter solutions to its clients. By integrating AI workflows directly into Astra DB's architecture,
wealthAPI has enhanced financial data processing and introduced a valuable new capability for
contract analytics.
By Belkacem Berchiche, Machine Learning Engineer, wealthAPI, and Dieter Flick, Solution Engineer, DataStax.
Learn more about Astra DB and wealthAPI.
Thank you for listening to this HackerNoon story, read by Artificial Intelligence.
Visit hackernoon.com to read, write, learn and publish.