With all of the progress in the world of AI over the past year thanks to LLMs, it's no longer completely crazy to believe that AI will soon coordinate a large swath of our day-to-day computing lives. If this future is to become a reality, these LLMs must transition from being used as glorified API-endpoints, to instead become fully integrated AI Agents in the modern computing stack.
The Shinkai team holds the strong opinion that achieving this lofty goal requires building out an entire personal-server-based AI OS; however, the hurdles ahead are shared by all AI projects tackling these same problems.
Though there are many hurdles, the first major unsolved problem is quite simple to understand, yet as most engineering problems go, non-trivial to solve. How will our fully-integrated AI Agents be powerful enough to seamlessly interact with and fetch any data across the entire internet/user devices & apps/social media/services/blockchains/etc?
As some of you may know, the core primitive available for AI Agents to access and find needed data is Vector Search, however the current paradigm of Vector Search is stuck in last-gen AI tech; Vector Databases. These databases may seem powerful at first glance, yet they have no composability, offer clunky interfaces, limited feature-sets, and fail to map onto the user-facing computing experience. No average user today interacts with DBs themselves, but everybody, even on mobile, interfaces with file systems (ex. something as basic as downloading a pdf or picture from the internet).
Moving Past Vector Databases
Databases are a great backend technology, but they were never meant to become the one and only data model used in computing. No one in their right mind would send a Postgres snapshot to an end user requesting their personal data from a service, yet the current paradigm of Vector DBs would inevitably make this the only solution.
If we take the AI-coordinated future premise seriously, we need to honestly consider how we can proliferate Vector Search throughout the entire internet to empower our AI Agents as first-class actors that can do everything (and much more) that you and I do today.
Take Github for example. Do we honestly believe that in the near future Github will natively support uploading Vector DB snapshots alongside your repo, with seamless diffs fully integrated, a full DB browser, and 20 other helpful features that integrate AI Agents fully? Of course not. With countless varied and unstandardized Vector DB implementations out there and no clean path forward for git integration (due to starkly conflicting data models), this use case, among many others, highlights that DBs are great for backends but fail as a universal data format.
Instead of clunky Vector DBs, we need a primitive that can just as easily be committed to Github as it can be sent over a p2p network between AI Agents, or delivered from one phone to the next via DM between users. Just like how we send a document or photo over Slack, we should be able to send Vector Searchable files just the same.
Wrote a lengthy business or research document that you want to share across your company and have your employees actually read through/get the gist of? Wouldn’t it be great if you could just send an email or Slack message with a file attached that seamlessly allowed everyone to import and use with their own AI Agents with 0 processing/wait time?
None of this is possible when we are stuck in the paradigm of databases, which is why we have invested significant R&D into building and pioneering a novel solution that we believe will push the space forward for all projects: Vector Resources.
Breaking Down Vector Resources
In short: Vector Resources are built to be the portable file format of the AI era. Just like the standard files on your existing PC, Vector Resources can be stored in a file with a `.vrkai` extension, while seamlessly providing all the Vector Search capabilities your AI Agents need (with even more advanced functionality when paired together with Shinkai's VectorFS, which we will touch on in the future).
Internally, Vector Resources are made up of a set of nodes and embeddings. For each node there is always a matching embedding (with a matching id); meaning any data stored within a node can always be found via performing a Vector Search using the embeddings.
Each node holds one of the following types of content:
- Text
- External Content Reference
- Another Vector Resource
- A Vector Resource Header
Let’s break each of these types down with a couple of examples.
In the case of processing a simple `.txt` file into a Vector Resource, the whole text is broken down into chunks (each capped at a maximum size), where each chunk is stored in a text-holding node with an embedding generated for it. Thus this simple Vector Resource will include only text nodes and nothing else.
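As a rough illustration, this chunking flow might be sketched as follows in Python. All names here are hypothetical, and `embed` is a stand-in for a real embedding model; this is not Shinkai's actual implementation.

```python
from dataclasses import dataclass

MAX_CHUNK_SIZE = 400  # assumed cap on characters per chunk

def embed(text: str) -> list[float]:
    # Placeholder: a real implementation would call an embedding model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

@dataclass
class TextNode:
    id: int
    text: str

@dataclass
class Embedding:
    id: int  # always matches the id of its node
    vector: list[float]

def process_txt(raw: str) -> tuple[list[TextNode], list[Embedding]]:
    # Break the text into capped-size chunks, one text node per chunk,
    # with a matching embedding (same id) generated for each node.
    chunks = [raw[i:i + MAX_CHUNK_SIZE] for i in range(0, len(raw), MAX_CHUNK_SIZE)]
    nodes = [TextNode(id=i, text=c) for i, c in enumerate(chunks)]
    embeddings = [Embedding(id=n.id, vector=embed(n.text)) for n in nodes]
    return nodes, embeddings
```

Because every node id has a matching embedding id, any data stored in a node can always be found via Vector Search over the embeddings.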
However, as you know, most data on the internet is a lot more complicated and structured than a basic `.txt` file. This is where the other node types come into play.
Let’s say we are now processing a Wikipedia page into a Vector Resource. These pages are split up into sections and subsections which hold both text and images. When processing this kind of data, we will have a single Vector Resource for the whole page, however each section and subsection will also be another Vector Resource which is held inside of a node (at the right level of depth). This hierarchy of Resources inside of Resources provides us with seamless mapping of nested data which is extremely common in all kinds of content and use cases.
Furthermore, as mentioned, Wikipedia pages also contain non-text content in the form of images. Such content need not be included in the Vector Resource itself; instead, by using OCR to generate a description of the image (or transcripts for videos), we can generate a text embedding and add an External Content node (with relevant data such as file name, URL, file type, etc.) that integrates it seamlessly into the Resource. Any time a Vector Search returns the External Content node, the application can simply fetch the image and provide it to the frontend (or whatever the use case may be).
Lastly, nodes can also hold Vector Resource Headers. These Headers contain metadata about an existing Vector Resource, including its Embedding, thereby allowing the Header to be used in Vector Searches as well. In other words, the Header is a form of reference/pointer, which is primarily used in more complex low-level architecture such as in Shinkai’s VectorFS.
As such, with these 4 types of nodes, we can seamlessly support the full gamut of content types inside of Vector Resources, all behind a unified Vector Search interface (and thus plug into your AI Agents instantly).
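The four node content variants described above can be sketched as a simple data model. The names and fields here are illustrative assumptions, not Shinkai's actual API:

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class Text:
    content: str

@dataclass
class ExternalContentRef:
    # Reference to content kept outside the Resource (e.g. an image),
    # searchable via an embedding of its OCR description or transcript.
    file_name: str
    url: str
    file_type: str
    description: str

@dataclass
class VectorResourceHeader:
    # A pointer to an existing Vector Resource, carrying its embedding
    # so the reference itself participates in Vector Searches.
    resource_id: str
    embedding: list[float]

@dataclass
class VectorResource:
    name: str
    nodes: list["Node"] = field(default_factory=list)

# Each node holds exactly one of the four content types, plus an id
# that matches the id of its embedding.
@dataclass
class Node:
    id: int
    content: Union[Text, ExternalContentRef, VectorResource, VectorResourceHeader]
```

Nesting a `VectorResource` inside a `Node` is what gives the Wikipedia example its section/subsection hierarchy.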
Types Of Vector Resources
Because Vector Resources are designed to be the portable file format of the AI era, we not only have multiple types of nodes, but in fact multiple types of Vector Resources as well.
Coming from a basic Vector DB approach, this may seem counterintuitive, since classically one simply dumps all embeddings into fungible collections. However, if we expect all content on computers to eventually become Vector Searchable, we must be able to support & turbo-charge the capabilities of AI Agents when working with structured data.
Of course, as stated earlier, no matter the type of Vector Resource, each node has a single matching embedding, and this does not change. However, this restriction does not presuppose any specific internal data model. Thus different types of Vector Resources have different internal structures, which can change both how the nodes are scored when performing Vector Searches (due to the hierarchical nature of Vector Resources) and the interface for directly reading and writing to the Vector Resource (sometimes even offering extra types of Vector Search).
To give a better idea of why this is needed in practice, let’s take a look at the types of Vector Resources currently available:
- Document Vector Resources
- Map Vector Resources
- Code Vector Resources (Coming soon)
Document Vector Resources are built for consuming data which is sequentially ordered (such as documents, books, web content, etc). The ids of nodes are guaranteed to be sequential (integer based), and Document Vector Resources provide a push/pop interface that guarantees this sequential ordering. This furthermore makes them extremely useful for use cases like constantly updating chat logs from Slack/Discord/etc. where new messages can simply be pushed into the Doc Resource with ordering guaranteed and dealt with behind the scenes.
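A hypothetical sketch of such a push/pop interface over sequentially-ordered nodes might look like this (illustrative names only, not Shinkai's actual API):

```python
class DocumentVectorResource:
    """Holds nodes with guaranteed sequential integer ids."""

    def __init__(self):
        self.nodes: list[tuple[int, str]] = []  # (id, text), ids sequential

    def push(self, text: str) -> int:
        # The next id is always the current length, so ordering is
        # maintained automatically behind the scenes.
        node_id = len(self.nodes)
        self.nodes.append((node_id, text))
        return node_id

    def pop(self) -> tuple[int, str]:
        # Removing only from the end preserves the sequential guarantee.
        return self.nodes.pop()

# Chat-log style usage: new messages are simply pushed in order.
log = DocumentVectorResource()
log.push("alice: hi")
log.push("bob: hello")
```

In a full implementation, each `push` would also generate and store the matching embedding for the new node.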
Because nodes in Document Resources are guaranteed to be sequentially ordered, this constraint even allows us to implement specific alternative types of Vector Search. For example, Doc Resources implement a Proximity Vector Search, which finds the single most similar node and fetches the N nodes before/after it to return as the final result. In use cases where large surrounding context is very important, this can be extremely useful compared to a classical Vector Search. This specific capability is only possible for content types which cleanly map onto a Document Vector Resource, and other types of Resources have similar unique benefits.
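A minimal sketch of such a Proximity Vector Search: find the best-matching node, then return it together with up to N neighbours on each side. Cosine similarity and the flat embedding list are assumptions for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def proximity_search(query: list[float],
                     embeddings: list[list[float]],
                     n: int) -> list[int]:
    # Index of the single most similar node.
    best = max(range(len(embeddings)), key=lambda i: cosine(query, embeddings[i]))
    # Return the best node plus up to n neighbours on each side,
    # preserving the document's sequential order.
    start = max(0, best - n)
    end = min(len(embeddings), best + n + 1)
    return list(range(start, end))
```

Because node ids are sequential, "the N nodes before/after" is just an index window, which is exactly why this search only works on Document-style Resources.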
Map Vector Resources, on the other hand, are built for data types that need a key-value interface while providing full Vector Search capabilities. Have a table of data, a JSON file, a data structure, or anything else that often needs to be updated based on a specific key/identifier? Map Vector Resources fit in seamlessly for these use cases.
You can even use Map Vector Resources together with classical databases to unlock full Vector Search capabilities without having to switch over to an entire new stack or fit a clunky Vector DB into your existing one. With arbitrarily-deep nesting of Vector Resource nodes, you have a lot of flexibility for plugging in Map Vector Resources into your existing use cases (a lot of internal Shinkai Node architecture uses them, from toolkits to our VectorFS).
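A hypothetical sketch of a Map Vector Resource's key-value interface, where inserting or updating a key also refreshes its embedding so Vector Search stays in sync. As before, `embed` is a placeholder, not a real model:

```python
def embed(text: str) -> list[float]:
    # Placeholder embedding for illustration only.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

class MapVectorResource:
    def __init__(self):
        self._nodes: dict[str, str] = {}
        self._embeddings: dict[str, list[float]] = {}

    def insert(self, key: str, value: str) -> None:
        # Inserting or updating a key re-embeds its value, so searches
        # always reflect the latest data for that key.
        self._nodes[key] = value
        self._embeddings[key] = embed(value)

    def get(self, key: str) -> str:
        return self._nodes[key]

    def remove(self, key: str) -> None:
        del self._nodes[key]
        del self._embeddings[key]
```

Paired with a classical database, the key could simply be the row's primary key, giving existing tables Vector Search without changing the stack.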
The Hierarchical Nature Of Vector Resources
Though the internal data structure of each type of Vector Resource may be unique, the fact remains that Vector Resources natively support deep nesting of Resources inside of Resources. This hierarchical structure is another unique difference compared to classical Vector DBs, with several benefits offered because of it.
Of note, classical Vector DBs were not designed as a universal Vector Search solution for the AI Era, but primarily to deal with last-generation embedding use cases such as recommendation systems and personalized search. For such use cases what mattered was the ability to search through billions/trillions of independent pieces of data as fast as possible assuming little structural relation between the data (with only relations between their embeddings deemed relevant). As such, these DBs implemented great algorithms for making flat search fast at scale, but such efficiency is largely unused in today’s landscape of LLM-powered agents.
What we need today is to naturally encode the structure of the data we are ingesting and have said structure add “weight” to the Vector Search, thereby ensuring the quality of results is high. When a user ingests 20-40 pdfs, a handful of blog posts, transcripts from videos, and some documents into Vector Resources, the number of embeddings we are dealing with is merely in the thousands, which can be very quickly searched through even on a low-powered CPU.
As such, one of Vector Resources’ large focuses is on solving the quality problem, not quantity. The most significant example of how we do this is by taking advantage of the hierarchical nature of content when performing a Vector Search. The search automatically takes into account the weight/score of Resources higher in the hierarchy and averages them together with ones which are deeper. This allows the list of results to be built out of much more "structural context", thereby improving search results without requiring any extra computation time.
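As a simplified illustration of hierarchy-aware scoring, a node's final score can average its own similarity with the scores of the resources above it. The exact averaging scheme here is an assumption, not Shinkai's actual formula:

```python
def hierarchical_score(node_score: float, ancestor_scores: list[float]) -> float:
    # Average the node's own similarity score with the scores of every
    # resource above it in the hierarchy (subsection, section, page, ...),
    # so results carry "structural context".
    return sum([node_score, *ancestor_scores]) / (1 + len(ancestor_scores))
```

A paragraph that matches the query and sits inside a highly relevant section will outrank an equally similar paragraph whose ancestors are irrelevant, at no extra computation cost beyond a few additions per result.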
Vector Resources Vs. Vector DBs In Practice
Now that we've laid out the inner workings of Vector Resources, let's compare the experience of using a classical Vector DB vs. Vector Resources in a real world use case to make the differences more palpable.
To continue with our Wikipedia trend of examples, let's assume a user is working on writing a research paper related to the global effects of the 2007/2008 financial crisis, and is using Vector Search to find vital information on economic effects it had on all Western countries.
Standard Vector DB
- Challenging To Get Started: To get started with a classical Vector DB today, the user is required to write custom code that fetches all of the Wikipedia articles, processes them into chunks, generates embeddings, and finally saves them into the Vector DB. Frameworks exist, but it takes time to plug in all of the pieces.
- Integration Complexity/Limited Content: Content in images or videos will generally be ignored, as there is currently limited support for seamlessly processing them, and no global standard exists for storing them in Vector DBs to enable robust retrieval of the original content.
- Lack Of Structural Hierarchy: All of the text from the dozens of articles the user processed into their DB has equal and non-compounding weighting, which can have large effects on quality. Given a search term of `2008 economy`, a piece of information about novel cuisine in France that came out in 2008 may end up with a similar score to a piece of economic data which happened to only mention `2008` in an earlier paragraph.
- Potential For DB Contamination: If the user by mistake also ingests completely unrelated articles into their DB which happen to have data about 2008, especially with a lack of structural hierarchy, the quality of the results they get back goes down greatly. Furthermore, there is no easy/standard way to know which articles have been ingested into the DB, nor any trivial method to remove all of the data/embeddings of unwanted articles from the DB.
- Lack Of Networking Effects: Once the user ingests all of the data into their own Vector DB on their computer, there is no straightforward path for them to share the data/embeddings with a team/community/ecosystem/as a Github repo. Hosting the DB publicly suddenly requires a domain/server/integrating an authentication flow in addition to a whole host of issues with trying to connect the Vector DB to any modern service like Slack or Discord.
Vector Resources
- Ease Of Use: The Shinkai node is built to support full processing of data into Vector Resources with a single click of a button/API call (with libraries in multiple languages to make things even easier). Furthermore, with cron-job support coming in the near future, users can ask their Shinkai nodes to automatically ingest new content from news sites, social media, and more every X hours.
- Designed For Multiple Content Types: Vector Resources are architected with the reality that non-text content must be accounted for and supported for retrieval when performing Vector Searches. This allows for easy displaying of images/videos when they are part of the top results from a Vector Search.
- Fully Hierarchical Data Model: All Vector Resources are hierarchical, with this hierarchy mapping onto sections/sub-sections in articles. This allows content found in the `2000-2010` subsection of the `Economics` section to be weighted much higher than a section about `French Cuisine`, improving result quality.
- Strong Data Transparency: Another great side effect of our hierarchical data model is that all data you ingest into Vector Resources is labeled and tied together. This means that if the user ever ingests articles unrelated to the core topic of economics, it is as easy as checking a box or deleting a file on one's computer to temporarily exclude it from the search or get rid of it completely.
- Empowers Networking Effects In AI: Vector Resources are a fully portable file format that can be shared over Slack, Discord, or even sent from one AI Agent to another across the internet. After publishing their paper, the user can easily release a Github repository with their Vector Resources allowing anyone to download and get access to the same data & Vector Search capabilities in a matter of seconds. No messy DB snapshots or hooking up arcane libraries, just instantly usable by both humans and their AI Agents.
As we develop Shinkai with all of the architecture required of an open AI OS, we’re hyper-focused on building solutions that go after the root of problems. If we honestly believe that there is a foundational paradigm-shift ahead of us, we need to start from square-one, and look at every piece of the puzzle of the modern computing experience.
In this direction, Vector Resources are the modern, forward-thinking “AI replacement” for classical files; however, files in and of themselves are not enough. Files must exist within a file system, which is why we are building the very first VectorFS that natively embeds Vector Search as a core primitive, making every single piece of user data stored in the Shinkai Node fully vector-searchable at a moment’s notice (and thereby fully accessible to your AI Agents).
Going further, merely storing files on your local PC/node is great, but the internet was born out of people’s desire to share their files/data with others. As such, we need a decentralized network that seamlessly allows users to share all of the data in their VectorFS with others, while also empowering their AI Agents to message and interact with other Agents via cryptographic blockchain-based identities (thereby not being bound to any centralized authority).
After messaging/file sharing became the norm on the internet, soon after payment rails and financial transactions came into the picture with incredible effect. Just the same, any AI OS worth its salt must natively architect a payment solution that securely and privately unlocks both the existing financial stack and the world of crypto to our AI Agents.
Each of these steps requires building one foundational block on top of another, just like how the internet itself was built into what we know today. Shinkai’s mission is to ambitiously tackle all of these problems step-by-step as well, thereby creating an open source ecosystem that will outlast us and provide the much-needed robust alternative to large tech Megacorp AI.
If AI will change computing as we believe, then it is our imperative to create a future where everyone has the opportunity to take part and reap the fruits of this new era.