Published Apr 11, 2024

Author’s Note

This piece was originally written in November 2022, before ChatGPT, GPT-4, GPT Plugins, and AutoGPT, so some of the references may be slightly outdated. It was originally circulated internally at Fermat, but we’ve decided to publish it (save the intro and outro paragraphs) because many of the shortcomings highlighted with LLM-forward apps are still prevalent in the many chat-forward apps in use today, and many of the patterns surfaced have yet to be fully explored.

Also: I adapted this piece into a talk! Watch it here.

Some Context

Large Language Models (LLMs or LMs) are an incredibly powerful class of tool, but there are two main issues with how they’re typically used:

  • they’re most often used to generate text for content, and
  • interaction with them is limited by their interface: most commonly a generic text-in, text-out box, such as GPT-3’s playground.

There are plenty of powerful and useful apps that employ both of these strategies, but there are inherent cognitive limitations to working with language models in a one-dimensional, linear interface like a text editor or playground. As for the first point, that’s a more philosophical question that we’ll save for a later post, but in the meantime Linus’s thoughts on the matter very much mirror our own.

Spatial Queries x LLMs: Spatial Awareness

Some of the limitations mentioned above can be addressed when you bring language model interaction out of the playground and into a more dynamic 2D environment. Clicking buttons can be a little clunky, but something humans love to do, both online and off, is move and organize things. Whether it be clicking and dragging, grouping, or rearranging things, this behavior is ingrained into how we work and think. The concept itself is a tool for thought; we manipulate the physical or digital world to encode information in its very state, like alphabetized books or a browser window of tabs you have open to “read later”. We naturally do this when working in spatial canvases too, and we realized that yes-anding this behavior by allowing it to interact with language model operations is key to building truly special interactions.

To do that we use spatial querying: the idea that, within a 2D (or 3D) space, elements can get information from other elements at specific points or areas within that space. When you combine spatial querying with language models, you start to get some cool interactions. Tools that combine spatial querying with the pseudo-cognitive abilities of language models are what we call “spatially aware”.
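In code, the core of spatial querying is tiny: a region asks the canvas for the elements that fall inside it. Here’s a minimal Python sketch; the `Block` and `query_radius` names are ours, purely for illustration, not Fermat’s actual implementation:

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A canvas element with a position and some text content."""
    x: float
    y: float
    text: str


def query_radius(blocks, cx, cy, radius):
    """Return every block whose center lies within `radius` of (cx, cy)."""
    return [
        b for b in blocks
        if (b.x - cx) ** 2 + (b.y - cy) ** 2 <= radius ** 2
    ]


canvas = [Block(0, 0, "panther"), Block(3, 4, "gazelle"), Block(40, 40, "invoice")]
nearby = query_radius(canvas, 0, 0, 10)  # picks up the two animal blocks only
```

Everything that follows is a variation on this query: a block repeatedly asks “what’s near me right now?” and feeds the answer to a language model.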

Here’s an example:

Here’s another:

Here, the “sentiment analyzer” block queries the space around it, performs a basic sentiment analysis operation using GPT-3 on any text block that enters its area of effect, and colors that block depending on its sentiment. What we’re focusing on here is not the specific operation (sentiment analysis), but the interaction the user has with this system. What this creates is a user-manipulable zone of sentiment analysis, so to speak, around the block itself, which can be moved freely around the document, painting other elements accordingly. This gives the user the power to perform language model operations around their canvas just by moving or organizing the information they have stored there, making it a much more natural part of their workflow and cognitive processes.
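A rough sketch of those mechanics, with the GPT-3 call replaced by a trivial keyword stub (a real version would prompt the model with the block’s text; every name here is illustrative, not our production code):

```python
from dataclasses import dataclass


@dataclass
class Block:
    x: float
    y: float
    text: str
    color: str = "none"


def classify_sentiment(text):
    # Stand-in for the GPT-3 call; a real version would ask the model
    # whether the text's sentiment is positive or negative.
    positive_words = ("love", "great", "wonderful")
    return "positive" if any(w in text.lower() for w in positive_words) else "negative"


class SentimentZone:
    """A movable area of effect that paints blocks by their sentiment."""
    COLORS = {"positive": "green", "negative": "red"}

    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius

    def paint(self, blocks):
        # Only blocks inside the zone's area of effect get analyzed and colored.
        for b in blocks:
            if (b.x - self.x) ** 2 + (b.y - self.y) ** 2 <= self.radius ** 2:
                b.color = self.COLORS[classify_sentiment(b.text)]


blocks = [
    Block(1, 1, "I love this plan"),
    Block(2, 0, "this is a disaster"),
    Block(50, 50, "far away, untouched"),
]
zone = SentimentZone(0, 0, radius=5)
zone.paint(blocks)
```

Dragging the zone (changing `zone.x`/`zone.y`) and repainting is what produces the “movable zone of sentiment analysis” effect described above.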

Here’s another example of this kind of tool. We call it the vibe clusterer:

This interaction is similar, but it acts on all the pieces of text around it, in this case finding a single word that can describe all of them. An interesting way of thinking about this is that there is a hidden layer beneath the canvas, populated at each point with words that represent the “common vibe” of the concepts on the visible layer. So, if there are two blocks near each other in the canvas labeled “panther” and “gazelle”, then around that point the hidden layer is populated by words such as “animal”, “mammal”, or “fast”. The “vibe clusterer” block then makes that hidden layer visible: it drags that language up into the main canvas for you to see. By moving the main block around, you’re moving where you’re inspecting the hidden layer, and by moving the text blocks, you’re changing the actual makeup of the hidden layer. For example, if you then drag a “racecar” block near “panther” and “gazelle”, the hidden layer at that location shifts toward things like “fast”, “speed”, or “quick”. And before we forget why we’re here: it’s the analysis power of the language model that populates this hidden semantic layer, doing the work of finding similar vibes.
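The clusterer itself is just the spatial query plus one model call. In this sketch, `common_vibe` uses a couple of canned answers to stand in for GPT-3 (a real version would prompt the model with the nearby blocks’ text; the names are hypothetical):

```python
def common_vibe(texts):
    # Stand-in for the language model. A real version would ask GPT-3:
    # "What single word describes all of these concepts?"
    concepts = set(texts)
    if "racecar" in concepts:
        return "fast"
    if concepts and concepts <= {"panther", "gazelle", "cheetah"}:
        return "animal"
    return "misc"


class VibeClusterer:
    """Surfaces the 'hidden layer' at its own position on the canvas."""

    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius

    def inspect(self, blocks):
        # Query the space around the clusterer, then cluster what's there.
        nearby = [
            text for (bx, by, text) in blocks
            if (bx - self.x) ** 2 + (by - self.y) ** 2 <= self.radius ** 2
        ]
        return common_vibe(nearby)


blocks = [(0, 0, "panther"), (1, 1, "gazelle")]
clusterer = VibeClusterer(0, 0, radius=5)
first = clusterer.inspect(blocks)   # "animal"
blocks.append((2, 0, "racecar"))    # dragging a new block into range...
second = clusterer.inspect(blocks)  # ...changes the hidden layer: "fast"
```

Note that both moves described above map directly onto this sketch: moving the clusterer changes which blocks `inspect` sees; moving the text blocks changes what `common_vibe` is fed.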

This “vibe clusterer” is one of the simplest implementations of this idea of spatial awareness, but we’ve already entered territory that absolutely would not be possible were we using a linear, text-forward interface. The combined power of language models and spatial canvases is really quite cool, and we’re honestly just scratching the surface. On we go.


This idea may seem a little ephemeral, so let’s talk about something more concrete. Something else you can do with these spatially aware zones is place them at different spots in your workspace, designating each as a place where such-and-such action happens. By combining this with language model operations that are much more useful than sentiment analysis or vibe clustering, you can build a hyper-personalized workspace populated by different tools that cater to the exact tasks you’re trying to accomplish. Here’s an example of a tool that compares different pieces of writing.

This tool also yes-ands our natural tendency to group and organize things. Imagine you have a pile of abstracts for papers you’re meaning to read, or you’re in a multiplayer document where everyone proposes a potential solution to a problem: you’d naturally place all of those items together. Using spatially aware tools, you can have the document perform cogent analysis on your abstracts or solutions, for example finding common themes across all the papers you’re meaning to read, or spotting a solution area your team may have missed. These operations don’t require any extra cognitive effort from you or your team, and their benefits are made almost instantly available.
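The “designated places” idea can be sketched as named zones, each bound to a different operation. The LLM operations are stubbed out with placeholder lambdas here, and all names are hypothetical, not Fermat’s API:

```python
class ToolZone:
    """A region of the workspace bound to one language-model operation."""

    def __init__(self, name, x, y, radius, operation):
        self.name = name
        self.x, self.y, self.radius = x, y, radius
        self.operation = operation  # callable: list of texts -> result

    def run(self, blocks):
        # Gather whatever the user has dragged into this zone, then operate on it.
        inside = [
            text for (bx, by, text) in blocks
            if (bx - self.x) ** 2 + (by - self.y) ** 2 <= self.radius ** 2
        ]
        return self.operation(inside)


# Stand-ins for LLM operations; real versions would each prompt the model.
find_common_themes = lambda texts: "themes(%d abstracts)" % len(texts)
find_missing_angle = lambda texts: "gap-analysis(%d solutions)" % len(texts)

workspace = [
    ToolZone("reading pile", 0, 0, 10, find_common_themes),
    ToolZone("brainstorm corner", 100, 100, 10, find_missing_angle),
]

blocks = [(1, 1, "abstract A"), (2, 2, "abstract B"), (101, 99, "solution 1")]
results = {zone.name: zone.run(blocks) for zone in workspace}
```

Dropping an abstract into the reading pile or a proposal into the brainstorm corner is the entire “UI” of this workspace: the act of organizing is the act of invoking the tool.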

Implicit Reaction

Another behavior we can augment in this way is writing itself. Displaying live AI-generated summaries of your paragraphs in a sidebar can potentially improve the writing experience, allowing people to better revise and understand their writing. People also view these summaries as an outside perspective on their own work, a notable cognitive framing: it means AI-generated critiques can, if done well, let writers develop their writing as if advised by another person. We built a live AI writing augmentation tool that lets you use different AI techniques to augment your writing while you write.
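One way such a sidebar might stay “live” without hitting the model on every keystroke is to cache summaries by paragraph hash and recompute only the paragraphs that changed. This is a sketch of that idea, not our actual implementation; the `summarize` stub stands in for the real LLM call:

```python
import hashlib


def summarize(paragraph):
    # Stand-in for the LLM summary call; here we just take the first sentence.
    return paragraph.split(".")[0] + "."


class LiveSummarySidebar:
    """Keeps one summary per paragraph, recomputing only what changed."""

    def __init__(self):
        self.cache = {}  # paragraph hash -> summary

    def update(self, paragraphs):
        summaries = []
        for p in paragraphs:
            key = hashlib.sha256(p.encode()).hexdigest()
            if key not in self.cache:
                # Only new or edited paragraphs would hit the model.
                self.cache[key] = summarize(p)
            summaries.append(self.cache[key])
        return summaries


sidebar = LiveSummarySidebar()
draft = ["Dogs are loyal companions. They have lived with humans for millennia."]
current = sidebar.update(draft)
```

Calling `update` again with an unchanged draft returns the cached summaries, so the sidebar can refresh as fast as you type.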

Because they’re natively built in Fermat’s spatial canvas, these tools are also drag-and-drop. This means you can have your tools laid out in front of you, build your augmentation suite spatially instead of selecting from a menu, and customize it depending on what and how you want to write. This interaction mode, which lowers the barrier to switching tools around, encourages playing with the tools at your disposal. It also helps you understand the capabilities of the language model you’re using: as LLM adoption in different contexts grows, it becomes increasingly important to understand what these models offer and where they are limited, and you do that through experimentation. There is a much higher ceiling to our achievement when we use these incredibly powerful AI tools to augment our ability to think, not just our ability to produce content, and using a spatial canvas as the interaction substrate multiplies their usefulness massively. This is a first foray into what an actual tool, or even app, that uses these strategies could look like, and it’s very exciting to see how that could play out.

Thanks for Reading!

Language models are an incredibly powerful tool, but they are just that: a tool. They are not the end-all be-all of “AI”, nor will they replace us as thinkers or builders. Unfortunately, we’ve already started to hamper our ability to use these tools to become better thinkers and builders by interacting with language models only through the reducing valve of claustrophobic, text-only interfaces, instead of incorporating their pseudo-cognitive abilities into our workflows in ways that can augment our intelligence and our ability to interact with abstract concepts.
