Learn more about how we built the Raycast API and how it works under the hood
Since releasing our API, lots of developers have asked how it all works under the hood. It’s a great question, if a little knotty, especially because we’ve intentionally tried to hide when users interact with extensions in Raycast.
Although users do technically install an “extension” through our store, it should never feel like they are. One of our early design goals was to avoid second-class mini-apps that show up in some frame inside Raycast, where developers pick technology X and user interface paradigm Y, then ship it. Sorry if that’s disappointing to any webview fans out there!
Why’s that our philosophy? Well, Raycast is a fully native macOS app and we treat extensions as first-class citizens. To us that means consistently rendering a native user interface with our customizations. So users feel at home as soon as they open a command powered by an extension.
But you typically develop native UI in Swift and compile code in Xcode, so how does the entire system work? Instead of diagrams and code, we’ll use “tech storytelling” to dive into our journey: our current solution, the decisions and tradeoffs we made, and the concrete technology and architecture we’ve been using.
After the public release of Raycast in October 2020, we started exploring how we could create an API for developers. Our goal was for them to have the freedom to customize Raycast and create shareable extensions that other users could install, in a similar way to an app store.
This was important for our success because we couldn’t build integrations for the vast tooling landscape all by ourselves. Sidenote: if you’ve ever tried to build an extension for a more complex tool such as Atlassian’s Jira, you might feel us 🥲 We wanted to create a community with collective creativity, where people could build cool and useful things that we wouldn’t even think of.
You can launch commands to run some business logic, often presenting a user interface in Raycast. Inside a command, you can perform “actions” via our central “action panel” that’s available everywhere in the app. There’s no limit on how extensive a command can be. If you look at some of our extensions in the store, they could qualify as apps in their own right, running inside Raycast and building on the UX toolbox we use throughout.
At first we thought there should only be a single command per installable bundle or package. We had no idea people would push the platform and build uber-extensions such as GitLab or Supernova. So it took us a while to arrive at the conceptual model of having an “extension” that exposes one or more “commands” (yes, naming is hard 😅).
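That conceptual model of one extension exposing multiple commands is reflected directly in an extension’s manifest, which doubles as its npm package.json. A rough sketch with hypothetical values (field names approximate the real manifest; the official docs have the authoritative schema):

```json
{
  "name": "my-extension",
  "title": "My Extension",
  "description": "One extension exposing several commands",
  "icon": "icon.png",
  "author": "jane",
  "license": "MIT",
  "commands": [
    { "name": "search-items", "title": "Search Items", "mode": "view" },
    { "name": "refresh-cache", "title": "Refresh Cache", "mode": "no-view" }
  ]
}
```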
Since we’re in the world of desktop apps, we didn’t have a blueprint for an architecture that we could copy and implement. On the other hand, enabling plugin functionality for applications is nothing new, so we started researching the basic tech that would let us dynamically load, run and unload extensions in our desktop client.
Looking at our old comparison table of pros and cons of the various approaches, you’d find tech buzzwords like:
I could go on but you get the idea 😅 Popular extensible apps such as Visual Studio Code seemed worth a closer look because it:
is a desktop app running on macOS
has an extension store and developer community
is open source.
Problem solved? Unfortunately, not quite. That’s because Visual Studio Code is an Electron app and there’s a Node runtime running. This might trigger a sense of unease in some macOS developers who’re used to paying for polished native macOS apps and can spot an Electron app a hundred miles away.
We didn’t know. Nobody knows…
Having sorted the challenge of running user-provided code dynamically and performing native functionality when the extension calls for it, we looked into how to render a user interface. A flurry of questions came to mind. Do we use an imperative approach like old-school UI toolkits such as Java Swing and AppKit? Or maybe something more declarative, after the industry’s big aha moment that v = f(s)? In other words: the view is a function of the application’s state, which is to say React. And if we use a declarative approach, shouldn’t we also use SwiftUI in our native code base?
We started a series of explorations on how to define the user interface and efficiently render it using our custom UI components built with AppKit. We chose not to use SwiftUI as we weren’t confident in it not increasing our crash rate or degrading performance. So AppKit, the world of UI lifecycle methods and imperative UI code would be our native base.
In the extension, we defined the components using a JSON model that formed a tree of components to be rendered. We translated this tree to native view models to render using our components. That worked.
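As a rough sketch of the idea (the types and field names below are ours for illustration, not Raycast’s actual wire format), a serialized component tree might look like this, with native land walking it to build one view model per node:

```typescript
// Hypothetical shape of a serialized component tree (not Raycast's real wire format).
type ComponentNode = {
  type: string;                   // e.g. "List", "List.Item"
  props: Record<string, unknown>; // serializable props only
  children: ComponentNode[];
};

const tree: ComponentNode = {
  type: "List",
  props: { isLoading: false },
  children: [
    { type: "List.Item", props: { title: "First" }, children: [] },
    { type: "List.Item", props: { title: "Second" }, children: [] },
  ],
};

// Native land would walk the tree and create one view model per node.
function countNodes(node: ComponentNode): number {
  return 1 + node.children.reduce((sum, child) => sum + countNodes(child), 0);
}

console.log(countNodes(tree)); // → 3
```

Because everything in the tree is plain serializable data, it can cross the process boundary as JSON without any special handling.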
But in any user interface you need to manage state at some point. This is where things started to get less straightforward from a usage standpoint, or to put it in new-age terms: the developer experience got worse. So we pulled in a state management library for extensions, since this seemed to be a solved problem.
At this point, we could already communicate between the extension and native components, render a basic description of components and had some state management in place that would update, and eventually re-render, the UI.
“Test early and iterate” they say, so we did. We created a small community of users who could play around with our Alpha version. They started building basic extensions - win. But three things quickly became clear:
Developers weren’t delighted that they couldn’t just use all of their preferred npm packages.
UI and state management weren’t as easy to understand as we thought they were. Functional reactive approaches were lurking around the corner and “React” was mentioned more often.
Developers needed more APIs than we exposed. Which leads us to polyfilling.
“Polyfill” is a fancy term for an adaptor-like pattern that takes something and makes it work with something else. Using polyfills is common in web development because of the need to support older or different JS engines, browsers, ecosystems, and more.
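A classic example of the pattern, to make it concrete (this is a generic illustration, not something from Raycast’s codebase): patch in a standard API only when the runtime doesn’t already provide it.

```typescript
// Minimal polyfill sketch: provide Object.assign on engines that lack it
// (pre-ES2015). Modern runtimes already have it, so this branch is skipped.
if (typeof Object.assign !== "function") {
  Object.defineProperty(Object, "assign", {
    value: function assign(target: object, ...sources: object[]) {
      const to = Object(target) as Record<string, unknown>;
      for (const source of sources) {
        for (const key of Object.keys(Object(source))) {
          to[key] = (Object(source) as Record<string, unknown>)[key];
        }
      }
      return to;
    },
    writable: true,
    configurable: true,
  });
}

console.log(Object.assign({}, { a: 1 }, { b: 2 })); // → { a: 1, b: 2 }
```

The same shape applies to any missing API: detect, then fill in. The problem is that each fill is another small adaptor you now own and maintain.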
If we wanted to let developers use their favorite libraries, we’d need to introduce additional polyfills for compatibility. Not only would providing all possible APIs that developers may need generate an endless work stream (”cool that you’ve exposed performing HTTP requests but now I need sockets and running an OS executable!”); it’d also mean messy polyfilling work and more bugs.
Knowing what you know about software development, you didn’t really think we’d found the silver bullet, did you? 😉 First, how do we get the Node runtime to users? And second, how do we make sure that extensions are not running evil code, now that they have full access to Node’s APIs? To answer the first question: we decided against embedding a massive amount of C++ code into Raycast which would bloat the app size with each release. Instead, we chose to use an external runtime but “manage” it through Raycast.
Managing means we auto-download and install the right Node runtime, ideally without the user even noticing it before the first extension gets opened. And when we launch it from Raycast, we do integrity checking to make sure that Darth Vader hasn’t replaced the binary. This gives us an extra process that we can manage from Raycast and ask it to load an extension package for us (more on that later). Another benefit of this model is that the Node process can crash without crashing Raycast itself (most of the time).
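We don’t know Raycast’s exact mechanism (it lives in native code), but the integrity-checking idea can be sketched like this: hash the runtime binary and compare it against a pinned, known-good digest before launching it. All names and values here are illustrative.

```typescript
import { createHash } from "node:crypto";

// Hypothetical integrity check: compare a file's SHA-256 digest to a pinned value.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

function verifyIntegrity(binary: Buffer, expectedHex: string): boolean {
  return sha256Hex(binary) === expectedHex;
}

const fakeBinary = Buffer.from("node-runtime-bytes");
const pinned = sha256Hex(fakeBinary); // in reality, pinned digests would ship with the app

console.log(verifyIntegrity(fakeBinary, pinned));                    // → true
console.log(verifyIntegrity(Buffer.from("tampered-bytes"), pinned)); // → false
```

A production version would also want a constant-time comparison (e.g. `crypto.timingSafeEqual`) and a signature rather than a bare hash, but the launch-gate idea is the same.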
One option to add constraints to a runtime is sandboxing, which could be loosely described in our context as: our API giving developers options to configure what an extension is allowed to do, with a system trying to enforce those constraints when end users run them.
We considered sandboxing but rejected it after looking at the two main approaches for our runtime model, neither of which was satisfying.
Sandbox at the process level
We’d need to run one Node process per extension, which can quickly get expensive. Also, Apple’s sandboxing tools such as sandbox-exec, which ship with macOS, aren’t supported for third-party development. So we’d need to find another way, like writing our own tool to act as a security proxy for the Node process. Node 19 has an experimental new permissions feature, but this only works well at the process level. Even newer runtimes, such as Deno, need you to configure permissions per process.
Not only does development become more complicated; you also need to show and explain permissions to users. At some point they’re likely to ignore these and other prompts out of uncertainty and fatigue.
Preferably, a rogue extension doesn’t get installed on an end user’s machine at all, so that sandboxing doesn’t have to prevent disaster (but could still fail to do so for the user experience issues mentioned above). Visual Studio Code, for instance, maintains a “kill list” of flagged extensions that would then be automatically uninstalled for users.
So far we’ve been following an approach that requires all extensions to be reviewed and open source. Not a bulletproof solution by any means, but it gives us lots of extra benefits.
Open sourcing and community reviews alone may still not be satisfying for an enterprise with stricter policies, so we’re investigating options for giving teams additional control over which extensions can be installed, how they are updated, and removed if they have been flagged.
Since extension land runs as a child process, it also inherits the environment and sandbox of the Raycast parent process. So if an extension needs access to some items listed in the macOS security preferences, system prompts may appear. Plus, to make sure that Bob’s server management extension didn’t interfere with Alice’s accounting extension, we put another type of technical isolation in place.
A worker can crash or run out of memory; we catch this and show an error screen for the extension but Raycast itself is largely unimpressed by that. Overall, workers have worked quite well for us (apart from a couple of obscure issues with exception and stack trace handling). Despite the isolation, there are still potential risks for extensions to negatively affect Raycast. For example, if a called native land API method is buggy and crashes.
We’re now in the realm of inter-process communication (IPC) and there are a couple of ways to do that. We opted for a communication path that uses streams on the standard file descriptors (in Swift via DispatchIO from the Dispatch framework). This enables two-way communication. So the next question is: what do you actually communicate over those streams? We picked the JSON-RPC protocol because it’s lightweight, simple, and transport-agnostic.
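To make that concrete, here’s a minimal sketch of JSON-RPC 2.0 messages as they might flow over such a stream. The framing (newline-delimited JSON) and the method name are our illustration, though “setClipboard” is one of the registered messages mentioned below:

```typescript
// Sketch: JSON-RPC 2.0 requests over a stream transport.
type RpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
};

let nextId = 0;

// Serialize a request; newline-delimited JSON is one common stream framing.
function request(method: string, params?: unknown): string {
  const msg: RpcRequest = { jsonrpc: "2.0", id: ++nextId, method, params };
  return JSON.stringify(msg) + "\n";
}

function parse(frame: string): RpcRequest {
  return JSON.parse(frame) as RpcRequest;
}

const wire = request("setClipboard", { text: "hello" });
const decoded = parse(wire);
console.log(decoded.method); // → "setClipboard"
```

The `id` lets the receiver correlate responses with requests, which matters once multiple calls are in flight over the same stream.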
Extensions only send registered messages (“render”, “setClipboard”, etc.) to Raycast through the exposed API, meaning arbitrary calling into Raycast code isn’t possible. Each extension runs inside its own Node worker, and we use temporary session IDs to map calls across process boundaries to Raycast’s native representation. We generate the IDs when an extension gets loaded, so we know which one to refer to in native land. This also lets us run multiple extension instances in parallel 💪
Now that we can exchange structured messages between two processes, how do we make sure the message ordering is right and that we don’t run into concurrency issues? In Node land inside the worker we’re single-threaded again, so there’s less potential for message requests mixing in unwanted ways.
In Swift, we use serial queues and buffered streams to make sure messages arrive and leave in order. And from there we can hop on another queue if we want to process a message in the background, or on the main serial queue when we need to update the user interface. Unfortunately this doesn’t fully solve the problem of race conditions that can still happen between the two processes. But we manage it reasonably well with custom logic.
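One part of that ordering story is easy to show in isolation: a stream transport delivers arbitrary byte chunks, so messages must be reassembled into whole frames before processing. A minimal sketch (ours, not Raycast’s code) for newline-delimited frames:

```typescript
// Sketch: reassemble newline-delimited frames from arbitrary stream chunks,
// so messages are processed whole and in order regardless of how the OS
// splits the byte stream.
class FrameBuffer {
  private pending = "";

  // Feed a chunk; returns any complete frames it unlocked, in order.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    this.pending = parts.pop() ?? ""; // keep the trailing partial frame
    return parts.filter((p) => p.length > 0);
  }
}

const buf = new FrameBuffer();
console.log(buf.push('{"id":1}\n{"id')); // → [ '{"id":1}' ]
console.log(buf.push('":2}\n'));         // → [ '{"id":2}' ]
```

The second message only surfaces once its final bytes arrive, which is exactly the guarantee a JSON parser downstream needs.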
Access to Node APIs and the IPC model is also what sets us apart from tech such as React Native. You might think this is a lot of overhead: from an extension to its worker port, to the parent, to the Raycast process via RPC. But it’s all surprisingly fast, even when we factor in the next important part 👇
In their extensions, developers use React to declare the UI through the custom Raycast components that we expose through our API. They can use React hooks for state management, which eventually may cause re-rendering of the user interface and Raycast updating its native UI.
This is all possible through a customizable part of React called the “reconciler”. In simple terms, the reconciler takes all pending changes and translates them to concrete updates of the rendering technology of the target platform. On the web, this is the browser’s DOM. In Raycast, this is AppKit. The React DOM reconciler is what you get for free when you use browser React. But in our case we needed to implement our own, send something over from the Node process to the Raycast process, understand it there and efficiently update the native UI. Eeek!
Getting the reconciler implementation right took us a couple of iterations as documentation back then was “sparse”. Our solution was to create a JSON representation for each component, eventually composing in the reconciler to what we call the “render tree”. Now that we have a complete description of the UI after each render pass, we can compare it to the previous render using a standard called JSON Patch. If there are no patches, it means there are no changes. So native land can just sit there and do nothing because there’s nothing to update 👌
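A much-simplified version of that diffing step can be sketched as follows. This is our own toy diff over plain objects, emitting JSON Patch (RFC 6902) style operations; real implementations also handle arrays, pointer escaping, and more:

```typescript
// Simplified JSON Patch diff over plain objects, to illustrate comparing
// two render trees. Not a full RFC 6902 implementation.
type Patch = { op: "add" | "remove" | "replace"; path: string; value?: unknown };

function diff(before: any, after: any, path = ""): Patch[] {
  if (before === after) return [];
  const isObj = (v: unknown) =>
    typeof v === "object" && v !== null && !Array.isArray(v);
  if (!isObj(before) || !isObj(after)) {
    return [{ op: "replace", path, value: after }];
  }
  const patches: Patch[] = [];
  for (const key of Object.keys(before)) {
    if (!(key in after)) patches.push({ op: "remove", path: `${path}/${key}` });
  }
  for (const key of Object.keys(after)) {
    if (!(key in before)) {
      patches.push({ op: "add", path: `${path}/${key}`, value: after[key] });
    } else {
      patches.push(...diff(before[key], after[key], `${path}/${key}`));
    }
  }
  return patches;
}

const prev = { type: "List", props: { isLoading: true } };
const next = { type: "List", props: { isLoading: false } };
console.log(diff(prev, next));
// → [ { op: "replace", path: "/props/isLoading", value: false } ]
```

An empty patch list means nothing changed, which is the cheap “do nothing” case for native land.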
After the render tree generation, we look at whether we can gain anything by compressing it. We found a threshold where gzip compression would save us more time than the actual compress/uncompress step takes overall. Once we know what to render and what the changes are, we send it to native land, construct lightweight Swift view models and translate patches to bitset types. So at any time we know what has changed and what the data representation of a component is.
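The size trade-off works because render trees are repetitive JSON, which gzips very well, while tiny payloads aren’t worth the round trip through the compressor. A sketch of the threshold idea (the cut-off value here is made up for illustration):

```typescript
import { gzipSync } from "node:zlib";

// Hypothetical threshold: below it, compression overhead outweighs the savings.
const THRESHOLD = 1024; // bytes

function encode(renderTree: unknown): { compressed: boolean; payload: Buffer } {
  const raw = Buffer.from(JSON.stringify(renderTree));
  if (raw.length < THRESHOLD) return { compressed: false, payload: raw };
  return { compressed: true, payload: gzipSync(raw) };
}

// Repetitive JSON, as render trees tend to be, compresses very well.
const bigTree = {
  children: Array.from({ length: 200 }, (_, i) => ({
    type: "List.Item",
    props: { title: `Item ${i}` },
  })),
};

const { compressed, payload } = encode(bigTree);
console.log(compressed);                                      // → true
console.log(payload.length < JSON.stringify(bigTree).length); // → true
```

The real threshold would be tuned empirically, measuring compress/uncompress time against the transfer savings on typical trees.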
Having this view model layer in place is also a way to shield native components from changes of the render tree representation. We pass it all down to custom AppKit components and ask them to update what is needed using standard UI code. This cycle continues for each render pass, starting when React senses something’s changed and needs to be updated. v = f(s) across process boundaries!
All of the above gives an overview of the two-way communication path from extensions to Raycast, and the tech behind it. But there’s more – developers ideally don’t want to press a compile button, wait to see their changes, search for cryptic errors, fix things, and do that all day long. Wait, that sounds like macOS development? 😛
Enter the revived meta buzzword “Developer Experience” (DX). Also known as developers wanting tools that don’t eat up all of their time, so they get to do what they actually want to do. That applies to both the “internal DX” – the experience of developing the tools for other developers. And “external DX”, which are the resulting tools for developers who want to build cool stuff for your platform, without the pain.
Our command line tooling, the ray CLI, is an essential tool for both DX streams. Internally, we use it to develop and build the API itself. And externally, developers use it for creating extensions, debugging and seeing their changes in Raycast.
For our internal DX, we can start the CLI in development mode, watching API source files and then transpiling them into the resource bundle of Raycast (the API is bundled into the app). To make this even easier for developers in our team who aren’t working on the API, we’ve also added a build phase to Xcode. It automatically downloads the compiled CLI and compiles the most recent version of the API when needed, as part of the main app compile step. That way, any developer in the team can launch Raycast in debug mode and run extensions with the latest API version.
We also distribute an internal build of the API, automatically created and published via GitHub CI to an internal package registry. We can then link the package to extensions to test early beta versions of new API features that we haven’t officially published yet.
We wrap the main CLI commands in “npm run” scripts so that developers can start a development session via “npm run dev” in their extension folder. From that point on we watch source files, transpile and “hot deploy” them into Raycast and reload the extension bundle so that people instantly see changes.
There’s no server running in Raycast to detect changes. Instead we rely on communication over app schemes and a good old pid file. The CLI basically talks to Raycast via URLs and Raycast talks to the CLI (for example, to stop a development session) via the process ID file that the CLI creates.
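The two channels can be sketched like this; the file paths and the URL scheme shown are illustrative, not Raycast’s actual ones:

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical locations for a dev session's pid file.
const dir = mkdtempSync(join(tmpdir(), "ray-dev-"));
const pidFile = join(dir, "dev-session.pid");

// CLI → app: the CLI writes its pid when a dev session starts…
writeFileSync(pidFile, String(process.pid));
// …and would open a deep link such as "raycast://…" to notify the app
// (URL shape illustrative).

// App → CLI: the app reads the pid file to find and signal the CLI process,
// e.g. to stop the development session.
const cliPid = Number(readFileSync(pidFile, "utf8"));
console.log(cliPid === process.pid); // → true
```

It’s a deliberately low-tech handshake: no sockets, no server, just the OS primitives both sides already have.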
Raycast also provides a couple of UI tools, such as the “Create Extension” command that scaffolds a new extension based on a template, or “Manage Extensions” to open and uninstall them, all without CLI involvement. We stream logs to the CLI via an OS log stream that we capture in the CLI. We catch errors at various places in the V8 worker and Node process and forward them to Raycast. From there we extract the info and stack trace and try to present it as nicely as we can in a native overlay screen, including some actions to jump to the error in the editor.
We all know situations where dependency X wasn’t compatible with dependency Y and didn’t run on platform Z because of unknown reasons (╯°□°）╯︵ ┻━┻). With Raycast and the API, we have a couple of moving parts that need to be compatible with each other, such as: Raycast, Node, the API, the extension, the CLI. Our goal was to radically simplify this, so ideally versioning shouldn’t be an issue to developers and end users of Raycast.
Since we operate an “app store” for extensions, it made sense for us to fit our versioning model to what app stores typically do: only publish one latest version. Developers don’t need to deal with SemVer for something that end users rarely care about. Consequently, developers don’t need to specify a version in their extension (but they can still do so for informational purposes in their changelog, if they want to).
We also didn’t want to enforce declaring compatibility with an “engine” (the Node runtime) like other extension ecosystems do. Or even worse, to also express the compatibility with Raycast itself (another engine). The only version that developers really need to care about is the API version; this is the package dependency to the API types, distributed through npm. When developers want to use a new API feature, they bump up the API to the package version that includes the types for that new feature.
Raycast then handles the rest: we auto-update Raycast and extensions, while only installing an extension when we know it’s compatible with that Raycast version, based on the exact API version that the extension uses. The Raycast app version needs to be greater than or equal to the API version, otherwise the extension might use an API feature that isn’t available at runtime (because the user hasn’t upgraded Raycast yet).
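The compatibility rule itself boils down to a component-wise version comparison. A minimal sketch (version strings are illustrative):

```typescript
// Sketch of the rule: an extension is installable only if the running app
// version is >= the API version the extension was built against.
function parseVersion(v: string): number[] {
  return v.split(".").map((part) => Number(part));
}

function isCompatible(appVersion: string, apiVersion: string): boolean {
  const app = parseVersion(appVersion);
  const api = parseVersion(apiVersion);
  for (let i = 0; i < Math.max(app.length, api.length); i++) {
    const a = app[i] ?? 0;
    const b = api[i] ?? 0;
    if (a !== b) return a > b; // first differing component decides
  }
  return true; // equal versions are compatible
}

console.log(isCompatible("1.48.0", "1.47.0")); // → true
console.log(isCompatible("1.46.0", "1.47.0")); // → false
```

Because the app, API, and CLI share one version number (as described below in the release model), this single comparison is all the compatibility logic that’s needed.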
Raycast, in turn, includes the API and React, manages the Node runtime, and auto-installs the CLI when developing an extension. The important glue is to synchronize all of our version numbers. So the Raycast app version is the same as that of the API and CLI and they all get released together. Matching version numbers makes reasoning about what is compatible with what much simpler.
We want to stay backwards-compatible as much as possible, and our evolution and release cycle reflects that.
Creating proposals is important for us to ensure consistency across our API and arrive at good solutions. Occasionally we need to deprecate an old API; for those rare situations, we usually create an automatic migration through code mods, so that updating an extension to changed types or method signatures is easy. We update the docs with each release, using CI workflows that publish them to our public extensions repo and incorporate community contributions that we sync back to our internal repo before each release.
For developers to get their extension into the store and share it with others, we piggyback on GitHub’s infrastructure – an open monorepository and pull request workflow for reviews. When someone creates the pull request, we run a couple of automated checks with manifest checking, linting, asset checks for the store, and so on. We also review and test extensions and try to give meaningful feedback along the process.
Once published, we send automatic notifications to a Slack channel and developers often announce their new creation in the Slack community too. Overall, this model has worked really well so far, with people already creating hundreds of extensions. You might wonder how a monorepository and manual reviews of each extension scale. But “do things that don’t scale first”, they say, right? Recently, we’ve introduced new Raycast and CLI commands that make forking an extension and contributing much more straightforward, without forcing you to clone the entire monorepo; the CLI performs the right Git(Hub) incantations behind the scenes.
Building the extension ecosystem after we’d released Raycast with a couple of built-in extensions had an interesting side effect. Developers were wondering how their own extension could make use of the same features and UI components that some of the integrated native extensions were using.
This has pushed us to extend the API with those capabilities, but also to port extensions formerly written in Swift and part of the Raycast codebase to open-source extensions (examples are GitHub, Linear, Google Workspace, or Zoom). Those extensions have become the baseline and informed what we need to build and expose through the API.
The API needs to evolve in various directions like providing UI components; OS level APIs that would otherwise be hard to implement purely in Node land; and making complex things simpler. As for the latter, we created and have now open-sourced a utilities package that works in tandem with our API. This makes typical tasks like async operations, networking and caching easier and promotes best practices around React.
Another direction we started exploring last year is enabling completely new types of extensions that Raycast didn’t provide before: menu bar commands. You get the same development model through React and hot reloading. And the end product is a native macOS menu bar app that end users can install, activate and use for their individual workflows.
Our recent release of Raycast’s AI features has also opened up new possibilities for developers since we’ve exposed an API that lets you create custom AI extensions. And finally, we’ve been looking into ways to give developers more insights into their extensions’ exceptions and analytics through a “developer hub” that we’re currently evaluating.
Thanks for sticking with us! If you’re a developer, please keep building cool extensions (and maybe start replacing those pesky internal automation scripts in your organization with an extension UI). And if you’re not (yet), why not start learning how to build your first extension?