How the Raycast API and extensions work

Learn more about how we built the Raycast API and how it works under the hood

Written by Felix Raab

Since releasing our API, lots of developers have asked how it all works under the hood. It’s a great question, if a little knotty, especially because we’ve intentionally tried to make it invisible when users interact with extensions in Raycast.

Although users do technically install an “extension” through our store, it should never feel that way. One of our early design goals was to avoid second-class mini-apps that show up in some frame inside Raycast, where developers pick technology X and user interface paradigm Y, then ship it. Sorry if that’s disappointing to any webview fans out there!

Why’s that our philosophy? Well, Raycast is a fully native macOS app and we treat extensions as first-class citizens. To us that means consistently rendering a native user interface with our customizations. So users feel at home as soon as they open a command powered by an extension.

But you typically develop native UI in Swift and compile code in Xcode, so how does the entire system work? Instead of diagrams and code, we’ll use “tech storytelling” to dive into our journey here, including our current solution, the decisions and tradeoffs we made, and the concrete technology and architecture we’ve been using.

Some context from the early days

Soon after launching we found our superpower: creating our community

After the public release of Raycast in October 2020, we started exploring how we could create an API for developers. Our goal was for them to have the freedom to customize Raycast and create shareable extensions that other users could install, in a similar way to an app store.

This was important for our success because we couldn’t build integrations for the vast tooling landscape all by ourselves. Sidenote: if you’ve ever tried to build an extension for a more complex tool such as Atlassian’s Jira, you might feel us 🥲 We wanted to create a community with collective creativity, where people could build cool and useful things that we wouldn’t even think of.

An extension became an umbrella term for one or more “commands”

You can launch commands to run some business logic, often presenting a user interface in Raycast. Inside a command, you can perform “actions” via our central “action panel” that’s available everywhere in the app. There’s no limit on how extensive a command can be. If you look at some of our extensions in the store, they could pass as their own apps running inside Raycast, building on the UX toolbox we use throughout.

At first we thought there should only be a single command per installable bundle or package. We had no idea people would push the platform and build uber-extensions such as GitLab or Supernova. So it took us a while to arrive at the conceptual model of having an “extension” that exposes one or more “commands” (yes, naming is hard 😅).
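To make that conceptual model concrete, here’s a simplified sketch of an extension manifest. The field names mirror the essentials of a Raycast extension’s package.json manifest, but this is a trimmed-down illustration, not the full schema:

```typescript
// Simplified sketch: one extension exposing several commands.
interface CommandManifest {
  name: string;             // command identifier
  title: string;            // shown in Raycast's root search
  mode: "view" | "no-view"; // whether the command renders UI
}

interface ExtensionManifest {
  name: string;
  title: string;
  commands: CommandManifest[];
}

// A hypothetical extension bundling two commands.
const manifest: ExtensionManifest = {
  name: "issue-tracker",
  title: "Issue Tracker",
  commands: [
    { name: "search-issues", title: "Search Issues", mode: "view" },
    { name: "create-issue", title: "Create Issue", mode: "view" },
  ],
};
```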

Thinking through our technology choices

Since we’re in the world of desktop apps, we didn’t have a blueprint for an architecture that we could copy and implement. On the other hand, enabling plugin functionality for applications is nothing new, so we started researching the basic tech that would let us dynamically load, run and unload extensions in our desktop client.

Looking at our old comparison table of pros and cons of the various approaches, you’d find all the tech buzzwords you can imagine – you get the idea 😅 Popular extensible apps such as Visual Studio Code seemed worth a closer look because it:

  1. is a desktop app running on macOS

  2. has an extension store and developer community

  3. is open source.

Problem solved? Unfortunately, not quite. That’s because Visual Studio Code is an Electron app with a Node runtime running. This might trigger a sense of unease in some macOS developers who are used to paying for polished native macOS apps and can spot an Electron app a hundred miles away.

Plus, wasn’t Node a runtime that forced developers to use a programming language that many see as flawed? With a package ecosystem riddled with security problems that at times caused half of the entire web to break? Plus a bunch of other issues, so that its creator went ahead to fix all those mistakes in a shiny new runtime? Perhaps. On the other hand, according to Atwood’s Law, JavaScript is here to stay, other languages compile nicely to JavaScript, and Visual Studio Code does work quite well…

Our first attempt: TypeScript + JSC

We settled on JavaScript as our runtime language and TypeScript as our main extension language. We could have chosen something more niche and simple, like Lua. But a few questions were on our minds:

  1. Would people really build extensions with it?
  2. Would the ecosystem flourish? How easy would it be to pick up a less popular language and convince others to learn it – and could we leverage existing open source projects to build extensions?
  3. Would we end up with a mostly dead ecosystem and a handful of power users who’d later question why they can’t, for example, also use Unreal Engine in their extensions?

We didn’t know. Nobody knows…

Running it

Running JavaScript in a macOS app first leads you to Apple’s official way of executing JS: JavaScriptCore, either via the bundled engine or through WebKit. Extensions should still run isolated. So we figured we could use one JavaScriptCore instance per extension – dynamically load the code, run it, bridge to the native codebase, and have some supervising entity that takes care of loading, unloading and recovering from crashes.

Ideally, we didn’t want the main Raycast process to be seriously affected if those JS engines went rogue. So we thought we should run them out-of-process via XPC, Apple’s own flavor of inter-process communication that uses a binary protocol. And that’s what we did: one Raycast main process, one XPC support process (the extension host) and many JavaScriptCore engines running in that support process. We hacked together a proof-of-concept and it all worked nicely… suspicious.

Next up, we looked into rendering a user interface

Having sorted the challenge of running user-provided code dynamically and performing native functionality when the extension calls into it, we looked into how to render a user interface. A flurry of questions came to mind. Do we use the imperative approach of old-school UI toolkits like Java Swing and AppKit? Or maybe something more declarative, after the industry’s big aha moment that v = f(s)? Basically, the view is a function of the application’s state = React. And if we use a declarative approach, shouldn’t we also use SwiftUI in our native codebase?

We started a series of explorations on how to define the user interface and efficiently render it using our custom UI components built with AppKit. We chose not to use SwiftUI because we weren’t confident it wouldn’t increase our crash rate or degrade performance. So AppKit, the world of UI lifecycle methods and imperative UI code, would be our native base.

In the extension, we defined the components using a JSON model that formed a tree of components to be rendered. We translated this tree to native view models to render using our components. That worked.
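To give a feel for that first approach, here’s a hypothetical, simplified shape for such a component tree and a naive translation step into flat view models (not our actual wire format):

```typescript
// Hypothetical JSON model of a component tree, as an extension might emit it.
interface ComponentNode {
  type: string;                   // e.g. "list", "listItem"
  props: Record<string, unknown>; // component configuration
  children: ComponentNode[];
}

// Native-side view model: a flat, render-ready representation.
interface ViewModel {
  kind: string;
  title: string;
  childCount: number;
}

// Naive translation: walk the tree and map each node to a view model.
function toViewModels(node: ComponentNode, out: ViewModel[] = []): ViewModel[] {
  out.push({
    kind: node.type,
    title: String(node.props["title"] ?? ""),
    childCount: node.children.length,
  });
  for (const child of node.children) toViewModels(child, out);
  return out;
}

const tree: ComponentNode = {
  type: "list",
  props: { title: "Todos" },
  children: [
    { type: "listItem", props: { title: "Buy milk" }, children: [] },
    { type: "listItem", props: { title: "Ship blog post" }, children: [] },
  ],
};
```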

But in any user interface you need to manage state at some point. This is where things started to get less straightforward from a usage point of view, or to put it in new-age terms: the developer experience got worse. So we pulled in a state management library for extensions, since this seemed to be a solved problem.

What we learnt from the Alpha version

At this point, we could already communicate between the extension and native components, render a basic description of components and had some state management in place that would update, and eventually re-render, the UI.

“Test early and iterate” they say, so we did. We created a small community of users who could play around with our Alpha version. They started building basic extensions – win. But three things quickly became clear:

  1. Developers weren’t delighted that they couldn’t just use all of their preferred npm packages.

  2. UI and state management weren’t as easy to understand as we thought they were. Functional reactive approaches were lurking around the corner and “React” was mentioned more often.

  3. Developers needed more APIs than we exposed. Which leads us to polyfilling.

“Polyfill” is a fancy term for an adaptor-like pattern that takes something and makes it work with something else. Using polyfills is common in web development because of the need to support older or different JS engines, browsers, ecosystems, and more.
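A classic example of the pattern (not one of our actual polyfills): back-filling `structuredClone` on an engine that doesn’t provide it.

```typescript
// Polyfill pattern: if the host doesn't provide an API, patch in a
// compatible (if simplified) implementation so library code keeps working.
// JSON round-tripping is a lossy stand-in (no Dates, Maps, or cycles).
if (typeof (globalThis as any).structuredClone !== "function") {
  (globalThis as any).structuredClone = <T>(value: T): T =>
    JSON.parse(JSON.stringify(value));
}

const original = { nested: { count: 1 } };
const copy = (globalThis as any).structuredClone(original) as typeof original;
```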

So how was polyfilling relevant for our API?

JavaScriptCore is a vanilla engine that runs JavaScript code – there’s no access to web APIs or other OS-level features. Our host, Raycast, would need to provide important functionality such as networking or file IO, with flexible APIs for performing a network request or reading a file.

If we wanted to let developers use their favorite libraries, we’d need to introduce additional polyfills for compatibility. Not only would this generate an endless stream of work to provide all the APIs that developers might need (“cool that you’ve exposed performing HTTP requests, but now I need sockets and running an OS executable!”) – it’d also mean messy polyfilling work and more bugs.

Our second attempt (and now business is calling)

Our first attempt took us quite far. We even had a working API for most of the file IO operations that you might need. We interviewed more developers to learn about their experiences using the API, what else they’d need and what they’d do differently. We were exposing a JavaScript/TypeScript API but developers were limited in which libraries they could use. This caused something like cognitive dissonance, or perhaps just wrong expectations and some disappointment. We couldn’t please everyone, right?

But what if the fact that somebody couldn’t pull in, say, Octokit to easily interact with the GitHub API made a developer not adopt our platform? Because we promised JavaScript, and with that, access to a ton of previously published work to build upon. And if developers didn’t use the platform to build extensions, our effort to make Raycast extensible would fail. We didn’t intend to monetize through extensions. But this seemed to be a critical product and business concern, so it pushed us to keep looking for other solutions.

We needed to run React on Node.js and make it interact with Raycast

So if we wanted to: a) use JavaScript/TypeScript, b) not spend the rest of our lives exposing and polyfilling OS-level APIs, c) enable easy access to the entire JS ecosystem and d) meet developers’ expectations for declaring a user interface and managing its state, then we concluded we needed to run React on Node.js and make it interact with Raycast.

No silver bullet

Knowing what you know about software development, you didn’t really think we’d found the silver bullet, did you? 😉 First, how do we get the Node runtime to users? And second, how do we make sure that extensions are not running evil code, now that they have full access to Node’s APIs? To answer the first question: we decided against embedding a massive amount of C++ code into Raycast which would bloat the app size with each release. Instead, we chose to use an external runtime but “manage” it through Raycast.

Managing means we auto-download and install the right Node runtime, ideally without the user even noticing it before the first extension gets opened. And when we launch it from Raycast, we do integrity checking to make sure that Darth Vader hasn’t replaced the binary. This gives us an extra process that we can manage from Raycast and ask it to load an extension package for us (more on that later). Another benefit of this model is that the Node process can crash without crashing Raycast itself (most of the time).
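That integrity check boils down to comparing a checksum of the binary against a known-good value. A minimal sketch in TypeScript – the real check lives in Raycast’s native code, and the helper names here are made up:

```typescript
import { createHash } from "node:crypto";

// SHA-256 digest of some bytes, hex-encoded.
function sha256Hex(bytes: Buffer | string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Compare the hash of the downloaded binary against the checksum we expect
// for that Node version. Refuse to launch it on mismatch.
function verifyIntegrity(binary: Buffer | string, expectedSha256: string): boolean {
  return sha256Hex(binary) === expectedSha256;
}
```

In practice you’d hash the file on disk right before launching it, so Darth Vader has no window to swap the binary after the check.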

Options for sandboxing the runtime

One option to add constraints to a runtime is sandboxing, which could be loosely described in our context as: our API giving developers options to configure what an extension is allowed to do, with a system trying to enforce those constraints when end users run them.

We considered sandboxing but rejected it: for our runtime model there are two main approaches, neither of which was satisfying.

Sandbox at the process level

We’d need to run one Node process per extension, but this can quickly get expensive. Also, Apple’s sandboxing tools like sandbox-exec, which ship with macOS, aren’t supported for third-party development. So we’d need to find another way, like writing our own tool to act as a security proxy for the Node process. Node 19 has an experimental new permissions feature, but this only works at the process level. Even newer runtimes, such as Deno, need you to configure permissions per process.

Sandbox at the JavaScript engine level

Here we’d run all JS code through an additional virtualization layer. But this is known to degrade performance, and the approach is limited. Ironically, using JavaScriptCore as we did in the first attempt we discarded would have given us a sandbox for free. But it would’ve introduced a different set of problems, while potentially putting the entire project at risk.

From sandboxing considerations to open sourcing extensions and reviews

Not only does development become more complicated; you also need to show and explain permissions to users. At some point they’re likely to ignore these and other prompts out of uncertainty and fatigue.

Preferably, a rogue extension doesn’t get installed on an end user’s machine at all, so that sandboxing doesn’t have to prevent disaster (but could still fail to do so because of the user experience issues mentioned above). Visual Studio Code, for instance, maintains a “kill list” of flagged extensions that are then automatically uninstalled for users.

So far we’ve been following an approach that requires all extensions to be reviewed and open source. It’s not a bulletproof solution by any means, but it gives us lots of extra benefits.

Open sourcing and community reviews alone may still not be satisfying for an enterprise with stricter policies, so we’re investigating options for giving teams additional control over which extensions can be installed, how they are updated, and removed if they have been flagged.

One process for Raycast and one for the world of extensions

Since extension land runs as a child process, it also inherits the environment and sandbox of the Raycast parent process. So if an extension needs access to some items listed in the macOS security preferences, system prompts may appear. Plus, to make sure that Bob’s server management extension doesn’t interfere with Alice’s accounting extension, we put another type of technical isolation in place.

Since version 12, Node supports “worker threads”, essentially V8 isolates (V8 is Node’s JavaScript engine). This gives us a separate JavaScript engine instance and run loop, so we get a level of isolation between the extensions loaded into the runtime. We can also set memory limits for an extension’s heap (extensions that get too greedy are stopped without a heads-up), create and destroy workers as needed, and safely communicate back and forth with the Node parent process, and from there with Raycast.
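A minimal sketch of that worker setup in Node – the real extension host does much more, and the heap limit value here is arbitrary:

```typescript
import { Worker } from "node:worker_threads";

// Run a snippet of extension code in its own V8 isolate with a capped heap,
// and resolve with the first message the worker posts back.
function runInWorker(code: string, maxHeapMb = 64): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(code, {
      eval: true, // treat `code` as a script, not a file path
      resourceLimits: { maxOldGenerationSizeMb: maxHeapMb },
    });
    worker.once("message", (msg) => {
      resolve(msg);
      void worker.terminate();
    });
    worker.once("error", reject);
  });
}
```

A worker that exceeds its heap limit is torn down by Node without taking the parent process with it, which is exactly the failure isolation described above.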

A worker can crash or run out of memory; we catch this and show an error screen for the extension but Raycast itself is largely unimpressed by that. Overall, workers have worked quite well for us (apart from a couple of obscure issues with exception and stack trace handling). Despite the isolation, there are still potential risks for extensions to negatively affect Raycast. For example, if a called native land API method is buggy and crashes.

From this Node worker, how do we communicate to Raycast and back?

We’re now in the realm of inter-process communication (IPC), and there are a couple of ways to do it. We opted for a communication path that uses streams on the standard file descriptors (in Swift via DispatchIO). This enables two-way communication. So the next question is: what do you actually communicate over those streams? We picked the JSON-RPC protocol.

Extensions only send registered messages (”render”, “setClipboard”, etc.) to Raycast through the exposed API, meaning arbitrary calls into Raycast code aren’t possible. Messages are sent from inside the Node worker of a specific extension, and we use temporary session IDs to map them across process boundaries to Raycast’s native representation of that extension. We generate the IDs when an extension gets loaded, so we know which one to refer to in native land. This also lets us run multiple extension instances in parallel 💪
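For illustration, a registered message wrapped in a JSON-RPC 2.0 envelope might look like this. The method names and the session-ID field are simplified stand-ins, not our exact wire format:

```typescript
// Minimal JSON-RPC 2.0 request envelope, tagged with the session ID that
// maps the message back to a loaded extension instance in native land.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextId = 0;

function makeRequest(
  sessionId: string,
  method: string,
  params: Record<string, unknown>
): string {
  const request: RpcRequest = {
    jsonrpc: "2.0",
    id: ++nextId,
    method,
    params: { ...params, sessionId },
  };
  return JSON.stringify(request); // one frame on the stdio stream
}
```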

Order and speed matter

Now that we can exchange structured messages between two processes, how do we make sure the message ordering is right and that we don’t run into concurrency issues? In Node land inside the worker we’re single-threaded again, so there’s less potential for message requests mixing in unwanted ways.

In Swift, we use serial queues and buffered streams to make sure messages arrive and leave in order. And from there we can hop on another queue if we want to process a message in the background, or on the main serial queue when we need to update the user interface. Unfortunately this doesn’t fully solve the problem of race conditions that can still happen between the two processes. But we manage it reasonably well with custom logic.
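The Node-side equivalent of such a serial queue can be sketched as a promise chain that guarantees tasks run one at a time, in submission order (a simplified illustration, not our actual implementation):

```typescript
// Serialize async tasks: each task starts only after the previous one
// finished, so messages are processed strictly in enqueue order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: () => void | Promise<void>): Promise<void> {
    // Chain onto the tail; errors still propagate to the caller.
    const run = this.tail.then(() => task());
    // Keep the chain alive even if this task fails.
    this.tail = run.catch(() => {});
    return run;
  }
}
```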

Access to Node APIs and the IPC model is also what sets us apart from tech such as React Native. You might think this is a lot of overhead: from an extension to its worker port, to the parent, to the Raycast process via RPC. But it’s all surprisingly fast, even when we factor in the next important part 👇

Reconciling and creating the render tree

In their extensions, developers use React to declare the UI through the custom Raycast components that we expose through our API. They can use React hooks for state management, which may eventually cause the user interface to re-render and Raycast to update its native UI.

This is all possible through a customizable part of React called the “reconciler”. In simple terms, the reconciler takes all pending changes and translates them to concrete updates of the rendering technology of the target platform. On the web, this is the browser’s DOM. In Raycast, this is AppKit. The React DOM reconciler is what you get for free when you use browser React. But in our case we needed to implement our own, send something over from the Node process to the Raycast process, understand it there and efficiently update the native UI. Eeek!

Getting the reconciler implementation right took us a couple of iterations, as documentation back then was “sparse”. Our solution was to create a JSON representation for each component, which the reconciler eventually composes into what we call the “render tree”. Now that we have a complete description of the UI after each render pass, we can compare it to the previous render using a standard called JSON Patch. If there are no patches, there are no changes, so native land can just sit there and do nothing because there’s nothing to update 👌
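To give a feel for the diffing step, here’s a toy version that compares two flat render-tree descriptions and emits JSON-Patch-style operations. RFC 6902 defines the real format; our reconciler output is much richer than this:

```typescript
// JSON-Patch-like operations over flat key/value render descriptions.
// See RFC 6902 for the full operation set.
type PatchOp =
  | { op: "add" | "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };

function diff(
  prev: Record<string, unknown>,
  next: Record<string, unknown>
): PatchOp[] {
  const ops: PatchOp[] = [];
  for (const key of Object.keys(next)) {
    if (!(key in prev)) {
      ops.push({ op: "add", path: `/${key}`, value: next[key] });
    } else if (prev[key] !== next[key]) {
      ops.push({ op: "replace", path: `/${key}`, value: next[key] });
    }
  }
  for (const key of Object.keys(prev)) {
    if (!(key in next)) ops.push({ op: "remove", path: `/${key}` });
  }
  return ops; // empty array ⇒ nothing changed, native land can idle
}
```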

After generating the render tree, we look at whether we can gain anything by compressing it. We found a threshold above which gzip compression saves us more time overall than the compress/uncompress step costs. Once we know what to render and what the changes are, we send it to native land, construct lightweight Swift view models, and translate patches to bitset types. So at any time we know what has changed and what the data representation of a component is.
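The compression decision can be sketched like this – the threshold constant is made up for illustration; the real one was tuned by measurement:

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Only gzip render trees above a size threshold: for small payloads the
// compress/uncompress round trip costs more than it saves.
const THRESHOLD_BYTES = 4096; // illustrative, not our measured value

function encodeRenderTree(json: string): { compressed: boolean; data: Buffer } {
  const raw = Buffer.from(json, "utf8");
  if (raw.byteLength < THRESHOLD_BYTES) return { compressed: false, data: raw };
  return { compressed: true, data: gzipSync(raw) };
}

function decodeRenderTree(payload: { compressed: boolean; data: Buffer }): string {
  const raw = payload.compressed ? gunzipSync(payload.data) : payload.data;
  return raw.toString("utf8");
}
```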

Having this view model layer in place is also a way to shield native components from changes of the render tree representation. We pass it all down to custom AppKit components and ask them to update what is needed using standard UI code. This cycle continues for each render pass, starting when React senses something’s changed and needs to be updated. v = f(s) across process boundaries!

Even more tech but more than tech: the developer experience

All of the above gives an overview of the two-way communication path from extensions to Raycast, and the tech behind it. But there’s more – developers ideally don’t want to press a compile button, wait to see their changes, search for cryptic errors, fix things, and do that all day long. Wait, that sounds like macOS development? 😛

Enter the revived meta buzzword “Developer Experience” (DX). Also known as developers wanting tools that don’t eat up all of their time, so they get to do what they actually want to do. That applies both to the “internal DX” – the experience of developing the tools for other developers – and to the “external DX” – the resulting tools for developers who want to build cool stuff for your platform, without the pain.

Return of the command line interface

Our command line tooling, the ray CLI, is an essential tool for both DX streams. Internally, we use it to develop and build the API itself. And externally, developers use it for creating extensions, debugging and seeing their changes in Raycast.

But how does it work technically? Our CLI is a Go application that we compile for both Darwin (macOS) and Linux (to run automations in continuous integration via GitHub Actions). We use a framework for processing command line arguments and registering commands that do something useful, such as building an extension. An important part is the integration of esbuild in its library version to transpile TypeScript code into a JavaScript bundle.

For our internal DX, we can start the CLI in development mode, watching API source files and then transpiling them into the resource bundle of Raycast (the API is bundled into the app). To make this even easier for developers in our team who aren’t working on the API, we’ve also added a build phase to Xcode. It automatically downloads the compiled CLI and compiles the most recent version of the API when needed, as part of the main app compile step. That way, any developer in the team can launch Raycast in debug mode and run extensions with the latest API version.

We also distribute an internal build of the API, automatically created and published via GitHub CI to an internal package registry. We can then link the package to extensions to test early beta versions of new API features that we haven’t officially published yet.

External DX comes with a couple of additional features

We wrap the main CLI commands in “npm run” scripts so that developers can start a development session via “npm run dev” in their extension folder. From that point on we watch source files, transpile and “hot deploy” them into Raycast and reload the extension bundle so that people instantly see changes.

There’s no server running in Raycast to detect changes. Instead we rely on communication over custom URL schemes and a good old pid file. The CLI basically talks to Raycast via URLs, and Raycast talks to the CLI (for example, to stop a development session) via the process ID file that the CLI creates.
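Conceptually, the CLI-to-Raycast direction is just building and opening a URL. The scheme and path below are illustrative stand-ins, not the real deep link format:

```typescript
// Build a deep link the CLI could open to tell Raycast to (re)load a
// development extension. Scheme and path are hypothetical.
function buildDevDeepLink(extensionPath: string, action: "start" | "reload"): string {
  const url = new URL(`raycast://development/${action}`);
  url.searchParams.set("path", extensionPath);
  return url.toString();
}
```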

Raycast also provides a couple of UI tools, such as the “Create Extension” command that scaffolds a new extension based on a template, or “Manage Extensions” to open and uninstall them, all without CLI involvement. We stream logs via an OS log stream that the CLI captures. We catch errors at various places in the V8 worker and Node process and forward them to Raycast. From there we extract the info and stack trace and try to present it as nicely as we can in a native overlay screen, including some actions to jump to the error in the editor.

X depends on Y depends on Z – versioning can get complicated

We all know situations where dependency X wasn’t compatible with dependency Y and didn’t run on platform Z for unknown reasons (╯°□°)╯︵ ┻━┻ With Raycast and the API, we have a couple of moving parts that need to be compatible with each other: Raycast, Node, the API, the extension, the CLI. Our goal was to radically simplify this – ideally, versioning shouldn’t be an issue for developers and end users of Raycast.

Since we operate an “app store” for extensions, it made sense for us to fit our versioning model to what app stores typically do: only publish one latest version. Developers don’t need to deal with SemVer for something that end users rarely care about. Consequently, developers don’t need to specify a version in their extension (but they can still do so for informational purposes in their changelog, if they want to).

We also didn’t want to enforce declaring compatibility with an “engine” (the Node runtime) like other extension ecosystems do. Or even worse, to also express the compatibility with Raycast itself (another engine). The only version that developers really need to care about is the API version; this is the package dependency to the API types, distributed through npm. When developers want to use a new API feature, they bump up the API to the package version that includes the types for that new feature.

Raycast then handles the rest: we auto-update Raycast and extensions, while only installing an extension when we know it’s compatible with that Raycast version, depending on the exact API version that this particular extension uses. The Raycast app version needs to be greater than or equal to the API version, otherwise the extension might use an API feature that isn’t available at runtime (because the user hasn’t upgraded Raycast yet).
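The compatibility rule itself is a straightforward numeric comparison of dotted version strings – a simplified sketch, not Raycast’s actual implementation:

```typescript
// True when the app version is greater than or equal to the API version an
// extension was built against, comparing dotted components left to right.
function isCompatible(appVersion: string, apiVersion: string): boolean {
  const app = appVersion.split(".").map(Number);
  const api = apiVersion.split(".").map(Number);
  for (let i = 0; i < Math.max(app.length, api.length); i++) {
    const a = app[i] ?? 0;
    const b = api[i] ?? 0;
    if (a !== b) return a > b;
  }
  return true; // equal versions are compatible
}
```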

Raycast, in turn, includes the API and React, manages the Node runtime, and auto-installs the CLI when developing an extension. The important glue is to synchronize all of our version numbers. So the Raycast app version is the same as that of the API and CLI and they all get released together. Matching version numbers makes reasoning about what is compatible with what much simpler.

API evolution means you can add new features but not remove them

We want to stay backwards-compatible as much as possible. Our evolution and release cycle often goes like this:

  1. based on user feedback, we get an idea of what is important to add to the API,
  2. we create an internal API evolution proposal where we describe the new feature, sketch the API and discuss it with the team,
  3. once the proposal’s accepted: we implement it, review it, dogfood it internally, release it with the next app update cycle, and collect more feedback and real world use cases.

Creating proposals is important for us to ensure consistency across our API and arrive at good solutions. The occasional need to deprecate an old API is unavoidable. But for the rare situations we need it, we usually create an automatic migration through code mods, so that updating an extension to changed types or method signatures is easy. We update the docs with each release. We use CI workflows that publish them to our public extensions repo and incorporate community contributions that we sync back to our internal repo before each release.

Publishing, automating, thriving

For developers to get their extension into the store and share it with others, we piggyback on GitHub’s infrastructure – an open monorepository and a pull request workflow for reviews. When someone creates a pull request, we run a couple of automated checks: manifest validation, linting, asset checks for the store, and so on. We also review and test extensions and try to give meaningful feedback along the way.

Once published, we send automatic notifications to a Slack channel, and developers often announce their new creation in the Slack community too. Overall, this model has worked really well so far, with people already creating hundreds of extensions. You might wonder how a monorepository and manual reviews of each extension scale. But “do things that don’t scale” first, they say, right? Recently, we’ve introduced new Raycast and CLI commands that make forking an extension and contributing much more straightforward and don’t force you to clone the entire monorepo – the CLI performs the right Git(Hub) incantations behind the scenes.

Feature parity and new territory

Building the extension ecosystem after we’d released Raycast with a couple of built-in extensions had an interesting side effect. Developers were wondering how their own extension could make use of the same features and UI components that some of the integrated native extensions were using.

This has pushed us to extend the API with those capabilities, but also to port extensions formerly written in Swift and part of the Raycast codebase to open-source extensions (examples are GitHub, Linear, Google Workspace, or Zoom). Those extensions have become the baseline and informed what we need to build and expose through the API.

As extensions get more complex, the API must evolve

The API needs to evolve in various directions: providing UI components; OS-level APIs that would otherwise be hard to implement purely in Node land; and making complex things simpler. As for the latter, we created and have now open-sourced a utilities package that works in tandem with our API. It makes typical tasks like async operations, networking and caching easier and promotes best practices around React.

Another direction we started exploring last year is enabling completely new types of extensions that Raycast didn’t provide before: menu bar commands. You get the same development model through React and hot reloading. And the end product is a native macOS menu bar app that end users can install, activate and use for their individual workflows.

Our recent release of Raycast’s AI features has also opened up new possibilities for developers since we’ve exposed an API that lets you create custom AI extensions. And finally, we’ve been looking into ways to give developers more insights into their extensions’ exceptions and analytics through a “developer hub” that we’re currently evaluating.

You’ve made it this far 👏

Thanks for sticking with us! If you’re a developer, please keep building cool extensions (and maybe start replacing those pesky internal automation scripts in your organization with an extension UI). And if you’re not (yet), why not learn how to build your first extension?