Security

Last updated: Sep 12, 2024

Keeping your source code and developer environment secure is critical to us. This page outlines how we approach security for Cursor.

Please submit potential vulnerabilities to our GitHub Security page.

For any security-related questions, feel free to contact us at security@cursor.com.

Certifications and Third-Party Assessments

Cursor is SOC 2 Type I certified, and is in the process of completing SOC 2 Type II (expected November 2024). Please email hi@cursor.com to request a copy of the report.

We commit to at-least-annual penetration testing by reputable third parties. Our first report has been available since September 2024. Please email hi@cursor.com to request a copy of the report.

Infrastructure Security

We depend on the following subprocessors, roughly organized from most to least critical. Note that code data is sent to our servers to power all of Cursor's AI features (see the AI Requests section), and that code data for users on privacy mode is never persisted (see the Privacy Mode Guarantee section).

The list below describes what each subprocessor sees when privacy mode is enabled:
  • AWS (sees code data): Our infrastructure is primarily hosted on AWS. Most of our servers are in the US, with some latency-critical servers located in AWS regions in Asia (Tokyo) and Europe (London).
  • Fireworks (sees code data): Our custom models are hosted with Fireworks, on servers in the US, Asia (Tokyo), and Europe (London). If privacy mode is disabled, Fireworks may store some code data to speed up inference for our models.
  • OpenAI (sees code data): We rely on many of OpenAI's models to give AI responses. Requests may be sent to OpenAI even if you have an Anthropic (or another provider's) model selected in chat (e.g. for summarization). We have a zero data retention agreement with OpenAI.
  • Anthropic (sees code data): We rely on many of Anthropic's models to give AI responses. Requests may be sent to Anthropic even if you have an OpenAI (or another provider's) model selected in chat (e.g. for summarization). We have a zero data retention agreement with Anthropic.
  • Google Cloud Vertex API (sees code data): We rely on some Gemini models offered over Google Cloud's Vertex API to give AI responses. Requests may be sent to the Google Cloud Vertex API even if you have an OpenAI (or another provider's) model selected in chat (e.g. for summarization).
  • Turbopuffer (stores obfuscated code data): Embeddings of indexed codebases, as well as metadata associated with the embeddings (obfuscated file names), are stored with Turbopuffer on Google Cloud's servers in the US. You can read more on the Turbopuffer security page. Users can disable codebase indexing; read more in the Codebase Indexing section of this document.
  • Exa and SerpApi (see search requests, potentially derived from code data): Used for web search functionality. Search requests may be derived from code data (e.g. when using "@web" in chat, a separate language model looks at your message, conversation history, and current file to determine what to search for, and Exa/SerpApi see the resulting search query).
  • MongoDB (sees no code data): We use MongoDB for some of our analytics data, for users who do not have privacy mode enabled.
  • Datadog (sees no code data): We use Datadog for logging and monitoring. As discussed in the Privacy Mode Guarantee section, logs related to privacy mode users do not contain any code data.
  • Databricks (sees no code data): We use Databricks MosaicML for training some of our custom models. Data from privacy mode users never reaches Databricks.
  • Foundry (sees no code data): We use Foundry for training some of our custom models. Data from privacy mode users never reaches Foundry.
  • Slack (sees no code data): We use Slack as our internal communication tool. We may send snippets of prompts from non-privacy users in our internal chats for debugging.
  • Google Workspace (sees no code data): We use Google Workspace to collaborate. We may send snippets of prompts from non-privacy users in our internal emails for debugging.
  • Pinecone (sees no code data): Embeddings and metadata of indexed docs are stored on Pinecone. These docs are fetched from the public web. We are in the process of migrating these to Turbopuffer.
  • Amplitude (sees no code data): We use Amplitude for some of our analytics data. No code data is stored with Amplitude; only event data such as "number of Cursor Tab requests".
  • HashiCorp (sees no code data): We use HashiCorp Terraform to manage our infrastructure.
  • Stripe (sees no code data): We use Stripe to handle billing. Stripe stores your personal data (name, credit card, address).
  • Vercel (sees no code data): We use Vercel to deploy our website. The website has no way of accessing code data.
  • WorkOS (sees no code data): We use WorkOS to handle auth. WorkOS may store some personal data (name, email address).

None of our infrastructure is in China. We do not directly use any Chinese company as a subprocessor, and to our knowledge none of our subprocessors do either.

We assign infrastructure access to team members on a least-privilege basis. We enforce multi-factor authentication for AWS. We restrict access to resources using both network-level controls and secrets.

Client Security

Cursor is a fork of the open-source Visual Studio Code (VS Code), which is maintained by Microsoft. Microsoft publishes security advisories on its GitHub security page. Every other mainline VS Code release, we merge the upstream microsoft/vscode codebase into Cursor. You can check which version of VS Code your Cursor version is based on by clicking "Cursor > About Cursor" in the app. If there is a high-severity security patch in upstream VS Code, we cherry-pick the fix before the next merge and release it immediately.

We use ToDesktop to distribute our app and to handle auto-updates. ToDesktop is trusted by several widely used apps, such as Linear and ClickUp.

Our app will make requests to the following domains to communicate with our backend. If you're behind a corporate proxy, please whitelist these domains to ensure that Cursor works correctly.

  • api2.cursor.sh: Used for most API requests.
  • api3.cursor.sh: Used for Cursor Tab requests (HTTP/2 only).
  • repo42.cursor.sh: Used for codebase indexing (HTTP/2 only).
  • api4.cursor.sh, us-asia.gcpp.cursor.sh, us-eu.gcpp.cursor.sh, us-only.gcpp.cursor.sh: Used for Cursor Tab requests depending on your location (HTTP/2 only).
  • marketplace.cursorapi.com, cursor-cdn.com: Used for downloading extensions from the extension marketplace.
  • download.todesktop.com: Used for checking for and downloading updates.

Two security-related differences from VS Code to note:

  1. Workspace Trust is disabled by default in Cursor. You can enable it by setting security.workspace.trust.enabled to true in your Cursor settings. It is disabled by default to prevent confusion between Workspace Trust's "Restricted Mode" and Cursor's "Privacy Mode", and because its trust properties are nuanced and hard to understand (for example, even with Workspace Trust enabled, you are not protected from malicious extensions, only from malicious folders). We are open to community feedback on whether we should enable it by default.
  2. Our cursor-server builds, which are installed whenever you do remote development with Cursor (e.g. when developing over SSH), are based on Node 16, which has reached its end-of-life. We do this to support machines that do not have glibc>=2.28 installed (e.g. Ubuntu 18). VS Code currently gives the Node 16-based legacy build only to users who do not have glibc>=2.28 and a Node 20-based build to everyone else, whereas we give the Node 16-based legacy build to everyone. Once VS Code ceases to support the Node 16-based build in February 2025, we will also upgrade to the non-legacy build process and distribute that to everyone.

AI Requests

To provide its features, Cursor makes AI requests to our server. This happens for many different reasons: we send AI requests when you ask questions in chat, on every keystroke so that Cursor Tab can make suggestions for you, and sometimes in the background to build up context or look for bugs to show you.

An AI request generally includes context such as your recently viewed files, your conversation history, and relevant pieces of code based on language server information. This code data is sent to our infrastructure on AWS, and then to the appropriate language model inference provider (Fireworks/OpenAI/Anthropic/Google). Note that the requests always hit our infrastructure on AWS even if you have configured your own API key for OpenAI in the settings.
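
As a rough illustration, the context bundle for a single request might look like the following TypeScript shape. This is a sketch only; the field names are hypothetical and are not Cursor's actual wire format.

    // Hypothetical shape of the context attached to an AI request.
    // Field names are illustrative, not Cursor's actual wire format.
    interface AIRequestContext {
      conversationHistory: { role: "user" | "assistant"; content: string }[];
      recentlyViewedFiles: string[]; // paths of files you opened recently
      relevantCode: { path: string; text: string }[]; // snippets selected via language server information
    }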

We currently do not have the ability to direct-route from the Cursor app to your enterprise deployment of OpenAI/Azure/Anthropic, as our prompt-building happens on our server, and our custom models on Fireworks are critical in providing a good user experience. We do not yet have a self-hosted server deployment option.

You own all the code generated by Cursor.

Codebase Indexing

Cursor allows you to semantically index your codebase, which allows it to answer questions with the context of all of your code as well as write better code by referencing existing implementations. Codebase indexing is enabled by default, but can be turned off during onboarding or in the settings.

Our codebase indexing feature works as follows: when enabled, it scans the folder that you open in Cursor and computes a Merkle tree of hashes of all files. Files and subdirectories specified by .gitignore or .cursorignore are ignored. The Merkle tree is then synced to the server. Every 10 minutes, we check for hash mismatches, and use the Merkle tree to figure out which files have changed and only upload those.
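
A minimal sketch of that scheme in TypeScript, assuming SHA-256 as the hash (the actual hash function and tree layout are assumptions here): each file hash rolls up into a directory hash, so comparing roots reveals whether anything changed, and comparing subtrees pinpoints which files to re-upload.

    import { createHash } from "node:crypto";
    import { readFileSync, readdirSync, statSync } from "node:fs";
    import { join } from "node:path";

    // Sketch only: hash every file, then roll child hashes up into a
    // directory hash. A real implementation would also honor
    // .gitignore/.cursorignore and cache hashes between runs.
    function hashFile(path: string): string {
      return createHash("sha256").update(readFileSync(path)).digest("hex");
    }

    function merkleRoot(dir: string): string {
      const childHashes = readdirSync(dir)
        .sort()
        .map((name) => {
          const path = join(dir, name);
          return statSync(path).isDirectory() ? merkleRoot(path) : hashFile(path);
        });
      return createHash("sha256").update(childHashes.join("")).digest("hex");
    }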

On our servers, we chunk and embed the files, and store the embeddings in Turbopuffer. To allow filtering vector search results by file path, we store with every vector an obfuscated relative file path, as well as the line range the chunk corresponds to. We also store each embedding in a cache in AWS, indexed by the hash of the chunk, to ensure that indexing the same codebase a second time is much faster (which is particularly useful for teams).

At inference time, we compute an embedding, let Turbopuffer do the nearest neighbor search, send back the obfuscated file path and line range to the client, and read those file chunks on the client locally. We then send those chunks back up to the server to answer the user's question. This means that no plaintext code is stored on our servers or in Turbopuffer.
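
Put together, the stored record per chunk might look like the sketch below; the field names are assumptions for illustration. Only the client, which holds the decryption key, can turn a search result back into a plaintext location.

    // Hypothetical per-chunk record in the vector store: an embedding plus
    // obfuscated location metadata, and no plaintext code.
    interface IndexedChunk {
      embedding: number[];    // vector used for nearest-neighbor search
      obfuscatedPath: string; // encrypted relative file path (obfuscation sketched after the notes below)
      startLine: number;      // line range the chunk covers
      endLine: number;
    }

    // The client resolves a match locally; only it holds the path key.
    function resolveMatch(
      match: IndexedChunk,
      decryptPath: (p: string) => string,
    ): { path: string; startLine: number; endLine: number } {
      return {
        path: decryptPath(match.obfuscatedPath),
        startLine: match.startLine,
        endLine: match.endLine,
      };
    }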

Some notes:

  • While a .cursorignore file can prevent files from being indexed, those files may still be included in AI requests (for example, if you recently viewed a file and then ask a question in chat). We are considering adding a .cursorban file to address the use case of wanting to block files from being sent up in any request; please make a forum post or reach out at hi@cursor.com if this feature would be interesting to you.
  • File path obfuscation details: the path is split by "/" and "." and each segment is encrypted with a secret key stored on the client and a deterministic short 6-byte nonce (see the sketch after these notes). This leaks information about directory hierarchy and will have some nonce collisions, but hides most information.
  • Embedding reversal: academic work has shown that reversing embeddings is possible in some cases. Current attacks rely on having access to the model and embedding short strings into big vectors, which makes us believe that the attack would be somewhat difficult to do here. That said, it is definitely possible for an adversary who breaks into our vector database to learn things about the indexed codebases.
  • When codebase indexing is enabled in a Git repo, we also index the Git history. Specifically, we store commit SHAs, parent information and obfuscated file names (same as above). To allow sharing the data structure for users in the same Git repo and on the same team, the secret key for obfuscating the file names is derived from hashes of recent commit contents. Commit messages and file contents or diffs are not indexed.
  • Our indexing feature often experiences heavy load, which can cause many requests to fail. This means that sometimes, files will need to be uploaded several times before they get fully indexed. One way this manifests is that if you check the network traffic to repo42.cursor.sh, you may see more bandwidth used than expected.
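
Here is a minimal sketch of the path obfuscation described in the notes above. How the deterministic 6-byte nonce is derived is not specified, so deriving it from an HMAC of the segment is an assumption made here; the key stands in for the client-side secret.

    import { createCipheriv, createHmac } from "node:crypto";

    // Sketch only. Assumes a 32-byte client-side secret key; deriving the
    // nonce from an HMAC of the segment is an assumption, chosen so the
    // same segment always encrypts to the same token.
    function obfuscatePath(relPath: string, key: Buffer): string {
      return relPath
        .split(/([/.])/) // keep the "/" and "." separators
        .map((seg) => (seg === "/" || seg === "." || seg === "" ? seg : encryptSegment(seg, key)))
        .join("");
    }

    function encryptSegment(segment: string, key: Buffer): string {
      // Deterministic 6-byte nonce, zero-padded to AES-CTR's 16-byte IV.
      const nonce = createHmac("sha256", key).update(segment).digest().subarray(0, 6);
      const iv = Buffer.concat([nonce, Buffer.alloc(10)]);
      const cipher = createCipheriv("aes-256-ctr", key, iv);
      const ciphertext = Buffer.concat([cipher.update(segment, "utf8"), cipher.final()]);
      // Prepend the nonce so the client can decrypt the segment later.
      return Buffer.concat([nonce, ciphertext]).toString("base64url");
    }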

Privacy Mode Guarantee

Privacy mode can be enabled during onboarding or in settings. When it is enabled, we guarantee that code is never stored on our servers or by our subprocessors. Privacy mode can be enabled by anyone (free or Pro user), and is forcibly enabled by default for any user who is a member of a team.

We take the privacy mode guarantee very seriously. About 50% of all Cursor users have privacy mode enabled. You can read more about the privacy guarantee in our Privacy Policy.

With privacy mode enabled, code data is not persisted on our servers or by any of our subprocessors. The code data is still visible to our servers in memory for the lifetime of the request, and may exist for a slightly longer period (on the order of minutes to hours) for long-running background jobs or KV caching. Code data submitted by privacy mode users is never trained on.

A user's privacy mode setting is stored on the client. Each request to our server includes an x-ghost-mode header. To prevent accidentally treating a privacy mode user as a non-privacy mode user, we always default to assuming that a user is on privacy mode if the header is missing.
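
A sketch of that fail-safe default in TypeScript; the exact header value format is an assumption:

    // Sketch: a missing or unreadable x-ghost-mode header is treated as
    // privacy mode ON, so a dropped header can never downgrade a user.
    function isPrivacyMode(headers: Record<string, string | undefined>): boolean {
      const value = headers["x-ghost-mode"];
      if (value === undefined) return true; // default to privacy mode
      return value !== "false"; // exact value format is an assumption
    }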

All requests to our server first hit a proxy that decides which logical service should handle the request (e.g. the "chat service" or the "Cursor Tab service"). Each logical service comes in two near-identical replicas: one that handles privacy mode requests, and one that handles non-privacy mode requests. The proxy checks the value of the x-ghost-mode header and sends the request to the appropriate replica. The replicas themselves also check the header for redundancy. By default, all log functions on the privacy mode replicas are no-ops; the exceptions are suffixed like infoUnrestricted, and we carefully review them to never attach any potential code data or prompts. For requests that spawn background tasks, we similarly have parallel queues and worker replicas for privacy mode and non-privacy mode. This parallel infrastructure makes us confident in our privacy mode guarantee and its resilience against accidental mistakes or bugs.
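
The logging discipline on the privacy mode replicas could be sketched like this; the interface and method names beyond infoUnrestricted are assumptions:

    // Sketch of the split logger. On privacy-mode replicas, ordinary log
    // methods are no-ops; only the explicitly "unrestricted" methods emit,
    // and those call sites are reviewed to never include code or prompts.
    interface Logger {
      info(msg: string): void;
      infoUnrestricted(msg: string): void;
    }

    const privacyModeLogger: Logger = {
      info: () => {}, // no-op: nothing from a privacy-mode request is logged
      infoUnrestricted: (msg) => console.log(msg),
    };

    const standardLogger: Logger = {
      info: (msg) => console.log(msg),
      infoUnrestricted: (msg) => console.log(msg),
    };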

For team-level privacy mode enforcement, each client pings the server every 5 minutes to check if the user is on a team that enforces privacy mode. If so, it overrides the client's privacy mode setting. To guard against the client's privacy mode ping failing for any reason, our server also checks, in the hot path, whether the user is part of a team that enforces privacy mode, and if so treats the request as a privacy mode request even if the header says otherwise. On latency-sensitive services, we cache this value for 5 minutes, and on any cache miss we assume that the user is on privacy mode. All in all, this means that when a user joins a team, they are guaranteed to be on privacy mode at most 5 minutes after joining. As a special case, if a user signs into a team account at onboarding, they are guaranteed to be on privacy mode immediately.
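
A sketch of that cached hot-path check, where a cache miss immediately assumes privacy mode and refreshes in the background; the names and data shapes are assumptions for illustration:

    // Sketch: 5-minute cache of "does this user's team enforce privacy
    // mode?". On a cache miss we assume privacy mode right away and
    // refresh the cache in the background for subsequent requests.
    const TTL_MS = 5 * 60 * 1000;
    const cache = new Map<string, { enforced: boolean; expiresAt: number }>();

    function teamEnforcesPrivacyMode(
      userId: string,
      lookup: (userId: string) => Promise<boolean>, // hypothetical DB lookup
    ): boolean {
      const hit = cache.get(userId);
      if (hit && hit.expiresAt > Date.now()) return hit.enforced;
      void lookup(userId)
        .then((enforced) => cache.set(userId, { enforced, expiresAt: Date.now() + TTL_MS }))
        .catch(() => {}); // on failure, keep assuming privacy mode
      return true; // cache miss: assume privacy mode
    }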

Account Deletion

You can delete your account at any time in the Settings dashboard (click "Advanced" and then "Delete Account"). This deletes all data associated with your account, including any indexed codebases. We guarantee complete removal of your data within 30 days: we delete the data immediately, but some of our databases and cloud storage retain backups for up to 30 days.

It's worth noting that if any of your data was used in model training (which would only happen if you were not on privacy mode at the time), our existing trained models will not be immediately retrained. However, any future models that are trained will not be trained on your data, since that data will have been deleted.

Vulnerability Disclosures

If you believe you have found a vulnerability in Cursor, please follow the guide on our GitHub Security page and submit the report there. If you're unable to use GitHub, you may also reach us at security@cursor.com.

We commit to addressing vulnerability reports immediately, and will publish the results in the form of security advisories on our GitHub security page. Critical incidents will be communicated both on the GitHub security page and via email to all users.
