Building a serverless response to Substack subscriptions

Solution engineering snippets

Jan 05, 2025

The basic problem: No public API

As of this writing, Substack doesn’t offer a public API for its services. This means that computer programs can’t ask Substack’s servers questions like “who has subscribed to the newsletter since last week?” For a newsletter that offers more (for example, access to a GitHub repository for subscribers), this can mean additional, manual work to process the notification emails that Substack sends.

Any time I hear “and then you process it manually,” I feel the urge to automate. There are things that shouldn’t be automated — in many processes, there’s considerable scope for human judgment (even in the age of LLMs). But this case isn’t one of them. There’s no human judgment or human touch in adding someone to a list. Let’s gather some requirements and summarize them.

System goals

We begin with two goals:

React when a subscription happens
Keep a list, outside of Substack, of our subscribers and their subscription status

We could begin with much more specific goals, and with implementation details assumed. That could look something like:

When there is a new subscription, use the GitHub API to add the subscriber to the GitHub repo

This is probably enough for some newsletters (see the previous post about tech debt), but it’s not very general. For a general solution, we just need to react somehow (that can be concretized later) and we probably want to keep a list of subscribers so that we (a) only react once and (b) can process old subscriptions for new things we want to do. This means data storage of some kind.
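To make points (a) and (b) concrete, here is a minimal sketch of a subscriber list that remembers which reactions have already run for each person. Everything in it is an assumption made for illustration: the Subscriber fields, the run_reaction helper, and the "repo-invite" reaction name are hypothetical, and a plain dictionary stands in for the real data store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Subscriber:
    email: str
    status: str = "active"  # hypothetical status values
    reactions_done: Set[str] = field(default_factory=set)


def run_reaction(
    subscribers: Dict[str, Subscriber],
    name: str,
    action: Callable[[Subscriber], None],
) -> List[str]:
    """Apply `action` once per active subscriber; return the emails it ran for."""
    handled: List[str] = []
    for sub in subscribers.values():
        if sub.status != "active" or name in sub.reactions_done:
            continue  # (a) never react twice to the same subscriber
        action(sub)
        sub.reactions_done.add(name)
        handled.append(sub.email)
    return handled


subscribers = {e: Subscriber(e) for e in ("a@example.com", "b@example.com")}
invited: List[str] = []

first = run_reaction(subscribers, "repo-invite", lambda s: invited.append(s.email))
second = run_reaction(subscribers, "repo-invite", lambda s: invited.append(s.email))

# A later subscriber is picked up by the next run; earlier ones are skipped.
subscribers["c@example.com"] = Subscriber("c@example.com")
third = run_reaction(subscribers, "repo-invite", lambda s: invited.append(s.email))
```

Because the record of completed reactions lives with the subscriber, introducing a new reaction name later would run it for every existing subscriber, which is point (b).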

Working backwards to requirements

Knowing that we want to react, but only to changes, and that we need a data store, the most obvious thing is to use a NoSQL document store. At least, this is the most obvious thing for solutions architects. 😊

Using a pre-requisite diagram

Pre-requisite diagrams are read from top to bottom, with each “Goal” box being achieved by the “Overcome” boxes that have arrows leading to it. I think of the “Overcome” boxes as letting us “jump” the goal boxes. So, we start at the top by writing our goals, then think of objections to each goal — what prevents us from achieving it? (The “Overcome” boxes are sometimes listed as “Obstacle” boxes for this reason.) When we have an objection, we write how we’ll overcome it and link that to the goal.

One reason for using this kind of diagram, instead of a process diagram, is that a process diagram works forward (what do we do next?) and a pre-requisite tree works backward (what do we do before that?). Process diagrams, typically, explain and record how things should be done; pre-requisite diagrams explain and record what needs doing. There’s nothing wrong with using either type of diagram if you’re comfortable with it, but our understanding is shaped by our tools, so choice of tool can influence the outcome. Here, I want to both explore what’s possible and explore what’s necessary.

So, our first “Overcome” becomes sending an update to a NoSQL document store when we get a notice. It’s not important which document store or which cloud service. The large public clouds all have this kind of service (e.g., DynamoDB for AWS; Firestore for GCP; Azure Cosmos DB for Microsoft Azure) because it’s a really useful type of service. In each one, it’s possible to trigger messages on changes to the data store and to ignore duplicate events.
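The behavior we want from the store can be sketched without committing to a vendor. The following is an in-memory stand-in, not any cloud service’s API: SubscriberStore, SubscriberRecord, and the status values are hypothetical names chosen for illustration. It shows the two properties that matter here: a write that changes nothing triggers nothing, and a write that changes something notifies a listener.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class SubscriberRecord:
    email: str
    status: str  # hypothetical values, e.g. "free", "paid", "cancelled"


ChangeHandler = Callable[[Optional[SubscriberRecord], SubscriberRecord], None]


class SubscriberStore:
    """In-memory stand-in for a document store with change triggers."""

    def __init__(self, on_change: ChangeHandler) -> None:
        self._docs: Dict[str, SubscriberRecord] = {}
        self._on_change = on_change

    def upsert(self, record: SubscriberRecord) -> bool:
        """Write the record; return True only if the stored state changed."""
        key = record.email.strip().lower()  # one document per address
        normalized = SubscriberRecord(email=key, status=record.status)
        previous = self._docs.get(key)
        if previous == normalized:
            return False  # duplicate notice: nothing stored, nothing triggered
        self._docs[key] = normalized
        self._on_change(previous, normalized)
        return True


events: List[str] = []
store = SubscriberStore(lambda old, new: events.append(f"{new.email}:{new.status}"))

store.upsert(SubscriberRecord("Reader@Example.com", "free"))
store.upsert(SubscriberRecord("reader@example.com", "free"))  # duplicate, ignored
store.upsert(SubscriberRecord("reader@example.com", "paid"))  # status change
```

In a real deployment the comparison and the notification would be handled by the store’s own change-trigger mechanism rather than by application code like this.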

Before we can send an update, though, we’ll need to parse the notification email. And we’ll want to be sure it’s a real email. Once we’ve done that, we can notify our data store.
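As a rough sketch of the parsing step, the snippet below uses Python’s standard email package. The wording of Substack’s notices is an assumption here: the subject pattern, the sample message, and the no-reply@substack.com sender are hypothetical and would need to be checked against real notification emails before use.

```python
import re
from email import message_from_string, policy
from email.utils import parseaddr
from typing import Dict, Optional

# Hypothetical patterns: the real wording of Substack's notices may differ.
NEW_SUBSCRIBER_SUBJECT = re.compile(r"new (free|paid) subscriber", re.IGNORECASE)
EMAIL_IN_BODY = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_notice(raw_email: str) -> Optional[Dict[str, str]]:
    """Return subscriber details from a raw notice, or None if it isn't one."""
    msg = message_from_string(raw_email, policy=policy.default)

    match = NEW_SUBSCRIBER_SUBJECT.search(str(msg["Subject"] or ""))
    if not match:
        return None  # not a subscription notice

    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    found = EMAIL_IN_BODY.search(body)
    if not found:
        return None  # notice without a recognizable subscriber address

    _, sender = parseaddr(str(msg["From"] or ""))
    return {
        "subscriber": found.group(0).lower(),
        "tier": match.group(1).lower(),
        "sender": sender.lower(),
    }


# Made-up sample message, for illustration only.
SAMPLE = (
    "From: Substack <no-reply@substack.com>\n"
    "To: notices@example.com\n"
    "Subject: New free subscriber to Example Newsletter\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "reader@example.com just subscribed.\n"
)

notice = parse_notice(SAMPLE)
```

The From header is easy to forge, so the sender value returned here is only informational. It does not establish that the email is real.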

We have a few different ways of reacting to an email, which is a pre-requisite to reacting to the subscription change. One way to react to emails is to periodically check for new emails and read them. This is how the Zapier hook for email works. The other option is to parse the emails on receipt. This is what SendGrid’s Inbound Parse webhook allows. Let’s choose this second option. I like it because:

It doesn’t require providing login credentials to a third-party
It can be limited to receiving only relevant emails

Setting this up requires signing up for a SendGrid account, verifying domain ownership, and changing some DNS records to have the email sent to SendGrid. That’s all beyond the scope of this post, but these requirements mean you’ll need to be using a domain you control, not an email address from a third party (e.g., Gmail or Proton Mail). It also means you can’t point your Substack notification address only at SendGrid. If you do, you won’t see any of your notices from Substack, because they will all trigger (only) the SendGrid webhook instead of landing in your inbox.

But, if we use a group email address, then it can forward emails to SendGrid and still deliver them to you. Most providers of domain email can handle list setup or forwarding rules that allow multiple recipients. Make your “real” email address one of them, and make any address at the subdomain you pointed at SendGrid the other.

Caveats and diving deeper

If you set this up (actually or mentally), you’ll notice that there’s nothing preventing anyone else from sending emails that will trigger your processing. And because MX records are public DNS records, anyone can discover this subdomain. So you can’t rely on the receiving address staying secret.

Another limitation of SendGrid’s inbound parsing is that all it does is turn a received email (which arrives over SMTP, the standard protocol for email on the internet) into a webhook call. It doesn’t authenticate the email it receives, and it doesn’t authenticate itself to the webhook.
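One common way to narrow this gap, which is not necessarily the pattern used in the repository, is to combine two checks inside the webhook handler. The first is a long random token in the webhook URL. Unlike the MX record, that URL is not public. The second is a look at the DKIM result for the sending domain. The sketch below assumes the webhook payload carries a dkim form field formatted like "{@substack.com : pass}" and that Substack’s notices are signed for substack.com. Both are assumptions to verify against real payloads. The token query parameter and the function names are hypothetical.

```python
import hmac
import re
from typing import Mapping

# Hypothetical values: load the token from a secret manager in practice.
EXPECTED_TOKEN = "replace-with-a-long-random-value"
TRUSTED_SENDER_DOMAIN = "substack.com"  # assumption about who signs the notices

# Assumed field format, e.g. "{@substack.com : pass}".
DKIM_RESULT = re.compile(r"@([\w.-]+)\s*:\s*(\w+)")


def dkim_passes_for(dkim_field: str, domain: str) -> bool:
    """True if the field reports a passing signature for `domain` or a subdomain."""
    for signed_domain, result in DKIM_RESULT.findall(dkim_field.lower()):
        same_domain = signed_domain == domain or signed_domain.endswith("." + domain)
        if result == "pass" and same_domain:
            return True
    return False


def is_trusted_call(query: Mapping[str, str], form: Mapping[str, str]) -> bool:
    """Accept a webhook call only if it knows the token and the email is signed."""
    token = query.get("token", "")
    # compare_digest takes the same time whether or not the values match,
    # which avoids leaking the token through response timing.
    if not hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode()):
        return False
    return dkim_passes_for(form.get("dkim", ""), TRUSTED_SENDER_DOMAIN)
```

SPF is deliberately left out of this sketch. Because the notices reach the webhook through a forwarding rule, the forwarding server’s address would typically not appear in the original sender’s SPF record, whereas a DKIM signature usually survives forwarding.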

The diagram below shows the pre-requisite tree for this project and a pattern for handling validation that the call is coming from SendGrid. As for code, that’s in the repository, and it will be straightforward for any technical person used to building cloud applications.

Figure: Pre-requisite tree for adding serverless events in response to Substack notifications. Caption: Turning Substack notices into serverless function events.

Don’t hesitate to reach out if something is unclear or if you need help architecting a solution.

Data Heretics
