Jonathan Bennett

Critical SaaS Infrastructure

Every SaaS requires a basic amount of infrastructure, with a few critical additions as it grows. Most of it can be skipped, but skipping has serious downsides, typically to your customers’ satisfaction and their confidence in your product.

Table of Contents

  1. Phase 1: Early SaaS
    1. Scalable Hosting
    2. Backups
    3. Application Logging
      1. Runtime Logging
      2. Error Logging
    4. Basic Analytics
    5. Basic Automation
      1. Testing
      2. Deployment
  2. Phase 2: Product Market Fit
    1. Real User Monitoring (RUM)
  3. Phase 3: Growth
    1. Performance Monitoring

Phase 1: Early SaaS

Scalable Hosting

Traditionally, hosting required you to buy a physical machine, hook it up to the internet, and keep it updated and secure. On top of the inherent complexity of running your application, you took on a great deal of additional complexity just to manage the system.

Modern systems typically run on virtual or cloud instances. Instead of having one physical machine, you run many virtual machines on shared physical hardware. This often reduces complexity, because it requires standardized processes that can be properly tested and automated.

With this automation in place, you can often move a slider to increase your capacity safely and easily as needed.


Backups

Backups are critical to your application. Downtime, errors, and other quality issues can usually be addressed. Loss of data can be an extinction-level event for your SaaS.

Fortunately, most database systems can be backed up easily and cheaply. Additionally, services like S3 can be configured to not permit removal of files. This gives you confidence that if the worst-case scenario happens, you have a way to recover and move forward.
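
As a minimal sketch of what a scheduled backup can look like, assume a SQLite database stands in for your real one. The function name backup_database and the file naming scheme are illustrative choices, not part of any particular tool. Python's standard sqlite3 module can copy a live database to a timestamped file:

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_database(source_path: str, backup_dir: str) -> Path:
    """Copy a SQLite database to a timestamped file and return its path."""
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp keeps backups sortable and avoids overwriting old ones.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target_path = target_dir / f"backup-{stamp}.sqlite3"

    source = sqlite3.connect(source_path)
    target = sqlite3.connect(str(target_path))
    try:
        # Connection.backup copies the whole database, even while it is in use.
        source.backup(target)
    finally:
        target.close()
        source.close()
    return target_path
```

The resulting file would then be shipped to storage that the application itself cannot delete from, so a bug or a compromised server cannot destroy its own backups.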

Application Logging

Application logging gives you insight into what is happening while your application is running. This allows you to address errors and fix issues. Application logging is broken down into two types: runtime logging and error logging.

Runtime Logging

While an application is running, it can create messages about what is happening. These are logged chronologically and can be reviewed to help deduce what has gone wrong. A log could look something like:

Importing Client 1
Importing Job 1
Importing Job 2
Done Client 1
Importing Client 2
Importing Job 3
Importing Client 3

Reading a log like the one above would indicate that something went wrong with Client 2, either in Job 3 or in whatever should have followed it, since no Done Client 2 message appears before Client 3 starts.
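
The following is a sketch of how such messages might be produced and checked, using Python's standard logging module. The function names import_clients and interrupted_clients are hypothetical, and the check assumes clients are imported one at a time, so a new client starting before the previous one logged Done means the previous import was cut short:

```python
import logging

logger = logging.getLogger("importer")


def import_clients(clients: dict[str, list[str]]) -> None:
    """Log the start of each client, each of its jobs, and its completion."""
    for client, jobs in clients.items():
        logger.info("Importing %s", client)
        for job in jobs:
            logger.info("Importing %s", job)
        logger.info("Done %s", client)


def interrupted_clients(lines: list[str]) -> list[str]:
    """Return clients whose import started but was superseded before 'Done'."""
    interrupted = []
    current = None  # the client currently being imported, if any
    for line in lines:
        if line.startswith("Importing Client "):
            if current is not None:
                # A new client started while the last one never finished.
                interrupted.append(current)
            current = line.split(" ", 1)[1]
        elif line.startswith("Done ") and line.split(" ", 1)[1] == current:
            current = None
    return interrupted
```

Fed the seven log lines above, interrupted_clients reports only Client 2. Client 3 is not flagged, because the log simply ends while it may still be running.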

Error Logging

Additional logging can be performed by an error logger. This specifically captures any errors that cause the program to crash, though other details may be captured as well. Error loggers capture far more detail about the state of the entire system than a runtime log would. Because of this, the details can be harder to work with, but they are extremely useful for diagnosing complex issues.
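
A bare-bones version of this idea can be sketched with the standard logging module. The wrapper name run_with_error_logging and the context keyword arguments are assumptions made for illustration. Dedicated error-tracking services collect much more than this, but the principle is the same: record the full stack trace and whatever context you have, then let the error continue on its way.

```python
import logging

logger = logging.getLogger("app.errors")


def run_with_error_logging(task, **context):
    """Run task(); on failure, log the stack trace and context, then re-raise."""
    try:
        return task()
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback
        # of the exception currently being handled.
        logger.exception("Unhandled error, context=%r", context)
        raise
```

A call such as run_with_error_logging(import_job, client_id=2), where import_job is whatever function does the work, would leave behind both the traceback and the client that triggered it.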

Basic Analytics

Basic analytics let you know which parts of your application are being used, how often, and by whom. This is useful for retroactively evaluating your application: which areas customers are using successfully, which areas need more engagement, and where people need more assistance.
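
To make "what, how often, and by whom" concrete, here is a small in-memory sketch. The UsageTracker class and its method names are invented for this example; a real analytics tool would persist events and handle far more volume.

```python
from collections import Counter, defaultdict


class UsageTracker:
    """Records (user, feature) events and summarizes them."""

    def __init__(self):
        self.events = []  # list of (user_id, feature) tuples

    def record(self, user_id, feature):
        self.events.append((user_id, feature))

    def usage_by_feature(self):
        """How often each feature was used."""
        return Counter(feature for _, feature in self.events)

    def users_by_feature(self):
        """Which users touched each feature."""
        users = defaultdict(set)
        for user_id, feature in self.events:
            users[feature].add(user_id)
        return dict(users)
```

Recording opaque user IDs rather than names or emails keeps a sketch like this closer to what privacy rules tend to expect.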

Analytics is a tricky topic. Collecting all the data possible typically leads to not being able to see the forest for the trees, but you often need more historical data to answer questions in the future. This leads to a struggle between collecting too little and too much data. Adding to this complexity are privacy laws like GDPR. You will want to keep this in mind as you select a tool.

Basic Automation

Automation is an infinitely deep pool. There are two areas I do recommend for every startup, though: testing critical functionality and deployment.


Testing

Having good coverage of your core features allows you to work on your application confidently. Without test coverage of critical features, you do not know whether the changes you are making are safe or are damaging something.

Typical critical coverage areas are:

  • shopping cart checkout for shopping apps
  • financial transactions for finance apps
  • message delivery for communication apps

By having test coverage over your critical code, you can make changes knowing that at least this core functionality will continue to work.
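
For a shopping app, critical coverage could look something like the sketch below. The checkout_total function and its tax handling are made up for the example and stand in for whatever your real checkout code does; the tests use Python's built-in unittest module.

```python
import unittest
from decimal import Decimal


def checkout_total(prices, tax_rate):
    """Return the order total: item prices plus tax, rounded to cents."""
    if not prices:
        raise ValueError("cannot check out an empty cart")
    subtotal = sum(prices, Decimal("0"))
    return (subtotal * (1 + tax_rate)).quantize(Decimal("0.01"))


class CheckoutTests(unittest.TestCase):
    def test_total_includes_tax(self):
        total = checkout_total([Decimal("10.00"), Decimal("5.50")], Decimal("0.10"))
        self.assertEqual(total, Decimal("17.05"))

    def test_empty_cart_is_rejected(self):
        with self.assertRaises(ValueError):
            checkout_total([], Decimal("0.10"))


def run_critical_tests():
    """Run the checkout tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutTests)
    return unittest.TextTestRunner().run(suite)
```

Two small tests like these will not catch everything, but they will fail loudly if a change breaks the one path that takes your customers' money.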


Deployment

Deploying your application typically requires multiple steps that must be performed in order, some of which might be optional. It needs to be done reliably, or things go very, very bad in a hurry.

This is a perfect situation for automation, and the wins can be huge. Automated deployment is more reliable, consistent, and quick. Manual deployment is tedious and error-prone and should be avoided at all costs. It is often an issue because it leads to unique, special servers (snowflake servers) that increase the fragility of your infrastructure.
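
The shape of such automation can be sketched in a few lines. The Step class, the enabled flag, and the deploy function are hypothetical names for this example; real deployment tools add much more, but the core idea is ordered steps, optional steps that can be switched off, and a hard stop on the first failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    enabled: bool = True  # optional steps can be switched off for a deploy


def deploy(steps):
    """Run enabled steps in order and return the names of those that ran.

    Stops at the first failing step so later steps never run against a
    half-deployed system.
    """
    completed = []
    for step in steps:
        if not step.enabled:
            continue
        try:
            step.action()
        except Exception as error:
            raise RuntimeError(f"deploy aborted at step {step.name!r}") from error
        completed.append(step.name)
    return completed
```

Because the same list of steps runs the same way every time, every server ends up configured identically, which is the opposite of a snowflake.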

Phase 2: Product Market Fit

Real User Monitoring (RUM)

Real user monitoring (RUM) takes basic analytics and turns the dial to 11. Actively tracking what users are doing in the app, and for how long, gives very specific feedback on where you can optimize the application.

Phase 3: Growth

Performance Monitoring

Most hosting platforms give basic information on the performance of your application, but specialized performance monitoring tools greatly increase the detail available. In addition, they give greater historical context and can even proactively alert you to issues before they occur.

Taking this to the next level, you can even feed performance metrics back into your deployment automation to increase the number of servers in use, increasing the capacity of your application.
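
One simple way to turn a metric into a server count is a proportional rule: scale the fleet by the ratio of observed load to target load. The sketch below assumes average CPU percentage is the metric, and the function name, the 60 percent target, and the limits are illustrative defaults rather than recommendations.

```python
import math


def desired_server_count(current_servers, cpu_percent, target_percent=60,
                         min_servers=1, max_servers=20):
    """Return how many servers would bring average CPU back to the target."""
    if current_servers < 1:
        raise ValueError("current_servers must be at least 1")
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    # Scale proportionally, rounding up so we never under-provision.
    wanted = math.ceil(current_servers * cpu_percent / target_percent)
    # Clamp so a bad reading cannot scale to zero or run up an unbounded bill.
    return max(min_servers, min(max_servers, wanted))
```

With four servers averaging 90 percent CPU and a 60 percent target, this asks for six servers. The upper limit matters as much as the formula, since a faulty metric should not be able to launch an unlimited number of machines.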

Having these tools available to you allows you to serve your customers well, make better decisions, and move more quickly.

If you don’t have them yet, let’s talk about how we can get that taken care of ASAP.