High Availability
When you turn on High Availability (HA), we spin up at least 3 nodes in your cluster, in 3 different data centers that are physically and geographically separate. Each node can service both reads and writes.
So even if one of these nodes becomes inaccessible for any reason (for example, due to underlying infrastructure issues or maintenance), the other two nodes continue servicing requests with zero downtime.
When you enable HA, you get a special load-balanced hostname, and searches sent to this endpoint are automatically distributed to one of the nodes in your cluster.
Writes can also be sent to this load-balanced endpoint and your data will be automatically replicated to all the nodes in your cluster.
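As a rough illustration of pointing a client at the load-balanced endpoint, here is a minimal sketch that builds a configuration dictionary in the shape the Typesense Python client is generally expected to accept (`nearest_node`, `nodes`, `api_key`, `connection_timeout_seconds`). The helper name `build_ha_client_config`, the hostnames and the API key are hypothetical placeholders; use the values shown for your own cluster and check the client library documentation for the exact options.

```python
def build_ha_client_config(load_balanced_host, node_hosts, api_key):
    """Build a client configuration dict for a multi-node HA cluster.

    All hostnames passed in are placeholders in this sketch; the real
    ones come from your cluster's dashboard.
    """
    def node(host):
        # Assumption: the cluster is reached over HTTPS on port 443.
        return {"host": host, "port": 443, "protocol": "https"}

    return {
        "api_key": api_key,
        # The load-balanced hostname: reads and writes can be sent here.
        "nearest_node": node(load_balanced_host),
        # The individual nodes, listed so the client knows about each of them.
        "nodes": [node(host) for host in node_hosts],
        "connection_timeout_seconds": 2,
    }


# Hypothetical hostnames, for illustration only.
config = build_ha_client_config(
    "my-cluster.example.net",
    [
        "my-cluster-1.example.net",
        "my-cluster-2.example.net",
        "my-cluster-3.example.net",
    ],
    "YOUR_API_KEY",
)
print(config["nearest_node"]["host"])
print(len(config["nodes"]))
```

With the `typesense` package installed, a dictionary like this would typically be passed to `typesense.Client(config)`.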
We highly recommend enabling High Availability when running Typesense in a production environment.
With HA turned on, you can avoid downtime in the following scenarios:
- Infrastructure issues: in a multi-node HA cluster, even if one node or one data center has underlying hardware issues, the other two nodes in the cluster continue servicing traffic while the problematic node recovers.
- Capacity changes: when RAM or CPU is changed in a multi-node HA cluster, we upgrade or downgrade one node at a time, and your cluster continues servicing traffic from the other two nodes while the configuration change is applied to the third.
- Typesense version changes: when the Typesense Server version is changed in a multi-node HA cluster, we upgrade or downgrade one node at a time, and your cluster continues servicing traffic from the other two nodes while the version change is applied to the third.
- Maintenance: we typically perform maintenance on the underlying OS every 1-2 months. In a multi-node cluster, we service one node at a time, and your cluster continues servicing traffic from the other two nodes while the third is being serviced.
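The one-node-at-a-time pattern described in the scenarios above can be sketched as a small simulation. This is only an illustrative model, not how Typesense Cloud is implemented: the function `rolling_update`, the node names and the `apply_change` callback are all hypothetical. It shows that in a 3-node cluster, two nodes stay in service at every step of a rolling change.

```python
def rolling_update(nodes, apply_change):
    """Apply a change to one node at a time.

    Returns the smallest number of nodes that were in service at any
    point during the rollout.
    """
    in_service = set(nodes)
    min_serving = len(in_service)
    for name in nodes:
        in_service.remove(name)      # take this node out of rotation
        min_serving = min(min_serving, len(in_service))
        apply_change(name)           # e.g. resize, upgrade, or patch the OS
        in_service.add(name)         # bring it back before touching the next
    return min_serving


updated = []  # records the order in which nodes were changed
min_serving = rolling_update(["node-1", "node-2", "node-3"], updated.append)
print(min_serving)  # 2
print(updated)      # ['node-1', 'node-2', 'node-3']
```

Running the same function on a single-node list returns 0, which corresponds to the downtime a non-HA cluster sees during the same kind of change.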
With HA turned off, you will experience downtime proportional to the size of your dataset during the scenarios above. For infrastructure issues (scenario 1), the downtime could be several hours, depending on infrastructure recovery times. For capacity changes, version changes and maintenance (scenarios 2, 3 and 4), it could be anywhere from 15 minutes to 2 hours while the single non-HA node in your cluster recovers.
Read more about High Availability in Typesense Cloud here.