A reliable update process is our top priority, so we have built many features into our service that help us achieve an update success rate of over 99.99% for our customers.


Reliability engineering

Automated end-to-end testing

We test every update mode, every field type, and every entity type we support, each with its own end-to-end test. These tests run before every release of our module and connected services.


Because every Drupal setup is different and can become very complex at scale, we recommend that all our customers build a set of automated tests to run before deploying updates in their specific environment. We can help you create a test plan and share best practices for incorporating automated content syndication testing into your setup.


Cloud: Availability

All our services are fully redundant, vertically scalable, and hosted in multiple availability zones to ensure continuous availability.



Syndication process

Automated retries

We retry failed updates for up to 24 hours to automatically recover from temporary issues like a site being briefly unavailable or overloaded. The retry interval depends on the update priority.
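The retry loop described above can be sketched as follows. The specific intervals per priority are illustrative assumptions, not the service's documented values:

```python
import time

# Hypothetical retry intervals in seconds per update priority;
# the actual intervals used by the service are not documented here.
RETRY_INTERVALS = {"high": 30, "medium": 300, "low": 1800}
MAX_RETRY_WINDOW = 24 * 60 * 60  # retry failed updates for up to 24 hours

def retry_update(send, priority="medium", clock=time.monotonic, sleep=time.sleep):
    """Retry `send` until it succeeds or the 24-hour window elapses."""
    interval = RETRY_INTERVALS[priority]
    deadline = clock() + MAX_RETRY_WINDOW
    while True:
        if send():
            return True
        if clock() + interval > deadline:
            return False  # give up; the update stays flagged as failed
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting.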


Throttling

To avoid overloading your sites with requests and risking availability issues or degraded performance for your visitors, we throttle requests by default. You can customize this throttling to optimize it for your site's performance and available resources. See Advanced settings.
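A minimal sketch of this kind of throttling is a sliding-window rate limiter. The class name and the window parameters below are illustrative, not the service's actual defaults:

```python
import time

class Throttle:
    """Allow at most `max_per_window` requests per `window` seconds
    (a sliding-window limiter; parameters are illustrative)."""

    def __init__(self, max_per_window, window, clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window = window
        self.clock = clock
        self.timestamps = []

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        self.timestamps = [t for t in self.timestamps if now - t < self.window]
        if len(self.timestamps) < self.max_per_window:
            self.timestamps.append(now)
            return True
        return False  # caller should delay or queue the request
```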


Asynchronous processing

We process as much as possible asynchronously to make it easy to recover from any kind of failure. For example, when an editor pushes content, we don't serialize and send the content immediately; instead, we send a lightweight update request to our service, which is then processed asynchronously in the background. This makes the update process more reliable and significantly improves performance and usability for editors.

All operations in our service are broken down into small components that are also processed asynchronously and automatically retried on failure.
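The push flow above can be sketched with a simple queue: the editor's save enqueues only a lightweight notification, and the heavy work (serialization, delivery) happens later in a background worker. All names below are illustrative, not the module's actual API:

```python
from queue import Queue

update_queue = Queue()

def on_editor_push(entity_id):
    """Called synchronously when an editor saves content: enqueue only a
    small update request instead of serializing the entity inline."""
    update_queue.put({"entity_id": entity_id})

def process_one(serialize, deliver):
    """One background worker step: serialize the entity and deliver it.
    Retrying on failure would be handled around this step."""
    request = update_queue.get()
    payload = serialize(request["entity_id"])
    return deliver(payload)
```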


Adaptive update performance

By default, we update all entities that are part of the content with every update request to a site. This allows us to recover from any previous failure or technical issue. Imagine a glitch, e.g. your cache serving outdated content that we then use as the base for updates: if we only did delta updates, a problem at any point in time would carry on indefinitely, even after the code or service was fixed. That's why, by default, we do full updates on all entities that are part of the content pulled into the site.

In some cases, however, e.g. when you're working with hundreds of entities like deeply nested paragraphs, this can lead to performance bottlenecks and even timeouts. For those cases, we offer an Adaptive setting that performs highly reliable updates by default and opts in to optimized updates for the minority of content where it's needed. See Advanced settings.
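The decision the Adaptive setting makes can be sketched as: do a full update by default, and fall back to an optimized (delta) update only when the entity count would risk timeouts. The threshold value and function names are assumptions for illustration, not the product's actual defaults:

```python
ADAPTIVE_THRESHOLD = 200  # hypothetical cutoff, not the real default

def plan_update(entity_ids, changed_ids, adaptive=True):
    """Return the list of entities to update for one pull request."""
    if adaptive and len(entity_ids) > ADAPTIVE_THRESHOLD:
        # Optimized update: only entities known to have changed.
        changed = set(changed_ids)
        return [e for e in entity_ids if e in changed]
    # Full update: recovers from stale caches or earlier failures.
    return list(entity_ids)
```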


Update priorities

You can assign different priorities to updates to process them faster and retry them quicker and more often in case of failures. See Update priorities.


Update process

Locking

Before we start an update, we temporarily lock the entity; if another update tries to modify the same entity in the meantime, it will fail. This prevents two simultaneous updates from writing to the same entity and potentially corrupting data in a way that's difficult to recover from.

The lock is automatically removed when the update finishes, when the update fails, or after your PHP max_execution_time has been reached.
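The locking behavior can be sketched as follows: a concurrent acquisition fails immediately rather than waiting, and a lock whose holder exceeded max_execution_time is treated as free. The class and default timeout are illustrative:

```python
import time

class EntityLock:
    """Per-entity update lock with automatic expiry after PHP's
    max_execution_time (the default of 30s below is illustrative)."""

    def __init__(self, max_execution_time=30, clock=time.monotonic):
        self.max_execution_time = max_execution_time
        self.clock = clock
        self.locks = {}  # entity_id -> acquisition time

    def acquire(self, entity_id):
        now = self.clock()
        held_since = self.locks.get(entity_id)
        # A stale lock (held longer than max_execution_time) counts as free.
        if held_since is not None and now - held_since < self.max_execution_time:
            return False  # a concurrent update fails instead of waiting
        self.locks[entity_id] = now
        return True

    def release(self, entity_id):
        self.locks.pop(entity_id, None)
```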


Missing dependency manager

When a reference can't be resolved at the time of the update, we flag the entity so that whenever the referenced content becomes available, we also update the original content and fill in the missing entity reference. E.g. Article A references Article B, but Article A is created before Article B, so the reference can't be resolved at first; after Article B is added to the site, Article A is re-saved with the missing reference to Article B filled in.
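The flag-and-resolve flow can be sketched like this. The data structures and function names are illustrative, not the module's internals:

```python
# referenced_id -> list of entity ids waiting for that reference
missing = {}

def save_entity(store, entity_id, references):
    """Save an entity; flag it for each reference that isn't available yet,
    and re-save anything that was waiting for this entity."""
    resolved = [r for r in references if r in store]
    for r in references:
        if r not in store:
            missing.setdefault(r, []).append(entity_id)
    store[entity_id] = {"references": resolved}
    # Fill in the now-resolvable reference on waiting entities.
    for waiting_id in missing.pop(entity_id, []):
        store[waiting_id]["references"].append(entity_id)
```

Running the Article A / Article B example from the text: saving Article A first leaves its reference list empty, and saving Article B afterwards re-saves Article A with the reference filled in.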


Follow-up entity cache clear

When Drupal is under heavy load, a race condition can cause the entity cache to store outdated content. This not only serves outdated content to users, but also means you might lose changes when editing the content on the target site, and you may even lose translations on follow-up updates.

To avoid this, Content Sync can make a follow-up request to clear the entity cache again for Highly Critical updates.
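The follow-up clear can be sketched as a second, deferred cache invalidation that removes any stale entry a concurrent load may have re-cached in the meantime. Everything below is an illustrative sketch, not Content Sync's implementation:

```python
def apply_update(cache, entity_id, priority, followups):
    """Write an update: clear the cache entry, and for Highly Critical
    updates queue a follow-up clear to run later."""
    cache.pop(entity_id, None)       # initial cache clear
    # ... the updated entity is written to storage here ...
    if priority == "highly_critical":
        followups.append(entity_id)  # queued follow-up cache clear

def run_followups(cache, followups):
    """Later: clear the cache again for all queued entities."""
    while followups:
        cache.pop(followups.pop(), None)
```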


Observability, Troubleshooting and Recovery

Update status (observability)

Content Sync allows you to view every single update of every single piece of content to every single site. See Update status.


Troubleshooting

Content Sync provides many features that make it easy to troubleshoot potential issues. Besides detailed log messages, we save all failed updates with their full request details, so it's easy to investigate issues, understand in detail what happened, and even replay the requests in different environments to debug. Request details include Content Sync log messages, the timing of every entity update within an update request, and the request and response bodies and headers. See Troubleshooting.


Failure states

We handle unexpected failures at every step of the process. When an update fails in Drupal (e.g. a network issue prevents the site from connecting to our service), we not only log the issue but also save it as a flag on the entity so that we can recover from this failure automatically once the issue is resolved.


Automated recovery

If you have experienced multiple failures, e.g. from a bad code deployment, you can use our Push/Pull failed update mode to automatically retry all failed updates. See mass updates.