I’m building a local-first web app, and SQLite works well for my case since a single project can be contained in one database file, just like users are used to with existing desktop applications.
What I’d really like is an easy way to sync the SQLite database state to a cloud service. Most existing options expect you to query against a remotely hosted database and charge per read/write.
Since the database will have around 100,000 rows and you're typically working with all the data at once, streaming parts of it doesn’t make sense for my use case.
The closest I’ve found is Turso, which has offline writes in private beta, and SQLite Cloud, which lists local-first and offline sync as "coming soon."
The simplest approach might be letting users push to S3 storage with versioning. Ideally, it would also support point-in-time restores, tracking incremental updates alongside full snapshots.
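As a rough sketch of what I mean here (assuming boto3, a placeholder bucket name, and SQLite's online backup API to get a consistent copy of a live database):

    # Push a full snapshot of the SQLite file to a versioned S3 bucket.
    import sqlite3
    import tempfile
    import boto3

    BUCKET = "my-app-backups"          # placeholder bucket name
    KEY = "projects/project-1.db"      # placeholder object key

    s3 = boto3.client("s3")

    # One-time setup: versioning makes every upload a restorable version.
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    def push_snapshot(db_path: str) -> None:
        """Take a consistent snapshot of a live SQLite DB and upload it."""
        src = sqlite3.connect(db_path)
        with tempfile.NamedTemporaryFile(suffix=".db") as tmp:
            dst = sqlite3.connect(tmp.name)
            src.backup(dst)            # online backup: safe while the DB is in use
            dst.close()
            s3.upload_file(tmp.name, BUCKET, KEY)
        src.close()

Versioning would cover point-in-time restores of full snapshots; tracking incremental updates would need something layered on top of this.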
Even better, I’d manage minimal server-side infrastructure and just pull the SQLite database from a service that handles syncing and management.
Maybe I am misunderstanding which part you want in the cloud, but that sounds like Litestream. It lets you transparently back up a live SQLite database to a remote destination.
https://litestream.io/
How about: Have 1 + N separate SQLite database-files.
Each user would have their own database-file which contains only information about that user. Then 1 shared database-file which contains info needed for all users.
Users would update their own data, which is a small database file which can be easily uploaded. They would not need to update the shared data.
Not knowing your app, I can't say what the shared data would contain, but presumably there is some. Perhaps the shared data-file would be updated on the server based on the individual user data that users upload.
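A minimal sketch of that layout (file names and the shared table are made up for illustration); SQLite's ATTACH lets one connection see both the user's own file and the shared one:

    import sqlite3

    def open_for_user(user_id: int) -> sqlite3.Connection:
        """Open the user's own database-file and attach the shared one."""
        conn = sqlite3.connect(f"user_{user_id}.db")            # only this user's data
        conn.execute("ATTACH DATABASE 'shared.db' AS shared")   # the 1 shared file
        return conn

    conn = open_for_user(42)

    # Write to the user's own (small, easily uploaded) file...
    conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    conn.commit()

    # ...and read from the shared file without ever writing to it.
    rows = conn.execute("SELECT * FROM shared.reference_data").fetchall()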
In this multi-tenant arrangement, you run into synchronization problems.
Developers should expect users to connect to the service using multiple devices (clients).
AFAIK bare SQLite doesn't offer synchronization mechanisms between multiple SQLite instances.
I believe Turso offers some solution of this kind, but I'm not sure whether it's open source.
> Benefits of SQLite-Per-Partition
The obvious caveat here is any situation where you need global tables. I've never worked on a product where 100% of the schema could be partitioned like this. I've done a ton of SQLite-per-<entity type>, but there's always been a singleton SQLite database above this tracking the metadata.
Looking up a user session via token is not an analytics task. Neither is checking if an email address is already registered. These are very pedestrian queries that demand global visibility, ideally with hard transactional guarantees.
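Concretely, the singleton DB I mean looks something like this (hypothetical schema, with everything else living in the per-entity files):

    import sqlite3

    # The one global database that sits above the per-entity partitions.
    global_db = sqlite3.connect("global.db")
    global_db.executescript("""
        CREATE TABLE IF NOT EXISTS users (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE      -- global uniqueness needs global visibility
        );
        CREATE TABLE IF NOT EXISTS sessions (
            token   TEXT PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id)
        );
    """)

    def user_for_token(token: str):
        """Session lookup by token: answered by the global DB, not a partition."""
        row = global_db.execute(
            "SELECT user_id FROM sessions WHERE token = ?", (token,)
        ).fetchone()
        return row[0] if row else None

    def email_taken(email: str) -> bool:
        """Is this email already registered? Also a global question."""
        return global_db.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)
        ).fetchone() is not None

    def partition_db(entity_id: int) -> sqlite3.Connection:
        """Everything else lives in one SQLite file per entity."""
        return sqlite3.connect(f"entity_{entity_id}.db")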
Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently? Don’t they need to coordinate themselves outside of transactions?
> Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently?
If you have hundreds of services hammering the datastore concurrently, then the ability to offer transactional guarantees seems absolutely mandatory to me.
> Don’t they need to coordinate themselves outside of transactions?
I think we're back at the beginning of our journey here.
Do you mean via distributed transactions? I'm clueless in that area, and don't understand how hundreds of self-sufficient services would take part in transactions that span different sets of those services without entangling them badly.
Would Redis or another in-memory DB be better suited for this task? I don't think a regular RDBMS will meet this requirement at scale.
Hmm, this article is a little confusing. I'm not familiar with Vitess or Citus, but I am familiar with "manually" sharded Postgres/MySQL, and I'm not sure I understand whether there are any "interaction effects" between the decision to shard and the choice between MySQL/Postgres and SQLite.
Like, the article's three sections are:
1. The challenges of sharding
2. The benefits of these new sharded SQLite solutions over conventional SQLite
3. A list conflating the benefits of SQL databases generally with the benefits of SQLite
None of which answer the question of "Why should I use sharded SQLite instead of, say, sharded Postgres, for hyperscale?"
If you think this is a good fit for your case, you should embed SQLite in your application and shard the application itself. An embedded SQLite database is faster and uses less memory than a PostgreSQL instance running as a separate process, possibly on a different machine.
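A minimal sketch of what I mean, with the shard count and file naming made up: route each key to one of N embedded SQLite files via a stable hash, so the sharding logic lives in the application itself.

    import sqlite3
    import zlib

    NUM_SHARDS = 16   # made-up shard count

    def shard_for(key: str) -> sqlite3.Connection:
        # zlib.crc32 is stable across processes, unlike Python's built-in hash()
        shard = zlib.crc32(key.encode()) % NUM_SHARDS
        return sqlite3.connect(f"shard_{shard:02d}.db")

    conn = shard_for("tenant-1234")
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("INSERT INTO events (payload) VALUES (?)", ("signup",))
    conn.commit()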
TDLib (Telegram Database library)[0] uses SQLite.
In the Telegram Bot API, each TDLib instance handles more than 24,000 active bots simultaneously.[1]
[0]: https://github.com/tdlib/td
[1]: https://core.telegram.org/tdlib
Looks interesting. How would you approach a single writer with multiple read replicas in this setup? My holy grail would be allowing users to read from a read replica but also write to it, with the replica forwarding those writes to the primary write DB.
One of the classic solutions to this was to put the data behind a cluster and route by verb. All the GET requests go to the replicas and everything else goes to the writeable instance.
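Roughly, in Python (hostnames are placeholders):

    import random

    PRIMARY = "db-primary.internal"
    REPLICAS = ["db-replica-1.internal", "db-replica-2.internal"]

    def backend_for(http_method: str) -> str:
        """Route by verb: reads fan out to replicas, everything else hits the primary."""
        if http_method in ("GET", "HEAD"):
            return random.choice(REPLICAS)   # read traffic -> any replica
        return PRIMARY                       # writes -> the single writable instance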
Suffers from stale reads: write to the primary, get confirmation of write success, reload the page, and the user sees stale data from the read replica and thinks you lost their data.
There is no need for a read replica, because every micro SQLite partition has a very low workload (single-digit queries per second).
Author here, happy to answer questions!
I love the idea of Durable Objects, and have thoughts about using them in the health space (although having a single provider of them is probably a bad thing), but cross-cutting queries such as this one seem quite difficult to achieve.
From the chat logs example in your article: how do you cope with a requirement such as "I as a user want to see all my latest thread activity in one place, across all my chat rooms?"
That's an easy case: write to two places (one for the chat and one for the user's activity).
In this case I think you can let them become inconsistent in the face of, e.g., write errors.
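A sketch of that dual write, with per-room and per-user SQLite files standing in for the two objects (names and schema are made up):

    import sqlite3
    import time

    def post_message(room_id: str, user_id: str, body: str) -> None:
        ts = time.time()

        # Write 1: the chat room's own log.
        room = sqlite3.connect(f"room_{room_id}.db")
        room.execute("CREATE TABLE IF NOT EXISTS messages (ts REAL, user_id TEXT, body TEXT)")
        room.execute("INSERT INTO messages VALUES (?, ?, ?)", (ts, user_id, body))
        room.commit()

        # Write 2: the user's cross-room activity feed; if this fails we accept
        # a temporarily inconsistent feed rather than failing the whole message.
        feed = sqlite3.connect(f"user_{user_id}_activity.db")
        feed.execute("CREATE TABLE IF NOT EXISTS activity (ts REAL, room_id TEXT, body TEXT)")
        feed.execute("INSERT INTO activity VALUES (?, ?, ?)", (ts, room_id, body))
        feed.commit()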
> From the chat logs example in your article: how do you cope with a requirement such as "I as a user want to see all my latest thread activity in one place, across all my chat rooms?"
create a copy, for example
Interesting read, thank you!
Do you use any special tools to manage all these separate databases, track performance and debug problems?
SQLite is a very good technology.