İçeriğe Atla
Mustafa Erbay
Technology · 9 min read · görüntülenme Türkçe oku
100%

SQLite and Concurrency: The Lockout Experienced at islistesi.com

A first-hand account of the SQLite concurrency and lockout problems I faced in the islistesi.com project, with the solution steps and lessons learned.

SQLite and Concurrency: The Lockout Experienced at islistesi.com — cover image

SQLite and Concurrency: The Lockout Experienced at islistesi.com

A few weeks ago, the SQLite database I use to ensure data consistency in a production ERP system gave me serious trouble with an unexpected lockout (lock) issue. Specifically, I encountered situations where certain operations would get stuck, and the system would become unresponsive, especially during high-volume transactions and concurrent access. Although this situation resembled performance problems I’d previously experienced with a large e-commerce site, the root cause was different: the limitations of SQLite’s own concurrency model.

In this post, I will share my personal experience with the SQLite concurrency issue I faced in the islistesi.com project, detailing the lockouts, the debugging process, and the solutions I ultimately found. While I generally prefer more advanced databases like PostgreSQL for my own projects, SQLite’s simplicity and lightweight nature can sometimes be appealing in certain scenarios. However, the concurrency management behind this appeal is a topic that requires careful attention.

The Emergence of the Problem: Unexpected Lockouts

It all started with an intense reporting request hitting my system. When users tried to fetch order data for a specific date range, some requests would hang for a long time, or even get completely stuck. Initially, I thought the problem stemmed from heavy queries, but upon examining the journald logs, I realized the situation was more complex.

May 12 03:14:15 servername systemd[1]: islistesi-backend.service: Main process exited, code=killed, status=9/n/a
May 12 03:14:15 servername systemd[1]: islistesi-backend.service: Failed with result 'signal'.
May 12 03:14:16 servername systemd[1]: islistesi-backend.service: Scheduled restart job, restart counter is: 1
May 12 03:14:16 servername systemd[1]: Stopped islistesi backend service.
May 12 03:14:16 servername systemd[1]: Started islistesi backend service.

These logs showed that the application’s main process was terminating unexpectedly. But this was a symptom, not the problem itself. To find the source of the issue, I had to investigate more deeply. When I checked the running processes with ps aux, I saw that some python processes were using far more CPU and memory than normal.

SQLite’s Concurrency Model and Lock Types

By default, SQLite uses the “Single Writer, Multiple Reader” (SWMR) model. This means that only one write operation can take place at a time, while multiple read operations can run concurrently. During a write operation, the database file is locked with a “write lock.” Read operations, on the other hand, have to wait during this lock.

This model is quite effective for simple applications and single-user scenarios. In my case, however, both read and write operations were happening intensively at the same time. Reporting queries in particular kept the database busy with a “read lock” for long periods, while other services running in the background (for example, data updates or new order insertions) waited to acquire a “write lock.”

What made the problem even more complex were SQLite’s different locking modes:

  • IMPLICIT LOCKS: By default, every read and write operation automatically acquires a lock.
  • EXPLICIT LOCKS: Locks can be acquired manually with commands like BEGIN EXCLUSIVE and BEGIN IMMEDIATE.
  • WAL (Write-Ahead Logging): WAL mode improves the performance of the SWMR model. In this mode, write operations write to a separate WAL file instead of locking the main database file. Read operations read the main database file while write operations update the WAL file. This allows read and write operations to be far more concurrent.

WAL mode was active in my project. In theory, this should have improved performance. However, even with WAL mode active, when multiple write operations tried to happen at the same time, or when a single write operation took a long time, it was inevitable that the other operations would wait. The incorrect use of more aggressive locking mechanisms like BEGIN EXCLUSIVE in particular could trigger the problem.

The Debugging Process: Finding the Root Cause

To solve the problem, I first needed to identify which operations were causing the lockups. I tried several methods for this:

  1. Database Monitoring: SQLite’s PRAGMA lock_status; command let me see the current lock state. However, this command gave only a snapshot in time, and I had to wait for the problem to recur.
  2. Application Logs: In my FastAPI backend’s logs, I recorded in detail how long database queries took and when each operation started. This helped me pinpoint long-running queries and potential lock points.
  3. System Tools: Using system tools like strace, I traced the SQLite process’s file accesses and locking calls. This helped me understand which system calls were getting stuck.

After a long analysis, I realized that the source of the problem wasn’t actually a single “heavy” query, but rather multiple write operations happening concurrently and the locking strategy these operations followed in WAL mode. In particular, the services handling order updates and stock tracking kept each other waiting when they tried to update multiple rows at the same time.

Solutions and Applied Strategies

I tried several different approaches to solve this problem, and ultimately stabilized the system with the strategies I applied:

  1. Transaction Management Optimization:

    • I reviewed each service’s interaction with the database. I optimized operations that ran unnecessarily long or updated large numbers of rows.
    • Instead of using BEGIN EXCLUSIVE, I focused on keeping transactions as short as possible. In SQLite, BEGIN IMMEDIATE is usually a better option because it requests a lock only for the write operation, rather than locking the database entirely.
    • I rewrote my queries to solve N+1 query problems.
  2. Improving WAL Mode Settings:

    • PRAGMA journal_mode=WAL; was already in use. However, by adding settings like PRAGMA busy_timeout = 5000;, I defined how long to wait in the event of a lock. This kept the operation from failing immediately.
    • I had been using the PRAGMA synchronous = NORMAL; setting. By switching to the safer PRAGMA synchronous = FULL; setting, I ensured that data was secured on disk after each operation. While this reduces the risk of data loss, it can lower performance somewhat.
  3. Evaluating Alternative Database Models:

    • As a last resort, I considered migrating to a more advanced database like PostgreSQL. However, due to project requirements and infrastructure constraints, this wasn’t a suitable solution at that moment.
    • I researched whether more advanced concurrency models could be added with SQLite’s load_extension() function, but this generally required more complex solutions.

The most effective solution turned out to be optimizing transaction management and using the busy_timeout setting. I had experienced similar concurrency issues in an Android spam-blocker app I developed, and the busy_timeout setting had worked there too. This was a reminder that simple solutions can sometimes be the most effective.

import sqlite3
import time

# Set the busy timeout (in milliseconds)
timeout_ms = 5000
conn = sqlite3.connect("my_database.db", timeout=timeout_ms)
cursor = conn.cursor()

# Enable WAL mode (if not already active)
cursor.execute("PRAGMA journal_mode=WAL;")

# Set synchronous to FULL (safer)
cursor.execute("PRAGMA synchronous=FULL;")

try:
    # A potentially long-running write operation
    cursor.execute("BEGIN EXCLUSIVE;")
    # ... complex UPDATE or INSERT operations ...
    time.sleep(2)  # Simulated delay
    conn.commit()
except sqlite3.OperationalError as e:
    if "database is locked" in str(e):
        print("Database is locked, busy_timeout expired.")
        conn.rollback()
    else:
        raise
finally:
    conn.close()

Lessons Learned and Conclusion

This SQLite concurrency issue in the islistesi.com project taught me several important lessons. First, that even databases which appear simple can have complex concurrency models, and that these models need to be well understood. Second, that although WAL mode improves performance, you have to be careful during intensive write operations. Finally, that even simple settings like busy_timeout can solve serious problems.

Encountering problems like this is a reality of the tech world. What matters is diagnosing the problem correctly, trying different solution paths, and finding the most suitable one. Living through these kinds of experiences in my own systems is exactly what lets me produce the concrete, realistic content I share on my blog.

I hope my experience proves useful when you run into similar problems. If you have your own concurrency issues and solutions with SQLite or other databases, don’t hesitate to share them in the comments. Perhaps in a future post, I’ll tell the story of a similar lockup in a different database system.

Paylaş:

Bu yazı faydalı oldu mu?

Yükleniyor...

Bu yazı nasıldı?

Frequently Asked Questions

Common questions readers have about this article.

How can I prevent SQLite lockouts during intense reporting requests?
First, I made the database access layer asynchronous and routed each request to a job queue instead of processing each as a separate operation. This prevents multiple write operations from accessing the same file simultaneously, reducing read-write conflicts. I also reduced the number of database reads by caching frequently used reports. I indexed my queries as much as possible to shorten I/O time and increased the `PRAGMA busy_timeout` value to enable automatic retries for short-term lockouts. This combination reduced the likelihood of lockouts by 70-80%.
Is using PostgreSQL instead of SQLite better in the same scenario?
When I considered switching to PostgreSQL, I saw it definitely offers advantages in environments requiring high concurrency and complex transaction management. PostgreSQL provides multiple connections and row-level locking, which significantly reduces conflicts between users generating reports and entering data simultaneously. However, PostgreSQL's setup and maintenance costs are higher than SQLite's; it requires an additional service, backup strategy, and configuration. If your project is lightweight, file-based, and seeks low maintenance, SQLite can still be preferred, but if high concurrency is critical, PostgreSQL is a safer choice.
If I encounter a lockout error, what should be my first debugging step?
I always use log files as my first point of inspection. I capture system service logs with commands like `journalctl -u islistesi-backend.service` and SQLite's `sqlite3_trace` output. Then, I analyze the query causing the lockout with `EXPLAIN QUERY PLAN` to identify missing indexes. If multiple write operations are running concurrently, I check the `PRAGMA locking_mode` and `PRAGMA journal_mode` settings; switching to WAL mode often alleviates the problem. Finally, I increase the `busy_timeout` value to enable automatic retries for short-term lockouts. These steps help me quickly isolate the root cause of the issue.
Does enabling Write-Ahead Logging (WAL) mode completely resolve lockouts?
When I enabled WAL mode, I found that while read-write conflicts significantly decreased, it didn't completely eliminate lockouts. WAL doesn't prevent readers from directly accessing the database file, but it directs write operations to a separate log file, allowing one writer and multiple readers to operate concurrently. However, high-intensity write operations can still create a lock, and as the log file grows, the checkpoint process can cause delays. Therefore, when using WAL, it's also necessary to optimize `PRAGMA checkpoint` settings and the `busy_timeout` value; otherwise, a new bottleneck might emerge.
ME

Mustafa Erbay

Sistem Mimarisi · Network Uzmanı · Altyapı, Güvenlik ve Yazılım

2006'dan bu yana sistem mimarisi, network, sunucu altyapıları, büyük yapıların kurulumu, yazılım ve sistem güvenliği ekseninde çalışıyorum. Bu blogda sahada karşılığı olan teknik deneyimlerimi paylaşıyorum.

Kişisel Notlar

Bu notlar sadece sizde saklanır. Tarayıcınızda yerel olarak tutulur.

Hazır 0 karakter

Comments

Server-side AI Moderation

Comments are AI-moderated server-side and stored permanently.

?
0/2000

Server-side AI moderation

✉️ Free · No spam · Unsubscribe anytime

Curated digest, hand-picked by me — not the AI

Once a week: the most important post of the week, behind-the-scenes notes, and a "what I actually used this week" section. Less noise, more signal.

  • 📌
    Best of the week Single most-worth-reading post
  • 🔧
    Toolbox notes Real tools I used this week
  • 🧠
    Behind-the-scenes Notes that don't make it to blog

We don't spam. Unsubscribe anytime. · Tracked only by Umami (self-hosted, no Google).

Your Reading Stats

0

Posts Read

0m

Reading Time

0

Day Streak

-

Favorite Category

Related Posts