When you use Django's get_or_create, use unique with it

TL;DR: get_or_create only guarantees a single row under concurrency when the lookup fields are also covered by unique or unique_together.

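The Django ORM provides a feature called get_or_create that you will not find in many other frameworks' ORMs. Taken literally, it means: if a row with the given data already exists in the database, do not create one, and if no such row exists, create it.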

Is that really true?

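Recently I was asked to check the data and the logic because a table contained two rows for a value that was supposed to be unique. The logic in question was written with get_or_create, so I had assumed the row would naturally stay unique. But I began to suspect that when several transactions run at the same time, transaction isolation keeps each session from seeing the others' uncommitted data, so the get fails in all of them at once and two or more rows are created. Reading the source of Django's get_or_create, I could not see any code that handles this case.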
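The suspected race can be sketched without a database at all. The snippet below is only an illustration under simplifying assumptions: `FakeTable` and `racy_get_or_create_steps` are hypothetical names, not Django APIs, and a generator is used so that two "sessions" can be interleaved by hand between the lookup and the insert.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTable:
    """In-memory stand-in for a table with no uniqueness constraint."""
    rows: list = field(default_factory=list)

    def get(self, data):
        # Return the first matching value, or None when nothing matches.
        matches = [row for row in self.rows if row == data]
        return matches[0] if matches else None

    def create(self, data):
        self.rows.append(data)
        return data


def racy_get_or_create_steps(table, data):
    """Get-then-create split into two steps, pausing after the lookup."""
    found = table.get(data)
    yield "checked"
    if found is None:
        # The lookup result may be stale by now, but nothing re-checks it.
        table.create(data)
        yield "created"
    else:
        yield "found"


table = FakeTable()
session_a = racy_get_or_create_steps(table, 1623568361312)
session_b = racy_get_or_create_steps(table, 1623568361312)

next(session_a)  # session A looks up the value and finds nothing
next(session_b)  # session B looks up the value and also finds nothing
outcome_a = next(session_a)  # session A inserts
outcome_b = next(session_b)  # session B inserts the same value again
duplicate_count = table.rows.count(1623568361312)
```

With this interleaving both sessions report that they created the row, and the table ends up holding the same value twice.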

I ran an experiment to find out, using PostgreSQL 13.3.

# models.py

from django.db import models

class TestTable(models.Model):
    data = models.BigIntegerField()

# management/commands/benchmark.py

from django.core.management.base import BaseCommand
from testapp.models import TestTable
import time

class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        while True:
            TestTable.objects.get_or_create(data=time.time() * 1000)

Running the benchmark in a single session: it behaves as expected.

select data from testapp_testtable group by data having count(*) > 1;

 data
------
(0 rows)

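Running the benchmark in two sessions at the same time: it crashes with an error.

Note: the same error occurred regardless of the database isolation level. I tried each of the following: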

psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED
psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ
psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE

(.venv) λ ~/dist/get_or_create/ ./manage.py benchmark
Traceback (most recent call last):
  File "./manage.py", line 22, in <module>
    main()

  ...

  File "/Users/youngminz/dist/get_or_create/.venv/lib/python3.8/site-packages/django/db/models/query.py", line 439, in get
    raise self.model.MultipleObjectsReturned(
testapp.models.MultipleObjectsReturned: get() returned more than one TestTable -- it returned 2!
(.venv) λ ~/dist/get_or_create/

I changed the code to ignore the error.

class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        while True:
            try:
                TestTable.objects.get_or_create(data=time.time() * 1000)
            except TestTable.MultipleObjectsReturned:
                pass

As a result, a large number of duplicate rows were created.

select data from testapp_testtable group by data having count(*) > 1;

 data
---------------
 1623568361312
 1623568360484
 1623568361256
 1623568358832
 1623568361526
 ...
(4343 rows)

Next, I added unique=True to the field.
```
class TestTable(models.Model):
    data = models.BigIntegerField(unique=True)
```
I also reverted the change that made the benchmark ignore the error. This time no duplicates appeared, even with three concurrent sessions instead of two.
```
select data from testapp_testtable group by data having count(*) > 1;

 data
------
(0 rows)
```

Why does this happen?

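Looking at the Django documentation for get_or_create, I found a warning about exactly this. Once again the computer was not at fault; I had used the method without reading the documentation properly.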

This method is atomic assuming that the database enforces uniqueness of the keyword arguments (see unique or unique_together). If the fields used in the keyword arguments do not have a uniqueness constraint, concurrent calls to this method may result in multiple rows with the same parameters being inserted.
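To illustrate why a uniqueness constraint closes the gap, here is a minimal sketch using the standard library's sqlite3 module rather than Django and PostgreSQL. The `get_or_create` function, its `after_miss` hook, and the `test_table` schema are assumptions made for this example, and Django's real implementation differs in detail. The idea is that the database rejects the second insert, and the loser of the race falls back to reading the row that won.

```python
import sqlite3


def get_or_create(conn, data, after_miss=None):
    """Return (row_id, created) for the given data value."""
    select = "SELECT id FROM test_table WHERE data = ?"
    row = conn.execute(select, (data,)).fetchone()
    if row is not None:
        return row[0], False
    if after_miss is not None:
        # Hypothetical hook: lets a competing "session" sneak in
        # between our failed lookup and our insert.
        after_miss()
    try:
        cursor = conn.execute(
            "INSERT INTO test_table (data) VALUES (?)", (data,)
        )
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected our insert, so another session
        # created the row first. Fetch that row instead.
        row = conn.execute(select, (data,)).fetchone()
        return row[0], False


# isolation_level=None puts the connection in autocommit mode.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE test_table (id INTEGER PRIMARY KEY, data INTEGER UNIQUE)"
)


def competing_insert():
    # Simulates the other session winning the race.
    conn.execute(
        "INSERT INTO test_table (data) VALUES (?)", (1623568361312,)
    )


row_id, created = get_or_create(
    conn, 1623568361312, after_miss=competing_insert
)
row_count = conn.execute("SELECT COUNT(*) FROM test_table").fetchone()[0]
```

Even though the lookup missed, only one row exists afterwards and the caller is told it did not create it. Without the UNIQUE constraint the except branch would never run, and the second insert would simply succeed.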

Conclusion

If code that calls get_or_create can run concurrently, the lookup fields must also be covered by unique or unique_together. This holds for MySQL as well as PostgreSQL.