How to Get Foreign Keys Horribly Wrong

56 points by Bogdanp 4 days ago

cogman10 10 hours ago

This sort of thing hasn't really done much to make me like ORMs.

It seems like a lot of code to generate the tables in the first place and you STILL need to read the output scripts just to ensure the ORM isn't generating some garbage you didn't want.

That seems like a lot of extra effort when a simple migration service (such as liquibase) could do the same work running SQL directly. No question on "which indexes are getting created and why". No deep knowledge of Django interactions with sql. Instead, it's just directly running the SQL you want to run.

wvenable 9 hours ago

I do read my migration scripts generated from an ORM to make sure my source code is correct.
Liquibase starts with "Write your database change code in your preferred authoring tool in SQL, YAML, JSON, or XML." So instead of having my ORM generate the change scripts and reading them to ensure correctness, I have to write them manually? I don't see how that's comparable.
Liquibase could certainly come in after I have some SQL scripts generated from my ORM and do whatever it does.

teaearlgraycold 10 hours ago

I would say automatic migration generation isn’t a necessary or particularly important part of an ORM. ORMs are there to map your database relational objects to your client language’s objects.
- cjs_ac 10 hours ago
  
  I think the person you're replying to is arguing for using some sort of database migration library without using an ORM library. It's the same position I came to recently.
  - teaearlgraycold 9 hours ago
    
    Yes but they seem to have switched because they didn’t like ORM-generated migration code. I think that’s a bad reason to switch because it wasn’t an important part of ORMs in the first place. Basically, I want to know why they were even using ORMs before.
    I don’t want to go without an ORM because I’ll end up making one ad-hoc anyway. I’m not going to do work on raw tuples in my application code.
- Tostino 10 hours ago
  
  I'd call it an anti-feature for most long-lived projects that will end up needing migrations through its lifetime.
  I go the liquibase route for migrations, and just use the mapping portion of any ORM.
- pphysch 10 hours ago
  
  Most(?) devs nowadays are introduced to database migration tools as a DX feature.
  "Wow, 1-2 commands and my app and database are in sync!"
  In reality, migration tools are 100% about data loss prevention.
  If you do not care about data loss, updating your schema is trivial, just drop everything and create. Dev environments should be stateless anyways, using separate data "fixtures" when needed.
  Data loss itself is a highly nuanced topic. Some data is replaceable, some might be protected in a separate store. So I agree that ORMs should challenge the assumption that automatic migration tools need to be part of their kitchen sink.
  - wagwang 6 hours ago
    
    The ORM auto migration tools are 100% a DX feature. Obviously any serious application will have complicated migrations that outgrow the generated SQL; that doesn't mean it's not a nice-to-have feature for quick iteration.
  - teaearlgraycold 9 hours ago
    
    I like that they provide the basic structure of how to apply yet unseen migrations. But they don’t need to generate the SQL at all. You quickly learn to never trust the generated code. It always needs to be manually reviewed.
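
The "basic structure of how to apply yet unseen migrations" mentioned above can be sketched in a few lines without any SQL generation. This is a minimal illustration, not how Liquibase, Django, or Alembic work internally. It assumes SQLite (via the standard-library `sqlite3` module) as the database, and the `schema_migrations` table, the `MIGRATIONS` list, and the `apply_pending` function are hypothetical names made up for this example.

```python
import sqlite3

# Hypothetical, hand-written migrations: (version, SQL) pairs.
MIGRATIONS = [
    (1, "CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "CREATE TABLE book (id INTEGER PRIMARY KEY, "
        "author_id INTEGER NOT NULL REFERENCES author(id))"),
    (3, "CREATE INDEX book_author_id_idx ON book (author_id)"),
]


def apply_pending(conn, migrations):
    """Apply migrations not yet recorded in schema_migrations; return their versions.

    Assumes `conn` was opened with isolation_level=None so that the explicit
    BEGIN / COMMIT statements below control the transaction.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    seen = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version, sql in sorted(migrations):
        if version in seen:
            continue
        # The schema change and its bookkeeping row share one transaction,
        # so a failure leaves neither behind.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:", isolation_level=None)
first_run = apply_pending(conn, MIGRATIONS)   # applies versions 1, 2, 3
second_run = apply_pending(conn, MIGRATIONS)  # nothing left to apply
print(first_run, second_run)
```

The SQL stays exactly what was written by hand; the runner only tracks which versions have already been applied.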

aidos 9 hours ago

I’ve done a lot of interviewing and I’ve discovered that many devs (even experienced ones) don’t understand the difference between indexes and foreign keys.

My assumption is that people have used orms that automatically add the index for you when you create a relationship so they just conflate them all. Often they’ll say that a foreign key is needed to improve the performance and when you dig into it, their mental model is all wrong. The sense they have is that the other table gets some sort of relationship array structure to make lookups fast.

It’s an interesting phenomenon of the abstraction.

Don’t get me wrong, I love sqlalchemy and alembic but probably because I understand what’s happening underneath so I know the right way to hold it so things are efficient and migrations are safe.
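
A small sketch of the distinction, assuming SQLite through Python's standard-library `sqlite3` module as a stand-in (other engines may differ in detail, and the `author`/`book` schema is hypothetical). It shows that the foreign key is recorded on the referencing table and that declaring it does not, in SQLite, create an index on the referencing column; the index is a separate object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"
)


def index_names(conn, table):
    """Return the names of all indexes SQLite reports for `table`."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


# The constraint lives on the child table and points at the parent.
fk = conn.execute("PRAGMA foreign_key_list(book)").fetchone()
fk_target = (fk[2], fk[3], fk[4])  # (referenced table, from column, to column)

# Declaring the foreign key created no index on book.author_id.
before = index_names(conn, "book")

# The index has to be created explicitly (or by an ORM on your behalf).
conn.execute("CREATE INDEX book_author_id_idx ON book (author_id)")
after = index_names(conn, "book")

print(fk_target, before, after)
```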

alexjplant 4 hours ago

During a work meeting I once suggested using a non-PK column in a Postgres database for a foreign key. A coworker confidently said that we shouldn't because joins would be slow. I pointed out that we could create an index on that column and they rebutted by claiming that PKs created some kind of "special" index. I didn't want to burn goodwill and so didn't push it further but it always struck me as silly.
Depending upon the database storage engine, available memory, and table size I could see there being _some_ performance hit if only PKs are used for statistics but I'd think that modern RDBMSes are smart enough to cache appropriately. Am I missing something?
- quectophoton an hour ago
  
  > and they rebutted by claiming that PKs created some kind of "special" index
  Maybe they were thinking about something like the "clustered indexes" from SQL Server, and mistakenly thought PostgreSQL also worked like that:
  > "When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table doesn't already exist and you don't specify a unique nonclustered index." [1]
  > "Clustered indexes sort and store the data rows in the table or view based on their key values." [2]
  So I'm guessing you could squeeze some extra performance for certain access patterns, maybe? I have not worked at any place where I had needed to worry about low level details like this, though, so obligatory disclaimer to take this comment with a grain of salt due to my lack of first-hand experience.
  [1]: https://learn.microsoft.com/en-us/sql/relational-databases/i...
  [2]: https://learn.microsoft.com/en-us/sql/relational-databases/i...

Fishkins 8 hours ago

Huh, that's interesting. Mixing indexes and FKs is a major conceptual error.
FWIW, I've also asked everyone I've interviewed in the past decade about indexes and FKs. Most folks I've talked to seem to understand FKs. They're often fuzzier on the details of indexes, but I don't recall anyone conflating the two.
- aidos 6 hours ago
  
  I guess it depends on how much time you’ve spent in a relational db. For people who mostly interact with them via an orm, I can see where the confusion comes from.

bevr1337 8 hours ago

> their mental model is all wrong.
Is it? In Postgres, all FK references must be to a column with a PK or unique constraint or part of another index. Additionally, Postgres and Maria (maybe all SQL?) automatically create indexes for PKs and unique constraints. There's a high likelihood that a foreign key is already indexed _in the other table_.
Generally, I agree with your statement. Adding a FK won't magically improve performance or create useful indices. But, the presence of a FK or refactoring to support a FK does (tangentially) point back to that index.
- aidos 6 hours ago
  
  I wasn’t totally clear on my original statement. As you point out, the referenced columns in the referenced table need to have a unique constraint and that’s done with a unique index. My understanding is that this ensures there’s no ambiguity as to which row is referenced and allows for efficient enforcement of the FK constraint.
  Django automatically creates an index on the referencing table to ensure that joins are fast. The fact that you have the relationship in the ORM means that’s how you’re likely to access the data so it makes perfect sense.
  The mental model mismatch I’ve seen is that people appear to think of the relationship as being on the parent object “pointing” at the child table.
  - bevr1337 2 hours ago
    
    I'll admit my experience in Django is only migrating customers off Django. Thanks for adding some interesting details about that ecosystem
- ak39 8 hours ago
  
  By definition, a FK has to reference a PK in the “parent”.
  - aidos 6 hours ago
    
    Not quite. It can reference any combination of columns with a unique index (of which the PK is one by definition).
- UltraSane 7 hours ago
  
  Yes. Not understanding the difference means you really don't understand the relational model. It would be like a network engineer not understanding the difference between IP and MAC addresses.
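
The point made in this subthread, that a foreign key can reference any uniquely constrained column and not only the primary key, can be checked with a short sketch. It assumes SQLite via the standard-library `sqlite3` module rather than Postgres, and the `country`/`city` schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# `code` is not the primary key, but it is UNIQUE,
# so it can be the target of a foreign key.
conn.execute(
    "CREATE TABLE country ("
    " id INTEGER PRIMARY KEY,"
    " code TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE city ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " country_code TEXT NOT NULL REFERENCES country(code))"
)
conn.execute("INSERT INTO country (code) VALUES ('NZ')")
conn.execute("INSERT INTO city (name, country_code) VALUES ('Wellington', 'NZ')")

# A row pointing at a code that does not exist is rejected by the constraint.
try:
    conn.execute("INSERT INTO city (name, country_code) VALUES ('Atlantis', 'XX')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

city_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
print(orphan_rejected, city_count)
```

The unique index on `country.code` is what lets the engine identify the referenced row unambiguously; nothing here adds an index on `city.country_code`.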

whyowhy3484939 8 hours ago

Very strange if you ask me and disturbing. I don't know if I'd let such a dev touch a database. Of course nowadays we just vibe code and YOLO everything, but still. This is making me feel old.

hobs 5 hours ago

An index is one thing (and important and good), but an FK allows you to completely eschew IO if done right. In other words "I guarantee that all values in this list exist in that list" is a great simple optimization path and some sql engines can use it to avoid joining data or checking for existence at all.

miggol 6 hours ago

I don't want to defend Django here, surely this should be categorized as a bug. But on the other hand, for this situation to come up you have to be the following:

- The kind of person to dive into the schema and worry about an unnecessary index

- Smart enough to heed Django's warnings and use `UniqueConstraint` in `Meta.constraints`

- Dumb enough to ignore Django's warnings and not use `Meta.indexes`

I think it's funny that the kind of dev that 100% relies on the ORM and would benefit from this warning would probably never find themselves in this gritty optimization situation in the first place.

That being said, I enjoyed the article and learned something so maybe I'm the target audience and not them.

jihadjihad 11 hours ago

> Django will implicitly add an index on a ForeignKey field unless explicitly stated otherwise.

This is nice to know if you're using Django, but as important to note is that neither Postgres nor SQLAlchemy / Alembic will do this automatically.

rrauenza 10 hours ago

How can we determine if an index can be satisfied by a constraint index?

For example, does the FK need to be the first field in a unique together?
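
One way to explore this is to ask the query planner directly. The sketch below assumes SQLite through Python's standard-library `sqlite3` module, so it only shows SQLite's behaviour; on Postgres the equivalent check would be `EXPLAIN` against your own schema, and the planner may choose differently. The `placement` table and the `plan` helper are hypothetical. The general rule of thumb for B-tree indexes is that a composite index serves lookups on its leading column(s) well, while a lookup on a trailing column alone cannot seek into it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE shelf (id INTEGER PRIMARY KEY)")
# UNIQUE (author_id, shelf_id) is backed by one composite index,
# with author_id as its leading column.
conn.execute(
    "CREATE TABLE placement ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES author(id),"
    " shelf_id INTEGER NOT NULL REFERENCES shelf(id),"
    " UNIQUE (author_id, shelf_id))"
)


def plan(conn, sql):
    """Return the human-readable detail strings of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filtering on the leading column: the planner can seek into the unique index.
leading = plan(conn, "SELECT id FROM placement WHERE author_id = 1")
# Filtering on the trailing column only: no seek is possible, so it scans.
trailing = plan(conn, "SELECT id FROM placement WHERE shelf_id = 1")

print(leading)
print(trailing)
```

In this sketch the first plan is reported as a `SEARCH` using the automatically created unique index and the second as a `SCAN`, which suggests that a separate index on a foreign key column is redundant only when that column leads an existing index.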

dakiol 7 hours ago

Is this for real? I don’t know why anyone would deal with such an amount of incidental complexity (django orm) when one can just use plain sql.

twelve40 7 hours ago

why is this so surprising? every place i worked at, going back probably 6 jobs, was using an ORM (django, hibernate, or even a self-built one), they went on to get acquired by Twitter, Microsoft, Uber etc, so not completely stupid or obscure. Even if you have a personal dislike of ORMs, if you ever work with/for another team with an existing codebase and a DB, chances are you will have to work with one.