DB Overhaul Trauma: Running Lemmy on Postgres, Failing Drives, and Null Logos
The platform recently went through a run of major infrastructure changes: a move from OVH to dedicated hardware, and critical upgrades such as migrating the pict-rs database from sled-db to Postgres. In addition, an unscheduled failure forced the replacement of two Samsung MZVL2512HCJQ-00B07 drives in a RAID 1 array during a single maintenance window.
The discussion is dominated by infrastructure risk. 'Illecors' flagged the pict-rs migration as dangerously complex because the sled database is 'not stateless.' More critically, 'Illecors' warned that admins must set the site logo to NULL to keep lemmy-ui from crashing with a 500 error during image handling. The controversy centers on risk management: is it better to accept temporary data loss or system instability, such as 'split brain' conditions, than to risk a total service blackout?
The clear takeaway is that the platform requires sweeping, difficult infrastructure replacement. The community understands that the scope includes both object storage migrations and database changes. The fault lines run between those who document the necessary patches and those who fear the inherent instability of the ongoing migration.
Key Points
The pict-rs database must move from sled-db to Postgres.
Multiple source discussions highlight this as a necessary but complex change to the underlying data store.
System instability risks during migration are paramount.
The core debate is whether accepting temporary data loss is preferable to risking a complete service outage.
Setting the site logo to NULL prevents application crashes.
'Illecors' specified this technical fix is required to keep the lemmy-ui from throwing a 500 error.
The pict-rs migration is inherently risky.
'Illecors' noted the process is highly complex because the sled database state cannot be ignored.
Hardware failures necessitate unscheduled downtime.
One incident required a two-hour maintenance window to replace failing RAID 1 drives.
Some users reported noticeable speed improvements.
'wise_pancake' reported browsing and loading speeds were significantly snappier post-maintenance.
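The site logo fix attributed to 'Illecors' above amounts to a single UPDATE that nulls the logo field. The sketch below is illustrative only: the source discussions do not give the exact statement, so the table name `site` and column name `icon` are assumptions, and Python's built-in sqlite3 stands in for Postgres so the example runs on its own. The helper `clear_site_logo` is hypothetical, not part of Lemmy.

```python
import sqlite3


def clear_site_logo(conn, table="site", column="icon"):
    """Set the logo column to NULL wherever it is set; return rows changed.

    The table and column names are assumptions for illustration, not
    confirmed Lemmy schema.
    """
    # SQL identifiers cannot be bound as parameters, so reject anything
    # that is not a plain identifier before interpolating it.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unexpected identifier")
    cur = conn.execute(
        f"UPDATE {table} SET {column} = NULL WHERE {column} IS NOT NULL"
    )
    conn.commit()
    return cur.rowcount


# Stand-in database: an in-memory SQLite table shaped like the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT, icon TEXT)")
conn.execute(
    "INSERT INTO site (name, icon) VALUES (?, ?)",
    ("example", "https://example.invalid/pictrs/image/logo.png"),
)

changed = clear_site_logo(conn)
remaining = conn.execute("SELECT icon FROM site").fetchone()[0]
print(changed, remaining)
```

On a real instance the equivalent statement would be run against the Postgres database, ideally after a backup and after confirming the actual column name for the deployed Lemmy version.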
Source Discussions (4)
This report was synthesized from the following Lemmy discussions, ranked by community score.