Opened 9 years ago

Last modified 3 years ago

#330 new task

document current back-up and recovery strategy

Reported by: bewelcome@…
Owned by: planetcruiser
Priority: blocker
Milestone: unassigned
Component: ServerSetup
Keywords: technical: security, emergency, escalation
Cc:

Description

Hey,

Most of you probably know the story about the reliability and availability of couchsurfing.com.

So for BW to be on the safe side, we should start implementing some basic parts of security as early as possible:

What about backup plans?

What about Load balancing?

Decentralization?

Escalation plans?

Regular backup verification?

Database management? Backups here? Backup verification?

Infrastructure for learning about drawbacks, failures etc?

Someone explicitly responsible for emergency procedures? Reliable infrastructure for guaranteeing the operability of the back office and the reachability of the emergency/escalation teams?

... more to come :)

Cheers,

Nils
(runia)

Change History (21)

comment:1 Changed 9 years ago by philipp

thanks for the reminder - you got a private message

comment:2 Changed 9 years ago by guaka

  • freq_reported set to 1
  • show_on_bw set to 0

I think this is way too general to leave this bug open.

comment:3 Changed 9 years ago by guaka

  • Summary changed from technical background to emergency plan, back-up

comment:4 Changed 9 years ago by philipp

  • Component changed from FrameWork to ServerSetup
  • Owner set to philipp hannu
  • Type changed from new feature to developer task

comment:5 Changed 9 years ago by feuerdaemon

  • Milestone changed from unassigned to BigPicture

comment:6 Changed 9 years ago by philipp

  • Milestone changed from BigPicture to unassigned

Milestone BigPicture deleted

comment:7 Changed 8 years ago by globetrotter_tt

  • Owner changed from philipp hannu to fake51, tobixen

comment:8 Changed 8 years ago by fake51

  • Owner changed from fake51, tobixen to fake51 tobixen

comment:9 Changed 7 years ago by fake51

  • Owner changed from fake51 tobixen to fake51
  • Status changed from new to assigned

comment:10 Changed 7 years ago by fake51

  • Cc tobixen added

comment:11 Changed 6 years ago by planetcruiser

  • Owner changed from fake51 to planetcruiser
  • Summary changed from emergency plan, back-up to document current back-up and recovery strategy

this is super important. i will look into this and write up a little summary about the current state of things. input as comments here is welcome :)

comment:12 Changed 6 years ago by planetcruiser

  • Priority changed from critical to blocker

making this stick out from all the "critical" bugs..

comment:13 Changed 6 years ago by jeanyves

Still super important.

The lessons I learned from the big mistake I made in September (regardless of the fact that forgetting the "WHERE" in an SQL query is a trap anyone can fall into):

1) Backups should not be encrypted, or at least the encryption details should be known to several people; the most recent backup should be kept on the production server.

2) In case of a problem, taking BW down to allow time for a proper repair is better than trying to repair it with members online.

3) The disk setup is important (a DB on a 5 GB disk is clearly not enough, and not only because of the binary files problem).

Last edited 6 years ago by jeanyves
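
As an illustration of the missing "WHERE" trap mentioned in comment:13, here is a minimal sketch of a guard that runs a write statement inside a transaction and rolls it back when it touches more rows than expected. It is not taken from the BW code base: `guarded_execute`, `max_rows` and the `members` table are hypothetical names, and SQLite from the Python standard library is used only to keep the example self-contained.

```python
import sqlite3


def guarded_execute(conn, sql, params=(), max_rows=1):
    """Run a write statement; roll back if it changes more than max_rows rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        # More rows touched than the caller expected, e.g. a forgotten WHERE.
        conn.rollback()
        raise RuntimeError(
            f"statement affected {cur.rowcount} rows, "
            f"expected at most {max_rows}; rolled back"
        )
    conn.commit()
    return cur.rowcount


def demo():
    # Hypothetical table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO members (status) VALUES (?)", [("active",)] * 3
    )
    conn.commit()

    # Intended update: exactly one row.
    changed = guarded_execute(
        conn, "UPDATE members SET status = ? WHERE id = ?", ("inactive", 1)
    )

    # Accidental update without WHERE: the guard rolls it back.
    try:
        guarded_execute(conn, "UPDATE members SET status = ?", ("deleted",))
        blocked = False
    except RuntimeError:
        blocked = True

    statuses = [
        row[0] for row in conn.execute("SELECT status FROM members ORDER BY id")
    ]
    conn.close()
    return changed, blocked, statuses
```

A guard like this only limits the damage of one class of mistake; it does not replace a tested backup and restore procedure.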

comment:14 Changed 5 years ago by James_Oder_Dave

Add to the list above.

  • Backups copied to another physical data center (fire, natural disaster risk etc).
  • Most recent backup left on same server or in same data center in order to make for a quicker restore if needed.
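
The two points above could be sketched roughly as follows: one copy of the latest dump stays close to the server for a quick restore, a second copy goes to a different location, and both are checked against the checksum of the original. This is only a sketch under assumptions. The function names and directory arguments are hypothetical, and `offsite_dir` stands in for wherever storage in the other data center is mounted; a real setup would more likely transfer the file with a dedicated tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(dump_path, local_dir, offsite_dir):
    """Copy a dump to a local and an off-site directory and verify both copies."""
    dump_path = Path(dump_path)
    expected = sha256_of(dump_path)
    copies = []
    for target_dir in (Path(local_dir), Path(offsite_dir)):
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / dump_path.name
        shutil.copy2(dump_path, target)
        # Detect truncated or corrupted copies right away.
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return expected, copies
```

A matching checksum only shows that the copy is intact. Whether the dump can actually be restored still has to be verified by restoring it into a scratch database.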

comment:15 Changed 5 years ago by TimLoal

  • Priority changed from blocker to critical

comment:16 Changed 5 years ago by jsfan

  • Milestone Future deleted

comment:17 Changed 4 years ago by planetcruiser

  • Milestone set to unassigned
  • Priority changed from critical to blocker

always was and still is super important. not just that urgent ;)

comment:18 Changed 4 years ago by planetcruiser

  • Cc tobixen removed

this is not to be seen as a blocker in release cycles, but in terms of importance. i see no other way to express hyper-importance for system admin in trac right now. maybe a separate tracker for sys admin?

a reminder: an oversight in their backup strategy almost killed couchsurfing once!

comment:19 Changed 3 years ago by leoalone

Upped, since it is a recurring problem. (see messages on bw dev list)

comment:20 Changed 3 years ago by leoalone

again ...

comment:21 Changed 3 years ago by shevek

  • Status changed from assigned to new