update README and create a Maintenance.md file

This commit is contained in:
Jonathan Math 2020-11-06 14:40:58 +07:00
parent cbd08902c5
commit c452728f82
2 changed files with 128 additions and 64 deletions

Maintenance.md Normal file

@@ -0,0 +1,128 @@
### Bootstrapping Search **** is this still applicable?
Once you have an elasticsearch server running, you'll want to bootstrap it with feed and story indexes:
```
./manage.py index_feeds
```
Stories will be indexed automatically.
If you need to move search servers and want to delete everything in the search database, you need to reset the MUserSearch table. Run `make shell`, then:
```
>>> from apps.search.models import MUserSearch
>>> MUserSearch.remove_all()
```
### If feeds aren't fetching:
Check that the `tasked_feeds` queue is empty. You can drain it by running `make shell`, then:
```
Feed.drain_task_feeds()
```
This happens when a deploy on the task servers hits faults and the task servers lose their
connection without giving the tasked feeds back to the queue. Feeds that fall through this
crack are automatically fixed after 24 hours, but if many feeds fall through due to a bad
deploy or electrical failure, you'll want to accelerate that check by just draining the
tasked feeds pool, adding those feeds back into the queue. This command is idempotent.
## In Case of Downtime
You got the downtime message either through email or SMS. This is the order of operations for determining what's wrong.
0a. If downtime goes over 5 minutes, go to Twitter and say you're handling it. Be transparent about what it is;
NewsBlur's followers are largely technical. Also, the 502 page points users to Twitter for status updates.
0b. Ensure you have `secrets-newsblur/configs/hosts` installed in your `/etc/hosts` so server hostnames
work.
1. Check www.newsblur.com to confirm it's down.
If you don't get a 502 page, then NewsBlur isn't even reachable and you just need to contact [the
hosting provider](https://cloudsupport.digitalocean.com/s/createticket) and yell at them.
2. Check which servers can't be reached on the HAProxy stats page. The basic auth credentials are in `secrets/configs/haproxy.conf`; search the secrets repo for "gimmiestats".
Typically it'll be mongo, but any of the redis or postgres servers can be unreachable due to
acts of god. Otherwise, a frequent cause is lack of disk space. There are monitors on every DB
server watching for disk space, emailing me when they're running low, but it still happens.
3. Check [Sentry](https://app.getsentry.com/newsblur/app/) and see if the answer is at the top of the
list.
This will show if a database (redis, mongo, postgres) can't be found.
4. Check the various databases:
a. If Redis server (db_redis, db_redis_story, db_redis_pubsub) can't connect, redis is probably down.
SSH into the offending server (or just check both the `db_redis` and `db_redis_story` servers) and
check if `redis` is running. You can often `tail -f -n 100 /var/log/redis.log` to find out if
background saving was being SIG(TERM|INT)'ed. When redis goes down, it's always because it's
consuming too much memory. That shouldn't happen, so check the [munin
graphs](http://db_redis/munin/).
Boot it with `sudo /etc/init.d/redis start`.
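The `tail -n 100` check above is a manual scan for interrupted background saves. As a minimal sketch of the same scan in Python (the log phrasing matched here is an assumption based on common redis log wording and varies by version, so treat the pattern as a starting point):

```python
import re

# Assumed phrasing: redis logs lines like "Background saving terminated by
# signal" and "Received SIGTERM scheduling shutdown..."; adjust per version.
KILLED_SAVE = re.compile(
    r"SIGTERM|SIGINT|Background saving (terminated|error)", re.IGNORECASE
)

def find_killed_saves(log_lines):
    """Return the log lines that suggest a background save was killed."""
    return [line for line in log_lines if KILLED_SAVE.search(line)]
```

Usage mirroring the manual check: `find_killed_saves(open("/var/log/redis.log").readlines()[-100:])`.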
b. If mongo (db_mongo) can't connect, mongo is probably down.
This is rare and usually signifies hardware failure. SSH into `db_mongo` and check the logs with `tail
-f -n 100 /var/log/mongodb/mongodb.log`. Try starting mongo with `sudo /etc/init.d/mongodb start`. If
the machine itself is failing, promote one of the secondaries (the next largest mongodb server) to
primary, kill the offending primary machine, and rebuild it (preferably at a larger size). I
recommend waiting a day to rebuild it so that you get a different machine. Don't forget to lodge a
support ticket with the hosting provider so they know to check the machine.
If it's the db_mongo_analytics machine, there are no backups and no secondaries of the data (because
it's ephemeral and used for, you guessed it, analytics). You can easily provision a new mongodb
server and point to that machine.
If mongo is out of space, which happens, the servers need to be re-synced every 2-3 months to
compress the data bloat. Simply `rm -fr /var/lib/mongodb/*` and re-start Mongo. It will re-sync.
If both secondaries are down, then the primary Mongo will go down. You'll need a secondary mongo
in the sync state at the very least before the primary will accept reads. It shouldn't take long to
get into that state, but you'll need a mongodb machine setup. You can immediately reuse the
non-working secondary if disk space is the only issue.
c. If postgresql (db_pgsql) can't connect, postgres is probably down.
This is the rarest of the rare and has in fact never happened. Machine failure. If you can salvage
the db data, move it to another machine. Worst case you have nightly backups in S3. The fabfile.py
has commands to assist in restoring from backup (the backup file just needs to be local).
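Each of the database checks above starts with the same question: can the server be reached at all? Before SSHing in, a plain TCP probe answers that quickly. A sketch (the hostnames assume the `secrets-newsblur` hosts file from step 0b is installed; the ports are the stock defaults, not confirmed for this deployment):

```python
import socket

def port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Assumed defaults: redis 6379, mongo 27017, postgres 5432.
DB_PORTS = [("db_redis", 6379), ("db_mongo", 27017), ("db_pgsql", 5432)]

def unreachable(checks=DB_PORTS):
    """Return the host:port pairs that did not accept a connection."""
    return [f"{host}:{port}" for host, port in checks if not port_open(host, port)]
```

Anything listed by `unreachable()` is where to start SSHing.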
5. Point to a new/different machine
a. Confirm the IP address of the new machine with `fab list_do`.
b. Change `secrets-newsblur/configs/hosts` to reflect the new machine.
c. Copy the new `hosts` file to all machines with:
```
fab all setup_hosts
```
d. Changes should be instant, but you can also bounce every machine with:
```
fab web deploy
fab task celery
```
e. Monitor `utils/tlnb.py` and `utils/tlnbt.py` for lots of reading and feed fetching.
6. If feeds aren't fetching, check that the `tasked_feeds` queue is empty. You can drain it by running `make shell`, then:
```
Feed.drain_task_feeds()
```
This happens when a deploy on the task servers hits faults and the task servers lose their
connection without giving the tasked feeds back to the queue. Feeds that fall through this
crack are automatically fixed after 24 hours, but if many feeds fall through due to a bad
deploy or electrical failure, you'll want to accelerate that check by just draining the
tasked feeds pool, adding those feeds back into the queue. This command is idempotent.

README.md

@@ -98,70 +98,6 @@ reader, and feed importer. To run the test suite:
`make test`
## Keeping NewsBlur Running **** is this still applicable?
These commands keep NewsBlur fresh and updated. While on a development server, these
commands do not need to be run more than once. However, you will probably want to run
the `refresh_feeds` command regularly so you have new stories to test with and read.
### Fetching feeds **** is this still applicable?
If you just want to fetch feeds once, you can use the `refresh_feeds` management command:
```
./manage.py refresh_feeds
```
If you want to fetch feeds regardless of when they were last updated:
```
./manage.py refresh_feeds --force
```
You can also fetch the feeds for a specific user:
```
./manage.py refresh_feeds --user=newsblur
```
You'll want to put this `refresh_feeds` command on a timer to keep your feeds up to date.
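The three invocations above differ only in their flags. As a purely illustrative helper (not part of NewsBlur's codebase), building the argv in one place keeps a timer script honest about which variant it runs:

```python
def refresh_feeds_argv(force=False, user=None):
    """Build the argv for ./manage.py refresh_feeds.
    The flag names mirror the examples above; the helper itself is a sketch."""
    argv = ["./manage.py", "refresh_feeds"]
    if force:
        argv.append("--force")
    if user:
        argv.append(f"--user={user}")
    return argv
```

Pass the result to `subprocess.run` from whatever timer mechanism (cron, systemd) you use.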
### Feedback **** is this still applicable?
To populate the feedback table on the homepage, use the `collect_feedback` management
command every few minutes:
```
./manage.py collect_feedback
```
### Statistics **** is this still applicable?
To populate the statistics graphs on the homepage, use the `collect_stats` management
command every few minutes:
```
./manage.py collect_stats
```
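`collect_feedback` and `collect_stats` are both "every few minutes" jobs, and cron is the obvious home for them. As a development-box sketch (the job list mirrors the commands documented above; the runner itself is an assumption, not NewsBlur code):

```python
import subprocess

def run_once(commands):
    """Run each management command; return how many exited with status 0."""
    ok = 0
    for argv in commands:
        # check=False: keep going even if one command fails.
        if subprocess.run(argv, check=False).returncode == 0:
            ok += 1
    return ok

# Assumed periodic job list, taken from the commands above.
PERIODIC = [
    ["./manage.py", "collect_feedback"],
    ["./manage.py", "collect_stats"],
]
```

Call `run_once(PERIODIC)` from cron or a simple sleep loop.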
## Author
* Created by [Samuel Clay](http://www.samuelclay.com).