FeedHQ
FeedHQ is a simple, lightweight web-based feed reader. Main features:
User-facing features
- RSS and ATOM support
- Grouping by categories
- Awesome pagination and intelligent browsing
- Great readability on all screen sizes (smatphones, tablets and desktops)
- Mobile-first, retina-ready
- Reading list management with Instapaper, Readability or Read It Later support
- Filter out already read entries
- Hides images/media by default (and therefore filters ads and tracking stuff)
- Multiple user support
- OPML import
- Syntax highlighting, awesome for reading tech blogs
- Keyboard navigation
- Subtome support
Developer- / Sysadmin-facing features
- Nice with web servers, uses ETag and Last-Modified HTTP headers
- Handles HTTP status codes nicely (permanent redirects, gone, not-modified…)
- Exponential backoff support
- PubSubHubbub support
Installation
Requirements:
- Python 3.4 or greater
- Redis (2.6+ recommended)
- PostgreSQL (9.2+ recommended but anything >=Â 8.4 should work)
- Elasticsearch (see compatibility table below)
Getting the code:
git clone https://github.com/feedhq/feedhq.git cd feedhq virtualenv -p python2 env source env/bin/activate add2virtualenv . pip install -r requirements.txt
Elasticsearch version requirements:
Commit hash | Date | ES version |
---|---|---|
Up to 3aea18 | Sep 18, 2014 | 1.1 |
From 6d5cbc | Oct 28, 2014 | 1.3 |
Configuration
FeedHQ relies on environment variables for its configuration. The required environment variables are:
DJANGO_SETTINGS_MODULE
: set it tofeedhq.settings
.SECRET_KEY
: set to a long random string.ALLOWED_HOSTS
: space-separated list of hosts which serve the web app. E.g.www.feedhq.org feedhq.org
.FROM_EMAIL
: the email address that sends automated emails (password lost, etc.). E.g.FeedHQ <feedhq@example.com>
.REDIS_URL
: a URL for configuring redis. E.g.redis://localhost:6354/1
.DATABASE_URL
: a heroku-like database URL. E.g.postgres://user:password@host:port/database
.
Optionally you can customize:
DEBUG
: set it to a non-empty value to enable the Django debug mode.MEDIA_ROOT
: the absolute location where media files (user-generated) are stored. This must be a public directory on your webserver available underMEDIA_URL
.MEDIA_URL
: the URL that handles media files (user-generated) served fromMEDIA_ROOT
. By default, it is set to/media/
.STATIC_ROOT
: the absolute location where static files (CSS/JS files) are stored. This must be a public directory on your webserver available under the/static/
URL.STATIC_URL
: the URL that serves static files (CSS/JS files) located inSTATIC_ROOT
. By default, it is set to/static/
.SENTRY_DSN
: a DSN to enable Sentry error reporting.SESSION_COOKIE_PATH
: the path set on the session cookie. E.g.,/
.HTTPS
: set-it to a non-empty value to configure FeedHQ for SSL access.EMAIL_URL
: a URL for configuring email. E.g.smtp://user:password@host:port/?backend=my.EmailBackend&use_tls=true
. Thebackend
querystring parameter sets the DjangoEMAIL_BACKEND
setting. By default emails only go to the development console.ES_NODES
: space-separated list of elasticsearch nodes to use for cluster discovery. Defaults tolocalhost:9200
.ES_INDEX
: the name of the elasticsearch index. Defaults tofeedhq
.ES_ALIAS_TEMPLATE
: string template to format the per-user alias that is created in elasticsearch. Defaults tofeedhq-{0}
.ES_SHARDS
: number of shards that you want in your elasticsearch index. This cannot be changed once you have created the index. Defaults to 5.ES_REPLICAS
: number of times you want your shards to be replicated on the elasticsearch cluster. Defaults to 1. Set it to 0 if you only have one node. This is only used at index creation but the index setting can be altered dynamically.HEALTH_SECRET
: a shared secret between your FeedHQ server(s) and your monitoring. The/health/
endpoint can be protected by requiring clients to provide a shared secret in anX-Token
header. If no secret is set, the health endpoint is open to all.LOG_SYSLOG
: set it to1
to send logs to/dev/log
instead of stdout (the default). Logs are formatted in JSON.
For integration with external services:
READITLATER_API_KEY
: your Pocket API key.INSTAPAPER_CONSUMER_KEY
,INSTAPAPER_CONSUMER_SECRET
: your Instapaper API keys.READABILITY_CONSUMER_KEY
,READABILITY_CONSUMER_SECRET
: your Readability API keys.
Then deploy the Django app using the recipe that fits your installation. More
documentation on the Django deployment guide. The WSGI application is
located at feedhq.wsgi.application
.
To create the Elasticsearch index:
django-admin.py create_index
Then you'll need to create the appropriate PostgreSQL database and run:
django-admin.py syncdb django-admin.py migrate
Note that additionally to the web server, you need to run one or more
consumers for the task queue. This is done with the rqworker
management
command:
django-admin.py rqworker store high default low favicons
The arguments are queue names.
Once your application is deployed (you've run django-admin.py syncdb
to
create the database tables, django-admin.py migrate
to run the initial
migrations and django-admin.py collectstatic
to collect your static
files), you can add users to the application. On the admin interface, add
as many users as you want. Then add some some categories and feeds to your
account using the regular interface.
Crawl for updates:
django-admin.py sync_scheduler django-admin.py updatefeeds
Set up a cron job to update your feeds on a regular basis. This puts the oldest-updated feeds in the update queue:
*/5 * * * * /path/to/env/django-admin.py updatefeeds
The updatefeeds
command puts 1/12th of the feeds in the update queue. Feeds
won't update if they've been updated in the past 60 minutes, so the 5-minute
period for cron jobs distributes nicely the updates along the 1-hour
period.
A cron job should also be set up for picking and updating favicons (the
--all
switch processes existing favicons in case they have changed, which
you should probably do every month or so):
@monthly /path/to/env/bin/django-admin.py favicons --all
Here is a full list of management commands that you should schedule:
add_missing
creates the missing denormalized URLs for crawling. Since URLs are denormalized it's recommended to run it every now and then to ensure consistency.Recommended frequency: hourly.
Resource consumption: negligible (2 database queries).
delete_unsubscribed
is the delete counterpart ofadd_missing
.Recommended frequency: hourly.
Resource consumption: negligible (2 database queries).
favicons --all
forces fetching the favicons for all existing URLs. It's useful for picking up new favicons when they're updated. Depending on your volume of data, this can be resource-intensive.Recommended frequency: monthly.
Resource consumption: the command itself only triggers async jobs but the jobs perform network I/O, HTML parsing, disk I/O and database queries.
updatefeeds
picks 1/12th of the URLs and fetches them.Recommended frequency: every 5 minutes.
Resource consumption: the command itself only triggers async jobs but the jobs perform network I/O, HTML parsing and -- when updates are found -- database queries.
sync_scheduler
adds missing URLs to the scheduler. Also useful to run every now and then.Recommended frequency: every hour.
Resource consumption: one large database query per chunk of 10k feeds which aren't in the scheduler, plus one redis
HMSET
per URL that's not in the scheduler. As a routine task it's not resource-intensive.sync_pubsubhubbub
unsubscribes from unneeded PubSubHubbub subscriptions.Recommended frequency: once a day.
Resource consumption: low.
clean_rq
removes stale RQ jobs.Recommended frequency: once a day.
Resource consumption: low. Only makes requests to Redis.
delete_old
removes expired entries as determined by each user's entry TTL.Recommended frequency: once a day.
Resource consumption: medium, makes
delete_by_query
queries to ES (1 per user).delete_expired_tokens
removes expired API tokens. Tokens are valid for 7 days, after which they are renewed by client apps.Recommended frequency: once a day.
Resource consumption: low (one
DELETE
query).
Development
Install the development requirements:
pip install -r requirements-dev.txt
Run the tests:
make test
Or if you want to run the tests with django-admin.py
directly, make sure
you use feedhq.test_settings
as the DJANGO_SETTINGS_MODULE
environment
variable to avoid making network calls while running the tests.
The Django debug toolbar is enabled when the DEBUG
environment variable is
true and the django-debug-toolbar
package is installed.
Foreman is used in development to start a lightweight Django server and run
RQ workers. Environment variables are managed using a python port of
Daemontools' envdir
utility. A running Redis server is required for
this workflow:
make run
When running django-admin.py updatefeeds
on your development machine,
make sure you have the DEBUG
environment variable present to avoid making
PubSubHubbub subscription requests without any valid callback URL.
Environment variables for development are set in the envdir
directory. For
tests, they are located in the tests/envdir
directory.
When working on frontend assets (SCSS or js files), watchman can be used to
automatically run compass and uglify on file changes. Install watchman,
Compass (gem install bundle && bundle install
) and npm (part of
nodejs) to get started. Then run:
make watch
Once you're done working with assets, simply kill watchman:
pkill watchman