What is RudderStack?
Short answer: RudderStack is an open-source Segment alternative written in Go, built for the enterprise.
Long answer: RudderStack is a platform for collecting, storing and routing customer event data to dozens of tools. RudderStack is open-source, can run in your cloud environment (AWS, GCP, Azure or even your data-centre) and provides a powerful transformation framework to process your event data on the fly.
RudderStack runs as a single Go binary with Postgres. It also needs destination-specific (e.g. GA, Amplitude) transformation code, which consists of Node.js scripts. This repo contains the core backend and the transformation modules of Rudder. The client SDKs are in a separate repo (link below).
Rudder server is released under the AGPLv3 License.
See the HackerNews discussion around RudderStack.
Questions? Read our Docs, join our Discord channel, or email soumyadeb at rudderlabs.com.
Try RudderStack?
You can use the cloud hosted RudderStack instance to experience the product. Click here.
Features
- Production Ready: Multiple companies (e.g. MatterMost, IFTTT, Grofers, 1mg and more) are running RudderStack for collecting events.
- Extreme Scale: One of our largest installations is sending 300M events/day with a peak of 40K req/sec via a multi-node RudderStack setup.
- Segment API Compatible: RudderStack is Segment API and library compatible, so you don't need to change your app if you are using Segment.
- Cloud Destinations: Google Analytics, Amplitude, MixPanel, Adjust, AppsFlyer and dozens more destinations.
- Warehouse Destinations: S3, Minio, Redshift, Snowflake, Google BigQuery support.
- Transformations: User-specified transformation to filter/transform events.
- Rich UI: Written in React.
- SDKs: JavaScript, Android, iOS and server-side SDKs.
- Detailed Docs: Docs
Why RudderStack?
We are building RudderStack because we believe open-source and cloud-prem are important for three main reasons:
- Privacy & Security: You should be able to collect and store your customer data without sending everything to a 3rd party vendor or embedding proprietary SDKs (and getting blocked by ad-blockers). With RudderStack, the event data is always in your control. Besides, RudderStack gives you fine-grained control over what data to forward to what analytical tool.
- Processing Flexibility: You should be able to enhance or transform your event data by combining it with your other internal data, e.g. data stored in your transactional systems. RudderStack makes that possible because it provides a powerful JS-based event transformation framework. Furthermore, since RudderStack runs inside your cloud or on-prem environment, you can access your production data to join with the event data.
- Unlimited Events: Event volume-based pricing of most commercial systems is broken. You should be able to collect as much data as possible without worrying about overrunning event budgets. RudderStack's core backend is open-source and free to use.
Contribution
We would love to see people contributing to RudderStack. See CONTRIBUTING.md for more information on contributing to RudderStack.
Stay Connected
- Join our Discord
- Follow RudderStack on Twitter
UI Pages
Connections Page
Events Page
Setup Instructions (Hosted Demo Account)
- Go to the dashboard and set up your account.
- Select RudderStack Hosted Service from the top right corner after you log in.
- Follow the Send Test Events instructions below to send a test event.
Setup Instructions (Docker)
The docker setup is the easiest & fastest way to try out RudderStack.
- Go to the dashboard at https://app.rudderstack.com and set up your account. Copy your workspace token from the top of the home page. The hosted control plane is free (and always will be) for open-source users. Note: if you don't want to use the hosted control plane, you can use the open-source config-generator UI to create the source and destination configs and pass them to RudderStack. The open-source generator, however, lacks certain features such as Transformations and the Live Event Debugger.
- If you have a GitHub account with an SSH key added, clone the repo with
git clone git@github.com:rudderlabs/rudder-server.git
Move to the directory with
cd rudder-server
and update the rudder-transformer submodule with
git submodule init && git submodule update
(Optional) If you don't have an SSH-enabled GitHub account or prefer HTTPS, clone the repo with
git clone https://github.com/rudderlabs/rudder-server.git
Move to the directory with
cd rudder-server
and change the rudder-transformer submodule path to HTTPS with
sed -i.bak 's,git@github.com:rudderlabs/rudder-transformer.git,https://github.com/rudderlabs/rudder-transformer.git,g' .gitmodules
Then update the rudder-transformer submodule with
git submodule init && git submodule update
- Replace <your_workspace_token> in build/docker.env with the above token.
- (Optional) Uncomment and set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in build/docker.env if you want to add S3 as a destination on the UI.
- Run the command
docker-compose up --build
to bring up all the services.
- Follow the Send Test Events instructions below to send a test event.
Setup Instructions (Kubernetes)
Note: This is the recommended way of installing RudderStack for running in production. Our hosted deployment runs on Kubernetes, so we maintain and patch it much more frequently.
- Go to the dashboard at https://app.rudderstack.com and set up your account. Copy your workspace token from the top of the home page.
- Our helm scripts and instructions are in a separate repo - Download Here
Setup Instructions (Native Installation)
Disclaimer: This is not the recommended way of installing RudderStack. Please use this if you want to know more about the internals.
- Install Golang 1.13 or above. Download Here
- Install NodeJS 10.6 or above. Download Here
- Install PostgreSQL 10 or above and set up the DB. If you are on a Linux distribution, you have to switch to the postgres user
sudo su - postgres
before running the commands below.
psql -c "CREATE DATABASE jobsdb"
psql -c "CREATE USER rudder SUPERUSER"
psql "jobsdb" -c "ALTER USER rudder WITH ENCRYPTED PASSWORD 'rudder'";
psql "jobsdb" -c "GRANT ALL PRIVILEGES ON DATABASE jobsdb to rudder";
- Go to the dashboard and set up your account. Copy your workspace token from the top of the home page.
- If you have a GitHub account with an SSH key added, clone the repo with
git clone git@github.com:rudderlabs/rudder-server.git
Move to the directory with
cd rudder-server
and update the rudder-transformer submodule with
git submodule init && git submodule update
(Optional) If you don't have an SSH-enabled GitHub account or prefer HTTPS, clone the repo with
git clone https://github.com/rudderlabs/rudder-server.git
Move to the directory with
cd rudder-server
and change the rudder-transformer submodule path to HTTPS with
sed -i.bak 's,git@github.com:rudderlabs/rudder-transformer.git,https://github.com/rudderlabs/rudder-transformer.git,g' .gitmodules
Then update the rudder-transformer submodule with
git submodule init && git submodule update
- Navigate to the transformer directory
cd rudder-transformer
- Install dependencies
npm i
and start the destination transformer
node destTransformer.js
- Navigate back to the main directory
cd rudder-server
Copy the sample.env to the main directory
cp config/sample.env .env
- Update the WORKSPACE_TOKEN environment variable with the token fetched in step 4.
- Run the backend server
go run -mod=vendor main.go
- Follow the Send Test Events instructions below to send a test event.
Send Test Events
- If you already have a Google Analytics account, keep the tracking ID handy. If not, please create one and get the tracking ID. The Google Analytics account needs to have a Web Property (Web+App doesn't seem to work).
- Create one source (Android or iOS) and configure a Google Analytics destination for the same with the above tracking ID
- We have bundled a shell script that can generate test events. Get the source writeKey from our app dashboard and then run
cd scripts; ./generate-event <writeKeyHere> http://localhost:8080/v1/batch
NOTE: writeKey is different from the your_workspace_token in step 2. The former is associated with the source, while the latter is for your account.
- You can then log in to your Google Analytics account and verify that the events are delivered. Go to MainPage->RealTime->Events. The RealTime view is important, as the other dashboards can sometimes take 24-48 hrs to refresh.
- You can use our JavaScript, Android or iOS SDKs to send events from your app.
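For readers who would rather send a test event from code than from the bundled script, here is a minimal Go sketch. It assumes the /v1/batch endpoint accepts a Segment-style batch payload and takes the source writeKey as the HTTP Basic Auth username with an empty password. That follows from the Segment API compatibility mentioned above but is not spelled out in this README. The Event and Batch types and the newBatchRequest helper are hypothetical names used only for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Event is a minimal Segment-style event. The field set is an assumption,
// not the full schema accepted by the server.
type Event struct {
	Type       string                 `json:"type"`
	UserID     string                 `json:"userId"`
	Event      string                 `json:"event,omitempty"`
	Properties map[string]interface{} `json:"properties,omitempty"`
}

// Batch wraps events in the {"batch": [...]} envelope (assumed format).
type Batch struct {
	Batch []Event `json:"batch"`
}

// newBatchRequest builds a POST request carrying the events as JSON.
// The writeKey is sent as the Basic Auth username (assumption).
func newBatchRequest(endpoint, writeKey string, events []Event) (*http.Request, error) {
	body, err := json.Marshal(Batch{Batch: events})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(writeKey, "")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	events := []Event{{Type: "track", UserID: "test-user-1", Event: "Test Event"}}
	req, err := newBatchRequest("http://localhost:8080/v1/batch", "<writeKeyHere>", events)
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Expected when no RudderStack server is listening locally.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Replace <writeKeyHere> with the source writeKey from the dashboard, not the workspace token.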
RudderStack Config Generator
RudderStack has two components: the control plane and the data plane. The data plane reliably delivers your event data. The control plane manages the configuration of your sources and destinations. If you don't want to use our hosted control plane, this configuration can also be read from a file.
Config-generator provides the UI to manage the source and destination configurations without needing to sign up. All the source and destination configuration stays in your local storage. You can export/import the config to a JSON file.
Setup
cd utils/config-gen
npm install
npm start
RudderStack config generator starts on the default port i.e., http://localhost:3000. On a successful setup, you should see the following
Export workspace config
After adding the required sources and destinations, export your workspace config. This workspace config is required by the RudderStack server. To learn more about adding sources and destinations in RudderStack, refer to Adding a Source and Destination in RudderStack.
Update the config variables configFromFile and configJSONPath in rudder-server to read the workspace config from the exported JSON file.
Start RudderStack with the workspace config file
- Download the workspace config file on your machine.
- In docker-compose.yml, uncomment the volumes section under the backend service. Specify the path to your workspace config.
- In build/docker.env, set the environment variable RSERVER_BACKEND_CONFIG_CONFIG_FROM_FILE=true
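The environment variable above appears to be derived from the configFromFile config variable. The Go sketch below shows one naming pattern that reproduces it: prefix RSERVER_, then the section and key converted from camelCase to upper snake case. The pattern is inferred from this single example, the section name BackendConfig is an assumption, and envName and snake are hypothetical helpers, so check build/docker.env for the authoritative variable names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snake converts camelCase to UPPER_SNAKE_CASE by inserting an underscore
// before each uppercase letter that follows a lowercase letter.
func snake(s string) string {
	var b strings.Builder
	runes := []rune(s)
	for i, r := range runes {
		if i > 0 && unicode.IsUpper(r) && unicode.IsLower(runes[i-1]) {
			b.WriteByte('_')
		}
		b.WriteRune(unicode.ToUpper(r))
	}
	return b.String()
}

// envName builds an environment variable name from a config section and key.
// The RSERVER_ prefix comes from the README; the rule itself is inferred.
func envName(section, key string) string {
	return "RSERVER_" + snake(section) + "_" + snake(key)
}

func main() {
	// Matches the variable named in the README.
	fmt.Println(envName("BackendConfig", "configFromFile"))
	// prints "RSERVER_BACKEND_CONFIG_CONFIG_FROM_FILE"

	// Output of the same rule for configJSONPath. Whether the server reads
	// a variable with this exact name is an assumption, not confirmed here.
	fmt.Println(envName("BackendConfig", "configJSONPath"))
}
```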
Telemetry
To help us improve RudderStack, we collect performance and diagnostic metrics about how you use it and how it's working. No customer data is present in the metrics.
The metrics collection can be disabled by setting the variable enableDiagnostics to false in config/config.toml.
The following metrics are collected. They are listed in config/config.toml under the Diagnostics section.
- enableServerStartMetric: Tracks every server start
- enableConfigIdentifyMetric: Tracks when the config is fetched for the first time from control-plane
- enableServerStartedMetric: Tracks when the server is ready to accept requests
- enableConfigProcessedMetric: Tracks when the config is changed
- enableGatewayMetric: Tracks no. of success/failed requests
- enableRouterMetric: Tracks no. of success/aborted/retries requests for every router destination
- enableBatchRouterMetric: Tracks no. of success/failed requests for every batch router destination
- enableDestinationFailuresMetric: Tracks destination failures
RudderStack Architecture
The following is a brief overview of the major components of RudderStack.
RudderStack Control Plane
The UI to configure the sources, destinations etc. It consists of:
Config backend: This is the backend service that handles the sources, destinations and their connections. User management and access-based roles are defined here.
Customer webapp: This is the front-end application that enables teams to set up their customer data routing with RudderStack. It shows high-level data on event deliveries and other stats. It also provides access to custom enterprise features.
RudderStack Data Plane
The data plane is our core engine that receives the events, stores and transforms them, and reliably delivers them to the destinations. This engine can be customized to your business requirements through a wide variety of configuration options. For example, you can enable backing up events to an S3 bucket, or set the maximum event size above which the server rejects requests as malicious. Sticking to the defaults will work well for most companies, but you have the flexibility to customize the data plane.
The data plane uses Postgres as the store for events. We built our streaming framework on top of Postgres – that’s a topic for a future blog post. Reliable delivery and order of the events are the first principles in our design.
RudderStack Destination Transformation
Conversion of events from the RudderStack format into the destination-specific format is handled by the transformation module. The transformation code is written in JavaScript.
The following blogs provide an overview of our transformation module
https://rudderlabs.com/transformations-in-rudder-part-1/
https://rudderlabs.com/transformations-in-rudder-part-2/
If you are missing a transformation, please feel free to add it to the repository.
RudderStack User Transformation
RudderStack also supports user-specific transformations for real-time operations like aggregation, sampling, modifying events etc. The following blog describes one real-life use case of the transformation module
https://rudderlabs.com/customer-case-study-casino-game/
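RudderStack's user transformations are written in JavaScript, so the following Go sketch is only a language-neutral illustration of the kind of logic such a transformation might contain: filtering, sampling and modifying events. The Event type, the transform function and the "sampled" property are hypothetical and do not reflect the actual transformation API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Event is a hypothetical, simplified event shape.
type Event struct {
	UserID     string
	Name       string
	Properties map[string]interface{}
}

// transform drops unnamed events (filtering), keeps roughly samplePercent
// percent of users based on a hash of the user ID (sampling), and tags the
// surviving events with a "sampled" property (modification).
func transform(events []Event, samplePercent uint32) []Event {
	out := make([]Event, 0, len(events))
	for _, e := range events {
		if e.Name == "" {
			continue // filter: skip events without a name
		}
		h := fnv.New32a()
		h.Write([]byte(e.UserID))
		if h.Sum32()%100 >= samplePercent {
			continue // sample: the same user is always kept or always dropped
		}
		// modify: copy the properties so the input event is not mutated
		props := map[string]interface{}{"sampled": true}
		for k, v := range e.Properties {
			props[k] = v
		}
		e.Properties = props
		out = append(out, e)
	}
	return out
}

func main() {
	events := []Event{
		{UserID: "u1", Name: "Spin"},
		{UserID: "u2", Name: ""},
		{UserID: "u3", Name: "Win"},
	}
	kept := transform(events, 100) // 100 percent: sampling keeps every user
	fmt.Printf("kept %d of %d events\n", len(kept), len(events))
	// prints "kept 2 of 3 events"
}
```

Hashing the user ID rather than drawing a random number keeps sampling consistent per user, so a sampled user's events are never partially dropped.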
Client SDKs
The client SDKs provide APIs for collecting events and sending them to the RudderStack backend.
Coming Soon
- More performance benchmarks. On a single m4.2xlarge, RudderStack can process ~3K events/sec. We will evaluate other instance types and publish numbers soon.
- More documentation
- More destination support
- HA support
- More SDKs (or Segment compatibility)