Pipes
The humble pipe is Meerschaum's abstraction for incremental ETL. Pipes have input and output connectors and store parameters to configure the behavior of their syncing processes. This may be as simple as a SQL query or may include custom keys for use in your plugins.
Because pipes' metadata are stored alongside their tables, they're easily editable (whether via edit pipes or on the web UI), which facilitates prototyping. But this dynamic nature introduces the same problem described at the beginning of this article: in order to scale development, a Compose file is needed to define a project's components in a way that can be easily version-controlled.
According to the Meerschaum Compose specification, pipes are defined in a list under the keys sync:pipes. Each item defines the keys and parameters needed to construct the pipe, like a blueprint for what you expect the pipes in the database to reflect.
For example, the following snippet would define a pipe that would sync a table weather from a remote PostgreSQL database (defined below as sql:source) to a local SQLite file (sql:dest in this project).
sync:
  pipes:
    - connector: "sql:source"
      metric: "weather"
      target: "weather"
      columns:
        datetime: "timestamp"
        station: "station"
      parameters:
        fetch:
          backtrack_minutes: 1440
          query: |-
            SELECT timestamp, station, temperature
            FROM weather

config:
  meerschaum:
    instance: "sql:dest"
    connectors:
      sql:
        source: "postgresql://user:pass@host:5432/db"
        dest: "sqlite:////tmp/dest.db"
This example would incrementally update a table named weather using the datetime axis timestamp for range bounding (1 day of backtracking), and this column plus the ID column station together would make up a composite primary key used for de-duplication.
The URI is written literally just as an example; if you're committing a compose file, either reference an environment variable (e.g. $SECRET_URI) or your host Meerschaum configuration (e.g. MRSM{meerschaum:connectors:sql:source}).
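To make the range bounding and de-duplication described above more concrete, here is a minimal, standard-library-only sketch of the idea. It is not Meerschaum's implementation: the function name incremental_sync and the in-memory lists of rows are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def incremental_sync(dest_rows, source_rows, backtrack_minutes=1440):
    """Merge source rows into dest rows, keyed on (timestamp, station).

    Hypothetical helper: a conceptual sketch, not Meerschaum's sync logic.
    """
    # Index existing rows by the composite key used for de-duplication.
    merged = {(row["timestamp"], row["station"]): row for row in dest_rows}

    # Range bounding: only consider source rows newer than the latest synced
    # timestamp minus the backtrack window (everything on the first sync).
    if merged:
        latest = max(timestamp for timestamp, _ in merged)
        begin = latest - timedelta(minutes=backtrack_minutes)
    else:
        begin = None

    for row in source_rows:
        if begin is None or row["timestamp"] >= begin:
            # A row with an existing key overwrites instead of duplicating.
            merged[(row["timestamp"], row["station"])] = row

    return sorted(merged.values(), key=lambda r: (r["timestamp"], r["station"]))


dest = [
    {"timestamp": datetime(2023, 1, 1), "station": "A", "temperature": 10.0},
    {"timestamp": datetime(2023, 1, 3), "station": "A", "temperature": 12.0},
]
source = [
    # Older than the one-day backtrack window, so it is not re-read.
    {"timestamp": datetime(2022, 12, 25), "station": "A", "temperature": 5.0},
    # Same composite key as an existing row, so it replaces that row.
    {"timestamp": datetime(2023, 1, 3), "station": "A", "temperature": 12.5},
    # New composite key, so it is inserted.
    {"timestamp": datetime(2023, 1, 4), "station": "B", "temperature": 9.0},
]
synced = incremental_sync(dest, source)
```

The backtrack window trades a little redundant reading for safety: rows that arrive late or are corrected at the source within the last day are picked up again, and the composite key keeps them from being duplicated.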
Connectors
First, a quick refresher on Meerschaum connectors: you can define connectors through several methods, the most popular of which is through environment variables. Suppose you define your connection secrets in an environment file:
export MRSM_SQL_REMOTE='postgresql://user:pass@host:5432/db'
export MRSM_FOO_BAR='{
    "username": "abc",
    "password": "def"
}'
The first environment variable MRSM_SQL_REMOTE would define the connector sql:remote. If you sourced this file, you could verify this connector with the command mrsm show connectors sql:remote.
The second variable is an example of how to define a custom FooConnector, which you could create using the @make_connector decorator in a plugin. Custom connectors are a powerful tool, but for now, here's the basic structure:
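from meerschaum.connectors import make_connector, Connector

@make_connector
class FooConnector(Connector):
    # Attributes that must be provided when the connector is defined.
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(self, pipe, **kwargs):
        # Populate docs with dictionaries (one per row).
        docs = []
        return docs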
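The naming convention for the environment variables above can be sketched in a few lines of plain Python. This is only an illustration of the MRSM_<TYPE>_<LABEL> pattern, not Meerschaum's own parsing code; the helper parse_connector_env and the "uri" attribute name are assumptions made for this example.

```python
import json


def parse_connector_env(environ):
    """Map MRSM_<TYPE>_<LABEL> variables to {'type:label': attributes}.

    Hypothetical helper that illustrates the naming convention only.
    """
    connectors = {}
    for name, value in environ.items():
        parts = name.split("_", 2)
        if len(parts) != 3 or parts[0] != "MRSM":
            continue
        _, connector_type, label = parts
        value = value.strip()
        if value.startswith("{"):
            # A JSON object carries the connector's attributes explicitly.
            attributes = json.loads(value)
        elif "://" in value:
            # Anything that looks like a URI is kept as a single attribute.
            attributes = {"uri": value}
        else:
            # Other MRSM_ variables (e.g. directory paths) are not connectors.
            continue
        connectors[f"{connector_type.lower()}:{label.lower()}"] = attributes
    return connectors


example_env = {
    "MRSM_SQL_REMOTE": "postgresql://user:pass@host:5432/db",
    "MRSM_FOO_BAR": '{"username": "abc", "password": "def"}',
    "MRSM_PLUGINS_DIR": "./plugins",
    "PATH": "/usr/bin",
}
connectors = parse_connector_env(example_env)
```

Here connectors ends up with the two keys sql:remote and foo:bar, which are the same connector keys you would pass to Meerschaum commands.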
So we've just reviewed how to define connectors in our host environment. Let's see how to make these host connectors available in a Meerschaum project. In the compose file, all of the connectors we need for our project are defined under config:meerschaum:connectors. Use the MRSM{} syntax to reference the keys from your host environment and pass them into the project.
config:
  meerschaum:
    instance: "sql:app"
    connectors:
      sql:
        app: MRSM{meerschaum:connectors:sql:remote}
      foo:
        bar: MRSM{meerschaum:connectors:foo:bar}
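Conceptually, an MRSM{} reference is a colon-separated path into the host configuration. The sketch below shows one way such a lookup could work; it is an assumption-laden illustration (the function resolve_mrsm, the regular expression, and the sample host_config dictionary are all hypothetical), not the parser Meerschaum actually uses.

```python
import re

# Matches references such as MRSM{meerschaum:connectors:sql:remote}.
MRSM_PATTERN = re.compile(r"MRSM\{([^{}]+)\}")


def resolve_mrsm(value, host_config):
    """Replace MRSM{a:b:c} references with values from a nested dictionary.

    Hypothetical helper: illustrates the lookup, not Meerschaum's parser.
    """
    def lookup(match):
        node = host_config
        for key in match.group(1).split(":"):
            node = node[key]
        return node

    whole = MRSM_PATTERN.fullmatch(value)
    if whole:
        # The entire value is one reference, so return the object itself
        # (for example, a dictionary of connector attributes).
        return lookup(whole)
    # Otherwise substitute each reference into the surrounding string.
    return MRSM_PATTERN.sub(lambda m: str(lookup(m)), value)


# Sample host configuration, assumed for this example only.
host_config = {
    "meerschaum": {
        "connectors": {
            "sql": {"remote": {"uri": "postgresql://user:pass@host:5432/db"}},
            "foo": {"bar": {"username": "abc", "password": "def"}},
        }
    }
}

app_connector = resolve_mrsm("MRSM{meerschaum:connectors:sql:remote}", host_config)
bar_connector = resolve_mrsm("MRSM{meerschaum:connectors:foo:bar}", host_config)
```

Because the secrets live only in the host configuration, the compose file itself can be committed without exposing credentials.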
Plugins
Meerschaum is easily extendable via plugins, which are Python modules. Plugins may fetch data, implement custom connectors, and/or extend Meerschaum (e.g. custom actions, flags, API endpoints, etc.).
Meerschaum supports multiple plugins directories (via MRSM_PLUGINS_DIR), which may be set under the plugins_dir key in mrsm-compose.yaml (the default is a directory plugins).
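As a rough sketch of what a data-fetching plugin in that directory might look like, the module below exposes a fetch function that returns rows as dictionaries. The file name plugins/noise.py, the column names, and the random values are assumptions for illustration; check the Meerschaum plugin documentation for the exact interface before relying on it.

```python
# plugins/noise.py (hypothetical plugin name and location)
import random
from datetime import datetime, timezone

__version__ = "0.0.1"

# Python packages this plugin depends on (none for this sketch).
required = []


def fetch(pipe, **kwargs):
    """Return new rows for the pipe as a list of dictionaries."""
    now = datetime.now(timezone.utc)
    return [
        {
            "timestamp": now,
            "station": "demo",
            "temperature": round(random.uniform(-5.0, 30.0), 1),
        }
    ]


# Calling the function directly, outside of Meerschaum, to inspect its output.
rows = fetch(None)
```

Keeping a module like this inside the project's plugins directory means it is versioned together with the compose file that uses it.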
Storing your plugins within a Compose project makes it clear how you expect your plugins to be used. For example, the Compose file within the MongoDBConnector project demonstrates how the custom connector is used as both a connector and as an instance.
Package Management
When you first start using Meerschaum Compose, the first thing you'll notice is that it will start installing a fair number of Python packages. Don't worry about your environment: everything is installed into virtual environments within your project's root subdirectory (a bit ironic, right?). You can install your plugins' dependencies with mrsm compose init.
To share packages between projects, set the key root_dir in mrsm-compose.yml to a new path. Deleting this root directory will effectively uninstall all of the packages that Compose downloaded, keeping your host environment intact.