Data Strategies for Panintelligence

Panintelligence (pi) communicates with databases using SQL.

The metadata capabilities of pi allow us to access data that may not always be perfectly structured. However, when querying large volumes of data, the structure and location of your data become more important.

Two concepts are worth understanding when querying the production schema directly is not possible:

ETL / ELT

Extract, Transform, and Load; or, more often these days, Extract, Load, and then Transform. Either way, the steps are:

  • Extract: get data from one or more sources

  • Transform: alter the data structure, pre-calculate and validate the data

  • Load: put the data into a database designed for high-volume queries
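The steps above can be sketched in a few lines. This is a minimal illustration only: it uses an in-memory SQLite database as a stand-in for a real warehouse, and the table names and sample rows are invented for the example.

```python
import sqlite3

# Hypothetical "Extract": rows pulled from a source system or API.
extracted = [
    ("2024-01-01", "UK", 120.0),
    ("2024-01-01", "US", 300.0),
    ("2024-01-02", "UK", 80.0),
]

# In-memory SQLite stands in for the warehouse here.
warehouse = sqlite3.connect(":memory:")

# "Load": land the raw data in the warehouse first (the E-L of ELT).
warehouse.execute("CREATE TABLE raw_sales (day TEXT, region TEXT, amount REAL)")
warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", extracted)

# "Transform": reshape the data inside the warehouse with SQL, after loading,
# so downstream dashboard queries read a pre-calculated table.
warehouse.execute("""
    CREATE TABLE sales_by_day AS
    SELECT day, SUM(amount) AS total
    FROM raw_sales
    GROUP BY day
""")

print(warehouse.execute("SELECT * FROM sales_by_day ORDER BY day").fetchall())
# [('2024-01-01', 420.0), ('2024-01-02', 80.0)]
```

In an ETL pipeline the `GROUP BY` step would run before the load; in ELT, as shown, the raw rows land first and the transformation runs inside the warehouse.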

Data warehouse

What is a data warehouse?

When we talk about a data warehouse, we often mean two things:

  1. The database used to hold and allow querying of the data.

  2. The fact that we have applied transformations to the data to optimise queries.

By contrast, the term data lake refers to a database that holds raw data from multiple data sources.
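The second point, transforming data to optimise queries, often means denormalising: pre-joining production tables into one flat table so a dashboard query does not repeat the join at run time. A minimal sketch, again using in-memory SQLite as a stand-in warehouse and made-up table names and rows:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Normalised source tables, as they might look in a production schema.
db.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "UK"), (2, "US")])
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(1, 50.0), (1, 25.0), (2, 10.0)])

# Warehouse-style transformation: perform the join once and store the result,
# so dashboard queries read a single flat table.
db.execute("""
    CREATE TABLE order_facts AS
    SELECT c.country, o.amount
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
""")

print(db.execute(
    "SELECT country, SUM(amount) FROM order_facts GROUP BY country ORDER BY country"
).fetchall())
# [('UK', 75.0), ('US', 10.0)]
```

The trade-off is storage and freshness for speed: the flat table duplicates data and must be rebuilt when the sources change, but each dashboard query becomes a simple scan.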

Technologies that work well with pi

Whilst pi is tool agnostic, here are some examples of tools that we have had success with.

ELT

Rivery - https://rivery.io/

A great, simple tool; particularly good at pulling data from API sources and loading it into Snowflake or BigQuery. It has standard toolkits (Kit Hub) for combining data from common sources, e.g. social media platforms.

https://rivery.io/kits/panintelligence-marketing-visualization/

Stitch - https://www.stitchdata.com/

A simple tool; great at pulling data from API sources and loading it into a large selection of databases. Wide range of connectors.

Fivetran -

Sophisticated ETL tool. Often used by Data Engineers.

Segment -

Sophisticated ETL tool. Often used by Data Engineers.

Data warehouses

Snowflake -

BigQuery -

MS Azure -

Redshift -

Yellowbrick -

Clickhouse -

Firebolt -


pi can connect to any database that supports JDBC.

The complete list of connectors used by pi can be found here: