Using Star Schema SCD Type 2

4 minute read

A few articles ago, we wrote about data modeling. In this article, we will continue our discussion of data warehousing using star schema design principles. More specifically, we will explore Type 2 slowly changing dimension (SCD) types and using star schema SCD Type 2.

Type 2 SCDs, instead of overwriting existing dimension records, create new dimension records. 

Operational Schema

Patient table (patient)

This table includes an insurance provider column. When this column changes, the operational record is updated, however, the data mart must keep both the original version of the patient record and the new one. Otherwise, historical patient provider visit information will not be accurate.

create table patient

(

id serial

constraint patient_pk

primary key,

name varchar(50) not null,

mrn varchar(50) not null,

insurance_provider varchar(50)

);

alter table patient owner to postgres;

Provider table (provider)

This table includes an insurance providers accepted column. Since this is only an example, we have added a single column for this information. An actual implementation would more likely create an insurance provider table and then a provider insurance provider relationship table, which would allow you to normalize the 1 to many relationship between medical providers and insurance providers.

In our example, when the insurance providers accepted column changes, the operational record is updated. However, the data mart must keep both the original version of the medical provider record and the new one. Otherwise, historical patient provider visit information will not be accurate.

create table provider

(

id serial

constraint provider_pk

primary key,

name varchar(50) not null,

license_number varchar(50) not null,

office_address varchar(200) not null,

insurance_providers_accepted varchar

);

alter table provider owner to postgres;

Patient provider visit table (patient_provider_visit)

create table patient_provider_visit

(

id serial

constraint patient_provider_visit_pk

primary key,

patient_id integer not null

constraint patient_provider_visit_patient_id_fk

references patient,

provider_id integer not null

constraint patient_provider_visit_provider_id_fk

references provider,

visit_date date not null,

visit_notes varchar,

reimbursement_date date

);

alter table patient_provider_visit owner to postgres;

Data Mart Schema

To use SCD type 2, we could create a data mart with the schema below.

Dimension patient table (dim_patient)

We added SCD type 2 columns start date and end date

If no end date exists, the record is current. Only one patient record can have a null end date.

If a new record is added for a given patient, the data mart process will expire the previous record, making it no longer current, by setting its end date to the current date. The new record will be added with its start date as the current date.

create table dim_patient

(

patient_key serial

constraint dim_patient_pk

primary key,

patient_id integer not null

constraint dim_patient_patient_id_fk

references patient,

name varchar(50) not null,

mrn varchar(50) not null,

insurance_provider varchar(50),

start_date timestamp not null,

end_date timestamp

);

alter table dim_patient owner to postgres;

create unique index dim_patient_patient_id_end_date_uindex

on dim_patient (patient_id, end_date);

create unique index dim_patient_patient_key_uindex

on dim_patient (patient_key);

Dimension provider table (dim_provider)

We added SCD type 2 columns start date and end date

If no end date exists, the record is current. Only one provider record can have a null end date.

If a new record is added for a given provider, the data mart process will expire the previous record, making it no longer current, by setting its end date to the current date. The new record will be added with its start date as the current date.

create table dim_provider

(

provider_key serial

constraint dim_provider_pk

primary key,

provider_id integer not null

constraint dim_provider_provider_id_fk

references provider,

name varchar(50) not null,

license_number varchar(50) not null,

office_address varchar(200) not null,

insurance_providers_accepted varchar,

start_date timestamp not null,

end_date timestamp

);

alter table dim_provider owner to postgres;

create unique index dim_provider_provider_id_end_date_uindex

on dim_provider (provider_id, end_date);

create unique index dim_provider_provider_key_uindex

on dim_provider (provider_key);

Dimension date table (dim.date)

create table dim_date

(

date_key serial

constraint dim_date_pk

primary key,

day smallint not null,

month smallint not null,

year smallint not null

);

alter table dim_date owner to postgres;

create unique index dim_date_date_key_uindex

on dim_date (date_key);

Fact patient provider visit table (fact_patient_provider_visit)

create table fact_patient_provider_visit

(

patient_key integer not null

constraint fact_patient_provider_visit_dim_patient_patient_key_fk

references dim_patient,

provider_key integer not null

constraint fact_patient_provider_visit_dim_provider_provider_key_fk

references dim_provider,

visit_date_key integer not null

constraint fact_patient_provider_visit_dim_date_date_key_fk

references dim_date,

reimbursement_date_key integer

constraint fact_patient_provider_visit_dim_date_date_key_fk_2

references dim_date,

visit_notes varchar,

constraint fact_patient_provider_visit_pk

primary key (patient_key, provider_key, visit_date_key)

);

alter table fact_patient_provider_visit owner to postgres;

Summary

Our data mart has 2 dimensions which are SCD2, dim_patient and dim_provider. Multiple patient and provider dimensional records can exist for a given operational record. For example, if there is a patient record for John Doe in the operational database, there could be multiple John Doe records in dim_patient, each representing a snapshot of the John Doe operational record at a point in time.

Up next

Bi-temporality.

Twitter

LinkedIn