Importing and Exporting Data
A few articles ago, we delved into data storage. In this article, we will cover importing and exporting data to power your application. In many cases, your application will require data from 3rd parties in order to operate. Examples are: 1. Locations of all hospitals in the US; 2. EHR data for your users; etc. The first question you need to answer is where you can source this data from. For certain data, such as location data, there are paid services like the Google Places API as well as free sources such as OpenStreetMap’s Nominatim. Typically there are also databases or files that you can purchase. These could be thought of as snapshot sources rather than real-time because they are created at a specific point in time. Make sure to verify the date at which the snapshot was created as well as how often updates are made available and at what additional cost, if any.
A second question is around what data your application will need to make available to 3rd parties. Examples of how your customers might access your data include: 1. API; 2. flat file; 3. web portal; etc.
Once you have determined the data inputs and outputs for your application, you will need to decide how to import (inputs) and export (outputs) the data. Next is to decide which data integration technology to use. ETL, which stands for extract, transform and load, is a common choice. ELT stands for extract, load and transform is a newer technology that is gaining adoption in particular in cloud environments. One main difference between ETL and ELT is that, in ETL, transformations occur outside of your destination database (because transformation happens before load). With ELT, since the load happens before the transformation, the database itself can be leveraged for the transformation. The ELT approach can help you provide better data lineage and data provenance because each step that the data takes is saved and available for querying in your database.
There are many ETL and ELT tools available in the market. If the data import and export needs of your application are significant, using a tool can be the right choice. However, if data management itself is part of the value proposition for your company, investing in building this functionality in house can be advantageous.