Course: Big Data - IU S24
Author: Firas Jolha
In this stage, you need to build a dashboard to present your analysis results. Dashboards are single screens in which various critical pieces of information are placed in the form of panels.
For the project, you have to present at least the data characteristics and the results of EDA and PDA. Beyond that, try to build a cool dashboard that presents more of your findings. Your objective is to impress the business stakeholders and give them insights that help them make decisions.
Note: If you want to create a chart for data stored in tabular form (*.csv, *.json, etc.), you need to store it as a table in your Hive database.
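As a sketch of this step, the HiveQL below exposes a JSON file in HDFS as an external Hive table so that Superset can chart it. The database projectdb, the table eda_insight1, its columns, and the HDFS path are hypothetical placeholders. The statement assumes the file holds one JSON object per line and that the HCatalog JsonSerDe (shipped in the hive-hcatalog-core jar) is available on your cluster; on Hive 3.0 and later, 'org.apache.hadoop.hive.serde2.JsonSerDe' can be used instead.

```sql
-- Hypothetical names: projectdb, eda_insight1, its columns, and the HDFS path.
-- Assumes one JSON object per line, e.g. {"category": "A", "cnt": 42}.
-- LOCATION is the HDFS directory that contains the JSON file(s).
CREATE EXTERNAL TABLE IF NOT EXISTS projectdb.eda_insight1 (
    category STRING,
    cnt      BIGINT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/user/team/project/output/eda_insight1';
```

Once the table exists, it should be visible under your Hive database in Superset, where you can register it as a dataset and build charts on it.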
You can easily create a dashboard in Apache Superset from the Dashboards tab by clicking the + Dashboard button.
You can use your project title as the title of the dashboard. In this window, you can see your charts as well as the layout elements for organizing your charts/panels in the dashboard.
Apache Superset provides the following layout elements: Tabs, Row, Column, Header, Text (Markdown), and Divider. Use the layout elements to organize your charts and build cool dashboards.
You can create CSS templates and edit the CSS of the dashboard.
In the data characteristics part of the dashboard, you need to present the characteristics of the dataset and the data features that you obtained from public sources. This part should describe the initial dataset and show some samples of the data.
You can use the SQL Editor (SQL Lab) of Apache Superset to query the data and then save the query result as a dataset.
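For example, queries like the ones below could be run in SQL Lab to produce a few sample rows and a row count for the data characteristics panels. The table name emps is a hypothetical placeholder; replace it with one of your own tables, and run the statements one at a time if your SQL Lab setup executes a single statement per run.

```sql
-- Hypothetical table: replace emps with a table from your database.
-- A handful of sample rows to display on the dashboard
SELECT *
FROM emps
LIMIT 10;

-- Total number of rows in the table
SELECT COUNT(*) AS num_rows
FROM emps;
```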
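You can query the data types of the columns of a table in your PostgreSQL database as follows (replace your_table_name with the name of your table):
SELECT
    column_name,
    data_type
FROM
    information_schema.columns
WHERE
    table_name = 'your_table_name'
ORDER BY
    ordinal_position;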
For the table emps, it would be:
SELECT
    column_name,
    data_type
FROM
    information_schema.columns
WHERE
    table_name = 'emps'
ORDER BY
    ordinal_position;
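If the table is stored in Hive rather than PostgreSQL, a sketch of the equivalent in HiveQL uses the DESCRIBE statement, which lists each column together with its data type. The table name emps is only an example here.

```sql
-- Column names and data types of a Hive table (emps is an example name)
DESCRIBE emps;

-- The same columns plus storage details such as the HDFS location and SerDe
DESCRIBE FORMATTED emps;
```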
In the EDA part of the dashboard, add the charts you built in stage II, together with a conclusion for each data insight.
In the figure below, you see the charts and conclusion for one data insight.
In the PDA part of the dashboard, present the features after feature extraction, the performance of the models, and the prediction results for some data samples.
For the *.csv files that you stored in HDFS in stage III, create external Hive tables, then create datasets and charts from them.
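A minimal sketch of such an external table, assuming a predictions output written as CSV with a header row and two numeric columns. The database projectdb, the table model1_predictions, the columns label and prediction, and the HDFS path are hypothetical; adjust them to match your stage III output.

```sql
-- Hypothetical names: projectdb, model1_predictions, label, prediction, and the path.
-- LOCATION is the HDFS directory that holds the CSV part file(s);
-- skip.header.line.count makes Hive ignore the header line of each file.
CREATE EXTERNAL TABLE IF NOT EXISTS projectdb.model1_predictions (
    label      DOUBLE,
    prediction DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/team/project/output/model1_predictions'
TBLPROPERTIES ('skip.header.line.count' = '1');
```

An external table only points at the files: dropping it removes the table definition but leaves the CSV data in HDFS.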
A dashboard helps you monitor events or activities at a glance by providing key insights and analysis about your data on one or more pages or screens. You can explore the data shown in a visualization by using the interactive title, drilling up or down on columns, and viewing the details of a data point.
You can change the visualization type or change the columns that are used in the visualization. You can use filters to focus on one area of your data or to see the impact of one column, and you can use calculations to answer questions that cannot be answered by the source columns.
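As an illustration of such a calculation, the queries below derive values that the source columns do not contain: a per-row error and an overall mean absolute error. They assume a hypothetical table projectdb.model1_predictions with numeric columns label and prediction. In Superset, the same expressions could alternatively be defined as a calculated column or a metric on the dataset.

```sql
-- Hypothetical table and columns; abs_error is computed, not stored.
SELECT
    label,
    prediction,
    ABS(label - prediction) AS abs_error
FROM projectdb.model1_predictions;

-- Mean absolute error over all rows
SELECT AVG(ABS(label - prediction)) AS mae
FROM projectdb.model1_predictions;
```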
A story is a type of view that contains a set of scenes that are displayed in sequence over time.
Stories are similar to dashboards because they also use visualizations to share your insights. Stories differ from dashboards because they provide an over-time narrative and can convey a conclusion or recommendation.
stage4.sh to test this stage.
pylint command.