Data Visualization using Tableau

Titanic Data Visualization with Tableau-Akanksha Goel

Tableau links

Before feedback: https://public.tableau.com/profile/akanksha005#!/vizhome/TitanicDataVisualisation/Story1

After feedback: https://public.tableau.com/profile/akanksha005#!/vizhome/TitanicDatasetVisualisationafter_feedback/Story1?publish=yes

Summary

The largest passenger liner in service at the time, Titanic had an estimated 2,224 people on board when she struck an iceberg at around 23:40 on Sunday, 14 April 1912. Her sinking on Monday, 15 April resulted in the deaths of more than 1,500 people, making it one of the deadliest peacetime maritime disasters in history. This visualization shows how several factors affected the survival rate of passengers. First, we look at which cabin group was evacuated first and had more people saved. Next, we take ticket class, which is our main study factor, and add passengers' sex and port of embarkation to it to see the effect on the survival rate. Finally, we look at how the number of dependents and the age group affect the probability of survival.

Design

The whole story uses two chart types:

  1. Bar graphs: In every bar chart, the y-axis shows the count of passengers, and the labels on top of the bars show the percentage or number of passengers in that bar. Because the whole story focuses on the people who survived or died in the accident, bar charts make it easy to see the count of people in a particular category and to compare the categories of a variable.
  2. Line plots: Line plots are used for quantitative continuous variables and help in finding the relationship between two variables.

Throughout the visualization, only three colors are used: blue, orange, and a dark red gradient. Reasoning: a consistent color encoding makes the plots easier to read.

Feedback

The initial version of the visualization was shared with two co-workers. Below is the feedback I received and the changes I made based on our discussions:

  • In the cabin group vs. (dead vs. survivors) visualization, the relation between cabin group and the number of survivors was a bit unclear. I made the required changes in the latest visualization.
  • In the Port of Embarkation visualization, the number of people who died could not be compared properly with the number who survived. I changed the axis so that survivors and perished passengers are compared vertically, and added a label showing the count of people who died and who survived.
  • In the age group visualization, the number of people in each age group was not taken into account, which may affect the overall observation. To account for it, I put the average number of survivors on the y-axis.
  • In the second-to-last visualization, a reviewer did not understand how the percentage of survivors could be 100% when there is 1 sibling. In response, I replaced the labels with the actual probability of survival.
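The last two feedback points both come down to the difference between a count of survivors and a survival rate. Because `Survived` is coded 0/1, its mean within a group is the share of that group that survived. The sketch below is not the Tableau calculation used in the story; it is a pandas illustration on a few made-up rows (the values are invented for the example, not taken from the dataset). It reports the group size next to the rate, so a 100% rate based on a single passenger is easy to spot.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the Titanic dataset.
sample = pd.DataFrame({
    "SibSp":    [0, 0, 0, 0, 1, 5],
    "Survived": [1, 0, 0, 1, 1, 0],
})

# Survived is 0/1, so its mean per group is the survival rate of that group.
summary = (
    sample.groupby("SibSp")["Survived"]
          .agg(passengers="count", survivors="sum", survival_rate="mean")
          .reset_index()
)

print(summary)
```

In this toy sample the group with one sibling has a survival rate of 1.0, but only one passenger behind it, which is the kind of reading the reviewers questioned.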

Resources

The Titanic dataset can be downloaded from Dataset Options

Data Dictionary

Variable    Definition                                    Key
survival    Survival                                      0 = No, 1 = Yes
pclass      Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
sex         Sex
Age         Age in years
sibsp       # of siblings / spouses aboard the Titanic
parch       # of parents / children aboard the Titanic
ticket      Ticket number
fare        Passenger fare
cabin       Cabin number
embarked    Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton

Variable    Notes
pclass:     A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5

sibsp: The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister

Cleaning the dataset

  • Filling the missing values of age with the mean of the age.
# Cleaning the dataset
import pandas as pd

Titanic_data = pd.read_csv('titanic-data.csv')

# Fill missing ages with the mean age
Mean_age = Titanic_data['Age'].mean()
Titanic_data['Age'] = Titanic_data['Age'].fillna(Mean_age)
# print(Titanic_data)

# Keep only the original dataset columns (drops the extra 'Unnamed: 0' column)
df = pd.DataFrame(Titanic_data, columns=['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'])
# df.to_csv('titanic-data.csv', sep=',', encoding='utf-8')


print(Titanic_data.head())
   Unnamed: 0  PassengerId  Survived  Pclass  \
0           0            1         0       3   
1           1            2         1       1   
2           2            3         1       3   
3           3            4         1       1   
4           4            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S
  • Grouping the ages into bins of 10 years.

  • Dropping features that add no value to this analysis, such as 'Ticket', 'Fare', 'Name', and 'PassengerId'.

  • Grouping the cabins into ['A','B','C','D','E','F','other'] using the first letter of each passenger's cabin.
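The three steps above are described but not shown in code. A minimal sketch of how they could be done in pandas follows. The sample rows, the new column names `AgeGroup` and `CabinGroup`, the bin labels, and the helper `cabin_group` are all assumptions made for illustration; the original project may have built these groups differently (for example, directly in Tableau).

```python
import pandas as pd

# Made-up rows with Titanic-style column names, for illustration only.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, 38.0, 4.0, 61.0],
    "Ticket": ["t1", "t2", "t3", "t4"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Cabin": [None, "C85", "G6", "B42"],
})

# 1. Group ages into 10-year bins: [0, 10), [10, 20), ..., [90, 100).
bin_edges = list(range(0, 101, 10))
bin_labels = [f"{lo}-{lo + 9}" for lo in bin_edges[:-1]]
sample["AgeGroup"] = pd.cut(sample["Age"], bins=bin_edges, labels=bin_labels, right=False)

# 2. Drop the columns that are not needed for the analysis.
sample = sample.drop(columns=["Ticket", "Fare", "Name", "PassengerId"])

# 3. Map each cabin to its first letter; missing or other cabins become "other".
def cabin_group(cabin):
    """Hypothetical helper: return 'A'..'F' from the cabin's first letter, else 'other'."""
    if isinstance(cabin, str) and cabin[:1] in {"A", "B", "C", "D", "E", "F"}:
        return cabin[0]
    return "other"

sample["CabinGroup"] = sample["Cabin"].apply(cabin_group)

print(sample)
```

Using half-open bins (`right=False`) puts a 10-year-old in the 10-19 group rather than 0-9; that is a choice made for this sketch, not something the original text specifies.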
