Descriptive Analytics and Visualisation for Advanced data analysis report
This assignment requires you to analyse a given data set, interpret
and draw conclusions from your analysis, and then convey your conclusions in a written
report to a person with little or no knowledge of Business Analytics. Analysis
of the data requires the use of techniques predominantly studied in Module 2
(but will also require some techniques from Module 1).
Case Study
Baycoast is a (fictitious) local government area (called a 'city') within
greater Melbourne, Australia. It consists of a number of different suburbs, all
with their own history of development. The city grew in different stages, with
new suburbs gradually emerging. It covers some wealthy suburbs and some not so
wealthy. As the name would indicate, the city is located on the Bay. The city
stretches for several kilometres along the Bay's lovely beaches, and for several
kilometres inland. About 60,000 people live in the suburbs of Baycoast.
The main objective is to conduct exploratory, descriptive and causal analysis
to gain a comprehensive understanding of house prices in the Baycoast region
and of the most important factors that impact prices. Your analysis will be
based on a random sample of 120 houses from the city. Note that, for the purpose
of the assignment, the unit of analysis is a ‘House’, defined as a stand-alone
dwelling. That is, flats, apartments, etc. are not included in the database.
The assignment requires five separate tasks:
1. An overall view of house prices in Baycoast.
2. Identification of the main factors influencing house prices.
3. Development of a multiple regression model for prices.
4. Some basic time series analysis of house prices.
5. A discussion of the suitability of the data set, along with other potential
data sources and approaches, for the purpose of this analysis.
Further details of each task are given below.
The Data
The cross-sectional data set contains a number of categorical and numerical
variables, which are described below:
Variable | Description
Price | Selling price of house in $'000
Rooms | Number of main rooms in the house
Lot Size | Area of the block of land (lot) in square metres
Age | Age of the house in years
Area | Area of the house in square metres
Material | Timber = 1, Veneer = 2, Brick = 3
To Train | Distance of the house to the nearest train station (kilometres)
To Bus | Distance of the house to the nearest bus stop (kilometres)
To Shops | Distance of the house to the nearest shopping centre (kilometres)
Street | Street appeal as evaluated by the real estate agency: ranges from 0 (lowest appeal) to 10 (highest appeal)
Storeys | Number of storeys or levels in the house
Style | Traditional Style = 0, Non-Traditional Style = 1
Bedrooms | Number of bedrooms
Bathrooms | Number of bathrooms
Kitchen | Style of kitchen: Adequate = 0, Modern = 1
Heating | Central or other heating system installed: No Heat = 0, Yes Heat = 1
AirCon | Air conditioning installed: No AC (No AirCon) = 0, AC (Yes AirCon) = 1
Bay Views | Proportion of views of the Bay from a prominent part of the property: ranges from 0 = Nil views up to 1 = Full views
Suburb | Three different suburbs: 1 = Brightly, 2 = Tarron B, 3 = Millard
Weekly Rent $ | Actual or estimated weekly rent in $
Rental Return % | Annual rate of return from rent income: (Weekly rent x 52)/(Price in $'000) as a percentage (%)
Condition | The condition of the house in general: Very Poor = 1, Poor = 2, Good = 3, Excellent = 4
Rental Status | Vacant (available for rent) = 1; Rented (currently rented) = 2; Owner (occupied by owner) = 3
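As a worked illustration of the Rental Return % definition, here is a minimal Python sketch. It assumes the price is first converted from $'000 to dollars so that the result reads as a percentage. That conversion, the function name `rental_return_pct` and the example figures are assumptions made for illustration and are not part of the data set.

```python
def rental_return_pct(weekly_rent, price_thousands):
    """Annual rental return as a percentage of the selling price.

    weekly_rent is in dollars; price_thousands is the Price variable in $'000.
    Assumption: the price is converted to dollars before dividing.
    """
    if price_thousands <= 0:
        raise ValueError("price must be positive")
    annual_rent = weekly_rent * 52
    price_dollars = price_thousands * 1000
    return 100.0 * annual_rent / price_dollars


# Hypothetical house: $500 per week rent, selling price $800,000.
print(rental_return_pct(500, 800))  # 3.25
```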
In addition, time series data is available on quarterly median house prices:
Variable | Description
Time Period | Time Period Index
Quarter | Quarter Description
Median House Price ($'000) | Median house price in $'000
Task One – Summary of House Prices
Only analyse Price by itself. The importance of other variables is
considered in other tasks. You should, at the very least, thoroughly
investigate relevant summary measures (and their reliability) for this
variable. Also, there may well be suitable tables and graphs that will
illustrate, further and more clearly, other important features of house prices.
In your report you should comment, where relevant, on data location, central
tendency, variability, shape and outliers for this variable.
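The analysis itself is to be done in Excel, but the summary measures named above can be sketched in a few lines of Python to show what each one captures. The function name `summarise`, the 1.5 x IQR outlier rule and the sample prices are assumptions chosen for illustration; they are not prescribed by the assignment.

```python
import statistics


def summarise(prices):
    """Return basic summary measures for a list of prices (in $'000)."""
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "n": len(prices),
        "mean": statistics.mean(prices),
        "median": median,
        "stdev": statistics.stdev(prices),  # sample standard deviation
        "min": min(prices),
        "max": max(prices),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Values outside the 1.5 x IQR fences are flagged as possible outliers.
        "outliers": [p for p in prices if p < lower_fence or p > upper_fence],
    }


# Hypothetical prices in $'000, not taken from the Baycoast data.
sample = [520, 610, 640, 700, 735, 790, 850, 1900]
for name, value in summarise(sample).items():
    print(name, value)
```

A mean well above the median, as a single very expensive house would produce, is one simple sign of a right-skewed distribution, which is why both measures are worth reporting.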
Task Two – Factors influencing house prices
Analyse house prices against the other variables included in the data
set. Use appropriate descriptive techniques, such as cross-tabulations,
comparative summary measures and scatter diagrams, to identify key relationships.
In your report you should include only the most important factors that impact
house prices (approximately 3 to 5 factors).
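Two of the techniques mentioned above, comparative summary measures and the strength of a linear relationship seen in a scatter diagram, can be sketched in Python as follows. The helper names `median_by_group` and `pearson_r` and all of the values are hypothetical; the sketch only illustrates the idea and is not the required Excel analysis.

```python
import math
import statistics
from collections import defaultdict


def median_by_group(prices, groups):
    """Median price for each category, e.g. each Suburb code."""
    buckets = defaultdict(list)
    for price, group in zip(prices, groups):
        buckets[group].append(price)
    return {g: statistics.median(vals) for g, vals in sorted(buckets.items())}


def pearson_r(x, y):
    """Pearson correlation coefficient between two numerical variables."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, not taken from the Baycoast data.
price = [520, 610, 640, 700, 735, 790]   # $'000
area = [110, 135, 140, 160, 170, 185]    # square metres
suburb = [1, 1, 2, 2, 3, 3]              # Suburb codes

print(median_by_group(price, suburb))
print(round(pearson_r(area, price), 3))
```

Grouped medians suit categorical variables such as Suburb or Material, while the correlation coefficient suits numerical variables such as Area or Lot Size.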
Task Three – Development of a multiple regression model
You should follow the model building process outlined in Topic 5.
You are only required to consider linear relationships in the model. Each stage
of developing your model should be included in your analysis. You will notice
in the Baycoast spreadsheet that there are tabs called Q3-1, Q3-2, etc. These
are where you place each version of your model. If you undertake more
iterations of the model, add more worksheets.
The report should only include your final model and a description of its overall
strength as well as the influence of each variable.
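For readers who want to see what a linear multiple regression fit involves, here is a minimal pure-Python sketch that solves the least-squares normal equations and reports adjusted R-squared as a measure of overall strength. It is not the Topic 5 model building process, which is not reproduced here; the function names and the data are hypothetical, and Excel's regression tool remains the intended way to do the task.

```python
def fit_linear_model(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; drop one of them")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted value for one row of predictors."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


def adjusted_r_squared(X, y, coefs):
    """R-squared penalised for the number of predictors in the model."""
    n, p = len(y), len(coefs) - 1
    mean_y = sum(y) / n
    ss_res = sum((yi - predict(coefs, row)) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data: [Area in square metres, Rooms] and Price in $'000.
X = [[110, 5], [135, 6], [140, 6], [160, 7], [170, 8], [185, 8], [200, 9]]
y = [520, 610, 640, 700, 735, 790, 850]
coefs = fit_linear_model(X, y)
print([round(c, 3) for c in coefs])
print(round(adjusted_r_squared(X, y, coefs), 3))
```

Each coefficient after the intercept estimates the change in price for a one-unit change in that variable with the others held fixed, which is the "influence of each variable" the report is asked to describe.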
Task Four – Time Series analysis
Quarterly median house prices in Baycoast from Q4, 2009 to Q3, 2013
are given in the QtrPriceData worksheet.
Develop a multiplicative time series model to forecast median house prices for
the next 4 quarters (Q4, 2013 to Q3, 2014).
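One common way to build a multiplicative model is classical decomposition: estimate seasonal indices from the ratio of each value to a centred moving average, fit a linear trend to the deseasonalised series, then multiply the extended trend by the matching seasonal index. The Python sketch below follows that approach under stated assumptions: it may differ from the method taught in the module, the function names are hypothetical, and the 16 quarterly values are invented rather than taken from the QtrPriceData worksheet.

```python
def seasonal_indices(series, period=4):
    """Seasonal indices from ratios of each value to a centred moving average."""
    half = period // 2
    ratios = [[] for _ in range(period)]
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # Centred moving average for an even period: half weight on the end points.
        cma = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        ratios[t % period].append(series[t] / cma)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)  # normalise so the indices average to 1
    return [value * scale for value in raw]


def linear_trend(values):
    """Least-squares intercept and slope of values against t = 0, 1, 2, ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    slope = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values)) / sum(
        (t - mean_t) ** 2 for t in range(n)
    )
    return mean_v - slope * mean_t, slope


def forecast_multiplicative(series, steps=4, period=4):
    """Forecast = linear trend of the deseasonalised series x seasonal index."""
    indices = seasonal_indices(series, period)
    deseasonalised = [v / indices[t % period] for t, v in enumerate(series)]
    intercept, slope = linear_trend(deseasonalised)
    n = len(series)
    return [(intercept + slope * t) * indices[t % period] for t in range(n, n + steps)]


# Hypothetical quarterly medians in $'000 (16 quarters), not the worksheet data.
history = [700, 760, 850, 690, 740, 800, 900, 730,
           790, 850, 960, 770, 830, 900, 1010, 820]
print([round(f, 1) for f in forecast_multiplicative(history)])
```

The sketch counts time from 0 rather than from 1 as the Time Period index does, so the first observation and every fourth one after it share a seasonal index.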
If the observed values for those 4 quarters are as shown below, calculate the
mean absolute percentage error (MAPE) of the forecast.
Time Period | Quarter | Observed
17 | 2013-Q4 | 980
18 | 2014-Q1 | 1062
19 | 2014-Q2 | 1206
20 | 2014-Q3 | 954
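MAPE averages the absolute forecast errors, each expressed as a percentage of the observed value. A minimal Python sketch follows; the observed values are the four listed for time periods 17 to 20, while the forecasts are hypothetical placeholders and should be replaced by the output of your own model.

```python
def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("observed and forecast must be non-empty and the same length")
    errors = [abs(o - f) / abs(o) for o, f in zip(observed, forecast)]
    return 100.0 * sum(errors) / len(errors)


observed = [980, 1062, 1206, 954]                # observed values for periods 17 to 20
hypothetical_forecast = [1000, 1050, 1180, 970]  # placeholders, not real results
print(round(mape(observed, hypothetical_forecast), 2))
```

A lower MAPE means the forecasts were, on average, closer to the observed prices in percentage terms.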
Task Five – Critique the Business Research Approach
Discuss the suitability of the general business research approach
taken. In your response, include possible alternative approaches and other
sources of (secondary) data. If the analysis were to be repeated in the future,
would you recommend a different approach? Note that no actual analysis is
required for this task.
Submission
You are required to submit both your written report (approx. 2000 words) and
your analysis (in Excel).
Report (40%)
The report should be written for an audience that has no or minimal
business analytics background. You should avoid the use of technical terms and
mathematics. The one exception may be in task 3 as you may want to include the
actual regression model in the report. You are required to describe all five
tasks. It is up to you how to structure and format the final report.
Analysis (60%)
The analysis should be submitted in the appropriate worksheets in
the Excel file. That is, all analysis for task one should be included in tab ‘Q1’,
task two in ‘Q2’, etc. Each step in the model building for task three should be
included in the tabs Q3-Correlation, Q3-1, Q3-2, etc. If you need more worksheets,
add them. Further instructions are included at the top of each worksheet.
Before submitting your analysis, make sure it is logically organised
and any incorrect or unnecessary output has been removed. Marks will be
deducted for poor presentation or disorganised/incorrect results.
The approximate breakdown of marks for the analysis is: task 1 (10%),
task 2 (10%), task 3 (30%) and task 4 (10%), for a total of 60%.