Data Preparation Guidelines for BHATLIB
This document provides clear, step-by-step guidelines for preparing your data for use with the BHATLIB library in GAUSS.
Overview
BHATLIB expects your data in a consistent layout so that discrete choice models can be estimated without restructuring the dataset. This guide describes that layout: how to organize rows, choice columns, availability columns, and individual-specific variables.
File Format
Your data file can be in any format that the GAUSS loadd() procedure can read, such as .csv, .gdat, .xls, or .dta. Make sure the file is clean and has a header row naming each column, so that variables can be referenced by name.
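Before loading a .csv file into GAUSS, it can help to confirm that the header row contains the column names you expect. The Python sketch below runs outside GAUSS and uses only the standard csv module; the function name check_header and the sample column names are illustrative assumptions, not part of BHATLIB or GAUSS.

```python
import csv
import io

def check_header(csv_text, required):
    """Return the required column names that are missing from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    # Strip spaces so headers written as "Alt1_ch, Alt2_ch" still match.
    header = [name.strip() for name in next(reader, [])]
    return [name for name in required if name not in header]

# Hypothetical file contents; in practice, read the text from your data file.
sample = "Alt1_ch,Alt2_ch,Alt3_ch\n0,0,1\n"
print(check_header(sample, ["Alt1_ch", "Alt2_ch", "Alt3_ch"]))  # []
```

An empty list means every expected column is present; any names returned are missing from the header.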
Rows: Observations
Each row should represent one observation, that is, one choice situation: an individual's decision in a specific scenario.
Columns: Alternatives and Choices
Create a separate column for each possible alternative.
Use binary coding to represent choices:
The column for the chosen alternative should be 1.
All other alternative columns in that row should be 0.
Exactly one alternative should be marked as 1 in each row.
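The coding rules above can be checked mechanically. This minimal Python sketch flags rows whose choice columns contain anything other than 0 or 1, or do not contain exactly one 1. The function name check_choice_rows and the row representation (one list of values per observation) are assumptions made for illustration.

```python
def check_choice_rows(rows):
    """Return indices of rows that violate the binary choice coding.

    Each row is a sequence of choice-column values, one per alternative.
    A valid row contains only 0s and 1s, with exactly one 1.
    """
    bad = []
    for i, row in enumerate(rows):
        # float() accepts both numbers and numeric strings read from a file.
        values = [float(v) for v in row]
        if any(v not in (0, 1) for v in values) or sum(values) != 1:
            bad.append(i)
    return bad

rows = [
    [0, 0, 1],  # valid: third alternative chosen
    [1, 1, 0],  # invalid: two alternatives marked as chosen
    [0, 0, 0],  # invalid: no alternative chosen
    [0, 2, 0],  # invalid: value other than 0 or 1
]
print(check_choice_rows(rows))  # [1, 2, 3]
```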
Example:
If a person chooses transit (TR) over driving alone (DA) or shared ride (SR), and Alt3 corresponds to transit:
Alt1_ch, Alt2_ch, Alt3_ch
0, 0, 1
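If your raw data instead stores the choice as a single label per row (for example, a hypothetical mode column holding DA, SR, or TR), you can expand it into the binary columns described above. The sketch assumes the ordering DA, SR, TR maps to Alt1_ch, Alt2_ch, Alt3_ch; adjust the ALTERNATIVES list to match your own coding. The function name to_choice_columns is illustrative.

```python
# Assumed ordering: Alt1 = DA, Alt2 = SR, Alt3 = TR (illustrative only).
ALTERNATIVES = ["DA", "SR", "TR"]

def to_choice_columns(chosen, alternatives=ALTERNATIVES):
    """Expand a single chosen-alternative label into binary choice columns."""
    if chosen not in alternatives:
        raise ValueError(f"unknown alternative: {chosen!r}")
    # 1 for the chosen alternative, 0 for every other alternative.
    return {f"Alt{k}_ch": int(alt == chosen)
            for k, alt in enumerate(alternatives, start=1)}

print(to_choice_columns("TR"))  # {'Alt1_ch': 0, 'Alt2_ch': 0, 'Alt3_ch': 1}
```

Raising an error on an unrecognized label keeps a typo in the raw data from silently producing a row with no chosen alternative.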
Availability of Alternatives
BHATLIB lets you specify, for each individual, which alternatives are available.
If all alternatives are available to all individuals (no restrictions):
No additional availability columns are needed in your dataset.
BHATLIB then assumes that every alternative is available in every observation.
If availability varies across individuals:
Create one column per alternative to indicate availability.
Use binary coding:
1 if the alternative is available to that individual.
0 if the alternative is unavailable.
Example column names: alt1_avail, alt2_avail, alt3_avail, and so on.
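As a minimal sketch, the availability columns for one individual can be generated from the set of alternatives that person could choose. Representing alternatives by 1-based indices and the function name to_availability_columns are assumptions for illustration; the column names follow the alt1_avail pattern above.

```python
def to_availability_columns(available, n_alternatives):
    """Return alt1_avail ... altN_avail flags (1 = available, 0 = unavailable).

    `available` holds the 1-based indices of the alternatives
    that this individual could choose.
    """
    return {f"alt{k}_avail": int(k in available)
            for k in range(1, n_alternatives + 1)}

print(to_availability_columns({1, 3}, 3))  # {'alt1_avail': 1, 'alt2_avail': 0, 'alt3_avail': 1}
```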
Example:
If Alt2 is unavailable to a respondent:
alt1_avail, alt2_avail, alt3_avail
1, 0, 1
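Since an individual cannot choose an alternative that was unavailable to them, a useful consistency check (offered as a suggestion, not a documented BHATLIB rule) is to confirm that each row's chosen alternative has an availability flag of 1. The function name and argument layout below are hypothetical; when no availability columns exist, the sketch treats every alternative as available, mirroring the default described above.

```python
def chosen_is_available(choice, avail=None):
    """Check that the chosen alternative is marked as available.

    `choice` and `avail` are equal-length sequences of 0/1 flags for one row.
    If `avail` is None (no availability columns), every alternative is
    treated as available.
    """
    if avail is None:
        avail = [1] * len(choice)
    if len(avail) != len(choice):
        raise ValueError("choice and availability columns differ in length")
    # Every alternative flagged as chosen must also be flagged as available.
    return all(a == 1 for c, a in zip(choice, avail) if c == 1)

print(chosen_is_available([0, 0, 1], [1, 0, 1]))  # True
print(chosen_is_available([0, 1, 0], [1, 0, 1]))  # False: Alt2 chosen but unavailable
print(chosen_is_available([0, 1, 0]))             # True: no availability columns
```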
Columns: Individual-Specific Variables
You can include additional columns for individual-specific variables (e.g., income or age) that may influence the choice.
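The sketch below writes a small .csv file in the layout this guide describes: choice columns, availability columns, and individual-specific variables, one row per observation with a header row. The records, the income and age values, and the function name write_choice_csv are hypothetical. A .csv file produced this way is one of the formats listed for loadd(), though you should confirm it loads as expected in your own GAUSS setup.

```python
import csv
import io

# Hypothetical records: choice flags, availability flags, and
# individual-specific variables (income, age) for two respondents.
records = [
    {"Alt1_ch": 0, "Alt2_ch": 0, "Alt3_ch": 1,
     "alt1_avail": 1, "alt2_avail": 0, "alt3_avail": 1,
     "income": 52000, "age": 34},
    {"Alt1_ch": 1, "Alt2_ch": 0, "Alt3_ch": 0,
     "alt1_avail": 1, "alt2_avail": 1, "alt3_avail": 1,
     "income": 38000, "age": 51},
]

def write_choice_csv(records, out):
    """Write a header row followed by one row per observation."""
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

buffer = io.StringIO()
write_choice_csv(records, buffer)
print(buffer.getvalue())
```

The output starts with the header line Alt1_ch,Alt2_ch,Alt3_ch,alt1_avail,alt2_avail,alt3_avail,income,age, followed by one line per respondent. To write to disk, pass a file opened with open(path, "w", newline="") instead of the StringIO buffer.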
Summary Checklist
Use a GAUSS loadd()-compatible file with clear column headers.
Each row contains one observation (choice situation).
Use a separate column for each alternative, with binary choice coding (1 for chosen, 0 for non-chosen).
Mark exactly one 1 per row across the choice columns.
No availability columns are needed if all alternatives are available.
If availability varies, add one binary availability column per alternative.
Data prepared according to these guidelines is ready for BHATLIB estimation without additional restructuring.