Data Preparation Guidelines for BHATLIB
This document provides clear, step-by-step guidelines for preparing your data for use with the BHATLIB library in GAUSS.
Overview
BHATLIB expects a consistent data layout for estimating discrete choice models: one row per choice situation, with dedicated columns for choices, optional availability indicators, and individual-specific variables. This guide describes how to prepare your dataset in that layout.
File Format
- Your data file should be in any format compatible with the GAUSS loadd() procedure (e.g., .csv, .gdat, .xls, .dta).
- Ensure the file is clean and formatted with column headers for easy reference.
Rows: Observations
- Each row should represent one observation (choice situation).
- Each row corresponds to an individual’s decision in a specific scenario.
Columns: Alternatives and Choices
- Create separate columns for each possible alternative.
- Use binary coding to represent choices:
  - The column for the chosen alternative should be 1.
  - All other alternative columns for that row should be 0.
- Only one alternative should be marked as 1 per row.
Example:
If a person chooses transit (TR) over driving alone (DA) or shared ride (SR), and transit is coded as the third alternative:
Alt1_ch, Alt2_ch, Alt3_ch
0, 0, 1
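If your raw data stores the choice as a single label rather than as binary columns, the conversion can be done before the file is loaded into GAUSS. The following is a minimal Python sketch of that step, not part of BHATLIB. The `ALTERNATIVES` mapping and the `encode_choice` helper are hypothetical, and the assignment of DA and SR to the first two columns is an assumption made for illustration.

```python
# Hypothetical mapping from mode labels to choice-column names.
# Assumption: DA is Alt1, SR is Alt2, TR is Alt3.
ALTERNATIVES = {"DA": "Alt1_ch", "SR": "Alt2_ch", "TR": "Alt3_ch"}


def encode_choice(chosen_mode):
    """Return binary choice columns with 1 only for the chosen mode."""
    if chosen_mode not in ALTERNATIVES:
        raise ValueError(f"unknown alternative: {chosen_mode!r}")
    return {col: int(mode == chosen_mode) for mode, col in ALTERNATIVES.items()}


row = encode_choice("TR")
print(row)  # {'Alt1_ch': 0, 'Alt2_ch': 0, 'Alt3_ch': 1}
```

Rejecting unknown labels up front avoids silently writing a row with no chosen alternative.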
Availability of Alternatives
BHATLIB allows for flexibility in specifying the availability of alternatives for each individual.
- If all alternatives are available to all individuals (no restrictions):
  - No additional availability columns are needed in your dataset.
  - The system will assume all alternatives are fully available for each observation.
- If availability varies across individuals:
  - Create one column per alternative to indicate availability.
  - Use binary coding:
    - 1 if the alternative is available to that individual.
    - 0 if the alternative is unavailable.
  - Example column names: alt1_avail, alt2_avail, alt3_avail, etc.
Example:
If Alt2 is unavailable to a respondent:
alt1_avail, alt2_avail, alt3_avail
1, 0, 1
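Availability columns can be checked for consistency before estimation. The sketch below is a hypothetical Python helper, not a BHATLIB function. It uses the example column names from this guide and assumes, beyond what the guide states, that a chosen alternative should also be marked available.

```python
def check_availability(row, n_alts=3):
    """Return a list of problems found in one observation (empty if none)."""
    problems = []
    for i in range(1, n_alts + 1):
        avail = row[f"alt{i}_avail"]
        chosen = row[f"Alt{i}_ch"]
        if avail not in (0, 1):
            problems.append(f"alt{i}_avail must be 0 or 1, got {avail!r}")
        # Assumption: a chosen alternative should not be marked unavailable.
        elif chosen == 1 and avail == 0:
            problems.append(f"Alt{i} is chosen but marked unavailable")
    return problems


# The respondent from the example above, who chose Alt3 while Alt2 was unavailable.
row = {"Alt1_ch": 0, "Alt2_ch": 0, "Alt3_ch": 1,
       "alt1_avail": 1, "alt2_avail": 0, "alt3_avail": 1}
print(check_availability(row))  # []
```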
Columns: Individual-Specific Variables
- You can include additional columns for individual-specific variables (e.g., income, age, etc.) that may influence the choice.
Summary Checklist
- Use a GAUSS loadd()-compatible file with clear column headers.
- Each row contains one observation/choice situation.
- Separate columns for each alternative with binary choice coding (1 for chosen, 0 for non-chosen).
- Only one 1 per row in choice columns.
- No availability columns needed if all alternatives are available.
- If availability varies, add one binary availability column per alternative.
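The header and choice-coding items in the checklist can be verified mechanically for a CSV file. The following Python sketch uses only the standard library and is not part of BHATLIB or GAUSS. The `find_bad_choice_rows` helper is hypothetical, and the sample rows and `income` values are invented for illustration.

```python
import csv
import io


def find_bad_choice_rows(csv_text, choice_cols):
    """Return 1-based data-row numbers whose choice columns do not hold
    exactly one 1 with 0 in every other choice column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in choice_cols if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column headers: {missing}")
    expected = ["0"] * (len(choice_cols) - 1) + ["1"]
    bad = []
    for n, row in enumerate(reader, start=1):
        values = [(row[c] or "").strip() for c in choice_cols]
        if sorted(values) != expected:
            bad.append(n)
    return bad


# Invented sample data: row 2 marks two alternatives, row 3 marks none.
sample = """Alt1_ch,Alt2_ch,Alt3_ch,income
0,0,1,52000
1,1,0,48000
0,0,0,61000
"""
print(find_bad_choice_rows(sample, ["Alt1_ch", "Alt2_ch", "Alt3_ch"]))  # [2, 3]
```

The values are compared as strings, so any entry other than a literal 0 or 1 (for example a blank cell) is also reported.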
Data prepared according to these guidelines is ready for BHATLIB analysis without additional restructuring.