dfWider#

Purpose#

Converts a GAUSS dataframe in long panel format to wide panel format.

Format#

df_wide = dfWider(df_long, names_from, values_from[, pctl])#

Parameters:

df_long (Dataframe) – A GAUSS dataframe in long panel format.
names_from (String array) – The name(s) of the columns from which the new column names will be created.
values_from (String array) – The values with which to fill the newly created columns.

pctl (Struct) –

An optional pivotControl structure with the following members:

pctl.names_prefix	String, the characters, if any, that should be added to the front of the newly created variable names. Default = “”, no prefix.
pctl.names_sep_combine	String, the characters, if any, that should be added between the tokens when creating the new variable names. Default = “_”. NOTE: This can ONLY be used if names_from contains multiple variable names.
pctl.id_cols	String array, containing the names of the variables that should be used to determine a unique observation. Default = “”, meaning the combination of all variables other than those specified by names_from and values_from will be used.

Returns:

df_wide (Dataframe) – The input data converted to wide form.

Examples#

Example 1#

// Load long form data
fname = getGAUSSHome("examples/eagle_nests_long.csv");
df_long = loadd(fname);

print df_long;

         region                 year            num_nests
        Pacific                 2007               1039.0
        Pacific                 2009               2587.0
      Southwest                 2007                 51.0
      Southwest                 2009                176.0
Rocky Mountains                 2007                200.0
Rocky Mountains                 2009                338.0

// Specify columns to pull new column names from
names_from = "year";

// Specify columns to pull new column values from
values_from = "num_nests";

// Convert to wide form
df_wide = dfWider(df_long, names_from, values_from);

print df_wide;

         region                 2007                 2009
        Pacific               1039.0               2587.0
Rocky Mountains                200.0                338.0
      Southwest                 51.0                176.0

Example 2: Using id_cols and names_prefix#

Let’s continue with the data from the previous example, but add a new variable, report_id.

// Create new report_id variable
report_id = {
    61178,
    73511,
    26219,
    14948,
    67679,
    71635
 };

// Add report_id to the front of df_long
df_long = asdf(report_id, "report_id") ~ df_long;
print df_long;

report_id          region            year       num_nests
       Pacific            2007            1039
       Pacific            2009            2587
     Southwest            2007              51
     Southwest            2009             176
Rocky Mountains            2007             200
Rocky Mountains            2009             338

By default, dfWider will use all variables that are not in either names_from or values_from to uniquely identify the observations. This worked well in our previous example, but with the report_id variable, every observation is considered unique. This results in output that is not very useful.

print dfWider(df_long, "year", "num_nests");

report_id          region            2007            2009
     Southwest               .             176
     Southwest              51               .
       Pacific            1039               .
Rocky Mountains             200               .
Rocky Mountains               .             338
       Pacific               .            2587

We can use the pivotControl structure to tell dfWider() to only use the region variable to uniquely identify the observations. And just to show you how it works, we’ll also add a prefix to our new year variable names.

// Declare 'pctl' to be a pivotControl structure
// and fill with default settings
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify `region` as id col
pctl.id_cols = "region";

// Specify names prefix
pctl.names_prefix = "year_";

// Pivot data
print dfWider(df_long, "year", "num_nests", pctl);

         region       year_2007       year_2009
        Pacific            1039            2587
Rocky Mountains             200             338
      Southwest              51             176