# sortc

## Purpose

Sorts the rows of a matrix, dataframe or string array.

## Format

y = sortc(x[, c])

Parameters

• x (NxK matrix, dataframe or string array.) – data

• c (scalar, column vector or string array.) – Optional input. Specifies the column(s) of x to sort on. Default = 1.

Returns

y (NxK matrix, dataframe or string array.) – x with its rows sorted on the column(s) specified by c.

## Examples

### Sort rows of a matrix based upon first column

x = { 4 7 3,
1 3 2,
3 4 8 };

// Sort 'x' based upon the first column
y = sortc(x);


The above example code produces y equal to:

1 3 2
3 4 8
4 7 3
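
The optional input c selects a different sort column. As a minimal sketch, reusing the same matrix and sorting on the third column (the name y3 is only illustrative):

```gauss
// Same matrix as in the example above
x = { 4 7 3,
      1 3 2,
      3 4 8 };

// Sort the rows of 'x' on the third column
y3 = sortc(x, 3);

print y3;
```

The third column holds 3, 2 and 8, so the rows should come out in the order 1 3 2, then 4 7 3, then 3 4 8.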


### Sort rows of a dataframe based on multiple columns

This example will demonstrate sorting a dataframe on a single column, and then on two combinations of two columns.

First we will load the data and print it out.

// Get file name with full path
fname = getGAUSSHome() $+ "examples/tips2.dta";

// Load 3 variables from dataset
tips = loadd(fname, "size + tip + sex");

// Index the first 10 observations
tips = tips[1:10,.];

print "original = ";
tips;


will print out:

original

size              tip              sex
2.0000000        1.0100000           Female
3.0000000        1.6600000             Male
3.0000000        3.5000000             Male
2.0000000        3.3100000             Male
4.0000000        3.6100000           Female
4.0000000        4.7100000             Male
2.0000000        2.0000000             Male
4.0000000        3.1200000             Male
2.0000000        1.9600000             Male
2.0000000        3.2300000             Male


Next we will sort the data based on the first column. We do not need to specify which column to sort on since the first column is the default.

print "sorted on 1st column";
sortc(tips);


will print out:

sorted on 1st column

size              tip              sex
2.0000000        1.0100000           Female
2.0000000        3.3100000             Male
2.0000000        2.0000000             Male
2.0000000        1.9600000             Male
2.0000000        3.2300000             Male
3.0000000        1.6600000             Male
3.0000000        3.5000000             Male
4.0000000        3.6100000           Female
4.0000000        4.7100000             Male
4.0000000        3.1200000             Male


This time we will sort the tips dataframe on size and then tip.

print "sorted on 1st column then 2nd column";
sortc(tips, "size" $| "tip");


will print out the following. Notice that the tip column is now sorted within each value of size.

sorted on 1st column then 2nd column

size              tip              sex
2.0000000        1.0100000           Female
2.0000000        1.9600000             Male
2.0000000        2.0000000             Male
2.0000000        3.2300000             Male
2.0000000        3.3100000             Male
3.0000000        1.6600000             Male
3.0000000        3.5000000             Male
4.0000000        3.1200000             Male
4.0000000        3.6100000           Female
4.0000000        4.7100000             Male
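
Because c may also be a column vector, the same kind of two-key sort can presumably be requested with column indices rather than names. The following is a sketch under that assumption, using a small numeric matrix whose values are made up for illustration:

```gauss
// Illustrative 5x2 matrix: a group column and a value column
x = { 2 3.31,
      4 3.61,
      2 1.01,
      3 1.66,
      2 2.00 };

// Sort on column 1 first, then on column 2 within ties of column 1.
// Passing the indices as the column vector 1|2 is assumed to behave
// like passing two variable names for a dataframe.
y = sortc(x, 1|2);

print y;
```

If the assumption holds, the rows with 2 in the first column come first, ordered by their second column (1.01, 2.00, 3.31), followed by the rows for 3 and 4.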


Finally, we will reverse the order of the sort variables to make sure that behavior is clear.

print "sorted on 2nd column then 1st column";
sortc(tips, "tip" $| "size");


Notice that this time, all the tip observations are in sequential order. However, since there are no ties in the tip variable, this is the same result we would get if we only sorted on the tip column.

sorted on 2nd column then 1st column

size              tip              sex
2.0000000        1.0100000           Female
3.0000000        1.6600000             Male
2.0000000        1.9600000             Male
2.0000000        2.0000000             Male
4.0000000        3.1200000             Male
2.0000000        3.2300000             Male
2.0000000        3.3100000             Male
3.0000000        3.5000000             Male
4.0000000        3.6100000           Female
4.0000000        4.7100000             Male


### Sorting on categorical variables

By default categorical variables are sorted by their underlying key value. We will start by loading our data and taking a sample.

// Get file name with full path
fname = getGAUSSHome() $+ "examples/tips2.dta";

// Load 2 variables from dataset
tips = loadd(fname, "sex + size");

// Take a repeatable random sample
rndseed 72917;
tips = sampleData(tips, 10);

print tips;

   sex             size
  Male        2.0000000
  Male        2.0000000
Female        2.0000000
Female        3.0000000
  Male        2.0000000
  Male        2.0000000
Female        1.0000000
Female        2.0000000
  Male        2.0000000
Female        3.0000000


Before we sort this data, let’s get the categorical keys and compare them to the printed labels.

{ label, k } = getcollabels(tips, "sex");


The returned labels and their corresponding keys are:

label = Female    k = 0.0000
        Male          1.0000


Therefore when we sort the data on the sex variable:

print "sorted on 1st column";
sortc(tips);


we see that Female is first and Male is second. This is because the key for Female is zero, not because Female comes before Male alphabetically. See reordercatlabels() for how to set the order of the categories.

   sex             size
Female        2.0000000
Female        3.0000000
Female        1.0000000
Female        2.0000000
Female        3.0000000
  Male        2.0000000
  Male        2.0000000
  Male        2.0000000
  Male        2.0000000
  Male        2.0000000


### Sort rows of a 5x1 string vector

// Create a 5x1 string array, using the string
// vertical concatenation operator '$|'
letters = "epsilon" $|
          "gamma" $|
          "beta" $|
          "alpha" $|
          "delta";

// Sort 'letters'
letters_s = sortc(letters, 1);


The above example code produces letters_s equal to:

alpha
beta
delta
epsilon
gamma


## Remarks

• This function sorts the rows of a matrix with respect to a specified column. That is, it sorts the elements of that column and arranges all rows of the matrix in the same order as the sorted column.

• Missing values will sort as if their value is below $-\infty$.

• The sort will be in ascending order.

• This function uses the Quicksort algorithm.

• If you need to obtain the matrix sorted in descending order, you can use:

rev(sortc(x, c))
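
The remarks on missing values and descending order can be illustrated together. This is a minimal sketch with made-up data; the names v, v_asc and v_desc are hypothetical, and miss() is used only to create a missing value for the demonstration:

```gauss
// Illustrative 4x1 vector; the 0 is a placeholder for a missing value
v = { 3, 0, 1, 2 };

// Convert the placeholder 0 to a missing value
v = miss(v, 0);

// Ascending sort: per the remarks, the missing value should come first
v_asc = sortc(v, 1);

// Descending sort: reverse the row order of the ascending result
v_desc = rev(v_asc);

print v_asc;
print v_desc;
```

Given the behavior described in the remarks, v_asc should list the missing value followed by 1, 2, 3, and v_desc should list 3, 2, 1 followed by the missing value. Note that reversing also moves missing values to the end, which may or may not be what you want.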