sortc
Purpose
Sorts the rows of a matrix, dataframe or string array in ascending order, based on one or more columns.
Format
- y = sortc(x[, c])
- Parameters:
x (NxK matrix, dataframe or string array) – Data to sort.
c (scalar, column vector or string array) – Optional input, specifies the column(s) of x to sort on. Default = 1.
- Returns:
y (NxK matrix, dataframe or string array) – The rows of x, sorted on the column(s) specified by c.
Examples
Sort rows of a matrix based upon first column
x = { 4 7 3,
1 3 2,
3 4 8 };
// Sort 'x' based upon the first column
y = sortc(x);
The above example code produces y equal to:
1 3 2
3 4 8
4 7 3
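The optional input c selects a different sort column. As a minimal sketch using the same matrix, the code below sorts on the third column instead of the default first column; the variable name y3 is chosen only for this illustration.

```gauss
x = { 4 7 3,
      1 3 2,
      3 4 8 };

// Sort the rows of 'x' based upon the third column
y3 = sortc(x, 3);

// The third column holds 3, 2 and 8, so the rows of 'y3'
// should come out in the order:
//    1 3 2
//    4 7 3
//    3 4 8
print y3;
```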
Sort rows of a dataframe based on multiple columns
This example will demonstrate sorting a dataframe on a single column, and then on two combinations of two columns.
First we will load the data and print it out.
// Get file name with full path
fname = getGAUSSHome("examples/tips2.dta");
// Load 3 variables from dataset
tips = loadd(fname, "size + tip + sex");
// Index the first 10 observations
tips = tips[1:10,.];
print "original = ";
tips;
The above code will print out:
original =
size tip sex
2.0000000 1.0100000 Female
3.0000000 1.6600000 Male
3.0000000 3.5000000 Male
2.0000000 3.3100000 Male
4.0000000 3.6100000 Female
4.0000000 4.7100000 Male
2.0000000 2.0000000 Male
4.0000000 3.1200000 Male
2.0000000 1.9600000 Male
2.0000000 3.2300000 Male
Next we will sort the data based on the first column. We do not need to specify which column to sort on since the first column is the default.
print "sorted on 1st column";
sortc(tips);
The above code will print out:
sorted on 1st column
size tip sex
2.0000000 1.0100000 Female
2.0000000 3.3100000 Male
2.0000000 2.0000000 Male
2.0000000 1.9600000 Male
2.0000000 3.2300000 Male
3.0000000 1.6600000 Male
3.0000000 3.5000000 Male
4.0000000 3.6100000 Female
4.0000000 4.7100000 Male
4.0000000 3.1200000 Male
This time we will sort the tips dataframe on size and then tip.
print "sorted on 1st column then 2nd column";
sortc(tips, "size" $| "tip");
The above code will print out the following. Notice that the tip column is now sorted within each category of size.
sorted on 1st column then 2nd column
size tip sex
2.0000000 1.0100000 Female
2.0000000 1.9600000 Male
2.0000000 2.0000000 Male
2.0000000 3.2300000 Male
2.0000000 3.3100000 Male
3.0000000 1.6600000 Male
3.0000000 3.5000000 Male
4.0000000 3.1200000 Male
4.0000000 3.6100000 Female
4.0000000 4.7100000 Male
Finally, we will reverse the order of the sort variables to make sure that behavior is clear.
print "sorted on 2nd column then 1st column";
sortc(tips, "tip" $| "size");
Notice that this time, all the tip observations are in sequential order. However, since there are no ties in the tip variable, this is the same result we would get if we only sorted on the tip column.
sorted on 2nd column then 1st column
size tip sex
2.0000000 1.0100000 Female
3.0000000 1.6600000 Male
2.0000000 1.9600000 Male
2.0000000 2.0000000 Male
4.0000000 3.1200000 Male
2.0000000 3.2300000 Male
2.0000000 3.3100000 Male
3.0000000 3.5000000 Male
4.0000000 3.6100000 Female
4.0000000 4.7100000 Male
Sorting on categorical variables
By default, categorical variables are sorted by their underlying key value. We will start by loading our data and taking a sample.
// Get file name with full path
fname = getGAUSSHome("examples/tips2.dta");
// Load 2 variables from dataset
tips = loadd(fname, "sex + size");
// Take a repeatable random sample
rndseed 72917;
tips = sampleData(tips, 10);
print tips;
sex size
Male 2.0000000
Male 2.0000000
Female 2.0000000
Female 3.0000000
Male 2.0000000
Male 2.0000000
Female 1.0000000
Female 2.0000000
Male 2.0000000
Female 3.0000000
Before we sort this data, let’s get the categorical keys and compare them to the printed labels.
{ label, k } = getcollabels(tips, "sex");
After running the above code, label and k contain the labels and their corresponding keys:
label = Female    k = 0.0000
        Male          1.0000
Therefore, when we sort the data on the sex variable:
print "sorted on 1st column";
sortc(tips);
we see that Female is first and Male is second. This is because the key for Female is zero, not because Female comes before Male alphabetically. See reordercatlabels() for how to set the order of the categories.
sex size
Female 2.0000000
Female 3.0000000
Female 1.0000000
Female 2.0000000
Female 3.0000000
Male 2.0000000
Male 2.0000000
Male 2.0000000
Male 2.0000000
Male 2.0000000
Sort rows of a 5x1 string vector
// Create a 5x1 string array, using the string
// vertical concatenation operator '$|'
letters = "epsilon" $|
"gamma" $|
"beta" $|
"alpha" $|
"delta";
// Sort 'letters'
letters_s = sortc(letters, 1);
The above example code produces letters_s equal to:
alpha
beta
delta
epsilon
gamma
Remarks
This function sorts the rows of a matrix with respect to the specified column(s). That is, it sorts the elements of a column and arranges all rows of the matrix in the same order as the sorted column.
Missing values will sort as if their value is below \(-\infty\).
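As a small illustration of this remark, the sketch below creates a column vector containing one missing value and sorts it. Using miss() to convert a placeholder value of 0 into a missing value is a choice made only for this example, as is the variable name x_s.

```gauss
// Create a 4x1 vector, using 0 as a placeholder
x = { 3, 0, 1, 2 };

// Convert the placeholder 0 to a missing value
x = miss(x, 0);

// Sort 'x'; based on the remark above, the missing
// value is expected in the first row, followed by 1, 2, 3
x_s = sortc(x, 1);
print x_s;
```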
The sort will be in ascending order.
This function uses the Quicksort algorithm.
If you need to obtain the matrix sorted in descending order, you can use:
rev(sortc(x, c))
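For instance, a minimal sketch of the descending sort applied to the 3x3 matrix from the first example might look like this; the variable name y_desc is chosen only for illustration.

```gauss
x = { 4 7 3,
      1 3 2,
      3 4 8 };

// Sort ascending on the first column, then reverse the row order
y_desc = rev(sortc(x, 1));

// The rows of 'y_desc' should come out in the order:
//    4 7 3
//    3 4 8
//    1 3 2
print y_desc;
```

Because rev() reverses the entire row order, rows that tie on the sort column also end up in the reverse of the order that sortc() returned them in.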
See also
Functions getcollabels(), reordercatlabels(), rev(), sortind(), unique()