Data cleaning

Size

cols Returns number of columns in a matrix, string array or dataframe.
getdims Returns the number of dimensions in a matrix, string array or n-dimensional array.
getorders Returns the size of each dimension of a matrix, string array or n-dimensional array.
rows Returns number of rows in a matrix, string array or dataframe.

Selection

delcols Removes variables from a dataframe specified by index or name.
delif Removes rows of data based on a logical expression.
delrows Removes observations (rows) from a dataframe by index.
diag Extracts the diagonal of a matrix.
getmatrix Gets a contiguous matrix from an N-dimensional array.
head Returns the first n rows of a matrix, dataframe or string array.
selif Keeps rows of data based on a logical expression.
submat Extracts a submatrix from a matrix.
subvec Extracts an Nx1 vector of elements from an NxK matrix.
tail Returns the last n rows of a matrix, dataframe or string array.
trimr Trims rows from the top or bottom.
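
As a rough illustration of how the row-selection functions above relate to each other, here is a minimal sketch on a small made-up matrix. The data and variable names are hypothetical; only selif, delif and trimr come from the list above.

```gauss
// Hypothetical 5x2 data matrix (illustrative values only)
x = { 1 10, 2 20, 3 30, 4 40, 5 50 };

// Logical condition: 1 where the first column exceeds 2, 0 otherwise
cond = x[., 1] .> 2;

// selif keeps the rows where the condition is true
x_keep = selif(x, cond);

// delif removes the rows where the condition is true
x_drop = delif(x, cond);

// trimr drops 1 row from the top and 2 rows from the bottom
x_trim = trimr(x, 1, 2);

print x_keep;
print x_drop;
print x_trim;
```

selif and delif are complements here: with the same condition, every row of x ends up in exactly one of x_keep and x_drop.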

Merging

dfappend Vertically concatenates (or stacks) two dataframes.
innerJoin Joins two matrices or dataframes based upon user-specified key columns, with non-matching rows removed.
insertcols Inserts one or more new columns into a matrix or dataframe at a specified location.
outerJoin Performs a left, or full, outer join on two matrices based upon user-specified key columns.
where Returns elements from a or b, depending on condition.

Duplicate observations

dropduplicates Drops duplicate observations from data.
getduplicates Identifies duplicate observations and prints report.
isunique Checks if all observations in the matrix or dataframe are unique.
isrowunique Returns a binary vector with a one for every row that is unique, otherwise a zero.

Missing values

impute Replaces missing values in the columns of a matrix by a specified imputation method.
isinfnanmiss Returns true if the argument contains an infinity, NaN, or missing value.
ismiss Returns 1 if matrix has any missing values, 0 otherwise.
miss, missrv Creates a scalar missing value, or converts (or replaces) specified elements in a matrix to GAUSS’s missing value code.
missex Converts numeric values to the missing value code according to the values given in a logical expression.
msym Controls the symbol printed to represent missing values.
packr Deletes the rows of a matrix that contain any missing values.
scalmiss Returns 1 if the input is a scalar missing value.
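
A typical cleaning pass first converts a sentinel value to the missing value code and then either drops or replaces the affected rows. The sketch below assumes a hypothetical sentinel of -999 and made-up data; it is one possible workflow, not a prescribed one.

```gauss
// Hypothetical 3x2 matrix where -999 marks an unrecorded value
x = { 1 2, -999 4, 5 6 };

// Convert every -999 to the GAUSS missing value code
x_m = miss(x, -999);

// ismiss reports whether any element is missing
has_missing = ismiss(x_m);

// Option 1: delete every row that contains a missing value
x_complete = packr(x_m);

// Option 2: replace missing values with 0 instead of dropping rows
x_filled = missrv(x_m, 0);

print has_missing;
print x_complete;
print x_filled;
```

packr shortens the data, while missrv keeps every row, so the choice depends on whether later steps need the original row count.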

Searching

between Indicates whether elements in a matrix fall between a specified lower and upper bound.
contains Indicates whether one matrix, multidimensional array or string array contains any elements from another symbol.
counts Returns number of elements of a vector falling in specified ranges.
countwts Returns weighted count of elements of a vector falling in specified ranges.
indexcat Returns indices of elements falling within a specified range.
indnv Checks one numeric vector against another and returns the indices of the elements of the first vector in the second vector.
isempty Checks whether a symbol is an empty matrix.
ismember Checks whether each element of a matrix or string array matches any element from a separate symbol.
maxindc Returns row number of largest element in each column of a matrix.
minindc Returns row number of smallest element in each column of a matrix.
rowcontains Checks whether any element in the row of a matrix or string array matches any element from a separate symbol.

Sorting and set functions

intrsect Returns the intersection of two vectors.
setdif Returns the unique elements in one vector that are not present in a second vector.
sortc Sorts a numeric matrix, character matrix or string array.
sortind, sortindc Returns the sorted index of x.
sortmc Sorts a matrix on multiple columns.
sortr, sortrc Sorts the columns of a matrix of numeric or character data, with respect to a specified row.
union Returns the union of two vectors.
unique Sorts and removes duplicate elements from a vector.
uniqindx Computes the sorted index of x, leaving out duplicate elements.
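
The set functions above can be sketched on two small numeric column vectors. The values are made up, and the final argument of 1 passed to intrsect, union and setdif marks the data as numeric.

```gauss
// Two hypothetical numeric column vectors
a = { 3, 1, 2, 5 };
b = { 5, 2, 8 };

// Sort a in ascending order on its first (only) column
a_sorted = sortc(a, 1);

// Elements present in both vectors
common = intrsect(a, b, 1);

// Elements present in either vector
either = union(a, b, 1);

// Elements of a that do not appear in b
only_a = setdif(a, b, 1);

print a_sorted;
print common;
print either;
print only_a;
```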

String and categorical variables

getcollabels Returns the unique set of column labels and corresponding key values for a categorical variable.
recodeCatLabels Replaces the labels in a categorical variable of a dataframe.
reorderCatLabels Changes the order of the labels in a categorical variable of a dataframe.
setBaseCat Sets a specified category to be the base case for a categorical variable.

These functions can be used to fix errors in categorical labels.

strreplace Replaces a substring within a categorical label or string element.
strtof Converts a string or categorical variable of a dataframe to a numeric variable.
strtrim Strips all white space characters from the left and right side of each element in a categorical variable or string array.
strtriml Strips all white space characters from the left side of each element in a categorical variable or string array.
strtrimr Strips all white space characters from the right side of each element in a categorical variable or string array.

Transform

code Allows a new variable to be created (coded) with different values depending upon which one of a set of logical expressions is true.
dfLonger Converts a GAUSS dataframe in wide panel format to long panel format.
dfWider Converts a GAUSS dataframe in long panel format to wide panel format.
diagrv Inserts a vector into the diagonal of a matrix.
dummy Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is unbounded on the right.
dummybr Creates a set of dummy (0/1) variables. The highest (rightmost) category is bounded on the right.
dummydn Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is unbounded on the right, and a specified column of dummies is dropped.
lagn Lags (or leads) a matrix a specified number of time periods for time series analysis.
lagTrim Lags (or leads) a vector a specified number of time periods and removes the incomplete rows.
maxv Performs an element by element comparison of two matrices and returns the maximum value for each element.
minv Performs an element by element comparison of two matrices and returns the minimum value for each element.
order Reorders a matrix based on user-specified ordering. Relocates columns to the beginning of the dataset in the order in which the variables are specified.
pdDiff Computes time series differences of panel data.
pdLag Computes time series lags of panel data.
reclassify Replaces specified values of a matrix, array or string array.
reclassifyCuts Replaces values of a matrix or array within specified ranges.
rev Reverses the order of rows of a matrix.
reshape Reshapes a dataframe, matrix or string array to new dimensions.
rotater Rotates the rows of a matrix, wrapping elements as necessary.
shiftc Shifts (lags or leads) columns of a matrix, filling in holes with a specified value.
shiftr Shifts rows of a matrix, filling in holes with a specified value.
subscat Changes the values in a vector depending on the category a particular element falls in.
substute Substitutes new values for old values in a matrix, depending on the outcome of a logical expression.
vec, vecr Stacks columns or rows of a matrix to form a single column.
vech Reshapes the lower triangular portion of a symmetric matrix into a column vector.
xpnd Expands a column vector into a symmetric matrix.
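
Several of the reshaping functions above are easiest to understand side by side. The following sketch uses made-up matrices to show vec against vecr, the vech and xpnd round trip on a symmetric matrix, and a one-period lag with lagn.

```gauss
// Hypothetical 2x3 matrix
m = { 1 2 3, 4 5 6 };

// vec stacks the columns, vecr stacks the rows
by_cols = vec(m);
by_rows = vecr(m);

// Hypothetical symmetric 3x3 matrix
s = { 1 2 3, 2 4 5, 3 5 6 };

// vech keeps the lower triangle as a column vector,
// and xpnd rebuilds the symmetric matrix from it
lower = vech(s);
s_rebuilt = xpnd(lower);

// Lag a hypothetical series by one period;
// the first element of the result has no predecessor and is missing
y = { 10, 20, 30, 40 };
y_lag1 = lagn(y, 1);

print by_cols;
print by_rows;
print s_rebuilt;
print y_lag1;
```

Because s is symmetric, s_rebuilt should match s element for element, which makes vech a compact way to store such matrices.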

Scaling and normalization

rescale Scales the columns of a matrix using a specified centering and scaling method.