Data cleaning

Size

cols

Returns number of columns in a matrix, string array or dataframe.

getdims

Returns the number of dimensions in a matrix, string array or n-dimensional array.

getorders

Returns the size of each dimension of a matrix, string array or n-dimensional array.

rows

Returns number of rows in a matrix, string array or dataframe.
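
As a minimal sketch of the size functions above, the following uses a small matrix `x` that is made up purely for illustration.

```gauss
// Hypothetical 2x3 matrix used only for illustration
x = { 1 2 3,
      4 5 6 };

r = rows(x);    // number of rows: 2
c = cols(x);    // number of columns: 3

print r c;
```

Checking `rows` and `cols` before and after a cleaning step is a simple way to confirm how many observations or variables the step removed.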

Selection

delcols

Removes variables from a dataframe specified by index or name.

delif

Removes rows of data based on a logical expression.

delrows

Removes observations (rows) from a dataframe by index.

diag

Extracts the diagonal of a matrix.

getmatrix

Gets a contiguous matrix from an N-dimensional array.

head

Returns the first n rows of a matrix, dataframe or string array.

selif

Keeps rows of data based on a logical expression.

submat

Extracts a submatrix from a matrix.

subvec

Extracts an Nx1 vector of elements from an NxK matrix.

tail

Returns the last n rows of a matrix, dataframe or string array.

trimr

Trims rows from the top and/or bottom of a matrix.
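
The row-selection functions above can be sketched as follows. The matrix `x` and the condition on its first column are hypothetical and chosen only to show how `selif`, `delif` and `trimr` relate to each other.

```gauss
// Hypothetical 4x2 matrix used only for illustration
x = { 1 10,
      2 20,
      3 30,
      4 40 };

// Logical (0/1) column vector: 1 where the first column exceeds 2
e = x[., 1] .> 2;

kept    = selif(x, e);     // rows where e is 1: { 3 30, 4 40 }
dropped = delif(x, e);     // rows where e is 0: { 1 10, 2 20 }

// Remove 1 row from the top and 1 row from the bottom
middle = trimr(x, 1, 1);   // { 2 20, 3 30 }
```

Note that `selif` and `delif` take the same logical vector and return complementary sets of rows.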

Merging

dfappend

Vertically concatenates (or stacks) two dataframes.

innerJoin

Joins two matrices or dataframes based upon user-specified key columns, with non-matching rows removed.

insertcols

Inserts one or more new columns into a matrix or dataframe at a specified location.

outerJoin

Performs a left or full outer join on two matrices based upon user-specified key columns.

where

Returns elements from a or b, depending on condition.
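
A minimal sketch of the two join functions, assuming the four-argument call forms `innerJoin(A, ca, B, cb)` and `outerJoin(A, ca, B, cb)`, where `ca` and `cb` are the key column indices. The matrices `A` and `B` are hypothetical, and the exact column layout of the results should be checked against the command reference for your GAUSS version.

```gauss
// Hypothetical matrices: column 1 is the key, column 2 holds a value
A = { 1 10,
      2 20,
      3 30 };
B = { 2 200,
      3 300,
      4 400 };

// Key column indices (assumed: column 1 in both A and B)
ca = 1;
cb = 1;

C_inner = innerJoin(A, ca, B, cb);
C_outer = outerJoin(A, ca, B, cb);

print "rows in inner join:" rows(C_inner);
print "rows in outer join:" rows(C_outer);
```

Comparing the row counts of the two results is a quick way to see how unmatched keys were handled by each join.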

Duplicate observations

dropduplicates

Drops duplicate observations from data.

getduplicates

Identifies duplicate observations and prints report.

isunique

Checks if all observations in the matrix or dataframe are unique.

isrowunique

Returns a binary vector with a one for every row that is unique, otherwise a zero.

Missing values

impute

Replaces missing values in the columns of a matrix by a specified imputation method.

isinfnanmiss

Returns true if the argument contains an infinity, NaN, or missing value.

ismiss

Returns 1 if matrix has any missing values, 0 otherwise.

miss, missrv

Creates a scalar missing value, or converts (or replaces) specified elements in a matrix to GAUSS’s missing value code.

missex

Converts numeric values to the missing value code according to the values given in a logical expression.

msym

Controls the symbol printed to represent missing values.

packr

Deletes the rows of a matrix that contain any missing values.

scalmiss

Returns 1 if the input is a scalar missing value.
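
The missing-value functions above can be combined as in the following sketch. The placeholder value `-999` and the matrix `raw` are assumptions made for illustration only.

```gauss
// Hypothetical raw data in which -999 marks an unrecorded value
raw = { 1    2,
        -999 4,
        5    6 };

// Convert the placeholder to the GAUSS missing value code
x = miss(raw, -999);

has_missing = ismiss(x);    // 1, because x now contains a missing value

complete = packr(x);        // drops row 2: { 1 2, 5 6 }

filled = missrv(x, 0);      // replaces the missing value with 0: { 1 2, 0 4, 5 6 }
```

Whether to drop incomplete rows with `packr` or replace values with `missrv` depends on the analysis; the sketch only shows the mechanics of each call.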

Searching

between

Indicates whether elements in a matrix fall between a specified lower and upper bound.

contains

Indicates whether one matrix, multidimensional array or string array contains any elements from another symbol.

counts

Returns number of elements of a vector falling in specified ranges.

countwts

Returns weighted count of elements of a vector falling in specified ranges.

indexcat

Returns indices of elements falling within a specified range.

indnv

Checks one numeric vector against another and returns the indices of the elements of the first vector in the second vector.

isempty

Checks whether a symbol is an empty matrix.

ismember

Checks whether each element of a matrix or string array matches any element from a separate symbol.

maxindc

Returns row number of largest element in each column of a matrix.

minindc

Returns row number of smallest element in each column of a matrix.

rowcontains

Checks whether any element in the row of a matrix or string array matches any element from a separate symbol.
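
A short sketch of some of the search functions above, using hypothetical data. For `counts`, the breakpoints in `v` are assumed to be sorted in ascending order, with each range including its upper bound.

```gauss
// Hypothetical 3x2 matrix used only for illustration
x = { 3 9,
      7 1,
      5 4 };

imax = maxindc(x);    // row of the largest element per column: 2 | 1
imin = minindc(x);    // row of the smallest element per column: 1 | 2

// Hypothetical data vector and ascending breakpoints
d = { 1, 2, 3, 6, 7, 9 };
v = { 2, 5, 10 };

// Counts of d in the ranges d <= 2, 2 < d <= 5, 5 < d <= 10
n = counts(d, v);     // 2 | 1 | 3
```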

Sorting and set functions

intrsect

Returns the intersection of two vectors.

setdif

Returns the unique elements in one vector that are not present in a second vector.

sortc

Sorts a numeric matrix, character matrix or string array.

sortind, sortindc

Returns the sorted index of x.

sortmc

Sorts a matrix on multiple columns.

sortr, sortrc

Sorts the columns of a matrix of numeric or character data, with respect to a specified row.

union

Returns the union of two vectors.

unique

Sorts and removes duplicate elements from a vector.

uniqindx

Computes the sorted index of x, leaving out duplicate elements.
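
The sorting and set functions above can be sketched as follows. The data are hypothetical, and the third argument `1` passed to `intrsect`, `union` and `setdif` is the flag for numeric (rather than character) data in the classic three-argument call forms, which this sketch assumes.

```gauss
// Hypothetical 3x2 matrix; sort its rows by the first column
x = { 3 30,
      1 10,
      2 20 };
s = sortc(x, 1);            // { 1 10, 2 20, 3 30 }

// Hypothetical numeric column vectors
a = { 1, 2, 3, 4 };
b = { 3, 4, 5 };

both   = intrsect(a, b, 1); // elements in both: 3 | 4
either = union(a, b, 1);    // elements in either: 1 | 2 | 3 | 4 | 5
only_a = setdif(a, b, 1);   // elements of a not in b: 1 | 2
```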

String and categorical variables

getcollabels

Returns the unique set of column labels and corresponding key values for a categorical variable.

recodeCatLabels

Replaces the labels in a categorical variable of a dataframe.

reorderCatLabels

Changes the order of the labels in a categorical variable of a dataframe.

setBaseCat

Sets a specified category to be the base case for a categorical variable.

The following functions can be used to fix errors in categorical labels.

strreplace

Replaces a substring within a categorical label or string element.

strtof

Converts a string or categorical variable of a dataframe to a numeric variable.

strtrim

Strips all white space characters from the left and right side of each element in a categorical variable or string array.

strtriml

Strips all white space characters from the left side of each element in a categorical variable or string array.

strtrimr

Strips all white space characters from the right side of each element in a categorical variable or string array.
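
As a minimal sketch of the label-fixing functions above, the following applies them to plain strings. The strings are hypothetical, and the sketch does not show the dataframe or categorical-variable forms.

```gauss
// Hypothetical label with stray white space on both sides
s = "  male  ";

both  = strtrim(s);     // "male"
left  = strtriml(s);    // "male  "
right = strtrimr(s);    // "  male"

// Replace a substring, for example to correct a misspelled label
fixed = strreplace("femal", "femal", "female");    // "female"

print both;
print fixed;
```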

Transform

code

Allows a new variable to be created (coded) with different values depending upon which one of a set of logical expressions is true.

dfLonger

Converts a GAUSS dataframe in wide panel format to long panel format.

dfWider

Converts a GAUSS dataframe in long panel format to wide panel format.

diagrv

Inserts a vector into the diagonal of a matrix.

dummy

Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is unbounded on the right.

dummybr

Creates a set of dummy (0/1) variables. The highest (rightmost) category is bounded on the right.

dummydn

Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is unbounded on the right, and a specified column of dummies is dropped.

lagn

Lags (or leads) a matrix a specified number of time periods for time series analysis.

lagTrim

Lags (or leads) a vector a specified number of time periods and removes the incomplete rows.

maxv

Performs an element by element comparison of two matrices and returns the maximum value for each element.

minv

Performs an element by element comparison of two matrices and returns the minimum value for each element.

order

Reorders a matrix based on a user-specified ordering. Relocates columns to the beginning of the dataset in the order in which the variables are specified.

pdDiff

Computes time series differences of panel data.

pdLag

Computes time series lags of panel data.

reclassify

Replaces specified values of a matrix, array or string array.

reclassifyCuts

Replaces values of a matrix or array within specified ranges.

rev

Reverses the order of rows of a matrix.

reshape

Reshapes a dataframe, matrix or string array to new dimensions.

rotater

Rotates the rows of a matrix, wrapping elements as necessary.

shiftc

Shifts (lags or leads) the columns of a matrix, filling in holes with a specified value.

shiftr

Shifts rows of a matrix, filling in holes with a specified value.

subscat

Changes the values in a vector depending on the category a particular element falls in.

substute

Substitutes new values for old values in a matrix, depending on the outcome of a logical expression.

vec, vecr

Stacks columns or rows of a matrix to form a single column.

vech

Reshapes the lower triangular portion of a symmetric matrix into a column vector.

xpnd

Expands a column vector into a symmetric matrix.
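
A few of the reshaping and lagging functions above can be sketched as follows. All matrices are hypothetical and chosen so that the results are easy to check by hand.

```gauss
// Hypothetical 2x2 matrix used only for illustration
x = { 1 2,
      3 4 };

vc = vec(x);     // stacks columns: 1 | 3 | 2 | 4
vr = vecr(x);    // stacks rows:    1 | 2 | 3 | 4

// Hypothetical symmetric matrix
s = { 1 2,
      2 5 };

v  = vech(s);    // lower triangle as a column vector: 1 | 2 | 5
s2 = xpnd(v);    // expands v back into the symmetric matrix s

// Hypothetical 3x1 time series, lagged by one period
t = { 10, 20, 30 };
l = lagn(t, 1);  // missing | 10 | 20
```

Because `lagn` fills the vacated leading row with a missing value, the result keeps the same number of rows as the input.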

Scaling and normalization

rescale

Scales the columns of a matrix using a specified centering and scaling method.