Data cleaning
Size
Returns the number of columns in a matrix, string array or dataframe.

Returns the number of dimensions in a matrix, string array or N-dimensional array.

Returns the dimensions of a matrix, string array or N-dimensional array.

Returns the number of rows in a matrix, string array or dataframe.
Selection
Removes variables, specified by index or name, from a dataframe.

Removes rows of data based on a logical expression.

Removes observations (rows) from a dataframe by index.

Extracts the diagonal of a matrix.

Gets a contiguous matrix from an N-dimensional array.

Returns the first

Keeps rows of data based on a logical expression.

Extracts a submatrix from a matrix.

Extracts an Nx1 vector of elements from an NxK matrix.

Returns the last

Trims rows from the top or bottom.
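The two expression-based entries above are complements. Each turns a logical expression into one 0/1 flag per row; one keeps the flagged rows and the other removes them. A minimal Python sketch of that idea follows; it is not GAUSS code, and the helper names `keep_rows` and `remove_rows` are hypothetical.

```python
def keep_rows(rows, mask):
    """Return the rows whose mask entry is true (hypothetical helper)."""
    if len(rows) != len(mask):
        raise ValueError("mask must have one entry per row")
    return [row for row, keep in zip(rows, mask) if keep]


def remove_rows(rows, mask):
    """Return the rows whose mask entry is false: the complement of keep_rows."""
    return keep_rows(rows, [not m for m in mask])


data = [[1, 10.0], [2, -3.5], [3, 7.2], [4, -1.0]]
# The logical expression is evaluated once per row: "second column is positive".
mask = [row[1] > 0 for row in data]

print(keep_rows(data, mask))    # [[1, 10.0], [3, 7.2]]
print(remove_rows(data, mask))  # [[2, -3.5], [4, -1.0]]
```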
Merging
Vertically concatenates (stacks) two dataframes.

Performs a left or full outer join on two matrices based on user-specified key columns.

Inserts one or more new columns into a matrix or dataframe at a specified location.

Joins two matrices or dataframes on user-specified key columns, removing rows that have no match in the other input (an inner join).

Returns elements from
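The inner join and the left outer join above differ only in how they treat rows whose key has no match. The following minimal Python sketch illustrates that difference. It assumes rows are lists and the key sits at the same column index in both inputs. `join` is a hypothetical helper, and `None` stands in for the missing values a left join would use as padding. Full outer joins, which also keep unmatched rows from the second input, are not shown.

```python
def join(left, right, key, how="inner", fill=None):
    """Join two lists of rows on the column at index `key` (same index in both).

    how="inner": rows of `left` without a match in `right` are dropped.
    how="left":  every row of `left` is kept; unmatched rows are padded with `fill`.
    """
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    # Index the right-hand rows by key, storing only their non-key columns.
    index = {}
    for row in right:
        rest = [v for j, v in enumerate(row) if j != key]
        index.setdefault(row[key], []).append(rest)
    n_rest = len(right[0]) - 1 if right else 0
    out = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            # One output row per matching right-hand row.
            out.extend(row + rest for rest in matches)
        elif how == "left":
            out.append(row + [fill] * n_rest)
    return out


left = [[1, "a"], [2, "b"], [3, "c"]]
right = [[1, 100], [3, 300], [4, 400]]
print(join(left, right, key=0))              # [[1, 'a', 100], [3, 'c', 300]]
print(join(left, right, key=0, how="left"))  # [[1, 'a', 100], [2, 'b', None], [3, 'c', 300]]
```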
Duplicate observations
Drops duplicate observations from the data.

Identifies duplicate observations and prints a report.

Checks whether all observations in a matrix or dataframe are unique.

Returns a binary vector containing a one for each unique row and a zero otherwise.
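Dropping duplicates and checking uniqueness both reduce to comparing whole rows. The following minimal Python sketch uses hypothetical helpers `drop_duplicate_rows` and `all_rows_unique`. It keeps each row's first occurrence and preserves the original order; the GAUSS functions may order their output differently.

```python
def drop_duplicate_rows(rows):
    """Return rows with repeats removed, keeping each row's first occurrence."""
    seen = set()
    out = []
    for row in rows:
        signature = tuple(row)  # lists are unhashable; tuples of hashable values are
        if signature not in seen:
            seen.add(signature)
            out.append(row)
    return out


def all_rows_unique(rows):
    """True when no row appears more than once."""
    return len({tuple(row) for row in rows}) == len(rows)


data = [[1, "a"], [2, "b"], [1, "a"], [3, "c"]]
print(drop_duplicate_rows(data))                   # [[1, 'a'], [2, 'b'], [3, 'c']]
print(all_rows_unique(data))                       # False
print(all_rows_unique(drop_duplicate_rows(data)))  # True
```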
Missing values
Replaces missing values in the columns of a matrix using a specified imputation method.

Returns true if the argument contains an infinity, NaN, or missing value.

Returns 1 if a matrix has any missing values, 0 otherwise.

Creates a scalar missing value, or converts (replaces) specified elements of a matrix to GAUSS’s missing value code.

Converts numeric values to the missing value code according to a logical expression.

Controls the symbol printed to represent missing values.

Deletes the rows of a matrix that contain any missing values.

Returns 1 if the input is a scalar missing value.
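Two of the checks above, "any missing values?" and "delete rows with any missing value", are easy to sketch. The following minimal Python version uses `float("nan")` as a stand-in for GAUSS's missing value code; that substitution is an assumption, not how GAUSS stores missing values. `has_missing` and `drop_missing_rows` are hypothetical names.

```python
import math

MISSING = float("nan")  # stand-in for GAUSS's missing value code (assumption)


def has_missing(rows):
    """Return 1 if any element is missing (NaN), 0 otherwise."""
    # NaN never compares equal to itself, so test with math.isnan, not ==.
    return int(any(math.isnan(v) for row in rows for v in row))


def drop_missing_rows(rows):
    """Delete every row that contains at least one missing value."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


data = [[1.0, 2.0], [MISSING, 5.0], [3.0, 4.0], [6.0, MISSING]]
print(has_missing(data))        # 1
print(drop_missing_rows(data))  # [[1.0, 2.0], [3.0, 4.0]]
```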
Searching
Indicates whether elements of a matrix fall between a specified lower and upper bound.

Indicates whether one matrix, multidimensional array or string array contains any elements from another symbol.

Returns the number of elements of a vector falling in specified ranges.

Returns the weighted count of elements of a vector falling in specified ranges.

Returns the indices of elements falling within a specified range.

Checks one numeric vector against another and returns, for each element of the first vector, its index in the second vector.

Checks whether a symbol is an empty matrix.

Checks whether each element of a matrix or string array matches any element from a separate symbol.

Returns the row number of the largest element in each column of a matrix.

Returns the row number of the smallest element in each column of a matrix.

Checks whether any element in each row of a matrix or string array matches any element from a separate symbol.
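Counting (or weighted counting) of elements that fall in specified ranges amounts to binning against a sorted list of breakpoints. The following minimal Python sketch uses a hypothetical helper, `count_in_ranges`, and an assumed interval convention: each range is open on the left and closed on the right. The GAUSS functions' exact boundary rules should be checked against their reference pages.

```python
from bisect import bisect_left


def count_in_ranges(x, breaks, weights=None):
    """Count (or sum weights of) elements of x in the ranges set by `breaks`.

    Assumed convention: range 0 is value <= breaks[0]; range j is
    breaks[j-1] < value <= breaks[j]. Values above breaks[-1] are not counted.
    """
    if any(hi <= lo for lo, hi in zip(breaks, breaks[1:])):
        raise ValueError("breaks must be strictly increasing")
    if weights is None:
        weights = [1] * len(x)
    if len(weights) != len(x):
        raise ValueError("need one weight per element")
    totals = [0] * len(breaks)
    for value, w in zip(x, weights):
        j = bisect_left(breaks, value)  # first j with value <= breaks[j]
        if j < len(breaks):
            totals[j] += w
    return totals


x = [0.5, 1.0, 1.5, 2.0, 2.5, 9.0]
breaks = [1.0, 2.0, 3.0]
print(count_in_ranges(x, breaks))                      # [2, 2, 1]
print(count_in_ranges(x, breaks, [1, 2, 1, 2, 1, 5]))  # [3, 3, 1]
```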
Sorting and set functions
Returns the intersection of two vectors.

Returns the unique elements of one vector that are not present in a second vector.

Sorts a numeric matrix, character matrix or string array.

Returns the sorted index of a vector x, that is, the indices that put x in sorted order.

Sorts a matrix on multiple columns.

Sorts the columns of a matrix of numeric or character data with respect to a specified row.

Returns the union of two vectors.

Sorts a vector and removes duplicate elements.

Computes the sorted index of a vector x, leaving out duplicate elements.
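The set operations and the sorted index above have direct Python analogues. The following minimal sketch uses hypothetical helper names. It assumes results are returned sorted and de-duplicated. Python indices start at 0, whereas GAUSS indexes from 1.

```python
def sorted_unique(v):
    """Sort v and remove duplicate elements."""
    return sorted(set(v))


def set_difference(a, b):
    """Unique elements of a that do not appear in b, sorted."""
    return sorted(set(a) - set(b))


def intersection(a, b):
    """Unique elements present in both a and b, sorted."""
    return sorted(set(a) & set(b))


def union(a, b):
    """Unique elements present in either a or b, sorted."""
    return sorted(set(a) | set(b))


def sort_index(v):
    """Indices that put v in ascending order (0-based; GAUSS indexes from 1)."""
    return sorted(range(len(v)), key=lambda i: v[i])


print(sorted_unique([3, 1, 2, 3, 1]))           # [1, 2, 3]
print(set_difference([5, 1, 3, 5, 7], [3, 4]))  # [1, 5, 7]
print(intersection([1, 2, 3], [2, 3, 4]))       # [2, 3]
print(union([1, 2, 3], [2, 3, 4]))              # [1, 2, 3, 4]
print(sort_index([30, 10, 20]))                 # [1, 2, 0]
```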
String and categorical variables
Returns the unique set of column labels and corresponding key values for a categorical variable.

Replaces the labels in a categorical variable of a dataframe.

Changes the order of the labels in a categorical variable of a dataframe.

Sets a specified category to be the base case for a categorical variable.

These functions can be used to fix errors in categorical labels.

Replaces a substring within a categorical label or string element.

Converts a string or categorical variable of a dataframe to a numeric variable.

Strips all whitespace characters from the left and right sides of each element in a categorical variable or string array.

Strips all whitespace characters from the left side of each element in a categorical variable or string array.

Strips all whitespace characters from the right side of each element in a categorical variable or string array.
Transform
Creates (codes) a new variable whose values depend on which one of a set of logical expressions is true.

Converts a GAUSS dataframe in wide panel format to long panel format.

Converts a GAUSS dataframe in long panel format to wide panel format.

Inserts a vector into the diagonal of a matrix.

Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is unbounded on the right.

Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is bounded on the right.

Creates a set of dummy (0/1) variables by breaking up a variable into specified categories. The highest (rightmost) category is unbounded on the right, and a specified column of dummies is dropped.

Lags (or leads) a matrix a specified number of time periods for time series analysis.

Lags (or leads) a vector a specified number of time periods and removes the incomplete rows.

Performs an element-by-element comparison of two matrices and returns the maximum value for each element.

Performs an element-by-element comparison of two matrices and returns the minimum value for each element.

Reorders the columns of a matrix based on a user-specified ordering, relocating the specified variables to the beginning of the dataset in the order in which they are given.

Computes time series differences of panel data.

Computes time series lags of panel data.

Replaces specified values of a matrix, array or string array.

Replaces values of a matrix or array within specified ranges.

Reverses the order of rows of a matrix.

Reshapes a dataframe, matrix or string array to new dimensions.

Rotates the rows of a matrix, wrapping elements as necessary.

Shifts (lags or leads) the columns of a matrix, filling in holes with a specified value.

Shifts the rows of a matrix, filling in holes with a specified value.

Changes the values in a vector depending on the category into which each element falls.

Substitutes new values for old values in a matrix, depending on the outcome of a logical expression.

Stacks the columns or rows of a matrix to form a single column.

Reshapes the lower triangular portion of a symmetric matrix into a column vector.

Expands a column vector into a symmetric matrix.
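The three dummy-variable entries above differ only in how they treat the top category and whether one column is dropped. The following minimal Python sketch shows those differences. It uses a hypothetical helper, `dummies`, and an assumed breakpoint convention: categories are open on the left and closed on the right. The GAUSS functions' exact conventions may differ.

```python
from bisect import bisect_left


def dummies(x, breaks, top_bounded=False, drop=None):
    """Build 0/1 indicator rows, one column per category (hypothetical helper).

    Assumed convention: category 0 is value <= breaks[0] and category j is
    breaks[j-1] < value <= breaks[j]. With top_bounded=False the last
    category is open on the right, so values above breaks[-1] fall into it;
    with top_bounded=True such values get an all-zero row.
    `drop` (0-based) removes one column, e.g. to avoid perfect collinearity.
    """
    k = len(breaks)
    out = []
    for value in x:
        j = bisect_left(breaks, value)  # first j with value <= breaks[j]
        row = [0] * k
        if j < k:
            row[j] = 1
        elif not top_bounded:
            row[k - 1] = 1  # the unbounded top category absorbs large values
        if drop is not None:
            row = row[:drop] + row[drop + 1:]
        out.append(row)
    return out


x = [1, 4, 7, 12]
breaks = [3, 6, 10]
print(dummies(x, breaks))                    # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
print(dummies(x, breaks, top_bounded=True))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(dummies(x, breaks, drop=0))            # [[0, 0], [1, 0], [0, 1], [0, 1]]
```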
Scaling and normalization
Scales the columns of a matrix using a specified centering and scaling method.
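Column scaling pairs a centering value with a scaling value for each column. The following minimal Python sketch shows two common pairs: mean with sample standard deviation, and minimum with range. The helper `scale_columns` and its method names are hypothetical; the GAUSS function's available methods are not reproduced here.

```python
from statistics import mean, stdev


def scale_columns(rows, method="standardize"):
    """Center and scale each column of a list-of-rows matrix.

    "standardize": subtract the column mean, divide by the sample std. dev.
    "range":       subtract the column minimum, divide by (max - min).
    """
    columns = list(zip(*rows))
    if method == "standardize":
        params = [(mean(c), stdev(c)) for c in columns]
    elif method == "range":
        params = [(min(c), max(c) - min(c)) for c in columns]
    else:
        raise ValueError("unknown method: " + method)
    if any(scale == 0 for _, scale in params):
        raise ValueError("a constant column cannot be scaled")
    return [[(v - center) / scale for v, (center, scale) in zip(row, params)]
            for row in rows]


data = [[1, 10], [2, 20], [3, 30]]
print(scale_columns(data))           # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
print(scale_columns(data, "range"))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```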