open ============================================== Purpose ---------------- Opens an existing GAUSS data file. .. _open: .. index:: open Format ---------------- :: open fh = filename; open fh = filename for mode; open fh = filename for mode varindx ioffs; **Parameters:** :filename: (*literal or ^string*) name of the file on the disk. The name can include a path if the directory to be used is not the current directory. This filename will automatically be given the extension .dat. If an extension is specified, the :file:`.dat` will be overridden. If the file is an :file:`.fmt` matrix file, the extension must be explicitly given. If the name of the file is to be taken from a string variable, the name of the string must be preceded by the ``^`` (caret) operator. :mode: (*literal*) the modes supported with the optional for subcommand are: .. csv-table:: :widths: auto "read", "This is the default file opening mode and will be the one used if none is specified. Files opened in this mode cannot bewritten to. The pointer is set to the beginning of the file and the :func:`writer` function is disabled for files opened in this way. This is the only mode available for matrix files (:file:`.fmt`), which are always written in one piece with the `save` command." "append", "Files opened in this mode cannot be read. The pointer will be set to the end of the file so that a subsequent write to the file with the :func:`writer` function will add data to the end of the file without overwriting any of the existing data in the file. The :func:`readr` function is disabled for files opened in this way. This mode is used to add additional rows to the end of a file." "update", "Files opened in this mode can be read from and written to. The pointer will be set to the beginning of the file. This mode is used to make changes in a file." :offs: (*scalar*) offset added to "index variables" The optional *varindxi* subcommand tells GAUSS to create a set of global scalars that contain the index (column position) of the variables in a GAUSS data file. These "index variables" will have the same names as the corresponding variables in the data file but with ``i`` added as a prefix. They can be used inside index brackets, and with functions like submat to access specific columns of a matrix without having to remember the column position. The optional *offs* argument is an offset that will be added to the index variables. This is useful if data from multiple files are concatenated horizontally in one matrix. It can be any scalar expression. The default is 0. The index variables are useful for creating submatrices of specific variables without requiring that the positions of the variables be known. For instance, if there are two variables, *xvar* and *yvar* in the dataset, the index variables will have the names *ixvar*, *iyvar*. If *xvar* is the first column in the data file, and *yvar* is the second, and if no offset, *offs*, has been specified, then *ixvar* and *iyvar* will equal ``1`` and ``2`` respectively. If an offset of 3 had been specified, then these variables would be assigned the values ``4`` and ``5`` respectively. The *varindxi* option cannot be used with :file:`.fmt` matrix files because no column names are stored with them. If *varindxi* is used, GAUSS will ignore the Undefined symbol error for global symbols that start with ``i``. This makes it much more convenient to use index variables because they don't have to be cleared before they are accessed in the program. Clearing is otherwise necessary because the index variables do not exist until execution time when the data file is actually opened and the names are read in from the header of the file. At compile time a statement like: ``y = x[., ixvar];`` will be illegal if the compiler has never heard of *ixvar*. If *varindxi* is used, this error will be ignored for symbols beginning with ``i``. Any symbols that are accessed before they have been initialized with a real value will be trapped at execution time with a Variable not initialized error. **Returns:** :fh: (*scalar*), file handle. *fh* is the file handle which will be used by most commands to refer to the file within GAUSS. This file handle is actually a scalar containing an integer value that uniquely identifies each file. This value is assigned by GAUSS when the `open` command is executed. If the file was not successfully opened, the file handle will be set to -1. Examples ---------------- :: // Filename fname = "/data/rawdat"; // Open file for reading open dt = ^fname for append; // Error if no file found if dt == -1; print "File not found"; end; endif; y = writer(dt, x); if y /= rows(x); print "Disk Full"; end; endif; dt = close(dt); In the example above, the existing dataset :file:`/data/rawdat.dat` is opened for appending new data. The name of the file is in the string variable *fname*. In this example the file handle is tested to see if the file was opened successfully. The matrix *x* is written to this dataset. The number of columns in *x* must be the same as the number of columns in the existing dataset. The first row in *x* will be placed after the last row in the existing dataset. The :func:`writer` function will return the number of rows actually written. If this does not equal the number of rows that were attempted, then the disk is probably full. :: open fin = mydata for read; open fout = mydata for update; do until eof(fin); x = readr(fin, 100); x[., 1 3] = ln(x[. ,1 3]; call writer(fout, x); endo; closeall fin,fout; In the above example, the same file, :file:`mydata.dat`, is opened twice with two different file handles. It is opened for read with the handle *fin*, and it is opened for update with the handle *fout*. This will allow the file to be transformed in place without taking up the extra space necessary for a separate output file. Notice that *fin* is used as the input handle and *fout* is used as the output handle. The loop will terminate as soon as the input handle has reached the end of the file. Inside the loop the file is read into a matrix called *x* using the input handle, the data are transformed (columns 1 and 3 are replaced with their natural logs), and the transformed data is written back out using the output handle. This type of operation works fine as long as the total number of rows and columns does not change. The following example assumes a data file named :file:`dat1.dat` that has the variables: *visc*, *temp*, *lub*, and *rpm*: :: open f1 = dat1 varindxi; dtx = readr(f1, 100); x = dtx[., irpm ilub ivisc]; y = dtx[., itemp]; call seekr(f1, 1); In this example, the dataset :file:`dat1.dat` is opened for reading (the :file:`.dat` and the ``for read`` are implicit). *varindxi* is specified with no constant. Thus, index variables are created that give the positions of the variables in the dataset. The first 100 rows of the dataset are read into the matrix *dtx*. Then, specified variables in a specified order are assigned to the matrices *x* and *y* using the index variables. The last line uses the :func:`seekr` function to reset the pointer to the beginning of the file. :: open q1 = dat1 varindx; open q2 = dat2 varindx colsf(q1); nr = 100; y = readr(q1, nr)~readr(q2, nr); closeall q1,q2; In this example, two data sets are opened for reading and index variables are created for each. A constant is added to the indices for the second dataset (*q2*), equal to the number of variables (columns) in the first dataset (*q1*). Thus, if there are three variables *x1*, *x2*, *x3* in *q1*, and three variables *y1*, *y2*, *y3* in *q2*, the index variables that were created when the files were opened would be *ix1*, *ix2*, *ix3*, *iy1*, *iy2*, *iy3*. The values of these index variables would be 1, 2, 3, 4, 5, 6, respectively. The first 100 rows of the two data sets are read in and concatenated to produce the matrix *y*. The index variables will thus give the correct positions of the variables in *y*. :: open fx = x.fmt; rf = rowsf(fx); sampsize = round(rf*0.1); rndsmpx = zeros(sampsize, colsf(fx)); for(1, sampsize, 1); r = ceil(rndu(1, 1)*rf); call seekr(fx, r); rndsmpx[i, .] = readr(fx, 1); endfor; fx = close(fx); In this example, a 10% random sample of rows is drawn from the matrix file :file:`x.fmt` and put into the matrix *rndsmpx*. Note that the extension :file:`.fmt` must be specified explicitly in the `open` statement. The :func:`rowsf` command is used to obtain the number of rows in :file:`x.fmt`. This number is multiplied by 0.10 and the result is rounded to the nearest integer; this yields the desired sample size. Then random integers (*r*) in the range 1 to *rf* are generated. :func:`seekr` is used to locate to the appropriate row in the matrix, and the row is read with :func:`readr` and placed in the matrix *rndsmpx*. This is continued until the complete sample has been obtained. Remarks ------- The file must exist before it can be opened with the `open` command. To create a new file, see `create` or `save`. A file can be opened simultaneously under more than one handle. See the second example following. If the value that is in the file handle when the `open` command begins to execute matches that of an already open file, the process will be aborted and a ``File already open`` message will be given. This gives you some protection against opening a second file with the same handle as a currently open file. If this happens, you would no longer be able to access the first file. It is important to set unused file handles to zero because both `open` and `create` check the value that is in a file handle to see if it matches that of an open file before they proceed with the process of opening a file. This should be done with `close` or `closeall`. .. seealso:: Functions :func:`dataopen`, `create`, `close`, `closeall`, :func:`readr`, :func:`writer`, :func:`seekr`, :func:`eof`