Opal Commands Cookbook

Skip to end of metadata
Go to start of metadata

Overview


General

How do I know which units/datasources/tables are available ?

Use the show command.

How can I delete a datasource or a table or a variable ?

Currently, you can't do it using an Opal command.

You can only make a datasource not visible, by changing the Datasources Configuration.

How to import data from A to Z ?

  1. Identify the functional unit which is the source of data,
  2. Make sure the corresponding functional unit is declared in the system by running the show --units command,
  3. If the file is too large to be imported, see How do I import a large file ?,
  4. If you want to check the imported variable catalogue, see Where is the variable catalogue that is imported ?,
  5. Run the import command with the appropriate --unit option:
    • either provide the file to be imported as arguments of the import command,
    • or (recommended) copy the files into the unit's directory which is located in the units directory (see Where is the unit's directory ?).
  6. If the --archive was not used when running the import command, archive your files wherever you want when import is completed. Do not leave the imported files in the unit's directory otherwise Opal will attempt to import them next time.

Import

Where is the variable catalogue that is imported ?

The variables description is embedded in the imported file. If the file is encrypted, use the decrypt command to decrypt it and open it. Each directory is a table, and in each of them there is a variables.xml file. Currently there is no friendlier way to access to a variable catalogue before it has been imported. Once imported, see How can I see the variable catalogue ?.

How does Opal handles variable versions ?

There are no variable versions. Instead Opal ensures the consistency of the variables and categories during the importation process:

  • variables:
    • if a new variable appears in the imported catalogue, a corresponding new variable is created in Opal,
    • if a variable is no more in the imported catalogue, the corresponding variable in Opal remains untouched (there just won't be new values for it),
    • if the value type of a variable changes, the importation process fails.
  • categories
    • if a variable from the imported catalogue has a new category, a corresponding category is added to the Opal variable,
    • if a variable from the imported catalogue has no more a category, the corresponding category remains untouched in the Opal variable.
  • attributes:
    • if a variable/category from the imported catalogue has a new attribute, a corresponding attribute is added to the Opal variable/category,
    • if a variable/category from the imported catalogue has a new value for an attribute, the corresponding attribute value is updated in the Opal variable/category,
    • if a variable/category from the imported catalogue has no more an attribute, the corresponding attribute remains untouched in the Opal variable/category.

To explain why there are no more values for a variable, it is recommended to have another variable that tells the version of the imported catalogue. For instance Onyx provides for each questionnaire a variable named 'QuestionnaireRun.version' that tells the version of questionnaire administered to the participant. There is also a variable named 'Admin.onyxVersion' that tells the version of Onyx used to collect the participant data.

How do I import a large file ?

Importation process is done in one transaction. This means that if import fails for any reason (corrupted file, out of memory on server side etc.) the entire process will be rolled back. This also means that if the file to import is large, it will be high memory-consuming on server side (due to the storage of transaction), increasing the process time a lot and increasing the risk of getting short of memory.

So in the case of a large file (i.e. containing a lot of participants or a lot of data per participants), you should split the file in smaller pieces using the split command. This will ensure you of:

  • faster import process: the sum of processing times of the parts is smaller than the processing time of the whole,
  • safer: no out of memory on server side,
  • no useless roll-back: import as much participant as possible.

Where is the unit's directory ?

On server side you might want to access to the unit's directory for copying files to be imported and having access to the ones that where archived by the import command.

Opal defines a File System Root. Units data are stored and handled relatively from the <opal-file-system-root>/units/<unit-name> directory.

Copy

Can I change variable names ?

It is not recommended to change a variable name while there are still data to be imported in the corresponding datasource. This would result in the re-creation of the original variable when a new import process happens. So you might want to not interfere with the import process.

If you want to reorganize your variables, the copy command should be used. For the purpose of renaming a variable, use the --transform option.

How to split a table into smaller tables ?

For any reasons, a table has too many variables in it, making it hard to process (to many columns in a SQL tables or in a Excel sheet for instance). Two solutions are offered to you:

  1. Design a view that is a subset of the original table: using the select statement, you can design as many views that match your needs. The advantage of the views is that there is no copy of data and therefore are always up-to-date while the import process is going on.
  2. Copy the table with a multiplex option, allowing to define destination tables dynamically and to dispatch each variable in one of them. The problem of this solution is that it involves data copy, and the same copy should be performed if the original table is updated.

How can I see the variable catalogue ?

How can I check the variable catalogue before copying data ?

There is no variable catalogue, there are as much variable catalogues you are interested in. Currently the friendlier way of viewing the variables for a user is to do a copy into an Excel file using the --no-values option.

What can I do for handling the limits of the SQL or Excel datasources ?

SQL and Excel datasources have strong limitations in terms of:

  • variables/tables naming:
    • SQL:
      • not allowed characters for names: ,.-'"/\<>
      • limit of 64 characters for a table name (see MySQL limits).
    • Excel:
      • not allowed characters for a sheet title (which is the table name): /\:*?
      • limit of 30 characters for a sheet title.
  • maximum number of variables in a table:
    • SQL: maximum number of columns (and also maximum total size for column header names) in a SQL table,
    • Excel: maximum number of columns in a Excel sheet, depends on Excel version.
  • maximum number of value sets in a table:
    • SQL: none,
    • Excel: maximum number of rows in a sheet, depends on Excel version.

For Excel datasource, preferably use ".xlsx" file suffix, the corresponding Excel format has higher limits.

Use views to limit the number of rows and change the table names.

Use the copy command options --multiplex and --transform to reduce the number of variables in a table and to change the variable names. The right scripts for these options may be difficult to write:

Search Opal Documentation
Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.