In the mapping module, the Tab "Measures" allows to define the observation value of interest or the primary measure. In SDMX terminology, this is typically denoted as "OBS_VALUE". As shown in the diagram below, there are four parameters that can be specified and will determine the way the measure is computed:
COUNT
". To operate a summation or an average, it is possible to specify operation as "SUM
" or "MEAN
".
In the case of transcoding aggregate datasets (not microdata), the operation “NONE
” must be selected.
Notice that the weights
variable can be assigned afterwards in Tab "Others", and this will automatically expand the results, so it is not necessary to specify here the expansion operation.
When the operation is performed over an input variable, such like "SUM
", "MEAN
" or "NONE
", the corresponding variable must be assigned. When "COUNTing" the cases, the option to specify the input variable is disabled as no input variable is required (cases will be counted anyway).
TRUE/FALSE
sentences are valid. "Logical Operators" in Toolbar will help you to specify conditions with the most common R operators. For example, the following condition selects male individuals older than 15 years of age, when gender
is a categorical variable with 2 values (1 = male, 2 = female) and age
is an integer variable:
gender == 1 & age > 15
Specify Denominator (+/-)
to specify the quotients between variables. The same filtering rule applies for both the numerator and the denominator.
SMART parses the input SDMX DSD artifacts and decomposes the components of Dimension
(including TimeDimension
). These components will further be presented in the Tab "Dimensions" and all of them must be mapped.
According to the SDMX concept definition, these "dimensions" determine the (output) data set's "physical" structure. There are usually codelists
associated to the "dimensions" which list the possible values the "dimensions" can take.
To map a particular dimension, depending on whether it will take a single key value (constant) or multiple key values (variable), and whether it has codelist or not, there are four different treatments:
The TimeDimension
TIME_PERIOD
does not have codelist, but the input dataset has a variable called Time
which takes multiple values, i.e., 2001,2002,...,2015. In this case, just pick variable Time
.
A micro dataset doesn't contain any variable regarding TimeDimension
, but the user knows this data corresponds to the Household Survey done in 2002. In this case, just enter 2002
in the Textbox to assign such value to the TIME_PERIOD
dimension.
This is a common scenario. For example, a dimension CLASSIF_SEX
has codelist with possible values (SEX_@NA@,SEX_T,SEX_M,SEX_F
) and input data contains a variable called gender
which has two specifications (1
is male, 2
is female). First pick the input variable gender
, and then recode the items accordingly.
Notice that, the mapping can be one-to-one or many-to-one. For the case of many-to-one, for example consider an input dataset which has an integer variable age
and the output dimension CLASSIF_AGE
with a codelist with aggregated age bands, i.e., the aggregate category 10-years age band "15-24" includes all the individuals in the age group from 15 years old to 24 years old. To facilitate the many-to-one mapping, SMART supports wildcard expressions and specific masking for continuous value operators. The rules are defined as follows:
Masking Rules in Many-to-one Mapping
Categorical variable value set
%
---- the masking should start with %
;
---- multiple codes are separated by ;
, e.g., 1;2;3
?
---- stands in for any single character, e.g., %1?
=> 10,11,12,...,19
*
---- stands in for any number of characters, e.g., %*
=> include all
!
---- allows to exclude input values one at a time, e.g., %*!A!C
=> include all except A and C
-
---- only the range input is allowed by using '-' separator. An empty lower-bound value means
"from the beginning" (e.g., "-15
" means "up to 15"); an empty upper-bound value means
"and beyond" (e.g., "65-
" means "65 and more").
"-∞-
" means "To the infinite and beyond" (Buzz Lightyear).
FREQ
(frequencies) has a codelist with multiple values (A
is Annual, D
is Daily, ...). In this case, just select the value A
from the Combobox.
Together with Dimension
, SDMX Attributes
are also presented in the Tab "Attributes" for mapping. Attribute
holds descriptive metadata, which gives additional information about the concepts used but do not affect the dataset structure itself. Just like Dimension
, Attribute
can also be coded or not coded.
The mapping process for Attribute
is basically the same as for Dimension
. The only difference is that in this version, SMART does not allow many-to-one recoding for attributes. That is, in Case 3: Variable with Codelist, any masking for many-to-one mapping is not allowed. On the other hand, assigning an input variable as Attribute
may only occur in the transcoding task.
Parameter 1 ---- The Excel report will mark any value as 'Not available' if the number of observations is less than this parameter, default is 5. Parameter 2 ---- The Excel report will mark any value as 'Unreliable' if the number of observations is less than this parameter, default is 15.
When the mapping is done, all the mapped items can be reviewed in a summarized mapping list, which can be displayed by clicking the button View Mapping in Toolbar.
As the example shown below, the last column "DSD_Dim_Type" denotes the type of entity: STUDYVAR
is Measure; WEIGHT
is Sample Weights; DIM
is Dimension; ATTRI
is Attribute.
The first two columns "DSD_Dim" and "DSD_Dim_Code" specify the concept and code from DSD, and the third and fourth column "EXT_Dim" and "EXT_Dim_Code" specify the variable and recode value from the external input dataset. The remaining two columns "DSD_Attri"
and "DSD_Attri_Code" are only used when extra mapping items are needed to record. For example, in the first row Measure (STUDYVAR
), the filter condition job_holder == 1
was recorded in column DSD_Attri_Code.
For dimensions and attributes, the column "DSD_Attri_Code" also specifies the type of recoding: 1
- dimension/attribute can take multiple values and it doesn't have a codelist; 2
- dimension/attribute takes only a single value and it doesn't have a codelist; 3
- dimension/attribute can take multiple value and it has a codelist; 4
- dimension/attribute takes only a single value and it has a codelist.
The mapping list can be saved in the CSV format for further re-use. Go to Toolbar and click the button Save Mapping to save it. To re-use the mapping, use at anytime the button Upload Mapping . This is a very useful feature as mappings can be partially uploaded from different saved ones. SMART looks at the names of concepts and variables and if they match the mappings are uploaded, being added to other concepts already mapped in memory. If a concept already mapped in memory is found in the mapping file being uploaded, the whole mapping for this concept is overwritten.
Once the mapping is done, you can navigate to the Module Generate by clicking the button . Following steps will guide you to generate different reports:
Click the button Process to run the calculation or click Reset if you want to clean the current runs.
Select any table in the Tables (Indicators) List to view the results in the right-hand side Table Viewer
It is possible to make minor changes in the mapping list and re-run the process. Click the button View Mapping to modify, and then click button Process again. If there is a major change to be done in mapping, then you can go back to Module Mapping and modify in details. This process can be repeated many times and runs will be executed until it reaches the memory limit.
At last, choose the formats you wish to report with for the desired tables and click the button Export. The results are saved under the Export folder. Click Open Export Folder to review.
Instead of downloading the DSDs and have them stored in files to be uploaded into SMART, the tool provides the facility to fetch the DSDs from an SDMX registry on-the-fly (provided internet access is available). In the first Module Datasets and table Structures, click the button Online Query to launch the DSD Online Query Builder.
SMART uses the dataflow
resource to query a list of available dataflows on the selected Registry. Each of the listed dataflows points to a DSD. The users can select any of them and click Load to pull the DSDs directly into the SMART application. If the downloaded DSDs are intended for future off-line use, the users can click "Save..." button to generate an XML local copy of DSDs.
Be default, there are several SDMX registries already configured in SMART DSD Query Builder. In addition, users can add or edit any registry they want in the Query Builder. To do so, click Edit Registry or go to Tools and then "Add/Edit SDMX Registry..." to launch the pop-up editor.
The parameters "Is this .Stat?
" refers to .Stat version 7 platforms (not SDMX v2.1 compatible). SMART is able to correct on-the-fly the non-standard format and consume the web service.