Write and read data with PIDX


Latest revision as of 22:52, 29 May 2018

Introduction

In these examples we describe how to write and read data using the PIDX library. We consider a global 3D regular grid domain, which we call the global domain (g). The global domain represents the grid space in which all the data resides.

In a parallel environment each core (e.g. an MPI rank) owns a portion of the data that has to be written to disk. We refer to this portion of the domain as the local domain (l).

In this example we write data in parallel from a number of cores (N). For simplicity we define N local domains of equal size (see calculate_per_process_offsets()). In a real application the local domains can have arbitrary sizes.
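To illustrate what equal-size local domains mean, here is a minimal plain-C sketch that maps a rank to its local box. It rests on three assumptions: a process grid of px by py by pz ranks, ranks numbered with x varying fastest, and global dimensions that divide evenly by the process counts. The names box3 and compute_local_box are hypothetical. They are not part of PIDX and are not the example's calculate_per_process_offsets().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3D box: offset (position in the global domain) and size (elements per dimension). */
typedef struct {
    uint64_t offset[3];
    uint64_t size[3];
} box3;

/* Hypothetical helper (not part of PIDX): split the global domain into
   px * py * pz equal local boxes and return the box owned by `rank`.
   Assumes ranks are numbered with x varying fastest, then y, then z,
   and that each global dimension is divisible by the process count along it. */
static box3 compute_local_box(int rank, const uint64_t global_size[3],
                              int px, int py, int pz)
{
    assert(rank >= 0 && rank < px * py * pz);
    const int procs[3] = { px, py, pz };
    const int coord[3] = { rank % px, (rank / px) % py, rank / (px * py) };
    box3 b;
    for (int d = 0; d < 3; d++) {
        assert(global_size[d] % (uint64_t)procs[d] == 0);
        b.size[d] = global_size[d] / (uint64_t)procs[d];
        b.offset[d] = (uint64_t)coord[d] * b.size[d];
    }
    return b;
}
```

For example, with a 64 by 64 by 64 global domain and 8 ranks in a 2 by 2 by 2 grid, rank 5 owns the 32 by 32 by 32 box at offset (32, 0, 32). These values are what a process would pass to PIDX_set_point for local_offset and local_size.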

The data used in these examples is generated by a function (create_synthetic_simulation_data()) that fills the global grid with a gradient.
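The text does not specify the exact gradient. The sketch below uses a hypothetical one: each element stores x + y + z of its global coordinates. It assumes the local buffer holds one float per element, with x varying fastest. This is an illustration, not the example's create_synthetic_simulation_data(), and fill_gradient is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical sketch, not create_synthetic_simulation_data():
   fill a local box with a linear gradient along the global (1,1,1) direction.
   `data` must hold size[0]*size[1]*size[2] floats; x is assumed to vary fastest. */
static void fill_gradient(float *data, const uint64_t offset[3], const uint64_t size[3])
{
    for (uint64_t k = 0; k < size[2]; k++)
        for (uint64_t j = 0; j < size[1]; j++)
            for (uint64_t i = 0; i < size[0]; i++) {
                uint64_t index = (k * size[1] + j) * size[0] + i;
                /* the value depends only on the element's global coordinates */
                data[index] = (float)((offset[0] + i) + (offset[1] + j) + (offset[2] + k));
            }
}
```

Each value depends only on global coordinates. The N local boxes therefore assemble into the same global field however the domain is decomposed.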

PIDX is a library that uses MPI, so we first need to initialize MPI and obtain some useful information (e.g. the current rank and the number of processes). In this example this is done by the init_mpi() function.

To write and read data with PIDX you only need to follow a few simple steps:
- identify the per-process local domain
- create a PIDX_access, which defines how you want to access the data (serial or parallel)
- create or access a PIDX_file, which defines the dataset
- define or read one or more PIDX_variable
- write or read the variables

How to write

The data that each process writes can be considered as a box in the global domain. This box is represented by an offset (i.e. its position in the global domain) and a size (i.e. the number of elements along each dimension).
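A box described by an offset and a size must lie entirely inside the global domain. The plain-C sketch below checks this, treating the box as the half-open range [offset, offset + size) along each dimension. box_fits_in_domain and box_element_count are hypothetical helpers, not PIDX functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of PIDX): true if the box [offset, offset + size)
   lies inside a global domain of extent global_size along every dimension. */
static bool box_fits_in_domain(const uint64_t offset[3], const uint64_t size[3],
                               const uint64_t global_size[3])
{
    for (int d = 0; d < 3; d++) {
        /* compare against the remaining extent so offset + size cannot overflow */
        if (offset[d] > global_size[d] || size[d] > global_size[d] - offset[d])
            return false;
    }
    return true;
}

/* Hypothetical helper: total number of elements in the box. */
static uint64_t box_element_count(const uint64_t size[3])
{
    return size[0] * size[1] * size[2];
}
```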

PIDX uses a struct called PIDX_point to represent a point in n-dimensional space. We use it to define the global domain size and the local box information (offset and size):

PIDX_point global_size, local_offset, local_size;
PIDX_set_point(global_size, global_box_size[X], global_box_size[Y], global_box_size[Z]);
PIDX_set_point(local_offset, local_box_offset[X], local_box_offset[Y], local_box_offset[Z]);
PIDX_set_point(local_size, local_box_size[X], local_box_size[Y], local_box_size[Z]);

The next step is to create an access, which tells PIDX how we want to access the data (in this case in parallel):

PIDX_access p_access;
PIDX_create_access(&p_access);
PIDX_set_mpi_access(p_access, MPI_COMM_WORLD);

Now we can create the file:

PIDX_file file;
PIDX_file_create(output_file_name, PIDX_MODE_CREATE, p_access, global_size, &file);

The parameters are:
- output_file_name, the name of the file to create
- file_access_mode (here PIDX_MODE_CREATE), which in this case creates the file
- pidx_access (here p_access), the access configuration defined above
- global_size, the global domain size defined above
- file, the PIDX_file handle being created

Other general settings include the current timestep (i.e. the logical time), which we set with:

PIDX_set_current_time_step(file, ts);

and the number of variables that we want to write:

PIDX_set_variable_count(file, variable_count);

We are now ready to define these variables. A variable in PIDX is defined by:
- a name
- a number of bits, e.g. 32 for a 32-bit integer variable
- a type name, e.g. "1*int32"

The number of bits is the product of the number of bits per value (e.g. an int value has 32 bits) and the number of values per sample (e.g. a vector of floats with 3 components has 3 values per sample). A 3-component vector of 32-bit floats therefore has 3 * 32 = 96 bits per sample.
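To make the arithmetic concrete, the plain-C sketch below splits a type name of the form shown above ("1*int32", or "3*float32" for a 3-component float vector) into values per sample and bits per value, then multiplies them. parse_type_name and bits_per_sample are hypothetical helpers that only handle this simple "<count>*<letters><bits>" pattern. On the reading side, the tutorial uses PIDX's own PIDX_values_per_datatype() to obtain similar information.

```c
#include <stdio.h>

/* Hypothetical parser (not part of PIDX) for type names such as "1*int32"
   or "3*float32": <values per sample>*<base type><bits per value>.
   Returns 0 on success, -1 if the string does not match that pattern. */
static int parse_type_name(const char *type_name, int *values_per_sample, int *bits_per_value)
{
    char base[16];
    /* '*' outside a conversion is a literal; %15[a-z] consumes the base type name */
    if (sscanf(type_name, "%d*%15[a-z]%d", values_per_sample, base, bits_per_value) != 3)
        return -1;
    return 0;
}

/* Total bits per sample, i.e. the bpv * vps product passed to PIDX_variable_create() above. */
static int bits_per_sample(int values_per_sample, int bits_per_value)
{
    return values_per_sample * bits_per_value;
}
```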

With this information we can create our variables:

PIDX_variable_create(var_name[var], bpv[var] * vps[var], type_name[var], &variable[var]);

where variable is an array of PIDX_variable.

Next we tell PIDX where this variable's data is located in memory within the per-process local domain:

PIDX_variable_write_data_layout(variable[var], local_offset, local_size, data[var], PIDX_row_major);

where data[var] is a char pointer to the data array of that variable, and the last parameter (PIDX_row_major) indicates the order of the data in this array (here row-major).
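Because data[var] is a plain byte pointer, the caller is responsible for the element layout inside it. The sketch below shows one way to address element (x, y, z) of a local buffer and read one component of a float-vector sample. It makes two assumptions: row-major order here means x varies fastest, then y, then z; and the components of a sample are stored contiguously (interleaved). local_index and read_component are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: linear sample index of local element (x, y, z),
   assuming x varies fastest, then y, then z. */
static uint64_t local_index(uint64_t x, uint64_t y, uint64_t z, const uint64_t size[3])
{
    return (z * size[1] + y) * size[0] + x;
}

/* Hypothetical helper: read component `c` of a float-vector sample from a
   byte buffer such as data[var], assuming the components are interleaved. */
static float read_component(const unsigned char *data, uint64_t sample_index,
                            int values_per_sample, int c)
{
    float v;
    /* memcpy avoids alignment and strict-aliasing problems with the byte pointer */
    memcpy(&v, data + (sample_index * (uint64_t)values_per_sample + (uint64_t)c) * sizeof(float),
           sizeof(float));
    return v;
}
```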

Finally we submit this variable for the parallel write:

PIDX_append_and_write_variable(file, variable[var]);

To improve performance, PIDX tries to gather data from multiple variables and write them to disk together, only when PIDX_close(file) is called.

In some cases it can be useful to control explicitly when PIDX writes a set of variables to disk. This can be done with the function PIDX_flush().

Example source: idx_write.c (https://github.com/sci-visus/PIDX/blob/master/examples/basic-io/idx_write.c)

How to read

To read data written with PIDX we follow the same initialization process as in the previous section to create the PIDX_access and the PIDX_file, this time accessing the existing dataset instead of creating a new one.

After initialization, the first step is to find out how many variables the dataset contains:

PIDX_get_variable_count(file, &variable_count);

With this information we can gather more details about a specific variable, here the variable with index variable_index:

PIDX_set_current_variable_index(file, variable_index);
PIDX_get_current_variable(file, &variable);
PIDX_values_per_datatype(variable->type_name, &values_per_sample, &bits_per_sample);

With these functions we also retrieve the type name, the number of values per sample and the number of bits per sample.
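Before reading, each process needs a buffer large enough for its local box. The sketch below computes and allocates that buffer. It assumes that the bits value returned above describes a single value (e.g. 32 for float32, a multiple of 8) and that each sample holds values_per_sample such values. If your PIDX version reports bits for the whole sample, drop the values_per_sample factor. alloc_local_buffer is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate the read buffer for a local box.
   Assumes bits_per_value is the size of one value (e.g. 32 for float32)
   and that each sample holds values_per_sample such values. */
static unsigned char *alloc_local_buffer(const uint64_t local_size[3],
                                         int values_per_sample, int bits_per_value,
                                         uint64_t *bytes_out)
{
    uint64_t elements = local_size[0] * local_size[1] * local_size[2];
    uint64_t bytes = elements * (uint64_t)values_per_sample * (uint64_t)(bits_per_value / 8);
    *bytes_out = bytes;
    return malloc((size_t)bytes);   /* caller frees; NULL on allocation failure */
}
```

For a 4 by 4 by 4 local box of 3-component float32 samples this gives 64 * 3 * 4 = 768 bytes.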

Now we can read the data:

PIDX_variable_read_data_layout(variable, local_offset, local_size, data, PIDX_row_major);

Here too we need to provide the per-process local box information (i.e. local offset and local size) and the array in which to store this variable's data. When PIDX_close(file) is called, the data of this variable is copied into our local data array in row-major order.
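If the dataset was written with a known pattern, each process can check its local box after PIDX_close(file) has filled the array. The sketch below assumes a hypothetical pattern: one float per sample with value x + y + z of the global coordinates, with x varying fastest. count_mismatches is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical check: after PIDX_close(file) has filled `data`, count the elements
   of the local box that differ from the expected value x + y + z (global coordinates).
   Assumes one float per sample and x varying fastest. The expected values are small
   integers, which floats represent exactly, so an exact comparison is safe. */
static uint64_t count_mismatches(const float *data, const uint64_t offset[3], const uint64_t size[3])
{
    uint64_t bad = 0;
    for (uint64_t k = 0; k < size[2]; k++)
        for (uint64_t j = 0; j < size[1]; j++)
            for (uint64_t i = 0; i < size[0]; i++) {
                float expected = (float)((offset[0] + i) + (offset[1] + j) + (offset[2] + k));
                if (data[(k * size[1] + j) * size[0] + i] != expected)
                    bad++;
            }
    return bad;
}
```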

Example source: idx_read.c (https://github.com/sci-visus/PIDX/blob/master/examples/basic-io/idx_read.c)