Write and read data with PIDX

Introduction

In these examples we describe how to write and read data using the PIDX library. We consider a global 3D regular grid that we call the global domain (g). This global domain represents the grid space where all the data resides.

In a parallel environment each core (e.g. an MPI rank) owns a portion of the data that has to be written to disk. We refer to this portion of the domain as the local domain (l).

In this example we write data in parallel from a number of cores (N). For simplicity we define N local domains of equal size (see calculate_per_process_offsets()). In a real application the local domains can have arbitrary sizes.
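The equal-size decomposition can be sketched as follows. This is an illustrative helper, not the body of calculate_per_process_offsets(): the name compute_local_offset, its signature, and the X-then-Y-then-Z rank ordering are assumptions made for the sketch, and the global size is assumed to be an exact multiple of the local size along each axis.

```c
#include <assert.h>

/* Hypothetical helper: offset of the local box owned by `rank` when the
 * global domain is split into equally sized boxes.
 * Assumption: ranks are laid out along X first, then Y, then Z. */
static void compute_local_offset(int rank,
                                 const unsigned long long global_size[3],
                                 const unsigned long long local_size[3],
                                 unsigned long long local_offset[3])
{
    /* The sketch only handles exact decompositions. */
    for (int d = 0; d < 3; d++)
        assert(local_size[d] > 0 && global_size[d] % local_size[d] == 0);

    /* Number of boxes along each axis. */
    unsigned long long nx = global_size[0] / local_size[0];
    unsigned long long ny = global_size[1] / local_size[1];
    unsigned long long nz = global_size[2] / local_size[2];

    assert(rank >= 0 && (unsigned long long)rank < nx * ny * nz);

    unsigned long long r = (unsigned long long)rank;
    unsigned long long slice = r % (nx * ny); /* position inside one Z layer */

    local_offset[0] = (slice % nx) * local_size[0];
    local_offset[1] = (slice / nx) * local_size[1];
    local_offset[2] = (r / (nx * ny)) * local_size[2];
}
```

With a 64x64x64 global domain and 32x32x32 local boxes, this layout gives 8 ranks: rank 1 starts at (32, 0, 0) and rank 4 at (0, 0, 32).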

The data used in these examples is generated by the function create_synthetic_simulation_data(), which fills the global grid with a gradient.
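One possible way to produce such a gradient is to store, in every sample, its linear index in the global grid. The sketch below fills only the local box of one process. The function name fill_local_gradient, the choice of double samples, and the exact gradient formula are assumptions for illustration and may differ from what create_synthetic_simulation_data() does.

```c
#include <stddef.h>

/* Hypothetical gradient: each sample holds its linear index in the global
 * grid. Assumption: x varies fastest in both the global and local layout. */
static void fill_local_gradient(double *data,
                                const unsigned long long global_size[3],
                                const unsigned long long local_offset[3],
                                const unsigned long long local_size[3])
{
    for (unsigned long long z = 0; z < local_size[2]; z++)
        for (unsigned long long y = 0; y < local_size[1]; y++)
            for (unsigned long long x = 0; x < local_size[0]; x++) {
                /* Position of this sample in the global domain. */
                unsigned long long gx = local_offset[0] + x;
                unsigned long long gy = local_offset[1] + y;
                unsigned long long gz = local_offset[2] + z;

                unsigned long long global_index =
                    gx + global_size[0] * (gy + global_size[1] * gz);
                size_t local_index =
                    (size_t)(x + local_size[0] * (y + local_size[1] * z));

                data[local_index] = (double)global_index;
            }
}
```

Because the value depends only on the global position, the same formula can later be used to check data that has been read back.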

PIDX is a library built on MPI, so we first need to initialize MPI and retrieve some useful information (e.g. the current rank and the number of processes). In this example this is done by the init_mpi() function.

To write or read data with PIDX, follow these steps:
- identify the per-process local domain
- create a PIDX_access, which defines how the data is accessed (serial or parallel)
- create or open a PIDX_file, which defines the dataset
- define or read one or more PIDX_variable
- write or read the variables

How to write

The data that each process writes can be considered as a box in the global domain. This box is represented by an offset (i.e. its position in the global domain) and a size (i.e. the number of elements along each dimension of the box).

PIDX uses a struct called PIDX_point to represent a value in n-dimensional space. We can use this struct to define the global domain size, and the local box information.

PIDX_point global_size, local_offset, local_size;
PIDX_set_point(global_size, global_box_size[X], global_box_size[Y], global_box_size[Z]);
PIDX_set_point(local_offset, local_box_offset[X], local_box_offset[Y], local_box_offset[Z]);
PIDX_set_point(local_size, local_box_size[X], local_box_size[Y], local_box_size[Z]);
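Before handing a box to PIDX it can be useful to check that it lies inside the global domain. The following is a minimal sketch of such a check on plain arrays. The helper box_fits_in_domain is hypothetical and is not part of the PIDX API.

```c
#include <stdbool.h>

/* Hypothetical sanity check (not a PIDX function): returns true when the box
 * given by `offset` and `size` is non-empty and lies inside the global domain. */
static bool box_fits_in_domain(const unsigned long long global_size[3],
                               const unsigned long long offset[3],
                               const unsigned long long size[3])
{
    for (int d = 0; d < 3; d++) {
        if (size[d] == 0)
            return false;                       /* empty box */
        if (offset[d] >= global_size[d])
            return false;                       /* starts outside the domain */
        if (size[d] > global_size[d] - offset[d])
            return false;                       /* extends past the domain */
    }
    return true;
}
```

Writing the last comparison as a subtraction rather than offset + size avoids unsigned overflow for very large values.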

The next step is to create an access object that tells PIDX how we want to access the data, in this case in parallel. We can do this as follows:

PIDX_access p_access;
PIDX_create_access(&p_access);
PIDX_set_mpi_access(p_access, MPI_COMM_WORLD);

Now it is time to create the file using:

PIDX_file file;
PIDX_file_create(output_file_name, PIDX_MODE_CREATE, p_access, global_size, &file);

The parameters are:
- output_file_name: the name of the file that we want to create
- PIDX_MODE_CREATE: the file access mode, in this case we create the file
- p_access: the access configuration that we defined above
- global_size: the global domain size that we defined above
- file: the PIDX_file struct that we are creating

Other general settings include the current timestep (i.e. logical time), which we set using:

PIDX_set_current_time_step(file, ts);

and the number of variables that we want to write, which we set as follows:

PIDX_set_variable_count(file, variable_count);

We are now ready to define these variables. A variable in PIDX is defined by:
- a name
- a number of bits, e.g. 32 for a 32-bit integer variable
- a type name, e.g. "1*int32"

The number of bits is the product of the number of bits per value (e.g. an int value has 32 bits) and the number of values per sample (e.g. a vector of floats with 3 components has 3 values per sample).

With this information we can create our variables as follows:
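As a small worked example of this product, the sketch below computes the bits per sample and the resulting size in bytes of a local buffer. Both helper names are hypothetical, and the sketch assumes that bit widths are multiples of 8.

```c
#include <stddef.h>

/* Bits per sample = bits per value * values per sample,
 * e.g. 32 * 3 = 96 for a 3-component float vector. */
static int bits_per_sample(int bits_per_value, int values_per_sample)
{
    return bits_per_value * values_per_sample;
}

/* Hypothetical helper: bytes needed to hold one variable over a local box.
 * Assumption: the bit width of a sample is a multiple of 8. */
static size_t local_buffer_bytes(const unsigned long long local_size[3],
                                 int bits_per_value, int values_per_sample)
{
    size_t samples = (size_t)(local_size[0] * local_size[1] * local_size[2]);
    size_t bytes_per_sample =
        (size_t)bits_per_sample(bits_per_value, values_per_sample) / 8;
    return samples * bytes_per_sample;
}
```

For a 32x32x32 local box of 3-component 32-bit floats this gives 32768 samples of 12 bytes each, i.e. 393216 bytes.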

PIDX_variable_create(var_name[var], bpv[var] * vps[var], type_name[var], &variable[var]);

where variable is an array of PIDX_variable.

Next we tell PIDX where this variable's data resides in memory and which part of the domain it covers (the per-process local domain). We do this as follows:

PIDX_variable_write_data_layout(variable[var], local_offset, local_size, data[var], PIDX_row_major);

where data is a char pointer to the data array of the specific variable, and the last parameter (PIDX_row_major) indicates the order of the data in this array.

Finally we tell PIDX to submit this variable for the parallel write, using:
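Since data is passed as a raw char pointer, it can help to see where a given sample sits in that buffer. The sketch below computes a byte offset under two assumptions that are not taken from PIDX itself: x varies fastest, and the components of one sample are stored next to each other. The helper sample_byte_offset is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of component `c` of sample (x, y, z)
 * inside a local buffer.
 * Assumptions: x varies fastest, and the values_per_sample components of a
 * sample are contiguous in memory. */
static size_t sample_byte_offset(const unsigned long long local_size[3],
                                 unsigned long long x,
                                 unsigned long long y,
                                 unsigned long long z,
                                 int values_per_sample,
                                 int bytes_per_value,
                                 int c)
{
    /* Linear index of the sample within the local box. */
    size_t sample = (size_t)(x + local_size[0] * (y + local_size[1] * z));

    return (sample * (size_t)values_per_sample + (size_t)c)
           * (size_t)bytes_per_value;
}
```

For a 4x3x2 box of 3-component 4-byte values, component 2 of sample (1, 2, 1) starts at byte (21 * 3 + 2) * 4 = 260.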

PIDX_append_and_write_variable(file, variable[var]);

To improve performance, PIDX gathers the data of multiple variables and writes them to disk together, only when the function PIDX_close(file) is called.

In some cases it can be useful to control explicitly when PIDX writes a group of variables to disk. This can be done using the function PIDX_flush().

Example source: idx_write.c

How to read

To read data written with PIDX we follow the same initialization process to create the PIDX_access and PIDX_file that we saw in the previous section.

After the initialization, the first step is to find out how many variables are in the dataset. We can do this using:

PIDX_get_variable_count(file, &variable_count);

With this information we can query a specific variable, in the following case the variable with index variable_index:

PIDX_set_current_variable_index(file, variable_index);
PIDX_get_current_variable(file, &variable);
PIDX_values_per_datatype(variable->type_name, &values_per_sample, &bits_per_sample);

Using these functions we also retrieve the type name, the number of values per sample and the number of bits per sample.
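To illustrate how a type name such as "1*int32" or "3*float64" encodes this information, here is a minimal stand-alone parser. It is only a sketch and not the implementation of PIDX_values_per_datatype(). The function parse_type_name is hypothetical and assumes type names of the form "<count>*<base><bits>". It returns the bit width of a single value; the total for a sample is that width times the count.

```c
#include <stdio.h>

/* Hypothetical parser (not part of PIDX): splits a type name such as
 * "3*float64" into the number of values per sample (3) and the number of
 * bits of each value (64). Returns 0 on success, -1 on malformed input. */
static int parse_type_name(const char *type_name,
                           int *values_per_sample,
                           int *bits_per_value)
{
    char base[16]; /* base type name, e.g. "int" or "float" (unused here) */
    int vps = 0;
    int bits = 0;

    /* Expect: <count> '*' <lowercase base name> <bits> */
    if (sscanf(type_name, "%d*%15[a-z]%d", &vps, base, &bits) != 3)
        return -1;
    if (vps <= 0 || bits <= 0)
        return -1;

    *values_per_sample = vps;
    *bits_per_value = bits;
    return 0;
}
```

For "3*float64" the sketch yields 3 values of 64 bits each, so one sample occupies 192 bits.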

Now it is time to read the data using:

PIDX_variable_read_data_layout(variable, local_offset, local_size, data, PIDX_row_major);

In this case, too, we need to specify the per-process local box (i.e. the local offset and local size) and the array where we want to store the data of this variable. When PIDX_close(file) is called, the data of this variable is copied into our local data array in row-major order.

Example source: idx_read.c