

NWB & Lazy-loading

Pynapple currently provides loaders for two data formats:

  • npz files with a special structure. You can check this notebook for a description of the methods for saving and loading npz files; a minimal round-trip sketch follows this list.

  • NWB format
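
For the npz route, a minimal round-trip sketch (the object and file name are arbitrary, shown only for illustration) looks like this:

import numpy as np
import pynapple as nap

# hypothetical object and file name, only to illustrate the npz round trip
tsd = nap.Tsd(t=np.arange(100), d=np.random.rand(100))
tsd.save("my_tsd.npz")                 # writes an npz file with the pynapple structure
loaded = nap.load_file("my_tsd.npz")   # reads it back as a pynapple object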

This notebook focuses on the NWB format. Additionally, it demonstrates the capabilities of pynapple for lazy-loading different formats.

The dataset in this example can be found here.

import numpy as np
import pynapple as nap

NWB

When loading an NWB file, pynapple will walk through it and test the compatibility of each data structure with pynapple objects. If a data structure is incompatible, pynapple will ignore it. The class that deals with reading NWB files is nap.NWBFile. You can pass the path to an NWB file or directly an already-opened NWB file. Alternatively, you can use the function nap.load_file.

Note

Creating the NWB file is outside the scope of pynapple. The NWB file used here was created beforehand. Multiple tools exist to create NWB files automatically. You can check neuroconv, NWBGuide or even NWBmatic.
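
For instance, if you already have the file open with pynwb, you can wrap the opened file object instead of passing the path (a sketch, assuming pynwb is installed; the path is the same placeholder used below):

import pynwb

path = "../../your/path/to/MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"
io = pynwb.NWBHDF5IO(path, mode="r")   # open the file with pynwb
nwbfile = io.read()
data = nap.NWBFile(nwbfile)            # wrap the already-opened NWB file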

data = nap.load_file("../../your/path/to/MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb")

print(data)

Out:

A2929-200711
┍━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┑
│ Keys                   │ Type        │
┝━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━┥
│ units                  │ TsGroup     │
│ position_time_support  │ IntervalSet │
│ epochs                 │ IntervalSet │
│ z                      │ Tsd         │
│ y                      │ Tsd         │
│ x                      │ Tsd         │
│ rz                     │ Tsd         │
│ ry                     │ Tsd         │
│ rx                     │ Tsd         │
┕━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┙

Pynapple gives you a table with all the entries of the NWB file that are compatible with a pynapple object. When parsing the NWB file, nothing is loaded: the NWBFile object keeps track of the position of the data within the NWB file with a key. You can see it with the attribute key_to_id.

data.key_to_id

Out:

{'units': '70deb090-70e9-4c1d-ac9c-55abca5e41eb', 'position_time_support': '86c8ad5d-bcbd-48dc-a78b-c6004d7fd027', 'epochs': 'a4a46a1d-34b9-4436-9be1-cfb06f95fffe', 'z': 'a5762b8b-f500-491d-842f-2fe48f4239dd', 'y': 'a2a21e58-580a-41ef-82a3-62ee1661c72b', 'x': '7df5a354-2c36-4ee6-8562-5cbefe2ea7fd', 'rz': '0eff6663-f138-4537-af3e-4f1b1dcf0c4d', 'ry': 'dad1c314-4cee-4f15-9adf-aa447404aaca', 'rx': 'a57cb94a-0f84-4334-a23d-8f744014e3ab'}

Loading an entry makes pynapple read the data.

z = data['z']

print(data['z'])

Out:

Time (s)
----------  ---------
670.6407    -0.195725
670.649     -0.19511
670.65735   -0.194674
670.66565   -0.194342
670.674     -0.194059
670.68235   -0.193886
670.69065   -0.193676
670.699     -0.193592
670.70735   -0.193613
670.71565   -0.19369
670.724     -0.193737
670.73235   -0.193868
670.74065   -0.193965
670.749     -0.194054
670.75735   -0.194111
670.76565   -0.19409
670.774     -0.194127
670.78235   -0.194147
670.79065   -0.194166
670.799     -0.194154
670.80735   -0.194096
...
1199.82825   0.010054
1199.8366    0.0096
1199.84495   0.009097
1199.85325   0.008593
1199.8616    0.008154
1199.86995   0.007557
1199.87825   0.006923
1199.8866    0.006249
1199.89495   0.005565
1199.90325   0.004828
1199.9116    0.004058
1199.91995   0.003207
1199.92825   0.002277
1199.9366    0.001367
1199.94495   0.000398
1199.95325  -0.000552
1199.9616   -0.001479
1199.96995  -0.00237
1199.97825  -0.003156
1199.9866   -0.003821
1199.99495  -0.004435
dtype: float32, shape: (63527,)

Internally, the NWBFile object has replaced the pointer to the data with the data object itself.

While it looks like pynapple has loaded the data, it has not. By default, accessing an entry of the NWB object returns an HDF5 dataset.

Warning

New in 0.6.6

print(type(z.values))

Out:

<class 'h5py._hl.dataset.Dataset'>

Notice that the time array is always loaded.

print(type(z.index.values))

Out:

<class 'numpy.ndarray'>
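
To get a sense of what this costs, you can check how much memory the timestamps alone occupy (a quick sketch):

print(z.index.values.nbytes / 1e6, "MB of timestamps already in memory")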

This is very useful for large datasets that do not fit in memory. You can then request a chunk of the data, and only that chunk will actually be loaded.

z_chunk = z.get(670, 680)  # get 10 seconds of data

print(z_chunk)

Out:

Time (s)
----------  ---------
670.6407    -0.195725
670.649     -0.19511
670.65735   -0.194674
670.66565   -0.194342
670.674     -0.194059
670.68235   -0.193886
670.69065   -0.193676
670.699     -0.193592
670.70735   -0.193613
670.71565   -0.19369
670.724     -0.193737
670.73235   -0.193868
670.74065   -0.193965
670.749     -0.194054
670.75735   -0.194111
670.76565   -0.19409
670.774     -0.194127
670.78235   -0.194147
670.79065   -0.194166
670.799     -0.194154
670.80735   -0.194096
...
679.83185    0.061196
679.84015    0.061403
679.8485     0.061553
679.85685    0.061727
679.86515    0.061826
679.8735     0.062025
679.88185    0.06219
679.89015    0.062361
679.8985     0.062509
679.90685    0.062633
679.91515    0.062745
679.9235     0.062844
679.93185    0.062874
679.94015    0.062856
679.9485     0.062836
679.95685    0.062831
679.96515    0.062789
679.9735     0.062756
679.98185    0.06277
679.99015    0.062819
679.9985     0.062878
dtype: float32, shape: (1124,)

Data are now loaded.

print(type(z_chunk.values))

Out:

<class 'numpy.ndarray'>

You can still apply any high-level pynapple function. For example, here we compute some tuning curves without preloading the dataset.

tc = nap.compute_1d_tuning_curves(data['units'], data['y'], 10)

print(tc)

Out:

                0           1          2          3         4   ...         10         11        12         13        14
0.012548  2.841894   13.038401   3.308084  11.673396  6.045551  ...   3.986856   4.169603  0.578076   0.783199  0.126804
0.056355  7.510898    3.452621  10.658762   3.425617  8.972957  ...   5.180861   5.713220  3.128576   0.806255  0.262322
0.100161  0.000000    0.000000   0.000000   0.000000  6.667138  ...  10.000707   0.000000  0.000000   6.667138  0.000000
0.143968  0.000000    0.000000   0.000000   0.000000  0.000000  ...   0.000000   6.000424  0.000000  12.000848  0.000000
0.187774  0.000000    0.000000   0.000000   7.500530  0.000000  ...  15.001060   7.500530  0.000000   0.000000  0.000000
0.231581  0.000000    7.500530   0.000000  30.002121  0.000000  ...   7.500530   0.000000  0.000000   0.000000  0.000000
0.275387  0.000000   75.005301   7.500530  60.004241  0.000000  ...   7.500530   0.000000  0.000000   0.000000  0.000000
0.319194  0.000000   97.506892   0.000000  45.003181  0.000000  ...   0.000000   7.500530  7.500530   0.000000  0.000000
0.363000  0.000000  100.807125   0.000000  38.402714  0.000000  ...   9.600679  33.602375  0.000000   0.000000  0.000000
0.406807  0.000000   31.581179   1.263247  29.054685  0.000000  ...   2.526494   7.579483  0.000000   0.000000  0.000000

[10 rows x 15 columns]

Warning

Caution should still be exercised when calling any pynapple function on a memory map. Pynapple does not implement any batching internally, so calling a high-level pynapple function on a dataset that does not fit in memory will likely cause a memory error.
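
If the data do not fit in memory, two simple workarounds are to restrict the computation to a shorter epoch, or to iterate over the recording yourself, loading one chunk at a time. A minimal sketch (the epoch bounds and chunk size are arbitrary):

# Option 1: restrict the high-level function to a shorter epoch
ep = nap.IntervalSet(start=670, end=800)
tc_short = nap.compute_1d_tuning_curves(data['units'], data['y'], 10, ep=ep)

# Option 2: compute a statistic chunk by chunk, reading only one slice at a time
total, count = 0.0, 0
for t0 in np.arange(670, 1200, 60):
    chunk = z.get(t0, min(t0 + 60, 1200))   # only this slice is read from disk
    total += np.sum(chunk.values)
    count += len(chunk)
print(total / count)                        # mean of the full signal, computed in chunks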

To change this behavior, you can pass lazy_loading=False when instantiating nap.NWBFile.

path = "../../your/path/to/MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"
data = nap.NWBFile(path, lazy_loading=False)

z = data['z']

print(type(z.d))

Out:

<class 'numpy.ndarray'>

Numpy memory map

In fact, pynapple can work with any type of memory map. Here we read a binary file with np.memmap.

eeg_path = "../../your/path/to/MyProject/sub-A2929/A2929-200711/A2929-200711.eeg"
frequency = 1250  # Hz
n_channels = 16

# Determine the number of samples from the file size (int16 = 2 bytes per sample)
f = open(eeg_path, 'rb')
startoffile = f.seek(0, 0)
endoffile = f.seek(0, 2)
f.close()
bytes_size = 2
n_samples = int((endoffile - startoffile) / n_channels / bytes_size)
duration = n_samples / frequency
interval = 1 / frequency

# Memory-map the binary file and build the matching time index
fp = np.memmap(eeg_path, np.int16, 'r', shape=(n_samples, n_channels))
timestep = np.arange(0, n_samples) / frequency

print(type(fp))

Out:

<class 'numpy.memmap'>

Instantiating a pynapple TsdFrame will keep the data as a memory map.

eeg = nap.TsdFrame(t=timestep, d=fp)

print(eeg)

Out:

Time (s)       0     1     2     3     4  ...
----------  ----  ----  ----  ----  ----  -----
0.0         1003   836  1075   681   918  ...
0.0008       968   781   984   613   878  ...
0.0016       869   683   880   515   770  ...
0.0024       886   717   903   528   789  ...
0.0032       791   659   790   479   738  ...
0.004        765   634   776   452   717  ...
0.0048       838   675   855   477   736  ...
0.0056       863   696   888   480   736  ...
0.0064       907   741   960   503   797  ...
0.0072       965   781  1010   600   866  ...
0.008       1011   831  1054   740   932  ...
0.0088      1070   888  1145   789  1018  ...
0.0096      1096   858  1148   710  1006  ...
0.0104      1081   833  1099   662   937  ...
0.0112      1045   861  1081   674   932  ...
0.012        991   816  1030   639   896  ...
0.0128       937   750   987   577   807  ...
0.0136       868   719   964   538   774  ...
0.0144       848   681   943   512   795  ...
0.0152       901   705   990   509   818  ...
0.016        909   769  1022   525   827  ...
...
1199.9792   -444  -376  -558  -404  -449  ...
1199.98     -383  -318  -427  -311  -315  ...
1199.9808   -578  -530  -638  -490  -519  ...
1199.9816   -573  -515  -643  -494  -533  ...
1199.9824   -291  -315  -453  -356  -357  ...
1199.9832   -368  -429  -560  -494  -506  ...
1199.984    -554  -555  -599  -586  -582  ...
1199.9848   -458  -480  -514  -494  -486  ...
1199.9856   -436  -429  -575  -422  -511  ...
1199.9864   -363  -267  -415  -229  -316  ...
1199.9872   -105   -91  -181   -73   -68  ...
1199.988    -194  -179  -288  -152  -180  ...
1199.9888   -460  -249  -354  -173  -253  ...
1199.9896   -360  -219  -321  -155  -229  ...
1199.9904   -194  -290  -432  -287  -357  ...
1199.9912   -256  -293  -410  -325  -337  ...
1199.992    -251  -173  -244  -192  -163  ...
1199.9928   -161  -175  -289  -183  -208  ...
1199.9936   -286  -320  -484  -323  -387  ...
1199.9944   -533  -448  -577  -403  -476  ...
1199.9952   -443  -334  -380  -266  -323  ...
dtype: int16, shape: (1499995, 16)

We can check the type of eeg.values.

print(type(eeg.values))

Out:

<class 'numpy.memmap'>
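
As with the NWB example above, you can pull a short chunk into memory before doing heavier computation. A minimal sketch (depending on the version, the chunk's values may still be a memmap view until you explicitly copy them):

one_second = eeg.get(0, 1)                  # first second of LFP, all 16 channels
values = np.array(one_second.values)        # np.array makes an in-memory copy
print(values.shape)                         # about (1250, 16) at 1250 Hz
print(values.mean(axis=0))                  # per-channel mean over that second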

Zarr

It is also possible to use a higher-level array library like zarr, although not directly.

import zarr
data = zarr.zeros((10000, 5), chunks=(1000, 5), dtype='i4')
timestep = np.arange(len(data))

tsdframe = nap.TsdFrame(t=timestep, d=data)

Out:

/mnt/home/gviejo/pynapple/pynapple/core/utils.py:186: UserWarning: Converting 'd' to numpy.array. The provided array was of type 'Array'.
  warnings.warn(

As the warning suggests, the data is converted to a numpy array.

print(type(tsdframe.d))

Out:

<class 'numpy.ndarray'>

To keep the zarr array, you can set the argument load_array to False.

tsdframe = nap.TsdFrame(t=timestep, d=data, load_array=False)

print(type(tsdframe.d))

Out:

<class 'zarr.core.Array'>

Within pynapple, numpy memory maps are recognized as numpy arrays while zarr arrays are not.

print(type(fp), "Is np.ndarray? ", isinstance(fp, np.ndarray))
print(type(data), "Is np.ndarray? ", isinstance(data, np.ndarray))

Out:

<class 'numpy.memmap'> Is np.ndarray?  True
<class 'zarr.core.Array'> Is np.ndarray?  False

As with numpy memory maps, you can use pynapple functions directly.

ep = nap.IntervalSet(0, 10)
tsdframe.restrict(ep)

Out:

Time (s)      0    1    2    3    4
----------  ---  ---  ---  ---  ---
0             0    0    0    0    0
1             0    0    0    0    0
2             0    0    0    0    0
3             0    0    0    0    0
4             0    0    0    0    0
5             0    0    0    0    0
6             0    0    0    0    0
7             0    0    0    0    0
8             0    0    0    0    0
9             0    0    0    0    0
10            0    0    0    0    0
dtype: int32, shape: (11, 5)

Functions that take a TsGroup work the same way; for example, we can compute an event-triggered average while the data stays a zarr array:

group = nap.TsGroup({0:nap.Ts(t=[10, 20, 30])})

sta = nap.compute_event_trigger_average(group, tsdframe, 1, (-2, 3))

print(type(tsdframe.values))
print("\n")
print(sta)

Out:

<class 'zarr.core.Array'>


Time (s)
----------  -----------------
-2          [[0. ... 0.] ...]
-1          [[0. ... 0.] ...]
0           [[0. ... 0.] ...]
1           [[0. ... 0.] ...]
2           [[0. ... 0.] ...]
3           [[0. ... 0.] ...]
dtype: float64, shape: (6, 1, 5)

Total running time of the script: ( 0 minutes 0.508 seconds)
