INSTANCE – the Italian seismic dataset for machine learning

Bibliographic Details
Main authors: A. Michelini, S. Cianetti, S. Gaviano, C. Giunchi, D. Jozinović, V. Lauciani
Format: article
Language: EN
Published: Copernicus Publications 2021
Online access: https://doaj.org/article/1c077d416e79468a9c95a1ccf41f1892
Description
Summary: The Italian earthquake waveform data are collected here in a dataset suited for machine learning (ML) analysis applications. The dataset consists of nearly 1.2 million three-component (3C) waveform traces from about 50 000 earthquakes and more than 130 000 noise 3C waveform traces, for a total of about 43 000 h of data and an average of 21 3C traces provided per event. The earthquake list is based on the Italian Seismic Bulletin (http://terremoti.ingv.it/bsi, last access: 15 February 2020) of the Istituto Nazionale di Geofisica e Vulcanologia between January 2005 and January 2020, and it includes events in the magnitude range between 0.0 and 6.5. The waveform data have been recorded primarily by the Italian National Seismic Network (network code IV) and include both weak-motion (HH, EH channels) and strong-motion (HN channels) recordings. All the waveform traces have a length of 120 s, are sampled at 100 Hz, and are provided both in counts and in physical units of ground motion after deconvolution of the instrument transfer functions. The waveform dataset is accompanied by metadata consisting of more than 100 parameters providing comprehensive information on the earthquake source, the recording stations, the trace features, and other derived quantities. This rich set of metadata allows users to tailor the data selection to their own purposes. Many of these metadata can be used as labels in ML analysis or for other studies. The dataset, assembled in HDF5 format, is available at http://doi.org/10.13127/instance (Michelini et al., 2021).
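The trace dimensions stated in the summary imply a fixed array size per record, which can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the component-by-sample layout `(3, 12000)` are assumptions made here, not the documented layout of the HDF5 files, and the trace counts are the rounded figures quoted above.

```python
# Figures taken from the summary; names and array layout are hypothetical.
TRACE_SECONDS = 120      # length of every waveform trace
SAMPLING_HZ = 100        # sampling rate
N_COMPONENTS = 3         # three-component (3C) recordings


def samples_per_component(seconds=TRACE_SECONDS, hz=SAMPLING_HZ):
    """Number of samples in one component of one trace."""
    return seconds * hz


def trace_shape():
    """Assumed (components, samples) shape of a single 3C trace."""
    return (N_COMPONENTS, samples_per_component())


def total_hours(n_traces, seconds=TRACE_SECONDS):
    """Total recording time, in hours, for n_traces traces of equal length."""
    return n_traces * seconds / 3600


if __name__ == "__main__":
    print(trace_shape())
    # Rounded counts from the summary: ~1.2 million event traces
    # plus ~130 000 noise traces.
    print(round(total_hours(1_200_000 + 130_000)))
```

With the rounded counts, the total comes out somewhat above the "about 43 000 h" quoted in the summary; the difference presumably reflects the rounding of the trace counts ("nearly 1.2 million").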