Identifying attacks in the Russia–Ukraine conflict using seismic array data – Nature

-


Malyn AKASG (PS45)

The detection of explosions in this study (Supplementary Table 1) was performed using data collected at the Ukrainian primary seismic station of the IMS, which operates as part of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). The station is denoted as Malyn AKASG with the treaty code PS45. Details of all 24 sensors in this seismic array are listed in Supplementary Table 2.

Seismic detection methodology

The methodology we use for seismic event detection is based on the QuakeMigrate software package13, designed for the automatic detection and location of earthquakes using waveform migration and stacking. Using continuous seismic data recorded on the Malyn seismic array, we transform the data at each sensor (and each channel for the case of the three-component sensor AKBB) into onset functions using the STA/LTA to help identify P-wave or S-wave seismic arrivals. For the detection of P-waves, we first apply a two-pole Butterworth bandpass filter with corner frequencies of 6 and 16 Hz, whereas for the detection of S-waves, we use a frequency band between 6 and 14 Hz. For both phases, we use a STA window of 0.3 s and a LTA window of 3 s. Using a lookup table of precomputed seismic travel times, we migrate and stack these onset functions over a grid of candidate locations at each time step. For an explosion that has been successfully recorded on the seismic network, the amplitudes of the onset functions will sum coherently (or coalesce) for the grid point corresponding to the source location and the time step matching the origin time of the explosion.

Our grid search is performed in two dimensions at the Earth’s surface, between latitudes of 50° and 52° N and longitudes of 28° and 32.3° E, using a 1-km grid spacing. The theoretical P-wave and S-wave travel times for this grid are computed using an eikonal solver from NonLinLoc34. As input to the eikonal solver, we use a 1D velocity model extracted from the 1 × 1 degree global crustal model, CRUST1.0 (ref. 35), at a longitude of 29.5° E and a latitude of 50.5° N (see Extended Data Table 1).

To generate event triggers, we apply a static detection threshold of 3.0 to the coalescence values generated at each time step in the migration and stacking. This threshold value was chosen to be deliberately low to ensure a high detection level. Although above the background noise level, this threshold also generates many false positives for signals originating outside our grid search, which become aliased to within our search area during the migration. To reduce the number of false positives, we remove all triggers that have locations originating in Belarus or that are mapped within the footprint of the array, in which many of the false-positive events become aliased.

We then rerun the migration and stacking on the remaining triggers using a modified STA/LTA function that better approximates a Gaussian for the onset functions, resulting in more precise event locations and origin times. Automatic magnitudes are also computed during this stage. A final manual screening of the events is performed by reviewing both the filtered waveforms with their theoretical arrival times and the migration images showing the coalescence values mapped onto our 2D regional grid. Any clear false positives are removed from the event catalogue during this manual screening. It should be noted that both the spatial filtering and manual screening are not required with a higher detection threshold, but the true detection rate would be reduced. We present a detection sensitivity analysis for three selected days in a subsequent section of Methods.

Location uncertainty analysis

The location uncertainty of the migration algorithm can be assessed in terms of the migration images that are computed for each event. These 2D images show the coalescence function for a given time window, which is the basis for the location and origin time for each detection. It is first worth considering the theoretical location uncertainty for different locations in our target area. This can be achieved by computing point spread functions to show the impulse response for the migration. This theoretical response is influenced by the imaging function, the network geometry, the velocity model, the event location and the seismic phases we use to generate our images. We show point spread functions for two locations with very different distances to the array (more locations are provided in the Supplementary Information). Also, we show examples of observed explosions at similar locations to compare the synthetic case with that observed. In Extended Data Fig. 1, we show point spread functions and observed data from Malyn, approximately 9 km from the AKBB reference station. In Extended Data Fig. 2, we show examples from Chernihiv in the northeast, which is approximately 170 km from the AKBB reference station. To illustrate the necessity and influence of using both P-wave and S-wave onsets in the migration, we compute point spread functions for three cases: (1) when we have only the P-wave; (2) when we have only the S-wave; and (3) when we have both the P-wave and the S-wave. To quantify the location uncertainty within the point spread functions, we computed uncertainty ellipses based on the full width at half maximum. We perform a principal component analysis on all points that fall within 50% of the maximum value of the point spread function to generate an uncertainty ellipse, which is then centred on the image maximum. Such an approach has its limitations, as demonstrated by the uncertainty ellipses generated for the Chernihiv point spread functions (Extended Data Fig. 1a). For the P-wave only and S-wave only point spread functions, the values within 50% of the maximum are limited by the size of the migration area, which has the effect of artificially reducing the size of the uncertainty ellipse. Despite this limitation, we observe that the optimal imaging response (with both P-waves and S-waves across the network) provides a location precision of 4.7 km (semimajor axis) for an event at Chernihiv and 2.9 km for an event at Malyn, in which the distance to the array is greatly reduced. We also observe that, with only a single phase type (particularly only the P-wave), there is not sufficient resolution to locate events, especially at large distances or to the northwest and southeast of the array, at which azimuthal coverage is reduced. In the observed data from Malyn (Extended Data Fig. 1b), although there is now increased noise, and there are epistemic uncertainties included in the migration, the resulting migration image can still be regarded qualitatively as showing high precision owing to the clear P-wave and S-wave onsets in the waveform data. For Chernihiv, we show observed data (Extended Data Fig. 2b) that is of much poorer quality, with far fewer impulsive signals and higher noise in the waveform data, further degrading the imaging resolution that is inherent for that location. This poorer data quality is because of increased distance from the network, the single frequency band that is applied to all events in the imaging region and potential path effects. Although we observe a high azimuthal uncertainty, we are still able to constrain the distance relatively well owing to the P-wave and S-wave observations.

We show further observed examples from our event catalogue in Extended Data Fig. 3, to demonstrate the effect of location and data quality on the migration images. For example, in Extended Data Fig. 3a, we show an event typical of the high quality we observe in the region up to 50 km northeast of the array. This generates a high-resolution migration image. As a comparison, we also show a much lower quality event from a similar location in Extended Data Fig. 3b. For all examples, there is generally a good fit between the theoretical P-wave and S-wave arrival times and the observed arrivals in the waveforms.

Detection sensitivity analysis

To demonstrate the detection sensitivity and to justify our choice of trigger threshold, we calculate the true positive rate (TPR) and false discovery rate (FDR) for three different days. The TPR is defined as,

$${rm{TPR}}=frac{{rm{TP}}}{{rm{P}}}=frac{{rm{TP}}}{{rm{TP}}+{rm{FN}}}$$

in which TP is the number of true positives that have been detected and located by the migration algorithm and P is the total number of real positives in the dataset, that is, the total number of events that could theoretically be detected and located, which includes the total number of true positives and the total number of false negatives (FN), that is, events that the migration has been unable to reliably detect and locate. To verify marginal true positives (for which the signal-to-noise ratio is low) and to identify false-negative events (not detected by the migration algorithm), we manually screen all waveform data for each of the three days investigated. Although there are events in the dataset in which only single phases are observed across the network, this is not sufficient to provide a reliable event location, as demonstrated by the point spread functions in Extended Data Figs. 1 and 2. Such events are therefore regarded as false positives (FP), which are detected and mislocated by the migration algorithm. Similarly, in estimating the number of false negatives by manually screening the waveform data, we only consider events that can be manually picked and located using a traditional arrival-time-based location algorithm (HYPOSAT36) as potential false negatives. Events with extremely low signal-to-noise ratio or with only the P-wave present are not regarded as false negatives.

Because we wish to maximize the number of true positives while minimizing the number of false positives, we also calculate the FDR, which is defined as,

$${rm{FDR}}=frac{{rm{FP}}}{{rm{FP}}+{rm{TP}}}.$$

As the input to our migration algorithm are onset functions generated from the STA/LTA, assuming the trigger threshold is sufficiently above the noise floor, false positives arise from one of three scenarios. First, seismic events from outside the migration area can be spatially aliased into the region. Second, seismic events may have too few onsets across the network to reliably resolve the event location, but still be above the trigger threshold owing to high signal-to-noise ratios for a subset of phases or stations. Third, several events at different locations occurring simultaneously, or events that occur in short succession, resulting in onset functions for two different events being migrated as a single event. For example, the onset functions from the P-waves for event 1 may be migrated with the onset functions from the S-waves for event 2, generating an incorrect event.

The three days we selected for this analysis were 3 February, 7 March and 20 May 2022. 3 February was chosen as it represents a day before the invasion, for which only quarry blasts were detected and thus provides a good baseline measure. 7 March represents the day with the highest number of explosions detected and 20 May was selected because it is after the main Russian withdrawal from the region, yet there were still targeted attacks that were detected. Moreover, 20 May is a day with a high number of repeating signals that we commonly observe throughout our study period and which substantially contributes to the number of false positives detected. We believe these repeating signals to originate from possible mining activity in Belarus, yet are repeatedly aliased into our migration region.

In Extended Data Fig. 4, we show the time series of the maximum coalescence from across the migration region for each of the three days for which the triggering is performed. The repeating signals from the assumed mining activity from Belarus can be clearly observed between approximately 16:40 and 18:15 UTC and again from approximately 19:25 to 21:20 UTC on 20 May. The coalescence values corresponding to these events are similar to the values from the military attacks observed on this day.

In Extended Data Figs. 5–7, we show the TPR and the FDR as a function of triggering threshold. The total number of true positives and false positives are shown in the Supplementary Tables 3–5. For 3 February (Extended Data Fig. 5), for which we observe only three explosions related to quarry activity in the region, we observe a TPR of 100% between a trigger threshold of 2.4 and 4.0, whereas the FDR reduces from 99.4% to 40% over the same threshold range. Although the FDR generally decreases with a higher trigger threshold, we also observe some increases. For example, between thresholds of 2.9 to 3.0, there is an increase in FDR from 40% to 62.5%. This is caused by the duration of the coalescence values exceeding the threshold. At lower thresholds, the coalescence may exceed the threshold but not drop below it within a given time interval, meaning that there is no detrigger. At a higher threshold, within the same period, it is possible that the coalescence becomes triggered and detriggered several times, resulting in more false positives.

For 7 March (Extended Data Fig. 6), for which there are a total of 74 valid events, we observe an increase in the TPR from between 75.7% at a threshold of 2.4 to a maximum of 94.6% at thresholds of both 2.9 and 3.0. This increase in TPR despite a higher threshold is because of the influence of the false positives. For example, with several explosions, false positives can be generated owing to incorrect association of the different phases between the events, which will prevent the correct events being identified. Thus, with higher false positives, the number of true positives may also be reduced.

It is also worth noting that, although the number of true positives observed for 7 March never exceeds 70, we detected a total of 73 unique events across the different thresholds out of the possible 74 events. The single false-negative event that was only observed in the waveform screening was not detected by the migration owing to a lower frequency band being required for its detection.

For 20 May (Extended Data Fig. 7), the influence of the activity from Belarus is apparent in the FDR. Although the TPR reaches a maximum of 80% at thresholds of 2.7 and 2.8, it drops from 70% to 53.3% going from a threshold of 3.0 to 3.1. However, the FDR remains above 90% until a threshold of 3.5. This means that we are unable to minimize the number of false positives during this time period without severely affecting the TPR.

Although our aim is to maximize the number of true positives whilst minimizing the FDR, based on the TPR and FDR for the three selected days, there is no clear optimal threshold. The results from 3 February show that a threshold of 4.0 would provide the highest TPR (100%) while minimizing the FDR (40%). By contrast, for 7 March, when we observe the highest number of explosions, a lower threshold of 3.0 would be required to allow us to achieve the maximum TPR of 94.6%, resulting in an FDR of 58.1%. For 20 May, we would be required to further reduce the threshold to 2.8 to maximize the TPR (80%), but we then experience an FDR of 96.4% at this level.

Because many of the false positives that arise from the repeating Belarusian signals are generally aliased either into locations in the array or in Belarus, we choose to select a relatively low threshold of 3.0 as stated in the ‘Seismic detection methodology’ section and apply the aforementioned spatial filtering and final QC to remove the false positives.

Infrasound detection methodology

The detection of infrasound phases is performed after the seismic detection and localization stage presented in these Methods. We assume that observed infrasound arrivals correspond to lower-tropospheric infrasonic propagation close to the surface, which is valid at close distances from the source (<100 km).

We search for infrasound arrivals for each event e in time windows we constrained by the event distance to each sensor de,s (km) and origin time t0,e, using realistic adiabatic sound velocities c (km s−1) between cmin = 0.325 km s−1 and cmax = 0.37 km s−1, such that we = {minssensors(de,s/cmax), maxssensors(de,s/cmin)}. To account for the candidate event time, we take the product of the waveform envelopes recorded at each sensor and a Gaussian function ({g}_{e,s}={{{rm{e}}}^{{left(t-{mu }_{e,s}right)/sigma }}}^{2}), in which t(s) is the time, with mean value μe,s = t0,e + de,s/c(s) and standard deviation σ = 7.5s for each candidate velocity c. We consider ten different candidate velocities uniformly distributed in the range (cmin, cmax) to account for uncertainties in event location and surface sound velocities. Envelopes convolved with a Gaussian function are then linearly stacked to extract the stacking maximum smax used to identify infrasound arrivals. For each event, only the entry with candidate velocity c showing the maximum value for smax is kept in the database as a potential detection.

To confirm the detection of infrasound arrivals, we apply a series of static detection thresholds, tstack, on the stacking maximum, smax, on the ratio of average maximum to average standard deviation across the array tSNR and on the ratio of maximum standard deviation to average standard deviations across the array tstd. Thresholds are chosen empirically after spot checking the detected waveforms such that tstack = 2.2 × 10−8, tSNR = 3.655 and tstd = 7.1. These values correspond to conservative choices to reduce the number of false positives.

Automatic magnitude computation

We compute automatic local magnitudes using the QuakeMigrate software package13 during the final migration and stacking stage described above. We remove the instrument response from each sensor and filter the data to simulate the response of a Wood–Anderson seismometer. To establish phase arrival times, we run an autopicker for each event by fitting a 1D Gaussian to the onset functions, in which the onset function exceeds the median absolute deviation outside the picking window by a factor of 8. We filter the instrument-corrected data with a 1–8-Hz bandpass filter and measure the maximum S-wave amplitude in a 4-s window from the automatic S-wave arrival time. Noise is estimated by measuring the root-mean-square amplitude in a 5-s window before the P-wave signal window. S-wave amplitudes that exceed the noise amplitude by a factor of 3 are then used to compute the local magnitude using the Hutton–Boore attenuation curves37, with the mean value across all sensors used to compute a final network magnitude.

Estimating explosive yield from seismic magnitudes

We estimate explosive yield from the automatic magnitude estimates using two empirically derived estimates. The upper yield estimates are calculated on the basis of the relationship,

$${M}_{{rm{L}}}=0.8834{log }_{10}W-1.4221$$

in which W is the explosive yield in kg. This is derived from land-based explosions with known charge size listed in Supplementary Table 6 and compiled in ref. 30.

The lower yield estimates are based on the relationship derived for the Novaya Zemlya nuclear test site by29,

$${m}_{{rm{b}}}=0.75{log }_{10}Y+4.25$$

in which Y is the explosive yield in kilotonnes. It should be noted that this relationship has been derived for body-wave magnitudes (mb) but we have applied it to the local magnitudes (ML) calculated for the Ukrainian explosions.

Both the lower and upper estimates of explosive yield can be regarded as having high uncertainties and should be used only as a guide to the relative sizes of the explosions.

A histogram showing the distribution of yield estimates derived from seismic magnitudes is shown in Extended Data Fig. 8.

Estimating explosive yield from acoustic phases

Empirical relationships exist between the explosive yield and acoustic maximum amplitudes, referred as the Blast Operational Overpressure Model (BOOM)33, or dominant frequencies, referred to here as the Revelle model38. Frequency-based estimates are generally less sensitive to the atmospheric variability compared with amplitude estimates. However, empirical models based on frequency inputs have been constructed from far-field (>500 km from the source) historical data of mostly atmospheric nuclear explosions that are markedly different, in terms of energy release, from the conflict-related explosions investigated here. By contrast, models based on amplitude inputs have used close-range stations (<50 km from the source), which is more realistic for our event dataset.

Because the Revelle model relies on stratospheric returns at much larger distances from the source, we only estimate acoustic-based yields using the BOOM model. The BOOM empirical yield estimates39 are constructed as:

$${Y}_{{rm{B}}{rm{O}}{rm{O}}{rm{M}}},=,{[{{rm{e}}}^{(L-103.1-B/5.3)/20}/{({(S/1013)}^{0.556})times ({(1/110)}^{0.444})times {(25/R)}^{1.333}}]}^{frac{1}{0.444}}$$

in which L = 20 × log10(p/2e − 5) (dB), with p the pressure perturbation amplitude and B = arctan(3 × (dv/dz) × (R/ca)), with R (km) the source–receiver distance, dv (m s−1) the maximum difference in the sound speed and the surface sound speed, dz (km) the altitude at which dv is observed and ca (m s−1) the sound velocity at the ground. YBOOM requires the pressure amplitude as input, which is not directly available from our seismic recordings. To produce pressure amplitude estimates, we consider the relationship vz = HρwP, in which Hρw is the air-to-ground transmission coefficient. Hρw for acoustic waves travelling along the surface40 is such that:

$${H}_{rho w}=frac{{c}_{{rm{a}}}}{2left(lambda +mu right)}frac{lambda +2mu }{mu },$$

in which (λ, μ) are the ground Lame parameters. Because high-frequency air-to-ground acoustic transmission can be sensitive to the poorly constrained uppermost ground layers, we consider two scenarios: (1) ‘rock’, which corresponds to the seismic velocities presented in Extended Data Table 1 and density ρ = 2.85 × 103 kg m−3, and (2) ‘sediment’, which corresponds to a scenario with much lower shear velocities at the ground such that vp, vs and ρ are 2 km s−1, 0.55 km s−1 and 1.93 × 103 kg m−3, respectively. Furthermore, the BOOM model (YBOOM) uses sound velocity gradients as inputs through dv and dz. Yet, these inputs only have a second-order impact on the energy transmission compared with seismic velocity parameters. Therefore, we use the following arbitrary values dv = 1 m s−1 and dz = 1 km. Because we have signals recorded at several stations, we build estimates (widetilde{Y}) as averages across all stations.

Yield estimates using the BOOM model are presented in Extended Data Fig. 9. We observe large discrepancies between the two types of seismic velocity model, with a much larger energy transmission, that is, lower yields, in the case of sediments. Only the estimates using the ‘sediment’ seismic model qualitatively match the distribution of yield estimates based on the magnitude computations in Extended Data Fig. 8. Strong biases exist in the empirical estimates provided by the BOOM model, which was developed using a different range of source yields and source–receiver distances. This highlights the need for ground-truth data for yield calibration in future studies.

Event location using manually picked seismic arrivals

As well as the automatic event detections, we built a further database of located events for the purpose of validation (about 800 events). In contrast to the automatic catalogue, the P-wave and S-wave arrivals were manually picked by analysts on all AKASG stations, for which a clear onset was visible. For S-wave arrivals, picking on the single three-component station of the array tended to give more confident time picks. For a few events, acoustic arrivals were also picked manually.

We then located the events using the HYPOSAT location software36. This software implements an iterative optimization procedure inverting travel times for epicentre and source time. The source depth was set to zero because we anticipated signals generated on the Earth’s surface. The same seismic velocity model as for the automatic stacking location was used. An uncertainty of 1 s on both P-wave and S-wave arrivals was selected before inversion. Events with acoustic arrivals have been located by considering an extra acoustic phase in the inversion, travelling at constant velocity ca. Because ca is not accurately known, we consider five potential velocity candidates uniformly distributed in the range ca = 0.33 ± 0.01 km s−1. We then select the solution with the largest posterior likelihood. An uncertainty of 5 s on acoustic arrivals was selected before inversion to account for the difficulty in picking the onset when arrivals are dispersed and/or low in amplitude. Examples of two events located with manually picked arrivals, which include acoustic arrivals, are shown in Fig. 4a,e.

Extracting event reports from Liveuamap

We extracted all the reported events between February and November 2022 from the automatic event catalogue available at https://liveuamap.com/. Daily event catalogues can be downloaded from the website as .geojson files, which can be processed using the built-in JSON Python library. Because the event catalogue contains entries unrelated to direct military activity, that is, potential explosions, we filtered the original catalogue using the following methodology: (1) we removed repeated entries, that is, entries with the same description that share the exact same location and time of the day but with dates varying across several days; (2) we then kept only entries that included the keywords ksave shown below; (3) and, finally, we removed entries that included the keywords kremove shown below.

List of keywords of interest:

ksave = ‘exercise’, ‘ceasefire violations’, ‘military activity’, ‘artillery’, ‘damaged’, ‘wounded’, ‘launched’, ‘crash’, ‘strike’, ‘clashes’, ‘targeted’, ‘targeting’, ‘projectile’, ‘exploded’, ‘shot’, ‘fight’, ‘bombed’, ‘burn’, ‘blown up’, ‘siren’, ‘airdrop’, ‘destroy’, ‘killed’, ‘attacking’, ‘dropped’, ‘target’, ‘hit’, ‘struck’, ‘boom’, ‘smoke’, ‘explosion’, ‘blow up’, ‘firing’, ‘damage’, ‘hit’, ‘shelling’, ‘shelled’, ‘escalation’, ‘injured’.

kremove = ‘@Maxar satellite’, ‘Embassy’, ‘statement’, ‘German chancellor’, ‘DDoS’, ‘procession’, ‘warning’, ‘another video’, ‘will not succeed’, ‘calls civil’, ‘call civil’, ‘unexploded’, ‘repair’, ‘driving’, ‘treated’, ‘visit’, ‘take cover now!’, ‘says’, ‘telegram’, ‘found dead’, ‘satellite imagery’, ‘more footage’, ‘still no strikes’, ‘death toll’, ‘Ukrainian FM’, ‘frigate’, ‘seized’, ‘negotiation’, ‘phone call’, ‘minister’, ‘commander-in-chief’, ‘Zelensky’, ‘advisor’, ‘president of Ukraine’, ‘evacuation’.



Source link

Latest news

My Favorite All-in-One Printer and Scanner Is $50 Off

While a printer upgrade might not sound like the most exciting way to spend your hard-earned dollars, the...

Government Workers Say Their Out-of-Office Replies Were Forcibly Changed to Blame Democrats for Shutdown

On Wednesday, the first day of the US government shutdown, employees at the Department of Education (DOE) set...

How startups could be affected by a prolonged government shutdown

The U.S. government shutdown could stifle deal flow, freeze visa processing for workers, and cause other problems for...

Celebrating the partners driving Disrupt’s big ideas, connections, and community

Tech Zone Daily Disrupt 2025 wouldn’t be possible without the incredible support of our sponsors, who bring world-class...

Phia’s Phoebe Gates and Sophia Kianni talk consumer AI at Disrupt 2025

Consumer AI is having its breakout moment — and few startups have captured the spotlight this year quite...

China Rolls Out Its First Talent Visa as the US Retreats on H-1Bs

The bottom line is that, unlike the US, China is not a country of immigrants. In 2020, only...

Must read

You might also likeRELATED
Recommended to you