### Mice

Seven- to fourteen-week-old C57BL/6 wild-type, *Pitx2-cre::Tau-LSL-Flp0-INLA* (*Pitx2-Flp*, derived from *Pitx2-cre* and *Tau-LSL-Flp0-INLA* mice, provided by J. Martin and S. Arber, respectively), *Pitx2-cre::Rosa-LSL-tdTomato* (derived from *Pitx2-cre* and *Rosa-LSL-tdTomato* mice; 007914, The Jackson Laboratory) or *Vgat-cre* (016961, The Jackson Laboratory) mice were used for this study. Mice of both sexes were used for anatomical experiments and only males were used for behavioural experiments. All animal procedures were conducted in accordance with the UK Animals (Scientific Procedures) Act 1986 and the European Community Council Directive on Animal Care under project licence PPL PCDD85C8A, and were approved by the Animal Welfare and Ethical Review Body (AWERB) of the MRC Laboratory of Molecular Biology. Animals undergoing surgical implantation were individually housed to prevent damage to implants. Lighting was set to a reversed light:dark cycle, with simulated dawn and dusk at 19:00 and 07:00, respectively. Temperature was controlled at 19–23 °C and humidity at 45–65%. For open-field recordings, mice were placed on a restricted diet sufficient to maintain 85% of their free-feeding weight. When possible, analyses were blinded to data collection. Although sample sizes were not determined using statistical methods, we selected sample sizes for each experiment based on the variance observed in similar studies^{9,38} and on practical experimental considerations.

### Surgery

Mice were anaesthetized with isoflurane. Upon cessation of reflexes, the top of the head of the mouse was shaved, the mouse was placed on a stereotaxic frame and the skin was opened with a scalpel in a single clean vertical cut.

Viral injections in the brain were done using a Nanoject (Scientific Laboratory Supplies) equipped with a pulled borosilicate glass capillary (1.5 mm outer diameter × 0.86 mm inner diameter × 100 mm, Harvard Apparatus). Up to a maximum of 300 nl of virus was injected at three depths within the SC (coordinates: 3.80 mm AP, 1 mm ML, and 1.2, 1.5 and 1.8 mm DV) at a rate of 5 nl every 5 s. For retrograde tracing, AAV(1)-CMV-FRTed-TVAmCherry-2A-Gly (500 nl, titre: 3.9 × 10^{12} genomic copies per ml) was injected at day 0, followed by injection of DG-rabies-GFP(EnvA) (500 nl, titre: 4.3 × 10^{8} infectious units per ml) at day 21 through the same craniotomy. Mice were perfused one week after injection of the rabies virus. Brain tissue was processed as described in ‘Histology’. For selective long-term labelling of Pitx2^{ON-PRE} neurons in the SC, we performed an initial injection with a mix of AAV(2)-hSyn1-FLEX-nucHA-2A-TVA-2A-G(N2c) (500 nl, titre: 1.5 × 10^{12} genomic copies per ml) and either AAV(1)-pAAV-nEF-Cre^{OFF}/Flp^{ON}-ChR2(ET/TC)-EYFP (500 nl, Addgene #137141, titre: 4.3 × 10^{12} genomic copies per ml) or AAV(1)-Ef1a-fDIO-GCaMP6f (500 nl, Addgene #128315, titre: 5 × 10^{12} genomic copies per ml). After 3 weeks of expression, a second injection with the self-inactivating rabies virus^{35,36,37} SiR-N2c-Flp (EnvA) (500 nl, titre: 1 × 10^{7} genomic copies per ml) was performed before implanting an optetrode or a cranial window for chronic Ca^{2+} imaging (see below). In a subset of experiments, AAV(9)-EF1a-double floxed-hChR2(H134R)-mCherry-WPRE-HGHpA (800 nl, Addgene #20297, titre: 5 × 10^{12} genomic copies per ml) was injected in the SC, followed by an optetrode implant. For Ca^{2+} imaging experiments, injections were performed at 3 different locations surrounding the area to be imaged, namely 3.2, 3.7 and 4.2 mm posterior to Bregma, 0.8, 1.5 and 0.8 mm lateral to the midline and 1.3, 1.5 and 1.2 mm ventral to the brain surface.

For a subset of experiments, the right eye of the mouse was injected with AAV(2)-CAG-ChR2(H248R)-mCherry (1.5 μl, titre: 3.9 × 10^{12} genomic copies per ml). A drop of 1% tropicamide and another of 2.5% phenylephrine hydrochloride were applied to the eye before injecting up to 3 μl of the virus with a Hamilton syringe. Following viral injection, a drop of 0.5% proxymetacaine and another of 0.5% chloramphenicol were applied to the injected eye. Acute brain slices were prepared 4 weeks after injection.

For all in vivo recordings, the skin covering the left hemisphere was removed, a craniotomy covering 3.5–3.8 mm AP and 0.8–1.2 mm ML was performed and a custom-made head plate was cemented (Super-Bond C & B; Prestige Dental) around the craniotomy. Care was taken to avoid bleeding or drying of the meninges and brain tissue. For whole-cell recordings in anaesthetized mice, saline (0.9% NaCl) was superfused constantly with a 2 ml min^{−1} laminar flow using a peristaltic pump, and body temperature was maintained at 36 °C using a low-noise heating pad (FHC, Termobit).

For tetrode recordings, mice were implanted with a custom-made head plate and with moveable 17-μm-diameter platinum-iridium (HML-insulated) microelectrodes (California Fine Wire), configured as four tetrodes and carried by 16-channel microdrives (Axona). Tetrodes were electroplated with platinum to an impedance of 100–250 kΩ using a Kohlrausch:gelatin (9:1, 0.5% gelatin) solution. Electrodes were implanted at the surface of the SC, at coordinates 3.8–4.2 mm posterior to Bregma, 1.25 mm lateral to the midline and 1.2 mm ventral to the brain surface. All mice were given at least one week to recover before recording and food deprivation. The same protocol was used for optetrode surgery.

For chronic in vivo Ca^{2+} imaging of Pitx2^{ON-PRE} neurons in the SC, mice first underwent viral injection of an AAV-TVA and an AAV-GCaMP6f as described above, and were injected with Dexafort at 2 μg g^{−1} the day before surgery. A head post and a cranial window over the SC were implanted as previously described^{17}. In brief, following isoflurane anaesthesia, Vetergesic was injected subcutaneously at 0.1 mg kg^{−1} and a metal head post was affixed to the skull with Crown & Bridge Metabond. Epivicaine was applied to the skull, and a 3-mm-diameter craniotomy centred on the rostral SC was performed over the left hemisphere. The surface of the SC was exposed by removing the overlying cortex, and the SiR was then injected as described above. A 3-mm cannula window was then fixed on top of the colliculus using dental cement (Crown & Bridge Metabond).

### Histology

Once tetrodes were estimated to have passed beyond the SC, mice were anaesthetized with Euthatal (0.2 ml) and transcardially perfused with 4% formaldehyde in phosphate-buffered saline (PBS). Brains were stored in the fixative and then in 30% w/v sucrose solution for 24–48 h to cryoprotect the tissue. Brains were subsequently embedded in O.C.T. (VWR), frozen to ~−20 °C and cut into 30-μm coronal sections using a CM1950 cryostat (Leica). Nissl staining was used to determine tetrode depth as previously described^{38}.

For immunohistochemistry (IHC) experiments, 40-μm cryosections were cut. Free-floating sections were rinsed in PBS and incubated in blocking solution (1% donkey serum and 0.3% Triton X-100 in PBS) containing primary antibodies for 24 h at 4 °C. Sections were washed four times with PBS at room temperature and incubated for 24 h at 4 °C in blocking solution with secondary antibodies. Immunolabelled sections were washed four times with PBS at room temperature and mounted on glass slides (SuperFrost Plus, Thermo Scientific) using DAPI Fluoromount-G (SouthernBiotech). Biocytin-filled neurons were manually traced and aligned across sections to obtain the final reconstruction. Primary antibodies used in this study were chicken anti-GFP (Aves Labs, GFP-1020, 1:2,000) and rabbit anti-RFP (Rockland, 600-401-379, 1:2,000). Secondary antibodies used were Alexa Fluor 488 donkey anti-chicken (Jackson ImmunoResearch, 703-545-155, 1:1,000), Cy3 donkey anti-rabbit (Jackson ImmunoResearch, 711-165-152, 1:1,000) and Alexa Fluor 488-conjugated streptavidin (Invitrogen, 1:2,000). Images were acquired on a Zeiss 780 confocal microscope using a 20×/0.8 NA air objective (Carl Zeiss).

Retinas were dissected from the eyecups and incubated in 4% formaldehyde for 24 h. Following several washes in PBS, whole retinas were mounted for imaging.

### Electrophysiology

#### Whole-cell

For whole-cell recordings, coronal slices (350 μm) containing the SC were prepared using a vibrating microtome (7000smz-2, Campden Instruments) in ice-cold sucrose-based cutting solution oxygenated with carbogen (95% O_{2}, 5% CO_{2}) and with the following composition (in mM): KCl 3, NaH_{2}PO_{4} 1.25, MgSO_{4} 2, MgCl_{2} 1, CaCl_{2} 1, NaHCO_{3} 26.4, glucose 10, sucrose 206, ascorbic acid 0.40, kynurenic acid 1. Slices were incubated at 37 °C for 30 min in a submerged-style holding chamber with oxygenated artificial cerebrospinal fluid (aCSF; in mM: NaCl 126, KCl 3, NaH_{2}PO_{4} 1.25, MgSO_{4} 2, CaCl_{2} 2, NaHCO_{3} 26.4, glucose 10) with osmolarity adjusted to 280–300 mOsm l^{−1}, and were then kept in the same holding chamber at room temperature for at least a further 30 min. Slices were individually transferred to the recording chamber and superfused with oxygenated aCSF at room temperature at a flow rate of approximately 2 ml min^{−1}. To block GABA receptors, CGP 52431 (10 μM, Tocris Bioscience) and gabazine (SR95531, 10 μM, Tocris Bioscience) were added to the superfusate.

Whole-cell current-clamp recordings were obtained from collicular neurons using 5–8 MΩ pipettes pulled from borosilicate glass capillaries (1.5 mm outer diameter × 0.86 mm inner diameter). Pipettes were filled with intracellular solution containing (in mM): potassium gluconate 150, HEPES 10, NaCl 4, magnesium ATP 4, sodium GTP 0.3 and EGTA 0.2, plus 0.4% biocytin; adjusted to pH 7.2 and osmolarity 270–290 mOsm l^{−1}. Data were recorded using an Axon Multiclamp 700B amplifier (Molecular Devices); signals were low-pass filtered at 2 kHz and acquired at 5 kHz using a digitizer (Axon Digidata 1550A, Molecular Devices) on a PC running pClamp. Light-evoked responses were elicited using a 450–490 nm LED light (pE-300 coolLED system, Scientifica) through a 40× water-immersion objective (0.8 NA).

Whole-cell in vivo recordings were performed under isoflurane anaesthesia as previously described^{56}. In brief, the recording pipette was placed perpendicular to the brain surface and a reference silver pellet electrode (A-M Systems) was placed in the saline bath covering the craniotomy. High positive pressure (>500 mbar) was applied before lowering the pipette to the surface of the brain. A 5-ms-long square voltage pulse of 4 to 8 mV at 100 Hz was delivered via the recording electrode. The pipette was quickly advanced ~1.2 mm to reach the surface of the SC. The pressure was then lowered to 40–60 mbar and the pipette was advanced in 2-μm steps. Cell contact produced a small increase (around 10%) in pipette resistance, which could be seen as a proportional decrease in the amplitude of the current step on the oscilloscope.

#### Tetrodes

Single-unit recording was carried out using a multi-channel DacqUSB recording system (Axona) as previously described^{38}. The microdrive and head stage were attached to the pre-amplifier via a lightweight cable. Signals were amplified 12,000–20,000 times and band-pass filtered between 500 Hz and 7 kHz. Recording thresholds were set to 70% above baseline activity levels, and for each spike crossing the threshold, data from all channels were collected over a window spanning 200 µs before and 800 µs after the spike peak. The activity of channels from any given tetrode was referenced against a single channel from another tetrode to increase the signal-to-noise ratio. Tetrodes were advanced ventrally by ~50–75 μm after each recording session. The inertial sensor was attached to the head stage on the head of the mouse using Mill-Max connectors. The sensor signal was passed through a lightweight cable to an Arduino, which processed the signal and computed the direction cosine matrix algorithm. This control Arduino was connected to the DacqUSB system through the system’s digital input–output port. A custom BASIC script was written in DacqUSB to synchronize the start of single-unit recording with the key-press initiation of inertial sensor recording (controlled using the Processing software sketchbook; https://processing.org) or of visual stimulation. During head-restrained recordings, a TTL pulse was sent to an infrared LED to align eye tracking and electrophysiology.

#### Optetrodes

For simultaneous single-unit recording and optogenetic stimulation, a single optic fibre (core = 100 μm, NA = 0.22; Doric Lenses) was inserted between each bundle of tetrodes during production as previously described^{9}. Light was delivered using a 473 nm laser diode module (Cobolt 06-MLD, Cobolt) coupled to a 100-μm multimode fibre (NA = 0.22) through a Schäfter + Kirchhoff fibre coupler (Cobolt). The laser power in all stimulation experiments was 3–5 mW at the fibre tip. For optotagging, mice were recorded in an open-field arena. After 10 min of acclimatization, bursts of thirty 5-ms-long pulses at 30 Hz were delivered with a 9-s rest period between bursts, for a total of 900 stimulations over 5 min. Blue-light-activated units were identified on the basis of the latency of their response to a light pulse within a 5-ms time window^{57}. In a subset of experiments, *Vgat-cre* mice expressing ChR2 were recorded in head-restrained conditions to assess the change in tuning to moving gratings following light stimulation of collicular Vgat^{ON} neurons. The same implants and setup were used in those experiments, except that continuous 10-ms-long pulses at 30 Hz were delivered during visual stimulation in light-ON trials.
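The optotagging protocol can be checked arithmetically: 30 pulses at 30 Hz last 1 s, so each burst plus its 9-s rest takes 10 s, and 30 bursts give 900 pulses in 5 min. The minimal NumPy sketch below builds that pulse schedule and measures, for each pulse, the latency of the first spike within the 5-ms window. The function names are hypothetical, and the final unit-classification statistic (ref. 57) is deliberately not reproduced here.

```python
import numpy as np

def pulse_onsets(n_bursts=30, pulses_per_burst=30, freq_hz=30.0, rest_s=9.0):
    """Onset times (s) of light pulses: bursts at freq_hz separated by rest_s."""
    burst_len = pulses_per_burst / freq_hz           # 30 pulses at 30 Hz -> 1 s
    period = burst_len + rest_s                      # 1 s burst + 9 s rest -> 10 s
    within = np.arange(pulses_per_burst) / freq_hz   # pulse offsets inside a burst
    starts = np.arange(n_bursts) * period            # burst start times
    return (starts[:, None] + within[None, :]).ravel()

def first_spike_latencies(spike_times, onsets, window_s=0.005):
    """Latency of the first spike at or after each onset; NaN if none within window_s."""
    spikes = np.sort(np.asarray(spike_times, float))
    idx = np.searchsorted(spikes, onsets, side="left")  # first spike >= onset
    lat = np.full(len(onsets), np.nan)
    valid = idx < len(spikes)
    d = spikes[idx[valid]] - onsets[valid]
    lat[valid] = np.where(d <= window_s, d, np.nan)     # keep only responses in window
    return lat
```

The fraction of non-NaN latencies gives a per-unit response probability, and their spread gives jitter; both are common inputs to optotagging criteria.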

#### Ca^{2+} imaging

Mice were imaged from one week after surgery with a two-photon microscope (Bergamo II, Thorlabs) equipped with a 16× 0.8 NA objective (Nikon). Mice were recorded awake and head-fixed on a custom-made floating platform. tdTomato-positive Pitx2^{ON} neurons and SiR-infected cells were excited at 1,030 nm and 920 nm, respectively, with a Ti:Sapphire laser (Mai Tai DeepSee, Spectra Physics) at a power of around 20 mW. Red and green emitted fluorescence was collected through 607 ± 35 nm and 525 ± 25 nm filters (Brightline), respectively. For imaging of neuronal responses, recordings consisted of either a single plane or multiple planes at different depths imaged quasi-simultaneously using a piezo device. The pixel resolution was kept at around 0.8 µm per pixel, while the number of pixels, field of view and imaging rate were adjusted to cover the labelled cells. On average, this resulted in imaging frame rates of 30 to 60 Hz and pixel dwell times of 0.2 to 0.08 µs.

Two-photon recordings were then registered, and ROIs were determined manually and extracted using CaImAn^{58} (Flatiron Institute) in Python. Relative changes in fluorescence over baseline (Δ*F*/*F*) were computed and used for further analysis.
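The text does not state how the baseline fluorescence *F*_{0} was defined. Purely as an illustration, the sketch below assumes *F*_{0} is a low percentile (here the 10th) of each ROI's extracted trace and computes Δ*F*/*F* = (*F* − *F*_{0})/*F*_{0}; a baseline window (for example, grey-screen periods) would be an equally plausible choice.

```python
import numpy as np

def delta_f_over_f(traces, percentile=10.0):
    """dF/F per ROI with F0 assumed to be a low percentile of each trace.

    traces: array (n_rois, n_frames) of raw fluorescence, e.g. extracted ROI traces.
    """
    traces = np.asarray(traces, dtype=float)
    # assumption: F0 = chosen percentile of each ROI's trace over the whole recording
    f0 = np.percentile(traces, percentile, axis=1, keepdims=True)
    return (traces - f0) / f0
```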

### Behaviour

#### Training procedure

Before recording, mice were acclimatized to handling for two days and to carrying the head stage and being head-fixed for three further days. For all head-fixed recordings, mice were positioned standing on a wheel. During the acclimatization phase, mice were able to run freely on the wheel at all times, and the duration of restraint was gradually increased from 5–10 min on the first day to up to 30 min on the third day.

#### Visual stimulation

Visual stimuli were generated using a customized version of the PsychoPy toolbox for Python and presented on an LCD monitor (Dell P2414H; mean luminance 35–45 cd m^{−2}) positioned 20 cm from the right eye of the mouse, spanning 31° down, 42° up, 45° nasal and 59° temporal. The screen was gamma-corrected and refreshed at 60 Hz. Electrophysiological recordings were aligned to the visual stimulus using a photodiode placed at the bottom right corner of the stimulus monitor and covered so as not to elicit a visual response in the mouse. The signals from the photodiode and from the accelerometer on the wheel were time-stamped and recorded using an Arduino.

Static receptive field position was estimated using 750-ms-long flashes of uniform black or white 9 cm^{2} squares on a grey background. The screen was divided into 165 locations covering 73° by 104° (corresponding to 30 × 53 cm). To map the receptive field, the full protocol was repeated three times.

Full-screen sinusoidal gratings moving in 12 different directions (30° steps) were used to determine direction and orientation selectivity. Each grating (spatial frequency: 0.08 cycles per degree) first remained static for 1 s, then moved at a temporal frequency of 2.83 Hz for 2 s and stopped for a further 1 s before changing direction. Each grating was displayed three times per trial in a semi-randomized order, and this protocol was repeated in three trials, for a total of nine presentations per stimulus.

For a subset of mice, two additional visual stimulation paradigms were used: a directional moving spot and moving Gabor patches. A small black spot (1.3 cm diameter, corresponding to ~3.7°) moving at 30 cm s^{−1} on a grey background was used to assess the direction selectivity of neurons to a stimulus mimicking moving prey. In this paradigm, the spot moved towards the centre of the screen (roughly aligned to the centre of the visual field) for 1 s, remained static at the centre for 0.5 s and then retracted in the opposite direction at the same speed. Eight starting points were used: the four corners of the screen and the midpoints of its four edges. Each starting location was presented three times per trial in a semi-randomized order. As for the other visual stimulation paradigms, we recorded three trials, for a total of nine presentations per movement direction.
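The stimulus sizes and speeds above follow from simple viewing geometry at the 20-cm screen distance. The sketch below reproduces the ~3.7° spot size and derives two quantities not stated in the text: the drift speed of the gratings (temporal frequency divided by spatial frequency, 2.83 / 0.08 ≈ 35.4° s^{−1}) and the approximate angular speed of the spot near the screen centre. The helper names are hypothetical, and the last function assumes the spot passes through the foot of the perpendicular from the eye to the screen; off-axis positions give smaller values.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle subtended by an object centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def grating_speed_deg_s(temporal_freq_hz, spatial_freq_cpd):
    """Drift speed of a grating in deg/s: cycles/s divided by cycles/deg."""
    return temporal_freq_hz / spatial_freq_cpd

def angular_speed_at_centre_deg_s(speed_cm_s, distance_cm):
    """Angular speed where the screen is perpendicular to the line of sight.

    d/dx atan(x / D) at x = 0 equals 1 / D, so omega = v / D (rad/s).
    """
    return math.degrees(speed_cm_s / distance_cm)

print(round(visual_angle_deg(1.3, 20.0), 2))          # spot diameter in degrees
print(round(grating_speed_deg_s(2.83, 0.08), 1))      # grating drift speed
print(round(angular_speed_at_centre_deg_s(30.0, 20.0), 1))
```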

To assess the spatial receptive field of neurons to moving stimuli, we used Gabor patches at 24 different locations (6 along the *x* axis and 4 along the *y* axis) moving in 8 different directions (45° steps) for 1.5 s (spatial frequency: 0.08 cycles per degree; temporal frequency: 2.83 Hz). All combinations of direction and location were presented 3 times in random order, for a total of 576 presentations (3 repetitions × 8 directions × 24 locations).
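A randomized schedule with exactly three repetitions of each of the 8 × 24 = 192 direction–location conditions can be generated as below. At 1.5 s per presentation, 576 presentations amount to 864 s of stimulation, excluding any inter-stimulus gaps (none are specified in the text). This is a minimal sketch with a hypothetical function name; the original PsychoPy code may order presentations differently (for example, randomizing within each repetition block).

```python
import itertools
import random

def gabor_schedule(n_x=6, n_y=4, n_dirs=8, n_reps=3, seed=None):
    """Randomly ordered list of (x_index, y_index, direction_deg) presentations."""
    conditions = [
        (ix, iy, d * 360 // n_dirs)                  # 45 deg steps for 8 directions
        for ix, iy, d in itertools.product(range(n_x), range(n_y), range(n_dirs))
    ]
    schedule = conditions * n_reps                   # every condition n_reps times
    random.Random(seed).shuffle(schedule)            # fully random order
    return schedule

sched = gabor_schedule(seed=0)
print(len(sched), len(sched) * 1.5)                  # presentations, seconds of stimulus
```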

Before all visual stimulation trials, ‘spontaneous’ firing rates were estimated during presentation of a black screen (15 s) followed by a grey screen of the same mean luminance as the gratings (15 s). After each trial, the mouse was shown first the grey screen (15 s) and then the black screen (15 s). Average activity during grey screen presentation was used as the baseline, and the black–grey transitions were used to compare responses to changes in luminance.

For in vivo whole-cell recordings, visual stimuli were presented on a Dell E176FP LCD screen. Full-screen gratings (12 directions) were presented at least 3 and up to 9 times, and flashing squares (covering 100 locations) were presented 0 to 3 times, depending on the length and stability of the recording.

#### Open-field foraging

Single units were recorded as mice foraged for droplets of 30% diluted condensed milk in a white Perspex arena (50 × 50 cm) designed to limit variability in visual input. Recording sessions consisted of four 5-min foraging trials, the first and last in light conditions and the second and third in complete darkness. During dark trials, all other sources of light in the experimental room, such as computer screens, were switched off or covered with red screens. For a subset of mice, open-field recordings were also performed in an arena covered with either vertical or horizontal 3-cm-wide black and white stripes. These recordings were performed in light conditions to impose orientation-specific self-generated visual flow.

#### Eye tracking

For eye tracking in head-restrained conditions we used a camera (DMK 21BU04.H, The Imaging Source) with a zoom lens (MVL7000, ThorLabs) focused on the right eye. The eye was illuminated with an infrared LED lamp (LIU850A, ThorLabs) and an infrared filter was used on the camera (FEL0750, ThorLabs; with adapters SM2A53, SM2A6 and SM1L03, ThorLabs). When fully zoomed and placed ~20 cm from the mouse, this setup provided ~73 pixels per mm. The video was acquired using DacqUSB and synchronized to the electrophysiological recording using a small flashing infrared LED linked to the bottom edge of the camera.

To measure eye movements in freely moving mice, we used a custom head-mounted eye and head tracking system, as previously described^{40,59}. In brief, we used a commercially available camera module (1937, Adafruit; infrared filter removed). A custom 3D-printed camera holder with a 21 G cannula (Coopers Needle Works) was used to hold the camera, IR LEDs (VSMB2943GX01, Vishay) and a 7.0 mm × 9.3 mm IR mirror (Calflex-X NIR-Blocking Filter, Optics Balzers). A connector (852-10-00810-001101, Preci-Dip) was used to attach the camera holder to the head plate of the mouse. Mice were head-fixed, and the mirror’s position was adjusted until the eye was centred in the eye camera image. Epoxy (Araldite Rapid, Araldite) was used to fix the mirror position. A single-board computer (Raspberry Pi 3 model B, Raspberry Pi Foundation) recorded eye camera data at 30 Hz, capturing 1,296 × 972 pixels per frame. Head roll, pitch and yaw were estimated using an inertial motion unit including an accelerometer, gyroscope and magnetometer, following previously described methods^{38} and open-source Arduino code (https://github.com/razor-AHRS) running on an Arduino Mega 2560 rev 3.

### Quantification and statistical analysis

#### Whole-cell in vitro and in vivo electrophysiology

For optogenetic stimulation in acute brain slices, light intensity was adjusted to elicit ~5 mV postsynaptic potentials following 5-ms-long pulses. To calculate the latency of light-evoked responses, a linear fit was made between the time points corresponding to 25–30% and 70–75% of the peak amplitude of the excitatory postsynaptic potential (EPSP), or of its first slope in the case of polysynaptic EPSPs. The latency was measured as the time elapsed between light onset and the intersection of this linear fit with the resting membrane potential level. To determine the latency of responses to visual stimuli recorded in whole-cell mode in vivo, we first calculated the first derivative of the membrane potential during baseline (grey screen) to estimate the s.d. of baseline presynaptic activity. For each repetition at the preferred direction or orientation, we determined the time of the first event with a slope greater than 3 × s.d. of baseline presynaptic activity and of at least 0.5 mV ms^{−1}. Recordings with an s.d. of baseline presynaptic activity >1.5 mV were discarded.
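Both latency definitions above can be sketched in a few lines of NumPy. In `epsp_latency`, the resting level is assumed to be the mean membrane potential before light onset, and the fit uses the first samples crossing 25% and 75% of peak (single values chosen from the 25–30% and 70–75% ranges); the polysynaptic "first slope" case is not handled. In `visual_latency`, the derivative is taken with `np.gradient`, and the s.d. is returned so that the exclusion criterion can be applied by the caller. Function names and defaults are illustrative assumptions.

```python
import numpy as np

def epsp_latency(t_ms, vm_mv, light_onset_ms, lo_frac=0.25, hi_frac=0.75):
    """Latency (ms) from light onset to where a line fitted to the EPSP rise meets rest."""
    t = np.asarray(t_ms, float)
    vm = np.asarray(vm_mv, float)
    rest = vm[t < light_onset_ms].mean()            # assumed resting level: pre-onset mean
    after = t >= light_onset_ms
    ta, va = t[after], vm[after] - rest             # EPSP relative to rest
    peak = va.max()
    i_lo = int(np.argmax(va >= lo_frac * peak))     # first sample above lo_frac of peak
    i_hi = int(np.argmax(va >= hi_frac * peak))     # first sample above hi_frac of peak
    slope, intercept = np.polyfit(ta[i_lo:i_hi + 1], va[i_lo:i_hi + 1], 1)
    return -intercept / slope - light_onset_ms      # fit crosses rest, relative to onset

def visual_latency(t_ms, vm_mv, baseline_mask, stim_onset_ms, k=3.0, min_slope=0.5):
    """Latency (ms) of the first dVm/dt > k * s.d.(baseline) and >= min_slope (mV/ms)."""
    t = np.asarray(t_ms, float)
    dvdt = np.gradient(np.asarray(vm_mv, float), t)  # first derivative in mV/ms
    sd = dvdt[baseline_mask].std()                   # baseline s.d. (grey screen)
    hits = np.flatnonzero((t >= stim_onset_ms) & (dvdt > k * sd) & (dvdt >= min_slope))
    latency = t[hits[0]] - stim_onset_ms if hits.size else np.nan
    return latency, sd
```

For example, an EPSP rising linearly from rest 3 ms after light onset yields `epsp_latency(...)` ≈ 3 ms, because the fitted line extrapolates back to the start of the rise.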

#### Spike sorting for tetrode recordings

The electrophysiological data were spike sorted using Tint cluster cutting software (Axona). Cluster cutting was carried out by hand as clusters were generally well separated. Clusters were included in analysis if they were stable across all trials throughout the day and did not belong to clusters identified in previous recording days.

#### 3D head-rotation recording

To determine the precise head rotation of the mouse during tetrode recordings we employed a sensor (50 Hz sampling frequency) equipped with accelerometers, gyroscopes, and magnetometers as previously described^{38}. In brief, the sensor outputs were fed to a direction cosine matrix algorithm to provide measurements of head orientation expressed in Euler angles with respect to the Earth reference frame (yaw, pitch and roll). The rotation matrix is:

$$R_{xyz}=\begin{pmatrix}\cos\theta\cos\psi & \cos\theta\sin\psi & -\sin\theta\\ \sin\phi\sin\theta\cos\psi-\cos\phi\sin\psi & \sin\phi\sin\theta\sin\psi+\cos\phi\cos\psi & \sin\phi\cos\theta\\ \cos\phi\sin\theta\cos\psi+\sin\phi\sin\psi & \cos\phi\sin\theta\sin\psi-\sin\phi\cos\psi & \cos\phi\cos\theta\end{pmatrix}\tag{1}$$

From which one can extract the Euler angles as:

$$\begin{aligned}\phi&=\operatorname{atan2}(R_{23},R_{33})\\ \theta&=-\arcsin(R_{13})\\ \psi&=\operatorname{atan2}(R_{12},R_{11})\end{aligned}\tag{2}$$

The primary source of the Euler angles is the gyroscope, which measures angular velocity:

$$\omega=\begin{pmatrix}\omega_x\\ \omega_y\\ \omega_z\end{pmatrix}=\begin{pmatrix}\partial\phi/\partial t\\ \partial\theta/\partial t\\ \partial\psi/\partial t\end{pmatrix}\tag{3}$$

The main equation used to update the rotation matrix over time from the gyroscope signals is:

$${R}^{T}\left(t+{\rm{d}}t\right)={R}^{T}(t)\left(\begin{array}{ccc}1 & -{\omega }_{z}{\rm{d}}t & {\omega }_{y}{\rm{d}}t\\ {\omega }_{z}{\rm{d}}t & 1 & -{\omega }_{x}{\rm{d}}t\\ {-\omega }_{y}{\rm{d}}t & {\omega }_{x}{\rm{d}}t & 1\end{array}\right)$$

All drift corrections and calibrations were performed as previously described^{38}.
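Equations (1) and (2) and the gyroscope update can be sketched in NumPy as follows. The matrix construction and angle extraction mirror the equations term by term (with 1-based *R*_{ij} mapped to `R[i-1, j-1]`). After each first-order update, the matrix drifts away from orthonormality; here it is projected back to the nearest orthogonal matrix by SVD, a generic technique swapped in for illustration and not necessarily the drift correction of ref. 38.

```python
import numpy as np

def dcm_from_euler(phi, theta, psi):
    """Rotation matrix R_xyz of Eq. (1) from roll phi, pitch theta, yaw psi (radians)."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([
        [ct * cp, ct * sp, -st],
        [sf * st * cp - cf * sp, sf * st * sp + cf * cp, sf * ct],
        [cf * st * cp + sf * sp, cf * st * sp - sf * cp, cf * ct],
    ])

def euler_from_dcm(R):
    """Eq. (2): roll, pitch and yaw (radians) extracted from R."""
    phi = np.arctan2(R[1, 2], R[2, 2])
    theta = -np.arcsin(R[0, 2])
    psi = np.arctan2(R[0, 1], R[0, 0])
    return phi, theta, psi

def dcm_step(R, omega, dt):
    """One gyroscope update R^T(t+dt) = R^T(t) (I + [omega]_x dt), then re-orthonormalize."""
    wx, wy, wz = omega
    update = np.array([[1.0, -wz * dt, wy * dt],
                       [wz * dt, 1.0, -wx * dt],
                       [-wy * dt, wx * dt, 1.0]])
    Rt = R.T @ update
    u, _, vt = np.linalg.svd(Rt)        # nearest orthogonal matrix is u @ vt
    return (u @ vt).T
```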

#### Motion tuning

The motion tuning of SC neurons was determined by computing spike-triggered averages (STAs) of head displacements. To compute the STA of motion, the angular head velocity in the 25 temporal bins (0.5 s) preceding and the 50 bins (1 s) following spike onset was averaged across all spikes for each of the three Eulerian components. The head direction at the onset of each spike was set to zero for each Eulerian component. For each spike, the angular head velocities were cumulatively summed across temporal bins to produce a head displacement over the 0.5 s preceding and 1 s following spike onset. The mean and s.e.m. of spike-related head displacements were calculated for each temporal bin to illustrate the tuning of neurons. Displacement vectors for each Eulerian component were calculated as the difference between the minimum and maximum of the average displacement, and the direction of the displacement vector was defined by the temporal order of the minimum and maximum values. A neuron was considered tuned in the light (or in the dark) if, for at least one of the Eulerian components, the average displacement vector was >5°, had the same direction in both light (or both dark) trials and ranked above the 95th percentile of a shuffled distribution. Where angular differences are reported, they refer to the magnitude in degrees of the average displacement angle to which a neuron was tuned in one condition (for example, light) minus the displacement angle in another (for example, horizontal stripes).
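The STA of head displacement for one Eulerian component can be sketched as below, assuming spikes have already been binned to the 50-Hz sensor stream (so `dt` = 0.02 s, 25 bins before and 50 after each spike). Spikes too close to the recording edges are dropped, an assumption the text does not address. The displacement vector is max − min, signed by whether the maximum follows the minimum. Function names are hypothetical.

```python
import numpy as np

def sta_displacement(spike_idx, ang_vel, dt=0.02, pre=25, post=50):
    """Mean and s.e.m. spike-triggered head displacement (deg) for one component.

    spike_idx: sample indices of spike onsets in the sensor stream.
    ang_vel: angular head velocity (deg/s) for one Eulerian component.
    """
    ang_vel = np.asarray(ang_vel, float)
    spike_idx = np.asarray(spike_idx)
    keep = (spike_idx >= pre) & (spike_idx + post <= len(ang_vel))  # drop edge spikes
    windows = np.stack([ang_vel[i - pre:i + post] for i in spike_idx[keep]])
    disp = np.cumsum(windows * dt, axis=1)      # cumulative sum -> displacement
    disp -= disp[:, [pre]]                      # head direction at spike onset set to 0
    mean = disp.mean(axis=0)
    sem = disp.std(axis=0, ddof=1) / np.sqrt(len(disp))
    return mean, sem

def displacement_vector(mean_disp):
    """Signed magnitude: max - min, positive if the maximum follows the minimum."""
    i_min, i_max = int(np.argmin(mean_disp)), int(np.argmax(mean_disp))
    magnitude = mean_disp[i_max] - mean_disp[i_min]
    return magnitude if i_max > i_min else -magnitude
```

A neuron would then pass the amplitude criterion if `abs(displacement_vector(mean)) > 5` for some component, before the shuffle test.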

#### Visual tuning

Analysis routines for visual tuning were developed in Igor Pro (WaveMetrics). The neuronal response to drifting sinusoidal gratings was averaged for all trials and normalized to the baseline firing rate. The selectivity was then calculated both in direction and orientation space (360° and 180°, respectively) by computing the mean orientation and direction vectors in polar coordinates, described by their modulus (corresponding to the selectivity index) and average angle.

$$\begin{aligned}\mathrm{SI}&=\left|\frac{\sum_k R(\theta_k)\,e^{2i\theta_k}}{\sum_k R(\theta_k)}\right|\\ \bar{\theta}&=\begin{cases}\operatorname{atan}\left(\dfrac{\sum_k R(\theta_k)\sin\theta_k}{\sum_k R(\theta_k)\cos\theta_k}\right) & \text{for } \sum_k R(\theta_k)\cos\theta_k>0,\ \sum_k R(\theta_k)\sin\theta_k\geq 0\\ \operatorname{atan}\left(\dfrac{\sum_k R(\theta_k)\sin\theta_k}{\sum_k R(\theta_k)\cos\theta_k}\right)+\pi & \text{for } \sum_k R(\theta_k)\cos\theta_k<0\\ \operatorname{atan}\left(\dfrac{\sum_k R(\theta_k)\sin\theta_k}{\sum_k R(\theta_k)\cos\theta_k}\right)+2\pi & \text{for } \sum_k R(\theta_k)\cos\theta_k>0,\ \sum_k R(\theta_k)\sin\theta_k<0\end{cases}\end{aligned}$$

Where *R*(*θ*_{k}) is the response at each sampled direction or orientation *θ*_{k} (12 for direction space and 6 for orientation space). The response to static gratings was averaged before and after drifting and probed only in orientation space. Because the modulus of the vectors depends on the firing pattern and firing rate of the neuron, the same calculation was performed on shuffled spike times to obtain probability distributions of shuffled direction and orientation selectivity indices (DSI and OSI), against which both direction selectivity and orientation selectivity were tested (see ‘Generation of shuffled datasets’). To determine whether a neuron was tuned to moving or static gratings, three parameters were used: the selectivity index (≥0.1), the trial-to-trial angular variance (≤0.8; used as a measure of the reliability of selectivity) and the significance of the SI compared to the shuffled distribution. If two out of the three criteria were fulfilled, the neuron was considered tuned. If the criteria were met in both direction and orientation spaces, a neuron was classified as DS if DSI > OSI, and as OS if OSI > DSI.
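The vector-sum selectivity and the two-out-of-three tuning rule can be sketched as follows. The SI follows the equation above, with the factor multiplying *θ*_{k} in the exponent exposed as `harmonic` (2 as written; 1 is a common choice in direction space). `np.arctan2` followed by a modulo reproduces the piecewise +π / +2π quadrant corrections of the mean angle. Treating "significance" as exceeding the 95th percentile of the shuffled SIs is an assumption, and all names are hypothetical.

```python
import numpy as np

def vector_selectivity(responses, angles_deg, harmonic=2):
    """Selectivity index and mean angle (radians, in [0, 2*pi)) from a vector sum."""
    r = np.asarray(responses, float)
    th = np.deg2rad(np.asarray(angles_deg, float))
    si = np.abs(np.sum(r * np.exp(1j * harmonic * th))) / np.sum(r)
    # arctan2 + modulo is equivalent to atan(S/C) with the +pi / +2*pi corrections
    mean_angle = np.mod(np.arctan2(np.sum(r * np.sin(th)), np.sum(r * np.cos(th))), 2 * np.pi)
    return si, mean_angle

def is_tuned(si, angular_variance, shuffled_si, si_min=0.1, var_max=0.8, pct=95):
    """Tuned if at least two of the three criteria hold (thresholds from the text)."""
    criteria = [
        si >= si_min,
        angular_variance <= var_max,
        si > np.percentile(shuffled_si, pct),   # assumed significance test
    ]
    return sum(criteria) >= 2
```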

Direction and orientation selectivity for small moving spots and for Gabor patches was measured as described for gratings. Two separate analyses of selectivity and preferred angle of tuning were performed for Gabor patches: a location-independent analysis by averaging the responses to each direction of movement across all locations and another analysis only considering the responses at the location of maximal average activity.

To determine the spatial tuning of neurons to static stimuli, we averaged the response (firing rate) to flashing 9 cm^{2} squares at each location on the screen (15 by 11 locations) for each colour separately (black or white), divided it by the response during baseline conditions and obtained a 2D matrix corresponding to the increase or decrease in firing rate over baseline at each location. This matrix was further transformed into *z* scores. The maximal *z* score was compared to those obtained by performing the same analysis on 1,000 shuffled trials (see ‘Generation of shuffled datasets’). If the maximal *z* score was >2 and ranked within the top 5% of the shuffled distribution, a 2D Gaussian was fitted to the matrix, and the centre of the Gaussian fit was taken as the centre of the ssRF. The same analysis was performed to assess whether neurons had a spatial and kinetic receptive field as measured with Gabor patches; in this case, the centre of the location of maximal response was used as the centre of the receptive field. The overlap of the tunings of individual neurons to different visual stimuli was computed using Intervene^{60}.
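The receptive-field mapping step can be sketched with SciPy's `curve_fit`. The sketch z-scores the baseline-normalized response map, applies the two inclusion criteria and fits an axis-aligned 2D Gaussian plus offset (the text does not say whether a rotated Gaussian was used). The initial guess at the peak location, the 95th-percentile rule and the function names are assumptions; the centre is returned in grid units (column, row).

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(xy, amp, x0, y0, sx, sy, offset):
    """Axis-aligned 2D Gaussian plus offset, flattened for curve_fit."""
    x, y = xy
    g = amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2) + (y - y0) ** 2 / (2 * sy ** 2)))
    return (g + offset).ravel()

def receptive_field_centre(resp_map, baseline_rate, shuffled_max_z, z_min=2.0, pct=95):
    """Centre (x, y) of a fitted 2D Gaussian, or None if the map is not significant.

    resp_map: (n_rows, n_cols) mean firing rate per flash location for one colour.
    """
    ratio = np.asarray(resp_map, float) / baseline_rate   # change relative to baseline
    z = (ratio - ratio.mean()) / ratio.std()              # z-scored map
    if z.max() <= z_min or z.max() <= np.percentile(shuffled_max_z, pct):
        return None
    rows, cols = np.indices(z.shape)
    iy, ix = np.unravel_index(np.argmax(z), z.shape)      # start the fit at the peak
    p0 = [z.max(), ix, iy, 1.0, 1.0, 0.0]
    popt, _ = curve_fit(gaussian_2d, (cols, rows), z.ravel(), p0=p0)
    return popt[1], popt[2]
```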

To investigate the modulation of grating tuning by locomotion, trials for each angle presentation (4 s total) were divided according to whether the mouse was running or not. To identify running bouts, we low-pass filtered the angular velocity of the wheel and used a threshold of 20 deg s^{−1}: we averaged the angular velocity of the wheel over each visual stimulation trial and considered as ‘run’ trials those in which the average running speed exceeded 20 deg s^{−1}.
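A minimal sketch of the run/stationary split, using SciPy's Butterworth filter applied forwards and backwards (`filtfilt`) so that the smoothing adds no delay. The filter order, the 2 Hz cutoff and the use of absolute speed are assumptions (the text only states that the wheel velocity was low-pass filtered and thresholded at 20 deg s^{−1}); `trial_starts` are sample indices, and the function name is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def run_trials(wheel_vel, fs, trial_starts, trial_len_s=4.0, cutoff_hz=2.0, thresh=20.0):
    """Flag trials whose mean low-passed wheel speed exceeds thresh (deg/s)."""
    # assumed: 2nd-order Butterworth low-pass, zero-phase via filtfilt
    b, a = butter(2, cutoff_hz, btype="low", fs=fs)
    smooth = filtfilt(b, a, np.asarray(wheel_vel, float))
    n = int(round(trial_len_s * fs))                  # samples per 4-s presentation
    return np.array([np.abs(smooth[s:s + n]).mean() > thresh for s in trial_starts])
```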

#### Eye tracking in head-restrained mice

We used DeepLabCut^{61} to extract the pupil position from the eye videos. Four cardinal pupil points located at the top, bottom, left and right extremities of the pupil were tracked (Extended Data Fig. 4h). The pupil position was then computed as the centre of mass of these four points. The pupil position varied along two axes: the horizontal nasal–temporal axis and the vertical upward–downward axis.

The tuning of SC neurons to eye displacement was determined by carrying out STAs, following the same method as for head displacements. The angular amplitude of eye movements relative to the resting position was estimated as *α* = atan(*d*/*r*), where *d* is the distance travelled by the pupil centre and *r* is the radius of the eye, approximated as a sphere. A neuron was considered tuned to eye movement if the average displacement vector for at least one of the movement components was >1° and ranked above the 95th percentile of a shuffled distribution.
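The pupil-centre and eye-angle computations for head-restrained mice reduce to a mean of the four tracked points and *α* = atan(*d*/*r*) per axis. In this sketch the default scale of 73 pixels per mm comes from the camera setup described above, whereas the eye radius and the resting pupil position must be supplied by the caller (neither value is given in the text). Function names are hypothetical.

```python
import numpy as np

def pupil_centre(top, bottom, left, right):
    """Centre of mass of the four tracked pupil points; each is (n_frames, 2) pixels."""
    return np.mean(np.stack([top, bottom, left, right]), axis=0)

def eye_angle_deg(centre_px, rest_px, eye_radius_mm, px_per_mm=73.0):
    """alpha = atan(d / r) per axis, with d the pupil displacement from rest in mm."""
    d_mm = (np.asarray(centre_px, float) - np.asarray(rest_px, float)) / px_per_mm
    return np.degrees(np.arctan(d_mm / eye_radius_mm))
```

Applying the STA routine used for head displacements to these per-axis angle traces (converted to angular velocity) gives the eye-movement tuning described above.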

#### Extraction of pupil position in freely moving mice

Eye tracking was performed as previously described^{40}. In brief, we tracked the position of the pupil, defined as its centre, together with the nasal and temporal eye corners. The eye corners were used to automatically align the horizontal eye axis. Thirty to fifty randomly selected frames were labelled manually for each recording day. The labelled data were used to train a deep convolutional network via transfer learning using open-source code^{61} (https://github.com/AlexEMG/DeepLabCut). The origin of the eye coordinate system was defined as the midpoint between the nasal and temporal eye corners. Pixel values in the 2D video plane were converted to angular eye positions using a model-based approach developed for the C57BL/6J mouse line used in this study^{62}. Saccades were defined as rapid eye movements occurring in both eyes with an angular speed exceeding 350 deg s^{−1}.
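A sketch of the binocular saccade criterion, assuming eye positions have already been converted to degrees (horizontal, vertical) per frame and that speed is estimated by finite differences with `numpy.gradient`; the function name and the differentiation scheme are assumptions.

```python
import numpy as np

def detect_saccades(left_deg, right_deg, fs, thresh=350.0):
    """Boolean mask of frames in which BOTH eyes exceed thresh (deg/s).

    left_deg, right_deg: arrays (n_frames, 2) of angular eye position in degrees.
    fs: video frame rate (Hz).
    """
    v_left = np.linalg.norm(np.gradient(left_deg, 1.0 / fs, axis=0), axis=1)    # angular speed, left eye
    v_right = np.linalg.norm(np.gradient(right_deg, 1.0 / fs, axis=0), axis=1)  # angular speed, right eye
    return (v_left > thresh) & (v_right > thresh)
```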

#### Generation of shuffled datasets

For each cell, the spike-onset times were temporally shifted by 2–180 s in a wrap-around manner. This disrupts the relationship between the spike times and the recorded heading directions of the mice or the visual stimuli, while preserving the temporal relationship between spiking events. The shifted data were then analysed to determine the mean displacement vector or selectivity index. This process was repeated 1,000 times to produce a null distribution.
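One shuffle iteration can be sketched as a circular shift of the spike train within the recording. The uniform draw of the shift and the function name are assumptions; in practice this would be called 1,000 times with the downstream analysis rerun each time.

```python
import numpy as np

def shuffle_spike_times(spike_times, t_start, t_end, rng, min_shift=2.0, max_shift=180.0):
    """Circularly shift spike times (s) by a random offset, wrapping around the recording."""
    duration = t_end - t_start
    shift = rng.uniform(min_shift, max_shift)                 # assumed uniform draw in [2, 180) s
    shifted = (spike_times - t_start + shift) % duration + t_start
    return np.sort(shifted)                                   # inter-spike intervals preserved (modulo wrap)
```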

#### Visuo-motor alignment

This analysis was performed on neurons that had significant tuning to both moving gratings and head rotations. To determine whether any alignment existed between these two types of tuning, we first modelled the 3D head-rotation tuning of a neuron as the 2D projection of a vector coming out of the eye of the mouse (corresponding to gaze) onto the 2D plane of the screen on which the visual stimuli were shown. We set the head position at time 0 equal to the position of the mouse head when head-restrained for visual stimulation (see ‘3D head rotations to 2D screen plane transformation model’). We then computed the weighted average of the motion vectors that make up the resulting 2D trajectory, weighting the motion vector at each time point by the instantaneous velocity of the gaze movement. This yielded a vector with an angle \({\bar{\theta }}_{{\rm{gaze}}}\) in the same plane as the gratings presentation, which could be directly compared with the angle of maximum selectivity \({\bar{\theta }}_{{\rm{gratings}}}\). We compared the directions of these two vectors by subtraction: \({\bar{\theta }}_{{\rm{gaze}}}-{\bar{\theta }}_{{\rm{gratings}}}\).
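The velocity-weighted gaze direction and its comparison with the grating preference can be sketched as follows, assuming the projected 2D gaze trajectory is sampled at a fixed interval `dt`. Wrapping the difference into (−π, π] is a choice made here for readability; the function names are hypothetical.

```python
import numpy as np

def weighted_gaze_angle(traj_xy, dt):
    """Angle of the velocity-weighted mean motion vector of a 2D gaze trajectory."""
    motion = np.diff(traj_xy, axis=0)               # motion vectors between consecutive samples
    speed = np.linalg.norm(motion, axis=1) / dt     # instantaneous gaze speed (weights)
    mean_vec = (speed[:, None] * motion).sum(axis=0) / speed.sum()
    return np.arctan2(mean_vec[1], mean_vec[0])

def angle_difference(theta_gaze, theta_gratings):
    """theta_gaze - theta_gratings, wrapped into (-pi, pi]."""
    return np.angle(np.exp(1j * (theta_gaze - theta_gratings)))
```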

### Modelling

#### 3D head rotations to 2D screen plane transformation model

To project a gaze vector from the eye of the mouse onto the plane of the screen on which the visual stimuli are shown, we first computed the equation of the plane in the laboratory’s reference frame. **l** is a line from point 1 to point 2 in the screen plane and, since it also lies within the *xy* plane (for the particular choice of points), we have (see Supplementary Fig. 1):

$$\begin{array}{c}{\boldsymbol{l}}\perp \widehat{z}\\ {\boldsymbol{l}}\,{\boldsymbol{\cdot }}\,\widehat{z}=\left(-{x}_{0},{y}_{0},0\right)\,{\boldsymbol{\cdot }}\,\left(0,0,1\right)=0\end{array}$$

Thus, the normal vector to the plane is given by:

$${{\boldsymbol{n}}}^{{\prime} }=\widehat{z}\times {\boldsymbol{l}}=\left|\begin{array}{ccc}\widehat{i} & \widehat{j} & \widehat{k}\\ 0 & 0 & 1\\ -{x}_{0} & {y}_{0} & 0\end{array}\right|=-{y}_{0}\widehat{i}-{x}_{0}\widehat{j}+0\widehat{k}=(-{y}_{0},-{x}_{0},0)$$

(4)

For simplicity we pick the plane normal vector as:

$${\boldsymbol{n}}=-{{\boldsymbol{n}}}^{{\prime} }=({y}_{0},{x}_{0},0)$$

(5)

Using the coordinates of point 1, \({P}_{1}=({x}_{0},0,0)\), we can find the equation of the plane:

$$\begin{array}{c}{\bf{n}}\,{\boldsymbol{\cdot }}\,(x-{P}_{1}^{x},y-{P}_{1}^{y},z-{P}_{1}^{z})=0\\ (\,{y}_{0},{x}_{0},0)\,{\boldsymbol{\cdot }}\,(x-{x}_{0},y,z)=0\\ {y}_{0}\left(x-{x}_{0}\right)+{x}_{0}\,y=0\end{array}$$

(6)
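The plane construction in Equations (4)–(6) can be checked numerically. This sketch takes point 1 as \(({x}_{0},0,0)\) and, since \({\boldsymbol{l}}=(-{x}_{0},{y}_{0},0)\), point 2 as \((0,{y}_{0},0)\); the function name `screen_plane` is hypothetical. The plane is returned in the form \({\boldsymbol{n}}\cdot {\bf{r}}=d\), which is Equation (6) rearranged.

```python
import numpy as np

def screen_plane(x0, y0):
    """Normal n and offset d of the screen plane n . r = d (Eqs. 4-6)."""
    p1 = np.array([x0, 0.0, 0.0])       # point 1
    p2 = np.array([0.0, y0, 0.0])       # point 2 = p1 + l
    l = p2 - p1                         # (-x0, y0, 0), lies in the xy plane
    z_hat = np.array([0.0, 0.0, 1.0])
    n = -np.cross(z_hat, l)             # Eq. (5): n = -(z_hat x l) = (y0, x0, 0)
    d = n @ p1                          # y0 * (x - x0) + x0 * y = 0  <=>  n . r = x0 * y0
    return n, d
```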

Next, we determined the equation of a vector \({{\bf{r}}}_{e}^{H}\) coming out of the mouse eye in the mouse head’s reference frame:

$${{\bf{r}}}_{e}^{H}=\left(\begin{array}{c}{x}^{{\prime} }\\ {y}^{{\prime} }\\ {z}^{{\prime} }\end{array}\right)=\left(\begin{array}{c}{P}_{0}^{{x}^{{\prime} }}\\ {P}_{0}^{{y}^{{\prime} }}\\ {P}_{0}^{{z}^{{\prime} }}\end{array}\right)+{m}^{{\prime} }\left(\begin{array}{c}a\\ b\\ c\end{array}\right)$$

(7)

Where *a*, *b* and *c* are the elements of a vector that determines the direction of the gaze and \({P}_{0}^{{\prime} }\) is the initial location of the pupil in the head frame (see Supplementary Fig. 2).

Next, we need to find the coordinates of \({r}_{e}^{H}\) in the inertial reference frame once the head rotation has occurred:

$${r}_{e}^{I}=R{r}_{e}^{H}=\left(\begin{array}{c}{P}_{0}^{x}\\ {P}_{0}^{y}\\ {P}_{0}^{z}\end{array}\right)+m\left(\begin{array}{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}{c}x\\ y\\ z\end{array}\right)$$

(8)

Here, *x*, *y* and *z* are the components of the eye vector in the inertial frame:

$$\begin{array}{c}x={P}_{0\,}^{x}+ma\\ y={P}_{0\,}^{y}+mb\\ z={P}_{0\,}^{z}+mc\end{array}$$

where \(m\left(t\right)=R\left(t\right){m}^{{\prime} }\) and \({P}_{0}\left(t\right)=R\left(t\right){P}_{0\,}^{{\prime} }\).

Now, we need to find the intersection of \({r}_{e}^{I}\) with the plane defined in Equation (6) at times *t* and *t* + d*t*:

$${r}_{e}^{I}(t)={P}_{0}\left(t\right)+m(t)\left(\begin{array}{c}a\\ b\\ c\end{array}\right)$$

(9)

$${r}_{e}^{I}(t+{\rm{d}}t)={P}_{0}\left(t+{\rm{d}}t\right)+m(t+{\rm{d}}t)\left(\begin{array}{c}a\\ b\\ c\end{array}\right)$$

(10)

$$\begin{array}{l}{r}_{e}^{I}=\left(\begin{array}{ccc}\cos \theta \cos \psi & \cos \theta \sin \psi & -\sin \theta \\ \sin \phi \sin \theta \cos \psi -\cos \phi \sin \psi & \sin \phi \sin \theta \sin \psi +\cos \phi \cos \psi & \sin \phi \cos \theta \\ \cos \phi \sin \theta \cos \psi +\sin \phi \sin \psi & \cos \phi \sin \theta \sin \psi -\sin \phi \cos \psi & \cos \phi \cos \theta \end{array}\right)\\ \,\,\left[\left(\begin{array}{c}{P}_{0}^{{x}^{{\prime} }}\\ {P}_{0}^{{y}^{{\prime} }}\\ {P}_{0}^{{z}^{{\prime} }}\end{array}\right)+{m}^{{\prime} }\left(\begin{array}{c}a\\ b\\ c\end{array}\right)\right]\end{array}$$

(11)

$${r}_{e,x}^{I}={P}_{0}^{{x}^{{\prime} }}\left(\cos \theta \cos \psi \right)+{P}_{0}^{{y}^{{\prime} }}(\cos \theta \sin \psi )-{P}_{0}^{{z}^{{\prime} }}(\sin \theta )+{m}^{{\prime} }[a\cos \theta \cos \psi +b\cos \theta \sin \psi -c\sin \theta ]$$

(12.1)

$$\begin{array}{l}{r}_{e,y}^{I}\,=\,{P}_{0}^{{x}^{{\prime} }}(\sin \phi \sin \theta \cos \psi -\cos \phi \sin \psi )\\ \,+\,{P}_{0}^{{y}^{{\prime} }}(\sin \phi \sin \theta \sin \psi +\cos \phi \cos \psi )+{P}_{0}^{{z}^{{\prime} }}(\sin \phi \cos \theta )\\ \,+\,{m}^{{\prime} }[a(\sin \phi \sin \theta \cos \psi -\cos \phi \sin \psi )\\ \,+\,b(\sin \phi \sin \theta \sin \psi +\cos \phi \cos \psi )+c\sin \phi \cos \theta ]\end{array}$$

(12.2)

$$\begin{array}{c}{r}_{e,z}^{I}={P}_{0}^{{x}^{{\prime} }}(\cos \phi \sin \theta \cos \psi +\sin \phi \sin \psi )\\ \,\,+\,{P}_{0}^{{y}^{{\prime} }}(\cos \phi \sin \theta \sin \psi -\sin \phi \cos \psi )\\ \,\,+\,{P}_{0}^{{z}^{{\prime} }}(\cos \phi \cos \theta )+{m}^{{\prime} }[a(\cos \phi \sin \theta \cos \psi +\sin \phi \sin \psi )\\ \,\,+\,b(\cos \phi \sin \theta \sin \psi -\sin \phi \cos \psi )+c\cos \phi \cos \theta ]\end{array}$$

(12.3)
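The rotation matrix of Equation (11) and the component expansions in Equations (12.1)–(12.3) can be reproduced with a few lines of NumPy. This sketch applies the rotation to both the initial pupil position and the gaze term, as in the expanded components; angles are assumed to be in radians and the function names are hypothetical.

```python
import numpy as np

def head_rotation(phi, theta, psi):
    """Rotation matrix R(phi, theta, psi) with the entries written in Eq. (11); radians."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([
        [ct * cp, ct * sp, -st],
        [sf * st * cp - cf * sp, sf * st * sp + cf * cp, sf * ct],
        [cf * st * cp + sf * sp, cf * st * sp - sf * cp, cf * ct],
    ])

def gaze_point_inertial(R, p0_head, d, m_prime):
    """Eqs. (11)-(12): r_e^I = R (P0' + m' d), with d = (a, b, c) the gaze direction."""
    return R @ (p0_head + m_prime * d)
```

A quick consistency check is that `head_rotation` returns an orthonormal matrix with determinant 1, and that the *x* component of `gaze_point_inertial` matches Equation (12.1) term by term.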

To find the point of intersection, we substitute the components of \({r}_{e}^{I}\) into the equation of the plane and solve for the coefficient *m*. Starting with Equation (6):

$$\begin{array}{c}{y}_{0}\left(x-{x}_{0}\right)+{x}_{0}\,y=0\\ {y}_{0}\left({r}_{e,x}^{I}-{x}_{0}\right)+{x}_{0}{r}_{e,y}^{I}=0\end{array}$$

(13)
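Because Equation (13) is linear in the line coefficient, it can be solved in closed form. Writing the plane as \({\boldsymbol{n}}\cdot {\bf{r}}={x}_{0}{y}_{0}\) and the rotated gaze line as \({\bf{r}}=R{P}_{0}^{{\prime} }+mR\,(a,b,c)^{\rm{T}}\) gives \(m=({x}_{0}{y}_{0}-{\boldsymbol{n}}\cdot R{P}_{0}^{{\prime} })/({\boldsymbol{n}}\cdot R\,(a,b,c)^{\rm{T}})\). The sketch below implements this; rejecting a gaze parallel to the screen or an intersection behind the eye (*m* ≤ 0) is an assumption added here, and the function name is hypothetical.

```python
import numpy as np

def screen_intersection(R, p0_head, d, x0, y0):
    """Intersection of the rotated gaze line with the screen plane y0 (x - x0) + x0 y = 0."""
    n = np.array([y0, x0, 0.0])          # plane normal, Eq. (5)
    p0 = R @ p0_head                     # pupil position in the inertial frame
    g = R @ d                            # gaze direction in the inertial frame
    denom = n @ g
    if np.isclose(denom, 0.0):
        return None                      # gaze parallel to the screen: no intersection
    m = (x0 * y0 - n @ p0) / denom       # solves Eq. (13) for the line coefficient
    if m <= 0:
        return None                      # assumed: screen lies behind the eye
    return p0 + m * g
```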

To better visualize the track that the intersections of \({r}_{e}^{I}\) with the screen produce over time, we treat each intersection point as the endpoint of a vector with its base at the origin of the inertial frame. We can then rotate these vectors around the *z* axis of the inertial frame by a chosen angle to obtain a better view. This is equivalent to rotating the screen by that angle, since the relative geometry of the points on the screen is unchanged by the rotation. To perform this rotation, we use Rodrigues’ rotation formula^{63}:

$${{\bf{v}}}_{rot}={\bf{v}}\cos \beta +(\widehat{z}\times {\bf{v}})\sin \beta +\widehat{z}(\widehat{z}\cdot {\bf{v}})(1-\cos \beta )$$

(14)

Where **v** is the vector of intersection points and *β* is the rotation angle around the \(\widehat{z}\) axis. If we pick \(\beta ={\sin }^{-1}\left(\frac{{y}_{0}}{\sqrt{{x}_{0}^{2}+{y}_{0}^{2}}}\right)\), we effectively rotate the screen such that it becomes parallel to the \(\widehat{x}\) and \(\widehat{z}\) axes of the inertial frame and perpendicular to \(\widehat{y}\).
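Rodrigues’ formula about \(\widehat{z}\) (Equation (14)) can be sketched as follows; the function name is hypothetical. For positive \({x}_{0},{y}_{0}\), this choice of *β* rotates the plane normal \(({y}_{0},{x}_{0},0)\) onto the \(\widehat{y}\) direction, so all rotated screen points share the same *y* coordinate, which is exactly the "screen perpendicular to \(\widehat{y}\)" statement above.

```python
import numpy as np

def rotate_about_z(v, beta):
    """Rodrigues' rotation of one or more 3D vectors by beta (radians) about z_hat."""
    z = np.array([0.0, 0.0, 1.0])
    v = np.atleast_2d(v)                              # shape (n, 3)
    return (v * np.cos(beta)
            + np.cross(z, v) * np.sin(beta)           # (z_hat x v) sin(beta)
            + np.outer(v @ z, z) * (1 - np.cos(beta)))  # z_hat (z_hat . v)(1 - cos(beta))
```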

In the above treatment, the gaze vector \({{\bf{r}}}_{e}^{H}\) is fixed in the head frame. However, this is only an approximation, as the pupil moves within the head frame. To correct for this effect, we simultaneously recorded eye and head rotations in mice during foraging, as previously described^{40}, and incorporated the measured pupil rotations in the head frame into our model. This effectively makes the direction of the gaze dependent on pitch, roll and yaw. Mathematically, in Equation (7) we make the correction:

$$d=\left(\begin{array}{c}a\\ b\\ c\end{array}\right)\to d\left(\phi ,\theta ,\psi \right)={R}_{{\rm{correction}}}^{H}\left(\phi ,\theta ,\psi \right)\left(\begin{array}{c}a\\ b\\ c\end{array}\right)$$

(15)

The rest of the transformations follow as before. The correction rotation matrix in the head frame, \({R}_{{\rm{correction}}}^{H}\left(\phi ,\,\theta ,\psi \right)\), was computed from the measured head–eye rotations shown in Extended Data Fig. 9f,g.

#### Neural network model

The neural network model consisted of 3 layers, each with 500 neurons. The first layer of neurons had receptive fields, \({{\bf{z}}}_{i}=\left[{z}_{i,{\rm{NT}}}\,;{z}_{i,{\rm{VD}}}\right]\in {{\mathbb{R}}}^{2}\), spanning a (signed) range of \({z}_{{\rm{NT}}}\in [0,140]\) degrees along the naso-temporal (NT) axis and \({z}_{{\rm{VD}}}\in [0,70]\) degrees along the ventro-dorsal (VD) axis. The angle in the NT–VD plane corresponding to the centre of the receptive field of neuron *i* was denoted \({\theta }_{i}^{{\rm{RF}}}:\,={\tan }^{-1}({z}_{i,{\rm{VD}}}/{z}_{i,{\rm{NT}}})\). Each neuron in layer 1 connected to a corresponding DS neuron in layer 2, which was selective to movement in direction \({\theta }_{i}^{{\rm{DS}}}={\theta }_{i}^{{\rm{RF}}}+{\rm{\pi }}\). That is, if an RF neuron responded to stimuli in a particular part of the receptive field, the corresponding DS neuron responded to motion from this location towards the agent. Finally, layer 3 consisted of motor neurons that were ‘anti-aligned’ with the DS neurons, \({\theta }_{i}^{{\rm{M}}}={\theta }_{i}^{{\rm{DS}}}+{\rm{\pi }}\). Each motor neuron induced movement in the direction \({{\boldsymbol{m}}}_{i}=\left[\cos {\theta }_{i}^{{\rm{M}}};\sin {\theta }_{i}^{{\rm{M}}}\right]\).
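The layer geometry can be sketched as below. How the 500 receptive-field centres were placed is not stated, so uniform random sampling over the stated ranges is an assumption here, as is the use of `arctan2` in place of \({\tan }^{-1}\) for quadrant safety; `build_network` is a hypothetical name.

```python
import numpy as np

def build_network(n=500, nt_range=(0.0, 140.0), vd_range=(0.0, 70.0), seed=0):
    """RF centres and preferred directions for the three layers (angles in radians)."""
    rng = np.random.default_rng(seed)
    z = np.column_stack([rng.uniform(*nt_range, n),      # assumed uniform RF placement (deg)
                         rng.uniform(*vd_range, n)])
    theta_rf = np.arctan2(z[:, 1], z[:, 0])              # angle of RF centre in the NT-VD plane
    theta_ds = theta_rf + np.pi                          # DS neuron prefers motion towards the agent
    theta_m = theta_ds + np.pi                           # motor neuron anti-aligned with DS neuron
    m = np.column_stack([np.cos(theta_m), np.sin(theta_m)])  # movement direction m_i
    return z, theta_rf, theta_ds, theta_m, m
```

Since \({\theta }_{i}^{{\rm{M}}}={\theta }_{i}^{{\rm{RF}}}+2{\rm{\pi }}\), each motor neuron drives the agent towards the centre of its corresponding receptive field.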

All simulations were run using Euler integration with a discrete timestep of size \(\Delta t=0.5\) ms and a neural time constant of \({\tau }_{{\rm{neural}}}\,=\,10\) ms. The firing rates of all neurons evolved according to \({r}_{t+1}={\left[{r}_{t}+\frac{\Delta t}{{\tau }_{{\rm{neural}}}}\left(-{r}_{t}+x+{\epsilon }\right)\right]}_{+}\), where \({\left[\cdot \right]}_{+}\) indicates a rectified linear unit nonlinearity and \({\epsilon } \sim N(0,{\sigma }^{2})\) is Gaussian input noise with s.d. of \(\sigma =0.1\). *x* indicates the input to each neuron, which is described for each layer in the following.
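One Euler step of the rate dynamics, with the stated Δ*t* = 0.5 ms, *τ*_neural = 10 ms and *σ* = 0.1, can be sketched as follows (the function name is hypothetical). With a constant input and no noise, the rate relaxes exponentially towards the rectified input.

```python
import numpy as np

def euler_step(r, x, rng, dt=0.5e-3, tau=10e-3, sigma=0.1):
    """r_{t+1} = [r_t + dt/tau * (-r_t + x + eps)]_+ with eps ~ N(0, sigma^2)."""
    eps = rng.normal(0.0, sigma, size=r.shape)           # Gaussian input noise
    return np.maximum(r + (dt / tau) * (-r + x + eps), 0.0)  # ReLU rectification
```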

The receptive field neurons responded to a stimulus at location \({\bf{s}}=\left[{s}_{{\rm{NT}}}\,;{s}_{{\rm{VD}}}\right]\) in egocentric coordinates with Gaussian tuning curves of the form \({x}_{i}^{{\rm{RF}}}=1.5\,\exp (-0.6\,\kappa \,{| {\boldsymbol{s}}-{{\boldsymbol{z}}}_{i}| }_{2}^{2})\), where \(\kappa =40\).

In the static setting, the DS neurons received input from the RF neurons such that \({x}_{i}^{{\rm{DS}}}={r}_{i}^{{\rm{RF}}}\). In the kinetic setting, the input was given by \({x}_{i}^{{\rm{DS}}}=\gamma \,\exp \left(\kappa \,\left[\cos \left({\theta }^{{\rm{S}}}-{\theta }_{i}^{{\rm{DS}}}+{\rm{\pi }}\right)-1\right]\right)\), where \({\theta }^{{\rm{S}}}={\tan }^{-1}\left({s}_{{\rm{VD}}}/{s}_{{\rm{NT}}}\right)\) is the angle of the stimulus within the visual field in egocentric coordinates. Here, \(\gamma ={\left[-\frac{{{\boldsymbol{s}}}^{{\rm{T}}}\dot{{\boldsymbol{s}}}}{\left|{\boldsymbol{s}}\right|\left|\dot{{\boldsymbol{s}}}\right|}\right]}_{+}\) is a scale factor that adjusts the input strength according to the ‘concentricity’ of the stimulus, such that the responses of all DS neurons are stronger when the motion of the stimulus (\(\dot{{\bf{s}}}\)) is ‘concentric’ to the stimulus location (\({\bf{s}}\)) in the visual field.
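The RF and kinetic DS inputs can be sketched as below with *κ* = 40. The angular units used with *κ* in the Gaussian tuning are not stated, so `s` and `z` are simply assumed to share units; `arctan2` replaces \({\tan }^{-1}\) for quadrant safety, and the function names are hypothetical. The concentricity factor *γ* is undefined for a static stimulus (\(\dot{{\bf{s}}}=0\)), which corresponds to the static setting where \({x}_{i}^{{\rm{DS}}}={r}_{i}^{{\rm{RF}}}\) is used instead.

```python
import numpy as np

KAPPA = 40.0

def rf_input(s, z, kappa=KAPPA):
    """Gaussian RF input 1.5 exp(-0.6 kappa |s - z_i|^2); s: (2,), z: (n, 2), same units."""
    return 1.5 * np.exp(-0.6 * kappa * np.sum((s - z) ** 2, axis=1))

def concentricity(s, s_dot):
    """gamma = [-s^T s_dot / (|s| |s_dot|)]_+ ; 1 for motion straight towards the agent."""
    return max(0.0, -float(s @ s_dot) / (np.linalg.norm(s) * np.linalg.norm(s_dot)))

def ds_input_kinetic(s, s_dot, theta_ds, kappa=KAPPA):
    """Kinetic-setting DS input gamma * exp(kappa [cos(theta_S - theta_DS + pi) - 1])."""
    theta_s = np.arctan2(s[1], s[0])                 # stimulus angle (VD over NT)
    return concentricity(s, s_dot) * np.exp(kappa * (np.cos(theta_s - theta_ds + np.pi) - 1.0))
```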

Finally, motor neurons received input from the DS neurons, \({x}_{i}^{{\rm{M}}}={r}_{i}^{{\rm{DS}}}\), and the motion of the agent was computed as \(\Delta {{\bf{a}}}_{t}=\frac{\Delta t}{{\tau }_{m}}{\sum }_{i}{r}_{i}^{M}{{\bf{m}}}_{i}\). Simulations were terminated once (1) 2.5 s had elapsed, (2) the stimulus left the receptive field of the agent, or (3) the total agent motion exceeded 200° in the NT direction or 100° in the VD direction. \({\tau }_{m}\) took a default value of 250 ms and was adjusted to modulate the relative velocity of the agent compared to the stimulus.

For quantitative analyses, energy consumption was computed as proportional to the total cumulative movement speed, \(E\propto {\sum }_{t}{\left|\Delta {{\bf{a}}}_{t}\right|}_{2}\). An ‘intercept’ was considered successful if the agent moved within 24° of the stimulus (20% of the receptive field).
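The motor readout, the energy measure and the intercept criterion can be sketched as follows, using Δ*t* = 0.5 ms and the default *τ*_m = 250 ms. Measuring the intercept distance as a Euclidean distance in (NT, VD) degrees is an assumption, and the function names are hypothetical.

```python
import numpy as np

def agent_step(r_motor, m_dirs, dt=0.5e-3, tau_m=0.25):
    """Delta a_t = dt / tau_m * sum_i r_i^M m_i ; r_motor: (n,), m_dirs: (n, 2)."""
    return (dt / tau_m) * (r_motor @ m_dirs)

def energy(deltas):
    """Energy proxy: cumulative movement, sum_t |Delta a_t|_2 (up to a constant)."""
    return np.sum(np.linalg.norm(deltas, axis=1))

def intercepted(agent_pos, stim_pos, radius_deg=24.0):
    """Successful intercept if the agent is within 24 deg of the stimulus."""
    return bool(np.linalg.norm(np.asarray(stim_pos) - np.asarray(agent_pos)) <= radius_deg)
```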

### Statistical methods

Data were tested for normality. If the distribution was not normal, non-parametric tests such as Mann–Whitney and Kruskal–Wallis tests were used instead of *t*-tests and ANOVA. When the number of recordings per experimental condition was too low to assess the type of distribution, the data were assumed normal.

In all STA analyses, a *t*-test (threshold 0.05) was applied between the displacement vector and the displacement vectors of the shuffled data. Neurons were considered to be motion tuned if the *t*-test detected a significant difference between the displacement vectors of the real and shuffled data. Similarly, to determine whether an SI was significant, a *t*-test (threshold 0.05) was applied between the SI and the SIs of the shuffled distribution. To test the angular alignment of visuo-motor neurons, Watson–Williams tests were performed between the dataset and a random distribution of equal s.d. centred at 0°, 90°, 180° and 270°; an *F*-statistic lower than the critical value (3.85) was taken to indicate similarity between the distributions.

All results are presented as mean ± s.e.m. unless otherwise stated. Results were considered statistically significant at *P* ≤ 0.05; significance levels are indicated as \**P* ≤ 0.05, \*\**P* ≤ 0.01 and \*\*\**P* ≤ 0.001. Relevant *P* values and tests used are reported in the figure legends.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.