A subscription to JoVE is required to view this content. Sign in or start your free trial.
Method Article
The article shows how to use the program SpikeSorter to detect and sort spikes in extracellular recordings made with multi-electrode arrays.
Few stand-alone software applications are available for sorting spikes from recordings made with multi-electrode arrays. Ideally, an application should be user friendly with a graphical user interface, able to read data files in a variety of formats, and provide users with a flexible set of tools giving them the ability to detect and sort extracellular voltage waveforms from different units with some degree of reliability. Previously published spike sorting methods are now available in a software program, SpikeSorter, intended to provide electrophysiologists with a complete set of tools for sorting, starting from raw recorded data file and ending with the export of sorted spikes times. Procedures are automated to the extent this is currently possible. The article explains and illustrates the use of the program. A representative data file is opened, extracellular traces are filtered, events are detected and then clustered. A number of problems that commonly occur during sorting are illustrated, including the artefactual over-splitting of units due to the tendency of some units to fire spikes in pairs where the second spike is significantly smaller than the first, and over-splitting caused by slow variation in spike height over time encountered in some units. The accuracy of SpikeSorter's performance has been tested with surrogate ground truth data and found to be comparable to that of other algorithms in current development.
Anyone who records extracellular signals from the brain using methods more sophisticated than simple on-line thresholding and windowing faces the task of identifying and separating the signals from different neurons from the noisy voltage signals recorded by the electrode. This task is commonly known as spike sorting. The difficulty of spike sorting is compounded by various factors. Neurons can be very close together so that the signals recorded from them by a nearby electrode are likely to be similar and hard to distinguish. The signals produced by a single neuron may vary over time, perhaps because of movements of the electrode, variable sodium channel kinetics during periods of high firing rate, variable degrees of activation of voltage conductances in dendrites that are close to the electrode, or possibly as a result of changes in brain state. These problems can be mitigated by using multi-electrode arrays (MEAs) with many closely spaced (20 - 100 µm) recording channels which allows better spatial definition of the signals from single neurons since they are typically spread out over several channels1,2. However, this, combined with the fact that signals from neurons spread along the entire length of the electrode overlap in space, results in a potentially very high dimensional space within which clusters corresponding to unique neurons need to be identified. This problem becomes computationally intractable for more than a small number of electrode channels. To date, there is no generally agreed-upon best method for spike sorting, though many solutions have been proposed3,4,5,6,7,8 and recordings from MEAs are becoming increasingly common9,10. Because spike sorting is not an end in itself, but is simply a necessary preliminary step before further data analysis, there is a need for an easily usable package that will read in raw recording data files and convert them to sorted spike trains with as little user input, and as quickly and reliably, as possible.
This paper provides a tutorial for the use of SpikeSorter — a program developed with the aim of meeting these needs. The program is based on algorithms described in previously published papers11,12,13. The goals in designing the program were that a) it should have a user-friendly interface requiring little or no prior knowledge of computer programming or of spike sorting methodology; b) few or no other specialized software components beyond standard Windows or Linux operating systems should be needed; c) a wide range of recording data formats for data import and export should be supported; d) the need for user input during sorting should be minimized, and e) sorting times should scale in a reasonable way, ideally linearly, with recording duration and the number of channels on the electrode. The algorithms implemented in the program include a) a flexible set of pre-processing and event detection strategies; b) an automated divide and conquer strategy of dimension reduction which clusters voltage waveforms based on the principal components (PC) distributions obtained from subsets of channels assigned to specific clusters; c) automated clustering of PC distributions with a fast clustering procedure based on the mean-shift algorithm3,14, and d) partially automated pairwise merging and splitting of clusters to ensure that each is as distinct as possible from all others. To this, a set of procedures has been added that allow manual splitting or merging of clusters based on inspection of PC distributions, cross- and auto-correlograms of spike trains and time-amplitude plots of spike waveforms. Recordings from tetrodes, tetrode arrays, Utah arrays as well as single and multi-shank MEAs can be read and sorted. The current limit on the number of channels is 256 but this may be increased in future.
Another cross-platform open-source implementation, "spyke" (http://spyke.github.io), is also available. Written by one of us (MS) in Python and Cython, spyke uses the same overall approach as SpikeSorter, with some differences: to reduce memory demands, raw data is loaded in small blocks, and only when absolutely necessary; clusters are exclusively displayed, manipulated, and sorted in 3D; and principal component and independent component analysis are both used as complementary dimension reduction methods. Spyke requires more user interaction, but relies heavily on keyboard and mouse shortcuts and an undo/redo queue to rapidly explore the effects of various factors on the clustering of any given subset of spikes. These factors include spike channel and time range selection, spike alignment, clustering dimensions and spatial bandwidth (sigma)11.
The following is a brief description of the algorithms and strategies used for sorting. More complete descriptions can be found in previous publications11,12,13 and in annotations that can be accessed via help buttons (identified with a '?') within SpikeSorter. After loading a raw extracellular voltage file and filtering out the lower frequency components, an initial stage of event detection results in a set of events, each of which consists of a brief voltage snapshot before and after the event time. If the electrode sites are sufficiently closely spaced (< 100 µm), single unit signals will generally appear on several neighboring channels. A central channel is automatically chosen for each event, corresponding to the channel on which the peak-to-peak voltage of the event is largest. Automated sorting starts by forming a single initial cluster for each electrode channel, consisting of all the events that were localized to that channel. A unit located midway between channels may give rise to spikes that are localized (perhaps randomly) to different channels: the clusters from these two sets of spikes will be identified as similar and merged at a later stage. The average waveform of the events in each initial cluster is then calculated. This is referred to as the cluster template. Subsidiary channels are assigned to each cluster based on the amplitudes and standard deviation of the template waveforms on each channel. Principal component values are then calculated for each cluster based on the waveforms on the assigned set of channels. The user can choose the number of principal component dimensions to use: usually 2 is sufficient. Each cluster is then split into a further set of clusters, and this is repeated until none can be further split by automated clustering.
At this point, an initial set of say, 64 clusters from a 64-channel electrode, may be split into two or three times that number, depending on the number of units that was present in the recording. But because of the variable assignment of events from single units to different channels, the number of clusters found at this stage is almost certainly larger than it should be. The next stage of sorting is to correct the oversplitting by comparing pairs of clusters and merging similar pairs or reassigning events from one to another. This stage of sorting is referred to as 'merge and split'.
Merging and Splitting
For N clusters, there are N*(N-1)/2 pairs and hence the number of pairs grows as N2, which is undesirable. However, many pairs can be excluded from the comparison because the two members of the pair are physically far apart. This reduces the dependence to something that is more linearly related to the number of channels. In spite of this shortcut, the merge and split stage can still be quite time consuming. It works in the following way. Each cluster pair that is to be compared (those that are physically close together, as judged by the overlap in the channel sets assigned to each) is temporarily merged, though keeping the identities of the spikes in the two member clusters known. The principal components of the merged pair are then calculated. A measure of the overlap between the points in the two clusters is calculated based on the distribution of the first two principal components.
The way the overlap measure is calculated is described in more detail elsewhere11. Its value is zero if the clusters do not overlap at all, i.e. the nearest neighbor of each point is in the same cluster. Its value is close to 1 if the clusters overlap completely, i.e. the probability of the nearest neighbor being in the same cluster is the same as that predicted from a uniform mixing of points.
Various decisions are made which take the overlap measure into account. If the overlap is greater than a certain value, clusters may be merged. If the overlap is very small, the cluster pair may be defined as distinct and left alone. Intermediate values, indicating incomplete separation of the cluster pair, may signal that the pair should be merged and then re-split, the desired outcome being a pair of clusters with less overlap. These procedures are run first in an automated stage and then in a manually guided stage.
In the automated stage, cluster pairs with a high overlap value are merged; then cluster pairs with intermediate to low overlap values are merged and re-split. In the second, user-guided stage, the user is presented with all the remaining ambiguous cluster pairs (i.e. those with overlap values in a defined intermediate range) in sequence and is asked to choose whether a) to merge the pair, b) merge and resplit the pair, c) to declare the pair to be distinct (which will override the significance of the overlap measure), or d) to define the relation between the pair as 'ambiguous' indicating that the spikes in the pair are unlikely to be well sorted. Various tools are provided to help with these decisions, including auto- and cross-correlograms and time series plots of spike height and PC values.
Ideally, at the end of the merging and splitting stages, every cluster should be distinct from all others, either because it has few or no channels in common with other clusters, or because the overlap index is less than a defined value. This value is user-selectable but is typically 0.1. Clusters (units) that pass this test are defined as 'stable', those that do not (because the overlap with one or more other clusters is greater than the threshold) are defined as 'unstable'. In practice, the great majority of units end up being defined as 'stable' at the finish of sorting, leaving the remainder to either be discarded or treated as potentially multi-unit.
Software Requirements
SpikeSorter is compatible with 64 bit versions of Windows 7 and Windows 10, and has also been run successfully under Linux using the Wine emulator. Data files are loaded completely into memory (for speed) hence available RAM needs to scale with the size of the recording (allow about 2 GB for the program itself). Electrophysiological data files larger than 130 GB in size have been successfully sorted in both Windows and Linux environments. Options are accessed through standard Windows menus, a toolbar and dialogs. The layout of items on the menu matches roughly the order of operations in sorting, beginning with the 'File' menu on the left for data input and the 'Export' menu on the right allowing for export of sorted data. Toolbar buttons provide shortcuts to commonly used menu items.
The Channel Configuration File
Many recording data formats do not store channel locations. However, knowing these is essential for spike sorting. Channels may also be numbered in various ways by acquisition software: SpikeSorter requires that channels are numbered in sequence, beginning with channel 1. Thus, an ancillary electrode configuration file has to be created that can remap channel numbers to follow the sequential rule, and to store channel locations. The channel configuration file is a text file with a single row of text for each channel. The first line of the file stores a text name, up to 16 characters long, that identifies the electrode. The numbers in subsequent lines can be separated by tabs, a single comma, or spaces. There are four numbers in each row providing (in order): the channel number in the file, the channel number to which it is to be mapped (i.e. the number that will be used by SpikeSorter), and the x and y coordinates of the channel, in microns. The x coordinate would normally be taken as perpendicular to the direction of electrode insertion and the y coordinate accordingly would be depth into the tissue. The configuration file has to be placed in the same directory as the recording file. There is some flexibility in how it can be named. The program will first search for a file that has the same name as the raw data file but with a .cfg extension. If that file is not found, it will search for the file 'electrode.cfg'. If that file in turn is not found an error message is generated to indicate a lack of channel layout information.
1. Program Setup
2. Event Detection
3. Sorting
NOTE: The next step is not normally performed before routine sorting, but it is very useful to do it when sorting for the first time, or when encountering unfamiliar data.
4. Customization
5. Merge and Split
6. Review – Post-processing
Figure 7 shows the display (obtained by going to 'View - Sorted waveforms') for a typical sorted recording. The default view option is just to show the waveforms on the center channel for each cluster. A common experience is that waveforms for a cluster pair on the same channel look identical, but when the 'Compare pairs' dialog is used to examine the two clusters there are distinct clusters in the PC projection, most often resulting from waveform differen...
File Formats
Currently supported file formats include Neuralynx (.ntt and .ncs), Plexon (.plx), Neuroscope (.xml + .dat), MultiChannel Systems (.mcd), Blackrock (.nev) and Intan (.rhd). For unsupported formats, there are two options. One is to request addition of the file format to an upcoming release (an email link to the developer is provided in the 'Help - About' dialog). The other is to convert the file to a supported format. A simple option is to use the time-spike-format '.tsf&#...
The authors have nothing to disclose.
We thank those individuals and groups who have used SpikeSorter and who have provided requests for file format support and suggestions and feedback on how to improve it. These include Youping Xiao, Felix Fung, Artak Khachatryan, Eric Kuebler, Curtis Baker, Amol Gharat and Dongsheng Xiao. We thank Adrien Peyrache for the false positive and negative figures given in 'Representative Results'.
Name | Company | Catalog Number | Comments |
spikesorter.exe | N/A | http://www.swindale.ecc.ubc.ca/SpikeSorter |
Request permission to reuse the text or figures of this JoVE article
Request PermissionThis article has been published
Video Coming Soon
Copyright © 2025 MyJoVE Corporation. All rights reserved