Final version.

author: André Nusser <andre.nusser@googlemail.com> 2020-02-10 12:49:41 +0100
committer: André Nusser <andre.nusser@googlemail.com> 2020-02-10 12:49:41 +0100
commit: 6402bb72784f921c029df379285a3e751532157d (patch)
tree: 3de02efee69bb67db77096039c5671945153e150
parent: 4c653f1fc5102bf5b86fa72d225371a257992597 (diff)
1 files changed, 61 insertions, 53 deletions
diff --git a/sampling_alg_lac2020/LAC-20.tex b/sampling_alg_lac2020/LAC-20.tex
index 7e8846b..0d3b9b9 100644
--- a/sampling_alg_lac2020/LAC-20.tex
+++ b/sampling_alg_lac2020/LAC-20.tex
@@ -32,7 +32,7 @@
 %------------------------------------------------------------------------------------------
 %  !  !  !  !  !  !  !  !  !  !  !  ! user defined variables  !  !  !  !  !  !  !  !  !  !  !  !  !  !
 % Please use these commands to define title and author(s) of the paper:
-\def\papertitle{On Sampling Algorithms for Drums}
+\def\papertitle{On Choosing Best Samples for Virtual Drums}
 % \def\paperauthorA{André Nusser}
 % \def\paperauthorB{Bent Bisballe Nyeng}
 \def\paperauthorA{}
@@ -264,31 +264,45 @@
 \maketitle
 
 \begin{abstract}
-\noindent Sampling drum kits well is a difficult and challenging task. Especially, building a drum kit sample bank with different velocity layers requires producing samples of very similar loudness, as changing the gain of a sample after recording makes it sound less natural. An approach that avoids this issue is to not categorize the samples in fixed groups but to simply calculate their loudness and then dynamically choose a sample, when a sample corresponding to e.g.\ a specific MIDI value is requested. We present a first investigation of algorithms doing this selection and discuss their advantages and disadvantages. The seemingly best candidate we implemented in DrumGizmo -- a FLOSS drum plugin -- and we do experiments on how our suggested algorithms perform on the samples drum kits.
+\noindent Sampling drum kits well is a difficult and challenging task. Especially, building a drum kit sample bank with different velocity layers requires producing samples of very similar loudness, as changing the gain of a sample after recording makes it sound less natural. An approach that avoids this issue is to not categorize the samples in fixed groups but to simply calculate their loudness and then dynamically choose a sample, when a sample corresponding to e.g.\ a specific MIDI velocity is requested. We present a first investigation of algorithms doing this selection. The seemingly best candidate we implemented in DrumGizmo -- a free software drum plugin -- and we do experiments on how our suggested algorithms perform on sampled drum kits.
 \end{abstract}
 
 \section{Introduction} \label{sec:introduction}
-\todoandre{Talk about the general problem of sample selection.}
-\todoandre{Limit scope to drums.}
-\todoandre{Talk about round robin.}
-\todoandre{Mention drawbacks.}
-\todoandre{Introduce high-level ideas of our work.}
-\todoandre{Make difference between humanization and sample selection clear.}
-\todo{Motivation: Make sure we have the following covered: Randomized samples,
- Prevent repetitions, Sample coverage (use all samples in the set).}
+% \todoandre{Talk about the general problem of sample selection.}
+When creating virtual instruments that correspond to a certain analogous instrument, we naturally always aim at making them sound as realistic as possible.
+% \todoandre{Limit scope to drums.}
+There are at least two ways to achieve this. First, we can use physical simulations like the famous Pianoteq\footnote{\url{https://www.modartt.com/pianoteq}} virtual instrument does or, second, we can use real samples from the instrument as e.g. most drum plugins do. In this article we focus on the second approach.
+% \todoandre{Make difference between humanization and sample selection clear.}
+When using this approach, a question that naturally comes to mind is how to use the sample data to get the most realistic sound. There are two orthogonal directions to tackle this problem. First, when getting the input of a programmed drum (e.g., as MIDI), we want to humanize it such that the MIDI velocities are according to how a real drummer would play this piece. Second, after applying such a humanization, we get to a lower-level problem which is that of choosing the right sample from our limited data set. Again, we focus on the second point in this article.
+
+% \todoandre{Talk about round robin.}
+The arguable standard for choosing samples is the famous round robin algorithm. This algorithm groups samples of similar loudness together and then selects them in a circular manner (first sample, second sample, \dots, last sample, first sample, \dots).
+% \todoandre{Mention drawbacks.}
+While this is the standard, it has significant drawbacks. First, it requires a somewhat arbitrary grouping of velocities. This in turn might lead to so called \enquote{staircase effects} when playing sweeps. This is usually resolved by scaling the loudness of the sample to the corresponding MIDI velocity. However, this decouples the sample loudness from the actual strength of the hit, again potentially leading to a less natural sound as this leads to inconsistencies between the different played samples.
+
+% \todoandre{Introduce high-level ideas of our work.}
+% \todo{Motivation: Make sure we have the following covered: Randomized samples,
+%  Prevent repetitions, Sample coverage (use all samples in the set).}
+In this work, we introduce a new sample selection algorithm that is not based on grouping of the samples but instead works purely on the given loudness of the samples. Additionally, it does not adjust the gain of single samples. The goal of this algorithm is to choose the best possible sample according to the requested MIDI velocity after humanization. This means that we want to choose a sample whose loudness is as close as possible to the requested one. However, if we always just choose the closest sample we run into two issues. First, this creates artifacts (as shown later in this article), and second, it leads to a robotic sound when we play the same sample(s) over and over. Thus, we additionally have to make sure to choose a reasonably diverse set of samples from our sample data set. Finally, to avoid further artifacts in the form of patterns, we additionally want randomization to help us break these patterns.
+
+\subsection{Our Contribution}
+To the best of our knowledge, this is the first academic article that deals with the issue of selecting samples from a set of samples with \enquote{continuous power values}. To this end, we first identify important aspects that sampling algorithms in this setting have to fulfill. After we formulate those requirements and formalize them to some degree, we present our algorithm based on those requirements, which is based on the computation of a multi-criteria objective function. Consequently, we give an overview over an implementation of this approach and then conduct experiments to evaluate the actual quality. As reference implementation, we use the old sample selection method of DrumGizmo -- an open source drum machine.
 
 \subsection{Related Work}
-\todo{I don't really know what to write, except about round robin. Is there any other common method or any academic literature? Are there other methods in open source programs?}
+Regarding related work, we consider the previous sample selection methods used by DrumGizmo. DrumGizmo is an open source, cross-platform, drum sample engine and
+audio plugin aiming to give an output that is as close to a real drummer as possible.\footnote{The source-code is available through git at \url{git://git.drumgizmo.org/drumgizmo.git}, and the source code can be browsed online at \url{http://cgit.drumgizmo.org/drumgizmo.git/}.}
+
+% \todo{I don't really know what to write, except about round robin. Is there any other common method or any academic literature? Are there other methods in open source programs?}
 %\todobent{Discuss DGs old sampling algorithm briefly.}
-The engine gets a value $l \in [0,1]$ which must then be used by the
-engine for deciding how the output should be produced.
+In DrumGizmo, the engine gets a value $l \in [0,1]$ which must then be used
+for deciding how the output should be produced.
 Some engines use this value as a gain factor but in the case of
 DrumGizmo it is used for sample selection only.
 The early versions used a sample selection algorithm based on velocity
-groups, akin to the one used by sfz \todo{add bib reference}, in which
-each group spans a specfied velocity range and the sample selection is
-made by uniformly randomly selecting one of the samples contained in
-the group corresponding to the input velocity, see Figure \ref{fig:alg1}.
+groups, akin to the one used by sfz\footnote{\url{https://sfzformat.com/}}, in which
+each group spans a specified velocity range and the sample selection is
+made by selecting one of the samples contained in
+the group corresponding to the input velocity uniformly at random. See Figure \ref{fig:alg1} for the flow diagram.
 % {\tiny\begin{verbatim}
 %                    ________________                _________________
 %                   /                \              / uniformly random \
@@ -296,28 +310,27 @@ the group corresponding to the input velocity, see Figure \ref{fig:alg1}.
 %      [0; 1]       \________________/              \__________________/
 % \end{verbatim}}
 
-This algorithm did not give good results in small samplesets so later
+This algorithm did not give good results in small sample sets so later
 an improved algorithm was introduced which was instead based on normal
 distributed random numbers and with power values for each sample in
 the set.
-
 A prerequisite for this new algorithm is that the power of each sample is
 stored along with the sample data of each sample.
 
 The power values of a drum kit are floating point numbers without any
 restrictions but assumed to be positive. Then the input value
-$l$ is mapped using the canonical bijections between $[0,1]$ and
-$[p_{\min}, p_{max}]$ and afterwards shifted\todo{by which
-amount?}. We call this new value $p$.
+$l$ is mapped using the canonical bijection between $[0,1]$ and
+$[p_{\min}, p_{\max}]$.
+% and afterwards shifted\todo{by which amount?}.
+We call this new value $p$.
 
 Now the real sample selection algorithm starts. We select a value $p'$
-drawn normal distributed at random from $\mathcal{N}(p', \sigma^2)$,
-where the mean value, $\mu$, is set to the input value $l$ and
-and the stddev, $\sigma$, is a parameter controlled by the user
-expressed in fractions of the size and span of the sampleset.
+drawn normally distributed from $\mathcal{N}(\mu, \sigma^2)$,
+where the mean value, $\mu$, is set to the input value $l$
+and the standard deviation, $\sigma$, is a parameter controlled by the user
+expressed in fractions of the size and span of the sample set.
 Now we simply find the sample $s$ with the power $q$ which is closest
-to $p'$ -- ties are broken such that the first minimal value is chosen
-(which is problematic as explained below). In case $s$ is equal to the
+to $p'$. In case $s$ is equal to the
 last sample that we played we repeat this process, otherwise we return
 $s$. If we did not find another sample than the last played after $4$
 iterations, we just return the last played sample, see Figure \ref{fig:alg2}.
@@ -349,9 +362,6 @@ iterations, we just return the last played sample, see Figure \ref{fig:alg2}.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-\subsection{Our Contribution}
-To the best of our knowledge, this is the first academic article that deals with the issue of selecting samples from a set of samples with \enquote{continuous power values}. To this end, we first identify important aspects that sampling algorithms in this setting have to fulfill. After we formulate those requirements and formalize them to some degree, we present our algorithm based on those requirements, which is based on computation of a multi-criteria objective function. Consequently, we give an overview over an implementation of this approach and then conduct experiments to evaluate the actual quality. As reference implementation, we use the old method by DrumGizmo -- an open source drum machine.
-
 \section{Preliminaries}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -361,17 +371,17 @@ Drum samples are cut from recordings of drums with multiple
 microphones.
 Each hit on a drum must be distinguishable from the others and can
 therefore not overlap in time.
-due to the multiple microphones used for the recording, each sample
-spans multiple channels. Due to the speed of sound in air and the
-distance from each drum towards each of the micorphones used the
+Due to the multiple microphones used for the recording, each sample
+spans multiple channels. Additionally, due to the speed of sound in air and the
+distance from each drum towards each of the microphones used, the
 initial sample position in each of the channels will not be at the
-same place - the channel which are the closest to the sound source of
+same place -- the channel which is the closest to the sound source of
 a particular instrument is the \textit{main channel} of that instrument, see Figure \ref{fig:signals}.
 
 \begin{figure}
 	\centering
 	\includegraphics[width=.9\linewidth]{figures/signals.pdf}
-	\caption{Sketch of the original signals of a sample.}
+	\caption{Sketch of the original signals of a sample recorded with multiple microphones.}
 	\label{fig:signals}
 \end{figure}
 
@@ -405,18 +415,18 @@ a particular instrument is the \textit{main channel} of that instrument, see Fig
 %\todobent{Talk about loudness computation of samples.}
 A sample has a \textit{stroke power} which is the physical power used
 by the drummer when making that particular hit.
-Since this is not something that can be easily measured each sample
+Since this is not something that can be easily measured, each sample
 power is instead calculated as the power of the signal in an initial
 attack period of the audio of the main channel:
 \[
-power(s, n) = \sum\limits_{i=0}^n s[i]^2
+power(s, n) = \sum\limits_{i=0}^n s[i]^2,
 \]
-, where $n$ is defined on a per instrument basis and will vary from
-instrument to instrument and $s[i]$ is the ith sample of the main
+where $n$ is defined on a per instrument basis and will vary from
+instrument to instrument and $s[i]$ is the $i$th audio sample of the main
 channel.
 
-Since the power are simply calculated sums of squares they can be used
-for comparing one sample to another and ultimately be used for mapping
+Since the powers are simply sums of squares they can be used
+for comparing one sample to another and ultimately for mapping
 a midi velocity to a matching sample.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -436,18 +446,18 @@ After reading the drum kit, requests of the form $(i, p) \in I \times \mathbb{R}
 
 % \todoandre{Make terminology and notation clear and check for consistency in the document.}
 \subsection{Notation and Terminology}
-We use the following notation throughout this article. An \emph{instrument} is considered to be one of the drums of the drum kit that we sampled. A \emph{sample} (denoted by $s, s', \dots$) is recording of one hit on a specific instrument. Given a sample $s$, the \emph{power} of it (denoted by $p_s, p_s', \dots$) is the perceived loudness and can be expressed by any common loudness measure of an audio clip. If a power value is requested and does not correspond to a sample, we denote it by $p, p', \dots$. With the term \emph{velocity} (denoted by $v, v', \dots$), we refer to the attack velocity of a MIDI note and it is thus between 0 and 127. We consider time in a discretized way and thus a \emph{time point} is an integer value intuitively referring to the number of time steps passed since the beginning of time. For a sample $s$, we refer with $t_s$ to the time point at which the sample was played last.
+We use the following notation throughout this article. An \emph{instrument} is considered to be one of the drums of the drum kit that we sampled. A \emph{sample} (denoted by $s, s', \dots$) is the recording of one hit on a specific instrument. Given a sample $s$, the \emph{power} of it (denoted by $p_s, p_s', \dots$) is the perceived loudness and can be expressed by any common loudness measure of an audio clip. If a power value is requested and does not correspond to a sample, we denote it by $p, p', \dots$. With the term \emph{velocity} (denoted by $v, v', \dots$), we refer to the attack velocity of a MIDI note and it is thus between 0 and 127. We consider time in a discretized way and thus a \emph{time point} is an integer value intuitively referring to the number of time steps passed since the beginning of time. For a sample $s$, we refer with $t_s$ to the time point at which the sample was played last.
 
 \section{Requirements} \label{sec:requirements}
 
 % \todoandre{Intuitively discuss the requirements of a good sampling algorithm.}
-We now discuss which requirements a good sampling algorithm intuitively has to fulfill. Such an algorithm has a tradeoff between two main objectives: choosing a sample which is close to the requested power value, while not choosing the same sample too close to the previous time it was used. Note that if we just want to be as close as possible to the requested power value, then we would always just choose the closest sample. However, if we now play a sequence of the same instrument at the same power level, then we play the same sample and thereby obtain a robotic sound. Thus, we want to find other samples that are not too far.
+We now discuss which requirements a good sampling algorithm intuitively has to fulfill. Such an algorithm has a trade off between two main objectives: choosing a sample which is close to the requested power value, while not choosing the same sample too close to the previous time it was used. Note that if we just want to be as close as possible to the requested power value, then we would always just choose the closest sample. However, if we now play a sequence of the same instrument at the same power level, then we always play the same sample and thereby obtain a robotic sound. Thus, we want to find other samples that are not too far.
 
 % \todoandre{List the requirements one by one and discuss them. Try to formalize them in some way.}
 More concretely, we aim to fulfill the following requirements with our proposed algorithm.
 \begin{description}
-	\item[Close Sample:] The chosen sample should be reasonably close to the requested power value, such that the listener perceives it as being played at the same velocity.
-	\item[Avoid Same Samples:] When we have multiple samples to choose from we should always take one that was last played far enough in the past to avoid a robotic sound.
+	\item[Close sample:] The chosen sample should be reasonably close to the requested power value, such that the listener perceives it as being played at the same velocity.
+	\item[Avoid same samples:] When we have multiple samples to choose from, we should always take one that was last played far enough in the past to avoid a robotic sound.
 	\item[Randomization:] Furthermore, to avoid patterns (like e.g. in round robin, where exactly every $n$th hit sounds the same when we have $n$ samples in our velocity group), we want some randomization.
 	\item[Locality:] If two samples have a very similar power value, they should also be treated similarly by the algorithm. In other words, locally, samples should have almost the same probability of being chosen.
 \end{description}
@@ -532,9 +542,7 @@ First, note that all extreme choices of the parameters -- meaning that we set on
 
 \section{Implementation} \label{sec:implementation}
 %\todobent{Give a short introduction to DrumGizmo, including a link to the git repository.}
-We added our new sampling algorithm to DrumGizmo, replacing the one it previously used. DrumGizmo is an open source, cross-platform, drum sample engine and
-audio plugin aiming to give an output that is as close to a real drummer as possible.\footnote{The source-code is available through git at \url{git://git.drumgizmo.org/drumgizmo.git}, and the source code can be browsed online at \url{http://cgit.drumgizmo.org/drumgizmo.git/}.}
-
+We added our new sampling algorithm to DrumGizmo, replacing the one it previously used.
 % \todoandre{Talk about how the sampling algorithm was implemented. What do we need to store?}
 The sampling algorithm itself did not require any particular implementation efforts. Most of the time was spent on the theoretical part of it. To give a better overview over the technicalities, we briefly list the information that needs to be stored. In a preprocessing phase, we compute $p_{\min}$ and $p_{\max}$ for each instrument and set all values of the $\mathit{last}$ arrays to 0. The power values of the samples are given by the drum kit in DrumGizmo. The parameters $\alpha, \beta, \gamma$ have default values that were determined experimentally. Each of them can be changed by the user, either interactively in the GUI or via the command line interface.
 
@@ -542,7 +550,7 @@ The sampling algorithm itself did not require any particular implementation effo
 As DrumGizmo is free software, the exact details of the implementation can be checked by everyone.
 
 % \todoandre{Give less important implementation details, e.g., like adaptive search starting from the most promising value}
-For instruments with reasonably small sample sets, simply iterating over all samples for the specific instrument as shown in Algorithm \ref{alg:sampling} is sufficiently performant. However, imagine an instrument with an extremenly large sample set. As DrumGizmo drum kits can be created by everyone, there is no restriction and we cannot assume a small sample size. To avoid performance issues arising from such a scenario, we employ a non-naive search by starting with the \enquote{most promising} sample and then inspecting its neighbors until the currently best sample is known to dominate the remaining samples. More concretely, we do the following: we start with the sample $s$ that has the closest power value $p_s$ to the requested power value $p$, i.e., we find $s$ that minimizes $\abs{p - p_s}$. The key observation why a local search often suffices is that we can lower bound the second and third summand of the objective function by 0. Thus, for a given sample $s$ and a time point $t$, we have
+For instruments with reasonably small sample sets, simply iterating over all samples for the specific instrument as shown in Algorithm \ref{alg:sampling} is sufficiently performant. However, imagine an instrument with an extremely large sample set. As DrumGizmo drum kits can be created by everyone, there is no restriction and we cannot assume a small sample size. To avoid performance issues arising from such a scenario, we employ a non-naive search by starting with the \enquote{most promising} sample and then inspecting its neighbors until the currently best sample is known to dominate the remaining samples. More concretely, we do the following: we start with the sample $s$ that has the closest power value $p_s$ to the requested power value $p$, i.e., we find $s$ that minimizes $\abs{p - p_s}$. The key observation why a local search often suffices is that we can lower bound the second and third summand of the objective function by 0. Thus, for a given sample $s$ and a time point $t$, we have
 \begin{equation}\label{eq:lower_bound}
 	f(s,t) \geq \alpha \cdot \left( \frac{p-p_s}{p_{\max} - p_{\min}}\right)^2.
 \end{equation}
@@ -672,8 +680,8 @@ However, there are still some open problems and directions to be addressed in fu
 % \todo{Thank people for proof-reading?}
 
 %\newpage
-\nocite{*}
-\bibliographystyle{IEEEbib}
-\bibliography{LAC-20} % requires file lac-20.bib
+% \nocite{*}
+% \bibliographystyle{IEEEbib}
+% \bibliography{LAC-20} % requires file lac-20.bib
 
 \end{document}
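
The older DrumGizmo selection method and the power formula described in the patched text can be sketched roughly as follows. This is only an illustrative sketch, not the DrumGizmo source: the function names (attack_power, select_sample) and the parameter sigma_fraction are hypothetical, the mean of the normal distribution is assumed to be the mapped power value p, and the standard deviation is assumed to be a fraction of the power span of the sample set.

```python
import random
from typing import Optional, Sequence


def attack_power(main_channel: Sequence[float], n: int) -> float:
    """power(s, n): sum of squares of audio samples 0..n of the main channel."""
    return sum(x * x for x in main_channel[: n + 1])


def select_sample(
    powers: Sequence[float],
    level: float,
    sigma_fraction: float,
    last: Optional[int],
    rng: random.Random,
    max_tries: int = 4,
) -> Optional[int]:
    """Return the index of the chosen sample for an input level in [0, 1]."""
    p_min, p_max = min(powers), max(powers)
    span = p_max - p_min
    # Canonical bijection between [0, 1] and [p_min, p_max].
    p = p_min + level * span
    # Assumption: sigma is given as a fraction of the power span.
    sigma = sigma_fraction * span
    choice = last
    for _ in range(max_tries):
        # Assumption: the mean of the normal distribution is the mapped value p.
        p_drawn = rng.gauss(p, sigma)
        # Pick the sample whose power is closest to the drawn value.
        choice = min(range(len(powers)), key=lambda i: abs(powers[i] - p_drawn))
        if choice != last:
            return choice
    # Every try hit the last played sample, so it is returned again.
    return choice
```

With sigma_fraction set to 0 the draw is deterministic, so the sketch degenerates to always picking the closest sample, which is the repetitive behaviour the new algorithm is meant to avoid.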