author    André Nusser <anusser@mpi-inf.mpg.de>  2020-11-22 16:27:04 +0100
committer André Nusser <anusser@mpi-inf.mpg.de>  2020-11-22 16:27:04 +0100
commit    aae29c6916af5104636bdac2a7f47d2ad44f5123 (patch)
tree      44ac4117f6200cb50c4b01342af5c416e0a57877
parent    bb983ea5d9cb2e4047c4237657bd08fa877d5215 (diff)
Pass over the whole paper.
 sampling_alg_lac2020/LAC-20.bib |  25
 sampling_alg_lac2020/LAC-20.tex | 106
 2 files changed, 74 insertions(+), 57 deletions(-)
diff --git a/sampling_alg_lac2020/LAC-20.bib b/sampling_alg_lac2020/LAC-20.bib
index 9247ddb..0a04fd7 100644
--- a/sampling_alg_lac2020/LAC-20.bib
+++ b/sampling_alg_lac2020/LAC-20.bib
@@ -1,5 +1,22 @@
-@inproceedings{Guy:17,
- author = {One Guy},
- TITLE = {Fancy Title},
- year = {2017},
+@manual{Pianoteq,
+ title = "Pianoteq Website",
+ author = "MODARTT",
+ note = "\url{https://www.modartt.com/pianoteq}",
+ year = "2020 (accessed November 22, 2020)"
}
+
+@manual{Drumgizmo,
+ title = "DrumGizmo Website",
+ author = "DrumGizmo Team",
+ note = "\url{https://drumgizmo.org}",
+ year = "2020 (accessed November 22, 2020)"
+}
+
+@manual{Sfz,
+ title = "SFZ Format",
+ author = "",
+ note = "\url{https://sfzformat.com/}",
+ year = "2020 (accessed November 22, 2020)"
+}
+
+
diff --git a/sampling_alg_lac2020/LAC-20.tex b/sampling_alg_lac2020/LAC-20.tex
index 1ca3240..d83eb75 100644
--- a/sampling_alg_lac2020/LAC-20.tex
+++ b/sampling_alg_lac2020/LAC-20.tex
@@ -264,33 +264,33 @@
\maketitle
\begin{abstract}
-\noindent Sampling drum kits well is a difficult and challenging task. Especially, building a drum kit sample bank with different velocity layers requires producing samples of very similar loudness, as changing the gain of a sample after recording makes it sound less natural. An approach that avoids this issue is to not categorize the samples in fixed groups but to simply calculate their loudness and then dynamically choose a sample, when a sample corresponding to e.g.\ a specific MIDI velocity is requested. We present a first investigation of algorithms performing this selection. The seemingly best candidate we implemented in DrumGizmo -- a free software drum plugin -- and we do experiments on how our suggested algorithm performs on the sampled drum kits.
+\noindent Sampling drum kits well is a challenging task. In particular, building a drum kit sample bank with different velocity layers requires producing samples of very similar loudness, as changing the gain of a sample after recording makes it sound less natural. An approach that avoids this issue is to not categorize the samples in fixed groups but to simply calculate their loudness and then dynamically choose a sample when a sample corresponding to, e.g., a specific MIDI velocity is requested. We present a first investigation of algorithms performing this selection. We implemented the seemingly best candidate in DrumGizmo -- a free software drum plugin -- and we conduct experiments on how our suggested algorithm performs on the sampled drum kits.
\end{abstract}
\section{Introduction} \label{sec:introduction}
% \todoandre{Talk about the general problem of sample selection.}
-When creating virtual instruments that correspond to a certain analogous instrument, we naturally always aim at making them sound as realistic as possible.
+When creating virtual instruments that correspond to a certain analog instrument, we naturally always aim at making them sound as realistic as possible.
% \todoandre{Limit scope to drums.}
-There are at least two ways to achieve this. First, we can use physical simulations like the famous Pianoteq\footnote{\url{https://www.modartt.com/pianoteq}} virtual instrument does or, second, we can use real samples from the instrument as e.g. most drum plugins do. In this article we focus on the second approach.
+There are at least two ways to achieve this. First, we can use physical simulations, as the famous Pianoteq virtual instrument \cite{Pianoteq} does, or, second, we can use real samples from the instrument, as, e.g., most drum plugins do. In this article, we focus on the second approach.
% \todoandre{Make difference between humanization and sample selection clear.}
When using this approach, a question that naturally comes to mind is how to use the sample data to get the most realistic sound. There are two orthogonal directions to tackle this problem. First, when getting the input of a programmed drum (e.g., as MIDI), we want to humanize it such that the MIDI velocities are according to how a real drummer would play this piece. Second, after applying such a humanization, we get to a lower-level problem which is that of choosing the right sample from our limited data set. Again, we focus on the second point in this article.
% \todoandre{Talk about round robin.}
-The arguable standard for choosing samples is the famous round robin algorithm. This algorithm groups samples of similar loudness together and then selects them in a circular manner (first sample, second sample, \dots, last sample, first sample, \dots).
+The arguable standard for choosing samples is the famous Round Robin algorithm. This algorithm groups samples of similar loudness together and then selects them in a circular manner (first sample, second sample, \dots, last sample, first sample, \dots).
% \todoandre{Mention drawbacks.}
-While this is the standard, it has significant drawbacks. First, it requires a somewhat arbitrary grouping of velocities. This in turn might lead to so called \enquote{staircase effects} when playing sweeps. This is usually resolved by scaling the loudness of the sample to the corresponding MIDI velocity. However, this decouples the sample loudness from the actual strength of the hit, again potentially leading to a less natural sound as this leads to inconsistencies between the different played samples.
+While this is the standard, it has significant drawbacks. First, it requires a somewhat arbitrary grouping of velocities. This in turn might lead to so-called \enquote{staircase effects} when playing sweeps from quiet to loud notes. This is often resolved by scaling the loudness of the sample to the corresponding MIDI velocity. However, this decouples the sample loudness from the actual strength of the hit, potentially resulting in a less natural sound due to inconsistencies between the different played samples.
% \todoandre{Introduce high-level ideas of our work.}
% \todo{Motivation: Make sure we have the following covered: Randomized samples,
% Prevent repetitions, Sample coverage (use all samples in the set).}
-In this work, we introduce a new sample selection algorithm that is not based on grouping of the samples but instead works purely on the given loudness of the samples. Additionally, it does not adjust the gain of single samples. The goals of this algorithm is to choose the best possible sample, according to the requested MIDI velocity after humanization. This means, we want to choose a sample which is as close as possible. However, if we always just choose the closest sample we run into two issues. First, this creates artifacts (as shown later in this article), and second, it leads to a robotic sound when we play the same sample(s) over and over. Thus, we additionally have to make sure to choose a reasonably diverse set of samples from our sample data set. Finally, to avoid further artifacts in the form of patterns, we additionally want randomization to help us breaking these patterns.
+In this work, we introduce a new sample selection algorithm that is not based on grouping of the samples, but instead works purely on the given loudness of the samples. Additionally, it does not adjust the gain of single samples. The goal of this algorithm is to choose the best possible sample, according to the requested MIDI velocity after humanization. This means we want to choose a sample whose loudness is as close as possible to the requested one. However, if we always just choose the closest sample, we run into two issues. First, this creates artifacts (as shown later in this article), and second, it leads to a robotic sound when we play the same sample(s) over and over. Thus, we additionally have to make sure to choose a reasonably diverse set of samples from our sample data set. Finally, to avoid further artifacts in the form of patterns, we also want randomization to help us break these patterns.
\subsection{Our Contribution}
-To the best of our knowledge, this is the first academic article that deals with the issue of selecting samples from a set of samples with \enquote{continuous power values}. To this end, we first identify important aspects that sampling algorithms in this setting have to fulfill. After we formulate those requirements and formalize them to some degree, we present our algorithm based on those requirements, which is based on the computation of a multi-criteria objective function. Consequently, we give an overview over an implementation of this approach and then conduct experiments to evaluate the actual quality. As reference implementation, we use the old sample selection method of DrumGizmo -- an open source drum machine.
+To the best of our knowledge, this is the first academic article that deals with the issue of selecting the best samples from a set of samples with \enquote{continuous power values}. To this end, we first identify important aspects that sampling algorithms in this setting have to fulfill. After we formulate these requirements and formalize them to some degree, we present our resulting algorithm, which is based on the computation of a multi-criteria objective function. Subsequently, we give an overview of an implementation of this approach and then conduct experiments to evaluate the actual quality. As a reference implementation, we use the old sample selection method of DrumGizmo \cite{Drumgizmo} -- an open source drum machine.
\subsection{Related Work}
Regarding related work, we consider the previous sample selection methods used by DrumGizmo. DrumGizmo is an open source, cross-platform, drum sample engine and
-audio plugin aiming to give an output that is as close to a real drummer as possible.\footnote{The source-code is available through git at \url{git://git.drumgizmo.org/drumgizmo.git}, and the source code can be browsed online at \url{http://cgit.drumgizmo.org/drumgizmo.git/}.}
+audio plugin aiming to give an output that is as close to a real drummer as possible.
% \todo{I don't really know what to write, except about round robin. Is there any other common method or any academic literature? Are there other methods in open source programs?}
%\todobent{Discuss DGs old sampling algorithm briefly.}
@@ -299,7 +299,7 @@ for deciding how the output should be produced.
Some engines use this value as a gain factor but in the case of
DrumGizmo it is used for sample selection only.
The early versions used a sample selection algorithm based on velocity
-groups, akin to the one used by sfz\footnote{\url{https://sfzformat.com/}}, in which
+groups, akin to the one used by the sfz format \cite{Sfz}, in which
each group spans a specified velocity range and the sample selection is
made by selecting one of the samples contained in
the group corresponding to the input velocity uniformly at random. See Figure \ref{fig:alg1} for the flow diagram.
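For concreteness, here is a minimal C++ sketch of this velocity-group selection (the struct layout and names are illustrative assumptions, not code from DrumGizmo or an sfz engine): each group covers a velocity range, and one of its samples is drawn uniformly at random.

#include <cstddef>
#include <random>
#include <vector>

// Hypothetical data layout: one group per velocity range.
struct VelocityGroup {
    float lo, hi;                 // velocity range [lo, hi) covered by this group
    std::vector<int> sample_ids;  // samples recorded at a similar stroke power
};

// Return the id of a uniformly random sample from the group that
// contains `velocity`, or -1 if no group covers it.
int select_by_group(const std::vector<VelocityGroup>& groups,
                    float velocity, std::mt19937& rng)
{
    for (const auto& g : groups) {
        if (velocity >= g.lo && velocity < g.hi && !g.sample_ids.empty()) {
            std::uniform_int_distribution<std::size_t> pick(0, g.sample_ids.size() - 1);
            return g.sample_ids[pick(rng)];
        }
    }
    return -1;
}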
@@ -310,13 +310,13 @@ the group corresponding to the input velocity uniformly at random. See Figure \r
% [0; 1] \________________/ \__________________/
% \end{verbatim}}
-This algorithm did not give good results in small sample sets so later
+This algorithm did not give good results on small sample sets, so later
an improved algorithm was introduced which was instead based on normal
distributed random numbers and with power values for each sample in
the set.
A prerequisite for this new algorithm is that the power of each sample is
stored along with the sample data of each sample.
-
+%
The power values of a drum kit are floating point numbers without any
restrictions but assumed to be positive. Then the input value
$l$ is mapped using the canonical bijection between $[0,1]$ and
@@ -324,16 +324,16 @@ $[p_{\min}, p_{max}]$.
% and afterwards shifted\todo{by which amount?}.
We call this new value $p$.
-Now the real sample selection algorithm starts. We select a value $p'$
-drawn normally distributed from $\mathcal{N}(\mu = p', \sigma^2)$,
-where the mean value, $\mu$, is set to the input value $l$
-and the standard deviation, $\sigma$, is a parameter controlled by the user
+Now we describe the aforementioned improved sample selection algorithm. We select a value $p'$
+drawn from the normal distribution $\mathcal{N}(\mu = p, \sigma^2)$,
+where the mean value $\mu$ is set to the mapped input value $p$
+and the standard deviation $\sigma$ is a parameter controlled by the user
expressed in fractions of the size and span of the sample set.
Now we simply find the sample $s$ with the power $q$ which is closest
to $p'$. In case $s$ is equal to the
-last sample that we played we repeat this process, otherwise we return
+last sample that we played, we repeat this process; otherwise, we return
$s$. If we did not find another sample than the last played after $4$
-iterations, we just return the last played sample, see Figure \ref{fig:alg2}.
+iterations, we just return the last played sample, see Figure \ref{fig:alg2} for the flow diagram.
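As an illustration, a C++ sketch of this normal-distribution scheme, assuming the per-sample power values are given in a sorted array and that the mapped input power $p$ and the user-controlled $\sigma$ are already computed (all names are assumptions, not DrumGizmo's actual code):

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// powers: per-sample power values, sorted ascending (assumed non-empty).
// Returns the index of the chosen sample.
int select_normal(const std::vector<double>& powers, double p, double sigma,
                  int last_played, std::mt19937& rng)
{
    std::normal_distribution<double> draw(p, sigma);
    for (int attempt = 0; attempt < 4; ++attempt) {
        const double target = draw(rng);
        // Find the sample whose power is closest to the drawn target.
        auto it = std::lower_bound(powers.begin(), powers.end(), target);
        if (it == powers.end()) --it;
        else if (it != powers.begin() &&
                 std::fabs(*(it - 1) - target) < std::fabs(*it - target)) --it;
        const int chosen = static_cast<int>(it - powers.begin());
        if (chosen != last_played)
            return chosen;          // different from the last hit: accept it
    }
    return last_played;             // give up after 4 attempts
}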
% {\tiny\begin{verbatim}
% stddev
@@ -347,14 +347,14 @@ iterations, we just return the last played sample, see Figure \ref{fig:alg2}.
\begin{figure}
\centering
- \includegraphics[width=.9\linewidth]{figures/alg1.pdf}
+ \includegraphics[width=\linewidth]{figures/alg1.pdf}
\caption{Flow diagram of the first sampling algorithm of DrumGizmo.}
\label{fig:alg1}
\end{figure}
\begin{figure}
\centering
- \includegraphics[width=.9\linewidth]{figures/alg2.pdf}
+ \includegraphics[width=\linewidth]{figures/alg2.pdf}
\caption{Flow diagram of the second sampling algorithm of DrumGizmo.}
\label{fig:alg2}
\end{figure}
@@ -373,7 +373,7 @@ Each hit on a drum must be distinguishable from the others and can
therefore not overlap in time.
Due to the multiple microphones used for the recording, each sample
spans multiple channels. Additionally, due to the speed of sound in air and the
-distance from each drum towards each of the microphones used, the
+distance of each drum from each of the microphones used, the
initial sample position in each of the channels will not be at the
same place -- the channel which is the closest to the sound source of
a particular instrument is the \textit{main channel} of that instrument, see Figure \ref{fig:signals}.
@@ -413,7 +413,7 @@ a particular instrument is the \textit{main channel} of that instrument, see Fig
%\todobent{Talk about loudness computation of samples.}
-A sample has a \textit{stroke power} which is the physical power used
+A sample has a \textit{stroke power} that is the physical power used
by the drummer when making that particular hit.
Since this is not something that can be easily measured, each sample
power is instead calculated as the power of the signal in an initial
@@ -422,12 +422,12 @@ attack period of the audio of the main channel:
\mathit{power}(s, n) = \sum\limits_{i=0}^n s[i]^2,
\]
where $n$ is defined on a per instrument basis and will vary from
-instrument to instrument and $s[i]$ is the $i$th audio sample of the main
+instrument to instrument, and $s[i]$ is the $i$th audio sample of the main
channel.
-Since the powers are simply sums of squares they can be used
+Since the powers are simply sums of squares, they can be used
for comparing one sample to another and ultimately for mapping
-a midi velocity to a matching sample.
+a MIDI velocity to a matching sample.
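A minimal sketch of this power computation, assuming the main-channel audio is available as a buffer of floating point samples (the function name is an assumption):

#include <algorithm>
#include <cstddef>
#include <vector>

// Sum of squares over the first n audio samples of the main channel.
double stroke_power(const std::vector<float>& main_channel, std::size_t n)
{
    double power = 0.0;
    const std::size_t end = std::min(n, main_channel.size());
    for (std::size_t i = 0; i < end; ++i)
        power += static_cast<double>(main_channel[i]) * main_channel[i];
    return power;
}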
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -446,7 +446,7 @@ After reading the drum kit, requests of the form $(i, p) \in I \times \mathbb{R}
% \todoandre{Make terminology and notation clear and check for consistency in the document.}
\subsection{Notation and Terminology}
-We use the following notation throughout this article. An \emph{instrument} is considered to be one of the drums of the drum kit that we sampled. A \emph{sample} (denoted by $s, s', \dots$) is the recording of one hit on a specific instrument. Given a sample $s$, the \emph{power} of it (denoted by $p_s, p_s', \dots$) is the perceived loudness and can be expressed by any common loudness measure of an audio clip. If a power value is requested and does not correspond to a sample, we denote it by $p, p', \dots$. With the term \emph{velocity} (denoted by $v, v', \dots$), we refer to the attack velocity of a MIDI note and it is thus between 0 and 127. We consider time in a discretized way and thus a \emph{time point} is an integer value intuitively referring to the number of time steps passed since the beginning of time. For a sample $s$, we refer with $t_s$ to the time point at which the sample was played last.
+We use the following notation throughout this article. An \emph{instrument} is considered to be one of the drums of the drum kit that we sampled. A \emph{sample} (denoted by $s, s', \dots$) is the recording of one hit on a specific instrument. Given a sample $s$, its \emph{power} $p_s$ is the perceived loudness and can be expressed by any common loudness measure of an audio clip. If a power value is requested and does not correspond to a sample, we denote it by $p, p', \dots$. With the term \emph{velocity}, we refer to the attack velocity of a MIDI note and it is thus between 0 and 127. We consider time in a discretized way and thus a \emph{time point} is an integer value intuitively referring to the number of time steps passed since the beginning of time. For a sample $s$, we use $t_s$ to refer to the time point at which the sample was played last.
\section{Requirements} \label{sec:requirements}
@@ -458,11 +458,11 @@ More concretely, we aim to fulfill the following requirements with our proposed
\begin{description}
\item[Close sample:] The chosen sample should be reasonably close to the requested power value, such that the listener perceives it as being played at the same velocity.
\item[Avoid same samples:] When we have multiple samples to choose from, we should always take one that was last played far enough in the past to avoid a robotic sound.
- \item[Randomization:] Furthermore, to avoid patterns (like e.g. in round robin, where exactly every $n$th hit sounds the same when we have $n$ samples in our velocity group), we want some randomization.
- \item[Locality:] If two samples have a very similar power value, they should also be treated similarly by the algorithm. In other words, locally, samples should have almost the same probability of being chosen.
+ \item[Randomization:] Furthermore, to avoid patterns (as, e.g., in Round Robin, where exactly every $n$th hit sounds the same when we have $n$ samples in our velocity group), we want some randomization.
+ \item[Locality:] If two samples have a similar power value, they should also be treated similarly by the algorithm. In other words, locally, samples should have almost the same probability of being chosen.
\end{description}
-We now formalize the requirements stated above. Let $p, p'$ be two power levels. We define their dissimilarity to simply be their distance $\abs{p - p'}$. Thus, if $p$ is the input power value and $p'$ is the power value of the chosen sample, we want to minimize the above term. Let $s$ be a sample and $t_s$ the time point it was played last. When we are now queried for a sample at time $t$, then for $s$ to be a good sample, we want $t - t_s$ to be reasonably high. Again, we just use the distance between the current time step and the last time step a sample was used. Randomization is difficult to formalize in a simple way in this context, thus, we simply require that for the same history, different outcomes of choosing a sample should be possible. The last requirement we also state in a more intuitive than formal way. Assume we are requested a sample for the power value $p$ and the two samples $s, s'$ have a very similar power value. Then, if we exchange $t_s$ and $t_{s'}$, the probability of choosing $s$ over $s'$ should be roughly the same as if we do not exchange them.
+We now formalize the requirements stated above. Let $p, p'$ be two power levels. We define their dissimilarity to simply be their distance $\abs{p - p'}$. Thus, if $p$ is the input power value and $p'$ is the power value of the chosen sample, we want to minimize the above term. Let $s$ be a sample and $t_s$ the time point it was played last. When we are now queried for a sample at time $t$, then for $s$ to be a good sample, we want $t - t_s$ to be reasonably high. Again, we just use the distance between the current time step and the last time step a sample was used. Randomization is difficult to formalize in a simple way in this context, thus, we just require that for the same history, different outcomes of choosing a sample should be possible. The last requirement we state in an intuitive rather than formal way. Assume a sample is requested for the power value $p$ and the two samples $s, s'$ have a very similar power value. Then, if we exchange $t_s$ and $t_{s'}$, the probability of choosing $s$ over $s'$ should be roughly the same as if we do not exchange them.
\section{Algorithm}
% Before we start describing the approach that we decided for, let us first discuss other approaches that might seem promising at this point of the discussion. \todoandre{Also quickly discuss other alternative models that are following naturally from above observations; but wait until Bent wrote the related work part to make sure to be consistent with that.}
@@ -471,14 +471,14 @@ We now formalize the requirements stated above. Let $p, p'$ be two power levels.
In this section we discuss the new algorithm that we suggest for sample selection. The requirements mentioned in Section \ref{sec:requirements} consist of several different objectives. Thus, we are dealing with a multi-objective optimization, where we have to somehow combine the different objectives into one. As we are dealing with an interactive setting where the quality of the solution of the optimization is determined by the user, it seems natural to make the algorithm parametrized and expose the parameters to the user. Using these parameters, the user can influence how the different objectives are weighted and thereby influence the final result, just having a very rough intuitive and non-technical understanding of the process.
% \todoandre{Talk about the general idea of a potential function.}
-Following from the above description, we choose a single objective function that we optimize. This objective function consists of the terms described in Section \ref{sec:requirements} as well as the parameters that are factors in front of the requirements terms. We formulate our objective function such that smaller values are better and thus we end up with a minimization problem. Consequently, we just have to evaluate a single term on each sample and then pick the one which has the best value.
+Following from the above description, we choose a single objective function that we optimize. This objective function consists of the terms that were roughly outlined in Section \ref{sec:requirements} as well as the parameters that are factors in front of the requirements terms. We formulate our objective function such that smaller values are better and thus we end up with a minimization problem. Consequently, we just have to evaluate a single term on each sample and then pick the sample which has the smallest value.
% \todoandre{Introduce the potential function of the algorithm.}
We now give the objective function. Let $p$ be the power that is requested. As before, for any sample $s$, let $t_s$ be the time at which $s$ was played last (the unit does not matter as it is parametrized by $\beta$ anyway), and let $r(s,t)$ be a random number generator producing numbers in the range $[0,1]$ uniformly and independently at random. Let $\alpha, \beta, \gamma > 0$ be the parameters that are later exposed to the user of the sample algorithm. Also, recall that $p_{\min}, p_{\max}$ is the minimal and maximal power value, respectively, and that $S$ is the sample rate, i.e., the number of time steps per second. At the current time $t$, we now want to find the sample $s$ minimizing the objective function
\[
f(s, t) \coloneqq \alpha \cdot \left( \frac{p-p_s}{p_{\max} - p_{\min}}\right)^2 + \beta \cdot \left( 1 + \frac{t - t_s}{S}\right)^{-1} + \gamma \cdot r(s,t).
\]
-Note that we have to ensure $t_s \neq t$ to avoid division by zero.
+Note that we have to ensure $p_{\max} \neq p_{\min}$ to avoid division by zero.
% \todoandre{Talk about the single terms of the potential function.}
Let us now consider the objective function in more detail. The objective function consists of three summands and we will have a closer look at them in order.
@@ -487,7 +487,7 @@ The first summand, namely
\[
\alpha \cdot \left( \frac{p-p_s}{p_{\max} - p_{\min}}\right)^2,
\]
-is for expressing the distance of the sample's power value $p_s$ to the desired power value $p$. Instead of using the absolute value $\abs{p-p_s}$ as discussed in Section \ref{sec:requirements}, we first normalize the value to be in the range $[0,1]$ and then square it. By squaring, we put a significantly stronger penalty than the absolute value on samples whose power value is very far from the requested power value. Note that if we request a power value that lies in the middle of the power values of our sample set, then this term will maximally be around $\frac{1}{4}\alpha$. However, if $p = p_{\max}$ or $p = p_{\min}$, then we might obtain a value of $\alpha$. While this might seem unreasonable at first glance, we want to highlight that this indeed matches the desired behavior, because the penalty should be only dependent on the distance to the sample and not the possible worst case value it can attain. In other words, a bad choice should not be make to look better, if there are much worse choices possible.
+is for expressing the distance of the sample's power value $p_s$ to the desired power value $p$. Instead of using the absolute value $\abs{p-p_s}$ as discussed in Section \ref{sec:requirements}, we first normalize the value to be in the range $[0,1]$ and then square it. By squaring, we put a significantly stronger penalty than the absolute value on samples whose power value is very far from the requested power value. Note that if we request a power value that lies in the middle of the power values of our sample set, then this term will maximally be around $\frac{1}{4}\alpha$. However, if $p = p_{\max}$ or $p = p_{\min}$, then we might obtain a value of $\alpha$. While this might seem unreasonable at first glance, we want to highlight that this indeed matches the desired behavior, because the penalty should be only dependent on the distance to the sample and not the possible worst case value it can attain. In other words, a bad choice should not be made to look better, if there are much worse choices possible.
The second summand, namely
\[
@@ -505,10 +505,10 @@ just adds some noise to the process to make it non-deterministic and thus avoid
We already explained the core part of the sample selection algorithm. The remainder is now straight-forward. We simply evaluate the objective function for each sample and then pick the one with the smallest value. For completeness, Algorithm \ref{alg:sampling} shows the pseudo code for the sample selection algorithm.
\begin{algorithm}
\begin{algorithmic}
- \Require Requested power $p$, Instrument $i$, current time step $t$, parameters $\alpha, \beta, \gamma$, and array $\mathit{last}$ with the time points a sample has been played last
+ \Require Requested power $p$, instrument $i$, current time step $t$, parameters $\alpha, \beta, \gamma$, and array $\mathit{last}$ with the time points a sample has been played last
\Ensure Sample $s$
- \State $s = \text{undefined}$
- \State $f_{\min} = \infty$
+ \State $s \gets \text{undefined}$
+ \State $f_{\min} \gets \infty$
\For{$s' \in S_i$}
\State $v \gets \alpha \cdot \left( \frac{p-p_{s'}}{p_{\max} - p_{\min}}\right)^2 + \beta \cdot \left( 1 + \frac{t - \mathit{last}[s']}{S}\right)^{-1} + \gamma \cdot r(s',t)$
\If{$v < f_{\min}$}
@@ -519,30 +519,30 @@ We already explained the core part of the sample selection algorithm. The remain
\State $\mathit{last}[s] = t$
\State \Return $s$
\end{algorithmic}
-\caption{This is the high-level code of the sampling function.}
+\caption{This is the pseudo code of the sampling function.}
\label{alg:sampling}
\end{algorithm}
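For illustration, a direct C++ transcription of this selection loop; the data layout, parameter passing, and names are assumptions for the sketch and do not mirror DrumGizmo's internal interfaces.

#include <limits>
#include <random>
#include <vector>

struct Sample {
    double power;        // precomputed power value p_s
    long long last_time; // t_s: time step at which the sample was last played
};

// Evaluate f(s, t) for every sample of the instrument and return the
// index of the minimizer; also updates its last-played time.
int select_sample(std::vector<Sample>& samples, double p, long long t,
                  double alpha, double beta, double gamma,
                  double p_min, double p_max, double sample_rate,
                  std::mt19937& rng)
{
    std::uniform_real_distribution<double> r(0.0, 1.0);  // the term r(s, t)
    int best = -1;
    double f_min = std::numeric_limits<double>::infinity();
    for (int i = 0; i < static_cast<int>(samples.size()); ++i) {
        const double d   = (p - samples[i].power) / (p_max - p_min);
        const double age = 1.0 + (t - samples[i].last_time) / sample_rate;
        const double f   = alpha * d * d + beta / age + gamma * r(rng);
        if (f < f_min) { f_min = f; best = i; }
    }
    if (best >= 0) samples[best].last_time = t;
    return best;
}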
-Note that the worst-case complexity of evaluating the objective function is linear in the number of samples for the instrument that we are considering. However, in practice we can avoid evaluation for most samples by simply starting with the \enquote{most promising} sample and recursively evaluate the neighbors until the future possible evaluations cannot beat the currently best value.
+Note that the worst-case complexity of evaluating the objective function is linear in the number of samples for the instrument that we are considering. However, in practice we can avoid evaluation for most samples by simply starting with the \enquote{most promising} sample and recursively evaluating the neighbors (with respect to power value) until the future possible evaluations cannot beat the currently best value.
\section{Emulating Other Sample Selection Algorithms}
One of the main advantages of the described sampling algorithm is that it can emulate the most common sample choice algorithms.
-Sometimes this can be done with just adjusting the parameters $\alpha, \beta, \gamma$, and sometimes we have to prepare the power values of the drum kit accordingly.
-In the following we describe which algorithms can be emulated and how we have to set the parameters and power values for that.
+Sometimes this can be done by just adjusting the parameters $\alpha, \beta, \gamma$, and sometimes we have to prepare the power values of the drum kit accordingly.
+In the following, we describe which algorithms can be emulated and how we have to set the parameters and power values for that.
-First, note that all extreme choices of the parameters -- meaning that we set one parameter of $\alpha, \beta, \gamma$ to a positive value and all other to zero -- emulate different selection algorithms.
+First, note that all extreme choices of the parameters -- meaning that we set one parameter of $\alpha, \beta, \gamma$ to a positive value and all others to zero -- emulate different selection algorithms.
-\paragraph{Choose Closest.} If we set $\alpha > 0$ and $\beta = \gamma = 0$, then the objective function reduces to the first summand and thus we just alway choose the sample $s$ that minimizes $\abs{p - p_s}$, i.e., the closest sample.
+\paragraph{Choose Closest.} If we set $\alpha > 0$ and $\beta = \gamma = 0$, then the objective function reduces to the first summand and thus we just always choose the sample $s$ that minimizes $\abs{p - p_s}$, i.e., the closest sample.
-\paragraph{Choose Oldest.} Similarly, if $\beta > 0$ but $\alpha = \gamma = 0$, then the objective function reduces to the second summand and thus is minimal by the sample $s$ that maximizes $t-t_s$, i.e., the oldest sample.
+\paragraph{Choose Oldest.} Similarly, if $\beta > 0$ but $\alpha = \gamma = 0$, then the objective function reduces to the second summand and thus is minimized by the sample $s$ that maximizes $t-t_s$, i.e., the sample that was last played the furthest back in time.
\paragraph{Random Selection.} If now $\gamma > 0$ and $\alpha = \beta = 0$, then the objective function reduces to the third summand and we thus always select a sample uniformly at random.
-\paragraph{Round Robin.} The previously mentioned emulations were straight forward, however, the arguably most commonly used sample selection algorithm in practice is round robin. As already discussed in Section \ref{sec:introduction}, round robin assumes the samples to already be grouped. In our case this means that samples $s_1, \dots,s_k$ that belong to the same group should all be assigned the same power value, i.e., $p_{s_1} = \cdots = p_{s_k}$. Now, if there is a query with power value $p$, we want to always choose the closest group of samples, thus $\alpha$ should be large compared to $\beta$ and $\gamma$, e.g. $\alpha = 1$. After restricting to the samples of a specific group, we now always want to play the oldest sample, thus we simply want a small value $\beta > 0$, say $\beta = 1/1000$. If we additionally want to randomize round robin in a way that we sometimes choose the second or third oldest sample, then we want to set $\gamma$ to a small value smaller than but in a similar range as $\beta$, say $\gamma = 1/4000$.
+\paragraph{Round Robin.} The previously mentioned emulations were straightforward; however, the arguably most commonly used sample selection algorithm in practice is Round Robin. As already discussed in Section \ref{sec:introduction}, Round Robin assumes the samples to already be grouped. In our case this means that samples $s_1, \dots,s_k$ that belong to the same group should all be assigned the same power value, i.e., $p_{s_1} = \cdots = p_{s_k}$. Now, if there is a query with power value $p$, we want to always choose the closest group of samples, thus $\alpha$ should be large compared to $\beta$ and $\gamma$, e.g., $\alpha = 1$. After restricting to the samples of a specific group, we now always want to play the oldest sample, thus we simply want a small value $\beta > 0$, say $\beta = 1/1000$. If we additionally want to randomize Round Robin in a way that we sometimes choose the second or third oldest sample, then we want to set $\gamma$ to a small value smaller than but in a similar range as $\beta$, say $\gamma = 1/4000$.
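As a compact summary, the following illustrative parameter presets (struct and constant names are assumptions) reproduce the emulations described above; for the Round Robin preset, the samples of each group must additionally be assigned one common power value.

// Hypothetical presets for the parameter triple (alpha, beta, gamma).
struct SelectionParams { double alpha, beta, gamma; };

const SelectionParams kChooseClosest {1.0, 0.0, 0.0};               // nearest power only
const SelectionParams kChooseOldest  {0.0, 1.0, 0.0};               // oldest sample only
const SelectionParams kRandom        {0.0, 0.0, 1.0};               // uniform at random
const SelectionParams kRoundRobin    {1.0, 1.0 / 1000, 1.0 / 4000}; // grouped powers assumed

Switching between the emulated behaviors then only amounts to passing a different preset to the selection routine.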
\section{Implementation} \label{sec:implementation}
%\todobent{Give a short introduction to DrumGizmo, including a link to the git repository.}
-We added our new sampling algorithm to DrumGizmo, replacing the one it previously used.
+We added our new sampling algorithm to DrumGizmo, replacing the one it previously used.\footnote{The source code is available through git at \url{git://git.drumgizmo.org/drumgizmo.git} and can be browsed online at \url{http://cgit.drumgizmo.org/drumgizmo.git/}.}
% \todoandre{Talk about how the sampling algorithm was implemented. What do we need to store?}
The sampling algorithm itself did not require any particular implementation efforts. Most of the time was spent on the theoretical part of it. To give a better overview over the technicalities, we briefly list the information that needs to be stored. In a preprocessing phase, we compute $p_{\min}$ and $p_{\max}$ for each instrument and set all values of the $\mathit{last}$ arrays to 0. The power values of the samples are given by the drum kit in DrumGizmo. The parameters $\alpha, \beta, \gamma$ have default values that were determined experimentally. Each of them can be changed by the user, either interactively in the GUI or via the command line interface.
@@ -550,7 +550,7 @@ The sampling algorithm itself did not require any particular implementation effo
As DrumGizmo is free software, the exact details of the implementation can be checked by everyone.
% \todoandre{Give less important implementation details, e.g., like adaptive search starting from the most promising value}
-For instruments with reasonably small sample sets, simply iterating over all samples for the specific instrument as shown in Algorithm \ref{alg:sampling} is sufficiently performant. However, imagine an instrument with an extremely large sample set. As DrumGizmo drum kits can be created by everyone, there is no restriction and we cannot assume a small sample size. To avoid performance issues arising from such a scenario, we employ a non-naive search by starting with the \enquote{most promising} sample and then inspecting its neighbors until the currently best sample is known to dominate the remaining samples. More concretely, we do the following: we start with the sample $s$ that has the closest power value $p_s$ to the requested power value $p$, i.e., we find $s$ that minimizes $\abs{p - p_s}$. The key observation why a local search often suffices is that we can lower bound the second and third summand of the objective function by 0. Thus, for a given sample $s$ and a time point $t$, we have
+For instruments with reasonably small sample sets, simply iterating over all samples for the specific instrument as shown in Algorithm \ref{alg:sampling} is sufficiently performant. However, imagine an instrument with an extremely large sample set. As DrumGizmo drum kits can be created by everyone, there is no restriction and we cannot assume a small sample size. To avoid performance issues arising from such a scenario, we employ a non-naive search by starting with the \enquote{most promising} sample and then inspecting its neighbors until the currently best sample is known to dominate the remaining samples. More concretely, we do the following: we start with the sample $s$ that has the closest power value $p_s$ to the requested power value $p$, i.e., we find the sample $s$ that minimizes $\abs{p - p_s}$. The key observation why a local search often suffices is that we can lower bound the second and third summand of the objective function by 0. Thus, for a given sample $s$ and a time point $t$, we have
\begin{equation}\label{eq:lower_bound}
f(s,t) \geq \alpha \cdot \left( \frac{p-p_s}{p_{\max} - p_{\min}}\right)^2.
\end{equation}
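A sketch of this local search, assuming the samples are kept sorted by power value and the sample set is non-empty (all names are assumptions): we expand outward from the sample whose power is closest to $p$ and stop as soon as the $\alpha$-term lower bound of every unvisited sample already exceeds the best objective value found so far.

#include <cmath>
#include <random>
#include <vector>

struct Sample { double power; long long last_time; }; // as in the sketch above

int select_sample_local(std::vector<Sample>& samples, double p, long long t,
                        double alpha, double beta, double gamma,
                        double p_min, double p_max, double sample_rate,
                        std::mt19937& rng)
{
    std::uniform_real_distribution<double> r(0.0, 1.0);
    const int n = static_cast<int>(samples.size());

    // Index of the sample with power closest to p (a binary search over the
    // sorted powers in practice; a linear scan here for brevity).
    int start = 0;
    for (int i = 1; i < n; ++i)
        if (std::fabs(samples[i].power - p) < std::fabs(samples[start].power - p))
            start = i;

    auto objective = [&](int i) {
        const double d   = (p - samples[i].power) / (p_max - p_min);
        const double age = 1.0 + (t - samples[i].last_time) / sample_rate;
        return alpha * d * d + beta / age + gamma * r(rng);
    };
    auto bound = [&](int i) {        // alpha-term only, the lower bound above
        const double d = (p - samples[i].power) / (p_max - p_min);
        return alpha * d * d;
    };

    int best = start;
    double f_min = objective(start);
    int lo = start - 1, hi = start + 1;
    while (lo >= 0 || hi < n) {
        // Visit the frontier sample whose power is closer to p.
        const int i = (lo < 0) ? hi
                    : (hi >= n) ? lo
                    : (std::fabs(samples[lo].power - p) <
                       std::fabs(samples[hi].power - p)) ? lo : hi;
        if (bound(i) >= f_min) break;     // no remaining sample can beat f_min
        const double f = objective(i);
        if (f < f_min) { f_min = f; best = i; }
        if (i == lo) --lo; else ++hi;
    }
    samples[best].last_time = t;
    return best;
}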
@@ -567,11 +567,11 @@ Note that by traversing the samples in order of their distance to $p$ (which is
% \todoandre{Talk about the setup.}
To conduct the experiments, we use the implementation of the new sampling algorithm in DrumGizmo.
As a base-line for comparison, we use the previous sample selection algorithm of DrumGizmo.
-We want to evaluate how the sample selection algorithm performs in practice, therefore we use the drum kits of DrumGizmo. In particular, we use the power value distribution of the samples of those kits.
-We do experiments on three different drum kits.
+We want to evaluate how the sample selection algorithm performs in practice; therefore, we use a drum kit of DrumGizmo.
+% We do experiments on three different drum kits.
As normally the snare drum is the instrument with the highest number of samples, we choose this drum for our experiments.
% See Table \ref{tab:drumkit_data} for some information about the samples of the drum kits and see Figure \ref{fig:power_level_distribution} for a visualization of the power level distribution of the kits.
-For our experiments, we use the Crocell kit, which has 98 snare samples. See Figure \ref{fig:power_level_distribution} for a visualization of the power level distribution of the Crocell kit. In this plot, we can see that the sampling is heavy on the more quiet samples and significantly sparser on the louder samples. We ask the reader to keep this distribution in mind for the remainder. Due to this distribution, we expect that when playing notes all over the power spectrum, the number of times each individual sample is played will be higher for louder samples as there are significantly less to choose from.
+More precisely, we use the Crocell kit, which has 98 snare samples. In particular, we use the power value distribution of the samples of this kit. See Figure \ref{fig:power_level_distribution} for a visualization of the power level distribution. In this plot, we can see that the sampling is heavy on the quieter samples and significantly sparser on the louder samples. We ask the reader to keep this distribution in mind when considering the experimental results. Due to this distribution, we expect that when playing notes all over the power spectrum, the number of times each individual sample is played will be higher for louder samples, as there are significantly fewer to choose from.
% \begin{table}
% \caption{Drum kit data.}
@@ -591,7 +591,7 @@ For our experiments, we use the Crocell kit, which has 98 snare samples. See Fig
\begin{figure}
\centering
\includegraphics[width=.8\linewidth]{figures/power_level_distribution/crocell_sample_powers.pdf}
- \caption{This shows the power value distribution of the Crocell kit.}
+ \caption{The power value distribution of the Crocell kit.}
\label{fig:power_level_distribution}
\end{figure}
@@ -611,7 +611,7 @@ To test the above hypotheses, we conduct the following experiments:
\end{enumerate}
% \todoandre{Do beautiful tables and plots here}
-To be able to compare the results of those experiments, we have to somehow visualize the selections. We choose to plot histograms. More concretely, for a run of the experiment, we plot the number of times a sample was selected over its index in the array of ordered samples. For the first experiment, i.e., the sweeps, you can see the histograms in Figure \ref{fig:sweep}. The plot of the old sampling algorithm clearly is less homogeneous and has larger variance than the plot of the new sampling algorithm. Especially, the old sampling algorithm uses several samples a lot while barely using others, which can lead to a more robotic sound. Especially, it seems a waste of storage and sampling effort to not use the full potential of the data set.
+To be able to compare the results of these experiments, we have to somehow visualize the selections. We choose to plot histograms. More concretely, for a run of the experiment, we plot the number of times a sample was selected over its index in the array of ordered samples. For the first experiment, i.e., the sweeps, you can see the histograms in Figure \ref{fig:sweep}. The plot of the old sampling algorithm is clearly less homogeneous and has larger variance than the plot of the new sampling algorithm. In particular, the old sampling algorithm uses several samples a lot while barely using others, which can lead to a more robotic sound. Moreover, it seems a waste of storage and sampling effort to not use the full potential of the data.
\begin{figure*}
\includegraphics[width=.47\linewidth]{figures/sweep_old.pdf}
@@ -641,7 +641,7 @@ The second experiment we conducted for two different MIDI velocities: MIDI veloc
% \todoandre{Also do an experiment regarding the adaptive search}
To also get an idea of the performance of the new sampling algorithm, we want to see how many power values of samples are evaluated per query.
-Without the smart search optimization described at the end of Section \ref{sec:implementation}, this number would always be the number of samples. However, we expect that the smart search optimization significantly reduces the number of evaluations. To test this hypothesis, we take the above experiment and look at the number of evaluations.
+Without the search optimization described at the end of Section \ref{sec:implementation}, this number would always be the number of samples. However, we expect that the search optimization significantly reduces the number of evaluations. To test this hypothesis, we take the above experiment and look at the number of evaluations.
Recall that the Crocell kit contains 98 snare samples and thus naively evaluating the objective function would lead to 98 evaluations. You can see the outcome of the experiment in Table \ref{tab:evaluation_count}. The mean number of evaluations is significantly less (at most 14!) than the worst-case 98 evaluations per sample selection and also the variance is very low, meaning that even for large sample sets, this sample selection algorithm should work well in real-time scenarios.
% \begin{figure}
@@ -668,7 +668,7 @@ In summary, the experiments show that the new sampling algorithm is clearly supe
\section{Conclusion and Future Work}
% \todoandre{Recapitulate what was done in this paper. Highlight some of the difficulties and surprises.}
-This article presented a new algorithm for choosing samples in a setting where the samples are annotated with power values, with the desired behavior of choosing samples close to the input power value while having a reasonable diversity and no significant patterns in the sequence of selected samples. We first formulated the requirements, which lead us to an algorithm that we added to DrumGizmo. Through experiments we showed clear improvements over the old method and also the fulfillment of the desired requirements.
+This article presented a new algorithm for choosing samples in a setting where the samples are annotated with power values, with the desired behavior of choosing samples close to the input power value while having a reasonable diversity and no significant patterns in the sequence of selected samples. We first formulated the requirements, which led us to an algorithm that we added to DrumGizmo. Through experiments, we showed clear improvements over the old method and also the fulfillment of the desired requirements.
However, there are still some open problems and directions to be addressed in future work. First, in this article we assumed the power values of the samples to be given. We could alter the setting by allowing a transformation on the power values, thus, for example, distributing them better over the power range. Second, the objective function that we used could still be further refined. For example, the diversity term (controlled by parameter $\beta$) could be modified to have a relatively larger penalty for samples that were played very recently, e.g., one could move away from just using linear terms. Third, having a more general view, one could try to adapt our work to instruments that are significantly different from drums. And finally, the real quality of this work should be determined by the users themselves. In this direction, a user study could lead to insights about how to further refine our approach.
@@ -681,7 +681,7 @@ However, there are still some open problems and directions to be addressed in fu
%\newpage
% \nocite{*}
-% \bibliographystyle{IEEEbib}
-% \bibliography{LAC-20} % requires file lac-20.bib
+\bibliographystyle{IEEEbib}
+\bibliography{LAC-20} % requires file lac-20.bib
\end{document}