author	André Nusser <andre.nusser@googlemail.com>	2020-02-02 12:54:06 +0100
committer	André Nusser <andre.nusser@googlemail.com>	2020-02-02 12:54:06 +0100
commit	d33334b4914e8a75e822bf1919a61ca9f2d9cae3 (patch)
tree	12d0db6085a57ec3395da5dee1c0c816bc8294a2
parent	108435bab43c109d3bfc9ffd9432a66706a18345 (diff)
Continued writing on description of the algorithm.
-rw-r--r--	sampling_alg_lac2020/LAC-20.tex	41
1 file changed, 31 insertions, 10 deletions
diff --git a/sampling_alg_lac2020/LAC-20.tex b/sampling_alg_lac2020/LAC-20.tex
index f509237..a16f82d 100644
--- a/sampling_alg_lac2020/LAC-20.tex
+++ b/sampling_alg_lac2020/LAC-20.tex
@@ -285,10 +285,10 @@ To the best of our knowledge, this is the first academic article that deals with
\todobent{Talk about loudness computation of samples.}
\todo{Mathematical basics (if there are any important ones).}
\todo{Formalize the setting, i.e.\ what is the input/output of our algorithm?}
-\todoandre{Make terminology and notation clear and check for consistency in the document.}
+% \todoandre{Make terminology and notation clear and check for consistency in the document.}
\subsection{Notation and Terminology}
-We use the following notation throughout this article. An \emph{instrument} is considered to be one of the drums of the drum kit that we sampled. A \emph{sample} (denoted by $s, s', \dots$) is recording of one hit on a specific instrument. The \emph{power} of a sample (denoted by $p, p', \dots$) is the perceived loudness and can be expressed by any common loudness measure of an audio clip. With the term \emph{velocity} (denoted by $v, v', \dots$), we refer to the attack velocity of a MIDI note and it is thus between 0 and 127. We consider time in a discretized way and thus a \emph{time point} is an integer value intuitively referring to the number of time steps passed since the beginning of time. For a sample $s$, we refer with $t_s$ to the time point at which the sample was played last.
+We use the following notation throughout this article. An \emph{instrument} is considered to be one of the drums of the drum kit that we sampled. A \emph{sample} (denoted by $s, s', \dots$) is a recording of one hit on a specific instrument. Given a sample $s$, its \emph{power} (denoted by $p_s, p_{s'}, \dots$) is the perceived loudness and can be expressed by any common loudness measure of an audio clip. If a power value is requested and does not correspond to a sample, we denote it by $p, p', \dots$. With the term \emph{velocity} (denoted by $v, v', \dots$), we refer to the attack velocity of a MIDI note, which thus lies between 0 and 127. We consider time in a discretized way, and thus a \emph{time point} is an integer value intuitively referring to the number of time steps passed since the beginning of time. For a sample $s$, we write $t_s$ for the time point at which the sample was played last.
\section{Requirements} \label{sec:requirements}
@@ -304,26 +304,47 @@ More concretely, we aim to fulfill the following requirements with our proposed
\item[Locality:] If two samples have a very similar power value, they should also be treated similarly by the algorithm. In other words, locally, samples should have almost the same probability of being chosen.
\end{description}
-We now formalize the requirements stated above. Let $p, p'$ be two power levels. We define their dissimilarity to simply be their distance $\abs{p - p'}$. Thus, if $p$ is the input power value and $p'$ is the power value of the chosen sample, we want to minimize the above term. Let $s$ be a sample and $t_s$ the time point it was played last. When we are now queried for a sample at time $t'$, then for $s$ to be a good sample, we want $\abs{t_s - t'}$ to be reasonably high. Again, we just use the distance between the current time step and the last time step a sample was used. Randomization is difficult to formalize in a simple way in this context, thus, we simply require that for the same history, different outcomes of choosing a sample should be possible. The last requirement we also state in a more intuitive than formal way. Assume we are requested a sample for the power value $p$ and the two samples $s, s'$ have a very similar power value. Then, if we exchange $t_s$ and $t_{s'}$, the probability of choosing $s$ over $s'$ should be roughly the same as if we don't exchange them.
+We now formalize the requirements stated above. Let $p, p'$ be two power levels. We define their dissimilarity to simply be their distance $\abs{p - p'}$. Thus, if $p$ is the input power value and $p'$ is the power value of the chosen sample, we want to minimize this term. Let $s$ be a sample and $t_s$ the time point at which it was played last. When we are queried for a sample at time $t$, then for $s$ to be a good choice, we want $t - t_s$ to be reasonably large. Again, we simply use the distance between the current time step and the last time step at which the sample was used. Randomization is difficult to formalize in a simple way in this context; thus, we simply require that for the same history, different outcomes of choosing a sample should be possible. We state the last requirement in a more intuitive than formal way as well. Assume we are asked for a sample for the power value $p$ and the two samples $s, s'$ have a very similar power value. Then, if we exchange $t_s$ and $t_{s'}$, the probability of choosing $s$ over $s'$ should be roughly the same as if we do not exchange them.
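One way to state this locality property more symbolically (this formulation is only an illustration and is not notation used elsewhere in the paper): if $\abs{p_s - p_{s'}}$ is small, then for all time points $a, b$ and any fixed history of the remaining samples,
\[
  \Pr\bigl[\text{$s$ is chosen} \mid t_s = a,\, t_{s'} = b\bigr] \approx \Pr\bigl[\text{$s$ is chosen} \mid t_s = b,\, t_{s'} = a\bigr].
\]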
\section{Algorithm}
Before describing the approach that we decided on, let us first discuss other approaches that might seem promising at this point of the discussion. \todoandre{Also quickly discuss other alternative models that are following naturally from above observations; but wait until Bent wrote the related work part to make sure to be consistent with that.}
% \todoandre{We have a multi-criteria optimization!}
-The requirements mentioned in Section \ref{sec:requirements} consist of several different objectives. Thus, we are dealing with a multi-objective optimization, where we have to somehow combine the different objectives into one. As we are dealing with an interactive setting where the quality of the solution of this optimization is determined by the user, it seems natural to make the algorithm parametrized. By these parameters, the user can influence how the different objectives are weighted and thereby influence the final result just having a very rough intuitive and non-technical understanding of the process.
+The requirements mentioned in Section \ref{sec:requirements} consist of several different objectives. Thus, we are dealing with a multi-objective optimization, where we have to somehow combine the different objectives into one. As we are dealing with an interactive setting, where the quality of the solution of the optimization is judged by the user, it seems natural to parametrize the algorithm and expose the parameters to the user. Using these parameters, the user can influence how the different objectives are weighted and thereby shape the final result while needing only a rough, intuitive, and non-technical understanding of the process.
% \todoandre{Talk about the general idea of a potential function.}
-Following from the above description, we choose a single objective function that we optimize. This objective function consists of the terms described in Section \ref{sec:requirements} as well as the parameters that are factors in front of the requirements terms. We formulate our objective function such that smaller values are better and thus we end up with a minimization problem. Thus, we just have to evaluate a single term on each sample and then pick the one which has the best value.
+Following from the above description, we choose a single objective function that we optimize. This objective function consists of the terms described in Section \ref{sec:requirements}, weighted by parameters that appear as factors in front of the respective terms. We formulate our objective function such that smaller values are better, and thus we end up with a minimization problem. Consequently, we only have to evaluate a single expression for each sample and then pick the sample with the best value.
-\todoandre{Introduce the potential function of the algorithm.}
-We now give the objective function. Let $p$ be the power that is requested. As before, for any sample $s$, let $t_s$ be the time at which $s$ was played last (the unit does not matter as it is parametrized by $\beta$ anyway), and let $r(s,t)$ be a random number generator producing numbers in the range $[0,1]$ uniformly and independently at random. At the current time $t$, we now want to find the sample $s$ minimizing the objective function
+% \todoandre{Introduce the potential function of the algorithm.}
+We now give the objective function. Let $p$ be the power that is requested. As before, for any sample $s$, let $t_s$ be the time at which $s$ was played last (the unit does not matter, as it is parametrized by $\beta$ anyway), and let $r(s,t)$ be a random number generator producing numbers in the range $[0,1]$ uniformly and independently at random. Let $\alpha, \beta, \gamma > 0$ be the parameters that are later exposed to the user of the sampling algorithm. Also, recall that $p_{\min}$ and $p_{\max}$ are the minimal and maximal power values, respectively, and that $S$ is the sample rate, i.e., the number of time steps per second. At the current time $t$, we now want to find the sample $s$ minimizing the objective function
\[
- f(s, t) \coloneqq \alpha \cdot (p-p_s)^2 + \beta \cdot (t_s - t)^{-2} + \gamma \cdot r(s,t),
+ f(s, t) \coloneqq \alpha \cdot \left( \frac{p-p_s}{p_{\max} - p_{\min}}\right)^2 + \beta \cdot \left( 1 + \frac{t - t_s}{S}\right)^{-1} + \gamma \cdot r(s,t).
\]
-where $p_s$ is the power of sample $s$. We have to ensure that $t_q \neq t$ to avoid division by zero.\todo{Adapt to what is actually in the code. Also introduce the parameters $\alpha, \dots$ beforehand.}
+Note that the \enquote{+1} in the second term ensures that no division by zero occurs, even if $t_s = t$.
+
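The following is a minimal sketch of this selection rule in C++; the struct, the function names, and the use of a standard Mersenne-Twister generator for $r(s,t)$ are assumptions made for this illustration and are not taken from the paper or any particular code base.

#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Illustrative sample representation: power p_s and the time step t_s
// at which the sample was last played.
struct Sample
{
	float power;
	int last_played;
};

// Returns the index of the sample minimizing f(s, t) for the requested
// power p at the current time step t.
std::size_t select_sample(const std::vector<Sample>& samples, float p, int t,
                          float alpha, float beta, float gamma,
                          float p_min, float p_max, float sample_rate,
                          std::mt19937& rng)
{
	std::uniform_real_distribution<float> r(0.0f, 1.0f); // r(s, t)
	std::size_t best = 0;
	float best_value = std::numeric_limits<float>::max();
	for (std::size_t i = 0; i < samples.size(); ++i)
	{
		float dp = (p - samples[i].power) / (p_max - p_min);   // normalized power distance
		float dt = (t - samples[i].last_played) / sample_rate; // seconds since last use
		float f = alpha * dp * dp     // closeness in power
		        + beta / (1.0f + dt)  // penalty for recently used samples
		        + gamma * r(rng);     // randomization
		if (f < best_value)
		{
			best_value = f;
			best = i;
		}
	}
	return best;
}

After a sample has been selected, its last-played time point would be updated to the current time step $t$ before the next query.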
+% \todoandre{Talk about the single terms of the potential function.}
+Let us now consider the objective function in more detail. There are three terms in the sum and we will have a closer look at them in order.
+
+The first term, namely
+\[
+ \alpha \cdot \left( \frac{p-p_s}{p_{\max} - p_{\min}}\right)^2,
+\]
+expresses the distance of the sample's power value $p_s$ to the desired power value $p$. Instead of using the absolute value $\abs{p-p_s}$ as discussed in Section \ref{sec:requirements}, we first normalize the difference by the total power range and then square it. By squaring, we put a significantly stronger penalty than the absolute value on samples whose power value is very far from the requested power value. Note that if we request a power value that lies in the middle of the power values of our sample set, then this term is at most around $\frac{1}{4}\alpha$. However, if $p = p_{\max}$ or $p = p_{\min}$, then we might obtain a value of $\alpha$. While this might seem unreasonable at first glance, we want to highlight that this indeed matches the desired behavior, because the penalty should depend only on the distance to the sample and not on the possible worst-case value it can attain. In other words, a bad choice should not be made to look better just because much worse choices are possible.
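As a small numerical illustration (the concrete values are chosen only for this example): suppose $p_{\min} = 0$ and $p_{\max} = 100$. A request for $p = 50$ incurs at most $\alpha \cdot (50/100)^2 = \frac{1}{4}\alpha$, attained for $p_s = 0$ or $p_s = 100$, whereas a request for $p = 100$ answered with a sample of power $p_s = 0$ incurs the full penalty of $\alpha \cdot (100/100)^2 = \alpha$.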
+
+The second term, namely
+\[
+ \beta \cdot \left( 1 + \frac{t - t_s}{S}\right)^{-1},
+\]
+expresses how much time has passed since we last played the sample for which we are currently evaluating the objective function. However, we want large values if only little time has passed, and therefore we raise the whole term to the power of $-1$. To avoid extreme values, we add 1 to the normalized distance between the current time and the last time the sample was played. Note that without the \enquote{+1}, this term would take huge values for very small $t - t_s$ and would thus dominate the whole objective function. We normalize by the sample rate, as we want the time distance to be in seconds and not in samples.
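For illustration (the sample rate is only an example value): with $S = 44100$ and a sample that was last played one second ago, i.e., $t - t_s = S$, the term evaluates to $\beta \cdot (1 + 1)^{-1} = \frac{1}{2}\beta$; for a sample played in the very same time step, i.e., $t - t_s = 0$, it evaluates to exactly $\beta$, whereas without the \enquote{+1} it would be unbounded.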
+
+The third term, namely
+\[
+ \gamma \cdot r(s,t),
+\]
+just adds some noise to the process to make it non-deterministic and thus avoid patterns in the selection, as mentioned in Section \ref{sec:requirements}. Since $r(s,t)$ lies in $[0,1]$, this contribution is bounded by $\gamma$, so the parameter directly controls how strongly the randomization can override the other two criteria.
-\todoandre{Talk about the single terms of the potential function.}
\todoandre{Maybe add some pseudo-code to make things easier to understand?}
+Algorithm \ref{alg:sampling} shows the pseudocode of the sample selection algorithm.
\begin{algorithm}
\begin{algorithmic}