[mlpack] Inquiry into SGD
tu.stephenl at gmail.com
Tue Mar 14 01:31:04 EDT 2017
I think these definitions are for analysis purposes only. Certainly, the
implementation should not try to track these or "break ties at random"--
that defeats the whole purpose! In fact, I think part of this project could
be to understand (empirically) how different consistency levels (meaning,
when you do a write to the model, what, if any mfences do you use, etc) in
the algorithm affect convergence.
As for memory partitioning, this is sort of open ended. I've always
wondered if a more "columnar" storage format (ie storing the data by
features instead of by data points) has any benefits-- this was inspired by
a lot of the literature on column stores in databases. But you are free to
propose other ideas.
On Sun, Mar 12, 2017 at 9:33 PM, 孟憲妤 <spur398 at gmail.com> wrote:
> Thanks for your reply.
> In page 4 of the paper : Even though the processors
> have no knowledge as to whether any of the other processors have modified
> x, we define x j to be
> the state of the decision variable x after j updates have been performed 1
> . Since two processors can
> write to x at the same time, we need to be a bit careful with this
> definition, but we simply break ties
> at random.
> I don't fully understand what does it mean by "break ties at random". Does
> it indicate that we should store all x_k(j) for evaluating G_j(x_j) ? How
> do we know that the state we read is the newest state(no other process is
> writing) ?
> In https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#para
> llel-stochastic-optimization-methods mentioned that we should consider
> how to partition data in memory for good cache locality. Does it suggest
> that we should consider the size of L1 L2 cache and the block size and
> design the size of array not to be prone to cache miss?
> Thanks for your patience.
> 2017-03-08 15:35 GMT+08:00 Stephen Tu <tu.stephenl at gmail.com>:
>> The practical way to run parallel SGD is to use a constant stepsize per
>> epoch, and then set the step size \eta <- \eta * \rho for some \rho \in
>> (0,1) (typical value is like 0.9). I would not recommend changing the step
>> size at every time. See the discussion in Section 5 (Robust 1/k rates).
>> I don't understand your second question since there shouldn't be locks?
>> On Tue, Mar 7, 2017 at 5:57 AM, 孟憲妤 <spur398 at gmail.com> wrote:
>>> I'm HsienYu Meng from Taiwan. I'm currently in my first year of graduate
>>> studies at Tsinghua Univ. As you mentioned, the SGD project should
>>> implement HOGWILD! algorithm and consider to set stepsize at each
>>> iteration. However, the paper indicates that they assume the stepsize to be
>>> a constant value so that the formula 7 make sense. I wonder if we change
>>> the stepsize at each time, will the properties of HOGWILD! still holds?
>>> Another question is, if the dataset is large and we partition it in memory,
>>> which kind of losk-release should we apply? Sequential consistency, Release
>>> consistency or lazy release consistency?
>>> Appreciate for your patience .Thanks!
>>> *MengHsienyu**Department of Computer Science ,Tsinghua university *
>>> *Beijing ,China*
>>> *Mobile:(+86)18201162149 <+86%20182%200116%202149>*
>>> * (+886)0952424693 <+886%20952%20424%20693>*
>>> mlpack mailing list
>>> mlpack at lists.mlpack.org
> *MengXianYu**Department of Electronic Engineering ,Tsinghua university *
> *Beijing ,China*
> *Mobile:(+86)18201162149 <+86%20182%200116%202149>*
> * (+886)0952424693 <+886%20952%20424%20693>*
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the mlpack