IEEE COMMUNICATIONS LETTERS, VOL. 23, NO. 1, JANUARY 2019

Erasure Coding for Distributed Matrix Multiplication for Matrices With Bounded Entries

Li Tang, Konstantinos Konstantinidis, and Aditya Ramamoorthy

Abstract—Distributed matrix multiplication is widely used in several scientific domains. It is well recognized that computation times on distributed clusters are often dominated by the slowest workers (called stragglers). Recent work has demonstrated that straggler mitigation can be viewed as a problem of designing erasure codes. For matrices A and B, the technique essentially maps the computation of A^T B into the multiplication of smaller (coded) submatrices. The stragglers are treated as erasures in this process. The computation can be completed as long as a certain number of workers (called the recovery threshold) complete their assigned tasks. We present a novel coding strategy for this problem when the absolute values of the matrix entries are sufficiently small. We demonstrate a tradeoff between the assumed absolute value bounds on the matrix entries and the recovery threshold. At one extreme, we are optimal with respect to the recovery threshold, and at the other extreme, we match the threshold of prior work. Experimental results on cloud-based clusters validate the benefits of our method.

Index Terms—Distributed computing, erasure codes, stragglers.

I. INTRODUCTION

The multiplication of large-dimensional matrices is a key problem that is at the heart of several big data computations. For example, high-dimensional deep learning problems often require matrix-vector products at every iteration. In most of these problems the sheer size of the matrices precludes computation on a single machine. Accordingly, the computation is typically performed in a distributed fashion across several computation units (or workers). The overall job execution time in these systems is typically dominated by the slowest worker; this is often referred to as the "straggler problem".

In recent years, techniques from coding theory have been efficiently utilized in mitigating the effect of stragglers. As pointed out in [1] (see [1, Appendix B]), this issue can be viewed as equivalent to coding for fault tolerance over a channel where the stragglers can be viewed as erasures.

More specifically, the work of [1] considers the distributed computation of the product of two large matrices A^T and B. Matrices A and B are first partitioned into p × m and p × n blocks of submatrices of equal size by the master node. Each worker is assumed to have enough memory to store the equivalent of a single submatrix of A and a single submatrix of B. The master node does some basic processing on its end and sends appropriately coded submatrices to each worker. The workers multiply their stored (coded) submatrices and return the result to the master. The key result of [1] shows that the product A^T B can be recovered as long as any τ = pmn + p − 1 workers complete their computation; the value τ is called the recovery threshold of the computation.

Interestingly, similar ideas (relating matrix multiplication to polynomial interpolation) were investigated in a different context by Yagle [2] in the mid-90s. However, the motivation for that work was fast matrix multiplication using pseudo-number-theoretic transforms, rather than fault tolerance. There have been other contributions in this area [3]–[7] as well, some of which predate [1].

Main Contributions: In this work, we demonstrate that as long as the entries in A and B are bounded by sufficiently small numbers, the recovery threshold (τ) can be significantly reduced as compared to the approach of [1]. Specifically, the recovery threshold in our work can be of the form p′mn + p′ − 1, where p′ is a divisor of p. Thus, we can achieve thresholds as low as mn (which is optimal), depending on our assumptions on the matrix entries. We show that the required upper bound on the matrix entries can be traded off with the corresponding threshold in a simple manner. Finally, we present experimental results that demonstrate the superiority of our method via an Amazon Web Services (AWS) implementation.

II. PROBLEM FORMULATION

Let A (of size v × r) and B (of size v × t) be two integer matrices. We are interested in computing C = A^T B in a distributed fashion. Specifically, each worker node can store a 1/mp fraction of matrix A and a 1/np fraction of matrix B. The job given to the worker node is to compute the product of the submatrices assigned to it. The master node waits for a sufficient number of the submatrix products to be communicated to it. It then determines the final result after further processing at its end. More precisely, matrices A and B are first block decomposed as follows:

    A = [A_{ij}], 0 ≤ i ≤ p − 1, 0 ≤ j ≤ m − 1, and
    B = [B_{ij}], 0 ≤ i ≤ p − 1, 0 ≤ j ≤ n − 1,

where each A_{ij} is of size v/p × r/m and each B_{ij} is of size v/p × t/n. The product C = A^T B can then be assembled block-wise as C = [C_{jk}], where C_{jk} = Σ_{i=0}^{p−1} A_{ij}^T B_{ik} for 0 ≤ j ≤ m − 1 and 0 ≤ k ≤ n − 1. Throughout, L denotes an upper bound on the absolute values of the entries of A and B.
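As a concrete illustration of this formulation, the following NumPy snippet (a minimal sketch of ours, not part of the original letter or its code release [8]) block decomposes randomly generated integer matrices and checks the identity C_{jk} = Σ_i A_{ij}^T B_{ik}; the dimensions and the entry range are arbitrary choices.

```python
import numpy as np

# Illustrative sizes (our choice, not from the letter): v, r, t must be
# divisible by p, m, n respectively so that all blocks have equal size.
p, m, n = 4, 2, 2
v, r, t = 8, 6, 6

rng = np.random.default_rng(0)
A = rng.integers(0, 51, size=(v, r))  # integer entries, here in {0, ..., 50}
B = rng.integers(0, 51, size=(v, t))

# Block decomposition: A_{ij} is v/p x r/m, B_{ik} is v/p x t/n.
Ablk = [[A[i * v // p:(i + 1) * v // p, j * r // m:(j + 1) * r // m]
         for j in range(m)] for i in range(p)]
Bblk = [[B[i * v // p:(i + 1) * v // p, k * t // n:(k + 1) * t // n]
         for k in range(n)] for i in range(p)]

# Assemble C = A^T B block-wise: C_{jk} = sum_i A_{ij}^T B_{ik}.
C = np.block([[sum(Ablk[i][j].T @ Bblk[i][k] for i in range(p))
               for k in range(n)] for j in range(m)])

assert np.array_equal(C, A.T @ B)
```

In the distributed setting, each worker holds a coded combination of such submatrices, which matches the 1/mp and 1/np storage fractions above.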
[…]

Example 1: Let m = n = 2, p = 4, and p′ = 2, so that

    A = [ A_{00} A_{01} ]            B = [ B_{00} B_{01} ]
        [ A_{10} A_{11} ]                [ B_{10} B_{11} ]
        [ A_{20} A_{21} ]    and         [ B_{20} B_{21} ] .
        [ A_{30} A_{31} ]                [ B_{30} B_{31} ]

We let

    Ã(s, z) = A_{00} + A_{10} s^{−1} + (A_{20} + A_{30} s^{−1}) z
                  + (A_{01} + A_{11} s^{−1}) z^2 + (A_{21} + A_{31} s^{−1}) z^3, and

    B̃(s, z) = (B_{00} + B_{10} s) z + B_{20} + B_{30} s
                  + (B_{01} + B_{11} s) z^5 + (B_{21} + B_{31} s) z^4.

The product of the above polynomials can be verified to contain the useful terms, i.e., the blocks C_{jk}, in the coefficients of z, z^3, z^5, and z^7; the others are interference terms. For this scheme the corresponding |X_{ij}| (the entries of the coded submatrices sent to the workers) can be at most 2L^2, though the recovery threshold is 9. Applying the method of Section III-B would instead result in the |X_{ij}| values being bounded by 8L^4, with a threshold of 4.
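The claim in Example 1 can be checked mechanically. The sketch below is ours, written under the reading that each useful block C_{jk} appears as the s-free part of the corresponding z-coefficient, with the s and s^{−1} parts being the interference; it expands Ã(s, z)^T B̃(s, z) as a bivariate polynomial with matrix coefficients. The block size and entry range are arbitrary illustrative choices.

```python
import numpy as np

# Small integer blocks standing in for the A_{ij}, B_{ij} of Example 1;
# the block size b and the entry range are arbitrary illustrative choices.
p, m, n, b = 4, 2, 2, 3
rng = np.random.default_rng(1)
Ab = [[rng.integers(-5, 6, (b, b)) for j in range(m)] for i in range(p)]
Bb = [[rng.integers(-5, 6, (b, b)) for k in range(n)] for i in range(p)]
Z = np.zeros((b, b), dtype=np.int64)

# Atilde and Btilde from Example 1, stored as: z-degree -> {s-power: block}.
At = {0: {0: Ab[0][0], -1: Ab[1][0]}, 1: {0: Ab[2][0], -1: Ab[3][0]},
      2: {0: Ab[0][1], -1: Ab[1][1]}, 3: {0: Ab[2][1], -1: Ab[3][1]}}
Bt = {1: {0: Bb[0][0], 1: Bb[1][0]}, 0: {0: Bb[2][0], 1: Bb[3][0]},
      5: {0: Bb[0][1], 1: Bb[1][1]}, 4: {0: Bb[2][1], 1: Bb[3][1]}}

# Expand Atilde^T Btilde: for every monomial z^d s^e, accumulate the sum
# of the corresponding transposed block products.
prod = {}
for da, terms_a in At.items():
    for db, terms_b in Bt.items():
        coeff = prod.setdefault(da + db, {})
        for ea, Ma in terms_a.items():
            for eb, Mb in terms_b.items():
                coeff[ea + eb] = coeff.get(ea + eb, Z) + Ma.T @ Mb

# The s-free (s^0) parts of the z, z^3, z^5, z^7 coefficients are exactly
# the useful blocks C_{jk} = sum_i A_{ij}^T B_{ik}; all else is interference.
useful = {(0, 0): 1, (1, 0): 3, (0, 1): 5, (1, 1): 7}
for (j, k), d in useful.items():
    Cjk = sum(Ab[i][j].T @ Bb[i][k] for i in range(p))
    assert np.array_equal(prod[d][0], Cjk)
```

In the actual scheme, worker ℓ would receive the numeric evaluations Ã(s, z_ℓ) and B̃(s, z_ℓ) for a suitably chosen s; since the product has degree 8 in z, the master can interpolate all of its z-coefficients from any 9 completed workers, which matches the stated recovery threshold of 9.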
V. EXPERIMENTAL RESULTS AND DISCUSSION

We ran our experiments on AWS EC2 r3.large instances. Our code is available online [8]. The input matrices A and B were randomly generated integer matrices of size 8000 × 8000 with elements in the set {0, 1, ..., 50}. These matrices were pre-generated (for the different straggler counts) and remained the same for all experiments. The master node was responsible for the 2 × 2 block decomposition of A and B, computing Ã(s, z_i) and B̃(s, z_i) for i = 1, ..., 10, and sending them to the worker nodes. The evaluation points (the z_i's) were chosen as 10 equally spaced reals within the interval [−1, 1]. The stragglers were simulated by having S randomly chosen machines perform their local computation twice.

We compared the performance of our method (cf. Section III) with [1]. For fairness, we chose the same evaluation points in both methods. In fact, the choice of points in their code available online [9] (which we adapted for the case when p > 1) provides worse results than those reported here.

Computation latency refers to the elapsed time from the point when all workers have received their inputs until enough of them have returned their results for the master to complete the decoding. Fig. 1 compares the computation latencies of the two methods for different numbers of stragglers S; our scheme is superior in the presence of two or more stragglers (average latency ≥ 15.65 seconds).

Real Vandermonde matrices are well known to have bad condition numbers. The condition number is better when we consider complex Vandermonde matrices with entries from the unit circle [10]. In our method, the |X_{ij}| and |Y_{ij}| values can be quite large. This introduces small errors in the decoding process. Let Ĉ be the decoded matrix and C = A^T B be the actual product. Our error metric is e = ||Ĉ − C||_F / ||C||_F (the subscript F refers to the Frobenius norm). The results in Fig. 1 had an error e of at most 10^{−7}. We studied the effect of increasing the average value of the entries in A and B in Table I. The error is consistently low up to a bound of L = 1000, following which the calculation is useless owing to numerical overflow issues. We point out that in our experiments the error e was identically zero if the z_i's were chosen from the unit circle. However, this requires complex multiplication, which increases the computation time.

REFERENCES

[1] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. (2018). "Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding." [Online]. Available: https://arxiv.org/abs/1801.07487
[2] A. E. Yagle, "Fast algorithms for matrix multiplication using pseudo-number-theoretic transforms," IEEE Trans. Signal Process., vol. 43, no. 1, pp. 71–76, Jan. 1995.
[3] S. Dutta, V. Cadambe, and P. Grover, "Short-dot: Computing large linear transforms distributedly using coded short dot products," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2016, pp. 2100–2108.
[4] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, "Speeding up distributed machine learning using codes," IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1514–1529, Mar. 2018.
[5] K. Lee, C. Suh, and K. Ramchandran, "High-dimensional coded matrix multiplication," in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2017, pp. 2418–2422.
[6] A. Mallick, M. Chaudhari, and G. Joshi. (2018). "Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication." [Online]. Available: https://arxiv.org/abs/1804.10331
[7] S. Wang, J. Liu, and N. Shroff, "Coded sparse matrix multiplication," in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2018, pp. 5152–5160.
[8] Repository of Erasure Coding for Distributed Matrix Multiplication for Matrices With Bounded Entries. Accessed: 2018. [Online]. Available: https://bitbucket.org/kkonstantinidis/stragglermitmm/src/master
[9] Repository of Polynomial Code for Prior Implementation. Accessed: 2018. [Online]. Available: https://github.com/AvestimehrResearchGroup/Polynomial-Code
[10] W. Gautschi, "How (un)stable are Vandermonde systems?" in Asymptotic and Computational Analysis (Lecture Notes in Pure and Applied Mathematics), vol. 124. New York, NY, USA: Dekker, 1990, pp. 193–210.