IEEE COMMUNICATIONS LETTERS, VOL. 23, NO. 1, JANUARY 2019

Erasure Coding for Distributed Matrix Multiplication for Matrices With Bounded Entries

Li Tang, Konstantinos Konstantinidis, and Aditya Ramamoorthy
Abstract—Distributed matrix multiplication is widely used in several scientific domains. It is well recognized that computation times on distributed clusters are often dominated by the slowest workers (called stragglers). Recent work has demonstrated that straggler mitigation can be viewed as a problem of designing erasure codes. For matrices A and B, the technique essentially maps the computation of A^T B into the multiplication of smaller (coded) submatrices. The stragglers are treated as erasures in this process. The computation can be completed as long as a certain number of workers (called the recovery threshold) complete their assigned tasks. We present a novel coding strategy for this problem when the absolute values of the matrix entries are sufficiently small. We demonstrate a tradeoff between the assumed absolute value bounds on the matrix entries and the recovery threshold. At one extreme, we are optimal with respect to the recovery threshold, and at the other extreme, we match the threshold of prior work. Experimental results on cloud-based clusters validate the benefits of our method.

Index Terms—Distributed computing, erasure codes, stragglers.

I. INTRODUCTION

THE multiplication of large-dimensional matrices is a key problem that is at the heart of several big data computations. For example, high-dimensional deep learning problems often require matrix-vector products at every iteration. In most of these problems the sheer size of the matrices precludes computation on a single machine. Accordingly, the computation is typically performed in a distributed fashion across several computation units (or workers). The overall job execution time in these systems is typically dominated by the slowest worker; this is often referred to as the “straggler problem”.

In recent years, techniques from coding theory have been efficiently utilized in mitigating the effect of stragglers. As pointed out in [1] (see [1, Appendix B]), this issue can be viewed as equivalent to coding for fault tolerance over a channel where the stragglers can be viewed as erasures.

More specifically, the work of [1] considers the distributed computation of the product of two large matrices A^T and B. Matrices A and B are first partitioned into p × m and p × n blocks of submatrices of equal size by the master node. Each worker is assumed to have enough memory to store the equivalent of a single submatrix of A and a single submatrix of B. The master node does some basic processing on its end and sends appropriately coded submatrices to each worker. The workers multiply their stored (coded) submatrices and return the result to the master. The key result of [1] shows that the product A^T B can be recovered as long as any τ = pmn + p − 1 workers complete their computation; the value τ is called the recovery threshold of the computation.

Interestingly, similar ideas (relating matrix multiplication to polynomial interpolation) were investigated in a different context by Yagle [2] in the mid 1990s. However, the motivation for that work was fast matrix multiplication using pseudo-number-theoretic transforms, rather than fault tolerance. There have been other contributions in this area [3]–[7] as well, some of which predate [1].

Main Contributions

In this work, we demonstrate that as long as the entries in A and B are bounded by sufficiently small numbers, the recovery threshold (τ) can be significantly reduced as compared to the approach of [1]. Specifically, the recovery threshold in our work can be of the form p′mn + p′ − 1, where p′ is a divisor of p. Thus, we can achieve thresholds as low as mn (which is optimal), depending on our assumptions on the matrix entries. We show that the required upper bound on the matrix entries can be traded off with the corresponding threshold in a simple manner. Finally, we present experimental results that demonstrate the superiority of our method via an Amazon Web Services (AWS) implementation.
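To make the tradeoff concrete, the following Python sketch (an illustrative calculation; the helper name is ours, not from the code released with the paper) tabulates the threshold p′mn + p′ − 1 for every divisor p′ of p, using the parameters that appear later in Example 1:

```python
# Illustrative calculation of the recovery-threshold tradeoff.
# threshold() is our own helper name, not from the paper's code.
def threshold(p_prime: int, m: int, n: int) -> int:
    """Recovery threshold p'*m*n + p' - 1 for a divisor p' of p."""
    return p_prime * m * n + p_prime - 1

p, m, n = 4, 2, 2  # the parameters of Example 1
for p_prime in (d for d in range(1, p + 1) if p % d == 0):
    print(f"p' = {p_prime}: threshold = {threshold(p_prime, m, n)}")
# p' = 1: threshold = 4   (= mn, optimal)
# p' = 2: threshold = 9   (the scheme of Example 1)
# p' = 4: threshold = 19  (= pmn + p - 1, matching [1])
```

Smaller p′ buys a smaller threshold at the price of a tighter bound on the matrix entries.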
II. PROBLEM FORMULATION

Let A (of size v × r) and B (of size v × t) be two integer matrices. We are interested in computing C = A^T B in a distributed fashion. Specifically, each worker node can store a 1/mp fraction of matrix A and a 1/np fraction of matrix B. The job given to the worker node is to compute the product of the submatrices assigned to it. The master node waits for a sufficient number of the submatrix products to be communicated to it. It then determines the final result after further processing at its end. More precisely, matrices A and B are first block decomposed as follows:

    A = [A_ij], 0 ≤ i < p, 0 ≤ j < m,    and    B = [B_ij], 0 ≤ i < p, 0 ≤ j < n.
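A minimal numpy sketch of this block decomposition (the dimensions are our own illustrative choice; v must be divisible by p, r by m, and t by n for equal-size blocks):

```python
import numpy as np

# Minimal sketch of the p x m (and p x n) block decomposition.
v, r, t = 8, 6, 4
p, m, n = 4, 2, 2
A = np.arange(v * r).reshape(v, r)
B = np.arange(v * t).reshape(v, t)

# A_blocks[i][j] is the submatrix A_ij of size (v/p) x (r/m); each worker
# stores the equivalent of one such block, i.e. a 1/mp fraction of A.
A_blocks = [np.hsplit(row, m) for row in np.vsplit(A, p)]
B_blocks = [np.hsplit(row, n) for row in np.vsplit(B, p)]
print(A_blocks[0][0].shape, B_blocks[0][0].shape)  # (2, 3) (2, 2)
```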
[…]

Example 1: Let m = n = 2, p = 4 and p′ = 2, so that

    A = [ A_00  A_01            B = [ B_00  B_01
          A_10  A_11                  B_10  B_11
          A_20  A_21     and          B_20  B_21
          A_30  A_31 ]                B_30  B_31 ].

We let

    Ã(s, z) = A_00 + A_10 s^(−1) + (A_20 + A_30 s^(−1)) z
              + (A_01 + A_11 s^(−1)) z^2 + (A_21 + A_31 s^(−1)) z^3,  and

    B̃(s, z) = (B_00 + B_10 s) z + B_20 + B_30 s
              + (B_01 + B_11 s) z^5 + (B_21 + B_31 s) z^4.

The product of the above polynomials can be verified to contain the useful terms with coefficients z, z^3, z^5, z^7; the others are interference terms. For this scheme the corresponding |X_ij| can at most be 2L^2, though the recovery threshold is 9. Applying the method of Section III-B would result in the |X_ij| values being bounded by 8L^4 with a threshold of 4.
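This verification can be reproduced symbolically. The following sympy sketch (our own check, with commuting scalars standing in for the submatrix blocks, which preserves the coefficient structure) expands Ã(s, z)·B̃(s, z) and reads off the s-free coefficients of z, z^3, z^5 and z^7:

```python
import sympy as sp

s, z = sp.symbols("s z")
# Commuting scalar stand-ins for the blocks A_ij and B_ij of Example 1.
A = [[sp.Symbol(f"A{i}{j}") for j in range(2)] for i in range(4)]
B = [[sp.Symbol(f"B{i}{j}") for j in range(2)] for i in range(4)]

At = (A[0][0] + A[1][0]/s) + (A[2][0] + A[3][0]/s)*z \
     + (A[0][1] + A[1][1]/s)*z**2 + (A[2][1] + A[3][1]/s)*z**3
Bt = (B[0][0] + B[1][0]*s)*z + (B[2][0] + B[3][0]*s) \
     + (B[0][1] + B[1][1]*s)*z**5 + (B[2][1] + B[3][1]*s)*z**4

prod = sp.expand(At * Bt)
for k in (1, 3, 5, 7):
    # The s-free part of the z^k coefficient is one useful block of A^T B.
    print(f"z^{k}:", prod.coeff(z, k).coeff(s, 0))
# z^1: A00*B00 + A10*B10 + A20*B20 + A30*B30   (the (0,0) block of A^T B)
```

With actual matrix blocks the products are non-commutative (A_ij^T B_ik), but the bookkeeping of which z-coefficients carry useful terms is identical.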
V. EXPERIMENTAL RESULTS AND DISCUSSION

We ran our experiments on AWS EC2 r3.large instances. Our code is available online [8]. The input matrices A and B were randomly generated integer matrices of size 8000 × 8000 with elements in the set {0, 1, ..., 50}. These matrices were pre-generated (for the different straggler counts) and remained the same for all experiments. The master node was responsible for the 2 × 2 block decomposition of A and B, computing Ã(s, z_i) and B̃(s, z_i) for i = 1, ..., 10 and sending them to the worker nodes. The evaluation points (the z_i's) were chosen as 10 equally spaced reals within the interval [−1, 1]. The stragglers were simulated by having S randomly chosen machines perform their local computation twice.
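A minimal sketch of this setup (our own code, with a smaller matrix size than the 8000 × 8000 used in the experiments so that it runs quickly; everything else follows the description above):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: inputs stay the same across runs

# Integer inputs with entries in {0, 1, ..., 50}; the experiments used
# 8000 x 8000, we use a smaller size here for illustration.
size = 800
A = rng.integers(0, 51, (size, size))
B = rng.integers(0, 51, (size, size))

# Ten equally spaced real evaluation points in [-1, 1], one per worker.
z_points = np.linspace(-1.0, 1.0, 10)
print(z_points)
```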
We compared the performance of our method (cf. Section III) with [1]. For fairness, we chose the same evaluation points in both methods. In fact, the choice of points in their code available online [9] (which we adapted for the case when p > 1) provides worse results than those reported here.

Computation latency refers to the elapsed time from the point when all workers have received their inputs until enough […] of two or more stragglers (average latency ≥ 15.65 seconds).

Real Vandermonde matrices are well known to have bad condition numbers. The condition number is better when we consider complex Vandermonde matrices with entries from the unit circle [10]. In our method, the |X_ij| and |Y_ij| values can be quite large. This introduces small errors in the decoding process. Let Ĉ be the decoded matrix and C = A^T B be the actual product. Our error metric is e = ||Ĉ − C||_F / ||C||_F, where the subscript F refers to the Frobenius norm. The results in Fig. 1 had an error e of at most 10^(−7). We studied the effect of increasing the average value of the entries in A and B in Table I. The error is consistently low up to a bound of L = 1000, following which the calculation is useless owing to numerical overflow issues. We point out that in our experiments the error e was identically zero if the z_i's were chosen from the unit circle. However, this requires complex multiplication, which increases the computation time.
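The error metric itself is straightforward to compute; a small numpy sketch (our own, with Ĉ written as C_hat):

```python
import numpy as np

def relative_error(C_hat: np.ndarray, C: np.ndarray) -> float:
    """e = ||C_hat - C||_F / ||C||_F (np.linalg.norm defaults to Frobenius)."""
    return np.linalg.norm(C_hat - C) / np.linalg.norm(C)

# Toy check: a "decoded" product perturbed by small additive noise.
C = np.arange(12.0).reshape(3, 4)
print(relative_error(C + 1e-8, C))  # on the order of 1e-9
```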
REFERENCES

[1] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. (2018). “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding.” [Online]. Available: https://arxiv.org/abs/1801.07487
[2] A. E. Yagle, “Fast algorithms for matrix multiplication using pseudo-number-theoretic transforms,” IEEE Trans. Signal Process., vol. 43, no. 1, pp. 71–76, Jan. 1995.
[3] S. Dutta, V. Cadambe, and P. Grover, “Short-dot: Computing large linear transforms distributedly using coded short dot products,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2016, pp. 2100–2108.
[4] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1514–1529, Mar. 2018.
[5] K. Lee, C. Suh, and K. Ramchandran, “High-dimensional coded matrix multiplication,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2017, pp. 2418–2422.
[6] A. Mallick, M. Chaudhari, and G. Joshi. (2018). “Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication.” [Online]. Available: https://arxiv.org/abs/1804.10331
[7] S. Wang, J. Liu, and N. Shroff, “Coded sparse matrix multiplication,” in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2018, pp. 5152–5160.
[8] Repository of Erasure Coding for Distributed Matrix Multiplication for Matrices With Bounded Entries. Accessed: 2018. [Online]. Available: https://bitbucket.org/kkonstantinidis/stragglermitmm/src/master
[9] Repository of Polynomial Code for Prior Implementation. Accessed: 2018. [Online]. Available: https://github.com/AvestimehrResearchGroup/Polynomial-Code
[10] W. Gautschi, “How (un)stable are Vandermonde systems?” in Asymptotic and Computational Analysis (Lecture Notes in Pure and Applied Mathematics), vol. 124. New York, NY, USA: Dekker, 1990, pp. 193–210.