N (disambiguation): Difference between revisions
{{primary sources|date=November 2012}}
{{Infobox Software
| name = ATLAS
| genre = [[Software library]]
| license = [[BSD license]]
| website = http://math-atlas.sourceforge.net
}}
'''Automatically Tuned Linear Algebra Software''' ('''ATLAS''') is a [[Library (computer science)|software library]] for [[linear algebra]]. It provides a mature [[open source]] implementation of [[Basic Linear Algebra Subprograms|BLAS]] [[application programming interface|APIs]] for [[C programming language|C]] and [[Fortran|Fortran77]].
ATLAS is often recommended as a way to automatically generate an [[Optimization (computer science)|optimized]] BLAS library. While its performance often trails that of specialized libraries written for one specific [[platform (computing)|hardware platform]], it is frequently the first, or even the only, optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS available at [[Netlib]]. For this reason, ATLAS is sometimes used as a performance baseline for comparison with other products.
ATLAS runs on most [[Unix]]-like operating systems and on [[Microsoft Windows]] (using [[Cygwin]]). It is released under a [[BSD license|BSD-style license]] without advertising clause, and many well-known mathematics applications including [[MATLAB]], [[Mathematica]], [[Scilab]], [[Sage (mathematics software)|Sage]], and some builds of [[GNU Octave]] may use it.
==Functionality==
ATLAS provides a full implementation of the BLAS APIs as well as some additional functions from [[LAPACK]], a higher-level library built on top of BLAS. In BLAS, functionality is divided into three groups called levels 1, 2 and 3.
* Level 1 contains ''vector operations'' of the form
:<math>\mathbf{y} \leftarrow \alpha \mathbf{x} + \mathbf{y} \!</math>
:as well as scalar [[dot product]]s and [[norm (mathematics)|vector norm]]s, among other things.
* Level 2 contains ''matrix-vector operations'' of the form
:<math>\mathbf{y} \leftarrow \alpha A \mathbf{x} + \beta \mathbf{y} \!</math>
:as well as solving <math>T \mathbf{x} = \mathbf{y}</math> for '''x''' with <math>T</math> being triangular, among other things.
* Level 3 contains ''matrix-matrix operations'' such as the widely used [[General Matrix Multiply]] (GEMM) operation
:<math>C \leftarrow \alpha A B + \beta C \!</math>
:as well as solving <math>B \leftarrow \alpha T^{-1} B</math> for triangular matrices <math>T</math>, among other things.
==Optimization approach==
The [[Optimization (computer science)|optimization]] approach is called Automated Empirical Optimization of Software (AEOS), which identifies four fundamental approaches to computer-assisted optimization, of which ATLAS employs three:<ref>{{cite journal
| author = R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra
| title = Automated Empirical Optimization of Software and the ATLAS Project
| journal = Parallel Computing
| volume = 27
| pages = 3–35
| year = 2001
| doi = 10.1016/S0167-8191(00)00087-9
| url = http://www.netlib.org/lapack/lawnspdf/lawn147.pdf
| format = [[Portable Document Format|PDF]]
| accessdate = 2006-10-06
}}</ref>
# [[Parameter (computer science)|Parameter]]ization: searching over the parameter space of a function, used for blocking factor, cache edge, ...
# Multiple implementation: searching through various approaches to implementing the same function, e.g., for [[Streaming SIMD Extensions|SSE]] support before intrinsics made the instructions available in C code
# [[Automatic programming|Code generation]]: programs that write programs, incorporating what knowledge they can about what will produce the best performance for the system
* Optimization of the level 1 BLAS uses parameterization and multiple implementation
: Every ATLAS level 1 BLAS function has its own kernel. Since it would be difficult to maintain thousands of cases in ATLAS, there is little architecture-specific optimization for the Level 1 BLAS. Instead, multiple implementation is relied upon so that [[compiler optimization]] can produce a high-performance implementation for the system.
* Optimization of the level 2 BLAS uses parameterization and multiple implementation
: With <math>N^2</math> data and <math>N^2</math> operations to perform, the function is usually limited by memory bandwidth, and thus there is not much opportunity for optimization.
: All routines in the ATLAS level 2 BLAS are built from two Level 2 BLAS kernels:
** GEMV, the matrix by vector multiply update:
::<math>\mathbf{y} \leftarrow \alpha A \mathbf{x} + \beta \mathbf{y} \!</math>
** GER, the general rank 1 update from an outer product:
::<math>A \leftarrow \alpha \mathbf{x} \mathbf{y}^T + A \! </math>
* Optimization of the level 3 BLAS uses code generation and the other two techniques
: Since there are <math>N^3</math> operations on only <math>N^2</math> data, there are many opportunities for optimization.
==Level 3 BLAS==
Most of the Level 3 BLAS is derived from [[General Matrix Multiply|GEMM]], so that is the primary focus of the optimization.
:<math>O(n^3)</math> operations vs. <math>O(n^2)</math> data
The intuition that the <math>n^3</math> operations will dominate the <math>n^2</math> data accesses only holds for roughly square matrices.
The real measure should be something like a surface-area-to-volume ratio.
The difference becomes important for very non-square matrices.
===Can it afford to copy?===
Copying the inputs allows the data to be arranged in a way that provides optimal access for the kernel functions,
but this comes at the cost of allocating temporary space and an extra read and write of the inputs.
So the first question GEMM faces is whether it can afford to copy the inputs.
If so,
* Put into block major format with good alignment
* Take advantage of user contributed kernels and cleanup
* Handle the transpose cases with the copy: make everything into TN (transpose - no-transpose)
* Deal with α in the copy
If not,
* Use the nocopy version
* Make no assumptions about the strides of matrices ''A'' and ''B'' in memory
* Handle all transpose cases explicitly
* No guarantee about alignment of data
* Support α specific code
* Run the risk of [[Translation Lookaside Buffer|TLB]] issues, bad strides, ...
The actual decision is made through a simple [[Heuristic (computer science)|heuristic]] which checks for "skinny cases".
===Cache edge===
For second-level cache blocking, a single cache edge parameter is used.
The high-level code chooses an order in which to traverse the blocks: ''ijk, jik, ikj, jki, kij, kji''.
This need not be the same order in which the product is computed within a block.
Typically chosen orders are ''ijk'' or ''jik''.
For ''jik'' the ideal situation would be to copy ''A'' and the ''NB'' wide panel of ''B''.
For ''ijk'' swap the roles of ''A'' and ''B''.
Choosing the bigger of ''M'' or ''N'' for the outer loop reduces the footprint of the copy.
But for large ''K'', ATLAS does not even allocate such a large amount of memory.
Instead it defines a parameter, ''Kp'', to give best use of the L2 cache.
Panels are limited to ''Kp'' in length.
It first tries to allocate (in the ''jik'' case) <math>M*Kp + NB*Kp + NB*NB</math>.
If that fails it tries <math>2*Kp*NB + NB*NB</math>.
(If that fails it uses the no-copy version of GEMM, but this case is unlikely for reasonable choices of cache edge.)
''Kp'' is a function of cache edge and ''NB''.
==LAPACK==
When integrating the ATLAS BLAS with [[LAPACK]], an important consideration is the choice of blocking factor for LAPACK. If the ATLAS blocking factor is small enough, the blocking factor of LAPACK could be set to match that of ATLAS.
To take advantage of recursive factorization, ATLAS provides replacement routines for some LAPACK routines. These simply overwrite the corresponding LAPACK routines from [[Netlib]].
==Need for installation==
Installing ATLAS on a particular platform is a challenging process which is typically done by a system vendor or a local expert and made available to a wider audience.
For many systems, architectural default parameters are available; these are essentially saved searches plus the results of hand tuning.
If the architectural defaults work, they will likely give 10-15% better performance than the install search. On such systems the installation process is greatly simplified.
==References==
<references />
==External links==
*[http://math-atlas.sourceforge.net/ math-atlas.sourceforge.net] Project homepage
*[http://math-atlas.sourceforge.net/devel/atlas_contrib/ User contribution to ATLAS]
*[http://math-atlas.sourceforge.net/devel/atlas_devel/ A Collaborative guide to ATLAS Development]
*The [http://math-atlas.sourceforge.net/faq.html#doc FAQ] has links to the Quick reference guide to BLAS and Quick reference to ATLAS LAPACK API reference
*[http://www.terborg.net/research/kml/installation.html Microsoft Visual C++ Howto] for ATLAS
{{Numerical linear algebra}}
[[Category:C libraries]]
[[Category:Fortran libraries]]
[[Category:Numerical software]]
[[Category:Numerical linear algebra]]
Revision as of 06:51, 21 December 2013
Automatically Tuned Linear Algebra Software (ATLAS) is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and Fortran77.
ATLAS is often recommended as a way to automatically generate an optimized BLAS library. While its performance often trails that of specialized libraries written for one specific hardware platform, it is frequently the first, or even the only, optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS available at Netlib. For this reason, ATLAS is sometimes used as a performance baseline for comparison with other products.
ATLAS runs on most Unix-like operating systems and on Microsoft Windows (using Cygwin). It is released under a BSD-style license without advertising clause, and many well-known mathematics applications including MATLAB, Mathematica, Scilab, Sage, and some builds of GNU Octave may use it.
Functionality
ATLAS provides a full implementation of the BLAS APIs as well as some additional functions from LAPACK, a higher-level library built on top of BLAS. In BLAS, functionality is divided into three groups called levels 1, 2 and 3.
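- Level 1 contains vector operations of the form
- y ← αx + y
- as well as scalar dot products and vector norms, among other things.
- Level 2 contains matrix-vector operations of the form
- y ← αAx + βy
- as well as solving Tx = y for x with T being triangular, among other things.
- Level 3 contains matrix-matrix operations such as the widely used General Matrix Multiply (GEMM) operation
- C ← αAB + βC
- as well as solving B ← αT⁻¹B for triangular matrices T, among other things.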
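To make the three levels concrete, here is a minimal pure-Python sketch of one operation from each level. The function names borrow the conventional BLAS routine names (AXPY, GEMV, GEMM), but the signatures are simplified for illustration and are not the BLAS or ATLAS API; they only show what each operation computes.

```python
# Reference semantics only: these are illustrative helpers, not ATLAS code.

def axpy(alpha, x, y):
    """Level 1: return alpha*x + y for vectors x and y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x, beta, y):
    """Level 2: return alpha*A*x + beta*y for an M-by-N matrix A (list of rows)."""
    return [alpha * sum(a * xj for a, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]

def gemm(alpha, A, B, beta, C):
    """Level 3: return alpha*A*B + beta*C, with A M-by-K and B K-by-N."""
    K, N = len(B), len(B[0])
    return [[alpha * sum(A[i][k] * B[k][j] for k in range(K)) + beta * C[i][j]
             for j in range(N)]
            for i in range(len(A))]
```

Level 1 touches each element once, level 2 performs one multiply-add per matrix element, and level 3 reuses every element of A and B many times, which is why the levels differ so much in how far they can be optimized.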
Optimization approach
The optimization approach is called Automated Empirical Optimization of Software (AEOS), which identifies four fundamental approaches to computer assisted optimization of which ATLAS employs three:[1]
- Parameterization: searching over the parameter space of a function, used for blocking factor, cache edge, ...
- Multiple implementation: searching through various approaches to implementing the same function, e.g., for SSE support before intrinsics made the instructions available in C code
- Code generation: programs that write programs, incorporating what knowledge they can about what will produce the best performance for the system
- Optimization of the level 1 BLAS uses parameterization and multiple implementation
- Every ATLAS level 1 BLAS function has its own kernel. Since it would be difficult to maintain thousands of cases in ATLAS, there is little architecture-specific optimization for the Level 1 BLAS. Instead, multiple implementation is relied upon so that compiler optimization can produce a high-performance implementation for the system.
- Optimization of the level 2 BLAS uses parameterization and multiple implementation
- With N² data and N² operations to perform, the function is usually limited by memory bandwidth, and thus there is not much opportunity for optimization.
- All routines in the ATLAS level 2 BLAS are built from two Level 2 BLAS kernels:
- GEMV, the matrix by vector multiply update: y ← αAx + βy
- GER, the general rank 1 update from an outer product: A ← αxyᵀ + A
- Optimization of the level 3 BLAS uses code generation and the other two techniques
- Since there are N³ operations on only N² data, there are many opportunities for optimization.
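Parameterization, the first of the techniques listed above, can be illustrated with a small empirical search: time the same blocked matrix multiply for several candidate blocking factors and keep the fastest. This is a sketch of the general idea only. The names `blocked_matmul`, `time_candidate` and `empirical_search` are hypothetical, and the real ATLAS install search is far more elaborate.

```python
import time

def blocked_matmul(A, B, n, nb):
    """Multiply two n-by-n matrices (lists of rows) using nb-by-nb blocks."""
    C = [[0.0] * n for _ in range(n)]
    for jj in range(0, n, nb):              # block loops in jik order
        for ii in range(0, n, nb):
            for kk in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + nb, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C

def time_candidate(nb, n=48):
    """Wall-clock seconds for one blocked multiply with block size nb."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    blocked_matmul(A, B, n, nb)
    return time.perf_counter() - start

def empirical_search(candidates, measure=time_candidate):
    """Return the candidate parameter value with the smallest measured cost."""
    return min(candidates, key=measure)
```

Calling `empirical_search([4, 8, 16])` returns whichever block size ran fastest on the current machine, so the answer can differ between systems; that dependence on measurement rather than on a fixed model is the point of the empirical approach.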
Level 3 BLAS
Most of the Level 3 BLAS is derived from GEMM, so that is the primary focus of the optimization.
- O(n³) operations vs. O(n²) data
The intuition that the n³ operations will dominate the n² data accesses only holds for roughly square matrices. The real measure should be something like a surface-area-to-volume ratio. The difference becomes important for very non-square matrices.
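The surface-area-to-volume idea can be made concrete with a small calculation. The sketch below assumes a multiply of an M-by-K matrix by a K-by-N matrix, counts 2·M·N·K floating-point operations (the "volume") and M·K + K·N + M·N matrix elements (the "surface"), and returns their ratio. The function name is hypothetical and the counting convention is one common choice, not something taken from ATLAS.

```python
def gemm_flops_per_element(M, N, K):
    """Floating-point operations per matrix element touched for C <- A*B + C."""
    flops = 2 * M * N * K               # one multiply and one add per (i, j, k)
    elements = M * K + K * N + M * N    # sizes of A, B and C
    return flops / elements

# A square problem reuses each element many times ...
square = gemm_flops_per_element(1000, 1000, 1000)
# ... while a "skinny" problem with tiny K barely reuses anything.
skinny = gemm_flops_per_element(1000, 1000, 2)
```

For the square case the ratio grows linearly with the dimension, while for the skinny case it stays below 4 however large M and N become, so such a multiply behaves more like a bandwidth-bound Level 2 operation.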
Can it afford to copy?
Copying the inputs allows the data to be arranged in a way that provides optimal access for the kernel functions, but this comes at the cost of allocating temporary space, and an extra read and write of the inputs.
So the first question GEMM faces is, can it afford to copy the inputs?
If so,
- Put into block major format with good alignment
- Take advantage of user contributed kernels and cleanup
- Handle the transpose cases with the copy: make everything into TN (transpose - no-transpose)
- Deal with α in the copy
If not,
- Use the nocopy version
- Make no assumptions on the stride of matrix A and B in memory
- Handle all transpose cases explicitly
- No guarantee about alignment of data
- Support α specific code
- Run the risk of TLB issues, bad strides, ...
The actual decision is made through a simple heuristic which checks for "skinny cases".
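As an illustration of the copy path, the following sketch rearranges a row-major matrix into a block-major layout and applies α during the copy. The exact layout ATLAS uses is not described here, so the ordering below (blocks stored one after another, each block row-major inside) is an assumption, and `copy_to_block_major` is a hypothetical name.

```python
def copy_to_block_major(A, M, N, NB, alpha=1.0):
    """Copy a row-major M-by-N matrix (flat list) into block-major order.

    Blocks are NB-by-NB (smaller at the edges) and are stored contiguously,
    one after another. Scaling by alpha is folded into the copy.
    """
    out = []
    for ib in range(0, M, NB):                      # block row
        for jb in range(0, N, NB):                  # block column
            for i in range(ib, min(ib + NB, M)):    # rows inside the block
                for j in range(jb, min(jb + NB, N)):
                    out.append(alpha * A[i * N + j])
    return out
```

After such a copy, a kernel working on one block reads a single contiguous run of memory regardless of the original leading dimension, which is what avoids the stride and TLB problems listed for the no-copy path.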
Cache edge
For second-level cache blocking, a single cache edge parameter is used. The high-level code chooses an order in which to traverse the blocks: ijk, jik, ikj, jki, kij, kji. This need not be the same order in which the product is computed within a block.
Typically chosen orders are ijk or jik. For jik the ideal situation would be to copy A and the NB wide panel of B. For ijk swap the roles of A and B.
Choosing the bigger of M or N for the outer loop reduces the footprint of the copy. But for large K, ATLAS does not even allocate such a large amount of memory. Instead it defines a parameter, Kp, to give best use of the L2 cache. Panels are limited to Kp in length. It first tries to allocate (in the jik case) M*Kp + NB*Kp + NB*NB. If that fails it tries 2*Kp*NB + NB*NB. (If that fails it uses the no-copy version of GEMM, but this case is unlikely for reasonable choices of cache edge.) Kp is a function of cache edge and NB.
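The fallback sequence for the jik case can be sketched as a small decision function. It tries a workspace of M*Kp + NB*Kp + NB*NB elements first, then 2*Kp*NB + NB*NB, and otherwise falls back to the no-copy path. The function name, the strategy labels and the `can_allocate` predicate are all hypothetical; the comments on what each term holds are an interpretation, not something stated in the text.

```python
def choose_gemm_workspace(M, NB, Kp, can_allocate):
    """Pick a copy strategy for the jik loop order.

    can_allocate(n) is a caller-supplied predicate reporting whether n
    elements of temporary space are available.
    Returns a (strategy, elements) pair.
    """
    # First attempt: presumably all of A (limited to Kp), one panel of B,
    # and one NB-by-NB block.
    full = M * Kp + NB * Kp + NB * NB
    if can_allocate(full):
        return ("copy_full", full)
    # Second attempt: presumably one Kp-by-NB panel each for A and B,
    # plus one NB-by-NB block.
    panels = 2 * Kp * NB + NB * NB
    if can_allocate(panels):
        return ("copy_panels", panels)
    # Last resort: operate on the user's data in place.
    return ("no_copy", 0)
```

With M = 1000, NB = 40 and Kp = 200, the first request is 209600 elements and the second only 17600, which shows why the last fallback is rarely reached for reasonable parameter choices.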
LAPACK
When integrating the ATLAS BLAS with LAPACK an important consideration is the choice of blocking factor for LAPACK. If the ATLAS blocking factor is small enough the blocking factor of LAPACK could be set to match that of ATLAS.
To take advantage of recursive factorization, ATLAS provides replacement routines for some LAPACK routines. These simply overwrite the corresponding LAPACK routines from Netlib.
Need for installation
Installing ATLAS on a particular platform is a challenging process which is typically done by a system vendor or a local expert and made available to a wider audience.
For many systems, architectural default parameters are available; these are essentially saved searches plus the results of hand tuning. If the architectural defaults work, they will likely give 10-15% better performance than the install search. On such systems the installation process is greatly simplified.
References
- ↑ R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra (2001). "Automated Empirical Optimization of Software and the ATLAS Project" (PDF). Parallel Computing 27: 3–35. doi:10.1016/S0167-8191(00)00087-9. http://www.netlib.org/lapack/lawnspdf/lawn147.pdf. Retrieved 2006-10-06.
External links
- math-atlas.sourceforge.net Project homepage
- User contribution to ATLAS
- A Collaborative guide to ATLAS Development
- The FAQ has links to the Quick reference guide to BLAS and Quick reference to ATLAS LAPACK API reference
- Microsoft Visual C++ Howto for ATLAS