
Commit 11d751a

Merge pull request #164 from hhy3/fix-typo
Fix typo
2 parents: e5a7b80 + b193582

6 files changed (+7 -7 lines)

content/english/hpc/compilation/flags.md

+1 -1

@@ -12,7 +12,7 @@ There are 4 *and a half* main levels of optimization for speed in GCC:
 
 - `-O0` is the default one that does no optimizations (although, in a sense, it does optimize: for compilation time).
 - `-O1` (also aliased as `-O`) does a few "low-hanging fruit" optimizations, almost not affecting the compilation time.
-- `-O2` enables all optimizations that are known to have little to no negative side effects and take reasonable time to complete (this is what most projects use for production builds).
+- `-O2` enables all optimizations that are known to have little to no negative side effects and take a reasonable time to complete (this is what most projects use for production builds).
 - `-O3` does very aggressive optimization, enabling almost all *correct* optimizations implemented in GCC.
 - `-Ofast` does everything in `-O3`, plus a few more optimization flags that may break strict standard compliance, but not in a way that would be critical for most applications (e.g., floating-point operations may be rearranged so that the result is off by a few bits in the mantissa).

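For a sense of what these levels do in practice, here is a toy program (the file name and the loop are made up for illustration, not taken from the book) that one might compile at each level and compare, e.g., with `g++ -S` to inspect the generated assembly:

```cpp
// sum.cc: a toy benchmark for comparing optimization levels (illustrative).
//
// Hypothetical usage:
//   g++ -O0 sum.cc -o sum_O0   (no optimizations, fastest to compile)
//   g++ -O2 sum.cc -o sum_O2   (the usual choice for production builds)
//   g++ -O3 sum.cc -o sum_O3   (aggressive: may vectorize or fold the loop)
#include <cstdio>

int main() {
    const long long n = 100000000;
    long long sum = 0;
    for (long long i = 0; i < n; i++)
        sum += i; // at -O0 this runs as written, with i and sum in memory;
                  // at -O2/-O3 the compiler may keep them in registers or
                  // even replace the whole loop with its closed-form result
    printf("%lld\n", sum);
    return 0;
}
```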
content/english/hpc/compilation/situational.md

+1 -1

@@ -96,7 +96,7 @@ The whole process is automated by modern compilers. For example, the `-fprofile-
 g++ -fprofile-generate [other flags] source.cc -o binary
 ```
 
-After we run the program — preferably on input that is as representative of real use case as possible — it will create a bunch of `*.gcda` files that contain log data for the test run, after which we can rebuild the program, but now adding the `-fprofile-use` flag:
+After we run the program — preferably on input that is as representative of the real use case as possible — it will create a bunch of `*.gcda` files that contain log data for the test run, after which we can rebuild the program, but now adding the `-fprofile-use` flag:
 
 ```
 g++ -fprofile-use [other flags] source.cc -o binary

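To make the whole cycle concrete, here is a sketch of the kind of program that profile-guided optimization helps with (the function and file names are hypothetical; only the two `-fprofile-*` flags come from the text above):

```cpp
// Hypothetical PGO cycle (file names are illustrative):
//   g++ -fprofile-generate -O2 parse.cc -o parse    (instrumented build)
//   ./parse < representative_input.txt              (writes *.gcda files)
//   g++ -fprofile-use -O2 parse.cc -o parse         (optimized rebuild)
#include <cstdio>

// With profile data, the compiler learns which branch is cold and can lay
// out the machine code so that the hot path falls through without jumps.
int parse_digit(const char *s) {
    if (s == nullptr || *s == '\0') // rare in the profiled runs: cold path
        return -1;
    return *s - '0';                // common case: hot path
}

int main() {
    char buf[64];
    int sum = 0;
    while (fgets(buf, sizeof(buf), stdin))
        sum += parse_digit(buf);
    printf("%d\n", sum);
    return 0;
}
```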
content/english/hpc/external-memory/_index.md

+2 -2

@@ -19,15 +19,15 @@ When you fetch anything from memory, the request goes through an incredibly comp
 
 -->
 
-When you fetch anything from memory, there is always some latency before the data arrives. Moreover, the request doesn't go directly to its ultimate storage location, but it first goes through a complex system of address translation units and caching layers designed to both help in memory management and reduce the latency.
+When you fetch anything from memory, there is always some latency before the data arrives. Moreover, the request doesn't go directly to its ultimate storage location, but it first goes through a complex system of address translation units and caching layers designed to both help in memory management and reduce latency.
 
 Therefore, the only correct answer to this question is "it depends" — primarily on where the operands are stored:
 
 - If the data is stored in the main memory (RAM), it will take around ~100ns, or about 200 cycles, to fetch it, and then another 200 cycles to write it back.
 - If it was accessed recently, it is probably *cached* and will take less than that to fetch, depending on how long ago it was accessed — it could be ~50 cycles for the slowest layer of cache and around 4-5 cycles for the fastest.
 - But it could also be stored on some type of *external memory* such as a hard drive, and in this case, it will take around 5ms, or roughly $10^7$ cycles (!) to access it.
 
-Such high variance of memory performance is caused by the fact that memory hardware doesn't follow the same [laws of silicon scaling](/hpc/complexity/hardware) as CPU chips do. Memory is still improving through other means, but if 50 years ago memory timings were roughly on the same scale with the instruction latencies, nowadays they lag far behind.
+Such a high variance of memory performance is caused by the fact that memory hardware doesn't follow the same [laws of silicon scaling](/hpc/complexity/hardware) as CPU chips do. Memory is still improving through other means, but if 50 years ago memory timings were roughly on the same scale with the instruction latencies, nowadays they lag far behind.
 
 ![](img/memory-vs-compute.png)

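These orders of magnitude can be checked empirically by chasing a random pointer cycle through arrays of increasing size, so that each access depends on the previous one and the latency cannot be hidden. A rough sketch (the sizes, seed, and iteration count are illustrative assumptions, not from the text):

```cpp
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Average latency of one dependent memory access in an n-element working set.
double ns_per_access(size_t n) {
    // Sattolo's algorithm: a random permutation with a single n-cycle,
    // so the chase below visits every element instead of a short loop.
    std::mt19937 rng(42);
    std::vector<int> next(n);
    std::iota(next.begin(), next.end(), 0);
    for (size_t i = n - 1; i > 0; i--)
        std::swap(next[i], next[rng() % i]);

    const int steps = 10000000;
    auto start = std::chrono::steady_clock::now();
    int cur = 0;
    for (int i = 0; i < steps; i++)
        cur = next[cur]; // each load depends on the previous one
    auto end = std::chrono::steady_clock::now();

    volatile int sink = cur; // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(end - start).count() / steps;
}

int main() {
    // ~16KB (L1-sized) up to ~256MB (far beyond the last-level cache)
    for (size_t n : {1 << 12, 1 << 18, 1 << 22, 1 << 26})
        printf("%8zu ints: %5.1f ns/access\n", n, ns_per_access(n));
}
```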
content/english/hpc/external-memory/hierarchy.md

+1 -1

@@ -58,7 +58,7 @@ There are other caches inside CPUs that are used for something other than data.
 
 ### Non-Volatile Memory
 
-While the data cells in CPU caches and the RAM only gently store just a few electrons (that periodically leak and need to be periodically refreshed), the data cells in *non-volatile memory* types store hundreds of them. This lets the data to persist for prolonged periods of time without power but comes at the cost of performance and durability — because when you have more electrons, you also have more opportunities for them colliding with silicon atoms.
+While the data cells in CPU caches and the RAM only gently store just a few electrons (that periodically leak and need to be periodically refreshed), the data cells in *non-volatile memory* types store hundreds of them. This lets the data persist for prolonged periods of time without power but comes at the cost of performance and durability — because when you have more electrons, you also have more opportunities for them to collide with silicon atoms.
 
 <!-- error correction -->

content/english/hpc/external-memory/model.md

+1 -1

@@ -18,7 +18,7 @@ Similar in spirit, in the *external memory model*, we simply ignore every operat
 
 In this model, we measure the performance of an algorithm in terms of its high-level *I/O operations*, or *IOPS* — that is, the total number of blocks read or written to external memory during execution.
 
-We will mostly focus on the case where the internal memory is RAM and external memory is SSD or HDD, although the underlying analysis techniques that we will develop are applicable to any layer in the cache hierarchy. Under these settings, reasonable block size $B$ is about 1MB, internal memory size $M$ is usually a few gigabytes, and $N$ is up to a few terabytes.
+We will mostly focus on the case where the internal memory is RAM and the external memory is SSD or HDD, although the underlying analysis techniques that we will develop are applicable to any layer in the cache hierarchy. Under these settings, reasonable block size $B$ is about 1MB, internal memory size $M$ is usually a few gigabytes, and $N$ is up to a few terabytes.
 
 ### Array Scan

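For a feel of the numbers: under these settings, scanning an array of $N = 1$TB in blocks of $B = 1$MB costs $N/B = 10^6$ block reads in this model, regardless of how many elements each block holds. A minimal sketch of such a block-by-block scan, charging one I/O per block the way the model does (the file name and the 1MB block size are illustrative assumptions):

```cpp
// Sum a binary file of int64s by reading it in B-sized blocks, counting one
// I/O operation per block rather than per element, as the model prescribes.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const size_t B = 1 << 20; // 1MB block, in line with the text above
    std::vector<int64_t> block(B / sizeof(int64_t));

    FILE *f = fopen("data.bin", "rb"); // hypothetical input file
    if (!f) return 1;

    int64_t sum = 0;
    long iops = 0;
    size_t read;
    while ((read = fread(block.data(), sizeof(int64_t), block.size(), f)) > 0) {
        iops++; // one charged I/O operation, regardless of elements in it
        for (size_t i = 0; i < read; i++)
            sum += block[i];
    }
    fclose(f);
    printf("sum = %lld, block reads = %ld\n", (long long) sum, iops);
}
```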
content/english/hpc/number-theory/modular.md

+1 -1

@@ -100,7 +100,7 @@
 $$
 \begin{aligned}
 a^p &= (\underbrace{1+1+\ldots+1+1}_\text{$a$ times})^p &
-\\\ &= \sum_{x_1+x_2+\ldots+x_a = p} P(x_1, x_2, \ldots, x_a) & \text{(by defenition)}
+\\\ &= \sum_{x_1+x_2+\ldots+x_a = p} P(x_1, x_2, \ldots, x_a) & \text{(by definition)}
 \\\ &= \sum_{x_1+x_2+\ldots+x_a = p} \frac{p!}{x_1! x_2! \ldots x_a!} & \text{(which terms will not be divisible by $p$?)}
 \\\ &\equiv P(p, 0, \ldots, 0) + \ldots + P(0, 0, \ldots, p) & \text{(everything else will be canceled)}
 \\\ &= a

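The identity being derived here is Fermat's little theorem, $a^p \equiv a \pmod p$, which for $a$ coprime with a prime $p$ gives $a^{p-1} \equiv 1$ and hence $a^{-1} \equiv a^{p-2} \pmod p$. A minimal sketch of putting it to use with textbook binary exponentiation (not necessarily the exact code from the chapter):

```cpp
#include <cstdint>
#include <cstdio>

typedef uint64_t u64;

// Binary exponentiation: computes a^n mod p in O(log n) multiplications.
u64 binpow(u64 a, u64 n, u64 p) {
    u64 r = 1;
    a %= p;
    while (n) {
        if (n & 1)
            r = r * a % p;
        a = a * a % p;
        n >>= 1;
    }
    return r;
}

// By Fermat's little theorem, a^(p-2) is the inverse of a modulo a prime p
// (for a not divisible by p).
u64 inverse(u64 a, u64 p) { return binpow(a, p - 2, p); }

int main() {
    const u64 p = 1000000007; // a commonly used prime modulus
    u64 inv = inverse(2, p);
    printf("%llu\n", (unsigned long long) (inv * 2 % p)); // prints 1
}
```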
0 commit comments
