.. _appendix-background:

=====================
Background
=====================

.. contents:: :local:
    :depth: 1

.. index:: ! totally bounded set, diameter of a set, bounded set

Metric spaces
-------------

A **metric space** is a pair :math:`(X, d)` where :math:`X` is a set and :math:`d: X × X → ℝ` is a **metric** (or **distance function**), that is, a function satisfying the following conditions for all :math:`x, y, z ∈ X`:

  #. :math:`d(x, y) ≥ 0`,
  #. :math:`d(x,y) = 0` if and only if :math:`x = y`,
  #. (symmetry) :math:`d(x, y) = d(y, x)`,
  #. (triangle inequality) :math:`d(x, z) ≤ d(x, y)+d(y, z)`.

Let :math:`(X, d)` be a metric space and :math:`E` a subset of :math:`X`.

If :math:`\{V_α\}` is a family of subsets of :math:`X` such that :math:`E ⊆ ⋃_α V_α`, then :math:`\{V_α\}` is called a **cover** of :math:`E`.

The **diameter** of :math:`E` is :math:`\mathrm{diam} E = \sup \{d(x, y) : x, y ∈ E\}`.

The set :math:`E` is **bounded** if :math:`\mathrm{diam} E < ∞` and **totally bounded** if for every :math:`ε > 0` it can be covered by finitely many balls of radius :math:`ε`.
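
For a finite sample of points in :math:`ℝ^n` with the Euclidean metric, the diameter and a finite cover by :math:`ε`-balls can be computed directly. The Python sketch below is only an illustration under that assumption (the helper names ``diameter`` and ``greedy_eps_net`` are ours): it computes the diameter and greedily chooses centers so that the open balls of radius :math:`ε` around them cover the sample. Since a finite set is trivially totally bounded, the interest lies in the covering procedure itself.

.. code-block:: python

   import itertools
   import math


   def diameter(points):
       """sup of d(x, y) over pairs of points (0.0 if there are fewer than two)."""
       return max((math.dist(p, q) for p, q in itertools.combinations(points, 2)),
                  default=0.0)


   def greedy_eps_net(points, eps):
       """Pick centers so that every point is within distance < eps of some center."""
       centers = []
       for p in points:
           # p becomes a new center only if every existing center is at least eps away.
           if all(math.dist(p, c) >= eps for c in centers):
               centers.append(p)
       return centers


   # Illustrative sample: a 21 x 21 grid on the unit square.
   grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
   net = greedy_eps_net(grid, 0.3)
   print(diameter(grid), len(net))

Distinct chosen centers are at distance at least :math:`ε` from one another. For an infinite totally bounded set, this is why the same greedy procedure must stop after finitely many steps: each ball of radius :math:`ε/2` in a finite cover can contain at most one center.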

Compactness
~~~~~~~~~~~

.. index:: ! Bolzano-Weierstrass theorem, Heine-Borel theorem, compact set

.. _bolzano-weierstrass:
.. _heine-borel:
.. _fo.0.25:
.. _co.4.9:

.. proof:theorem:: Compactness theorem

   If :math:`E` is a subset of a metric space :math:`(X, d)`, then the following are equivalent.
  
   #. :math:`E` is complete and totally bounded.
   #. Every infinite set in :math:`E` has a limit point in :math:`E`.
   #. (Bolzano-Weierstrass) Every sequence in :math:`E` has a subsequence that converges to a point of :math:`E`.
   #. (Heine-Borel) If :math:`\{V_\alpha\}` is a cover of :math:`E` by open sets, there is a finite set :math:`\{\alpha_1, \dots, \alpha_n\}` such that :math:`\{V_{\alpha_i}\}_{i=1}^n` covers :math:`E`.

Sets satisfying one (hence all) of the conditions in the previous theorem are called **compact**.  Compactness is most commonly defined using the last item, the Heine-Borel property, which can be stated simply as follows: a set is compact iff *every open cover has a finite subcover*.

(Cf. :cite:`Folland:1999` Thm. 0.25, :cite:`Conway:1978` Cor. 4.9)

Density and Category
~~~~~~~~~~~~~~~~~~~~

Let us recall some basic definitions.

+ :math:`G` is a **dense** set in :math:`X` if each :math:`x\in X` is a point of :math:`G` or a limit point of :math:`G`. (Equivalently, :math:`\bar{G} = X`.)
+ :math:`G` is a **nowhere dense** set in :math:`X` if :math:`\bar{G}` contains no nonempty open subsets of :math:`X`. (Equivalently, the interior of :math:`\bar{G}` is empty: :math:`(\bar{G})^{\circ} = \emptyset`.)
+ A set :math:`G` is of the **first category** if it is a countable union of nowhere dense sets.
+ A set :math:`G` is of the **second category** if it is not of the first category.

.. index:: ! Baire category theorem
.. _baire:

.. proof:theorem:: Baire category theorem

   No nonempty complete metric space is of the first category.

Another common formulation: if :math:`X` is a complete metric space and :math:`\{A_n\}_{n=1}^∞` is a sequence of open dense subsets of :math:`X`, then :math:`⋂_{n=1}^∞ A_n` is dense in :math:`X`.

There are many important consequences of the Baire category theorem. The most famous are probably the :term:`Banach-Steinhaus theorem`, :term:`Open mapping theorem`, :term:`Inverse mapping theorem`, and :term:`Closed graph theorem`.

We present versions of those important results in the :ref:`appendix of theorems <appendix-theorems>`, but first here are two immediate corollaries of the :term:`Baire category theorem`.

.. _cor1-baire:

.. proof:corollary:: Baire corollary 1

   If :math:`X` is a complete metric space, :math:`G ⊆ X` is a nonempty open set, and :math:`G = ⋃_{n=1}^∞ G_n`, then :math:`(\overline{G_n})^{\circ} ≠ ∅` for at least one :math:`n ∈ ℕ`.

.. _cor2-baire:

.. proof:corollary:: Baire corollary 2

   A nonempty complete metric space is not a countable union of nowhere dense sets.

.. index:: ! uniformly continuous, continuous
.. _uniformly-continuous:
.. _continuous:

Continuous maps of a metric space
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let :math:`(X, d_X)` and :math:`(Y, d_Y)` be metric spaces. 

The notation :math:`|x - x'|_X`, or even :math:`|x - x'|`, is often used in place of :math:`d_X(x,x')`.

A function :math:`f : X \to Y` is called

+ **continuous** at the point :math:`x_0 \in X` if

  .. math:: (\forall \epsilon >0)\, (\exists \delta > 0) \, (\forall x \in X) \, (|x - x_0| < \delta \, \to \, |f(x) -f(x_0)| < \epsilon),

+ **continuous** in :math:`E\subseteq X` if it is continuous at every point of :math:`E`,

+ **uniformly continuous** in :math:`E\subseteq X` if

  .. math:: (\forall \epsilon >0)\, (\exists \delta >0)\, (\forall x, x_0 \in E) \, (|x - x_0| < \delta \, \to \, |f(x) -f(x_0)| < \epsilon).

.. _basic-continuity-theorem:

.. proof:theorem:: Basic continuity theorem

   If :math:`f` is continuous in a compact set, then it is uniformly continuous in that set.
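
As an illustration (the example is ours), take :math:`f(x) = x^2` with the usual metric. On the compact set :math:`[0,1]`,

.. math:: |x^2 - x_0^2| = |x + x_0|\,|x - x_0| ≤ 2|x - x_0| \qquad (x, x_0 ∈ [0,1]),

so :math:`δ = ε/2` works at every point of :math:`[0,1]` simultaneously. On the noncompact set :math:`ℝ`, however, for any :math:`δ > 0` the points :math:`x_0 = 1/δ` and :math:`x = x_0 + δ/2` satisfy :math:`|x - x_0| < δ` but

.. math:: x^2 - x_0^2 = (x - x_0)(x + x_0) = \frac{δ}{2}\Bigl(\frac{2}{δ} + \frac{δ}{2}\Bigr) = 1 + \frac{δ^2}{4} > 1,

so no single :math:`δ` works for :math:`ε = 1`, and :math:`f` is continuous but not uniformly continuous in :math:`ℝ`.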

-------------------------------------

Topological spaces
------------------

For a :term:`topological space` :math:`(X, τ)` and a point :math:`x ∈ X`, a collection :math:`ℬ_x` of :term:`neighborhoods <neighborhood>` of :math:`x` is called a **base** for the topology at :math:`x` provided for any neighborhood :math:`V` of :math:`x`, there is a set :math:`B ∈ ℬ_x` for which :math:`B ⊆ V`. A collection :math:`ℬ` of open sets is called a **base** for the topology :math:`τ` provided it contains a base for the topology at each point.

Observe that a subcollection :math:`ℬ ⊆ τ` is a base for :math:`τ` if and only if every nonempty open set is the union of a subcollection of :math:`ℬ`.

.. proof:theorem::

    For a nonempty set :math:`X`, let :math:`ℬ` be a collection of subsets of :math:`X`. Then :math:`ℬ` is a base for a topology if and only if
    
    #. :math:`ℬ` covers :math:`X`; that is, :math:`X = ⋃_{B ∈ ℬ} B`.
    #. if :math:`B_1` and :math:`B_2` are in :math:`ℬ` and :math:`x ∈ B_1 ∩ B_2`, then there is a set :math:`B ∈ ℬ` such that :math:`x ∈ B ⊆ B_1 ∩ B_2`.

    The unique topology that has :math:`ℬ` as its base consists of :math:`∅` and unions of subcollections of :math:`ℬ`.
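
For a finite set :math:`X`, both conditions of the previous theorem, and the topology generated by :math:`ℬ`, can be checked by brute force. The Python sketch below does this (the helper names ``is_base`` and ``generated_topology`` are ours); it enumerates all subcollections of :math:`ℬ`, so it is exponential in :math:`|ℬ|` and meant only for small examples.

.. code-block:: python

   from itertools import combinations


   def is_base(X, B):
       """Check conditions 1 and 2 for a collection B of subsets of a finite set X."""
       B = [frozenset(b) for b in B]
       # Condition 1: B covers X.
       if frozenset().union(*B) != frozenset(X):
           return False
       # Condition 2: each x in B1 ∩ B2 lies in some B3 in B with B3 ⊆ B1 ∩ B2.
       for B1 in B:
           for B2 in B:
               common = B1 & B2
               for x in common:
                   if not any(x in B3 and B3 <= common for B3 in B):
                       return False
       return True


   def generated_topology(B):
       """All unions of subcollections of B; the empty union gives the empty set."""
       B = [frozenset(b) for b in B]
       return {frozenset().union(*sub)
               for r in range(len(B) + 1)
               for sub in combinations(B, r)}


   X = {1, 2, 3}
   print(is_base(X, [{1}, {2}, X]))        # a base
   print(is_base(X, [{1, 2}, {2, 3}]))     # fails condition 2 at the point 2
   print(sorted(map(sorted, generated_topology([{1}, {2}, X]))))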


.. Continuous maps of topological spaces
.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Let :math:`(X, \tau)` and :math:`(Y, \nu)` be topological spaces.

.. A function :math:`f : X \to Y` is called **continuous** if :math:`V \in \nu` implies :math:`f^{-1}(V)\in \tau`.


--------------------------

Abstract Measure
------------------

Recall the definitions of a :term:`semiring of sets`, a :term:`ring of sets` and an :term:`algebra of sets`.

It should be obvious that every ring is closed under finite unions. Also, since an arbitrary ring :math:`R` is nonempty, it contains a set, say :math:`A ∈ R`, and so :math:`∅ = A - A ∈ R`. That is, every ring contains the empty set.

Also, from the identities 

.. math:: A \mathrel{Δ} B =(A- B) ∪ (B -A) \quad  \text{ and } \quad A∩ B =A - (A- B),

it easily follows that every ring is closed under symmetric differences and finite intersections.

It is also easy to check that every algebra is a ring: a collection closed under complements and finite unions is closed under differences, since :math:`A - B = (A^c ∪ B)^c`. In the converse direction, we have the following result:

.. proof:lemma::

   A :term:`ring of subsets <ring of sets>` of :math:`X` is an algebra if and only if it contains :math:`X`.
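
On a finite set the lemma can be checked exhaustively. The Python sketch below assumes that a ring is a nonempty collection closed under unions and differences, and that an algebra is a nonempty collection closed under complements in :math:`X` and under unions; these are common definitions and may be phrased differently (but equivalently) in the glossary. The helper names are ours.

.. code-block:: python

   from itertools import combinations


   def powerset(X):
       """All subsets of the finite set X, as frozensets."""
       items = list(X)
       return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


   def is_ring(R):
       """Nonempty and closed under union and set difference."""
       R = {frozenset(A) for A in R}
       return bool(R) and all(A | B in R and A - B in R for A in R for B in R)


   def is_algebra(R, X):
       """Nonempty and closed under complement (relative to X) and union."""
       X = frozenset(X)
       R = {frozenset(A) for A in R}
       return (bool(R) and all(X - A in R for A in R)
               and all(A | B in R for A in R for B in R))


   def lemma_holds(X):
       """Check 'algebra iff (ring containing X)' for every collection of subsets of X."""
       X = frozenset(X)
       subsets = powerset(X)
       for mask in range(2 ** len(subsets)):
           R = [S for i, S in enumerate(subsets) if (mask >> i) & 1]
           if is_algebra(R, X) != (is_ring(R) and X in R):
               return False
       return True


   print(lemma_holds({1, 2, 3}))   # checks all 256 collections of subsets of {1, 2, 3}

For instance, :math:`\{∅, \{1\}\}` is a ring of subsets of :math:`\{1, 2, 3\}` but not an algebra, since it does not contain :math:`X`.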

----------------------------------------

Measurable Functions
--------------------

:term:`Continuous functions <continuous function>` of continuous functions are continuous, and continuous functions of :term:`measurable functions <measurable function>` are measurable. We state this precisely as follows.

.. theorem 1.7 of :cite:`Rudin:1987`

.. _ru.1.7:

.. proof:theorem::

   Let :math:`Y` and :math:`Z` be :term:`topological spaces <topological space>`, and let :math:`g: Y → Z` be a :term:`continuous function`.

   #. If :math:`X` is a topological space, if :math:`f: X → Y` is continuous, and if :math:`h = g ∘ f`, then :math:`h: X → Z` is continuous.

   #. If :math:`X` is a :term:`measurable space`, if :math:`f: X → Y` is a :term:`measurable function`, and if :math:`h = g ∘ f`, then :math:`h: X → Z` is measurable.

   .. container:: toggle

     .. container:: header

        *Proof*.

     If :math:`V` is an :term:`open set` in :math:`Z`, then :math:`g^{-1}(V)` is open in :math:`Y` and :math:`h^{-1}(V) = (g ∘ f)^{-1}(V) = f^{-1}(g^{-1}(V))`.
   
     If :math:`f` is continuous, then :math:`h^{-1}(V)` is open, proving the first statement of the theorem.
   
     If :math:`f` is measurable, then :math:`h^{-1}(V)` is measurable, proving the second statement of the theorem.
   
     Note, however, that *measurable functions of continuous functions need not be measurable*.

.. Theorem 1.8. of :cite:`Rudin:1987` 

.. _ru.1.8:

.. proof:theorem::

   Let :math:`u` and :math:`v` be real-valued :term:`measurable functions <measurable function>` on a :term:`measurable space` :math:`X`, let :math:`Φ` be a continuous mapping of the plane into a :term:`topological space` :math:`Y`, and define :math:`h(x) = Φ(u(x), v(x))` for all :math:`x ∈ X`. Then :math:`h: X → Y` is measurable.

------------------------

.. _absolute-continuity-of-functions:

Absolute continuity of functions
---------------------------------

.. proof:theorem:: AC is equivalent to being an indefinite integral

   A function :math:`f` is :term:`absolutely continuous <absolutely continuous function>` on :math:`[a,b]` if and only if :math:`f'` exists a.e. in :math:`(a,b)`, :math:`f' ∈ L_1[a,b]`, and :math:`∫_a^x f'(t) \,dt = f(x) - f(a)` for all :math:`a ≤ x ≤ b`.

   .. container:: toggle

       .. container:: header

           **Proof**.

       (⇒) Suppose :math:`f∈ AC[a,b]`.

         Then :math:`f∈ BV[a,b]`, since :term:`AC implies BV`.  Therefore, :math:`f = g - h` for some :term:`monotone increasing functions <monotone increasing function>` :math:`g` and :math:`h`.  By the :term:`differentiability of increasing functions`, :math:`f' = g' - h'` exists a.e. in :math:`(a,b)` and :math:`|f'(x)| ≤ g'(x) + h'(x)` for almost all :math:`x ∈ (a,b)`.  Hence, for :math:`a ≤ x ≤ b`,

         .. math:: ∫_a^x |f'(t)| \,dt &≤ ∫_a^x (g'(t) + h'(t)) \, dt\\
                                      &= ∫_a^x g'(t)\, dt + ∫_a^x h'(t) \, dt\\
                                      &≤ (g(x) - g(a)) + (h(x) - h(a))\\
                                      &≤ (g(b) - g(a)) + (h(b) - h(a)) < ∞.

         The second inequality follows from the :term:`theorem on differentiability of increasing functions <differentiability of increasing functions>`, and the third from the monotonicity of :math:`g` and :math:`h`.
   
         Therefore, :math:`f'∈ L_1[a,b]`.
         
         Let :math:`G(x) = ∫_a^x f'(t)\, dt`. By the :term:`theorem on the derivative of the integral <derivative of the integral>`, :math:`G'(x) = f'(x)` for almost all :math:`x ∈ [a,b]`. Moreover, :math:`G` is absolutely continuous, and hence so is the function :math:`F = f - G`.

         Therefore, :math:`F'(x) = f'(x) - G'(x) = 0` for almost all :math:`x ∈ [a,b]`.

         It follows from this and the :term:`lemma on functions with a.e. zero derivative <AC and a.e. zero derivative implies constant>` that :math:`F` is constant on :math:`[a,b]`.

         Since :math:`F(a) = f(a) - G(a) = f(a)`, we see that :math:`F(x) = f(a)` for all :math:`x∈ [a,b]`.  Therefore,

         .. math:: f(a) = F(x) = f(x) - G(x) = f(x) - ∫_a^x f'(t)\, dt \qquad (a ≤ x ≤ b),

         as desired.
   
       (⇐) Assume the stated hypotheses.
   
         By a standard theorem [2]_, :math:`f' ∈ L_1` implies that for all :math:`ε > 0` there is a :math:`δ > 0` such that, if :math:`E ⊆ ℝ` is a measurable set of measure :math:`m E < δ`, then
   
         .. math:: ∫_E |f'|\, dm < ε.
            :label: l8-1
   
         Let :math:`A = ⋃_{i=1}^n (a_i,b_i)` be a finite union of disjoint open intervals in :math:`[a,b]` such that :math:`∑_{i=1}^n (b_i-a_i) < δ`. Then :math:`m A ≤ ∑_{i=1}^n (b_i-a_i) < δ`, so
   
         .. math:: ∑_{i=1}^n |f(b_i)-f(a_i)| = ∑_{i=1}^n \left|∫_{a_i}^{b_i} f' dm\right| ≤ ∑_{i=1}^n ∫_{a_i}^{b_i} |f'| dm = ∫_A |f'| dm < ε
            :label: l8-2
   
         by :eq:`l8-1`. Thus, :math:`f ∈ AC[a,b]`.
         
       ☐
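
Absolute continuity cannot be weakened to continuity plus bounded variation in the previous theorem. The standard counterexample is the Cantor function :math:`C`, which is continuous and increasing with :math:`C' = 0` a.e., so :math:`∫_0^1 C'\, dm = 0 ≠ 1 = C(1) - C(0)`. The Python sketch below (the helper names ``cantor`` and ``stage_intervals`` are ours) checks with exact rational arithmetic that :math:`C` violates the definition of absolute continuity: the :math:`2^k` disjoint closed intervals that remain after :math:`k` steps of the Cantor construction have total length :math:`(2/3)^k → 0`, yet the increments of :math:`C` over them always sum to 1.

.. code-block:: python

   from fractions import Fraction


   def cantor(x):
       """Cantor function at a rational x in [0, 1] (exact if x has a finite ternary expansion)."""
       x = Fraction(x)
       if x >= 1:
           return Fraction(1)
       value, scale = Fraction(0), Fraction(1, 2)
       for _ in range(200):
           x *= 3
           digit = int(x)                  # next ternary digit: 0, 1 or 2
           x -= digit
           if digit == 1:                  # x is in a removed middle third (or at its left end)
               return value + scale
           value += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
           scale /= 2
           if x == 0:
               break
       return value


   def stage_intervals(k):
       """The 2**k disjoint closed intervals left after k steps of the Cantor construction."""
       intervals = [(Fraction(0), Fraction(1))]
       for _ in range(k):
           intervals = [piece
                        for a, b in intervals
                        for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
       return intervals


   for k in range(1, 7):
       ivs = stage_intervals(k)
       total_length = sum(b - a for a, b in ivs)
       total_increment = sum(cantor(b) - cantor(a) for a, b in ivs)
       print(k, total_length, total_increment)

The function ``cantor`` is exact only for rationals with a terminating ternary expansion, such as the interval endpoints used here; that restriction is a simplification made for the sketch.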

The following is essentially a restatement of the previous result in a form that may be easier to apply.

.. proof:theorem:: Fundamental theorem of calculus

   If :math:`-∞ < a < b < ∞` and :math:`f: [a,b] → ℂ`, then the following are equivalent.

   #. :math:`f ∈ AC[a,b]`
   #. There is a :math:`g∈ L_1([a,b], m)` such that :math:`f(x) - f(a) = ∫_a^x g(t) \, dt` for all :math:`x ∈ [a,b]`.
   #. :math:`f` is differentiable a.e. on :math:`[a,b]`, :math:`f' ∈ L_1([a,b],m)`, and :math:`∫_a^x f' \, dm = f(x) - f(a)` for all :math:`x ∈ [a,b]`.

   .. container:: toggle
   
       .. container:: header
   
           **Proof**.

       The equivalence 1 ⟺ 2 is the theorem asserting that :term:`AC is equivalent to being an indefinite integral`.

       The implication 1 ⟹ 3 is the forward direction of the previous theorem.

       The converse, 3 ⟹ 1, is also covered by the previous theorem and its proof, but here is another argument (which is only slightly different).

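       Recall that if :math:`f' ∈ L_1`, then for every :math:`ε > 0` there is a :math:`δ > 0` such that, for every measurable set :math:`E`,

       .. math:: m E < δ \quad ⟹ \quad \left|∫_E f' \, dm \right| < ε.

       It suffices to treat real-valued :math:`f`; for complex :math:`f`, apply the argument to :math:`\operatorname{Re} f` and :math:`\operatorname{Im} f` separately. Let :math:`\{(a_i, b_i)\}_{i=1}^n` be a finite collection of disjoint subintervals of :math:`[a,b]` with :math:`∑_{i=1}^n (b_i - a_i) < δ`, and split the indices into :math:`I_+ = \{i : f(b_i) ≥ f(a_i)\}` and :math:`I_- = \{i : f(b_i) < f(a_i)\}`. The sets :math:`E_+ = ⋃_{i ∈ I_+}(a_i, b_i)` and :math:`E_- = ⋃_{i ∈ I_-}(a_i, b_i)` each have measure less than :math:`δ`, and :math:`f(b_i) - f(a_i) = ∫_{a_i}^{b_i} f'\, dm` for each :math:`i`, so

       .. math:: ∑_{i=1}^n |f(b_i) - f(a_i)| = ∫_{E_+} f'\, dm - ∫_{E_-} f'\, dm ≤ \left|∫_{E_+} f'\, dm \right| + \left|∫_{E_-} f'\, dm \right| < 2ε.

       Since :math:`ε > 0` was arbitrary, :math:`f ∈ AC[a,b]`.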

       ☐

----------------------------------------------

.. index:: conjugate exponents

Integration
-----------

A handful of results are the most essential; they lay the foundation on which everything else is built. Rudin :cite:`Rudin:1987` gives a beautifully succinct and clear presentation of them in just seven pages (pp. 21--27). [3]_ Some of these results are presented below, but do yourself a favor and learn from the master himself by reading :cite:`Rudin:1987`.

If :math:`p` and :math:`q` are positive real numbers such that :math:`p+q = pq` (equivalently, :math:`(1/p) + (1/q) = 1`), then we call :math:`p` and :math:`q` a pair of **conjugate exponents.**

It follows that conjugate exponents satisfy :math:`1 < p, q < ∞`, and that :math:`q → ∞` as :math:`p → 1`. Accordingly, :math:`1` and :math:`∞` are also regarded as a pair of conjugate exponents.
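
Solving :math:`p + q = pq` for :math:`q` gives :math:`q = p/(p-1)` when :math:`1 < p < ∞`. The small Python helper below (the name ``conjugate_exponent`` is ours) makes this formula and the convention pairing :math:`1` with :math:`∞` concrete.

.. code-block:: python

   import math


   def conjugate_exponent(p):
       """Return q with 1/p + 1/q = 1, pairing 1 with infinity."""
       if p < 1:
           raise ValueError("conjugate exponents are defined only for p >= 1")
       if p == 1:
           return math.inf
       if math.isinf(p):
           return 1.0
       return p / (p - 1)   # solves p + q = p * q for q


   print(conjugate_exponent(2), conjugate_exponent(3), conjugate_exponent(1.001))

For example, :math:`2` is its own conjugate, :math:`3` and :math:`3/2` are conjugate, and as :math:`p` decreases to :math:`1` the conjugate exponent grows without bound.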

The following theorem is an essential ingredient of many proofs (e.g., the proof that simple functions are dense in :math:`L_p`).

.. _ru.1.17:

.. proof:theorem::

   If :math:`f: X → [0,∞]` is a :term:`measurable function`, then there exist measurable :term:`simple functions <simple function>` :math:`s_1, s_2, \dots` on :math:`X` such that

   #. :math:`0 ≤ s_1 ≤ s_2 ≤ \cdots ≤ f`,

   #. :math:`s_n(x) → f(x)` as :math:`n → ∞`, for every :math:`x ∈ X`.

This is Theorem 1.17 of :cite:`Rudin:1987`, where the proof is also presented.
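
The standard proof (as in :cite:`Rudin:1987`) builds the :math:`s_n` explicitly: with :math:`δ_n = 2^{-n}`, let :math:`φ_n(t) = kδ_n` when :math:`kδ_n ≤ t < (k+1)δ_n` and :math:`0 ≤ t < n`, let :math:`φ_n(t) = n` when :math:`n ≤ t ≤ ∞`, and put :math:`s_n = φ_n ∘ f`. The Python sketch below implements this staircase and checks conditions 1 and 2 at sample points of a sample function. The names ``phi``, ``simple_approx`` and ``f`` are ours, and floats stand in for :math:`[0,∞]`.

.. code-block:: python

   import math


   def phi(n, t):
       """Staircase phi_n: t in [0, n) is rounded down to a multiple of 2**-n; [n, inf] maps to n."""
       if t >= n:                       # includes t = math.inf
           return float(n)
       return math.floor(t * 2**n) / 2**n


   def simple_approx(f, n):
       """s_n = phi_n o f; it takes at most n * 2**n + 1 distinct values."""
       return lambda x: phi(n, f(x))


   def f(x):
       """A sample nonnegative measurable function, chosen for illustration."""
       return x * x


   for x in [i / 7 for i in range(30)]:
       values = [simple_approx(f, n)(x) for n in range(1, 25)]
       # Condition 1: 0 <= s_1(x) <= s_2(x) <= ... <= f(x).
       assert all(0 <= a <= b <= f(x) for a, b in zip(values, values[1:]))
       # Condition 2, quantitatively: once f(x) < n, 0 <= f(x) - s_n(x) < 2**-n.
       assert f(x) - values[-1] < 2.0 ** -24

The dyadic step :math:`2^{-n}` is what makes the sequence increasing: each interval :math:`[kδ_n, (k+1)δ_n)` splits into two intervals of the next generation, so refining never lowers the value.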

Other essential results about integration include the following.

  * :term:`monotone convergence theorem`
  * :term:`Fatou's lemma`
  * :term:`dominated convergence theorem`
  * :term:`Egoroff's theorem`
  * :term:`Hölder's inequality`
  * :term:`Minkowski's inequality`
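
With counting measure on a finite index set, :math:`L_p` norms reduce to finite sums, so Hölder's inequality :math:`∑|x_i y_i| ≤ \|x\|_p \|y\|_q` and Minkowski's inequality :math:`\|x + y\|_p ≤ \|x\|_p + \|y\|_p` can be spot-checked numerically. The Python sketch below is only a sanity check on random data under that assumption, not a proof; the function names are ours.

.. code-block:: python

   import math
   import random


   def lp_norm(x, p):
       """L_p norm of a finite sequence under counting measure (p = math.inf allowed)."""
       if math.isinf(p):
           return max((abs(t) for t in x), default=0.0)
       return sum(abs(t) ** p for t in x) ** (1 / p)


   def holder_gap(x, y, p):
       """||x||_p ||y||_q - sum |x_i y_i| with q conjugate to p; Hölder says this is >= 0."""
       q = math.inf if p == 1 else 1.0 if math.isinf(p) else p / (p - 1)
       return lp_norm(x, p) * lp_norm(y, q) - sum(abs(a * b) for a, b in zip(x, y))


   def minkowski_gap(x, y, p):
       """||x||_p + ||y||_p - ||x + y||_p; Minkowski says this is >= 0 for p >= 1."""
       return lp_norm(x, p) + lp_norm(y, p) - lp_norm([a + b for a, b in zip(x, y)], p)


   rng = random.Random(0)
   for _ in range(200):
       x = [rng.uniform(-1, 1) for _ in range(8)]
       y = [rng.uniform(-1, 1) for _ in range(8)]
       for p in (1, 1.5, 2, 3, math.inf):
           # Small tolerance for floating-point rounding.
           assert holder_gap(x, y, p) >= -1e-12
           assert minkowski_gap(x, y, p) >= -1e-12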

.. .. math:: A_0 = \{x ∈ X : f(x) = 0\}, \quad A_n = \{x ∈ X : |f(x)| ∈ [1/n, n]\}, \quad A_∞ = \{x : |f(x)| = ∞\}.

Here is a nice application of the :term:`dominated convergence theorem`; it says, roughly, that if :math:`f ∈ L_p`, then all but an arbitrarily small part of :math:`∫|f|^p` comes from integrating over a set of finite measure on which :math:`f` is bounded.

.. proof:theorem::

   Let :math:`(X, 𝔐, μ)` be a measure space.  If :math:`1 ≤ p < ∞` and :math:`f∈ L_p(μ)` and :math:`ε>0`, then there exists a set :math:`A ∈ 𝔐` such that :math:`μ(A) < ∞`, :math:`f` is bounded on :math:`A` and :math:`∫_{X-A}|f|^p \,dμ < ε`.

   .. container:: toggle
   
       .. container:: header
   
           **Proof**.
   
       **Case 1**. Assume :math:`f∈ L_1`.
       
         Define :math:`A_0 = \{x ∈ X : f(x) = 0\}` and :math:`A_∞ = \{x : |f(x)| = ∞\}` and 
   
         .. math:: A_n = \{x ∈ X : |f(x)| ∈ [1/n, n]\}.
   
         Then :math:`A_1 ⊆ A_2 ⊆ \cdots` and
       
         .. math:: \lim_{n→∞} A_n  = 𝔄 := ⋃_{n=1}^∞ A_n = X - A_0 - A_∞.
   
         Note that :math:`A_∞` must have measure 0, since :math:`f ∈ L_1`.  Therefore, :math:`∫_{A_0 ∪ A_∞} f\, dμ = 0`, so 
   
         .. math:: ∫_X f\, dμ = ∫_𝔄 f\, dμ + ∫_{A_0 ∪ A_∞} f\, dμ = ∫_𝔄 f \, dμ.
            :label: neglig
   
         Define :math:`g_n = f χ_{A_n}`.  Then 
   
         .. math:: |g_n(x)| = |f(x)| χ_{A_n}(x) ≤ |f(x)| \quad (∀ x ∈ X; n = 1, 2, \dots),
     
         and
     
         .. math:: \lim_n g_n(x) = f(x) χ_𝔄(x) \quad (∀ x ∈ X).
     
         Therefore, the :term:`dominated convergence theorem` implies that :math:`∫_X|g_n - f χ_𝔄|\, dμ → 0`.
     
         Next observe that, using :eq:`neglig` in the first step,

         .. math:: \bigl| ∫_{A_n}f\, dμ - ∫_X f\, dμ \bigr| &= \bigl| ∫_{A_n}f\, dμ - ∫_𝔄 f\, dμ \bigr|\\
          & = \bigl| ∫_X g_n \, dμ - ∫_𝔄 f\, dμ\bigr|\\
          & ≤ ∫_X |g_n - f χ_𝔄|\, dμ,

         which tends to zero as :math:`n→ ∞`. In fact, the last integral controls the tail of :math:`|f|` directly: since :math:`A_n ⊆ 𝔄`, we have :math:`g_n - f χ_𝔄 = -f χ_{𝔄 - A_n}`, and since :math:`f = 0` on :math:`A_0` and :math:`μ(A_∞) = 0`,

         .. math:: ∫_{X - A_n}|f|\, dμ = ∫_{𝔄 - A_n}|f|\, dμ = ∫_X |g_n - f χ_𝔄|\, dμ.

         Thus, we can choose :math:`N>0` such that

         .. math:: ∫_{X - A_N}|f|\, dμ = ∫_X |g_N - f χ_𝔄|\, dμ < ε.

         Finally, by definition of :math:`A_N`, we have :math:`1/N ≤ |f(x)| ≤ N` on :math:`A_N` (so :math:`f` is bounded there), and

         .. math:: \frac{1}{N} μ A_N ≤ ∫_{A_N}|f|\, dμ ≤ ∫_X |f|\, dμ < ∞,

         so :math:`μ A_N < ∞`. Hence :math:`A = A_N` has the required properties.
   
       **Case 2**. Assume :math:`f∈ L_p` and :math:`1 < p < ∞`.
   
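         Apply Case 1 to :math:`|f|^p`, which belongs to :math:`L_1(μ)`: there is a set :math:`A ∈ 𝔐` with :math:`μ(A) < ∞` such that :math:`|f|^p`, and hence :math:`f`, is bounded on :math:`A`, and :math:`∫_{X-A}|f|^p \,dμ < ε`.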
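
The sets :math:`A_N` from Case 1 can be computed directly when :math:`μ` is counting measure on a finite index set. There every subset has finite measure, so the content of the theorem is the tail estimate. In the Python sketch below (the helper name ``tail_outside_A`` is ours), the list ``values`` plays the role of :math:`f` with :math:`p = 1`, and the function returns the least :math:`N` for which :math:`∑_{k ∉ A_N} |f(k)| < ε`, where :math:`A_N = \{k : 1/N ≤ |f(k)| ≤ N\}`.

.. code-block:: python

   def tail_outside_A(values, eps):
       """Least N >= 1 with sum_{k not in A_N} |f(k)| < eps, A_N = {k : 1/N <= |f(k)| <= N}.

       Counting measure on the finite index set range(len(values)) is assumed.
       """
       if eps <= 0:
           raise ValueError("eps must be positive")
       N = 1
       while True:
           A_N = {k for k, v in enumerate(values) if 1 / N <= abs(v) <= N}
           tail = sum(abs(v) for k, v in enumerate(values) if k not in A_N)
           if tail < eps:
               return N, sorted(A_N), tail
           N += 1


   # Illustrative data: f(k) = 2**-k on the index set {0, ..., 39}.
   N, A, tail = tail_outside_A([2.0 ** -k for k in range(40)], 1e-3)
   print(N, A, tail)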


----------------------------------------------------------------


Approximating integrable functions by step functions
-----------------------------------------------------

A property that holds for all step functions and is preserved under limits in the :math:`L_1` norm also holds for all integrable functions.  This is a consequence of the following lemma. (See also the exercises in Chapter 2 of :cite:`Rudin:1987`.)

.. proof:lemma::

   If :math:`f ∈ L_1(ℝ)` then there exists a sequence :math:`\{g_n\}` of :term:`step functions <step function>` such that :math:`\lim_{n → ∞} ∫ |f-g_n| = 0`.

   .. container:: toggle
   
       .. container:: header
   
           **Proof**.
   
       We must show that there exists a sequence :math:`\{g_n\}` of step functions with the following property: :math:`∀ ε > 0`, :math:`∃ N ∈ ℕ`,
   
       .. math:: n ≥ N \; ⟹ \; ∫ |f-g_n| < ε.
   
       We proceed by a sequence of steps in which :math:`f` is assumed to have a special form. In each step the form of :math:`f` is slightly more general than in the previous step.
   
       *Step 1*. Suppose :math:`f = χ_A` for some measurable set :math:`A ⊆ ℝ`.
       
         By assumption :math:`f ∈ L_1(ℝ)`, so :math:`μ A < ∞`.
   
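         By definition of outer measure,

         .. math:: μ A = \inf \bigl\{ ∑_i μ A_i ∣ \{A_i\} ⊆ S \text{ countable}, A ⊆ ⋃_i A_i \bigr\},

         where :math:`S = \{[a, b) : -∞ < a < b < ∞\}`.

         Thus we can choose a countable collection :math:`\{A_i\} ⊆ S` such that :math:`A ⊆ ⋃ A_i` and

         .. math:: μ A ≤ μ (∪ A_i) ≤ ∑ μ A_i < μ A + ε/2.

         Since the difference of a half-open interval and a finite union of half-open intervals is a finite disjoint union of half-open intervals, replacing each :math:`A_i` by the pieces of :math:`A_i - (A_1 ∪ \cdots ∪ A_{i-1})` does not change :math:`⋃ A_i` or increase :math:`∑ μ A_i`. So we may also assume :math:`A_i ∩ A_j = ∅` :math:`(i ≠ j)`.

         Define :math:`B := ⋃ A_i`. Then :math:`A ⊆ B` and :math:`μ A ≤ μ B < μ A + ε/2`.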
   
         Since :math:`A ⊆ B` implies :math:`χ_A ≤ χ_B`, we have
   
         .. math:: ∫ |f-χ_B| = ∫ |χ_B - χ_A| = μ B - μ A < ε/2.
   
         Now :math:`χ_B` need not be a :term:`step function`, since :math:`B` may be a union of infinitely many intervals, so we consider
   
         .. math:: φ_n = ∑_{i=1}^n χ_{A_i}.
   
         Since the :math:`A_i`'s are disjoint, we have
   
         .. math:: χ_B = χ_{∪A_i} = ∑_{i=1}^∞ χ_{A_i} = \lim_{n→∞} φ_n.
   
         By the :term:`monotone convergence theorem`, :math:`∫ φ_n → ∫ χ_B`.
   
         Let :math:`N` be such that :math:`∫(χ_B - φ_n) < ε/2` :math:`(n ≥ N)`.  Then :math:`∀ n ≥ N`,
   
         .. math:: ∫ |χ_A-φ_n| &≤ ∫ |χ_A - χ_B| + ∫ |χ_B-φ_n|\\
                               &= ∫ (χ_B - χ_A) + ∫ (χ_B-φ_n) < ε.
   
         Now note that :math:`φ_n` is a finite linear combination of characteristic functions of bounded intervals; i.e., :math:`φ_n` is a :term:`step function`.
   
         This completes the proof for the special case :math:`f = χ_A` where :math:`A ⊆ ℝ` is a measurable set of finite measure.
   
       *Step 2*. Suppose :math:`f` is a measurable :term:`simple function`.

         Then :math:`f = ∑_{i=1}^n α_i χ_{A_i}`, where :math:`α_1, \dots, α_n` are the distinct nonzero values of :math:`f` and each :math:`A_i = \{x : f(x) = α_i\}` is measurable.

         By assumption :math:`f ∈ L_1(ℝ)`, so :math:`μ A_i < ∞` for each :math:`i = 1, \dots, n`.

         By Step 1, for each :math:`1 ≤ i ≤ n`, we can find a step function :math:`φ_i` such that

         .. math:: ∫ |χ_{A_i} - φ_i| < \frac{ε}{nM},

         where :math:`M = \max \{|α_i| : 1≤ i≤ n\}`.  Then :math:`φ = ∑_{i=1}^n α_i φ_i` is a step function and

         .. math:: ∫ |f-φ| &= ∫ \bigl|∑α_i χ_{A_i} - ∑α_i φ_i\bigr|\\
                           &= ∫ \bigl|∑α_i (χ_{A_i} - φ_i)\bigr|\\
                           &≤ ∑ |α_i| ∫ |χ_{A_i} - φ_i|\\
                           &< ∑ |α_i| \frac{ε}{nM} ≤ ε.

       *Step 3*. Suppose :math:`f` is a :term:`nonnegative <nonnegative function>` integrable function; i.e., :math:`f ≥ 0` and :math:`f ∈ L_1(ℝ)`.

         By the definition of :math:`∫ f` as the supremum of :math:`∫ s` over measurable simple functions :math:`0 ≤ s ≤ f`, for every :math:`ε > 0` there exists such an :math:`s` with :math:`∫ s ≤ ∫ f < ∫ s + ε/2`. Whence, :math:`0 ≤ ∫ (f - s) < ε/2`.

         Also, the assumption :math:`0 ≤ s ≤ f ∈ L_1(ℝ)` implies :math:`s ∈ L_1(ℝ)`.

         By Step 2, there exists a step function :math:`φ` such that :math:`∫ |s - φ| < ε/2`.  Therefore,

         .. math:: ∫ |f-φ| = ∫ |f - s + s - φ| ≤ ∫ (f - s) + ∫ |s - φ| < ε.

       *Step 4*. Let :math:`f ∈ L_1(ℝ)`.  Write :math:`f = f^+ - f^-`  where :math:`f^+` and :math:`f^-` are nonnegative integrable functions on :math:`ℝ`.

         Then, by Step 3, for all :math:`ε > 0` there exist step functions :math:`φ, ψ` such that

         .. math:: ∫ |f^+-φ| < ε/2 \; \text{ and } \; ∫ |f^--ψ| < ε/2.

         Whence,

         .. math:: ∫ |f - (φ - ψ)|  &= ∫ | f^+ - f^- - φ + ψ| \\
                                    &= ∫ | (f^+ - φ) + (ψ - f^-)| \\
                                    &≤ ∫ | f^+ - φ| + ∫|ψ - f^-| < ε.

         Thus, :math:`g := φ-ψ` is a step function such that :math:`∫|f - g| < ε`. Taking :math:`ε = 1/n` for :math:`n = 1, 2, \dots` yields the desired sequence :math:`\{g_n\}`. ☐
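
The lemma can be seen concretely for :math:`f(x) = x^2` on :math:`[0,1)` and :math:`f = 0` elsewhere (an example chosen for illustration, not the construction in the proof). With the left-endpoint step functions :math:`g_n = ∑_{k=0}^{n-1} (k/n)^2 χ_{[k/n,(k+1)/n)}`, one finds :math:`∫|f - g_n| = 1/(2n) - 1/(6n^2) → 0`. The Python sketch below (the name ``step_l1_error`` is ours) computes this integral exactly with rational arithmetic.

.. code-block:: python

   from fractions import Fraction


   def step_l1_error(n):
       """Exact integral of |f - g_n| for f = x^2 on [0, 1) and g_n = (k/n)^2 on [k/n, (k+1)/n)."""
       total = Fraction(0)
       for k in range(n):
           a, b = Fraction(k, n), Fraction(k + 1, n)
           # On [a, b) we have x^2 >= a^2, so the integrand is x^2 - a^2.
           total += (b**3 - a**3) / 3 - a**2 * (b - a)
       return total


   for n in (1, 2, 10, 100):
       print(n, step_l1_error(n))

Summing the interval contributions gives :math:`1/3 - (n-1)(2n-1)/(6n^2)`, which simplifies to the closed form above.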

----------------------------------------------------

Fubini and Tonelli Theorems
------------------------------

.. index:: ! Fubini's theorem
.. _fubini:


We present versions of the Fubini and Tonelli theorems in the :ref:`appendix of theorems <appendix-theorems>`.

--------------------------------------------

.. index:: Banach space, normed linear space, linear functional, functional

Linear Spaces and Functionals
-----------------------------

The main reference for this section is :cite:`Rudin:1987`.

**Notation.** (cf. :cite:`Rudin:1987`, 2.9) The support of a complex function :math:`f` on a topological space :math:`X` is the closure of the set :math:`\{x:f(x) \neq 0\}`.

The collection of all continuous complex functions on :math:`X` whose support is compact is denoted by :math:`C_c(X)`.

Observe that :math:`C_c(X)` is a vector space because,

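  (a) the support of :math:`f + g` is a closed subset of the union of the respective supports of :math:`f` and :math:`g`, the support of a scalar multiple :math:`cf` is a closed subset of the support of :math:`f`, any finite union of compact sets is compact, and any closed subset of a compact set is compact, and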

  (b) the sum of two continuous complex functions is continuous, as are scalar multiples of continuous functions.

There are at least two useful versions of the famous representation theorem of F. Riesz. We unimaginatively call these the :term:`Riesz representation theorem` and :term:`Riesz representation theorem (version 2)` in the :ref:`appendix of theorems <appendix-theorems>`.

Another result that everyone should know is the :term:`Hahn-Banach theorem`.

Consequences of the Baire Category Theorem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here are the four famous consequences of the Baire category theorem that we mentioned above.

* :term:`Banach-Steinhaus theorem`
* :term:`open mapping theorem`
* :term:`inverse mapping theorem`
* :term:`closed graph theorem`

-------------------------------------------------------------------------

.. index:: ! Hilbert space, separable Hilbert space, isometry

Hilbert Space
-------------

A **Hilbert space** is an inner product space that is complete with respect to the norm induced by its inner product.

An infinite-dimensional Hilbert space is called **separable** if it has a countable orthonormal basis.

(We adopt the convention that the term :term:`separable` is only applied to infinite-dimensional Hilbert spaces. [6]_ )

Two Hilbert spaces :math:`ℋ_1, ℋ_2` are called **isometrically isomorphic** if there exists a unitary operator :math:`U: ℋ_1 ↠ ℋ_2`.

In other words, :math:`U` is a surjective :term:`isometry` from :math:`ℋ_1` to :math:`ℋ_2`, which means that :math:`U` is a linear surjection that "preserves the inner product" in the following sense: :math:`⟨ Ux, Uy ⟩_{ℋ_2} = ⟨ x, y ⟩_{ℋ_1}`.
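
A finite-dimensional illustration (not the infinite-dimensional setting of this section): the discrete Fourier transform on :math:`ℂ^n`, scaled by :math:`1/\sqrt{n}`, is a unitary operator. The Python sketch below assumes the inner product :math:`⟨x, y⟩ = ∑_k x_k \overline{y_k}` (linear in the first argument); the helper names are ours. It checks numerically that the transform preserves inner products and that flipping the sign in the exponent inverts it.

.. code-block:: python

   import cmath
   import math


   def inner(x, y):
       """<x, y> = sum_k x_k * conj(y_k), linear in the first argument."""
       return sum(a * complex(b).conjugate() for a, b in zip(x, y))


   def unitary_dft(x):
       """Discrete Fourier transform on C^n, scaled by 1/sqrt(n) so that it is unitary."""
       n = len(x)
       return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) / math.sqrt(n)
               for k in range(n)]


   def inverse_unitary_dft(y):
       """The inverse (and adjoint): the same sum with the sign of the exponent flipped."""
       n = len(y)
       return [sum(y[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / math.sqrt(n)
               for j in range(n)]


   x = [1, 2j, -1, 0.5]
   y = [0, 1, 1j, 2]
   print(abs(inner(unitary_dft(x), unitary_dft(y)) - inner(x, y)))   # close to 0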

----------------------

.. rubric:: Footnotes

.. [2]
   This "standard theorem" appears often on exams (see, e.g., :numref:`Problem {number} <1991Nov_p6>`), but in a slightly weaker form in which the conclusion is that :math:`|∫_E f' \, dm| < ε`. In the present case we need :math:`∫_E |f'|\, dm < ε` to get the sum in :eq:`l8-2` to come out right.

.. [3]
   Study these seven pages until you can recite all seven theorems and their proofs in your sleep. Also, pay close attention to the details. Rudin is careful to choose definitions and hypotheses that lend themselves to concise exposition, usually without too much loss of generality. For example, he often takes the range of a “real-valued” function to be :math:`[-\infty, \infty]`, rather than :math:`\mathbb R`. It is instructive to pause occasionally and consider how his arguments depend on such choices.

.. [6]
   If finite-dimensional Hilbert spaces were allowed in the definition, they would automatically be separable, so the concept is not interesting in the finite-dimensional case.

---------------------------------

.. blank