social.tchncs.de is one of the many independent Mastodon servers you can use to participate in the fediverse.
A friendly server from Germany that tends to attract techy people but welcomes everybody. This is one of the oldest Mastodon instances.

#parametrization


Extending the Forward Forward Algorithm
arxiv.org/abs/2307.04205

The Forward Forward algorithm (Geoffrey Hinton, 2022-11) is an alternative to backpropagation for training neural networks (NNs).

Backpropagation, the most widely used and most successful optimization algorithm for training NNs, has three important limitations ...

Hinton's paper: cs.toronto.edu/~hinton/FFA13.p
Discussion: bdtechtalks.com/2022/12/19/for
...

arXiv.org: Extending the Forward Forward Algorithm
The Forward Forward algorithm, proposed by Geoffrey Hinton in November 2022, is a novel method for training neural networks as an alternative to backpropagation. In this project, we replicate Hinton's experiments on the MNIST dataset, and subsequently extend the scope of the method with two significant contributions. First, we establish a baseline performance for the Forward Forward network on the IMDb movie reviews dataset. As far as we know, our results on this sentiment analysis task mark the first instance of the algorithm's extension beyond computer vision. Second, we introduce a novel pyramidal optimization strategy for the loss threshold - a hyperparameter specific to the Forward Forward method. Our pyramidal approach shows that a good thresholding strategy causes a difference of up to 8% in test error. Lastly, we perform visualizations of the trained parameters and derive several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network.
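In a nutshell, each layer is trained locally: a "goodness" score (e.g. the sum of squared activations) should exceed a threshold for positive (real) data and fall below it for negative data, so no end-to-end backward pass is needed. Below is a minimal, hypothetical PyTorch sketch of a single FF layer; the class name, threshold value, and training-loop details are illustrative assumptions, not Hinton's or the paper's reference code.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One Forward Forward layer, trained locally without backprop through the stack."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold  # the loss-threshold hyperparameter discussed above
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so only its direction (not the previous layer's
        # goodness) is passed on, then apply the nonlinearity.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations; push it above the threshold
        # for positive samples and below it for negative samples.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,   # penalize low goodness on positive data
            g_neg - self.threshold,   # penalize high goodness on negative data
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()               # gradients stay inside this layer only
        self.opt.step()
        # Detach the outputs so the next layer can train on them without
        # propagating gradients back through this one.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

Layers are then stacked greedily: each layer's detached positive and negative outputs are fed to the next layer, which repeats the same local update.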

Pruning vs Quantization: Which is Better?
arxiv.org/abs/2307.02973

* Pruning removes weights, reducing the memory footprint
* Quantization (e.g. 4-bit or 8-bit matrix multiplication) reduces the bit-width used for both the weights and the computation in neural networks, leading to predictable memory savings and reductions in the required compute (a toy comparison of the two is sketched below)

In most cases quantization outperforms pruning.

arXiv.org: Pruning vs Quantization: Which is Better?
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
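To make the trade-off concrete, here is a hypothetical NumPy toy that compares the two on a single random weight matrix: magnitude pruning zeroes the smallest-magnitude weights, uniform quantization rounds every weight to a coarse grid, and both are scored by relative reconstruction error. The 75% sparsity and 4-bit settings are illustrative choices, not the paper's experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))          # stand-in for a trained weight matrix

def magnitude_prune(w, sparsity=0.75):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(w.size * sparsity)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def uniform_quantize(w, bits=4):
    """Symmetric uniform quantization to 2**bits levels, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

for name, w_hat in [("pruned 75%", magnitude_prune(W)),
                    ("quantized 4-bit", uniform_quantize(W))]:
    err = np.linalg.norm(W - w_hat) / np.linalg.norm(W)
    print(f"{name}: relative weight error = {err:.3f}")
```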

Training Transformers with 4-bit Integers
arxiv.org/abs/2306.11987

... we propose a training method for transformers with matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging ... we carefully analyze the specific structures of activation & gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify ...

arXiv.org: Training Transformers with 4-bit Integers
Quantizing the activation, weight, and gradient to 4-bit is promising to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%.
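As a rough picture of what "all matrix multiplications in INT4" means, here is a hypothetical NumPy sketch that fake-quantizes both operands of a matmul to 4-bit integers (16 levels) with a per-tensor scale, accumulates in integer arithmetic, and rescales at the end. The paper's actual contributions (the Hadamard quantizer for activation outliers, bit splitting and leverage score sampling for gradients) are not reproduced here.

```python
import numpy as np

def int4_quantize(x):
    """Symmetric per-tensor quantization to INT4: values in {-8, ..., 7} plus a scale."""
    qmax = 7
    scale = np.abs(x).max() / qmax + 1e-12
    q = np.clip(np.round(x / scale), -8, qmax).astype(np.int8)  # int8 stands in for INT4 storage
    return q, scale

def int4_matmul(a, b):
    """Multiply two matrices from INT4-quantized operands: integer accumulate, FP rescale."""
    qa, sa = int4_quantize(a)
    qb, sb = int4_quantize(b)
    return qa.astype(np.int32) @ qb.astype(np.int32) * (sa * sb)

rng = np.random.default_rng(0)
act = rng.normal(size=(8, 64))     # stand-in activations
w = rng.normal(size=(64, 32))      # stand-in weights
exact = act @ w
approx = int4_matmul(act, w)
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```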

John von Neumann once claimed, "with 4 parameters, I can fit an elephant, and with 5, I can make him wiggle his trunk."
\[x(t)=\displaystyle\sum_{k=0}^\infty\left(A_k^x\cos(kt)+B_k^x\sin(kt) \right)\]
\[y(t)=\displaystyle\sum_{k=0}^\infty\left(A_k^y\cos(kt)+B_k^y\sin(kt) \right)\]
Here's a paper proving that von Neumann's claim is valid! 🔗 aapt.scitation.org/doi/10.1119
#Neumann #JohnVonNeumann #VonNeumann #FourierSeries #parameters #complexparameters #parametrization #mathematics #maths
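The construction behind that result is exactly the truncated Fourier series above, with each complex parameter packing one cosine and one sine coefficient. Here is a hypothetical Python sketch that evaluates such a curve for arbitrary made-up coefficients (these are not the elephant parameters from the linked paper):

```python
import numpy as np

def fourier_curve(t, ax, bx, ay, by):
    """Evaluate x(t) = sum_k (A_k^x cos(kt) + B_k^x sin(kt)) and likewise y(t)."""
    k = np.arange(len(ax))[:, None]               # harmonics k = 0, 1, 2, ...
    cos_kt, sin_kt = np.cos(k * t), np.sin(k * t)
    x = np.asarray(ax) @ cos_kt + np.asarray(bx) @ sin_kt
    y = np.asarray(ay) @ cos_kt + np.asarray(by) @ sin_kt
    return x, y

t = np.linspace(0.0, 2.0 * np.pi, 400)
# Arbitrary coefficients for illustration only; the paper encodes its elephant
# in four complex parameters (each carrying one A and one B coefficient).
x, y = fourier_curve(t,
                     ax=[0.0, 1.0, 0.2], bx=[0.0, 0.3, -0.4],
                     ay=[0.0, -0.5, 0.1], by=[0.0, 0.8, 0.25])
# matplotlib's plt.plot(x, y) would draw the resulting closed curve.
```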